It’s been a long time!
This year I completed my transition from academia to data science: I went through the Insight Data Science program and got a job as a data scientist.
I will soon post something about my journey at Insight (it was great!), but to get back into blogging, I decided to start with something fun!
Video games are fun, neural nets are fun, and lately there is a big buzz about the new app Pokemon Go, sooo… what if capturing Pokemon was not enough? What about creating new ones?
A recurrent neural net to generate new Pokemons
Some time ago (a year!), Andrej Karpathy published an article on recurrent neural networks, The Unreasonable Effectiveness of Recurrent Neural Networks, and released his code for training language models. While RNNs for NLP have been around for a while, the recent Data Science Summit in SF emphasized the higher performance of Long Short-Term Memory (LSTM) networks. Andrej showed that the model can learn not only writing styles (Paul Graham, Shakespeare) but also the structure of documents, so it can learn from and then reproduce a wide range of sources such as Wikipedia articles, LaTeX documents, and Linux source code, all character by character!
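To make "character by character" concrete, here is a minimal sketch of how a character-level model frames its training data: every character becomes an integer id, and the target at each position is simply the next character. This is only an illustration in plain Python, not Andrej's code (which is written in Lua/Torch), and the function name char_level_pairs is my own.

```python
def char_level_pairs(text):
    """Encode text as integer ids and pair each character with the next one."""
    # The vocabulary is just the set of distinct characters in the text
    vocab = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(vocab)}
    ids = [char_to_id[ch] for ch in text]
    # At every position the model sees ids[t] and must predict ids[t + 1]
    inputs, targets = ids[:-1], ids[1:]
    return vocab, inputs, targets


vocab, inputs, targets = char_level_pairs('pikachu')
for x, y in zip(inputs, targets):
    print(vocab[x], '->', vocab[y])
```

The network never sees words or fields, only this stream of characters, which is why it has to learn the structure of the documents on its own.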
Very cool projects have been done with this method, such as creating job postings (by one of my fellows, Jeremy Karnowski) or, from Andrej’s talk in London, generating cooking recipes, music sequences and even Magic cards.
The latter, in combination with Pokemon Go, inspired this project 🙂
To create new Pokemon, I:
- Obtained training data from PokeAPI
- Used a recurrent neural network to learn the Pokedex information
- Used the trained network to generate imaginary Pokemon
Obtaining Training Data
Here is the code to obtain the data:
import pykemon
import re
import numpy as np
import requests
import json

client = pykemon.V1Client()


def get_useful_info_on_pokemon(pokemon_name):
    path = './Outputs/'
    f = open(path+pokemon_name+'.txt', 'w')
    # Query the API
    p = pykemon.get(pokemon=pokemon_name)
    # Write the info to the file
    f.write('Name: '+pokemon_name+'\n')
    f.write('POKEDEX DATA\n')
    #f.write('National_id: '+str(p.national_id)+'\n')
    str_types = 'Types: '
    for key in p.types:
        str_types += key+', '
    f.write(str_types+'\n')
    str_species = 'Species: '
    for key in p.species:
        str_species += key+', '
    f.write(str_species+'\n')
    f.write('Height: '+str(p.height)+'\n')
    f.write('Weight: '+str(p.weight)+'\n')
    str_abilities = 'Abilities: '
    for key in p.abilities:
        str_abilities += key+', '
    f.write(str_abilities+'\n')
    f.write('BASE STATS\n')
    f.write('HP: '+str(p.hp)+'\n')
    f.write('Attack: '+str(p.attack)+'\n')
    f.write('Defense: '+str(p.defense)+'\n')
    f.write('SP Attack: '+str(p.sp_atk)+'\n')
    f.write('SP Defense: '+str(p.sp_def)+'\n')
    f.write('Speed: '+str(p.speed)+'\n')
    f.write('Total: '+str(p.total)+'\n')
    f.write('TRAINING\n')
    f.write('EV yield: '+str(p.ev_yield)+'\n')
    f.write('Catch rate: '+str(p.catch_rate)+'\n')
    f.write('Base Happiness: '+str(p.happiness)+'\n')
    f.write('Base XP: '+str(p.exp)+'\n')
    f.write('Growth Rate: '+p.growth_rate+'\n')
    f.write('BREEDING\n')
    str_egg_groups = 'Egg Groups: '
    for key in p.egg_groups:
        str_egg_groups += key+', '
    f.write(str_egg_groups+'\n')
    f.write('Gender: '+str(p.male_female_ratio)+'\n')
    f.write('Egg cycles: '+str(p.egg_cycles)+'\n')
    str_evolutions = 'Evolutions: '
    for key in p.evolutions:
        str_evolutions += key+', '
    f.write(str_evolutions+'\n')
    f.write('DESCRIPTION\n')
    str_description = 'Description: '
    all_texts = []
    for key, value in p.descriptions.items():
        # The description id is the last path component of the resource URI
        description_id = re.split("/", value)[-2]
        info = pykemon.get(description_id=description_id)
        # Force ASCII, flatten newlines and form feeds, map '??' to 'e', and lowercase
        info_to_text = (info.description.encode('ascii', 'replace').decode('ascii')
                        .replace("\n", " ").replace("\x0c", " ")
                        .replace("??", "e").lower())
        all_texts.append(info_to_text)
    # Keep each distinct description only once
    for x in np.unique(all_texts):
        str_description += " "+x
    f.write(str_description+'\n')
    # Blank lines separate one Pokemon from the next
    f.write('\n')
    f.write('\n')
    f.write('\n')
    f.write('\n')
    f.close()


def get_pokemon_names():
    names_list = []
    url = 'http://pokeapi.co/api/v1/pokedex/1/'
    response = requests.get(url)
    if response.status_code == 200:
        data = json.loads(response.text)
        for key in data['pokemon']:
            names_list.append(key['name'])
    else:
        print('An error occurred querying the API')
    return names_list


mylist = get_pokemon_names()
for pokemon_name in mylist:
    print(pokemon_name)
    get_useful_info_on_pokemon(pokemon_name)
I ended up with 778 Pokemon descriptions, which I gathered into a single text document of about 550K characters.
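Gathering the files is a simple concatenation. Below is a minimal sketch of that step, assuming the per-Pokemon files sit in ./Outputs/ as written by the scraper. The function name build_corpus is my own, and the output file name used in the note after the code (input.txt inside the data directory) is an assumption about what the training script expects, so adjust it to your setup.

```python
import glob
import os


def build_corpus(input_dir, output_path):
    """Concatenate every .txt file in input_dir into one training file.

    Returns the number of files merged and the total number of characters.
    """
    paths = sorted(glob.glob(os.path.join(input_dir, '*.txt')))
    total_chars = 0
    with open(output_path, 'w') as out:
        for path in paths:
            with open(path) as f:
                text = f.read()
            out.write(text)
            total_chars += len(text)
    return len(paths), total_chars
```

Calling build_corpus('./Outputs/', 'data/pokemon/input.txt') (after creating the data/pokemon/ folder) returns the file and character counts, which is a quick way to check the numbers quoted above.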
Here are some examples of texts with some great illustrations!
Training the Recurrent Neural Network
To train the RNN, I closely followed the instructions from here, using the same configuration as in the example:
th train.lua -data_dir data/pokemon/ -rnn_size 512 -num_layers 2 -dropout 0.5 -gpuid -1
I let it run overnight on my Mac, and in the morning it was done!
The code saves a set of checkpoint files during training. Each file name records the epoch and the validation loss, so we can take a look at the loss function:
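import os
import numpy as np
import matplotlib.pyplot as plt

# Checkpoint names look like lm_lstm_epoch19.70_0.6921.t7 (epoch, then loss)
loss_files = np.sort(os.listdir('./char-rnn/cv/'))
lfs = []
for lf in loss_files:
    # Drop the 13-character 'lm_lstm_epoch' prefix and the '.t7' suffix
    lf = list(map(float, lf[13:-3].split('_')))
    lfs.append(lf)
# Sort the (epoch, loss) pairs by epoch
lfs = np.array(sorted(lfs, key=lambda x: x[0]))
plt.plot(lfs[:, 0], lfs[:, 1])
plt.title('Loss function during training')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()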
We can see the model achieved its best results around epoch 19. After that, performance gets worse, probably due to overfitting.
Let’s use the checkpoint that had the lowest validation loss (epoch 19) to produce samples from the model.
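Rather than reading the best checkpoint off the plot, it can be picked programmatically. This is a small sketch that assumes the checkpoint names follow the lm_lstm_epoch<epoch>_<loss>.t7 pattern seen in this post. The helper names are my own, and the second file name in the example is made up purely for illustration.

```python
PREFIX = 'lm_lstm_epoch'
SUFFIX = '.t7'


def parse_checkpoint(filename):
    """Return (epoch, validation_loss) parsed from a checkpoint file name."""
    core = filename[len(PREFIX):-len(SUFFIX)]
    epoch, loss = core.split('_')
    return float(epoch), float(loss)


def best_checkpoint(filenames):
    """Return the checkpoint file name with the lowest validation loss."""
    candidates = [f for f in filenames
                  if f.startswith(PREFIX) and f.endswith(SUFFIX)]
    return min(candidates, key=lambda f: parse_checkpoint(f)[1])


# The second name is a hypothetical later checkpoint with a higher loss
names = ['lm_lstm_epoch19.70_0.6921.t7', 'lm_lstm_epoch25.00_0.7103.t7']
print(best_checkpoint(names))
```

In practice the list of names would come from os.listdir('./char-rnn/cv/'), the same folder used for the loss plot.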
We can set the parameter -length 10000 (default = 2000) to generate several new Pokemon:
th sample.lua cv/lm_lstm_epoch19.70_0.6921.t7 -gpuid -1 -length 10000
Looking at the output, I see that the model generates words that do not really exist, which is not surprising given the “Pokemon vocabulary”.
I also noticed that Pokemon with the suffix “mega” in their names tend to be stronger and heavier than the others.
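The observation about “mega” names could be checked with a few lines of code instead of by eye. Here is a rough sketch that parses the "Key: value" lines of the Pokedex format used above and averages a numeric field for the two groups. The function names are my own, the name test is deliberately crude (it also matches names like meganium), and the two records in the example are invented for illustration, not real model output.

```python
def parse_records(text):
    """Split Pokedex-style text into one dict of 'Key: value' lines per Pokemon."""
    records, current = [], {}
    for line in text.splitlines():
        # A new 'Name:' line starts the next record
        if line.startswith('Name: ') and current:
            records.append(current)
            current = {}
        if ': ' in line:
            key, value = line.split(': ', 1)
            current[key] = value.strip()
    if current:
        records.append(current)
    return records


def to_float(value):
    """Return value as a float, or None when the model wrote something odd."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def mean_by_mega(records, field):
    """Average a numeric field separately for 'mega' names and all others."""
    groups = {'mega': [], 'other': []}
    for rec in records:
        value = to_float(rec.get(field))
        if 'Name' not in rec or value is None:
            continue
        group = 'mega' if 'mega' in rec['Name'] else 'other'
        groups[group].append(value)
    return {g: (sum(v) / len(v) if v else None) for g, v in groups.items()}


# Two invented records in the same layout as the training files
sample = ('Name: foo-mega\nWeight: 1000\nTotal: 600\n\n\n'
          'Name: bar\nWeight: 100\nTotal: 300\n')
print(mean_by_mega(parse_records(sample), 'Weight'))
```

Running the same functions on the training document and on the generated sample, with the fields 'Weight' and 'Total', would show whether the model picked up the pattern from the data.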
Here are some fun examples.
This one has the ability “technician” and is pretty heavy for the types “fairy” and “grass”. The description is somewhat poetic.
One called “magactor” has the abilities “gluttony, thick-fat”. Not good for summer I guess 🙂
A last one that I like because it is an analyst!
I would love to illustrate these new Pokemon!
I would also like to point to this French blogger/illustrator, who already made his own (Lovecraft) version.
And don’t forget to catch ’em all 🙂