Create your own Pokemons!


Background

It’s been a long time!

This year I completed my transition from Academia to Data Science. I went through the Insight Data Science Program and got a job as a Data Scientist.

I will soon post something about my journey at Insight (it was great!) but to get back to blogging, I decided to start with something fun!

Video games are fun, neural nets are fun, and lately there is a big buzz about the new app Pokemon Go, sooo… what if capturing Pokemons was not enough? What about creating new ones?

A recurrent neural net to generate new Pokemons

Some time ago (a year!), Andrej Karpathy published an article on recurrent neural networks, The Unreasonable Effectiveness of Recurrent Neural Networks, and released his code to train character-level language models. While RNNs for NLP have been around for a while, the recent Data Science Summit in SF emphasized the higher performance of Long Short-Term Memory (LSTM) networks. Andrej showed that the model can learn not only writing styles (Paul Graham, Shakespeare) but also the structure of documents, which allows it to learn from and then reproduce a wide range of sources such as Wikipedia articles, LaTeX documents, and Linux source code, and all of this character by character!
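To make "character by character" a bit more concrete, here is a minimal sketch of how a text can be turned into next-character training pairs. This is only a toy illustration of the idea, not the data loader that Andrej's code actually uses, and the tiny corpus and variable names are my own.

```python
# Toy corpus standing in for the Pokedex text (illustrative only).
text = "Name: pikachu\nTypes: electric\n"

# Vocabulary: every distinct character gets an integer id.
chars = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(chars)}
id_to_char = {i: c for c, i in char_to_id.items()}

# Encode the corpus, then pair each character with the one that follows it:
# the network sees inputs[t] and is trained to predict targets[t].
encoded = [char_to_id[c] for c in text]
inputs, targets = encoded[:-1], encoded[1:]

# Decoding the targets gives back the text shifted by one character.
decoded_targets = "".join(id_to_char[i] for i in targets)
```

Because the model only ever predicts the next character, it has to pick up everything else (field names, line breaks, section headers) from the data itself.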

Very cool projects have been built with this method, such as creating job postings (by one of my fellows, Jeremy Karnowski) or, from Andrej’s talk in London, generating cooking recipes, music sequences and even Magic cards.

The latter, in combination with Pokemon Go, inspired me for this project 🙂

To create new pokemons, I:

  1. Obtained training data from PokeAPI
  2. Used a recurrent neural network to learn the Pokedex information
  3. Used the trained network to generate imaginary Pokemons

Obtaining Training Data

Here is the code I used to obtain the data:

import pykemon
import re
import numpy as np
import requests
import json

client = pykemon.V1Client()

def get_useful_info_on_pokemon(pokemon_name):
    path = './Outputs/'
    f = open(path+pokemon_name+'.txt', 'w')
    # Query the API
    p = pykemon.get(pokemon=pokemon_name)
    # Write the info to the file
    f.write('Name: '+pokemon_name+'\n')

    f.write('POKEDEX DATA\n')
    #f.write('National_id: '+str(p.national_id)+'\n')
    str_types = 'Types: '
    for key in p.types:
        str_types += key+', '
    f.write(str_types+'\n')
    str_species = 'Species: '
    for key in p.species:
        str_species += key+', '
    f.write(str_species+'\n')
    f.write('Height: '+str(p.height)+'\n')
    f.write('Weight: '+str(p.weight)+'\n')
    str_abilities = 'Abilities: '
    for key in p.abilities:
        str_abilities += key+', '
    f.write(str_abilities+'\n')

    f.write('BASE STATS\n')
    f.write('HP: '+str(p.hp)+'\n')
    f.write('Attack: '+str(p.attack)+'\n')
    f.write('Defense: '+str(p.defense)+'\n')
    f.write('SP Attack: '+str(p.sp_atk)+'\n')
    f.write('SP Defense: '+str(p.sp_def)+'\n')
    f.write('Speed: '+str(p.speed)+'\n')
    f.write('Total: '+str(p.total)+'\n')

    f.write('TRAINING\n')
    f.write('EV yield: '+str(p.ev_yield)+'\n')
    f.write('Catch rate: '+str(p.catch_rate)+'\n')
    f.write('Base Happiness: '+str(p.happiness)+'\n')
    f.write('Base XP: '+str(p.exp)+'\n')
    f.write('Growth Rate: '+p.growth_rate+'\n')

    f.write('BREEDING\n')
    str_egg_groups = 'Egg Groups: '
    for key in p.egg_groups:
        str_egg_groups += key+', '
    f.write(str_egg_groups+'\n')
    f.write('Gender: '+str(p.male_female_ratio)+'\n')
    f.write('Egg cycles: '+str(p.egg_cycles)+'\n')
    str_evolutions = 'Evolutions: '
    for key in p.evolutions:
        str_evolutions += key+', '
    f.write(str_evolutions+'\n')

    f.write('DESCRIPTION\n')
    str_description = 'Description: '
    all_texts = []
    for key, value in p.descriptions.iteritems():
        description_id = re.split("/", value)[-2]
        info = pykemon.get(description_id= description_id)
        # Force ASCII, flatten line/page breaks, and lowercase the description
        info_to_text = info.description.encode('ascii', 'replace').replace("\n", " ").replace("\x0c", " ").replace("??", "e").lower()
        all_texts.append(info_to_text)

    for x in np.unique(all_texts):
        str_description+= " "+x
    f.write(str_description+'\n')

    f.write('\n')
    f.write('\n')
    f.write('\n')
    f.write('\n')

    f.close()

def get_pokemon_names():
    names_list = []
    url = 'http://pokeapi.co/api/v1/pokedex/1/'
    response = requests.get(url)
    if response.status_code == 200:
        data = json.loads(response.text)
        for key in data['pokemon']:
            names_list.append(key['name'])
    else:
        print 'An error occurred querying the API'
    return names_list

Then running:

mylist = get_pokemon_names()
for pokemon_name in mylist:
    print pokemon_name
    get_useful_info_on_pokemon(pokemon_name)

 

I wrote out 778 Pokemon descriptions and gathered them all into one text document of about 550K characters.
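The gathering step is not shown above, so here is a minimal sketch of one way to do it. It assumes the per-Pokemon files live in ./Outputs/ (as in the code above) and that the merged file should be named input.txt inside the data directory passed to train.lua. The function name and the example paths are mine, not part of any library.

```python
import glob
import os

def merge_pokemon_files(input_dir, output_path):
    """Concatenate every .txt file in input_dir into one training file.

    Returns (number of files merged, number of characters written).
    """
    paths = sorted(glob.glob(os.path.join(input_dir, '*.txt')))
    out_dir = os.path.dirname(output_path)
    if out_dir and not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    n_chars = 0
    with open(output_path, 'w') as out:
        for path in paths:
            with open(path) as f:
                text = f.read()
            out.write(text)
            n_chars += len(text)
    return len(paths), n_chars

# Hypothetical usage (paths are assumptions):
# merge_pokemon_files('./Outputs/', './char-rnn/data/pokemon/input.txt')
```

The returned counts make it easy to double check that the number of files and the total size look right before starting a long training run.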

Here are some examples of texts with some great illustrations!

[Two screenshots: example Pokedex texts with illustrations]

Training the Recurrent Neural Network

To train the RNN, I closely followed the instructions from the char-rnn repository, using the same configuration as in its example:

th train.lua -data_dir data/pokemon/ -rnn_size 512 -num_layers 2 -dropout 0.5 -gpuid -1

I let it run overnight on my Mac and in the morning it was done!
The code produces a set of files that represent checkpoints in training so we can take a look at the loss function:

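import os
import numpy as np
import matplotlib.pyplot as plt

# Checkpoint names look like lm_lstm_epoch19.70_0.6921.t7 (epoch, then validation loss)
loss_files = np.sort(os.listdir('./char-rnn/cv/'))
lfs = []
for lf in loss_files:
    # Strip the 'lm_lstm_epoch' prefix and the '.t7' suffix, leaving 'epoch_loss'
    lf = list(map(float, lf[13:-3].split('_')))
    lfs.append(lf)
# Sort numerically by epoch before plotting
lfs = np.array(sorted(lfs, key=lambda x: x[0]))
plt.plot(lfs[:, 0], lfs[:, 1])
plt.title('Loss function during training')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()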

Screen Shot 2016-07-19 at 10.02.59 PM

We can see that the model achieved its best results around epoch 19. After that, the performance gets worse, probably due to overfitting.
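Instead of reading the best epoch off the plot, the checkpoint can also be picked programmatically. Here is a small sketch that assumes the checkpoint files follow the naming pattern seen in this post (lm_lstm_epoch<epoch>_<loss>.t7). The function and constant names are my own.

```python
import re

# Checkpoint names as seen in this post, e.g. lm_lstm_epoch19.70_0.6921.t7
# -> epoch 19.70, validation loss 0.6921
CHECKPOINT_RE = re.compile(r'^lm_lstm_epoch(\d+\.\d+)_(\d+\.\d+)\.t7$')

def best_checkpoint(filenames):
    """Return (filename, epoch, loss) with the lowest loss, or None if no match."""
    best = None
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match is None:
            continue  # skip anything that is not a checkpoint file
        epoch, loss = float(match.group(1)), float(match.group(2))
        if best is None or loss < best[2]:
            best = (name, epoch, loss)
    return best

# Hypothetical usage:
# import os
# print(best_checkpoint(os.listdir('./char-rnn/cv/')))
```

Matching with a regular expression rather than fixed slice positions means stray files in the folder are simply ignored.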

New pokemons!

Let’s use the checkpoint that had the lowest validation loss (epoch 19) to produce samples from the model.

We can set the parameter

-length 10000 (default = 2000)

to generate several new pokemons.

th sample.lua cv/lm_lstm_epoch19.70_0.6921.t7 -gpuid -1 -length 10000

Looking at the output, I see that the model generates words that do not really exist, which is not surprising given the “Pokemon vocabulary”.
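One rough way to spot those invented words is to compare the vocabulary of a generated sample with that of the training text. The sketch below is just that, a rough check under my own assumptions: words are taken to be runs of lowercase letters, and the function name is hypothetical.

```python
import re

def words_of(text):
    """Set of lowercase alphabetic words found in text."""
    return set(re.findall(r'[a-z]+', text.lower()))

def novel_words(training_text, generated_text):
    """Sorted words of generated_text that never occur in training_text."""
    return sorted(words_of(generated_text) - words_of(training_text))

# Hypothetical usage, with the training file and a saved sample:
# training = open('./char-rnn/data/pokemon/input.txt').read()
# sample = open('sample.txt').read()
# print(novel_words(training, sample))
```

A word showing up in this list only means it is absent from the training text, so real English words that the Pokedex never uses would be listed too.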

I also noticed that Pokemon with the suffix “mega” in their names tend to be stronger and heavier than the other Pokemons.
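An impression like this can be checked by parsing the entries back into fields. Here is a minimal sketch that assumes the "Key: value" layout written by get_useful_info_on_pokemon above. The helper names are mine, and since generated text can be malformed, unparsable weights are simply skipped.

```python
def parse_entries(text):
    """Split Pokedex-style text into one {field: value} dict per Pokemon."""
    entries = []
    for line in text.splitlines():
        if ': ' not in line:
            continue  # section headers (POKEDEX DATA, ...) and blank lines
        key, value = line.split(': ', 1)
        if key == 'Name':
            entries.append({})  # a Name line starts a new entry
        if entries:
            entries[-1][key] = value.strip()
    return entries

def mean_weight(entries):
    """Average the Weight field, skipping entries where it is not a number."""
    weights = []
    for entry in entries:
        try:
            weights.append(float(entry.get('Weight', '')))
        except ValueError:
            pass  # missing or malformed weight
    return sum(weights) / len(weights) if weights else None

# Hypothetical usage on a saved sample:
# entries = parse_entries(open('sample.txt').read())
# mega = [e for e in entries if 'mega' in e.get('Name', '')]
# others = [e for e in entries if 'mega' not in e.get('Name', '')]
# print(mean_weight(mega), mean_weight(others))
```

The same comparison can be run on the training text to see whether the model picked up the pattern from the real Pokedex.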

Here are some fun examples.

This one has the ability “technician” and is pretty heavy for the types “fairy” and “grass”. The description is somewhat poetic.

[Screenshot: the generated Pokedex entry]

One called “magactor” has the abilities “gluttony, thick-fat”. Not good for summer I guess 🙂

A last one that I like because it is an analyst!

[Screenshot: the generated Pokedex entry]

I would love to illustrate these new pokemons!
I also would like to point to this French blogger/illustrator who already made his own (Lovecraft) version.
And don’t forget to catch’em all 🙂
