Training Karpathy's char-rnn to talk like Snooki - What happens when AI watches TV (hint: it's not very intelligent)

Motivation:

Many people have trained Andrej Karpathy's RNN from this blog post on text that is highly complex, such as Shakespeare, the Bible, or even Eminem's lyrics with their puns, rhymes and meter. Wondering how I could get some different results, I decided to see whether the network could learn to find structure in something completely devoid of any intellectual substance: cable TV.

Initially, I wanted to train the network on John Madden's commentary and then write a program to read the generated commentary aloud in the voice of John Madden. Unfortunately, I couldn't find any text of his scripts online (if anyone has access to them, PLEASE DO contact me). Not to be discouraged, I soon settled on the scripts from Jersey Shore and Hannah Montana as suitable alternatives. Since I grew up in a house where both parents were teachers, TV was banned, and books were encouraged, I decided to turn this into an experiment by training some networks on TV and others on books. For comparison, I trained another two RNNs on quality books: The Lord of the Rings and David Copperfield.

General Procedure:

  1. I downloaded the scripts and texts (Jersey Shore seasons 3-6 from this website, Hannah Montana seasons 1-4 from the same website, the three books of The Lord of the Rings trilogy from here, and finally David Copperfield from here).

  2. I trained the networks with Karpathy's char-rnn on my MacBook Pro's CPU (which took hours and hours), varying the dropout and the number of neurons in response to (usually) overfitting and (sometimes) underfitting.

  3. I sampled the results from the checkpoint with the lowest loss on the validation data, at times varying the temperature and sometimes seeding the LSTM with some text.
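
Picking the checkpoint in step 3 by hand gets tedious after a few training runs, so here is a minimal Python sketch of how it could be automated. It assumes the checkpoint files end in `_epoch<epoch>_<validation loss>.t7`, which is how I remember char-rnn naming them; check your own checkpoint directory before relying on it. The helper `best_checkpoint` and the example filenames are made up for illustration and are not part of char-rnn.

```python
import re

# Assumed filename pattern: ..._epoch<epoch>_<validation loss>.t7
# (an assumption about char-rnn's naming, not something verified here).
CHECKPOINT_RE = re.compile(r"_epoch(\d+(?:\.\d+)?)_(\d+(?:\.\d+)?)\.t7$")


def best_checkpoint(filenames):
    """Return the filename with the lowest validation loss, or None."""
    best_name, best_loss = None, float("inf")
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        loss = float(match.group(2))
        if loss < best_loss:
            best_name, best_loss = name, loss
    return best_name


# Hypothetical directory listing, for illustration only.
example = [
    "lm_lstm_epoch10.00_1.6200.t7",
    "lm_lstm_epoch20.00_1.4100.t7",
    "lm_lstm_epoch30.00_1.4800.t7",
    "notes.txt",
]
print(best_checkpoint(example))  # prints lm_lstm_epoch20.00_1.4100.t7
```

In this made-up listing the loss rises again after epoch 20, which is the usual sign of overfitting, so the epoch 20 file is the one to sample from.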

Below is a sampling of some of my favorite results from each of the networks. If you want to see the full results, here they are. I'll let you guess which sample corresponds to which TV show or book (answers at the bottom):

1)
Oh, no, no, no, no, no, no, no, no, no, no.
They're not gonna be the one who said I was gonna see that to get in there?

2)
- The meatballs see something of the beach.
- Oh, my God, I'm going to the better way a getter finished of my matter and good to and that's the drunk.
- Yeah.

3)
'I mean, this is a funny thing you're not staying for my boyfriend like this, and then I'm not the only one who wants to be the one of the party that he said I don't think I don't have a hair and you want to be a real movie for yourself.
- Oh, yeah.
I don't know.
- Oh, yeah, I said that I don't say you can do that.
I am so good.

4)
'It is not a word,' said I, 'I have no doubt.'
'I was so sure, and she will be so much as to be a good deal of the seat, and what a word of a school-coach, one friend,' said Mr. Peggotty.
'I don't know what I was a bright distance of trust, and so much about to be the more than it was a little distracted the man of her company

5)
Mr. Micawber was alone, and the sort of thing was passing from his stranger and her hands and little object, which was the door, and could not look at the time, and said that I could have a subject than I had a letter in a water, though I don't know what was going to be such a little past spoken, and stopped on the state of the spirits

6)
there were not going in the land of the West, and maybe the company was made again. The long stars were all the hobbits and had seemed to be such a man of men, and the halfling was set out of the House of the North.

7)
the strength of the Shire and the stream of the water and the stream of the stream that was still the strength of the stream of the stream.

8)
Mike for you.
Bro.
I can just do at it.
What happened and this is your hair.
- Can not go to go to the plate what I want to be people he wants to do in, let me ask you?
- Yeah.
- Bewier that I know I don't know what I mean? Gay.
- Sam would think you're the starting, and we're in the girls.
I want to do it.
I can not just know what?
- Good care, so I don't know what I did not have a good with you.
- Oh, my God, drunk!
- I don't know.
Plays, I really can't go at a lot.
- You can't say it? What the [Bleep].

Conclusion:

After playing around with the char-rnn for a while, I found that it learns a lot like a preschool-age child who doesn't quite understand context, but can grasp the basic structure of a sentence and learn words. Personally, I thought that even without looking at which training set I sampled from, I could easily tell from the vocabulary and sentence structure not only whether the generated text was based on TV scripts or books, but also whether it came from LOTR or David Copperfield, or from Jersey Shore or Hannah Montana.

The output in sample 7 shows the network being overly conservative: I had the temperature set to a low 0.25, and as a result it repeated itself over and over, much like a shy 3-year-old who will only tell others that he likes Thomas trains. I found the temperature parameter to be analogous to how shy a child is. Generally, I found that the more data the net was trained on, the higher I could raise the temperature without getting completely nonsensical output.
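
To make the shyness analogy concrete, here is a small Python sketch of the usual way temperature is applied when sampling: the network's scores for the next character are divided by the temperature before being turned into probabilities. This is the standard textbook formulation, not code lifted from char-rnn, and the function name and the three example scores are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; low temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next characters.
logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.25)  # shy: almost always the favorite
warm = softmax_with_temperature(logits, 1.0)   # bolder: alternatives get a chance

print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
```

At 0.25 nearly all of the probability lands on the top-scoring character, so the same safe choices get picked again and again, which is consistent with the stream-of-the-stream loop in sample 7.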

I thought it was fitting (no pun intended) that my TV kiddies used very short sentences and had a limited vocabulary compared to my book kiddies, who went into great detail and had a large vocabulary. In general, I also found that the results from the TV shows were more convincing than the results from the books. This suggests that the structure of conversation in quality literature is much more complex and nuanced than the intellectual diarrhea that cable producers throw in our faces.

If you'd like to see the full results and how I got them, check out my project page.

Answers from sample results:

  1. Hannah Montana
  2. Jersey Shore
  3. Hannah Montana
  4. David Copperfield
  5. David Copperfield
  6. The Lord of the Rings
  7. The Lord of the Rings
  8. Jersey Shore

Written on July 29, 2015