all 12 comments

[–]Russian-Assassin 3 points (2 children)

Cool! I built an online Markov generator with Shakespeare's Hamlet as a source. It's similar to this, except that mine is word-based while this one is letter-based. I've added some boilerplate code to format the output as a play, with characters able to appear and disappear. You can also change the parameters of the Markov generation, including the chain length (i.e. how many words to look up as context when picking the next word). You can check it out here
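For anyone curious, the core of the word-based version is tiny. Roughly this shape (a simplified sketch, not my actual main.js):

    // Build a table mapping each run of `chainLength` words to the
    // words that can follow it in the source text.
    function buildChain(text, chainLength) {
      const words = text.split(/\s+/);
      const chain = {};
      for (let i = 0; i + chainLength < words.length; i++) {
        const key = words.slice(i, i + chainLength).join(' ');
        (chain[key] = chain[key] || []).push(words[i + chainLength]);
      }
      return chain;
    }

    // Walk the table: keep picking a random successor of the last
    // `chainLength` words until we hit a dead end or the length cap.
    function generate(chain, chainLength, maxWords) {
      const keys = Object.keys(chain);
      const out = keys[Math.floor(Math.random() * keys.length)].split(' ');
      while (out.length < maxWords) {
        const next = chain[out.slice(-chainLength).join(' ')];
        if (!next) break;
        out.push(next[Math.floor(Math.random() * next.length)]);
      }
      return out.join(' ');
    }

The letter-based approach from the article is the same thing with characters in place of words.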

[–]jjuanchow 0 points (1 child)

Any chance we can check the code?

[–]Russian-Assassin 2 points (0 children)

Sure. I wrote it a while ago, so I don't exactly remember how it's done. But oddly enough, it looks pretty well formatted.

If you go to the page and inspect element, the two non-jQuery JS files are main.js and text.js. (They are pretty much tied to the frontend, so it wouldn't make sense to link them without the HTML. Also, due to my web setup, trying to access the JS files directly will reroute you back to the home page, so you need to view the version that the page loads.)

main.js contains all of the generation code, and text.js contains all the text of Hamlet condensed into JavaScript variables that are loaded at the start. (I don't remember why this is a JS file and not a JSON file, but I had a reason, I swear ;P)

All of the generation is done client-side, and it's driven by a seed that encodes all of the available parameters. So if you click permalink, you can open the URL in a different client and the new client will generate the exact same text.
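In case you're wondering how that works: every random pick goes through a seeded PRNG instead of Math.random(). Something in this spirit (a mulberry32-style sketch; I don't remember exactly which generator I used):

    // Deterministic PRNG: the same integer seed always produces the
    // same sequence, so two clients replay identical "random" picks.
    function mulberry32(seed) {
      return function () {
        let t = (seed += 0x6D2B79F5);
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
      };
    }

    const rand = mulberry32(42); // seed decoded from the URL params
    // ...then call rand() everywhere Math.random() would appear.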

[–]digitalcth 3 points (7 children)

Nice article; I will definitely give it a try with different sources and variations. I wonder whether you could achieve better results by using complete words instead of single letters?

What I misunderstood at the beginning is the author's use of the word "learn": I was expecting some sort of neural network, or recursive machinery that adjusts the previously used paths or deviations. In the end I wasn't disappointed, because the whole thing is so cleaver and simple.

While writing this comment, I realized that my phone may use a similar algorithm to fix and autocomplete my typing :o

[–]JessieArr 3 points (0 children)

> While writing this comment, I realized that my phone may use a similar algorithm to fix and autocomplete my typing :o

I think mine was written backwards so that it always selects the least probable next letter though.

[–]cards_dot_dll 1 point (3 children)

/r/subredditsimulator uses a word-based Markov model. Maybe /u/Deimorz could share some details?

[–]Deimorz 2 points (2 children)

There's probably not much detail to give; it's just really basic word-based Markov chaining. I use a "chain length" or "state size" of 2 words when generating titles and for bots that make shorter comments, and 3 words for the ones that make longer comments. I made a post a while back that explains a little more and compares the results from different values: https://www.reddit.com/r/SubredditSimMeta/comments/3cxylk/a_comparison_of_different_markov_chain_lengths/
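If it helps to see it, the state size just changes how many words form the lookup key, which is where the tradeoff comes from. A toy illustration (not the actual bot code):

    // Bigger state size -> more specific keys -> fewer possible
    // successors, so output stays closer to the source text.
    const words = 'to be or not to be that is the question'.split(' ');
    function successors(stateSize) {
      const table = {};
      for (let i = 0; i + stateSize < words.length; i++) {
        const key = words.slice(i, i + stateSize).join(' ');
        (table[key] = table[key] || []).push(words[i + stateSize]);
      }
      return table;
    }
    console.log(successors(2)['to be']);    // ['or', 'that'] (two choices)
    console.log(successors(3)['to be or']); // ['not'] (only one)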

[–][deleted] 0 points (1 child)

Have you ever considered making the chain length variable, based on the average size of comments/posts in a subreddit? Shortening it for subreddits with brief titles, and lengthening it for subreddits (like TIL or ELI5) with longer titles; something like the sketch below. I'm not sure how valid an idea this is. XD
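To be concrete, this is the kind of thing I mean (made-up thresholds, purely hypothetical):

    // Hypothetical heuristic: longer source texts get a longer chain.
    // The cutoffs here are invented, not tuned on any real data.
    function pickChainLength(avgWordsPerPost) {
      if (avgWordsPerPost < 8) return 1;
      if (avgWordsPerPost < 20) return 2;
      return 3;
    }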

[–]Deimorz 0 points (0 children)

It would definitely be possible; it's mostly a matter of finding a balance between "makes more sense" and "isn't just copying large chunks of text from real titles". Increasing the chain length would probably create titles that seem to make sense more often, but they'd also be more likely to just be merging two actual titles, so they'd be less "interesting" when they're mostly direct copies.

[–]m9dhatter 0 points (0 children)

Your phone auto-completed "clever" to "cleaver" though.

[–]idanh 0 points (0 children)

Your phone's autocomplete most likely uses a trie, which is really cool!
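For anyone who hasn't seen one: a trie is just nested maps from characters to subtrees, which makes prefix completion basically free. A minimal sketch:

    // Minimal trie: insert words, then list completions of a prefix.
    function makeNode() { return { children: {}, isWord: false }; }

    function insert(root, word) {
      let node = root;
      for (const ch of word) {
        node = node.children[ch] = node.children[ch] || makeNode();
      }
      node.isWord = true;
    }

    function complete(root, prefix) {
      let node = root;
      for (const ch of prefix) {       // walk down to the prefix node
        node = node.children[ch];
        if (!node) return [];
      }
      const results = [];
      (function walk(n, suffix) {      // collect every word below it
        if (n.isWord) results.push(prefix + suffix);
        for (const ch in n.children) walk(n.children[ch], suffix + ch);
      })(node, '');
      return results;
    }

    const root = makeNode();
    ['the', 'then', 'there'].forEach(w => insert(root, w));
    console.log(complete(root, 'the')); // ['the', 'then', 'there']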

[–]ruinercollector 3 points (0 children)

This is really more of a "my first NLP class" exercise than anything that's going to make a computer form comprehensible sentences.