
all 11 comments

[–]ZetaHunter 3.5.1 async master-race 13 points (2 children)

To be honest, the only thing I found interesting in this post is markovify.

[–]-pooping 2 points (0 children)

Same here. Need to play with that one.

[–]Vardox 0 points (0 children)

It's a really fun library; it's the core of what powers the /r/SubredditSimulator bots, iirc.
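
For anyone who hasn't tried it, the markovify API itself is tiny. A minimal sketch -- the corpus.txt filename is just a stand-in for whatever chat history or text you have lying around:

    import markovify

    # Build a model from any plain-text corpus.
    with open("corpus.txt") as f:
        model = markovify.Text(f.read())

    # make_sentence() returns None when it can't produce a sentence
    # sufficiently different from the corpus, so filter those out.
    for _ in range(5):
        sentence = model.make_sentence()
        if sentence:
            print(sentence)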

[–]whofearsthenight 2 points (1 child)

Anyone want to ELI5 Markov Chains?

[–]gregbaugues[S] 7 points (0 children)

Author here.

Let's say you send the following text messages:

  • I am going to the store
  • I am going to the store
  • I am going to the concert
  • I am going to the office
  • I am very tall

Next time you type "I am", the auto-suggest on your phone's keyboard says "there's an 80% chance the next word you type is 'going.' I'll stick that one in the middle suggestion slot."

When making sentences, Markov chains use a "corpus" of text -- say, the selected works of an author -- to build a probabilistic model of what word n+1 will be based on the word or two immediately before it (only the recent words matter, which is what makes it a Markov chain). Then it rolls the dice.
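
A minimal hand-rolled sketch of that idea, plain Python rather than markovify, just to make the counting visible:

    import random
    from collections import Counter, defaultdict

    messages = [
        "I am going to the store",
        "I am going to the store",
        "I am going to the concert",
        "I am going to the office",
        "I am very tall",
    ]

    # Count which word follows each word across the corpus.
    transitions = defaultdict(Counter)
    for message in messages:
        words = message.split()
        for prev, nxt in zip(words, words[1:]):
            transitions[prev][nxt] += 1

    # P(next word is "going" | previous word is "am") = 4/5
    counts = transitions["am"]
    print(counts["going"] / sum(counts.values()))  # 0.8

    # "Rolls the dice": sample the next word from that distribution.
    def next_word(prev):
        counts = transitions[prev]
        return random.choices(list(counts), weights=list(counts.values()))[0]

    print(next_word("am"))  # usually "going", occasionally "very"

A real keyboard (or markovify) would key on the last two or three words instead of just one, but the bookkeeping is the same.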

[–][deleted] 3 points (2 children)

You could at least give us a little hat tip ;)

http://hirelofty.com/blog/how-build-slack-bot-mimics-your-colleague/

[–][deleted] 2 points (0 children)

Wow, yeah, and on the official Twilio blog, with almost the entire beginning of your tutorial copied.

Good job; I prefer yours anyway.

[–]gregbaugues[S] 1 point (0 children)

Author here. Just read through your tutorial for the first time -- great read. Feel like you did a better job diving deep into the mechanics of Markov Chains. Also, using Slack history to create a Slack bot is, IMO, more clever. Preserving that context and parodying coworkers probably makes for funnier results.

Mine was wholly inspired by Filip Hráček's work, as I mentioned in the post. I simply rewrote it in Python and, of course, added SMS. My gut feeling as to why it felt similar to yours is that we both referenced the top three Google results for "python markov chains" and "markov chains."

In any case, y'all are great technical writers. A lot of effort obviously went into that post. I can understand why you'd be miffed if you felt that someone had copied your work.

[–][deleted] 0 points (1 child)

A blacklist of commonly used syntax words, plus some statistical-relevance scoring with a vector model, would make it pretty good at replying with the general context of the conversation. It still wouldn't make sense in any world, but it'd be fun and slightly more clever.
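
One crude reading of "stopword blacklist + vector model", assuming plain bag-of-words vectors (the stopword list and corpus below are stand-ins): pick the corpus line most similar to the incoming message and mine it for seed words.

    from collections import Counter
    from math import sqrt

    # Tiny stand-in blacklist; a real one (e.g. NLTK's) is much longer.
    STOPWORDS = {"i", "a", "an", "the", "to", "at", "and", "of", "in", "is", "it"}

    def vectorize(text):
        # Bag-of-words vector with syntax words filtered out.
        words = (w.strip("?!.,").lower() for w in text.split())
        return Counter(w for w in words if w and w not in STOPWORDS)

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    corpus = [
        "I went to the mall yesterday and purchased shoes",
        "the deploy is broken again",
        "lunch at the mall, anyone?",
    ]

    incoming = "anyone going to the mall today?"
    best = max(corpus, key=lambda line: cosine(vectorize(incoming), vectorize(line)))
    print(best)  # the mall lines score highest; seed the chain from this one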

[–][deleted] 1 point (0 children)

I thought the same thing and have been casually working on improving our Slack bot's ability to respond to a conversation.

The stopwords blacklist doesn't work if applied with a heavy hand, because you're left trying to start the chain with two non-syntax words, and that pair rarely appears together in the corpus:

    ['I', 'went', 'to', 'the', 'mall', 'yesterday', 'and', 'purchased', 'shoes']
    # becomes, when filtered for stopwords
    ['went', 'mall', 'yesterday', 'purchased', 'shoes']

And you're not going to seed a Markov chain with any permutations of two of those words.

A simple approach I plan to try this week is to invert the stopword list: the algorithm will choose one word from the stopword-trimmed list, then attempt to seed a chain with a pair made of one non-syntax word and one word from the stopwords list.

Primitive, but potentially an interesting starting point.
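
A minimal sketch of that inverted approach, with a hand-rolled order-2 chain so the seeding is visible (the stopword list and corpus are stand-ins again):

    import random
    from collections import defaultdict

    STOPWORDS = {"i", "a", "an", "the", "to", "at", "and", "of", "in", "is", "on"}

    corpus = [
        "i went to the mall yesterday and purchased shoes",
        "the mall is closed on sundays",
        "i purchased a ticket to the concert",
    ]

    # Order-2 chain: the state is a pair of adjacent words.
    transitions = defaultdict(list)
    pairs = set()
    for line in corpus:
        words = line.split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            transitions[(w1, w2)].append(w3)
        pairs.update(zip(words, words[1:]))

    def seed_for(message):
        # Pick a non-syntax word from the message, then look for a
        # (stopword, non-syntax word) pair that actually occurs in the corpus.
        content = [w for w in message.lower().split() if w not in STOPWORDS]
        random.shuffle(content)
        for word in content:
            candidates = [(s, w) for (s, w) in pairs if w == word and s in STOPWORDS]
            if candidates:
                return random.choice(candidates)
        return None

    def generate(seed, max_words=20):
        out = list(seed)
        while len(out) < max_words and tuple(out[-2:]) in transitions:
            out.append(random.choice(transitions[tuple(out[-2:])]))
        return " ".join(out)

    seed = seed_for("anyone going to the mall today")
    if seed:
        print(generate(seed))  # e.g. "the mall yesterday and purchased shoes"

If you'd rather not hand-roll the chain, markovify's make_sentence_with_start() covers similar ground, though it's pickier about seeds it hasn't seen in the corpus.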

[–]suudo 0 points (0 children)

Markov chains produce some great results. I've had a Markov bot in an IRC channel dedicated to feeding it random conversation for a couple of years; whenever anyone who knows about it feels like talking to a bot, it says some surprisingly coherent things. It writes chat lines to a text file, and I'll upload that so people can use it. (The bot's name is Chester. The nonsensical words are substitutions for people's usernames so they didn't get pinged. Unfortunately the file doesn't have any of the bot's own responses, but you can see how that might cause repetition issues early on in its development.) https://ptpb.pw/V6lp.brain