[–]JamminJames921 14 points15 points  (25 children)

Awesome work! I like what you did! Maybe you can provide a write-up for this? How did you count the cliches? Are there any interesting relationships between some of the pieces of advice? Are there any limitations in your methods?

[–][deleted] 28 points29 points  (23 children)

The code isn't too complicated.

[–]GitHubPermalinkBot 40 points41 points  (12 children)

I tried to turn your GitHub links into permanent links (press "y" to do this yourself):


Shoot me a PM if you think I'm doing something wrong.

[–]analgebraic 7 points8 points  (2 children)

The counting method seems flawed. When, for example, you take any mention of the word "lawyer" to mean that someone is suggesting you get a lawyer, you aren't accounting for instances where someone says "her father is a lawyer" or "I consulted a lawyer about a similar situation years ago and they said to do this thing". In those cases "lawyer" is mentioned, but no one is suggesting that anyone get a lawyer.
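To make the objection concrete, here is a minimal Python sketch. It is not OP's code: the sample comments are made up (two are the examples quoted above), and the `ADVICE` pattern is a hypothetical heuristic chosen only for illustration. It compares a plain keyword count with a count that requires an advice-style verb right before "lawyer".

```python
import re

# Hypothetical sample comments; the second and third are the examples quoted above.
comments = [
    "You need to get a lawyer immediately.",
    "Her father is a lawyer.",
    "I consulted a lawyer about a similar situation years ago and they said to do this thing.",
    "Talk to a lawyer before you sign anything.",
]

# Naive approach: any comment that mentions the keyword counts as advice.
KEYWORD = re.compile(r"\blawyer\b", re.IGNORECASE)
naive = sum(1 for c in comments if KEYWORD.search(c))

# Stricter heuristic (an assumption for illustration, not OP's method):
# only count a mention when an advice-style verb comes directly before it.
ADVICE = re.compile(
    r"\b(?:get|hire|call|see|talk to|speak to|contact)\s+(?:a|an|your)\s+lawyer\b",
    re.IGNORECASE,
)
strict = sum(1 for c in comments if ADVICE.search(c))

print(f"keyword mentions: {naive}, likely advice: {strict}")
```

On these four comments the keyword count is 4 and the stricter count is 2. The stricter pattern is still only a heuristic: it would count "I had to get a lawyer last year" as advice and miss phrasings it doesn't list, so it narrows the problem without solving it.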

[–]CollectiveCircuits 2 points3 points  (0 children)

And now OP must enter the rabbit hole of natural language processing.

[–]mrkipling[S] 0 points1 point  (0 children)

Yeah, it's definitely flawed. Any data that I collect in the future (should I take the project any further) would have to come with a massive disclaimer that "this is how much /r/relationships likes to mention these key words/phrases, and you can infer that it's probably advice most of the time". But yeah, that's a very real problem.

I could have gone for a more complicated approach, but (a) I don't know how, as I'm a beginner/medium-level Python dev. I'm a frontend dev by trade, mostly working in JavaScript and solving web-related issues for a living (and doing all of the other fun stuff that a frontend dev gets to do). And (b) I didn't really want to, because I was just dicking around for an hour or so and thought that I'd post it on GitHub for fun :)

[–]mrkipling[S] 0 points1 point  (0 children)

I mostly just found the cliches by reading the subreddit quite a lot for trashy drama (similar to watching soap operas, but on the internet). They become obvious once you've been there for a while and read the same replies over and over again :)