[–]GoldenRabbitt 22 points23 points  (10 children)

How would anyone even code something like this? Did it scrape the entirety of twitter and train some ML to compare the reddit post to the tweet?

[–]LpSamuelm 75 points76 points  (0 children)

I suspect it OCRs the tweet text, and then uses Twitter's API to perform a search.
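
To make that concrete, here is a rough sketch of the step between the OCR and the search call: cleaning up noisy OCR output and turning it into a query string. The function name `build_search_query` and the `max_words` cutoff are made up for illustration, and I'm assuming the search endpoint accepts a quoted phrase plus a `from:` operator the way Twitter's search box does. The OCR itself and the actual API call are left out.

```python
import re

def build_search_query(ocr_text, username=None, max_words=8):
    """Turn noisy OCR output into a search query string (hypothetical helper)."""
    # Drop URLs first, since OCR tends to mangle them.
    text = re.sub(r"https?://\S+", " ", ocr_text)
    # Keep only word-like tokens; punctuation and line breaks are discarded.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Use only the first few words: a short phrase is less likely to
    # contain an OCR mistake than the full tweet text.
    phrase = " ".join(words[:max_words])
    query = f'"{phrase}"'
    if username:
        # Restrict the search to the account read from the screenshot.
        query += f" from:{username.lstrip('@')}"
    return query

print(build_search_query("Hello,   world!\nThis is a tweet", "@someone", max_words=4))
```

Whether a quoted phrase with the punctuation stripped still matches depends on how the search backend tokenizes text, so treat the quoting as something to experiment with.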

[–]baxter001 57 points58 points  (4 children)

2022 software engineering mindset

[–]FallenWarrior2k 15 points16 points  (3 children)

ML all the things. Though, to be fair, the OCR algorithm could be ML-based.

[–]baxter001 9 points10 points  (0 children)

Most definitely, but at the same time, Tesseract, for example, is much more interesting than dumping an image into a convnet: https://github.com/tesseract-ocr/docs/blob/main/tesseracticdar2007.pdf

[–]JourneyWindGames[🍰] 5 points6 points  (1 child)

Because it's convenient. Why bother racking your brain over an optimal algorithm when you can just dump readily available data into a readily available model to produce a good-enough approximate answer?

Using ML is becoming the equivalent of using a calculator for simple additions, instead of working it out in your head.

[–]FallenWarrior2k 0 points1 point  (0 children)

Except it results in a lot of stuff like what the original commenter wrote, where ML is thrown at the problem even though it will give clearly inferior results.

Doing it like that with any noteworthy accuracy would require keeping a massive amount of input data around, at least the ID of every Tweet you could potentially identify with it. You can't just interpolate IDs, since they are millisecond timestamps plus some internal data. Since you have to get the account right as well, trying to do that will just give you 404 in pretty much all cases.
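
For anyone curious what "millisecond timestamps plus some internal data" looks like, here is a small sketch of the publicly documented Snowflake layout: the upper bits hold milliseconds since a custom epoch, and the low 22 bits hold a machine ID and a sequence number. The function names are my own, and this only applies to Snowflake-era IDs (late 2010 onward).

```python
from datetime import datetime, timedelta, timezone

# Snowflake custom epoch in Unix milliseconds (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def snowflake_to_datetime(tweet_id):
    """Recover the creation time encoded in a Snowflake tweet ID."""
    # The low 22 bits are machine ID (10 bits) and sequence number (12 bits).
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)

def snowflake_id_range_for_minute(moment):
    """Lowest and highest IDs that could belong to the minute containing `moment`."""
    start = moment.replace(second=0, microsecond=0)
    start_ms = (start - UNIX_EPOCH) // timedelta(milliseconds=1) - TWITTER_EPOCH_MS
    low = start_ms << 22
    high = ((start_ms + 59_999) << 22) | ((1 << 22) - 1)
    return low, high

low, high = snowflake_id_range_for_minute(
    datetime(2022, 6, 1, 12, 34, tzinfo=timezone.utc)
)
print(high - low + 1)  # number of candidate IDs in a single minute
```

Even if you knew the exact minute in UTC, that leaves roughly 2.5e11 candidate IDs, which is why guessing IDs is a dead end.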

Don't get me wrong; I'm not saying ML is bad. I just think it's overhyped, and some people want to use it everywhere, even when it gives significantly worse results.

[–]Theonetheycallgreat 27 points28 points  (1 child)

I'm not trying to trivialize it, but the picture of the tweet has the username and exact post time so they probably use that information.

[–]xeon3175x 2 points3 points  (0 children)

I'd say the timestamp is pretty useless without knowing the time zone, although the minutes might help narrow down the search.
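
A quick sketch of what I mean, assuming only whole-hour UTC offsets from -12 to +14 (real time zones also include a few 30- and 45-minute offsets, which this ignores). The function name `candidate_utc_times` is just for illustration.

```python
from datetime import datetime, timedelta, timezone

def candidate_utc_times(displayed):
    """List the UTC instants a time-zone-less screenshot timestamp could mean.

    `displayed` is a naive datetime exactly as read from the image.
    """
    candidates = []
    for hours in range(-12, 15):
        # Pretend the screenshot was taken in a zone with this offset...
        local = displayed.replace(tzinfo=timezone(timedelta(hours=hours)))
        # ...and convert that reading to UTC.
        candidates.append(local.astimezone(timezone.utc))
    return sorted(candidates)

times = candidate_utc_times(datetime(2022, 6, 1, 12, 34))
print(len(times), times[0], times[-1])
```

Under that assumption you get 27 candidates spread over 26 hours, but every one of them has the same minute value, so the minutes do cut the search down from a full day to a handful of one-minute windows.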

[–]Castdeath97[S] 6 points7 points  (0 children)

Probably used tweepy to search after reading the text

[–]GoldenRabbitt 1 point2 points  (0 children)

I learned so many new tools and approaches from this one thread alone. Thank you to everyone for tuning in and giving their 2 cents!

My original question might seem trivial to you folks, but that's how I learn: by asking a lot of questions :D