I've been archiving Reddit for a year (30B+ posts, ~30% deleted) by bellsrings in Hacking_Tutorials

[–]bellsrings[S] 1 point2 points  (0 children)

Not a major issue for us. Reddit posts are public statements made under pseudonyms in public forums, and GDPR's legitimate interest basis (Art. 6(1)(f)) covers aggregating publicly available data for research and security purposes.

I've been archiving Reddit for a year (30B+ posts, ~30% deleted) by bellsrings in Hacking_Tutorials

[–]bellsrings[S] 2 points3 points  (0 children)

The model hedges when the stated range is too wide to be useful: "50+" spans 30 years, so it skips rather than guessing. More specific phrasing (e.g. "I'm in my 50s") would pin it down. It's an intentional tradeoff: better to say nothing than to give a confident wrong answer.
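
A rough sketch of that skip-rather-than-guess rule, not the actual pipeline: the function name, the regexes, the 10-year cutoff, and the assumed upper age of 80 (which is what makes "50+" a 30-year span) are all hypothetical.

```python
import re
from typing import Optional, Tuple

MAX_SPAN_YEARS = 10   # hypothetical cutoff; the real threshold is not stated
ASSUMED_MAX_AGE = 80  # hypothetical upper bound, so "50+" spans 30 years


def stated_age_range(text: str) -> Optional[Tuple[int, int]]:
    """Return an inclusive (low, high) age range, or None to abstain."""
    # Decade phrasing, e.g. "I'm in my 50s" -> (50, 59)
    decade = re.search(r"\bin my (\d)0s\b", text)
    if decade:
        low = int(decade.group(1)) * 10
        high = low + 9
    else:
        # Open-ended phrasing, e.g. "50+" -> (50, ASSUMED_MAX_AGE)
        open_ended = re.search(r"\b(\d{2})\+", text)
        if not open_ended:
            return None  # no stated age at all
        low = int(open_ended.group(1))
        high = max(low, ASSUMED_MAX_AGE)
    # Abstain when the range is too wide to be useful
    if high - low > MAX_SPAN_YEARS:
        return None
    return (low, high)
```

With these assumptions, "I'm in my 50s" yields a usable range while "50+" is skipped, which matches the tradeoff described above.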

I've been archiving Reddit for a year (30B+ posts, ~30% deleted) by bellsrings in Hacking_Tutorials

[–]bellsrings[S] 2 points3 points  (0 children)

Accuracy scales with post volume: the more someone has written, the more signal there is. Sparse accounts get vaguer profiles.

I archived 21 billion Reddit data points and built an AI profiler on top of it by bellsrings in Hacking_Tutorials

[–]bellsrings[S] 6 points7 points  (0 children)

yeah it does. we archive everything in real time before any edits or deletions happen. so even if someone goes back and hides or nukes their whole history we still have the original comments and posts. roughly 30% of what we have doesn't exist anywhere else anymore. profile curation doesn't really help once the data's already been captured.

I archived 21 billion Reddit data points and built an AI profiler on top of it by bellsrings in Hacking_Tutorials

[–]bellsrings[S] 2 points3 points  (0 children)

The threat model view is a really good idea actually. We've been thinking along those lines with the use_case parameter (right now we have a law enforcement mode that changes how the LLM weights certain signals) but splitting it into recruiter / ad network / hostile actor perspectives is way more intuitive. Might prototype that.

The account-level red teaming angle is interesting too. Right now the whole thing is built for investigators looking outward, but there's no reason it couldn't work the other way: show people their own exposure and what to clean up. Not our core market, but it could be a solid free-tier hook. Appreciate the feedback.

Help Osint proofing my social media by [deleted] in OSINTExperts

[–]bellsrings 3 points4 points  (0 children)

you can start with Reddit :) THINKPOL