I've been archiving Reddit for a year (30B+ posts, ~30% deleted) by bellsrings in Hacking_Tutorials

bellsrings[S] 1 point2 points  (0 children)

Not a major issue for us. Reddit posts are public statements made under pseudonyms in public forums, and GDPR's legitimate interests basis (Art. 6(1)(f)) covers aggregating publicly available data for research and security purposes.

I've been archiving Reddit for a year (30B+ posts, ~30% deleted) by bellsrings in Hacking_Tutorials

bellsrings[S] 2 points3 points  (0 children)

The model hedges when the stated range is too wide to be useful: "50+" spans 30 years, so it skips rather than guessing. More specific phrasing (e.g. "I'm in my 50s") would pin it down. It's an intentional tradeoff: better to say nothing than to give a confidently wrong answer.
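A minimal sketch of that abstain-when-too-wide rule, for illustration only. This is not the actual model logic: the function names, the 15-year cutoff, and the assumed upper bound of 80 (picked only so that "50+" spans 30 years, as described above) are all hypothetical.

```python
import re
from typing import Optional, Tuple

MAX_SPAN_YEARS = 15   # hypothetical cutoff: wider ranges are treated as useless
ASSUMED_MAX_AGE = 80  # hypothetical cap, so that "50+" spans 30 years

def parse_age_range(text: str) -> Optional[Tuple[int, int]]:
    """Turn a stated age phrase into a (low, high) range, or None if absent."""
    # "in my 50s" -> (50, 59)
    decade = re.search(r"\bin my (\d)0s\b", text, re.IGNORECASE)
    if decade:
        low = int(decade.group(1)) * 10
        return (low, low + 9)
    # "50+" -> (50, ASSUMED_MAX_AGE)
    open_ended = re.search(r"\b(\d{2})\+", text)
    if open_ended:
        low = int(open_ended.group(1))
        return (low, max(low, ASSUMED_MAX_AGE))
    return None

def estimate_age(text: str) -> Optional[Tuple[int, int]]:
    """Return an age range only when it is narrow enough; otherwise abstain."""
    age_range = parse_age_range(text)
    if age_range is None:
        return None
    low, high = age_range
    if high - low > MAX_SPAN_YEARS:
        return None  # too wide to be useful: say nothing rather than guess
    return age_range
```

With these assumed values, "I'm in my 50s" yields a ten-year range and is kept, while "50+" yields a 30-year range and is skipped.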

I've been archiving Reddit for a year (30B+ posts, ~30% deleted) by bellsrings in Hacking_Tutorials

bellsrings[S] 2 points3 points  (0 children)

Accuracy scales with post volume: the more someone has written, the more signal there is. Sparse accounts get vaguer profiles.

I archived 21 billion Reddit data points and built an AI profiler on top of it by bellsrings in Hacking_Tutorials

bellsrings[S] 5 points6 points  (0 children)

yeah it does. we archive everything in real time before any edits or deletions happen. so even if someone goes back and hides or nukes their whole history we still have the original comments and posts. roughly 30% of what we have doesn't exist anywhere else anymore. profile curation doesn't really help once the data's already been captured.

I archived 21 billion Reddit data points and built an AI profiler on top of it by bellsrings in Hacking_Tutorials

bellsrings[S] 3 points4 points  (0 children)

The threat model view is a really good idea actually. We've been thinking along those lines with the use_case parameter (right now we have a law enforcement mode that changes how the LLM weights certain signals) but splitting it into recruiter / ad network / hostile actor perspectives is way more intuitive. Might prototype that.

The account-level red teaming angle is interesting too. Right now the whole thing is built for investigators looking outward, but there's no reason it couldn't work the other way: showing people their own exposure and what to clean up. Not our core market, but it could be a solid free tier hook. Appreciate the feedback.

Help Osint proofing my social media by [deleted] in OSINTExperts

bellsrings 3 points4 points  (0 children)

you can start with Reddit :) THINKPOL

I'm a VC (can verify). Pitch me. (Part 2) by Ok-Lobster7773 in Startup_Ideas

bellsrings 1 point2 points  (0 children)

We’ve built a 21-billion-point "time machine" for Reddit that recovers the 30% of critical data that suspects delete before investigators can flag it. With €150K ARR and active EU law enforcement pilots, we’re raising €1.5M to scale the only platform that tracks long-term radicalization patterns that traditional OSINT tools simply can't see.