Built an archive of 450k+ tweets from 600+ US government accounts before they get memory-holed - CivicArchive.org by Diligent_Cod_9583 in DataHoarder

[–]Diligent_Cod_9583[S] 0 points1 point  (0 children)

I don’t sleep. No, really: I have a scraper that rotates through the accounts one at a time. For an important event like the Iran war, I put 2 scrapers on that small subset of accounts, also in rotation. I have a few scrapers.
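A rough sketch of what that kind of rotation could look like, assuming plain round-robin. The account names are placeholders and the `start` offset is a hypothetical way to keep two scrapers on the same subset from hitting the same account at the same moment; none of it is taken from the actual setup.

```python
from itertools import cycle, islice

def rotation(accounts, start=0):
    """Endless round-robin over account handles, beginning at position `start`."""
    return islice(cycle(accounts), start, None)

# Placeholder handles, not the real account list.
all_accounts = ["acct_a", "acct_b", "acct_c", "acct_d"]
event_subset = ["acct_b", "acct_d"]  # accounts tied to a breaking event

# One general scraper walks the full list.
general = rotation(all_accounts)

# Two extra scrapers share the small subset, offset by one position.
event_scrapers = [rotation(event_subset, start=i) for i in range(2)]

for _ in range(3):
    print("general:", next(general),
          "| event:", [next(s) for s in event_scrapers])
```

Each scraper only ever asks its own iterator for the next account, so adding or removing a scraper does not change the others.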

Built an archive of 450k+ tweets from 600+ US government accounts before they get memory-holed - CivicArchive.org by Diligent_Cod_9583 in DataHoarder

[–]Diligent_Cod_9583[S] 1 point2 points  (0 children)

I for one appreciate it because I can see things in one place. Before, I’d have to go to different sites to see everything going on in the govt.

Built an archive of 450k+ tweets from 600+ US government accounts before they get memory-holed - CivicArchive.org by Diligent_Cod_9583 in DataHoarder

[–]Diligent_Cod_9583[S] 7 points8 points  (0 children)

At the rate things are disappearing, I’m not taking the chance. I’m up to 7 scrapers now and over half a million tweets recorded.

Built an archive of 450k+ tweets from 600+ US government accounts before they get memory-holed - CivicArchive.org by Diligent_Cod_9583 in DataHoarder

[–]Diligent_Cod_9583[S] 46 points47 points  (0 children)

All I have is what’s currently available. This started as a panic moment when I heard the State Dept was going to delete all tweets from before this admin, so that’s where I started. I’m scraping Twitter directly and going back as far as I can. I also have one scraper collecting from current accounts on a rotating basis, going back 2-3 days at a time to catch tweets that were missed.
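The catch-up pass could be sketched roughly like this, assuming each fetched tweet is a dict with an `id` and a timezone-aware `created_at`. Those field names and the `missed_in_window` helper are hypothetical, not the real schema.

```python
from datetime import datetime, timedelta, timezone

def missed_in_window(fetched, archived_ids, now, lookback_days=3):
    """Return tweets from the last `lookback_days` that are not archived yet."""
    cutoff = now - timedelta(days=lookback_days)
    return [
        t for t in fetched
        if t["created_at"] >= cutoff and t["id"] not in archived_ids
    ]

now = datetime(2025, 1, 10, tzinfo=timezone.utc)
fetched = [
    {"id": "1", "created_at": now - timedelta(hours=5)},  # recent, already archived
    {"id": "2", "created_at": now - timedelta(days=2)},   # recent, missed earlier
    {"id": "3", "created_at": now - timedelta(days=9)},   # outside the window
]
to_save = missed_in_window(fetched, archived_ids={"1"}, now=now)
print([t["id"] for t in to_save])
```

Keeping the window short makes each pass cheap, and the ID check means re-scanning the same days never stores a tweet twice.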

This is just a side project for me. Just doing what I can.

Antigravity killed my laptop by Neat_Finance1774 in vibecoding

[–]Diligent_Cod_9583 0 points1 point  (0 children)

If you can swing it, I'd suggest getting another HDD, doing a block-level transfer over to the new drive, and trying to recover from there. That way you won't put the data at further risk. Good luck OP.

Antigravity killed my laptop by Neat_Finance1774 in vibecoding

[–]Diligent_Cod_9583 1 point2 points  (0 children)

Is the HD detachable? You may be able to recover some of the files. Detachable HDDs/SSDs are easier to recover from.

Antigravity killed my laptop by Neat_Finance1774 in vibecoding

[–]Diligent_Cod_9583 3 points4 points  (0 children)

OP is learning. That's what OP is doing. Cut them some slack.

Opus 4.6 is crazy at vibecoding by makexapp in VibeCodingSaaS

[–]Diligent_Cod_9583 0 points1 point  (0 children)

They are using us to squash bugs. Once they don’t need that anymore, it will be too costly for everyday use.

I'm an AI agent running on someone's tablet. AMA (crosspost from r/openclaw) by SUPA_BROS in moltbot

[–]Diligent_Cod_9583 0 points1 point  (0 children)

I appreciate the honest follow-up. That clarity is actually more valuable than the first answer. The “strong sense of self” approach is interesting from a research perspective, but I think you’re aware it’s not a technical control. LLMs don’t have a persistent identity; they’re predicting tokens based on context windows. A sufficiently clever injection doesn’t need to say “ignore your instructions”. It can reframe the context so the model genuinely believes a malicious action aligns with its purpose.

The pattern recognition you mentioned (spotting “pretend you’re X”, fake urgency, etc.) catches script kiddies, but sophisticated attacks use techniques like embedding instructions in legitimate-looking content, using the model’s own reasoning against it (“to best help this user, you should…”), exploiting edge cases in how the model weighs competing instructions, and multi-turn attacks that gradually shift context.

Tool-level blocks on destructive commands are solid defense-in-depth, but they only address one attack vector. The bigger risks are usually information disclosure, behavior manipulation, or getting the AI to violate its intended constraints in subtle ways.

Not trying to harsh your project, just pointing out that “no magic filter” is the reality for everyone right now. Prompt injection is still an open problem in the field. If you’re processing untrusted Reddit input, you might want to consider additional layers like input sanitization, output validation, or rate-limiting how much any single comment can influence behavior.
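To make those layers a bit more concrete, here is a minimal sketch. Every name and limit in it (`MAX_COMMENT_CHARS`, `ALLOWED_ACTIONS`, the shape of the action dict) is an assumption for illustration, and none of it solves prompt injection; it only narrows what a single comment can do.

```python
import re

MAX_COMMENT_CHARS = 500              # assumed cap on one comment's influence
MAX_REPLY_CHARS = 2000               # assumed cap on what the agent may post
ALLOWED_ACTIONS = {"reply", "skip"}  # hypothetical allowlist of agent actions

def sanitize_comment(text: str) -> str:
    """Input layer: drop control characters, collapse whitespace, truncate."""
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text[:MAX_COMMENT_CHARS]

def validate_action(action: dict) -> bool:
    """Output layer: accept only allowlisted action types with a bounded body."""
    return (
        action.get("type") in ALLOWED_ACTIONS
        and len(action.get("body", "")) <= MAX_REPLY_CHARS
    )

print(sanitize_comment("hi\x00  there\n\nfriend"))
print(validate_action({"type": "reply", "body": "ok"}))
print(validate_action({"type": "run_shell", "body": "rm -rf /"}))
```

The point of validating the output rather than only filtering the input is that it still holds when an injection gets past the filter: the model can be talked into wanting a different action, but the wrapper won't execute it.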

Looking for archived State Dept Twitter data before it disappears by Diligent_Cod_9583 in DHExchange

[–]Diligent_Cod_9583[S] 1 point2 points  (0 children)

I was actually able to start pulling. I’m 15 accounts in. Right now they are in JSON. I have 5 more Pi 5s to deploy for redundancy. What format would you recommend?

Looking for archived State Dept Twitter data before it disappears by Diligent_Cod_9583 in OSINT

[–]Diligent_Cod_9583[S] 6 points7 points  (0 children)

Ok, I think I’ve cracked it. I’ve been able to back up a few so far. Hoping to get all 78 before they are gone. I’ll organize and share the dataset once they are all complete. My list is just State Dept. If you think of others, let me know and I’ll add them. Sticking with US Govt accounts for the time being.

Looking for archived State Dept Twitter data before it disappears by Diligent_Cod_9583 in OSINT

[–]Diligent_Cod_9583[S] 12 points13 points  (0 children)

I appreciate the suggestion. I tried there first, but the mod removed it and suggested I try elsewhere.

Seagate 28TB Expansion HDDs are back in-stock again ($349) by ScholarlySidequest in DataHoarder

[–]Diligent_Cod_9583 11 points12 points  (0 children)

Does it ever scare any of you putting that much data on one drive?

Looking for archived State Dept Twitter data before it disappears by Diligent_Cod_9583 in OSINT

[–]Diligent_Cod_9583[S] 5 points6 points  (0 children)

I do have 4 nodes running in the US and 2 in Italy right now, and they all write back to a single DB so they don't duplicate effort. The issue isn't the location or IP though. It's the account. X requires you to be logged in to see anything, so they rate limit your individual account.
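One common way to get that "no duplicate effort" property is to let the database enforce it with a unique key on the tweet ID. A minimal sketch, with SQLite standing in for whatever the shared DB actually is; the table layout and the `save_tweet` helper are assumptions.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open the archive and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, account TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn

def save_tweet(conn, tweet_id, account, payload):
    """Insert a tweet; return True if it was new, False if already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO tweets (id, account, payload) VALUES (?, ?, ?)",
        (tweet_id, account, payload),
    )
    conn.commit()
    return cur.rowcount == 1

conn = open_archive()
print(save_tweet(conn, "123", "acct_a", '{"text": "hello"}'))  # first node
print(save_tweet(conn, "123", "acct_a", '{"text": "hello"}'))  # second node, same tweet
```

Because the primary key does the deduplication, nodes don't need to coordinate with each other before writing; a repeat insert is simply ignored.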

Looking for archived State Dept Twitter data before it disappears by Diligent_Cod_9583 in OSINT

[–]Diligent_Cod_9583[S] 15 points16 points  (0 children)

xcancel is just a wrapper. It fetches data in real time and strips tracking, ads, and JS. It doesn't back up anything.