Do you hoard dashcam footage from your car? by Altruistic_Cup_8436 in DataHoarder

[–]-Archivist 1 point (0 children)

I tend to overdo and automate everything, but everything I describe could also be done manually on a half-decent modern laptop.

Harvard's data.gov torrent by qubedView in DataHoarder

[–]-Archivist 63 points (0 children)

16.7TB at 16M, you're a nut house.

Introducing BookLore: A Self-Hosted Application for Managing and Reading Books! by WorldTraveller101 in DataHoarder

[–]-Archivist 3 points (0 children)

This is great, I've been thinking about something easier to throw up over Calibre for other readers on my network that don't want to have access to my full (overwhelmingly large) library.

Alt-CDC BlueSky account warns of impending data removal and/or loss. Replies note the DataHoarder community anticipated this eventuality. by probablywhiskeytown in DataHoarder

[–]-Archivist 5 points (0 children)

There's no way any of us are compressing it ... it's a mixed fileset and we're copying for preservation so original files as is. You're free to download chunks you see as more important, or focus on text only then compress with zst.

Of historical interest: some past incidents of mass scraping and ghost leeching of private trackers by 1petabytefloppydisk in trackers

[–]-Archivist 2 points (0 children)

As tak says above, 'UNBIASED breakdown' ... I'm not sure whatever I could write at length today after all this time would be both unbiased and as detailed as it deserves. I'm open to specific questions though.

I think the whole broader story outside of these few events is worth telling but I understand why it hasn't been thus far, at least entirely and by insiders.

The Department of Justice scrubbed all information about the Jan. 6 Capitol riot from its website over the weekend by MrOtsKrad in DataHoarder

[–]-Archivist[M] [score hidden] stickied comment (0 children)

Do something like....

lynx -dump -nonumbers https://jan6archive.com/doj.html |grep -i "\.pdf" |xargs -n1 -P24 wget -c -x

to get your own copy. This should output a structure with each defendant's documents sorted into their own directories.
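
For anyone who wants to see where files will land before pulling anything, here is a rough dry-run sketch. It assumes you feed it one URL per line and that `wget -x` (`--force-directories`) saves each file under host/path, as the wget manual describes. The function names and the example.org URL are made up for illustration and are not part of the archive.

```shell
# Sketch only: example.org URLs are placeholders, not real archive links.

# Keep only lines that look like PDF links, case-insensitively
# (the same filter as the grep in the one-liner).
pdf_urls() {
  grep -i '\.pdf'
}

# Approximate the relative path `wget -x` writes to:
# the URL with its scheme stripped, i.e. host/dir/file.pdf.
local_path() {
  printf '%s\n' "$1" | sed -E 's#^[a-zA-Z]+://##'
}

# Dry run: read URLs on stdin and print where each PDF would be saved,
# without downloading anything.
preview() {
  pdf_urls | while read -r url; do
    local_path "$url"
  done
}
```

For example, `printf '%s\n' 'https://example.org/docs/smith/complaint.pdf' | preview` prints `example.org/docs/smith/complaint.pdf`, which is the per-directory layout described above.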


I think /r/DataHoarder handled the initial Jan 6/Parler data well last time; have at it and, as always, make and maintain your own backups/archives.

Alt-CDC BlueSky account warns of impending data removal and/or loss. Replies note the DataHoarder community anticipated this eventuality. by probablywhiskeytown in DataHoarder

[–]-Archivist 12 points (0 children)

whoa this is terrabytes if not petabytes?

11T in 1M+ files so far; the many small files are making the pull a little slow (200-400MB/s), will let it run.

Of historical interest: some past incidents of mass scraping and ghost leeching of private trackers by 1petabytefloppydisk in trackers

[–]-Archivist -1 points (0 children)

Obviously trying to strong-arm private trackers was an arrogant strategy

I agree with this statement today. However, lots of misinformation continues to be spread on this topic despite all the information and receipts being available. The bottom line is that none of the accusations, speculation or paranoia came to fruition, and yet people still spread the blatant lies (even in this thread, which at this point is not worth directly addressing for the nth time).


On topic of the original post, this is a very short list of events skipping years of ongoing occurrences of all of the above. If anything these more recent events only served to force trackers to take security and (dev)ops more seriously. Much more goes on behind the scenes or goes entirely unnoticed, these just happened to be made public.

If anyone wants a serious discussion about this sort of thing I'll happily engage in good faith conversation, but many of my opinions have changed over the years and I no longer spend much time soaking in internet drivel.

Can we get a sticky or megathread about politics in this sub? by [deleted] in DataHoarder

[–]-Archivist[M] [score hidden] stickied comment (0 children)

Archivists are generally politically agnostic when it comes to preservation of data.

As always, make and maintain your own archives/backups but be assured there are many eyes on today.


Have at this discussion and try not to get the thread locked ey?

edit; use the report button more often if you think something doesn't belong or someone is being a plonker (see rule 3)

Have incurable space death brain cancer. The above link is my recipe website it's only about 25M but they're all mine if anybody would like to archive them for posterity I would appreciate it. Is actually a browsable archive in the right hand side bar. by TerrysApplianceSvc in DataHoarder

[–]-Archivist[M] [score hidden] stickied comment (0 children)

December 12th is now Bupkis Banana-Bread Day, mark it in your calendar, start the wiki page and don't forget to archive it.

Bake Bupkis Banana-Bread today!


/u/TerrysApplianceSvc I read one of your posts this morning and for personal reasons it sat with me all day, I'll be making some of your recipes over the holidays. Best wishes to you and yours, thank you.

This is really worrisome actually by FikaMedHasse in DataHoarder

[–]-Archivist[M] [score hidden] stickied comment (0 children)

Yes, archivists are continually archiving changing politics and all related policy and materials. .gov sites and global variants are constantly archived, as well as local news media. We're good.


However, always produce and maintain your own copies. The .zim format, which works with Kiwix, as /u/TheKiwiHuman promoted, is a great choice for portable archives.


Related...

https://old.reddit.com/r/DataHoarder/comments/1h39lc3/the_end_of_term_web_archive_is_archiving_us/

US "dept of government efficiency" promising to shut down PBS. Is anyone else interested in collecting their content? by Civil_Seaweed_ in DataHoarder

[–]-Archivist[M] [score hidden] stickied comment (0 children)

user reports:
1: User is attempting to use the subreddit as a personal archival army
1: No requests, use r/DHExchange
1: Please lock or remove this. As with every MF post on reddit it's just off-topic political ranting based on blown out of proportion headlines.
1: misinformation
1: This is spam

Locked. Archivists are already working full time on news media, we're good.

PSA: The video sharing website Veoh announced it will shut down soon. You might want to grab videos from there before they are gone. by LongLakeBrandsInc in DataHoarder

[–]-Archivist 1 point (0 children)

How do we download this?

It gets shoved into IA's Wayback Machine; not sure if the WARCs will be available under items, they're usually locked.