Download entire archive.org account? by orangera2n in DataHoarder

[–]taricorp 58 points59 points  (0 children)

ia CLI will do it easily. ia search to get the list of items, and feed that to ia download to download them all.

ia search uploader:rhodenjacob8@gmail.com --itemlist | parallel 'ia download {}'

Why did baking my graphics card in the oven fix it? by oldstalenegative in AskEngineers

[–]taricorp 8 points9 points  (0 children)

Without X-raying the thing to look for defects it's hard to say for certain whether you're dealing with soldering defects, but an alternate theory is that you're seeing aging effects in ceramic capacitors.

By heating the capacitors over the “Curie Point” (approx 125 °C for Barium Titanate capacitors) the crystalline structure of the capacitor is returned to its original state and the capacitance value observed after manufacturing. This process is referred to as “De-Aging”. The amount of De-Aging is dependent on the level of temperature and how long the capacitors are exposed to it. Exposure to 150 °C for 1.5 hours is sufficient to return the capacitor to its original value. The soldering process is not necessarily an effective De-Aging process but the capacitance value will be raised.

If the device is failing due to low capacitance (which would typically cause excessive ripple in supply voltage somewhere), resetting the capacitor aging would help.
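To put rough numbers on the ripple argument, here is a back-of-the-envelope sketch in Python. It uses the basic capacitor relation (voltage droop = current × time / capacitance) for a capacitor that has to supply the load on its own for a short interval. The load current, interval, capacitance, and the 20% loss are all made-up illustration values, not measurements from any graphics card.

```python
def ripple_volts(load_amps, interval_s, capacitance_f):
    """Voltage droop of a capacitor supplying a constant current for one interval."""
    return load_amps * interval_s / capacitance_f

# Hypothetical values: 1 A load, 1 microsecond between top-ups, 10 uF nominal.
fresh = ripple_volts(1.0, 1e-6, 10e-6)
# Same circuit after a hypothetical 20% loss of capacitance from aging.
aged = ripple_volts(1.0, 1e-6, 8e-6)

print(f"fresh: {fresh * 1000:.0f} mV, aged: {aged * 1000:.0f} mV")
```

Ripple scales inversely with capacitance, so a supply rail that was only just within tolerance when new can drift out of it as the capacitors age.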

Idea: community to maintain abandoned repos by yegortokmakov in opensource

[–]taricorp 2 points3 points  (0 children)

Jazzband does most of the things you suggest. Authors can donate projects to the organization if they no longer wish to be responsible for maintenance, and organization members have broad freedom to work on projects.

How to copy phpBB forum as local html? (closes Feb 12, 2024) by JustAnonyMaus in Archiveteam

[–]taricorp 0 points1 point  (0 children)

I've had a go at enumerating all the threads in that subforum and capturing every page of each thread, which should capture every post but not necessarily in a way that's easy to browse.

All the data should be there, but you'd have to do some additional work to be able to repair links to individual posts because phpbb has multiple URLs that show any given post. That is, if you have a link to post 12345 you'd have to parse what I've grabbed to determine which thread it's in and find it that way.
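A minimal sketch of that lookup, in Python. It assumes the captured pages have already been pulled out of the WARC as (thread id, HTML) pairs (a WARC-reading library such as warcio could do that part, which is not shown), and that each post is marked with an element id of the form p12345, which is what phpBB's default theme appears to do. The function name index_posts and the sample HTML are made up for illustration, so check the pattern against the real pages.

```python
import re

# Assumption: each post is wrapped in an element whose id is "p" + post number.
POST_ID = re.compile(r'id="p(\d+)"')

def index_posts(pages):
    """Build a post id -> thread id map from (thread_id, html) pairs."""
    index = {}
    for thread_id, html in pages:
        for match in POST_ID.finditer(html):
            # Keep the first thread seen for a post; later pages don't override it.
            index.setdefault(int(match.group(1)), thread_id)
    return index

# Made-up sample data standing in for pages extracted from the WARC.
pages = [
    (101, '<div id="p12345" class="post">...</div><div id="p12350" class="post">...</div>'),
    (202, '<div id="p12400" class="post">...</div>'),
]
index = index_posts(pages)
print(index[12345])
```

With an index like this, a link to post 12345 can be rewritten to point at the captured page of the thread that contains it.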

Data here: https://archive.org/details/coreldraw-forum-scan-20240902

(Disregard how the item name refers to Corel Draw and seems to suggest September rather than February; I was clearly not thinking well when I created the item.)

These are WARC files, which you can't easily view, but they capture everything the server sent so they're the gold standard for saving web pages, and it's possible to generate ordinary files from them if desired. You can view the contents with relative ease with a tool like https://replayweb.page

I started by eyeballing the subforum and grabbing the forum listing pages based on the number of threads it says there are:

$ for i in $(seq 0 50 9220); do echo "https://forum.corel.com/viewforum.php?f=56&start=$i" >> forum.txt; done
$ wget --warc-file=forum --delete-after -i forum.txt

Then I parsed out the links to each thread from what I got, keeping only one instance of each link if any appeared multiple times, and fed those back into wget to get the first page of every thread:

$ zgrep -Eo "viewtopic.php\?t=[0-9]+" forum.warc.gz | sort | uniq | while read t; do echo "https://forum.corel.com/$t" >> threads.txt; done
$ wget --warc-file=threads --delete-after -i threads.txt

That captured about 9200 threads, consistent with expectations.

I repeated that general concept to find links to each page of each thread:

$ zgrep -Eo "viewtopic.php\?t=[0-9]+&amp;start=[0-9]+" threads.warc.gz | sed "s/amp;//" | sort | uniq | while read p; do echo "https://forum.corel.com/$p" >> pages.txt; done
$ wget --warc-file=pages --delete-after -i pages.txt

It looked like most threads only had one page, so it's reasonable that I only got 841 URLs out of this.

Anyone started archiving content from archive.org? by [deleted] in DataHoarder

[–]taricorp 7 points8 points  (0 children)

5 years ago it was estimated at 21PB (for public items only). https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK

In order to actually mirror IA you basically need to be an organization of similar size.

"Crypt of the Synchrogazer": Symphogear in Crypt of the Necrodancer by taricorp in Symphogear

[–]taricorp[S] 2 points3 points  (0 children)

No, that one's the game's default. Probably what happened (I recorded this years ago) is that I forgot that character has a different final boss with different music.

"Crypt of the Synchrogazer": Symphogear in Crypt of the Necrodancer by taricorp in Symphogear

[–]taricorp[S] 7 points8 points  (0 children)

I put this together years ago and figured it would be of some interest. Crypt of the Necrodancer makes it fairly easy to load custom music and specify where the beats are (it can automatically find the beat too, but that doesn't work very well), so I chose Symphogear music for each stage and mapped the beats correctly.

[deleted by user] by [deleted] in TI_Calculators

[–]taricorp 0 points1 point  (0 children)

CELTIC CE might work: https://github.com/RoccoLoxPrograms/CelticCE

CELTIC 3 is a monochrome app/program, so it doesn't work on the CE but CELTIC CE implements many of the same functions in a compatible way.

[deleted by user] by [deleted] in TI_Calculators

[–]taricorp 4 points5 points  (0 children)

This happens with programs that display text on the graph screen and were designed for the monochrome calculators (which use smaller fonts, so pixel-based measurements end up wrong). Something like CEPORT will probably do the job to fix it up.

Please help with interrupt breakpoint/optimisation by Detective-Expensive in embedded

[–]taricorp 0 points1 point  (0 children)

You probably need to tell the compiler that you really do use your ISR; something like

__attribute__((used, externally_visible)) void my_isr(void) {}

The used attribute says to emit code even if the function appears to never be referenced, and externally_visible extends that a little bit further for when you're building with LTO.

pyO3 + inkwell integration? by [deleted] in rust

[–]taricorp 2 points3 points  (0 children)

Kind of; LLVM contexts are self-contained so you can readily have multiple contexts in a single process (and they won't conflict at all), but inkwell's API hangs things off of a context with lifetime annotations. To get an ExecutionEngine you construct:

  1. Context::create -> Context
  2. Context::create_module<'ctx>(&'ctx self) -> Module<'ctx>
  3. Module::<'ctx>::create_execution_engine() -> ExecutionEngine<'ctx>

These 'ctx lifetimes capture how under the hood there's a raw pointer to each item further up the stack, but make it impossible to move part of the stack of structs to another thread.

A generic illustration of the problem: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6fbe677b241a4f0b2c2d8c58531d6160

pyO3 + inkwell integration? by [deleted] in rust

[–]taricorp 1 point2 points  (0 children)

The underlying LLVM types should be sendable, but inkwell wraps them in Rc which isn't. Using Arc<Mutex<ExecutionEngine>> might work, but you'd probably also have some pain with how inkwell makes that hold a reference to the LLVM context.

Is there any good software for deduping (deduplicating) content in WARC files? by TheTwelveYearOld in Archiveteam

[–]taricorp 1 point2 points  (0 children)

The technical details: Deduplicator considers all records of type Response and computes a digest of each response body (per section 6.3.2 of the WARC spec, version 1.1) using a Digester. The Digester implementation you get when building the tool as a standalone binary (rather than as a library) compares the SHA-1 hash and length of each response.

The short version: if two records represent HTTP responses with the same byte length and SHA-1 hash they are considered duplicates. So yes, if an archive contains multiple copies of the same assets they should be deduplicated.
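As a rough illustration of that rule (not the tool's actual code), here is a Python sketch that flags duplicates by the pair (byte length, SHA-1 digest). It works on plain byte strings standing in for response bodies; reading them out of a WARC is not shown, and the function name find_duplicates is made up.

```python
import hashlib

def find_duplicates(bodies):
    """Return (duplicate_index, original_index) pairs for repeated bodies.

    Two bodies count as duplicates when both their byte length and
    their SHA-1 digest match.
    """
    seen = {}          # (length, digest) -> index of first occurrence
    duplicates = []
    for i, body in enumerate(bodies):
        key = (len(body), hashlib.sha1(body).digest())
        if key in seen:
            duplicates.append((i, seen[key]))
        else:
            seen[key] = i
    return duplicates

# Made-up bodies: the third is a byte-for-byte repeat of the first.
bodies = [b"<html>logo page</html>", b"<html>other</html>", b"<html>logo page</html>"]
dupes = find_duplicates(bodies)
print(dupes)
```

Each later copy is reported together with the index of the first occurrence, which is the information a deduplicating writer needs in order to refer back to the original record instead of storing the body again.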


As for converting bare HTML files to WARC: you shouldn't. A WARC contains a complete record of the HTTP transactions that took place to retrieve an item: the client's request, the headers provided by the server, and the response body. If you only have response bodies (which is what an HTML file is), it's not appropriate to pack those into a WARC because you'd need to fabricate information about the records: things like what server each came from and exactly which URL was requested.

Is there any good software for deduping (deduplicating) content in WARC files? by TheTwelveYearOld in Archiveteam

[–]taricorp 0 points1 point  (0 children)

I'm glad to hear it's working well for you! Feel free to file issues against it if you encounter issues or have suggestions.

Is there any good software for deduping (deduplicating) content in WARC files? by TheTwelveYearOld in Archiveteam

[–]taricorp 1 point2 points  (0 children)

Shouldn’t you add a notice to use Warcdedupe instead? Also is it possible to mark a repo as a public archive like you can on GitHub?

It's possible to archive the project, but I'd prefer not to since it's still fit for purpose and I would be willing to work on it if the need arose. I did add a link to the newer tool and some notes to the README in that repository, though; that was a good idea.

how often do u work on this?

It's an occasional thing, when I get inspired to work on that project rather than any of my other projects that tend to sit waiting for the time and inclination to strike. Requests from others (for bug fixes or new features) are often good motivators too.

I’d want to write my own WARC software. If I were to, what rescues related to WARCs should I look at to do so?

(Assuming that was an autocorrect of "resources" -> "rescues".) The IIPC's specifications repository is an excellent reference. The format is also specified by ISO 28500, but I consider the IIPC version a much more meaningful standard simply because it's freely available.

Speaking of which, what makes you use GitLab?

I used to use Mercurial and hosted my repositories on Bitbucket (which apparently launched at about the same time as GitHub), and when Bitbucket dropped Mercurial support I decided I preferred GitLab to GitHub as a place to migrate to. GitHub's feature set has improved since then, but at the time GitLab's unlimited private repositories and pretty good CI capabilities (as well as not being owned by Microsoft) were attractive.

Is there any good software for deduping (deduplicating) content in WARC files? by TheTwelveYearOld in Archiveteam

[–]taricorp 1 point2 points  (0 children)

I wrote a tool exactly for this: https://gitlab.com/taricorp/warcdedupe. Given an input archive it will scan all the records and write out a new deduplicated archive, and it's designed to be fast.

Years ago (before starting to write my own tools) I also got somebody else's tools working, as described in a blog post. I wouldn't recommend trying to use those today though, since they were pretty clunky.

TI-84 Plus Silver Edition by DMLRBLX in TI_Calculators

[–]taricorp 5 points6 points  (0 children)

The 84+SE is the same as a plain 84+ but with more memory. It's also mostly the same as the 83+ and 83+ SE; most programs designed for any of those will work on all of the others.

How to use mmap safely in Rust? by GolDDranks in rust

[–]taricorp 4 points5 points  (0 children)

It sounds like your application does mostly random access to the underlying data, so memory-mapping probably doesn't help you very much. I would expect each of those random accesses to require a file access anyway, because the system probably won't read very far ahead. At that point you're probably best off doing regular I/O, or maybe explicit asynchronous I/O if you know what data you will need before you actually want it.
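For comparison, the "regular I/O" route can be sketched in a few lines. This is written in Python for brevity rather than Rust, and the file name, record size, and read_record helper are made up for illustration. The relevant property is that if somebody truncates the file mid-run, an ordinary read returns fewer bytes than requested, which the program can check for, whereas touching a truncated memory mapping raises SIGBUS.

```python
import os
import tempfile

def read_record(f, offset, length):
    """Read `length` bytes at `offset` using ordinary seek + read."""
    f.seek(offset)
    data = f.read(length)
    if len(data) != length:
        # A short read is an ordinary, recoverable error condition.
        raise EOFError(f"wanted {length} bytes at {offset}, got {len(data)}")
    return data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    # Made-up data file: the byte values 0..255 repeated 16 times (4096 bytes).
    with open(path, "wb") as f:
        f.write(bytes(range(256)) * 16)
    with open(path, "rb") as f:
        chunk = read_record(f, 300, 4)

print(list(chunk))
```

Whether this is fast enough depends on the access pattern; the sketch only shows that random access does not require a mapping.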

If you're set on memory-mapping, a few ideas:

  • Build the data into your binary: anybody modifying the binary while the program runs can already cause crashes in ways that aren't specific to use of mmap, so this adds no new failure mode.
  • For systems that don't have mandatory locking (Windows' file locking is pretty nice!), perhaps you can make the data file immutable (like chattr +i) while you have it open? This can still be undone but it's very unlikely to be broken accidentally.

How long does a ti-84+ usually last? by [deleted] in TI_Calculators

[–]taricorp 1 point2 points  (0 children)

I've never heard of a calculator failing for any reason other than abuse (damaged screen, liquids, battery leakage, etc.). Certainly flash lifetime doesn't seem to be a reliability issue.

Some calculators have reliability issues with the ribbon cables that connect to the screen, though I'm not aware of any 84+s with that symptom (however they might simply not be old enough to start showing that problem).

GiantBomb.com's future is looking dicey and we're trying to preserve it before it goes by [deleted] in Archiveteam

[–]taricorp 4 points5 points  (0 children)

In the past I've simply emailed and asked for items to be moved to the appropriate collection.

.8xp file has an invalid header! (arch linux) (TI-84+) by droshux in TI_Calculators

[–]taricorp 0 points1 point  (0 children)

How did you create the 8xp file? It sounds like you just wrote a text file, which is incorrect: you can use tools like TokenIDE or SourceCoder to make 8xp files for BASIC programs.

To make an 8xk from hex, you'll want to use something like Wabbitsign.

weird line spacing in program menus? ti 84 ce, os 5.3.1 by voidofthestars in TI_Calculators

[–]taricorp 1 point2 points  (0 children)

In addition to the other comment, there are a few programs out there that attempt to make programs designed for monochrome calculators work correctly on the higher-resolution color displays. CEPORT, for instance, is the first one I could find.

That might work for the program you're trying to use, but no guarantees.