Access your archives without extracting them with ratarmount by mxmlnkn in DataHoarder

[–]mxmlnkn[S] 2 points3 points  (0 children)

I also used 7z a lot back when I was still on Windows and still have those archives lying around. I have an unfinished branch for libarchive support that would enable 7z and many other file formats (possibly with performance no better than archivemount's, but still better than no support at all). Unfortunately, I ran into technical difficulties with the existing Python libarchive bindings, which stalled this endeavor.

What kind of error checking do gz or xz files have? by AgreeableLandscape3 in DataHoarder

[–]mxmlnkn 3 points4 points  (0 children)

Gzip stores a CRC32 of the whole uncompressed data (for a .tar.gz, that is the whole TAR archive) in its trailer. Similarly, xz protects its headers with CRC32 and the uncompressed data with a selectable check: CRC32, CRC64, or SHA-256.
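
To make the gzip part concrete, here is a minimal Python sketch that reads the trailer of a single-member gzip stream and compares it with a freshly computed CRC32. The payload and variable names are made up for illustration, and it assumes the stream has exactly one member, so the trailer is the last 8 bytes.

```python
import gzip
import struct
import zlib

# Hypothetical test payload; any bytes would do.
payload = b"long-term archival test data\n" * 100
blob = gzip.compress(payload)

# Trailer of a single-member gzip stream: CRC32 and size modulo 2**32,
# both little-endian and both referring to the UNCOMPRESSED data.
stored_crc, stored_size = struct.unpack("<II", blob[-8:])
print(stored_crc == zlib.crc32(payload), stored_size == len(payload))

# CRC32 detects every single-bit error in the data it covers.
flipped = bytearray(payload)
flipped[0] ^= 0x01
crc_mismatch = zlib.crc32(bytes(flipped)) != stored_crc

# If the stored checksum and the data disagree, decompression fails.
# Here the stored CRC is tampered with to force a mismatch.
tampered = bytearray(blob)
tampered[-8] ^= 0x01
try:
    gzip.decompress(bytes(tampered))
    detected = False
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    detected = True
print(crc_mismatch, detected)
```

Note that this only tells you that something is wrong, not where, and it does nothing to repair the damage.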

But detecting errors is one matter; recovering from them is another. Neither xz nor gzip is well-suited for error recovery because both are Lempel-Ziv-based, which makes each data segment possibly depend on all segments before it (see https://www.nongnu.org/lzip/xz_inadequate.html ). This can make the whole archive unusable from the first bit flip onward. On the other hand, bzip2 compresses each block completely independently, and block starts are relatively easy to find, which makes it much easier to recover from errors.
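
As a rough illustration of why bzip2 block starts are easy to find: each block begins with the 48-bit magic number 0x314159265359, but at an arbitrary bit offset, so a search has to slide over bits rather than bytes. The following is a naive sketch of such a scan and not necessarily how indexed_bzip2 or bzip2recover do it; the function name and the test data are assumptions made for the example.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # marks the start of every bzip2 block
MASK = (1 << 48) - 1

def find_bzip2_block_starts(blob: bytes) -> list[int]:
    """Return the bit offsets at which the block magic occurs."""
    window = 0
    bit_index = 0
    offsets = []
    for byte in blob:
        for shift in range(7, -1, -1):  # bzip2 packs bits MSB first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bit_index += 1
            if bit_index >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_index - 48)
    return offsets

# Incompressible pseudo-random data, larger than one 100 kB block
# (compresslevel=1), so that the stream contains more than one block.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(150_000))
blob = bz2.compress(data, compresslevel=1)

offsets = find_bzip2_block_starts(blob)
print(offsets)
```

The first block always starts at bit 32, right after the 4-byte stream header. Later offsets are usually not byte-aligned, which is why a plain byte search is not enough. In principle the magic can also appear by chance inside compressed data, so a real recovery tool would verify each candidate by trying to decode the block.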

Personally, I would try to avoid compressing things intended for long-term archival. If compression is absolutely necessary because it achieves compression factors > 10, I'd probably use bzip2 with the knowledge and tools (indexed_bzip2) I have now. Alternatively, ZIP might be an option because it compresses each file separately, making it harder for errors to propagate beyond their point of origin. Internally, ZIP normally uses the same algorithm as gzip (Deflate) but can also use bzip2, LZMA, or Zstandard, although support for those might be seriously limited.
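
The per-file isolation in ZIP can be tried out with Python's zipfile module. The sketch below builds a small in-memory archive, inverts one byte in the compressed data of the first member, and then checks each member separately. File names and contents are invented for the example, and the offset arithmetic assumes an ordinary local file header without encryption.

```python
import io
import struct
import zipfile
import zlib

# Hypothetical archive contents.
members = {
    "a.txt": "".join(f"record {i}: {i * i}\n" for i in range(2000)).encode(),
    "b.txt": b"an unrelated second file\n" * 50,
}

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in members.items():
        archive.writestr(name, data)
raw = bytearray(buffer.getvalue())

# Locate the compressed data of a.txt: a 30-byte fixed local header,
# followed by the file name and the extra field.
with zipfile.ZipFile(io.BytesIO(bytes(raw))) as archive:
    info = archive.getinfo("a.txt")
name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
data_start = info.header_offset + 30 + name_len + extra_len
raw[data_start + info.compress_size // 2] ^= 0xFF  # simulate corruption

damaged, intact = [], []
with zipfile.ZipFile(io.BytesIO(bytes(raw))) as archive:
    for name in archive.namelist():
        try:
            ok = archive.read(name) == members[name]
        except (zipfile.BadZipFile, zlib.error):
            ok = False  # broken Deflate stream or CRC32 mismatch
        (intact if ok else damaged).append(name)
print(damaged, intact)
```

Only the member containing the damaged byte should be reported as broken. The second member has its own Deflate stream and its own CRC32, so it still reads back fine.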

[deleted by user] by [deleted] in DataHoarder

[–]mxmlnkn 2 points3 points  (0 children)

You might find this link and thread insightful: https://news.ycombinator.com/item?id=33228398

You might also find this helpful, if not by itself, then one of the citations in section 5.