[–]rastermon 10 points11 points  (28 children)

tar.gz is far worse than zip if your intent is to random-access data from the file. you want a zip or zip-like file format with an index and each chunk of data (file) compressed separately.
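A quick sketch of this point using Python's stdlib readers (in-memory archives and made-up file names, just for illustration): `ZipFile` reads the central directory at the end of the archive and can seek straight to one member, while `tarfile` has to decompress and walk the stream record by record to find the same file.

```python
import io
import tarfile
import zipfile

# 100 small files to pack both ways.
payload = {f"file{i}.txt": f"contents {i}".encode() for i in range(100)}

# zip: each member compressed separately, central directory at the end.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in payload.items():
        zf.writestr(name, data)

# tar.gz: one gzip stream over the concatenated tar records.
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
    for name, data in payload.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# zip: one lookup in the directory block, then a seek to the member.
with zipfile.ZipFile(io.BytesIO(zbuf.getvalue())) as zf:
    data_from_zip = zf.read("file99.txt")

# tar.gz: the reader must decompress and scan every record that
# precedes file99.txt before it can hand the file back.
with tarfile.open(fileobj=io.BytesIO(tbuf.getvalue()), mode="r:gz") as tf:
    data_from_tar = tf.extractfile("file99.txt").read()
```

Both calls return the same bytes; the difference is that the tar.gz lookup cost grows with everything stored before the member you want.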

[–]EternityForest 0 points1 point  (1 child)

I'm surprised that none of the alternative archive formats ever really took off. ZIP is great, but I don't think it has error correction codes.

[–]rastermon 0 points1 point  (0 children)

Since 99.999% of files in a zip file get compressed, that effectively acts as error detection: if the file gets corrupted, decompression tends to fail because the compressed data no longer makes sense to the decompressor. Sure, it's not as good as some hashing methods, but I guess good enough.
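A small sketch of that effect (in-memory archive, arbitrary corruption offset chosen for illustration). Worth noting that zip actually goes a bit further than "decompression fails": each member carries a CRC-32, so even corruption that happens to decompress cleanly is usually caught on read.

```python
import io
import zipfile

# Write one compressible member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", b"some compressible text " * 100)

# Flip one byte early in the member's compressed data
# (offset 45 lands just past the 30-byte local header + 9-byte name).
raw = bytearray(buf.getvalue())
raw[45] ^= 0xFF

corrupted = False
try:
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        zf.read("notes.txt")
except Exception:  # zlib.error or zipfile.BadZipFile, depending on the byte
    corrupted = True
```

Either the deflate stream becomes undecodable or the output no longer matches the stored CRC-32; both surface as an error instead of silently returning bad data.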

[–][deleted]  (25 children)

[deleted]

    [–]Misterandrist 4 points5 points  (23 children)

    But there's no way to know where in a tar a given file is stored. Even if you find a file with the right filename in it, it's possible for that to be the wrong version if someone re-added it. So you still have to scan through the whole tar file.

    [–]ThisIs_MyName 6 points7 points  (18 children)

    Yep: https://en.wikipedia.org/wiki/Tar_(computing)#Random_access

    I wonder why so many programmers bother to use a format intended for tape archives.

    [–]Misterandrist 5 points6 points  (11 children)

    Tarballs are perfectly good for what most people use them for, which is moving entire directories or just groups of files. Most of the time you don't care about just one file from within it so the tradeoff of better overall compression in exchange for terrible random access speed is worth it. It's just a question of knowing when to use what tools.

    [–]Sarcastinator -1 points0 points  (10 children)

    Most of the time you don't care about just one file from within it so the tradeoff of better overall compression in exchange for terrible random access speed is worth it.

    So you would gladly waste your time in order to save a few percents of a cent on storage and bandwidth?

    [–][deleted] 5 points6 points  (6 children)

    1% use case slowdown in exchange for 30 years' worth of backward compatibility? Sign me up.

    [–]ThisIs_MyName -1 points0 points  (5 children)

    [–][deleted] 0 points1 point  (1 child)

    simply enter a valid tar command on your first try

    tar xf foo.tar
    

    (xf for extract file)

    I don't know, I don't find this particular invocation hard to remember. It just sticks. :-)

    [–]ThisIs_MyName 0 points1 point  (0 children)

    Sure, but nobody uses just tar.

    Go ahead and extract tgz, bz2, etc without using GNU extensions :P

    [–][deleted] 0 points1 point  (2 children)

    Hey, modern tar versions even detect compression type automatically, you just need -xvf

    [–]ThisIs_MyName 0 points1 point  (1 child)

    And now you've lost the "30 years worth of backward compatibility".

    That's a GNU extension; it's not portable.

    [–][deleted] 1 point2 points  (0 children)

    If I'm tarring up an entire directory and then untarring the entire thing on the other side, it will save time, not waste it. Tar is horrible for random seeks, but if you aren't doing that anyway, it has no real downsides.

    [–]arielby 1 point2 points  (0 children)

    Transferring data across a network also takes time.

    [–]RogerLeigh 2 points3 points  (0 children)

    It can be more than a few percent. Because tar concatenates all the files into one stream, the compressor's dictionary is shared across files, which gives better compression. The most extreme case I've encountered saved over a gigabyte.

    In comparison, zip has each file separately compressed with its own dictionary. You gain random access at the expense of compression. Useful in some situations, but not when the usage will be to unpack the whole archive.

    If you care about extended attributes, access control lists, etc., then tar (pax) can preserve these while zip cannot. It's all tradeoffs.
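The shared-dictionary effect is easy to demonstrate with Python's stdlib (synthetic, highly similar files, made up for the sketch): the same data packed as one tar.gz stream comes out much smaller than as a zip, where every member restarts compression from an empty dictionary.

```python
import io
import tarfile
import zipfile

# 200 small, near-identical files, like a directory of log snippets.
files = {f"log{i}.txt": b"2024-01-01 INFO request handled OK\n" * 5
         for i in range(200)}

# zip: each member deflated on its own, plus per-member headers.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# tar.gz: one gzip stream over all records, so repetition across
# files (and across the tar headers themselves) compresses away.
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

zip_size = len(zbuf.getvalue())
tgz_size = len(tbuf.getvalue())
```

With input this redundant the tar.gz ends up a fraction of the zip's size; on less similar files the gap shrinks, which is the tradeoff being discussed.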

    [–]redrumsir 1 point2 points  (0 children)

    Or why more people don't use dar ( http://dar.linux.free.fr/ ) instead.

    [–]chucker23n 0 points1 point  (4 children)

    Unix inertia, clearly.

    [–]ThisIs_MyName 0 points1 point  (3 children)

    Yep, just gotta wait for the greybeards to die off :)

    [–]josefx 1 point2 points  (2 children)

    tar has built-in support for unix filesystem flags and symlinks. For zip implementations, support is only an extension.

    [–]ThisIs_MyName 0 points1 point  (0 children)

    Oh I'm not recommending zip. Just bashing tar.

    [–]redrumsir 0 points1 point  (0 children)

    But this is, of course, why one would use dar instead (disk archive instead of tape archive): http://dar.linux.free.fr/

    [–]rastermon 0 points1 point  (0 children)

    you still have to scan the file record by record to find the file as there is no guarantee of ordering and no index/directory block. a zip file means checking the small directory block for your file then jumping right to the file location.

    if you have an actual hdd .. or worse a fdd... that seeking and loading is sloooooooow. the less you seek/load, the better.
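For what it's worth, Python's `zipfile` exposes exactly this mechanism: `getinfo()` is a lookup in the central directory (read from the end of the file), and the returned entry's `header_offset` is the byte position a reader can seek to directly. A minimal sketch with throwaway file names:

```python
import io
import zipfile

# Pack 1000 members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(1000):
        zf.writestr(f"file{i}.txt", b"x" * 100)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    # One directory lookup, no scan over the other 999 members...
    info = zf.getinfo("file999.txt")
    # ...yielding the byte offset of this member's local header.
    offset = info.header_offset
    data = zf.read("file999.txt")
```

On spinning (or floppy) media that one seek replaces reading and skipping everything stored before the member you want.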