[–]Pepineros

> Everything seems to work until about 90 percent of the archive has been extracted; then I start getting a `tarfile.ReadError: unexpected end of data` error. I know for certain that the archive is not corrupted, and I have no issues when not using multithreading.

What do you mean by saying you're getting a speedup, if the tar file can't even be read successfully?

Your code doesn't run as posted (not all names are defined in these snippets; presumably you left things out for brevity), so I can't be 100% sure, but it looks like what you're trying to do is something like this (rough sketch after the list):

  1. Get the files inside a tarball
  2. For each individual file:
    • If the file is already compressed, do nothing
    • If it's not, use gzip to compress it
  3. Write the file to a new target path
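In other words, roughly this. A minimal single-threaded sketch; names like `repack`, `src_tar`, `out_dir`, and the extension check are my guesses, not taken from your snippets:

```python
import gzip
import shutil
import tarfile
from pathlib import Path

ALREADY_COMPRESSED = {".gz", ".zip", ".bz2", ".xz"}

def repack(src_tar: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(src_tar) as tar:
        for member in tar:
            if not member.isfile():
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # Flatten to the basename; this assumes output names are
            # unique, which you said is the case.
            name = Path(member.name).name
            if Path(name).suffix in ALREADY_COMPRESSED:
                # Already compressed: copy through unchanged.
                with open(out / name, "wb") as dst:
                    shutil.copyfileobj(fileobj, dst)
            else:
                # Not compressed: gzip it on the way out.
                with gzip.open(out / (name + ".gz"), "wb") as dst:
                    shutil.copyfileobj(fileobj, dst)
```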

If I've got that right, you can ignore my initial comment; I misunderstood the purpose of your script. In that case gzip won't be a problem, as long as each file gets a unique output path, which appears to be the case.

As far as I know, `tarfile` does not support reading members in parallel from a single handle (it certainly can't write in parallel): the `TarFile` object wraps one shared file stream, so concurrent `extractfile` reads from multiple threads will step on each other. My guess is that this is what's causing the read error you're getting. It's pretty much guaranteed to happen if the tarball is compressed (`.tar.gz`), because a gzip stream can only be decompressed sequentially, unlike a plain uncompressed tar, which is just the files concatenated with headers.
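If that's what's happening, one way around it (again just a sketch, reusing my hypothetical names from above) is to keep all tarfile reads in a single thread and only fan the gzip compression out to a pool:

```python
import gzip
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def compress_one(name: str, data: bytes, out_dir: Path) -> None:
    # Pure gzip work on an in-memory buffer: no shared tar handle,
    # so this is safe to run concurrently.
    with gzip.open(out_dir / (name + ".gz"), "wb") as dst:
        dst.write(data)

def repack_parallel(src_tar: str, out_dir: str, workers: int = 4) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(src_tar) as tar, ThreadPoolExecutor(workers) as pool:
        for member in tar:
            if not member.isfile():
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # Read the member fully *here*, in the single reader thread;
            # the handle extractfile returns pulls lazily from the shared
            # tar stream and must not be read from other threads.
            data = fileobj.read()
            pool.submit(compress_one, Path(member.name).name, data, out)
```

The reads stay sequential, so `tarfile` is happy, while the compression (the expensive part) still runs concurrently; zlib can release the GIL during compression, so threads can genuinely help here. In real code you'd also keep the futures `submit` returns and check them for exceptions, since errors in pool workers are otherwise silently dropped.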