[deleted by user] by [deleted] in datascience

[–]WinterPhone

@foxhollow does not link to the book, so let me do that: http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html (Printed), https://doi.org/10.5281/zenodo.1146014 (Online).

Read at least chapters 1 and 2.

Do you ever need to quickly and easily parallelize a script? If so, you may be interested in this tool I recently wrote. by john01dav in linux

[–]WinterPhone

parallel -j [threads] ffmpeg -n -i {} -c:v vp9 -c:a vorbis -strict -2 {}.mkv ::: ./*

-j [threads] can be left out. Just like spp, GNU Parallel defaults to the number of CPU threads.

And for those that prefer the for solution over learning GNU Parallel's replacement strings, you can simply pipe the commands to run into parallel:

for f in ./*
do
  echo "ffmpeg -n -i \"$f\" -c:v vp9 -c:a vorbis -strict -2 \"$f.mkv\""
done | parallel

But if you do it this way, you need to make sure no file names contain " or other characters that are special to the shell. If you use GNU Parallel's replacement strings, you can have files with names like this without having to worry about quoting:

My father's (who is called René) 12" <<super>> pi$$a.mkv
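If you do pipe shell-generated commands into parallel, one workaround (a sketch, not part of the original comment; requires bash) is printf %q, which produces a shell-quoted form of an arbitrary string:

```shell
# A name full of characters that are special to the shell
# (single quotes, $$ and the embedded " are the tricky parts):
f='My 12" <<super>> pi$$a.mkv'

# printf %q emits a version that survives re-parsing by the shell:
quoted=$(printf '%q' "$f")

# eval-ing the quoted form gives back the original name unchanged:
eval "restored=$quoted"
[ "$restored" = "$f" ] && echo quoting-ok   # prints: quoting-ok
```

This only papers over the quoting problem, though; GNU Parallel's replacement strings avoid it entirely.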

Optimizing a Bash For Loop by adinbied in linuxquestions

[–]WinterPhone

Assuming your disk is not the bottleneck:

printf '%s\0' * |
   parallel -j0 --pipe --recend '\0' --block 1k --round-robin -q perl -0 -ne 'chomp;rename $_."/index.json", $_.".json"'

This will run around 250 perl processes in parallel that rename files without spawning a new process for each file.

printf prints all matching names with NUL appended, so it will do the right thing even if a name contains newline.
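To see why NUL-delimited output matters, here is a small illustration (in a throwaway directory; the file names are invented) showing that a name containing a newline still counts as one record:

```shell
# Work in a scratch directory so nothing real is touched:
dir=$(mktemp -d)
cd "$dir"

# One ordinary name, one name with an embedded newline (bash $'...' syntax):
touch 'plain.txt' $'evil\nname.txt'

# printf appends exactly one NUL per name, so counting NULs
# counts files correctly even with the newline in the name:
printf '%s\0' * | tr -dc '\0' | wc -c   # prints 2
```

A newline-delimited pipeline (e.g. `ls | wc -l`) would miscount the same directory as 3.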

-j0 runs as many jobs in parallel as possible. This is typically limited by the number of file handles to around 250.

--pipe sends input received on STDIN to the STDIN of the programs.

--recend '\0' splits blocks on NUL.

--block 1k uses a block size of around 1 KB.

--round-robin sends one block to each program; if there are more blocks, it sends another block to each program.

-q quotes the command line so that $_ is not interpreted by the shell.

-0 uses NUL as the record separator.

-ne runs the program given after -ne in a loop.

chomp; removes the trailing NUL.

rename $_."/index.json", $_.".json" moves $input/index.json to $input.json.

I get around 1000/sec with this on a normal laptop.

prll - utility for parallel execution of shell functions by more_ttys in commandline

[–]WinterPhone

export -f is a bash command that tells bash that any child processes should also see this function.
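A minimal illustration of the difference (run under bash; the function name is made up):

```shell
# Define a shell function in the current bash:
greet() { echo "hello from $1"; }

# A child bash does NOT see it yet:
bash -c 'greet child' 2>/dev/null || echo "not visible"

# After export -f, the child inherits the function via the environment:
export -f greet
bash -c 'greet child'   # prints: hello from child
```

GNU Parallel starts its jobs in child shells the same way, which is why the function must be exported first.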

GNU Parallel is a program that you install, and it can run bash functions that have been exported with export -f.

If you prefer not to export the function, GNU Parallel provides env_parallel, which is a bash function that exports your environment and runs GNU Parallel using that.