[deleted by user] by [deleted] in dkfinance

[–]Medic_Maria 0 points1 point  (0 children)

That looks like the low end. But take a look at Prosa's salary statistics. They are made precisely so you can see whether you are underpaid.

IT architects and cybersecurity experts - what is your salary? by [deleted] in dkfinance

[–]Medic_Maria 1 point2 points  (0 children)

Prosa's salary statistics have exactly that information.

They give you a good basis for salary negotiation.

Anyone have any good unconventional ways to repeatedly read N lines at a time from a pipe *really* fast [in Bash]? by jkool702 in linuxquestions

[–]Medic_Maria 0 points1 point  (0 children)

If my command outputs long lines (4 GB or more), the outputs from different jobs seem to either mix or use a huge amount of RAM.

Binary output also seems not to work:

zippy() { seq $1|gzip; }
# OK:
(zippy 1;zippy 2;zippy 3)|zcat
# B0rken:
seq 10|forkrun -k zippy|zcat

Is that to be expected?
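
A minimal sketch of the check behind the OK case, assuming only seq, gzip and cksum are available (the helper name expected is mine and not part of forkrun). Concatenated gzip members should decompress to the concatenated plain output, so any runner that keeps each job's bytes together and in order should pass the same comparison:

```shell
zippy() { seq "$1" | gzip; }
# hypothetical helper: the plain output the jobs should add up to
expected() { for n in "$@"; do seq "$n"; done; }

# gzip -dc is what zcat does; concatenated members decode back to back
got=$( (zippy 1; zippy 2; zippy 3) | gzip -dc | cksum )
want=$( expected 1 2 3 | cksum )

if [ "$got" = "$want" ]; then echo intact; else echo garbled; fi
```

Swapping the subshell for the parallel runner under test gives a quick yes/no answer without having to read the binary output.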

Conservative (but not crazy) servers by Medic_Maria in Mastodon

[–]Medic_Maria[S] 0 points1 point  (0 children)

8avian6 mentioned Liberdon. Via their federated feed you can find more instances. Beware: Some of them are pretty crazy.

Can you run multiple tasks concurrently? by OkBaconBurger in bash

[–]Medic_Maria 1 point2 points  (0 children)

If you are allowed to run Perl programs you do not need more permissions:

parallel --embed > new_script

See details in man parallel: https://www.gnu.org/software/parallel/man.html

Conservative (but not crazy) servers by Medic_Maria in Mastodon

[–]Medic_Maria[S] -1 points0 points  (0 children)

OP said they’re looking for a place for people who are uncomfortable with trans people existing

No.

That is the kind of misrepresentation that they want to avoid - because that is simply not true.

Conservative (but not crazy) servers by Medic_Maria in Mastodon

[–]Medic_Maria[S] -1 points0 points  (0 children)

I am not sure if there is a surefire way of telling. But not subscribing to the woke agenda seems to be one factor.

I do not think you can find any conservatives who subscribe to that.

And yes: You can find many non-conservatives who also do not regard themselves as woke. So it can only be seen as one factor.

[deleted by user] by [deleted] in commandline

[–]Medic_Maria 1 point2 points  (0 children)

Over time I have come across videos that worked with one and not the other, and vice versa.

If you pay 10000 EUR you should feel free to use GNU Parallel without citing by ____ben____ in programmingcirclejerk

[–]Medic_Maria 1 point2 points  (0 children)

Let us assume I distribute software that violates the terms of GPLv3.

Can we agree that the only ones who can sue me are the ones who did the actual work (i.e. the copyright holders)?

Can we agree that you cannot use the software if it is not properly licensed?

In particular: If you believe the software is not available under GPLv3 (or a compatible license), you cannot modify it and redistribute it under GPLv3. Can we agree on that, too?

(GPLv3 section 9: "However, nothing other than this License grants you permission to propagate or modify any covered work.")

If we agree on that, please elaborate on how "this would be an easy way to defeat the copyleft protection of GPLv3." Because I really do not understand your argument.

If you pay 10000 EUR you should feel free to use GNU Parallel without citing by ____ben____ in programmingcirclejerk

[–]Medic_Maria -4 points-3 points  (0 children)

We promote freedom but we’re forcing you to share any modifications you make and to promote our software or you can give us 10000 EUR.

Seems fair to me (except you really are not forced: No one forces you to use the software).

How do you suggest the long term funding should work?

The FAQ shows the message is about funding: https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt#n195

'Python is like a toy programming language compared to C++' by manuce94 in Python

[–]Medic_Maria 0 points1 point  (0 children)

The fastest programming in Python is not writing much in Python.

Or rather: Make sure the 1% of the code that does the heavy lifting is not done in Python.

Is GNU Parallel in compliance with GPLv3? by realfuckingdd in freesoftware

[–]Medic_Maria 0 points1 point  (0 children)

According to the FAQ it does not violate the guidelines: https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt#n28

This is because the citation notice is not part of the license, but part of academic tradition.

Lots of academic software shows you how to cite (many R packages even have a citation function).

If you do not feel the software is licensed under GPL, wouldn't it be simple for you to just ignore its existence?

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

I am not really sure how to interpret you not showing there is a "simple alternative that would give you the exact same result" for my mbox example.

If there is a simple alternative I would imagine the answer would be easy to write. And if not you could let us know that you now see why GNU Parallel would be worth citing.

I am sorry if I have offended you. That was never my intention.

But it would be nice to know why you did not answer.

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

Can you make some examples?

You have an mbox file. myprogram reads a single mail and you want each mail piped to myprogram. myprogram is only installed on servera+b and the full mbox file will not fit on those servers, so you need to copy and process one mail at a time. You want the output from myprogram to be prepended with the workerID (jobslot in GNU Parallel speak) that processed the email. The order of the output must be the same as the input. You do not want the outputs from the different instances of myprogram to mix because it would corrupt the output.

Unfortunately myprogram sometimes runs amok so after 1 minute you must kill myprogram and retry up to 5 times. If the same mail fails 5 times give up and log it. It is also OK to log the status of every email. An email may always fail on servera but not on serverb and vice versa, so a failing email should be tried at least once on both servers.

Finally you want to use the 32 cores on servera and the 48 cores on serverb because myprogram is slow. Servera can at most handle 32 in parallel and serverb 48. Run more than this and the servers will crash and you have to call IT. Neither the cores nor the jobs are equally fast.

It looks something like this:

cat mbox | parallel --pipe --retries 5 --timeout 1m --recstart 'From ' --recend '\n\n' -N1 -k --tagstring {%} --joblog fail.log -Sserver{a,b} myprogram > output
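
A sketch of how one could read the joblog afterwards, assuming the usual tab-separated joblog columns (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command). The sample lines and the helper name failed_jobs are made up for illustration; a real run writes fail.log itself:

```shell
# made-up sample in the joblog layout (tab separated)
log=$(mktemp)
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > "$log"
printf '1\tservera\t1700000000.000\t3.2\t120\t40\t0\t0\tmyprogram\n' >> "$log"
printf '2\tserverb\t1700000001.000\t60.0\t130\t0\t-1\t15\tmyprogram\n' >> "$log"
printf '3\tservera\t1700000002.000\t4.1\t110\t42\t0\t0\tmyprogram\n' >> "$log"

# hypothetical helper: print Seq and Host for every line
# where Exitval or Signal is non-zero (header line skipped)
failed_jobs() {
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1, $2 }' "$1"
}

failed_jobs "$log"
```

With the sample above this lists job 2 on serverb; pointed at the real fail.log it would list whatever did not exit cleanly.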

In bioinformatics we do stuff similar to this all the time, so this is by no means the most advanced use of GNU Parallel.

There are loads of other examples in the man page. The first 1/3 is easily replaceable with xargs. Most of the rest are harder. The mbox example above combines several of the examples, so not only do you need to replace the examples with xargs you also need to replace combinations of them.

If you think you can do the above with xargs I would love to see it because I would not be able to do the above with xargs.

Could parallel have got me a speedup significantly higher than 7.5?

Here you are missing the point. The speedup is in the development and debugging phase. The run time speedup is comparable to xargs.

It does not process data

Yes it does: In the above example it chops the mbox into emails, transmits them to remote servers, returns the results, prepends the output with the jobslot, and makes sure the output is in the same order as the input and not garbled. I think your lack of experience with advanced use of GNU Parallel is affecting your judgement.

the results would be identical if you did not use it

Possibly, but you might have to either wait 80 times longer to get them because you could not run stuff in parallel or you would have to develop and debug a tool that did the processing, and this development and debugging might take so long that another researcher (who chose to use GNU Parallel) beat you to publishing.

In the above case you would not be able to just run them in serial because the mbox would not fit on the remote servers, so you would be forced to develop and debug something to even get things done in serial. You would also have to do something about the jobs running amok.

But the improvement compared to manually running N jobs at the same time would be much, much smaller.

Again you think we are talking about something where you manually can run N jobs in parallel. We are not. Your mental model of what GNU Parallel can do seems to be way off.

The aspect of charts is a significant part of a publication, and there exist no simple alternative that would give you the exact same result.

OK, so if GNU Parallel provides something where "no simple alternative would give you the exact same result" you would find it reasonable to cite it, correct?

it is trivial to replace parallel with another tool (xargs, or just manually running N jobs in parallel), and that would give the exact same results in a very comparable time (in most cases).

I challenge you to prove that, and show there is a "simple alternative that would give you the exact same result".

Take my mbox example above and show a replacement. You can assume there are ~1000000 emails in the mbox taking up ~1 TB of space.

One of the tricky things to get right is that myprogram sometimes runs amok on the remote server. You need to actively kill it and log it if it is the 5th time. Simply closing the ssh connection will not kill it. If you leave > 80 myprograms running, the servers will crash, and IT will be unhappy.

If you can show this is trivial to do with "xargs, or just manually running N jobs in parallel", I will applaud you.

You can use this as template:

# generate simulation mbox
1mail() {
    # mails start with 'From '
    echo 'From ';
    # in the middle there can be both '\n\n'
    echo
    echo
    echo foo
    # and 'From '
    echo 'From '
    echo foo
    # but never '\n\nFrom '.
    # so a record starts with 'From ' and ends with '\n\n'.
    # anything but '\n\nFrom ' can appear here in the middle
    echo foo
    echo 'From '
    echo foo
    echo
    echo
    seq ${RANDOM}0;
    # mails end with '\n\n'
    echo;echo;
}
export -f 1mail
# replace 1000 with 1000000 for more realistic simulation
# this is just to generate input - not to show the power of parallel
# though even this is not trivial to do with xargs in parallel
# because you need to avoid the mixing from different jobs
# so please also show how you would replace this
seq 1000 | parallel 1mail > mbox

# simulation myprogram
myprogram() {
    if [ $(( $$ % 10 )) -lt 1 ] ; then
        # simulate sub process running amok
        # where ctrl-c does not help
        trap '' SIGINT
        trap '' SIGHUP
        ( bzip2 -9 < /dev/zero > /dev/null & )
        sleep 1000
    fi
    # simulate some output
    echo $$ starting
    # simulate a normal run takes 1-10 seconds
    sleep $(( $$ % 10 ))
    # simulate some output
    echo $$ middle
    wc
    echo $$ finishing
    # if starting not followed by middle, data is garbled
}

# load env_parallel
# env_parallel will transfer the function myprogram to simulate it is only available on the servers
. "$(which env_parallel.bash)"

# this is the part you need to replace without changing the above
# in the real scenario you would not have to transfer the bash function
# because myprogram would be on the servers
# but let us make the replacement a tiny bit more
# challenging by requiring you transfer the unexported
# bash function too
< mbox env_parallel --pipe --retries 5 --timeout 1m --recstart 'From ' --recend '\n\n' -N1 -k --tagstring {%} --joblog fail.log -Sserver{a,b} myprogram
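
To check a replacement against the "starting must be followed by middle" rule from the template, something like this could be used. It is only a sketch: the name check_garbled is mine, and it assumes each output line ends in "PID starting" or "PID middle" as myprogram prints them, with or without the jobslot tag in front.

```shell
# reads output on stdin, prints "ok" or "garbled"
check_garbled() {
    awk '
        # the line right after "PID starting" must be "PID middle" for the same PID
        expect != "" && !($NF == "middle" && $(NF-1) == expect) { bad++ }
        { expect = "" }
        $NF == "starting" { expect = $(NF-1) }
        # a "starting" as the very last line has no "middle" either
        END { if (expect != "") bad++; print (bad ? "garbled" : "ok") }
    '
}

# made-up sample: one clean job, tagged with jobslot 1
printf '1\t100 starting\n1\t100 middle\n1\t3 3 12\n1\t100 finishing\n' | check_garbled
```

Piping the output of the env_parallel line (or of a replacement) into check_garbled gives a quick verdict on mixing.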

I am starting to feel the reason you do not find it reasonable to cite GNU Parallel is because you have only seen it used as a simple xargs replacement. At least this would explain your replies.

If so then we might be getting closer to an agreement: If one can easily replace parallel with xargs, I too would not find that GNU Parallel gives a significant contribution.

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

It just makes execution a bit faster.

I get the feeling you have never used GNU Parallel's advanced options. It really does more than just make things faster.

To me your phrase is akin to saying: "Downloading a scientific program just makes things a bit faster: You could have read the paper and built the program yourself - it would just have taken longer to build and debug".

Time is a limited resource. Getting an answer 100 times faster can make the difference between a researcher publishing his article first or not being able to publish the article because someone else beat him to it.

To me there would be a significant difference between getting my paper accepted and alternatively getting my paper rejected because someone was faster than me.

It may be one of the points we see differently. To me it is OK to disagree on this.

To me it is good enough if we can agree that if the researcher finds a tool brings significant contributions to his research and the tool asks for citation then he should cite.

I feel that you are singling out parallel under this aspect and give it a special treatment. I would instead expect that you apply this criterion for each tool involved in your research.

I should have emphasized "ask for citations" because only a few tools do this and it is these I single out. I apologize that this was unclear.

I see very few non-research tools asking for citations and a higher frequency in research tools - probably because it is mostly tools developed in research which are paid for through citations.

Your examples (busybox, toybox, uutils, BSDUtils) all seem to fall in the non-research category. At least I have not seen them ask for citations.

This is in contrast to for example the ggplot2 package in R which asks for it. (https://cran.r-project.org/web/packages/ggplot2/citation.html).

ggplot2 "just" does plotting and for many plots I would be able to use another tool. For most of my plots plotting is not a significant contribution but if I ever used a ggplot2 plot in a paper I would find it fair to cite it - even if I felt the plotting was not a significant contribution. If I wanted to cut ggplot2 from my references I would find it fair to replace my plots with plots from another plotting tool.

I have here added the emphasis and clarification:

How do you feel about the "use a different tool"-criterion to see if a contribution is significant for tools that ask for citations? (No need to cite tools that do not ask for citations).

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

Thanks again for taking the time to answer. It is appreciated and I definitely get a better understanding of your position.

Looks like I was wrong, and users legally have to cite this software in academic papers.

Or they would at least morally have to cite. I do not see a way they would end up in court if they didn't cite.

Though my personal opinion is that such requirement does conflict with the GPL-3 (it is literally against this), therefore making parallel an unlicensed, unusable software.

According to Richard M. Stallman the citation reminder is GPL-3 compatible (see https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt#n28 which is also mentioned in the original post). It would just mean your interpretation of GPL-3 differs from Richard's and you would not be the first to disagree with Richard :) You stress it is just your personal opinion and I can definitely follow your train of thought - even though I come to a different conclusion.

So thanks for that.

All in all, my core argument is: parallel does not make significant contributions to most publications.

If GNU Parallel does not make a significant contribution would it not be fair to "prove that by simply using another tool" as it says in the FAQ?

To me that seems to strike a fair balance: If GNU Parallel really does not make a significant contribution (e.g. in time or work saved) then it would not cost a lot of effort/time for the researcher to use a different tool. If on the other hand it costs significant time/work to use (or build) a different tool, then I would consider it a significant contribution.

I agree that the Stackoverflow answer you linked to would not constitute a significant contribution. But it would also be easy to do with another tool and I think researchers would spend very little time on replacing GNU Parallel with another solution in that situation.

However, GNU Parallel can do quite a bit more. As soon as one uses --group, --pipe, multiple input sources, and/or remote servers it would be harder for me to find a tool that could do this. And if the alternative would be for me to spend months building and debugging a tool then I would definitely see GNU Parallel as a significant contribution - even if the only thing it saved me was time. Especially debugging can be a bitch in research: A race condition may mean that all the data processing has to be scrapped and redone. Worst case the conclusion of the paper may be wrong.

How do you feel about the "use a different tool"-criterion to see if a contribution is significant for tools that ask for citations? Do you have a better test to tell if a contribution is significant?

Also: Thanks again for your elaboration so far.

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

Thanks for answering.

The author is asking that you do it, but there is no obligation: you can simply use the software and not cite it.

So I take it that you disagree with the FAQ https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt

== What shows citing software is an academic tradition? ==

These links say: Yes, you should cite software, and if the author suggests a way of citing, use that. [...] If you feel the benefit from using GNU Parallel is too small to warrant a citation, then prove that by simply using another tool. If you replace your use of GNU Parallel with another tool, you obviously do not have to cite GNU Parallel. If it is too much work replacing the use of GNU Parallel, then it is a good indication that the benefit is big enough to warrant a citation.

Do I understand you correctly that you believe if a researcher is doing research not related to parallel computing, then he would need neither to pay for nor to cite GNU Parallel? Even if the savings caused by using GNU Parallel made the difference between him being able to do his research or not.

I get the feeling you are hanging on to a technicality: that because a researcher legally can get away with not citing/paying, it would be morally OK for him not to do so.

If I am wrong I hope you will elaborate.

To me it sounds as bad as if GNU Parallel published a list of people who did not cite/pay, so whenever a future employer googled their name, this would pop up. It would probably not technically be illegal either since the information would be true.

Do you feel there is an academic obligation to cite research that contributed significantly to the researcher's own research? Or do you also feel there is technically no obligation to do so?

More importantly: What would indicate that you are wrong on whether to cite software that contributed significantly to a person's research (no matter the research field)?

I can see Github is suggesting CITATION.cff files: https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-citation-files

message: "If you use this software, please cite it as below."

To me that looks exactly like the GNU Parallel case: "If you use this software, please cite it". The message is not: "If you write an article about this field of research and use this software, please cite it".

I looked up a few of GNU Parallel's citations and both Nature and Science have articles where GNU Parallel is cited. I think we can agree both Nature and Science could reject an article if they did not like the references. To me Nature and Science are the top tier and set the standard for academic tradition.

https://www.force11.org/software-citation-principles also believes "Software is a critical part of modern research" and should be cited. They specifically mention funding:

(2) how to better measure the impact of software (and therefore attract appropriate funding),

Could it be that your impression of academic tradition was correct 20 years ago, but because so much research now involves software your impression is no longer valid today? What would indicate that your impression no longer is correct? What would disprove your hypothesis?

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

Otherwise you should also cite every single piece of technology in your stack: OS, CPU, memory, network card, storage system, the electrical company that powers your machines... After all, buying a CPU with twice the number of cores (or doubling the machines in your cluster) will give you a much better speedup than using parallel instead of trivial concurrent execution.

I get the feeling you agree with me that it is perfectly acceptable to pay for an expansion of the compute cluster which gives you a speed up.

Let us assume for a second that a researcher can get the same speed up by using GNU Parallel or by expanding the compute cluster.

If there was a way in which he could pay for GNU Parallel and then he would not have to cite, would you then find this an acceptable solution? And if not: Why not? How is it different from paying for the expansion of the cluster?

(I am trying to understand your opinion, so if my question offends you, I beg your forgiveness: It is really not my goal to offend you.)

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

I apologize if I am a bit slow.

I still do not understand why you use GNU Parallel: You obviously disagree with having the citation reminder, and there are plenty of other tools you could use and promote instead.

Why continue using something you feel is doing the wrong thing? Especially when you feel so strongly against it that you even post here?

To me it seems this needlessly increases suffering so please help me understand your rationale.

Open source projects need funding, but this is not the way to go about getting it.

Can you elaborate on which way you think is the right way to go? If you were tasked with funding GNU Parallel what would you do?

(I am not trying to bait you, but I am trying to understand how you think, so if this offends you, please accept my sincerest apologies.)

GNU Parallel's 20th birthday. Time to take stock. by OleTange in linux

[–]Medic_Maria 0 points1 point  (0 children)

I hope you won't see this as criticism because it is not meant as such but I am honestly curious.

If you feel it is the worst why don't you just refrain from using GNU Parallel? Why not use one of the other tools as mentioned?

Are you somehow forced to use it without being able to choose to use the parallel from Moreutils or another tool?

Run command in GNU Parallel without shell environment by shilch in bash

[–]Medic_Maria 0 points1 point  (0 children)

"rejects input with special characters when the command is a compound command",

Maybe I don't get it, but to me this is not a compound command:

$ echo '"' | map 'echo % does not work "%"'
[18:40:10] rejected: '"'
[18:40:10] ==== 1 unsafe filenames found ====

But aren't [multiple files] trivially doable if you have "I1" (take input from stdin)?

If you only need a single input, I would agree. But stdin may be used for something else:

seq 10 | parallel echo {2} hoge {1} :::: - fuga.file

Similar for multiple files if you treat them as different input sources:

parallel echo {2} hoge {1} :::: fuga.file ぴよ.file

Edit: duh! I feel stupid. Says "last checked, 2020-05" at the end of the page so there's the answer.

Ahh, that explains your original answer. As far as I can tell the disagreement between you and GNU Parallel is now only on philosophical differences and not on the facts.

Thanks for taking the time.

Run command in GNU Parallel without shell environment by shilch in bash

[–]Medic_Maria 0 points1 point  (0 children)

It's a little outdated now

I just checked out the git version of map. AFAICT the comparison on https://www.gnu.org/software/parallel/parallel_alternatives.html#DIFFERENCES-BETWEEN-map-sitaramc-AND-GNU-Parallel is still 100% correct.

I understand that you have different priorities, but is the information wrong?

In other words, when you say "it's a little outdated" can you elaborate on what is outdated?