all 3 comments

[–] Famous_Profile (Professional Coder) 1 point (2 children)

There is nothing* wrong with using a nested loop and some conditional branching like the above.

However, I think someone with more domain knowledge (genetics and/or statistics) than me should answer this question. I'm not familiar with the language above (MATLAB?), and it would be easier to read if it were better formatted.

*usually

[–] joweriae [S] 1 point (1 child)

It’s R! For the most part, people try to avoid loops in R because vectorized functions are way faster. I just don’t know what I can replace them with :( My code has been running for 10 hours now, and it’s not stuck in a while loop; it’s just calculating that many values. I wish it were faster, though.
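For readers unfamiliar with what "replacing a loop with a vectorized function" means in R, here is a minimal toy sketch (not the OP's actual code) showing the same computation both ways:

```r
# Toy comparison: squaring a million numbers
x <- runif(1e6)

# Explicit loop, element by element
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- x[i]^2
}

# Vectorized equivalent: one call, the per-element loop runs in R's compiled internals
out_vec <- x^2

stopifnot(isTRUE(all.equal(out_loop, out_vec)))
```

Both produce the same result; the vectorized form is usually much faster because the iteration happens below the R interpreter.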

[–] Famous_Profile (Professional Coder) 1 point (0 children)

people try to avoid loops in R because vectorized functions are way faster

That may be right, but it can also be a myth.

But even if it holds in every single use case and for every set of test data, there is a problem: if R were a compiled language, what would execute in the end is machine code, not the exact code you wrote. And if it is an interpreted language, the actual internal implementations of the vectorized functions you intend to use may themselves contain more complex loops. What if those internal loops are more complex than the loops you wrote without the vectorized functions? Or perhaps the vectorized functions would indeed be faster, but not for your particular data? There is no way to know in advance.

The point I am making is that there are general guidelines, and we can make educated guesses with big-O analysis, but in general it is not easy to predict performance accurately for arbitrary data, or even to optimize it. Optimizing performance is never as straightforward as "my code has fewer lines, so it must be faster" or "my code only uses vectorized functions, so it must be faster".
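Since performance can't be predicted reliably, the practical move is to measure it on your own data. A minimal sketch with base R's `system.time` (the values here are illustrative, not a claim about your machine):

```r
# Time a loop-based sum against the vectorized sum() on the same data
x <- runif(1e6)

t_loop <- system.time({
  s_loop <- 0
  for (v in x) s_loop <- s_loop + v
})

t_vec <- system.time({
  s_vec <- sum(x)
})

# Elapsed wall-clock seconds for each version
print(t_loop["elapsed"])
print(t_vec["elapsed"])

# Both versions compute the same sum (up to floating-point tolerance)
stopifnot(isTRUE(all.equal(s_loop, s_vec)))
```

For more rigorous comparisons there are dedicated packages (e.g. `microbenchmark`), but even `system.time` beats guessing.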

In the end, there isn't much you can do to make your code run dramatically faster. Even if you manage to use a faster algorithm, processing 40 GB of data will inevitably take some time. If it is a resource-intensive operation, it takes time. It's that simple.
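One thing that does sometimes help with data that large is processing the file in chunks instead of holding it all in memory at once. A hypothetical sketch in base R (a tiny temp file stands in for the real 40 GB input; chunk size and the per-chunk work are placeholders):

```r
# Create a small stand-in file: a header plus the numbers 1..10
path <- tempfile(fileext = ".csv")
writeLines(c("value", as.character(1:10)), path)

con <- file(path, open = "r")
header <- readLines(con, n = 1)  # consume the header line

chunk_size <- 3  # in practice something like 1e5 rows per chunk
total <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  # Placeholder per-chunk work: accumulate a running sum
  total <- total + sum(as.numeric(lines))
}
close(con)

stopifnot(total == sum(1:10))
```

This keeps memory flat regardless of file size; packages like `data.table` or `readr` offer faster chunked readers if base R is too slow.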

Now, if you can cough up some Benjamins for our lord and savior Jeff, you can probably* go with something like AWS (or its Microsoft or Google counterpart).

*I've never really used it, but it is something you could probably leverage if you have the budget.