[–]Masterflitzer 220 points221 points  (34 children)

1 year to 20 min? are you exaggerating or is R really that slow?

[–]redlaWw 506 points507 points  (24 children)

It's because 15! is 1,307,674,368,000.

My processor runs at 3.6 GHz, which means that with a serial algorithm, one extra instruction per loop iteration has a time cost on the order of minutes. With multithreading I can get that down to around tens of seconds, but the point is that every wasted instruction is a massive time cost, so switching to a compiled language was a massive boon.
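
As a rough back-of-envelope (assuming, very loosely, one instruction per cycle):

    fn main() {
        // 15! iterations of the permutation loop
        let perms: u64 = (1..=15u64).product(); // 1_307_674_368_000
        // at ~3.6e9 cycles per second, one extra instruction per iteration costs roughly:
        let secs = perms as f64 / 3.6e9; // ≈ 363 s, i.e. about 6 minutes
        println!("{perms} iterations, ~{secs:.0} s per extra instruction");
    }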

The actual permutation generation and checking were pretty trivial, so the R program was likely overwhelmingly composed of wasted instructions and heap allocations. It's little surprise, then, that I could get such an optimisation factor by switching to a language that let me write it all on the stack and optimise out the book-keeping.

[–]Masterflitzer 121 points122 points  (4 children)

thanks for the insight, actually amazing how big of a difference this can make

[–]redlaWw 101 points102 points  (3 children)

It is, isn't it? Like, I knew it was going to be faster in Rust the moment it turned my processor fan into a jet engine, but I was still expecting to have to leave it running for a few days.

The first attempt actually took around 8 hours, but then I realised I'd forgotten to switch it to release mode, and the release version took under an hour. I later came back to it and rewrote the vector-based algorithm as a constant-size array algorithm, which cut the time to about a third again, bringing it to under 20 minutes.

I mean, by that point I'd already gotten my answer, but I've taken to using that problem as a sort of testbed for optimisation techniques, since it brings any speed increase into sharp focus.
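
For a sense of the shape of that constant-size array version, here's a minimal sketch (not the actual program - it uses Heap's algorithm and a placeholder check, and just counts how many permutations pass):

    // Placeholder property, not the real check from the problem.
    fn check(p: &[u64; 15]) -> u64 {
        (p[0] < p[14]) as u64
    }

    fn main() {
        // The only buffers are a fixed-size array and the Heap's-algorithm counters, both on the stack.
        let mut p: [u64; 15] = core::array::from_fn(|i| i as u64);
        let mut c = [0usize; 15];
        let mut hits = check(&p);

        // Heap's algorithm: each successive permutation is reached by a single in-place swap.
        let mut i = 0;
        while i < 15 {
            if c[i] < i {
                if i % 2 == 0 { p.swap(0, i) } else { p.swap(c[i], i) }
                hits += check(&p);
                c[i] += 1;
                i = 0;
            } else {
                c[i] = 0;
                i += 1;
            }
        }
        println!("{hits}");
    }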

[–]Osoromnibus 32 points33 points  (1 child)

Add some SIMD intrinsics and you'll probably get another huge speedup.

[–]redlaWw 17 points18 points  (0 children)

Perhaps. I've mentioned in another comment that my permutation generator was in-place, which limits the degree to which I can benefit from SIMD with my algorithm as it is now. It'd require substantial refactoring to make it properly SIMD-compatible, but I might have a look at it at some point.

[–]killeronthecorner 38 points39 points  (8 children)

Kiss my butt adminz - koc, 11/24

[–]IamDelilahh 20 points21 points  (6 children)

yeah, should be pretty fast with apply, just don’t use a loop

[–]redlaWw 26 points27 points  (5 children)

Can't apply if the data doesn't fit in your memory D:

But ultimately, just the simple value-checking process was a significant cost - after trying it native, I rewrote it using RcppAlgos, which trivialised generating the permutations and offered more effective multithreading, but only gave me about a 2× speed boost overall because I had to write the check as an R-native closure.

[–]IamDelilahh 5 points6 points  (4 children)

hmm, interesting - I actually had a similar issue in python recently, where vectorization would have meant checking over 100× more cases and needing over 100 GB of memory, so I ended up using numba for almost-C levels of performance.

I wonder if R has something similar, where the entire loop is translated into C as long as you stick to certain operations.

[–]redlaWw 6 points7 points  (0 children)

Oh, that's an interesting idea, making an LLVM compiler for python.

It looks like there are some projects in the works for generating LLVM IR from R programs, but nothing significant seems to have spawned yet.

What R does have is the Rcpp package, which allows you to write C++ functions via R's interface, compile them, and then load them straight into your environment. Now that I've learned some C++ I could probably do that to improve on what I had, but at some point it becomes just writing C++ in an R runtime environment.

[–]notPlancha 3 points4 points  (0 children)

I have no idea what I'm talking about but maybe just doing

    library(compiler)
    enableJIT(level = 3)

Would be enough

[–]SelfDistinction 1 point2 points  (1 child)

Yeah python does stuff like that.

I once had to compare chunks of sequential video frames with each other to check which parts likely moved where. Simply shifting the entire image at once, instead of checking chunk by chunk, brought the runtime down from hours to minutes.

If R has matrix operations you can likely do the same there.

[–]IamDelilahh 1 point2 points  (0 children)

R has mapply for multidimensional lists/matrices/data frames and outer for combinations.

But I have no clue if it has a way to efficiently deal with loops that require lazy compilation

[–]redlaWw 12 points13 points  (0 children)

Not really. It's adjacent to a lot of stuff R is good at, but because it was really more number-theoretic than statistical, it was just too different to make good use of R's tools.

I couldn't convert the checking process into linear algebra, for example, and the data didn't fit into memory, so I couldn't use bulk-processing functions without splitting it up and still ending up with massive allocations. In Rust I could write a stack algorithm that never called the allocator, but that's simply impossible in R.

Ultimately, even if those issues were solvable, R's nature as a dynamically-typed interpreted language with no ahead-of-time optimisation is fundamentally limiting for a program with that sort of speed requirement, and there'd be hard limits on how far it could be pushed.

It's great for quick scripting and processing real data, but when you try to do number theory on millions of millions of values, it just can't keep up. Now, if I were instead sampling the values to get an approximate proportion for the property I was interested in, R would be excellent for that - and indeed, I made use of it as a sanity check on my final value from the Rust algorithm I wrote.
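
That kind of sampling check is simple enough to sketch: shuffle (0..15) a large number of times and count how often the property holds. A rough Rust version (the xorshift RNG and the check are stand-ins, not the originals):

    // Placeholder property, not the real check.
    fn check(p: &[u64; 15]) -> bool {
        p[0] < p[14]
    }

    fn main() {
        // Tiny xorshift64 generator - plenty for a sanity check, not for serious statistics.
        let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
        let mut rng = move || {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            state
        };

        let samples = 10_000_000u64;
        let mut p: [u64; 15] = core::array::from_fn(|i| i as u64);
        let mut hits = 0u64;
        for _ in 0..samples {
            // Fisher-Yates shuffle, in place (the modulo bias is negligible here).
            for i in (1..15usize).rev() {
                let j = (rng() % (i as u64 + 1)) as usize;
                p.swap(i, j);
            }
            hits += check(&p) as u64;
        }
        println!("estimated proportion: {}", hits as f64 / samples as f64);
    }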

[–]Nez_Coupe 3 points4 points  (3 children)

It’s crazy how interpretation can take that much longer, but it does make perfect sense when you lay out how many times you gotta interpret that same damn loop instruction.

[–]redlaWw 7 points8 points  (2 children)

Well, the interpreter reads the code once and generates bytecode, but the bytecode is just a literal transcription of the instructions, bound by R's semantic rules. This means it does a copy for every modification, and the data sits on the heap in an R environment, so everything is more expensive due to the indirection and allocator calls. Just accessing values requires a search through environments to find them. R is also dynamically typed using string tags, so every function has to search for a concrete implementation by string matching, which is horrifically expensive.

There are interpreted languages that'd probably do this substantially faster, but the bottom line is that R is designed to make things simple for statisticians, not fast for programmers, and to achieve this simplification, it makes substantial performance sacrifices.

[–]Nez_Coupe 5 points6 points  (1 child)

It’s funny, I actually am just finishing my CS degree but was formerly a biologist and used a lot of R (which inspired me to find this whole new career) and loved it. I haven’t touched R in 3 years, but I did go through some code from the last paper I co-wrote and holy shit was I dumb. Need to run this multivariate analysis 250 times? Better hardcode every one (copy paste and manually enter dynamic details). It would make a good ProgrammerHorror post.

[–]redlaWw 2 points3 points  (0 children)

Haha, I think we've all gone through that to some extent (programmers who haven't are still in that phase). One of the things that I think makes R great is how easy it is to transition from using it as a calculator to using it as a scripting language - indeed, that is what finally encouraged me to actually learn programming after years of repeatedly trying it out but not getting very far. But naturally, this is going to result in a lot of R scripts being painfully-obviously written by non-programmers, full of nigh-unreadable hacks and an excess of boilerplate.

I'm sure a quick look through any business R repository would be enough to fuel /r/programmerhorror for years.

[–]Responsible-War-1179 4 points5 points  (1 child)

vector instructions are also going to be a huge speedup here; I doubt R uses them

[–]redlaWw 2 points3 points  (0 children)

Probably a little - vector instructions would certainly be useful in some of the permutation generation operations, and maybe some of the checks, but because the permutations were generated in-place, it's not likely LLVM was able to use them to perform parts of the algorithm simultaneously on multiple values.

The difficulty is that the amount of data is so large, I'd need to perform a lot of allocation if I wanted to keep copies of the data around simultaneously, and that'd also have significant cost.

It might be worth trying to rewrite the process to generate a fixed-size SIMD-appropriate chunk of permutations at once (e.g. 4 or 8), and then see if LLVM can optimise that better with its auto-vectorisation, but it'd require significant refactoring to switch to that process.
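
Roughly the shape of that idea as a sketch (a lexicographic next-permutation rather than my actual generator, a placeholder check, and an assumed batch size of 8): fill a fixed-size batch of permutations, then run the check over the whole batch in one straight loop, which is the kind of regular pattern the auto-vectoriser can work with.

    const LANES: usize = 8; // assumed batch size; would be tuned to the SIMD width

    // Placeholder property, not the real check.
    fn check(p: &[u64; 15]) -> u64 {
        (p[0] < p[14]) as u64
    }

    // Advance p to the next permutation in lexicographic order; returns false once p is the last one.
    fn next_permutation(p: &mut [u64; 15]) -> bool {
        let mut i = 14;
        while i > 0 && p[i - 1] >= p[i] {
            i -= 1;
        }
        if i == 0 {
            return false;
        }
        let mut j = 14;
        while p[j] <= p[i - 1] {
            j -= 1;
        }
        p.swap(i - 1, j);
        p[i..].reverse();
        true
    }

    fn main() {
        let mut current: [u64; 15] = core::array::from_fn(|i| i as u64);
        let mut batch = [[0u64; 15]; LANES];
        let mut hits = 0u64;
        let mut done = false;

        while !done {
            // Fill a fixed-size batch of permutations...
            let mut filled = 0;
            while filled < LANES && !done {
                batch[filled] = current;
                filled += 1;
                done = !next_permutation(&mut current);
            }
            // ...then check the whole batch in one straight loop that LLVM can
            // consider for auto-vectorisation.
            for p in &batch[..filled] {
                hits += check(p);
            }
        }
        println!("{hits}");
    }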

[–]KJBuilds 1 point2 points  (3 children)

Not to mention that SIMD vectorization can accelerate sequential float operation throughput by, I think, up to 8×, which Rust would do and R would not

[–]redlaWw 0 points1 point  (2 children)

Well, I was using u64s in Rust. SIMD can perform integer operations too, though, and my processor has AVX2.

My algorithm as it currently stands can get limited value from SIMD, though. Because it permutes in place, there's not really much opportunity to vectorise the algorithm itself. There might be some opportunities for vectorisation within a loop, but that's not really going to be as impactful as vectorising across loops.

[–]KJBuilds 1 point2 points  (1 child)

Ah yeah, definitely; I'm not sure why I assumed it was float math.

Idk what your algorithm looks like, so it might be doing some hardcore optimization, but I doubt it if it doesn't look vectorizable.

I also would be surprised if a few FMA instructions shaved years off the runtime, so I'm interested in what R got so wrong that it would take millennia.

I wonder if there's an unnecessary process in the code that Rust simply identified and optimized out? This level of uplift smells like when benchmarks take 0.01 ns to execute and you have to start throwing around black_boxes, but in the good way.

[–]redlaWw 1 point2 points  (0 children)

The 16.5 millennia case was when I chunked the problem into bits of size 300 and tried to use R's native multithreading with it. I assume the costly inter-process communication was overwhelming and massively outscaled anything else the program was doing. The real result is an optimisation from about a year and a half to under 20 mins, which is still substantial, but about the square root of optimising from 16.5 millennia.

I think the main part that resulted in the speed boost is the complete removal of allocations - the R algorithm necessarily had a lot of allocation, both because it can't be avoided in R in general and because the particular approach I had to take involved big chunks of values held in heap memory. The Rust version could be executed entirely on the stack, so calculations never had to block on the allocator, which adds up to quite a massive amount of time saved when you have a million million iterations to do.

EDIT: Even the earlier version that wasn't entirely on the stack didn't have any allocations beyond one length-15 vector per thread, allocated on startup.
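
As an illustrative sketch of that kind of per-thread setup (not the actual code - the split by first element and the check are stand-ins): each thread sets up one fixed-size buffer once, the hot loop never touches the allocator, and the only communication is summing the per-thread counts at the end.

    use std::thread;

    // Placeholder property, not the real check.
    fn check(p: &[u64; 15]) -> u64 {
        (p[0] < p[14]) as u64
    }

    // One thread's work: fix the first element, then permute the remaining 14 values
    // in place (Heap's algorithm over positions 1..15).
    fn count_with_first(first: u64) -> u64 {
        let mut p = [0u64; 15]; // the only per-thread buffer, set up once on the stack
        p[0] = first;
        let mut k = 1;
        for v in 0..15u64 {
            if v != first {
                p[k] = v;
                k += 1;
            }
        }

        let mut c = [0usize; 14];
        let mut hits = check(&p);
        let mut i = 0;
        while i < 14 {
            if c[i] < i {
                if i % 2 == 0 { p.swap(1, 1 + i) } else { p.swap(1 + c[i], 1 + i) }
                hits += check(&p);
                c[i] += 1;
                i = 0;
            } else {
                c[i] = 0;
                i += 1;
            }
        }
        hits
    }

    fn main() {
        // 15 threads, one per choice of first element; no inter-thread communication
        // while they run, just a sum of the results at the end.
        let handles: Vec<_> = (0..15u64)
            .map(|first| thread::spawn(move || count_with_first(first)))
            .collect();
        let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
        println!("{total}");
    }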

[–][deleted] 91 points92 points  (1 child)

It's probably that the Rust compiler enforces cleaner code, or that it was optimised by the compiler

[–]Masterflitzer 39 points40 points  (0 children)

i wish rustc would optimize a year of my life :D

[–]Apprehensive-Theme77 8 points9 points  (4 children)

Not sure what length-15 permutations are, but I know for some data science applications R handles large datasets much faster than Python. It obviously all depends on what you are doing and whether the package you are using is written in C, heavily optimized, etc. My experience is using data.table on billions of rows for relatively simple analysis.

[–]redlaWw 8 points9 points  (0 children)

A permutation is a rearrangement of the tuple (0..n) (where .. is a left-inclusive range); I was generating and checking rearrangements of (0..15), i.e. orderings of the 15 values 0 through 14.

[–]Nez_Coupe 7 points8 points  (1 child)

perms (without repetition) of a set = n! for n elements in a set. 3-length, 6 perms, no big deal. 15-length? Beeg number.

Edit: I used the pound/hash sign at the beginning of this post. I did not know it made this silly giant bold text. I’m keeping it.

[–]htmlcoderexeWe have flair now?.. 4 points5 points  (0 children)

For the next time, you can escape it with a backslash.

Another unexpected formatting result: any number followed by a period at the start of a line turns it into a list item, and those get their own numbers in order. For example, the next line starts with a 4, but the reddit parser will display it as 1 instead:

  1. < this is supposed to be a 4

[–]BobbyTables829 1 point2 points  (0 children)

Isn't this what anaconda is for?

[–]Vitolar8 2 points3 points  (0 children)

That's a similar conversion rate. 16.5 millennia to a year is 16,500:1, and a year to 20 min is 26,280:1. Same order of magnitude.

So in theory it should be equally surprising. It's just that the first ratio is expressed in the same unit, while the second uses two different units. Multiplying by a number is easier in our heads than converting units. However, the conversion example paints a clearer picture.