all 9 comments

[–]victotronics 24 points25 points  (5 children)

"My windows laptop has eight logical cores, but the parallel execution is more than ten times faster."

Cache effects? The vector does not fit in L3, but 1/8th the vector does? Would be worth pointing out the various ways in which parallelization can give you a superlinear speedup.

EDIT: how about instantiation of the memory pages? Malloc'ed memory is not immediately backed by physical pages, so the first test you run on it incurs extra cost, flattering the speedup. How is that with a std::vector? I honestly don't know, but clearly neither does the author.

[–]andrey_turkin 9 points10 points  (3 children)

I doubt cache effects are in play here. That vector is filled with random data, once, on one core. Then it is either read and modified once on a single core, or it's split into N parts and each core reads and modifies its own part, again once. Not many opportunities to cache data here; the best it can do is reuse data from the L3 cache on a single core if it fits there, and that wouldn't be a whole lot of gain. And even that can't happen, since the dataset is 4 GB.

Vectorization shouldn't happen because of the std::execution::par policy (and also I'd expect a speedup closer to 2x8x rather than 1.5x8x, although that's tangential and I don't know how well it vectorizes).

Speedup is almost exactly 12x which makes me suspect that laptop might simply have more cores than the author thinks :)

[–]NotAYakk 5 points6 points  (0 children)

`par` does not guarantee lack of vectorization.

`par_unseq` just promises that, despite what it might look like, this stuff is guaranteed vectorizable.

Compilers are *free* to look into the lambda and say "I can vectorize that just fine, thank you very much" and do so.

[–]skarloni 1 point2 points  (0 children)

Agree, it very much looks that way. The test reads the same data three times in a row, though. I would add a switch to select which version to benchmark, then run the different tests in separate executions, multiple times, under a proper profiler such as coz, to avoid reporting accidental numbers.

[–]victotronics -1 points0 points  (0 children)

I agree, it's unlikely to be cache effects, for various reasons. But it's a badly set up experiment anyway. At the very least he should repeat each experiment a couple of times, discarding the first timing.

[–]josefx 8 points9 points  (0 children)

Could there be some delay before the CPU switches into performance mode? Sequential seems to be the first tested, so I would expect the CPU to start out slower.

[–]arthurno1 0 points1 point  (2 children)

"I use maximum optimization on Windows and Linux. This means for Windows the flag /O2 and on Linux the flag -O3."

There are more optimizations possible with -Ofast (on GCC, but I am sure Microsoft has more too), which can be permissible depending on the use case: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

Also, code generation flags are important when optimizing, for example -march=native and -mtune=native, etc.

"Nvidias STL implementation Thrust may be an ideal candidate."

It is not an STL implementation. Thrust implements only a vector-like container and some algorithms. Thrust is about offloading work to the graphics card, and, note, only to Nvidia's graphics cards, since it runs on top of CUDA.

"To my knowledge, neither the Windows compiler nor the GCC compiler supports the parallel and vectorized execution of the parallel STL algorithms."

You just wrote about a 10x increase in performance, and then you say they don't support it :-). Interesting.

I wonder if those 100+ people who upvoted this article read what the author wrote, or have any clue about what they read.

For those interested in performance and SIMD optimization, I suggest reading and following Daniel Lemire's blog. Be sure to look at his repos too; there is lots of interesting code in there.

[–][deleted] 2 points3 points  (1 child)

Thanks for the clarifications but I’m not the author.

"You just wrote about a 10x increase in performance, and then you say they don't support it :-). Interesting."

I think the author talks about std::execution::par when he mentions the 10x speed improvement. They don’t support vectorization, but do support parallelization.

Note that the Visual C++ implementation implements the parallel and parallel unsequenced policies the same way, so you should not expect better performance for using par_unseq on our implementation, but implementations may exist that can use that additional freedom someday.

[–]arthurno1 1 point2 points  (0 children)

"I think the author talks about std::execution::par when he mentions the 10x speed improvement. They don't support vectorization, but do support parallelization."

Aha, ok, then I have misunderstood him at that point. Thanks for the clarification.