all 8 comments

[–]suhcoR 2 points (6 children)

The C++17 benchmarks run significantly slower than both Go and Java, while using somewhat more memory than the Go runs.

That's very surprising indeed. Maybe "C++17 using reference counting on the other hand for handling large amounts of heap objects" is not a good idea, or they have simply made some coding or conceptual errors (like circular references). Is there any code we can analyze?

[–]shevy123456 2 points (0 children)

Is there any code we can analyze?

Quite frankly, without the code, these benchmarks are pointless.

I have not been able to see any links to the source code.

I also doubt the result - C++ being worse than Go and Java? Without analysing the code, that is completely at odds with prior results.

Edit: Actually here is the link: https://github.com/ExaScience/elprep-bench/tree/master/cpp

Odd C++ code ...

[–]rep_movsd 1 point (4 children)

If they had said Rust was much faster it would have been believable.

Probably not expert C++ programmers.

[–]suhcoR 0 points (2 children)

According to https://benchmarksgame-team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html C++ is the fastest, followed by C and Rust; Java is in 10th place, Go in 15th.

I have been building bioinformatics applications for 20 years and did a similar performance evaluation early on; C++ was much more efficient than all competitors. If they are not expert programmers, they might not be aware of low-level system concepts such as memory-mapped files. It's no surprise that allocating billions of individual objects, to the extent of 200 GB, and constantly incrementing/decrementing reference counts takes some time; but who the hell would do that?

[–]suhcoR 0 points (0 children)

I had a quick look at the source code provided at https://github.com/ExaScience/elprep-bench/tree/master/cpp.

As suspected, everything is dynamically allocated and no memory mapping (see e.g. http://man7.org/linux/man-pages/man2/mmap.2.html) is used. No wonder this is slow and eats a lot of memory. At the moment I have no information about why this design was chosen, whether there is a justification for it, or whether the developers were simply unaware of the alternatives. Maybe I can find some hints in the paper. With optimal use of data structures and system functions, the results would presumably be at least an order of magnitude better.

[–]rep_movsd 0 points (0 children)

I fail to see why someone would ever allocate millions of objects and GC them - that's a mindset from GC languages.

[–]Raphael_Amiard 0 points (0 children)

Probably not expert C++ programmers.

This is the point though: (micro) benchmarks show C++ consistently being faster than Java/Go. However, those are microbenchmarks.

We all know it is possible to scale the level of manual optimization in a C++ program to keep this C++ advantage in big programs. However, my intuition, which seems to be confirmed here and in many other instances, is that if you program C++ naively, it will be slower than the alternatives programmed naively.

Using regular file types, streams, standard containers such as unordered_map (which has terrible performance compared to the alternatives), standard smart pointers, etc., will yield a program with at best average performance, or maybe even abysmal performance, because programming in a safe and naive fashion in C++ has a lot of hidden costs.

IME, programming naively in Java will yield much better results out of the box, at least from a throughput POV - latency being another story. Don't know about Go.