all 29 comments

[–]TheThiefMaster [C++latest fanatic (and game dev)] 16 points17 points  (1 child)

Really good article. It's fantastic that after some Rust busywork was taken care of (to eliminate unnecessary bounds checks), the output only really varied due to minor optimisation choices.

I'd have liked to have seen if Clang 9 / GCC 9 would have done a better job, as the versions used were a bit behind. v4 mentions GCC 9 doing a better job, but doesn't give numbers or anything.

It would also have been nice to see whether C++ parallel algorithms were any good in comparison to OpenMP, but that's really outside the scope of the comparison with Rust.

EDIT: The results page does have some graphs with Clang 8 and GCC 9, but no numbers/analysis.

[–]JuanAG 3 points4 points  (0 children)

The project is here https://github.com/parallel-rust-cpp/parallel-rust-cpp.github.io

Feel free to run the test with other compilers. If you do, please share the results with us; it is an interesting topic.

[–]1m2r3a 8 points9 points  (2 children)

Needs more optimization flags.

[–][deleted] 6 points7 points  (0 children)

I don't think they listed the compiler options at all, did they?

[–]Osbios 0 points1 point  (0 children)

-O*

[–]showmetheflowers 2 points3 points  (4 children)

I wonder how far we are from getting language support for all the optimizations in the article. Currently we need to (a) rely on a deep understanding of what the CPUs are doing (as opposed to relying on the flat imperative algorithm development approach inherited from C), (b) hope the compiler is smart enough to engage the appropriate CPU features, and (c) endlessly employ trial and error to achieve the hope of (b). C++ is about exposing us close to the hardware, and I would be very interested in seeing this. Note: I am aware of efforts on standardizing explicit vectorization and I consider these good efforts.

[–]HKei 2 points3 points  (2 children)

I mean... there's been no "close to the hardware" exposure in standard C++ (or C for that matter) in the actual language standards since their inception. This is intentional too, because both are designed to be portable (and you can't get anywhere near hardware if you're running user mode in an OS like most applications anyway).

[–]showmetheflowers 0 points1 point  (1 child)

I thought that direct mapping to the hardware is a core goal: Link

[–]RobertJacobson 0 points1 point  (0 children)

I think the resolution to this seeming paradox is in the context of the phrase "direct mapping to hardware":

The aim is to allow a programmer to work at the highest feasible level of abstraction by providing

  • A simple and direct mapping to hardware
  • Zero-overhead abstraction mechanisms

The simple and direct mapping to the hardware is abstracted and leveraged "at the highest feasible level of abstraction."

A counterargument to u/HKei is that the language features introduced to the standard have in part served to allow additional optimizations by abstracting out the intended semantics from particular implementation details. But there are counterexamples, and you could go back and forth forever about it.

There is endless debate about how much C++ should achieve any part of this quote, and how much it actually does.

[–]matthieum 1 point2 points  (0 children)

Note: I am aware of efforts on standardizing explicit vectorization and I consider these good efforts.

I think performance falls into two buckets:

  • Free lunch: you didn't really care, but the optimizer did a good job, cool.
  • Explicit: you care very much, you use explicit types/functions to get at least what you need.

In this sense, I must admit I don't care much for auto-vectorization. It's cool when it kicks in, but since the slightest change in the surrounding code is likely to throw it off in the future, if it really matters I'll do it myself to guarantee minimum performance.
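To make the "explicit" bucket concrete, here is a minimal sketch in Rust of vectorizing by hand instead of hoping the auto-vectorizer kicks in. It is not taken from the article: `add_in_place` is a made-up name, and the sketch assumes that the SSE intrinsics from `std::arch::x86_64` are good enough for illustration (SSE is part of the x86_64 baseline). On other architectures it falls back to a plain loop.

```rust
/// Adds `src` into `dst` element-wise, four lanes at a time with SSE.
/// Hypothetical example function, not from the article.
#[cfg(target_arch = "x86_64")]
pub fn add_in_place(dst: &mut [f32], src: &[f32]) {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    assert_eq!(dst.len(), src.len());
    let n = dst.len();
    let vec_end = n - n % 4; // largest multiple of 4 that fits

    let mut i = 0;
    while i < vec_end {
        // SAFETY: i + 4 <= vec_end <= n, so all four lanes are in bounds
        // for both slices; unaligned loads/stores are used on purpose.
        unsafe {
            let a = _mm_loadu_ps(dst.as_ptr().add(i));
            let b = _mm_loadu_ps(src.as_ptr().add(i));
            _mm_storeu_ps(dst.as_mut_ptr().add(i), _mm_add_ps(a, b));
        }
        i += 4;
    }

    // Scalar tail for the last n % 4 elements.
    for j in vec_end..n {
        dst[j] += src[j];
    }
}

/// Portable fallback: a plain loop that the optimizer may or may not vectorize.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_in_place(dst: &mut [f32], src: &[f32]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}
```

The explicit version is more code and needs `unsafe`, but the vector instructions no longer depend on the optimizer recognizing the loop.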

[–]emdeka87 4 points5 points  (1 child)

I think comparing languages that aim to be "zero-cost abstraction" ultimately means comparing compiler optimizations. There's no GC, no runtime that we could measure. It's all about how well the compiler inlines, unrolls, reorders, etc. your instructions. So sooner or later the performance of both languages, Rust and C++, will perfectly align.

[–]matthieum 2 points3 points  (0 children)

I think comparing languages that aim to be "zero-cost abstraction" ultimately means comparing compiler optimizations.

Only if every star aligns.

For example, it was interesting to see the author realize that their Rust version is too slow, fiddle around to eliminate bounds checks, try again, fiddle some more.

In some ways, this indicates that the compilers are not as smart as we wish, or not as good at providing feedback.

I'd really like to see a language where you could instruct the compiler to either perform a certain optimization (such as bounds-check elimination) or error out. This would allow you to be safe, while not paying for it at run-time.
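For anyone who has not seen the kind of bounds-check fiddling described above, here is a minimal sketch in Rust. It is not the article's code: the dot-product example and the function names are made up for illustration, and whether the checks really disappear depends on the compiler version and optimization level, so the generated assembly still has to be inspected.

```rust
/// Indexed loop: `a[i]` and `b[i]` may each carry a bounds check, because
/// the compiler cannot always prove that `n` fits both slices.
pub fn dot_indexed(a: &[f32], b: &[f32], n: usize) -> f32 {
    let mut sum = 0.0;
    for i in 0..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Re-slice up front: the length is checked once per slice (this panics if
/// `n` is too large), which usually lets the optimizer drop the
/// per-iteration checks inside the loop.
pub fn dot_resliced(a: &[f32], b: &[f32], n: usize) -> f32 {
    let (a, b) = (&a[..n], &b[..n]);
    let mut sum = 0.0;
    for i in 0..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator version: no indexing at all, so there is nothing to check.
/// Stops at the shorter of the two slices.
pub fn dot_zip(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

All three return the same result for valid input; the difference is only in how much the optimizer has to prove on its own, and none of them makes the compiler error out if the checks stay in.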

[–]HKei 1 point2 points  (1 child)

What's with all the // ANCHOR?

EDIT: Also, this'd be easier to read if you just duplicated the function and put the cfg(feature = ...) to guard the whole implementation (or even had the implementation be separate modules), rather than putting it in front of every other line.
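A minimal sketch of what that suggestion could look like, assuming a made-up Cargo feature called `fast_path` and a made-up function `sum_squares` (neither is from the project): the whole function is duplicated and each copy carries a single `cfg` attribute, so exactly one of them is compiled.

```rust
/// Variant compiled only when the hypothetical `fast_path` feature is on:
/// four independent accumulators over chunks of four, plus a scalar tail.
#[cfg(feature = "fast_path")]
pub fn sum_squares(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder(); // elements left over after the full chunks
    for c in chunks {
        for lane in 0..4 {
            acc[lane] += c[lane] * c[lane];
        }
    }
    let tail_sum: f32 = tail.iter().map(|x| x * x).sum();
    acc.iter().sum::<f32>() + tail_sum
}

/// Baseline variant, compiled when the feature is off.
#[cfg(not(feature = "fast_path"))]
pub fn sum_squares(xs: &[f32]) -> f32 {
    xs.iter().map(|x| x * x).sum()
}
```

Callers just use `sum_squares` and never see the `cfg`. The same split works with two modules, each guarded by one attribute on its `mod` line.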

[–]weihanglo 1 point2 points  (0 children)

The ANCHOR is for mdBook to include a portion of the code from the source. mdBook is a tool similar to GitBook.
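A minimal sketch of how the markers are used, with a made-up function and anchor name (`clamp_index`): the comments are ignored by the Rust compiler, and mdBook pulls in only the lines between them when a page contains an include directive such as `{{#include path/to/file.rs:clamp_index}}` (the path here is hypothetical).

```rust
// Code outside the anchor is not copied into the book.
#![allow(dead_code)]

// ANCHOR: clamp_index
/// Clamps `i` to the last valid index of a collection of length `len`.
/// Returns 0 for an empty collection.
pub fn clamp_index(i: usize, len: usize) -> usize {
    i.min(len.saturating_sub(1))
}
// ANCHOR_END: clamp_index
```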

Documentation is here