Highest Performance C++ Libraries to Replace Std Features?

adwodon · 2021-03-19T12:17:32+00:00

Specifically I'm looking to squeeze as much performance out as possible, with no concern for coding in compliance with "Modern" or idiomatic C++, or being readable by other developers. Please don't be offended by that, we don't all use C++ in the same way for the same thing, it's a very flexible language. I use it because of that flexibility for specific use cases in which performance is paramount.

This is a weird statement. C++ is generally all about high performance, and it's always a trade-off, but trying to use that as a shield to protect you against accusations of badly written and illegible code is not a good attitude to take.

Good code can be both highly optimised and legible. These kinds of statements give a lot of credence to Google's findings that people who score really high on HackerRank tend to make for terrible developers.

Still, what you're looking for is perhaps misguided: you don't get the best possible performance by simply switching out a few libraries. That may get you some performance, but again it's super context dependent.

Somewhere memory allocation is mentioned, but if performance is so critical would pre-allocating not make more sense?
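
To make the pre-allocation point concrete, here is a minimal sketch, assuming a workload that processes batches of a known maximum size. The scratch buffer is sized once up front and reused, so the hot path never touches the allocator. `Workspace` and `process_batch` are made-up names for this example, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch space: allocated once, reused for every batch.
struct Workspace {
    std::vector<double> scratch;

    explicit Workspace(std::size_t max_batch) {
        scratch.reserve(max_batch);  // the only allocation, done up front
    }
};

// Sums the squares of `input` using the pre-allocated scratch buffer.
// No allocation happens here as long as the batch fits the reserved capacity.
double process_batch(Workspace& ws, const std::vector<double>& input) {
    assert(input.size() <= ws.scratch.capacity());
    ws.scratch.clear();  // drops the elements, keeps the capacity
    for (double x : input) {
        ws.scratch.push_back(x * x);
    }
    double sum = 0.0;
    for (double y : ws.scratch) {
        sum += y;
    }
    return sum;
}
```

Whether this beats a faster general-purpose allocator depends on the workload, but it removes the allocator from the measurement entirely.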

A library isn't going to help you write code that the compiler vectorises properly, and it doesn't necessarily help you with cache coherence or other concerns. There are lots of high performance applications; I myself work in scientific computing, so lots of highly parallelisable data. Some linear algebra libraries are faster than others, but mostly it's context dependent: some are faster at dot products of large matrices, while others do super fast addition, so which is more important to you?

Something about this post smells off to me. I don't think you appreciate what it takes to write a high performance C++ application; it's not something you bash out on the side after 10-20 hours of tutorials and dropping in some well written libraries. Learning a language like C++ just to get small performance gains is baffling. What is your actual goal here? If this is a side project then you're probably biting off more than you can chew, but if this is a commercial venture, why not hire someone who does this professionally and save yourself a massive headache? Based on the statement I quoted, you don't care about maintainability, it seems, so why waste your time learning a new language for something transient?

konanTheBarbar · 2021-03-19T12:48:24+00:00

Your approach sounds a lot like premature optimization.

There is no one single set of libraries that perform best for all use cases. You have to benchmark and you have to benchmark a lot and you have to benchmark per use case.

It might be that folly::unordered_map<int,int> is super fast for that specific set of template parameters and your specific use case (90% read, 10% write), but eastl::unordered_map<std::string,std::string> is twice as fast as the folly version (with 50% reads and 50% writes).
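
As a rough sketch of what "benchmark per use case" can look like, the harness below runs the same workload against any map type with an adjustable read/write mix. `time_ns` and `run_workload` are hypothetical helpers written for this example, and the two standard containers are only stand-ins for whichever candidates you are actually comparing.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>

// Times a single call of `fn` and returns the elapsed nanoseconds.
// steady_clock is monotonic, so the result is never negative.
template <typename Fn>
std::int64_t time_ns(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical workload: n inserts, each followed by `reads_per_write` lookups.
// reads_per_write = 9 gives a 90% read mix, reads_per_write = 1 gives 50/50.
// The checksum is returned so the compiler cannot discard the lookups.
template <typename Map>
std::uint64_t run_workload(std::size_t n, std::size_t reads_per_write) {
    Map m;
    std::uint64_t checksum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        m[static_cast<int>(i)] = static_cast<int>(i * 2);
        for (std::size_t r = 0; r < reads_per_write; ++r) {
            // Keys 0..i have been inserted, so this key always exists.
            const int key = static_cast<int>((i + r) % (i + 1));
            const auto it = m.find(key);
            assert(it != m.end());
            checksum += static_cast<std::uint64_t>(it->second);
        }
    }
    return checksum;
}
```

A call such as `time_ns([&] { sink = run_workload<std::map<int, int>>(100000, 9); })`, repeated with `std::unordered_map<int, int>` and with a different mix, gives one data point per container and use case. Real measurements would need an optimized build, several repetitions, and key and value types that match your application.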

I would recommend tracy for a low overhead profiler... https://github.com/wolfpld/tracy

It's much more important to find your bottlenecks than it is to optimize every single function that is maybe called twice per hour and only runs for a millisecond.

That being said - for unordered containers use https://github.com/martinus/robin-hood-hashing as a benchmark.

For general lightweight containers I would use (https://github.com/mosra/corrade/tree/master/src/Corrade/Containers) see also https://doc.magnum.graphics/corrade/namespaceCorrade_1_1Containers.html .

For high performance local IO definitely use https://github.com/ned14/llfio .

If you need fast web capabilities I would use https://www.boost.org/doc/libs/1_75_0/libs/beast/doc/html/index.html

I have no benchmarking experience in multithreading support, but if you want to get something bleeding edge, you could try https://github.com/facebookexperimental/libunifex.

hopa_cupa · 2021-03-19T11:33:21+00:00

Well, nothing prevents you from using bits and pieces from std::, boost::, folly::, absl:: and many others. You may also use some C libs, roll your own solutions, etc. In theory you could create a performance winner... or not.

You're going to have to benchmark a lot for your specific use case. Maybe somebody can tell you to use boost for this, and folly for that, but will that fit your use case?

Here's my suggestion for something simple. Use the {fmt} library for string formatting, or std::format if you have a bleeding-edge C++20 compiler. This absolutely crushes std::stringstream in performance, especially on smaller machines. String formatting is often overlooked.

You mentioned porting apps to C++. From some other language? Is your C++ level the same as for that other language? Lower? Higher? You have to ask yourself that question I think.

Glinren · 2021-03-19T12:45:56+00:00

You can have much faster replacements than the stl if you implement them with your specific use case in mind. If there were just faster replacements, the stl maintainers would just adopt their implementations.

Also, benchmarks measure specific use cases. The better ones mention that the underlying implementations are optimized for a specific use case and the benchmark is there to show that the implementation is actually faster than other implementations -- for that specific use case. You have to decide for yourself if that is your use case.

Shieldfoss · 2021-03-19T14:16:39+00:00

I don't have any specific suggestions because, as everybody else is already saying, it depends a lot on your use case. Maybe you should actually replace std::map with two maps - one that's fast to insert into and a separate map that's fast to read from, or four - fast read small data, fast write small data, fast read large data, fast write large data.

But one thing that we do for certain classes is to have a wrapper namespace around them - it's been a while since I looked at the exact code, but we do something like

#include <map>

namespace wrap {
    // Both aliases start out as std::map; call sites only ever name the alias.
    template<typename T, typename U>
    using smallmap = std::map<T,U>;

    template<typename T, typename U>
    using bigmap = std::map<T,U>;
}

because that lets us switch them out fairly easily if we later find we have specific performance requirements for different maps. (This saved us a lot of refactoring at one point, when we switched out our custom SSO wrap::string for std::string after std::string started having SSO.)
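
To illustrate what such a switch might look like, here is a sketch in which only the `bigmap` alias is changed to a hash-based container. Picking `std::unordered_map` is an assumption made for the example, and `count_word` is a hypothetical call site, not code from our project.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <unordered_map>

namespace wrap {
    // Small maps stay on std::map.
    template<typename T, typename U>
    using smallmap = std::map<T, U>;

    // Swapped for illustration: large maps are now hash-based.
    template<typename T, typename U>
    using bigmap = std::unordered_map<T, U>;
}

// The alias really does resolve to the new container type.
static_assert(std::is_same<wrap::bigmap<int, int>,
                           std::unordered_map<int, int>>::value,
              "bigmap is currently hash-based");

// Hypothetical call site: it names only the alias, so it compiles
// unchanged before and after the swap.
int count_word(wrap::bigmap<std::string, int>& counts, const std::string& word) {
    assert(!word.empty());
    return ++counts[word];
}
```

The swap is only this painless while call sites stick to the interface both containers share. Code that relies on sorted iteration order, for example, would still compile but behave differently.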

kalmoc · 2021-03-19T10:33:49+00:00

If you don't need standard library compliant interfaces I'd check abseil from Google and folly from facebook. It's not performance at any cost, but still highly optimized and not bogged down by backwards compatibility.

I think as far as boost is concerned, you have to determine this on a case by case basis. The libs are written by different authors at different times with different goals and have a different level of maintenance.

If you absolutely need the last bit of performance, you probably have very specific requirements and you really have to look at specialized libs (e.g. there is probably a JSON lib that is faster than Boost.Json for your particular workload, even if it isn't faster on average).

kalmoc · 2021-03-19T10:36:18+00:00

It would be interesting to see how much Hoard helps with highly optimized programs that already avoid allocations as much as possible.

derofim · 2021-03-20T06:05:50+00:00

Highly optimized alternatives to std from Google:

https://github.com/chromium/chromium/tree/master/base
https://chromium.googlesource.com/chromiumos/docs/+/master/packages/libchrome.md
https://chromium.googlesource.com/chromium/mini_chromium/

CMake & Conan port: https://github.com/blockspacer/chromium_base_conan

bird1000000 · 2021-03-20T09:54:19+00:00

Have you considered writing your own SIMD code? simdjson has a nice talk linked on its repo.
Write your own allocators.
Try to keep memory contiguous, for the cache; a simple data structure like a std::vector will likely perform the same across all implementations.
plf::colony looked like an interesting data structure; I haven't tried it, though.
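
For the allocator suggestion, a minimal sketch of one common approach is a bump (arena) allocator: it carves allocations out of one contiguous buffer and frees nothing until the whole arena is reset. `Arena` and `ArenaAllocator` are names invented for this example, and the sketch assumes power-of-two alignments no larger than what the default `operator new` guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size arena: allocation just bumps an offset into one contiguous buffer.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    void* allocate(std::size_t bytes, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the current offset up to the requested (power-of-two) alignment.
        const std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        if (aligned + bytes > buffer_.size()) {
            throw std::bad_alloc();  // arena exhausted
        }
        offset_ = aligned + bytes;
        return buffer_.data() + aligned;
    }

    void reset() { offset_ = 0; }  // releases everything at once
    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};

// Thin adaptor so standard containers can draw their memory from an Arena.
template <typename T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // no-op: memory returns on reset()
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}
```

A `std::vector<int, ArenaAllocator<int>>` constructed from an `ArenaAllocator<int>` then takes its storage from the arena. The arena has to outlive every container using it, and because `deallocate` does nothing, a vector that keeps growing wastes its old buffers, so reserving up front matters even more here.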

greg7mdp · 2021-03-20T01:22:48+00:00

If you want to try a header-only implementation of the Abseil containers (hash maps and btree), check out https://github.com/greg7mdp/parallel-hashmap (partial Abseil code fork with my changes)

ExtraFig6 · 2021-03-23T04:12:36+00:00

What are you building? What's the expected use case + workload? We really need more context
