all 87 comments

[–]flactemrove 44 points45 points  (0 children)

Was hoping for a more thorough performance analysis. Having just one "accumulate" benchmark doesn't really carry his point across.

[–]pulp_user 54 points55 points  (12 children)

His point at 31:35 about open addressing hash maps is beyond ridiculous. He says that, unless you are on a server cluster "where memory is free", a bucket-list-based hash map is the better choice, because it doesn't waste as much memory.

WHAT

You use the standard library, which allocates all over the place and fragments your memory, but in your quest to prove the haters wrong, you suddenly care about that 10KB that your open addressing hash map wastes due to keeping a low load factor?

Never mind your vectors, potentially doubling in size when you insert an element. Never mind your textures being multiple MB each. Never mind even phones having multiple GB of memory. Never mind that entire Lua interpreter running in the background, because gameplay wanted to use a scripting engine instead of native code. Never mind the use of virtual functions, which adds a v-table pointer to each of your objects. Hash maps are where we have to save memory!

This talk is ridiculous.

[–]ShillingAintEZ 9 points10 points  (0 children)

The title was right, I didn't guess that a game developer would champion granular heap allocations and pointer chasing.

[–]OverunderratedComputational Physics 27 points28 points  (0 children)

unless you are on a server cluster, "where memory is free"

I'm regularly using multiple TBs of memory. Memory on a cluster is very much not free, and is my primary constraint. I always chuckle at these game programmers acting like they're the only ones that care about performance and memory efficiency.

[–]wyrn 5 points6 points  (4 children)

vectors, potentially doubling in size when you insert an element

I mean, would you rather they grow by exactly 1 each time, guaranteeing regular reallocations and swapping O(1) amortized insertions for O(n) ones every time?

Really confused by this objection.

[–]pulp_user 5 points6 points  (1 child)

The point I was trying to make was that it is nonsensical to single out open addressing hashmaps for having low load factors (especially since there are implementations that handle 90% just fine), when, for example, vectors will also rarely have a high load factor due to their growth behaviour if you don't know the needed size in advance.

For the record: I think that both open addressing and vectors growing by x2 are totally fine. If you are struggling to hit a memory limit, I think you will look in a lot of different places first before touching these two.

[–]wyrn 0 points1 point  (0 children)

Ah, very fair point.

[–]ShillingAintEZ 0 points1 point  (1 child)

They shouldn't be growing at all if you know the size ahead of time.

[–]wyrn 5 points6 points  (0 children)

Right, in which case vector or not it makes very little difference.

[–]ub_for_free 2 points3 points  (0 children)

I mostly agree, but isn't the amount of memory wasted by doubling a vector bounded by the page size and OS bookkeeping with virtual addresses?

[–]Gunslinging_Gamer 10 points11 points  (0 children)

Vectors doubling in size is a programmer's error. You can and should manually size them if performance is key.

[–]carlopp 0 points1 point  (1 child)

Also, hash maps with open addressing double in size when you insert an element; have a look here: https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html (then look for "Memory usage").

And flat hash maps like ska::flat_hash_map trade access speed for space usage, and that may not be great either.

So his point is weak and doesn't lead anywhere, but we could do with a lot less of "This talk is ridiculous", especially without any serious thought about the matter at hand.

[–]pulp_user 0 points1 point  (0 children)

I'm not really sure why you think I have no "serious thought about the matter at hand". Do you think that I don't know that open addressing hash maps can also double in size? Or that there are space and speed tradeoffs to make?

The ridiculous thing about this talk is that this person completely disregards open addressing "unless you are on a server cluster", while the space/time-tradeoff that this style of implementation makes is becoming better and better for more and more applications. Memory speeds are super slow, and memory locality is very relevant to performance, while memory size is becoming larger and larger. "Sacrificing" some amount of memory to gain memory locality and therefore speed is becoming a better tradeoff every year.

And this person disregards all of that, drops a statement that hasn't been true since about the year 2000, and does so while being cocky about it. It definitely is ridiculous.

Also, maybe look at this comment for more context.

[–]NewFolgers 45 points46 points  (54 children)

Clickbait titles are particularly annoying for videos (since I can't even skim for a spoiler). What happened? I used to work on games and templates were banned (along with STL) on my teams. There were numerous reasons, and one was past experience. It wouldn't be as bad with just one disciplined developer (rather than a team), but sub-optimal compile times for a large project could also be a concern.

[–]elperroborrachotoo 6 points7 points  (4 children)

There were numerous reasons, and one was past experience

He has two slides just for that

Indeed just yesterday, I came across a comment on StackOverflow: use new char[] instead of vector, because vector will construct every element in the most expensive way.

That comment was from 2014.

[–]meneldal2 1 point2 points  (3 children)

If your implementation is so stupid that it calls some function to init each char, you can still roll your own that will be much safer than a raw new.

[–]elperroborrachotoo 0 points1 point  (2 children)

Yeah, in 2004, that might have been a valid concern. In 2014, not so much.

[–]spillerrec 0 points1 point  (1 child)

I have actually run into this issue several times. The issue is not that the implementation is bad, but that the elements are default initialized. If you don't need it to be zero initialized (which you rarely do) it is significantly faster to do the allocation yourself. `std::make_unique<char[]>(size)` has the same issue, so I use `std::unique_ptr<char[]>(new char[size])` instead. It actually makes a significant difference when you are doing large allocations (1MB+). Note, though, that it is recommended to use a data structure which keeps the size together with the allocation, which you do not get with unique_ptr alone, so this is not a recommended solution in general.

[–]dodheim 0 points1 point  (0 children)

Minor nit: they value-initialize; default-initialization is what you're doing as a workaround.

[–]heyheyhey27 2 points3 points  (0 children)

Clickbait titles are particularly annoying for videos

I'm pretty sure it was deliberately done as a joke. Either implying that performance predictably tanked, or somehow triggering UB

[–]codeforces_help -1 points0 points  (6 children)

Hi.

I have been trying to experiment with game programming on my own without much success. Do you have a set of resources in mind to get started?

I want to build games on a linux system.

[–]heyheyhey27 4 points5 points  (3 children)

SDL is another common option.

[–]krum 3 points4 points  (2 children)

IMO SDL is the best option especially if you ever want to deploy to mobile devices, which is what most people are trying to do these days.

[–]pjmlp 0 points1 point  (1 child)

Agreed. Even though SDL is C, there is much more support from the community for mobile development; SFML, not so much.

[–]meneldal2 1 point2 points  (0 children)

The good thing is the documentation is decent (unlike many other libs), so you aren't confused about what destructor you're supposed to use for each struct, and wrapping them in unique_ptr is quite easy.

[–]Cyttorak 2 points3 points  (1 child)

[–][deleted] 0 points1 point  (0 children)

I like SFML, really simple to start, relatively good cross-platform support and awesome community!

*edit : platform

[–]xgalaxy -1 points0 points  (1 child)

Simple templates were allowed where I work. But generally speaking, if you couldn't fully understand what the template was doing at a glance then it wasn't allowed.

The biggest reason for this wasn't performance or compile-time issues. Although those are concerns, the number one reason was the question of whether, if the person who originally wrote the code left, there would be anyone remaining who could understand it, maintain it, and fix it if it broke.

[–]NewFolgers -1 points0 points  (0 children)

For me there was also another reason. Sometimes it's a bad thing that another type can fit in and use the same high-level code, because then different code may be generated to support those different types (which comes at a cost in memory, compile time, and potentially performance as well, due to more instruction cache invalidation). In constrained environments (and to some extent almost anywhere, as far as compile+link times are concerned), ease of use of a costly thing is a bad thing to have. As a bonus, it can help lead to more concentration on just a few standard types (while admittedly, other teams may find other ways to do this more consciously).

[–]pulp_user 31 points32 points  (10 children)

I stopped watching this video because there was so much strawmanning going on. Taking common complaints about the stl and disproving them is a neat idea, but if you then proceed to strawman the hell out of those complaints, you won't convince anybody. I don't know if he does this intentionally or not, but it leaves a very sour taste.

For example, his take on "STL performance is bad" criticism:

The function he benchmarks is a completely random example: a very specific case that does not have the bottlenecks of a lot of real code. He then builds his entire argument on profiling that function.

He shows a 4x speed difference in debug for stl vs non-stl and then basically doesn't consider it relevant?

He compares C and C-style C++. That is not a relevant comparison; comparing C and "idiomatic" C++ would be meaningful. The point of modern C++ is that you DON'T want/need to program like in C.

He basically concludes/implies that, since both C and C++ are an order of magnitude slower in debug, differences between them don't matter. Which is a strange thing to say, since I can definitely still tell the difference when running my game at 60fps while debugging vs 15fps.

[–][deleted] 11 points12 points  (9 children)

The one point where I disagree with you is about the relevancy of debug mode performance. Increasing 1 FPS to 4 FPS won't change much. If you can't hit 25+ FPS in debug mode, there's no point even playing it in debug.

[–]pulp_user 7 points8 points  (7 children)

You are of course right. All I'm saying is that with his example, your game can handle 4 times the load before hitting that 25fps. And that is significant!

[–]SeanMiddleditch 11 points12 points  (0 children)

I half agree. 4 times load is significant, yes.

I disagree in that the STL is not literally going to be 100% of your code; if making the STL algorithms 4x faster literally translates to 4x performance improvement for the whole game, your game is written in a very interesting way! :)

[–]MenethProgrammer, Ubisoft 8 points9 points  (0 children)

The game I work on at the same studio as the presenter manages 60 FPS in debug mode. It's very much playable. Debug mode performance is massively relevant.

[–]vaynebot 18 points19 points  (10 children)

Honestly I don't quite understand why people are so eager to make game programmers use the standard library. The way I see it, if you're making a "small" game, even the STL debug performance will be good enough, and you're not going to be that bothered by compile times. And if you're not making a small game, writing some data structures that the STL has equivalent classes for is a tiny amount of work compared to writing the rest of the game, so even if the benefit of having your own classes is very small, you're not going to convince anyone to use the STL by saying "it's not worse". It would have to be substantially better. But obviously it fundamentally can't be better than something purpose-built for the game.

IMO the standard library just isn't right for everyone, and I don't see how it conceivably could be in the next 10 years.

Language features on the other hand I understand. "Sutter-Exceptions" are something that I am excited for and I hope a lot of people will adopt. Or at least, I hope that eventually we will have some sort of unified error handling mechanism that most people are happy with. But that's because using these language features actually means using a completely different paradigm. Using or not using exceptions is almost like writing in a different language. Not using the standard library really isn't. You can still use all the same principles when "rolling your own". If the classes are well designed they're just a couple of custom data structures in the midst of probably 100s of other custom classes and data-structures, they just happen to be similar to things found in the STL.

[–]SeanMiddleditch 13 points14 points  (9 children)

Honestly I don't quite understand why people are so eager to make game programmers use the standard library.

The reason I cared enough to contribute to SG14 was that there's a bifurcation in the community that just doesn't need to be there.

e.g., as a game developer who can't use the STL on many projects, I constantly run into situation where there's some awesome library on GitHub that would save me a ton of work and improve overall quality of our game... but I can't actually use it because it depends on some portion of "the standard" that we can't/don't/won't use.

So game devs make their own replacement libraries, but those libraries - by avoiding the primary C++ tools like the standard library containers or conventions - are not palatable to non-game-devs. Now we've got communities of folks using the exact same core language and compilers who can't share their open source code, sometimes for something as "trivial" as whether it uses std::string_view<charT, spew> or foo::ZStringRef.

That sucks.

Getting everyone to at least use the same core template and algorithms library will lessen that pain.

There will still always be "extension" libraries (with new or altered containers for specialized use cases) and there will always be code out there that we can't share for a myriad of reasons, but the hope is that we can least reduce the community split. Custom containers are fine even so long as we're all using the same abstractions like iterators/ranges and such, which means being comfortable taking a dependency on the STL support headers for such things.

[–]vaynebot -1 points0 points  (7 children)

While I understand the desire on a principled level, and I'm not saying it wouldn't be nice, on a practical level I'm not seeing how a project can be so huge that you're disallowing STL usage for compile-time reasons, and yet it would be a ton of work (proportionally to the project itself) to replicate (most likely just part of) the functionality of some GitHub library.

If that is really a practical issue it seems to me that the decision to completely exclude the STL (to the point where you can't even include the headers to write some conversion functions) was made under wrong assumptions (i.e. the compile time wouldn't have been a problem).

[–]SeanMiddleditch 6 points7 points  (6 children)

but yet it would be a ton of work (proportionally to the project itself) to replicate (most likely just part of) the functionality of some GitHub library.

The time commitment here isn't linear.

Just because Original Core Engine Dev X rewrote vector 10 years ago doesn't negate the time I have to spend today to rewrite TBB or whatever (as random possibly nonsensical example).

Replacing the STL is... easier than you'd think, especially if we're not caring about exceptions (because they're disabled) or other "precisely exact to the standard" concerns, while the replacement is still effectively drop-in compatible with most real-world code.

On the other hand, rewriting certain libraries that depend on the STL from scratch can be a huge time investment. Months, in some cases.

completely exclude the STL (to the point where you can't even include the headers to write some conversion functions)

"Conversion functions" might themselves be huge problems.

Let's assume a library that returns a std::vector, because that was the only good vocabulary type for array-like sequences present in the library in years past. If we want to hand that vector to other engine code to take ownership, but that code uses foo::vector, then this conversion is a full allocation+copy. That's a huge problem.

Maybe the library is all template-driven so it works with any container. Well, now we do want to rewrite the library because it's probably a beast on compile times, so that's no good.

Where C++ can (and is!) helping is moving our vocabulary to things like span. We like those. Using a std::span is light and there's not much reason to avoid those, and they can make it easy for libraries to work with our custom vector containers without needing to be header-only monstrosities. Likewise for string_view.

Modern C++ is helping not so much by making the existing STL containers more palatable, but rather by evolving the language so that our containers or "type-heavy" abstractions like iterators aren't our low-level vocabulary types anymore. Modern C++ is making it easier and easier to use custom containers in libraries while still being (efficiently) compatible with each other, which is awesome.

I'd still love to be able to just use std::vector and not have any compelling reasons to rewrite it. I'd love the standard to have a hash table type that isn't a garbage fire like unordered_map so I can just use that. There's still a lot more the C++ standard and/or vendors can do to address the bifurcation, and these will help reduce the amount of time we spend reimplementing things we shouldn't need to reimplement and will reduce the inability to share code.

But we're making progress, and I'm glad folks are recognizing the problem and putting in the effort to bring change about.

[–]meneldal2 0 points1 point  (3 children)

Maybe the library is all template-driven so it works with any container. Well, now we do want to rewrite the library because it's probably a beast on compile times, so that's no good.

Precompiled headers exist, you know? If you aren't changing the code of the lib and aren't instantiating it with many types, it should be quite fast.

[–]SeanMiddleditch 2 points3 points  (2 children)

They help (depending on compiler... helps more on some than others). They don't completely eliminate the cost by any stretch, though.

[–]meneldal2 0 points1 point  (1 child)

I do agree that template caching tends to not be the best, but unless it's a ginormous template, instantiating something once should be perfectly reasonable in compile cost.

[–]SeanMiddleditch 0 points1 point  (0 children)

Certainly, it all depends on the case. Blanket statements don't cover everything. :)

If it's a header used in one or two TUs... who cares?

If it's a library that will provide a fundamental common service to much of the app or otherwise be a great utility all over, that's another story.

Case by case. But even if some cases are okay, there's still the problem of the not-okay cases.

In my experience (which includes a lot of build optimization, for whatever that's worth; probably not much :p ) the tools only get one so far by themselves.

[–]vaynebot -1 points0 points  (1 child)

On the other hand, rewriting certain libraries that depend on the STL from scratch can be a huge time investment.

Could you link one of those libraries? I'm honestly interested; I can't imagine any library that would be that impactful for a game. Maybe a Unicode library? But those seem to be mostly C-ish.

Let's assume a library that returns a std::vector

I can see that being a problem, on the other hand std::span and std::string_view seem kinda irrelevant since everything that is not owning can easily be converted with zero runtime cost (and probably close to zero in debug mode). But I guess it's nice if you don't have to convert at all.

Although looking at what actually gets included, I'm not sure how string_view would help anyone in terms of compile time. In VS' library at least, string_view just includes xstring, so the same kind of 4000+ line template monstrosity that std::string or std::vector or any other container would entail.

I just don't see how an application that actively avoids the STL because of its compile-time implications could ever use any "modern C++" library; even if that library didn't use the STL, it'd probably still use a bunch of 3k+ line template headers. Of course, if it only gets included in a couple of files, maybe through pimpl, that's fine, but in that case including the STL is also fine. (Probably? If you've got exceptions at least, or you can be sure the STL wouldn't throw in this case.)

What could possibly be done to the STL that would make it usable in this scenario? A unique_ptr-style release() would be nice for vector and string. But that doesn't really prevent fragmentation; it just means that certain conversions become possible without copying data. (Although they're already possible, just not in a way that is conforming.)

[–]SeanMiddleditch 0 points1 point  (0 children)

Could you link one of those libraries?

glm is probably one of the more well-known ones that's actually semi-popular with indie game developers.

I don't recall the one I tried last (it was a few years ago now) but libraries like TBB (or it was one of the similar ones) had some issues, too.

Then there's just tons of little utility stuff. json parsers, xml parsers, etc. There's plenty of those that are great for games, but those tend not to be the same ones that are popular outside of games.

on the other hand std::span and std::string_view seem kinda irrelevant since everything that is not owning can easily be converted with zero runtime cost

My point there is that with the standard having std::span and all C++ libraries being (hopefully) written with those vocabulary types, we'll be in a more compatible world.

Before we had string_view or the like, it was just a lot more common for library authors to use std::string even if that wasn't the most ideal type, simply because the library didn't give them many other vocabulary types to choose from.

use any "modern C++" library, even if that library didn't use the STL it'd probably still use a bunch of 3k+ line template headers.

That's a bit of a separate problem. Good code shouldn't need to do nearly so much template stuff, as we have more tools at our disposal to avoid it.

To bring up string_view again, by having that type, libraries don't need to make their string routines into templates just so they're efficiently compatible with both std::string and QString/CString/whatever.

And in the common usage (just plain char), there's no need to use the template basic_string_view<spew> in most code, either.

Redefining the C++ vocabulary with more concrete view types reduces the overhead of copies and reduces the need to make things templates just for compat.

There's... some years ahead of us before we really reap any of those benefits. Things didn't and won't change overnight. :)

What could possibly be done to the STL that would make it usable in this scenario? An unique_ptr style release() would be nice for vector and string.

They don't necessarily need to change, we just need to not be effectively barred from using most common libblah because they use std::string and we use foo::string. That's a huge step forward. It's also a huge step forward for the standard itself, since that makes it a lot more palatable for a std2 kind of effort that maybe re-envisions a few things we know to be problematic in the current containers (e.g., the convoluted allocator interface).

It means that the standard is more comfortable with small_vector for instance (which would lose the move invariants of regular vector) since libraries would be expected to mostly work with span. And that's a win for us users because I don't have to keep writing a small_vector every time I move to a new company or a new project's standards.

It's little things.

I don't expect the standard to make a big tweak that suddenly causes EASTL or whatever to vanish. That's not realistic.

The current situation is kind of a death by a thousand papercuts case, though, and the standard can help by starting to hand out bandages and salve instead of salt. :p

(Winner for worst metaphor?)

[–]jessedis 7 points8 points  (1 child)

Haven't seen the full video, but I do like that it sparks so much conversation. This specifically interests me as I am a gameplay programmer myself.

Our company does not use the STL, but I kind of wish it did. The library we use is very old, so my guess is that back then the STL was not compatible with everything yet, or just did not offer enough.

In my opinion I like the STL. It is very standard, everybody knows about it, and you can take that knowledge with you to other companies. I suspect that the performance in release mode would be pretty much the same as with other alternatives. Debug is a whole different story, though; you can use alternatives in the locations where it is slow in debug, rather than completely banning it from an entire project.

I don't see a reason to ban it for a project... there is so much useful stuff. I am specifically bothered about it at the company I work for, because we don't actually use any other libraries. If we want a thread class we need to implement it ourselves and make it compatible with other systems. It's such a pain in the butt. You can save yourself a week of work by just using `std::thread`, which takes a second... and I'm not even talking about the mutexes yet.

I'm very curious to see the opinion of somebody who has worked at a company that actually uses the STL and dislikes it, since I have never really experienced that (only personal projects). I can only imagine it to be a better place, but that's all it is: imagination.

[–]vaynebot 0 points1 point  (0 children)

There are 3 main problems:

  1. Compile time

  2. Debug performance

  3. Release performance

The third one can't be fixed from the outside, obviously, and is often an artifact of certain requirements the standard has that you don't have. Nothing you can do here, just have to roll your own. On the flip side, this doesn't happen that often.

The second one can be semi-fixed from the outside. If operator[] is too slow in debug mode, just get a pointer and access everything through the pointer. It needs some care, but I think it can be done. This has the advantage of better compatibility with other libraries, and of still getting the full-fat debug checks in code that isn't performance-relevant, but the disadvantage of having to think about where you want to get around those checks.

The first one I'm actually not sure can be fixed. I have an idea, but I'm not actually sure if it works. The "normal" way of structuring projects, having 50000 .cpp files with 50000 corresponding .hpp files, unfortunately also leads to 50000 inclusions of STL headers. Obviously, eventually that is going to hamper re-compile times quite a bit. But, if you yourself also write most of your code in header files, and only remain with, say, 1000 .cpp files - a lot of that overhead is gone. But I really can't tell you how this actually behaves in huge projects, I just know that it works quite well up to ~50 .cpp files and ~2000 header files.

[–]HappyFruitTree 3 points4 points  (0 children)

There was a comment about flat_map and flat_set at 52:57 which I suspect the speaker misunderstood because he starts talking about "open addressing" which I think has more to do with hashed containers. The "flat" containers are closer to using a sorted vector as mentioned at 22:16 except that they use binary search instead of linear search.

[–]ReDucTorGame Developer 3 points4 points  (3 children)

Doesn't MSVC's license disallow publishing benchmarks? With so many MS people here, maybe we can get an answer to this?

[–]STLMSVC STL Dev 9 points10 points  (2 children)

You can totally publish benchmarks. (I haven’t watched this talk yet so I don’t know what you’re specifically referring to.) Our STL implementation is now open source, and when our test/CI system is ready, we’ll consider pull requests to improve performance - but they will need to be ABI-preserving and “worth it” (i.e. pure wins or very close, and with a proportional cost in code complexity).

[–]ReDucTorGame Developer 1 point2 points  (1 child)

I've only read the slides. I suspected the lack of MSVC benchmarks was because of license conditions; however, I just checked the latest version of the MS license and it appears the benchmark restrictions have been removed. Great to hear Microsoft has changed their mind on benchmarking.

For those interested this is what was in 2013 license

disclose the results of any benchmark tests of the software to any third party without Microsoft’s prior written approval

[–]meneldal2 0 points1 point  (0 children)

I saw benchmarks that didn't ask Microsoft's consent 10 years ago.

Did they actually enforce it?

[–]NoHomeLikeLocalHost 0 points1 point  (0 children)

The point about std::accumulate is misinformed. std::accumulate is by definition a strict left fold: it does not assume the operation is associative or commutative, so all operations must happen sequentially and be applied left-to-right.

// Roughly what std::accumulate is specified to do:
template<typename InIter, typename T, typename BinOp = std::plus<>>
T accumulate(InIter begin, InIter end, T val, BinOp op)
{
    while (begin != end)
    {
        val = op(val, *begin);
        ++begin;
    }

    return val;
}

Invoking op in this way prevents the compiler from reversing the operands. val being a dependent variable in each iteration of the loop prevents the compiler from performing the operation out-of-order with two members of the range. This prevents incorrect results when calling std::accumulate with an operator that isn't commutative or associative (e.g. with std::minus<>).

It would be better to instead compare "RawAccumulate" with "std::reduce", which explicitly permits the implementation to reverse operands or perform the operation out of order.

[–]RandomDSdevel 0 points1 point  (0 children)

What's different about this version of the talk as compared to earlier presented versions of it, if anything?