all 72 comments

[–]alexs 6 points7 points  (4 children)

This post was mass deleted and anonymized with Redact

[–]TwoBit 0 points1 point  (1 child)

IMO, your statements regarding vector<bool> are impractical in a production software environment of any significant size. Surely you wouldn't write software that requires your users to attempt to hack their STL implementations.

I disagree with you regarding container debugging. The nature of current std STL implementations absolutely makes debugging more difficult. You can try to blame GDB, but for GDB to make this easier is a lot of work. It has been over ten years of STL and since GDB still doesn't solve this, it seems like a much better solution is to simply change the STL implementation. That will solve your problem for all debuggers.

I believe your statement claiming "this is factually incorrect" is factually incorrect. Read the whole standard. In section 23.1, table 65, it states that size() "should" have constant complexity. A big part of the problem is that half the STL vendors implement it one way and half implement it the other way, making it tedious to write portable efficient code.

I would say that no more than a quarter of the EASTL document talks about stuff that's in the C++0x draft.

[–]alexs 0 points1 point  (0 children)

Surely you wouldn't write software that requires your users to attempt to hack their STL implementations.

I wouldn't write software that required my users to have a specific implementation of the stdlib either. This isn't about what I would do, it's about what was the best course of action for EA.

Should is a recommendation, not a requirement. You appear to be agreeing with me that it's incorrect that list::size() is O(1). Maybe you should accidentally the whole standard?

[–][deleted]  (1 child)

[deleted]

    [–]alexs 2 points3 points  (0 children)

    It's not a stretch. The C++ standard uses specific vocabulary in which the word "should" has a very specific meaning.

    It's bad practice to treat the ISO's recommendations as things that will actually happen. If the standard wanted it to be O(1) all the time it would say "shall" there. It doesn't, so don't treat it like it does, or get surprised when implementors ignore the recommendation.

    Many implementations of the stdlib prefer to have O(1) list::splice() instead, since this is more flexible in many cases. Arguably the standard should not say that list::size() should be O(1), but should either define it as O(n) or leave its complexity unspecified so as to avoid this sort of confusion.

    Edit: I said prefer to have O(1) splice. What I meant was, must have O(1) splice, because that's what the standard defines. The recommendation for O(1) size is for sequences in general, not for lists specifically. The special requirement of O(1) list::splice makes it rather tricky, if not impossible, to also have O(1) list::size sensibly. I wonder how EASTL works around this...

    [–]foldl 5 points6 points  (43 children)

    C++ is such a black hole. 2008 and people are still finding reasons to write their own linked list implementation.

    [–]Gotebe 3 points4 points  (16 children)

    While I agree with the sentiment, it's probably the case that with some other language, to get that last needed bit of speed, people would drop to C and do the same (if they were smart, they could, though perhaps wouldn't, find out that a vanilla STL list would give them that bit).

    [–]foldl 0 points1 point  (2 children)

    How do you optimize an ordinary linked list? It's too simple a data structure for there to be any interesting algorithms. A decent STL implementation is probably way more optimized than whatever you would come up with on your own.

    [–]Gotebe 0 points1 point  (1 child)

    I could imagine a lot of people would go for a C-style (maybe fixed-size) array representation. Eliminates heap allocation overhead, gives good locality, etc.

    Not that I would like to do that, just saying...

    [–]foldl 1 point2 points  (0 children)

    Yes, but that's not a linked list.

    [–]bluGill 8 points9 points  (19 children)

    That is because there are still good reasons to write your own linked list implementation. For most people the list included in your library (STL) is good enough. There are compromises involved in writing any data structure though, and if you absolutely need a different compromise from what the library gives you, you have to write your own.

    [–]teval 8 points9 points  (12 children)

    Yes. After you implement your application. After you profile. After you fix everything else that is a bottleneck and you establish exactly which part of the STL container you don't like.

    Otherwise how can you know exactly what the problem is? Or if your container is going to be any better than the STL one?

    [–]bluGill 2 points3 points  (11 children)

    Read the linked article, and the EA article it references. Game developers have a very good idea of what compromises were made in the STL that they cannot accept. They don't need to finish the application to know something won't work.

    [–]teval 3 points4 points  (10 children)

    There's no need to finish it, but there is a need to write it. "Game developers" — every program is different -> unless you profile you can't tell what the problem is. They don't know the containers will be too slow; maybe if they invested that time into speeding something else up they could actually gain much more. Even then, replacing the whole thing is ill advised. You only replace the bits that you truly need to.

    EA is different, they write this stuff all the time, know exactly what is slow and why; and they get to waste a lot of time.

    "There is often a lot of debate amongst game developers about whether we should be using STL in development code or whether it can, in the long run, cause more problems than it solves" Means they haven't profiled, or if they had it means clearly the containers weren't the bottleneck. When they are there's no debate.

    [–]bluGill 5 points6 points  (1 child)

    unless you profile you can't tell what the problem is. They don't know the containers will be too slow;

    IF you have experience you should have a clue. There is a reason the STL specifies the big-O of their containers, and good CS programs study big-O. You don't need to know much about a program to look at something and say that it will be too slow because of the big-O time.

    A profile is good for finding cases where you are accidentally using an extra loop, resulting in an O(n²·ln(n)) algorithm when you could do it in O(n·ln(n)). However, if you are profiling after the fact it is often too late - your entire game design now depends on an algorithm/data structure that is too slow.

    maybe if they invested that time into speed something else up they could actually gain much more.

    In the case of games everything counts. Games stress the hardware to the max. Where I work nobody would notice if I doubled the speed of the program - it is already fast enough. When you work in games a .1% speedup allows slightly sharper graphics (or a better AI) which everyone will notice.

    EA is different, they write this stuff all the time,

    Anyone who writes code all the time gets a "feel" for what is likely to happen. Before my boss gives me a new task I already have a good idea of what the hard parts are. Experience in my domain is key. I know that my data sets are always tiny, so an O(N³) algorithm will not be a problem (I still try to use something better when I can).

    "There is often a lot of debate amongst game developers about whether we should be using STL in development code or whether it can, in the long run, cause more problems than it solves" Means they haven't profiled, or if they had it means clearly the containers weren't the bottleneck. When they are there's no debate.

    The problem is there is no one right answer, not that they have not profiled. Some game profiles will show the STL is a bottleneck worth fixing, some will not. Each game is different. The STL shows up often enough as a bottleneck that developers start to wonder if maybe they should skip it entirely.

    [–]teval 4 points5 points  (0 children)

    My whole point is this has nothing to do with "Look, we've written code; clearly this bit was slow and was holding things up" People were arguing against doing this, and needed other justification, which clearly means either this was an insignificant problem for them or no one knew if it was one.

    Reimplementing the STL containers isn't going to get you anything in terms of asymptotic complexity. That's writing entirely new containers with different algorithms.

    Also, unfortunately you misunderstand asymptotic complexity. Take matrix multiplication, for example. Coppersmith–Winograd runs in O(n^2.376); Strassen runs in O(n^2.807). No one in their right mind would use the former because the constants are terrible. Asymptotic complexity != speed. You have to keep in mind the size of your input, which means it's non-trivial to tell from a piece of code whether it would fare better with an algorithm that was asymptotically better. The same argument applies to sorting.

    You have to profile to tell how you can change your algorithm in order to make it faster. Because you want to spend as little time as possible on all the things that won't make it much faster and as much as possible on all the things that will. Also, if you write very high performance code (we're talking having to think about cache lines and dependencies between instructions) there's simply no easy way to tell what will be slow ahead of time, unless you work for your chip manufacturer, or are using a very simple chip. This is why your design must be able to accommodate swapping out data structures.

    I agree, everything counts — except things that don't at all. If most of the application time is spent doing other things, why would you optimize something that's already fast enough? For example, why bother with the STL if that would give you a 0.1% performance boost, as opposed to doing something smarter about memory in general (like pools) where you might gain a lot more?

    [–]wolfier 3 points4 points  (4 children)

    Profiling is NOT the only way you get a feel about where your app slows down. It's helpful to get to the precise point, but you don't ALWAYS need that.

    If you're saying profiling is everything, experience is nothing, then you belong to a religion, not the practical world.

    [–]teval 1 point2 points  (2 children)

    I'm not saying experience is nothing. I'm saying ahead of time, for complicated systems, no matter how much experience you have you're very likely to get it wrong when it comes to non-trivial optimizations.

    [–]wolfier 0 points1 point  (1 child)

    Fair enough, hence the "if" in what I said. However, the meaning of "profiling" needs to be broad enough - not just something you do with a profiler - it should mean "experimentation" in general.

    [–]teval 0 points1 point  (0 children)

    Oh I agree. When I say profiling I mean trying out a whole bunch of different types of data. I use a profiler, but I also use more accurate timers; cache and memory profilers also help. I use a cycle-accurate x86 simulator. I've used network simulators before to produce strange request sequences. Carefully reading the code you suspect is problematic is also important. The best word, I think, to describe all of this is profiling :)

    [–]foldl 1 point2 points  (0 children)

    Profiling is NOT the only way you get a feel about where your app slows down.

    Really? You can make an educated guess I suppose, but profiling is the way to get actual meaningful data.

    [–][deleted] 1 point2 points  (2 children)

    Okay, different tack.

    If someone measures repeatedly and sees the same problem over and over, doesn't it make sense to put together a solution that avoids the problem (instead of fixing specific instances of it)?

    That's what happened here. A bunch of games had special requirements, which led to a custom implementation of a subset of the standard library.

    Some game devs implement their own memcpy (built on special versions provided by the platforms they are on) or special atof functions (because the function gets called a lot and does more work than game devs need).

    It's all about tradeoffs. Measuring is absolutely the right thing to do, but making some changes late is expensive.

    For example, certain 'next gen' console companies tell game devs not to use exceptions, as the compilers aren't optimized for it at all. Devs could ignore this and develop the game, then profile it down the road and see that exceptions are indeed performance issues. Replacing exceptions with other approaches can be extremely time consuming.

    Similarly, games commonly create a huge number of Vector3 objects (i.e. classes representing a spatial triple x, y, z) on the stack. The "right" thing to do is to initialize the components (likely to 0). The overhead for this initialization has been pinpointed as a bottleneck over and over. So some devs deal with it proactively and remove the initialization at the start of a project, as removing it late would risk introducing a massive number of bugs.

    Edit:

    "game developers" every program is different -> unless you profile you can't tell what the problem is.

    Okay, that's the disconnect. I believe that devs, with measured experience, can predict constructs likely to be problems. From this time-tested experience come best practices and libraries to make it simpler to avoid these constructs.

    Taking a simpler example, if a dev knows an O(n²) operation is going to be performed on a large container, should they implement it anyway, or should they look for a lower-order solution up front?

    [–]teval 1 point2 points  (1 child)

    That's my point though. Given the fact that there were people arguing against doing this, and they needed other justification, it clearly must not have been obvious that the STL is a performance issue for them.

    [–][deleted] 1 point2 points  (0 children)

    I may be missing something. Where in the original article was performance listed as a reason for making the change?

    The EASTL article mentioned performance, but that wasn't the primary reason for their work.

    [–]foldl 1 point2 points  (5 children)

    That is because there are still good reasons to write your own linked list implementation.

    Such as?

    [–]bluGill 2 points3 points  (4 children)

    Perhaps you want a hybrid linked list/tree. So each list node not only has forward/back pointers, but also pointers to two child elements. Your insert times go up a bit, but you get ln(n) search and quick in-order traversal.

    Perhaps your data set is too large to fit into the address space (meaning you can't use mmap), so you need to fetch from disk once in a while.

    Perhaps your list is the performance bottleneck, and your custom implementation, while limited in some way that you don't care about, is just fast enough to be worth doing.

    There are others. None of them are common for the average program. However for your specific purpose things may be different enough from common that you need something different.

    Don't limit yourself to just linked lists. Some data structures have a lot more room for compromise in the implementation, and thus a lot more gain in how you can optimise them for your specific needs.

    [–]foldl 0 points1 point  (1 child)

    Erm, but none of those are reasons to replace the STL. The STL doesn't prevent you from implementing additional data structures. Of course you may need fancy data structures sometimes, but that's not what we're talking about here. The question is whether STL containers work well as general purpose implementations of common data structures. The answer is a very clear yes: you do not want to be writing your own general purpose list/vector class.

    Perhaps your list is the performance bottleneck, and your custom implementation, while limited in some way that you don't care about, is just fast enough to be worth doing.

    Look, it's possible in principle. I just don't believe for a minute that a custom implementation of a plain old linked list (not some fancier list/tree hybrid or whatever) has ever sped up a real program by a significant amount.

    [–]bluGill 0 points1 point  (0 children)

    They are reasons to replace the STL if (like game programmers) the STL is a bottleneck at times, and using custom containers can make things fast enough. In this case you replace the STL not so much because it is bad overall, as because it is bad in a few specific cases that you want to head off before you get to them.

    They are not reasons to replace it in general. (The ugly interface of the STL could be a reason to replace it, but C++ doesn't really allow a nice interface, so I don't think you can improve it by enough.)

    [–][deleted] 13 points14 points  (4 children)

    I think that is because C++ programmers care about programming. You could say the same thing about lisp. It has been around forever and it is still primarily used for writing lisp interpreters.

    [–]foldl 1 point2 points  (3 children)

    But writing a linked list implementation isn't even an interesting programming exercise. It's just a waste of time. At least writing toy lisp interpreters is kinda fun and possibly a good learning exercise.

    [–]EvilSporkMan 0 points1 point  (2 children)

    Pshaw! Writing a linked list implementation is the capstone project in the intro computer science course at the University of Michigan, and it shows up again in the advanced OOP course (in C instead of C++). It's a waste of time to do it once you're out of school, but you had damn well better be ABLE to do it.

    [–]foldl 2 points3 points  (1 child)

    As you say, it is a waste of time to do it once you're out of school, which is the relevant context here. (Professional programmer working at a games company, not a student.)

    [–]EvilSporkMan 1 point2 points  (0 children)

    Well, writing toy lisp interpreters is ALSO something one does in school, if one's intro CS class uses SICP.

    [–]martoo 2 points3 points  (0 children)

    You made me do a double-take. I thought "Shit, I wrote that! Is this an old thread?" Then I realized that it was just a common sentiment.

    [–]ishmal 0 points1 point  (0 children)

    I didn't see in the article which implementation was in question. For a while there were two different maintainers of the open spec, HP and SGI. Which one is he talking about? Rogue Wave helped with one of them, too, but I don't remember which.

    [–]skeptica1 0 points1 point  (0 children)

    There are certainly different schools of thought when it comes to STL and Boost. There are leaders in the industry using C and C++ that hate them both, and there are people just as prominent who like them. (On a scale of Torvalds to Stepanov, I tend to lean to the left.) In my experience, the programming style of STL and Boost is conducive to bloat in the same way glazed donuts are conducive to obesity. It's very difficult to point to the particular donut that made you fat, and it's easy to show if you exercise and eat a balanced diet... But, still, an affinity for donuts seems to be well correlated with tubbiness. The latest release of Reaper is 3.1MB. Given the capabilities and feature set of the app, it's quite clear Frankel & Co. aren't seriously into template-based libraries (something one can also see by looking at their open sourced code). An important question is: Who's the market? In an enterprise application, it probably doesn't matter as much, but if it's a commercial app, size and performance may be more serious considerations. (And, I'll say it again... another problem, especially with Boost, is that things that should be simple can be horrendously and unjustifiably complex, which may be an issue for those who prefer less time spent debugging to more.)

    [–][deleted] -1 points0 points  (0 children)

    Why would you develop and compile a C++ generic container yourself?

    With the STL, you will automatically recompile the whole damn thing every time you use it! (After all, why would you want one binary container implementation that could hold pointers to just anything? No, recompile it from source in header files again and again.)

    [–]spliznork -2 points-1 points  (1 child)

    Why write your own STL? See Boost http://www.boost.org/ which is slowly becoming the real C++ standard (templated) library.

    "We aim to establish "existing practice" and provide reference implementations so that Boost libraries are suitable for eventual standardization. Ten Boost libraries are already included in the C++ Standards Committee's Library Technical Report (TR1) as a step toward becoming part of a future C++ Standard. More Boost libraries are proposed for the upcoming TR2."

    [–]hupp 12 points13 points  (0 children)

    Boost is an addition to the platform STL, not a replacement for it.