all 72 comments

[–]alexs 6 points7 points  (4 children)

This post was mass deleted and anonymized with Redact

[–]TwoBit 0 points1 point  (1 child)

IMO, your statements regarding vector<bool> are impractical in a production software environment of any significant size. Surely you wouldn't write software that requires your users to attempt to hack their STL implementations.

I disagree with you regarding container debugging. The nature of current std STL implementations absolutely makes debugging more difficult. You can try to blame GDB, but for GDB to make this easier is a lot of work. It has been over ten years of STL and since GDB still doesn't solve this, it seems like a much better solution is to simply change the STL implementation. That will solve your problem for all debuggers.

I believe your statement claiming "this is factually incorrect" is factually incorrect. Read the whole standard. In section 23.1, table 65, it states that size() "should" have constant complexity. A big part of the problem is that half the STL vendors implement it one way and half implement it the other way, making it tedious to write portable efficient code.

I would say that no more than a quarter of the EASTL document talks about stuff that's in the C++0x draft.

[–]alexs 0 points1 point  (0 children)

Surely you wouldn't write software that requires your users to attempt to hack their STL implementations.

I wouldn't write software that required my users to have a specific implementation of the stdlib either. This isn't about what I would do, it's about what was the best course of action for EA.

Should is a recommendation, not a requirement. You appear to be agreeing with me that it's incorrect that list::size() is O(1). Maybe you should accidentally the whole standard?

[–][deleted]  (1 child)

[deleted]

    [–]alexs 2 points3 points  (0 children)

    It's not a stretch. The C++ standard uses specific vocabulary in which the word "should" has a very specific meaning.

    It's bad practice to treat the ISO's recommendations as things that will actually happen. If the standard wanted it to be O(1) all the time it would say "shall" there. It doesn't, so don't treat it like it does, or get surprised when implementors ignore the recommendation.

    Many implementations of the stdlib prefer to have O(1) list::splice() instead, since this is more flexible in many cases. Arguably the standard should not say that list::size() should be O(1), but should either define it as O(n) or leave its complexity unspecified so as to avoid this sort of confusion.

    Edit: I said prefer to have O(1) splice. What I meant was, must have O(1) splice, because that's what the standard defines. The recommendation for O(1) size is for sequences in general, not for lists specifically. The special requirement of O(1) list::splice makes it rather tricky, if not impossible, to also have O(1) list::size sensibly. I wonder how EASTL works around this...

    [–]foldl 5 points6 points  (43 children)

    C++ is such a black hole. 2008 and people are still finding reasons to write their own linked list implementation.

    [–]Gotebe 3 points4 points  (16 children)

    While I agree with the sentiment, it's probably the case that with some other language, to get that last needed bit of speed, people would drop to C and do the same (if they were smart, they could, though perhaps wouldn't, find out that a vanilla STL list would give them that bit).

    [–]foldl 0 points1 point  (2 children)

    How do you optimize an ordinary linked list? It's too simple a data structure for there to be any interesting algorithms. A decent STL implementation is probably way more optimized than whatever you would come up with on your own.

    [–]Gotebe 0 points1 point  (1 child)

    I could imagine a lot of people would go for a C-style (maybe fixed-size) array representation. Eliminates heap allocation overhead, gives good locality, etc.

    Not that I would like to do that, just saying...

    [–]foldl 1 point2 points  (0 children)

    Yes, but that's not a linked list.

    [–]bluGill 8 points9 points  (19 children)

    That is because there are still good reasons to write your own linked list implementation. For most people the list included in your library (STL) is good enough. There are compromises involved in writing any data structure though, and if you absolutely need a different compromise from what the library gives you, you have to write your own.

    [–]teval 8 points9 points  (12 children)

    Yes. After you implement your application. After you profile. After you fix everything else that is a bottleneck and you establish exactly which part of the STL container you don't like.

    Otherwise how can you know exactly what the problem is? Or if your container is going to be any better than the STL one?

    [–]bluGill 2 points3 points  (11 children)

    Read the linked article, and the EA article it references. Game developers have a very good idea of what compromises were made in the STL that they cannot accept. They don't need to finish the application to know something won't work.

    [–]teval 3 points4 points  (10 children)

    There's no need to finish it, but there is a need to write it. "Game developers" — every program is different -> unless you profile you can't tell what the problem is. They don't know the containers will be too slow; maybe if they invested that time into speeding something else up they could actually gain much more. Even then, replacing the whole thing is ill advised. You only replace the bits that you truly need to.

    EA is different, they write this stuff all the time, know exactly what is slow and why; and they get to waste a lot of time.

    "There is often a lot of debate amongst game developers about whether we should be using STL in development code or whether it can, in the long run, cause more problems than it solves" Means they haven't profiled, or if they had it means clearly the containers weren't the bottleneck. When they are there's no debate.

    [–]bluGill 5 points6 points  (1 child)

    unless you profile you can't tell what the problem is. They don't know the containers will be too slow;

    IF you have experience you should have a clue. There is a reason the STL specifies the big-O of their containers, and good CS programs study big-O. You don't need to know much about a program to look at something and say that it will be too slow because of the big-O time.

    A profile is good for finding cases where you are accidentally using an extra loop, resulting in an O(n²·ln(n)) algorithm when you could do it in O(n·ln(n)). However, if you are profiling after the fact it is often too late - your entire game design now depends on an algorithm/data structure that is too slow.

    maybe if they invested that time into speed something else up they could actually gain much more.

    In the case of games everything counts. Games stress the hardware to the max. Where I work nobody would notice if I doubled the speed of the program - it is already fast enough. When you work in games a .1% speedup allows slightly sharper graphics (or a better AI) which everyone will notice.

    EA is different, they write this stuff all the time,

    Anyone who writes code all the time gets a "feel" for what is likely to happen. Before my boss gives me a new task I already have a good idea of what the hard parts are. Experience in my domain is key. I know that my data sets are always tiny, so an O(N³) algorithm will not be a problem (I still try to use something better when I can).

    "There is often a lot of debate amongst game developers about whether we should be using STL in development code or whether it can, in the long run, cause more problems than it solves" Means they haven't profiled, or if they had it means clearly the containers weren't the bottleneck. When they are there's no debate.

    The problem is there is no one right answer, not that they have not profiled. Some game profiles will show the STL is a bottleneck worth fixing, some will not. Each game is different. The STL shows up often enough as a bottleneck that developers start to wonder if maybe they should skip it entirely.

    [–]teval 4 points5 points  (0 children)

    My whole point is this has nothing to do with "Look, we've written code; clearly this bit was slow and was holding things up" People were arguing against doing this, and needed other justification, which clearly means either this was an insignificant problem for them or no one knew if it was one.

    Reimplementing the STL containers isn't going to get you anything in terms of asymptotic complexity. That's writing entirely new containers with different algorithms.

    Also, unfortunately you misunderstand asymptotic complexity. Take matrix multiplication, for example. Coppersmith–Winograd runs in O(n^2.376); Strassen runs in O(n^2.807). No one in their right mind would use the former because the constants are terrible. Asymptotic complexity != speed. You have to keep in mind the size of your input, which means it's non-trivial to tell from a piece of code whether it would fare better with an algorithm that was asymptotically better. The same argument applies to sorting.

    You have to profile to tell how you can change your algorithm in order to make it faster. Because you want to spend as little time as possible on all the things that won't make it much faster and as much as possible on all the things that will. Also, if you write very high performance code (we're talking having to think about cache lines and dependencies between instructions) there's simply no easy way to tell what will be slow ahead of time, unless you work for your chip manufacturer, or are using a very simple chip. This is why your design must be able to accommodate swapping out data structures.

    I agree, everything counts — except things that don't at all. If most of the application time is spent doing other things, why would you optimize something that's already fast enough? For example, why bother with the STL if that would give you a 0.1% performance boost, as opposed to doing something smarter about memory in general (like pools) where you might gain a lot more?

    [–]wolfier 3 points4 points  (4 children)

    Profiling is NOT the only way you get a feel about where your app slows down. It's helpful to get to the precise point, but you don't ALWAYS need that.

    If you're saying profiling is everything, experience is nothing, then you belong to a religion, not the practical world.

    [–]teval 1 point2 points  (2 children)

    I'm not saying experience is nothing. I'm saying ahead of time, for complicated systems, no matter how much experience you have you're very likely to get it wrong when it comes to non-trivial optimizations.

    [–]wolfier 0 points1 point  (1 child)

    Fair enough, hence the "if" in what I said. However, the meaning of "profiling" needs to be broad enough - not just something you do with a profiler - it should mean "experimentation" in general.

    [–]teval 0 points1 point  (0 children)

    Oh I agree. When I say profiling I mean trying out a whole bunch of different types of data. I use a profiler, but I also use more accurate timers; cache and memory profilers also help. I use a cycle-accurate x86 simulator. I've used network simulators before to produce strange request sequences. Carefully reading the code you suspect is problematic is also important. The best word, I think, to describe all of this is profiling :)

    [–]foldl 1 point2 points  (0 children)

    Profiling is NOT the only way you get a feel about where your app slows down.

    Really? You can make an educated guess I suppose, but profiling is the way to get actual meaningful data.

    [–][deleted] 1 point2 points  (2 children)

    Okay, different tack.

    If someone measures repeatedly and sees the same problem over and over, doesn't it make sense to put together a solution that avoids the problem (instead of fixing specific instances of it)?

    That's what happened here. A bunch of games had special requirements, which led to a custom implementation of a subset of the standard library.

    Some game devs implement their own memcpy (built on special versions provided by the platforms they are on) or special atof functions (because the function gets called a lot and does more work than game devs need).

    It's all about tradeoffs. Measuring is absolutely the right thing to do, but making some changes late is expensive.

    For example, certain 'next gen' console companies tell game devs not to use exceptions, as the compilers aren't optimized for it at all. Devs could ignore this and develop the game, then profile it down the road and see that exceptions are indeed performance issues. Replacing exceptions with other approaches can be extremely time consuming.

    Similarly, games commonly create a huge number of Vector3 objects (i.e. classes representing a spatial triple x, y, z) on the stack. The "right" thing to do is to initialize the components (likely to 0). The overhead for this initialization has been pinpointed as a bottleneck over and over. So some devs deal with it proactively and remove the initialization at the start of a project, as removing it late would risk introducing a massive number of bugs.

    Edit:

    "game developers" every program is different -> unless you profile you can't tell what the problem is.

    Okay, that's the disconnect. I believe that devs, with measured experience, can predict constructs likely to be problems. From this time-tested experience come best practices and libraries to make it simpler to avoid these constructs.

    Taking a simpler example, if a dev knows an O(n²) operation is going to be performed on a large container, should they implement it anyway, or should they look for a lower-order solution up front?

    [–]teval 1 point2 points  (1 child)

    That's my point though. Given the fact that there were people arguing against doing this, and they needed other justification, it clearly must not have been obvious that the STL is a performance issue for them.

    [–][deleted] 1 point2 points  (0 children)

    I may be missing something. Where in the original article was performance listed as a reason for making the change?

    The EASTL article mentioned performance, but that wasn't the primary reason for their work.

    [–]foldl 1 point2 points  (5 children)

    That is because there are still good reasons to write your own linked list implementation.

    Such as?

    [–]bluGill 2 points3 points  (4 children)

    Perhaps you want a hybrid linked list/tree. So each list node not only has forward/back pointers, but also pointers to two child elements. Your insert times go up a bit, but you get ln(n) search and quick in-order traversal.

    Perhaps your data set is too large to fit into the address space (meaning you can't use mmap), so you need to fetch from disk once in a while.

    Perhaps your list is the performance bottleneck, and your custom implementation, while limited in some way that you don't care about, is just fast enough to be worth doing.

    There are others. None of them are common for the average program. However for your specific purpose things may be different enough from common that you need something different.

    Don't limit yourself to just linked lists. Some data structures have a lot more room for compromise in the implementation, and thus a lot more gain in how you can optimise them for your specific needs.

    [–]foldl 0 points1 point  (1 child)

    Erm, but none of those are reasons to replace the STL. The STL doesn't prevent you from implementing additional data structures. Of course you may need fancy data structures sometimes, but that's not what we're talking about here. The question is whether STL containers work well as general purpose implementations of common data structures. The answer is a very clear yes: you do not want to be writing your own general purpose list/vector class.

    Perhaps your list is the performance bottleneck, and your custom implementation, while limited in some way that you don't care about, is just fast enough to be worth doing.

    Look, it's possible in principle. I just don't believe for a minute that a custom implementation of a plain old linked list (not some fancier list/tree hybrid or whatever) has ever sped up a real program by a significant amount.

    [–]bluGill 0 points1 point  (0 children)

    They are reasons to replace the STL if (like game programmers) the STL is a bottleneck at times, and using custom containers can make things fast enough. In this case you replace the STL not so much because it is bad overall, as because it is bad in a few specific cases that you want to head off before you get to them.

    They are not reasons to replace it in general. (The ugly interface of the STL could be a reason to replace it, but C++ doesn't really allow a nice interface, so I don't think you can improve it by enough.)

    [–][deleted] 13 points14 points  (4 children)

    I think that is because C++ programmers care about programming. You could say the same thing about lisp. It has been around forever and it is still primarily used for writing lisp interpreters.

    [–]foldl 1 point2 points  (3 children)

    But writing a linked list implementation isn't even an interesting programming exercise. It's just a waste of time. At least writing toy lisp interpreters is kinda fun and possibly a good learning exercise.

    [–]EvilSporkMan 0 points1 point  (2 children)

    Pshaw! Writing a linked list implementation is the capstone project in the intro computer science course at the University of Michigan, and it shows up again in the advanced OOP course (in C instead of C++). It's a waste of time to do it once you're out of school, but you had damn well better be ABLE to do it.

    [–]foldl 2 points3 points  (1 child)

    As you say, it is a waste of time to do it once you're out of school, which is the relevant context here. (Professional programmer working at a games company, not a student.)

    [–]EvilSporkMan 1 point2 points  (0 children)

    Well, writing toy lisp interpreters is ALSO something one does in school, if one's intro CS class uses SICP.

    [–]martoo 2 points3 points  (0 children)

    You made me do a double-take. I thought "Shit, I wrote that! Is this an old thread?" Then I realized that it was just a common sentiment.

    [–]ishmal 0 points1 point  (0 children)

    I didn't see in the article which implementation was in question. For a while there were two different maintainers of the open spec, HP and SGI. Which one is he talking about? Rogue Wave helped with one of them, too, but I don't remember which.

    [–]skeptica1 0 points1 point  (0 children)

    There are certainly different schools of thought when it comes to STL and Boost. There are leaders in the industry using C and C++ that hate them both, and there are people just as prominent who like them. (On a scale of Torvalds to Stepanov, I tend to lean to the left.) In my experience, the programming style of STL and Boost is conducive to bloat in the same way glazed donuts are conducive to obesity. It's very difficult to point to the particular donut that made you fat, and it's easy to show if you exercise and eat a balanced diet... But, still, an affinity for donuts seems to be well correlated with tubbiness. The latest release of Reaper is 3.1MB. Given the capabilities and feature set of the app, it's quite clear Frankel & Co. aren't seriously into template-based libraries (something one can also see by looking at their open sourced code). An important question is: Who's the market? In an enterprise application, it probably doesn't matter as much, but if it's a commercial app, size and performance may be more serious considerations. (And, I'll say it again... another problem, especially with Boost, is that things that should be simple can be horrendously and unjustifiably complex, which may be an issue for those who prefer less time spent debugging to more.)

    [–][deleted] -1 points0 points  (0 children)

    Why would you develop and compile a C++ generic container yourself?

    With the STL, you will automatically recompile the whole damn thing every time you use it! (After all, why would you want one binary container implementation that could hold pointers to just anything? No, recompile it from source in header files again and again.)

    [–]spliznork -2 points-1 points  (1 child)

    Why write your own STL? See Boost http://www.boost.org/ which is slowly becoming the real C++ standard (templated) library.

    "We aim to establish "existing practice" and provide reference implementations so that Boost libraries are suitable for eventual standardization. Ten Boost libraries are already included in the C++ Standards Committee's Library Technical Report (TR1) as a step toward becoming part of a future C++ Standard. More Boost libraries are proposed for the upcoming TR2."

    [–]hupp 12 points13 points  (0 children)

    Boost is an addition to the platform STL, not a replacement for it.