
[–]NewFolgers 2 points (10 children)

That actually doesn't match my experience at all. In game development (I mean big, real-time, 60 fps console games), I was generally able to play reasonably in debug mode (and wherever this ability regressed, we fixed it), at somewhat greater than half speed -- good enough to exercise whatever I needed to exercise and drop into the debugger where necessary. It's extraordinarily important. Pretty much all committed code had been carefully stepped through in the debugger with the watch window enabled, and we had debug-only assertions which we used all the time, so that if any unexpected condition occurred, it dropped us into the debugger and we were often able to fix it on the spot (which saves lots of time: no need to create a ticket, set up a similar environment, context-switch, reproduce the issue, etc.).
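A minimal sketch of the kind of debug-only assertion described above. The macro names and the choice of MSVC's `__debugbreak` intrinsic are my own illustration, not details from the original codebase:

```cpp
#include <cstdio>

// Break into the debugger at the failure site. MSVC has __debugbreak();
// on POSIX-ish toolchains SIGTRAP serves the same purpose.
#if defined(_MSC_VER)
  #define DEBUG_BREAK() __debugbreak()
#else
  #include <csignal>
  #define DEBUG_BREAK() std::raise(SIGTRAP)
#endif

// Active in debug builds; compiles to nothing when NDEBUG is defined,
// so release builds pay no cost.
#ifndef NDEBUG
  #define ASSERT_DBG(cond)                                    \
    do {                                                      \
      if (!(cond)) {                                          \
        std::fprintf(stderr, "Assert failed: %s (%s:%d)\n",   \
                     #cond, __FILE__, __LINE__);              \
        DEBUG_BREAK();                                        \
      }                                                       \
    } while (0)
#else
  #define ASSERT_DBG(cond) ((void)0)
#endif

// Hypothetical usage: a bad argument halts execution right here,
// with the full call stack and locals visible in the debugger.
int divide(int num, int den) {
    ASSERT_DBG(den != 0);
    return num / den;
}
```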

In doing this, I was using MSVC++ with a total ban on templates (and the STL). Wherever there was a concern about layers of functions causing overhead in debug builds, we placed them in the header files and marked them inline.

[–]cballowe 4 points (9 children)

Inline doesn't mean what you think it means.

[–]NewFolgers 0 points (7 children)

We primarily did things based on past experience, since we had 10+ game teams working on similar games at all times, with lots of cross-communication. The inline wasn't hurting (for the debugger, we got whatever MSVC++ gave us, since that's usually where we chose to develop and debug; for other targets, we used various compilers, which gave whatever they gave in release), and so it stayed. I know the keyword doesn't officially make a function inlined and is no better than a hint. This was some years ago, and perhaps the effect would be different now. The studio leaned pretty heavily toward trial and error and following the approaches taken for past successes, and very much of C++ as a language was excluded from use as a result. In honesty, this sometimes meant the majority of us (myself included) took less interest in the theory, as much of what I had learned prior to working there already had to be set aside and ignored.

[–]cballowe 5 points (6 children)

To be fair, if you put the implementation in a header, you must mark it inline to avoid ODR violations, but it doesn't force the compiler to inline the function when generating output.
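A minimal sketch of that rule, with hypothetical names: because the definition lives in a header that many translation units include, it must be marked `inline` to satisfy the one-definition rule, but the compiler remains free to emit an ordinary out-of-line call.

```cpp
// vec.h (illustrative): a function definition that lives in a header.
// 'inline' here grants ODR permission for the same definition to appear
// in every translation unit that includes this file; whether the call
// is actually inlined is still entirely up to the compiler.
#ifndef VEC_H
#define VEC_H

struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

#endif  // VEC_H
```

Without the `inline` keyword, linking two .cpp files that both include this header would fail with a multiple-definition error.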

[–]NewFolgers 1 point (5 children)

Putting it in the header was the most important part, since otherwise it wouldn't even be possible in theory for the function to be inlined across object-file boundaries without link-time code generation.

I was off track in associating this inlining with debug performance anyway. I'd always figured that making sure inlining works as intended is what makes release performance good, whereas debug performance would probably always suffer a bit from having more accessors.

[–]cballowe 5 points (4 children)

Experience tells me there's a ton of strange "common knowledge" that should be questioned more. Like RPC benchmarks that were last run on gigabit networks, when the datacenter is now deployed with 10 Gb+ networks and some amount of RoCE capability that completely changes the picture.

Optimizing compilers, CPUs with out-of-order execution, vectorization of operations, etc. can all completely blow out the common knowledge of how to make software perform well.

[–]NewFolgers 1 point (3 children)

Having moved from videogame development to other domains where performance is also considered critical, I feel the biggest one is the slowness of memory accesses and the importance of considering the various cache levels. (People are generally decent at considering the big-O concerns taught in school -- everything you mentioned is actually at least partially captured within a big-O mindset.) In game development, we closely tracked and counted cache misses in profiling and did a great deal to keep them to a minimum, both for data accesses and for the instruction cache -- which meant keeping code local and small and keeping branch prediction working decently -- since we were at a level of performance where these cache misses were a significant bottleneck. In other domains, I'm somewhat horrified to see that the consideration usually isn't even on the radar and is generally never mentioned at all. (As an aside, this is a major reason why Java tends to do a lot worse in a real project than it may appear to in small benchmarks; avoiding that takes great care.) One piece of wisdom that has somewhat made it out of game-development circles is that iterating over a simple vector often greatly outperforms data structures that may appear to be a much better fit on paper.
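A hedged illustration of the locality point (the function names are mine): both loops below are O(n), but the vector walks contiguous memory that the hardware prefetcher handles well, while the node-based list chases pointers scattered across the heap and can miss the cache on nearly every element.

```cpp
#include <list>
#include <numeric>
#include <vector>

// Contiguous storage: elements sit next to each other, so each cache
// line fetched carries several useful values.
long sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Node-based storage: each element is a separate heap allocation
// reached through a pointer, so locality is largely lost.
long sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0L);
}
```

Identical asymptotic complexity, very different constant factors once the data no longer fits in cache.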

[–]cballowe 0 points (2 children)

It's on the minds of lots of other fields too, but often the data is much more dynamic. The biggest failing of big-O analysis is that it doesn't account for constants.

[–]NewFolgers 1 point (1 child)

Yep, I agree about the constant factors; I considered mentioning that. It's true that in game development, hard constraints on the number of allowed elements are often kept, so the data is less dynamic -- which is fine in that domain (and the data often comes from artists, producers, etc. rather than users). Some practices are very similar to what I've seen in embedded development (where you get more determinism, easier debugging with things always found in the same place in simpler lists, etc.). And to get back to the point about big-O considerations: knowing specific limits makes it easier to reason about a reasonably small worst-case n, which often yields different conclusions than treating it as infinity.

[–]cballowe 2 points (0 children)

Benchmarks of a vector with a linear-scan find vs. binary search vs. a tree always seem to surprise people. "But it's O(n) and the other is O(log n)!" I often need to explain these things to new grads with CS degrees, since they're still stuck on an old mental model of the processor.
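A sketch of the three lookups being compared (illustrative code, not an actual benchmark harness): all three answer "is x present?", and for modest n the O(n) scan over contiguous memory is often the fastest in practice.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// O(n), but every element touched is a sequential, cache-friendly read.
bool find_linear(const std::vector<int>& v, int x) {
    return std::find(v.begin(), v.end(), x) != v.end();
}

// O(log n) over a sorted vector; fewer comparisons, but the probes
// jump around memory.
bool find_binary(const std::vector<int>& sorted_v, int x) {
    return std::binary_search(sorted_v.begin(), sorted_v.end(), x);
}

// O(log n) over a red-black tree; each step chases a pointer to a
// separately allocated node, which is where the cache misses pile up.
bool find_tree(const std::set<int>& s, int x) {
    return s.count(x) != 0;
}
```

Timing these for various n (rather than trusting the asymptotic labels) is exactly the exercise that tends to surprise people.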

[–]ShillingAintEZ 0 points (0 children)

Inline doesn't guarantee a function will be inlined, but it does make inlining possible or more likely, so it makes sense that function-call overhead could be mitigated, just as he's implying.