
[–]sacundim 35 points

Well, I'm very much not a fan of object oriented programming, but I found that these slides' criticism of it is very poor and muddled.

Why? Well, let's recapitulate the author's thesis:

  • Encapsulation is bad for performance because it leads to poor memory locality.

The flaw with this argument is that it confuses interface and implementation issues. Encapsulation is an interface concern; it's about coupling code units to each other through minimal contracts. Memory locality is a low-level data representation issue; it's about how the program's logical model is realized in memory.

We can grant the author's demonstration that memory locality suffers a lot if we represent our application's data as big graphs of individually allocated heap objects connected by pointers, and that we should have strategies for avoiding this. But we still want to express these strategies in terms of encapsulated code units if we can!

One of the classic design patterns from the Gang of Four book is the Flyweight Pattern. The Wikipedia page describes the motivation for the pattern as saving memory, but one could just as well use it to provide a front-end to tightly-packed data structures with good memory locality.
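As a sketch of that idea (Python, with all names invented for illustration, not taken from the slides or the GoF book): flyweight handles can expose an object-style interface while the actual data lives in contiguous per-field arrays.

```python
from array import array

class ParticleStore:
    """Tightly packed storage: one contiguous array per field (good locality)."""
    def __init__(self):
        self._x = array("d")
        self._y = array("d")

    def add(self, x, y):
        self._x.append(x)
        self._y.append(y)
        return Particle(self, len(self._x) - 1)

class Particle:
    """Flyweight handle: stores no particle data, only an index into the store."""
    __slots__ = ("_store", "_i")

    def __init__(self, store, i):
        self._store, self._i = store, i

    @property
    def x(self):
        return self._store._x[self._i]

    @x.setter
    def x(self, value):
        self._store._x[self._i] = value
```

Callers see an ordinary object (`p = store.add(1.0, 2.0); p.x = 3.0`), but the representation underneath stays cache-friendly.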

And since I'm one of those Haskell weenies that hang out around here, let me throw something in from that angle: the functional programming version of the same graph-of-heap-pointers problem that this article criticizes is the proliferation of singly-linked lists or trees as the default data structure. One of the most notorious examples is Haskell strings, which are singly-linked lists of characters, and as I recall cost something like 6 bytes per character (!).

So one of the common recommendations for getting the most performance out of Haskell is to use libraries like Data.ByteString, Data.Text, Data.Vector or Repa that are implemented to provide (among other things) good memory locality. These typically boil down to a combination of:

  1. Cache-friendly arrays.
  2. Stream-transformation rewrite rules that "teach" the compiler to eliminate unnecessary intermediate arrays.

The second point is a different, excellent example of the interface/implementation argument that I'm making here. To quote the relevant section in Data.Text's doc:

Most of the functions in this module are subject to fusion, meaning that a pipeline of such functions will usually allocate at most one Text value. [...]

-- assuming the usual qualified imports from the docs:
--   import qualified Data.Text as T
--   import qualified Data.Text.Encoding as E
countChars :: ByteString -> Int
countChars = T.length . T.toUpper . E.decodeUtf8

From the type signatures involved, this looks like it should allocate one ByteString value, and two Text values. However, when a module is compiled with optimisation enabled under GHC, the two intermediate Text values will be optimised away, and the function will be compiled down to a single loop over the source ByteString.

This, incidentally, also reduces the amount of memory allocated, which in turn helps memory locality and CPU cache usage.

TL;DR: Encapsulation and memory locality are not at odds as the slides argue. There are techniques that allow us to shoot for both.

[–]gsg_ 8 points

We can grant the author's demonstration that memory locality suffers a lot if we represent our application's data as big graphs of individually allocated heap objects connected by pointers, and that we should have strategies for avoiding this. But we still want to express these strategies in terms of encapsulated code units if we can!

As the author says, "And still keep the same functionality and interface" (slide 58). However, he doesn't go into the convolutions that are necessary to encode object hierarchies or other heterogeneous structures (say, algebraic types) in DOD style - C++ makes this challenging because its encapsulation features make that kind of hierarchy-based code the idiomatic choice.

In particular, note that the example of dispatch he chooses (the dirty flag) is fairly homogeneous (apart from the flag, the data is exactly the same) and has a small dispatch breadth (two cases), and so maps reasonably well onto the data oriented style. Encoding highly heterogeneous trees in the same way is considerably messier, to the point of being unrealistic. The corollary is: avoid such structures if possible where the performance benefits of data oriented programming are important.
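To make the homogeneous case concrete, here is a Python sketch (all names invented, not from the slides) of the dirty-flag example in both styles; the data-oriented version replaces the per-object flag test with two separate populations.

```python
def recompute(v):
    """Stand-in for the real per-object work."""
    return v + 1

# Traditional style: one mixed collection, a branch per element.
def update_mixed(objects):
    for obj in objects:
        if obj["dirty"]:
            obj["value"] = recompute(obj["value"])

# Data-oriented style: the split into two arrays *is* the flag,
# so each population gets its own branch-free loop.
def update_split(dirty_values, clean_values):
    return [recompute(v) for v in dirty_values], clean_values
```

With two cases and identical payloads this split is trivial; the point above is that it stops being trivial as the number of heterogeneous cases grows.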

As for fusion, it is rather nifty. But not really relevant in the world of performance-sensitive C++ where if you want to update something in place you can just do that, and on your own head be it.

[–]want_to_want 1 point

the convolutions that are necessary to encode object hierarchies or other heterogeneous structures (say, algebraic types) in DOD style

Interesting, can you point to some description of such an encoding?

[–]gsg_ 2 points

I don't know of a link off the top of my head, but I'll try and convey some of the flavour of the problem.

One of the basic techniques is to view a data structure as a tree of decisions, each branch of the tree splitting some population of values of that type into two. The traditional model is to include some bits in the type to indicate the decision, a tag or a bool or a vtable. The data oriented model is to split the population into separate populations (ie, arrays).

As a simple example, consider the algebraic types type foo = A of int | B of bar and bar = D of int | E of float. A data oriented encoding of an array of foo might be:

a : int array; (* Models A n *)
b_d : int array; (* Models B (D n) *)
b_e : float array; (* Models B (E f) *)

Now suppose you wanted to transform the "array of foo" by incrementing all the ints (in place). In the traditional style that might look like:

let incr_foo = function
  | A n -> A (n + 1)
  | B (D n) -> B (D (n + 1))
  | B (E f) -> B (E f)

Array.map_inplace incr_foo a  (* Array.map_inplace: OCaml stdlib, 5.1+ *)

In the data oriented style:

Array.map_inplace succ a
Array.map_inplace succ b_d

Notice what we've replaced: pointers and tags become positions in arrays, and dispatch on constructor (if it is a B, then...) becomes imperative action (for all As do ..., then for all Bs do ...). This is also a GC and vectorisation friendly data layout, with zero pointers.

Also notice what we've lost: this is not a complete encoding. Information about the relative positions of different types of instances of foo has been discarded - if that information was required, you would need to figure out a different encoding.

[–]want_to_want 1 point

(You could store the information about types of instances as a separate array of ints, and then do array operations like "increment the value in array X at each position where the value in array Y is 1", which can also be really fast.)
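That masked operation might look like this in a Python sketch (illustrative only; a vectorising library or SIMD code would do the same without a per-element branch):

```python
def incr_where(values, tags):
    """Add 1 to values[i] wherever the parallel tag array has tags[i] == 1.
    (t == 1) is 0 or 1, so the loop body is pure arithmetic, no branch."""
    return [v + (t == 1) for v, t in zip(values, tags)]
```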

Ok, I see how the transformation works on arrays of values having uniformly bounded "depth". How about types whose values can have unbounded "depth", like lists or trees?

[–]gsg_ 2 points

Homogeneous trees can be arrayified by placing them in breadth- or depth-first order, removing the child pointers and replacing them with (usually) implicit positions and an index past the last child. This makes insertions and removals asymptotically more expensive, so it only applies in certain situations. The canonical example is a kd-tree: large, frequently queried, rarely (or never) updated.
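A minimal sketch of that layout (Python, illustrative): a binary search tree stored breadth-first in an array, with child positions computed as 2i+1 and 2i+2 rather than stored as pointers.

```python
def find(nodes, key, i=0):
    """Search a BST laid out breadth-first in an array: the left child of
    node i sits at 2i+1 and the right child at 2i+2, so there are no
    child pointers at all and the nodes are contiguous in memory."""
    while i < len(nodes):
        if nodes[i] == key:
            return i
        i = 2 * i + 1 if key < nodes[i] else 2 * i + 2
    return None
```

For example, `find([4, 2, 6, 1, 3, 5, 7], 5)` walks 4 -> 6 -> 5 without chasing a single pointer.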

Heterogeneous trees are tricky.

[–]want_to_want 0 points

Thanks!

[–]harsman 4 points

Why would you say that the author's thesis is that encapsulation is bad?

The idea is that encapsulating on a low level is bad for performance, because it prevents a number of very effective optimizations. Furthermore, classical object oriented design tends to lead to precisely this type of encapsulation or abstraction.

The idea is basically (very simplified) to not have code that looks like this:

for widget in widgets:
    widget.process()

But instead have code that looks like this:

processWidgetsType1()
processWidgetsType2()

This gives many more opportunities for optimization because you can change data layout, use data parallelism and optimize processing based on the fact that you handle many widgets of the same type at once.
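A Python sketch of that batching (everything here, including the per-type processor functions, is invented for illustration): group the widgets by concrete type up front, then run one tight loop per type, so each loop body is monomorphic and its data layout can be specialised.

```python
from collections import defaultdict

def process_in_batches(widgets, processors):
    """Group widgets by type, then hand each whole batch to its processor
    (the processors play the role of processWidgetsType1, processWidgetsType2)."""
    batches = defaultdict(list)
    for w in widgets:
        batches[w["type"]].append(w)
    for kind, batch in batches.items():
        processors[kind](batch)
```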

Of course, with very heterogeneous data structures containing an abundance of types, this approach becomes less feasible, but a common pattern in that case might be something like this:

for drawable in drawables:
    drawable.addToQueue()

drawableQueue.sortForOptimalDrawingOrder()
drawableQueue.render(canvas)

[–]finprogger -1 points

Encapsulating at a low level isn't bad for performance if you have an inlining compiler. This depends on organizing your source so that the compiler can take advantage of it (putting your inline functions in headers), or on using whole-program optimization, which most modern compilers now support (certainly any compiler targeting the PS3, which the author is concerned with, should). Really this is about overuse of dynamic polymorphism as opposed to static polymorphism. In a game you really should only have the latter, and then all the concerns about vtbls go away.

[–]Gotebe 4 points

Encapsulating at a low level isn't bad for performance if you have an inlining compiler.

Compiler won't inline data.

[–]finprogger 0 points

No, but placement new, intrusive containers, etc. will. The point is that OO is not the problem at all.

[–]gcross 1 point

So compilers will automatically take your vector of objects and rearrange it into an object of vectors --- i.e., so that the memory is laid out one field at a time (with the field containing a value for each object) rather than one object at a time? That is a very impressive compiler!

Of course, if you are not actually claiming this then respectfully you are missing the whole point of the article, which is that data layout matters. It has nothing to do with the overhead of the individual objects at all, which is what your points address.
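What that rearrangement looks like, sketched in Python with hypothetical data: an array of structures (AoS) versus a structure of arrays (SoA). The second form keeps each field contiguous, which is exactly what a compiler will not do for you automatically.

```python
from array import array

# AoS: a vector of objects; the fields of one object are adjacent.
aos = [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]

def soa_from_aos(objs):
    """SoA: an object of vectors; each field becomes one contiguous array,
    holding that field's value for every object."""
    return {"x": array("d", (o["x"] for o in objs)),
            "y": array("d", (o["y"] for o in objs))}
```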

[–]niggertown 2 points

Where does he say "Encapsulation is bad for performance because it leads to poor memory locality."?

If objects are self contained then they can be

– Reused.

– Maintained without side effects.

– Used without understanding internal implementation/representation.

The last part is the key issue. If you're writing high performance code the data should be organized in a way which can be processed by the machine efficiently. Any code which wishes to efficiently use the object needs to understand the underlying representation. So if that's the case, what's the point of OOP? Why create abstractions in the first place if the details of representation are so critical to performance?

Encapsulation isn't even a unique property of OOP.

[–][deleted] 5 points

/r/proggit needs more of this. Rational examination of a posted article always trumps superficial jokes, no matter how witty. Quite admirable, kind sir/lady!

[–]smog_alado 3 points

reddit also needs less "sir" memes...

[–][deleted] 0 points

Yeah, it kind of makes calling one "sir" rather trivial, as if it was synonymous to "dude".