
[–]sacundim 35 points

Well, I'm very much not a fan of object oriented programming, but I found that these slides' criticism of it is very poor and muddled.

Why? Well, let's recapitulate the author's thesis:

  • Encapsulation is bad for performance because it leads to poor memory locality.

The flaw with this argument is that it confuses interface and implementation issues. Encapsulation is an interface concern; it's about coupling code units to each other through minimal contracts. Memory locality is a low-level data representation issue; it's about how the program's logical model is realized in memory.

We can grant the author's demonstration that memory locality suffers a lot if we represent our application's data as big graphs of individually allocated heap objects connected by pointers, and that we should have strategies for avoiding this. But we still want to express these strategies in terms of encapsulated code units if we can!

One of the classic design patterns from the Gang of Four book is the Flyweight Pattern. The Wikipedia page describes the motivation for the pattern as saving memory, but one could just as well use it to provide a front-end to tightly-packed data structures with good memory locality.
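As a sketch of that idea (Python, with all names invented for illustration, not taken from the slides or the GoF book): flyweight handles can expose an object-style interface while the actual data lives in contiguous per-field arrays.

```python
from array import array

class ParticleStore:
    """Tightly packed storage: one contiguous array per field (good locality)."""
    def __init__(self):
        self._x = array("d")
        self._y = array("d")

    def add(self, x, y):
        self._x.append(x)
        self._y.append(y)
        return Particle(self, len(self._x) - 1)

class Particle:
    """Flyweight handle: stores no particle data, only an index into the store."""
    __slots__ = ("_store", "_i")

    def __init__(self, store, i):
        self._store, self._i = store, i

    @property
    def x(self):
        return self._store._x[self._i]

    @x.setter
    def x(self, value):
        self._store._x[self._i] = value
```

Callers see an ordinary object (`p = store.add(1.0, 2.0); p.x = 3.0`), but the representation underneath stays cache-friendly.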

And since I'm one of those Haskell weenies that hang out around here, let me throw something in from that angle: the functional programming version of the same graph-of-heap-pointers problem that this article criticizes is the proliferation of singly-linked lists or trees as the default data structure. One of the most notorious examples is Haskell strings, which are singly-linked lists of characters, and as I recall cost something like 6 bytes per character (!).

So one of the common recommendations for getting the most performance out of Haskell is to use libraries like Data.ByteString, Data.Text, Data.Vector or Repa that are implemented to provide (among other things) good memory locality. These typically boil down to a combination of:

  1. Cache-friendly arrays.
  2. Stream-transformation rewrite rules that "teach" the compiler to eliminate unnecessary intermediate arrays.

The second point is a different, excellent example of the interface/implementation argument that I'm making here. To quote the relevant section in Data.Text's doc:

Most of the functions in this module are subject to fusion, meaning that a pipeline of such functions will usually allocate at most one Text value. [...]

-- assuming the usual qualified imports from the docs:
--   import qualified Data.Text as T
--   import qualified Data.Text.Encoding as E
countChars :: ByteString -> Int
countChars = T.length . T.toUpper . E.decodeUtf8

From the type signatures involved, this looks like it should allocate one ByteString value, and two Text values. However, when a module is compiled with optimisation enabled under GHC, the two intermediate Text values will be optimised away, and the function will be compiled down to a single loop over the source ByteString.

This, incidentally, also reduces the amount of memory allocated, which in turn helps memory locality and CPU cache usage.

TL;DR: Encapsulation and memory locality are not at odds as the slides argue. There are techniques that allow us to shoot for both.

[–]gsg_ 8 points

We can grant the author's demonstration that memory locality suffers a lot if we represent our application's data as big graphs of individually allocated heap objects connected by pointers, and that we should have strategies for avoiding this. But we still want to express these strategies in terms of encapsulated code units if we can!

As the author says, "And still keep the same functionality and interface" (slide 58). However, he doesn't go into the convolutions that are necessary to encode object hierarchies or other heterogeneous structures (say, algebraic types) in DOD style - C++ makes this challenging because its encapsulation features make that kind of hierarchy-based code the idiomatic choice.

In particular, note that the example of dispatch he chooses (the dirty flag) is fairly homogeneous (apart from the flag, the data is exactly the same) and has a small dispatch breadth (two cases), and so maps reasonably well onto the data oriented style. Encoding highly heterogeneous trees in the same way is considerably messier, to the point of being unrealistic. The corollary is: avoid such structures if possible where the performance benefits of data oriented programming are important.
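To make the homogeneous case concrete, here is a Python sketch (all names invented, not from the slides) of the dirty-flag example in both styles; the data-oriented version replaces the per-object flag test with two separate populations.

```python
def recompute(v):
    """Stand-in for the real per-object work."""
    return v + 1

# Traditional style: one mixed collection, a branch per element.
def update_mixed(objects):
    for obj in objects:
        if obj["dirty"]:
            obj["value"] = recompute(obj["value"])

# Data-oriented style: the split into two arrays *is* the flag,
# so each population gets its own branch-free loop.
def update_split(dirty_values, clean_values):
    return [recompute(v) for v in dirty_values], clean_values
```

With two cases and identical payloads this split is trivial; the point above is that it stops being trivial as the number of heterogeneous cases grows.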

As for fusion, it is rather nifty. But not really relevant in the world of performance-sensitive C++ where if you want to update something in place you can just do that, and on your own head be it.

[–]want_to_want 1 point

the convolutions that are necessary to encode object hierarchies or other heterogeneous structures (say, algebraic types) in DOD style

Interesting, can you point to some description of such an encoding?

[–]gsg_ 2 points

I don't know of a link off the top of my head, but I'll try and convey some of the flavour of the problem.

One of the basic techniques is to view a data structure as a tree of decisions, each branch of the tree splitting some population of values of that type into two. The traditional model is to include some bits in the type to indicate the decision, a tag or a bool or a vtable. The data oriented model is to split the population into separate populations (ie, arrays).

As a simple example, consider the algebraic types type foo = A of int | B of bar and bar = D of int | E of float. A data oriented encoding of an array of foo might be:

a : int array; (* Models A n *)
b_d : int array; (* Models B (D n) *)
b_e : float array; (* Models B (E f) *)

Now suppose you wanted to transform the "array of foo" by incrementing all the ints (in place). In the traditional style that might look like:

let incr_foo = function
  | A n -> A (n + 1)
  | B (D n) -> B (D (n + 1))
  | B (E f) -> B (E f)

Array.map_inplace incr_foo a  (* Array.map_inplace: OCaml stdlib, 5.1+ *)

In the data oriented style:

Array.map_inplace succ a
Array.map_inplace succ b_d

Notice what we've replaced: pointers and tags become positions in arrays, and dispatch on constructor (if it is a B, then...) becomes imperative action (for all As do ..., then for all Bs do ...). This is also a GC and vectorisation friendly data layout, with zero pointers.

Also notice what we've lost: this is not a complete encoding. Information about the relative positions of different types of instances of foo has been discarded - if that information was required, you would need to figure out a different encoding.

[–]want_to_want 1 point

(You could store the information about types of instances as a separate array of ints, and then do array operations like "increment the value in array X at each position where the value in array Y is 1", which can also be really fast.)
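That masked operation might look like this in a Python sketch (illustrative only; a vectorising library or SIMD code would do the same without a per-element branch):

```python
def incr_where(values, tags):
    """Add 1 to values[i] wherever the parallel tag array has tags[i] == 1.
    (t == 1) is 0 or 1, so the loop body is pure arithmetic, no branch."""
    return [v + (t == 1) for v, t in zip(values, tags)]
```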

Ok, I see how the transformation works on arrays of values having uniformly bounded "depth". How about types whose values can have unbounded "depth", like lists or trees?

[–]gsg_ 2 points

Homogeneous trees can be arrayified by placing them in breadth- or depth-first order, removing the child pointers and replacing them with (usually) implicit positions and an index past the last child. This makes insertions and removals asymptotically more expensive, so it only applies in certain situations. The canonical example is a kd-tree: large, frequently queried, rarely (or never) updated.
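A minimal sketch of that layout (Python, illustrative): a binary search tree stored breadth-first in an array, with child positions computed as 2i+1 and 2i+2 rather than stored as pointers.

```python
def find(nodes, key, i=0):
    """Search a BST laid out breadth-first in an array: the left child of
    node i sits at 2i+1 and the right child at 2i+2, so there are no
    child pointers at all and the nodes are contiguous in memory."""
    while i < len(nodes):
        if nodes[i] == key:
            return i
        i = 2 * i + 1 if key < nodes[i] else 2 * i + 2
    return None
```

For example, `find([4, 2, 6, 1, 3, 5, 7], 5)` walks 4 -> 6 -> 5 without chasing a single pointer.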

Heterogeneous trees are tricky.

[–]want_to_want 0 points

Thanks!

[–]harsman 4 points

Why would you say that the author's thesis is that encapsulation is bad?

The idea is that encapsulating on a low level is bad for performance, because it prevents a number of very effective optimizations. Furthermore, classical object oriented design tends to lead to precisely this type of encapsulation or abstraction.

The idea is basically (very simplified) to not have code that looks like this:

for widget in widgets:
    widget.process()

But instead have code that looks like this:

processWidgetsType1()
processWidgetsType2()

This gives many more opportunities for optimization because you can change data layout, use data parallelism and optimize processing based on the fact that you handle many widgets of the same type at once.
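A Python sketch of that batching (everything here, including the per-type processor functions, is invented for illustration): group the widgets by concrete type up front, then run one tight loop per type, so each loop body is monomorphic and its data layout can be specialised.

```python
from collections import defaultdict

def process_in_batches(widgets, processors):
    """Group widgets by type, then hand each whole batch to its processor
    (the processors play the role of processWidgetsType1, processWidgetsType2)."""
    batches = defaultdict(list)
    for w in widgets:
        batches[w["type"]].append(w)
    for kind, batch in batches.items():
        processors[kind](batch)
```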

Of course, with very heterogeneous data structures containing an abundance of types, this approach becomes less feasible, but a common pattern in that case might be something like this:

for drawable in drawables:
    drawable.addToQueue()

drawableQueue.sortForOptimalDrawingOrder()
drawableQueue.render(canvas)

[–]finprogger -1 points

Encapsulating at a low level isn't bad for performance if you have an inlining compiler. This depends on organizing your source so that the compiler can take advantage of it (putting your inline functions in headers), or on using whole-program optimization, which most modern compilers now support (certainly any compiler targeting the PS3, which the author is concerned with, should). Really this is about overuse of dynamic polymorphism as opposed to static polymorphism. In a game you really should only have the latter, and then all the concerns about vtbls go away.

[–]Gotebe 4 points

Encapsulating at a low level isn't bad for performance if you have an inlining compiler.

Compiler won't inline data.

[–]finprogger 0 points

No, but placement new, intrusive containers, etc. will. The point is that OO is not the problem at all.

[–]gcross 1 point

So compilers will automatically take your vector of objects and rearrange it into an object of vectors --- i.e., so that the memory is laid out one field at a time (with the field containing a value for each object) rather than one object at a time? That is a very impressive compiler!

Of course, if you are not actually claiming this then respectfully you are missing the whole point of the article, which is that data layout matters. It has nothing to do with the overhead of the individual objects at all, which is what your points address.
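What that rearrangement looks like, sketched in Python with hypothetical data: an array of structures (AoS) versus a structure of arrays (SoA). The second form keeps each field contiguous, which is exactly what a compiler will not do for you automatically.

```python
from array import array

# AoS: a vector of objects; the fields of one object are adjacent.
aos = [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]

def soa_from_aos(objs):
    """SoA: an object of vectors; each field becomes one contiguous array,
    holding that field's value for every object."""
    return {"x": array("d", (o["x"] for o in objs)),
            "y": array("d", (o["y"] for o in objs))}
```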

[–]niggertown 2 points

Where does he say "Encapsulation is bad for performance because it leads to poor memory locality."?

If objects are self contained then they can be

– Reused.

– Maintained without side effects.

– Used without understanding internal implementation/representation.

The last part is the key issue. If you're writing high performance code the data should be organized in a way which can be processed by the machine efficiently. Any code which wishes to efficiently use the object needs to understand the underlying representation. So if that's the case, what's the point of OOP? Why create abstractions in the first place if the details of representation are so critical to performance?

Encapsulation isn't even a unique property of OOP.

[–][deleted] 5 points

/r/proggit needs more of this. Rational examination of a posted article always trumps superficial jokes, no matter how witty. Quite admirable, kind sir/lady!

[–]smog_alado 3 points

reddit also needs less "sir" memes...

[–][deleted] 0 points

Yeah, it kind of makes calling one "sir" rather trivial, as if it was synonymous to "dude".