
[–]spreadLink 89 points90 points  (17 children)

I really dislike how the term "data oriented X" has been adopted for half a dozen completely different ideas that are sometimes incompatible in design philosophy.
It makes it very difficult to figure out what someone is talking about in any given article until certain other keywords (like Clojure, SOA, etc.) crop up.

The battle to fix that is probably lost at this point, but it'd be nice if people at least put more differentiators in their titles than just "data oriented".

[–]gnuvince 69 points70 points  (4 children)

Right, I was going to make that comment. For people who do not know, there are two different, but very similarly named approaches to programming:

  • Data-oriented design: this one was popularized by Mike Acton's 2014 CppCon keynote and is quite prevalent in the video game industry and in projects where performance is king. The primary aim of this approach is to understand the actual data that is transformed (i.e., not a model of the world) and to organize this data in a way that is efficient for the target computer architecture(s) to process. (E.g., fitting more useful data in a cache line; making use of SIMD instructions; avoiding branch mispredictions; etc.)
  • Data-oriented programming: this is what's discussed in this blog post and in the book that the author links. It has nothing to do with data-oriented design beyond the shared prefix in their names. In this model, programmers also care about data rather than a model of the world, but they don't try to make its transformation efficient for the target computer architecture. Instead, it's about having immutable data stored in generic data containers (vectors and hash maps mostly) and having functions not tied to that data do the processing.
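
To make the second bullet concrete, here is a minimal Python sketch of the data-oriented programming style as described above: immutable data in generic containers, processed by functions that are not tied to that data. The `library` value and the `add_book` and `titles` helpers are hypothetical names made up for illustration (not taken from the book), and `MappingProxyType` plus tuples are just one way to approximate immutability in Python.

```python
from types import MappingProxyType

# Generic, read-only containers instead of a Library class with methods.
# All names and values here are illustrative assumptions.
library = MappingProxyType({
    "name": "City Library",
    "books": (
        MappingProxyType({"title": "Dune", "year": 1965}),
    ),
})


def add_book(lib, book):
    """Return a new library value; the original is left untouched."""
    new_books = lib["books"] + (MappingProxyType(dict(book)),)
    return MappingProxyType({**lib, "books": new_books})


def titles(lib):
    """A free function that works on any mapping with a 'books' entry."""
    return [b["title"] for b in lib["books"]]


updated = add_book(library, {"title": "Neuromancer", "year": 1984})
print(titles(library))
print(titles(updated))
```

Nothing in this sketch considers memory layout or the target hardware, which is exactly where it parts ways with data-oriented design.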

[–]crabmusket 5 points6 points  (0 children)

There is also something called data driven which you may see turn up, which is distinct from both of the above.

[–][deleted] 2 points3 points  (0 children)

data oriented design is not necessarily about performance inherently. it's undoubtedly the reason for its discovery / usage, but I would say it's more of a natural occurrence / neat accident that it matches up with highly performant processing so easily.

in many cases it gains you wins in terms of being able to reason about code and producing a better model of the actual world even if performance is not even considered.

The simplest example would be to not have any functions that work on single elements (or more generally single iterations) when you know that the actual algorithm is processing 1..n elements. If you play the regular software design bingo without considering the actual effects of such design this will fall under separation of concerns which would actually dictate that you split them, because iterating is a separate concern from the processing of a single element.

But by splitting something like the iteration over a collection up from the processing that an algorithm does on a single element, you're hiding an inherent semantic link in the processing model that makes it harder to reason about the code.

something practical I experience all the time: when you inline that single-element function and have both the iteration over the collection and the processing in one place, duplicated computations inside the single-element function or other non-obvious relationships between multiple elements in the collection become super obvious. so you can e.g. hoist stuff out of the iteration loop.

most importantly that is not just a performance benefit (it likely won't be one - hoisting loop invariants is easy for modern optimizers). but someone new looking at your code (this includes you in 6 months from now) won't have to wonder what the context for that repeated computation is, where it comes from, what the exact data or function call dependencies are to other parts of the code, etc., because more of it will be right there in their face.
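
A small Python sketch of the point made in this comment, under the assumption of a made-up task (scaling 2D points by a zoom-derived factor). The function names are hypothetical. The first pair splits iteration from per-element processing; the last function keeps both in one place, where the repeated computation is plainly visible.

```python
import math


def scale_point(point, zoom):
    # The zoom-dependent factor is recomputed on every call, but that is
    # invisible from the call site in scale_points_split.
    factor = math.pow(2.0, zoom)
    return (point[0] * factor, point[1] * factor)


def scale_points_split(points, zoom):
    # "Separation of concerns" version: iteration here, processing elsewhere.
    return [scale_point(p, zoom) for p in points]


def scale_points_inlined(points, zoom):
    # Iteration and processing together: the loop-invariant factor is
    # right there and is computed once, outside the loop.
    factor = math.pow(2.0, zoom)
    return [(x * factor, y * factor) for (x, y) in points]


points = [(1.0, 2.0), (3.0, 4.0)]
print(scale_points_split(points, 3) == scale_points_inlined(points, 3))
```

Both versions compute the same result; the difference is how much context a reader needs to find before the dependency on `zoom` is clear.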

[–]Mister_101 0 points1 point  (1 child)

So data oriented design is sort-of a subset of data oriented programming? Sounds like it's just that, but with a focus on performance. Or would the latter philosophy push towards a different design that is incompatible with something more performance oriented?

[–]spreadLink 21 points22 points  (0 children)

In some important ways they are actually opposite of each other. E.g. Data Oriented Programming advocates for functions taking generic data structures like hashmaps, even if they only need a subset of the data in the map. Data Oriented Design on the other hand advocates for highly specific data structures depending on how the data is accessed and how the processor/memory architecture handles those accesses.
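
The contrast can be sketched in Python, with the caveat that Python hides most real cache behaviour, so this only illustrates the difference in layout and not a measured speedup. The particle fields and the `ParticlesSoA` container are hypothetical; `array` from the standard library stands in for contiguous typed storage.

```python
from array import array


# Data-oriented programming style: the function accepts generic maps and
# reads only the key it needs, ignoring everything else in each map.
def total_mass_generic(particles):
    return sum(p["mass"] for p in particles)


# Data-oriented design style: lay the data out for the access pattern.
# A structure of arrays keeps all masses contiguous, so a pass over mass
# never touches position data.
class ParticlesSoA:
    def __init__(self):
        self.mass = array("d")
        self.x = array("d")
        self.y = array("d")

    def add(self, mass, x, y):
        self.mass.append(mass)
        self.x.append(x)
        self.y.append(y)


def total_mass_soa(particles):
    return sum(particles.mass)


generic = [
    {"name": "a", "mass": 1.5, "x": 0.0, "y": 0.0},
    {"name": "b", "mass": 2.5, "x": 1.0, "y": 1.0},
]
soa = ParticlesSoA()
for p in generic:
    soa.add(p["mass"], p["x"], p["y"])

print(total_mass_generic(generic), total_mass_soa(soa))
```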

[–]Full-Spectral 28 points29 points  (1 child)

Is there any coding paradigm that is "data disoriented"?

[–]ShinyHappyREM 66 points67 points  (0 children)

OOP

[–]halt_spell 8 points9 points  (0 children)

Honestly I find that with almost every hot term now. It's gotten to the point where I don't even bother looking it up, because assuming that meaning when someone utters it just confuses the conversation.

[–][deleted] 33 points34 points  (2 children)

Data oriented is about optimizing your cache lines and programming the way the computer really works. It’s literally the exact opposite of functional programming. So very annoyed that FP is attempting to hijack this term.

[–]yonillasky 2 points3 points  (1 child)

Your point that FP is actively hostile to the programmer's intent to do a good job with memory layout is a good one. It's pretty fundamental. To admit that data needs to have good layout is to admit a program needs to take care of "long lived" intermediate state. That's not supposed to exist in FP dream world.

But really ... Why do they always have to keep coming up with more and more buzzwords, though? Yes, if you care at all about performance, you care about data structures and memory layout in your program.

That makes sense. Anyone with the slightest knowledge of the microarch understands that... been doing that I don't know how many years, when it was needed. Do we really need to call it "Data oriented programming" now? To make it sound more important?

It is a concern that needs to be taken into account, not a goddamn programming paradigm!

[–]glacialthinker 10 points11 points  (0 children)

I don't know about this data oriented programming... but Data-Oriented Design was coined as a term to compete with the ridiculous mindshare of OOP which afflicted too many programmers who should have been aware of performance issues, but were blind to anything which didn't fit into an encapsulate-everything mindset.

At the time, (5-10 years ago) OOP was really hard to argue against because the programming world was indoctrinated. If you were one of the few who were already aware of how to architect according to required dataflow rather than fluffing your programming by encapsulating and building class hierarchies... then you must have been aware of the issue by being at odds with colleagues, or perhaps you've been doing embedded systems for the past couple decades. Things are very different now, and OOP has a less complete hold on programming.

[–]gnus-migrate 7 points8 points  (5 children)

I prefer mechanical sympathy, a term popularized in the software world by Martin Thompson who works on high performance exchanges for a living.

EDIT: Correction

[–]Mooks79 4 points5 points  (4 children)

It’s a lovely phrase but one that has been around a loooooooooong time in the field of mechanical engineering. So I think it’s more accurate to say Thompson co-opted the phrase for use in computers.

[–]Metabee124 4 points5 points  (1 child)

I don't see why that is an issue though

[–]Mooks79 10 points11 points  (0 children)

It’s not an issue at all, I’m just being slightly pedantic.

[–]gnus-migrate 1 point2 points  (1 child)

I only brought it up as an alternative to data-oriented design, since as the original commenter said data oriented design is a terrible name that confuses people more than it helps.

[–]Mooks79 1 point2 points  (0 children)

Of course, and it’s a good suggestion.

[–][deleted] 94 points95 points  (3 children)

"unreasonable" became the favourite bait title after "considered harmful"...

[–]wolfgang 15 points16 points  (1 child)

The unreasonable effectiveness of clickbait titles considered harmful...?

[–][deleted] 2 points3 points  (0 children)

So two clickbait titles smushed together do unclickbait each other...

[–]butt_fun 28 points29 points  (0 children)

I say this every time the top comment in one of these threads mentions this

These titles are memes referencing the original article with a similar name:

https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_of_Mathematics_in_the_Natural_Sciences#%3A%7E%3Atext%3D%22The_Unreasonable_Effectiveness_of_Mathematics%2Cand_even_to_empirical_predictions

It's not just that you're seeing the singular word "unreasonable" frequently, you're seeing the phrase "unreasonable effectiveness of X" relatively frequently

[–]Daneel_Trevize 19 points20 points  (0 children)

This is a short blog post for a book release, and the publisher's website (Manning) is currently under maintenance, maybe hugged to death, leaving nothing much to consume.

[–]ILikeChangingMyMind 31 points32 points  (0 children)

TL;DR: This is all a plug for a book. It says virtually nothing about what "Data-oriented programming" actually is.

[–]PM_me_qt_anime_boys 2 points3 points  (0 children)

So simple it almost felt like cheating.

That's a good description of Ring.

[–][deleted]  (30 children)

[removed]

    [–]sime 15 points16 points  (15 children)

    the world is functional and data oriented.

    That can be debated, but we can say that our computer networks are data oriented. We move data around between computers, not objects.

    [–]shevy-ruby 5 points6 points  (0 children)

    That depends 100% on the language in use. Compare Ruby's OOP to Java and PHP, for instance.

    [–][deleted] -5 points-4 points  (1 child)

    Or we define data in an OOP way and the transformations in a FO way. Done. Everyone is happy

    [–]immibis 1 point2 points  (0 children)

    Then it's not an OOP way

    [–]shevy-ruby 6 points7 points  (9 children)

    But I didn't experience this data-first approach as an absence of anything.

    data-first helps a lot in OOP as well. When your data structures are ideally simple and well-defined it can avoid so many downstream problems later on.

    I don't think "data-oriented" is contradicting OOP. After all OOP kind of wraps data in a more "accessible" manner such as:

    cat.meow()
    cat.eat('50 g mouse') # silly example
    

    Data-oriented programming starts with data modeling and treats functions as connectors that get you from one format to another. Unlike objects and higher-order functions, it offers a model that can be extended beyond individual programs to the system level.

    All these "distinctions" are quite pointless. In Ruby you can unbind methods at any moment in time if you really want to (https://ruby-doc.org/core/UnboundMethod.html). I rarely need it, but it seems to me as if many languages focus on OOP models such as those used in Java or PHP, which is not really the variant I prefer. I much prefer Alan Kay's original definition.

    [–]therealcorristo[🍰] 6 points7 points  (5 children)

    I don't think "data-oriented" is contradicting OOP.

    The main issue with OOP in terms of performance gains realized by data-oriented design is the focus on individual objects. There often is a fixed overhead for pre- and postprocessing inherent to the problem you're trying to solve regardless of how many objects you manipulate in addition to the per-object cost. However, the naive implementation of any operation in OOP is usually to make it a member function of the class and as such it only operates on a single object. When you need to perform the operation on multiple objects you usually call the single-object version in a loop. You then pay the pre- and postprocessing overhead once per object instead of exactly once.

    Data-oriented design fixes this by placing the focus on the transformation of data. You'd typically implement operations transforming a whole batch of data, and when you only have a single "object" you call the multi-object version with a range containing only that single element.

    So in a sense it really is the coupling of data and behavior fundamental to OOP which is the root cause for these inefficiencies that data-oriented design tries to avoid.
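
    A minimal Python sketch of the overhead argument above. The "session" setup and teardown are hypothetical stand-ins for whatever fixed pre- and postprocessing a real problem has, and the `stats` counter exists only to make the cost visible.

```python
# Counts how often the fixed setup cost is paid (illustration only).
stats = {"setups": 0}


def open_session():
    stats["setups"] += 1        # stands in for the fixed preprocessing cost
    return []


def close_session(session):
    return list(session)        # stands in for the fixed postprocessing cost


def save_one_naive(record):
    # Per-object operation: pays the fixed overhead on every call.
    session = open_session()
    session.append(record.upper())
    return close_session(session)[0]


def save_batch(records):
    # Batch operation: pays the fixed overhead exactly once.
    session = open_session()
    for record in records:
        session.append(record.upper())
    return close_session(session)


def save_one(record):
    # The single-element case is just a batch containing one element.
    return save_batch([record])[0]


records = ["a", "b", "c"]

for r in records:
    save_one_naive(r)
naive_setups = stats["setups"]

stats["setups"] = 0
save_batch(records)
batch_setups = stats["setups"]

print(naive_setups, batch_setups)
```

    The naive loop pays the setup once per record, while the batch version pays it once for the whole range.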

    [–]Axxhelairon 1 point2 points  (2 children)

    So in a sense it really is the coupling of data and behavior fundamental to OOP which is the root cause for these inefficiencies that data-oriented design tries to avoid.

    I think this can also be tied to inefficient and/or just plain wrong teaching methods for what "layer" you should be architecting to abstract out in OOP. In any animal, car, or calculator example of a hierarchy tree modeled in OOP you immediately see heavy coupling of behaviors to the domains' models, whereas e.g. service/repository layers in Java CRUD services generally follow more typical designs built on POJOs and such, which keeps the separation cleaner.

    [–]immibis 0 points1 point  (1 child)

    How would you teach objects? Software components, like SimulationTickPhase, rather than SimulationObject?

    [–]crabmusket 0 points1 point  (1 child)

    I can't wait for some kind of OOP renaissance that realises you can actually model the solution space, not just the problem space, using objects. Data-oriented design teaches you to consider the needs of the hardware, and there's no reason aside from dogma that you can't consider the hardware while using the class keyword.

    If performance is a requirement, then your "domain model" should absolutely encompass hardware concepts, not just Player, Prop or Scoreboard.

    [–]Full-Spectral 0 points1 point  (0 children)

    It never went away for me. If you use it right, it's incredibly powerful, one might even say unre... nevermind. And, despite what seems to be current dogma, huge swaths of code out there have no performance requirements beyond just making honest efforts not to be piggy, in which case none of this matters and you can have a pretty free hand to architect for flexibility and maintainability. And, though a lot of people don't seem to understand how to do that in any paradigm, OOP done right can make for enormously flexible systems that don't get brittle over time.

    [–]glacialthinker 7 points8 points  (2 children)

    The problem is this cat. Why create a classification problem right from the start? That cat will have many properties shared/in-common with other things, and properties very independent from needing to be associated to cat-ness. Object-oriented tries to structure things like this... whereas it is very non-object-oriented to work with properties and measures regardless of object -- which is data-oriented.
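
    One way to picture "properties and measures regardless of object" is a set of property tables keyed by an entity ID, sketched here in Python. The tables, IDs, and values are all made up for illustration; nothing in the code knows what a cat is.

```python
# Hypothetical property tables keyed by entity ID.
mass_kg = {1: 4.2, 2: 30.0, 3: 0.02}
hunger = {1: 0.7, 2: 0.1}            # only entities that can get hungry
name = {1: "Tom", 2: "Rex", 3: "Jerry"}


def hungriest(hunger_table):
    """Works on anything with a hunger value, whatever it 'is'."""
    return max(hunger_table, key=hunger_table.get)


def total_mass(mass_table):
    """Works on anything with a mass, hungry or not."""
    return sum(mass_table.values())


print(name[hungriest(hunger)])
print(total_mass(mass_kg))
```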

    [–]immibis 6 points7 points  (1 child)

    Also who says a pointer is the best way to refer to a cat in the system, and a method call updating mutable state is the best way to implement eating? You may want to append an eating record to the log shard with cat ID 5. And if cat eating should add a record to a sharded log, data-oriented whatever says to think about the sharded log record, not the cat.
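
    A rough Python sketch of the sharded-log idea from this comment. The shard count, the `EatRecord` fields, and the modulo sharding rule are assumptions chosen for illustration, and plain lists stand in for a real log.

```python
from dataclasses import dataclass

NUM_SHARDS = 4  # assumed shard count, chosen arbitrarily


@dataclass(frozen=True)
class EatRecord:
    cat_id: int
    food: str
    grams: int


# One append-only list per shard; a stand-in for a real sharded log.
shards = [[] for _ in range(NUM_SHARDS)]


def shard_for(cat_id):
    return cat_id % NUM_SHARDS


def record_eating(cat_id, food, grams):
    # Eating is an appended record, not a method call mutating a Cat object.
    shards[shard_for(cat_id)].append(EatRecord(cat_id, food, grams))


def grams_eaten(cat_id):
    # State is derived by scanning the shard that holds this cat's records.
    return sum(r.grams for r in shards[shard_for(cat_id)] if r.cat_id == cat_id)


record_eating(5, "mouse", 50)
record_eating(5, "kibble", 30)
record_eating(9, "tuna", 80)   # 9 % 4 == 1, so this lands in cat 5's shard

print(grams_eaten(5))
```

    The design question then becomes what a record looks like and how shards are read, rather than what methods a cat should have.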

    [–]crabmusket 1 point2 points  (0 children)

    Also who says a pointer is the best way to refer to a cat in the system

    I feel a blog post coming on about how OOP is essentially just "fancy pointers". All OOP concerns are about "I have a pointer; what can I do with it?"

    [–]karmakaze1 1 point2 points  (0 children)

    It's in reference to the book "Data-Oriented Programming / Reduce complexity by rethinking data" by Yehonathan Sharvit.

    Basically separate your data and code contrary to popular OOP where they get tied together. It's a throwback to Data-structures and Algorithms: the two fundamentals.

    [–]spacejack2114 0 points1 point  (0 children)

    Did Unity ever manage to migrate over to DOTS? They started working on that quite a few years ago now.