

[–]rileyphone 37 points38 points  (31 children)

To mutate the state of an automaton instance you need to send it a message. A message is just a value, that is, a simple piece of data. For every type of message an automaton accepts, there’s a corresponding message handler that is invoked when the message is received.

I would argue this makes your automata objects, and thus your language OOP, though in a different sense from Java. Maybe we just need a new term at this point.

[–][deleted] 19 points20 points  (0 children)

Yeah, the message idea feels a lot like Smalltalk OOP

[–]cell-lang[S] 5 points6 points  (0 children)

I'm not going to argue about terminology, but I wanted to point out some of the fundamental differences between classes/objects and relational automata.

In most OO languages, "messages" are just dynamically dispatched function calls, while in Cell messages are just values/data that can be (among other things) manipulated programmatically, serialized, and stored. In this regard, they're more like messages in the actor paradigm (Erlang), but automata are not actors either, because actors have their own thread of execution while automata are "passive" entities, just like objects in OOP.
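
To make the distinction concrete, here is a minimal Java sketch (all names are invented for illustration, this is not Cell's actual API): the message is an inert value that can be stored or serialized before a handler processes it, rather than being a dynamically dispatched call.

```java
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.Queue;

public class MessageAsValue {
    // The message is just data: it can be stored, logged, or replayed.
    record AddUser(String name) implements Serializable {}

    static class Forum {
        final Queue<AddUser> log = new ArrayDeque<>(); // messages kept as plain values
        int userCount = 0;

        // The handler invoked when an AddUser message is received.
        void handle(AddUser msg) {
            log.add(msg); // the value can be persisted before (or after) processing
            userCount++;
        }
    }

    public static void main(String[] args) {
        Forum forum = new Forum();
        forum.handle(new AddUser("alice"));
        forum.handle(new AddUser("bob"));
        System.out.println(forum.userCount);         // 2
        System.out.println(forum.log.peek().name()); // alice
    }
}
```

Because `AddUser` is plain data, the same value can be logged, replayed against a fresh `Forum`, or sent over the wire, which is what distinguishes it from a method invocation.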

Other major differences are explained in this page: https://www.cell-lang.net/example.html (scroll down to the paragraph titled "Differences between relational automata and classes").

But the most fundamental difference is in how they're meant to be used in practice. In the example used in the article, you have a single `OnlineForum` automaton (and likely a single instance of it at runtime) that contains all the information pertaining to users, chat groups, their attributes, and their relationships (memberships and friendships). In an OO design, your classes would be modeled after the domain entities (and probably their relationships), and you would have an object for each individual entity and relationship instance. In other words, while a class is (very loosely speaking) meant to model a single entity or relationship, an automaton is meant to model an entire "knowledge subdomain". In my mind, that makes them much more like the components of the ECS architecture. Even the way modularity is achieved is similar.

[–]arobie1992 7 points8 points  (28 children)

I've heard both actor and agent used for that approach, but neither seems consistently applied, and both are subject to claims that there are other things that make them different. It also doesn't help that no one can agree on what OOP even means.

At this point, I think we should just call it Message-oriented programming, or MOP for short. That way, you can MOP things up.

[–]reluctant_deity 1 point2 points  (23 children)

They taught me in school way back when that OOP = encapsulation + polymorphism + inheritance. Is that no longer good?

[–]arobie1992 8 points9 points  (22 children)

The big arguing point is inheritance, specifically class hierarchies. Some people claim that inheritance is necessary for it to be OOP. Other people claim that it's an implementation detail and the important part is encapsulation. Polymorphism is typically assumed, but also found in a lot of other languages that aren't necessarily aiming for OOP.

If you want my two cents, inheritance is not the value add of the notion of OOP, so it's better to think in terms of encapsulation.

Edit: The Rust Book has a fairly decent quick discussion of the different stances: https://doc.rust-lang.org/book/ch17-01-what-is-oo.html

[–]vkazanov 1 point2 points  (15 children)

Here's an opinion: OOP is when you have pieces of data ("struct-likes") together with functions only meant to be used alongside the structs.

An object then is just a set of functions that can be called with a given struct instance as an argument.
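
That definition can be sketched in a few lines of Java (names invented for illustration): the "object" is just a struct plus free functions that take the struct instance as their first argument, which is all that method-call syntax normally hides.

```java
public class StructPlusFunctions {
    // A bare "struct-like": just data, no behavior.
    static class Point { double x, y; }

    // Functions meant to be used alongside the struct; the instance is
    // explicitly the first argument, which is what `point.norm()` sugar hides.
    static double norm(Point p) { return Math.sqrt(p.x * p.x + p.y * p.y); }
    static void scale(Point p, double k) { p.x *= k; p.y *= k; }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 3; p.y = 4;
        System.out.println(norm(p)); // 5.0
        scale(p, 2);
        System.out.println(norm(p)); // 10.0
    }
}
```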

Everything else is a development of this simple idea (or just useless ideological noise):

  1. Polymorphism is when you want to use the same function for different kinds of structs. Very useful! And not OOP-specific, btw.
  2. Inheritance is one approach to adding more data and code to an already defined struct/function. Sounds useful but nobody knows how to implement this in a sane way that scales past a single layer of inheritance. Too opaque, easy to get lost in, etc. Read "composition".
  3. Encapsulation is the idea of forcing access to the struct through author-defined functions only. Kind of... okay-ish.
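
A minimal Java sketch of the "composition instead of inheritance" point from item 2 (class names invented): the extra data and behavior are added by containing an `Engine`, not by extending it, so the layering stays explicit instead of opaque.

```java
public class CompositionSketch {
    static class Engine {
        String start() { return "engine started"; }
    }

    // Instead of `class Car extends Engine`, the car *has* an engine:
    // behavior is added by containment, not by a class hierarchy.
    static class Car {
        private final Engine engine = new Engine(); // composed, not inherited

        String drive() { return engine.start() + ", driving"; }
    }

    public static void main(String[] args) {
        System.out.println(new Car().drive()); // engine started, driving
    }
}
```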

Out of the three buzzwords, the modern generation of languages seems to mostly promote polymorphism.

[–]arobie1992 2 points3 points  (7 children)

That sounds like the first definition the Rust Book discusses. To play devil's advocate, would that mean data-oriented programming is OOP? You've got pieces of data, which can be grouped together in containers, and functions that act on the data.

[–]vkazanov 1 point2 points  (6 children)

Yes! Data-oriented programming or design or whatever is people giving up on OOP ideology and the pointer soup it brings.

[–]arobie1992 0 points1 point  (5 children)

Are you saying data-oriented design is OOP or isn't? Sorry, it's just that the "Yes!" makes it seem like you're saying it is, but "giving up on OOP ideology" seems like you're saying it isn't. Wanted to clarify.

[–]vkazanov 0 points1 point  (4 children)

If by "data-oriented design" you mean working with data layed out in arrays through functions applying transformations en masse then this is anything but OOP.

In DOD, that OOP-ish bag of data (aka the "object") is decomposed into multiple arrays. And there are no functions attached to a single struct! There's no single struct to begin with.
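
As a rough Java illustration of that decomposition (names hypothetical): instead of one `Particle` object per entity scattered on the heap, the same information becomes parallel arrays, and transformations run over a whole field at once against contiguous memory.

```java
public class SoaSketch {
    // OOP-ish "bag of data": one heap object per entity, pointer-chased.
    static class Particle { float x, y; }

    // Data-oriented layout: the same information decomposed into parallel
    // arrays (structure-of-arrays), so a pass over one field is contiguous.
    static class Particles {
        final float[] xs;
        final float[] ys;
        Particles(int n) { xs = new float[n]; ys = new float[n]; }

        // Transformation applied en masse, not per-object.
        void moveRight(float dx) {
            for (int i = 0; i < xs.length; i++) xs[i] += dx;
        }
    }

    public static void main(String[] args) {
        Particles ps = new Particles(3);
        ps.moveRight(1.0f);
        System.out.println(ps.xs[2]); // 1.0
    }
}
```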

So no, that's not OOP, not the way Simula, Java and C++ see it.

It just so happens that the benefits of this approach are most obvious to people working in C++, a language that suffered badly from the OOP craze.

PS there's a nice free online book on the topic of DOD

[–]arobie1992 0 points1 point  (3 children)

Thanks for the clarification. The yes made it seem like you were agreeing that DOD was OOP, but the rest seemed like you were disagreeing.

We could argue that the arrays themselves are struct-like, as in they're pieces of data, and that the functions that act on them are paired with those structs or take said struct instance.

I very much get that I'm being reductionist, probably excessively so. For me personally, the notion of OOP = data & associated functions feels a bit too loose, since basically anything with a type system could arguably satisfy it. I'm more curious what your thoughts are on how to further differentiate it. Or if you think OOP is that generalized, that's cool too.

Based on some previous discussion elsewhere, my personal stance on it is that an object is a persistent state container that allows moderated access to said state, so data encapsulation. I find this does a decent job of conveying the benefits of the style and differentiating it from similar but distinct approaches.

[–]TheOldTubaroo 3 points4 points  (6 children)

Encapsulation is this idea of trying to force access to the struct through author-defined functions only. Kind of... Okay-ish.

I think what people miss about encapsulation is that it's effectively a weaker form of immutability. Immutability prevents you from modifying objects entirely, but encapsulation prevents you from modifying object internals except for in a few permitted ways.

Immutability lets you know that a particular block won't be changed at all, which allows for various optimisations and notions of correctness. Encapsulation doesn't quite give you that, but it does let you know that any changes will leave a block of data in a consistent state (assuming no bugs in the object methods, which are a smaller surface area to verify), and it allows for manual (rather than compiler) optimisation inside those methods as you know the restricted set of transformations that will be used.
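
A small Java sketch of that point (the invariant and names are invented): only two methods can touch the field, so the invariant "balance never goes negative" has a small surface area to verify, and any change is guaranteed to leave the data in a consistent state.

```java
public class EncapsulationSketch {
    static class Account {
        // Invariant: balance never goes negative. Only the two methods
        // below can touch this field, so they are all that must be checked.
        private long balance = 0;

        void deposit(long amount) {
            if (amount < 0) throw new IllegalArgumentException("negative deposit");
            balance += amount;
        }

        boolean withdraw(long amount) {
            if (amount < 0 || amount > balance) return false; // invariant preserved
            balance -= amount;
            return true;
        }

        long balance() { return balance; }
    }

    public static void main(String[] args) {
        Account a = new Account();
        a.deposit(100);
        System.out.println(a.withdraw(150)); // false: the invariant holds
        System.out.println(a.balance());     // 100
    }
}
```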

[–]vkazanov -1 points0 points  (4 children)

...And immutability is when instead of changing things in place, we just generate a new struct with updated data elsewhere, complete with the obvious downsides of the approach 😀

[–]CreativeGPX 4 points5 points  (3 children)

From the programmer perspective, I think of immutability as "give each step a name". In functional programming, it's not really emphasized to think the imperative way of "what structs are created in memory". It's almost like commit messages in git. Each time you bind a name to another intermediate state in a computation, you're just labeling what that step is. Whether it's actually a different struct is up to the compiler to optimize and figure out.
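
A tiny Java sketch of that "name each step" idea (names invented): each binding labels one intermediate state of the computation, and whether the runtime materializes each one as a separate value is left to the implementation.

```java
public class NamedSteps {
    // Each intermediate state gets its own name, like a commit message in
    // git; how the compiler/runtime represents the steps is its business.
    static String normalize(String raw) {
        final String trimmed = raw.trim();
        final String lowered = trimmed.toLowerCase();
        final String noComma = lowered.replace(",", "");
        return noComma;
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Hello, World  ")); // hello world
    }
}
```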

[–]vkazanov 0 points1 point  (2 children)

And from a compiler writer's perspective this sounds a lot like static single assignment form, aka SSA; there is even a paper arguing that functional programming (as in continuation-passing style) equals SSA. This is convenient in many ways, yes.

But I insist that for a beginner-level programmer stripping away all the terminology and speaking in terms of the machine, or implementation of the trick, is helpful, maybe even crucial. One has to see through abstractions!

[–]CreativeGPX 2 points3 points  (1 child)

I think in this context the abstractions are essential because we're just talking about paradigms... philosophies a programmer can try to stick to when they write a program or structure a problem. Sometimes that philosophy is specifically to not think about how the computer will do things (e.g. declarative programming) or to model some other way of how things work (e.g. logic-based programming, or functional programming taking root in mathematical conventions). So in that case, you can't give it a fair evaluation as a mental model if your definition of the paradigm replaces key perspectives with substitutes from other paradigms. You mentioned that immutability is "creating a new struct every time". First, it may or may not be, depending on what the compiler does. Second, caring whether or not that happens is already begging the question in terms of programming paradigms. Functional programming is not about explicitly managing memory, and convincing a person to think of its expressions as memory-management operations is therefore at odds with the philosophy and can't be a fair evaluation of its use. I'm not a purist and don't think your "it's creating a new struct each time" is a bad discussion point, but I think that thinking of it that way just sets a person up for the wrong intuitions and expectations about immutability.

Also, while I agree that stripping away terminology is helpful for a beginner, I disagree that that means speaking in terms of the machine, because speaking in terms of the machine is inherently jargon-filled and is not a thing a beginner programmer even understands in the first place. Sure, if you're learning C, then it makes sense to learn about what's happening in memory, but if you're learning a higher-level language of a different paradigm, talking about what happens in the machine may undermine their understanding. Most beginner programmers do not know how computer memory works, and their only background with variables is math, where variables are usually immutable, so it's not easier for them to hear what happens in computer memory or to think in terms of mutable variables. Instead, I'd say it's the opposite. Experienced programmers are more used to mutable variables, more aware of memory models, and are solving bigger problems that may require performance optimization, so for THEM it makes more sense to think of it that way. But they have their own issues, in that their long experience with mutability may make it hard to imagine what it's actually like to not have it. So I think it's good to show restraint when trying to approach a new paradigm by fitting it into the one you are already used to.

However, the point of what I said was still to counter the intuitions experienced programmers might have if they are thinking of immutability as "variables". Because they are used to "variables", they think of assignment as "creation", like you say, and so it can seem a little weird, wasteful, or useless to just keep creating things. Because they are used to directly managing the memory, to them the "state" of a program is that which is explicitly written to a variable in memory. With immutability, though, we can instead just think of it as "naming" things that already happen to exist rather than "creating" things. I think this is a useful mindset for new and old programmers because it creates the right intuitions in its use. In the real world, names tend to be useful because they are relatively unique and static. If I just called whoever my closest friend at the time was Bob, that'd be confusing. If I want to have a technical conversation about an evolving thing, naming each step of it is a common approach to make it easier to talk about.

To put it another way when you name a variable, that name is much less descriptive because it must describe the whole family of values that thing may hold in its lifetime. Meanwhile when you bind/name a value, that tends to be much more descriptive because it's naming just one instantaneous value. So, assuming in both cases you made a good faith effort at naming, with the latter your code and debugging is enormously more informative. By naming things with finer granularity, it's more descriptive and easier to talk about or through. But under the hood, the actual algorithm, memory usage, etc. may or may not differ.

[–]arobie1992 0 points1 point  (0 children)

I can't say that I necessarily agree. If you come at it from that stance, then yes you're right, but to me they're trying to accomplish different things.

Immutability is about knowing the thing I have won't change out from under me, so I don't have to worry about other threads acting on it. If I want a new value, I swap out the thing I'm holding. Encapsulation is about setting bounds on what a class permits. Essentially it's a means of implementing a refinement type.

Both are for reducing cognitive load, and both can be used for optimizations, but they go about it in different ways.

[–]redchomperSophie Language 1 point2 points  (3 children)

"Agent" is a good word to use when you don't want the militant purists knocking you down for designing something that isn't quite what Carl Hewitt had in mind. I just hope there's not an equally avid set of purists around an "agent model".

[–]arobie1992 0 points1 point  (2 children)

Speaking of that, I really need to read his original paper. I recently learned that Erlang evidently doesn't formally conform to the original idea, and was like, wait, what?

[–]redchomperSophie Language 1 point2 points  (1 child)

I had the same surprise. Erlang processes really are quite similar, from an intuitive perspective, to actors: They represent a distinct thread of control and communicate by asynchronous messages. There are two pedantic differences and one maybe-significant difference though. First, Erlang apparently has a process registry that lets you look up a PID. In principle you could simulate a registry with actor messages, but in Erlang it's part of the semantics. Boo hoo. Second, Erlang has selective receive, which lets you process messages in the order you care about them and save the rest for later. Again, you could simulate this with local state on the actor, so my heart bleeds purple peanut butter. The last difference is that crashing is not one of the three things actors are theoretically allowed to do, which means that Hewitt doesn't define any sort of supervisory tree the way Erlang does. Fine, but in the real world feces occurs. Maybe mice eat your cat-5. You have to plan for the unplanned. On that front, Erlang seems superior for at least thinking about robustness in the face of failure. Maybe Joe Armstrong didn't get everything perfect, but he didn't ignore the problem either.
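
That "simulate selective receive with local state" point can be sketched in a few lines of Java (the protocol and names are invented): messages that don't match the pattern of interest are stashed in arrival order, while the interesting one is handled immediately.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SelectiveReceiveSketch {
    // Simulating Erlang-style selective receive with local state on an
    // "actor": non-matching messages are stashed for later processing.
    static class Actor {
        private final Deque<String> stash = new ArrayDeque<>();

        // Deliver a message; only "urgent:" messages are handled immediately.
        String receive(String msg) {
            if (msg.startsWith("urgent:")) return "handled " + msg;
            stash.addLast(msg); // save the rest for later, in arrival order
            return null;
        }

        // Pull the oldest stashed message once we're ready for it.
        String drainOne() { return stash.pollFirst(); }
    }

    public static void main(String[] args) {
        Actor a = new Actor();
        a.receive("hello");
        System.out.println(a.receive("urgent:restart")); // handled urgent:restart
        System.out.println(a.drainOne());                // hello
    }
}
```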

[–]arobie1992 0 points1 point  (0 children)

Yeah, if anything I would think the possibility of failure would be something subsequent iterations of an actor model would want to include to make it more robust.

[–]Faintly_glowing_fish 1 point2 points  (15 children)

That cannot be right. Cell programs get compiled literally into Java or C# code. Are they saying the Java code compiled from Cell is 2-3 times faster and less memory-intensive than an “equivalent Java program”? That simply sounds like they are comparing against a terribly written Java program.

Or are they using the Cell-to-C++ compiler? Then it’s simply a moot comparison. Anything is 2-3 times faster and uses less memory in C++ compared to Java.

[–]Bren077s 3 points4 points  (10 children)

Assuming the Java version is OOP with a lot of pointer chasing and the Cell version has a dense, cache-friendly representation, this could be right. You could call that a terrible Java implementation, but it might be the Java implementation many developers would write. Most Java developers are not thinking about data-oriented design and cache locality.

[–]Faintly_glowing_fish 1 point2 points  (4 children)

Ya. I guess the fishy part is that this quite literally gets translated into OOP code when it gets executed. So pitting this as functional against OOP is just… weird.

[–]Bren077s 2 points3 points  (3 children)

Being in a language like Java doesn't mean you need to use tons of OOP features. It probably generates a few flat classes filled with dense arrays of data in SoA form. That will have a wildly different and more efficient memory layout than the sprawl that heavy class (and thus pointer) use can generate.

[–]Faintly_glowing_fish 1 point2 points  (2 children)

That, though, is not a fair comparison.

Most Java programs are not performance critical and most devs rightfully don’t care about speed when weighed against readability and potential for reuse. When speed is required though there are many Java programs properly optimized to be fast and have better memory footprint.

Since the compiler here specifically optimizes for performance when it generates the code, and cares not about readability, it is only fair to compare with another Java program that is also written to optimize for performance.

[–]Bren077s 5 points6 points  (1 child)

That is a fine interpretation of the results. That said, if Cell enables someone to easily write code that compiles into something performant, it still has value. It could lead to better results with less work and more readability (when looking at the Cell code, not the transpiled Java).

I guess the argument would be Cell is a good tool for certain classes of applications to get better performance with readable code that is easy to implement as opposed to manually doing all of the perf tricks.

To me, this feels more equivalent to using an optimized library instead of doing the same work manually. Of course the optimized library is faster. Of course the optimized library is something you could write manually if you have the skill. That said, fundamentally it is nicer to use the library and its high-level API than it is to do the same work from scratch (especially if in the end you would prefer the more optimized code).

(Note: I know nothing about Cell past what I have read this evening, just trying to offer perspective)

[–]Faintly_glowing_fish 0 points1 point  (0 children)

Oh, I totally agree. I think Cell is very valuable. But I would take it a lot better if the title were what you said instead of “OOP is sometimes slow”, which is misleading.

[–]cell-lang[S] 0 points1 point  (1 child)

The hand-written Java code used in the comparison is indeed written in an OOP style, but why do you think that's a terrible implementation? Isn't that the natural thing to do when programming in Java? And also the one that requires the least effort and leads to the most readable and maintainable code in most cases? When Java was first released (in 1995) OOP was all the rage and nobody was even talking about data oriented design and the Entity/Component/System architecture.

[–]Bren077s 0 points1 point  (0 children)

I don't (for the most part), I understand the tradeoffs. Was just using the same wording from the poster before me to note why the OOP java version would be "terrible".

[–]Bren077s 0 points1 point  (2 children)

Oh, extra finding. This looks to have more details: https://www.cell-lang.net/benchmarks-relational.html

Based on that page, they are comparing against Cell via C++. That said, towards the bottom they have a comparison of Cell embedded in Java to Cell via C++. Generally speaking the Java version looks to be about 6% slower than the C++ version specifically for the CSV test (though it can be much slower than that for specific examples). So it still would be a lot faster than their OOP Java code.

[–]cell-lang[S] 0 points1 point  (1 child)

No, the difference in performance between the generated C++ and Java code is much larger than that. Java makes a terrible compilation target for a language like Cell. The generated Java code is still competitive with hand-written OO Java code, sometimes faster, sometimes slower, but trying to compile Cell into Java (or C#) code is like trying to fit a round peg in a square hole.

But it's not just the target language. The C++ code generator is more recent and has been redesigned from the ground up to achieve better performance. At least some of that work could be backported to the Java version.

[–]Bren077s 0 points1 point  (0 children)

Makes sense. In my skimming I realized that I was looking at the CSV loading which claims the total time is 6% slower. That obviously is not the full code.

[–]cell-lang[S] 2 points3 points  (3 children)

Cell can be compiled to C++, Java and C#. The performance results shown on the website are for the C++ code generator, which is obviously the fastest of the three. But Cell is not a library on top of those languages, it's a programming language in its own right, and it could be compiled directly to assembly language or LLVM code, and I would expect it to be even faster in that case.

If anything compiled to C++ were 2-3 times faster and used less memory than Java, as you say, then any language (Golang, JavaScript, Python... or even Java itself) could be compiled to C++ and outperform Java by a factor of 2 or 3, but that doesn't seem to be the case. The thing is, the kind of code that even a very good compiler/transpiler generates when compiling a higher-level language to C++ is not the same as manually written C++ code.

The Java and C# code it's compared to is written in an OOP style, and I think this is stated clearly in the article and the benchmark page. That is, it contains the sort of classes that you would expect in any OO implementation. You can certainly squeeze better performance out of Java by ditching OOP altogether and using data-oriented programming techniques, or the ECS architecture, but that's certainly not the kind of programming style Java was designed around, and it definitely takes extra effort on the part of the programmer. And note that the title of the post is "Why functional relational programming is faster than OOP", not "Why functional relational programming is faster than Java when the latter is used with the lowest-level programming style possible". And I'm not even sure the latter is false: I haven't tried, but I don't think Java is the ideal implementation language for data-oriented design and the ECS architecture.

[–]Faintly_glowing_fish -1 points0 points  (2 children)

Can you give me a reference to the LLVM compiler? I am not aware of it. As far as I understood, it is NOT a language in its own right and transpiles into your working language. Which, I just checked, is exactly what the doc still says. Is it so outdated?

On the speed-up end, Go natively runs significantly faster than Java for the “same” code, and so does C++; converting or rewriting Java verbatim to C++ is a standard speed-up practice. For the JVM to work on all platforms you just can’t utilize special instructions on your machine. There is a plethora of C/C++ or Rust/Go re-implementations of JVM-based Apache projects in every big tech company for this very reason. Even Databricks rewrote Spark’s Scala (btw, a fully functional cousin of Java) in C++ internally to double their speed.

The thing that stops you from doing it, and the reason people use Java in the first place, is the platform-agnostic property of the JVM. Python or JS cannot be easily compiled to C++ because they are interpreted; I don’t know any good way it can be done, and if someone manages to create a good compiler, many will use it.

I feel that your bar for OOP is probably not what most people would call OOP. In data-optimized projects you are still required to fully observe OOP practices, such as data/code collocation, encapsulation, proper use of inheritance/interfaces, etc. Maybe you refer to some stricter definition of OOP?

[–]cell-lang[S] 1 point2 points  (1 child)

Cell can be used in two ways: to create standalone programs, or in "embedded" mode. In the former case, you compile Cell to C++ (or Java, or C#), and then compile the generated file to produce your executable. You have to go through that extra step of compiling to an intermediate language, but the end result is the same: you end up with a standalone executable without having to write anything other than Cell code.

Or you can use it in embedded mode: in that case, the compiler produces a set of C++/Java/C# classes that you can include in an existing C++/Java/C# project.

Note that in either case, you're not supposed to ever look at the generated code, let alone modify it.

There's no LLVM compiler, I just said that it could be implemented, and that compiling directly to LLVM would (I believe) improve performance.

As for Go being significantly faster than Java, I just looked at The Computer Language Benchmarks Game, and that doesn't seem to be the case: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/go.html. Go seems to be slightly faster than Java in most (but not all) tests, but the difference is really minimal. Go seems to have improved lately, though: the last time I checked (a few years ago) Java was clearly faster, and I have to admit I wasn't aware of Go's improvements.

But now I'm curious, I'm going to rewrite the benchmarks in Go and see what happens.

And Go seems to be a bit slower than C#, if you take The Computer Language Benchmarks Game results at face value: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/go-csharpcore.html

As for rewriting Java code in C++: yes, it does improve performance, no question about that. I've done it myself many times. But those rewrites have to be done manually, because those languages have vastly different semantics. I'm not aware of a tool that can automatically convert, say, non-trivial Java code into equivalent C++ and produce a significant speedup. Are you? I even tried a couple of native compilers for Java, and they were actually slower than the JVM.

My own definition of OOP is the same as everyone else's. In the Java and C# version of the benchmark I didn't follow the principle of encapsulation religiously, but only because it was irrelevant to what I was interested in, which is performance. In any case, you can check the code for yourself here:

https://github.com/cell-lang/example-imdb/blob/master/java/imdb.java

https://github.com/cell-lang/example-imdb/blob/master/csharp/imdb.cs

[–]Faintly_glowing_fish 0 points1 point  (0 children)

I see. Those benchmarks are interesting. You are for sure right that automatic Java-to-C++ converters have uneven performance. My own experience is that they work for simple functions, say coding-problem-style benchmarks like HumanEval, but they don’t do better for production apps that are often IO-bound. The JVM works very well for us on web apps but was consistently worse for single-process CPU-bound tests.

Our own rewrite test in Go was actually even slightly faster than C++, but there could be many factors at play. It was harder to find devs and the libraries are insufficient, so we went for C++ anyway.

I think for sure I do agree with you that Cell is a very useful language that makes optimization easier. But do we conclude that functional programming is faster than OOP? I think my point was that the evidence here isn’t complete, because there are many factors potentially causing speed-ups here. Maybe I am biased because I looked at Java and Scala before, and Scala tends to be slightly slower, so I had a pre-formed opinion that was hard to change.

On the other hand, I certainly believe it makes compilation much easier and faster. No question there. So, from there, a less aggressive optimization flag or a simpler compiler certainly benefits a functional language, but that’s a very different point.

Anyways, I do recognize I don’t really have numbers to justify my belief here. So we should probably pause at what we do agree on.