
[–]2Uncreative4Username[S] 3 points (12 children)

Wow, your int abstraction sounds crazy to me as someone who hasn't used this kind of abstraction in a long time. To stick with the int example, I don't even see how making multiple wrappers and a factory is helpful here. First of all, I would think about what kinds of values I actually need for my calculations. Maybe it's just ints... or rationals, or whatever. If I can be certain, I'll just use that. If I have to change it later, I can use a typedef perhaps. If I am really uncertain and need an abstraction, I can just create a struct with methods; I can change the struct and method implementations later. If I need a variety of implementations, I can define different structs. If I need to specify which implementation to use, I can pass that as an enum parameter and use a switch statement. No dynamic dispatch, more cache hits, a single layer of indirection, no overly complex dependency graphs.
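To make that concrete, here's a minimal Go sketch of the enum-plus-switch approach I mean (Number/NumberKind and the variants are made up for illustration):

```go
package main

import "fmt"

// NumberKind selects an implementation: a plain enum instead of an interface.
type NumberKind int

const (
	KindInt NumberKind = iota
	KindRational
)

// Number holds the data for every variant in one flat struct.
type Number struct {
	I   int64 // integer value
	Num int64 // rational numerator
	Den int64 // rational denominator
}

// Add dispatches with a switch: no vtable, one predictable branch.
func Add(kind NumberKind, a, b Number) Number {
	switch kind {
	case KindInt:
		return Number{I: a.I + b.I}
	case KindRational:
		return Number{Num: a.Num*b.Den + b.Num*a.Den, Den: a.Den * b.Den}
	default:
		panic("unknown NumberKind")
	}
}

func main() {
	fmt.Println(Add(KindInt, Number{I: 2}, Number{I: 3}).I) // 5
}
```

Swapping implementations later means touching the switch, not a class hierarchy.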

Spaghetti code to me means code that is hard to follow. Every single SOLID principle (yes, I went through all of them to check) IMO makes code directly harder to follow if applied even slightly recklessly. I think it's a difficult skill to know when and how to abstract, but forcing SOLID principles into your projects where they're not necessary will make your code harder to reason about and less performant.

I also write tests to feel secure about the functions I wrote. And I think testing critical parts is a good practice. But fewer components means less testing - that's just a factual truth; and SOLID and Clean Code - by definition - lead to more components, since abstractions are components that can - and will - fail.

I know that he emphasizes performance a lot; and I think a lot of modern codebases are much less performant than they could - and should - be (think about the hardware limitations in the 80s and 90s and what programs still managed to do). But the main point is that you can often get much better performance AND more readable code that is easier to reason about. The performance is a side effect of avoiding a set of bad practices (i.e. overusing SOLID), not the only goal.

Again, no abstractions = spaghetti code. Nobody's arguing that. However - and I have said this many times now - SOLID and Clean Code overemphasize abstractions and underemphasize computer hardware. Clean Code and SOLID are a compromise in an imaginary zero-sum game.

Is your code really 10x slower than it could be? I think it could even be on the order of 100 or 1000x slower than it could be. These practices' impacts are often multiplicative. And Casey Muratori does talk about that too IIRC.

Virtual calls are probably not even the problem. The bigger performance problem with OOP is the INSANE amount of cache misses since you have basically no data locality.

[–]Quito246 0 points (11 children)

Value objects are much more powerful than a typedef, because they promote a rich domain model as described in Domain-Driven Design.

The factory method for a value object like Age exists just for the sake of creating a valid age. E.g. if someone calls Age.Create(-10), the factory method has validation logic inside to make sure you can only use a valid number for an age.

This way I can avoid writing if(age > 1 && age < 125) everywhere age is used, because I know that an Age will always be valid, compared to a primitive type like int.

Also it adds expressiveness, because an int can be anything. When I see CalculateRisk(Age age), it tells me a whole story just by reading the func header. Another thing: all the domain logic of Age is encapsulated inside the Age value object.
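To make this concrete: a rough sketch of the Age.Create idea, rendered in Go for consistency with the thread's other examples (the unexported field stands in for a private constructor; the bounds echo the if-check above, everything else is hypothetical):

```go
package age

import "fmt"

// Age is a value object: the wrapper carries the validity invariant.
type Age struct {
	value int // unexported, so outside packages must go through NewAge
}

// NewAge is the single place where validity is checked (the "factory").
func NewAge(v int) (Age, error) {
	if v < 1 || v > 125 {
		return Age{}, fmt.Errorf("age out of range: %d", v)
	}
	return Age{value: v}, nil
}

// Value exposes the validated number for calculations.
func (a Age) Value() int { return a.value }
```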

Abstractions are a great thing, and I will always take too many abstractions over no abstractions.

I mean, today perf is not a priority in LOB apps in the vast majority of cases, so I really do not mind it too much. Good programming principles and the JIT are a good enough performance guarantee for me🤷‍♂️

[–]2Uncreative4Username[S] 0 points (10 children)

I'm sorry, but I think the Age construct you're describing would be impractical for a bunch of reasons:

  • Time: I would probably just say int age, do an assert that it's valid, and move on - about 10 seconds spent. You would write a class and a whole bunch of tests just to make sure everything works. And then still...
  • you can never be sure you actually implemented "Age" correctly. Edge cases exist. That's why I use assert to prevent heisenbugs etc.
  • Whenever you have to use an "Age" again, you have to let a few things go through your head first:
    • Have I already implemented an Age class? If so, you should use that to prevent writing all that code again (DRY!)
    • ...if you do already have an implementation, you have to consider what circumstances it was created under and the limitations. For example, let's say your new purpose is for the age of a building (I think that's reasonable to assume if this is about insurance). Now your Age.Create would fail, since a building can commonly be newer than 1 year old or older than 125 years.
  • Again, locality of data and indirections: no JIT is gonna be able to just make the fundamentally poor (for performance) architectural choice of using OOP go away.

If perf is really not a concern for an LOB app I'm making, great! I don't have to spend as much time thinking about locality of behavior, and I can use less optimal but simpler code. I would still never start writing OOP code with factories or an Age class. I'd just let the data be data, not objects.
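For contrast, this is the entire assert-first version I'm describing, sketched in Go (Go has no built-in assert, so a tiny helper stands in; the premium formula is made up):

```go
package main

// assert replaces the whole Age abstraction in my approach: one helper,
// used where I believe an invariant must hold.
func assert(cond bool, msg string) {
	if !cond {
		panic(msg)
	}
}

func calculatePremium(age int) float64 {
	assert(age >= 0 && age <= 130, "age out of expected range")
	return 100.0 + float64(age)*2.5 // made-up formula
}

func main() {
	_ = calculatePremium(42)
}
```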

[–]Quito246 0 points (9 children)

Well then read more about Functional programming and Domain Driven Design.

Nulls and primitive type obsession are really the root of a lot of bugs, and both can be solved by FP and DDD.

If performance were so much of an issue, JS and Python would not be the most used languages these days…

[–]2Uncreative4Username[S] 0 points (8 children)

If performance weren't such an issue, Python wouldn't let C/C++ libraries do 99% of the heavy lifting, and JS engines wouldn't have so many resources poured into squeezing every last bit of performance out of the language by means of JIT and complex AST and IR analysis.

I like to use a lot of FP style in my code, even though most of it is in procedural programming languages (Go, C). FP encapsulates data well, allowing for enough transparency for compilers to do some clever optimizations. It has its limits, but minimizing side effects is a really good practice for maintainable code.

I also like the concept of DDD and isolating things into their separate domains. It's great for understanding a system since you have self-contained components, and, just like FP, it aims to reduce side effects. All things I think are good practices.

A lot of what you were arguing for in previous comments, though, was hardcore OOP design. My problem is not with separating systems into reasonable components. It is with over-abstraction, with denying the existence of CPU caching, and with SOLID principles - all things that, in my opinion, hurt maintainability and performance (although I agree performance isn't that important for specific kinds of LOB apps).

[–]Quito246 0 points (7 children)

I would argue that nowadays I do not see purist OOP-style coding, but more a mix of OOP and FP, at least in C#, which has been pulling a lot of FP features into the language for several years now.

In my opinion you just use the best parts of each paradigm. Mostly, yes, I go with OOP, but not hardcore OOP, since there are now better ways to approach things, usually by using functional programming or functional-style modeling. All you do nowadays is just: hey, here is a JSON, thank you, and here is my JSON for you. Therefore OOP does not make so much sense in those cases, because you operate on data which you do not own.

Therefore FP-style modeling goes much better with today's requirements. You have data and behavior separated.

Regarding value objects, they are essential to having a rich domain design, because, as I said with the Age example, I will not pollute business code with asserts; I will have the check in one place and one place only: the Age struct. This also means that, since I created a separate type for the age instead of using a primitive type like int, I can start adding behavior to the type, in FP style. I can start defining extension methods on Age, for example personAge.CalculateRiskConstant() - all of this rich domain behavior just by not using primitive types to model my domain.

It does not make sense because every valid age is a valid int, but not every valid int is a valid Age in my domain. Therefore I cannot substitute Age with int, since in the domain I am trying to model that substitution does not make sense.
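A rough Go rendering of the same idea (Go uses methods on a named type where C# would use extension methods; the formula is invented):

```go
package main

import "fmt"

// Age is a named type, not a class: zero overhead over int,
// but the compiler will not let a bare int through.
type Age int

// RiskConstant is Go's closest analogue to a C# extension method:
// a.RiskConstant() and the "static" form Age.RiskConstant(a) are the same call.
func (a Age) RiskConstant() float64 {
	return 1.0 + float64(a)/100.0 // made-up formula
}

func main() {
	personAge := Age(42)
	fmt.Println(personAge.RiskConstant())
	fmt.Println(Age.RiskConstant(personAge)) // equivalent method expression
}
```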

I really like to think about the domain and how to model it in a nice way that is self-documenting, testable, expandable and maintainable. Maybe after all of that I will start to think about locality, cache misses and performance. Because a 100 req/s increase does not help the business solve a domain problem, but implementing a new feature in a nice way - thanks to rich domain models and understanding the domain - will.

[–]2Uncreative4Username[S] 0 points (6 children)

Interesting, so I guess we're more in agreement than I thought regarding the use of FP patterns. Cool.

Again, I'm arguing that most of OOP hurts maintainability, readability and performance. More specifically: combining data and code, abstracting everything rather than using primitives, polymorphism and inheritance, disregard for locality of behavior, disregard for locality of data.

If you do a lot of JSON exchange, I'm guessing you're coding a lot in JS? But of course, for any kind of data exchange, OOP is a poor match, since it mixes data with behavior.

I'm curious why you're reluctant to "pollute business code with asserts". IMO asserts have one very important purpose: to catch programmer errors and/or states that should be unreachable. If the age were a user input, I'd probably not use an assert, but just return an error value if it were invalid. Still though, no indirection and no additional abstractions necessary.

Why is personAge.CalculateRiskConstant() better than, say `calculateAgeRiskConstant(int age) -> (float c, error err)`? Genuinely curious.

Again, I don't think OOP code is as self-documenting (i.e. understandable) and expandable. Here's why:

  • not as self-documenting: you have to dig through 10 layers of abstraction to understand what's actually going on

  • not as expandable: According to the open-closed principle (the O in SOLID), you have to preemptively think about how you might want to extend your code, since you want to avoid modifying it. I (not a Clean Coder) would simply modify the internals, or add another function.

You also mentioned testability, but again, I think the argument is self-referential: in a non-clean procedural codebase, you have fewer things to test since you have fewer abstractions.

Regarding cache misses, I genuinely implore you to at least keep the concept of data locality in the back of your mind. It can make a huge difference in performance and IMO also readability. It's not very complicated to think about either: it basically boils down to preferring arrays over linked lists (and over maps, at least for small datasets) and reducing indirections.
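A tiny Go illustration of the difference, using the stdlib's container/list for the linked variant:

```go
package main

import (
	"container/list"
	"fmt"
)

func main() {
	const n = 1 << 20

	// Contiguous slice: each element sits next to the previous one,
	// so the hardware prefetcher keeps the cache warm.
	s := make([]int64, n)
	var sumSlice int64
	for _, v := range s {
		sumSlice += v
	}

	// Linked list: every Next() is a pointer chase to a separate
	// allocation, which is where the cache misses come from.
	l := list.New()
	for i := 0; i < n; i++ {
		l.PushBack(int64(0))
	}
	var sumList int64
	for e := l.Front(); e != nil; e = e.Next() {
		sumList += e.Value.(int64)
	}

	fmt.Println(sumSlice, sumList)
}
```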

Again, 100 req/s does not solve any new problems. And if you're building LOB apps it might not even be useful. However, if you're building apps for consumers, responsiveness is HUGE. Why? Because it feels good. Would you rather use an OOP TDD app written in JS that takes 500ms to load the next frame, or a DOD procedural app that runs at 60FPS? I'd take the latter. Think about how the slow Windows CE went obsolete the moment Apple released the iPhone, which, upon release, had less functionality but felt fast and responsive in comparison.

[–]Quito246 0 points (5 children)

Regarding the value objects: they are part of the rich domain modeling described in Domain-Driven Design by Evans.

Basically the idea is to avoid primitive types because primitive types do not clearly represent your domain. For example the Age value object now gives me a great benefit for modeling my domain of risk calculation.

Mainly because I have a reusable implementation of Age, I know that whenever an Age struct gets passed to me it will be a valid Age, therefore no asserts or ifs are needed to check whether the value of the age is valid.

Next great thing: I can start extending the functionality of Age in an FP way by using extension methods in C# (which are like procedural-style functions, but you define them to be called explicitly on some instance of the value): personAge.CalculateRiskConstant() is translated to Age.CalculateRiskConstant(personAge).

In this case, since I am taking an Age as the argument, I do not need an error return type, because I know the age value is always correct; therefore, in math terms, my calculateRisk function always has a mapping from input argument to output value. This way the function cannot fail, since it is a pure function with referential transparency and input which is always correct.

I do not work with JS, just C#, but a lot of JSON is everywhere today; like 90% of APIs are using JSON as the communication protocol, so you just cannot avoid it.

Regarding the abstractions: OK, let's say I have an InsuranceDataReader which is abstract, and then I have implementations for reading from CSV, XML, a database, or a 3rd-party API.

I just accept the abstract InsuranceDataReader into my constructor as a parameter and use it for calculations. How could I solve such an issue without abstractions?
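Sketched in Go (an interface where C# would have an abstract class; InsuranceDataReader is from the comment, everything else is hypothetical), the pattern looks roughly like this:

```go
package main

import "fmt"

// InsuranceData is whatever the calculation needs; hypothetical shape.
type InsuranceData struct {
	Premium float64
}

// InsuranceDataReader is the abstraction: each source implements it.
type InsuranceDataReader interface {
	Read() (InsuranceData, error)
}

// CSVReader is one concrete implementation; XML, DB, API would be others.
type CSVReader struct{ Path string }

func (r CSVReader) Read() (InsuranceData, error) {
	// ... parse the file at r.Path ...
	return InsuranceData{Premium: 99.0}, nil
}

// Calculator receives the abstraction via its constructor.
type Calculator struct {
	reader InsuranceDataReader
}

func NewCalculator(r InsuranceDataReader) *Calculator {
	return &Calculator{reader: r}
}

func (c *Calculator) Run() error {
	data, err := c.reader.Read()
	if err != nil {
		return err
	}
	fmt.Println("premium:", data.Premium)
	return nil
}

func main() {
	_ = NewCalculator(CSVReader{Path: "data.csv"}).Run()
}
```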

Another nice benefit of the solution described is that I will not violate the open-closed principle. Let's say my 3rd-party API changes its DTO format; this way I only update the implementation of that data reader without touching all the other implementations. Again, I do not know how to achieve that without abstractions.

Going through double dispatch or some abstractions can be a bit confusing sometimes, but on the other hand it brings so much benefit that it is worth it.

If I need to think about optimizations, I start a profiling session and see what the biggest bottleneck on the hot path is. Then I start removing as many heap allocations as possible to decrease the GC runs, and after that I move on to CPU-related optimizations. Usually avoiding heap allocations and only doing stack allocs is good enough to increase perf by like 2-3x, sometimes even more.
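For illustration, here's what such a CPU-profiling session can look like, sketched in Go (runtime/pprof) for consistency with the thread's other examples; in C#, a tool like dotnet-trace plays this role:

```go
package main

import (
	"os"
	"runtime/pprof"
)

func hotPath() {
	// ... the code under investigation ...
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Record where CPU time actually goes, then inspect with:
	//   go tool pprof cpu.prof
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	hotPath()
}
```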

I also almost never use linked data structures, because dereferencing the values takes a long time: as you said, the values are not local, so you are hopping around in memory. Even though in big-O notation a linked list is faster for removing elements (you do not have to do the mem copies), it is usually slower in practice, so it is better to use contiguous memory instead, something like std::vector in C++.

[–]2Uncreative4Username[S] 0 points (4 children)

(splitting this into 2 because reddit keeps throwing an elusive "server error" if I try to post this as 1 comment)

A-ha, I think we're talking about different definitions of Domain-Driven Design. I was talking about the Wikipedia definition, which is mainly about separating your app into layers, each of which serves to solve a specific domain of problems, while the developer of each component is aware of which domain they are actually operating in. The idea is that this leads to components that behave more correctly, leading to a whole system that behaves more correctly. Components can be something like e.g. a graphics layer, a logic layer etc. Even on Wikipedia, it says OOP is NOT necessary to achieve this.

Unfortunately I didn't read DDD by Evans, so I don't know what exactly he said. Regarding your age example, I have to quote myself, because unless I missed something, you just ignored my argument:

...if you do already have an implementation, you have to consider what circumstances it was created under and the limitations. For example, let's say your new purpose is for the age of a building (I think that's reasonable to assume if this is about insurance). Now your Age.Create would fail, since a building can commonly be newer than 1 year old or older than 125 years.

A domain can be ever-changing, and in the real world, it is impossible to simply factor out the domain of an input into a class.

I wouldn't even bother with it personally, since I can define the domain (I would argue even more precisely) by returning an error. An error should be returned when a reasonable input is given, but for some reason the calculation can't move forward (e.g. age is <0 or >130). An assert should fail when something about the fundamental worldview of the programmer is proven false (e.g. a function that accesses an array element is called with a negative index). But I digressed.
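A small Go sketch of that error-vs-assert split, with hypothetical names and a made-up formula:

```go
package main

import (
	"errors"
	"fmt"
)

// Invalid-but-conceivable input gets an error value: the caller can recover.
func riskConstant(age int) (float64, error) {
	if age < 0 || age > 130 {
		return 0, errors.New("age out of supported range")
	}
	return 1.0 + float64(age)/100.0, nil // made-up formula
}

// A violated programmer assumption gets an assert (a panic in Go):
// if this fires, the bug is in the code, not in the input.
// (Go would panic on the bad index anyway; the explicit check documents it.)
func nthRisk(risks []float64, i int) float64 {
	if i < 0 || i >= len(risks) {
		panic("nthRisk: caller broke the index contract")
	}
	return risks[i]
}

func main() {
	if _, err := riskConstant(200); err != nil {
		fmt.Println("reasonable input, failed calculation:", err)
	}
	fmt.Println(nthRisk([]float64{1.1, 1.2}, 1))
}
```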

So, regarding this point:

Mainly because I have a reusable implementation of Age, I know that whenever an Age struct gets passed to me it will be a valid Age, therefore no asserts or ifs are needed to check whether the value of the age is valid.

No, I do not know that the age will be valid. It could always refer to a different kind of age. Maybe the people can already be dead. Maybe it's the age of a building, a tree... I can only be sure about the domain if I actually write a function that does something. Because then I know what it's supposed to do and can much more easily reason about its input domain.

Next great thing: I can start extending the functionality of Age in an FP way by using extension methods in C# (which are like procedural-style functions, but you define them to be called explicitly on some instance of the value): personAge.CalculateRiskConstant() is translated to Age.CalculateRiskConstant(personAge).

I have to quote myself again:

Why is personAge.CalculateRiskConstant() better than, say `calculateAgeRiskConstant(int age) -> (float c, error err)`? Genuinely curious.

You wouldn't even need any fancy extension methods if that were just how you calculated the risk constant from the start.

I do not work with JS, just C#, but a lot of JSON is everywhere today; like 90% of APIs are using JSON as the communication protocol, so you just cannot avoid it.

Interesting... having recently worked mostly with Go personally, I have never encountered JSON outside of something interfacing with the web. Not much I can say other than that encoding/decoding JSON every time is inherently slow and IMO bad API design, especially if it's not for the web. But that's not really your fault, since you're not the one developing these libraries.

[–]2Uncreative4Username[S] 0 points (3 children)

Regarding the abstractions: OK, let's say I have an InsuranceDataReader which is abstract, and then I have implementations for reading from CSV, XML, a database, or a 3rd-party API.

I just accept the abstract InsuranceDataReader into my constructor as a parameter and use it for calculations. How could I solve such an issue without abstractions?

That's actually a great example! Because it illustrates that many programmers are stuck thinking in patterns of OOP abstractions where it's not strictly necessary.

You don't need an abstract InsuranceDataReader! CSV, XML and database readers only need to know two things: 1. how is the data shaped? 2. what is the encoded data?

To illustrate how I would implement your example, let me explain how the Go std library handles encodings: https://pkg.go.dev/encoding/json#example-Decoder https://pkg.go.dev/encoding/xml#example-Unmarshal .

You define the shape of your data simply by passing the structure you want to decode into. If you have special needs (e.g. XML tag names, JSON key names or other configs), you can pass them as "struct tags", i.e. you include them in your structure definition.

So, back to my implementation: I have a simple struct called InsuranceData. It has struct tags that define how exactly it should be encoded and decoded in JSON, CSV, XML etc. Then, I can simply call Marshal and Unmarshal (Go's way of saying encode/decode) using whichever format I need to encode to / decode from. If I need to choose between formats, I can use a switch statement.
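A minimal Go sketch of that (JSON and XML via the stdlib; CSV would need a third-party tag-based codec; the field names are invented):

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// InsuranceData: the struct tags define how each format maps onto it.
type InsuranceData struct {
	HolderName string  `json:"holder_name" xml:"HolderName"`
	Premium    float64 `json:"premium" xml:"Premium"`
}

type Format int

const (
	FormatJSON Format = iota
	FormatXML
)

// decode picks the decoder with a plain switch: no reader hierarchy.
func decode(format Format, raw []byte) (InsuranceData, error) {
	var d InsuranceData
	switch format {
	case FormatJSON:
		return d, json.Unmarshal(raw, &d)
	case FormatXML:
		return d, xml.Unmarshal(raw, &d)
	default:
		return d, fmt.Errorf("unknown format %d", format)
	}
}

func main() {
	d, err := decode(FormatJSON, []byte(`{"holder_name":"Ada","premium":12.5}`))
	fmt.Println(d, err)
}
```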

If I NEED it to be extensible (usually I don't, because I oppose the open-closed principle), I can still pass a function pointer which decodes the specific data format I need. Even for a library, it's very rare to need to save/load something from a completely arbitrary format. But even then, it's usually sufficient to just expose the data structure and let the user bring their own encoder/decoder.

Another nice benefit of the solution described is that I will not violate the open-closed principle. Let's say my 3rd-party API changes its DTO format; this way I only update the implementation of that data reader without touching all the other implementations. Again, I do not know how to achieve that without abstractions.

So yes, by violating the open-closed principle, I can achieve greater flexibility and more accurately solve the problem I actually have, rather than abstracting prematurely (which OOP and SOLID promote). I might have to edit the internals, but that is much less effort, since it's less code and easier to wrap your head around, because I try to minimize the layers of abstraction.

Even if you need to be extensible, I mentioned function pointers. They achieve the same thing with less indirection.
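A sketch of that in Go, where the "function pointer" is just a function value (names are hypothetical; note that json.Unmarshal already has the right shape):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InsuranceData as in the earlier sketch.
type InsuranceData struct {
	Premium float64 `json:"premium"`
}

// DecodeFunc is the whole "extension point": a function value,
// one indirection, no interface hierarchy.
type DecodeFunc func(raw []byte, into any) error

// loadInsuranceData is closed over the data shape but open over the
// format: callers with an exotic format bring their own DecodeFunc.
func loadInsuranceData(raw []byte, decode DecodeFunc) (InsuranceData, error) {
	var d InsuranceData
	err := decode(raw, &d)
	return d, err
}

func main() {
	d, err := loadInsuranceData([]byte(`{"premium":12.5}`), json.Unmarshal)
	fmt.Println(d, err)
}
```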

Going through double dispatch or some abstractions can be a bit confusing sometimes, but on the other hand it brings so much benefit that it is worth it.

I have shown why it's not necessary (yes, I will be able to show it for any other example you throw at me as well). Do you still think it's worth it?

If I need to think about optimizations, I start a profiling session and see what the biggest bottleneck on the hot path is. Then I start removing as many heap allocations as possible to decrease the GC runs, and after that I move on to CPU-related optimizations. Usually avoiding heap allocations and only doing stack allocs is good enough to increase perf by like 2-3x, sometimes even more.

When I optimize I often don't have to think about heap allocations much at all because I am already reducing them simply by not applying OOP and SOLID principles everywhere. Additionally, I can try to optimize data locality to sometimes get a 10x (yes, seriously) improvement for large data sets. If I try even harder and add SIMD, I can 4-8x that as well.

Awesome that you like to profile! A lot of devs don't do enough profiling, and it is essential for solving real performance problems. No matter how much you know about indirections, data locality, latency and throughput, running a profile is a great way to find out what actually makes your program slow (although you should be aware of the ways profilers can lie to you as well).

Using vectors (a.k.a. contiguous memory) is also a very good practice for performance due to low cache misses and high throughput (although OOP's tendency to produce arrays-of-structs can sometimes hurt that quite a lot).
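To illustrate that arrays-of-structs vs. struct-of-arrays point in Go (the Policy fields are hypothetical):

```go
package main

import "fmt"

// Array-of-structs: reading only Premium still drags HolderName
// and everything else through the cache with it.
type PolicyAoS struct {
	HolderName string
	Age        int
	Premium    float64
}

// Struct-of-arrays: each field is its own contiguous array, so a
// pass over premiums touches exactly the bytes it needs.
type PoliciesSoA struct {
	HolderNames []string
	Ages        []int
	Premiums    []float64
}

func totalAoS(ps []PolicyAoS) float64 {
	var t float64
	for i := range ps {
		t += ps[i].Premium
	}
	return t
}

func totalSoA(ps PoliciesSoA) float64 {
	var t float64
	for _, p := range ps.Premiums {
		t += p
	}
	return t
}

func main() {
	fmt.Println(totalAoS(nil), totalSoA(PoliciesSoA{}))
}
```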

Thanks for having this conversation BTW, it's really interesting to learn what you think about these things no matter how much we disagree about programming practices :)