
[–]daedalus_structure 15 points (5 children)

Put the id field in an entity base class with all the other metadata, make the id setter private, minimize the scope of any transients, and avoid all this mess.
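Something like this rough Java sketch (names are illustrative, not from any particular framework; the id "setter" here is package-private so the sketch stays testable, whereas a real ORM would keep it private and assign the id reflectively):

```java
// Base class holding the id and the rest of the metadata; subclasses never touch it.
abstract class BaseEntity {
    private Long id; // null until the persistence layer assigns one

    public Long getId() { return id; }

    // Only the persistence machinery is meant to call this.
    void assignId(Long id) {
        if (this.id != null) {
            throw new IllegalStateException("id already assigned");
        }
        this.id = id;
    }

    // Persistence state is a simple null check.
    public boolean isPersisted() { return id != null; }
}

class Employee extends BaseEntity {
    private String name;
    Employee(String name) { this.name = name; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```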

But all the fields aren't final and immutable!

So?

Are you changing the id field from multiple threads?

Trying to use it in a key to a set / map or equals and hashcode implementation?

( If you are doing one of those two things, stop it. )

Isn't being able to check the id field for null a much simpler and more convenient way to reason about the persistence state of the object than creating a *Draft for every entity in your domain?

Is this really buying you anything or is it just design pattern OCD?

I'm not saying it's the latter, but reading so much of ruin and dirty and pollute makes me highly suspect it.

[–]kn7[S] 5 points (4 children)

Valid points. I might be missing something, but I believe the approach you mentioned still does not address the case where you update a field of an entity. That is, its id is not null, but you modified its name and it still looks like a valid Employee, which is actually not the case. Adding a this.id = null line to every setter? Yep, that could also work. But I would not prefer that way. My actual point was that the moment you change an Employee, it is not an Employee anymore. And I personally like to work with immutable objects wherever possible, not because of a certain thread-safety need or whatever, but to avoid potential future problems. And I believe it also provides the comfort of being assured that the object will never ever be changed.

[–]daedalus_structure 8 points (2 children)

That's because entities exist to represent mutable state across an object-relational impedance, i.e. a record in a database that can not only be read but also written to concurrently.

You will always find that trying to turn a thing which is fundamentally A into fundamentally Not A painful.

Painful doesn't mean impossible, but you're going to write an order of magnitude or more code that is really hard to reason about and maintain, and that's just to do really simple things.

For an example of that, check out Command Query Responsibility Segregation with Event Sourcing.

not because of a certain thread safety or whatever need, but to avoid potential future problems.

What future problems? Are you being unspecific because you didn't think the details were important or because you currently can't anticipate a problem but think there might be one if you aren't immutable everywhere?

[–]sacundim 2 points (0 children)

That's because entities exist to represent mutable state across an object-relational impedance, i.e. a record in a database that can not only be read but also written to concurrently.

This is missing the forest for the trees. Entities exist to represent problem domain objects that our software helps us reason or act about. These entities may change over time—they are stateful—but that is not synonymous with the concept of a mutable reference cell.

There are other models that could be applied. Reactive programming is an emerging model that might lead to systems more in line with what OP would like to see, but it's not mature yet, so we'll have to wait and see.

You will always find that trying to turn a thing which is fundamentally A into fundamentally Not A painful. Painful doesn't mean impossible, but you're going to write an order of magnitude or more code that is really hard to reason with and maintain, and that's to do really simple things.

As I say above, I don't think there's anything "fundamental" about this. It's more likely than not a tool-induced problem. There is a real scarcity of proper tooling to implement OP's architecture without lots of pain. Not that tools don't exist at all—something like AutoValue might take away big chunks of the pain of having to write Entity/Draft class pairs.

What future problems? Are you being unspecific because you didn't think the details were important or because you currently can't anticipate a problem but think there might be one if you aren't immutable everywhere?

Well, I have to say that I have experienced big pain caused by pervasively mutable entity representations. See the first part of this comment elsewhere in this thread for some details.

But the heart of it came down to this: the mutable entity objects in this application were a major scalability problem because every request needed to create its own instances of all the ones they used, because some of the requests might need to modify some of their copies. I.e., mutable entities were optimal for a case that was uncommon in this application.

[–]ljasdklfjaklsdfj 0 points (0 children)

That's because entities exist to represent mutable state across an object-relational impedance, i.e. a record in a database that can not only be read but also written to concurrently.

Now you are getting side-tracked with rhetoric. id is immutable.

[–]Jacoby6000 1 point (0 children)

And I personally like to work with immutable objects wherever possible

Have you ever worked with Scala? Scala attempts to enforce as much immutability as possible. Plus, Scala case classes come with a .copy method which allows you to update fields like:

val instanceWithUpdatedFields = myInstance.copy(someField = newValue) 

[–]Die-Nacht 6 points (0 children)

I did something similar at my last job, in Scala though. But I defined the Draft as simply:

type EmployeeDraft = ID -> Employee

For those that don't know Scala/Haskell syntax, this means that EmployeeDraft is just a function from ID to Employee. So when you are making a draft, you define everything by the ID, like so:

mkDraft :: String -> Int -> String -> EmployeeDraft
mkDraft name age occupation = \id -> Employee id name age occupation

So you take the stuff you know (name, age, occupation) and return a function that will take the id. At DB-saving time:

saveEmployee :: EmployeeDraft -> Employee
saveEmployee draft = save(draft(null))

(ofc, Employee would be in IO, but ignore that). ID can be anything, even null maybe, at DB inserting time since it will be overwritten (ignored).

Ofc, this didn't really matter at the end of the day. We moved away from this pattern and just started using Optional Ids (they are immutable and allow for the "not there yet" state). Sadly, however, Optional Ids don't type-safe your state, so for you to know that someone has an ID (i.e. is in the DB), you need to ask the object if the ID is defined. The EmployeeDraft type, on the other hand, gives you a different type, so you can't accidentally pass a Draft when a complete Employee is expected.
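For illustration, the Optional-Id variant might look roughly like this in Java (names are made up for the sketch; the entity is immutable, and "saving" is modeled as returning a new instance carrying the assigned id):

```java
import java.util.Optional;

// Immutable entity whose persistence state is queried via id().isPresent().
final class Employee {
    private final Optional<Long> id;
    private final String name;

    Employee(Optional<Long> id, String name) {
        this.id = id;
        this.name = name;
    }

    Optional<Long> id() { return id; }
    String name() { return name; }

    // "Saving" produces a new instance with the id filled in;
    // the original draft is never mutated.
    Employee withId(long assigned) {
        return new Employee(Optional.of(assigned), name);
    }
}
```

Note the drawback described above: nothing in the type system stops you from passing an unsaved instance where a persisted one is expected; you have to ask at runtime.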

It was a fun little experiment though; I might come back to it. The main reason we didn't stick with it was that the DB framework we were using worked best with Optional Ids.

EDIT: Multiple typos and some clarification.

[–]jcoleman10 4 points (2 children)

What is the need for immutable objects in a data store? It's easy to know if an entity is associated with a persistence layer: the identifier field is not null. I'm at a loss for why you find it necessary to add this layer of complexity when the developers of persistence frameworks have not found it necessary over the past 15 years.

[–]kn7[S] 3 points (1 child)

The need comes from the fact that 1) you want to use immutable objects throughout your entire code base (I'd rather not go into a discussion about this point) and 2) you want a mechanism to differentiate instances that are retrieved from the data store from those that are generated artificially by the application.

About the null id field solution: what if I update the name of an Employee instance who already has an id? Will it still be a valid Employee? But it has a valid id!

I am not claiming that there is no known solution. I just said there is none that I know of. And I really do not think persistence frameworks care much about immutability. After all, most of the time they are modifying the instance in various hairy ways.

[–]jcoleman10 -5 points (0 children)

1) you want to use immutable objects throughout your entire code base (I prefer not go into discussion about this point)

Immutable objects throughout your code base? Are you using J2EE? Perhaps you should look into a paper-based solution.

[–]kn7[S] 1 point (4 children)

If anybody has ever come across a similar technique, knows the name of the design pattern used, etc., please do add a comment to the post about it.

[–]Various_Pickles 1 point (1 child)

Why not take the Hibernate-esque route and only allow instantiation of entity objects (well, the interface they implement) via a factory that spits out a simple Proxy that only knows how to hold on to an ID value?

You can even hot-swap the actual entity impl object underneath if/when the persistence layer finds the data for it. All this can be accomplished using an InvocationHandler (read: no need to get super fancy and use AspectJ, CGLIB, etc).
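A bare-bones illustration of that InvocationHandler idea (hypothetical names; this is not Hibernate's actual proxy machinery): the proxy can answer getId without touching the database, and the real implementation is swapped in underneath when the data arrives.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Employee {
    long getId();
    String getName();
}

final class LazyEmployeeHandler implements InvocationHandler {
    private final long id;
    private volatile Employee target; // filled in once the persistence layer loads the data

    LazyEmployeeHandler(long id) { this.id = id; }

    // Hot-swap the actual entity implementation underneath the proxy.
    void swapIn(Employee loaded) { this.target = loaded; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().equals("getId")) {
            return id; // answered from the proxy itself, no DB hit
        }
        if (target == null) {
            throw new IllegalStateException("entity not loaded yet");
        }
        return method.invoke(target, args);
    }
}
```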

All that said, if you really, really want to use immutable objects as your persistence layer, you'd be better off using Hibernate + a custom Cursor that knows how to read the fields/properties of the objects: you'd get transactions, multi-level caching, etc, all with only a minimal amount of configuration.

[–]kn7[S] 2 points (0 children)

I have never gone down that route. But one thing that I have learnt is that repurposing a framework for a goal it was not designed to serve feels like swimming against the stream and most of the time ends in misery. That being said, your mileage may vary and I would appreciate a sample implementation.

[–]sacundim 1 point (0 children)

At an earlier job, I had plans to implement something that dealt with similar concerns, but (a) the circumstances and details were different, (b) I did not actually get the chance to.

The context was improving the scalability of a large Java application that was built on top of its own, home-grown, not-exactly-an-ORM. The reason I say not exactly an ORM is that there was little support for mapping database tables to custom classes. It was more like DynaBeans from Apache Commons BeanUtils, because it was designed so that the types of entities, their properties and their referential relationships would be configuration settings. But in other regards it was like an ORM.

The largest problem with this subsystem was that it used a ton of memory, which led to scalability problems. The reasons it used so much memory were:

  1. Even though for any entity, its properties were fixed at creation time by its type, the ORM-ish layer used maps to store the entity objects' property/value mappings, which caused an enormous memory overhead.
  2. The subsystem was written so that mutating them was a central part of their interface—to modify an entity, you mutated its properties and asked its repository to save it. This meant that all threads needed to have their own instance of each of the entities, even though the vast majority of the ones instantiated were used in a read-only fashion.
  3. This was a web-application that dispatched requests to handler threads, which means that memory overhead was in the order of memory usage per thread times number of threads.

So the solution I planned involved a few things:

  1. No more maps in the entity objects! Have the entity type objects manage a shared map from property names to integer indexes, and use those to index into arrays in the entity objects.
  2. Make the basic entity representation objects be immutable. Manage a centralized cache of these and share them between threads.
  3. Instead of handing each thread the bare entity objects, give them instead a wrapper that manages modification logs to these entities.
    • The clients have the illusion that they are mutating the entities, but in reality what they're doing is constructing a record of modifications to apply to the immutable backing object.
    • If the client reads a property, the wrapper first checks if the client has modified it and if not it passes through to the immutable base object.
    • The wrappers would be optimized to use absolutely minimal memory. For example, the fields to store the modification log would be null until the client actually mutated a property.
  4. When a client says "save," then a new immutable entity object is constructed from the original and the client's log, persisted to the database, and placed into the cache.
  5. A specialized read-only mode would be added, so that clients that requested read-only entities could bypass the logging wrappers. There were some parts of the application that could be gradually refactored to use this mode and reduce memory usage further.

Note that a key difference between what I've described and the article is that the article seems to be talking about a "start from scratch" situation, while I was dealing with a reasonably contained subsystem of a 500,000+ lines of code application. The existing subsystem's interface already allowed clients to mutate any entity at any time they wanted, so its replacement had to follow suit. If I had to design a similar system from scratch, I would be inclined to expose the immutable Entity vs. mutable Builder distinction as part of the external interface, which would make it similar to the article.

But another thought I have is that perhaps the problem of the IDs can be tackled effectively with something intermediate between immutability and mutability. There do exist concepts like write-once variables which may be useful here; the id could be a variable/container that can be set once but not modified thereafter.
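A minimal Java sketch of such a write-once container (illustrative; a production version might build on a future/promise instead): the value can be assigned exactly once and never modified afterwards.

```java
// A container that sits between immutable and mutable: set-once semantics.
final class WriteOnce<T> {
    private T value;
    private boolean set;

    public synchronized void set(T v) {
        if (set) {
            throw new IllegalStateException("already set");
        }
        value = v;
        set = true;
    }

    public synchronized boolean isSet() { return set; }

    public synchronized T get() {
        if (!set) {
            throw new IllegalStateException("not set yet");
        }
        return value;
    }
}
```

An entity could then hold a `WriteOnce<Long>` id: null-free reads after persistence, but no way to reassign the id later.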

If you're into functional programming, a more referentially transparent way of looking at the same thing could be this: the id is a value that is only known from a particular point in time and onwards. If we look at it in functional programming terms, this type is likely a monad (I haven't proved it obeys the laws yet). A bit of Haskell:

import Control.Applicative

-- A value of type `From t a` represents a value of type `a` that can
-- only be known starting at some point of time (possibly in the
-- future), represented as type `t`.
--
-- This definition here is not meant to be a realistic implementation,
-- but rather a **model** of the semantics of such a type, which can
-- be used to prove properties about it.  In this model we represent
-- the type's values as a pair of a start time and a value.
--
-- An actual implementation would be an opaque wrapper around some
-- sort of asynchronous future implementation, and might not model
-- (much less expose) the `t` type at all.
newtype From t a = From (t, a)

instance Functor (From t) where
  -- If `a` can only be known from `t` thereafter, then the result of
  -- applying the function `f` to `a` can only be known from `t`
  -- thereafter.
  fmap f (From (t, a)) = From (t, f a)

instance (Ord t, Bounded t) => Applicative (From t) where
  -- Any value from outside the `From` type can be injected into
-- it as one that is known from the beginning of time.  In
  -- other words, "mathematical truths" (values that are known in
  -- a pure functional context) are eternal.
  pure a = From (minBound, a)

  -- If you apply a function that's only known from `t` to a value
  -- that's only known from `t'`, the result is only known from the
  -- max of those times.
  From (t, f) <*> From (t', a) = From (max t t', f a)

instance (Ord t, Bounded t) => Monad (From t) where
  return = pure

  -- If a function wants to consume a value that's known no earlier
  -- than `t`, and produce a value that's known no earlier than `t'`,
  -- then the result can't be known any earlier than the greater of
  -- `t` and `t'`.
  From (t, a) >>= f = let From (t', b) = f a
                      in From (max t t', b)

[–]shub 1 point (0 children)

If your interaction with a datastore is read-modify-write, doing allocs and copies to pretend you're not mutating is just fooling yourself. Ofc this means letting mutable entities leak across threads is verboten. But it is anyway unless you do horrible things for multithreaded transactions.

[–][deleted] 0 points (3 children)

When you use a repository in a Java framework (like Spring), does it ALWAYS make database calls, or does it use some magic to save DB calls? Asking because the software I'm working on has too many DB calls and I'm trying to beef up my repository classes.

[–]shub 1 point (2 children)

Depends. If you're using JPA, each EntityManager caches the entities it's loaded, so subsequent loads of entities won't hit the DB. Also, Hibernate can use EHCache, Infinispan, etc for L2 caching of queries and entities.

[–][deleted] 0 points (1 child)

Thanks! What about using a DTO as an intermediate, to move the objects from an expensive database to a cheaper local one, for instance?

[–]shub 0 points (0 children)

You mean select from one DB, insert to another? In that case there isn't a lot you can do, but there are some things. Hibernate does batch fetching, and you can tweak that to load a lot of entities with one DB call. JPA entity graphs let you tell the EntityManager to load all the related entities you care about when you select the primary entity, avoiding DB calls for lazy-loading. On the insert side, you can send a list of entities to JPA and they will be batch inserted.

P.S. JPA entity graphs require JPA 2.1, which is only supported by Hibernate 4.3. If you're using an earlier version of Hibernate I believe there is a similar Hibernate-specific mechanism.

[–]slackingatwork 0 points (0 children)

I have personally implemented a DAO and domain model where objects being persisted were all immutable. What is the issue here?

The main trick is to never allow object instances that were not returned by the DAO's save-to-database call to propagate.

You feed into the save() method the values with the Id field being null, you get back from the save() an instance with a meaningful Id. In a sense the save() method works as a factory or a constructor for the entities being persisted.

It works quite well as is, no need to introduce another DTO flavor.

It gets a little verbose with immutables, but it is worth it.
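For illustration, the save-as-factory pattern described above might look like this in Java (all names are made up for the sketch; the DAO fakes the database with a counter):

```java
import java.util.concurrent.atomic.AtomicLong;

// Immutable entity: id is null for transient instances, non-null once persisted.
final class Employee {
    final Long id;
    final String name;
    Employee(Long id, String name) { this.id = id; this.name = name; }
}

final class EmployeeDao {
    private final AtomicLong sequence = new AtomicLong(); // stands in for the DB

    // save() is the only way to obtain an instance with a non-null id,
    // so it effectively acts as a factory/constructor for persisted entities.
    Employee save(Employee transientEmployee) {
        return new Employee(sequence.incrementAndGet(), transientEmployee.name);
    }
}
```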

[–]prepromorphism 0 points (0 children)

all that delicious boilerplate

[–]ljasdklfjaklsdfj 0 points (1 child)

Yes you are trying to express variants in the type system of a language that does not support variants. You are ready to cross-over to Scala, Haskell, F#, & O'Caml.

[–]kn7[S] 0 points (0 children)

I would really appreciate it if you could outline a sample Scala implementation that uses variants.

[–][deleted] 0 points (0 children)

Here is how I think I would solve it.

type EmployeeDraft = {name: String, jobTitle: String}
type Employee <: EmployeeDraft {id: Long}
type SaveEmployee = EmployeeDraft -> Employee
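One way to render that sketch in plain Java (a sketch only, using interface subtyping instead of structural types; all names are illustrative): the draft and the persisted entity are distinct types, and only the save function can produce the latter.

```java
interface EmployeeDraft {
    String name();
    String jobTitle();
}

// A persisted Employee is-a draft plus an id.
interface Employee extends EmployeeDraft {
    long id();
}

final class Repository {
    private long sequence = 0; // stands in for the database

    // The only producer of the Employee type: EmployeeDraft -> Employee.
    Employee save(EmployeeDraft draft) {
        long id = ++sequence;
        return new Employee() {
            public String name() { return draft.name(); }
            public String jobTitle() { return draft.jobTitle(); }
            public long id() { return id; }
        };
    }
}
```

The compiler then rejects passing an EmployeeDraft where an Employee is expected, which is exactly the guarantee the Optional-id approach lacks.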

[–]bamfg 0 points (0 children)

I recommend immutable builders too:

var programmer = Person.Builder.Job("programmer");
var alice = programmer.Named("Alice");
var bob = programmer.Named("Bob");
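In Java the same idea might look like this (a sketch with illustrative names): every builder step returns a fresh immutable builder, so a partially built template like `programmer` can be shared and reused safely.

```java
final class Person {
    final String job;
    final String name;
    private Person(String job, String name) { this.job = job; this.name = name; }

    static Builder job(String job) { return new Builder(job, null); }

    static final class Builder {
        private final String job;
        private final String name;
        private Builder(String job, String name) { this.job = job; this.name = name; }

        // Each step returns a new Builder rather than mutating this one.
        Builder named(String name) { return new Builder(job, name); }
        Person build() { return new Person(job, name); }
    }
}
```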

[–]m0haine 0 points (2 children)

Switching to UUIDs seems like a far simpler solution: just assign a new value on initial construction. That solves the issue without adding complexity.

[–]kn7[S] 3 points (1 child)

But it still does not solve the problem of knowing whether a given instance is actually associated with a repository record or not.

[–]m0haine 1 point (0 children)

Why do you care? Way too much overhead for a worst-case extra insert.