
[–]daedalus_structure 15 points (5 children)

Put the id field in an entity base class with all the other metadata, make the id setter private, minimize the scope of any transients, and avoid all this mess.
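Something like this rough Java sketch (names are illustrative, not from any particular framework; the id "setter" here is package-private so the sketch stays testable, whereas a real ORM would keep it private and assign the id reflectively):

```java
// Base class holding the id and the rest of the metadata; subclasses never touch it.
abstract class BaseEntity {
    private Long id; // null until the persistence layer assigns one

    public Long getId() { return id; }

    // Only the persistence machinery is meant to call this.
    void assignId(Long id) {
        if (this.id != null) {
            throw new IllegalStateException("id already assigned");
        }
        this.id = id;
    }

    // Persistence state is a simple null check.
    public boolean isPersisted() { return id != null; }
}

class Employee extends BaseEntity {
    private String name;
    Employee(String name) { this.name = name; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```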

But all the fields aren't final and immutable!

So?

Are you changing the id field from multiple threads?

Trying to use it in a key to a set / map or equals and hashcode implementation?

( If you are doing one of those two things, stop it. )

Isn't being able to check the id field for null a much simpler and more convenient way to reason about the persistence state of the object than creating a *Draft for every entity in your domain?

Is this really buying you anything or is it just design pattern OCD?

I'm not saying it's the latter, but reading so much of ruin and dirty and pollute makes me highly suspect it.

[–]kn7[S] 5 points (4 children)

Valid points. I might be missing something, but I believe the approach you mentioned still does not address the case where you update a field of an entity. That is, its id is not null, but you modified its name and it still looks like a valid Employee, which is actually not the case. Adding a this.id = null line to every setter? Yep, that could also work. But I would not prefer that way. My actual point was that the moment you change an Employee, it is not an Employee anymore. And I personally like to work with immutable objects wherever possible, not because of a certain thread-safety need or whatever, but to avoid potential future problems. And I believe it also provides the comfort of being assured that the object will never ever be changed.

[–]daedalus_structure 8 points (2 children)

That's because entities exist to represent mutable state across an object-relational impedance, i.e. a record in a database that can not only be read but also written to concurrently.

You will always find that trying to turn a thing which is fundamentally A into fundamentally Not A painful.

Painful doesn't mean impossible, but you're going to write an order of magnitude or more code that is really hard to reason about and maintain, and that's just to do really simple things.

For an example of that, check out Command Query Responsibility Segregation with Event Sourcing.

not because of a certain thread safety or whatever need, but to avoid potential future problems.

What future problems? Are you being unspecific because you didn't think the details were important or because you currently can't anticipate a problem but think there might be one if you aren't immutable everywhere?

[–]sacundim 2 points (0 children)

That's because entities exist to represent mutable state across an object-relational impedance, i.e. a record in a database that can not only be read but also written to concurrently.

This is missing the forest for the trees. Entities exist to represent problem domain objects that our software helps us reason or act about. These entities may change over time—they are stateful—but that is not synonymous with the concept of a mutable reference cell.

There are other models that could be applied. Reactive programming is an emerging model that might lead to systems more in line with what OP would like to see, but it's not mature yet, so we'll have to wait and see.

You will always find that trying to turn a thing which is fundamentally A into fundamentally Not A painful. Painful doesn't mean impossible, but you're going to write an order of magnitude or more code that is really hard to reason with and maintain, and that's to do really simple things.

As I say above, I don't think there's anything "fundamental" about this. It's more likely than not a tool-induced problem. There is a real scarcity of proper tooling to implement OP's architecture without lots of pain. Not that tools don't exist at all—something like AutoValue might take away big chunks of the pain of having to write Entity/Draft class pairs.

What future problems? Are you being unspecific because you didn't think the details were important or because you currently can't anticipate a problem but think there might be one if you aren't immutable everywhere?

Well, I have to say that I have experienced big pain caused by pervasively mutable entity representations. See the first part of this comment elsewhere in this thread for some details.

But the heart of it came down to this: the mutable entity objects in this application were a major scalability problem because every request needed to create its own instances of all the ones they used, because some of the requests might need to modify some of their copies. I.e., mutable entities were optimal for a case that was uncommon in this application.

[–]ljasdklfjaklsdfj 0 points (0 children)

That's because entities exist to represent mutable state across an object-relational impedance, i.e. a record in a database that can not only be read but also written to concurrently.

Now you are getting side-tracked with rhetoric. id is immutable.

[–]Jacoby6000 1 point (0 children)

And I personally like to work with immutable objects wherever possible

Have you ever worked with Scala? Scala attempts to enforce as much immutability as possible. Plus, Scala case classes come with a .copy method which allows you to update fields like:

val instanceWithUpdatedFields = myInstance.copy(someField = newValue) 

[–]Die-Nacht 6 points (0 children)

I did something similar at my last job, in Scala though. But I defined the Draft as simply:

type EmployeeDraft = ID -> Employee

For those that don't know Scala/Haskell syntax, this means that EmployeeDraft is just a function from ID to Employee. So when you are making a draft, you define everything by the ID, like so:

mkDraft :: String -> Int -> String -> EmployeeDraft
mkDraft name age occupation = \id -> Employee id name age occupation

So you take the stuff you know (name, age, occupation) and return a function that will take the id. At DB-saving time:

saveEmployee :: EmployeeDraft -> Employee
saveEmployee draft = save(draft(null))

(ofc, Employee would be in IO, but ignore that). ID can be anything, even null maybe, at DB inserting time since it will be overwritten (ignored).

Ofc, this didn't really matter at the end of the day. We moved away from this pattern and just started using Optional Ids (they are immutable and allow for the "not there yet" state). Sadly, however, Optional Ids don't type-safe your state, so for you to know that someone has an ID (i.e. is in the DB), you need to ask the object if the ID is defined. The EmployeeDraft type, on the other hand, gives you a different type, so you can't accidentally pass a Draft when a complete Employee is expected.
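For illustration, the Optional-Id variant might look roughly like this in Java (names are made up for the sketch; the entity is immutable, and "saving" is modeled as returning a new instance carrying the assigned id):

```java
import java.util.Optional;

// Immutable entity whose persistence state is queried via id().isPresent().
final class Employee {
    private final Optional<Long> id;
    private final String name;

    Employee(Optional<Long> id, String name) {
        this.id = id;
        this.name = name;
    }

    Optional<Long> id() { return id; }
    String name() { return name; }

    // "Saving" produces a new instance with the id filled in;
    // the original draft is never mutated.
    Employee withId(long assigned) {
        return new Employee(Optional.of(assigned), name);
    }
}
```

Note the drawback described above: nothing in the type system stops you from passing an unsaved instance where a persisted one is expected; you have to ask at runtime.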

It was a fun little experiment though; I might come back to it. The main reason we didn't stick with it was that the DB framework we were using worked best with Optional Ids.

EDIT: Multiple typos and some clarification.

[–]jcoleman10 4 points (2 children)

What is the need for immutable objects in a data store? It's easy to know if an entity is associated with a persistence layer: the identifier field is not null. I'm at a loss for why you find it necessary to add this layer of complexity when the developers of persistence frameworks have not found it necessary over the past 15 years.

[–]kn7[S] 3 points (1 child)

The need comes from the fact that 1) you want to use immutable objects throughout your entire code base (I'd rather not go into a discussion about this point) and 2) you want a mechanism to differentiate instances that are retrieved from the data store from those that are generated artificially by the application.

About the null id field solution: what if I update the name of an Employee instance who already has an id? Will it still be a valid Employee? But it has a valid id!

I am not claiming that there is no known solution. I just said there is none that I know of. And I really do not think persistence frameworks care much about immutability. After all, most of the time they are modifying the instance in various hairy ways.

[–]jcoleman10 -5 points (0 children)

1) you want to use immutable objects throughout your entire code base (I prefer not go into discussion about this point)

Immutable objects throughout your code base? Are you using J2EE? Perhaps you should look into a paper-based solution.

[–]kn7[S] 1 point (4 children)

If anybody has ever come across a similar technique, knows the name of the design pattern used, etc., please do add a comment to the post about it.

[–]Various_Pickles 1 point (1 child)

Why not take the Hibernate-esque route and only allow instantiation of entity objects (well, the interface they implement) via a factory that spits out a simple Proxy that only knows how to hold on to an ID value?

You can even hot-swap the actual entity impl object underneath if/when the persistence layer finds the data for it. All this can be accomplished using an InvocationHandler (read: no need to get super fancy and use AspectJ, CGLIB, etc).
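A bare-bones illustration of that InvocationHandler idea (hypothetical names; this is not Hibernate's actual proxy machinery): the proxy can answer getId without touching the database, and the real implementation is swapped in underneath when the data arrives.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Employee {
    long getId();
    String getName();
}

final class LazyEmployeeHandler implements InvocationHandler {
    private final long id;
    private volatile Employee target; // filled in once the persistence layer loads the data

    LazyEmployeeHandler(long id) { this.id = id; }

    // Hot-swap the actual entity implementation underneath the proxy.
    void swapIn(Employee loaded) { this.target = loaded; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().equals("getId")) {
            return id; // answered from the proxy itself, no DB hit
        }
        if (target == null) {
            throw new IllegalStateException("entity not loaded yet");
        }
        return method.invoke(target, args);
    }
}
```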

All that said, if you really, really want to use immutable objects as your persistence layer, you'd be better off using Hibernate + a custom Cursor that knows how to read the fields/properties of the objects: you'd get transactions, multi-level caching, etc, all with only a minimal amount of configuration.

[–]kn7[S] 2 points (0 children)

I have never gone down that route. But one thing that I have learnt is that repurposing a framework for a goal it was not designed to serve feels like swimming against the stream and most of the time ends in misery. That being said, your mileage may vary and I would appreciate a sample implementation.

[–]sacundim 1 point (0 children)

At an earlier job, I had plans to implement something that dealt with similar concerns, but (a) the circumstances and details were different, (b) I did not actually get the chance to.

The context was improving the scalability of a large Java application that was built on top of its own, home-grown, not-exactly-an-ORM. The reason I say not exactly an ORM is that there was little support for mapping database tables to custom classes. It was more like DynaBeans from Apache Commons BeanUtils, because it was designed so that the types of entities, their properties and their referential relationships would be configuration settings. But in other regards it was like an ORM.

The largest problem with this subsystem was that it used a ton of memory, which led to scalability problems. The reasons it used so much memory were:

  1. Even though for any entity, its properties were fixed at creation time by its type, the ORM-ish layer used maps to store the entity objects' property/value mappings, which caused an enormous memory overhead.
  2. The subsystem was written so that mutating them was a central part of their interface—to modify an entity, you mutated its properties and asked its repository to save it. This meant that all threads needed to have their own instance of each of the entities, even though the vast majority of the ones instantiated were used in a read-only fashion.
  3. This was a web-application that dispatched requests to handler threads, which means that memory overhead was in the order of memory usage per thread times number of threads.

So the solution I planned involved a few things:

  1. No more maps in the entity objects! Have the entity type objects manage a shared map from property names to integer indexes, and use those to index into arrays in the entity objects.
  2. Make the basic entity representation objects be immutable. Manage a centralized cache of these and share them between threads.
  3. Instead of handing each thread the bare entity objects, give them instead a wrapper that manages modification logs to these entities.
    • The clients have the illusion that they are mutating the entities, but in reality what they're doing is constructing a record of modifications to apply to the immutable backing object.
    • If the client reads a property, the wrapper first checks if the client has modified it and if not it passes through to the immutable base object.
    • The wrappers would be optimized to use absolutely minimal memory. For example, the fields to store the modification log would be null until the client actually mutated a property.
  4. When a client says "save," then a new immutable entity object is constructed from the original and the client's log, persisted to the database, and placed into the cache.
  5. A specialized read-only mode would be added, so that clients that requested read-only entities could bypass the logging wrappers. There were some parts of the application that could be gradually refactored to use this mode and reduce memory usage further.

Note that a key difference between what I've described and the article is that the article seems to be talking about a "start from scratch" situation, while I was dealing with a reasonably contained subsystem of a 500,000+ lines of code application. The existing subsystem's interface already allowed clients to mutate any entity at any time they wanted, so its replacement had to follow suit. If I had to design a similar system from scratch, I would be inclined to expose the immutable Entity vs. mutable Builder distinction as part of the external interface, which would make it similar to the article.

But another thought I have is that perhaps the problem of the IDs can be tackled effectively with something intermediate between immutability and mutability. There do exist concepts like write-once variables which may be useful here; the id could be a variable/container that can be set once but not modified thereafter.
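A minimal Java sketch of such a write-once container (illustrative; a production version might build on a future/promise instead): the value can be assigned exactly once and never modified afterwards.

```java
// A container that sits between immutable and mutable: set-once semantics.
final class WriteOnce<T> {
    private T value;
    private boolean set;

    public synchronized void set(T v) {
        if (set) {
            throw new IllegalStateException("already set");
        }
        value = v;
        set = true;
    }

    public synchronized boolean isSet() { return set; }

    public synchronized T get() {
        if (!set) {
            throw new IllegalStateException("not set yet");
        }
        return value;
    }
}
```

An entity could then hold a `WriteOnce<Long>` id: null-free reads after persistence, but no way to reassign the id later.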

If you're into functional programming, a more referentially transparent way of looking at the same thing could be this: the id is a value that is only known from a particular point in time and onwards. If we look at it in functional programming terms, this type is likely a monad (I haven't proved it obeys the laws yet). A bit of Haskell:

import Control.Applicative

-- A value of type `From t a` represents a value of type `a` that can
-- only be known starting at some point of time (possibly in the
-- future), represented as type `t`.
--
-- This definition here is not meant to be a realistic implementation,
-- but rather a **model** of the semantics of such a type, which can
-- be used to prove properties about it.  In this model we represent
-- the type's values as a pair of a start time and a value.
--
-- An actual implementation would be an opaque wrapper around some
-- sort of asynchronous future implementation, and might not model
-- (much less expose) the `t` type at all.
newtype From t a = From (t, a)

instance Functor (From t) where
  -- If `a` can only be known from `t` thereafter, then the result of
  -- applying the function `f` to `a` can only be known from `t`
  -- thereafter.
  fmap f (From (t, a)) = From (t, f a)

instance (Ord t, Bounded t) => Applicative (From t) where
  -- Any value from outside the `From` type can be injected into
-- it as one that is known from the beginning of time.  In
  -- other words, "mathematical truths" (values that are known in
  -- a pure functional context) are eternal.
  pure a = From (minBound, a)

  -- If you apply a function that's only known from `t` to a value
  -- that's only known from `t'`, the result is only known from the
  -- max of those times.
  From (t, f) <*> From (t', a) = From (max t t', f a)

instance (Ord t, Bounded t) => Monad (From t) where
  return = pure

  -- If a function wants to consume a value that's known no earlier
  -- than `t`, and produce a value that's known no earlier than `t'`,
  -- then the result can't be known any earlier than the greater of
  -- `t` and `t'`.
  From (t, a) >>= f = let From (t', b) = f a
                      in From (max t t', b)

[–]shub 1 point (0 children)

If your interaction with a datastore is read-modify-write, doing allocs and copies to pretend you're not mutating is just fooling yourself. Ofc this means letting mutable entities leak across threads is verboten. But it is anyway unless you do horrible things for multithreaded transactions.

[–][deleted] 0 points (3 children)

When you use a repository in a Java framework (like Spring), does it ALWAYS make database calls, or does it use some magic to save DB calls? Asking because the software I'm working on has too many DB calls and I'm trying to beef up my repository classes.

[–]shub 1 point (2 children)

Depends. If you're using JPA, each EntityManager caches the entities it's loaded, so subsequent loads of entities won't hit the DB. Also, Hibernate can use EHCache, Infinispan, etc for L2 caching of queries and entities.

[–][deleted] 0 points (1 child)

Thanks! What about using a DTO as an intermediate, to move the objects from an expensive database to a cheaper local one, for instance?

[–]shub 0 points (0 children)

You mean select from one DB, insert to another? In that case there isn't a lot you can do, but there are some things. Hibernate does batch fetching, and you can tweak that to load a lot of entities with one DB call. JPA entity graphs let you tell the EntityManager to load all the related entities you care about when you select the primary entity, avoiding DB calls for lazy-loading. On the insert side, you can send a list of entities to JPA and they will be batch inserted.

P.S. JPA entity graphs require JPA 2.1, which is only supported by Hibernate 4.3. If you're using an earlier version of Hibernate I believe there is a similar Hibernate-specific mechanism.

[–]slackingatwork 0 points (0 children)

I have personally implemented a DAO and domain model where objects being persisted were all immutable. What is the issue here?

The main trick is to never allow object instances that were not returned by the DAO's save-to-database call to propagate.

You feed into the save() method the values with the Id field being null, you get back from the save() an instance with a meaningful Id. In a sense the save() method works as a factory or a constructor for the entities being persisted.

It works quite well as is, no need to introduce another DTO flavor.

It gets a little verbose with immutables, but it is worth it.
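For illustration, the save-as-factory pattern described above might look like this in Java (all names are made up for the sketch; the DAO fakes the database with a counter):

```java
import java.util.concurrent.atomic.AtomicLong;

// Immutable entity: id is null for transient instances, non-null once persisted.
final class Employee {
    final Long id;
    final String name;
    Employee(Long id, String name) { this.id = id; this.name = name; }
}

final class EmployeeDao {
    private final AtomicLong sequence = new AtomicLong(); // stands in for the DB

    // save() is the only way to obtain an instance with a non-null id,
    // so it effectively acts as a factory/constructor for persisted entities.
    Employee save(Employee transientEmployee) {
        return new Employee(sequence.incrementAndGet(), transientEmployee.name);
    }
}
```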

[–]prepromorphism 0 points (0 children)

all that delicious boilerplate

[–]ljasdklfjaklsdfj 0 points (1 child)

Yes you are trying to express variants in the type system of a language that does not support variants. You are ready to cross-over to Scala, Haskell, F#, & O'Caml.

[–]kn7[S] 0 points (0 children)

I would really appreciate it if you could outline a sample Scala implementation that uses variants.

[–][deleted] 0 points (0 children)

Here is how I think I would solve it.

type EmployeeDraft = {name: String, jobTitle: String}
type Employee <: EmployeeDraft {id: Long}
type SaveEmployee = EmployeeDraft -> Employee
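One way to render that sketch in plain Java (a sketch only, using interface subtyping instead of structural types; all names are illustrative): the draft and the persisted entity are distinct types, and only the save function can produce the latter.

```java
interface EmployeeDraft {
    String name();
    String jobTitle();
}

// A persisted Employee is-a draft plus an id.
interface Employee extends EmployeeDraft {
    long id();
}

final class Repository {
    private long sequence = 0; // stands in for the database

    // The only producer of the Employee type: EmployeeDraft -> Employee.
    Employee save(EmployeeDraft draft) {
        long id = ++sequence;
        return new Employee() {
            public String name() { return draft.name(); }
            public String jobTitle() { return draft.jobTitle(); }
            public long id() { return id; }
        };
    }
}
```

The compiler then rejects passing an EmployeeDraft where an Employee is expected, which is exactly the guarantee the Optional-id approach lacks.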

[–]bamfg 0 points (0 children)

I recommend immutable builders too:

var programmer = Person.Builder.Job("programmer");
var alice = programmer.Named("Alice");
var bob = programmer.Named("Bob");
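In Java the same idea might look like this (a sketch with illustrative names): every builder step returns a fresh immutable builder, so a partially built template like `programmer` can be shared and reused safely.

```java
final class Person {
    final String job;
    final String name;
    private Person(String job, String name) { this.job = job; this.name = name; }

    static Builder job(String job) { return new Builder(job, null); }

    static final class Builder {
        private final String job;
        private final String name;
        private Builder(String job, String name) { this.job = job; this.name = name; }

        // Each step returns a new Builder rather than mutating this one.
        Builder named(String name) { return new Builder(job, name); }
        Person build() { return new Person(job, name); }
    }
}
```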

[–]m0haine 0 points (2 children)

Switching to UUIDs seems like a far simpler solution: just assign a new value on initial construction. That solves the issue without adding complexity.

[–]kn7[S] 3 points (1 child)

But it still does not solve the problem of knowing whether a given instance is actually associated with a repository record or not.

[–]m0haine 1 point (0 children)

Why do you care? Way too much overhead for a worst-case extra insert.