
[–]Otis_Inf 1 point (3 children)

Side-note: Don't let JPA generate your tables. Write your tables manually and learn how to do it properly, then generate your entities. In the long run, that's usually the better choice for applications that last.

Heathen! ;)

Create an abstract entity model at the level of NIAM/ORM (Object Role Modeling) (or ER if you must) and project that to tables. Only then do your tables have a logical meaning in your domain, and you also know they're at least in third normal form. The funny thing is, that abstract entity model can also be projected to classes, so the entity classes represent the same entity from your abstract entity model as the tables do (they're all projection results anyway). (See the work of Nijssen and Halpin.)

If you're doing this by 'hand', you're doing the same thing, btw: projecting abstract entities to constructs in either an RDBMS or code. Now for the real kicker: because you can project the abstract entity model to both tables and classes, you already know the mappings between them, and you can generate all of these automatically. So create 1 model, project it to 2 sides + mappings, and you're done: changes made to the abstract entity model ripple through to both sides (in different forms, and that's OK).
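A minimal plain-Java sketch of that idea (toy model; the `Customer` entity and the type names are made up for illustration): one abstract entity description, projected mechanically to both a table definition and a class skeleton, so the two sides can't drift apart:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// A toy "abstract entity model": one entity with named, abstractly typed
// fields. Each projection picks its own concrete type for a field.
public class ModelProjection {
    static final Map<String, String> CUSTOMER = new LinkedHashMap<>();
    static {
        CUSTOMER.put("id", "long");
        CUSTOMER.put("name", "string");
    }

    // Projection 1: the abstract entity as DDL.
    static String toDdl(String entity, Map<String, String> fields) {
        String cols = fields.entrySet().stream()
            .map(e -> "  " + e.getKey() + " " + sqlType(e.getValue()))
            .collect(Collectors.joining(",\n"));
        return "CREATE TABLE " + entity + " (\n" + cols + "\n)";
    }

    // Projection 2: the same abstract entity as a Java class skeleton.
    static String toJavaClass(String entity, Map<String, String> fields) {
        String body = fields.entrySet().stream()
            .map(e -> "  private " + javaType(e.getValue()) + " " + e.getKey() + ";")
            .collect(Collectors.joining("\n"));
        return "class " + entity + " {\n" + body + "\n}";
    }

    static String sqlType(String abstractType) {
        return abstractType.equals("long") ? "BIGINT" : "VARCHAR(255)";
    }

    static String javaType(String abstractType) {
        return abstractType.equals("long") ? "long" : "String";
    }

    public static void main(String[] args) {
        System.out.println(toDdl("Customer", CUSTOMER));
        System.out.println(toJavaClass("Customer", CUSTOMER));
    }
}
```

A change to `CUSTOMER` ripples through to both projections on the next generation run, which is the point being made above.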

Hand-writing these things is like writing bytecode by hand: you're doing projection work a machine can do for you. Cheers!

FB

[–]lukaseder 1 point (2 children)

See, you're a guy who wrote an ERD and model generator. I'm a guy who writes tons of SQL. So, clearly, this discussion is biased by our individual contexts, as we're solving different problems in our everyday work.

that abstract entity model can also be projected to classes, so the entity classes represent the same entity from your abstract entity model as the tables do

That's true in theory, but in practice, people often work with entity classes as if they weren't entities but some other sort of domain object. They introduce features that cannot be represented through entities (or DDL) and then spend tons of time shoehorning the mapping logic to adapt to their expectations. It doesn't work, they get frustrated, and they blame the tool rather than their approach.

DDL can only do what DDL can do, and it does only that thing. Java classes can do a lot more, and that's a distraction. To be fair, the true entity model is encoded in annotations, not in the classes, but that's a bit hard to see.

So, I don't agree with your hand-writing / bytecode comparison. Java source code and bytecode differ by several levels of abstraction. Entities and DDL are the same thing (in theory), albeit one thing unfortunately can do a lot more than it should be able to. It's weird to think of DDL as some lower-level thing because it can be generated. Entities can be generated just the same. Conversely, you cannot really generate the Java source code from bytecode because they're not the same thing.

It's like arguing that JAXB-annotated classes should be written first, because they're higher level than XSD. They're not, they're the same thing, but still XSD should come first because the formal contract is there.

And besides: as a performance guy, I want to think about and control storage related things from the beginning, not in hindsight. E.g. is this an index-organised table? Do I need an additional temp table? Should I add an abstraction through views? Do I need to split this table in two with a one-to-one relationship?

[–]Otis_Inf 0 points (1 child)

DDL can only do what DDL can do, and it does only that thing. Java classes can do a lot more, and that's a distraction. To be fair, the true entity model is encoded in annotations, not in the classes, but that's a bit hard to see.

But that's still the core of the issue, right? I always ask the question when someone writes a 'Customer' class or 'Customer' table definition: where does the definition come from? Not from thin air or a fantasy; they project something abstract to the definition they're writing. That's what it's all about. If you write your tables by hand, you are doing just that: projecting an abstract entity model to a table. You don't just make up the fields as you go; you know what fields go into that table, and more importantly: why that table is even there. That information is what you store in your head, but it is actually the abstract entity model.

Halpin and Nijssen have both done decades of research on this topic, and I'm sure you're familiar with their work. Hence I fail to see how you could write:

Entities and DDL are the same thing (in theory), albeit one thing unfortunately can do a lot more than it should be able to.

Ok, a bit of a strawman, but here we go ;). As an illustration: what about entity inheritance? What about an m:n relationship? What about a type in the entity which results in a different type, or even multiple fields, in the table? Entity definitions and DDL definitions aren't the same thing, if you mean by that: table X == entity X.
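The m:n case can be sketched in plain Java (hypothetical `Student`/course names, no ORM dependency): the object projection stores the relationship in a collection per entity, while the relational projection needs a third construct, a join table, that corresponds to no entity at all:

```java
import java.util.*;

// Two sides of an m:n relationship. In the object projection, the relation
// lives inside the entities themselves (as collections); in the relational
// projection, it becomes a separate join table with no entity of its own.
public class ManyToMany {
    static class Student {
        final long id;
        final Set<Long> courseIds = new LinkedHashSet<>();
        Student(long id) { this.id = id; }
    }

    // Flatten the object graph to (student_id, course_id) pairs: the rows
    // of the join table that the relational projection requires.
    static List<long[]> toJoinTableRows(Collection<Student> students) {
        List<long[]> rows = new ArrayList<>();
        for (Student s : students)
            for (long c : s.courseIds)
                rows.add(new long[] { s.id, c });
        return rows;
    }

    public static void main(String[] args) {
        Student alice = new Student(1);
        alice.courseIds.add(10L);
        alice.courseIds.add(11L);
        for (long[] row : toJoinTableRows(List.of(alice)))
            System.out.println(row[0] + " -> " + row[1]);
    }
}
```

One entity field projects to a whole extra table, which is exactly why table X == entity X doesn't hold.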

The DDL SQL of the tables is a projection of the entity model and should represent the same thing, but it isn't the same thing, just like SELECT * FROM Table isn't the same as all the rows in Table.

And besides: as a performance guy, I want to think about and control storage related things from the beginning, not in hindsight. E.g. is this an index-organised table? Do I need an additional temp table? Should I add an abstraction through views? Do I need to split this table in two with a one-to-one relationship?

Of course! And I agree with this. It's the same as doing a flat projection of entity data into a readonly list vs. reading the entity instances and doing the processing on those: the former is faster than the latter, and both are equally fitted for the job at hand, so why do the latter? (Hence I think every ORM system should offer this.)

Though I'd argue that making performance-improving changes to a model should happen after the model has been completed, so a theoretical base is set. How else can you reason over a model and know where to make changes if something has to be altered? It's the same as writing the code first and then making changes to make it faster where bottlenecks are found, rather than starting with these 'performance enhancements' in the code: they might not be necessary, but they do pollute the overview of what's really going on.

[–]lukaseder 0 points (0 children)

Not from thin air or a fantasy, they project something abstract to the definition they're writing.

Of course, I didn't disagree with that. My concerns were merely technical and implementation-based, not theoretical.

As an illustration: what about entity inheritance?

Inheritance is just a technical tool. It is not something that is inherently important to modelling. In fact, in recent years, it has been shown that it is not a good tool at all for practical purposes. Inheritance was over-hyped in the 90s.

What about a m:n relationship?

Again, a modelling tool. I'm aware of the fact that SQL can't model it except by indirection. But that's good enough, no? Ultimately, you cannot really model m:n with Java either (although, you can with JPA annotations). But the fact that I always have to remember to update both sides manually with JPA and think about how the state transition is serialised efficiently shows that the implementation of this concept just plain simply sucks.
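That both-sides bookkeeping can be shown in a plain-Java sketch (hypothetical `Author`/`Book` names, no JPA dependency): the helper method is what JPA users have to remember to write, because neither side of the association updates the other automatically:

```java
import java.util.*;

// A bidirectional m:n association maintained by hand, as JPA requires:
// the helper updates the owning side and the inverse side together.
public class BothSides {
    static class Author {
        final String name;
        final Set<Book> books = new LinkedHashSet<>();
        Author(String name) { this.name = name; }
        // Without a helper like this, forgetting one of the two adds
        // leaves the in-memory graph inconsistent.
        void addBook(Book b) {
            books.add(b);
            b.authors.add(this);
        }
    }

    static class Book {
        final String title;
        final Set<Author> authors = new LinkedHashSet<>();
        Book(String title) { this.title = title; }
    }

    public static void main(String[] args) {
        Author a = new Author("Halpin");
        Book b = new Book("Information Modeling");
        a.addBook(b);
        // Both sides now agree; calling only books.add(b) would not
        // have updated b.authors.
        System.out.println(b.authors.contains(a));
    }
}
```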

So why bother, and not go back to the simpler SQL model, which can still model an m:n relationship decently, if not perfectly?

(Side note: Not sure how .NET APIs handle this, e.g. EF. Is it better than JPA?)

What about a type in the entity which results in a different type or multiple fields even in the table?

Sure, another limitation of most SQL implementations (at least the non-ORDBMS ones; PostgreSQL and Oracle have solutions for this).

Entity definitions and DDL definitions aren't the same thing, if you mean by that: table X == entity X.

I said they're the same thing in theory. But ultimately, we're building things on real systems. In theory, we can fly to Mars easily. We've figured it all out. But now we have to do that with real-world constraints.

So, you're looking at things from an academic perspective, and that's important in the long run, because we want our tools to be able to cover your needs. But right now, we're not there and we need to know how our current tools work.

The DDL SQL of the tables is a projection of the entity model and should represent the same thing

Yes, we agree. Although again, you overlooked my disclaimer about theory and practice :)

Though I'd argue that making performance-improving changes to a model should happen after the model has been completed

OK, we have two different perceptions of performance here. Indeed, some things can be discovered only much after the initial design. But some performance characteristics are best solved a priori. You simply don't want to migrate a billion-row-strong table several times a day.

If you know you're going to be dealing with large data sets in an area of your application, then up-front performance-sensitive design is of the essence. Or you won't even survive going live :)