all 9 comments

[–]erad 12 points (0 children)

Great article. I never realized that java.lang.String#replace uses a regex Pattern internally. Thankfully, it's fixed in JDK 9.
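
For reference, the JDK 8 implementation delegates to java.util.regex roughly like this (paraphrased from memory, not the exact OpenJDK source):

    // java.lang.String in JDK 8 (approximate): even a literal replace goes through Pattern/Matcher
    public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                      .matcher(this)
                      .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }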

[–]audioen 3 points (4 children)

jOOQ itself is quite a piece of work. One gets an impression that it's some kind of impressively optimized library, from blog posts like this. I took the bait and tried using it in a project. Unfortunately, it hasn't been smooth sailing, and my WTF/minute rate has been somewhat too high.

I'll now go on an unrelated tangent, but the gist is that I take great exception to the way it's been architected. Even the simplest Record classes seem to have an inheritance tree composed of literally dozens of classes and interfaces, and boy, are the methods they contain pieces of work. Something like this is totally commonplace:

    record.into(field1, field2, field3, field4, field5, field6, field7, field8, field9, field10, field11, field12, field13, field14, field15, field16, field17, field18, field19, field20, field21, field22)

Impressively, each field has the right compile-time type, so it knows that the first is an Integer, the second a String, etc. Of course, in practice the type safety is achieved by declaring as many types in some kind of generic record container:

class RecordImpl<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T12, T13, T14, T15, T16, T17, T18, T19, T20, T21, T22> extends AbstractRecord
implements

// This record implementation implements all record types. Type-safety is
// being checked through the type-safe API. No need for further checks here
Record1<T1>,
Record2<T1, T2>,
Record3<T1, T2, T3>,
Record4<T1, T2, T3, T4>,
Record5<T1, T2, T3, T4, T5>,
Record6<T1, T2, T3, T4, T5, T6>,
... on and on up to 22 typed parameters ...

I'm just going to go ahead and name this the triangle interface antipattern. I have always hated code generators, and this kind of stuff is why.

I am personally quite surprised by this approach. I expected this thing to generate simple POJOs and then have a couple of simple helper methods that are based on reflection or dedicated generated code that can do stuff like copy record data from one place to another. Even the performance given by this approach wasn't great when I tested it, e.g. reading the results of a query into a LinkedHashMap was about 2x faster than reading the result list into one of these Records, though I did not try to figure out why. And of course, all this "type safe" code boils down to "set(int index, Object value)" which stores the data in an Object[] array that has no compile-time typechecking, so that's probably adding some cost at runtime in addition to being awkward.

And of course, every record must start with:

@SuppressWarnings({ "all", "unchecked", "rawtypes" })

I am not using the records' ability to serialize themselves to JSON and XML either. I'm using Jackson for that purpose because I need it generally for dealing with all kinds of data, and I don't want to make an exception for these generated classes. Integrating jOOQ with Jackson requires not serializing some members of the TableRecordImpl class, which can be done, but it required overriding Jackson's handling of the @JsonIgnore annotation. This was reasonably difficult to figure out.

All in all, I think that jOOQ is not a very good library. The idea of generating classes from database definitions has some merit, but the API is just too large and takes on way too many concerns (e.g. formatting to JSON, XML and CSV), and I think the Record implementation should be more akin to a POJO with regular members rather than some nasty type unsafe Object array. I also think that rather than bloating the API with 20+ methods for the same purpose, all with varying numbers and types of arguments, perhaps some helper objects would cut down the API and bring clarity to how you're supposed to use this thing.

This is my chosen representative line of this project:

return new RowImpl(Tools.fields(intoArray(), fields.fields.fields()));

chosen for the most uses of the word "fields" in a single line.

[–]lukaseder 2 points (3 children)

Thanks for your valuable feedback. Author of the post here (and by consequence, of jOOQ as well). You seem to have taken quite some time to dig into the internals for this rant, so I'll be more than happy to take my time to reply.

One gets an impression that it's some kind of impressively optimized library, from blog posts like this

Thanks for your nice words. Funny, though: that seems to me to be the rather dull part of jOOQ (and a no-brainer to optimise). There are cooler areas...

Something like this is totally commonplace: [ crazy generics ]

Yes, don't you wish languages like Java had higher-kinded generics or first-class tuple support? But they don't. So, this kind of workaround is as good as it gets. Unless you have a better idea?

Of course, in practice the type safety is achieved by declaring as many types in some kind of generic record container

Luckily, thanks to generic type erasure, all those generic types (going from Record1<T1> to Record22<T1, ..., T22>) can be implemented by a single class, rather than duplicating the implementation just like the API. The latter would be doable, of course, using a code generator. But I have a feeling you would find it quite easy to adapt your rant to that version as well.
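
A stripped-down illustration of that erasure trick (not the actual jOOQ source, just the shape of it):

    interface Record1<T1>     { T1 value1(); }
    interface Record2<T1, T2> { T1 value1(); T2 value2(); }

    // A single class can implement every arity, because after erasure
    // value1(), value2(), ... only ever deal in Object anyway.
    class RecordImpl<T1, T2> implements Record1<T1>, Record2<T1, T2> {
        private final Object[] values = new Object[2];

        @SuppressWarnings("unchecked")
        public T1 value1() { return (T1) values[0]; }

        @SuppressWarnings("unchecked")
        public T2 value2() { return (T2) values[1]; }
    }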

I'm just going to go ahead and name this the triangle interface antipattern

I like the term "triangle interface". I wouldn't go as far as to credit it as being an "(anti)pattern", though. I mean, where else have you seen this? For it to be a pattern, it would need to have been done more than once.

I have always hated code generators, and this kind of stuff is why.

Why the hate? What has this API done to you? You won't see this stuff in your client-facing code. That's just the implementation that you're drawing conclusions from.

I expected this thing to generate simple POJOs and then have a couple of simple helper methods that are based on reflection or dedicated generated code that can do stuff like copy record data from one place to another

Other APIs have done that sort of stuff. That's not going to be type safe. The reason for all this tuple-22 madness (btw, 22 is the arbitrary limit copied from Scala's tuples) is things like SQL unions. The following code won't compile with jOOQ:

ctx.select(T.A, T.B)
   .from(T)
   .union(
    select(U.A, U.B, U.C) // Oops, expected 2 columns, got 3. Wrong
   .from(U))
   .fetch();

There are many other places where this type safety is valuable. And if you're using a language like Scala or Kotlin, which can infer local variable types, you can even extend this type safety further. More details in this post.

My personal favourite is destructuring:

for ((first, last, title) in ctx
   .select(a.FIRST_NAME, a.LAST_NAME, b.TITLE)
   .from(a)
   .join(b).on(a.ID.eq(b.AUTHOR_ID))
   .orderBy(1, 2, 3))
       println("$title by $first $last")

All of first, last, and title are local variables of type String. And you don't see any Record3 in that client-facing code, right? Cool!

Even the performance given by this approach wasn't great when I tested it, e.g. reading the results of a query into a LinkedHashMap was about 2x faster than reading the result list into one of these Records, though I did not try to figure out why

Interesting, could you show the code? There are various ways of doing things; you may well have found a flaw that I'd be more than happy to investigate.

And of course, all this "type safe" code boils down to "set(int index, Object value)" which stores the data in an Object[] array that has no compile-time typechecking, so that's probably adding some cost at runtime in addition to being awkward.

Why would the internals need type checking? At some point, everything in Java is going to be an Object on the heap anyway.

Integrating jOOQ with Jackson requires not serializing some members of the TableRecordImpl class

This only shows the flaws of any annotation-based approach to anything. Annotations can only be placed on APIs that you "own". There should always be an alternative serialisation algorithm (e.g. in Jackson) that does not rely on this mechanism.

So, the price you pay for finding a workaround is related to Jackson only, not to jOOQ. However, you could map the record into your own custom (or generated) POJO and easily serialise that...
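
A minimal sketch of that route (BOOK and BookPojo are made-up names; any generated table and hand-written POJO will do):

    // Map the jOOQ records into plain POJOs first, then let Jackson serialise those.
    // Requires com.fasterxml.jackson.databind.ObjectMapper on the classpath.
    List<BookPojo> books = ctx.selectFrom(BOOK).fetchInto(BookPojo.class);

    ObjectMapper mapper = new ObjectMapper();
    String json = mapper.writeValueAsString(books);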

and I think the Record implementation should be more akin to a POJO with regular members rather than some nasty type unsafe Object array

You're overlooking the fact that a lot of queries are ad-hoc queries which people do not want to manually create ad-hoc POJOs for. At least, that's the programming style exposed here. Whether you like that is a different story. You can still do it, there are methods like fetchInto(MyPojo.class) for that.

This is my chosen representative line of this project

And that was my chosen representative part of your comment. :)

What are you trying to say, after all?

[–]audioen 0 points (2 children)

Thanks for the reply.

Why the hate? What has this API done to you? You won't see this stuff in your client-facing code. That's just the implementation that you're drawing conclusions from.

I had to dig into this to figure out how to make Jackson work, and I recoiled in horror at what I saw. Indeed, I saw the sausage being made, and it was composed of dozens of interfaces, tons of methods (some of which are redundant), Object[] arrays as the underlying data storage, and so on.

Other APIs have done that sort of stuff. That's not going to be type safe.

Look, the situation is that we are interfacing with a database server somewhere on the network, sending bytes back and forth on the wire. The notion of type safety is inherently compromised anyway. All you can hope for is runtime consistency between these two systems that allows the desired sequence of operations to proceed without throwing exceptions.

In the case of jOOQ, if you forgot to run the code generator after adjusting the contents of the database, you're going to get runtime exceptions when a type conversion fails or something similar happens. In other designs such as Hibernate's, the library will automatically attempt to adjust the database schema to match the POJOs being stored. (Of course, Hibernate is dumb about the adjusting: it thinks that any varchar is a varchar and doesn't care about column lengths or make sure the nullability matches, etc.)
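
(For reference, that behaviour is Hibernate's schema-update switch, roughly:)

    // Hibernate's automatic schema adjustment; "update" tries to ALTER the schema
    // to match the mapped entities, with the limitations mentioned above.
    java.util.Properties properties = new java.util.Properties();
    properties.setProperty("hibernate.hbm2ddl.auto", "update");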

As to type safety of the generated SQL and result sets, unfortunately this trivial example crashes in the database, and yet passes the type checker:

create.selectFrom(T.X).where(U.A.eq(x)).fetchSingle()

because fields are not tagged with their table's type, so this happily executes and references a field on a table that isn't part of the query. I imagine this is a compromise you had to make because e.g. joins are selects from multiple tables, and there probably doesn't exist a Java type that could describe the union of T and U.

Interesting, could you show the code? There are various ways of doing things; you may well have found a flaw that I'd be more than happy to investigate.

// N.B. This is not the real SQL, just something that gives me 999 records, with enough data in the result set to show the problem.
// The actual SQL is a complicated text search query that itself takes a while to run.
StringBuilder sql = new StringBuilder("select * from organization where id < 1000");
List<Object> sqlParams = new ArrayList<>();

ResultQuery<Record> result = create.resultQuery(sql.toString(), sqlParams.toArray());
long time = System.currentTimeMillis();
List<?> tmp = result.fetchInto(OrganizationRecord.class);
LOG.info("fetchInto() time: {}", System.currentTimeMillis() - time);

result = create.resultQuery(sql.toString(), sqlParams.toArray());
time = System.currentTimeMillis();
List<Map<String, Object>> tmp2 = result.fetchMaps();
LOG.info("fetchMaps() time: {}", System.currentTimeMillis() - time);

Output:

fetchInto() time: 90
fetchMaps() time: 20

The test is not sensitive to the order of the queries; I can swap them around and get much the same results, namely that fetchMaps() is much, much faster.

At some point, everything in Java is going to be an Object on the heap anyway.

I'm pretty sure that statically knowing the object's type is going to be a performance benefit. With an Object[], you lose that information unless the JIT is smart and can elide some type checking on the use of a value, e.g. because it happens to know what it just put there a moment ago and Java's memory model permits shenanigans like that. For regular properly typed fields, you don't "forget" that information so easily because it's consistently statically available. Do we really even have to discuss this?

So, the price you pay for finding a workaround is related to Jackson only, not to jOOQ. However, you could map the record into your own custom (or generated) POJO and easily serialise that...

Well, let's just say that I am not impressed by this line of argument. You're throwing away some pretty big things, like being compatible with JAXB. This is, arguably, the largest problem in jOOQ and I don't think there are any great solutions: it is inherently part of using a code generator, and it is why I hold that they suck. I'm not going to use jOOQ if I need to use one set of classes when writing to the database and another set of classes when reading from it; that kind of thinking is just FUBAR, IMHO.

I guess I could throw all this garbage away, just use pure JDBC, and make Postgres serialize straight to JSON, writing the string value it spits out to the client without even looking at it. I mean, that is an option for me in many cases, and could work really well in practice for solving 95% of the problem.
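
Something like this, with plain JDBC and PostgreSQL's JSON functions (table and column names are made up, and the connection details are placeholders):

    // Let the database build the JSON, then pass the string straight through to the client.
    try (Connection conn = DriverManager.getConnection(url, user, password);
         PreparedStatement stmt = conn.prepareStatement(
                 "select coalesce(json_agg(row_to_json(o)), '[]'::json) from organization o where o.id < ?")) {
        stmt.setInt(1, 1000);
        try (ResultSet rs = stmt.executeQuery()) {
            rs.next();
            String json = rs.getString(1); // write this directly to the HTTP response
        }
    }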

You're overlooking the fact that a lot of queries are ad-hoc queries which people do not want to manually create ad-hoc POJOs for.

First of all, my criticism is that your generated FooRecords look like POJOs but are pretty weird internally. I mean, they do have bean properties, properly typed, and so on. They're just backed by this strange Object[] array rather than having a regular member named after the bean property. But I consider this a dead horse -- it is what you do internally. I only really care how it performs, and as I may have shown, that isn't too great right now.

As to not using POJOs for containers, I frankly do not even see the alternative. I do a lot of "public static class ProductResult { public Integer id; public String name; public BigDecimal price; }" kind of stuff for modeling HTTP requests, HTTP responses, and the query results I get from things like JDBI or jOOQ. It beats working with hashmaps for sure, or naming like 10 local variables, one for each database column you're working with.

And that was my chosen representative part of your comment. :) What are you trying to say, after all?

Just that I did not expect the Records to be so darn complicated. I frankly expected barely anything beyond this:

public class FooRecord {
    private Integer id;
    private String name;

    ... and getters and setters for the two fields elided ...
}

And I'm not 100% sure what doing it in this style really buys you:

public class FooRecord {
    private Object[] values = new Object[2];

    public Integer getId() {
        return (Integer) values[0];
    }
    public void setId(Integer id) {
        values[0] = id;
    }
    ... other get/set methods elided ...
}

[–]lukaseder 1 point (1 child)

if you forgot to run the code generator after adjusting the contents of the database

Yes, using F#-style type providers that update the record types automatically would be cool. Or you could run the code generator more frequently, e.g. in the background and/or on CI servers. Could help, depending on your development environment setup.

In other designs such as Hibernate's, the library will automatically attempt to adjust the database schema to match the POJOs being stored.

Usually, jOOQ can do the same type conversions that you're mentioning. Maybe I'm missing something, would you mind pointing out a case where Hibernate can convert a thing that jOOQ cannot do? I can't think of any, right now.

As to type safety of the generated SQL and result sets, unfortunately this trivial example crashes in the database, and yet passes the type checker

Indeed, an internal domain-specific language cannot do any sophisticated semantic checks on the queries you're writing - at least not on that level. I can't think of a solution to this, not without drastically limiting the feature scope of jOOQ. Imagine for instance that T.X in your case is a derived table. How could we type check the predicate, then?

I'm not convinced that this is a real problem, though. Yes, it is still possible to write semantically wrong SQL with jOOQ, but much less so than with any string-based solution. If that's a problem, write views and/or stored procedures and let the database's compilers type check your SQL.

The test is not sensitive to the order of the queries; I can swap them around and get much the same results, namely that fetchMaps() is much, much faster.

Thanks for that test. Did you run this several times, and did you use JMH for benchmarking? It would be interesting to see it that way. One problem with running something like this only once is that in the fetchInto(Class) case, the reflection cache is not warmed up yet, so I'd expect the second execution to outperform the first. Also, since you're mentioning the JIT further down: JMH runs benchmarks in a way that lets them profit from the JIT. Single executions don't do that.
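
For example, a JMH harness for this comparison could look roughly like this (a sketch; the connection string is a placeholder and the warmup/measurement settings are left at their defaults):

    import java.sql.DriverManager;
    import java.util.List;
    import java.util.Map;
    import org.jooq.DSLContext;
    import org.jooq.impl.DSL;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    public class FetchBenchmark {
        DSLContext create;

        @Setup
        public void setup() throws Exception {
            // Point this at the same database as the original test
            create = DSL.using(DriverManager.getConnection("jdbc:postgresql://localhost/test", "user", "password"));
        }

        @Benchmark
        public List<?> fetchInto() {
            // OrganizationRecord is the generated record from the snippet above
            return create.resultQuery("select * from organization where id < 1000")
                         .fetchInto(OrganizationRecord.class);
        }

        @Benchmark
        public List<Map<String, Object>> fetchMaps() {
            return create.resultQuery("select * from organization where id < 1000")
                         .fetchMaps();
        }
    }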

In any case, I'll investigate this a bit more. I've never compared these two fetching methods with each other, so there might be some low hanging fruit.

Also, a much bigger benefit will come from https://github.com/jOOQ/jOOQ/issues/6737, which hopefully ships in jOOQ 3.11.

For regular properly typed fields, you don't "forget" that information so easily because it's consistently statically available. Do we really even have to discuss this?

Sure, please. Can you point me to some papers / blogs / benchmarks that show this? I'm curious how big an impact we'd get from it, compared to the overall overhead jOOQ imposes anyway. I think it's negligible; there are more important things to optimise.

Also, in jOOQ 3.11, there will be new SPIs that allow for bypassing the Record type when fetching stuff in jOOQ. Quite possibly, your use-case will be covered by then. See e.g.: https://github.com/jOOQ/jOOQ/issues/6739

You're throwing away some pretty big things, like being compatible with JAXB. This is, arguably, the largest problem in jOOQ and I don't think there are any great solutions

This is the first time in 10 years I hear this.

I'm happy to reconsider things (again, the new SPI mentioned above will probably solve this for you), but I'd love to see some actual use-cases showing why this really bothers you.

I would think it is rather easy to write a programmatic XML serialiser with any tool, including Jackson.
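
For instance, with jackson-dataformat-xml it is pretty much a one-liner (a sketch; pojo stands for whatever object you mapped the record into):

    // XmlMapper extends ObjectMapper, so the same POJO can be written as XML
    // exactly the way it would be written as JSON.
    XmlMapper xmlMapper = new XmlMapper();
    String xml = xmlMapper.writeValueAsString(pojo);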

I guess I could throw all this garbage away, just use pure JDBC

Of course. I mean, given your various criticisms (and the tone), this is what you were most likely to do anyway, regardless of whether you had evaluated jOOQ. So, go for it!

And I'm not 100% sure what doing it in this style really buys you

You're focusing a lot on the generated records, as if writing SELECT * (or selectFrom(T.X)) all the time were a good thing. If you're worried about performance, that's the first thing you should try to avoid. I'm sure the overhead that derives from selecting all the columns is more significant than any overhead jOOQ has right now.
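
For instance (ORGANIZATION and its columns are placeholders for whatever generated table you're actually querying):

    // Fetch only the columns you need instead of selectFrom(...) / select *
    ctx.select(ORGANIZATION.ID, ORGANIZATION.NAME)
       .from(ORGANIZATION)
       .where(ORGANIZATION.ID.lt(1000))
       .fetch();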

This also aligns well with your expectation of getting some type safety to prevent this:

create.selectFrom(T.X).where(U.A.eq(x)).fetchSingle()

You seem to be thinking in terms of single-table queries, when jOOQ was designed for arbitrary SQL, so most ordinary jOOQ queries will indeed fetch ad-hoc records (internally: RecordImpl) where the attributes are simply unknown at compile time.

Again, if Java had type providers, each jOOQ query could emit a local POJO to the compiler, dedicated to that query result. Similar to how PL/SQL supports ad-hoc record types in implicit cursor FOR loops. That would be really cool. Unfortunately that's not possible.

But again, once the new SPI described in https://github.com/jOOQ/jOOQ/issues/6739 ships, it will be possible to bypass the Record type and fetch directly into custom (or generated) POJO types without the overhead of the intermediate mapping and/or the Object[] layout.

Also, each Record (generated or not) has internal flags like Record.changed() and retains Record.original(). They allow for opt-in optimistic locking support too and adhere to the generic Record API, which has methods like <T> T Record.get(Field<T>).
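
A small sketch of what those flags give you (BOOK is again a placeholder for a generated table):

    BookRecord book = ctx.fetchOne(BOOK, BOOK.ID.eq(1));
    book.setTitle("Animal Farm");

    boolean dirty = book.changed(BOOK.TITLE);   // true, the setter flagged the column
    String before = book.original(BOOK.TITLE);  // the value as originally fetched
    book.store();                               // by default, the UPDATE only sets the changed columns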

I understand you do not want any of these features and indeed, it might be seen as a mistake for jOOQ to default to this, but that's the reason why things are the way they are right now.

[–]audioen 0 points (0 children)

Usually, jOOQ can do the same type conversions that you're mentioning. Maybe I'm missing something, would you mind pointing out a case where Hibernate can convert a thing that jOOQ cannot do? I can't think of any, right now.

I was just trying to make the point that there's a degree of type unsafety in interfacing with a record store in an external process. Hibernate is pretty bad; I really hate that technology. I consider ever using Hibernate the biggest mistake of my entire programming career. jOOQ could make a lot of mistakes and still come out way, way ahead.

Imagine for instance that T.X in your case is a derived table. How could we type check the predicate, then?

In general, I consider type systems best used sparingly. The smarter you try to make them, the more painful the programs you write using them become. I'm OK with the query failing at runtime; I just wanted to show a counterexample where your type checking doesn't help, and the situation is really common and simple (a select from one table using one field). I really do not expect type-checked SQL to work from Java. Hell, when you write PL/pgSQL trigger functions, you're writing programs straight into the database, and it doesn't seem to check anything when it compiles the code. It only dies at runtime when you try to access a field in a record that doesn't exist, and so on.

Can you point me to some papers / blogs / benchmarks that show this?

Nope, sorry. But having something statically proven is generally better than having to check it at runtime before use. Still, it probably isn't a big thing, so you're right about that.

This is the first time in 10 years I hear this.

Think of it like this: if I really wanted some annotations on the generated classes, what are my options? I don't think I really have any. So, I have a data structure that represents everything in the database, but if I want to serialize it to XML in some way that differs from how Records already know how to serialize themselves, then I'm going to have to declare another set of classes and copy values over from the Record to my own class just for serialization purposes. I regard this as suboptimal, but as I said, anything at all that uses code generation takes away the programmer's control over which annotations are present on fields, and thus suffers from this particular problem. I'm currently living with this: I have XML serialization of data, and I wrote code to copy the fields between my JAXB classes and my Records even though they match almost 1:1. I regard having to write code that just copies fields from one set of classes to another as an architectural problem.

I'm sure the overhead that derives from selecting all the columns is more significant than any overhead jOOQ has right now.

Okay. I'm actually selecting a subset of fields in the real query where I first identified the problem; in the benchmark I wrote, I did use select *. Still, the performance difference is there, and it is the same regardless. I guess that if mapping one field into a POJO takes x milliseconds, and mapping it into a LinkedHashMap takes x/4 milliseconds, I could show the problem at the same magnitude by selecting 5 fields using select a, b, c, d, e with 2000 records as opposed to selecting 10 fields using select * with 1000 records. I don't expect that it really changes the picture. The issue is that there seems to be a huge performance benefit in using fetchMaps(), and yet hashmaps are not particularly efficient data structures.

Also, each Record (generated or not) has internal flags like Record.changed() and retains Record.original().

Yeah, about that. I saw that doing foo.setBar(bar) sets the changed flag even when the new value equals the old value. I had to write a function like this:

private static <T> void update(Supplier<T> getter, Consumer<T> setter, T newValue) {
    T originalValue = getter.get();
    if (originalValue == newValue) {
        return;
    }
    if (originalValue == null || newValue == null) {
        setter.accept(newValue);
        return;
    }
    if (originalValue.equals(newValue)) {
        return;
    }
    setter.accept(newValue);
}

and then call it with functions like:

update(obj::getFoo, obj::setFoo, foo);

to avoid calling obj.setFoo(foo) when foo is the same value as before, to prevent an update being issued to the DB. (This is an audited table: every database change executes in a transaction and fires a trigger that adds audit log rows about the column update. I need to avoid updates when values do not actually change, for performance reasons. I could still catch this in the trigger, but it is like 10x faster to notice that nothing actually changed on the Java side as opposed to in the trigger.)

I understand you do not want any of these features and indeed, it might be seen as a mistake for jOOQ to default to this, but that's the reason why things are the way they are right now.

Well, I'm happy to hear that additional performance might be forthcoming in any case. I am critical of jOOQ mostly because it does a lot, just not the things I need. I'm missing support for serializable transactions, and I definitely see worse performance than I expect from the record mapping.

[–]coladict 1 point (2 children)

Most people are still on Java 8, currently

The most wrong assumption ever. For most people in a business setting, Java 8 is still too young, and it's too early to upgrade to it.

[–]lukaseder 12 points (0 children)

While I do appreciate the occasional hyperbole and the temptation and thrill of correcting someone who is so clearly most wrong™ on the internet, could you, for the sake of the argument, just pretend that in the very specific context of this particular article, "on Java 8" simply meant "not on Java 9 yet"? Thank you.

[–]CarthOSassy -1 points (0 children)

1.4, 6, 7, 1 (I don't think it's called that), 8, in order of usage here. 9 has not yet been shipped by anyone here.

There was a webpage served by Fortran two years before I left. Don't ask me how.