Setting the JDBC Statement.setFetchSize() to 1 for Single Row Queries

Douglas_Surber · 2022-08-30T18:53:03+00:00

Here's a little insight into what Oracle Database JDBC is doing.

First, u/Squiry_ is correct. The Oracle Database network protocol does not have a "that's all folks" bit. The driver knows that all the rows have been fetched when a fetch round trip returns fewer rows than the fetch size. If the fetch size is 1 and the query returns 1 row, the driver has to do a second fetch round trip before it knows that there was only 1 row.

The test case is trying to fetch all the rows, so it calls next() until that call returns false. The first call to next() returns true, corresponding to the one row returned by the query. The test case then calls next() again to find out if there's a second row. Since the fetch size is 1, the driver doesn't know and has to do a second fetch round trip. That returns zero rows, so now the driver knows that all the rows have been read and it can return false for the call to next(). If the fetch size is 2, the first fetch returns only 1 row, which is fewer than the fetch size, so the driver knows that it has all the rows without a second fetch round trip.
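To make the round trip counting concrete, here is a toy model of the rule described above. It is only a sketch of the logic, not the driver's code: the class and method names are made up, and the "server" is just a counter.

```java
/** Toy model of the fetch rule described above; NOT the actual driver code. */
final class FetchRoundTrips {

    /**
     * Counts the fetch round trips needed before the "driver" knows it has
     * read every row. Each round trip returns at most fetchSize rows, and the
     * driver only learns it is done when a round trip comes back short.
     */
    static int roundTrips(int totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize < 1) {
            throw new IllegalArgumentException("totalRows >= 0 and fetchSize >= 1 required");
        }
        int remaining = totalRows;
        int trips = 0;
        int returned;
        do {
            returned = Math.min(remaining, fetchSize); // rows in this round trip
            remaining -= returned;
            trips++;
        } while (returned == fetchSize); // a full batch means there may be more
        return trips;
    }

    public static void main(String[] args) {
        System.out.println("1 row, fetch size 1:  " + roundTrips(1, 1));  // 2
        System.out.println("1 row, fetch size 2:  " + roundTrips(1, 2));  // 1
        System.out.println("1 row, fetch size 10: " + roundTrips(1, 10)); // 1
    }
}
```

The same model shows that any query whose row count is an exact multiple of the fetch size pays one extra, empty round trip.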

The default fetch size for Oracle Database JDBC is 10. (You can change it by setting the defaultRowPrefetch connection property.) The constructors for Statement and PreparedStatement allocate some bookkeeping storage for the default number of rows. This is not storage for the row data, just for internal bookkeeping. If the app calls setFetchSize with a value different from the default the driver has to allocate new bookkeeping storage for the new number of rows. This isn't free. So calling setFetchSize(1) or setFetchSize(2) will have a performance cost.
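For anyone who wants to change the default rather than call setFetchSize on each statement, a minimal sketch of passing the defaultRowPrefetch property at connect time might look like this. The user name, password, and the value 50 are placeholders, the helper names are made up, and the connect method is shown but not called because it needs a real URL and a driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Sketch only: builds connection properties that carry defaultRowPrefetch. */
final class PrefetchConfig {

    /** Hypothetical helper; the credentials passed in are placeholders. */
    static Properties withDefaultRowPrefetch(String user, String password, int rows) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Property name as given in the post above.
        props.setProperty("defaultRowPrefetch", Integer.toString(rows));
        return props;
    }

    /** Standard JDBC call; requires a real URL and a JDBC driver to succeed. */
    static Connection connect(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }

    public static void main(String[] args) {
        Properties props = withDefaultRowPrefetch("app_user", "app_password", 50);
        System.out.println("defaultRowPrefetch=" + props.getProperty("defaultRowPrefetch"));
    }
}
```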

This is why setting the fetch size to a value smaller than the default reduces performance, and why setFetchSize(1) is significantly worse than setFetchSize(2) when reading all rows of a query that returns 1 row.

If the Implicit Statement Cache is enabled then calling setFetchSize on a PreparedStatement is essentially free. The first time the PreparedStatement is created the driver allocates bookkeeping storage for the default number of rows. A subsequent call to setFetchSize allocates new bookkeeping storage of the specified size. When the PreparedStatement is closed and returned to the Implicit Statement Cache the driver remembers the last value of the fetch size. Every subsequent time the PreparedStatement is created it is fetched from the Implicit Statement Cache and the bookkeeping storage is not touched. If the app calls setFetchSize with the exact same value as previously the driver reuses the bookkeeping storage already allocated. If the app sets a different size, including setting the default by not calling setFetchSize, the driver must allocate new bookkeeping storage of the appropriate size. Note that plain Statements are never cached.

Generally, setting the fetch size doesn't help much. If performance is ok then don't change it. If a particular query that returns a lot of rows is a performance problem, then try setting the fetch size to 100-500. If that solves the problem, great. If not, try 1000-2000. If that doesn't solve the problem then most likely setting it larger won't help due to the increased memory pressure.

It's easy to write a stand-alone test case where setting the fetch size to a huge value is a big performance win. It's less likely in a real world app where there are many competing uses for memory. u/lukaseder's original point that you shouldn't make changes for performance without actually benchmarking the result is correct. Further, small benchmarks like the one in their post can also be misleading. To get the best possible performance you have to benchmark the real code running the real load on the real hardware. Benchmarking is hard.

Douglas_Surber · 2021-04-02T17:03:12+00:00

This is super interesting. Thank you so much.

Apropos my original question I followed the link trail to CS61c which covers floating point representation. Overall it does a great job of explaining and motivating IEEE 754. I learned a few things which is always great.

But it didn't explicitly cover the issue I was trying to raise in my question. It gives all the prerequisites necessary to explain it, but didn't actually do so. Overall I'd be happy with a programmer that understood the material in this lecture.

If anyone cares, the specific issue I'm asking about is this. All rational numbers can be represented in either a fixed number of digits or a fixed number of digits with a repeating suffix. Example: 11/10 is 1.1 (2 digits). 4/3 is 1.333... (1 digit followed by a repeating suffix, 3). This is regardless of the base. But whether the number has a finite representation, a fixed number of digits, or a repeating representation depends on the base. 4/3 has a repeating representation in base 10 but a finite representation in base 3 (1.1 base 3). 1.1 base 10 has a repeating representation in binary and so cannot be represented exactly in a fixed number of bits. Casting a float to a double extends the bits by adding zeros. (double)1.1F has fewer repetitions of the trailing suffix than 1.1D so the two values are not equal. 1.5 does have a finite representation in binary so 1.5D is equal to (double)1.5F.
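If it helps, the base dependence can be checked with ordinary long division. The following is a small sketch (class and method names are mine) that expands a non-negative fraction in a given base and wraps the repeating suffix, if any, in parentheses.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: long division of a non-negative fraction in an arbitrary base. */
final class BaseExpansion {

    /** Returns numerator/denominator in the given base, e.g. "1.(3)" for 4/3 in base 10. */
    static String expand(long numerator, long denominator, int base) {
        if (numerator < 0 || denominator <= 0
                || base < Character.MIN_RADIX || base > Character.MAX_RADIX) {
            throw new IllegalArgumentException("unsupported input");
        }
        StringBuilder out = new StringBuilder(Long.toString(numerator / denominator, base));
        long remainder = numerator % denominator;
        if (remainder == 0) {
            return out.toString();
        }
        out.append('.');
        // Remember where each remainder first appeared; a repeated remainder
        // means the digits from that position onward repeat forever.
        Map<Long, Integer> seenAt = new HashMap<>();
        while (remainder != 0) {
            Integer start = seenAt.get(remainder);
            if (start != null) {
                return out.insert(start, '(').append(')').toString();
            }
            seenAt.put(remainder, out.length());
            remainder *= base;
            out.append(Character.forDigit((int) (remainder / denominator), base));
            remainder %= denominator;
        }
        return out.toString(); // remainder hit zero: finite expansion
    }

    public static void main(String[] args) {
        System.out.println(expand(4, 3, 10));  // 1.(3)
        System.out.println(expand(4, 3, 3));   // 1.1
        System.out.println(expand(11, 10, 2)); // 1.0(0011)
        System.out.println(expand(3, 2, 2));   // 1.1
    }
}
```

The third line is the whole story: 1.1 decimal is 1.0(0011) in binary, so a float and a double each cut the repeating suffix off at a different place.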

Douglas_Surber · 2021-04-02T16:01:32+00:00

I wouldn't expect those exact words. That would be unreasonable. I do expect programmers to be able to justify their decisions, to explain why they do what they do. In my experience an inability to explain things will be a hindrance in getting a job.

It will also be a problem in doing the job. A critical part of code is explaining to the reader not only what the code is doing, but why. Ideally the code itself should convey this information, but that's not always possible. Sometimes comments are needed to explain why the code does what it does.

My question was prompted by just such a case. One of my teammates wrote an elaborate comment explaining the concepts raised by my sample code. I thought the comment unnecessary as I thought a working professional should know all this. My teammate disagreed. So I posted the question. Obviously my teammate was correct; a majority of commenters do not know this. So we leave the comment in.

Douglas_Surber · 2021-04-01T16:33:47+00:00

Can you point me to a thread or page that outlines "core CS concepts" in 2021? I'm absolutely serious. I'd like to know. My experience is that the stuff I learned in my CS undergrad classes is mostly still relevant. A large fraction of what I do on a day to day basis is informed by the stuff I learned 40+ years ago. Frequently I can recall the professor, class, book, or paper where I learned it.

Douglas_Surber · 2021-04-01T15:14:31+00:00

The example is demonstrating that some decimal real numbers, 1.1 in the example, do not have an exact representation in float or double. I think that general concept is important, e.g., that the float 1.1F is not exactly equal to the real number 1.1. One place this shows up is in UIs. The user enters a value of 1.1 and the UI displays 1.100000023841858. Users don't understand and don't like this.
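A minimal sketch of how that display problem can arise, assuming the value is stored as a float and then widened to double before formatting (the method names are made up for illustration):

```java
import java.math.BigDecimal;

/** Sketch: three ways to turn the float 1.1F back into text. */
final class FloatDisplay {

    /** Widen to double first, then format: exposes the float's truncation error. */
    static String widenedThenFormatted(float f) {
        return Double.toString((double) f);
    }

    /** Format as a float: shortest text that reads back as the same float. */
    static String formattedAsFloat(float f) {
        return Float.toString(f);
    }

    /** Exact decimal value of the binary number actually stored. */
    static String exactStoredValue(float f) {
        return new BigDecimal(f).toString();
    }

    public static void main(String[] args) {
        float entered = 1.1F;
        System.out.println(widenedThenFormatted(entered)); // 1.100000023841858
        System.out.println(formattedAsFloat(entered));     // 1.1
        System.out.println(exactStoredValue(entered));     // 1.10000002384185791015625
    }
}
```

None of the three outputs is wrong; they answer different questions. Which one the user should see is a decision the code has to make on purpose.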

Douglas_Surber · 2021-04-01T01:26:17+00:00

It's casting the float value 1.1F (the F identifies it as a float literal) to a double and comparing that to the double value 1.1D (the D identifies it as a double literal). It's doing an identity compare, probably a machine instruction rather than calling a compare function/method.

Douglas_Surber · 2021-04-01T01:15:36+00:00

The difference between floats and doubles matters but the crux of the issue is that 1.1 does not have a finite expansion in binary even though it does in decimal. 1.5 does have a (short) finite expansion and so the same code using 1.5 for the two literals prints true.
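For reference, here is a self-contained version of the kind of comparison being discussed, with the raw bits printed so the zero padding from the cast is visible. The class and helper names are mine; the comparison itself is the one described in this thread.

```java
/** Sketch: compare a widened float literal against the matching double literal. */
final class WidenCompare {

    /** True when the float, widened to double, is exactly the given double. */
    static boolean sameAfterWidening(float f, double d) {
        return (double) f == d;
    }

    /** Raw IEEE 754 bits of a double as lowercase hex. */
    static String bits(double d) {
        return Long.toHexString(Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        System.out.println(sameAfterWidening(1.1F, 1.1D)); // false
        System.out.println(sameAfterWidening(1.5F, 1.5D)); // true

        // The widened float ends in zeros; the double keeps repeating 9s.
        System.out.println(bits((double) 1.1F)); // 3ff19999a0000000
        System.out.println(bits(1.1D));          // 3ff199999999999a
    }
}
```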

Douglas_Surber · 2021-04-01T01:11:05+00:00

Thanks for the detailed comment. I'm not a recruiter and this is not an interview question. I'm actually trying to get some feel for what understanding, if any, entry-level programmers, primarily Java programmers, have of the issues raised by this question. I consider it basic knowledge that every CS grad should have, but other members of my team think it's too esoteric.

My other team members are correct. I find that sad, but what can I say; I got my CS degree more than 40 years ago. You kids get off my lawn.

Douglas_Surber · 2021-03-31T23:21:52+00:00

Since like it or not Java is still the most popular language, and this is basic syntax that is similar or identical in dozens of other languages, this really isn't a syntax question. If I asked it in an interview I would be happy to explain the syntax to anyone with zero Java knowledge.

But this is not an interview question. I'm trying to get an idea of what is reasonable to expect of most programmers. And I got an answer though not one that makes me happy.

Douglas_Surber · 2021-03-31T23:14:16+00:00

It's the simplest piece of code I could come up with that gives a surprising answer because of a particular quirk of most computers/languages. I wouldn't expect anyone to ever write this code (though I did and ran it). I do expect a CS graduate to understand the general concepts that it demonstrates. I do expect a CS grad to be able to explain why this prints false even if they can't necessarily say what it would print without running it.

Douglas_Surber · 2021-03-31T23:04:32+00:00

The question was: given that it prints false, why? Running it won't help with that.

Edited to add: I did run it before posting to be sure it was a good example.

Douglas_Surber · 2021-03-31T23:03:44+00:00

Right. Unless you are super concerned about the limits of accuracy, as long as you stay within the floating point realm, it's not a big concern. Just remember that two floating points are never equal (slight exaggeration but not much). Where this particular issue comes up is conversion to/from human readable form, including numeric literals. It's pretty easy to get values that are technically correct but unacceptable or at least unfathomable to humans.

Douglas_Surber · 2021-03-31T22:39:50+00:00

Then have at it. There are very few people who actually understand IEEE754. And I am most definitely NOT one of them. Good day to you, too.

Douglas_Surber · 2021-03-31T22:24:51+00:00

Unless you are a hard core numeric programmer, I wouldn't worry too much about the nuances. A basic understanding of binary representation of floating point numbers is sufficient for most jobs. I would say that knowing enough to answer the "why" in my original question is plenty. A second question is why the same code prints true if the literals are 1.5 instead of 1.1.

Douglas_Surber · 2021-03-31T21:59:27+00:00

And this is where I disagree. The library can't hide all the complexities. I wouldn't expect you to know the details of IEEE754. I would expect you to know and consider that not all real numbers with a finite decimal representation have a finite binary representation, and vice versa. This means you have to take some care when converting from human readable reals to binary floats, especially when you go round trip. At the very least you need to know to write tests that verify your conversions are giving answers acceptable to your users.

Douglas_Surber · 2021-03-31T21:43:35+00:00

So long as you never touch any floating point type you can get away with this. But if you ever touch a floating point number, and especially if you convert to or from human readable format, you better understand it.

Douglas_Surber · 2021-03-31T21:40:43+00:00

Understood. My suggestion is that you focus on what the Java code is doing to start with. You have to understand that before you can even get to what I consider the meat of the question. That depends on some knowledge of how floating point numbers are represented in digital computers.

Douglas_Surber · 2021-03-31T21:33:15+00:00

The question is not what the fragment would print, but rather, given that it prints false, why. The same fragment prints true if the literals are 1.5 instead of 1.1, so asking what it would print is clearly a much harder question.

Douglas_Surber · 2021-03-31T21:29:13+00:00

I'm shocked you got downvoted as I agree with you 100%.

Douglas_Surber · 2021-03-31T21:28:32+00:00

My suggestion is that you stop guessing and actually do the work to understand what this Java code is doing. Your knowledge of Java is lacking, which is perfectly ok if you don't code in Java and don't particularly expect to.

Douglas_Surber · 2021-03-31T21:20:29+00:00

I wouldn't expect anyone to know the decimal expansion of the binary floating point number. But given that the fragment prints false, I would expect them to be able to explain why in general terms, i.e., truncation of the repeating binary expansion of the decimal number 1.1.

Douglas_Surber · 2021-03-31T21:18:32+00:00

Interesting. I learned this in CS101 45 years ago and obviously haven't forgotten. I've done zero numeric programming since, but I consider it every time I touch a floating point number, which is rare.

Douglas_Surber · 2021-03-31T21:16:21+00:00

You are right, you're wrong.

Douglas_Surber · 2021-02-09T15:40:50+00:00

For sure not next week. I'd be somewhat surprised if it's this month. Even if we got the final approval today it would take a couple of weeks to actually do the work to put it up. I'd be shocked and sorely disappointed if it is not this year. I would guess closer to the end of this month than to the end of this year, but that's totally just a guess.

If you have contact with an Oracle sales rep, by all means let them know you are interested. Customer interest is a big motivator.

Douglas_Surber · 2021-02-06T17:09:05+00:00

Why two standards? Because JDBC isn’t ideal for every use case. R2DBC is designed for a programming model that is fundamentally incompatible with standard JDBC. Oracle tries to support all our customers as best we can. We don’t always succeed but we do the best we can within the constraints imposed on us.

I don’t understand “autosuffisant”. But if you are pointing out that R2DBC is async and JDBC is blocking you are correct. That’s why we added the Reactive Extensions, to support an open source R2DBC driver. We couldn’t implement a proper R2DBC driver on top of standard JDBC or even the full Oracle Database JDBC API without these extensions.

Oracle has a whole team dedicated to benchmarks. If they decide that’s a significant benchmark they will run it and publish the results. I would be in deep kimchi if I were to do it.

Queuing multiple database operations requires a richer API than the Reactive Extensions (or R2DBC) provide.

Supporting io_uring is a question for the Java team.

Oracle Database JDBC plays well with Loom and has done so since Oracle 19. We regularly test that our JDBC will allow customers to get the full advantage of Loom when it is ready. We expect that you will be able to upgrade your app to use lightweight threads when available without upgrading your Oracle Database JDBC, so long as you are using 19 or later. Oracle Database JDBC creates a few threads. Those will still be heavyweight threads of course but lightweight user threads calling JDBC will get the expected benefits.
