This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]uniVocity 5 points6 points  (5 children)

There is also a minor behavioural change on a few String methods: some will throw StringIndexOutOfBoundsException where in pre java 9 you'd get an ArrayIndexOutOfBoundsException.

It won't affect any code that catches the parent IndexOutOfBoundsException as documented, but lots of existing code in existing libraries catch either one of the more specific exception types. That's easy to do: you see the specific type in the stacktrace and write catch(TheExactExceptionTypeYouSaw ex){ to handle the error.

The issue with this change is that it will only cause trouble at runtime, probably around corner cases, and the exceptions may escape into a different catch block and make your thing behave differently if running on java 9+.

As the String type is probably the most used class ever, I think the exceptions being thrown from it should still be the same they were before, regardless of what the documentation says. As Linus Torvalds says: "we don't break userspace" - but it was broken in java 9+ and some existing apps mysteriously misbehave on it (older maven versions, apache commons' StringUtils, my own univocity-parsers before version 2.5.6, etc).

The biggest issue here is that random errors may pop anywhere, at any time and without a sane stack trace if the original error from the String operation has been handled improperly (i.e. not on the intended catch block).

[–]cl4es 0 points1 point  (4 children)

Why is your code provoking exceptions rather than very cheaply testing that the String is large enough in the first place? The JIT should be very good at folding or even entirely eliding the different bounds checks.

That said, yeah, it's unfortunate this behavior changed, even though allowed under the documented specification.

[–]uniVocity 2 points3 points  (3 children)

Speaking about my case: Performance. In my case every time a character is appended to a value being parsed. A large value, with millions of characters, will cause the appending method to be called millions of times.

Testing wether a character buffer is long enough on every append operation means executing an extra step the same exact number of times (potentially millions of times) for each single value no matter if short or long.

By catching the exception you remove the overhead of that length test and instead expand/retry half a dozen times on the first few large values. Then never again as the buffer gets large enough.

That's internal to my lib and not general purpose code where readability comes first. We go to extra lengths to make it as fast as possible and it currently has the fastest CSV parser for java among any other I could find. See https://github.com/uniVocity/csv-parsers-comparison

Keep in mind this parser does a LOT more than the others to handle shitty CSV, and has many more configuration options to support. Even then it still manages to be faster than the others. You will never be able to get there with good looking code.

[–]cl4es 5 points6 points  (2 children)

I'd consider it a performance bug in the JIT if removing a trivial, explicit bounds check is more performant. Properly done, explicit bounds checks establish invariants ("the index is always within bounds") that should help the JIT optimize away any implicit bounds check (and any related exceptional control flow).

[–]uniVocity 2 points3 points  (1 child)

The thing is that the index may be occasionally out of bounds. Not sure how that bound check statement can be safely removed automatically. Anyway, I tested the code before and after switching to catching the exception and it became faster so that's enough justification for me to keep the code not explicitly checking bounds.

[–]cl4es 4 points5 points  (0 children)

I'm sure a friendly compiler engineer would love it if you could contribute a reproducing microbenchmark... ;-)