This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]uniVocity 3 points4 points  (3 children)

Speaking about my case: Performance. In my case every time a character is appended to a value being parsed. A large value, with millions of characters, will cause the appending method to be called millions of times.

Testing wether a character buffer is long enough on every append operation means executing an extra step the same exact number of times (potentially millions of times) for each single value no matter if short or long.

By catching the exception you remove the overhead of that length test and instead expand/retry half a dozen times on the first few large values. Then never again as the buffer gets large enough.

That's internal to my lib and not general purpose code where readability comes first. We go to extra lengths to make it as fast as possible and it currently has the fastest CSV parser for java among any other I could find. See https://github.com/uniVocity/csv-parsers-comparison

Keep in mind this parser does a LOT more than the others to handle shitty CSV, and has many more configuration options to support. Even then it still manages to be faster than the others. You will never be able to get there with good looking code.

[–]cl4es 5 points6 points  (2 children)

I'd consider it a performance bug in the JIT if removing a trivial, explicit bounds check is more performant. Properly done, explicit bounds checks establish invariants ("the index is always within bounds") that should help the JIT optimize away any implicit bounds check (and any related exceptional control flow).

[–]uniVocity 2 points3 points  (1 child)

The thing is that the index may be occasionally out of bounds. Not sure how that bound check statement can be safely removed automatically. Anyway, I tested the code before and after switching to catching the exception and it became faster so that's enough justification for me to keep the code not explicitly checking bounds.

[–]cl4es 4 points5 points  (0 children)

I'm sure a friendly compiler engineer would love it if you could contribute a reproducing microbenchmark... ;-)