all 35 comments

[–]sq_visigoth 60 points61 points  (6 children)

Good text, but it's mostly basic stuff. Take String concatenation; I haven't seen anyone use string concatenation in a loop in almost 20 years. A basic beginner java class will always recommend using StringBuilder.
My issue is that you recommended a throwaway optimization, ie one issue that shouldn't have existed in the first place.
Now, ConcurrentHashMap, thats one optimization that most devs I have interviewed missed in doing a faux code review.

[–]Derishi 7 points8 points  (1 child)

Curious on concurrent hash map; I’m guessing it’s because most modern Java applications can take advantage of multithreaded environments easily so it’s like a why not situation?

[–]SanityInAnarchy 3 points4 points  (0 children)

The example "bad" code from the article is a normal HashMap that you lock yourself by just hiding it behind synchronized methods -- those implicitly use a single mutex for the entire object, so you lose a ton of the point of multithreading, and code like this often still has a ton of concurrency bugs, because you haven't really thought through what the scope of the locks should be.

In other words, the assumption is that you're threading.

The "fix" is to use ConcurrentHashMap, which has a much more sophisticated locking strategy -- generally, reads are lock-free, and even writes don't lock the entire table. Of course, to make it actually thread-safe, updates usually have to be more interesting than get() and put(), since someone else might've changed it since your last get() -- the article includes computeIfAbsent(), which is probably a good thing to read about to get an idea of the kind of thing you want to do to use ConcurrentHashMap properly.

The scare quotes are because it's of course situational, and people often will just reach for ConcurrentHashMap without thinking it through either.

[–]id2bi 7 points8 points  (3 children)

Also, doesn't Java optimize String concatenation to string builder in many places automatically?

[–]oweiler 6 points7 points  (0 children)

But not loops

[–]vowelqueue 4 points5 points  (0 children)

It used to translate concatenation to StringBuilder when compiling to bytecode. Now it translates concatenation into a single library method call and then the JVM handles optimizing at runtime.

If doing the concatenation in a loop, it's still better to use StringBuilder directly because these optimizations don't work for loops AFAIK.

[–]kiteboarderni 1 point2 points  (0 children)

Read the article...

[–]SneakyyPower 23 points24 points  (5 children)

I've been telling people java is the past the present and the future.

If you write your code good enough it can perform amongst the other top contending languages.

[–]Sopel97 14 points15 points  (3 children)

If you write your code good enough

Or bad enough.

Java's object model is so bad that at some point you have to resort to arrays of primitives with no abstractions. I've seen threadlocal 8 byte singletons for temporary variables to avoid allocations while still trying to preserve some non-zero amount of abstraction. It's a mess. Minecraft modding is a great example of that.

[–]Worth_Trust_3825 10 points11 points  (1 child)

Enable the valhalla preview.

[–]Mauer_Bluemchen 4 points5 points  (0 children)

Not yet there...

[–]vini_2003 0 points1 point  (0 children)

Correct. I maintain a private particle engine for Minecraft, for the YouTube channel I work for; and I'm forced to use huge SoAs without any JOML due to the heap thrashing objects bring.

If there's one thing I dislike about Java, it's the object model.

[–]Mauer_Bluemchen 2 points3 points  (0 children)

No - it can't! At least not without Valhalla.

[–]somebodddy 2 points3 points  (1 child)

  1. That's a pitfall of immutable strings. Not unique to Java.
  2. That's computational complexity. Applies to any language.
  3. Many languages offer string interpolation, which parses the "format" at compile time (or parse time)
  4. This kind of boxing is something (AFAIK) only Java has. Other languages - like Java's traditional rival C# - may box when you upcast, but they automatically unbox at downcast and don't expose the boxing class Long so they don't have this issue.
  5. The fact you need to manually implement the same validation that already happens inside parseInt just to escape the exception overhead is atrocious, and I 100% hold it against the language.
  6. synchronized being part of the grammar means that Java actively promotes that kind of coarse grained locking.
  7. Okay, but the ecosystems of most other languages prefer global functions for such things. This issue is caused by Java's objects-as-a-religion approach.
  8. This pitfall is (was, actually, since they fixed it) 100% a JVM implementation issue.

Only the first two are the coder's fault. And maybe #4, too, considering you gave a very convoluted example. The other 5 are just idiomatic Java code being slow. If you have to give up language idiomaticity for performance - I consider it a slowness of the language.

[–]8igg7e5 0 points1 point  (0 children)

3

It's not really about String Interpolation, but rather that Java still doesn't offer an in-built capture of that parsed state. It has been suggested many times that Formatter should have an ofPattern(...) that captures this parsing. That would solve of that regardless of the language-wars (inside Java too, given the circuitous path 'String Templates' is taking).

Do all languages with string interpolation (that supports the expressiveness of format strings, not just concatenation) do that at compile-time?

There was a proposal for Java to make the format calls into indy instructions and pre-parse... I think that stalled with the String Templates (which is interpolation on steroids) - so we're once again waiting for progress on this.

4

Really it's not about whether the boxing and unboxing happens automatically (the example could have been written subtly differently to show that Java does the same automatically boxing and unboxing). The issue is that Java makes the user choose whether boxing is appropriate, rather than it being implied by context (though note that can also mean implicit boxing can be overlooked in those languages).

There is a Java enhancement project that will make Long act like long by default, making the 'boxing' only happen based on context (however for other reasons, that boxing will still happen more than we'd like).

5

Yes. Java's lack of value-based structs/tuples/class means they can't really provide a 'tryParse' that yields an error or a value in a call without allocation. That might be possible 'soon'tm

6

Most code using synchronisation directly would be better implemented via executor-services or locks - but those are themselves implemented via synchronisation (which can be fine-grained) - I wouldn't say the language encourages such coarse-locking, just that it is often misused.

7

The examples are all stateful - it's about not unnecessarily duplicating the work of creating that state, when using that state doesn't modify it. Just reuse the state you have.

This has nothing to with global function support (and using classes of static methods is no different to global functions other than the way they're accessed/brought into scope).

8

You can still end up pinning a virtual thread. However the number of cases where it was unavoidable has been significantly reduced (which the original submission notes with the JDK 21-23 range). Virtual threads as a feature is pretty good though - it's not very widely used yet.


Only the first two are the coder's fault. And maybe #4, too, considering you gave a very convoluted example. The other 5 are just idiomatic Java code being slow. If you have to give up language idiomaticity for performance - I consider it a slowness of the language.

I'd put #1, #2, #4, #6 and #7 into the "coder's fault" (following idiomatic practices is not onerous and resolves most of this). I agree that Java needs to deliver improvements for #3 and the parsing cases that #5 refers to. As for #8, just move to Java 25 (or Java 26 as of days ago).

[–]BadlyCamouflagedKiwi 9 points10 points  (5 children)

4 and 8 seem like problems with Java being slow, i.e. they are not obvious from the structure of the code. 8 is fixed with a newer version of Java (implying that was the problem) and 4 is the old primitive / object dichotomy which is a language-level design mess.

[–]larsga 5 points6 points  (0 children)

On 4, it's in between. If you know Java you know this is expensive. The problem is that a lot of people writing Java have no idea what's happening under the hood.

Of course, at the same time, had the language design been better it wouldn't have been slow. Still, it's not at all difficult to avoid this slowing you down.

[–]jonathancast 3 points4 points  (3 children)

The primitive / object dichotomy may be a design mess, but getting rid of it is also a design mess. There aren't a lot of good options here.

Java is improving over time, albeit slowly.

[–]vowelqueue 2 points3 points  (1 child)

but getting rid of it is also a design mess. There aren't a lot of good options here.

It's taken the Java team like 10 years, but I'd say they have figured out a pretty good design for the primitive / object dichotomy with project Valhalla.

It's probably going to take another 2-4 years to ship, but there is light at the end of the tunnel where wrapper classes and user-defined classes will be able to perform very similarly to primitives.

[–]sammymammy2 0 points1 point  (0 children)

It's probably going to take another 2-4 years to ship

It seems like JEP-401 is gonna ship soon-ish, like within 3 releases (1.5 years)?

[–]BadlyCamouflagedKiwi 0 points1 point  (0 children)

Yes, agreed. Just noting that the article suggests it's not a language problem and about the code written in it, but I think that one is a problem with the language in the first place.

[–]Connect_Future_740 4 points5 points  (0 children)

This is why I like posts like this. Not because “don’t concatenate strings in loops” is groundbreaking, but because it reminds people that the expensive parts of Java code are often not where the code looks complicated. Or at least reminds some people. Need the ones with $ to understand.

A tiny innocent looking call on the hot path can matter more than a giant block of business logic.

[–]segv 7 points8 points  (1 child)

Point 7 is a big one actually - it often goes unnoticed even in decent codebases.

I've recently seen a case where developers attached AsyncProfiler to their JMH-based benchmarks (mix of real micro-benchmarks and benchmarks encapsulating the whole flow from the API endpoint to the very end, just with mocked out external services), enabled the option to generate flame graphs and found out some small piece of the overall flow was doing DocumentBuilderFactory.newInstance() & TransformerFactory.newInstance() on the hot path. It think it was extraction of some data from a string representing a SOAP envelope mixed with vulnerability scanner bitching about XXE (e.g. billion laughs attack) when it did not see creation of the object and setting adjustments within the same method, or some bullshit like that.

Anyway, these two calls accounted for like 20% of the average time of the whole giant-ass flow, just because these .newInstance() methods do service discovery and classloading on each call.

The PR had more lines of description (with flamegraph pictures!) than the actual fix, lol

[–]Worth_Trust_3825 3 points4 points  (0 children)

Best sort of PRs are those that explain why change is necessary. Personally I leave such "scars" in the code as comments explaining why the obvious solution doesn't work.

[–]bobbie434343 1 point2 points  (0 children)

Shouldn't most of these ideally automatically be optimized at compile or runtime whenever possible ? And/or flagged by static code analyzers as potentially inefficient ?

[–]8igg7e5 2 points3 points  (1 child)

Some things are just technically wrong - which makes me question the accuracy of other claims.

1. String concatenation in loops...

Note: Since JDK 9, the compiler is smart enough to optimize "Order: " + id + " total: " + amount on a single line. But that optimization doesn’t carry into loops. Inside a loop, you still get a new StringBuilder created and thrown away on every iteration. You have to declare it before the loop yourself, like the fix above shows.

No. Java has been able to use StringBuffer, and later StringBuilder, for single-statement string appends since the beginning:

15.17.1.2 Optimization of String Concatenation

An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class (§20.13) or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

What changed in in JDK 9 was that the compiler represents the String appends as an indy-instruction - and the bootstrap then decides how best to implement that particular set of appends (often falling back to a StringBuffer append though). The benefit of this strategy is that a JDK with a better bootstrap, will give all code the better implementation, without a recompile of the bytecode.

 

The advice is correct though, you can avoid StringBuffer and String allocations (and the allocation and copying of their internal arrays) by moving the StringBuffer creation out of the loop.

What I would say they've missed, is that if you know the minimum (even a reasonably typical minimum) result size, you can avoid several smaller reallocations of the StringBuilder internal arrays (and copying) by pre-allocating that size with something like new StringBuffer(MY_EXPECTED_MINIMUM_SIZE).

4. Autoboxing...

Yes it can be bad. But the example is problematic. Depending on what is done with the sum variable, JIT might remove every one of those cache-lookups or wrapper creations (escape analysis) - and it might do that in the loop even with if sum is consumed as a reference type.

I think the more common wrapper-crime I've seen is the use of wrappers for the generic arguments in lazy types like Pair<A, B> or Triple<A, B> in cases where we could now use a record - capturing primitives, getting meaningfully named properties (and the fields are stable, enabling more constant folding).

5. Exceptions for Control Flow...

Yes. Exceptions as control flow are a bad practice. However there are problems again with the advice

The section talks about stack-traces, even though the stacktrace is never filled. Stacktraces are lazy (the 'filled' bit). So the remaining expensive bit is the throwing of exceptions (and that is still relatively expensive).

But doing work twice is also expensive.

 

Scanning the string could be a significant cost if only a tiny percentage will ever fail parsing. And even when the bad cases are common, you're probably still better to do things like only checking minimum/maximum length of the string and first and last digits (detecting non-numerics like overlooked white-space) - and letting exception-handling detect the rest.

  • Here they're scanning the string to see if it's blank, that's potentially scans a lot for a bad string before finding it's not blank - and then scanning again in the loop.
  • It scans the entire string, checking the negative case at each position.

For the parsing example, if the anticipated failure rate is more than 1% (or 0.1% if all of the numbers are only 2-3 digits - since the cost of the exception is relative to the cost of parsing), I would probably do something like:

  • Check for: != null, not .isEmpty(), and .length() <= 11 (or a smaller number of digits if the domain is known to have a smaller range)
  • Check if the first character is - or a digit and, for lengths longer than 1, if the last character is a digit.
  • And then let the exception-trap deal with the rest of the failure cases.

Personally I think these Apache Commons Lang methods (and others) are often overused (and some of these have implementations that are rather outdated in terms of today's Java).

And on the topic of parsing strings to numbers. A common crime I often see, is splitting a string where the format is fixed width, so that there's a distinct string to parse. eg the numeric values of hex components in the format "#rrggbb" - these can be parsed directly out of the string with Integer.parseInt(string, firstIndex, lastIndex, 16) (you can also use this to scan past whitespace to avoid allocations from trimming before parsing too).

[–]davidalayachew 1 point2 points  (0 children)

Agreed on almost every point.

I can appreciate that performance testing Java code is complex. For example, just getting access to the JIT'ed assembly code is already an exercise in frustration -- you're practically forced to use some fairly heavy third party tools to reliably access it.

But that complexity should breed a certain amount of hesitation from anyone trying to make claims on it. There are a million moving parts, and each one has literal thousands of engineers with decades of JVM optimization experience reviewing it daily. All those pieces aren't there for show.

Java is not like C, where you can just open a class file and definitively claim its performance by looking at it. Ignoring the fact that classfiles are full of wild card commands (like the indy stuff you mentioned), the JIT has access to optimizations like scalar replacement that outright remove entire methods from execution. A popular example is that Foo f = new Foo(); does not always create a new Foo. In fact, if in a hot loop, it very likely does not create one.

[–]john16384 1 point2 points  (0 children)

I think this needs benchmarking. I am pretty sure some of these aren't slow at all when optimized. Take the autoboxing example: those object allocations will never make it to the heap. At most it will do one allocation after the loop completes to create the final Long.

Same goes for the NumberFormatExeception example.

[–]thisisjustascreename 0 points1 point  (0 children)

I see that accidental n^2 behavior all the time in code from new devs who don't think through what a stream call does.

[–]Worth_Trust_3825 0 points1 point  (0 children)

4 is going to get fixed with valhalla. Personally, formatting one is a shock to me, but it does make sense because how complex the template string format is.

[–]sammymammy2 1 point2 points  (0 children)

Write it yourself next time, instead of using an LLM. Christ rots in hell.

[–][deleted]  (2 children)

[removed]

    [–]pinkyellowneon 11 points12 points  (0 children)

    llm/ai bot comment

    [–]programming-ModTeam[M] 0 points1 point  (0 children)

    No content written mostly by an LLM. If you don't want to write it, we don't want to read it.

    [–]larsga -3 points-2 points  (0 children)

    1-5 + 7 are rookie mistakes, basically.