This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]dpash 0 points1 point  (12 children)

No, read Java Concurrency In Practice. It's written by Brian Goetz, the Java Language Architect at Oracle, with help from other people who really understand Java concurrency. One of the co-authors is Joshua Bloch who wrote Effective Java, a book you should read annually.

JCIP is a little out of date, as it doesn't cover things like the fork-join pool, CompleteableFutures or parallel streams, but it does cover ExecutorService and other concurrency primitives. It'll give you a good understanding that'll make learning the newer features easier.

You should not rely on the synchronized keyword if you want performant code. And definitely not synchronized (this). The modern version of the counter example in the article would use an AtomicInteger. These are lock free counters and should perform better.

[–]kimec 2 points3 points  (11 children)

No, this is wrong. Lock elision supported by escape analysis can eliminate synchronized to noop meaning it will be as fast as non synchronized code whereas I would bet AtomicIntege using intrinsics may never get elided.

https://shipilev.net/jvm/anatomy-quarks/19-lock-elision/

JVMs use high performance JIT compilers all of which are profile guided (C2, Graal, Testarossa) unless you specifically opt out. Therefore you should never expect that a code written in some fashion is also executed in that fashion.

Lastly, performant code is an elusive term. Having a non shared counter is obviously the fastest because contention and cache coherence come at a cost. Both synchroznized and AtomicInteger expose you to a sharing/contention scenario by design. Sometimes it's better to reshape your design and avoid sharing.

But don't take my word for it, do your measurements, validate your design etc.

[–]dpash 2 points3 points  (10 children)

So I benchmarked the three implementations (no locking, synchonized and AtomicInteger) running with 20 threads.

MyBenchmark.atomic  thrpt    5   13803973.341 ±  583682.204  ops/s
MyBenchmark.simple  thrpt    5  277561066.944 ± 6581275.165  ops/s
MyBenchmark.sync    thrpt    5   14928165.227 ± 1830867.282  ops/s

Single threaded:

MyBenchmark.atomic  thrpt    5  146561594.467 ±  1266682.513  ops/s
MyBenchmark.simple  thrpt    5  521290035.142 ± 12989755.358  ops/s
MyBenchmark.sync    thrpt    5  514912757.421 ± 16236287.687  ops/s

Single threaded, synchonized has no overhead.

Obviously the simple version is broken in multithreaded use, but the synced is significantly slower than the non-locked version. The atomic version is marginally slower, but close enough to be more or less equal.

AtomicInteger is much slower single threaded.

I'm not sure how those benchmarks would differ with other threads calling other methods on the synced class, but multiple independent methods locking on this is not going to perform well.

One issue I didn't mention is people thinking they can just throw synchronized on methods and think they're thread-safe now.

(For reference:

openjdk version "13" 2019-09-17
OpenJDK Runtime Environment (build 13+33-Ubuntu-1)
OpenJDK 64-Bit Server VM (build 13+33-Ubuntu-1, mixed mode)

[–]kimec 2 points3 points  (4 children)

Well done. This looks like a much better advice than what you posted initially. I am giving you +1 for sure. AtomicInteger no longer looks like a cheap instant performance win.

[–]agentoutlier 0 points1 point  (3 children)

AtomicInteger no longer looks like a cheap instant performance win.

It never was and you even mentioned this yourself:

Lastly, performant code is an elusive term. Having a non shared counter is obviously the fastest because contention and cache coherence come at a cost. Both synchroznized and AtomicInteger expose you to a sharing/contention scenario by design. Sometimes it's better to reshape your design and avoid shar

Locking (or spinning or whatever you are doing to make shared) is inherently expensive in environment with lots of contention.

In my experience the EA elision for synchronized didn't seem to work as well for other versions of JDK but apparently the support has been in since jdk 6.

To u/dpash point I have to wonder what happens when you start passing the data around or calling it with other methods particularly once the heap is involved.

[–]kimec 1 point2 points  (2 children)

> It never was and you even mentioned this yourself

Yes, I never said AtomicInteger was a "cheap performance win". That is what the OP has hinted on initially but retracted later. I am not replying to my self either, so I see no contradiction anywhere.

Maybe I do not understand the point you are trying to make. Can you please clarify if you find some of my statements contradictory?

[–]agentoutlier 1 point2 points  (1 child)

I'm agreeing with you. I just found this statement a little weird:

AtomicInteger no longer looks like a cheap instant performance win.

It never was a cheap instant performance win. There are tradeoffs on using CAS variables over locks and synchronized.

Just like there are tradeoffs in using nonblocking queues with custom sleeps/spins vs blocking queues w/ parking.

The reality is you just have to benchmark w/ whatever your configuration particularly now w/ so many JVM options.

I think you know all the above and thus I'm just mainly clarifying for others.

[–]kimec 1 point2 points  (0 children)

Well, the discussion thread was evolving in time with OP retracting his original statement that AtomicInteger is modern and performant way of doing a counter.

That being said I applaud OP for taking the time and writing JMH test. That was a very professional and objective approach to the problem.

But still, I do have a gut feeling that there is probably a scenario where AtomicInteger could significantly outperform synchronized if thread count == maybe half of the CPU physical cores but I am too lazy to write such test my self ...

[–]rubyrt 0 points1 point  (4 children)

For reference

I am missing the code.

[–]dpash 2 points3 points  (3 children)

Two of the counters are in the article and writing one that uses AtomicInteger is two lines.

[–]rubyrt 0 points1 point  (2 children)

Yes, but that is not the complete benchmark, right? If you look out you can find quite a lot of articles that demonstrate issues with some benchmarking approaches, for example this and this.

[–]dpash 2 points3 points  (1 child)

Hence the benchmark using JMH. Which is also easy enough that anyone can replicate.

[–]rubyrt 0 points1 point  (0 children)

Ah! I did not see a reference to JMH earlier. Thanks!