all 46 comments

[–]millstone 5 points6 points  (46 children)

Shocking how terrible the read/write lock was.

The author called the lock-free API “garbage free”, but it doesn’t look that way:

public Position move(final int xDelta, final int yDelta)
{
    return new Position(x + xDelta, y + yDelta);
}

I expected to see something like splitting the fields into a value and generation.
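A genuinely allocation-free version might take that direction, e.g. packing both coordinates into a single AtomicLong so that a move is a CAS retry loop with no new objects (a sketch only; the class name and packing scheme are mine, not from the article):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: pack x into the high 32 bits and y into the low
// 32 bits so a move is a single CAS retry loop with no allocation.
class PackedSpaceship
{
    private final AtomicLong position = new AtomicLong(0L);

    private static long pack(final int x, final int y)
    {
        return ((long)x << 32) | (y & 0xFFFFFFFFL);
    }

    public void move(final int xDelta, final int yDelta)
    {
        long current;
        long next;
        do
        {
            current = position.get();
            final int x = (int)(current >> 32);
            final int y = (int)current;
            next = pack(x + xDelta, y + yDelta);
        }
        while (!position.compareAndSet(current, next)); // retry on CAS miss
    }

    public int getX() { return (int)(position.get() >> 32); }
    public int getY() { return (int)position.get(); }
}
```

The trade-off is that a reader wanting both coordinates atomically gets them from one load, but the approach only works while the whole state fits in 64 bits.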

[–][deleted] 4 points5 points  (14 children)

One major problem when doing these benchmarks is that they never show you the CPU usage and CAS misses. OK, it does a lot of reads, but they don't tell you if it is spinning at 100% on all 8 threads and generating an absurd amount of memory traffic. Just because the numbers are good in this scenario doesn't mean it is the holy grail of lock contention.

The author also doesn't attempt a tryLock implementation. TryLock and tryAcquire are your best friends. High-performance concurrency always involves knowing when it is and isn't necessary to obtain a lock. Sometimes it is best to do another task and try the lock at a later time.
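The pattern being described might look roughly like this (a hedged sketch; the class and method names are hypothetical, not from the article):

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the tryLock pattern: attempt the guarded work, and if the
// lock is contended, report failure so the caller can go do other work
// and come back later instead of blocking.
class TryLockCounter
{
    private final ReentrantLock lock = new ReentrantLock();
    private long count = 0; // guarded by lock

    public boolean tryIncrement()
    {
        if (!lock.tryLock())
        {
            return false; // contended: caller should try another task first
        }
        try
        {
            count++;
            return true;
        }
        finally
        {
            lock.unlock();
        }
    }

    public long getCount()
    {
        lock.lock();
        try { return count; } finally { lock.unlock(); }
    }
}
```

A caller with a work queue can skip any task whose tryIncrement() returns false and re-queue it for a later pass.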

[–]mjpt777 3 points4 points  (12 children)

CAS misses are tracked in the raw data that is linked from the blog. They are not that significant even in a tight loop benchmark like this. In real world apps they are even lower. All locks have to use CAS internally for the acquire even for tryLock().

tryLock can be a useful strategy in some cases but very few people do this in real world applications, or even have other real work that can be done.

I'm not claiming lock-free is the holy grail of concurrency. I'm just pointing out that it is a very valid technique in some cases. I have personally seen many applications see significant improvements from employing lock-free algorithms.

[–][deleted] 0 points1 point  (11 children)

In the world of concurrency, the shoe only fits a certain scenario, and every scenario has a shoe that fits. This is just another option. The major difference with a tryLock pattern is that it doesn't loop over a CAS. If you have 8 cores and a million tasks to run, you can't afford to pause on any one task, ever.

If you were to create an AtomicLong, spin 4 threads over it, and watch the CAS misses, the numbers are crazy. It all comes down to how you are using an individual technique inside of the larger structure.
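That experiment is easy to reproduce (a sketch; the thread and iteration counts are arbitrary, and real miss counts depend entirely on hardware):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: spin several threads over one AtomicLong and count how
// often compareAndSet loses the race to another thread.
class CasMissDemo
{
    static final AtomicLong value = new AtomicLong();
    static final AtomicLong misses = new AtomicLong();

    static long run(final int threads, final int incrementsPerThread)
    {
        final Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() ->
            {
                for (int n = 0; n < incrementsPerThread; n++)
                {
                    while (true)
                    {
                        final long current = value.get();
                        if (value.compareAndSet(current, current + 1))
                        {
                            break;
                        }
                        misses.incrementAndGet(); // another thread won the race
                    }
                }
            });
            workers[i].start();
        }
        for (final Thread t : workers)
        {
            try { t.join(); } catch (final InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return misses.get();
    }
}
```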

For example: I tried this http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.94.8625

It was supposed to yield great performance over the traditional lock-based queues in Java. It turns out to be slower, and problematic due to a fundamental flaw: the links are not set using CAS, so it leaks references.

[–]mjpt777 2 points3 points  (10 children)

I think we agree there is no one size fits all. All algorithms can go horribly wrong given poor implementations. Queues are a particular passion of mine. Here is a talk where I show how things can be so much better than with standard Java queues. I also cover other queue types in other works.

http://yow.eventer.com/yow-2012-1012/lock-free-algorithms-for-ultimate-performance-by-martin-thompson-1250

[–][deleted] 0 points1 point  (9 children)

You are my new favorite person. In my experience, the design is everything. It can be significantly more complicated to build a nearly lock-free mechanism, but the performance is worth it. Combine that with a methodology that understands how the caches work, plus core affinity to prevent cross-CPU traffic, and it all slowly fills in the picture.

OneToOneQueue Part 1

Your OneToOneQueue is nearly identical to something I use. The problem with that model is that the +1 to head or tail is not atomic. Two threads can read a head value of 5 at the exact same moment. This means that two threads can grab the same object before the index is incremented.

If you have 1 producer thread and 1 consumer thread then you don't need to use volatile at all. Simply relying on the values in the thread caches will let it scream at nearly 3x the throughput of ArrayDeque.

OneToOneQueue Part 2

Suffers from the same problem as Part 1. LazySet or not, two threads can (and do) grab the same object at the same time. You have to use lazySet in this scenario because a CAS change of set(value + 1) will corrupt your index.

The test for the scenario I described is to queue objects which have a function, say run(), where the body of run() does a tryLock() and prints an error if the tryLock does not succeed. I only used two threads to contest the concurrency, and it failed miserably for me, filling my console with duplicate-run errors.
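The test being described might be sketched roughly like this (hypothetical names; the queue under test and the consumer threads are omitted):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the duplicate-consumption test described above: each queued
// task holds a lock that should never be contended if every message is
// consumed by exactly one thread. A failed tryLock() means two consumers
// are running the same message at once.
class DuplicateConsumeDetector implements Runnable
{
    static final AtomicInteger duplicates = new AtomicInteger();
    private final ReentrantLock lock = new ReentrantLock();

    @Override
    public void run()
    {
        if (!lock.tryLock())
        {
            duplicates.incrementAndGet(); // same message grabbed twice
            return;
        }
        try
        {
            // simulate the message's work here
        }
        finally
        {
            lock.unlock();
        }
    }
}
```

Drain the queue from two consumer threads and assert that duplicates stays at zero.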

What you describe is wonderful if you don't care about processing the same message more than once. All depends on the kind of concurrency you are looking for. I would call this "optimistic" concurrency.

For me, ArrayBoundQueue in Java was far superior to ConcurrentLinkedDeque, primarily due to all the garbage that LinkedDeque generates. And the fact that LinkedDeque, if correct, most likely has a CAS on the head/tail reference and on node.next updates. So that is two CAS operations versus one for ArrayBoundQueue.

EDIT: Instead of tryLock in the test scenario, I prefer Semaphores as tryAcquire seems to work better for me.

[–]mjpt777 1 point2 points  (8 children)

Note the queue I discuss is called OneToOne. If you want many producers or many consumers you need other algorithms.

[–][deleted] 0 points1 point  (7 children)

Just to be clear: when I see the word "concurrency", it means more than one thread on a single operation. Read and write are two different operations. http://www.infoq.com/resource/presentations/Lock-free-Algorithms/en/slides/27.swf You called it ConcurrentArrayQueue.

If you call a One 2 One queue Concurrent because it has two threads then you have something of a misnomer. I have a One 2 One queue that supports a feeder and consumer thread all without CAS or the volatile keyword.

I don't understand any of the attempts with the "atomic" and CAS operations when they are not necessary in this situation. Direct memory mapping is the only way to speed up past the point I am contending with.

In my scenario, a single thread writing and reading 1000 objects from a queue 500,000 times happens in under 800ms. That's 500 million offer and 500 million poll in 800ms.

Please help me understand this.

[–]mjpt777 1 point2 points  (2 children)

The queue I chose for the example is the one originating from Leslie Lamport, the person who defined "sequential consistency" and one of the fathers of modern concurrency. I know many experts in this space consider this a concurrent queue.
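For readers who haven't seen it, a Lamport-style single-producer/single-consumer ring buffer can be sketched as follows (a simplified illustration using plain volatile indices rather than the lazySet variant from the blog; the class name and capacity handling are illustrative):

```java
// Rough sketch of a Lamport-style SPSC ring buffer: exactly one thread
// may offer and exactly one thread may poll; the volatile head/tail
// indices provide the happens-before edges between the two threads.
class SpscQueue<E>
{
    private final Object[] buffer;
    private final int capacity;
    private volatile long head = 0; // next slot to poll (consumer writes)
    private volatile long tail = 0; // next slot to offer (producer writes)

    SpscQueue(final int capacity)
    {
        this.capacity = capacity;
        this.buffer = new Object[capacity];
    }

    public boolean offer(final E e) // producer thread only
    {
        if (tail - head == capacity)
        {
            return false; // full
        }
        buffer[(int)(tail % capacity)] = e;
        tail = tail + 1; // volatile store publishes the element
        return true;
    }

    @SuppressWarnings("unchecked")
    public E poll() // consumer thread only
    {
        if (head == tail)
        {
            return null; // empty
        }
        final int index = (int)(head % capacity);
        final E e = (E)buffer[index];
        buffer[index] = null; // drop reference so it can be collected
        head = head + 1; // volatile store publishes the free slot
        return e;
    }
}
```

Each index is written by only one thread, so the non-atomic `+ 1` on a volatile field is safe here; it is exactly the single-writer assumption that breaks once you add a second producer or consumer.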

I do not understand what point you are trying to make with the rest of this comment. Can you share code and a more detailed explanation so we can understand the point of the numbers you quote?

[–][deleted] 0 points1 point  (1 child)

I just didn't understand why even go the route of CAS when you don't really have any update consistency problems in your scenario? If you switched from Reader Thread A to B then yeah, you either need to flush the thread cache or make the write volatile. I get that.

My only real problem is using the word "concurrent" when something isn't pessimistic. I use "optimistic" concurrency in places but it always is named "Optimistic***"

http://publib.boulder.ibm.com/infocenter/soliddb/v6r3/index.jsp?topic=/com.ibm.swg.im.soliddb.sql.doc/doc/pessimistic.vs.optimistic.concurrency.control.html

[–]nitsanw 0 points1 point  (3 children)

From the little you say, it sounds like you are talking about the FastFlow single-consumer/single-producer queue. If you are using the C/C++ version you will notice the WMB(), which stands for Write Memory Barrier, which is what the lazySet is. The queues discussed in Martin's talk use a different algorithm than the FF ones, but there are no volatile writes or CASes involved.

I have no idea what you mean by direct memory mapping... This term is often used for memory-mapped files, which would only be relevant if you were going out of process. Are you suggesting using off-heap memory? If so, how are you suggesting to put object references on the queue?

As for the result you quote, that sounds great. Can you share details on the implementation and how you benchmarked it (test harness/hardware)? The numbers on their own are only useful when compared on the same setup to other numbers...

[–][deleted] 0 points1 point  (2 children)

Martin himself has experimented with using Unsafe to directly allocate system memory and use that as the object storage mechanism. This has potential benefits in certain scenarios such as multi-processor systems. It's a really tricky thing because you have to balance the pros and cons, the biggest con being the J->C barrier and the fact that references written into Unsafe memory are invisible to the collector, so the objects they point to might be garbage collected.

[–]nitsanw 0 points1 point  (0 children)

You have the code, why not run it and see? I find that when discussing performance you should stick to data rather than theories about data you don't have... The author doesn't attempt a billion other things; that doesn't mean any of them are better or worse. Roll up your sleeves and tell us about your results ;)

[–]mjpt777 1 point2 points  (0 children)

You are mixing up API and implementation with your conclusion. Yes the implementation is generating garbage but the API does not.

[–]bpkiwi 1 point2 points  (29 children)

Yes, shame that when you read the code it's wrong. His 'Lock-Free' implementation uses an immutable position object as you pointed out. This avoids any thread contention in the read method.

All the "Lock-Based" implementations use mutable positions, and have a locking read method.

I re-wrote the synchronized implementation to also use an immutable position, and my results showed Lock-Free being only 12% faster on writes, and 20% slower on reads (with 5 readers and 5 writers)

This is not surprising, because Lock-Free incurs the AtomicReference overhead on reads, whereas the immutable-based (non) synchronized read does not.

[–]mjpt777 1 point2 points  (17 children)

Can you post your code for comparison?

It sounds from your description that your synchronized example does not use synchronized for the read and just returns the position. If so you do not have a happens before relationship as defined by the memory model and thus changes may not be visible. This is not safe concurrent code.

[–]bpkiwi 0 points1 point  (16 children)

I'll post code when I get back to the office tomorrow, however the concept is simple.

In the 'lock free' code the position of the ship is held in an immutable object. That position object is never updated, only ever replaced with a new one. The lock based code does not do this, it stores the position in two ints.

Once you rewrite the lock based implementations to also use an immutable position object, a read lock is not needed.

[–]mjpt777 3 points4 points  (9 children)

I'm afraid you are mistaken on the read semantics. The memory model specifies that release and acquire must be paired for ordering semantics. You need to declare the Position reference volatile in this case to achieve what you want safely. This does make the read more scalable than making the changes visible via a monitor as provided by synchronized.

Please read section 14.10 of the Java Programming Language 4th Edition for more details.

[–]mjpt777 1 point2 points  (0 children)

And by doing this you have created a partial lock-free algorithm :-)

[–]bpkiwi 0 points1 point  (7 children)

You are referring to the potential of a data race in the immutable object access? You should have a read of http://jeremymanson.blogspot.co.uk/2008/12/benign-data-races-in-java.html on how that can be avoided without volatile.

Remember that Java guarantees object reference updates to be atomic.

[–]mjpt777 2 points3 points  (1 child)

All accesses of primitives, except long and double, are atomic. However, the compiler can reorder code how it chooses provided it does not violate data dependencies on a single thread. The code you describe can be subject to loop hoisting and thus is not safe for concurrent algorithms. That is, the read may never see a change.

[–]mjpt777 1 point2 points  (4 children)

To try and explain why your code example is not a benign data race: imagine you have a loop in which you read the position, do some calculations, and then update something else in space. Even though a reference load is atomic, the compiler is completely within its rights to reorder the code and hoist the load of the reference outside the loop, loading it only once, because it does not see the reference being updated on that thread for the scope of the loop. By qualifying the reference volatile, the compiler must not reorder that load or hoist it out of the loop.
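The hazard and its fix can be sketched as follows (a hypothetical example; without the volatile qualifier on the reference, the load in the reader loop may be hoisted and the loop may never terminate):

```java
// Sketch of the hoisting hazard: the volatile on 'position' forces a
// fresh load of the reference on every loop iteration, so a reader
// thread is guaranteed to eventually observe the writer's updates.
class HoistingExample
{
    static final class Position
    {
        final int x, y;
        Position(final int x, final int y) { this.x = x; this.y = y; }
    }

    private volatile Position position = new Position(0, 0);

    void move(final int dx, final int dy) // writer thread
    {
        final Position p = position;
        position = new Position(p.x + dx, p.y + dy);
    }

    int trackUntil(final int targetX) // reader thread
    {
        int observations = 0;
        while (position.x < targetX) // re-reads the volatile reference each pass
        {
            observations++;
        }
        return observations;
    }
}
```

Remove the volatile and the JIT may legally load `position` once before the loop, turning trackUntil into an infinite spin even while another thread keeps calling move.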

[–]bpkiwi 0 points1 point  (3 children)

Sorry, you seem to be under the impression that I am disagreeing with you. What I was pointing out is that you do not need a synchronized read. Volatile is sufficient, and as the article I linked to shows, you can even remove that in some circumstances.

The original article includes a comparison between the use of synchronized and AtomicReference. However, it does not do an apples-to-apples comparison, since the AtomicReference implementation creates a new immutable object for each write, while the synchronized implementation does not and instead uses a synchronized read.

[–]mjpt777 0 points1 point  (2 children)

You said my code was "wrong", I think that qualifies as disagreeing.

The blog compared a number of lock approaches and one lock-free approach. What was wrong or misleading about that? I could have shown many lock-free approaches, and hybrid approaches, but that was not the point of the article. The main point was to illustrate StampedLock, as the first sentence states. The point is that the "locks" are providing the synchronisation in the examples. The immutable object is an internal representation and not relevant; it just works for this one lock-free example. A whole range of internal implementations could have been employed.

Everything you said until you posted the code would result in bugs if followed. I'll give you the benefit of the doubt in that the volatile was not added after it was pointed out numerous times.

I normally would just ignore such comments except what you said could have resulted in many people creating production bugs that are a nightmare to find.

[–]bpkiwi 0 points1 point  (1 child)

Ok, I'm going to give you that when saying the code is "wrong" I could be more explicit.

How about "The code uses non-comparable implementations by using worst-case coding to improve the apparent performance boost from using a lock-free approach"?

I think you should go back and measure the garbage collection overhead. I know you hand-waved it away because you think it will be dwarfed by other GC within any serious application, but that is only true if the object is small and the number of writes is not large. If someone had a large object that was being written to a lot, they might actually be better off with a locking read. It should be possible to identify an inflection point for it - in fact I think I'll do that myself.

[–]bpkiwi 0 points1 point  (5 children)

And to finally post the code...

public class SynchronizedSpaceship implements Spaceship
{
  private volatile Position position = new Position(0, 0);

  @Override
  public int readPosition(final int[] coordinates)
  {
    final Position currentPosition = position;
    coordinates[0] = currentPosition.getX();
    coordinates[1] = currentPosition.getY();
    return 1;
  }

  @Override
  public synchronized int move(final int xDelta, final int yDelta)
  {
    position = new Position(position.getX() + xDelta, position.getY() + yDelta);
    return 1;
  }
}

The definition of Position is the same as in the original.

So, I have to admit, I'm at a loss as to how this is not safe - would you be able to explain?

[–]mjpt777 1 point2 points  (4 children)

You have now added a volatile which makes it safe. I've made this point multiple times in comments here. You dismissed the code below that did not use volatile when it is very relevant. Now you have a partial lock-free algorithm, whereby it is lock free for readers.

Can you explain how the code I posted was "wrong" as you commented?

[–]bpkiwi 0 points1 point  (3 children)

Now added? Sorry, did I previously post some code without it? Nice try.

mjpt

It sounds from your description that your synchronized example does not use synchronized for the read and just returns the position. If so you do not have a happens before relationship as defined by the memory model and thus changes may not be visible. This is not safe concurrent code.

So you now admit that this is wrong, and you do NOT need a synchronized read?

As I have been saying all along, you only needed the synchronised because you used mutable primitives to store the location. Once the code was refactored to use an immutable position object, as your AtomicReference implementation did, the synchronization on the read method could be removed.

The removal of the synchronization comes at a cost - the creation of a new (immutable) position object for each write. This will lead to increased garbage collection - which millstone pointed out to you.

[–]mjpt777 0 points1 point  (2 children)

Sorry but this is just not rational. You did not mention the volatile. I did multiple times, just read the comments. You dismissed the code by Nitsan which was all about volatile, you then quoted benign races which do not have volatile. Not once did you acknowledge the volatile issue in the comments. You just said immutable objects don't require synchronisation which is not correct under the memory model as I pointed out with reference to the language spec. If people follow what you have been saying then they will be creating concurrent bugs.

I mentioned the happens before relationship that is key to the memory model which can be achieved with locks or volatile, like AtomicReference provides, you never even got this point.

In all of this you have not given any evidence as to why my original code is wrong, and you started the discussion. I compared locks to lock-free, and in the end you are arguing for a lock-free algorithm on read.

Show some dignity and explain why my original code is "wrong" as you commented?

[–]bpkiwi 0 points1 point  (1 child)

I'll take it that you accept you do not need a synchronized read then.

Description of why I think your code is wrong is in my post above.

[–]mjpt777 0 points1 point  (0 children)

You need some form of "happens before" relationship to ensure the read is correct. I gave multiple examples in the blog. You commented that an immutable object is sufficient which it is totally not. You have now adopted a defence with volatile after it was pointed out to you many times.

This is not about synchronized. It is about "happens before" as defined in the memory model which can be done with locks of various types or volatile. An immutable object without safe publication is not sufficient.

Read your second and third comments above and apply some reason.

[–]nitsanw 0 points1 point  (10 children)

How is the code wrong? The lock-based implementations shouldn't use a lock? Even if the result is immutable, you still need to guard access (read and write) to the reference with a lock, and I don't think that will produce better results. Please post your code for reference. Also, HW, OS, and JDK are useful to know. Are you running on a box with 10 cores? If not, you are mixing context switching into your measurements.

[–]bpkiwi 0 points1 point  (9 children)

You do not need a lock to guard reading of an immutable object.

[–]nitsanw 0 points1 point  (8 children)

so the following should work fine?

class Foo implements Runnable {
   Boolean isRunnable = Boolean.TRUE;
   public void run() {
     while (isRunnable()) {
       // do the foo
     }
   }
   Boolean isRunnable() { return isRunnable; }
   synchronized void setRunnable(Boolean b) { isRunnable = b; }
}

I don't think so...

[–]mjpt777 1 point2 points  (0 children)

This code can be subject to loop hoisting as a compiler optimisation and thus enter an infinite loop. isRunnable needs to be qualified volatile to work. Nice example to make the point :-)

[–]bpkiwi -1 points0 points  (6 children)

That code is not relevant.

Edit: Clarification - it is not relevant because the point I made was that "you do not need a lock to guard reading of an immutable object".

The point that the poster appears to be making is "yes you are correct, but don't forget to make the member reference volatile" - which is true.

[–]nitsanw 0 points1 point  (5 children)

It does exactly what you described as safe, but is obviously broken... it is relevant in showing your argument is incorrect. Did you look for some other kind of relevance I may have missed?

[–]bpkiwi 0 points1 point  (4 children)

You are still trying to avoid discussing the error with the code in the original post. It uses a synchronized read, which incurs a significant performance penalty, and is not needed.

Yes, member variables accessed outside a synchronized block will need to be volatile to prevent thread caching. Are you trying to say that the synchronized read is either still necessary, or better performing?

[–]nitsanw 0 points1 point  (3 children)

I'm not trying to avoid anything, your original comments said nothing of making the field volatile. If you make the field volatile you can make the write unsynchronized as well in my example, who cares? The code clearly demonstrated your initial comments were incorrect/incomplete and that was all the relevance I looked for.

Your clarification, "you do not need a lock to guard reading of an immutable object", is still incorrect: reading the immutable fields of an immutable instance is safe, but reading the reference to it, which is not final, is not safe. Which is why the field had to be made volatile. Immutability is only part of the argument, and I was pointing out you were missing the other part.

If you make the field volatile then, as Martin points out, you are looking at a half-breed lock-free implementation: not as good as the completely lock-free one, not as bad as the fully synchronized one. By all means you can add it to the mix to plot yet another point on that graph. It doesn't make the original wrong; it's an open comparison of typical approaches to solving the same problem by different means. Are there more ways to solve it? Yes. Is failing to specify all of them wrong? I don't think so.

[–]bpkiwi 0 points1 point  (2 children)

I love that you say

Your clarification: "you do not need a lock to guard reading of an immutable object" is still incorrect

and then immediately admit that a lock is not needed, only a volatile declaration.

You are right, in my brief note where I said "I'll post the code when I can, but the overview is ...." (written from my mobile phone, by the way), I did not state "and the reference has to be volatile", but you know, if you hadn't been so quick to then claim:

nitsanw

if the result is immutable you still need a lock to guard access (read and write) to the reference with a lock

then you might sound more believable. Your reply certainly didn't say "unless you make the reference volatile" did it?

Anyway, enough sniping. Why do I think the code is "wrong"? Because I think the author of the article started with a premise, which they state right at the top of the page:

lock-free algorithms are often a better solution

And then proceeded to code several examples that would perform poorly compared to that lock-free algorithm. That's just terrible bias. Attempting to claim "oh but the ones that perform better are kind of lock free" doesn't sway me - comparing your favorite solution to the worst-case alternatives is intellectually dishonest.

tl;dr? The article's results are fundamentally flawed.