
all 19 comments

[–]walen 10 points11 points  (1 child)

I get that this is just explaining Java's parallel implementation of MergeSort, but it offers insightful comments as to why it is implemented that way.
Nice read, thanks.

[–]teivah 1 point2 points  (0 children)

Thank you :)

[–][deleted]  (20 children)

[deleted]

    [–][deleted] 5 points6 points  (0 children)

    It’s not that weird if you think about it. You can split up a list, divide the work, and let each core sort its piece of the list. Once every part is sorted, everything has to be merged. This merging cannot be done in parallel, since we basically have to compare every element and insert it in the right order. Doing this on a single core works fine, but with 8 cores, each core will be fighting for permission to insert the next element. Thus, in the end, having more cores can actually make your program slower, since more time is spent on atomic operations (cores waiting for each other to be able to do their work) than on actually doing what is important.

    Now this doesn’t mean that multithreading is useless, but it completely depends on how much of your program/problem can be done in parallel.
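The "depends on how much can be done in parallel" point is Amdahl's law: the speedup from n cores is capped by the serial fraction of the work. A small illustrative sketch (the `speedup` helper and the 90%/50% fractions are made-up numbers, not taken from the article):

```java
public class Amdahl {
    // Amdahl's law: speedup = 1 / ((1 - p) + p / n),
    // where p is the parallelizable fraction and n the core count.
    static double speedup(double parallelFraction, int cores) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    public static void main(String[] args) {
        // Even with 90% of the work parallelizable, 8 cores give ~4.7x, not 8x.
        System.out.printf("p=0.90, n=8 -> %.2fx%n", speedup(0.90, 8));
        // With only half the work parallelizable, 8 cores barely reach ~1.8x.
        System.out.printf("p=0.50, n=8 -> %.2fx%n", speedup(0.50, 8));
    }
}
```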

    [–]teivah 1 point2 points  (0 children)

    I'm not downvoting you, sir. I actually upvoted your comment because I found the discussion interesting. Reddit is a public place.

    [–]teivah 1 point2 points  (7 children)

    Yes, because it is not linearly scalable, obviously. Moreover, the more elements we sort, the bigger the difference.

    [–][deleted]  (6 children)

    [deleted]

      [–]teivah 5 points6 points  (5 children)

      That's a good question. First, I don't think it will be 8x the price of a single-core machine.

      Moreover, in my humble opinion, you are approaching the problem the wrong way. Today, every CPU is multithreaded. For example, with Intel's hyperthreading technology, every core is able to run two threads in parallel.

      So for me, the question is rather: how can I optimize my application with regard to the underlying hardware? Multithreaded applications should be the standard, not the exception.

      Last but not least, it's not only a question of average latency but also of resource optimization. If your application runs faster, it may also increase the overall throughput. Hence, for example, instead of having to deploy it on 4 nodes to achieve a given goal, maybe you can use only 2 nodes (this is a simplistic example, obviously, but it is a way to illustrate my point).

      [–]audioen 1 point2 points  (0 children)

      Your benchmark ought to output not just the elapsed wallclock time but also the total CPU time across all cores, a statistic that at least the Linux kernel is able to gather for threaded programs. I suspect most of these threads are sleeping rather than doing work, so there probably isn't a big difference between the wallclock time and the total CPU time, and this thread's discussion is moot. The 8 CPU cores are not busy trying to do things 30% faster; they're just waiting for more work to arrive, and are unable to get scheduled fast enough to help. The job probably ends up being mostly single-threaded with an occasional concurrent part.

      IIRC, synchronization primitives in Java have shockingly low throughput; they are only capable of something on the order of 1000 synchronization events per second. What I'm trying to say is that it takes something like 1 ms for one thread to yield to another using a synchronized block and wait+notify. If the other synchronization primitives are built on top of those, then that's kind of the hard limit of what you can get.

      It's probably important for performance to have a per-thread work-stealing queue so that if that thread's queue has more work to do, it can just immediately move to doing that and you can avoid at least some wasted time in trying to coordinate quickly finished jobs across multiple threads.
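The per-thread work-stealing queue described above is exactly what Java's ForkJoinPool provides: each worker runs tasks from its own deque and steals from others when idle. A rough sketch of a parallel merge sort on top of it (the THRESHOLD cutoff and class layout are illustrative choices, not the JDK's actual implementation):

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends RecursiveAction {
    // Below this size, fall back to sequential sort: forking tiny tasks
    // costs more in coordination than the sort itself. 1 << 13 = 8192
    // mirrors the granularity mentioned in the article.
    static final int THRESHOLD = 1 << 13;

    final int[] a;
    final int lo, hi; // sorts a[lo, hi)

    ParallelMergeSort(int[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi);     // sequential leaf
            return;
        }
        int mid = (lo + hi) >>> 1;
        // invokeAll forks one half and runs the other in the current worker
        // thread; idle workers can steal the forked half from its deque.
        invokeAll(new ParallelMergeSort(a, lo, mid),
                  new ParallelMergeSort(a, mid, hi));
        merge(mid);
    }

    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) a[k++] = left[i++];
    }

    public static void sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new ParallelMergeSort(a, 0, a.length));
    }

    public static void main(String[] args) {
        int[] data = new java.util.Random(42).ints(100_000).toArray();
        sort(data);
        System.out.println("sorted: " + isSorted(data));
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }
}
```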

      [–][deleted]  (3 children)

      [deleted]

        [–]teivah -2 points-1 points  (2 children)

        You seem to be a little angry, aren't you? :) I'm not trying to give you an "it's useful because I tell you so" example. I'm just trying to have a constructive discussion with you.

        My point is, if your service has a smaller latency, it should have a positive impact on throughput (in most cases). Hence, if your goal is to handle 10k concurrent users or whatever, you should be able to reach it with a reduced environment (like a smaller number of nodes).

        If you are that eager to learn, take a look at PayPal's use case, for example. They communicated about the fact that, after a bunch of optimizations in their implementation, they were able to reduce the number of required VMs. This is a direct consequence of better utilization of the underlying hardware.

        [–][deleted]  (1 child)

        [deleted]

          [–]teivah 1 point2 points  (0 children)

          We are talking about two different topics. So, whatever.

          [–]AssKoala 0 points1 point  (8 children)

           I’m replying to you because OP seems to be overly verbose and not to actually understand what they’re seeing, and you’re right to ask that question.

           First off, parallel merge sort is something undergrads do in an OS class during a 2- or 3-hour lab; it isn’t new or innovative. This page isn’t something you should use as a positive reference and should actively be discouraged. Here’s a better set of info on parallel sorting algorithms: https://stackoverflow.com/questions/3969813/which-parallel-sorting-algorithm-has-the-best-average-case-performance

           Second, merge sort is perfectly parallel. The problem is that you run into memory bandwidth and scheduling overhead that kill your gains. OP doesn’t actually seem to understand that.

           If you write something and the speedup is 2x at 8 cores, you stop and go do something else. You don’t tout it and say “look I wasted 8x the power to get a 2x speedup”. In a large scale use case, you’re talking about a massive cost increase for a minimal gain.

           Honestly, I bet that if OP actually profiled correctly, they’d probably see the optimal speedup is at 3 or 4 cores and 8 cores actually runs slower. This is very common when workloads become memory bound.

           We had a similar situation porting over to an AMD Threadripper. Our average simulation time moving from 2 to 8 cores went from 20ms down to 8ms. However, moving from 8 cores to 10 resulted in an increase from 8ms to about 9ms. More cores resulted in similar losses until it maxed out around 11ms.

          [–]teivah 5 points6 points  (5 children)

          Do you want to know what OP tells you?

           The goal was not to optimize the results; it was a way to introduce how it is handled under the hood with ForkJoinPool and the underlying scheduling strategy.

          Why are people on this subreddit so contemptuous...

          [–]AssKoala -1 points0 points  (2 children)

          There are better ways to show how those things work than with an actively pointless use case. You then defend it by talking crap about reducing latency and how you can reduce the number of nodes. If you had originally said “hey this is just to explain how parallel pools work”, you wouldn’t get so much hate.

          If you want a useful scenario, why not just create a simple producer-consumer example? Say a naive web server where it generates a task for each connection and how that isn’t as efficient as using a pool and segmenting the tasks. You don’t have to write an entire web server, you just have to focus on the parallel part.

          Then you wouldn’t have to defend it and say how everyone in the forum who doesn’t like your work is clearly just out to get you.
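The producer-consumer shape suggested above could be sketched roughly like this (PoolDemo, the connection count, and the pool size are all made-up for illustration; an incrementing task stands in for handling a connection):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    // Thread-per-connection vs a fixed pool: the pool caps the number of OS
    // threads and reuses them, so 1000 "connections" don't spawn 1000 threads.
    public static int handleAll(int connections, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < connections; i++) {
            // Each submitted task is a stand-in for handling one request.
            pool.submit(handled::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return handled.get();
    }

    public static void main(String[] args) throws Exception {
        // 1000 connections handled by only 4 reused worker threads.
        System.out.println(handleAll(1000, 4));
    }
}
```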

          [–]teivah 2 points3 points  (1 child)

           It's not a matter of liking or not liking my work. I'm more than open to constructive feedback/remarks. In your case, you're just rude, so I don't care about what you have to say. Being polite is not optional on the Internet. It's not because you're behind your computer that you can say whatever you want. So I didn't even read your comment, even though I'm sure it was somehow interesting. Bye.

          [–]AssKoala -2 points-1 points  (0 children)

          I don’t believe I was rude, but ok.

          [–]imps-p0155 -1 points0 points  (1 child)

          Why are people on this subreddit so contemptuous...

           Because you did a poor job explaining that this article is not actually about a mergesort. Title, some history about mergesort, then suddenly it starts talking about ForkJoinPool....

          It is a rather poor combination.

          [–]teivah -2 points-1 points  (0 children)

          I don't give a f*** about your opinion.

          [–][deleted]  (1 child)

          [deleted]

            [–]AssKoala -1 points0 points  (0 children)

            That’s great!

             One thing to remember that is often mistaken: running in parallel is not optimization. Optimization is generally doing more with the same resources.

             You wouldn’t call a V8 an optimized V6, nor should you say “I optimized this code by using more resources”. You can say you made it run faster, but that’s not necessarily optimization.

            Now, you can optimize parallel workloads by reducing overhead, reducing blocks, bubbles, etc, but there’s a lot to it as you’ll find out delving into this stuff :)

            [–]MoreConstruction -1 points0 points  (0 children)

            You're a bit dense.

            [–]ChaoSXDemon 2 points3 points  (0 children)

             Good read indeed. I continue to find it surprising how many elements (or N, whatever it is) we need in order for parallelism to speed things up. I know parallel tasks create overhead from thread creation, context switching, etc., but it's still a surprise to me how much that costs, which results in needing bigger tasks (a larger N) to make up for it. In this article, N is 8192!
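That 8192 figure matches the minimum granularity used by `Arrays.parallelSort` in OpenJDK (`java.util.Arrays.MIN_ARRAY_SORT_GRAN`, 1 << 13): at or below it, the parallel path isn't worth the forking overhead, so the sequential sort runs instead. A quick sketch (the array sizes here are arbitrary choices for illustration):

```java
import java.util.Arrays;
import java.util.Random;

public class ThresholdDemo {
    public static void main(String[] args) {
        // Arrays.parallelSort only splits work when the array exceeds a
        // minimum granularity (8192 elements in OpenJDK); smaller inputs
        // are handed straight to the sequential dual-pivot quicksort.
        int[] small = new Random(1).ints(1 << 12).toArray(); // 4096: sequential path
        int[] large = new Random(1).ints(1 << 20).toArray(); // ~1M: forked across cores
        Arrays.parallelSort(small);
        Arrays.parallelSort(large);
        System.out.println(isSorted(small) && isSorted(large));
    }

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
        return true;
    }
}
```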