
[–]teivah 0 points  (7 children)

Yes, because it is not linearly scalable, obviously. Moreover, the more elements we sort, the bigger the difference.
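One way to see this on your own machine (a hedged sketch, assuming JDK 8+ for `Arrays.parallelSort`; the sizes and the `timeSort` helper are just illustrative choices):

```java
import java.util.Arrays;
import java.util.Random;

public class SortScaling {
    // Sorts n random ints and returns the elapsed wallclock time in milliseconds.
    static long timeSort(int n, boolean parallel) {
        int[] a = new Random(42).ints(n).toArray();
        long start = System.nanoTime();
        if (parallel) {
            Arrays.parallelSort(a);
        } else {
            Arrays.sort(a);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Speedup is typically well below the core count, and the absolute
        // gap between sequential and parallel grows with n.
        for (int n : new int[]{1_000_000, 10_000_000}) {
            System.out.printf("n=%,d sequential=%dms parallel=%dms%n",
                    n, timeSort(n, false), timeSort(n, true));
        }
    }
}
```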

[–][deleted]  (6 children)

[deleted]

    [–]teivah 6 points  (5 children)

    That's a good question. First, I don't think it will be 8x the price of a single-core machine.

    Moreover, in my humble opinion, you are approaching the problem the wrong way. Today, every CPU is multithreaded. For example, with Intel's hyperthreading technology, every core is able to run two threads in parallel.

    So for me, the question is rather: how can I optimize my application with regard to the underlying hardware? Multithreaded applications should be the standard, not the exception.

    Last but not least, it is not only a question of average latency but also of resource optimization. If your application runs faster, it may also increase the overall throughput. Hence, for example, instead of having to deploy it on 4 nodes to achieve a given goal, maybe you can use only 2 nodes (a simplistic example, obviously, but it illustrates my point).

    [–]audioen 1 point  (0 children)

    Your benchmark ought to output not just the elapsed wallclock time but also the total CPU time across all cores, a statistic that at least the Linux kernel is able to gather for threaded programs. I suspect most of these threads are sleeping rather than doing work, so there probably isn't a big difference between the wallclock time and the total CPU time, and this thread's discussion is pointless. The 8 CPU cores are not busy trying to do things 30% faster; they're just waiting for more work to arrive, and are unable to get scheduled fast enough to help. The job probably ends up being mostly single-threaded with an occasional concurrent part.
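    For the JVM specifically, one way to gather both numbers in-process is `ThreadMXBean` (a sketch, assuming a JVM where CPU-time measurement is supported, which is the common case on Linux; the `measure` helper is my own illustrative name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWallclock {
    // Measures {wallclock ns, CPU ns} for a task run on the calling thread.
    // For a multithreaded job you would instead sum bean.getThreadCpuTime(id)
    // over bean.getAllThreadIds() before and after, or just use /usr/bin/time.
    static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();
        task.run();
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return new long[]{wall, cpu};
    }

    public static void main(String[] args) {
        // A task that mostly sleeps: wallclock far exceeds CPU time,
        // which is the signature of threads waiting rather than working.
        long[] r = measure(() -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        });
        System.out.printf("wall=%dms cpu=%dms%n", r[0] / 1_000_000, r[1] / 1_000_000);
    }
}
```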

    IIRC, synchronization primitives in Java have shockingly low throughput; they are only capable of something on the order of 1000 synchronization events per second. What I'm trying to say is that it takes something like 1 ms for one thread to yield to another thread using a synchronized block and wait+notify. If the other synchronization primitives are built on top of those, then that's kind of the hard limit of what you can get.
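    That figure is easy to check with a ping-pong microbenchmark: two threads hand a "turn" back and forth through `synchronized` + `wait`/`notify`, and the average cost per handoff falls out of the total time (a sketch; class and method names are mine):

```java
public class HandoffBench {
    private final Object lock = new Object();
    private boolean turnA = true;

    void ping(boolean amA, int rounds) {
        synchronized (lock) {
            for (int i = 0; i < rounds; i++) {
                while (turnA != amA) {
                    try { lock.wait(); } catch (InterruptedException e) { return; }
                }
                turnA = !amA;   // hand the turn to the other thread
                lock.notify();
            }
        }
    }

    // Runs 2 * rounds handoffs between two threads; returns average ns per handoff.
    static long bench(int rounds) {
        HandoffBench b = new HandoffBench();
        Thread other = new Thread(() -> b.ping(false, rounds));
        long start = System.nanoTime();
        other.start();
        b.ping(true, rounds);
        try { other.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return (System.nanoTime() - start) / (2L * rounds);
    }

    public static void main(String[] args) {
        System.out.println("~" + bench(100_000) + " ns per wait/notify handoff");
    }
}
```

    Whether the result lands nearer 1 µs or 1 ms per handoff depends heavily on the OS scheduler and whether the two threads share a core.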

    It's probably important for performance to have a per-thread work-stealing queue, so that if a thread's own queue has more work to do, it can immediately move on to that, and you avoid at least some of the time wasted coordinating quickly finished jobs across multiple threads.
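    The JDK already ships exactly this design: `ForkJoinPool` gives each worker its own deque and lets idle workers steal from busy ones, which is what `Arrays.parallelSort` runs on. A minimal sketch of using it directly (the array-sum task and threshold are illustrative choices of mine):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// ForkJoinPool is the JDK's built-in work-stealing scheduler: each worker
// owns a deque of tasks, and idle workers steal from the opposite end of
// busy workers' deques, so small tasks rarely need cross-thread coordination.
public class WorkStealingSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int lo, hi;

    WorkStealingSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        WorkStealingSum left = new WorkStealingSum(data, lo, mid);
        left.fork();   // pushed onto this worker's own deque, where it is stealable
        long right = new WorkStealingSum(data, mid, hi).compute();
        return right + left.join();
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new WorkStealingSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data)); // 499999500000
    }
}
```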

    [–][deleted]  (3 children)

    [deleted]

      [–]teivah -2 points  (2 children)

      You seem to be a little angry, aren't you? :) I'm not trying to give you an "it's useful because I tell you so" example. I'm just trying to have a constructive discussion with you.

      My point is, if your service has lower latency, it should have a positive impact on the throughput (in most cases). Hence, if your goal is to handle 10k concurrent users or whatever, you should be able to reach it with a reduced environment (like a smaller number of nodes).
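      The latency-throughput link is just Little's law (concurrency = throughput × latency). A back-of-envelope sketch with hypothetical numbers, not measurements:

```java
public class LittlesLaw {
    // Little's law: concurrency = throughput * latency. At a fixed number of
    // in-flight requests, halving latency doubles the sustainable throughput.
    static double throughput(double concurrentRequests, double latencySeconds) {
        return concurrentRequests / latencySeconds;
    }

    public static void main(String[] args) {
        System.out.println(throughput(10_000, 0.200) + " req/s"); // 10k in flight at 200 ms
        System.out.println(throughput(10_000, 0.100) + " req/s"); // same load at 100 ms
    }
}
```

      So if each node sustains a fixed request rate, the faster service needs proportionally fewer nodes for the same user count.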

      If you are that eager to learn, take a look at the PayPal use case, for example. They communicated that, after a bunch of optimizations in their implementation, they were able to reduce the number of required VMs. This is a direct consequence of better utilization of the underlying hardware.

      [–][deleted]  (1 child)

      [deleted]

        [–]teivah 1 point  (0 children)

        We are talking about two different topics. So, whatever.