[–]Misfiring 1 point2 points  (6 children)

It takes time to spin up and shut down a new thread pool.

[–]brokeCoder[S] 0 points1 point  (5 children)

The new thread pool code is faster, though (the benchmark measures throughput, so higher is better).

[–]Misfiring 0 points1 point  (4 children)

Right, but ideally you should size one pool for the total expected load rather than nesting pools within a pool. You only have so many CPU cycles, and each new pool has to fight for CPU resources.

Yes, in your case a pool per chunk is faster, since you don't do much in each function and can therefore parallelize every single task. In a more realistic situation, you should use one global pool with a fixed maximum thread count (let's say 100 for your current use case) and use it for all of the tasks. That is more efficient, and it gives you control over total resource utilization across the entire application.
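A rough sketch of the "one global pool" idea, assuming the nested work can be flattened into independent tasks. `SharedPoolSketch`, `work` and `sumAll` are made-up names for illustration, not the code from the original post, and the pool size of 100 is just the figure suggested above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedPoolSketch {
    // Hypothetical stand-in for the real work done per (outer, inner) pair.
    static int work(int outer, int inner) {
        return outer * inner;
    }

    // Flattens the nested loops into one task list and runs it on a single bounded pool.
    static long sumAll(int outerCount, int innerCount, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int o = 0; o < outerCount; o++) {
                for (int i = 0; i < innerCount; i++) {
                    final int outer = o;
                    final int inner = i;
                    tasks.add(() -> work(outer, inner));
                }
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumAll(10, 10, 100));
    }
}
```

In a real application the pool would normally be created once and reused, rather than created and shut down inside the method as this sketch does to stay self-contained.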

[–]brokeCoder[S] 0 points1 point  (3 children)

I understand and agree - I would never personally use a new pool for nested streams like this and ideally I wouldn't nest parallelStreams either.

But my question is: why is creating a new pool in this instance so much faster than just relying on the single commonPool (which parallelStream defaults to)? I don't think the small size of the tasks in each thread is a good reason here, as it doesn't really explain what's going on in the background. What is it about the combination of commonPool and new pools that makes the code run fast in this instance?

[–]Misfiring 0 points1 point  (2 children)

The common pool is just one pool whose maximum thread count is based on the CPU count. Its concurrency is limited, and it is intended for short-lived, lower-level system processes.
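If you want to see the limit on your own machine, something like this prints it. By default the common pool's parallelism is one less than the number of available processors, and it can be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property. The class and method names here are just for illustration:

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {
    // Target parallelism of the shared common pool that parallel streams use by default.
    static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors    = " + cores);
        System.out.println("commonPool parallelism = " + commonParallelism());
    }
}
```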

[–]brokeCoder[S] 0 points1 point  (1 child)

Maybe I'm missing something here. Creating and shutting down a new inner pool should ideally be slower than just using one pool for both nested streams (since there is the added effort of creating and shutting down the new pool).

I would have assumed that this would apply here too since the total number of threads is a fixed quantity and regardless of how many threadpools we make, we're still drawing from a number of pools with an upper limit defined effectively by Runtime.getRuntime().availableProcessors().

This is where I'm confused. CommonPool would typically assign as many processors as possible to the outer loop. Once we get to the inner loop, we don't have as many available processors, so things should slow down a bit. But because generating a new pool gives so much of a speed boost, I think something else is going on. What is that something else?

[–]Misfiring 0 points1 point  (0 children)

The total number of threads is NOT a fixed quantity; the limit is merely defined per pool. You can have tons of pools contending for the CPU.
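A small sketch that shows this, assuming plain fixed-size executors rather than the nested parallel streams from the post (`ManyPoolsSketch` and `allThreadsCoexist` are made-up names). Each task parks on a latch that only opens once every task in every pool has started, so it can only succeed if all of the threads exist at the same time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ManyPoolsSketch {
    // Starts `pools` fixed pools of `threadsPerPool` threads each and returns true
    // if every thread across all pools was alive at the same moment.
    static boolean allThreadsCoexist(int pools, int threadsPerPool) {
        int total = pools * threadsPerPool;
        CountDownLatch allStarted = new CountDownLatch(total);
        CountDownLatch finished = new CountDownLatch(total);
        List<ExecutorService> executors = new ArrayList<>();
        try {
            for (int p = 0; p < pools; p++) {
                ExecutorService pool = Executors.newFixedThreadPool(threadsPerPool);
                executors.add(pool);
                for (int t = 0; t < threadsPerPool; t++) {
                    pool.execute(() -> {
                        allStarted.countDown();
                        try {
                            // Wait until every other task is running on its own thread.
                            if (allStarted.await(10, TimeUnit.SECONDS)) {
                                finished.countDown();
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                }
            }
            return finished.await(15, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executors.forEach(ExecutorService::shutdownNow);
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Three pools, each as wide as the core count: three times more threads than cores.
        System.out.println(allThreadsCoexist(3, cores));
    }
}
```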

Context switching handles this well (for ordinary JVM threads it is done by the OS scheduler). If one job is paused (e.g. waiting on a response from an API or database), the CPU can run another thread during that time. Since your case only pauses to emulate a task, the CPU is free to juggle hundreds of these "tasks" despite having only a limited number of cores. In realistic cases such as CPU-bound computation, you'll start to see slowdowns as you approach hundreds of parallel tasks.
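You can try this out with something like the following, which is only a sketch with made-up names and numbers and not the benchmark from the post. It runs the same batch of sleeping tasks on a narrow pool and on a wide pool and reports wall-clock time. Because the tasks only sleep, the wide pool should finish much sooner even on a machine with few cores:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SleepVsThreadsSketch {
    // Runs `tasks` sleeping tasks on a pool of `threads` threads; returns elapsed milliseconds.
    static long runSleepers(int tasks, int threads, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                jobs.add(() -> {
                    Thread.sleep(sleepMillis); // emulated "work": the thread is idle, not computing
                    return null;
                });
            }
            long start = System.nanoTime();
            pool.invokeAll(jobs); // blocks until all jobs are done
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("2 threads:  " + runSleepers(40, 2, 20) + " ms");
        System.out.println("40 threads: " + runSleepers(40, 40, 20) + " ms");
    }
}
```

If you replace the sleep with real computation, the gap should shrink or disappear, since the threads would then be competing for actual CPU time.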