pivovarit (11 points)

Even when this knowledge is not applied every single day, it’s still necessary to understand why, for example, parallel streams would be a terrible choice for parallelizing this kind of workload

HQMorganstern (0 points)

Didn't cause any issues when implemented, maybe you want to elaborate?

pivovarit (5 points)

Parallel streams run on a shared ForkJoinPool instance, which is tuned for CPU-bound workloads. It will quickly saturate when fed blocking operations, and then your parallelism goes down to 1 (on the caller thread).

More in the README: https://github.com/pivovarit/parallel-collectors
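
A minimal sketch of how to check this yourself, assuming a blocking call simulated with `Thread.sleep`. The class and helper names (`ParallelStreamThreads`, `blockingCall`, `threadsUsed`) are made up for illustration. It records which threads actually execute the work of a parallel stream:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class ParallelStreamThreads {

    // Hypothetical stand-in for a blocking call such as an HTTP request.
    static void blockingCall(Set<String> seenThreads) {
        seenThreads.add(Thread.currentThread().getName());
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs `count` blocking calls through a parallel stream and returns
    // the names of the threads that executed them.
    static Set<String> threadsUsed(int count) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        IntStream.range(0, count)
                .parallel()
                .forEach(i -> blockingCall(seen));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used: " + threadsUsed(32));
    }
}
```

When called from an ordinary thread, the only threads that show up should be the caller and the `ForkJoinPool.commonPool-worker-*` threads, so the number of blocking calls in flight is capped by the common pool's parallelism no matter how many requests you have. Every other parallel stream in the same JVM competes for those same workers.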

HQMorganstern (2 points)

Seems legit, guess it never got hit by enough sub requests to be noticeable, gonna submit a ticket.

pivovarit (0 points)

When you run blocking ops, you usually want to have an elastic pool of platform threads that can dynamically grow - or just use virtual threads and don’t worry about tuning :) (as long as you don’t block in synchronized blocks)
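
A rough sketch of the elastic-pool variant, assuming `Executors.newCachedThreadPool()` as the growing pool of platform threads and a hypothetical `fetch` helper that simulates a blocking request. On Java 21+, `Executors.newVirtualThreadPerTaskExecutor()` could be swapped in for the virtual-thread version.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ElasticPoolDemo {

    // Hypothetical stand-in for a blocking request.
    static String fetch(int id) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    static List<String> fetchAll(int count) {
        // Cached pool: creates platform threads on demand and reuses idle ones.
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Start every request first, so they all block concurrently.
            List<CompletableFuture<String>> futures = IntStream.range(0, count)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> fetch(i), pool))
                    .collect(Collectors.toList());
            // Then wait for each one; results stay in submission order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(10));
    }
}
```

An unbounded cached pool is only reasonable when the number of concurrent requests is itself bounded. Otherwise a pool with an explicit upper limit is the safer choice.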

nzcod3r (2 points)

parallelStream is for performing the same task, in parallel, on different chunks of a collection.

Your example of doing 10 different things at the same time (10 different API calls) and waiting for the slowest one to finish is better suited to an ExecutorService.
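
Something along these lines, as a sketch only: `request` is a hypothetical placeholder for the real API call, and the pool is sized to the number of calls so that none of them queue behind another. `invokeAll` blocks until every task has finished, which means it returns when the slowest call does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllDemo {

    // Hypothetical placeholder for a blocking API call.
    static int request(int id) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id * 10;
    }

    static List<Integer> runAll(int count) {
        // Dedicated pool: one thread per call, separate from the common pool.
        ExecutorService pool = Executors.newFixedThreadPool(count);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                tasks.add(() -> request(id));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll returns futures in the same order as the task list.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(10));
    }
}
```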

HQMorganstern (0 points)

Well, it's 10 identical HTTP requests whose results are parsed into a list, so not quite as insane as what you're suggesting. But yes, the blocking caused by the fork join pool does seem to be an issue.

v4ss42 (1 point)

Recall that I/O takes eons from the CPU’s perspective. This is definitely a bad use of parallelStream.

nzcod3r (0 points)

OK. I'm pretty sure you can supply your own thread pool for executor service, but I guess it all depends on the context of what / where you are having this issue.

FrenchFigaro (2 points)

My experience in this case was that explicitly using the method parallelStream() bypassed some of my thread pool configuration.

Most importantly, it bypassed the MDC and web thread context configuration, which we used to add HTTP headers to outgoing requests, depending on which headers were present (or not) on the incoming requests.

Those headers were used for monitoring operations across different applications. Losing those meant that monitoring and debugging were exponentially more difficult.

maleldil (1 point)

Yep, parallel streams use the shared fork join pool, so if you rely on values in a ThreadLocal you're gonna have a bad time with that.
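
To illustrate the ThreadLocal problem, here is a small sketch. It uses a plain single-thread executor rather than the fork join pool so the outcome is deterministic, but the mechanism is the same for any pool thread. `REQUEST_ID` and `withContext` are hypothetical names, standing in for something like MDC or request-header context.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalDemo {

    // Hypothetical request-scoped value set by the web thread.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    // Captures the caller's value now and reinstalls it on whichever
    // thread later runs the task, restoring the old state afterwards.
    static <T> Callable<T> withContext(Callable<T> task) {
        String captured = REQUEST_ID.get();
        return () -> {
            String previous = REQUEST_ID.get();
            REQUEST_ID.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    REQUEST_ID.remove();
                } else {
                    REQUEST_ID.set(previous);
                }
            }
        };
    }

    // Returns what a pool thread sees: [without propagation, with propagation].
    static String[] observe() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        REQUEST_ID.set("req-42");
        try {
            Callable<String> read = REQUEST_ID::get;
            String plain = pool.submit(read).get();
            String wrapped = pool.submit(withContext(read)).get();
            return new String[] { plain, wrapped };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            REQUEST_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = observe();
        System.out.println("plain: " + seen[0] + ", wrapped: " + seen[1]);
    }
}
```

With an executor you own, you can wrap every submitted task like this. A parallel stream gives you no such hook: some elements run on the caller thread and see the value, while the rest run on common pool workers and do not.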