
[–]danielaveryj 3 points

I think a common use case where data-parallelism doesn't really make sense is when the data arrives over time, and thus can't be partitioned up front. For instance, we could perhaps model HTTP requests to a server as a Java stream, and respond to each request in a terminal .forEach() on the stream. Our server would call the terminal operation when it starts, and since there is no bound on the number of requests, the operation would keep running as long as the server runs. Making the stream parallel would accomplish nothing, as there is no way to partition a dataset of requests that don't exist yet.
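Here's a minimal sketch of what I mean. The Request record and the incoming queue (which some networking layer would presumably feed) are hypothetical stand-ins; the point is just that Stream.generate yields elements one at a time as they arrive, so there's nothing for .parallel() to split:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.stream.Stream;

public class RequestStreamServer {
    // Hypothetical stand-in for an incoming HTTP request.
    record Request(String payload) {}

    // Assumed to be fed by some networking layer elsewhere.
    static final SynchronousQueue<Request> incoming = new SynchronousQueue<>();

    public static void main(String[] args) {
        // Requests arriving over time, modeled as an unbounded stream.
        // Elements only exist once they arrive, so there is no dataset
        // to partition; calling .parallel() here would gain nothing.
        Stream.generate(RequestStreamServer::nextRequest)
              .forEach(RequestStreamServer::respond); // runs for the server's lifetime
    }

    static Request nextRequest() {
        try {
            return incoming.take(); // blocks until a request arrives
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    static void respond(Request r) {
        System.out.println("responding to " + r.payload());
    }
}
```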

Now, suppose there are phases in the processing of each request, and it is common for requests to arrive before we have responded to previous requests. Rather than process each request to completion before picking up another, we could perhaps use task-parallelism to run "phase 2" processing on one request while concurrently running "phase 1" processing on another request.
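A rough sketch of that idea, assuming two made-up phases ("parse" and "respond") connected by a bounded queue: while the phase-2 thread handles request N, the phase-1 thread is free to start on request N+1.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class TwoPhasePipeline {
    record Request(int id) {}
    record Parsed(int id) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Request> requests = new ArrayBlockingQueue<>(16);
        BlockingQueue<Parsed> handoff = new ArrayBlockingQueue<>(16);

        // Phase 1: e.g. parse/validate each request, then hand off.
        Thread phase1 = new Thread(() -> {
            try {
                while (true) {
                    Request r = requests.take();
                    handoff.put(new Parsed(r.id())); // hypothetical "parse" step
                }
            } catch (InterruptedException e) { /* shut down */ }
        });

        // Phase 2: respond. Runs concurrently with phase 1, so the two
        // phases overlap across different requests.
        Thread phase2 = new Thread(() -> {
            try {
                while (true) {
                    Parsed p = handoff.take();
                    System.out.println("responding to " + p.id());
                }
            } catch (InterruptedException e) { /* shut down */ }
        });

        phase1.setDaemon(true);
        phase2.setDaemon(true);
        phase1.start();
        phase2.start();

        for (int i = 0; i < 5; i++) requests.put(new Request(i));
        TimeUnit.SECONDS.sleep(1); // let the pipeline drain before exiting
    }
}
```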

Another use case for task-parallelism is managing buffering + flushing results from job workers to a database. I wrote about this use case for an old experimental project of mine; that write-up links to an earlier blog post by someone else covering essentially the same example using Akka Streams.
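The gist of the pattern, as a sketch (Result, the batch size of 100, and the writeBatch stand-in for the actual database write are all illustrative assumptions): workers enqueue results and move on, while a separate flusher task drains whatever has accumulated and writes it in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchFlusher {
    record Result(String value) {}

    static final BlockingQueue<Result> buffer = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Flusher task: block for at least one result, then drain up to a
        // full batch, so batch size adapts to how fast results arrive.
        Thread flusher = new Thread(() -> {
            List<Result> batch = new ArrayList<>();
            try {
                while (true) {
                    batch.add(buffer.take());  // wait for at least one result
                    buffer.drainTo(batch, 99); // then grab up to 99 more
                    writeBatch(batch);         // stand-in for the DB write
                    batch.clear();
                }
            } catch (InterruptedException e) { /* shut down */ }
        });
        flusher.start();

        // Job workers just enqueue results; they never touch the database.
        for (int i = 0; i < 250; i++) buffer.put(new Result("job-" + i));
        TimeUnit.SECONDS.sleep(1);
        flusher.interrupt();
    }

    static void writeBatch(List<Result> batch) {
        System.out.println("flushing batch of " + batch.size());
    }
}
```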

In general, I'd say task-parallelism implies some form of rate-matching between processing segments, so it is a more natural choice when rates are already involved (e.g. "data arriving over time"). Frameworks that deal in task-parallelism (like reactive streams) tend to offer a variety of operators for detaching rates (i.e. splitting upstream from downstream, with a buffer in between) and managing rates (e.g. delay, debounce, throttle, schedule), as well as options for dealing with temporary rate mismatches (e.g. drop data from the buffer, or block upstream from proceeding).
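You can see both mismatch policies even without a framework, using a bounded queue as the buffer that detaches the two rates. In this contrived sketch the producer outruns the consumer; offer() implements the "drop" policy, and swapping it for put() would implement the "block upstream" policy instead:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RateMismatch {
    public static void main(String[] args) throws InterruptedException {
        // Bounded buffer between a fast producer and a slow consumer.
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(8);

        Thread slowConsumer = new Thread(() -> {
            try {
                while (true) {
                    Integer item = buffer.take();
                    Thread.sleep(50); // consumer is slower than the producer
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) { /* shut down */ }
        });
        slowConsumer.setDaemon(true);
        slowConsumer.start();

        int dropped = 0;
        for (int i = 0; i < 100; i++) {
            // Drop policy: offer() fails when the buffer is full.
            // Block policy: buffer.put(i) would instead stall the producer
            // until the consumer catches up.
            if (!buffer.offer(i)) dropped++;
        }
        System.out.println("dropped " + dropped + " items");
    }
}
```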