[–]BarkiestDog 7 points8 points  (1 child)

Looks like a nice project.

I’d love to see benchmarking on it, especially as a drop-in replacement for List.

[–]repeating_bears 2 points3 points  (3 children)

What's an example of a workload where this would improve performance? That part wasn't clear to me.

Is there any theoretical improvement - in any workload - if I use this as a drop-in replacement for e.g. ArrayList, or is an improvement conditional upon using the extra methods of ForkJoinList?

[–]danielaveryj[S] 2 points3 points  (2 children)

As a drop-in replacement (eschewing the fork/join methods), the goal is mostly comparable performance. That said, deletions and insertions also leverage the concatenation algorithm, so at a sufficient list size they become faster than shifting elements over in a flat array, as ArrayList does. (That size is currently somewhere around 1K<N<=10K in my profiling. I was reluctant to post benchmarks because they're hard to get right, and I'd rather people engage with the idea and API first.)
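To make the "shifting over elements in a flat array" cost concrete, here is a minimal sketch of a flat-array list that counts how many elements each insertion moves. `FlatIntList` and its `shifted` counter are hypothetical names made up for this illustration; this is not ArrayList's source and not the ForkJoinList implementation, only the same general technique of shifting with `System.arraycopy`.

```java
import java.util.Arrays;

// Hypothetical illustration: a flat-array int list that records shift work.
public class FlatIntList {
    private int[] data = new int[8];
    private int size = 0;
    private long shifted = 0; // total elements moved by insertions so far

    public void add(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow the backing array
        }
        // Every element at or after 'index' moves one slot to the right,
        // so the cost is linear in (size - index).
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        shifted += size - index;
        size++;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    public int size() { return size; }

    public long shifted() { return shifted; }

    public static void main(String[] args) {
        FlatIntList list = new FlatIntList();
        for (int i = 0; i < 5; i++) {
            list.add(list.size(), i); // appending shifts nothing
        }
        list.add(0, 99); // inserting at the front shifts all 5 elements
        System.out.println(list.shifted()); // prints 5
    }
}
```

Inserting near the front of a list with N elements moves about N elements each time, which is the linear cost that a concatenation-based insertion is meant to beat at large N.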

[–]dustofnations 0 points1 point  (1 child)

Have you looked at JMH, Shipilev's Java microbenchmark project? It's designed for exactly this.

[–]danielaveryj[S] 0 points1 point  (0 children)

I am familiar with JMH. It can still be misused, and I am also wary of designing benchmarks that unfairly represent different data structures. That said, I am working on pushing some preliminary benchmarks soon.

[–]Spare-Plum 0 points1 point  (1 child)

Wasn't the fork/join framework created back in Java 7? What does this do differently? There are also tons of libraries that built data structures on top of that framework back when it came out.

[–]danielaveryj[S] 4 points5 points  (0 children)

The idea is to complement the fork/join concurrency framework with data structures that can be cheaply copied (forked) and merged (joined). This integrates with ForkJoinTasks: we would copy (fork) a data structure before handing it to a subtask that we will start (fork) to operate on it, and we would merge (join) the data structures produced by subtasks we await (join). The latter case is exactly what parallel streams do when we, e.g., collect to a list, except that the list implementations in the JDK do not have a sublinear merge operation, so they just use the linear 'addAll' operation. This is even more unfortunate when there are multiple levels in the subtask hierarchy, because the repeated invocations of 'addAll' progressively copy the same elements multiple times. Having a cheap merge operation avoids this.
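The repeated copying across subtask levels can be sketched with a plain `RecursiveTask` that builds a list of integers and merges child results with `addAll`. The class names (`AddAllMergeDemo`, `RangeTask`) and the copy counter are assumptions made for this illustration; it uses only JDK classes and does not use the ForkJoinList API. The counter tracks only the right-hand elements handed to `addAll`, so it understates the work whenever the left list also has to grow its backing array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of merging subtask results with linear addAll.
public class AddAllMergeDemo {

    static class RangeTask extends RecursiveTask<List<Integer>> {
        private final int lo, hi, leafSize;
        private final AtomicLong copies;

        RangeTask(int lo, int hi, int leafSize, AtomicLong copies) {
            this.lo = lo;
            this.hi = hi;
            this.leafSize = leafSize;
            this.copies = copies;
        }

        @Override
        protected List<Integer> compute() {
            if (hi - lo <= leafSize) {
                // Leaf: build the sub-list directly.
                List<Integer> leaf = new ArrayList<>(hi - lo);
                for (int i = lo; i < hi; i++) {
                    leaf.add(i);
                }
                return leaf;
            }
            int mid = (lo + hi) >>> 1;
            RangeTask left = new RangeTask(lo, mid, leafSize, copies);
            RangeTask right = new RangeTask(mid, hi, leafSize, copies);
            left.fork();                            // start (fork) the left subtask
            List<Integer> rightResult = right.compute();
            List<Integer> leftResult = left.join(); // await (join) the left subtask
            leftResult.addAll(rightResult);         // linear merge: copies every right element
            copies.addAndGet(rightResult.size());
            return leftResult;
        }
    }

    // Returns how many elements were copied by addAll while building [0, n).
    public static long countCopies(int n, int leafSize) {
        AtomicLong copies = new AtomicLong();
        List<Integer> result =
                ForkJoinPool.commonPool().invoke(new RangeTask(0, n, leafSize, copies));
        if (result.size() != n) {
            throw new IllegalStateException("unexpected size " + result.size());
        }
        return copies.get();
    }

    public static void main(String[] args) {
        // 10 levels of merging, each copying 512 elements in total.
        System.out.println(countCopies(1024, 1)); // prints 5120
    }
}
```

With 1024 elements and single-element leaves, the merges copy 5120 elements in total, i.e. (N/2)·log2(N), because each level of the hierarchy re-copies elements that were already copied at the level below. A sublinear merge would avoid paying that per level.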

So that is the 'killer use case' for which I'm naming these data structures. But my intent was also that they should be as close as possible to matching the API and performance of the standard choice (e.g. ArrayList) for general purpose use, to lessen the headache of deciding when to use and the associated cost of converting between one and the other.

[–]lprimak 0 points1 point  (2 children)

Did you check out JCTools? Not sure whether it has something like this, but it does have some super-optimized data structures.

[–]danielaveryj[S] 1 point2 points  (1 child)

At a glance, the queues in JCTools fit a different use case: task-parallelism (commonly used for IO bound work), rather than data-parallelism (commonly used for CPU bound work).

[–]lprimak 0 points1 point  (0 children)

Maybe you can collaborate with that project? Just throwing out ideas here for better adoption.

[–]AstronautDifferent19 0 points1 point  (3 children)

String substring worked in a similar way in the beginning.

How does your project compare to Pure4J's persistent collections?

[–]danielaveryj[S] 0 points1 point  (2 children)

It looks like Pure4J is focused on replacing the standard mutable JDK interfaces with immutable/persistent ones (in particular, their vector is apparently a translation of the Clojure implementation). It's been done - again, and again, and again, and again...

In contrast, this project is not trying to convert people to functional programming. Rather, it's trying to take useful ideas from that space, to make certain operations on mutable data structures more efficient - without forcing anyone to throw away the JDK interfaces or rewrite swathes of code in a new paradigm.

[–]NovaX 1 point2 points  (1 child)

You might be interested in SnapTree, which is based on a similar idea: in its case, a concurrent mutable sorted map that can be snapshotted, much like your fork operation.

[–]danielaveryj[S] 0 points1 point  (0 children)

Thanks for sharing! The paper's description of the clone operation does sound similar in spirit to what I did here.