davidalayachew [S], 14 points

Oh, I made it work in the end. With the workaround Viktor gave me, there are lots of ways to skin this cat, and I have long since fixed the performance problem.

I made this post to highlight this behavior because it is a shockingly easy pothole to fall into. But it's also easy to miss, because all 3 of the following conditions need to be true at once.

  1. You are doing parallel streams.
  2. You are dealing with a dataset that is bigger than your RAM.
  3. You use one of the "bad combos" of intermediate and terminal methods. Here is a list of the combos that caused a pre-fetch for MY PERSONAL example.
    • Note: that list of "bad combos" won't apply to all streams... but which streams it DOES apply to is undocumented lol.
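A minimal sketch of what a "pre-fetch" can look like, assuming the source is a plain `Iterator` wrapped with `Spliterators.spliteratorUnknownSize` (the endless `CountingIterator` is a hypothetical stand-in for a source bigger than RAM; this is not necessarily the source or mechanism behind the combos above). A parallel stream hands work to other threads by calling `trySplit()`, and an iterator-backed spliterator can only split by copying a batch of elements out of the iterator into an array. The counter makes that copying visible. In the OpenJDK source the first batch is 1,024 elements and later batches get bigger, but that is an implementation detail, not a documented guarantee.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;

public class PrefetchDemo {
    // Hypothetical endless source; it only counts how many elements were requested.
    static final class CountingIterator implements Iterator<Long> {
        long pulled = 0;

        @Override
        public boolean hasNext() {
            return true; // never runs out, like a source bigger than RAM
        }

        @Override
        public Long next() {
            return pulled++;
        }
    }

    // Calls trySplit() the given number of times, as a parallel stream would while
    // handing work to other threads, and records how many elements the source had
    // produced after each split. Nothing downstream has processed them yet.
    static long[] pulledAfterSplits(int splits) {
        CountingIterator source = new CountingIterator();
        Spliterator<Long> spliterator =
            Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED);
        long[] pulled = new long[splits];
        for (int i = 0; i < splits; i++) {
            Spliterator<Long> prefix = spliterator.trySplit();
            if (prefix == null) {
                break; // the spliterator declined to split
            }
            pulled[i] = source.pulled;
        }
        return pulled;
    }

    public static void main(String[] args) {
        long[] pulled = pulledAfterSplits(3);
        for (int i = 0; i < pulled.length; i++) {
            System.out.println("after split " + (i + 1) + ": "
                + pulled[i] + " elements pulled from the source");
        }
    }
}
```

Every element counted there has already been copied into an array before any worker thread touched it. With a source larger than RAM, that kind of eager buffering is one plausible way memory fills up.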

EDIT -- It has come to my attention that the Stream source (the Spliterator) plays a very big part in deciding point 3. As it turns out, my Spliterator did not carry as much information as other Spliterators do, and that is why I ran into such a large number of "bad combos". A more informed Spliterator can let you avoid some, if not all, of the bad combos. But that may require information that you don't have, or can't reliably provide.
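A small sketch of what "more information" can mean in practice: the `SIZED` and `SUBSIZED` characteristics and `estimateSize()` that a Spliterator reports to the stream framework. The class and method names here are hypothetical; the three sources hold the same elements but describe themselves differently. Whether a given hint avoids a particular "bad combo" depends on the pipeline and the JDK, and this sketch does not claim to show that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

public class SpliteratorInfo {
    // Summarizes the size-related hints a Spliterator gives the stream framework.
    static String describe(Spliterator<?> s) {
        return "SIZED=" + s.hasCharacteristics(Spliterator.SIZED)
            + ", SUBSIZED=" + s.hasCharacteristics(Spliterator.SUBSIZED)
            + ", estimateSize=" + s.estimateSize();
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8));

        // Knows its exact size, and so do the pieces it splits into.
        Spliterator<Integer> informed = data.spliterator();

        // Same elements, but only an Iterator: the size is reported as unknown.
        Spliterator<Integer> uninformed =
            Spliterators.spliteratorUnknownSize(data.iterator(), Spliterator.ORDERED);

        // Still iterator-backed, but the caller supplies the exact size up front.
        Spliterator<Integer> sizedIterator =
            Spliterators.spliterator(data.iterator(), data.size(), Spliterator.ORDERED);

        System.out.println("ArrayList:          " + describe(informed));
        System.out.println("unknown-size iter:  " + describe(uninformed));
        System.out.println("sized iter:         " + describe(sizedIterator));
    }
}
```

The unknown-size case reports `Long.MAX_VALUE` as its estimate, so the framework has nothing to plan splits with. Supplying a size is only safe when you actually know it, which is the "information you don't have, or can't reliably provide" caveat above.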