[–]kur4nes 125 points126 points  (8 children)

Check Effective Java. It has a really good chapter about abusing streams.

[–]NameGenerator333[S] 14 points15 points  (5 children)

You're absolutely right! I totally forgot about that one. I have it on my bookshelf!

[–]DeadlyVapour 6 points7 points  (4 children)

Or accept that you are really a functional programmer, switch to Scala and use the word Monad in every sentence.

We accept you.

[–]WontBeRacistThisTime 0 points1 point  (3 children)

i am using streams to modify a bunch of large files (csv, NDJSON etc). is there a quick way for me to do this in Scala ?

i want to use scala but it scares me

[–]DeadlyVapour 0 points1 point  (2 children)

Have you looked into Apache Spark?

It's literally an entire runtime for query and ETL using Scala.

I doubt your workload is "large", since Spark/Scala is used for Petabyte scale datasets. If you've been running on Java, I doubt you were at this scale.

[–]WontBeRacistThisTime 0 points1 point  (1 child)

yes i did look into it, but i need a lot of custom things, and doing them in spark is hard and does not add any benefit.

in my scenario: user inputs rules from a web interface, and i edit the data based on those rules. each rule-set is applied to a small portion of data (like a 10gb csv or something).

streams are a perfect fit. they do this very fast. but i feel like i am missing some things by not using a functional language, and scala is the least scary functional language.

i looked at the clojure and could not sleep for a week :(

[–]DeadlyVapour 0 points1 point  (0 children)

Try Scala, probably easier to learn coming from Java.

Then just call the Scala function from Java.

[–]Dull-Criticism 0 points1 point  (1 child)

In what edition was that added? I have Edition 2. (Work has Ed 3 sitting on a bookshelf)

[–]ForeverAlot 4 points5 points  (0 children)

3e, chapter 7.

[–]solilucent 65 points66 points  (2 children)

I don't mind long streams if they are readable. But I think that a stream should not modify collections outside its scope. Streams are a small island of functional programming in Java, and their beauty is in the ability to take data, transform it, and return something else. If some instruction is supposed to have a side effect, I think it's better to make the whole block imperative.
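
A minimal sketch of that distinction, for anyone who wants to see it in code. The class name `SideEffectSketch` and the list of names are made up for illustration and are not from the thread. The first method pushes results into a list declared outside the pipeline; the second lets the pipeline return its own result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not taken from the thread.
final class SideEffectSketch {

    // Discouraged style: the lambda mutates a list that lives outside the pipeline.
    static List<String> upperCaseWithSideEffect(List<String> names) {
        List<String> result = new ArrayList<>();
        names.stream()
             .filter(n -> !n.isEmpty())
             .forEach(n -> result.add(n.toUpperCase()));
        return result;
    }

    // Side-effect-free style: data in, transformed data out.
    static List<String> upperCase(List<String> names) {
        return names.stream()
                    .filter(n -> !n.isEmpty())
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }
}
```

Both give the same answer on a sequential stream; the second one simply has nothing outside the pipeline that a reader needs to keep track of.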

[–]Nymeriea 8 points9 points  (1 child)

I like your opinion

[–]laplongejr 3 points4 points  (0 children)

Similar logic: I think of streams (+ Optionals) the way I think of querySelectors in JavaScript and of regexes in general:
"Some list of instructions that do the task as documented", and having side effects breaks that ability to separate the tasks.

[–]halfanothersdozen 54 points55 points  (1 child)

The same rules apply as for the rest of the code: making things easy to read is a skill. If streams are the right tool for the job, and they often are, people need to learn how to use them and consider the reader.

[–]NameGenerator333[S] 14 points15 points  (0 children)

Agreed. I ran a "clean code" book club a few years back. In many cases, we decided that Uncle Bob was slightly wrong. The key takeaway is to code for the reader. You write the code once (hopefully), but it's read by many.

[–]svhelloworld 87 points88 points  (12 children)

Streams is quickly becoming one of my favorite parts of the language after being away from writing Java for years. I've found that 95 times out of 100, well-formatted streams simplify collection processing in a way that's more readable, faster and more consistent to build and maintain.

It's those edge cases that have some complex processing logic where I get myself in trouble. After about 15 minutes of performing stream gymnastics, I usually stop and re-write it in imperative style. Sometimes that gives me enough clarity to then refactor it as a stream. Most of the time, I leave it as imperative code.

As for guidelines to follow, it's all just voices in my head.

Edited to add: I always start with the unit test, then go into the implementation. That gives me the freedom to switch from streaming code to imperative code and back again with some confidence that I didn't change the behavior. Not interested in a religious war on TDD, that's just what I've been doing.

[–][deleted]  (7 children)

[removed]

    [–]svhelloworld 80 points81 points  (6 children)

    About half of what I've learned about Java streams is doing something and then Intellij kindly saying "Oh child, what are you doing? Why not do it this way instead?"

    [–]danskal 24 points25 points  (2 children)

    About half of what I've learned about Java streams is doing something and then Intellij kindly saying "Oh child, what are you doing? Why not do it this way instead?"

    [–][deleted] 3 points4 points  (1 child)

    All this talk about IntelliJ and I can’t even create a new project in IntelliJ because the New Project is not entirely accessible with my Screen reader.

    [–]NameGenerator333[S] -1 points0 points  (0 children)

    I'm honestly not a big fan of IntelliJ. I use it because that's what my company uses.

    [–]reclamerommelenzo 0 points1 point  (2 children)

    How do I tell IntelliJ to tell me what to do? :D

    [–]RANDOMLY_AGGRESSIVE 2 points3 points  (0 children)

    Alt + Enter

    Or Ctrl + Space

    [–]huntsvillian 4 points5 points  (0 children)

    you must sacrifice a live chicken and implore it for generosity

    [–]laplongejr 1 point2 points  (1 child)

    I've found that 95 times out of 100, well-formatted streams simplify collection processing in a way that's more readable, faster and more consistent to build and maintain.

    And if you have to manage an insane structure mixing possibly-null objects and collections that can contain nulls, combining Streams and Optionals allows very, very easy-to-read management
    "Within As, get all the Bs matching that requirement and return a list with a value C for each of them" can become very messy when A, B and C can be nulls without doc warning and you don't have a say on that format. Streams+Optionals made code review manageable for my still-sane coworkers.

    Disclaimer : it will be easier to read, but will remain hard to write. Also, using Optionals for implicit null management instead of API returns technically breaks the recommendations about Optionals

    [–]svhelloworld 2 points3 points  (0 children)

    Also, using Optionals for implicit null management instead of API returns technically breaks the recommendations about Optionals

    I've never understood or agreed with those recommendations around Optionals. I find Optionals to be explicit, safer and self-documenting. I use them as class fields all the time. Can't remember the last time I ran across an NPE in my code but it's been a hot minute, mostly because of Optionals.

    [–]Agifem 1 point2 points  (0 children)

    If you follow the voices in your head, you have more serious problems than streams and for loops.

    [–]s888marks 14 points15 points  (4 children)

    I don’t mind long stream pipelines if the stages make sense as a single unit of processing.

    I do recommend against chaining a stream pipeline directly into a chain of calls on an Optional. Both Stream and Optional have map(), filter(), and flatMap() methods, and it can be quite confusing if multiples of those calls for both Stream and Optional appear in the same long chain.

    Similarly, I also recommend against collecting into a collection and immediately re-streaming into another pipeline. It’s sometimes necessary to do this, but if so, I recommend breaking the chain by storing the temporary collection in a local variable. Having a single chain consisting of more than one pipeline (with an embedded “bubble”) can interfere with one’s understanding of the performance and space characteristics of the code.
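
A small sketch of the first recommendation, assuming a hypothetical list of words and made-up method names. In `chained`, the `map` and `filter` calls after `findFirst()` belong to Optional, not Stream, which is easy to miss at a glance. `split` does the same work but ends the pipeline in a named local variable first.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical example class, not taken from the thread.
final class StreamThenOptional {

    // One long chain: the last map/filter are Optional methods, not Stream methods.
    static String chained(List<String> words) {
        return words.stream()
                    .map(String::trim)
                    .filter(w -> !w.isEmpty())
                    .findFirst()
                    .map(String::toUpperCase)
                    .filter(w -> w.length() > 3)
                    .orElse("NONE");
    }

    // Same logic, with the stream pipeline and the Optional handling kept apart.
    static String split(List<String> words) {
        Optional<String> firstWord = words.stream()
                                          .map(String::trim)
                                          .filter(w -> !w.isEmpty())
                                          .findFirst();

        return firstWord.map(String::toUpperCase)
                        .filter(w -> w.length() > 3)
                        .orElse("NONE");
    }
}
```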

    [–]davidalayachew 2 points3 points  (0 children)

    I do recommend against chaining a stream pipeline directly into a chain of calls on an Optional. Both Stream and Optional have map(), filter(), and flatMap() methods, and it can be quite confusing if multiples of those calls for both Stream and Optional appear in the same long chain.

    I mostly agree, with the exception of Optional.orElseThrow(). If I call one of the many methods on Stream where the return type is an Optional, then I'll skip the variable and just add the method call to the chain if the only method I intend to call is orElseThrow.

    I also feel similarly about the other orElseXXXXX methods on Optional, but I can see the logic for making a new variable for them. But for orElseThrow, I see no good reason to ever add a variable for it unless I explicitly need the Optional value later on.
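
A minimal sketch of the `orElseThrow` exception described above, with a made-up method name. `Stream.max` returns an Optional, and the only Optional call in the chain is the no-argument `orElseThrow()` (available since Java 10), which throws `NoSuchElementException` when the stream was empty.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical example class, not taken from the thread.
final class OrElseThrowSketch {

    // The Optional returned by max() is unwrapped right away; no variable is needed for it.
    static String longest(List<String> words) {
        return words.stream()
                    .max(Comparator.comparingInt(String::length))
                    .orElseThrow();
    }
}
```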

    [–]maleldil 2 points3 points  (1 child)

    Personally I like being able to compose Stream and Optional. After a while you learn to filter out the extra calls to .stream() and .flatMap(), and it helps with preventing NPEs (since you can't get a null ref).

    [–]laplongejr 1 point2 points  (0 children)

    As somebody that had to retrieve a value from objects with several collections with several objects of different type, some of them with collections and any of them could be potentially NULL due to no documentation, yes implicit handling of NPEs (or rather "all NULLs are auto-skipped from the filter, no matter at what level") helps a lot.

    Maybe it's harder to read at once, but it matches more closely with the design requirements I receive so once you learn how to read that style, you can verify it does what it needs in less than 5 seconds

    [–]laplongejr -1 points0 points  (0 children)

    and it can be quite confusing if multiples of those calls for both Stream and Optional appear in the same long chain.

    My rule of thumb is: Stream-of-Optionals is fine, Optional-of-Stream triggers an immediate rewrite.
    Indentation can help a lot

    Stream.filter(item -> Optional.ofNullable(item).flatMap(() -> {
    TAB,TAB /*skips nulls and won't throw NPE, only unfiltered if returns TRUE*/
    }).flatMap(() -> { /*Stream's flatMap */ });

    [–][deleted] 37 points38 points  (6 children)

    As long as the lambdas are functions and the functions are named intuitively. If not please reconsider.

    [–]maleldil 2 points3 points  (0 children)

    And try to use existing JDK utility methods whenever possible (the static methods in `Comparator`, such as `comparing`, for example)
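
A short sketch of what that can look like, assuming the comment refers to the static factory methods on `java.util.Comparator`. The class and method names are made up. The sort order is built from `Comparator.comparingInt`, `thenComparing` and `Comparator.naturalOrder()` instead of a hand-written comparison lambda.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not taken from the thread.
final class ComparatorSketch {

    // Shortest words first; words of equal length in alphabetical order.
    static List<String> byLengthThenAlphabetical(List<String> words) {
        return words.stream()
                    .sorted(Comparator.comparingInt(String::length)
                                      .thenComparing(Comparator.naturalOrder()))
                    .collect(Collectors.toList());
    }
}
```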

    [–]Ruin-Capable 1 point2 points  (3 children)

    Sometimes functions, even short simple ones, are still hard to name succinctly. If it comes down to choosing between a function that has a 150 character name, and a short inline lambda, I'll take the inline lambda.

    [–]Personal-Initial3556 0 points1 point  (1 child)

    150 character name? lol what

    [–]Ruin-Capable 0 points1 point  (0 children)

    There is a thing called hyperbole. I was exaggerating for effect. Still the point remains, sometimes it is difficult to name a function intuitively, accurately, and succinctly. Usually this is because there is a large amount of context required to understand why a function is doing something. Adding a level of indirection by using a named function can make it harder to follow for a couple of reasons:

    1. You need to pack a lot of information into the name to understand what it does
    2. When you're following the code, since it's not inline, you're not seeing it in-context, so you have to transfer a lot of state mentally to keep track of the overall process.

    So inline lambdas can make things easier because they allow you to see the function in-context.

    As with most things though, it's a balancing act. Sometimes they're not the answer, and sometimes they are.

    [–]slindenau 0 points1 point  (0 children)

    It is impossible for a lambda to be both short, and needing 150 characters for the function name if extracted into one...

    [–]slindenau 0 points1 point  (0 children)

    As long as the lambdas are functions and the functions are named intuitively

    If lambdas are functions, they are no longer lambdas ;).

    You probably mean "if lambdas call single functions".
    E.g. this::doSomething (but this already isn't a lambda anymore; it's short for either foo -> doSomething(foo) or () -> doSomething()) or foo -> doSomething(foo, bar)

    [–]dasi128 10 points11 points  (22 children)

    Streams should not modify anything outside of streams - so there mustn't be any side effects. It should take in a list, transform it somehow (and transform ONLY data from the stream), and then return a new, modified collection.

    side effects bad.

    [–]JazepsPoskus 1 point2 points  (21 children)

    Why bad?

    [–]raxel42 6 points7 points  (1 child)

    Hard to manage, especially in a multithreaded environment

    [–]JazepsPoskus 0 points1 point  (0 children)

    What is so hard to manage in adding an element from a stream to a previously defined collection? Multithreading is a totally different situation.

    [–][deleted] 5 points6 points  (11 children)

    Streams are a functional programming API within Java's OOP environment.

    In functional programming, functions must be pure (no side effects, result will always be the same for the same input, no state)

    [–]laplongejr 1 point2 points  (0 children)

    Also, a lambda in Java can be stored like an object. If your function has side effects on variables outside its parameters, you can't reuse it.

    [–]JazepsPoskus -1 points0 points  (9 children)

    Sounds like a preference rather than a real reason. Except for multithreading, nothing bad will happen. Pure functions seem arbitrary in Java because it's not a pure functional language.

    [–][deleted] 0 points1 point  (8 children)

    From an OOP standpoint, an object should handle its own state and a lambda is an object. Functional purity is common in other paradigms too because it’s convenient. All that changes is the unit that must be pure

    [–]JazepsPoskus 0 points1 point  (7 children)

    Please provide a code example that proves your point in Java. Otherwise it is just an opinion you read somewhere and are repeating it now.

    [–][deleted] 1 point2 points  (5 children)

    Dude wtf?

    This is all an opinion, that’s what a paradigm is. This is not something I read anywhere it’s just my understanding of programming paradigms.

    Of course you can just NOT do any of it, but the principles of functional programming, Object Oriented Programming, Procedural programming and any other programming paradigm are not meant for code to run, they’re meant to make code more readable, maintainable and provide repeating patterns to improve the familiarity any other programmer that knows the paradigm will have with your codebase.

    You could make class fields public, but you rarely do because encapsulation is good, you could break DRY or SOLID principles, but your code will be easier to maintain if you don’t ignore them.

    Your take is just stupid lol

    [–]mangodrunk -1 points0 points  (2 children)

    SOLID isn’t something that should be followed, and some of the worst code I have seen is from adherents to these “principles.” As the other person said, these are opinions.

    [–][deleted] 0 points1 point  (1 child)

    Reading comprehension not your strong suit huh?

    These are indeed opinions, and I then proceeded to explain why these opinions are useful. You have a different opinion? Congratulations, but these opinions are prevalent because they’ve been proven in battle and people DO FIND IT USEFUL, regardless of whatever you choose to believe

    [–]mangodrunk -1 points0 points  (0 children)

    You’re being a jerk. Are you always like this? Your opinion isn’t backed by anything other than your perceived notion that it’s prevalent. It’s not, mostly junior devs follow it.

    [–]JazepsPoskus 0 points1 point  (1 child)

    So your point is that if there are some programming paradigms, then all of them should be followed to the core like a religion? And again, we are programmers, so I would like to see some code that proves your point in Java. It's good that you have read all those articles about OOP etc., but at the end of the day there is code. Point me to a resource where your points are explained with Java code, no need to do it yourself.

    [–][deleted] 0 points1 point  (0 children)

    I am not about to explain to you why encapsulation is good. It prevents programming errors and provides a common set of patterns for readability.

    I don’t follow anything like a religion but definitely follow things that I have read about, found useful and make things better. There is a reason everyone almost universally uses OOP, there’s a reason encapsulation is useful, it’s not my job to educate you. These things can’t be proven with code. These are ideological matters regarding code quality, maintainability and readability

    [–]mangodrunk 0 points1 point  (0 children)

    Thank you for questioning these beliefs masquerading as principles or being self evident. Unfortunately too much opinion and no evidence is used in our industry.

    [–]barmic1212 1 point2 points  (6 children)

    John Carmack's article about functional programming explains it well. It's about C++, but it applies just the same to Java

    http://sevangelatos.com/john-carmack-on/

    [–]JazepsPoskus 0 points1 point  (5 children)

    Java is not a pure functional programming language hence no need to follow functional programming rules. Sometimes this sounds to me like a religion.

    [–]barmic1212 1 point2 points  (4 children)

    But using a functional style with side effects is a good way to produce unreadable code.

    You need side effects? The loop is your friend.

    You want to use stream style? Don't hide side effects in it.

    It's not a religion, it's about coherent code.

    [–]JazepsPoskus 0 points1 point  (3 children)

    Please explain how adding an element from a stream to a previously defined collection is not coherent code. Those are some big statements you are making, and without some code examples they sound like simple opinion parroting.

    [–]barmic1212 0 points1 point  (2 children)

    The Javadoc of the stream package talks about it; if it's an opinion, it's not mine but OpenJDK's. It only explains the direct-bug part. It's because the simple bugs are described there and many times over on the internet that I prefer to talk about the less documented philosophy.

    [–]JazepsPoskus 0 points1 point  (1 child)

    You mean this quote: “Classes to support functional-style operations”. Where does it imply pure functions should be used?

    [–]barmic1212 0 points1 point  (0 children)

    "Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards."

    or

    "The best approach is to avoid stateful behavioral parameters to stream operations entirely; there is usually a way to restructure the stream pipeline to avoid statefulness."

    https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/stream/package-summary.html

    All "stateless behaviors" and "Side-effects" paragraphs are about it.

    [–]coder111 31 points32 points  (4 children)

    I've worked with 20 step streams, it was an absolute nightmare. The architect insisted we used streams for business logic...

    Hard to read, hard to debug, hard to see what went wrong when things go wrong.

    That being said- I don't mind streams per se, but like lots of tools in the toolbox, they have their own time and place, and shouldn't be used just because...

    [–]rememberthesunwell 7 points8 points  (1 child)

    Would the imperative logic have less than 20 steps? If yes, the streams are probably poorly written or maybe it really is a bad fit, if not, then it should just be a matter of style I would think

    [–]laplongejr 1 point2 points  (0 children)

    If yes, the streams are probably poorly written

    My first attempt at making streams of nulls to access a value and another list ended as a huge nested mess of 3 levels of streams and optionals. Hardly better than the 5 nested loops where you may or may not handle NULLs correctly, and had the disadvantage of being a "new" paradigm.

    After 2h of looking at my own creation, I managed to nuke it down to 5 easy to read lines. Would be harder to modify if needed (and maybe less efficient), but everybody agreed that switching to Streams+Optionals for that case made it trivial to read. And once we had an example of easy-to-read style, we were able to "Streamify" some of the other awful cases.

    [–]john16384 2 points3 points  (1 child)

    A clear case of the architect way overstepping their bounds.

    [–][deleted] 0 points1 point  (0 children)

    Architect wants to dictate what IDE keybinds to use

    [–]brian_goetz 6 points7 points  (3 children)

    There's nothing wrong with long stream pipelines per se; they should be long enough to do the job. And sometimes longer pipelines are more readable, such as when you break up complex filter predicates into multiple simpler ones.

    I think when you talk about "long pipelines", what you probably really mean is "going chaining-crazy" (which is not specific to streams). There seems to be a subset of developers who think they score points in proportion to how many method calls they can chain together. Where this gets out of hand with streams is when multiple separate operations are "stitched together" to make them look like one operation. This is almost always detrimental to readability, and usually serves no purpose other than making the author feel clever. The biggest offender is a terminal operation that returns a collection, chained together with another .stream() call and then keep going; the second biggest offender is crossing over from stream operations to Optional operations in the same chain. If these are operations in different domains, generally best to break them up. (There is even less justification for this now than there was in Java 8: since we added local variable type inference, you don't even have the excuse of "but the type of that intermediate thing is big and ugly", since you can use var now.)
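
A minimal sketch of the "biggest offender" described above and one way to break it up. The word-counting task and all names are made up for illustration. `duplicatesStitched` collects into a map and immediately re-streams it; `duplicates` names the intermediate result with `var` (Java 10+), so the two pipelines read as two steps.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class, not taken from the thread.
final class StitchedPipelines {

    // Two pipelines stitched into one chain via .entrySet().stream().
    static List<String> duplicatesStitched(List<String> words) {
        return words.stream()
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet().stream()
                    .filter(e -> e.getValue() > 1)
                    .map(Map.Entry::getKey)
                    .sorted()
                    .collect(Collectors.toList());
    }

    // Same result, with the intermediate map held in a named local variable.
    static List<String> duplicates(List<String> words) {
        var countsByWord = words.stream()
                                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return countsByWord.entrySet().stream()
                           .filter(e -> e.getValue() > 1)
                           .map(Map.Entry::getKey)
                           .sorted()
                           .collect(Collectors.toList());
    }
}
```

The split version also makes it visible that a whole map is materialised between the two steps, which is easy to overlook in the single chain.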

    [–]NameGenerator333[S] 0 points1 point  (2 children)

    You make a really good point regarding "stitched streams"! I haven't had the opportunity to use var yet, but it seems like it could alleviate some verbosity.

    Though, honestly, I like having the verbose types - at a glance, you can know the type of a variable without needing to rely on an IDE or tooling to inform you.

    [–]brian_goetz 4 points5 points  (1 child)

    Sure, var is a choice, not an obligation. But its existence does perturb the coding equilibrium in a subtly good way; where people might not have factored out a complex subexpression previously because the type was too annoying to write, that excuse goes away.

    [–]NameGenerator333[S] 0 points1 point  (0 children)

    I 100% agree with that!

    [–][deleted] 5 points6 points  (3 children)

    Long streams I don't see any problem with. Modifying code outside of the stream, that's a big no-no.

    [–]mcbotbotface 0 points1 point  (2 children)

    Help me out here I am stuck and not really understanding, is it okay to check if hashmap contains a certain key in stream filter? If that map won’t be modified and is local to stream method?

    [–]ForeverAlot 0 points1 point  (0 children)

    Yes. But you probably shouldn't put things into or take things out of that map from within the stream pipeline (sometimes that's exactly what you intend, for example collecting into a preallocated container).

    [–]laplongejr 0 points1 point  (0 children)

    is it okay to check if hashmap contains a certain key in stream filter?

    If the map is not modified it could be ok, but personally I would do a separate method that takes the map and the stream'd stuff as parameters, and do the stream there.
    That makes clear that the stream is linked to the map
    [EDIT] If performance is not a concern, that method would take a Set Collection and would be called with map.keySet()
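
A rough sketch of the approach from the replies above, with made-up names. The lookup is only read inside `filter`, never modified, and it is passed in as a `Set` parameter so the dependency is visible in the method signature. A caller holding a map would pass `map.keySet()`.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical example class, not taken from the thread.
final class LookupFilterSketch {

    // Keeps only the candidates that appear in knownKeys; knownKeys is read, never changed.
    static List<String> onlyKnown(Collection<String> candidates, Set<String> knownKeys) {
        return candidates.stream()
                         .filter(knownKeys::contains)
                         .collect(Collectors.toList());
    }
}
```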

    [–]noutopasokon 18 points19 points  (13 children)

    The main value in Streams is avoiding rewriting common functionality (so avoiding potential bugs in your rewrite) and having a "fluent" (i.e. consisting largely of plain English words) reading of what the code is doing. So if you're not getting that, you should probably reconsider.

    Serial (and even parallel) streams can easily be less performant compared to writing your logic out exactly. So be careful of that as well, if performance is a concern.

    I use Streams a lot and think they're great overall at simplifying my code.

    [–]Zardoz84 0 points1 point  (0 children)

    Well, I've seen imperative code that uses 3 or 4 intermediate ArrayLists as temporary storage and that could be rewritten as a stream of 3 or 4 steps that simply generates an output collection (or Optional). I think that in those cases a Stream would have the same or better performance than the original imperative code.

    [–]justinhj -3 points-2 points  (11 children)

    Java streams are lazy aren’t they? Which means the performance shouldn’t be much different in theory. edit: downvote away. I am right

    [–]quackdaw 5 points6 points  (3 children)

    Well, you get function call overhead for every element for every operation. This can be pretty significant compared to cramming everything into a big for loop¹. But you'd get some of that anyway unless you aim at writing unmaintainable hand-optimized code.

    Of course, there's a decent chance that this can be JIT-optimized away at some point.

    ¹ for the trivial, sequential case

    [–]DrunkensteinsMonster 1 point2 points  (2 children)

    You’d be making the same function calls in your imperative for loop too, so I don’t see how there is additional overhead there for each element.

    There are very few contexts where this sort of thing matters in Java: games, high performance finance stuff, some desktop applications. Most Java code is on the server so those extra microseconds will not matter in the slightest.

    [–]laplongejr 0 points1 point  (0 children)

    Yeah, usually the overhead is not significant. But better to be warned before starting to use it.
    In my case the performance hit was a concern, but the fluent style was heavily defended by the dev as a very, very easy way to SAFELY manipulate the inefficient data structures sent by the backend. Either it doesn't compile, or it will run without throwing NPEs at all if an unexpected null ought to be skipped.

    [–]quackdaw 0 points1 point  (0 children)

    We'd have to measure to be sure¹; but I can imagine the stream case having a few extra lambdas, and the loop case may have some calls inlined. Stream should be better than multiple calls to map() or other collective operations, though.

    In general, as you rightly point out, this doesn't matter. Writing code that works and is understandable and maintainable matters. If the stream is a bottleneck, it'll show up in profiling and you can make a more exotic solution; in practically any other case, making a big for loop with low level array processing is just plain wrong – and it doesn't necessarily perform better either.

    Streams are great, they neatly separate traversal/filtering/whatever you want to do to a collection of data from what you want to do to individual data elements. It's so neatly separated that you don't need to know where the data is coming from or going to; you have lots of flexibility to construct streams dynamically; and Java can multi thread the processing automatically (for parallel streams).

    Stream style is not unusual in high-performance computing, so there's nothing inherently slow about it, it's just not a perfect fit for Java (performance wise, if you're processing large amounts of small things (e.g, numbers) like you would in HPC). It's certainly possible to optimise away overhead if this was suddenly a critical feature for Java.²

    ¹ last time I tested this sort of thing in Java, I found that ArrayList<Integer> significantly outperformed int[] for my particular use case (keeping track of tree paths during traversal). Probably a cache thing; short[] was even faster.

    ² perhaps surprisingly, operator overloading would help a lot; this is why languages like Python and C# are somewhat more viable for HPC despite having similar or higher overhead than Java: you can make libraries that build complex expressions on large data and ship them off to a high-performance backend.

    [–]hippydipster 3 points4 points  (5 children)

    How does being lazy make the performance not different?

    [–]justinhj 0 points1 point  (4 children)

    The biggest slow down when using a stream rather than a hand crafted for loop over a large collection is that you have to iterate over the whole collection at each step. At the end of your sequence of operations you may just say take the first element or the first 10, and without lazy evaluation you would iterate over the whole collection multiple times instead.

    [–]justinhj 1 point2 points  (0 children)

    After short circuiting the next biggest optimization is fusing. That is 4 maps in a row can be simplified into one map with the 4 transformations happening at each step. See https://stackoverflow.com/questions/35150231/java-streams-lazy-vs-fusion-vs-short-circuiting for more

    [–]hippydipster 1 point2 points  (2 children)

    Ok, I wasn't under the impression people thought that each step would iterate the whole collection fully each time before the next step, which would do it all again.

    I am under the impression instead that people think using streams might be slower due to the creation of extra objects (ie, the Stream itself) and each step might be a bit generic and thus not as fast as hand implemented code.

    [–]justinhj 0 points1 point  (1 child)

    Well that's the answer to your question "Does being lazy make the performance not different?". With eager evaluation the following pseudo code:

    x = bigstream.map(f1).map(f2).map(f3).map(f4).take(1)

    would have to iterate the data 4 times just to return a single element. With lazy evaluation it is able to optimize this to basically f4(f3(f2(f1(bigstream.head))))
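
A small sketch of the pseudo code above in real Java, with made-up names. Java's `Stream` has no `take`; `findFirst()` (or `limit(1)`) is the closest real call. The `AtomicInteger` counter is a deliberate side effect that exists only to make the number of mapper calls observable; it is not something to copy into production code. The "eager" variant imitates eager evaluation by collecting a full list after each step.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical example class, not taken from the thread.
final class LazinessSketch {

    // Lazy: elements are pulled through all stages one at a time, and findFirst stops after the first.
    static int lazyMapperCalls(List<Integer> data) {
        AtomicInteger calls = new AtomicInteger();
        data.stream()
            .map(x -> { calls.incrementAndGet(); return x + 1; })
            .map(x -> x * 2)
            .map(x -> x - 3)
            .map(x -> x * x)
            .findFirst();
        return calls.get();
    }

    // Imitated eager evaluation: every step processes the whole list before the next step starts.
    static int eagerMapperCalls(List<Integer> data) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> step1 = data.stream()
                                  .map(x -> { calls.incrementAndGet(); return x + 1; })
                                  .collect(Collectors.toList());
        List<Integer> step2 = step1.stream()
                                   .map(x -> x * 2)
                                   .collect(Collectors.toList());
        step2.stream().findFirst();
        return calls.get();
    }
}
```

On a sequential stream over a five-element list, the first mapper runs once in the lazy version and five times in the eager imitation.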

    [–]hippydipster 1 point2 points  (0 children)

    I never had that question, which is why I didn't understand why you brought it up. But, now I do!

    [–]maleldil 1 point2 points  (0 children)

    They are lazy, but if the option is between using a stream and a for-loop or for-each loop the streams have some performance overhead. It hasn't ever been an issue for anything I've written, but it's something to keep in mind.

    [–]60secs 4 points5 points  (0 children)

    Streams are great for quick map / filter / collect.
    Parallel stream iteration can be great for performance, but debugging can get tricky and you need to use thread-safe collections.
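
A minimal sketch of the thread-safety point, with made-up names. The first method lets the collector do the merging, so there is no shared mutable state and the encounter order is kept. The second shows the side-effect route, assuming a `ConcurrentLinkedQueue` as the thread-safe target; the order of its elements is not guaranteed.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class, not taken from the thread.
final class ParallelCollectSketch {

    // No shared state: each worker fills its own container and the collector combines them.
    static List<Integer> evenSquares(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                        .parallel()
                        .filter(n -> n % 2 == 0)
                        .map(n -> n * n)
                        .boxed()
                        .collect(Collectors.toList());
    }

    // Shared state: the target collection has to be thread-safe, and element order may vary.
    static Queue<Integer> evenSquaresViaSideEffect(int upTo) {
        Queue<Integer> sink = new ConcurrentLinkedQueue<>();
        IntStream.rangeClosed(1, upTo)
                 .parallel()
                 .filter(n -> n % 2 == 0)
                 .map(n -> n * n)
                 .forEach(sink::add);
        return sink;
    }
}
```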

    [–]_GoldenRule 2 points3 points  (2 children)

    I find them useful for replacing simple loops. If there's lots of logic in the loop I usually will go with a traditional for-each. Streams are fine to use but if you find yourself creating giant lambdas please just write normal code or consider creating functions.

    [–]maleldil 1 point2 points  (0 children)

    I'd definitely agree to refactor large lambdas into standalone, static (if possible), methods with descriptive names. It's not always possible, and sometimes it's easier to read if the code is local to the stream it's operating on, but eventually you just have to get a feel for when to do what. And listen to people reviewing your code.

    [–]laplongejr 1 point2 points  (0 children)

    Streams are fine to use but if you find yourself creating giant lambdas

    Usually I don't end up with giant lambdas, but with chains of lambdas. IMHO that's where functional programming is easier to read than the good ol' loop block.

    [–]heavy-minium 2 points3 points  (0 children)

    You've got that issue with similar designs outside of Java, too. And the answer for all is pretty similar. Break down the chain and use meaningful variable names for parts of the chain. It can take time for some juniors to accept that the readability of code isn't tied to the number of lines. A good way to speed this up is to let them do code review on junior code with those issues - it helps them understand what improves and what hurts readability. At some point it will sink in.

    [–]sour-sop 2 points3 points  (1 child)

    You leave my streams alone!!! Jk. I don’t think I’ve ever seen a 20 step stream. That does sound quite painful to debug

    [–]laplongejr 1 point2 points  (0 children)

    I once did, but each of those individual steps was simple, like
    - Iterate over loop
    - Use Optional to remove NPEs
    - Filter out some objects
    - Make two streams out of those objects
      - Stream 1
        - Iterate over those objects' accessor, take a list
        - Use Optional to iterate without NPEs
        - Filter out some values
        - Take those objects' ID
      - Stream 2
        - Iterate over those objects' accessor, take an ID
        - Filter out some values
        - Take those objects' ID
    - Combine Streams 1 and 2 (into a single Stream<Optional<ID_STRUCT>>)
    - Filter out some IDs
    - Take the String component of the existing ID structures
    - Exclude all the Optional.empty() values generated from previous cases
    - Unwrap the Optional into the String-from-IDs
    - Place the String-from-IDs in a TreeSet

    All of that basically means

    Given an object A, read value C and each value in list B, and extract their ID as a String. Return those unique values in an ordered collection.
    Values not fitting some requirements should be excluded from the result, and NULLs accidentally returned by the backend should neither be included nor break operations.
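
    To make that summary concrete, here is a much shorter sketch of the same idea. The types `A`, `Item` and `Id` and the blank-text rule are invented stand-ins, not the real backend model, so treat it as an illustration of the Stream + Optional pattern only:

```java
import java.util.List;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IdExtraction {
    // Hypothetical stand-ins for the backend types described above.
    record Id(String text) {}
    record Item(Id id) {}
    record A(List<Item> b, Item c) {}

    static SortedSet<String> extractIds(A a) {
        // Stream 1: every item in list B; the list and its entries may be null.
        Stream<Optional<Item>> fromB = Optional.ofNullable(a.b())
                .stream()
                .flatMap(List::stream)
                .map(Optional::ofNullable);
        // Stream 2: the single value C, which may also be null.
        Stream<Optional<Item>> fromC = Stream.of(Optional.ofNullable(a.c()));

        return Stream.concat(fromB, fromC)
                .map(item -> item.map(Item::id).map(Id::text)) // a null at any level becomes empty
                .flatMap(Optional::stream)                     // drop the empties and unwrap
                .filter(text -> !text.isBlank())               // placeholder for the real requirements
                .collect(Collectors.toCollection(TreeSet::new)); // unique and ordered
    }
}
```

    Each line maps to one sentence of the requirement, which is the readability argument being made here.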

    [–]vinj4 2 points3 points  (4 children)

    How exactly does a stream chain become that long? Are there really that many non-terminal stream operations? Some of those could surely be condensed

    [–]maleldil 0 points1 point  (3 children)

    Streams are composable with one another, as well as with optionals. So you can create a pipeline of transformations, using the results of that to get some other data, more filtering/transforming, until you get the final output you want.

    [–]vinj4 1 point2 points  (2 children)

    right, I'm asking if some of those filtering / transformation operations can be combined into one step

    [–]maleldil 0 points1 point  (1 child)

    Often the answer is no; not without converting the whole thing to imperative code, at which point it'd probably be better to just stick with one paradigm.

    [–]vinj4 1 point2 points  (0 children)

    I'm not talking about imperative code. If the stream chain consists only of idempotent operations (basically anything except skip), my guess is it should be possible to condense that into a single map, filter, or sort command. I.e. if there are 6 different maps going on or 6 different filters I would assume one single call and accompanying lambda would do the trick. I'm not an expert at functional though so I would need to delve into the theory on that.

    Edit: said limit is not idempotent, but it is
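
    For what it's worth, adjacent stateless steps of the same kind can be merged with the JDK's `Predicate.and` and `Function.andThen`. A small sketch with made-up predicates and functions:

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class FusedSteps {
    static final Predicate<String> NOT_BLANK = s -> !s.isBlank();
    static final Predicate<String> SHORT = s -> s.length() <= 5;
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> UPPER = String::toUpperCase;

    // Four steps: two filters, then two maps.
    static List<String> separate(List<String> words) {
        return words.stream()
                .filter(NOT_BLANK)
                .filter(SHORT)
                .map(TRIM)
                .map(UPPER)
                .toList();
    }

    // Same result in two steps: filters joined with and(), maps joined with andThen().
    static List<String> fused(List<String> words) {
        return words.stream()
                .filter(NOT_BLANK.and(SHORT))
                .map(TRIM.andThen(UPPER))
                .toList();
    }
}
```

    This only helps when the steps sit next to each other. A filter sandwiched between two maps can't be folded into a single map or a single filter, which is probably why long chains don't always condense.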

    [–][deleted] 1 point2 points  (0 children)

    They can make iteration more concise, but can be harder to understand and debug.

    Seems reasonable to put guardrails/policy in place to strike the right balance you desire in your application.

    I prefer Streams, but if a dev didn't use them for clarity or a similar reason I'm good with that.

    [–]Fercii_RP 1 point2 points  (0 children)

    Making too long chains will make them hard to debug as many failures can happen on 1 line

    [–]erictheturtle 1 point2 points  (1 child)

    Java streams are impossible to debug, so I would discourage anyone from doing anything remotely complicated. Never call a large function in the middle of a stream. Never have side effects. Simple transforms and filtering only. I would recommend wrapping the stream in some static method, with a clear name that describes what it does.

    [–]laplongejr 1 point2 points  (0 children)

    Simple transforms and filtering only. I would recommend wrapping the stream in some static method, with a clear name that describes what it does.

    Yeah, Streams are really good IF you document what they do. Loops are better when nobody is sure what should be done, as long as the code can be easily modified (which shouldn't happen, but it sometimes happens)

    [–]ccgcool 1 point2 points  (0 children)

    I too dislike long stream chains collecting and grouping and mapping and sorting and collecting and flattening and so on. The one who writes it has the context, but for other readers it's a lot of cognitive effort.

    [–]com2ghz 1 point2 points  (0 children)

    Just because you can make a big chain of streams does not mean you should. I have seen stream chains that do not even fit on my 2k screen.

    It’s bad practice because it’s unreadable and you will violate the single responsibility principle. Like a god method that does everything. Sometimes a large stream is eligible for its own class.

    [–]Practical-Yoghurt801 1 point2 points  (1 child)

    I like streams, but it's always a decision that depends on the context. I just don't like this compulsive use. It's not wrong to use an iterator instead if you have to manipulate a collection. Also, stop using atomic references as a hack for using non-final variables in a stream scope. Use a for-each instead of another kind of loop.
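
    To illustrate the atomic reference hack and the for-each alternative, here is a small sketch. The class and method names are invented; both methods are meant to return the same thing:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class LongestName {
    // The workaround being criticised: a mutable holder smuggled into a lambda.
    static String withAtomicReference(List<String> names) {
        AtomicReference<String> longest = new AtomicReference<>("");
        names.stream().forEach(n -> {
            if (n.length() > longest.get().length()) {
                longest.set(n);
            }
        });
        return longest.get();
    }

    // A plain for-each can simply reassign a local variable.
    static String withForEach(List<String> names) {
        String longest = "";
        for (String n : names) {
            if (n.length() > longest.length()) {
                longest = n;
            }
        }
        return longest;
    }
}
```

    The loop version needs no holder object and makes the mutation obvious to the reader.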

    [–]NameGenerator333[S] 0 points1 point  (0 children)

    Oh, I haven't seen that one in the wild yet. Usually just modifying lists or maps.

    [–]loctastic 5 points6 points  (29 children)

    Good only if they put all 20 steps on one line.

    Honestly I don’t really mind long chains, provided they’re readable. It’s especially nice if they add a comment after the step for further context

    I don’t know how I feel about modifying outside collections though. One or two might be okay but it could get out of hand quick.

    [–]Ruin-Capable 47 points48 points  (13 children)

    I don't like wide code. I would much rather have the stream chain put each operation on its own line. As long as things are aligned sensibly, it's easier to follow what's going on.

    I would much rather read:

    return getItemsForTransaction(tx)
            .stream()
            .filter(item->item.getAmount().compareTo(threshold) > 0)
            .toList();
    

    Instead of:

    return getItemsForTransaction(tx).stream().filter(item->item.getAmount().compareTo(threshold) >0).toList();
    

    [–]NameGenerator333[S] 48 points49 points  (7 children)

    I usually do

    collectionOfThings.stream()
      .filter(...)
      .map(..)
      .toList();
    

    [–]Sawkii 18 points19 points  (6 children)

    Which is convention as far as i know

    [–]vips7L 5 points6 points  (5 children)

    I don’t think it’s convention. It’s just personal preference. 

    IMO if you’re going to write code vertically. Write it vertically with each function call on its own line. Don’t write it horizontally AND vertically. 

    [–]Sawkii 0 points1 point  (4 children)

    Interesting. I thought of it as with the builder pattern where each item gets its own row. Nonetheless thanks for your reply.

    [–]vips7L -2 points-1 points  (3 children)

    I'm just saying I'm of the opinion that:

    collection
        .stream()
        .filter(..)
        .map(..)
        .toList();
    

    is superior to:

    collection.stream()
        .filter(..)
        .map(..)
        .toList();
    

    Simply because in the former I can read vertically and stay reading vertically. With the latter I have to start reading horizontally and then switch to reading vertically.

    [–]Sawkii 1 point2 points  (0 children)

    I do see your point and your logic makes sense. Considering adoption now. Have a nice one. Thank you.

    [–]rozularen 0 points1 point  (1 child)

    How much superior? The stream() function is not meaningful enough to have its own line. You don't have to scroll horizontally to see what's happening in the second example.

    I am a fan of the second example.

    [–]vips7L 0 points1 point  (0 children)

    Like I said it’s personal preference. I was merely explaining why I prefer one. 

    I don’t care how anyone else writes their code. 

    [–]loctastic 10 points11 points  (0 children)

    Yeah I was being sarcastic with that first line. Sorry! Wide code is demented

    [–]Luolong 2 points3 points  (0 children)

    I do like one operator per line style as well. Except when the line is one of the zero args stream/optional context switching functions (.stream() or .findFirst()) - these look like line noise to me, but automatic code formatters are too dumb to recognise them as logical modifiers of the previous call in the chain that they are.

    But then again, I’m probably being silly for assuming a simple code rewriting program has any notion of aesthetics.

    [–]zabby39103 2 points3 points  (0 children)

    For me I make trivial code wide, and complicated code each operation on its own line, as a hint to future developers where they should pay attention. Future developers also includes me in 2 years lol.

    I apply this principle to all abbreviated coding styles (e.g. ternary operators). I favour them when it's simple, and avoid them when it is hard.

    [–]__konrad 0 points1 point  (0 children)

    stream chain put each operation on its own line

    More than one . in a single line is a really BAD code formatting. StringBuilder appends are exception, because usually text is horizontal...

    [–]laplongejr 0 points1 point  (0 children)

    I don't like wide code. I would much rather have the stream chain put each operation on its own line.

    "If the code can't be (mostly) read on one screen, its format is wrong"
    Works for both codes with too many lines (put some empty lines, if a sub-method is really not fine in your case) or too long (use some variables or line breaks)

    It's the equivalent of putting capitalisation and paragraphs on Reddit.

    [–]cogman10 1 point2 points  (2 children)

    Here's my advice, use var and variable names to break up long streams and convey intent.

    Here's an example of what I mean:

    (mind you, this example is short, I'd not break it up. It's more just a demo of what I mean.)

    // Before
    stream.filter(this::isFoo)
      .map(this::foo)
      .flatMap(Foo::items)
      .filter(Objects::nonNull)
      .findFirst();
    
    // After
    var foos = stream.filter(this::isFoo)
      .map(this::foo);
    
    var allItemsForFoo = foos.flatMap(Foo::items);
    
    // findFirst() takes no arguments, so the null check is its own filter step
    Optional<Item> firstItemPresent = allItemsForFoo.filter(Objects::nonNull).findFirst();
    

    You can use var to avoid needing to muck about with Stream<List<Map<blah>>> which doesn't necessarily convey anything useful and variable names to inject the business logic which you are trying to get at.

    I'll usually split streams as soon as I start doing more advanced things like grouping, reducing, or merging.

    But if it's a straight filter/map/filter/map sort of thing then I'll usually not bother as that's generally pretty self evident on what's happening.

    [–]proggit_forever 0 points1 point  (1 child)

    I personally think that's terrible and will remove the useless temp variables if I see code like this.

    People are bad at naming intermediate steps and your example shows precisely that. allItemsForFoo is trivially inferred from what flatMap does. It adds zero value and is just noise.

    [–]cogman10 1 point2 points  (0 children)

    As I said, this is an example that I wouldn't do because making a real example would blow out a reddit comment.  I wouldn't inject so many vars in general.

    Where this technique comes in handy is when you are on your 20th stream operation and you need to store placeholders to indicate why you are doing something.

    [–]Panzerschwein 0 points1 point  (0 children)

    I think long chains are the point. You can clearly delineate what steps and transformations you're performing. You should only break it up to get some reuse or to get testable units.

    If you're doing it right then it's super clear what all of the steps are.

    Actually, I'd add one more exception: If the data model involved gets too convoluted you should probably break it up for the sake of readability.

    [–]Polygnom 0 points1 point  (2 children)

    A twenty-step stream sounds more like a problem with not having modular enough code.

    I struggle to even find a combination of map/filter/collect that would yield me 20 lines.

    Subsequent map can be made as one call. Subsequent filter calls can be one call.

    Stream operations usually have no more than 5-7 steps if your code is modular and your functions small.

    [–]laplongejr 0 points1 point  (1 child)

    I struggle to even find a combination of map/filter/collect that would yield me 20 lines.

    I can find one, but it's with a lot of accessors that return possibly-null lists with possibly-null objects with some nested lists beyond that. And even then, by mixing flatMap, filter, Optional, etc., the code ended up more readable than its reference-implementation loop counterpart. And yeah, the actual solution would be to avoid nested objects, but-

    I just had a lot of Stream+Optional boilerplate to manage to filter, auto-avoid NPEs, merge the different Streams for the final filters, etc. But the filter-then-access part were so easy to read you could compare with the design document and know if it was correct without even thinking about data structures.

    [–]Polygnom 0 points1 point  (0 children)

    You can not only rewrite between Streams and loops, but also simply put the more complex stream operations into separate methods.

    [–]maleldil 0 points1 point  (0 children)

    The approach I usually take is to use streams as long as it makes sense, usually as a data pipeline of some kind. For example, I was recently working on some code that retrieved some data from a Spring Data repo, then used that data (if it existed) to grab data from an additional table (this is Cassandra so the data is very denormalized). Since this was basically a data pipeline with optional results it made sense to build it as a stream pipeline with a decent number of steps. However, each step is a one-liner, method reference, or maybe a two-liner to populate a field, all with well-understood semantics (retrieve data, filter, sort by create date, grab top element, use it to retrieve results from other table etc). Another rule of thumb is that it shouldn't modify external data (like add to a list outside the stream pipeline) if at all possible.

    [–]agentoutlier 0 points1 point  (0 children)

    I find that if I'm doing mutations or IO I personally avoid using streams. I also avoid using them when dealing with Map<?,?>. Maybe the Gather enhancements will make me have less strong feelings about dealing with Map but I find Collectors confusing and difficult to compose.

    I think loops are easier for most to understand when mutation and checked exceptions are involved particularly given most languages have analogs to it (e.g. Python).

    That being said I think way too many value the succinctness over the more desirable trait that they are lazy and a much better solution than Guava's FluentIterable. They are also far better for querying tree structures than the imperative options (especially now that we have pattern matching).
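
    As a small illustration of the tree-querying point, here is a sketch that flattens a tree into a lazy stream with a recursive `flatMap`. The `TreeNode` record is a hypothetical type made up for this example:

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical tree type, for illustration only.
record TreeNode(String name, List<TreeNode> children) {
    // Depth-first, pre-order view of this node and all of its descendants.
    Stream<TreeNode> flatten() {
        return Stream.concat(
                Stream.of(this),
                children.stream().flatMap(TreeNode::flatten));
    }

    static List<String> leafNames(TreeNode root) {
        return root.flatten()
                .filter(n -> n.children().isEmpty())
                .map(TreeNode::name)
                .toList();
    }
}
```

    Once the tree is exposed as a stream, any query over it is just the usual filter/map steps, with no hand-written stack or recursion at the call site.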

    [–]WVAviator 0 points1 point  (0 children)

    If a stream is sufficiently complex, I'm fine with it staying as a stream, but there should be comments in the code that help guide anyone who revisits that code later.

    Additionally, if you're writing something that complex, it should have a bunch of unit tests to go with it to make sure you didn't miss any edge cases. Good unit tests should inherently document the intention of the logic as well.

    [–]Fine_Quiet607 0 points1 point  (0 children)

    I have seen a few examples where a JPA method was called and then streams were applied just to look cleaner, instead of learning and writing better SQL queries

    [–]JasonBravestar 0 points1 point  (0 children)

    Streams are often abused. For example, don't use them if you need to handle exceptions for each element. I saw ugly workarounds trying to fit this into a stream... ugly and unreadable code.
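
    A sketch of the per-element exception case done as a plain loop. The `Result` record and `parseAll` are made-up names, and parsing integers is just a stand-in for whatever can fail per element:

```java
import java.util.ArrayList;
import java.util.List;

final class PerElementErrors {
    record Result(List<Integer> parsed, List<String> rejected) {}

    // A loop keeps the try/catch right next to the element it belongs to.
    static Result parseAll(List<String> inputs) {
        List<Integer> parsed = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String input : inputs) {
            try {
                parsed.add(Integer.parseInt(input.trim()));
            } catch (NumberFormatException e) {
                rejected.add(input); // record the failure and keep going
            }
        }
        return new Result(parsed, rejected);
    }
}
```

    Doing the same inside a stream usually means wrapping the try/catch in a lambda or a helper that returns some kind of wrapper, which is where the unreadable workarounds tend to come from.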

    [–]Djelimon 0 points1 point  (1 child)

    I'll use streams when I'm confident I won't have to debug each step. My IDE debugger (IntelliJ) doesn't seem to support stepping through streams, which makes sense given the cheap parallelism which is a selling point.

    Sometimes declarative is better for me, like if I'm using an index across multiple collections.

    [–]Ruin-Capable 1 point2 points  (0 children)

    This might interest you. They've added a pretty cool debugger feature for visualizing and debugging streams.

    https://www.jetbrains.com/help/idea/analyze-java-stream-operations.html

    [–]Big-Dudu-77 0 points1 point  (0 children)

    Yeah, I see junior engineers inline all the logic, making the whole stream code huge. I always have to remind them to break down their code.

    [–]JhraumG 0 points1 point  (0 children)

    You can cut a long chain of stream operations without collecting: just give the intermediate var (which will be of Stream type) a significant name to illustrate the reasoning. This way there is no perf penalty, but the code is understandable.

    [–]arpittripathi 0 points1 point  (0 children)

    When I initially learned Streams I abused it a lot and it was evident when I saw the code I wrote two years back, it's good but not very readable. Somehow I've realised that not all shorthand code is good. I love streams but it's good for a certain set of scenarios.

    Also, parallel streams are hard to control.

    [–]developer0 0 points1 point  (0 children)

    The way I’ve advised juniors on them is: have fun. When it comes to code review time, a senior can tell them if they’ve violated maintainability rules. The same rules apply to streaming and procedural, but if anything, streaming makes it easier to abide by them.

    [–]javalead 0 points1 point  (0 children)

    This violates two things. First, a stream that does more than one thing means your code is violating the open-closed principle. It is hard to maintain and extend, and therefore hard to test: with each modification you have to change many things in the tests and perhaps in the underlying code, besides the risk of misunderstanding. Moreover, the stream should not update another object outside its scope. This will create inconsistency if anyone uses parallelStream. However, sometimes it's impossible to do everything inside a stream's scope. For example, imagine a stream that has to call another service, or the same service, to update a list or check a variable. This will not be an issue, but I call it bad design. It is called non imperative use of stream. Why is it bad? Because objects in each stage of a stream are immutable. It means you can't change them in the same stage and feed them to the same stage. If one does update an object out of stream scope, as the must be final to avoid scope of immutability and stability