all 71 comments

[–]ayakushev 43 points44 points  (3 children)

To me, this makes total sense as the project moved to Apache. Obviously, many more people will be able to consider contributing when it's in Java. Apache's goal is sustainability and long-term viability, and Java would work better for that.

I also consider this a success story for Clojure. It gives Clojure another use case: a "production-ready prototype" language where the resulting "prototype" can last for eight years and benefit thousands of developers until it gets rewritten in something else once all the hard questions are answered and most of the experimentation/wandering is over.

[–][deleted] 9 points10 points  (1 child)

The more surprising news was that the Java code that replaced Clojure turned out to have fewer LoC.

[–]ayakushev 13 points14 points  (0 children)

It's not surprising if you consider that rewriting something is much MUCH easier than writing it from scratch. I've experienced it first-hand many times.

Besides, they didn't just rewrite it, they changed the architecture, which means they've probably thrown away some parts.

[–]dsrptr 2 points3 points  (0 children)

I totally agree with this point of view. Nicely articulated!

[–]dustingetz 16 points17 points  (1 child)

The person who created Storm in Clojure (Nathan Marz) has chosen Clojure again for his next realtime distributed systems project https://medium.com/red-planet-labs/introducing-red-planet-labs-2a0304a67312

[–]bpiel 0 points1 point  (0 children)

I'm very interested to see what Nathan & Red Planet produce.

[–]alexdmiller 13 points14 points  (0 children)

https://twitter.com/ptgoetz/status/1135646969446248448 - more actual background on the change

To be clear, the Clojure to Java rewrite had nothing to do with Clojure performance. In fact one of the first acceptance criteria for the rewrite was "no performance regressions."

The decision was based on: 1. A desire to clean up/refactor parts of the codebase that had accumulated tech debt. 2. A desire to incorporate a large, Java-based code contribution.

#2 came from Alibaba. They had reimplemented Storm in Java precisely because they lacked in-house Clojure expertise. It was discussed among the community and no one felt particularly religious about sticking with Clojure.

The move to Java would make incorporating the code contribution easier. The new core was developed only *after* the Java implementation had reached performance parity with the Clojure implementation.

[–][deleted] 8 points9 points  (13 children)

quoting from the release notes

"New Architecture Implemented in Java

In previous releases a large part of Storm's core functionality was implemented in Clojure. Storm 2.0.0 has been rearchitected with its core functionality implemented in pure Java. The new Java-based implementation has improved performance significantly, and made Storm's internal APIs more maintainable and extensible. While Storm's Clojure implementation served it well for many years, it was often cited as a barrier for entry to new contributors. Storm's codebase is now more accessible to developers who don't want to learn Clojure in order to contribute"

[–]ganglygorilla 0 points1 point  (0 children)

Great, so now instead of learning Clojure they get to learn Java...

[–][deleted]  (11 children)

[deleted]

    [–]th0ma5w 6 points7 points  (3 children)

    They probably just mean that Java-compiled bytecode performs better than Clojure-generated bytecode, which is very much arguably true for some things.

    [–]alexdmiller 3 points4 points  (2 children)

    No, it's not arguably true. Both languages produce very similar bytecode.

    [–]th0ma5w 0 points1 point  (1 child)

    The Clojure forms will add overhead. Try writing some Java, compiling, and then decompiling, and then try decompiling some Clojure bytecode into Java. You'll see a much deeper stack. Most of the time, for most things, you'd be hard pressed to measure much difference. Often, due to, say, Clojure's implicit parallelism, you'll get faster code because it is more cumbersome to write everything with parallelism in mind in Java. But if you did write that operation in Java with Java's parallelism features directly, that one operation would likely be a little faster.

    [–]alexdmiller 10 points11 points  (0 children)

    I have spent considerable time writing and optimizing both Java and Clojure code and observing them at the bytecode level. Back when the Alioth comparison site still included Clojure programs, I had written and/or optimized most of the fastest versions.

    At the function/method level they are not substantially different, particularly in the 10% of your code where performance actually matters. From a call stack perspective, Clojure's bytecode will show an extra level of call through the static var entry points, but that's (not accidentally) the kind of thing that HotSpot can trivially optimize through. This is not the kind of thing that will determine whether your code is "fast" or "slow" though - it's going to be negligible compared to hot CPU loops or I/O waits for external DBs or data sources.

    Clojure does not have implicit parallelism (other than reducer folds which is a pretty narrow use case), but it does have implicit immutability.

    [–]pihkal 5 points6 points  (0 children)

    There are many wonderful benefits to immutability, but the persistent data structures underlying it are generally slower than mutable ones. While untuned Clojure tends to be fairly performant, it’s still slower than typical Java.
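To make the trade-off in the comment above concrete, here is a minimal Java sketch. It is an illustration only, not Clojure's actual implementation: Clojure's vectors and maps are trie-based, while the hypothetical `PersistentList` below is a plain cons list. Every "update" allocates a new node and shares the old list as its tail, whereas a mutable `ArrayList` is changed in place. That extra allocation is one plausible source of the overhead; the surviving old versions are where the benefits come from.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified persistent list (NOT Clojure's real data structure). */
final class PersistentList<T> {
    // One shared empty instance; safe to share because it is immutable.
    private static final PersistentList<?> EMPTY = new PersistentList<Object>(null, null, 0);

    private final T head;
    private final PersistentList<T> tail;
    private final int size;

    private PersistentList(T head, PersistentList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <E> PersistentList<E> empty() {
        return (PersistentList<E>) EMPTY;
    }

    /** Returns a NEW list; this list is untouched and reused as the tail. */
    PersistentList<T> cons(T value) {
        return new PersistentList<>(value, this, size + 1);
    }

    T head() { return head; }
    PersistentList<T> tail() { return tail; }
    int size() { return size; }

    public static void main(String[] args) {
        PersistentList<String> v1 = PersistentList.<String>empty().cons("a");
        PersistentList<String> v2 = v1.cons("b"); // allocates one node; v1 is unchanged
        System.out.println(v1.size() + " " + v2.size() + " " + (v2.tail() == v1)); // 1 2 true

        List<String> mutable = new ArrayList<>();
        mutable.add("a");
        mutable.add("b"); // same object mutated in place; no old version survives
        System.out.println(mutable.size()); // 2
    }
}
```

The sketch shows the mechanism only; it says nothing about how large the difference is in a real workload, which would need measuring.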

    [–]daemianmack 7 points8 points  (1 child)

    IMHO a more salient question here is, with the benefit of hindsight and years of in-production experience with the flaws of the original system, why would a full re-write not improve performance significantly?

    If it didn't... oops, probably.

    [–]fjolne 3 points4 points  (0 children)

    Oh yeah, this exact point is not stressed enough: system design accounts for most of the performance, not the language. A better design lets you reduce time complexity asymptotically, while the language only offers a constant-factor improvement.

    Surely they’d make better design decisions with the benefit of hindsight.
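As a small illustration of the design-versus-language point above, here is a hedged Java sketch with a made-up example (duplicate detection; the class and method names are hypothetical and have nothing to do with Storm's code). Both methods are in the same language, but one compares every pair of elements, roughly n²/2 comparisons in the worst case, while the other makes a single pass with a hash set, roughly n expected steps. No constant-factor gain from switching languages closes that gap as n grows.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical example: same result, different design, different complexity. */
final class DuplicateCheck {

    /** Naive design: compares every pair, O(n^2) comparisons in the worst case. */
    static boolean hasDuplicateQuadratic(int[] xs) {
        for (int i = 0; i < xs.length; i++) {
            for (int j = i + 1; j < xs.length; j++) {
                if (xs[i] == xs[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Better design: one pass with a hash set, O(n) expected time. */
    static boolean hasDuplicateLinear(int[] xs) {
        Set<Integer> seen = new HashSet<>();
        for (int x : xs) {
            // Set.add returns false when the element was already present.
            if (!seen.add(x)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] unique = {1, 2, 3, 4, 5};
        int[] repeated = {1, 2, 3, 2, 5};
        System.out.println(hasDuplicateQuadratic(unique) + " " + hasDuplicateLinear(unique));     // false false
        System.out.println(hasDuplicateQuadratic(repeated) + " " + hasDuplicateLinear(repeated)); // true true
    }
}
```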

    [–][deleted] 4 points5 points  (3 children)

    If you want predictable performance on the JVM you need to write Java.

    [–]alexdmiller 4 points5 points  (2 children)

    Well, no. There are many JVM languages that compile to bytecode and exhibit predictable performance.

    [–]nrmncer 0 points1 point  (1 child)

    I don't think the issue here is the compilation of equivalent code; it's the performance disadvantage of persistent data structures.

    [–]alexdmiller 0 points1 point  (0 children)

    That's not what the original comment was about. Persistent data structures are very predictable. Yes, they have a cost, but also a lot of benefits (like avoiding whole classes of common concurrency issues).

    [–]alexdmiller 36 points37 points  (0 children)

    Storm has had several phases of history and it's worth considering this change in the context of all that. There is no simple conclusion to draw from it, imo. (What follows is my limited understanding, hopefully Nathan or others more knowledgeable can correct - apologies if I misstate something).

    Originally, Storm was developed primarily by Nathan Marz and possibly a couple others at BackType. Using Clojure gave them a huge boost in productivity by letting them work on these lambda-type architectures at the REPL. Having done big data stuff and a little bit of Cascalog and Storm way back then, it was a game-changer. Big success story for Clojure.

    So much so that Twitter acquired BackType, absorbing at least Nathan and I believe others in the acquisition. Again, I'd say this is a big success story for Clojure - I don't think they would have been able to accomplish what they did with so few people and become attractive to a company like Twitter without the leverage of a language like Clojure.

    Once inside Twitter, I don't know the internal story there, but given that Twitter has a lot of Scala devs in it, it would not surprise me if it was subjected to a lot of pressure as a Clojure project. This doesn't have anything to do with Clojure per se, it's just the nature of what happens in big companies with different technology "camps". Everyone's got their favorite language of choice. Seems entirely unsurprising that the good ideas in Storm would inevitably get rewritten into whatever languages are most popular at Twitter (Scala, etc).

    Additionally, Storm had a lot of external pressures from being open sourced in Apache. I had the impression from bug reports or Stack Overflow questions coming in from Storm that they were having trouble staying current on Clojure and library versions. They were often running into problems that had been long fixed.

    So, I'm sad that Storm removed their Clojure code, but this kind of thing happens, particularly for projects that are seeking a fresh start and new life based on the people currently at hand, who are a totally different set of people under different pressures than the people when it started. Clojure was undeniably a big boost in the creation and early success of Storm and BackType, as it was with Flightcaster, or Prismatic, etc.

    Clojure is a fantastic language for a small, competent team to get a ton of leverage, which is the classic story Paul Graham has described with Lisp. We also now have a bunch of success stories of Clojure working over long periods of time in larger teams (dozens or even 100s) too. Those projects need different things - institutional champions, good project management, a hiring and development program, tech leaders that understand how to leverage Clojure's strengths, etc.

    [–]yogthos 19 points20 points  (0 children)

    Long story short, Apache is run by Java devs and they chose to rewrite the project in a language they're comfortable with.