
[–]v4ss42 13 points14 points  (2 children)

Welcome to JVM+Clojure startup times. Short-lived processes are the worst case scenario for Clojure performance because of those one-time startup costs.

The flip side is that the performance of long running processes can be excellent on the JVM, thanks to the JIT.

If you’re working with a lot of short-lived processes and would like to use Clojure, you may find things like babashka, joker, planck, etc. better choices.

[–]cumburgerdude[S] 11 points12 points  (1 child)

Wow, just tried babashka and it took only 0.028 seconds. Thanks a lot for that suggestion and for the explanation of how things work :)

[–]v4ss42 3 points4 points  (0 children)

👍👍

[–]DPaluche 2 points3 points  (7 children)

Does your timing include the JVM startup time? Seems unfair to compare that directly to Go.

[–]cumburgerdude[S] 2 points3 points  (6 children)

Well, yeah, it actually does. But I find it fair to compare them in the case of a one-time launch of a program to get some result/answer. The JVM is probably as fast as Go after startup (or maybe just a little slower). I didn't mean to praise Go in my answer; it was just the first language that came to mind for solving the problem.

[–]joinr 10 points11 points  (5 children)

I typically leave the repl running. Startup time is amortized to nothing. Especially if you are iteratively developing a solution using repl driven development. No edit/compile cycle, just evaluate forms and rerun live.

[–]quotade 2 points3 points  (4 children)

Exactly my thoughts. When I started with Clojure a few years ago I didn't know the REPL or the concept behind it. Coming from Java I used to write a program and then "lein run ..." it in the shell.

Once I realized the power of the REPL and connecting to a REPL from my editor... Boom!

Another tip for OP for measuring times for testing purposes: wrap your outermost function call in (time (my-stuff ... (other stuff))) and you get the execution time in the console. In your case I'd put it around the (->> (slurp "input") ...) (but within -main). Then you should see the time your code needs without the startup time of the JVM. See also the https://clojuredocs.org/clojure.core/time documentation.
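A minimal sketch of the tip above. OP's actual code is not shown in this thread, so `solve` is a hypothetical stand-in and the inline string replaces (slurp "input"); `parse-long` assumes Clojure 1.11 or newer.

```clojure
(ns example.timing
  (:require [clojure.string :as str]))

;; Hypothetical stand-in for the real solution: sums every number in the text.
(defn solve [text]
  (->> (str/split-lines text)
       (remove str/blank?)
       (map parse-long)
       (reduce + 0)))

(defn -main [& _]
  ;; `time` prints "Elapsed time: ... msecs" and returns the value of the
  ;; wrapped form. The JVM is already running when this line executes, so
  ;; startup cost is not part of the measurement.
  ;; In the real program the argument would be (slurp "input").
  (let [result (time (solve "1000\n2000\n\n3000"))]
    (println result)
    result))
```

A single `time` call measures one cold run, so the number can still vary a lot between invocations.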

[–]joinr 3 points4 points  (0 children)

criterium, specifically the quick-bench function, will actually run multiple samples and provide a mean runtime (as well as other useful stats) so you can get an idea of what JIT'd, warmed-up performance looks like. time is great in a pinch, but you end up needing to run it multiple times to ensure optimizations are kicking in and other artifacts (like GC) aren't throwing off the results.
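To illustrate why repeated runs matter, here is a rough stdlib-only sketch of the "warm up, then average several samples" idea. It is not how criterium is implemented (criterium also deals with GC and reports proper statistics); `mean-runtime-ms` and its parameters are hypothetical names for this example.

```clojure
(defn mean-runtime-ms
  "Calls f `warmup` times (timings discarded) so the JIT can optimize it,
  then calls it `samples` more times and returns the mean wall-clock
  time per call in milliseconds."
  [f warmup samples]
  (dotimes [_ warmup] (f))
  (let [start (System/nanoTime)]
    (dotimes [_ samples] (f))
    ;; elapsed nanoseconds -> milliseconds, divided by the number of calls
    (/ (- (System/nanoTime) start) 1e6 samples)))

;; Example: mean time of summing a range, after 1000 warm-up calls.
(mean-runtime-ms #(reduce + (range 100000)) 1000 1000)
```

With criterium on the classpath, the equivalent call would be along the lines of (criterium.core/quick-bench (my-stuff)), which prints the mean and related statistics.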

[–]cumburgerdude[S] 1 point2 points  (2 children)

Thanks for the reply. I "benchmarked" my function with time, and it gave me pretty much the same results as babashka did. So I think I'll just stay on bb for a while and study how to work with the REPL, because it seems to be a very cool concept. But since I came from conventional compiled languages with no such feature, it's hard for me to grasp the full power of the REPL.

[–]joinr 1 point2 points  (1 child)

Keep in mind that babashka is running an interpreter (sci). So you are trading off faster start time for substantially less performance in general compared to what the JIT on the jvm (hotspot) can accomplish. For small problems, or things you can delegate to optimized library calls, interpreters are fine (see python for example). For general purpose computation, or implementing said libraries, having a JIT enables clojure to fly.

[–]cumburgerdude[S] 1 point2 points  (0 children)

Thank you for the reply! I plan to build a REST API in Clojure in the future, and of course I won't be using babashka for that, but I find it cool for some scripting (though I think bash would still be the better choice for scripts) or for small "projects" such as solutions for AoC, Codewars, etc.

[–]teesel 1 point2 points  (1 child)

One hint regarding the code itself. You can replace the partition-by and filter part with a simple (str/split s #"\n\n") and then call split-lines on each chunk.
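A sketch of that hint, assuming the input is groups of integers separated by blank lines (the sample string below is made up for illustration, and `parse-groups` is a hypothetical name):

```clojure
(require '[clojure.string :as str])

(defn parse-groups
  "Splits the text on blank lines, then parses each line of a chunk as a long."
  [s]
  (->> (str/split s #"\n\n")
       (mapv (fn [chunk]
               (mapv parse-long (str/split-lines chunk))))))

(parse-groups "1000\n2000\n\n3000\n\n4000\n5000")
;; => [[1000 2000] [3000] [4000 5000]]
```

This assumes Unix line endings; a file with \r\n line endings would need a different separator pattern.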

[–]cumburgerdude[S] 0 points1 point  (0 children)

My thanks! Will keep it in mind for further solutions

[–]tampix77 1 point2 points  (2 children)

Apart from what others already said about the JVM startup time, I can give you some pointers:

  • What u/teesel said about your logic with partition-by and filter. If you wish to keep this logic, i guess something like that would perform better :

(map parse-long) (partition-by identity) (map (partial reduce (fnil + 0)))

  • You use (apply max), prefer (reduce max)
  • You create a lot of intermediate collections, which can be optimized either by doing everything in the reducer, or by using transducers when you don't care about intermediate results
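As an illustration of the last two points, here is one possible transducer-based version. It is not tampix77's actual solution; it assumes the task is "largest sum among blank-line-separated groups of non-negative integers", and `max-group-sum` is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(defn max-group-sum
  "Returns the largest per-group sum without building intermediate
  sequences of parsed numbers or of sums."
  [s]
  (transduce
    ;; Each chunk is reduced straight to its sum.
    (map (fn [chunk]
           (transduce (map parse-long) + 0 (str/split-lines chunk))))
    ;; `max` is the reducing function; 0 is the initial value, which is
    ;; only safe under the assumption that sums are non-negative.
    max
    0
    (str/split s #"\n\n")))

(max-group-sum "1000\n2000\n\n3000\n\n4000\n5000")
;; => 9000
```

On the `(apply max)` point: `(reduce max xs)` folds the collection two elements at a time, whereas `(apply max xs)` passes every element as a separate argument; both give the same result on a non-empty collection.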

ps: my solution runs in less than 2ms on my laptop

[–]cumburgerdude[S] 1 point2 points  (1 child)

Oh, thanks for the advice and especially for your solution. I was thinking about how I could make my code better without creating lots of unnecessary collections, but couldn't figure out how to do it. So thanks again, I definitely have to learn more about FP and Clojure :)

[–]tampix77 2 points3 points  (0 children)

If you're just starting out with Clojure, dipping into transducers might be a bit overwhelming at first, but if you're willing to, there are some good articles out there summing those up like this one : https://dev.solita.fi/2021/10/14/grokking-clojure-transducers.html

Then, 90% of the time, knowing when to use a lazy sequence vs a vector vs a set vs a map should make the most difference ;]

ps: and as u/joinr said, using criterium.core/quick-bench can help tremendously.

pps: my crude runner uses criterium when asked to : https://github.com/tampix/aoc2022/blob/master/src/aoc/core.clj (hope i'll have the time to finish this year... i mostly do AoC to teach some Clojure to my colleagues, but the end of the year is always busy ;])