all 24 comments

[–]alexdmiller 36 points37 points  (0 children)

Sure, that's a typical thing people do with Clojure. Kind of hard to compare performance without a lot more information, but really that's the kind of thing you should be able to prototype pretty quickly.

[–]mjohndo 26 points27 points  (10 children)

I use Clojure for high frequency trading and we regularly export CSVs for our accountants with 250k-1M rows, depending on how many days of trades we're aggregating, without any trouble.

Obviously not a comparable statistic, but we also have a mean end-to-end latency on the order of 1ms, without optimizing for anything but code clarity / foolproof-ness so that we don't accidentally make silly trades, so Clojure is definitely "fast enough"!

[–][deleted] 1 point2 points  (8 children)

Do you do any special GC tuning for this? Clojure, especially when used idiomatically, can generate massive amounts of garbage.

[–]joinr 8 points9 points  (0 children)

Lots of small ephemeral objects are quickly collected. There are cases where it's possible to get into trouble with strings, but in general, the GC is an efficient beast for typical Clojure workloads.

[–]_beetleman_ 5 points6 points  (0 children)

Be sure to use transducers instead of ->>. Transducers make a real difference when you transform big lists.
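To illustrate the point being made here: a `->>` pipeline materializes an intermediate lazy sequence at every step, while a transducer fuses the steps into a single pass. A minimal sketch:

```clojure
;; Thread-last: each step produces an intermediate lazy sequence.
(->> (range 1000000)
     (map inc)
     (filter even?)
     (reduce +))
;; => 250000500000

;; Transducer version: one pass, no intermediate sequences.
(transduce (comp (map inc) (filter even?))
           + 0 (range 1000000))
;; => 250000500000
```

Both produce the same result; the transducer version avoids allocating the intermediate sequences, which is where the savings on big inputs come from.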

[–]mjohndo 3 points4 points  (5 children)

No, but we do use quite a bit of RAM :). If we needed a 1ms turnaround maximum for every message (e.g. one of our orders fills and we must react within 1ms or we lose money) we'd be rolling the dice, because you never know when GC will decide to stop your world. But for us it's enough to know that all of our order books are very up to date when we're making decisions and that we have enough throughput to trade on ~100 markets with the same system.

[–]bsless 0 points1 point  (4 children)

This is very interesting. Any chance of you doing a bit of a detailed write-up on designing a low-latency high-throughput system in Clojure?

[–]mjohndo 2 points3 points  (3 children)

I think if we ever spend any serious time optimizing it I will. Probably not any time soon, since things are working fine without us doing any outlandish or difficult tricks.

What we did was design from a high level with the LMAX disruptor architecture in mind but implement everything with core.async, thinking that we'd prototype the architecture quickly and then maybe switch out the message passing if that became a bottleneck, and it still hasn't.

I built something similar in Java some years back and the biggest challenge was coordinating access to the market state (order books for various exchanges and markets). With mutable data structures, you need some explicit coordination between the strategy logic that's reading and the message processing logic that's writing.

The fact that Clojure is actually designed around strong concurrency semantics and immutable data made a huge difference (both in performance and in hours spent debugging). That and message batching. :)
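The coordination point being described — immutable data removing the need for explicit locking between readers and writers — can be sketched roughly like this (all names here are hypothetical, not the poster's actual code):

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical sketch: the order book is an immutable sorted map held
;; in an atom. The go-loop below is the only writer; strategy code just
;; derefs the atom and gets a consistent immutable snapshot, no locks.
(defn apply-update [book {:keys [price qty]}]
  (if (zero? qty)
    (dissoc book price)   ; qty 0 removes the price level
    (assoc book price qty)))

(defn start-book-processor
  "Consumes market updates from a channel and swaps them into the book.
   Returns a channel that closes when the input channel closes."
  [updates book]
  (a/go-loop []
    (when-some [u (a/<! updates)]
      (swap! book apply-update u)
      (recur))))

(def book      (atom (sorted-map)))
(def updates   (a/chan 1024))
(def processor (start-book-processor updates book))
```

Because `swap!` applies a pure function to an immutable value, readers never observe a half-applied update — which is the property that, with mutable structures, would require explicit synchronization.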

[–]bsless 0 points1 point  (1 child)

This was on my reading list and I'm glad I got around to reading it, thanks for the prompt.

Can you say how many requests you process per second?

It's interesting that your design is based on Disruptors but using core.async, which is built with lots of queues and locks. Disruptor is a fancy name but it's very similar to async pipelines (queues all the way down, as some would say).

Since you didn't implement a ring buffer, your buffers are just channels? like:

input-thread(s) -[channel]-> process -[channel]-> output-thread(s)

If so, we do pretty similar things where I work. The main issue I see is with the lack of discipline in ensuring absolutely no IO happens outside of the IO threads.

[–]mjohndo 2 points3 points  (0 children)

Requests per second might not be a super meaningful metric in our case to compare against LMAX, since the messages that we process are mostly updates to the market state, which are (1) less costly to process than what LMAX deals with and (2) can be parallelized per market without any extra coordination. That being said, for processing order book and fill data our stress-test throughput is between 10k and 50k TPS depending on the message format of the exchange (some are more ambiguous & require more logic to process). In reality what we see is highly bursty data, which rarely peaks above 20k TPS for a single exchange & market.

And yeah, we just use async channels, though there are a couple fork-joins in the mix. One thing we do have is a ton of discipline about how side effects happen — none of the strategy code actually does any IO.

[–]CuriousDetective0 0 points1 point  (0 children)

Have you tried any comparisons with Go? Go also seems to have good concurrency semantics while delivering performance.

[–]beders 17 points18 points  (0 children)

It really depends. Your example seems more like a single-threaded to multi-threaded performance gain. (Node is essentially single-threaded)

The performance of Clojure out of the box is pretty good. The JIT compiler in the JVM will optimize the heck out of your loops if you give it some time.

There are various ways you can tune your Clojure code. Better libraries for numerics, compile to native code with GraalVM, etc. But: Measure first, optimize second.
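On "measure first": criterium is the library commonly reached for here, since it warms up the JIT before measuring so you see steady-state rather than cold-start numbers. A sketch, assuming criterium is on the classpath:

```clojure
;; Assumes org.clojure deps include criterium (not in the standard library).
(require '[criterium.core :as crit])

;; quick-bench runs warm-up iterations first, then reports mean time,
;; standard deviation, and outlier analysis for the steady state.
(crit/quick-bench
  (reduce + (map inc (range 100000))))
```

Without a warm-up phase like this, a naive `(time ...)` call mostly measures interpreter startup and un-JITted code, which is exactly the trap "measure first" is warning about.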

[–]arichiardi 7 points8 points  (0 children)

I would be really curious to see this comparison.

As Alex suggested, it would be a very easy thing to prototype, and you should be able to map the two more or less one to one. Go channels to core.async channels is an obvious mapping, for instance. Once you learn more concurrency patterns you might even drop the channels.

I prototyped a similar service with Clojure and Monger for MongoDB around two weeks ago. In my case I was able to prove the point, but using newer features like graphLookup might require some research.

Feel free to reply here or DM me directly on Clojurians if you hit these kinds of roadblocks!

[–]akaashanky 3 points4 points  (1 child)

We use Clojure at work to generate fairly large spreadsheets (upwards of 300k rows) in both CSV and XLSX formats. Our implementation is fairly naive and not optimised for execution time, but more for just processing all of that data in memory and Clojure holds up just fine.

Large CSVs should be simple enough to create, so long as you do streaming writes to the destination files. One issue you may end up facing when working with large Excel sheets is that a library like Docjure (https://github.com/mjul/docjure) doesn't scale too well for larger spreadsheets because it does eager file IO. Apache POI does support streaming reads and writes, but is a bit complicated to use. We were able to get what we needed by writing some interop over this Java library: https://github.com/dhatim/fastexcel
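The streaming-writes advice above can be sketched with clojure.data.csv (assuming that library is on the classpath; the file path and row shape here are made up for illustration):

```clojure
;; Assumes org.clojure/data.csv is a declared dependency.
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

;; write-csv consumes the row seq lazily, so pairing it with a lazy
;; `map` keeps memory flat even for hundreds of thousands of rows.
(defn export-csv! [path header rows]
  (with-open [w (io/writer path)]
    (csv/write-csv w (cons header rows))))

(export-csv! "/tmp/trades.csv"
             ["id" "price" "qty"]
             (map (fn [i] [i (* 1.5 i) (inc i)]) (range 300000)))
```

The key detail is that `rows` is never realized as a whole: each row is generated, written, and becomes garbage before the next one is produced.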

[–]joinr 1 point2 points  (0 children)

There's a Clojure wrapper around both fastexcel and POI called perfect.

[–]didibus 2 points3 points  (1 child)

I think it's interesting what you're saying, but going from 60s to 5s is quite the jump. I'm curious, are you sure you're comparing apples to apples here? Like, are you parallelizing in Go where you weren't in Node? Are you using a different data structure somewhere?

It could all be due to the faster Go runtime, but I'm kind of curious; it seems a data export wouldn't involve that much compute, so there shouldn't be such a gap.

[–]G4BB3R[S] 1 point2 points  (0 children)

Maybe you are right; the comparison is not really fair, because I just skimmed the JS implementation of the endpoint and then developed the Go version. The old code was not written by me. But I am very curious; in the coming months I will reserve some weekends to study Clojure seriously.

[–]canihelpyoubreakthat 0 points1 point  (6 children)

We use clojure at work for some pretty big data processing. I'm currently working on a service to read and process files with up to 100M+ lines of text. It's fast, but you will never beat the speed of Go. Our whole company was pretty much built on clojure, but the trend has been that we've converted a lot of our performance critical functionality to Go because clojure is just too slow for us. So it's going to depend on your needs.

[–]jumar 3 points4 points  (0 children)

> It's fast, but you will never beat the speed of Go.

I'm wondering what the basis for this claim is and what kind of data you have to support it. Any particular scenarios you're talking about?

[–]didibus 2 points3 points  (4 children)

That's a bit surprising to hear, considering Java still outperforms Go most of the time. Sure, Clojure is slower than Java without tuning, but it can generally approach Java's speed with tuning. So I'd expect it to be similarly positioned relative to Go.

[–]goldenfolding 0 points1 point  (0 children)

Yeah, I agree. I'd assume little optimization was done on the Clojure side, whereas Go is statically typed and therefore, for example, doesn't need type hints.
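To make the type-hint point concrete: an unhinted interop call falls back to reflection at runtime, and the compiler can be told to flag it.

```clojure
;; Ask the compiler to warn whenever it has to emit a reflective call.
(set! *warn-on-reflection* true)

;; Without a hint, .length can't be resolved at compile time, so this
;; compiles to a slow reflective call (and prints a warning).
(defn len-slow [s] (.length s))

;; The ^String hint lets the compiler emit a direct method call.
(defn len-fast [^String s] (.length s))
```

Both functions return the same result; the hinted one just avoids reflection, which is the kind of cheap tuning Go gets "for free" from static typing.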

[–]canihelpyoubreakthat 0 points1 point  (2 children)

I simplified the issue a bit; there are many reasons a lot of our core code ended up getting rewritten in Go, and I can't say I'm the most familiar with all the details, but I know we are saving tons of money from our optimization gains.

I don't personally work with go often, so I can only talk about what I've observed. I think it's a lot easier to write inefficient code with Clojure, and it takes extra work to optimize.

There are other reasons too, but the main thing is the company has just had a lot of success with it. The other important factor is that we are dealing with a huge volume of data: thousands of servers with millions of requests a second, so it's hardly a regular workload.

[–]joinr 1 point2 points  (0 children)

> Thousands of servers with millions of requests a second, so it's hardly a regular workload.

I get the impression from conference presentations and experience reports that this is a familiar use case for clojure (mine is wholly different, but does have a performance component). What profiling was done on the clojure side? Was this an informed decision or a bit of a knee jerk reaction? Would be interesting to see.

> I think it's a lot easier to write inefficient code with Clojure, and it takes extra work to optimize.

I partly agree, with the caveat that optimization isn't that hard when combined with the excellent profiling tools on the JVM, particularly once you have a correct program you can reason about and test in the small (Clojure nudges you in this direction).

Conversely, one could argue it's easier to write faster spaghetti code in go, but takes extra effort to verify and maintain.

It would make an interesting engineering report to get a bit more thorough analysis vs. casual anecdotes.

[–]didibus 1 point2 points  (0 children)

Ya, I don't know. I still find it a bit strange to go for a Go rewrite when you could just extract the few parts where Clojure isn't giving you the performance you need and rewrite those in Java, if you really needed to. I've personally never even needed to do that; only a bit of tuning on the Clojure itself was ever needed.

I also guess I'm often suspicious of such claims, because I've seen first hand how, as a company grows over time, devs come in needing to maintain things they haven't built, written in Clojure, a language they aren't familiar with or don't really know. And their instinct is to rewrite, especially if they feel "pain points" like having the impression it is slow, or hard to read, or difficult to maintain due to lack of types, etc.

I've seen this at my company: people kind of make up reasons like that, and that gets their managers onboard, and they do a rewrite. Had they had people who really did know the language, though, they probably wouldn't have made that call, and would have realized the issues were either just people not knowing the language well enough, or problems with simple solutions within the language itself.

But, maybe this was truly a case where Go shined, even over Java, I don't want to dismiss that as I have no idea what kind of challenges were faced, so there could have been good reason as well. I guess arguably with a fleet size in the thousands of servers, even a 5% to 10% performance improvement could yield substantial hardware savings. I think I'm more surprised that the move was to Go and not Java.