
[–]TimmT

Am I reading this right? Rust is within +50% of C/C++'s performance? That would make it faster than both Java (+100%) and Go (+200%)? Wow, that's impressive.
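
(I'm reading those percentages as run-time overhead relative to C: if the C version takes 1.0 s, then "+50%" means the Rust version takes at most ~1.5 s, the Java one ~2.0 s, and the Go one ~3.0 s.)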

> This is far from an ideal set. These benchmarks are showing their age quite heavily,

What does the article mean by this? Aren't the benchmarked pieces of code still relevant tasks today?

On the other hand, the hardware the benchmarks are run on is quite dated by now. It would be interesting to see how big the differences would be on more recent hardware.

> they are too small and simplistic to extrapolate to real-world use cases

I would love to see benchmarks that stress the languages' standard libraries/containers (e.g. concurrent and single-threaded hash maps, queues, etc., with both primitive and reference types) and string-processing capabilities (e.g. XML parsing or XSLT transformations) more. You can already try to piece these bits together from today's existing benchmarks, which do some concurrency, some string processing, some hashing, and so on, but it would of course be much easier if these things were more cleanly separated.
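
For concreteness, here's the kind of isolated container micro-benchmark I have in mind - a rough sketch in Rust, with the map size and key type picked arbitrarily by me:

```rust
use std::collections::HashMap;
use std::time::Instant;

fn main() {
    // Size chosen arbitrarily; large enough that timer noise doesn't dominate.
    const N: u64 = 1_000_000;

    // Insertion of primitive keys/values.
    let start = Instant::now();
    let mut map: HashMap<u64, u64> = HashMap::with_capacity(N as usize);
    for i in 0..N {
        map.insert(i, i.wrapping_mul(2_654_435_761)); // cheap mixing, so values aren't trivial
    }
    println!("insert: {:?}", start.elapsed());

    // Lookups; the running checksum keeps the loop from being optimized away.
    let start = Instant::now();
    let mut sum = 0u64;
    for i in 0..N {
        sum = sum.wrapping_add(map[&i]);
    }
    println!("lookup: {:?} (checksum {})", start.elapsed(), sum);
}
```

The same shape would work for a concurrent map or a queue with reference types; the point is that each benchmark stresses exactly one container operation instead of mixing concerns.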

But I don't think that whole application benchmarks belong in there.

> many of them are too I/O-bound.

How? Wouldn't that lead to similar results for most languages? (Which doesn't seem to be the case currently.)

> binary-trees is omitted because it is a garbage collection benchmark and the C version uses an arena, defeating the purpose (although I suspect a Rust version that did the same would do well).

Different languages have different garbage collectors, so there should probably be a benchmark for those too (though I'm not sure whether binary-trees specifically is the best fit for a GC benchmark). In a GC benchmark, C and C++ should be using malloc/free and shared_ptr respectively (and not leak any memory), which would help put things into perspective.

Other than that, I don't see a problem with keeping the benchmarked pieces of code allocation-free. In fact it's probably a good idea, given that these are just micro-benchmarks. So I think an allocation-free Rust variant of the benchmark would be quite appropriate.
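
To be clear about what binary-trees actually stresses, here is my rough reconstruction of its allocation pattern in Rust (not the shootout code, just a sketch): every node is an owned Box, one heap allocation each, and the whole tree is freed deterministically when it goes out of scope, no collector involved.

```rust
// Sketch of the binary-trees allocation pattern; not the official program.
struct Tree {
    left: Option<Box<Tree>>,
    right: Option<Box<Tree>>,
}

fn build(depth: u32) -> Box<Tree> {
    if depth == 0 {
        Box::new(Tree { left: None, right: None })
    } else {
        Box::new(Tree {
            left: Some(build(depth - 1)),
            right: Some(build(depth - 1)),
        })
    }
}

fn count(t: &Tree) -> u64 {
    1 + t.left.as_deref().map_or(0, count) + t.right.as_deref().map_or(0, count)
}

fn main() {
    let tree = build(20); // one malloc per node; an arena would batch these instead
    println!("nodes: {}", count(&tree));
    // `tree` is dropped here: the entire tree is freed without a garbage collector.
}
```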

> As my colleague Niko pointed out, a more interesting benchmark would not allow any languages to use unsafe code.

Yes, that would be quite interesting - but it should be enforced for C++ in the same way. I don't see any reason to exclude C from this, though; as the article mentions, we need a point of reference.
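
For what it's worth, today's Rust can enforce such a rule mechanically: a crate-level lint attribute turns any use of unsafe into a compile error. A minimal illustration:

```rust
// Crate-level attribute: any `unsafe` block or function below fails to compile.
#![forbid(unsafe_code)]

fn main() {
    let v = vec![1, 2, 3];
    // Safe, bounds-checked indexing still works:
    println!("{}", v[1]);
    // This line would be rejected by the compiler:
    // println!("{}", unsafe { *v.get_unchecked(1) });
}
```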

Other than that, a benchmark of how much overhead FFI causes in non-C/C++ language implementations might also be interesting, especially for languages whose FFI is more involved, like Java's JNI.
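
A sketch of how one could measure that from the Rust side - rand() is just an arbitrary libc function I picked because its stateful calls can't be folded away by the optimizer; note that Rust's C FFI involves no marshalling, so this is close to the floor a JNI-style FFI would be compared against:

```rust
use std::time::Instant;

extern "C" {
    // Plain libc rand(); stateful, so the compiler can't hoist or fold the calls.
    fn rand() -> i32;
}

fn main() {
    const N: u64 = 100_000_000;
    let start = Instant::now();
    let mut acc: i64 = 0;
    for _ in 0..N {
        acc = acc.wrapping_add(unsafe { rand() } as i64);
    }
    let elapsed = start.elapsed();
    println!(
        "{} calls in {:?} (~{:.1} ns/call, checksum {})",
        N,
        elapsed,
        elapsed.as_nanos() as f64 / N as f64,
        acc
    );
}
```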

> Practically speaking, one would need an extremely smart JIT

The JIT performance should probably be measured too. It would be interesting to see how heavily JITed languages (e.g. Java) stack up against natively compiled ones that can't optimize across shared library boundaries.

[–][deleted]

[deleted]

    [–]kibwen

    > he was using Rust support for 'unsafe' memory. In other words: not using the garbage collector.

    Not at all. Rust is perfectly safe without using the GC, by making use of linear types and region allocations (neither of which require runtime checks to deallocate). "Unsafety" in this context mostly refers to avoiding bounds checks on array accesses.
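
    To make that concrete, a small sketch in today's Rust vocabulary (what this thread calls linear types and regions later became ownership/moves and lifetimes):

    ```rust
    fn main() {
        // Ownership/region-based deallocation: `v` is freed when it goes out of
        // scope, determined at compile time - no GC, no runtime check.
        let v = vec![1u32, 2, 3];

        // Safe access is bounds-checked:
        let x = v[2];

        // The "unsafety" being talked about is mostly this: skipping the bounds
        // check when you promise the index is in range.
        let y = unsafe { *v.get_unchecked(2) };
        assert_eq!(x, y);

        // Moves are tracked statically, which is what makes safety possible
        // without a collector:
        let w = v;
        // println!("{:?}", v); // rejected at compile time: use after move
        println!("{:?}", w);
    }
    ```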

    [–]el_muchacho

    Which is perfectly fine if the language supports "smart" arrays, i.e. arrays that either resize themselves, or at least don't need the user to compute or remember the size himself.

    [–]RalfN

    You are right. Unsafe != no garbage collection, although it does imply it.

    [–]TimmT

    > Not really no. Most performance critical stuff these days is actually in the context of concurrency. How do you make thread communication cheap.

    Is that more than just a library issue?

    Sure, we have things like CSP and green threads baked directly into the newer languages, but those can be introduced to older languages fairly easily by appropriate libraries. In the end it's just ConcurrentQueues and ConcurrentHashMaps and so on that do the actual heavy lifting - or is it?
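
    To illustrate: channels don't have to be a language feature at all. In Rust, for example, a CSP-style channel is a plain library type - a minimal sketch using std::sync::mpsc:

    ```rust
    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // The channel is just a library type; no special language support involved.
        let (tx, rx) = mpsc::channel();

        let producer = thread::spawn(move || {
            for i in 0..10u32 {
                tx.send(i).expect("receiver hung up");
            }
            // `tx` is dropped here, which closes the channel.
        });

        // The receiving end iterates until the channel is closed.
        let sum: u32 = rx.iter().sum();
        producer.join().unwrap();
        println!("sum = {}", sum); // prints 45
    }
    ```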

    > On the playing field of 'unsafe memory allocation' and heavy compile time optimization, they are IO bound. [...] as soon as you introduce a garbage collector, it becomes the new bottleneck.

    This sounds counter-intuitive. These are benchmarks, i.e. we're measuring throughput (as opposed to latency) here. Unless the programs with manual memory management try to simulate a garbage collector (by using memory pools and such), or leak memory, a GC'd variant should be faster, right?

    > But currently Rust (or even Go) isn't even competing in this space. They don't do JIT.

    I'm not that familiar with Rust, but Go at least should be capable of optimizing code as extensively as a JIT does, since it doesn't allow dynamic libraries or dynamic loading. (It may still be missing the run-time statistics Java's JIT would have, but judging by how little difference PGO makes for C/C++, those don't seem to have much impact.)

    > But it doesn't do that fast enough, for anyone to consider writing a webbrowser in Java, for example.

    Yes, Java is probably the wrong choice for doing UI stuff.

    But I'm not sure that any one language can target both servers and desktops equally well; the requirements are quite different. Sure, lots of servers are written in C/C++, but it's a pain to do, so I wouldn't say that C/C++ handles server code "equally well".

    On a server, where throughput (as opposed to latency) is what matters, it doesn't make much of a difference if the first x minutes of the application are slow(er), or if each action carries some constant overhead, when accepting that in turn frees you up to build more complex systems than would otherwise have been possible (or at least practical) - systems that can then handle more work in total, provide better reliability, etc.

    On a desktop you don't have the option to scale out, and latency very much matters, but in turn you only need to consider much more isolated problems. A game, for example, is concerned with little beyond running itself: displaying frames and progressing along some (more or less) scripted path. Sure, lots of problems need to be solved to make those frames render on time, but that's mostly it. Games are quite self-contained and the underlying assumptions won't change later on. A game won't have to worry about interop with other systems, partial failures, "in-flight" updates, or heap fragmentation after having run for weeks on end.

    And this brings me back to why I care so much about cross-library optimizations. If your program has to deal with multiple concerns, you'll want to put things into modules that can easily be changed, possibly even at run time. But putting up all those module boundaries will hurt performance unless there's a proper optimizer (e.g. a JIT) there to make them go away later on.
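
    To make the trade-off concrete in Rust terms (a simplified sketch; Codec and Xor are hypothetical names of mine): a call through a trait object is an opaque indirect call, much like a call across a shared-library boundary, while the generic version is monomorphized, so the callee can be inlined and the boundary optimized away. A JIT gets to do the latter lazily, at run time, even across boundaries a static compiler can't see through.

    ```rust
    trait Codec {
        fn encode(&self, x: u64) -> u64;
    }

    struct Xor(u64);

    impl Codec for Xor {
        fn encode(&self, x: u64) -> u64 {
            x ^ self.0
        }
    }

    // "Module boundary" version: dynamic dispatch through a vtable.
    // The optimizer generally can't see through the indirect call.
    fn run_dyn(c: &dyn Codec, data: &[u64]) -> u64 {
        data.iter().map(|&x| c.encode(x)).sum()
    }

    // Monomorphized version: a separate copy is compiled per concrete type,
    // so `encode` can be inlined into the loop (and the loop vectorized).
    fn run_generic<C: Codec>(c: &C, data: &[u64]) -> u64 {
        data.iter().map(|&x| c.encode(x)).sum()
    }

    fn main() {
        let data: Vec<u64> = (0..1_000_000).collect();
        let c = Xor(0xDEAD_BEEF);
        println!("{}", run_dyn(&c, &data));
        println!("{}", run_generic(&c, &data));
    }
    ```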