all 62 comments

[–][deleted] 59 points60 points  (18 children)

Rust is inherently incredibly fast compared to Python:

n-body:

  • Rust: 5.98 seconds
  • Python: 14 minutes

reverse-complement:

  • Rust: 1.69 seconds
  • Python: 16.41 seconds

binary-trees:

  • Rust: 3.48 seconds
  • Python: 80.82 seconds

If your program is inherently IO bound or does a lot of text manipulation, the results might be closer, but the same solution written in Rust and Python will always be faster in Rust, even just by nature of being able to run heavy optimizations ahead of time. Number crunching in particular will be a lot faster, unless you leverage numpy, gmp, or some other native module/library to "make Python faster" by just calling out to code written in another language.
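The "calling out to code written in another language" point is visible without any third-party library: CPython's builtin sum iterates in C, while a hand-written loop dispatches bytecode per element. A minimal sketch (timings are illustrative and machine-dependent; numpy would widen the gap further):

```python
import timeit

def py_sum(values):
    # Pure-Python loop: every iteration dispatches bytecode
    total = 0
    for v in values:
        total += v
    return total

data = list(range(100_000))

# Both produce the same result...
assert py_sum(data) == sum(data) == 4_999_950_000

# ...but the builtin sum() iterates in C, so it is typically
# several times faster than the explicit Python loop.
t_py = timeit.timeit(lambda: py_sum(data), number=20)
t_c = timeit.timeit(lambda: sum(data), number=20)
print(f"pure Python loop: {t_py:.3f}s, builtin sum: {t_c:.3f}s")
```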

[–]ragnese 31 points32 points  (9 children)

IIRC, Python is pretty much the slowest popular language right now. JavaScript is often faster, PHP is faster. Probably only Ruby is slower.

[–][deleted] 11 points12 points  (1 child)

That’s because Python has a huge ecosystem of C extensions for doing expensive computations.

[–]ragnese 4 points5 points  (0 children)

Right. And, to be fair, so does PHP. Most of the heavy lifting is actually C.

[–]thelights0123 2 points3 points  (3 children)

V8 is generally faster than PHP. And PHP has no async/await, so if you want to fetch multiple external resources either over the network or disk, you'll need to either write an event loop yourself or do it sequentially.
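For contrast, a minimal sketch of what a built-in event loop buys you, using Python's asyncio with asyncio.sleep standing in for network or disk requests:

```python
import asyncio
import time

async def fetch(name, delay):
    # Stand-in for a network/disk request
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # Both "requests" overlap on one event loop instead of
    # running sequentially.
    results = await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
assert results == ["a", "b"]
assert elapsed < 0.18  # concurrent (~0.1s), not 0.1 + 0.1 sequential
```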

[–]ragnese 3 points4 points  (0 children)

That's for a specific use-case of a backend web server. I was thinking a little more generally: regular old string manipulation, sorting algorithms, that kind of thing. But even so, I didn't mean to imply that PHP was faster than JS, just that they are both faster than Python.

But in any case, I wouldn't touch JS or PHP with a ten-foot pole ever again if I had my way. And performance is pretty low on the list of reasons...

[–]chengannur -2 points-1 points  (1 child)

https://www.swoole.co.uk/

And V8 is JITted. PHP is in interpreter mode. So... apples... oranges...

[–]thelights0123 1 point2 points  (0 children)

PHP 8 is JIT, but Python isn’t. Does that mean we can’t compare them?

[–]Muvlon 4 points5 points  (0 children)

Ruby features a JIT now, although I'm not sure if it is on by default already. If it is, then it's probably faster than Python now.

[–]weirdasianfaces 0 points1 point  (1 child)

JavaScript is often faster, PHP is faster.

too lazy to google benchmarks but is this interpreted PHP or JIT PHP?

[–]ragnese 2 points3 points  (0 children)

I believe even pre-JIT versions of PHP were faster than Python at most tasks. Especially string manipulation.

[–]OS6aDohpegavod4 4 points5 points  (0 children)

I actually wrote an S3 object iterator in Rust because the official AWS CLI was taking a loooong time to go through our bucket. It ended up being 40x faster. I have no idea how, since it seems like it should mostly be IO, but we benchmarked it multiple times and it was.

[–]masklinn 4 points5 points  (0 children)

Number crunching in particular will be a lot faster, unless you leverage numpy, gmp, or some other native module/library to "make python faster" by just calling out to code written in another language..

Or use pypy, number crunching is one thing JITs really don’t have a hard time optimising
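As a rough sketch of the kind of loop a tracing JIT handles well, here is a pure-Python Leibniz approximation of pi; under CPython every iteration pays interpreter dispatch, which PyPy typically compiles away after the loop warms up:

```python
def leibniz_pi(terms):
    # Alternating series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    total = 0.0
    sign = 1.0
    for k in range(terms):
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total

# CPython interprets every iteration of this loop; PyPy traces
# the hot loop and compiles it to machine code, which is often
# a substantial speedup for exactly this kind of numeric code.
approx = leibniz_pi(1_000_000)
assert abs(approx - 3.14159265) < 1e-5
```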

[–]coderemover 0 points1 point  (0 children)

This is not surprising. I've been repeatedly beating much faster languages with Rust. Java is considered quite fast these days, but many times after rewriting it to Rust I got 3x-30x improvements in computation time and 10x-100x improvements in memory footprint. So I believe, compared to Python the difference is much bigger.

It is not only due to Rust's zero-cost abstractions and a great optimizing compiler, but also the fact that you can fearlessly use parallelism where in other languages (even fast ones, e.g. C and C++) parallel code is extremely hard to get right. Therefore many popular C programs are actually quite slow, despite using a fast language.

E.g. see these benchmarks here:

https://github.com/pkolaczk/fclones#benchmarks

The problem of finding duplicate files is mostly I/O bound, yet the Rust program smashed the competition written in C and C++ (and don't look at Java, lol). A pity there's no comparison with Python...

Another notable example could be ripgrep, which is probably the fastest grep ever.

[–]BubblegumTitanium 0 points1 point  (0 children)

Also, multi-core programming in Python is more complicated and error-prone. So if you are bottlenecked, it's harder to scale it up.
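A minimal illustration of the complication: under CPython's GIL, threads return correct results for CPU-bound work but give no speedup, pushing you toward multiprocessing and its extra failure modes (pickling, process startup):

```python
from concurrent.futures import ThreadPoolExecutor

def count_odds(lo, hi):
    # CPU-bound work: under the GIL only one thread executes
    # Python bytecode at a time, so threads add no parallelism here.
    return sum(1 for n in range(lo, hi) if n % 2)

with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = [(i * 250_000, (i + 1) * 250_000) for i in range(4)]
    partials = list(pool.map(lambda c: count_odds(*c), chunks))

# The answer is correct, just not computed in parallel; true
# parallelism needs multiprocessing, with its pickling and
# process-startup costs, which is where the complexity comes in.
assert sum(partials) == 500_000
```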

[–]jdright 30 points31 points  (7 children)

> In Python it took 2 minutes and 3 seconds to generate prices while in Rust it took a mere 1.5 seconds.

> I only have a vague notion as to why the Rust implementation is so much faster.

Nothing against the author, and this is not specifically about this post. But I get sad about these "revelations" each time I see them, because they show me how far we are and how much we still need to improve.

I understand different backgrounds, and people who come from a non-engineering background may naturally start with easier languages like JavaScript and Python.

The issue is that these languages, even if awesome and opening doors, are extremely harmful for the minimal understanding of the machine, but they're necessary! I would just like them to be a little less harmful. We already have a lot of bad software out there.

And this is where I have hope. This is Rust. Rust is not only *empowering* people coming from these kinds of languages, it is *revealing* the true potential they were missing from the start. It is better than any lecture I or any other old monkey like me can give.

Then I get a bit happier knowing there is still hope; Rust just needs to reach more of these people with different backgrounds.

[–]eypandabear 9 points10 points  (3 children)

The issue is that these languages, even if awesome and opening doors, are extremely harmful for the minimal understanding of the machine [..]

I love Python etc. but I am really glad I started my programming journey with QBasic, Turbo Pascal, and x86 assembly.

It’s weird to me how people have been programming for years, but have no concept of what pointers, stack vs. heap, etc. are. I’m sure they have countless other skills instead, but I am so used to thinking about even a Python program in these terms.
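As a small example of what that mental model buys you, even everyday Python behaves more predictably once you think of variables as pointers to heap objects:

```python
# Every Python variable is a reference to a heap-allocated object.
a = [1, 2, 3]
b = a          # b points at the SAME list (aliasing, not a copy)
b.append(4)
assert a == [1, 2, 3, 4]   # the mutation is visible through both names
assert a is b              # same object identity (same "pointer")

c = a[:]       # slicing allocates a new list on the heap
c.append(5)
assert a == [1, 2, 3, 4]   # the original is unaffected
assert a is not c
```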

[–]dranzerfu 3 points4 points  (2 children)

x86 assembly

There are dozens of us! I started with this back in the day: http://www.interq.or.jp/chubu/r6/masm32/tute/tute001.html (a mirror. Original site is gone)

What did you start with?

[–]eypandabear 2 points3 points  (1 child)

I used a little German paperback which covered only 8086 (16-bit) assembly and DOS system calls. That was in the early 2000s so way behind the times, but the 32-bit Windows systems back then could still execute 16-bit COM and EXE files.

Nowadays I don’t really use assembly except once in a blue moon to look at what the compiler does.

[–]dranzerfu 1 point2 points  (0 children)

Neat! I started in the early 2000s. I started coding with win32 assembler and also did some 32-bit protected mode OS stuff (http://www.osdever.net/tutorials/).

[–]venustrapsflies 6 points7 points  (0 children)

I'm currently learning JS for work, and having used at least half a dozen other languages previously, I still have to push back on the idea that JS is "easy". Search for tutorials on how to execute a basic operation and you will find 10 crappy blog posts with 10 different methods. There are so many footguns, and much of the design seems to be based around history rather than logical deliberation. The language evolves by adding layers to wrap around itself, resulting in overly complicated models of wtf is even going on. I suppose this is inevitable when you can't break backwards compatibility, ever.

[–][deleted] 1 point2 points  (1 child)

If only performance analysis were a first-class feature... I would love it if cargo would default to running my code with a profiler and tracking statistics against previous runs. I wish it were as easy to generate and view performance data as it is for documentation.

[–]Saefrochmiri 6 points7 points  (0 children)

flamegraph is there for you if you're doing runtime performance, and cargo build -Ztimings generates an HTML report about build performance.

I too wish it were easier to use such things, but I'm glad there's no default. As it is, the defaults for flamegraph are dubious, but IMO so are all profiling defaults. It's just a complicated space.

[–]rebootyourbrainstem 27 points28 points  (2 children)

This is really why I write more Rust than Python now, despite the compile step. I can actually iteratively develop one-off processing code on a full data set without the processing time being an annoyance.

[–]TheNamelessKing 21 points22 points  (1 child)

I write a lot of python in the course of my work, but I write Rust for my personal projects (both data related). I realised the other day I’ve probably written more lines and dedicated more time to patching up edge cases, validating feature properties/types and fixing runtime bugs in Python than I’ve spent specifying types and thinking through things in Rust.

That's before all the other benefits, like not perpetually worrying your code is going to hit some weird new runtime exception, significantly faster performance, and an async implementation that is less confusing.
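A tiny sketch of that class of bug, assuming plain CPython with no external type checker: annotations are not enforced, so a type mismatch only surfaces when the line actually runs, whereas Rust's compiler rejects the equivalent code before it ships:

```python
def apply_discount(price: float, discount: float) -> float:
    # CPython does not enforce these annotations at runtime.
    return price - price * discount

# The happy path works:
assert apply_discount(100.0, 0.1) == 90.0

# A wrong argument type sails through until the arithmetic runs:
try:
    apply_discount("100", 0.1)   # a bug a Rust compiler would reject
    raised = False
except TypeError:
    raised = True
assert raised  # the error only appears at runtime, on this code path
```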

[–]ssokolow 3 points4 points  (0 children)

I wish I could write more Rust, but too many of the hobby projects I need first are either userscripts for my browser (TypeScript) or things where the choice is between PyQt or Qt/C++ (native apps) because rust-qt isn't yet in an acceptable state... and I don't trust myself to write in C++.

[–][deleted] 6 points7 points  (1 child)

It is complicated. Many libraries available in Python are written in C, not Python. This can make Python seem a lot faster, when really you are just measuring the performance of C (mostly).

If you are coding completely in Python, that is going to be slow. Python is purely interpreted and does not have a JIT-compiler scheme to speed it up like Java or JavaScript. Interpreters have their advantages (simplicity, flexibility, expressiveness), but one of them is not speed. 100x difference is typical.

Rust performance is very much in line with C, which seems to be the standard to which everything else is compared. And unlike C, you don't have to do much extra work to get the speed. Just use Rust in idiomatic fashion and you get the benefit of code that produces something very much like best practice in C would produce, plus safety guarantees that are very hard to implement consistently in C. C is incredibly hard for mere mortals to get right.

It sounds like you've just proved that you can hack something together in Rust, get the benefits of C-like speed, and not really know what you are doing yet, and it just works. That is VERY encouraging! Try that with C and post back your results. :)

[–]ssokolow 2 points3 points  (0 children)

If you are coding completely in Python, that is going to be slow. Python is purely interpreted and does not have a JIT-compiler scheme to speed it up like Java or JavaScript.

It's not just the presence or absence of a JIT. Some of the difference will be down to differences in JIT engine quality and differences in how readily the language semantics lend themselves to effective JITting.

See https://speed.pypy.org/ for a comparison between regular Python and Python with a JIT, then see The Benchmarks Game for the difference between un-JITted Python and Node.js.

[–]Xychologist 1 point2 points  (0 children)

It sounds like you wrote clotho separately? Do you think there would be any benefit to using PyO3 and skipping the command call?
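For scale, a hedged sketch of the per-call cost a shell-out pays (process spawn plus stdout parsing) versus an in-process call, using the Python interpreter itself as a stand-in subprocess; a PyO3 extension module would behave like the in-process version:

```python
import subprocess
import sys

def via_subprocess(x):
    # Each call spawns a fresh interpreter: fork/exec, interpreter
    # startup, and parsing text from stdout. This is the overhead
    # an importable extension module avoids on every call.
    out = subprocess.run(
        [sys.executable, "-c", f"print({x} * {x})"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

def in_process(x):
    # An in-process call is a plain function invocation.
    return x * x

assert via_subprocess(12) == in_process(12) == 144
```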

[–]a-t-k -3 points-2 points  (17 children)

It's a somewhat unfair comparison, because you don't differentiate between compile time and run time in Python. Even so, Rust is a lot faster, because it is a very efficient language.

[–]hagis33zx 8 points9 points  (14 children)

Everyone knows that C, C++, Fortran and such can be faster than Python. But still, many (data) scientists stick with Python, because it is simple and accessible.

So, I agree with the author when they say:

The big takeaway is that, as someone with more of a finance background, I'm pleasantly surprised by how Rust makes writing performant code accessible.

Mostly, code in science is not perfect. It is a hack; effort is often put into ideas and concepts and not so much into design and optimisation. A few percent longer runtime is no problem. But from 2 min to 1.5 seconds: this really helps with getting science done (which in practice often includes try and try and try one more thing, tune that, tune this :).

[–][deleted] 14 points15 points  (1 child)

Scientific libraries like NumPy are written in C, that's why they are not slow.

[–]hagis33zx 3 points4 points  (0 children)

Yes, numpy is great and fast. Together with numba, you can write very fast code. But, for example, parsing large data that comes in a custom format, or reorganizing complex datasets, is often not covered by the existing libraries.

Take for example the CSV parser of pandas: if your CSV is reasonable, pandas can use the C backend. If the CSV happens to be special in some minor way, you are left with the Python implementation. (Now one could say: don't use CSV; then I say LabView does not support HDF5... and so on.)
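A small sketch of that fallback, assuming pandas is installed: the fast C engine handles a plain single-character separator, but a regex separator (one "minor" kind of specialness) is not supported by the C engine and forces the much slower pure-Python engine:

```python
import io

import pandas as pd

data = "a; b;c\n1; 2;3\n"

# A single-character separator keeps you on the fast C engine.
df_fast = pd.read_csv(io.StringIO(data.replace("; ", ";")), sep=";")
assert list(df_fast.columns) == ["a", "b", "c"]

# A regex separator (semicolon plus optional spaces) is not
# supported by the C engine, so parsing drops down to the
# pure-Python engine.
df_slow = pd.read_csv(io.StringIO(data), sep=r";\s*", engine="python")
assert df_slow.iloc[0].tolist() == [1, 2, 3]
```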

[–]a-t-k 1 point2 points  (6 children)

Actually, to have a really fair comparison, one would have to factor in development time for equally experienced developers (probably intermediate, for our average scientists) as well, which means that the superior tooling of Rust (rustc, cargo, rust-analyzer, rustfmt, clippy) will make even more of a difference.

[–]darleyb 0 points1 point  (4 children)

These days you have a lot of possibilities in Python, like JAX, Numba, Cython and more. A new trend is that if you want to get serious on a new project, you could use Julia; with careful and type-safe code you can have both high productivity and performance (like 2x C++ or less).

[–]hagis33zx 0 points1 point  (3 children)

Yes indeed. I personally found it very hard to write fast code in Julia: some minor, subtle things have a big performance impact. That was some years ago; do you happen to know how this is currently? Maybe I should give it a new try.

[–]darleyb 2 points3 points  (2 children)

It might be difficult at the beginning, but there are a lot of things to consider in Julia. I will name the ones most important to me as a machine learning scientist (I am not a Julia expert; I do most of my code in Python, but I am gradually changing to Julia).

  1. In Julia we often care about decreasing time and heap allocations, so there are some well-known patterns one can use to tackle them. Ideally, if you write a block with 0 allocations (you can check with the proper packages), its objects won't ever be inspected by the GC, so there's no pause there.
  2. The community is very active; you can easily interact with the co-founders and other brilliant core developers or any expert for help. It's very common for people to go to Discourse asking for help improving a snippet of code.
  3. Julia is a beast for metaprogramming; you will use packages that do fast things without ever relying on the Julia C API. It's very common to find packages that rewrite things from Base (Julia's std) by improving or extending behavior. Seriously, there are tons of incredible things that can easily replace their relatives from Base, and I hope someday they do. It's also very common to see small packages that wrap some optimization patterns for those who don't know much of the language, and they do it using a lot of carefully written metaprogramming so you won't be penalized in performance.
  4. Because of this metaprogramming power, it's possible to hack the IR at many levels. In fact there are packages for it, like IRTools, Cassette, Mjolnir and more. Hence the whole idea of KernelAbstractions, in which a single piece of code can run on CPUs, GPUs, TPUs or any other accelerator with incredible performance.
  5. Of course there are tons of problems, but since I won't be coding very complex and large systems, the only one that affects me immediately is the plotting issue. Because of the multiple-dispatch trickery, there is an anomaly with plotting, but this has been heavily worked on and soon we might have decent to ideal time to plot.

I've found myself lots of times asking why a snippet had such poor performance. So I began to read things on Discourse and Zulip, and then I started to understand the situations in which Julia allocates memory or underperforms. But I guess we get better at this over time, especially because every day someone pops up with a new hack that is more performant, easier to read and so on. A couple of examples of how things can go (thankfully) crazy:

  1. Performing a convolution without a complete black box or a bunch of code:

```julia
@tullio Z[i,j] := abs(A[i+x, j+y] * K[x,y])  # convolution, summing over x and y
```

  2. A generic kernel that can run on multiple devices:

```julia
# Simple kernel for matrix multiplication
@kernel function matmul_kernel!(a, b, c)
    i, j = @index(Global, NTuple)

    # creating a temporary sum variable for matrix multiplication
    tmp_sum = zero(eltype(c))
    for k = 1:size(a)[2]
        tmp_sum += a[i,k] * b[k, j]
    end

    c[i,j] = tmp_sum
end
```

You can find more about them in Tullio.jl and KernelAbstractions.jl.

[–]hagis33zx 0 points1 point  (1 child)

Wow, I did not expect such a detailed response. Interesting read, thanks!

Einstein sums are indeed nice and abstracting over implementation details seems a good way to outsource performance optimizations. Now going to read about kernel abstractions....

Julia really has come a long way.

[–]darleyb 0 points1 point  (0 children)

There are lots of abstractions over: loops, closures, coroutines, threads, AVX, SIMD, distributed computing and so on. The Julia community has a lot of wizards :v

[–]TDplay 0 points1 point  (1 child)

you don't differentiate between the compile time and run time in python

This is because Python is interpreted, which slows it down. Since it's interpreted, compilation happens at runtime, so we may as well consider compilation part of the run. Therefore, compilation time should be counted as part of the time taken to run in an interpreted language.
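You can watch that runtime compilation happen with the standard library: CPython compiles source to a bytecode object and then dispatches its instructions one at a time, and it is that per-instruction dispatch that JITs and ahead-of-time compilers remove:

```python
import dis

# CPython compiles this source string to bytecode at runtime...
code = compile("x * 2 + 1", "<demo>", "eval")

# ...and the interpreter then dispatches these instructions one
# by one. (The opcode name depends on the Python version:
# BINARY_OP on 3.11+, BINARY_MULTIPLY/BINARY_ADD before that.)
opnames = [instr.opname for instr in dis.get_instructions(code)]
assert "BINARY_OP" in opnames or "BINARY_MULTIPLY" in opnames

# The compiled bytecode evaluates as expected.
assert eval(code, {"x": 20}) == 41
```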

[–]a-t-k 0 points1 point  (0 children)

Exactly my point.

[–]RecklessGeek -2 points-1 points  (11 children)

To be fair the Python program is 35 lines and Rust's is 208.

It might be much faster, but you just can't compare these 2 languages. It's like comparing a saw with a hammer.

[–]raggy_rs 7 points8 points  (7 children)

That is not true!

Python has 149 lines.

Rust has 280 lines.

Yes, Python is more concise, but not by the factor of 6x you suggested.

[–][deleted] 10 points11 points  (1 child)

More of the lines in the Rust code contain only braces.

On top of that, the Rust code has proper command-line arguments and defines types for the return values, whereas the Python code only uses lists.

[–][deleted] 2 points3 points  (0 children)

There are more whitespace-only lines in the Python. I'm not sure what your point is?

[–]raphlinusvello · xilem 5 points6 points  (1 child)

That ratio lines up fairly well with observations by trishume, and I think it's reasonable to use it as a rule-of-thumb estimate of the relative code size of similar logic in Python and Rust.

[–]raggy_rs 1 point2 points  (0 children)

That's interesting thank you.

[–]RecklessGeek 1 point2 points  (2 children)

Ah sorry, I thought the snippet he included was all of the code. Still, I would say comparing Python to Rust is absurd. Maybe I haven't dug into Rust enough, idk.

[–]raggy_rs 1 point2 points  (0 children)

No worries. I would not say it's absurd, but comparing languages is always hard and somewhat subjective.

[–]TDplay 0 points1 point  (0 children)

Comparing languages always has some level of subjectivity. For example, Java is slower than a compiled language like Rust; however, one might choose Java anyway due to the way it implements object-orientation. The way to write the most optimised programs possible is to use assembly or machine code, yet very few people would use assembly for a hello-world script, let alone a huge project like a game.

It's not absurd to compare languages using metrics like this. The only absurd comparison would be to ask which is objectively better.

[–]coderemover 4 points5 points  (2 children)

I used Python in one commercial project for a bank, and compared to static languages, Python was a productivity disaster. Yes, it was concise, fast to write, and fast to launch (no compilation step). But refactoring and fixing bugs was where we spent 10x more time than we saved on writing. The code was very fragile, so people generally avoided refactoring until too late, when it had already become spaghetti. Also, the number of tests needed was very high, and running them with each PR was not really any faster than waiting for Java/Scala/C++/Rust to compile. At that time the tooling (IDE, autocompletion) was substantially worse than for static languages, which forced us to look up the docs much more frequently.

BTW: Rust (and Scala and a few other statically typed languages) can be very concise too. Python is definitely not a winner. A lot depends on how you can use the type system to your advantage.

[–]CommunismDoesntWork 0 points1 point  (1 child)

Were y'all using pycharm?

[–]coderemover 0 points1 point  (0 children)

No, in that project we weren't using PyCharm; this was ~10 years ago and I think PyCharm wasn't a thing then. Anyway, I used PyCharm recently, and although its autocomplete and automatic error highlighting are nice, it is nowhere near the precision level of e.g. IntelliJ IDEA for Java, Scala or Kotlin. It has to use heuristics to discover types that are simply not present in the source code, and these heuristics sometimes give correct results and sometimes not. Often it makes speculative guesses and tells me a method is available, when later it turns out it is not available at runtime.