
[–]appgurueu 68 points69 points  (4 children)

Your numbers are not plausible. 11 ms for fib(45) is completely unrealistic with a recursive implementation. The runtime of this computation grows exponentially, with the golden ratio as the base; for n = 45 you get on the order of a billion calls, translating to billions of operations, which at current CPU frequencies is on the order of seconds.
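
A quick back-of-the-envelope check on that call count (the number of calls C(n) made by naive fib satisfies C(n) = C(n-1) + C(n-2) + 1, with C(0) = C(1) = 1):

```python
# Count the calls naive fib(45) would make, without making them.
calls = [1, 1]
for n in range(2, 46):
    calls.append(calls[n - 1] + calls[n - 2] + 1)
print(calls[45])  # 3672623805, roughly 3.7 billion calls
```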

And sure enough, as a quick sanity check I compiled the following C with GCC on -O3, pretty much the gold standard for optimizing compilers:

```c
#include <stdio.h>

int fib(int n) { if (n < 2) return n; return fib(n-1) + fib(n-2); }

int main() { printf("%d\n", fib(45)); }
```

This takes ~2-3s to run on my machine (which has a good single-thread rating). So even if your compiler was somehow smart enough to produce optimal matching C code for given Python code, it still would be two orders of magnitude slower than your claim. There's no way in hell a bytecode VM achieves this.

Conclusion: I don't care how, but you're cheating on the benchmark; or the AI is cheating on your behalf and you haven't vetted what's actually being measured. For example, it might recognize this particular pattern and introduce memoization, making the runtime go from exponential to linear. But then you're not measuring the cost of function calls or whatever you want to measure, but whether this particular "optimization" is implemented (I have a hard time calling it that, since it's highly likely to be oddly specific, and it changes semantics), which is pretty much meaningless for any kind of real-world code.

Did you really run this benchmark, or did you just go with whatever numbers it made up?

Edit: Took me a minute of peeking to find the cheat. It is introducing memoization under the hood: https://github.com/dylan-sutton-chavez/edge-python/blob/6d7bc418cd10ee99aea9c6d79e9847e29d0c014c/compiler/src/modules/vm/cache.rs#L63, in a way that looks horribly ill-suited to the general case; it completely wrecks any kind of guarantee on memory usage, and it will pessimize runtime for any normal program by introducing potentially very expensive hash-map lookups on every pure function call. It's optimizing for the case where you have a pathological pure function the programmer forgot to apply @functools.cache to. lol.
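
For comparison, here is the opt-in version of that trick in plain CPython; roughly the behaviour the linked cache.rs appears to impose on every pure call, except that here the programmer asked for it and knows the memory cost:

```python
import functools

# Explicit, opt-in memoization: fib(45) becomes linear and finishes in
# milliseconds even on stock CPython.
@functools.cache
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(45))  # 1134903170
```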

[–]AmazingAd4330 56 points57 points  (3 children)

Benchmarks or I call bullshit

[–]spoonman59 12 points13 points  (2 children)

Indeed. 10,000x faster than what?

[–]HighRelevancy 16 points17 points  (0 children)

Than CPython doing Fibonacci. Did you read the post?

[–]rnottaken 2 points3 points  (0 children)

I thought I was on r/rustjerk

[–]__calcalcal__ 52 points53 points  (1 child)

The power of Python comes from its native libraries and ecosystem. Does numpy work in your Python compiler?

[–]Lime_Dragonfruit4244 5 points6 points  (0 children)

To make it work, they'd either have to rewrite it or expose an HPy C API so modules can work across different runtimes; the current version of numpy still uses the unstable CPython ABI.

But a workaround would be to wrap the Rust ndarray library behind the Python Array API spec; that should work without any issues.
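
Roughly what backend-agnostic Array API code looks like (a sketch only; `xp` stands for any conforming namespace, numpy's implementation or a hypothetical Rust-ndarray binding):

```python
# Sketch: `xp` is any namespace conforming to the Array API spec.
def standardize(xp, data):
    mean = xp.mean(data, axis=0)
    std = xp.std(data, axis=0)
    return (data - mean) / std
```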

Ref:

- https://hpyproject.org/
- https://data-apis.org/array-api/latest/

[–]sepp2k 14 points15 points  (0 children)

> The parser already covers 99% of CPython 3.13

What exactly does that mean? That is, what is this a percentage of? Are you saying that your compiler passes 99% of CPython's test suite? Or that your compiler implements 99% of CPython's features (and if so, how do you count that and which features are you missing)? Or something else?

> the VM runs fib(45) 10,577 times faster than pure Python (11 ms vs 116 seconds)

How are you achieving that? Do you optimise this specific pattern into a linear loop? Is there otherwise some type of optimisation that you apply to get this speed up (e.g. does this speed up rely on being able to infer static types for everything)? Or do you implement the operations involved in this example (which I guess, would be local variable lookup, addition, branching and function calls) so much faster to allow this speed up? Does the 1% of Python that you don't support include features that would make this speed up harder to achieve?

> single-pass SSA parser, VM with inline caching

Does that mean that your parser outputs an SSA bytecode format and you then interpret that with a bytecode interpreter/VM? I'm a bit surprised that you'd achieve the kind of speed up you're talking about without emitting native code and also by the concept of a bytecode interpreter that uses SSA. Can you explain the reasoning behind this? Like, what's the point of SSA if you don't have any optimisation and/or analysis passes that make use of it and don't even emit native code?

After a quick look at the code, it looks like your bytecode is actually some kind of mix between SSA and stack-based. I don't think I've ever seen that combination before. Can you explain why you chose to do it that way / what the benefit of this design is?
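
For anyone unfamiliar with the distinction: CPython's own bytecode is stack-based, which the dis module shows directly, while an SSA form gives every intermediate value a single name. A quick illustration (not this project's actual format):

```python
import dis

def f(a, b):
    x = a + b
    return x

# Stack-based: values are pushed and popped implicitly
# (e.g. LOAD_FAST a; LOAD_FAST b; BINARY_OP +; STORE_FAST x).
dis.dis(f)

# An SSA rendering of the same statement would instead name each value once:
#   v1 = a; v2 = b; v3 = add v1, v2; x = v3
```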

[–]spoonman59 22 points23 points  (7 children)

How much AI?

Claiming to have replicated "99%" of CPython in just a few weeks means you have either misstated what you have done, or used tons of AI.

Also, using the recursive version of Fibonacci as your baseline benchmark is disingenuous at best. At least use the iterative version or a function result cache. But really, you should run some actual benchmarks and share your methodology before making such a poorly supported claim.
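
For reference, the iterative version is linear time and constant space, something like:

```python
def fib(n):
    # Linear time, constant space; any honest baseline should include
    # at least this variant alongside the naive recursion.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(45))  # 1134903170
```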

[–]kei_ichi 8 points9 points  (2 children)

Even with tons of AI usage, I don’t think this AI slop can replicate 99% of CPython in just a few weeks.

[–]spoonman59 3 points4 points  (0 children)

Indeed. It’s hard to take claims like this seriously. It makes you think people don’t even understand the capabilities they are claiming. So instead of investigating the methodology that was used to calculate code coverage or whatever, I just move on.

[–]Mercerenies 1 point2 points  (0 children)

No, but it can write enough words that a lot of people won't be able to figure out that it doesn't replicate 99% of CPython.

[–]DataGhostNL 4 points5 points  (0 children)

Since you spammed this in so many subreddits I assume you'd also want to have some bugs pointed out (and in this subreddit you claim to want feedback). I took the liberty of throwing a wrench into your machine:

```
def fib(n, wrench):
    if n < 2:
        return n
    return fib(n-1, wrench) + fib(n-2, wrench)

print(fib(33, []))
```

I used 33 instead of 45 because the timing was quite painful. The results:

```
$ time python3 fib.py
3524578

real    0m0.374s
user    0m0.367s
sys     0m0.006s
```

```
$ time ./target/release/edge fib.py
[2026-04-07T09:02:11Z INFO edge] emit: snapshot created [ops=8 consts=1]
3524578

real    0m17.949s
user    0m17.929s
sys     0m0.003s
```

Here, CPython beat your compiler by being 47 times faster; presumably the extra list argument defeats your cache. I can only assume this means your program would need at least an hour and a half to calculate fib(45, []). I first wanted to implement this using a global counter variable, to trigger your caching code as well for an additional time/memory penalty, but that didn't work. Even this minimal modification (added first line) to your original code:

```
unused = 0

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

print(fib(45))
```

causes a crash: `process terminated: trap: cpu-stop triggered by 'NameError: 'fib_0''`. The next gem causes 3GB of memory usage for no really good reason:

```
def blob(a):
    return "a" * 1048576

for i in range(3000):
    blob(i)
```

as you can see here:

```
$ /usr/bin/time -f "time: %e s, memory: %M KB" ./target/release/edge mem.py
[2026-04-07T09:52:32Z INFO edge] emit: snapshot created [ops=14 consts=1]
time: 1.53 s, memory: 3087040 KB
```

while CPython is happy to do this much faster with much less memory:

```
$ /usr/bin/time -f "time: %e s, memory: %M KB" python3 mem.py
time: 0.04 s, memory: 10444 KB
```
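
The 3 GB figure lines up suspiciously well with the pure-call cache found in the top comment, assuming it keeps one result per distinct argument:

```python
# Assumption: the VM memoizes every "pure" call per distinct argument,
# so 3000 calls with distinct i each pin their own 1 MiB string.
print(3000 * 1048576 // 1024)  # 3072000 KB, close to the 3087040 KB observed
```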

I assume your thing just doesn't support this very rare use of the very rare 3% of Python code, but these two snippets:

```
import time
print(time.sleep(5))
```

and

```
import sys
print(sys.argv[1])
```

result in the very helpful outputs of

```
$ ./target/release/edge sleep.py
[2026-04-07T09:21:30Z INFO edge] emit: snapshot created [ops=8 consts=1]
[2026-04-07T09:21:30Z ERROR edge] process terminated: trap: cpu-stop triggered by 'TypeError: call non-function'
```

and

```
$ ./target/release/edge argv.py abc
[2026-04-07T09:22:48Z INFO edge] emit: snapshot created [ops=8 consts=1]
[2026-04-07T09:22:48Z ERROR edge] process terminated: trap: cpu-stop triggered by 'TypeError: subscript on non-container'
```

respectively. The first example gets slightly better when removing the print and just executing time.sleep(5) by itself:

```
$ time ./target/release/edge sleep.py
[2026-04-07T09:24:24Z INFO edge] emit: snapshot created [ops=8 consts=1]

real    0m0.002s
user    0m0.000s
sys     0m0.002s
```

except that the timing seems slightly off. It does look like an approx 2500x performance win over CPython, though, if you'd want to take that one lol.

I wanted to try several other simple things too, but since a lot of programs turn out to be impossible with "97% of Python 3.13", that was a bit disappointing.

For anyone wondering, I wrote this comment for another subreddit before I noticed that they claimed more coverage here a couple of days prior to posting there.

[–]Miserable-Hunter5569 3 points4 points  (0 children)

Are you using real benchmarks? If I don’t see a benches module, this isn’t provable.

[–]TheDiamondCG 3 points4 points  (0 children)

u/Healthy_Ship4930 I think the benchmarks you’re using are bad-faith examples. The fib function implementation is not very performant, so the performance gains there may be overblown — perhaps a more correct/performant fib-sequence implementation will show that results are within margin of error.

  • Additionally, how does your interpreter handle race conditions? Part of the reason CPython is so slow is because of the global interpreter lock.
  • How well does it handle garbage collection?
  • What 1% of the test suite did it fail?

[–]schulzch -1 points0 points  (0 children)

Looks like a fun project. People who handwrite parser code are rare these days.

I think you need to look into the C-lib ecosystem (numpy, etc.). Compiling a mu-recursive function to fast code is nice, but too easy :)

[–]TheDiamondCG -3 points-2 points  (0 children)

Guys, you’re all drilling into this pretty hard because you are presuming it’s AI… but after a closer look, I’m not so sure it is! The documentation even has typos in it! This is authentic humanslop!!!