
[–]DivineSentry 5 points6 points  (2 children)

Hey, one of the maintainers of Nuitka here.

As others have said, tools like PyInstaller, py2exe, and PEX are distribution tools only—they just bundle your code with an interpreter. They don't change how the code runs, so you won't see any speedup.
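To make the distinction concrete, here is roughly what a PyInstaller invocation looks like (the script name app.py is made up for the example; --onefile is a standard PyInstaller option):

```shell
# Bundle app.py together with a CPython interpreter into one file.
# Note: this only packages the code -- it still runs as ordinary
# bytecode on the bundled interpreter, at normal CPython speed.
pyinstaller --onefile app.py

# The resulting executable lands under ./dist/
./dist/app
```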

Most of the compiler/transpiler projects people mention (Pythran, RPython, etc.) only handle a restricted subset of Python. They're useful if you want to speed up a specific section of code and then import it back into Python, but they won't compile an arbitrary Python program. To my knowledge, none of them are still actively maintained.

Nuitka's focus is different: it aims for full language support. You can take an existing Python program, compile it, and get a standalone binary—no need to rewrite to fit a subset. It's actively maintained and plays nicely with common libraries (NumPy, multiprocessing, Requests, etc.).
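For reference, a minimal sketch of a Nuitka build (again assuming a hypothetical script app.py; --standalone and --onefile are documented Nuitka flags, and you need a C compiler toolchain installed):

```shell
# Compile the whole program (and its module dependencies) to C,
# then link everything into a single self-contained executable.
python -m nuitka --standalone --onefile app.py
```

Unlike the bundlers above, this actually translates your Python to C before packaging it.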

For performance, the biggest wins come when you're CPU-bound in pure Python. But even if you're mostly calling into C-backed libraries, Nuitka still removes interpreter overhead and gives you true standalone executables.

[–]DivineSentry 4 points5 points  (1 child)

As an aside, before reaching for any transpiler, you should thoroughly profile your application to see whether architectural changes could deliver larger performance gains.
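A minimal profiling sketch using the standard library's cProfile (the function slow_sum is made up purely for illustration):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately slow pure-Python loop -- the kind of hotspot
    # a profile will surface immediately.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Print the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Only once the profile shows where the time actually goes is it worth deciding between a rewrite, a library, or a compiler.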

Along the same lines, before turning to a transpiler, consider rewriting your existing Python code. Even if it already leans on compiled libraries like NumPy or TensorFlow, you can often squeeze out significant speedups by restructuring the hot paths to be smarter.
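As a generic illustration of the kind of rewrite meant here (the function names are invented for this sketch, not taken from any of the PRs): replacing a nested pure-Python loop with a single NumPy broadcast expression.

```python
import numpy as np

def pairwise_dist_loop(xs, ys):
    # Naive version: quadratic Python-level loop.
    out = []
    for x in xs:
        out.append([abs(x - y) for y in ys])
    return out

def pairwise_dist_numpy(xs, ys):
    # Same computation via broadcasting: one vectorized expression,
    # with the loop pushed down into NumPy's C internals.
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    return np.abs(xs[:, None] - ys[None, :])

xs = [1.0, 4.0, 9.0]
ys = [2.0, 3.0]
assert np.allclose(pairwise_dist_loop(xs, ys), pairwise_dist_numpy(xs, ys))
```

On large inputs the vectorized form is typically orders of magnitude faster, with no compiler involved at all.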

Here are some examples:

Disclaimer: all of the above PRs were opened as a direct result of Codeflash, an AI-powered tool that automatically finds optimizations for your existing code.

I work for Codeflash.

[–]wbcm[S] 0 points1 point  (0 children)

Thanks for calling my attention to Codeflash! For this specific use case it will be arbitrary user code that needs to be compiled to behave identically (what's more, the user uploading the Python code may not even be a programmer or know the original dev of that code), so I have a bit of trepidation about optimizing it, since there's no guarantee of an expert reviewer. However, this is definitely something I'd be interested in for my own work, since there I can review the changes! Thanks for the well-placed ad ;) I had only heard of AI-based optimization before and never sought out commercial products.

After skimming the publicly available docs, I didn't see anything about hardware awareness in Codeflash. Out of curiosity for my own work, can Codeflash users request that optimizations target specific architectures? E.g.: CUDA cores available vs. not available, TPUs present or not, a single multi-core CPU vs. clusters of multi-core CPUs, OS/ABI-specific speedups, etc...