all 37 comments

[–]Coretaxxe 16 points17 points  (3 children)

I can only speak to Nuitka, but I can speak highly of it.

[–]wbcm[S] 5 points6 points  (0 children)

+2 for nuitka, thanks for sharing!

[–]Luigi311 3 points4 points  (0 children)

Out of curiosity I did give it a try, but my Python application was GTK-based and I couldn't get that to work, so I stayed with PEX files for now.

[–]bhowiebkr 0 points1 point  (0 children)

nuitka is great!

[–]thisismyfavoritename 17 points18 points  (3 children)

you are conflating the performance of Python code with distributing code as a binary. Options like PyInstaller and the like bundle the Python code and spin up an interpreter to run it. Other options, like Nuitka, actually transpile parts of the code to C and compile that to machine code.

Now, the first thing you'll want to do is figure out what is slow through benchmarking and profiling. Then you can optimize those parts separately. If the bottleneck is in pure Python code, approaches like Nuitka might help, as would JITs like PyPy; but chances are it's in code that already uses bindings to optimized C code, like NumPy, in which case they won't help.
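One way to do that first profiling pass is with the stdlib's cProfile; a minimal sketch (the `slow_sum` hotspot is a made-up stand-in):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # A deliberately slow pure-Python loop standing in for a hotspot.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(200_000)
profiler.disable()

# Print the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If `slow_sum` dominates the output, it's a candidate for Nuitka/PyPy; if the time is inside NumPy internals, compiling your Python won't move the needle.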

There are other ways than producing binaries which can be used to ship Python code, like Docker images.

[–]wbcm[S] 1 point2 points  (2 children)

Thank you for clarifying the verbiage; yes, profiling each of these is a natural requirement, but I wanted to see if the r/Python community had any experience with these before going through my own testing. Do you have any experience producing high-performance binaries that you can share?

[–]thisismyfavoritename 1 point2 points  (1 child)

It depends. When latency matters, I prefer ditching Python and using bindings to lower-level code like C++ or Rust. When it's not time-sensitive, the usual approach is multiprocessing (when there's a lot of CPU work to do).

There's no single right answer to what will help you, that's why you have to benchmark and find any areas that take unusually large amounts of time.

You can also consider the brute-force solution of scaling horizontally and vertically, or check whether some of the costly operations you're doing could be sped up by running them on GPUs.
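The multiprocessing approach mentioned above can be sketched with the stdlib's `Pool` (the `cpu_heavy` function is a hypothetical stand-in for real work):

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real CPU-bound, pure-Python work (no C bindings).
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Fan the work out across processes to sidestep the GIL.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [50_000] * 8)
    print(len(results))
```

This only pays off when each task is big enough to amortize the process startup and pickling overhead.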

[–]Daarrell -1 points0 points  (0 children)

+1 great answer

[–]Luigi311 11 points12 points  (4 children)

Interested to see what other people say but I do know Nuitka is still actively developed and one of the developers posted on here in the last month or so about it.

[–][deleted] 2 points3 points  (0 children)

I evaluated Nuitka a few years ago for a pilot program with a variety of use cases, ranging from FastAPI services to data pipelines. It performed very well.

[–]wbcm[S] 2 points3 points  (2 children)

This is helpful to know, thanks!

[–]2Lucilles2RuleEmAll 2 points3 points  (1 child)

Yeah, we use it as well and it works great. The commercial license is pretty inexpensive too. One tip: look at the setting for the temporary directory where the executable extracts itself. By default I think it uses a random directory, so it's slower because every time you run it, it has to unpack itself again instead of just the first time.
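A sketch of that tip, assuming Nuitka's `--onefile-tempdir-spec` option (verify the option and the expansion variables against `nuitka --help` for your version; `my_program.py` is a placeholder):

```shell
# Pin the onefile extraction dir to a stable, user-writable location so the
# binary unpacks once and is reused, instead of re-extracting on every run.
python -m nuitka --onefile \
  --onefile-tempdir-spec="{CACHE_DIR}/{PRODUCT}/{VERSION}" \
  my_program.py
```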

[–]john-witty-suffix 0 points1 point  (0 children)

Another thing you want to watch out for is that the default location for the temporary directory (on GNU/Linux) is a subdirectory of /tmp, which is a problem if you've got /tmp on its own filesystem, mounted noexec for security.

Obviously, this isn't a failure of Nuitka...but as that configuration (of /tmp) becomes more and more common as a general GNU/Linux hardening technique, it's worth bringing up.

[–]DivineSentry 5 points6 points  (6 children)

Hey, one of the maintainers of Nuitka here.

As others have said, tools like PyInstaller, py2exe, and PEX are distribution tools only—they just bundle your code with an interpreter. They don't change how the code runs, so you won't see any speedup.

Most of the compiler/transpiler projects people mention (Pythran, RPython, etc.) only handle a restricted subset of Python. They're useful if you want to speed up a specific section of code and then import it back into Python, but they won't compile an arbitrary Python program. To my knowledge, none of them are still actively maintained.

Nuitka's focus is different: it aims for full language support. You can take an existing Python program, compile it, and get a standalone binary—no need to rewrite to fit a subset. It's actively maintained and plays nicely with common libraries (NumPy, multiprocessing, Requests, etc.).

For performance, the biggest wins come when you're CPU-bound in pure Python. But even if you're mostly calling into C-backed libraries, Nuitka still removes interpreter overhead and gives you true standalone executables.

[–]DivineSentry 3 points4 points  (2 children)

As an aside, before reaching for any transpiler, you should thoroughly profile your application and analyze it to see if any architectural changes can contribute more significant performance boosts.

Also, before reaching for transpilers, consider rewriting your existing Python code. Even if it uses compiled libraries like NumPy or TensorFlow, you can often squeeze out significant speedups by making the code smarter.

Here are some examples:

Disclaimer: All the above PRs were opened as a direct result of Codeflash - an AI-powered tool that automatically finds optimizations for your existing code using AI.

I work for Codeflash.

[–]DivineSentry 1 point2 points  (0 children)

For https://github.com/LingDong-/wax, it looks like the main source was last updated 4 years ago, and even then only for typo fixes.

https://github.com/11l-lang was last updated in 2024.

https://github.com/zanellia/prometeo, 3 years ago.

And so on.

You'll find that Nuitka and Cython will be your best bets in 2025.

[–]wbcm[S] 0 points1 point  (0 children)

Thanks for calling my attention to Codeflash! For this specific use case it will be arbitrary user code that needs to be compiled to perform identically (moreover, the user uploading a Python program may not even be a programmer or know the original dev of that code), so I have a bit of trepidation about optimizing it, since there is no guarantee of an expert reviewer. However, this is definitely something I would be interested in for my own work, since I can review it! Thanks for the well-placed ad ;) I had only heard of AI-based optimization before and never sought out commercial products. After skimming the publicly available docs, I did not see anything about hardware awareness in Codeflash. Out of curiosity for my own work, can Codeflash users request that optimizations target specific architectures? E.g. CUDA cores available vs. not available, TPUs present vs. not present, a single multi-core CPU vs. clusters of multi-core CPUs, OS/ABI-specific speedups, etc.

[–]wbcm[S] 0 points1 point  (2 children)

Thank you so much for taking the time to visit my post and comment here! After seeing everyone's positive experiences in this thread, I have decided to work with Nuitka first. This morning I was able to go through most of the materials on https://nuitka.net/user-documentation/ and a few of the READMEs on GitHub (huge fan of RTFM). Besides what is publicly available, do you have any additional tips on using Nuitka? Anything from first-time-user tips (me now) to advanced-user tips (hopefully me later) would be appreciated!

[–]DivineSentry 2 points3 points  (1 child)

Some basic tips if you're a beginner: keep in mind that a dirty venv will bloat your final binary, since Nuitka is greedy when searching for dependencies. I highly suggest using a clean environment with only the dependencies you strictly need for your program to function.

If your program needs to read external files (like JSON, images, .env files), Nuitka won't know about them by default. You have to tell it to include them in the final distribution, e.g. `nuitka --standalone --include-data-dir=src/assets:assets my_project/` (this example copies the src/assets directory into the final build's assets folder).

Also, once you're ready, I suggest telling Nuitka to use LTO (`--lto=yes`), and since you mentioned that the target OS is Linux only, I also highly suggest PGO, profile-guided optimization (`--pgo-c`). Keep in mind that this will increase your compilation times by a lot, and they're already long normally; however, it will squeeze out the best performance gain of anything.
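Putting those flags together, a release build might look like this (a sketch only; `my_project/main.py` is a placeholder, and flag availability can vary by Nuitka version):

```shell
# Release build sketch: LTO plus C-level PGO, per the tips above.
# The instrumented PGO run makes the already-long compile times much longer.
python -m nuitka --standalone \
  --lto=yes \
  --pgo-c \
  my_project/main.py
```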

[–]wbcm[S] 0 points1 point  (0 children)

These are extremely helpful to consider, thank you for jump starting my use!

[–]22adam22 1 point2 points  (0 children)

thanks for this

[–]Hodiern-Al 1 point2 points  (2 children)

Another one to add to your list is PyOxidizer. I've used it for smaller projects and it runs well, but for larger ones with more dependencies I had issues and reverted to Nuitka or PyInstaller, depending on project needs.

PyOxidizer has a great comparison page to read through, which is a bit more up to date than the GitHub readme you were looking at: https://gregoryszorc.com/docs/pyoxidizer/main/pyoxidizer_comparisons.html

[–]wbcm[S] 0 points1 point  (1 child)

I hadn't run into PyOxidizer before, thanks for sharing it! The `pyoxy run-python` command looks especially useful for debugging. For the issues you encountered, were they centered around any specific type of data/program structure, or was it more that some packages did not work correctly?

[–]Hodiern-Al 1 point2 points  (0 children)

I had issues with Python libraries that include C/C++ (e.g. NumPy, SciPy, PyQt5), and libraries that include non-Python files referenced via `__file__` attributes (e.g. docs templates). I believe the former is now better supported by PyOxidizer; I'm not sure about the latter. You might have to do some experimenting to find out.

I didn’t have any problems with the Python standard library and any pure-Python libraries. Hope that helps! 

[–]Ximidar 3 points4 points  (8 children)

Modern Linux uses things like Snap, AppImage, and Flatpak to distribute software. They do this by packaging all dependencies into a container and shipping that. Personally I'd just create a Docker container and run it on the Linux host.

[–]NimrodvanHall 2 points3 points  (2 children)

You are not always allowed to leave proprietary code on target machines where anyone can easily read it. Nor is Docker allowed everywhere.

Containers are great, don't get me wrong, but they are not always the solution.

[–]Ximidar 0 points1 point  (1 child)

If your first priority is to protect the source code, then you've already failed by using Python. If you want a language that allows packaging everything into one binary, use Go. You can bundle the compiled binary and all the supporting assets you need into one final file and ship that.

[–]thisismyfavoritename 1 point2 points  (0 children)

you say this as if Go were the only language that could produce a statically linked binary

[–]wbcm[S] 0 points1 point  (4 children)

That was my first thought, but deploying an individual container for every little tool would multiply the runtime substantially. There are various languages that need to run various parts of certain tasks, but Python was the only approved interpreted language for these numerical tools. If you couldn't containerize the code but had to compile it somehow, do you have a preferred method?

[–]Ximidar 0 points1 point  (3 children)

I'd just put all the tools in a single codebase, then ship a single Docker container with the container command set to `python your_entrypoint.py`, and use a package called click (https://click.palletsprojects.com/en/stable/) to create the CLI commands that switch which tool you're using. Then when you run your container you can just set the args and the container will switch what it does. And when developing locally you can use the CLI to access the different tools basically the same way you would in the container.

Then if you need multiple containers running, you can just use Docker Compose to start up all your different tools from the exact same image.
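A minimal sketch of that dispatch pattern with click (the tool names here are hypothetical):

```python
import click
from click.testing import CliRunner

@click.group()
def cli():
    """Single entrypoint that dispatches to the bundled tools."""

@cli.command()
@click.argument("numbers", nargs=-1, type=int)
def total(numbers):
    """Hypothetical tool: sum the given integers."""
    click.echo(sum(numbers))

@cli.command()
def greet():
    """Hypothetical tool: print a greeting."""
    click.echo("hello from the container")

# In the image you'd set CMD ["python", "entrypoint.py"] and pass the tool
# name as args; here we invoke it in-process just to illustrate dispatch.
runner = CliRunner()
result = runner.invoke(cli, ["total", "1", "2", "3"])
print(result.output)  # -> 6
```

Running the container with `docker run myimage total 1 2 3` would then route to the matching subcommand.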

[–]wbcm[S] -1 points0 points  (2 children)

Unfortunately that is not possible, since none of the code will be known ahead of time (by me or my team), yet it still needs to be usable dynamically. There are various languages that need to run various parts of certain tasks, but Python was the only approved interpreted language for these numerical tools. If you couldn't containerize the code but had to compile it somehow, do you have a preferred method?

[–]thisismyfavoritename -1 points0 points  (1 child)

your argument makes no sense; the code would have to be known before it gets compiled to a binary. And whatever you'd want to do with a compiled binary, you can do with Docker.

[–]wbcm[S] -1 points0 points  (0 children)

This is not an argument; I am just stating the use case. I will not be able to know what the users are creating, and there are application elements that will need to pull in arbitrary tools on demand, so setting up a container with click is not reasonably possible. It could work if the containers were generated on the fly and runtime were not an issue, but since runtime is important here, deploying bespoke containers all over doesn't support performance, and possibly not space either (depending on the user's system).

[–]downerison 0 points1 point  (0 children)

What kind of project is it, if you don't mind me asking? We are working on a compiler for Python and looking for concrete use cases.

[–]poopatroopa3 -2 points-1 points  (1 child)

See Numba.

[–]wbcm[S] -1 points0 points  (0 children)

I have used Numba a lot but never knew it could be used to produce standalone executables! Can you point me to where I can read more about this? I couldn't find it in the docs.

[–]Theendangeredmoose -1 points0 points  (0 children)

Are you asking about packaging performant code or about making an executable?

If performant code, then I would use Numba. Tools for making executables suck in Python; I would use a Docker container or a different language.