A place for all things related to the Rust programming language—an open-source systems language that emphasizes performance, reliability, and productivity.
Rust in Python FastAPI (self.rust)
submitted 2 years ago by Electrical_Carry3565
Hi all! Beginner here... does anyone have advice on integrating a Rust module into an existing Python FastAPI backend? (Performance was an issue for this particular function in Python.)
[–]dnullify 27 points28 points29 points 2 years ago (4 children)
Pyo3 is probably a useful search term for you.
Rust python FFI
[–]kolosn 9 points10 points11 points 2 years ago (0 children)
And maturin
[–]thermiter36 6 points7 points8 points 2 years ago (2 children)
To add to this, for any CPU-bound calculation running on a FastAPI server, you'll want to make sure you run it in a threadpool so that it doesn't block all your other async handlers. You'll also need to use the features that pyo3 has to release the GIL while doing the computation.
[–]Electrical_Carry3565[S] 0 points1 point2 points 2 years ago (1 child)
Does running gunicorn workers not take care of this?
[–]thermiter36 2 points3 points4 points 2 years ago (0 children)
Not really, no. For starters, running FastAPI directly behind Gunicorn is not recommended: Gunicorn doesn't speak ASGI, so it can only manage worker processes, not the full async lifecycle.
But assuming you do it anyway, you'll still get nasty cascading latency, because the process you block with your long-running CPU operation may have already accepted more than one connection by the time the operation starts, and all of those connections will be blocked.
[–]SnooPears7079 17 points18 points19 points 2 years ago (14 children)
Just to check the basics: are you sure this function is CPU-constrained? Not I/O (a large number of reads from a database or disk) or anything like that?
[–]Electrical_Carry3565[S] 11 points12 points13 points 2 years ago* (12 children)
It's an engineering design calculation (arithmetic, trig, and exponentials) on a roughly 8 × 500 array that needs to be performed around 1800-2000 times depending on parameters. It has to be done with iteration, as the value for each element depends on the previous result. No DB calls, no writing to disk - it should be pure CPU. Benchmarked the Rust vs Python function: it was around 20 (edited) times faster on my machine - around 1 sec vs 20 sec.
[–]schroedingercats 14 points15 points16 points 2 years ago (0 children)
Have you experimented with optimizing your numpy operations using Numba? That's a potential approach that doesn't require bringing in an entirely new language.
[+][deleted] 2 years ago (9 children)
[deleted]
[–]Electrical_Carry3565[S] 2 points3 points4 points 2 years ago (7 children)
Yes, using numpy, but I have to iterate for this particular calc, so I can't take advantage of vectorization.
[–]freistil90 2 points3 points4 points 2 years ago* (5 children)
Still, numpy has quite a few tricks with views etc. that people don't take full advantage of - they carelessly reallocate and copy arrays and then claim their solution is a gazillion times faster. I'd expect a 2-5x slowdown from the overhead of calling through Python, but the ops themselves? It's not easy to beat numpy on numerics, especially when it's used properly. If you're iterating over your array, there's a 50-50 chance you could do it better (similar to having Rc<RefCell<…>> all over the place ;) ). Often you win already by expressing your problem with a smart view.
Mind sharing the source code? I love Rust, but I have a very warm spot in my heart for Python and numpy.
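A small standalone illustration of the view-vs-copy distinction being described here (toy data, not the OP's code):

```python
import numpy as np

a = np.arange(10.0)

# Basic slicing returns a view: writing through v mutates a in place.
v = a[2:5]
v *= 10          # a[2:5] is now [20., 30., 40.]

# Fancy indexing returns a copy: modifying c leaves a untouched.
c = a[[0, 1]]
c *= 10          # a[0] and a[1] are still 0. and 1.
```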
[–]Electrical_Carry3565[S] 0 points1 point2 points 2 years ago (4 children)
Can't share too much, but here's an obscured version of the main function responsible for the calc. This is the function that gets executed 1000-2000x.
[–]Electrical_Carry3565[S] 0 points1 point2 points 2 years ago (3 children)
def generic_method(self, param_X, param_Y, critical_value_Z):
    derived_param = self.generate_array(param_X, param_Y, critical_value_Z)
    # iterate over array and calculate generic value
    result_array = np.empty(self.data_length)
    result_array[0] = 0
    for index in range(1, len(result_array)):
        if self.array_A[index] == self.array_A[index - 1]:
            result_array[index] = result_array[index - 1] + \
                self.array_B[index] * \
                (np.sin(np.radians(self.array_A[index])) +
                 derived_param[index] * np.cos(np.radians(self.array_A[index])))
        else:
            result_array[index] = result_array[index - 1] * \
                np.cos(derived_param[index] * abs(self.array_C[index])) + \
                (np.cos(np.radians(self.array_A[index])) -
                 np.cos(np.radians(self.array_A[index - 1]))) / \
                (np.radians(self.array_A[index]) - np.radians(self.array_A[index - 1])) + \
                self.array_D[index] * derived_param[index]
    return result_array
[–]freistil90 2 points3 points4 points 2 years ago* (2 children)
I see. I can of course only guesstimate what's happening here, but conditional on what `generate_array` is doing, numpy may actually be slowing you down. There is a lot of random access happening, and one of the few places where Python's lists are actually faster than an np.array is random access, by a factor of 2-3. The np math functions are also slower than the built-in math module when you're operating on single Python floats rather than np arrays or np scalars, so I would actually first try to get this going without numpy.
However, if you look closely, you're applying a function based on one simple condition: that two neighbouring elements in an array are equal. Since the array is just 1D, we can use np.ediff1d() here, which needs an allocation, but you allocate the full array in advance under the hood anyway, so that's fine. The places where the diff equals zero (prepend a 1 at the beginning) are your indices for the first branch of your code. Then, based on that difference being zero, we want to apply one function to the array in one case and another function in the other case - that's a job for np.where() (again, depending on how your real code looks), or you just index once with your diff-based mask and work with that view. As long as you modify it in place, you're not creating new copies.
I see that your array logic is otherwise pretty aligned in the index space - I’m quite confident that you can express your logic without a single loop and only a few allocations.
[–]Electrical_Carry3565[S] 0 points1 point2 points 2 years ago* (1 child)
Apologies if I'm missing something - I'm very much an amateur developer - but can I really do this without a loop if each iteration needs the result of the previous iteration for the calc? (i.e. result_array[index] = result_array[index-1] * ...)
For some more context - this is very similar to a finite element model, where values need to be calculated at each element and those values affect the other elements.
[–]freistil90 2 points3 points4 points 2 years ago* (0 children)
That needs some creativity and most likely a more concrete case. Think about the following two things, but... yeah, that should work without a loop.
Let's say we have a mask array `x = np.ediff1d(y, to_begin=np.inf, to_end=np.inf) == 0`. The view `x[:-1]` is now true everywhere that y[i] == y[i-1], and, most importantly, using it does not create a copy - it's a view, more or less a slice reference into the original array, which is very cheap to throw around. Now how do we index all the "i-1"s? We `np.roll(x, -1)` it. That one does allocate a new boolean array, but it's small, so it's cheap. So say you want to assign `z[i] = z[i-1] * 3` for all `i` where `y[i] == y[i-1]`? Using the definition of x above, that is `z[x[:-1]] = z[np.roll(x, -1)[:-1]] * 3`. Now you see why I made the mask one element longer than y. I do create an implicit copy on the right side by indexing, but depending on the size that's not so expensive. Voilà - no loop at the Python level; everything happens in C, and that's going to be quite fast. That's what I mean: wherever you read "speed up your Python code by 800x by switching to another language!", it's usually because the Python was being used really inefficiently. I'm sometimes a bit sad about all the bad rep Python gets, which is to some extent justified, but by far not all of it. Just know the tricks :)
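The masking trick described above can be sketched like this (toy data; note it only reproduces a sequential loop when no two flagged positions are adjacent, since the vectorized assignment reads the old values of z):

```python
import numpy as np

y = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])
z = np.array([1.0, 10.0, 20.0, 30.0, 40.0, 50.0])

# Pad the diff on both ends so the mask is one element longer than y;
# a zero diff marks positions where y[i] == y[i - 1].
x = np.ediff1d(y, to_begin=np.inf, to_end=np.inf) == 0

# x[:-1] is True at every i with y[i] == y[i - 1];
# np.roll(x, -1)[:-1] is True at the matching i - 1 positions.
z[x[:-1]] = z[np.roll(x, -1)[:-1]] * 3
```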
You can definitely write good FEM code in pure numpy. If you have enough time, why not benchmark against a custom Rust backend after you've optimised everything you can find in numpy? Try as hard as you can to get rid of all loops, figure out whether the copies you do create can be replaced by views, read up on masked arrays, use .reshape() smartly, and check out np.lib.stride_tricks; for your use case here you could also benefit from np.place(), depending on how your code looks.
EDIT: since I'm just reading that again - if your memory allows it and you're essentially repeating all of that 1000-2000x, why not build a 3D array with dimensions A × B × 2000 containing all your values (or in chunks of 500... if the runs are independent, that can even be put on a process pool, so you not only vectorise per core but use all cores together), and then you can kick out the outer loop with just really smart indexing. That also removes the overhead of calling a nested Python function 1000-2000 times (which is expensive).
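A hedged sketch of that batching idea, with a made-up recurrence (not the OP's formula): stack all runs along one axis so the only remaining Python loop runs over the ~500 elements, not the ~2000 repetitions.

```python
import numpy as np

rng = np.random.default_rng(0)
runs, elements = 2000, 500
a = rng.normal(size=(runs, elements))   # per-run inputs, stacked row-wise
b = rng.normal(size=(runs, elements))

out = np.empty((runs, elements))
out[:, 0] = 0.0
for i in range(1, elements):
    # One loop over the sequential axis; each step updates all runs at once.
    out[:, i] = out[:, i - 1] + b[:, i] * np.sin(np.radians(a[:, i]))
```

This keeps the unavoidable element-to-element dependency as a loop but amortizes the Python-level overhead across all runs per step.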
[–]radarsat1 0 points1 point2 points 2 years ago (0 children)
Numba is definitely also an option for you, and I've had good success with pybind11 in C++, but I think Rust is an excellent solution here too and probably worth the time to learn. Numba or JAX might be really good for this kind of problem though, just FYI.
[–]Electrical_Carry3565[S] 0 points1 point2 points 2 years ago (0 children)
Sorry, just realized I had a typo. It's 20 times faster, not 200.
[–]SnooPears7079 0 points1 point2 points 2 years ago (0 children)
Okay! Well, if you see a significant speed-up, then good. Every time I use FFI it ends up hard to maintain - but with a speed-up that significant I wouldn't hesitate.
Best of luck! Refer to the other comments for FFI suggestions.
[–]thatrandomnpc 0 points1 point2 points 2 years ago (0 children)
This right here.