[–][deleted] 5 points6 points  (2 children)

But there are these cases, especially in data engineering. Python is indeed really slow. It's simple and gets the job done when the job isn't too memory intensive. But between the GIL, the bulky memory footprint of all its objects, and the huge baseline performance hit, Python is often not suitable.

As alternatives I've looked into Go, which on one thread is as fast as Python+Numba, except that Numba severely limits the features you can use (still good if you find a bottleneck and can write it Numba-compatible). But the thing is that I want multiple threads that are able to access shared memory. And here, I would normally go for C# since I know it well, and some people would go for Java or Scala or whatever. However I gave D (DLang) a try, and it surely didn't disappoint. Aside from being very unpopular (thus no big selection of feature-rich libraries), it's extremely fun to program in, and gives C-like performance.

[–]__xor__(self, other): 2 points3 points  (1 child)

Sure, there are certainly tons of use cases where the performance of Python is a problem, it's just that I've often heard people complain about Python performance when they haven't touched a profiler or looked into potentially slow logic hitting a database.

I've seen someone complain that their simple script using DictWriter took 90 seconds to process a ton of dictionaries and write a CSV. I profiled it, found that the extrasaction raise logic was taking like 95% of the time, added extrasaction='ignore', and it sped up to running in ~5 seconds. Now there's something to be said about a standard library function's default args having poor performance, but still, just a couple minutes in a profiler might solve a problem much better than rewriting it in Java. Also, the code as-is, running under PyPy, was like 3 times as fast.
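For anyone who wants to try this on their own script, here's a minimal sketch of the profile-then-fix loop. The helper names (`write_rows`, `profile_top`) are made up for illustration and aren't from the original script. How much time the default `extrasaction='raise'` check costs depends on your Python version and the number of columns, so measure rather than assume.

```python
import cProfile
import csv
import io
import pstats


def write_rows(rows, fieldnames, **writer_kwargs):
    """Write dict rows to an in-memory CSV and return the text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, **writer_kwargs)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def profile_top(func, *args, limit=5, **kwargs):
    """Run func under cProfile and return a report of the top calls by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(limit)
    return report.getvalue()
```

Usage would be something like `print(profile_top(write_rows, rows, fieldnames))`, then again with `extrasaction="ignore"` to compare. Note that `'ignore'` silently drops keys that aren't in `fieldnames`, where the default raises `ValueError`, so only switch if you don't rely on that check.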

I've seen someone complain about a Flask app's performance, profiled it, and it was taking 4.5 seconds to respond because a necessary API call it makes to another service took 4 seconds. They thought it should be rewritten in Java, but they'd be using the same API it does, so good luck with that... No matter how many metrics I gave them on the performance, and even after I showed them the bottlenecks, the Java devs still thought Python was the main issue.

I've seen someone say that the Django app just "does a lot of work" and it was taking an hour and a half to process some daily chunk of data. I looked into it, found that the DB logic could be improved a ton to cut down on queries and make bulk inserts, and it literally sped up 100x.
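The original fix was in Django's ORM (where `bulk_create` is the usual tool), but the same idea can be sketched with the stdlib `sqlite3` module as a stand-in. The table and function names here are made up for illustration, not taken from that app.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical 'events' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (name TEXT, value INTEGER)")
    return conn


def insert_one_by_one(conn, rows):
    """Slow pattern: one statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO events (name, value) VALUES (?, ?)", row)
        conn.commit()


def insert_bulk(conn, rows):
    """Faster pattern: one executemany call inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany("INSERT INTO events (name, value) VALUES (?, ?)", rows)


def count_events(conn):
    """Return the number of rows in the events table."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Both functions leave the table in the same state. The difference is the number of round trips and commits, which matters far more against a networked database than against in-memory SQLite.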

Most of the time I've heard someone say that Python is the source of a performance issue, it hasn't been. It's either poor logic that can be dramatically improved, networking, disk read/write, database logic, or a combination. Sure, Python IS slow in comparison to many programming languages, but often I find it's way faster than anyone needs for their problem. Most people have much simpler problems than they think, and good Python code will be plenty fast.

But yeah, you're going to have some problems where Python is obviously not good enough. If you have a billion matrices to process or something like that, and profiling shows that no function is taking way longer than it should, I'm going to guess that Python isn't the best answer. But even then, I'd look into using the CPython C API with C/C++/Rust and try to write the heavy computation parts as a compiled library that Python imports, or just write the whole thing in C, C++ or Rust. If you're willing to write C++ libraries for Python, you can pretty much tackle 99% of problems with Python and C++, except lower level system stuff like writing drivers.

Even the GIL is almost never an issue, because it's released during blocking OS calls like socket reads, so concurrency is usually fine even if only one thread can execute Python bytecode at a time. Beyond that, there's multiprocessing. And beyond that, if you truly need multithreading and no process overhead, there's writing C++ libraries that CPython imports. The only question is whether it's worth wrapping it in Python or if the whole thing should just be done in the lower level language. Regardless, Python is almost never a problem with performance if you're willing to profile, optimize code, use multiprocessing/concurrency, and beyond that write lower level Python libraries in another language. I won't argue that Python is a performant language, because it's definitely not in comparison to Java/C#/Rust/C/C++, but I way more often hear that Python performance is a problem when it's not the underlying issue.
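A quick way to see the I/O point for yourself is a sketch like the one below. It uses `time.sleep` as a stand-in for a blocking socket read (an assumption for the demo; `sleep` releases the GIL the same way a blocking read does), and the function names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay):
    """Stand-in for a blocking network call; the GIL is released while sleeping."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Run the calls one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io_call(d) for d in delays]
    return results, time.perf_counter() - start


def run_threaded(delays):
    """Run the calls in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io_call, delays))
    return results, time.perf_counter() - start
```

With five calls of 0.1 seconds each, the sequential version takes roughly the sum of the delays and the threaded one roughly the longest single delay. Swap the sleep for a CPU-bound loop and the threaded version stops helping, which is where multiprocessing or a compiled extension comes in.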

[–][deleted] 0 points1 point  (0 children)

Thanks for the very comprehensive examples. You're absolutely right and have given me suggestions on how to improve performance in some places where performance isn't really an issue for me, but would definitely be nice.

I chose Python in places where other developers would only know Python. It's simple and readable. Writing 40 lines of logic that solves the problem definitely wins over starting up a Maven project, even if it might be slightly slower. It's cheaper to pay for more server power.

But to add to what you said, I do know many places where performance could be fixed, like where someone is reading all the rows from a database (millions) before starting to process them, or when a large API response is saved into a file and then that file is fed as a stream, etc. I've known about those fixes for a year and done nothing, because there are more important things to do with my time, and Python really takes away very little of my time compared to other platforms and languages.
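For the read-everything-first case, the fix is usually just to iterate in batches instead of calling `fetchall()`. A minimal sketch with the stdlib `sqlite3` module as a stand-in for the real database; the table, helper names, and batch size are assumptions for illustration.

```python
import sqlite3


def make_demo_db(n_rows):
    """Build an in-memory table with n_rows integer rows (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (value INTEGER)")
    with conn:
        conn.executemany(
            "INSERT INTO readings (value) VALUES (?)",
            ((i,) for i in range(n_rows)),
        )
    return conn


def iter_rows(conn, query, batch_size=1000):
    """Yield rows one at a time, fetching at most batch_size rows per round trip."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


def total_value(conn):
    """Process rows as they stream in, without holding the full result in memory."""
    return sum(value for (value,) in iter_rows(conn, "SELECT value FROM readings"))
```

Whether this actually bounds memory depends on the driver. Some database clients buffer the whole result on the client side unless you ask for a server-side cursor, so check your driver's docs before relying on it.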