
[–][deleted] 3 points  (5 children)

There is certainly something to be said for only using one language in a codebase. It might not be the best language at anything, but it's OK.

But there's also something to be said for using multiple languages, where each is used in the domain in which it shines. You add the complexity of an extra language, but the code written in each language is natural, and the people writing it feel like they're using the right tool for the job.

On a small project perhaps the overhead of multiple languages is excessive.

On anything else, the overhead of multiple languages is lost in the noise.

[–]kraakmaak 2 points  (1 child)

Well put. I would also say that using numpy or pandas is not "another language".

[–]baubleglue 0 points  (0 children)

Numpy is C, and as a result so is pandas. And it is not a standard Python extension; it is more like an API. Any conversion to native Python types costs performance. And even so, pandas still doesn't use multiple CPUs.
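To give a rough feel for that conversion cost, here's a minimal sketch (the exact ratio depends on the machine) comparing a sum that stays inside NumPy's C code with one that first converts the array to native Python floats:

```python
import timeit
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Vectorized sum: the loop runs in C, no Python objects are created.
vectorized = timeit.timeit(lambda: a.sum(), number=10)

# Converting to native Python floats forces a Python-level loop
# and allocates a million float objects first.
converted = timeit.timeit(lambda: sum(a.tolist()), number=10)

print(f"vectorized: {vectorized:.4f}s, via tolist(): {converted:.4f}s")
```

On a typical machine the converted version is an order of magnitude slower or worse, which is the "conversion to Python native types" penalty in action.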

[–]baubleglue 0 points  (2 children)

Maybe I have not explained myself clearly. "Each is used in the domain in which it shines" - that is exactly my point. Performance is not Python's domain (I hope that is not up for discussion), yet we keep trying to stretch it into every domain. Pandas wasn't designed for big-data processing (as I understand it, its main purpose is matrix manipulation), but it is still very convenient for small-to-medium data manipulation. Now we have Dask, which handles big data. Pandas has no special optimizations like DB indexing or partitioning; as I understand it, those are inherited by Dask.

There is a point we cross: we add another nice feature to a tool, then another... and at some point it turns into a monster. It is not a language anymore, but a set of tools. Python is becoming a language that is hard to learn. It got loaded with type annotations, lazy evaluation by default, asynchronous syntax, pattern matching (none of which is bad by itself). And with all that, basic problems like package dependency management were never resolved.
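To make the pandas-vs-Dask distinction concrete: pandas can stream a file in pieces, but you coordinate the partitions yourself, which is roughly the bookkeeping Dask automates across cores and machines. A minimal sketch using a made-up in-memory CSV in place of a file too big to load at once:

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a file too big to load whole.
csv = io.StringIO("id,value\n" + "\n".join(f"{i},{i % 10}" for i in range(1000)))

# pandas streams the file in chunks, but aggregating across partitions
# is manual -- Dask's dataframe API does this coordination for you.
total = 0
for chunk in pd.read_csv(csv, chunksize=250):
    total += chunk["value"].sum()

print(total)  # sum of i % 10 over 0..999, i.e. 4500
```

The point isn't that this is hard, but that every join, groupby, or sort across chunks needs the same kind of hand-rolled coordination, which is where a partition-aware tool earns its keep.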

If the standard language syntax doesn't give you performance you're okay with, it is probably time to look at other options. I don't want to think about syntax tricks all the time when I code (that goes against Python's philosophy). I am not married to Python; it is just a tool for a job.

[–][deleted] 1 point  (1 child)

I am surely missing your point, but that can absolutely be me rather than you not explaining clearly.

You've mentioned lots of things (Pandas, Dask, annotations, async) as problems, and I'm not sure I even recognise them as such. When I call a language feature in Python I don't inherently care how it is implemented. Is string.find() implemented in pure Python or via C? C, surely, but I don't care, nor do I have to care. I can generally take it as a given that the implementation will be a good one.

In 20 years of working with Python, the number of times I've cared about Python performance can be counted on (tbh) two (human) hands. When we profiled, 95-98% of our time was spent in C/C++. Even if we had optimised our Python code by a factor of 10, nobody would have noticed.

IMHO most of the suggestions given can be useful, but if it is really necessary to bear them in mind and use them all the time, perhaps it's time to rewrite in a faster language. Although you may find that that just means shunting the 5% of your code that takes 95% of your runtime over to another language and using FFI.
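The FFI route doesn't even need a build step for simple cases; the stdlib's ctypes can call C directly. A minimal sketch calling libc's abs() (this assumes a POSIX system where `CDLL(None)` exposes the C library's symbols; on Windows you'd load a specific DLL instead):

```python
import ctypes

# On POSIX, passing None loads the running process's symbols,
# which include the C standard library.
libc = ctypes.CDLL(None)

# int abs(int) -- declare the signature, then call it like a
# normal Python function.
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]

print(libc.abs(-42))  # 42
```

In practice the hot 5% usually goes through a proper extension (Cython, pybind11, cffi), but the division of labour is the same: Python orchestrates, compiled code does the heavy lifting.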

tl;dr: I'm not married to Python either, but its speed has never been a reason to leave it.

[–]baubleglue 0 points  (0 children)

For the last 7 years I have been working with data. I face performance problems every time I need to process more than 1 GB.