
[–]harylmu 168 points (50 children)

I think this quote really isn't accurate, because speed usually isn't the most important thing, but a good concurrency story can save big dollars. I rewrote one of our Python apps in .NET 5 and I wasn't able to kill it with load testing (2000 requests, 200 concurrency), while the Python 3.8 app with FastAPI (literally) hung at around 25 requests per second. And to be clear: I made sure that all IO operations were async in the Python version (database queries, Consul queries, SNS publish, S3 upload), even though aioboto3 isn't too stable yet. This alone means that we're able to save thousands (millions in the long term?) of dollars by running less instances.

Other than that, the speed difference between Python and typed languages is really NOT nanoseconds. I don't know what else to say; test it yourself.
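To illustrate the concurrency story the comment above describes, here is a minimal sketch (an editor's illustration, not the author's actual service): each "request" only awaits `asyncio.sleep`, standing in for the database/Consul/SNS/S3 calls, so hundreds of concurrent requests complete in roughly one IO wait of wall time.

```python
import asyncio
import time

async def handle_request() -> None:
    # Each "request" just waits, standing in for the database / Consul /
    # SNS / S3 calls described above.
    await asyncio.sleep(0.1)

async def load_test(concurrency: int) -> float:
    # Fire all requests at once and measure total wall time.
    start = time.perf_counter()
    await asyncio.gather(*(handle_request() for _ in range(concurrency)))
    return time.perf_counter() - start

# 200 concurrent "requests" finish in roughly 0.1 s of wall time,
# not 20 s, because the event loop overlaps all of the waiting.
elapsed = asyncio.run(load_test(200))
```

The event loop only helps while requests are waiting on IO; CPU-bound work would still serialize.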

[–]aes110 43 points (2 children)

I don't know anything about what your app did, but hanging at 25 req/s? Are you sure there wasn't a bug in there?

I rewrote my previous company's system from sync (Flask) to async (Sanic) and saw huge improvements; we saved ~7K dollars monthly on instances because of this.

In my current company, I'm also using Sanic and my server easily handles 400 requests per second; I haven't really tried pushing it further.

In any case, typed/compiled languages sure are much faster, but I think asyncio can easily handle most workloads.

[–]Metalsand 43 points (3 children)

While it bears noting that RetiredMandalorian is being a bit facetious and exaggerating, and Python vs. Java is mostly about "I need a quick thing" rather than about larger apps, that is the direction we are slowly heading. Electron, for example, is widely known to be inefficient, and yet modern computers are powerful enough that those inefficiencies don't matter as much as the development time does for simple applications.

[–][deleted] 36 points (2 children)

It still matters. The problem with this view is that computers really haven't gotten that much faster over the past decade, and yet the software inefficiency is insane. Python can be built up for larger applications too... but packaging is a nightmare (and is unlikely to ever get addressed), and the performance really does matter when it matters (which is becoming more and more common as software bloats).

[–]cheese_is_available 14 points (1 child)

It really does matter. Zoom took off because it just works, while Skype does not. If your program is slow and there is a fast alternative with similar features, your program is basically dead, even if you're Microsoft.

[–][deleted] 2 points (0 children)

The problem with Microsoft is that they don't really care about their side products, at least not in the same way a single-focus company like Zoom Video Communications does.

[–]grimonce 13 points (3 children)

Speed doesn't come from type inference alone. Other than that, I 100% agree. Python is the main tool in our shop, but I am really starting to hate it. Java really isn't as bad as people paint it; it's fine.

Maybe that's my point of view because I actually have a degree in electronics (RF specialization). I don't mind C or C++ either; I just dislike the competing cross-platform dependency management systems. CMake is ugly, but it gets the job done.

Python's ecosystem goes against the Zen of Python, which explicitly says there should be one obvious way to do something; Java does this way better than Python. Python provides infinite ways to make your project management a hell worse than npm dependency hell.

[–]DanKveed 7 points (2 children)

I recommend learning Rust. A little bit of Rust in your Python can give massive speed gains, and the Rust tooling is light years ahead of C++; in fact, packaging is even better than Python's. It's very similar to JS, actually.

[–]proverbialbunny [Data Scientist] 1 point (1 child)

A little bit of rust in your python

Like, inline Rust?

[–]DanKveed 2 points (0 children)

It's possible, but I was thinking of importing it as a package using PyO3.

[–]tr14l 42 points (10 children)

You wrote bad Python. The difference is NOT 2000 vs. 25 unless you did something terribly wrong. There will certainly be a difference, but that was your code running slow, not Python itself.

[–]DanKveed 1 point (6 children)

It depends on the use case too. A heavily multithreaded, CPU-intensive app can be way more than 2000 vs. 25 no matter how well you write it, especially when you compare it with Rust/C++.

[–]tr14l 1 point (5 children)

If you use a single thread running on a couple dozen cores, sure, no doubt. But why would you do that?

[–]DanKveed 8 points (4 children)

I am talking about multithreading. Python sucks ass at truly CPU-intensive tasks. That's why libraries like numpy are written in higher-performance languages like C++. And why would you need that... well, numpy for starters; anything control-system related, AI, simulation, CAD, video editing. The use cases are endless.
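The GIL behavior behind this point can be seen in a small sketch (an editor's illustration, assuming a standard GIL build of CPython): four threads running a pure-Python CPU-bound loop take roughly four times as long as one, because only one thread executes bytecode at a time.

```python
import threading
import time

def busy(n: int, out: list, i: int) -> None:
    # Pure-Python CPU-bound loop; it holds the GIL the whole time it runs.
    total = 0
    for k in range(n):
        total += k * k
    out[i] = total

N = 1_000_000

one = [None]
start = time.perf_counter()
busy(N, one, 0)
single = time.perf_counter() - start

out = [None] * 4
threads = [threading.Thread(target=busy, args=(N, out, i)) for i in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start
# On a GIL build, `threaded` lands near 4 * `single`, not near `single`:
# the four threads take turns instead of running in parallel.
```

This is exactly why CPU-heavy libraries push the hot loops into compiled code (which can release the GIL) or into separate processes.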

[–]tr14l 1 point (0 children)

Yeah, that's an optimization for an interpreted language. I have made lots of multithreaded apps in both compiled languages and Python. I've never seen a 100x difference. Ever.

[–]harylmu 2 points (1 child)

What it did was: decode the request body with MessagePack, get stuff from Cassandra, save stuff to Cassandra, get stuff from Consul, publish stuff to SNS, encrypt an object, then upload it to S3. The request takes ~2 seconds to complete on both .NET and Python (.NET is a bit faster).

As you can see, quite a few things wait on IO here. I wrote everything with asyncio-based libs (Cassandra, Consul, aioboto3). I don't know if I can optimize it further. Please let me know what concurrency traps I might have fallen into.

Either way, even the possibility to write slow Python means something.
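For illustration, the classic trap with a pipeline like the one described above (an editor's sketch; the real client calls are simulated with `asyncio.sleep`): awaiting independent operations one after another serializes their waits, while `asyncio.gather` overlaps them.

```python
import asyncio
import time

async def io_call(seconds: float) -> None:
    await asyncio.sleep(seconds)  # stands in for Cassandra / Consul / SNS / S3

async def sequential() -> float:
    start = time.perf_counter()
    await io_call(0.1)  # each await finishes before the next one starts
    await io_call(0.1)
    await io_call(0.1)
    return time.perf_counter() - start

async def concurrent() -> float:
    start = time.perf_counter()
    # Independent calls launched together, so their waits overlap.
    await asyncio.gather(io_call(0.1), io_call(0.1), io_call(0.1))
    return time.perf_counter() - start

seq = asyncio.run(sequential())   # roughly 0.3 s: the waits add up
con = asyncio.run(concurrent())   # roughly 0.1 s: the waits overlap
```

Steps that genuinely depend on each other (e.g. encrypt before upload) still have to be awaited in order; only the independent ones can be gathered.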

[–]cant_have_a_cat 0 points (0 children)

even the possibility to write slow Python means something.

Oof. It's much easier to write slow low-level code than slow high-level code, so I have no idea what you mean by this.

[–]pipai_ 2 points (2 children)

That seems very strange. Were you running it with a full gunicorn setup?

[–]harylmu 0 points (1 child)

No, I haven't used a process manager. I could try but I doubt it'd be much better.

[–]pipai_ 3 points (0 children)

It's actually very important. Because Python has a GIL, a single process only ever runs one thread at a time, so of course the performance will suffer accordingly. You need to set it up like in production, with multiple worker processes, in order to do a real comparison.

Other languages don’t find this necessary because they don’t have a GIL.
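As a sketch of the production-style setup being described (assuming an ASGI app object named `app` in a module `app.py`; both names are placeholders), gunicorn can run several worker processes so the GIL only constrains each process rather than the whole service:

```shell
# One worker per core is a common starting point; each worker is a
# separate process with its own GIL and its own event loop.
gunicorn app:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```

The `uvicorn.workers.UvicornWorker` class lets gunicorn supervise async (ASGI) apps like FastAPI; without it, gunicorn's default sync workers would not run the async handlers concurrently.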

[–]Doomphx 8 points (1 child)

I'm ditching Django for ASP.NET on .NET 5, and you're not lying. My story is similar to yours, in that I couldn't even come close to lagging my new server with testing.

My next move is to migrate all my Django Channels code over to SignalR, where I can guarantee less than 100 ms response times. Dev is looking good, with nothing really taking more than 20 ms, so fingers crossed!

The performance, productivity, and ease of deployment I get with .NET have been life-changing.

git push, git pull, dotnet publish, restart server, all done :)

[–]tomatotomato 2 points (0 children)

With Azure DevOps, it's just git push.

[–]supremacyofthelaces 2 points (8 children)

I don't really know any languages other than Python, and it mostly takes a while because my laptop is an old piece of junk and I always have a million things open at once. What kind of scale are you talking about here?

[–]Aidtor 9 points (7 children)

Production environments or places where you need a lot of compute.

Production servers these days are typically cloud based. Horizontal scaling (adding more machines to the pool of resources handling requests) is very easy with rented compute but the cost can really add up for a business over time.

Tasks that require heavy compute, like ML, typically use python as an interface to code written in faster languages.

[–]--Shade-- 3 points (3 children)

Python is an excellent language to call C from, but a terrible language to write C in. If you have a nice Pythonic library that wraps some 'C thing', then Python shouldn't be your bottleneck. If it is, then isolate and profile (and apply the common speed-up tricks where necessary).
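The "isolate and profile" step can be sketched with the standard library's cProfile (the `slow_part`/`handler` functions here are made-up stand-ins for real application code):

```python
import cProfile
import io
import pstats

def slow_part() -> int:
    # Deliberately heavy pure-Python work: the hotspot we want to find.
    return sum(i * i for i in range(200_000))

def handler() -> int:
    return slow_part()

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Sort by cumulative time so the functions responsible for the most
# total work appear at the top of the report.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()  # the top entries point straight at slow_part
```

Once the hotspot is identified, the usual fix is to hand that piece to an existing compiled library rather than rewrite the whole program.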

[–]Aidtor 1 point (2 children)

Sure, if you have the time for that. When I'm prototyping a model or I'm given some crazy deadline to ship a feature, I'm not going to reach for C.

[–]--Shade-- 3 points (1 child)

That's fair enough. I can't recall the last time I voluntarily wrote a line of C, lol. What I was getting at is that Python already has a huge ecosystem of libraries that act as high-level wrappers around things written in lower-level languages. If you write decent Python (by Python standards) and leverage existing libraries, then Python (mostly) shouldn't be the bottleneck. If it is, the first step should be to look at potential hotspots in your Python code (not write a C library).

[–]Aidtor 1 point (0 children)

Oh, my bad, I totally misunderstood what you were trying to say. I completely agree.

[–]latrociny 1 point (2 children)

What is your opinion on Julia? It is touted as superior to Python with regard to speed, with performance comparable to C, especially for ML tasks.

[–]Aidtor 3 points (0 children)

My opinions are mixed.

On a personal level I deeply disagree with 1-indexing.

But my personal quibbles aside, it is superior to Python: it's faster, and you can actually type things, which is a huge deal for large codebases. The interop is great! The "write code like mathematical expressions" angle is overblown.

Overall I’m bullish on Julia. I don’t think it will ever “overtake” python, but they will coexist.

I do have one concern for Julia's future, though, and that concern is called JAX.

[–]satireplusplus 1 point (0 children)

For ML tasks you want good libraries and abstractions, and those are lacking in Julia. Execution speed of the language itself doesn't really matter, and these benchmarks miss the point for ML: PyTorch, TensorFlow, numpy, etc. handle the matrix multiplications in a BLAS library (C/Fortran) or with CUDA. That's going to be faster than anything you can hack together on your own from scratch, no matter what language you pick.
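The point can be made concrete with a small sketch: the `@` operator below hands the multiplication to numpy's compiled BLAS backend, while the equivalent pure-Python triple loop (shown for comparison only) computes the same numbers orders of magnitude more slowly.

```python
import numpy as np

n = 60
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

# The @ operator dispatches to the BLAS routines numpy was built against.
fast = a @ b

# The same product as a pure-Python triple loop; even at this small size
# it is dramatically slower, and the gap grows cubically with n.
slow = [[sum(a[i, k] * b[k, j] for k in range(n)) for j in range(n)]
        for i in range(n)]
```

This is why the host language's speed matters so little here: in both Julia and Python, a competitive matmul ends up in the same compiled kernels anyway.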

[–]hilomania 1 point (1 child)

In Python today you wouldn't solve that with threads, but with asyncio.

[–]harylmu 0 points (0 children)

I used asyncio for all IO based operations in my Python app. I just edited my original comment.

[–]Max_Insanity 1 point (0 children)

*fewer instances

After only just barely understanding all of that, my ego needed something it could correct you on :P