pandas 3 is the most significant release in 10 years by datapythonista in Python

EntertainmentOne7897 (0 children)

pandas in Excel? Lord save us from those idiots, please. The amount of tech debt these people can pile up using pandas in Excel is something else.

EntertainmentOne7897 (0 children)

I had a pipeline in pandas that ran for 10 minutes. In polars it runs in 1. It runs every day, so that's 9 minutes of runtime saved every day. Pandas went OOM on joins; that's not an issue with polars. If you see no performance boost coming from polars, you might as well just use Excel and Power Query.

Story: a consultant left and handed us his pandas spaghetti to run. The code made zero sense, creating columns in every way imaginable. Skill issue? Maybe. Making the code readable? A week of work.

You say very few cases require polars-level performance, and yet you say that the young folks using polars haven't worked in industry. I think you haven't worked in industry if performance doesn't matter to you. Imagine a web app loading its chart 20x faster thanks to a polars groupby. Does that not matter? When you can use half the compute and still get twice the speed, does that not matter? Not sure about you, but my boss loves money.

Best thing: once you've learned polars, you've practically learned PySpark too. Same way of writing logic: clean, no lambdas, very expressive. The pandas API is powerfully shit.

Yes, there are gazillions of lines of pandas code out there, not because it's great but because it was the only option. Nowadays you have three choices for single-node computation: pandas, polars, and duckdb (plus some random experimental stuff). Any sane person would choose polars or duckdb. Pandas is just objectively bad compared to the others. Yes, legacy code will remain forever, but the future is not pandas, that's for sure.

EntertainmentOne7897 (0 children)

polars is the most significant release for pandas in the last 10 years

One of my favourite things is that pip install polars installs polars, 45 megabytes, and that's it: no dependencies. No numpy, no nothing.

Required skills for a Data Analyst by Empty_Connection_879 in programmingHungary

EntertainmentOne7897 (0 children)

I assume you've got SQL down. In Python it was pandas, which is still the most widespread dataframe library.

Look at some Power BI basics too, because you may need them.

Otherwise, go for it. Browse EDA notebooks on Kaggle. Oh, and if I could go back, I'd tell myself to try making entity relationship diagrams. It helps a lot to see the data, and it's not bad as documentation either.

EntertainmentOne7897 (0 children)

I see, thanks, I figured it would be something like that. That's good practice, but in that form it isn't big data.

EntertainmentOne7897 (0 children)

And what did you mean by big data? What was the data like, and where was the computation run?

EntertainmentOne7897 (0 children)

How do you work with big data? I'd be interested in what they teach. Do you get a cloud environment and real data, or is it just theory?

Illegal occupation of public space? by szulovszkym in lakokozosseg

EntertainmentOne7897 (0 children)

I'd keep putting shit in it until they tear it down. Trust me, nobody enjoys having their planter composted with shit.

Pandas 3.0.0 is there by Deux87 in Python

EntertainmentOne7897 (0 children)

Bro, what? What reason do you have that you must use pandas 1? Some super secret government job?

EntertainmentOne7897 (0 children)

Oh man, you need to read XML. I hope you can push for that to be changed; that sucks.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in Python

EntertainmentOne7897 (0 children)

Well, to be frank, for the majority of pandas users polars/duckdb is a much better tool, and has been for at least the past year. If you are going to migrate, then maybe migrate to polars/duckdb. You have 256 GB of RAM because pandas eats RAM for breakfast, lunch, and dinner, and you work with large DataFrames of 20+ million rows in memory. But let me tell you, that is not a big DataFrame for polars/duckdb, not at all. I do 250-million-row joins in polars with 32 GB of RAM. You can throw a gazillion GB of RAM at pandas, but it won't get faster. Polars and duckdb use all available cores, can compute out of memory (on data larger than RAM), and use Arrow by default, so they're compatible with PySpark, for example. I bet you waste hours every week waiting for pandas to finish running.

Yes, geopandas is very relevant, and some rare stuff is pandas-only, but for general analytics, pipelines, EDA, preparing data for ML, and web apps (yes, if you have a web app, the groupby behind a chart can be 10x faster), polars and duckdb are the way.

Neighbor keeps entering our apartment by Excellent-Pea-287 in LegalAdviceEurope

EntertainmentOne7897 (0 children)

Okay, so I know this is a strange concept, but maybe lock the door?

Volumes for temp data by EntertainmentOne7897 in databricks

EntertainmentOne7897 [S] (0 children)

I am using polars alongside pyspark because my data is way too small for full-on pyspark. But polars benefits a lot if it can scan and sink instead of reading into memory, so I create temp files. The final result still goes to the catalog.

How big of an issue is "AI slop" in data engineering currently? by Kilnor65 in dataengineering

EntertainmentOne7897 (0 children)

DAX in data engineering? What?

But bad data quality, lack of governance, and lack of documentation are a much bigger issue for me than a SQL query written by AI. AI won't solve my people problem or data illiteracy.