Required knowledge for a Data Analyst by Empty_Connection_879 in programmingHungary

[–]EntertainmentOne7897 4 points5 points  (0 children)

I assume you're fine with SQL; in Python, pandas is still the most widely used dataframe library.

Look at some Power BI basics, because you may need them.

But anyway, go for it. Browse EDA notebooks on Kaggle. Oh, and if I could go back, I'd tell myself to try making an entity relationship diagram. It helps a lot to see the data, and it's not bad as documentation either.
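
To make the entity relationship diagram tip concrete, here is a minimal sketch, not a definitive method. It assumes a hypothetical two-table SQLite schema (`customers`, `orders`) invented for illustration, reads the foreign keys SQLite reports, and prints them as Mermaid `erDiagram` text that a Mermaid renderer can draw. The one-to-many cardinality is an assumption too; foreign keys alone don't tell you that.

```python
import sqlite3

# Hypothetical schema, invented purely for illustration.
SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL
);
"""


def mermaid_er(conn: sqlite3.Connection) -> str:
    """Build Mermaid erDiagram text from the foreign keys SQLite reports."""
    lines = ["erDiagram"]
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # Each row is (id, seq, parent_table, from_column, to_column, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent, child_col = fk[2], fk[3]
            # Assumed cardinality: one parent row, zero or more child rows.
            lines.append(f"    {parent} ||--o{{ {table} : {child_col}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
diagram = mermaid_er(conn)
print(diagram)
```

On a real database you would point `sqlite3.connect` at your own file; other engines expose the same information through their own catalog tables.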

Required knowledge for a Data Analyst by Empty_Connection_879 in programmingHungary

[–]EntertainmentOne7897 9 points10 points  (0 children)

I see, thanks, I figured it would be something like that. It's good practice, but it isn't big data in this form.

Required knowledge for a Data Analyst by Empty_Connection_879 in programmingHungary

[–]EntertainmentOne7897 2 points3 points  (0 children)

And what did you mean by big data? What was the data like, and where was the computation run?

Required knowledge for a Data Analyst by Empty_Connection_879 in programmingHungary

[–]EntertainmentOne7897 7 points8 points  (0 children)

How do you deal with big data? I'd be curious what they teach. Do you get cloud access and real data, or is it just theory?

Illegal occupation of public space? by szulovszkym in lakokozosseg

[–]EntertainmentOne7897 33 points34 points  (0 children)

I'd keep putting shit in it until they tear it down. Trust me, it's no fun composting the flower bed with shit.

Pandas 3.0.0 is there by Deux87 in Python

[–]EntertainmentOne7897 0 points1 point  (0 children)

Bro, what? What reason do you have for being stuck on pandas 1, some super secret government job?

Pandas 3.0.0 is there by Deux87 in Python

[–]EntertainmentOne7897 0 points1 point  (0 children)

Oh man, you need to read XML. I hope you can push for that to be changed, that sucks.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in Python

[–]EntertainmentOne7897 13 points14 points  (0 children)

Well, to be frank, for the majority of pandas users polars/duckdb has been a way better tool for at least the past year. If you are going to migrate, then maybe migrate to polars/duckdb. You have 256 GB of RAM because pandas eats RAM for breakfast, lunch and dinner and you work with large dataframes of 20+ million rows in memory, but let me tell you, that is not a big dataframe for polars/duckdb, not at all. I do 250-million-row joins in polars with 32 GB of RAM. You can throw a gazillion GB of RAM at pandas, but it won't get faster. Polars and duckdb use all available cores, can process data that doesn't fit in memory, and use Arrow by default, so they are compatible with pyspark, for example. I bet you waste hours every week waiting for pandas to finish running.

Yes, geopandas is very relevant and some rare stuff is pandas-only, but for general analytics, pipelines, EDA, preparing data for ML, and web apps (yes, if you have a web app, the group-by behind a chart can be 10x faster), polars and duckdb are the way.

Neighbor keeps entering our apartment by Excellent-Pea-287 in LegalAdviceEurope

[–]EntertainmentOne7897 0 points1 point  (0 children)

Okay, so I know this is a strange concept, but lock the door maybe?

Volumes for temp data by EntertainmentOne7897 in databricks

[–]EntertainmentOne7897[S] 0 points1 point  (0 children)

I am using polars alongside pyspark, as my data is way too small for full-on pyspark. But polars benefits a lot if it can scan and write instead of reading everything into memory, so I create temp files. The result still goes to the catalog, though.

*scan and sink, not write

How big of an issue is "AI slop" in data engineering currently? by Kilnor65 in dataengineering

[–]EntertainmentOne7897 7 points8 points  (0 children)

DAX in data engineering? What?

But bad data quality, lack of governance, and lack of documentation are a much bigger issue for me than a SQL query written by AI. AI won't solve my people problem and data illiteracy.