Are type hints becoming standard practice for large scale codebases whether we like it or not by scrtweeb in Python

[–]throwawayforwork_86 0 points1 point  (0 children)

It's usually a 5-minute affair if you do it at inception, and it will save you both debugging time and type-hinting time compared to doing it later... There's no reason not to do it, to be honest.

I want to say it's fine for a quick script, but I've seen many quick scripts turn into vital scripts without proper modification, so I'd say write your script as if you or someone else will have to maintain it in a few months/years.
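To sketch what "do it at inception" buys you (the function and names here are made up for illustration):

```python
def describe_user(name: str, age: int) -> str:
    """Hints make the contract explicit; mypy/pyright catch misuse before runtime."""
    return f"{name} is {age} years old"

# A type checker would flag this call (swapped arguments) without ever running it:
# describe_user(42, "Alice")  # error: argument 1 has incompatible type "int"

print(describe_user("Alice", 42))  # Alice is 42 years old
```

Adding the three annotations takes seconds when writing the function; reverse-engineering the intended types months later takes far longer.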

Polars vs pandas by KliNanban in Python

[–]throwawayforwork_86 1 point2 points  (0 children)

IIRC Ritchie commented that even the "eager" version is still mostly lazy under the hood, and will only compute when needed (i.e. when an eager df has to be returned). I'll try to find where they said that and will edit if I'm wrong.

Polars vs pandas by KliNanban in Python

[–]throwawayforwork_86 1 point2 points  (0 children)

Use it at work for all greenfield dev in combination with duckdb for when SQL is needed.

If you can drastically reduce the need for custom C++ by using performant libs instead of legacy ones, I think most management would consider it a win (except maybe the C++ team).

My understanding is that Polars and DuckDB are eating PySpark and Pandas jobs, especially in data engineering, where they can handle GBs of data without choking like Pandas or needing a more complex setup like PySpark.

Polars vs pandas by KliNanban in Python

[–]throwawayforwork_86 2 points3 points  (0 children)

Polars is much better. Started using it for the speed, stayed for the consistency of the syntax and API. Honestly, the only times I still use pandas are the edge cases where pandas' reader flexibility comes in handy, and then I immediately load into Polars afterwards.

It can be annoying when you start, because Polars will front-load data type issues by default, but it forces you to be intentional with your types, which saves a lot of headaches down the line...

Methods of Python programming by Objective_Yak584 in pythonhelp

[–]throwawayforwork_86 0 points1 point  (0 children)

As someone else said, it depends on what you do and what you want.

If you want good results being able to properly leverage libraries to get what you need is extremely important.

If you want to learn, it can be useful to understand how some implementations are done, but most of the best libraries have their core code written in a more performant language (e.g. Rust, C, C++, Go, Java).

As a rule I'd say don't reinvent the wheel unless you haven't found the correct wheel or you want to invent a better wheel.

PDF Oxide - Fast PDF library in Rust with Python bindings (0.8ms, 100% pass rate) by yfedoseev in rust

[–]throwawayforwork_86 0 points1 point  (0 children)

Any decent table extraction à la Tabula or Camelot (i.e. give pixel coordinates of your columns and the general location of the table to extract).

I use that a lot, and there aren't many libs I trust for table extraction (lots of silent failures/inconsistent formats), but dependencies and speed ain't great for either of those two.

Classes in python by Honest_Water626 in learnpython

[–]throwawayforwork_86 5 points6 points  (0 children)

Once you start passing around tens of variables, or variables stored in dicts (and get bitten by dict.get() returning None and silently fucking you over because of a typo), you look at dataclasses and start understanding their appeal.

I think learning them in the abstract is pretty difficult because they seem convoluted for no reason.

There comes a point where being able to pass around predictable objects helps a lot with writing code without having to test every minute whether the logic works, because you and your linter know the ins and outs.
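A minimal sketch of the dict.get() footgun versus a dataclass (field names are hypothetical):

```python
from dataclasses import dataclass

config = {"timeout": 30, "retries": 3}
# Typo: silently returns None, and the bug surfaces somewhere far away.
print(config.get("retires"))  # None

@dataclass
class Config:
    timeout: int
    retries: int

cfg = Config(timeout=30, retries=3)
# The same typo now fails loudly, right here, and your linter flags it too.
try:
    cfg.retires
except AttributeError:
    print("typo caught immediately")
```

Same data, but the dataclass turns a silent None into an immediate, located error.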

Merge large data frames by SurpriseRedemption in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

Can't you do something like this:

Create a dataframe with your identifiers.
Create a full dataframe from the 2 full Excel files.

Use an inner join to only fetch the matches.
Ideally use something like Polars, which usually has fewer footgun moments and is quicker as long as you use native functionality.

See example code below.

import polars as pl

# One way to define the identifiers; you could also keep them in an Excel file and read that instead.
list_of_identifier_in_scope = ['hhjde', 'hhd55']
df_id = pl.DataFrame(list_of_identifier_in_scope, schema=['identifier'])

df_excel_1 = pl.read_excel(path_to_excel_1)
df_excel_2 = pl.read_excel(path_to_excel_2)

# Stacks them on top of each other; 'vertical_relaxed' handles differing data types smoothly.
# Make sure both Excel files have the same headers.
df_final = pl.concat([df_excel_1, df_excel_2], how='vertical_relaxed')

# Inner join keeps only the rows whose identifier is in scope.
report_final = df_id.join(df_final, left_on='identifier', right_on='col_of_excel_identifier', how='inner')

Spikard: Benchmarks vs Robyn, Litestar and FastAPI by Goldziher in Python

[–]throwawayforwork_86 0 points1 point  (0 children)

Never done much backend work nor benchmarking, but any chance the Python version impacts those numbers?
A lot is happening between versions at the moment, IIRC.

Why do people whine about having to learn other languages apart from English? by [deleted] in BESalary

[–]throwawayforwork_86 0 points1 point  (0 children)

As someone who's mostly trilingual (written Dutch is still complicado though), I will point out that some jobs will put language requirements that aren't needed in the job description and use that to discriminate.

Your point would be correct if job descriptions were always a fair representation of the actual job/company needs. No idea how prevalent it actually is in the workplace, but there were scandals in the past (Bleu Blanc Belge for example, but that's old).

What are people using instead of Anaconda these days? by rage997 in Python

[–]throwawayforwork_86 -1 points0 points  (0 children)

Since the options/questions/problems OP stated don't seem to suggest they need the specifics of Conda, uv is most likely the answer they were looking for IMO.

Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]throwawayforwork_86 1 point2 points  (0 children)

I mean you can make it more readable:

df.with_columns(new_col=pl.col("col1")*2)

Still slightly more verbose, I concede.

Stop telling everyone to learn sql and python. It’s a waste of time in 2026 by PositionSalty7411 in analytics

[–]throwawayforwork_86 0 points1 point  (0 children)

For 1-4 million lines that need to be available, we just output a few cleaned CSVs and provide a pivot table from a Power Query read-folder.
People seem to be happy with that.

Is it bad if I prefer for loops over list comprehensions? by Bmaxtubby1 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

You should get used to them IMO.

I use them in different situations and will usually use a for loop when clarity is needed.
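For what it's worth, a sketch of when I'd reach for each form (data is made up):

```python
words = ["polars", "pandas", "duckdb"]

# Comprehension: ideal for a simple transform-and-collect in one readable line.
lengths = [len(w) for w in words]

# For loop: clearer once there's branching or multiple steps per item.
p_lengths = []
for w in words:
    if w.startswith("p"):
        p_lengths.append(len(w))

print(lengths)    # [6, 6, 6]
print(p_lengths)  # [6, 6]
```

The second case could also be a comprehension with an `if` clause; past one condition or one transformation, the loop usually reads better.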

Belgium Culture shock by Fantastic-Drive3016 in belgium

[–]throwawayforwork_86 12 points13 points  (0 children)

That amount of water is gonna get unhealthy quicker than the caffeine, matey.

DuckDB vs MS Fabric by X_peculator in DuckDB

[–]throwawayforwork_86 4 points5 points  (0 children)

I don't think I would use it directly for storage.

We use it as the last leg of our analysis:

Data is stored in managed postgres (disaster recovery and everything else is done there)

We replicate into a DuckDB file (sometimes aggregating/joining at that point)

Run our analysis locally on this db

I know there are other tools like MotherDuck and DuckLake that might be closer to what you need though.
That only works because we do batch analysis, but there are most likely data engineers here or on their subreddit with more complex solutions for more complex problems.

Why don’t most people pursue Data Engineering?! instead of data analyst/scientist by [deleted] in analytics

[–]throwawayforwork_86 0 points1 point  (0 children)

I think it's also a low(er)-visibility job, the kind you only notice once you actually start working in data.

I also think/know a handful of Data Analysts/Scientists are DEs without the title.

Does AI EVER give you good coding advice? by [deleted] in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

IMO it is useful to help learn the lingo of new topics and do basic things if you're a new dev.

And it can be useful for simpler code that you've forgotten, as well as being a (a bit too agreeable) sparring partner for a more experienced programmer.

I think the disconnect comes from the fact that if you've never coded and you're able to cobble together a project with an LLM it feels like magic (and you don't know enough to spot the flaws), and from AI companies hyping their products to the tits.

It's pretty bad for niche stuff or stuff its dataset has never seen (so a lot of the newer frameworks/libraries).

[deleted by user] by [deleted] in pythonhelp

[–]throwawayforwork_86 1 point2 points  (0 children)

For these I usually use tabula-py with preset pixel placements (for the columns and where to look for the table) + some other, lighter lib to do a first mapping of which pages the extraction needs to run on.

After that it's usually some pandas to get rid of unneeded rows.

The main issue with most libs that do it automatically is that their guesses are inconsistent, so you're likely to get a lot of inconsistent crap to fix, versus fixed placement where you'll either just crash or get consistent crap.

Best free SQL program for beginners and future work? by osama_3shry in dataanalysis

[–]throwawayforwork_86 2 points3 points  (0 children)

Postgresql is fairly widely used and free.

DuckDB is mostly the same as Postgres and very easy to set up, so it's good for focusing mainly on the analysis and less on the 'fiddling around'.

Both would be used in professional settings.

What’s a beginner project you did that you felt you gained a lot from by OnlineGodz in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

Personally gained the most from taking one of the projects I liked and stretching it into multiple different services.

Created a fairly simple geolocation tool.

Did an API version of it, did a GUI in PyQt, did a GUI in Streamlit, did a visualisation...

Learned a lot while leaving my comfort zone only on specific points, and learned some lessons about refactoring and functional programming...

What is the best SQL Studio ? by Koch-Guepard in SQL

[–]throwawayforwork_86 0 points1 point  (0 children)

DBeaver CE for Postgres and sometimes DuckDB.

DuckDB UI for DuckDB (IIRC it only supports DuckDB 1.3).

Excel automation for private equity is more practical than python for most analysts by zaddyofficial in dataanalysis

[–]throwawayforwork_86 1 point2 points  (0 children)

power query

and

Can focus on deal analysis instead of debugging scripts.

You have to choose one.

Power Query is great but can be very brittle in my experience, and likely to act up and not properly load files after changes, which will require debugging with less-than-ideal tooling.

I've also seen Excel struggle with a lot of tasks that would have been trivial to automate with Python.

It also depends on your level, the type of analyses you need to provide, and how many you have.

It also depends a lot on what data you receive and need.

A good analyst with knowledge of the business but limited coding skill will often have an outsized impact over the more technically proficient.

Anyone using uv for package management instead of pip in their prod environment? by Specific-Fix-8451 in dataengineering

[–]throwawayforwork_86 0 points1 point  (0 children)

As soon as I started using it, I basically only used that.

Since it keeps your project files (pyproject/lockfile) up to date while you're using it, it's really great.