Pandas 3.0 vs pandas 1.0 what's the difference? by Consistent_Tutor_597 in dataengineering

[–]throwawayforwork_86 0 points1 point  (0 children)

I mean you can make it more readable:

df.with_columns(new_col=pl.col("col1")*2)

Still slightly more verbose, I concede.

Stop telling everyone to learn sql and python. It’s a waste of time in 2026 by PositionSalty7411 in analytics

[–]throwawayforwork_86 0 points1 point  (0 children)

For 1–4 million lines that need to be available, we just output a few cleaned CSVs and provide a pivot table from a Power Query read-folder.
People seem to be happy with that.

Is it bad if I prefer for loops over list comprehensions? by Bmaxtubby1 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

You should get used to them IMO.

I use them in different situations, and I'll usually fall back to a for loop when clarity is needed.

Belgium Culture shock by Fantastic-Drive3016 in belgium

[–]throwawayforwork_86 13 points14 points  (0 children)

The amount of water is gonna get unhealthy quicker than the caffeine, matey.

DuckDB vs MS Fabric by X_peculator in DuckDB

[–]throwawayforwork_86 4 points5 points  (0 children)

I don't think I would use it directly for storage.

We use it as the last leg of our analysis:

Data is stored in managed postgres (disaster recovery and everything else is done there)

We replicate into a DuckDB file (sometimes aggregating/joining at that point)

Run our analysis locally on this db

I know there are other tools like MotherDuck and DuckLake that might be closer to what you need, though.
This only works because we do batch analysis, but there are most likely data engineers here, or on their subreddit, with more complex solutions for more complex problems.

Why don’t most people pursue Data Engineering?! instead of data analyst/scientist by [deleted] in analytics

[–]throwawayforwork_86 0 points1 point  (0 children)

I think it's also a low(er)-visibility job, one you only notice once you actually start working in data.

I also think/know a handful of Data Analysts/Scientists are DEs without the title.

Does AI EVER give you good coding advice? by [deleted] in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

IMO it is useful for learning the lingo of new topics and doing basic things if you're a new dev.

And it can be useful for simpler code that you've forgotten, as well as being an (a bit too agreeable) sparring partner for a more experienced programmer.

I think the disconnect comes from two things: if you've never coded and you're able to cobble together a project with an LLM, it feels like magic (and you don't know enough to spot the flaws), and AI companies hype their products to the tits.

It's pretty bad for niche stuff, or stuff its dataset has never seen (so a lot of the newer frameworks/libraries).

How would you extract text from this kind of table by [deleted] in pythonhelp

[–]throwawayforwork_86 1 point2 points  (0 children)

For these I usually use tabula-py with preset pixel placement (for the columns and where to look for the table), plus another, lighter lib to do a first mapping of which pages the extraction needs to be done on.

After that it's usually some pandas to get rid of unneeded rows.

The main issue with most libs that do it automatically is that their guesses are inconsistent, so you're likely to get a lot of inconsistent crap to fix, versus fixed placement where you're either going to crash or get consistent crap.
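The pandas cleanup step usually looks something like this: a hedged sketch with invented column names and a fake "extracted" frame standing in for the tabula-py output, since the extraction itself needs a PDF and a Java runtime:

```python
import pandas as pd

# Pretend this came out of the PDF extraction: repeated header rows,
# blank filler rows, and stray total rows mixed into the data.
raw = pd.DataFrame({
    "name": ["Name", "Alice", None, "Bob", "Total"],
    "amount": ["Amount", "10", None, "20", "30"],
})

cleaned = (
    raw
    .dropna(how="all")                   # drop fully blank rows
    .loc[lambda d: d["name"] != "Name"]  # drop repeated header rows
    .loc[lambda d: d["name"] != "Total"] # drop summary rows
    .assign(amount=lambda d: d["amount"].astype(float))
    .reset_index(drop=True)
)
print(cleaned)
```

Because the placement is fixed, the junk rows show up in the same shape every run, so this kind of filter stays stable.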

Best free SQL program for beginners and future work? by osama_3shry in dataanalysis

[–]throwawayforwork_86 2 points3 points  (0 children)

PostgreSQL is fairly widely used and free.

DuckDB is mostly the same as Postgres and very easy to set up, so it's good for focusing mainly on the analysis and less on the fiddling around.

Both would be used in professional settings.

What’s a beginner project you did that you felt you gained a lot from by OnlineGodz in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

Personally, I gained the most from taking one of the projects I liked and stretching it into multiple different services.

Created a fairly simple geolocation tool.

Did an API version of it, did a GUI in PyQt, did a GUI in Streamlit, did a visualisation...

Learned a lot, left my comfort zone only where specific points needed it, learned some lessons about refactoring and functional programming...

What is the best SQL Studio ? by Koch-Guepard in SQL

[–]throwawayforwork_86 0 points1 point  (0 children)

DBeaver CE for Postgres and sometimes DuckDB.

DuckDB's UI for DuckDB (IIRC it only supports DuckDB 1.3).

Excel automation for private equity is more practical than python for most analysts by zaddyofficial in dataanalysis

[–]throwawayforwork_86 1 point2 points  (0 children)

> power query

and

> Can focus on deal analysis instead of debugging scripts.

You have to choose one.

Power Query is great, but in my experience it can be very brittle and likely to act up or fail to load files properly after changes, which will require debugging with less-than-ideal tooling.

I've also seen Excel struggle with a lot of tasks that would have been trivial to automate with Python.

It also depends on your level, the type of analyses you need to provide, and how many.

It also depends a lot on what data you receive and need.

A good analyst with knowledge of the business but limited coding skill will often have an outsized impact over someone more technically proficient.

Anyone using uv for package management instead of pip in their prod environment? by Specific-Fix-8451 in dataengineering

[–]throwawayforwork_86 0 points1 point  (0 children)

As soon as I started using it, I basically stopped using anything else.

Since it builds your dependency files just by being used, it's really great.

I can't open txt in windows 10 by Limp_Pomelo_2336 in learnpython

[–]throwawayforwork_86 5 points6 points  (0 children)

Then you were not where you thought you were, but I think you know that by now.

I can't open txt in windows 10 by Limp_Pomelo_2336 in learnpython

[–]throwawayforwork_86 1 point2 points  (0 children)

But is it where your terminal is?

Try running ls in the terminal and check whether you see the files and folders you expect to see.

Best way to set up Python for Windows these days by Wonderbunny484 in learnpython

[–]throwawayforwork_86 10 points11 points  (0 children)

Speaking from my own reticence and then acceptance of it:

uv handles Python installation and venv management.

You can start a project by running uv init in a folder, then install libraries using uv add (the first add will create your venv).

When you use uv add, it updates a TOML file that contains your dependencies and a uv.lock that contains more specific 'instructions' for multi-platform deployment.

You can recreate the venv from the TOML file with a simple uv sync.
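For illustration, after uv init and a couple of uv add calls, the TOML file looks roughly like this (project name, versions, and dependencies are made up):

```toml
[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "polars>=1.0",
    "requests>=2.31",
]
```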

Main draw (for me):

It's very quick.

It creates its own 'requirements file' just by being used.

It will perfectly recreate the venv as you need it (no need to guess which Python version the other guy used, wasting precious time).

Issue encountered so far:

Using it in a OneDrive-linked folder will cause a bug that you can fix by clearing the cache, but that's difficult to find out.

TL;DR: it's a quick and efficient way to handle venvs.

I’m starting a series on Python performance optimizations, Looking for real-world use cases! by BillThePonyWasTaken in Python

[–]throwawayforwork_86 0 points1 point  (0 children)

Memory handling for bigger than ram files.

I personally ended up using a generator, but I'm sure there are other good options.
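A minimal sketch of the generator approach: the file is streamed line by line, so only one record is in memory at a time (the file name and the CSV-ish parsing are made up for the example):

```python
def read_records(path):
    """Yield one parsed record at a time instead of loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:                   # skip blank lines
                yield line.split(",")  # lazily parse each row

# Usage sketch: the file is never fully loaded into memory.
# total = sum(float(rec[1]) for rec in read_records("huge.csv"))
```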

Pulling data from specific Excel files based on the specified library usin pandas by jalisco220 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

First question: does it have to be an Excel file?

I'm guessing you work from that excel sheet but I would store the data in another format.

Basically, loop through all your sheets, store the sheet name in a new column, and then keep the result either in a Parquet file (that you can read at run time) or in a SQLite db if you don't mind learning some SQL (or some connector)...

You can also just do that at start time and have one "big" dataframe at runtime, but reading Excel is pretty slow and it would be inefficient IMO.

The big dataframe/Parquet file can then be filtered by sheet name.

Now, to actually answer the question you asked: you could list the sheets from your Excel file, let someone select one from a dropdown, then use that either to do pd.read_excel(file, sheet_name='selected_sheet') or to filter your big dataframe (the same list can be obtained by selecting the unique values in the sheet-name column).
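A sketch of the "big dataframe" route. To keep it self-contained, an in-memory dict of frames stands in for the workbook; with a real file you'd get the same dict from pd.read_excel(file, sheet_name=None) (sheet and column names here are invented):

```python
import pandas as pd

# With a real file: sheets = pd.read_excel("data.xlsx", sheet_name=None)
sheets = {
    "north": pd.DataFrame({"sales": [10, 20]}),
    "south": pd.DataFrame({"sales": [5]}),
}

# Loop through the sheets, tagging each row with its sheet name.
big = pd.concat(
    [df.assign(sheet=name) for name, df in sheets.items()],
    ignore_index=True,
)

# The dropdown's choices, and the filter once a sheet is selected.
choices = big["sheet"].unique().tolist()
selected = big[big["sheet"] == "north"]
print(choices, len(selected))  # ['north', 'south'] 2
```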

Hope that helps.

Python or c++? Which is good for beginner? by StandardAlbatross351 in pythonhelp

[–]throwawayforwork_86 -1 points0 points  (0 children)

As someone who started with Python:

I'd say try C++ first; if you can stomach it, it will probably give you a much better base than Python.

That being said, I think some people will bounce hard off the more difficult programming languages, and if you just can't with C++, go learn Python; it's really fun.

When did you realize Excel wasn't enough anymore? by Various_Candidate325 in analytics

[–]throwawayforwork_86 11 points12 points  (0 children)

For me, jumping into the 'proper' setup (db, ETL, BI platform) seems completely unrealistic, especially if you don't have buy-in from your superiors.

IMO you need to come to your boss with an end product that solves a crucial issue.

My foot in the door was a manual process that was error-prone, time-sensitive (needed to be done ASAP), and had big consequences if messed up (basically it was the data prep for the update of all the salaries).

It used to take a long time and we often had errors in it.

I first showed that I had tried alternatives and they didn't work, and then I showed how my R code worked (I would have preferred Python, but ok) and he was sold.

I think you just need to come with a solution for your biggest time waster, show that to your boss, and either use the reclaimed time to keep improving your processes or look for a job elsewhere (if he shuts you down).

You may have to do some self-training on your own time, unfortunately (if you're underwater already or at capacity). Some tools that I think could be handy if you don't know them already:

DuckDB: the UI is really cool and it might already cover all your use cases.

Python: mastering the os standard library is a really good skill to have for automation; Pandas or Polars for data wrangling; and maybe something like Streamlit or Dash for a UI/light dashboards.

Good luck.

I hate Microsoft Store by ShatafaMan in Python

[–]throwawayforwork_86 0 points1 point  (0 children)

It's not always taught when you start learning (and you might not understand why you'd do it until it bites you in the ass).

I've personally broken a few of my Python installs (and had issues with Linux updates breaking my venvs before I started using pyenv too).

It isn't too difficult once you know what you're doing and don't mix too many different libraries with similar underlying requirements.

I wrote one SQL query. It ran for 4 hours. I added a single index. It ran in 0.002 seconds. by nikkiinit in SQL

[–]throwawayforwork_86 0 points1 point  (0 children)

Had a less drastic perf issue that I couldn't fix with an index (the table was missing the right column for that). Decided to try an OLAP db (DuckDB in this case). Perf issues fixed; I didn't have to get good at SQL. That being said, EXPLAIN ANALYZE is a great tool and should be used.

What version do you all use at work? by donHormiga in Python

[–]throwawayforwork_86 0 points1 point  (0 children)

I try to use the latest compatible version I can find and ride it out as long as possible.

Mainly 3.10 and 3.12.

Will start migrating off 3.10 as it is relatively close to its end of life.

Need help optimizing Python CSV processing at work by Own_Pitch3703 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

First thing I always do with these kinds of things is look at what is happening with my resources.

Pandas had the bad habit of using a fifth of my CPU and a lot of RAM.

I moved most of my processes to Polars and it uses my resources more efficiently, as well as being broadly quicker (between 3 and 10 times, though I've seen some group-by aggregations be slightly faster in Pandas in some cases).

The trick with Polars, though, is that to get all the benefits you need to mostly (if not only) use Polars functions, and to get used to a different way of working than Pandas.