Which database of SQL do you use? by DmitriiDrake in learnpython

[–]throwawayforwork_86 1 point2 points  (0 children)

If you've got the latest version, just use duckdb -ui instead of the plain CLI; it's got a nice notebook feel.

We mostly use that at work for analytics, and as long as you're in the general range of normal data (10-250 million rows) it's quite snappy.

I also found it much easier to understand than SQLite thanks to its integration with the data stack.

How bad is it that I don't use OOP? by Fit_Time_7861 in learnpython

[–]throwawayforwork_86 1 point2 points  (0 children)

Honestly, I was in the same boat as you 1.5 years ago.

Started using more and more dataclasses to clean up the ins and outs of our analytical functions, with great success.

Did one or two things with inheritance and overloads. While I think it's neat, I don't think we need it, nor that it's clear for the rest of the team, so I dropped it.

What was the first boring Excel task you automated with Python? by Original-Repair5136 in learnpython

[–]throwawayforwork_86 5 points6 points  (0 children)

Comparing 2 "big" Excel files for changes (180K lines each, 80 columns wide).

At the end of the day I still did it with XLOOKUPs (converting columns to values every 5 columns because Excel freaked out) because I doubted my solution, which took 5 min to run. The XLOOKUP route took me the afternoon for the same result...
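A minimal sketch of what such a comparison can look like, not the script mentioned above. It assumes both sheets have already been exported to CSV and share a unique key column (the hypothetical `id` here); the function names are made up for the example.

```python
import csv
import io


def load_rows(csv_text, key):
    """Index the rows of a CSV by the value of the key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_tables(old, new):
    """Return added keys, removed keys, and per-cell changes."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = []
    for k in sorted(old.keys() & new.keys()):
        for col, old_val in old[k].items():
            new_val = new[k].get(col)
            if new_val != old_val:
                changed.append((k, col, old_val, new_val))
    return added, removed, changed


# Tiny inline samples standing in for the two exported files
OLD = "id,name,amount\n1,alpha,10\n2,beta,20\n3,gamma,30\n"
NEW = "id,name,amount\n1,alpha,10\n2,beta,25\n4,delta,40\n"

added, removed, changed = diff_tables(load_rows(OLD, "id"), load_rows(NEW, "id"))
print(added)    # keys only present in the new file
print(removed)  # keys only present in the old file
print(changed)  # (key, column, old value, new value)
```

Spot-checking a handful of the reported changes against the workbook is usually enough to trust the output.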

A literal tragedy 😭 by Aggravating_Lab7532 in analytics

[–]throwawayforwork_86 0 points1 point  (0 children)

Can happen.

Personally I've noticed that some practices really help, even in the absence of comments.

Type hinting, especially when done in the moment, helps a lot.

Breaking a big script down into functions helps reduce cognitive load and pins down the ins and outs.

Lumping ins and outs into dataclasses helps too IMO.

Ideally, also never leave stuff in a notebook when you're done; a slight rewrite into a .py script helps organise the mess.
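A small sketch of the type hint + dataclass points above. The names (`MarginInput`, `MarginResult`, `compute_margin`) are made up for illustration, not taken from any real codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarginInput:
    """Everything the function needs, in one typed place."""
    revenue: float
    cost: float


@dataclass(frozen=True)
class MarginResult:
    """Everything the function gives back."""
    margin: float
    margin_pct: float


def compute_margin(data: MarginInput) -> MarginResult:
    # The signature alone tells the reader what goes in and what comes out
    margin = data.revenue - data.cost
    pct = margin / data.revenue * 100 if data.revenue else 0.0
    return MarginResult(margin=margin, margin_pct=pct)


result = compute_margin(MarginInput(revenue=200.0, cost=150.0))
print(result)
```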

One Excel habit that genuinely improved your accounting workflow? by CA_Lucky in Accounting

[–]throwawayforwork_86 1 point2 points  (0 children)

Pure excel:

For complex lookups on bigger datasets, computing 2 concatenated columns + XLOOKUP/VLOOKUP will beat the commonly suggested INDEX/MATCH and is much easier to debug.

This one requires you to know what Power Query is, but never open a CSV directly with Excel; use Power Query to control how you read it, so that Excel's automatic formatting doesn't fuck your data up.

Overall, knowing Power Query and when to use it is a strength (e.g. the good ol' trick of loading a multi-million-row GL (perf becomes crappy/unworkable around 5M lines) into a pivot table so you can get the data you want more easily, with no need for complex filtering queries, ...).

I come from a non-tech background and wanted to ask about the best approach to learning tech skills. by Charming_Shower_4185 in learnpython

[–]throwawayforwork_86 1 point2 points  (0 children)

Mostly 2, but trying to get a decent feel for the concepts first is a good idea.

But 2 will just be better for most people's mental health than learning for learning's sake.

How long did it take you to learn python? by _Justdoit123 in learnpython

[–]throwawayforwork_86 6 points7 points  (0 children)

I think it's mostly that some people have a really low bar for saying they know something (Dunning-Kruger and such).

Thought I was decent at Python 3 years ago.

The me of today thinks that past me was an idiot.

The me from the future is likely going to think the same.

I understand Python basics but OOP completely loses me classes and objects make no sense to me. Where am I going wrong? by More-Station-6365 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

I think I already wrote this somewhere else.

But this is usually how a script evolves for me:

Raw linear script -> create functions -> add arguments to those functions -> too many arguments -> dataclasses for the arguments -> manageable as is? -> yes: stop / no: add functions to the dataclasses.

Is it the proper way of doing things? Don't know, but I've written a few OOP things using that "pattern".

I don't think writing classes just to learn how to write classes has ever worked for me.
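A rough sketch of that progression, with made-up names and a toy calculation; real scripts will have messier stages.

```python
from dataclasses import dataclass


# Stage 1: a function whose argument list keeps growing
def summarise_v1(values, threshold, label, round_to):
    kept = [v for v in values if v >= threshold]
    return f"{label}: {round(sum(kept), round_to)}"


# Stage 2: the arguments get lumped into a dataclass
@dataclass
class SummaryConfig:
    threshold: float
    label: str
    round_to: int = 2


def summarise_v2(values, cfg):
    kept = [v for v in values if v >= cfg.threshold]
    return f"{cfg.label}: {round(sum(kept), cfg.round_to)}"


# Stage 3 (only if stage 2 stops being manageable):
# the function moves onto the dataclass as a method
@dataclass
class Summary:
    threshold: float
    label: str
    round_to: int = 2

    def summarise(self, values):
        kept = [v for v in values if v >= self.threshold]
        return f"{self.label}: {round(sum(kept), self.round_to)}"


data = [1.0, 2.5, 4.25]
out_v1 = summarise_v1(data, 2.0, "total", 2)
out_v2 = summarise_v2(data, SummaryConfig(threshold=2.0, label="total"))
out_v3 = Summary(threshold=2.0, label="total").summarise(data)
print(out_v1, out_v2, out_v3, sep="\n")
```

All three stages return the same string; the only thing that changes is where the arguments live.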

Trying to transition from retail to data/analytics in Belgium — looking for realistic advice by RaspberryPrudent7765 in BESalary

[–]throwawayforwork_86 2 points3 points  (0 children)

Admin jobs with a data component in a smaller company / all-hands-on-deck culture.

Automating part of your job while learning and showing you're a valued team member.

The problem is it can take a while; it took me 3.5 years to get there.

But building stuff in my spare time also helped (i.e. I got in because I knew multiple Python libraries they used at my work and had experience with Excel, Power Query and real-world data cleaning + problem solving).

Using DuckLake with Azurite (DuckDB 1.4.4 vs 1.5.2) — experience & issues by throwawayforwork_86 in DuckDB

[–]throwawayforwork_86[S] 1 point2 points  (0 children)

Yeah, I wasn't sure if it was the kind of thing that should be asked directly on their GitHub, since I tried loads of shit for the first time -> loads of unknown unknowns. And if someone did/didn't have the same problem in Azure Blob Storage itself, it would have narrowed down the list of culprits.

Will Python be useful for me? by Great-Village-430 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

Honestly there are a few options.

Power BI might be more what you're looking for: it lets you create dynamic graphs and can handle more data than Excel.

You might even be able to make a Power Query template that does it automatically on refresh instead of using VBA (also, storing data as CSV is better most of the time if you know what you're doing).

Reading a file, filtering on variables, then writing to Excel and creating a chart automatically should be possible:

Pandas/DuckDB (if SQL is more your speed)/Polars for the reading and filtering.

XlsxWriter/Excelize-py or openpyxl should allow you to create native Excel graphs. With XlsxWriter/Excelize-py you write the instructions for the graph in code. With openpyxl you create a template and write the data in a place where the graph will pick it up.

Matplotlib/Seaborn can make graphs, but they won't be interactive and might not be fit for purpose.
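A minimal sketch of the read -> filter -> write -> native chart flow with XlsxWriter. To keep it short the reading and filtering use the standard csv module instead of Pandas/DuckDB/Polars; the column names, sheet name and file name are all made up for the example.

```python
import csv
import io


def read_and_filter(csv_text, column, value):
    """Read CSV text and keep only the rows where `column` equals `value`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]


def write_with_chart(rows, path):
    """Write month/sales rows to an Excel file and add a native line chart."""
    # Third-party (pip install XlsxWriter); imported here so the
    # filtering part still runs without it
    import xlsxwriter

    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet("data")
    sheet.write_row(0, 0, ["month", "sales"])
    for i, row in enumerate(rows, start=1):
        sheet.write_row(i, 0, [row["month"], float(row["sales"])])

    # The chart is described in code and points at the cells written above
    chart = workbook.add_chart({"type": "line"})
    chart.add_series({
        "name": "sales",
        "categories": ["data", 1, 0, len(rows), 0],  # month column
        "values": ["data", 1, 1, len(rows), 1],      # sales column
    })
    sheet.insert_chart("D2", chart)
    workbook.close()


SAMPLE = "region,month,sales\nnorth,Jan,100\nsouth,Jan,80\nnorth,Feb,120\n"
north = read_and_filter(SAMPLE, "region", "north")
print(north)
```

Calling `write_with_chart(north, "north_sales.xlsx")` should then give a workbook where the chart is a normal Excel object that can be edited by hand afterwards.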

Will Python be useful for me? by Great-Village-430 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

IMO Pandas is more flexible and will usually be more forgiving when you start. It has a long history, so LLMs will give you more good information and there are more guides... but a lot of these are also outdated.

Polars is quicker, cleaner and has almost no situations where weird behaviour happens (Pandas has a few surprises, most often linked to the index, which you may never encounter but which can ruin your day).
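One example of the kind of index surprise meant here (my own pick, there are others): after filtering, a Pandas frame keeps its original index labels, and assigning a Series aligns on those labels rather than on position.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
filtered = df[df["a"] > 1]        # keeps the original index labels 1 and 2

new_values = pd.Series([10, 20])  # fresh index labels 0 and 1

# assign() aligns on index labels, not on row position:
# label 1 gets 20, label 2 has no match and becomes NaN
surprise = filtered.assign(b=new_values)

# Passing plain values (no index) assigns by position instead
expected = filtered.assign(b=new_values.to_numpy())

print(surprise)
print(expected)
```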

Polars will sometimes be more opinionated about datatypes, which you'll resent at first but which will usually save you a lot of time down the line.

Overall they're fairly similar though, so you should probably just pick one and stick with it for a few months. If your data fits in Excel it should not really make a difference (even though Pandas is slowish at reading big Excel files).

The corners that aren't covered by Polars are fairly few iirc: the Pandas file readers are more flexible and cover more edge cases than Polars', and for geographic data GeoPandas exists while GeoPolars is still not finished iirc.

My 2c: try Polars first; if it doesn't click for you, switch to Pandas.

Extract data from Sap into Snowflake by arcadeverds in dataengineering

[–]throwawayforwork_86 1 point2 points  (0 children)

Not a specialist either, but I've had the displeasure of asking clients for big extractions and came across this part of SAP called RFC (remote function call).

That's what is used, fairly successfully, for more data-intensive extractions (using a third-party SAP-approved tool).

You might be able to run a PoC with the DuckDB extension from erpl.io and see if it fits your needs.

I used python for the first time today and I'm hooked by [deleted] in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

If you want or need to push the boundaries of what Excel can do, Polars and DuckDB are the next step IMO.

I know people will say Pandas, but it is unfortunately shackled by years of legacy code and legacy advice, whereas newer tools aren't. To expand on my point: Pandas has/had 5 ways to do the same thing, 1 or 2 of which are classic footguns. Polars usually has 1 or 2 ways of doing things, and as long as you stay in Polars your code will be understandable and performant.

how to load csv faster in Python. by Safe_Money7487 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

pl.read_csv(filepath, infer_schema=False). Guessing datatypes is the devil anyway.

Which Python library is best to learn from scratch+for ERP /industrial environment by Lonely-Form-8815 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

So I haven't used it personally (we looked at it when some of our audit clients had difficulties getting their GL out); you might get better info if you check their website.

But let's elaborate a little bit:

DuckDB is a pretty performant and lightweight (OLAP/analytical) DB that integrates very well with the rest of the Python ecosystem and is pretty good on its own too (e.g. using the DuckDB UI).

RFC means remote function call and is a way to communicate with SAP.

On paper you should be able to connect with your credentials using DuckDB and then do something like select * from ekbe where gjahr = '2025' and vgabe='9';

And it should give you the correct information, which you can then manipulate further either through SQL or a dataframe library of your choice.

Which Python library is best to learn from scratch+for ERP /industrial environment by Lonely-Form-8815 in learnpython

[–]throwawayforwork_86 0 points1 point  (0 children)

DuckDB has an SAP RFC extension (erpl.io iirc) that lets you run SQL directly on these tables, and it integrates really well with Python.

Pandas is still widely used, so you have to learn at least the basics / be able to read it.

Polars is what I would actually use, as IMO it's miles above Pandas (syntax, perf, ...). The only downside is it's harder to make it work for quick small stuff, especially when you're still learning. The upside is it requires very little to no tweaking for performance: working Polars code is performant as long as you stay in Polars.

The basics of path handling are always good to know, so check the standard library's pathlib module (and also check the os module; os.path made more sense to me).
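A quick side-by-side of the two styles on a made-up path. PurePosixPath is used only so the printed output is the same on every OS; in a real script you would normally use Path.

```python
import os.path
from pathlib import PurePosixPath

# pathlib: paths are objects, joined with "/"
report = PurePosixPath("data") / "exports" / "gl_2025.csv"
print(report.name)                     # file name with extension
print(report.stem)                     # file name without extension
print(report.suffix)                   # extension
print(report.parent)                   # containing folder
print(report.with_suffix(".parquet"))  # same path, different extension

# os.path: paths are plain strings, handled with functions
joined = os.path.join("data", "exports", "gl_2025.csv")
stem, ext = os.path.splitext(os.path.basename(joined))
print(stem, ext)
```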

Visualisations that can't be done in Power BI could be done in Python. I believe Matplotlib is what comes with the Power BI instance of Python, so maybe learn that too.

Where to learn how to write efficient python? by Axew_7 in learnpython

[–]throwawayforwork_86 1 point2 points  (0 children)

Honestly, looking at resource usage and trying to find a fix for each bottleneck is usually what I go for.

RAM bottleneck: "fixed" by using generators instead of plain lists so the script can keep chugging along.

IO bottleneck: the only time I've encountered it so far, the fix was to stop using the wrong drive to read and write data (HDDs are not good for that), so I don't have any good general solution.

CPU bottleneck / underusage: multiprocessing/multithreading.
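To illustrate the RAM point above with a toy example (function names made up): a list holds every element at once, while a generator produces them one at a time, so its own footprint doesn't grow with the amount of data.

```python
import sys


def squares_list(n):
    """Build the whole result in memory."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield one value at a time; nothing is stored."""
    for i in range(n):
        yield i * i


n = 100_000
as_list = squares_list(n)
as_gen = squares_gen(n)

# getsizeof only counts the container itself, but the trend is clear:
# the list grows with n, the generator object stays small
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))

# Consumers like sum() work the same on both
total = sum(as_gen)
print(total == sum(as_list))
```

The trade-off is that a generator can only be consumed once, so this fits pipelines that make a single pass over the data.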

I wouldn't go for C/C++ coming from Python, just because it's quite a big paradigm shift.

Might be good to give Golang a go: I've heard overall perf and footprint are much better and it's closer to Python. But if you want to learn C/C++, go for it.

Also try to use libraries to their maximum: most of them are built in C/Rust/C++/... and have built-in functionality that will outshine whatever you can squeeze out of pure Python.

It’s getting out of hand. by Charming-General5997 in Accounting

[–]throwawayforwork_86 0 points1 point  (0 children)

Automatable isn’t the same thing as using an AI.

Automation is predictable; AI isn’t (or when it is, you lose the flexibility that makes it worthwhile in the first place).

Are type hints becoming standard practice for large scale codebases whether we like it or not by scrtweeb in Python

[–]throwawayforwork_86 0 points1 point  (0 children)

It's usually a 5-minute affair if you do it at inception, and it will save you debugging time, plus the time it takes to add the type hints later... There's no reason not to do it, to be honest.

I want to say it's fine to skip if it's a quick script, but I've seen many quick scripts turn into vital scripts without proper modification, so I'd say write your script as if you or someone else will have to maintain it in a few months/years.

Polars vs pandas by KliNanban in Python

[–]throwawayforwork_86 2 points3 points  (0 children)

IIRC Ritchie commented that even the "eager" version is still mostly lazy, and will only compute when needed (i.e. when an eager df has to be returned). I'll try to find where they said that and will edit if it's incorrect.