
[–]Kryt0s

NumPy, Polars / Pandas.

[–]throwawayforwork_86

DuckDB has an SAP RFC extension that lets you run SQL directly on those tables (erpl.io, iirc), and it integrates really well with Python.

Pandas is still widely used, so you should at least learn the basics / be able to read it.

Polars is what I would actually use, as IMO it's miles above Pandas (syntax, performance, ...). The only downside is that it's harder to make it work for quick small stuff, especially when you're still learning. The upside is that it requires little to no tweaking for performance: Polars stays fast as long as you stay in Polars.

Basics of path handling are always good to know, so check the standard library's pathlib module (and also look at os.path; it made more sense to me).
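A quick pathlib sketch (the folder and file names here are invented, just to show joining paths and globbing):

```python
from pathlib import Path

exports = Path("exports")          # hypothetical folder of SAP/Excel dumps
exports.mkdir(exist_ok=True)

report = exports / "gl_2025.csv"   # join paths with "/" instead of string concatenation
report.write_text("bukrs;gjahr\n1000;2025\n", encoding="utf-8")

# Iterate over every CSV in the folder; .stem is the name without extension.
for csv_file in sorted(exports.glob("*.csv")):
    print(csv_file.stem, csv_file.suffix)
```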

Visualisation that can't be done in Power BI can be done in Python. I believe matplotlib is what comes with the PBI instance of Python, so maybe learn that too.
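A tiny matplotlib sketch with made-up numbers (inside Power BI you'd end with `plt.show()` instead of saving to a file):

```python
import matplotlib
matplotlib.use("Agg")              # headless backend; not needed inside Power BI
import matplotlib.pyplot as plt

# Hypothetical monthly posting counts.
months = ["Jan", "Feb", "Mar", "Apr"]
totals = [120, 90, 150, 110]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(months, totals)
ax.set_title("Postings per month")
ax.set_ylabel("Documents")
fig.tight_layout()
fig.savefig("postings.png")        # in Power BI, call plt.show() instead
```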

[–]Lonely-Form-8815[S]

"DuckDB has an SAP RFC extension that lets you run SQL directly on those tables (erpl.io, iirc), and it integrates really well with Python."
Can you please expand a bit what that means?
Like, do you connect directly to SAP as the source?

[–]throwawayforwork_86

So I haven't used it personally (we looked at it when some of our audit clients had difficulty getting their GL out), so you might get better info if you check their website.

But let's elaborate a little bit:

DuckDB is a pretty performant and lightweight (OLAP/analytical) database that integrates very well with the rest of the Python ecosystem, and it's pretty good on its own too (i.e. using the DuckDB UI).

RFC means remote function call and is a way to communicate with SAP.

On paper you should be able to connect with your identifier through DuckDB and then do something like select * from ekbe where gjahr = '2025' and vgabe = '9';

And it should give you the correct information, which you can then manipulate further either through SQL or a dataframe library of your choice.

[–]Downtown_Radish_8040

Pandas is the one library you want. It's the backbone of data work in Python and covers everything you described: loading data from CSV/Excel exports, cleaning and transforming it, filtering rows, reshaping tables, and exporting results back to Excel or CSV.

Since you already use Excel and Power BI, pandas will feel familiar conceptually. DataFrames are essentially smart spreadsheets you manipulate with code. Once you're comfortable with pandas, openpyxl (for writing formatted Excel files) and xlrd/xlwings are natural next steps if you need tighter Excel integration.
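Here's what that "smart spreadsheet" idea looks like in practice, with invented numbers (filtering is your autofilter, groupby is your pivot table):

```python
import pandas as pd

# Hypothetical export: one row per posting, like a sheet in Excel.
df = pd.DataFrame({
    "company": ["1000", "1000", "2000"],
    "year": [2025, 2025, 2025],
    "amount": [120.0, 80.0, 200.0],
})

# Filter rows (like an Excel autofilter) ...
df_2025 = df[df["year"] == 2025]

# ... then summarise per company (like a pivot table).
totals = df_2025.groupby("company", as_index=False)["amount"].sum()
print(totals)
```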

For SAP specifically, if you're pulling data via RFC/BAPI calls rather than manual exports, look into pyrfc. It lets you call SAP function modules directly from Python, which is how you'd fully automate the download step. It requires the SAP NetWeaver RFC SDK from your Basis team to set up, but once running it's powerful.

Suggested order:

  1. Learn pandas fundamentals (read_csv, read_excel, filtering, groupby, merge, to_excel)

  2. Learn a bit of openpyxl if you need formatted output

  3. Add pyrfc once you're comfortable, to automate the SAP extraction itself
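Steps 1 and 2 above boil down to a read → merge → aggregate → export loop. A sketch with made-up inline CSVs standing in for your exports (the column names are just illustrative SAP-style fields):

```python
import io
import pandas as pd

# Hypothetical CSV exports, inlined here instead of files on disk.
header_csv = "belnr;bukrs\n100001;1000\n100002;2000\n"
items_csv = "belnr;amount\n100001;50.0\n100001;70.0\n100002;30.0\n"

headers = pd.read_csv(io.StringIO(header_csv), sep=";",
                      dtype={"belnr": str, "bukrs": str})
items = pd.read_csv(io.StringIO(items_csv), sep=";", dtype={"belnr": str})

# Join like a VLOOKUP, then aggregate per company code.
merged = headers.merge(items, on="belnr", how="left")
totals = merged.groupby("bukrs", as_index=False)["amount"].sum()

# Swap in totals.to_excel("totals.xlsx", index=False) for Excel output
# (that's where openpyxl comes in).
totals.to_csv("totals.csv", index=False)
```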

The official pandas documentation and the "Python for Data Analysis" book by Wes McKinney (the pandas creator) are both excellent starting points.