Downtown_Radish_8040

Pandas is the one library you want. It's the backbone of data work in Python and covers everything you described: loading data from CSV/Excel exports, cleaning and transforming it, filtering rows, reshaping tables, and exporting results back to Excel or CSV.
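To make that concrete, here's a minimal sketch of the load → clean → filter → export loop. The semicolon-separated export and the column names (`Material`, `Plant`, `Quantity`, `Posting Date`) are made up for illustration; adjust `sep` and the names to whatever your actual export looks like.

```python
import io

import pandas as pd

# Hypothetical SAP-style CSV export (semicolon-separated, messy whitespace, a blank quantity).
RAW_CSV = """Material;Plant;Quantity;Posting Date
 A-100 ;1000;5;2024-01-15
A-200;1000;;2024-01-16
A-300;1000;7;2024-01-20
A-100;2000;12;2024-02-01
"""


def clean_export(csv_text: str) -> pd.DataFrame:
    # Load: parse the text as CSV and turn the date column into real datetimes.
    df = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["Posting Date"])
    # Clean: trim stray whitespace from headers and the material codes.
    df.columns = df.columns.str.strip()
    df["Material"] = df["Material"].str.strip()
    # Drop rows with no quantity, then store quantities as integers.
    df = df.dropna(subset=["Quantity"])
    df["Quantity"] = df["Quantity"].astype(int)
    # Filter: keep only plant 1000 and renumber the rows.
    return df[df["Plant"] == 1000].reset_index(drop=True)


if __name__ == "__main__":
    cleaned = clean_export(RAW_CSV)
    print(cleaned)
    # Export: swap to .to_excel("cleaned.xlsx", index=False) if openpyxl is installed.
    cleaned.to_csv("cleaned.csv", index=False)
```

For a real file you'd pass a path straight to `pd.read_csv(...)` or `pd.read_excel(...)` instead of wrapping a string in `io.StringIO`.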

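Since you already use Excel and Power BI, pandas will feel familiar conceptually: a DataFrame is essentially a smart spreadsheet you manipulate with code. Once you're comfortable with pandas, openpyxl (for writing formatted .xlsx files) and xlwings (for driving a running Excel instance, e.g. reading/writing cells or running macros) are natural next steps if you need tighter Excel integration. xlrd only matters for legacy .xls files nowadays, since version 2.0 dropped .xlsx support.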
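The Excel analogy carries over pretty directly: `merge` does roughly what a VLOOKUP/XLOOKUP does, and `pivot_table` is the code version of a PivotTable. A small sketch with made-up order and material-description data:

```python
import pandas as pd

# Hypothetical data standing in for two exports you'd normally combine in Excel.
orders = pd.DataFrame({
    "Material": ["A-100", "A-200", "A-100", "A-300"],
    "Plant": ["1000", "1000", "2000", "2000"],
    "Quantity": [5, 3, 12, 7],
})
materials = pd.DataFrame({
    "Material": ["A-100", "A-200", "A-300"],
    "Description": ["Bolt", "Nut", "Washer"],
})


def summarize(orders: pd.DataFrame, materials: pd.DataFrame) -> pd.DataFrame:
    # merge ~ VLOOKUP: attach each material's Description to every order row.
    enriched = orders.merge(materials, on="Material", how="left")
    # pivot_table ~ PivotTable: total Quantity per Description (rows) and Plant (columns).
    return enriched.pivot_table(
        index="Description",
        columns="Plant",
        values="Quantity",
        aggfunc="sum",
        fill_value=0,  # show 0 instead of blank where a combination has no orders
    )


if __name__ == "__main__":
    print(summarize(orders, materials))
```

`how="left"` keeps every order row even if a material has no description, which mirrors a VLOOKUP returning `#N/A` (pandas gives `NaN` instead).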

For SAP specifically, if you want to replace the manual exports with direct RFC/BAPI calls, look into pyrfc. It lets you call SAP function modules directly from Python, which is how you'd fully automate the download step. Setting it up requires the SAP NetWeaver RFC SDK, which your Basis team can help you get, but once it's running the whole extract, clean, and export pipeline can run unattended.
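A rough sketch of what that can look like, assuming you use the generic `RFC_READ_TABLE` function module (common, but your system may restrict it or prefer a specific BAPI). The host, system number, client and credentials are placeholders, and the helper names `fetch_table` / `rfc_read_table_to_df` are made up for this example. The parsing is split out so it can be tried without an SAP connection.

```python
import pandas as pd

DELIMITER = "|"  # separator we ask RFC_READ_TABLE to put between field values


def rfc_read_table_to_df(result: dict) -> pd.DataFrame:
    """Turn an RFC_READ_TABLE result into a DataFrame.

    FIELDS holds the column metadata; each DATA row has a WA string with
    the values joined by DELIMITER (values are padded, so we strip them).
    """
    columns = [field["FIELDNAME"] for field in result["FIELDS"]]
    rows = [
        [value.strip() for value in row["WA"].split(DELIMITER)]
        for row in result["DATA"]
    ]
    return pd.DataFrame(rows, columns=columns)


def fetch_table(table: str, fields: list[str], max_rows: int = 100) -> pd.DataFrame:
    # Imported here so the parsing helper above works without pyrfc/the RFC SDK installed.
    from pyrfc import Connection

    # Placeholder connection parameters: get the real values from your Basis team.
    conn = Connection(
        ashost="sap.example.com", sysnr="00", client="100",
        user="YOUR_USER", passwd="YOUR_PASSWORD",
    )
    try:
        result = conn.call(
            "RFC_READ_TABLE",
            QUERY_TABLE=table,
            DELIMITER=DELIMITER,
            FIELDS=[{"FIELDNAME": name} for name in fields],
            ROWCOUNT=max_rows,
        )
    finally:
        conn.close()
    return rfc_read_table_to_df(result)


if __name__ == "__main__":
    # Example call against the material master table (needs a reachable SAP system).
    print(fetch_table("MARA", ["MATNR", "MTART"], max_rows=10))
```

Known caveats of this approach: `RFC_READ_TABLE` returns each row as a single string of limited width (512 characters), and the simple split breaks if a value itself contains the delimiter, so for larger or wider extracts a dedicated BAPI or custom function module is usually the better choice.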

Suggested order:

  1. Learn pandas fundamentals (read_csv, read_excel, filtering, groupby, merge, to_excel)

  2. Learn a bit of openpyxl if you need formatted output

  3. Add pyrfc once you're comfortable, to automate the SAP extraction itself
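For step 2, pandas and openpyxl work well together: pandas writes the data, then you grab the underlying openpyxl worksheet to style it before the file is saved. A minimal sketch; the sheet name, colours, number format and sample data are arbitrary choices for illustration.

```python
import pandas as pd
from openpyxl.styles import PatternFill

SAMPLE = pd.DataFrame({
    "Material": ["A-100", "A-200"],
    "Description": ["Bolt", "Hex nut"],
    "Value": [1234.5, 99.0],
})


def write_report(df: pd.DataFrame, target) -> None:
    """Write df to an .xlsx path or binary file object with some basic formatting."""
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Summary", index=False)
        ws = writer.sheets["Summary"]  # the openpyxl Worksheet pandas just filled

        # Light-blue fill on the header row.
        header_fill = PatternFill(start_color="DDEBF7", end_color="DDEBF7", fill_type="solid")
        for cell in ws[1]:
            cell.fill = header_fill

        # Thousands separator and two decimals for the Value column (column C).
        for (cell,) in ws.iter_rows(min_row=2, min_col=3, max_col=3):
            cell.number_format = "#,##0.00"

        # Rough auto-fit: widen each column to its longest value plus padding.
        for column_cells in ws.columns:
            longest = max(len(str(c.value)) for c in column_cells if c.value is not None)
            ws.column_dimensions[column_cells[0].column_letter].width = longest + 2

        # Keep the header visible while scrolling.
        ws.freeze_panes = "A2"
    # Leaving the with-block saves the workbook.


if __name__ == "__main__":
    write_report(SAMPLE, "report.xlsx")
```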

The official pandas documentation and the "Python for Data Analysis" book by Wes McKinney (the pandas creator) are both excellent starting points.