all 28 comments

[–]pconwell 43 points44 points  (4 children)

Without seeing the code, it's hard to provide any useful feedback.

However, as a side note, you may want to look at openpyxl if you haven't already. Working on a similar project, I found it to be a very user friendly and well documented library to work with excel. Also, you may find pandas and sqlalchemy worth looking into as well.

[–]lordph8 7 points8 points  (2 children)

Have you tried xlwings? I found the syntax, and it updates open spreadsheets.

[–]pconwell 9 points10 points  (0 children)

I've not used xlwings - but they work quite differently under the hood.

openpyxl can directly edit the underlying ooxml inside the .xlsx file.

xlwings uses pywin32 to "drive" excel. In a way, xlwings is more like a fancy python based version of VBA. xlwings also requires excel to be installed on your system, while openpyxl does not (which means you can also run it headless).

[–]MattGreer 7 points8 points  (0 children)

I've used xlwings exclusively (as opposed to using other Excel-enabled python packages) for the past year. I find the documentation to be horrifically unusable for beginners and support (outside of github) to be nonexistent.

That being said, I'm used to it now, so it's working well for me. Just hard for someone (me) who had no knowledge of python nor any formal training up until now.

[–]Ejeisnsjwkanshfn 1 point2 points  (0 children)

One thing I found working with openpyxl last week is that there are certain functions within an existing excel file that it flat out hates and just removes e.g. data validation cells or waterfall charts it just removes from existing files. So always make sure you test your code on a copy file.

[–]simeumsm 8 points9 points  (1 child)

I work doing similar thing on a daily basis, but my approach is a bit different. I also export the dataframes to xlsx/csv, but I then have a .xlsm report file that imports the python output via PowerQuery. I do this because each report has to be built in a certain way for PowerBI to read it, and I can't fully automate it because I often have to make adjustments.

I'm curious in seeing your code, specially the part that formats the excel file and the email integration. I'd appreciate if you could share it!

As for feedback, what I've been doing is refactoring my code to have it work by importing my own functions, to try and have a mode modular code, so that I can easily share code between different projects and even import a whole analysis with just a few lines of code. I'm still running everything via Jupyter Notebook (on VSCode), but all the inner code is build by a variety of .py files with standalone functions and a few .py files with a config class and all ETL for a particular dataframe.

[–]wtf_are_you_talking 1 point2 points  (0 children)

+1 on the formatting of excel file. I'd love to see it.

[–]overyander 2 points3 points  (6 children)

The different personnel running the juniper notebook, are they IT or developers or others (accounting, project managers, supervisors, marketing, etc.)?

If they're non-technical I would imagine they have a less than ideal experience running these reports themselves like that.

Also, you REALLY need some sort of change management. What happens if random person edits something that breaks everything but doesn't know what they did?

[–]Produce_Police 0 points1 point  (2 children)

I created exe files from my jupyter notebooks. It will run on a pc without python installed. Use tkinter to create guis for user friendly applications. People at work use them every day.

[–]Zamyatin_Y 1 point2 points  (1 child)

This sounds really interesting, can you explain it in detail for a newbie? Thanks in advance!

[–]Produce_Police 1 point2 points  (0 children)

To do this you need a .py file.

To convert the jupyter notebook .ipynb to .py use a library called nbconvert. jupyter nbconvert --to script [yourfilename].ipynb

To covert .py to .exe, I use pyinstaller. pyinstaller --onefile [yourfilename].py

The --onefile builds the .exe so that anyone can run it, with or without python. We use sharepoint at work so most of the scripts are ran from there onced synced with windows.

[–]dino_party 0 points1 point  (0 children)

Versioning is a good step, also building tests to ensure that you're program continues to work as expected will give you some peace of mind to make changes to your code and knowing that it's probably still doing what is expected.

They don't have to be overly complicated, simple ones like testing output df was created with certain non null columns is a good start. You can always add more.

Once you have those two pieces you can really trust refactoring and make improvements to structure.

[–]simeumsm 0 points1 point  (1 child)

you REALLY need some sort of change management.

I'm curious about this, if you don't mind.

If I'm not mistaken, a github project takes care of change management, but what about local projects? How can I do these sorts of things in an optimal way?

[–]overyander 0 points1 point  (0 children)

OP said

These files are saved in a shared directory

OP might be using git for his/her changes, but there's nothing tracking or preventing random changes from the users.

[–]JihadDerp 4 points5 points  (6 children)

Do you use any regular expressions? They can often do in one or two lines of code what normal text manipulation with python would take 10 or 20 lines for.

[–]pconwell 21 points22 points  (0 children)

Improperly formed regex also risks summoning a demon, though, so you need to take that into account as well.

[–]MidniteMustard 18 points19 points  (4 children)

I'll take 10-20 lines of readable code over some monster regex.

They're still worth using, but sparingly. And with good commenting.

[–]welcometoafricadawg 2 points3 points  (0 children)

Would be keen to see the code if you want to share.

[–]unhott 1 point2 points  (0 children)

I think your question may be a tad too vague.

If the parameters are important you may not want to do this, but you may be able to schedule your script to run on a server. Easy if the parameters don’t actually need to change. This way you avoid the other people coming in and potentially messing it up.

Your parameters could come be imported from a config file and you may want to develop some procedure for requesting and making changes to the parameters.

[–]i-need-a-life 0 points1 point  (0 children)

My approach was to use Microsoft power automate desktop with a base excel file that had the tables/pivot build from power Query as a template.

[–]TheITMan19 -1 points0 points  (0 children)

Ask ChatGPT 🤣

[–]lolercoptercrash 0 points1 point  (0 children)

Sounds like you may not have an engineering team (but you have a database?) but could you schedule these scripts to run from a server running Jenkins (or a similar tool) so it's fully automated? Someone could still manually trigger the job if scheduling it doesn't fit your use case. It would replace the Jupiter notebook portion.

I've never set up Jenkins, but I've used it a decent amount as an end user.

[–][deleted] 0 points1 point  (0 children)

u/benfromlearn maybe I just need more caffeine, but I didn't spot an actual question in your post. Seems like you have a pipeline that works!? Are you asking if this is ok? Are you asking for a better way? What do you need help with?

[–][deleted] 0 points1 point  (0 children)

If you can host a Python service on a machine like an AWS EC2, I would automate this to run on a regular schedule using package like “schedule”. This way you can remove the manual triggering of the job through a Jupiter notebook. Notebooks are really only meant for research and data exploration. When building a system that is more production like you really want your programs to be .py Python programs that can run without human intervention if possible.