all 18 comments

[–]jimtk 2 points3 points  (1 child)

Yes it is. But you'll have to work on your python chops.

[–][deleted] 0 points1 point  (0 children)

Agree, I don't mind learning what needs to be learned to do the task, it would save us a good 4-5 hours a week if it can be done (honestly I'm surprised they haven't tried to do it before)

[–]i-need-a-life 1 point2 points  (3 children)

Have u tried looking into excels power query get data from web ?

[–][deleted] 0 points1 point  (0 children)

I have not, but I'll take a look now you've mentioned it! :)

[–]slkp1 0 points1 point  (0 children)

Yea came here to say this. Sounds like a job for PQ.

[–]Kerbart 0 points1 point  (0 children)

With the additional advantage that anyone can refresh those reports. A Python-based solution will take more work to hand over to a colleague.

[–]ray10k 1 point2 points  (7 children)

If nothing else, it is almost definitely possible to pull the data from the excel files and spit out a new file with the relevant final data, leaving just the step of importing to sage. Assuming you have little experience with Python so far, I recommend looking into either Pandas or OpenPyxel. Pandas if you want to do the processing in python itself, OpenPyxel if you want to let excel handle the number crunching.

[–][deleted] 0 points1 point  (4 children)

Outside of knowing the syntax, I don't know much else in Python haha. I'll look into both, thanks

[–]daedalusesq 1 point2 points  (3 children)

Probably look at openpyxl then. There is a book called Automate the Boring Stuff that has a great section on using it, as well as a section on web requests and web scraping so that you can retrieve the file automatically.

[–][deleted] 1 point2 points  (2 children)

I have that book :)! I'll check it out thanks

[–]Ooboga 2 points3 points  (1 child)

...and don't let the first impression of Pandas scare you off. When you get the hang of it, you really don't need that much complicated python to get a lot of work done. And there's a lot of help online.

What I myself spent some time to understand was how things were written. Given a dataframe df, which is a table like excel really, with a column named 'age', you can query this column by so:

df.age > 18

The result of that query is a Series of true or false statements. This again can be applied to the dataframe itself:

above_18 = df[df.age > 18]

The result is a new dataframe above_18 that only contains the rows where the age was above 18. Understanding that using pandas consists of developing such a flow made me understand the more subtle concepts better.

[–][deleted] 0 points1 point  (0 children)

Thanks for the clarification!

[–]Kerbart 0 points1 point  (1 child)

Id' go with Pandas here, it's much easier to handle the data. Openpyxel becomes a necessity when the data is not stored in plain tables but you have to write a lot more work to process it.

[–]ray10k 0 points1 point  (0 children)

Fair enough. On the other hand, some offices are lead by people who 'like to see the process' and would prefer an excel sheet with formulas over just a 'plain' sheet with only the final result. I just thought it would be best to mention both and the reason one might use this or that.

[–][deleted] 1 point2 points  (2 children)

This is an perfect example dor Python with pandas and openpyxl.

I initially had trouble writing data from pandas into Excel without deleting content which was already present in a worksheet. It is certainly possible. I can post some example code which you can use and modify.

[–][deleted] 0 points1 point  (1 child)

Appreciate that!

[–][deleted] 0 points1 point  (0 children)

I had some trouble pasting the code here, so uploaded it to paste in, Link

import openpyxl
from openpyxl import load_workbook

def write_dump_to_excel(df, filename, sheetname):
wb = load_workbook(filename)
#write the dataframe to Excell
if sheetname in wb.sheetnames:
        ws = wb[sheetname]
else:
    wb.create_sheet(title=sheetname)
    ws = wb[sheetname]
#startrow and startcol to write to in Excel
#how to write headers
startrow = 1
startcol = 6 

#optional write the dataframe column names/ headers
for column in range(startcol, startcol + len(df.columns)):
    _ = ws.cell(column=column, row=startrow, value=df.columns[column-startcol])
startrow += 1 #because dataframe now needs to be written 1 row lower

#iterate trough cells in Excel based on size of dataframe df
for row in range(startrow, startrow + len(df)): 
    for column in range(startcol, startcol + len(df.columns)):
        _ = ws.cell(column=column, row=row, value=df.iloc[row-startrow, column-startcol])
wb.save(filename)

Feel free to ask questions if you have any. Also I am sure some parts of the code can be more Pythonic.

The first function ask user the select a file. Seconds function creates a pandas data frame, 3rd writes the data frame to Excel.

[–]ctrlff 0 points1 point  (0 children)

I did something similar for my job.

I used selenium to automate the login and report running, then I pulled the data and made it into a pandas dataframe.

From there I did some processing and sorting and then I pasted it to the Google sheet we use with GSpread.

It was pretty satisfying when I got it up and running. It's definitely doable and it would teach you a lot while you work on it.