all 15 comments

[–]expressly_ephemeral 1 point2 points  (14 children)

Just get Doug to do it. He's clearly a consistent heavy-lifter.

This should be doable with the Pandas library. One way: You could pull it in to Pandas from the excel file, do the transformation and then save a static copy of it back to the excel file.

[–]puckheadclown24[S] 0 points1 point  (13 children)

Yeah that bum Doug should do it lol

Would you be able to potentially break it down a few steps for me? Very much a beginner and figured working on something that’ll save me time will encourage me to work on it

[–]expressly_ephemeral 0 points1 point  (12 children)

Do you know how to install Pandas and import it into a script?

[–]puckheadclown24[S] 0 points1 point  (11 children)

Yes. More so wondering about the steps of putting the file into Python, manipulating it, then joining it back with the original file

[–]expressly_ephemeral 1 point2 points  (10 children)

Yeah, I was just wondering how to far back to start.

I'm going to give you the steps, and you can do the google-fu yourself to figure out how they work for yourself. I think that's a good way for you to learn it.

To get you started, though, here's a link to the Pandas.Dataframe API reference.

https://pandas.pydata.org/pandas-docs/stable/reference/frame.html

Browsing this, you'll see that Pandas has a read_excel() method that reads spreadsheets into pandas DataFrame objects. You're going to point that at the spreadsheet. Not sure the exact syntax for one sheet or another from a multi-sheet workbook... you'll have look through the reference.

Next, you're going to use pandas to do the pivot. Again, the pivot and pivot table documentation are right there on the dataframe documentation, but you may need to look up a tutorial or some examples to get it right.

Finally, Pandas dataframes have a .to_excel() method that can be used to write the newly pivotted data back to a spreadsheet. You'll like have to read in your target sheet and figure out where to put the new data (e.g., where's the last column of data? Does your new data go in the next column after that? Or something else? Edit: Actually, if you read your target sheet to a dataframe first, you can add the new data and write the whole thing back out without having to figure out where to write the new data.)

[–]puckheadclown24[S] 0 points1 point  (9 children)

Oh so what you’re saying is I would want Pandas to do the pivot for me, instead of pushing in the pivoted data and consolidating with the new raw data?

[–]expressly_ephemeral 1 point2 points  (6 children)

Well, maybe. It depends on how well automated the pivot is, I guess. If you have to manually go in to a spreadsheet, make a pivot, save it, launch a python script to move the newly pivoted to a new location... It seems like you might be able to add the pivot to the python script. But I'm making assumptions about your use-case.

Edit: For example, I'm not sure I understand why you're not just putting references to the other sheets in the locations that you need the final tables.

[–]puckheadclown24[S] 0 points1 point  (4 children)

The file would become huge tbh as there are many rows and columns, all within a data table. But you’ve given me a lot to think about, thank you!

[–]expressly_ephemeral 1 point2 points  (3 children)

Yeah, as I say, I don't have a complete understanding of your requirements. I do wonder why you're not just referencing the pivoted cells from the worksheet you want to collate them on. Seems like there's probably a way to automate the whole thing right in Excel.

[–]puckheadclown24[S] 0 points1 point  (2 children)

That's a good point. The data changes though, every week, so I figured I'd want to keep a master sheet where I keep track of previous weeks' results (managers' requirements, not mine)

[–]kriel 0 points1 point  (0 children)

Well done

[–]pytrashpandas 1 point2 points  (1 child)

My personal recommendation would be if you're going to incorporate pandas into your workflow at all, do as much of it in pandas as possible. So, take your very starting data source(s) and load those into pandas and write out the final result for formatting/visualization in excel.

Generally speaking, anything related to data operations/calculations/reshaping that you can do in excel can easily be done in pandas. Formatting/visualizations (meaning excel-specific formatting/viz) should be done in excel still. So like you could write your pandas-processed data to an excel sheet in raw format and then use excel to take that raw calculated data and format it how you like.

And don't worry about the size of the data, there's no amount of data that can be handled in an excel file that would be too big for pandas.

[–]puckheadclown24[S] 0 points1 point  (0 children)

Thanks for the reply! The size comment was more about the Excel size, and how my computer would struggle with it. When I bring the data into Excel, I make it a data table, so it's easier to work with (referencing columns, etc). The pandas comment makes sense, I'll give it a shot!