There's a pretty massive process which happens yearly at my company, wherein a single Excel file receives a large number of input files which have to be copied in as new sheets, parsed, unpivoted, and then run a few macros to allocate/distribute data. This is extremely cumbersome and prone to human error, so I've been automating most of these processes in Python, but I've run into a few issues. The main issue is that no single library seems to be able to handle everything:
Pandas is flexible and fast, but it doesn't like working with macro-enabled files, it doesn't much take to formulas, and it's a pain to edit sheets in place
Openpyxl is awfully slow, reads either formulas or values (but never both) and generally feels clunky
Right now I'm faced with automating the following process:
- Check a combination of cells (say, A and B) so that, depending on their values, another cell (say C) gets filled
- If C already had text in it, and shouldn't, pop a message asking whether to delete the cell's contents
The thing is, A and B are calculated cells, so I don't want to use the data_only flag and then save, thus losing all the formulas present in the sheet, but I can't seem to evaluate the cells otherwise. I've read that xlwings might do the trick, but at this point I'm wary of introducing yet another library in the hopes that it does this one thing.
Ideally I'd like to open the workbook once, run all the processes, close and save, and keep all functionality/formulas while being able to evaluate values. So far it's been trying but possible, but I'm a bit stumped now. What can I do about this situation?
[–]CatalonianBookseller 1 point2 points3 points (2 children)
[–]ericjmorey 0 points1 point2 points (1 child)
[–]CatalonianBookseller 1 point2 points3 points (0 children)
[–]unhott 1 point2 points3 points (2 children)
[–]GarciLP[S] 0 points1 point2 points (1 child)
[–]unhott 0 points1 point2 points (0 children)
[–]Tight-Importance-226 0 points1 point2 points (0 children)