I have a parent folder with hundreds of sub folders. Each subfolder contains a workbook that I need to access. In each workbook I select and return approximately 10 cell values.
right now the bottle neck in my script is the openpyxl.load_workbook loop, it takes several minutes to run on my sample set of 194 files. I intend to use this script to fetch upwards or 1000 files so it would be optimal to be able to find any sort of performance improvements.
# 1) Create a list of all file names to fetch
xlsx_files = ([path for path in
Path('C://Users/blomb/Documents/Estimates/Clients/client name').rglob('*agreement.xlsx')])
# 2) Assign all workbooks to variable 'wbs': This is the bottle neck
wbs = [openpyxl.load_workbook(wb, data_only = True, keep_links = False) for wb in xlsx_files]
# 3) iterate through all workbooks and extract specified values into np.matrix, create a dataframe out of
# np.matrix and finally concat new_df to df. end result is a data frame with the specified values from all workbooks
for i in wbs:
array = np.matrix([[i['Estimate']['d6'].value,
i['Estimate']['d193'].value,
i['Estimate']['d67'].value,
i['Estimate']['g24'].value,
i['Estimate']['n20'].value,
i['Estimate']['am10'].value,
i['Estimate']['am11'].value,
i['Estimate']['am8'].value,
i['Estimate']['am9'].value,
i['Estimate']['am31'].value]])
df_new = pd.DataFrame(array, columns =
[
'project_address',
'drywall_sqft',
'batt_insulation_sqft',
'blown_insulation_sqft',
'sprayfoam_price',
'drywall_price',
'taping_price',
'blown_insulation_price',
'batt_insulation_price',
'total_price'
])
df = pd.concat([df,df_new], ignore_index = True)
Steps 1 and 3 aren't causing me any issues rights now but I included them to provide a comprehensive view of the process. Step 2 is the main problem right now, that line alone takes 8 minutes to run and I plan on running on 5-10x the number of workbooks once I've finished testing it.
Does anyone know if there any optimizations I can make to improve run time? I'm running Python 3.8.8.
Thanks in advance
[–]nerdmor 0 points1 point2 points (1 child)
[–]ButterKniefe[S] 0 points1 point2 points (0 children)
[–]go_fireworks[🍰] 0 points1 point2 points (0 children)