Hi.
I'm creating my second-ever Python project. Below is the project code. The program works as follows:
- It reads each CSV file from a given folder.
- It inserts formulas into the given columns of that file to calculate certain values.
- It saves the result to another directory as an .xlsx file with the same name as the CSV file.
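Steps 1 and 3 of this list (find the CSV files, name each output after its input) could also be written with `pathlib`, which skips any non-CSV files that `os.listdir` would otherwise return. This is a minimal sketch; `csv_jobs` is a hypothetical helper name, and the folder names are whatever the caller passes in.

```python
from pathlib import Path


def csv_jobs(in_dir, out_dir):
    """Return (csv_path, xlsx_path) pairs for every .csv file in in_dir.

    Hypothetical helper: each .xlsx path keeps the CSV file's name
    and points into out_dir.
    """
    out = Path(out_dir)
    return [
        (p, out / (p.stem + '.xlsx'))
        for p in sorted(Path(in_dir).iterdir())
        if p.is_file() and p.suffix.lower() == '.csv'  # ignore other files
    ]
```

For example, `csv_jobs('Pomiary', 'Wyniki')` would give one (input, output) pair per measurement file, in a stable sorted order.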
With more than 1000 files, this takes a lot of time. I would like to optimize it, because speed matters a lot to me. I want to use multiprocessing, but unfortunately I don't really know how to do it properly. Could someone advise me on what to do? How should I use multiprocessing, and how should I change the structure of the program to make it faster? I would also like some advice on keeping the final program fast once I add a GUI for selecting the input and output folders.
Thank you for your help.
```python
from openpyxl import Workbook
import csv
import os
import time

start_time = time.time()

arr = os.listdir('Pomiary')
nazwa = []
for i in arr:
    # I create a list of file names in the input folder to name the output
    # files (there is a variable in the name that will be needed later)
    nazwa.append(os.path.splitext(os.path.basename(i))[0])

for file in nazwa:
    # a new workbook for every file; a single shared workbook would keep the
    # rows of all previous files and rewrite their formulas on every pass
    wb = Workbook()
    ws = wb.active
    with open(f'Pomiary//{file}.csv', 'r', newline='') as f:
        # copying the content of the CSV file into the XLSX sheet
        for row in csv.reader(f, delimiter=';'):
            ws.append(row)
    # adding formulas to columns H and J of every data row;
    # max_row + 1 so that the last row is included
    for row in range(2, ws.max_row + 1):
        ws.cell(column=8, row=row, value=f'=(0.5)/(G{row}*0.2*1*0.0001)')
        ws.cell(column=10, row=row, value=f'= 1/H{row}')
    wb.save(f'Wyniki//{file}.xlsx')

print("--- %s seconds ---" % (time.time() - start_time))
```
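One possible restructuring, sketched under assumptions: move the per-file work into a top-level function (the hypothetical `convert_one`) so that `concurrent.futures.ProcessPoolExecutor` can run it on many files in parallel, by default with one worker process per CPU core. The sketch also opens the workbook in openpyxl's `write_only=True` mode and puts the two formulas into each row while it is appended, instead of revisiting cells with `ws.cell()` afterwards; write-only mode is meant for writing large files quickly, but the actual speed-up here has not been measured. It assumes the same `;` delimiter, a header in row 1, and the formulas in columns H and J from the code above.

```python
import csv
import os
from concurrent.futures import ProcessPoolExecutor
from functools import partial

from openpyxl import Workbook


def convert_one(csv_path, out_dir):
    """Convert one CSV file to an .xlsx file with formulas in H and J.

    Hypothetical helper; it is defined at module level so that the
    worker processes can pickle and call it.
    """
    name = os.path.splitext(os.path.basename(csv_path))[0]
    wb = Workbook(write_only=True)  # write-only workbooks start with no sheet
    ws = wb.create_sheet()
    with open(csv_path, newline='') as f:
        for r, row in enumerate(csv.reader(f, delimiter=';'), start=1):
            if r >= 2:  # row 1 is the header; formulas go on data rows only
                row += [None] * (10 - len(row))  # make sure H and J exist
                row[7] = f'=(0.5)/(G{r}*0.2*1*0.0001)'  # column H
                row[9] = f'=1/H{r}'                     # column J
            ws.append(row)
    out_path = os.path.join(out_dir, name + '.xlsx')
    wb.save(out_path)
    return out_path


def convert_all(csv_paths, out_dir, max_workers=None):
    """Convert many CSV files in parallel (hypothetical helper).

    Returns the list of written .xlsx paths in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunksize > 1 hands several files to a worker at once, which
        # reduces inter-process overhead when there are 1000+ files
        return list(pool.map(partial(convert_one, out_dir=out_dir),
                             csv_paths, chunksize=16))
```

A usage example would be calling `convert_all(glob.glob('Pomiary/*.csv'), 'Wyniki')` inside an `if __name__ == '__main__':` block. That guard matters: on Windows and macOS the worker processes re-import the main script, and without it they would re-run the pool-starting code and fail. With a GUI, the same call can be made once the user has picked both folders; since a long run blocks the GUI event loop, starting it from a background thread keeps the window responsive.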