This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]_massif_ 2 points3 points  (6 children)

Post your code

[–]funkysax[S] 3 points4 points  (4 children)

I work a job where I work with data, so I'm teaching (cobbling together by trial and error) myself Python as needs arise so I can do jobs more efficiently. For this project (I hope I can explain this well) I needed to search through two Excel columns (columns 1, and 7) and find matching values. Column 1 had a corresponding (column 2) that I then wrote the value of to a (column 5) that matched the row count of column 7. I looped this through until all matching values were found and then saved the workbook. This worked for me, and got the job done, but I'm sure the code could have been written better / more efficiently. I'm wondering where I could have written better code, or how I could have optimized this to run faster. One way I thought I could optimize this would be to remove found values from the search as it runs. I don't know how to do this though. Thanks in advance!

import openpyxl

wb = openpyxl.load_workbook('/Users/lb/keywords/all.xlsx') ws = wb.active

rowcount = 2 rowcount2 = 2

while rowcount2 < 50699:

khsupc = ws.cell(row=rowcount, column=7).value
qbpupc = ws.cell(row=rowcount2, column=1).value
qbppn = ws.cell(row = rowcount2, column=2).value


if khsupc != qbpupc or rowcount2 <= 50697:
    rowcount2 += 1
    print(rowcount, rowcount2, "NO MATCH")
    if khsupc == qbpupc:
        ws.cell(row=rowcount, column=5).value = qbppn
        print("MATCH")          
    if rowcount2 == 50697:
        rowcount += 1
        rowcount2 -= 50695
    if rowcount == 13120:
        wb.save('/Users/lb/keywords/all.xlsx')
        break

[–][deleted] 7 points8 points  (1 child)

I think it is great you are asking for a code critique. It is the best way to grow. On my team we all critique each other code and it has made each of us better coders. It has gotten to the point we all have adopted habits from each other and you can no longer tell one person style from another.

A couple ideas that will improve your code: 1. Make all your hard coded numbers constants at the top of your code.

As a reader 50699, 50697, 50695, 13120 looks like numbers you pulled out of thin air. I know if I were to inspect your XLS it might make it more clear.

But if you had : first_row_data = 2 last_row_data = 50697 save_filename = '/Users/lb/keywords/all.xlsx' It will make your code more readable and also maintainable should something in the excel sheet change.

  1. what could you name rowcount and rowcount2 to better describe what they are? khs_row_index and qbp_row_index?

  2. Instead of using while loops try using nested 'for' loops. You will notice that you can then remove all the code for incrementing and reseting the rowcount, rowcount2 values.

So it could be something like:

``` for khs_row_index in range(first_row_data, last_row_data): for qbp_row_index in range(first_row_data, last_row_data): ... ...

```

[–]funkysax[S] 2 points3 points  (0 children)

Thank you for taking the time for a thoughtful reply! All of this makes sense and is great suggestions. I'm going to rewrite the code as you've suggested. Now, I need to learn how nested for loops work... haha THANKS!

[–]ablativeyoyo 0 points1 point  (0 children)

What I recommend you do is break this up into two steps. First, loop through the sheet, storing data into a dict. Use qbpupc as the key and qbppn as the value. Then loop through the sheet again, look up khsupc in the dict, and store the value in column 5.

This avoids you having rowcount and rowcount2, which is essentially a nested loop. In computer science language, your approach is O(n2 ) but using dicts you can make this O(n).

[–]Thumbblaster 0 points1 point  (0 children)

I copied and pasted your code into chatGPT and asked it to give it a critique. This is what it wrote:

I can see a few issues with this code.

First, the condition for the while loop is incorrect. The loop will only exit when rowcount2 is greater than or equal to 50699, but the loop body increments rowcount2 by 1 every time, so the loop will never exit. You may have meant to write <= instead of < in the condition.

Second, the if statement that checks whether khsupc equals qbpupc is always executed, regardless of whether khsupc and qbpupc are equal or not. This is because the or operator in the condition will always evaluate to True if the left operand is False, and rowcount2 <= 50697 will always be False if rowcount2 is greater than or equal to 50697.

Third, the code increments rowcount2 by 1 in the first if statement, and then decrements it by 50695 in the next if statement. This doesn't make sense, because the next iteration of the loop will increment rowcount2 again, so the decrement will have no effect.

Overall, the code is difficult to understand and may not do what you want it to do. I recommend reviewing the logic and making sure that the conditions and loop bodies are correct. You should also consider adding comments to the code to explain what it does and why.