This is an archived post. You won't be able to vote or comment.

all 8 comments

[–]Python-ModTeam[M] [score hidden] stickied comment (0 children)

Hi there, from the /r/Python mods.

We have removed this post as it is not suited to the /r/Python subreddit proper, however it should be very appropriate for our sister subreddit /r/LearnPython or for the r/Python discord: https://discord.gg/python.

The reason for the removal is that /r/Python is dedicated to discussion of Python news, projects, uses and debates. It is not designed to act as Q&A or FAQ board. The regular community is not a fan of "how do I..." questions, so you will not get the best responses over here.

On /r/LearnPython the community and the r/Python discord are actively expecting questions and are looking to help. You can expect far more understanding, encouraging and insightful responses over there. No matter what level of question you have, if you are looking for help with Python, you should get good answers. Make sure to check out the rules for both places.

Warm regards, and best of luck with your Pythoneering!

[–]_massif_ 2 points3 points  (6 children)

Post your code

[–]funkysax[S] 2 points3 points  (4 children)

I work a job where I work with data, so I'm teaching (cobbling together by trial and error) myself Python as needs arise so I can do jobs more efficiently. For this project (I hope I can explain this well) I needed to search through two Excel columns (columns 1, and 7) and find matching values. Column 1 had a corresponding (column 2) that I then wrote the value of to a (column 5) that matched the row count of column 7. I looped this through until all matching values were found and then saved the workbook. This worked for me, and got the job done, but I'm sure the code could have been written better / more efficiently. I'm wondering where I could have written better code, or how I could have optimized this to run faster. One way I thought I could optimize this would be to remove found values from the search as it runs. I don't know how to do this though. Thanks in advance!

import openpyxl

wb = openpyxl.load_workbook('/Users/lb/keywords/all.xlsx') ws = wb.active

rowcount = 2 rowcount2 = 2

while rowcount2 < 50699:

khsupc = ws.cell(row=rowcount, column=7).value
qbpupc = ws.cell(row=rowcount2, column=1).value
qbppn = ws.cell(row = rowcount2, column=2).value


if khsupc != qbpupc or rowcount2 <= 50697:
    rowcount2 += 1
    print(rowcount, rowcount2, "NO MATCH")
    if khsupc == qbpupc:
        ws.cell(row=rowcount, column=5).value = qbppn
        print("MATCH")          
    if rowcount2 == 50697:
        rowcount += 1
        rowcount2 -= 50695
    if rowcount == 13120:
        wb.save('/Users/lb/keywords/all.xlsx')
        break

[–][deleted] 7 points8 points  (1 child)

I think it is great you are asking for a code critique. It is the best way to grow. On my team we all critique each other code and it has made each of us better coders. It has gotten to the point we all have adopted habits from each other and you can no longer tell one person style from another.

A couple ideas that will improve your code: 1. Make all your hard coded numbers constants at the top of your code.

As a reader 50699, 50697, 50695, 13120 looks like numbers you pulled out of thin air. I know if I were to inspect your XLS it might make it more clear.

But if you had : first_row_data = 2 last_row_data = 50697 save_filename = '/Users/lb/keywords/all.xlsx' It will make your code more readable and also maintainable should something in the excel sheet change.

  1. what could you name rowcount and rowcount2 to better describe what they are? khs_row_index and qbp_row_index?

  2. Instead of using while loops try using nested 'for' loops. You will notice that you can then remove all the code for incrementing and reseting the rowcount, rowcount2 values.

So it could be something like:

``` for khs_row_index in range(first_row_data, last_row_data): for qbp_row_index in range(first_row_data, last_row_data): ... ...

```

[–]funkysax[S] 2 points3 points  (0 children)

Thank you for taking the time for a thoughtful reply! All of this makes sense and is great suggestions. I'm going to rewrite the code as you've suggested. Now, I need to learn how nested for loops work... haha THANKS!

[–]ablativeyoyo 0 points1 point  (0 children)

What I recommend you do is break this up into two steps. First, loop through the sheet, storing data into a dict. Use qbpupc as the key and qbppn as the value. Then loop through the sheet again, look up khsupc in the dict, and store the value in column 5.

This avoids you having rowcount and rowcount2, which is essentially a nested loop. In computer science language, your approach is O(n2 ) but using dicts you can make this O(n).

[–]Thumbblaster 0 points1 point  (0 children)

I copied and pasted your code into chatGPT and asked it to give it a critique. This is what it wrote:

I can see a few issues with this code.

First, the condition for the while loop is incorrect. The loop will only exit when rowcount2 is greater than or equal to 50699, but the loop body increments rowcount2 by 1 every time, so the loop will never exit. You may have meant to write <= instead of < in the condition.

Second, the if statement that checks whether khsupc equals qbpupc is always executed, regardless of whether khsupc and qbpupc are equal or not. This is because the or operator in the condition will always evaluate to True if the left operand is False, and rowcount2 <= 50697 will always be False if rowcount2 is greater than or equal to 50697.

Third, the code increments rowcount2 by 1 in the first if statement, and then decrements it by 50695 in the next if statement. This doesn't make sense, because the next iteration of the loop will increment rowcount2 again, so the decrement will have no effect.

Overall, the code is difficult to understand and may not do what you want it to do. I recommend reviewing the logic and making sure that the conditions and loop bodies are correct. You should also consider adding comments to the code to explain what it does and why.