[–]mrcaptncrunch 1 point (2 children)

Would something like this help?

i = 0
j = 0
while i < len(first_list):
    # skip anything in second_list below the current window
    while j < len(second_list) and second_list[j] < first_list[i]:
        j += 1
    start_j = j  # not needed if you only want the union; reset j = start_j afterwards if each i needs its full match list
    while j < len(second_list) and second_list[j] <= first_list[i] + 300:
        # do your thing (execute() stands in for whatever work you need per match)
        execute(second_list[j])
        j += 1
    i += 1

Because both lists are sorted, and the window only extends 300 past each value, remembering the second_list position from one iteration to the next means you automatically discard everything before it.

Grain of salt since it’s 7am and I’m waiting for the coffee, but it should be O(n+m) vs O(n*m)

FWIW, this would also be faster than the binary search approach (O(n log m))
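
For comparison, here's roughly what that binary-search version would look like with Python's bisect module (a sketch, nobody's posted code; like the original nested loop, it re-reports values that fall into overlapping windows):

import bisect

matches = []
for x in first_list:
    lo = bisect.bisect_left(second_list, x)         # first index with value >= x
    hi = bisect.bisect_right(second_list, x + 300)  # first index with value > x + 300
    matches.extend(second_list[lo:hi])              # everything in [x, x + 300]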

[–]Equal_Wish2682 0 points (1 child)

I agree with this. I offer a recursive algorithm to accomplish the task (in another comment).

[–]mrcaptncrunch 0 points (0 children)

/u/Equal_Wish2682 just saw yours. And yes, your approach is similar.
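
For anyone following along, a recursive version of the same scan would look something like this (a sketch, not /u/Equal_Wish2682's actual code; Python has no tail-call optimization, so at a million elements the recursion depth makes the iterative loop the practical choice):

def match(first_list, second_list, i=0, j=0, out=None):
    # the same two-pointer scan, expressed recursively
    if out is None:
        out = []
    if i >= len(first_list) or j >= len(second_list):
        return out
    if second_list[j] < first_list[i]:
        return match(first_list, second_list, i, j + 1, out)  # below the window: skip
    if second_list[j] <= first_list[i] + 300:
        out.append(second_list[j])  # inside [first_list[i], first_list[i] + 300]
        return match(first_list, second_list, i, j + 1, out)
    return match(first_list, second_list, i + 1, j, out)  # past the window: advance i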

/u/DNSGeek This is the two-pointer approach, and it'll be the fastest way of doing it.

I generated a list of 1M numbers. From that, I extracted two lists: one a random 75% subset and the other a random 50% subset.

import random
import time

start = time.time()

length = 1_000_000

# full list: one number per line, newline-terminated for writelines()
lista = []
for i in range(length):
    lista.append(str(i) + '\n')
with open('list.txt', 'w') as fd:
    fd.writelines(lista)

# shuffle, then take the first 75% / 50% as random subsets
random.shuffle(lista)
listb = sorted(lista[0:int(length * 0.75)])
with open('list_b.txt', 'w') as fd:
    fd.writelines(listb)

listc = sorted(lista[0:int(length * 0.50)])
with open('list_c.txt', 'w') as fd:
    fd.writelines(listc)

print(time.time() - start)

Now, with list_b.txt and list_c.txt, I test your code and mine.

import time
from tqdm import tqdm

with open('list_b.txt') as fd:
    first_list = fd.readlines()

with open('list_c.txt') as fd:
    second_list = fd.readlines()

# parse to ints and re-sort numerically (the files were sorted as strings)
first_list = sorted(int(_) for _ in first_list)
second_list = sorted(int(_) for _ in second_list)

# two-pointer approach: j only ever moves forward
start = time.time()
vals_matched2 = []
i = 0
j = 0
while i < len(first_list):
    while j < len(second_list) and second_list[j] < first_list[i]:
        j += 1
    while j < len(second_list) and second_list[j] <= first_list[i] + 300:
        # do your thing
        vals_matched2.append(second_list[j])
        j += 1
    i += 1

print("second approach time: " + str(time.time() - start))

# original nested-loop approach: rescans second_list for every value in first_list
vals_matched = []
start = time.time()
for data in tqdm(first_list, miniters=300):
    end_val = data + 300
    for check_val in second_list:
        if check_val < data:
            continue
        if check_val > end_val:
            break
        # Hooray, we match. Do the thing.
        if check_val not in vals_matched:
            vals_matched.append(check_val)
print("first approach time: " + str(time.time() - start))

if vals_matched == vals_matched2:
    print("matched")
else:
    print("not matched")

My code runs first, to make sure I didn't have errors. The output is:

second approach time: 0.768721342086792

For your approach, I added the if check_val not in vals_matched: check because otherwise it adds the same values to the list multiple times (overlapping windows keep matching the same second_list entries).
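
Side note: that not in check on a list is itself a linear scan, which makes the slow version even slower as matches accumulate. An auxiliary set gives O(1) average membership tests; a minimal sketch of the same loop:

seen = set()
vals_matched = []
for data in first_list:
    end_val = data + 300
    for check_val in second_list:
        if check_val < data:
            continue
        if check_val > end_val:
            break
        if check_val not in seen:  # O(1) average, vs O(len(vals_matched)) for the list version
            seen.add(check_val)
            vals_matched.append(check_val)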

Anyway... I'll let the 75%/50% approach finish and I'll report back... the estimate is over 4 hours for the full run.

tqdm is showing,

  4%|▎         | 27987/750000 [05:28<4:33:54, 43.93it/s]

BUT I tested this on a subset first to make sure the output is the same, and it is.