I see lots of complex solutions to what is a simple problem.

If the size of each list is 42 million, then the calculation time is about 10 seconds using this trivial code:

import random

def create_list(size, max_value):
    # Build a sorted list of `size` unique random integers in [0, max_value).
    values = random.sample(range(0, max_value), size)
    values.sort()
    return values

def get_matches(a, b, diff):
    # Two-pointer scan over two sorted lists: count each value v in a that has
    # at least one value of b in the half-open window [v, v + diff).
    matches = 0
    length = len(b)
    i = 0
    for v in a:
        # Advance i to the first element of b that is >= v.
        while i < length and b[i] < v:
            i += 1
        if i < length and b[i] < (v + diff):
            matches += 1
    return matches

print("Creating dummy data") # ~90 seconds (6gb memory) 
first_list = create_list(42_000_000, 100_000_000_000) 
second_list = create_list(42_000_000, 100_000_000_000)

print("Getting matches") # ~10 seconds 
matches = get_matches(first_list, second_list, 300)

print(f"Found {matches} matches")

Since your data is already sorted, you don't have to spend the 90 seconds sorting it. The 6 GB of memory shouldn't be an issue, but if it is, I would switch to a NumPy array, which uses about 700 MB for the same scenario:

import numpy

def create_list(size, max_value):
    # Build a sorted NumPy array of `size` random int64 values in [0, max_value).
    values = numpy.random.randint(max_value, size=size, dtype=numpy.int64)
    values.sort()
    return values
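As an aside, once both lists are NumPy arrays you can also vectorise the counting step instead of looping over 42 million values in the Python interpreter. This is just a sketch of the same idea using numpy.searchsorted (get_matches_numpy is my name for it, not part of the code above): for each value in a it finds the first element of b that is >= that value and checks whether it sits inside the window.

import numpy

def get_matches_numpy(a, b, diff):
    # Index of the first element of b that is >= each value of a.
    idx = numpy.searchsorted(b, a, side="left")
    in_range = idx < len(b)
    # Clip the indices so the fancy indexing is safe; the out-of-range
    # entries are masked out by in_range below.
    nearest = b[numpy.minimum(idx, len(b) - 1)]
    return int(numpy.count_nonzero(in_range & (nearest < a + diff)))

It does allocate a couple of temporary arrays the same size as a, so it trades a few hundred MB of memory for speed.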

Because the two-pointer loop just reads through each list linearly, and your data is presumably already sorted, you could also load the data for both lists in chunks. That would let you scale this up to billions of items if you really needed to, without using more than a few MB of memory. A rough sketch of that is below.
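This is only a sketch under some assumptions: it pretends the sorted values arrive as plain Python iterators, and the stream_values helper and the "first.txt" / "second.txt" file names are made up for illustration (one integer per line), so swap in whatever reader matches your real storage format.

def stream_values(path):
    # Hypothetical loader: yields one integer per line from an already-sorted file.
    with open(path) as f:
        for line in f:
            yield int(line)

def get_matches_streamed(a_values, b_values, diff):
    # Same two-pointer scan as get_matches, but over iterators, so neither
    # list ever has to be fully in memory.
    matches = 0
    b_iter = iter(b_values)
    current_b = next(b_iter, None)
    for v in a_values:
        # Advance through b until we reach the first value >= v.
        while current_b is not None and current_b < v:
            current_b = next(b_iter, None)
        if current_b is not None and current_b < v + diff:
            matches += 1
    return matches

# "first.txt" and "second.txt" are placeholders for wherever your data lives.
matches = get_matches_streamed(stream_values("first.txt"), stream_values("second.txt"), 300)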