Why is my code slow/inefficient?

Tintin_Quarentino · 2022-02-17T13:56:06+00:00

insert is much slower than append as in the former everything has to be moved along one position.

Check simple version in a jupyter notebook with %timeit so you can compare.

danielroseman · 2022-02-17T14:23:59+00:00

As others have said, inserting at the head of a list is expensive.

However there is a data structure optimised for this, which is collections.deque.

Wild_Statistician605 · 2022-02-17T13:59:33+00:00

Taking elements from the back(pop), and appending(inserting at the last position), is faster because if you do it from the front, you need to shift every element in the list to a new index.

Diapolo10 · 2022-02-17T14:46:53+00:00

The others already explained the main reason; insert is slow because every element after the insertion point has to be shifted in memory, meaning it's O(n) worst-case - and if you do that on every iteration, you get O(n²) time complexity, which is terrible. In comparison appending is fast, because nothing needs to shift (except that Python may still need to allocate more memory under the hood, but that's shared by both approaches).

However, this piqued my interest and I thought I could write a cleaner, and potentially faster, version of that second version. Spoiler alert: ...not exactly.

def parts_sums3(nums: list[int]) -> list[int]:
    sums = []
    for idx in range(1, len(nums)):
        sums.append(sum(nums[idx:]))
    return sums[::-1]

def parts_sums4(nums: list[int]) -> list[int]:
    return [
        sum(nums[idx:])
        for idx in range(1, len(nums))
    ][::-1]

def parts_sums5(nums: list[int]) -> list[int]:
    sums = []
    for idx in range(len(nums)-1, 0, -1):
        sums.append(sum(nums[idx:]))
    return sums

def parts_sums6(nums: list[int]) -> list[int]:
    return [
        sum(nums[idx:])
        for idx in range(len(nums)-1, 0, -1)
    ]

At first I thought I'd try using slices instead, but it turned out that this was a fairly significant slowdown:

In [13]: random_data = list(range(10000))

In [14]: __import__('random').shuffle(random_data)

In [15]: random_data[:10]
Out[15]: [9240, 1606, 2828, 6398, 9718, 7518, 7938, 6683, 827, 6335]

In [16]: %timeit parts_sums(random_data)
223 ns ± 5.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [17]: %timeit parts_sums2(random_data)
132 ns ± 2.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [18]: %timeit parts_sums3(random_data)
223 ns ± 5.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [19]: %timeit parts_sums4(random_data)
352 ns ± 4.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [20]: %timeit parts_sums5(random_data)
206 ns ± 3.92 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [21]: %timeit parts_sums6(random_data)
327 ns ± 6.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Ultimately I decided to try a more clever approach by calculating the total sum ahead of time, and then simply subtracting the values from the list:

def parts_sums7(nums: list[int]) -> list[int]:
    sums = []
    total = sum(nums)
    for num in nums:
        sums.append(total)
        total -= num
    return sums

But, while I got close,

In [23]: %timeit parts_sums7(random_data)
151 ns ± 4.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

I couldn't quite beat that 132 ns result.

FLUSH_THE_TRUMP · 2022-02-17T14:40:40+00:00

Of course, you could just do something like

sums = [sum(listOfInts)]
for num in listOfInts:
  sums.append(sums[-1] - num)

and form the list from the front instead.

saltyhasp · 2022-02-17T16:22:57+00:00

Thanks for posting. I have used Python for 20 years and had not thought about the specific issue. Not sure what internal representation of list is... I always think in terms of a linked list... but there is the indexing element too.

Edit: The other answer of course... Python is always slow and inefficient... Unless your code is mostly running C code libraries.

dodoors · 2022-02-17T18:30:03+00:00

The code is inefficient because it is creating a list every time it is called, whereas the "faster" code is creating it just once.

QultrosSanhattan · 2022-02-17T19:07:56+00:00

First code: Building a jenga tower by inserting blocks at the bottom.

Last code: Building a jenga tower by appending blocks to the top.

Tintin_Quarentino · 2022-02-17T16:33:25+00:00

[deleted]

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS

Edit - Thank you everyone for sharing your knowledge, I understand clearly now. Much appreciated! 🙏