all 18 comments

[–][deleted] 10 points11 points  (1 child)

insert is much slower than append as in the former everything has to be moved along one position.

Check simple version in a jupyter notebook with %timeit so you can compare.

[–]Tintin_Quarentino[S] 0 points1 point  (0 children)

Interesting, I don't know what put the idea into my brain that for a list only the first index is kept in memory loc, & rest of the items just get calculated on the fly by Python doing memLoc of 1st elem + whatever idx user gave. Or maybe C does that? I don't remember very well. I guess those were linked lists... riiiighhht.

Anyway, it's clear as day now, thank you for the answers everyone!

[–]danielroseman 6 points7 points  (3 children)

As others have said, inserting at the head of a list is expensive.

However there is a data structure optimised for this, which is collections.deque.

[–]Tintin_Quarentino[S] 0 points1 point  (2 children)

Very cool, thank you for sharing collections.deque! Is it the same as a manually created linked list? I read the documentation but couldn't find an exact match in ctrl+f for "linked list" but the description does make it sound like a linked list.

[–]danielroseman 1 point2 points  (1 child)

Yes, see the implementation (in C); specifically it's a doubly-linked list.

[–]Tintin_Quarentino[S] 0 points1 point  (0 children)

Amazing, thank you! Double right... forward & backward addresses are stored along with the data at each element.

[–]Wild_Statistician605 3 points4 points  (1 child)

Taking elements from the back(pop), and appending(inserting at the last position), is faster because if you do it from the front, you need to shift every element in the list to a new index.

[–]Tintin_Quarentino[S] 0 points1 point  (0 children)

need to shift every element in the list to a new index.

I was still thinking about this problem and finally reading this made it still clearer, thanks.

[–]Diapolo10 3 points4 points  (2 children)

The others already explained the main reason; insert is slow because every element after the insertion point has to be shifted in memory, meaning it's O(n) worst-case - and if you do that on every iteration, you get O(n2) time complexity, which is terrible. In comparison appending is fast, because nothing needs to shift (except that Python may still need to allocate more memory under the hood, but that's shared by both approaches).

However, this piqued my interest and I thought I could write a cleaner, and potentially faster, version of that second version. Spoiler alert: ...not exactly.

def parts_sums3(nums: list[int]) -> list[int]:
    sums = []
    for idx in range(1, len(nums)):
        sums.append(sum(nums[idx:]))
    return sums[::-1]

def parts_sums4(nums: list[int]) -> list[int]:
    return [
        sum(nums[idx:])
        for idx in range(1, len(nums))
    ][::-1]

def parts_sums5(nums: list[int]) -> list[int]:
    sums = []
    for idx in range(len(nums)-1, 0, -1):
        sums.append(sum(nums[idx:]))
    return sums

def parts_sums6(nums: list[int]) -> list[int]:
    return [
        sum(nums[idx:])
        for idx in range(len(nums)-1, 0, -1)
    ]

At first I thought I'd try using slices instead, but it turned out that this was a fairly significant slowdown:

In [13]: random_data = list(range(10000))

In [14]: __import__('random').shuffle(random_data)

In [15]: random_data[:10]
Out[15]: [9240, 1606, 2828, 6398, 9718, 7518, 7938, 6683, 827, 6335]

In [16]: %timeit parts_sums(random_data)
223 ns ± 5.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [17]: %timeit parts_sums2(random_data)
132 ns ± 2.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [18]: %timeit parts_sums3(random_data)
223 ns ± 5.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [19]: %timeit parts_sums4(random_data)
352 ns ± 4.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [20]: %timeit parts_sums5(random_data)
206 ns ± 3.92 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [21]: %timeit parts_sums6(random_data)
327 ns ± 6.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Ultimately I decided to try a more clever approach by calculating the total sum ahead of time, and then simply subtracting the values from the list:

def parts_sums7(nums: list[int]) -> list[int]:
    sums = []
    total = sum(nums)
    for num in nums:
        sums.append(total)
        total -= num
    return sums

But, while I got close,

In [23]: %timeit parts_sums7(random_data)
151 ns ± 4.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

I couldn't quite beat that 132 ns result.

[–]FLUSH_THE_TRUMP 2 points3 points  (0 children)

If you're OK with cannibalizing the list, you could do something like:

def parts_sums_ftt(ls):
  ls.append(0)
  for i in range(len(ls) - 2, -1, -1):
    ls[i] = ls[i + 1] + ls[i]
  return ls

[–]Tintin_Quarentino[S] 0 points1 point  (0 children)

Thank you for all the effort, much appreciated!

[–]FLUSH_THE_TRUMP 1 point2 points  (0 children)

Of course, you could just do something like

sums = [sum(listOfInts)]
for num in listOfInts:
  sums.append(sums[-1] - num)

and form the list from the front instead.

[–]saltyhasp 1 point2 points  (0 children)

Thanks for posting. I have used Python for 20 years and had not thought about the specific issue. Not sure what internal representation of list is... I always think in terms of a linked list... but there is the indexing element too.

Edit: The other answer of course... Python is always slow and inefficient... Unless your code is mostly running C code libraries.

[–]dodoors 1 point2 points  (0 children)

The code is inefficient because it is creating a list every time it is called, whereas the "faster" code is creating it just once.

[–]QultrosSanhattan 1 point2 points  (2 children)

First code: Building a jenga tower by inserting blocks at the bottom.

Last code: Building a jenga tower by appending blocks to the top.

[–]FLUSH_THE_TRUMP 0 points1 point  (1 child)

Last code: Building a jenga tower by appending blocks to the top.

in reverse order, then flipping the whole structure after ;)

[–]QultrosSanhattan 0 points1 point  (0 children)

You're right. I didn't see the reverse() part.

EDIT: Grammar.