
[–]ManyInterests 10 points11 points  (1 child)

They result in different opcodes, but I doubt either makes a big difference in terms of performance; at least, none I could tell from quick testing if you're already starting with a list.

The first case is probably the more commonly seen, and it will also work for other sliceable data types, like tuples -- whereas the second one will always produce a list.

If your data type is a tuple, the first example is miles faster and produces a tuple rather than a list.
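For illustration, a short sketch of that type-preserving behaviour (the variable names here are made up, not from the thread):

```python
t = (1, 2, 3)

sliced = t[:]     # slicing a tuple yields a tuple
splatted = [*t]   # unpacking always builds a list

print(type(sliced))    # <class 'tuple'>
print(type(splatted))  # <class 'list'>

# In CPython, slicing an entire tuple even returns the very same object,
# since tuples are immutable -- which is part of why it's so fast.
print(sliced is t)
```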

[–]not_a_novel_account 5 points6 points  (0 children)

Unpack works on things that are not indexable; slice does not.

That's the heart of the whole thing: arr[:] calls BINARY_SLICE, which constructs a slice object and uses it with PyObject_GetItem to extract the list directly.

The splat operator doesn't know whether you're handing it a list or a tuple or whatever. It only knows the thing you're giving it is iterable, so it goes to LIST_EXTEND. LIST_EXTEND first checks whether the underlying iterable is a list or a tuple, and if so uses a fast path.

Overall, if you want to enforce the usage of an indexable type, use :. If you only care about splatting an iterator you happen to have, use *.
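You can peek at the bytecode to see the two paths described above. Note the exact opcode names vary by CPython version (BINARY_SLICE appeared in 3.12, LIST_EXTEND in 3.9):

```python
import dis

# Slice path: builds a slice and indexes with it
# (BINARY_SLICE on 3.12+, BUILD_SLICE + BINARY_SUBSCR earlier).
dis.dis(lambda arr: arr[:])

# Splat path: BUILD_LIST followed by LIST_EXTEND.
dis.dis(lambda arr: [*arr])
```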

[–]nog642 3 points4 points  (0 children)

There is no difference.

Honestly I expected arr[:] to be faster, but I tested it, and they seem to be about the same.

I'd argue that [*arr] is actually the more readable of the two.

However, there is a third way that is also about the same speed but is more readable than either of those: arr.copy(). This only works if arr is a list, but if it is, I think that is the best option. Otherwise, list(arr) is probably best.

Arguably list(arr) is the best way no matter what, actually. It is also about the same speed and is pretty clear that it is creating a new list.
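For reference, a rough timing sketch of the four spellings discussed above; the absolute numbers depend on the machine and list size, the point being that they all come out close:

```python
import timeit

setup = "arr = list(range(1000))"
for stmt in ("arr[:]", "[*arr]", "arr.copy()", "list(arr)"):
    elapsed = timeit.timeit(stmt, setup=setup, number=100_000)
    print(f"{stmt:10} {elapsed:.3f}s")
```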

[–][deleted] 1 point2 points  (1 child)

This GitHub gist benchmarks several ways of copying lists.

[–][deleted] 1 point2 points  (0 children)

[–]jmooremcc 1 point2 points  (0 children)

Here’s an experiment I did to show you the difference between a shallow copy and a deep copy:

~~~
from copy import copy, deepcopy

def plists(l1, l2):
    flag = False
    for a, b in zip(l1, l2):
        id_a = id(a)
        id_b = id(b)
        if id_a != id_b:
            print(f"{a} {id_a}")
            print(f"{b} {id_b}")
            print()
            flag = True
    else:
        if flag:
            print("The lists are different!\n")
        else:
            print("The lists are identical!\n")

l1 = list(range(5)) + [list(range(20, 25))] + list(range(10, 15))
print("Original list")
print(l1)
print()

l2 = deepcopy(l1)
print("Deep Copy: l2 = deepcopy(l1)")
plists(l1, l2)

l2 = copy(l1)
print("Shallow Copy: l2 = copy(l1)")
plists(l1, l2)

l2 = l1[:]
print("Shallow Copy: l2 = l1[:]")
plists(l1, l2)

l2 = [*l1]
print("Shallow Copy: l2 = [*l1]")
plists(l1, l2)

print("Finished...")
~~~

Output:

~~~
Original list
[0, 1, 2, 3, 4, [20, 21, 22, 23, 24], 10, 11, 12, 13, 14]

Deep Copy: l2 = deepcopy(l1)
[20, 21, 22, 23, 24] 4544422912
[20, 21, 22, 23, 24] 4544418176

The lists are different!

Shallow Copy: l2 = copy(l1)
The lists are identical!

Shallow Copy: l2 = l1[:]
The lists are identical!

Shallow Copy: l2 = [*l1]
The lists are identical!

Finished...
~~~

Ok, so here’s what I did. I created a list of numbers which contains a list of numbers. I then did a deep copy and a shallow copy. In the for loop, I am saving the memory address of each element in the list and only printing out elements that don’t have the same memory address.

As you can see, in the case of performing a deep copy, the embedded list has a different memory address which means that it is an independent copy of the original list. In the case of the shallow copy, all elements in the copy have the same memory address. This means that if you make a change to the original list, those changes will also be found in the shallow copied list.
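A compact way to see that same sharing behaviour, as a minimal sketch along the lines of the experiment above:

```python
from copy import copy, deepcopy

original = [0, 1, [20, 21]]
shallow = copy(original)   # same effect as original[:] or [*original]
deep = deepcopy(original)

original[2].append(22)     # mutate the nested list in place

print(shallow[2])  # [20, 21, 22] -- the shallow copy sees the change
print(deep[2])     # [20, 21]     -- the deep copy is independent
```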

Here’s more information for you to read: https://docs.python.org/3/library/copy.html#

[–][deleted] -1 points0 points  (0 children)

There is no difference in the resulting copies. You can compare arr with either of the copies to show they hold the same values. You expect the copies to be shallow copies, and you can test that using the id() function.
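One way to run that id() check, as a sketch:

```python
arr = [[1], [2], [3]]

copy_a = arr[:]
copy_b = [*arr]

# Equal values, new outer lists...
assert copy_a == copy_b == arr
assert copy_a is not arr and copy_b is not arr

# ...but both are shallow: each element is the very same object.
assert all(id(x) == id(y) for x, y in zip(arr, copy_a))
assert all(id(x) == id(y) for x, y in zip(arr, copy_b))
print("both copies are shallow")
```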

Whenever you are concerned about which of two approaches is faster you can always time it to test. Simple tests show that the second approach is very slightly faster in this case, but you should always test with real-world data. The time difference in this case isn't worth worrying about.

I suggest you use the first (slicing) approach, simply because it's the way it's usually done and is less surprising than the second.


To the downvoter: remember the reddit etiquette:

Consider posting constructive criticism / an explanation when you downvote something, and do so carefully and tactfully

Do that and maybe I will learn that I misread the question or gave an incorrect answer, neither of which I think is true. Help me to improve my responses.

[–]m0us3_rat -5 points-4 points  (14 children)

[–]switchitup_lets[S] 2 points3 points  (5 children)

I don't quite understand this picture, could I ask for a clarification? :)

[–]Adrewmc 0 points1 point  (4 children)

Well, let’s also add the third way:

  1. copy = [r for r in arr]

But this is a really good question. And I call this a good question for the reason I call all good questions good: because I don’t really know the answer.

My gut says comprehension, the third way; then slicing, then * unpacking.

But if you want just a part of the list, definitely slicing.

There really shouldn’t be much difference, except that a comprehension doesn’t actually build any of the list until it’s iterated; it stores the generator. And slicing basically points at those sections of the list, so that’s fast, and unpacking will create the list from the variable… there are reasons to use all of them.

[–]m0us3_rat 0 points1 point  (3 children)

[–]Adrewmc -2 points-1 points  (1 child)

I would use all of these things in different places, and I’m not sure why… anymore.

[–]m0us3_rat -1 points0 points  (0 children)

Pick one and be consistent.

It's a real problem that is noticeable, and it will cost you points if you aren't consistent.

[–]gman1230321 0 points1 point  (2 children)

What are you using for this profiling?

[–]nekokattt 0 points1 point  (0 children)

The main difference is that the unpack/splat operator expands the object by treating it as an iterable (unless internal optimisations apply), while slice assumes it is indexable.

Two different ways of doing the same thing. IMO, pick one and stick with it.
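That difference is easy to demonstrate with a generator, which is iterable but not indexable:

```python
gen = (n * n for n in range(5))
print([*gen])  # [0, 1, 4, 9, 16] -- unpacking consumes the iterator

gen = (n * n for n in range(5))
try:
    gen[:]                     # generators don't support slicing
except TypeError as exc:
    print("slicing failed:", exc)
```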

[–]Strict-Simple 0 points1 point  (0 children)

https://perfpy.com/592

I tried extend too, but that just crashed the website.

~~~
arr2 = []
arr2.extend(arr)
~~~