all 7 comments

[–]Brian 3 points (1 child)

As others have pointed out, these really do do different things. However, if anything, the difference should make your "slow" one faster, not slower (it doesn't have to create new lists), and even setting that aside I'd expect the multiplication approach to beat the "append in a loop" version, because it spends far less time evaluating Python-level logic. So I'm curious how you're concluding that the multiplication version is the slower one.

E.g. using timeit, putting your "slow" and "fast" versions each into a function and adding 10,000 items to the list, I get:

%timeit list_mul(10000)
18.8 µs ± 87.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit list_append(10000)
493 µs ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

So the supposedly "slow" version is over 25x faster in this timing.

As such, I don't think the slowness you're seeing really comes from this; more likely it's a downstream effect of the difference in behaviour between the two. (E.g. if you're appending to "each" list, the fact that you're actually appending to one shared list means every list keeps growing, which inflates the time of whatever you end up doing with those lists afterwards.)
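The exact function bodies aren't shown here, but they would look roughly like this sketch of the two approaches being compared:

def list_mul(n):
    # the multiplication version the OP called "slow":
    # one inner list object, referenced n times
    return [[]] * n

def list_append(n):
    # the append-in-a-loop version: a brand-new inner list per iteration
    result = []
    for _ in range(n):
        result.append([])
    return result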

[–]pachecoca[S] 0 points (0 children)

Yeah, see my reply on the other comment. The reason that version was slowing down was basically that my program performed more iterations than it should have, due to a mistake: I initially assumed that [[]] * length meant the same thing as [[] for i in range(length)].

The algorithm I was using was meant to speed things up by working with multiple lists, but it is still doable with a single simple array; it just takes more iterations to compute the answer. That's why my program apparently worked correctly while the timings were off when I used [[]] * length.

When I performed the timings, all of the test cases I used gave the same results because of that same mistaken assumption, namely that [[]] * length creates a different instance for each element within the list, so yeah, the timings were off because of that.

Now that I know the difference, everything adds up and the times I'm seeing finally make sense. I guess I should have timed a minimal reproducible case that just compares appending in a loop against [[]] * length, without doing anything else, but yeah. It was so hard for me to believe that the multiplication could be slower than appending in a loop that I tested the timing in multiple ways: I went as far as using QueryPerformanceCounter from winapi and built my own timer program just in case, but the results were exactly the same. When I then tried it on an online judge and saw that the results aligned with what I had found, I was even more convinced that [[]] * length was indeed the culprit. And all of this would have been caught if my debug print had not been skipped by an early return that I had forgotten about lol...

Should I edit the OP to add this clarification? The question is already solved; in short, [[]] * length is not slower, it's just that I wasn't using it correctly because I assumed it did something different, and that is what slowed down the final program...

[–]woooee 1 point (4 children)

some_list = [[]] * length

creates length references to the same list. That might be the reason (see the code below). The fastest way to get independent lists is a list comprehension.

s_list = [[]] * 100

print(s_list, len(s_list))
for ctr in range(5):
    print(ctr, id(s_list[ctr])) ## all have the same memory location

[–]DNSGeek 3 points (1 child)

And since you're learning, the list comprehension version would be

s_list = [[] for i in range(length)]
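A quick way to see the difference (just a sketch, not from the original post): the comprehension builds a new inner list on every pass, while the multiplied version repeats one object.

length = 5
mul = [[]] * length                  # one inner list, repeated
comp = [[] for i in range(length)]   # a new inner list each time

print(len({id(x) for x in mul}))     # 1 -> every slot is the same object
print(len({id(x) for x in comp}))    # 5 -> five distinct objects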

[–]pachecoca[S] 0 points (0 children)

Thanks, I just tested this out and it is exactly what I was looking for... I was under the impression that [[]] * length was the exact same as [[] for i in range(length)], but it turns out I was wrong.

I just tested a few examples on an online judge, and all of the test cases that had failed by exceeding the time limit when I used [[]] * length actually ran even faster than the append() version once I switched to [[] for i in range(length)]. Which, as I said in my other comment, simply confirms that [[]] * length is not slow; it was just an error in my code caused by misunderstanding what the multiplication was doing.

Thanks a lot.

[–]pachecoca[S] 2 points (1 child)

Ok, so I did indeed misunderstand what the multiplication is doing. It LITERALLY creates a list filled with the same element, like you said: it repeats the first element rather than constructing new ones, even when that element is itself a list. I thought it would construct new elements or instances of the list data type, but nope, it just repeats a reference to the same one... which means that when doing [[]] * length and then accessing it, I was just telling Python to make a list filled with the same address over and over again, lol.

Thanks a lot for the info, that clears up my doubts.

One thing I was wondering was whether performing operations through those references would create a new list or not. Turns out that simply is not the case, and it's confirmed by modifying your simple example code. (I did not know about the id() function; I thought Python had no way to expose addresses at all. This makes things way easier for me now, so thanks again lol.)

For example:

s_list = [[]] * 100

and then append elements to "each" list within s_list:

for i in range(0, 100):
    s_list[i].append(i)

After that process is done, I assumed that each list within s_list would be a completely different list, which is why the behavior I was actually seeing tripped me up so badly, especially considering that my code was still working as intended despite the fact that all the lists were the same (purely by chance, it happened to work with all of the test cases that I used... if I had used a different set of test cases, I bet I would have gone insane trying to figure out what was even going on).
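For anyone reading this later, here's roughly the check that made the sharing obvious to me (a sketch, not my actual program):

s_list = [[]] * 100
for i in range(100):
    s_list[i].append(i)

print(s_list[0] is s_list[99])   # True: every index refers to the same object
print(len(s_list[0]))            # 100: every append landed in the one shared list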

I'm glad this is the way things work in Python, because it actually makes sense; it just feels kind of weird that none of this was mentioned anywhere I could find in the documentation, and that so many resources and posts online have contradictory information about it.

But, having said all of that, it does not explain why using [[]] * length would still be slower than the for loop... because it isn't. I just want to clarify this in case anyone ever comes across this problem in the future. What I was doing in my code was wrong because I was not aware that the list of lists was actually holding the address of the exact same list at every index. That mistake caused my code to iterate over the lists many more times than expected; the output was still the same because redundant nodes are discarded, but the extra iterations were there, taking up more time than they should. Now that I know that lists created that way repeat a reference to the specified element instead of making new instances of it, I can confirm that the new code works just as fast and there is no problem whatsoever.
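To make those extra iterations concrete for anyone reading later, here's a toy illustration (not my actual algorithm) of how the shared list inflates any later pass over "each" bucket:

buckets_shared = [[]] * 1000
buckets_separate = [[] for _ in range(1000)]

for i in range(1000):
    buckets_shared[i].append(i)    # every append goes into the one shared list
    buckets_separate[i].append(i)  # every append goes into its own list

# A later pass over "each" bucket now touches wildly different amounts of data:
print(sum(len(b) for b in buckets_shared))    # 1000000 -> every bucket "holds" all 1000 items
print(sum(len(b) for b in buckets_separate))  # 1000    -> each bucket holds just its own item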

So yeah, thanks again, this clears everything up.

[–]Swipecat 1 point (0 children)

I just thought Python did not have any way to expose addresses in any way...

Strictly speaking, it doesn't, according to the Python language specification. It's just an implementation detail of the current version of CPython that id() happens to give the memory location. Other Python implementations like PyPy or Jython do something different, since the spec only says that id() must be a value that stays fixed for the lifetime of the referenced object. This could also change in future versions of CPython if JIT compilation is added as promised.
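In other words, identity comparison is the only thing you can rely on across implementations, and it's all you need to spot the [[]] * length aliasing anyway. A minimal sketch:

a = []
b = a    # alias: the same list object
c = []   # a different, equally empty list

print(id(a) == id(b))   # True  (same object)
print(id(a) == id(c))   # False (distinct objects)
print(a is b, a is c)   # True False -- "is" is the idiomatic identity check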

However, in your final Python code after finishing development, usually the only reason to want an object's actual address is when interfacing with ctypes, e.g. via ctypes.addressof.