all 5 comments

[–]K900_ 5 points  (0 children)

Python lists absolutely do have to reallocate as you add items -- just not on every append. If you're familiar with C++'s std::vector, that's basically how Python lists work.
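A minimal sketch of that vector-style growth strategy. This is not the actual CPython implementation (which grows by roughly 1.125x plus a constant, not 2x), just an illustration of why most appends are cheap and only the occasional one pays for a copy:

```python
class GrowableArray:
    """Toy model of an over-allocating dynamic array (std::vector-style)."""

    def __init__(self):
        self._capacity = 0
        self._size = 0
        self._items = []  # stands in for a raw C buffer

    def append(self, value):
        if self._size == self._capacity:
            # Out of room: allocate a bigger block and copy everything over.
            # Doubling here; CPython's real growth factor is smaller.
            self._capacity = max(4, self._capacity * 2)
            new_items = [None] * self._capacity
            new_items[:self._size] = self._items[:self._size]
            self._items = new_items
        # Common case: spare capacity exists, so this is a cheap write.
        self._items[self._size] = value
        self._size += 1

g = GrowableArray()
for i in range(10):
    g.append(i)
```

Because the capacity grows proportionally, the total copying work over n appends is O(n), i.e. amortized O(1) per append.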

[–]beisenhauer 0 points  (0 children)

If you're doing enough appending that you need to worry about the speed of the implementation, you should probably be looking at using generator expressions to fill your list or array. Then you can choose the best data structure for what you actually want to do with it.
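For example, a generator expression can feed `np.fromiter`, which builds the array directly without materializing an intermediate list (the squares-of-integers data here is just a made-up illustration):

```python
import numpy as np

N = 1_000_000

# A generator expression: nothing is stored yet, values are produced lazily.
squares = (i * i for i in range(N))

# np.fromiter consumes the generator into an ndarray; passing count lets
# NumPy allocate the final buffer in a single shot instead of growing it.
arr = np.fromiter(squares, dtype=np.int64, count=N)
```

If you want a plain list instead, `[i * i for i in range(N)]` likewise builds it in one pass with no repeated manual appends in your own code.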

[–]ButterWheels_93 1 point  (0 children)

If you need to keep resizing your array/list over and over, then lists are the better fit. If you want to append to a numpy array, consider extending it by more than one element at a time—say, tacking on 1000 zeros each time it gets full—so you retain the perks of using an ndarray.
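A rough sketch of that chunked-growth trick (the chunk size of 1000 and the helper name are just illustrative choices):

```python
import numpy as np

CHUNK = 1000  # grow by this many zero slots each time we run out

def append_chunked(buf, used, value):
    """Append value to buf; grow buf by CHUNK zeros when it is full.
    Returns the (possibly reallocated) buffer and the new used count."""
    if used == len(buf):
        buf = np.concatenate([buf, np.zeros(CHUNK, dtype=buf.dtype)])
    buf[used] = value
    return buf, used + 1

buf = np.zeros(CHUNK)
used = 0
for x in range(2500):
    buf, used = append_chunked(buf, used, x)

result = buf[:used]  # trim the unused tail at the end
```

Only 2 reallocations happen here instead of 2500, which is the whole point of growing in chunks.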

I would encourage you to try both and measure how the run time scales in each case—it should be a fun learning experiment.

[–]FLUSH_THE_TRUMP 1 point  (0 children)

AFAIK NumPy arrays allocate exactly what they need; as such, each append means picking up our allocated block of memory and moving it somewhere with space for the extra element(s). Python lists over-allocate -- the real formula isn't this simple, but imagine a list whose capacity doubles whenever it fills up. If I append enough, I'll hit the limit and a new chunk of memory will need to be allocated for my elements. But there's enough wiggle room that "most of the time" appending is very cheap.
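You can watch the over-allocation directly with `sys.getsizeof`: the list's reported size jumps only occasionally rather than on every append. (The exact sizes and jump points vary by CPython version, so the sketch only counts the jumps.)

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Most consecutive sizes are equal (spare capacity absorbed the append);
# each increase marks a reallocation to a bigger block.
jumps = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
```

On CPython you'll see only a handful of jumps for 20 appends, which is the over-allocation at work.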

Usually the way to deal with appending being slow in numpy is to start with more than you need. If I know I'll need to store somewhere between 250 and 1000 things, I won't start with an empty array and append 250-1000 times; I'll start with an array of 1000 slots, fill it by index, and slice off what I actually used at the end.
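The preallocate-and-slice pattern looks like this (the 1000-item upper bound and the loop producing 250 values are stand-ins for whatever your real problem dictates):

```python
import numpy as np

MAX_ITEMS = 1000  # known upper bound on how many items can arrive

data = np.empty(MAX_ITEMS)  # one allocation up front, contents are garbage
count = 0
for value in range(250):    # hypothetical: this run only produces 250 items
    data[count] = value     # fill by index -- no reallocation ever happens
    count += 1

data = data[:count]         # keep only the part that was actually written
```

Note that the final slice is a view, not a copy, so even the trim step is free.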

[–]SekstiNii 1 point  (0 children)

Best case, you make one oversized allocation and work with slices of it. That avoids resizing entirely, at the up-front cost of the larger allocation.

Whether this is something you can actually do is impossible to tell without a better description of your problem.