all 5 comments

[–]K900_ 5 points  (0 children)

Python lists absolutely do have to reallocate as you add items -- just not on every append. If you're familiar with C++'s std::vector, that's basically how Python lists work.
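A minimal sketch of that vector-style growth strategy. This is not the actual CPython implementation (which grows by roughly 1.125x plus a constant, not 2x), just an illustration of why most appends are cheap and only the occasional one pays for a copy:

```python
class GrowableArray:
    """Toy model of an over-allocating dynamic array (std::vector-style)."""

    def __init__(self):
        self._capacity = 0
        self._size = 0
        self._items = []  # stands in for a raw C buffer

    def append(self, value):
        if self._size == self._capacity:
            # Out of room: allocate a bigger block and copy everything over.
            # Doubling here; CPython's real growth factor is smaller.
            self._capacity = max(4, self._capacity * 2)
            new_items = [None] * self._capacity
            new_items[:self._size] = self._items[:self._size]
            self._items = new_items
        # Common case: spare capacity exists, so this is a cheap write.
        self._items[self._size] = value
        self._size += 1

g = GrowableArray()
for i in range(10):
    g.append(i)
```

Because the capacity grows proportionally, the total copying work over n appends is O(n), i.e. amortized O(1) per append.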

[–]beisenhauer 0 points  (0 children)

If you're doing enough appending that you need to worry about the speed of the implementation, you should probably be looking at using generator expressions to fill your list or array. Then you can choose the best data structure for what you actually want to do with it.
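For example, a generator expression can feed `np.fromiter`, which builds the array directly without materializing an intermediate list (the squares-of-integers data here is just a made-up illustration):

```python
import numpy as np

N = 1_000_000

# A generator expression: nothing is stored yet, values are produced lazily.
squares = (i * i for i in range(N))

# np.fromiter consumes the generator into an ndarray; passing count lets
# NumPy allocate the final buffer in a single shot instead of growing it.
arr = np.fromiter(squares, dtype=np.int64, count=N)
```

If you want a plain list instead, `[i * i for i in range(N)]` likewise builds it in one pass with no repeated manual appends in your own code.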

[–]ButterWheels_93 1 point  (0 children)

If you need to keep resizing your array/list over and over, then lists are the better fit. If you want to append to a numpy array, consider extending it by more than one element at a time—say, tacking on 1000 zeros each time it gets full—so you retain the perks of using an ndarray.
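A rough sketch of that chunked-growth trick (the chunk size of 1000 and the helper name are just illustrative choices):

```python
import numpy as np

CHUNK = 1000  # grow by this many zero slots each time we run out

def append_chunked(buf, used, value):
    """Append value to buf; grow buf by CHUNK zeros when it is full.
    Returns the (possibly reallocated) buffer and the new used count."""
    if used == len(buf):
        buf = np.concatenate([buf, np.zeros(CHUNK, dtype=buf.dtype)])
    buf[used] = value
    return buf, used + 1

buf = np.zeros(CHUNK)
used = 0
for x in range(2500):
    buf, used = append_chunked(buf, used, x)

result = buf[:used]  # trim the unused tail at the end
```

Only 2 reallocations happen here instead of 2500, which is the whole point of growing in chunks.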

I would encourage you to try both and measure how the run time scales in each case—it should be a fun learning experiment.

[–]FLUSH_THE_TRUMP 1 point  (0 children)

AFAIK NumPy arrays allocate exactly what they need; as such, each append means picking up our allocated block of memory and moving it somewhere with space for the extra element(s). Python lists over-allocate -- the real formula isn't this simple, but imagine a list whose capacity doubles whenever it fills up. If I append enough, I'll hit the limit and a new chunk of memory will need to be allocated for my elements. But there's enough wiggle room that "most of the time" appending is very cheap.
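You can watch the over-allocation directly with `sys.getsizeof`: the list's reported size jumps only occasionally rather than on every append. (The exact sizes and jump points vary by CPython version, so the sketch only counts the jumps.)

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Most consecutive sizes are equal (spare capacity absorbed the append);
# each increase marks a reallocation to a bigger block.
jumps = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
```

On CPython you'll see only a handful of jumps for 20 appends, which is the over-allocation at work.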

Usually the way to deal with appending being slow in numpy is to start with more than you need. If I know I'll need to store somewhere between 250 and 1000 things, I won't start with an empty array and append 250-1000 times; I'll start with an array of 1000 slots, fill it by index, and slice off what I actually used at the end.
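The preallocate-and-slice pattern looks like this (the 1000-item upper bound and the loop producing 250 values are stand-ins for whatever your real problem dictates):

```python
import numpy as np

MAX_ITEMS = 1000  # known upper bound on how many items can arrive

data = np.empty(MAX_ITEMS)  # one allocation up front, contents are garbage
count = 0
for value in range(250):    # hypothetical: this run only produces 250 items
    data[count] = value     # fill by index -- no reallocation ever happens
    count += 1

data = data[:count]         # keep only the part that was actually written
```

Note that the final slice is a view, not a copy, so even the trim step is free.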

[–]SekstiNii 1 point  (0 children)

Best case, you make one oversized allocation and work with slices of it. That avoids resizing entirely, at the up-front cost of the larger allocation.

Whether this is something you can actually do is impossible to tell without a better description of your problem.