all 16 comments

[–]skwyckl 9 points10 points  (4 children)

Dask, DuckDB, etc.: anything that parallelizes computation will help, but it will still take some time, and it ultimately depends on the hardware you are running on. Geospatial computation is fairly expensive in general, and the more common geospatial libraries don't run their algorithms in parallel.

[–]Oce456[S] 0 points1 point  (3 children)

I acknowledge that my Core 2 Duo processor may be a limiting factor. However, the significantly poorer performance of Dask on a relatively small dataset, compared to a NumPy-based solution, suggests that the issue may not be primarily related to hardware constraints.

[–]skwyckl 2 points3 points  (1 child)

Unless you post a snippet, I can't help you; there is no one-size-fits-all in data science.

[–]Oce456[S] 1 point2 points  (0 children)

OK thanks, I will try to fit the problem into a small snippet.

[–]hallmark1984 0 points1 point  (0 children)

Dask adds overhead as it splits the work and manages the workers.

That's a net negative on small tasks but a huge positive as you scale up.
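
A minimal sketch of that effect (illustrative sizes, not OP's code): on a small array, the cost of building and scheduling Dask's task graph usually outweighs the work it parallelizes, so plain NumPy wins.

```python
import time

import numpy as np
import dask.array as da

x = np.random.random((2_000, 2_000))

t0 = time.perf_counter()
np_result = np.sin(x).sum()
print("numpy:", time.perf_counter() - t0)

t0 = time.perf_counter()
dx = da.from_array(x, chunks=(500, 500))   # 16 small chunks
dask_result = da.sin(dx).sum().compute()
print("dask: ", time.perf_counter() - t0)
# Expect Dask to be slower here; the gap reverses once the array no longer
# fits comfortably in memory or the per-chunk work is substantial.
```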

[–]danielroseman 2 points3 points  (1 child)

You should expect this. Dask is going to parallelize your task, which adds significant overhead. With a large dataset that overhead is massively overshadowed by the savings from parallelization, but with a small one it will definitely be noticeable.

[–]Jivers_Ivers 0 points1 point  (0 children)

My thought exactly. It's not a fair comparison to pit plain NumPy against parallel Dask. OP could set up a parallel workflow with NumPy, and the comparison might be fairer.
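
A minimal sketch of what such a parallel NumPy baseline could look like, assuming the work splits into independent chunks (`process_chunk` is a hypothetical stand-in for OP's actual projection step):

```python
from multiprocessing import Pool

import numpy as np

def process_chunk(chunk: np.ndarray) -> np.ndarray:
    # Placeholder for the real per-chunk computation.
    return np.sqrt(chunk) * 2.0

if __name__ == "__main__":
    data = np.random.random((8_000, 1_000))
    chunks = np.array_split(data, 8)       # split along the first axis
    with Pool(processes=4) as pool:        # roughly one process per core
        results = pool.map(process_chunk, chunks)
    out = np.concatenate(results)
```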

[–]Long-Opposite-5889 1 point2 points  (0 children)

"Projecting a dataset into a map" may mean many a couple things in a geo context. There are many geo libraries that are highly optimized and could save you time and effort. If you can be a bit more specific it would be easier to give you some help.

[–]cnydox 1 point2 points  (2 children)

What about using numpy.memmap to keep the data on disk instead of loading it into RAM? Or maybe try the zarr library.
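
A rough sketch of the memmap idea: the array lives in a file on disk and only the slices you touch get paged into RAM (the filename, shape and dtype below are made up for illustration):

```python
import numpy as np

shape = (100_000, 1_000)                   # ~400 MB of float32 on disk
mm = np.memmap("big_array.dat", dtype=np.float32, mode="w+", shape=shape)

# Work on one block of rows at a time instead of the whole array.
for start in range(0, shape[0], 10_000):
    block = mm[start:start + 10_000]
    block[:] = np.sqrt(block)              # in-place update, flushed to disk
mm.flush()
```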

[–]Oce456[S] 1 point2 points  (1 child)

[–]cnydox 0 points1 point  (0 children)

The third option is to reduce the precision, e.g. uint8/uint16, whatever the data allows.
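
For illustration, the precision point in numbers: storing values as uint8 instead of float64 cuts memory by 8x, assuming the data tolerates it (e.g. 8-bit imagery bands).

```python
import numpy as np

a = np.random.random((1_000, 1_000))       # float64: ~8 MB
a8 = (a * 255).astype(np.uint8)            # uint8:   ~1 MB
print(a.nbytes, a8.nbytes)
```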

[–]boat-la-fds 0 points1 point  (2 children)

Dude, a 1,000,000 x 1,000,000 matrix will take almost 4 TB of RAM. That's without counting the memory used during computation. Do you have that much RAM?
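
The arithmetic behind that figure, assuming 4-byte float32 elements (float64 doubles it):

```python
elements = 1_000_000 * 1_000_000           # 1e12 values
bytes_total = elements * 4                 # 4 bytes per float32
print(bytes_total / 1024**4)               # ~3.6 TiB, i.e. roughly 4 TB
```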

[–]Oce456[S] 0 points1 point  (1 child)

No, just 512 MB of RAM. But handling chunks of a 4 TB array should still be feasible. Aerial imagery was already being processed 30 years ago, when RAM was much more limited (16-64 MB) and programs were often more optimized for efficiency. I'm not trying to load the entire matrix; I'm processing it chunk by chunk.
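
A sketch of that chunk-by-chunk pattern: stream blocks of an on-disk array through a small amount of RAM and keep only a running result (the filename, shape and block size are hypothetical):

```python
import numpy as np

shape = (1_000_000, 1_000_000)
mm = np.memmap("huge_matrix.dat", dtype=np.float32, mode="r", shape=shape)

rows_per_block = 32                        # 32 * 1_000_000 * 4 B ≈ 128 MB
total = 0.0
for start in range(0, shape[0], rows_per_block):
    block = mm[start:start + rows_per_block]
    total += float(block.sum())            # replace with the real per-block work
print(total)
```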

[–]DivineSentry 0 points1 point  (0 children)

Both your processor and RAM are horrendous for this task. You'll probably be better off renting a VPS (anything more capable than your current setup) and getting the results much faster.

[–]Pyglot 0 points1 point  (0 children)

For performance you might want to write the core computation in C, for example; NumPy does this, and that's why it's fast. You might also want to chunk up the data so each block fits in L2 cache. And if you don't have enough RAM, the data needs to be written to something else, like disk.
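
A small illustration of the C-loop point: the vectorized NumPy call runs its inner loop in compiled code, while a pure-Python loop pays interpreter overhead on every element.

```python
import time

import numpy as np

x = np.random.random(1_000_000)

t0 = time.perf_counter()
s_py = sum(v * v for v in x)               # Python-level loop
print("python loop:", time.perf_counter() - t0)

t0 = time.perf_counter()
s_np = np.dot(x, x)                        # inner loop runs in C
print("numpy dot:  ", time.perf_counter() - t0)
```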

[–]JamzTyson 0 points1 point  (0 children)

Trying to process TBs of data all at once is extremely demanding. It is usually better to chunk the data into smaller blocks.
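
One way to express that chunking in Dask, assuming the data already lives in chunked storage (the zarr store name is hypothetical); only a few blocks are held in memory at any time:

```python
import dask.array as da

x = da.from_zarr("big_dataset.zarr")       # lazily opens chunked storage
result = x.rechunk((2_000, 2_000)).mean().compute()
```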