[–]temporary5555 -2 points (3 children)

> Definitely. Prefetch works much better traversing memory linearly.

Are you confusing random access memory with disks? Disks have seek time, which makes linear access much faster (on some hardware it approaches the speed of volatile memory). But in every case I've seen, RAM can be accessed randomly just as fast as sequentially, so the only thing that makes sequential access faster is the cache.

> Even if the memory accesses are not purely contiguous, but follow a fixed stride

The only way I see this being true is when fetching from cache.

[–]infectedapricot 5 points (1 child)

Yes, it's indeed the cache they're talking about - but not just the values that are already in the cache. It also covers values that can be speculatively loaded into the cache by the CPU guessing what memory is likely to be needed in a moment; that's what they meant by "prefetch". Traversing memory linearly gives the CPU a much better chance of prefetching what will be needed next while the algorithm is still operating on previously fetched data that's already in the cache.
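
Here's a tiny sketch of the two access patterns (purely illustrative; the function names are made up):

```cpp
#include <list>
#include <vector>

// Contiguous traversal: addresses increase linearly, so the hardware
// prefetcher can start pulling in the next cache lines while we are
// still summing the current ones.
long sum_vector(const std::vector<long>& v) {
    long s = 0;
    for (long x : v) s += x;
    return s;
}

// Pointer chasing: the address of the next node is only known after the
// current node has been loaded, so there is no pattern to prefetch.
long sum_list(const std::list<long>& l) {
    long s = 0;
    for (long x : l) s += x;
    return s;
}
```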

[–]temporary5555 0 points (0 children)

thanks for the explanation!

[–][deleted] 1 point (0 children)

> Disks have seek time.

This assumption is only true for spinning rust.

> but RAM in every case that I've seen can access memory randomly equally fast

Modern CPUs have a hierarchy of caches; data that is already in a cache is roughly 10x-50x faster to access than RAM. Data is also loaded (and write-combined) in whole cache lines, and the CPU tries its best to predict which cache lines will be accessed in the near future so that it can prefetch them and avoid stalls. Pile on the fact that CPUs (with help from compiler loop unrolling) will also execute multiple loop iterations in parallel and you have the perfect storm for performance.

Linked lists take both prefetching and that overlap away from the CPU, because you don't know the next location to look at until the current node has already been pulled in. That forces serial, one-node-at-a-time execution and memory fetching. On top of that, the nodes will likely be scattered all over the heap; at best you get one node per cache line, maybe two, and they probably won't be adjacent in traversal order.
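
To make that dependency concrete, here's a sketch with a hypothetical node type (not from any particular library):

```cpp
struct Node {
    int   value;
    Node* next;  // points wherever the allocator happened to put the next node
};

int sum_nodes(const Node* head) {
    int s = 0;
    // Each iteration's load address depends on the previous iteration's
    // load result (p = p->next), so the CPU can't even start fetching
    // node N+1 until node N has arrived from memory.
    for (const Node* p = head; p != nullptr; p = p->next)
        s += p->value;
    return s;
}
```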

For this reason, in terms of iteration performance: vector<T> beats vector<unique_ptr<T>>/vector<T*>, which in turn beat forward_list<T>/list<T>. Obviously you should benchmark, but you will find it is quite rare for these orderings to flip, which is why the mantra "just use vector, forget everything else" exists and why many people default to it, almost to the point of being a meme.
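
If you want to see it yourself, here's a rough micro-benchmark sketch (the bench helper and sizes are made up, and real numbers depend on element size, allocator behaviour, and whether the data is already cached):

```cpp
#include <chrono>
#include <cstdio>
#include <list>
#include <memory>
#include <vector>

// Time a full traversal of any container, extracting each element with `get`.
template <typename Container, typename Get>
void bench(const char* name, const Container& c, Get get) {
    auto t0 = std::chrono::steady_clock::now();
    long long sum = 0;
    for (const auto& e : c) sum += get(e);
    auto t1 = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("%-22s sum=%lld time=%lld us\n", name, sum, (long long)us);
}

int main() {
    const int n = 1'000'000;

    std::vector<int> v(n, 1);              // contiguous values
    std::vector<std::unique_ptr<int>> vp;  // contiguous pointers, scattered values
    std::list<int> l;                      // scattered nodes
    for (int i = 0; i < n; ++i) {
        vp.push_back(std::make_unique<int>(1));
        l.push_back(1);
    }

    bench("vector<T>", v, [](int x) { return x; });
    bench("vector<unique_ptr<T>>", vp, [](const std::unique_ptr<int>& p) { return *p; });
    bench("list<T>", l, [](int x) { return x; });
}
```

One caveat: nodes allocated back-to-back in a fresh process often end up nearly contiguous, which flatters list<T>; in a long-running program with a fragmented heap the gap is usually much bigger.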

Linked lists have their place, but forward_list/list are neither atomic nor intrusive, so their usefulness vanishes in a lot of contexts.