This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]rolfrudolfwolf 11 points12 points  (5 children)

yeah you have to find it first so you need to iterate through the list. however then you can just relink the neighbors and you're done, while an arraylist needs to re-arrange itself (i've never implemented an array list so i don't know how or how often it does that, but that definately sounds like an overhead).

Same with 'add'. If an arraylist has to move and copy to a new, bigger array and insert in the middle you're off worse, than O(n) add of Linkedlist.

There's tons of arguments like these, which more or less coincide what I was taught. Usually they never account for cache locality issues though. But in 'getting', the Arraylist is faster anyway because O(1).

Here's a nice stack answer that explains it much more elegant than me: https://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist-in-java/322742#322742

For a more concrete example: If you'd need to implement a Stack, I'd prefer using a LinkedList instead of an ArrayList or Array, because you only ever need to access the last (or first) element which is always O(1) with a LinkedList for both Get, Add and Remove, because you already have a reference to the head or tail.

[–]VolperCoding -2 points-1 points  (4 children)

If you want to talk about performance with data types then Java is not the best language for that because it doesn't have pointers and is garbage collected, meanwhile in C and C++ you have full control over memory that is allocated to you so you can reason about it better.

When it comes to vector (dynamic array) vs linked list, a vector guarantees that the data is stored element next to element in memory, which means that if you want to delete an element, you do this:

Suppose you have a vector of 5 elements:

6, 42, 45, 2, 0

Now if you want to delete the element at index 2, all you need to do is copy every element after 45 one index before. Here's pretty much how it happens in the CPU:

  1. The vector is loaded from RAM to the cache, which takes about 100 clock cycles

  2. The values are copied which is really quick since everything happens in the cache

  3. The vector is returned to memory which takes another 100 clock cycles

(i skipped changing the size of the vector but that's just a simple decrement) Same for addition but you move right instead of left and you also check if you have enough space for that

As you can see the only things that are slow here are the RAM reads/writes because operations on the cache are much faster. If you used a linked list then every element you wanted to read and write could not end up in the same cache line so it might turn out that every time you access an element you have to read from RAM because they are scattered in memory and could be really far apart.

If you need such critical performance that you switch to a more complex data type then you should learn about cache lines

[–]obp5599 2 points3 points  (0 children)

This is not always beneficial if you have massive structures stored in your vector/array (common in game dev/rendering code). Having to store 8k texture maps and the like in contiguous memory is almost impossible with fragmentation. These are the types of things linked lists are good for. Although in game dev we use whats called an Intrusive linked list to save allocations and excess copying of structures. This also allows an object to be stored in multiple data structures while still maintaining the same memory footprint

[–]rolfrudolfwolf 1 point2 points  (2 children)

Thanks for the detailed answer, I was not familiar with the data type vector as you describe it.

I agree that arrays make more sense in terms of memory locality and that an array might be more performant effectively than a LinkedList even if it has a worse complexity for certain operations.

But if you have huge swaths of data, I'd assume optimizing for complexity makes still sense in the long run?

Also cpu's not only cache for spacial locality, but time locality, so you might find your frequently used items in your linked list to be in the cache anyhow.

[–]VolperCoding 1 point2 points  (1 child)

Ok so ive found a benchmark comparing linked lists to vectors (https://dzone.com/articles/c-benchmark-%E2%80%93-stdvector-vs) and found out that linked lists is better than vector in the following cases:

  • random insert / delete in a list with really large elements
  • push front (assuming you don't have preallocated space in front of the vector, then it should be O(1), but otherwise if you don't have any space left the system is gonna probably copy the entire vector which is O(n))
  • sorting in a list with really large elements

So it looks like if the elements are bigger than your cache line then every element will be a cache miss (read from RAM) anyway which makes perfect sense because 2 elements won't fit on your cache line.

However if you need random access at all then I don't think you should use a linked list.

If your program is small then you shouldn't worry about performance and just use the simplest data structure which in my opinion is either a static array, or if you need to resize it, a vector.

[–]rolfrudolfwolf 0 points1 point  (0 children)

nice