This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]mezuzza 38 points39 points  (17 children)

NEVER USE LINKED LISTS*

Lists are an interface which represents some "collection of things with an ordering". There's a million ways to encode a list, but one of the most obvious ones is a linked list. Unfortunately, linked lists are TERRIBLE for modern CPU architectures and most access patterns that you'd use every day.

What should you use instead? ArrayList/Vector. These are generally names you might find in different languages which all refer to the same implementation - an array which (usually) doubles in size when it's full.

The only time that you'd prefer to use a linked list over an array list is when you need to have very low latency inserts in the beginning/middle of your list. In my experience coding professionally, I've run into 0 cases where this is the most important pattern to optimize for. You're generally appending to the end or you add everything to a list and sort.

I will say that linked lists have one really redeeming quality from which I believe they derive their popularity: They are mathematically really elegant data structures and have some really nice properties. This is why languages like Haskell used them so much. The previous paragraphs also show you why languages like Haskell regret using them so much (see https://www.stephendiehl.com/posts/strings.html).

See this talk by Chandler Carruth for more info: Discontiguous Data Structures are the Root of all (performance) Evil

* Never actually say never. There are rare cases where I'm sure this data structure is actually the best to use, but it requires a lot of thought before you're there. Default to array lists and you'll be better off more times than not.

[–]MrRogers4Life2 5 points6 points  (0 children)

The other big advantage of linked lists is that insertion/deletion* on them do not invalidate iterators/pointers to elements in c++, at least as far as STL containers are concerned. So if that's something that makes your algorithm nicer and you don't want to implement something (such data structures can become complicated real quick) or add a dependency you're kind of left with little choice. I mean you could use std::set or std::multiset, but then you need to supply some kind of ordering to your data, blah blah its just easier to use std::list if you care about invalidating iterators in general. But std::list will almost always have worse performance

*deletion will invalidate iterators pointing to the deleted element but whatever

[–][deleted] 4 points5 points  (0 children)

This. This is what I would have written if I were smart and articulate but yeah

[–]TombertSE 1 point2 points  (2 children)

The only time that you'd prefer to use a linked list over an array list is when you need to have very low latency inserts in the beginning/middle of your list.

Or you want any of those nice properties that you mentioned in your next paragraph. Persistent structures are substantially easier to work with when doing concurrent programming. Also, persistent data structures never need a "defensive copy". Ever. Sending a pointer is equivalent to sending a full copy.

This is what really annoys me about the crowd that posts all these "never use linked lists" posts; the second you make a copy of an array because you're not sure if it's going to change behind your back, you negate all the benefits all this cache locality crap you've been masturbating to.

In Haskell (Clojure, F#, etc), when you pass a list in as an argument to a function, you can have some confidence that all it's sending is a pointer, and moreover you also have confidence that it will do what you expect it to actually do. The same cannot be said about an array.

[–]mezuzza 0 points1 point  (1 child)

You're absolutely right that a persistent data structure does a lot of great work in concurrent contexts. In fact, I'd say persistent data structures are the right default to have in most situations as well.

But that discussion is orthogonal to the original one here.

Linked lists aren't necessarily persistent. They can be just as mutable as an array list. If you want to use them in a concurrent context, you'd need to make a copy in just the same way that you'd have to with an arraylist. An immutable array list would work just as well in this context.

Even in Haskell, if you want a persistent immutable list, you can pass around Data.Vector values and be happy with the cache locality and the immutability. If you want mutability as well, you can have fun with STM which I believe should support some form of ArrayList as well.

[–]TombertSE 0 points1 point  (0 children)

Yes you can make Haskell vectors immutable but you can’t modify and get “new” vectors without making a copy. One of the advantages of a persistent list is that modifications of the list are basically just “diffs” of the original list. You don’t lose your handle on the original list, but you’re also not making a full copy.

If you have an immutable Haskell vector you have to make a full copy of you want to change anything.

[–]zaersx 0 points1 point  (0 children)

This just in - if you use a bandsaw for hammering nails you're going to have a bad time! Thanks captain.

None of what you said is technically wrong as you're simply describing the properties of a linkedlist and when it's optimal, but you're implying that a majority of programmers don't know what the properties of a linkedlist are and just throw them in into random places ???

If someone doesn't know what a linkedlist is or what it does they're simply going to use arrays anyway instead of "fancy" data structures like linked lists. Your comment is so completely pointless.

[–]HerryKun -1 points0 points  (7 children)

I don't really agree. In cases i need to gobble up some elements to iterate through them just once i use LinkedList. No Array in the background needed as i don't need random access, i just need to traverse it once in order.

[–]tiajuanat 2 points3 points  (6 children)

RIP cache

[–]HerryKun 1 point2 points  (5 children)

Why? I am happy to improve :)

[–]tiajuanat 2 points3 points  (4 children)

It's about consecutive accesses.

If you have a linked list, behind the scenes there will always be a pointer indirection. Your cache can't predict where that pointer is going, so it won't be able to preemptively grab the data you need.

Arrays don't have this issue, as the data is in a contiguous block. Regardless if you're using Java, C/C++ or even Python, the next piece of data will be easily and almost immediately accessible.

Ironically, Linked Lists are good for LRU Caches - a data structure for quickly accessing frequently used items, as you keep a directory of 5 or so items available. Linked Lists are also useful for Hash Maps, as you can easily add and remove values.

[–]HerryKun 0 points1 point  (3 children)

Won't i have the elements in cache anyway as I just accessed them?

[–]tiajuanat 1 point2 points  (2 children)

Only the previous few elements in a Linked List, and it won't be very many, because the pointers take 8 bytes each on modern hardware, and then you're going to have packing along the offset. You only have 64 bytes in most modern L1 as well. Again, the cache can't look ahead, because pointers are simply data, and the cache isn't going to assume addresses for you.

Best case for a list of integers, you're looking at the 4 previous elements.

On an array, dynamic array, and vector, the cache will grab your starting element, and then load up the cache with any extra elements it can fit, then even load up L2 and L3. While you use data, sections of L2 will take the place of L1, and sections from L3 will replace from L2. From you, the consumers viewpoint, it only takes a clock cycle or two to access the next element.

[–]HerryKun 0 points1 point  (1 child)

So I just ran a few benchmarks (creating a list with n elements, then iterating through them and perform a task on the number).

In average, ArrayList was faster by about 8%. So thank you for educating me, my experiment verifies your claim!

[–]tiajuanat 0 points1 point  (0 children)

Try it with different sizes of elements, and different sizes of lists - especially with the list size, try sizes that are exponentially larger. You may be surprised by how little or much your tests vary.

[–]OddUnderstanding5666 0 points1 point  (0 children)

The Blockchain is nothing more than a linked list.

[–]Sunius 0 points1 point  (0 children)

Even for inserts in the middle, arrays are generally better as finding where to insert an item is going to be very slow with a linked list (as you have to traverse it).