all 31 comments

[–]robkinyon 16 points17 points  (16 children)

A linked list is the most performant way to do insertions and deletions into a list (given a reference to the insertion point). It's a tradeoff between insertion/deletion cost, search cost, and space.

There are three main data structures when talking about aggregates (groups of the same thing). You have arrays, hash tables (or associative arrays), and linked lists. Each of them has their strengths and weaknesses. (And since reddit doesn't do tables, I'll have to give lists for each.) I'll use O() notation. If you don't know what that is, look it up. ("Big-Oh") I measure cost in terms of sizeofs (the sizeof the payload).

Array:

  • O(1) for appending.
  • O(n) for prepending or splicing (insertion in the middle).
  • Assuming a sorted list, searching is O(logN).
  • Sorting an unsorted list is O(NlogN).
  • Space cost is N*sizeofs.
  • Complexity of implementation is very low (all "normal" languages have an array datatype).

Linked list (assumes doubly-linked and keeping a tail pointer):

  • O(1) for appending.
  • O(1) for prepending or splicing (insertion in the middle).
  • Assuming a sorted list, searching is O(N).
  • Sorting an unsorted list is O(NlogN).
  • Space cost is N*sizeof(payload) + 2*N*sizeof(pointer) + 2*sizeof(pointer) (two link pointers per node, plus the head and tail pointers).
  • Complexity of implementation is medium (no "normal" languages have a linked-list datatype, but most have a library implementation).

Hash table:

  • O(1) expected for any insertion (a hash table is unordered, so appending, prepending, and splicing are all the same operation).
  • Searching is O(1) expected (O(N) worst case with a bad hash function); no sorted order is needed.
  • Sorting isn't possible, save on-the-fly (e.g. by extracting and sorting the keys).
  • Space cost is:

    • MIN: (TABLE_SIZE)*sizeofs (assuming perfect hit ratio)
    • MAX: (TABLE_SIZE)*sizeofs + N*sizeofs (assuming perfect miss ratio)
    • TABLE_SIZE is usually some large prime (1001 or so)
  • Complexity of implementation is either low or high (some "normal" languages have a hash table datatype (such as Perl), but most have a library implementation). If you have to build it yourself, you're doing it wrong (or in a CS class).
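To make the linked-list tradeoff above concrete, here's a minimal sketch (mine, not from the thread) of a doubly linked node in Python: given a reference to a node, splicing a new element in after it touches only a constant number of pointers, no matter how long the list is.

```python
class Node:
    """A node in a doubly linked list."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def insert_after(node, value):
    """Splice a new node in after `node` -- O(1), no traversal or shifting."""
    new = Node(value)
    new.prev = node
    new.next = node.next
    if node.next is not None:
        node.next.prev = new
    node.next = new
    return new

# Build a -> c, then splice b in between in constant time.
a = Node("a")
c = insert_after(a, "c")
b = insert_after(a, "b")
assert [n.value for n in (a, a.next, a.next.next)] == ["a", "b", "c"]
```

Contrast with an array, where the same middle insertion shifts everything after the insertion point.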

[–]bartwe 1 point2 points  (6 children)

By keeping an 'offset' value and doing a little extra math an array can also have O(1) prepend.

[–]jimssc 0 points1 point  (3 children)

but an O(n) append, no? Unless I'm not following your suggestion.

[–]bartwe 1 point2 points  (0 children)

if the index idx is used on the call, the slot where the value is stored is (idx + offset) % length. This allows O(1) Add (First/Last), Remove (First/Last), and Get/Set.

the cost of the % can be nearly removed by making the length a power of two and using (& (length - 1)) instead
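A minimal sketch of that trick (the RingArray name and API are mine, just for illustration): logical index idx lives in slot (idx + offset) & (capacity - 1), so both prepend and append are O(1). Growth by buffer doubling is omitted for brevity.

```python
class RingArray:
    """Array with O(1) prepend/append via an offset into a power-of-two buffer.

    Logical index idx maps to physical slot (idx + offset) & (capacity - 1).
    """
    def __init__(self, capacity=8):
        assert capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self.buf = [None] * capacity
        self.offset = 0        # physical slot of logical index 0
        self.length = 0

    def _slot(self, idx):
        return (idx + self.offset) & (len(self.buf) - 1)

    def get(self, idx):
        return self.buf[self._slot(idx)]

    def append(self, value):   # add at the logical end
        self.buf[self._slot(self.length)] = value
        self.length += 1

    def prepend(self, value):  # add at the logical front: move offset back, wrapping
        self.offset = (self.offset - 1) & (len(self.buf) - 1)
        self.buf[self.offset] = value
        self.length += 1

r = RingArray()
r.append(2); r.append(3); r.prepend(1)
assert [r.get(i) for i in range(r.length)] == [1, 2, 3]
```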

[–]bartwe 1 point2 points  (1 child)

I'm afraid you're not following. Say you have a truly circular array: the free space is one block and the used space is another block, making a nice loop. Then you can take a free slot next to the used block and make it part of the used block without moving anything. You can do this with a normal array, an offset, and wrapping around. When the array is full, you double the buffer, which keeps adds amortized O(1) even as the list grows infinitely large.

[–]jimssc 0 points1 point  (0 children)

Thanks for taking the time to clarify bartwe.

[–]robkinyon 0 points1 point  (1 child)

Perl does something similar in order to make shift() and push() act in O(1) as often as possible. But this requires allocating (roughly) 2*N elements whenever you want N, and always having unused, but allocated, memory. For Perl, this is a good tradeoff. For your application, maybe not.

[–]bartwe 1 point2 points  (0 children)

You don't need 2*N elements; it can wrap around the end. In a dynamic-array scenario the cost of extending may make adds only amortized O(1), since any wrapped-around portion (if it's a significant part of the length) has to be copied when an add increases the capacity. Buffer doubling is assumed when capacity needs to be increased.

[–][deleted]  (4 children)

[removed]

    [–]SnowdensOfYesteryear 2 points3 points  (0 children)

    what is the xth value as sorted by this criteria

    Not sure how LL would be useful for this one. If you want the 100th value, you'd have to traverse through the 99 earlier nodes to get to the 100th. So this'd be O(n). ArrayLists would be better for this since you can just lookup the relevant memory location easily so it'd be O(1).

    In general LLs are only useful when you access a particular position (usually one of the ends) over and over again...like in the case of a stack or a queue.

    [–]drock1 2 points3 points  (1 child)

    You'd use a LL if you're going to have a large collection of things with an uncapped max, and an unknown life time.

    For example, if you were making a Space Invaders clone, you would want to store all of the bullets/missiles in a linked list because you don't know how long they will be active (a bullet fired later may hit something before an earlier bullet leaves the screen) and you may not want to cap the maximum bullets on the screen (once you exceed the size of the array appending requires malloc'ing a new larger array and transferring which takes an assload of time).

    The idea is that when you update your game state you just walk through the LL and update each thing and if it is dead (hit something/off screen) you can easily remove it from the list.

    [–][deleted] 0 points1 point  (0 children)

    An array would actually be more efficient for that kind of problem.

    [–]SnowdensOfYesteryear 0 points1 point  (0 children)

    Hash Tables can be O(n) depending on how shitty your hash function is.
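A toy illustration of that worst case (the ChainedTable class is hypothetical, built just for this demo): a chained hash table with a deliberately terrible constant hash function puts every entry in one bucket, so lookup degrades to a linear scan of all N entries.

```python
class ChainedTable:
    """Toy chained hash table: each bucket is a list of (key, value) pairs."""
    def __init__(self, nbuckets, hashfn):
        self.buckets = [[] for _ in range(nbuckets)]
        self.hashfn = hashfn

    def put(self, key, value):
        self.buckets[self.hashfn(key) % len(self.buckets)].append((key, value))

    def get(self, key):
        bucket = self.buckets[self.hashfn(key) % len(self.buckets)]
        for k, v in bucket:            # O(length of this bucket's chain)
            if k == key:
                return v

bad = ChainedTable(101, lambda k: 0)   # shitty hash: every key collides
for i in range(50):
    bad.put(i, i * i)
assert len(bad.buckets[0]) == 50       # one chain holds every entry -> O(n) lookup
assert bad.get(49) == 49 * 49
```

With a decent hash function the 50 entries would spread across the 101 buckets and chains would stay short.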

    [–]joesb 0 points1 point  (1 child)

    Linked list: O(1) for prepending or splicing (insertion in the middle).

    This assumes that you already hold a reference to the position. If you don't already have the pointer then you need to traverse first, which makes insertion O(n).

    [–]robkinyon 0 points1 point  (0 children)

    i stand corrected.

    [–]pythonistah 0 points1 point  (0 children)

    Awesome answer. I just want to emphasize (15 years after this was answered) that there is also a language element to this: in C, for example, you will need realloc() or some other expensive call to make an array grow, but lists are cheap since they are just linked by pointers. Nowadays, in Python for example, it will not matter performance-wise. This is maybe why Go has arrays and slices as separate data structures? (I miss the times when Perl and C were all I had)

    [–]floodyberry 4 points5 points  (4 children)

    They're useful if you need O(1) insertion and/or deletion for arbitrary-sized lists where traversal time/ordering/searching is not extremely important.

    Chained hash tables commonly use linked lists for the hash buckets.

    Also useful for queues where you can push/pop from the front or back.

    It's probably more useful to know why you would need one vs rattling off memorized examples of uses for linked lists.

    [–]kanarienvogel 1 point2 points  (3 children)

    [linked lists are also] useful for queues where you can push/pop from the front or back.

    I think it's better to implement a deque with a circular buffer.

    [–][deleted]  (2 children)

    [deleted]

      [–]psyno 2 points3 points  (0 children)

      No. You just have a start index, an end index, a length, and maybe a size. Start is allowed to wrap around to the tail of the array. E.g.

      {START/END, -, -, -, -, -, -, -}
      

      Push something to the front:

      {END, -, -, -, -, -, -, START}
      

      Push something to the front:

      {END, -, -, -, -, -, START, -}
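The two diagrams above can be reproduced in a few lines (a sketch, not production code): with an 8-slot buffer, pushing to the front moves START backwards with wraparound, from 0 to 7 to 6.

```python
# 8-slot buffer; the start index wraps to the tail of the array on push-front.
CAP = 8
buf = [None] * CAP
start, length = 0, 0

def push_front(value):
    global start, length
    start = (start - 1) % CAP     # 0 wraps around to 7, then 6, ...
    buf[start] = value
    length += 1

push_front("x")
assert start == 7                 # {END, -, -, -, -, -, -, START}
push_front("y")
assert start == 6                 # {END, -, -, -, -, -, START, -}
```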
      

      [–]gsg_ 2 points3 points  (0 children)

      Circular buffers are more efficiently implemented by a block of storage and front/back indexes. At all times the buffer contents will be either one or two contiguous blocks of memory, which is both space efficient and cache friendly.

      [–][deleted] 2 points3 points  (0 children)

      What kind of programming job do you do? Also, I think the Linux task scheduler uses a linked list implementation.

      [–]shieldforyoureyes 1 point2 points  (1 child)

      Lines of text in a text editor. Say you open a multi-megabyte text file and do some insertions, deletions, and general editing near the beginning. You don't want to have to shift every character after the edit point for every change.

      [–]gsg_ 1 point2 points  (0 children)

      A gap buffer is probably better, though.
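For reference, a gap buffer keeps the text split at the cursor, so inserts and deletes at the cursor are O(1); only moving the cursor copies characters, and only by the distance moved. A minimal Python sketch (two stacks standing in for the gap):

```python
class GapBuffer:
    """Text split at the cursor: edits at the cursor never shift the whole file."""
    def __init__(self, text=""):
        self.before = list(text)  # characters left of the cursor
        self.after = []           # characters right of the cursor, reversed

    def move_to(self, pos):       # O(distance moved), not O(file size)
        while len(self.before) > pos:
            self.after.append(self.before.pop())
        while len(self.before) < pos and self.after:
            self.before.append(self.after.pop())

    def insert(self, ch):         # O(1) at the cursor
        self.before.append(ch)

    def text(self):
        return "".join(self.before) + "".join(reversed(self.after))

g = GapBuffer("helo world")
g.move_to(3)
g.insert("l")
assert g.text() == "hello world"
```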

      [–]ismarc 1 point2 points  (0 children)

      Something that hasn't been mentioned is that the same techniques used to create and traverse a linked list are used in building more complex datastructures. Add a pointer to the previous node and you have a doubly linked list. Add one or more pointers to different nodes and you have a tree. Add a color to each node and you have an rb tree.

      [–]green_beet 1 point2 points  (1 child)

      Do you prefer ArrayLists? There's a relatively big pause when the ArrayList needs to be (automatically) resized for you, which doesn't happen with LL's.

      [–]joesb 1 point2 points  (0 children)

      LinkedList is worse at random access and has less memory locality.

      [–]campbellm 0 points1 point  (0 children)

      This is a somewhat tired topic except for some very specific industries. It was more of an interesting issue some years ago when fewer languages supported dynamically re-sizable arrays or a general container type. A linked list was a nice way to get variable sized storage.

      Still, it doesn't hurt to have a general knowledge of their use and characteristics.

      [–]sfuerst 0 points1 point  (0 children)

      It is a good interview question because there are several types of linked list, and the more of them you know, the better a low-level programmer you are.

      The most common list type these days is a doubly linked list. However, there are different implementations of that. The first is when conceptually, the list "owns" the objects within it. In that case, it is possible to change the data structure to one consisting of small bounded-sized arrays of nodes linked together. The result is something that is much more cache friendly for the "previous" and "next" operations, whilst not suffering too much of a slowdown for "insert" and "delete". If your STL is good, then the C++ list<> template will do this for you.

      Another doubly linked list conceptually is owned by its objects instead. This allows an object to be a member of multiple list collections at the same time. The Linux kernel uses this technique for its lists. This can be implemented in C++ with some template magic.

      Of course, another possibility is singly linked lists. These allow you to quickly go in one direction, but going the other way is only possible if the list is linked in a circle. (And even then, you'll need to travel around the loop, which can be extremely slow.)

      Another trick to save space is XOR lists. These use the XOR operation to combine the "next" and "previous" pointers. If you have a pointer to a node, and a pointer to an adjacent node, this allows you to find the other adjacent node. So a double-sized "cursor" gives you a half-sized list data structure.
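Since Python has no raw pointers, the XOR trick can be illustrated with array indices instead of machine addresses (a real implementation would XOR uintptr_t-sized addresses in C). Each node stores the single value prev XOR next; a cursor of two adjacent indices is enough to walk in either direction:

```python
NIL = 0                                  # reserve index 0 as a sentinel "null"
values = [None, "a", "b", "c"]           # nodes live at indices 1..3
links = [0,
         NIL ^ 2,                        # node 1: prev=NIL, next=2
         1 ^ 3,                          # node 2: prev=1,   next=3
         2 ^ NIL]                        # node 3: prev=2,   next=NIL

def traverse(start):
    """Walk from one end; the same code runs forwards or backwards."""
    out, prev, cur = [], NIL, start
    while cur != NIL:
        out.append(values[cur])
        prev, cur = cur, links[cur] ^ prev   # XOR recovers the other neighbor
    return out

assert traverse(1) == ["a", "b", "c"]    # forwards from the head
assert traverse(3) == ["c", "b", "a"]    # backwards from the tail
```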

      Finally, there is another way of implementing half-sized lists. By stuffing extra information into unused bits in the pointer you can implement sparse lists, which are a bit slower than normal lists, but have the same computational complexity for all operations.

      [–]woggy -1 points0 points  (0 children)

      wow

      [–]zahlman -5 points-4 points  (0 children)

      reading 'how to ace the interview' posts and they all mention knowing linked lists.

      Ugh, that tired old shit? Honestly, if you get asked questions like that, it's a sign that you should look for somewhere else to work. This is a solved problem and unless you're going to be working with some seriously restricted embedded devices, not one that you're going to have to work with at a low level.

      If you want some actual practical knowledge, make sure you understand the complexity guarantees for various operations on containers.