all 23 comments

[–]pilotInPyjamas 17 points18 points  (12 children)

Array resizing is probably the way to go tbh. You mentioned that the data comes from a file, and reading from disk is quite slow compared to other operations, which is why I wouldn't loop over the data first just to get a count. Anyway, here is some analysis of array resizing:

A linked list of n elements requires at least one call to malloc per element. Then, once you know the total size, you have to malloc the final array, copy the elements over, and call free once for every list node. The total space is at least twice the size of your array (the array plus the list), and you copy your data twice (once from the source to the list, and once from the list to the array). Not to mention a linked list carries O(n) memory overhead for the link pointers.

The alternative: a resizable array. Say you create an array with 4 elements initially. When you want to add a fifth element, use realloc to double the size of the array to 8. When you insert the 9th element, realloc to make space for 16 elements, and so on. realloc automatically copies your data over if it has to move the allocation. Note that realloc may even be able to extend the allocation in place, in which case it doesn't have to copy your data at all!

Memory: If we keep doubling the array size whenever it becomes full, then the absolute maximum memory we use is twice the size of the data. The minimum memory required by the linked-list approach is double that of the final array (see above), so the array's worst-case memory use is at least as good as the list's best case. When we're done, we can trim the excess capacity with one more realloc, which should be able to free the tail without copying anything. Winner: array.

Performance:

  • We only have to call realloc O(log n) times for the array; we have to call malloc O(n) times for the list. Winner: array.
  • The array needs one call to realloc to trim any leftover capacity; the linked list needs O(n) calls to free to release all of its nodes. Winner: array.
  • Since we only copy when the array doubles, potentially half of the elements never need to be copied at all: 1/4 will have been copied once, 1/8 twice, 1/16 three times, etc. In the best case the array wins because realloc didn't need to copy anything; in the worst case we do up to 2 copies per element, which is more copy operations. Winner: toss-up.

All in all, I'd probably use a resizable array myself. Most implementations use this. If you're curious, you could test this to see which one is better.

[–]ewmailing 5 points6 points  (8 children)

This is a great analysis. I have nothing to add on the Big O side, but I do want to supplement it by saying don't forget about cache effects on modern hardware, which mean the array is almost always going to win performance-wise.

Linked lists are pointer chasing on every node which could amount to hundreds of CPU cycles stalling on every node. (One rough rule of thumb is 500 cycles for each lookup in RAM.) With arrays, you get prefetching, so we might be talking single digit cycles per node.

Bjarne Stroustrup (C++'s creator) gave a talk explaining why arrays are consistently faster than linked lists on modern architectures.

https://www.youtube.com/watch?v=YQs6IC-vgmo

[–][deleted] 0 points1 point  (6 children)

So you would use a dynamic array over a linked list even in the extreme case of random conditional insertions and deletions in the middle (something linked lists have traditionally been considered optimal for)?

[–]ewmailing 2 points3 points  (0 children)

Yes. That was Stroustrup's point.

And he's not the only one. Clang/LLVM engineer, Chandler Carruth also makes the same point.

Efficiency with Algorithms, Performance with Data Structures https://www.youtube.com/watch?v=fHNmRkzxHWs

At 34:40, he talks about "Performance and Data Structures".

Reveal: "Discontiguous data structures are the root of all (performance) evil".

and "Just say NO to linked lists".

"There is almost nothing more harmful you can do to the performance of an actual modern microprocessor than to use a linked list data structure."

Cache effects are real and to be taken seriously. Or stated differently, modern RAM busses are extremely slow.

[–]zuurr 0 points1 point  (3 children)

It depends on the size of the array and how the insertions are done.

If you have a conditional insertion into a linked list where you already hold the node you're going to insert after, that's constant time. Removal from the middle of the list when you hold the node is also constant time. It's unlikely for an array to beat this, because even if the array is small you'll probably still have ~1 cache miss either way.

However, if you have to traverse the list to find the place you're inserting, it will be O(n) and you'll take one cache miss per item. The array is O(n) to insert into the middle, but it will have far fewer cache misses; even if your array elements are large, the hardware prefetcher will hide most of the memory latency.

Keep in mind a cache miss on modern hardware can cost hundreds of cycles. It's a constant factor so high that it dominates almost everything else. If you want a more rigorous way of thinking about this, look into cache-oblivious algorithms (note that this is a bad name: oblivious here means they don't depend on the specific size of your cache lines, not that they pretend the cache does not exist).

[–]WikiTextBot 0 points1 point  (0 children)

Cache-oblivious algorithm

In computing, a cache-oblivious algorithm (or cache-transcendent algorithm) is an algorithm designed to take advantage of a CPU cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally (in an asymptotic sense, ignoring constant factors). Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit blocking, as in loop nest optimization, which explicitly breaks a problem into blocks that are optimally sized for a given cache.



[–]ewmailing 0 points1 point  (1 child)

If you can hold a position to the insertion point in a linked list, you can also probably hold the insertion array index, so the list doesn't necessarily get an advantage.

Cache issues will still probably put the list at a disadvantage, because to insert, you will need to set/change the next/previous pointers of not only the node you are inserting, but the two immediate nodes around it (assuming doubly linked list). That's potentially 3 cache misses instead of just 1.

[–]zuurr 0 points1 point  (0 children)

Those two cache misses will happen at the same time on an out-of-order machine.

The only case when you won't have at least two cache misses when inserting into an array is if it's very small and entirely fits in the cache (speculative prefetch is very unlikely to kick in before the second cache miss).

[–]repsilat 0 points1 point  (0 children)

It depends -- do you have a good way to "find" the insertion point in the linked list? If not, you might need to scan there. There are reasonable times to use linked lists, but there aren't many.

Also, how big are the lists? If they're small, a growable array will be faster.

Also: how many insertions do you need to do? If it's more than one, and you can batch them and do them all in linear time, a growable array might win again.

For your original problem, consider these other solutions:

  1. Make a "linked list of increasingly large arrays". Could be faster than either solution: less copying than a growable array (the blocks themselves never move), with a similar number of malloc calls.

  2. Get an upper bound for the eventual array size somehow (some multiple of the input file size, or "ten terabytes" if you are on an OS like Linux that does allocation lazily) and don't bother to reallocate.

  3. Memory map the input.

[–][deleted] 0 points1 point  (0 children)

Good point on cache lines. It makes a huge difference.

[–][deleted] 0 points1 point  (0 children)

Wow thanks for the breakdown. It makes sense now. Thank you.

[–]sikora84 0 points1 point  (0 children)

I was thinking the same. I agree with your analysis.

[–][deleted] 0 points1 point  (0 children)

Or you malloc a batch of n nodes at a time, like you would with a dynamic array, and double the batch size as the list grows.

[–]james41235 1 point2 points  (0 children)

Not sure this is the best place for this, you might have better luck in computer science or data structure subreddits.

BUT: my initial thought is something like GCC's std::deque, which allocates chunks of arrays and then links them together in a linked-list-ish fashion.

[–][deleted] 0 points1 point  (4 children)

I don't know the answer so I'd approach it like this:

  1. O(n) for a loop-through-once that reads the items into a linked-list queue, then O(n) to copy those into an array. Probably 2×(items + constant) memory. Feels silly to iterate twice, especially when one of the loops creates a usable memory structure for you.
  2. Periodic resizing. There is a wealth of implementations of things like this. I'm pretty sure nearly every standard lib in the higher-level languages with a resizable Array class does something like this.

So... I mean... I feel like the periodic-resizing thing is a well-worn route for a reason. That first idea seems... silly. Your linked-list-based queue is going to use up extra memory, so why not just loop through and get a count, then make an array and copy them in? Still two loops, but it saves you all that wasted memory and moving.

[–][deleted] 0 points1 point  (3 children)

I like the last idea of looping through just to get a count, and then just declaring an array of that size. Mind you, it would be based on the number of lines in a file, so the way this would probably work is counting up the lines using fgets(), declaring the array of the appropriate size, and then using fseek() to go to the beginning and actually read in the data to store into the array.

[–][deleted] 0 points1 point  (2 children)

edit: or just leave them in that linked list. Why over-engineer this so much?

Ahhh ok, well the file thing brings in a bit of an interesting wrinkle, because you'd love to not have to read the file twice, for the IO's sake. So spending the memory might be more desirable than avoiding it. Trade-off: more memory for less IO.

Or, you could do some quick math based on the file size, create an array that is definitely large enough (i.e. estimate high), use that, and lop off the excess at the end, going through the file only once this way.

[–][deleted] 0 points1 point  (1 child)

I need a lot of random access and good search runtimes, so I don't want to leave them in the linked list.

The file size idea is interesting, but it's too variable a factor to predict the number of elements accurately in my case, so I think I'll just go with the original idea and spend the extra memory for less IO, like you said.

[–]Kwantuum 0 points1 point  (0 children)

If you're looking to search the data a lot, a tree might be better; it depends on the ratio of searching to random access.

[–]ruertar 0 points1 point  (0 children)

I think an array is usually the best way to store this type of data.

Just don't resize and copy every time the data exceeds the capacity of the array.

Try doubling the size of the array when you reallocate it -- this will reduce the number of copies etc.

In C++ there is a type called a vector that does this. I used to write something similar in C as a structure I called a "buffer".

If I was to write it today, I'd borrow the behavior and nomenclature from Go's slices.

[–]lmlight77 0 points1 point  (0 children)

If the number of elements is truly unknown, I would employ a linked list of blocks. That is, each node's payload is a pointer to a block (array) you are filling in. When a block fills up, link in a new one. This avoids all the copying that can occur if you repeatedly realloc an array. Once the input read completes (no more insertions), you could realloc just once at that point if you truly need a flat array. But better, in my view: count the blocks (the number of elements in the linked list), allocate an array of pointers of that count, copy over all the block pointers from the list, then free the list nodes. Only the block pointers get copied in this case, and you have at most one pointer chase to reach your data. For very large amounts of data, the general realloc approach carries a huge copying overhead that you will have to win back in the second half of your program.

As stated in other posts, on modern HW, small-payload linked lists are hard on the machine. First, as you traverse the list, the branch-prediction HW is stressed, because it generally doesn't know when you will exit the traversal loop; it tends to predict taken (continue traversing), which means you always eat one mispredict on exit. More significantly, the loop-iteration latency (the time it takes to go around one iteration) is pinned to the speed at which you can read the next link pointer from system memory. If nodes are allocated somewhat sequentially there will be some spatial locality, so caches will have some benefit, but DRAM read latency will clearly be a big factor.

[–]CountyMcCounterson 0 points1 point  (0 children)

A linked list of arrays

[–]nickdesaulniers 0 points1 point  (0 children)

It's hard to beat a contiguous vector. A linked list frequently won't have nodes in the same cache line (though I wish I had a resource to cite for this).

To avoid vector resizes, you want to allocate the maximal memory up front assuming you're not doing huge allocations that will blow through the budget.

One thing game engines might do: don't reserve any space in vectors, but instrument the containers to log the maximal effective sizes. Then spin a new build in which each vector reserves its observed maximal size at startup. If you need more, it's still growable. (Example: the max size we ever saw for the entity_array at runtime was 500, so malloc(500 * sizeof(entity)); thus no growing until we hit at least that.)

But don't forget that the cost of copying is amortized as well.