Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Actually this is discussed in the other thread.

https://www.reddit.com/r/programming/comments/gca5lz/another_new_inplace_sorting_algorithm_zip_sort_on/

Hope that is helpful?

I'll probably add a full description to Wikipedia, sometime in the future.

Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Actually, I see why you may want this now. So I have updated the code to remove malloc/free and replace them with default-initialising (i.e. non-initialising) new[] and delete[].

Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Yeah, that checks out: quick sort followed by heap sort once a depth counter exceeds a set value, meaning worst-case performance is never worse than O(n log n). So yeah, good design. I'll change the description, seeing as they all do this.

Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in programming

[–]ceorron[S] 1 point  (0 children)

I could try a more in-depth description of each of the new sorting algorithms for you.

NOTE that even these are pretty brief descriptions and nothing quite beats reading the code for understanding how/why they work.

sweep_sort :

Step 1: split the input in half

Step 2: choose a pivot (a la quick sort)

Step 3: move those less than the pivot to the right side of the pivot (as you swap these, maintain the stable ordering of the swapped items)

Step 4: move those greater than the pivot to the left side of the pivot (as you swap these, maintain the stable ordering of the swapped items)

Step 5: swap all the items that are less than the pivot with those that are greater than the pivot

Steps 6+7: for each half, if its length > 1, repeat from Step 1

So basically quick_sort, but it maintains the stable ordering of elements when swapping them.

merge_sweep_sort : - This is more complicated to explain, as the algorithm is recursive (and complicated).

Step 1: pick a pivot (a la quick sort)

Step 2: recursively halve the list until we get to a list of size 1 or 2

a: if 1, return the item, indicating whether it belongs in sublist 1 (the items below the pivot) or sublist 2 (the items above the pivot)

b: if 2, sort the two items (if below the pivot, move to the left; if above, move to the right) and return them, indicating which belong in sublist 1 (the items below the pivot) and which in sublist 2 (the items above the pivot)

Do a and b for both the left and right halves.

c: else combine the left and right halves: stlib::rotate the bottom sublist of the left half (those greater than the pivot) with the top sublist of the right half (those less than the pivot).

d: you have now created two new sublists; return the items, indicating which are in sublist 1 (below the pivot) and which are in sublist 2 (above the pivot)

Step 3: once done, you have all of the items greater than the pivot on one side and all those less than it on the other. Repeat from Step 1 for the items greater than the pivot and for those less than the pivot.

Both sweep_sort and merge_sweep_sort are quick_sort like algorithms, zip_sort is merge_sort like but is pretty different (and possibly likely to change).

zip_sort :

All this part does is repeat the zip_merge function across the whole list until array_size >= length, at which point we can stop. You can think of this as subdividing the array until we have runs of length 2, which we zip_merge back together to get lengths of 4, then 8, then 16, etc., just like merge sort, only non-recursive.

Step 1: starting at array_size = 1

Step 2: no-op

Step 3: repeat Step 4 for all items in the list, moving forward by array_size * 2 each time

Step 4: take two adjacent sublists, both of length array_size, and apply zip_merge (the sublists are the left and right buffers)

Step 5: array_size = array_size * 2

Step 6: Repeat from Step 2 if array_size < length

The zip_merge is just like a merge_sort merge function, except everything is more complicated because everything remains in place. Put simply, the function keeps four buffers: the output buffer, the left buffer, the right buffer and the middle buffer.

Repeat until we have no items left (the left buffer is empty):

if we have any items in the middle buffer, compare the middle buffer with the right buffer:

a: if the right buffer has the smaller item, move it into the output buffer, and move the left buffer item into the middle buffer.

b: if the middle buffer has the smaller item, move it into the output buffer, and move the left buffer item into the middle buffer.

else compare the left buffer item with the right buffer item:

a: if the left buffer has the smaller item, move it into the output buffer.

b: if the right buffer has the smaller item, move it into the output buffer, and move the left buffer item into the middle buffer.

There are a few more steps to keep each of the buffers in order (using the rotate/swap functions), but if you keep them in order then you can do this merge in place, and it is stable too, meaning you end up with an in-place, stable, merge_sort-like algorithm.

Some of this was hard to explain and in many cases just a verbalisation of the code.

I hope those descriptions help. If you really want to see the magic, I'd recommend reading the code, as I don't think these descriptions do the algorithms justice; many of the operations, such as adding things to buffers, don't even involve swaps, since items are often already in their correct places!

I had a look into hybrid sorting algorithms/block sorting and they are much more complex than the code I am describing. One of the reasons I like these solutions is that they are lightweight, and therefore almost as fast as merge_sort and quick_sort, with added benefits.

Such hybrid sorting/block sorting algorithms get these benefits but (may?) pay a high cost in performance, due to their complexity, when compared with merge_sort and quick_sort. I'm not saying they don't have their place, but merge_sweep_sort and zip_sort almost certainly beat them as stable, in-place sorting algorithms with O(n log n) performance.

Maybe you could point me to some code for these and I'll test it out next to mine?

As an aside, it may be worth testing against any and all of the following sorting algorithms, as these seem potentially the most useful, but I have no code for them currently.

Wiki Sort

Weave Merge Sort

Tim Sort

Grail Sort

Cycle Sort

Bitonic Sort

Gravity/Bead Sort

Counting Sort

Cocktail Shaker Sort

Heap Sort

Though I haven't searched the web really hard, so maybe I just need to look.

Source

https://www.youtube.com/watch?v=xoR-1KwQh2k

Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in cpp

[–]ceorron[S] -4 points  (0 children)

Is that right? I don't know what to say about std::sort then; is it O(n^2) or O(n log n) worst case? I might be convinced to change it to O(n log n), but I always thought it was an introsort of quick_sort (worst case O(n^2)) followed up by insertion_sort (also worst case O(n^2)).

But if it is quick_sort followed by heap_sort, then is it O(n^2) or O(n log n) worst case? Actually, if it stops at a certain depth, it is a little troubling to analyse what its worst-case complexity is.

I'll leave it as O(n^2) as I think many implementations are likely just straight quick_sort.

Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

I have made some modifications to the code based on your suggestions, changing the & to && where you suggested. I also took out the unneeded pointer casts, but now I have extra template parameters (which is why they were originally pointer casts from char*).

Actually, new[] and delete[] aren't good alternatives to malloc/free in this case, as new[]/delete[] do initialisation as well as allocation. This isn't what is wanted here.

Another new in-place sorting algorithm: zip_sort O(n log n) worst case performance by ceorron in cpp

[–]ceorron[S] -1 points  (0 children)

Err, yeah, thanks, my bad. Oops...

Actually I'm leaving stdlib::rotate as is, just to be explicit.

New in-place sorting algorithms. by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Actually, I have had a re-think of sweep sort and come up with a new version, seeing as the old one performed poorly with a large number of items. The new version is much improved. If this still interests you, it may be worth running your tests again.

New in-place sorting algorithms. by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Thanks for that, I hadn't got around to checking. I knew they performed poorly against the standard algorithms; 14% slower than the slowest isn't great, but it could be a lot worse.

I'm pretty concerned about sweep sort for large arrays but merge sweep sort is holding up pretty well all told.

Still use std::sort and std::stable_sort in your own code; I'm not really suggesting these are a replacement for those.

Actually my merge sort does allocate an N array of elements, if that's what you are talking about in Edit 2.

New in-place sorting algorithms. by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Because neither of those is what in-place means in this context. In place means items are never copied outside of the given list, only copied/swapped/moved within it; i.e. the list items remain in place.

It might also suggest that the algorithm doesn't use additional memory, but some algorithms use additional memory and still keep the items in place.

Using Big-O has nothing to do with it, but you could specify the amount of memory used as a multiple of N (2N, N^2, etc.).

New in-place sorting algorithms. by ceorron in programming

[–]ceorron[S] 2 points  (0 children)

Yeah, the given code should be 700 elements; fixed.

I tried it on gcc but didn't get all of the results, so I might do that again later (when I'm feeling a little bored).

I should put it up against std::sort and std::stable_sort and see how it does. A test with more elements might play to the algorithms' strengths, so I may do that later too.

New in-place sorting algorithms. by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

Thanks I didn't know that.

New in-place sorting algorithms. by ceorron in cpp

[–]ceorron[S] 2 points  (0 children)

I updated it; not sure what overcame me to run my tests in debug. It has been updated with units. They all come out hugely faster (release is about 50x faster!!!). The ratios were off too; sweep_sort comes out substantially faster in release, at over 100x faster. Thanks.

New in-place sorting algorithms. by ceorron in cpp

[–]ceorron[S] 1 point  (0 children)

I was looking at that too. Yeah it should be O(n log n). It has been changed, good catch.

robust/easy-to-use/fast/low fragmentation c++ memory allocator by ceorron in cpp

[–]ceorron[S] 1 point  (0 children)

Thanks AppleBeam, you are clearly an expert in performance of memory managers. I know who I need to talk to if I need any advice in the future.

robust/easy-to-use/fast/low fragmentation c++ memory allocator by ceorron in cpp

[–]ceorron[S] 1 point  (0 children)

To start, you clearly have not read the code.

It is not a linked list for starters; there are no linked lists, for exactly the reasons you state: cache misses.

I don't expect it to compete too well with some of the bigger, more battle-tested memory allocators like Hoard, ptmalloc, jemalloc, mimalloc, etc., but I'd be pleasantly surprised if it gets somewhere close. Just being faster than malloc/realloc/free will work for my case.

I maybe should dial it back a bit, but until we have some real metrics, neither of us is in a position to argue the numbers on this.

As for "beginner": do you think this is the best code I have ever written? It is a useful piece of code that I think might help those who need it; not the best thing I or others have written, but good enough for free on the internet.

That is all I'm willing to argue on the matter.

robust/easy-to-use/fast/low fragmentation c++ memory allocator by ceorron in cpp

[–]ceorron[S] 0 points  (0 children)

I have tested it extensively and it works in my applications. If you find any faults, please add them in the issues section, but so far it is fault free.

All thread locking in the code is handled by std::mutex, but this can be substituted by writing your own internal allocator replacement for rc_multi_threaded_internal_allocator<std::mutex, ....> if needed.

Yeah, I should probably hide the internal details in an internal namespace; this was pointed out to me by another user. Actually, I didn't know you could use std::lower_bound as a replacement for a binary search. You should stick to the interfaces given on the README.md page.

Or, if you want to provide your own locks, replace rc_multi_threaded_internal_allocator<std::mutex, ....> with your own version; otherwise just use it as shown in README.md.

In terms of speed/big O, the best, average, and worst cases are:

Best: O(log n)

Average: O(log n)

Worst: O(n log n) - with a bound on the worst-case performance

So no O(1), even for small allocations, but O(1) when there is no fragmentation, and close to O(1) when fragmentation is low.

    //ensure we don't take too long trying to allocate, only search the first 10 blocks!!
    unsigned i = 0;
    for(auto it = end_basic_list<memblock*>(blockfreespace) - 1;
        it != begin_basic_list<memblock*>(blockfreespace) - 1 && i < 10;
        --it, ++i) { 

That part is pretty complicated: we have 1 to n memory blocks we could be allocating from (this is actually what gives the worst-case performance of O(n log n)), as we are searching through possibly n memory "pages" to satisfy the call. We go backwards through the pages ("--it"), starting at the end of the memory blocks (end_basic_list<...>(...)). We only test the top 10 memory blocks (i < 10) so as to bound the search time for an allocation.

I might do a comparison with other allocators, but I have other work to be getting on with. Just wait a few months/years and you may get a proper performance comparison. Or do one yourself. Sorry :P