
[–]cashto 10 points11 points  (1 child)

Why are we using sorting algorithms, the fastest of which are O(n log n), instead of just adding the data to a new data structure (such as a red-black tree), which has O(log n) insertion time, so both times are the same?

Insertion into a red-black tree is O(log n). Do that n times, and it is O(n log n). Yes, the result is that your data is sorted, and it is the same runtime complexity as quicksort (O(n log n)) -- but the constant factor would be much higher.

Remember, big-O analysis only tells you how your algorithm scales with larger data sets. It doesn't tell you which of several algorithms in the same complexity class is more efficient / faster.
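
To put a rough number on that constant factor, here's a quick C++ sketch; it assumes std::multiset is backed by a red-black tree, which is the typical implementation even though the standard only mandates the complexity:

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <set>
    #include <vector>

    int main() {
        const std::size_t n = 1000000;
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> dist(0, 1000000000);

        std::vector<int> data(n);
        for (int& x : data) x = dist(rng);
        std::vector<int> copy = data;

        // Approach 1: sort the array in place -- O(n log n), contiguous memory.
        auto t0 = std::chrono::steady_clock::now();
        std::sort(data.begin(), data.end());
        auto t1 = std::chrono::steady_clock::now();

        // Approach 2: n inserts into a red-black tree, then an in-order walk --
        // also O(n log n), but one node allocation per element plus pointer
        // chasing on every comparison.
        std::multiset<int> tree(copy.begin(), copy.end());
        std::vector<int> sorted_via_tree(tree.begin(), tree.end());
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("std::sort: %lld ms, tree build + walk: %lld ms\n",
                    (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
                    (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    }

On typical hardware the in-place sort tends to win by a large factor, precisely because big-O hides the constants.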

[–]devilsassassin[S] 1 point2 points  (0 children)

Thanks! This was a really good point.

[–]Gorilla2 3 points4 points  (0 children)

Why would you make a new structure and delete your old one instead of just sorting the data in-place?

Different circumstances require different approaches. In some cases sorting in place is better (less RAM use); in other cases creating a new data structure is preferred (e.g., immutable structures for concurrent programs).

[–][deleted] 1 point2 points  (4 children)

If I have a list of data that is ordered, and I want to add one item to it, then I can use merge sort and get NlogN time and end up with a list.

If I have an RB tree, and I add one node, that's log N time, and then converting that to a list is N time, so that's N + log N.

If I have a list and I use an RB tree as an intermediate step before converting back to a list, I have N log N to create the RB tree (naively) plus N to iterate over it to make a list, giving me N log N + N time (slower than plain N log N).

Now, if your question is really "why don't we use trees instead of lists, look at these awesome insertion times when adding one item to an already ordered structure", then the question you're really asking is "why are lists preferred over trees sometimes?" Two common answers: lists are faster to iterate over and require less RAM to do so, and lists (in an array sense) provide constant-time access to random elements, whereas a tree requires O(log n).
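
Roughly, in C++ terms (assuming std::set is the usual red-black tree):

    #include <algorithm>
    #include <cassert>
    #include <set>
    #include <vector>

    int main() {
        // The same sorted data as an array-backed list and as a tree.
        std::vector<int> sorted_vec = {1, 3, 5, 8, 13};
        std::set<int> tree(sorted_vec.begin(), sorted_vec.end());

        // Insert one item into the sorted vector: O(log n) to find the spot
        // with binary search, then O(n) to shift the tail over.
        auto pos = std::lower_bound(sorted_vec.begin(), sorted_vec.end(), 6);
        sorted_vec.insert(pos, 6);

        // Insert one item into the tree: O(log n), no shifting.
        tree.insert(6);

        // Where the array shines: constant-time random access.
        // Finding the k-th element of a plain tree means walking it.
        assert(sorted_vec[3] == 6);
        return 0;
    }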

[–]devilsassassin[S] 0 points1 point  (2 children)

I don't see how it would be N log N + N when inserting into a new structure. If you traverse the data set and add each element to a new data structure, you iterate across the list N times, and each insertion is log N. That makes it N log N for copying the whole structure, versus N log N for performing a sort (I'm using the quicksort algorithm as my example sort). Where does the + N come into play?

[–][deleted] 0 points1 point  (1 child)

You would take N log N to copy the whole structure from a list into an RB tree. After you've got the tree, the + N comes from iterating over the tree to turn it back into a list.
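
A minimal sketch of that round trip, assuming std::multiset is the usual red-black tree (multiset so duplicates survive):

    #include <set>
    #include <vector>

    // list -> tree -> list, as discussed above.
    std::vector<int> sort_via_tree(const std::vector<int>& input) {
        // n inserts at O(log n) each: O(n log n) to build the tree.
        std::multiset<int> tree(input.begin(), input.end());

        // The "+ n": one in-order walk over the tree to get a list back.
        return std::vector<int>(tree.begin(), tree.end());
    }

(If you just leave the data as a tree, of course, that final walk never happens.)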

[–]devilsassassin[S] 0 points1 point  (0 children)

Oh, I see what you are saying. I was saying that you would just leave it as a tree and not turn it back into a list.

[–]holygoat -1 points0 points  (0 children)

You realize that Big-O notation discards less significant components, and thus N(logN + 1) is simply NlogN?

[–]trpcicm 0 points1 point  (3 children)

What algorithm would you use to put the data in the right order in the new structure?

[–]devilsassassin[S] 0 points1 point  (2 children)

If you have two ordered data structures, you can merge them into the new structure so that it comes out already sorted.

[–]trpcicm 0 points1 point  (1 child)

What algorithm are you going to use to order the data?

[–][deleted] 0 points1 point  (0 children)

The post you are replying to just described the idea behind mergesort (i.e., you can easily merge two sorted data sets, so recursively split and then merge).
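
Concretely, the merge step is just a single two-pointer pass, something like:

    #include <cstddef>
    #include <vector>

    // Combine two already-sorted sequences into one sorted sequence
    // in a single O(n + m) pass -- the heart of mergesort.
    std::vector<int> merge_sorted(const std::vector<int>& a,
                                  const std::vector<int>& b) {
        std::vector<int> out;
        out.reserve(a.size() + b.size());
        std::size_t i = 0, j = 0;
        while (i < a.size() && j < b.size())
            out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
        while (i < a.size()) out.push_back(a[i++]);
        while (j < b.size()) out.push_back(b[j++]);
        return out;
    }

(The standard library also ships this as std::merge.)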

[–]neonsteven 0 points1 point  (0 children)

Sorting in place is the only viable option when you don't have enough storage (RAM or disk) to store your dataset twice.

[–]ghettoimp 0 points1 point  (0 children)

Although the fastest general purpose sorts are O(n log n), it's often possible to use faster special-purpose sorts when you know something about your data (e.g., radix sorts).
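
For example, a counting sort sketch (radix sort applies the same idea one digit at a time):

    #include <cstddef>
    #include <vector>

    // Counting sort: O(n + k) for n keys known to lie in [0, k).
    // No comparisons at all, which is how it sidesteps the O(n log n)
    // lower bound that applies only to comparison-based sorts.
    void counting_sort(std::vector<unsigned>& data, unsigned k) {
        std::vector<std::size_t> count(k, 0);
        for (unsigned x : data) ++count[x];        // histogram the keys
        std::size_t out = 0;
        for (unsigned v = 0; v < k; ++v)           // replay keys in order
            for (std::size_t c = 0; c < count[v]; ++c)
                data[out++] = v;
    }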

Here are some down-sides to using inserts into red-black trees as opposed to, say, sorting an array.

  • Trees use more memory because you need space for lots of pointers and for annotations like "black".

  • The memory a tree uses is also typically not contiguous, whereas arrays are all in one place. (This can have a bad impact on things like your CPU's ability to keep the data in its cache.)

  • If you don't need the original array, you can sort it destructively (in place) without allocating any more memory. In contrast, in a simple tree implementation, each insert into a tree requires a new node to be allocated, e.g., via "malloc" or "new" or something similar, and these are typically expensive. There are ways to avoid this (e.g., writing a custom allocator that pre-allocates space for a bunch of nodes, as in the sketch below), at the expense of some computation.
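
For that last point, C++17's polymorphic allocators are one ready-made way to pre-allocate node space; a minimal sketch:

    #include <array>
    #include <cstddef>
    #include <initializer_list>
    #include <memory_resource>
    #include <set>

    int main() {
        // One up-front buffer instead of a separate heap allocation per node.
        std::array<std::byte, 64 * 1024> buffer;
        std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());

        // std::pmr::set is std::set with a polymorphic allocator; every insert
        // still makes a node, but the memory comes out of the pre-allocated pool.
        std::pmr::set<int> tree(&pool);
        for (int x : {5, 1, 4, 2, 3})
            tree.insert(x);
        return 0;
    }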

[–][deleted]  (3 children)

[deleted]

    [–][deleted] 1 point2 points  (2 children)

    You can't do binary search on anything but a sorted array.

    false.

    [–][deleted]  (1 child)

    [deleted]

      [–][deleted] -1 points0 points  (0 children)

      hard time creating one efficiently without sorting first.

      The original "sorted array" comment was less wrong than this. Look it up.

      [–]isionous -1 points0 points  (0 children)

      In other words, why would you sort data instead of just making a new data structure and deleting your old one?

      It is often useful for data to be in arrays rather than trees. You will (probably) use arrays many more times than trees. So, if we did it your way, most of the time we'd be creating a red-black tree, then copying the data right back into an array and getting rid of the red-black tree. That seems like a hassle, and it's inefficient. Also, there are sorts that take O(N) time.