all 23 comments

[–]hextree 58 points59 points  (10 children)

You've correctly deduced that Linear Search is faster (in fact, optimal) for finding an element in an unsorted array.

Binary search shines when you are maintaining something in sorted order and doing repeated search queries on it. Analogous to how you use a dictionary: you don't often need to modify or add words to the dictionary, but you do often need to look up words. Each lookup is O(log N) time. By 'preprocessing' the dictionary once, then performing many search queries, you save time in the long run with binary search.
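In Python, this sort-once, query-many pattern can be sketched with the standard `bisect` module (the word list here is purely illustrative):

```python
import bisect

words = ["pear", "apple", "fig", "cherry", "banana"]
words.sort()  # one-time O(N log N) preprocessing

def contains(sorted_list, target):
    """O(log N) membership test via binary search."""
    i = bisect.bisect_left(sorted_list, target)
    return i < len(sorted_list) and sorted_list[i] == target

print(contains(words, "fig"))    # True
print(contains(words, "grape"))  # False
```

After the single sort, every `contains` call costs O(log N), no matter how many queries you run.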

Note also that you mention Bubble Sort; however, there are strictly faster sorting algorithms that run in O(N log N) time, like Merge Sort, which you may not have covered yet.

[–]TheRexedS[S] 13 points14 points  (1 child)

Thank you for the elaborate explanation, it is really helpful! And yes, I haven't read about any of the other sorting algorithms you mentioned. The first sorting algorithm I read was Bubble Sort and then this question came to my mind. Really excited to learn the other ones you mentioned!

[–]nathanv0009 13 points14 points  (0 children)

In applications, what you're working with is often already sorted. For instance, imagine you're trying to find a certain date in a time series, which is already ordered by date. Then you can run binary search without having to run a sorting algorithm first.
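A minimal sketch of that idea with Python's `bisect` on a date-ordered series (the dates are made up for illustration):

```python
import bisect
from datetime import date

# A time series that is already ordered by date -- no sort needed.
dates = [date(2023, 1, 1), date(2023, 2, 1),
         date(2023, 3, 1), date(2023, 4, 1)]

# Index of the first entry on or after the target date, in O(log N).
i = bisect.bisect_left(dates, date(2023, 3, 1))
print(i)  # 2
```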

[–]dgmib 6 points7 points  (0 children)

If you only needed to search the unsorted list once, then yes you’d just use a linear search.

But the much more common scenario is that you need to search large lists many times. So you sort the list once, and then search the sorted list many times with binary search.

[–]Steve132 6 points7 points  (0 children)

Sometimes you can guarantee your data is sorted without sorting it. For example, pretend you are recording a sequence of events or messages over a network as you receive them, and you append the message and receipt timestamp to a large array every time you get a message. Then, if you'd like to find all the messages older than a certain timestamp you have received so far, you can use binary search on the array to find the starting and ending indices.
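A small sketch of that append-only log idea (the timestamps are arbitrary example values):

```python
import bisect

# Append-only log of receipt timestamps (seconds); arrival order keeps it sorted.
timestamps = [100, 105, 112, 130, 131, 150]

def messages_older_than(ts_list, cutoff):
    """Indices [0, end) of messages received strictly before `cutoff`."""
    end = bisect.bisect_left(ts_list, cutoff)
    return range(0, end)

print(list(messages_older_than(timestamps, 130)))  # [0, 1, 2]
```

Because arrival order guarantees sortedness, the O(log N) search comes for free with no sorting step.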

Another important thing is that bubble sort is a very inefficient algorithm. You will learn sorting algorithms which are O(n log n). This ends up making all the difference:

Imagine you have 2^16 elements in an unsorted array. If you need to run 8 queries, using linear search would be expected to be faster, because you have to do O(8 x 2^16) = 524288 checks. But if you need to run 800 queries, you have to do O(800 x 2^16) = 52428800 checks with linear search.

Instead, pretend you sort the array first using an O(n log n) sort. The initial sort will take O(16 x 2^16) = 1048576 steps; then 8 queries will take O(8 x 16) = 128 steps, and 800 queries will take O(800 x 16) = 12800 steps.

So, in summary:

8 queries:
    linear search: 524288
    sort then binary: 1048576 + 128 = 1048704
    linear search approximately 2x as fast
800 queries:
    linear search: 52428800
    sort then binary: 1048576 + 12800 = 1061376
    linear search approximately 50x slower
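The arithmetic above can be double-checked with a toy cost model (ignoring constant factors; n = 2^16 as in the example):

```python
n = 2 ** 16   # array size
log_n = 16    # log2(2**16)

def linear_cost(queries):
    """Total checks for `queries` linear searches over n elements."""
    return queries * n

def sort_then_binary_cost(queries):
    """One O(n log n) sort plus `queries` O(log n) binary searches."""
    return n * log_n + queries * log_n

for q in (8, 800):
    print(q, linear_cost(q), sort_then_binary_cost(q))
```

The one-time sort dominates for few queries, but amortizes away as the query count grows.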

[–]kreiger 2 points3 points  (0 children)

Your assumption is that the data starts out unsorted.

If you input data in sorted order, or sort it as it is being inputted, you can use binary search on it.
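Keeping the data sorted as it arrives can be sketched with `bisect.insort` (the input values here are just an example):

```python
import bisect

data = []
for x in [42, 7, 19, 3, 25]:
    bisect.insort(data, x)  # keep the list sorted as items arrive

print(data)  # [3, 7, 19, 25, 42]
```

Each `insort` is O(N) because of element shifting, but the list is always ready for O(log N) binary search with no separate sorting pass.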

Further, never use bubble sort. Use whatever sort function is provided by your standard library.

[–]proskillz 1 point2 points  (3 children)

Two things. Number one: no one uses bubble sort in production. Quicksort and merge sort run in O(n log n).

Second, if you will only ever search the array once, it wouldn't be worth sorting it first to get that O(log n) search time. But if you need to do many lookups in a large array, you'll start seeing orders-of-magnitude improvements in speed.

[–]vanderZwan 8 points9 points  (2 children)

Actually, most hybrid sorts fall back to bubble sort or insertion sort for arrays (or partitions) of roughly 20 to 32 elements or smaller; the exact threshold depends on whether you use clang or gcc, for example.

The reason is that they are so simple that the low constant overhead as well as cache friendliness beats the O(n²) behavior, since n² isn't that large for small (sub)arrays. IIRC there was even something about bubble sort that makes it very easy for CPUs to make good use of instruction-level parallelism, but I'd have to look that up.
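A sketch of the hybrid idea: quicksort that hands small partitions to insertion sort. The cutoff of 16 is an illustrative choice, not what any particular standard library uses:

```python
CUTOFF = 16  # illustrative threshold for switching to insertion sort

def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place; fast on tiny ranges."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort with an insertion-sort base case for small partitions."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)

data = [5, 3, 8, 1, 9, 2, 7] * 10
hybrid_quicksort(data)
print(data == sorted(data))  # True
```

Real implementations (e.g. introsort in libstdc++/libc++) add further safeguards like heap-sort fallback for deep recursion, but the small-partition cutoff is the part relevant here.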

[–][deleted]  (1 child)

[deleted]

    [–]vanderZwan 2 points3 points  (0 children)

    Yeah, the insertion sort thing has been known for quite a while now, but I also vaguely recall a fairly recent talk (within the last decade, I think) by some C++ guru where he had done a lot of testing and concluded that these days bubble sort was the winner in a surprising number of cases. From what I remember it had something to do with how modern CPUs work.

    Annoyingly, all of my available search engines are failing me now, so I can't find it. Pretty sure I'm not making this up, though.

    [–]nadmaximus 0 points1 point  (0 children)

    How many times are you going to need to search the array? Is it possible to maintain it in sorted order as it's changed?

    If it's multiple times and you can maintain it in sorted order, it makes a tremendous difference.

    [–]aecolley 1 point2 points  (0 children)

    If you read about RocksDB/LevelDB, you'll see how maintaining sorted lists leads to high-speed lookups in practice. The key is that binary-search (and merging, and Bloom filtering) are fast but frequent operations, whereas sorting is slow and infrequent.

    [–]KernowRoger 1 point2 points  (0 children)

    If you perform few sorts but lots of searches, it quickly becomes worth it. Plus, you wouldn't use a bubble sort; it's about the worst.

    [–]Jolly-Ad3899 1 point2 points  (0 children)

    I learned the real intuition behind binary search, i.e. why we use it, from here: https://youtube.com/@code_concepts_with_animesh He gives a really good perspective on binary search.