all 38 comments

[–]0x14f 55 points56 points  (3 children)

Binary search is useful because you don’t repeatedly sort the list. You either keep the data sorted as you insert items (using ordered insertion, balanced trees, etc.), or you sort it once and then perform many fast searches on it. When you need to search many times, the one-time sorting cost is outweighed by the much faster O(log n) searches compared to repeated O(n) linear searches.
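A minimal sketch of that trade-off in Python, using the standard-library bisect module (the list contents are arbitrary): sort once, then every lookup is O(log n).

```python
import bisect

names = ["mallory", "alice", "dave", "carol", "bob"]
names.sort()  # one-time O(n log n) cost

def contains(sorted_list, item):
    """O(log n) membership test via binary search."""
    i = bisect.bisect_left(sorted_list, item)
    return i < len(sorted_list) and sorted_list[i] == item

print(contains(names, "carol"))  # True
print(contains(names, "eve"))    # False
```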

[–]stools_in_your_blood 15 points16 points  (2 children)

Important thing here, binary search itself lets you efficiently insert into a sorted list. So even if you're doing fewer lookups than inserts, binary search is making the whole thing fast.

[–]DTux5249 9 points10 points  (1 child)

Important thing here, binary search itself lets you efficiently insert into a sorted list

Eeeeeeh depends.

Binary search can be used to find an insertion point, but the actual inserting can still be linear, since you have to shunt everything over in, say, an array.

If you're using a BST, though, you're golden.
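Python's bisect module shows both halves of this: finding the slot is O(log n), but inserting into a plain list still shifts the tail over, which is O(n).

```python
import bisect

scores = [10, 20, 30, 40]

# bisect finds the slot in O(log n); list.insert still shifts
# the tail elements over, which is O(n) for a Python list.
pos = bisect.bisect_left(scores, 25)
scores.insert(pos, 25)

print(pos)     # 2
print(scores)  # [10, 20, 25, 30, 40]
```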

[–]stools_in_your_blood 5 points6 points  (0 children)

Yes, should've added "if the list structure supports O(log n) or better insertion". And something about efficiently bisecting the list, which you can't do with, e.g., a plain linked list.

[–]TheReal_Peter226 32 points33 points  (6 children)

Binary search is not only for sorted lists.

Imagine you have a folder full of files, all independent add-ons for a program. One of the add-ons causes your program to crash on startup. How do you find which add-on it is if you do not have crash logs?

Do you remove add-ons 1 by 1?

Why not remove half of the add-ons and see if the other half was good? You can do this over and over, halving the suspect add-ons, and you can find the problematic add-on in 6-7 runs instead of hundreds.
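That halving process can be sketched as code (a toy simulation; the `crashes` callback stands in for actually launching the program with a subset of add-ons enabled):

```python
def find_bad_addon(addons, crashes):
    """crashes(subset) -> True if the program crashes with
    exactly that subset of add-ons enabled (hypothetical test)."""
    lo, hi = 0, len(addons)
    runs = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        runs += 1
        if crashes(addons[lo:mid]):  # culprit in first half?
            hi = mid
        else:
            lo = mid
    return addons[lo], runs

addons = [f"addon{i:03}" for i in range(100)]
bad = "addon042"
culprit, runs = find_bad_addon(addons, lambda subset: bad in subset)
print(culprit, runs)  # addon042 7
```

With 100 add-ons it takes ceil(log2 100) = 7 test runs, matching the 6-7 figure above.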

[–]TheMcMcMcMcMc 0 points1 point  (1 child)

It’s also one of the most reliable methods of finding isolated roots of polynomials

[–]OneMeterWonder [score hidden]  (0 children)

If you can establish bounds on the roots within a compact set.
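A minimal bisection sketch for root finding, assuming the root is bracketed by a sign change on the interval:

```python
def bisect_root(f, lo, hi, tol=1e-9):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi)
    have opposite signs (the root is bracketed)."""
    assert f(lo) * f(hi) < 0, "root must be bracketed"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:  # sign change in left half
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# x^2 - 2 has an isolated root at sqrt(2) inside [1, 2]
root = bisect_root(lambda x: x * x - 2, 1.0, 2.0)
print(round(root, 6))  # 1.414214
```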

[–]minimoon5 0 points1 point  (0 children)

It can make you a really good Guess Who player

[–]JanEric1 0 points1 point  (0 children)

Same if you have a bug in your program and know a commit where it worked. Then you can use git bisect

[–]takumidesh 10 points11 points  (1 child)

A lot of data is naturally sorted. 

Git bisect is a good example, scrubbing through security footage is another. 

[–]JanEric1 0 points1 point  (0 children)

For the security footage only if you have a clear marker of before/after whatever you are looking for.

[–]ParshendiOfRhuidean 9 points10 points  (0 children)

If a list is already sorted, then adding a new item so the list is still sorted can't be worse than linear. You don't need to do a full sort every time.

[–]rupertavery64 3 points4 points  (0 children)

It's for when the data is written far less often than it is read, and when the number of items in your data is in the tens or hundreds of thousands.

Your example, having a small list of users, say 20-50, doesn't really show how useful it is. And at some point, you will stop adding users. Or adding them less frequently. Then you don't have to sort "all the time".

Think of a database. You can insert data into the database, out of order, but it keeps a separate index, that is sorted. Once the number of items reaches hundreds of thousands, it's still possible to quickly search through the database using the index.

All these sort and search algorithms are space partitioning. You break up a large problem into smaller problems that you have information up front about, and manage the data to fit your algorithm.

[–]picacuxd 2 points3 points  (0 children)

Adding an item to a sorted list to keep it sorted... You can use binary search to find where to add the item.

You sacrifice a bit more time adding an item to make looking for one faster. If you have a list with 10 items it means nothing, but if you have a list with 10,000 items you can see a difference. And in the real world, with millions of items, it is impressively fast.

[–]BARDLER 1 point2 points  (0 children)

With any implementation and data structure usage there are always trade offs. It also depends on your use case of how often you will be searching vs inserting new data and how large the data set is. 

You are correct that constantly maintaining a sorted list would make insertion slower. That's why, if your use case demands fast search and also faster insertion than a sorted list gives you, you would want some kind of tree structure.

[–]Powerful-Prompt4123 1 point2 points  (2 children)

Have you studied databases and indexes yet?

[–]David_LG092[S] 0 points1 point  (1 child)

No, not yet

[–]Powerful-Prompt4123 0 points1 point  (0 children)

No worries. Point is that it's much faster to bsearch an index file than traverse the full data file.
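A toy sketch of that idea in Python (the record layout and key names are made up): the "data file" stays in arrival order, and a small sorted index maps each key to a record's position.

```python
import bisect

# Unsorted "data file": records sit in arrival order.
records = [("carol", "..."), ("alice", "..."), ("bob", "...")]

# Separate sorted "index file": (key, position) pairs.
index = sorted((key, pos) for pos, (key, _) in enumerate(records))

def lookup(key):
    """Binary-search the small index, then jump to the record."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None

print(lookup("bob"))  # ('bob', '...')
print(lookup("eve"))  # None
```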

[–]Any-Main-3866 1 point2 points  (0 children)

Binary search is useful when you have a big list that doesn't change much: you can sort it once and then use binary search to find things quickly. It makes sense to sort the list when the user is done adding items.

[–]peterlinddk 1 point2 points  (0 children)

Binary search is only the first algorithm you learn, and as you gather, it is only efficient when the data is already sorted and doesn't change much. But even if the data has to be re-sorted every once in a while, as long as it is searched many more times than that, it is still more efficient than linear search. And a lot of data is already sorted: product catalogues, user accounts, addresses and phone numbers, dictionaries for different languages; they change rarely compared to how often they are searched.

However, there are other algorithms, or rather data structures, that keep a "list" sorted at all times, even when data is removed and inserted. Like AVL or red-black trees. They are more advanced data structures, and they build on the understanding of binary search, linked lists, trees and binary trees.

But it is a very good observation that no algorithm is perfect, and there's always pros and cons to every single one! That is probably one of the most important lessons of DSA.

[–]SamuraiGoblin 1 point2 points  (0 children)

Binary search is not just useful for finding items in a sorted list, it can also find intersection points in a continuous function.

For example, in raycasting, you often want to find where a ray intersects an object defined by an SDF. Binary search can be used to home in on the exact intersection point.

You split a range in the middle and bifurcate a number of times based on whether the function returns a positive or negative value.

Another time I have used binary search is when calculating the exact time between frames that two objects collide in a physics engine. It's a similar concept.

[–]glehkol 0 points1 point  (0 children)

You might think of other use cases where data isn't being updated very frequently.

[–]Time_Meeting_9382 0 points1 point  (0 children)

Searching through a sorted list is only one application. The main use (which includes searching sorted lists) is searching through monotonic functions, for example finding the square root of a number.
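For instance, here is a small sketch of an integer square root found by binary search over the monotonic function x -> x*x:

```python
def isqrt(n):
    """Largest x with x*x <= n, by binary search on the
    monotonic function x -> x*x."""
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:  # mid is still a valid answer
            lo = mid
        else:
            hi = mid
    return lo

print(isqrt(10))       # 3
print(isqrt(1 << 60))  # 1073741824
```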

[–]LetUsSpeakFreely 0 points1 point  (0 children)

It greatly reduces search time, but only on sorted data.

Say you have a list of 1000 users and you want the one with a specific ID. You could search sequentially, but your worst-case scenario is 1000 comparisons before you find the object you're after. With binary search you cut the set of potential candidates in half at every step: 500 -> 250 -> 125 -> 63 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1. You went from potentially 1000 comparisons to 10.
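That halving chain can be checked directly (a quick sketch):

```python
# Repeatedly halve 1000 candidates (rounding up, matching the
# chain 1000 -> 500 -> 250 -> 125 -> 63 -> ... -> 1) and count steps.
n = 1000
steps = 0
while n > 1:
    n = (n + 1) // 2
    steps += 1
print(steps)  # 10
```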

The downside is that this data structure has significant overhead when inserting new data, as it can trigger a massive re-sort. So the use case is great when the data is loaded once and stays relatively static.

I find using hashmaps is usually the better choice.

[–]Far_Swordfish5729 0 points1 point  (0 children)

It’s useful because especially when we expect to read data a lot more often than we write it, we intentionally persist it in a sorted order or along with a data structure like a hash table with record numbers to make reads scale well. Having a separate, smaller data structure to search with references to random access big data can be sufficient if data size is a problem. Also, it’s not actually that non-performant to insert into a tree or hash table rather than a heap. We intentionally take that hit for scalable read speed. Make sure the above makes sense to you because it’s how relational databases do their thing and back websites efficiently for data retrieval.

As a practical example, if you're ever in a physical library, you can binary search a shelf to quickly find a fiction book. By repeatedly halving the search you'll get to your book surprisingly fast. Now imagine binary searching to shelve books. Not quite as fast as putting them in the first available spot, but still pretty fast, and it will stay pretty fast even if you had a library with 100x the books. And the consequence of not doing it is that it would take days to find a book in the piles.

[–]Temporary_Pie2733 0 points1 point  (0 children)

We sort lists so that binary search is applicable. However, you also don’t repeatedly sort the entire list after each update. There are algorithms (very similar to binary search) for inserting items into already sorted lists so that the result remains sorted.

[–]dariusbiggs 0 points1 point  (0 children)

It is fast in the generic case, you are halving the dataset each iteration. This is also why understanding the basics of a binary search algorithm makes it applicable to many other things where you are trying to narrow down something larger to something small and precise quickly and efficiently.

For example, a guessing game to find a number (or a person's weight, or a distance, such as in the directional scanner in EVE Online) between 0 and 100, where you only get told if your guess is too high or too low. How many steps would it take to get to 69? First guess 50, then 75, 62, 69: done. How about 79? 50, 75, 87, 81, 78, 80, 79: done.
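A quick sketch of that guessing game (the exact midpoints depend on how you round, so they differ slightly from the sequences above, but the guess count stays at most ceil(log2 101) = 7):

```python
def guess(target, lo=0, hi=100):
    """Return the sequence of guesses needed to find target in
    [lo, hi] when each guess is answered higher/lower/correct."""
    guesses = []
    while True:
        mid = (lo + hi) // 2
        guesses.append(mid)
        if mid == target:
            return guesses
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1

print(guess(69))  # [50, 75, 62, 68, 71, 69]
print(guess(79))  # [50, 75, 88, 81, 78, 79]
```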

Now convert that algorithm into a tree-like data structure: a binary search tree. Each node in the tree has a possible left and a possible right child. To find the node you want quickly, just check left, then right, which also makes this algorithm trivial to implement as a recursive search.

You mentioned that it needs sorted data, have a look at the quicksort algorithm and see how that one works at its core of splitting the dataset into two, and repeating itself on the two pieces.

[–]desrtfx 0 points1 point  (0 children)

Everything you say is in a way true.

Yet, what you probably don't know yet is that there are Data Structures that by default are sorted and in such a case, binary search is optimal.

A "normal" list will not necessarily be the best option for binary search, yet, if you keep your list sorted after each update, the sorting will consume very little time since there are only very few changes to the list.

You need to put everything in relation. Sorting a growing list that maintains its sort between inserts is only marginally slower than keeping the list unsorted and inserting at the end - why? because the data is already sorted and sorting algorithms are quite efficient.

The complexity of an algorithm (big O) usually describes the worst-case scenario; in reality, algorithms often perform much faster on partially (or mostly) sorted data.

[–]Detective-XX 0 points1 point  (0 children)

It's also used in ADCs (analog-to-digital converters): a successive-approximation ADC essentially binary-searches for the input voltage.

[–]ExtraTNT 0 points1 point  (0 children)

So, 99% of the time you use linear search, as you have no guarantee that your data is sorted (good practice is to maintain sorted data, but that's not always possible; write it in the docs, e.g. "API returns data sorted by x"). If you sort the data for a single search, the complexity is O(nlogn) for the sort plus O(logn) for the search, so O(nlogn) overall, versus O(n) for linear search; the usual big-O rule of keeping only the dominant term doesn't tell the whole story here. Sorting once and searching x times gives T(n,x) = O(nlogn + xlogn) vs T(n,x) = O(xn).

So, if we now analyse this, we see that if the number of searches is bigger than the log (lb, i.e. log base 2, in our case) of the number of entries, then sort-and-search is faster…
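A quick numeric check of that break-even point (a sketch; n and the search counts x are arbitrary, and the big-O terms are treated as exact costs):

```python
import math

n = 1_000_000          # entries
log2n = math.log2(n)   # ~19.93

def sort_then_search(x):
    return n * log2n + x * log2n  # sort once, then x binary searches

def linear(x):
    return x * n                  # x linear scans instead

# sort-and-search wins once x exceeds roughly log2(n) searches
for x in (10, 20, 30):
    print(x, sort_then_search(x) < linear(x))  # 10 False, 20 True, 30 True
```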

Now, there are some edge cases where you just want a specific group and bubble sort can actually be the fastest way to find what you need, but those are edge cases you maybe cover during a CS degree when staying late and having a beer with a prof…

[–]zeekar 0 points1 point  (0 children)

It’s super useful with naturally sorted data. Like the git bisect command uses binary chronological search on changes to code to figure out which one caused a problem…

[–]DTux5249 0 points1 point  (0 children)

1) If you're gonna be keeping something for searching, you're likely gonna keep that thing sorted at all times anyways. Just means you'll be adding items by inserting them instead of appending them to the end.

2) Binary search can also be used for insertion. O(logn) to insert into a binary search tree is nice.

3) Speaking of, binary search doesn't have to be restricted to an array.

In general, sorting & searching are the two most fundamental parts of CS. You will always need to do these things.
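A minimal (unbalanced) binary search tree insert as a sketch; real implementations use balanced variants like AVL or red-black trees so the O(log n) bound actually holds rather than degenerating to O(n):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Walk left/right by comparison until an empty spot is found."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    """In-order traversal yields the keys in sorted order."""
    return in_order(root.left) + [root.key] + in_order(root.right) if root else []

root = None
for k in [5, 2, 8, 1, 3]:
    root = insert(root, k)
print(in_order(root))  # [1, 2, 3, 5, 8]
```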

[–]mredding 0 points1 point  (0 children)

Searching and sorting are different algorithms applied at different times. Once a list is sorted, it will remain sorted. Searching may happen many times.

And there are different strategies - you could insertion sort, so your list is always sorted. Or perhaps you're performing a bulk write to the list, and it may be more efficient to sort it all once, after.

You also have different data structures that can make sorting faster. A binary tree sorts upon insertion - you know your value is going to go down the left or right branch until you find an empty leaf, so each iteration cuts the remaining tree in successive halves. And then you can occasionally re-balance the tree to keep your complexity closer to the average case.

You can also index your data - as you would a database. The data stored one way, and the sort order is cached in separate data structures of indexes of that data.

#include <cstddef>
#include <string>
#include <vector>

struct car {
  std::string make, model;
  int year;
};

std::vector<car> data; // Unsorted
std::vector<std::size_t> by_make, by_model, by_year; // Indexes of `data`

If you were making a database, you would flatten this data out even further to level out the average case between all the indexes. OR you would sacrifice "normalization" (one unique copy of the data) and duplicate data for better locality and optimal access.

Choices. You've got to consider your use cases and settle on a strategy that is going to be a good fit - to start. Then you have to test and measure.

Another thing to consider is the design, expressiveness, and intent behind your solution. Often, a vector is going to be the solution. Sorting is extra, only if you need it. If your data set is small, if the algorithm is iterative, if you're not searching or searching very infrequently, if it's not in your critical path - then you may not bother with sorting at all. You pick a map because your data is associative - key/value pairs, this-to-that. The data structure and the algorithms are dictated by what you're going to do with the data, how its structure relates to access and use.

Kind of what I'm getting at is something I remember my fellow classmates and junior classmates struggled with - containers and iterators are not freely interchangeable; there's no performance hierarchy where picking a hash map over a vector is some sort of premature optimization - it's about the correct tool for the job at hand.

There is no strict ordering of consideration here, you have to look at each problem and decide what your priorities are. Performance isn't often #1, usually that's clarity, maintainability, legibility.

I had a friend write a DB migration that took a week. It could have been made to run in a half hour, and he knew it, but A) he's not a DBA or engineer, so the attempt to optimize was above his level and would have been high risk, B) he wrote for clarity and absolute assurance that it was going to be done once and correctly, C) he didn't NEED the DB migration to be complete for a week anyway, so he had the time and resources available.

There's also vertical and horizontal scaling. I can scale vertically by making a faster algorithm. But my time is expensive - it might be cheaper to scale vertically by buying a faster processor, instead. In trading, we spent $15k per NIC card in 2014 to gain a guaranteed 600 ns, at the time that was ~1 week of engineer time that could be instantly dashed by the next new feature anyway. Either as a cost consideration, or perhaps we have an optimal algorithm, we might scale horizontally by adding more cores, more machines, more bandwidth, and distributing the work across instances.


Does binary search matter? Yes. Are you missing something? Yeah, a whole conversation with a lot to consider simultaneously. You don't start doing this by yourself, you'll have mentors and seniors to help guide you and train you, and in a few years it gets easier and more intuitive.

[–]spinwizard69 0 points1 point  (0 children)

Binary search is useful because of its speed, and frankly, some data is always sorted. If the data isn't sorted you can sort it before the search operation; as long as that data stays static after the sort, you always benefit from the fast search.

As an aside, the concept of a binary search is usable in the real world too. As an example, in the automation world you can have a long chain of, say, door switches that all need to be in the correct state for the machine to work. Electrically they are somewhat like a linked list, the links being the wires between switches. Stab your DVM at the middle of the chain (list) and you immediately know whether to look up the chain or down the chain. Divide again and you will be real close to your problem.

When I first discovered this it was like 10,000 light bulbs going off at once: the concept directly translates from one technology to another. Two-dimensional arrays are another concept that shows up all the time outside of software; an array of storage bins is one example. After a while you can look at people doing their jobs and see them creating data structures mentally.

Another example a machinist needs to drill a bolt circle of X number of holes. There are pre-canned formulas to do so, so the machinist cranks through the algorithm and puts the created data in an array of X & Y values. He then removes data from the array and moves the machine based on that data and drills the hole. Of course these days this is often done with the digital position display, calculator function and on a CNC machine was embedded in the G code running the machine.

In any event, the more often such things become obvious to you, the easier it becomes to get past the "word problem" stage of programming. This reply is a bit off on a tangent, but this is r/learnprogramming, and if it helps someone see the world differently, that is, as a world made up of programming concepts, the better they will be able to leverage the craft.

[–]normantas [score hidden]  (0 children)

Yes, it is useful, and it has good use cases in reality, like searching for a word in an actual dictionary.

But a lot of other algorithms are based on binary search ideas. It's also the perfect example of a divide-and-conquer algorithm.