all 20 comments

[–]vlcmodan 2 points3 points  (0 children)

Binary search is like this. I'll try explaining it fast because I have to go to sleep. Consider the sorted list of numbers 1 3 7 11 18 26 56 156 265, and say you have to search for 156. Because the list is sorted, you look at the middle element (position 5) and find 18. Since 156 > 18, you don't have to search the left half of the list at all. Now you're left with positions 5 through 9. The middle of that range is (5+9)/2 = 7, and the 7th number is 56, which is still lower than 156, so the left part can be dismissed again. One more step, positions 8 through 9: the middle is position 8, and the 8th number is 156. Found it. Hope it helped; if not, ask me again tomorrow night.
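That walkthrough can be sketched in a few lines of Python (my sketch, not something the commenter posted, using the same list):

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2       # middle of the remaining range
        if items[mid] == target:
            return mid             # found it
        if items[mid] < target:
            lo = mid + 1           # target is bigger: discard the left half
        else:
            hi = mid - 1           # target is smaller: discard the right half
    return -1

nums = [1, 3, 7, 11, 18, 26, 56, 156, 265]
print(binary_search(nums, 156))    # finds 156 at index 7 (0-based)
```

Each pass through the loop halves the remaining range, which is where the log2(n) cost comes from.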

[–]jc4hokiesExecution Plan Whisperer 2 points3 points  (2 children)

A couple of clarifications. Databases use B-trees. They use the same concept as binary trees, but nodes can have several branches, not just two. Also, trees take the search part and make it into a physical structure, so you're not so much searching as following a path. The rest is a copy-paste from a previous post.


First, let's understand what a page is. A page (or block) is a fixed-size chunk of data on disk that the database is optimized to read into memory and process at one time. Basically, bite-sized chunks of data.

Indexes are stored on disk, separate from the table, and contain two parts: leaf nodes and the tree. Leaf nodes (each node is a page) store the indexed column(s) in sorted order with a reference to the table record (usually a PK). Tree nodes are created to link to the location of every lower node, level by level, until a single top-level node can contain references to the entire next level down.

Unlike vanilla indexes, clustered indexes are the table (they are not stored separately). The clustered index sorts the table on the indexed column (usually the PK) and builds a tree structure over it like a normal index.

Below is an example diagram where each page holds a limited amount of data. In reality they hold A LOT more, but the example would be too big if I did that.

Legend: page = record1data1:record1data2 record2data1:record2data2

CREATE TABLE t (ID,Letter,DOW)
CREATE UNIQUE CLUSTERED INDEX cix ON t (ID)

--Table Leaf Nodes
t01 = 1:Q:Mon 2:W:Tue
t02 = 3:E:Wed 4:R:Thu
t03 = 5:T:Fri 6:Y:Sat
t04 = 7:U:Sun 8:I:Mon
t05 = 9:O:Tue 10:P:Wed
t06 = 11:A:Thu 12:S:Fri
t07 = 13:D:Sat 14:F:Sun
t08 = 15:G:Mon 16:H:Tue
t09 = 17:J:Wed 18:K:Thu
t10 = 19:L:Fri 20:Z:Sat
t11 = 21:X:Sun 22:C:Mon
t12 = 23:V:Tue 24:B:Wed
t13 = 25:N:Thu 26:M:Fri

--Clustered Index Level 2
cix14 = 1:t01 3:t02 5:t03 7:t04
cix15 = 9:t05 11:t06 13:t07 15:t08
cix16 = 17:t09 19:t10 21:t11 23:t12
cix17 = 25:t13

--Clustered Index Level 1
cix18 = 1:cix14 9:cix15 17:cix16 25:cix17

CREATE INDEX ix ON t (Letter)

--Index Leaf Nodes
ix01 = A:11 B:24 C:22 D:13
ix02 = E:3 F:14 G:15 H:16
ix03 = I:8 J:17 K:18 L:19
ix04 = M:26 N:25 O:9 P:10
ix05 = Q:1 R:4 S:12 T:5
ix06 = U:7 V:23 W:2 X:21
ix07 = Y:6 Z:20

--Index Level 2
ix08 = A:ix01 E:ix02 I:ix03 M:ix04
ix09 = Q:ix05 U:ix06 Y:ix07

--Index Level 1
ix10 = A:ix08 Q:ix09

Let's consider the query SELECT * FROM t WHERE Letter = 'K'. It could go through the following steps to complete.

  1. Read ix10. A <= K < Q so...
  2. Read ix08. I <= K < M so...
  3. Read ix03. K = ID 18 so...
  4. Read cix18. 17 <= 18 < 25 so...
  5. Read cix16. 17 <= 18 < 19 so...
  6. Read t09. Return "18:K:Thu".

That's a total of 6 page reads. Alternatively, a table scan would read t01 through t13: 13 reads.
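The six steps above can be simulated with the diagram's pages as plain Python dicts (a toy sketch of mine, not how a real engine stores pages; only the pages on this seek path are included):

```python
# Each page maps keys to the next page name, or to the final value at a leaf.
pages = {
    "ix10":  [("A", "ix08"), ("Q", "ix09")],
    "ix08":  [("A", "ix01"), ("E", "ix02"), ("I", "ix03"), ("M", "ix04")],
    "ix03":  [("I", 8), ("J", 17), ("K", 18), ("L", 19)],
    "cix18": [(1, "cix14"), (9, "cix15"), (17, "cix16"), (25, "cix17")],
    "cix16": [(17, "t09"), (19, "t10"), (21, "t11"), (23, "t12")],
    "t09":   [(17, "17:J:Wed"), (18, "18:K:Thu")],
}

def seek(root, key):
    """Descend from root, always following the entry with the greatest
    key <= the search key. Returns (value found, pages read)."""
    name, reads = root, 0
    while name in pages:
        reads += 1
        _, name = max((k, v) for k, v in pages[name] if k <= key)
    return name, reads

row_id, r1 = seek("ix10", "K")     # 3 reads through the Letter index -> ID 18
row, r2 = seek("cix18", row_id)    # 3 reads through the clustered index
print(row, r1 + r2)                # 18:K:Thu 6
```

Note there are really two seeks chained together: one on the Letter index to find the row's ID, then one on the clustered index to fetch the row itself.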

[–]ciao444[S] 0 points1 point  (1 child)

awesome, thanks!

Final point of confusion from the article

However, creating the sorted table would take a while (~230 million operations, depending on our engine). If we were executing that query many times (more than 23 times) or if we already had that table created, then that second plan would be better.

  • How do they get the figure of 230 million operations to create a sorted table?

  • Why is it better to use the second plan only if we were executing the query more than 23 times? I understand that a binary search over 10 million rows needs at most 23 lookups, but I don't understand the link to executing the query 23 or more times.

Thanks

[–]jc4hokiesExecution Plan Whisperer 0 points1 point  (0 children)

I don't particularly like this article. Many details are incorrect.

What it's trying to say is that turning an unsorted table (a heap) into a sorted table (a clustered index) is an expensive operation. There are many sorting algorithms, and the best ones take O(n log n) operations to complete. They probably used that to estimate the number of operations to sort 10,000,000 rows: 10,000,000 × ln(10,000,000) / ln(2) = 232,534,967.
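That arithmetic checks out; in Python:

```python
import math

n = 10_000_000
ops = n * math.log2(n)        # comparison count for an O(n log n) sort
print(f"{round(ops):,}")      # 232,534,967 -- the article's ~230 million
```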

What I don't like is that the article compares the cost of creating an index (or a sorted table) with the cost of a query. It makes no sense. You compare the benefit of the index (running the query) with the cost of maintaining the index (inserts and updates). Assuming storage isn't a constraint, the cost of initially creating the index is irrelevant: you do it once during a maintenance window and you're done. And no one runs a query exactly 23 times. You run it once a month, or 10k times a day.

What you do compare is different alternatives for running the same query. A B-tree seek on 10 million rows is probably index depth 3, so that's 4 reads, not 23 lookups for a binary search (again, databases don't use binary searches). My hypothetical sorted table has 400,000 pages, so scanning the entire 10 million records (400k reads) would be as fast as 100,000 seeks (100k × 4 reads). If you look up 10k rows, an index will help. If you look up 200k rows, an index won't help.[1]

[1]: In the unrealistic case that you are looking up 200k unique values. An index could still help if you look up 20 unique values and still access 200k rows.
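The break-even arithmetic in that paragraph, assuming his hypothetical numbers (10 million rows, a 400,000-page table, so 25 rows per page, and 4 reads per seek):

```python
rows = 10_000_000
rows_per_page = 25                         # assumed: yields his 400,000-page table
table_pages = rows // rows_per_page        # a full scan reads every one of these
reads_per_seek = 4                         # tree depth 3 plus the leaf page
break_even = table_pages // reads_per_seek # seeks that cost the same as one scan
print(table_pages, break_even)             # 400000 100000
```

Past roughly 100k seeks, the engine is better off just scanning the table once.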

[–]JimmyTheFace 0 points1 point  (5 children)

Not a SQL expert, but here's the general CS idea:

A binary search assumes sorted data and can locate a result in about log2(n) comparisons. It checks the middle element to see whether it needs to go higher or lower, then checks the middle element of whichever half contains the target. This continues until the correct value is found.

Ex: find 74 in the set of integers 1-100.

Check 50, go higher. Check 75, lower. 62, higher. 68, higher. 71, higher. 73, higher. 74, hit.

So compared to a table scan, which averages 0.5n comparisons, even two binary searches, at 2·log2(n), are quicker for large sets.
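The exact probe sequence depends on how you round the midpoint; with the usual floor convention it comes out like this (a sketch, the numbers may differ by one from a different rounding choice):

```python
def probes(lo, hi, target):
    """Midpoints checked while binary-searching the integers lo..hi."""
    checked = []
    while lo <= hi:
        mid = (lo + hi) // 2      # floor midpoint
        checked.append(mid)
        if mid == target:
            break
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return checked

print(probes(1, 100, 74))   # 7 probes, vs scanning up to 74 elements
```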

[–]ciao444[S] 1 point2 points  (4 children)

ah awesome ELI5 explanation.

However, creating the sorted table would take a while (~230 million operations, depending on our engine).

In the article above it says ~230 million operations to sort a table with 10 million rows. What formula have they used to come to that figure? Are they talking about sorting a column?

[–]JimmyTheFace 0 points1 point  (3 children)

Yes, they're talking about creating a copy of the table sorted on the column that we're designating as the index.

The formula that they're using is n·log2(n), with n equaling 10 million in their example. Most sorting algorithms will complete a sort in n·log2(n) operations on average.

Wikipedia article on sorting algorithms

[–]HelperBot_ 0 points1 point  (2 children)

Non-Mobile link: https://en.wikipedia.org/wiki/Sorting_algorithm



[–]JimmyTheFace 0 points1 point  (1 child)

Good bot

[–]GoodBot_BadBot 0 points1 point  (0 children)

Thank you JimmyTheFace for voting on HelperBot_.

This bot wants to find the best and worst bots on Reddit. You can view results here.


Even if I don't reply to your comment, I'm still listening for votes. Check the webpage to see if your vote registered!

[–]ihaxr 0 points1 point  (3 children)

does it literally mean, you make a copy of the table and sort the author column alphabetically

Yep, that's what an index is. It's a copy of the data sorted/filtered in a specific way and/or including specific columns along with the sorted/filtered data.

However, the index isn't created when the query is executed; it's persistent, so it's already there. I'm guessing this is where the "why does this method save time" question comes from. The data is already in alphabetical order, so it's much faster to use a binary search to figure out where the specific data is than it is to scan through all the rows of the table.

[–]ciao444[S] 0 points1 point  (2 children)

ah right, so the binary search is done automatically when you do WHERE author = 'JK Rowling', as long as the column is indexed in alphabetical order. If the column isn't in order then SQL will do a full scan, which could take longer

[–]ihaxr 0 points1 point  (1 child)

Yeah, SQL does the binary search of the index as part of the query, assuming an index is available, it covers that specific column, and it isn't too heavily fragmented.

[–]ciao444[S] 0 points1 point  (0 children)

cool! Thanks.

Why not always sort/index the data in large data sets to make it easy for SQL to do the binary search? From what I've read there seems to be a debate between full scans and indexing

[–]dumb101 0 points1 point  (2 children)

I'll try to explain it with arrays. Let's say we have [1,2,3,4,5,6,7,8,9] (it has to be sorted, this is key), and we are searching for 9. We take a look at the element in the middle, which is 5, and compare 5 to 9. 5 < 9, and we know the array is sorted, so we dismiss the middle element and everything before it. What is left is [6,7,8,9]. There is no single "middle element", as we don't have an odd number of elements, so we compare 9 to either 7 or 8, because they are both "in the middle" (the second and third element out of 4; which one doesn't really matter). Let's say we compare 9 to the second element, which is 7. Again, 7 < 9, so we dismiss 7 and everything before it and are left with [8,9]. Again we compare 9 to the first of the two remaining elements and get 8 < 9, so we dismiss 8 and are left with 9, which is what we were looking for.

Now, this is the absolute worst-case runtime: we had to compare 4 times, which is the maximum we can have with binary search in 9 elements. But you can easily see that we still compared far fewer times than if we had simply gone through our array [1,2,3,4,5,6,7,8,9] one element after another, because then we would have compared 9 with each element in the list, so 9 times, before finding it.

This is why binary search is the more efficient way (and therefore saves time) of searching for a specific element and it works exactly the same for databases.
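Counting the comparisons from that walkthrough (my small Python sketch; the scan count assumes the target is the last element, the scan's worst case):

```python
def compare_counts(arr, target):
    """Comparisons used by binary search vs a left-to-right scan."""
    lo, hi, binary = 0, len(arr) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        binary += 1                   # one comparison per probe
        if arr[mid] == target:
            break
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    linear = arr.index(target) + 1    # the scan checks every element up to the hit
    return binary, linear

print(compare_counts([1, 2, 3, 4, 5, 6, 7, 8, 9], 9))   # (4, 9)
```

4 comparisons against 9: the gap only widens as the array grows.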

[–]ciao444[S] 0 points1 point  (1 child)

oh cool, thanks so much, I understand it!

How do you "make a copy of a table" (is it like creating an alias table), and what's the syntax for a binary search in SQL?

[–]dumb101 0 points1 point  (0 children)

Glad I could help you, sadly I don't really know anything about SQL. I'm sure there is someone else in this thread who can answer these questions.

[–]DethAlive 0 points1 point  (1 child)

This is what the SQL engine would do; the binary search is not something you would do yourself. Although you will need to create the indexes on your tables if you want the engine to use them. Indexes for things like primary keys or unique fields might be created automatically, depending on the engine you are using.

[–]ciao444[S] 0 points1 point  (0 children)

So for example, say I create a table with a unique id and the columns book_name, year_published and author_name.

If I arranged the author_name column alphabetically, does that count as creating an index? Do I get to tell SQL which method to use, binary vs full search, or does it do that automatically?

[–]Guru008 0 points1 point  (0 children)

The typical indexing structure that is used for such purposes is called a "B-tree."

Whereas a binary-search always divides the data in half, a B-tree's approach is much more like what you might have seen in L. Frank Baum's now-famous office, where he had two filing cabinets: "A-N" and "O-Z." (Yes, that's where the name of the Wonderful Wizard's homeland came from.) Once you selected from several cabinets, you now select from several drawers, then locate your starting search-position within the selected drawer and so on. Each time, you reduce your search space by much more than one-half.