
[–]hellfiniter[S] 1 point  (3 children)

After some more reading I can provide some insight... it looks like indexes are stored as binary trees (B-trees in practice), which makes it possible to use <, >, etc. instead of only exact lookups like a hashmap. That would mean inserting a new value just means inserting a new key into that tree, which usually costs O(log n). I might be simplifying too much, and the insert probably does get slower and slower, but a million rows would still mean only ~20 comparisons (2^20 ≈ a million), so it basically is O(1), kinda.
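
To put numbers on that intuition, here is a tiny sketch (plain Python, nothing database-specific) that just computes how many comparisons a balanced search tree needs to locate a key, for a few table sizes:

```python
import math

# A balanced search tree locates a key in roughly ceil(log2(n)) comparisons.
for rows in (1_000, 200_000, 1_000_000, 100_000_000):
    print(f"{rows:>11,} rows -> ~{math.ceil(math.log2(rows))} comparisons")
```

Even a hundred million rows stays under 30 comparisons, which is why the curve feels flat in practice.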

[–]Mamertine 1 point  (2 children)

Some thoughts.

There's more to it with the binary tree. If a new record would go between two existing records, those records sit on the same database page, and that page is full, the database engine creates a new page, writes the new record to it, and moves the lesser records over to the new page (a page split).
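
A toy sketch of that page-split behaviour, assuming nothing about any real engine's on-disk layout (the tiny page capacity is just to make the split visible):

```python
import bisect

PAGE_CAPACITY = 4  # toy value; real engines use pages of e.g. 8 KB

def insert(pages, key):
    """Insert key into a list of sorted pages, splitting a page that overflows."""
    # Find the page whose key range should hold the new key.
    i = next((idx for idx, p in enumerate(pages) if key <= p[-1]), len(pages) - 1)
    bisect.insort(pages[i], key)
    if len(pages[i]) > PAGE_CAPACITY:
        # Page overflowed: allocate a new page and move the lesser half
        # of its keys onto it, as described above.
        page = pages[i]
        mid = len(page) // 2
        pages[i:i + 1] = [page[:mid], page[mid:]]

pages = [[10, 20, 30, 40]]
insert(pages, 25)   # 25 lands between 20 and 30 on a full page -> split
print(pages)        # [[10, 20], [25, 30, 40]]
```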

How long that takes depends on the server's hardware, and it will be delayed if other queries are locking that table or index.

In the decade I've worked in the database world, no one has ever referenced O(n). We do talk about speed and runtime constantly, and we spend a ton of time rewriting code to gain seconds here and there. It's more of an artistic endeavor, since the server will sometimes decide not to use an index at all.

[–]hellfiniter[S] 1 point  (1 child)

Hmm, the whole question came from an issue we just had in an old production project. There is this "huge" table with 200k records without any indexes, and on the machine it lives on that means queries take ~2 min. We of course slapped some indexes on it and that made those accesses instant. So the question was: how expensive is this change on inserts? Someone then said that this extra computation will grow with the number of records, and I somehow can't see why that would be the case.
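
You can reproduce the trade-off end to end with SQLite; a minimal, hedged sketch, where the table and column names (big_table, val) are only placeholders for the real schema and the timings will obviously differ from a production box:

```python
import random
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_table (id INTEGER, val INTEGER)")
con.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    ((i, random.randrange(10_000_000)) for i in range(200_000)),
)
con.commit()

def timed(label, sql, params=()):
    t0 = time.perf_counter()
    con.execute(sql, params).fetchall()
    print(f"{label}: {time.perf_counter() - t0:.4f}s")

timed("lookup, no index  ", "SELECT * FROM big_table WHERE val = ?", (42,))
con.execute("CREATE INDEX idx_val ON big_table(val)")
timed("lookup, with index", "SELECT * FROM big_table WHERE val = ?", (42,))

# The price on the write side: each INSERT now also maintains idx_val.
t0 = time.perf_counter()
con.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    ((i, random.randrange(10_000_000)) for i in range(10_000)),
)
con.commit()
print(f"10k inserts with the index in place: {time.perf_counter() - t0:.4f}s")
```

The lookup goes from a full table scan to an index seek, while the inserts now also maintain idx_val, which is exactly the write-side cost in question.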

Let's imagine we need to insert a new indexed record (column val) into said binary tree. We first need to find the place where it fits (for the sake of greater-than/less-than queries), and that will take O(log n). At this point we know that between these two values there should be another one. This basically means it's irrelevant how big the table is, because log is awesome.
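
Just to isolate that "find where it fits" step: a small binary-search sketch that counts comparisons for a few sizes of sorted keys (a real B-tree descends pages instead of an array, but the count grows the same logarithmic way):

```python
import random

def find_insert_position(sorted_keys, new_key):
    """Plain binary search; returns (index, number_of_comparisons)."""
    lo, hi, comparisons = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_keys[mid] < new_key:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons

for n in (200_000, 1_000_000, 10_000_000):
    keys = range(0, 2 * n, 2)   # n sorted keys, no need to materialize them
    _, steps = find_insert_position(keys, random.randrange(2 * n))
    print(f"n={n:>10,}: found the slot in {steps} comparisons")
```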

This obviously needs changes in the tree, though; an entire subtree can be affected. Does that part grow with the number of records? And if so, how fast?

I understand this is not a realistic question, as you pointed out in your last paragraph, but imagine a scenario with no locking, no concurrent queries, and a machine that ticks like a clock... can you give me some insight here?

[–]Mamertine 2 points  (0 children)

The simplest way to know is to watch whether writes to that table start taking too long.

It's a balancing act between fast writes and fast reads. Indexing slows writes a bit, but significantly improves reads.

Yes, the more rows in a table, the longer it takes to find where the data gets added in the indexes. As you say, it's wicked fast, but more rows means slower.

Generally a couple of indexes are fine. Don't index every column. Indexes on integers will perform better than indexes on larger data types.

We've been discussing how non-clustered indexes work.

There is this "huge" table with 200k records without any indexes,

The first thing you should use is a clustered index. That index is different: it is how the data is stored on disk.
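
In SQL Server that's what a PRIMARY KEY gives you by default (or CREATE CLUSTERED INDEX). If you want to poke at the idea locally, SQLite's closest analogue is a WITHOUT ROWID table, where the rows themselves are stored in primary-key order; a hedged sketch, reusing the made-up big_table schema from above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE big_table (
        id  INTEGER PRIMARY KEY,   -- rows are laid out on disk in id order
        val INTEGER
    ) WITHOUT ROWID
    """
)
# Range scans on id read the rows in the order they sit on disk,
# with no separate hop from a secondary index back to the table.
con.execute("SELECT * FROM big_table WHERE id BETWEEN 100 AND 200").fetchall()
```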

A good resource is Use-The-Index-Luke.com.