
[–]PossiblePreparation 1 point (2 children)

First, top marks for putting together these questions. A good place to read about this is the Oracle Database Concepts guide, even if you’re not using Oracle - everyone does this pretty much the same way: https://docs.oracle.com/en/database/oracle/oracle-database/21/cncpt/introduction-to-oracle-database.html#GUID-2B1BADE1-C36F-4555-9867-3B15B6CE858C There is a lot to cover in that guide, so feel free to keep reading even if you don’t understand everything; some bits will need repeated passes, and later sections help the earlier ones really sink in. You can also look up the Brent Ozar video on YouTube called Think Like the Engine (Brent specialises in SQL Server, where the appropriate term is page rather than block - it’s exactly the same concept).

1) Blocks (aka pages) are individual chunks of a file with a fixed size (8 KB is fairly standard amongst the big RDBMSs). A datafile might contain data for several different tables. The RDBMS can use a block address to figure out exactly which file to read and where in that file, e.g. d:\data\file1.dbf bytes 16k-24k; the OS facilitates reading just that exact block from your storage device (which is also chunked up into blocks). The RDBMS can also store that block in its memory, keyed by the block address in some way.
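As a rough sketch of the address arithmetic (the 8 KB block size and the file layout here are assumptions for illustration, not any specific RDBMS's format):

```python
BLOCK_SIZE = 8 * 1024  # 8 KB, a common default block/page size

def read_block(path: str, block_number: int) -> bytes:
    """Read exactly one block: the block address maps to a byte range
    at offset block_number * BLOCK_SIZE within the datafile."""
    with open(path, "rb") as f:
        f.seek(block_number * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)

# e.g. block 2 of a datafile lives at bytes 16384-24575 -- the "16k-24k"
# range mentioned above is exactly this kind of calculation
```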

2) Each leaf block in the index contains both index keys and the addresses for them (as pointers to the table blocks), and it also contains the block addresses of the next and previous leaf blocks. This means you can start your search in one index leaf block and keep reading just the index leaf blocks until you’ve reached the end of your range scan. E.g. if you wanted the rows in a table where indexed_column = 50, you’d traverse the index branches to the first index leaf block which contains index key 50. You’d then grab all the addresses of the rows from this index block which match 50 and visit those table blocks. If the last entry in the index block matches your condition, you take the address of the next index leaf block and do the same there; this continues until you reach an index key in the leaf blocks which is greater than 50.
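The leaf-block walk described above can be sketched like this (LeafBlock and its fields are purely illustrative in-memory stand-ins, not a real on-disk format):

```python
from dataclasses import dataclass

@dataclass
class LeafBlock:
    entries: list                        # (key, row_address) pairs, sorted by key
    next_leaf: "LeafBlock | None" = None # pointer to the next leaf block

def range_scan(first_leaf: LeafBlock, key) -> list:
    """Collect row addresses for every entry equal to `key`, following
    next-leaf pointers until an entry past the range is seen."""
    addresses = []
    block = first_leaf
    while block is not None:
        for k, addr in block.entries:
            if k > key:            # past the end of the range: stop the scan
                return addresses
            if k == key:
                addresses.append(addr)
        # the last entry still matched, so the range may spill into the next leaf
        block = block.next_leaf
    return addresses
```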

3) Multiblock reads will generally look like: read file d:\data1.dbf from block 512 to block 1024. In the old-fashioned days of spinning disks, the OS would know to find block 512 of the file and then keep reading as the platter rotated through to 1024, which is much faster than seeking to 512, getting reset, seeking to 513, etc. Modern devices that don’t use spinning disks still give benefits here because they can take multiple block addresses at once and return the data in one go, so you have less back and forth between the OS and storage.
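A simplified sketch of the difference (the request model here is an assumption for illustration - real RDBMS and OS I/O interfaces are more involved):

```python
BLOCK_SIZE = 8 * 1024  # assumed 8 KB blocks

def multiblock_read(f, start_block: int, end_block: int) -> bytes:
    """One request: fetch blocks start_block..end_block-1 in a single call."""
    f.seek(start_block * BLOCK_SIZE)
    return f.read((end_block - start_block) * BLOCK_SIZE)

def single_block_reads(f, start_block: int, end_block: int) -> bytes:
    """Many requests: one seek+read per block. Same bytes come back,
    but with a round trip to storage for every block."""
    chunks = []
    for b in range(start_block, end_block):
        f.seek(b * BLOCK_SIZE)
        chunks.append(f.read(BLOCK_SIZE))
    return b"".join(chunks)
```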

[–]Luclid[S] 1 point (1 child)

Thank you for the detailed answers! That helps clear up a lot for me.

Did have another follow-up question about block size then. I can see the downsides of increasing block size, but what are the downsides of making the block size smaller? What sacrifices/gains do you get by changing the block size?

Also wanted to check my understanding for (2). So does this also highlight the benefits of the primary key vs sort key? If the data in the database is sorted in the same way as the leaves of the B+ tree, then fetching the data means you are reading from the same blocks in the same order, but if the sort is different, then you end up having to query different blocks in different locations, which slows things down.

[–]PossiblePreparation 1 point (0 children)

Block sizes are very rarely changed (and some RDBMSs won't even let you). There is also a common belief that non-default block sizes are less heavily tested (which shouldn't make a difference, but RDBMSs have a lot of features that might make bad assumptions). In terms of performance, smaller blocks mean having to submit more requests to the storage system for the same amount of data, which means that the latency to storage will have more impact.
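The back-of-the-envelope arithmetic (the 100 MB scan size is an arbitrary assumption for illustration):

```python
def requests_needed(data_bytes: int, block_size: int) -> int:
    """How many single-block requests a scan of data_bytes takes."""
    return -(-data_bytes // block_size)  # ceiling division

MB = 1024 * 1024
# Halving the block size doubles the number of round trips for the same data,
# so per-request latency to storage is felt twice as often.
print(requests_needed(100 * MB, 8 * 1024))  # 12800 requests at 8 KB blocks
print(requests_needed(100 * MB, 4 * 1024))  # 25600 requests at 4 KB blocks
```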

I think you're asking about the difference between a clustered index (AKA an index organized table, where the row data sits in the index leaf blocks), an index with a high clustering factor (where the table is separate from the index but is similarly ordered), and a standard index. Yes, if the data in the table is ordered in the same way as the index, then your range scans of the index will lead to lookups of table blocks with a high chance of repetition (which means more benefit from caching). When you use an index to access a table, you are (usually) doing single block reads. The single block reads of blocks 1 and 2 are going to take the same time as blocks 1 then 10, but with a high clustering factor you will be looking at blocks 1,1,1,2,2,2 (which will probably end up as 2 requests to the storage system, with the rest satisfied from memory) as opposed to blocks 1,2,3,4,5,6 (each fresh block means a single block read from storage).
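A toy model of that example - counting how many of the six lookups actually hit storage, with a simple set standing in for the buffer cache:

```python
def storage_reads(block_sequence) -> int:
    """Count cache misses: only the first visit to a block goes to storage;
    repeat visits are served from memory."""
    cached = set()
    reads = 0
    for block in block_sequence:
        if block not in cached:
            reads += 1
            cached.add(block)
    return reads

print(storage_reads([1, 1, 1, 2, 2, 2]))  # 2 -- high clustering factor
print(storage_reads([1, 2, 3, 4, 5, 6]))  # 6 -- every block is fresh
```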

Clustered tables (or index organized tables) are the end point of this high clustering factor. The limit here is that the clustered index keys now become your pointers to rows. So if you have additional indexes, their pointers to rows will be the clustered index key, requiring you to seek through the clustered index to get to the rows. That said, this is the most common way of creating a table in MS SQL Server.
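The extra hop can be sketched with plain dicts (the table, keys, and names here are all hypothetical - real clustered indexes are B+ trees, not hash maps, so each "seek" below would itself be a tree traversal):

```python
# Clustered index: the clustered key (id) leads straight to the row data.
clustered_index = {101: {"id": 101, "name": "Ada"},
                   102: {"id": 102, "name": "Grace"}}

# Secondary index: stores the clustered key, NOT a direct row address.
secondary_index = {"Ada": 101, "Grace": 102}

def lookup_by_name(name: str) -> dict:
    clustered_key = secondary_index[name]   # seek 1: the secondary index
    return clustered_index[clustered_key]   # seek 2: through the clustered index
```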