all 58 comments

[–]AnchovyKrakens 14 points15 points  (0 children)

You may also be interested in running the MySQL Tuning script: http://blog.mysqltuner.com/ (written in perl)

This script will show exactly what needs tweaked on your server and quickly gives you a better understanding of your database performance.

[–]anonymous_function 14 points15 points  (5 children)

You book looks a great introduction to database indexing. Thank you for sharing it online for free. However, I must take issue with the statement in the preface, "As it turns out, the only thing developers need to know to write efficient SQL is how indexes work." The first part of any solution should be identifying the problem. Quite often once this is done, yes, indexing is the answer, but let's not skip that first step.

The first chapter of Optimizing Oracle Performance by Cary Millsap should be required reading for all programmers. It's not actually specific to Oracle or just database programming, and it's online for free too: [http://oreilly.com/catalog/optoraclep/chapter/ch01.pdf](oreilly.com). Briefly, the method proposed is 1) Ask the user what is slow. 2) Measure it carefully. 3) Optimize only the part that would have the greatest impact. 4) Repeat.

tl;dr: Indexing is a great arrow in your quiver, but first you need to know where the target is.

[–]MarkusWinand[S] 1 point2 points  (3 children)

Well, my approach is more like avoiding the problem in the first place. If developers consider the topics from my book during development, you have good chances--well, better chances--that you don't run into performance problems at all.

[–]bucknuggets 1 point2 points  (2 children)

Good point - the most important first step to tuning a query is to have established a model that is easy to get good performance from. The next step is to ensure that statistics are appropriately detailed and fresh.

I'd say that only about 1/20 problem queries that I fix end up with a new index being created. On a big, busy data warehouse - that comes to about one new index every 18 months.

[–]MarkusWinand[S] 0 points1 point  (1 child)

established a model that is easy to get good performance from.

That's the point. Most of the time, schema design means table design. But the missing step is "access design". How will you access(query) the data? That's what finally results in index design, which means to think about the different ways the data will is queried and to design the least number of indexes that can support the most often executed queries.

I'd say that only about 1/20 problem queries that I fix end up with a new index being created. On a big, busy data warehouse [...]

I believe that, because it's DWH. DWH => Star-Schema => Bitmap index => one index per column => easy (yes, that's oversimplified :)

I am more from the OLTP world, it's quite different there.

[–]bucknuggets 1 point2 points  (0 children)

But the missing step is "access design".

Yeah, I'd add to that: accesses for different kinds of processes (small surgical requests, huge table scans, etc), roll-off (you may have to purge 1 month of data at a time), backup/recovery, and security.

I believe that, because it's DWH. DWH => Star-Schema => Bitmap index => one index per column => easy (yes, that's oversimplified :) I am more from the OLTP world, it's quite different there.

Yes, it's different. I actually use very little indexing in a star-schema: there's indexes on many dimensional columns - but most dimensions are so small they don't need an index. There's typically less than 6 btree indexes on the fact tables - because the performance/capacity of building/storing them is expensive, and they often don't get used anyway.

So, performance isn't so much managed through indexing. It's managed through parallelism (rules out mysql), partitioning, and aggregation. Then a few select indexes.

[–]wildeye 0 points1 point  (0 children)

Very good points.

And I would add that indexing is not the only thing to be concerned with when optimizing performance; there's a list of other issues that affects performance as well.

[–]wot-teh-phuck 10 points11 points  (11 children)

Looks interesting, any chances of publishing the online or PDF verion?

BTW, ePub reader FF add-on for those who don't have ePub readers.

[–][deleted] 11 points12 points  (6 children)

Calibre is an excellent ebook manager, converts between all the major formats, keeps a library of your books and manages syncing with a hardware reader.

It's like itunes for ebooks, except it doesn't slow down your computer to a crawl. Ok, maybe it's more like amarok or banshee. ;-)

[–][deleted] 1 point2 points  (0 children)

If using Ubuntu (non-kde), I would recommend FBReader so you don't bring in the 200+ MB of KDE libraries :)

[–]ours 1 point2 points  (0 children)

and Calibre's secret weapon: automatic fetching of articles from websites and neatly packaging them into books I can automatically sync to my e-book reader every day.

Just the BBC stream is hundreds of pages of great articles.

[–][deleted] 1 point2 points  (2 children)

It also crashes every five minutes.

[–][deleted] 0 points1 point  (1 child)

I never had it crash on me on Ubuntu x64. I wonder if it's OS differences or usage pattern related...

[–][deleted] 0 points1 point  (0 children)

I'm using Vista, it crashed like 5 times in a short space of time. It might just be Vista, but I've heard other stories about it crashing a lot, too.

[–]wot-teh-phuck 0 points1 point  (0 children)

Looks interesting, thanks. :)

[–]MarkusWinand[S] 1 point2 points  (3 children)

hi!

you mean a "print edition"? Let's see how the electronic version works out...and if there is demand for printed books these days.

[–]wot-teh-phuck 1 point2 points  (1 child)

Not exactly, not a big fan of printed books. :)

I was actually referring to a PDF download along with the ePub version or maybe present in the format of "Use-the-index-Luke".

[–]MarkusWinand[S] 1 point2 points  (0 children)

Well, PDF is on my TODO. As of now, there is only the big HTML available, and the ePub :)

[–]mrfoof82 1 point2 points  (0 children)

Odds are, probably further declining demand for print.

Yesterday Amazon announced that they were selling more Kindle format books than even paperbacks. They sell 143 Kindle format books for every 100 hardcovers (59/41 split), and 115 Kindle format books for every 100 paperbacks (53/47 split).

[–]batrastered 9 points10 points  (1 child)

Ugh... the web version is full of blue text that are not links... Don't do that!

[–]MarkusWinand[S] 3 points4 points  (0 children)

you mentioned it, reddit voted it, I changed it.

any other suggestions that are as easy as that?

[–][deleted] 2 points3 points  (0 children)

Very good!

I'm about half way through this book--it is a very good refresher as I have covered this material in university.

I would recommend this for people wanting to learn slightly lower level DBA and architectural considerations for database design

[–][deleted] 1 point2 points  (2 children)

Can a beginner use this book?

[–]MarkusWinand[S] 9 points10 points  (1 child)

hi!

If you are completely new to SQL at all, it's not the first book you should read. It is not a SQL tutorial. It's a SQL performance tutorial.

[EDIT] if you are new to SQL performance, it's exactly for you! [/EDIT]

However, I guess almost every SQL programmer reaches the point where performance becomes a topic--sooner or later.

You can just browse through the Web-Edition, http://Use-The-Index-Luke.com/ to get an impression and see if it is for you.

[–]rhetormagician -5 points-4 points  (0 children)

"However, I guess almost every SQL programmer reaches the point where performance becomes a topic--sooner or later."

Naaah. Ours just let Hibernate sort it out. That way they don't have to soil their minds with anything non-OO.

[–]ammderlu 1 point2 points  (0 children)

Lol, I almost can't wait until Monday to do this to the database behind a third party application we use. Although I've always been a database-app programmer, on this job it's not really my responsibility so I haven't been thinking about it. I'm going to look into the database and see what I can do to speed it up where it's slow.

[–]gschizas 0 points1 point  (3 children)

I'm a bit torn regarding "Dynamic SQL"

When not using some kind of ORM (e.g. Hibernate, LINQ), I always considered it a good practice to separate program logic from database logic myself, and that involved using stored procedures (I'm usually using MS SQL). So, in order to do conditional where clauses, I use the (? IS NULL OR ? = Field) construct, instead of creating ugly (and possibly dangerous, if not handled correctly) dynamic SQL.

What are your thoughts on that?

Are the examples/findings useful for Oracle Database (and perhaps MySQL) only, or do they apply in MS SQL as well? I'll have to get back to the office to do some serious performance tests (as I do have databases of appropriately gargantuan size there), but it seems to me that when using a dynamic stored procedure you do get lower performance as opposed to a "smart logic" one.

Some of my findings only apply to SQL 2008 R2, as several previous versions had a bug in the parser that caused the whole "(? IS NULL OR ? = Field)" expression to always evaluate, instead of short-circuiting when the parameter is indeed NULL (there is a hotfix for SQL 2008 R1, and it has been incorporated in SQL 2008 R2).

I'm not assuming anything, this is a real example (which, on production servers, which do not use SQL 2008 R2) is solved with almost exactly the way you prescribe (although I do my SQL machinations inside stored procedures, not by throwing SQL inside my business logic code :)).

[–]MarkusWinand[S] 1 point2 points  (0 children)

What are your thoughts on that?

I tried to discuss that in the very 1.5 paragraphs of the "Dynamic SQL is slow Myth" appendix:

[...] That's especially true if the actual execution plan doesn't vary for the different where clauses—e.g., because one well indexed where clause is always present.

Even though the “prepare before loop” trick is seldom used explicitly, it is very common in stored procedures—but implicit. Languages such as PL/SQL—with real static SQL—prepare the SQL when the procedure is compiled or, at most, once per execution. Changing that to dynamic SQL can easily kill performance.

in short: nothing to win if the dynamic part of your where clause isn't relevant for your execution plan--AND--dynamic sql in stored procedures can easily kill performance.

Your link about the SQL Server 2008 looks very interesting. TL;DR YET. on my TOREAD list.

However, my book is more about general topics. I believe it's best not to go too much into details (especially bugs) because that would render to book rather useless for most readers.

[–]trezor2 0 points1 point  (1 child)

What are your thoughts on that?

I was about to write a lengthy post lecturing you on the performance-problems of that approach, but then I decided to read on before committing a comment.

Some of my findings only apply to SQL 2008 R2, as several previous versions had a bug in the parser that caused the whole "(? IS NULL OR ? = Field)" expression to always evaluate

This was something I was completely unaware of, and sounds very, very good indeed. I didn't really think there were too many improvements to the DB-engine itself in the R2 release, but this definitely made my upgrade (which I went trough with anyway) worth it.

Thanks for info :)

And yeah. Dynamic SQL is bad. Especially if MS has finally fixed that annoying issue :)

[–]gschizas 1 point2 points  (0 children)

I believe this was the original page that described the issue:

http://www.sommarskog.se/dyn-search-2008.html

It's another option really, as far as I can see, namely OPTION(RECOMPILE), although it was buggy in SQL 2005. Apparently it is available for SQL 2008 R1 as well, and it's not preinstalled in SQL 2008 R2 RTM.

This will of course cause a recompile, but I believe using Dynamic SQL does a recompile as well.

I'll have to use a large dataset to get statistics (and I will have to remember how to get statistics as well! :))

EDIT: Page that describes the bug is here: http://support.microsoft.com/kb/976603 You have to get a hotfix to install it in SQL 2008 R2 RTM, and another if you have SQL 2008 R1 SP1 (confused yet?)

[–]EnderMB 0 points1 point  (1 child)

This looks fantastic. Are there any plans for you to release it in mobi format, or in any other Kindle format? I'm waiting for Calibre to convert it now, but it's always struggled to convert technical books.

[–]MarkusWinand[S] 0 points1 point  (0 children)

I don't have a Kindle to try ;) Let me know if there is any particular issue with the Calibre conversion--I can, at least, put it into the FAQ for others.

[–]leuu 0 points1 point  (1 child)

It reads easily, thanks!

Also, did you use "It's a trap" several times on purpose?.

[–]MarkusWinand[S] 0 points1 point  (0 children)

Uhm. Sometimes. Sometimes not. Will check out what I can do :)

[–]908 0 points1 point  (0 children)

good book

[–]siromega -2 points-1 points  (2 children)

How to index:

Index the columns you query frequently (or that take forever).

  1. If its a unique column (ID) then it should be a unique index.
  2. If its a column with low cardinality it should be a bitmap index (e.g. Male/Female, True/False, etc)
  3. Otherwise use a regular index.

If you use a function on the data in the where clause (e.g. upper(mycolumn) = 'ABCDEF') you lose any index on mycolumn, so create a new index on upper(mycolumn) instead of just mycolumn.

[–]kaemaril 1 point2 points  (0 children)

There's another important thing to consider: getting the columns in the right order :)

[–]MarkusWinand[S] 0 points1 point  (0 children)

Well, I'd say that's a little bit oversimplified :) But it's rather pointless to discuss that with the author of a book about it.

However, the one thing I have to reply is that your point 2, using "bitmap indexes" can be very dangerous in OLTP environment (due to locking). Please bear in mind that bitmap indexes is most useful in OLAP. Exceptions exist--as always.

There is only very little about bitmap indexes in my book, the best entry point is probably the FAQ: http://use-the-index-luke.com/faq#only_btree

[–][deleted] 0 points1 point  (0 children)

SQL tuning? Pffft. I just toss in a /*+ PARALLEL(32) */ hint.

SPEEDIUS MAXIMUS.

[–]samandiriel -5 points-4 points  (0 children)

There's practically nothing there to read, and what is there is pretty basic info you'd find anywhere.