I hate waiting. I hate sitting there watching the timer spin. Knowing the data is flying in the background is no comfort to me as I wait for a chart to finish loading. No matter how much data is moving through the pipes, when I see that timer start to spin a part of me wants to scream. FASTER! I want it NOW!
One way to make a database faster is to give it some Indexes to work with. In this post I’ll describe the basic types of indexes used in Microsoft SQL Server and how they can help us get to our data faster.
This topic leans toward the technical, so let’s start with the summary (I was always a dessert first kind of guy) and then break it down from there.
Unordered (unsorted) date with no index is called ‘the heap’. Nonclustered indexes are sorted lists of references to unsorted data (‘the heap’) OR sorted data (the clustered index). The clustered index contains sorted data.
“Unordered (unsorted) date with no index is called ‘the heap’”.
Imagine a phone book. First take a razor and cut out every entry in the phone book. Next drop all those slips of paper into a big box and mix them all around real good. This big pile of unordered slips of paper looks a lot like how SQL Server stores data with no clustered index. SQL Server calls this mess ‘the heap’. Searching a table with no index is like reaching into that big box and pulling out handfuls of paper. You look at each slip of paper and if it’s not the one you want you put it aside. You keep going until you find the one combination of LastName, FirstName, Address and PhoneNumber you’re interested in. SQL Server is really, really fast … but no matter how fast the database, it’s not going to do well without an index.
“Nonclustered indexes are sorted lists of references to unsorted data (‘the heap’) OR sorted data (the clustered index).”
Now imagine we take that big box of unordered slips of paper and organize it into small boxes. Each small box contains one random handful of slips cut from our phone book. We give each box a label so we can find it again. This time instead of a phone book you’re given two small books containing references to the labels on the boxes we just created. One book is a list of first names in alphabetical order. For each first name you find references to the boxes where the phone numbers for people with that first name can be found. The second book is an ordered list of every town in your area. For each town there is a list of street names. For each street name you find references to the boxes where the phone numbers for people on that street can be found. These books are like nonclustered indexes. Nonclustered indexes are ordered lists (using one or more fields) that can help with searches. In this example we are using nonclustered indexes to speed up searches of data found in ‘the heap’. SQL Server queries the much smaller (compared to ‘the heap’) nonclustered index and is given a reference to get to the data that it needs to answer the query. In SQL Server we can have as many nonclustered indexes as we want. In a future blog post I’ll go into much greater detail about how nonclustered indexes help (and sometimes hurt) us when we are optimizing database performance.
“The clustered index contains sorted data”.
The last, and most useful index is the clustered index. In this example we imagine that we’ve created a clustered index with the combination of LastName, FirstName, Address and PhoneNumber . We can picture the phone book we started with as the index that SQL Server would create for us. When we find a record in a clustered index, we need search no more. There is no reference to another location where the data is stored, SQL Server stores the data associated with a clustered index IN the index. In other words, the clustered index reorganizes ‘the heap’ into a sorted list. One drawback is that we can have only one clustered index per table. This makes sense when you think about it, remember … a clustered index is a sorted list. You can only sort a list one way (by one field or one specific sequence of fields). Clustered indexes are great for performance.
This has been a look at clustered indexes, nonclustered indexes and ‘the heap’. In future blog posts I’ll talk more about how these indexes can be tuned to maximize performance.