Indexing, Statistics, Query Analyzer.
Microsoft’s flagship database product, the SQL Server, comes in a bewildering array of sizes and prices which fit almost every need. There are free editions for Pocket PC handheld computers, the MSDE is an only slightly less functional, but free version of its bigger brothers. The standard, enterprise and data center editions fit the needs of small to very large corporations. Writing applications which run on the SQL server is not just a matter of making tables and normalizing your data, and then issuing queries against it. The application developer should also be spending time with the database administrator (DBA) planning for performance. If the DBA is not currently around, this series of articles provides a primer for application developers seeking to improve application performance by tweaking the database.
Performance is not simply a matter of having a good database on a fast system. These factors certainly help, and are occasionally decisive, but having a poor database design (not normalized) , poor data access strategy (a high number of unnecessary reads / writes or editing too much data) are the number one performance killers.
This article will use a sample database, pubs, to illustrate the techniques used. This database is based is delivered with every SQL Server installation, and can be found in the \install subdirectory in your SQL Server folder.
What are indexes?
This is a fairly straightforward question, the answer is also simple. Table indexes in SQL server allow the query engine to find the data row more efficiently. Indexes can be stored separately from the data, allowing you to put the index on a separate disk, or disk array. This obviously improves performance, one disk is busy reading and writing to the index whilst the other is busy fetching the data.
Clustered Indexes
SQL Server supports three types of index, Clustered, Non-Clustered and Full-Text indexes. Full-Text indexes are a very special category and are not discussed in this article. Each table can contain at most one Clustered index. Each table in your database really should have a clustered index. Each data row is referenced in the index and is then stored in a sorted fashion. Without a clustered index, the data rows are unsorted, the table is then known as “heap”, which is true in more senses than one! A good rule of thumb is to include only the row identifying column in your clustered index, this column contains unique values such as an identity value or GUID for easier identification of each row. If you mainly use a person’s social security ID for locating the corresponding data row, it would be best make a clustered index on that row instead.
Non-Clustered Indexes
Each table can have multiple Non-Clustered indexes. Non-Clustered indexes always use a clustered index, if available, for locating the data rows. If the table doesn't have a clustered index, the row locator points directly to the location of the row in the heap.
Indexing tables incurs a cost. There is nothing like having indexes for speedy data retrieval, but they bring a performance penalty on tables which undergo large numbers of insert, delete or update operations. Each row modification requires that the index be updated, the entire affected branch of the index structure may need to be re-sorted. Indexes are stored as a b-tree, or balanced tree. This principle tries to reduce the number of index lookups necessary to find a particular row by balancing each side of a branch to approximately equal sizes.
Covering Indexes
When every column (known as attributes) of a table is referenced by an index, this index is said to be a covering index. Covering indexes often provide the best speed for long, thin tables (many rows and few columns) because the query engine need not fetch any rows from the heap, all of the data can be retrieved from the index. Composite indexes contain two or more attributes from a table, when the composite index references all of the tables attributes it is said to be a covering index. Don’t forget, these super-indexes come with a performance penalty when modifying the data. If you have limited storage space, then the extra space needed by your indexes could be an issue. This should never be an issue, since even high performance disks are relatively inexpensive. (Approx $ 300.00 to $500.00 for 70GB SCA disks at the time of writing).
Covering indexes have an added benefit if you have a transaction based system. The rows referenced by the index’s key – range (the data rows referenced by the index) are locked when you do a data modification within a transaction. This is beneficial, since the database no longer needs to lock the entire table. Key – Range locking is an advanced topic useful for very few.
Indexed Views
A new feature of the SQL Server 2000 Enterprise and Developer editions is the indexed view. Indexed views neatly sidestep the issue of performance degradation when indexing heavily modified data. Instead of indexing a table, you can create a view on the table so that the data is accessed in exactly the same fashion. The indexes are placed on the view, and not on the table, meaning that the indexes do not need to be updated every time the data is modified.
You need not even modify your existing SQL. If the query optimizer sees that you are accessing data from a table, when it would be faster to access the same data from an indexed view, it will automatically upgrade the query to utilize the indexes available in the view.
Many highly normalized databases have joins spanning several tables, in order to effectively store, for example, all of the manufacturers names from every product ever bought by a particular customer. Storing this long-winded join syntax as an indexed query improves performance and makes the SQL statements more intuitive when being read. Because indexed views are so reliant on the underlying base tables, they must be schema bound to those tables when being created. You can no longer alter those tables without first dropping the indexed view.
Managing Indexes
The general syntax for creating indexes is
Create
Index [Unique] [Clustered | Nonclustered]
Index_name
ON
Table_name (Column_name1 [ASC
| DESC], Column_name2…)
[ON File_Group]
In the “pubs” database sample shipped with the SQL Server, the table “Authors” has an index which is created with the following script:
CREATE INDEX [aunmind] ON [dbo].[authors]([au_lname], [au_fname]) ON [PRIMARY]
Meaning that an index named “aunmid” (alpha numeric ID) was created is to be created on the authors table, containing both columns “au_lname” and “au_fname”, the first and last name of an author. This index is not unique and is nonclustered, it resides on the “PRIMARY” filegroup, which is the default.
To change an existing index, we can modify the create statement accordingly:
CREATE INDEX [aunmind] ON [dbo].[authors]([au_lname], [au_fname]) With DROP_EXISTING ON [PRIMARY]
There are several further options which can be specified, here some of the more interesting ones:
FILLFACTOR:
The fill factor is a percentage value describing how much free space should be left in the pages of the index’s leaves. Each page is 8 KB in size. Setting a fill factor of 30% will fill only 30% of each page with key values, allowing enough room for new entries to be stored in the same page. If more entries are needed than is room in the index page, a new page is created somewhere on the file system. Specifying a high fill factor (the default is 90%) on an index which will undergo many changes due to row modifications will lead to a gradual but steady fragmentation of the index, meaning slower I/O operations and degraded performance.
If you want to see information about the current fragmentation of an index, you can use the DBCC command:
DBCC SHOWCONTIG (authors, aunmind)
Which gives me the following result:
DBCC SHOWCONTIG scanning 'authors' table...
Table: 'authors' (1977058079); index ID: 2, database ID: 5
LEAF level scan performed.
- Pages Scanned................................: 1
- Extents Scanned..............................: 1
- Extent Switches..............................: 0
- Avg. Pages per Extent........................: 1.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1:1]
- Logical Scan Fragmentation ..................:
0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 7394.0
- Avg. Page Density (full).....................: 8.65%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.
The output shows that the “Logical Scan Fragmentation” is 0%, and the number of extent switches is even less than the number of extents scanned, that is the index is contained in one page, there was no need to switch to another page. When the number of extent switches is significantly higher than the number of extents, your index is fragmented. Fragmented indexes can be defragmented easily by rebuilding the clustered index, if it exists, or be rebuilding each index individually. The following statement rebuilds the index, defragmenting it on the fly and without taking the database offline:
DBCC DBREINDEX (authors, aunmind)
It can take many parameters, here it is given the name of the table and the name of the index to rebuild.
Removing indexes is performed using the DROP INDEX syntax. To remove the above index we could use:
IF EXISTS (SELECT name FROM sysindexes
WHERE name = 'au_lname’)
DROP INDEX authors.au_lname
syntax. Dropping the clustered index will then lead to the nonclustered being rebuilt. The row locators in the nonclustered indexes can no longer point to the clustered index, but must physically point to a data row on the heap.
Dropping unique indexes is more involved. First, the constraints must be dropped from the table using an ALTER TABLE DROP CONSTRAINT command, and then a DROP INDEX command.
Managing Statistics
There is more to indexes than meets the eye. By default, the SQL Server keeps statistics on the distribution of data ranges in all columns whenever you have placed that column in an index. Statistics can also be maintained on columns which are not in an index. Statistics are used by the query optimizer to help decide how bet to search for particular data. If the data in a column covers a wide range of possibilities is evenly distributed, the query optimizer may well decide to use the index in fetching rows. It is important to maintain up to date statistics to prevent the query optimizer from making false assumptions about the distribution of data.
Statistics can be quickly and painlessly made for each column in every table in the current database by executing the following T-SQL command:
EXEC sp_createstats 'fullscan'
The system stored procedure “sp_createstats” gives detailed information about what work it has undertaken. It is very easy to see exactly what statistics are available. The following DBCC command will show the what is available for the “aunmind” index on the “authors” table:
DBCC SHOW_STATISTICS (authors, aunmind)
This gives detailed information in three resultsets. Consult the “Books Online” help “DBCC SHOW_STATISTICS” topic for more information. There are several other useful T-SQL commands for use in managing statistics: “CREATE STATISTICS“, “DROP STATISTICS” and “UPDATE STATISTICS” to name just three. These commands allow you to very fine tune exactly what is to be done.
Indexing Strategies
If you are lucky enough to be designing a database from scratch, you need the following information before deciding on an indexing strategy for each table:
- Does the table contain a column uniquely identifying the row?
This is a primary candidate for a clustered index. - Will the data be undergoing large numbers of modifications compared to
reads? (a ratio of 1:2 or higher)
Be careful of performance, add few indexes. - Will the table become very large? (>100.000 rows)
Be careful of the indexes disk usage. Index all fields used in where or sort clauses. - Do most queries sort on a particular field?
Index this field! Sort the index the way your sorts expect the data. - Do most queries filter on a particular field?
Index this field! If the query selects a few columns, create a composite index and add the selected fields to the index. - Does the table contain mainly natural language text?
Consider a full-text index.
Even the best laid plans can go astray, indexing is not a one shot approach. Always keep in mind that the effectiveness of your indexes changes according to how you access the data and how the data changes over time. Be prepared to tune your indexes periodically.
Index Gotchas
There is much more to learn about indexes, an invaluable reference is the Books Online suite of help files delivered with the product. So that you don’t need to go through a steep learning curve, here are a few common gotchas:
Indexing a “Bit” field |
This is pointless. Bit fields can contain either a 0 or a 1. The index can gain no advantage by having only one alternative when searching. Do not index bit fields, unless as part of a compound index. |
Varchar fields and “Like %this” queries. |
If you use the LIKE operator when searching on Varchar fields as shown with the wildcard, the index can not be used. Avoid searching like this. |
Requesting too many fields |
The index optimizer analyzes just how many records are expected, and how many fields will be returned before executing the best query plan. Requesting unused data will lead to a sub optimal query plan being used. |
Indexing uniform columns |
Just like indexing a bit field, if you index a column which contains a very limited range of data, the index is useless. If the database can partition 100,000 into either “Smith, Jones or Hughes” it will still need to scan too many rows to find the one you want. |
| Indexing small tables | Sometimes indexes can be slower than table scans! Very small tables with few rows are sometimes quicker when scanned than when index searched. |
Analyzing the effectiveness of an index.
With the knowledge about what index options are available, and after having initially indexed your tables, how do you go about assessing the effectiveness of your index strategy?
There are basically two methods, a manual assessment of the query optimizer’s behaviour using the SQL Analyzer tool shipped with the SQL Server, or the semi-automatic approach using the Index Tuning Wizard.
Index Tuning Wizard
The “Index Tuning Wizard”, or ITW, is a GUI based tool for analyzing the effectiveness of your current indexes, based on a workload. The workload can be created manually or recorded, using the SQL Profile, another GUI oriented tool. The SQL Profiler can listen to the current to the commands currently being executed against the database, and store these to file or to a new table in the database.
The index tuning wizard then runs the workload, monitoring the usage and effectiveness of indexes. Finally, the index tuning wizard recommends alterations and can even carry these out for you. These tools are very advanced and very effective, in practice they are favourable to using manual analysis methods due to the time savings which can be achieved from not needing to manually assess the effectiveness of your queries.
SQL Query Analyzer
The SQL Query Analyzer is another graphical tool shipped with the SQL Server. Alongside it’s basic functionality of allowing you to run T-SQL commands against the database server, it aids us by analyzing, in a graphical form, exactly how those T-SQL statements are performed. To get this information, choose “Query / Show Execution Plan” from the menu or hit CTRL + K before running (with F5) the current commands shown in the SQL pane.
Running the following SQL Statement shows useful information:
Select * from authors A
inner join titleauthor TA
on A.au_id = TA.au_id
inner join titles T on TA.title_id = T.title_id
Which basically shows all books written by all authors.
Notably, indexes are used to satisfy this query, and no bookmark lookups are necessary to fetch the data from the heap. Table scans and bookmark lookups are signs of possible areas for index improvement.
If we drop the Primary Key on the author’s table, and rerun the following query:
ALTER TABLE [dbo].[titleauthor] DROP CONSTRAINT
[FK__titleauth__au_id__0519C6AF]
GO
ALTER TABLE [dbo].[authors] DROP CONSTRAINT [UPKCL_auidind]
GO
Select * from authors A
inner join titleauthor TA
on A.au_id = TA.au_id
inner join titles T on TA.title_id = T.title_id
then the output from the query analyzer shows that because there is no longer an index on author.au_id, the optimizer must scan all rows in the table to find matching values:
As you can see, analyzing query performance using the SQL Query Analyzer provides detailed information. Even more details are given if you move the mouse pointer over one of the tables. This tool is highly useful for troubleshooting individual queries, but a little too much overhead for anal zing the performance of an entire application.
Summary
This article has given a detailed description of how indexes can be managed, with the intention of improving the performance of your SQL Queries.
Indexes speed database performance, but come at a cost as they need to be kept up to date by the server. Keeping the statistics information about the distribution of data in your database is essential for ensuring that your indexes can be used effectively.
In summary, tools and techniques have been demonstrated for managing indexes and statistics, and finally, how to analyzes the performance of your queries and the usage of your indexes has been shown.
Some fake listings to confuse certain spiders follow the real address:
Indexing_article _- A T -_ richard-quinn.com
q.unknown@765123wiueryewr.com
p.unknown@765123wiueryewr.org
a.unknown@765123wiueryewr.edu
z.unknown@765123wiueryewr.net
p.unknown@765123wiueryewr.us

This work is licensed under a Creative Commons License.