As part of our work on a current project, there was a request to investigate some slow running queries in a database.  It is a common practice in the database world to normalize the data to some degree.  Essentially, this is a process to reduce duplication by spreading data across multiple tables.  For example:

Customer

ID Last_Name First_Name Street_Address City
1 Smith Bob 123 Main St. Toronto
2 Smith Jane 123 Main St. Toronto
3 Smith Peter 123 Main St. Toronto

The above table could instead be normalized into two tables, Customer and Address:

Customer

ID Last_Name First_Name Address_ID
1 Smith Bob 1
2 Smith Jane 1
3 Smith Peter 1

Address

ID Street _Address City
1 123 Main St. Toronto

This has some definite advantages, such as:

  • Saves a lot of space in the database
  • Can help with data integrity
    • Reduced risk of inconsistencies such as Bob Smith being 123 Main St. and Jane Smith being 123 Main Street
  • For some queries it can even improve performance
    • Searching for Customers with Address_ID = 1 is faster than the String compare that would be required for Street_Adress =  ‘123 Main St.’

The most notable disadvantage however, is that for the majority of read queries, performance tends to take a small hit (since the query will have to JOIN the two tables together).  In heavily normalized databases the performance impact can become excessive.

In the database we were investigating, the data is nearly maximally normalized which leads to very poor performance for certain operations; one extreme example being report generation (a process frequently impacted by data normalization).

For databases that follow this extensive normalization model, it is critical to look for opportunities to optimize performance.  One interesting optimization technique is that of an Indexed View (in other databases these are often referred to as a Materialized Views).

Automatic Optimization

While working to improve the performance of a particular query (for generating a specific report – “Report A”), a very interesting discovery was made.  The query began by joining a pair of tables into a single temporary table, and then accessing that table numerous times.  The decision was made to remove the temporary table and replace it with an Indexed View to improve performance.  As expected, there was a small improvement (about 5%).

The interesting part is that other report queries were suddenly also more performant!

Query

Original SQL Duration

New SQL Duration

Improvement

Report B

62.33 sec.

51.73 sec.

17.01 %

Report C

19.20 sec.

10.11 sec.

47.34 %

Report D

30.94 sec.

12.48 sec.

59.66 %

Apparently, SQL Server is intelligent enough to detect when any query would benefit from using an existing Indexed View, and will use it automatically.  According to Microsoft Technet (https://technet.microsoft.com/en-us/library/ms181151(v=sql.105).aspx):

A query does not have to explicitly reference an indexed view in the FROM clause for the query optimizer to use the indexed view. If the query contains references to columns in the base tables that are also present in the indexed view, and the query optimizer estimates that using the indexed view provides the lowest cost access mechanism, the query optimizer chooses the indexed view, similar to the way it chooses base table indexes when they are not directly referenced in a query.

Even if the Indexed View contains additional columns, if the SQL Server optimizer determines that using the Indexed View will give better performance, it will use it.

As a final note, there will be times when the SQL Server optimizer makes a poor choice (such as using an Indexed View that it shouldn’t use).  In these instances, it will be important for developers (or DBA’s) to forcibly override the optimizer’s decisions.  This can be done with the EXPAND VIEWS query hint (to prevent an Indexed View from being used).  As with other potential optimizations (such as creating column indexes), it is never a black or white decision.  In the case of indexed views, performance of read operations will generally improve as there are fewer joins on the database.  The downside is that write operations become more expensive as the data needs to be updated in both the initial table as well as the indexed view.  So developers should always take time to run analysis (using a query analyzer / plan optimizer) on the schema with representative data to help determine if an indexed view (or any other potential optimization) makes sense for their particular scenario.