TechNet
Products
IT Resources
Downloads
Training
Support
Products
Windows
Windows Server
System Center
Microsoft Edge
Office
Office 365
Exchange Server
SQL Server
SharePoint Products
Skype for Business
See all products »
Resources
Channel 9 Video
Evaluation Center
Learning Resources
Microsoft Tech Companion App
Microsoft Technical Communities
Microsoft Virtual Academy
Script Center
Server and Tools Blogs
TechNet Blogs
TechNet Flash Newsletter
TechNet Gallery
TechNet Library
TechNet Magazine
TechNet Wiki
Windows Sysinternals
Virtual Labs
Solutions
Networking
Cloud and Datacenter
Security
Virtualization
Updates
Service Packs
Security Bulletins
Windows Update
Trials
Windows Server 2016
System Center 2016
Windows 10 Enterprise
SQL Server 2016
See all trials »
Related Sites
Microsoft Download Center
Microsoft Evaluation Center
Drivers
Windows Sysinternals
TechNet Gallery
Training
Expert-led, virtual classes
Training Catalog
Class Locator
Microsoft Virtual Academy
Free Windows Server 2012 courses
Free Windows 8 courses
SQL Server training
Microsoft Official Courses On-Demand
Certifications
Certification overview
Special offers
MCSE Cloud Platform and Infrastructure
MCSE: Mobility
MCSE: Data Management and Analytics
MCSE Productivity
Other resources
Microsoft Events
Exam Replay
Born To Learn blog
Find technical communities in your area
Azure training
Official Practice Tests
Support options
For business
For developers
For IT professionals
For technical support
Support offerings
More support
Microsoft Premier Online
TechNet Forums
MSDN Forums
Security Bulletins & Advisories
Not an IT pro?
Microsoft Customer Support
Microsoft Community Forums
Sign in
Home
Library
Wiki
Learn
Gallery
Downloads
Support
Forums
Blogs
Resources For IT Professionals
United States (English)
Россия (Pусский)
中国(简体中文)
Brasil (Português)
Skip to locale bar
Editing: SQL Server Columnstore Performance Tuning
Wiki
>
TechNet Articles
>
SQL Server Columnstore Performance Tuning
Article
History
Title
<html> <body> <p><span style="font-family:azby">[toc]</span><a name="Introduction"></a></p> <h1><a name="Introduction"></a>Introduction</h1> <br> SQL Server columnstore indexes are new in the SQL Server 2012 release. They are designed to improve query performance for data warehouses and data marts. This page describes query performance tuning for columnstores. <br> <h1><a name="Fundamentals_of_Columnstore_Index-Based_Performance"></a>Fundamentals of Columnstore Index-Based Performance </h1> Columnstore indexes can speed up some queries by a factor of 10X to 100X on the same hardware depending on the query and data. These key things make columnstore-based query processing so fast: <ul> <li><strong>The columnstore index itself stores data in highly compressed format, with each column kept in a separate group of pages.</strong> This reduces I/O a lot for most data warehouse queries because many data warehouse fact tables contain 30 or more columns, while a typical query might touch only 5 or 6 columns. Only the columns touched by the query must be read from disk. Only the more frequently accessed columns have to take up space in main memory. The clustered B-tree or heap containing the primary copy of the data is normally used only to build the columnstore, and will typically not be accessed for the large majority of query processing. It'll be paged out of memory and won't take main memory resources during normal periods of query processing. </li><li> <strong>There is a highly efficient, vector-based query execution method called "batch processing" that works with the columnstore index.</strong> A "batch" is an object that contains about 1000 rows. Each column within the batch is represented internally as a vector. Batch processing can reduce CPU consumption 7X to 40X compared to the older, row-based query execution methods. Efficient vector-based algorithms allow this by dramatically reducing the CPU overhead of basic filter, expression evaluation, projection, and join operations. </li><li><strong>Segment elimination can skip large chunks of data to speed up scans.</strong> Each partition in a columnstore indexes is broken into one million row chunks called segments. Each segment has metadata that stores the minimum and maximum value of each column for the segment. The storage engine checks filter conditions against the metadata. If it can detect that no rows will qualify then it skips the entire segment without even reading it from disk. </li><li><strong>The storage engine pushes filters down into the scans of data.</strong> This eliminates data early during query execution, improving query response time. </li></ul> <p>The columnstore index and batch query execution mode are deeply integrated into SQL Server. A particular query can be processed entirely in batch mode, entirely in the standard row mode, or with a combination of batch and row-based processing. <strong><em>The key to getting the best performance is to make sure your queries process the large majority of data in batch mode. </em></strong>Even if the bulk of your query can't be executed in batch mode, you can still get significant performance benefits from columnstore indexes through reduced I/O, and through pushing down of predicates to the storage engine.<br> <br> To tell if the main part of your query is running in batch mode, look at the graphical showplan, hover the mouse pointer over the most expensive scan operator (usually a scan of a large fact table) and check the tooltip. It will say whether the estimated and actual execution mode was Row or Batch. See <a href="http://social.technet.microsoft.com/wiki/cfs-filesystemfile.ashx/__key/communityserver-wikis-components-files/00-00-00-00-05/3782.Untitled.jpg"> here </a>for an example.<br> <strong><em></em></strong></p> <h1><a name="Do_s_and_Don_ts_for_using_Columnstores"></a>DOs and DON'Ts for using Columnstores Effectively</h1> <p>Obeying the following do's and don'ts will help you get the most out of columnstores for your decision support workload.<br> <br> </p> <p><strong><span style="font-size:14px">DOs</span></strong></p> <ul> <li><strong>Put columnstore indexes on large tables only.</strong> Typically, you will put them on your fact tables in your data warehouse, but <em>not</em> the dimension tables. If you have a large dimension table, containing more than a few million rows, then you may want to put a columnstore index on it as well. </li><li><strong>Include every column of the table in the columnstore index.</strong> If you don't, then a query that references a column not included in the index will not benefit from the columnstores index much or at all. </li><li><strong>Structure your queries as star joins with grouping and aggregation as much as possible</strong>. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way. </li><li><strong>Use best practices for statistics management and query design.</strong> This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on <a href="http://msdn.microsoft.com/en-us/library/dd535534(v=SQL.100).aspx">SQL Server statistics </a>for guidance. In particular, see the section "Best Practices for Managing Statistics." </li></ul> <p><strong><span style="font-size:14px">DON'Ts<br> </span></strong>(Note: we are already working to improve the implementation to eliminate limitations associated with these "don'ts" and we anticipate fixing them sometime after the SQL Server 2012 release. We're not ready to announce a timetable yet.) Later, we'll describe how to work around the limitations.</p> <ul> <li><strong>Avoid joins and string filters directly on columns of columnstore-indexed tables.</strong> String filters don't get pushed down into scans on columnstore indexes, and join processing on strings is less efficient than on integers. Filters on number and date types are pushed down. Consider using integer codes (or surrogate keys) instead of strings in columnstore indexed fact tables. You can move the string values to a dimension table. Joins on the integer columns normally will be processed very efficiently. </li><li><strong>Avoid use of OUTER JOIN on columnstore-indexed tables</strong>. Outer joins don't benefit from batch processing. Instead, SQL Server 2012 reverts to row-at-a-time processing. </li><li><strong>Avoid use of NOT IN on columnstore-indexed tables.</strong> NOT IN (<subquery>) (which internally uses an operator called "anti-semi-join") can prevent batch processing and cause the system to revert to row mode. NOT IN (<list of constants>) typically works fine though. </li><li><strong>Avoid use of UNION ALL to directly combine columnstore-indexed tables with other tables.</strong> Batch processing doesn't get pushed down over UNION ALL. So, for example, creating a view vFact that does a UNION ALL of two tables, one with a columnstore indexes and one without, and then querying vFact in a star join query, will not use batch processing. </li></ul> <p> </p> <h1><a name="Working_Around_Columnstore_Limitations_in_Denali"></a>Maximizing Performance and Working Around Columnstore Limitations</h1> <p>Follow the links to the topics listed below about how to maximize performance with columnstores indexes, and work around their functional and performance limitations in SQL Server 2012.</p> <h2><a name="Ensuring_use_of_the_Fast_Batch_Mode_of_Query_Execution"></a>Ensuring Use of the Fast Batch Mode of Query Execution </h2> <ul> <li><a href="http://social.technet.microsoft.com/wiki/contents/articles/8534.parallelism-dop-2-is-required-to-get-batch-processing.aspx">Parallelism (DOP >= 2) is Required to Get Batch Processing</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/use-outer-join-with-columnstores-and-still-get-the-benefit-of-batch-processing.aspx">Use Outer Join and Still Get the Benefit of Batch Processing</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/work-around-inability-to-get-batch-processing-with-in-and-exists.aspx">Work Around Inability to get Batch Processing with IN and EXISTS</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/perform-not-in-and-still-get-the-benefit-of-batch-processing.aspx">Perform NOT IN and Still Get the Benefit of Batch Processing</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/perform-union-all-and-still-get-the-benefit-of-batch-processing.aspx">Perform UNION ALL and Still Get the Benefit of Batch Processing</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/perform-scalar-aggregates-and-still-get-the-benefit-of-batch-processing.aspx">Perform Scalar Aggregates and Still get the Benefit of Batch Processing</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/maintaining-batch-processing-with-multiple-aggregates-including-one-or-more-distinct-aggregates.aspx">Maintaining Batch Processing with Multiple Aggregates Including one or More DISTINCT Aggregates</a> </li><li><a href="http://www.sqlserverfaq.net/2012/07/13/performance-tuning-with-columnstore-index-using-batch-processing/">Using HASH JOIN hint to avoid nested loop join and force batch processing</a> </li></ul> <h2><a name="Physical_Database_Design_Loading_and_Index_Management"></a>Physical Database Design, Loading, and Index Management</h2> <ul> <li><span style="color:rgb(0,102,221)"><a href="http://social.technet.microsoft.com/wiki/contents/articles/11804.using-a-drop-and-rebuild-approach-with-columnstores-for-a-nightly-load-scenario.aspx"><span style="color:rgb(0,102,221)">Adding Data Using a Drop-and-Rebuild Approach</span></a></span> </li><li><span style="color:rgb(0,102,221)"><a href="http://social.technet.microsoft.com/wiki/contents/articles/5069.add-data-to-a-table-with-a-columnstore-index-using-partition-switching.aspx">Adding Data Using Partition Switching</a></span> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/trickle-loading-with-columnstore-indexes.aspx"><span style="color:rgb(0,102,221)">Trickle Loading with Columnstore Indexes</span></a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/5303.aspx"><span style="color:rgb(0,102,221)">Avoid Using Nonclustered B-tree Indexes</span></a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/changing-your-application-to-eliminate-unsupported-data-types.aspx"><span style="color:rgb(0,102,221)">Changing Your Application to Eliminate Unsupported Data Types</span></a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/achieving-fast-parallel-columnstore-index-builds.aspx"><span style="color:rgb(0,102,221)">Achieving Fast Parallel Columnstore Index Builds</span></a> </li></ul> <h2><a name="Maximizing_the_Benefits_of_Segment_Elimination"></a>Maximizing the Benefits of Segment Elimination</h2> <ul> <li><a href="http://social.technet.microsoft.com/wiki/contents/articles/understanding-segment-elimination.aspx">Understanding Segment Elimination</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/verifying-columnstore-segment-elimination.aspx">Verifying Columnstore Segment Elimination</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/ensuring-your-data-is-sorted-or-nearly-sorted-by-date-to-benefit-from-date-range-elimination.aspx">Ensuring Your Data is Sorted or Nearly Sorted by Date to Benefit from Date Range Elimination</a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/5823.aspx">Multi-Dimensional Clustering to Maximize the Benefit of Segment Elimination</a> </li></ul> <h2><a name="Additional_Tuning_Considerations"></a>Additional Tuning Considerations </h2> <ul> <li><a href="http://social.technet.microsoft.com/wiki/contents/articles/work-around-performance-issues-for-columnstores-related-to-strings.aspx"><span style="color:rgb(0,102,221)">Work Around Performance Issues for Columnstores Related to Strings</span></a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/5304.aspx"><span style="color:rgb(0,102,221)">Force Use or Non-Use of a Columnstore Index</span></a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/5939.aspx"><span style="color:rgb(0,102,221)">Workarounds for Predicates that Don't Get Pushed Down to Columnstore Scan (Including OR)</span></a> </li><li><a href="http://social.technet.microsoft.com/wiki/contents/articles/7404.using-statistics-with-columnstore-indexes.aspx">Using Statistics with Columnstore Indexes </a> </li></ul> <p> </p> <p><a href="http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx">Go to Columnstore FAQ</a>.</p> <p> </p> </body> </html>
Comment
Tags
Please add 6 and 7 and type the answer here: