Table of Contents IntroductionFundamentals of Columnstore Index-Based PerformanceDOs and DON'Ts for using Columnstores EffectivelyMaximizing Performance and Working Around Columnstore Limitations Ensuring Use of the Fast Batch Mode of Query ExecutionPhysical Database Design, Loading, and Index ManagementMaximizing the Benefits of Segment EliminationAdditional Tuning Considerations
The columnstore index and batch query execution mode are deeply integrated into SQL Server. A particular query can be processed entirely in batch mode, entirely in the standard row mode, or with a combination of batch and row-based processing. The key to getting the best performance is to make sure your queries process the large majority of data in batch mode. Even if the bulk of your query can't be executed in batch mode, you can still get significant performance benefits from columnstore indexes through reduced I/O, and through pushing down of predicates to the storage engine. To tell if the main part of your query is running in batch mode, look at the graphical showplan, hover the mouse pointer over the most expensive scan operator (usually a scan of a large fact table) and check the tooltip. It will say whether the estimated and actual execution mode was Row or Batch. See here for an example.
Obeying the following do's and don'ts will help you get the most out of columnstores for your decision support workload.
DOs
DON'Ts (Note: we are already working to improve the implementation to eliminate limitations associated with these "don'ts" and we anticipate fixing them sometime after the SQL Server 2012 release. We're not ready to announce a timetable yet.) Later, we'll describe how to work around the limitations.
Follow the links to the topics listed below about how to maximize performance with columnstores indexes, and work around their functional and performance limitations in SQL Server 2012.
Go to Columnstore FAQ.
Please leave a comment if you want to vote for us to complete links to a specific new item first, or if you have a question about a workaround or best practice. We'll use that to prioritize what we add to this.
Thanks for this blog post - very helpful.
Thanks for very useful post.
Thanks very much for these posts, but even more for delivering the ColumnStore feature.
I'd like to see the features implemented in the following order:
(1) batch processing support in stream aggregation operators: typically when not applying a GROUP BY and aggregating over a large fact table
(2) push down of textual filters into the storage engine: imagine a dimension table with tens of millions of product names where a user is searching for a particular partial name
(3) support for combined filters in the storage engine: a query rewrite is much easier for the SQL team to perform than for us to change every implementation for every use of OR; it's exactly like any other query rewrite optimisation
Note as well that in our tests, using a ColumnStore index on a dimension table can provide very significant performance improvements on some fact table queries utilising joins and filters.
Great article. I love how you keep your explanations very simple yet pretty in-depth.
I'd recommend this article to everyone who is just starting to use columnstore indexes (or those who has used them for a while but didn't research the pitfalls).
@ash8
I totally agree. I'm just new to Columnstore indexes and this has been very helpful.. especially the DOs and DONTs.
Thanks, Eric!