The SQL Server xVelocity memory-optimized columnstore index stores data by columns instead of by rows, similar to a column-oriented DBMS. The columnstore index speeds up data warehouse query processing in SQL Server 2012 through the current version, in many cases by a factor of 10 to 100. We'll be posting answers to frequently asked questions here. For more information about SQL Server 2012 nonclustered columnstore indexes, please see Columnstore Indexes on MSDN and SQL Server Columnstore Performance Tuning on TechNet. SQL Server 2014 is introducing clustered columnstore indexes. For more information, please see the pre-release topics Create Columnstore Index (Transact-SQL) and Columnstore Indexes.
What is xVelocity? xVelocity is Microsoft SQL Server's family of memory-optimized and in-memory technologies. These are next-generation technologies built for extreme speed on modern hardware systems with large memories and many cores. The first two members of the xVelocity family are the xVelocity In-Memory Analytics Engine (used in PowerPivot and Analysis Services), and the xVelocity Memory-Optimized Columnstore Index (used in the SQL Server database). SQL Server 2012 uses xVelocity to substantially accelerate common data warehouse queries. SQL Server 2012 introduced two new features: a nonclustered columnstore index and a vector-based query execution capability that processes data in units called "batches." Now, SQL Server 2014 is adding clustered columnstore indexes.
What is a column store?
Relational database management systems traditionally store data in row-wise fashion. The values comprising one row are stored contiguously on a page. We sometimes refer to data stored in row-wise fashion as a row store. In a column store, values from a single column (from multiple rows) are stored contiguously, potentially in a compressed form.
What is a columnstore index?
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore. The data is compressed, stored, and managed as a collection of partial columns, called column segments. You can use a columnstore index to answer a query just like data in any other type of index. A columnstore index appears as an index on a table when examining catalog views or the Object Explorer in Management Studio. The query optimizer considers the columnstore index as a data source for accessing data just like it considers other indexes when creating a query plan. What do I have to do to use a columnstore index?
For nonclustered columnstore indexes, all you have to do is create a nonclustered columnstore index on one or more tables in your database. The query optimizer will decide when to use the columnstore index and when to use other types of indexes. The query optimizer will also choose when to use the new batch execution mode and when to use row execution mode. For clustered columnstore indexes, you can first create the table as a heap or clustered index. If your table has indexes, you need to drop all indexes, except for the clustered index. Then, use the CREATE CLUSTERED COLUMNSTORE INDEX statement to convert the existing table to a clustered columnstore index. The clustered columnstore index is the data storage mechanism for the entire table. SQL Server 2014 does not allow any other indexes when the table is stored as a clustered columnstore index.
When can I try columnstore indexes?
Nonclustered columnstore indexes are available in SQL Server 2012. Clustered columnstore indexes are in the preview releases of SQL Server 2014 and will ship in the final release. Are columnstore indexes available in SQL Azure? No, not yet.
[TOP]
How do I create a columnstore index?
You can create a columnstore index by using a slight variation on existing syntax for creating indexes. To create an index named mycolumnstoreindex on a table named mytable with three columns, named col1, col2, and col3, you would use the following syntax:
CREATE NONCLUSTERED COLUMNSTORE INDEX mycolumnstoreindex ON mytable (col1, col2, col3);
To avoid typing the names of all the columns in the table, you can use the Object Explorer in Management Studio to create the index as follows:
Does it matter what order I use when listing the columns in the CREATE INDEX statement?
No. When the columnstore index is created, it uses a proprietary algorithm to organize and compress the data.
Does the columnstore index have a primary key?
No. There is no notion of a primary key for a columnstore index.
How many columns should I put in my columnstore index?
Typically, you will put all the columns in a table in the columnstore index, although it is not necessary to include all the columns. The limit on the number of columns is the same as for other indexes (1024 columns). If you have a column that has a data type that is not supported for columnstore indexes, you must omit that column from the columnstore index.
What data types can be used with columnstore indexes?
A columnstore index can include columns with the following data types: int, big int, small int, tiny int, money, smallmoney, bit, float, real, char(n), varchar(n), nchar(n), nvarchar(n), date, datetime, datetime2, small datetime, time, datetimeoffset with precision <=2, decimal or numeric with precision <= 18.
What data types cannot be used in a columnstore index?
The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sqlvariant, xml.
Can I create a clustered columnstore index?
No. You can only create a columnstore index as a secondary (nonclustered) index. Each columnstore index will also have a base table that is either a heap or a clustered (row store) index.
How long does it take to create a columnstore index? Is creating a columnstore index a parallel operation?
Creating a columnstore index is a parallel operation, subject to the limitations on the number of CPUs available and any restrictions set on MaxDOP. Creating a columnstore index takes on the order of 1.5 times as long as building a B-tree on the same columns.
My MAXDOP is greater than one but the columnstore index was created with DOP = 1. Why it was not created using parallelism? If your table has less than one million rows, SQL Server will use only one thread to create the columnstore index. Creating the index in parallel requires more memory than creating the index serially. If your table has more than one million rows, but SQL Server cannot get a large enough memory grant to create the index using MAXDOP, SQL Server will automatically decrease DOP as needed to fit into the available memory grant. In some cases, DOP must be decreased to one in order to build the index under constrained memory. How much memory is needed to create a columnstore index?
The memory required for creating a columnstore index depends on the number of columns, the number of string columns, the degree of parallelism (DOP), and the characteristics of the data. SQL Server will request a memory grant before trying to create the index. If not enough memory is available to create the index in parallel with the current max DOP, SQL Server will reduce the DOP as needed to get an adequate memory grant. If SQL Server cannot get a memory grant to build the index with DOP = 1, the index creation will fail.
A rule of thumb for estimating the memory grant that will be requested for creating a columnstore index is:
Memory grant request in MB = [(4.2 *Number of columns in the CS index) + 68]*DOP + (Number of string cols * 34)
What can I do if I do not have enough memory to build the columnstore index?
It's possible for creation of a columnstore index to fail either at the very beginning of execution if it can't get the necessary initial memory grant, or later during execution if supplemental grants can't be obtained. If the initial grant fails, you'll see error 8657 or 8658. You may get error 701 or 802 if memory runs out later during execution. If out-of-memory error 8657 or 8658 occur at the beginning of columnstore index creation, first, check your resource governor settings. The default setting for resource governor limits a query in the default pool to 25% of available memory even if the server is otherwise inactive. This is true even if you have not enabled resource governor. Consider changing the resource governor settings to allow the create index statement to access more memory. You can do this using TSQL: ALTER WORKLOAD GROUP [DEFAULT] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT=X) ALTER RESOURCE GOVERNOR RECONFIGURE GO where X is the percent, say 50. If you get error 701 or 802 later during the index build, that means that the initial estimate of memory usage was too low, and additional memory was consumed during index build execution and memory ran out. The only viable way to work around these errors in this case is to explicitly reduce DOP when you create the index, reduce query concurrency, or add more memory. For all these error conditions (701, 802, 8657, and 8658), adding more memory to your system may help. See SQL Server Books Online for ALTER WORKLOAD GROUP for additional information. Another way to deal with out-of-memory conditions during columnstore index build is to vertically partition a wide table into two or more tables so that each table has fewer columns. If a query touches both tables, the table will have to be joined, which will affect query performance. If you use this option, you will want to allocate columns to the different tables carefully so that queries will usually touch only one of the tables. This option would also affect any existing queries and loading scripts. Another option is to omit some columns from the columnstore index. Good candidates are columns that are infrequently touched by queries that require scanning large amounts of data. In some cases, you may not be able to create a columnstore index due to insufficient memory soon after the server starts up, but later on it may work. This is because SQL Server, by default, gradually requests memory from the operating system as it needs it. So it may not have enough memory available to satisfy a large memory grant request soon after startup. If this happens, you can make the system grab more memory by running a query like "select count(*) from t" where t is a large table. Or, you can set both the min server memory and max server memory to the same value using sp_configure, which will force SQL Server to immediately grab the maximum amount of memory it will use from the operating system when it starts up.
Can I create a columnstore index on a compressed table?
Yes. The base table can have PAGE compression, ROW compression, or no compression. The columnstore index will have its own compression, which cannot be specified by the user. I tried to create a columnstore index with SQL Server Management Studio using the Indexes->New Index menu and it timed out after 20 minutes. How can I work around this? Run a CREATE NONCLUSTERED COLUMNSTORE INDEX statement manually in a T-SQL window instead of using the graphical interface. This will avoid the timeout imposed by the Management Studio graphical user interface.
Can I create a filtered columnstore index?
No. A columnstore index must contain data from all the rows in the table.
Can I create a columnstore index on a computed column?
Can I create a columnstore index on a sparse column?
Can I create a columnstore index on an indexed view?
Can I create multiple columnstore indexes?
What are the advantages and disadvantages of row stores and column stores?
When data is stored in column-wise fashion, the data can often be compressed more effectively than when stored in row-wise fashion. Typically there is more redundancy within a column than within a row, which usually means the data can be compressed to a greater degree. When data is more compressed, less IO is required to fetch the data into memory. In addition, a larger fraction of the data can reside in a given size of memory. Reducing IO can significantly speed up query response time. Retaining more of your working set of data in memory will speed up response time for subsequent queries that access the same data.
When data is stored column-wise, it is possible to access the column individually. If a query only references a few of the columns in the table, it is only necessary for a subset of the columns to be fetched from disk into memory. For example, if a query references five columns from a table with 50 columns (i.e. 10% of the columns), IO is reduced by 90% (in addition to any benefits from compression).
On the other hand, storing columns in independent structures means that the data must be recombined to return the data as a row. When a query touches only one (or a few) rows, having all the data for one row stored together can be an advantage if the row can be quickly located with a B-tree index. Row stores may offer better query performance for very selective queries, such as queries that lookup a single row or a small range of rows. Updating data is also simpler in a row store.
What is the difference between a pure column store and a hybrid column store?
SQL Server columnstore indexes are pure column stores. That means that the data is stored and compressed in column-wise fashion and individual columns can be accessed separately from other columns. A hybrid columnstore stores a set of rows together, but within that set of rows, data is organized and compressed in column-wise fashion. A hybrid column store can achieve good compression from a column-wise organization within the set of rows, but when data is fetched from disk, the pages being fetched contain data from all the columns in each row. Even if a query references only 10% of the columns in a table, all the columns must be fetched from disk, and unused columns also take up space in main memory. SQL Server columnstore indexes require less I/O and give better main-memory buffer pool hit rates than a hybrid columnstore.
Is a columnstore index better than a covering index that has exactly the columns I need for a query
The answer depends on the data and the query. Most likely the columnstore index will be compressed more than a covering row store index. If the query is not too selective, so that the query optimizer will choose an index scan and not an index seek, scanning the columnstore index will be faster than scanning the row store covering index. In addition, depending on the nature of the query, you can get batch mode processing when the query uses a columnstore index. Batch mode processing can substantially speed up operations on the data in addition to the speed up from a reduction in IO. If there is no columnstore index used in the query plan, you will not get batch mode processing. On the other hand, if the query is very selective, doing a single lookup, or a few lookups, in a row store covering index might be faster than scanning the columnstore index.
Another advantage of the columnstore index is that you can spend less time designing indexes. A row store index works well when it covers all the columns needed by a query. Changing a query by adding one more column to the select list can render the covering index ineffective. Building one columnstore index on all the columns in the table can be much simpler than designing multiple covering indexes.
Is the columnstore index the same as a set of covering indexes, one for each column?
No. Although the data for individual columns can be accessed independently, the columnstore index is a single object; the data from all the columns is organized and compressed as an entity. While the amount of compression achieved is dependent on the characteristics of the data, a columnstore index will most likely be much more compressed than a set of covering indexes, resulting in less IO to read the data into memory and the opportunity for more of the data to reside in memory across multiple queries. In addition, queries using columnstore indexes can benefit from batch mode processing, whereas a query using covering indexes for each column would not use batch mode processing.
Is columnstore index data still compressed after it is read into memory? Yes. Column segments are compressed on disk and remain compressed when cached in memory. Do columnstore indexes use bitmap indexes?
No. Columnstore indexes use a proprietary data representation based on Vertipaq. It’s not the same as a bitmap index and doesn’t use one. But it has some similar benefits to bitmap indexes, such as reducing the time it takes to filter on a column with a small number of distinct values.
I want to show other people how cool SQL Server columnstore indexes are. What can I show them? OR Where can I find more information (including documents and videos) about SQL Server columnstore indexes? White paper: http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C-3EE521C25CE9/Columnstore%20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf Product documentation: http://msdn.microsoft.com/en-us/library/gg492088(SQL.110).aspx SQL Server Columnstore FAQ: http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx SQL Server Columnstore Performance Tuning Guide: http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-performance-tuning.aspx The Coming In-Memory Tipping Point, by David Campbell http://blogs.technet.com/b/dataplatforminsider/archive/2012/04/09/the-coming-in-memory-database-tipping-point.aspx
Microsoft Virtual Academy talk video, 47 minutes, March 2012: http://technet.microsoft.com/en-us/edge/Video/hh859842 TechEd 2011 talk video, Columnstore Indexes Unveiled, 1 hour, 9 minutes: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI312 TechEd 2012 talk video, SQL Server Columnstore Performance Tuning, 1 hour, 15 minutes: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI409 Columnstore performance and partition switching demo video, 9 minutes: http://channel9.msdn.com/posts/SQL11UPD02-REC-02
Columnstore performance demo video, 4 minutes: http://www.youtube.com/watch?v=vPN8_PCsJm4 ACM SIGMOD 2011 paper on SQL Server columnstore indexes: http://dl.acm.org/citation.cfm?doid=1989323.1989448 IEEE Data Engineering Bulletin Paper on SQL Server columnstore indexes, March 2012: http://sites.computer.org/debull/A12mar/apollo.pdf VertiPaq vs ColumnStore: Performance Analysis of the xVelocity Engine, v1.0, rev 2, Aug 3, 2012. http://www.sqlbi.com/wp-content/uploads/Vertipaq-vs-ColumnStore1.pdf Microsoft SQL Server 2012 Columnstore for Real Time Reporting in Manufacturing Automation (COPA-DATA zenon Analyzer), 2012. http://www.kreatron.ro/news/newsdetail_65.html Case Study (bwin.party): http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/bwin.party/Company-Cuts-Reporting-Time-by-up-to-99-Percent-to-3-Seconds-and-Boosts-Scalability/710000000087 Case Study (Motricity: Migration from Sybase IQ to xVelocity columnstore index): http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Motricity/Mobile-Advertiser-Makes-Gains-with-Easy-Migration-of-Sybase-Database-to-Microsoft/710000000170 Case Study (MS People): http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Microsoft-Information-Technology-Group-MSIT/Microsoft-Cuts-Costs-and-Improves-Access-to-Information-with-Enhanced-Data-Warehouse/4000011545 Case Study (Columnstore Indexes to Speed ETL): http://prologika.com/CS/blogs/blog/archive/2011/12/07/columnstore-indexes-to-speed-etl.aspx Case Study (Mediterranean Shipping Company): http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Mediterranean-Shipping-Company-MSC/Shipper-Supports-Expansion-by-Boosting-Speed-Control-and-Savings-with-Microsoft/4000011460 Case Study (Beth Israel Deaconess Medical Center): http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Beth-Israel-Deaconess-Medical-Center/Hospital-Improves-Availability-and-Speeds-Performance-to-Deliver-High-Quality-Care/5000000011 Case Study (Belgacom) http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/BICS/Telecom-Performs-Database-Queries-Five-Times-Faster-Gains-Ability-to-Sustain-Growth/710000000579 Case Study (BNZ - New Zealand Bank) http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000000356 Case Study (RHI - Refractory Materials Manufacturer) http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/RHI/Manufacturer-Speeds-Queries-and-Improves-Business-Decisions-with-New-BI-Solution/710000001276 Case Study (Recall -- Records Management Firm) http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Recall/Records-Management-Firm-Saves-1-Million-Gains-Faster-Data-Access-with-Microsoft-BI/710000001279 Slide deck on CDR (Telecom) application design loading 100M rows per day with 3 year retention http://sqlug.be/media/p/1238.aspx Internal Microsoft Columnstore Benchmark: http://download.microsoft.com/download/7/2/E/72E63D2D-9F73-42BB-890F-C1CA0931511C/SQL_Server_2012_xVelocityBenchmark_DatasheetMar2012.pdf SQL Server Column-Store available for all major SAP BW releases http://blogs.msdn.com/b/saponsqlserver/archive/2012/10/29/sql-server-column-store-generally-available-for-sap-bw.aspx SQL Server 2012 and Tableau -- speeding things up http://random-thunks.com/2012/11/23/sql-server-2012-and-tableau-speeding-things-up/ What determines how many segments there will be?
Each physical partition of a columnstore index is broken into one-million-row chunks called segments (a.k.a. row groups). The index build process creates as many full segments as possible. Because multiple threads work to build an index in parallel, there may be a few small segments (typically equal to the number of threads) at the end of each partition with the remainder of the data after creating full segments. That's because each thread might hit the end of its input at different times. Non-partitioned tables have one physical partition. [TOP]
How do I know whether the columnstore index is being used for my query?
How can I force the query to use a columnstore index?
… FROM mytable WITH (INDEX (mycsindex)) …
How can I prevent the use of a columnstore index in my query?
Are columnstore indexes an in-memory database technology? SQL Server columnstores provide the performance benefits of a pure in-memory system with the convenience and economics of a system that stores data on disk and caches recently used data in memory. Columnstores hold data in memory in a different format than is kept on disk. This in-memory representation is highly optimized to support fast query execution on modern processors. Not all data has to fit in memory with a SQL Server columnstore index. But if all columnstore data does fit in memory, SQL Server provides pure-in-memory levels of performance. Why require all data to fit in memory (capping your database size or demanding a large budget to purchase memory, and demanding slow system startup times) if you can get the best of both worlds, that is, state-of-the-art query performance on economical hardware? Does all the data have to fit in memory when I use a columnstore index?
No, a columnstore index is persisted on disk just like any other index. It is read into memory when needed just like other types of indexes. The columnstore index is divided into units called segments, which are the unit of transfer. A segment is stored as a LOB, and can consist of multiple pages. We elected to bring columnstore index data into memory on demand rather than require that all data fits in memory so customers can access databases much bigger than will fit in main memory. If all your data fits in memory, you'll get reduced I/O and the fastest possible query performance. But it's not necessary for all data to fit in memory, and that's a plus. What determines whether the columnstore index is stored in memory?
Can I force a whole columnstore index to be loaded into memory?
You cannot force the columnstore index to be loaded, or kept, in memory but you can warm the cache by running a query that will cause the columnstore data to be read into memory.
When should I build a columnstore index?
Columnstore indexes are designed to accelerate data warehouse queries, not OLTP workloads. Use columnstore indexes when your query workload entails scanning and aggregating large amounts of data or joining multiple tables, especially in a star join pattern. The restrictions on how you update the data will also affect your choice. Columnstore indexes will be easiest to manage if you have a read-mostly workload and if partition switching to update the data will fit into your workflow. Partition switching for handling updates is easier if most updates consist of appending new data to the existing table and can be placed in a staging table that can be switched into the table during periodic load cycles.
Typically you will want to build a columnstore index on large fact tables and maybe on large dimension tables as well. You can build a columnstore index on very small tables, but the performance advantage is less noticeable when the table is small. If you frequently update your dimension tables, and they are not too large, you may find the maintenance effort outweighs the benefit of a columnstore index.
When should I not build a columnstore index?
Do columnstore indexes work with Transparent Data Encryption?
Yes.
Can I compress the columnstore index?
The columnstore index is compressed when it is created. You cannot apply PAGE or ROW compression to a columnstore index. When a columnstore index is created, it uses the VertiPaqTM compression algorithms, which compress the data more than either PAGE or ROW compression. There is no user control over compression of the columnstore index. What is the difference in storage space used between the base table and the columnstore index?
Based on our experiments with a variety of different data sets, columnstore indexes are about 4X to 15X smaller than an uncompressed heap or clustered B-tree index, depending on the data.
Do columnstore indexes work on partitioned tables?
Yes, you can create a columnstore index on a partitioned table. The columnstore index must be partition-aligned with the base table. If you do not specify a partition scheme when you create the columnstore index, the index will be automatically created using the same partition scheme as the base table. You can switch a partition in and out of a partitioned table with the same requirements regarding matching indexes as exist for other types of clustered and nonclustered indexes.
Can I partition a columnstore index?
Yes, you can partition a columnstore index, but the base table must also be partitioned and the columnstore index must be partition-aligned with the base table.
How do I add to, or modify, the data in a table with a columnstore index?
Once you create a columnstore index on a table, you cannot directly modify the data in that table. A query with INSERT, UPDATE, DELETE, or MERGE will fail and return an error message. To add or modify the data in the table, you can do one of the following:
ALTER INDEX mycolumnstoreindex ON mytable DISABLE;
-- update the data --
ALTER INDEX mycolumnstoreindex ON mytable REBUILD;
Now the columnstore index is ready to use again.
See also the question about trickle load.
What happens if I try to update a table that has a columnstore index?
The update will fail and return an error message.
Can I disable and rebuild the index on a single partition?
No. You can only disable or rebuild a columnstore index on the entire table. If you want to rebuild only one partition, you should switch the partition into an empty staging table, disable/rebuild the index on the staging table, and switch the staging table back into the main table. There is no need to rebuild the index except when you want to modify the data in the table.
How can I tell whether there is a columnstore index on my table?
There are two ways to determine whether a columnstore exists on a table. In Management Studio, you can look at the Object Explorer. Each table has an entry for Indexes. Columnstore indexes are included in the list of indexes and have their own icon and description. You can also look at various catalog tables. In sys.indexes, a columnstore index has type = 6 and type_desc = “NONCLUSTERED COLUMNSTORE.” A new catalog table, sys.column_store_index_stats, has one row for each columnstore index.
How can I find out more about my columnstore indexes? Is there metadata?
There are two new catalog tables with data about columnstore indexes:
VIEW DEFINITIONS permission on a table is required to see information in the catalog tables about a columnstore index on that table. In addition, a user must have SELECT permission on the table to see data in the following columns:
sys.column_store_segments: has_nulls, base_id, magnitude, min_data_id, max_data_id, null_value, data_ptr
sys.column_store_dictionaries: last_id, entry_count, data_ptr
A user who does not have SELECT permission on a table will see NULL as the value in the columns listed above. Does the columnstore compression algorithm compress each partition separately? Yes, each partition is compressed separately. Each partition has its own dictionaries. All segments within a partition share dictionaries. Dictionaries for different partitions are independent. This allows partition switching to be a metadata-only operation.
How big are my columnstore indexes?
You can use the new catalog tables or sys.dm_db_partition_stats to determine how big the columnstore indexes are on disk. A relatively simple query to get the size of one columnstore index is:
SELECT SUM(s.used_page_count) / 128.0 on_disk_size_MB FROM sys.indexes AS i JOIN sys.dm_db_partition_stats AS S ON i.object_id = S.object_id and I.index_id = S.index_id WHERE i.object_id = object_id('<tablename>') AND i.type_desc = 'NONCLUSTERED COLUMNSTORE'
Here are some other queries that total up column store component sizes. -- total size with total_segment_size as ( SELECT SUM (css.on_disk_size)/1024/1024 AS segment_size_mb FROM sys.partitions AS p JOIN sys.column_store_segments AS css ON p.hobt_id = css.hobt_id ) , total_dictionary_size as ( SELECT SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb FROM sys.partitions AS p JOIN sys.column_store_dictionaries AS csd ON p.hobt_id = csd.hobt_id ) select segment_size_mb, dictionary_size_mb, segment_size_mb + isnull(dictionary_size_mb, 0) as total_size_mb from total_segment_size left outer join total_dictionary_size on 1 = 1 go -- size per index with segment_size_by_index AS ( SELECT p.object_id as table_id, p.index_id as index_id, SUM (css.on_disk_size)/1024/1024 AS segment_size_mb FROM sys.partitions AS p JOIN sys.column_store_segments AS css ON p.hobt_id = css.hobt_id group by p.object_id, p.index_id ) , dictionary_size_by_index AS ( SELECT p.object_id as table_id, p.index_id as index_id, SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb FROM sys.partitions AS p JOIN sys.column_store_dictionaries AS csd ON p.hobt_id = csd.hobt_id group by p.object_id, p.index_id ) select object_name(s.table_id) table_name, i.name as index_name, s.segment_size_mb, d.dictionary_size_mb, s.segment_size_mb + isnull(d.dictionary_size_mb, 0) as total_size_mb from segment_size_by_index s JOIN sys.indexes AS i ON i.object_id = s.table_id and i.index_id = s.index_id left outer join dictionary_size_by_index d on s.table_id = s.table_id and s.index_id = d.index_id order by total_size_mb desc go -- size per table with segment_size_by_table AS ( SELECT p.object_id as table_id, SUM (css.on_disk_size)/1024/1024 AS segment_size_mb FROM sys.partitions AS p JOIN sys.column_store_segments AS css ON p.hobt_id = css.hobt_id group by p.object_id ) , dictionary_size_by_table AS ( SELECT p.object_id AS table_id, SUM (csd.on_disk_size)/1024/1024 AS dictionary_size_mb FROM sys.partitions AS p JOIN sys.column_store_dictionaries AS csd ON p.hobt_id = csd.hobt_id group by p.object_id ) select t.name AS table_name, s.segment_size_mb, d.dictionary_size_mb, s.segment_size_mb + isnull(d.dictionary_size_mb, 0) as total_size_mb from dictionary_size_by_table d JOIN sys.tables AS t ON t.object_id = d.table_id left outer join segment_size_by_table s on d.table_id = s.table_id order by total_size_mb desc go -- size per column with segment_size_by_column as ( SELECT p.object_id as table_id, css.column_id, SUM (css.on_disk_size)/1024/1024.0 AS segment_size_mb FROM sys.partitions AS p JOIN sys.column_store_segments AS css ON p.hobt_id = css.hobt_id GROUP BY p.object_id, css.column_id ), dictionary_size_by_column as ( SELECT p.object_id as table_id, csd.column_id, SUM (csd.on_disk_size)/1024/1024.0 AS dictionary_size_mb FROM sys.partitions AS p JOIN sys.column_store_dictionaries AS csd ON p.hobt_id = csd.hobt_id GROUP BY p.object_id, csd.column_id ) -- It may be that not all the columns in a table will be or can be included -- in a nonclustered columnstore index, -- so we need to join to the sys.index_columns to get the correct column id. Select Object_Name(s.table_id) as table_name, C.column_id, col_name(S.table_id, C.column_id) as column_name, s.segment_size_mb, d.dictionary_size_mb, s.segment_size_mb + isnull(d.dictionary_size_mb, 0) total_size_mb from segment_size_by_column s join sys.indexes I -- Join to Indexes system table ON I.object_id = s.table_id join sys.index_columns c --Join to Index columns ON c.object_id = s.table_id And I.index_id = C.index_id and c.index_column_Id = s.column_id --Need to join to the index_column_id with the column_id left outer join dictionary_size_by_column d on s.table_id = d.table_id and s.column_id = d.column_id Where I.type_desc = 'NONCLUSTERED COLUMNSTORE' order by total_size_mb desc go
What is batch mode processing?
Batch mode processing uses a new iterator model for processing data a-batch-at-a-time instead of a-row-at-a-time. A batch typically represents about 1000 rows of data. Each column within a batch is stored as a vector in a separate area of memory, so batch mode processing is vector-based. Batch mode processing also uses algorithms that are optimized for the multicore CPUs and increased memory throughput that are found on modern hardware. Batch mode processing spreads metadata access costs and other types of overhead over all the rows in a batch, rather than paying the cost for each row. Batch mode processing operates on compressed data when possible and eliminates some of the exchange operators used by row mode processing. The result is better parallelism and faster performance.
How do I know whether batch mode processing is being used for my query?
Batch mode processing is only available for certain operators. Most queries that use batch mode processing will have part of the query plan executed in row mode and part in batch mode. You can tell whether batch mode processing is being used for an operator by looking at showplan. If you look at the properties for a scan or other operator in the Actual Execution Plan, you will see two new properties: EstimatedExecutionMode and ActualExecutionMode. Only EstimatedExecutionMode is displayed in the Estimated Execution Plan. The values for these two properties can be either row or batch. There is also a new operator for hash joins when they are being executed in batch mode. The BatchHashTableBuild operator appears in graphical showplan and has a new icon.
Can EstimatedExecutionMode and ActualExecutionMode be different? When and why?
The query optimizer chooses whether to use batch mode processing when it formulates the query plan. Most of the time, EstimatedExecutionMode and ActualExecutionMode will have the same value, either batch or row. At run time, two things can cause a query plan to be executed in row mode instead of batch mode: not enough memory or not enough threads. The most common reason for the ActualExecutionMode to be row when the EstimatedExecutionMode was batch is that there was a large hash join and all the hash tables could not fit in memory. Batch mode processing uses special in-memory hash tables. If the hash tables do not fit in memory, execution of the query reverts to using row mode and traditional hash tables that can spill to disk. The other reason for changing to row mode is when not enough threads are available for parallel execution. Serial execution always occurs in row mode. You can tell that a fall back to serial execution occurred if the estimated query plan shows parallel execution but the actual query plan is executed serially.
If the query executes in parallel but falls back to row mode processing, you can infer that memory was the problem. There is also an xevent (batch_hash_table_build_bailout) that is fired when there is not enough memory during hash join and the query falls back to row mode processing. If this happens, incorrect cardinality estimation may have contributed to the problem. Check the cardinality estimation and consider updating statistics on the table.
Is a parallel query plan required to get batch mode processing? Yes. Batch mode processing occurs only for parallel query execution. If the cost of the query plan is small, the optimizer may choose a serial plan that is "good enough." When experimenting with columnstore indexes you may need a large data set to see the effects of batch mode processing. Check the degree of parallelism if you see that a query was executed in row mode when you expected batch mode. Can I get batch mode processing even if I don’t have a columnstore index?
No. Batch mode processing only occurs when a columnstore index is being used in the query.
Barbara Kess MSFT edited Revision 140. Comment: SQL Server 2014 edits to the Overview Section
Rick Byham, Microsoft edited Revision 139. Comment: Changed references to Denali to SQL Server 2012.
Ed Price - MSFT edited Revision 135. Comment: Removing "(en-US)" from title.
Julie Koesmarno edited Revision 119. Comment: Corrected a typo on first paragraph under "How big are my columnstore indexes?" section