Demo Transcript: Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes

This wiki is a transcript of a previously recorded video.


Related content assets:


  • Demo: Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes (SQL11UPD02-DEMO-01)
  • Video: Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes (SQL11UPD02-REC-02)



Create Columnstore Index

Hey everyone – this is Joe Sack and I’m a Principal Consultant with SQLskills - and this is the demonstration for columnstore indexing in SQL Server 2012. Hopefully you got a chance to look at the presentation and I’m going to go ahead and switch to SQL Server Management Studio.

Before I kicked off this demo I actually increased the number of rows in FactProductInventory – so the default’s something like less than 100 thousand and I bumped this up to 6.2 million. If you plan on doing something similar to this, you can go far higher than that. Last weekend I did 123 million just to see how the performance was and it still kept up. I think I could probably go to a billion and still see impressive results.

So for this case, 6 million is enough for you to see the before and the after impact – and part of this demonstration is showing the before and the after and then the other part of it is actually showing you how to switch data out – so using partitioning to update the table.

First thing I’m going to do – I’m in AdventureWorksDWDenali and I’m going to turn on STATISTICS TIME and STATISTICS IO so that I can get some before statistics. Alright, and I’m going to execute a standard data warehouse query. So in this case I’m just going against FactProductInventory and I’m joining the DimProduct and DimDate tables – so I’m pulling a few columns from the dimension tables and then I’m doing some aggregates against the UnitCost and then a sum against UnitsOut. I’ll execute this twice because I don’t want to compare the results on a cold cache – so we’ll at least have a fair fight.
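The transcript doesn’t show the actual script, but the query described would look something like this sketch (the exact column list and aliases are assumptions; FactProductInventory in AdventureWorksDW carries ProductKey, DateKey, UnitCost, UnitsIn, and UnitsOut):

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Hypothetical reconstruction of the demo query: join the fact table to the
-- two dimensions, then aggregate UnitCost and UnitsOut
SELECT p.EnglishProductName,
       d.CalendarYear,
       AVG(f.UnitCost) AS AvgUnitCost,
       SUM(f.UnitsOut) AS TotalUnitsOut
FROM dbo.FactProductInventory AS f
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
JOIN dbo.DimDate    AS d ON f.DateKey    = d.DateKey
GROUP BY p.EnglishProductName, d.CalendarYear;
```

Running it a second time ensures the pages are already in the buffer pool, so the comparison against the columnstore run is warm-cache versus warm-cache.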

Now if you look here, logical reads are 38 thousand – if we had looked at the previous run’s logical reads and physical reads we would have seen a non-zero value for physical reads. But in this case there are 38,606 logical reads, and that’s the baseline I’m going to compare against. We can also look at CPU time – so this is something of interest – I’ll copy this over. Alright, I already had something – 38 thousand over here.

Alright, so we have our before state – and now I’m going to create a nonclustered columnstore index. Remember, in this version the nonclustered option is the only one for columnstore indexes – there isn’t the concept of a clustered columnstore index.

So we’re going to create the nonclustered columnstore index and in your case, when you’re creating this, remember the best practice is to cover as many columns as you think might be used for reporting. So the whole idea is we don’t create another situation that you might have been stuck with in 2008, where you have to create intermediate tables or aggregate tables and then if anybody changes the reports they have to go back and change it or add the additional columns. In this case if you anticipate the column being used, go ahead and add it so long as it has the supporting data type.
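A minimal sketch of the index creation, following that best practice of covering every column that might show up in a report (the index name and the exact column list are assumptions, since the transcript doesn’t show the script):

```sql
-- Cover all columns that reporting queries might touch; in SQL Server 2012
-- the columnstore index is read-only, nonclustered, and cannot be the
-- table's primary storage
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactProductInventory
ON dbo.FactProductInventory
    (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut);
```

Adding an extra column here is cheap relative to reworking aggregate tables later, which is why the guidance is to be generous with the column list as long as each column has a supported data type.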

Now I’m going to set STATISTICS TIME and IO on again and I’m going to execute this twice as well, just to be fair. Alright. So this runs right away and if I scroll down to the logical reads… Ok, so we have 810 – no wait, here we go – 5,511 logical reads. So that’s down from 38 thousand-some logical reads, so that’s a significant decrease. And then in terms of CPU time – let’s go ahead and copy that over so you can do a little visual comparison. Let’s go up here… yeah, so we see a decrease there as well. So we had almost 12 seconds of CPU time – now we have 2.2 seconds.

Alright, so I’ve shown you just the before and the after performance. One other thing I want to point out is the execution plan. Let me zoom in a little bit. So if I go over here to the actual columnstore index scan, I just want you to notice the batch mode. So Actual Execution Mode is batch, and you want to confirm that. You can have a columnstore index benefit with row mode, but the ultimate in performance is to have a columnstore index and then have batch mode used. So you want to validate that, and you can do that through the execution plan by hovering over the iterator – in this case it’s the columnstore index scan.

Update Columnstore Table Using Partitioning

Alright, so – what if I want to modify data in the table? Now that I have the columnstore index on there, I can’t just do an insert/update/delete or a merge but I do have other options like using partitioning.

So next part of the demonstration is I’m going to drop a couple of the indexes and I’m going to create a partition function and a partition scheme that I’m going to use against the fact table. I’m going to recreate the clustered index and recreate the nonclustered index using the partition function… this will take a few seconds… And then I’m going to create the nonclustered columnstore once this is done running. And the whole point here is that, once I’ve added this, again I’m not going to be able to do insert/update/delete or merge – but that doesn’t leave me without options.
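The partitioning setup described above would look roughly like this. This is a hypothetical sketch: the transcript doesn’t show the actual boundary values, partition count, or index names, so everything below is a placeholder illustrating the shape of the DDL:

```sql
-- Placeholder boundaries; a real fact table would have one boundary per
-- load period (e.g. per month), yielding 19+ partitions
CREATE PARTITION FUNCTION pf_InventoryDate (int)
AS RANGE RIGHT FOR VALUES (20100101, 20100201, 20100301 /* ... */);

CREATE PARTITION SCHEME ps_InventoryDate
AS PARTITION pf_InventoryDate ALL TO ([PRIMARY]);

-- Recreate the clustered index on the partition scheme so the fact table
-- is partitioned by its date key
CREATE CLUSTERED INDEX CIX_FactProductInventory
ON dbo.FactProductInventory (DateKey)
ON ps_InventoryDate (DateKey);

-- Finally, add the nonclustered columnstore index back on top
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactProductInventory
ON dbo.FactProductInventory
    (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut);
```

The columnstore index is created last, after the table is aligned to the partition scheme, which is what enables the SWITCH operations shown next.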

Now, I’m demonstrating partitioning in this case – another option is that, if you have, for example, a periodic load – so let’s say you’re loading data once a day, you could drop the index, load the data, and then recreate the index. And a lot of the times, that amount of time to recreate the index after you’re done loading it might be less than you’re doing today. So if you’re using intermediate tables, aggregate tables, and having to modify those and having to create those, you might actually have less time in doing the drop columnstore index technique, and then load data, and then recreate – than you were in your legacy solution.

So I’m going to show you an insert, and this insert is going to fail – I just want to show you that it’s not allowed. It fails because you can’t update a table with a columnstore index.
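Any such statement would fail along these lines (the column values here are made up; in SQL Server 2012 the engine raises an error stating that data cannot be updated in a table with a columnstore index):

```sql
-- Fails while the nonclustered columnstore index exists on the table:
-- "... statement failed because data cannot be updated in a table with a
--  columnstore index."
INSERT INTO dbo.FactProductInventory
    (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut)
VALUES
    (1, 20100115, '2010-01-15', 10.00, 5, 2);
```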

And now I’m going to create an archive table – so just creating a table, creating a clustered index, and creating a nonclustered index. Alright. And pardon me – a moment ago I said I was going to switch data in; I’m actually going to switch data out to this table. So I’m going to create a few other constraints – some check constraints – so that I’m able to actually switch out to this table. Alright, so now our archive table is in a state where I can switch out from the fact table – so in terms of data modifications, in this case I’m demonstrating switching a partition out.
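A sketch of that archive table, assuming it mirrors FactProductInventory’s schema (the table name follows the transcript; the column types and the check-constraint boundaries are placeholders – the constraint must match the date range of the partition being switched out):

```sql
-- Archive table must match the fact table's schema and indexing
CREATE TABLE dbo.Archive_FactProductInventory
(
    ProductKey   int      NOT NULL,
    DateKey      int      NOT NULL,
    MovementDate datetime NOT NULL,
    UnitCost     money    NOT NULL,
    UnitsIn      int      NOT NULL,
    UnitsOut     int      NOT NULL
);

CREATE CLUSTERED INDEX CIX_Archive_FactProductInventory
ON dbo.Archive_FactProductInventory (DateKey);

-- The check constraint proves to the engine that every row in this table
-- belongs to the target partition's range (boundary values are placeholders)
ALTER TABLE dbo.Archive_FactProductInventory
ADD CONSTRAINT CK_Archive_DateKey
CHECK (DateKey >= 20100101 AND DateKey < 20100201);
```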

And I’m just going to do a little reality check on the count – we’ve got 6.2 million rows right now. And our archive table, of course, is zero rows. All we’ve done is add it; we haven’t done anything else with it. Alright, so now here’s where we get to switching out the partition. So I can ALTER TABLE – the name of our fact table – and switch partition 19 to this empty archive table. And in this case it’s empty, so it’s going to be a quick metadata switch – and sure enough, that’s it – it happened right away – and if I query it, I see I’ve got 63 thousand rows.

So that’s 63 thousand rows that moved in right away, and again – no tricks – if I try to do an insert against the fact table, it still doesn’t allow me to do it. And I can also then switch back to the main table. Alright – so right now we have 6.1 million rows, and I’m going to create a nonclustered columnstore index on the archive table. Alright – and then I’m just going to switch it back in. So ALTER TABLE Archive_FactProductInventory – I’m going to SWITCH TO FactProductInventory partition 19. So going from Archive to partition 19. And 6.2 million – so we have our rows back in.
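The round trip described above can be sketched as follows (partition 19 comes from the transcript; the index column list is an assumption):

```sql
-- Switch partition 19 out to the empty archive table: a metadata-only
-- operation, so it completes almost instantly
ALTER TABLE dbo.FactProductInventory
SWITCH PARTITION 19 TO dbo.Archive_FactProductInventory;

-- ... modify or reload the archived rows here, then give the archive table
-- a matching nonclustered columnstore index before switching back ...
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Archive_FactProductInventory
ON dbo.Archive_FactProductInventory
    (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut);

-- Switch the rows back into partition 19 of the fact table
ALTER TABLE dbo.Archive_FactProductInventory
SWITCH TO dbo.FactProductInventory PARTITION 19;
```

Because SWITCH only moves metadata, both directions are near-instant regardless of row count, which is what makes this pattern practical for loading read-only columnstore tables.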

So even though you have extra considerations around data modifications once you’ve added a columnstore index, a lot of data warehouse scenarios are partition-friendly – often you’re partitioning on a date key, for example. And so with a little architecture ahead of time, the actual data movement back and forth is not as much of a constraint as you might think at first, and it’s often going to be well worth the performance benefit that I showed you a little bit earlier.

So I hope you’ve enjoyed this video and thanks for watching!

