Dropping a Clustered Index Will Not Reorganize the Heap

Dropping a Clustered Index Will Not Reorganize the Heap

If a clustered index will be turned into a HEAP, no reorganization of the data will occur. This "myth" is sometimes popping up in forums or blogs. A view "behind the scenes" will describe how Microsoft SQL Server is handling this operation. Before the deep dive into the internals, some basics of relations in Microsoft SQL Server databases must be understand.

What is a clustered index?

A clustered index IS the relation itself ordered logically by the clustered key. Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. Get detailed information about clustered index at http://technet.microsoft.com/en-us/library/ms190457.aspx.

What is a heap?

A heap is the relation without a clustered index. One or more nonclustered indexes can be created on tables stored as a heap in the same way as it can be created in relations which are clustered indexes. Data is stored in the heap without specifying an order. Usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently. Get detailed information about structures of a heap at http://technet.microsoft.com/en-us/library/hh213609.aspx.

Problem definition

A relation with a clustered index should be turned into a heap. Therefore the clustered index will be removed by using the command

ALTER TABLE [schema].[relation] DROP INDEX [NAME_OF_CLUSTERED_INDEX];

If a relation may have hundreds of millions of records the above operation can be very time consuming which lead to the assumption that the relation will be rebuild when it turns into a heap. That's not true as the following sample(s) may demonstrate:

Test scenario

First a simple relation with a clustered index will be created and filled with 100.000 records. The results for the physical index statistics show the depth of each index which depends on the size of the key attributes.

01.-- Creation of a relation for demonstration
02.IF OBJECT_ID('dbo.tbl_demo', 'U') IS NOT NULL
03.    DROP TABLE dbo.tbl_demo;
04.    GO
05. 
06.CREATE TABLE dbo.tbl_demo
07.(
08.    Id      int          NOT NULL    IDENTITY (1, 1),
09.    col1    char(189)    NOT NULL    DEFAULT ('just stuff'),
10.    col2    char(200)    NOT NULL    DEFAULT ('more stuff')
11.);
12. 
13.-- Filling the table with a few records
14.SET NOCOUNT ON
15.GO
16. 
17.INSERT INTO dbo.tbl_demo DEFAULT VALUES
18.GO 100000
19. 
20.-- create two indexes for demonstration
21.CREATE UNIQUE CLUSTERED INDEX tbl_demo_Id ON dbo.tbl_demo (Id);
22.CREATE INDEX tbl_demo_Col1 ON dbo.tbl_demo(Col1);

The above code creates a relation named [dbo].[tbl_demo] and inserts 100.000 records. After finishing the INSERT-operation a clustered index [tbl_demo_id] and a non clustered index [tbl_demo_col1] will be created. When the script is finished you will get an overview of the physical structures of both indexes with the following script.

01.-- Check the physical index stats
02.SELECT  OBJECT_NAME(i.object_id)    AS    object_name,
03.        i.name,
04.        i.index_id,
05.        index_level,
06.        index_type_desc,
07.        page_count,
08.        record_count
09.FROM    sys.indexes i INNER JOIN sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('dbo.tbl_demo', 'U'), DEFAULT, DEFAULT, 'DETAILED') ps
10.        ON  (
11.                i.object_id = ps.object_id AND
12.                i.index_id = ps.index_id
13.            )
14.ORDER BY
15.        i.index_id ASC,
16.        ps.index_level ASC;



The clustered index has a ROOT-Level (index_level = 2) and a B-Tree (balanced Tree) level (index_level = 1). The Leaf-Level contains the data which are distributed over 5.004 data pages (index_level = 0). Because the clustered index IS the relation itself the leaf-level contains the data of the relation.
The internal structure of the clustered index can be viewed by using sys.dm_db_database_page_allocations (available since SQL 2012). The next pic shows an extract from the first data pages)

01.SELECT  page_type_desc,
02.        page_level,
03.        previous_page_page_id,
04.        allocated_page_page_id,
05.        next_page_page_id
06.FROM    sys.dm_db_database_page_allocations
07.        (
08.            db_id(),
09.            OBJECT_ID('dbo.tbl_demo', 'U'),
10.            1,
11.            DEFAULT,
12.            'DETAILED'
13.        )
14.WHERE   is_allocated = 1
15.ORDER BY
16.        page_type DESC,
17.        page_level DESC,
18.        previous_page_page_id ASC;



A pretty good way to make page allocations visible are the following queries which use information from sys.fn_PhysLocFormatter about the position of a record:

1.-- see the location of each record in the affected index
2.SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS Location, * FROM dbo.tbl_demo WITH (INDEX (tbl_demo_Id));
3.SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS Location, * FROM dbo.tbl_demo WITH (INDEX (tbl_demo_Col1));



The first resultset shows the very first seven records stored in the clusterd index which starts on data page 20256 in the first data file of the database. The storage of data for the non clustered index starts on data page 25632 on the same data file of the database.
Now the operation starts with dropping the clustered index from the relation. This operation automatically turns the relation into a HEAP which is - by definition - the relation itself without a clustered index. Some people assume a completely rebuild of the relation when dropping the clustered index - because the operation can consume much time and resources of Microsoft SQL Server. To see exactly what steps will Microsoft SQL Server go through when the clustered index is dropped the operation will be covered in a named transaction. This trick will make it easy to filter the transaction steps in a simple way from the transaction log.. After the clustered has been dropped again the page allocation of the relation will be listed by sys.fn_PhysLocFormatter.

01.-- drop the clustered index which will turn the relation into a HEAP
02.BEGIN TRANSACTION DropClusteredIndex
03.DROP INDEX tbl_demo_Id ON dbo.tbl_demo;
04.COMMIT TRANSACTION DropClusteredIndex
05.GO
06. 
07.-- see the location of each record in the affected index
08.SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS Location, * FROM dbo.tbl_demo ORDER BY Id;
09.SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS Location, * FROM dbo.tbl_demo WITH (INDEX (tbl_demo_Col1))
10.ORDER BY Id;



The result for the relation - which has now turned into a heap - surprises. The formerly allocated data pages have not been deallocated. The records are located at the same position as before the DROP-operation. But the page allocation of the non clustered index [tbl_demo_col1] has changed! The non clustered index has been rebuild! A look into the transaction log shows, why the DROP operation didn't affect the allocation of the data pages of the relation itself.

1.SELECT Operation, Context, AllocUnitId, AllocUnitName, [Lock Information]
2.FROM sys.fn_dblog(NULL, NULL)
3.WHERE [Transaction ID] IN
4.(
5.   SELECT [Transaction ID]
6.   FROM   sys.fn_dblog(NULL, NULL)
7.   WHERE  [Transaction Name] = 'DropClusteredIndex'
8.);



The picture above shows a shorten extract from the transaction log and the recorded steps Microsoft SQL Server has taken while dropping the clustered index. In total is was over 150 single operations but only a few of them can demonstrate the technical background; Microsoft SQL Server does not move data from one page to another but releases data from the data pages which hold meta data information of the relation.

 PFS Page Free Space Information about allocated and free space on data pages
GAM / SGAM Global Allocation Map
Shared Global Allocation Map
Information whether Extents are allocated or free
IAM Index Allocation Map Information of extents which are allocated by a clustered index ore a non clustered index.

To dive deeper into the internal structure of databases / data pages / records is recommended to read the great blog article "Inside the Storage Engine: GAM, SGAM, PFS and other allocation maps" from Paul S. Randal concerning the internals of the storage engine. An alternative to the www is the really great "bible for SQL Server Professionals" from Kalen Delaney "Microsoft SQL Server 2012 Internals". Have a look into the chapter "Databases Under the Hood" to get very detailed information about the storage engine.
From the transaction log it is obvious that only data pages which hold meta data will be part of the changes. The data itself won't be moved / touched in any way. Because a heap doesn't have a B-Tree structure, only these structure information will be removed when dropping a clustered index.
As an example for the operation the row 11 of the resultset of the transaction log will be object of deeper investigation. The [Lock Information] holds an exclusive lock in database 19 (which is my demo database) in file 1 and page 178 and slot 5. The page 178 is the IAM-page of the relation [dbo].[tbl_demo] which holds information about allocation of space by clustered or non clustered indexes. A close view into the data page demonstrates that extents have been deallocated.

1.DBCC TRACEON (3604);
2.DBCC PAGE (19, 1, 178, 3);



The following ranges are deallocated - they will be compared to the pages of the clustered index which "built" the B-tree structure

Not allocated  data pages of root level or intermediate level (B-Tree) of the clustered index
0-19960 179, 180, 410 - 414
20464 - 20472  20464, 20465
24312 - 24319 24312
24464 - 24480 24472, 24473
24976 - 24984 24976

All named pages in the above table are pages from the root-level and the B-tree-level of the former clustered index. So really NO data itself have been moved / manipulated but only meta data of the structure of the relation itself. The "middle construct" between IAM and data has been removed completely - which describes the structure of a heap perfectly.

Why have the data of the non clustered index been rebuild?

What impacts a non clustered index when any meta data manipulation will be used against the relation itself (e.g. creation / dropping a clustered index). The dependency between a non clustered index and the relation itself (whether a heap or a clustered index) is the link inside the non clustered index to the data pages of the heap / Clustered Index. Each non clustered index need to have a reference to the data of the relation. This "link" can be three different types of pointers:

 type of relation pointer in non clustered index  
unique clustered index the clustered key  
non unique clustered index   the clustered key and a UNIQUEIFIER which makes the row unique inside the relation   
heap RID (RowLocator Id)  

Concerning the structure of the demo relation it was a unique clustered key on the primary attribut [Id]. Due to the fact that the clustered key has been determined as unique no UNIQUIFIER need to be stored "on top" to the clustered key. This key is always part in any non clustered key. The following pic demonstrates the structure of the non clustered index before the DROP operation has taken place:



For every row in the non clustered index the pointer to the clustered key is stored, too. By this method Microsoft SQL Server is able to locate data from other attributes which are not part of the non clustered index (e.g. [col2]). The next picture shows the same non clustered index after the clustered index has been dropped.



As a side effect you can see that a RID consumes another 4 bytes more than the previous clustered key which was an INT (4 bytes).

Result

Dropping a clustered index will not force Microsoft SQL Server to rebuild the complete relation. Dropping a clustered index will only change the meta data and turn the B-Tree structure of an index into a flat hierarchy of an IAM page and multiple data pages (which is a heap). Adding a clustered index or dropping a clustered index will have deep impacts to ALL non clustered index of a relation because the pointers to the clustered index / heap will change and forces a rebuild of each non clustered index.

References

heap-structures http://technet.microsoft.com/en-us/library/ms188270.aspx
Clustered Index http://technet.microsoft.com/en-us/library/ms177443.aspx
Nonclustered Index: http://technet.microsoft.com/en-us/library/ms177484.aspx

 

Leave a Comment
  • Please add 3 and 5 and type the answer here:
  • Post
Wiki - Revision Comment List(Revision Comment)
Sort by: Published Date | Most Recent | Most Useful
Comments
  • Maheshkumar S Tiwari edited Revision 15. Comment: corrected typo error and tags

  • TNJMAN edited Revision 14. Comment: syntactic and idiomatic corrections

  • Ed Price - MSFT edited Revision 10. Comment: Title casing; "heap" should be lowercase

  • Uwe Ricken edited Revision 8. Comment: added the pictures - yeah :)

  • Naomi  N edited Revision 7. Comment: First image

  • Naomi  N edited Revision 1. Comment: Minor edit

Page 1 of 1 (6 items)
Wikis - Comment List
Sort by: Published Date | Most Recent | Most Useful
Posting comments is temporarily disabled until 10:00am PST on Saturday, December 14th. Thank you for your patience.
Comments
  • Naomi  N edited Revision 1. Comment: Minor edit

  • The images are not shown in the article

  • Dear Naomi,

    I've informed Ed because I'm not able to insert pics into the article :(

    I've tested with IE and Firefox - no success ...

    Unfortunately Ed didn't response to my email.

    If anybody can insert the pics to that article - ...

    PLEASE drop me a mail to uwe.ricken@db-berater.de

    I will send a ZIP. The naming of the pics are 1:1 to the tags in the article.

    Thank you so much for your support.

    Best, Uwe

  • Did you upload the image from your computer to the Wiki, as described in this article?

    social.technet.microsoft.com/.../264.wiki-how-to-add-a-video-or-image-to-your-article.aspx

    I find when I insert images into a Wiki article, I must be patient. It can take a few seconds for the image to be uploaded, then even longer before it is inserted in the article. I wait until I see the image before I save.

  • Naomi  N edited Revision 7. Comment: First image

  • I was able to insert first image Ok.

  • Uwe Ricken edited Revision 8. Comment: added the pictures - yeah :)

  • Ed Price - MSFT edited Revision 10. Comment: Title casing; "heap" should be lowercase

  • Great article, Uwe! Looks like the images are all up now. Sorry I didn't get back to you in time!

  • Congrats on winning the gold medal in the August TechNet Guru contest! blogs.technet.com/.../technet-guru-awards-august-2013.aspx

  • We featured your article on the home page of TNWiki: social.technet.microsoft.com/wiki

  • Exceptionally well-done!

  • TNJMAN edited Revision 14. Comment: syntactic and idiomatic corrections

  • You know, we should re-make the song "Hey Mickey" and change it to "Hey Wiki" ! ... Hey, Wiki, you're so fine, you blow my mind... hey Wiki, hey hey hey Wiki... [is it Friday yet???]

  • Maheshkumar S Tiwari edited Revision 15. Comment: corrected typo error and tags

Page 1 of 2 (16 items) 12