If a clustered index will be turned into a HEAP, no reorganization of the data will occur. This "myth" is sometimes popping up in forums or blogs. A view "behind the scenes" will describe how Microsoft SQL Server is handling this operation. Before the deep dive into the internals, some basics of relations in Microsoft SQL Server databases must be understand.
A clustered index IS the relation itself ordered logically by the clustered key. Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. Get detailed information about clustered index at http://technet.microsoft.com/en-us/library/ms190457.aspx.
A heap is the relation without a clustered index. One or more nonclustered indexes can be created on tables stored as a heap in the same way as it can be created in relations which are clustered indexes. Data is stored in the heap without specifying an order. Usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently. Get detailed information about structures of a heap at http://technet.microsoft.com/en-us/library/hh213609.aspx.
A relation with a clustered index should be turned into a heap. Therefore the clustered index will be removed by using the command ALTER TABLE [schema].[relation] DROP INDEX [NAME_OF_CLUSTERED_INDEX]; If a relation may have hundreds of millions of records the above operation can be very time consuming which lead to the assumption that the relation will be rebuild when it turns into a heap. That's not true as the following sample(s) may demonstrate:
First a simple relation with a clustered index will be created and filled with 100.000 records. The results for the physical index statistics show the depth of each index which depends on the size of the key attributes.
01.
-- Creation of a relation for demonstration
02.
IF OBJECT_ID(
'dbo.tbl_demo'
,
'U'
)
IS
NOT
NULL
03.
DROP
TABLE
dbo.tbl_demo;
04.
GO
05.
06.
CREATE
dbo.tbl_demo
07.
(
08.
Id
int
IDENTITY (1, 1),
09.
col1
char
(189)
DEFAULT
'just stuff'
),
10.
col2
(200)
'more stuff'
11.
);
12.
13.
-- Filling the table with a few records
14.
SET
NOCOUNT
ON
15.
16.
17.
INSERT
INTO
VALUES
18.
GO 100000
19.
20.
-- create two indexes for demonstration
21.
UNIQUE
CLUSTERED
INDEX
tbl_demo_Id
dbo.tbl_demo (Id);
22.
tbl_demo_Col1
dbo.tbl_demo(Col1);
The above code creates a relation named [dbo].[tbl_demo] and inserts 100.000 records. After finishing the INSERT-operation a clustered index [tbl_demo_id] and a non clustered index [tbl_demo_col1] will be created. When the script is finished you will get an overview of the physical structures of both indexes with the following script.
-- Check the physical index stats
SELECT
OBJECT_NAME(i.object_id)
AS
object_name,
i.
name
i.index_id,
index_level,
index_type_desc,
page_count,
record_count
FROM
sys.indexes i
INNER
JOIN
sys.dm_db_index_physical_stats(db_id(), OBJECT_ID(
'DETAILED'
) ps
i.object_id = ps.object_id
AND
i.index_id = ps.index_id
ORDER
BY
i.index_id
ASC
ps.index_level
;
The clustered index has a ROOT-Level (index_level = 2) and a B-Tree (balanced Tree) level (index_level = 1). The Leaf-Level contains the data which are distributed over 5.004 data pages (index_level = 0). Because the clustered index IS the relation itself the leaf-level contains the data of the relation. The internal structure of the clustered index can be viewed by using sys.dm_db_database_page_allocations (available since SQL 2012). The next pic shows an extract from the first data pages)
page_type_desc,
page_level,
previous_page_page_id,
allocated_page_page_id,
next_page_page_id
sys.dm_db_database_page_allocations
db_id(),
OBJECT_ID(
1,
WHERE
is_allocated = 1
page_type
DESC
page_level
previous_page_page_id
A pretty good way to make page allocations visible are the following queries which use information from sys.fn_PhysLocFormatter about the position of a record:
1.
-- see the location of each record in the affected index
2.
sys.fn_PhysLocFormatter(%%physloc%%)
Location, *
WITH
(tbl_demo_Id));
3.
(tbl_demo_Col1));
The first resultset shows the very first seven records stored in the clusterd index which starts on data page 20256 in the first data file of the database. The storage of data for the non clustered index starts on data page 25632 on the same data file of the database. Now the operation starts with dropping the clustered index from the relation. This operation automatically turns the relation into a HEAP which is - by definition - the relation itself without a clustered index. Some people assume a completely rebuild of the relation when dropping the clustered index - because the operation can consume much time and resources of Microsoft SQL Server. To see exactly what steps will Microsoft SQL Server go through when the clustered index is dropped the operation will be covered in a named transaction. This trick will make it easy to filter the transaction steps in a simple way from the transaction log.. After the clustered has been dropped again the page allocation of the relation will be listed by sys.fn_PhysLocFormatter.
-- drop the clustered index which will turn the relation into a HEAP
BEGIN
TRANSACTION
DropClusteredIndex
COMMIT
Id;
(tbl_demo_Col1))
The result for the relation - which has now turned into a heap - surprises. The formerly allocated data pages have not been deallocated. The records are located at the same position as before the DROP-operation. But the page allocation of the non clustered index [tbl_demo_col1] has changed! The non clustered index has been rebuild! A look into the transaction log shows, why the DROP operation didn't affect the allocation of the data pages of the relation itself.
Operation, Context, AllocUnitId, AllocUnitName, [Lock Information]
sys.fn_dblog(
[
Transaction
ID]
IN
4.
5.
6.
7.
Name
] =
'DropClusteredIndex'
8.
The picture above shows a shorten extract from the transaction log and the recorded steps Microsoft SQL Server has taken while dropping the clustered index. In total is was over 150 single operations but only a few of them can demonstrate the technical background; Microsoft SQL Server does not move data from one page to another but releases data from the data pages which hold meta data information of the relation.
To dive deeper into the internal structure of databases / data pages / records is recommended to read the great blog article "Inside the Storage Engine: GAM, SGAM, PFS and other allocation maps" from Paul S. Randal concerning the internals of the storage engine. An alternative to the www is the really great "bible for SQL Server Professionals" from Kalen Delaney "Microsoft SQL Server 2012 Internals". Have a look into the chapter "Databases Under the Hood" to get very detailed information about the storage engine. From the transaction log it is obvious that only data pages which hold meta data will be part of the changes. The data itself won't be moved / touched in any way. Because a heap doesn't have a B-Tree structure, only these structure information will be removed when dropping a clustered index. As an example for the operation the row 11 of the resultset of the transaction log will be object of deeper investigation. The [Lock Information] holds an exclusive lock in database 19 (which is my demo database) in file 1 and page 178 and slot 5. The page 178 is the IAM-page of the relation [dbo].[tbl_demo] which holds information about allocation of space by clustered or non clustered indexes. A close view into the data page demonstrates that extents have been deallocated.
DBCC TRACEON (3604);
DBCC PAGE (19, 1, 178, 3);
The following ranges are deallocated - they will be compared to the pages of the clustered index which "built" the B-tree structure
All named pages in the above table are pages from the root-level and the B-tree-level of the former clustered index. So really NO data itself have been moved / manipulated but only meta data of the structure of the relation itself. The "middle construct" between IAM and data has been removed completely - which describes the structure of a heap perfectly.
What impacts a non clustered index when any meta data manipulation will be used against the relation itself (e.g. creation / dropping a clustered index). The dependency between a non clustered index and the relation itself (whether a heap or a clustered index) is the link inside the non clustered index to the data pages of the heap / Clustered Index. Each non clustered index need to have a reference to the data of the relation. This "link" can be three different types of pointers:
Concerning the structure of the demo relation it was a unique clustered key on the primary attribut [Id]. Due to the fact that the clustered key has been determined as unique no UNIQUIFIER need to be stored "on top" to the clustered key. This key is always part in any non clustered key. The following pic demonstrates the structure of the non clustered index before the DROP operation has taken place: For every row in the non clustered index the pointer to the clustered key is stored, too. By this method Microsoft SQL Server is able to locate data from other attributes which are not part of the non clustered index (e.g. [col2]). The next picture shows the same non clustered index after the clustered index has been dropped. As a side effect you can see that a RID consumes another 4 bytes more than the previous clustered key which was an INT (4 bytes).
Dropping a clustered index will not force Microsoft SQL Server to rebuild the complete relation. Dropping a clustered index will only change the meta data and turn the B-Tree structure of an index into a flat hierarchy of an IAM page and multiple data pages (which is a heap). Adding a clustered index or dropping a clustered index will have deep impacts to ALL non clustered index of a relation because the pointers to the clustered index / heap will change and forces a rebuild of each non clustered index.
Maheshkumar S Tiwari edited Revision 15. Comment: corrected typo error and tags
TNJMAN edited Revision 14. Comment: syntactic and idiomatic corrections
Ed Price - MSFT edited Revision 10. Comment: Title casing; "heap" should be lowercase
Uwe Ricken edited Revision 8. Comment: added the pictures - yeah :)
Naomi N edited Revision 7. Comment: First image
Naomi N edited Revision 1. Comment: Minor edit
The images are not shown in the article
Dear Naomi,
I've informed Ed because I'm not able to insert pics into the article :(
I've tested with IE and Firefox - no success ...
Unfortunately Ed didn't response to my email.
If anybody can insert the pics to that article - ...
PLEASE drop me a mail to uwe.ricken@db-berater.de
I will send a ZIP. The naming of the pics are 1:1 to the tags in the article.
Thank you so much for your support.
Best, Uwe
Did you upload the image from your computer to the Wiki, as described in this article?
social.technet.microsoft.com/.../264.wiki-how-to-add-a-video-or-image-to-your-article.aspx
I find when I insert images into a Wiki article, I must be patient. It can take a few seconds for the image to be uploaded, then even longer before it is inserted in the article. I wait until I see the image before I save.
Naomi N edited Revision 7. Com