Physical disks can experience errors of varying severity, ranging from errors that the disk recovers from transparently, without interruption or data loss, to catastrophic errors that can cause data loss. Storage Spaces responds to these errors according to their severity, in a way that maximizes data safety.
Option: Behavior
Enabled: If a disk is missing but its enclosure is still present, treat the missing disk as failed and retire it.
Disabled: If a disk is missing, wait either for the disk to reconnect or for administrative action. The GUI shows the status "Lost communication".
Auto: If the pool has a hot spare, follow the Enabled logic. Otherwise, follow the Disabled logic.
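The three options above amount to a small decision rule, which can be sketched in PowerShell. This is only an illustrative model: Resolve-MissingDiskAction is a hypothetical helper, not a Storage Spaces cmdlet, and the return values 'Retire' and 'Wait' are labels chosen for this sketch. It also assumes the disk's enclosure is still present, as the Enabled row requires.

```powershell
# Hypothetical helper that models the option table; not part of the Storage module.
# Assumes the missing disk's enclosure is still present.
function Resolve-MissingDiskAction {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Enabled', 'Disabled', 'Auto')]
        [string]$Option,

        # True when the pool contains at least one hot spare.
        [bool]$HasHotSpare = $false
    )

    switch ($Option) {
        # Treat the missing disk as failed and retire it.
        'Enabled'  { return 'Retire' }
        # Wait for the disk to reconnect or for administrative action.
        'Disabled' { return 'Wait' }
        # Auto defers to Enabled when a hot spare exists, otherwise to Disabled.
        'Auto'     { if ($HasHotSpare) { return 'Retire' } else { return 'Wait' } }
    }
}
```

For example, Resolve-MissingDiskAction -Option Auto -HasHotSpare $true yields the same result as the Enabled option.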
Retiring a drive means that it is no longer used for new allocations, so new data is written only to the remaining disks in the pool. Existing data on a retired drive is re-allocated to other drives in the pool as part of a repair operation. Until re-allocation starts, however, existing allocations on the drive continue to be updated in order to maintain data redundancy. A retired disk displays that status in both the GUI and Windows PowerShell.
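As a rough sketch of how the retired status might be checked from Windows PowerShell, the filter below passes through only disks whose Usage property reads "Retired". Select-RetiredDisk is a hypothetical helper name introduced here for illustration. It assumes the input objects expose a Usage property the way the objects returned by Get-PhysicalDisk do.

```powershell
# Hypothetical pipeline filter; not part of the Storage module.
# Assumes each input object has a Usage property, as Get-PhysicalDisk output does.
function Select-RetiredDisk {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Disk
    )
    process {
        # Compare as a string so the check works on the displayed Usage value.
        if ("$($Disk.Usage)" -eq 'Retired') { $Disk }
    }
}
```

On a system with the Storage module, this could be used as Get-PhysicalDisk | Select-RetiredDisk | Select-Object FriendlyName, Usage, HealthStatus. A repair of an affected space can then be started with Repair-VirtualDisk.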
Storage Spaces does not persist non-critical error information about physical disks. If a transitory I/O error occurs during a write and the system is restarted, all physical disks are identified as healthy after the reboot. If, however, an error indicates that the entire drive is failing, that information is persisted across a reboot, and the physical disk continues to be identified as unhealthy. If a storage space was marked as degraded before the restart, it continues to be identified in the GUI as degraded until corrective action, such as a repair or automatic re-synchronization, is started.
If one or more drives are configured as hot spares, the following conditions describe when the hot spare drive(s) will come online:
Provider: disk
Event ID: 154
Level: Error
Text: “The IO operation at logical block address <LBA> for Disk <Disk Number> (PDO name: <PDO Name>) failed due to a hardware error.”
Logged when: A fatal hardware error is reported by the device.
Sense Key: Byte 0x2D in the binary data
Additional Sense Code: Byte 0x2E in the binary data
Additional Sense Code Qualifier: Byte 0x2F in the binary data
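The byte offsets listed above can be read out of the event's binary data with a few lines of PowerShell. The following is a minimal sketch. Get-SenseDataFromEventBytes is a hypothetical helper name, and it assumes the offsets 0x2D, 0x2E, and 0x2F are zero-based indexes into a byte array that has already been obtained from the event.

```powershell
# Hypothetical helper; assumes $Data is the event's binary data as a byte array
# and that the documented offsets are zero-based.
function Get-SenseDataFromEventBytes {
    param(
        [Parameter(Mandatory)]
        [byte[]]$Data
    )

    # Offset 0x2F is the last byte read, so at least 0x30 (48) bytes are needed.
    if ($Data.Length -le 0x2F) {
        throw "Binary data is too short: expected at least 48 bytes, got $($Data.Length)."
    }

    # Report each value as a two-digit hexadecimal string.
    [pscustomobject]@{
        SenseKey                     = '0x{0:X2}' -f $Data[0x2D]
        AdditionalSenseCode          = '0x{0:X2}' -f $Data[0x2E]
        AdditionalSenseCodeQualifier = '0x{0:X2}' -f $Data[0x2F]
    }
}
```

The resulting values can be looked up in the SCSI sense key and additional sense code tables to interpret the hardware error.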
If a GUID is recorded, you can use the following PowerShell command to identify the physical disk associated with the I/O error:
Get-PhysicalDisk | Where-Object { $_.ObjectId.Contains($PhysicalDiskGUID) }
To illuminate the LED associated with the physical disk:
Enable-PhysicalDiskIndication -FriendlyName (Get-PhysicalDisk | Where-Object { $_.ObjectId.Contains($PhysicalDiskGUID) }).FriendlyName
If a physical disk number is recorded, as shown in Figure 1, you can use the following PowerShell command to identify the physical disk associated with the I/O error:
Get-PhysicalDisk -FriendlyName <PhysicalDiskNumber>
Figure 1: Event Viewer showing an error with physical disk 3
To illuminate the LED associated with that physical disk:
Enable-PhysicalDiskIndication -FriendlyName <PhysicalDiskNumber>
In the Storage Pools tile of the File and Storage Services role in Server Manager, a health status that requires administrative action is flagged with a yellow triangle containing an exclamation point, as illustrated below:
Figure 2: The Storage Pools tile in Server Manager
Figure 3: The Virtual Disks related tile in Server Manager
You can identify a drive using the "Toggle Drive Light" option:
Figure 4: The Physical Disks related tile showing the Toggle Drive Light command