{"id":384,"date":"2020-02-29T16:27:07","date_gmt":"2020-02-29T08:27:07","guid":{"rendered":"https:\/\/kylemcdonald.com.au\/?p=384"},"modified":"2020-03-01T17:45:02","modified_gmt":"2020-03-01T09:45:02","slug":"replacing-a-failed-vsan-cache-drive","status":"publish","type":"post","link":"https:\/\/kylemcdonald.com.au\/2020\/02\/29\/replacing-a-failed-vsan-cache-drive\/","title":{"rendered":"Replacing a failed vSAN cache drive"},"content":{"rendered":"
My home lab has been running with primary storage provided via an all-flash implementation of VMware vSAN for almost 4 years now.<\/p>\n
The underlying drives are consumer-grade Samsung 850 EVO SSDs: 120GB for cache and 500GB for capacity. Six months ago, vSAN started showing health warnings for the cache drive on one of the ESXi hosts, and a few months later the vSAN disk group for that host was finally marked offline.<\/p>\n
<\/a><\/p>\n This wasn’t a totally unexpected event, although I was surprised at how long the SSDs had lasted. I ordered cheap Gigabyte 120GB replacement SSDs for all hosts, assuming the other cache drives would likely die soon as well. I also thought it worthwhile to note down the process I used to replace the failed cache SSD and get the vSAN disk group back up and running.<\/p>\n First, I went to Cluster -> Monitor -> vSAN -> Health and expanded the Physical disk section to confirm that it was the 120GB cache drive on the host that needed to be replaced.<\/p>\n <\/a><\/p>\n Then I went to Cluster -> Configure -> vSAN -> Disk Management and expanded the disk group for host ESX1, which showed the cache drive in a Permanent disk failure state.<\/p>\n <\/a><\/p>\n I then clicked the icon to remove the disk group and was given options around vSAN data migration. Although the drop-down let me select “Full data migration”, that option would have failed because the disk group was unavailable. Instead, I selected “No data migration” and was warned that objects would become non-compliant with my vSAN storage policy.<\/p>\n <\/a><\/p>\n After selecting “Delete”, I could see the progress in the Recent Tasks bar.<\/p>\n