[Resolved] xn10 down

xn10 has suffered a primary drive failure in the RAID10 array and is currently unbootable. It looks like everything in /boot is damaged.

We are currently waiting on remote hands to attach rescue media and will work to repair this and restore service as quickly as possible.

Update: The operating system is too badly damaged to repair and will require a reinstall. Unfortunately, it also looks like two drives are failing in the same RAID1 set. We will reinstall and may need to wait for the rebuild to complete before putting the server back in service, to ensure complete recovery of customer data. Further updates will follow.

Update 2 (17:45 GMT): The server has been reinstalled and we have been able to recover the RAID array. It is now rebuilding and currently at 60%. Once it finishes, we will reboot the server into Xen and run fsck against each VM individually; each VM will be booted as soon as its check completes. When all VMs have been booted, SolusVM will be reconnected and we will make arrangements to hot-swap the other bad drive.

Update 3 (18:05 GMT): RAID Rebuild at 72%

Update 4 (18:30 GMT): RAID Rebuild at 85%

Update 5 (18:50 GMT): RAID Rebuild at 95%

Update 6 (19:17 GMT): VMs are now being checked and booted one by one.

Update 7 (21:29 GMT): 50% of VMs are now online.