RFO: XN3 Outage

This morning the RAID controller in XN3 failed, causing the machine to reboot and then fail to start because no boot device could be found. This is quite unfortunate as this server is only a few months old.

A datacentre engineer was assigned to check the status of the server, and was then instructed to remove the server from the rack and re-seat the card. This restored service, but the server crashed again shortly afterwards. A quick check of our spares inventory showed that there was no suitable replacement onsite, though we had a spare at the office.

A senior member of staff (myself) then set off as quickly as possible and arrived at the datacentre just after 1PM with a replacement card. On arrival at the rack I had great difficulty removing the server. With the help of a datacentre engineer, we managed to get it out by very carefully detaching the rails from the chassis while it was still in the rack.

It was apparent that the runners on the right-hand rail were slightly bent and catching on one of the adjustment screws, making it impossible to slide the server out with the rails fitted, as it is designed to do.

Once the server was removed, I replaced the card, adjusted the rails and powered the machine up. A quick check by our support team confirmed the problem had been resolved. Just after 2PM we saw VPSs starting to come back up.

We are very sorry for the amount of time it took to resolve this issue and will be taking measures to avoid such incidents in future. An audit of our spares inventory has already been done and we will be adding a few additional items. We will also look at getting a larger spare diskless chassis onsite to enable a faster resolution to such issues.

Chris