Categories
Outages

RFO: XN3 Outage

This morning the RAID controller in XN3 failed, causing the machine to reboot and then fail to start because no boot device was found. This is quite unfortunate as this server is only a few months old.

A datacentre engineer was assigned to check the status of the server and was then instructed to remove the server from the rack and re-seat the card. This restored service, though the server crashed again shortly afterwards. A quick check of our spares inventory showed that there was no suitable replacement onsite, though we had a spare at the office.

A senior member of staff (myself) then set off as quickly as possible and arrived at the datacentre just after 1PM with a replacement card. On arrival at the rack I had great difficulty removing the server, and with the help of a datacentre engineer we managed to remove it by very carefully detaching the rails from the chassis while it was still in the rack.

It was apparent that the runners on the right-hand rail were slightly bent and catching on one of the adjustment screws, making it impossible to remove the server with the rails fitted as designed.

Once the server was removed I replaced the card, adjusted the rails and powered the machine up. A quick check from our support team confirmed the problem had now been resolved. Just after 2PM we saw VPS starting to come back up.

We are very sorry for the amount of time it took to resolve this issue and will be taking measures to avoid such incidents in future. An audit of our spares inventory has already been done and we will be adding a few additional items. We will also look at getting a larger spare diskless chassis onsite to enable a faster resolution to such issues.

Chris

[Resolved] xn1 down

xn1 is currently down (affecting some VPS in the 95.154.246.xx range) and we are waiting for the DC to attach a KVM. Apologies for the delay.

Further updates will follow.

Update: Sorry about this folks, it has taken over 2 hours for the datacentre to attach a KVM to the machine. We will be raising a complaint once this has been resolved.

Update 2: We have heard back from the datacentre: the server has no power. We are currently getting the PSU swapped out for a spare and service should be restored shortly. Thanks

Update 3: We are still waiting for the DC to swap out the PSU.

Update 4: We are still waiting for the DC to swap out the PSU. Apparently there is no part onsite, when there definitely is; they are re-checking and if they cannot find it I will drive up there now. The support manager has also been notified about the response times.

Update 5: PSU has been swapped, VMs are booting

[Resolved] DDoS

We currently have a 2GBps+ inbound DDoS affecting systems on 95.154.xx and 109.169.51.xx and are working to mitigate it.

Update: For now the attack has stopped and we will continue to monitor. Apologies for any inconvenience.

Update 7.08PM – It’s back and we are working on it.

Update 7.17PM – This has now been resolved and we expect no further issues. The target IP/website will not be reinstated.

[Resolved] xn22 + xn15 packet loss

There currently appears to be intermittent packet loss on xn22 and xn15. These servers are in a shared rack and the DC is working on it.

Update: Confirmed as a DDoS against another server in the rack. The target server has been nullrouted and connectivity restored.

[Resolved] xn4 network issue

We have an inbound 1Gbps+ DDoS attack against a VPS on this machine. We are working on isolating the target to restore service.

Update: The target IP has been nullrouted and service restored. If you are seeing any further connectivity issues please open a ticket.

[Resolved] Network issues

There currently appears to be no or only partial connectivity to all of our equipment at the datacentre, depending on your ISP and location. From a quick traceroute on my mobile, it looks like it could be an issue with one of our main carriers, Level 3, but more details will follow shortly.

Thanks

Update: The issue appears to have resolved itself as soon as I finished writing this post! If you have any further issues please let us know.

Message received from our datacentre:

We are currently working with one of our upstream providers, who have notified us about a network issue as a result of them making a configuration change.  We had not been notified of the change.

Our core networking teams are doing what they can to mitigate the issue and looking into re-routing traffic.  This will be performed in the next few minutes, when connectivity should improve.

We will keep you updated on this matter and sincerely apologise for any inconvenience this is causing.

If you continue to experience issues please let our support teams know and we will investigate as a matter of urgency.

[Resolved] xn17 Packet loss

xn17 currently has high packet loss because another server in the rack is under a DDoS attack. This is a shared colo rack so the datacentre is working on it.

Update: Connectivity is restored

Update 2: Just got back from the datacentre. It looks like the motherboard failed, or at least ACPI functionality had completely stopped working. We’ve moved the drives into a different chassis (inc. motherboard, PSU, RAID card etc.) and service is now restored.

Thanks

[Resolved] xn3 Down

The power supply in XN3 failed this morning and has been replaced, but the server still has startup issues.

We are waiting on an update from the datacentre but will probably go to site shortly and swap the drives into another chassis.

Further updates will follow.

[Resolved] cp1 crashed

cp1 has crashed and is currently running an automated fsck of / to correct any filesystem errors. It's currently at 13%, so the ETA to service restoration is about 20 minutes.

Apologies for the inconvenience.
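
For anyone curious how an ETA like the one above can be derived from a progress percentage, here is a minimal sketch. It assumes the check progresses at a roughly constant rate, which a real filesystem check does not guarantee, so treat the result as a rough guide only. The function name and the elapsed time of 3 minutes are illustrative assumptions and are not taken from the incident.

```python
def estimate_remaining_minutes(percent_done: float, elapsed_minutes: float) -> float:
    """Linearly extrapolate the time left from progress so far.

    Assumes a constant rate of progress (an assumption, not a property
    of any particular filesystem check).
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must not be negative")
    # If percent_done took elapsed_minutes, the rest takes proportionally longer.
    return elapsed_minutes * (100 - percent_done) / percent_done


if __name__ == "__main__":
    # Hypothetical input: 13% complete after an assumed 3 minutes of checking.
    remaining = estimate_remaining_minutes(13, 3)
    print(f"Roughly {remaining:.0f} minutes remaining")
```

Early passes of a check often run at a different speed from later ones, so an estimate taken at a low percentage can be well off.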

[Resolved] xn18 issue

The issues on xn18 are back. As the load on the machine is too high to get in via SSH, the DC is nullrouting IPs individually until we identify the cause of these issues.

Updates will follow shortly.
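
As a rough illustration of the one-at-a-time approach described above, the sketch below walks a list of IPs, nullroutes each in turn, and stops when the machine recovers. It is not the datacentre's actual procedure: `nullroute`, `restore` and `load_is_normal` are hypothetical callables standing in for whatever tooling is used, and restoring an IP that turns out not to be the cause is an assumption of this sketch.

```python
from typing import Callable, Iterable, Optional


def find_target_ip(
    ips: Iterable[str],
    nullroute: Callable[[str], None],
    restore: Callable[[str], None],
    load_is_normal: Callable[[], bool],
) -> Optional[str]:
    """Nullroute IPs one by one until the host recovers.

    Returns the IP whose nullrouting brought the load back to normal
    (it is left nullrouted), or None if no single IP was responsible.
    """
    for ip in ips:
        nullroute(ip)
        if load_is_normal():
            return ip  # likely cause found; keep it nullrouted
        restore(ip)  # not the cause, so bring it back (assumed behaviour)
    return None
```

A linear pass like this is simple but slow on a machine with many IPs; it also will not find a cause that is spread across several IPs at once.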