Categories
Outages

[Complete] XN7 Issues Continued

We are sorry to inform you that the issues on XN7 are continuing, despite the installation of a set of four new disks.

At this point we have decided it’s in our best interest to take the machine out of service for the time being; it will only be put back into production after extensive testing. We are going to start migrating all VPS to a new Supermicro/Xeon Lynnfield host machine. Unfortunately this will require IP changes, but it’s the best option available to us at this point.

You will be contacted by email with your new IP address(es) once your VPS has been moved.

All but two VPS on XN7 have now been moved and issued new IP addresses. We are working on the remaining two, and their owners will be notified as soon as possible.

[Complete] XN7 Issues

XN7 is currently having issues affecting some VPS customers in the 95.154.207.xx range. This is very rare, but both drives in one half of the RAID10 array are showing signs of failure, and this is causing intermittent filesystem issues.

We are working to resolve this as quickly as possible, while maintaining integrity of all customer data.

Update: Our datacentre is being very slow to act on our requests at the moment; apologies for the delay.
Update 2: We are still waiting on our datacentre…
Update 3: It looks like the motherboard may have failed in the machine. We do have a spare onsite but are confirming this now.
Update 4: The system is back online and VPS are booting up. The RAID array is rebuilding and we will continue to monitor it. Once the rebuild completes, we can swap out the failing drives immediately.

Update 5: The rebuild has failed. The server has been restored for now with the rebuild stopped. We will shortly back up all data, install new disks and reload this server.

Update 6: Senior staff will begin maintenance on this server at the datacentre starting at approximately 10.00PM (GMT+1).

Update 7: 1.53AM. 12 of the 18 VPS on this node have now been backed up. At this rate we expect the backups to be complete at around 4AM, roughly 2 hours from now.
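The ETA in Update 7 follows from a simple linear extrapolation. Below is a minimal sketch, assuming the backups started at about 10.00PM (the start time given in Update 6) and that every VPS takes roughly the same time to back up; the calendar dates in the code are arbitrary placeholders.

```python
from datetime import datetime, timedelta

# Assumption: backups began around 10.00PM (Update 6) and run at a steady per-VPS rate.
start = datetime(2000, 1, 1, 22, 0)   # placeholder date; only the clock time matters
now = datetime(2000, 1, 2, 1, 53)     # time of Update 7 (1.53AM the next morning)
done, total = 12, 18                  # VPS backed up so far / VPS on the node

elapsed = now - start                 # 3h53m spent on 12 VPS
per_vps = elapsed / done              # average time per VPS (timedelta / int -> timedelta)
eta = now + per_vps * (total - done)  # remaining 6 VPS at the same average rate

print(eta.strftime("%H:%M"))          # prints 03:49
```

This lands at about 3.49AM, in line with the "around 4AM" estimate; in practice VPS sizes differ, so the real finish time will vary.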

Update 8: 3.21AM. The last VPS is backing up now; we will then reload the machine.

Update 9: We are restoring data.

Update 10: We are about to boot all VMs.

XN7 has now been restored and all VPS are booting up. We will continue to monitor closely.

[Resolved] xn14 outage

xn14 is currently down and is being checked.

Update: A disk had failed in the RAID array and the controller failed to migrate I/O away from that disk, causing the system to hang. We have now replaced the disk and the array is rebuilding. VPS are booting up.

Please note that approximately 5 VPS are running a mandatory FSCK of their filesystems because a set number of days have passed since their last reboot. Until these checks complete there will be a high amount of I/O wait on this node. Please be patient and do not reboot your VPS; doing so will only add to the load.

[Resolved] CP1 MySQL Issue

We are currently facing an issue with CP1: the MySQL service on this machine is under heavy load, and we are looking into it now.

Update: This has now been resolved. Apologies for the short MySQL downtime.

[Resolved] Network Issues

We are currently seeing malicious inbound network traffic affecting connectivity to the following servers:

xn4, xn5, xn6, xn8

It seems only 109.169.51.0/24 is being targeted; systems on other subnets are unaffected.

Update: After a process of elimination, this looks like a possible malfunction of the NIC on xn4, which is causing a traffic storm on the subnet. XN5, XN6 and XN8 have been restored.

Update 2: We are going to move to the 2nd NIC on XN4 with a new cable + switchport.

Update 3: The traffic has returned to xn5/6/8 and we are working to mitigate it.

Update 4: XN5, XN6 and XN8 have been restored. XN4 is still being worked on.

Update 5: All systems are now up and we will be following this up with our datacentre in the morning.

[Resolved] Large Inbound DDoS

We are currently facing a 5GB/s inbound DDoS attack. We are working to mitigate it and updates will be provided here.

Update: We are still working on mitigating this attack; apologies for any inconvenience. Further updates will be provided as soon as possible.

Update 2: Connectivity has now been restored to all but one server.

Update 3: Connectivity has now been restored to all servers. The remaining malicious traffic has stopped, and traffic through our network has returned to normal levels. We are very sorry for how long it took to mitigate this; most of that time was spent isolating multiple targets. Some customers will have experienced more downtime or packet loss than others.

Following this outage we will continue to monitor the network vigilantly for the next 48 hours, and we will also review our options for adding further filtering to our network to improve our resilience to malicious traffic.

[Resolved] xn7 down

xn7 appears to be down, although it is responding to SSH. We are currently waiting for a KVM to be attached.

Update: The datacentre appears to be unresponsive at the moment. Apologies for the delay.

Update 2: This was caused by the failure of the first disk in the RAID array. The server has now been rebooted and VPS are starting up. We will hot-swap the disk out shortly.

Update 3: We have hot-swapped out the bad disk and the array has now almost finished rebuilding.

[Resolved] Packet Loss

We have been alerted by our monitoring systems to a high amount of packet loss in our UK data centre. We are in contact with the data centre and will update here once we have more information.

Update: This is a LINX (London Internet Exchange) issue; it has just dropped over 500GB/s of traffic.

Update 2: This has now been resolved and all traffic is flowing normally.

[Resolved] Temporary Network Issue

During a maintenance window, one of the DWDM units lost power. This caused packet loss for approximately 2-3 minutes; however, everything is now online and normal service has resumed.

[Resolved] xn9 Issue

xn9 is currently having issues. Although the system and VPS are responding to ping, this looks like an I/O or RAID-related problem. We are waiting on the DC to attach a KVM now.

Update 1: We have confirmed this is a RAID failure and we are currently working to restore the array.

Update 2: Waiting on DC again at the moment, sorry for the delay.

Update 3: Unfortunately it appears the RAID array has collapsed and is unusable, rather than just degraded by a bad disk. We are working through our options to restore the array.

Update 4: We have now re-assembled both sides of the RAID set and the system is now booted off a degraded RAID-10 volume, VPS are currently starting up. We will now proceed to inspect the integrity of the array and hot-swap out any suspect disks.

Update 5: Disks p1 and p2 make up the first half of the RAID-10 array (together they form a RAID-1 set). Disk p2 is bad, and a read error on it caused the system to hang. The system is currently running off the bad disk, p2, while p1 rebuilds from it. Once this has completed we will immediately take p2 offline and replace it. Until then the array is fragile, but we have no reason to believe the current rebuild will not complete successfully. Many thanks for your patience.
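To illustrate why the array is fragile at this stage: RAID-10 stripes data across mirrored pairs, so the array stays readable only while every pair still has at least one disk holding valid data. The sketch below is a hypothetical toy model, not a description of the actual controller; it assumes a four-disk array, and the names p3 and p4 for the second pair are invented for illustration.

```python
from __future__ import annotations

# Hypothetical 4-disk RAID-10: data is striped across two mirrored (RAID-1) pairs.
# p1/p2 follow the naming in Update 5; p3/p4 are assumed names for the second pair.
MIRROR_PAIRS = [("p1", "p2"), ("p3", "p4")]


def array_readable(unavailable: set[str]) -> bool:
    """True if every mirror pair still has at least one disk holding valid data."""
    return all(any(disk not in unavailable for disk in pair) for pair in MIRROR_PAIRS)


# While p1 rebuilds it does not yet hold a full copy, so p2 is the only valid disk in pair 1.
during_rebuild = {"p1"}
print(array_readable(during_rebuild))           # True: degraded but running
print(array_readable(during_rebuild | {"p2"}))  # False: a p2 failure now loses the array
print(array_readable({"p2", "p3"}))             # True: one failure in each pair is survivable
```

The same check explains the earlier XN7 incident, where both drives in one half of the array were failing: once both disks of a single pair are gone, the array cannot be read no matter how healthy the other pair is.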

Update 6: Full redundancy and performance have now been restored to the array.