xn9 appears to be suffering some filesystem issues. Currently getting some remote eyes.
Update: For some reason disk 3 decided it would belong to a new array, which caused the filesystem to go read-only. This has been corrected and the RAID is rebuilding. We will continue to monitor.
It appears XN7 has crashed, although it is responding to ping and as such was not picked up by our monitoring.
We will update you ASAP.
Update: We have rebooted the machine and VPS are now starting up one by one. We are still investigating the cause of the crash.
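A crashed machine that still answers ping is a known blind spot for ICMP-only monitoring. As a rough illustration only (this is not our actual monitoring setup), a check could also attempt a TCP connection to a service port and treat "ping OK, service dead" as its own state. The function names, the default timeout, and the state labels below are all hypothetical.

```python
import socket


def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def classify(ping_ok: bool, service_ok: bool) -> str:
    """Combine an ICMP result with a service-level result.

    The labels are hypothetical: "hung" is the case where the host
    answers ping but no longer accepts connections.
    """
    if service_ok:
        return "up"
    if ping_ok:
        return "hung"
    return "down"
```

With a check like this, a host that replies to ping but refuses or times out on its SSH or HTTP port would be reported as "hung" rather than silently counted as healthy.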
There are currently issues with our Shared and Reseller Server CP1 which is causing sites to fail to load.
This is being investigated and all updates will be posted here.
Update @ 15:58: This is now resolved
Update @ 15:46: This looks like a GRUB issue; it is being corrected now.
xn11 has crashed and is being checked
Update: It looks OK on the console but is unreachable via the network. The server is being rebooted.
Update 2: It turns out there was a bit of fat finger syndrome and the port was taken out of the VLAN. The server is now up and VPS are starting up as well
We are currently seeing packet loss affecting all UK servers
Update: This looks like a potential issue with LINX (London Internet Exchange), as many UK sites are slowing to a crawl. We will update you as we have further information.
Update 2: We have confirmed the issues are due to LINX.
Update 3: We are now routing past LINX, so most ISPs should find that access to our services is back to normal speeds. Unfortunately, some ISPs will still go straight to LINX.
Update 4: We believe this issue is now resolved, as can be seen from the LINX graph below.
As per the email sent out today, we will be taking cp1 down for approximately 15 minutes at 9PM in order to restore full redundancy to the RAID array after the events on Saturday.
Update: This is currently in progress
Update 2: Actual downtime was 5 minutes. The machine is up and services are starting.
Update 3: The RAID rebuild is now starting. It’s going to be 10-15 minutes before the load stabilizes. Unfortunately, there will be substantially increased I/O wait until the rebuild completes.
Update 4: Load is now coming down and the rebuild is chugging along nicely. We are marking this resolved and will continue to monitor until full redundancy and performance have been restored to the array.
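For anyone curious how a rebuild like this can be watched, here is a minimal sketch that assumes Linux software RAID (md), where progress is reported in /proc/mdstat. That is an assumption for illustration; this post does not say which RAID implementation cp1 uses. The function name and the sample text are hypothetical.

```python
import re
from typing import Dict

# Matches lines such as:
#   [=>....]  recovery =  8.5% (83000000/976630336) finish=120.3min speed=...
_PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min")
_ARRAY = re.compile(r"^(md\d+)\s*:")


def rebuild_progress(mdstat_text: str) -> Dict[str, dict]:
    """Return rebuild progress per md array found in /proc/mdstat-style text."""
    result: Dict[str, dict] = {}
    current = None
    for line in mdstat_text.splitlines():
        array = _ARRAY.match(line)
        if array:
            # Remember which array the following progress line belongs to.
            current = array.group(1)
            continue
        progress = _PROGRESS.search(line)
        if progress and current:
            result[current] = {
                "kind": progress.group(1),
                "percent": float(progress.group(2)),
                "finish_min": float(progress.group(3)),
            }
    return result


# Hypothetical sample in the /proc/mdstat layout, not output from cp1.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/1] [_U]
      [=>...................]  recovery =  8.5% (83000000/976630336) finish=120.3min speed=123456K/sec

unused devices: <none>
"""

print(rebuild_progress(SAMPLE))
```

On a real md host the text would come from reading /proc/mdstat. An array that is not rebuilding simply does not appear in the result.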
cp1 has crashed due to what appears to be load issues. We are currently waiting on some remote eyes.
Update: It looks like a possible primary hard disk failure. We are giving it 5 minutes on the console to see if it boots; if it fails, we will need to run an fsck over the RAID array and take it from there.
Update 2: I can confirm that /dev/sda (the primary hard disk) has failed. We are inspecting the data on the second disk. Stand by for updates.
Update 3: /dev/sdb is OK. We have repaired the filesystem and re-installed GRUB. The machine is starting up now. Please note that CP1 is now running with one fewer disk in the RAID set. Expect increased I/O wait and higher than normal loads. We will be replacing the disk momentarily.
Update 4: Some IPs failed to come online properly. This has been fixed and everything is looking OK. It’s going to be 10 minutes or so before the machine stabilizes with normal levels of load.
Update 5: We have made a secondary backup of the machine onto our NAS as a precaution. We will be restoring full redundancy to the RAID array with a new disk in due course.
Some of our xen nodes are currently the target of a denial of service attack. While no servers are down, you may notice increased latency and a reduction in network speed.
Update: This is now resolved.
UPDATE: This maintenance is now underway. All machines have been cleanly powered down. We are currently double-checking that all systems are powered down and will then swap the PDU.
UPDATE2: Due to a delay with onsite staff this is taking longer than expected. Apologies for the inconvenience.
UPDATE3: PDU has been replaced and we have connectivity on Edge #1. Servers are now being power cycled.
UPDATE5: All systems are now online and have passed sanity checks. If your VPS is offline, please reboot it via SolusVM or open a ticket.
This is a reminder for the maintenance below, which will begin in approx 1hr 45mins.
Planned Maintenance Notification for 11:00AM GMT Monday 1st March.
This email is to inform you of some planned maintenance which will be service affecting.
The main power bar in our rack, which is supplied by our data centre, has failed. The failure does not affect the power going through it and ultimately to our servers; what has failed is the LCD display showing the current power usage, which is very important for ensuring that the rack isn’t overloaded.
11:00AM GMT on Monday 1st March 2010
The downtime expected is approx 10 – 15 minutes. We will be powering all machines down cleanly then replacing the PDU and powering each machine back on.
Note: VPS may take longer due to the way that VPS start up.
We apologize for any inconvenience this may cause.
xn9 is currently having RAID issues. Initial inspection shows that half of the RAID-10 set, including the part of the array that holds the OS, is inoperable.
We are taking the machine down for further inspection and assessment on restoring service along with data recovery.
Thank you for your patience.
Update @ 18:19 GMT: All disks have been tested and have passed all tests. The RAID is now rebuilding again, as the rebuild was cancelled to perform the disk tests. If any VPS are still down, please raise a ticket and we will look into it for you.
Update @ 17:43 GMT: The RAID array is currently rebuilding; however, VPS should now be starting up again.
Total Downtime: ~2 Minutes