Categories
Outages

[Resolved] xn18 issue

A VPS on XN18 is currently being attacked and we are working on mitigating the traffic. If your VPS is down please do not try and reboot it.

Update: We believe this is now resolved but will continue to monitor.

[Resolved] xn3 Outage

xn3 is currently having issues with its RAID array and is failing to boot. It’s looking like multiple drive failures in one half of the RAID10 set. This machine also hosts the VPS for our website and helpdesk, so those are currently offline.

We are working on this and updates will be posted here shortly.

Update: We have recovered the array and it’s now rebuilding onto some fresh drives. Disk performance will be poor for a few hours while it rebuilds and restores full redundancy to the array. If your VPS is currently down, it’s booting up and may be running an automated filesystem check.

Thanks

Denial of service attack

16:59: We are currently under a 2Gbps+ denial of service attack affecting most of our equipment. This is being worked on.

Update 17.07: Connectivity has been restored

Update 19:07: Attack has returned and is being investigated

Update 19.25: We are currently waiting on updates from the DC. Unfortunately, as the attack is larger than the combined capacity of both up-links in the rack (2x 1Gbps), we are unable to nullroute the traffic on our own equipment. We appreciate your patience during this time and will restore service as soon as possible.

Update 20.05: Stability should be more or less restored at this point for all systems except XN9 which we’ve isolated as the target host.

Update 20:15: All systems up and running, we will continue to monitor.

[Resolved] Network Issue 19/07/12

As of 20:15 we are seeing high latency and packet loss to all of our racks, colo machines and the datacentre’s support system. We are waiting to hear further information and will update here shortly.

Thanks

Update: We’ve now confirmed this issue is affecting the whole DC, including other sites on the iomart network.

Update 20.36: We’re still waiting on news from the datacentre.

Update 20.45: It looks like the issue is with iomart’s router at LINX (London Internet Exchange) which most traffic in the UK goes through. LINX’s public traffic stats appear normal.

Update 21.24: The datacentre is investigating this but we are still waiting for updates.

Update 21.55: Connectivity appears to be restored, waiting to receive the all-clear from the DC.

Update 23.03: All clear issued. This outage was the result of a very large DDoS attack against someone in the datacentre, saturating multiple upstream providers. Traffic has been re-routed and the network staff at the DC will continue to monitor.

[Resolved] UK Connectivity Outage

We are aware of an issue affecting our UK servers and are looking into it. Any new information will be posted here once we have it.

Update @ 10:15am: This is looking like a routing issue at the datacentre and senior network engineers are looking into it now.

Update @ 12.12: A senior network engineer at the datacentre is now looking into this. Please note that the following subnets are affected:
95.154.207.xx
95.154.244.xx
95.154.246.xx
109.169.51.xx

If you have a service in one of our newer IP ranges i.e. 95.154.203.x or 95.154.208.x you should be unaffected.
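
For anyone wanting to check an address against the list above, here is a minimal sketch. It assumes each "xx"/"x" range in this notice is a full /24, which the notice does not actually state, and the function name is just illustrative.

```python
import ipaddress

# Assumption: every "xx" range listed in the notice is a whole /24 subnet.
AFFECTED_SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "95.154.207.0/24",
        "95.154.244.0/24",
        "95.154.246.0/24",
        "109.169.51.0/24",
    )
]


def is_affected(address: str) -> bool:
    """Return True if the address falls inside one of the affected subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in subnet for subnet in AFFECTED_SUBNETS)


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    for addr in ("95.154.207.10", "95.154.203.5"):
        print(addr, "affected" if is_affected(addr) else "unaffected")
```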

We sincerely apologise for this disruption to service and will provide a full RFO as soon as possible.

Update @ 12.49: Connectivity has been fully restored. RFO to follow after further investigation.

Thanks

[Resolved] xn3 crashed

xn3.pcsmarthosting.co.uk has crashed with a kernel panic and has been rebooted.

Update: It’s almost booted up, no signs of any RAID or filesystem issues, will give it a thorough once-over once we have brought all the VPS back online

Apologies for the inconvenience.

[Resolved] xn3 Down

xn3.pcsmarthosting.co.uk is currently down and this is the machine our helpdesk resides on as well.

We are working on this and will update you shortly.

Update: The system has restarted; however, it’s failing to start up as a few init scripts appear to be damaged. We are working on this.
Update: Sorry for the delay folks, the datacentre is being quite slow to respond at the moment.

Update: The system has been fully restored, all VPS are up. Thanks

[Resolved] xn2 issues

The RAID array on xn2 has become inoperable despite being optimal with all discs present only a few days ago.

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   INOPERABLE     -       -       64K     596.025   OFF    ON

We are hoping this is a malfunction of the RAID controller and not an actual array issue. We are currently waiting on the datacentre for updates.

Update: 8.12PM – Just called the DC to chase up the reboot request and it’s being done now, apparently they are very busy tonight.
Update: 8.26PM – Machine rebooted but not responsive to ping and no output on the KVM. Is being checked, also preparing to go onsite!
Update 8.35PM – System is now booting up and the RAID array appears intact! Will do some sanity checks etc once it’s up.

Update 8.39PM – It appears that there were multiple drive failures. This machine has a 4x disk RAID10 set.
Drive p0 – Hot-swapped a few days ago and is OK
Drive p1 – Failed, completely dead/undetected
Drive p2 – OK
Drive p3 – Rebuilding

We will let drive p3 finish rebuilding itself. Once that is complete we will replace drive p1, and when that finishes we will also replace drive p3 as a precaution. Please can we ask that you avoid any disk intensive tasks for the next 48 hours, so we can restore full redundancy and performance to the array in a timely fashion.
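
To illustrate why the order of replacements matters, here is a rough sketch of how a 4-disk RAID10 set tolerates failures. The mirror pairings below are assumptions (we have not stated which drives the controller pairs together), and treating a rebuilding drive as not yet holding a complete copy is also a simplification.

```python
def raid10_status(mirror_pairs, states):
    """Classify a RAID10 set from per-drive states.

    states maps drive name -> "ok", "rebuilding" or "failed".
    A mirror pair still serves data as long as one member is "ok".
    """
    degraded = False
    for pair in mirror_pairs:
        healthy = [drive for drive in pair if states[drive] == "ok"]
        if not healthy:
            # Both copies of this stripe half are unavailable.
            return "inoperable"
        if len(healthy) < len(pair):
            degraded = True
    return "degraded" if degraded else "optimal"


# Drive states as reported in the 8.39PM update.
STATES = {"p0": "ok", "p1": "failed", "p2": "ok", "p3": "rebuilding"}

# Hypothetical pairings: the real layout on this controller is not known here.
PAIRING_A = [("p0", "p1"), ("p2", "p3")]
PAIRING_B = [("p0", "p2"), ("p1", "p3")]

if __name__ == "__main__":
    print(raid10_status(PAIRING_A, STATES))  # degraded
    print(raid10_status(PAIRING_B, STATES))  # inoperable
```

Under either pairing, a second failure in an already degraded pair takes the array down, which is why we let each rebuild finish before swapping the next drive.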

** At this point ALL VPS should be online. If yours is having issues, log a ticket **

Update 10.11PM – Drive p3 is 90% rebuilt. Will replace the failed p1 disk shortly.
Update 11.13PM – Drive P3 has been fully rebuilt. Drive P1 has been hot-swapped out and is rebuilding.

Update 8.59PM 26/04/12: Full redundancy + performance has now been restored to the RAID array on xn2. We don’t expect any further issues but as always we will monitor this server carefully for the time being.

Thanks

[Resolved] Scheduled reboots 06/04/12

We are doing a scheduled reboot of the following servers this evening to address a stability issue between the latest Adaptec firmware and the aacraid driver in the RHEL5 kernel:

xn19.pcsmarthosting.co.uk
xn20.pcsmarthosting.co.uk

This has been completed successfully. RAID arrays on these machines are doing a verify with fix to ensure consistency.

[Resolved] vz2 unresponsive

vz2 has become unresponsive. We are waiting for the KVM to be moved by onsite staff.

Update: We have identified the VPS causing the high load; its system has spawned hundreds of processes. We are trying to get things under control with some liberal use of pkill -9, but if that fails we will reboot.

Update 2: This has been resolved without a reboot. The responsible VPS has been suspended.
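
For the curious, spotting a runaway process owner like the one on vz2 can be sketched roughly as below. This is not the exact procedure we used: it assumes process listings in the two-column form printed by `ps -eo pid,user`, and the sample data and owner names are made up.

```python
from collections import Counter


def top_process_owner(ps_output: str):
    """Return (owner, process_count) for the owner with the most processes.

    Expects a header line followed by "PID USER" rows; returns None if
    there are no rows.
    """
    counts = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts.most_common(1)[0] if counts else None


# Made-up sample; real input could come from running `ps -eo pid,user`.
SAMPLE = """\
  PID USER
    1 root
  200 vps101
  201 vps101
  202 vps101
  300 vps102
"""

if __name__ == "__main__":
    print(top_process_owner(SAMPLE))  # ('vps101', 3)
```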