Categories
Outages

Denial of service attack

16:59: We are currently under a 2Gbps+ denial of service attack affecting most of our equipment. This is being worked on.

Update 17.07: Connectivity has been restored

Update 19:07: Attack has returned and is being investigated

Update 19.25: We are currently waiting on updates from the DC. Unfortunately, as the attack exceeds the combined capacity of both uplinks in the rack (2x 1Gbps), we are unable to null-route the traffic on our own equipment. We appreciate your patience during this time and will restore service as soon as possible.
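
For anyone wondering why filtering on our own equipment does not help here: traffic has already crossed the uplinks by the time it reaches our switch, so once the attack matches the uplink capacity the links are full regardless of what we drop afterwards. A minimal sketch of that arithmetic, using only the figures quoted above (the function name is made up for illustration):

```python
# Figures taken from the updates above; 2.0 is a lower bound ("2Gbps+").
UPLINK_GBPS = [1.0, 1.0]   # two 1Gbps uplinks into the rack
ATTACK_GBPS = 2.0

def uplinks_saturated(attack_gbps, uplinks_gbps):
    """Return True when inbound attack traffic meets or exceeds total uplink capacity."""
    return attack_gbps >= sum(uplinks_gbps)

if __name__ == "__main__":
    # With the links saturated, dropping packets inside the rack is too late;
    # the traffic has to be filtered upstream, at the datacentre.
    print(uplinks_saturated(ATTACK_GBPS, UPLINK_GBPS))
```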

Update 20.05: Stability should be more or less restored at this point for all systems except XN9, which we’ve isolated as the target host.

Update 20:15: All systems are up and running. We will continue to monitor.

Categories
Outages

[Resolved] Network Issue 19/07/12

As of 20:15 we are seeing high latency and packet loss to all of our racks, colo machines and the datacentre’s support system. We are waiting to hear further information and will update here shortly.

Thanks

Update: We’ve now confirmed this issue is affecting the whole DC, including other sites on the iomart network.

Update 20.36: We’re still waiting on news from the datacentre.

Update 20.45: It looks like the issue is with iomart’s router at LINX (London Internet Exchange) which most traffic in the UK goes through. LINX’s public traffic stats appear normal.

Update 21.24: The datacentre is investigating this but we are still waiting for updates.

Update 21.55: Connectivity appears to be restored. We are waiting to receive the all-clear from the DC.

Update 23.03: All clear issued. This outage was the result of a very large DDoS attack against someone in the datacentre, saturating multiple upstream providers. Traffic has been re-routed and the network staff at the DC will continue to monitor.

Categories
News

RFO: Network Outage 18/07/12

In the early hours of this morning we became aware of a network issue affecting all servers in one of our racks, including our main website and support desk. We received reports of intermittent packet loss and potential routing issues, as IPs were responding from some locations and not others.

A ticket was immediately logged with our datacentre by the member of staff online at the time, and after some basic checks were made on our hardware, the ticket was placed on hold for the attention of a network engineer.

We use a Cisco HSRP setup which provides the switch in each rack with 2x redundant uplinks; should one of those uplinks fail, the other should pick up the slack. Despite the uplinks being online at either end, our switch was dropping packets on the primary uplink, thus causing these intermittent connectivity issues as it didn’t disable the interface and move to the secondary uplink.
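
To illustrate why the failover never triggered, here is a toy model in Python. It is not Cisco's HSRP logic or our switch configuration; the `Uplink` class, the loss figures, and the 5% threshold are all hypothetical. It only contrasts a failover decision based on link state alone with one that also considers measured packet loss.

```python
from dataclasses import dataclass

@dataclass
class Uplink:
    name: str
    link_up: bool     # interface reported as up at both ends
    loss_pct: float   # hypothetical measured packet loss on this uplink

def pick_by_link_state(primary: Uplink, secondary: Uplink) -> Uplink:
    """Fail over only when the primary interface is reported down."""
    return primary if primary.link_up else secondary

def pick_by_health(primary: Uplink, secondary: Uplink,
                   max_loss_pct: float = 5.0) -> Uplink:
    """Fail over when the primary is down OR losing too many packets."""
    healthy = primary.link_up and primary.loss_pct <= max_loss_pct
    return primary if healthy else secondary

if __name__ == "__main__":
    # Situation resembling this incident: link up, but dropping packets.
    primary = Uplink("primary", link_up=True, loss_pct=40.0)
    secondary = Uplink("secondary", link_up=True, loss_pct=0.0)
    print(pick_by_link_state(primary, secondary).name)  # stays on the lossy primary
    print(pick_by_health(primary, secondary).name)      # moves to the secondary
```

In the first case the primary stays in use because the interface never goes down, which matches what we observed.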

Having double and triple checked everything, we reloaded the configuration and restarted the switch, which restored full connectivity to all systems.

Prior to this incident this rack and its switch had well over a year of uptime. From what we have seen today, we can only conclude that either:

A) This was a one-off glitch (we prefer answers, but technology isn’t perfect..!)

B) This was a bug in the firmware on the switch. We will check this with Cisco, though we installed the latest Cisco IOS before deployment.

We don’t expect any further issues at this point, but will continue to closely monitor and investigate this issue further to prevent such an outage happening again.

We sincerely apologise to all customers affected by this incident, and we will be honouring any SLA credit requests made via the procedure outlined on our website.

Chris

Categories
Outages

[Resolved] UK Connectivity Outage

We are currently aware of an issue affecting our UK servers and are looking into it. Any new information will be posted here once we have it.

Update @ 10:15am: This is looking like a routing issue at the datacentre, and senior network engineers are looking into it now.

Update @ 12.12: A senior network engineer at the datacentre is now looking into this. Please note that the following subnets are affected:
95.154.207.xx
95.154.244.xx
95.154.246.xx
109.169.51.xx

If you have a service in one of our newer IP ranges i.e. 95.154.203.x or 95.154.208.x you should be unaffected.
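
If you are unsure whether your IP falls in one of the affected ranges, a quick check along these lines may help. This is a sketch only: it assumes each "xx" range above is a full /24, which the list implies but does not state, and `is_affected` is an illustrative name.

```python
import ipaddress

# Assumption: each "x.x.x.xx" range listed above is treated as a /24 network.
AFFECTED = [
    ipaddress.ip_network(net)
    for net in (
        "95.154.207.0/24",
        "95.154.244.0/24",
        "95.154.246.0/24",
        "109.169.51.0/24",
    )
]

def is_affected(ip: str) -> bool:
    """Return True if the address falls inside one of the affected ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in AFFECTED)

if __name__ == "__main__":
    print(is_affected("95.154.207.10"))  # inside an affected range
    print(is_affected("95.154.203.5"))   # one of the newer, unaffected ranges
```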

We sincerely apologise for this disruption to service and will provide a full RFO as soon as possible.

Update @ 12.49: Connectivity has been fully restored. RFO to follow after further investigation.

Thanks

Categories
Outages

[Resolved] xn3 crashed

xn3.pcsmarthosting.co.uk has crashed with a kernel panic and has been rebooted.

Update: It’s almost booted up, with no signs of any RAID or filesystem issues. We will give it a thorough once-over once we have brought all the VPS back online.

Apologies for the inconvenience.

Categories
Planned Maintenance

[Completed] Scheduled Reboots 21/06/12

We are rebooting our OpenVZ nodes tonight in order to boot into the latest kernel for security & compatibility improvements. The following servers will be rebooted, starting at 9.00PM (GMT+1):

* vz1.pcsmarthosting.co.uk = Completed
* vz2.pcsmarthosting.co.uk = Completed
* vz3.pcsmarthosting.co.uk = Completed

Categories
Planned Maintenance

Xen Upgrades Completed **All Clear**

We have now successfully finished updating all of our Xen nodes and are pleased to say they all have a clean bill of health, as well as running the latest Xen version.

If you are on a Xen PV VPS, we have also rotated our kernels. We strongly recommend that any Xen PV customers who are not using Pygrub click Reboot inside SolusVM, which will automatically restart your VPS onto the latest kernel and copy the matching kernel modules inside your VPS.

As always, if you have any problems let us know and we will be happy to help!

Chris

Categories
Planned Maintenance

Scheduled Reboots 20/06/12

We are continuing maintenance work on our Xen VPS nodes this evening. The following servers will be restarted from 9.00PM (GMT+1)

* xn13.pcsmarthosting.co.uk = Completed
* xn15.pcsmarthosting.co.uk = Completed
* xn16.pcsmarthosting.co.uk = Completed

Please Note: Since these machines had over a year of uptime, a number of VPS are running an FSCK, causing high IOWAIT on the host machines and delaying the startup process. If your VPS is not online, it will come online by itself. Please be patient and do not try to reboot it from SolusVM. Thanks

Categories
Planned Maintenance

[Completed] Scheduled Reboots 19/06/12

We are continuing maintenance work on our Xen VPS nodes this evening. The following servers will be restarted from 9.00PM (GMT+1)

* xn6.pcsmarthosting.co.uk = Completed
* xn7.pcsmarthosting.co.uk = Completed
* xn8.pcsmarthosting.co.uk = Completed

Please Note: Since these machines had over a year of uptime, a number of VPS are running an FSCK, causing high IOWAIT on the host machines and delaying the startup process. If your VPS is not online, it will come online by itself. Please be patient and do not try to reboot it from SolusVM. Thanks

Categories
Planned Maintenance

[Completed] Scheduled Reboots 18/06/12

We are continuing maintenance work on our Xen VPS nodes this evening. The following servers will be restarted from 9.00PM (GMT+1)

* xn1.pcsmarthosting.co.uk = Completed
* xn3.pcsmarthosting.co.uk = Completed
* xn5.pcsmarthosting.co.uk = Completed

Please Note: Since these machines had over a year of uptime, a number of VPS are running an FSCK, causing high IOWAIT on the host machines and delaying the startup process. If your VPS is not online, it will come online by itself. Please be patient and do not try to reboot it from SolusVM. Thanks