[Resolved] Network Issue 19/07/12

As of 20:15 we are seeing high latency and packet loss to all of our racks, colo machines and the datacentre’s support system. We are waiting to hear further information and will update here shortly.

Thanks

Update: We’ve now confirmed this issue is affecting the whole DC, including other sites on the iomart network.

Update 20.36: We’re still waiting on news from the datacentre.

Update 20.45: It looks like the issue is with iomart’s router at LINX (London Internet Exchange) which most traffic in the UK goes through. LINX’s public traffic stats appear normal.

Update 21.24: The datacentre is investigating this but we are still waiting for updates.

Update 21.55: Connectivity appears to be restored. We are waiting to receive the all-clear from the DC.

Update 23.03: All clear issued. This outage was the result of a very large DDoS attack against someone in the datacentre, which saturated multiple upstream providers. Traffic has been re-routed and the network staff at the DC will continue to monitor.

RFO: Network Outage 18/07/12

In the early hours of this morning we became aware of a network issue affecting all servers in one of our racks, including our main website and support desk. We received reports of intermittent packet loss and possible routing issues, as IPs were responding from some locations and not others.

A ticket was immediately logged with our datacentre by the member of staff online at the time, and after some basic checks were made on our hardware, the ticket was placed on hold for the attention of a network engineer.

We use a Cisco HSRP setup which provides the switch in each rack with two redundant uplinks; should one of those uplinks fail, the other should pick up the slack. Although both uplinks were online at either end, our switch was dropping packets on the primary uplink. Because the interface was still up, the switch did not disable it and move to the secondary uplink, which caused the intermittent connectivity issues.
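To illustrate why the failover did not trigger, here is a minimal Python sketch contrasting a check that only looks at link state with one that also looks at measured packet loss. This is not how HSRP or Cisco IOS makes its decision; the function names, the probe counters and the 5% threshold are all hypothetical and exist only to show the difference between the two approaches.

```python
def link_state_failover(primary_up: bool) -> str:
    """Pick an uplink using link state only (hypothetical helper).

    A primary uplink that is up but dropping packets is still chosen.
    """
    return "primary" if primary_up else "secondary"


def loss_aware_failover(primary_up: bool, probes_sent: int,
                        probes_lost: int, max_loss: float = 0.05) -> str:
    """Pick an uplink using link state and measured probe loss.

    max_loss is an assumed threshold (5%), not a value from our setup.
    """
    if not primary_up:
        return "secondary"
    if probes_sent == 0:
        # No measurements yet, so fall back to link state alone.
        return "primary"
    loss = probes_lost / probes_sent
    return "secondary" if loss > max_loss else "primary"


# Primary uplink is up but losing 30 of 100 probes.
print(link_state_failover(True))
print(loss_aware_failover(True, probes_sent=100, probes_lost=30))
```

In the incident above the situation matched the first function: the link was up at both ends, so nothing prompted a move to the secondary uplink.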

Having double and triple checked everything, we reloaded the configuration and restarted the switch, which restored full connectivity to all systems.

Prior to this incident this rack and its switch had well over a year of uptime. From what we have seen today we can only conclude that either:

A) This was a one-off glitch (we prefer answers, but technology isn't perfect!)

B) This was a bug in the firmware on the switch. We will check this with Cisco, though we installed the latest Cisco IOS before deployment.

We don’t expect any further issues at this point, but we will continue to monitor closely and investigate further to prevent such an outage happening again.

We sincerely apologise to all customers affected by this incident, and we will be honouring any SLA credit claims made via the procedures outlined on our website.

Chris

[Resolved] UK Connectivity Outage

We are aware of an issue affecting our UK servers and are looking into it. Any new information will be posted here once we have it.

Update @ 10:15am: This is looking like a routing issue at the datacentre and senior network engineers are looking into it now.

Update @ 12.12: A senior network engineer at the datacentre is now looking into this. Please note that the following subnets are affected:
95.154.207.xx
95.154.244.xx
95.154.246.xx
109.169.51.xx

If you have a service in one of our newer IP ranges i.e. 95.154.203.x or 95.154.208.x you should be unaffected.
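If you want to check whether one of your IPs falls in an affected range, a rough Python sketch is shown here. It assumes each "xx" above stands for a full /24 block, which is our reading of the notation rather than a confirmed prefix length; the `is_affected` helper name is just for illustration.

```python
import ipaddress

# Assumption: each "xx" range in the notice is treated as a whole /24.
AFFECTED_SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "95.154.207.0/24",
        "95.154.244.0/24",
        "95.154.246.0/24",
        "109.169.51.0/24",
    )
]


def is_affected(address: str) -> bool:
    """Return True if the address sits inside one of the listed subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in subnet for subnet in AFFECTED_SUBNETS)


# Example addresses only; substitute your own service IP.
print(is_affected("95.154.207.10"))
print(is_affected("95.154.203.5"))
```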

We sincerely apologise for this disruption to service and will provide a full RFO as soon as possible.

Update @ 12.49: Connectivity has been fully restored. RFO to follow after further investigation.

Thanks

[Resolved] xn3 crashed

xn3.pcsmarthosting.co.uk has crashed with a kernel panic and has been rebooted.

Update: It’s almost booted up, with no signs of any RAID or filesystem issues. We will give it a thorough once-over once we have brought all the VPSs back online.

Apologies for the inconvenience.