Category Archives: Outages

[Resolved] xn9 outage

xn9 appears to have gone unresponsive. We are currently waiting on remote hands to check the server.

Update 1: It appears the underlying partition of the LVM volume group containing VPS filesystems has gone away, despite the RAID controller, drives and array being healthy. We are currently investigating recovery options.

Update 2: We have been able to manually reconstruct the underlying partition and LVM metadata. However, after several attempts we are unable to assemble it in such a way that VM filesystems are accessible. The root cause of the partition's disappearance is unclear; we suspect the size of the volume may have changed due to a bug or defect in the RAID controller. With further examination we may be able to recover complete or partial data, but we cannot make any guarantees, and at this time no data is available. If there is any data of particular importance and you can provide the complete filename, we will do our best to recover it through alternate methods.

We will now begin a recovery operation to re-create the VPS from xn9 on alternative host machines. For managed servers, this will include restoration of backups where available.

We sincerely apologise for this inconvenience and will continue to work with our customers to restore service as quickly as possible.

Update 3: All VPS have been migrated to alternate hardware.

[Resolved] XN10 Packet loss

Starting at approximately 22:20 GMT+1, xn10 experienced high packet loss. Due to limited access to the server, we gracefully shut down all VMs and brought them back up in order to make some configuration changes. Apologies for the reboot.

We have now identified that one VM was the target of a low-bandwidth, high-concurrency DoS attack, which has now been null routed. We continue to monitor.

[Resolved] XN1 Outage

We are aware of an issue with our XN1 node. We have identified the cause and are working to resolve it as quickly as possible.

Updates will be provided here.

Update @ 16:52: DC staff are running slow. This is frustrating, but unfortunately there is nothing we can do to speed this up.

Update: This is now resolved.

[Resolved] xn3 outage

xn3.pcsmarthosting.co.uk currently has issues with the RAID controller which we are working to resolve. Further updates will follow.

Update: We’ve traced this to a bug in the upstream Linux 4.9 kernel. The RAID controller and array appear to be healthy. We’ll bring the server back up on a secondary kernel to restore service. Further investigation will be carried out in our test environment.

[Resolved] XN10 Outage

Our monitoring has alerted us to an issue with XN10, which we are currently investigating. Further updates will be provided here when we have them.

Update: This has now been resolved.

[Resolved] Web1 Read Only

Our web1 server has gone into a read-only state and our technicians are currently investigating.

Updates will be provided here when they’re available.

Update: Onsite staff are currently hooking up a crash cart to this machine.

Update: It appears a hard disk in the RAID10 array has failed and caused the controller to hang. This is rare but it can happen. There are some filesystem inconsistencies that we are working to repair, after which the server will be brought back online.

Update: The damage has been repaired and didn’t look too bad. We are running a second pass now to be absolutely sure, and will then boot the server.

Update: Server is now back online and the RAID array is rebuilding.

[Resolved] VZ1 Outage

We are aware of an issue with our VZ1 node. We are investigating and hope to have service restored in the next 30 minutes.

Update: This has now been resolved and the machine is back online.

[Resolved] Billing and Ticketing DDoS Outage

We are currently facing a large DDoS attack on the offsite data centre we use for our billing and ticketing systems. The incident is being treated as priority 1 and we will post updates here as we get them.

We host our billing and ticketing systems in a different data centre for redundancy, so that if our primary data centre is unavailable, access to the billing and ticketing systems is unaffected.

Update: This has now been mitigated and the issue resolved.

[Resolved] XN10 Unresponsive

XN10 has gone unresponsive and is being investigated. Due to the history of recent outages on this machine, it’s likely we will be replacing it in the next few minutes.

Updates will be provided here as usual.

Update: XN10 is now back online and all VPS are up and running. The move to new hardware has been postponed for a couple of days.