Categories
Planned Maintenance

[Completed] Scheduled reboots

Tonight between 22:00 and 00:00 (GMT+1) we will be rebooting the below servers in order to boot into new kernels which address a recent Xen security issue. Downtime will vary from machine to machine as each VPS shuts down and boots up individually.

Any issues will be posted here. Apologies for any inconvenience.

Categories
Outages

[Resolved] xn3 Down

xn3.pcsmarthosting.co.uk is currently down. This is also the machine our helpdesk resides on.

We are working on this and will update you shortly.

Update: The system has restarted; however, it’s failing to start up as a few init scripts appear to be damaged. We are working on this.
Update: Sorry for the delay folks, the datacentre is being quite slow to respond at the moment.

Update: The system has been fully restored and all VPS are up. Thanks

Categories
Outages

[Resolved] xn2 issues

The RAID array on xn2 has become inoperable despite being optimal with all discs present only a few days ago.

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   INOPERABLE     -       -       64K     596.025   OFF    ON

We are hoping this is a malfunction of the RAID controller and not an actual array issue. We are currently waiting on the datacentre for updates.
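For anyone curious how a status line like the one above can be checked automatically, here is a rough sketch in Python. It assumes the whitespace-separated column layout shown in the listing; the function names and the choice to treat only "OK" as healthy are assumptions for illustration, not a description of our actual monitoring.

```python
# Column names follow the header of the listing above (order assumed fixed).
COLUMNS = ["unit", "type", "status", "rcmpl", "vim",
           "stripe", "size_gb", "cache", "avrfy"]

# Assumption for this sketch: anything other than "OK" needs a human to look.
HEALTHY_STATUSES = {"OK"}


def parse_unit_line(line):
    """Split one unit row into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def unit_needs_attention(line):
    """Return True when the unit's status is not in the healthy set."""
    return parse_unit_line(line)["status"] not in HEALTHY_STATUSES


if __name__ == "__main__":
    row = "u0    RAID-10   INOPERABLE     -       -       64K     596.025   OFF    ON"
    print(parse_unit_line(row)["status"], unit_needs_attention(row))
```

With this rule a rebuilding or degraded unit is flagged as well, which is the cautious choice for an alerting check.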

Update: 8.12PM – Just called the DC to chase up the reboot request and it’s being done now; apparently they are very busy tonight.
Update: 8.26PM – Machine rebooted but is not responding to ping and there is no output on the KVM. It is being checked, and we are also preparing to go onsite!
Update 8.35PM – System is now booting up and the RAID array appears intact! Will do some sanity checks once it’s up.

Update 8.39PM – It appears that there were multiple drive failures. This machine has a 4x disk RAID10 set.
Drive p0 – Hot-swapped a few days ago and is OK
Drive p1 – Failed, completely dead/undetected
Drive p2 – OK
Drive p3 – Rebuilding

We will let drive p3 finish rebuilding itself. Once that is complete we will replace drive p1, and when that rebuild finishes we will also replace drive p3 as a precaution. Please avoid any disk-intensive tasks for the next 48 hours, so we can restore full redundancy and performance to the array in a timely fashion.

** At this point ALL VPS should be online. If yours is having issues, log a ticket **

Update 10.11PM – Drive p3 is 90% rebuilt. Will replace the failed p1 disk shortly.
Update 11.13PM – Drive p3 has been fully rebuilt. Drive p1 has been hot-swapped out and is rebuilding.

Update 8.59PM 26/04/12: Full redundancy and performance have now been restored to the RAID array on xn2. We don’t expect any further issues, but as always we will monitor this server carefully for the time being.

Thanks

Categories
Outages

[Resolved] Scheduled reboots 06/04/12

We are doing a scheduled reboot of the following servers this evening to address a stability issue between the latest Adaptec firmware and the aacraid driver in the RHEL5 kernel:

xn19.pcsmarthosting.co.uk
xn20.pcsmarthosting.co.uk

This has been completed successfully. RAID arrays on these machines are doing a verify with fix to ensure consistency.

Categories
Planned Maintenance

[Completed] Scheduled Reboots 02/03/12

We are going to be rebooting the following servers at approx 11PM GMT in order to boot into newer kernels, with numerous security fixes and support for newer OSes:

vz1.pcsmarthosting.co.uk
vz2.pcsmarthosting.co.uk

We don’t expect any issues, but downtime will vary as OpenVZ shuts down and starts up VPS one by one.

vz1 = Reboot completed, VPS are starting up.
vz2 = Reboot completed, VPS are up.

Categories
Outages

[Resolved] vz2 unresponsive

vz2 has become unresponsive. We are waiting for the KVM to be moved by onsite staff.

Update: We have identified the VPS causing this high load; its system has spawned hundreds of processes. We are trying to get things under control with some liberal use of pkill -9, but if that fails we will reboot.
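As a rough illustration of how a runaway VPS can be spotted, the idea is simply to count processes per container and flag any that are far above the rest. The sketch below is Python and assumes you already have a list with one container ID per running process; how that list is gathered, the function name and the threshold of 200 are all assumptions for the example, not what we ran on vz2.

```python
from collections import Counter


def noisy_containers(process_owners, threshold=200):
    """Return (container_id, process_count) pairs at or above the threshold.

    process_owners holds one container ID per running process.
    The result is sorted with the busiest container first.
    """
    counts = Counter(process_owners)
    flagged = [(cid, n) for cid, n in counts.items() if n >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data: container "101" has spawned hundreds of processes.
    sample = ["101"] * 350 + ["102"] * 12 + ["103"] * 40
    for cid, n in noisy_containers(sample):
        print(f"container {cid}: {n} processes")
```

A fixed threshold is the simplest possible rule; on a real node a sensible value depends on what the hosted VPS normally run.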

Update 2: This has been resolved without a reboot; the responsible VPS has been suspended.

Categories
Outages

[Resolved] vz1 Unresponsive

vz1 is currently unresponsive/down and is being investigated.

Update: Looks like it’s just under high load; waiting for onsite hands to connect the KVM.

Update 2: One particular VPS was maxing out CPU, causing high load. This same VPS has caused issues twice before, and has now been removed from the server. Apologies for the inconvenience.

Approx downtime: 10min

Categories
Planned Maintenance

[Resolved] xn2 drive failure

Our monitoring picked up a failed drive in the RAID10 array on xn2 earlier this evening.

The misbehaving drive has been hot-swapped out and the array is rebuilding. There was no interruption to service, but note that I/O will be a little slow while the array restores full redundancy.

Thanks

Categories
Planned Maintenance

[Completed] cp1 maintenance

Following a previous crash due to high load, we are doing a bit of maintenance on this server to ensure everything is running optimally.

During this time there may be some brief load spikes as we apply updates. Apologies for any inconvenience.

Update: This has been completed and we are seeing improved disk performance and lower load.

Categories
Outages

[Resolved] cp1 outage

cp1, our main shared/reseller cPanel system, has crashed and is rebooting.

It’s currently doing an automatic fsck of the filesystem as its last reboot was over 250 days ago. It will be up soon.
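For background, filesystems such as ext3 can be set to force a check at boot once a configured interval has passed since the last one, which is why a long uptime tends to end with a slow first reboot. The decision is just a date comparison, sketched here in Python. The 180-day interval, the dates and the function name are assumptions for illustration, not values read from cp1.

```python
from datetime import datetime, timedelta


def fsck_due(last_check, now, interval_days=180):
    """Return True when the time since the last check meets or exceeds the interval."""
    return now - last_check >= timedelta(days=interval_days)


if __name__ == "__main__":
    # Hypothetical dates: the last check happened 250 days before this boot.
    last_check = datetime(2011, 6, 1)
    boot_time = last_check + timedelta(days=250)
    print(fsck_due(last_check, boot_time))
```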

Update: Server is now up and load is slowly coming down. Will keep an eye on this for the next 24hrs.