www6 down - 18-Nov-2004

Announcements concerning Networking & Related News, Planned Outages, Anything which may affect your services.

Moderator: Admins

porcupine
Site Admin
Posts: 704
Joined: Wed Jun 12, 2002 5:57 pm
Location: Toronto, Ontario

Post by porcupine »

Hi guys,

As of around 2:45pm, the www6 server has gone down. We're uncertain of the cause, but are considering swapping the drives into a new, nearly identical system. www6 is currently sitting in fsck mode (though it should not be), scanning the drives for errors, and refuses to boot until the scan is complete.

I'm currently heading onsite to oversee this operation.

If you wish to be transferred to another server, we provide this service at no cost to you. Since the cause of the recent issues with the www6 server has not been pinned down, we would be happy to take transfer requests from anyone who feels this is necessary.

Regards,
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com

Post by porcupine »

www6 is back up. The problem was that cPanel created the label "/backup" in /etc/fstab when it already existed there, but as a direct reference. The system was unable to resolve the label at boot time, and so did not come back up.
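For anyone who wants to check their own machines for a similar conflict, here is a minimal Python sketch. It assumes one reading of the problem above: that the same mount point ended up listed twice in /etc/fstab, once by device path and once by LABEL=. The sample fstab contents, the device names, and the function name `duplicate_mount_points` are all made up for illustration and are not taken from www6.

```python
from collections import defaultdict


def duplicate_mount_points(fstab_text):
    """Return {mount_point: [device specs]} for mount points listed more than once."""
    specs_by_mount = defaultdict(list)
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        spec, mount_point = fields[0], fields[1]
        # Swap entries have no real mount point, so they are ignored here.
        if mount_point in ("none", "swap"):
            continue
        specs_by_mount[mount_point].append(spec)
    return {m: s for m, s in specs_by_mount.items() if len(s) > 1}


# Hypothetical fstab contents (an assumption, not the real www6 file).
SAMPLE_FSTAB = """\
LABEL=/        /        ext3  defaults  1 1
/dev/hdb1      /backup  ext3  defaults  1 2
LABEL=/backup  /backup  ext3  defaults  1 2
/dev/hda2      swap     swap  defaults  0 0
"""

if __name__ == "__main__":
    for mount_point, specs in duplicate_mount_points(SAMPLE_FSTAB).items():
        print(mount_point, "is listed by:", ", ".join(specs))
```

To run it against a real machine, read /etc/fstab into a string and pass it in place of `SAMPLE_FSTAB`. The sketch only flags duplicate mount points. It does not verify that a LABEL= entry actually matches a label on disk.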

I am considering moving the two drives in this system to another, nearly identical machine (different motherboard, NIC, and memory) so that this system can be left offline for RAM testing. If we go ahead, we will notify users of this system via email and perform the move on an emergency-maintenance basis. This will give us time to diagnose the hardware in detail and determine whether there are physical defects with the boards, etc., without keeping users offline for the duration of the tests (which take several hours).
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com

Post by porcupine »

We will be performing emergency maintenance on the www6 server in the next 10-20 minutes.

We're going to attempt swapping the memory: the 1GB of DDR400 ECC Registered memory in the system will be replaced with 2GB of DDR ECC Registered memory. The new memory is an exact match for this board, and we feel that if we're going to swap the memory, we might as well upgrade it in the process.

After this, we will put the old memory in a separate system to be tested. We expect this maintenance to have an actual impact of less than 10 minutes, with a target impact of 3-5 minutes.

Sorry for any inconvenience this may cause.

Regards,
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com

Post by porcupine »

Maintenance has been completed, and the timing goals were met without issue. The network monitor didn't pick up a single service outage (and it checks the various services every 3-5 minutes).
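As a rough illustration of why a short outage can slip between monitor checks, here is a small Python sketch. It assumes the monitor polls at a fixed interval starting from a known time. The function name `outage_detected` and all the numbers in the example are hypothetical, since the exact downtime and poll times are not stated here.

```python
import math


def outage_detected(poll_interval, first_poll, outage_start, outage_length):
    """Return True if any poll at first_poll + k * poll_interval (k >= 0)
    lands inside the window [outage_start, outage_start + outage_length).
    All arguments are in seconds."""
    outage_end = outage_start + outage_length
    # Index of the first poll that happens at or after the outage begins.
    k = max(0, math.ceil((outage_start - first_poll) / poll_interval))
    next_poll = first_poll + k * poll_interval
    return next_poll < outage_end


if __name__ == "__main__":
    # Hypothetical numbers: polls every 180 s, a 120 s outage.
    print(outage_detected(180, 0, 10, 120))   # outage ends before the next poll
    print(outage_detected(180, 0, 100, 120))  # the poll at 180 s falls inside the outage
```

In other words, an outage shorter than the polling interval may or may not be seen, depending on where it falls between checks.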

We've hooked up another nearly identical server, and are running the memory tests on it at the present time.

Thanks for your patience,
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com