05-Oct-2007 - csr01.tor1 RSM Failure/Crash

Announcements concerning Networking & Related News, Planned Outages, Anything which may affect your services.

Moderator: Admins



Post by porcupine »

Hi Guys,

At 5:36 AM EST today, our csr01.tor1 RSM (on the sw01.tor1 legacy switch) disconnected itself from the switch fabric and shut down all currently routed VLAN sessions on that switch, for reasons unknown (IOS bug/crash).

The problem was remedied shortly afterward. We first recorded the log files from the device (for later analysis) and backed up the current configurations (just in case), then forced the RSM to reload its operating system and configuration and reinitialize itself.

The sw01.tor1 switch had well over one year of uptime since its last IOS upgrade, and had been performing reliably since its installation in early 2003.

We currently plan to remove the sw01.tor1 switch from production by year end, and do not expect to see any further incidents of this nature. If we do see any further problems from the RSM, we will simply offload the routing from this device to another Cisco router. Should there be any issues with sw01.tor1's core components, we have multiple cold-spare units with which it can be replaced immediately.

I'd like to thank the few customers who were affected for their patience regarding this incident. Thankfully, the IOS failure struck during the dead of night, and the fix did not require any hardware replacements.

Regards,
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com