04-Sep-2004 - SW01 Downtime

Announcements concerning Networking & Related News, Planned Outages, Anything which may affect your services.

Moderator: Admins

porcupine
Site Admin
Posts: 703
Joined: Wed Jun 12, 2002 5:57 pm
Location: Toronto, Ontario

Post by porcupine »

Hi guys,

Well, it looks like this month is determined to keep us on our toes, though the manner is unfortunate. Any customers on the sw01 switch will have noticed roughly 15-20 minutes of downtime.

While I was re-organizing some of the cables, the secondary power supply for this switch was unplugged to be re-routed. This being a completely routine activity, I expected no impact. When we unplug a power supply, we first turn it off (if possible), re-route cables as necessary (e.g. when tidying up), and then leave the power supply off for a couple of minutes as "cool off" time. This is to try and keep everything in good condition (and also helps to make sure the power supplies are reasonably tested against failure every now and then).

Unfortunately, in this case, the power circuit that the primary power supply was on blew. This was frighteningly similar to what happened just two days ago with one of the other power circuits. We use a clamp-amp to monitor and measure our circuit loads whenever servers are added, and about once a month, to try and avoid situations like this. The circuit was measured and recorded two days ago (the day of the previous incident) at 14.5A, and today at a low of 14.0A and a high of 14.2A. For a 20A circuit, this should be perfectly fine.

I've offloaded some of the current from this circuit in a similar manner to the other ones, and noted that it was "full", though it's only operating at roughly 70% of its rated capacity and *should* be well within expected parameters (and within electrical/fire codes for that matter, which dictate that circuits should be loaded to no more than 80% of their capacity, 16A in this case).
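For anyone who wants to check the arithmetic, here is a minimal sketch of the load calculation using the readings quoted above. The function name and the dictionary keys are just mine for illustration, and the 80% figure is the rule of thumb mentioned in this post rather than a reading of any particular code.

```python
def load_report(measured_amps, breaker_amps, continuous_limit=0.80):
    """Compare a clamp-meter reading against a breaker's rating.

    continuous_limit is the fraction of the breaker rating we allow
    ourselves to use (80% here, per the rule of thumb in this post).
    """
    limit_amps = breaker_amps * continuous_limit
    return {
        "utilisation": measured_amps / breaker_amps,   # fraction of rating in use
        "limit_amps": limit_amps,                      # most we should draw
        "headroom_amps": limit_amps - measured_amps,   # room left under the limit
        "within_limit": measured_amps <= limit_amps,
    }


# The three readings quoted in this post, all on a 20A circuit.
for amps in (14.5, 14.0, 14.2):
    r = load_report(amps, 20.0)
    print(f"{amps:.1f}A: {r['utilisation']:.1%} of rating, "
          f"{r['headroom_amps']:.1f}A under the {r['limit_amps']:.0f}A limit")
```

On those numbers all three readings sit between 70% and 72.5% of the rating, with 1.5A to 2.0A to spare under the 16A limit, which is why the trip is surprising.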

I apologise to anyone who has been inconvenienced by this, and have since scheduled an appointment to discuss the matter with our Switch and Data CS Rep. We'll be turning up additional power circuits soon as a result of this (and setting a lower "max" on what we use), but we still hope to get some form of explanation of why this is happening.

Notably, when power was turned back on to this switch, none of the APCs were using a staggered powerup configuration. For anyone who isn't familiar with what this means, it simply means that all of the equipment "flicks" on at once. Staggered powerups prevent circuit overloading when restarting equipment. As such, when the circuit was turned back on, power consumption would have been at, or very near, the peak usage one could expect from the equipment, so any potential circuit overload would immediately trip the breaker and shut the circuit back down.
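To illustrate why staggering helps, here is a rough sketch that compares the peak draw when everything switches on together against a staggered start. Every number in it is made up for illustration (seven devices at 2.0A steady, a 1.5x startup surge lasting 2 seconds, a 5 second stagger); none of it was measured on our equipment, and a real breaker follows a trip curve rather than a simple threshold.

```python
def peak_current(steady_amps, inrush_factor, inrush_seconds, delay_seconds):
    """Peak total draw (amps) when device i switches on at i * delay_seconds.

    Each device draws steady_amps[i] * inrush_factor for its first
    inrush_seconds after switching on, then settles to steady_amps[i].
    A delay of 0 means everything "flicks" on at once.
    """
    n = len(steady_amps)
    horizon = (n - 1) * delay_seconds + inrush_seconds + 1
    peak = 0.0
    for t in range(horizon):
        total = 0.0
        for i, amps in enumerate(steady_amps):
            start = i * delay_seconds
            if t < start:
                continue  # this device has not been switched on yet
            if t < start + inrush_seconds:
                total += amps * inrush_factor  # still in its startup surge
            else:
                total += amps  # settled to steady-state draw
        peak = max(peak, total)
    return peak


# Hypothetical load: seven devices at 2.0A each (14A steady in total).
devices = [2.0] * 7
print("all at once:", peak_current(devices, 1.5, 2, 0), "A")
print("staggered:  ", peak_current(devices, 1.5, 2, 5), "A")
```

With these invented figures the simultaneous start peaks at 21A, over a 20A rating, while the staggered start never exceeds 15A, because only one device is in its startup surge at any moment.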

If you have any questions, comments, or suggestions on this matter, please post or email me, as I'd love to hear them (the more minds, the merrier).

Regards,
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com