08-14-2003 4:15pm Onwards, Continent wide power issues

porcupine
Site Admin
Posts: 704
Joined: Wed Jun 12, 2002 5:57 pm
Location: Toronto, Ontario

Post by porcupine »

Hello everyone,

Well it seems this is just not a good month in general.

At 4:15PM EST, power failed across most of the northeastern states and most of Ontario (which contains more than a third of Canada's population). The UPS systems kicked in immediately, as did some of the generators down at 151 Front Street. After about 45 minutes, one of the UPS chains died out completely, causing power failures to several servers and some of our switches/routing gear.

At that point, equipment was moved over to circuits on a second battery chain. Shortly after, another unrelated battery chain in 151 Front reached critical temperatures and was shut down. Our Peer1 link then disconnected, because this chain was powering a DC power plant that their core Cisco GSR 12000 router was on [on a side note, I asked why not have one AC power supply and one DC, instead of redundant DC power supplies, and apparently it creates noise in the chassis]. Shortly thereafter, 360 Networks/Group Telecomm's fiber died; apparently their equipment was shut down (the exact reason is unclear, as they're in a different location).

At this point, everything went down for several hours. Peer1 then decided to route us through a different router out in the 1 Young facility (as their POP-to-POP fiber was still up via their Cisco 3500 series switching gear). This got us back online for several more hours, but with a few hiccups: Peer1 lost their Cable & Wireless link down in NYC, and their Global Crossing link as well. As a result, there was significantly heavier traffic going through their Toronto POP, and several switch reboots, route adjustments, etc. were needed.

During the wee hours of the morning, their connectivity down in NYC decided to come back online, and routes began to return to normal. Additionally, DC power was restored to their router, and our connectivity through Peer1 returned to normal. After a few routing issues and some tweaking, all traffic is currently flowing smoothly out Peer1. The Istop network is currently 100% down; Peer1 is up, seemingly without too many issues.

Traffic will continue to route exclusively through Peer1's connectivity until the Istop network gets back online, at which point there will be some slight hiccups as the routes switch back over to their normal paths. That's the long and the short of it; it should be interesting to see how this catastrophe pans out long term.

For anyone whose server was on the first UPS string and got needlessly rebooted, we're sorry, and we hope you didn't lose any uptime records :).
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
Emboss
newbie
Posts: 2
Joined: Fri Aug 15, 2003 12:12 pm

Post by Emboss »

Hats off for keeping things going, Myles. I guess things kind of worked out for the best with the new Peer1 link and the new router. I guess the folks at 151 are going to look into the issues with power backup and be prepared next time around.

How much total downtime did you experience on the PriorityColo network?
Alexander
not so much a newbie
Posts: 50
Joined: Fri Dec 20, 2002 2:52 pm

Post by Alexander »

The traffic stats are all stuck (which scared me to death, thinking only the main site was up...). Fortunately that seems to be the only thing that is not working atm (afaik) :)

-> http://www.prioritycolo.com/members.shtml

Thanks heaps Myles for doing your best in keeping our websites online!
Alexander
porcupine

Post by porcupine »

Emboss wrote:Hats off for keeping things going, Myles. I guess things kind of worked out for the best with the new Peer1 link and the new router. I guess the folks at 151 are going to look into the issues with power backup and be prepared next time around.

How much total downtime did you experience on the PriorityColo network?

Different segments experienced different amounts. Most of them experienced around 3-4 hours, when both GT/360 Networks lost their fiber routes completely (killing Istop) and Peer1 lost their router.

Sad thing is, Istop is still down, and most of what happened last night violated our contracts with the space provider in the building in nearly half a dozen ways (guarantees of preparedness, procedures, etc.). I'd imagine it's going to get messy in the next few months, as there were several threats of lawsuits (against the space provider, and against the people who have been contracted for the past half dozen years or so to provide additional generators, and paid monthly for a "reassurance, you're on the list" type of thing, etc.).
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
porcupine

Post by porcupine »

As of about 1:00am, Istop's link is officially back. That makes everything pretty much back to normal. To my understanding, good chunks of Toronto are still without power, but 151 Front is running on its regular power feeds, as it has one completely dedicated feed (and is probably a squeaky wheel among the people whining for power), along with other feeds from other grids.

To my understanding, the generators are well topped up with fuel, there are still additional generators parked out back in case of emergency, and things are beginning to wind down.
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
ZeroFill
newbie
Posts: 32
Joined: Wed May 28, 2003 1:59 am

Post by ZeroFill »

any eta when www5's ftp will be fixed again? before the outage, the ftp login delay was fixed; before the fix, there existed a ~12 second delay before it would display the motd/welcome message. this delay is now back :cry:

this occurrence is not isolated to my isp/network, as i've had friends on different isps try and get the same results. other ftp servers such as ftp.cdrom.com log in instantly (as they should).

i'm trying to connect at 2am PST (5am EST) so i doubt the server's sockets are overloaded.
porcupine

Post by porcupine »

ZeroFill wrote:any eta when www5's ftp will be fixed again? before the outage, the ftp login delay was fixed; before the fix, there existed a ~12 second delay before it would display the motd/welcome message. this delay is now back :cry:

this occurrence is not isolated to my isp/network, as i've had friends on different isps try and get the same results. other ftp servers such as ftp.cdrom.com log in instantly (as they should).

i'm trying to connect at 2am PST (5am EST) so i doubt the server's sockets are overloaded.

This is what I've found while testing:

ours.prioritycolo.com
[porcupine@www1:~]$ time ftp www5
Connected to www5.prioritycolo.com.
[...]
220 ProFTPD 1.2.8 Server (ProFTPD) [66.11.162.63]
Name (www5:porcupine): ^C
real 0m1.025s
user 0m0.005s
sys 0m0.001s
(1.025 seconds)

snickers.org
[spike:porcupine] ~: time ftp www5.pcdc.net
Connected to www5.pcdc.net.
[...]
220 ProFTPD 1.2.8 Server (ProFTPD) [66.11.162.63]
Name (www5.pcdc.net:porcupine): ^C
0.000u 0.006s 0:10.95 0.0% 0+0k 2+0io 3pf+0w
(10.95 seconds)

straynet.com
[pc@voyager] /home/pc: time ftp www5.pcdc.net
Connected to www5.pcdc.net.
[...]
220 ProFTPD 1.2.8 Server (ProFTPD) [66.11.162.63]
Name (www5.pcdc.net:pc): ^C
real 0m1.057s
user 0m0.000s
sys 0m0.008s
(1.057 seconds)

tpconsulting
root@server1 [~]# time ftp www5.pcdc.net
Connected to www5.pcdc.net (66.11.162.63).
[...]
220 ProFTPD 1.2.8 Server (ProFTPD) [66.11.162.63]
Name (www5:porcupine): ^C
real 0m1.025s
user 0m0.005s
sys 0m0.001s
(1.025 seconds)

As you can see, this is not limited to just you, as snickers.org showed the same delay, but I've no idea why some are getting < 1.25 seconds and some are getting > 10 seconds. It'll be looked into, but since this doesn't affect transfer speed (and may well be down to the configuration of the other systems, e.g. ignoring an acknowledgement through firewall filtering, or something on that level), it will not be a high-priority request, as it has no effect on the functionality of the FTPD (it runs at normal speed once connected, right?). Are you running a firewall on the client attempting to connect?
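
For anyone who wants to repeat the timing test above without the `ftp` client, here is a minimal Python sketch. It measures only the time from opening the connection to receiving the first reply line (the `220` greeting), which is the part that differs between the hosts tested. The function name `time_ftp_banner` and its defaults are made up for illustration.

```python
import socket
import time


def time_ftp_banner(host, port=21, timeout=30.0):
    """Connect to an FTP server and time how long the first reply line takes.

    Returns (banner, seconds). The clock starts before the TCP connect,
    so the figure includes connection setup as well as any server-side delay.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until one complete line has arrived (normally "220 ...").
        while b"\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break  # server closed the connection before greeting us
            data += chunk
    elapsed = time.monotonic() - start
    banner = data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    return banner, elapsed


# Example (needs network access):
#   banner, seconds = time_ftp_banner("www5.pcdc.net")
#   print(f"{seconds:.3f}s  {banner}")
```

Running it from several machines on different networks gives the same kind of comparison as the `time ftp` transcripts, without having to press ^C at the login prompt.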
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
ZeroFill

Post by ZeroFill »

yes, i do have a firewall (NAT + Norton) but i doubt this is a factor, as i was connecting "normally" after the servers were switched over and before the power outage.

here's a little time-line

start---<12sec>---migration---<"normal">---blackout---<12sec>--now
porcupine

Post by porcupine »

ZeroFill wrote:yes, i do have a firewall (NAT + Norton) but i doubt this is a factor, as i was connecting "normally" after the servers were switched over and before the power outage.

here's a little time-line

start---<12sec>---migration---<"normal">---blackout---<12sec>--now

After doing some investigation of snickers.org (the server doing the exact same thing as yours), I noted the following:

[pc@voyager] /home/pc: time host 216.220.40.220
220.40.220.216.IN-ADDR.ARPA domain name pointer 220.40.220-216.q9.net

real 0m10.102s
user 0m0.001s
sys 0m0.004s

Based on your posts on this forum:

[pc@voyager] /home/pc: time host 24.205.164.195
195.164.205.24.IN-ADDR.ARPA domain name pointer 24-205-164-195.wc-eres.charterpipeline.net

real 0m10.097s
user 0m0.000s
sys 0m0.004s

Charterpipeline.net takes 10.097 seconds to resolve your IP address to its hostname. Coincidence? I think not :). Perhaps your ISP's DNS servers are not running optimally since the power outage. Incidentally, this test was conducted both inside and outside our network for both entries (i.e. it's not just our server getting a slow resolve for your DNS; it's the 3-4 machines I personally use when testing stuff, all on different networks, in different locations around the continent).
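
The `time host <ip>` check above can also be sketched in Python. This is an illustration under assumptions, not the tooling used for the tests in this thread: `reverse_name` and `time_reverse_lookup` are hypothetical helper names, and the lookup goes through whatever resolver the local system is configured to use.

```python
import ipaddress
import socket
import time


def reverse_name(ip):
    """Return the in-addr.arpa (or ip6.arpa) name that a PTR query asks for."""
    return ipaddress.ip_address(ip).reverse_pointer


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Time one reverse (PTR) lookup; returns (hostname or None, seconds).

    `resolver` is injectable only so the function can be exercised offline;
    by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        hostname = None
    return hostname, time.monotonic() - start


# Example (needs network access):
#   print(reverse_name("24.205.164.195"))
#   print(time_reverse_lookup("24.205.164.195"))
```

Calling `time_reverse_lookup` twice in a row gives a rough hint about caching: if a resolver along the way has cached the answer, the second call is usually much faster than the first.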
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
ZeroFill

Post by ZeroFill »

that's really strange. i wonder why all other ftp servers i connect to are fine and the only one that takes a long time is prioritycolo.
Last edited by ZeroFill on Tue Aug 19, 2003 2:53 pm, edited 2 times in total.
porcupine

Post by porcupine »

ZeroFill wrote:that's really strange. i wonder why all other ftp servers i connect to are fine and the only one that takes a long time is prioritycolo.

They might not check this during the login process (i.e. they might not look up your RDNS before allowing you to connect). Try connecting to the other 3 reseller servers and see if you get the same results (which is fairly likely as they're identically configured, unless RDNS caching comes into play somewhere in there).
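
To make the difference concrete, here is a toy Python sketch of a server that optionally resolves the client's address before sending its greeting. The `greet` function is hypothetical and is not taken from ProFTPD; it only shows why a slow PTR answer delays the greeting on a server that waits for it, and not on one that skips the lookup.

```python
import socket


def greet(client_ip, check_rdns, resolver=socket.gethostbyaddr):
    """Build the greeting for a new connection, optionally resolving the client first.

    Hypothetical server logic for illustration only.
    """
    peer = client_ip
    if check_rdns:
        try:
            # Blocks until the PTR query is answered or the resolver gives up.
            peer = resolver(client_ip)[0]
        except OSError:
            pass  # no reverse record: fall back to the bare IP
    return f"220 FTP server ready, hello {peer}"
```

With `check_rdns=True`, the greeting cannot be produced until the resolver returns, so a 10-second reverse lookup turns into a 10-second login delay. In ProFTPD itself, lookups of this kind are controlled by configuration directives such as `UseReverseDNS` and `IdentLookups`; whether either one is the cause here is not established in this thread.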
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
ZeroFill

Post by ZeroFill »

when i try doing the same command from www5, i get different results

14:49:23 union:~
$ time host 24.205.164.195
195.164.205.24.in-addr.arpa domain name pointer 24-205-164-195.wc-eres.charterpipeline.net.

real 0m1.344s
user 0m0.000s
sys 0m0.010s
porcupine

Post by porcupine »

ZeroFill wrote:when i try doing the same command from www5, i get different results

14:49:23 union:~
$ time host 24.205.164.195
195.164.205.24.in-addr.arpa domain name pointer 24-205-164-195.wc-eres.charterpipeline.net.

real 0m1.344s
user 0m0.000s
sys 0m0.010s

Chances are that either your ISP has fixed the problem with their RDNS taking too long to resolve, or www5 just has the entry cached.
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com
ZeroFill

Post by ZeroFill »

ftp login still takes a while. when doing the command again, real time is 0m0.014s so i guess it was cached after the first time i ran the command (quite a difference from 1sec to .014 sec), i.e. it wasn't cached the first time.

i'm sorry if i don't understand what is going on as i am not as technically inclined as you, but what would cause all other ftp servers (that i have tried) to allow me to log in quickly and prioritycolo to be the only hiccup? it isn't the ftpd software, as i can connect to other ProFTPD sites fine.
porcupine

Post by porcupine »

ZeroFill wrote:ftp login still takes a while. when doing the command again, real time is 0m0.014s so i guess it was cached after the first time i ran the command (quite a difference from 1sec to .014 sec), i.e. it wasn't cached the first time.

i'm sorry if i don't understand what is going on as i am not as technically inclined as you, but what would cause all other ftp servers (that i have tried) to allow me to log in quickly and prioritycolo to be the only hiccup? it isn't the ftpd software, as i can connect to other ProFTPD sites fine.

It's the way they're configured; they gather certain data from everyone who connects. Try connecting from home to www4.pcdc.net, www5.pcdc.net, and www6.pcdc.net; they should all take about the same amount of time.
Myles Loosley-Millman
Priority Colo Inc.
myles@prioritycolo.com
http://www.prioritycolo.com