08-14-2003 4:15pm Onwards, Continent-wide power issues
Posted: Fri Aug 15, 2003 8:45 am
Hello everyone,
Well, it seems this is just not a good month in general.
At 4:15PM EST, power failed across most of the northeastern states and most of Ontario (which contains more than half of Canada's population). The UPS systems kicked in immediately, as did some of the generators down at 151 Front Street. After about 45 minutes, one of the UPS chains died out completely, causing power failures to several servers and some of our switches/routing gear.
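(For the curious: the lesson here is that a battery chain buys you minutes, not hours, so you want to know the instant you're on battery. Below is a rough sketch of the kind of watchdog you could run against Network UPS Tools' `upsc` utility; the UPS name, poll interval, and alert messages are made up, so adjust for your own setup.)

```python
#!/usr/bin/env python
# Rough sketch of a UPS watchdog built on NUT's `upsc` command-line tool.
# The UPS identifier below ("rack1@localhost") is a hypothetical placeholder.
import subprocess
import sys
import time

UPS = "rack1@localhost"   # hypothetical NUT UPS name -- substitute your own
POLL_SECONDS = 30

def ups_status(ups):
    """Return the ups.status field from `upsc`, e.g. 'OL' (online) or 'OB' (on battery)."""
    out = subprocess.run(["upsc", ups], capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        key, _, value = line.partition(": ")
        if key == "ups.status":
            return value.strip()
    return "UNKNOWN"

def main():
    was_on_battery = False
    while True:
        status = ups_status(UPS)
        on_battery = "OB" in status.split()  # status is space-separated flags
        if on_battery and not was_on_battery:
            print("ALERT: %s is on battery -- start shedding load" % UPS, file=sys.stderr)
        elif was_on_battery and not on_battery:
            print("OK: %s back on line power" % UPS, file=sys.stderr)
        was_on_battery = on_battery
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()
```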
Equipment was then moved over to circuits on a second battery chain. Shortly after, another, unrelated battery chain in 151 Front reached critical temperatures and was shut down. At that point our Peer1 link disconnected, because this chain was powering the DC power plant that their core Cisco GSR 12000 router was on [on a side note, I asked why not run one supply on AC and one on DC instead of redundant DC supplies, and apparently mixing the two creates noise in the chassis]. Shortly thereafter, 360 Networks/Group Telecom's fiber died; apparently their equipment was shut down (the exact reason is unclear, as they're in a different location).
At this point, everything went down for several hours. Peer1 then decided to route us through a different router out at the 1 Yonge facility (as their POP-to-POP fiber was still up via their Cisco 3500 series switching gear). This got us back online for several more hours, but with a few hiccups: Peer1 lost their cable and wireless link down in NYC, and their Global Crossing link as well. As a result, significantly heavier traffic was going through their Toronto POP, and several switch reboots, route adjustments, and so on were needed.
During the wee hours of the morning, their NYC connectivity came back online, and routes began to return to normal. Additionally, DC power was restored to their router, and our connectivity through Peer1 returned to normal. After a few routing issues and some tweaking, all traffic is currently flowing smoothly out Peer1. The Istop network is currently 100% down; Peer1 is up, seemingly without too many issues.
Traffic will continue to route exclusively through Peer1's connectivity until the Istop network gets back online, at which point there will be some slight hiccups as the routes switch back over to their normal path. That's the long and the short of it; it should be interesting to see how this catastrophe pans out long-term.
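(If you want to watch the switchover happen from your end, a quick traceroute check will show which upstream your packets are leaving through. Here's a rough sketch; the next-hop IPs in it are placeholders, not our real ones.)

```python
#!/usr/bin/env python
# Quick-and-dirty check of which upstream outbound traffic is using, by
# looking for a known provider next-hop in the first few traceroute hops.
# All IPs below are hypothetical placeholders -- fill in the real next-hop
# addresses for your own circuits.
import re
import subprocess

UPSTREAMS = {
    "10.1.1.1": "Peer1",   # hypothetical Peer1 next-hop
    "10.2.2.1": "Istop",   # hypothetical Istop next-hop
}
TARGET = "4.2.2.1"  # any stable, always-reachable external host works

def current_upstream(target):
    out = subprocess.run(
        ["traceroute", "-n", "-m", "5", target],  # numeric output, max 5 hops
        capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        for hop_ip in re.findall(r"\d+\.\d+\.\d+\.\d+", line):
            if hop_ip in UPSTREAMS:
                return UPSTREAMS[hop_ip]
    return "unknown"

if __name__ == "__main__":
    print("Outbound traffic is currently going via:", current_upstream(TARGET))
```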
For anyone whose server was on the first UPS string and got needlessly rebooted, we're sorry, and we hope you didn't lose any uptime records.