Hi Folks,
Please be advised of the following maintenance window:
Maintenance Window: Tuesday August 15th, 12:01am - 2:00am Eastern time
Facility: Tor2
Affected Device(s): dist03.tor2
Affected Customers: Customers connected to dist03.tor2
Service Impacting: No - We do not expect this maintenance to be service impacting; this is a precautionary notice only.
Details: As some customers may have noticed, our dist03.tor2 distribution switch/router went offline on Sunday Aug 13th @ 2:40am for approximately 7 minutes. This was caused by an unexpected reboot, which was itself triggered by a "bus error".
We've isolated the diagnostic/error codes to a region of memory on dist03.tor2's supervisor card which stores the IOS image (operating system) during operation. We suspect the supervisor may have a bad region of memory given this result.
Prior to this event, dist03.tor2 (and its supervisor) enjoyed an uptime of just over 7 years, so it’s also possible that a slow memory leak played a part (i.e., memory use gradually grew until it reached that specific region of memory).
To address this, we’re scheduling an emergency maintenance window to insert a secondary supervisor into slot 6 of dist03.tor2. Upon insertion, the secondary supervisor should automatically go into hot-spare mode and synchronize its configuration/parameters/etc. with the primary supervisor. This will ensure that if another such event occurs, the system will fail over to the secondary supervisor, protecting the chassis from further events.
This is an interim measure until the replacement/upgrade of the supervisor can take place sometime in the coming months (a project which was performed on dist04.tor2 back in 2020, but was not performed on dist03.tor2 due to several customer requests at the time). We will schedule another maintenance window for that upgrade at a later date/time.
Dist03.tor2 scheduled emergency maintenance - 15-Aug-2023 - 12:01am