In the last 48 hours . . . - Doug Ayen's Blacksmithing Blog
Doug Ayen


In the last 48 hours . . . [Feb. 12th, 2009|10:41 pm]
Doug Ayen
In the last 48 hours, I have:
*Discover my cell phone is dead right at the start of a maintenance where having two phone lines would be really handy.
*Migrate three customers, six SONET circuits, twelve GigE circuits, and a half dozen peers onto new equipment.
*Perform the first field trial of the switch memory upgrade process on two switches, with our best and most experienced field engineer acting as remote hands. Both switches fail and require intervention to recover using the standard process. What was claimed to take 5 minutes, and proven in the lab to take ten, took over half an hour in the field.
*Troubleshoot the ten percent or so of customers who, as always, failed to recover after the maintenance.
*Drag my tired ass into work, arrange for a replacement cell, organize and coordinate everything for the night's maintenance, all while dealing with escalations every 9 minutes.
*Realize that I'm going to need an additional field engineer due to mission creep and the spectacular failure of last night's test. This is finally arranged at 9pm.
*Upgrade memory using remote hands for eleven switches.
*Troubleshoot the four switches that fail during the upgrade process.
*Regenerate and restore configs from scratch on one of said switches using the OOB.
*On another of those switches, the out-of-band management connection went first to another switch, then back through the failed switch before hitting the network. So, when the switch failed, so did its out-of-band access. So, I also spent several hours talking the field engineer through the whole configuration from scratch and then the troubleshooting processes until I could get into the switch.
*Re-do most of the configuration to get most of the customers back up.
*Troubleshoot the ten percent or so of customers who, as always, failed to recover after the maintenance.
*Try, and fail, to nap.
*Come back to work for just a few hours to pick up the new cell phone (BlackBerry 8330).
*Discover I'm the only sr. engineer in the house, and my boss is out of the office, so immediately get hit with escalations, questions, and calls.
*Watch half the backbone melt down when a single fiber cut destroys the other half.
*Nervously resist the urge to interfere while the new router load balancing does, eventually, figure out how to shoehorn a hundred gigs of traffic into 95 gigs of capacity, somehow, without significant packet loss. (Hint: traffic doesn't have to take an optimal path, it just has to get there, maybe through Phoenix on its way from Seattle to Chicago.)
*Discover that one of the upgraded switches has lost access to half its memory, rendering the upgrade useless and requiring another maintenance window to fix.
*Deal with another switch as it goes tits up, forcing it over to the backup supervisor card. Arrange for an FE to go out, attempt to reseat the card, and when that doesn't work, move the connections on that card to another.
*Get a reminder memo from my boss that I need to schedule another 60+ switch upgrades ASAP.
*Go home, write this, get ready to go to bed.

Notice the lack of any actual sleep in there.

So, what did you do in the last 48 hours?

From: perspicuity
2009-02-13 04:57 am (UTC)
looked for work
got reading glasses
paid bills wishing i had a job
looked for work

i think they keep you so busy so you won't have time to defect.

do these people who run YOU get sleep as well? why the emergency mindset?

From: blackanvil
2009-02-13 04:19 pm (UTC)
Of course they do. The emergency is because these switches have a memory leak, and so are crashing every three months or so.
From: n5red
2009-02-13 05:01 am (UTC)
Nothing like that. Please take care of yourself, you have a lot of friends who worry about you.