2015-04-03

ehowton
2015-04-03 12:00 am

The Monitoring Group


As unix system administrators, we are held responsible for every accidental, incorrect, or mistaken incident, even when it falls far, far outside our area of responsibility. The burden of proof (and subsequently the time spent either closing or rerouting the errant ticket) falls entirely upon us. "How do you know this_is_a_windows_server_hostname is not a unix box? How can you be so sure my_hostname_is_a_networking_appliance isn't one of your unix servers?"

I can be on the phone with someone in the monitoring group, begging them not to generate any incidents from servers which I'm about to shut down during our regularly scheduled maintenance. I repeat to them the authorization which covers reboots and downtime - and they verify the authorization in real time, with me on the phone with them. When they generate incidents anyway, it cascades into hours of needless work for us, the busy unix administrators, filling out the appropriate online forms to verify that the ticket was opened in error, despite the error not being ours.

So I'm on the phone with the monitoring group, letting them know I'm about to shut down the server.

"And you're not going to generate any incidents, correct?"

"Correct."

"Ok, I'm shutting down the server now."

"Ok."

[New Incident email alerts me of a host down]

"Did you just generate a ticket?"

"Yes. Sorry."

Every. Scheduled. Maintenance.