Swatting flies
I read with interest the post by Security Monkey today about swatting flies. It's not that I disagree with his point, but his example brought to mind another of my pet peeves about working in IT generally, and on the Help Desk specifically. Let's start with his example:
Your help desk reports to you (the local security monkey) that users are no longer able to browse the internet via your proxy server. The users say that they keep getting “Cannot load page” messages.
A quick test from your desk works just fine. Obviously the users are just doing something wrong, right? All 200 of them! You check the proxy server, the authentication daemon, the internet connection and then read all 200 service tickets again.
“It must be a client configuration error. Call up all the users and walk them through configuring their browser to use the proxy.”
Several hours later (and $X^100 of wasted staff salary) you strike up a conversation with one of the infrastructure network engineers. He mentions a recent struggle with running out of address space on a few of the user subnets and then mentions a change on the firewall which performs a dynamic network address translation function to make all the users appear to come from one reserved IP in the DMZ.
You nearly drop your coffee cup. Why?
You know that the proxy server is configured to only respond to private address space behind the firewall, not the firewall’s actual IP address.
A quick dash to your desk and many minutes later you reconfigure the proxy and users start to gain internet access again.
In other words, the hours that you spent swatting at that fly with the flyswatter could have been saved if you just realized that the fly was buzzing around your head because your shampoo smelled like a rotting animal corpse.
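(To make the root cause a little more concrete, here is a minimal sketch of the kind of source-address check the example hinges on. Security Monkey doesn't name the proxy software or the addresses involved, so the Python below, the private ranges, and the 203.0.113.10 NAT address are purely illustrative assumptions, not the actual setup.)

import ipaddress

# Hypothetical allow-list: the proxy only answers clients whose source
# address falls inside the private ranges behind the firewall.
ALLOWED_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def proxy_accepts(client_ip: str) -> bool:
    """Return True if the proxy would answer a request from client_ip."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETS)

# Before the firewall change: requests arrive from the users' own subnets.
print(proxy_accepts("10.20.30.40"))      # True  -> pages load fine

# After the NAT change: every user appears to come from one reserved
# DMZ address (203.0.113.10 is a made-up documentation address).
print(proxy_accepts("203.0.113.10"))     # False -> "Cannot load page"

# The fix from the example: allow the firewall's NAT address as well.
ALLOWED_NETS.append(ipaddress.ip_network("203.0.113.10/32"))
print(proxy_accepts("203.0.113.10"))     # True  -> users are back online

The point of the sketch is simply that nothing on the proxy itself broke; its allow-list was still doing exactly what it was configured to do, which is why testing from your own desk looked fine.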
Now when I read this example, my first thought was not that there was time wasted by not looking deeply enough into the problem. There was, no question. But even more aggravating, there was help desk and security pros' time wasted tracking down something that should have been documented and communicated throughout the IT department. It'd be nice if everyone were communicating about the infrastructure: the network guy was aware that the proxy server would be affected by the firewall change and passed that along to the appropriate folks, and the help desk was notified that a firewall configuration change had been made and told to watch for any problems that resulted from it. Surely a help desk guy who knows the networking folks made a firewall change, and who suddenly sees 200 tickets relating to Internet access, would connect the dots and hand it off to the networking folks who made the change, instead of to the security guy who hasn't touched the proxy server, right?
Unfortunately, I know from experience that in the real world, this lack of documentation and communication is all too common.
Being one of the network guys, I too lament the lack of communication between teams and departments. Unfortunately, the helpdesk bears the brunt of the users' displeasure when we make a mistake. The problem, as I see it, is that any network/server infrastructure containing more than a few routers, switches and servers will eventually become so complex and crufty that it is impossible for any one person to know and predict the full extent of the fallout when a change is made. I have, on several occasions, made minor changes that caused site-wide outages 🙂 You could argue that if everything were correctly and adequately documented, such mistakes should not occur. I would argue that even with such documentation, given the pressure from management to deliver solutions and services, no engineer will have the time to fully read up on the system he is working on in order to assess the potential ripple effects of a change. There you have it – my $0.02, for what it's worth.