Even with a great monitoring system, it can sometimes be hard to keep alert noise down. (Indeed, the more powerful the monitoring, the harder this can be, because more data is collected and tested automatically.) Keeping noise down is vital: you do not want staff to start ignoring alerts, which they will do if too many of them are meaningless.
There are, of course, best practices to help with this, but one of the best ways to start attacking alert noise is also one of the easiest: set up a report that highlights where the noise is coming from, and review it once a week.
Under the Reports tab, select New Report, then fill it out as shown below. The important thing is to set the report type to Alert Report.
The magic of the report is in the details:
I suggest setting the report to cover the last week for all hosts (although if you are responsible for only a subset of hosts, by all means restrict the report to the hosts you are alerted about). Exclude alerts that occurred during periods of Scheduled Down Time (SDT), since those alerts would not have been sent out anyway. Then check the Summarize Alert Counts box, and only THEN select sorting by Alert count; that sort order is not available until the Summarize Alert Counts box is checked.
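As a rough illustration of what summarizing alert counts and sorting by count computes, here is a minimal Python sketch. It assumes a hypothetical `Alert` record carrying a host, a datasource, a timestamp, and an SDT flag; these names are illustrative only, not LogicMonitor's data model or API, and the sample alerts are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert record: field names are illustrative only,
    # not LogicMonitor's actual data model or API.
    host: str
    datasource: str
    raised_at: datetime
    in_sdt: bool  # True if the alert fired during Scheduled Down Time


def summarize_alert_counts(alerts, now, window=timedelta(days=7)):
    """Count alerts per (host, datasource) over the last `window`,
    skipping alerts raised during SDT, noisiest source first."""
    cutoff = now - window
    counts = Counter(
        (a.host, a.datasource)
        for a in alerts
        if a.raised_at >= cutoff and not a.in_sdt
    )
    # most_common() with no argument returns every entry, sorted by count.
    return counts.most_common()


# Made-up sample data for illustration.
now = datetime(2024, 1, 8, 6, 0)
sample = [
    Alert("dev1", "IPMI Event Log", now - timedelta(days=1), False),
    Alert("dev1", "IPMI Event Log", now - timedelta(days=2), False),
    Alert("dev1", "IPMI Event Log", now - timedelta(days=3), False),
    Alert("prod5", "MySQL", now - timedelta(days=1), False),
    Alert("prod5", "MySQL", now - timedelta(days=1), True),  # in SDT: skipped
    Alert("web2", "HTTP", now - timedelta(days=9), False),   # too old: skipped
]
print(summarize_alert_counts(sample, now))
# [(('dev1', 'IPMI Event Log'), 3), (('prod5', 'MySQL'), 1)]
```

Grouping by (host, datasource) is an assumption made for the sketch; adding the datapoint to the key would give finer-grained rows.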
Run this report, and you’ll get output like the following:
This makes it easy to see that, in this case, we could eliminate 80% of last week’s alerts simply by changing how the IPMI event logs of one development host are monitored: filtering out those alerts, using SDT, or even disabling that monitoring entirely, given that it’s just a development host.
We can then work through the remaining top noise makers, tuning or disabling their monitoring, or fixing the underlying issue (such as increasing the MySQL cache on prod5.iad). This greatly reduces alert noise with the least amount of work.
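This prioritization is essentially a Pareto-style cut: rank alert sources by count and work down the list until most of the noise is covered. Below is a minimal Python sketch, assuming you already have (source, count) pairs such as the rows of a summarized alert report; the function name, the 80% default target, and the sample counts are hypothetical and not taken from LogicMonitor.

```python
def top_noise_makers(counts, target_share=0.8):
    """Return the fewest (source, count, cumulative_share) rows, noisiest
    first, whose alerts add up to at least `target_share` of the total."""
    ordered = sorted(counts, key=lambda item: item[1], reverse=True)
    total = sum(n for _, n in ordered)
    selected = []
    covered = 0
    for source, n in ordered:
        # Stop once the target share is reached (or if there are no alerts).
        if total == 0 or covered / total >= target_share:
            break
        covered += n
        selected.append((source, n, covered / total))
    return selected


# Made-up weekly counts per source, for illustration only.
counts = [
    ("dev1: IPMI Event Log", 80),
    ("prod5: MySQL", 12),
    ("web2: HTTP", 5),
    ("db1: Disk Usage", 3),
]
for source, n, share in top_noise_makers(counts, target_share=0.9):
    print(f"{source}: {n} alerts, cumulative {share:.0%}")
# dev1: IPMI Event Log: 80 alerts, cumulative 80%
# prod5: MySQL: 12 alerts, cumulative 92%
```

Each row carries the cumulative share, so you can see how much of the week’s noise would disappear if you dealt with every source down to that row.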
Finally, we can have this report emailed to us every Monday, so we stay on top of the issues and keep our monitoring meaningful. That way, we’ll have improved the performance of our systems and cut out most of the alert noise, and when we do get an alert, we can be sure it’s meaningful and that people will react to it.
It’s 6 AM. Bob, an entry-level IT engineer, walks into a cold, dark, lonely building, flips on the lights, fires up the coffee pot, and boots up his computer. Depending on what he’s about to see on his screen, he knows the fate of the free world could rest in his soft, trembling, sun-starved hands.
Well, maybe not the free world, but at least the near-term fate of his company, his company’s clients, and possibly his next paycheck. Bob is the newest engineer at a busy MSP (managed service provider) whose promise to its clients is very simple: your technology will always be up and working!
Fortunately for Bob, his MSP has a great ticketing system, so as soon as his coffee is hot and his hard drive warm, he’ll log in to his ticketing dashboard, right? Wrong! What?! Bob! What are you logging into?! Oh. Your monitoring application? Really?
Really. True story. Dramatized for effect, name changed to protect the reasonably innocent, but true story. Eric Egolf, the owner of CIO Solutions, a thriving MSP, told us about it just last week. “The first thing the new guy does, intuitively, is open up the monitoring portal, before he ever looks at our ticketing system.” And the other engineers are following suit. Egolf says the ticketing system is great, but their comprehensive monitoring solution reveals the actual, real-time IT landscape for their entire client base within seconds. And the most critical problems practically jump off the screen at the engineers, sometimes before a ticket has even been created.
Put an easy-to-use interface on top of that comprehensive monitoring, and Bob can often quickly isolate the problem, ferret out the root cause, and resolve the issue … before the asteroid plummets to earth and destroys America … or at least before a client calls screaming as if it just had.
“LogicMonitor makes my engineers smarter,” claims Egolf. “An entry-level engineer can basically perform all the functions of a mid-level engineer.” And without the increase in pay grade. That keeps costs down and clients up, and while that’s a particular sweet spot for MSPs and cloud providers, the same formula holds true for SaaS/web companies and in-house IT departments. Not good monitoring, but great monitoring, is the answer.
That’s how you make an engineer smarter. Next blog post: How to Make an Engineer the Life of the Party.