The beta version of the new Alerts UI has been available to LogicMonitor users for a few months now. If you haven’t tried it yet, you should (keep reading to find out how). But even those who’ve tried the sexier new UI may not have noticed all the sweet new features we’ve introduced along with it.
See the information that’s important to you
The current UI can tell you a lot about your alerts at a glance. But sometimes that’s too much information, sometimes not enough, and sometimes it’s just not the right information. (Who knew monitoring had so much in common with Goldilocks!) In the new UI, we’ve added four additional columns of data (Escalation Chain, Alert Rule, Cleared On, and In SDT) and we allow you to add, remove, and arrange columns in any way that makes sense for you. Just click the Settings button above the Alerts table and choose “Manage Columns”. You’ll be able to select, unselect and drag the column options however you choose.
Giddy with all the new information you have access to, but don’t have enough pixels to show it all? Click that ‘Settings’ button again and you can make the font size smaller and even choose how many alerts to show at one time. Plus, you can drag columns to whatever width you like.
Filter on just about anything
The four most common filters (Group, Device, Datasource, Datapoint) are readily apparent at the top of the alerts table. And clicking the “more” filter dropdown gives you access to six additional filters (Alert Rule, Escalation Chain, Severity, Acknowledged, In SDT, and Cleared).
The Cleared filter (and “Cleared On” column) allows you to see the last seven days of cleared alerts (or cleared and active alerts), which is pretty helpful when you want to get a feel for alert response times or just how often a particular alert has been occurring on a particular device.
For a faster, broader search, you can use the new “Search Anything” tool, which searches across all visible columns.
Everyone has a different workflow when it comes to alerts. And you need specific information at different times. The new column and filtering options are flexible enough to handle those situations. You’ll be able to quickly find the information you want so you can know what’s going on and fix it.
View and copy error messages with ease
Sometimes the little things make a big difference. The current UI gives you error messages in a hover tip, which makes it impossible to select the message for copying (as soon as you move your mouse, the tip disappears).
In the new UI, we’ve made a concerted effort to not depend on secondary mouse interactions (right-click and hover, specifically), and the alert messages are no exception. Just click on any alert to see the new Alert Details screen, which not only provides the message in a more readable format, but also makes it selectable and, yes, copyable.
All the important stuff is just a click away
Alert Details (above) allows you to make notes, schedule downtime (either for the selected alert, or the entire device), acknowledge the alert, or escalate the alert to the next person in the relevant Escalation Chain. And if you receive alerts by email or SMS, the link provided in those messages will give you the same easy Alert Details screen in a mobile-friendly format.
There are other little UI helps like row highlighting for easier readability, but part of the joy is in the discovery, so we won’t ruin the surprise. If you haven’t seen the new, better, sexier alerts page yet, go to Settings > Roles and Users. Edit your user profile (or ask your admin to do it) and select “Use New UI”. Once you’ve saved your user and refreshed the page, you’ll see a link at the top of the page that says “toggle UI”, which will allow you to switch back and forth between the new and the current UI.
Once you’ve used the beta version of the new Alerts page, let us know what you think by clicking the blue “Feedback” button on the right edge of the screen. Our goal has been to make the Alerts page simpler, easier, and perhaps even a little more enjoyable (which is not a word usually associated with alerts). We’d love to know if we’re on the right track and what we can do to make it even better.
Even with a great monitoring system, it can be hard sometimes to keep the noise down. (Indeed, the more powerful the monitoring, the more difficult this can be, as more data is collected and tested, automatically.) And keeping noise down in monitoring is vital, as you do not want staff to start ignoring alerts – which they will if there are too many meaningless alerts.
There are of course best practices to help with this process, but one of the best ways to start attacking your alert noise is also one of the easiest – simply set up a report to highlight where the noise is coming from, and review it once a week.
Under the Reports tab, select New Report, then fill it out as shown below. The important thing is to set the report type to Alert Report.
The magic of the report is in the details:
I suggest setting the report to cover the last week, for all hosts (although if you are responsible for only a set of hosts, by all means change the report to reflect only those you are getting alerted about). Exclude alerts that occurred during periods of Scheduled DownTime (those alerts would not have been sent out anyway). Check the Summarize Alert Counts box, THEN select the sort method of sorting by alert count. (This sort order is not available until the Summarize Alert Counts box is checked.)
Run this report, and you’ll get output like this:
This makes it very easy to see that, in this case, we could eliminate 80% of the alerts for the last week simply by changing the monitoring on the IPMI event logs of one development host: filtering out alerts, using SDT, or even disabling that monitoring, given it’s just a development host.
We can then work through the top noise makers, tuning, disabling, or fixing issues (such as increasing the MySQL cache on prod5.iad), which will greatly reduce the amount of alert noise with the least work.
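If you want to see what the summarized report is doing, here is a minimal Python sketch of the same idea: count alerts per host and datasource, sort by count, and show each entry's share of the total. It is not LogicMonitor's implementation or API. The function name, the record fields, and the sample data (including the host name dev1 and the alert counts) are all made up for illustration.

```python
from collections import Counter


def summarize_alerts(alerts):
    """Count alerts per (host, datasource), sorted by count, highest first.

    `alerts` is a list of dicts with hypothetical "host" and "datasource"
    keys. Each returned row is (host, datasource, count, percent_of_total).
    """
    counts = Counter((a["host"], a["datasource"]) for a in alerts)
    total = sum(counts.values())
    return [
        (host, datasource, n, round(100.0 * n / total, 1))
        for (host, datasource), n in counts.most_common()
    ]


# Hypothetical week of alerts: one noisy development host dominates.
sample_alerts = (
    [{"host": "dev1", "datasource": "IPMI event log"}] * 8
    + [{"host": "prod5.iad", "datasource": "MySQL"}] * 2
)

for row in summarize_alerts(sample_alerts):
    print(row)
```

With this sample data the top row accounts for 80% of the alerts, which is the kind of entry worth tuning first.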
And then we’ll get this report emailed to us every Monday, so we can stay on top of the issues, and keep our monitoring meaningful. That way, we’ll have improved the performance of our systems, eliminated any alert noise, and if we do get an alert – we can be sure it’s meaningful, and that people will react to it.
Last night our ops team (of which I am a member) got paged about the CPU load on a Cisco 3560 switch in a new datacenter, late at night. My initial reaction was “We don’t need this alert escalated to pagers or phones- 3560’s switch and route in hardware, so CPU load doesn’t matter.” Once I’d woken up a bit more, the corollary – that there is no possible way that this switch should be at a CPU level to trigger an error alert – occurred to me.
We recently had a customer come into trial looking around for a new monitoring solution. This is always good for us. We love the takeaway. (Customers defecting from other monitoring systems to us.) As in most takeaway situations, this customer had specific needs. Now there are the obvious ones in which LogicMonitor easily fits the bill, such as alerting, dashboards, performance monitoring, etc. (and if you fall into that VMware, Cisco, NetApp sweet spot, game over!). This guy, however, had a very specific need we didn’t fulfill directly out of the gates. I think anyone who has ever worked with a monitoring solution knows that it’s hard to find one that does everything. Well, in the case of LogicMonitor this is no different. We don’t do EVERYTHING. I know, you thought I was going to get all high and mighty and talk about how LogicMonitor is the one monitoring tool that CAN do everything. Well
We received some alerts tonight that one Tomcat server was using about 95% of its configured thread maximum.
The Tomcat process on http-443 on prod4 now has 96.2 % of the max configured threads in the busy state.
These were SMS alerts, as that was close enough to exhausting the available threads to warrant waking someone up if needed.
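As a rough illustration of the check behind an alert like this, here is a small Python sketch that turns busy and maximum thread counts into a percentage and maps it to an alert level. The thread counts and both thresholds are assumptions chosen for the example, not the values configured in our monitoring, and the function names are hypothetical.

```python
def thread_utilization(busy_threads, max_threads):
    """Return busy threads as a percentage of the configured maximum."""
    if max_threads <= 0:
        raise ValueError("max_threads must be positive")
    return 100.0 * busy_threads / max_threads


def alert_level(percent_busy, warn_at=80.0, page_at=95.0):
    """Map a utilization percentage to a hypothetical alert level."""
    if percent_busy >= page_at:
        return "page"
    if percent_busy >= warn_at:
        return "warn"
    return "ok"


# Hypothetical counts that happen to produce the 96.2% seen in the alert.
pct = thread_utilization(busy_threads=481, max_threads=500)
print(f"{pct:.1f}% busy -> {alert_level(pct)}")
```

Keeping the paging threshold close to exhaustion, with a lower warning threshold beneath it, is one way to make sure only the urgent case wakes someone up.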
The other alert we got was that Tomcat was taking an unusually long time to process requests, as seen in this graph: