Friday, April 29th, 2011
LogicMonitor is, as far as I know, the most automated network monitoring system out there. But there is one area, which we are often asked about, where we don't provide much automation: automated scripts in response to alerts. There are a few reasons why not, which flow ...
Posted in Uncategorized | 1 Comment »
Friday, October 1st, 2010
One of the difficulties in IT environments is that redundancy can sometimes make outages worse. The problem is that redundancy can often give people (mostly justified) confidence in the availability of their systems, so they design architectures on the assumption that their core switch (or database, or load balancing cluster, ...
Posted in Uncategorized | No Comments »
Friday, January 29th, 2010
When designing infrastructure architecture, there is usually a choice between complexity and fault tolerance. It's not just an inverse relationship, however. It's a curve. You want the minimal complexity possible to achieve your availability goals. And you may even want to reduce your availability goals to reduce your complexity (which ...
Posted in Uncategorized | 1 Comment »
Friday, November 20th, 2009
One question that often arises in monitoring is how to define alert levels and escalations, and what level to set various alerts at - Critical, Error or Warning.
Assuming you have Error and Critical alerts set to notify teams by pager/phone, and Critical alerts with a shorter escalation time, here ...
Posted in Uncategorized | No Comments »
Tuesday, November 10th, 2009
Monitoring System Sprawl
This is often a corollary to the first point, not relying on manual processes. The number of monitoring systems you have in place should approach 1. You do not want one system to monitor Windows servers, another for Linux, another for MySQL, another for storage. Even if they ...
Posted in Uncategorized | No Comments »
Friday, November 6th, 2009
Continuing the series of common datacenter monitoring mistakes...
Alert overload
This is one of the most dangerous conditions. If you have too many noisy alerts that go off too frequently, people will tune them out. Then, when you get real, service-impacting alerts, they will be tuned out, too. I've ...
Posted in Uncategorized | 2 Comments »
Thursday, November 5th, 2009
Continuing on from Part 1
No issue should be considered resolved if monitoring will not detect its recurrence.
Even with good monitoring practices in place, outages will occur. Best practices dictate that the issue not be considered resolved until monitoring is in place to detect the root cause, or provide earlier warning. ...
Posted in Uncategorized | No Comments »
Wednesday, November 4th, 2009
Everyone knows they need monitoring to ensure their site uptime and keep their business humming. Yet many sites still suffer from outages that are first reported by their customers. Here at LogicMonitor, we have lots of experience with monitoring systems of all kinds, and these are some of the most ...
Posted in Uncategorized | No Comments »