I’ve talked about this before, but I just read an article about why application performance monitoring is so screwed up, and coincidentally I had just covered the topic in a lecture on scalable computing that I gave to a graduate class at UCSB, so I figured it’s worth a mention.
The article mentions that “enterprises have confused (with vendor help) the notion of monitoring the resources that an application uses with its performance”. The way I put it in my lecture was that:
So… how to tie one to the other?
Monitor what users care about (page load times, response time per request, etc.).
Also monitor all the limiting resources (CPU; disk IO, or more importantly the percentage of time a drive is busy; network; memory).
And monitor the performance of the systems that affect the limiting resources.
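On Linux, one way to get the "percentage of time a drive is busy" figure is the way iostat computes %util: sample the cumulative "milliseconds spent doing I/Os" counter in /proc/diskstats twice and divide the delta by the elapsed wall-clock time. A minimal sketch, assuming a Linux host; the function names are hypothetical and this is not LogicMonitor's collector.

```python
import time

def io_ticks_ms(diskstats_text: str, device: str) -> int:
    """Return the cumulative 'milliseconds spent doing I/Os' counter for device.

    Each /proc/diskstats line is: major minor name, then the per-device
    statistics; the 10th statistic (split index 12) is that busy-time counter.
    """
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) >= 13 and parts[2] == device:
            return int(parts[12])
    raise KeyError(f"device {device!r} not found")

def busy_percent(ticks_before: int, ticks_after: int, interval_ms: float) -> float:
    """Percentage of the interval during which the device had I/O in flight."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    pct = 100.0 * (ticks_after - ticks_before) / interval_ms
    return min(pct, 100.0)  # clamp small timing skew between the two samples

def sample_busy_percent(device: str, interval_s: float = 1.0) -> float:
    """Read /proc/diskstats twice, interval_s apart (Linux only)."""
    def read() -> str:
        with open("/proc/diskstats") as f:
            return f.read()
    start = time.monotonic()
    before = io_ticks_ms(read(), device)
    time.sleep(interval_s)
    after = io_ticks_ms(read(), device)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return busy_percent(before, after, elapsed_ms)
```

For example, a counter that advances 950 ms during a 1000 ms interval gives `busy_percent(0, 950, 1000) == 95.0`: a nearly saturated drive, even if CPU looks idle.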
So while monitoring InnoDB file system reads does not tell you anything that an end user cares about, it pays off when your monitoring of Tomcat request time shows that users are experiencing poor performance and your logical drives are suddenly 100% busy with request service times increasing: then it’s good to know why. It may be because of InnoDB buffer misses, or it may be because of something else, but having this intermediate data will drastically reduce the time it takes to fix the thing users actually care about: response time.
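As an example of that intermediate data, MySQL reports `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that missed the buffer pool and went to disk) in `SHOW GLOBAL STATUS`. The sketch below computes the miss ratio over an interval from two snapshots. The helper names are hypothetical, and it assumes the status text comes from something like `mysql -B -N -e "SHOW GLOBAL STATUS"`, which prints tab-separated name/value rows.

```python
def parse_global_status(text: str) -> dict:
    """Parse tab-separated 'Variable_name<TAB>Value' rows into a dict of strings."""
    status = {}
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        if sep:
            status[name] = value
    return status

def innodb_miss_ratio(before: dict, after: dict) -> float:
    """Fraction of InnoDB logical read requests in the interval that missed
    the buffer pool and had to be read from disk."""
    requests = (int(after["Innodb_buffer_pool_read_requests"])
                - int(before["Innodb_buffer_pool_read_requests"]))
    misses = (int(after["Innodb_buffer_pool_reads"])
              - int(before["Innodb_buffer_pool_reads"]))
    return misses / requests if requests > 0 else 0.0
```

If this ratio jumps at the same moment the drives hit 100% busy, buffer misses are the likely cause; if it stays flat, look for "something else".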
Another point to note: the “user” in the phrase “monitor what users care about” may not be a human. If a server is a memcached server, the users of that server are web servers, which care about memcached response time, availability and hit rates. So on this class of machine, those are the things to monitor to determine whether the service is meeting the needs of its users.
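For a memcached server, the hit rate can be derived from the `get_hits` and `get_misses` counters returned by the text protocol's `stats` command. Those counters are cumulative since startup, so the sketch below compares two snapshots. It assumes the default port 11211, and the helper names are hypothetical.

```python
import socket
from typing import Optional

def fetch_stats(host: str = "127.0.0.1", port: int = 11211, timeout: float = 2.0) -> str:
    """Send the text-protocol 'stats' command and return the raw reply."""
    buf = bytearray()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return buf.decode("ascii", errors="replace")

def parse_stats(raw: str) -> dict:
    """Turn 'STAT <name> <value>' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats

def hit_rate(before: dict, after: dict) -> Optional[float]:
    """Get hit rate between two snapshots; None if there were no gets."""
    hits = int(after["get_hits"]) - int(before["get_hits"])
    misses = int(after["get_misses"]) - int(before["get_misses"])
    total = hits + misses
    return hits / total if total else None
```

A connection error from `fetch_stats` is itself the availability signal, and timing the call gives a rough response-time figure from the web server's point of view.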
In short, for every machine, identify the “thing(s) to care about” for it; monitor those things; monitor the constrained resources; and monitor all aspects of the systems on that server that impact the constrained resources.
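One way to write that rule down is as a per-role plan that starts from what users feel and only then points at saturated resources and the systems that explain them. This is a hypothetical sketch; the class, function and metric names are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class MonitoringPlan:
    """What to watch on one class of machine. All metric names are hypothetical."""
    role: str
    user_metrics: dict           # "thing to care about" -> alert when value exceeds this
    constrained_resources: dict  # limiting resource -> value treated as saturated
    contributing_systems: dict = field(default_factory=dict)  # resource -> metrics that explain it

def triage(plan: MonitoringPlan, sample: dict):
    """Return (breached user metrics, [(saturated resource, metrics to inspect), ...])."""
    breached = [m for m, limit in plan.user_metrics.items() if sample.get(m, 0) > limit]
    if not breached:
        return [], []  # nothing user-facing to explain
    saturated = [
        (res, plan.contributing_systems.get(res, []))
        for res, limit in plan.constrained_resources.items()
        if sample.get(res, 0) >= limit
    ]
    return breached, saturated

web_plan = MonitoringPlan(
    role="tomcat+mysql web server",
    user_metrics={"tomcat_request_ms": 500},
    constrained_resources={"cpu_pct": 90, "disk_busy_pct": 90},
    contributing_systems={"disk_busy_pct": ["innodb_buffer_pool_miss_ratio"]},
)
```

This sketch only handles "higher is worse" metrics; a memcached plan keyed on hit rate would need a direction flag per metric.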
A more technical article today.
In adding some more Exchange monitoring, we ran into some issues, and solutions, that may help others. Some things in recent Exchange versions can only be monitored via Powershell. (Perfmon, WMI, Powershell: all are needed for different versions of Exchange. I wish they’d make up their minds…)
So the first issue was that Powershell scripts, when called from a LogicMonitor agent, never returned. This wasn’t too hard to solve: simply pass the parameter -inputformat with the (undocumented) option “none”, and the agent can successfully run Powershell commands:
powershell -inputformat none dbstatus.ps1
(Why? The Microsoft.PowerShell.ConsoleHost class constructs a M.PS.WrappedDeserializer passing the STDIN TextReader as one of the parameters. By default, the WrappedDeserializer will call ReadLine() on this STDIN TextReader and wait indefinitely, effectively hanging PowerShell and the calling process. That’s why.)
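The same kind of hang can hit any child process that reads standard input while its parent holds the stdin pipe open, never writes to it, and just waits for the child to exit. Below is a minimal Python sketch (not LogicMonitor's agent code, which is Java) of two complementary defenses: pass `-inputformat none` as above, and also attach the child's stdin to the null device so a blocking read sees end-of-file at once. The helper names are hypothetical, and the demo uses a Python child rather than PowerShell so it runs on any platform.

```python
import subprocess
import sys

def powershell_argv(script: str) -> list:
    """The command line from the post, as an argument list."""
    return ["powershell", "-inputformat", "none", script]

def run_with_empty_stdin(argv: list, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run argv with stdin attached to the null device, so a blocking
    ReadLine()/readline() on stdin gets end-of-file instead of waiting forever."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,  # backstop: a monitoring check should never hang its caller
    )

if __name__ == "__main__":
    # Portable demo: a child that reads one line from stdin, as the console host does.
    child = [sys.executable, "-c",
             "import sys; print('got EOF' if sys.stdin.readline() == '' else 'got data')"]
    print(run_with_empty_stdin(child).stdout.strip())  # prints "got EOF"
    # On a Windows host: run_with_empty_stdin(powershell_argv("dbstatus.ps1"))
```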
So we were past that hurdle, but on to the next one:
>> powershell -inputformat none dbstatus.ps1
Add-PSSnapin : No snap-ins have been registered for Windows PowerShell version 2.
Yet running the exact same command from the command shell on the host running the agent produced the output we expected. And we could see that the Exchange snap-in called by the Powershell script was correctly registered, and in fact worked fine.
But our agent was running on a 32-bit JVM, and Exchange 2010 (in our lab, at least) is installed on 64-bit Windows. The Powershell snap-in was only visible when Powershell was started from a 64-bit app. When I started Powershell from the cmd.exe in SysWOW64, I got the same missing snap-in error that our agent reported.
The solution: it doesn’t matter that our agent was installed as a 32-bit app, in Program Files (x86). What mattered was that the Java virtual machine launched by the agent, which ultimately launched Powershell, was a 64-bit JVM, not the default 32-bit JVM installed from Java.com. (At least, a 32-bit JVM is the default when you browse to Java.com with a 32-bit browser.)
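The underlying rule is that a 32-bit process on 64-bit Windows runs under WOW64, and a program it launches from System32 is redirected to the 32-bit copy in SysWOW64, which is the Powershell that could not see the snap-in. Running the parent as 64-bit, as we did with the JVM, avoids the redirection entirely. Where the parent's bitness cannot be changed, a commonly used alternative (not what we did) is to detect WOW64 and launch Powershell through the Sysnative alias, which Windows Vista/Server 2008 and later expose to 32-bit processes. A hedged Python sketch with hypothetical helper names:

```python
import ntpath
import os
import struct

def process_bits() -> int:
    """Bitness of the current process (32 or 64), from its pointer size."""
    return struct.calcsize("P") * 8

def is_wow64(env=None) -> bool:
    """True for a 32-bit process on 64-bit Windows: Windows sets
    PROCESSOR_ARCHITEW6432 only in the environment of WOW64 processes."""
    env = os.environ if env is None else env
    return "PROCESSOR_ARCHITEW6432" in env

def native_powershell_path(env=None) -> str:
    """Path to the powershell.exe matching the OS bitness.

    From a WOW64 process, System32 is redirected to SysWOW64 (32-bit Powershell);
    the Sysnative alias bypasses that redirection.
    """
    env = os.environ if env is None else env
    system_root = env.get("SystemRoot", r"C:\Windows")
    system_dir = "Sysnative" if is_wow64(env) else "System32"
    return ntpath.join(system_root, system_dir, "WindowsPowerShell", "v1.0", "powershell.exe")
```

In the agent's own terms, HotSpot-based JVMs report their bitness in the `sun.arch.data.model` system property ("32" or "64"), which is a quick way to confirm which JVM the agent actually launched.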
So running the LogicMonitor agent with a 64-bit JVM, and starting Powershell with “-inputformat none”, gives us full access to Powershell output and all its snap-ins, so expect some datasources that take advantage of that to be released very shortly.