×

Tag Archive disk io

I was invited to talk to an MSP peer group the other week, and during the presentation, one of the group members who was a LogicMonitor customer described a way they use LogicMonitor to solve a previously hard-to-solve VMWare operational issue. Read more »

Tags:

I’ve talked about this before, but I just read an article about why application performance monitoring is so screwed up, and coincidentally had just talked about it in a lecture I gave to a graduate class at UCSB on scalable computing, so figured it’s worth a mention.

The article mentions that “enterprises have confused (with vendor help) the notion of monitoring the resources that an application uses with its performance”.  The way I put it in my lecture was that:

  • Systems are limited by Disk IO, memory and less commonly CPU and network.
  • Users dont care about Disk/memory/CPU/network… They care about web pages, and speed.

So… how to tie one to the other?

Monitor both.

Monitor what users care about (page load times, response per request, etc)

Also monitor all the limiting resources (CPU, Disk IO – or more importantly what percentage of the time a drive is busy, network, memory):

And monitor the performance of the systems that affect the limiting resources:

So while monitoring InnoDB file sytem reads does not tell you anything that an end user cares about, if your monitoring of Tomcat request time shows that users are experiencing poor performance, and your logical drives are suddenly 100% busy and request service time increasing, it’s good to know why that is. It may be because of InnoDB buffer misses, or it may be because of something else – but having this intermediate data will drastically reduce your time to correct the issue that users care about – response time.

Another point to note: the “user” in the phrase “monitor what users care about” may not be a human.  If a server is a memcached server – the users for this server are web servers, who care about memcached response time, availability and hit rates.  So on this class of machines, that is the thing to monitor to determine if the service is meeting the needs of users.

In short, for every machine, identify the “thing(s) to care about” for it; monitor those things; monitor the constrained resources; and monitor all aspects of the systems on that server that inmpact the constrained resources.

Tags:

Categories
Popular Posts
Subscribe to our blog.