
How not to do Cisco, VMware and NetApp monitoring

October 29, 2010 – 11:47 am

Under the Imagine Virtually Anything alliance, Cisco, NetApp and VMware have teamed up to deliver a shared virtualized server/network/storage infrastructure that can securely host multiple “tenants.” This seems like A Good Thing, for all the usual reasons virtualization is good (lower costs, improved energy/space efficiency, faster deployments, etc.). Yet they seem to have forgotten (more…)


Preventative SQL Server Monitoring

October 11, 2010 – 12:38 pm

No matter the kind of database – Oracle, SQL Server, MySQL, PostgreSQL, etc. – there are distinct kinds of monitoring for the DBA. There is the monitoring done to make sure everything is healthy and performing well, which allows you to plan for growth, allocate resources, and be assured things are working as they should.

Then there is the kind of in-depth activity that DBAs undertake when they are investigating an issue. This takes far more time, and uses a different set of tools – the query analyzer, profilers, etc. – but can have a large impact, and is where a good SQL jockey can really make a difference. But given the amount of time that can be required to analyze and improve a query, when is it worth it? (more…)


Cisco Switch Monitoring? Isn’t that redundant?

October 1, 2010 – 3:02 pm

One of the difficulties in IT environments is that redundancy can sometimes make outages worse. The problem is that redundancy gives people (mostly justified) confidence in the availability of their systems, so they design architectures on the assumption that their core switch (or database, or load balancing cluster, or what have you) will not go down.

And they even have monitoring.

But they don’t monitor the state of the redundant server or component. So then the redundant server or component fails, or is unplugged, or synchronization fails, or what have you, and stays that way for weeks with no one noticing. Then, when the active server or component fails, the other one is already out of commission – and boom – Bad Things happen.

So if you run redundant supervisor modules in your core switches to get high availability, make sure your Cisco switch monitoring is capable of monitoring them. Same for redundant power supplies.

Same for active-standby NetScalers, or F5 BIG-IPs, or NetApp clusters, or anything else that you want to make sure works when needed.

If it’s not monitored, chances are it won’t be there when you need it.
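The point about watching the standby as closely as the active unit can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the `RedundantPair` type, the severity names, and the sample pairs are all made up, and in practice the two health flags would come from whatever your monitoring polls (SNMP, an API, etc.).

```python
from dataclasses import dataclass


@dataclass
class RedundantPair:
    # Hypothetical record: one active/standby pair and the health of each side.
    name: str
    active_ok: bool
    standby_ok: bool


def pair_status(pair: RedundantPair) -> str:
    """Return an alert level for a redundant pair."""
    if not pair.active_ok and not pair.standby_ok:
        return "critical"  # nothing left to carry the load
    if not pair.active_ok:
        return "error"     # running on the standby; redundancy is gone
    if not pair.standby_ok:
        return "warning"   # still serving, but the next failure is an outage
    return "ok"


pairs = [
    RedundantPair("core switch supervisors", active_ok=True, standby_ok=False),
    RedundantPair("load balancer pair", active_ok=True, standby_ok=True),
]
for pair in pairs:
    print(pair.name, "->", pair_status(pair))
```

The case that matters is the third one: the service looks fine from the outside, so only a check aimed at the standby itself will raise the warning.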


Memcached for the Masses

September 28, 2010 – 3:12 pm

Assuming you’re not Facebook, you probably don’t have large development teams to tweak how your application interacts with memcached, but you may still want to deploy it to help your site’s scalability.

But if you practice safe operations, you will never put anything into production without monitoring.  Particularly nothing that production depends on.

“But production is not dependent on memcached!” I hear you say. “If the memcached slice that is caching a particular lookup is down, the app will just hit the database, and carry on happily.”

This is true, but what happens when your app cannot handle the load unless almost all the memcached slices are up? Then you end up with a big dose of downtime, or horrendous app performance at best.

And how would you know how many requests your memcached systems are offloading from your database? If you weren’t monitoring them, you wouldn’t – so you wouldn’t know when you’d passed that point where you need memcached. (Ideally you’d be plotting the aggregated view of all the memcached instances in your environment.) Nor would you know when CPU load on your memcached systems has become the bottleneck, rather than memory. (Although unless you are Facebook, you probably won’t run into that.) And of course, more than pretty graphs, you need to know when memcached is down, or how many nodes are down.
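As a rough sketch of that aggregated view: memcached's text-protocol `stats` command returns lines of the form `STAT <name> <value>`, including the `get_hits` and `get_misses` counters. The code below parses such replies and combines them across nodes. The node names and sample replies are invented, and how you collect the replies (a socket, a client library) is left out; a reply of `None` stands in for a node that did not answer.

```python
def parse_stats(text):
    """Parse the reply to memcached's `stats` command into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Each statistic arrives as: STAT <name> <value>
        if len(parts) >= 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def summarize(replies):
    """Aggregate the hit ratio across nodes and list nodes that did not answer."""
    hits = misses = 0
    nodes_down = []
    for node, text in replies.items():
        if text is None:
            nodes_down.append(node)
            continue
        stats = parse_stats(text)
        hits += int(stats.get("get_hits", 0))
        misses += int(stats.get("get_misses", 0))
    lookups = hits + misses
    return {
        # Share of lookups answered from cache, i.e. kept away from the database.
        "hit_ratio": hits / lookups if lookups else None,
        "nodes_down": nodes_down,
    }


# Invented sample replies for three hypothetical nodes.
sample = {
    "cache1:11211": "STAT get_hits 900\r\nSTAT get_misses 100\r\nEND\r\n",
    "cache2:11211": "STAT get_hits 700\r\nSTAT get_misses 300\r\nEND\r\n",
    "cache3:11211": None,
}
print(summarize(sample))
```

Note that these counters are cumulative since each memcached process started, so a real monitor would graph the change between polls rather than the lifetime totals.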

So even if you don’t use LogicMonitor’s memcached monitoring, be sure to practice safe ops, and get some monitoring.

Want to learn more about monitoring, and how to practice safe operations in your environment? Schedule a free consultation with one of our experienced operations staff.


Monitoring NetApp MultiStore

September 24, 2010 – 10:19 am

One of the many great things about working at LogicMonitor is that we get to use our product to solve business needs. Personally, I like having someone tell us “I need to monitor X”, and being able to deliver custom monitors for X within a few hours.

One example this week was a customer wanting to monitor NetApp’s MultiStore capabilities. I’ve worked with NetApp hardware for over 10 years, but had never used MultiStore. (NetApp’s MultiStore lets you easily create separate and completely private logical partitions in filer network and storage resources. It’s like VMware for NetApps.) MultiStore is getting more attention recently as a good way to bring NetApp’s strengths to cloud provisioning.

Despite LogicMonitor’s excellent NetApp monitoring, we didn’t have anything to monitor vfilers created when MultiStore is used. However, we fired up some vfilers in our lab, and using LogicMonitor’s debugging tools to query the NetApp API, we found the relevant metrics, created a datasource with Active Discovery of all vfilers, graphs and alerts, and voila! Within an hour all our current and future customers had their NetApp monitoring capabilities extended, automatically! (Another reason to love SaaS!)

Even cooler, using LogicMonitor’s role-based access control, you can give users the same visibility in LogicMonitor as they have on their vfiler. So an administrator delegated control of only a single vfiler can be set in LogicMonitor to see just his own vfiler and its volumes and their usage and performance, while others can see all the vfilers, and latency of all volumes on the physical system.

Being able to deliver things that exceed customers’ expectations, about things that you didn’t know existed, in less than an hour – fun day at work!


Linux Monitoring is dead.

August 31, 2010 – 4:13 pm

Long live Linux monitoring.

By which I mean that, unless you are a kernel developer or some other individual with esoteric purposes, having Linux up and running is not the point of your servers.  Your servers are there to DO something, whether that’s to serve web pages, answer database requests, or provide the best hosted monitoring service.

So…what do you monitor? (more…)


VMware monitoring webinar to watch

August 21, 2010 – 5:19 pm

Just a quick note to say that the recording of the webinar we gave on virtualization monitoring (VMware monitoring and XenServer monitoring) is up on LogicMonitor.com.

It’s a quick (about 10 minutes) look at why it’s particularly important to have a unified monitoring system that covers the virtualization infrastructure, the guest OSs, the applications on the guest OSs, and the storage, all in one place. Otherwise you can end up with people without sufficient information chasing all sorts of problems that would be immediately identifiable with a unified system.

Check it out, and if you have any questions, feel free to email us at info@logicmonitor.com.

There were quite a few questions at the end of the webinar, but I didn’t include them in the recording, else we would have had to post it somewhere other than the World’s Shortest Webinars. :-)


3 Simple steps to Apache Monitoring

August 13, 2010 – 1:04 pm

If you’re reading this, you know you should be monitoring your Apache web servers. (You want to know if they are approaching the limit of configured server workers; you want to know how many requests you are serving; you want to ensure availability, etc.) Fortunately, enabling Apache monitoring is quite simple.

1. Make sure you are loading the mod_status module.

If you are using a version of Apache that was installed by your OS’s package manager, there are OS-specific ways to enable modules.

For Ubuntu/Debian:

/usr/sbin/a2enmod status

For Redhat/CentOS: Just uncomment the line:

LoadModule status_module modules/mod_status.so

in /etc/httpd/conf/httpd.conf

For SUSE derivatives:

Add “status” to the list of modules on the line starting with APACHE_MODULES= in /etc/sysconfig/apache2

2. Configure the mod_status module.

You want the following to be loaded in your apache configuration files.

ExtendedStatus On
<Location /server-status>
 SetHandler server-status
 Order deny,allow
 Deny from all
#Add LogicMonitor agent addresses here
 Allow from localhost 192.168.10.10
</Location>

Where you set that configuration also changes depending on your Linux distribution.

/etc/apache2/mods-available/status.conf on Ubuntu/Debian
/etc/httpd/conf/httpd.conf on Redhat/CentOS
/etc/apache2/mod_status.conf on OpenSUSE/SLES

Finally, restart Apache using your OS startup script (/etc/init.d/httpd restart or /etc/init.d/apache2 restart). Note that using the OS startup script is often necessary to allow the OS-specific script files to assemble the final Apache config. Sending Apache signals, or using apache2ctl, does not do this.
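Once Apache is back up, a quick way to confirm the handler works is to request the machine-readable form of the status page, `/server-status?auto`, from an allowed address. The sketch below parses that `Key: value` output and derives worker utilization. The sample text is made up for illustration, and `fetch_status` assumes the page is reachable at the given URL from where the script runs.

```python
from urllib.request import urlopen


def fetch_status(url="http://localhost/server-status?auto"):
    """Fetch the raw status text; the URL is an assumption about your setup."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("ascii", errors="replace")


def parse_server_status(text):
    """Parse the `Key: value` lines of /server-status?auto into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def worker_utilization(fields):
    """Busy workers as a fraction of the workers currently running."""
    busy = int(fields["BusyWorkers"])
    idle = int(fields["IdleWorkers"])
    total = busy + idle
    return busy / total if total else 0.0


# Invented sample in the format mod_status produces with ExtendedStatus On.
sample = """Total Accesses: 1200
Total kBytes: 3400
Uptime: 600
BusyWorkers: 3
IdleWorkers: 7
Scoreboard: _W_W_W____
"""
fields = parse_server_status(sample)
print(fields["Total Accesses"], worker_utilization(fields))
```

This measures utilization against the workers currently running, not against the configured maximum, so compare it with your worker limits when judging headroom.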

3. Watch the monitoring happen.

If you are using LogicMonitor’s Apache monitoring, then you’re done.  LogicMonitor will automatically detect the Apache web server, and apply appropriate monitoring and alerting, as well as alerting and graphing on the rest of the system, so you can correlate CPU, interface and disk load to Apache load.

One thing you may want to customize is your dashboards – add a widget that collects all Apache requests/second, from all hosts, or all production hosts, and aggregates them into a single graph.  Using LogicMonitor’s flexible graphs, the graph will automatically include new servers as you add them.

Apache Requests per second
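For readers curious what such an aggregate involves, here is a minimal sketch, not LogicMonitor's implementation. It assumes you poll each host's `Total Accesses` counter from mod_status at known times, turns two consecutive samples into a per-host rate, and sums the rates. The host names and sample values are invented.

```python
def requests_per_second(prev, curr):
    """Rate between two (timestamp_seconds, total_accesses) samples of one host."""
    (t0, a0), (t1, a1) = prev, curr
    if t1 <= t0 or a1 < a0:
        # Bad timestamps, or the counter went backwards (Apache restarted):
        # there is no meaningful rate for this interval.
        return None
    return (a1 - a0) / (t1 - t0)


def aggregate_rate(samples):
    """Sum per-host rates; hosts without a valid rate are skipped."""
    rates = (requests_per_second(prev, curr) for prev, curr in samples.values())
    return sum(rate for rate in rates if rate is not None)


# Invented samples: host -> (previous sample, current sample), taken 60 s apart.
samples = {
    "web1": ((0, 1000), (60, 1600)),  # 600 requests in 60 s
    "web2": ((0, 5000), (60, 5300)),  # 300 requests in 60 s
    "web3": ((0, 9000), (60, 120)),   # counter reset after a restart, skipped
}
print(aggregate_rate(samples))
```

Working from counter deltas rather than the `ReqPerSec` field matters because `ReqPerSec` is an average over the server's whole uptime, which hides short spikes.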

Want to make your Apache monitoring simple? Check out LogicMonitor for monitoring that automates your monitoring setup. Free Trial


Is your network monitoring software deadly?

July 27, 2010 – 1:02 pm

So maybe the consequences of an outage in your infrastructure will not be as calamitous as the BP oil spill in the Gulf, but the effect on your enterprise may feel like it. Which is why we could all use BP as a case study in how not to treat your monitoring.

News reports detail allegations that some of the alarms on the failed rig were “inhibited”, because “they did not want people to wake up at 3 a.m. due to false alarms”.  The rig’s operator responded, “Repeated false alarms increase risk and decrease rig safety”.
(more…)


World Cup Soccer Monitoring

July 1, 2010 – 10:05 pm

You (or at least I) wouldn’t think World Cup fever would affect the more mundane world of data center monitoring. Unless you work for Twitter, or some other web presence overloaded by unprecedented surges in network traffic due to people checking in on, and commenting about, the World Cup. Twitter found that having a network that “wasn’t appropriately being monitored” supporting two critical components was not a good idea, and led…to an outage (or several fail whales).

Twitter is addressing issues, including their monitoring. (As we’ve noted before, if monitoring did not alert about an incident at the earliest possible time, the incident should never be closed until the monitoring is improved to warn of the issue in advance.)

In our view, the best monitoring approach is to have your monitoring software assume everything should be monitored to production levels, automatically, and not need your Ops team to tell it.

Repel your own flying whales by using data center monitoring automation as much as possible.  Let the humans go play soccer, or blow their vuvuzelas!
