
LogicMonitor’s Hosted Monitoring – best support just got better

August 29, 2011 – 5:11 pm

One of the great things about LogicMonitor’s hosted monitoring is the support we can offer.  Because we are hosted monitoring, customers can choose to grant our support staff access to their accounts so we can help them directly; they can chat with an engineer in their portal, or they can email or phone us.

Today we are announcing another support channel: support.logicmonitor.com

This is a community site, where you can post questions, report problems, suggest ideas or even give praise.

The advantage of this support channel is that it accumulates knowledge – so once we (or other community members) have answered a question, the answer will be immediately available for others to find when they ask a similar question.  If a question is posted and there is no existing answer, LogicMonitor staff will be notified and we can answer the question directly.

So we encourage everyone to use this as their first line of support – it should benefit everyone, and we’ll also be using it in the future for some cool contests, like the most interesting LogicMonitor alert and solution of the month. (As a matter of fact, if you have ideas for cool contests, suggest them at support.logicmonitor.com!)

Go check it out!


Is Linux disk utilization on your monitoring dashboard?

August 7, 2011 – 9:01 pm

Recently we rolled out a new release of LogicMonitor. Among the many improvements and fixes that users saw, there were also some backend changes to the Linux systems that store monitoring data.

The rollout went smoothly and no alerts were triggered – but it was pretty easy to see that something had changed: (more…)


The best network monitoring is not on your network

July 1, 2011 – 6:12 pm

There was a good article on techtarget this week about the hesitancy of IT pros to adopt SaaS. The main gist of the article was that SaaS is coming, even into the IT space, which we here at LogicMonitor heartily agree with. We’ve seen much greater acceptance of SaaS as a delivery mechanism for a monitoring service over the last few years.

Of course, the IT professionals in the article still had issues. Some of the arguments against SaaS seem upside down, at least as regards SaaS based datacenter monitoring.

“That outsourcing a lot of computing functionality to hosted services often leads to downsizing of the IT staff itself”.

This may be true in some cases, but even in the recent economic downturn, we haven’t seen that at LogicMonitor.  What we’ve seen is that for companies that are growing, their IT staff is expected to accomplish more.  Pushing out responsibilities for things that are not part of their core focus (such as server and network monitoring) allows them to deliver better service in other areas, by freeing up staff time.  We’ve had customers with LogicMonitor deployments where they have freed up the time of a whole staff person – not resulting in layoffs, but allowing that person to address other issues in the IT backlog.

Ever heard of an IT department without a backlog?

“The fear is that if the Internet goes down, you won’t be able to do your job because the tools won’t work”.

True, but if your Internet connection goes down, you’ll be notified by your monitoring. Yes, you’ll be in the dark about the status of systems while that outage is going on, but you’ll know there is an issue, and it can be addressed. (And with LogicMonitor, the data for all systems will appear once connectivity is restored).  A far more likely scenario is that your premise based monitoring server goes down.  And you don’t know about it, as you don’t have anything monitoring the monitoring server – so it could be down for hours before you even notice.  Or, your internet goes down at night, and the notification messages from your premise based monitoring can’t get out, so you arrive at work in the morning to an outage you didn’t know about and a mass of angry users.

Or you lose one of your datacenters. Power fails, you lose a core network switch, or what have you. With monitoring as a service, you’ll be notified (which you may not be if your premise based monitoring was in that datacenter.) You’ll know if your other datacenters are OK, and if services failed over to other datacenters. (Again, not something you’d know with a premise based system.)  This will give you some breathing room to focus on the failed site, knowing all is well elsewhere (assuming you have DR set up.)

And when you restore power or what have you to the failed datacenter, you’ll know immediately what hosts recovered, what databases started automatically, what storage clusters failed over – or not – without having to first recover your monitoring node and wait for all its services to start.

“Security”

This wasn’t mentioned in the article, but it is an objection we hear (although much less than we used to.)

Again, this is probably an objection that is upside down. I’ve yet to meet any enterprise that restricts physical access to its premise based monitoring servers by keeping them in locked cages with biometric access, 24 hour armed guards; tightly restricts who can log in to their monitoring servers; encrypts all data in the database, so even gaining root access to the database is of no use; and regularly conducts vulnerability assessments against their monitoring.

Most likely their monitoring is running in a server room which many people can access; all IT admins can usually log in as root; and they have no idea about the protection of data within the monitoring server.

We’ve even heard security raised as an objection against hosted monitoring from companies using Salesforce.com to manage their customer relationships.  As if CPU load and disk latency metrics were more valuable to the enterprise than customer and prospect data.  There are valid cases for not adopting SaaS (some finance or government applications), but in general security is raised by IT people taking a fiefdom view of SaaS, rather than really considering the information risks and the benefits that accrue to the company.

So what do you think?

Is SaaS coming to IT?

 


Linux Monitoring, Net SNMP and terabyte file systems

June 11, 2011 – 9:42 pm

Or, how to deal with signed integers in a way that makes sense when doing Linux Monitoring.

A customer contacted us this week and said “Hey, one of my filesystems that was being monitored by LogicMonitor disappeared after I grew it.”  Turns out the filesystem in question was now a bit over 2 terabytes.

Some poking around showed that the file system was being filtered out of discovery, as net-snmp was reporting a size for the file system (via 1.3.6.1.2.1.25.2.3.1.5, hrStorageSize) of -1982127408. Yes, that’s a negative value.

The hrStorageSize object is defined as Integer32 – so it’s really a signed integer. Go above 2147483647 (2^31 - 1) allocation units, and you’ll be in negative territory (as the first bit will be interpreted as the sign.)

So, instead of disk Usage (as a percentage) being calculated as:

  • let StorageSize be the value reported by .1.3.6.1.2.1.25.2.3.1.5 (hrStorageSize) for the filesystem
  • let StorageUsed be the value reported by .1.3.6.1.2.1.25.2.3.1.6 (hrStorageUsed) for the filesystem
  • thus the percentage of disk Usage is: 100*StorageUsed/StorageSize

we can change the formula LogicMonitor uses to calculate the percentage of disk space to:

100*(if(lt(StorageUsed,0),4294967296+StorageUsed,StorageUsed))/
(if(lt(StorageSize,0),4294967296+StorageSize,StorageSize))

which takes account of the fact that anything above 2147483647 will be reported as a negative number, and corrects for it.

In English, the above formula says:

  • if StorageUsed <0, add 4294967296 (2^32) to it
  • if StorageSize < 0, add 4294967296 to it
  • then compute as before: PercentUsage = 100*StorageUsed/StorageSize
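The same correction can be sketched in Python. This is only an illustration of the arithmetic, not the LogicMonitor datasource itself; the function names `as_unsigned32` and `percent_used` are made up for this example.

```python
# Illustrative sketch only: function names are hypothetical, not LogicMonitor's.
UINT32_MODULUS = 2 ** 32  # 4294967296


def as_unsigned32(value):
    """Undo the signed interpretation of a 32-bit value reported by SNMP."""
    return value + UINT32_MODULUS if value < 0 else value


def percent_used(storage_used, storage_size):
    """Percentage of disk usage, correcting either value if it went negative."""
    used = as_unsigned32(storage_used)
    size = as_unsigned32(storage_size)
    return 100 * used / size


# The size reported for the customer's filesystem in this post:
print(as_unsigned32(-1982127408))  # 2312839888 allocation units
```

Values that were never negative pass through unchanged, so the same function can be applied to every filesystem regardless of size.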

We use a similar formula in the graphing definition of the Linux Disk Usage datasource, although there the values are also multiplied by the size of the Allocation Units, so you get an accurate representation of the size of the file system.
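As a rough sketch of that graphing calculation: the allocation unit size is reported by hrStorageAllocationUnits (.1.3.6.1.2.1.25.2.3.1.4), and the 1024-byte unit used below is an assumption for illustration, not a value taken from the customer's system. The helper name `storage_bytes` is likewise hypothetical.

```python
# Illustrative sketch only: storage_bytes is a hypothetical helper name.
def storage_bytes(reported_value, allocation_unit_bytes):
    """Convert a possibly-negative hrStorage value into a size in bytes."""
    if reported_value < 0:
        reported_value += 2 ** 32  # correct for the sign, as above
    return reported_value * allocation_unit_bytes


# Assuming 1024-byte allocation units (an assumption, not from the post):
size = storage_bytes(-1982127408, 1024)
print(size)  # 2368348045312 bytes, a bit over 2 terabytes
```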

We’ve updated LogicMonitor and its core datasource repository, so now all customers will be able to avoid this problem if they deploy terabyte-size filesystems.

This adjustment can be used for other values reported as signed integers when you don’t want them treated as signed.  So, for everyone running into this issue – you don’t need to update net-snmp (which a lot of people seem to be calling for) or define a new MIB object. Just configure your monitoring and graphing systems to correct for the sign, as above.

And if your monitoring systems can’t, well – you can always switch to LogicMonitor.


MySQL Monitoring – Don’t fire the DBA just yet

May 20, 2011 – 10:30 pm

Some people (sometimes even us) have been known to refer to LogicMonitor as a “sysadmin in a box”.  This is not quite true – and not just because we’re a SaaS service, so there is no box.  But while LogicMonitor can certainly let you put off hiring a sysadmin, or let your sysadmin manage vastly more systems than other monitoring systems, by automating the monitoring and providing pre-defined best practices alerts – someone still has to interpret those alerts, and figure out the best way to deal with them.

A case in point:
A customer got the following alert about one of their MySQL databases: (more…)


Network Monitoring .. Not all automation is good

April 29, 2011 – 6:27 pm

LogicMonitor is, as far as I know, the most automated network monitoring system out there.  But there is one area where we don’t provide much in the way of automation, and that we are often asked about – automated scripts in response to alerts.  There are a few reasons why not, which flow from our experience running critical production datacenters:

  • There are many cases where you don’t want automated recovery – you want a human to pinpoint the cause of failure, and ensure the recovery is done safely.   e.g.  after a master database crash, many DBAs don’t want to restart the database without determining the cause, whether transactions need to be backed out, whether slaves are still valid replicas, etc.
  • If a system is important enough to need automated recovery, the right way to do that is to have standby systems, clustered or otherwise available. e.g. multiple web servers behind a load balancer; master-master databases; switches with rapid spanning tree; routers with a rapidly converging IGP (OSPF, EIGRP).
  • If a service or process does need to be automatically restarted on a host, the monitoring system is almost certainly not the right way to do it. Use daemontools or init on Linux, or configure the service to restart in the Services control panel on Windows.  Using the monitoring system to attempt to remediate this will necessarily be a more fragile system than OS level tools.
  • If there are processes that need to be killed and restarted in response to the state of monitored metrics – if memory leaks and grows too much, say (I’m looking at you, mongrel) – then use a tool designed for that – monit, say.

In all these cases, use your monitoring to tell you if your recovery mechanisms are working, not to be the recovery mechanisms.  Monitor the memory usage of your mongrel processes, and alert only if the memory consumption is higher than you expect, for longer than it should be if monit was doing its job, say.
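The "higher than you expect, for longer than it should be" rule can be sketched as follows. This is a minimal illustration of the idea, not how LogicMonitor evaluates alerts; the function name, the sample format and the thresholds are all assumptions.

```python
# Illustrative sketch only: names, sample format and thresholds are assumptions.
def should_alert(samples, limit_mb, max_seconds):
    """Return True if memory has stayed above limit_mb for more than max_seconds.

    samples is a list of (timestamp_seconds, memory_mb) pairs, oldest first.
    """
    breach_started = None
    for timestamp, memory_mb in samples:
        if memory_mb > limit_mb:
            if breach_started is None:
                breach_started = timestamp  # start of the current breach
        else:
            breach_started = None  # memory dropped back: recovery worked
    if breach_started is None:
        return False
    return samples[-1][0] - breach_started > max_seconds


# A short spike that the restart tool cleaned up: no alert.
spike = [(0, 100), (60, 400), (120, 100)]
# Memory stuck high for six minutes: the recovery mechanism is not working.
stuck = [(t, 400) for t in range(0, 420, 60)]
print(should_alert(spike, 300, 300), should_alert(stuck, 300, 300))
```

The point of the time window is that a brief excursion followed by a restart is the recovery tool doing its job, and only a sustained breach is worth waking someone up for.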

Of course, LogicMonitor can trigger automated script actions in response to alerts – you can set an agent inside your datacenter to pull all the alerts, send them to a script, which can do … whatever you can script.  And there are cases where that’s appropriate.  But you should have a good think about your architecture and design before you leap to that as a first resort.


Troubleshooting server performance and application monitoring – a real example.

April 20, 2011 – 7:24 pm

We got a question internally about why one of our demo servers was slow, and how to use LogicMonitor to help identify the issue.  The person asking comes from a VoIP, networking and Windows background, not Linux, so his questions reflect those of a less-experienced sysadmin (in this case). I thought it interesting that he documented his thought processes, and I’ll intersperse my interpretation of the same data, and some thoughts on why LogicMonitor alerts as it does… (more…)


MySQL Linux Tuning talk

April 13, 2011 – 12:48 am

Not really monitoring, but I just finished giving a talk at the MySQL conference.  (It was gratifyingly packed with people, too.)

Thought I’d post the slides here. The summary is:

  • you need to be able to measure and trend on your OWN infrastructure – your kernel, hardware, MySQL version, application. (Of course, if you are using LogicMonitor, that issue is solved.)
  • solve your problems in the simplest way possible.
  • Test different IO schedulers – may not be any benefit, but it’s so easy to do so, you should try.
  • Test different levels of innodb thread concurrency – can make big difference, and easy to test.
  • Eliminate swapping, in the simplest way you can (tuning swappiness; NUMA tricks, then hugepages.)
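Before testing schedulers or swappiness, it helps to record what a host is currently using. The sketch below is not from the slides; it assumes a Linux host, where the kernel lists the schedulers for a device in /sys/block/<device>/queue/scheduler with the active one in square brackets, and exposes swappiness in /proc/sys/vm/swappiness. The function names are hypothetical.

```python
# Illustrative sketch only (not from the talk); assumes a Linux host.
def parse_scheduler_line(line):
    """Split a line like 'noop deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # brackets mark the active scheduler
            active = name
        available.append(name)
    return active, available


def current_scheduler(device):
    """Read the active IO scheduler for a block device such as 'sda'."""
    with open("/sys/block/%s/queue/scheduler" % device) as handle:
        return parse_scheduler_line(handle.read())[0]


def current_swappiness():
    """Read the current vm.swappiness setting."""
    with open("/proc/sys/vm/swappiness") as handle:
        return int(handle.read().strip())


print(parse_scheduler_line("noop deadline [cfq]"))
```

Recording these values alongside your performance graphs makes it possible to tie any change in the trend back to the setting you changed.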

Download the presentation here.

Feel free to post questions.

 


How to select Data Center Monitoring

March 10, 2011 – 7:13 pm

We just put together a new white paper, “How to Select a Data Center Monitoring System” to help enterprises going through this process.  It doesn’t provide any easy answers, but gives you a framework and a lot of questions to consider.  (Much as I’d like the easy answer to be “Use LogicMonitor”, that’s not always the right answer.)

The biggest take away – think in terms of business value, not technical features.


Datacenter Monitoring and responding to Alerts

February 18, 2011 – 10:27 pm

We’ve talked about this before on the blog a few times, but it’s really policies and processes that help make a successful datacenter monitoring deployment.

We’ve collected some of our best practices, from years of our own experience running datacenters, and also from working with a lot of customers, into an Alert Response Best Practices document.

The one key item we urge everyone to adopt, especially during the initial deployment of monitoring, would be a weekly alert review, to decide what alerts need tuning, disabling, or what issues to focus on fixing.
