My grandpa loved cars. He worked on them with a level of passion most people reserve for things like expensive red wines and members of the opposite sex. He didn’t believe in outsourcing the care and maintenance of his wheels.
So I was shocked when one day he announced that changing his own oil was senseless. He was prideful, but he also valued his time and was adept at basic math: 4 quarts + 1 filter + 1 oil pan + 1 jack + 10 greasy fingernails + 2 trips to the auto parts store + 3 hours labor was not less than $29.99 + 45 minutes of watching television in the lobby at Oil & Tune.
This same general equation comes to mind when we hear tales of people instrumenting their own network monitoring solutions with open-source tools (see price comparison chart). When you factor in not just software costs, but hardware costs, and people costs to maintain everything, open source monitoring tools can quickly become more costly than a SaaS-based monitoring solution like LogicMonitor. (For more detail, download the network and server monitoring comparison whitepaper.)
You don’t have to take our word for it. This recent Twitter exchange between a Nagios fan* and a LogicMonitor client illustrates the difference in philosophies.
@NagiosFan: #Nagios is awesome Except for the parts that are terrible and inexcusable. But mostly awesome.
@LogicMonitorUser: @NagiosFan I cannot disagree more. Too much work for not enough gain. But we each value things differently #nagios is not for me.
@NagiosFan: @LogicMonitorUser haha that’s ok I like the extensibility and the initial ‘crafting’ for gains later. Plus, hella-automatable. What do you use?
Of course there are use cases where building your own monitoring tool makes sense. But for the greater percentage of SysAdmins, IT departments, and CTOs out there, LogicMonitor has done the hard work for you, and serves it up on a silicon platter.
You may want to take a minute and do the math, like grandpa finally did. Then take your overalls off, put your toolbox away, grab a cup of coffee, and fire up a free trial…
*Twitter exchange was excerpted and the @names changed
– This article was contributed by Blake Beltram, Community Evangelist at LogicMonitor
Sometimes the truth hurts. Well the truth is what we didn’t find at DevOps Days was a throng of adoring fans waiting to throw their undergarments at us. Come to think of it, that would be kind of gross anyway, especially with the DevOps crowd…no disrespect.
What we did find was:
a) our marketing table nestled so close to our competitor’s that…if our tables had been teenagers, we would have sent them to the Principal’s office (see PHOTO below…with competitor’s name shamelessly Photoshopped out and replaced with ours) … and,
b) a lot of companies and DevOps teams that were fairly embedded in their custom-rigged, hard-fought and hard-won monitoring solutions.
In our last blog post we talked about the “suck” factor in monitoring. Well, maybe for some, blessed with sizable IT budgets and IT brains, monitoring doesn’t suck so bad at all. In fact maybe for those who take pride in their ability to cobble together a patchwork of complex solutions into one grand “comprehensive” solution, it’s sort of a way of life… a job within a job, a golden chalice, a worthy opponent for any Real Mensa up to the task.
When I was a kid I entered a Soapbox Derby – a racing event where the entrants spend the better part of a year (usually with their dads) making, honing, tweaking, and polishing their own motorless downhill race cars. Well I was new in town and my dad was busy with a new job, so I saved up and bought a Soapbox Derby Car from an enticing ad in the back of Popular Mechanics. The car was amazing. It was beautiful, took me fifteen minutes to put together, and with very little time, effort, or expense I placed an easy second in the popular Derby out of more than three dozen entrants. I loved it.
When, on the trophy stand, I told everyone I’d bought the car, they called an emergency meeting and, despite having no written rule to back up their judgement…took the trophy right out of my hands and disqualified me from the race. My car was arguably better, faster, sleeker and more attractive than most of the others in the field, but I hadn’t spent hundreds of hours and piles of money and put the requisite amount of blood, sweat and tears into it… so it didn’t count.
Sometimes the truth hurts. Well, the truth is I just completely made up that story. Sorry, but I was searching for something analogous to what we didn’t find at DevOps Days and that fake memory seemed to kind of fit. It seemed richer (and more fun) than just coming straight out and saying, “When I was out last week I went to DevOps Days – an event where the participants spend a good part of their year (usually with their team) searching, honing and tweaking a multitude of products like Nagios, Cacti, collectd + graphite + pnp4nagios, Munin, etc. to create their own monitoring solution…” and so on.
Plus, admit it, it conjured up a nice little twinge of boyhood nostalgia for a few seconds, didn’t it? Oh well, it did for me. It also caused me to realize what to do with the rest of this quarter’s marketing & event budget – we’re taking out a full page ad in the back of Popular Mechanics.
There has been some interesting discussion around “Monitoring Sucks” for a while now. (Go check the Twitter hashtag #monitoringsucks.) This is not a new opinion – the fact that I thought monitoring sucks is why I started LogicMonitor.
But it’s interesting to assess whether LogicMonitor meets the criteria for not sucking. Clearly our customers think we have great monitoring – but probably only 30% of our customers are SaaS type companies, and may or may not have the DevOps mentality.
So the initial criteria for why monitoring sucks, at least on the referenced blog post, were:
But does monitoring REALLY suck? Heck no! Monitoring is AWESOME. Metrics are AWESOME. I love it. Here's what I don't love:
- Having my hands tied with the model of host and service bindings.
- Having to set up "fake" hosts just to group arbitrary metrics together
- Having to either collect metrics twice - once for alerting and another for trending
- Only being able to see my metrics in 5 minute intervals
- Having to chose between shitty interface but great monitoring or shitty monitoring but great interface
- Dealing with a monitoring system that thinks IT is the system of truth for my environment
- Perl
Let’s look at these points from the point of view of LogicMonitor
Having my hands tied with the model of host and service bindings. I’m not sure how you avoid tying someone’s hands to some degree, but LogicMonitor certainly tries to give flexibility. Services do generally have to be associated with hosts – but they can be associated by all sorts of things (hostname, group membership, SNMP agent OID, system description, WMI classes supported, kernel level, etc.).
Having to set up “fake” hosts just to group arbitrary metrics together. LogicMonitor avoids this mostly with custom graphs on dashboards, which allow you to group any metric (or set of metrics based on globs/regex’s) with any other set, filtered to the top 10, or not; aggregated together (sum, max, min, average) or not. Also, some meta-services are associated with groups, not hosts, to allow alerting on things like number of servers providing a service, rather than just whether a specific host is successfully providing the service.
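To make the idea concrete, here is a rough sketch of grouping arbitrary metrics by a glob pattern, optionally keeping only the top N, and then aggregating them. This is not LogicMonitor code or its API; the function names, the metric naming scheme, and the sample values are all made up for illustration.

```python
from fnmatch import fnmatchcase
from statistics import mean

# Hypothetical aggregation choices mirroring the ones named above.
AGGREGATORS = {"sum": sum, "max": max, "min": min, "average": mean}


def select_metrics(metrics, pattern, top=None):
    """Return metrics whose names match the glob, highest value first.

    If `top` is given, keep only that many of the highest values.
    """
    matched = {name: value for name, value in metrics.items()
               if fnmatchcase(name, pattern)}
    ranked = sorted(matched.items(), key=lambda item: item[1], reverse=True)
    if top is not None:
        ranked = ranked[:top]
    return dict(ranked)


def aggregate(metrics, how):
    """Collapse a group of metrics into one number (sum, max, min, average)."""
    return AGGREGATORS[how](list(metrics.values()))


# Made-up sample data: no "fake" host is needed to group these.
SAMPLE = {
    "web01.cpu": 40.0,
    "web02.cpu": 80.0,
    "db01.cpu": 60.0,
    "web01.mem": 70.0,
}

web_cpu = select_metrics(SAMPLE, "web*.cpu")
busiest_two = select_metrics(SAMPLE, "*.cpu", top=2)
web_cpu_average = aggregate(web_cpu, "average")
```

The point of the sketch is only that the grouping lives in the graph definition (a pattern plus an aggregation), not in the host model.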
Having to either collect metrics twice – once for alerting and another for trending. We certainly don’t require that. Any datapoint that is collected can be alerted on, graphed, both or neither. (Sometimes datapoints are collected as they are used in other calculated datapoints, derived from multiple inputs.)
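As a loose illustration of "collect once, then alert, graph, both or neither", the sketch below models a datapoint with an optional alert threshold and a graphing flag, plus one calculated datapoint derived from two raw inputs. The `Datapoint` class, its field names, and the numbers are hypothetical and are not LogicMonitor's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datapoint:
    """One collected (or calculated) value. Hypothetical model for illustration."""
    name: str
    value: float
    alert_above: Optional[float] = None  # None means "do not alert on this"
    graphed: bool = True                 # False means "do not graph this"

    def is_alerting(self) -> bool:
        return self.alert_above is not None and self.value > self.alert_above


# Raw datapoints are collected once. Here they are neither graphed nor
# alerted on; they exist only to feed the calculated datapoint below.
used = Datapoint("disk_used_bytes", 90e9, graphed=False)
total = Datapoint("disk_total_bytes", 100e9, graphed=False)

# A calculated datapoint derived from multiple inputs, both graphed and alerted on.
percent_used = Datapoint(
    "disk_used_percent",
    100 * used.value / total.value,
    alert_above=85.0,
)
```

Nothing here is collected a second time: the same values drive the alert check and the graph.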
Only being able to see my metrics in 5 minute intervals. Again, we don’t impose that restriction – you can specify the collection interval for each datasource, from 1 minute to once a day. (I know going to only 1 minute resolution is not ideal for some applications – but as a SaaS delivery model, we currently impose that limit to protect ourselves, until the next rewrite of the backend storage engine, which should remove that.)
Having to chose between shitty interface but great monitoring or shitty monitoring but great interface. I think we have a pretty good interface and great monitoring. Certainly our interface is orders of magnitude better than it was when we launched, and a lot of people give us kudos for it. But there’s lots of room for improvement.
Dealing with a monitoring system that thinks IT is the system of truth for my environment. LogicMonitor thinks it is the truth for what your monitoring should be monitoring – but it’s willing to listen. It’s easy to use the API to put hooks into Puppet, Kickstart, etc. that automatically add hosts to monitoring, assign them to groups, and so on. We’re looking at integration with Puppet Labs’ MCollective initiative and other things to get further along on this issue.
Perl. Our collectors are agnostic when it comes to scripting. They support collection and discovery scripts in the native languages of whatever platform they are running on – so VBScript, PowerShell, C# on Windows; bash, Ruby, Perl, etc. on Linux. But as our collectors are Java based, we encourage Groovy as the scripting language for cross-platform goodness. The collectors expose a bunch of their own functionality (SNMP, JMX, expect, etc.) to Groovy, which makes a lot of things very easy. So it’s the language we use for writing and extending datasources for our customers. But if Perl is your thing, keep at it.
So, does LogicMonitor suck? I don’t think so, and hopefully DevOps Borat does not either.
I’ll be at the DevOps Days conference in Austin this coming week (LogicMonitor is sponsoring), so hopefully we’ll get some more feedback there.
Or post below to let us know what constitutes “non-sucky” monitoring.
IT people by nature are supposed to be gurus. They’re supposed to be able to build things from scratch. This expectation certainly applies to data center monitoring, where a common practice is to rely on open source monitoring tools such as Nagios. But when you consider the value of your time, these free tools can quickly wind up being far more costly than commercial tools. For instance, we did a survey and found that some system admins had spent over 100 hours getting their open source monitoring solution to do what they wanted. Further, there was ongoing work to keep the system up to date with frequent changes in their data center, and even then they had, for the most part, only coarse-level monitoring (for example, monitoring only the CPU load of a load balancer, instead of monitoring the state of all the hundreds of VIPs on the load balancer).
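If you want to run the numbers yourself, the back-of-the-envelope arithmetic looks something like the sketch below. Only the 100 setup hours comes from the survey mentioned above; the hourly rate, the monthly upkeep hours, the monthly fee, and the half-hour setup for the commercial tool are placeholders, not real pricing or measured figures, so plug in your own.

```python
def diy_cost(setup_hours, monthly_upkeep_hours, hourly_rate, months):
    """Labor cost of building and maintaining a free tool yourself."""
    return (setup_hours + monthly_upkeep_hours * months) * hourly_rate


def paid_cost(monthly_fee, setup_hours, hourly_rate, months):
    """Subscription fees plus whatever setup time the paid tool still needs."""
    return monthly_fee * months + setup_hours * hourly_rate


# Placeholder inputs: replace with your own numbers.
HOURLY_RATE = 50.0   # assumed cost of an admin hour
MONTHS = 12          # comparison period

diy_total = diy_cost(setup_hours=100, monthly_upkeep_hours=5,
                     hourly_rate=HOURLY_RATE, months=MONTHS)
paid_total = paid_cost(monthly_fee=300.0, setup_hours=0.5,
                       hourly_rate=HOURLY_RATE, months=MONTHS)

print(f"DIY:  {diy_total:,.2f}")
print(f"Paid: {paid_total:,.2f}")
```

The totals mean nothing beyond the inputs you feed in; the useful part is seeing that "free" software still has a labor line item.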
When the only alternatives were costly enterprise-class monitoring solutions, sweating it out with open source was understandable. But now that there are affordable tools that automate configuration and give you everything you need in 30 minutes, insisting on building your own doesn’t seem wise (especially in this era of understaffed data centers). At the root of this DIY mentality is pride. With so many open source options available, techies probably feel some sense of shame or embarrassment going to an IT director and asking for tools that cost money.
I’d suggest a better source of pride is being able to spend time on tasks that add value to the enterprise – writing Puppet scripts that automate machine and software deployments, and so greatly reduce the time to spin up machines; investigating cloud usage options; correlating resource expenses with revenue per business unit. There are a lot of things that should be done in any enterprise but are not, for lack of time. A good systems administrator’s time is very valuable – far too valuable to spend going through a MIB to figure out which item is important to monitor.
And no matter how good a systems administrator you are, monitoring is not going to be your top priority (nor should it be). You’ll get monitoring going “good enough” – but there will be lots of cases it fails to alert on, where a comprehensive monitoring system would have. Then after every outage, you’ll have to go back and extend the monitoring, adding in metrics that could have helped predict the specific case.
So consider the cost of your time; the more in-depth monitoring that you get immediately with LogicMonitor (a typical Nagios implementation may monitor 10 metrics on a Linux host; a typical LogicMonitor deployment will monitor over 100); and the opportunity cost of the things you could be doing to add value if you weren’t configuring monitoring. Given all that, why not use an automated monitoring tool such as LogicMonitor that makes you a better system administrator, and doesn’t require a Fortune 500 budget to implement?
If you’d rather skip the tedious work, but want the peace of mind knowing that your infrastructure is properly monitored, and that you will be alerted of any issues early, it’s perfectly okay to go the automation route. You’ll feel a sense of satisfaction in preventing an outage, whether you wrote the code or not. And your CFO may even thank you for spending the money.