Archive for the ‘Tips & Troubleshooting’ Category

Puppet monitoring: how to monitor the success or failure of Puppet runs

Wednesday, February 20th, 2013

This post, written by LogicMonitor's Director of Tech Ops, Jesse Aukeman, originally appeared on HighScalability.com on February 19, 2013. If you are like us, you are running some type of Linux configuration management tool. The value of centralized configuration and deployment is well known and hard to overstate. Puppet is our ...

Discovering write latency problems with ESX datastores

Wednesday, October 24th, 2012

Our digs here at LogicMonitor are cozy. Being adjacent to sales, I get to hear our sales engineers work with new customers, and it’s not uncommon that a new customer gets a rude awakening when they first install LogicMonitor. Immediately, LogicMonitor starts showing warnings and alerts. “Can this be right ...

What’s with the different SNMP versions? v1, v2c, v3?

Friday, October 5th, 2012

We use SNMP a lot, and know it well. However, not every one of our customers has spent years working with OIDs in ASN.1, MIBs, Access types, and so on - and nor should they. (As we like to say, "Your monitoring solution should make your life easier, not harder.") So ...

Cisco switch temperature problems solved by newbie

Monday, September 24th, 2012

As the new hire here at LogicMonitor brought in to support the operations of the organization, I had two immediate tasks: Learn how LogicMonitor's SaaS-based monitoring works to monitor our customers' servers, and at the same time, learn our own infrastructure. I've been a SysA for longer than I care ...

In (Dev) ops, a release is only as good as its worst effect

Thursday, September 6th, 2012

You released new code with all sorts of new features and improvements. Yay! Now, after the obvious things like "Does it actually work in production?", this is also the time to assess: did it impact my infrastructure performance (and thus my scalability, and thus my scaling costs) in any way? This is ...

Alerts – making good datacenters better

Wednesday, August 15th, 2012

A company started a trial yesterday, added a bunch of Windows hosts, and immediately got warnings triggered that their hosts were "receiving 42 datagrams per second destined to non-listening ports...Check if all services are up and running." This was across many of their hosts, and was an issue they were unaware ...

Our Philosophy on Monitoring

Tuesday, July 17th, 2012

Your monitoring solution should make your life easier, not harder. Monitoring should not have to be a full-time job. Data collection, alerting, trending and reporting should all be integrated. Everyone in IT/Ops should be able to use your monitoring - not just the person that set it up. Monitoring should tell people about ...

When Lightning Strikes Your “Cloud”, Good Monitoring Means Great Disaster Recovery

Monday, July 2nd, 2012

Kablooee!  That was the sound I (and many others) heard coming from one of Amazon Web Services (aka, the "cloud") availability zones in Northern Virginia on June 30th (http://venturebeat.com/2012/06/29/amazon-outage-netflix-instagram-pinterest/, http://gigaom.com/cloud/some-of-amazon-web-services-are-down-again/).  The sound was a weather-driven event causing one of Amazon's data centers to lose power.  And what happens when a ...

The most important monitoring report that you are not using

Saturday, June 30th, 2012

Even with a great monitoring system, it can be hard sometimes to keep the noise down. (Indeed, the more powerful the monitoring, the more difficult this can be, as more data is collected and tested, automatically.) And keeping noise down in monitoring is vital, as you do not want staff ...

How To Make an Engineer Smarter

Tuesday, June 19th, 2012

It’s 6 AM. Bob, an entry-level IT engineer, walks into a cold, dark, lonely building – flips on the lights, fires up the coffee pot, and boots up. Depending on what he’s about to see on his computer screen, he knows the fate of the free world could rest in ...