[Written by Chris Morgan, Senior Solutions Engineer at LogicMonitor]
At LogicMonitor, our monitoring philosophy is to provide customers with actionable intelligence. Great examples of actionable intelligence are the alerts we send you about performance issues in your IT infrastructure. Providing meaningful performance and health metrics is our bread and butter, but we want to avoid overwhelming you with alerts, as overload often results in apathy, defeating the original purpose of monitoring.
Consider the case where a Windows Server running a SQL database receives a credential change. Any new client request to that server will then fail, and with every failure a Windows Event will trigger. When your server has an issue and 100 different clients are trying to access it unsuccessfully, you’ll see an event, and an alert, for each and every failure. This quickly becomes overwhelming, and you’ll probably turn off EventSource alerting to avoid the alert storm. Your frustration in this case would be understandable – a single Windows Server can be responsible for thousands of event alerts in a very short time period. But turning off event alerting has potentially dire consequences: you can miss crucial events you actually need alerting on, so you’re throwing the baby out with the bath water.
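To make the alert-storm arithmetic concrete, here is a minimal sketch of one way repeated events could be collapsed into a single alert per time window. It illustrates the general idea only and is not a description of how LogicMonitor EventSources work: the class name `EventAlertSuppressor`, the five-minute window, the host name `sql01`, and the use of Windows event ID 4625 (failed logon) as the repeating event are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, int]  # (host, event_id)


@dataclass
class EventAlertSuppressor:
    """Hypothetical helper: allow one alert per (host, event_id) per window."""

    window_seconds: float = 300.0  # assumed window; tune to taste
    _last_alert: Dict[Key, float] = field(default_factory=dict)
    _suppressed: Dict[Key, int] = field(default_factory=dict)

    def should_alert(self, host: str, event_id: int, timestamp: float) -> bool:
        """Return True if this event should raise an alert, False if it is a repeat."""
        key = (host, event_id)
        last = self._last_alert.get(key)
        if last is not None and timestamp - last < self.window_seconds:
            # Same event on the same host inside the window: count it, stay quiet.
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        # First occurrence, or the window has expired: alert and restart the window.
        self._last_alert[key] = timestamp
        self._suppressed[key] = 0
        return True

    def suppressed_count(self, host: str, event_id: int) -> int:
        """Number of repeats swallowed since the last alert for this key."""
        return self._suppressed.get((host, event_id), 0)


if __name__ == "__main__":
    suppressor = EventAlertSuppressor(window_seconds=300)
    # 100 clients fail to log in to the same server, one second apart.
    decisions = [suppressor.should_alert("sql01", 4625, float(t)) for t in range(100)]
    print("alerts sent:", decisions.count(True))
    print("repeats suppressed:", suppressor.suppressed_count("sql01", 4625))
```

Keying on both host and event ID means a different event on the same server still alerts immediately, so damping the storm does not hide unrelated problems.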
While LogicMonitor is great at identifying issues that need attention, sometimes figuring out what exactly the solution is can be a bit harder, especially for network issues. One relatively common case is an alert about failed TCP connections. Recently, one of our servers in the lab triggered this alert:
The host Labutil01 is experiencing an unusual number of failed TCP connections, probably incoming connections. There are now 2.01 per second failed connections, putting the host in a warn level. This started at 2014-02-26 10:54:50 PST. This could be caused by incorrect application backlog parameters, or by incorrect OS TCP listen queue settings.
OK – so what is the next step? Read more »
While walking our dogs, I often catch up on podcasts. On Planet Money episode number 352 – The High-Tech Cow, they lay out 4 rules for the success of a business in this constantly changing economy. These rules are:
Planet Money illustrates these rules in the context of a dairy farm, but I suggest you consider them as they apply to your IT department, too. Read more »
To paraphrase Oscar Wilde – there is only one thing worse than having no monitoring. And that is having monitoring. Or at least that can be the case when you have too many monitoring systems.
LogicMonitor was recently at the Gartner Data Center conference in Las Vegas. The attendees were somewhat larger enterprises (think General Motors) than the majority of our customer base, but shared many of the same goals – and problems – of smaller enterprises. One problem that smaller enterprises do not share is the degree of proliferation of monitoring systems, and the problems this causes. Some companies had over 40 monitoring systems in place (more than one hundred for a few) – and all the commensurate silos that go with them. For non-trivial problems, resolving an issue therefore often means getting many people into a war room, so the issue can be investigated and traced across the many monitoring systems, by the many people in all their fiefdoms.
There was an informal consensus that when a problem involves multiple silos, resolution takes at least 3 to 4 days, as opposed to hours when it doesn’t. This makes running multiple monitoring systems (which help create silos of operational people) a very expensive proposition. At LogicMonitor we often help companies consolidate from 10 or 12 monitoring systems to LogicMonitor plus one or two others, but the benefits of consolidating 40 or more walking dead monitoring systems would be huge.
Some of the other more interesting observations from the conference talks: Read more »
[Kevin McGibben (CEO), Steve Francis (Founder and Chief Product Officer) and Jeff Behl (Chief Network Architect) contributed to this post.]
This week LM’s Chief Network Architect “Real Deal Jeff Behl” was featured on the DABCC podcast with Doug Brown. The interview covered lots of ground and sparked our interest in IT industry predictions for 2014. There are so many exciting things happening in IT Ops these days that it’s hard to name just a few.
Before it’s too late, here’s our turn at early year prognosticating.
1) 2014 is (at long last) the year for public Cloud testing. The definition of what “Cloud” is depends on whom you ask. To our SaaS Ops veterans, it means a group of machines running off premise that someone else is responsible for managing. Cloud can mean lots of things: public Cloud infrastructure (Amazon), Cloud services (Dyn or SumoLogic), Cloud apps (Google Apps) and SaaS platforms (SalesForce and LogicMonitor!). The shared definition among all things Cloud is simple: it’s off premise (i.e., outside your data center or co-lo) hosted infrastructure, applications or services. For most enterprises currently, Cloud usually represents a public data center, offering everything from very generic VM compute resources to specific services such as high performance NoSQL databases and Hadoop clusters. Enterprises are starting to gear up to test how the public Cloud fits into their data center strategies. In the past month alone, several of our Fortune 1000 clients confirmed they’ve set aside 2014 budget and IT team resources to test public cloud deployments.
It is relatively well understood in development that dead code (code that is no longer in use, due to refactoring, or changes in features or algorithms) should be removed from the code base. (Otherwise it introduces a risk of bugs, and makes it much harder for new developers to come up to speed, as they have to understand the dead code and work out whether it is in fact still in use.) It is less well understood that the same principles apply to the rest of the IT infrastructure as well. Read more »
2013 was a huge year at LogicMonitor. Thanks to our great customers and a dedicated team, we doubled our revenue, doubled our customer base, and more than doubled both our data center infrastructure and the volume of monitoring performed. Best of all, we accomplished all of this while accelerating investment into R&D and our Engineering team to prepare for even bigger product news in 2014.
As the CEO of an exciting and fast-growing SaaS company, I find the best use of my time is spending it with clients to get a first-hand understanding of how customers use the product. Learning what we do well and, more importantly, what we need to improve upon helps us to get better. LM is a product-focused company that is hell bent on transforming the infrastructure performance monitoring business.
So as we rip into another exciting year we decided to put the big ideas of listening to customers and a dedication to making the market’s best product together in the format of the LM 2014 RoadShow. Read more »
Photo credit: Mike Baird
If you’ve got an Internet connection, you’ve most likely heard the popular tune “What Does the Fox Say?” (If not, I’m sorry and/or you’re welcome). The song is about the fact that while we know the sounds other animals make, the sounds of the fox remain a mystery. What could that possibly have to do with IT, you ask? Let me explain.
One thing I have learned from a long history of working in IT is that relying on technology that you don’t understand is dangerous. The technology may make everything work easily when all is well, but when things are not going well — it will be your job to fix it. If you’ve relied on the system hiding all complexity from you, you’re going to have a hard time recovering things, and getting all the “stuff” off the proverbial fan.
The best example I have of this was many years ago when I worked for a rapidly growing SaaS company. In those days, Oracle was the only reliable choice for a high-volume database, and Sun was the preferred hardware architecture. (My, how things change…)
To achieve the reliability we needed, we implemented a Sun clustering solution. Now this solution was so finicky that Sun wouldn’t even let you run it (or at least, wouldn’t support it) without their Professional Service engineers setting it up for you. This, of course, made us mere systems administrators a lot less familiar with it than we would have been had we set it up ourselves.
[Originally appeared December 11, 2013 in Edhat Santa Barbara article.]
During the week of December 9th, 2013, the Passport to Santa Barbara will be distributed to 18,500 K-6 public school students in Santa Barbara County. The Passport program provides a passport booklet to children in grades K-6 who, with one adult, are given free admission to Santa Barbara Educator’s Roundtable (SBERT) member institutions, where children participate in specially-designed educational activities found outside of the classroom within the community and receive a stamp in their Passport.
This year’s program is sponsored by LogicMonitor, a local Santa Barbara-based technology company, and the Williams-Corbett Foundation. The Foundation has supported the program for a number of years, but LogicMonitor is a new sponsor. “As a local company interested in supporting our community, we are pleased to be a sponsor of the Passport program this year. Our employees have children that benefit from the Passport program, and as a high-tech company, we depend on the local area having both a great educational system, and a great community. The Passport program contributes to both,” said Kevin McGibben, CEO, LogicMonitor. Read more »
You may think from the preponderance of moustaches in the photo above that LogicMonitor recruits entirely from former firemen, policemen and others who, in a cliche, sport the hairy lip. In fact we’re just supporting Movember, to help increase the performance, uptime and availability of Men’s health – and all IT infrastructure. (Feel free to contribute to support Men’s health at the link. LogicMonitor did, as well as individual fund contributors.)
And while we do have a variety of jobs open, we do not discriminate on race, gender, age – or even the presence or absence of facial hair.
(Note: we do have a sizeable contingent of female employees. For some reason, none of them participated in Movember….)