[Originally appeared April 16, 2014 on the OneLogin blog; guest blog post written by Annie Dunham, Director of Product Management, LogicMonitor.]
IT managers either adopt a DevOps philosophy or think it’s passé. Either way, it’s hard to argue with the foundational principle that IT automation isn’t just a trend but rather a key tenet of today’s IT Ops environment. When done right, automation brings efficiency to IT teams.
At LogicMonitor – OneLogin’s newest integration partner, which offers infrastructure performance monitoring via its SaaS-based platform – it’s often said that “the best way to get a promotion is to automate your way out of a job.” The reasoning is simple: automating manual tasks is fundamental to the LogicMonitor product philosophy. A team that monitors thousands of data points each day is also testing new data centers, adding equipment vendors, and performing the laundry list of daily IT Ops responsibilities. Efficiency isn’t nice to have; it’s a requirement!
Our integration with OneLogin reduces the time required for user management configuration by over 50%, as the integration completely removes the need to manage users and security policies within LogicMonitor.
As previously noted, LogicMonitor was fortunate that none of its infrastructure or services were affected by the Heartbleed vulnerability. But the fact that many sites with excellent security were affected may lead some to question the wisdom of putting business information in the hands of a SaaS provider, no matter how secure, given that the services will necessarily be provided over the Internet.
I think the fact that affected SaaS providers (e.g. Stripe, Chargify) remediated the vulnerability almost immediately argues that SaaS providers are a great choice for such information. The entire business of SaaS companies like LogicMonitor rests on our ability to earn and keep our customers’ trust, month after month. Consequently, SaaS providers have to react quickly to vulnerabilities.
[Written by Chris Morgan, Senior Solutions Engineer at LogicMonitor]
At LogicMonitor, our monitoring philosophy is to provide customers with actionable intelligence. Great examples of actionable intelligence are the alerts we send you about performance issues in your IT infrastructure. Providing meaningful performance and health metrics is our bread and butter, but we want to avoid overwhelming you with alerts, as overload often results in apathy, defeating the original purpose of monitoring.
Consider the case where a Windows Server running a SQL database receives a credential change. Any new client request to that server will then fail, and every failure will trigger a Windows Event. When your server has an issue and 100 different clients are trying to access it unsuccessfully, you’ll see an event, and an alert, for each and every failure. This quickly becomes overwhelming, and you’ll probably turn off EventSource alerting to avoid the alert storm. Your frustration in this case would be understandable – a single Windows Server can be responsible for thousands of event alerts in a very short time period. But turning off event alerting has potentially dire consequences: you can miss crucial events you actually need alerting on, so you’re throwing the baby out with the bath water.
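To make the scale of the problem concrete, here is a minimal Python sketch of one generic way to tame such a storm: collapsing events that share a host and event ID within a time window into a single alert carrying a count. This is an illustration only, not a description of how LogicMonitor handles events; the function name, the five-minute window, the host name "sql01" and the event IDs are all hypothetical.

```python
from collections import defaultdict


def collapse_events(events, window_seconds=300):
    """Collapse events that share (host, event_id) within the same time window.

    events: iterable of (timestamp_seconds, host, event_id) tuples.
    Returns one dict per group, ordered by the time the group was first seen.
    Windows are fixed buckets of window_seconds (a simplifying assumption).
    """
    counts = defaultdict(int)
    first_seen = {}
    for ts, host, event_id in events:
        key = (host, event_id, int(ts // window_seconds))
        counts[key] += 1
        first_seen[key] = min(ts, first_seen.get(key, ts))
    alerts = [
        {"host": host, "event_id": event_id,
         "first_seen": first_seen[(host, event_id, bucket)],
         "count": count}
        for (host, event_id, bucket), count in counts.items()
    ]
    return sorted(alerts, key=lambda a: a["first_seen"])


if __name__ == "__main__":
    # Hypothetical data: 100 clients failing against one server within a
    # minute, plus one unrelated event from the same host.
    storm = [(1000 + i * 0.5, "sql01", 18456) for i in range(100)]
    storm.append((1010, "sql01", 7036))
    for alert in collapse_events(storm):
        print(alert)
```

With this grouping, the 100 identical failures become one alert with a count of 100, and the unrelated event still produces its own alert instead of being lost when alerting is switched off.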
Apparently some LogicMonitor people (and it wasn’t just the guys) decided to strut their geek cred in one of our internal chat rooms this afternoon. The names have been removed to protect the geeky.
See how many of these old school technologies you recall… (sorry Gen Y – you might have to Alta Vista Google some of this stuff.)
3:32 PM i like the “browser support” section
3:34 PM OMG IE 5.5!!!!!
3:34 PM i’d laugh if it wasn’t so sad that there still are some people using it
3:35 PM I have Netscape Navigator 4.0 installed on this machine
3:35 PM I have seen AOL IE recently
While LogicMonitor is great at identifying issues that need attention, sometimes figuring out exactly what the solution is can be a bit harder, especially for network issues. One relatively common case is an alert about failed TCP connections. Recently, one of our servers in the lab triggered this alert:
The host Labutil01 is experiencing an unusual number of failed TCP connections, probably incoming connections. There are now 2.01 per second failed connections, putting the host in a warn level. This started at 2014-02-26 10:54:50 PST. This could be caused by incorrect application backlog parameters, or by incorrect OS TCP listen queue settings.
OK – so what is the next step?
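One possible first check, sketched below in Python, is to see whether the kernel's own counters agree with the alert. The sketch assumes a Linux host (the alert does not say which OS Labutil01 runs), where /proc/net/snmp and /proc/net/netstat list counters as pairs of header and value lines, including Tcp AttemptFails and TcpExt ListenOverflows and ListenDrops. The function names and the sample numbers are made up for illustration, and this is not necessarily how LogicMonitor collects the metric.

```python
def parse_proc_net_counters(text):
    """Parse /proc/net/snmp or /proc/net/netstat style text.

    The files hold pairs of lines: 'Proto: Name1 Name2 ...' followed by
    'Proto: value1 value2 ...'. Returns {'Proto.Name': int_value}.
    """
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            continue  # not a matching header/value pair; skip it
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{proto}.{name}"] = int(number)
    return counters


def per_second_rate(before, after, key, interval_seconds):
    """Rate of change of one counter between two samples."""
    return (after[key] - before[key]) / interval_seconds


if __name__ == "__main__":
    # Made-up samples taken 30 seconds apart. On a real Linux host the text
    # would come from open("/proc/net/netstat").read().
    sample_1 = "TcpExt: SyncookiesSent ListenOverflows ListenDrops\nTcpExt: 0 100 100\n"
    sample_2 = "TcpExt: SyncookiesSent ListenOverflows ListenDrops\nTcpExt: 0 160 165\n"
    before = parse_proc_net_counters(sample_1)
    after = parse_proc_net_counters(sample_2)
    print(per_second_rate(before, after, "TcpExt.ListenOverflows", 30))
```

If the listen-overflow counters climb at roughly the rate reported in the alert, the listen queue explanation in the alert text becomes the more likely one; if they stay flat, the failures probably have another cause.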
[Originally appeared February 26, 2014 in the Packet Pushers online community, written by Jeff Behl, Chief Network Architect with LogicMonitor.]
LogicMonitor is a SaaS-based performance and monitoring platform servicing clients across the world. Our customers install LogicMonitor “Collectors” within their data centers to gather data from devices and services, and use a web application to analyze aggregated performance metrics and to configure alerting and reporting. This means our entire operation (and therefore the monitoring our customers are dependent on) relies on ISPs to ensure that we efficiently and accurately receive billions of data points a day.
Most people know their hosts via DNS names (e.g. server1.lax.company.com) rather than IP addresses (192.168.3.45), and so enter them into their monitoring systems as DNS names. Thus there is a strong requirement that name resolution works as expected, in order to make sure that the monitoring system is in fact monitoring what the user expects it to be.
Sometimes we get support requests claiming that the LogicMonitor collector is resolving a DNS name to the wrong IP address even though DNS is set up as it should be, so something must be wrong with the collector. However, the issue is simply that how a host resolves names is not always the same as how DNS resolves names.
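The difference can be sketched in a few lines of Python. The example below assumes one common cause, chosen here purely for illustration: the host consults a local hosts file before it asks DNS, so a stale hosts entry wins even when DNS is correct. The real lookup order is set by the operating system's resolver configuration, and the function names, the stale address 10.0.0.99 and the stubbed DNS answer are all hypothetical.

```python
def parse_hosts(text):
    """Map each name in hosts-file text to the first address listed for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)
    return table


def resolve_like_host(name, hosts_table, dns_lookup):
    """Return (address, source), checking the hosts table before DNS.

    Hosts-first ordering is an assumption made for this sketch.
    """
    key = name.lower()
    if key in hosts_table:
        return hosts_table[key], "hosts file"
    return dns_lookup(name), "dns"


if __name__ == "__main__":
    # Hypothetical stale entry left behind on the collector host.
    hosts_text = "127.0.0.1 localhost\n10.0.0.99 server1.lax.company.com  # old address\n"

    def fake_dns(name):
        # Stand-in for a real DNS query; returns what DNS is assumed to hold.
        return "192.168.3.45"

    print(resolve_like_host("server1.lax.company.com", parse_hosts(hosts_text), fake_dns))
```

In a real Python program, socket.getaddrinfo goes through the operating system's resolver, so it returns the host's view of a name, which is not necessarily the answer a direct DNS query would give.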
While walking our dogs, I often catch up on podcasts. On Planet Money episode number 352 – The High-Tech Cow, they lay out 4 rules for the success of a business in this constantly changing economy. These rules are:
Planet Money illustrates these rules in the context of a dairy farm, but I suggest you consider them as they apply to your IT department, too.
To paraphrase Oscar Wilde – there is only one thing worse than having no monitoring. And that is having monitoring. Or at least that can be the case when you have too many monitoring systems.
LogicMonitor was recently at the Gartner Data Center conference in Las Vegas. The attendees were from somewhat larger enterprises (think General Motors) than the majority of our customer base, but shared many of the same goals – and problems – as smaller enterprises. One problem smaller enterprises do not share is the degree of proliferation of monitoring systems, and the problems this causes. Some companies had over 40 monitoring systems in place (more than one hundred for a few) – and all the commensurate silos that go with them. For non-trivial problems, this means resolving an issue often requires getting many people into a war room, so the issue can be investigated and traced across the many monitoring systems, by the many people in all their fiefdoms.
There was an informal consensus that when a problem involves multiple silos, resolution takes at least 3 to 4 days, as opposed to hours when it doesn’t. This makes running multiple monitoring systems (which help create silos of operational people) a very expensive proposition. At LogicMonitor we often help companies consolidate from 10 or 12 monitoring systems to LogicMonitor plus one or two others, but the benefits of consolidating 40 or more walking dead monitoring systems would be huge.
Some of the other more interesting observations from the conference talks:
[Kevin McGibben (CEO), Steve Francis (Founder and Chief Product Officer) and Jeff Behl (Chief Network Architect) contributed to this post.]
This week LM’s Chief Network Architect “Real Deal Jeff Behl” was featured on the DABCC podcast with Doug Brown. The interview covered lots of ground and sparked our interest in IT industry predictions for 2014. There are so many exciting things happening in IT Ops these days that it’s hard to name just a few.
Before it’s too late, here’s our turn at early year prognosticating.
1) 2014 is (at long last) the year for public Cloud testing. The definition of “Cloud” depends on whom you ask. To our SaaS Ops veterans, it means a group of machines running off premise that someone else is responsible for managing. Cloud can mean lots of things, from public Cloud infrastructure (Amazon) and Cloud services (Dyn or SumoLogic) to Cloud apps (Google Apps) and SaaS platforms (Salesforce and LogicMonitor!). The shared definition among all things Cloud is simple: it’s off premise (i.e., outside your data center or co-lo) hosted infrastructure, applications or services. For most enterprises currently, Cloud usually represents a public data center offering anything from very generic VM compute resources to specific services such as high performance NoSQL databases and Hadoop clusters. Enterprises are starting to gear up to test how the public Cloud fits in their data center strategy. In the past month alone, several of our Fortune 1000 clients confirmed they’ve set aside 2014 budget and IT team resources to test public cloud deployments.