
Is SaaS Security stopping you from adopting the best solution for your business?

February 26, 2010 – 11:08 am
Having worked in SaaS companies for a long time (going back to when they were called ASPs), I've seen a lot of companies decline to adopt SaaS solutions due to "security concerns". This attitude has generated quite a few blog posts recently, so I thought I'd add my 2 cents. People involved in SaaS often think security is better in SaaS systems than in premise-based systems. Justin Pirie at "The Week in SaaS" (an essential blog for those in SaaS, I think) put it this way: "something struck me: 46% of people surveyed were not moving to the cloud because of security. This is bonkers! Just because it's behind your firewall does not make it secure." Reuven Cohen at Elastic Vapor summarizes his view: "the new reality is that cloud computing is in a lot of ways more secure simply because people are actually spending time looking at the potential problems beforehand." So what's my opinion? ...

How Application monitoring saves you money

February 6, 2010 – 1:11 pm
We here at LogicMonitor use our own service to monitor the various parts of our infrastructure, and doing so demonstrates the financial value that LogicMonitor brings. The more you instrument with LogicMonitor, the more powerful it becomes. In the cases below, the information and alerts that LogicMonitor presented allowed us to avoid spending money on more hardware, and given LogicMonitor's availability requirements, each hardware purchase usually means three times the hardware (active/passive at the datacenter in question, plus failover hardware in a different datacenter). One case was relatively straightforward: a review of the MySQL performance monitoring metrics revealed that the number of rows read due to read_rnd_next operations was very high, in the tens of thousands per second. (For those of you who are not DBAs, this is the number of rows MySQL reads sequentially in order to satisfy a read request, an indicator that indexes are not being ...
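
MySQL status counters such as this one are cumulative since server start, so a figure like "tens of thousands per second" comes from differencing two samples. Below is a minimal sketch, assuming the raw values have already been fetched (for example from the Handler_read_rnd_next row of SHOW GLOBAL STATUS); the function name and the sample numbers are hypothetical.

```python
from typing import Optional


def counter_rate(prev_value: int, prev_time: float,
                 curr_value: int, curr_time: float) -> Optional[float]:
    """Per-second increase of a cumulative counter between two polls.

    Returns None when the interval is not positive, or when the counter went
    backwards (for example after mysqld restarted and the counter reset).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed


# Two made-up polls of Handler_read_rnd_next taken 60 seconds apart.
rate = counter_rate(1_200_000, 0.0, 3_000_000, 60.0)
print(f"Handler_read_rnd_next: {rate:.0f} rows/s")
```

Returning None on a reset, rather than a huge negative rate, keeps a single restart from triggering a spurious alert.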

Complexity doesn’t belong in your datacenter.

January 29, 2010 – 10:11 am
When designing infrastructure architecture, there is usually a trade-off between complexity and fault tolerance. It's not just an inverse relationship, however; it's a curve. You want the minimum complexity possible to achieve your availability goals. You may even want to lower your availability goals to reduce your complexity (which will end up increasing your availability). Basically, the rule to adopt is: if you don't understand something well enough that it seems simple to you (or your staff), even in its failure modes, you are better off without it. Back in the day, clever people suggested that most web sites would have the best availability by running everything (DB, web application, everything) on a single server. This was the simplest configuration, and the easiest to understand. With no complexity, i.e. one of everything (one switch, one load balancer, one web server, one database, for example), you can tolerate zero failures. With ...
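
To make the "one of everything" case concrete, here is the textbook availability arithmetic as a sketch. It assumes independent failures and an illustrative 99.9% availability per component; both are assumptions, not figures from this post. Components in series multiply, while n redundant copies are down only if all of them are down. Note what the formula leaves out: the new failure modes that redundancy itself introduces, which is why the relationship described above is a curve rather than a simple trade.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Every component in the chain must be up, so availabilities multiply."""
    return prod(availabilities)


def redundant(a: float, n: int) -> float:
    """At least one of n identical, independent copies must be up."""
    return 1 - (1 - a) ** n


# "One of everything": switch, load balancer, web server, database.
single = serial(0.999, 0.999, 0.999, 0.999)

# The same four tiers, each doubled. This naive model ignores failover bugs,
# misconfiguration and other complexity-driven outages.
doubled = serial(*(redundant(0.999, 2) for _ in range(4)))

print(f"single chain:  {single:.6f}")
print(f"doubled chain: {doubled:.6f}")
```

If the team cannot reason about how the doubled tiers fail over, the real-world number can easily end up below the single chain, which is the point of the rule above.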

Automation of Datacenter Monitoring

January 8, 2010 – 3:03 pm
Denise Dubie wrote a recent piece in CIO magazine about "5 Must-have IT Management Technologies for 2010", in which she identifies IT process automation as one of the must-haves. She quotes Jim Frey, research director at EMA: "On the monitoring side, automation will be able to keep up with the pace of virtual environments and recognize when changes happen in ways a human operator simply could not." At LogicMonitor we couldn't agree more. It's true that, as the article implies, virtualization and cloud computing make the need for monitoring automation more acute than before. That is why customers use LogicMonitor to automatically detect and monitor new hosts and newly created Amazon EC2 instances: dynamic system scaling without the ability to monitor the dynamic systems automatically is just asking for undetected, service-affecting issues. However, even in traditional non-virtualized datacenters (and despite the buzz, most datacenters and services are still built ...
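
The core of automated monitoring for dynamic environments is a reconciliation step: compare what actually exists with what is being watched. Here is a hypothetical sketch, assuming you can obtain both lists as sets of host names (for example, running instances from a cloud API and hosts registered in your monitoring system); the function and host names are illustrative only.

```python
def reconcile(discovered: set[str], monitored: set[str]) -> tuple[set[str], set[str]]:
    """Work out which hosts need monitoring added, and which monitored hosts are gone."""
    to_add = discovered - monitored      # new hosts nobody is watching yet
    to_retire = monitored - discovered   # monitored hosts that no longer exist
    return to_add, to_retire


discovered = {"web-1", "web-2", "web-3"}   # web-3 was just scaled up
monitored = {"web-1", "web-2", "db-old"}   # db-old was terminated earlier
add, retire = reconcile(discovered, monitored)
print(sorted(add), sorted(retire))
```

Running this on a schedule, or on every scaling event, is what closes the gap a human operator cannot; whether retired hosts are removed automatically or flagged for review is a policy choice.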

The many faces of JMX monitoring

January 4, 2010 – 4:22 pm
We like monitoring. We like Java. Not to slight other languages (we like Ruby, Perl, PHP, .NET and other platforms too, and like to monitor them as well), but unlike most other languages, Java provides an explicit technology for monitoring applications and system objects: JMX. JMX is supported on any platform running the JVM, but like most other monitoring protocols, it has lots of interesting nuances and can be used in many ways, which means lots of nuances in how to detect and monitor it. We have quite a few customers that use LogicMonitor for JMX monitoring of both standard and custom applications, so we've run into quite a few small issues, and solved them. One example is that the naming convention for JMX objects is loosely defined. Initially, the JMX collector for LogicMonitor assumed that every object would have a "type" key property, as specified in best practices. Of course, this is a rule ...
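
A JMX ObjectName has the form domain:key=value,key=value, and nothing forces a "type" key to be present. The simplified Python sketch below shows code that tolerates a missing "type" instead of assuming it. It does not handle quoted values containing commas or equals signs, and the fallback order (type, then name, then domain) is an arbitrary illustrative choice, not LogicMonitor's logic. A Java collector would use javax.management.ObjectName and its getKeyProperty method, which returns null for an absent key.

```python
def parse_object_name(name: str) -> tuple[str, dict[str, str]]:
    """Split 'domain:k1=v1,k2=v2' into the domain and a dict of key properties."""
    domain, _, props = name.partition(":")
    keys: dict[str, str] = {}
    for pair in props.split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        keys[key] = value
    return domain, keys


def display_key(name: str) -> str:
    """Label an MBean by 'type' if present, else by 'name', else by its domain."""
    domain, keys = parse_object_name(name)
    return keys.get("type") or keys.get("name") or domain


for object_name in ("java.lang:type=Memory",
                    "com.example:name=OrderCache,service=orders",
                    "com.example:service=orders"):
    print(object_name, "->", display_key(object_name))
```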

Active/Active or N+1?

December 21, 2009 – 2:51 pm
If your infrastructure has to be up at all times (or as much as possible), how do you best achieve that? With an Active/Active configuration, where all parts of the infrastructure are used all the time, or with an N+1 configuration, where idle resources wait to take over in the event of a failure? The short answer is that it doesn't matter unless you have good monitoring in place. The risk with Active/Active is that load does not scale linearly. If you have two systems running at 40% load, that does not mean that one will be able to handle the load of both and run at 80%. More likely you will hit an inflection point where you run into an unanticipated bottleneck, be it CPU, memory bandwidth, disk IO, or some system that is providing external API resources. It can even be the power system. If servers have ...
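
The 40% + 40% example can be turned into a simple headroom check. This is a sketch under two assumptions: identical nodes whose load spreads evenly over the survivors, and a hypothetical 70% ceiling standing in for the unknown inflection point. Because real load does not scale linearly, the ceiling should ideally come from what your monitoring has actually observed, not from a guess.

```python
def utilization_after_failure(node_loads: list[float], failed: int = 1) -> float:
    """Average utilization of the surviving nodes if `failed` nodes drop out.

    Assumes identical nodes and even, linear redistribution (optimistic).
    """
    survivors = len(node_loads) - failed
    if survivors <= 0:
        return float("inf")
    return sum(node_loads) / survivors


def survives_failure(node_loads: list[float], ceiling: float = 0.70, failed: int = 1) -> bool:
    """True if the projected post-failure load stays under the chosen ceiling."""
    return utilization_after_failure(node_loads, failed) <= ceiling


print(survives_failure([0.40, 0.40]))        # two nodes at 40%
print(survives_failure([0.40, 0.40, 0.40]))  # three nodes at 40%
```

With two nodes, the survivor would need to absorb 80%, above the assumed ceiling; spreading the same per-node load over three nodes leaves about 60% on each survivor.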

When an OID is not an OID

December 13, 2009 – 7:39 pm
It's still surprising to me that hardware and software manufacturers do not seem to value any kind of consistency in their management interfaces. Or maybe it's intentional, to complicate monitoring and management of their systems and encourage the purchase of the vendor's own monitoring systems. In any event, it makes the case for a monitoring service such as LogicMonitor, where we actually provide templates of what you should be monitoring for a specific kind of device, all the more compelling. A few examples of what I mean: NetApp changed the OIDs used for reporting fan and electronics failures from one minor release to the next. Similarly, NetApp changed the units that volume latency is reported in, from milliseconds to microseconds, for releases after version 7.3. Cisco changed the way it responds to queries for the interface queue length of VLAN interfaces between minor releases of the 12.2 code. Microsoft changes all sorts of ...
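
Handling a unit change like the NetApp latency one means every collected value has to be interpreted in light of the device's software version. Below is a hypothetical normalization sketch. It treats any release whose major.minor is greater than 7.3 as reporting microseconds; whether 7.3.x point releases fall on one side or the other, and how the version string is read from the filer, are exactly the details to confirm against NetApp's documentation.

```python
import re

# Assumption based on the text: releases after 7.3 report latency in microseconds.
LATENCY_UNIT_CHANGE = (7, 3)


def latency_ms(raw_value: float, version: str) -> float:
    """Normalize a volume latency reading to milliseconds."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor > LATENCY_UNIT_CHANGE:
        return raw_value / 1000.0   # microseconds -> milliseconds
    return raw_value                # already in milliseconds


print(latency_ms(4.2, "7.3.2"))
print(latency_ms(4200, "NetApp Release 8.0.1"))
```

Keeping the cutover in one named constant means that when a vendor changes behavior again, the fix is a single edit rather than a hunt through every graph and threshold.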

Simple ways to start addressing datacenter power needs

December 7, 2009 – 12:33 pm
Anyone who runs IT infrastructure is aware that power consumption is one of the biggest costs in datacenter provisioning and ongoing expenses. If they are not, they soon will be, as energy costs are predicted to keep increasing and are already the fastest-rising cost in the datacenter. Maximizing power efficiency is a complex topic, which can involve: virtualization to consolidate physical servers; adoption of on-demand cloud computing; and evaluating whether your applications scale in such a way that new, more powerful equipment (which draws more power) will actually be more efficient in delivering requests per amp (which may not be the case if your bottleneck is the latency of an external storage system or database, not CPU speed). However, there are some simple things that all IT managers should be on top of. Track your power usage. You should be tracking your power usage over time. You should be able to see the total usage, ...
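
The "requests per amp" test can be done on the back of an envelope. This sketch uses made-up numbers: the request rates would come from your application metrics and the current draw from metered PDUs. The hypothetical "faster box" is bottlenecked on an external database, so its extra CPU barely raises throughput while it draws more current.

```python
from dataclasses import dataclass


@dataclass
class ServerProfile:
    name: str
    requests_per_sec: float  # sustained throughput measured under real load
    amps: float              # measured current draw at that load

    @property
    def requests_per_amp(self) -> float:
        return self.requests_per_sec / self.amps


current = ServerProfile("current box", requests_per_sec=900, amps=2.0)
faster = ServerProfile("faster box", requests_per_sec=1000, amps=3.0)

for server in (current, faster):
    print(f"{server.name}: {server.requests_per_amp:.0f} requests/s per amp")
```

Only measured figures from both configurations under the same workload make this comparison meaningful; vendor nameplate ratings will not reveal an external bottleneck.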

Why CPU load should not (usually) be a critical alert.

November 20, 2009 – 9:29 am
One question that often arises in monitoring is how to define alert levels and escalations, and what level to set various alerts at: Critical, Error or Warning. Assuming you have Error and Critical alerts set to notify teams by pager/phone, with Critical alerts on a shorter escalation time, here are some simple guidelines: Critical alerts should be for events that have an immediate customer-impacting effect. For example, a production Virtual IP on a monitored load balancer going down because it has no available services to route traffic to. The site is down, so page everyone. Error alerts should be for events that require immediate attention and that, if unresolved, increase the likelihood that a production-affecting event will occur. To continue with the load balancer example, an Error should be triggered if the Virtual IP has only one functioning backend server to route traffic to; there is now no ...
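
The load balancer guidelines map directly onto a small classification rule. This is a sketch only: the enum, the function name and the "OK" state are illustrative, not LogicMonitor's configuration, and the Warning level is not modeled because the example above does not define it.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    ERROR = "error"        # pages the team; longer escalation time than Critical
    CRITICAL = "critical"  # customers are affected now; page everyone


def vip_severity(healthy_backends: int) -> Severity:
    """Severity for a load-balancer Virtual IP given its functioning backends."""
    if healthy_backends <= 0:
        return Severity.CRITICAL  # nothing to route traffic to: the site is down
    if healthy_backends == 1:
        return Severity.ERROR     # no redundancy left: one more failure is an outage
    return Severity.OK


for count in (0, 1, 3):
    print(count, vip_severity(count).value)
```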

Top I.T./Datacenter Monitoring Mistakes, Part 4 in a series.

November 10, 2009 – 5:55 pm
Monitoring System Sprawl: this is often a corollary to the first point, not relying on manual processes. The number of monitoring systems you have in place should approach 1. You do not want one system to monitor Windows servers, another for Linux, another for MySQL, another for storage. Even if they are all capable of automatic updates, filtering and classifying, having multiple systems still virtually guarantees suboptimal datacenter performance. What happens when the DBA changes his pager address, and the contact information is updated in the escalation methods of two systems, but not the other two? What happens when scheduled maintenance is specified in one system, but not in another that is tracking a different component of the systems undergoing maintenance? You will end up with alerts that are not routed correctly, and alert overload. You may also end up with a system that notifies people about issues they have no ability to acknowledge, ...
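
One concrete benefit of a single system is that a maintenance window is defined once and consulted for every alert, whichever component raised it. The sketch below illustrates that idea under stated assumptions: the class, function, host names and times are all hypothetical, and escalation contacts could be centralized the same way.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    hosts: frozenset[str]
    start: datetime
    end: datetime

    def covers(self, host: str, when: datetime) -> bool:
        """True if this window applies to `host` at time `when` (end exclusive)."""
        return host in self.hosts and self.start <= when < self.end


def should_notify(host: str, when: datetime, windows: list[MaintenanceWindow]) -> bool:
    """Suppress notifications for any host inside a scheduled window."""
    return not any(window.covers(host, when) for window in windows)


# One window covering both the database server and its storage.
window = MaintenanceWindow(frozenset({"db-1", "san-1"}),
                           datetime(2010, 1, 1, 2, 0), datetime(2010, 1, 1, 4, 0))
during = datetime(2010, 1, 1, 3, 0)
print(should_notify("san-1", during, [window]))
print(should_notify("web-1", during, [window]))
```

Because the storage array and the database share one window in one place, there is no second system left paging someone about the SAN during planned work.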