No one likes to talk about outages. They’re horrible to experience as an employee, and they take a heavy toll on customer confidence and future revenue. But they do happen. Even publicly traded tech powerhouses such as eBay and Microsoft, which have more technical resources than you’ll ever have, fall prey to outages. And when they do, they are closed for business, much to the chagrin of their shareholders and executive teams.
It’s not so much a question of whether an outage will occur at your company but when. The secret to surviving outages is to get better at handling them and to learn from the mistakes of others. Nobody is perfect all the time (my current employer, LogicMonitor, included), but I hope that by talking about these mistakes, we can all begin the hard work required to avoid them in the future.
An outage occurs. Customer Support fires off a barrage of emails to the Tech Ops team. Executives begin demanding updates every five minutes. Tech team members all run to their separate monitoring tools to see what data they can dredge up, often seeing only part of the problem. Mass confusion ensues as groups point fingers at each other, and Sys Admins are unsure whether to answer the text from their boss demanding an update or to keep troubleshooting and apply a possible fix. Marketing (“We’re getting trashed on social media! We need to send a mass email and do a blog post telling people what is happening!”) and Legal (“Don’t admit liability!”) jump in to help craft a public-facing response. Cats begin mating with dogs and the world explodes.
How Cedexis Deploys Puppet Enterprise and LogicMonitor Jointly to Support its Global Operations
Founded in 2009, Cedexis is building a faster Web. Cedexis offers visibility into and control over Web performance through its community-based monitoring & analysis solution, Cedexis Radar, and its global traffic management platform, Cedexis Openmix.
Ops in the Cloud
Cedexis deploys its technology strictly in cloud environments, and its TechOps team follows a simple rule: “Never touch hardware.” Cedexis manages its dynamic host deployments globally across a range of managed hosting and cloud providers. To ensure uniformity across datacenters, Cedexis uses configuration automation tools to configure every new machine identically, giving each one the same “blueprint” so it is ready to take the Cedexis code.
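To make the “blueprint” idea concrete, here is a minimal Python sketch of desired-state convergence, the general idea behind configuration automation tools such as Puppet. It is not Puppet’s implementation or Cedexis’ actual configuration: the `Host` model, the `converge` function and the package and file names in `BLUEPRINT` are all hypothetical. Applying the same blueprint to a host twice makes changes only the first time, which is what keeps machines identical no matter which provider they run on.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical, simplified model of a machine's configuration state."""
    name: str
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)


# Hypothetical blueprint: what every new machine must have before it can
# receive application code. Package names and paths are illustrative only.
BLUEPRINT = {
    "packages": {"nginx", "ntp"},
    "files": {"/etc/app/env": "production\n"},
}


def converge(host: Host, blueprint: dict) -> list:
    """Bring `host` in line with `blueprint` and return the changes made.

    Only differences are acted on, so a second run is a no-op (idempotence).
    """
    changes = []
    # Install any package the blueprint requires but the host lacks.
    for pkg in sorted(blueprint["packages"] - host.packages):
        host.packages.add(pkg)
        changes.append(f"install {pkg}")
    # Rewrite any managed file whose content has drifted or is missing.
    for path, content in sorted(blueprint["files"].items()):
        if host.files.get(path) != content:
            host.files[path] = content
            changes.append(f"write {path}")
    return changes


if __name__ == "__main__":
    web = Host("web-01")
    print(converge(web, BLUEPRINT))  # first run: installs and writes
    print(converge(web, BLUEPRINT))  # second run: nothing left to change
```

Real tools add dependency ordering, reporting and a much richer resource model, but the “describe the end state, then converge” loop is the part that delivers uniformity.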
“Excuse me. Did you just say that I could learn something of actual value from a marketing person?”
Yes, I did.
In case you haven’t noticed, marketing has undergone a phenomenal transformation in the last decade. Marketing has implemented complex automation platforms, like Marketo and Eloqua, that can sift through a sea of prospect data and use predictive analytics to pick out the prospects most likely to purchase. Web analytics, too, have gone mainstream, largely because Google gives the functionality away for free. Consequently, marketing can better quantify the ROI of its spending. And according to Gartner Research, 81% of companies with revenue of more than $500M have a Chief Marketing Technology Officer, a number expected to grow another 8% next year. Gartner also predicts that by 2017, Chief Marketing Officers will be spending more on technology than CIOs!
So what does that have to do with IT monitoring?
Too busy to keep up with what’s happening on the Web? Never fear. Starting this month, LogicMonitor will post a monthly roundup of our favorite tech articles, blog posts, ebooks, videos, podcasts, cat pictures and more. Our favorites from August:
Tech conferences are great venues for getting better at your job. You may set off to learn a new technology, hear from thought leaders in your industry, network with new and fabulous people in your field, and possibly find a new job! (Don’t tell your boss, since she just begrudgingly signed off on your travel to San Francisco for VMworld!) But tech trade shows can be overwhelming. Jet lag, milling about in large crowds, sleep deprivation and the technology onslaught are all working against you. Here are some tips to ensure that you get the most bang for your buck:
Why is Solaris any different? Two reasons: (1) it virtualizes swap space, counting unused parts of physical memory as swap space, and (2) it maintains the distinction between paging and swapping.
These two factors often lead to confusion and misinterpretation of the data, especially when it is queried via SNMP.
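One way to see the first effect on a live system is to compare the virtual swap total reported by `swap -s` with the disk-backed swap you configured. The Python sketch below assumes the usual one-line `swap -s` format (for example `total: 225376k bytes allocated + 31548k reserved = 256924k used, 2246788k available`). `parse_swap_s`, `memory_backed_swap_kb` and the `disk_swap_kb` input (which you might total up from `swap -l`) are hypothetical helpers, not part of any Solaris or SNMP tool.

```python
import re

# Assumed format of Solaris `swap -s` output (a single line).
SWAP_S_RE = re.compile(
    r"total:\s*(\d+)k bytes allocated \+ (\d+)k reserved = "
    r"(\d+)k used, (\d+)k available"
)


def parse_swap_s(line: str) -> dict:
    """Parse one line of `swap -s` output into kilobyte counters."""
    m = SWAP_S_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized swap -s output: {line!r}")
    allocated, reserved, used, available = (int(g) for g in m.groups())
    return {
        "allocated_kb": allocated,
        "reserved_kb": reserved,
        "used_kb": used,
        "available_kb": available,
        # Virtual swap: everything that can be reserved as swap, which on
        # Solaris includes physical memory as well as disk swap devices.
        "virtual_total_kb": used + available,
    }


def memory_backed_swap_kb(swap: dict, disk_swap_kb: int) -> int:
    """Estimate how much of the virtual swap total exceeds disk swap,
    i.e. the share Solaris is counting from physical memory."""
    return max(0, swap["virtual_total_kb"] - disk_swap_kb)


if __name__ == "__main__":
    s = parse_swap_s(
        "total: 225376k bytes allocated + 31548k reserved = "
        "256924k used, 2246788k available"
    )
    print(s["virtual_total_kb"], memory_backed_swap_kb(s, 1048576))
```

Under these assumptions, a virtual total well above your configured disk swap is expected behavior rather than a misconfiguration, which is one reason raw swap numbers pulled over SNMP can look inconsistent.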
For those of you using our MongoDB monitoring, there’s an update for replication monitoring.
There are a few improvements over the prior datasource: it handles authentication better, and it removes some assumptions, such as whether members of a replica set are running on common ports. Most of the data points being monitored are standard and don’t need much comment. (We find all the members and monitor their health, state, uptime, etc.)
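For readers who want to see what those data points look like, the Python sketch below summarizes the document returned by MongoDB’s `replSetGetStatus` command (with PyMongo you can fetch it via `MongoClient(uri).admin.command("replSetGetStatus")`). It is an illustration, not LogicMonitor’s datasource: `summarize_replica_set` and `lag_seconds` are hypothetical helpers, and replication lag is computed here simply as the difference between the primary’s and each member’s `optimeDate`. Members are identified by the `name` they report (host:port), so nothing assumes a common port.

```python
from datetime import datetime
from typing import Optional

PRIMARY = 1  # replica set member state code for PRIMARY


def lag_seconds(primary_optime: datetime, member_optime: datetime) -> float:
    """Seconds by which a member's last applied operation trails the primary's."""
    return (primary_optime - member_optime).total_seconds()


def summarize_replica_set(status: dict) -> list:
    """Return one (name, stateStr, healthy, uptime_s, lag_s) tuple per member.

    lag_s is None when no primary is visible or the member reports no optimeDate.
    """
    members = status.get("members", [])
    primary = next((m for m in members if m.get("state") == PRIMARY), None)
    primary_optime: Optional[datetime] = primary.get("optimeDate") if primary else None
    rows = []
    for m in members:
        optime = m.get("optimeDate")
        lag = None
        if primary_optime is not None and optime is not None:
            lag = lag_seconds(primary_optime, optime)
        # `health` is 1 for reachable members and 0 for unreachable ones.
        rows.append((m["name"], m.get("stateStr"), m.get("health") == 1,
                     m.get("uptime", 0), lag))
    return rows
```

Keeping the parsing separate from the database call makes the logic easy to test against a captured status document.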
As we’ve often preached, too many alerts are just as bad as missing alerts. You don’t want your team to become so inured to alerts that they fail to act on the ones indicating outages.
For those of you using Campfire as your team collaboration tool, you now have another way to manage your infrastructure and server monitoring alerts and to ensure every alert gets an appropriate response. LogicMonitor now integrates with Campfire using the Campfire API.
How does this integration help you react to alerts appropriately and ensure your teams don’t suffer from alert overload?
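For a sense of what posting to Campfire involves, here is a hedged Python sketch that builds (but does not send) a request posting an alert message to a Campfire room. It assumes the classic Campfire API: `POST /room/<id>/speak.json` with a `{"message": {...}}` JSON body and HTTP Basic auth using the API token as the username and a dummy password. The function name and its arguments are hypothetical, and this is not how LogicMonitor’s integration is implemented.

```python
import base64
import json
import urllib.request


def build_campfire_speak_request(subdomain: str, room_id: int,
                                 api_token: str, text: str) -> urllib.request.Request:
    """Build a POST that would speak `text` into a Campfire room (assumed API)."""
    url = f"https://{subdomain}.campfirenow.com/room/{room_id}/speak.json"
    body = json.dumps({"message": {"type": "TextMessage", "body": text}}).encode("utf-8")
    # Assumed auth scheme: API token as username, any placeholder as password.
    creds = base64.b64encode(f"{api_token}:X".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {creds}"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the message format testable without network access.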
In recent years, solid-state drives (SSDs) have become a standard part of data center architecture. They handle more simultaneous read/write operations than traditional disks and use a fraction of the power. Of course, as a leading infrastructure, software and server monitoring platform vendor, we are very interested in monitoring our SSDs, not only because we want to make sure we’re getting what we paid for, but also because we would like to avoid a disk failure on a production machine at 3:00 AM…and the Shaquille O’Neal-sized headache to follow. But how do we know for sure whether our SSDs are performing the way we want them to? As one of the newest members of our technical operations team, I was not surprised to be tasked with answering this question.
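One common starting point on Linux (an assumption here; the post does not say which tools the team used) is to sample `/proc/diskstats` twice and compute per-second IOPS and average I/O latency from the counter deltas, much as `iostat` does. In the Python sketch below, `DiskSample`, `parse_diskstats_line` and `disk_rates` are hypothetical names, and counter wraparound is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    reads: int     # reads completed (4th field of a /proc/diskstats line)
    read_ms: int   # milliseconds spent reading (7th field)
    writes: int    # writes completed (8th field)
    write_ms: int  # milliseconds spent writing (11th field)


def parse_diskstats_line(line: str) -> tuple:
    """Return (device_name, DiskSample) for one /proc/diskstats line."""
    f = line.split()
    return f[2], DiskSample(reads=int(f[3]), read_ms=int(f[6]),
                            writes=int(f[7]), write_ms=int(f[10]))


def disk_rates(before: DiskSample, after: DiskSample, interval_s: float) -> dict:
    """Per-second IOPS and mean time per completed I/O between two samples."""
    d_reads = after.reads - before.reads
    d_writes = after.writes - before.writes
    ios = d_reads + d_writes
    io_ms = (after.read_ms - before.read_ms) + (after.write_ms - before.write_ms)
    return {
        "read_iops": d_reads / interval_s,
        "write_iops": d_writes / interval_s,
        # Average milliseconds per completed I/O, like iostat's "await".
        "await_ms": io_ms / ios if ios else 0.0,
    }
```

Trending these numbers over time, rather than reading a single sample, is what makes it possible to tell whether a drive is still delivering the performance you paid for.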