We’ve said this on the blog a few times before, but it’s really policies and processes that make a datacenter monitoring deployment successful.

We’ve collected some of our best practices, drawn from years of running our own datacenters and from working with many customers, into an Alert Response Best Practices document.

The one key practice we urge everyone to adopt, especially during the initial deployment of monitoring, is a weekly alert review: a regular meeting to decide which alerts need tuning, which should be disabled, and which underlying issues to focus on fixing.
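As a rough illustration of how a weekly review might start, the sketch below ranks the past week's alerts by how often they fired, so the noisiest ones get discussed first. The alert records, host names, and the `noisiest` helper are all hypothetical; in practice the data would come from whatever alert history your monitoring system can export.

```python
from collections import Counter

# Hypothetical alert records for the past week: (host, alert_name) pairs.
# Real data would come from your monitoring system's alert history.
alerts = [
    ("db1", "MySQL slow queries"),
    ("db1", "MySQL slow queries"),
    ("web3", "Disk usage"),
    ("db1", "MySQL slow queries"),
    ("web3", "Disk usage"),
    ("lb1", "Ping loss"),
]


def noisiest(alerts, top=3):
    """Return the `top` most frequent (host, alert) pairs with their counts."""
    return Counter(alerts).most_common(top)


# Print the review agenda, most frequent alert first.
for (host, name), count in noisiest(alerts):
    print(f"{count:3d}  {host}  {name}")
```

An alert that tops this list week after week is a candidate for either threshold tuning or a real fix to the underlying issue.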

MySQL monitoring, SSDs and failover

As customers who are (normally) hosted out of our Boston datacenter know, we had to fail over from our East Coast datacenter to our West Coast datacenter. (Why? Short version: a subcontractor of the colocation provider moved the wrong rack of servers.)

We were confident in our failover: we have identical servers sitting idle in the other datacenter, just waiting to take over, and we’d tested our processes, so we were not expecting any issues. The failover went fine, and all customers were running happily within an hour of the event. But there were issues. Some customers complained that the UI reported intermittent “Access Denied” errors.

