When does 24x7 support really mean 24x7?
In my last post, I reviewed the two most important factors for succeeding in the new environment of cloud platforms. Now I take a closer look at the first of those factors: 24x7 support.
Quality support means experienced engineers delivering workable solutions to customers' difficult technical problems. It's hard to imagine getting talented, experienced people to do this every evening, weekend and holiday. Yet when a company contracts 24x7 support, that is exactly what it is asking for, even if it means waking a senior engineer at 2 am on Christmas Day.
So how do providers retain talent without burning out employees, who will then leave for a company with easier hours? The answer: investment. Companies that have invested heavily in two areas are the ones best positioned to deliver quality services to their customers in the long term. Those two areas are first-line support and monitoring systems.
The first line of support
Many cloud providers today subcontract their 24x7 operations. Others deliver the service with just a handful of people. Neither approach works on a sustained basis. Why? Subcontracted first-line staff can act only on problems for which a run-book exists. They lack knowledge of the platforms under management, and because they don't work with the engineers who deployed the solution, they can't learn and improve. Everything else has to be escalated to the on-call engineers. This may work for a short period, but eventually those on-call engineers will get fed up and leave. When they do, they'll take all their knowledge of the customer's platforms with them.
A small in-house first line won't cut it either. To deliver a quality 24x7 operation, you need at least 12 people: three shifts a day, two people per shift, and two teams for each shift so that the whole seven-day week is covered (3 × 2 × 2 = 12). Any first-line operation with fewer people cannot provide the required coverage, nor is it resilient enough to cope with staff falling sick or taking leave.
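The headcount above is simple multiplication, but it is worth seeing what it implies for each engineer's week. The Python sketch below makes the arithmetic explicit and derives the average weekly hours per engineer. The function names are hypothetical, and the model assumes the week's staffed hours are shared evenly across everyone on the roster; it is an illustration of the figures in the text, not a rostering tool.

```python
# Sketch of the staffing model described above: three shifts a day,
# two engineers on each shift, and two teams per shift slot so that
# the two teams share the seven-day week between them.

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def first_line_headcount(shifts_per_day: int = 3,
                         people_per_shift: int = 2,
                         teams_per_shift: int = 2) -> int:
    """Minimum headcount for continuous first-line cover."""
    return shifts_per_day * people_per_shift * teams_per_shift


def hours_per_engineer(shifts_per_day: int = 3,
                       people_per_shift: int = 2,
                       teams_per_shift: int = 2) -> float:
    """Average weekly hours per engineer, assuming an even share of the roster."""
    # Every hour of the week is staffed by people_per_shift engineers.
    staffed_hours = HOURS_PER_DAY * DAYS_PER_WEEK * people_per_shift
    headcount = first_line_headcount(shifts_per_day, people_per_shift, teams_per_shift)
    return staffed_hours / headcount


if __name__ == "__main__":
    print(first_line_headcount())  # 12
    print(hours_per_engineer())    # 28.0
```

Under these assumptions each engineer averages 28 hours a week. With only one team per shift slot, the same cover would demand 56 hours a week from each of six people, which shows why a smaller first line cannot be sustained.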
Quality service providers know this and have invested in their own first-line operations rather than outsourcing them. Typically, they build and train a team that handles a growing share of problems itself, without escalating, so its members become steadily more skilled engineers. The team is given run-books, of course, but it also works closely with senior engineers to learn how to diagnose and solve more complex problems. At least two support engineers are on duty 24x7, and they learn from each other.
When providers invest in their 24x7 first line of support, more problems are resolved at that level, fewer issues need escalation and problems are solved faster. This kind of investment is the single most important way to deliver a quality service to your customers, whether we're talking about hosting, colocation or cloud services.
Monitoring systems
The second area of investment is monitoring systems. To fix customer problems quickly, a provider must first identify and diagnose them in the shortest possible time. A successful provider manages hundreds of platforms and thousands of devices, and only an effective, tailored approach to monitoring delivers the right information at the right time. Many recent entrants to the cloud services market have never had to monitor live systems. They delivered the application to the customer, and operating it was the customer's job. These organisations often simply use the standard monitoring suite that came with the infrastructure stack they purchased.
By contrast, experienced service providers have spent years refining how they identify and diagnose problems. Typically, they can explain:
1. What monitoring system they use
2. How they have customised and tweaked it (in general and for specific customers)
3. What they used before
4. What limitations they faced with their old system and why they changed
It's only through continuously evolving and improving its monitoring system that a company can deliver a quality service. You simply can't find and fix problems quickly without it. What's required is deep knowledge of what can go wrong and of how best to alert the support teams when it does. An out-of-the-box solution just won't deliver that.
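To make "customised in general and for specific customers" a little more concrete, here is a minimal Python sketch of one simple pattern: provider-wide default alert thresholds that individual customer profiles can override. The metric names, threshold values and the `CustomerProfile` and `alerts_for` names are all hypothetical; a real monitoring platform would express this in its own configuration format rather than in hand-written code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical provider-wide defaults, refined over time from experience.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 90.0,
    "disk_used_percent": 85.0,
    "response_ms": 500.0,
}


@dataclass
class CustomerProfile:
    name: str
    # Hypothetical per-customer overrides, e.g. a batch platform that
    # routinely runs hot on CPU, or a latency-sensitive web shop.
    overrides: Dict[str, float] = field(default_factory=dict)

    def threshold(self, metric: str) -> Optional[float]:
        """Customer override if present, else the provider default (or None)."""
        return self.overrides.get(metric, DEFAULT_THRESHOLDS.get(metric))


def alerts_for(customer: CustomerProfile, sample: Dict[str, float]) -> List[str]:
    """Return an alert message for each metric above its effective threshold."""
    alerts = []
    for metric, value in sample.items():
        limit = customer.threshold(metric)
        if limit is not None and value > limit:
            alerts.append(f"{customer.name}: {metric}={value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    batch = CustomerProfile("batch-co", overrides={"cpu_percent": 98.0})
    shop = CustomerProfile("web-shop", overrides={"response_ms": 200.0})
    sample = {"cpu_percent": 95.0, "response_ms": 250.0}
    print(alerts_for(batch, sample))  # []
    print(alerts_for(shop, sample))   # two alerts: CPU and response time
```

The same reading raises no alert for one customer and two for another, which is the point: thresholds that reflect each platform's normal behaviour keep the support team's attention on genuine problems.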