Service level checklist: Is your provider providing stability?
What can cloud customers do when a provider’s service crashes? How should they plan for the worst?
A Service Level Agreement (SLA) is one of the ways cloud vendors try to woo businesses with the robustness of their networks, aiming to prove their service is better than competitors'. In practice, though, it is a fairly poor indicator of how good a company's network and service really are.
If a dispute arises during the service period, vendor and customer are unlikely to agree on who was at fault, which means SLAs are not a particularly adequate guarantee of good service.
Cloud computing may be on the rise, but it has played a big part in the SLA losing its once flawless reputation. It is almost impossible for a provider to guarantee an agreed level of availability over a specified time period. There are many parts to an SLA, and although back in the 1980s it was a much simpler way of communicating the service levels a customer could expect from its provider, there are too many other considerations in the modern cloud world.
SLEs are only an “expectation”
To address this, Service Level Expectations (SLEs) are becoming more prevalent, setting out the steps a vendor will take to offer continuity of service. Other considerations within an SLE include whether it is in the customer's best interest to spread its services across multiple providers and whether on-premise backup systems are the answer.
In truth, cloud services go down. This means availability ultimately comes down to how much control you decide to retain over your data in the first place. Rafael Laguna, CEO of Open-Xchange, advises that to avoid being effectively powerless when a service provider goes down, there are several rules to follow to ensure you maintain as much control as possible.
“Firstly, it’s vital that you choose a type of service that’s available from numerous providers to make switching from one provider to another easier; you also need the practical tools in place to be able to follow through with this. The service you subscribe to must also be available in software form in order for you to run on-premise if you feel necessary,” he said.
Laguna unsurprisingly also advocates an open source licence approach for maximising availability. He says that using an open platform will also mean that it can be maintained by a third party further down the line, should your relationship with the original vendor end.
Dale Vile, CEO of Freeform Dynamics, says the basic principle is that performance and availability both cost money; the more you want (i.e. a higher level of service), the more you pay.
“The second principle is that not all applications and services are equal. Some are genuinely business critical, but most are not. Although in reality, there’s obviously a ‘spectrum of sensitivity’ as you look across any application/service portfolio,” he said.
“Put these two principles together and the first piece of advice is always to take time out to understand your real needs. Business stakeholders by default overstate the criticality of systems. They therefore often need to be a) educated that fault tolerance / rapid recovery have a cost, then b) challenged (in an objective way) about the *real* requirement for availability.”
Vile, a technology analyst, warns that the alternative is that you end up paying huge sums for resilience you don’t need, or directing your budget to the wrong places.
In the real world, most cloud providers will take customers through the fundamental mechanics of how they run their operations. The call to firms here is to invest the time so they can judge how happy they are with the robustness of the service on offer in the first place.
A worthy caveat to this advice comes from Alan Priestley, cloud and big data director for the EMEA region at Intel. Priestley advises that with large providers like AWS and Google, customers may have little opportunity to negotiate on service. But with smaller, more ‘tailored’ hosting providers, cloud buyers may be able to agree service levels and appropriate recompense for failing to hit them.
The Intel cloud spokesman suggests that security also comes into question here. “If the service in question becomes compromised by malware, this is particularly problematic with IaaS, where it is difficult for security software running within a virtual machine to check the integrity of the underlying hypervisor,” he said.
Priestley is clearly alluding to those providers who implement what he calls “trusted compute pools”, which in this case would be based on Intel Trusted Execution Technology (TXT). This type of ‘measured’ cloud platform construction can (so says Intel) help ensure that service provider infrastructure and software is running in a known good state, and that virtual machines can safely be instantiated onto that service.
In short, this (arguably) all leads to more managed, measured and controllable availability and this, surely, is our end goal.
Availability anxiety? Stop complaining!
So where does all this availability anxiety stem from? Mark Thomas, solutions architect at disaster recovery specialist Databarracks, points out that AWS is often on the receiving end of criticism for failing to take responsibility for data losses or outages, when in fact AWS is providing organisations with the resources to build their own resilience.
“AWS customers have access to multiple availability zones to avoid outages if one datacentre goes down. If they don’t take advantage of this, AWS holds no responsibility for downtime. There is nothing wrong with this model as long as you understand what your responsibility is. You can’t directly compare this with SLAs from other service providers who take more responsibility for availability across data centres. They are just different models,” says Thomas.
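As a rough illustration of the shared-responsibility model Thomas describes, the sketch below (Python with the boto3 library, using a placeholder AMI ID, instance type and region) spreads identical instances across whichever availability zones a region offers, so the loss of a single datacentre does not take the whole service down. It is a minimal sketch of the idea, not a production deployment.

    import boto3

    # Assumptions: AWS credentials are configured, and the AMI ID and
    # instance type below are placeholders for your own workload.
    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Discover the availability zones this region exposes to the account.
    zones = [z["ZoneName"]
             for z in ec2.describe_availability_zones()["AvailabilityZones"]
             if z["State"] == "available"]

    # Launch one copy of the workload in each zone, so a single
    # datacentre outage leaves the other copies running.
    for zone in zones:
        ec2.run_instances(
            ImageId="ami-0123456789abcdef0",   # placeholder AMI
            InstanceType="t3.micro",
            MinCount=1,
            MaxCount=1,
            Placement={"AvailabilityZone": zone},
        )

Whether the extra instances, load balancing and data replication are worth the cost is exactly the kind of criticality judgement Vile describes above.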
At the risk of using the expression ‘at the end of the day’, ultimately the service availability question comes down to openness of the platform and its component technologies. Sean McNavan, managing director of hosting and application management company NaviSite Europe, says that cloud customers should look for a provider that offers maximum transparency.
“The platform transparency factor must be combined with the ability to build in resilience in terms of high-availability components, backups and geographic diversity; only this can result in effective business continuity measures and resilient network connectivity. If a customer has insight into the underlying infrastructure and the ability to influence the design of their platform, they can take accurately quantifiable measures to mitigate the risks of outage.”
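To make McNavan’s point about geographic diversity concrete, here is a minimal sketch (again Python and boto3, with hypothetical bucket names) that copies backup objects from a bucket in one region into a bucket in another, so an outage affecting a single region still leaves a recoverable copy elsewhere. Managed features such as S3 cross-region replication do this more robustly; this simply shows the idea in a few lines.

    import boto3

    # Hypothetical bucket names; both buckets must already exist.
    SOURCE_BUCKET = "myco-backups-eu-west-1"     # primary region
    DEST_BUCKET = "myco-backups-eu-central-1"    # geographically separate copy

    s3 = boto3.client("s3")

    # Copy every backup object into the second region's bucket.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=DEST_BUCKET,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            )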
So availability isn’t just a question of SLAs, or even SLEs if they do indeed exist. It is also a question of architectural engineering, business clarity, platform transparency and, ultimately, the responsibility a customer is prepared to take for its own cloud-located data and applications. And all along you thought it was the cloud vendors who were at fault, right?
Clare is the founder of Blue Cactus Digital, a digital marketing company that helps ethical and sustainability-focused businesses grow their customer base.
Prior to becoming a marketer, Clare was a journalist, working at a range of mobile device-focused outlets including Know Your Mobile before moving into freelance life.
As a freelance writer, she drew on her expertise in mobility to write features and guides for ITPro, as well as regularly writing news stories on a wide range of topics.