Matti Grönroos

End-to-End SLA and the Math of Availability

Every now and then, the end-to-end SLA comes up as a hot topic. It refers to a service agreement in which availability is measured at the end users' workstations and a single service provider takes responsibility for the entire value chain. This would be nice to have, but few service providers are willing to take on such a responsibility, and those who are will add a hefty risk premium to the price.

In addition, unrealistic expectations are often placed on the target level of an end-to-end SLA. No, the end-to-end SLA is not a means of getting 99.999 per cent availability almost for free. As a rule, there is no such thing as 99.999 per cent end-to-end availability.

Since it is fashionable to talk about end-to-end availability, many are quick to ask for it.

Let's examine the matter a bit, both as a technical question and as a contractual one.

Often, we recognize three degrees of availability:

Such a value chain is often fragmented into an archipelago of responsibility islands, each run by a different service provider: for example, local on-site IT support, a networking supplier, an infrastructure supplier, a few software suppliers and application support suppliers. The contractual issue arises from the fact that no single party has clear overall responsibility, and usually no one is ready to bear it either. In the SIAM operating model, the related complexity can be hidden from users, but even that does not resolve the fundamental puzzle. In practice, a commitment to end-to-end availability is not something that can easily be bought anywhere.

Technically, purely on the basis of probability calculus, the total availability of "series-connected" components is the product of the availabilities of the individual components. Not only is the chain only as strong as its weakest link, but every additional link reduces the total availability further. This is illustrated by a fictitious example of how end-to-end availability is formed:

Component               Availability
Browser                 99.80 %
Workstation software    99.80 %
Workstation hardware    99.95 %
Router                  99.98 %
WAN                     99.80 %
LAN                     99.80 %
Application             99.80 %
Database engine         99.99 %
Middleware              99.90 %
Server software         99.90 %
Server hardware         99.99 %
Disk system             99.99 %
Total                   98.69 %

We can see that even though the projected availability of every component in the value chain is outstanding, the entire value chain does not reach 99 per cent.
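The arithmetic behind the example can be checked with a few lines of Python. This is only a minimal sketch: it assumes the components fail independently of each other, and the helper name `series_availability` and the round-the-clock service year are illustrative assumptions, not part of the original example. Multiplying the rounded figures as printed gives roughly 98.7 per cent, in line with the total above; any small difference in the last digit is presumably due to rounding of the displayed values.

```python
from math import prod

# Figures copied from the fictitious example above, as fractions of 1.
chain = {
    "Browser": 0.9980,
    "Workstation software": 0.9980,
    "Workstation hardware": 0.9995,
    "Router": 0.9998,
    "WAN": 0.9980,
    "LAN": 0.9980,
    "Application": 0.9980,
    "Database engine": 0.9999,
    "Middleware": 0.9990,
    "Server software": 0.9990,
    "Server hardware": 0.9999,
    "Disk system": 0.9999,
}


def series_availability(availabilities):
    """Availability of a series chain: the product of the parts.

    Assumes that the components fail independently of each other.
    """
    return prod(availabilities)


# Assumption: the service is needed around the clock, in a non-leap year.
HOURS_PER_YEAR = 24 * 365

total = series_availability(chain.values())
weakest = min(chain.values())

print(f"Weakest single link:        {weakest:.2%}")
print(f"End-to-end availability:    {total:.2%}")
print(f"Expected downtime per year: {(1 - total) * HOURS_PER_YEAR:.0f} hours")
```

The result is lower than the weakest single link, which is exactly the point: every link in the chain takes its own share of the total.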

Redundancy and other resilience techniques quickly push this straightforward model towards higher mathematics. As a rule of thumb, "one additional nine" is often valid: each component needs one more 9 in its availability than the target set for the entire value chain. For example, when aiming for 99.90 % end-to-end availability, the availability of each component must be 99.99 %. The rule fits a chain of roughly ten components, because ten components at 99.99 % multiply to about 99.90 %.
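The rule of thumb can be tried out numerically. The sketch below assumes a chain of identical, independently failing components; the function names are hypothetical helpers made up for this illustration. It also includes the textbook formula for redundant (parallel) components as a hint of how redundancy changes the picture, again under the simplifying assumption that the redundant copies fail independently, which real systems rarely satisfy in full.

```python
def chain_availability(component, n):
    """Availability of n identical components connected in series."""
    return component ** n


def required_component_availability(target, n):
    """Per-component availability needed for n series components to meet the target."""
    return target ** (1.0 / n)


def parallel_availability(component, copies):
    """Availability of redundant copies, where one working copy is enough.

    Assumes independent failures, so the group is down only when
    every copy is down at the same time.
    """
    return 1.0 - (1.0 - component) ** copies


# Ten components at 99.99 % each: does the chain land near 99.90 %?
ten_links = chain_availability(0.9999, 10)

# The other way round: what does each of ten components need for 99.90 %?
needed = required_component_availability(0.999, 10)

# Two redundant copies of a 99 % component.
mirrored = parallel_availability(0.99, 2)

print(f"Chain of ten 99.99 % components: {ten_links:.4%}")
print(f"Needed per component for 99.90 %: {needed:.4%}")
print(f"Two redundant 99 % components:   {mirrored:.4%}")
```

With a considerably longer or shorter chain the rule of thumb drifts, so it is worth running the numbers for the actual number of links.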

It is essential to understand that end-to-end availability is quite different from component availability. It makes no sense to apply the same targets to end-to-end availability as to individual components.

End-to-end availability is not only difficult to acquire, it is also difficult to measure objectively. The measurement must be done at the user's workstation, which is often beyond the scope of the service provider. The metric would also be open to manipulation. Furthermore, the question remains whether end-to-end availability should be measured only when the user actually needs the service. Measuring it that way could be quite expensive as a standard operating procedure.