Matti Grönroos
A watermelon is green outside and red inside. It is not uncommon for users and service recipients to express dissatisfaction even when all numeric SLA metric targets are met. The reality is like a watermelon: red inside, but looking green from the outside.
Service providers have decades of experience in showing green even when the reality is bright red. The customers are not blameless either.
It is not uncommon for service recipients and service providers to disagree permanently on whether the quality of service is acceptable. The service providers refer to the agreed SLA metrics and tell the customers that the targets have been met. The service recipients insist that the end-users are dissatisfied.
Most probably, both are right.
SLA metrics are usually far from optimal, or even reasonable.
For example, the SLA might state that the service quality is acceptable if the availability is at least 99.0 % during service hours, calculated over a month. It sounds professional, but it is only a partial truth.
Information systems usually do not follow societal norms of equality. Instead, there are important and even more important parts. If the most important component is repeatedly down, the entire system might be in an unacceptable state regardless of whether the SLA metrics are green or red.
Let us have a look at the following picture:
Let us assume that the SLA is a traditional one, defining the Service Availability as the average of the Server Availability figures. Pretty easy, but severely cutting corners. If the Service Hours are 8 to 18 on workdays, there are on average 215 Service Hours in a month. As there are ten servers, 100 % availability equals 2150 server-hours. The availability target of 99.0 % allows 21.5 server-hours of downtime, and we are still green.
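The arithmetic can be checked with a few lines of Python. This is only a sketch using the figures from the text; the function name and its parameters are made up for illustration and are not part of any SLA tool.

```python
# Downtime budget of an SLA that averages availability over all servers.
# The figures come from the text; the function name is illustrative only.

def downtime_budget_hours(service_hours: float, servers: int, target: float) -> float:
    """Return the server-hours of downtime allowed before the SLA turns red."""
    total_server_hours = service_hours * servers   # 100 % availability
    return total_server_hours * (1.0 - target)     # the share allowed to be lost


budget = downtime_budget_hours(service_hours=215, servers=10, target=0.99)
print(f"Allowed downtime: {budget:.1f} server-hours per month")
```

Note that the budget is counted in server-hours and says nothing about which server consumes it.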
However, the servers are not equal. If the BOSS server is down, the entire service is down. The SLA would remain green even if the BOSS server were down 21.5 hours a month, more than two full working days. The end-users will see red, because for them that much downtime means very bad quality.
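The gap between the two views can be illustrated with a small Python sketch. It assumes, as a simplification, that the service is usable exactly when the BOSS server is up and that the nine other servers have no downtime at all; the server names other than BOSS are hypothetical.

```python
# Averaged SLA availability versus the availability the end-users experience.
# Assumption: the service is usable only while the BOSS server is up.

SERVICE_HOURS = 215.0  # service hours per month, from the text

# Downtime in hours per server; nine healthy servers with hypothetical names.
downtime = {f"server-{i}": 0.0 for i in range(1, 10)}
downtime["BOSS"] = 21.5  # the critical server uses the whole downtime budget


def averaged_availability(downtime: dict[str, float], service_hours: float) -> float:
    """Average of the per-server availabilities, as in the traditional SLA."""
    per_server = [(service_hours - d) / service_hours for d in downtime.values()]
    return sum(per_server) / len(per_server)


def experienced_availability(downtime: dict[str, float], critical: str,
                             service_hours: float) -> float:
    """Availability seen by the end-users when one server is critical."""
    return (service_hours - downtime[critical]) / service_hours


sla = averaged_availability(downtime, SERVICE_HOURS)
experienced = experienced_availability(downtime, "BOSS", SERVICE_HOURS)
print(f"SLA (averaged):     {sla:.1%}")
print(f"End-users (BOSS):   {experienced:.1%}")
```

Under these assumptions the averaged figure sits right at the 99.0 % target, while the end-users get the service only about 90 % of the time.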