Matti Grönroos
The Service Hours define the period during which services are provided. Usually, though, they do not cover every service or every user, and the services are not necessarily delivered in their full scope.
Both sides of the table have a strong desire to optimize the Service Hours, because they are a significant cost factor. The bottom line is like betting: Make an educated guess and hope that it leads to a good result.
A pretty common mode of operation is to deliver support services during office hours, for example 9–17 Mon–Fri. Outside the Service Hours the users are on their own unless extended Service Hours are available at extra cost. Typically, extended Service Hours apply to selected applications only, according to their criticality.
The Service Hours usually apply to Incident Management and Service Availability only, while the remaining services are delivered during office hours. This is often the root cause of the Watermelon Effect: The users see the service status as red while the official SLA indicators are green. The users do not understand that their Important Case is on hold, waiting for the Service Hours to resume. This is why active Expectation Management related to Service Hours is vital.
The Availability figures are usually easy to understand. Let us assume that the monthly Availability target is 99.8 per cent during the Service Hours of 8–17 on workdays, that is, nine hours per day. An average month has 21.5 workdays. Thus, there are 193.5 Service Hours per month on average. The target allows a total of about 23 minutes of service outage per month during the Service Hours (0.2 per cent of 193.5 hours). Any outage that occurs during the weekends or during 17–8 is excluded from the SLA score calculation.
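The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `allowed_outage_minutes` and its parameters are made up for illustration, and the figures are the ones assumed in the example (99.8 per cent, nine Service Hours per day, 21.5 workdays per month).

```python
def allowed_outage_minutes(target_pct, hours_per_day, workdays_per_month):
    """Outage budget in minutes for one month, counted within Service Hours only."""
    service_minutes = hours_per_day * workdays_per_month * 60
    return service_minutes * (1 - target_pct / 100)


# Figures from the example: 99.8 % target, Service Hours 8-17, 21.5 workdays.
budget = allowed_outage_minutes(99.8, 9, 21.5)
print(f"{9 * 21.5} Service Hours per month")       # 193.5 Service Hours per month
print(f"{budget:.1f} minutes of outage allowed")   # 23.2 minutes of outage allowed
```

The same function makes it easy to see how sensitive the budget is: under these assumptions, raising the target to 99.9 per cent halves the allowed outage to under 12 minutes.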
Incident Management is a more complex case, because a common practice is to exclude the time spent waiting for a response or an action from the customer or a third-party vendor.
Let us think about the following scenario: The user submits an incident report at 11:00, and its criticality class implies a Time to Resolve target of eight hours during the Service Hours. At 14:00, the user is asked for more information, and the user responds at 16:00. The ticket is put on hold at 17:00 and resumed at 08:00 the next day. A third party is asked for an action at 09:00, and that action is complete at 12:00. The ticket is closed at 13:00.
The final Time to Resolve was six hours, thus well within the SLA target: only the periods 11–14 and 16–17 on the first day and 8–9 and 12–13 on the second day count. However, a user who is not aware of the concept of Service Hours may see the result as lousy, because the process took 26 hours of elapsed time.
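The scenario can be replayed in code to show where the six hours come from. The sketch below is an illustration under stated assumptions, not how any particular ticketing tool calculates its SLA clock: the Service Hours are taken as 08:00–17:00 Mon–Fri, public holidays are ignored, the waiting periods are assumed not to overlap each other, and the dates (a Monday and a Tuesday) and all function names are made up.

```python
from datetime import datetime, time, timedelta

# Assumed Service Hours, taken from the scenario: 08:00-17:00 on workdays.
SERVICE_START = time(8, 0)
SERVICE_END = time(17, 0)


def service_windows(first, last):
    """Yield the Service Hours window of every Mon-Fri day from first to last."""
    day = first.date()
    while day <= last.date():
        if day.weekday() < 5:  # 0-4 are Monday-Friday; public holidays are ignored
            yield datetime.combine(day, SERVICE_START), datetime.combine(day, SERVICE_END)
        day += timedelta(days=1)


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap of two intervals, never negative."""
    return max(min(a_end, b_end) - max(a_start, b_start), timedelta(0))


def time_to_resolve(opened, closed, waiting):
    """Time inside Service Hours, minus waiting on the customer or a third party."""
    total = timedelta(0)
    for w_start, w_end in service_windows(opened, closed):
        total += overlap(opened, closed, w_start, w_end)
        for p_start, p_end in waiting:
            # Clip each waiting period to the ticket lifetime before subtracting it.
            total -= overlap(max(p_start, opened), min(p_end, closed), w_start, w_end)
    return total


opened = datetime(2024, 3, 4, 11, 0)   # Monday 11:00, incident reported
closed = datetime(2024, 3, 5, 13, 0)   # Tuesday 13:00, ticket closed
waiting = [
    (datetime(2024, 3, 4, 14, 0), datetime(2024, 3, 4, 16, 0)),  # waiting for the user
    (datetime(2024, 3, 5, 9, 0), datetime(2024, 3, 5, 12, 0)),   # waiting for the third party
]

sla_hours = time_to_resolve(opened, closed, waiting).total_seconds() / 3600
elapsed_hours = (closed - opened).total_seconds() / 3600
print(sla_hours, elapsed_hours)  # 6.0 26.0
```

The gap between the two printed numbers is exactly the difference between what the SLA report shows and what the user experiences.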
For the sake of clarity: It is possible to agree that critical incidents are managed outside the Service Hours. This may involve special pricing and an approval procedure.
The extended Service Hours for Availability are often subject to gambling. The stand-by and on-call arrangements are usually expensive, and for complex services, having one person on stand-by or on-call is often not enough. Pretty often, the extended hours are backed up by a lightweight arrangement only: Start calling the experts and tempt them to start working, usually for extra pay. There is a clear business case behind this approach: The sanctions for not meeting the SLA are probably less expensive than the extra cost of stand-by or on-call. From the customer's perspective, this is pretty cynical unless mutually agreed.
It is easy to shoot yourself in the foot by making the Service Hours for Incident Management and Availability the same. If the systems fail during the night, the recovery work starts only when the Service Hours begin, so the recovery time accumulates as non-availability time, and missing the SLA is possible. It may be advisable to begin checking the service state and performing the recovery tasks before the Service Hours begin, to have the services up and running in time.