Matti Grönroos
It would be possible to approach the question more systematically and to set recovery targets in addition to the functional requirements. Two targets are available: RTO and RPO, the Recovery Time Objective and the Recovery Point Objective.
When mainframes dominated the world, restarting the infrastructure was a time-consuming exercise. Almost as soon as it was complete, card readers started to eat card decks, and production was up and running. The overwhelming interest in server uptime and downtime dates back to this era. Its popularity also owes something to monitoring applications, practically all of which can collect this information.
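When the company's business depends on its IT systems, the entire value chain shall be designed for quick recovery. That is why it is essential to define reasonable values for the two recovery targets: the RTO, the longest acceptable time to restore the service after a break, and the RPO, the longest acceptable period, counted backwards from the break, for which data may be lost.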
The business case is straightforward: the shorter the RTO and RPO are, the bigger the budget that is needed.
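As an illustration, a single incident can be checked against the two targets with a few lines of Python. This is a minimal sketch only: the names `RecoveryTargets` and `check_recovery`, the timestamps and the target values are all assumptions made for the example, and the data-loss window is simplified to the time between the last usable backup and the start of the break.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    rto: timedelta  # longest acceptable time to restore the service
    rpo: timedelta  # longest acceptable window of lost data


def check_recovery(targets, last_good_backup, break_start, service_restored):
    """Compare one incident against the agreed targets (hypothetical helper)."""
    recovery_time = service_restored - break_start
    # Simplification: everything written after the last usable backup is lost.
    data_loss_window = break_start - last_good_backup
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= targets.rto,
        "rpo_met": data_loss_window <= targets.rpo,
    }


# Illustrative values only: a 4-hour RTO and a 1-hour RPO.
targets = RecoveryTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = check_recovery(
    targets,
    last_good_backup=datetime(2024, 1, 10, 8, 30),
    break_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 14, 0),
)
print(result["rto_met"], result["rpo_met"])
```

In this made-up incident the service was back after five hours, so the RTO is missed, while the half hour of lost data stays within the RPO.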
If these targets are to be included as SLA metrics in the service agreement, it is necessary to agree on the exact criteria for the start and end times of the service break, see Service Break.
The starting time of a service break can be defined in more than one way. Often, the downtime clock starts when the service provider receives a notification from end users, or when the service provider's monitoring system detects the break. It may also be necessary to discuss whether troubleshooting before an unplanned system restart counts as Incident Management or as part of the service break.
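The effect of the start criterion on the measured downtime can be sketched as follows. The event names and timestamps are hypothetical, and the end time is simply assumed to be a single agreed restoration moment.

```python
from datetime import datetime

# Hypothetical timeline of one incident
events = {
    "monitoring_alert": datetime(2024, 1, 10, 9, 0),
    "user_notification": datetime(2024, 1, 10, 9, 20),
    "service_restored": datetime(2024, 1, 10, 11, 0),
}


def downtime_minutes(events, start_criterion):
    """Downtime in minutes, counted from the chosen start event."""
    delta = events["service_restored"] - events[start_criterion]
    return delta.total_seconds() / 60


by_monitoring = downtime_minutes(events, "monitoring_alert")
by_users = downtime_minutes(events, "user_notification")
print(by_monitoring, by_users)
```

The same incident yields 120 minutes of downtime when counted from the monitoring alert and 100 minutes when counted from the user notification, which is why the criterion needs to be written into the agreement.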
The end time has more options:
Each of these, and a few additional ones, is a valid criterion on some grounds. It is essential to reach a common understanding of the criteria. A service provider that delivers only plain infrastructure services will most likely not want to take responsibility for the chain beyond point 4. Nevertheless, it is good to find a body whose responsibility is to ensure the success of the entire recovery.