Vihreä Ruusu - Availability - A Value Chain or a Value Network?


Matti Grönroos

Availability - A Value Chain or a Value Network?

We are still wondering what 99.999 per cent or any uptime figure means. Does it have some correlation to the company business? Does the availability figure mean anything? If not, is calculating such a thing a waste of money?
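As a first step, an uptime percentage can at least be translated into a yearly downtime budget. The following is a minimal sketch of that arithmetic; the function name is made up for this illustration, and a year is assumed to be 365 days of round-the-clock service.

```python
# Assumption: a 365-day year with 24/7 service hours (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime that a given uptime percentage still allows."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct} % uptime allows {minutes:.1f} minutes of downtime per year")
```

With these assumptions, "five nines" leaves a little over five minutes per year. The figure still says nothing about who was affected during those minutes, or how badly.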

Availability metrics have their roots in the mainframe architecture of decades ago: the display terminals had fixed cabling attached directly to the mainframe, and the applications were simple and straightforward. In such a world, the users' perception of availability was pretty much the same as the mainframe's technical availability.

However, the world has changed, and it keeps changing. Server hardware is almost immortal, and hardware uptime tells you almost nothing about the service the users receive. Availability becomes a vaguer concept year by year as value chains grow more complex.

The increasing complexity of IT systems seems to be a law of nature. When complex systems are measured with oversimplified metrics, we are approaching the Watermelon Service discussed elsewhere on this website: the service level seen by the service recipient is totally different from what the service provider reports.

Let us start with a simple example. A user sits at his or her workstation and presses a button. If nothing happens for a while, the user thinks that the service provider is lousy, and he or she walks to the coffee machine.

But is the case that simple? Let us strip the example down to the bones and look at only the most central components represented in the value chain:

Even in this oversimplified model, the transaction passes about twenty times through different components. One must have a very good crystal ball at hand to be absolutely sure that the issue lies in a component that is the service provider's responsibility.
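The effect of a long chain can be sketched with simple arithmetic: if every step must work for the transaction to succeed, the end-to-end availability is the product of the step availabilities. The sketch below assumes, purely for illustration, twenty independent steps that are each available 99.9 % of the time; neither the independence nor the figure comes from a real measurement.

```python
import math


def chain_availability(step_availabilities):
    """Availability of a serial chain in which every step must work at once."""
    return math.prod(step_availabilities)


# Hypothetical figures: 20 independent steps, each up 99.9 % of the time.
steps = [0.999] * 20
total = chain_availability(steps)
print(f"End-to-end availability: {total:.2%}")  # about 98.02 %
```

Twenty components that each look excellent on their own dashboards add up to a chain that fails about two per cent of the time, and the figure alone does not tell which step, or whose step, was the culprit.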

The concept of availability is somewhat analogous to a spoiled egg: If the egg box contains a spoiled egg, what is rotten: The egg itself, the box of eggs, the entire shopping cart, the supermarket or the mall hosting the supermarket? There is no single answer to this, just as there is no single answer to what the availability of a complex value network is.

If availability is to be measured and reported with unambiguous metrics, the procedure and the principles must be agreed on very clearly so that both sides of the table share the same view. This is extremely important if quality penalties are based on the availability metrics. The calculation method must be reasonable and take the criticality of the systems into account: there is no reason to pay penalties for insignificant incidents. You must always aim for a dashboard that produces reasonable information and useful metrics.
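One possible way to write such an agreement down is to weight each incident by the share of users affected and by the criticality of the system, and to ignore incidents that fall below an agreed threshold. The sketch below is only one hypothetical calculation method; the weights, the threshold, the user count and the incident data are invented for the illustration and do not come from any real contract.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    system: str
    minutes: float        # duration of the outage
    users_affected: int   # users who could not work


# Hypothetical agreement terms, invented for this sketch.
CRITICALITY = {"X": 1.0, "Y": 0.3}   # weight per system
TOTAL_USERS = 1200
MINOR_THRESHOLD = 5.0                # weighted minutes below this are ignored


def weighted_minutes(incident):
    """Scale outage minutes by the share of users hit and by criticality."""
    share = incident.users_affected / TOTAL_USERS
    return incident.minutes * share * CRITICALITY[incident.system]


def penalty_minutes(incidents):
    """Sum the weighted minutes of all incidents that exceed the threshold."""
    weighted = (weighted_minutes(i) for i in incidents)
    return sum(w for w in weighted if w >= MINOR_THRESHOLD)


incidents = [
    Incident("X", minutes=60, users_affected=450),  # counts: 22.5 weighted minutes
    Incident("Y", minutes=10, users_affected=400),  # about 1 weighted minute, ignored
]
total_penalty = penalty_minutes(incidents)
print(f"Penalty-relevant downtime: {total_penalty:.1f} weighted minutes")
```

The exact formula matters less than the fact that both parties can run the same calculation on the same incident list and arrive at the same number.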

Let us look at a chart reflecting the Value Network thinking. We have two systems X and Y. There are three offices A, B, and C having a total of 1200 users. System X runs on a 4-way cluster where one member can be down without the users seeing a performance degradation.

The traditional model of availability calculation, based on server availability, gives the following results:

If point 1 is down, the availability is 80 %, even though every third user cannot work.
If point 2 is down, the availability is 80 %, even though every user sees the service up and running.
If point 3 is down, the availability is 100 %, even though every user sees the service down.
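The gap between the two views can be put side by side in a few lines. The sketch below assumes five servers in total (four cluster members for X and one server for Y), which is inferred from the 80 % figure, and reads "every third user" as 400 of the 1200 users; the mapping of each point to servers and users is an interpretation of the three cases above, not data taken from the chart.

```python
TOTAL_SERVERS = 5    # assumed: 4 cluster members for X + 1 server for Y
TOTAL_USERS = 1200   # from the text


def server_availability(servers_down):
    """Traditional view: the share of servers that are up."""
    return (TOTAL_SERVERS - servers_down) / TOTAL_SERVERS


def user_availability(users_affected):
    """User view: the share of users who can still work."""
    return (TOTAL_USERS - users_affected) / TOTAL_USERS


# (label, servers down, users unable to work), as interpreted from the text
scenarios = [
    ("point 1", 1, 400),    # every third user cannot work
    ("point 2", 1, 0),      # one cluster member down, nobody notices
    ("point 3", 0, 1200),   # all servers up, nobody can work
]

for label, down, affected in scenarios:
    print(f"{label}: servers {server_availability(down):.0%}, "
          f"users {user_availability(affected):.0%}")
```

In none of the three cases do the two columns agree, which is the Watermelon effect expressed in numbers.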

If point 4 is down, the case gets more interesting. Each of the following availability figures can be defended:

100 %, because all servers are up.
0 %, because there are users who cannot work.
67 %, because 2 offices of 3 can work.
50 %, because 600 users of 1200 cannot work.
44 %, because 450 of the 800 users of the critical service X cannot work.

None of these figures is absolutely wrong or absolutely correct.
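The five readings of the point 4 outage can be computed from the same handful of numbers. The sketch below takes the user and office counts from the text and assumes five servers in total; the dictionary keys are just labels made up for this illustration.

```python
# Figures from the text, except the server count, which is an assumption.
servers_up, servers_total = 5, 5        # no server is down
offices_up, offices_total = 2, 3
users_total, users_down = 1200, 600
x_users_total, x_users_down = 800, 450  # users of the critical service X

figures = {
    "servers up": servers_up / servers_total,
    "everyone can work": 1.0 if users_down == 0 else 0.0,
    "offices able to work": offices_up / offices_total,
    "users able to work": (users_total - users_down) / users_total,
    "users of X able to work": (x_users_total - x_users_down) / x_users_total,
}

for name, value in figures.items():
    print(f"{name}: {value:.2%}")
```

The last reading comes out at 43.75 % before rounding. Which of the five ends up in the report depends entirely on the denominator the parties have agreed on, not on anything the monitoring tools can decide by themselves.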

Matters are not made easier by the fact that internal systems may have a considerable number of integrations with each other. There can also be integrations with services outside the organization.