Debunking the server uptime myth

One thing I am getting more and more tired of on the web lately is the endless operating system wars on the tech sites I visit regularly. These debates go on and on about why one operating system is better than another. More often than not they are held by people without any experience in real-world IT environments, who bring up moot arguments about why one operating system is better than the other. One such argument I see often is 'server uptime'. Some of these people see server uptime as a measure of operating system superiority. I do not see server uptime as a good argument for, or indication of, one operating system being superior to another, and neither should you.

So what is server uptime really, and why should we not care about it all that much? Server uptime is the time elapsed since the operating system last started. The clock starts ticking when the operating system is loaded and resets to 0 when the operating system, or the server it runs on, is restarted or shut down for whatever reason. In real enterprises we do not care much about server uptime because it says nothing about whether the services on that server actually work. Your server can be online for 300 days for all I care; if the services it provides crash every other week, the server is not really useful to end users.

This brings me to a more appropriate statistic to look at: ‘service availability’. Service availability is the share of a given timeframe, say a month or a year, during which a service was available and usable for end users. This statistic is much more interesting than server uptime because it usually translates directly into user satisfaction. Let’s pick email as an example. If the email service has a service availability of 100% in the month of January, users will be satisfied with the service you provide. Now suppose the server stays online for the whole month, but the email server software is unavailable for one day due to a crash. Server uptime still shows a perfect January, yet the email service was available for only 30 of 31 days, roughly 96.8%, and the result is less satisfied users.
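To put numbers on the January email example, here is a minimal Python sketch. The function name availability_percent and the variable names are made up for illustration; the only inputs are the length of the period and the total downtime within it.

```python
from datetime import timedelta


def availability_percent(period: timedelta, downtime: timedelta) -> float:
    """Return the share of the period during which something was usable, in percent."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must be between zero and the period")
    return 100.0 * ((period - downtime) / period)


january = timedelta(days=31)

# The operating system never restarted, so server uptime looks perfect.
server_uptime = availability_percent(january, timedelta(0))

# The email service crashed and stayed down for one day.
mail_availability = availability_percent(january, timedelta(days=1))

print(f"server uptime:      {server_uptime:.1f}%")
print(f"email availability: {mail_availability:.1f}%")
```

Both numbers describe the same machine in the same month, but only the second one reflects what the user experienced.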

This is why enterprises care so much about clustering and failover these days. Real enterprises do not count on one single box with 300 days of uptime to provide critical services to their end users. They use clusters of servers, backed by network failover and redundant data storage, to get the highest possible service availability. In this real-world scenario it does not matter if a single node goes down for security updates or because it crashed: the other servers in the cluster keep the services available.
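A rough way to see why redundancy beats a single long-running box is the textbook formula for redundant nodes, sketched below in Python. It rests on assumptions that are not part of the argument above and are optimistic in practice: every node has the same availability, nodes fail independently of each other, and failover is instant and never fails itself. The function name cluster_availability and the 95% figure are purely illustrative.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` identical nodes is up.

    Assumes independent failures and perfect failover (illustrative only).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The service is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical node that is usable 95% of the time on its own.
for count in (1, 2, 3):
    print(f"{count} node(s): {cluster_availability(0.95, count):.4%}")
```

Under these assumptions, even mediocre individual nodes add up to a highly available service, which is why the uptime of any single node stops being interesting.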

In the real world, operating system superiority is not based on server uptime. It is based on how easy the operating system is to manage, how easy the services it provides are to manage, how well those services scale and, most importantly, whether the operating system provides the services the enterprise and its users need to get their jobs done.

Comments (1)

December 15th, 2009 16:16:13 UTC +1

It actually seems logical that the quality of the server is measured by looking at how high the service availability was/is.

But I can imagine that 'server uptime' is easier to understand for some people. ;)
