
Understanding Data Center Reliability, Availability and the Cost of Downtime

Here’s a sobering statistic from the National Archives and Records Administration in Washington, D.C.: 93% of businesses that have lost availability in their data center for 10 days or more have filed for bankruptcy within one year.

We all know downtime is to be avoided, but that statistic really drives home just how important avoiding it is. It’s also important, then, to understand all the threats to data center availability as well as the cost of downtime, so you can make the business case to address those threats.

First let’s define a few terms associated with the topic. Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time.

Availability, on the other hand, is the degree to which a system or component is operational and accessible when it’s required for use. Reliability factors into availability, as does recovery time after a failure occurs. In a data center, having a reliable system design is the most critical variable. But once a failure does occur, the most important consideration becomes getting the IT equipment and business processes up and running as fast as possible, thus keeping downtime to a minimum.
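To make the relationship between reliability and recovery time concrete, here is a minimal sketch using the standard steady-state availability formula: uptime between failures divided by uptime plus recovery time. The formula is a common textbook definition rather than something this article or the course specifies, and the function name and sample figures are assumptions chosen purely for illustration.

```python
def steady_state_availability(mean_time_between_failures_hours: float,
                              mean_time_to_recover_hours: float) -> float:
    """Fraction of time a system is up, given how often it fails
    (reliability) and how long each recovery takes."""
    uptime = mean_time_between_failures_hours
    return uptime / (uptime + mean_time_to_recover_hours)


# Hypothetical figures: the same reliability (one failure every 1,000 hours)
# with two different recovery times.
slow_recovery = steady_state_availability(1000, 10)
fast_recovery = steady_state_availability(1000, 1)

print(f"10-hour recovery: {slow_recovery:.4%}")
print(f" 1-hour recovery: {fast_recovery:.4%}")
```

With reliability held constant, shortening recovery time is the only lever left for raising availability, which is why recovery speed matters so much once a failure has occurred.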

When measuring data center availability, you’ll no doubt hear about “five nines,” a figure that can be quite misleading. It derives from the world of networking, where a network that is available 99.999% of the time, meaning it is down for only about 5 minutes per year, is considered highly reliable. But it doesn’t translate so neatly to a data center. Too often, it is used to refer only to the percentage of time the data center has power. But loss of power is only one part of the equation when it comes to data center availability.
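The “5 minutes per year” figure is simple arithmetic, and it can help to see how quickly the downtime allowance shrinks with each additional nine. The sketch below assumes a 365-day year; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for percent in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(percent)
    print(f"{percent}% available -> {minutes:.2f} minutes of downtime per year")
```

At 99.999% the allowance works out to roughly 5.26 minutes per year, which is where the rounded figure of 5 minutes comes from.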

Consider two data centers that are both rated 99.999% available. In one year, Data Center A loses power once, for 5 minutes. Data Center B loses power 10 times, but for only 30 seconds each time. While both data centers were without power for a total of 5 minutes, you must also consider recovery time. Anytime a server loses power, for example, it has to reboot, recover data and repair any corrupted data. The time it takes to recover, known as the mean time to recover (MTTR), could be minutes, hours or days. Because Data Center B has to go through that recovery 10 times rather than once, its total downtime will be far greater than Data Center A’s, and its true availability is probably far below 99.999%.
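The comparison between the two data centers can be worked through with a few lines of code. This is only a sketch: the 30-minute recovery time per outage is an assumed value for illustration (as noted above, real recovery could take minutes, hours or days), and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years

# Assumption for illustration only: every outage needs 30 minutes of recovery.
ASSUMED_RECOVERY_MINUTES = 30.0


def total_downtime_minutes(outages: int, outage_minutes: float,
                           recovery_minutes: float) -> float:
    """Time without power plus recovery time, summed over all outages."""
    return outages * (outage_minutes + recovery_minutes)


def effective_availability_percent(downtime_minutes: float) -> float:
    """Availability over one year once recovery time is counted."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Data Center A: one 5-minute outage. Data Center B: ten 30-second outages.
downtime_a = total_downtime_minutes(1, 5.0, ASSUMED_RECOVERY_MINUTES)
downtime_b = total_downtime_minutes(10, 0.5, ASSUMED_RECOVERY_MINUTES)

for name, downtime in (("A", downtime_a), ("B", downtime_b)):
    availability = effective_availability_percent(downtime)
    print(f"Data Center {name}: {downtime:.0f} minutes down, "
          f"{availability:.4f}% available")
```

Under this assumption, Data Center A is down for 35 minutes in the year and Data Center B for 305 minutes, even though each lost power for the same 5 minutes in total. Neither reaches 99.999% once recovery is counted, and Data Center B falls much further short.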

Other factors that pose a threat to data center availability include lack of cooling and hot spots, both of which can lead to downtime if IT equipment gets too hot. Other threats to IT equipment include prolonged improper utility power, exposure to high or low temperatures, humidity, component failures and simply old age. Disasters such as hurricanes and tornadoes obviously pose threats to data center availability as well.

But according to Gartner Group, the largest single cause of data center downtime is human error, which can result from poor training, inadequate documentation that leads to mistakes in change management, and fragmented systems management.

To learn more about the various threats to data center availability, and how to calculate the cost of downtime in your own data center, check out Fundamentals of Availability. It is one of the many courses offered by Schneider Electric’s free online education program Energy University, a series of courses based on an intuitive, interactive platform. The course takes only about an hour and you’ll gain valuable information to help you make the business case for improvements that will boost the availability of your data center. What’s more, you can get education credits from organizations including the IEEE, IFMA, BICSI and more.

Don’t let data center downtime put your business in peril. Take this free course and understand what it takes to operate a truly reliable, available data center. You’ll find this course, along with many others, in the college of Data Centers on the Energy University site.

One Response to “Understanding Data Center Reliability, Availability and the Cost of Downtime”

  1. morena sanidad

    Great article on Data Center Availability. Thanks for the free course on DC in the Energy University. I will check it out.
