
Assessing Your Data Center Maintenance Program

It’s no secret that skimping on maintenance can lead to serious problems in a data center. But it can be difficult for data center operators to determine the appropriate level of maintenance for their facilities, especially if they are not maintenance professionals.

What’s required is a well-organized operations and maintenance (O&M) program that is in sync with the risk tolerance level of the data center. Lee Technologies, a subsidiary of Schneider Electric, introduced a framework in 2006 for aligning an existing or proposed maintenance program with a facility’s operational and performance requirements. Known as the Tiered Infrastructure Maintenance Standard (TIMS), the framework helps companies develop a maintenance plan that matches corporate data center performance goals for attributes such as reliability and risk tolerance.

The framework consists of four maintenance service tiers, or levels, as follows:

TIMS-1: Run to Fail

Companies at this level perform only reactive maintenance, reflecting the old adage, “if it ain’t broke, don’t fix it.” Depending on the level of redundancy that exists, individual failures may or may not hinder data center performance, but running equipment to failure does increase the likelihood of multiple concurrent failures that can take down even redundant systems. Companies that fall into this category may feel that the cost of an outage is low compared to the cost of maintenance. Or maybe they’ve simply cut the maintenance budget because money is tight. Either way, they’re flying in the face of statistics that show any short-term savings in maintenance costs will eventually be overshadowed by costly outages and repairs.

TIMS-2: Unstructured Maintenance

Organizations at the TIMS-2 level do perform maintenance, but with little to no structure to regulate how the work is performed and whether it’s effective. It’s a common practice throughout the industry, with companies simply following manufacturers’ maintenance recommendations. But without a detailed scope of work for each piece of equipment that factors in system interdependencies, important steps are likely to be neglected. And without Methods of Procedure (MOPs) that spell out how maintenance is to be performed, there’s greater risk of human error because even experienced technicians can become distracted and make mistakes. Another common characteristic of a TIMS-2 program is over-reliance on the same individuals to perform maintenance year in and year out on various data center components. That creates a significant risk, because all the maintenance knowledge for a given system or piece of equipment resides in the head of a single individual rather than in written documentation.

TIMS-3: Structured Maintenance

At this level, pretty much nothing is left to chance when it comes to maintenance. The idea is to maximize uptime by eliminating guesswork and minimizing human error. That means a heavy dose of policies and procedures that detail how and when work is performed along with programs to identify, train, supervise and evaluate qualified maintenance personnel. TIMS-3 practitioners use best practices for every facet of the O&M program, with the goal being to eliminate the variables that can introduce errors. Maintenance activities are proactive, controlled and documented.

TIMS-4: Facilitated Maintenance

Facilitated Maintenance, the highest level of maintenance service, combines a Structured Maintenance program with a data center design that facilitates concurrent maintenance. Such a design incorporates multiple power and cooling distribution paths with redundant components that allow individual pieces of equipment to be isolated, so maintenance can be performed on them with no disruption in service. TIMS-4 also incorporates a Building Automation System (BAS) and/or Data Center Infrastructure Management (DCIM) system to continually monitor critical infrastructure, report on performance trends and alert operators when conditions fall outside preset parameters. Additionally, a Computerized Maintenance Management System (CMMS) is used to schedule maintenance events efficiently and to analyze and manage maintenance effectiveness.

Most likely, your maintenance program doesn’t fit neatly into any one of those categories. It’s common for a data center to fall into one category with respect to certain data center components and a different category for others. Maybe your electrical systems fall into the Structured Maintenance category but your HVAC plant is more squarely in the TIMS-2 Unstructured level. In such cases, the weakest link principle applies: the overall service level is only as high as the lowest level of maintenance being performed in any critical area of the facility.
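
The weakest link principle amounts to taking the minimum across your critical systems. Here is a minimal Python sketch of that idea. The `Tims` enum, the `overall_level` function and the system names are hypothetical, made up for illustration; they are not part of the TIMS framework or any Schneider Electric tool.

```python
from enum import IntEnum


class Tims(IntEnum):
    """The four TIMS levels, ordered from lowest to highest."""
    RUN_TO_FAIL = 1
    UNSTRUCTURED = 2
    STRUCTURED = 3
    FACILITATED = 4


def overall_level(levels_by_system):
    """Return the lowest TIMS level found among the critical systems."""
    if not levels_by_system:
        raise ValueError("at least one critical system is required")
    return min(levels_by_system.values())


# Hypothetical self-assessment: structured electrical, unstructured HVAC.
assessment = {
    "electrical": Tims.STRUCTURED,
    "hvac": Tims.UNSTRUCTURED,
}

result = overall_level(assessment)
print(f"Overall service level: TIMS-{result.value} ({result.name})")
```

For the example in the paragraph above, the facility as a whole rates TIMS-2, even though the electrical systems are maintained at TIMS-3.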

To learn more about the TIMS framework and how to apply it to your data center to achieve the level of reliability you need, check out APC by Schneider Electric white paper number 178, “A Framework for Developing and Evaluating Data Center Maintenance Programs.”

 
