Despite the growing sophistication of today’s buildings and the systems that keep them operating, most interruptions come not from equipment failure but from human error.
Data center infrastructure and component failures obviously require attention when they occur, but rigorous planning, design, specification, and commissioning of these systems, combined with adequate redundancy, have reduced the number of such failures in recent years. Experts say it is therefore increasingly important to pay attention to human error. In fact, human error is at fault in 60 to 80 percent of data center downtime events.
Facility managers first need to examine their operating strategies and identify any deficiencies. They need to determine whether responsibilities are properly delineated between departments, and they need to develop work rules specific to the facility. Staff structure should match operational goals and annual objectives, and ownership of tasks, systems, and processes should be assigned appropriately.
The computer room is one of the most critical spaces because multiple groups often work there side by side. Tasks performed in this space tend to present the greatest risk of human error simply because several departments are involved and more human activity takes place in the room.
Developing written expectations, often called “internal service level agreements,” between the ownership groups helps to clearly define each team’s role in the shared space. These agreements provide the necessary level of detail for functions such as power distribution and master planning (deciding where hardware is located to achieve optimal cooling and performance). Downtime tends to occur most frequently when those with nominal ownership lack the training, knowledge, and experience to follow proper procedures, or even to install or remove a device correctly.