“As our clients deploy servers supporting their AI applications in our facilities, we can expect them to use a much higher percentage of the power they’ve leased,” explains Cristifer R. Engel, Vice President of Operations at Sabey Data Centers. “The Ops team has to be on top of its game. With this kind of density, we have to react much quicker to maintenance emergencies. We also have to ensure proper hot aisle containment in less forgiving densities, predict upcoming maintenance needs and provide service to critical equipment before it breaks down and causes a problem.”
The Functions of Operations Teams
In any data center facility, Operations takes care of the following:
- Space Logistics – The Operations team ensures that each customer’s IT deployment on the data center floor stays within that customer’s footprint, which is defined by their lease and a floorplan provided by the Architecture team.
- “The Ops team manages the white space,” says Engel. “Each client gets a certain amount of floor space, based on their lease agreement. We provide the square footage for customers to set up their racks, install hot aisle or cold aisle containment, connect power and network cables to the servers and start running their computing operations.”
- Power and Cooling – The facilities are designed to provide power to the IT deployment, using UPS-backed power feeds to deliver stable and clean power. The Operations team is responsible for ensuring continuous delivery of air cooling for traditional IT servers, or a mix of liquid and air cooling for HPC servers.
- Maintenance – The Operations team handles maintenance for power, cooling and other critical infrastructure systems.
- “At Sabey, we follow a regular maintenance calendar and use Methods of Procedure (MOPs) for servicing critical equipment,” says Engel. “We also have a robust change management system. Any proposed changes to our critical environments must be reviewed at multiple levels and approved by our Change Advisory Board (CAB), a practice that mitigates risks to our clients and their IT systems.”
- Security – The Operations team oversees onsite security in the data center, managing security staff and confirming that security procedures are enforced. This includes the management of security contractors, as well as maintenance and repairs for security systems (front gate, camera system, biometric locks, etc.).
Ramping Up Power Use for AI Deployments
With traditional IT infrastructure, a data center customer might use only a fraction of their total leased space and power. On top of that, ramp periods are traditionally long: customers can take several months or even years to scale up to their full deployment.
But with HPC infrastructure, a customer can very quickly ramp up to their full leased capacity, even if their HPC deployment uses the same amount of floor space as their regular IT deployment.
“In 2011, when we built out Sabey’s first data center building in our Quincy, Washington campus, the building had 7.2 megawatts IT capacity,” says Engel. “During the time I was there, the building usually operated at a density between four and five megawatts.”
“But today, the first building on Sabey’s Austin, Texas campus offers 34 megawatts, and we expect our clients to utilize this building to nearly full capacity. Their HPC infrastructure is using a scale of power that is so much bigger now.”
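To put those figures side by side, here is a minimal Python sketch of the utilization arithmetic. The capacity and load numbers come from Engel's quotes above. The `utilization` helper and the constant names are illustrative assumptions, not part of any Sabey tooling.

```python
def utilization(load_mw: float, capacity_mw: float) -> float:
    """Return the IT load as a fraction of a building's IT capacity."""
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    return load_mw / capacity_mw


# Figures quoted in the article (megawatts of IT capacity).
QUINCY_CAPACITY_MW = 7.2   # first Quincy building, built out in 2011
AUSTIN_CAPACITY_MW = 34.0  # first building on the Austin campus

# The Quincy building usually ran between four and five megawatts.
quincy_low = utilization(4.0, QUINCY_CAPACITY_MW)
quincy_high = utilization(5.0, QUINCY_CAPACITY_MW)

# How much larger the Austin building's capacity is than Quincy's.
capacity_ratio = AUSTIN_CAPACITY_MW / QUINCY_CAPACITY_MW

print(f"Quincy typical utilization: {quincy_low:.0%} to {quincy_high:.0%}")
print(f"Austin capacity is about {capacity_ratio:.1f}x Quincy's")
```

By this arithmetic, the Quincy building typically ran at roughly 56 to 69 percent of its IT capacity, while the Austin building offers nearly five times the capacity and is expected to run close to full.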
Improving Data Center Operations
As the power requirements of HPC deployments increase dramatically, so does the need for more efficient data center operations.
“When a data center has a regular load, the risk of running the facility is not as bad,” says Engel. “If an air-cooling fan fails or a piece of equipment goes offline, the Operations team has time to step in and repair or replace it. But when the data center is fully loaded, that time to react goes down exponentially. If an emergency happens, the Ops team has to be on top of it.”
At each Sabey facility, the Operations team is constantly looking for new ways to improve efficiency. They must understand the useful life of all power and cooling system components and anticipate when a piece of equipment will need replacing, so that it can be swapped out before it breaks.
“The funny thing is, we can now use AI to monitor system health and predict our maintenance needs,” says Engel. “The same technology that requires us to maximize power capacity in each facility can also help us run those facilities more smoothly.”
In the next blog in our AI series, we’ll look at how Sabey’s Operations team is using and planning to use AI applications to optimize maintenance schedules for critical power and cooling infrastructure. We’ll also look at how AI is helping Sabey to move from a reactive to a predictive model for operations support.
For information about pre-leasing critical environments for HPC deployments for AI, contact Sabey Data Centers today.