The Dragos Blog

05.08.24 | 4 min read

Data Centre Operations: Cooling Systems Are Possible Targets for Operational Disruption 

Information provided here is sourced from Dragos OT Cyber Threat Intelligence adversary hunters and analysts who conduct research on adversary operations and their tactics, techniques, and procedures (TTPs). Dragos OT cyber threat intelligence is fully reported in Dragos WorldView threat intelligence reports and is also compiled into the Dragos Platform for threat detection and vulnerability management.

Numerous Australian industrial organisations depend on data centres to provide key operational elements such as data storage, processing, backups, and recovery. Data centres also support many different business applications and services, ranging from productivity applications to high-volume transactions, big data processing, and artificial intelligence. Not surprisingly, the data storage and processing sector is listed amongst the 11 sectors covered under the Security of Critical Infrastructure Act 2018 (SOCI), further highlighting its criticality. Although cloud adoption varies across specific sectors, many Australian industrial operators have implemented cloud technologies to gain operational advantages, increase efficiency, and ensure operational redundancy. However, the adoption of these services is not without risk. Critical services that rely on the availability of data centres are potentially susceptible to operational disruption stemming from events such as cooling system outages. This blog therefore examines some of the core operational risks associated with data centre cooling systems.

Data Centre Operations and Cooling Systems

Building operations within dedicated data centre facilities are often managed by building automation systems/building management systems (BAS/BMS). BAS and BMS are collective terms for the software, hardware, and communications protocols that integrate multiple building operations systems for improved performance, energy management, and facility oversight. One critical function of these management systems is temperature regulation. Modern data centres employ diverse cooling methods (either in isolation or in unison), which involve a complex interplay of different system elements. These technologies range from air conditioning and chilled water systems to, more recently, liquid cooling technologies such as immersion and direct-to-chip cooling.

Temperature control within data centres is critical because the associated networking, storage, and computing infrastructure must be kept at specific operating temperatures to ensure optimum functioning and computing power. Failure to do so could result in overheating of the data centre – a scenario that could lead to infrastructure shutdown or damage. Concerningly, studies have shown that in the event of a cooling system outage, without a backup in place, server room temperatures can become unacceptably hot after only five minutes.
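
As a rough illustration of why a short window matters, the sketch below shows one way a monitoring script might flag a cooling failure from inlet temperature readings, alerting on either an absolute limit or a fast rate of rise. The thresholds, the `Reading` type, the `check_readings` function, and the sample data are all illustrative assumptions; they are not values from Dragos or from any BAS/BMS product.

```python
from dataclasses import dataclass

# Hypothetical limits chosen for illustration only; real limits come from
# equipment specifications and the facility's own operating envelope.
MAX_INLET_C = 32.0          # absolute alarm threshold (assumed)
MAX_RISE_C_PER_MIN = 1.0    # rate-of-rise alarm threshold (assumed)


@dataclass
class Reading:
    minute: float   # minutes since monitoring started
    temp_c: float   # inlet air temperature in degrees Celsius


def check_readings(readings: list[Reading]) -> list[str]:
    """Compare each reading with the previous one and collect alerts."""
    alerts = []
    for prev, cur in zip(readings, readings[1:]):
        elapsed = cur.minute - prev.minute
        if elapsed <= 0:
            continue  # skip out-of-order or duplicate samples
        rise = (cur.temp_c - prev.temp_c) / elapsed
        if rise > MAX_RISE_C_PER_MIN:
            alerts.append(f"t={cur.minute:g} min: rising {rise:.1f} C/min")
        if cur.temp_c > MAX_INLET_C:
            alerts.append(f"t={cur.minute:g} min: {cur.temp_c:.1f} C over limit")
    return alerts


# Made-up samples imitating a chiller trip shortly after minute 2.
samples = [Reading(0, 24.0), Reading(1, 24.2), Reading(2, 24.3),
           Reading(3, 26.5), Reading(4, 29.0), Reading(5, 33.0)]
for alert in check_readings(samples):
    print(alert)
```

In this made-up series the rate-of-rise check fires two minutes before the absolute limit is crossed, which is the reason for pairing the two checks: with only minutes to respond, early warning of a trend may matter more than the limit itself.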

Cooling System Outages

This overheating scenario is not simply theoretical – there have been numerous recent examples where cooling system issues led to the direct operational disruption of the downstream customer base. On 30 August 2023, a lightning strike and subsequent utility voltage sag led to the shutdown of cooling system chillers across several Microsoft data centres in Australia. Consequently, some of the storage and computing infrastructure within the data centre had to be powered down to prevent hardware damage. The associated service downtime extended to nearly 12 hours, with certain services experiencing disruptions up to 3 September – a period of over four days. Notably, the incident impacted the Bank of Queensland and the Australian airline Jetstar, amongst other organisations.

In another example, on 14 October 2023, a cooling system issue at an Equinix Data Centre in Singapore caused temperatures in certain sections of the data centre to rise beyond acceptable levels. This issue disrupted service availability and significantly impacted banking services at DBS Bank Limited and Citibank, with associated impacts lasting until the following day. As a result of this disruption, approximately 2.5 million payment and automated teller machine (ATM) transactions and up to 810,000 digital banking access attempts failed during the outage period.

Management Systems as a Potential Avenue of Cyber Attack

As is evident, cooling systems are critical to data centre operations. Any disruption to their functionality can cause equipment shutdown, resulting in downstream impacts on the associated industrial customer base. Hypothetically, the strategic importance of cooling systems makes them a plausible target for adversaries aiming to undermine data centre operations and the broader industrial sector reliant on cloud services. On this note, certain adversarial groups have recently signalled interest in building management systems.

In July 2021, classified documents that were publicly associated with the Iranian government were leaked, demonstrating an intent to perform high-level research into building management system technologies amongst many other topics of interest. Such research could possibly serve as a precursor to targeting specific infrastructure components like cooling systems. In addition, the Dragos-designated Threat Group CHERNOVITE has the theoretical potential to target building management systems and temperature control technologies within data centres due to the PIPEDREAM malware framework’s use of ubiquitous industrial protocols. Specifically, the MOUSEHOLE module provides an interactive capability for manipulating Open Platform Communications Unified Architecture (OPC UA) server nodes and associated devices, which are common within data centre environments.

Disruptions to data centre cooling systems can cause temperatures to rise beyond acceptable levels, resulting in infrastructure shutdowns. Such shutdowns could subsequently impair service availability and have a range of downstream business impacts for industrial organisations that rely on data centre availability. At a minimum, some expected impacts could include operational disruption, reputational damage, loss of availability of critical services, and, in some cases, even loss of view.

Recommendations

Given the risk of operational disruption posed by data centre cooling system outages, Dragos recommends that organisations:

  • Conduct a risk assessment to identify business-critical applications and services that rely on data centres, such as cloud services.
  • Incorporate scenarios such as data centre outages into disaster/incident response planning and create a standard operating procedure to respond to such occurrences.
  • Engage with cloud service providers to discuss risks of failover failures or cooling system outages. Ensure they have backups and redundancy plans in place for such incidents. 
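
The first and third recommendations can be approached as a simple inventory exercise. The sketch below is one possible starting point, assuming an organisation records each service alongside its hosting site and any failover site; the `Service` fields, the `single_site_risks` helper, and the inventory entries are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    name: str
    business_critical: bool
    primary_site: str               # data centre or cloud region hosting it
    failover_site: Optional[str]    # None if no failover has been arranged


def single_site_risks(services: list[Service]) -> list[str]:
    """Names of business-critical services with no distinct failover site."""
    return [
        s.name
        for s in services
        if s.business_critical
        and (s.failover_site is None or s.failover_site == s.primary_site)
    ]


# Entirely hypothetical inventory for illustration.
inventory = [
    Service("historian-backup", True, "region-a", "region-b"),
    Service("remote-monitoring", True, "region-a", None),
    Service("staff-intranet", False, "region-a", None),
    Service("billing", True, "region-b", "region-b"),
]
print(single_site_risks(inventory))
```

A list like this only identifies candidates for discussion with a cloud service provider. It says nothing about whether a listed failover site has actually been tested, or whether it shares cooling or power infrastructure with the primary site.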

Dragos also recommends that all industrial organisations, including data centre operators, adopt the 5 Critical Controls for World-Class OT Cybersecurity. These cybersecurity controls encompass the key defensive elements of industrial control systems (ICS) incident response, defensible architecture, ICS network visibility monitoring, secure remote access, and risk-based vulnerability management.

Download the Complete Threat Analysis

Download the cyber threat intelligence report from Dragos WorldView, “Threat Perspective: Data Center Operations,” which delivers a comprehensive analysis and actionable recommendations for protecting essential OT infrastructure. 
