AI Infrastructure Study Series

Day 6: Data Center Power and Cooling

Understanding why electricity, thermal density, and cooling infrastructure have become the binding physical constraints on AI deployment.

Summary

Every layer studied so far — GPUs, memory, networking, packaging, servers — ultimately depends on two physical resources: electricity and cooling. No matter how advanced the silicon, a chip that cannot be powered cannot compute, and a chip that cannot be cooled must be throttled or shut down. Today we study why data center power and cooling have become the longest-duration bottleneck in AI infrastructure — a constraint measured not in chip generations but in years of physical construction and regulatory approvals.

1) Why This Matters

Semiconductor technology advances on 1–2 year cycles. A new GPU generation, a new process node, a new packaging technique — these move fast. But the power infrastructure that feeds data centers — transmission lines, substations, transformers, generation capacity — takes 5 to 10 years to build. This mismatch means that even if every chip-level bottleneck were resolved tomorrow, AI deployment would still be constrained by how fast electricity can reach the data center.

For investors, this is the layer where AI infrastructure meets energy infrastructure. The companies that supply power equipment, cooling systems, and grid connectivity are indirect but structural beneficiaries of AI spending — and the constraints here will persist longer than any semiconductor bottleneck.

2) One-Sentence Definitions

| Term | Simple Definition | Why It Matters |
|---|---|---|
| Power Capacity | Total electricity a data center can draw, measured in MW. New AI facilities target 100 MW to 1 GW+. | Power supply, not demand, is the bottleneck |
| Power Density | Power consumed per rack, measured in kW/rack. AI racks draw 40–120 kW vs 10–20 kW for traditional racks. | 5–10× higher density requires completely different infrastructure |
| PUE | Power Usage Effectiveness: total facility power ÷ IT equipment power. Lower is better (1.0 = perfect). | Cooling overhead directly impacts operating cost (worked example below) |
| Air Cooling | Traditional cooling using fans and air conditioning. Sufficient for standard racks, insufficient for AI density. | Reaching its physical limits with AI workloads |
| Direct Liquid Cooling (DLC) | Cold plates attached directly to GPUs/CPUs with circulating coolant. Far more efficient than air. | Required by NVIDIA GB200 NVL72; becoming the new standard |
| Immersion Cooling | Submerging entire servers in non-conductive coolant. Highest thermal efficiency but early-stage adoption. | Future potential for extreme-density deployments |
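To make PUE concrete, here's a minimal sketch in Python. The facility size, PUE, and electricity price are illustrative assumptions, not figures from this article:

```python
# PUE sketch: how facility overhead translates into electricity cost.
# All numbers below are illustrative assumptions, not measured data.

it_load_mw = 100          # assumed IT (server) load of the facility
pue = 1.4                 # assumed PUE; 1.0 would mean zero overhead
price_per_kwh = 0.06      # assumed industrial electricity price in USD
hours_per_year = 8760

total_load_mw = it_load_mw * pue              # facility draw incl. cooling
overhead_mw = total_load_mw - it_load_mw      # power spent on cooling, etc.

annual_cost = total_load_mw * 1000 * hours_per_year * price_per_kwh
overhead_cost = overhead_mw * 1000 * hours_per_year * price_per_kwh

print(f"Facility draw: {total_load_mw:.0f} MW ({overhead_mw:.0f} MW overhead)")
print(f"Annual electricity bill: ${annual_cost/1e6:.0f}M "
      f"(${overhead_cost/1e6:.0f}M of it is non-IT overhead)")
```

Cutting PUE from 1.4 to 1.2 on the same IT load would halve that overhead, which is why operators treat PUE as a headline metric.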

3) A Simple Analogy

Think of AI data center infrastructure like city utilities for a rapidly growing factory district.

Power Capacity = the city's electrical grid — you can build factories fast, but running power lines to them takes years

Power Density = how many appliances per apartment — AI racks are like cramming 20 industrial ovens into each unit

Air Cooling = opening windows — works for a normal apartment, not for 20 ovens

Direct Liquid Cooling = running water pipes directly to each oven — much more effective

Immersion Cooling = submerging the entire oven in a cooling bath — maximum efficiency, complex plumbing

4) Why Power Is the Ultimate Bottleneck

The Timeline Mismatch

GPU generations advance every 1–2 years. TSMC delivers new process nodes every 2–3 years. But building the power infrastructure to feed a new data center — transmission lines, substations, transformers, grid interconnection — takes 5 to 10 years including permitting and environmental review. This is the fundamental mismatch: chip technology is outrunning the physical grid.

The Scale of AI Power Demand

To put the numbers in perspective: Microsoft, Google, Meta, and Amazon are each investing over $50 billion per year in AI infrastructure. A single new AI data center campus targets 100 MW to over 1 GW of power capacity. For reference, 1 GW is roughly the output of one large nuclear reactor. Data centers are projected to consume 6–9% of total U.S. electricity by 2030.
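A quick back-of-envelope calculation shows why a 1 GW campus is grid-scale load. The campus size, utilization, and price below are assumptions for illustration:

```python
# Back-of-envelope: annual energy of a hypothetical 1 GW AI campus.
# Campus size, utilization, and price are illustrative assumptions.

campus_gw = 1.0           # assumed campus power capacity
utilization = 0.9         # assumed average draw vs. nameplate capacity
hours_per_year = 8760
price_per_kwh = 0.05      # assumed wholesale-ish electricity price in USD

energy_twh = campus_gw * utilization * hours_per_year / 1000
annual_bill = energy_twh * 1e9 * price_per_kwh

# A typical US household uses ~10,000 kWh per year (rough rule of thumb).
households = energy_twh * 1e9 / 10_000

print(f"Annual energy: {energy_twh:.1f} TWh "
      f"(~{households/1e6:.1f}M US households' worth)")
print(f"Annual electricity bill: ${annual_bill/1e9:.2f}B")
```

Roughly eight terawatt-hours a year from a single campus, the consumption of a mid-size city, is why utilities treat these projects like new cities.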

Why Getting Power Is Hard

Building a data center is relatively fast — 12 to 18 months for construction. But securing the power to run it involves a chain of physical infrastructure that cannot be accelerated easily. High-voltage transmission lines require environmental impact assessments and land acquisition. Large power transformers have 2–3 year lead times globally, and demand is surging. Grid interconnection studies and utility agreements add further delays. This is why hyperscalers are now going directly to power sources — signing contracts to restart nuclear plants, building next to power stations, and investing in on-site generation.

What Beginners Often Get Wrong

People assume that if you have the money, you can build AI infrastructure quickly. In reality, the binding constraint is not capital — it is the physical time required to build power and cooling infrastructure. A hyperscaler can order $10 billion worth of GPUs, but if the data center does not have the electrical capacity to run them, those GPUs sit idle. This is why power availability has become the most important site-selection criterion for new AI data centers.
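To quantify why stranded GPUs hurt, here's a rough sketch. The order size comes from the example above; the depreciation life and delay are assumptions:

```python
# Rough cost of GPUs sitting idle while waiting for grid power.
# Depreciation life and delay length are illustrative assumptions.

gpu_order_usd = 10e9       # the example above: a $10B GPU order
useful_life_years = 5      # assumed straight-line depreciation life
delay_months = 6           # assumed wait for grid interconnection

monthly_depreciation = gpu_order_usd / (useful_life_years * 12)
stranded_value = monthly_depreciation * delay_months

print(f"Depreciation burned per idle month: ${monthly_depreciation/1e6:.0f}M")
print(f"Value stranded by a {delay_months}-month power delay: "
      f"${stranded_value/1e9:.2f}B")
```

The GPUs lose value whether or not they compute, so every month of interconnection delay burns real money.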

5) The Cooling Transition: From Air to Liquid

As power density rises, cooling must keep pace. Essentially all of the electrical power a GPU consumes is converted to heat. If that heat is not removed, the chip throttles its performance or shuts down entirely. (A worked heat-transfer sketch follows the table below.)

| Cooling Method | How It Works | Best For | Limitation |
|---|---|---|---|
| Air Cooling | Fans push cold air over components | Standard racks (10–20 kW) | Cannot handle AI-density heat loads |
| Rear Door Heat Exchanger (RDHx) | Water coils on the rack's rear door cool exhaust air | Retrofitting existing air-cooled facilities | Bridge solution; not sufficient for the highest densities |
| Direct Liquid Cooling (DLC) | Cold plates on GPUs/CPUs with circulating coolant | High-density AI racks (40–120 kW+) | Requires new plumbing, CDUs, and secondary cooling |
| Immersion Cooling | Servers submerged in non-conductive liquid | Extreme density, future deployments | Complex maintenance, high upfront cost, early adoption |
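To see why liquid wins, here's a minimal heat-transfer sketch using Q = ṁ·c_p·ΔT. The rack power and allowed temperature rise are assumptions; the fluid properties are standard approximate values:

```python
# How much coolant a high-density rack needs: Q = m_dot * c_p * delta_T.
# Rack power and allowed temperature rise are illustrative assumptions.

rack_power_w = 120_000    # assumed 120 kW AI rack (upper end of the range)
delta_t_k = 10.0          # assumed coolant temperature rise across the rack

# Water vs. air as the heat-carrying fluid (approximate properties):
cp_water = 4186.0         # J/(kg*K)
cp_air = 1005.0           # J/(kg*K)
rho_air = 1.2             # kg/m^3 at roughly room conditions

water_kg_s = rack_power_w / (cp_water * delta_t_k)
air_kg_s = rack_power_w / (cp_air * delta_t_k)
air_m3_s = air_kg_s / rho_air

print(f"Water flow needed: {water_kg_s:.1f} kg/s (~{water_kg_s*60:.0f} L/min)")
print(f"Air flow needed:   {air_kg_s:.1f} kg/s (~{air_m3_s:.0f} m^3/s)")
```

Moving roughly 10 m³ of air per second through one rack is impractical; circulating ~170 L/min of water through cold plates is ordinary plumbing. The same physics drives every row of the table above.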

The transition from air cooling to liquid cooling is not optional — it is being driven by hardware requirements. NVIDIA's GB200 NVL72 rack system requires direct liquid cooling as a baseline specification. This means every data center deploying next-generation NVIDIA hardware must invest in coolant distribution units (CDUs), piping infrastructure, and secondary heat rejection systems. The cooling transition is structural and recurring: every new AI data center that gets built will need liquid cooling from day one.

6) The Energy Source Question

Renewables

All major hyperscalers have carbon neutrality goals. Solar and wind are growing fast but face intermittency challenges — AI workloads run 24/7 and need baseload power. Renewables alone cannot yet meet the scale and reliability demands of large AI campuses.
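A rough capacity-factor calculation illustrates the intermittency gap. The capacity factors below are typical published ranges, used here as assumptions:

```python
# Why a 24/7 load is hard to serve with solar/wind alone: capacity factors.
# Capacity factors below are rough typical values, used as assumptions.

load_gw = 1.0             # assumed constant, round-the-clock AI campus load
capacity_factor = {"solar": 0.25, "wind": 0.35, "nuclear": 0.90}

for source, cf in capacity_factor.items():
    nameplate_needed = load_gw / cf
    print(f"{source:>7}: ~{nameplate_needed:.1f} GW of nameplate capacity "
          f"to average 1 GW (before storage/transmission)")
```

And averaging 1 GW is not the same as delivering 1 GW every hour: matching a flat, round-the-clock load requires storage or firm backup on top of the oversized nameplate capacity.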

Nuclear (SMR + Restarts)

Nuclear provides carbon-free baseload power — exactly what AI data centers need. Microsoft signed a long-term power purchase agreement to support restarting a Three Mile Island reactor. Amazon is building data centers near nuclear plants. SMRs (Small Modular Reactors) are being explored as dedicated on-site power for future campuses. Nuclear is emerging as the serious long-term answer to AI power demand.

Natural Gas

Natural gas generation remains the fastest path to large-scale reliable power. Many new AI data centers are being sited near gas-fired power plants. While not carbon-neutral, gas bridges the gap until nuclear and renewables can scale sufficiently.

Grid Stress

The rapid growth in data center power demand is straining regional grids. Northern Virginia, the largest U.S. data center market, has already experienced grid capacity constraints. Utilities are pushing back on new interconnection requests. Grid congestion is becoming a real limiting factor for AI expansion.

7) Who Matters at This Layer

| Company / Segment | Role in AI Power/Cooling | What Investors Should Watch |
|---|---|---|
| Vertiv | Power management, thermal management, CDUs; core data center infrastructure | AI-related revenue growth, backlog size, liquid cooling orders |
| Schneider Electric | Power distribution, UPS, cooling, data center management software | Data center segment growth, cooling product mix shift |
| Eaton | Electrical distribution, power quality, UPS systems | Data center order growth, transformer/switchgear lead times |
| nVent | Liquid cooling solutions, rack infrastructure, thermal management | Liquid cooling revenue ramp, hyperscaler adoption |
| Utilities (Dominion, AES, NextEra) | Grid power supply to data center campuses | Data center interconnection pipeline, rate case filings, capex for grid expansion |
| Nuclear (Constellation, NuScale, Oklo) | Baseload clean power for next-generation AI campuses | PPA announcements with hyperscalers, SMR development milestones, regulatory approvals |

8) Why Investors Should Care

Power and cooling are fundamentally different from semiconductor bottlenecks. Chip-level bottlenecks (CoWoS capacity, EUV supply) are technology and manufacturing problems that can be solved with investment and engineering on 1–3 year horizons. Power bottlenecks are physical infrastructure and regulatory problems that take 5–10 years to resolve. This makes power the longest-duration constraint in the AI stack.

The Core Framework

Chip Speed ≠ Deployment Speed — Power Sets the Pace

Semiconductor technology can advance in 1–2 year cycles. Power infrastructure takes 5–10 years. This mismatch means that AI deployment speed is ultimately gated not by how fast chips improve, but by how fast electricity and cooling can be delivered to the facility. Investors must track power availability, grid capacity, cooling infrastructure investment, and energy sourcing strategies as leading indicators of AI infrastructure growth — not lagging ones.
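The mismatch can be sketched as a toy model. The growth rates and starting values below are illustrative assumptions, not forecasts:

```python
# Toy model of the timeline mismatch: chip-driven power demand compounds on
# ~2-year product cycles, while grid capacity is added slowly and linearly.
# All growth rates and starting values are assumptions, not forecasts.

demand_mw = 1000.0        # assumed initial AI power demand in a region
supply_mw = 1200.0        # assumed initial deliverable grid capacity
demand_growth = 0.35      # assumed annual demand growth from new chip gens
supply_add_mw = 150.0     # assumed MW of new capacity energized per year

for year in range(1, 9):
    demand_mw *= 1 + demand_growth
    supply_mw += supply_add_mw
    deployable = min(demand_mw, supply_mw)  # power, not demand, gates it
    print(f"Year {year}: demand {demand_mw:>6.0f} MW, "
          f"supply {supply_mw:>6.0f} MW, deployable {deployable:>6.0f} MW")
```

Once the compounding demand curve crosses the linear supply line, each further chip improvement only widens the backlog; that is the sense in which power, not silicon, sets the pace.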

9) Connecting to the Stack

Day 1–4 → Day 6

GPUs (Day 1), HBM (Day 2), interconnects (Day 3), and packaging (Day 4) all consume power and generate heat. Every improvement in chip performance increases the power and cooling demands at the facility level.

Day 5 → Day 6

AI servers and racks from Day 5 drive the extreme power density (40–120 kW/rack) that makes power and cooling the binding constraint. The server layer creates the demand; the facility layer must supply it.

Day 6 → Day 7

Power and cooling costs feed directly into training economics. Day 7 will study what drives the total cost of training frontier AI models — and electricity is one of the largest line items.
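As a preview of that equation, here's a rough sketch of the electricity line item for one training run. The GPU count, power draw, duration, PUE, and price are all assumptions:

```python
# Rough electricity cost of a single large training run.
# GPU count, power draw, duration, PUE, and price are all assumptions.

gpus = 20_000             # assumed cluster size
watts_per_gpu = 1000      # assumed per-GPU draw incl. server overhead
days = 90                 # assumed training duration
pue = 1.3                 # assumed facility PUE
price_per_kwh = 0.06      # assumed electricity price in USD

energy_kwh = gpus * watts_per_gpu / 1000 * 24 * days * pue
cost = energy_kwh * price_per_kwh

print(f"Energy consumed: {energy_kwh/1e6:.0f} GWh")
print(f"Electricity cost: ${cost/1e6:.1f}M")
```

Even under these assumptions the power bill runs to millions of dollars per run, and unlike hardware it recurs with every run.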

The Full Chain So Far

GPU designed → manufactured at TSMC (Day 4) → packaged with HBM via CoWoS (Day 4) → connected via NVLink (Day 3) → installed in AI server (Day 5) → powered and cooled by data center infrastructure (Day 6). Each layer depends on every layer before it.

10) What I Learned Today

  • AI server racks consume 5–10× more power than traditional racks (40–120 kW vs 10–20 kW), and the electrical infrastructure to deliver that power takes years to build — making power the ultimate physical bottleneck for AI scaling.
  • Cooling is transitioning from air to direct liquid cooling (DLC), driven by NVIDIA's GB200 NVL72 requiring liquid cooling as a baseline spec. This creates structural, recurring demand for cooling infrastructure with every new AI data center.
  • Power and cooling bottlenecks are fundamentally different from semiconductor bottlenecks — they are physical infrastructure and regulatory problems with 5–10 year resolution timelines, making them the longest-lasting constraint in the AI stack.

11) One Question I'm Still Thinking About

If AI power demand continues growing at the current pace, will the grid infrastructure crisis force hyperscalers toward fully on-site generation — and could nuclear SMRs eventually make data centers energy-independent?

12) What Comes Next

In Day 7, I'll study Training Economics — what actually drives the cost of training frontier AI models. Hardware utilization, networking efficiency, energy costs, and model scaling laws all converge to determine whether training a model costs $10 million or $1 billion. Power and cooling from today's study are a major component of that equation.

Continue the AI Infrastructure Study Series

This series is designed to make the AI stack easier to follow — one layer at a time, from compute and memory to networking, packaging, and system economics.

Next: Day 7 — Training Economics
Sources & Methodology: Market data sourced from TradingView, Finviz, FRED, and SEC EDGAR filings. All analysis and commentary represent the author's independent assessment and are intended for educational purposes only.
Written & reviewed by Luke, Independent Market Analyst (EverHealthAI)
