Nuclear Power

Core Takeaway

Each piece of hardware exists to relieve one LLM bottleneck. Whenever a bottleneck gets unblocked, the stock of the company that unblocks it moves.


P1-C5 (Part 1, Chapter 5). After this chapter, you will be able to derive from LLM working principles why the hardware stack is designed the way it is, without memorizing supply chain tickers.

1. The Problem: Why Can't You Train LLMs with CPUs?

You see hyperscalers planning $600-725B in combined 2026 capex (Big 4 = MSFT + GOOGL + AMZN + META, ~75% AI-related, per Yahoo/CreditSights 2025-12), much of it on GPUs, but never ask why they can't use cheaper CPUs. You see SK Hynix's stock soaring but don't know how HBM differs from regular DRAM. You see Vertiv up 200% and think it's an air-conditioning company.

How LLMs work (covered in C4) dictates every hardware requirement. Once you can derive the entire hardware stack from first principles, you no longer need to memorize 60 supply chain tickers.

2. The Solution: 4 Core LLM Requirements → 4 Hardware Categories

LLM Need	Physical Bottleneck	Solving Hardware	Key Players
Massive parallel matrix multiplication (training)	CPUs have too few cores; serial execution is slow	GPU / ASIC	NVDA · AMD · Google TPU
Fast data feeding (don't let the GPU wait)	DRAM bandwidth insufficient	HBM (high-bandwidth memory)	SK Hynix · Micron · Samsung
GPU-to-GPU communication (1000+ card clusters)	Standard Ethernet too slow	NVLink / InfiniBand / Optical modules	NVDA (Mellanox) · ANET · COHR
Cooling + stable high power	Air cooling can't handle 800W+ chips	Liquid cooling + Nuclear / Gas	VRT · CEG · VST · ETN

Every link follows the same chain: physical bottleneck → hardware that solves it → key companies. That is why the hardware stack maps directly onto a list of companies.

3. How It Works: 4 Bottlenecks Explained in Detail

3.1 GPU vs CPU: Parallel Matrix Multiplication

LLM training spends ~99% of its compute on matrix multiplication, because a neural network's layers are essentially matrices of weights.

CPU: 8-128 cores; each core handles complex tasks independently (like 100 PhDs)
GPU: 10,000+ cores; each core does simple arithmetic (like 10,000 elementary students doing addition and subtraction)
Matrix multiplication: the work splits into huge numbers of simple, independent multiply-adds, so 10,000 elementary students finish roughly 100x faster than 100 PhDs
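A minimal Python sketch of why matrix multiplication suits thousands of simple cores: every output cell is its own dot product and depends on no other cell, so all cells can be computed in parallel. The layer shape at the bottom is a hypothetical example, not a specific model.

```python
def matmul(a, b):
    """Naive matrix multiply: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    inner = len(b)
    cols = len(b[0])
    # Each (i, j) cell is an independent dot product: no cell reads another
    # cell's result, so a GPU can hand each cell (or tile of cells) to a
    # different core and run them all at once.
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def matmul_cost(m, k, n):
    """Work in an (m x k) @ (k x n) product."""
    independent_dot_products = m * n  # one per output cell
    flops = 2 * m * k * n             # one multiply + one add per (i, j, k) term
    return independent_dot_products, flops


# Hypothetical layer: 4,096 tokens through an 8,192 x 8,192 weight matrix.
tasks, flops = matmul_cost(4096, 8192, 8192)
```

For that hypothetical layer, `tasks` is 33,554,432 independent dot products and `flops` is about 5.5 × 10^11, and training repeats products like this in every layer on every step.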

**NVDA H100**: one card has 80GB of HBM, draws 700W, and costs $30K-40K. A training cluster typically has 1,024-8,192 cards.

**AMD MI300X / Google TPU / AWS Trainium**: Same concept, different implementations. The CUDA ecosystem (NVDA's 20-year moat) keeps NVDA at 80%+ of the training market.

3.2 HBM vs Regular Memory: Data Throughput

GPUs compute fast, but every operand must first be read from memory into the GPU. Regular DRAM bandwidth can't keep up, so the GPU can spend ~80% of its time waiting for data, and that idle compute is wasted.
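One way to see the waiting problem is a simplified roofline estimate: an operation takes at least max(compute time, memory time). This sketch assumes H100-class figures (~989 TFLOPS dense BF16, 3.35 TB/s HBM), a hypothetical 8-channel DDR5-4800 memory system (307.2 GB/s) for comparison, and idealized traffic where each matrix is read or written exactly once. Real kernels, caches, and overlap change the numbers.

```python
PEAK_FLOPS = 989e12   # assumed dense BF16 peak, H100-class GPU (FLOP/s)
HBM_BW = 3.35e12      # assumed HBM bandwidth, H100-class (bytes/s)
DDR5_BW = 307.2e9     # hypothetical 8-channel DDR5-4800 system (bytes/s)
BYTES_PER_VALUE = 2   # BF16


def matmul_time(m, k, n, mem_bw):
    """Roofline estimate for an (m x k) @ (k x n) matmul.

    Returns (seconds, limiting resource, fraction of time compute is busy).
    """
    flops = 2 * m * k * n
    # Idealized traffic: read A and B once, write C once.
    bytes_moved = BYTES_PER_VALUE * (m * k + k * n + m * n)
    compute_s = flops / PEAK_FLOPS
    memory_s = bytes_moved / mem_bw
    total_s = max(compute_s, memory_s)  # the slower side sets the pace
    bound = "memory" if memory_s > compute_s else "compute"
    return total_s, bound, compute_s / total_s


# Generating one token: a single row through an 8,192 x 8,192 weight.
decode = matmul_time(1, 8192, 8192, HBM_BW)
# A big training matmul on the same weight, fed by HBM vs DDR5-class memory.
train_hbm = matmul_time(8192, 8192, 8192, HBM_BW)
train_ddr5 = matmul_time(8192, 8192, 8192, DDR5_BW)
```

Under these assumptions, the single-token case is memory-bound with compute busy well under 1% of the time, the large matmul is compute-bound on HBM, and the same large matmul becomes memory-bound on DDR5-class bandwidth (compute busy roughly 85% of the time).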

HBM (High Bandwidth Memory): DRAM dies stacked vertically (3D) and placed on the same package as the GPU with a very wide interface, giving roughly 10x the bandwidth of DDR5.
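The bandwidth gap follows from a simple formula: peak bandwidth = interface width (bits) × per-pin data rate (Gb/s) ÷ 8. This sketch uses a DDR5-4800 channel (64 bits) and an HBM3 stack (1,024 bits at 6.4 Gb/s per pin); the 8-channel server socket and the 3.35 TB/s H100 total are assumed comparison points. The ratio depends on what you compare: one stack vs one channel, or a whole GPU vs a whole CPU socket.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s = width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


ddr5_channel = peak_bandwidth_gb_s(64, 4.8)   # one DDR5-4800 channel
hbm3_stack = peak_bandwidth_gb_s(1024, 6.4)   # one HBM3 stack
cpu_socket = 8 * ddr5_channel                 # assumed 8-channel server CPU
h100_gpu = 3350.0                             # assumed H100 SXM HBM total, GB/s

per_device_ratio = hbm3_stack / ddr5_channel  # one stack vs one channel
system_ratio = h100_gpu / cpu_socket          # whole GPU vs whole CPU socket
```

This gives 38.4 GB/s per DDR5 channel vs 819.2 GB/s per HBM3 stack (~21x), and ~10.9x at the system level, consistent with the ~10x figure.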

SK Hynix: Primary HBM3E supplier; supplies 70%+ of NVDA's HBM
Micron: Ramped HBM3E in 2024, gaining share
Samsung: Slow to qualify with NVDA (a triple whammy of technology, yield, and a labor strike), losing market share

→ HBM supply caps how many GPUs NVDA can ship. Monitoring HBM capacity is effectively monitoring NVDA's revenue ceiling.
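A back-of-envelope sketch of that ceiling: divide available HBM stacks by stacks per GPU. Stack counts follow the shipping configurations (H100: 80GB as 5 × 16GB HBM3 stacks; B200: 192GB as 8 × 24GB HBM3E stacks). The 10 million stack supply is purely hypothetical, the $30K price is the low end of the H100 range quoted above, and packaging and yield limits are ignored.

```python
HYPOTHETICAL_STACK_SUPPLY = 10_000_000  # made-up HBM stack supply, illustration only

STACKS_PER_GPU = {
    "H100": 5,  # 80GB = 5 x 16GB HBM3 stacks
    "B200": 8,  # 192GB = 8 x 24GB HBM3E stacks
}


def gpu_ceiling(stack_supply, stacks_per_gpu):
    """Max GPUs that can ship if HBM were the only constraint."""
    return stack_supply // stacks_per_gpu


h100_units = gpu_ceiling(HYPOTHETICAL_STACK_SUPPLY, STACKS_PER_GPU["H100"])
b200_units = gpu_ceiling(HYPOTHETICAL_STACK_SUPPLY, STACKS_PER_GPU["B200"])
h100_revenue_ceiling = h100_units * 30_000  # $30K per card, low end of quoted range
```

Note the design implication: newer GPUs carry more stacks each, so the same HBM output supports fewer units (2,000,000 H100s vs 1,250,000 B200s here).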

3.3 NVLink / InfiniBand / Optical Modules: GPU-to-GPU Communication

A frontier LLM is too large to train on a single GPU, so training is distributed across 1,000+ GPUs. Those GPUs must exchange data at high speed, for example to synchronize gradients after every training step.
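To see why link speed matters, here is a sketch of gradient synchronization with ring all-reduce, a standard collective in which each of N GPUs sends about 2(N-1)/N times the gradient size. It assumes pure data parallelism with a hypothetical 70B-parameter model (BF16 gradients, 140 GB), an assumed 450 GB/s per direction for an NVLink-class link, and 50 GB/s for a 400 Gb/s network port. Real large-scale training also shards the model and overlaps communication with compute.

```python
def ring_allreduce_bytes_per_gpu(n_gpus, payload_bytes):
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes


def sync_seconds(n_gpus, payload_bytes, link_bytes_per_s):
    """Time to move that data if the per-GPU link is the bottleneck."""
    return ring_allreduce_bytes_per_gpu(n_gpus, payload_bytes) / link_bytes_per_s


GRAD_BYTES = 70e9 * 2  # hypothetical 70B params x 2 bytes (BF16)

fast = sync_seconds(1024, GRAD_BYTES, 450e9)  # assumed NVLink-class link
slow = sync_seconds(1024, GRAD_BYTES, 50e9)   # assumed 400 Gb/s port = 50 GB/s
```

Under these assumptions one synchronization takes about 0.62 s on the fast link vs about 5.6 s on the slow one. Per-GPU traffic barely changes with N (it approaches 2× the payload), so link bandwidth, not cluster size, sets the cost.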

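NVLink: Between NVDA's own GPUs, 1.8TB/s per GPU (Blackwell)
InfiniBand: Between servers across the cluster (NVDA agreed to acquire Mellanox in 2019, closing in 2020, to secure this)
Optical modules: Transceivers for data center fiber links; per-module speeds have moved from 400G → 800G → 1.6T, with CPO (Co-Packaged Optics) as the next step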

**COHR / LITE / AAOI**: Optical modules. NVDA invested $2B strategically in COHR / LITE to lock supply. ANET: Network switching (META's primary supplier, used for east-west fabric).

→ Optical module price increases are a leading indicator of AI capex acceleration: as clusters scale up, they need extra switch tiers, so optical links per GPU rise and module demand grows faster than the GPU count.
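A sketch of how link counts can grow with cluster size, assuming a non-blocking folded-Clos (fat-tree) fabric built from switches with a hypothetical radix of 64 ports. With t switch tiers such a fabric supports up to 2 × (radix/2)^t endpoints and needs about t links per endpoint. The exact growth rate depends on topology, and real fabrics may oversubscribe or use copper for short hops, so treat this as an illustration.

```python
def tiers_needed(n_endpoints, radix):
    """Smallest number of switch tiers whose non-blocking fat-tree fits n_endpoints."""
    tiers = 1
    while 2 * (radix // 2) ** tiers < n_endpoints:
        tiers += 1
    return tiers


def fabric_links(n_endpoints, radix):
    """Approximate link count: one link per endpoint at every tier boundary."""
    return n_endpoints * tiers_needed(n_endpoints, radix)


# (GPUs, tiers, links) for a few cluster sizes with hypothetical 64-port switches.
growth = [(g, tiers_needed(g, 64), fabric_links(g, 64)) for g in (1024, 8192, 65536)]
```

With these assumptions, going from 1,024 to 8,192 GPUs (8x) adds a tier and raises links from 2,048 to 24,576 (12x): link demand jumps faster than GPU count whenever a new tier is needed.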

3.4 Liquid Cooling + Nuclear Power: Cooling + Sustained High Power

H100: 700W. Blackwell B200: ~1,000-1,200W depending on configuration. Air cooling can't remove that much heat from densely packed racks → liquid cooling is a must.
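The physics behind "air cooling can't handle it" is the heat balance Q = ṁ × c_p × ΔT. This sketch assumes one 1,200W chip, a 10 K coolant temperature rise, and textbook properties for air (1.2 kg/m³, 1005 J/kg·K) and water (1000 kg/m³, 4186 J/kg·K).

```python
def coolant_flow_m3_per_s(heat_w, density_kg_m3, cp_j_per_kg_k, delta_t_k):
    """Volumetric flow needed to carry heat_w away with a delta_t_k temperature rise."""
    mass_flow_kg_s = heat_w / (cp_j_per_kg_k * delta_t_k)  # from Q = m_dot * cp * dT
    return mass_flow_kg_s / density_kg_m3


CHIP_W = 1200   # Blackwell-class chip power
DELTA_T = 10    # assumed coolant temperature rise, kelvin

air = coolant_flow_m3_per_s(CHIP_W, 1.2, 1005, DELTA_T)
water = coolant_flow_m3_per_s(CHIP_W, 1000, 4186, DELTA_T)

air_liters_per_s = air * 1000          # about 99.5 L/s of air
water_liters_per_min = water * 60_000  # about 1.7 L/min of water
ratio = air / water                    # about 3,500x more volume for air
```

Pushing ~100 liters of air per second past every chip in a dense rack is impractical; under 2 liters of water per minute is not. That volume gap, not just the wattage, is what pushes racks to liquid cooling.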

Some Stargate-scale clusters can reach 1GW (e.g., the Stargate UAE 1GW cluster in Abu Dhabi, with a 200MW Phase 1 expected online in Q3 2026, per OpenAI 2025/05/22 + G42 2025/12). 1GW ≈ the output of one large nuclear reactor. Stargate Abilene (Texas) and other US sites have not officially disclosed their capacities, so don't assume by default that every Stargate site is 1GW.
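To connect site power to GPU counts, divide the site's power by an all-in watts-per-GPU figure. This sketch uses a 1,200W chip plus two assumptions: a hypothetical 1.5x multiplier for CPUs, networking, and other server hardware, and a PUE (Power Usage Effectiveness, total facility power ÷ IT power) of 1.2. Both are illustrative, not disclosed Stargate figures.

```python
def gpus_supported(site_watts, gpu_watts, server_overhead, pue):
    """GPUs a site can power once server and facility overheads are included."""
    # server_overhead: hypothetical multiplier for CPUs, NICs, switches per GPU
    # pue: total facility power / IT power (cooling, power conversion, etc.)
    all_in_watts = gpu_watts * server_overhead * pue
    return int(site_watts // all_in_watts)


full_site = gpus_supported(1e9, 1200, 1.5, 1.2)    # 1GW site
phase_one = gpus_supported(200e6, 1200, 1.5, 1.2)  # 200MW first phase
```

Under these assumptions a 1GW site powers roughly 460,000 GPUs and a 200MW phase roughly 92,600; change the overhead or PUE and the count moves proportionally.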

VRT (Vertiv): Leader in liquid cooling + data center electrical infrastructure
CEG (Constellation): MSFT's 20-year nuclear PPA (Three Mile Island restart)
VST (Vistra): Natural gas + nuclear
ETN (Eaton) / HUBB: Power distribution
GEV (GE Vernova): Gas turbines (backup + peak)

→ Energy is the real bottleneck for 2026 and beyond. You can buy GPUs, but you can't quickly buy new power supply (building a nuclear plant takes ~10 years). That's why CEG / VST / GEV stocks soared from 2024 onward.

4. vs C4: What You Already Know

Dimension	C4 Gives You	C5 Adds
LLM working principles	✓ (but no hardware explanation)	Uses them to derive hardware requirements
Hardware stack	✗	LLM → 4 bottlenecks → 4 hardware categories → key companies
Investment significance	Training vs inference compute	Which bottleneck, once unblocked, moves which company's stock; HBM / optical module / power data as leading indicators

C4 = Software. C5 = Hardware + Physics. Without C5, you miss the physical logic behind each link in the 60-ticker supply chain.

5. Try It: Estimate GPT-4's Electricity Usage for One Training Run

Task (10 minutes):

⚠️ Important caveat: OpenAI did NOT disclose GPT-4's model size, hardware, training compute, data, or cost (per the GPT-4 Technical Report). The numbers below are external industry estimates, not official OpenAI figures. The goal of this exercise is to practice order-of-magnitude estimation, not to cite facts. (External estimates vary widely: some sources say ~25,000 A100s over ~3 months, others ~10,000 H100s over ~6 months. Both reflect the lack of official data.)

GPT-4 training estimate (external estimate, NOT confirmed by OpenAI):
- 10,000 H100s × 700W each = 7 MW (peak)
- Train for 6 months = 4,380 hours
- Average utilization ~50%, used here as a proxy for average power draw (GPUs only; excludes servers, networking, and cooling)
- Total electricity = 7 MW × 4,380 h × 0.5 ≈ 15.3 GWh

Reference:
- 1 US household annual electricity ~10 MWh = 0.01 GWh
- 15.3 GWh ≈ 1,530 household-years
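The estimate above can be reproduced in a few lines, which makes it easy to swap assumptions. This sketch covers GPU energy only (no servers, networking, or cooling overhead). The A100 scenario uses the alternative external estimate from the caveat (~25,000 A100s for ~3 months) with an assumed 400W per A100 and the same 50% average factor.

```python
HOURS_PER_MONTH = 730           # 4,380 h / 6 months, as in the estimate above
HOUSEHOLD_GWH_PER_YEAR = 0.01   # ~10 MWh per US household per year


def training_energy_gwh(n_gpus, watts_per_gpu, months, avg_load):
    """GPU-only training energy in GWh (W x h = Wh; 1 GWh = 1e9 Wh)."""
    hours = months * HOURS_PER_MONTH
    return n_gpus * watts_per_gpu * hours * avg_load / 1e9


h100_run = training_energy_gwh(10_000, 700, 6, 0.5)  # external estimate
a100_run = training_energy_gwh(25_000, 400, 3, 0.5)  # alternative estimate
h100_household_years = h100_run / HOUSEHOLD_GWH_PER_YEAR
```

The two scenarios land at about 15.3 GWh and 11.0 GWh: different hardware assumptions, same order of magnitude, which is the point of the exercise.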

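But this is just one run. Frontier models also go through many experimental and failed runs before the final one; including those, GPT-4's total electricity is estimated at ~50 GWh ≈ 5,000 household-years (again an external estimate, not an official figure).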

Self-check (if all 3 items are met → proceed to P1-C6):

You can explain **why SK Hynix's stock moves in sync with NVDA**
You can explain why CEG (nuclear) surged 200%+ from 2024 onward
You can reason from hardware bottlenecks about which link will rally next: HBM4 (2026)? Liquid cooling penetration (2026-27)? 1.6T optical modules?

6. What's Next

You can now reverse-engineer the hardware stack from how LLMs work. Next, map the hardware stack to specific companies: which role each of the 60 tickers plays, and what each one depends on.

→ P1-C6 · Supply Chain 5 Roles + 60 Ticker Map: upgrade the existing supply chain diagram. With C1-C5 as a foundation, you're no longer learning in isolation.

7. Deep Dive (optional): CPO / NVLink vs InfiniBand / TPU Economics / Inference Hardware Divergence / HBM4


CPO (Co-Packaged Optics), 2025+: Optics move from pluggable modules to being packaged together with the switch chip. Power consumption drops ~50% and bandwidth doubles. But CPO yields are hard to achieve, so mass production is slow. Key players: TSM (packaging), AVGO (switch), Coherent (optical). → If CPO truly reaches mass production in 2026, the entire optical module paradigm shifts and existing players get reshuffled.

NVLink vs InfiniBand vs Ethernet: NVDA pushes NVLink (between its own GPUs) + InfiniBand (between servers). But the Ultra Ethernet Consortium (members include Cisco / Arista / Intel / AMD / MSFT) is jointly promoting standard Ethernet for AI fabrics. Long term, NVDA's networking advantage may be diluted.

TPU Economics (Google internal): TPU v5p performance is comparable to H100, but Google keeps it in-house: TPUs are not sold as chips, only rented through Google Cloud. This diverts 30-50% of Google's demand away from NVDA, but total compute demand is unchanged (Google consumes the same compute whether or not it buys from NVDA).

Inference Hardware Divergence: Training and inference hardware will increasingly separate. Training: massive clusters (NVDA Blackwell dominates). Inference: single cards, edge devices, and specialized inference chips (Groq / Cerebras / SambaNova / Apple NPU). NVDA Blackwell is also optimized for inference, but competitors have an opening.

HBM4 (2026): The next generation; watch SK Hynix's mass production timeline. HBM4 doubles the per-stack interface width to 2,048 bits, so bandwidth roughly doubles again. NVDA Rubin (2026 H2) uses HBM4, which is likely the starting point for the next HBM shortage cycle.

8. Further Reading (this chapter: GPU / HBM / liquid cooling / nuclear power)

All free sources, in line with the P5 no-paid-sources policy

Classic papers / primary white papers:

NVIDIA Blackwell Architecture Technical Brief — Primary design for Blackwell GPU + NVLink
NVIDIA Hopper Architecture White Paper — Primary H100 / H200 design + performance numbers
TSMC Annual Report / 20-F — Process roadmap + capex data
SK Hynix HBM3E announcement (IR) — Primary HBM3E / HBM4 specs

Wikipedia (3-10 min):

"High Bandwidth Memory" — HBM ½/¾ evolution + 3 suppliers
"NVLink" — NVDA's interconnect solution
"InfiniBand" — Data center networking protocol
"Data center" — Data center power / cooling / design
"Three Mile Island Nuclear Generating Station" — 2024 Constellation-MSFT restart of unit 1 (history + agreement background)

Videos / public lectures:

NVIDIA GTC keynotes (official YouTube) — Every 6 months, see product roadmap
TSMC Technology Symposium public talks — Annual process + advanced-packaging updates
Asianometry "How TSMC Makes Chips" — Semiconductor manufacturing explainer channel, many free videos

Company IR (quarterly reports + investor day decks):

Vertiv Investor Relations — Liquid cooling + data center power
Constellation Energy IR — Nuclear power + data center PPAs
NVIDIA IR — Data Center segment breakdown in quarterly reports is the gold
SK Hynix IR — Primary HBM supply data

Podcasts:

Acquired — TSMC — Why fab manufacturing is so expensive + 7nm/3nm process
Acquired — NVIDIA Part III — H100 / Blackwell economics

Books (library):

Chris Miller "Chip War" (2022) — Semiconductor industry macro + geopolitics
Mark Lapedus and many co-authors — Industry magazines EE Times / SemiWiki, free deep articles (semiwiki.com)

Pair with this chapter's self-check:

After reading the Blackwell technical brief, the Wikipedia "High Bandwidth Memory" article, and one Vertiv or Constellation IR deck, you should be able to explain "the 4 bottlenecks (compute / memory / interconnect / power)" and "estimate GPT-4 training electricity usage."