
AI Infrastructure Study Series

Day 3: NVLink vs InfiniBand vs Ethernet

Understanding the networking layer of AI infrastructure — and why scale-up and scale-out matter just as much as raw bandwidth.

Summary

AI infrastructure does not run on compute and memory alone. It also depends on networking. NVLink helps GPUs communicate at extremely high speed inside tightly coupled domains, InfiniBand powers specialized low-latency AI and HPC fabrics, and Ethernet is evolving into a stronger AI networking option through open standards and AI-specific optimization. The key lesson is simple: networking is not just about speed — it is about architecture.

1) Why This Matters

In AI infrastructure, faster chips and bigger memory pools are not enough if data cannot move efficiently across GPUs, servers, and racks. That is why networking becomes a core part of performance, especially as model training and inference scale out across many machines.

For investors, the important point is that networking is no longer a background utility. It increasingly shapes GPU utilization, cluster efficiency, tail latency, and total system economics.

2) One-Sentence Definitions

Network Layer | Simple Definition | Core Strength
NVLink | A high-bandwidth GPU interconnect designed to link GPUs very tightly inside scale-up domains. | GPU-to-GPU bandwidth
InfiniBand | A specialized low-latency network fabric built for AI and HPC clusters. | Low latency + RDMA
Ethernet | A widely adopted standards-based network that is increasingly being optimized for AI scale-out workloads. | Open ecosystem + scale

3) A Simple Analogy

The easiest way to understand this is to imagine transportation inside and between factories.

NVLink = the ultra-fast internal conveyor belt inside one factory

InfiniBand = the dedicated industrial freight network between factory buildings

Ethernet = the open public road system that is now being upgraded for heavy AI traffic

4) What Each Network Actually Does in AI

NVLink: The Scale-Up Fabric

NVLink matters most when GPUs need to behave like one larger compute domain. It is not just a cable. It is a way to preserve high-bandwidth GPU-to-GPU communication inside tightly coupled systems, which is especially important for model-parallel workloads.
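
To make the "one larger compute domain" idea concrete, here is a minimal sketch of what that communication looks like from the software side. It is my own illustration, not vendor documentation: it assumes a single server with NVLink-connected GPUs, PyTorch with the NCCL backend, and a launch via torchrun. NCCL normally picks the fastest path it can find between GPUs, and inside an NVLink domain that is typically the NVLink path rather than PCIe.

```python
"""Minimal sketch: gradient-style all-reduce inside one NVLink-connected node.

Assumptions (illustrative, not vendor documentation):
  - one server with multiple NVLink-connected GPUs
  - PyTorch with CUDA and NCCL installed
  - launched with: torchrun --nproc_per_node=<num_gpus> nvlink_allreduce.py
"""
import torch
import torch.distributed as dist


def main():
    # NCCL selects the fastest transport available between ranks;
    # within an NVLink/NVSwitch domain that is normally the NVLink path.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank holds a gradient-sized tensor (~256 MB of fp32 here).
    grad = torch.ones(64 * 1024 * 1024, device="cuda")

    # One collective call sums the tensor across all GPUs in the domain.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if rank == 0:
        print("all_reduce complete, first element =", grad[0].item())

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```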

InfiniBand: The Specialized AI/HPC Fabric

InfiniBand matters when distributed systems need low latency, RDMA, and advanced communication efficiency. It has long been the premium fabric for supercomputing and large AI clusters because it is designed around high-performance distributed workloads rather than general networking.
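
One way to see why latency matters, and not just bandwidth, is a simple alpha-beta cost model: the time to move a message is roughly a fixed per-message latency plus bytes divided by bandwidth. The numbers below are illustrative placeholders, not benchmarks of any specific fabric, but they show why the small, frequent synchronization messages in distributed training are dominated by the latency term, which is exactly where a low-latency RDMA fabric earns its keep, while very large transfers are dominated by bandwidth.

```python
# Back-of-the-envelope alpha-beta model of message transfer time.
# All numbers are illustrative assumptions, not measurements.

def transfer_time_us(message_bytes: float,
                     latency_us: float,
                     bandwidth_gbytes_per_s: float) -> float:
    """Time ~= fixed per-message latency + bytes / bandwidth."""
    return latency_us + message_bytes / (bandwidth_gbytes_per_s * 1e3)  # GB/s -> bytes/us

# Two hypothetical fabrics with identical bandwidth (50 GB/s usable)
# but very different per-message latency.
for name, latency_us in [("low-latency fabric", 2.0), ("higher-latency fabric", 20.0)]:
    small = transfer_time_us(8 * 1024, latency_us, 50)            # 8 KB sync/control message
    large = transfer_time_us(256 * 1024 * 1024, latency_us, 50)   # 256 MB gradient shard
    print(f"{name}: 8 KB -> {small:.1f} us, 256 MB -> {large:.0f} us")
```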

Ethernet: The Open Scale-Out Challenger

Ethernet matters because cloud-scale AI cannot ignore openness, interoperability, and operational flexibility. As Ethernet gets tuned for AI through RoCE, performance isolation, and full-stack optimization, it becomes a stronger contender in large AI clouds and distributed inference systems.
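
From the training job's point of view, an AI-tuned Ethernet fabric running RoCE looks much like InfiniBand: NCCL still uses RDMA, and the application code does not change. What changes is fabric configuration. The environment variables below are real NCCL knobs, but the interface names and values are placeholders for illustration only; a real deployment would follow the cluster's own network documentation.

```python
# Illustrative only: the same NCCL-backed training script from the NVLink
# example can run over a RoCE-capable Ethernet fabric; what changes is the
# fabric configuration, not the application code. Values are placeholders.
# These must be set before dist.init_process_group() for NCCL to see them.
import os

os.environ.setdefault("NCCL_IB_HCA", "mlx5_0")        # which RDMA-capable NIC to use (placeholder name)
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")       # GID entry used for RoCE; the right value is site-specific
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")   # interface for bootstrap / fallback sockets
# Setting NCCL_IB_DISABLE=1 would instead force plain TCP sockets, which is
# useful for debugging but gives up RDMA entirely.
```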

5) Where the Real Difference Shows Up

Scale-Up

Scale-up is about how efficiently accelerators communicate inside a tightly connected domain. This is where NVLink is strongest, because it is built to keep GPUs communicating as if they were part of one larger machine.

Scale-Out

Scale-out is about how systems communicate across many servers, racks, and even data centers. This is where InfiniBand and AI-optimized Ethernet become central, because cluster growth creates new demands around latency, congestion, isolation, and manageability.
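
A rough worked example of why scale-out traffic grows so quickly: in a ring all-reduce, each GPU sends and receives roughly 2 x (N-1)/N times the size of the data being reduced, so synchronizing gradients for a large model creates very large flows per GPU on every step. The model size and bandwidth below are assumptions chosen only to show the shape of the problem, not measurements of any real cluster.

```python
# Rough sketch of per-GPU network traffic for one gradient synchronization
# using a ring all-reduce. Model size and bandwidth are illustrative assumptions.

def ring_allreduce_bytes_per_gpu(total_bytes: float, num_gpus: int) -> float:
    """Each GPU sends (and receives) about 2 * (N - 1) / N of the reduced data."""
    return 2 * (num_gpus - 1) / num_gpus * total_bytes

grad_bytes = 70e9 * 2          # hypothetical 70B-parameter model, fp16/bf16 gradients
num_gpus = 1024                # hypothetical cluster size
per_gpu = ring_allreduce_bytes_per_gpu(grad_bytes, num_gpus)

link_bytes_per_s = 50e9        # assume ~50 GB/s of usable per-GPU network bandwidth
print(f"per-GPU traffic per step: {per_gpu / 1e9:.1f} GB")
print(f"ideal communication time: {per_gpu / link_bytes_per_s * 1e3:.0f} ms per step")
# Real systems shard optimizer state and overlap this communication with the
# backward pass, which is exactly why fabric bandwidth, congestion control,
# and tail latency become first-order design concerns at scale.
```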

This is why AI networking should not be reduced to a single winner. Different network layers solve different coordination problems inside the stack.

6) So Which One Is Better?

The better question is not which one is “best,” but which architecture fits the workload and operating model.

  • NVLink: Best when extremely tight GPU-to-GPU communication inside scale-up domains matters most.
  • InfiniBand: Best when specialized low-latency AI or HPC cluster communication is the priority.
  • Ethernet: Best when openness, interoperability, cloud-scale deployment, and standards-based expansion matter most.

7) Why Investors Should Care

AI networking is becoming a strategic layer of the stack. The winner is not necessarily the one with the single fastest link, but the one that best balances performance, software integration, cluster economics, and scale.

NVLink represents the economics of scale-up, InfiniBand represents the economics of specialized scale-out, and Ethernet represents the economics of open, cloud-scale AI infrastructure. That framework matters because AI spending is moving from isolated boxes toward full system design.

8) What I Learned Today

  • NVLink is mainly about fast GPU-to-GPU communication inside scale-up domains.
  • InfiniBand is a specialized low-latency fabric built for demanding AI and HPC clusters.
  • Ethernet is becoming a stronger AI networking competitor because open standards and AI optimization matter more at cloud scale.

9) One Question I’m Still Thinking About

If Ethernet keeps improving for AI, will the center of gravity in scale-out networking gradually shift away from specialized fabrics?

10) What Comes Next

In Day 4, I’ll move from networking to manufacturing and packaging and study Foundry, Packaging, and CoWoS. Once compute, memory, and networking are understood, the next question is how these systems are actually built and constrained in the real world.

Continue the AI Infrastructure Study Series

This series is designed to make the AI stack easier to follow — one layer at a time, from compute and memory to networking, packaging, and system economics.

Next: Day 4 — Foundry, Packaging, and CoWoS

AI Infrastructure Study Series

Day 2: HBM vs DRAM vs SSD

Understanding the memory hierarchy of AI infrastructure — and why the next bottleneck often moves from compute to memory and data movement.

Summary

AI infrastructure is not just a compute story. It is also a memory story. HBM sits closest to the accelerator and delivers extreme bandwidth, DRAM acts as the larger system memory layer, and SSD provides persistent storage for model files, checkpoints, and overflow data. As AI workloads scale, bottlenecks often move from raw compute to memory hierarchy and data movement.

1) Why This Matters

Faster chips alone do not solve the full AI problem. A model can only run efficiently if data reaches the compute engine quickly enough. That is why modern AI systems are built around a memory hierarchy rather than a single memory pool.

For investors, this matters because the next bottleneck in AI infrastructure is often not the processor itself, but the system that feeds it: high-bandwidth memory, system DRAM, storage, packaging, and the software stack that moves data across those layers.

2) One-Sentence Definitions

Memory Layer | Simple Definition | Core Strength
HBM | High-bandwidth memory placed very close to the accelerator for extremely fast data movement. | Speed + bandwidth
DRAM | The larger system memory layer used by servers and CPUs to buffer, stage, and manage data. | Capacity + flexibility
SSD | Persistent flash storage used for model files, checkpoints, datasets, and overflow tiers in AI systems. | Scale + persistence

3) A Simple Analogy

The easiest way to understand this is to imagine a work desk.

HBM = the tools on your desk that you can reach instantly

DRAM = the shelf next to your desk where you keep more materials nearby

SSD = the storage room where the larger files and long-term materials are kept

4) What Each Memory Layer Actually Does in AI

HBM: The Fastest Working Memory

HBM is designed for bandwidth-heavy AI workloads. It sits close to the GPU or accelerator and is built to feed the compute engine as quickly as possible. In large-scale training and inference, that matters because the model cannot stay efficient if memory throughput falls behind the rate of computation.
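
A quick worked example of what "feeding the compute engine" means in practice: when a model generates one token at a time, it has to stream essentially all of its weights from memory for every token, so the floor on per-token latency is roughly weight bytes divided by memory bandwidth. The model size and bandwidth figures below are illustrative assumptions, not the specs of any particular product.

```python
# Illustrative lower bound on per-token decode latency when generation is
# memory-bandwidth-bound. Numbers are assumptions, not product specs.

params = 13e9                    # hypothetical 13B-parameter model
bytes_per_param = 2              # fp16 / bf16 weights
weight_bytes = params * bytes_per_param

hbm_bandwidth = 3e12             # assume ~3 TB/s of usable HBM bandwidth
min_time_per_token_s = weight_bytes / hbm_bandwidth

print(f"weights: {weight_bytes / 1e9:.0f} GB")
print(f"bandwidth floor: {min_time_per_token_s * 1e3:.1f} ms per token "
      f"(~{1 / min_time_per_token_s:.0f} tokens/s per accelerator)")
# Batching many requests amortizes the weight reads, which is why
# throughput-oriented serving systems batch as aggressively as latency allows.
```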

DRAM: The System Buffer

DRAM is not as fast as HBM, but it is more scalable as a general-purpose server memory layer. It acts as a staging area for model loading, buffering, and system-level coordination. In practical terms, this means DRAM often carries data that is too large, too cold, or too expensive to keep in HBM all the time.
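
A minimal sketch of what that tiering looks like in code, assuming PyTorch and a CUDA GPU: data that is not needed right now can be parked in host DRAM (ideally pinned memory so later copies can overlap with compute) and pulled back into GPU memory just before it is used.

```python
# Minimal sketch of HBM <-> DRAM tiering with PyTorch (assumes a CUDA GPU).
import torch

# ~2 GiB of fp16 data living in GPU memory (HBM).
hot = torch.empty(1024, 1024, 1024, dtype=torch.float16, device="cuda")

# Park it in host DRAM. Pinned (page-locked) memory lets later copies
# run asynchronously and overlap with GPU compute.
staged = torch.empty(hot.shape, dtype=hot.dtype, device="cpu", pin_memory=True)
staged.copy_(hot)
del hot
torch.cuda.empty_cache()

# ...later, bring it back into HBM only when it is actually needed.
hot_again = staged.to("cuda", non_blocking=True)
```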

SSD: The Capacity Layer

SSD is not working memory in the same way as HBM or DRAM, but it is still essential. Model weights, checkpoints, datasets, and long-tail inference data often begin or end their lives in storage. As AI systems scale, SSD becomes part of the performance conversation because loading and moving large assets quickly is no longer optional.
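
A rough sense of scale, using assumed numbers rather than measurements: weight files for large models run into the hundreds of gigabytes, so the difference between a slow and a fast storage path translates directly into minutes of idle accelerators every time a job restarts or a model is loaded.

```python
# Back-of-the-envelope checkpoint load times. All figures are assumptions.

params = 70e9                          # hypothetical 70B-parameter model
ckpt_bytes = params * 2                # fp16 weights only (~140 GB); full training
                                       # checkpoints with optimizer state are larger

for name, read_bytes_per_s in [
    ("single SATA-class SSD", 0.5e9),
    ("single NVMe SSD", 5e9),
    ("striped NVMe / parallel filesystem", 50e9),
]:
    seconds = ckpt_bytes / read_bytes_per_s
    print(f"{name}: ~{seconds:.0f} s to read {ckpt_bytes / 1e9:.0f} GB")
```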

5) Where the Bottleneck Shows Up

Training

In training, the main challenge is feeding massive amounts of data and parameters into compute units fast enough. HBM becomes critical here because large models create enormous memory bandwidth demands, and slow movement can leave expensive accelerators underutilized.

Inference

In inference, the challenge shifts toward latency, cost, and memory tiering. As models handle longer context windows and more requests, some data must move between GPU memory, CPU memory, and storage. That makes the memory hierarchy itself part of the inference architecture.
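
The key-value (KV) cache that serving systems keep for each active request is a concrete example of why tiering matters. The sketch below uses a hypothetical 7B-class model configuration (my assumption, chosen only for illustration) to show how quickly long contexts and many concurrent requests eat through GPU memory, forcing some of that state down into CPU memory or storage.

```python
# KV-cache sizing for a hypothetical 7B-class decoder model. Illustrative only.

n_layers = 32
n_kv_heads = 32
head_dim = 128
bytes_per_elem = 2               # fp16 / bf16

# Per token, every layer stores one K and one V vector per KV head.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

seq_len = 32_000                 # long-context request
concurrent_requests = 64

per_request_gb = kv_bytes_per_token * seq_len / 1e9
total_gb = per_request_gb * concurrent_requests

print(f"KV cache: {kv_bytes_per_token / 1024:.0f} KiB per token")
print(f"{per_request_gb:.1f} GB per 32k-token request, "
      f"{total_gb:.0f} GB for {concurrent_requests} concurrent requests")
```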

This is why the phrase “memory bottleneck” matters so much in AI. Compute can improve, but if memory and data movement do not improve with it, system efficiency breaks down.

6) So Which One Is Better?

The better question is not which memory is “best,” but which layer is right for the job.

  • HBM: Best when extreme bandwidth and proximity to the accelerator matter most.
  • DRAM: Best when the system needs a larger, more flexible working layer.
  • SSD: Best when persistence, scale, and lower-cost capacity matter more than raw speed.

7) Why Investors Should Care

AI is not just a race for faster chips. It is also a race to solve the memory hierarchy. That means value can accrue not only to accelerator vendors, but also to memory suppliers, storage providers, advanced packaging players, and system software companies that make tiered memory practical at scale.

If Day 1 explains who performs the work, Day 2 explains what makes that work sustainable. The faster the model, the more valuable efficient memory and data movement become.

8) What I Learned Today

  • HBM is the fastest and most bandwidth-focused memory layer in modern AI systems.
  • DRAM acts as the larger system memory layer that supports staging, buffering, and coordination.
  • SSD is not just cheap storage — it increasingly matters for model loading, persistence, and overflow data in large AI systems.

9) One Question I’m Still Thinking About

If compute keeps improving faster than memory economics, does the real AI bottleneck eventually shift from chips to data movement and storage architecture?

10) What Comes Next

In Day 3, I’ll move from memory to networking and study NVLink vs InfiniBand vs Ethernet. Once data moves inside a chip, the next question is how it moves across many chips and many servers.

Continue the AI Infrastructure Study Series

This series is designed to make the AI stack easier to follow — one layer at a time, from compute and memory to networking, packaging, and inference economics.

Next: Day 3 — NVLink vs InfiniBand vs Ethernet

AI Infrastructure Study Series

Day 1: GPU vs ASIC vs CPU

Understanding the compute layer of AI infrastructure — and why investors should not look at GPUs alone.

Summary

AI infrastructure starts with compute, but not all chips play the same role. GPUs are the most flexible and dominant accelerators for large-scale AI workloads, ASICs are purpose-built chips designed for specific tasks with stronger efficiency in narrow use cases, and CPUs remain the control layer that keeps the overall system running. The real lesson for investors is simple: AI is not a one-chip story. It is a stack.

1) Why This Matters

Many people think AI infrastructure begins and ends with Nvidia. That is understandable, because GPUs sit at the center of today’s AI boom. But to really understand where value is created, it helps to step back and look at the broader compute layer. GPUs, ASICs, and CPUs each play different roles, and the balance between them shapes cost, performance, and long-term competitive advantage.

This is why Day 1 starts here. Before studying memory, networking, packaging, or inference economics, it is important to understand the basic job of each chip.

2) One-Sentence Definitions

Chip | Simple Definition | Core Strength
GPU | A highly parallel processor built to accelerate massive computations, especially in AI training and inference. | Flexibility + scale
ASIC | A purpose-built chip optimized for a specific workload or model type. | Efficiency in narrow tasks
CPU | A general-purpose processor that manages the system and supports broader computing tasks around AI workloads. | Control + orchestration

3) A Simple Analogy

The easiest way to understand this is to imagine a factory.

CPU = the factory manager

GPU = the large, flexible production line

ASIC = the specialized machine built to do one task extremely well

4) What Each Chip Actually Does in AI

GPU: The Main Workhorse

In today’s AI market, the GPU is the dominant general-purpose accelerator. It is powerful enough for large-scale training and still flexible enough for a wide range of inference workloads. That flexibility matters. When models change quickly, or when developers want one common platform across many applications, GPUs are usually the default choice.

ASIC: The Specialized Competitor

ASICs matter because the largest cloud companies do not want to rely forever on the same economics as everyone else. If a hyperscaler can design a chip for a narrower workload and run that workload more efficiently, it can lower cost and improve internal control. That is where products like TPU, Trainium, and Inferentia become important.

CPU: Still the System Coordinator

The CPU is often underestimated in AI discussions. But AI servers are not just accelerators plugged into empty boxes. Someone still has to manage data movement, system control, orchestration, scheduling, and many surrounding software tasks. In that sense, the CPU remains the coordinator of the broader machine.

5) Training vs Inference

Training

Training is the process of teaching a model. The system repeatedly processes data, compares predictions with targets, and updates model weights. This usually demands the highest raw compute power and is where GPUs have become especially dominant.
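
A common rule of thumb (an approximation, not an exact law) is that training a dense transformer takes roughly 6 floating-point operations per parameter per training token. That makes it easy to sanity-check why training rewards raw compute: even with thousands of accelerators, a large run comes out in days to weeks of wall-clock time. All inputs in the sketch below are illustrative assumptions.

```python
# Rough training-compute estimate using the common ~6 * params * tokens rule
# of thumb. All inputs are illustrative assumptions.

params = 70e9             # hypothetical 70B-parameter dense model
tokens = 2e12             # 2 trillion training tokens
total_flops = 6 * params * tokens

num_gpus = 4096
peak_flops_per_gpu = 1e15 # assume ~1 PFLOP/s peak in low precision
utilization = 0.4         # end-to-end utilization is well below peak in practice

seconds = total_flops / (num_gpus * peak_flops_per_gpu * utilization)
print(f"total compute: {total_flops:.2e} FLOPs")
print(f"wall-clock estimate: {seconds / 86400:.0f} days on {num_gpus} accelerators")
```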

Inference

Inference is the process of using a trained model to answer new inputs. Here, cost efficiency, latency, and throughput become more important. This is where GPUs still matter, but ASICs and CPUs can play a larger role depending on the use case.

This difference is important because not every AI dollar goes to the same part of the stack. Training rewards raw compute leadership. Inference often rewards efficiency, system design, and cost control.

6) So Which One Is Better?

The wrong answer is to say one chip is simply “the best.” The better answer is that each one wins under different conditions.

  • GPU: Best when flexibility, ecosystem, and broad workload support matter.
  • ASIC: Best when scale and specialization justify building around a narrow workload.
  • CPU: Best when general-purpose control, orchestration, and support functions are the priority.

7) Why Investors Should Care

The real investment takeaway is that AI infrastructure should not be viewed as a single-product story. GPUs capture the broadest demand, but ASICs reveal how hyperscalers try to improve their own economics, and CPUs remain essential because AI systems still need a host layer to function efficiently.

In other words, AI is not just about who sells the fastest chip. It is also about who controls the platform, who captures the system economics, and where the bottleneck moves next.

8) What I Learned Today

  • GPU is the most flexible and widely used AI accelerator today.
  • ASIC is not automatically better than GPU — it becomes attractive when specialization and efficiency matter more.
  • CPU is still critical because AI infrastructure is a full system, not just a box full of accelerators.

9) One Question I’m Still Thinking About

If GPUs are so dominant, why are hyperscalers still spending heavily to build their own ASICs?

10) What Comes Next

In Day 2, I’ll move from compute to memory and study HBM vs DRAM vs SSD. That is where the story gets even more interesting, because raw compute power means less if data cannot move fast enough.

Follow the AI Infrastructure Study Series

I’m documenting this series to better understand how the AI stack really works — from compute and memory to networking, packaging, and inference economics.

Next: Day 2 — HBM vs DRAM vs SSD