The 54% Tax: The GPU Pricing Gap Across Today's Workhorse AI Silicon

June 30, 2026

AIForge Works Research

Tags: Pricing, GPU, Neocloud, Silicon Arbitrage

Last updated June 29, 2026. AIForge Works compares published GPU prices across nine major clouds every week. The current read: the H100 80GB SXM costs 53.7% less on a neocloud than on a hyperscaler, and across the four workhorse GPU families the average neocloud discount is 49.3%. Below: where that gap lives, why it persists, and what it's worth at scale. (Full method and exclusions at the end.)

The gap is real — and you may be paying it on the silicon you're actually running today

The headline as of June 29, 2026: across the four workhorse families (H100 80GB SXM, A100 80GB, L40S 48GB, and H200 141GB), neoclouds undercut hyperscalers by 49.3% on average. The comparison is like-for-like: it matches GPU, memory, interconnect class, and region, takes each provider's lowest published price, and compares the average of those floors on each side.
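To make the averaging concrete, here is a minimal sketch of how one family's gap can be computed, following the methodology at the end: take each provider's lowest price for a matched GPU, average those floors on each side, and express the neocloud average as a discount. The offer-row layout, the `gpu_key` string, and the prices in the usage note are hypothetical; this is an illustration, not AIForge Works' pipeline.

```python
from statistics import mean

# Hypothetical offer rows: (provider, gpu_key, region, usd_per_gpu_hour).
# gpu_key stands in for the like-for-like match on GPU type, memory,
# and interconnect class (e.g. "H100-80GB-SXM").
HYPERSCALERS = {"AWS", "Azure", "GCP", "OCI"}
NEOCLOUDS = {"CoreWeave", "Lambda", "Vultr", "Nebius", "Crusoe"}


def provider_floors(offers, gpu_key):
    """Lowest published $/GPU-hr per provider for one matched GPU key."""
    floors = {}
    for provider, key, _region, price in offers:
        if key != gpu_key:
            continue
        floors[provider] = min(price, floors.get(provider, price))
    return floors


def family_gap(offers, gpu_key):
    """Return (avg hyperscaler floor, avg neocloud floor, neocloud discount)."""
    floors = provider_floors(offers, gpu_key)
    hyper = mean(p for name, p in floors.items() if name in HYPERSCALERS)
    neo = mean(p for name, p in floors.items() if name in NEOCLOUDS)
    return hyper, neo, 1 - neo / hyper
```

With toy floors of $9 and $11 on two hyperscalers and $4 and $5 on two neoclouds, this returns side averages of $10.00 and $4.50 and a 55% discount. Taking the per-provider minimum first means a provider with many listings counts once, at its cheapest region.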

These are the four families doing the work in production AI today: H100 as the high-volume installed base for production training and inference — the chip most production AI was built on; H200 as the upgrade path for new training runs and large inference workloads where its larger HBM3e memory matters; A100 as the cost-anchored production base for smaller training and inference pipelines that haven't been retired; and L40S for inference, fine-tuning, and mixed workloads where its lower per-GPU cost beats the larger H-series chips. (Frontier-scale new training runs have largely moved to NVIDIA Blackwell — B200, GB200, GB300 — covered in the companion post; H100 remains the most widely deployed GPU and the largest pool of optimizable spend.)

The trajectory across the spring: 45.1% on April 26 → 54.3% on June 1 → 49.7% on June 22 → 49.3% on June 29. The narrowing since June 1 reflects real movement on the neocloud side: Crusoe raised L40S prices roughly 50% and A100 prices roughly 21% in mid-June. The part worth watching is the hyperscaler side, which is not moving toward neocloud floors on these four families, and the structural reasons for the gap aren't changing.

This isn't a cherry-picked comparison. It's drawn from the full published price lists of all nine clouds — more than 5.4 million provider price points as of June 29 — narrowed to the roughly 111,000 comparable instance offers that can be matched GPU-for-GPU across AWS, Azure, GCP, OCI, CoreWeave, Lambda, Vultr, Nebius, and Crusoe.

A cheap price you can't actually buy isn't a deal

A neocloud floor at half the hyperscaler price means nothing if that provider doesn't offer the GPU you need — in a region you can deploy to. And availability is wildly uneven: some providers list a given GPU dozens of ways across dozens of locations; others don't carry it at all.

That's the half of the procurement question a price-only comparison misses. AIForge Works tracks both halves — price and availability — across all nine clouds and refreshes them every week, so you can see them side by side instead of opening nine separate pricing pages and stitching them together by hand.

Here's the availability picture for the four workhorse families, as of June 29, 2026. Each cell shows two numbers: how many distinct ways you can buy that GPU on that provider, and how many regions it's offered in.

Provider	H100 80GB SXM (SKUs / Regions)	H200 141GB (SKUs / Regions)	A100 80GB (SKUs / Regions)	L40S 48GB (SKUs / Regions)
AWS	22 / 13	1 / 1	6 / 6	95 / 12
Azure	140 / 30	32 / 31	147 / 34	—
GCP	49 / 36	4 / 6	39 / 36	—
OCI	52 / 26	26 / 26	26 / 26	26 / 26
CoreWeave	2 / 23	2 / 23	3 / 23	1 / 23
Lambda	4 / 14	—	1 / 14	—
Vultr	1 / 0	—	2 / 2	3 / 1
Nebius	2 / 1	2 / 4	—	12 / 1
Crusoe	1 / 3	1 / 1	5 / 2	5 / 2

How to read it. The first number is how many distinct purchasable options a provider lists for that GPU — a single-GPU instance, an 8-GPU server, and different networking or interconnect setups each count separately. The second is how many regions (physical locations) it's offered in. A dash ("—") means the provider doesn't offer that GPU at all. A "0" in the region slot (Vultr H100 SXM) means the GPU is listed but not yet tied to a specific named region. CoreWeave shows the same region count across every family because it offers its whole lineup uniformly across its locations rather than region by region. Lambda's H200-class capacity is GH200 96GB — real hardware, but outside the strict H200 141GB definition used here — so it shows as a dash. These are the same four families behind the 49.3% figure above.

What a buyer should take from it. The widest price gaps are on H100 and H200 — but notice where they're actually available. Azure and OCI carry every family broadly; AWS lists H200 just one way, in one region; Nebius doesn't carry A100 at all; L40S only exists on a handful of providers. A neocloud's cheap floor is only a real option when it lines up with the exact GPU and region your workload needs. Checking that across nine clouds, by hand, every time prices move, is the work — and it's exactly the work AIForge Works does for you in one place. Cloud Advisor narrows it further, to the options that actually fit your workload and region.
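The availability point amounts to a filter that runs before the price comparison: a floor only counts if the GPU is offered in a region the workload can use. A minimal sketch, with a hypothetical `Offer` record and made-up provider and region names:

```python
from typing import NamedTuple, Optional


class Offer(NamedTuple):
    provider: str
    gpu: str                 # like-for-like GPU key, e.g. "H200-141GB-SXM"
    region: str
    usd_per_gpu_hour: float


def cheapest_in_region(offers, gpu, allowed_regions) -> Optional[Offer]:
    """Cheapest offer for `gpu` in a region the workload can deploy to.

    Returns None when no provider lists that GPU in any allowed region,
    the case a price-only comparison silently misses.
    """
    candidates = [
        o for o in offers if o.gpu == gpu and o.region in allowed_regions
    ]
    return min(candidates, key=lambda o: o.usd_per_gpu_hour, default=None)
```

Filtering by region first means a cheap floor in the wrong location never wins, and returning None makes "not available where you need it" an explicit outcome rather than a silent fallback to another region.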

Where the delta lives — by family

The gap isn't uniform across the four families. Here is how each one breaks down in the June 29 data:

H100 80GB SXM — the highest-volume installed base

As of June 29, the average hyperscaler floor was $8.41/GPU-hr (across the four hyperscalers) versus $3.90 on the neocloud side (across the five neoclouds), a 53.7% gap on this one GPU. For a 100-GPU cluster running 24×7×365, that's roughly $4.0M a year on this single GPU type alone. H100 SXM is where the most deployed dollars sit today, and where the percentage gap is widest. Large-scale new training has shifted to Blackwell, but the H100 installed base, and the spend that can be optimized against it, remains the largest of any family here.
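The $4.0M figure follows directly from the two published floors. A quick check, assuming every GPU runs all 8,760 hours of a non-leap year (the 24×7×365 in the text); the helper names are illustrative:

```python
HOURS_PER_YEAR = 24 * 365  # 24x7x365, matching the assumption in the text


def annual_gap_usd(hyper_rate, neo_rate, gpus=100, hours=HOURS_PER_YEAR):
    """Yearly dollar difference between two $/GPU-hr rates for a fleet."""
    return (hyper_rate - neo_rate) * gpus * hours


def pct_discount(hyper_rate, neo_rate):
    """Neocloud discount as a fraction of the hyperscaler rate."""
    return 1 - neo_rate / hyper_rate


h100_annual = annual_gap_usd(8.41, 3.90)  # 100 GPUs, full year
h100_pct = pct_discount(8.41, 3.90)       # computed from cent-rounded floors
```

The result is about $3.95M, which rounds to the $4.0M above. Recomputing the discount from the cent-rounded floors gives about 53.6%; the small difference from the published 53.7% presumably comes from rounding the underlying floors.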

Availability skews hard by provider: Azure lists 140 H100 SXM options across 30 regions, GCP 49 across 36, OCI 52 across 26, AWS 22 across 13. Among neoclouds, Lambda has the widest geographic reach, with H100 SXM in 14 regions. When the workload needs a specific region, both the price and the number of options available in that region matter — which is the kind of two-dimensional pick Cloud Advisor is built to make.

H200 141GB SXM — the new-training upgrade path

H200 carries 141GB of HBM3e memory, 76% more than the H100's 80GB, and roughly 1.4× the memory bandwidth (4.8 TB/s vs 3.35 TB/s on H100 SXM). It has become the upgrade path for new training runs and large inference workloads where the H100's smaller memory was a bottleneck. The gap on H200 (53.5%) sits just below H100 SXM (53.7%) in the June 29 read, but in absolute dollars H200 is the bigger line: $9.46/GPU-hr on the hyperscaler side versus $4.40 on the neocloud side, roughly a $4.4M annual difference on a 100-GPU deployment run 24×7×365. Hyperscalers are slow to reprice as new variants land; neoclouds move aggressively to win the frontier inference workloads the larger memory footprint enables.
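The H200 figures check out the same way, from the two rounded floors and a 100-GPU fleet at 8,760 hours per year:

```latex
\text{gap} = 1 - \frac{4.40}{9.46} = \frac{5.06}{9.46} \approx 53.5\%
\qquad
\Delta_{\text{annual}} = (9.46 - 4.40)\,\tfrac{\$}{\text{GPU-hr}} \times 100\ \text{GPUs} \times 8{,}760\ \tfrac{\text{hr}}{\text{yr}} \approx \$4.43\text{M}
```

The percentage gap is slightly below H100's, but the per-hour dollar gap ($5.06 vs $4.51 on H100 SXM) is larger, which is why H200 is the bigger absolute line.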

H200 availability concentrates in Azure (32 options across 31 regions) and OCI (26 across 26); GCP lists 4 across 6 regions and AWS just 1, in a single region, as of June 29. Neoclouds are thinner under the strict H200 141GB definition: CoreWeave and Nebius list two options each, Crusoe one — and Lambda and Vultr don't publish a strict H200 141GB row in the current week (Lambda's H200-class capacity is GH200 96GB, which sits outside the strict H200 141GB definition). On H200, where you can get the neocloud price is as much the question as the price itself.

A100 80GB — the cost-anchored production base

A100 80GB is still a meaningful share of deployed AI capacity, especially for inference and for training pipelines that haven't migrated to newer H-series chips. The gap on June 29 is 34.3% — narrower than H100/H200 because A100 is older silicon and hyperscalers have had more time to compete. But narrower isn't closed: 34.3% on a 100-GPU A100 footprint is still seven figures a year, on a workload that often hasn't been re-evaluated in two years. That stale-but-expensive footprint is exactly what GPU Audit is designed to surface.
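The A100 floors aren't listed here, but the seven-figure claim can be checked in reverse. Annual savings equal the hyperscaler floor times the 34.3% gap times 100 GPUs times 8,760 hours, so solving for the floor gives a break-even point. A minimal sketch (the function name is illustrative):

```python
HOURS_PER_YEAR = 24 * 365  # full-year, 24x7 utilization as in the text


def min_hyper_rate_for_annual_gap(target_usd, pct_gap, gpus=100,
                                  hours=HOURS_PER_YEAR):
    """Smallest hyperscaler $/GPU-hr at which `pct_gap` saves `target_usd`/yr.

    Savings = hyper_rate * pct_gap * gpus * hours, solved for hyper_rate.
    """
    return target_usd / (pct_gap * gpus * hours)


threshold = min_hyper_rate_for_annual_gap(1_000_000, 0.343)
```

That works out to roughly $3.33/GPU-hr: the claim holds whenever the average hyperscaler A100 80GB floor is at or above that level, and a lower floor would put the same 34.3% gap below $1M a year.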

Availability is broadest on Azure (147 options across 34 regions) and GCP (39 across 36), reflecting A100's long deployment history on the hyperscalers; OCI lists 26 across 26, AWS 6 across 6. Among neoclouds, Crusoe leads on options at 5 across 2 regions, CoreWeave 3 across 23, Vultr 2 across 2, and Lambda 1 option now spanning 14 regions — the widest neocloud A100 geographic footprint. Nebius doesn't offer A100 80GB at all as of June 29.

L40S 48GB — the right tool when it's the right tool

L40S is where the comparison gets thinnest: only two hyperscalers list it publicly, and only three neocloud prices enter the like-for-like comparison, so we hold the comparison to that real footprint rather than padding it with assumptions. The gap on June 29 is 41.3%. The L40S decision is workload-shaped: for inference, fine-tuning, and mixed workloads where its lower per-GPU cost is the right fit, that 41.3% is real spending exposure. For large-scale training or memory-intensive inference that needs the H-series chips' larger HBM memory, L40S isn't the right tool.

AWS lists the most L40S options (95 across 12 regions), reflecting the many instance configurations it sells the GPU in; OCI lists 26 across 26. Azure, GCP, and Lambda don't offer it at all. Among neoclouds, Nebius has the most options (12, in Helsinki), followed by Crusoe and Vultr at 5 and 3, and CoreWeave at 1.

Why the gap holds

Three structural reasons keep the gap open across these four families:

First, hyperscalers bundle GPU pricing with ecosystem lock-in: networking, storage, IAM, managed services, the entire surrounding stack. The GPU is priced as one part of the broader platform spend rather than as a standalone product, so there is little pressure to match a GPU-only competitor. Neoclouds have no such ecosystem to lean on, so they compete on the GPU price directly.

Second, commitment structures aren't comparable. Hyperscaler reserved instances require 1–3 year lock-ins for meaningful discounts. Neoclouds often offer comparable rates on much shorter commitments — sometimes on-demand — because their cost structures are leaner.

Third, generation lag runs both ways. On older silicon (A100), hyperscalers are slow to cut prices as the silicon ages. On newer silicon (H200), they anchor pricing high at launch and adjust slowly. Neoclouds move faster in both directions because their entire business depends on staying current on the GPU itself.

None of these are bugs. They're the natural consequence of two structurally different businesses. But they mean the ~49% gap doesn't close on its own — it closes only when buyers force the comparison.

What about Blackwell and AMD?

A reasonable next question: does the pattern extend to Blackwell (B200, GB200, B300, GB300) and AMD Instinct (MI300X, MI325X, MI355X)? AIForge Works tracks these too, and directionally the dynamics look similar — hyperscalers anchor high on new silicon, neoclouds compete hard to capture the frontier. But comparing them at the same rigor as the four workhorse families takes a more careful like-for-like treatment than this post covers, so they're not part of the 49.3% figure. They're worth comparing on their own terms — the Multi-Cloud Pricing Analyzer and Cloud Advisor surface both when the comparison matters for a specific decision.

For what's actually listed across Blackwell and AMD Instinct, and the three structural reasons published prices for frontier silicon diverge from what's actually deployed — including what the June 22 SpaceX–Reflection $6.3B GB300 deal tells you about how Blackwell capacity really moves, and why open-market leverage on GB300 is set to tighten as July 2026 begins — see our companion piece, "Why the Blackwell Public Catalog Runs Thin (And Why It's About to Get Thinner!)".

The catch — the comparison only works if it's truly like-for-like

The 49.3% holds only once you're comparing the same thing on both sides. Different clouds package GPU instances differently: some include NVLink, some charge extra for high-bandwidth networking, some bundle storage in ways that change the effective hourly rate. The floors in this post match GPU type, memory, and interconnect class across providers, with the same region definitions on both sides of the comparison.
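Normalizing packaging differences is mostly arithmetic: put every listing on a per-GPU-hour basis, add any separately billed networking, and back out any bundled storage. A minimal sketch; the `InstanceListing` fields are hypothetical, real price sheets differ by provider, and valuing bundled storage requires choosing a reference price yourself.

```python
from dataclasses import dataclass


@dataclass
class InstanceListing:
    """Hypothetical listing shape for illustration only."""
    hourly_usd: float                 # published instance price per hour
    gpus: int                         # GPUs in the instance (e.g. 1 or 8)
    network_addon_usd: float = 0.0    # hourly charge for fast interconnect, if billed separately
    bundled_storage_usd: float = 0.0  # hourly value of storage folded into the price, if any


def effective_gpu_hour(listing: InstanceListing) -> float:
    """Per-GPU-hour rate on a common basis: add separately billed
    networking, remove bundled storage, then divide by GPU count."""
    total = listing.hourly_usd + listing.network_addon_usd
    total -= listing.bundled_storage_usd
    return total / listing.gpus
```

In a toy case, an $80 eight-GPU instance, a $72 instance with an $8 networking add-on, and an $84 instance carrying $4 of bundled storage all normalize to $10.00/GPU-hr, even though their sticker prices differ.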

That matching is exactly what AIForge Works does for you (same GPU family, same memory, same interconnect class, same regions), and then it shows what each provider actually charges. From there, GPU Audit breaks overspend into four cost leaks that compound with one another, starting with the headline rate gap itself:

Price Gap vs Floor — your current rate versus the best comparable price across all nine clouds
Silicon Penalty — value lost paying SXM-tier prices for PCIe-tier performance
Egress — data-transfer overhead that inflates your effective hourly rate
Idle Waste — provisioned capacity your workload isn't using

Any one of these can move a six-figure budget line. Compounded, they routinely turn a $2.1M annual GPU bill into a $3.5M one.
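"Compounded" matters here: the leaks multiply rather than add. A minimal sketch of one way to model that, assuming each leak acts as an independent markup and idle waste scales cost by 1/utilization. The model and the parameter values are illustrative assumptions, not GPU Audit's actual method or any measured breakdown.

```python
from math import prod


def compounded_bill(baseline_usd, price_gap, silicon_penalty,
                    egress_overhead, utilization):
    """Annual bill after four multiplicative cost leaks (illustrative model).

    baseline_usd: cost of the useful GPU-hours at the comparable floor.
    price_gap, silicon_penalty, egress_overhead: fractional markups (0.15 = 15%).
    utilization: fraction of provisioned GPU-hours actually used, so idle
    waste means paying for 1 / utilization hours per useful hour.
    """
    markups = prod(1 + m for m in (price_gap, silicon_penalty, egress_overhead))
    return baseline_usd * markups / utilization


bill = compounded_bill(2_100_000, 0.15, 0.10, 0.05, 0.80)
```

With hypothetical leaks of 15% (rate gap), 10% (silicon penalty), 5% (egress), and 80% utilization, a $2.1M baseline comes out near $3.49M. Adding the same four effects instead of multiplying them would give a 55% uplift, or about $3.26M.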

What this means for procurement teams

If your team is evaluating GPU capacity for training or inference on the four workhorse families, the arbitrage opportunity is bigger now than it was a quarter ago. The 49.3% is the most visible part of the gap — but it's not the only part, and it sits on top of the four cost leaks underneath.

Three actions, in order:

Make it apples-to-apples before you negotiate. Match GPU family, memory, interconnect class, and region; the 49.3% gap holds only when the comparison is like-for-like on all four. The Multi-Cloud Pricing Analyzer does this matching across all nine clouds in one view.
Run the comparison across all four families, not just the one you started with. H100/H200 is where deployed dollars compound fastest today on production training and frontier inference; A100 is where the legacy base often hasn't been re-evaluated; L40S is where workload fit decides whether the gap even applies.
Audit your existing footprint for the four cost leaks above. The headline rate gap is rarely the only thing leaking: the silicon penalty and egress overhead usually add to it rather than offset it. That's the audit GPU Audit runs.

The 49.3% gap won't last forever. But right now, teams that aren't at least evaluating neocloud options on the four workhorse families are leaving significant budget on the table — and the structural reasons keeping the gap open aren't changing on the timeline of a procurement cycle.

Methodology

These are public, list (on-demand) prices as published by each provider, compared as of June 29, 2026 — not live quotes. Verify current pricing, terms, fees, and discounts directly with the provider before acting.

We compare floors like-for-like: each family matches GPU type, memory, and interconnect class across providers (H100 SXM against H100 SXM, not PCIe; A100 strictly 80GB; L40S held to its real footprint of two hyperscaler and three neocloud prices), using the same region definitions on both sides. Hyperscaler floors use standard Linux on-demand rates only; spot, reserved and committed-term contracts, license-tier meters (Windows tiers, OCI AI Enterprise, GCP Dynamic Workload Scheduler), and zero-price rows are excluded. The aggregate discount is the average of the hyperscaler floors versus the average of the neocloud floors across the four families, using each provider's lowest available regions.
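The exclusion rules and the aggregate can be sketched in a few lines. The row fields (`os`, `pricing`, `license_tier`, `usd`) are hypothetical, and the aggregate follows one plausible reading of the sentence above (compare the cross-family average of hyperscaler floors with the cross-family average of neocloud floors); it is not AIForge Works' code.

```python
from statistics import mean


def is_comparable_hyperscaler_row(row):
    """Apply the stated exclusions to one hypothetical hyperscaler price row.

    Keeps standard Linux on-demand rows only; drops spot, reserved and
    committed-term rows, license-tier meters, and zero-price rows.
    """
    return (
        row["os"] == "linux"
        and row["pricing"] == "on-demand"  # excludes spot/reserved/committed
        and not row.get("license_tier", False)
        and row["usd"] > 0
    )


def aggregate_discount(family_floors):
    """Cross-family discount from per-family side averages.

    family_floors maps family -> (avg hyperscaler floor, avg neocloud floor).
    Average each side across families, then express the neocloud average
    as a discount to the hyperscaler average.
    """
    hyper = mean(h for h, _ in family_floors.values())
    neo = mean(n for _, n in family_floors.values())
    return 1 - neo / hyper
```

Because this compares averages of prices rather than averaging the four percentages, higher-priced families carry more weight: two toy families at $10.00 vs $5.00 and $2.00 vs $1.80 yield about 43.3%, while the simple mean of their 50% and 10% gaps is 30%. That weighting would be consistent with the headline 49.3% sitting above the 45.7% simple mean of the four family gaps in this post, though the A100 and L40S floors needed to confirm it aren't published here.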

The availability table counts distinct purchasable instance options and the number of regions each is offered in. CoreWeave is shown across all 23 of its locations because it offers its lineup uniformly rather than region by region; a "0" region count (Vultr) means the GPU is listed but not yet tied to a named region. Comparisons cover nine providers — AWS, Azure, GCP, OCI, CoreWeave, Lambda, Vultr, Nebius, Crusoe — across the four workhorse families (H100 80GB SXM, A100 80GB, L40S 48GB, H200 141GB). Prices are refreshed weekly and checked against each provider's own published pricing. NVIDIA Blackwell (B200, GB200, B300, GB300) and AMD Instinct (MI300X, MI325X, MI355X) are covered in a separate companion post and are not part of the four-family figure.