Sequel to AI Data Center's Split — measuring the real depth of the agent economy through three gauges: tokens, traffic, ARR
HHaelangdal·Founder AnalystMay 16, 202618 min readAgentic AI, Penetration reverse-engineering, Token throughput, ARR crossover, Web traffic, ChatGPT, Claude, Claude Code, Anthropic, OpenAI, TPU, Broadcom, HBM, AI data center split, Haelangdal perspectives
Reader's Brief — 30-second TL;DR
Intermediate
Thesis
Agents already drive 36% of LLM industry revenue while holding under 5% of traffic. Revenue follows tokens, not user count.
Signal
Anthropic ARR jumped 3.3x from $9B to $30B in four months while web traffic moved only from 2.2% to 6.0%. Claude Code alone hit $1B ARR in six months and $2.5B in 14.
Risk
If token-price declines outrun usage multiplication, unit economics break and breakeven slips from 2028 to 2030. The core risks are major-cloud in-house models and Chinese open-source coding agents reaching parity.
Action
Position the TPU value chain (Alphabet, Broadcom, Amazon's Bedrock plus $8B direct investment), memory HBM (SK Hynix, Samsung, Micron), and OpenAI exposure via Microsoft and Oracle. Validate first via quarterly token-throughput disclosure, then via the HBM order cycle.
Three gauges — the Token / Traffic / ARR framework
Three objective gauges measure demand in the LLM industry. Each sees a different picture, and only crossing them lets you reverse-engineer agent penetration.
throughput — the infrastructure-side gauge
A token is the minimum unit an LLM uses to process text, and translates directly into GPU and (Tensor Processing Unit) cycles. Token demand is compute demand, and it is the ignition point of the data-center capex cycle. Google first disclosed monthly token throughput at I/O in May 2025 and has updated the number each quarter since.
There are two weaknesses. First, the number is sensitive to efficiency and reasoning depth — a reasoning model like Gemini 2.5 Flash burns 17x more tokens per request. Second, the definition isn't consistent — Alphabet switched its disclosure from `across surfaces` to `direct API tokens per minute` starting with the Q4 2026 earnings call, so direct comparison of absolute numbers requires care.
Web traffic share — the consumer chatbot gauge
Similarweb tracks generative-AI web traffic share at the domain level, showing which platform consumer chatbot users use. It nails the distribution of general users, but has a hard limit — API calls, mobile apps, embedded integrations, and agent workflows don't appear in it at all.
That limit is the starting point of this analysis. measured by traffic are chatbot users; users NOT measured by traffic are agent users. To recover the share and value of both groups, you have to look at revenue that flows outside of traffic — that is, .
ARR — the billing gauge
ARR (Annual Recurring Revenue) is the annualized estimate of what users actually pay. Consumer subscriptions (ChatGPT Plus, Claude Pro), enterprise seats, and — above all — API token billing all roll into it. Whether a user sends one chatbot message or an agent processes 10M tokens in the background, multiplying by token price turns them both into revenue.
Crossing the three breaks simple proportionality. The width of that break is the depth of agent penetration.
Full access requires 🥉 Bronze tier
Sign in with Google — your tier will be checked automatically and access granted if eligible.
This site runs on ads — the tier system rewards community contributions.
Comments
This report is provided for informational purposes only and does not constitute a recommendation to buy or sell any financial instrument. Investment decisions should be made based on your own judgment and responsibility. The analysis and opinions contained herein are based on information available at the time of writing and are subject to change.