EDITORIAL VIEW

Reverse-engineering Agentic AI Penetration

Sequel to AI Data Center's Split — measuring the real depth of the agent economy through three gauges: tokens, traffic, ARR

Haelangdal·Founder AnalystMay 16, 202618 min readHaelangdal Perspectives

Bottom Line

Agents already drive 36% of LLM industry revenue while holding under 5% of traffic. Revenue follows tokens, not user count.

Reader's Brief — 30-second TL;DR

Intermediate

Signal

Anthropic ARR jumped 3.3x from $9B to $30B in four months while web traffic moved only from 2.2% to 6.0%. Claude Code alone hit $1B ARR in six months and $2.5B in 14.

Risk

If token-price declines outrun usage multiplication, unit economics break and breakeven slips from 2028 to 2030. The core risks are major-cloud in-house models and Chinese open-source coding agents reaching parity.

Action

Position the TPU value chain (Alphabet, Broadcom, Amazon's Bedrock plus $8B direct investment), memory HBM (SK Hynix, Samsung, Micron), and OpenAI exposure via Microsoft and Oracle. Validate first via quarterly token-throughput disclosure, then via the HBM order cycle.

Reading depth

Three gauges — the Token / Traffic / ARR frameworkCrossing the three breaks simple proportionality. The width of that break is the depth of agent penetration.Jump to section
What the gauges show — 134x growth and the ARR crossoverGoogle's token throughput grew 134x in 18 months and per-user tokens 27x. Reasoning models and API and agent calls have outpaced consumer chat traffic.Jump to section
The traffic-ARR decoupling — 11x per-user gap and the Claude Code multiplierThe traffic-ARR decoupling is about user mix. Anthropic's 11.3x higher ARR per user means its base includes heavy API and agent token consumers.Jump to section
Penetration today and the room left — labor market, not SaaSAgents opened a new dimension of revenue, and that dimension's room is 100x bigger than total SaaS. Less than 1% of it has been used.Jump to section
Investment implications — TPU value chain, HBM, OpenAI exposureThe next leg of the revenue curve is tied to compute supply. First-order winners are the TPU chain, second HBM, and the 134x throughput is HBM demand's real function.Jump to section
Conclusion — infrastructure and revenue, aligned on the same splitChatbots are not the whole revenue curve. And infrastructure and revenue alike are now split into two species each.Jump to section

Three gauges — the Token / Traffic / ARR framework

Three objective gauges measure demand in the LLM industry. Each sees a different picture, and only crossing them lets you reverse-engineer agent penetration.

throughput — the infrastructure-side gauge

A token is the minimum unit an LLM uses to process text, and translates directly into GPU and (Tensor Processing Unit) cycles. Token demand is compute demand, and it is the ignition point of the data-center capex cycle. Google first disclosed monthly token throughput at I/O in May 2025 and has updated the number each quarter since.

There are two weaknesses. First, the number is sensitive to efficiency and reasoning depth — a reasoning model like Gemini 2.5 Flash burns 17x more tokens per request. Second, the definition isn't consistent — Alphabet switched its disclosure from `across surfaces` to `direct API tokens per minute` starting with the Q4 2026 earnings call, so direct comparison of absolute numbers requires care.

Web traffic share — the consumer chatbot gauge

Similarweb tracks generative-AI web traffic share at the domain level, showing which platform consumer chatbot users use. It nails the distribution of general users, but has a hard limit — API calls, mobile apps, embedded integrations, and agent workflows don't appear in it at all.

That limit is the starting point of this analysis. measured by traffic are chatbot users; users NOT measured by traffic are agent users. To recover the share and value of both groups, you have to look at revenue that flows outside of traffic — that is, .

ARR — the billing gauge

ARR (Annual Recurring Revenue) is the annualized estimate of what users actually pay. Consumer subscriptions (ChatGPT Plus, Claude Pro), enterprise seats, and — above all — API token billing all roll into it. Whether a user sends one chatbot message or an agent processes 10M tokens in the background, multiplying by token price turns them both into revenue.

Crossing the three breaks simple proportionality. The width of that break is the depth of agent penetration.

Full access requires 🥉 Bronze tier

Sign in with Google — your tier will be checked automatically and access granted if eligible.

Sign in with Google

This site runs on ads — the tier system rewards community contributions.

Comments

This report is provided for informational purposes only and does not constitute a recommendation to buy or sell any financial instrument. Investment decisions should be made based on your own judgment and responsibility. The analysis and opinions contained herein are based on information available at the time of writing and are subject to change.