The token bill arrives

Jun 05, 2026

As agentic harnesses enhance the capabilities of their underlying models and extend the time horizon over which they can operate, token consumption soars. Organizations leaning into coding agents in particular are experiencing sticker shock as they burn through AI budgets far faster than anticipated. Capabilities continue to expand. Costs likewise balloon. Outcomes and return on investment remain, at times, elusive.

FRESH READS

① Uber’s finance team overtaken by engineering in AI use
Summary: Uber’s CTO Praveen Neppalli Naga disclosed that the company consumed its annual 2026 AI budget in the first four months of the year. COO Andrew Macdonald acknowledged it remains difficult to draw a direct line between rising token usage and improvements in customer outcomes. Uber’s engineering team has overtaken the finance team as the company’s leading AI adopter, with roughly 95% of engineers using AI tools monthly and about 70% of committed code coming from AI coding tools.
Signal: Token costs are a rising concern as organizations scale AI usage without clear ties to business outcomes. Even teams that avoid token-maxxing (Uber’s finance organization explicitly noted its absence) need cost governance paired with adoption programs from the start.
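The four-month figure above implies a simple run-rate check that a cost governance program could apply early. A minimal sketch, assuming spend continues at a constant monthly pace (the summary gives no monthly breakdown) and using a hypothetical function name:

```python
def annualized_burn_multiple(budget_fraction_spent: float, months_elapsed: float) -> float:
    """Projected full-year spend as a multiple of the annual budget.

    Assumes a constant monthly pace, which is an illustrative simplification.
    """
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    return budget_fraction_spent * 12 / months_elapsed


# The case described above: 100% of the annual budget spent after 4 months.
multiple = annualized_burn_multiple(budget_fraction_spent=1.0, months_elapsed=4)
print(f"Projected spend: {multiple:.1f}x the annual budget")
```

A check like this flags overruns long before the budget is gone: any value above 1.0 means the current pace exhausts the budget before year end.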

② DeepSeek V4 Pro API: 75% price cut made permanent
Summary: DeepSeek has permanently cut prices for its V4 Pro API by 75%. The cut is enabled by 1/ architecture efficiency, with cost scaling only with active parameters rather than total parameters, 2/ compressed attention, which reduces KV-cache memory pressure, the dominant cost of long-context inference, and 3/ aggressive prefix caching.
Signal: At $0.18 per 1 million blended tokens, V4 Pro significantly undercuts frontier lab pricing for multi-step agentic workloads. For teams currently budgeting agentic workloads on GPT-5 or Claude-class models, this is a materially different cost curve and raises the question of whether capability parity justifies the price premium of closed frontier models.
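To make the quoted rate concrete, here is a rough sketch of what a workload would cost at $0.18 per 1 million blended tokens. The workload size (runs per day, tokens per run) is a hypothetical placeholder rather than a figure from the announcement, and the function name is illustrative:

```python
BLENDED_PRICE_PER_MILLION_USD = 0.18  # rate quoted above for V4 Pro


def workload_cost_usd(total_tokens: int,
                      price_per_million_usd: float = BLENDED_PRICE_PER_MILLION_USD) -> float:
    """Cost of a workload at a flat blended per-token rate."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Hypothetical workload: 200 agent runs per day, 2 million tokens per run, 30 days.
monthly_tokens = 200 * 2_000_000 * 30
monthly_cost = workload_cost_usd(monthly_tokens)
print(f"{monthly_tokens:,} tokens -> ${monthly_cost:,.2f} per month")
```

Passing a different `price_per_million_usd` lets a team compare the same workload against whatever blended rate it currently pays.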

③ Linux Foundation announces the intent to launch the Tokenomics Foundation to establish open standards for AI cost management
Summary: The Linux Foundation announced the intent to launch the Tokenomics Foundation, which will focus on establishing open industry standards, benchmarks, and best practices for the economics of AI infrastructure. Research from Goldman Sachs projects global token usage to multiply 24x between 2026 and 2030 to 120 quadrillion tokens per month. With industry analysts forecasting more than $1 trillion in AI infrastructure investment through 2027, and the inference market projected to expand from $106 billion in 2025 to $255 billion by 2030, the Foundation will serve both the buyer and supplier side of the AI economy, acting as a neutral home to develop the standards needed to measure token economics transparently across the entire supply chain.
Signal: Per-token costs fell steadily from 2023 to 2025 but have leveled off, and new model prices are rising again, making AI the fastest-growing line item on enterprise technology budgets. To evaluate cost on an apples-to-apples basis, organizations need vendor-neutral standards for measuring token efficiency. The Tokenomics Foundation is an attempt to do for AI spend what FinOps did for cloud, i.e., create the common framework that organizations need to manage cost at scale.
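The projections quoted in the summary can be unpacked with a little arithmetic. A sketch, assuming smooth annual compounding between the endpoint years (an assumption; the projections state only the endpoints) and using illustrative helper names:

```python
def implied_baseline(final_value: float, multiple: float) -> float:
    """Starting value implied by a final value and a growth multiple."""
    return final_value / multiple


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Token usage: 24x between 2026 and 2030, reaching 120 quadrillion per month.
baseline_quadrillion = implied_baseline(120, 24)
token_growth = cagr(1, 24, 2030 - 2026)

# Inference market: $106 billion in 2025 to $255 billion by 2030.
market_growth = cagr(106, 255, 2030 - 2025)

print(f"Implied 2026 baseline: {baseline_quadrillion:.0f} quadrillion tokens per month")
print(f"Token usage growth: {token_growth:.0%} per year")
print(f"Inference market growth: {market_growth:.0%} per year")
```

Under these assumptions, token volume grows several times faster per year than inference revenue, which is the kind of gap that vendor-neutral measurement would help organizations interpret.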

④ Anthropic confidentially submits draft S-1 to the SEC
Summary: Anthropic filed a confidential draft S-1 with the SEC on June 1, one week after OpenAI. The filing gives Anthropic the option to go public, though no timeline has been set. The company raised $65 billion in Series H funding at a $965 billion post-money valuation just last week, overtaking OpenAI, whose last valuation was $852 billion.
Signal: SpaceX, OpenAI, and Anthropic have all filed with the SEC, putting three high-profile AI and deep technology companies on a path towards the public markets. These IPOs will test whether public investors will fund the sustained capital expenditures that these companies require against the promise of large but long-horizon returns.

⑤ Expanding Project Glasswing
Summary: Anthropic announced that it is expanding Project Glasswing from roughly 50 initial partners to approximately 150 new organizations, based in more than 15 countries and covering industries that were not well represented in the initial cohort, e.g., power, water, healthcare, communications, and hardware. The initial partners have already used Claude Mythos Preview to surface more than 10k high- or critical-severity vulnerabilities. To support cyber defenders, Anthropic released Claude Security and is sharing its internal Glasswing tooling with trusted security teams on request.
Signal: Anthropic estimates that a successful attack on most of these new partner organizations could affect more than 100 million people. Critically, Anthropic also anticipates that Mythos-class models from other AI companies will be available within 6 to 12 months, potentially released without adequate misuse safeguards in place. With vulnerability patching as the new bottleneck, organizations must adapt their pipelines to quickly address disclosed vulnerabilities.

⑥ OpenAI frontier models and Codex are now available on AWS
Summary: OpenAI frontier models are now generally available on AWS, with GPT-5.5 and GPT-5.4 joining the previously available OSS models. This integration allows organizations to consume OpenAI capabilities through existing AWS security, compliance, procurement, and governance workflows. OpenAI also hinted that Daybreak, its approach to AI-native software development and defense, is coming to AWS.
Signal: OpenAI and Anthropic now distribute their frontier models across all three major cloud providers, i.e., AWS, Azure, and GCP, commoditizing model access. The strategic battleground is moving up the stack to the harness and application layer, as illustrated by the direction of both Mythos and Daybreak, which couple cybersecurity models with intelligent agentic harnesses.

⑦ NVIDIA and Microsoft reinvent Windows PCs for the age of personal AI
Summary: NVIDIA unveiled RTX Spark, a 1-petaflop superchip that combines a Blackwell RTX GPU with 6,144 CUDA cores, a 20-core Grace CPU, 128 GB of unified memory, and NVLink-C2C high-speed interconnects. It powers the first Windows PCs purpose-built for local AI agents, supporting 120 billion parameter LLMs with 1 million token context windows. It implements new Windows security primitives and the NVIDIA OpenShell runtime for policy controls over agent permissions, model routing, and personal data masking.
Signal: With the cost of tokens top of mind and subsidized seat-based pricing eroding, running agentic workloads locally is increasingly attractive. The new Windows security primitives and OpenShell runtime address concerns about running agents securely on a user’s primary computing device.

ONE TO WATCH

① defending-code-reference-harness
Summary: A reference harness implementation for autonomous vulnerability discovery and remediation with Claude, based on learnings from partnering with security teams since launching Claude Mythos Preview.

Always be learning.

heeki reads #3
Written by Heeki Park, Principal SA @ AWS. Opinions are my own.
Alcurio is where alchemy meets curiosity.
