
Tokenmaxxing Turns AI Productivity Into a Cost Trap

Published May 10, 2026

Tokenmaxxing, the Silicon Valley habit of pushing artificial intelligence (AI, software that performs tasks tied to human reasoning) tools to burn through more tokens, has run into a blunt enterprise objection: usage is not the same as value. Chris Bedi, ServiceNow’s chief customer officer, says the token race may fade fast because every prompt, agent run, and coding loop carries a bill.

That warning matters because AI budgets are moving from experiment lines to operating plans. The easy scoreboard counts tokens. The harder one asks whether a worker shipped safer code, closed a case faster, or saved the company more than the model provider charged.

ServiceNow Puts a Bill Behind the Buzzword

Bedi’s objection is less about whether workers should use AI and more about what companies reward once they do. At the company’s Knowledge event in Las Vegas this week, he told Observer that tokenmaxxing could be a short hype cycle because heavier use still has to pass a return on investment (ROI, the gain a company gets compared with what it spends) test.

“There’s a bill to pay for those tokens,” Bedi said.

Bedi made the point while his own company is leaning hard into enterprise AI. In its first-quarter financial results, ServiceNow reported $3.671 billion in subscription revenue, up 22% year over year, and said the number of Now Assist customers spending more than $1 million in annual contract value grew more than 130%.

That mix gives the warning weight. The company is not talking down AI adoption. It is selling into it. More than 85% of the Fortune 500 work with the platform, according to ServiceNow’s own platform overview, which means Bedi hears from buyers who have to defend AI spend to finance, legal, security, and line managers.


Token Volume Became the Easy Scoreboard

A token is a chunk of text, code, or other input that a model processes. Counting tokens is useful for billing and capacity planning. It becomes dangerous when managers treat the count as proof that work improved.

The reason the metric spread is simple. Tokens are visible. Output quality is messy. A company can pull a dashboard showing how much AI was consumed across teams faster than it can prove whether the code was maintainable, the answer was compliant, or the customer case was handled well.

More than 16 billion tokens per minute: Google and Alphabet chief executive Sundar Pichai said first-party models such as Gemini process that volume through direct application programming interface (API) use, up from 10 billion tokens per minute the prior quarter, in Alphabet’s Q1 remarks.
1000x more tokens: An April paper on agentic coding tasks found those tasks consumed 1000 times more tokens than code reasoning and code chat, according to the arXiv study on agent token spending.
$150 to $250 per developer per month: Anthropic’s Claude Code documentation says average enterprise deployment costs land in that range, while per developer costs vary by model, codebase size, and automation pattern.
30x variance: The same agentic coding study found runs on the same task can differ by up to 30 times in total token use, which makes raw consumption a shaky productivity signal.
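A rough way to see why that spread matters is to compare repeated runs of one task. The sketch below is illustrative only: the run totals are made-up numbers, not data from the study, and `spread_ratio` is a hypothetical helper name.

```python
def spread_ratio(token_counts):
    """Return the largest run divided by the smallest run.

    A high ratio means token totals for the same task are too noisy
    to read as a measure of how much work was done.
    """
    if not token_counts or min(token_counts) <= 0:
        raise ValueError("need at least one positive token count")
    return max(token_counts) / min(token_counts)


# Made-up totals for four runs of the same coding task.
runs = [40_000, 95_000, 1_200_000, 60_000]
ratio = spread_ratio(runs)
print(f"largest run used {ratio:.0f}x the tokens of the smallest")
```

If the same task can cost 40,000 tokens on one run and 1.2 million on another, a ranking of workers or teams by tokens consumed says more about run luck and agent design than about output.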

Coding Agents Bend the Cost Curve

Tokenmaxxing looks different when it moves from chat to coding agents. A human asking for a summary has a natural ceiling. An agent that searches a repo, edits files, runs tests, reads errors, and tries again can keep spending long after the worker has walked away.

That is why the coding wave has made the metric so visible. Agentic tasks create long context windows, repeated tool calls, and large input loads. Anthropic’s Claude Code cost guidance tells teams to track token use, set workspace spend limits, manage context, keep agent teams small, and clean up active teammates because each instance has its own context window.
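The spend-limit idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anthropic's mechanism or any vendor's API: `TokenBudget`, `run_agent`, and the step costs are all hypothetical, and real token counts would come from the provider's usage reporting.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-run token limit for an agent loop."""
    max_tokens: int
    used: int = 0

    def try_charge(self, tokens: int) -> bool:
        """Record the spend if it fits; refuse it if it would exceed the limit."""
        if self.used + tokens > self.max_tokens:
            return False
        self.used += tokens
        return True


def run_agent(step_costs, budget):
    """Run steps in order until they finish or the budget refuses one.

    step_costs stands in for the token cost of each agent step
    (repo read, edit, test run, retry).
    """
    completed = 0
    for cost in step_costs:
        if not budget.try_charge(cost):
            break  # stop here rather than keep spending unattended
        completed += 1
    return completed


budget = TokenBudget(max_tokens=50_000)
steps_done = run_agent([12_000, 18_000, 15_000, 20_000], budget)
print(steps_done, budget.used)
```

In this example the fourth step would push the run past 50,000 tokens, so the loop stops after three steps with 45,000 tokens spent. The point of the design is that the ceiling is set before the run starts, which is the ceiling a human in a chat window has naturally and an unattended agent does not.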

AI Work Pattern	What Token Counts Capture	What They Miss	Better Enterprise Measure
Casual chat help	Prompt and answer size	Whether the answer changed the work	Accepted answer rate
Coding agent loop	Repo reads, edits, test runs, retries	Maintainability, security, and review burden	Accepted pull requests after review
Customer service agent	Conversation and retrieval volume	Customer satisfaction and escalation quality	Resolved cases with low reopen rate
Workflow automation	Model calls inside a process	Manual steps removed from the process	Cost per useful outcome

The table shows the trap at a glance. The highest token total may belong to the least disciplined workflow. A short run that fixes the right thing beats an overnight run that creates review debt.

Model Vendors Benefit When the Meter Spins

There is a business reason tokenmaxxing sounds attractive in parts of the AI market. Many model products charge by consumption, so more usage can mean more revenue for the provider. OpenAI, Google, Anthropic, and other model companies also need high utilization to justify huge infrastructure spending.

That does not make consumption a bad metric. It makes it an incomplete one. OpenAI’s API pricing page lists model charges per 1 million tokens, with different rates for input, cached input, output, priority processing, and tools. For a finance team, that price sheet turns every agent design choice into a budget choice.
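To make that concrete, the arithmetic behind a per-million-token price sheet can be sketched as follows. The rates in `EXAMPLE_RATES` are placeholders chosen for easy math, not prices from OpenAI or any other provider, and the token counts for the two runs are invented.

```python
def run_cost_usd(input_tokens, cached_tokens, output_tokens, rates):
    """Estimate the cost of one run from per-1-million-token rates."""
    per_million = 1_000_000
    total = (
        input_tokens * rates["input"]
        + cached_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    return total / per_million


# Placeholder rates in USD per 1 million tokens (assumptions, not real prices).
EXAMPLE_RATES = {"input": 2.00, "cached_input": 0.50, "output": 8.00}

# A short chat exchange versus a long agent run that rereads a large repo.
chat_cost = run_cost_usd(2_000, 0, 500, EXAMPLE_RATES)
agent_cost = run_cost_usd(1_500_000, 3_000_000, 200_000, EXAMPLE_RATES)

print(f"chat:  ${chat_cost:.3f}")
print(f"agent: ${agent_cost:.2f}")
```

Under these made-up rates the chat exchange costs less than a cent and the agent run costs about six dollars. Choices such as how much context the agent reloads, and how much of it is served from cache, move the second number directly.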

Google’s disclosure adds the scale. If first-party models are already processing more than 16 billion tokens a minute through customer API use, token growth has become an industry health signal. For a buyer, though, the question is narrower: how many of those tokens removed a ticket, reduced a false positive, shortened a sales cycle, or prevented a bad answer from reaching a customer?

ServiceNow Sells the Control Layer

The same warning that undercuts tokenmaxxing also supports the product pitch from workflow platforms. If AI work spreads across departments, a company needs a way to see agents, govern access, measure spend, and shut down risky behavior before a novelty dashboard turns into operational debt.

That is the lane the workflow vendor is building. At Knowledge, AI Control Tower’s expanded capabilities were described across five functions: discover, observe, govern, secure, and measure. The company said the tool can monitor AI systems beyond its own platform, connect to clouds and enterprise applications, and detect when an agent goes beyond its permissions.

Finance teams need spend controls before agent use becomes a surprise line item.
Security teams need identity and permission limits for non-human actors.
Compliance teams need logs that show why an agent made a decision.
Operations leaders need outcome metrics tied to work, not just model activity.

The company’s Autonomous Workforce expansion also shows why governance is moving to the center of enterprise AI. New AI specialists for information technology, customer relationship management (CRM, software for managing customer relationships), employee service, and security are meant to complete end-to-end processes, not simply answer questions.

The Employee Signal Behind the Metric

Tokenmaxxing also changes the worker signal. In the early productivity rush, heavy AI use can look like ambition. The worker who prompts more, runs more agents, and keeps tools active overnight appears to be adopting faster than the person who uses AI sparingly but carefully.

That can turn a budget metric into a culture problem. If employees believe the company values visible AI consumption, they will produce visible AI consumption. The result may be longer prompts, more speculative agent runs, and less attention to whether the work improved. Oton Technology has covered a related pressure point in Meta’s AI restructuring and Nairobi contractor cuts, where AI strategy was not just a tool decision but a labor decision.

Managers should be careful here. Low token use can mean resistance. It can also mean skill. A good engineer may know when to ask a model, when to write the patch, and when to stop the agent before it burns budget chasing a weak plan.

Outcome per Token Becomes the Enterprise Test

The better metric will not be one number. It will be a bundle of boring measures that finance and operations teams already understand: time saved, defects avoided, cases closed, code accepted, incidents resolved, and cost per completed workflow.

For coding teams, that means pairing token use with review acceptance, rollback rates, security findings, and cycle time. For service teams, it means case resolution, reopen rates, customer satisfaction, and escalation quality. For internal operations, it means fewer handoffs and less manual rekeying, not just more prompts.
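One way to pair token use with outcomes for a coding team is a cost-per-useful-outcome figure. The sketch below is an assumption-laden illustration: the `TeamMonth` fields, the rule that a rolled-back pull request does not count, and both teams' numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamMonth:
    """Hypothetical monthly record for one coding team."""
    name: str
    tokens_used: int
    spend_usd: float
    accepted_prs: int  # pull requests merged after review
    rollbacks: int     # merged changes later reverted


def cost_per_outcome(team: TeamMonth) -> Optional[float]:
    """Spend divided by accepted pull requests that were not rolled back."""
    useful = team.accepted_prs - team.rollbacks
    if useful <= 0:
        return None  # spend with nothing to show for it
    return team.spend_usd / useful


# Made-up figures: one heavy-usage team, one light-usage team.
teams = [
    TeamMonth("heavy", 900_000_000, 4_000.0, 20, 4),
    TeamMonth("light", 150_000_000, 750.0, 15, 0),
]

for team in teams:
    print(team.name, team.tokens_used, cost_per_outcome(team))
```

On a token leaderboard the heavy team wins by a factor of six. On cost per surviving pull request it pays $250 against the light team's $50, which is the comparison a finance or operations team would care about.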

Bedi’s warning lands because the first stage of AI adoption rewarded movement. The second stage will reward proof. Companies can still give workers generous AI budgets, but the budget has to buy outcomes rather than theatrical consumption.

If tokenmaxxing makes employees fluent with AI, the experiment will have served a purpose. If it becomes the scorecard, the bill will arrive before the value does.