After the AI Binge, Companies Balk at Soaring Token Bills

AI costs are rising fast in 2026, and the companies that binged on chatbots, coding assistants, and autonomous agents are now staring at invoices several times larger than the ones they signed up for. The reason is simple. The era of investor-subsidized pricing is ending, and the meter is finally running at something closer to the true cost of the compute behind every answer.

Kevin Simback of startup incubator Delphi Labs calls the old arrangement “subsidized intelligence,” meaning venture money was quietly footing the bill so AI firms could hook customers cheaply. That subsidy is being switched off just as OpenAI and Anthropic prepare to court public investors later this year, and the timing is not a coincidence.

The End of Subsidized Intelligence

The playbook is old Silicon Valley. Price low, capture the market, raise prices once customers are locked in. AI companies ran it at speed after ChatGPT arrived, charging rock-bottom rates that never reflected what it costs to keep racks of chips humming. Now the bills are catching up with reality.

“But the tides are beginning to turn,” Simback warned, describing the start of an era when the big labs actually have to make money. That pressure is sharpest for the two front-runners. With both OpenAI and Anthropic eyeing public markets and Main Street investors, the cheap pricing that built their user bases has become a liability on the path to profitability.

The price moves are visible in the rate cards themselves. You can read the published per-token rates for frontier models and watch each new flagship land more expensive than the last, even as the labs insist the newer versions do more per dollar.

  • $5 / $30 per million tokens (input / output) for the newest flagship generation, against $2.50 / $15 for the prior one
  • 49% to 92%: how much higher the real-world cost of the latest model runs, depending on prompt length, by one independent routing analysis
  • $852 billion: the valuation OpenAI carried as of March, a number that needs hard revenue underneath it before a listing
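
To make the rate-card arithmetic concrete, here is a minimal sketch that prices one request at the two list rates quoted above. The function name and the request size of 10,000 input and 2,000 output tokens are illustrative assumptions, not figures from any vendor. List price is also not the same thing as the real-world cost in the routing analysis, which depends on how many tokens each model actually consumes.

```python
def request_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Cost of one request, with rates given in USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Rates from the list above (USD per million tokens, input / output).
PRIOR_FLAGSHIP = (2.50, 15.00)
NEWEST_FLAGSHIP = (5.00, 30.00)

# Hypothetical request size, chosen only for illustration.
tokens_in, tokens_out = 10_000, 2_000

prior = request_cost_usd(tokens_in, tokens_out, *PRIOR_FLAGSHIP)
newest = request_cost_usd(tokens_in, tokens_out, *NEWEST_FLAGSHIP)
print(f"prior: ${prior:.3f}  newest: ${newest:.3f}  ratio: {newest / prior:.1f}x")
```

Because both the input and the output rate doubled, the list-price ratio comes out at 2x for any mix of input and output tokens. Only a change in token counts between model generations moves the real-world figure away from that.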

Why AI Agents Burn Tokens Faster Than Chatbots

Price increases are only half the story. The other half is volume, and the volume problem has a name: agents. A chatbot answers a question and stops. An agent does things: it books appointments, writes and tests code, moves files. A single task can spin up dozens of these agents working in parallel, each one billing as it goes.

Everything is metered in tokens, the basic unit AI firms use to charge for input and output. The trouble is how fast agent work stacks them up.

  • One agent-driven task can consume dozens of times more tokens than a single chat message
  • Agents reload context, retry failed steps, and call other agents, each pass adding to the tab
  • Coding workloads are the heaviest, generating and rechecking long stretches of text
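
The compounding described in the list above can be sketched with a toy model. It assumes that every agent step re-sends the full context so far as input, and that each step's output is appended to that context. Those are illustrative assumptions rather than measurements, and the function name and all of the numbers are hypothetical.

```python
def agent_task_tokens(steps, base_context, output_per_step, agents=1):
    """Total billed tokens (input + output) for a toy agent run.

    Assumes each step re-sends the whole context so far as input and
    appends its own output to the context for the next step.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context + output_per_step  # input re-sent plus new output
        context += output_per_step          # context grows for the next step
    return total * agents


# Hypothetical sizes: a 1,000-token prompt and 500 tokens of output per step.
single_chat = agent_task_tokens(steps=1, base_context=1_000, output_per_step=500)
agent_run = agent_task_tokens(steps=20, base_context=1_000, output_per_step=500)
print(single_chat, agent_run, round(agent_run / single_chat))
```

Under these assumptions the input side grows roughly with the square of the step count, which is why a 20-step run lands at dozens of times the tokens of a single message. Retries add steps, and parallel agents (the `agents` multiplier) scale the total again.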

“Especially in developer circles, the cost to use AI for things like coding has grown exponentially,” said Mark Barton of tech consultancy Omniux. “All the costs are really starting to skyrocket.” Some teams pushed usage so hard the spending lost any link to value, a binge insiders started calling “tokenmaxxing.” Jack Gold, analyst at J.Gold Associates, put the failure mode bluntly: “In some cases people are seeing the cost of tokens exceed the cost of the employee within a month or two of use, just because they’re using it too much.”

Uber Ran Through Its AI Budget Four Months In

No company has made the reckoning more concrete than Uber. Its chief technology officer, Praveen Neppalli Naga, disclosed that the ride-hailing firm had exceeded its entire 2026 AI budget within the first four months of the year. The usage is everywhere inside the company: roughly 95% of its engineers touch AI tools monthly, and by Uber’s own account about 70% of committed code is now AI-generated.

The catch is what that spend is buying. Andrew Macdonald, Uber’s chief operating officer (COO), said this week he cannot draw a clean line between the surging token bills and measurable gains in the company’s products, which makes the spending harder to justify against headcount. The disconnect sits awkwardly next to the company’s reported results, available in Uber’s first-quarter 2026 earnings filing.

Meta Walks Back the Leaderboard

Meta is rethinking the same instinct. Earlier this year it pushed staff to run up token counts as a proxy for productivity, even building an internal leaderboard. Then chief technology officer Andrew Bosworth reversed course in a staff memo reported by the Wall Street Journal.

“Nobody should be using AI tools just for the sake of using them.”

That line, from a company that spent the past year telling employees the opposite, marks how quickly the mood has shifted from more-is-better to prove-it.

The Retreat to Smaller and Open Models

Faced with bills like these, finance teams are doing what finance teams do. They are shopping for cheaper supply. The first move is away from the giant general-purpose model and toward a free, open-weight one that anyone can download, less capable than ChatGPT or Claude but good enough for plenty of routine work.

The second move is specialization, swapping a do-everything model for a smaller one tuned to real estate, finance, or another single domain. The third is routing, breaking a big job into steps and handing each piece to the cheapest model that can clear the bar. The savings are not marginal. “The big large monolithic model, it’s $15 per million tokens, but you can get that down to like five cents if you use the smaller mini model,” said Adrian Balfour of consultancy Enverso.

| Approach | Typical example | Rough cost signal | Best suited to |
|---|---|---|---|
| Monolithic frontier model | Latest flagship from a top lab | Highest per-token rate | Hardest reasoning, novel tasks |
| Smaller “mini” model | Mini or flash variant of a hosted model | A fraction of flagship cost | High-volume, simpler queries |
| Open-weight model run in-house | DeepSeek, Alibaba’s Qwen, Meta’s Llama | No per-token fee, you pay compute | Privacy-sensitive or steady workloads |
| Domain-specific small model | Finance or property-tuned model | Low, narrow scope | Repetitive industry tasks |
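
The routing idea, handing each step to the cheapest model that clears the bar, can be sketched in a few lines. This is a minimal illustration and not any vendor's router. The two prices come from Balfour's quote ($15 and five cents per million tokens), while the model names, the 1-to-10 capability scores, and the job steps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million: float
    capability: int  # hypothetical 1-10 score, not a real benchmark


@dataclass(frozen=True)
class Step:
    label: str
    tokens: int
    required_capability: int  # hypothetical bar the model must clear


# Prices from the quote above; names and capability scores are assumptions.
MODELS = [Model("flagship", 15.00, 9), Model("mini", 0.05, 5)]


def route(step, models=MODELS):
    """Return the cheapest model whose capability meets the step's bar."""
    eligible = [m for m in models if m.capability >= step.required_capability]
    if not eligible:
        raise ValueError(f"no model can handle step {step.label!r}")
    return min(eligible, key=lambda m: m.usd_per_million)


def job_cost(steps, models=MODELS):
    """Total USD cost when every step goes to its cheapest eligible model."""
    return sum(s.tokens / 1_000_000 * route(s, models).usd_per_million for s in steps)


# Hypothetical job: two routine steps and one hard review step.
STEPS = [
    Step("classify tickets", 200_000, 3),
    Step("draft replies", 500_000, 4),
    Step("review edge cases", 50_000, 8),
]

routed = job_cost(STEPS)
all_flagship = sum(s.tokens for s in STEPS) / 1_000_000 * 15.00
print(f"routed: ${routed:.3f}  all-flagship: ${all_flagship:.2f}")
```

The hard part in practice is the capability score. Here it is a made-up integer, whereas a real deployment would need evaluations per task type to decide which model clears the bar.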

The open camp has gotten harder to dismiss. Meta’s open-weight Llama models, China’s DeepSeek, and Alibaba’s Qwen family now handle many production jobs that a year ago would have gone to a paid frontier service by default. The trade is capability for control over cost, and for a growing share of work, that trade now pencils out.

What a Commoditized Market Means for OpenAI and Anthropic

Add it up and the direction points one way. AI is starting to behave like a commodity, where the brand of the model matters less than finding one that does the job at the right price. That is a strange backdrop for two companies trying to sell shares on the strength of being the best.

The frontier labs are not conceding the point. Anthropic publishes Claude’s API price list and keeps pushing capability at the top end, betting that demand for the smartest model outruns any drift toward cheap-and-good-enough. John Belton, a portfolio manager at Gabelli Funds, sides with them on the demand math. “The most advanced users” will always pay for the best, he said. “It’s a growing pie.”

He may be right that the pie keeps growing. The open question is how much of it stays premium once a five-cent model clears the bar for the work most companies actually run. Every quarter that agents cost more than the staff they were meant to free up, the list of work that really needs the flagship gets shorter.

The next year settles it. If agent productivity finally shows up in the numbers the way Uber’s engineers keep promising, the rising bills get easier to defend and the IPO story holds. If it does not, the spreadsheet wins, and the cheapest model that does the job takes the work.

Frequently Asked Questions

What is tokenmaxxing?

Tokenmaxxing is the practice of running up AI usage for its own sake, often because token counts were treated as a sign of productivity. Tokens are the billing unit AI firms use, so heavy use drives up cost. Analysts say some teams spent more on tokens than on the employees the tools were meant to help, prompting companies like Meta to discourage the habit.

Why are AI costs rising in 2026?

Two forces are pushing costs up at once. Investor subsidies that kept early prices artificially low are fading as leading labs move toward profitability and public listings, and newer flagship models carry higher per-token rates. At the same time, AI agents consume far more tokens than simple chatbots, so total bills climb even when individual prices hold.

How much cheaper are smaller AI models?

The gap can be enormous. By one consultant’s estimate, a large monolithic model running at about $15 per million tokens can be replaced by a smaller variant at roughly five cents per million tokens for suitable tasks. The smaller model is less capable, but for high-volume, routine work many companies find it good enough.

Are open-source AI models a real alternative to ChatGPT and Claude?

For a growing share of work, yes. Open-weight models such as Meta’s Llama, DeepSeek, and Alibaba’s Qwen are free to download and run, and they now handle many production tasks that once went to paid frontier services. They are generally less capable at the hardest problems, so most firms reserve premium models for tougher jobs and route routine work to cheaper options.

Does heavy AI spending actually boost productivity?

The evidence is mixed and contested. Uber’s chief operating officer said this week the company cannot draw a clear link between its rising AI token spend and measurable product gains, even with most engineers using the tools and a large share of code now AI-generated. Other firms report real gains in specific tasks like coding, but the broad return on investment remains unproven.

When are OpenAI and Anthropic expected to go public?

Both companies are widely expected to pursue public listings later in 2026, with reporting pointing to the second half of the year. Both are still in early or confidential stages, and the push toward profitability, including higher prices, is part of preparing for Main Street investors.

Logan Pierce is a writer and web publisher with over seven years of experience covering consumer technology. He has published work on independent tech blogs and freelance bylines covering Android devices, privacy-focused software, and budget gadgets. Logan founded Oton Technology to publish clear, no-nonsense tech news and reviews based on real hands-on testing. He has personally tested and reviewed dozens of mid-range and budget Android phones, written extensively about app privacy, and built and managed multiple WordPress publications over the past decade. Logan holds a bachelor's degree in English and studied digital marketing at a certificate level.
