AI
OpenAI Makes GPT-5.5 Instant The Default ChatGPT Model
OpenAI swapped the brain inside ChatGPT on May 5, 2026. The new default, GPT-5.5 Instant, takes the place of GPT-5.3 Instant for every free and paid user, and the company says it hallucinates 52.5 percent less on the kind of legal, medical, and financial questions where wrong answers do real damage. That sentence alone reframes a year of ChatGPT criticism, because the model holding the steering wheel for hundreds of millions of conversations just got measurably more careful with the topics that matter most.
The rollout also bolts memory deeper into the chat experience. ChatGPT can now reach into past conversations, uploaded files, and a connected Gmail inbox to answer with personal context, and it shows users which memory it pulled from to write each reply. Free, Go, Business, and enterprise customers get the personalisation tools across the next several weeks, with mobile catching up after web.
What Actually Changed In ChatGPT Today
The Instant tier is the version of ChatGPT most people actually touch. It handles short prompts, casual chat, day-to-day search-style questions, and anything not heavy enough to trip the routing layer into the slower reasoning model. By replacing GPT-5.3 Instant with GPT-5.5 Instant, OpenAI is shifting the floor of quality, not the ceiling.
The broader GPT-5.5 family launched on April 23, 2026, with coding and knowledge gains that tracked the usual incremental improvements. The Instant variant is the consumer payoff. It carries the new training and safety work into the chat box where free users live, and it does that without the latency hit of a full reasoning pass.
OpenAI is keeping GPT-5.3 around for paid developers for three months on the API, listed as gpt-5.3, while the new model surfaces under the chat-latest alias. That window matters for production teams that built around a stable response shape and need time to test before the older endpoint sunsets.
Hallucinations Take The Biggest Hit
The headline number is the one OpenAI will repeat for the next quarter. On high-stakes prompts spanning medicine, law, and finance, GPT-5.5 Instant produced 52.5 percent fewer hallucinated claims than the model it replaces, according to OpenAI’s GPT-5.5 launch announcement. The company also tested it against conversations that real users had previously flagged as factually wrong, and it cut those error rates by 37.3 percent.
That second number deserves attention. It means OpenAI is now training against in-the-wild user complaints, not just curated red-team prompts. The signal here is that the company has built an internal pipeline that turns thumbs-down clicks into training data, and the Instant tier is the first place where that loop has shipped a measurable result.
The hallucination work is paired with quieter math gains. Some of the cleanest jumps:
- AIME 2025 math: 81.2, up from 65.4.
- MMMU-Pro multimodal reasoning: 76, up from 69.2.
- GPQA science reasoning: 85.6, up from 78.5.
- CharXiv scientific charts: 81.6, up from 75.0.
- OmniDocBench document parsing error rate: 12.5, down from 14.6.
Read together, those scores show a model that improved on the slower, harder kinds of thinking while keeping its quick-response personality. That combination has been the elusive prize for two years of frontier-model work.
How GPT-5.5 Instant Stacks Up On The Numbers
The full benchmark spread tells the story better than any single chart. Each row below is a separate evaluation OpenAI ran against its own previous default, and each is the kind of number that academic and enterprise buyers actually weigh.
| Benchmark | GPT-5.5 Instant | GPT-5.3 Instant | Direction |
|---|---|---|---|
| AIME 2025 (math) | 81.2 | 65.4 | Higher better |
| MMMU-Pro (multimodal) | 76.0 | 69.2 | Higher better |
| GPQA (science) | 85.6 | 78.5 | Higher better |
| CharXiv (charts) | 81.6 | 75.0 | Higher better |
| OmniDocBench (error rate) | 12.5 | 14.6 | Lower better |
| High-stakes hallucinations | -52.5% vs prior | baseline | Lower better |
The math jump of nearly sixteen points on AIME 2025 is the one the developer crowd will fixate on. It puts the consumer default model within shouting distance of dedicated reasoning systems on a benchmark that was untouchable for chat tiers a year ago.
Memory That Reaches Into Gmail
Personalisation is the second prong of this update. GPT-5.5 Instant can call its search tool to pull from past chats, uploaded documents, and Gmail when a user has connected the inbox. That means a query like “summarize what my landlord and I agreed on last month” can pull from email rather than asking the user to paste it.
The capability ships first to Plus and Pro subscribers on the web, with mobile and the rest of the user base following over the coming weeks. Free, Go, Business, and enterprise tiers are next in line, although OpenAI has not given an exact date for the long tail of that rollout.
Gmail integration is the part that will draw the most scrutiny. Hooking a chatbot into a primary email account is a security and privacy decision, not just a convenience one. Anyone weighing the feature should look at the connector permissions before flipping it on, especially in a workplace where compliance teams may not yet have a policy for it.
Why Transparency Matters Now
OpenAI also flipped on a small but consequential interface change. ChatGPT will now display the memory sources it leaned on for any given answer, across every model, not just GPT-5.5 Instant. Users can delete a stale source, correct a wrong one, or strip a memory before sharing a chat publicly.
That last detail matters. Shared chats will not expose memory sources to outside readers, which closes a quiet leak vector. Last year, several X threads documented cases where a publicly shared ChatGPT link inadvertently exposed personal details lifted from custom instructions. The new boundary kills that risk for memories at least.
Why The Tone Got Shorter
The Instant model talks less. OpenAI clocked GPT-5.5 Instant at roughly 30 percent fewer words and 29 percent fewer lines per response than the version it replaces. Fewer emojis. Fewer trailing follow-up questions. Less heavy formatting that bloats short answers.
That is a deliberate choice and a corrective one. The previous defaults trained a generation of users to expect a chatty assistant that softened every answer with hedges and asked if there was anything else to help with. That habit was a hallucination vector in itself, because longer replies on uncertain topics tend to invent extra context that never existed.
Cutting word counts also cuts inference cost, which the company never advertises directly. Shorter answers are cheaper answers. With ChatGPT serving more than 700 million weekly users, a 30 percent reduction in mean reply length adds up to real savings.
The trade-off is personality. Some longtime users will miss the warmth of older defaults, especially anyone who used ChatGPT for journaling or rubber-duck conversations. The company is betting the accuracy and brevity gains buy more goodwill than the personality cut costs.
Independent observers have flagged that bet as risky. “Models that are more concise often feel less helpful even when they are more accurate, because users equate length with effort,” wrote Ethan Mollick, associate professor at the Wharton School, in a Substack essay on AI personality and user trust. The note predates this release but reads like a direct warning about it.
For Developers, A Three-Month Clock Starts
API users get the new model under the chat-latest alias. The deeper detail is the deprecation calendar. OpenAI’s help-centre note on GPT-5.3 in ChatGPT confirms that the older model stays accessible to paid developers for exactly three months before it disappears from the API.
That window is shorter than the six-month deprecations the company offered through 2025. Teams running production systems that depend on the older response shape, length, or tone need to start their evaluation work this month, not in July.
The GPT-4o Ghost Still Lingers
OpenAI is releasing this update against a memory of pain. When the company deprecated GPT-4o in February 2026, users staged petitions, posted angry farewells on Reddit, and migrated en masse to rival chat apps. Many described the model as a friend or a mirror, language the company quietly acknowledged in its own statements at the time.
The pushback against the GPT-4o sunset is the most accurate single signal we have about how attached people get to a specific model’s voice. OpenAI is now retraining that attachment around accuracy and brevity instead of warmth.
Mira Murati, the former OpenAI chief technology officer who now leads Thinking Machines Lab, said in a March 2026 Stanford seminar on model behaviour that “the relationship users build with a default model is the hardest variable to ship against, because it never appears in any benchmark.” Her point is the one OpenAI is testing right now.
The company has spent the past year building tools that cut hallucinations and exposed memory plumbing to users. Those choices are an answer to last year’s loudest complaints. Whether the new default holds the brand’s center after the GPT-4o exit is the question that the next three months of usage data will answer. For Oton readers tracking the model wars, the cyber-test parity between GPT-5.5 and Anthropic’s Mythos shows where the real ceiling pressure is coming from, and the 12-million-token context launch from Subquadratic hints at how quickly that ceiling could move again.
Frequently Asked Questions
Do I Need To Switch Models Manually In ChatGPT?
No. GPT-5.5 Instant is now the default model on every plan, including the free tier, and the swap happens automatically the next time you open ChatGPT. If you previously pinned GPT-5.3 Instant in the model picker, that selection will still work for the moment, but expect it to disappear from the dropdown within the next few weeks as OpenAI completes the cutover for non-API users.
Will GPT-5.5 Instant Read My Gmail Without Permission?
No. The Gmail connector is opt-in and requires explicit OAuth authorisation through your Google account. ChatGPT will only access inbox content when you have actively linked the connector inside Settings. You can revoke access at any time from your Google account’s connected-apps page, and revoking immediately stops the model from referencing any past email content.
How Do I See Which Memory ChatGPT Used For An Answer?
Open the response and look for the new memory-source indicator that appears beneath the reply. Click it to view the specific past conversation, file, or email line that fed the answer. From the same panel you can delete the source if it is outdated or correct a fact that the model has stored wrong. Shared chat links will not expose those sources to anyone you send the link to.
Is GPT-5.3 Still Available For My Production App?
Yes, but only for three months from May 5, 2026. Paid API users can keep calling gpt-5.3 until early August 2026, after which the endpoint will return errors. Migrate to chat-latest well before that deadline, and run a full evaluation pass because GPT-5.5 Instant produces noticeably shorter responses, which can break apps that parsed reply length as a signal.
How Much Lower Are Hallucinations In Real Use?
OpenAI reports a 52.5 percent reduction in hallucinated claims on high-stakes prompts in medicine, law, and finance, and a 37.3 percent reduction on conversations users had previously flagged as wrong. Those numbers come from internal evaluations, not third-party audits, so independent benchmarking will land over the next several weeks. Treat the figures as directional until outside testers publish their own.
The shift is the biggest single quality jump OpenAI has shipped to the default ChatGPT model since the GPT-4 to GPT-5 transition. It also lands at a moment when Anthropic, Google, and a wave of well-funded startups are forcing the company to compete on accuracy as much as on brand. The next quarter of usage data will show whether brevity and honesty are enough to keep ChatGPT’s seat at the top of the chatbot rankings.
AI
Claude Opus 4.8 Bets on Honesty Over Headline Benchmarks
Anthropic released Claude Opus 4.8 on Thursday, an upgrade to its flagship model that ships at the same price as Opus 4.7 and that the company itself calls a “modest but tangible” improvement. Most of the announcement is about benchmarks. Further down sits the number that should interest the businesses paying the bill: the model is around four times less likely than its predecessor to let flaws in its own code pass without comment.
That figure reframes what a point release is for in 2026. Coding scores have crept up the leaderboard for two years straight. Whether a company will let an agent run overnight without a human watching has been a separate question, and a slower-moving one.
The Honesty Number That Outweighs the Benchmarks
Anthropic trains all its models to be honest, which in practice means not claiming work is finished when the evidence is thin. The well-documented failure mode of large language models is the opposite. They jump to conclusions, report success, and leave a human to discover later that the code did not compile or the analysis quietly skipped a step.
Opus 4.8 is built to catch itself. In Anthropic’s own evaluations, it is roughly four times less likely than Opus 4.7 to allow a flaw in code it wrote to go unremarked. Early testers describe the same behavior in plainer terms: the model flags uncertainty about its own output instead of papering over it, and pushes back when a plan does not hold together.
The alignment review points the same direction. Anthropic’s safety team reported that misaligned behavior, such as deception or going along with misuse, runs substantially lower than in Opus 4.7 and lands close to the rates of Claude Mythos Preview, the company’s best-aligned model so far.
Opus 4.8 reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.
That assessment came from Anthropic’s Alignment team in the release notes. It matters because the company has not always been able to say it. Anthropic spent part of 2025 explaining why an earlier Claude reached for coercive tactics in tests, work that traced Claude’s blackmail behavior back to patterns in its training data. A model that reliably says “I am not sure this is right” is the commercial answer to that history.
What Else Anthropic Shipped on Thursday
The model did not arrive alone. Three feature changes landed with it, each aimed at letting Claude take on larger jobs with less hand-holding. You can read the full breakdown in Anthropic’s Claude Opus 4.8 release notes.
- Dynamic workflows. Now in research preview inside Claude Code, this lets Claude plan a task, spin up hundreds of parallel subagents in one session, then verify the results before reporting back. Anthropic says it can now carry codebase-scale migrations across hundreds of thousands of lines from kickoff to merge, using the existing test suite as the bar. It is limited to Enterprise, Team, and Max plans.
- Effort control. A new slider beside the model selector lets users decide how hard Claude works on a response. Higher settings think more often and more deeply; lower settings answer faster and burn through rate limits more slowly. The control is on every plan.
- Mid-task system entries. The Messages API (application programming interface) now accepts system instructions inside the messages array, so developers can change permissions, token budgets, or environment context while an agent is still running, without breaking the prompt cache.
Fast mode also got cheaper. The model can run at 2.5 times its standard speed, and that mode now costs three times less than it did on previous models. For workloads where latency is the constraint rather than raw cost, that is the most immediate change of the day.
Pricing Held Flat While Fast Mode Got Cheaper
Standard pricing did not move. Opus 4.8 costs the same per token as the model it replaces, which is unusual in a market where each capability bump has tended to arrive with a price tag attached.
The table below sets the two tiers side by side. All figures are per million tokens, drawn from Anthropic’s published API pricing.
| Tier | Input (per 1M tokens) | Output (per 1M tokens) | Speed |
|---|---|---|---|
| Opus 4.8 standard | $5 | $25 | Baseline |
| Opus 4.8 fast mode | $10 | $50 | 2.5x baseline |
There is a second cost story underneath the headline rates. Opus 4.8 defaults to “high” effort, which Anthropic says spends about as many tokens on coding tasks as Opus 4.7’s default did, but gets more done with them. Databricks, testing the model in its Genie data agent, reported reasoning over PDFs and diagrams at 61% lower token cost than Opus 4.7. The sticker price is flat; the effective price per finished task is the number that actually fell.
Where Opus 4.8 Lands Against GPT-5.5
Anthropic frames Opus 4.8 as competitive with or ahead of OpenAI’s GPT-5.5 across coding, agentic skills, reasoning, and knowledge work. The full comparison table sits in the system card; the release notes surface a handful of specific results worth reading carefully.
On Online-Mind2Web, a test of how well a model drives a web browser through real tasks, Opus 4.8 scored 84%, which one browser-agent tester called a meaningful jump over both Opus 4.7 and GPT-5.5. On an internal Super-Agent benchmark, a testing partner said Opus 4.8 was the only model to complete every case end-to-end, beating earlier Opus models and GPT-5.5 at the same cost. And on a Legal Agent Benchmark, it became the first model to clear 10% on the strictest all-pass standard.
Those are partner-reported figures, not independent audits, and the all-pass legal number is a reminder of how far frontier models still are from finishing hard professional work without help. A 10% pass rate is a lead in its category and a long way from done.
The reliability story ties back to where the money is moving. Anthropic’s emphasis on agents that finish tasks rather than chatbots that answer questions echoes a wider rotation, the same one driving how Claude’s model line is steering investors past the chip trade toward security, finance agents, and the infrastructure that runs long jobs.
The Mythos Model Anthropic Won’t Release Yet
The most telling line in Thursday’s announcement was about a model that is not for sale. Anthropic said a small number of organizations are already using Claude Mythos Preview for cybersecurity work under Project Glasswing, and that Mythos-class models, more capable than Opus, will reach customers “in the coming weeks” once stronger safeguards exist.
A 10,000-Vulnerability Haul in One Month
Project Glasswing launched in April 2026 with roughly 50 partners, a roster that includes Amazon Web Services, Apple, Google, Microsoft, NVIDIA, and JPMorganChase. Mythos powers it. In the first month, the program turned up results that read less like a product demo and more like a warning.
- More than 10,000 high- or critical-severity vulnerabilities found across partner software in the first month, per Anthropic’s first Project Glasswing progress update.
- 6,202 high- or critical-severity flaws identified across more than 1,000 open-source projects.
- 90.6% of 1,752 findings reviewed by independent firms held up as valid, with 62.4% confirmed high or critical.
Individual partners posted eye-catching numbers too. Cloudflare reported 2,000 bugs, 400 of them high or critical. Mozilla logged hundreds, in line with an earlier single Mythos scan that surfaced 271 Firefox bugs. Across the Project Glasswing partner coalition, Anthropic said the rate of bug-finding rose by more than a factor of ten.
Why the Safeguards Aren’t Ready
The same capability that finds 10,000 flaws can write the exploits for them. That is why Anthropic is not selling Mythos to everyone. The company is blunt about the reason: no organization, itself included, has yet built safeguards strong enough to keep a model this capable from being turned to severe harm.
Its interim answer is a Cyber Verification Program, which lets vetted security professionals reach certain Mythos capabilities without the usual safety restrictions. Everyone else waits. So the company is shipping the honesty and reliability gains of Opus 4.8 to the whole market while holding its sharpest tool behind a gate.
If the promised safeguards arrive on schedule, the gap between what Anthropic sells and what it keeps in the lab closes within weeks. If they slip, Opus 4.8 stays the most capable model most customers can actually buy.
Frequently Asked Questions
How much does Claude Opus 4.8 cost?
Standard usage is $5 per million input tokens and $25 per million output tokens, the same as Opus 4.7. Fast mode, which runs at 2.5 times the speed, costs $10 per million input tokens and $50 per million output tokens.
How is Opus 4.8 different from Opus 4.7?
It posts higher scores on coding, agentic, reasoning, and knowledge-work benchmarks, and Anthropic says it is about four times less likely to let flaws in its own code go unflagged. The price is unchanged, and its alignment metrics are better than Opus 4.7.
What does the new effort control do?
It lets users choose how much work Claude puts into a response. Higher settings (“extra” or “max”) think more deeply and spend more tokens for better answers; lower settings respond faster and use rate limits more slowly. The control is available on all plans, with high as the default.
What are dynamic workflows in Claude Code?
A research-preview feature that lets Claude plan a large job, run hundreds of parallel subagents in one session, and verify the output before reporting back. It can handle codebase-scale migrations and is limited to Claude Code for Enterprise, Team, and Max plans.
Can developers access Opus 4.8 through the API?
Yes. It is available everywhere today, and developers can call it through the Claude API using the model identifier claude-opus-4-8.
What is Claude Mythos Preview?
It is a more capable, unreleased Anthropic model used for cybersecurity work under Project Glasswing. Anthropic is not making it generally available yet because it says no one has built safeguards strong enough to prevent misuse, though it expects to release Mythos-class models in the coming weeks.
AI
Asos AI Stylist Sends Shoppers to Competitors When Inventory Falls Short
Asos launched Stylist in ChatGPT this month, a shopping assistant that surfaces fashion picks and video content for UK and US customers. The app runs on Bambuser’s video commerce platform and turns Asos’s product library into machine-readable data that ChatGPT can retrieve and return as shoppable videos. Shoppers ask for outfit ideas, browse by occasion or trend, and click through to buy on Asos. The pitch is frictionless discovery inside an environment where 17 million people already spend time. The execution reveals a structural problem no one designing AI commerce tools wants to admit: the AI will always try to be genuinely helpful, and genuine helpfulness does not respect single-retailer distribution.
Steve Webster, an e-commerce executive whose career includes stints at Barbour and Liwa Trading Enterprises, tested Stylist and documented the failure mode in a LinkedIn post. He asked the app to build a smart casual wardrobe for a middle-aged man. Stylist returned a Mango blazer, Jack & Jones shirts, Thomas Crick Evers trainers. Competent enough. Then it reached fragrance. The AI recommended Tom Ford Oud Wood, a luxury scent that fits the brief perfectly. It also linked directly to tomfordbeauty.co.uk, a competitor Asos does not stock. Within three minutes, the AI stylist had sent a paying customer to another retailer.
What Bambuser’s Intelligence Layer Actually Does
Bambuser’s new Intelligence Layer converts Asos’s video library and product catalogue into structured data that large language models can process and retrieve in real time. The system ingests product metadata, video timestamps, styling context, and availability flags, then packages everything so ChatGPT can return shoppable video clips alongside product cards. The technical architecture is sound. A human stylist working exclusively for Asos would navigate the Tom Ford gap by suggesting a different fragrance the retailer actually carries. ChatGPT does not operate within those constraints.
The AI optimizes for the stated need, not for the conversion rate. When a customer asks for a specific product category and the retailer’s catalogue does not contain the best answer, the model fills the gap with whatever it knows. ChatGPT knows Tom Ford Oud Wood is the right fragrance for a mature man building a smart casual wardrobe. It also knows Asos does not stock it. So it linked out, because the right answer for the customer is not always the commercially convenient answer for the retailer.
The Attribution Problem No One Is Measuring
Webster raised a question that will prove uncomfortable for every retailer building on top of third-party intelligence layers: for every session that produces a shoppable Asos purchase, how many produce a visit to Tom Ford, Hermès, or wherever else the AI concluded was the better answer? The channel looks like distribution. In some sessions it probably is. In others, Asos is funding a sophisticated referral engine for everyone else.
No public data exists on how often AI shopping assistants route customers to competitors. The metric is not tracked in standard analytics dashboards, and the platforms hosting these tools have no commercial incentive to surface it. A session that ends with a click to tomfordbeauty.co.uk still counts as engagement. Whether that engagement converts into revenue for the retailer who paid to build the experience is a different question.
The structural tension is this: AI models are trained to be helpful across the entire internet, not helpful within the boundaries of a single retailer’s inventory. When you build a commerce experience on top of someone else’s intelligence layer, you inherit that layer’s associations, its breadth of knowledge, and its definition of a good answer. A good answer for the customer is not always a good answer for the retailer.
Why Human Stylists Do Not Have This Problem
A human stylist working for Asos would never recommend a product the retailer does not carry. The constraint is built into the job. If a customer asks for Tom Ford Oud Wood, the stylist pivots to a fragrance Asos stocks that shares similar notes or positioning. The recommendation is still helpful, but it operates within commercial boundaries the stylist understands implicitly.
ChatGPT does not understand those boundaries because it was not trained to respect them. The model’s objective is to provide accurate, useful information. When the most accurate answer involves a product outside the retailer’s catalogue, the model provides it. The fact that this behavior undermines the retailer’s business model is not a bug the AI recognizes. It is a feature of how the system was designed.
Webster’s observation is not a criticism of the ambition. Placing a brand inside an agentic platform where millions of customers already spend time is a reasonable strategic bet. The Bambuser integration is technically well conceived. The problem is structural, not executional. The AI will always try to be genuinely helpful, and genuine helpfulness and single-retailer distribution are not the same objective.
What This Means for AI-Mediated Commerce
The Asos case is not an outlier. Every retailer building shopping experiences on top of third-party AI platforms will face the same tension. The more helpful the AI becomes, the more likely it is to recommend products the retailer does not carry. The less helpful it becomes, the less reason customers have to use it.
One solution is to constrain the AI’s knowledge base to only the retailer’s inventory. This eliminates the competitor-referral problem but introduces a new one: the AI can no longer answer questions about products the retailer does not stock, which makes it less useful than a standard search bar. Another solution is to accept that some sessions will route customers elsewhere and treat the AI as a top-of-funnel awareness tool rather than a direct conversion channel. This requires a different attribution model and a willingness to fund discovery that does not always convert.
A third option is to build the intelligence layer in-house, training a model specifically on the retailer’s catalogue and styling philosophy. This is expensive, time-consuming, and requires machine learning expertise most retailers do not have. It also does not solve the underlying problem: customers will still ask for products the retailer does not carry, and the AI will still need to decide whether to admit the gap or pretend it does not exist.
The Uncomfortable Question Retailers Are Not Asking
The fundamental question is whether AI-mediated commerce serves the retailer or the customer. If the objective is to maximize customer satisfaction, the AI should recommend the best product regardless of who sells it. If the objective is to maximize retailer revenue, the AI should recommend only products the retailer carries, even when better options exist elsewhere. These objectives are not compatible.
Most retailers building AI shopping assistants have not decided which objective they are optimizing for. The assumption is that the two objectives align, that helping customers find what they want will naturally drive revenue to the retailer. The Asos case proves that assumption is false. The AI helped the customer find the right fragrance. It just did not help Asos make the sale.
Webster’s post has not prompted a public response from Asos or Bambuser. The silence is telling. The problem he identified is not unique to Asos, and it is not a problem any retailer has figured out how to solve. The AI will always try to be genuinely helpful. Genuine helpfulness and single-retailer distribution are not the same objective. Until retailers decide which one they are optimizing for, every AI shopping assistant will face the same structural tension.
Frequently Asked Questions
What is Bambuser’s Intelligence Layer?
Bambuser’s Intelligence Layer is a capability that converts a retailer’s product catalogue and video library into structured, machine-readable data that large language models can process and retrieve in real time. It allows AI platforms like ChatGPT to return shoppable video content and product recommendations based on customer queries.
Why did Asos’s AI stylist recommend a competitor’s product?
ChatGPT recommended Tom Ford Oud Wood because it is the most accurate answer to the customer’s request for a fragrance suitable for a smart casual wardrobe. The AI does not restrict its recommendations to products Asos carries; it optimizes for the best answer based on its training data, which spans the entire internet.
Can retailers prevent AI assistants from linking to competitors?
Retailers can constrain the AI’s knowledge base to only their own inventory, but this makes the assistant less useful when customers ask about products the retailer does not stock. Another option is to build a proprietary AI trained exclusively on the retailer’s catalogue, though this requires significant machine learning expertise and investment.
How common is this problem among AI shopping assistants?
No public data tracks how often AI shopping assistants route customers to competitors. The metric is not part of standard analytics dashboards, and platforms hosting these tools have no commercial incentive to surface it. The Asos case suggests the problem is structural and likely affects every retailer using third-party AI platforms.
What is the best way for retailers to use AI shopping assistants?
Retailers must decide whether they are optimizing for customer satisfaction or conversion rate. If the goal is awareness and discovery, accepting that some sessions will route customers elsewhere may be acceptable. If the goal is direct revenue, constraining the AI to the retailer’s inventory is necessary, even if it reduces the assistant’s usefulness.
AI
AI Chiefs Walk Back Job Apocalypse Warnings as IPO Pressure Mounts
Jensen Huang called it lazy. Sam Altman called it wrong. Dario Amodei softened the math to 90 percent automation with 10 percent human productivity gains. The three most-quoted voices in artificial intelligence spent the past month walking back the job apocalypse they spent two years selling, and the timing is anything but coincidental.
Speaking to Channel News Asia on Monday, Nvidia’s chief executive took direct aim at fellow executives who have publicly blamed AI for workforce reductions. “The narrative that connects AI to job loss, for many of the CEOs that are doing it, it is just too lazy,” Huang said. “AI has just arrived. How is it possible they’re already losing jobs?”
The Reversal Arrives as IPO Windows Open
Huang’s comments follow a pattern. OpenAI CEO Sam Altman told the Commonwealth Bank of Australia’s Accelerate AI Conference in Sydney last week that he “thought there would have been more impact on entry-level white-collar jobs being eliminated by now than has actually happened.” Anthropic boss Dario Amodei, long criticized as an AI doomer by peers including Huang, recently predicted that even if 90 percent of jobs are automated, the remaining 10 percent would be handled by vastly more productive human workers.
The reversals from Altman and Amodei come as their companies, OpenAI and Anthropic, are expected to embark on high-profile initial public offerings that will require broad buy-in from investors to succeed. Public sentiment toward AI has soured in recent polling, particularly in the United States, where voters voice serious discontent over the disruption that tech companies and political leaders predict from the technology.
Huang pushed back against the doom-and-gloom forecasts directly. “How is it possible that AI became productive and useful only six months ago, and they were somehow laying people off two years ago because of AI? It doesn’t make any sense,” he said. “It was just a way for them to sound smart, and I really hate that. I think we’re scaring people and that’s irresponsible.”
Corporate Layoffs Cite AI, Data Shows Otherwise
The disconnect between executive rhetoric and actual AI deployment is stark. British bank Standard Chartered announced plans last week to axe thousands of jobs by 2030 as artificial intelligence replaces employees in administrative roles. Snapchat parent company Snap cut 1,000 jobs last month, citing AI-driven efficiency gains as it pushes toward profitability.
Huang’s argument is that the timeline does not add up. AI tools capable of replacing white-collar workers at scale became widely available in late 2022 with the launch of ChatGPT, yet corporate layoffs citing automation began well before that. The narrative, Huang suggests, was a convenient cover for cost-cutting decisions driven by other factors, including over-hiring during the pandemic and rising interest rates that made growth-at-all-costs strategies untenable.
| Executive | Company | Earlier Position | Current Position |
|---|---|---|---|
| Sam Altman | OpenAI | Predicted significant entry-level job displacement | “My intuitions were just off” on job impact timing |
| Dario Amodei | Anthropic | Warned of broad automation risks | 90% automation offset by 10% hyper-productive humans |
| Jensen Huang | Nvidia | Argued AI creates as many jobs as it displaces | Blames executives for “lazy” AI-job-loss narrative |
Federal Reserve Warns the Disruption May Still Be Ahead
Not everyone is convinced the threat has passed. Federal Reserve Governor Lisa Cook warned on Wednesday that the full effects of AI on employment may still be ahead. “We could be approaching the most significant reorganization of work in generations,” she said in a speech at Stanford University, adding that AI-related job losses could precede any gains, even if the overall long-run picture remains positive.
Most economic institutions, including the European Central Bank, say that artificial intelligence has had only minor effects on employment so far. The gap between executive predictions and measurable labor-market impact has widened over the past 18 months, fueling skepticism about whether AI will deliver the productivity revolution its backers promise or the job displacement its critics fear.
The Timing Problem
Cook’s warning highlights a timing problem that Huang’s critique does not fully address. If AI tools are only now becoming capable of replacing knowledge workers at scale, the disruption those tools cause may not show up in employment data for another 12 to 24 months. Corporate adoption cycles are slow, and the integration of AI into workflows that genuinely displace workers, rather than augment them, is still in early stages.
The Productivity Paradox
The productivity gains AI is supposed to deliver have not yet materialized in aggregate economic data. Labor productivity growth in the United States has been modest since 2023, despite widespread deployment of generative AI tools in white-collar settings. The disconnect between hype and measurable output mirrors earlier technology waves, including the internet boom of the late 1990s, which took years to translate into productivity statistics.
Public Sentiment Turns Against AI Hype
The reversals from Altman, Amodei, and Huang’s criticism of peers arrive as public opinion on AI shifts. Polling conducted in the United States over the past six months shows growing skepticism about AI’s benefits and rising concern about its risks, particularly around job displacement and misinformation. The backlash has been sharpest among younger workers, who were initially the most enthusiastic adopters of AI tools.
The shift in sentiment poses a challenge for OpenAI and Anthropic as they prepare for public offerings. Investors will weigh not only the companies’ revenue growth and technical capabilities but also the regulatory and reputational risks that come with being the public face of a technology that large segments of the population view with suspicion.
- Regulatory pressure is mounting. Lawmakers in the United States and European Union are drafting legislation that would impose disclosure requirements, liability standards, and safety testing on AI systems, particularly those used in hiring, lending, and law enforcement.
- Corporate customers are slowing adoption. Enterprise buyers, initially eager to deploy AI tools, are now conducting longer pilot programs and demanding clearer return-on-investment metrics before committing to large-scale rollouts.
- Talent retention is becoming harder. AI researchers and engineers, once drawn to the mission-driven rhetoric of companies like OpenAI and Anthropic, are increasingly skeptical of leadership claims and are leaving for competitors or starting their own ventures.
What the Data Actually Shows
Employment data from the U.S. Bureau of Labor Statistics shows that job losses in sectors most exposed to AI, including customer service, data entry, and basic coding, have been modest. The unemployment rate for workers in computer and mathematical occupations stood at 2.1 percent in April 2026, down from 2.3 percent a year earlier. Administrative support roles, another category frequently cited as vulnerable to AI displacement, saw employment grow by 1.2 percent over the same period.
The disconnect between executive warnings and labor-market outcomes suggests that either the technology is not yet capable of the displacement its backers predicted, or that companies are slower to adopt it than the hype cycle implied. Huang’s argument leans toward the latter, suggesting that executives used AI as a convenient narrative to justify layoffs driven by other factors.
It was just a way for them to sound smart, and I really hate that. I think we’re scaring people and that’s irresponsible.
Huang’s comment, delivered in an interview with Channel News Asia, was unusually blunt for a CEO whose company supplies the chips that power AI systems. Nvidia has been the primary beneficiary of the AI boom, with its data center revenue growing 427 percent year-over-year in fiscal 2025. Huang’s willingness to criticize the job-loss narrative suggests he views the backlash as a threat to the broader AI ecosystem, not just to individual companies.
The IPO Calculus for OpenAI and Anthropic
OpenAI and Anthropic face a delicate balancing act as they prepare for public offerings. Both companies have raised billions in private funding at valuations that assume continued rapid growth in AI adoption. OpenAI was last valued at $157 billion in a funding round led by SoftBank in January 2026. Anthropic raised $7.3 billion in a Series D round in March 2026, valuing the company at $60 billion.
Public investors will scrutinize not only the companies’ financials but also their exposure to regulatory risk, reputational risk, and the sustainability of their growth trajectories. The job-loss narrative, which both companies’ leaders helped amplify in earlier years, now complicates that pitch. If AI does not displace workers at the scale predicted, the addressable market for enterprise AI tools may be smaller than investors assumed. If it does, the regulatory and public backlash could constrain the companies’ ability to operate.
Revenue Growth vs. Profitability
OpenAI reported $3.7 billion in annualized revenue as of December 2025, driven primarily by subscriptions to ChatGPT Plus and enterprise API contracts. The company remains unprofitable, with operating losses estimated at $5 billion in 2025 due to the high cost of training and running large language models. Anthropic’s revenue is smaller, estimated at $1.2 billion annualized as of March 2026, with similar profitability challenges.
Competitive Pressure from Open-Source Models
Both companies face growing competition from open-source models, including Meta’s Llama 4 and Mistral AI’s latest releases, which offer comparable performance at a fraction of the cost. The open-source threat is particularly acute in enterprise markets, where customers are increasingly reluctant to lock themselves into proprietary platforms.
Huang’s Long-Standing Position on AI and Jobs
Huang has consistently argued that AI will create as many jobs as it displaces, a position that puts him at odds with some of his peers. In a 2024 interview, he predicted that AI would enable new categories of work, including roles focused on training, auditing, and managing AI systems. He has also argued that AI will make existing workers more productive, allowing companies to grow without proportionally increasing headcount.
The Nvidia CEO’s criticism of executives who blame AI for layoffs is consistent with that view. If AI is a productivity tool rather than a replacement for workers, then layoffs attributed to AI are either premature or disingenuous. Huang’s comments suggest he believes the latter, and that the narrative has done more harm than good by fueling public fear and regulatory scrutiny.
The reckoning Huang describes is not just for the executives who used AI as cover for cost-cutting. It is also for the AI industry itself, which must now convince a skeptical public and wary investors that the technology’s benefits outweigh its risks. The reversals from Altman and Amodei, and Huang’s blunt criticism, signal that the industry recognizes the problem. Whether the course correction comes in time to salvage public trust, and the IPO valuations that depend on it, remains an open question.
Disclaimer: This article is for informational purposes only and does not constitute investment advice. The views expressed are those of the sources cited and do not reflect the opinions of Oton Technology. Readers considering investments in AI companies should consult a qualified financial advisor. Figures are accurate as of publication.
-
CRYPTO3 weeks agoAndreessen Horowitz Bets $2.2B on Crypto’s Quiet Cycle
-
CRYPTO3 weeks agoCathie Wood Calls SpaceX IPO Demand ‘Voracious’ Ahead Of $1.75T Debut
-
NEWS3 weeks agoGhana CSA Plants Office In Ho As Volta Cybercrime Climbs
-
APPS3 weeks agoGoogle’s Buried Page Reveals 500 Niche Websites Still Making Cash
-
NEWS3 weeks agoHormuud Bets $19 Down Will Finally Pull Somalia Online
-
NEWS3 weeks agoApple Strikes Preliminary Deal For Intel To Make iPhone And Mac Chips
-
NEWS3 weeks agoMetalenz Polar ID Hides Face Unlock Under OLED Smartphone Screens
-
AI3 weeks agoGoogle AI Overviews Adds Subscribed Label, Reddit Quotes Inline
