NVIDIA Vera CPU Takes Direct Aim at the x86 Server Moat

The NVIDIA Vera CPU, the company’s first data-center processor designed for agentic AI, reached its first customers on May 17, when the earliest units arrived at Anthropic, OpenAI and Elon Musk’s xAI. It packs 88 custom Olympus cores, runs the Armv9.2 instruction set, and NVIDIA says it delivers up to 1.8 times the agentic workload throughput of a current x86 server chip.

That positioning is a shot across the bow of Intel and AMD. For two decades the data-center processor was an x86 business, and NVIDIA was the accelerator company that plugged into it. With Vera, NVIDIA is selling the host processor too, and it is arriving while x86’s grip on the server room is already loosening.

Why the CPU Climbed Onto the AI Critical Path

For most of the deep-learning era, the CPU’s job in an AI server was to feed the GPU and stay out of the way. Agentic systems changed that. When a model stops answering questions and starts taking actions, much of the work happens off the GPU.

An AI agent that writes code, runs it, checks the output and tries again is leaning on the processor at every turn. Sandboxed code execution, data retrieval, results checking and orchestration all run on CPU cores. In its official Vera CPU launch announcement, NVIDIA describes the result as a tight loop:

  1. A prompt, from a user or a previous step, triggers generation on the GPU.
  2. The GPU writes the tool call, for example a command to compile and run a program.
  3. The CPU executes that call in a sandbox and hands the result back.
  4. The GPU reads the outcome and generates the next reasoning step.
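
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's software. `fake_model` is a hypothetical, scripted stand-in for GPU-side generation, and `run_in_sandbox` uses a plain child process, which isolates far less than a production sandbox would. The point is only to show where the CPU-side work sits in the loop.

```python
import subprocess
import sys

# Hypothetical scripted "model": the first attempt contains a bug, the second fixes it.
SCRIPTED_ATTEMPTS = [
    "print(sum(range(1, 11)) / 0)",  # raises ZeroDivisionError
    "print(sum(range(1, 11)))",      # succeeds
]

def fake_model(history):
    """Stand-in for GPU generation: return the next tool call, or None when done."""
    if history and history[-1]["ok"]:
        return None  # the last run succeeded, so the agent stops
    step = len(history)
    return SCRIPTED_ATTEMPTS[step] if step < len(SCRIPTED_ATTEMPTS) else None

def run_in_sandbox(code, timeout_s=5.0):
    """CPU-side step: run the tool call in a child process and capture the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout.strip(),
        "stderr": proc.stderr.strip(),
    }

def agent_loop(max_steps=4):
    """Alternate generation (model) and execution (CPU) until success or step limit."""
    history = []
    for _ in range(max_steps):
        call = fake_model(history)             # steps 1-2: generate the tool call
        if call is None:
            break
        history.append(run_in_sandbox(call))   # step 3: execute on the CPU
        # step 4: the result in `history` feeds the next call to fake_model
    return history

if __name__ == "__main__":
    for i, result in enumerate(agent_loop(), start=1):
        print(i, result["ok"], result["stdout"])
```

Each pass through the loop adds one child-process launch on the CPU, which is the cost the article says grows with every extra agent step.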

Every extra step an agent takes adds CPU time, and reinforcement learning (RL, the training method where models improve by acting in an environment and scoring the result) multiplies those steps across thousands of parallel runs. “The CPU is no longer simply supporting the model; it’s driving it,” Jensen Huang, NVIDIA’s chief executive, said at the launch. That is the wedge NVIDIA is using to sell a new kind of server chip.

Inside the Olympus Core’s Numbers

The heart of Vera is the Olympus core, a custom design NVIDIA built rather than licensed off the shelf. There are 88 of them per chip, each compatible with Arm’s Armv9.2 instruction set, and each running two tasks at once through a feature NVIDIA calls Spatial Multithreading.

The Branch Predictor and Wide Front End

Agentic code is branch-heavy: lots of conditional logic, scripting engines and deep software stacks like PyTorch. NVIDIA says Olympus delivers up to 50% higher instructions per cycle (IPC, a measure of how much work a core does each clock tick) than its previous Grace CPU. It gets there with a 10-wide decode unit, a deep out-of-order engine and a neural branch predictor that can sustain two taken branches per cycle with no penalty.

Feeding the Cores: Memory and Fabric

Fast cores stall without data. Vera pairs the Olympus cores with up to 1.2 TB/s of LPDDR5X memory bandwidth (a low-power memory standard borrowed from mobile devices), sustaining more than 90% of peak under load and cutting peak memory latency by about 40% versus x86. A second-generation Scalable Coherency Fabric links the cores with 50% faster core-to-core data movement, and a dedicated graph prefetcher gives Vera more than three times the graph-traversal performance of an x86 part. Add it up and, per NVIDIA’s technical breakdown of the Olympus core, the chip claims 1.8 times the agentic sandbox throughput of a comparable x86 design.

| Metric | NVIDIA Vera (Olympus) | Reference point |
|---|---|---|
| Cores | 88, Armv9.2 | Custom NVIDIA design |
| Per-core IPC | Up to 50% higher | vs NVIDIA Grace |
| Memory bandwidth | Up to 1.2 TB/s LPDDR5X | Sustains 90%+ under load |
| Peak memory latency | About 40% lower | vs x86 |
| Graph traversal | More than 3x | vs x86 |
| Agentic sandbox throughput | More than 1.8x | vs x86 |
| Memory power | Under 30W | vs 100W+ DDR5 |
| Configurable TDP | 250W to 450W | Thermal design power |
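
To put the bandwidth figure in per-core terms, the quoted numbers can be divided out. The sketch below is back-of-the-envelope arithmetic on NVIDIA's claimed figures, not a measurement. It assumes 1 TB = 1000 GB, treats "more than 90%" as a 90% floor, and assumes bandwidth is split evenly across cores, which real workloads will not do.

```python
# Figures quoted in the article (NVIDIA's claims, not independent measurements).
PEAK_BANDWIDTH_GBPS = 1200.0  # "up to 1.2 TB/s", assuming 1 TB = 1000 GB
SUSTAINED_FLOOR = 0.90        # "more than 90% of peak", used here as a lower bound
CORES = 88
THREADS_PER_CORE = 2          # Spatial Multithreading: two tasks per core

def bandwidth_share(total_gbps: float, consumers: int) -> float:
    """Even split of bandwidth across cores or threads (a simplifying assumption)."""
    return total_gbps / consumers

sustained_gbps = PEAK_BANDWIDTH_GBPS * SUSTAINED_FLOOR
per_core_gbps = bandwidth_share(sustained_gbps, CORES)
per_thread_gbps = bandwidth_share(sustained_gbps, CORES * THREADS_PER_CORE)

print(f"sustained: {sustained_gbps:.0f} GB/s")
print(f"per core: {per_core_gbps:.1f} GB/s")
print(f"per thread: {per_thread_gbps:.1f} GB/s")
```

Under those assumptions, each core has roughly 12 GB/s of sustained bandwidth available, or about 6 GB/s per hardware thread when both threads are busy.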

Where x86 Stands as Vera Lands

Vera is not arriving into a healthy x86 monopoly. It is landing in a market already shifting toward AMD and Arm. AMD’s share of x86 server CPU revenue hit a record 46.2% in the first quarter of 2026, up from 41.3% three months earlier, according to Mercury Research, with Intel holding the remaining 53.8%. That extends a long-run trend in which x86 server share has slid away from Intel year after year.

Arm-based server chips are climbing faster still. Arm’s share of server CPU units reached 17.7% in early 2026, up from 11.5% a year before, and analysts project it could reach 40% to 45% by 2030. Vera is an Arm design with the most aggressive backer in the industry behind it, which is why the launch reads less like a product update and more like a structural challenge to the firms that have owned the socket. It also fits NVIDIA’s expanding web of AI infrastructure bets.

| Server CPU segment | Q1 2026 | Prior period |
|---|---|---|
| AMD, x86 server revenue | 46.2% | 41.3% (Q4 2025) |
| Intel, x86 server revenue | 53.8% | Higher |
| Arm, server unit share | 17.7% | 11.5% (a year earlier) |
| Arm, projected share | 40% to 45% by 2030 | Analyst estimate |
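
The share figures imply a few numbers the article does not spell out, and a short calculation makes them explicit. The sketch rests on two assumptions. First, AMD and Intel split x86 server revenue between them, as the article's "remaining 53.8%" suggests. Second, the Arm projection runs over four years, from early 2026 to 2030. The function names are illustrative only.

```python
def point_change(now: float, before: float) -> float:
    """Change in share, in percentage points."""
    return now - before

def implied_remainder(share: float) -> float:
    """The other vendor's share, assuming a two-vendor split of 100%."""
    return 100.0 - share

def required_annual_growth(start: float, target: float, years: int) -> float:
    """Compound annual growth rate in share needed to move from start to target."""
    return (target / start) ** (1.0 / years) - 1.0

amd_gain = point_change(46.2, 41.3)       # AMD, Q4 2025 to Q1 2026
intel_prior = implied_remainder(41.3)     # Intel's implied Q4 2025 share
arm_gain = point_change(17.7, 11.5)       # Arm units, year over year
YEARS_TO_2030 = 4                         # assumption: early 2026 to 2030
arm_growth_low = required_annual_growth(17.7, 40.0, YEARS_TO_2030)
arm_growth_high = required_annual_growth(17.7, 45.0, YEARS_TO_2030)

print(f"AMD gain: {amd_gain:.1f} points in one quarter")
print(f"Intel implied prior share: {intel_prior:.1f}%")
print(f"Arm gain: {arm_gain:.1f} points in one year")
print(f"Arm growth needed: {arm_growth_low:.1%} to {arm_growth_high:.1%} per year")
```

On those assumptions, Intel's prior-quarter share works out to about 58.7%. Arm's unit share would need to compound at somewhat more than 20% a year to reach the analysts' range, which is slower than the relative growth it posted over the past year.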

The First Chips Already Reached the Labs

The clearest sign NVIDIA means business is that Vera is already out the door. The first production units landed at Anthropic, OpenAI and xAI on May 17, with Oracle Cloud Infrastructure (OCI, Oracle’s cloud arm) receiving its delivery three days later, according to NVIDIA’s account of the first Vera deliveries.

The customer roster goes well beyond those four. NVIDIA lists hyperscalers, independent cloud providers and the major server builders among the companies adopting Vera:

  • Hyperscalers: Alibaba Cloud, ByteDance, Meta and Oracle Cloud Infrastructure
  • Cloud providers: CoreWeave, Lambda, Nebius, Together.AI, Cloudflare and Crusoe
  • System builders: Dell Technologies, HPE, Lenovo and Supermicro

OCI alone plans to deploy hundreds of thousands of the chips beginning this year. Anthropic, the Claude developer that recently closed financing at a $965 billion valuation, is testing Vera as part of its compute stack, and coding-tool maker Cursor says it expects faster agents from the switch.

“Agentic AI is creating a new CPU moment in the AI factory. As models move from answering to acting, Vera is purpose-built to keep that work moving at scale.”

That was Ian Buck, NVIDIA’s vice president of hyperscale and high-performance computing, framing the pitch the company is making to every lab now budgeting for agent fleets. James Bradbury, head of compute at Anthropic, was more measured, calling Vera “a promising part of the ecosystem” rather than a settled choice.

From Cores per Dollar to Tokens per Dollar

The deeper argument NVIDIA is making is about the metric that matters. For a decade the data-center CPU was sold on cores per dollar, a logic tuned to renting virtual machines. NVIDIA wants buyers thinking in tokens per dollar instead: how much AI output a rack produces per watt and per dollar of power.
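
One way to make the tokens-per-dollar framing concrete is to write it as a formula. The function below is an illustrative sketch, not NVIDIA's accounting. Every input in the example call is a made-up placeholder chosen for round numbers, and the cost model (energy plus a flat amortized hourly charge) is an assumption.

```python
def tokens_per_dollar(
    tokens_per_second: float,
    rack_power_kw: float,
    price_per_kwh: float,
    amortized_dollars_per_hour: float = 0.0,
) -> float:
    """Tokens produced per dollar spent, over one hour of operation.

    Cost model (an assumption): energy cost plus a flat amortized hardware charge.
    """
    tokens_per_hour = tokens_per_second * 3600
    energy_dollars_per_hour = rack_power_kw * price_per_kwh
    total_dollars_per_hour = energy_dollars_per_hour + amortized_dollars_per_hour
    return tokens_per_hour / total_dollars_per_hour

# Placeholder inputs only; none of these values come from NVIDIA or the article.
example = tokens_per_dollar(
    tokens_per_second=500_000,
    rack_power_kw=100,
    price_per_kwh=0.10,
    amortized_dollars_per_hour=40,
)
print(f"{example:,.0f} tokens per dollar")
```

The structure shows why the metric favors efficiency. Lowering rack power shrinks the denominator, and raising throughput grows the numerator, so both improve the figure even if the core count stays the same.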

Efficiency sits at the center of that case. Vera leans on low-power LPDDR5X memory packaged on SOCAMM modules, which NVIDIA says draws far less power than conventional server memory, freeing budget for compute and cooling.

  • Under 30 watts for the LPDDR5X memory subsystem, against 100 watts or more for DDR5 server memory
  • 256 liquid-cooled Vera CPUs per rack, sustaining more than 22,500 concurrent CPU environments
  • 250W to 450W configurable thermal design power per chip
  • 5.5x lower latency on Apache Kafka-compatible workloads in a test by streaming-data firm Redpanda
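
The rack-level figures in the list above can be cross-checked against the per-chip numbers. The sketch below uses only values quoted in the article. It assumes the memory-power figures are per CPU, which the article does not state, and it treats "under 30 watts" and "100 watts or more" as bounds, so the saving it computes is a minimum.

```python
CPUS_PER_RACK = 256
CORES_PER_CPU = 88
THREADS_PER_CORE = 2
CLAIMED_ENVIRONMENTS = 22_500        # "more than 22,500 concurrent CPU environments"

LPDDR5X_WATTS_MAX = 30               # "under 30 watts" (assumed to be per CPU)
DDR5_WATTS_MIN = 100                 # "100 watts or more" (assumed to be per CPU)
TDP_RANGE_WATTS = (250, 450)         # configurable thermal design power per chip

cores_per_rack = CPUS_PER_RACK * CORES_PER_CPU
threads_per_rack = cores_per_rack * THREADS_PER_CORE
environments_per_core = CLAIMED_ENVIRONMENTS / cores_per_rack

# Lower bound on memory power saved per rack versus DDR5.
min_memory_saving_watts = (DDR5_WATTS_MIN - LPDDR5X_WATTS_MAX) * CPUS_PER_RACK

# CPU power envelope per rack at the two ends of the configurable TDP range.
rack_tdp_watts = tuple(tdp * CPUS_PER_RACK for tdp in TDP_RANGE_WATTS)

print(f"cores per rack: {cores_per_rack:,}")
print(f"hardware threads per rack: {threads_per_rack:,}")
print(f"environments per core: {environments_per_core:.3f}")
print(f"memory power saved per rack: at least {min_memory_saving_watts / 1000:.1f} kW")
print(f"CPU TDP per rack: {rack_tdp_watts[0] / 1000:.1f} to {rack_tdp_watts[1] / 1000:.1f} kW")
```

The cross-check suggests the 22,500 figure corresponds to roughly one environment per physical core, since 256 chips with 88 cores each give 22,528 cores.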

For an operator running thousands of agents around the clock, those numbers compound. The token-spending side makes the point vivid: one enterprise ran up a $500 million Claude bill in a single month, and every token in that bill rode through orchestration and tool calls that live on the CPU.

The Claims NVIDIA’s Benchmarks Leave Open

Every headline figure here comes from NVIDIA. The company’s own footnote says performance is “based on measured data, and subject to change,” baselined against an unnamed “latest x86 CPU,” and independent testing is only beginning to appear. A vendor claiming a wide lead over the rival’s throughput is a starting point for evaluation, not the verdict.

There is also the software question. The x86 moat was never only about silicon; it was about decades of code that assumes x86. Vera’s Armv9.2 cores need that software ported and tuned, and the labs taking early deliveries are the ones equipped to do it. If independent benchmarks confirm NVIDIA’s numbers and the broader software stack follows the early adopters, Intel and AMD lose their last uncontested foothold in the AI data center. If the gains shrink under outside testing or the porting drags, Vera settles in as a strong part for NVIDIA’s own racks while the x86 incumbents keep the volume.

Frequently Asked Questions

What is the NVIDIA Vera CPU?

The Vera CPU is NVIDIA’s first data-center processor built specifically for agentic AI. It uses 88 custom Olympus cores on the Armv9.2 instruction set and pairs with NVIDIA’s Rubin GPUs in the Vera Rubin platform.

How many cores does the Vera CPU have?

Vera has 88 custom Olympus cores. Each core handles two tasks at once through NVIDIA Spatial Multithreading, and a single rack can hold 256 Vera CPUs running more than 22,500 concurrent environments.

Is the Vera CPU Arm-based or x86?

Arm-based. The Olympus cores are compatible with Arm’s Armv9.2 instruction set, not the x86 architecture used by Intel and AMD, which is why Vera competes directly with x86 server chips.

How does Vera compare to NVIDIA’s Grace CPU?

NVIDIA says the Olympus core delivers up to 50% higher instructions per cycle than the Grace CPU, with up to 1.2 TB/s of memory bandwidth and a configurable 250W to 450W power envelope.

When will the Vera CPU be available?

First production units shipped to leading AI labs in mid-May 2026, and NVIDIA says Vera will be available from server partners in the second half of 2026 as part of the Vera Rubin platform.

Which companies are using the Vera CPU?

Early adopters include Anthropic, OpenAI, xAI and Oracle Cloud Infrastructure, alongside hyperscalers such as Alibaba Cloud, ByteDance and Meta, plus system builders Dell Technologies, HPE, Lenovo and Supermicro.

Logan Pierce is a writer and web publisher with over seven years of experience covering consumer technology. He has published work on independent tech blogs and freelance bylines covering Android devices, privacy-focused software, and budget gadgets. Logan founded Oton Technology to publish clear, no-nonsense tech news and reviews based on real hands-on testing. He has personally tested and reviewed dozens of mid-range and budget Android phones, written extensively about app privacy, and built and managed multiple WordPress publications over the past decade. Logan holds a bachelor's degree in English and studied digital marketing at the certificate level.
