Perplexity Computer Will Split AI Tasks Between Your Laptop and Cloud

Perplexity Computer will split each AI task between a local model on your laptop and cloud frontier models, starting in July. Here is how the hybrid system works.

Published

June 3, 2026

Logan Pierce

Perplexity’s hybrid inference is coming to Perplexity Computer, the company’s AI agent that works across your files, apps, and the web, and it starts rolling out in July. The system automatically splits each task between a small model running on your own device and the larger frontier models in Perplexity’s cloud, keeping sensitive files local and sending only the heavier work to servers. The pitch is privacy and lower cost.

The privacy angle is real, but it sits downstream of a money problem: running every query through the biggest models in a remote data center is bleeding AI firms, and consumer chips have finally gotten fast enough to take some of the load.

How Perplexity Splits a Task Between Your Laptop and the Cloud

Perplexity Computer is the agent that can read and edit local files, control the machine, and browse the web through Comet, Perplexity’s own browser. It runs today as a Mac app, with a Windows version on the way. What’s new is an orchestrator that breaks a job into pieces and decides where each piece runs in real time, while the task is still going, an approach Perplexity lays out in its technical writeup on shifting inference to the device.

In a demo at Computex in Taipei on June 2, chief executive Aravind Srinivas fed confidential deal documents to the agent onstage during Intel’s keynote. Local models on an Intel Core Ultra chip sorted what should stay on the device from what could safely go to the cloud. You don’t choose a model before you start; the orchestrator makes that call.

Roughly, the split looks like this:

Stays on the device: summarizing, reformatting, lightweight classification, and any step that touches financial records, health information, or personal files.
Goes to the cloud: multi-step reasoning and retrieval across large datasets, the work that genuinely needs a frontier model.
Routed automatically: the system decides mid-task where each step should run, so private data can stay put while only the heavy lifting leaves.

Users never pick between local and cloud before getting started, and the routing stays invisible the whole way through.
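
The rough split above can be sketched as a simple routing rule. This is only an illustration: Perplexity has not published its orchestrator's logic, and every name here (Step, route, plan, HEAVY_KINDS) is hypothetical. The sketch assumes each step of a task carries a kind and a flag for whether it touches sensitive data.

```python
from dataclasses import dataclass

# Hypothetical labels for work the article says needs a frontier model.
HEAVY_KINDS = {"multi_step_reasoning", "large_retrieval"}


@dataclass(frozen=True)
class Step:
    name: str
    kind: str
    touches_sensitive_data: bool = False


def route(step: Step) -> str:
    """Return "device" or "cloud" for one step of a task."""
    # Anything touching financial, health, or personal data stays local.
    if step.touches_sensitive_data:
        return "device"
    # Heavy reasoning and large retrieval go to the cloud.
    if step.kind in HEAVY_KINDS:
        return "cloud"
    # Everything else (summarizing, reformatting, light classification)
    # stays on the device. Defaulting to local is an assumption here.
    return "device"


def plan(steps):
    """Route every step of a task independently."""
    return [(step.name, route(step)) for step in steps]


task = [
    Step("read deal documents", "summarize", touches_sensitive_data=True),
    Step("compare against market data", "large_retrieval"),
    Step("format the memo", "reformat"),
]

for name, where in plan(task):
    print(f"{name}: {where}")
```

A real orchestrator would also have to decide mid-task, as results come back, rather than from a fixed list of steps; the sketch only shows the per-step decision.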

The Cloud Bill Behind the Privacy Pitch

The cost story is the one Srinivas keeps coming back to. Cloud inference, the work of actually answering a query, loses money for most AI providers at scale, and the largest models burn the most. He has put it plainly.

“You don’t want all your compute centralised in servers and everything running through the largest models. You’re already reading reports of how people are freaking out about their cost,” Srinivas said at Computex.

His stated goal is “efficient value per watt per user,” and routing is how he gets there. The idea is already familiar to anyone who uses a routing layer to switch between large language models such as GPT-4o, Claude, and Gemini; Perplexity extends it from which model answers to where the answer is computed.

Why Routine Work Doesn’t Need a Frontier Model

Summarizing a PDF or tagging an email does not require the most capable model on the planet. A small model on a laptop NPU (neural processing unit, the chip block built for AI math) can do it in milliseconds without ever touching the network. Send that same task to a remote data center and you waste both money and power, which is the case Perplexity is making for keeping it home. The harder reasoning, the part that does need scale, is the only thing that leaves.

The Numbers Perplexity Points To

The economics get stark once the work moves off the server: in side-by-side edge-versus-cloud cost comparisons, on-device inference runs at a fraction of the cloud cost.

90% lower per-query cost when an inference that runs about $0.50 in the cloud runs on-device instead, according to one edge-AI economics analysis.
5x revenue growth at Perplexity, from roughly $100 million to $500 million, on just 34% more staff.
~$500 million a month in AI compute spend, the scale cited for the largest AI labs in accounts of the cost squeeze.
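
The first figure is easy to check by hand, and the same arithmetic shows why the share of work routed locally matters. A minimal sketch, using the $0.50 cloud cost and 90% reduction cited above; the 60% local share in the usage lines is a made-up value for illustration, not a Perplexity number, and the function names are hypothetical.

```python
CLOUD_COST = 0.50           # dollars per inference in the cloud (figure cited above)
ON_DEVICE_REDUCTION = 0.90  # 90% lower per-query cost on-device (figure cited above)


def on_device_cost(cloud_cost: float = CLOUD_COST,
                   reduction: float = ON_DEVICE_REDUCTION) -> float:
    """Per-inference cost once the work runs on the device."""
    return cloud_cost * (1 - reduction)


def blended_cost(local_share: float,
                 cloud_cost: float = CLOUD_COST,
                 reduction: float = ON_DEVICE_REDUCTION) -> float:
    """Average cost per inference when local_share of them run on-device."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    local = on_device_cost(cloud_cost, reduction)
    return local_share * local + (1 - local_share) * cloud_cost


print(f"on-device: ${on_device_cost():.2f}")       # on-device: $0.05
print(f"60% local: ${blended_cost(0.6):.2f}")      # 60% local: $0.23
```

Under these assumptions, the average falls from $0.50 to about $0.23 per inference, and most of what remains is the cloud portion, which is why the routing decision carries so much of the economics.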

Perplexity Joins a Shift Already Underway

Perplexity is loud about this, though it is neither alone nor first. Splitting inference between device and cloud is the dominant AI strategy of 2026, and the biggest platform companies got there ahead of it.

At its Build conference, Microsoft showed a Copilot Runtime that does much the same thing, deciding in real time whether a task runs on a Copilot+ PC’s neural processing unit or in Azure, based on latency, cost, and how sensitive the data is. It is shipping a set of local Windows models tuned for Qualcomm, Intel, and AMD NPUs. Apple Intelligence has run an on-device-first model since launch, escalating to Apple’s own servers only when a request needs more. Google pushes Gemini Nano on Android the same way.

System	How it decides	Local hardware	Where it stands
Perplexity Computer	Orchestrator routes each piece of a task	Intel Core Ultra, Nvidia RTX Spark, chip-agnostic	Rolling out in July
Microsoft Copilot Runtime	Splits NPU vs Azure by latency, cost, sensitivity	Snapdragon X, Intel, AMD Ryzen AI	Shown at Build
Apple Intelligence	On-device first, escalates to Apple servers	Apple silicon NPU	Shipping

The shared logic is simple: keep the cheap, private, latency-sensitive work close, and reserve the expensive servers for the requests that actually need them. Perplexity’s twist is doing it inside a general agent that can already touch your files and your apps, which raises both the usefulness and the stakes.

Intel and Nvidia Have the Most to Gain

The hybrid model only works if the chip in your laptop is fast enough, which makes the silicon makers the quiet winners here. Perplexity says its harness is chip-agnostic, running the same framework across different vendors’ hardware. The Computex demo ran on Intel Core Ultra; the company also says it works on Nvidia’s RTX Spark platform for AI laptops, the Arm-based chip Nvidia pitched at the show as built for AI agents.

For Intel, the timing helps. The company has spent two years trying to prove its chips matter to the AI build-out, and putting Perplexity’s agent on its own keynote stage, with chief executive Lip-Bu Tan standing alongside, is the kind of showcase it has struggled to land. For Nvidia, already dominant in data-center GPUs, RTX Spark is a play to own the device side of the same workload.

Chip-agnostic cuts both ways. No single vendor locks in the win; whoever ships the best NPU per watt picks up the consumer share. That is good for Perplexity, which avoids betting on one supplier, and it raises the pressure on Intel, AMD, Qualcomm, and Nvidia to out-build each other on laptop AI silicon. The agent becomes the demand signal; the chips become the contest.

Who Pays for the Compute Now?

Move inference onto your laptop and the cost does not vanish; it moves. The compute and the power bill shift to you. The chip in your machine does the work, your battery drains faster, and the laptop runs warmer during heavy tasks. For most light jobs that is trivial. For sustained agent work it is not nothing.

There is a capability gap, too. A small model on an NPU, the kind with a few billion parameters running at roughly the speed an iPhone’s chip manages, is not a frontier model. On borderline tasks, the quality of the result depends on the orchestrator routing correctly, and a wrong call sends real work to a model too small for it. Perplexity’s whole pitch rests on that routing being good.

The privacy claim holds only for the part that stays home. Anything the orchestrator decides needs the cloud still leaves your device and lands on Perplexity’s servers, the same as any other cloud query. The local-first design shrinks how much sensitive data travels, but it does not promise that none of it does. “Data center on your laptop” is a sharp line from Perplexity’s announcement of hybrid agentic inference; in practice the laptop is doing a slice of the work, not replacing the data center.

Frequently Asked Questions

When can I use Perplexity Computer’s hybrid inference?

Perplexity says the hybrid system starts rolling out in July. Perplexity Computer is available now as a Mac app, and a Windows version is on the way, so Mac users will likely see the split-inference feature first.

Do I need a special PC to run AI tasks locally?

You benefit most from a machine with a dedicated NPU, such as Intel Core Ultra chips or Nvidia’s RTX Spark platform. Perplexity says its framework is chip-agnostic and meant to run across vendors, but a faster on-device AI block means more work can stay local.

Does hybrid inference keep my data private?

Only for the parts that run on your device. Sensitive files Perplexity routes locally never touch the network, but anything the system sends to the cloud reaches Perplexity’s servers like any other query. The design reduces how much private data travels rather than eliminating it.

Does the hybrid feature cost extra?

Perplexity has not announced a separate price; hybrid inference is part of Perplexity Computer. The trade-off is indirect, since routing work to your device uses your own compute and power instead of cloud resources you would otherwise pay Perplexity to provide.