AI

Perplexity Search as Code Bets Agents Will Script Their Own Search

Perplexity’s Search as Code lets AI agents write Python to query its search stack, cutting tokens 85% in tests and echoing moves by Anthropic and Cloudflare.

Published

2 hours ago

June 3, 2026

Logan Pierce

Search as Code is Perplexity’s new architecture for AI agents that writes Python to call the company’s search stack directly, instead of looping through separate function calls one at a time. It is now the default inside Computer, Perplexity’s agent product, and available through the Agent API for outside developers. In one internal test, the approach answered a query with full accuracy while using 85% fewer tokens.

On its own, a search company changing how its agents fetch results is a footnote. The reason to look twice is the company Perplexity is keeping. It is at least the third major AI shop in recent months to make the same architectural bet: let the model write code, not pick tools. Anthropic and Cloudflare got there first, and a growing pile of benchmark data says the move pays off, right up until you have to run that code safely.

Perplexity Now Writes Python Instead of Calling Search Tools

The old way an agent searched looked like a conversation with a vending machine. The model picked a tool, sent a query, waited for the result, read it back into its context window, then picked the next tool. Every round trip went through the large language model (LLM, the AI model doing the reasoning). Perplexity says that pattern falls apart once an agent is doing real work rather than answering a single question.

From Tool Calls to Generated Code

In the company’s writeup on rethinking search as code generation, the design has three layers. The model acts as the control plane, reasoning about a task and writing code. A compute sandbox runs that Python in a contained environment. And an Agentic Search SDK (software development kit, the set of building blocks developers code against) exposes search as small parts: retrieval, ranking, filtering, rendering. The model snaps those parts together into a custom pipeline and runs the whole thing in one turn.

That matters because of scale. Perplexity says a single task inside Computer can fire off hundreds, sometimes thousands, of retrieval operations within a few minutes. Pushing each of those through the model one call at a time is slow and expensive. Generated code can fan out, run searches in parallel, hold intermediate results in a file rather than the model’s memory, and hand back only the answer.

Introducing Search as Code, our new search architecture for AI agents. It writes Python that calls our search stack directly, instead of looping through function calls one at a time. Available in the Perplexity Agent API, and now default in Computer.

That is from Perplexity’s official account on X. Aravind Srinivas, the company’s chief executive, described the direction as “search as codegen” in a post on the same platform, framing code execution as the foundation for future agent workflows rather than a side feature.

The Numbers Behind the Claim

The pitch leans on benchmarks, and Perplexity published several:

85.1% fewer tokens on a software-vulnerability scanning task, dropping from 288,700 tokens to 42,900 while keeping 100% accuracy.
0.871 accuracy on DSQA, a question-answering benchmark, up 19.77 percentage points over the same system without the code approach, about a 29% gain.
2.5 times the score of the next system on WANDR, a benchmark Perplexity built to mirror the knowledge-heavy professional tasks Computer handles.

Cost tracks the same way. On DSQA, the company reports its medium-reasoning setting hitting top performance for under a dollar per task.

Perplexity Search as Code architecture lets AI agents write Python search queries.

Why One-Call-at-a-Time Search Ran Out of Road

Function calling, sometimes called tool calling, was the standard way to give a model abilities beyond text. It works fine when an agent needs one or two tools. It buckles when the agent has dozens of them and a long job to do, and the reason is the context window, the fixed amount of text a model can hold at once.

Three things eat that window. Tool definitions get loaded in first, and every available tool spends tokens just describing itself. Then the actual results flow through, so a large document or dataset passes through the model even when the agent only needs a one-line summary of it. And because each step waits on the last, a task that touches data many times pays the model tax over and over.

This shift lands as agents stop answering questions and start finishing tasks, the same change driving everything from coding assistants to AI agents that handle grocery shopping and checkout. A task agent does not want a search engine. It wants a way to script the search engine. Code gives it that, and the model only reads back what it asked for.

Anthropic and Cloudflare Made the Same Move

Here is the part most coverage of the Perplexity launch skips. The same idea is showing up across the field, under different names, within months of each other.

Anthropic published a pattern it calls code execution with MCP (Model Context Protocol, the open standard for connecting agents to tools). Instead of loading every tool into the model and routing results back through it, the agent writes code that calls the tools as code APIs and keeps the heavy data on disk. Anthropic reported one workflow falling from about 150,000 tokens to roughly 2,000, a 98.7% cut, with agents running about ten times faster. The same MCP plumbing is what lets platforms like Atlassian, which recently began opening a 150-billion-object data graph to Claude code agents, hand agents that much surface area without drowning them.

Cloudflare’s version is a feature called Code Mode for MCP agents. It turns a pile of MCP tools into a TypeScript API and runs the generated JavaScript inside a secure V8 isolate. The company says it shrank the cost of exposing more than 2,500 API endpoints from over 1.17 million tokens to about 1,000, near 99.9%. Hugging Face has shipped this idea the longest, in its smolagents library, where the CodeAgent that acts by writing Python rather than JSON cuts steps and model calls by roughly 30% versus a standard tool-caller. The thread runs through Anthropic’s broader push on agent design too, including its work on managed agents and persistent memory.

Company / Product	Approach	Reported token cut
Perplexity / Search as Code	Model writes Python that calls a search SDK in a sandbox	85.1% on a vulnerability-scan test
Anthropic / Code execution with MCP	Agent codes against tools exposed as code APIs, data stays on disk	98.7% on one workflow
Cloudflare / Code Mode	MCP tools become a TypeScript API, JavaScript runs in a V8 isolate	~99.9% across 2,500+ endpoints
Hugging Face / smolagents CodeAgent	Agent writes Python instead of JSON tool calls	~30% fewer steps and model calls

Four independent teams, four different stacks, one conclusion. When an agent has a lot to do, code is a better controller than a menu of tools.

The Bill Comes Due in the Sandbox

The catch sits in plain sight in Perplexity’s own design: a compute sandbox that runs Python the model wrote. Every framework on that list now executes machine-generated code, and that is a fresh, fat attack surface.

The risk is concrete, not theoretical. A flaw in Cohere’s Terrarium Python sandbox, tracked as CVE (Common Vulnerabilities and Exposures, the public catalog of security bugs) entry CVE-2026-5752 and rated CVSS 9.3 on the standard severity scale, let attackers escape the sandbox and run code with root privileges on the host, as detailed in reporting on the Terrarium sandbox escape. Pillar Security found a separate sandbox-escape path in Google’s Antigravity agent tooling. In March, an Alibaba research agent called ROME reportedly broke out of its test environment during training and started mining cryptocurrency on GPU resources it was never given.

Prompt injection makes it worse. If an agent reads a poisoned document or webpage, a hidden instruction can steer the code it then generates and runs. The blast radius is no longer a bad text answer; it is whatever that code can touch on the machine. Demand for guardrails shows the scale of the exposure: sandbox provider E2B says it went from about 40,000 sessions a month in early 2024 to roughly 15 million a month a year later. The architecture saves tokens by moving work out of the model and into a runtime, and that runtime is exactly where the new danger lives.

What This Means if You’re Building Agents

For developers, Search as Code is live now through Perplexity’s Agent API quickstart documentation, and the broader pattern is available across Anthropic’s, Cloudflare’s, and Hugging Face’s stacks. The payoff is real for any agent that chains many steps or juggles a large toolset: lower cost, faster runs, and the ability to handle far more tools than you could cram into a context window.

The judgment call is where you run the generated code. If you adopt one of these systems, the sandbox is no longer an implementation detail; it is your security boundary, and it needs the same scrutiny you would give any service that executes untrusted input. Teams that treat code-as-control as a pure efficiency win, without locking down the runtime, are signing up for the exact failure modes Cohere and Alibaba already hit.

For now, Search as Code is shipping in the Agent API, and the harder question of how to safely run model-written code at scale is the one the whole field is still working through.

Frequently Asked Questions

What is Perplexity’s Search as Code?

It is a search architecture for AI agents in which the model writes Python that calls Perplexity’s search stack directly, instead of making separate tool calls one at a time. The generated code runs in a compute sandbox and can combine retrieval, ranking, and filtering into a single workflow. It is the default in Perplexity’s Computer agent and available through the Agent API.

How is it different from function calling?

Function calling has the model pick a tool, send a query, and read the result back into its context before the next step, repeating for every action. Search as Code lets the model write one block of code that runs many operations at once, in parallel if needed, and returns only the final answer. That avoids loading every tool definition and every intermediate result into the model.

Can developers use Search as Code today?

Yes. Perplexity says the capability is available through its Agent API and is the default search method inside Computer. Developers can build agents that generate and run search code against Perplexity’s stack rather than issuing individual search calls.

Does running AI-generated code create security risks?

It does. Executing model-written code expands the attack surface to whatever that code can reach. Documented cases include a CVSS 9.3 sandbox-escape flaw in Cohere’s Terrarium, a sandbox-escape path in Google’s Antigravity tooling, and an Alibaba agent that broke out of its environment to mine cryptocurrency. Prompt injection can also steer the code an agent generates, so the runtime sandbox becomes the critical security boundary.

Which other companies use code execution for AI agents?

Anthropic ships a code execution with MCP pattern that cut one workflow from about 150,000 tokens to roughly 2,000. Cloudflare’s Code Mode turns MCP tools into a TypeScript API run in a secure isolate. Hugging Face’s smolagents library has long favored a CodeAgent that writes Python instead of JSON tool calls.

Does the approach actually lower cost?

The reported token reductions are large: 85.1% on one Perplexity test, 98.7% in an Anthropic workflow, and close to 99.9% in Cloudflare’s endpoint example. Fewer tokens generally mean lower cost and latency, though the savings depend on the task and require running a secure sandbox to capture them safely.