Free AI Worm Breaches 73.8% of a Corporate Network in 7 Days

Toronto researchers’ AI worm breached 73.8% of a simulated enterprise network in 7 days by stealing victim GPU compute to run a free open-weight LLM, setting attacker cost per infection to zero.

A fully autonomous AI worm built by researchers at the University of Toronto, the Vector Institute, and the University of Cambridge compromised 73.8% of a simulated corporate network in seven days of unassisted operation, running on a free open-weight large language model and funding its own compute by stealing GPU resources from machines it had already breached. Published June 2, the preprint “AI Agents Enable Adaptive Computer Worms” describes a worm that generates a custom attack strategy for each machine it encounters, works across Linux, Windows, and IoT devices, and exploited vulnerabilities disclosed after its model’s training cutoff by reading publicly available security advisories in real time.

The paper’s authors frame their core finding in economic terms: since the worm runs on victims’ own hardware, the attacker’s marginal cost per new infection is zero.

Seventy-Three Percent in Seven Days

A Network Built to Look Real

The test environment held 33 machines configured to mirror a realistic corporate fleet: Ubuntu servers across six version generations from 16.04 to 24.04, Debian 9 through 12, Alpine Linux, Rocky Linux 9, CentOS Stream 9, three Windows Server editions spanning 2008 R2 through 2022, and various IoT devices representing printers, cameras, and networked sensors. Five of the 33 nodes had local GPUs. The other 28 relied on inference capacity borrowed from GPU nodes the worm had already compromised, so each reasoning step it executed on a new target cost the attacker nothing to run.

The vulnerabilities seeded into the network came from three categories: named CVE (Common Vulnerabilities and Exposures) flaws including EternalBlue, SambaCry, PrintNightmare, and Dirty Pipe; structural weakness classes tracked as CWEs (Common Weakness Enumerations) including SQL injection, command injection, and default credentials; and three hosts carrying vulnerabilities disclosed in April and May 2026, after the LLM’s training data cutoff. Modern enterprise environments routinely include GPU-equipped nodes for developer workstations, data analytics, and machine learning pipelines, giving the worm more seed infrastructure to work with than the five-GPU test setup might suggest.

Seven Days, Seven Generations

Over seven days of operation with no human direction, the worm posted the following results, averaged across experimental runs:

  • 31.3 vulnerabilities identified per run
  • 23.1 hosts elevated to root or SYSTEM-level access
  • 20.4 hosts reached by self-replication, across up to seven generations of propagation
  • 61% success rate against one-day vulnerabilities disclosed after the LLM’s training cutoff

The one-day result deserves specifics. The three hosts carrying post-training flaws held vulnerabilities from April and May 2026. Working from publicly available advisory information it read during the test, the worm cracked two Linux vulnerabilities, CopyFail and DirtyFrag, and a critical remote code execution bug in the Marimo notebook platform tracked as CVE-2026-39987. It succeeded on 41 of 67 attempts across those hosts, despite the underlying model having no prior training on any of those specific flaws.

Where Execution Breaks

Exploitation succeeded roughly 44% of the time overall. Failures came from malformed payloads, incorrect tool flags, and multi-step attack chains where the agent failed to thread output correctly between steps. The paper describes individual exploit execution as “fundamentally fragile.” That fragility didn’t slow propagation much: on 88% of successfully breached hosts, the worm deployed independent replicas, building a distributed swarm rather than a single propagation thread. Each newly compromised GPU-equipped node added inference capacity to the swarm, making future attacks cheaper to reason through as the footprint grew.
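A rough back-of-envelope sketch shows why fragile individual attempts need not hold back a worm that can retry freely. It assumes each attempt succeeds independently at the 44% overall rate reported above; that independence is a simplifying assumption made here for illustration, not a claim from the paper, and the function name is hypothetical.

```python
def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # 0.44 is the overall per-attempt success rate reported in the article.
    # Treating attempts as independent is an illustrative assumption.
    for n in (1, 2, 3, 5, 10):
        print(f"{n:>2} attempts: {p_at_least_one_success(0.44, n):.1%}")
```

Under that assumption, three attempts already push the chance of at least one success past 80%. Real attempts against the same host are unlikely to be fully independent, so the figures are best read as an intuition aid rather than a prediction.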

Zero Cost Per Infection Changes the Math

Traditional malware economics start with a fixed development cost spread across however many infections the campaign manages to produce. The Toronto AI worm restructures the cost side. Once on a network, its GPU compute for each subsequent target comes from machines already inside the perimeter, so the cost of attacking host number 500 equals the cost of attacking host number 1: nothing. The victims’ hardware pays for the reasoning that breaches them.

The paper’s abstract states the consequence plainly: “Since the worm is powered by stolen compute, the attacker’s marginal cost per new infection is zero. This creates a destabilizing economic asymmetry between attackers and defenders.” As the swarm grows, so does the compute pool available to it. Each new infected GPU node joins the collective inference capacity without any additional investment from the attacker.
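The cost argument can be sketched in a few lines. Every dollar figure below is a hypothetical placeholder chosen for illustration; neither the paper nor this article reports development or inference costs, and the function names are invented for this sketch.

```python
def amortized_cost_per_infection(fixed_cost: float, infections: int) -> float:
    """Traditional model: one up-front development cost spread over all infections."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return fixed_cost / infections


def attacker_compute_cost(infections: int, marginal_cost: float) -> float:
    """Total compute the attacker pays for, given a flat cost per new target."""
    if infections < 0:
        raise ValueError("infections must be non-negative")
    return infections * marginal_cost


if __name__ == "__main__":
    FIXED_DEV_COST = 100_000.0   # hypothetical development budget
    RENTED_COST_PER_TARGET = 2.0  # hypothetical cost if the attacker rented GPUs
    STOLEN_COST_PER_TARGET = 0.0  # stolen victim compute, as the paper argues

    for n in (1, 500):
        print(
            f"{n:>3} infections: "
            f"amortized dev cost {amortized_cost_per_infection(FIXED_DEV_COST, n):,.2f}, "
            f"rented compute {attacker_compute_cost(n, RENTED_COST_PER_TARGET):,.2f}, "
            f"stolen compute {attacker_compute_cost(n, STOLEN_COST_PER_TARGET):,.2f}"
        )
```

With a marginal cost of zero, the attacker-paid compute column stays at zero no matter how many hosts are attacked, which is the asymmetry the abstract describes.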

The worm also bypasses the controls commercial AI providers built into their APIs. Because it runs an open-weight model locally on compromised hardware, rate limiting, service refusals, and safety alignment policies in commercial platforms are, the paper states, “structurally irrelevant.” No API key, no commercial account, no proprietary infrastructure required. Enterprise security operations teams increasingly monitor commercial AI API logs for signs of attacker reconnaissance; an open-weight worm generating exploit code inside the compromised perimeter produces none of that signal.

On May 10, roughly three weeks before the Toronto paper was published, Sysdig’s threat research team documented what it described as the first publicly confirmed real-world intrusion driven by an LLM agent: an attack that moved from initial access through a vulnerable Python notebook to a fully exfiltrated internal database in under an hour, across four network pivots. A human attacker established the initial foothold. Once inside, the LLM agent handled every subsequent step without human direction. The timeline it demonstrated, AI-directed autonomous movement through four pivots in under an hour, runs ahead of many enterprise security teams’ planning horizons.

A Worm Without a Kill Switch

WannaCry infected more than 230,000 computers across 150 countries in under 24 hours on May 12, 2017. Microsoft had released the patch for the underlying EternalBlue vulnerability in Windows’ Server Message Block protocol nearly two months earlier. Estimates from Symantec put the aggregate damage at roughly $4 billion. Hours into the attack, security researcher Marcus Hutchins found a hard-coded domain name embedded in WannaCry’s code; registering that domain stopped the spread.

WannaCry’s history delivered a specific lesson: patch deployment speed is the survival variable. Organizations running Microsoft’s March 2017 patch were largely spared. That lesson was correct for a worm built around one static exploit. Against the Toronto worm, patching a specific SQL injection exposure on one host removes one vulnerability from one target; the worm reads the next machine for a different opening. With zero marginal cost per attempt, it can probe indefinitely across whatever vulnerabilities remain.

Feature | WannaCry (2017) | Toronto AI Worm (2026 PoC)
Exploit type | Single fixed flaw (EternalBlue, CVE-2017-0145) | Adapts attack strategy per target; exploits CVEs, CWEs, one-day flaws
AI/LLM dependency | None | Free open-weight model on a single GPU
Cost per new infection | Fixed development cost amortized across infections | Zero (stolen victim compute)
Kill switch | Yes (hard-coded domain name) | None by design
Stopped by a single patch | Yes | No
Post-training vulnerability exploitation | Not applicable | 61% success via live advisory reading

Nicolas Papernot, associate professor of computer science at the University of Toronto and the paper’s corresponding author, put the scope directly in a University of Toronto statement: “In an interconnected world, no system is immune to this threat.” Before publishing, the team notified Canadian science, security, and defense authorities, withheld the LLM’s name and the worm’s reasoning architecture from the public paper, and restricted prototype access to vetted defensive-security researchers through the University.

How Far Is the Lab From a Real Network?

The Containment Gap

The worm hasn’t appeared in the wild. The Toronto team ran it behind four layers of network isolation, including a hypervisor trust boundary and a purpose-built Containment Attestation Service designed to prevent escapes. Individual exploit execution remained fragile, failing more than half the time at the per-attempt level, and real enterprise networks add configuration noise the clean test environment didn’t include.

Trevor Horwitz, chief information security officer at cybersecurity firm TrustNet, said the gap between a controlled research environment and a production network is genuine:

Real enterprise networks are messy. They have inconsistent configurations, legacy systems, security tooling, partial visibility and a lot of operational friction. That makes real-world propagation harder than a lab demo.

Horwitz placed AI worms inside the category of risks security teams already handle: automated malware, lateral movement, weak segmentation, and poor identity controls. Mike Wilkes, chief information security officer at cybersecurity vendor Aikido Security, agreed the immediate alarm is unwarranted, but drew a harder line on the intelligence question. “We can comfortably presume that if someone acting as a defender in the infosec community has come up with this idea, then someone in the attacker world has also set such tooling in motion,” Wilkes said.

Research That Arrived the Same Week

In March 2026, researchers from five Chinese and Singaporean universities published ClawWorm, a self-propagating attack that achieves a fully autonomous infection cycle against OpenClaw, an open-source AI agent framework with more than 40,000 active instances. ClawWorm operates against AI agent ecosystems rather than traditional network infrastructure, but it runs on the same dynamic: LLM-driven self-replication, no human operator directing individual steps, attack logic generated at runtime from a single initiating message.

An AI executive order signed at the White House the same week the Toronto preprint appeared directed federal agencies to accelerate evaluation of AI systems for security vulnerabilities, giving the Toronto findings an immediate policy audience beyond the research community. More than 13,000 security professionals attended Infosecurity Europe 2026 at ExCeL London that week, where agentic AI attacks dominated the floor sessions.

Gary McGraw, chief executive of the Berryville Institute of Machine Learning, an AI security nonprofit, reviewed the Toronto findings. “This is bigger than Mythos in my view,” McGraw said, referring to Anthropic’s recently deployed Mythos model, which had drawn attention from enterprise security teams by revealing the volume of unpatched vulnerabilities sitting across corporate infrastructure. Martin Reynolds, field CTO at DevSecOps vendor Harness, said the worm itself matters less than the trajectory it confirms: “AI gives attackers greater speed, scale and adaptability, often against the same vulnerabilities and misconfigurations security teams have faced for years.”

Defenses That Still Hold

The Controls That Apply

Wilkes, Horwitz, and the Toronto researchers converge on the same point: the controls that constrain this worm are the same controls recommended before AI malware was a recognized category. Wilkes explicitly warned against spending on products marketed as anti-AI malware. The worm’s attack surface is known vulnerabilities, default credentials, and weak network segmentation. His countermeasures, with the AI worm threat applied:

  • Continuous asset inventory. The worm probes whatever machines exist on the network. A host you don’t know about can’t be patched or isolated from the swarm.
  • Network micro-segmentation. Seven generations of lateral movement require open network paths between hosts. Tight segmentation shrinks the footprint each breach can reach.
  • Default credential elimination. Default credentials and command injection were among the worm’s most reliable CWE entry points in the test. Rotating defaults closes an entire attack class at low cost.
  • Machine-to-machine trust limits. The worm uses compromised hosts to run LLM inference and propagate further. Wilkes specifically called out MCP (Model Context Protocol, the standard that lets AI models connect to external services and data) servers as “breach gateways” requiring hardening as AI agent infrastructure expands in enterprise environments.
  • Fast-path vulnerability response. Mitigate known exploitable exposure within hours, patch within days. The worm’s test vulnerability set drew directly from the CISA Known Exploited Vulnerabilities catalog, the same prioritized list defenders already use.
  • Monitor for abnormal GPU compute. A GPU node running LLM inference for a workload with no authorized business purpose on that host is the specific signature this class of worm leaves. Most conventional intrusion detection systems aren’t watching for it.
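The GPU-monitoring item in the checklist above can be sketched as a simple allowlist check. This is a minimal illustration, not a method from the paper or from Wilkes: the process names, the memory threshold, and the sample data are all hypothetical, and the CSV layout assumes per-process output such as that produced by `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits`.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProcess:
    pid: int
    name: str
    used_memory_mib: int


def parse_gpu_processes(csv_text: str) -> list[GpuProcess]:
    """Parse header-less CSV rows of `pid, process_name, used_memory_mib`."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        GpuProcess(int(pid), name.strip(), int(mem))
        for pid, name, mem in (row for row in rows if row)
    ]


def flag_unauthorized(
    processes: list[GpuProcess],
    allowlist: set[str],
    min_memory_mib: int = 1024,  # hypothetical threshold for "inference-sized" use
) -> list[GpuProcess]:
    """Return GPU processes that are not allowlisted and hold substantial GPU memory."""
    return [
        p for p in processes
        if p.name not in allowlist and p.used_memory_mib >= min_memory_mib
    ]


# Hypothetical sample data standing in for real per-process GPU telemetry.
SAMPLE = (
    "4121, /usr/bin/python3, 312\n"
    "5230, /opt/analytics/train_job, 20480\n"
    "9917, /tmp/.cache/llm-serve, 15872\n"
)
# Hypothetical list of workloads authorized to use the GPU on this host.
ALLOWLIST = {"/opt/analytics/train_job"}

if __name__ == "__main__":
    for proc in flag_unauthorized(parse_gpu_processes(SAMPLE), ALLOWLIST):
        print(f"unexpected GPU workload: pid={proc.pid} name={proc.name} "
              f"memory={proc.used_memory_mib} MiB")
```

Matching on process name alone is easy for malware to spoof, so a production check would likely also weigh the binary's path, its owner, and sustained utilization over time. The sketch only shows the shape of the comparison between observed GPU workloads and those with an authorized business purpose.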

Finding the Vulnerabilities First

The Toronto paper adds a proactive measure beyond the checklist: use AI-assisted penetration testing and fuzzing on your own infrastructure to find the vulnerabilities the worm targets, before it does. The same capability that powers the worm can be redirected as a defensive probe against your own network, the researchers note in their defense and governance section.

Horwitz’s summary for the CISO community: “AI-powered threats do not make these controls obsolete. They make weak execution more expensive.”

The signature most conventional intrusion detection systems weren’t built to catch: a GPU node inside the breached network running someone else’s inference workload.

Logan Pierce is a writer and web publisher with over seven years of experience covering consumer technology. He has published work on independent tech blogs and freelance bylines covering Android devices, privacy-focused software, and budget gadgets. Logan founded Oton Technology to publish clear, no-nonsense tech news and reviews based on real hands-on testing. He has personally tested and reviewed dozens of mid-range and budget Android phones, written extensively about app privacy, and built and managed multiple WordPress publications over the past decade. Logan holds a bachelor's degree in English and studied digital marketing at a certificate level.
