Mallory

Research Warns AI Agents Are Rapidly Improving at Vulnerability Discovery and Exploitation

ai-enabled-threat-activity, ai-platform-security, initial-access-method, data-exfiltration-method, identity-authentication-vulnerability
Updated April 20, 2026 at 10:02 AM · 4 sources

Recent research and evaluations indicate that AI agents are becoming capable of finding and exploiting vulnerabilities at high success rates using standard offensive tooling, lowering the barrier to semi-autonomous attacks. A study by Irregular, in collaboration with Wiz, reported that leading models (Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro) solved 9 of 10 web security CTF challenges modeled on real-world incident patterns, including authentication bypass, exposed secrets, stored XSS, and SSRF, among them SSRF against the AWS Instance Metadata Service (IMDS). The researchers noted that even when success required multiple stochastic runs, the low per-run cost (~$2) and limited number of repeats could make exploitation practical without necessarily triggering monitoring: most challenge successes cost under $1, and multi-run cases totaled roughly $1–$10.
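
The IMDS-style SSRF pattern in these challenges typically involves coaxing a server-side URL fetcher into requesting the cloud metadata endpoint at its link-local address. As a purely illustrative defensive sketch (not taken from the study), a fetcher can refuse user-supplied URLs whose host is a literal IP in a blocked range; a production filter would additionally need to resolve hostnames and re-check every redirect hop to resist DNS-rebinding tricks:

```python
import ipaddress
from urllib.parse import urlparse

# Ranges a server-side fetcher should never reach from user-supplied URLs.
# 169.254.0.0/16 is link-local and contains the AWS IMDS endpoint
# (169.254.169.254); 127.0.0.0/8 is loopback.
_BLOCKED_NETS = [
    ipaddress.ip_network("169.254.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def is_ssrf_risky(url: str) -> bool:
    """Return True if the URL's host is a literal IP inside a blocked range.

    Sketch only: hostnames return False here and must be resolved
    (and redirects re-checked) before the request is allowed.
    """
    host = urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; resolve before deciding
    return any(addr in net for net in _BLOCKED_NETS)
```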

Separate evaluation results highlighted by Bruce Schneier, citing an Anthropic post, describe Claude Sonnet 4.5 executing multistage attacks across simulated networks using only standard open-source tools rather than custom cyber toolkits; in a high-fidelity simulation of the Equifax breach, the model exfiltrated all simulated PII by recognizing and exploiting a known, publicized CVE. In parallel, Dark Reading reported security concerns around the rapid adoption of OpenClaw (formerly MoltBot/ClawdBot), an open-source autonomous assistant that can connect to email, files, messaging, and system tools, execute terminal commands and scripts, and maintain memory across sessions. Such agents create persistent non-human identities and access paths that may fall outside traditional IAM and secrets controls, increasing enterprise risk as "bring-your-own-AI" agents gain privileged access.

Timeline

  1. Apr 20, 2026

    Hacktron demonstrates Claude Opus 4.6 building Discord Chromium exploit chain

    Hacktron CTO Mohan Pedhapati said Anthropic’s Claude Opus 4.6 helped produce a functional Chrome/V8 exploit chain against Discord’s bundled Chromium for about $2,283 in API costs after roughly a week of iteration and human supervision. The report highlighted how AI-assisted patch analysis can accelerate weaponization of known flaws in Electron apps such as Discord, Slack, and Teams when embedded Chromium versions lag upstream fixes.

  2. Apr 7, 2026

    Anthropic says Claude Mythos Preview finds and exploits zero-days

    In testing published April 7, 2026, Anthropic reported that its Claude Mythos Preview model could autonomously discover zero-day vulnerabilities and develop working exploits across major operating systems and browsers, outperforming earlier models on exploit-development benchmarks. The company said the model identified thousands of high- and critical-severity flaws, including a FreeBSD NFS server remote code execution (RCE) vulnerability tracked as CVE-2026-4747, and could also rapidly weaponize N-day Linux kernel vulnerabilities.

  3. Jan 30, 2026

    Study shows AI agents struggle in broad-scope and real-world root-cause hunts

    The Irregular and Wiz study also found that performance dropped when agents had to search a full attack surface without a defined entry point, increasing cost and reducing investigative depth. In a real-world AWS Bedrock anomaly case, an AI agent failed to identify the root cause, while a human analyst quickly traced it to an exposed RabbitMQ management interface protected only by default credentials.

  4. Jan 30, 2026

    Irregular and Wiz study finds AI agents solve 9 of 10 web security challenges

    A study by Irregular in collaboration with Wiz tested Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro on 10 web security CTF challenges derived from real-world incidents. The researchers found the leading models solved 9 of 10 challenges when given directed, per-site objectives using standard security tools.

  5. Jan 30, 2026

    Researchers observe attacker interest in exposed OpenClaw deployments

    Security researchers and vendors reported early signs of malicious interest in OpenClaw, including scanning for the agent’s default port and attempts to bypass authentication. They also warned of supply-chain risk tied to the project’s large contributor base and rapid development pace.

  6. Jan 30, 2026

    OpenClaw open-source AI agent rapidly gains adoption and scrutiny

    The open-source AI agent OpenClaw, previously called ClawdBot and MoltBot, rapidly became the fastest-growing project on GitHub. Its direct connections to email, files, messaging platforms, and system tools with autonomous capabilities prompted security concerns about enterprise deployment.

  7. Jan 30, 2026

    Anthropic says Claude simulated an Equifax-style data exfiltration attack

    In the same reported testing, Anthropic said Claude Sonnet 4.5 exfiltrated all simulated personal data in a high-fidelity Equifax-breach scenario using only a Bash shell on a Kali Linux host. The company attributed this to the model recognizing a public CVE and generating exploit code without needing iterative refinement.

  8. Jan 30, 2026

    Anthropic reports Claude Sonnet 4.5 can autonomously exploit known flaws

    An Anthropic blog post said current Claude models had improved cyber capabilities, including carrying out multistage attacks across networks with dozens of hosts using standard open-source tools. It reported that Claude Sonnet 4.5 succeeded in some tests without the custom cyber toolkit required by earlier model generations.
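
The exposed-RabbitMQ case in the timeline above shows how cheap such checks are for either side. The sketch below is an illustrative defensive probe (not taken from any of the cited studies) that a team could run against its own infrastructure: it builds a request to RabbitMQ's management HTTP API (the /api/overview endpoint on the standard management port 15672) using the guest/guest credentials the broker ships with. A 200 response on a non-loopback interface means the interface is exposed with default credentials:

```python
import base64
import urllib.request

def basic_auth_header(user: str, password: str) -> str:
    """HTTP Basic auth header value for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def overview_request(host: str, port: int = 15672) -> urllib.request.Request:
    """Build a probe against RabbitMQ's management API using guest/guest.

    Only run this against infrastructure you own or are authorized
    to test; a successful response indicates the misconfiguration
    seen in the AWS Bedrock anomaly case above.
    """
    req = urllib.request.Request(f"http://{host}:{port}/api/overview")
    req.add_header("Authorization", basic_auth_header("guest", "guest"))
    return req

# Usage (authorized testing only):
#   resp = urllib.request.urlopen(overview_request("10.0.0.5"), timeout=5)
#   print(resp.status)
```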
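
Similarly, the scanning activity reported against exposed OpenClaw deployments can be reproduced defensively with a basic TCP reachability sweep of your own address space. The port number below is a placeholder assumption, since the reporting does not state the agent's default port; substitute the value for your deployment:

```python
import socket

AGENT_PORT = 18789  # placeholder assumption -- verify against your deployment

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def find_exposed(hosts, port: int = AGENT_PORT):
    """Return the subset of hosts answering on the agent port.

    Any hit warrants an authentication review: an agent listener
    reachable beyond its own host is the exposure researchers
    observed attackers probing for.
    """
    return [h for h in hosts if port_open(h, port)]
```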

Related Stories

Security Risks and Offensive Potential of Agentic AI and Automated Vulnerability Discovery

Security leaders are warning that AI agents are increasingly operating as "digital employees" inside enterprise workflows (triaging alerts, coordinating investigations, and moving work across security tools), often with broad permissions and limited governance. The core risk highlighted is that organizations are deploying high-authority agents like plug-ins (reused service accounts, overbroad roles, weak oversight), creating fast-acting operators that can be manipulated and that lack the contextual judgment and policy awareness expected of human staff. Related commentary also raises concerns about AI-to-AI communication and "non-human-readable" behaviors that could reduce auditability and complicate investigations and control enforcement. In parallel, public examples show how quickly AI can accelerate vulnerability discovery: Microsoft Azure CTO Mark Russinovich reported using Claude Opus 4.6 to decompile decades-old Apple II 6502 machine code and identify multiple issues, underscoring that similar techniques could be applied to embedded and legacy firmware at scale. Anthropic has also cautioned that advanced models can find high-severity flaws even in heavily tested codebases, reinforcing the likelihood that both defenders and attackers will leverage AI for faster bug-finding. Separate enterprise IT coverage notes that organizations are reallocating budgets toward AI by consolidating tools and renegotiating contracts, which can indirectly increase security exposure if cost-cutting removes overlapping controls or if AI adoption outpaces governance and identity and access management maturity.

Yesterday
Security Risks From Self-Hosted Autonomous AI Agents (Clawdbot/Moltbot/OpenClaw)

Security researchers and vendors warned that self-hosted, agentic AI assistants, notably Clawdbot (rebranded as Moltbot and also referred to as OpenClaw), expand the enterprise attack surface by combining broad data access with the ability to take direct actions (browser control, messaging, email, and command execution). Resecurity reported finding hundreds of exposed deployments reachable from the public Internet, frequently with weak authentication, unsafe defaults, or misconfigurations that could allow attackers to access API keys and OAuth tokens, retrieve private chat histories, and in some cases achieve remote command execution on the host. Dark Reading similarly highlighted that OpenClaw's ecosystem can be undermined by malicious "skills" and fragile configuration and removal practices, reinforcing that these tools can be difficult to operate safely even when users attempt to limit permissions. CyberArk framed the issue as an identity security problem: autonomous agents often run with user-level permissions and integrate with platforms like Slack, WhatsApp, and GitHub, creating pathways for credential and token theft, data leakage, and unauthorized actions if the agent is exposed to untrusted content or deployed without strong controls. In contrast, Dark Reading's coverage of Shai-hulud focuses on a separate threat, self-propagating supply-chain worms targeting NPM projects, and is not directly about autonomous AI agents, though it underscores the broader risk of downstream compromise when widely used components or ecosystems are poisoned.

2 months ago
AI agent and LLM misuse drives new attack and governance risks

Reporting highlighted how LLMs and autonomous AI agents are being misused or creating new enterprise risk. Gambit Security described a month-long campaign in which an attacker allegedly jailbroke Anthropic's Claude via persistent prompting and role-play to generate vulnerability research, exploitation scripts, and automation used to compromise Mexican government systems, with the attacker reportedly switching to ChatGPT for additional tactics; the reporting claimed exploitation of roughly 20 vulnerabilities and theft of about 150 GB of data, including taxpayer and voter records. Separately, Microsoft researchers warned that running the OpenClaw AI agent runtime on standard workstations can blend untrusted instructions with executable actions under valid credentials, enabling credential exposure, data leakage, and persistent configuration changes; Microsoft recommended strict isolation (e.g., dedicated VMs or devices and constrained credentials), while other coverage noted tooling emerging to detect OpenClaw/MoltBot instances and vendors positioning alternative "safer" agent orchestration approaches. Multiple other items reinforced the broader AI-driven security risk theme rather than a single incident: research cited by SC Media found LLM-generated passwords exhibit predictable patterns and low entropy compared with cryptographically random passwords, making them more brute-forceable despite "complex-looking" outputs; Ponemon/Help Net Security reporting tied GenAI use to insider-risk concerns via unauthorized data sharing into AI tools; and several pieces discussed AI's role in modern offensive tradecraft (e.g., AI-enhanced phishing and deepfakes) and the expanding attack surface created by agentic systems. Many remaining references were unrelated breach reports, threat-actor activity, ransomware ecosystem analysis, or general commentary and marketing-style content that does not substantively address the Claude jailbreak incident or OpenClaw agent-runtime risk.

1 month ago
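
The "predictable patterns and low entropy" finding about LLM-generated passwords can be made concrete with a quick empirical measure. The sketch below is illustrative only (not the cited researchers' methodology): it computes a string's empirical Shannon entropy, where totals far below log2(alphabet size) bits per character flag strings drawn from a narrow, repetitive distribution:

```python
import math
from collections import Counter

def shannon_entropy_bits(s: str) -> float:
    """Total empirical Shannon entropy of a string, in bits.

    Per-character entropy is -sum(p * log2(p)) over observed character
    frequencies; multiplying by length gives total bits. Note this only
    measures repetition within one password, not predictability across
    many passwords, so it understates an attacker's real advantage.
    """
    if not s:
        return 0.0
    n = len(s)
    per_char = -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
    return per_char * n

# A "complex-looking" but repetitive string like "Aa1!Aa1!Aa1!" scores
# well under the ~78 bits a 12-character string drawn uniformly from 95
# printable ASCII characters would carry.
```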
