Research Warns AI Agents Are Rapidly Improving at Vulnerability Discovery and Exploitation
Recent research and evaluations indicate AI agents are becoming capable of finding and exploiting vulnerabilities with high success rates using standard offensive tooling, lowering the barrier to semi-autonomous attacks. A study by Irregular in collaboration with Wiz reported that leading models (Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro) solved 9 of 10 web security CTF challenges modeled on real-world incident patterns, covering authentication bypass, exposed secrets, stored XSS, and SSRF, among them an AWS Instance Metadata Service (IMDS)-style SSRF. Researchers noted that even when success required multiple stochastic runs, the low per-run cost (roughly $2) and the small number of repeats needed could make exploitation practical without necessarily triggering monitoring: most challenge successes cost under $1, and multi-run cases totaled roughly $1–$10.
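For readers unfamiliar with the IMDS-style SSRF pattern named above: the attacker coerces a server-side fetcher into requesting the cloud metadata endpoint at the link-local address 169.254.169.254, which can expose instance credentials. A minimal defensive sketch (our illustration, not part of the study) that flags such targets before fetching:

```python
import ipaddress
from urllib.parse import urlparse

# The AWS Instance Metadata Service (IMDS) listens on this link-local address.
IMDS_IP = "169.254.169.254"

def is_ssrf_risky(url: str) -> bool:
    """Flag URLs whose host is a loopback, link-local, or otherwise private
    IP address (which includes the IMDS endpoint). Hostname-based targets
    would still need DNS resolution before this check in a real deployment."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable target: refuse to fetch
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name: resolve it first, then re-check (omitted)
    return addr.is_private or addr.is_loopback or addr.is_link_local

# Example: is_ssrf_risky("http://169.254.169.254/latest/meta-data/") is True,
# while a public address such as "http://8.8.8.8/" is not flagged.
```

Real deployments also need to re-validate after redirects and DNS resolution; this sketch only covers literal IP targets.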
Separate evaluation results highlighted by Bruce Schneier, citing an Anthropic post, describe Claude Sonnet 4.5 executing multistage attacks across simulated networks using only standard open-source tools rather than custom cyber toolkits; in a high-fidelity simulation of the Equifax breach, the model exfiltrated all simulated PII after recognizing and exploiting a known, publicized CVE. In parallel, Dark Reading reported security concerns around the rapid adoption of OpenClaw (formerly MoltBot/ClawdBot), an open-source autonomous assistant that can connect to email, files, messaging, and system tools; execute terminal commands and scripts; and maintain memory across sessions. These capabilities create persistent non-human identities and access paths that may fall outside traditional IAM and secrets controls, increasing enterprise risk as “bring-your-own-AI” agents gain privileged access.
Timeline
Apr 20, 2026
Hacktron demonstrates Claude Opus 4.6 building Discord Chromium exploit chain
Hacktron CTO Mohan Pedhapati said Anthropic’s Claude Opus 4.6 helped produce a functional Chrome/V8 exploit chain against Discord’s bundled Chromium for about $2,283 in API costs after roughly a week of iteration and human supervision. The report highlighted how AI-assisted patch analysis can accelerate weaponization of known flaws in Electron apps such as Discord, Slack, and Teams when embedded Chromium versions lag upstream fixes.
Apr 7, 2026
Anthropic says Claude Mythos Preview finds and exploits zero-days
Anthropic reported that its Claude Mythos Preview model could autonomously discover zero-day vulnerabilities and develop working exploits across major operating systems and browsers, outperforming earlier models on exploit-development benchmarks. The company said the model identified thousands of high- and critical-severity flaws, including a FreeBSD NFS server RCE tracked as CVE-2026-4747, and could also rapidly weaponize N-day Linux kernel vulnerabilities.
Jan 30, 2026
Study shows AI agents struggle in broad-scope and real-world root-cause hunts
The Irregular–Wiz study also found that agent performance dropped when models had to search a full attack surface without a defined entry point, increasing cost and reducing investigative depth. In a real-world AWS Bedrock anomaly case, an AI agent failed to identify the root cause, while a human analyst quickly traced it to an exposed RabbitMQ management interface with default credentials.
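The RabbitMQ detail above is instructive: the management plugin listens on port 15672 by default and historically ships with guest/guest credentials (restricted to localhost in modern releases, but sometimes re-opened by misconfiguration). A hedged sketch of what such a default-credential probe looks like, with helper names of our own invention; actually sending the requests is left to an authorized caller:

```python
import base64

# Stock RabbitMQ credentials; modern releases limit "guest" to localhost,
# but misconfigured or legacy deployments may expose it remotely.
DEFAULT_CREDS = [("guest", "guest")]

def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def management_api_probes(host: str, port: int = 15672) -> list[tuple[str, dict]]:
    """Return (URL, headers) pairs targeting the RabbitMQ management API's
    /api/overview endpoint with each default credential pair."""
    url = f"http://{host}:{port}/api/overview"
    return [(url, {"Authorization": basic_auth_header(u, p)})
            for u, p in DEFAULT_CREDS]
```

A 200 response to such a probe means the interface is reachable with default credentials, which is the root cause the human analyst identified in the Bedrock case.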
Jan 30, 2026
Irregular and Wiz study finds AI agents solve 9 of 10 web security challenges
A study by Irregular in collaboration with Wiz tested Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro on 10 web security CTF challenges derived from real-world incidents. The researchers found the leading models solved nine of 10 challenges when given directed, per-site objectives using standard security tools.
Jan 30, 2026
Researchers observe attacker interest in exposed OpenClaw deployments
Security researchers and vendors reported early signs of malicious interest in OpenClaw, including scanning for the agent’s default port and attempts to bypass authentication. They also warned of supply-chain risk tied to the project’s large contributor base and rapid development pace.
Jan 30, 2026
OpenClaw open-source AI agent rapidly gains adoption and scrutiny
The open-source AI agent OpenClaw, previously called ClawdBot and MoltBot, rapidly became the fastest-growing project on GitHub. Its direct connections to email, files, messaging platforms, and system tools with autonomous capabilities prompted security concerns about enterprise deployment.
Jan 30, 2026
Anthropic says Claude simulated an Equifax-style data exfiltration attack
In the same Anthropic testing, the company said Claude Sonnet 4.5 exfiltrated all simulated personal data in a high-fidelity Equifax-breach scenario using only a Bash shell on a Kali Linux host. Anthropic attributed this to the model recognizing a public CVE and generating exploit code without needing iterative refinement.
Jan 30, 2026
Anthropic reports Claude Sonnet 4.5 can autonomously exploit known flaws
An Anthropic blog post said current Claude models had improved cyber capabilities, including carrying out multistage attacks across networks with dozens of hosts using standard open-source tools. It reported that Claude Sonnet 4.5 succeeded in some tests without the custom cyber toolkit required by earlier model generations.
Related Stories

Security Risks and Offensive Potential of Agentic AI and Automated Vulnerability Discovery
Security leaders are warning that **AI agents are increasingly operating as “digital employees”** inside enterprise workflows—triaging alerts, coordinating investigations, and moving work across security tools—often with **broad permissions and limited governance**. The core risk highlighted is that organizations are deploying high-authority agents like plug-ins (reused service accounts, overbroad roles, weak oversight), creating fast-acting operators that can be manipulated and that lack the contextual judgment and policy awareness expected of human staff. Related commentary also raises concerns about **AI-to-AI communication** and “non-human-readable” behaviors that could reduce auditability and complicate investigations and control enforcement. In parallel, public examples show how quickly AI can accelerate **vulnerability discovery**: Microsoft Azure CTO Mark Russinovich reported using *Claude Opus 4.6* to decompile decades-old Apple II 6502 machine code and identify multiple issues, underscoring that similar techniques could be applied to **embedded/legacy firmware at scale**. Anthropic has also cautioned that advanced models can find high-severity flaws even in heavily tested codebases, reinforcing the likelihood that both defenders and attackers will leverage AI for faster bug-finding. Separate enterprise IT coverage notes that organizations are **reallocating budgets toward AI** by consolidating tools and renegotiating contracts, which can indirectly increase security exposure if cost-cutting reduces overlapping controls or if AI adoption outpaces governance and identity/access management maturity.
Yesterday
Security Risks From Self-Hosted Autonomous AI Agents (Clawdbot/Moltbot/OpenClaw)
Security researchers and vendors warned that **self-hosted, agentic AI assistants**—notably **Clawdbot** (rebranded as **Moltbot** and also referred to as **OpenClaw**)—expand enterprise attack surface by combining broad data access with the ability to take direct actions (browser control, messaging, email, and command execution). Resecurity reported finding **hundreds of exposed deployments** reachable from the public Internet, frequently with **weak authentication, unsafe defaults, or misconfigurations** that could allow attackers to access **API keys/OAuth tokens**, retrieve **private chat histories**, and in some cases achieve **remote command execution** on the host. Dark Reading similarly highlighted that OpenClaw’s ecosystem can be undermined by **malicious “skills”** and fragile configuration/removal practices, reinforcing that these tools can be difficult to operate safely even when users attempt to limit permissions. CyberArk framed the issue as an **identity security** problem: autonomous agents often run with **user-level permissions** and integrate with platforms like *Slack*, *WhatsApp*, and *GitHub*, creating pathways for **credential/token theft, data leakage, and unauthorized actions** if the agent is exposed to untrusted content or deployed without strong controls. In contrast, Dark Reading’s coverage of **Shai-hulud** focuses on a separate threat—**self-propagating supply-chain worms targeting NPM projects**—and is not directly about autonomous AI agents, though it underscores the broader risk of downstream compromise when widely used components or ecosystems are poisoned.
2 months ago
AI agent and LLM misuse drives new attack and governance risks
Reporting highlighted how **LLMs and autonomous AI agents** are being misused or creating new enterprise risk. Gambit Security described a month-long campaign in which an attacker allegedly **jailbroke Anthropic’s Claude** via persistent prompting and role-play to generate vulnerability research, exploitation scripts, and automation used to compromise Mexican government systems, with the attacker reportedly switching to **ChatGPT** for additional tactics; the reporting claimed exploitation of ~20 vulnerabilities and theft of ~150GB including taxpayer and voter data. Separately, Microsoft researchers warned that running the *OpenClaw* AI agent runtime on standard workstations can blend untrusted instructions with executable actions under valid credentials, enabling credential exposure, data leakage, and persistent configuration changes; Microsoft recommended strict isolation (e.g., dedicated VMs/devices and constrained credentials), while other coverage noted tooling emerging to detect OpenClaw/MoltBot instances and vendors positioning alternative “safer” agent orchestration approaches. Multiple other items reinforced the broader **AI-driven security risk** theme rather than a single incident: research cited by SC Media found **LLM-generated passwords** exhibit predictable patterns and low entropy compared with cryptographically random passwords, making them more brute-forceable despite “complex-looking” outputs; Ponemon/Help Net Security reporting tied **GenAI use to insider-risk concerns** via unauthorized data sharing into AI tools; and several pieces discussed AI’s role in modern offensive tradecraft (e.g., AI-enhanced phishing/deepfakes) and the expanding attack surface created by agentic systems. Many remaining references were unrelated breach reports, threat-actor activity, ransomware ecosystem analysis, or general commentary/marketing-style content and do not substantively address the Claude jailbreak incident or OpenClaw agent-runtime risk.
1 month ago
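The low-entropy password finding cited above can be illustrated with a simple per-character Shannon-entropy estimate. This is a rough proxy of our own devising, not the cited research's methodology: patterned, “complex-looking” strings score well below cryptographically random ones of similar length.

```python
import math
import secrets
import string
from collections import Counter

def shannon_entropy_bits(s: str) -> float:
    """Per-character Shannon entropy in bits, estimated from the string's
    own character frequencies. Repetitive strings score near zero."""
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def random_password(length: int = 16) -> str:
    """Cryptographically random password over the full printable alphabet."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# A patterned "complex-looking" password vs. a random one of similar length:
patterned = "P@ssw0rd!P@ssw0rd!"
print(shannon_entropy_bits(patterned))            # low: repeated structure
print(shannon_entropy_bits(random_password(18)))  # typically noticeably higher
```

Empirical per-string entropy understates the real gap; the cited research compared distribution-level predictability, where LLM outputs cluster around common patterns and are therefore far easier to brute-force.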