Mallory

Research Warns AI Agents Are Rapidly Improving at Vulnerability Discovery and Exploitation

ai-enabled-threat-activity, ai-platform-security, initial-access-method, data-exfiltration-method, identity-authentication-vulnerability
Updated April 20, 2026 at 10:02 AM · 4 sources

Recent research and evaluations indicate that AI agents are becoming capable of finding and exploiting vulnerabilities at high success rates using standard offensive tooling, lowering the barrier to semi-autonomous attacks. A study by Irregular, in collaboration with Wiz, reported that leading models (Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro) solved 9 of 10 web security CTF challenges modeled on real-world incident patterns, including authentication bypass, exposed secrets, stored XSS, and SSRF, among them SSRF against the AWS Instance Metadata Service (IMDS). The researchers noted that even when success required multiple stochastic runs, the low per-run cost (~$2) and limited number of repeats could make exploitation practical without necessarily triggering monitoring: most challenge successes cost under $1, and multi-run cases totaled roughly $1–$10.
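
The IMDS-style SSRF pattern in these challenges typically involves coaxing a server-side URL fetcher into requesting the cloud metadata endpoint at its link-local address. As a purely illustrative defensive sketch (not taken from the study), a fetcher can refuse user-supplied URLs whose host is a literal IP in a blocked range; a production filter would additionally need to resolve hostnames and re-check every redirect hop to resist DNS-rebinding tricks:

```python
import ipaddress
from urllib.parse import urlparse

# Ranges a server-side fetcher should never reach from user-supplied URLs.
# 169.254.0.0/16 is link-local and contains the AWS IMDS endpoint
# (169.254.169.254); 127.0.0.0/8 is loopback.
_BLOCKED_NETS = [
    ipaddress.ip_network("169.254.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def is_ssrf_risky(url: str) -> bool:
    """Return True if the URL's host is a literal IP inside a blocked range.

    Sketch only: hostnames return False here and must be resolved
    (and redirects re-checked) before the request is allowed.
    """
    host = urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; resolve before deciding
    return any(addr in net for net in _BLOCKED_NETS)
```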

Separate evaluation results highlighted by Bruce Schneier, citing an Anthropic post, describe Claude Sonnet 4.5 executing multistage attacks across simulated networks using only standard open-source tools rather than custom cyber toolkits; in a high-fidelity simulation of the Equifax breach, the model exfiltrated all simulated PII by recognizing and exploiting a known, publicized CVE. In parallel, Dark Reading reported security concerns around the rapid adoption of OpenClaw (formerly MoltBot/ClawdBot), an open-source autonomous assistant that can connect to email, files, messaging, and system tools, execute terminal commands and scripts, and maintain memory across sessions. Such agents create persistent non-human identities and access paths that may fall outside traditional IAM and secrets controls, increasing enterprise risk as "bring-your-own-AI" agents gain privileged access.

Timeline

  1. Apr 20, 2026

    Hacktron demonstrates Claude Opus 4.6 building Discord Chromium exploit chain

    Hacktron CTO Mohan Pedhapati said Anthropic’s Claude Opus 4.6 helped produce a functional Chrome/V8 exploit chain against Discord’s bundled Chromium for about $2,283 in API costs after roughly a week of iteration and human supervision. The report highlighted how AI-assisted patch analysis can accelerate weaponization of known flaws in Electron apps such as Discord, Slack, and Teams when embedded Chromium versions lag upstream fixes.

  2. Apr 7, 2026

    Anthropic says Claude Mythos Preview finds and exploits zero-days

    In testing published April 7, 2026, Anthropic reported that its Claude Mythos Preview model could autonomously discover zero-day vulnerabilities and develop working exploits across major operating systems and browsers, outperforming earlier models on exploit-development benchmarks. The company said the model identified thousands of high- and critical-severity flaws, including a FreeBSD NFS server remote code execution (RCE) vulnerability tracked as CVE-2026-4747, and could also rapidly weaponize N-day Linux kernel vulnerabilities.

  3. Jan 30, 2026

    Study shows AI agents struggle in broad-scope and real-world root-cause hunts

    The Irregular and Wiz study also found that performance dropped when agents had to search a full attack surface without a defined entry point, increasing cost and reducing investigative depth. In a real-world AWS Bedrock anomaly case, an AI agent failed to identify the root cause, while a human analyst quickly traced it to an exposed RabbitMQ management interface protected only by default credentials.

  4. Jan 30, 2026

    Irregular and Wiz study finds AI agents solve 9 of 10 web security challenges

    A study by Irregular in collaboration with Wiz tested Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro on 10 web security CTF challenges derived from real-world incidents. The researchers found the leading models solved 9 of 10 challenges when given directed, per-site objectives using standard security tools.

  5. Jan 30, 2026

    Researchers observe attacker interest in exposed OpenClaw deployments

    Security researchers and vendors reported early signs of malicious interest in OpenClaw, including scanning for the agent’s default port and attempts to bypass authentication. They also warned of supply-chain risk tied to the project’s large contributor base and rapid development pace.

  6. Jan 30, 2026

    OpenClaw open-source AI agent rapidly gains adoption and scrutiny

    The open-source AI agent OpenClaw, previously called ClawdBot and MoltBot, rapidly became the fastest-growing project on GitHub. Its direct connections to email, files, messaging platforms, and system tools with autonomous capabilities prompted security concerns about enterprise deployment.

  7. Jan 30, 2026

    Anthropic says Claude simulated an Equifax-style data exfiltration attack

    In the same reported testing, Anthropic said Claude Sonnet 4.5 exfiltrated all simulated personal data in a high-fidelity Equifax-breach scenario using only a Bash shell on a Kali Linux host. The company attributed this to the model recognizing a public CVE and generating exploit code without needing iterative refinement.

  8. Jan 30, 2026

    Anthropic reports Claude Sonnet 4.5 can autonomously exploit known flaws

    An Anthropic blog post said current Claude models had improved cyber capabilities, including carrying out multistage attacks across networks with dozens of hosts using standard open-source tools. It reported that Claude Sonnet 4.5 succeeded in some tests without the custom cyber toolkit required by earlier model generations.
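
The exposed-RabbitMQ case in the timeline above shows how cheap such checks are for either side. The sketch below is an illustrative defensive probe (not taken from any of the cited studies) that a team could run against its own infrastructure: it builds a request to RabbitMQ's management HTTP API (the /api/overview endpoint on the standard management port 15672) using the guest/guest credentials the broker ships with. A 200 response on a non-loopback interface means the interface is exposed with default credentials:

```python
import base64
import urllib.request

def basic_auth_header(user: str, password: str) -> str:
    """HTTP Basic auth header value for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def overview_request(host: str, port: int = 15672) -> urllib.request.Request:
    """Build a probe against RabbitMQ's management API using guest/guest.

    Only run this against infrastructure you own or are authorized
    to test; a successful response indicates the misconfiguration
    seen in the AWS Bedrock anomaly case above.
    """
    req = urllib.request.Request(f"http://{host}:{port}/api/overview")
    req.add_header("Authorization", basic_auth_header("guest", "guest"))
    return req

# Usage (authorized testing only):
#   resp = urllib.request.urlopen(overview_request("10.0.0.5"), timeout=5)
#   print(resp.status)
```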
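
Similarly, the scanning activity reported against exposed OpenClaw deployments can be reproduced defensively with a basic TCP reachability sweep of your own address space. The port number below is a placeholder assumption, since the reporting does not state the agent's default port; substitute the value for your deployment:

```python
import socket

AGENT_PORT = 18789  # placeholder assumption -- verify against your deployment

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def find_exposed(hosts, port: int = AGENT_PORT):
    """Return the subset of hosts answering on the agent port.

    Any hit warrants an authentication review: an agent listener
    reachable beyond its own host is the exposure researchers
    observed attackers probing for.
    """
    return [h for h in hosts if port_open(h, port)]
```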

Related Stories

Security Risks and Offensive Potential of Agentic AI and Automated Vulnerability Discovery

Security leaders are warning that AI agents are increasingly operating as "digital employees" inside enterprise workflows (triaging alerts, coordinating investigations, and moving work across security tools), often with broad permissions and limited governance. The core risk highlighted is that organizations are deploying high-authority agents like plug-ins (reused service accounts, overbroad roles, weak oversight), creating fast-acting operators that can be manipulated and that lack the contextual judgment and policy awareness expected of human staff. Related commentary also raises concerns about AI-to-AI communication and "non-human-readable" behaviors that could reduce auditability and complicate investigations and control enforcement. In parallel, public examples show how quickly AI can accelerate vulnerability discovery: Microsoft Azure CTO Mark Russinovich reported using Claude Opus 4.6 to decompile decades-old Apple II 6502 machine code and identify multiple issues, underscoring that similar techniques could be applied to embedded and legacy firmware at scale. Anthropic has also cautioned that advanced models can find high-severity flaws even in heavily tested codebases, reinforcing the likelihood that both defenders and attackers will leverage AI for faster bug-finding. Separate enterprise IT coverage notes that organizations are reallocating budgets toward AI by consolidating tools and renegotiating contracts, which can indirectly increase security exposure if cost-cutting removes overlapping controls or if AI adoption outpaces governance and identity and access management maturity.

Yesterday
Security Risks From Self-Hosted Autonomous AI Agents (Clawdbot/Moltbot/OpenClaw)

Security researchers and vendors warned that self-hosted, agentic AI assistants, notably Clawdbot (rebranded as Moltbot and also referred to as OpenClaw), expand the enterprise attack surface by combining broad data access with the ability to take direct actions (browser control, messaging, email, and command execution). Resecurity reported finding hundreds of exposed deployments reachable from the public Internet, frequently with weak authentication, unsafe defaults, or misconfigurations that could allow attackers to access API keys and OAuth tokens, retrieve private chat histories, and in some cases achieve remote command execution on the host. Dark Reading similarly highlighted that OpenClaw's ecosystem can be undermined by malicious "skills" and fragile configuration and removal practices, reinforcing that these tools can be difficult to operate safely even when users attempt to limit permissions. CyberArk framed the issue as an identity security problem: autonomous agents often run with user-level permissions and integrate with platforms like Slack, WhatsApp, and GitHub, creating pathways for credential and token theft, data leakage, and unauthorized actions if the agent is exposed to untrusted content or deployed without strong controls. In contrast, Dark Reading's coverage of Shai-hulud focuses on a separate threat, self-propagating supply-chain worms targeting NPM projects, and is not directly about autonomous AI agents, though it underscores the broader risk of downstream compromise when widely used components or ecosystems are poisoned.

2 months ago
AI agent and LLM misuse drives new attack and governance risks

Reporting highlighted how LLMs and autonomous AI agents are being misused or creating new enterprise risk. Gambit Security described a month-long campaign in which an attacker allegedly jailbroke Anthropic's Claude via persistent prompting and role-play to generate vulnerability research, exploitation scripts, and automation used to compromise Mexican government systems, with the attacker reportedly switching to ChatGPT for additional tactics; the reporting claimed exploitation of roughly 20 vulnerabilities and theft of about 150 GB of data, including taxpayer and voter records. Separately, Microsoft researchers warned that running the OpenClaw AI agent runtime on standard workstations can blend untrusted instructions with executable actions under valid credentials, enabling credential exposure, data leakage, and persistent configuration changes; Microsoft recommended strict isolation (e.g., dedicated VMs or devices and constrained credentials), while other coverage noted tooling emerging to detect OpenClaw/MoltBot instances and vendors positioning alternative "safer" agent orchestration approaches. Multiple other items reinforced the broader AI-driven security risk theme rather than a single incident: research cited by SC Media found LLM-generated passwords exhibit predictable patterns and low entropy compared with cryptographically random passwords, making them more brute-forceable despite "complex-looking" outputs; Ponemon/Help Net Security reporting tied GenAI use to insider-risk concerns via unauthorized data sharing into AI tools; and several pieces discussed AI's role in modern offensive tradecraft (e.g., AI-enhanced phishing and deepfakes) and the expanding attack surface created by agentic systems. Many remaining references were unrelated breach reports, threat-actor activity, ransomware ecosystem analysis, or general commentary and marketing-style content that does not substantively address the Claude jailbreak incident or OpenClaw agent-runtime risk.

1 month ago
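
The "predictable patterns and low entropy" finding about LLM-generated passwords can be made concrete with a quick empirical measure. The sketch below is illustrative only (not the cited researchers' methodology): it computes a string's empirical Shannon entropy, where totals far below log2(alphabet size) bits per character flag strings drawn from a narrow, repetitive distribution:

```python
import math
from collections import Counter

def shannon_entropy_bits(s: str) -> float:
    """Total empirical Shannon entropy of a string, in bits.

    Per-character entropy is -sum(p * log2(p)) over observed character
    frequencies; multiplying by length gives total bits. Note this only
    measures repetition within one password, not predictability across
    many passwords, so it understates an attacker's real advantage.
    """
    if not s:
        return 0.0
    n = len(s)
    per_char = -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
    return per_char * n

# A "complex-looking" but repetitive string like "Aa1!Aa1!Aa1!" scores
# well under the ~78 bits a 12-character string drawn uniformly from 95
# printable ASCII characters would carry.
```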
