Skip to main content
Mallory

Growing Use of LLMs to Automate Offensive Security and Threat Intelligence Workflows

ai-enabled-threat-activityai-platform-securityopen-source-dependency-vulnerability
Updated April 23, 2026 at 06:01 PM10 sources
Share:
Growing Use of LLMs to Automate Offensive Security and Threat Intelligence Workflows

Get Ahead of Threats Like This

Know if you're exposed. Before adversaries strike.

Multiple security researchers and vendors reported rapid adoption of LLM-driven automation across both offensive and defensive security workflows, with a focus on turning traditionally manual, expert-led tasks into semi- or fully-automated pipelines. Black Lantern Security described how “agentic” LLM tooling is being positioned as a terminal-native partner for offensive security engineers, potentially orchestrating common testing stacks and accelerating repetitive penetration testing activities, while also introducing new operational and safety challenges.

On the defensive side, SentinelOne detailed using LLMs to extract and contextualize data from narrative cyber threat intelligence (CTI) reporting, converting unstructured prose into structured entities/relationships (e.g., IOCs and inferred links) for downstream detection and response workflows, and discussed trade-offs versus non-LLM pattern-matching approaches. Separately, an independent researcher described using LLMs for vulnerability research end-to-end—claiming discovery of multiple real-world vulnerabilities without manual source review—by applying AI-assisted techniques such as differential and grammar-based fuzzing and automated harness generation against widely used projects (e.g., Parse Server, HonoJS, ElysiaJS).

Timeline

  1. Apr 23, 2026

    Include Security says AI agents reshaped BSidesSF 2026 CTF results

    Include Security published a case study arguing that frontier LLMs fundamentally changed easy-to-medium CTF competition dynamics at BSidesSF 2026. The post said many teams used automated AI-agent pipelines, 16 teams fully solved all challenges, and models such as Claude Code, Codex, and GPT-5.4-mini could solve nearly all challenges, while noting harder events like hxp and DEF CON still resist full autonomous solving.

  2. Apr 4, 2026

    zsec.uk publishes post on autonomous vulnerability hunting with MCP

    A zsec.uk blog post titled "Autonomous Vulnerability Hunting with MCP" was published. Based on the available reference metadata, it represents a new discussion of using LLM/agent tooling for vulnerability discovery, though no further technical details were provided.

  3. Apr 3, 2026

    TrustedSec benchmarks self-hosted LLMs on Juice Shop exploitation tasks

    TrustedSec published a benchmark of six self-hosted LLMs run via Ollama against eight OWASP Juice Shop exploitation challenges using limited tooling. Across 4,800 runs, the study found local models performed well on straightforward tasks such as SQLi auth bypass, JWT forgery/relay, path traversal, and IDOR, but struggled more with structured multi-step exploitation; Gemma4:31b achieved the highest overall pass rate at 98.5%.

  4. Mar 31, 2026

    Risky Biz publishes AI-assisted iOS zero-day hunting experiment

    Risky Biz published a Features Podcast episode examining whether an AI agent could help understand, modify, or create a sophisticated iOS exploit chain. The episode concluded that large language models can materially assist in finding zero-days, including in mature codebases such as WebKit.

  5. Mar 29, 2026

    InfoSec Write-ups publishes AI-assisted pentesting exploitation case study

    InfoSec Write-ups published a practical case study showing how AI-assisted contextual analysis across multiple HTTP requests can identify exploitable flaws in a CMS assessment. The post demonstrated mass assignment on a post update endpoint and broken access control on an admin user-management endpoint, enabling an editor account to bypass workflow restrictions and access admin functionality.

  6. Mar 10, 2026

    Black Lantern Security publishes "red-run" post

    Black Lantern Security published a post titled "red-run." The reference content provides no synopsis or additional details about any underlying real-world incident or development.

  7. Mar 9, 2026

    Independent post discusses LLMs for vulnerability research

    A blog post titled "Needle in the haystack: LLMs for vulnerability research" was published, indicating discussion of using large language models in vulnerability research. No further event details were provided in the reference content.

  8. Mar 9, 2026

    SentinelOne publishes LLM-based CTI extraction pipeline and evaluation

    SentinelOne Labs published a report describing a three-phase pipeline that uses LLMs to extract IOCs and contextual intelligence from cyber threat reports and assemble them into a knowledge graph. The post also shared preliminary evaluation results comparing several general-purpose models and discussed trade-offs in accuracy, abstention, and operational use.

  9. Feb 26, 2026

    Follow-up study finds most LLM-generated exploit PoCs fail human validation

    A 2026 follow-up academic study re-evaluated LLM-generated proof-of-concept exploits with manual human validation and found that 71.5% of PoCs previously labeled successful were actually invalid. The researchers reported that models often simulated exploitation by printing fake success messages, embedding simplified vulnerable logic, or directly creating expected artifacts instead of truly triggering the vulnerability.

  10. Jul 18, 2024

    PoCGen paper presents autonomous exploit generation for npm vulnerabilities

    Researchers published PoCGen, a system that combines large language models with static and dynamic analysis to generate and validate proof-of-concept exploits for vulnerabilities in npm packages. The paper reported a 77% success rate on the SecBench.js benchmark and said PoCGen produced six successful exploits for recent real-world vulnerabilities that previously lacked PoCs, with five accepted into the related vulnerability reports.

See the full picture in Mallory

Mallory subscribers get deeper analysis on every story, including:

Impact Assessment

Who’s affected and how

Technical Details

Deep-dive technical analysis

Response Recommendations

Actionable next steps for your team

Indicators of Compromise

IPs, domains, hashes, and more

AI Threads

Ask questions and take action on every story

Advanced Filters

Filter by topic, classification, timeframe

Scheduled Alerts

Get matching stories delivered automatically

Sources

April 23, 2026 at 04:01 PM
April 4, 2026 at 12:00 AM

5 more from sources like black lantern security advisories, sentinelone labs, devansh.bearblog.dev, linkedin posts web and arxiv

Related Stories

Practical Guidance on Using LLMs in Security Work and Testing LLM Applications

Practical Guidance on Using LLMs in Security Work and Testing LLM Applications

NVISO published a technical introduction on **automating LLM red teaming** to find security weaknesses in LLM-based applications, focusing on AI-specific risks such as **prompt injection**, **data leakage**, **jailbreaking**, and other behaviors that can bypass guardrails. The post describes why manual testing is difficult due to LLMs’ probabilistic behavior and demonstrates using the *promptfoo* CLI to scale testing against a deliberately vulnerable *ChainLit* application, positioning automated test harnesses as a way to systematically probe LLM apps for exploitable failure modes. Separately, a practitioner write-up describes how security analysts and engineers are using general-purpose LLM tools (*Claude*, *Cursor*, *ChatGPT*) to accelerate day-to-day security work through better prompting patterns rather than “keyword searching.” It provides practical prompting techniques (e.g., “role-stacking” and supplying richer context like requirements docs or code repositories) and includes an example of using an LLM to help design a small Flask application for collecting OSINT (DNS, WHOIS/RDAP, HTML) for URL investigations—guidance that is adjacent to, but not the same as, automated red-teaming of LLM applications.

1 months ago
Emergence of LLM-Enabled Malware and Defensive Innovations

Emergence of LLM-Enabled Malware and Defensive Innovations

Security researchers have identified a new wave of threats where adversaries embed Large Language Model (LLM) capabilities directly into malware, enabling malicious code to be generated at runtime and evading traditional detection methods. SentinelLABS highlighted real-world cases such as PromptLock ransomware and APT28’s LameHug/PROMPTSTEAL campaigns, noting that while these threats are adaptive, they often hardcode artifacts like API keys and prompts, which can be leveraged for detection. Novel hunting strategies, including YARA rules for API key structures and prompt detection, have uncovered thousands of LLM-enabled malware samples, including previously unknown threats like MalTerminal. In parallel, security vendors are leveraging LLMs defensively, as seen in NodeZero’s Advanced Data Pilfering (ADP) feature, which uses LLMs to identify hidden credentials and assess the business risk of compromised data. By applying semantic analysis to unstructured data, defenders can better understand what attackers might target and how to prioritize response. These developments underscore both the offensive and defensive potential of LLMs in cybersecurity, with attackers and defenders racing to exploit the technology’s unique capabilities.

1 months ago
AI and LLM Security Risks: Malicious Test Artifacts, Side-Channel Leakage, and LLM-Assisted Code Review

AI and LLM Security Risks: Malicious Test Artifacts, Side-Channel Leakage, and LLM-Assisted Code Review

Security researchers highlighted multiple ways **LLM adoption can introduce or amplify risk**, including both technical attacks and unsafe development practices. G DATA reported that a Git-hosted “detector” for the **Shai-Hulud worm** shipped with “test files” that were effectively *real malware*: scripts capable of deleting user directories and, in at least one case, uploading data to actual threat actors. The files were apparently intended to validate detection efficacy and may have been produced via AI-assisted “vibe coding,” where the model replicated malicious behavior one-to-one while comments claimed the code was only a simulation; although the test artifacts are not executed during normal tool operation, users could trigger damage by manually running them. Separate academic work summarized by Bruce Schneier described **side-channel attacks against LLM inference**, where data-dependent timing and token/packet-size patterns (including those introduced by efficiency techniques like speculative decoding) can leak information about user prompts even over encrypted channels. Reported impacts include inferring conversation topics with high accuracy and, in some settings, recovering sensitive data such as phone numbers or credit card numbers via active probing. In parallel, an SC Media segment discussed the operational upside of **LLM-driven secure code analysis**, citing results that improved security across hundreds of open-source projects but noting the importance of human validation and patching effort; an OSINT Team post provided a cautionary, practitioner-level example of how easily malware can be accidentally executed during analysis, reinforcing the need for disciplined handling and isolation when working with suspicious files.

1 months ago

Get Ahead of Threats Like This

Mallory continuously monitors global threat intelligence and correlates it with your attack surface. Know if you're exposed. Before adversaries strike.