Prompt Injection and Jailbreak Techniques Targeting LLM-Powered Applications
Security researchers and vendors are warning that prompt injection and jailbreak techniques remain a leading risk for enterprise deployments of large language models (LLMs), enabling attackers to override system instructions, bypass safety controls, and potentially drive data exposure outcomes. Resecurity reports assisting a Fortune 100 organization where AI-powered banking and HR applications were targeted with prompt-injection attempts, emphasizing that these attacks exploit model behavior rather than traditional software flaws and can be used in scenarios such as extracting sensitive configuration data (for example, attempts to elicit content resembling /etc/passwd). Resecurity also cites OWASP’s 2025 Top 10 for LLM Applications, where prompt injection is ranked as the top issue, and frames continuous security testing (e.g., VAPT) as a key control for enterprise AI systems.
Separate research highlighted by Kaspersky describes a “poetry” jailbreak technique in which prompts framed as rhyming verse increased the likelihood that chatbots would produce disallowed or unsafe responses; the study tested this approach across 25 models from multiple vendors (including Anthropic, OpenAI, Google, Meta, DeepSeek, and xAI). In contrast, OpenAI’s planned upgrade to ChatGPT Temporary Chat is primarily a product/privacy change—adding optional personalization while keeping temporary chats out of history and model training (with possible retention for up to 30 days)—and does not describe a specific security incident or vulnerability disclosure tied to prompt injection or jailbreak research.
Timeline
Apr 30, 2026
Capital One proposes adaptive automated LLM red-teaming framework
Researchers from Capital One’s AI Foundations group introduced Adaptive Instruction Composition, an automated jailbreak testing framework that uses a contextual bandit to learn effective query-and-tactic combinations instead of relying on random combinations. In simulations against Mistral-7B and Llama models, the method reportedly more than doubled WildTeaming’s attack success rate and showed cross-model transferability of learned jailbreak strategies.
Apr 10, 2026
Trend Micro discloses 'sockpuppeting' jailbreak affecting 11 AI models
Trend Micro detailed a new black-box jailbreak technique called 'sockpuppeting' that abuses assistant-prefill support to inject a fake compliant response and bypass safety guardrails in 11 major LLMs. The researchers reported impacts including generation of malicious exploit code and disclosure of system prompts, and said API-level blocking of assistant prefills is the strongest defense.
Jan 25, 2026
Resecurity details prompt-injection risks and simulated data disclosure
Resecurity published an analysis describing prompt injection as a leading security risk for enterprise AI applications, outlining direct and indirect injection techniques and a scenario in which an AI HR assistant is manipulated into disclosing a simulated /etc/passwd file. The article also recommended mitigations such as least-privilege tool access, input and output validation, segregation of untrusted content, and continuous adversarial testing.
Jan 23, 2026
Study finds poetic prompts can jailbreak major LLMs
Researchers tested rhyming versions of malicious prompts from the MLCommons AILuminate Benchmark against 25 popular models and found that poetry significantly increased the likelihood of unsafe responses. Using a hand-picked set of 20 effective poetic prompts, they reported an average attack success rate of about 62%, with some models such as Gemini 1.5 Pro reportedly bypassed consistently under that metric.
See the full picture in Mallory
Mallory subscribers get deeper analysis on every story, including:
Who’s affected and how
Deep-dive technical analysis
Actionable next steps for your team
IPs, domains, hashes, and more
Ask questions and take action on every story
Filter by topic, classification, timeframe
Get matching stories delivered automatically
Sources
Related Stories

Prompt Injection and Jailbreak Attacks on Large Language Models
Recent research has demonstrated that large language models (LLMs) such as GPT-5 and others are increasingly vulnerable to prompt injection and jailbreak attacks, which can be exploited to bypass built-in safety guardrails and leak sensitive information. Attackers use techniques like prompt injection—embedding malicious instructions within seemingly benign queries—to trick LLMs into revealing confidential data, including user credentials and internal documents. A notable study by Icaro Lab, in collaboration with Sapienza University and DEXAI, found that adversarial prompts written as poetry could successfully bypass safety mechanisms in 62% of tested cases across 25 frontier models, with some models exceeding a 90% success rate. These findings highlight the sophistication and creativity of new attack vectors targeting AI systems, raising significant concerns for organizations embedding LLMs into business operations. The widespread adoption of LLMs in handling sensitive business functions amplifies the risk of data exfiltration through these advanced attack methods. As organizations increasingly rely on AI for customer service, document processing, and other critical tasks, the potential for prompt injection and poetic jailbreaks to facilitate unauthorized data access becomes a pressing security issue. The research underscores the urgent need for improved AI safety measures, robust prompt filtering, and continuous monitoring to mitigate the risks posed by these evolving adversarial techniques.
1 months ago
Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks
Cisco reported that **multi-turn jailbreak** techniques—iterative, conversational prompt sequences designed to erode safety guardrails—successfully bypassed protections in eight major **open-weight** large language models **92.78%** of the time, while single-turn prompt attempts were notably less effective. The findings, published in Cisco’s *State of AI Security* research and covered by multiple outlets, highlight that many enterprise AI deployments using downloadable, self-hosted models may be more vulnerable to sustained adversarial prompting than organizations assume. The report’s risk framing is amplified by broader concerns that model misuse and capability leakage can scale quickly: Anthropic separately alleged coordinated **model distillation** activity by Chinese AI labs using large volumes of fraudulent accounts and proxy infrastructure to extract advanced behaviors from *Claude*, warning that copied models may lack comparable safety controls and could be repurposed for malicious use. Related research coverage also notes that LLMs can sometimes be induced—via specialized prompting/jailbreaking methods—to reproduce near-verbatim copyrighted text from training data, underscoring that prompt-based attacks can drive both **policy bypass** and **data/content extraction** outcomes, particularly when guardrails are tested over extended interactions.
1 months ago
LLM Guardrail Bypass and Prompt Injection Weaknesses
Multiple writeups describe how **LLM safety controls can be bypassed through prompt-based attacks**, arguing that jailbreaks and prompt injection are a practical security problem rather than a novelty. The reporting highlights common defense layers—training-time alignment, system prompts, input classifiers, and output filters—and says each can fail because the same model that follows instructions is also asked to interpret and enforce them. One article frames jailbreaks as an attack on the trust architecture of enterprise AI deployments, while the other demonstrates the issue through Lakera’s *Gandalf* challenge, where progressively stronger controls are still defeated by prompt manipulation. The material is **not fluff** because it provides substantive security analysis of an emerging attack class affecting AI systems. Both references focus on the same topic: how prompts can subvert LLM defenses, expose protected information, and reveal architectural weaknesses in current guardrail designs. The practical takeaway for defenders is that natural-language controls alone are brittle, especially when secrets, policy enforcement, and user-controlled input share the same inference path, making prompt injection and jailbreak resistance a core application security concern for enterprise AI deployments.
1 months ago