Skip to main content
Mallory

AI Workflow and Agent Security Risks: Prompt Injection, Credential Leakage, and Recommendation Poisoning

ai-platform-securitydata-exfiltration-methodidentity-authentication-vulnerabilitybuild-pipeline-compromiseleaked-secret-api-key
Updated April 28, 2026 at 03:03 PM6 sources
Share:
AI Workflow and Agent Security Risks: Prompt Injection, Credential Leakage, and Recommendation Poisoning

Get Ahead of Threats Like This

Know if you're exposed. Before adversaries strike.

Multiple reports warn that the most immediate AI security risk is attackers hijacking trusted workflows—AI copilots/agents, CI pipelines, SaaS admin planes, and identity control points—rather than “AI” being a standalone threat category. Commentary and research highlight how prompt-injection-style techniques can turn normal user actions (e.g., clicking a legitimate-looking link) into silent data exfiltration or unsafe tool use, and how autonomous agents can still complete scams even when they can correctly label a page as phishing. 1Password introduced an open-source benchmark, Security Comprehension and Awareness Measure (SCAM), to test whether AI agents behave safely in realistic workplace tasks (email triage, link clicking, retrieving credentials from a vault, and form-filling) using production-like APIs; in testing, models that could identify phishing when asked still proceeded to retrieve and submit real credentials during routine workflows.

Microsoft research described AI recommendation poisoning affecting 31 companies across 14 industries, where hidden instructions embedded in “Summarize with AI” links attempt to inject persistent directives into an assistant’s memory via URL prompt parameters, biasing future recommendations (e.g., prioritizing a specific domain/company). Separately, identity-focused analysis argues that as AI increases automation and API-driven decisioning, identity becomes the enterprise control plane, making IAM architecture and resilience (including where policy evaluation and authorization live) a central security concern at “AI scale.” Two SC Media opinion pieces broaden the theme: one ties recent supply-chain and developer-workflow compromises (e.g., malicious packages/actions and token theft) to the same “trusted workflow” abuse pattern, while another discusses mobile apps as an early-warning surface for supply-chain risk (including AI arriving via third-party SDKs), but it is more forward-looking guidance than incident reporting.

Timeline

  1. Apr 19, 2026

    Researchers disclose AI agent integration flaws and vendors dispute severity

    Researchers reported that Anthropic Claude Code Security Review, Google Gemini CLI Action, and Microsoft GitHub Copilot integrations with GitHub Actions could be abused to steal API keys and access tokens. The article also describes a separate dispute over Anthropic’s Model Context Protocol design, which researchers said could expose up to 200,000 servers, while vendors reportedly paid bug bounties but did not issue CVEs or public advisories for the platform-level issues.

  2. Feb 12, 2026

    Microsoft publishes research on AI recommendation poisoning

    Microsoft disclosed research showing that assistants such as ChatGPT, Claude, Grok, and Microsoft 365 Copilot can be manipulated through hidden instructions embedded in 'Summarize with AI' links and URL prompt parameters. It also shared threat-hunting guidance for detecting these links in email and Microsoft Teams messages.

  3. Feb 12, 2026

    1Password open-sources the SCAM AI agent safety benchmark

    1Password released the Security Comprehension and Awareness Measure (SCAM) under the MIT License, along with tooling to replay scenarios and export video results. The benchmark is intended to help researchers and enterprises evaluate whether AI agents behave safely in realistic workflows.

  4. Feb 12, 2026

    1Password finds security guidance sharply reduces AI agent failures

    In SCAM testing, 1Password found that providing a short security-skills document significantly reduced critical failures and, for several models, eliminated them across repeated runs. Some models still remained inconsistent or continued to fail specific scenarios, including forwarding notes containing embedded passwords and access keys.

  5. Feb 12, 2026

    1Password tests frontier AI agents with SCAM benchmark scenarios

    1Password evaluated eight AI models across 30 realistic workplace security scenarios, finding safety scores ranging from 35% to 92% and critical failures in every model under baseline conditions. The tests showed agents could recognize phishing in isolation yet still perform unsafe actions such as entering credentials into attacker-controlled pages or forwarding secrets.

  6. Dec 14, 2025

    Microsoft observes AI recommendation-poisoning attempts across 31 companies

    Over a 60-day period, Microsoft recorded 50 unique prompt-based memory-poisoning attempts tied to 31 companies, showing that hidden instructions in AI-summary links were being used to manipulate assistant recommendations. The activity was attributed largely to legitimate businesses rather than typical cybercriminal SEO operators.

  7. Jan 17, 2025

    NIST publishes AI agent hijacking evaluation research

    NIST's Center for AI Standards and Innovation published research on AI agent hijacking, using the open-source AgentDojo framework and Claude 3.5 Sonnet agents to measure how indirect prompt injection attacks can drive harmful actions. The study introduced new attacks and high-impact scenarios such as remote code execution, database exfiltration, and automated phishing, and found that repeated attack attempts significantly increased compromise rates.

See the full picture in Mallory

Mallory subscribers get deeper analysis on every story, including:

Impact Assessment

Who’s affected and how

Technical Details

Deep-dive technical analysis

Response Recommendations

Actionable next steps for your team

Indicators of Compromise

IPs, domains, hashes, and more

AI Threads

Ask questions and take action on every story

Advanced Filters

Filter by topic, classification, timeframe

Scheduled Alerts

Get matching stories delivered automatically

Related Stories

Security Risks and Offensive Potential of Agentic AI and Automated Vulnerability Discovery

Security Risks and Offensive Potential of Agentic AI and Automated Vulnerability Discovery

Security leaders are warning that **AI agents are increasingly operating as “digital employees”** inside enterprise workflows—triaging alerts, coordinating investigations, and moving work across security tools—often with **broad permissions and limited governance**. The core risk highlighted is that organizations are deploying high-authority agents like plug-ins (reused service accounts, overbroad roles, weak oversight), creating fast-acting operators that can be manipulated and that lack the contextual judgment and policy awareness expected of human staff. Related commentary also raises concerns about **AI-to-AI communication** and “non-human-readable” behaviors that could reduce auditability and complicate investigations and control enforcement. In parallel, public examples show how quickly AI can accelerate **vulnerability discovery**: Microsoft Azure CTO Mark Russinovich reported using *Claude Opus 4.6* to decompile decades-old Apple II 6502 machine code and identify multiple issues, underscoring that similar techniques could be applied to **embedded/legacy firmware at scale**. Anthropic has also cautioned that advanced models can find high-severity flaws even in heavily tested codebases, reinforcing the likelihood that both defenders and attackers will leverage AI for faster bug-finding. Separate enterprise IT coverage notes that organizations are **reallocating budgets toward AI** by consolidating tools and renegotiating contracts, which can indirectly increase security exposure if cost-cutting reduces overlapping controls or if AI adoption outpaces governance and identity/access management maturity.

Yesterday
AI’s Impact on Secure Coding, Security Operations, and Workforce Strain

AI’s Impact on Secure Coding, Security Operations, and Workforce Strain

Security leaders and practitioners are increasingly framing **AI** as both a force-multiplier for defenders and a risk amplifier for software and operations. Commentary and executive guidance highlighted that AI-assisted fuzzing, static analysis, and large-scale pattern recognition can surface vulnerabilities faster than traditional review, but that faster discovery does not automatically reduce enterprise risk because real-world impact depends on exposure, identity/privilege design, data flows, and business process dependencies. Separately, industry guidance on “rolling out AI” emphasized practical governance measures—knowledge-sharing, partnering, and automation—arguing that the same capabilities that make AI valuable also expand the attack surface and the speed at which threats evolve. Operational reporting also underscored how AI-related and traditional threats are converging in day-to-day security work. A monthly security briefing cited rapid weaponization of a critical BeyondTrust Remote Support pre-auth RCE (**CVE-2026-1731**) with proof-of-concept and exploitation observed shortly after disclosure, later treated as a zero-day and reportedly used in ransomware activity; it also noted emerging integrity risks such as **AI recommendation poisoning** (manipulating AI-generated outputs via hidden instructions) and an AI tooling supply-chain incident involving an unintended update to the *Cline CLI* coding assistant after a compromised token. In parallel, survey results pointed to sustained **workforce burnout**—U.S. security professionals averaging significant weekly overtime and reporting emotional exhaustion—while also indicating a skills shift toward communication and stakeholder management as AI tooling adoption increases cross-functional demands.

Today
Enterprise Security Risks From Agentic and Generative AI Deployments

Enterprise Security Risks From Agentic and Generative AI Deployments

Enterprises are rapidly integrating **agentic AI** assistants with high-privilege connections to ticketing systems, source code repositories, chat platforms, and cloud dashboards, enabling actions such as opening pull requests, querying internal databases, and triggering automated workflows with limited human oversight. Reporting citing Cisco’s *State of AI Security 2026* indicates many organizations are moving forward with these deployments despite low security readiness, expanding exposure across model interfaces, tool integrations, and the broader supply chain. Multiple sources highlight that attacker techniques against AI systems are maturing, particularly **prompt injection/jailbreaks** and multi-turn attacks that exploit session state, memory, and tool-calling to drive unsafe actions or data leakage. Separately, adversaries are using generative AI for **deepfake-enabled social engineering** (including video/voice impersonation to bypass identity verification and authorize sensitive actions) and for scalable brand impersonation via malicious ad campaigns; one widely cited example involved Arup, where a deepfake video call led to authorization of a fraudulent HK$200 million transfer. Overall, the material is primarily risk and threat reporting (not a single incident), emphasizing that AI systems’ contextual behavior and privileged integrations create new control gaps that traditional security testing and defenses may not detect.

1 months ago

Get Ahead of Threats Like This

Mallory continuously monitors global threat intelligence and correlates it with your attack surface. Know if you're exposed. Before adversaries strike.