Monday, August 4, 2025

LegalPwn Attack Exploits Gemini, ChatGPT and other AI Tools into Executing Malware

Share

A sophisticated new attack method that exploits AI models’ tendency to comply with legal-sounding text, successfully bypassing safety measures in popular development tools.

A study by Pangea AI Security has revealed a novel prompt injection technique dubbed “LegalPwn” that weaponizes legal disclaimers, copyright notices, and terms of service to manipulate large language models (LLMs) into executing malicious code.

The attack has proven effective against major AI tools, including GitHub Copilot, Google’s Gemini CLI, ChatGPT, and several other prominent models.


Google News

LegalPwn works by embedding malicious instructions within legitimate-looking legal text that AI models are programmed to respect and process.

Rather than using obvious adversarial prompts, attackers disguise their payload within familiar legal language such as copyright violation warnings, confidentiality notices, or terms of service violations.

Disclaimer Weaponized
Disclaimer Weaponized

“The ability of these models to interpret and contextualize information, while a core strength, can also be a weakness when subtle adversarial instructions are embedded within trusted or seemingly innocuous text,” the researchers explain in their report.

LegalPwn Attack method

The technique proved remarkably effective during testing. When researchers presented malicious code containing a reverse shell (which provides remote system access to attackers) wrapped in legal disclaimers, multiple AI systems failed to identify the security threat. Instead, they classified the dangerous code as safe, with some tools even recommending its execution.

The research team successfully demonstrated LegalPwn attacks in live environments with alarming results. GitHub Copilot, Microsoft’s AI coding assistant, completely missed a reverse shell payload hidden within what appeared to be a simple calculator program, describing the malicious code merely as “a calculator.”

Even more concerning, Google’s Gemini CLI not only failed to detect the threat but actively recommended that users accept and execute the malicious command, which would have provided attackers with complete remote control over the target system.

The malicious payload used in testing was a C program that appeared to be a basic arithmetic calculator but contained a hidden pwn() function.

Attack Result
Attack Result

When triggered during an addition operation, this function would establish a connection to an attacker-controlled server and spawn a remote shell, effectively compromising the entire system.

Testing across 12 major AI models revealed that approximately two-thirds are vulnerable to LegalPwn attacks under certain conditions. ChatGPT 4o, Gemini 2.5, various Grok models, LLaMA 3.3, and DeepSeek Qwen all demonstrated susceptibility to the technique in multiple test scenarios.

AI Models Test
AI Models Test

However, not all models were equally vulnerable. Anthropic’s Claude models (both 3.5 Sonnet and Sonnet 4), Microsoft’s Phi 4, and Meta’s LLaMA Guard 4 consistently resisted the attacks, correctly identifying malicious code and refusing to comply with misleading instructions.

The effectiveness of LegalPwn attacks varied depending on how the AI systems were configured. Models without specific safety instructions were most vulnerable, while those with strong system prompts emphasizing security performed significantly better.

The discovery highlights a critical blind spot in AI security, particularly concerning applications where LLMs process user-generated content, external documents, or internal system texts containing disclaimers.

The attack vector is especially dangerous because legal text is ubiquitous in software development environments and typically processed without suspicion.

Security experts warn that LegalPwn represents more than just a theoretical threat. The technique’s success in bypassing commercial AI security tools demonstrates that attackers could potentially use similar methods to manipulate AI systems into performing unauthorized operations, compromising system integrity, or leaking sensitive information.

Researchers recommend several mitigation strategies, including implementing AI-powered guardrails specifically designed to detect prompt injection attempts, maintaining human oversight for high-stakes applications, and incorporating adversarial training scenarios into LLM development. Enhanced input validation that analyzes semantic intent rather than relying on simple keyword filtering is also crucial.

Integrate ANY.RUN TI Lookup with your SIEM or SOAR To Analyses Advanced Threats -> Try 50 Free Trial Searches

Tarun Chhetri
Tarun Chhetri
We love Tech, AI, Cybersecurity, Startups, Business, Skills, Sports.

Table of contents

Read more

Local News

Follow Us