RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

ICLR 2026 Conference SubmissionAnonymous Authors

OpenReview Score: 6.0 Download Report PDF

Computer-Use AgentsAdversarial RisksSandboxBenchmark

Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection, where attackers embed malicious content into the environment to hijack agent behavior. Current evaluations of this threat either lack support for adversarial testing in realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration, and a setting that decouples adversarial evaluation from navigational limitations of CUAs by initializing tests directly at the point of an adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an Attack Success Rate (ASR) of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks with an Attempt Rate as high as 92.5%, although failing to complete them due to capability limitations. Nevertheless, we observe concerning ASRs of up to 50% in realistic end-to-end settings, indicating that CUA threats can already result in tangible risks to users and computer systems. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses to indirect prompt injection prior to real-world deployment.

Abstract:

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RedTeamCUA, a framework for adversarial testing of computer-use agents against indirect prompt injection, and RTC-Bench, a benchmark with 864 examples. It resides in the 'Comprehensive Multi-Environment Testing Frameworks' leaf alongside two sibling papers. This leaf sits within the broader 'Adversarial Testing Frameworks and Benchmarks' branch, which itself is one of five major research directions in the field. The taxonomy contains 50 papers total, indicating a moderately active research area with multiple specialized sub-communities.

The paper's leaf neighbors include 'Specialized Domain Benchmarks' (four papers targeting specific applications like tool-integrated agents) and 'Black-Box Fuzzing and Automated Discovery' (four papers on automated vulnerability discovery). The broader taxonomy reveals parallel efforts in attack technique development, defense mechanisms, threat modeling, and empirical evaluations. The scope note for the paper's leaf emphasizes 'hybrid environments (web-OS, GUI, multi-modal)' and 'realistic scenario configuration,' distinguishing it from single-environment benchmarks. This positioning suggests the work aims to bridge gaps between isolated testing paradigms.

Among 25 candidates examined across three contributions, no clearly refuting prior work was identified. The hybrid sandbox contribution examined 10 candidates with zero refutations; the benchmark contribution similarly examined 10 with none refuting; the decoupled evaluation setting examined 5 with zero refutations. This suggests that within the limited search scope—focused on top semantic matches and citation expansion—the specific combination of hybrid web-OS sandboxing, decoupled evaluation, and comprehensive adversarial scenarios appears less directly overlapping with existing frameworks. However, the search scale (25 candidates, not hundreds) means unexplored literature may contain relevant comparisons.

Based on the limited literature search, the work appears to occupy a distinct position within comprehensive testing frameworks, particularly in its hybrid web-OS integration and decoupled evaluation design. The absence of refuting candidates among 25 examined suggests novelty in the specific technical approach, though the broader research direction (multi-environment adversarial testing) is clearly established with multiple active efforts. The analysis covers top semantic matches and immediate citations but does not claim exhaustive field coverage.

Taxonomy

Core-task Taxonomy Papers

Claimed Contributions

Contribution Candidate Papers Compared

Refutable Paper

Research Landscape Overview

Core task: adversarial testing of computer-use agents against indirect prompt injection. The field has organized itself around five main branches that collectively address how to discover, understand, and mitigate vulnerabilities in agents that interact with external environments. Adversarial Testing Frameworks and Benchmarks provide structured evaluation platforms—ranging from comprehensive multi-environment setups like RedTeamCUA[0] and EVA[3] to more focused benchmarks such as Agentdojo[8]—that systematically probe agent robustness. Attack Techniques and Vulnerability Analysis explores the mechanics of injection vectors, including cross-modal exploits and context manipulation strategies. Defense Mechanisms and Mitigation Strategies propose countermeasures such as input filtering, runtime monitoring, and architectural safeguards to harden agents. Threat Modeling and Security Analysis offers conceptual frameworks for reasoning about adversarial scenarios and risk surfaces, while Empirical Security Evaluations and Comparative Studies measure real-world performance across different agent designs and defenses. A particularly active line of work centers on building holistic testing frameworks that span diverse environments and attack surfaces. RedTeamCUA[0] exemplifies this trend by offering a comprehensive multi-environment approach to red-teaming computer-use agents, situating itself alongside EVA[3] and Agentdojo[8], which similarly emphasize broad coverage and reproducible adversarial scenarios. In contrast, works like Benchmarking Indirect Injection[2] and Cross-Modal Injection[4] zoom in on specific attack modalities, exploring how malicious content can be embedded in web pages or multimodal inputs. Meanwhile, defense-oriented studies such as AgentVigil[5] and IPIGuard[7] investigate runtime detection and filtering techniques, highlighting ongoing trade-offs between security overhead and agent utility. RedTeamCUA[0] occupies a central position within the comprehensive testing cluster, emphasizing scalable adversarial evaluation across multiple interaction contexts, whereas neighboring frameworks like EVA[3] may prioritize different environment types or evaluation metrics. These contrasting emphases reflect open questions about which testing paradigms best capture real-world deployment risks and how to balance breadth with depth in adversarial coverage.

Claimed Contributions

REDTEAMCUA adversarial testing framework with hybrid sandbox

10 retrieved papers

The authors introduce REDTEAMCUA, a framework that combines a VM-based operating system environment with Docker-based web platforms to enable realistic and controlled adversarial testing of computer-use agents across both web and OS interfaces. The hybrid sandbox supports configurable adversarial scenario injection and a decoupled evaluation setting that separates adversarial robustness testing from navigational capability limitations.

10 retrieved papers

RTC-BENCH comprehensive adversarial benchmark

10 retrieved papers

The authors construct RTC-BENCH, a benchmark comprising 864 test examples designed to evaluate CUA vulnerabilities to indirect prompt injection. The benchmark systematically explores hybrid web-OS attack pathways by coupling 9 benign goals with 24 adversarial goals based on the CIA security framework, with variations in instruction specificity and injection content type.

10 retrieved papers

Decoupled evaluation setting for focused vulnerability analysis

5 retrieved papers

The authors introduce a Decoupled Eval setting that uses pre-processed actions to place agents directly at the adversarial injection site, isolating adversarial robustness assessment from navigation limitations. This enables focused analysis of CUA vulnerabilities when directly exposed to malicious content, independent of the agent's ability to navigate to the injection point.

5 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[3] EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection PDF

Lu Yijie, Yijie Lu, Zhao, Manman, Tianjie Ju, Ma, Xinbei, Manman Zhao, Guo Yuan, Xinbei Ma, Zhang, Zhuosheng, Yuan Guo, Zhuosheng Zhang (2025)

[8] Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents PDF

Mislav Balunovic, Luca Beurer-Kellner, Edoardo Debenedetti, Marc Fischer, Florian Tramer, Jie Zhang (2024)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

REDTEAMCUA adversarial testing framework with hybrid sandbox

[25] A systematization of security vulnerabilities in computer use agents PDF

Cannot Refute

[35] The Commercial Landscape of Agentic AI Security PDF

Cannot Refute

[58] Coral: Container online risk assessment with logical attack graphs PDF

Cannot Refute

[59] EVMFuzz: Differential fuzz testing of Ethereum virtual machine PDF

Cannot Refute

[60] Secure software development and testing: A model-based methodology PDF

Cannot Refute

[61] AI-Optimized Network Function Virtualization Security in Cloud Infrastructure PDF

Cannot Refute

[62] Laccolith: Hypervisor-based adversary emulation with anti-detection PDF

Cannot Refute

[63] A Review of TRiSM Frameworks in Artificial Intelligence Systems: Fundamentals, Taxonomy, Use Cases, Key Challenges and Future Directions PDF

Cannot Refute

[64] Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing PDF

Cannot Refute

[65] Torpedo: A Fuzzing Framework for Discovering Adversarial Container Workloads PDF

Cannot Refute

Contribution

RTC-BENCH comprehensive adversarial benchmark

[5] AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents PDF

Cannot Refute

[14] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents PDF

Cannot Refute

[15] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents PDF

Cannot Refute

[51] VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents PDF

Cannot Refute

[52] [Short Paper] Forensic Analysis of Indirect Prompt Injection Attacks on LLM Agents PDF

Cannot Refute

[53] Agentharm: A benchmark for measuring harmfulness of llm agents PDF

Cannot Refute

[54] {SelfDefend}:{LLMs} can defend themselves against jailbreaking in a practical manner PDF

Cannot Refute

[55] Llamafirewall: An open source guardrail system for building secure ai agents PDF

Cannot Refute

[56] OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents PDF

Cannot Refute

[57] Jailjudge: A comprehensive jailbreak judge benchmark with multi-agent enhanced explanation evaluation framework PDF

Cannot Refute

Contribution

Decoupled evaluation setting for focused vulnerability analysis

[66] Adaptive decentralized knowledge networks uniting causal generative models, federated optimization, and cryptographic proofs for scalable autonomous â¦ PDF

Cannot Refute

[67] Resilient distributed state estimation with mobile agents: overcoming Byzantine adversaries, communication losses, and intermittent measurements PDF

Cannot Refute

[68] Robust and self-repairing formation control for swarms of mobile agents PDF

Cannot Refute

[69] Security of deep reinforcement learning PDF

Cannot Refute

[70] Mobility Robustness Optimization for Dynamic Cell ON/OFF Small-Cell Networks: Independent DQN-Based Multi-Agent Reinforcement Learning Approach PDF

Cannot Refute

RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Overview

Overall Novelty Assessment

Taxonomy

Research Landscape Overview

Claimed Contributions

Core Task Comparisons

[3] EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection PDF

[8] Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents PDF

Contribution Analysis

REDTEAMCUA adversarial testing framework with hybrid sandbox

[25] A systematization of security vulnerabilities in computer use agents PDF

[35] The Commercial Landscape of Agentic AI Security PDF

[58] Coral: Container online risk assessment with logical attack graphs PDF

[59] EVMFuzz: Differential fuzz testing of Ethereum virtual machine PDF

[60] Secure software development and testing: A model-based methodology PDF

[61] AI-Optimized Network Function Virtualization Security in Cloud Infrastructure PDF

[62] Laccolith: Hypervisor-based adversary emulation with anti-detection PDF

[63] A Review of TRiSM Frameworks in Artificial Intelligence Systems: Fundamentals, Taxonomy, Use Cases, Key Challenges and Future Directions PDF

[64] Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing PDF

[65] Torpedo: A Fuzzing Framework for Discovering Adversarial Container Workloads PDF

RTC-BENCH comprehensive adversarial benchmark

[5] AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents PDF

[14] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents PDF

[15] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents PDF

[51] VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents PDF

[52] [Short Paper] Forensic Analysis of Indirect Prompt Injection Attacks on LLM Agents PDF

[53] Agentharm: A benchmark for measuring harmfulness of llm agents PDF

[54] {SelfDefend}:{LLMs} can defend themselves against jailbreaking in a practical manner PDF

[55] Llamafirewall: An open source guardrail system for building secure ai agents PDF

[56] OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents PDF

[57] Jailjudge: A comprehensive jailbreak judge benchmark with multi-agent enhanced explanation evaluation framework PDF

Decoupled evaluation setting for focused vulnerability analysis

[66] Adaptive decentralized knowledge networks uniting causal generative models, federated optimization, and cryptographic proofs for scalable autonomous â¦ PDF

[67] Resilient distributed state estimation with mobile agents: overcoming Byzantine adversaries, communication losses, and intermittent measurements PDF

[68] Robust and self-repairing formation control for swarms of mobile agents PDF

[69] Security of deep reinforcement learning PDF

[70] Mobility Robustness Optimization for Dynamic Cell ON/OFF Small-Cell Networks: Independent DQN-Based Multi-Agent Reinforcement Learning Approach PDF

Table of Contents

[66] Adaptive decentralized knowledge networks uniting causal generative models, federated optimization, and cryptographic proofs for scalable autonomous â¦ PDF