RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

ICLR 2026 Conference SubmissionAnonymous Authors
Computer-Use AgentsAdversarial RisksSandboxBenchmark
Abstract:

Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection, where attackers embed malicious content into the environment to hijack agent behavior. Current evaluations of this threat either lack support for adversarial testing in realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration, and a setting that decouples adversarial evaluation from navigational limitations of CUAs by initializing tests directly at the point of an adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an Attack Success Rate (ASR) of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks with an Attempt Rate as high as 92.5%, although failing to complete them due to capability limitations. Nevertheless, we observe concerning ASRs of up to 50% in realistic end-to-end settings, indicating that CUA threats can already result in tangible risks to users and computer systems. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses to indirect prompt injection prior to real-world deployment.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RedTeamCUA, a framework for adversarial testing of computer-use agents against indirect prompt injection, and RTC-Bench, a benchmark with 864 examples. It resides in the 'Comprehensive Multi-Environment Testing Frameworks' leaf alongside two sibling papers. This leaf sits within the broader 'Adversarial Testing Frameworks and Benchmarks' branch, which itself is one of five major research directions in the field. The taxonomy contains 50 papers total, indicating a moderately active research area with multiple specialized sub-communities.

The paper's leaf neighbors include 'Specialized Domain Benchmarks' (four papers targeting specific applications like tool-integrated agents) and 'Black-Box Fuzzing and Automated Discovery' (four papers on automated vulnerability discovery). The broader taxonomy reveals parallel efforts in attack technique development, defense mechanisms, threat modeling, and empirical evaluations. The scope note for the paper's leaf emphasizes 'hybrid environments (web-OS, GUI, multi-modal)' and 'realistic scenario configuration,' distinguishing it from single-environment benchmarks. This positioning suggests the work aims to bridge gaps between isolated testing paradigms.

Among 25 candidates examined across three contributions, no clearly refuting prior work was identified. The hybrid sandbox contribution examined 10 candidates with zero refutations; the benchmark contribution similarly examined 10 with none refuting; the decoupled evaluation setting examined 5 with zero refutations. This suggests that within the limited search scope—focused on top semantic matches and citation expansion—the specific combination of hybrid web-OS sandboxing, decoupled evaluation, and comprehensive adversarial scenarios appears less directly overlapping with existing frameworks. However, the search scale (25 candidates, not hundreds) means unexplored literature may contain relevant comparisons.

Based on the limited literature search, the work appears to occupy a distinct position within comprehensive testing frameworks, particularly in its hybrid web-OS integration and decoupled evaluation design. The absence of refuting candidates among 25 examined suggests novelty in the specific technical approach, though the broader research direction (multi-environment adversarial testing) is clearly established with multiple active efforts. The analysis covers top semantic matches and immediate citations but does not claim exhaustive field coverage.

Taxonomy

Core-task Taxonomy Papers
50
3
Claimed Contributions
25
Contribution Candidate Papers Compared
0
Refutable Paper

Research Landscape Overview

Core task: adversarial testing of computer-use agents against indirect prompt injection. The field has organized itself around five main branches that collectively address how to discover, understand, and mitigate vulnerabilities in agents that interact with external environments. Adversarial Testing Frameworks and Benchmarks provide structured evaluation platforms—ranging from comprehensive multi-environment setups like RedTeamCUA[0] and EVA[3] to more focused benchmarks such as Agentdojo[8]—that systematically probe agent robustness. Attack Techniques and Vulnerability Analysis explores the mechanics of injection vectors, including cross-modal exploits and context manipulation strategies. Defense Mechanisms and Mitigation Strategies propose countermeasures such as input filtering, runtime monitoring, and architectural safeguards to harden agents. Threat Modeling and Security Analysis offers conceptual frameworks for reasoning about adversarial scenarios and risk surfaces, while Empirical Security Evaluations and Comparative Studies measure real-world performance across different agent designs and defenses. A particularly active line of work centers on building holistic testing frameworks that span diverse environments and attack surfaces. RedTeamCUA[0] exemplifies this trend by offering a comprehensive multi-environment approach to red-teaming computer-use agents, situating itself alongside EVA[3] and Agentdojo[8], which similarly emphasize broad coverage and reproducible adversarial scenarios. In contrast, works like Benchmarking Indirect Injection[2] and Cross-Modal Injection[4] zoom in on specific attack modalities, exploring how malicious content can be embedded in web pages or multimodal inputs. Meanwhile, defense-oriented studies such as AgentVigil[5] and IPIGuard[7] investigate runtime detection and filtering techniques, highlighting ongoing trade-offs between security overhead and agent utility. RedTeamCUA[0] occupies a central position within the comprehensive testing cluster, emphasizing scalable adversarial evaluation across multiple interaction contexts, whereas neighboring frameworks like EVA[3] may prioritize different environment types or evaluation metrics. These contrasting emphases reflect open questions about which testing paradigms best capture real-world deployment risks and how to balance breadth with depth in adversarial coverage.

Claimed Contributions

REDTEAMCUA adversarial testing framework with hybrid sandbox

The authors introduce REDTEAMCUA, a framework that combines a VM-based operating system environment with Docker-based web platforms to enable realistic and controlled adversarial testing of computer-use agents across both web and OS interfaces. The hybrid sandbox supports configurable adversarial scenario injection and a decoupled evaluation setting that separates adversarial robustness testing from navigational capability limitations.

10 retrieved papers
RTC-BENCH comprehensive adversarial benchmark

The authors construct RTC-BENCH, a benchmark comprising 864 test examples designed to evaluate CUA vulnerabilities to indirect prompt injection. The benchmark systematically explores hybrid web-OS attack pathways by coupling 9 benign goals with 24 adversarial goals based on the CIA security framework, with variations in instruction specificity and injection content type.

10 retrieved papers
Decoupled evaluation setting for focused vulnerability analysis

The authors introduce a Decoupled Eval setting that uses pre-processed actions to place agents directly at the adversarial injection site, isolating adversarial robustness assessment from navigation limitations. This enables focused analysis of CUA vulnerabilities when directly exposed to malicious content, independent of the agent's ability to navigate to the injection point.

5 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

REDTEAMCUA adversarial testing framework with hybrid sandbox

The authors introduce REDTEAMCUA, a framework that combines a VM-based operating system environment with Docker-based web platforms to enable realistic and controlled adversarial testing of computer-use agents across both web and OS interfaces. The hybrid sandbox supports configurable adversarial scenario injection and a decoupled evaluation setting that separates adversarial robustness testing from navigational capability limitations.

Contribution

RTC-BENCH comprehensive adversarial benchmark

The authors construct RTC-BENCH, a benchmark comprising 864 test examples designed to evaluate CUA vulnerabilities to indirect prompt injection. The benchmark systematically explores hybrid web-OS attack pathways by coupling 9 benign goals with 24 adversarial goals based on the CIA security framework, with variations in instruction specificity and injection content type.

Contribution

Decoupled evaluation setting for focused vulnerability analysis

The authors introduce a Decoupled Eval setting that uses pre-processed actions to place agents directly at the adversarial injection site, isolating adversarial robustness assessment from navigation limitations. This enables focused analysis of CUA vulnerabilities when directly exposed to malicious content, independent of the agent's ability to navigate to the injection point.