How Dark Patterns Manipulate Web Agents

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Agents, Redteaming, Evaluations, Reasoning, Foundation Models

Abstract:

Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in steering agent trajectories, posing a significant risk to agent robustness. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 850 web navigation tasks with dark patterns (600 generated tasks and 250 real-world tasks), designed to measure instruction-following success and dark pattern effectiveness. Across SOTA agents, we find dark patterns successfully steer agent trajectories towards malicious outcomes in over 70% of tested generated and real-world tasks. Moreover, we find that dark pattern effectiveness correlates positively with model size and test-time reasoning, making larger, more capable models more susceptible. Leading countermeasures against adversarial attacks, including in-context prompting and guardrail models, fail to consistently reduce the success rate of dark pattern interventions. Our findings reveal dark patterns as a latent and unmitigated risk to web agents, highlighting the urgent need for robust defenses against manipulative designs.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DECEPTICON, a benchmark environment for testing web agent robustness against dark patterns through 850 isolated tasks (600 synthetic, 250 real-world). It resides in the 'Isolated Dark Pattern Testing Environments' leaf alongside two sibling papers within a taxonomy of 15 total works. This leaf represents a focused research direction under 'Benchmark Development and Empirical Evaluation,' suggesting a moderately sparse but active area where controlled experimental approaches to dark pattern susceptibility are emerging as a distinct methodological paradigm.

The taxonomy reveals neighboring work in 'Human-AI Comparative Studies' examining cross-population susceptibility, 'Deceptive Feedback in Multi-Agent Workflows' exploring adversarial judge models, and 'Environmental Injection Attacks' targeting visual perception in mobile contexts. The paper's focus on isolated UI manipulation distinguishes it from these adjacent directions: it excludes multi-agent feedback systems and dynamic environmental corruption, instead concentrating on static web-based deceptive design elements. The taxonomy's scope notes clarify that this work sits at the intersection of behavioral manipulation measurement and systematic benchmark construction, diverging from broader security taxonomies and phishing detection paradigms.

Among 30 candidates examined, the DECEPTICON environment contribution shows overlap with 2 prior works out of 10 candidates reviewed, suggesting some precedent in benchmark-driven dark pattern testing. The operationalized taxonomy of dark patterns by attack mode found no clear refutations across 10 candidates, indicating potential novelty in classification approach. The adversarial generation pipeline similarly encountered no refutations among 10 candidates, though the limited search scope means undiscovered prior work in synthetic task generation remains possible. The correlation findings between model capability and susceptibility appear less explored in the examined literature.

Based on top-30 semantic matches and citation expansion, the work appears to advance a relatively nascent research direction where systematic, isolated testing of dark patterns on agents is still developing methodological foundations. The analysis covers benchmark construction and empirical evaluation but does not exhaustively survey all adversarial robustness literature or real-world deployment studies, leaving open questions about how findings generalize beyond controlled environments and whether the taxonomy comprehensively captures all dark pattern attack modes.

Taxonomy

Research Landscape Overview

Core task: Robustness of web agents against deceptive user interface designs. The field examines how autonomous agents navigate web environments that deliberately mislead or manipulate users through dark patterns, phishing attempts, and other adversarial UI elements. The taxonomy reveals four main branches: Dark Pattern Susceptibility and Manipulation focuses on how agents respond to manipulative design tactics such as hidden costs or forced actions; Privacy and Security Threat Analysis investigates vulnerabilities including prompt injection and credential theft; Phishing and Fraudulent Content Detection addresses agents' ability to identify malicious sites and scams; and Contextual Awareness and Specialized Domains explores challenges in culturally diverse settings and domain-specific tasks.

Works like Dark Patterns GUI Agents[1] and Susbench[3] exemplify benchmark-driven evaluation of agent robustness, while studies such as Phishing Detection Robustness[2] and Deep Learning Phishing Detection[5] concentrate on detection mechanisms for fraudulent content.

A particularly active line of work centers on developing controlled testing environments to measure agent susceptibility to specific deceptive tactics. Dark Patterns Manipulate Agents[0] sits squarely within this cluster, emphasizing isolated dark pattern testing to systematically evaluate how agents handle manipulative UI elements in benchmark settings. This approach contrasts with broader security frameworks like Security Vulnerabilities Systematization[7] that catalog diverse threat vectors, and differs from detection-oriented studies such as Phishing Detection Robustness[2] that prioritize identifying malicious content rather than measuring behavioral manipulation. Nearby works like Susbench[3] and DECEPTICON[11] share a similar empirical focus on benchmark construction, yet Dark Patterns Manipulate Agents[0] appears to concentrate more narrowly on the behavioral impact of dark patterns themselves.

Open questions remain about how findings from isolated testing environments generalize to real-world scenarios where multiple deceptive tactics combine, and whether agents can develop robust defenses without sacrificing task performance.

Claimed Contributions

DECEPTICON environment for testing dark patterns on web agents

Can Refute

10 retrieved papers

The authors construct DECEPTICON, a controlled evaluation environment built on WebVoyager that enables systematic investigation of dark pattern effects on web agents. It includes 850 tasks (600 generated and 250 real-world) designed to isolate and measure individual dark pattern effectiveness while ensuring reproducibility through archived web pages.
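As a rough illustration of what measuring success and dark pattern effectiveness per pattern might look like, the sketch below tallies both rates from a list of episode records. The `Episode` schema, its field names, and the toy data are assumptions made for this example; they are not DECEPTICON's actual log format or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One agent run on one task (hypothetical schema, not from the paper)."""
    pattern: str             # dark pattern category present on the page
    task_completed: bool     # agent reached the user's intended goal
    pattern_triggered: bool  # agent took the action the dark pattern pushes


def summarize(episodes):
    """Return {pattern: (success_rate, effectiveness_rate)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, completed, triggered
    for ep in episodes:
        c = counts[ep.pattern]
        c[0] += 1
        c[1] += ep.task_completed
        c[2] += ep.pattern_triggered
    return {p: (done / n, trig / n) for p, (n, done, trig) in counts.items()}


# Made-up toy data, purely for illustration.
toy = [
    Episode("urgency", True, False),
    Episode("urgency", False, True),
    Episode("sneaking", False, True),
    Episode("sneaking", False, True),
]
summary = summarize(toy)
```

Keeping the two rates separate matters because an agent can fail a task without being steered by the pattern, and the report's summary treats the two as distinct measurements.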

Operationalized taxonomy of dark patterns by mode of attack

10 retrieved papers

The authors develop an action-centric taxonomy that classifies dark patterns into six categories (sneaking, urgency, misdirection, social proof, obstruction, forced action) based on their attack mechanisms rather than implementation details or website types.
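One way to picture a taxonomy keyed on attack mechanism rather than implementation detail is an enumeration of the six categories plus a lookup from concrete widgets to modes. The sketch below is illustrative only: the widget tags and their assignments are hypothetical examples, not mappings taken from the paper.

```python
from enum import Enum


class AttackMode(Enum):
    """The six categories named in the contribution summary."""
    SNEAKING = "sneaking"
    URGENCY = "urgency"
    MISDIRECTION = "misdirection"
    SOCIAL_PROOF = "social proof"
    OBSTRUCTION = "obstruction"
    FORCED_ACTION = "forced action"


# Hypothetical implementation-level tags; different widgets can share a mode.
WIDGET_TO_MODE = {
    "countdown_timer": AttackMode.URGENCY,
    "expiring_offer_banner": AttackMode.URGENCY,
    "preselected_addon": AttackMode.SNEAKING,
    "fake_review_popup": AttackMode.SOCIAL_PROOF,
}


def attack_mode(widget_tag: str) -> AttackMode:
    """Look up the attack mode for a widget tag (raises KeyError if unknown)."""
    return WIDGET_TO_MODE[widget_tag]
```

Two visually different widgets (a timer and a banner) land in the same category here, which is the point of classifying by mechanism.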

Adversarial generation pipeline for synthetic dark pattern tasks

10 retrieved papers

The authors design a generation method that creates realistic dark pattern tasks by first generating base website UIs, then injecting dark patterns based on documented examples, and using an agentic scaffold with iterative testing to ensure tasks are solvable but challenging.
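The three stages described above (base UI, injection, iterative testing) can be sketched as a simple retry loop. This is a minimal sketch under stated assumptions: every function name, the `max_rounds` parameter, and the toy stand-ins are hypothetical, and the real pipeline uses an agentic scaffold rather than string checks.

```python
from typing import Callable, Optional


def build_task(
    make_base_ui: Callable[[], str],
    inject: Callable[[str, str], str],
    is_solvable: Callable[[str], bool],
    is_challenging: Callable[[str], bool],
    pattern: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Regenerate until a page passes both checks, or give up with None."""
    for _ in range(max_rounds):
        page = inject(make_base_ui(), pattern)  # stages 1 and 2
        if is_solvable(page) and is_challenging(page):  # stage 3
            return page
    return None


# Toy stand-ins so the sketch runs; a real system would call generators/agents.
def toy_base_ui() -> str:
    return "<button id='buy'>Buy</button>"


def toy_inject(page: str, pattern: str) -> str:
    return page + f"<div class='{pattern}'>Offer ends in 2:00!</div>"


task_page = build_task(
    toy_base_ui,
    toy_inject,
    is_solvable=lambda p: "id='buy'" in p,
    is_challenging=lambda p: "class='urgency'" in p,
    pattern="urgency",
)
```

Passing the checks in as callables keeps the loop independent of how solvability and difficulty are actually judged.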

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[3] Susbench: An online benchmark for evaluating dark pattern susceptibility of computer-use agents

Longjie Guo, Chenjie Yuan, Mingyuan Zhong, Robert Wolfe, Ruican Zhong, Yue Xu, Bingbing Wen, Hua Shen, Lucy Lu Wang, Alexis Hiniker (2025)

[11] DECEPTICON: How Dark Patterns Manipulate Web Agents

Phil Cuvin, Hao Zhu, Diyi Yang (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

DECEPTICON environment for testing dark patterns on web agents

[3] Susbench: An online benchmark for evaluating dark pattern susceptibility of computer-use agents

Can Refute

[13] Investigating the Impact of Dark Patterns on LLM-Based Web Agents

Can Refute

[1] Dark patterns meet gui agents: Llm agent susceptibility to manipulative interfaces and the role of human oversight

Cannot Refute

[6] Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows

Cannot Refute

[7] A systematization of security vulnerabilities in computer use agents

Cannot Refute

[11] DECEPTICON: How Dark Patterns Manipulate Web Agents

Cannot Refute

[26] Hijacking jarvis: Benchmarking mobile gui agents against unprivileged third parties

Cannot Refute

[27] macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

Cannot Refute

[28] Are Your Agents Upward Deceivers?

Cannot Refute

[29] It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Cannot Refute

Contribution

Operationalized taxonomy of dark patterns by mode of attack

[16] DarkBench: Benchmarking Dark Patterns in Large Language Models

Cannot Refute

[17] Consumer manipulation - a definition, classification and future research agenda

Cannot Refute

[18] Dark patterns at scale: Findings from a crawl of 11K shopping websites

Cannot Refute

[19] Measuring the deceptive potential of design patterns: a decision-making game

Cannot Refute

[20] Sludge, dark patterns and dark nudges: A taxonomy of online gambling platforms' deceptive design features

Cannot Refute

[21] Regulating Dark Patterns

Cannot Refute

[22] The siren song of llms: How users perceive and respond to dark patterns in large language models

Cannot Refute

[23] Conceptualizations of user autonomy within the normative evaluation of dark patterns

Cannot Refute

[24] Beyond Dark Patterns: A Concept-Based Framework for Ethical Software Design

Cannot Refute

[25] Dark Patterns, Enforcement, and the emerging Digital Design Acquis: Manipulation beneath the Interface

Cannot Refute

Contribution

Adversarial generation pipeline for synthetic dark pattern tasks

[2] From ML to LLM: Evaluating the Robustness of Phishing Web Page Detection Models against Adversarial Attacks

Cannot Refute

[30] Adversarial Environment Generation for Learning to Navigate the Web

Cannot Refute

[31] Towards Safe Synthetic Image Generation On the Web: A Multimodal Robust NSFW Defense and Million Scale Dataset

Cannot Refute

[32] PWDGAN: Generating adversarial malicious URL examples for deceiving black-box phishing website detector using GANs

Cannot Refute

[33] AWA: Adversarial website adaptation

Cannot Refute

[34] Him of many faces: characterizing billion-scale adversarial and benign browser fingerprints on commercial websites

Cannot Refute

[35] WF-A2D: Enhancing Privacy With Asymmetric Adversarial Defense Against Website Fingerprinting

Cannot Refute

[36] Enhancing resilience in website fingerprinting: Novel adversary strategies for noisy traffic environments

Cannot Refute

[37] PhishDecloaker: Detecting CAPTCHA-cloaked Phishing Websites via Hybrid Vision-based Interactive Models

Cannot Refute

[38] Generation of Realistic Navigation Paths for Web Site Testing Using RNN and GAN

Cannot Refute
