How Dark Patterns Manipulate Web Agents

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Agents, Redteaming, Evaluations, Reasoning, Foundation Models
Abstract:

Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in steering agent trajectories, posing a significant risk to agent robustness. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 850 web navigation tasks with dark patterns (600 generated tasks and 250 real-world tasks), designed to measure both instruction-following success and dark pattern effectiveness. Across SOTA agents, we find that dark patterns successfully steer agent trajectories toward malicious outcomes in over 70% of tested generated and real-world tasks. Moreover, we find that dark pattern effectiveness correlates positively with model size and test-time reasoning, making larger, more capable models more susceptible. Leading countermeasures against adversarial attacks, including in-context prompting and guardrail models, fail to consistently reduce the success rate of dark pattern interventions. Our findings reveal dark patterns as a latent and unmitigated risk to web agents, highlighting the urgent need for robust defenses against manipulative designs.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DECEPTICON, a benchmark environment for testing web agent robustness against dark patterns through 850 isolated tasks (600 synthetic, 250 real-world). It resides in the 'Isolated Dark Pattern Testing Environments' leaf alongside two sibling papers within a taxonomy of 15 total works. This leaf represents a focused research direction under 'Benchmark Development and Empirical Evaluation,' suggesting a moderately sparse but active area where controlled experimental approaches to dark pattern susceptibility are emerging as a distinct methodological paradigm.

The taxonomy reveals neighboring work in 'Human-AI Comparative Studies' examining cross-population susceptibility, 'Deceptive Feedback in Multi-Agent Workflows' exploring adversarial judge models, and 'Environmental Injection Attacks' targeting visual perception in mobile contexts. The paper's focus on isolated UI manipulation distinguishes it from these adjacent directions: it excludes multi-agent feedback systems and dynamic environmental corruption, instead concentrating on static web-based deceptive design elements. The taxonomy's scope notes clarify that this work sits at the intersection of behavioral manipulation measurement and systematic benchmark construction, diverging from broader security taxonomies and phishing detection paradigms.

Among the 30 candidates examined (10 per claimed contribution), 2 of the 10 candidates reviewed for the DECEPTICON environment contribution overlap with it, suggesting some precedent for benchmark-driven dark pattern testing. None of the 10 candidates reviewed for the operationalized taxonomy of dark patterns by attack mode clearly refutes it, indicating potential novelty in the classification approach. The adversarial generation pipeline likewise encountered no refutations among its 10 candidates, though the limited search scope means that undiscovered prior work on synthetic task generation remains possible. The reported correlation between model capability and susceptibility appears less explored in the examined literature.

Based on top-30 semantic matches and citation expansion, the work appears to advance a relatively nascent research direction where systematic, isolated testing of dark patterns on agents is still developing methodological foundations. The analysis covers benchmark construction and empirical evaluation but does not exhaustively survey all adversarial robustness literature or real-world deployment studies, leaving open questions about how findings generalize beyond controlled environments and whether the taxonomy comprehensively captures all dark pattern attack modes.

Taxonomy

Core-task taxonomy papers: 15
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 2

Research Landscape Overview

Core task: Robustness of web agents against deceptive user interface designs. The field examines how autonomous agents navigate web environments that deliberately mislead or manipulate users through dark patterns, phishing attempts, and other adversarial UI elements. The taxonomy reveals four main branches: Dark Pattern Susceptibility and Manipulation focuses on how agents respond to manipulative design tactics such as hidden costs or forced actions; Privacy and Security Threat Analysis investigates vulnerabilities including prompt injection and credential theft; Phishing and Fraudulent Content Detection addresses agents' ability to identify malicious sites and scams; and Contextual Awareness and Specialized Domains explores challenges in culturally diverse settings and domain-specific tasks. Works like Dark Patterns GUI Agents[1] and Susbench[3] exemplify benchmark-driven evaluation of agent robustness, while studies such as Phishing Detection Robustness[2] and Deep Learning Phishing Detection[5] concentrate on detection mechanisms for fraudulent content.

A particularly active line of work centers on developing controlled testing environments to measure agent susceptibility to specific deceptive tactics. Dark Patterns Manipulate Agents[0] sits squarely within this cluster, emphasizing isolated dark pattern testing to systematically evaluate how agents handle manipulative UI elements in benchmark settings. This approach contrasts with broader security frameworks like Security Vulnerabilities Systematization[7] that catalog diverse threat vectors, and differs from detection-oriented studies such as Phishing Detection Robustness[2] that prioritize identifying malicious content rather than measuring behavioral manipulation. Nearby works like Susbench[3] and DECEPTICON[11] share a similar empirical focus on benchmark construction, yet Dark Patterns Manipulate Agents[0] appears to concentrate more narrowly on the behavioral impact of dark patterns themselves.
Open questions remain about how findings from isolated testing environments generalize to real-world scenarios where multiple deceptive tactics combine, and whether agents can develop robust defenses without sacrificing task performance.

Claimed Contributions

DECEPTICON environment for testing dark patterns on web agents

The authors construct DECEPTICON, a controlled evaluation environment built on WebVoyager that enables systematic investigation of dark pattern effects on web agents. It includes 850 tasks (600 generated and 250 real-world) designed to isolate and measure individual dark pattern effectiveness while ensuring reproducibility through archived web pages.

10 retrieved papers · Can Refute

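To make the two quantities the environment measures concrete, the sketch below records one outcome per agent episode and reports instruction-following success and dark pattern effectiveness as simple rates. It is a minimal sketch under assumptions: the record types and field names are hypothetical, and reading "effectiveness" as the fraction of episodes in which the agent took the action the dark pattern pushes toward is our interpretation, not a definition taken from the DECEPTICON release.

```python
from dataclasses import dataclass
from typing import Iterable, Literal


# Hypothetical record for one task; field names are illustrative,
# not DECEPTICON's actual schema.
@dataclass(frozen=True)
class DarkPatternTask:
    task_id: str
    source: Literal["generated", "real-world"]
    category: str        # exactly one dark pattern per task (tested in isolation)
    instruction: str     # the user goal the agent is asked to complete
    archived_page: str   # path to an archived snapshot, replayed for reproducibility


# Hypothetical outcome of running one agent on one task.
@dataclass(frozen=True)
class EpisodeResult:
    task_id: str
    goal_completed: bool      # the agent satisfied the user's instruction
    steered_by_pattern: bool  # the agent took the action the dark pattern pushes toward


def rate(flags: Iterable[bool]) -> float:
    """Fraction of True values; 0.0 for an empty input."""
    values = list(flags)
    return sum(values) / len(values) if values else 0.0


def summarize(results: list[EpisodeResult]) -> dict[str, float]:
    """Report the two per-episode metrics side by side."""
    return {
        "task_success": rate(r.goal_completed for r in results),
        "dark_pattern_effectiveness": rate(r.steered_by_pattern for r in results),
    }
```

Keeping the two flags separate means an episode can complete the user's goal and still be steered (for example, checking out while accepting a pre-selected add-on), so the two rates need not sum to one.
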
Operationalized taxonomy of dark patterns by mode of attack

The authors develop an action-centric taxonomy that classifies dark patterns into six categories (sneaking, urgency, misdirection, social proof, obstruction, forced action) based on their attack mechanisms rather than implementation details or website types.

10 retrieved papers

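As a sketch of how an action-centric label set might look in an evaluation harness, the six categories can be encoded as a closed enumeration so that every task carries exactly one attack mode. The class name, the label normalization, and the one-line example glosses are hypothetical illustrations, not the authors' definitions.

```python
from enum import Enum


class DarkPatternCategory(Enum):
    # Example glosses are illustrative paraphrases, not the paper's definitions.
    SNEAKING = "sneaking"            # e.g., an unrequested item or fee slipped into a cart
    URGENCY = "urgency"              # e.g., a countdown timer pressuring a hasty choice
    MISDIRECTION = "misdirection"    # e.g., visual emphasis on the option the site prefers
    SOCIAL_PROOF = "social proof"    # e.g., "12 people are viewing this" banners
    OBSTRUCTION = "obstruction"      # e.g., a goal-aligned path made harder to complete
    FORCED_ACTION = "forced action"  # e.g., mandatory sign-up before continuing


def parse_category(label: str) -> DarkPatternCategory:
    """Map labels such as 'Forced_Action' or 'social-proof' to a category.

    Raises ValueError for labels outside the six categories.
    """
    # Treat underscores and hyphens as spaces, lowercase, and collapse whitespace.
    normalized = " ".join(label.replace("_", " ").replace("-", " ").lower().split())
    return DarkPatternCategory(normalized)  # Enum lookup by value
```

A closed enumeration makes an unrecognized label fail loudly (`parse_category("nagging")` raises `ValueError`) rather than silently introducing a seventh category.
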
Adversarial generation pipeline for synthetic dark pattern tasks

The authors design a generation method that creates realistic dark pattern tasks by first generating base website UIs, then injecting dark patterns based on documented examples, and using an agentic scaffold with iterative testing to ensure tasks are solvable but challenging.

10 retrieved papers
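
The three-stage pipeline described above can be sketched as a generate, inject, and test loop. This is a minimal illustration under stated assumptions: the callables `generate_base_ui`, `inject_pattern`, and `run_trials` are hypothetical stand-ins for the model calls and the agentic scaffold, and "solvable but challenging" is read here as "at least one trial completes the user goal and at least one trial is steered by the pattern", which may differ from the authors' acceptance criterion.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Trial:
    goal_completed: bool      # a test agent reached the user's goal
    steered_by_pattern: bool  # a test agent was diverted by the injected pattern


@dataclass(frozen=True)
class GeneratedTask:
    html: str
    pattern_example: str  # the documented dark pattern example used for injection


def generate_task(
    generate_base_ui: Callable[[], str],
    inject_pattern: Callable[[str, str, str], str],
    run_trials: Callable[[str], Sequence[Trial]],
    pattern_example: str,
    max_rounds: int = 5,
) -> Optional[GeneratedTask]:
    """Hypothetical generate -> inject -> test loop with feedback between rounds."""
    base_html = generate_base_ui()  # stage 1: a clean base website UI
    feedback = ""
    for _ in range(max_rounds):
        # Stage 2: inject the dark pattern, conditioned on the previous round's feedback.
        html = inject_pattern(base_html, pattern_example, feedback)
        # Stage 3: run test agents on the candidate page.
        trials = run_trials(html)
        solvable = any(t.goal_completed for t in trials)
        challenging = any(t.steered_by_pattern for t in trials)
        if solvable and challenging:
            return GeneratedTask(html, pattern_example)
        feedback = (
            "goal unreachable: make the user goal completable"
            if not solvable
            else "pattern ignored: make the dark pattern more salient"
        )
    return None  # give up after max_rounds; the candidate is discarded
```

Passing feedback between rounds is one way to realize the iterative testing step; regenerating the injection from scratch each round would be an equally plausible reading of the description.
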

