A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Overview
Overall Novelty Assessment
The paper introduces ABxLab, a framework for systematically probing how LLM-powered agents make consumer choices under controlled manipulations of prices, ratings, and psychological nudges. Within the taxonomy, it occupies the 'Experimental Frameworks and Methodological Approaches' leaf, which currently contains only this paper—making it a sparse, methodologically focused niche. While the broader taxonomy encompasses fifty papers across diverse topics like anthropomorphism, trust, and personalization, this leaf stands alone, suggesting the paper addresses a methodological gap in how researchers study agentic decision-making behavior.
The taxonomy reveals dense neighboring branches examining consumer responses to AI design features (anthropomorphism, communication style) and behavioral outcomes (trust, autonomy, purchase decisions). The original paper diverges from these empirical applications by offering a controlled experimental testbed rather than studying consumer perceptions or adoption. Its closest conceptual neighbors—such as work on recommendation nudging and behavioral economics—focus on human responses to AI, whereas this framework evaluates the agents themselves. The taxonomy's scope notes clarify that this leaf excludes theoretical reviews and empirical applications, positioning the paper as a methodological contribution distinct from the field's dominant empirical and theoretical streams.
Among twenty-six candidates examined, the contribution-level analysis shows varied novelty. The ABxLab framework itself (ten candidates examined, zero refutations) appears methodologically distinct within the limited search scope. The scalable benchmark contribution (six candidates, zero refutations) similarly lacks direct prior work among examined papers. However, the empirical finding of systematic biases in LLM agents (ten candidates, one refutation) encounters at least one overlapping study, suggesting this observation may not be entirely new. The analysis explicitly notes this is a top-K semantic search, not an exhaustive review, so these statistics reflect a bounded literature sample rather than definitive field coverage.
Given the limited search scope and the paper's placement in a singleton taxonomy leaf, the framework contribution appears methodologically novel within the examined literature. The empirical bias findings, while supported by controlled experiments, show some overlap with prior work. The taxonomy structure suggests the paper occupies a sparse methodological niche, though the small candidate pool (twenty-six papers) means this assessment is provisional and would benefit from broader literature coverage to confirm the framework's distinctiveness relative to adjacent experimental and evaluation methodologies.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present ABxLab, an open-source man-in-the-middle framework that intercepts and modifies real-world web content in real time, transforming arbitrary websites into controllable behavioral testbeds for studying AI agent decision-making under controlled experimental conditions.
The authors contribute a comprehensive benchmark consisting of over 80,000 experiments across 17 models, systematically testing agent responses to various interventions including authority cues, social proof, scarcity, negative framing, and incentives in realistic web-based shopping environments.
The authors provide empirical evidence demonstrating that LLM agents exhibit strong, systematic biases in response to ratings, prices, order effects, and nudges, with effect sizes substantially larger than human baselines, revealing that agents are hypersensitive to choice architecture manipulations.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
ABxLab framework for studying AI agent behavior
The authors present ABxLab, an open-source man-in-the-middle framework that intercepts and modifies real-world web content in real time, transforming arbitrary websites into controllable behavioral testbeds for studying AI agent decision-making under controlled experimental conditions.
[61] Trust and reliance on AI – An experimental study on the extent and costs of overreliance on AI
[62] Explanations can reduce overreliance on AI systems during decision-making
[63] Agent Q: Advanced reasoning and learning for autonomous AI agents
[64] Voting or Consensus? Decision-Making in Multi-Agent Debate
[65] HADA: Human-AI Agent Decision Alignment Architecture
[66] Interactive AI agent for code refactoring assistance: A study on decision-making strategies and human-agent collaboration effectiveness
[67] Analyzing operator states and the impact of AI-enhanced decision support in control rooms: A human-in-the-loop specialized reinforcement learning framework for …
[68] The Amplifying Effect of Explainability in AI-assisted Decision-making in Groups
[69] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
[70] Utilizing human behavior modeling to manipulate explanations in AI-assisted decision making: the good, the bad, and the scary
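The man-in-the-middle design described above hinges on rewriting page content before the agent observes it. As a minimal, stdlib-only sketch of that intervention idea (the element IDs, nudge texts, and function names here are illustrative assumptions, not the paper's implementation):

```python
import re

# Hypothetical sketch of an ABxLab-style manipulation: rewrite a product
# page's HTML in flight so that one option carries an injected nudge
# (e.g., a scarcity badge) before the agent renders the page.
NUDGES = {
    "scarcity": "Only 2 left in stock!",
    "social_proof": "1,200 people bought this today",
    "authority": "Recommended by experts",
}

def apply_nudge(html: str, product_id: str, nudge: str) -> str:
    """Insert a nudge badge immediately after the product's opening tag."""
    badge = f'<span class="nudge">{NUDGES[nudge]}</span>'
    pattern = rf'(<div[^>]*id="{re.escape(product_id)}"[^>]*>)'
    return re.sub(pattern, r"\1" + badge, html, count=1)

page = '<div id="prod-a"><h2>Item A</h2></div><div id="prod-b"><h2>Item B</h2></div>'
treated = apply_nudge(page, "prod-b", "scarcity")
```

Because only the targeted option is modified and everything else on the page is left intact, the same trial can be rerun with the nudge attached to the other option (or omitted) to isolate the intervention's effect.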
Scalable benchmark for evaluating agent decision-making
The authors contribute a comprehensive benchmark consisting of over 80,000 experiments across 17 models, systematically testing agent responses to various interventions including authority cues, social proof, scarcity, negative framing, and incentives in realistic web-based shopping environments.
[71] PsyScam: A Benchmark for Psychological Techniques in Real-World Scams
[72] Belief in Authority: Impact of Authority in Multi-Agent Evaluation Framework
[73] The influence of real estate agents on investors' decisions in Cyprus
[74] Afterward: Ignorance Studies
[75] Psychology of Phishing Emails: Quantifying Persuasion Principles and Simulating Detection with Large Language Models
[76] Delta One, Linguist Zero? Identity, Epistemic Agency, and Rapport Management in Airline Upgrade Negotiations
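A benchmark on the scale claimed (over 80,000 runs across 17 models and several intervention types) follows naturally from a factorial design. A sketch of how such a condition grid might be enumerated; the factor levels and repeat count below are invented for illustration and do not reflect the paper's exact design:

```python
from itertools import product

# Illustrative factorial grid: models x nudge types x price gaps
# x rating gaps x option order. All levels are assumptions.
models = [f"model-{i}" for i in range(17)]
nudges = ["none", "authority", "social_proof", "scarcity",
          "negative_framing", "incentive"]
price_gaps = [0.00, 0.05, 0.10, 0.25]   # relative price difference
rating_gaps = [0.0, 0.5, 1.0]           # star-rating difference
orders = ["target_first", "target_second"]

conditions = list(product(models, nudges, price_gaps, rating_gaps, orders))
runs_per_condition = 33                 # repeated trials per cell (assumed)
total_runs = len(conditions) * runs_per_condition
```

The point of the sketch is that a modest number of fully crossed factors, repeated enough times for statistical power, quickly yields tens of thousands of runs, which is what makes an automated testbed necessary in the first place.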
Empirical evidence of systematic biases in LLM agents
The authors provide empirical evidence demonstrating that LLM agents exhibit strong, systematic biases in response to ratings, prices, order effects, and nudges, with effect sizes substantially larger than human baselines, revealing that agents are hypersensitive to choice architecture manipulations.
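One standard way to quantify the "effect sizes substantially larger than human baselines" claim is an effect size on choice proportions, such as Cohen's h between the nudged and control conditions. A brief sketch, with choice rates invented purely for illustration:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Invented example rates: share choosing the nudged option vs. control.
agent_h = cohens_h(0.85, 0.50)   # a large shift, as reported for LLM agents
human_h = cohens_h(0.58, 0.50)   # a modest shift, as in human baselines
```

Comparing such effect sizes across agents and human participants on the same manipulated pages is what allows the hypersensitivity claim to be stated quantitatively rather than impressionistically.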