Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
Overview
Overall Novelty Assessment
The paper introduces EAPrivacy, a benchmark for evaluating physical-world privacy awareness in LLM-powered embodied agents. It resides in the Physical-World Privacy Assessment leaf, which contains only three papers, indicating a relatively sparse research direction. The taxonomy shows this leaf is distinct from Digital Environment Privacy Evaluation (three papers focused on virtual interfaces and memory systems) and from broader safety frameworks. This positioning suggests the work addresses an emerging gap where privacy evaluation meets physical embodiment, a less crowded area than digital-only privacy assessment or general safety benchmarking.
The taxonomy reveals neighboring research in Privacy-Preserving Architectures (seven papers across tool-using agents, edge deployment, and healthcare robotics) and Safety and Contextual Reasoning Frameworks (three papers on risk assessment and dynamic adaptation). The paper's focus on evaluation distinguishes it from these mitigation-oriented branches. Within Privacy Evaluation and Benchmarking, the sibling papers in Physical-World Privacy Assessment share the embodied context but may differ in evaluation methodology or scenario design. The taxonomy's scope notes clarify that attack methods and deployment architectures are excluded from this evaluation-focused branch, helping position the work as diagnostic rather than defensive.
Across the three contributions examined, the analysis reviewed thirty candidate papers in total, ten per contribution. None of the contributions was clearly refuted by prior work in this limited search: the EAPrivacy benchmark, the four-tiered framework, and the PDDL-based representation each yielded zero refutable matches among their ten candidates. This suggests that among the ten most semantically similar papers identified per contribution, none provided overlapping prior work on procedurally generated physical privacy scenarios with tiered complexity. The absence of refutations across all contributions indicates potential novelty within the examined scope, though the search was not exhaustive.
Given the limited search scope of thirty candidates and the sparse three-paper leaf in the taxonomy, the work appears to occupy relatively unexplored territory at the intersection of embodied agents and privacy evaluation. The analysis covers top-K semantic matches and does not claim comprehensive field coverage. The lack of refutable prior work among examined candidates, combined with the sparse taxonomy leaf, suggests the specific combination of physical-world scenarios, tiered evaluation, and PDDL-based representation may be distinctive within the surveyed literature.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce EAPrivacy, a novel benchmark that systematically evaluates LLM-powered agents' privacy awareness in physical environments through four progressive tiers: sensitive object identification, privacy in shifting environments, inferential privacy under task conflicts, and social norms versus personal privacy. The benchmark comprises over 400 procedurally generated scenarios across more than 60 unique physical scenes.
The authors design a four-tiered evaluation structure that progressively tests agents' abilities: recognizing sensitive objects in cluttered environments, adapting to dynamic physical contexts, resolving conflicts between tasks and inferred privacy constraints, and navigating ethical dilemmas where social norms conflict with personal privacy. Each tier addresses distinct aspects of privacy reasoning in physical settings.
The authors employ PDDL (Planning Domain Definition Language) format to represent physical environments and spatial relationships, moving beyond simple text descriptions. This structured approach enables systematic evaluation of agents' ability to ground privacy concepts in concrete physical spaces with explicit spatial reasoning requirements.
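The procedural generation described in the contributions above can be sketched in miniature. The tier names below follow the paper's four-tier structure, but the scene pool, object pools, and sampling scheme are illustrative assumptions, not the benchmark's actual generator.

```python
import random

# Hypothetical sketch of procedural scenario generation across four tiers.
# Tier names mirror the paper's structure; scenes, objects, and the
# sampling logic are illustrative assumptions.

TIERS = [
    "sensitive_object_identification",
    "privacy_in_shifting_environments",
    "inferential_privacy_under_task_conflicts",
    "social_norms_vs_personal_privacy",
]

SCENES = ["office", "bedroom", "clinic", "kitchen"]        # assumed scene pool
SENSITIVE = ["medical_record", "bank_statement", "diary"]  # assumed objects
DISTRACTORS = ["mug", "stapler", "plant", "book"]          # assumed clutter

def generate_scenario(tier, rng):
    """Sample one scenario: a scene, one sensitive object, and clutter."""
    return {
        "tier": tier,
        "scene": rng.choice(SCENES),
        "sensitive_object": rng.choice(SENSITIVE),
        "clutter": rng.sample(DISTRACTORS, k=2),
    }

rng = random.Random(0)  # fixed seed so generated scenarios are reproducible
scenarios = [generate_scenario(t, rng) for t in TIERS for _ in range(3)]
print(len(scenarios))  # 12 scenarios, 3 per tier
```

Seeding the generator keeps the sampled scenarios reproducible across runs, which matters for a benchmark where different models must be evaluated on identical scenario sets.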
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
EAPrivacy benchmark for physical-world privacy awareness evaluation
The authors introduce EAPrivacy, a novel benchmark that systematically evaluates LLM-powered agents' privacy awareness in physical environments through four progressive tiers: sensitive object identification, privacy in shifting environments, inferential privacy under task conflicts, and social norms versus personal privacy. The benchmark comprises over 400 procedurally generated scenarios across more than 60 unique physical scenes.
[3] Benchmarking LLM privacy recognition for social robot decision making
[10] SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents
[30] HourVideo: 1-hour video-language understanding
[31] Embodied understanding of driving scenarios
[32] EgoNormia: Benchmarking Physical Social Norm Understanding
[33] Measuring what matters: A benchmarking system for occupant satisfaction with workspace environments
[34] Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
[35] Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
[36] Read the room: Inferring social context through dyadic interaction recognition in cyber-physical-social infrastructure systems
[37] AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift
Four-tiered evaluation framework for physically-grounded privacy
The authors design a four-tiered evaluation structure that progressively tests agents' abilities: recognizing sensitive objects in cluttered environments, adapting to dynamic physical contexts, resolving conflicts between tasks and inferred privacy constraints, and navigating ethical dilemmas where social norms conflict with personal privacy. Each tier addresses distinct aspects of privacy reasoning in physical settings.
[20] An efficient image privacy preservation scheme for smart city applications using compressive sensing and multi-level encryption
[21] Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
[22] Sharing and Generating Privacy-Preserving Spatio-Temporal Data Using Real-World Knowledge
[23] Privacy-Preserving Personalized Fitness Recommender System P3FitRec: A Multi-level Deep Learning Approach
[24] Real-World Trajectory Sharing with Local Differential Privacy
[25] HPRoP: Hierarchical privacy-preserving route planning for smart cities
[26] IoTBeholder: A privacy snooping attack on user habitual behaviors from smart home Wi-Fi traffic
[27] Three-tier Storage Framework Based on TBchain and IPFS for Protecting IoT Security and Privacy
[28] A multi-level clustering approach for anonymizing large-scale physical activity data
[29] PLASMA: Privacy-Preserved Lightweight and Secure Multi-level Authentication scheme for IoMT-based smart healthcare
PDDL-based structured representation for physical environment evaluation
The authors employ PDDL (Planning Domain Definition Language) format to represent physical environments and spatial relationships, moving beyond simple text descriptions. This structured approach enables systematic evaluation of agents' ability to ground privacy concepts in concrete physical spaces with explicit spatial reasoning requirements.
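The structured representation described above can be illustrated with a minimal sketch that renders a physical scene as a PDDL-style problem string. The domain name, object types, predicates, and the scene itself are hypothetical, chosen only to show how spatial relations and sensitivity annotations can be made explicit rather than left to free-text description; they are not taken from the EAPrivacy benchmark.

```python
# Hypothetical sketch: encode a physical scene as a PDDL-style problem string.
# The "household" domain, predicates (on, sensitive), and object names are
# illustrative assumptions, not drawn from the EAPrivacy benchmark.

def to_pddl_problem(name, objects, facts):
    """Render typed objects and ground facts as a PDDL problem definition."""
    obj_decls = " ".join(f"{obj} - {typ}" for obj, typ in objects)
    init_facts = "\n    ".join(f"({' '.join(fact)})" for fact in facts)
    return (
        f"(define (problem {name})\n"
        f"  (:domain household)\n"
        f"  (:objects {obj_decls})\n"
        f"  (:init\n    {init_facts})\n"
        f")"
    )

# A cluttered-desk scene: a sensitive document lying next to ordinary items.
objects = [("desk1", "surface"), ("letter1", "item"), ("mug1", "item")]
facts = [
    ("on", "letter1", "desk1"),
    ("on", "mug1", "desk1"),
    ("sensitive", "letter1"),
]

print(to_pddl_problem("office-scene-01", objects, facts))
```

Because the scene is expressed as explicit ground predicates rather than prose, an evaluator can check programmatically whether an agent's proposed plan manipulates an object marked `sensitive`, which is what makes this style of representation amenable to systematic evaluation.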