Abstract:

Recent advances in multi-modal large reasoning models (MLRMs) have shown a significant ability to interpret complex visual content. While these models possess impressive reasoning capabilities, they also introduce underexplored privacy risks. In this paper, we identify a novel category of privacy leakage in MLRMs: adversaries can infer sensitive geolocation information, such as users' home addresses or neighborhoods, from user-generated images, including selfies captured in private settings. To formalize and evaluate these risks, we propose a three-level privacy risk framework that categorizes images based on contextual sensitivity and potential for geolocation inference. We further introduce DoxBench, a curated dataset of 500 real-world images reflecting diverse privacy scenarios across six categories. Our evaluation across 13 advanced MLRMs and MLLMs demonstrates that most of these models outperform non-expert humans in geolocation inference and can effectively leak location-related private information, significantly lowering the barrier for adversaries to obtain users' sensitive geolocation information. We further analyze and identify two primary factors contributing to this vulnerability: (1) MLRMs exhibit strong geolocation reasoning capabilities by leveraging visual clues in combination with their internal world knowledge; and (2) MLRMs frequently rely on privacy-related visual clues for inference without any built-in mechanisms to suppress or avoid such usage. To better understand and demonstrate real-world attack feasibility, we propose GeoMiner, a collaborative attack framework that decomposes the prediction process into two stages, clue extraction and reasoning, to improve geolocation performance. Our findings highlight the urgent need to reassess inference-time privacy risks in MLRMs to better protect users' sensitive information.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DoxBench, a 500-image dataset with a three-level privacy risk framework, alongside the ClueMiner and GeoMiner tools for analyzing geolocation inference attacks by multi-modal models. It resides in the Privacy Risk Assessment and Mitigation leaf, which contains five papers in total, a moderately populated niche within the broader 50-paper taxonomy. This leaf focuses specifically on identifying and quantifying privacy threats from geolocation systems, distinguishing it from performance-oriented benchmarking or model-development branches. The work addresses adversarial doxing scenarios, a narrower framing than the general geo-privacy policy or mitigation strategies explored by sibling papers.

The taxonomy reveals that Privacy Risk Assessment sits under Evaluation, Benchmarking, and Privacy Analysis, adjacent to Benchmark Datasets and Comparative Evaluation and Geospatial AI and Trajectory Prediction. Neighboring branches like Multi-Modal Foundation Model Architectures focus on advancing model capabilities (e.g., Gaea, LLMGeo), while Specialized Geolocation Contexts address domain-specific challenges such as disaster response or indoor localization. The paper's emphasis on adversarial exploitation of existing models contrasts with these capability-building efforts, positioning it as a critical counterpoint that examines societal risks rather than technical performance gains. Its scope excludes mitigation mechanisms beyond analysis, per the leaf's exclude_note.

Among 28 candidates examined, none clearly refute the three core contributions. The DoxBench dataset and privacy framework (8 candidates, 0 refutable) appear novel in their focus on real-world doxing scenarios with structured risk categorization. ClueMiner (10 candidates, 0 refutable) and GeoMiner (10 candidates, 0 refutable) show no direct prior work within the limited search scope. Sibling papers like Geolocation Privacy Risks and Granular Privacy Control address related privacy concerns but do not present equivalent datasets or collaborative attack frameworks. The absence of refutable candidates suggests these specific artifacts are new, though the search scale limits certainty about exhaustive prior work.

Based on top-28 semantic matches, the work introduces concrete evaluation artifacts—dataset, risk taxonomy, and attack tools—that fill a gap in adversarial privacy analysis for geolocation models. The limited search scope means undiscovered prior work may exist, particularly in adjacent security or privacy communities outside the core geolocation literature. The contributions appear incremental in concept (privacy risks are known) but novel in execution (structured benchmarks for doxing attacks). Further investigation of broader security venues would strengthen confidence in this assessment.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 0

Research Landscape Overview

Core task: geolocation inference from user-generated images using multi-modal models. The field has evolved around several complementary branches. Multi-Modal Foundation Model Architectures and Training focuses on developing and fine-tuning large-scale vision-language models that can reason about geographic cues, as seen in works like Gaea[2] and GeoLocSFT[3]. Multi-Modal Data Fusion and Representation Learning explores how to combine visual features with textual metadata, temporal signals, or other modalities to improve localization accuracy. Specialized Geolocation Contexts and Applications addresses domain-specific challenges such as disaster response (Disaster Geolocalization[5]), news verification (News Photo Geolocation[16]), or street-level positioning (Street-Level Geolocalization[10]). Semantic Geolocation and Address Prediction targets fine-grained outputs like postal addresses or natural-language place descriptions, while Evaluation, Benchmarking, and Privacy Analysis examines both performance metrics and the societal risks of increasingly powerful geolocation systems.

Recent work highlights a tension between advancing model capabilities and mitigating privacy harms. On one hand, studies like Omnigeo[17] and LLMGeo[31] push the frontier of what multi-modal models can infer from minimal visual clues. On the other hand, privacy-focused research investigates how easily such models can be exploited for malicious purposes. Doxing via Lens[0] sits squarely within the Privacy Risk Assessment and Mitigation cluster, examining adversarial scenarios where geolocation inference threatens individual anonymity. It shares thematic concerns with Geolocation Privacy Risks[8] and Granular Privacy Control[14], which also explore safeguards and threat models, yet differs in its emphasis on real-world doxing attacks rather than broader policy frameworks. This line of inquiry underscores an urgent open question: how to balance the utility of geolocation models in legitimate applications against their potential for abuse.

Claimed Contributions

DoxBench dataset and three-level privacy risk framework

The authors introduce a three-level privacy risk framework (individual risk, household risk, or both), grounded in GDPR and CCPA regulations, to categorize the privacy risks an image poses. They also construct DoxBench, a benchmark dataset of 500 high-resolution images from California representing diverse privacy scenarios across six categories, used to evaluate location-related privacy leakage.

8 retrieved papers
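The three risk levels lend themselves to a small data model. The sketch below is illustrative only: the field names, the sample record, and the `"selfie"` category label are assumptions, since the report does not give DoxBench's exact schema or the six category names.

```python
from dataclasses import dataclass
from enum import Enum

class RiskLevel(Enum):
    """Three-level privacy risk framework described in the paper."""
    INDIVIDUAL = "individual"  # risk to a person (e.g., identity in a selfie)
    HOUSEHOLD = "household"    # risk to a home address or neighborhood
    BOTH = "both"              # image exposes both individual and household

@dataclass
class DoxBenchSample:
    image_path: str      # hypothetical field name
    category: str        # one of six scenario categories (names unspecified)
    risk_level: RiskLevel

# Hypothetical record: a selfie revealing both the person and their home area.
sample = DoxBenchSample("images/0001.jpg", "selfie", RiskLevel.BOTH)
```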
ClueMiner analysis tool

The authors develop ClueMiner, a test-time adaptation algorithm that iteratively derives unified semantic categories of visual clues from models' unstructured reasoning outputs. This tool reveals that MLRMs frequently rely on privacy-sensitive visual clues without built-in mechanisms to suppress such usage.

10 retrieved papers
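The iterative category-derivation step can be illustrated with a minimal sketch. The paper's actual algorithm is not detailed in this report and likely uses an LLM to merge clue phrases; this stand-in uses plain string similarity, and the function name and threshold are assumptions.

```python
from difflib import SequenceMatcher

def merge_into_categories(clues, categories, threshold=0.8):
    """Fold newly extracted clue phrases into a running set of semantic
    categories: attach each clue to its most similar existing category,
    or open a new category when nothing is similar enough."""
    for clue in clues:
        key = clue.lower().strip()
        best = max(categories,
                   key=lambda c: SequenceMatcher(None, key, c).ratio(),
                   default=None)
        if best and SequenceMatcher(None, key, best).ratio() >= threshold:
            categories[best].append(clue)
        else:
            categories[key] = [clue]
    return categories

# Iterate over clues extracted from successive reasoning traces.
categories = {}
for trace in [["street sign", "Street signs"], ["house number", "license plate"]]:
    categories = merge_into_categories(trace, categories)
# "Street signs" merges into "street sign"; the other clues open new categories.
```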
GeoMiner collaborative attack framework

The authors propose GeoMiner, a two-stage framework simulating realistic adversarial scenarios in which a Detector MLLM extracts visual clues and an Analyzer MLLM uses them for geolocation inference. The framework demonstrates how attackers can amplify location-related privacy leakage by providing contextual hints to MLLMs.

10 retrieved papers
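The Detector/Analyzer collaboration can be sketched as a two-call pipeline. Everything here is an illustrative assumption rather than the paper's exact design: the prompts, the function names, and the use of a text description in place of an actual image input.

```python
from typing import Callable

def geominer_pipeline(image_desc: str,
                      detector: Callable[[str], str],
                      analyzer: Callable[[str], str]) -> str:
    """Stage 1: a Detector model enumerates location-revealing visual clues.
    Stage 2: an Analyzer model reasons from those clues to a location guess."""
    clues = detector(
        "List every visual clue that could reveal where this image was "
        f"taken:\n{image_desc}"
    )
    return analyzer(
        "Given these visual clues, infer the most likely location "
        f"(city, neighborhood, or address):\n{clues}"
    )

# Stub models so the sketch runs without any API access.
mock_detector = lambda prompt: "palm trees; 'CA-1' highway sign; beachfront houses"
mock_analyzer = lambda prompt: "Likely coastal California, along Highway 1"
guess = geominer_pipeline("a selfie on a beach boardwalk",
                          mock_detector, mock_analyzer)
```

Separating extraction from reasoning mirrors the report's description of how contextual hints passed between models can amplify leakage.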

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

DoxBench dataset and three-level privacy risk framework

The authors introduce a three-level privacy risk framework (individual risk, household risk, or both), grounded in GDPR and CCPA regulations, to categorize the privacy risks an image poses. They also construct DoxBench, a benchmark dataset of 500 high-resolution images from California representing diverse privacy scenarios across six categories, used to evaluate location-related privacy leakage.

Contribution

ClueMiner analysis tool

The authors develop ClueMiner, a test-time adaptation algorithm that iteratively derives unified semantic categories of visual clues from models' unstructured reasoning outputs. This tool reveals that MLRMs frequently rely on privacy-sensitive visual clues without built-in mechanisms to suppress such usage.

Contribution

GeoMiner collaborative attack framework

The authors propose GeoMiner, a two-stage framework simulating realistic adversarial scenarios in which a Detector MLLM extracts visual clues and an Analyzer MLLM uses them for geolocation inference. The framework demonstrates how attackers can amplify location-related privacy leakage by providing contextual hints to MLLMs.