MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: GeoAI, spatial representation learning, location embedding, multi-modal, contrastive learning
Abstract:

Representation learning of geospatial locations remains a core challenge in achieving general geospatial intelligence, with increasingly diverging philosophies and techniques. While Earth observation paradigms excel at depicting locations in their physical states, we propose that a location’s full characterization requires grounding in both its physical attributes and its internal human activity patterns, the latter being particularly crucial for understanding its human-centric functions. We present MoRA, a human-centric geospatial framework that leverages a mobility graph as its core backbone to fuse diverse data modalities, aiming to learn embeddings that represent the socio-economic context and functional role of a location. MoRA achieves this through the integration of spatial tokenization, GNNs, and asymmetric contrastive learning to align 100M+ POIs, massive remote sensing imagery, and structured demographic statistics with a billion-edge mobility graph, ensuring the three auxiliary modalities are interpreted through the lens of fundamental human dynamics. To rigorously evaluate MoRA, we construct a benchmark dataset composed of 9 downstream prediction tasks across social and economic domains. Experiments show that MoRA, with four input modalities and a compact 128-dimensional representation space, outperforms state-of-the-art models by an average of 12.9%. Echoing LLM scaling laws, we further demonstrate scaling behavior in geospatial representation learning. We open-source code and pretrained models at: https://anonymous.4open.science/r/MoRA-.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

MoRA proposes a human-centric geospatial framework that uses a mobility graph as its core backbone to fuse POIs, remote sensing imagery, and demographic statistics, learning embeddings that represent socio-economic context and functional roles of locations. The paper resides in the 'Scalable Geospatial Representation Learning' leaf under 'Foundation Models and Large-Scale Geospatial Learning', which contains only three papers total. This is a relatively sparse research direction within the broader taxonomy of fifty papers, suggesting that large-scale, general-purpose geospatial representation learning remains an emerging area compared to more crowded branches like multimodal fusion or mobility prediction.

The taxonomy reveals that MoRA's neighboring work spans multiple branches: 'Multimodal Fusion Frameworks' (thirteen papers across three sub-leaves) explores integration strategies but often without mobility as the primary organizing principle, while 'Mobility-Driven Region Representation Learning' (six papers) focuses on mobility patterns but typically without the scale or multimodal scope MoRA claims. The 'Contrastive and Self-Supervised Multimodal Learning' sub-leaf (four papers) shares methodological overlap in using contrastive objectives, yet those works do not explicitly position mobility graphs as the interpretive lens for auxiliary modalities. MoRA's approach of grounding physical and demographic data through human dynamics appears to bridge these directions.

Of the thirty candidates examined (ten per contribution), two were judged refutable for the framework contribution, two for the benchmark contribution, and one for the scaling-laws contribution. These statistics indicate that while some prior work exists in each area, the search scope was limited and the majority of examined candidates did not clearly overlap. The framework's emphasis on asymmetric contrastive learning and billion-edge mobility graphs at scale distinguishes it from smaller-scale or single-modality methods, though the extent of novelty depends on how thoroughly the limited candidate pool represents the field.

Based on the top-thirty semantic matches and citation expansion, MoRA appears to occupy a relatively under-explored intersection of mobility-centric reasoning and large-scale multimodal fusion. The analysis does not cover exhaustive literature in urban computing or remote sensing communities, and the taxonomy's sparse 'Scalable Geospatial Representation Learning' leaf suggests this direction is still maturing. The contribution-level statistics hint at incremental overlap rather than wholesale redundancy, though a broader search might reveal additional related efforts.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 5

Research Landscape Overview

Core task: geospatial representation learning using multimodal data and mobility graphs. The field has evolved into several interconnected branches that address different facets of learning meaningful region embeddings. Mobility-Driven Region Representation Learning focuses on extracting patterns from human movement data, often leveraging trajectory flows and transition graphs to capture functional connectivity between areas. Multimodal Fusion Frameworks for Region Representation integrate diverse data sources, such as points of interest, satellite imagery, and social media, to build richer semantic profiles, as seen in works like Effective urban region representation[2] and MGRL4RE[5]. Graph Neural Network Architectures for Regions provide the structural backbone for encoding spatial relationships, while Remote Sensing and Spatial Context Integration emphasizes the role of imagery and environmental features. Mobility Prediction and Forecasting applies these representations to anticipate future flows, and Foundation Models and Large-Scale Geospatial Learning explores scalable pretraining strategies that generalize across cities and tasks. Specialized Applications and Emerging Paradigms address domain-specific challenges, from urban planning to location recommendation. Recent efforts have increasingly turned toward scalable, self-supervised methods that can handle the heterogeneity and volume of geospatial data.

Within the Foundation Models and Large-Scale Geospatial Learning branch, MoRA[0] sits alongside other scalable approaches like MobCLIP[27] and Temporal Embeddings[50], emphasizing efficient representation learning that can transfer across diverse urban contexts. While MobCLIP[27] leverages contrastive learning on mobility traces and MoRA[0] focuses on mobility-aware region aggregation, both share the goal of reducing reliance on task-specific labels. In contrast, earlier methods such as Region representation learning via[1] and Unsupervised Representation Learning of[18] laid foundational ideas but operated at smaller scales. The central tension across these lines of work involves balancing expressiveness, capturing fine-grained spatial semantics, with computational efficiency and the ability to generalize to new regions with limited supervision.

Claimed Contributions

MoRA framework using mobility as backbone for multimodal geospatial representation learning

The authors introduce MoRA, a framework that positions human mobility graphs as the central structural backbone for aligning multiple geospatial data modalities (POIs, satellite imagery, demographics). This mobility-centric design ensures all auxiliary modalities are interpreted through fundamental human dynamics, producing comprehensive location embeddings for socio-economic inference.
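The mobility-as-anchor alignment described above can be sketched as a one-directional (asymmetric) InfoNCE objective. Everything below is illustrative: the paper does not publish this code, and the function name, the temperature value, and the choice to treat the mobility side as fixed anchors are assumptions made for the sketch, not MoRA's actual implementation.

```python
import numpy as np

def asymmetric_info_nce(mob_emb, aux_emb, temperature=0.07):
    """One-directional InfoNCE sketch: auxiliary embeddings (POIs, imagery,
    demographics) are pulled toward mobility-graph anchors; matched
    location pairs sit on the diagonal of the similarity matrix."""
    a = mob_emb / np.linalg.norm(mob_emb, axis=1, keepdims=True)  # mobility anchors
    q = aux_emb / np.linalg.norm(aux_emb, axis=1, keepdims=True)  # auxiliary queries
    logits = q @ a.T / temperature
    logits -= logits.max(axis=1, keepdims=True)                   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                           # positives on diagonal

rng = np.random.default_rng(0)
mob = rng.normal(size=(8, 128))                        # 8 locations, 128-d embeddings
poi_aligned = mob + 0.01 * rng.normal(size=(8, 128))   # near-copies of the anchors
poi_random = rng.normal(size=(8, 128))                 # unrelated embeddings
print(asymmetric_info_nce(mob, poi_aligned) < asymmetric_info_nce(mob, poi_random))  # → True
```

The asymmetry here is that gradients (in a full training setup) would flow only into the auxiliary encoder, keeping the mobility backbone as the shared interpretive frame, which mirrors the paper's stated design intent.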

Retrieved papers: 10. Verdict: Can Refute.
Benchmark dataset for human-centric geospatial representation evaluation

The authors curate a benchmark comprising 9 diverse downstream tasks spanning social and economic domains at multiple spatial scales (point, grid, county, city). This benchmark enables rigorous evaluation of geospatial representation quality for human-centric inference tasks.
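Benchmarks of this kind typically evaluate frozen embeddings with a lightweight probe per downstream task. The sketch below uses closed-form ridge regression on synthetic 128-dimensional embeddings; the `ridge_probe` helper, the data, and the train/test split are hypothetical placeholders, not the benchmark's actual protocol.

```python
import numpy as np

def ridge_probe(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam I)^-1 X^T y."""
    d = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)
    return X_test @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))              # frozen 128-d embeddings for 200 locations
w_true = rng.normal(size=128)
y = X @ w_true + 0.1 * rng.normal(size=200)  # synthetic socio-economic target
pred = ridge_probe(X[:150], y[:150], X[150:])
ss_res = ((y[150:] - pred) ** 2).sum()
ss_tot = ((y[150:] - y[150:].mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot                     # held-out R^2 for this task
print(round(r2, 3))
```

Reporting a per-task metric such as held-out R² over all 9 tasks, then averaging, is one plausible way the 12.9% aggregate improvement could be computed.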

Retrieved papers: 10. Verdict: Can Refute.
Empirical evidence of scaling laws in geospatial representation learning

The authors demonstrate that geospatial representation learning exhibits scaling behavior analogous to large language models: increasing pretraining data size and spatial coverage from local to national scales consistently improves downstream task performance, revealing predictable performance gains with scale.
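LLM-style scaling behavior is typically summarized by fitting a power law, err(N) ≈ a · N^(−b), to (pretraining data size, downstream error) points; a linear fit in log-log space recovers the exponent. The numbers below are synthetic, chosen only to illustrate the fitting procedure, not the paper's measurements.

```python
import numpy as np

sizes = np.array([1e5, 1e6, 1e7, 1e8])   # pretraining data sizes (synthetic)
errors = 2.0 * sizes ** (-0.25)          # synthetic downstream errors, true b = 0.25
# Power law err = a * N**(-b) is linear in log-log space:
# log(err) = log(a) - b * log(N), so the slope of a degree-1 fit is -b.
slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
b_hat = -slope                           # estimated scaling exponent
print(round(b_hat, 3))                   # → 0.25
```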

Retrieved papers: 10. Verdict: Can Refute.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MoRA framework using mobility as backbone for multimodal geospatial representation learning

The authors introduce MoRA, a framework that positions human mobility graphs as the central structural backbone for aligning multiple geospatial data modalities (POIs, satellite imagery, demographics). This mobility-centric design ensures all auxiliary modalities are interpreted through fundamental human dynamics, producing comprehensive location embeddings for socio-economic inference.

Contribution

Benchmark dataset for human-centric geospatial representation evaluation

The authors curate a benchmark comprising 9 diverse downstream tasks spanning social and economic domains at multiple spatial scales (point, grid, county, city). This benchmark enables rigorous evaluation of geospatial representation quality for human-centric inference tasks.

Contribution

Empirical evidence of scaling laws in geospatial representation learning

The authors demonstrate that geospatial representation learning exhibits scaling behavior analogous to large language models: increasing pretraining data size and spatial coverage from local to national scales consistently improves downstream task performance, revealing predictable performance gains with scale.