Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Keywords: Earth observation, Earth-Agent, Earth-Bench

Abstract:

Earth observation (EO) is essential for understanding the evolving states of the Earth system. Although recent MLLMs have advanced EO research, they still lack the capability to tackle complex tasks that require multi-step reasoning and the use of domain-specific tools. Agent-based methods offer a promising direction, but current attempts remain in their infancy, confined to RGB perception, shallow reasoning, and lacking systematic evaluation protocols. To overcome these limitations, we introduce Earth-Agent, the first agentic framework that unifies RGB and spectral EO data within an MCP-based tool ecosystem, enabling cross-modal, multi-step, and quantitative spatiotemporal reasoning beyond pretrained MLLMs. Earth-Agent supports complex scientific tasks such as geophysical parameter retrieval and quantitative spatiotemporal analysis by dynamically invoking expert tools and models across modalities. To support comprehensive evaluation, we further propose Earth-Bench, a benchmark of 248 expert-curated tasks with 13,729 images, spanning spectrum, products and RGB modalities, and equipped with a dual-level evaluation protocol that assesses both reasoning trajectories and final outcomes. We conduct comprehensive experiments varying different LLM backbones, comparisons with general agent frameworks, and comparisons with MLLMs on remote sensing benchmarks, demonstrating both the effectiveness and potential of Earth-Agent. Earth-Agent establishes a new paradigm for EO analysis, moving the field toward scientifically grounded, next-generation applications of LLMs in Earth observation. Our code and dataset will be publicly released.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

Earth-Agent introduces an agentic framework unifying RGB and spectral Earth observation data within an MCP-based tool ecosystem, enabling cross-modal, multi-step reasoning for geophysical parameter retrieval and spatiotemporal analysis. The paper resides in the Remote Sensing-Specific Agents leaf, which contains eight papers focused on agent systems specialized for remote sensing image analysis and EO-specific reasoning tasks. This leaf sits within the broader Agentic Frameworks and Multi-Agent Systems branch, indicating a moderately populated research direction where agent-based approaches to Earth observation are actively explored but not yet saturated.

The taxonomy reveals neighboring work in General-Purpose Geospatial Agents (five papers on broad geospatial task automation) and Interactive and Conversational Agents (one paper on multi-turn dialogue for environmental analysis). The Vision-Language Models for Earth Observation branch offers an alternative paradigm without explicit agent architecture, while Tool-Use Mechanisms and Evaluation provides complementary infrastructure for assessing tool invocation. Earth-Agent bridges these areas by combining agent planning with multi-modal vision-language understanding and systematic tool orchestration, positioning itself at the intersection of agentic reasoning and domain-specific EO capabilities.

Among thirty candidates examined, the framework-level contribution (Contribution A) shows no clear refutation across ten candidates, suggesting the integrated MCP-based architecture may offer distinctive design choices. Multi-spectral image processing (Contribution B) encountered one refutable candidate among ten examined, indicating prior work addresses spectral data handling. Interactive reasoning with external tools (Contribution C) found four refutable candidates among ten, reflecting established precedent in tool-augmented EO agents. The limited search scope means these statistics capture top semantic matches rather than exhaustive field coverage, and the dual-level evaluation protocol for Earth-Bench appears less examined in the candidate set.
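The per-contribution counts above can be tallied with a few lines of arithmetic. This is only a minimal sketch: the dictionary transcribes the verdict counts stated in this report (ten candidates per contribution), and the names `verdicts` and `refutable_rate` are illustrative, not part of the report's pipeline.

```python
# Verdict counts transcribed from this report: 10 candidates per contribution.
verdicts = {
    "A: Earth-Agent framework": {"can_refute": 0, "cannot_refute": 10},
    "B: Multi-spectral image processing": {"can_refute": 1, "cannot_refute": 9},
    "C: Interactive reasoning with external tools": {"can_refute": 4, "cannot_refute": 6},
}


def refutable_rate(counts):
    """Share of examined candidates flagged as refutable."""
    total = counts["can_refute"] + counts["cannot_refute"]
    return counts["can_refute"] / total


total_examined = sum(c["can_refute"] + c["cannot_refute"] for c in verdicts.values())
total_refutable = sum(c["can_refute"] for c in verdicts.values())

for name, counts in verdicts.items():
    print(f"{name}: {refutable_rate(counts):.0%} of candidates flagged")
print(f"Overall: {total_refutable}/{total_examined} candidates flagged")
```

The rates describe only the thirty retrieved candidates, so they should be read as a summary of this sample rather than of the field.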

Given the search examined thirty papers from semantic retrieval, the analysis provides a snapshot of closely related work rather than comprehensive field coverage. The framework's integration of spectral data, MCP-based tools, and dual-level evaluation appears to combine elements present separately in prior work, though the specific architectural synthesis may offer incremental advances. The benchmark's scale (248 tasks, 13,729 images) and cross-modal scope warrant attention, but the novelty assessment remains constrained by the limited literature sample and the presence of overlapping tool-use and multi-spectral capabilities in examined candidates.

Taxonomy

Research Landscape Overview

Core task: Multi-step reasoning and tool use for Earth observation analysis.

The field structure reflects a convergence of agentic AI capabilities with domain-specific Earth observation challenges. The taxonomy organizes work into several main branches: Agentic Frameworks and Multi-Agent Systems focus on building autonomous or collaborative agents that orchestrate complex geospatial workflows; Vision-Language Models for Earth Observation adapt multimodal foundation models to interpret satellite and aerial imagery alongside textual queries; Tool-Use Mechanisms and Evaluation address how agents select, invoke, and assess specialized geospatial tools; Instruction-Following Datasets and Training Resources provide the supervised signals needed to teach models domain conventions; Task-Specific Applications and Domain Integration demonstrate end-to-end systems for problems like flood detection or forest monitoring; and Foundational Methods and Cross-Domain Techniques supply general reasoning strategies that transfer across domains. Representative works such as ThinkGeo[2] and Teochat[1] illustrate how agents can chain reasoning steps with tool calls, while Geo-olm[11] and Naiad[33] show efforts to ground vision-language understanding in geospatial semantics.

A particularly active line of work explores remote sensing-specific agents that combine large language models with geospatial APIs and domain knowledge bases. Earth-Agent[0] sits within this cluster, emphasizing multi-step reasoning pipelines that decompose complex Earth observation queries into executable tool sequences. Nearby efforts like Multi-agent Remote Sensing[5] and Agentic AI Remote Sensing[29] similarly pursue collaborative or hierarchical agent architectures, though they differ in whether they prioritize single-agent orchestration or multi-agent negotiation. Compared to more general agentic frameworks, Earth-Agent[0] and its neighbors integrate domain-specific constraints (such as coordinate reference systems, temporal resolution trade-offs, and sensor modality selection) directly into the reasoning loop.

Open questions remain around how to evaluate reasoning quality beyond task success, how to handle noisy or incomplete geospatial metadata, and whether hybrid approaches that blend symbolic planning with neural tool selection offer better generalization than end-to-end learned policies.
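To make the idea of decomposing a query into an executable tool sequence concrete, here is a minimal sketch. It is not Earth-Agent's implementation: the tool names (`load_bands`, `compute_ndvi`, `mean_value`), the registry, and the hard-coded plan are all hypothetical, and in an agentic system a language model would be expected to produce the plan. NDVI, defined as (NIR - Red) / (NIR + Red), is used only as a familiar spectral index.

```python
from typing import Any, Callable, Dict, List

# Hypothetical tool registry; none of these names come from the paper.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn):
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def load_bands(scene: Dict[str, List[float]]) -> Dict[str, List[float]]:
    # Stand-in for reading the red and near-infrared bands from a raster file.
    return {"red": scene["red"], "nir": scene["nir"]}


@tool
def compute_ndvi(bands: Dict[str, List[float]]) -> List[float]:
    # NDVI = (NIR - Red) / (NIR + Red); 0.0 where the denominator is zero.
    return [
        (n - r) / (n + r) if (n + r) != 0 else 0.0
        for r, n in zip(bands["red"], bands["nir"])
    ]


@tool
def mean_value(values: List[float]) -> float:
    # Simple aggregate over per-pixel values.
    return sum(values) / len(values)


def run_plan(plan: List[str], initial_input: Any) -> Dict[str, Any]:
    """Run tools in order, feeding each output into the next step,
    and record the trajectory so intermediate steps can be inspected."""
    trajectory = []
    current = initial_input
    for name in plan:
        current = TOOLS[name](current)
        trajectory.append({"tool": name, "output": current})
    return {"answer": current, "trajectory": trajectory}


# Hard-coded plan standing in for what a planner model might emit.
scene = {"red": [0.1, 0.2, 0.3], "nir": [0.5, 0.6, 0.3]}
result = run_plan(["load_bands", "compute_ndvi", "mean_value"], scene)
print(result["answer"])
```

Keeping the trajectory alongside the final answer is one simple way to let an evaluation inspect both the intermediate steps and the outcome, which relates to the open question above about judging reasoning quality beyond task success.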

Claimed Contributions

Earth-Agent framework for comprehensive Earth observation tasks

10 retrieved papers

The authors propose Earth-Agent, a framework that extends beyond existing MLLM-based and agent-based Earth observation research by supporting multi-spectral imagery, processing numerous images simultaneously, and performing complex multi-step reasoning while integrating external tools and expert models.

Multi-spectral image processing capability

Can Refute

10 retrieved papers

Unlike prior work limited to RGB images, Earth-Agent can handle both multi-spectral and RGB imagery, expanding the range of Earth observation data that can be analyzed.

Interactive reasoning with external tools and models

Can Refute

10 retrieved papers

The framework enables complex multi-step interactive reasoning by integrating with external tools and expert models, making it extensible beyond the capabilities of standalone models.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[2] ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks

Munir, Muhammad Akhtar, Dudhane Akshay, Khan, Muhammad Haris, Fraccaro Paolo, Fahad Shahbaz, Salman (2025)

[5] An LLM-based multi-agent system for remote sensing analysis

Z Sun, Y Zhou, J Yang (2026)

[8] Earth AI: unlocking geospatial insights with foundation models and cross-modal reasoning

Bell Aaron, Aides, Amit, Helmy, Amr, Slobodkin, Aviv, Leifman George, Sun Mi-mi, Natalie Williams, Lee Roy, Thomas Turnbull, Shekel, Tomer, Gigi, Yotam, Boulanger, Adam, Vahedi, Behzad, Elliott Charles, André David, Bien, Jacob, Rothenberg, Juliet, Hegde, Kartik, Jablonski, Kim Philipp, Pilarski Sebastian, Jiang, Siduo, Colthurst, Thomas, Chen Yang, Refael, Yehonathan, Blau, Yochai, Hassidim, Avinatan, Manyika, James, Beryozkin, Genady, Prasad, Gautam, Barrington, Luke, Matias, Yossi, Shetty, Shravya (2025)

[11] Geo-olm: Enabling sustainable earth observation studies with cost-efficient open language models & state-driven workflows

Stamoulis, Dimitrios, Marculescu Diana (2025)

[29] Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems

Niloufar Alipour Talemi, Julia Boone, Fatemeh Afghah (2026)

[30] CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications

Zhengchao Chen, Haoran Wang, Jing Yao, Pedram Ghamisi, Jun Zhou, Peter M. Atkinson, Bing Zhang (2025)

[33] Naiad: novel agentic intelligent autonomous system for inland water monitoring

Eirini Baltzi, Tilemachos Moumouris, Athina Psalta, Vasileios Tsironis, Konstantinos Karantzalos (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Earth-Agent framework for comprehensive Earth observation tasks

[5] An LLM-based multi-agent system for remote sensing analysis

Cannot Refute

[11] Geo-olm: Enabling sustainable earth observation studies with cost-efficient open language models & state-driven workflows

Cannot Refute

[41] Artificial intelligence-assisted remote sensing observation, understanding, and decision

Cannot Refute

[42] GeoSR: Cognitive-Agentic Framework for Probing Geospatial Knowledge Boundaries via Iterative Self-Refinement

Cannot Refute

[43] STA-CoT: Structured Target-Centric Agentic Chain-of-Thought for Consistent Multi-Image Geological Reasoning

Cannot Refute

[44] UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning

Cannot Refute

[45] Asking like Socrates: Socrates helps VLMs understand remote sensing images

Cannot Refute

[46] Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism

Cannot Refute

[47] Towards a Barrier-free GeoQA Portal: Natural Language Interaction with Geospatial Data Using Multi-Agent LLMs and Semantic Search

Cannot Refute

[48] Remotereasoner: Towards unifying geospatial reasoning workflow

Cannot Refute

Contribution

Multi-spectral image processing capability

Unlike prior work limited to RGB images, Earth-Agent can handle both multi-spectral and RGB imagery, expanding the range of Earth observation data that can be analyzed.

[58] EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Can Refute

[49] Multispectral and hyperspectral image fusion in remote sensing: A survey

Cannot Refute

[50] RNN-based multispectral satellite image processing for remote sensing applications

Cannot Refute

[51] Beyond the visible: Multispectral vision-language learning for earth observation

Cannot Refute

[52] Estimating Rice SPAD Values via Multi-Sensor Data Fusion of Multispectral and RGB Cameras Using Machine Learning with a Phenotyping Robot

Cannot Refute

[53] Leveraging U-Net and selective feature extraction for land cover classification using remote sensing imagery

Cannot Refute

[54] Machine Learning-Based Processing of Multispectral and RGB UAV Imagery for the Multitemporal Monitoring of Vineyard Water Status

Cannot Refute

[55] Estimation of Fv/Fm in Spring Wheat Using UAV-Based Multispectral and RGB Imagery with Multiple Machine Learning Methods

Cannot Refute

[56] Advancing Sparse Vegetation Monitoring in the Arctic and Antarctic: A Review of Satellite and UAV Remote Sensing, Machine Learning, and Sensor Fusion

Cannot Refute

[57] Multispectral data mining: A focus on remote sensing satellite images

Cannot Refute

Contribution

Interactive reasoning with external tools and models

The framework enables complex multi-step interactive reasoning by integrating with external tools and expert models, making it extensible beyond the capabilities of standalone models.

[60] Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Can Refute

[66] Openthinkimg: Learning to think with images via visual tool reinforcement learning

Can Refute

[67] ART: Automatic multi-step reasoning and tool-use for large language models

Can Refute

[68] VideoAgent: Long-Form Video Understanding with Large Language Model as Agent

Can Refute

[59] RAFT: Adapting Language Model to Domain Specific RAG

Cannot Refute

[61] ToolQA: A Dataset for LLM Question Answering with External Tools

Cannot Refute

[62] A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Cannot Refute

[63] Reasoning about External Calls - Coq Model

Cannot Refute

[64] Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Cannot Refute

[65] ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Cannot Refute
