Image Quality Assessment for Embodied AI

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Image Quality Assessment; Image Processing; Perceptual Quality; Embodied AI
Abstract:

Embodied AI has developed rapidly in recent years, but it is still deployed mainly in laboratories, as the various distortions encountered in the real world limit its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, no IQA method assesses the usability of an image in embodied tasks, that is, its perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic of IQA for Embodied AI. Specifically, we (1) construct a perception-cognition-decision-execution pipeline based on the Mertonian system and meta-cognitive theory, and define a comprehensive subjective score collection process; (2) establish the Embodied-IQA database, containing over 30,000 reference/distorted image pairs with more than 5 million fine-grained annotations provided by Vision-Language Models, Vision-Language-Action models, and real-world robots; (3) train and validate mainstream IQA methods on Embodied-IQA, demonstrating the need to develop more accurate quality indicators for Embodied AI. We sincerely hope that this evaluation can promote the application of Embodied AI under complex real-world distortions.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a perception-cognition-decision-execution pipeline for assessing image quality in embodied AI contexts, establishes the Embodied-IQA database with over 30,000 image pairs and 5 million annotations from vision-language models and real robots, and benchmarks mainstream IQA methods on this data. Within the taxonomy, it resides in the 'Embodied-Specific Quality Assessment' leaf under 'Quality Assessment Frameworks and Benchmarks', alongside three sibling papers. This leaf represents a relatively sparse research direction within a 50-paper taxonomy spanning 23 leaf nodes, suggesting the work addresses an emerging rather than saturated area.

The taxonomy reveals that quality assessment for embodied AI sits at the intersection of multiple research streams. Neighboring leaves include 'World Model and Generative Content Evaluation' (assessing scene quality and physical plausibility in generative systems) and 'General Visual Quality Assessment' (broader multimedia quality metrics). The paper's focus on robot-centric usability distinguishes it from general visual quality work, while its emphasis on task-driven metrics connects to navigation and manipulation branches. The taxonomy's scope notes clarify that embodied-specific quality assessment excludes general multimedia metrics, positioning this work as bridging perceptual quality and downstream task performance.

Among the 24 candidates examined across the three contributions, the analysis found limited overlap with prior work. The perception-cognition-decision-execution pipeline was compared against 10 candidates, yielding one potential refutation; the database construction against 4 candidates, yielding one; and the benchmark evaluation against 10 candidates, yielding two. These statistics suggest that, within the top-24 semantic matches, most contributions appear relatively novel, though the search scope is modest. The pipeline and database contributions show particularly sparse prior work, while the benchmarking component encounters slightly more existing evaluation efforts.

Based on this limited search of 24 candidates, the work appears to occupy a relatively underexplored niche at the intersection of image quality assessment and embodied task performance. The sparse sibling count and low refutation rates suggest novelty, though the analysis is not an exhaustive literature review and does not cover domain-specific venues. The taxonomy structure indicates that this is an emerging research direction rather than a mature subfield, consistent with the observed scarcity of directly comparable prior work.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 4

Research Landscape Overview

Core task: Image quality assessment for embodied artificial intelligence tasks. The field spans a diverse set of challenges, from developing quality metrics and benchmarks tailored to embodied settings, to building robust visual perception systems, navigation and manipulation capabilities, scene reconstruction methods, generative world models, and comprehensive simulation platforms. At the top level, the taxonomy organizes work into eight major branches:

- Quality Assessment Frameworks and Benchmarks: metrics and evaluation protocols specific to embodied contexts (e.g., Perceptual Quality Embodied[1], Embodied Image Quality[12]);
- Visual Perception and Representation Learning: how agents encode and interpret visual input (e.g., Artificial Visual Cortex[3], Visual Embedding Distillation[6]);
- Embodied Navigation and Spatial Reasoning: goal-driven movement and spatial understanding (e.g., Objectnav Revisited[5], Omnidirectional Spatial Reasoning[18]);
- Manipulation and Interaction: physical interaction with objects (e.g., PerTouch[39], TextToucher[25]);
- Scene Reconstruction and 3D Representation: building spatial models (e.g., Lightweight Gaussian Splatting[22], OGGSplat[46]);
- Generative Models and World Simulation: predictive and generative approaches (e.g., Generative Physical AI[17], World in World[19]);
- Simulation Platforms and Datasets: testbeds and data resources (e.g., Habitat Matterport[9], Ewmbench[2]);
- System Integration and Applications: bringing these components together in real-world deployments (e.g., Multimodal Indoor Robotics[4], Embodied AI Vehicular[15]).

A particularly active line of work centers on defining and measuring perceptual quality in ways that align with embodied task performance, contrasting traditional image quality metrics with task-driven assessments.
Image Quality Embodied AI[0] sits squarely within the Quality Assessment Frameworks and Benchmarks branch, specifically under Embodied-Specific Quality Assessment, where it joins efforts like Perceptual Quality Embodied[1] and RGC-VQA[49] in developing metrics that account for agent-centric visual demands. While Perceptual Quality Embodied[1] emphasizes human-aligned perceptual measures, Image Quality Embodied AI[0] appears to focus more directly on how image degradation affects downstream embodied task success, bridging quality assessment with navigation and manipulation outcomes. This contrasts with broader embodied AI surveys (e.g., Embodied AI Survey[14]) that catalog task types without deep dives into quality metrics, and with works like Embodied Image Compression[41] that optimize compression for embodied scenarios. The central tension across these branches involves balancing perceptual fidelity, computational efficiency, and task-specific relevance—questions that remain open as embodied systems scale to more complex, real-world environments.

Claimed Contributions

Perception-cognition-decision-execution pipeline for Embodied AI quality assessment

The authors develop a theoretical framework grounded in Mertonian systems and meta-cognitive theory that structures Embodied AI evaluation into four stages: perception, cognition, decision, and execution. This pipeline defines how to collect quality scores for robotic tasks.

10 retrieved papers
Can Refute
Embodied-IQA database with multi-stage annotations

The authors create a large-scale database of reference and distorted image pairs for embodied tasks, annotated by VLMs, VLAs, and real robots. This resource provides fine-grained labels across cognition, decision, and execution stages to support quality metric development.

4 retrieved papers
Can Refute
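One way to picture the multi-stage, multi-annotator labels this contribution describes is as a record per reference/distorted pair. The field names below are hypothetical illustrations, not the actual Embodied-IQA schema:

```python
from dataclasses import dataclass

# Hypothetical record layout for one annotated image pair; field names
# are illustrative and do not reflect the Embodied-IQA database schema.
@dataclass
class EmbodiedAnnotation:
    pair_id: str             # reference/distorted image pair identifier
    distortion: str          # e.g. "motion_blur", "low_light"
    annotator: str           # "VLM", "VLA", or "robot"
    cognition_score: float   # can the agent still understand the scene?
    decision_score: float    # does the distortion change the chosen action?
    execution_success: bool  # did the real-world robot complete the task?

a = EmbodiedAnnotation("pair_00001", "motion_blur", "robot", 0.72, 0.65, True)
print(a.annotator, a.execution_success)
```

Grouping such records by stage (cognition/decision/execution) would be one way to support the fine-grained quality-metric development the report attributes to the database.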
Benchmark evaluation of IQA methods for Embodied AI

The authors evaluate 15 existing IQA methods on their Embodied-IQA database, showing that current approaches are insufficient for robotic perception tasks. They also conduct real-world robot experiments to reveal connections among cognition, decision, and execution.

10 retrieved papers
Can Refute
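Benchmarks of this kind conventionally report rank correlation (SRCC) and linear correlation (PLCC) between a method's predicted scores and the ground-truth annotations. A minimal, dependency-free sketch of those two metrics follows; the score lists are illustrative placeholders, not results from the paper:

```python
def rankdata(values):
    """Assign 1-based average ranks to values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """PLCC: Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def srcc(pred, gt):
    """SRCC: Pearson correlation computed on the ranks."""
    return pearson(rankdata(pred), rankdata(gt))

# Illustrative only: predicted quality vs. hypothetical embodied
# usability annotations for five distorted images.
pred = [0.91, 0.40, 0.77, 0.15, 0.60]
gt = [0.88, 0.35, 0.80, 0.20, 0.55]
print(f"SRCC={srcc(pred, gt):.3f}  PLCC={pearson(pred, gt):.3f}")
```

A finding that current IQA methods are "insufficient" would typically correspond to low SRCC/PLCC values against the embodied annotations, in contrast to the high values these methods achieve on human-preference databases.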

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Perception-cognition-decision-execution pipeline for Embodied AI quality assessment

The authors develop a theoretical framework grounded in Mertonian systems and meta-cognitive theory that structures Embodied AI evaluation into four stages: perception, cognition, decision, and execution. This pipeline defines how to collect quality scores for robotic tasks.

Contribution

Embodied-IQA database with multi-stage annotations

The authors create a large-scale database of reference and distorted image pairs for embodied tasks, annotated by VLMs, VLAs, and real robots. This resource provides fine-grained labels across cognition, decision, and execution stages to support quality metric development.

Contribution

Benchmark evaluation of IQA methods for Embodied AI

The authors evaluate 15 existing IQA methods on their Embodied-IQA database, showing that current approaches are insufficient for robotic perception tasks. They also conduct real-world robot experiments to reveal connections among cognition, decision, and execution.