GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
Overview
Overall Novelty Assessment
The paper introduces GIQ, a benchmark dataset and evaluation framework targeting geometric reasoning in vision and vision-language foundation models. It resides in the '3D Geometric and Structural Understanding' leaf, which contains five papers total, including works on probing 3D awareness, visual attribute benchmarks, and embodied 3D evaluation. This leaf sits within the broader 'Spatial Reasoning Capabilities and Evaluation' branch, indicating a moderately populated research direction focused on diagnostic assessment rather than method development. The taxonomy reveals that geometric reasoning evaluation is an active but not overcrowded area, with distinct clusters for general spatial reasoning, embodied intelligence, and domain-specific tasks.
The taxonomy tree shows neighboring leaves addressing general spatial reasoning benchmarks (six papers on broad relationship understanding) and embodied/robotic spatial intelligence (three papers on egocentric tasks). GIQ's focus on polyhedra, symmetry detection, and mental rotation distinguishes it from these adjacent directions: general spatial benchmarks emphasize 2D relationships and orientation, while embodied benchmarks prioritize navigation and manipulation. The taxonomy's scope notes clarify that 3D geometric property recognition belongs specifically in this leaf, separating it from 2D spatial reasoning or action-oriented evaluation. This structural positioning suggests GIQ addresses a gap between abstract geometric understanding and task-driven spatial intelligence.
Among twenty candidates examined across three contributions, zero refutable pairs were identified. For the first contribution (the GIQ dataset), ten candidates were examined with no clear refutations; for the second contribution (the evaluation framework), no candidates were examined directly; for the third contribution (the empirical findings), ten candidates were examined, also without refutation. This limited search scope (twenty papers from semantic retrieval) means the analysis captures top-ranked related work but cannot claim exhaustive coverage. The absence of refutations among the examined candidates suggests that the specific combination of polyhedra-focused tasks and systematic geometric probing may represent a novel evaluation angle, though the small search window leaves room for undetected overlaps.
Based on the top-twenty semantic matches and taxonomy context, GIQ appears to occupy a distinct position within 3D geometric evaluation, emphasizing structural properties and symmetry rather than scene-level reconstruction or embodied tasks. The limited search scope and zero refutations among examined candidates suggest potential novelty, but a broader literature review would be needed to confirm whether similar polyhedra-based benchmarks or mental rotation tests exist outside the retrieved set. The taxonomy structure indicates this work contributes to an active but not saturated research direction.
Claimed Contributions
The authors present GIQ, a novel benchmark dataset comprising synthetic and real-world images of 224 diverse polyhedra with corresponding 3D meshes. The dataset systematically varies geometric complexity, symmetry properties, and topological regularity to enable rigorous evaluation of spatial reasoning in vision models.
The authors develop a comprehensive evaluation framework consisting of four distinct tasks that probe different dimensions of geometric intelligence: explicit 3D reconstruction, implicit symmetry detection via linear and non-linear probing, mental rotation capabilities, and high-level semantic classification by frontier vision-language models.
The authors demonstrate through extensive experiments that current state-of-the-art models exhibit significant limitations in geometric reasoning, including failures to reconstruct simple shapes, difficulty with mental rotation tasks that require fine-grained differentiation, and remarkably low accuracy when advanced vision-language assistants interpret basic shape properties.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Probing the 3D Awareness of Visual Foundation Models
[8] AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
[19] E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models
[32] Shape and Texture Recognition in Large Vision-Language Models
Contribution Analysis
Detailed comparisons for each claimed contribution
GIQ benchmark dataset for evaluating geometric reasoning
The authors present GIQ, a novel benchmark dataset comprising synthetic and real-world images of 224 diverse polyhedra with corresponding 3D meshes. The dataset systematically varies geometric complexity, symmetry properties, and topological regularity to enable rigorous evaluation of spatial reasoning in vision models.
[2] SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
[14] SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
[22] SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
[29] Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
[30] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
[59] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
[60] ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
[61] DUSt3R: Geometric 3D Vision Made Easy
[62] Measuring Multimodal Mathematical Reasoning with the MATH-Vision Dataset
[63] RVTBench: A Benchmark for Visual Reasoning Tasks
Systematic evaluation framework across four geometric reasoning tasks
The authors develop a comprehensive evaluation framework consisting of four distinct tasks that probe different dimensions of geometric intelligence: explicit 3D reconstruction, implicit symmetry detection via linear and non-linear probing, mental rotation capabilities, and high-level semantic classification by frontier vision-language models.
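To make the probing component concrete: a linear probe freezes the backbone and trains only a linear classifier on its features, so probe accuracy reflects what the representation already encodes. The sketch below is an illustrative assumption, not GIQ's actual code; random vectors stand in for frozen backbone embeddings, and the binary label is a hypothetical "has this symmetry" annotation.

```python
import numpy as np

# Minimal linear-probe sketch: logistic regression trained by gradient
# descent on top of frozen "features". All data here is synthetic.
rng = np.random.default_rng(0)
n, d = 400, 64
w_true = rng.normal(size=d)            # hidden direction separating the classes
X = rng.normal(size=(n, d))            # stand-in for frozen backbone features
y = (X @ w_true > 0).astype(float)     # hypothetical binary symmetry label

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe; the backbone stays untouched."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on bias
    return w, b

w, b = train_linear_probe(X, y)
acc = np.mean(((X @ w + b) > 0) == (y > 0.5))
print(f"probe accuracy: {acc:.2f}")
```

A non-linear probe would replace the single linear layer with a small MLP; comparing the two indicates whether a property is linearly decodable or only non-linearly present in the features.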
Empirical findings revealing fundamental gaps in geometric understanding
The authors demonstrate through extensive experiments that current state-of-the-art models exhibit significant limitations in geometric reasoning, including failures to reconstruct simple shapes, difficulty with mental rotation tasks that require fine-grained differentiation, and remarkably low accuracy when advanced vision-language assistants interpret basic shape properties.
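The mental-rotation failure mode can be illustrated with a toy matching protocol (our assumption for exposition, not the paper's exact setup): embed a query view of a solid, then pick which candidate embedding corresponds to a rotated view of the same solid by cosine similarity. Fine-grained differentiation fails when distractor solids land as close in embedding space as the true rotated view.

```python
import numpy as np

# Toy mental-rotation check with random vectors standing in for model
# embeddings; "same" simulates a rotated view of the query solid.
rng = np.random.default_rng(1)
d = 128

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

base = rng.normal(size=d)                  # embedding of the query view
same = base + 0.3 * rng.normal(size=d)     # rotated view: nearby embedding
distractors = [rng.normal(size=d) for _ in range(3)]

candidates = [same] + distractors          # index 0 is the true match
scores = [cosine(base, c) for c in candidates]
pred = int(np.argmax(scores))
print("picked candidate", pred)
```

In this synthetic setting the true match is easy because the perturbation is small; the reported failures correspond to the regime where rotated views of the same solid drift far apart while visually similar polyhedra collide.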