Abstract:

Accurately modeling long-range dependencies in graph-structured data is critical for many real-world applications. However, incorporating long-range interactions beyond the nodes' immediate neighborhood in a scalable manner remains an open challenge for graph machine learning models. Existing benchmarks for evaluating long-range capabilities either cannot guarantee that their tasks actually depend on long-range information or are rather limited in scope. Consequently, claims of improved long-range modeling based on such benchmarks remain questionable. We introduce the Long-Range Ising Model (LRIM) Graph Benchmark, a physics-based benchmark built on the well-studied Ising model, whose ground truth provably depends on long-range interactions. Our benchmark consists of ten datasets that scale from 256 to 65k nodes per graph and provide controllable long-range dependencies through tunable parameters, allowing precise control over hardness and "long-rangedness". We provide model-agnostic evidence that local information is insufficient, further validating the design choices of our benchmark. Through experiments on classical message-passing architectures and graph transformers, we show that both perform far from the optimum, especially those with scalable complexity. We hope our benchmark will foster the development of scalable methods that effectively model long-range interactions in graphs.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a physics-based benchmark using the Ising model to evaluate long-range dependency modeling in graph neural networks. It resides in the 'Formal Benchmarks with Provable Long-Range Dependencies' leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of 50 papers across 15 leaf nodes, suggesting that provably designed benchmarks remain an underexplored area. The sibling paper in this leaf takes a different approach, indicating that even within this small niche, methodological diversity exists.

The taxonomy tree reveals that the paper's leaf sits within the 'Benchmark Design and Theoretical Foundations' branch, which also includes 'Empirical Benchmark Collections' (two papers) and 'Theoretical Measurement and Characterization' (one paper). Neighboring branches focus on architecture design (six leaves with 18 papers) and domain applications (six leaves with 25 papers), showing that the field has invested more heavily in building models and applying them than in creating rigorous evaluation frameworks. The scope note for the paper's leaf explicitly excludes empirical benchmarks without provable guarantees, positioning this work as pursuing formal rigor rather than scale or realism.

Across the three claimed contributions, 23 candidate papers were examined in total. The LRIM benchmark contribution was compared against three candidates, one of which appears to provide overlapping prior work. The model-agnostic evidence contribution was compared against ten candidates with one potential refutation, and the theoretical analysis contribution likewise against ten candidates with one refutation. These statistics indicate that, within the limited search scope, each contribution faces at least one prior work that may overlap, though the majority of examined candidates (20 of 23) do not clearly refute the claims. The search scale is modest, leaving open the possibility of additional relevant work beyond the top-K semantic matches.

Based on the limited literature search of 23 candidates, the work appears to occupy a sparsely populated research direction with only one sibling paper in its taxonomy leaf. The contribution-level statistics suggest that while some prior work exists for each claim, the majority of examined candidates do not provide clear refutations. However, the modest search scope and the presence of at least one potentially overlapping work per contribution indicate that a more exhaustive review would be necessary to fully assess novelty.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Papers: 3

Research Landscape Overview

Core task: evaluating long-range dependency modeling in graph neural networks. The field has organized itself around three main branches that reflect complementary perspectives on this challenge.

The first branch, Benchmark Design and Theoretical Foundations, focuses on creating rigorous testbeds and formal frameworks to measure how well GNNs capture dependencies spanning many hops or distant nodes—works like Long Range Graph Benchmark[6] and Measuring Long Range GNN[1] exemplify efforts to establish controlled settings with provable long-range structure. The second branch, Architecture Design for Long-Range Modeling, explores novel layer designs, attention mechanisms, and hybrid approaches (e.g., Graph Mamba[2], Hybrid Long Range GCN[4]) that aim to overcome the limited receptive fields of standard message-passing schemes. The third branch, Domain Applications with Long-Range Dependencies, applies these architectures to real-world problems—ranging from traffic forecasting (PDFormer Traffic Prediction[17]) and wildfire prediction (Wildfire Prediction GNN[12]) to protein structure learning (Protein Structure Graph Learning[25])—where capturing distant interactions is essential for performance.

Across these branches, a recurring theme is the tension between theoretical guarantees and practical scalability: some studies prioritize formal benchmarks that isolate long-range reasoning, while others emphasize end-to-end performance on complex domain tasks. Within the Benchmark Design branch, LRIM Physics Benchmark[0] sits alongside Long Range Graph Benchmark[6] as a formal testbed with provable long-range dependencies, yet it brings a physics-inspired perspective that complements the more graph-theoretic focus of earlier benchmarks. Compared to Measuring Long Range GNN[1], which quantifies existing architectures' capacity for long-range modeling, LRIM Physics Benchmark[0] emphasizes constructing new evaluation scenarios grounded in physical systems.
This positioning reflects an ongoing effort to bridge the gap between abstract graph properties and interpretable, domain-motivated tasks, helping researchers understand not only whether a model can capture long-range dependencies in principle, but also how those capabilities translate to scientifically meaningful settings.

Claimed Contributions

Long-Range Ising Model (LRIM) Graph Benchmark

The authors introduce a physics-based benchmark utilizing the Ising model that provides provable and controllable long-range dependencies for evaluating graph learning models. The benchmark consists of ten datasets scaling from 256 to 65k nodes with tunable parameters that allow precise control over task hardness and long-rangedness.
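To make the model class concrete, the sketch below samples a one-dimensional long-range Ising chain via Metropolis updates, with couplings decaying as a power law in distance. This is a generic illustration of long-range Ising dynamics, not the benchmark's exact construction; the chain layout, parameter names, and default values are assumptions for the example.

```python
import numpy as np

def sample_long_range_ising(n=256, sigma=1.5, beta=0.8, sweeps=200, rng=None):
    """Metropolis sampling of a 1D Ising chain with power-law couplings.

    Couplings decay as J_ij = |i - j|**(-sigma); a smaller sigma means
    stronger long-range interactions. Illustrative sketch only.
    """
    rng = rng or np.random.default_rng(0)
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
    np.fill_diagonal(dist, np.inf)       # inf**(-sigma) == 0: no self-coupling
    J = dist ** (-sigma)
    spins = rng.choice([-1, 1], size=n)
    for _ in range(sweeps):
        for i in rng.permutation(n):
            h = J[i] @ spins             # field at site i from *all* other spins
            dE = 2.0 * spins[i] * h      # energy change if spin i is flipped
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                spins[i] = -spins[i]
    return spins
```

Because every site's update depends on the full coupling row `J[i]`, the sampled configurations carry correlations well beyond any fixed-hop neighborhood, which is exactly the property a long-range benchmark needs its ground truth to inherit.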

3 retrieved papers
Can Refute
Model-agnostic evidence for long-range dependency requirements

The authors provide baseline-agnostic evidence demonstrating that local information is insufficient for solving the benchmark tasks. They analyze oracle predictors restricted to local neighborhoods, showing systematic performance degradation and providing theoretical lower bounds on worst-case error for methods using only local information.
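The intuition behind such model-agnostic arguments can be illustrated with a simple coupling-mass calculation: if most of a node's total interaction weight lies outside its local neighborhood, no predictor restricted to that neighborhood can recover the full signal. The function below is a toy version of this argument on a 1D chain with power-law couplings; the paper's own oracle analysis and bounds are more refined.

```python
import numpy as np

def local_field_fraction(sigma, n=256, radius=4):
    """Fraction of a site's total coupling mass within `radius` hops
    on a 1D chain with couplings J(r) = r**(-sigma).

    Low values mean most interaction strength is long-range, so any
    predictor limited to the local neighborhood must lose information.
    """
    r = np.arange(1, n)
    w = r ** (-float(sigma))
    return w[:radius].sum() / w.sum()

# Smaller sigma spreads coupling mass beyond any fixed-radius neighborhood,
# so the locally visible fraction of the field shrinks.
for s in (0.5, 1.5, 3.0):
    print(f"sigma={s}: local coupling mass fraction = {local_field_fraction(s):.3f}")
```

This makes the degradation mechanism visible: as sigma decreases, the locally observable fraction of each node's field drops, matching the reported insufficiency of local-neighborhood oracles.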

10 retrieved papers
Can Refute
Theoretical analysis using long-rangedness metrics

The authors connect their benchmark to formal long-rangedness measures, deriving analytical expressions for range measures on the oracle predictor and proving how the interaction parameter sigma controls long-range dependencies. This provides theoretical grounding for the benchmark's ability to evaluate long-range capabilities.
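A standard way to formalize how sigma controls range in long-range Ising models is a power-law coupling between nodes; the sketch below uses this generic form, which may differ in detail from the paper's exact coupling and range-measure definitions:

```latex
% Generic power-law coupling between nodes i, j at positions x_i, x_j:
%   smaller \sigma  =>  slower decay  =>  stronger long-range dependence.
J_{ij} \;\propto\; \lVert x_i - x_j \rVert^{-\sigma}
```

Under this form, the influence of a distant node on the expected spin at node i decays only polynomially with distance (in the linear-response regime it scales with J_{ij}), so lowering sigma directly thickens the tail of the dependency structure that any range measure evaluates.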

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Long-Range Ising Model (LRIM) Graph Benchmark

The authors introduce a physics-based benchmark utilizing the Ising model that provides provable and controllable long-range dependencies for evaluating graph learning models. The benchmark consists of ten datasets scaling from 256 to 65k nodes with tunable parameters that allow precise control over task hardness and long-rangedness.

Contribution

Model-agnostic evidence for long-range dependency requirements

The authors provide baseline-agnostic evidence demonstrating that local information is insufficient for solving the benchmark tasks. They analyze oracle predictors restricted to local neighborhoods, showing systematic performance degradation and providing theoretical lower bounds on worst-case error for methods using only local information.

Contribution

Theoretical analysis using long-rangedness metrics

The authors connect their benchmark to formal long-rangedness measures, deriving analytical expressions for range measures on the oracle predictor and proving how the interaction parameter sigma controls long-range dependencies. This provides theoretical grounding for the benchmark's ability to evaluate long-range capabilities.