Abstract:

Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks) without a direct measurement of long-range dependency. In this work, we introduce City-Networks\texttt{City-Networks}, a novel large-scale transductive learning dataset derived from real-world city road networks. This dataset features graphs with over 10510^5 nodes and significantly larger diameters than those in existing benchmarks, naturally embodying long-range information. We annotate the graphs based on local node eccentricities, ensuring that the classification task inherently requires information from distant nodes. Furthermore, we propose a generic measurement based on the Jacobians of neighbors from distant hops, offering a principled quantification of long-range dependencies. Finally, we provide theoretical justifications for both our dataset design and the proposed measurement—particularly by focusing on over-smoothing and influence score dilution—which establishes a robust foundation for further exploration of long-range interactions in graph neural networks.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces City-Networks, a large-scale transductive dataset derived from real-world road networks, and proposes a Jacobian-based metric to quantify long-range dependencies in graph neural networks. It resides in the 'Formal Characterization and Metrics' leaf under 'Theoretical Foundations and Measurement', sharing this space with only one sibling paper. This leaf is notably sparse, containing just two works focused on principled measurement of long-range interactions, suggesting the paper enters a relatively underpopulated research direction within the broader taxonomy of fifty papers across thirty-six topics.

The taxonomy reveals that most activity concentrates in 'Architecture Design and Enhancement' (nine subcategories, twenty-one papers) and 'Domain Applications' (eight subcategories, twenty-four papers), while 'Theoretical Foundations and Measurement' contains only two subcategories with three papers total. The sibling work 'Random Walks Dependencies' also develops formal metrics, but the taxonomy's scope notes clarify that this leaf excludes empirical benchmarking without theoretical grounding and architectural solutions. Neighboring leaves like 'Fundamental Limitations Analysis' examine over-smoothing and over-squashing, while 'Benchmark Datasets and Tasks' focuses on curated evaluation protocols—both adjacent but distinct from the formal metric development emphasized here.

Among thirty candidates examined via semantic search and citation expansion, none were found to clearly refute any of the three contributions. For the City-Networks dataset contribution, ten candidates were reviewed with zero refutable overlaps; the Jacobian-based measurement similarly examined ten candidates with no prior work providing the same formulation; and the theoretical justifications reviewed ten candidates without encountering overlapping analyses. This suggests that within the limited search scope, the specific combination of large-scale transductive road network data, eccentricity-based annotation, and Jacobian-derived metrics appears distinct from existing benchmarking and measurement efforts.

The analysis covers top-thirty semantic matches and does not constitute an exhaustive literature review. While the sparse taxonomy leaf and absence of refutable candidates within this scope indicate novelty in formal measurement design, the broader field's emphasis on architectural solutions and domain applications means the practical impact of these metrics will depend on adoption by the architecture-focused community. The theoretical grounding and dataset scale appear substantive, though the limited search cannot rule out related work in adjacent measurement frameworks or alternative benchmark designs.

Taxonomy

Core-task Taxonomy Papers
50
3
Claimed Contributions
30
Contribution Candidate Papers Compared
0
Refutable Paper

Research Landscape Overview

Core task: Quantifying long-range dependencies in graph neural networks. The field has organized itself around several complementary directions. Theoretical Foundations and Measurement focuses on formal characterization and metrics to rigorously assess how well GNNs capture distant node interactions, with works like Measuring Long-Range GNN[2] and Random Walks Dependencies[1] developing principled ways to evaluate dependency propagation. Architecture Design and Enhancement explores novel propagation schemes and attention mechanisms—such as Adaptive Propagation GNN[3], Global Attention Context[4], and Implicit GNN[5]—that extend the receptive field beyond standard message-passing limits. Sequential and Temporal Modeling addresses dynamic graphs and time-evolving structures, while Benchmarking and Evaluation provides standardized datasets and protocols like GLoRa Benchmark[27] to compare methods. Domain Applications demonstrate the practical impact of long-range modeling in areas ranging from traffic forecasting (PDFormer Traffic[18], T-RippleGNN Traffic[28]) to protein structure learning (Protein Structure Learning[8]) and code analysis (Binary Code Similarity[6]). A particularly active line of work examines the trade-offs between computational efficiency and expressive power when modeling distant dependencies. Some approaches leverage spectral methods or higher-order propagation (ChebNet Long Range[33], Predict Propagate PageRank[34]), while others incorporate global context via transformers (Brain Graph Transformer[9]) or state-space models (Graph Mamba[7], DG-Mamba[43]). Long-Range Graph Dataset[0] sits squarely within the Theoretical Foundations branch alongside Measuring Long-Range GNN[2], contributing formal metrics and benchmark datasets to quantify dependency reach. Compared to Measuring Long-Range GNN[2], which emphasizes diagnostic tools for existing architectures, Long-Range Graph Dataset[0] appears to focus on establishing standardized evaluation protocols that can guide future architectural innovations. This positioning bridges the gap between rigorous measurement and the practical design challenges addressed by works like Adaptive Propagation GNN[3], helping the community systematically assess whether new methods genuinely improve long-range reasoning or merely shift computational bottlenecks.

Claimed Contributions

City-Networks: a large-scale transductive learning dataset with long-range dependencies

The authors propose a new dataset consisting of four large-scale city road networks (Paris, Shanghai, Los Angeles, London) with over 100,000 nodes and diameters up to 400. The dataset is annotated based on local node eccentricities to create a classification task that inherently requires information from distant nodes, addressing the gap in existing benchmarks that focus on small graphs for inductive tasks.

10 retrieved papers
A generic Jacobian-based measurement for quantifying long-range dependencies

The authors introduce a measurement that quantifies per-hop influence of a focal node's neighbors on its prediction by computing the aggregated l1-norm of the Jacobian from a trained model at each hop. This provides a principled way to measure and compare the level of long-rangeness across different datasets and models, going beyond empirical performance comparisons.

10 retrieved papers
Theoretical justifications for dataset design and measurement based on over-smoothing and influence score dilution

The authors establish theoretical foundations by linking the rate of over-smoothing to algebraic connectivity and network diameter, showing that graphs with large diameters and low maximum degrees are more resilient to over-smoothing. They also analyze influence score dilution in grid-like graphs to justify their choice of aggregating influence scores using summation rather than mean.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

City-Networks: a large-scale transductive learning dataset with long-range dependencies

The authors propose a new dataset consisting of four large-scale city road networks (Paris, Shanghai, Los Angeles, London) with over 100,000 nodes and diameters up to 400. The dataset is annotated based on local node eccentricities to create a classification task that inherently requires information from distant nodes, addressing the gap in existing benchmarks that focus on small graphs for inductive tasks.

Contribution

A generic Jacobian-based measurement for quantifying long-range dependencies

The authors introduce a measurement that quantifies per-hop influence of a focal node's neighbors on its prediction by computing the aggregated l1-norm of the Jacobian from a trained model at each hop. This provides a principled way to measure and compare the level of long-rangeness across different datasets and models, going beyond empirical performance comparisons.

Contribution

Theoretical justifications for dataset design and measurement based on over-smoothing and influence score dilution

The authors establish theoretical foundations by linking the rate of over-smoothing to algebraic connectivity and network diameter, showing that graphs with large diameters and low maximum degrees are more resilient to over-smoothing. They also analyze influence score dilution in grid-like graphs to justify their choice of aggregating influence scores using summation rather than mean.