Graph Random Features for Scalable Gaussian Processes

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: kernels, graphs, Gaussian processes, Monte Carlo, inference
Abstract:

We study the application of graph random features (GRFs), a recently introduced stochastic estimator of graph node kernels, to scalable Gaussian processes on discrete input spaces. We prove that (under mild assumptions) Bayesian inference with GRFs enjoys O(N^(3/2)) time complexity with respect to the number of nodes N, with probabilistic accuracy guarantees. In contrast, exact kernels generally incur O(N^3). Wall-clock speedups and memory savings unlock Bayesian optimisation with over 1M graph nodes on a single computer chip, whilst preserving competitive performance.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper applies graph random features to scalable Gaussian processes on discrete input spaces, contributing theoretical guarantees and demonstrating large-scale Bayesian optimization. Within the taxonomy, it resides in the 'Graph Random Features for Discrete Spaces' leaf under 'Random Feature Methods for Kernel Approximation'. This leaf contains only two papers total, including the original work, indicating a relatively sparse and emerging research direction focused specifically on random feature techniques for graph-structured inputs.

The taxonomy reveals two main branches: random feature approximation methods and direct GP applications on graphs. The original paper's leaf sits alongside 'Variance Reduction via Optimal Transport Couplings', which addresses variance reduction in random features but not specifically for discrete spaces. The sibling branch 'Gaussian Process Applications on Graphs' contains application-driven work (online learning, conformal prediction, SLAM) that uses GPs on graphs without random feature approximation. The paper thus bridges kernel approximation theory with discrete optimization, occupying a niche distinct from both variance reduction techniques and direct graph GP applications.

Of the thirty candidates examined (ten per contribution), the first contribution (applying graph random features to scalable GPs) yielded one refutable candidate, suggesting some prior work exists in this direction. The second contribution (the theoretical O(N^(3/2)) complexity analysis) yielded none, indicating potential novelty in the complexity guarantees. The third contribution (scalable Bayesian optimisation on massive graphs) likewise yielded none, suggesting this application scale may be new. The limited search scope means these findings reflect the top thirty semantic matches rather than exhaustive coverage.

Based on the top-thirty semantic search results and taxonomy structure, the work appears to advance a sparse research direction with modest prior overlap in its core application but potentially novel theoretical and scale contributions. The analysis covers semantically similar papers but does not claim exhaustive field coverage, particularly for work outside the random feature and graph GP intersection.

Taxonomy

Core-task Taxonomy Papers: 7
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: Scalable Gaussian processes on discrete input spaces using graph random features. The field addresses the challenge of applying Gaussian process (GP) models to large-scale problems where inputs are discrete or graph-structured, rather than continuous vectors.

The taxonomy reveals two main branches: one focused on Random Feature Methods for Kernel Approximation, which develops techniques to approximate expensive kernel computations through sampling strategies, and another on Gaussian Process Applications on Graphs, which adapts GP machinery to graph-based domains such as robotics and network analysis. Within the first branch, works like Variance Reducing Couplings[1] and Optimal Transport Couplings[5] refine the quality of random feature approximations, while General Graph Random Features[4] and Egonet Features[6] extend these ideas to graph-structured inputs. The second branch includes application-driven studies such as GP-SLAM[3] for simultaneous localization and mapping, and methodological contributions like Conformalized Gaussian Processes[2] and Ensemble Gaussian Processes[7] that enhance uncertainty quantification and scalability.

A particularly active line of work centers on designing random features that respect the combinatorial structure of discrete or graph inputs, balancing approximation fidelity with computational efficiency. Graph Random Features[0] sits squarely within this cluster, building on the foundation laid by General Graph Random Features[4] but emphasizing scalability for discrete spaces through novel sampling schemes. Compared to Variance Reducing Couplings[1], which focuses on variance reduction in continuous settings, Graph Random Features[0] tailors its approach to the unique geometry of graphs. Meanwhile, application-oriented works like GP-SLAM[3] demonstrate the practical payoff of scalable GP methods, though they typically operate in continuous spatial domains rather than purely discrete structures.
The original paper thus occupies a niche at the intersection of kernel approximation theory and discrete optimization, addressing a gap where classical random feature methods meet graph-based inference.

Claimed Contributions

Application of graph random features to scalable Gaussian processes

The authors apply graph random features, a Monte Carlo estimator based on random walks, to construct sparse estimates of learnable graph node kernels for use as covariance functions in Gaussian processes on discrete input spaces.
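To make the mechanism concrete, here is a minimal pure-Python sketch of a random-walk kernel estimator in the GRF style. It assumes the target is a resolvent-type kernel built from the walk series sum_k (alpha*A)^k, with geometric walk termination and importance weights; the paper's exact construction, weighting, and termination scheme may differ. All names here (`grf_features`, `kernel_estimate`, `p_halt`) are illustrative, not the authors' API.

```python
import random

def grf_features(adj, alpha=0.2, n_walks=200, p_halt=0.5, seed=0):
    """Sketch of graph random features via terminating random walks.

    adj: adjacency lists, adj[i] = list of neighbours of node i.
    Each node i gets a SPARSE feature vector phi(i) (dict node -> weight)
    whose expectation is row i of sum_k (alpha*A)^k; dot products of such
    features then estimate entries of a squared-resolvent-type kernel
    (hedged: this is one standard GRF variant, not necessarily the
    paper's exact estimator)."""
    rng = random.Random(seed)
    feats = []
    for i in range(len(adj)):
        phi = {}
        for _ in range(n_walks):
            node, load = i, 1.0
            phi[node] = phi.get(node, 0.0) + load / n_walks
            # continue the walk with probability (1 - p_halt)
            while adj[node] and rng.random() > p_halt:
                deg = len(adj[node])
                node = rng.choice(adj[node])
                # importance weight corrects for uniform neighbour
                # choice (factor deg) and early halting (1 - p_halt)
                load *= alpha * deg / (1.0 - p_halt)
                phi[node] = phi.get(node, 0.0) + load / n_walks
        feats.append(phi)
    return feats

def kernel_estimate(phi_i, phi_j):
    """Estimated kernel entry: sparse dot product <phi(i), phi(j)>."""
    return sum(w * phi_j.get(k, 0.0) for k, w in phi_i.items())
```

The key scalability point is that each phi(i) touches only the nodes its walks visit, so the feature matrix is sparse and kernel entries never require materialising the dense N x N matrix.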

10 retrieved papers; 1 can refute
Theoretical analysis with O(N^(3/2)) time complexity guarantees

The authors provide theoretical proofs demonstrating that Bayesian inference using graph random features achieves O(N^(3/2)) time complexity compared to O(N^3) for exact methods, with probabilistic guarantees on approximation quality.
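One plausible accounting for the subquadratic cost (a hedged reconstruction, not the paper's proof): if each GRF feature vector has roughly O(N^(1/2)) non-zero entries, the feature matrix is sparse enough that matrix-vector products, the workhorse of iterative GP solvers, cost O(N^(3/2)).

```latex
% Hedged sketch: one accounting that yields O(N^{3/2});
% the paper's assumptions and proof may differ.
K \;\approx\; \widehat{K} \;=\; \Phi\,\Phi^{\top},
\qquad \Phi \in \mathbb{R}^{N \times N},\qquad
\mathrm{nnz}(\Phi) \;=\; \mathcal{O}\!\left(N^{3/2}\right),
```

so each product $\Phi v$ or $\Phi^{\top} v$ costs $\mathcal{O}(\mathrm{nnz}(\Phi)) = \mathcal{O}(N^{3/2})$. Under the additional (strong) assumption that an iterative solver for $(\widehat{K} + \sigma^2 I)^{-1}\mathbf{y}$ converges in a constant number of iterations, inference runs in $\mathcal{O}(N^{3/2})$, versus $\mathcal{O}(N^2)$ per dense matrix-vector product and $\mathcal{O}(N^3)$ for exact Cholesky factorisation.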

10 retrieved papers; 0 can refute
Scalable Bayesian optimisation on massive graphs

The authors demonstrate practical scalability by implementing Bayesian optimisation with Thompson sampling on graphs containing over one million nodes using a single GPU, showcasing the effectiveness of their GRF-based approach.
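The loop structure of Thompson sampling over graph nodes can be sketched as follows. This toy uses an independent Gaussian belief per node as a stand-in for the paper's GRF-based GP posterior (a deliberate simplification: it drops the covariance between nodes that the GP surrogate provides), and all names (`thompson_sampling_on_graph`, `noise`) are illustrative.

```python
import math
import random

def thompson_sampling_on_graph(objective, n_nodes, n_iters=50,
                               noise=0.1, seed=0):
    """Toy Thompson-sampling loop over the nodes of a graph.

    Each round: draw one sample per node from its posterior belief,
    query the objective at the argmax node, then apply a conjugate
    Gaussian update to that node's belief. Returns the best
    (node, observed value) pair seen."""
    rng = random.Random(seed)
    mean = [0.0] * n_nodes   # posterior means
    var = [1.0] * n_nodes    # posterior variances
    best = None
    for _ in range(n_iters):
        # sample a realisation of the surrogate, pick its maximiser
        sample = [rng.gauss(mean[i], math.sqrt(var[i]))
                  for i in range(n_nodes)]
        i = max(range(n_nodes), key=sample.__getitem__)
        y = objective(i) + rng.gauss(0.0, noise)
        # conjugate update of node i's Gaussian belief
        prec = 1.0 / var[i] + 1.0 / noise**2
        mean[i] = (mean[i] / var[i] + y / noise**2) / prec
        var[i] = 1.0 / prec
        if best is None or y > best[1]:
            best = (i, y)
    return best
```

With a GRF surrogate, the per-round posterior sample would instead be drawn jointly via the sparse feature matrix, which is what keeps the loop tractable at the million-node scale the paper reports.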

10 retrieved papers; 0 can refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Application of graph random features to scalable Gaussian processes

Contribution 2: Theoretical analysis with O(N^(3/2)) time complexity guarantees

Contribution 3: Scalable Bayesian optimisation on massive graphs

(Descriptions of each contribution are given under Claimed Contributions above.)