Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs

ICLR 2026 Conference Submission
Anonymous Authors
Reinforcement Learning, Graph Neural Networks, Combinatorial Optimization, SAT Solving, GNNs, Graph Learning, GRPO
Abstract:

Boolean Satisfiability (SAT) solvers are foundational to computer science, yet their performance typically hinges on hand-crafted heuristics. This work introduces Reinforcement Learning from Algorithm Feedback (RLAF) as a paradigm for learning to guide SAT solver branching heuristics with Graph Neural Networks (GNNs). Central to our approach is a novel and generic mechanism for injecting inferred variable weights and polarities into the branching heuristics of existing SAT solvers. In a single forward pass, a GNN assigns these parameters to all variables. Casting this one-shot guidance as a reinforcement learning problem lets us train the GNN with off-the-shelf policy-gradient methods, such as GRPO, directly using the solver's computational cost as the sole reward signal. Extensive evaluations demonstrate that RLAF-trained policies significantly reduce the mean solve times of different base solvers across diverse SAT problem distributions, achieving more than a 2x speedup in some cases, while generalizing effectively to larger and harder problems after training. Notably, these policies consistently outperform expert-supervised approaches based on learning handcrafted weighting heuristics, offering a promising path towards data-driven heuristic design in combinatorial optimization.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Reinforcement Learning from Algorithm Feedback (RLAF) for training GNN-based branching policies in SAT solvers, using solver runtime as the reward signal and policy-gradient methods like GRPO. It resides in the 'Reinforcement Learning-Based CDCL Branching' leaf, which contains only three papers total (including this work and two siblings). This is a relatively sparse research direction within the broader taxonomy of 17 papers across multiple branches, suggesting the specific combination of RL training and CDCL integration remains an emerging area rather than a crowded subfield.

The taxonomy reveals several neighboring directions: supervised learning-based CDCL branching (one paper mimicking expert heuristics), domain-specific CDCL methods (one paper on logic equivalence checking), and local search GNN branching (one paper on stochastic algorithms). The paper's focus on RL-driven policy learning distinguishes it from supervised approaches that rely on labeled expert demonstrations. Broader branches address non-SAT combinatorial problems (graph optimization, neural network verification, constraint programming) and theoretical foundations (expressive power, multi-task learning), indicating the field spans both application-specific solver development and methodological inquiry into GNN capabilities.

Across the 28 candidates examined, the RLAF paradigm contribution has 3 refutable candidates out of 10, while the generic weight-injection mechanism has 2 refutable candidates out of 10. The one-shot GNN policy contribution appears more novel, with 0 refutable candidates among 8 examined. These statistics reflect a limited semantic search scope, not exhaustive coverage. The presence of some overlapping prior work on RL-based branching and weight parameterization suggests incremental refinement of existing ideas, though the specific integration of GRPO and one-shot variable parameterization may offer distinguishing technical details.

Based on the top-28 semantic matches and taxonomy structure, the work appears to advance an active but not densely populated research direction. The limited number of sibling papers and the sparse RL-based CDCL branch indicate room for methodological contributions, though the refutable candidates for two contributions signal that core ideas around RL training and weight injection have precedents. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond this scope.

Taxonomy

Core-task Taxonomy Papers: 17
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 5

Research Landscape Overview

Core task: Learning to guide SAT solver branching heuristics with graph neural networks. The field has evolved around several complementary directions. The main branches include GNN-based branching heuristics specifically designed for SAT solvers, extensions to non-SAT combinatorial problems such as mixed-integer programming and constraint satisfaction, and foundational work on theoretical expressiveness and methodological principles. Within the SAT-focused branch, a key distinction emerges between approaches that integrate GNNs directly into conflict-driven clause learning (CDCL) solvers and those that explore alternative architectures or extended formalisms like quantified Boolean formulas.

Representative works such as GNN Reinforcement SAT[4] and Q-Learning Branching Heuristic[13] illustrate reinforcement learning strategies for CDCL integration, while studies like GNN Branching Capacity[3] and GNN Expressive Power[10] examine the representational limits of graph neural networks in capturing solver dynamics. Meanwhile, branches addressing non-SAT domains (e.g., Neural Branch and Bound[7], Targeted Branching MIS[6]) demonstrate how similar GNN principles transfer to broader combinatorial settings, and survey-oriented efforts (Data-Driven Combinatorial Solvers[15]) provide comparative perspectives across methods.

Particularly active lines of work center on reinforcement learning-based CDCL branching and the interplay between learned heuristics and traditional solver feedback. Algorithm Feedback GNN[0] sits within the reinforcement learning-based CDCL branching cluster, emphasizing how algorithm-level feedback can refine GNN policies during search. This contrasts with nearby approaches like GNN Reinforcement SAT[4], which pioneered RL integration but may rely on simpler reward signals, and Q-Learning Branching Heuristic[13], which explores tabular or value-based methods rather than policy gradients.
A recurring theme across these studies is the trade-off between sample efficiency and generalization: some methods prioritize rapid adaptation to specific problem families, while others aim for broader transferability. Open questions include how to best incorporate clause-learning dynamics into GNN architectures and whether hybrid strategies that blend learned and hand-crafted heuristics can outperform purely data-driven policies.

Claimed Contributions

Reinforcement Learning from Algorithm Feedback (RLAF) paradigm

The authors propose RLAF, a training paradigm that uses reinforcement learning to train GNN-based policies for guiding SAT solver branching heuristics. The approach uses the solver's computational cost as the sole reward signal, eliminating the need for expert supervision.
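The reward construction behind this claim can be illustrated with a minimal sketch. Assuming the reward is simply the negated solver cost (e.g. number of decisions or runtime) and that, as in GRPO, advantages are normalized within a group of rollouts on the same formula, the group update statistic reduces to the following. The function name and the choice of cost metric are illustrative, not the paper's exact interface:

```python
def group_advantages(costs):
    """Turn raw solver costs from one formula's rollout group into
    GRPO-style normalized advantages (lower cost -> higher advantage)."""
    rewards = [-c for c in costs]            # sole reward: negative solver cost
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0                  # guard against a zero-spread group
    return [(r - mean) / std for r in rewards]
```

Each sampled parameterization's log-probability would then be weighted by its advantage in a standard policy-gradient step; no expert labels enter anywhere, only the solver's own cost.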

10 retrieved papers
Can Refute
Generic mechanism for injecting variable weights into branching heuristics

The authors introduce a method that modifies existing SAT solver branching heuristics by incorporating external variable-wise weights and polarities. This mechanism scales the solver's original scoring function without sacrificing its inherent heuristic properties.
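A minimal sketch of this injection idea, assuming a VSIDS-style activity score: the external weight multiplies the solver's native score, and the learned polarity fixes the phase of the chosen variable. All names (`activity`, `weights`, `polarities`) are hypothetical stand-ins, not the paper's actual solver interface:

```python
def pick_branching_literal(activity, weights, polarities, unassigned):
    """Choose the next decision literal under external guidance.

    activity:   dict var -> solver's native branching score (e.g. VSIDS)
    weights:    dict var -> GNN-inferred multiplicative weight (> 0)
    polarities: dict var -> GNN-inferred preferred sign (+1 or -1)
    unassigned: iterable of currently unassigned variables
    """
    # Multiplicative scaling preserves the native heuristic's dynamics:
    # among equally weighted variables, the solver's own ordering survives.
    best = max(unassigned, key=lambda v: weights[v] * activity[v])
    return best * polarities[best]  # signed literal: variable with chosen phase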

10 retrieved papers
Can Refute
One-shot GNN-based policy for variable parameterization

The authors develop a GNN-based policy that assigns weights and polarities to all variables in a single forward pass, avoiding costly repeated network evaluations. This one-shot formulation enables efficient training using standard policy-gradient methods like GRPO.
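The one-shot formulation can be sketched as follows. Here the single GNN forward pass is faked as a list of per-variable distribution parameters; the distribution families (log-normal weights, Bernoulli polarities) are illustrative assumptions, chosen only to show how one pass yields both a full parameterization and the joint log-probability a policy-gradient method needs:

```python
import math
import random

def sample_assignment(gnn_out, rng):
    """gnn_out: list of (mu, sigma, p_pos) per variable, from ONE forward pass.
    Returns (weights, polarities, log_prob of the sampled assignment)."""
    weights, polarities, log_prob = [], [], 0.0
    for mu, sigma, p_pos in gnn_out:
        z = rng.gauss(mu, sigma)             # weight = exp(z) keeps weights positive
        weights.append(math.exp(z))
        log_prob += -0.5 * math.log(2 * math.pi * sigma**2) - (z - mu)**2 / (2 * sigma**2)
        pol = 1 if rng.random() < p_pos else -1
        polarities.append(pol)
        log_prob += math.log(p_pos if pol == 1 else 1.0 - p_pos)
    return weights, polarities, log_prob
```

Because sampling and scoring happen once per formula rather than once per branching decision, the network cost is amortized over the entire solver run.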

8 retrieved papers
Cannot Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Reinforcement Learning from Algorithm Feedback (RLAF) paradigm


Contribution

Generic mechanism for injecting variable weights into branching heuristics


Contribution

One-shot GNN-based policy for variable parameterization
