Boolean Satisfiability via Imitation Learning

ICLR 2026 Conference SubmissionAnonymous Authors
Boolean SatisfiabilityImitation LearningAutoregressive ModelingBranching Heuristics
Abstract:

We propose ImitSAT, a branching policy for conflict-driven clause learning (CDCL) solvers based on imitation learning for the Boolean satisfiability problem (SAT). Unlike previous methods that predict instance-level signals to improve CDCL branching indirectly, or rely on reinforcement learning and insufficient CDCL information to enhance branching, ImitSAT learns from expert KeyTrace that collapses a full run into the sequence of surviving decisions. Replaying a KeyTrace on the same instance is nearly conflict-free, providing dense decision- level supervision and directly reducing propagations—the dominant contributor to wall-clock time. This prefix-conditioned supervision enables ImitSAT to reproduce high-quality branches without exploration, yielding faster convergence, stable training, and seamless integration into CDCL. Extensive experiments demonstrate that ImitSAT reduces propagation counts and runtime, outperforming state-of-the-art learned approaches. We will release code, trained model, and CDCL integration.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

ImitSAT proposes an imitation learning approach to CDCL branching that learns from expert KeyTrace demonstrations—compact sequences of surviving decisions extracted from full solver runs. The paper resides in the 'Imitation Learning and Supervised Approaches' leaf under 'Machine Learning-Based Branching Heuristics', where it appears as the sole occupant among the 50 papers surveyed. This isolation suggests the leaf represents an emerging or underexplored direction within the broader landscape of learned branching policies, contrasting with more populated sibling categories like 'Graph Neural Network-Based RL Branching' containing four papers.

The taxonomy reveals substantial activity in neighboring reinforcement learning branches, particularly GNN-based RL methods that learn through trial-and-error exploration rather than expert demonstration. ImitSAT diverges from these by avoiding exploration entirely, instead extracting supervision directly from CDCL traces. The broader 'Machine Learning-Based Branching Heuristics' category also includes 'Neural Heuristics Without Explicit RL' (three papers) and 'Machine Learning for Solver Component Selection' (two papers), indicating that while neural approaches to branching are well-represented, the specific combination of imitation learning with decision-level supervision from KeyTrace appears less crowded.

Among the nineteen candidates examined through limited semantic search, none clearly refute any of the three core contributions. The 'ImitSAT branching policy' contribution examined three candidates with zero refutations, 'KeyTrace extraction' examined ten candidates with zero refutations, and 'autoregressive sequence modeling formulation' examined six candidates with zero refutations. This absence of overlapping prior work across all contributions suggests that within the examined scope, the combination of KeyTrace-based supervision, prefix-conditioned autoregressive modeling, and direct propagation reduction represents a distinct approach not previously documented in the top-nineteen semantic matches.

The analysis reflects a focused literature search rather than exhaustive coverage, examining nineteen candidates from semantic similarity and citation expansion. While no refutations emerged within this scope, the singleton status in the taxonomy leaf and the limited search scale mean the assessment captures novelty relative to closely related work but cannot rule out relevant contributions outside the examined set. The approach's distinctiveness appears strongest in its KeyTrace formulation and decision-level supervision mechanism.

Taxonomy

Core-task Taxonomy Papers
50
3
Claimed Contributions
19
Contribution Candidate Papers Compared
0
Refutable Paper

Research Landscape Overview

Core task: learning branching policies for Boolean satisfiability solvers. The field of SAT solving has evolved into a rich landscape where the choice of which variable to branch on next can dramatically affect solver performance. The taxonomy reflects this diversity through eight major branches. Machine Learning-Based Branching Heuristics encompasses supervised, reinforcement learning, and neural approaches that learn from solver traces or problem structure, exemplified by works like Neural SAT Heuristics[3] and NeuroSelect[29]. Hand-Designed Branching Heuristics captures classical strategies such as VSIDS and its variants, including Understanding VSIDS[20] and Exponential Recency Weighted[43]. Phase Selection and Polarity Heuristics addresses the complementary decision of assigning truth values, with studies like Dynamic Phase Selection[4] and Polarity Variable Selection[23]. Non-Branching Solver Components explores clause learning and conflict analysis mechanisms, while Solver Optimization and Meta-Level Approaches includes parameter tuning and restart policies. Theoretical Foundations and Benchmarking provides the analytical underpinnings, and Specialized SAT Applications extends these techniques to domains like planning and model counting. Probability-Based and Stochastic Heuristics rounds out the taxonomy with randomized decision strategies. Recent activity has concentrated on bridging classical heuristics with modern machine learning. A particularly active line explores imitation learning and supervised methods that distill expert solver behavior into neural policies, contrasting with reinforcement learning approaches like Q-Learning Graph Networks[19] that discover strategies through trial and error. Imitation Learning SAT[0] sits squarely within the supervised branch, learning branching decisions by mimicking strong hand-designed heuristics rather than exploring the search space independently. This positions it alongside Neural SAT Heuristics[3], which similarly leverages neural architectures for variable selection, but differs in emphasis: while Neural SAT Heuristics[3] may focus on graph-based representations of clause structure, Imitation Learning SAT[0] prioritizes direct policy extraction from expert demonstrations. The central trade-off across these branches remains whether to invest in hand-tuned features and classical insights or to rely on data-driven learning to capture problem-specific patterns, with hybrid approaches like AutoSAT[21] attempting to combine both worlds.

Claimed Contributions

ImitSAT: Imitation learning-based branching policy for CDCL solvers

The authors introduce ImitSAT, the first branching policy for CDCL solvers that uses imitation learning instead of reinforcement learning. It learns from expert KeyTrace demonstrations to predict branching decisions, providing dense decision-level supervision and directly reducing propagations.

3 retrieved papers
KeyTrace: Compact expert trace extraction from CDCL runs

The authors develop a method to collapse full CDCL solver runs into KeyTrace sequences that contain only surviving decisions by systematically removing backtracked detours. These sequences serve as clean, conflict-free training targets that align naturally with prefix-conditioned autoregressive modeling.

10 retrieved papers
Autoregressive sequence modeling formulation for CDCL branching

The authors formulate CDCL branching as a prefix-conditioned autoregressive sequence prediction problem. They serialize the CNF formula and KeyTrace prefix into a single sequence and train a Transformer-based model to predict the next branching decision autoregressively.

6 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ImitSAT: Imitation learning-based branching policy for CDCL solvers

The authors introduce ImitSAT, the first branching policy for CDCL solvers that uses imitation learning instead of reinforcement learning. It learns from expert KeyTrace demonstrations to predict branching decisions, providing dense decision-level supervision and directly reducing propagations.

Contribution

KeyTrace: Compact expert trace extraction from CDCL runs

The authors develop a method to collapse full CDCL solver runs into KeyTrace sequences that contain only surviving decisions by systematically removing backtracked detours. These sequences serve as clean, conflict-free training targets that align naturally with prefix-conditioned autoregressive modeling.

Contribution

Autoregressive sequence modeling formulation for CDCL branching

The authors formulate CDCL branching as a prefix-conditioned autoregressive sequence prediction problem. They serialize the CNF formula and KeyTrace prefix into a single sequence and train a Transformer-based model to predict the next branching decision autoregressively.