LogicSR: A Unified Benchmark for Logical Discovery from Data

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Symbolic Regression, Logical Reasoning, Neuro-symbolic Learning, Benchmark Dataset, Boolean Expressions
Abstract:

Discovering underlying logical expressions from data is a critical task for interpretable AI and scientific discovery, yet it remains poorly served by existing research infrastructure. The field of Symbolic Regression (SR) focuses primarily on continuous mathematical functions, while Logic Synthesis (LS) is designed for exact, noise-free specifications rather than learning from incomplete or noisy data. This leaves a crucial gap for evaluating algorithms that learn generalizable logical rules in realistic scenarios. To address this, we introduce LogicSR, a large-scale and comprehensive benchmark for logical symbolic regression. LogicSR is built from two sources: real-world problems from digital circuits and biological networks, and a novel synthetic data generator capable of producing a diverse set of complex logical formulas at scale. We use LogicSR to conduct a rigorous evaluation of 17 algorithms, spanning classical logic solvers, modern machine learning models, and Large Language Models (LLMs). Our findings reveal that the logical modeling capabilities and generalization robustness of these algorithms depend strongly on task scale and logical complexity, with even state-of-the-art LLMs showing limited ability at complex logical reasoning. LogicSR provides a robust foundation to benchmark progress, unify evaluation across disparate fields, and steer the future development of powerful neuro-symbolic systems.
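To make the task concrete: logical symbolic regression asks for a Boolean expression that best explains sampled input-output data, even when some labels are corrupted by noise. The following minimal sketch is purely illustrative and is not the paper's method; the candidate operator set and the brute-force agreement-maximizing search are assumptions chosen for brevity.

```python
from itertools import product

# Illustrative candidate expressions over two Boolean inputs (assumed operator set).
CANDIDATES = {
    "a AND b": lambda a, b: a and b,
    "a OR b":  lambda a, b: a or b,
    "a XOR b": lambda a, b: a != b,
    "NOT a":   lambda a, b: not a,
}

def fit_logical_expression(samples):
    """Return the candidate that agrees with the most (input, label) samples.
    Under label noise, exact SAT-style matching fails, but agreement
    maximization can still recover the underlying rule."""
    def accuracy(fn):
        return sum(fn(a, b) == y for (a, b), y in samples) / len(samples)
    return max(CANDIDATES, key=lambda name: accuracy(CANDIDATES[name]))
```

For example, given twelve samples of `a XOR b` with one flipped label, the search still selects `"a XOR b"` (11/12 agreement), whereas an exact-specification approach would reject every candidate.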

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LogicSR, a benchmark for learning logical expressions from noisy, incomplete data—a task distinct from continuous symbolic regression and exact logic synthesis. Within the taxonomy, it resides in 'Specialized Domain Applications' alongside four sibling papers addressing biological networks, water distribution systems, and circuit design. This leaf represents a relatively sparse research direction (five papers total) focused on applying logical discovery to concrete engineering and scientific domains, suggesting the work targets an underserved niche rather than a crowded methodological space.

The taxonomy reveals that most logical expression discovery research concentrates in two neighboring branches: 'Symbolic Regression and Formula Discovery' (ten papers across three sub-areas) and 'Logical Rule Learning and Reasoning' (nineteen papers spanning knowledge graphs, temporal logic, and neuro-symbolic integration). LogicSR bridges these areas by addressing logical (not continuous) formulas while handling noisy data (unlike exact logic synthesis). The benchmark's dual focus on real-world circuits/biological networks and synthetic generation distinguishes it from purely domain-specific methods in its leaf and from general-purpose symbolic regression approaches that lack logical structure.

Among thirty candidates examined, none clearly refute the three core contributions. The LogicSR benchmark itself (ten candidates, zero refutations) appears novel as a dedicated evaluation framework for logical symbolic regression under noise. The synthetic data generator (ten candidates, zero refutations) shows no direct prior work in the limited search scope, though the analysis does not cover exhaustive generation literature. The cross-domain evaluation of seventeen algorithms (ten candidates, zero refutations) represents a substantial empirical effort, with no overlapping multi-algorithm comparisons identified in the examined papers. These statistics suggest originality within the search scope, though the limited candidate pool (thirty total) means undiscovered prior work remains possible.

Based on the top-thirty semantic matches and taxonomy structure, the work addresses a genuine gap between continuous symbolic regression and exact logic synthesis. The benchmark's combination of real-world and synthetic logical tasks, evaluated across classical solvers, ML models, and LLMs, appears distinctive within the examined literature. However, the analysis covers a narrow slice of potential prior work—broader searches in logic synthesis, program synthesis, or SAT-based learning communities might reveal additional relevant baselines or evaluation frameworks not captured here.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 0

Research Landscape Overview

Core task: Discovering logical expressions from data. The field encompasses a diverse set of approaches organized into four main branches. Symbolic Regression and Formula Discovery focuses on extracting mathematical and symbolic models directly from observations, often leveraging evolutionary algorithms or neural architectures to uncover closed-form equations. Logical Rule Learning and Reasoning emphasizes the induction of interpretable logical rules—ranging from propositional to first-order logic—that capture underlying patterns in structured or relational data. Deep Learning for Logical Tasks explores how neural networks can be trained to perform or assist in logical inference, blending differentiable modules with symbolic reasoning. Finally, Applications and Domain-Specific Methods tailors these techniques to specialized domains such as temporal logic synthesis, biological systems, water network modeling, and software verification, demonstrating how domain constraints guide the discovery process.

Recent work highlights contrasting trade-offs between interpretability and expressiveness. Some studies pursue purely symbolic outputs for transparency, while others integrate neural components to handle noisy or high-dimensional data at the cost of reduced clarity. Within the Applications and Domain-Specific Methods branch, LogicSR[0] sits alongside efforts like Logical Circuits Fungi[22] and Symbolic Water Networks[36], which apply logical or symbolic discovery to concrete engineering and biological problems. Compared to Mining Logical Arithmetic[12], which targets arithmetic rule extraction, LogicSR[0] emphasizes a broader logical framework suitable for diverse application contexts. Meanwhile, SwarmFlawFinder[50] illustrates how domain-specific heuristics can guide search in software analysis.
Overall, LogicSR[0] occupies a niche where domain expertise and logical structure converge, bridging general symbolic regression methods with the practical demands of specialized fields.

Claimed Contributions

LogicSR benchmark for logical symbolic regression

The authors present LogicSR, a unified benchmark designed to evaluate algorithms that discover logical expressions from data. It combines real-world problems from digital circuits and biological networks with a novel synthetic data generator, addressing the gap between continuous symbolic regression and exact logic synthesis.

10 retrieved papers
Novel synthetic data generation algorithm

The authors develop a two-stage synthesis process for generating large-scale, complex, and structurally diverse ground-truth logic networks. This algorithm uses truth table analysis, structured sampling, and graph-based composition to produce diverse, non-redundant logical formulas at scale.

10 retrieved papers
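The two-stage process described above is abstract; a hypothetical sketch of such a pipeline (stage 1: random gate-DAG composition; stage 2: truth-table-based filtering for non-redundancy) might look as follows. Every function name and design choice here is an illustrative assumption, not the authors' actual algorithm.

```python
import itertools
import random

OPS = {
    "AND": lambda a, b: a and b,
    "OR":  lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}

def random_logic_dag(n_vars, n_gates, rng):
    """Stage 1 (sketch): compose a random DAG of binary gates over the inputs.
    Each gate draws its fan-in from earlier nodes, guaranteeing acyclicity."""
    nodes = [("VAR", i) for i in range(n_vars)]
    for _ in range(n_gates):
        op = rng.choice(list(OPS))
        a, b = rng.sample(range(len(nodes)), 2)
        nodes.append((op, a, b))
    return nodes  # last node is the output

def truth_table(nodes, n_vars):
    """Stage 2 (sketch): evaluate the DAG on all 2^n assignments so that
    semantically equivalent formulas can be detected and filtered."""
    table = []
    for assign in itertools.product([False, True], repeat=n_vars):
        vals = []
        for node in nodes:
            if node[0] == "VAR":
                vals.append(assign[node[1]])
            else:
                op, a, b = node
                vals.append(OPS[op](vals[a], vals[b]))
        table.append(vals[-1])
    return tuple(table)

def generate_unique_formulas(n_vars, n_gates, count, seed=0):
    """Reject constant formulas and formulas whose truth table duplicates one
    already kept, so the generated set is diverse and non-redundant."""
    rng = random.Random(seed)
    seen, kept = set(), []
    while len(kept) < count:
        dag = random_logic_dag(n_vars, n_gates, rng)
        tt = truth_table(dag, n_vars)
        if tt in seen or len(set(tt)) == 1:
            continue
        seen.add(tt)
        kept.append(dag)
    return kept
```

Deduplicating on the full truth table (rather than on syntax) is what makes the output non-redundant in the semantic sense; for larger variable counts a real generator would need sampling-based equivalence checks instead of exhaustive enumeration.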
Comprehensive cross-domain evaluation of 17 algorithms

The authors conduct a rigorous evaluation of 17 algorithms spanning classical logic solvers, modern machine learning models, and large language models. The evaluation reveals capability boundaries and provides insights on scalability, noise robustness, and operator-set compatibility across current methods.

10 retrieved papers
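The noise-robustness dimension of such an evaluation can be sketched as a simple harness that trains at increasing label-noise levels and scores the learned rule against the clean target. This is an assumed illustration, not the paper's protocol; the baseline `lookup_learner` (a per-assignment majority vote) is a hypothetical stand-in for any of the 17 evaluated algorithms.

```python
import random
from collections import defaultdict

def noisy_samples(target_fn, n_vars, n, noise, rng):
    """Draw n random Boolean assignments; flip each label with probability `noise`."""
    data = []
    for _ in range(n):
        x = tuple(rng.random() < 0.5 for _ in range(n_vars))
        y = target_fn(*x)
        if rng.random() < noise:
            y = not y
        data.append((x, y))
    return data

def lookup_learner(train):
    """Baseline 'learner' (illustrative): majority vote per seen assignment."""
    votes = defaultdict(int)
    for x, y in train:
        votes[x] += 1 if y else -1
    return lambda x: votes[x] > 0

def noise_sweep(learner, target_fn, n_vars, levels, n=200, seed=0):
    """Train at each label-noise level, then score agreement with the clean
    target on a fresh noiseless test set."""
    rng = random.Random(seed)
    scores = {}
    for noise in levels:
        train = noisy_samples(target_fn, n_vars, n, noise, rng)
        learned = learner(train)
        test = noisy_samples(target_fn, n_vars, n, 0.0, rng)
        scores[noise] = sum(learned(x) == y for x, y in test) / n
    return scores
```

Sweeping `levels` while also varying the variable count and the allowed operator set would cover the scalability and operator-set-compatibility axes the evaluation reportedly probes.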

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
