LogicSR: A Unified Benchmark for Logical Discovery from Data

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Symbolic Regression, Logical Reasoning, Neuro-symbolic Learning, Benchmark Dataset, Boolean Expressions
Abstract:

Discovering underlying logical expressions from data is a critical task for interpretable AI and scientific discovery, yet it remains poorly served by existing research infrastructure. The field of Symbolic Regression (SR) focuses primarily on continuous mathematical functions, while Logic Synthesis (LS) is designed for exact, noise-free specifications rather than learning from incomplete or noisy data. This leaves a crucial gap for evaluating algorithms that learn generalizable logical rules in realistic scenarios. To address this, we introduce LogicSR, a large-scale and comprehensive benchmark for logical symbolic regression. LogicSR is built from two sources: real-world problems from digital circuits and biological networks, and a novel synthetic data generator capable of producing a diverse set of complex logical formulas at scale. We use LogicSR to conduct a rigorous evaluation of 17 algorithms, spanning classical logic solvers, modern machine learning models, and Large Language Models (LLMs). Our findings reveal that the logical modeling capabilities and generalization robustness of these algorithms depend strongly on task scale and logical complexity, with even state-of-the-art LLMs showing limited ability at complex logical reasoning. LogicSR provides a robust foundation to benchmark progress, unify evaluation across disparate fields, and steer the future development of powerful neuro-symbolic systems.
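To make the task concrete: logical symbolic regression asks for a Boolean expression that best explains sampled input-output data, even when some labels are corrupted by noise. The following minimal sketch is purely illustrative and is not the paper's method; the candidate operator set and the brute-force agreement-maximizing search are assumptions chosen for brevity.

```python
from itertools import product

# Illustrative candidate expressions over two Boolean inputs (assumed operator set).
CANDIDATES = {
    "a AND b": lambda a, b: a and b,
    "a OR b":  lambda a, b: a or b,
    "a XOR b": lambda a, b: a != b,
    "NOT a":   lambda a, b: not a,
}

def fit_logical_expression(samples):
    """Return the candidate that agrees with the most (input, label) samples.
    Under label noise, exact SAT-style matching fails, but agreement
    maximization can still recover the underlying rule."""
    def accuracy(fn):
        return sum(fn(a, b) == y for (a, b), y in samples) / len(samples)
    return max(CANDIDATES, key=lambda name: accuracy(CANDIDATES[name]))
```

For example, given twelve samples of `a XOR b` with one flipped label, the search still selects `"a XOR b"` (11/12 agreement), whereas an exact-specification approach would reject every candidate.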

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LogicSR, a benchmark for learning logical expressions from noisy, incomplete data—a task distinct from continuous symbolic regression and exact logic synthesis. Within the taxonomy, it resides in 'Specialized Domain Applications' alongside four sibling papers addressing biological networks, water distribution systems, and circuit design. This leaf represents a relatively sparse research direction (five papers total) focused on applying logical discovery to concrete engineering and scientific domains, suggesting the work targets an underserved niche rather than a crowded methodological space.

The taxonomy reveals that most logical expression discovery research concentrates in two neighboring branches: 'Symbolic Regression and Formula Discovery' (ten papers across three sub-areas) and 'Logical Rule Learning and Reasoning' (nineteen papers spanning knowledge graphs, temporal logic, and neuro-symbolic integration). LogicSR bridges these areas by addressing logical (not continuous) formulas while handling noisy data (unlike exact logic synthesis). The benchmark's dual focus on real-world circuits/biological networks and synthetic generation distinguishes it from purely domain-specific methods in its leaf and from general-purpose symbolic regression approaches that lack logical structure.

Among thirty candidates examined, none clearly refute the three core contributions. The LogicSR benchmark itself (ten candidates, zero refutations) appears novel as a dedicated evaluation framework for logical symbolic regression under noise. The synthetic data generator (ten candidates, zero refutations) shows no direct prior work in the limited search scope, though the analysis does not cover exhaustive generation literature. The cross-domain evaluation of seventeen algorithms (ten candidates, zero refutations) represents a substantial empirical effort, with no overlapping multi-algorithm comparisons identified in the examined papers. These statistics suggest originality within the search scope, though the limited candidate pool (thirty total) means undiscovered prior work remains possible.

Based on the top-thirty semantic matches and taxonomy structure, the work addresses a genuine gap between continuous symbolic regression and exact logic synthesis. The benchmark's combination of real-world and synthetic logical tasks, evaluated across classical solvers, ML models, and LLMs, appears distinctive within the examined literature. However, the analysis covers a narrow slice of potential prior work—broader searches in logic synthesis, program synthesis, or SAT-based learning communities might reveal additional relevant baselines or evaluation frameworks not captured here.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 0

Research Landscape Overview

Core task: Discovering logical expressions from data. The field encompasses a diverse set of approaches organized into four main branches. Symbolic Regression and Formula Discovery focuses on extracting mathematical and symbolic models directly from observations, often leveraging evolutionary algorithms or neural architectures to uncover closed-form equations. Logical Rule Learning and Reasoning emphasizes the induction of interpretable logical rules—ranging from propositional to first-order logic—that capture underlying patterns in structured or relational data. Deep Learning for Logical Tasks explores how neural networks can be trained to perform or assist in logical inference, blending differentiable modules with symbolic reasoning. Finally, Applications and Domain-Specific Methods tailors these techniques to specialized domains such as temporal logic synthesis, biological systems, water network modeling, and software verification, demonstrating how domain constraints guide the discovery process.

Recent work highlights contrasting trade-offs between interpretability and expressiveness. Some studies pursue purely symbolic outputs for transparency, while others integrate neural components to handle noisy or high-dimensional data at the cost of reduced clarity. Within the Applications and Domain-Specific Methods branch, LogicSR[0] sits alongside efforts like Logical Circuits Fungi[22] and Symbolic Water Networks[36], which apply logical or symbolic discovery to concrete engineering and biological problems. Compared to Mining Logical Arithmetic[12], which targets arithmetic rule extraction, LogicSR[0] emphasizes a broader logical framework suitable for diverse application contexts. Meanwhile, SwarmFlawFinder[50] illustrates how domain-specific heuristics can guide search in software analysis.
Overall, LogicSR[0] occupies a niche where domain expertise and logical structure converge, bridging general symbolic regression methods with the practical demands of specialized fields.

Claimed Contributions

LogicSR benchmark for logical symbolic regression

The authors present LogicSR, a unified benchmark designed to evaluate algorithms that discover logical expressions from data. It combines real-world problems from digital circuits and biological networks with a novel synthetic data generator, addressing the gap between continuous symbolic regression and exact logic synthesis.

10 retrieved papers
Novel synthetic data generation algorithm

The authors develop a two-stage synthesis process for generating large-scale, complex, and structurally diverse ground-truth logic networks. This algorithm uses truth table analysis, structured sampling, and graph-based composition to produce diverse, non-redundant logical formulas at scale.

10 retrieved papers
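The two-stage process described above is abstract; a hypothetical sketch of such a pipeline (stage 1: random gate-DAG composition; stage 2: truth-table-based filtering for non-redundancy) might look as follows. Every function name and design choice here is an illustrative assumption, not the authors' actual algorithm.

```python
import itertools
import random

OPS = {
    "AND": lambda a, b: a and b,
    "OR":  lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}

def random_logic_dag(n_vars, n_gates, rng):
    """Stage 1 (sketch): compose a random DAG of binary gates over the inputs.
    Each gate draws its fan-in from earlier nodes, guaranteeing acyclicity."""
    nodes = [("VAR", i) for i in range(n_vars)]
    for _ in range(n_gates):
        op = rng.choice(list(OPS))
        a, b = rng.sample(range(len(nodes)), 2)
        nodes.append((op, a, b))
    return nodes  # last node is the output

def truth_table(nodes, n_vars):
    """Stage 2 (sketch): evaluate the DAG on all 2^n assignments so that
    semantically equivalent formulas can be detected and filtered."""
    table = []
    for assign in itertools.product([False, True], repeat=n_vars):
        vals = []
        for node in nodes:
            if node[0] == "VAR":
                vals.append(assign[node[1]])
            else:
                op, a, b = node
                vals.append(OPS[op](vals[a], vals[b]))
        table.append(vals[-1])
    return tuple(table)

def generate_unique_formulas(n_vars, n_gates, count, seed=0):
    """Reject constant formulas and formulas whose truth table duplicates one
    already kept, so the generated set is diverse and non-redundant."""
    rng = random.Random(seed)
    seen, kept = set(), []
    while len(kept) < count:
        dag = random_logic_dag(n_vars, n_gates, rng)
        tt = truth_table(dag, n_vars)
        if tt in seen or len(set(tt)) == 1:
            continue
        seen.add(tt)
        kept.append(dag)
    return kept
```

Deduplicating on the full truth table (rather than on syntax) is what makes the output non-redundant in the semantic sense; for larger variable counts a real generator would need sampling-based equivalence checks instead of exhaustive enumeration.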
Comprehensive cross-domain evaluation of 17 algorithms

The authors conduct a rigorous evaluation of 17 algorithms spanning classical logic solvers, modern machine learning models, and large language models. The evaluation reveals capability boundaries and provides insights on scalability, noise robustness, and operator-set compatibility across current methods.

10 retrieved papers
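The noise-robustness dimension of such an evaluation can be sketched as a simple harness that trains at increasing label-noise levels and scores the learned rule against the clean target. This is an assumed illustration, not the paper's protocol; the baseline `lookup_learner` (a per-assignment majority vote) is a hypothetical stand-in for any of the 17 evaluated algorithms.

```python
import random
from collections import defaultdict

def noisy_samples(target_fn, n_vars, n, noise, rng):
    """Draw n random Boolean assignments; flip each label with probability `noise`."""
    data = []
    for _ in range(n):
        x = tuple(rng.random() < 0.5 for _ in range(n_vars))
        y = target_fn(*x)
        if rng.random() < noise:
            y = not y
        data.append((x, y))
    return data

def lookup_learner(train):
    """Baseline 'learner' (illustrative): majority vote per seen assignment."""
    votes = defaultdict(int)
    for x, y in train:
        votes[x] += 1 if y else -1
    return lambda x: votes[x] > 0

def noise_sweep(learner, target_fn, n_vars, levels, n=200, seed=0):
    """Train at each label-noise level, then score agreement with the clean
    target on a fresh noiseless test set."""
    rng = random.Random(seed)
    scores = {}
    for noise in levels:
        train = noisy_samples(target_fn, n_vars, n, noise, rng)
        learned = learner(train)
        test = noisy_samples(target_fn, n_vars, n, 0.0, rng)
        scores[noise] = sum(learned(x) == y for x, y in test) / n
    return scores
```

Sweeping `levels` while also varying the variable count and the allowed operator set would cover the scalability and operator-set-compatibility axes the evaluation reportedly probes.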

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
