Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Neuro-symbolic Learning, Rule Discovery, Interpretable Reasoning
Abstract:

One of the central challenges in artificial intelligence is reasoning under partial observability, where key values are missing but essential for understanding and modeling the system. This paper presents a neuro-symbolic framework for latent rule discovery and missing value imputation. In contrast to traditional latent variable models, our approach treats missing grounded values as latent predicates to be inferred through logical reasoning. By interleaving neural representation learning with symbolic rule induction, the model iteratively discovers both conjunctive and disjunctive rules that explain observed patterns and recover missing entries. Our framework seamlessly handles heterogeneous data, reasoning over both discrete and continuous features by learning soft predicates from continuous values. Crucially, the inferred values not only fill in gaps in the data but also serve as supporting evidence for further rule induction and inference, creating a feedback loop in which imputation and rule mining reinforce one another. Using coordinate gradient descent, the system learns these rules end-to-end, enabling interpretable reasoning over incomplete data. Experiments on both synthetic and real-world datasets demonstrate that our method effectively imputes missing values while uncovering meaningful, human-interpretable rules that govern system dynamics.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a neuro-symbolic framework that treats missing values as latent predicates inferred through logical reasoning, combining neural representation learning with symbolic rule induction. It resides in the Association Rule-Based Imputation leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This leaf sits under Rule-Based Imputation and Learning, one of five major branches addressing incomplete data through symbolic reasoning combined with learning mechanisms.

The taxonomy reveals neighboring approaches across multiple branches: Knowledge Graph-Based Reasoning methods use graph structures for completion tasks, Hybrid Neural-Symbolic Frameworks explore general integration architectures, and Genetic Programming methods employ evolutionary search for symbolic regression with incomplete data. The paper's position in Association Rule-Based Imputation distinguishes it from sibling categories like Rule Learning for Prediction and Fuzzy Rule-Based Systems, which focus on classification tasks or fuzzy logic respectively. The scope note for this leaf emphasizes using rules directly for estimation, excluding purely statistical or neural methods without explicit rule discovery.

Among the 21 candidates examined across the three claimed contributions, none was identified as clearly refuting the proposed approach: 10 candidates were examined for the core neuro-symbolic framework, 1 for the coordinate gradient descent scheme, and 10 for the differentiable forward-chaining engine, with no refutable overlap found for any of them. This suggests that, within the limited search scope, the specific combination of treating missing values as latent predicates while interleaving neural learning with symbolic rule induction appears distinct from the examined prior work, though the small candidate pool limits definitive conclusions.

Based on the top-21 semantic matches examined, the work appears to occupy a relatively unexplored intersection between neural imputation and symbolic rule discovery. The sparse Association Rule-Based Imputation leaf and absence of refutable candidates suggest potential novelty, though the limited search scope means substantial related work may exist beyond the examined candidates. The framework's feedback loop between imputation and rule mining represents a distinctive architectural choice within the analyzed literature.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: neuro-symbolic rule discovery for missing value imputation. The field addresses incomplete data by combining symbolic reasoning with neural learning, structured around five main branches. Neuro-Symbolic Integration Architectures explore hybrid frameworks that merge logic-based and sub-symbolic components, exemplified by works like Neuro-Symbolic VQA[3] and Symbolic vs Sub-symbolic[5], which investigate how symbolic knowledge can guide neural models. Rule-Based Imputation and Learning focuses on extracting interpretable rules from data, including association rule methods such as Association Rules Imputation[9] and statistical rule learning approaches like Learning Statistical Rules[7]. Genetic Programming for Symbolic Regression with Incomplete Data employs evolutionary search to discover symbolic expressions that handle missingness, with methods ranging from wrapper-based strategies like GP Wrapper Imputation[19] to hybrid approaches such as Hybrid GP-KNN[14]. Statistical and Machine Learning Imputation Methods encompass traditional and modern techniques, from classical reviews like Missing Data Review[44] to contemporary deep learning solutions. Application-Driven Incomplete Data Processing targets domain-specific challenges in healthcare, infrastructure monitoring, and other real-world settings where missing data is prevalent.

Recent activity highlights tensions between interpretability and predictive power, with many studies exploring how symbolic rules can provide transparency while neural components capture complex patterns. Neuro-Symbolic Imputation[0] sits within the Rule-Based Imputation and Learning branch, specifically under Association Rule-Based Imputation alongside Association Rules Imputation[9] and Fuzzy Decision Uncertainty[49]. While Association Rules Imputation[9] emphasizes mining frequent patterns to fill gaps, Neuro-Symbolic Imputation[0] appears to integrate neural learning more tightly with rule discovery, potentially offering adaptive rule generation. This contrasts with purely symbolic approaches and positions the work at the intersection of interpretable rule extraction and data-driven learning, addressing the challenge of maintaining transparency while leveraging neural network expressiveness for handling complex missingness patterns.

Claimed Contributions

Neuro-symbolic framework for latent rule discovery and missing value imputation

The authors introduce a framework that treats missing values as latent predicates to be inferred through logical reasoning. By interleaving neural representation learning with symbolic rule induction, the model iteratively discovers conjunctive and disjunctive rules that explain observed patterns and recover missing entries.

10 retrieved papers
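To make the inference half of this feedback loop concrete, the sketch below fills missing entries by forward chaining over a fixed rule set, so that a value imputed by one rule can satisfy the body of another on a later pass. It is a minimal illustration under simplifying assumptions, not the authors' method: attributes are Boolean, rules are hard and given rather than learned or soft, and the rule format and the function `impute_by_chaining` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (body, head). The body is a conjunction of
# (attribute, required_value) literals; the head is (attribute, inferred_value).
Literal = Tuple[str, bool]
Rule = Tuple[List[Literal], Literal]


def impute_by_chaining(row: Dict[str, Optional[bool]], rules: List[Rule],
                       max_iters: int = 10) -> Dict[str, Optional[bool]]:
    """Fill missing (None) entries by forward chaining until a fixpoint.

    Observed values are never overwritten; a value imputed in one pass can
    satisfy the body of another rule in a later pass.
    """
    filled = dict(row)
    for _ in range(max_iters):
        changed = False
        for body, (head, head_value) in rules:
            if filled.get(head) is not None:
                continue  # head is already observed or already imputed
            # A missing body attribute (None) never matches a required value.
            if all(filled.get(attr) == value for attr, value in body):
                filled[head] = head_value
                changed = True
        if not changed:
            break  # fixpoint reached: no rule can fire any more
    return filled


rules: List[Rule] = [
    ([("b", True)], ("c", False)),  # b -> not c
    ([("a", True)], ("b", True)),   # a -> b
]
print(impute_by_chaining({"a": True, "b": None, "c": None}, rules))
```

Here `c` can only be filled on the second pass, after `b` has been imputed from `a`; this is the sense in which imputed values act as evidence for further inference. In the framework described above, the rules would themselves be induced, with the imputed values also feeding that induction step.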
Scalable coordinate gradient descent scheme with sequential covering and joint fine-tuning

The authors propose an optimization method that updates one rule or clause at a time while holding others fixed. This approach includes sequential covering to harvest diverse clauses and joint fine-tuning using a soft-OR aggregator, enabling the discovery of long chains and disjunctive theories under high missingness.

1 retrieved paper
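A toy version of a clause-at-a-time scheme can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: each clause is a product-style soft conjunction with per-predicate inclusion weights, clauses are aggregated with a noisy-OR (one possible soft-OR), the loss is binary cross-entropy, and gradients are central finite differences for brevity. Sequential covering is approximated by dropping positives that the clauses learned so far already cover (prediction at least 0.5) before fitting the next clause. All function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clause_truth(w: Vector, x: Vector) -> float:
    """Soft conjunction: predicate j is included with weight sigmoid(w[j]);
    an included predicate that is false (x[j] = 0) pulls the product toward 0."""
    out = 1.0
    for wj, xj in zip(w, x):
        out *= 1.0 - sigmoid(wj) * (1.0 - xj)
    return out


def predict(W: List[Vector], x: Vector) -> float:
    """Noisy-OR over clause truths: 1 - prod(1 - c_k). Empty theory -> 0."""
    out = 1.0
    for w in W:
        out *= 1.0 - clause_truth(w, x)
    return 1.0 - out


def loss(W: List[Vector], X: List[Vector], y: List[int]) -> float:
    """Mean binary cross-entropy of the soft-OR theory."""
    eps, total = 1e-9, 0.0
    for x, t in zip(X, y):
        p = min(max(predict(W, x), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def update_clause(W, k, X, y, lr=1.0, steps=200, h=1e-5):
    """Coordinate step: gradient descent on clause k only, others held fixed."""
    for _ in range(steps):
        grad = []
        for j in range(len(W[k])):
            W[k][j] += h
            up = loss(W, X, y)
            W[k][j] -= 2 * h
            down = loss(W, X, y)
            W[k][j] += h  # restore the coordinate
            grad.append((up - down) / (2 * h))
        for j, g in enumerate(grad):
            W[k][j] -= lr * g


def fit_theory(X, y, n_clauses=2, passes=3, seed=0):
    rng = random.Random(seed)
    W: List[Vector] = []
    # Sequential covering: fit each new clause on its own, using all negatives
    # plus the positives that the clauses learned so far do not yet cover.
    for _ in range(n_clauses):
        keep = [i for i in range(len(X)) if y[i] == 0 or predict(W, X[i]) < 0.5]
        clause = [[rng.uniform(-1.0, 1.0) for _ in X[0]]]
        update_clause(clause, 0, [X[i] for i in keep], [y[i] for i in keep])
        W.append(clause[0])
    # Joint fine-tuning: cyclic passes that update one clause at a time
    # under the shared soft-OR of all clauses.
    for _ in range(passes):
        for k in range(len(W)):
            update_clause(W, k, X, y)
    return W


# Toy target over three binary predicates: y = (a AND b) OR c
X = [[float(a), float(b), float(c)] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [int((a and b) or c) for a, b, c in X]
W = fit_theory(X, y)
print([round(predict(W, x), 2) for x in X])
```

Freezing all but one clause keeps each subproblem small; the final cyclic passes stand in for joint fine-tuning. The finite-difference gradients are only for brevity; an autodiff framework would be the natural choice at scale.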
Unified differentiable forward-chaining engine for heterogeneous data

The authors develop a mechanism that handles heterogeneous data by learning soft predicates for continuous features using sigmoid thresholds and slopes, and combining them with discrete predicates through differentiable logical operators such as soft-min for AND and soft-max for OR.

10 retrieved papers
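The soft predicates and differentiable operators described above can be illustrated with a small mixed rule. The sketch assumes a sigmoid of `slope * (x - threshold)` for a continuous predicate, as the description suggests, and uses Boltzmann-weighted averages with temperature `tau` as the soft-min (AND) and soft-max (OR). The report does not specify the exact operator forms, so these choices, the attribute names, and the example rule are all hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_predicate(x: float, threshold: float, slope: float) -> float:
    """Soft truth value of 'x > threshold'; a larger slope gives a sharper step."""
    return sigmoid(slope * (x - threshold))


def soft_and(values: List[float], tau: float = 0.1) -> float:
    """Soft-min as a Boltzmann-weighted average; tends to min(values) as tau -> 0."""
    lo = min(values)
    weights = [math.exp(-(v - lo) / tau) for v in values]  # shifted for stability
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def soft_or(values: List[float], tau: float = 0.1) -> float:
    """Soft-max as a Boltzmann-weighted average; tends to max(values) as tau -> 0."""
    hi = max(values)
    weights = [math.exp((v - hi) / tau) for v in values]  # shifted for stability
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical mixed rule:
#   high_risk <- (age > 50 AND smoker) OR (bmi > 30 AND NOT exercises)
row = {"age": 62.0, "bmi": 24.0, "smoker": 1.0, "exercises": 0.0}
clause1 = soft_and([soft_predicate(row["age"], 50.0, 0.5), row["smoker"]])
clause2 = soft_and([soft_predicate(row["bmi"], 30.0, 1.0), 1.0 - row["exercises"]])
high_risk = soft_or([clause1, clause2])
print(round(clause1, 3), round(clause2, 3), round(high_risk, 3))
```

Discrete features such as `smoker` enter directly as truth values in [0, 1], so continuous and discrete conditions combine in the same clause, and `threshold` and `slope` are ordinary differentiable parameters. The weighted-average form keeps every output within [0, 1], and lowering `tau` moves both operators toward hard min and max.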
