Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Noisy correspondence; Multi-modal entity alignment.
Abstract:

Multi-modal entity alignment (MMEA) aims to identify equivalent entities across heterogeneous multi-modal knowledge graphs (MMKGs), where each entity is described by attributes from various modalities. Existing methods typically assume that both intra-entity and inter-graph correspondences are faultless, an assumption that is often violated in real-world MMKGs due to the reliance on expert annotations. In this paper, we reveal and study a highly practical yet under-explored problem in MMEA, termed Dual-level Noisy Correspondence (DNC). DNC refers to misalignments in both intra-entity (entity-attribute) and inter-graph (entity-entity and attribute-attribute) correspondences. To address the DNC problem, we propose a robust MMEA framework termed RULE. RULE first estimates the reliability of both intra-entity and inter-graph correspondences via a dedicated two-fold principle. Leveraging the estimated reliabilities, RULE mitigates the negative impact of intra-entity noise during attribute fusion and prevents overfitting to noisy inter-graph correspondences during inter-graph discrepancy elimination. Beyond the training-time designs, RULE further incorporates a correspondence reasoning module that uncovers the underlying attribute-attribute connection across graphs, guaranteeing more accurate equivalent entity identification. Extensive experiments on five benchmarks verify the effectiveness of our method against DNC in comparison with seven state-of-the-art methods. The code will be released upon acceptance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 0

Research Landscape Overview

Core task: multi-modal entity alignment with noisy correspondences. This field addresses the challenge of matching entities whose descriptions span different modalities (such as text, images, and knowledge graph structures) when the training data contains incorrect or ambiguous pairings.

The taxonomy organizes research into four main branches. Cross-Modal Retrieval with Noisy Correspondence focuses on learning robust embeddings for retrieval tasks despite mismatched pairs, often employing contrastive learning and noise-filtering strategies (e.g., Noisy Correspondence Learning[32], Rematch Mismatched Pairs[17]). Multi-modal Entity Alignment in Knowledge Graphs targets entity matching within structured knowledge bases, where noise arises from incomplete or conflicting attributes; methods here emphasize graph-based reasoning and dual-level noise handling (e.g., Dual Noisy Correspondence[0], Progressive Graph Matching[34]). Domain-Specific Multi-modal Alignment explores specialized settings such as medical imaging or cross-lingual retrieval, adapting general techniques to domain constraints. Multi-modal Alignment Foundations and Mechanisms investigates underlying principles (attention mechanisms, uncertainty quantification, and modality bias mitigation) that support noise-robust alignment across diverse applications.

Several active lines of work reveal key trade-offs and open questions. One prominent theme is the tension between filtering out noisy pairs and retaining informative hard negatives: approaches like Disentangled Noisy Correspondence[9] and Consistency Refining Mining[1] attempt to disentangle true mismatches from challenging but correct pairs, while others such as Deep Evidential Learning[5] quantify uncertainty to guide sample weighting. Another contrast emerges between methods that rely on progressive refinement (e.g., Progressive Perspective Matching[3]) and those that jointly model noise at multiple levels.
Dual Noisy Correspondence[0] sits within the knowledge graph branch and emphasizes handling noise at both the intra-entity (entity-attribute) and inter-graph (entity-entity and attribute-attribute) levels simultaneously, distinguishing it from single-level filtering strategies. Compared with Progressive Perspective Matching[3], which iteratively refines alignments, Dual Noisy Correspondence[0] adopts a more integrated dual-level framework, reflecting ongoing debates about whether noise should be addressed incrementally or holistically.

Claimed Contributions

Dual-level Noisy Correspondence (DNC) problem formulation

The authors identify and formalize a new problem in multi-modal entity alignment where misalignments occur at two levels: within entities (entity-attribute pairs) and across knowledge graphs (entity-entity and attribute-attribute pairs). They demonstrate empirically that this dual-level noise undermines both attribute fusion and inter-graph alignment.
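
To make the two noise levels concrete, the following minimal sketch corrupts a toy pair of graphs. It is an illustration only, not the authors' noise protocol: the function name `inject_dual_level_noise`, the ratio parameters, and the "swap in another entity's attribute" noise model are all assumptions, and the sketch covers entity-attribute and entity-entity noise but not attribute-attribute noise.

```python
import random


def inject_dual_level_noise(attrs_a, attrs_b, seed_pairs,
                            intra_ratio, inter_ratio, seed=0):
    """Return a noisy copy of graph A's attributes and of the seed alignment.

    attrs_a, attrs_b: dict mapping entity id -> attribute (e.g. an image path).
    seed_pairs: list of (entity_in_A, entity_in_B) alignment pairs.
    All names and the noise model are hypothetical, for illustration only.
    """
    rng = random.Random(seed)

    # Intra-entity noise: some entities in graph A receive the attribute
    # of a different entity, breaking the entity-attribute correspondence.
    noisy_attrs_a = dict(attrs_a)
    ids_a = sorted(attrs_a)
    n_intra = int(round(intra_ratio * len(ids_a)))
    for entity in rng.sample(ids_a, n_intra):
        donor = rng.choice([x for x in ids_a if x != entity])
        noisy_attrs_a[entity] = attrs_a[donor]

    # Inter-graph noise: some seed pairs have their graph-B side replaced
    # by a wrong entity, breaking the entity-entity correspondence.
    noisy_pairs = list(seed_pairs)
    ids_b = sorted(attrs_b)
    n_inter = int(round(inter_ratio * len(seed_pairs)))
    for i in rng.sample(range(len(seed_pairs)), n_inter):
        a, b = seed_pairs[i]
        wrong = rng.choice([x for x in ids_b if x != b])
        noisy_pairs[i] = (a, wrong)

    return noisy_attrs_a, noisy_pairs
```

With ten entities per graph, `intra_ratio=0.3` and `inter_ratio=0.2` corrupt three attributes and two seed pairs respectively, provided the attributes are unique per entity.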

10 retrieved papers

RULE framework with two-fold reliability estimation principle

The authors propose RULE, a robust framework that estimates correspondence reliability using uncertainty and consensus principles. This reliability estimation enables the method to reduce the impact of noisy correspondences during both attribute fusion within entities and alignment across knowledge graphs.
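
The idea of down-weighting unreliable attributes during fusion can be sketched as follows. This is a stand-in under stated assumptions, not RULE's estimator: it uses only a simple consensus score (mean cosine similarity of one modality's embedding to the others), omits the uncertainty principle entirely, and assumes all modalities are already embedded in a shared space of equal dimension. The names `consensus_reliability`, `fuse`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def consensus_reliability(embeddings):
    """Score each modality by its mean cosine similarity to the other modalities."""
    scores = {}
    for name, vec in embeddings.items():
        sims = [cosine(vec, other)
                for other_name, other in embeddings.items()
                if other_name != name]
        scores[name] = sum(sims) / len(sims) if sims else 1.0
    return scores


def fuse(embeddings, temperature=0.1):
    """Fuse modality embeddings with softmax weights over consensus scores."""
    scores = consensus_reliability(embeddings)
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {m: math.exp((s - top) / temperature) for m, s in scores.items()}
    total = sum(exps.values())
    weights = {m: e / total for m, e in exps.items()}
    dim = len(next(iter(embeddings.values())))
    fused = [sum(weights[m] * embeddings[m][d] for m in embeddings)
             for d in range(dim)]
    return fused, weights
```

For an entity whose text and structure embeddings agree but whose image embedding points the opposite way, the image receives a near-zero weight, so a mismatched image contributes little to the fused representation.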

10 retrieved papers

Test-time correspondence reasoning module

The authors introduce a correspondence reasoning module that operates during inference to discover latent semantic connections between attributes across graphs. This module uses multi-modal large language models with chain-of-thought reasoning to improve entity identification accuracy at test time, representing a novel contribution to test-time robustness in MMEA.
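
One way to picture test-time reasoning is as a re-ranking step over the top candidates returned by embedding similarity. The sketch below assumes exactly that; it is not the authors' module. The `reasoning_score` callable is a hypothetical placeholder for whatever judgment a multi-modal large language model would return for a candidate, and `top_k` and `alpha` are illustrative parameters.

```python
def rerank(candidates, reasoning_score, top_k=3, alpha=0.5):
    """Re-rank the top_k candidates by mixing in a reasoning-based score.

    candidates: list of (entity_id, embedding_similarity) tuples.
    reasoning_score: hypothetical callable, entity_id -> float in [0, 1],
        standing in for a model's judgment of attribute-level agreement.
    Candidates outside the top_k keep their original order and stay
    below the re-scored head.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    head, tail = ranked[:top_k], ranked[top_k:]
    rescored = [(entity, (1 - alpha) * sim + alpha * reasoning_score(entity))
                for entity, sim in head]
    rescored.sort(key=lambda c: c[1], reverse=True)
    return rescored + tail
```

Restricting the expensive scorer to the top candidates keeps the number of model calls per query bounded by `top_k`, which is the main reason for structuring the sketch this way.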

6 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it therefore appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.
