DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: text-to-image, semantic leakage, computer vision, automatic evaluation, multimodal
Abstract:

Text-to-Image (T2I) models have advanced rapidly, yet they remain vulnerable to semantic leakage, the unintended transfer of semantically related features between distinct entities. Existing mitigation strategies are often optimization-based or dependent on external inputs. We introduce DeLeaker, a lightweight, optimization-free inference-time approach that mitigates leakage by directly intervening on the model’s attention maps. Throughout the diffusion process, DeLeaker dynamically reweights attention maps to suppress excessive cross-entity interactions while strengthening the identity of each entity. To support systematic evaluation, we introduce SLIM (Semantic Leakage in IMages), the first dataset dedicated to semantic leakage, comprising 1,130 human-verified samples spanning diverse scenarios, together with a novel automatic evaluation framework. Experiments demonstrate that DeLeaker consistently outperforms all baselines, even when they are provided with external information, achieving effective leakage mitigation without compromising fidelity or quality. These results underscore the value of attention control and pave the way for more semantically precise T2I models.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DeLeaker, a lightweight inference-time method that mitigates semantic leakage by dynamically reweighting attention maps during diffusion. It resides in the 'Inference-Time Attention Reweighting' leaf, which contains five papers total, including the original work. This leaf sits within the broader 'Attention Mechanism Intervention' branch, one of nine major research directions in the taxonomy. The relatively small cluster suggests this specific approach—dynamic reweighting without optimization or external inputs—represents a focused but not overcrowded research direction within the larger semantic leakage mitigation landscape.

The taxonomy reveals neighboring leaves addressing related attention-based strategies: 'Attention Map Alignment and Control' enforces spatial constraints using layouts or energy objectives, while 'Text Self-Attention and Syntactic Guidance' leverages text encoder structure. These sibling categories share the attention intervention philosophy but differ in mechanism. Beyond attention methods, parallel branches explore embedding manipulation, concept erasure, and bias mitigation, indicating that the field pursues semantic control through diverse complementary pathways. DeLeaker's focus on cross-entity interaction suppression distinguishes it from spatial alignment methods and positions it as a refinement of dynamic attention control.

Among the thirty candidates examined in total, the DeLeaker method (Contribution A) had two of its ten candidates judged refutable, suggesting that some prior work addresses inference-time attention reweighting for semantic control. The SLIM dataset (Contribution B) and the automated evaluation framework (Contribution C) were each compared against ten candidates with zero refutations, indicating that these evaluation contributions appear more distinctive within the limited search scope. These statistics reflect a targeted literature search rather than exhaustive coverage: the analysis captures top semantic matches and immediate citations but may not encompass all relevant prior work in this evolving subfield.

Based on the limited search scope of thirty candidates, the method contribution appears to build incrementally on existing attention reweighting strategies, while the evaluation contributions show stronger novelty signals. The taxonomy context reveals a moderately populated research direction with clear boundaries separating dynamic reweighting from spatial alignment and embedding-based approaches. The analysis provides a snapshot of the immediate research neighborhood but does not claim comprehensive coverage of all semantic leakage mitigation techniques.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 2

Research Landscape Overview

Core task: Semantic leakage mitigation in text-to-image generation addresses the challenge of preventing unintended information transfer or attribute mixing when generating images from textual descriptions. The field's taxonomy reveals a diverse landscape organized around nine major branches. Attention Mechanism Intervention focuses on inference-time reweighting and dynamic modulation strategies to steer cross-attention maps, as exemplified by works like Attend and Excite[2] and Be Yourself[3]. Embedding and Latent Space Manipulation explores how to disentangle or constrain representations to prevent unwanted semantic bleed, while Concept Erasure and Safety Filtering (surveyed in Concept Erasure Survey[4]) targets the removal of harmful or copyrighted content. Bias Detection and Mitigation examines fairness issues in generated outputs, and Multi-Subject and Multi-Attribute Composition tackles the challenge of faithfully rendering multiple entities without attribute confusion. Style Preservation and Content Leakage Prevention aims to separate stylistic elements from content, Cross-Modal and Temporal Consistency ensures coherence across modalities and frames, Semantic Alignment and Data Augmentation refines training signals, and Specialized Applications addresses domain-specific leakage problems.

Recent work highlights contrasting strategies for controlling semantic flow during generation. Attention-based methods like Temporal Adaptive Attention[11] and Attention Modulation[34] dynamically adjust cross-attention weights to prevent attribute leakage, while embedding-level approaches manipulate latent codes or token representations to enforce semantic boundaries. DeLeaker[0] operates within the Inference-Time Attention Reweighting cluster, sharing the attention intervention philosophy of Attend and Excite[2] and Be Yourself[3], yet it emphasizes mitigating leakage through targeted reweighting rather than broad excitation or identity preservation. This positions DeLeaker[0] as a refinement of attention control strategies, addressing scenarios where subtle semantic drift occurs despite standard guidance. Meanwhile, works like InstantStyle[1] and Only Style[18] tackle the related but distinct problem of style-content separation, illustrating how different branches converge on the shared goal of preventing unintended information mixing through complementary mechanisms.

Claimed Contributions

DeLeaker: Dynamic Inference-Time Reweighting Method

DeLeaker is a novel inference-time method that mitigates semantic leakage in text-to-image models by dynamically reweighting attention maps. It suppresses cross-entity interactions while strengthening each entity's self-identity, without requiring external inputs or costly optimization.
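The description above can be made concrete with a small sketch. The paper's actual reweighting rule is not reproduced here; the following is only a minimal illustration of the general idea, assuming a per-step cross-attention map of shape (pixels, tokens). The `suppress` and `boost` factors and the pixel-to-entity assignment heuristic are illustrative choices, not the paper's.

```python
import numpy as np

def reweight_attention(attn, entity_tokens, suppress=0.5, boost=1.5):
    """Hypothetical sketch: dampen cross-entity attention columns and
    boost each entity's own tokens, then renormalize rows.

    attn:          (n_pixels, n_tokens) cross-attention map, rows sum to 1
    entity_tokens: list of token-index lists, one per entity
    suppress, boost: illustrative scale factors (not the paper's values)
    """
    out = attn.copy()
    # Assign each pixel to the entity whose tokens it attends to most.
    entity_mass = np.stack(
        [attn[:, idx].sum(axis=1) for idx in entity_tokens], axis=1
    )
    owner = entity_mass.argmax(axis=1)
    for e, idx in enumerate(entity_tokens):
        mine = owner == e
        # Strengthen each entity's self-identity on its own pixels...
        out[np.ix_(mine, idx)] *= boost
        # ...and suppress its attention to every other entity's tokens.
        for other_idx in (t for j, t in enumerate(entity_tokens) if j != e):
            out[np.ix_(mine, other_idx)] *= suppress
    # Renormalize so each pixel's attention still sums to 1.
    return out / out.sum(axis=1, keepdims=True)
```

In a real diffusion pipeline such a function would be applied inside the cross-attention layers at selected timesteps; here it is shown standalone on a plain array for clarity.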

10 retrieved papers
Can Refute
SLIM Dataset for Semantic Leakage Evaluation

SLIM (Semantic Leakage in IMages) is the first dataset dedicated to evaluating semantic leakage in text-to-image models. It contains 1,130 human-verified samples organized into five subsets covering diverse leakage scenarios, including visually similar entities, spatial interactions, and multi-entity compositions.
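For illustration, a SLIM-style sample might be represented as the following record. The field names and example values are hypothetical; the released dataset's actual schema is not described here.

```python
from dataclasses import dataclass

@dataclass
class SLIMSample:
    """Hypothetical record layout for a SLIM-style sample; the field
    names and values are illustrative, not taken from the dataset."""
    prompt: str                 # the text-to-image prompt
    entities: list              # distinct entities named in the prompt
    subset: str                 # one of the five leakage scenarios
    leak_prone_features: list   # features expected to leak across entities

# Illustrative example in the "visually similar entities" scenario,
# where a zebra's stripes may leak onto the horse.
sample = SLIMSample(
    prompt="a zebra standing next to a white horse",
    entities=["zebra", "horse"],
    subset="visually similar entities",
    leak_prone_features=["stripes"],
)
```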

10 retrieved papers
Automated Evaluation Framework for Semantic Leakage

The authors introduce a comprehensive automated evaluation pipeline for assessing semantic leakage mitigation. The framework performs comparative evaluation, decomposing the assessment into discrete logical steps: leakage detection, ranking of mitigation success, and preservation of image quality. Its judgments are validated through an extensive human study.
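As a rough illustration of such a stepwise comparative pipeline, the sketch below assumes a generic `judge` callable (e.g. a wrapper around a vision-language model) that answers a question about one or two images. The function names, question phrasings, and return values are all hypothetical, not the paper's actual protocol.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    leakage_detected: bool
    mitigation_rank: int      # 1 = better mitigation of the two images
    quality_preserved: bool

def evaluate_pair(judge, prompt, baseline_img, mitigated_img):
    """Hypothetical decomposition of a comparative leakage evaluation
    into discrete steps; `judge` is an assumed external callable."""
    # Step 1: does the mitigated image still show feature leakage?
    leak = judge(
        f"Do distinct entities in this image share features that the "
        f"prompt assigns separately: '{prompt}'?", mitigated_img)
    # Step 2: rank which of the two images mitigates leakage better.
    rank = 1 if judge("Which image keeps the entities more distinct?",
                      baseline_img, mitigated_img) == "second" else 2
    # Step 3: check that mitigation did not sacrifice image quality.
    quality = judge("Is this image of comparable overall quality?",
                    mitigated_img)
    return StepResult(bool(leak), rank, bool(quality))
```

Breaking the evaluation into independent yes/no and ranking queries, as above, makes each judgment easier to validate against human annotations than a single holistic score.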

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
DeLeaker: Dynamic Inference-Time Reweighting Method

Contribution
SLIM Dataset for Semantic Leakage Evaluation

Contribution
Automated Evaluation Framework for Semantic Leakage