Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: llm, decoding intervention, language confusion
Abstract:

Large language models (LLMs) often exhibit language confusion: the unintended mixing of languages during text generation. Existing solutions either require model retraining or cannot distinguish harmful confusion from acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight plug-in that filters tokens during decoding without altering the base LLM. The LCG is trained via norm-adjusted self-distillation to predict the appropriate language families and applies masking only when needed. The method rests on three findings: language confusion is infrequent; correct-language tokens are usually among the top predictions; and output token embedding norms are larger for high-resource languages, which biases sampling toward them. Evaluated across a range of models, including Qwen3, GPT-OSS, Gemma3, and Llama3.1, LCG reduces language confusion substantially, often by an order of magnitude, without degrading task performance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a Language Confusion Gate (LCG), a plug-in decoding-time filter that masks inappropriate-language tokens during generation without retraining the base model. It resides in the Token-Level Filtering and Steering leaf, which contains only two papers, including this one. This leaf sits within the broader Decoding-Time Intervention Methods branch, a relatively sparse research direction compared to training-based approaches. The taxonomy spans 34 papers across 16 leaf nodes, suggesting the field is moderately populated overall while this specific decoding-time filtering niche remains underexplored.

The taxonomy reveals that neighboring work clusters around training-time solutions (Preference Optimization, Language-Specific Parameter Modulation) and cross-lingual interference analysis. The sibling paper in the same leaf, Language Steering Latent, manipulates hidden states rather than filtering tokens, highlighting a methodological divergence within the same problem space. The exclude_note clarifies that methods requiring model retraining belong elsewhere, positioning LCG as a lightweight alternative to heavier architectural interventions like language adapters or continual pretraining strategies found in adjacent branches.

Among the 20 candidates examined, the LCG mechanism itself was compared against 10 candidates, with no clear refutation found. The norm-adjusted self-distillation training method retrieved no candidates and was therefore not examined against prior work. The specialized training and evaluation datasets contribution was compared against the other 10 candidates, of which 1 was found refutable, suggesting some overlap in dataset construction approaches. The limited search scope means these findings reflect top-K semantic matches rather than exhaustive coverage; within the examined literature, the core gating mechanism appears more distinctive than the dataset contribution.

Based on the limited search of 20 candidates, the work appears to occupy a relatively novel position within decoding-time token filtering, though the dataset contribution shows some prior overlap. The sparse population of its taxonomy leaf and the methodological contrast with its sole sibling paper suggest a distinct approach, but the analysis does not cover the full landscape of multilingual generation control methods beyond top semantic matches.

Taxonomy

34 Core-task Taxonomy Papers
3 Claimed Contributions
20 Contribution Candidate Papers Compared
1 Refutable Paper

Research Landscape Overview

Core task: Mitigating unintended language mixing during text generation in multilingual language models. The field addresses a fundamental challenge in multilingual NLP: ensuring that models generate text in the intended language without inadvertently switching or blending languages. The taxonomy reveals six major branches that capture complementary perspectives on this problem. Decoding-Time Intervention Methods focus on runtime strategies such as token-level filtering and steering to guide generation toward the target language, while Training-Time and Architectural Approaches modify model design or learning objectives to reduce confusion at its source. Cross-Lingual Knowledge Transfer and Interference examines how shared representations can both enable transfer and introduce unwanted mixing, a tension explored in works like Crosslingual Knowledge Barriers[3] and Interference Multilingual Translation[1]. Evaluation and Benchmarking provides diagnostic tools to measure language confusion, Task-Specific Multilingual Applications adapts these insights to domains like retrieval or translation, and Multilingual Model Development and Pretraining investigates foundational choices in tokenization and corpus balancing that shape a model's propensity for mixing.

Recent work highlights contrasting strategies for controlling language confusion. Some approaches intervene during decoding by steering latent representations or filtering undesired tokens, as seen in Language Steering Latent[25] and Controlling Language Confusion[2], while others address the issue earlier through training modifications or architectural constraints like language adapters. The original paper, Language Confusion Gate[0], sits squarely within the Decoding-Time Intervention branch, specifically under Token-Level Filtering and Steering. It shares this focus with Language Steering Latent[25], yet differs in mechanism: where Language Steering Latent[25] manipulates hidden states to enforce language consistency, Language Confusion Gate[0] introduces a gating mechanism to selectively suppress cross-lingual tokens at generation time. This positions it as a lightweight, inference-time solution that complements training-based methods like Mitigating Language Confusion[6] and offers an alternative to heavier architectural interventions, addressing the practical need for post-hoc control in deployed multilingual systems.

Claimed Contributions

Language Confusion Gate (LCG)

The authors propose a lightweight two-layer MLP intervention mechanism that dynamically filters inappropriate tokens at decoding time by predicting permissible language families and applying masking only when necessary, without modifying the base LLM weights.
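The described mechanism (a small MLP that predicts permissible language families from the decoder's hidden state, masking logits only when some family is disallowed) can be sketched as follows. This is a minimal illustration, not the authors' implementation; all layer sizes, the threshold, and the `token_family` lookup table are assumptions.

```python
import torch
import torch.nn as nn


class LanguageConfusionGate(nn.Module):
    """Hypothetical sketch of the gate: a two-layer MLP that reads the
    LLM's last hidden state and scores which language families are
    permissible for the next token. Sizes are illustrative."""

    def __init__(self, hidden_size: int, num_families: int, gate_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, gate_dim),
            nn.ReLU(),
            nn.Linear(gate_dim, num_families),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Per-family probability that the family is allowed next.
        return torch.sigmoid(self.mlp(hidden))


def gated_logits(logits, family_probs, token_family, threshold=0.5):
    """Mask tokens whose language family the gate deems impermissible.
    `token_family[v]` maps vocab id v -> family id (assumed precomputed).
    If every family is allowed, the logits pass through untouched, so the
    gate intervenes only when needed."""
    allowed = family_probs > threshold            # (num_families,) bool
    if allowed.all():                             # no intervention needed
        return logits
    token_allowed = allowed[token_family]         # (vocab_size,) bool
    return logits.masked_fill(~token_allowed, float("-inf"))
```

At each decoding step, the gate's masked logits would replace the raw logits before sampling, leaving the base model's weights untouched.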

10 retrieved papers
Norm-adjusted self-distillation training method

The authors introduce a training approach that leverages the model's own debiased top-k/p predictions by adjusting logits with token embedding norms to remove the systematic bias toward high-resource languages, enabling the gate to learn from the model's corrected language predictions.
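One way to realize the described target construction is sketched below: divide each token's logit by its output-embedding norm to counter the high-resource bias, then mark the language families present in the debiased top-k as the gate's multi-hot supervision target. The exact form of the norm adjustment (here, division) and the function name are assumptions for illustration.

```python
import torch


def norm_adjusted_topk_families(logits, emb_norms, token_family, k=20):
    """Hypothetical self-distillation target: debias logits by the
    output-embedding norm of each token, take the corrected top-k, and
    return a multi-hot vector of the language families they cover."""
    adjusted = logits / emb_norms              # counter high-resource bias
    topk_ids = adjusted.topk(k).indices        # model's corrected top-k
    fams = token_family[topk_ids]              # families of those tokens
    target = torch.zeros(int(token_family.max()) + 1)
    target[fams] = 1.0                         # multi-hot family target
    return target
```

Targets built this way would let the gate be trained with a standard binary cross-entropy loss against its per-family predictions, with no human labels required.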

0 retrieved papers
Specialized training and evaluation datasets

The authors collect and release datasets specifically designed for training the language confusion gate and evaluating language confusion across diverse multilingual contexts, covering over 200 languages and approximately 78,000 samples.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
