From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Model, Subtitle Translation, Preference Optimization
Abstract:

The rapid development of Large Language Models (LLMs) has significantly enhanced the general capabilities of machine translation. However, as application scenarios become more complex, the limitations of LLMs in vertical-domain translation are becoming apparent. In this study, we focus on how to construct translation LLMs that meet the needs of domain customization. We take visual media subtitle translation as our topic and explore how to train expressive and vivid translation LLMs. We investigate how literal and liberal translation are used in subtitle translation and in other domains, and we verify the reliability of LLMs as reward models and evaluators for translation. Additionally, to train an expressive translation LLM, we construct and release a multidirectional subtitle parallel corpus and propose the Adaptive Local Preference Optimization (ALPO) method to address fine-grained preference alignment. Experimental results demonstrate that ALPO achieves outstanding performance in multidimensional evaluation of translation quality.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Adaptive Local Preference Optimization (ALPO) for training expressive subtitle translation LLMs, alongside a multilingual subtitle corpus (MuSC) and an LLM-based multidimensional evaluation framework. It resides in the 'Expressive and Stylistic Translation' leaf, which contains four papers total, including the original work. This leaf sits within 'Human Translation Practice and Quality', a moderately populated branch addressing translator strategies, cultural adaptation, quality assessment, and stylistic concerns. The focus on expressiveness and vividness in subtitle translation places the work in a relatively sparse research direction compared to broader neural MT or multimodal translation clusters.

The taxonomy reveals neighboring leaves such as 'Translation Strategies and Techniques' (five papers on domestication, reduction, and adaptation) and 'Cultural and Idiomatic Translation' (five papers on culture-specific references). The 'Expressive and Stylistic Translation' leaf explicitly excludes accessibility adaptations and general translation strategies, concentrating instead on preserving emotional content and characterization. Nearby branches include 'Neural Machine Translation for Subtitles' (five papers on end-to-end systems) and 'Multimodal Translation Approaches' (three papers integrating visual and audio signals), indicating that the paper bridges human-centric stylistic concerns with computational methods, a less crowded intersection in the taxonomy.

Across the thirty candidates examined (ten per contribution), the ALPO method has one candidate that can potentially refute its novelty, suggesting that prior work on preference optimization exists but that the specific local adaptation mechanism may be novel. The MuSC dataset has no refuting candidates among its ten, indicating potential novelty in multidirectional subtitle corpus construction. The LLM-as-a-Judge evaluation framework has two refuting candidates among its ten, reflecting existing work on LLM-based translation assessment, though that work may differ from the paper's multidimensional expressiveness focus. Because the search scope is limited, these findings reflect top-K semantic matches rather than exhaustive coverage.

Given the sparse 'Expressive and Stylistic Translation' leaf and the moderate overlap detected in the limited candidate pool, the work appears to occupy a relatively underexplored niche at the intersection of LLM-based translation and expressive subtitle rendering. The analysis is constrained by the thirty-candidate search scope and does not capture the full breadth of preference optimization or LLM evaluation literature outside the subtitle translation domain.
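
As a quick check on the counts quoted above, the per-contribution figures can be tallied as follows. This is only a sketch: the numbers are copied from this report, while the dictionary layout and variable names are illustrative assumptions.

```python
# Per-contribution counts as stated in this report:
# (candidates examined, candidates that can refute the contribution).
candidates = {
    "ALPO method": (10, 1),
    "MuSC dataset": (10, 0),
    "LLM-as-a-Judge evaluation framework": (10, 2),
}

total_examined = sum(examined for examined, _ in candidates.values())
total_refuting = sum(refuting for _, refuting in candidates.values())

for name, (examined, refuting) in candidates.items():
    print(f"{name}: {refuting} of {examined} candidates can refute")
print(f"Total: {total_refuting} of {total_examined}")
```

The totals agree with the summary statistics reported in the Taxonomy section.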

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: expressive subtitle translation for visual media. The field encompasses a broad spectrum of concerns, from technical methods and automated systems to human translation practice, accessibility, theoretical frameworks, domain-specific applications, professional workflows, and corpus development. Subtitle Translation Methods and Systems[1][2][3][11] explores computational approaches and datasets that support automated or semi-automated translation, while Human Translation Practice and Quality[0][5][7][8][9] examines how translators handle stylistic nuance, quality assessment, and the creative challenges of rendering dialogue expressively. Accessibility-Oriented Subtitling[10][12] addresses the needs of deaf and hard-of-hearing audiences and barrier-free communication, whereas Theoretical and Conceptual Frameworks[25][40][41] provide linguistic and semiotic lenses for understanding audiovisual translation. Domain-Specific and Applied Contexts[13][16][19][22][24][26][32][33][37][38][43][47][48][50] cover culture-specific references, censorship, genre conventions, and regional adaptations, while Professional Subtitling Workflows[17][23][29][34][36] investigate editing, post-editing productivity, and industry tools. Subtitle Corpora and Datasets[20][45] supply the empirical resources that underpin both research and system development.

Within this landscape, a particularly active line of work focuses on expressive and stylistic translation, where translators must balance fidelity to source dialogue with target-language naturalness and emotional resonance. Expressive Subtitle Translation[0] sits squarely in this cluster, emphasizing how subtitlers convey tone, register, and affect in constrained textual space. Nearby, Style in Subtitles[7] examines the theoretical dimensions of stylistic choice, while Emotive Reactions Subtitling[9] investigates the rendering of emotional cues and interjections. Easy Language Subtitles[5] explores accessibility through simplified registers, highlighting a different facet of expressive adaptation. These works collectively grapple with the tension between creative freedom and technical constraints, and with the question of how much a translator's subjectivity[21] shapes the final product. Expressive Subtitle Translation[0] contributes to this conversation by foregrounding the expressive dimension as a central quality criterion, aligning closely with studies of style[7] and emotion[9] but with a distinct focus on the interplay between visual context and linguistic choice.

Claimed Contributions

Adaptive Local Preference Optimization (ALPO) method

ALPO is a novel preference alignment strategy designed for fine-grained local preference optimization in subtitle translation. It uses a segment-wise sampling strategy and adaptive alignment loss to train expressive translation LLMs, addressing limitations of outcome-supervised methods like DPO and PPO for tasks requiring multi-segment local alignment.

10 retrieved papers
Can Refute
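
To make the contrast with outcome-supervised methods concrete, the sketch below computes the standard DPO loss for a single (chosen, rejected) pair from sequence-level log-probabilities, then averages the same loss over per-segment pairs. The second function is a hypothetical illustration of what "local" alignment could look like. This report does not specify ALPO's sampling strategy or its adaptive loss, so the uniform average, the function names, and the tuple layout are all assumptions and should not be read as the authors' method.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w: float, ref_w: float, pol_l: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    pol_w / pol_l: policy log-probs of the chosen / rejected output.
    ref_w / ref_l: reference-model log-probs of the same outputs.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def segmentwise_loss(segments, beta: float = 0.1) -> float:
    """HYPOTHETICAL local variant: mean DPO-style loss over segment pairs.

    Each item is a (pol_w, ref_w, pol_l, ref_l) tuple for one subtitle
    segment. Uniform averaging is an assumption; ALPO's adaptive
    weighting is not described in this report and is not reproduced.
    """
    losses = [dpo_loss(*seg, beta=beta) for seg in segments]
    return sum(losses) / len(losses)
```

With sequence-level DPO, one preference label covers the whole output. The segment-wise form attaches a preference to each subtitle segment, which is the general idea behind "multi-segment local alignment" as the contribution describes it.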
Multilingual Subtitle Corpus (MuSC) dataset

The authors constructed and released MuSC, a multidirectional subtitle parallel corpus dataset comprising subtitle corpora from multiple translation directions (en⇒de, en⇒fr, en⇒zh, ko⇒zh, zh⇒en, zh⇒th) with 100–200 programs across various genres per direction to support community research in visual media subtitle translation.

10 retrieved papers
Multidimensional evaluation framework based on LLM-as-a-Judge

The authors developed a multidimensional quality evaluation system for subtitle translation that uses LLMs as evaluators to assess three dimensions: accuracy (conveying original meaning), naturalness (fluent expression aligned with target language conventions), and vividness (expressiveness conveying emotions and atmosphere). They validated the reliability of LLMs as evaluators through correlation studies with human preferences.

10 retrieved papers
Can Refute
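
A correlation study of the kind described above can be sketched with a rank correlation between LLM-judge scores and human ratings for the same translations. The report does not say which statistic the authors used, so Spearman's coefficient is an assumption here, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(llm_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(llm_scores), average_ranks(human_scores))
```

In practice the same computation would be run once per dimension (accuracy, naturalness, vividness), so that agreement with human preferences can be inspected separately for each.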
