From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Large Language Model, Subtitle Translation, Preference Optimization

Abstract:

The rapid development of Large Language Models (LLMs) has significantly enhanced the general capabilities of machine translation. However, as application scenarios become more complex, the limitations of LLMs in vertical domain translations are gradually becoming apparent. In this study, we focus on how to construct translation LLMs that meet the needs of domain customization. We take visual media subtitle translation as our topic and explore how to train expressive and vivid translation LLMs. We investigated the situations of subtitle translation and other domains of literal and liberal translation, verifying the reliability of LLM as reward model and evaluator for translation. Additionally, to train an expressive translation LLM, we constructed and released a multidirectional subtitle parallel corpus dataset and proposed the Adaptive Local Preference Optimization (ALPO) method to address fine-grained preference alignment. Experimental results demonstrate that ALPO achieves outstanding performance in multidimensional evaluation of translation quality.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Adaptive Local Preference Optimization (ALPO) for training expressive subtitle translation LLMs, alongside a multilingual subtitle corpus (MuSC) and an LLM-based multidimensional evaluation framework. It resides in the 'Expressive and Stylistic Translation' leaf, which contains four papers total, including the original work. This leaf sits within 'Human Translation Practice and Quality', a moderately populated branch addressing translator strategies, cultural adaptation, quality assessment, and stylistic concerns. The focus on expressiveness and vividness in subtitle translation places the work in a relatively sparse research direction compared to broader neural MT or multimodal translation clusters.

The taxonomy reveals neighboring leaves such as 'Translation Strategies and Techniques' (five papers on domestication, reduction, and adaptation) and 'Cultural and Idiomatic Translation' (five papers on culture-specific references). The 'Expressive and Stylistic Translation' leaf explicitly excludes accessibility adaptations and general translation strategies, concentrating instead on preserving emotional content and characterization. Nearby branches include 'Neural Machine Translation for Subtitles' (five papers on end-to-end systems) and 'Multimodal Translation Approaches' (three papers integrating visual and audio signals), indicating that the paper bridges human-centric stylistic concerns with computational methods, a less crowded intersection in the taxonomy.

Among the thirty candidates examined (ten per contribution), the ALPO method has one refutable candidate out of ten, suggesting that some prior work on preference optimization exists but that the specific local adaptation mechanism may be novel. The MuSC dataset has no refutable candidates among its ten examined papers, indicating potential novelty in multidirectional subtitle corpus construction. The LLM-as-a-Judge evaluation framework has two refutable candidates among ten, reflecting existing work on LLM-based translation assessment that may nonetheless differ from the paper's multidimensional focus on expressiveness. Because the search scope is limited, these findings reflect top-K semantic matches rather than exhaustive coverage.

Given the sparse 'Expressive and Stylistic Translation' leaf and the moderate overlap detected in the limited candidate pool, the work appears to occupy a relatively underexplored niche at the intersection of LLM-based translation and expressive subtitle rendering. The analysis is constrained by the thirty-candidate search scope and does not capture the full breadth of preference optimization or LLM evaluation literature outside the subtitle translation domain.

Taxonomy

Research Landscape Overview

Core task: expressive subtitle translation for visual media. The field encompasses a broad spectrum of concerns, from technical methods and automated systems to human translation practice, accessibility, theoretical frameworks, domain-specific applications, professional workflows, and corpus development. Subtitle Translation Methods and Systems[1][2][3][11] explores computational approaches and datasets that support automated or semi-automated translation, while Human Translation Practice and Quality[0][5][7][8][9] examines how translators handle stylistic nuance, quality assessment, and the creative challenges of rendering dialogue expressively. Accessibility-Oriented Subtitling[10][12] addresses the needs of deaf and hard-of-hearing audiences and barrier-free communication, whereas Theoretical and Conceptual Frameworks[25][40][41] provide linguistic and semiotic lenses for understanding audiovisual translation. Domain-Specific and Applied Contexts[13][16][19][22][24][26][32][33][37][38][43][47][48][50] cover culture-specific references, censorship, genre conventions, and regional adaptations, while Professional Subtitling Workflows[17][23][29][34][36] investigate editing, post-editing productivity, and industry tools. Subtitle Corpora and Datasets[20][45] supply the empirical resources that underpin both research and system development.

Within this landscape, a particularly active line of work focuses on expressive and stylistic translation, where translators must balance fidelity to source dialogue with target-language naturalness and emotional resonance. Expressive Subtitle Translation[0] sits squarely in this cluster, emphasizing how subtitlers convey tone, register, and affect in constrained textual space. Nearby, Style in Subtitles[7] examines the theoretical dimensions of stylistic choice, while Emotive Reactions Subtitling[9] investigates the rendering of emotional cues and interjections. Easy Language Subtitles[5] explores accessibility through simplified registers, highlighting a different facet of expressive adaptation. These works collectively grapple with the tension between creative freedom and technical constraints, and with the question of how much a translator's subjectivity[21] shapes the final product. Expressive Subtitle Translation[0] contributes to this conversation by foregrounding the expressive dimension as a central quality criterion, aligning closely with studies of style[7] and emotion[9] but with a distinct focus on the interplay between visual context and linguistic choice.

Claimed Contributions

Adaptive Local Preference Optimization (ALPO) method

Can Refute

10 retrieved papers

ALPO is a novel preference alignment strategy designed for fine-grained local preference optimization in subtitle translation. It uses a segment-wise sampling strategy and adaptive alignment loss to train expressive translation LLMs, addressing limitations of outcome-supervised methods like DPO and PPO for tasks requiring multi-segment local alignment.
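For orientation, the outcome-supervised baseline that this description contrasts ALPO with can be sketched as follows. The first function is the standard DPO loss for one chosen/rejected pair of whole outputs. The second is a purely hypothetical segment-level variant that averages the same pairwise loss over aligned segments; it is an illustrative assumption, not the loss defined in the submission. The names `dpo_loss` and `segmentwise_loss` and the default `beta=0.1` are placeholders.

```python
import math


def _softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_*     : log-probability of the output under the policy being trained
    ref_logp_* : log-probability of the same output under the frozen reference
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin)
    return _softplus(-margin)


def segmentwise_loss(segments, beta=0.1):
    """Hypothetical local variant (assumption, not the paper's ALPO loss).

    `segments` is a list of (logp_w, logp_l, ref_logp_w, ref_logp_l) tuples,
    one per subtitle segment; the pairwise loss is averaged over segments.
    """
    losses = [dpo_loss(*seg, beta=beta) for seg in segments]
    return sum(losses) / len(losses)
```

The point of the contrast is that `dpo_loss` sees only one preference signal per full output, whereas a local formulation receives one signal per segment, which is the kind of multi-segment alignment the description refers to.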

Multilingual Subtitle Corpus (MuSC) dataset

10 retrieved papers

The authors constructed and released MuSC, a multidirectional subtitle parallel corpus dataset comprising subtitle corpora from multiple translation directions (en⇒de, en⇒fr, en⇒zh, ko⇒zh, zh⇒en, zh⇒th) with 100–200 programs across various genres per direction to support community research in visual media subtitle translation.

Multidimensional evaluation framework based on LLM-as-a-Judge

Can Refute

10 retrieved papers

The authors developed a multidimensional quality evaluation system for subtitle translation that uses LLMs as evaluators to assess three dimensions: accuracy (conveying original meaning), naturalness (fluent expression aligned with target language conventions), and vividness (expressiveness conveying emotions and atmosphere). They validated the reliability of LLMs as evaluators through correlation studies with human preferences.
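A correlation study of this kind can be sketched as below. The report does not say which correlation statistic the authors used, so Spearman rank correlation is an assumption here, and the function names are hypothetical. The sketch ranks each score list (averaging tied ranks) and computes the Pearson correlation of the ranks.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(judge_scores, human_scores):
    """Spearman correlation between LLM-judge scores and human ratings."""
    return pearson(average_ranks(judge_scores), average_ranks(human_scores))
```

In use, one would call `spearman` once per dimension (accuracy, naturalness, vividness) on the judge's scores and the human ratings for the same set of translations; a value near 1 would indicate that the judge orders translations much as humans do.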

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[5] Translating subtitles into Easy Language: First considerations and empirical investigations

J Nitzke, S Hansen-Schirra, AK Habig (2022)

[7] Style in Subtitles: A Dialogical Approach to Characterisation in Subtitled Film and Television Drama

Meister Lova (2025)

[9] Assessing the subtitling of emotive reactions: A social semiotic approach

Muhammad A. A. Taghian, A. M. Ali, Ahmad M. Ali (2023)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Adaptive Local Preference Optimization (ALPO) method

[55] Fine-grained video dubbing duration alignment with segment supervised preference optimization

Can Refute

[51] CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Cannot Refute

[52] Yandex Submission to the WMT25 General Machine Translation Task

Cannot Refute

[53] Error analysis prompting enables human-like translation evaluation in large language models

Cannot Refute

[54] MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

Cannot Refute

[56] STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models

Cannot Refute

[57] DDPO: Diversity-Driven Preference Optimization for Machine Translation Enhancing Robustness and Generalization

Cannot Refute

[58] Inducing Robustness in a 2 Dimensional Direct Preference Optimization Paradigm

Cannot Refute

[59] Plan2Align: Predictive Planning Based Test-Time Preference Alignment for Large Language Models

Cannot Refute

[60] Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation

Cannot Refute

Contribution

Multilingual Subtitle Corpus (MuSC) dataset

[20] VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation

Cannot Refute

[71] Preservation of sentiment in machine translation of low-resource languages: A case study on Slovak movie subtitles

Cannot Refute

[72] Tech-driven advances in audiovisual translation: developing a cloud-based English-Arabic subtitle corpus for training and practice

Cannot Refute

[73] A Multilingual Parallel Corpora Collection Effort for Indian Languages

Cannot Refute

[74] A reception study of machine translated subtitles for MOOCs

Cannot Refute

[75] Research and development of a subtitle management system using artificial intelligence

Cannot Refute

[76] WCC-JC 2.0: A web-crawled and manually aligned parallel corpus for Japanese-Chinese neural machine translation

Cannot Refute

[77] Tag Assisted Neural Machine Translation of Film Subtitles

Cannot Refute

[78] Video-helpful multimodal machine translation

Cannot Refute

[79] ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations

Cannot Refute

Contribution

Multidimensional evaluation framework based on LLM-as-a-Judge

[63] M-MAD: Multidimensional multi-agent debate for advanced machine translation evaluation

Can Refute

[67] TransEvalnia: Reasoning-based Evaluation and Ranking of Translations

Can Refute

[61] Large Language Models Are State-of-the-Art Evaluators of Translation Quality

Cannot Refute

[62] Learning evaluation models from large language models for sequence generation

Cannot Refute

[64] UvA-MT at WMT25 Evaluation Task: LLM Uncertainty as a Proxy for Translation Quality

Cannot Refute

[65] Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Cannot Refute

[66] HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation

Cannot Refute

[68] Fine-Grained and Multi-Dimensional Metrics for Document-Level Machine Translation

Cannot Refute

[69] MQM-APE: toward high-quality error annotation predictors with automatic post-editing in LLM translation evaluators

Cannot Refute

[70] MTQ-Eval: Multilingual Text Quality Evaluation for Language Models

Cannot Refute
