Preference Leakage: A Contamination Problem in LLM-as-a-judge
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors formally define preference leakage as a new contamination issue that arises when the LLM used for synthetic data generation and the LLM used as an evaluator are related, causing systematic bias in evaluation scores. They identify three types of relatedness: being the same model, having an inheritance relationship, and belonging to the same model family.
The authors perform comprehensive experiments using multiple LLM baselines and benchmarks (Arena-Hard and AlpacaEval 2.0) to empirically confirm that judge LLMs exhibit systematic bias toward their related student models. They introduce the preference leakage score metric to quantify this bias across different scenarios.
The authors investigate the mechanisms behind preference leakage through recognition experiments and category analyses. They show that preference leakage is hard to detect, that it is most pronounced on subjective questions and judgment dimensions, and that judge LLMs cannot reliably recognize generations from their related student models.
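The summary above names a preference leakage score but does not give its formula. The following is a minimal sketch under an assumed definition: for two generator/judge models i and j, take each student's win rate under its related judge, measure how far it sits above that student's average win rate across both judges, and average the two relative gaps. The function names, the `wr[(student, judge)]` layout, and the example numbers are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Tuple

# Hypothetical layout: wr[(student, judge)] is the win rate of the student
# trained on data from model `student`, as scored by judge model `judge`.
WinRates = Mapping[Tuple[str, str], float]


def avg_win_rate(wr: WinRates, student: str, other_judge: str) -> float:
    """Mean of a student's win rate under its related judge and the other judge."""
    return (wr[(student, student)] + wr[(student, other_judge)]) / 2.0


def preference_leakage_score(wr: WinRates, i: str, j: str) -> float:
    """Assumed definition: mean relative advantage each student gets from its related judge."""
    avg_i = avg_win_rate(wr, i, j)
    avg_j = avg_win_rate(wr, j, i)
    if avg_i == 0 or avg_j == 0:
        raise ValueError("average win rate must be non-zero")
    rel_i = (wr[(i, i)] - avg_i) / avg_i  # advantage of student i under judge i
    rel_j = (wr[(j, j)] - avg_j) / avg_j  # advantage of student j under judge j
    return (rel_i + rel_j) / 2.0


# Made-up win rates (percent) for two hypothetical models "A" and "B".
example_wr = {
    ("A", "A"): 60.0, ("A", "B"): 40.0,
    ("B", "B"): 55.0, ("B", "A"): 45.0,
}
example_score = preference_leakage_score(example_wr, "A", "B")
```

With these made-up numbers the score is roughly 0.15, meaning each student scores on average about 15% above its cross-judge mean when evaluated by its related judge. A score near zero would indicate no measurable advantage from relatedness under this assumed definition.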
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Preference leakage problem definition
Empirical validation of preference leakage bias
Analysis of preference leakage mechanisms and characteristics