PreferThinker: Reasoning-based Personalized Image Preference Assessment

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Image Preference Assessment; Multimodal Large Language Model; Chain-of-Thought
Abstract:

Personalized image preference assessment aims to evaluate an individual user's image preferences using only a small set of reference images as prior information. Existing methods mainly target general preference assessment, training models on large-scale data for well-defined tasks such as text-image alignment. However, these approaches struggle with personalized preferences because user-specific data are scarce and not easily scalable, and individual tastes are often diverse and complex. To overcome these challenges, we introduce a common preference profile that serves as a bridge across users, allowing large-scale user data to be leveraged for training profile prediction and capturing complex personalized preferences. Building on this idea, we propose a reasoning-based personalized image preference assessment framework that follows a predict-then-assess paradigm: it first predicts a user's preference profile from reference images, and then provides interpretable, multi-dimensional scores and assessments of candidate images based on the predicted profile. To support this, we first construct a large-scale Chain-of-Thought (CoT)-style personalized assessment dataset annotated with diverse user preference profiles and high-quality CoT-style reasoning, enabling explicit supervision of structured reasoning. Next, we adopt a two-stage training strategy: a cold-start supervised fine-tuning phase that equips the model with structured reasoning capabilities, followed by reinforcement learning that incentivizes the model to explore more reasonable assessment paths and enhances generalization. Furthermore, we propose a similarity-aware prediction reward that encourages better prediction of the user's preference profile, which in turn facilitates the exploration of more reasonable assessments. Extensive experiments demonstrate the superiority of the proposed method. Our code and dataset will be publicly released.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a reasoning-based framework for personalized image preference assessment that predicts user-specific preference profiles from reference images and then evaluates candidate images accordingly. It resides in the 'Profile-Based Personalized Aesthetics Assessment' leaf, which contains five papers including the original work. This leaf sits within the broader 'Personalized Aesthetics and Preference Modeling' branch, indicating a moderately populated research direction focused on explicit profile modeling. The taxonomy shows that personalized aesthetics is one of several major branches alongside generic quality assessment and domain-specific methods, suggesting the paper addresses a well-defined but not overcrowded niche.

The taxonomy reveals neighboring leaves addressing implicit preference learning from user interactions, adaptive scalability across many users, privacy-preserving federated approaches, and specialized applications like color vision deficiency. The paper's profile-based approach contrasts with implicit methods that learn from ratings without explicit profiles and differs from generic quality assessment branches that apply universal perceptual criteria. The taxonomy's scope and exclusion notes clarify that reasoning-based assessment distinguishes this work from simpler profile-based methods, while its focus on personalization separates it from generic aesthetics models that lack user-specific customization.

Among the twenty-seven candidates examined across the three contributions, none were found to clearly refute the proposed ideas. For the common preference profile concept, ten candidates were examined with zero refutations; for the reasoning-based predict-then-assess framework, seven candidates with zero refutations; and for the CoT-style dataset and training strategy, ten candidates with zero refutations. This limited search scope (top-K semantic matches plus citation expansion, rather than an exhaustive review) suggests that, within the examined literature, the contributions appear distinct. The profile-based leaf contains four sibling papers, indicating some prior work in explicit profile modeling, though none of the examined candidates directly overlaps with the reasoning-based approach.

Based on the limited search of twenty-seven candidates, the work appears to introduce novel elements in reasoning-based personalized assessment, though the analysis does not cover the full breadth of personalized aesthetics research. The taxonomy structure shows this is an active area with multiple related directions, and the absence of refutations among examined candidates suggests the specific combination of profile prediction and reasoning-based evaluation may be distinctive within the scope analyzed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: Personalized image preference assessment.

The field encompasses a broad spectrum of approaches to evaluating image quality and aesthetics, ranging from generic methods that apply universally to domain-specific techniques tailored for medical imaging, underwater scenes, or other specialized contexts. At the top level, the taxonomy distinguishes between personalized aesthetics and preference modeling, where systems adapt to individual tastes, and generic or domain-specific quality assessment that targets objective or context-dependent criteria. Additional branches address subjective evaluation studies that probe human perception, transfer and adaptation methods that align models across domains or modalities, personalized content generation and recommendation systems, benchmark datasets, and interactions between image processing and quality metrics. Review and survey papers provide overarching perspectives, such as Aesthetics Review[5], which synthesizes trends across these diverse lines of work.

Within personalized aesthetics and preference modeling, a particularly active area focuses on profile-based methods that learn user-specific or group-specific preferences from historical ratings or rich attribute annotations. Personalized Aesthetics[3] and Rich Attributes[7] exemplify efforts to capture individual differences through explicit user profiles or detailed image attributes, while Interaction Matrix[12] and Content Attribute[26] explore how content features interact with user characteristics. PreferThinker[0] sits squarely in this profile-based cluster, emphasizing reasoning mechanisms that integrate user-specific signals to predict personalized preferences. Compared to neighboring works like Personalized Aesthetics[3], which often rely on collaborative filtering or attribute-based embeddings, PreferThinker[0] introduces a more deliberative approach to modeling individual taste. This contrasts with broader generic quality assessment methods such as Deep Learning Blind[17] or KonIQ[19], which prioritize universal perceptual criteria over personalization, highlighting an ongoing tension between scalability and the granularity of user-specific adaptation.

Claimed Contributions

Common preference profile bridging users for personalized assessment

The authors propose a preference profile composed of common visual elements (such as color and art style) that characterizes individual preferences while being shared across users. This design enables leveraging large-scale data for training and addresses the challenges of limited personalized data and complex individual tastes.

10 retrieved papers
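To make the bridging idea concrete, the sketch below models a preference profile as weights over a vocabulary of visual dimensions shared by all users, so that any user's taste lives in one common space. The dimension names, the weight representation, and the cosine-similarity measure are illustrative assumptions, not the paper's actual profile schema.

```python
from dataclasses import dataclass, field
import math

# Hypothetical shared vocabulary of visual dimensions; the paper's
# actual profile schema is not specified in this report.
DIMENSIONS = ("warm_colors", "high_saturation", "minimalist",
              "anime_style", "photorealistic")

@dataclass
class PreferenceProfile:
    """A user's preferences expressed over a vocabulary shared by all users."""
    weights: dict = field(default_factory=dict)  # dimension -> strength in [0, 1]

    def vector(self):
        # Project onto the shared dimension order; missing dimensions are 0.
        return [self.weights.get(d, 0.0) for d in DIMENSIONS]

def profile_similarity(a: PreferenceProfile, b: PreferenceProfile) -> float:
    """Cosine similarity between two profiles in the shared space."""
    va, vb = a.vector(), b.vector()
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

# Because every user is described in the same space, data from many
# users can supervise a single profile-prediction model.
u1 = PreferenceProfile({"warm_colors": 0.9, "minimalist": 0.7})
u2 = PreferenceProfile({"warm_colors": 0.8, "anime_style": 0.6})
sim = profile_similarity(u1, u2)
```

The design choice this illustrates is that the profile vocabulary, not the individual user, is what scales: new users only require predicting a point in an already-learned space.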
Reasoning-based predict-then-assess framework (PreferThinker)

The authors develop a two-stage framework that first predicts a user's preference profile from reference images, then uses this profile to provide interpretable and multi-dimensional assessments of candidate images through structured reasoning.

7 retrieved papers
CoT-style personalized assessment dataset and two-stage training strategy

The authors create a large-scale dataset with Chain-of-Thought annotations for personalized preference assessment and employ a two-stage training approach: supervised fine-tuning for structured reasoning followed by reinforcement learning with a similarity-aware prediction reward to improve generalization.

10 retrieved papers
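The similarity-aware prediction reward is not specified in detail in this report. The sketch below shows one plausible shape for such a reward in the reinforcement-learning stage: a term rewarding overlap between the predicted and annotated profile attributes, combined with a term rewarding accuracy of the final preference score. The Jaccard overlap, the linear score term, and the `alpha` weighting are all assumptions for illustration, not the paper's formula.

```python
def jaccard(pred: set, gold: set) -> float:
    """Set overlap between predicted and annotated profile attributes."""
    if not pred and not gold:
        return 1.0
    union = pred | gold
    return len(pred & gold) / len(union)

def similarity_aware_reward(pred_attrs: set, gold_attrs: set,
                            pred_score: float, gold_score: float,
                            alpha: float = 0.5) -> float:
    """Combine profile-prediction quality with assessment accuracy.

    alpha balances the two terms; both the decomposition and the
    weighting are illustrative assumptions.
    """
    # Reward predicting the right profile, which is what steers the
    # model toward more reasonable assessment paths.
    profile_term = jaccard(pred_attrs, gold_attrs)
    # Reward an accurate final score, decaying linearly with absolute
    # error (scores assumed normalized to [0, 1]).
    score_term = max(0.0, 1.0 - abs(pred_score - gold_score))
    return alpha * profile_term + (1.0 - alpha) * score_term

r = similarity_aware_reward({"warm_colors", "minimalist"},
                            {"warm_colors", "anime_style"},
                            pred_score=0.8, gold_score=0.7)
```

Under this shaping, a rollout that identifies the right profile attributes earns partial credit even when its final score is off, which is one way a prediction reward could encourage exploration of better assessment paths.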

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
