ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
Overview
Overall Novelty Assessment
The paper introduces ExPO-HM, a framework combining supervised fine-tuning warmup, GRPO with curriculum learning, and Conditional Decision Entropy (CDE) for explainable hateful meme detection. It resides in the Chain-of-Thought and Multi-Step Reasoning leaf, which contains five papers total including the original work. This leaf sits within the broader Reasoning-Enhanced Detection Frameworks branch, indicating a moderately populated research direction focused on sequential inference and interpretable classification pathways rather than direct end-to-end detection.
The taxonomy reveals neighboring leaves addressing related reasoning paradigms: Multi-Agent Reasoning and Debate explores argumentation-based classification, Rationale Distillation transfers reasoning knowledge to smaller models, and Evolutionary and Contextual Reasoning models cultural progression. These sibling branches share the goal of interpretable detection but diverge in mechanism—ExPO-HM emphasizes policy optimization over chain-of-thought, whereas debate methods use multi-agent conflict resolution. The broader Explainability and Interpretability Methods branch focuses on post-hoc justifications rather than reasoning-guided classification, highlighting ExPO-HM's positioning at the intersection of reasoning and explainability.
Thirty candidates were examined in total, ten per claimed contribution. For the ExPO-HM framework contribution, one of the ten candidates was judged refutable, suggesting some prior work in explain-then-detect architectures. For the Conditional Decision Entropy metric, none of the ten candidates was refutable, indicating potential novelty in using entropy-based rewards for reasoning quality. The evaluation framework contribution likewise saw no refutations among its ten candidates, though comprehensive benchmarking is common in this field. Because the search covers only top-K semantic matches rather than exhaustive coverage, these statistics should be read as indicative, and the single refutation against the core framework warrants closer inspection of the overlapping prior method.
Based on the thirty candidates examined, the work appears to occupy a moderately explored space within reasoning-enhanced detection, with the CDE metric showing stronger novelty signals than the overall framework architecture. The taxonomy structure confirms this is an active research direction with multiple competing approaches, though not as densely populated as end-to-end classification methods. The analysis captures semantic proximity but cannot rule out relevant work outside the top-K retrieval scope or in adjacent communities.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce ExPO-HM, a framework that combines SFT warmup on policy manuals, GRPO with curriculum learning, and Conditional Decision Entropy rewards to enable hateful meme detection systems that generate explanations before making predictions, mimicking how human moderators are trained.
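The GRPO component of this pipeline can be illustrated with the standard group-relative advantage computation from the published GRPO formulation. This is a generic sketch, not the authors' implementation: the SFT warmup, curriculum scheduling, and the actual reward composition are omitted, and the function name is illustrative.

```python
import math

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO: each sampled rollout's
    reward is normalized against the mean and std of its own group,
    removing the need for a learned value baseline."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]
```

In ExPO-HM's setting, each group would hold several explain-then-detect rollouts for the same meme, with per-rollout rewards combining decision correctness and the CDE term described below in the paper's claimed contributions.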
The authors propose CDE, which measures the entropy of a model's decision conditioned on its generated explanation. CDE serves dual purposes: evaluating reasoning quality and providing a reward signal during training to encourage confident correct predictions while penalizing confident errors.
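As described, CDE reduces to the entropy of the model's decision distribution given its generated explanation. A minimal sketch for the binary hateful/benign case, assuming the decision probability is read off the model's output after the explanation is produced (the function name and the binary simplification are illustrative assumptions, not the paper's exact formulation):

```python
import math

def conditional_decision_entropy(p_hateful):
    """Binary entropy (in bits) of the hateful/benign decision,
    conditioned on the generated explanation. Low values mean the
    explanation leaves the model confident in its decision.
    Illustrative sketch, not the paper's exact formulation."""
    # Clamp to avoid log(0) at the extremes.
    p = min(max(p_hateful, 1e-12), 1 - 1e-12)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

Under this reading, an explanation that pins the decision (p near 0 or 1) yields entropy near 0, while an uninformative explanation (p near 0.5) yields entropy near 1 bit.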
The authors establish an evaluation framework that assesses models not only on binary hateful-versus-benign classification but also on fine-grained categories such as attack type and target group, along with reasoning quality judged by LLM evaluators, better reflecting real-world moderation needs.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[5] Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning
[7] SAFE-MEME: Structured reasoning framework for robust hate speech detection in memes
[24] MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection
[27] MemHateCaptioning: Enhancing Hate Speech Detection in Memes with Context-Aware Captioning and Chain-of-Thought
Contribution Analysis
Detailed comparisons for each claimed contribution
ExPO-HM framework for Explain-then-Detect hateful meme detection
The authors introduce ExPO-HM, a framework that combines SFT warmup on policy manuals, GRPO with curriculum learning, and Conditional Decision Entropy rewards to enable hateful meme detection systems that generate explanations before making predictions, mimicking how human moderators are trained.
[77] Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling
[69] ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
[70] MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
[71] Detoxifying language model outputs: combining multi-agent debates and reinforcement learning for improved summarization
[72] Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
[73] IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
[74] T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation
[75] Entity-Aware Optimal Transport and Residual Attention for Multimodal Content Moderation
[76] Towards Explainable Bilingual Multimodal Misinformation Detection and Localization
[78] PPO-XLM-R Enhanced Crowd Intelligence for Early Misinformation Prediction in Urdu Social Media
Conditional Decision Entropy (CDE) metric and reward
The authors propose CDE, which measures the entropy of a model's decision conditioned on its generated explanation. CDE serves dual purposes: evaluating reasoning quality and providing a reward signal during training to encourage confident correct predictions while penalizing confident errors.
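The reward-side use of CDE can be sketched as follows: low conditional entropy (high confidence) amplifies the reward magnitude, and the sign is set by correctness, so confident correct predictions earn the most while confident errors are penalized hardest. The function name, the binary simplification, and this exact shaping are assumptions for illustration; the paper's reward may combine the terms differently.

```python
import math

def cde_reward(p_hateful, label_hateful):
    """Illustrative CDE-style reward: confidence is (1 - decision
    entropy), rewarded when the decision is correct and penalized
    when it is wrong. Sketch only, not the paper's exact reward."""
    p = min(max(p_hateful, 1e-12), 1 - 1e-12)
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    confidence = 1.0 - entropy                 # low entropy -> high confidence
    correct = (p >= 0.5) == bool(label_hateful)
    return confidence if correct else -confidence
```

This shaping makes hedged predictions (p near 0.5) nearly reward-neutral regardless of correctness, pushing the policy toward explanations that genuinely commit to the decision.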
[51] Entropy-based logic explanations of neural networks
[52] An Explainable Machine Learning Network for Classification of Autism Spectrum Disorder Using Optimal Frequency Band Identification From Brain EEG
[53] On entropy-based term weighting schemes for text categorization
[54] Improved GraphSVX for GNN Explanations Based on Cross Entropy
[55] Entropy-based fuzzy support vector machine for imbalanced datasets
[56] Explainable ResNet50 learning model based on copula entropy for cotton plant disease prediction
[57] Explaining a machine-learning lane change model with maximum entropy Shapley values
[58] Edge entropy as an indicator of the effectiveness of gnns over cnns for node classification
[59] Metric Learning in Freewill EEG Pre-Movement and Movement Intention Classification for Brain Machine Interfaces
[60] Entropy Reweighted Conformal Classification
Comprehensive evaluation framework for hateful meme detection
The authors establish an evaluation framework that assesses models not only on binary hateful-versus-benign classification but also on fine-grained categories such as attack type and target group, along with reasoning quality judged by LLM evaluators, better reflecting real-world moderation needs.