AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose AdPO, a novel adversarial defense method that reframes adversarial training as a preference optimization problem: the model is trained to prefer correct outputs on clean inputs and to reject the misleading outputs elicited by adversarial examples. The authors present this as the first application of preference optimization techniques to adversarial training.
The authors introduce two complementary optimization components. Preferred Image Optimization (PIO) increases the probability of correct outputs on clean inputs while decreasing the probability of erroneous outputs on adversarial images, and Adversarial Image Optimization (AIO) explicitly optimizes for correct responses under adversarial inputs. Together they form a general adversarial training framework that is not tied to any specific attack algorithm or model.
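A minimal sketch of how these two components could be cast as DPO-style preference losses. The function name, the β value, and the log-likelihood numbers below are illustrative assumptions, not the paper's exact formulation; the sketch only shows the preference structure (clean-correct over adversarial-wrong for PIO, adversarial-correct over adversarial-wrong for AIO):

```python
import math

def dpo_preference_loss(logp_chosen, logp_rejected,
                        ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss: raise the chosen response's likelihood relative to
    the rejected one, measured against a frozen reference model.
    Equals -log sigmoid(beta * margin) = log(1 + exp(-beta * margin))."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return math.log(1.0 + math.exp(-beta * margin))

# Illustrative sequence-level log-likelihoods (made-up numbers).
policy = {("clean", "correct"): -4.0,   # trainable model
          ("adv", "wrong"): -5.0,
          ("adv", "correct"): -9.0}
ref = {("clean", "correct"): -4.5,      # frozen reference model
       ("adv", "wrong"): -4.5,
       ("adv", "correct"): -8.0}

# PIO: prefer the correct answer on the clean image over the
# erroneous answer elicited by the adversarial image.
pio = dpo_preference_loss(policy[("clean", "correct")], policy[("adv", "wrong")],
                          ref[("clean", "correct")], ref[("adv", "wrong")])

# AIO: additionally push the model toward the correct answer
# when it is given the adversarial image.
aio = dpo_preference_loss(policy[("adv", "correct")], policy[("adv", "wrong")],
                          ref[("adv", "correct")], ref[("adv", "wrong")])

total = pio + aio
```

In this framing, both components share one loss function and differ only in which (input, response) pair is chosen versus rejected, which is what makes the scheme algorithm- and model-agnostic.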
The authors demonstrate that adversarial training can be performed on smaller LVLMs (e.g., TinyLLaVA) and that the resulting robust image encoder can then be transferred to larger models. This strategy matches the computational cost of previous methods while reducing the risk of overfitting and enabling fair comparison with prior CLIP-based approaches.
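The transfer step can be illustrated with a toy sketch. The parameter names, the `vision_encoder.` prefix, and the state dicts below are hypothetical stand-ins; what makes the copy possible is that both models wrap the same CLIP-style vision encoder architecture:

```python
def transfer_vision_encoder(small_state, large_state, prefix="vision_encoder."):
    """Copy every vision-encoder parameter from the small (adversarially
    trained) model into a copy of the large model's state dict."""
    updated = dict(large_state)
    for name, weights in small_state.items():
        if name.startswith(prefix):
            assert name in updated, f"architecture mismatch at {name}"
            updated[name] = weights
    return updated

# Toy state dicts: parameter name -> weights (lists stand in for tensors).
tiny_llava = {
    "vision_encoder.layer0.w": [0.9, 0.8],   # robust after adversarial training
    "language_model.layer0.w": [0.1],        # small LM, not transferred
}
llava_13b = {
    "vision_encoder.layer0.w": [0.5, 0.5],   # original non-robust weights
    "language_model.layer0.w": [0.2, 0.3],   # large LM, kept as-is
}

robust_llava = transfer_vision_encoder(tiny_llava, llava_13b)
```

Only the shared-architecture encoder weights move across; the larger model's language backbone is untouched, which is why the expensive adversarial training only needs to run once on the small model.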
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
AdPO: Adversarial defense strategy based on preference optimization
The authors propose AdPO, a novel adversarial defense method that reframes adversarial training as a preference optimization problem: the model is trained to prefer correct outputs on clean inputs and to reject the misleading outputs elicited by adversarial examples. The authors present this as the first application of preference optimization techniques to adversarial training.
[71] A preference-driven paradigm for enhanced translation with large language models
[72] Calibrated self-rewarding vision language models
[73] Positive enhanced preference alignment for text-to-image models
[74] Modality-balancing preference optimization of large multimodal models by adversarial negative mining
[75] Structured preference modeling for reinforcement learning-based fine-tuning of large models
[76] Pref-GRPO: Pairwise preference reward-based GRPO for stable text-to-image reinforcement learning
[77] Structured preference optimization for vision-language long-horizon task planning
[78] Aligning modalities in vision large language models via preference fine-tuning
[79] SAMPO: Visual preference optimization for intent-aware segmentation with vision foundation models
[80] TCPO: Thought-centric preference optimization for effective embodied decision-making
Dual optimization strategy combining PIO and AIO
The authors introduce two complementary optimization components. Preferred Image Optimization (PIO) increases the probability of correct outputs on clean inputs while decreasing the probability of erroneous outputs on adversarial images, and Adversarial Image Optimization (AIO) explicitly optimizes for correct responses under adversarial inputs. Together they form a general adversarial training framework that is not tied to any specific attack algorithm or model.
[61] On the duality between sharpness-aware minimization and adversarial training
[62] Solving the robustness puzzle: The joint impact of optimization approach, robustness metrics, and scenarios on water resources management under deep …
[63] Balancing generalization and robustness in adversarial training via steering through clean and adversarial gradient directions
[64] Fortify the guardian, not the treasure: Resilient adversarial detectors
[65] Trade-off between robustness and accuracy of vision transformers
[66] Robustness and accuracy could be reconcilable by (proper) definition
[67] Improving the accuracy-robustness trade-off of classifiers via adaptive smoothing
[68] Optimizing robustness and accuracy in mixture of experts: A dual-model approach
[69] R&D-Agent-Quant: A multi-agent framework for data-centric factors and model joint optimization
[70] Enhancing infrared small target detection robustness with bi-level adversarial framework
Transfer learning approach from smaller to larger LVLMs
The authors demonstrate that adversarial training can be performed on smaller LVLMs (e.g., TinyLLaVA) and that the resulting robust image encoder can then be transferred to larger models. This strategy matches the computational cost of previous methods while reducing the risk of overfitting and enabling fair comparison with prior CLIP-based approaches.