VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: adversarial attack, vision-encoder-only, large vision language models, downstream-agnostic
Abstract:

Large Vision-Language Models (LVLMs) have demonstrated strong capabilities in multimodal understanding, yet their vulnerability to adversarial attacks raises significant concerns. To achieve practical attacks, this paper aims at efficient and transferable untargeted attacks under limited perturbation sizes. Under this objective, white-box attacks require full-model gradients and task-specific labels, so their cost scales with the number of tasks, while black-box attacks rely on proxy models and typically require large perturbation sizes and elaborate transfer strategies. Given the centrality and widespread reuse of the vision encoder in LVLMs, we adopt a gray-box setting that targets the vision encoder alone for efficient yet effective attacks. We theoretically establish the feasibility of vision-encoder-only attacks, laying the foundation for our gray-box setting. Based on this analysis, we propose perturbing patch tokens rather than the class token, informed by both theoretical and empirical insights. We generate adversarial examples by minimizing the cosine similarity between clean and perturbed visual features, without accessing the downstream models, tasks, or labels. This significantly reduces computational overhead while eliminating task and label dependence. VEAttack achieves a performance degradation of 94.5% on image captioning and 75.7% on visual question answering. We also report key observations that offer insights into LVLM attack and defense: 1) hidden-layer variations in the LLM, 2) differential token attention, 3) a Möbius-band phenomenon in transfer attacks, and 4) low sensitivity to the number of attack steps.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a gray-box adversarial attack framework targeting the vision encoder of Large Vision-Language Models (LVLMs), aiming for efficient and transferable untargeted attacks under limited perturbation budgets. According to the taxonomy, this work resides in the 'Vision-Language Model Attacks' leaf under 'Adversarial Attack Methodologies and Frameworks'. Notably, this leaf contains only the original paper itself with no sibling papers, suggesting this is a relatively sparse or emerging research direction within the broader adversarial attack landscape. The taxonomy includes fifty papers across approximately thirty-six topics, indicating that vision-language model attacks represent a small but distinct niche.

The taxonomy reveals that the paper's immediate parent branch, 'Adversarial Attack Methodologies and Frameworks', also includes a sibling leaf on 'Robustness Problem Formulation and Analysis', which focuses on theoretical examinations of adversarial robustness definitions rather than attack implementations. Neighboring branches address 'Optimization and Learning Frameworks' and 'Task-Specific Methods and Applications', particularly 'Computer Vision and Multimodal Tasks'. The scope note for the paper's leaf explicitly excludes attacks on unimodal vision or language models alone, positioning this work at the intersection of multimodal architectures. This placement suggests the paper bridges adversarial attack research with the growing field of vision-language integration, diverging from purely vision-centric or language-centric attack strategies.

Among the three identified contributions, the analysis of the gray-box vision-encoder-only attack framework examined ten candidates and found one prior work that could potentially refute it, while the theoretical analysis and the four key observations were compared against four and ten candidates respectively, with no clear refutations. The literature search covered twenty-four candidates in total, yielding one refutable pair overall. This indicates that, among the limited set of semantically similar papers examined, the core attack framework may have some overlap with existing work, whereas the theoretical justification and empirical observations appear more distinctive. The modest search scale means these findings reflect top-K semantic matches rather than exhaustive coverage of all relevant adversarial attack literature.

Given the limited search scope of twenty-four candidates and the sparse taxonomy leaf with no sibling papers, the work appears to occupy a relatively novel position within vision-language model attacks specifically. However, the presence of one refutable candidate for the main framework contribution suggests that certain aspects may build incrementally on existing gray-box or encoder-targeted attack strategies. The analysis does not cover the full breadth of adversarial robustness research, particularly work published in venues outside the semantic search radius or recent preprints not yet indexed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Paper: 1

Research Landscape Overview

Core task: The paper addresses an unspecified core task, but the taxonomy reveals a research landscape organized around six major branches. Adversarial Attack Methodologies and Frameworks explores techniques for crafting attacks against machine learning systems, with a notable sub-area focusing on Vision-Language Model Attacks where methods target multimodal architectures. Optimization and Learning Frameworks encompasses algorithmic approaches for training and tuning models, including multi-objective optimization strategies as seen in works like Multimodal multi-objective optimization[6] and Tackling the Objective Inconsistency[5]. Task-Specific Methods and Applications groups domain-tailored solutions, while Research Methodology and Problem Formulation addresses foundational questions about how studies are designed and framed, exemplified by works such as Pragmatism as a research[4] and Objectives of the Study[1]. Domain Applications and Empirical Studies captures applied investigations across diverse fields, and Unspecified Study Objectives and Metadata collects works with less clearly defined scopes.

Within this landscape, adversarial robustness research has become particularly active, especially at the intersection of vision and language modalities. VEAttack[0] situates itself squarely in the Vision-Language Model Attacks cluster, contributing to an emerging line of work that probes vulnerabilities in systems processing both visual and textual inputs. This contrasts with branches focused on optimization theory or general research methodology, which tend to emphasize algorithmic efficiency or study design principles rather than security concerns. Compared to foundational methodological works like Objectives of the Study[1] or Pragmatism as a research[4], VEAttack[0] adopts a more applied, attack-centric perspective, seeking to expose weaknesses in specific model architectures.
The positioning highlights ongoing tensions in the field between developing robust multimodal systems and understanding their failure modes, a theme that cuts across several branches and remains an open question as vision-language models grow in capability and deployment.

Claimed Contributions

Gray-box vision-encoder-only attack framework (VEAttack)

The authors introduce VEAttack, a gray-box attack method that targets only the vision encoder of LVLMs by perturbing patch tokens and minimizing cosine similarity between clean and perturbed visual features. This approach eliminates dependence on downstream tasks, labels, and LLM gradients while achieving efficient and transferable attacks.

10 retrieved papers
Can Refute
Theoretical analysis of vision-encoder-only attack feasibility

The authors provide theoretical analysis establishing a lower bound on perturbations in multimodal aligned features when attacking only the vision encoder. This theoretical foundation demonstrates that perturbations on patch tokens propagate more effectively to downstream LLMs than perturbations on class tokens.

4 retrieved papers
Four key observations about LVLM vulnerabilities

The authors identify and empirically demonstrate four novel observations about LVLM vulnerabilities: vision encoder attacks induce hidden-layer variations in the LLM; attention to image versus instruction tokens differs across tasks; encoder robustness and attack transferability exhibit a paradoxical (Möbius-band-like) relationship; and attack success shows reduced sensitivity to the number of attack iterations.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Gray-box vision-encoder-only attack framework (VEAttack)

The authors introduce VEAttack, a gray-box attack method that targets only the vision encoder of LVLMs by perturbing patch tokens and minimizing cosine similarity between clean and perturbed visual features. This approach eliminates dependence on downstream tasks, labels, and LLM gradients while achieving efficient and transferable attacks.
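To make the objective concrete, the loop below sketches a PGD-style untargeted attack that minimizes cosine similarity between clean and perturbed encoder features under an L-infinity budget. Everything here is a toy stand-in rather than the authors' implementation: `encode` replaces the LVLM's frozen vision encoder, the budget `eps`, step size `alpha`, and step count are placeholder values, and gradients are estimated by finite differences purely so the sketch is self-contained.

```python
import numpy as np

def encode(x, W):
    """Toy stand-in for the frozen vision encoder's patch-token features
    (a linear map plus a nonlinearity); the real attack would use the
    LVLM's actual encoder, e.g. a CLIP ViT."""
    return np.tanh(x @ W)

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def veattack_sketch(x, W, eps=0.03, alpha=0.01, steps=10):
    """PGD-style loop: minimize cosine similarity between clean and
    perturbed features under an L-infinity budget `eps`. Gradients are
    estimated by finite differences only to keep the sketch
    self-contained; in practice one backpropagates through the encoder."""
    f_clean = encode(x, W)
    x_adv = x.copy()
    h = 1e-4
    for _ in range(steps):
        base = cosine_sim(encode(x_adv, W), f_clean)
        grad = np.zeros_like(x_adv)
        for i in range(x_adv.size):
            probe = x_adv.copy()
            probe.flat[i] += h
            grad.flat[i] = (cosine_sim(encode(probe, W), f_clean) - base) / h
        x_adv = x_adv - alpha * np.sign(grad)      # descend on similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the L-inf ball
    return x_adv
```

On random inputs this drives the feature similarity below 1 while keeping every coordinate within `eps` of the clean input; a real implementation would additionally clip to the valid pixel range and restrict the loss to patch-token features.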

Contribution

Theoretical analysis of vision-encoder-only attack feasibility

The authors provide theoretical analysis establishing a lower bound on perturbations in multimodal aligned features when attacking only the vision encoder. This theoretical foundation demonstrates that perturbations on patch tokens propagate more effectively to downstream LLMs than perturbations on class tokens.
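To convey the flavor of such a propagation bound (an illustrative sketch under a simplifying assumption, not the paper's theorem): if the projector mapping visual tokens into the LLM embedding space were a full-column-rank linear map $W_p$, then for clean and perturbed features $v$ and $v'$,

```latex
\|W_p v' - W_p v\|_2 \;\ge\; \sigma_{\min}(W_p)\,\|v' - v\|_2 ,
```

where $\sigma_{\min}(W_p)$ is the smallest singular value of $W_p$. In words, a feature-space perturbation of size $\|v'-v\|_2$ at the encoder output reaches the aligned (LLM-input) features with magnitude at least $\sigma_{\min}(W_p)\,\|v'-v\|_2$, so it cannot be silently absorbed by the projection.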

Contribution

Four key observations about LVLM vulnerabilities

The authors identify and empirically demonstrate four novel observations about LVLM vulnerabilities: vision encoder attacks induce hidden-layer variations in the LLM; attention to image versus instruction tokens differs across tasks; encoder robustness and attack transferability exhibit a paradoxical (Möbius-band-like) relationship; and attack success shows reduced sensitivity to the number of attack iterations.