Steerable Adversarial Scenario Generation through Test-Time Preference Alignment

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Adversarial Scenario Generation, Autonomous Driving, Traffic Modeling, Test-time Alignment
Abstract:

Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces SAGE, a framework for adversarial scenario generation that enables test-time control over the trade-off between adversariality and realism. It resides in the 'Preference-Aligned and Multi-Objective Generation' leaf of the taxonomy, which currently contains only this single paper. This positioning reflects a relatively sparse research direction within the broader field of adversarial scenario generation, where most prior work focuses on fixed-objective optimization or learning-based methods without explicit preference alignment. The framework's emphasis on steerable, multi-objective generation distinguishes it from the more populated neighboring leaves.

The taxonomy reveals several neighboring research directions that provide context for this work. The closest relatives include 'Reinforcement Learning-Based Generation' (three papers) and 'Generative Model-Based Scenario Synthesis' (three papers), which explore learning-based approaches but typically optimize for single, fixed objectives. The 'Optimization-Based Generation' branch contains methods using genetic algorithms and adaptive search, while 'Data-Driven Scenario Generation' focuses on extracting scenarios from real-world data. SAGE's preference alignment approach bridges these areas by combining learning-based generation with explicit multi-objective control, a capability not emphasized in the neighboring leaves' scope notes.

Among the eighteen candidates examined, the contribution-level analysis reveals varying degrees of novelty. The core SAGE framework (nine candidates examined, zero refutations) and the hierarchical group-based preference optimization method (six candidates examined, zero refutations) appear relatively novel within the limited search scope. However, the test-time preference control via weight interpolation (three candidates examined, one refutation) shows overlap with existing work. This suggests that while the overall framework and preference optimization approach may be distinctive, the specific technique of weight interpolation for policy steering has precedent in the examined literature.

Based on the limited search of eighteen candidates, the work appears to occupy a genuinely sparse area of the research landscape, particularly in its emphasis on steerable multi-objective generation. The analysis does not cover the full breadth of multi-objective optimization or preference learning literature beyond autonomous driving, so the novelty assessment is necessarily scoped to the examined candidates. The single-paper leaf status and low refutation rate suggest meaningful differentiation from prior work, though the weight interpolation component shows some overlap.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Paper: 1

Research Landscape Overview

Core task: adversarial scenario generation for autonomous driving systems. The field is organized around several complementary branches that together address how to create, represent, validate, and defend against challenging test cases for autonomous vehicles. Adversarial Scenario Generation Methods encompasses diverse algorithmic approaches—ranging from evolutionary and reinforcement learning techniques (e.g., Reinforcement Learning Scenarios[20], MARL High-Risk[4]) to generative models and multi-objective frameworks—that automatically produce safety-critical situations. Scenario Specification and Representation focuses on how scenarios are formally described and parameterized, including natural language interfaces like Text2Scenario[6] and structured encodings. Scenario Realism and Validation ensures that generated scenarios remain plausible and grounded in real-world data (e.g., Synthetic versus Real[2], AuthSim[13]), while Scenario-Specific Generation targets particular traffic contexts such as pedestrian interactions (Critical Pedestrian Scenarios[1]) or two-wheeler behaviors (WGAN Powered Two-Wheelers[5]). Meanwhile, Adversarial Attacks on Perception Components and Adversarial Robustness and Defense examine vulnerabilities and countermeasures at the sensor and model level, and Simulation and Testing Infrastructure provides the platforms needed to execute these tests at scale.

Within the generation methods branch, a particularly active line of work explores how to balance multiple competing objectives—such as collision severity, scenario diversity, and computational efficiency—often using evolutionary algorithms (EvoScenario[24]) or reinforcement learning (Reinforcement Learning Scenarios[20]). Another emerging theme is controllability: enabling human testers or automated pipelines to steer scenario properties toward specific risk profiles or edge cases.
Steerable Adversarial Scenarios[0] sits squarely in this preference-aligned and multi-objective generation cluster, emphasizing user-guided control over the types of adversarial situations produced. Compared to works like Diffscene[3], which may prioritize generative diversity through diffusion models, or MARL High-Risk[4], which uses multi-agent reinforcement learning to discover high-risk interactions, Steerable Adversarial Scenarios[0] focuses on aligning generated outputs with explicit human preferences or safety criteria. This controllability is crucial for systematic testing, as it allows engineers to explore targeted failure modes rather than relying solely on random or black-box search.

Claimed Contributions

SAGE framework for steerable adversarial scenario generation

The authors introduce SAGE, a framework that treats adversarial scenario generation as a multi-objective preference alignment problem. This enables fine-grained test-time control over the trade-off between adversariality and realism without retraining, shifting from manually designing weighted objectives to learning a controllable preference landscape.

9 retrieved papers
Hierarchical group-based preference optimization method

The authors propose a new offline alignment method called hierarchical group-based preference optimization (HGPO). This method decouples hard feasibility constraints (such as map compliance) from soft preference trade-offs (adversariality versus realism), improving data efficiency by constructing multiple preference pairs from groups of samples.

6 retrieved papers
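To make the described decoupling concrete, below is a minimal sketch of how hierarchical preference-pair construction might look. The paper's exact pairing rule is not reproduced here; the `Sample` class, the two-tier logic, and the single scalar score are illustrative assumptions based only on the summary above (hard feasibility constraints ranked above soft preferences, with many pairs mined from one group of samples).

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Sample:
    """A generated scenario rollout with its evaluation results (hypothetical)."""
    traj_id: int
    feasible: bool   # hard constraint, e.g. map compliance
    score: float     # soft preference, e.g. adversariality-vs-realism score

def build_preference_pairs(group):
    """Construct (winner, loser) pairs from one group of samples.

    Tier 1 (hard): every feasible sample is preferred over every
    infeasible one, regardless of score.
    Tier 2 (soft): among feasible samples, the higher-scoring one wins.
    """
    feasible = [s for s in group if s.feasible]
    infeasible = [s for s in group if not s.feasible]

    # Hard tier: feasibility dominates any soft preference.
    pairs = [(w, l) for w in feasible for l in infeasible]
    # Soft tier: rank feasible samples against each other by score.
    for a, b in combinations(feasible, 2):
        if a.score != b.score:
            pairs.append((a, b) if a.score > b.score else (b, a))
    return pairs
```

Because one group of n samples yields up to n·(n-1)/2 pairs, this kind of group-based mining is more data-efficient than labeling isolated pairs, which matches the data-efficiency claim above.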
Test-time preference control via weight interpolation

The authors develop a test-time control mechanism where two expert models are fine-tuned on opposing preferences, then their weights are linearly interpolated at inference to generate a continuous spectrum of policies. This allows users to navigate the entire Pareto front of trade-offs without retraining, with theoretical justification through linear mode connectivity.

3 retrieved papers (1 can refute)
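The interpolation mechanism itself is simple to sketch. Assuming each expert checkpoint is represented as a plain name-to-weights dictionary (in practice these would be framework state dicts, e.g. PyTorch `state_dict()` objects), a hypothetical `interpolate_weights` helper could look like:

```python
def interpolate_weights(theta_adv, theta_real, alpha):
    """Linearly blend two expert checkpoints, parameter by parameter.

    alpha = 1.0 recovers the adversarial expert, alpha = 0.0 the realism
    expert; intermediate values trace a continuous spectrum of policies.
    Both checkpoints must share the same architecture (same keys/shapes),
    which is the setting where linear mode connectivity arguments apply.
    """
    assert theta_adv.keys() == theta_real.keys(), "experts must share an architecture"
    return {
        name: [alpha * a + (1.0 - alpha) * r
               for a, r in zip(theta_adv[name], theta_real[name])]
        for name in theta_adv
    }
```

A user can then sweep `alpha` over, say, `[0.0, 0.25, 0.5, 0.75, 1.0]` at inference to sample policies along the adversariality-realism trade-off without any retraining, which is the steering capability the contribution claims.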

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K retrieved core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Comparison outcomes for each claimed contribution

SAGE framework for steerable adversarial scenario generation: nine candidate papers were examined; none refuted the contribution.

Hierarchical group-based preference optimization method: six candidate papers were examined; none refuted the contribution.

Test-time preference control via weight interpolation: three candidate papers were examined; one refuted the contribution, indicating that weight interpolation for policy steering has precedent in the examined literature.