SAFETY-GUIDED FLOW (SGF): A UNIFIED FRAMEWORK FOR NEGATIVE GUIDANCE IN SAFE GENERATION

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Safe generation, flow matching, control barrier functions
Abstract:

Safety mechanisms for diffusion and flow models have recently been developed along two distinct paths. In robot planning, control barrier functions are employed to guide generative trajectories away from obstacles at every denoising step by explicitly imposing geometric constraints. In parallel, recent data-driven, negative guidance approaches have been shown to suppress harmful content and promote diversity in generated samples. However, they rely on heuristics without clearly stating when safety guidance is actually necessary. In this paper, we first introduce a unified probabilistic framework using a Maximum Mean Discrepancy (MMD) potential for image generation tasks that recasts both Shielded Diffusion and Safe Denoiser as instances of our energy-based negative guidance against unsafe data samples. Furthermore, we leverage control-barrier functions analysis to justify the existence of a critical time window in which negative guidance must be strong; outside of this window, the guidance should decay to zero to ensure safe and high-quality generation. We evaluate our unified framework on several realistic safe generation scenarios, confirming that negative guidance should be applied in the early stages of the denoising process for successful safe generation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a unified probabilistic framework using Maximum Mean Discrepancy (MMD) potentials to formalize negative guidance in diffusion and flow models, explicitly connecting prior heuristic methods like Shielded Diffusion and Safe Denoiser under a single energy-based lens. It resides in the 'Unified Frameworks and Energy-Based Formulations' leaf, which contains only one other sibling paper among the 37 total papers surveyed. This positioning suggests the work occupies a relatively sparse research direction focused on theoretical unification rather than application-specific implementations or concept removal techniques.

The taxonomy reveals that most neighboring work clusters around dynamic timing strategies, attention-based interventions, and classifier-free guidance extensions—all within the broader 'Negative Guidance Mechanisms and Theoretical Foundations' branch. The paper's energy-based formulation distinguishes it from attention-manipulation methods and prompt-engineering approaches, which dominate adjacent leaves. Its control-barrier function analysis bridges theoretical foundations with the timing-focused subcategory, suggesting cross-pollination between geometric safety constraints (common in robotics) and probabilistic guidance frameworks for generative models.

Among the 13 candidates examined across the three contributions, none was flagged as clearly refuting the paper's claims. For the MMD-based unification, 3 candidates were examined; for the equivalence propositions, 1 candidate; and for the control-barrier timing analysis, 9 candidates. No overlapping prior work was identified in any of the three cases. The search was limited to top-K semantic matches. Within that scope, the combination of MMD potentials, formal equivalence proofs, and barrier-function timing analysis appears relatively novel, though related robotics or control-theoretic safety literature may lie outside what was searched.

Given the sparse population of the unified-framework leaf and the absence of refuting candidates among 13 examined papers, the work appears to occupy a distinct niche at the intersection of energy-based guidance theory and control-theoretic safety analysis. However, the analysis is constrained by the limited search scope and may not capture all relevant prior work in adjacent fields such as robotics planning or formal verification, where barrier functions are more established.

Taxonomy

Core-task Taxonomy Papers: 37
Claimed Contributions: 3
Contribution Candidate Papers Compared: 13
Refutable Papers: 0

Research Landscape Overview

Core task: negative guidance in safe generation for diffusion and flow models. The field addresses how to steer generative models away from undesirable outputs (harmful, biased, or off-topic content) while preserving generation quality.

The taxonomy organizes this landscape into several main branches. Negative Guidance Mechanisms and Theoretical Foundations explores the underlying mathematical frameworks, including energy-based formulations and unified guidance strategies that provide principled ways to incorporate safety constraints during sampling. Concept Erasure and Content Filtering focuses on methods that remove or suppress specific unwanted concepts, often through training-free interventions or fine-tuning approaches like Erasing concepts from diffusion[1] and Bi-Erasing[26]. Application Domains and Task-Specific Implementations covers specialized uses in image synthesis, video generation, and multimodal tasks, where negative guidance is adapted to domain-specific safety requirements. Alternative Generative Paradigms and Related Methods examines how similar ideas appear in non-diffusion settings, such as language models or other generative architectures.

Within the theoretical branch, a cluster of works investigates how to formulate negative guidance as an energy-based optimization problem, balancing safety objectives with sample fidelity. SAFETY-GUIDED FLOW (SGF)[0] sits squarely in this unified framework subarea, proposing a principled energy formulation for flow models that contrasts with earlier heuristic approaches. Nearby, Dont be so negative[20] examines potential pitfalls of naive negative prompting, highlighting trade-offs between suppression strength and generation coherence. Other works like Adaptive guidance[2] and Training-free safe denoisers[4] explore dynamic or training-free strategies that adjust guidance intensity on the fly, addressing the challenge of maintaining output diversity while enforcing safety constraints.

Across these lines, a recurring theme is the tension between strong negative steering, which can degrade sample quality or introduce artifacts, and weaker interventions that may fail to eliminate harmful content. SAFETY-GUIDED FLOW (SGF)[0] contributes a theoretically grounded middle path for flow-based generation.

Claimed Contributions

Unified probabilistic framework using MMD potential for negative guidance

The authors propose an energy-based formulation of negative guidance using the Maximum Mean Discrepancy (MMD) potential. This framework unifies existing methods (Shielded Diffusion and Safe Denoiser) by showing they are special cases of gradient-based repulsion from unsafe data samples in kernel feature space.

3 retrieved papers
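To make the idea of "gradient-based repulsion from unsafe data samples" concrete, here is a minimal NumPy sketch. It is not the paper's formulation: it assumes a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma`, and it treats the current sample as a point mass whose squared MMD to the empirical unsafe set is to be increased. The function names are illustrative only.

```python
import numpy as np

def rbf_kernel(x, unsafe, sigma):
    """Gaussian kernel values k(x, u_j) for one point x (d,) and unsafe samples (n, d)."""
    sq_dist = np.sum((unsafe - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_sq_point(x, unsafe, sigma):
    """Squared MMD between a point mass at x and the empirical unsafe distribution.

    Assumption: RBF kernel, so k(x, x) = 1.
    """
    k_xu = rbf_kernel(x, unsafe, sigma).mean()
    diffs = unsafe[:, None, :] - unsafe[None, :, :]
    k_uu = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * sigma ** 2)).mean()
    return 1.0 - 2.0 * k_xu + k_uu

def mmd_sq_grad(x, unsafe, sigma):
    """Gradient of mmd_sq_point with respect to x.

    Only the cross term depends on x, which leaves a kernel-weighted
    average of the displacement vectors (x - u_j).
    """
    weights = rbf_kernel(x, unsafe, sigma)
    return 2.0 * (weights[:, None] * (x - unsafe)).mean(axis=0) / sigma ** 2
```

Moving a sample a small step along `mmd_sq_grad` increases its discrepancy from the unsafe set. Nearby unsafe samples dominate the direction because their kernel weights are largest.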
Propositions establishing equivalence between MMD gradient and existing repulsive fields

The authors provide formal propositions demonstrating that the gradient of their MMD potential recovers both Safe Denoiser's weighted kernel repulsion and Shielded Diffusion's radial repulsion under appropriate conditions, establishing mathematical connections between these previously disparate approaches.

1 retrieved paper
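The derivation below illustrates the kind of relation such propositions describe. It is a sketch under assumed notation, not a restatement of the paper's propositions or their conditions: a Gaussian kernel with bandwidth \sigma, unsafe samples u_1, ..., u_n, and a point mass \delta_x at the current sample are all assumptions introduced here.

```latex
% Assumed notation (not from the report): Gaussian kernel with bandwidth \sigma,
% unsafe samples u_1, \dots, u_n, empirical unsafe distribution \hat{Q}.
k(x, u) = \exp\!\left(-\frac{\lVert x - u \rVert^2}{2\sigma^2}\right)

\mathrm{MMD}^2(\delta_x, \hat{Q})
  = k(x, x) - \frac{2}{n}\sum_{j=1}^{n} k(x, u_j)
    + \frac{1}{n^2}\sum_{j=1}^{n}\sum_{l=1}^{n} k(u_j, u_l)

% k(x, x) = 1 and the last term does not depend on x, so
\nabla_x \mathrm{MMD}^2(\delta_x, \hat{Q})
  = \frac{2}{n\sigma^2}\sum_{j=1}^{n} k(x, u_j)\,(x - u_j)

% Single unsafe sample (n = 1): the field points along x - u,
% i.e. it is radial about u.
\nabla_x \mathrm{MMD}^2(\delta_x, \delta_u)
  = \frac{2}{\sigma^2}\, k(x, u)\,(x - u)
```

Under these assumptions, the general case is a kernel-weighted sum of displacements from the unsafe samples, and the single-sample case collapses to a purely radial field.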
Control-barrier function analysis justifying critical time window for guidance

The authors apply control-barrier function theory to formally characterize when negative guidance should be applied during generation. They prove that guidance is most effective early in the denoising process and should decay afterward, providing theoretical justification for the critical time window rather than relying on heuristics.

9 retrieved papers
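The sketch below shows one way a "critical time window" schedule could look in a sampler. It does not implement the paper's control-barrier analysis; it only illustrates the schedule shape that the analysis is said to motivate. Several things are assumptions made here: the time convention (t = 0 is noise, t = 1 is data), the window bounds, the linear decay, the toy straight-line velocity field, and all function names.

```python
import numpy as np

def guidance_weight(t, t_on=0.0, t_off=0.3, lam_max=1.0, tail=0.1):
    """Hypothetical schedule: full strength on [t_on, t_off], linear decay over `tail`, then zero."""
    if t < t_on:
        return 0.0
    if t <= t_off:
        return lam_max
    if t < t_off + tail:
        return lam_max * (1.0 - (t - t_off) / tail)
    return 0.0

def repulsion(x, unsafe, sigma):
    """Kernel-weighted push away from unsafe samples (RBF kernel, assumed)."""
    weights = np.exp(-np.sum((unsafe - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    return (weights[:, None] * (x - unsafe)).mean(axis=0) / sigma ** 2

def sample(x0, target, unsafe, steps=50, sigma=0.5, lam_max=5.0):
    """Euler integration of a toy flow toward `target` with time-windowed repulsion."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Toy velocity field: straight-line flow toward a fixed target (a stand-in for a learned model).
        velocity = (target - x) / (1.0 - t)
        lam = guidance_weight(t, lam_max=lam_max)
        x = x + dt * (velocity + lam * repulsion(x, unsafe, sigma))
    return x
```

In this toy setup the repulsion bends the trajectory only during the early window. Once the weight reaches zero, the unmodified flow takes over, so the late steps are not disturbed by guidance.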
