Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators

ICLR 2026 Conference Submission
Anonymous Authors
Normalizing Flows, Generative Models, Optimal Transport, Flow Matching, AI for Science
Abstract:

Simulation-free training frameworks have been at the forefront of the generative modelling revolution in continuous spaces, leading to large-scale diffusion and flow matching models. However, such modern generative models suffer from expensive inference, inhibiting their use in numerous scientific applications like Boltzmann Generators (BGs) for molecular conformations that require fast likelihood evaluation. In this paper, we revisit classical normalizing flows in the context of BGs that offer efficient sampling and likelihoods, but whose training via maximum likelihood is often unstable and computationally challenging. We propose Regression Training of Normalizing Flows (RegFlow), a novel and scalable regression-based training objective that bypasses the numerical instability and computational challenge of conventional maximum likelihood training in favour of a simple $\ell_2$-regression objective. Specifically, RegFlow maps prior samples under our flow to targets computed using optimal transport couplings or a pre-trained continuous normalizing flow (CNF). To enhance numerical stability, RegFlow employs effective regularization strategies such as a new forward-backward self-consistency loss that enjoys painless implementation. Empirically, we demonstrate that RegFlow unlocks a broader class of architectures that were previously intractable to train for BGs with maximum likelihood. We also show RegFlow exceeds the performance, computational cost, and stability of maximum likelihood training in equilibrium sampling in Cartesian coordinates of alanine dipeptide, tripeptide, and tetrapeptide, showcasing its potential in molecular systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes RegFlow, a regression-based training objective for normalizing flows applied to Boltzmann Generators for molecular conformations. It sits in the 'Direct Regression Training for Normalizing Flows' leaf, which contains only two papers total. This is a sparse research direction within the broader taxonomy of seven papers across five leaf nodes, suggesting the specific combination of regression training and normalizing flows for molecular sampling remains relatively unexplored compared to adjacent areas like flow matching or diffusion-based methods.

The taxonomy reveals neighboring work in flow matching for molecular generation, which includes equivariant flow matching for conformer generation and broader bioinformatics applications. These adjacent branches emphasize continuous normalizing flows trained via vector field regression, whereas the paper's leaf focuses specifically on discrete normalizing flows with regression objectives that avoid maximum likelihood. The taxonomy explicitly excludes flow matching from this category, positioning the work as an alternative training paradigm that retains invertibility guarantees while bypassing likelihood computation challenges inherent in classical normalizing flow training.

Among the twenty-five candidates examined, the analysis identifies limited overlap with prior work. For the core RegFlow objective, ten candidates were examined and one appears to provide overlapping prior work; the same holds for the forward-backward self-consistency regularization (ten candidates, one potentially overlapping). For the energy-free targeted free energy perturbation method, five candidates were examined and none clearly refutes it. These statistics reflect a focused semantic search rather than exhaustive coverage. Within the examined scope, the regression training framework and the regularization strategy show moderate novelty, while the free energy perturbation component appears less contested by prior literature.

Based on the limited search scope of top-twenty-five semantic matches, the work appears to occupy a relatively sparse position in the taxonomy, with only one sibling paper in its immediate category. The contribution-level analysis suggests the core training objective has some precedent among examined candidates, while the free energy method shows less overlap. However, these findings are constrained by the search methodology and do not constitute an exhaustive assessment of all relevant prior work in regression-based flow training or molecular sampling.

Taxonomy

Core-task Taxonomy Papers: 6
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 2

Research Landscape Overview

Core task: regression-based training of normalizing flows for molecular sampling. The field divides into two main branches that reflect complementary perspectives on generative modeling for molecules. The first branch, Regression-Based Flow Training Methods, focuses on how to train normalizing flows without relying on maximum likelihood or adversarial objectives, instead using direct regression or forward-only schemes to match target distributions. The second branch, Generative Models for Molecular Design, encompasses a broader set of architectures and problem settings, ranging from equivariant flow matching to controllable generation, that address the unique geometric and chemical constraints of molecular data. Together, these branches illustrate a shift from classical likelihood-based training toward more flexible regression frameworks, while also highlighting the domain-specific challenges of sampling from high-dimensional molecular conformations and unnormalized Boltzmann distributions.

Several active lines of work reveal key trade-offs between training efficiency, sample quality, and physical interpretability. Some studies explore hybrid probability transport schemes (Hybrid Probability Transport[3]) or methods that handle unnormalized target densities directly (Molecular Unnormalized Distributions[4]), addressing the difficulty of computing partition functions in molecular systems. Others investigate equivariant architectures (Equivariant Flow Matching[1]) or domain-specific applications (Flow in Bioinformatics[5], Controllable Drug Generation[6]) that respect molecular symmetries and enable goal-directed design. Within this landscape, Boltzmann Generators[0] sits squarely in the Direct Regression Training cluster, closely aligned with Forward Only Regression[2]. Both emphasize training flows via regression objectives rather than likelihood maximization, but Boltzmann Generators[0] specifically targets equilibrium sampling from Boltzmann distributions, whereas Forward Only Regression[2] explores a more general forward-pass training paradigm. This positioning underscores an emerging theme: leveraging regression-based objectives to bypass expensive likelihood computations while maintaining the invertibility and exact sampling guarantees that normalizing flows provide.

Claimed Contributions

REGFLOW: Regression-based training objective for normalizing flows

The authors introduce REGFLOW, a new training framework for classical normalizing flows that replaces maximum likelihood estimation with a simple regression objective. This approach maps prior samples to targets computed using optimal transport couplings or a pre-trained continuous normalizing flow, avoiding the numerical instability and computational expense of traditional MLE training.

10 retrieved papers
Can Refute
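
To make the idea of regressing a flow onto optimal-transport-coupled targets concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes a toy Gaussian prior and a hypothetical Gaussian target, pairs samples by rank (which is the optimal coupling for squared-distance cost in one dimension), and fits a stand-in invertible map f(z) = a * z + b by least squares. All function names here are illustrative assumptions.

```python
import random


def ot_coupling_1d(prior, data):
    """Pair two equal-size 1D samples by rank.

    In one dimension with squared-distance cost, matching the i-th smallest
    prior sample to the i-th smallest data sample is the optimal coupling.
    """
    return list(zip(sorted(prior), sorted(data)))


def fit_affine_flow(pairs):
    """Closed-form least-squares fit of f(z) = a * z + b to (z, x) pairs."""
    n = len(pairs)
    mean_z = sum(z for z, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    cov = sum((z - mean_z) * (x - mean_x) for z, x in pairs)
    var = sum((z - mean_z) ** 2 for z, _ in pairs)
    a = cov / var
    b = mean_x - a * mean_z
    return a, b


def regression_loss(a, b, pairs):
    """Mean squared error between flow outputs f(z) and coupled targets x."""
    return sum((a * z + b - x) ** 2 for z, x in pairs) / len(pairs)


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # prior samples z
data = [rng.gauss(3.0, 0.5) for _ in range(2000)]    # hypothetical toy target samples x

pairs = ot_coupling_1d(prior, data)
a, b = fit_affine_flow(pairs)
loss = regression_loss(a, b, pairs)
print(f"a={a:.3f}, b={b:.3f}, loss={loss:.5f}")
```

No likelihood or Jacobian determinant appears anywhere in this fit. In higher dimensions the rank pairing would have to be replaced by a proper optimal transport solver, and the affine map by an invertible network.
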
Forward-backward self-consistency regularization

The authors propose a novel forward-backward self-consistency regularizer that ensures invertibility at the output level without requiring computation of the Jacobian determinant. This regularization strategy enhances numerical stability during training and opens possibilities for less constrained architectures.

10 retrieved papers
Can Refute
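
As a rough illustration of what a forward-backward self-consistency penalty could look like, the sketch below measures the round-trip error of a forward map and a separately parameterized backward map on samples. The exact form of the authors' regularizer is not given in this report, so the affine maps, the parameter names, and the squared-error form are all assumptions made for illustration.

```python
import random


def forward(z, a, b):
    """Hypothetical forward map (stand-in for a flow's forward pass)."""
    return a * z + b


def backward(x, c, d):
    """Hypothetical backward map with its own parameters."""
    return c * x + d


def self_consistency_loss(samples, fwd_params, bwd_params):
    """Mean squared round-trip error |backward(forward(z)) - z|^2.

    Only function evaluations are needed; no Jacobian determinant is computed.
    """
    total = 0.0
    for z in samples:
        z_rec = backward(forward(z, *fwd_params), *bwd_params)
        total += (z_rec - z) ** 2
    return total / len(samples)


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]

fwd_params = (2.0, 0.5)
exact_inverse = (1.0 / 2.0, -0.5 / 2.0)   # c = 1/a, d = -b/a
poor_inverse = (0.4, 0.0)                  # deliberately mismatched

loss_exact = self_consistency_loss(samples, fwd_params, exact_inverse)
loss_poor = self_consistency_loss(samples, fwd_params, poor_inverse)
print(loss_exact, loss_poor)
```

The penalty vanishes when the backward map inverts the forward map on the samples and grows as the two drift apart, which is the sense in which invertibility is enforced "at the output level".
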
Energy-free targeted free energy perturbation method

The authors develop a new approach to Targeted Free Energy Perturbation that trains normalizing flows using only samples from metastable states, eliminating the need for costly energy function evaluations during training. This represents a distinct capability compared to traditional MLE-trained normalizing flows.

5 retrieved papers
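
For context, the standard targeted free energy perturbation estimator into which a learned map would be plugged can be sketched as follows. This shows only the textbook estimator, which does evaluate energies at estimation time; it is not the authors' energy-free training procedure, whose details are not given in this report. The harmonic potentials, the map, and all names are hypothetical toy choices with a known analytic answer.

```python
import math
import random


def tfep_delta_f(samples_a, u_a, u_b, mapping, log_abs_det_jac, beta=1.0):
    """Targeted FEP estimate of F_B - F_A from samples of state A.

    Uses Delta F = -(1/beta) * log mean(exp(-beta * phi)), where
    phi(x) = u_b(M(x)) - u_a(x) - log|det J_M(x)| / beta.
    """
    exponents = [
        -beta * (u_b(mapping(x)) - u_a(x)) + log_abs_det_jac(x)
        for x in samples_a
    ]
    # log-mean-exp for numerical stability
    m = max(exponents)
    log_mean = m + math.log(sum(math.exp(e - m) for e in exponents) / len(exponents))
    return -log_mean / beta


# Toy system: two 1D harmonic wells with stiffness k_a and k_b.
k_a, k_b, beta = 1.0, 4.0, 1.0
u_a = lambda x: 0.5 * k_a * x * x
u_b = lambda x: 0.5 * k_b * x * x

# For this toy case the ideal map is a rescaling, so phi is constant.
scale = math.sqrt(k_a / k_b)
mapping = lambda x: scale * x
log_abs_det_jac = lambda x: math.log(scale)

rng = random.Random(0)
samples_a = [rng.gauss(0.0, 1.0 / math.sqrt(beta * k_a)) for _ in range(5000)]

delta_f = tfep_delta_f(samples_a, u_a, u_b, mapping, log_abs_det_jac, beta)
exact = 0.5 * math.log(k_b / k_a) / beta   # analytic result for harmonic wells
print(delta_f, exact)
```

With the ideal map every sample contributes the same value, so the estimate matches the analytic free energy difference up to floating-point error. Passing the identity map instead recovers plain free energy perturbation, which converges more slowly.
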
