Online Decision-Focused Learning

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: decision-focused learning, integrated estimation and optimization, predict-then-optimize, online learning
Abstract:

Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function has zero or undefined gradients---which prevents the use of standard first-order optimization methods---and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along with a near-optimal oracle to overcome non-convexity. Combining these techniques yields two original online algorithms tailored for DFL, for which we establish static and dynamic regret bounds, respectively. These are the first provable guarantees for the online decision-focused problem. Finally, we showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes online decision-focused learning in dynamic environments where both data distributions and objective functions evolve over time. It sits in the 'Online Decision-Focused Learning' leaf under 'Methodological Surveys and Comparative Studies', a leaf that currently contains no other papers. This positioning suggests the work addresses a relatively sparse intersection: combining decision-focused paradigms with online learning guarantees. The taxonomy shows that while neighboring branches such as 'Decision-Focused and Predict-Then-Optimize Frameworks' and 'Online Convex Optimization in Dynamic Settings' are well populated (with 4-7 papers each), the specific integration of these two themes into a provable online framework appears less explored.

The taxonomy reveals substantial activity in adjacent directions. The 'Dynamic Regret Minimization' leaf contains four papers on non-stationary online optimization, while 'Energy and Infrastructure Systems' and 'Sequential Decision-Making and Reinforcement Learning' explore decision-focused methods in offline or stationary settings. The paper's scope_note emphasizes 'dynamic objectives and zero-gradient challenges', distinguishing it from general time-varying optimization surveys in sibling leaves. This boundary suggests the work bridges two mature areas—online convex optimization and predict-then-optimize—rather than extending a single established direction. The exclude_note clarifies it differs from purely algorithmic contributions by focusing on the decision-focused paradigm itself.

Among 30 candidates examined, the contribution-level analysis shows mixed novelty signals. The formalization of the online decision-focused problem appears unrefuted across 10 candidates, suggesting this framing may be new. However, the two algorithmic contributions face more substantial prior work: the regret-bound algorithms show 3 refutable candidates among 10 examined, and the regularization approach for non-differentiability shows 2 refutable candidates among 10. These statistics indicate that while the problem formulation may be original, the technical machinery—perturbation methods, regularization, and regret analysis—likely builds on established techniques from online optimization literature. The limited search scope (30 papers) means these findings reflect top semantic matches rather than exhaustive coverage.

Based on the top-30 semantic search results, the work appears to occupy a genuine gap between decision-focused learning and online optimization theory. The taxonomy structure and contribution statistics suggest the novelty lies primarily in the problem formulation and its theoretical treatment, rather than in fundamentally new algorithmic primitives. The analysis does not cover broader optimization literature beyond the examined candidates, so the assessment remains conditional on this limited scope.

Taxonomy

Core-task Taxonomy Papers
Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 5
Research Landscape Overview

Core task: decision-focused learning in dynamic online environments. This field addresses the challenge of making sequential decisions when both the environment and the underlying optimization objectives evolve over time. The taxonomy reveals a rich structure spanning eight major branches. Online Convex Optimization in Dynamic Settings and Distributed Online Optimization focus on algorithmic foundations for handling time-varying objectives and multi-agent coordination, with works like Online Optimization in Dynamic[13] and Distributed online convex optimization[14] establishing core convergence guarantees. Decision-Focused and Predict-Then-Optimize Frameworks emphasize end-to-end learning where predictions are optimized directly for downstream decision quality, as seen in Perturbed decision-focused learning for[3] and Uncertainty-aware predict-then-optimize framework for[8]. Online Learning for Adaptive Decision-Making and Specialized Online Optimization Applications explore domain-specific adaptations, while Theoretical Foundations and Algorithmic Frameworks provide rigorous analysis of regret bounds and convergence rates. Emerging and Cross-Domain Applications demonstrate the breadth of impact, and Methodological Surveys and Comparative Studies synthesize these diverse threads.

Several active research directions reveal key trade-offs between prediction accuracy and decision quality. The predict-then-optimize paradigm, exemplified by works such as A Predict-Then-Optimize Customer Allocation[11] and End-to-end Stochastic Predict-Then-Optimize for[20], contrasts with purely online methods like Adaptive Online Learning in[4] and Online Non-convex Learning in[1] that adapt without explicit prediction models. A central tension involves balancing computational tractability with theoretical guarantees in non-stationary settings, as explored in Time-Varying Convex Optimization via[30].
Online Decision-Focused Learning[0] sits within the Methodological Surveys and Comparative Studies branch, providing a synthesizing perspective on how decision-focused objectives can be integrated into online learning frameworks. Its emphasis on bridging prediction and optimization distinguishes it from purely algorithmic works like Projection-free Online Learning in[6], while complementing application-oriented studies such as Decision-Focused Learning for Complex[7] by offering a unifying conceptual framework for understanding when and how to couple learning with decision-making in dynamic environments.

Claimed Contributions

Formalization of online decision-focused learning problem

The authors introduce a formal framework for decision-focused learning in dynamic, non-stationary environments where the objective function and data distribution evolve over time. This extends DFL beyond the traditional batch setting with i.i.d. data to sequential decision-making under uncertainty.

10 retrieved papers
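To make the setting concrete, the following is a minimal, self-contained caricature of such an online DFL loop: a predictive model is committed each round, an inner optimizer turns predictions into a decision, and loss is measured on the decision under the true, drifting objective. All names, the drift pattern, and the toy feedback update are illustrative assumptions, not the paper's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 100          # number of candidate items / number of rounds
theta = np.zeros(d)    # toy predictive model: one value estimate per item

def decide(pred_values):
    """Inner optimization: commit to the item with the largest predicted value."""
    z = np.zeros_like(pred_values)
    z[np.argmax(pred_values)] = 1.0
    return z

total_decision_loss = 0.0
for t in range(T):
    true_values = np.sin(0.05 * t + np.arange(d))       # non-stationary environment
    pred_values = theta + rng.normal(scale=0.1, size=d)  # noisy model predictions
    z = decide(pred_values)                              # decision from predictions
    # decision loss: shortfall of the chosen item vs. the best item in hindsight
    total_decision_loss += true_values.max() - true_values @ z
    theta = 0.9 * theta + 0.1 * true_values              # toy feedback update
```

The point of the sketch is only the order of events: prediction precedes decision, and the loss is evaluated on the decision rather than on predictive accuracy.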
Two online algorithms with provable regret bounds

The authors develop two novel algorithms (DF-FTPL and DF-OGD) that combine regularization techniques with perturbation methods to handle non-differentiability and non-convexity. They provide the first theoretical guarantees for online DFL in the form of static and dynamic regret bounds.

10 retrieved papers
Can Refute
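The paper's algorithms are not reproduced here, but the follow-the-perturbed-leader idea that DF-FTPL reportedly builds on can be sketched in a few lines: cumulative losses are randomly perturbed before being handed to a (near-)optimal linear oracle, which sidesteps the need for gradients of a non-convex objective. The oracle, the exponential noise, and the loss stream below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, eta = 4, 200, 5.0
cum_loss = np.zeros(d)     # cumulative linear losses over d actions

def oracle(scores):
    """Near-optimal linear oracle: return the vertex with the smallest score."""
    z = np.zeros_like(scores)
    z[np.argmin(scores)] = 1.0
    return z

picks = []
for t in range(T):
    noise = rng.exponential(scale=eta, size=d)  # random perturbation of the leader
    z = oracle(cum_loss - noise)                # follow the perturbed leader
    loss_t = rng.random(d)                      # losses for this round arrive
    cum_loss += loss_t                          # update the cumulative record
    picks.append(z)
```

The perturbation randomizes over near-leaders, which is what yields regret guarantees without any differentiability assumption on the per-round losses.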
Regularization approach for handling non-differentiability

The authors propose adding a regularizer (such as log-barrier or negative entropy) to the inner optimization problem to make the decision function differentiable, enabling gradient-based updates. This addresses the fundamental challenge that the original objective has zero or undefined gradients.

10 retrieved papers
Can Refute
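The effect of such a regularizer is easy to demonstrate on the probability simplex: the unregularized linear decision is an argmax, which is piecewise constant and has zero gradient almost everywhere, while adding a negative-entropy term turns the decision map into a softmax, which is smooth. This is a standard identity rather than code from the paper; `eps` stands in for the regularization strength.

```python
import numpy as np

def decide_hard(v):
    """Unregularized decision over the simplex: argmax, piecewise constant in v."""
    z = np.zeros_like(v)
    z[np.argmax(v)] = 1.0
    return z

def decide_soft(v, eps=0.1):
    """Entropy-regularized decision: argmax_z <v, z> + eps * H(z) = softmax(v / eps)."""
    e = np.exp((v - v.max()) / eps)  # shift by max for numerical stability
    return e / e.sum()

v = np.array([1.0, 0.5, 0.2])
h = 1e-4
dv = np.zeros(3)
dv[1] = h
# central finite differences: sensitivity of each decision map w.r.t. v[1]
grad_hard = (decide_hard(v + dv) - decide_hard(v - dv)) / (2 * h)
grad_soft = (decide_soft(v + dv) - decide_soft(v - dv)) / (2 * h)
```

The hard decision is unchanged by the small perturbation, so its finite-difference gradient is identically zero; the regularized decision responds smoothly, which is exactly what makes gradient-based online updates possible.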

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Formalization of online decision-focused learning problem

The authors introduce a formal framework for decision-focused learning in dynamic, non-stationary environments where the objective function and data distribution evolve over time. This extends DFL beyond the traditional batch setting with i.i.d. data to sequential decision-making under uncertainty.

Contribution

Two online algorithms with provable regret bounds

The authors develop two novel algorithms (DF-FTPL and DF-OGD) that combine regularization techniques with perturbation methods to handle non-differentiability and non-convexity. They provide the first theoretical guarantees for online DFL in the form of static and dynamic regret bounds.

Contribution

Regularization approach for handling non-differentiability

The authors propose adding a regularizer (such as log-barrier or negative entropy) to the inner optimization problem to make the decision function differentiable, enabling gradient-based updates. This addresses the fundamental challenge that the original objective has zero or undefined gradients.