Online Decision-Focused Learning

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: decision-focused learning, integrated estimation and optimization, predict-then-optimize, online learning
Abstract:

Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function has zero or undefined gradients---which prevents the use of standard first-order optimization methods---and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along with a near-optimal oracle to overcome non-convexity. Combining these techniques yields two original online algorithms tailored for DFL, for which we establish static and dynamic regret bounds, respectively. These are the first provable guarantees for the online decision-focused problem. Finally, we showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes online decision-focused learning in dynamic environments where both data distributions and objective functions evolve over time. It sits in the 'Online Decision-Focused Learning' leaf under 'Methodological Surveys and Comparative Studies', a leaf that currently contains no other papers. This positioning suggests the work addresses a relatively sparse intersection: combining decision-focused paradigms with online learning guarantees. The taxonomy shows that while neighboring branches such as 'Decision-Focused and Predict-Then-Optimize Frameworks' and 'Online Convex Optimization in Dynamic Settings' are well populated (with 4-7 papers each), the specific integration of these two themes into a provable online framework appears less explored.

The taxonomy reveals substantial activity in adjacent directions. The 'Dynamic Regret Minimization' leaf contains four papers on non-stationary online optimization, while 'Energy and Infrastructure Systems' and 'Sequential Decision-Making and Reinforcement Learning' explore decision-focused methods in offline or stationary settings. The paper's scope_note emphasizes 'dynamic objectives and zero-gradient challenges', distinguishing it from general time-varying optimization surveys in sibling leaves. This boundary suggests the work bridges two mature areas—online convex optimization and predict-then-optimize—rather than extending a single established direction. The exclude_note clarifies it differs from purely algorithmic contributions by focusing on the decision-focused paradigm itself.

Among 30 candidates examined, the contribution-level analysis shows mixed novelty signals. The formalization of the online decision-focused problem appears unrefuted across 10 candidates, suggesting this framing may be new. However, the two algorithmic contributions face more substantial prior work: the regret-bound algorithms show 3 refutable candidates among 10 examined, and the regularization approach for non-differentiability shows 2 refutable candidates among 10. These statistics indicate that while the problem formulation may be original, the technical machinery—perturbation methods, regularization, and regret analysis—likely builds on established techniques from online optimization literature. The limited search scope (30 papers) means these findings reflect top semantic matches rather than exhaustive coverage.

Based on the top-30 semantic search results, the work appears to occupy a genuine gap between decision-focused learning and online optimization theory. The taxonomy structure and contribution statistics suggest the novelty lies primarily in the problem formulation and its theoretical treatment, rather than in fundamentally new algorithmic primitives. The analysis does not cover broader optimization literature beyond the examined candidates, so the assessment remains conditional on this limited scope.

Taxonomy

Core-task Taxonomy Papers
Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 5
Research Landscape Overview

Core task: decision-focused learning in dynamic online environments. This field addresses the challenge of making sequential decisions when both the environment and the underlying optimization objectives evolve over time. The taxonomy reveals a rich structure spanning eight major branches. Online Convex Optimization in Dynamic Settings and Distributed Online Optimization focus on algorithmic foundations for handling time-varying objectives and multi-agent coordination, with works like Online Optimization in Dynamic[13] and Distributed online convex optimization[14] establishing core convergence guarantees. Decision-Focused and Predict-Then-Optimize Frameworks emphasize end-to-end learning where predictions are optimized directly for downstream decision quality, as seen in Perturbed decision-focused learning for[3] and Uncertainty-aware predict-then-optimize framework for[8]. Online Learning for Adaptive Decision-Making and Specialized Online Optimization Applications explore domain-specific adaptations, while Theoretical Foundations and Algorithmic Frameworks provide rigorous analysis of regret bounds and convergence rates. Emerging and Cross-Domain Applications demonstrate the breadth of impact, and Methodological Surveys and Comparative Studies synthesize these diverse threads.

Several active research directions reveal key trade-offs between prediction accuracy and decision quality. The predict-then-optimize paradigm, exemplified by works such as A Predict-Then-Optimize Customer Allocation[11] and End-to-end Stochastic Predict-Then-Optimize for[20], contrasts with purely online methods like Adaptive Online Learning in[4] and Online Non-convex Learning in[1] that adapt without explicit prediction models. A central tension involves balancing computational tractability with theoretical guarantees in non-stationary settings, as explored in Time-Varying Convex Optimization via[30].
Online Decision-Focused Learning[0] sits within the Methodological Surveys and Comparative Studies branch, providing a synthesizing perspective on how decision-focused objectives can be integrated into online learning frameworks. Its emphasis on bridging prediction and optimization distinguishes it from purely algorithmic works like Projection-free Online Learning in[6], while complementing application-oriented studies such as Decision-Focused Learning for Complex[7] by offering a unifying conceptual framework for understanding when and how to couple learning with decision-making in dynamic environments.

Claimed Contributions

Formalization of online decision-focused learning problem

The authors introduce a formal framework for decision-focused learning in dynamic, non-stationary environments where the objective function and data distribution evolve over time. This extends DFL beyond the traditional batch setting with i.i.d. data to sequential decision-making under uncertainty.

10 retrieved papers
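To make the setting concrete, the following is a minimal, self-contained caricature of such an online DFL loop: a predictive model is committed each round, an inner optimizer turns predictions into a decision, and loss is measured on the decision under the true, drifting objective. All names, the drift pattern, and the toy feedback update are illustrative assumptions, not the paper's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 100          # number of candidate items / number of rounds
theta = np.zeros(d)    # toy predictive model: one value estimate per item

def decide(pred_values):
    """Inner optimization: commit to the item with the largest predicted value."""
    z = np.zeros_like(pred_values)
    z[np.argmax(pred_values)] = 1.0
    return z

total_decision_loss = 0.0
for t in range(T):
    true_values = np.sin(0.05 * t + np.arange(d))       # non-stationary environment
    pred_values = theta + rng.normal(scale=0.1, size=d)  # noisy model predictions
    z = decide(pred_values)                              # decision from predictions
    # decision loss: shortfall of the chosen item vs. the best item in hindsight
    total_decision_loss += true_values.max() - true_values @ z
    theta = 0.9 * theta + 0.1 * true_values              # toy feedback update
```

The point of the sketch is only the order of events: prediction precedes decision, and the loss is evaluated on the decision rather than on predictive accuracy.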
Two online algorithms with provable regret bounds

The authors develop two novel algorithms (DF-FTPL and DF-OGD) that combine regularization techniques with perturbation methods to handle non-differentiability and non-convexity. They provide the first theoretical guarantees for online DFL in the form of static and dynamic regret bounds.

10 retrieved papers
Can Refute
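The paper's algorithms are not reproduced here, but the follow-the-perturbed-leader idea that DF-FTPL reportedly builds on can be sketched in a few lines: cumulative losses are randomly perturbed before being handed to a (near-)optimal linear oracle, which sidesteps the need for gradients of a non-convex objective. The oracle, the exponential noise, and the loss stream below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, eta = 4, 200, 5.0
cum_loss = np.zeros(d)     # cumulative linear losses over d actions

def oracle(scores):
    """Near-optimal linear oracle: return the vertex with the smallest score."""
    z = np.zeros_like(scores)
    z[np.argmin(scores)] = 1.0
    return z

picks = []
for t in range(T):
    noise = rng.exponential(scale=eta, size=d)  # random perturbation of the leader
    z = oracle(cum_loss - noise)                # follow the perturbed leader
    loss_t = rng.random(d)                      # losses for this round arrive
    cum_loss += loss_t                          # update the cumulative record
    picks.append(z)
```

The perturbation randomizes over near-leaders, which is what yields regret guarantees without any differentiability assumption on the per-round losses.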
Regularization approach for handling non-differentiability

The authors propose adding a regularizer (such as log-barrier or negative entropy) to the inner optimization problem to make the decision function differentiable, enabling gradient-based updates. This addresses the fundamental challenge that the original objective has zero or undefined gradients.

10 retrieved papers
Can Refute
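The effect of such a regularizer is easy to demonstrate on the probability simplex: the unregularized linear decision is an argmax, which is piecewise constant and has zero gradient almost everywhere, while adding a negative-entropy term turns the decision map into a softmax, which is smooth. This is a standard identity rather than code from the paper; `eps` stands in for the regularization strength.

```python
import numpy as np

def decide_hard(v):
    """Unregularized decision over the simplex: argmax, piecewise constant in v."""
    z = np.zeros_like(v)
    z[np.argmax(v)] = 1.0
    return z

def decide_soft(v, eps=0.1):
    """Entropy-regularized decision: argmax_z <v, z> + eps * H(z) = softmax(v / eps)."""
    e = np.exp((v - v.max()) / eps)  # shift by max for numerical stability
    return e / e.sum()

v = np.array([1.0, 0.5, 0.2])
h = 1e-4
dv = np.zeros(3)
dv[1] = h
# central finite differences: sensitivity of each decision map w.r.t. v[1]
grad_hard = (decide_hard(v + dv) - decide_hard(v - dv)) / (2 * h)
grad_soft = (decide_soft(v + dv) - decide_soft(v - dv)) / (2 * h)
```

The hard decision is unchanged by the small perturbation, so its finite-difference gradient is identically zero; the regularized decision responds smoothly, which is exactly what makes gradient-based online updates possible.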

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Formalization of online decision-focused learning problem

The authors introduce a formal framework for decision-focused learning in dynamic, non-stationary environments where the objective function and data distribution evolve over time. This extends DFL beyond the traditional batch setting with i.i.d. data to sequential decision-making under uncertainty.

Contribution

Two online algorithms with provable regret bounds

The authors develop two novel algorithms (DF-FTPL and DF-OGD) that combine regularization techniques with perturbation methods to handle non-differentiability and non-convexity. They provide the first theoretical guarantees for online DFL in the form of static and dynamic regret bounds.

Contribution

Regularization approach for handling non-differentiability

The authors propose adding a regularizer (such as log-barrier or negative entropy) to the inner optimization problem to make the decision function differentiable, enabling gradient-based updates. This addresses the fundamental challenge that the original objective has zero or undefined gradients.