Bayesian Ensemble for Sequential Decision-Making

ICLR 2026 Conference Submission · Anonymous Authors
Ensemble Methods · Reinforcement Learning
Abstract:

Ensemble learning is a practical family of methods for uncertainty modeling, particularly useful for sequential decision-making problems such as recommendation systems and reinforcement learning. In these methods, the posterior over likelihood parameters is approximated by sampling an ensemble member from a predetermined index distribution, with the ensemble's diversity reflecting the degree of uncertainty. In this paper, we propose Bayesian Ensemble (BE), a lightweight yet principled Bayesian layer atop existing ensembles. BE treats the selection of an ensemble member as a bandit problem in itself, dynamically updating a sampling distribution over members via Bayesian inference on observed rewards. This contrasts with prior work that relies on fixed, uniform sampling. We extend this framework to both bandit learning and reinforcement learning, introducing Bayesian Ensemble Bandit and Bayesian Ensemble Deep Q-Network for diverse decision-making problems. Extensive experiments on both synthetic and real-world environments demonstrate the effectiveness and efficiency of BE.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a Bayesian Ensemble framework that treats ensemble member selection as a bandit problem, dynamically updating sampling distributions via Bayesian inference on observed rewards. It resides in the Deep Ensemble Methods for Policy and Value Uncertainty leaf, which contains seven papers including the original work. This leaf sits within the broader Ensemble-Based Uncertainty Quantification in Reinforcement Learning branch, indicating a moderately populated research direction focused on neural network ensembles for uncertainty-aware policy optimization and Q-value estimation.

The taxonomy reveals neighboring leaves addressing model-based RL uncertainty and multi-task transfer learning, both under the same parent branch. The Deep Ensemble Methods leaf explicitly excludes tree-based ensembles and bandit-specific methods, yet this paper bridges to bandit learning by framing member selection as a bandit problem. Sibling papers in the leaf include works on exploration trajectories, randomized prior functions, and distributional robustness, suggesting the field balances exploration-driven methods with safety-oriented approaches. The paper's Bayesian treatment of member selection appears to occupy a distinct methodological niche within this landscape.

Among the twenty-two candidates examined via limited semantic search, no contributions were clearly refuted. For the core Bayesian Ensemble framework, ten candidates were examined with zero refutable matches, as was the case for the extension to bandit and RL settings. The unified variance reduction framework was compared against only two candidates, also with no refutations. This absence of overlapping prior work among the examined candidates suggests the dynamic Bayesian member selection mechanism may represent a novel angle, though the limited search scope means potentially relevant work outside the top-K matches remains unexamined.

Based on the examined literature, the work appears to introduce a distinctive approach by applying Bayesian inference to ensemble member selection rather than relying on fixed uniform sampling. However, the analysis covers only top-twenty-two semantic matches and does not constitute an exhaustive survey of ensemble methods in sequential decision-making. The taxonomy context indicates a moderately active research area where methodological variations on ensemble uncertainty quantification continue to emerge.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 0

Research Landscape Overview

Core task: Uncertainty modeling for sequential decision-making using ensemble methods. The field encompasses a diverse set of branches that reflect both methodological innovations and domain-specific applications. At the methodological core, Ensemble-Based Uncertainty Quantification in Reinforcement Learning develops deep ensemble techniques for policy and value uncertainty, while Ensemble Methods for Hyperparameter and Model Selection focuses on optimizing learning configurations. Several branches address critical real-world domains: Decision-Making Under Uncertainty in Safety-Critical Applications targets high-stakes environments such as autonomous systems and emergency response (e.g., Emergency Decision Uncertainty[3]), Multi-Step Forecasting with Ensemble Uncertainty Quantification handles temporal prediction tasks (e.g., Ensemble Monte Carlo[5]), and specialized branches tackle resource management, financial trading, and physical systems such as wind power forecasting. Meanwhile, Ensemble-Based State and Parameter Estimation and Ensemble-Based Sample Collection and Active Learning provide foundational tools for data assimilation and efficient exploration, bridging theory and practice across the taxonomy.

Particularly active lines of work reveal contrasting emphases between exploration-driven methods and safety-oriented approaches. Works like Ensemble Exploration Trajectories[29] and Randomized Prior Functions[26] emphasize diversity and epistemic uncertainty to guide exploration in reinforcement learning, while others such as Uncertainty OOD Detection[31] and Sentinel Distributional RL[9] focus on robustness and out-of-distribution detection.

Bayesian Ensemble Sequential[0] sits within the deep ensemble methods for policy and value uncertainty cluster, sharing methodological ground with Revalued Ensemble Decomposition[4] and Ensemble Overlapping Predictions[44].
Compared to neighbors that prioritize exploration or distributional robustness, Bayesian Ensemble Sequential[0] appears to integrate Bayesian principles with sequential decision-making, potentially offering a principled framework for balancing uncertainty quantification with iterative policy refinement. This positioning suggests a bridge between purely exploration-focused ensembles and those designed for safety-critical deployment, addressing open questions about how to maintain computational tractability while capturing meaningful epistemic uncertainty across extended decision horizons.

Claimed Contributions

Bayesian Ensemble (BE) framework for sequential decision-making

The authors introduce Bayesian Ensemble, a framework that updates both ensemble member parameters and the index distribution for selecting members. Unlike prior methods using fixed uniform sampling, BE dynamically updates the sampling distribution over ensemble members through Bayesian inference based on observed rewards.

10 retrieved papers
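The dynamic member-selection update described in this contribution could be sketched as follows. This is a minimal illustration, not the paper's exact construction: it assumes a Bernoulli reward model with per-member Beta posteriors and Thompson sampling over members, and all names (`BayesianMemberSelector`, `sample_member`, `update`) are hypothetical.

```python
import random


class BayesianMemberSelector:
    """Sketch: treat ensemble-member selection as a Bernoulli bandit.

    Each member k keeps a Beta(alpha_k, beta_k) posterior over its
    probability of yielding an above-threshold reward; members are
    drawn by Thompson sampling over these posteriors rather than
    uniformly.  (The Bernoulli reward model is an illustrative
    assumption, not the paper's exact formulation.)
    """

    def __init__(self, n_members: int):
        self.alpha = [1.0] * n_members  # Beta prior: pseudo-successes
        self.beta = [1.0] * n_members   # Beta prior: pseudo-failures

    def sample_member(self) -> int:
        # Thompson sampling: one Beta draw per member, pick the argmax.
        draws = [random.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, member: int, reward: float, threshold: float = 0.0):
        # Bayesian update of the selected member's posterior from the
        # observed reward (binarised against a threshold for the sketch).
        if reward > threshold:
            self.alpha[member] += 1.0
        else:
            self.beta[member] += 1.0


selector = BayesianMemberSelector(n_members=4)
k = selector.sample_member()       # choose a member to act with
selector.update(k, reward=1.0)     # fold the observed reward back in
```

Members that repeatedly earn high rewards accumulate posterior mass and are sampled more often, replacing the fixed uniform index distribution of standard ensembles.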
Extension to bandit learning and reinforcement learning

The authors extend the BE framework to contextual bandits and reinforcement learning settings, proposing specific instantiations called Bayesian Ensemble Bandit and Bayesian Ensemble Deep Q-Network for different sequential decision-making problems.

10 retrieved papers
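For the RL instantiation, one plausible reading of Bayesian Ensemble Deep Q-Network is: sample a member from the learned index distribution, then act greedily with respect to that member's Q-values. The sketch below encodes that two-step selection; the function name, signature, and use of plain lists in place of Q-networks are illustrative assumptions, not the paper's implementation.

```python
import random


def be_dqn_act(q_values_per_member, index_weights):
    """Hypothetical sketch of Bayesian Ensemble DQN action selection.

    q_values_per_member: one list of per-action Q-values per ensemble
    member (standing in for the output of each Q-network head).
    index_weights: the learned categorical distribution over members.
    Returns (greedy action, sampled member index).
    """
    # Step 1: draw a member index k ~ Categorical(index_weights).
    k = random.choices(range(len(index_weights)), weights=index_weights)[0]
    # Step 2: act greedily under member k's Q-values.
    q = q_values_per_member[k]
    return max(range(len(q)), key=q.__getitem__), k
```

As the index distribution concentrates on well-performing members, action selection increasingly tracks their Q-estimates while still retaining stochastic exploration across the ensemble.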
Unified framework with theoretically grounded variance reduction

The authors present BE as a unified framework applicable to both contextual bandits and reinforcement learning that achieves improved exploration efficiency through variance reduction with theoretical grounding, while being versatile enough to enhance various ensemble-based Thompson sampling and RL methods.

2 retrieved papers
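The variance-reduction claim can be given a quick numerical illustration (this is a hypothetical demonstration of the general mechanism, not the paper's formal result): when the index distribution down-weights a member whose predictions disagree with the rest, the variance of the per-step sampled prediction drops relative to uniform member selection.

```python
import random
import statistics


def sampled_prediction_variance(preds, weights, n=10000, seed=0):
    """Empirical variance of the prediction obtained by sampling one
    ensemble member per step under the given index weights.  A toy
    illustration: concentrating the index distribution away from an
    outlier member reduces sampling variance versus uniform selection.
    """
    rng = random.Random(seed)
    idx = range(len(preds))
    draws = [preds[rng.choices(idx, weights=weights)[0]] for _ in range(n)]
    return statistics.pvariance(draws)


preds = [1.0, 1.1, 5.0]        # one member is an outlier
uniform = [1.0, 1.0, 1.0]      # fixed uniform index distribution
learned = [0.45, 0.45, 0.10]   # posterior has down-weighted the outlier

var_uniform = sampled_prediction_variance(preds, uniform)
var_learned = sampled_prediction_variance(preds, learned)
```

Here `var_learned` comes out well below `var_uniform`, mirroring the intuition that a reward-informed index distribution yields lower-variance (and hence more sample-efficient) exploration than fixed uniform sampling.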

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
