Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems

ICLR 2026 Conference Submission (Anonymous Authors)

OpenReview Score: 7.0

Keywords: Autoformulation, autoformalism, large language model, Markov decision process, queueing systems

Abstract:

Autoformulation is an emerging field that uses large language models (LLMs) to translate natural-language descriptions of decision-making problems into formal mathematical formulations. Existing works have focused on autoformulating mathematical optimization problems for $\textit{one-shot}$ decision-making. However, many real-world decision-making problems are $\textit{sequential}$, best modeled as $\textit{Markov decision processes}$ (MDPs). MDPs introduce unique challenges for autoformulation, including a significantly larger formulation search space and the difficulty of computing and interpreting the optimal policy. In this work, we address these challenges in the context of queueing problems, which are central to domains such as healthcare and logistics and often require substantial technical expertise to formulate correctly. We propose a novel operator-theoretic autoformulation framework using LLMs. Our approach captures the underlying decision structure of queueing problems by constructing the Bellman equation as a graph of $\textit{operators}$, where each operator is an $\textit{interpretable}$ transformation of the value function corresponding to a certain $\textit{event}$ (e.g., arrival, departure, routing). Theoretically, we prove a universal three-level operator-graph topology covering a broad class of MDPs, significantly shrinking the formulation search space. Algorithmically, we propose a customized Monte Carlo tree search to build operator graphs while incorporating self-evaluation, solver feedback, and intermediate syntax checking for early assessment, and present a provably low-complexity algorithm that automatically identifies structures of the optimal policy (e.g., threshold-based), accelerating downstream solving. Numerical results demonstrate the effectiveness of our approach in formulating queueing problems and identifying structural results.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Research Landscape Overview

Core task: Autoformulation of Markov decision processes for sequential decision-making problems. The field addresses how to systematically construct and specify MDPs from high-level problem descriptions or domain knowledge, rather than hand-crafting state spaces and transition models. The taxonomy reveals several complementary directions: Automated MDP Construction and Formulation explores methods that generate or refine MDP structures using operator-theoretic approaches, knowledge-based frameworks, and automated generation pipelines (e.g., Automated MDP Generation[14], A-LAMP Automated MDP[33]). Domain-Specific MDP Applications demonstrate how autoformulation adapts to concrete settings such as vehicle scheduling, workflow composition, and resource allocation. Meanwhile, branches on Fairness and Social Objectives, Hierarchical and Multi-Step Decision Structures, and Uncertainty and Robustness address orthogonal concerns that often intersect with the core autoformulation challenge: ensuring equitable outcomes (Sequential Fairness Adaptation[3]), managing temporal abstractions (StepTool Multi-Step[12]), and handling model uncertainty (Distributionally Robust Optimization[21]).

A particularly active line of work focuses on bridging symbolic or natural-language specifications with formal MDP representations, exemplified by efforts in automated workflow construction and planning under uncertain specifications. Another contrasting theme involves learning abstractions or state representations directly from data (Abstract MDP Learning[19], MDP Abstractions Data[20]), trading off manual design effort against sample complexity.

Operator Theory MDPs[0] sits within the structural formulation cluster, emphasizing mathematical frameworks that leverage operator-theoretic tools to derive MDP components systematically. This approach contrasts with more data-driven or domain-specific methods: whereas Knowledge-based Decision Models[37] relies on explicit domain ontologies and Automated MDP Generation[14] targets end-to-end pipeline automation, Operator Theory MDPs[0] provides a principled algebraic foundation for constructing transition operators and reward structures, offering theoretical guarantees at the cost of requiring deeper mathematical machinery.

Claimed Contributions

Operator-theoretic autoformulation framework for MDPs

7 retrieved papers

The authors introduce a novel framework that uses operator theory to automatically translate natural-language descriptions of queueing problems into formal MDP formulations while simultaneously discovering structural properties of optimal policies. This framework represents Bellman equations as operator graphs, where each operator corresponds to interpretable transformations related to specific events.
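To make the operator-graph idea concrete, here is an illustrative example that is not taken from the paper: a single-server queue with admission control, written in the style of classical event-based dynamic programming. The symbols $\lambda$, $\mu$, $h$, $R$, $\beta$ and the operator names are assumptions chosen for this sketch, with rates normalized so that $\lambda + \mu = 1$.

```latex
\begin{align*}
(T_{\mathrm{AC}} f)(x) &= \min\{\, R + f(x),\; f(x+1) \,\}
  && \text{(arrival: reject at cost $R$, or admit)} \\
(T_{\mathrm{D}} f)(x) &= f\big((x-1)^{+}\big)
  && \text{(departure)} \\
T_{\mathrm{unif}}(f_1, f_2)(x) &= \lambda f_1(x) + \mu f_2(x)
  && \text{(uniformization)} \\
(T_{\mathrm{cost}} f)(x) &= h\,x + \beta f(x)
  && \text{(holding cost and discounting)} \\
V_{n+1} &= T_{\mathrm{cost}}\Big( T_{\mathrm{unif}}\big( T_{\mathrm{AC}} V_n,\; T_{\mathrm{D}} V_n \big) \Big)
\end{align*}
```

Each of the first four lines is one interpretable transformation of the value function tied to a single event or cost, and the last line composes them into the Bellman recursion.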

Universal three-level operator graph topology theorem

10 retrieved papers

The authors prove that all event-based MDPs can be represented using a fixed three-level tree topology with cost operators at the root, uniformization operators as intermediate nodes, and event operators as leaves. This theoretical result reduces the formulation search space from exponentially many possible graph structures to a single universal topology.
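A minimal Python sketch of what such a three-level tree might look like for one toy model, a single-server queue with admission control. This is an illustration under stated assumptions, not the paper's implementation: the function names, the finite buffer, and every parameter value are hypothetical. Event operators are the leaves, a uniformization operator mixes them, and a cost operator sits at the root.

```python
from typing import Callable, Sequence

import numpy as np

# An operator maps a value function (array over queue lengths 0..N-1) to another.
Operator = Callable[[np.ndarray], np.ndarray]


def admission_op(reject_cost: float) -> Operator:
    """Leaf (event operator): an arrival is rejected at a cost or admitted."""
    def op(v: np.ndarray) -> np.ndarray:
        admit = np.append(v[1:], np.inf)  # admitting is impossible when the buffer is full
        return np.minimum(v + reject_cost, admit)
    return op


def departure_op() -> Operator:
    """Leaf (event operator): one job leaves; an empty queue stays empty."""
    def op(v: np.ndarray) -> np.ndarray:
        return np.concatenate(([v[0]], v[:-1]))
    return op


def uniformization_op(probs: Sequence[float], leaves: Sequence[Operator]) -> Operator:
    """Intermediate node: probability-weighted mixture of event operators."""
    def op(v: np.ndarray) -> np.ndarray:
        return sum(p * leaf(v) for p, leaf in zip(probs, leaves))
    return op


def cost_op(holding_cost: float, discount: float, child: Operator) -> Operator:
    """Root: add the per-step holding cost and discount the child's output."""
    def op(v: np.ndarray) -> np.ndarray:
        return holding_cost * np.arange(len(v)) + discount * child(v)
    return op


def solve(bellman: Operator, n_states: int, iters: int = 2000) -> np.ndarray:
    """Plain value iteration: apply the composed operator repeatedly."""
    v = np.zeros(n_states)
    for _ in range(iters):
        v = bellman(v)
    return v


# Hypothetical parameters (uniformized so that LAM + MU = 1).
LAM, MU, H, R, BETA, N = 0.4, 0.6, 1.0, 5.0, 0.95, 21

bellman = cost_op(
    H, BETA,
    uniformization_op([LAM, MU], [admission_op(R), departure_op()]),
)
v = solve(bellman, N)

# Admit in state x when admitting is no worse than rejecting.
admit = np.append(v[1:], np.inf) <= v + R
```

Because the topology is fixed, formulating a different model in this sketch only means swapping the leaf operators and their weights; the root and the intermediate node keep the same shape.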

Low-complexity algorithm for automatic structure identification

10 retrieved papers

The authors develop Algorithms 1-3 with proven $O(N|G|^2)$ time complexity that automatically identify structural properties of optimal policies (such as monotonicity or threshold-based behavior) from operator graphs. This addresses computational tractability, by enabling specialized solvers, and interpretability, by revealing policy structure before solving.
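One way to picture structure identification on an operator graph is property propagation: record which value-function properties each operator preserves, then keep only the properties that survive every operator in the graph. The sketch below is a hypothetical illustration of that general idea and is not the paper's Algorithms 1-3. The operator names and the preservation rules are assumptions modeled on classical event-based dynamic programming results for single-server queues, and should be verified before reuse.

```python
from typing import Dict, FrozenSet, Iterable, Set

# RULES[op][prop] = properties the input must have for `op` to preserve `prop`.
# A property missing from an operator's table is assumed NOT to be preserved.
Rules = Dict[str, Dict[str, FrozenSet[str]]]

RULES: Rules = {
    "admission": {
        "increasing": frozenset({"increasing"}),
        "convex": frozenset({"convex"}),
    },
    "departure": {
        "increasing": frozenset({"increasing"}),
        # Assumed rule: convexity survives a departure only if the
        # input is also increasing.
        "convex": frozenset({"increasing", "convex"}),
    },
    "uniformization": {
        "increasing": frozenset({"increasing"}),
        "convex": frozenset({"convex"}),
    },
    "cost": {  # assumes the cost function itself is increasing and convex
        "increasing": frozenset({"increasing"}),
        "convex": frozenset({"convex"}),
    },
    # Hypothetical operator that keeps convexity but breaks monotonicity.
    "hypothetical_op": {
        "convex": frozenset({"convex"}),
    },
}


def invariant_properties(
    ops: Iterable[str], rules: Rules, candidates: Iterable[str]
) -> Set[str]:
    """Largest subset of `candidates` that every operator maps into itself."""
    ops = list(ops)
    kept = set(candidates)
    changed = True
    while changed:
        changed = False
        for prop in sorted(kept):
            preserved = all(
                prop in rules[op] and rules[op][prop] <= kept for op in ops
            )
            if not preserved:
                kept.discard(prop)
                changed = True
    return kept


queue_ops = ["admission", "departure", "uniformization", "cost"]
structure = invariant_properties(queue_ops, RULES, {"increasing", "convex"})
broken = invariant_properties(
    queue_ops + ["hypothetical_op"], RULES, {"increasing", "convex"}
)
```

Under these assumed rules, `structure` keeps both properties, and a convex value function is what yields a threshold-type admission policy in this toy model. Adding `hypothetical_op` removes "increasing", which in turn removes "convex" through the departure rule, so `broken` is empty.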

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[37] Knowledge-based formulation of dynamic decision models

Chenggang Wang, Tze Yun Leong (1998)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Operator-theoretic autoformulation framework for MDPs

[51] Topological foundations of reinforcement learning. Verdict: Cannot Refute

[52] Budgeted reinforcement learning in continuous state space. Verdict: Cannot Refute

[53] Koopman-Driven Linearized Model-Based Offline Planning With Application to Freeway Ramp Metering. Verdict: Cannot Refute

[54] Learning representation and control in Markov decision processes: New frontiers. Verdict: Cannot Refute

[55] Comprehensive uncertainty management in MDPs. Verdict: Cannot Refute

[56] Linear Reinforcement Learning with Options. Verdict: Cannot Refute

[57] Optimization Over Time Volume 1: Dynamic Programming and Stochastic Control (Peter Whittle). Verdict: Cannot Refute

Contribution: Universal three-level operator graph topology theorem

[68] Multi-UAV dynamic task assignment based on event-triggered graph reinforcement learning under weak communication. Verdict: Cannot Refute

[69] Event-Driven Transformer-Based Reinforcement Learning for Trajectory Design and Channel Assignment in Multi-UAV Assisted Communication. Verdict: Cannot Refute

[70] Medium Voltage Direct Current Shipboard Power Network Reconfiguration Using Graph-Based Reinforcement Learning. Verdict: Cannot Refute

[71] RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes. Verdict: Cannot Refute

[72] Learning expensive coordination: An event-based deep rl approach. Verdict: Cannot Refute

[73] A Deep Reinforcement Learning with Transformer Integration for Directed Acyclic Graph Scheduling in Edge Networks. Verdict: Cannot Refute

[74] A Proactive Complex Event Processing Method Based on Parallel Markov Decision Processes. Verdict: Cannot Refute

[75] Hierarchical Reinforcement Learning-Based Charging Recommendation Strategy for Electric Ride-Hailing Vehicles. Verdict: Cannot Refute

[76] QoS-Aware Dynamic CU Selection in O-RAN with Graph-Based Reinforcement Learning. Verdict: Cannot Refute

[77] Efficient dynamic evolution of service composition. Verdict: Cannot Refute

Contribution: Low-complexity algorithm for automatic structure identification

[58] Policy Gradient Methods for Information-Theoretic Opacity in Markov Decision Processes. Verdict: Cannot Refute

[59] Diwa: Diffusion policy adaptation with world models. Verdict: Cannot Refute

[60] Multi-reward best policy identification. Verdict: Cannot Refute

[61] Enhancing LLM QoS Through Cloud-Edge Collaboration: A Diffusion-Based Multi-Agent Reinforcement Learning Approach. Verdict: Cannot Refute

[62] Best policy identification in linear mdps. Verdict: Cannot Refute

[63] Autonomous Vehicles Driving at Unsigned Intersections Based on Improved Proximal Policy Optimization Algorithm. Verdict: Cannot Refute

[64] Task-Driven Priority-Aware Computation Offloading Using Deep Reinforcement Learning. Verdict: Cannot Refute

[65] Policy Testing in Markov Decision Processes. Verdict: Cannot Refute

[66] Best policy identification in discounted linear mdps. Verdict: Cannot Refute

[67] Efficient Computation of Blackwell Optimal Policies using Rational Functions. Verdict: Cannot Refute
