Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Autoformulation, autoformalism, large language model, Markov decision process, queueing systems
Abstract:

Autoformulation is an emerging field that uses large language models (LLMs) to translate natural-language descriptions of decision-making problems into formal mathematical formulations. Existing work has focused on autoformulating mathematical optimization problems for one-shot decision-making. However, many real-world decision-making problems are sequential and are best modeled as Markov decision processes (MDPs). MDPs introduce unique challenges for autoformulation, including a significantly larger formulation search space and the difficulty of computing and interpreting the optimal policy. In this work, we address these challenges in the context of queueing problems, which are central to domains such as healthcare and logistics and often require substantial technical expertise to formulate correctly. We propose a novel operator-theoretic autoformulation framework using LLMs. Our approach captures the underlying decision structure of queueing problems by constructing the Bellman equation as a graph of operators, where each operator is an interpretable transformation of the value function corresponding to a certain event (e.g., arrival, departure, routing). Theoretically, we prove a universal three-level operator-graph topology covering a broad class of MDPs, significantly shrinking the formulation search space. Algorithmically, we propose a customized Monte Carlo tree search to build operator graphs while incorporating self-evaluation, solver feedback, and intermediate syntax checking for early assessment, and we present a provably low-complexity algorithm that automatically identifies structures of the optimal policy (e.g., threshold-based), accelerating downstream solving. Numerical results demonstrate the effectiveness of our approach in formulating queueing problems and identifying structural results.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: Autoformulation of Markov decision processes for sequential decision-making problems. The field addresses how to systematically construct and specify MDPs from high-level problem descriptions or domain knowledge, rather than hand-crafting state spaces and transition models. The taxonomy reveals several complementary directions: Automated MDP Construction and Formulation explores methods that generate or refine MDP structures using operator-theoretic approaches, knowledge-based frameworks, and automated generation pipelines (e.g., Automated MDP Generation[14], A-LAMP Automated MDP[33]). Domain-Specific MDP Applications demonstrate how autoformulation adapts to concrete settings such as vehicle scheduling, workflow composition, and resource allocation. Meanwhile, branches on Fairness and Social Objectives, Hierarchical and Multi-Step Decision Structures, and Uncertainty and Robustness address orthogonal concerns that often intersect with the core autoformulation challenge: ensuring equitable outcomes (Sequential Fairness Adaptation[3]), managing temporal abstractions (StepTool Multi-Step[12]), and handling model uncertainty (Distributionally Robust Optimization[21]).

A particularly active line of work focuses on bridging symbolic or natural-language specifications with formal MDP representations, exemplified by efforts in automated workflow construction and planning under uncertain specifications. Another contrasting theme involves learning abstractions or state representations directly from data (Abstract MDP Learning[19], MDP Abstractions Data[20]), trading off manual design effort against sample complexity.

Operator Theory MDPs[0] sits within the structural formulation cluster, emphasizing mathematical frameworks that leverage operator-theoretic tools to derive MDP components systematically. This approach contrasts with more data-driven or domain-specific methods: whereas Knowledge-based Decision Models[37] relies on explicit domain ontologies and Automated MDP Generation[14] targets end-to-end pipeline automation, Operator Theory MDPs[0] provides a principled algebraic foundation for constructing transition operators and reward structures, offering theoretical guarantees at the cost of requiring deeper mathematical machinery.

Claimed Contributions

Operator-theoretic autoformulation framework for MDPs

The authors introduce a novel framework that uses operator theory to automatically translate natural-language descriptions of queueing problems into formal MDP formulations while simultaneously discovering structural properties of optimal policies. This framework represents Bellman equations as operator graphs, where each operator corresponds to interpretable transformations related to specific events.
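
As a hedged illustration of what "a Bellman equation as an operator graph" can look like, consider a single-server queue with admission control. Every symbol here is an assumption made for this sketch and is not taken from the submission: arrival rate $\lambda$ and service rate $\mu$ normalized so that $\lambda + \mu = 1$, rejection cost $r$, per-step holding cost $c(x)$, discount factor $\beta$, and queue length $x$. The usual recursion

```latex
\begin{align*}
V_{n+1}(x) &= c(x) + \beta \Big[ \lambda \min\{\, r + V_n(x),\; V_n(x+1) \,\}
              + \mu\, V_n\big((x-1)^+\big) \Big]
\intertext{can be written as a composition of event-level operators:}
(T_{\mathrm{CA}} f)(x) &= \min\{\, r + f(x),\; f(x+1) \,\}
  && \text{(controlled arrival)} \\
(T_{\mathrm{D}} f)(x) &= f\big((x-1)^+\big)
  && \text{(departure)} \\
\big(T_{\mathrm{unif}}(f_1, f_2)\big)(x) &= \lambda f_1(x) + \mu f_2(x)
  && \text{(uniformization)} \\
(T_{\mathrm{cost}} f)(x) &= c(x) + \beta f(x)
  && \text{(cost)} \\
V_{n+1} &= T_{\mathrm{cost}}\Big( T_{\mathrm{unif}}\big( T_{\mathrm{CA}} V_n,\; T_{\mathrm{D}} V_n \big) \Big).
\end{align*}
```

Each operator transforms the value function in a way that maps to one event in the problem description, which is the sense of "interpretable" used above. The operators in the submission itself may be defined differently.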

7 retrieved papers

Universal three-level operator graph topology theorem

The authors prove that all event-based MDPs can be represented using a fixed three-level tree topology with cost operators at the root, uniformization operators as intermediate nodes, and event operators as leaves. This theoretical result reduces the formulation search space from exponentially many possible graph structures to a single universal topology.
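
A minimal Python sketch of the three-level shape described above, under stated assumptions: a truncated single-server queue with admission control, where the rates, costs, discount factor, and every function name (`admission`, `departure`, `uniformize`, `cost`, `value_iteration`) are hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence

Value = List[float]                    # value function on states 0..N
Operator = Callable[[Value], Value]    # maps one value function to another


def admission(reject_cost: float) -> Operator:
    """Event leaf: admit the arrival (x -> x+1) or reject it at a cost."""
    def apply(f: Value) -> Value:
        n = len(f) - 1
        # At the truncation boundary x = n the arrival has to be rejected.
        return [min(reject_cost + f[x], f[x + 1]) if x < n else reject_cost + f[x]
                for x in range(n + 1)]
    return apply


def departure() -> Operator:
    """Event leaf: one service completion; an empty queue stays empty."""
    def apply(f: Value) -> Value:
        return [f[max(x - 1, 0)] for x in range(len(f))]
    return apply


def uniformize(rates: Sequence[float], children: Sequence[Operator]) -> Operator:
    """Intermediate node: mix the event leaves with normalized rates."""
    total = sum(rates)
    probs = [rate / total for rate in rates]
    def apply(f: Value) -> Value:
        branches = [child(f) for child in children]
        return [sum(p * b[x] for p, b in zip(probs, branches))
                for x in range(len(f))]
    return apply


def cost(stage_cost: Callable[[int], float], discount: float,
         child: Operator) -> Operator:
    """Root node: add the stage cost and discount the continuation value."""
    def apply(f: Value) -> Value:
        g = child(f)
        return [stage_cost(x) + discount * g[x] for x in range(len(f))]
    return apply


def value_iteration(bellman: Operator, n_states: int,
                    tol: float = 1e-9, max_iter: int = 100_000) -> Value:
    """Apply the composed operator until successive iterates agree."""
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = bellman(v)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Assumed parameters: linear holding cost, discount 0.95, rates 0.6 / 0.4.
REJECT_COST = 5.0
bellman = cost(lambda x: float(x), 0.95,
               uniformize([0.6, 0.4], [admission(REJECT_COST), departure()]))
v = value_iteration(bellman, n_states=31)
# policy[x] is True when admitting in state x is at least as good as rejecting.
policy = [v[x + 1] <= REJECT_COST + v[x] for x in range(len(v) - 1)]
```

The nesting `cost(uniformize([...], [leaves]))` is the whole graph: a different queueing model in this sketch would swap the leaves and rates and leave the shape unchanged.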

10 retrieved papers

Low-complexity algorithm for automatic structure identification

The authors develop Algorithms 1-3 with proven O(N|G|^2) time complexity that automatically identify structural properties of optimal policies (such as monotonicity or threshold-based behavior) from operator graphs. This addresses both computational tractability by enabling specialized solvers and interpretability by revealing policy structure before solving.
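
One plausible mechanism for reading policy structure off an operator graph before solving is property propagation: if every operator in the graph maps functions with a set of properties (say, increasing and convex) to functions with the same properties, then by induction the value function keeps them, and convexity in turn implies a threshold admission rule. The Python sketch below illustrates that idea only. It is not the paper's Algorithms 1-3 and makes no claim about their O(N|G|^2) bound; the `Node` class, the `rules` tables, and all names are assumptions for a one-dimensional queue.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set

INCREASING, CONVEX = "increasing", "convex"


@dataclass
class Node:
    name: str
    # rules[p] = properties the input must have for the output to have p.
    # A property missing from `rules` is treated as not preserved.
    rules: Dict[str, FrozenSet[str]]
    children: List["Node"] = field(default_factory=list)


def all_nodes(root: Node) -> List[Node]:
    """Collect every node of the operator tree."""
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(node.children)
    return seen


def invariant_properties(root: Node, candidates: Set[str]) -> Set[str]:
    """Largest subset of `candidates` that every operator jointly preserves."""
    nodes = all_nodes(root)
    kept = set(candidates)
    changed = True
    while changed:
        changed = False
        for prop in sorted(kept):
            preserved = all(prop in n.rules and n.rules[prop] <= kept
                            for n in nodes)
            if not preserved:
                kept.discard(prop)
                changed = True
    return kept


# Illustrative (assumed) preservation table for a single-server queue.
SAME = {INCREASING: frozenset({INCREASING}), CONVEX: frozenset({CONVEX})}
admission = Node("admission", dict(SAME))
# f((x-1)^+) stays convex only if f is also increasing.
departure = Node("departure", {INCREASING: frozenset({INCREASING}),
                               CONVEX: frozenset({INCREASING, CONVEX})})
unif = Node("uniformization", dict(SAME), [admission, departure])
root = Node("cost", dict(SAME), [unif])   # assumes a convex, increasing stage cost

props = invariant_properties(root, {INCREASING, CONVEX})
has_admission = any(n.name == "admission" for n in all_nodes(root))
# Convex V means V(x+1) - V(x) is nondecreasing, so "admit iff the difference
# is at most the rejection cost" switches at most once: a threshold policy.
threshold_admission = CONVEX in props and has_admission
```

Dropping `INCREASING` from the candidate set makes convexity fail at the departure node, which shows why the properties have to be checked jointly and not one at a time.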

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Operator-theoretic autoformulation framework for MDPs


Contribution

Universal three-level operator graph topology theorem


Contribution

Low-complexity algorithm for automatic structure identification
