Speculative Actions: A Lossless Framework for Faster AI Agents

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: AI Agents, Speculative Decoding, Parallel Execution, Agentic Serving, Agentic Simulation
Abstract:

AI agents have attracted growing interest across industry and academia, but in practice their execution can be slow. For example, letting two state-of-the-art agents play a game of chess may take hours. A key bottleneck is that agent behavior unfolds sequentially: each action requires an API call, and these calls can be time-consuming. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, we propose speculative actions, a lossless framework that predicts likely actions using faster models, enabling multiple API calls to be executed in parallel. We evaluate this framework across four agentic environments: gaming, e-commerce, web search, and operating systems. In all cases, speculative actions yield substantial acceleration, with potential speedups of up to 30%. Moreover, performance can be further improved through stronger guessing models and top-K action prediction, opening a promising path toward efficient, real-world deployment of AI agents.
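
The abstract mentions top-K action prediction. The following is a minimal sketch of how K guesses from a fast model might be used, assuming the downstream call is deterministic and a guess is validated by exact match against the action the agent actually takes. The names speculate_top_k, resolve, and opponent_reply are hypothetical and are not the authors' API.

```python
from typing import Callable, Dict, List, Tuple


def speculate_top_k(candidates: List[str], downstream: Callable[[str], str]) -> Dict[str, str]:
    # Pre-execute the downstream call for each of the K guessed actions.
    # A real system would issue these calls concurrently; a dict comprehension
    # keeps this sketch self-contained.
    return {action: downstream(action) for action in candidates}


def resolve(
    actual_action: str,
    precomputed: Dict[str, str],
    downstream: Callable[[str], str],
) -> Tuple[str, bool]:
    # Hit: reuse the pre-executed result. Miss: ignore the guesses and make the
    # call now, so the returned result is identical to the non-speculative one.
    if actual_action in precomputed:
        return precomputed[actual_action], True
    return downstream(actual_action), False


# Hypothetical stand-ins: a slow environment call and a fast model's top-3 guesses.
def opponent_reply(move: str) -> str:
    return f"reply-to-{move}"


guesses = ["e4", "d4", "Nf3"]
cache = speculate_top_k(guesses, opponent_reply)
print(resolve("d4", cache, opponent_reply))  # ('reply-to-d4', True)
print(resolve("c4", cache, opponent_reply))  # ('reply-to-c4', False)
```

Larger K raises the chance of a hit at the cost of more discarded calls; on a miss the sketch falls back to the ordinary sequential call, which is what keeps it lossless.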

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a speculative actions framework that predicts likely agent actions using faster models to enable parallel API execution, drawing inspiration from speculative decoding in LLM inference. Within the taxonomy, it resides in the 'Speculative Action Prediction in General Agentic Systems' leaf, which contains only two papers in total. This leaf sits under the broader 'Speculative Execution Frameworks for Agent Acceleration' branch, which encompasses five specialized subcategories; besides this leaf, they address VLA models, LLM-based planning, ranking systems, and edge devices. The sparse population of this leaf suggests that the work targets a relatively nascent research direction focused on domain-agnostic speculation mechanisms rather than task-specific predictive architectures.

The taxonomy reveals neighboring branches that explore related but distinct approaches to agent acceleration and prediction. Adjacent leaves include 'Speculative Planning for LLM-Based Agents' (focusing on planning latency reduction through co-design) and 'Speculative Decoding for Vision-Language-Action Models' (applying drafting-verification to VLA inference). The broader 'Predictive Models for Agent Behavior' branch contains trajectory forecasting and action prediction methods that emphasize learning from interaction traces rather than parallelization mechanisms. The paper's position bridges general-purpose speculation frameworks and domain-specific applications, with the taxonomy explicitly excluding prediction models without parallelization from this leaf while directing domain-specific implementations to other subcategories.

Across the three contributions, twenty-nine candidate papers were analyzed. For the core speculative actions framework, one of the nine candidates examined is refutable, indicating some prior overlap within the limited search scope. The unified API-call abstraction and the multi-environment demonstration each had ten candidates examined with zero refutations, suggesting these aspects may be more distinctive within the search scope. These statistics reflect a focused literature search rather than exhaustive coverage, and the single refutable pair likely represents work in the same sparse research direction. The framework's generality across gaming, e-commerce, web search, and operating systems appears less directly addressed by the examined candidates, though the limited sample size precludes definitive conclusions.
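
The per-contribution counts listed under Claimed Contributions can be tallied to check the headline statistics. This is a small arithmetic sketch; the dictionary keys are shortened contribution titles chosen for illustration.

```python
# Candidate and refutation counts per contribution, as listed in this report.
contributions = {
    "Speculative actions framework": {"candidates": 9, "refutable": 1},
    "Unified API-call abstraction": {"candidates": 10, "refutable": 0},
    "Multi-environment demonstration": {"candidates": 10, "refutable": 0},
}

total_candidates = sum(c["candidates"] for c in contributions.values())
total_refutable = sum(c["refutable"] for c in contributions.values())
print(total_candidates, total_refutable)  # 29 1
```

The totals match the summary figures of 29 candidate papers compared and 1 refutable paper.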

Based on the twenty-nine candidates examined, the work appears to occupy a relatively unexplored intersection of general-purpose speculation and multi-domain agentic systems. The sparse taxonomy leaf and limited refutations suggest novelty within the search scope, though the presence of one overlapping candidate indicates that the core speculation concept has precedent. The analysis captures top-K semantic matches and does not exhaustively cover all related work in agent acceleration, particularly in specialized domains such as robotics or autonomous vehicles, where prediction mechanisms may differ substantially from the proposed API-parallel framework.

Taxonomy

Core-task taxonomy papers: 49
Claimed contributions: 3
Contribution candidate papers compared: 29
Refutable papers: 1

Research Landscape Overview

Core task: Accelerating agentic systems through speculative action prediction. The field addresses how autonomous agents, ranging from web-based assistants to robotic controllers, can reduce latency and improve responsiveness by predicting and pre-executing likely future actions. The taxonomy organizes this landscape into several major branches. Speculative Execution Frameworks for Agent Acceleration develop general-purpose mechanisms for drafting and verifying candidate actions (e.g., Speculative Actions Framework[10], Reinforcement Speculative Decoding[7]). Predictive Models for Agent Behavior and Trajectory Forecasting focus on forecasting multi-step sequences in navigation and driving contexts (e.g., MTR Plus Plus[2], Precog[5]). GUI and Web Automation Agents with Efficient Action Understanding tackle screen-based tasks where predicting user or agent clicks can streamline interaction (e.g., ScreenLLM[4], Predicting Future Actions[3]). The remaining branches (Autonomous Navigation and Robotics, Multi-Agent Coordination, Theoretical Foundations, Edge Computing, Anticipatory Behavior, and Domain-Specific Applications) each explore how prediction and speculation manifest in their respective settings, from robot motion planning (Spec VLA[6]) to distributed edge intelligence (Edge General Intelligence[13]) and cooperative control (Event Triggered Cooperative[27]). A particularly active line of work centers on general speculative frameworks that borrow ideas from language-model speculative decoding and adapt them to action spaces, aiming to balance the cost of generating multiple candidate actions against the speedup from parallel verification.
Speculative Actions[0] sits squarely in this branch, proposing mechanisms to draft and validate action sequences in agentic systems, closely aligned with Speculative Actions Framework[10] and Reinforcement Speculative Decoding[7], which similarly explore how to leverage smaller or faster models to propose actions that a larger policy then confirms. In contrast, works like Predicting Future Actions[3] and ScreenLLM[4] emphasize learning predictive models from interaction traces in GUI environments, while Precog[5] and MTR Plus Plus[2] focus on trajectory forecasting for autonomous vehicles, highlighting a trade-off between domain-agnostic speculation frameworks and task-specific predictive architectures. Open questions remain around how to ensure safety during speculative execution (Safety Assured Speculative[12]), how to handle multi-agent scenarios where predictions must account for other agents' behaviors (Mutual Prediction[41]), and how to deploy these techniques efficiently at the edge (Edge General Intelligence[13]).
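
The trade-off between the cost of candidate actions and the speedup from parallel verification can be made concrete with a toy model. This is an illustrative assumption, not the paper's analysis: two dependent API calls each take latency L, the speculator guesses the first call's result after s < L, and the guess (or, with top-K, one of K guesses) is correct with probability p. On a hit the second call finishes at s + L; on a miss it is relaunched after the first call and finishes at 2L. The function names are hypothetical.

```python
def expected_pair_latency(p_hit: float, actor_latency: float, spec_latency: float) -> float:
    # Hit: the second call starts right after the speculator's guess, ending at s + L.
    hit = spec_latency + actor_latency
    # Miss: the speculative second call is discarded and relaunched, ending at 2L.
    miss = 2 * actor_latency
    return p_hit * hit + (1 - p_hit) * miss


def fractional_saving(p_hit: float, actor_latency: float, spec_latency: float) -> float:
    # Saving relative to the purely sequential latency 2L.
    return 1 - expected_pair_latency(p_hit, actor_latency, spec_latency) / (2 * actor_latency)


def expected_wasted_calls(k: int, p_hit: float) -> float:
    # With K guesses per step, K - 1 are discarded on a hit and all K on a miss,
    # which simplifies to k - p_hit.
    return p_hit * (k - 1) + (1 - p_hit) * k


print(f"{expected_pair_latency(0.6, 10.0, 1.0):.2f}")  # 14.60
print(f"{fractional_saving(0.6, 10.0, 1.0):.2f}")      # 0.27
print(f"{expected_wasted_calls(3, 0.6):.2f}")          # 2.40
```

Under this model the saving grows linearly with the hit rate, while the number of extra calls grows with K, which is the balance the paragraph above describes.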

Claimed Contributions

Speculative actions framework for agentic systems

The authors introduce a general framework that allows agents to predict and tentatively pursue the most likely next actions using faster models while slower ground-truth executors catch up. This framework treats each action in an agentic system as an API call and uses a Speculator to predict responses in parallel with an Actor that provides authoritative outputs, achieving lossless speedup through validation and rollback mechanisms.

9 retrieved papers
Can Refute
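
The Speculator/Actor loop described in this contribution can be sketched with asyncio. This is a minimal illustration, assuming that both the Actor and the Speculator are async callables mapping the interaction history to the next response, that responses can be compared with ==, and that calls have no side effects needing undo; run_speculative, toy_actor, and toy_speculator are hypothetical names, not the authors' implementation.

```python
import asyncio
from typing import Awaitable, Callable, List

# An Actor or Speculator maps the interaction history to the next response.
Model = Callable[[List[str]], Awaitable[str]]


async def run_speculative(actor: Model, speculator: Model, steps: int) -> List[str]:
    history: List[str] = []
    if steps <= 0:
        return history
    pending = asyncio.create_task(actor(list(history)))  # authoritative call in flight
    for i in range(steps):
        if i == steps - 1:
            history.append(await pending)  # last step: nothing left to speculate on
            break
        guess = await speculator(list(history))  # fast guess of the pending result
        # Optimistically launch the next Actor call as if the guess were correct.
        speculative = asyncio.create_task(actor(history + [guess]))
        actual = await pending  # validate against the authoritative output
        history.append(actual)
        if actual == guess:
            pending = speculative  # hit: the speculative call had the true history
        else:
            speculative.cancel()  # miss: roll back and relaunch from the true history
            pending = asyncio.create_task(actor(list(history)))
    return history


# Hypothetical toy models: the Actor is slow and returns "invalid" for any
# inconsistent history; the Speculator is fast but guesses wrong at step 2.
async def toy_actor(history: List[str]) -> str:
    await asyncio.sleep(0.05)
    consistent = history == [f"m{j}" for j in range(len(history))]
    return f"m{len(history)}" if consistent else "invalid"


async def toy_speculator(history: List[str]) -> str:
    await asyncio.sleep(0.005)
    return "wrong" if len(history) == 2 else f"m{len(history)}"


print(asyncio.run(run_speculative(toy_actor, toy_speculator, 4)))
# ['m0', 'm1', 'm2', 'm3']
```

Every response appended to the history comes from an Actor call whose input was the true history, so the result equals a purely sequential run; correct guesses only let the next call start earlier. Real agents whose calls have side effects (for example, tool calls that write state) would need an explicit undo step instead of a simple cancel.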
Unified API-call abstraction for agentic environments

The authors propose modeling every action in an agentic system (LLM calls, tool invocations, MCP server requests, and human responses) as an API call. This abstraction provides a unified framework for optimizing system latency and aligns with the emerging environment and MCP perspectives on agentic systems.

10 retrieved papers
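
The unified API-call abstraction in this contribution can be illustrated with a single request type and one dispatch point that records latency uniformly for every kind of action. This is a sketch under assumed names (CallKind, APICall, Dispatcher, and the handler lambdas are hypothetical); the report does not specify the authors' interface.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple


class CallKind(Enum):
    # The four action types the report lists as API calls.
    LLM = "llm"
    TOOL = "tool"
    MCP = "mcp"
    HUMAN = "human"


@dataclass
class APICall:
    kind: CallKind
    endpoint: str
    payload: Dict[str, Any] = field(default_factory=dict)


Handler = Callable[[APICall], Any]


class Dispatcher:
    """Routes every action through one interface and logs its latency."""

    def __init__(self, handlers: Dict[CallKind, Handler]) -> None:
        self.handlers = handlers
        self.latency_log: List[Tuple[CallKind, str, float]] = []

    def call(self, request: APICall) -> Any:
        start = time.perf_counter()
        result = self.handlers[request.kind](request)
        elapsed = time.perf_counter() - start
        self.latency_log.append((request.kind, request.endpoint, elapsed))
        return result


# Hypothetical handlers standing in for real LLM, tool, MCP, and human backends.
dispatcher = Dispatcher({
    CallKind.LLM: lambda c: f"completion for {c.payload['prompt']}",
    CallKind.TOOL: lambda c: f"ran {c.endpoint}",
    CallKind.MCP: lambda c: f"mcp {c.endpoint}",
    CallKind.HUMAN: lambda c: "user reply",
})
print(dispatcher.call(APICall(CallKind.TOOL, "search", {"q": "chess openings"})))  # ran search
```

Because every action passes through the same call method, a speculation layer or latency profiler can be added once rather than separately for each action type.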
Demonstration across multiple agentic environments

The authors instantiate and evaluate their speculative actions framework across four diverse environments (chess gameplay, e-commerce dialogue, multi-hop web search, and OS hyperparameter tuning), demonstrating accurate next-action prediction and significant reductions in end-to-end latency across different types of agent-environment interactions.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, though this signal is still constrained by search coverage and taxonomy granularity.
