Differentiable Model Predictive Control on the GPU

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: differentiable optimization, model predictive control, optimal control, gpu-accelerated optimization, reinforcement learning, imitation learning, robotics
Abstract:

Differentiable model predictive control (MPC) offers a powerful framework for combining learning and control. However, its adoption has been limited by the inherently sequential nature of traditional optimization algorithms, which are challenging to parallelize on modern computing hardware like GPUs. In this work, we tackle this bottleneck by introducing a GPU-accelerated differentiable optimization tool for MPC. This solver leverages sequential quadratic programming and a custom preconditioned conjugate gradient (PCG) routine with tridiagonal preconditioning to exploit the problem's structure and enable efficient parallelization. We demonstrate substantial speedups over CPU- and GPU-based baselines, significantly improving upon state-of-the-art training times on benchmark reinforcement learning and imitation learning tasks. Finally, we showcase the method on the challenging task of reinforcement learning for driving at the limits of handling, where it enables robust drifting of a Toyota Supra through water puddles.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a GPU-accelerated differentiable MPC solver combining sequential quadratic programming with a custom preconditioned conjugate gradient routine featuring tridiagonal preconditioning. It resides in the 'Preconditioned Conjugate Gradient Approaches' leaf, which contains only two papers total (including this one and one sibling). This is a notably sparse research direction within the broader taxonomy of 36 papers across 28 leaf nodes, suggesting the specific combination of GPU acceleration, differentiability, and PCG-based iterative solvers remains relatively underexplored compared to sampling-based or direct shooting methods.

The taxonomy reveals several neighboring directions: the sibling 'Condensed-Space Interior-Point Methods' leaf focuses on eliminating state variables rather than iterative PCG solvers, while the parallel 'Parallel Shooting and Direct Methods' branch employs primal-dual KKT solvers or multilevel SQP without emphasizing PCG preconditioning. Nearby 'Differentiable MPC for End-to-End Learning' nodes integrate MPC with actor-critic frameworks but do not necessarily prioritize iterative linear system solvers. The paper's position bridges gradient-based optimization infrastructure with learning-integrated applications, sitting at the intersection of solver design and end-to-end training pipelines.

Among 16 candidates examined across three contributions, none clearly refute the core claims. The GPU-accelerated differentiable optimization tool examined 7 candidates with 0 refutations; the tridiagonal PCG routine examined 1 candidate with 0 refutations; and the robust drifting application examined 8 candidates with 0 refutations. This limited search scope—focused on top-K semantic matches—suggests that within the examined literature, the specific combination of SQP, custom PCG preconditioning, and differentiability for learning tasks appears distinct. However, the small candidate pool means the analysis does not capture exhaustive prior work in iterative MPC solvers or GPU optimization.

Given the sparse taxonomy leaf and absence of refutations among 16 examined candidates, the work appears to occupy a relatively novel niche within GPU-accelerated MPC. The emphasis on tridiagonal preconditioning for differentiable SQP distinguishes it from both sampling-heavy methods and direct KKT approaches. Nonetheless, the limited search scope and small sibling set mean this assessment reflects positioning within a focused literature subset rather than comprehensive field coverage.

Taxonomy

Core-task Taxonomy Papers: 36
Claimed Contributions: 3
Contribution Candidate Papers Compared: 16
Refutable Papers: 0

Research Landscape Overview

Core task: GPU-accelerated differentiable model predictive control. The field organizes around several complementary directions. Gradient-Based Trajectory Optimization Methods focus on iterative solvers and direct differentiation through dynamics, often leveraging automatic differentiation frameworks to compute sensitivities efficiently. Sampling-Based and Hybrid Methods explore stochastic rollouts and path-integral formulations that naturally parallelize on GPUs, trading off gradient precision for robustness in high-dimensional or uncertain settings. Learning-Integrated MPC Frameworks blend neural network components with classical control loops, using differentiable MPC layers to enable end-to-end training. Domain-Specific MPC Applications tailor these techniques to robotics, autonomous driving, and soft-body systems, while Specialized Optimization and Solver Techniques develop custom algorithms for constrained quadratic programs and nonlinear solvers. Differentiable Simulation and Surrogate Modeling emphasizes physics engines and reduced-order models that support backpropagation, and Hardware Design contributions address VLSI and embedded implementations.

A particularly active line of work targets real-time control for legged robots and manipulators, where GPU parallelism enables rapid trajectory recomputation at high frequencies—examples include Whole Body MPC GPU[4] and GPU Stochastic Predictive Legged[10]. Another contrasting theme is the integration of learned components, as seen in Physics Neural MPC Quadcopter[3] and Adaptive DiffTune MPC[30], which adapt model parameters online.

The original paper, Differentiable MPC GPU[0], sits squarely within the gradient-based optimization branch, specifically employing preconditioned conjugate gradient solvers to handle large-scale linear systems arising from sequential quadratic programming.
This places it close to Mpcgpu[2], which similarly exploits GPU-accelerated iterative methods, but Differentiable MPC GPU[0] emphasizes end-to-end differentiability to support learning pipelines. Compared to sampling-heavy approaches like Feedback-MPPI[14], it trades stochastic exploration for deterministic gradient descent, reflecting a broader tension between computational efficiency and algorithmic flexibility across the taxonomy.

Claimed Contributions

GPU-accelerated differentiable optimization tool for MPC

The authors introduce DiffMPC, a differentiable solver for model predictive control that runs efficiently on GPUs. It uses sequential quadratic programming combined with a custom preconditioned conjugate gradient routine that exploits the sparse-in-time structure of optimal control problems to enable parallelization over time steps.

7 retrieved papers
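To make the "differentiable optimization" idea concrete: for a problem solved to optimality, gradients of a downstream loss with respect to problem parameters can be obtained by one extra linear solve with the same (KKT) matrix used in the forward pass, so the backward pass can reuse the forward solver. The numpy sketch below illustrates this adjoint trick on a toy unconstrained QP with a linearly parameterized cost; it is an illustration of the general technique, not the paper's implementation, and all names (`solve_qp`, `qp_param_grad`, the linear map `B`) are invented for this example.

```python
import numpy as np

def solve_qp(H, q):
    """Unconstrained QP: minimize 0.5 z'Hz + q'z  ->  z* = -H^{-1} q."""
    return np.linalg.solve(H, -q)

def qp_param_grad(H, q_of_theta, dq_dtheta, theta, z_ref):
    """Gradient of L(theta) = 0.5 ||z*(theta) - z_ref||^2 via the adjoint:
    solve H lam = (z* - z_ref), then dL/dtheta = -(dq/dtheta)' lam.
    Note the backward pass solves a system with the SAME matrix H."""
    z_star = solve_qp(H, q_of_theta(theta))
    lam = np.linalg.solve(H, z_star - z_ref)
    return -dq_dtheta(theta).T @ lam

rng = np.random.default_rng(0)
n, p = 6, 3
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)            # SPD Hessian
B = rng.standard_normal((n, p))
q_of_theta = lambda th: B @ th         # cost linear in parameters theta
dq_dtheta = lambda th: B
theta = rng.standard_normal(p)
z_ref = rng.standard_normal(n)

g = qp_param_grad(H, q_of_theta, dq_dtheta, theta, z_ref)

# Sanity check against central finite differences.
eps = 1e-6
g_fd = np.empty(p)
for j in range(p):
    th_p = theta.copy(); th_p[j] += eps
    th_m = theta.copy(); th_m[j] -= eps
    Lp = 0.5 * np.sum((solve_qp(H, q_of_theta(th_p)) - z_ref) ** 2)
    Lm = 0.5 * np.sum((solve_qp(H, q_of_theta(th_m)) - z_ref) ** 2)
    g_fd[j] = (Lp - Lm) / (2 * eps)
max_err = float(np.max(np.abs(g - g_fd)))
```

In a solver like the one described, `np.linalg.solve` would be replaced by the same iterative PCG routine in both the forward and backward passes, which is what makes the layer cheap to differentiate.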
Preconditioned conjugate gradient routine with tridiagonal preconditioning

The authors adapt and implement a PCG routine with tridiagonal preconditioning that solves the KKT linear systems by exploiting the block-tridiagonal structure of the Schur complement. This design enables parallelization over time steps and supports warm-starting, making it suitable for GPU execution.

1 retrieved paper
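The core numerical kernel can be sketched as standard PCG where the preconditioner solve is a tridiagonal system. The serial numpy version below is a minimal illustration of that combination (Thomas algorithm for the tridiagonal preconditioner, warm-startable PCG), not the paper's GPU implementation; the test matrix is a generic diagonally dominant stand-in for a time-structured Schur complement.

```python
import numpy as np

def thomas_solve(lower, diag, upper, d):
    """Solve a tridiagonal system via the Thomas algorithm.
    lower/upper have length n-1; diag and d have length n."""
    n = len(diag)
    cp = np.empty(n - 1)
    dp = np.empty(n)
    cp[0] = upper[0] / diag[0]
    dp[0] = d[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = upper[i] / denom
        dp[i] = (d[i] - lower[i - 1] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def pcg(A, b, M_solve, x0=None, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for SPD A.
    M_solve(r) applies the inverse preconditioner; x0 allows warm starts."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    z = M_solve(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test system with a dominant tridiagonal part plus a dense perturbation.
rng = np.random.default_rng(1)
n = 50
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1)
     + 0.01 * np.ones((n, n)))
b = rng.standard_normal(n)

# Tridiagonal preconditioner: keep only the three central diagonals of A.
lower, diag, upper = np.diag(A, -1).copy(), np.diag(A).copy(), np.diag(A, 1).copy()
M_solve = lambda r: thomas_solve(lower, diag, upper, r)

x = pcg(A, b, M_solve)
```

On the GPU, both the matrix-vector products and the (block-)tridiagonal preconditioner solves parallelize over time steps, which is the structural property the paper exploits.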
Application to robust drifting via domain randomization and RL

The authors demonstrate DiffMPC on a reinforcement learning task for autonomous drifting under model mismatch. They use domain randomization over nonlinear dynamics to learn MPC cost and vehicle parameters, achieving robust drifting through water puddles on a Toyota Supra.

8 retrieved papers
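The domain-randomization setup can be pictured as sampling a fresh dynamics instance per episode and scoring the controller across the resulting population. The sketch below is purely schematic: the parameter names and ranges (`mass`, `mu`, `yaw_inertia`) are invented for illustration, and the one-dimensional "dynamics" are a trivial stand-in for the paper's nonlinear vehicle model.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_vehicle_params():
    """Draw one randomized dynamics instance per training episode
    (hypothetical ranges; low friction mimics water puddles)."""
    return {
        "mass": rng.uniform(1400.0, 1700.0),   # kg
        "mu": rng.uniform(0.3, 1.0),           # tire-road friction
        "yaw_inertia": rng.uniform(2000.0, 3000.0),
    }

def rollout_return(policy_gain, params, horizon=50, dt=0.05):
    """Toy longitudinal model: velocity tracking under randomized friction.
    Returns the negative accumulated tracking cost."""
    v, v_ref, total = 0.0, 10.0, 0.0
    for _ in range(horizon):
        u = policy_gain * (v_ref - v)                       # stand-in "policy"
        accel = params["mu"] * u / (params["mass"] / 1500.0)
        v += dt * accel
        total -= (v - v_ref) ** 2
    return total

# Evaluate one fixed controller across many randomized worlds. An RL loop
# would instead differentiate these returns through the MPC layer to update
# the learned cost and model parameters.
returns = [rollout_return(2.0, sample_vehicle_params()) for _ in range(100)]
mean_return = float(np.mean(returns))
```

In the paper's setting, the randomized quantities are the nonlinear vehicle dynamics, and the learned quantities are MPC cost and vehicle parameters; the loop structure is the same.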

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: GPU-accelerated differentiable optimization tool for MPC (description as above; 7 candidate papers examined, 0 refutations).

Contribution 2: Preconditioned conjugate gradient routine with tridiagonal preconditioning (description as above; 1 candidate paper examined, 0 refutations).

Contribution 3: Application to robust drifting via domain randomization and RL (description as above; 8 candidate papers examined, 0 refutations).