Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
Overview
Overall Novelty Assessment
The paper proposes Optimas, a framework for optimizing compound AI systems by maintaining Local Reward Functions (LRFs) per component that align with global performance. It resides in the 'End-to-End System Alignment' leaf, which contains only three papers total, including this one. This leaf sits within the broader 'System-Level Optimization and Integration' branch, indicating a relatively sparse but active research direction focused on holistic pipeline optimization rather than isolated component tuning. The small cluster size suggests this is an emerging area where methods for coordinating heterogeneous configurations across multi-component systems are still being developed.
The taxonomy reveals that neighboring leaves address related but distinct challenges. 'Retrieval-Augmented Systems Optimization' focuses specifically on query-retrieval-generation pipelines, while 'Multi-Agent Coordination and Orchestration' emphasizes agent-based workflows in production environments. The 'Component-Level Optimization Methods' branch, by contrast, targets individual modules like neural architecture search or parameter-efficient fine-tuning. Optimas bridges these perspectives by proposing a system-level alignment mechanism that operates across heterogeneous components, distinguishing it from purely modular approaches while sharing the end-to-end optimization philosophy with its two sibling papers in the same leaf.
Among the 29 candidates examined, none clearly refutes any of the three core contributions. For the OPTIMAS framework itself, 10 candidates were reviewed with no refutable overlaps; for the adaptive local-global alignment mechanism, another 10 candidates were examined without refutation; and for the theoretical convergence guarantees, 9 candidates were analyzed, again finding no prior work that directly anticipates the results. The search scope is limited to top-K semantic matches and citation expansion, so within the examined literature the specific combination of per-component LRFs with the local-global alignment property appears novel, though a more exhaustive survey might uncover additional related work.
Based on the available signals, Optimas appears to occupy a relatively underexplored niche within system-level optimization, particularly in its approach to maintaining alignment properties across heterogeneous configurations. The sparse taxonomy leaf and the absence of refuting candidates among the 29 examined papers suggest meaningful novelty, though the limited search scope means this assessment reflects only a subset of the broader literature. The framework's emphasis on independent component updates that still ensure global coherence distinguishes it from the end-to-end alignment methods reviewed.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce OPTIMAS, a framework that optimizes compound AI systems by maintaining Local Reward Functions (LRFs) for each component. These LRFs satisfy a local-global alignment property, enabling independent optimization of heterogeneous configurations (prompts, hyperparameters, model parameters) while ensuring local improvements lead to global performance gains.
The authors propose a two-stage approach: initial reward modeling to establish aligned LRFs, followed by online adaptation using mini-batch preference data. This mechanism maintains the local-global alignment property as system configurations evolve during optimization, without requiring expensive full retraining.
The authors prove that LRFs constructed via their method satisfy the local-global alignment property and that OPTIMAS converges to a component-wise maximum under regularity conditions. This provides theoretical justification that aligning local and global rewards enables effective optimization.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[2] Aligning compound AI systems via system-level DPO
[15] Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization
Contribution Analysis
Detailed comparisons for each claimed contribution
OPTIMAS framework for optimizing compound AI systems
The authors introduce OPTIMAS, a framework that optimizes compound AI systems by maintaining Local Reward Functions (LRFs) for each component. These LRFs satisfy a local-global alignment property, enabling independent optimization of heterogeneous configurations (prompts, hyperparameters, model parameters) while ensuring local improvements lead to global performance gains.
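To make the optimization pattern concrete, below is a minimal sketch of a greedy coordinate-style loop in which each component is improved in isolation against its own LRF. All names here (Component, optimize_system, the candidate-proposal callable) are hypothetical illustrations of the behavior described above, not the authors' actual interface.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Component:
    name: str
    config: Any                        # prompt, hyperparameters, or model weights
    candidates: Callable[[Any], list]  # proposes alternative configurations

def optimize_system(components: list[Component],
                    lrfs: dict[str, Callable[[Any], float]],
                    rounds: int = 3) -> None:
    """Improve one component at a time against its local reward function.
    The local-global alignment property is what licenses treating these
    independent local gains as global performance gains."""
    for _ in range(rounds):
        for comp in components:
            score = lrfs[comp.name]
            proposals = comp.candidates(comp.config) + [comp.config]
            best = max(proposals, key=score)
            if score(best) > score(comp.config):
                comp.config = best     # accept only strict local improvements
```

The key design point this sketch captures is that no component's update ever consults another component's reward or the global metric directly; the alignment property carries that burden.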
[51] Hierarchical deep reinforcement learning for multi-objective integrated circuit physical layout optimization with congestion-aware reward shaping
[52] Advances in the application of artificial intelligence in mass spectrometry-based analysis of traditional Chinese medicine: compound identification and metabolic …
[53] AI-driven multi-agent scheduling and service quality optimization in microservice systems
[54] Generative AI and Blockchain-Integrated Multi-Agent Framework for Resilient and Sustainable Fruit Cold-Chain Logistics
[55] AI-SearchPlanner: Modular agentic search via Pareto-optimal multi-objective reinforcement learning
[56] Performance Analysis of Different Reward Functions in Reinforcement Learning for the Scheduling of Modular Automotive Production Systems
[57] Distributed value functions
[58] Decision stacks: Flexible reinforcement learning via modular generative models
[59] Knowledge-Guided Reinforcement Learning for Preventive Maintenance Planning in Economically Dependent Multi-Component Systems
[60] Predictive reward-prediction errors of climbing fiber inputs integrate modular reinforcement learning with supervised learning
Adaptive local-global alignment mechanism for LRFs
The authors propose a two-stage approach: initial reward modeling to establish aligned LRFs, followed by online adaptation using mini-batch preference data. This mechanism maintains the local-global alignment property as system configurations evolve during optimization, without requiring expensive full retraining.
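As a sketch of what one online adaptation step might look like, the snippet below performs a single stochastic update of a reward model on a mini-batch of preference pairs using a Bradley-Terry loss. The linear scorer and the bt_update name are assumptions made for illustration; the paper's LRFs and their exact training objective may differ.

```python
import math

def bt_update(weights: list[float],
              batch: list[tuple[list[float], list[float]]],
              lr: float = 0.1) -> list[float]:
    """One SGD step on the Bradley-Terry preference loss
    -log sigmoid(r(preferred) - r(rejected)) for a linear reward r(x) = w.x.
    Each batch element is a (preferred_features, rejected_features) pair
    collected under the current system configuration."""
    grad = [0.0] * len(weights)
    for chosen, rejected in batch:
        # margin m = r(chosen) - r(rejected)
        margin = sum(w * (c - r) for w, c, r in zip(weights, chosen, rejected))
        coeff = -1.0 / (1.0 + math.exp(margin))   # dL/dm = sigmoid(m) - 1
        for i, (c, r) in enumerate(zip(chosen, rejected)):
            grad[i] += coeff * (c - r)
    return [w - lr * g / len(batch) for w, g in zip(weights, grad)]
```

Because each step touches only a mini-batch of fresh preference pairs, the LRF can track the evolving system configuration without the full retraining the authors aim to avoid.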
[69] GOAL: Global-local Object Alignment Learning
[70] T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
[71] From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
[72] Blockchain-Anchored Reinforcement Learning Collectives with Tokenized Ecosystem Optimization for Trustless, Bias-Free Adaptation of Complex Systems
[73] GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval
[74] Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models
[75] The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
[76] Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval
[77] Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents
[78] Learning Aligned Local Evaluations For Better Credit Assignment In Cooperative Coevolution
Theoretical convergence guarantees for compound system optimization
The authors prove that LRFs constructed via their method satisfy the local-global alignment property and that OPTIMAS converges to a component-wise maximum under regularity conditions. This provides theoretical justification that aligning local and global rewards enables effective optimization.
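One plausible way to formalize the alignment property being claimed (not necessarily the authors' exact statement) is the following, where r_k denotes the LRF of component k, g the end-to-end metric, and c = (c_1, ..., c_K) a system configuration:

```latex
% Hypothetical formalization of local-global alignment; r_k is the LRF of
% component k, g the end-to-end metric, c = (c_1, ..., c_K) a configuration.
\[
  r_k(c_k') > r_k(c_k)
  \;\Longrightarrow\;
  g(c_1, \dots, c_k', \dots, c_K) \ge g(c_1, \dots, c_k, \dots, c_K)
  \quad \text{for every component } k .
\]
% Under this property, a coordinate-ascent loop over components can only
% raise g, so it converges to a configuration that is component-wise
% maximal: no single-component change improves its local reward.
```

Read this way, the convergence claim is the natural consequence: if every locally preferred update is globally non-decreasing, iterating local improvements yields a monotone sequence in g that stops only at a component-wise maximum.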