Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Compound AI System, Heterogeneous Configuration, Optimization, Local Rewards
Abstract:

Compound AI systems that integrate multiple components, such as large language models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and the diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for the effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local–global alignment property: each component's local reward correlates with global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This design enables independent updates of heterogeneous configurations, each using its designated optimization method, while ensuring that local improvements consistently lead to global performance gains. We present extensive evaluations across five real-world compound systems showing that Optimas outperforms strong baselines by an average of 11.92%, offering a general and effective approach for improving compound systems.
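The alternating procedure described in the abstract can be sketched with a toy example. This is an illustrative reconstruction under simplifying assumptions: Optimas learns a reward model per component, whereas here each "local reward" simply evaluates the global metric with the other components frozen, and the component names and configurations below are invented for illustration.

```python
# Toy sketch of alternating per-component optimization with globally aligned
# local rewards. Not the authors' code: the real framework learns an LRF per
# component; here the "local reward" scores a candidate configuration by the
# global metric with the other components held fixed.

def global_metric(configs):
    # Stand-in for end-to-end system performance: peaks at x=1.0, y=2.0.
    x, y = configs["retriever"], configs["generator"]
    return -((x - 1.0) ** 2) - ((y - 2.0) ** 2)

def local_reward(name, value, configs):
    # Globally aligned local reward: evaluate this component's candidate
    # value in the context of the current (frozen) other components.
    trial = dict(configs, **{name: value})
    return global_metric(trial)

def optimize(configs, step=0.5, n_iters=50):
    for _ in range(n_iters):
        for name in configs:  # independent, component-wise updates
            candidates = [configs[name] - step, configs[name], configs[name] + step]
            configs[name] = max(candidates, key=lambda v: local_reward(name, v, configs))
        step *= 0.9  # shrink the local search radius over iterations
    return configs

configs = optimize({"retriever": 0.0, "generator": 0.0})
```

Because each local update is scored by a reward aligned with the global metric, the component-wise steps behave like coordinate ascent on system performance, which is the intuition behind the local–global alignment property.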

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Optimas, a framework for optimizing compound AI systems by maintaining Local Reward Functions (LRFs) per component that align with global performance. It resides in the 'End-to-End System Alignment' leaf, which contains only three papers total, including this one. This leaf sits within the broader 'System-Level Optimization and Integration' branch, indicating a relatively sparse but active research direction focused on holistic pipeline optimization rather than isolated component tuning. The small cluster size suggests this is an emerging area where methods for coordinating heterogeneous configurations across multi-component systems are still being developed.

The taxonomy reveals that neighboring leaves address related but distinct challenges. 'Retrieval-Augmented Systems Optimization' focuses specifically on query-retrieval-generation pipelines, while 'Multi-Agent Coordination and Orchestration' emphasizes agent-based workflows in production environments. The 'Component-Level Optimization Methods' branch, by contrast, targets individual modules like neural architecture search or parameter-efficient fine-tuning. Optimas bridges these perspectives by proposing a system-level alignment mechanism that operates across heterogeneous components, distinguishing it from purely modular approaches while sharing the end-to-end optimization philosophy with its two sibling papers in the same leaf.

Among 29 candidates examined, none clearly refute any of the three core contributions. For the OPTIMAS framework itself, 10 candidates were reviewed with zero refutable overlaps; the adaptive local-global alignment mechanism similarly examined 10 candidates with no refutations; and the theoretical convergence guarantees analyzed 9 candidates, again finding no prior work that directly anticipates these results. This limited search scope—focused on top-K semantic matches and citation expansion—suggests that within the examined literature, the specific combination of per-component LRFs with local-global alignment properties appears novel, though a more exhaustive survey might uncover additional related work.

Based on the available signals, Optimas appears to occupy a relatively underexplored niche within system-level optimization, particularly in its approach to maintaining alignment properties across heterogeneous configurations. The sparse taxonomy leaf and absence of refuting candidates among 29 examined papers suggest meaningful novelty, though the limited search scope means this assessment reflects only a subset of the broader literature. The framework's emphasis on independent component updates while ensuring global coherence distinguishes it from existing end-to-end alignment methods reviewed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: Optimizing compound AI systems with heterogeneous components and configurations. The field addresses the challenge of tuning complex AI pipelines that combine diverse modules, such as retrieval systems, language models, and specialized agents, each with distinct parameters and architectural choices. The taxonomy reveals several complementary perspectives: Component-Level Optimization Methods focus on tuning individual building blocks like neural architectures or parameter-efficient fine-tuning modules, while System-Level Optimization and Integration examines how to align entire pipelines end-to-end, ensuring that improvements in one component translate to better overall performance. Infrastructure and Deployment Optimization tackles practical concerns such as resource allocation and cloud-edge coordination, and Hardware-Aware Optimization considers accelerator-specific constraints. Domain-Specific Applications demonstrate these techniques in areas ranging from retrieval-augmented generation (LLaMA Retrieval[6]) to agentic workflows (Heterogeneous Agentic AI[7]), while Methodological Foundations and Surveys (Compound AI Survey[36]) provide overarching frameworks.

A particularly active line of work explores end-to-end alignment strategies that optimize multi-component systems holistically rather than tuning each module in isolation. System-Level DPO[2] exemplifies this approach by applying preference-based learning across an entire pipeline, while Decision-Focused Learning[15] integrates prediction and decision-making objectives. Optimas[0] sits squarely within this end-to-end alignment cluster, emphasizing joint optimization of heterogeneous components to maximize system-level outcomes. Compared to works like Otter[1], which may focus on specific modular improvements, Optimas[0] addresses the broader challenge of coordinating diverse configurations and ensuring that local tuning decisions benefit the compound system as a whole.

This contrasts with component-centric methods such as Neural Architecture Optimization[3] or BANANAS[5], which primarily target individual module design. The central tension remains how to balance modular flexibility with global coherence, a question that continues to drive research across system-level optimization branches.

Claimed Contributions

OPTIMAS framework for optimizing compound AI systems

The authors introduce OPTIMAS, a framework that optimizes compound AI systems by maintaining Local Reward Functions (LRFs) for each component. These LRFs satisfy a local-global alignment property, enabling independent optimization of heterogeneous configurations (prompts, hyperparameters, model parameters) while ensuring local improvements lead to global performance gains.

10 retrieved papers
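The independent, type-specific updates this contribution describes can be sketched as a dispatch over configuration kinds. Everything below (`update_component`, the candidate-generation rules) is our own illustrative assumption, not the authors' API; the point is only that each configuration type gets its designated optimizer while all of them maximize the same signal, the component's LRF.

```python
# Hypothetical sketch of heterogeneous per-component updates. Each
# configuration kind uses a different update rule, but every rule selects
# the candidate that maximizes the component's local reward function (lrf).

def update_component(kind, config, lrf):
    if kind == "prompt":
        # e.g. choose the best of several candidate prompt edits under the LRF
        candidates = [config, config + " Think step by step."]
        return max(candidates, key=lrf)
    if kind == "hyperparameter":
        # e.g. local search over nearby numeric values
        candidates = [config * 0.5, config, config * 2.0]
        return max(candidates, key=lrf)
    if kind == "model":
        # model parameters would be fine-tuned against the LRF (e.g. via
        # preference optimization); represented here as a no-op placeholder
        return config
    raise ValueError(f"unknown configuration kind: {kind}")
```

Because every update rule consumes only its component's LRF, components can be improved independently and in parallel, which is what makes the heterogeneous setting tractable.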
Adaptive local-global alignment mechanism for LRFs

The authors propose a two-stage approach: initial reward modeling to establish aligned LRFs, followed by online adaptation using mini-batch preference data. This mechanism maintains the local-global alignment property as system configurations evolve during optimization, without requiring expensive full retraining.

10 retrieved papers
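The online adaptation step can be illustrated with a minimal Bradley-Terry-style update on preference pairs. This is a hedged sketch: the paper's LRFs are presumably learned reward models, while here the LRF is a one-parameter linear scorer and the preference pairs are simulated by comparing which candidate is globally better.

```python
# Toy sketch of online LRF adaptation from mini-batch preference data.
# Our reconstruction, not the authors' code: a preference pair (x_win, x_lose)
# records that x_win led to the better global outcome, and the LRF is nudged
# with one SGD step on the Bradley-Terry / logistic preference loss:
#   loss = -log sigmoid(score(x_win) - score(x_lose))
import math
import random

def lrf_score(w, x):
    return w * x  # 1-D linear local reward of a candidate with feature x

def adapt_lrf(w, preference_pairs, lr=0.1):
    for x_win, x_lose in preference_pairs:
        margin = lrf_score(w, x_win) - lrf_score(w, x_lose)
        sigmoid = 1.0 / (1.0 + math.exp(-margin))
        grad = -(1.0 - sigmoid) * (x_win - x_lose)  # d(loss)/dw
        w -= lr * grad
    return w

# Simulate: globally, a larger feature x is better, but the initial LRF
# disagrees (negative weight). Mini-batch preferences gradually realign it.
random.seed(0)
w = -1.0
for _ in range(100):
    a, b = random.random(), random.random()
    x_win, x_lose = (a, b) if a > b else (b, a)  # winner = globally better
    w = adapt_lrf(w, [(x_win, x_lose)])
```

Cheap incremental updates of this kind are why the alignment property can be maintained as configurations evolve, without the full retraining the contribution statement rules out.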
Theoretical convergence guarantees for compound system optimization

The authors prove that LRFs constructed via their method satisfy the local-global alignment property and that OPTIMAS converges to component-wise maximum under regularity conditions. This provides theoretical justification that aligning local and global rewards enables effective optimization.

9 retrieved papers
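One plausible way to formalize the local–global alignment property underlying these guarantees (our reconstruction from the description above, not the paper's exact theorem statement):

```latex
% Illustrative formalization; symbols and conditions are our assumptions.
% Let c = (c_1, \dots, c_K) be the component configurations, G(c) the global
% metric, and r_k the local reward function of component k. Local--global
% alignment asks that, holding the other components c_{-k} fixed, local
% preferences agree with global preferences:
\[
  r_k(c_k) > r_k(c_k') \;\Longrightarrow\;
  \mathbb{E}\left[ G(c_k, c_{-k}) \right] \ge
  \mathbb{E}\left[ G(c_k', c_{-k}) \right]
  \qquad \text{for all } k,\; c_k,\; c_k'.
\]
% Under such a property, component-wise maximization of each r_k acts as a
% coordinate-ascent step on G, so iterating over components cannot degrade
% the global objective, which is the intuition behind convergence to a
% component-wise maximum.
```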

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

