Abstract:

Large language models (LLMs) are increasingly deployed as part of compound AI systems that coordinate multiple modules (e.g., retrievers, tools, verifiers) over long-horizon workflows. Although recent frameworks that propagate textual feedback globally (e.g., TextGrad) make it feasible to optimize such pipelines, we identify two depth-scaling failure modes in long-horizon agentic workflows: 1) exploding textual gradients, where textual feedback grows exponentially with depth, producing prohibitively long messages and amplifying evaluation biases; and 2) vanishing textual gradients, where limited long-context ability causes models to overemphasize recent or early feedback, while compression of lengthy feedback causes downstream messages to gradually lose specificity as they propagate many hops upstream. To mitigate these issues, we introduce Textual Equilibrium Propagation (TEP), a local learning principle inspired by Equilibrium Propagation in energy-based models. TEP comprises two phases: 1) a free phase, in which local LLM critics iteratively refine prompts until reaching equilibrium (no further improvements are suggested); and 2) a nudged phase, which applies proximal prompt edits with bounded modification intensity, using task-level objectives that propagate via forward signaling rather than backward feedback chains. This design supports local prompt optimization followed by controlled adaptation toward global goals, without the computational burden and signal degradation of global textual backpropagation. Across long-horizon QA benchmarks and a multi-agent tool-use dataset, TEP consistently improves accuracy and efficiency over global propagation methods such as TextGrad, with gains that increase at greater depths, while preserving the practicality of black-box LLM components in deep compound AI systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Textual Equilibrium Propagation (TEP) for optimizing prompts in deep compound AI systems, addressing failure modes in long-horizon workflows. It resides in the Global Gradient-Based Optimization leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of 41 papers across the field, suggesting the work targets an emerging problem space where gradient-inspired optimization methods for multi-module LLM pipelines are still being actively developed.

The taxonomy reveals that prompt optimization for compound systems divides into global versus local strategies, with TEP's leaf focusing on end-to-end feedback propagation. Neighboring leaves include Local Optimization Strategies (module-by-module tuning) and Joint Fine-Tuning approaches (simultaneous weight and prompt updates). The scope note explicitly distinguishes global gradient flow from local methods, positioning TEP alongside one sibling paper that also propagates feedback across all modules. Related branches on Multi-Stage Frameworks and Infrastructure address architectural patterns rather than optimization mechanics, indicating TEP's focus on the optimization algorithm itself rather than system design.

Among the 30 candidates examined through semantic search, none clearly refuted any of the three contributions. Ten candidates were examined for the identification of exploding and vanishing textual gradient failure modes, with zero refutations; the same held for the TEP method itself and for the empirical validation component. This suggests that, within the limited search scope, the specific framing of depth-scaling failures and the equilibrium-based solution appear distinct from prior work. However, the analysis explicitly notes that this is not an exhaustive literature review, leaving open the possibility of relevant work outside the top-30 semantic matches.

Based on the limited search scope, the work appears to occupy a sparsely populated research direction with novel problem framing. The taxonomy structure shows only one sibling paper in the same optimization category, and no examined candidates provided overlapping prior work. The analysis covers top-30 semantic matches plus citation expansion but does not claim exhaustive coverage of all gradient-based prompt optimization literature.

Taxonomy

Core-task Taxonomy Papers: 41
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: optimizing prompts in deep compound AI systems. Modern AI applications increasingly rely on multi-module pipelines where language models are chained together, each stage consuming the output of previous modules and producing inputs for downstream components. The taxonomy reveals several major branches addressing this complexity. Prompt Optimization Methods for Multi-Module Systems focuses on techniques that treat entire pipelines as differentiable or searchable structures, enabling end-to-end tuning across modules. Multi-Stage Prompt Engineering Frameworks and the Infrastructure and Orchestration branches emphasize architectural patterns and tooling for managing these cascaded systems, while Prompt Optimization Search and Meta-Learning explores automated discovery of effective prompt configurations. Domain-Specific Prompt Applications demonstrates how these methods adapt to specialized fields, and branches on Design Principles, Security, and Deployment address formalization, robustness, and practical integration challenges.

Within the optimization methods, a particularly active line of work pursues gradient-based or gradient-inspired techniques that propagate feedback through non-differentiable language model boundaries. Textual Equilibrium Propagation[0] exemplifies this global gradient-based optimization approach, drawing on equilibrium propagation principles to update prompts across deep compound systems. It shares conceptual ground with Backpropagating Language Feedback[2], which similarly aims to flow optimization signals backward through multi-stage pipelines, and contrasts with more modular approaches like Optimizing Instructions Demonstrations[1] that tune individual components separately. These gradient-oriented methods face the fundamental challenge of bridging discrete text generation with continuous optimization, a trade-off that distinguishes them from search-based or reinforcement-learning alternatives found elsewhere in the taxonomy.
Textual Equilibrium Propagation[0] sits squarely in this emerging cluster, contributing a biologically-inspired mechanism for end-to-end prompt refinement in systems where traditional backpropagation is unavailable.

Claimed Contributions

Identification of exploding and vanishing textual gradient failure modes

The authors identify and formalize two critical depth-dependent failure modes in global textual backpropagation for compound AI systems: exploding textual gradients (where feedback grows exponentially with depth) and vanishing textual gradients (where compression causes loss of specificity). These failure modes limit the scalability of existing optimization methods in deep workflows.

10 retrieved papers
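The exploding-gradient claim can be made concrete with a toy back-of-the-envelope sketch (this is our own illustration, not the paper's analysis): if each module's feedback to its predecessor embeds the critiques received from all of its successors, then with an assumed branching factor and per-module critique length, the feedback reaching a module several hops upstream grows geometrically with depth.

```python
# Toy illustration (hypothetical parameters, not the paper's model): in
# global textual backpropagation, each module's upstream feedback embeds
# the critiques of all its downstream consumers. With branching factor b
# and a fixed per-module critique length c, uncompressed feedback k hops
# upstream accumulates roughly geometrically -- the "exploding textual
# gradient". Compressing it instead trades this growth for the loss of
# specificity described as the "vanishing textual gradient".

def feedback_tokens(depth: int, branching: int = 2, critique_len: int = 200) -> int:
    """Rough token count of the aggregated feedback reaching a module
    `depth` hops from the output, assuming no compression."""
    total = 0
    for hop in range(depth + 1):
        # Each hop fans in `branching**hop` critiques of `critique_len` tokens.
        total += critique_len * branching ** hop
    return total

# e.g. feedback_tokens(6, 2, 200) -> 25400 tokens six hops upstream,
# versus 600 at depth 1: message length blows up with pipeline depth.
```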
Textual Equilibrium Propagation (TEP) method

The authors introduce TEP, a local learning principle inspired by Equilibrium Propagation in energy-based models. TEP consists of two phases: a free phase where local LLM critics iteratively refine prompts until equilibrium, and a nudged phase that applies bounded prompt modifications guided by task objectives via forward signaling rather than backward feedback chains.

10 retrieved papers
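The two-phase loop described above can be sketched as follows. This is a minimal reading of the claimed method, under our own assumptions about the interfaces: `critic`, `edit`, and `task_signal` are hypothetical stand-ins for black-box LLM calls, and `beta` stands in for the bounded edit intensity; the paper's actual prompts and critics are not reproduced here.

```python
# Hypothetical sketch of one TEP update over a pipeline's prompts.

def tep_step(prompts, critic, edit, task_signal, max_free_iters=5, beta=0.1):
    # Free phase: each module's local critic refines its own prompt until
    # it suggests no further improvement (a textual "equilibrium").
    for i, p in enumerate(prompts):
        for _ in range(max_free_iters):
            suggestion = critic(i, p)   # purely local feedback, no backward chain
            if suggestion is None:      # equilibrium reached for this module
                break
            p = edit(p, suggestion)
        prompts[i] = p

    # Nudged phase: proximal, bounded edits driven by the task-level
    # objective, delivered to every module as a forward signal rather
    # than as a chain of critiques propagated backward through the pipeline.
    signal = task_signal(prompts)
    for i, p in enumerate(prompts):
        prompts[i] = edit(p, signal, strength=beta)  # beta bounds edit intensity
    return prompts
```

Because each free-phase iteration touches only one module's prompt, the cost per update stays flat as depth grows, which is consistent with the report's claim that TEP avoids the signal degradation of global textual backpropagation.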
Comprehensive empirical validation across multiple benchmarks

The authors provide extensive experimental validation showing that TEP consistently outperforms TextGrad and other baselines across diverse compound AI benchmarks including PubMedQA, STARK-PRIME, HotpotQA, and BigCodeBench, with performance gains that increase as workflow depth grows.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Identification of exploding and vanishing textual gradient failure modes


Contribution

Textual Equilibrium Propagation (TEP) method


Contribution

Comprehensive empirical validation across multiple benchmarks
