Discovering Novel LLM Experts via Task-Capability Coevolution

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Models (LLMs), Minimal Criterion Coevolution, Evolutionary Model Merging, Synthetic Data, Quality-Diversity, Open-endedness
Abstract:

Frontier model developers aim to continually train models that possess emergent, diverse capabilities. To extend capabilities, the current pre-training and post-training paradigm requires manually launching new training runs with static datasets or reward functions each time. Addressing this limitation, our work pursues the insight that open-endedness (via the coevolution of models and tasks) can discover models with increasingly novel skills in a single run. We introduce a new model development framework that extends coevolution to large language model (LLM) discovery: open-ended Assessment Coevolving with Diverse Capabilities (AC/DC). AC/DC evolves both LLMs, via model merging, and natural-language tasks, via synthetic data generation. AC/DC discovers growing archives of LLMs that surpass the capabilities of larger LLMs while occupying less GPU memory. In particular, our LLM populations achieve broader Coverage of expertise than other curated models or baselines on downstream benchmarks, without any explicit benchmark optimization. Furthermore, AC/DC improves Coverage over time, continually innovates on tasks and models, and improves performance in multi-agent best-of-N selection. Our findings highlight the potential of coevolution as a means of discovering broader sets of capabilities from base LLMs. Overall, AC/DC brings us one step closer to a profoundly new paradigm of LLM development, in which continual improvements to the diversity of model capabilities are accelerated by leveraging existing models as stepping stones to increasingly powerful ones.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces AC/DC, a framework that coevolves language models through model merging and synthetic tasks through data generation. It occupies the 'Task-Model Coevolution with Synthetic Data' leaf within the 'Coevolutionary and Population-Based Model Development' branch. Notably, this leaf contains only the original paper itself; no sibling papers exist in the taxonomy. This suggests the specific combination of joint task-model coevolution via synthetic data generation and model merging represents a relatively sparse research direction within the examined literature.

The taxonomy reveals neighboring work in related but distinct areas. The sibling leaf 'Population-Based LLM Evolution' explores evolutionary operations on model populations without joint task coevolution, while 'Code Generation with Program-Test Coevolution' applies coevolutionary principles to a specialized domain. The adjacent 'Model Merging and Knowledge Integration' branch focuses on weight fusion without iterative evolutionary selection. The taxonomy's scope notes clarify that AC/DC's joint coevolution of both models and tasks distinguishes it from methods that evolve only one component or merge models statically.

Among the 22 candidate papers examined, one of the three candidates compared against the AC/DC framework contribution was judged able to refute it, suggesting some prior work on coevolutionary approaches. For the discovery of diverse LLM collectives, nine candidates were examined and none clearly refuted the contribution, indicating this aspect may be more novel within the limited search scope. For the continual open-ended improvement claim, ten candidates were examined, again with no clear refutations. These statistics suggest varying degrees of prior work across the contributions, though the search scope remains modest relative to the broader literature.

Based on the top-22 semantic matches examined, the work appears to occupy a relatively unexplored intersection of coevolution, synthetic task generation, and model merging. The single-paper taxonomy leaf and limited refutable candidates suggest novelty, though the analysis does not cover exhaustive literature review or all potential related work in evolutionary computation, synthetic data generation, or model fusion domains.

Taxonomy

Core-task Taxonomy Papers: 7
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 1

Research Landscape Overview

Core task: coevolutionary discovery of diverse language model experts through model merging and synthetic task generation. The field structure reflects a convergence of evolutionary computation, modular neural architectures, and adaptive learning paradigms.

The taxonomy organizes work into four main branches: Coevolutionary and Population-Based Model Development explores how populations of models and tasks can jointly evolve, often using synthetic data generation to drive specialization; Model Merging and Knowledge Integration focuses on techniques for combining pretrained models or their components to create new capabilities without retraining from scratch; Expert Specialization and Modular Architectures examines methods for building systems with distinct expert modules that handle different subtasks; and Multi-Task Learning and Instruction Tuning addresses how models can be trained or adapted to handle diverse objectives simultaneously. These branches are interconnected, as evolutionary approaches often rely on merging operations, while expert systems benefit from multi-task learning strategies.

Recent work highlights several active research directions and trade-offs. Population-based methods like CoCoEvo[2] and Nature-Inspired Population Evolution[5] demonstrate how evolutionary algorithms can discover diverse model populations, while Branch-Train-Merge[3] and Knowledge Fusion Evolving[4] show how merging strategies can efficiently create specialized experts. A key tension exists between fully coevolutionary approaches that jointly optimize tasks and models and more modular strategies that separate expert discovery from task design. Task-Capability Coevolution[0] sits squarely within the coevolutionary paradigm, emphasizing the mutual adaptation of synthetic tasks and model experts through iterative generation and selection. Compared to CoCoEvo[2], which also explores task-model coevolution, Task-Capability Coevolution[0] places stronger emphasis on synthetic task generation as a driver of diversity. Meanwhile, approaches like Self-MOE[6] focus more narrowly on expert specialization without the coevolutionary task generation component, highlighting different pathways toward building diverse model populations.

Claimed Contributions

AC/DC framework for coevolving LLMs and synthetic tasks

The authors propose AC/DC, a framework that simultaneously evolves populations of LLMs through model merging and synthetic tasks through data generation. This coevolutionary approach enables continuous discovery of diverse model capabilities in a single run without explicit benchmark optimization.

3 retrieved papers
Can Refute
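The described loop (vary models by merging, vary tasks by generation, keep both under a minimal criterion) can be sketched in toy form. Everything below is an illustrative assumption rather than the paper's implementation: models are plain parameter vectors, "merging" is linear interpolation, and tasks are (skill axis, difficulty) pairs, whereas the real AC/DC operates on LLM weights and natural-language tasks.

```python
import random

def merge(parent_a, parent_b, alpha=0.5):
    """Toy 'model merge': element-wise interpolation of parameter vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]

def solves(model, task):
    """Toy capability check: skill on the task's axis must meet its difficulty."""
    axis, difficulty = task
    return model[axis] >= difficulty

def coevolve(models, tasks, generations=10, rng=random.Random(0)):
    for _ in range(generations):
        # Vary: merge two parent models; perturb an existing task's difficulty.
        child = merge(rng.choice(models), rng.choice(models), rng.random())
        axis, diff = rng.choice(tasks)
        new_task = (axis, diff + rng.uniform(-0.1, 0.2))
        # Minimal-criterion selection: a child survives if it solves at
        # least one task; a task survives if some, but not all, models
        # solve it (learnable yet still discriminative).
        if any(solves(child, t) for t in tasks):
            models.append(child)
        solvers = sum(solves(m, new_task) for m in models)
        if 0 < solvers < len(models):
            tasks.append(new_task)
    return models, tasks
```

Starting from two orthogonal specialists, e.g. `coevolve([[1.0, 0.0], [0.0, 1.0]], [(0, 0.5), (1, 0.5)])`, the loop grows both populations without any benchmark being optimized directly, which is the structural point of the minimal-criterion design.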
Discovery of diverse LLM collectives with broader coverage than larger models

The method discovers populations of smaller LLMs that collectively cover more skills and solve more out-of-distribution benchmark tasks than larger individual models or manually curated expert ensembles, while using fewer total parameters and without optimizing for those benchmarks.

9 retrieved papers
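The Coverage claim rests on a union-style metric: a population covers a benchmark task if any member solves it, so several small specialists can outscore one larger generalist. A minimal sketch of such a metric follows; the function name and the toy per-model solved-task sets are assumptions, not the paper's definition.

```python
def coverage(population_solved, n_tasks):
    """Fraction of benchmark tasks solved by at least one model in the
    population (a union over members, not an average)."""
    solved_by_any = set().union(*population_solved) if population_solved else set()
    return len(solved_by_any) / n_tasks

# Three small specialists vs. one generalist on a 10-task benchmark.
specialists = [{0, 1, 2}, {3, 4, 5}, {6, 7}]   # union covers 8 of 10 tasks
generalist = [{0, 1, 2, 3, 4, 5}]              # covers 6 of 10 tasks
```

Under this metric the specialist population wins (0.8 vs. 0.6) even though no individual specialist matches the generalist, mirroring the collective-coverage argument in the claim.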
Demonstration of continual open-ended improvement in LLM capabilities

The authors show that their coevolutionary process leads to ongoing improvements in collective model performance over successive generations, with evidence of sustained innovation in both task complexity and model capabilities throughout the evolutionary run.

10 retrieved papers
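One simple mechanism consistent with monotone, open-ended improvement is a novelty-gated archive: a candidate model is admitted only if it solves at least one task that no archived model solves, so archive-level coverage never decreases across generations. This is an illustrative sketch of that property, not necessarily the paper's acceptance rule.

```python
def update_archive(archive, candidate_solved):
    """Admit a candidate (its set of solved tasks) only if it solves a task
    no archived model solves; archive coverage is then non-decreasing."""
    covered = set().union(*archive) if archive else set()
    if candidate_solved - covered:        # contributes something new
        archive.append(candidate_solved)
        return True
    return False

archive = []
update_archive(archive, {0, 1})   # admitted: first entrant
update_archive(archive, {1})      # rejected: nothing new
update_archive(archive, {1, 2})   # admitted: task 2 is new
```

Because admission requires strictly new coverage, the union of solved tasks can only grow, which is the kind of sustained-innovation signal the contribution describes.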

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

AC/DC framework for coevolving LLMs and synthetic tasks

The authors propose AC/DC, a framework that simultaneously evolves populations of LLMs through model merging and synthetic tasks through data generation. This coevolutionary approach enables continuous discovery of diverse model capabilities in a single run without explicit benchmark optimization.

Contribution

Discovery of diverse LLM collectives with broader coverage than larger models

The method discovers populations of smaller LLMs that collectively cover more skills and solve more out-of-distribution benchmark tasks than larger individual models or manually curated expert ensembles, while using fewer total parameters and without optimizing for those benchmarks.

Contribution

Demonstration of continual open-ended improvement in LLM capabilities

The authors show that their coevolutionary process leads to ongoing improvements in collective model performance over successive generations, with evidence of sustained innovation in both task complexity and model capabilities throughout the evolutionary run.