Discovering Novel LLM Experts via Task-Capability Coevolution

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Models (LLMs), Minimal Criterion Coevolution, Evolutionary Model Merging, Synthetic Data, Quality-Diversity, Open-endedness
Abstract:

Frontier model developers aim to continually train models that possess emergent, diverse capabilities. To extend capabilities, the current pre-training and post-training paradigm requires manually launching new training runs with static datasets or reward functions each time. Addressing this limitation, our work pursues the insight that open-endedness (via the coevolution of models and tasks) can discover models with increasingly novel skills in a single run. We introduce a new model development framework that extends coevolution to large language model (LLM) discovery: open-ended Assessment Coevolving with Diverse Capabilities (AC/DC). AC/DC evolves both LLMs, via model merging, and natural-language tasks, via synthetic data generation. AC/DC discovers growing archives of LLMs that surpass the capabilities of larger LLMs while occupying less GPU memory. In particular, our LLM populations achieve broader Coverage of expertise than other curated models or baselines on downstream benchmarks, without any explicit benchmark optimization. Furthermore, AC/DC improves Coverage over time, continually innovates on tasks and models, and improves performance in multi-agent best-of-N selection. Our findings highlight the potential of coevolution as a means of discovering broader sets of capabilities from base LLMs. Overall, AC/DC brings us one step closer to a profoundly new paradigm of LLM development, in which continual improvements to the diversity of model capabilities are accelerated by leveraging existing models as stepping stones to increasingly powerful ones.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces AC/DC, a framework that coevolves language models through model merging and synthetic tasks through data generation. It occupies the 'Task-Model Coevolution with Synthetic Data' leaf within the 'Coevolutionary and Population-Based Model Development' branch. Notably, this leaf contains only the original paper itself; no sibling papers exist in the taxonomy. This suggests the specific combination of joint task-model coevolution via synthetic data generation and model merging represents a relatively sparse research direction within the examined literature.

The taxonomy reveals neighboring work in related but distinct areas. The sibling leaf 'Population-Based LLM Evolution' explores evolutionary operations on model populations without joint task coevolution, while 'Code Generation with Program-Test Coevolution' applies coevolutionary principles to a specialized domain. The adjacent 'Model Merging and Knowledge Integration' branch focuses on weight fusion without iterative evolutionary selection. The taxonomy's scope notes clarify that AC/DC's joint coevolution of both models and tasks distinguishes it from methods that evolve only one component or merge models statically.

Among the 22 candidate papers examined, one of the three candidates compared against the AC/DC framework contribution was judged able to refute it, suggesting some prior work on coevolutionary approaches. For the discovery of diverse LLM collectives, nine candidates were examined and none clearly refuted the contribution, indicating this aspect may be more novel within the limited search scope. For the continual open-ended improvement claim, ten candidates were examined, again with no clear refutations. These statistics suggest varying degrees of prior work across the contributions, though the search scope remains modest relative to the broader literature.

Based on the top-22 semantic matches examined, the work appears to occupy a relatively unexplored intersection of coevolution, synthetic task generation, and model merging. The single-paper taxonomy leaf and limited refutable candidates suggest novelty, though the analysis does not cover exhaustive literature review or all potential related work in evolutionary computation, synthetic data generation, or model fusion domains.

Taxonomy

Core-task Taxonomy Papers: 7
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 1

Research Landscape Overview

Core task: coevolutionary discovery of diverse language model experts through model merging and synthetic task generation. The field structure reflects a convergence of evolutionary computation, modular neural architectures, and adaptive learning paradigms.

The taxonomy organizes work into four main branches: Coevolutionary and Population-Based Model Development explores how populations of models and tasks can jointly evolve, often using synthetic data generation to drive specialization; Model Merging and Knowledge Integration focuses on techniques for combining pretrained models or their components to create new capabilities without retraining from scratch; Expert Specialization and Modular Architectures examines methods for building systems with distinct expert modules that handle different subtasks; and Multi-Task Learning and Instruction Tuning addresses how models can be trained or adapted to handle diverse objectives simultaneously. These branches are interconnected, as evolutionary approaches often rely on merging operations, while expert systems benefit from multi-task learning strategies.

Recent work highlights several active research directions and trade-offs. Population-based methods like CoCoEvo[2] and Nature-Inspired Population Evolution[5] demonstrate how evolutionary algorithms can discover diverse model populations, while Branch-Train-Merge[3] and Knowledge Fusion Evolving[4] show how merging strategies can efficiently create specialized experts. A key tension exists between fully coevolutionary approaches that jointly optimize tasks and models and more modular strategies that separate expert discovery from task design. Task-Capability Coevolution[0] sits squarely within the coevolutionary paradigm, emphasizing the mutual adaptation of synthetic tasks and model experts through iterative generation and selection. Compared to CoCoEvo[2], which also explores task-model coevolution, Task-Capability Coevolution[0] places stronger emphasis on synthetic task generation as a driver of diversity. Meanwhile, approaches like Self-MOE[6] focus more narrowly on expert specialization without the coevolutionary task generation component, highlighting different pathways toward building diverse model populations.

Claimed Contributions

AC/DC framework for coevolving LLMs and synthetic tasks

The authors propose AC/DC, a framework that simultaneously evolves populations of LLMs through model merging and synthetic tasks through data generation. This coevolutionary approach enables continuous discovery of diverse model capabilities in a single run without explicit benchmark optimization.

3 retrieved papers
Can Refute
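The described loop (vary models by merging, vary tasks by generation, keep both under a minimal criterion) can be sketched in toy form. Everything below is an illustrative assumption rather than the paper's implementation: models are plain parameter vectors, "merging" is linear interpolation, and tasks are (skill axis, difficulty) pairs, whereas the real AC/DC operates on LLM weights and natural-language tasks.

```python
import random

def merge(parent_a, parent_b, alpha=0.5):
    """Toy 'model merge': element-wise interpolation of parameter vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]

def solves(model, task):
    """Toy capability check: skill on the task's axis must meet its difficulty."""
    axis, difficulty = task
    return model[axis] >= difficulty

def coevolve(models, tasks, generations=10, rng=random.Random(0)):
    for _ in range(generations):
        # Vary: merge two parent models; perturb an existing task's difficulty.
        child = merge(rng.choice(models), rng.choice(models), rng.random())
        axis, diff = rng.choice(tasks)
        new_task = (axis, diff + rng.uniform(-0.1, 0.2))
        # Minimal-criterion selection: a child survives if it solves at
        # least one task; a task survives if some, but not all, models
        # solve it (learnable yet still discriminative).
        if any(solves(child, t) for t in tasks):
            models.append(child)
        solvers = sum(solves(m, new_task) for m in models)
        if 0 < solvers < len(models):
            tasks.append(new_task)
    return models, tasks
```

Starting from two orthogonal specialists, e.g. `coevolve([[1.0, 0.0], [0.0, 1.0]], [(0, 0.5), (1, 0.5)])`, the loop grows both populations without any benchmark being optimized directly, which is the structural point of the minimal-criterion design.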
Discovery of diverse LLM collectives with broader coverage than larger models

The method discovers populations of smaller LLMs that collectively cover more skills and solve more out-of-distribution benchmark tasks than larger individual models or manually curated expert ensembles, while using fewer total parameters and without optimizing for those benchmarks.

9 retrieved papers
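The Coverage claim rests on a union-style metric: a population covers a benchmark task if any member solves it, so several small specialists can outscore one larger generalist. A minimal sketch of such a metric follows; the function name and the toy per-model solved-task sets are assumptions, not the paper's definition.

```python
def coverage(population_solved, n_tasks):
    """Fraction of benchmark tasks solved by at least one model in the
    population (a union over members, not an average)."""
    solved_by_any = set().union(*population_solved) if population_solved else set()
    return len(solved_by_any) / n_tasks

# Three small specialists vs. one generalist on a 10-task benchmark.
specialists = [{0, 1, 2}, {3, 4, 5}, {6, 7}]   # union covers 8 of 10 tasks
generalist = [{0, 1, 2, 3, 4, 5}]              # covers 6 of 10 tasks
```

Under this metric the specialist population wins (0.8 vs. 0.6) even though no individual specialist matches the generalist, mirroring the collective-coverage argument in the claim.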
Demonstration of continual open-ended improvement in LLM capabilities

The authors show that their coevolutionary process leads to ongoing improvements in collective model performance over successive generations, with evidence of sustained innovation in both task complexity and model capabilities throughout the evolutionary run.

10 retrieved papers
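One simple mechanism consistent with monotone, open-ended improvement is a novelty-gated archive: a candidate model is admitted only if it solves at least one task that no archived model solves, so archive-level coverage never decreases across generations. This is an illustrative sketch of that property, not necessarily the paper's acceptance rule.

```python
def update_archive(archive, candidate_solved):
    """Admit a candidate (its set of solved tasks) only if it solves a task
    no archived model solves; archive coverage is then non-decreasing."""
    covered = set().union(*archive) if archive else set()
    if candidate_solved - covered:        # contributes something new
        archive.append(candidate_solved)
        return True
    return False

archive = []
update_archive(archive, {0, 1})   # admitted: first entrant
update_archive(archive, {1})      # rejected: nothing new
update_archive(archive, {1, 2})   # admitted: task 2 is new
```

Because admission requires strictly new coverage, the union of solved tasks can only grow, which is the kind of sustained-innovation signal the contribution describes.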

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

AC/DC framework for coevolving LLMs and synthetic tasks

The authors propose AC/DC, a framework that simultaneously evolves populations of LLMs through model merging and synthetic tasks through data generation. This coevolutionary approach enables continuous discovery of diverse model capabilities in a single run without explicit benchmark optimization.

Contribution

Discovery of diverse LLM collectives with broader coverage than larger models

The method discovers populations of smaller LLMs that collectively cover more skills and solve more out-of-distribution benchmark tasks than larger individual models or manually curated expert ensembles, while using fewer total parameters and without optimizing for those benchmarks.

Contribution

Demonstration of continual open-ended improvement in LLM capabilities

The authors show that their coevolutionary process leads to ongoing improvements in collective model performance over successive generations, with evidence of sustained innovation in both task complexity and model capabilities throughout the evolutionary run.