Composer: A Search Framework for Hybrid Neural Architecture Design

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: neural architecture search, hybrid models, efficient ML
Abstract:

Hybrid model architectures that combine computational primitives (e.g., Attention, MLP) in different ratios have shown promising performance beyond Transformers. Several studies have also shown that different interleavings of primitives affect model quality. However, prior work explores the hybrid model architecture design space manually. Given the size of the design space and the cost of training, discovering hybrid models that combine key computational primitives for pre-training is challenging. In this work, we take a principled approach to designing a modular hybrid model architecture search framework, Composer. Composer explores model architectures at a small scale and extrapolates the top-performing architectures to larger scales using our proposed scaling strategies. Using Composer, we discover new hybrid LLM architectures that outperform Llama 3.2. Compared to Llama 3.2 and previous state-of-the-art baselines, the new architectures consistently reduce validation loss at parameter scales of 350M-3B and improve evaluation accuracy on downstream tasks by 2.8-8.3% (1.1-3.1% on average), while improving both training and inference efficiency.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Composer, a modular framework for discovering hybrid language model architectures by combining computational primitives such as Attention and MLP in varied interleavings. It resides in the 'Multi-Primitive Hybrid Architecture Search' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This positioning suggests the work addresses a focused problem—automated search over multi-primitive compositions—rather than competing in a densely populated subfield. The small sibling set implies limited direct competition but also highlights that systematic exploration of hybrid primitive combinations remains an emerging area.

The taxonomy tree reveals that Composer's leaf sits within the 'Hybrid Architecture Design and Search Methods' branch, which also includes evolutionary NAS, differentiable NAS, and LLM-guided search paradigms. Neighboring leaves such as 'Evolutionary and Genetic Algorithm-Based NAS' (four papers) and 'Differentiable and Gradient-Based NAS' (two papers) explore alternative search strategies but do not emphasize multi-primitive composition as explicitly. The 'Domain-Specific Architecture Design' branch addresses task-tailored designs, while 'Efficiency and Compression' focuses on post-hoc optimization. Composer's scope note excludes single-primitive architectures and non-search manual designs, clarifying that it targets automated discovery of hybrid primitives rather than efficiency-driven compression or task-specific tuning.

Among the 25 candidates examined, none clearly refutes any of Composer's three contributions. For the core framework contribution, 10 candidates were reviewed with zero refutable overlaps; for the scaling strategy contribution, 10 candidates were examined with the same outcome; and for the composite interleaving patterns contribution, 5 candidates were reviewed, again finding no clear prior work. This suggests that, within the limited search scope (top-K semantic matches plus citation expansion), the specific combination of modular search, scaling extrapolation, and advanced interleaving patterns appears relatively novel. However, the analysis does not claim exhaustive coverage, and the small candidate pool means undiscovered overlaps remain possible.

Given the sparse taxonomy leaf and the absence of refutable candidates among 25 examined papers, Composer appears to occupy a distinct niche in hybrid architecture search. The limited search scope and small sibling set mean this assessment reflects top-K semantic proximity rather than comprehensive field coverage. Future work might uncover related efforts in adjacent communities or unpublished preprints, but the current signals point to a contribution that extends beyond incremental refinement of existing multi-primitive search methods.

Taxonomy

Core-task Taxonomy Papers: 47
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 0

Research Landscape Overview

Core task: hybrid neural architecture search for language models. The field encompasses diverse strategies for discovering and optimizing neural architectures tailored to language understanding and generation. The taxonomy reveals several major branches: Hybrid Architecture Design and Search Methods explores combinations of different architectural primitives and search strategies; Domain-Specific Architecture Design targets particular language tasks such as translation or question answering; Efficiency and Compression addresses resource constraints through pruning, quantization, and lightweight designs; Automated Machine Learning with LLMs leverages large language models to guide architecture decisions; Meta-Learning and Adaptive Architecture Frameworks focuses on learning-to-learn paradigms; NAS Surveys provide methodological overviews; and Application-Specific NAS tailors architectures to specialized domains. Representative works like Transformer NAS Survey[7] and DNN Design Survey[19] offer broad perspectives, while methods such as LLaMA-NAS[1] and Mixed-Precision BERT[26] exemplify targeted optimization for language models.

Within the Hybrid Architecture Design branch, a particularly active line of work investigates multi-primitive search spaces that blend convolutional, recurrent, and attention-based components. Composer[0] sits squarely in this cluster, emphasizing the automated discovery of hybrid architectures that combine diverse building blocks for language tasks. Nearby, Jet-Nemotron[17] and Pretrained Hybrids[23] also explore hybrid designs but differ in their treatment of pretraining and search-efficiency trade-offs. While Jet-Nemotron[17] focuses on integrating pretrained components into the search process, Pretrained Hybrids[23] examines how to leverage existing pretrained models as starting points for architecture refinement. Composer[0] distinguishes itself by targeting a broader compositional search over multiple primitive types, aiming to balance expressiveness and computational cost. These contrasting emphases reflect ongoing questions about how best to navigate the vast design space of hybrid architectures while maintaining practical training budgets and generalization performance.

Claimed Contributions

Composer: A modular hybrid neural architecture search framework

The authors introduce Composer, a systematic framework for discovering hybrid neural architectures by searching at small scale and extrapolating to larger scales. The framework consists of four core components: a search engine, evaluator, aggregator, and extrapolator that work together to efficiently navigate the design space.

10 retrieved papers
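The four-component decomposition (search engine, evaluator, aggregator, extrapolator) can be illustrated with a minimal toy loop. Everything below, including the function names, the proxy score, and the random search, is an illustrative assumption rather than Composer's actual implementation, which trains and evaluates real small-scale models.

```python
import random

PRIMITIVES = ["attention", "mlp"]

def search_engine(depth, rng):
    """Propose a candidate small-scale architecture as a layer sequence."""
    return [rng.choice(PRIMITIVES) for _ in range(depth)]

def evaluator(arch):
    """Stand-in proxy score; the real system would train the small model
    and measure validation loss. Toy heuristic: reward a roughly 1:2
    attention-to-MLP ratio."""
    return -abs(arch.count("attention") / len(arch) - 1 / 3)

def aggregator(scored, k=3):
    """Keep the top-k candidates by score."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def extrapolator(arch, factor):
    """Scale a small architecture up by repeating its layer pattern."""
    return arch * factor

def composer_search(rounds=50, depth=6, seed=0):
    rng = random.Random(seed)
    candidates = (search_engine(depth, rng) for _ in range(rounds))
    scored = [(arch, evaluator(arch)) for arch in candidates]
    top = aggregator(scored)
    return [extrapolator(arch, 4) for arch, _ in top]
```

In the real framework, the evaluator trains each candidate and the extrapolator applies the paper's scaling strategies rather than naive repetition; this sketch only shows how the four components compose into a pipeline.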
Novel scaling strategies for extrapolating small-scale architectures

The authors develop two extrapolation techniques (stacking and stretching) that scale up discovered small hybrid architectures to target model sizes approximately 1000 times larger while preserving their performance characteristics and interleaving patterns of computational primitives.

10 retrieved papers
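Although the paper's precise definitions are not reproduced in this report, the two extrapolation strategies can be sketched on a layer sequence. This toy version assumes "stacking" tiles the whole small-model pattern and "stretching" repeats each layer in place; both assumptions preserve the attention-to-MLP ratio.

```python
def stack(pattern, target_depth):
    """Stacking (sketch): tile the entire small-model layer pattern
    until the target depth is reached."""
    assert target_depth % len(pattern) == 0
    return pattern * (target_depth // len(pattern))

def stretch(pattern, target_depth):
    """Stretching (sketch): repeat each layer in place, preserving the
    coarse order of primitives rather than the fine interleaving."""
    assert target_depth % len(pattern) == 0
    repeat = target_depth // len(pattern)
    return [layer for layer in pattern for _ in range(repeat)]

small = ["attention", "mlp", "mlp"]  # a 1:2 attention-to-MLP pattern
deep_stacked = stack(small, 6)    # tiles the 3-layer pattern twice
deep_stretched = stretch(small, 6)  # doubles each layer in place
```

Note the difference: stacking keeps the interleaving pattern intact across the deeper model, while this reading of stretching preserves only the block-level order; which properties the paper's actual strategies preserve, beyond the ratio and interleaving mentioned above, cannot be determined from this summary.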
Composite hybrid LLM architectures with advanced interleaving patterns

The authors discover novel hybrid architectures (termed Composite architectures) featuring a 1:2 ratio of attention to MLP layers with sophisticated interleaving patterns that consistently outperform Llama 3.2 and other state-of-the-art baselines across multiple scales and metrics.

5 retrieved papers
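The reported 1:2 attention-to-MLP ratio is easy to check mechanically on any candidate layer sequence. The pattern below is a hypothetical placeholder; the actual Composite interleavings discovered by the paper are more sophisticated and are not reproduced here.

```python
from collections import Counter

def primitive_ratio(arch):
    """Count attention and MLP layers in a layer sequence."""
    counts = Counter(arch)
    return counts["attention"], counts["mlp"]

# One hypothetical 24-layer sequence with a 1:2 attention-to-MLP ratio.
composite = ["mlp", "attention", "mlp"] * 8

attn, mlp = primitive_ratio(composite)
assert attn * 2 == mlp  # the 1:2 ratio reported for Composite models
```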

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Composer: A modular hybrid neural architecture search framework

The authors introduce Composer, a systematic framework for discovering hybrid neural architectures by searching at small scale and extrapolating to larger scales. The framework consists of four core components: a search engine, evaluator, aggregator, and extrapolator that work together to efficiently navigate the design space.

Contribution

Novel scaling strategies for extrapolating small-scale architectures

The authors develop two extrapolation techniques (stacking and stretching) that scale up discovered small hybrid architectures to target model sizes approximately 1000 times larger while preserving their performance characteristics and interleaving patterns of computational primitives.

Contribution

Composite hybrid LLM architectures with advanced interleaving patterns

The authors discover novel hybrid architectures (termed Composite architectures) featuring a 1:2 ratio of attention to MLP layers with sophisticated interleaving patterns that consistently outperform Llama 3.2 and other state-of-the-art baselines across multiple scales and metrics.