Composer: A Search Framework for Hybrid Neural Architecture Design

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: neural architecture search, hybrid models, efficient ML
Abstract:

Hybrid model architectures that combine computational primitives (e.g., Attention, MLP) in different ratios have shown promising performance beyond Transformers. Several studies have also shown that different interleavings of primitives affect model quality. However, prior work explores the hybrid model architecture design space manually. Given the size of the design space and the cost of training, discovering hybrid models that combine key computational primitives for pre-training is challenging. In this work, we take a principled approach to designing a modular hybrid model architecture search framework, Composer. Composer explores model architectures at a small scale and extrapolates the top-performing architectures to larger scales using our proposed scaling strategies. Using Composer, we discover new hybrid LLM architectures that outperform Llama 3.2. Compared to Llama 3.2 and previous state-of-the-art baselines, the new architectures consistently reduce validation loss at parameter scales of 350M-3B and improve evaluation accuracy on downstream tasks by 2.8-8.3% (1.1-3.1% on average), while improving both training and inference efficiency.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Composer, a modular framework for discovering hybrid language model architectures by combining computational primitives such as Attention and MLP in varied interleavings. It resides in the 'Multi-Primitive Hybrid Architecture Search' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This positioning suggests the work addresses a focused problem—automated search over multi-primitive compositions—rather than competing in a densely populated subfield. The small sibling set implies limited direct competition but also highlights that systematic exploration of hybrid primitive combinations remains an emerging area.

The taxonomy tree reveals that Composer's leaf sits within the 'Hybrid Architecture Design and Search Methods' branch, which also includes evolutionary NAS, differentiable NAS, and LLM-guided search paradigms. Neighboring leaves such as 'Evolutionary and Genetic Algorithm-Based NAS' (four papers) and 'Differentiable and Gradient-Based NAS' (two papers) explore alternative search strategies but do not emphasize multi-primitive composition as explicitly. The 'Domain-Specific Architecture Design' branch addresses task-tailored designs, while 'Efficiency and Compression' focuses on post-hoc optimization. Composer's scope note excludes single-primitive architectures and non-search manual designs, clarifying that it targets automated discovery of hybrid primitives rather than efficiency-driven compression or task-specific tuning.

Among the 25 candidates examined, none clearly refutes any of Composer's three contributions. For the core framework contribution, 10 candidates were reviewed with zero refutable overlaps; for the scaling strategy contribution, 10 candidates were examined with the same outcome; and for the composite interleaving patterns contribution, 5 candidates were reviewed, again finding no clear prior work. This suggests that, within the limited search scope (top-K semantic matches plus citation expansion), the specific combination of modular search, scaling extrapolation, and advanced interleaving patterns appears relatively novel. However, the analysis does not claim exhaustive coverage, and the small candidate pool means undiscovered overlaps remain possible.

Given the sparse taxonomy leaf and the absence of refutable candidates among 25 examined papers, Composer appears to occupy a distinct niche in hybrid architecture search. The limited search scope and small sibling set mean this assessment reflects top-K semantic proximity rather than comprehensive field coverage. Future work might uncover related efforts in adjacent communities or unpublished preprints, but the current signals point to a contribution that extends beyond incremental refinement of existing multi-primitive search methods.

Taxonomy

Core-task Taxonomy Papers: 47
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 0

Research Landscape Overview

Core task: hybrid neural architecture search for language models. The field encompasses diverse strategies for discovering and optimizing neural architectures tailored to language understanding and generation. The taxonomy reveals several major branches: Hybrid Architecture Design and Search Methods explores combinations of different architectural primitives and search strategies; Domain-Specific Architecture Design targets particular language tasks such as translation or question answering; Efficiency and Compression addresses resource constraints through pruning, quantization, and lightweight designs; Automated Machine Learning with LLMs leverages large language models to guide architecture decisions; Meta-Learning and Adaptive Architecture Frameworks focuses on learning-to-learn paradigms; NAS Surveys provide methodological overviews; and Application-Specific NAS tailors architectures to specialized domains. Representative works like Transformer NAS Survey[7] and DNN Design Survey[19] offer broad perspectives, while methods such as LLaMA-NAS[1] and Mixed-Precision BERT[26] exemplify targeted optimization for language models.

Within the Hybrid Architecture Design branch, a particularly active line of work investigates multi-primitive search spaces that blend convolutional, recurrent, and attention-based components. Composer[0] sits squarely in this cluster, emphasizing the automated discovery of hybrid architectures that combine diverse building blocks for language tasks. Nearby, Jet-Nemotron[17] and Pretrained Hybrids[23] also explore hybrid designs but differ in their treatment of pretraining and search-efficiency trade-offs. While Jet-Nemotron[17] focuses on integrating pretrained components into the search process, Pretrained Hybrids[23] examines how to leverage existing pretrained models as starting points for architecture refinement. Composer[0] distinguishes itself by targeting a broader compositional search over multiple primitive types, aiming to balance expressiveness and computational cost. These contrasting emphases reflect ongoing questions about how best to navigate the vast design space of hybrid architectures while maintaining practical training budgets and generalization performance.

Claimed Contributions

Composer: A modular hybrid neural architecture search framework

The authors introduce Composer, a systematic framework for discovering hybrid neural architectures by searching at small scale and extrapolating to larger scales. The framework consists of four core components: a search engine, evaluator, aggregator, and extrapolator that work together to efficiently navigate the design space.

10 retrieved papers
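The four-component decomposition (search engine, evaluator, aggregator, extrapolator) can be illustrated with a minimal toy loop. Everything below, including the function names, the proxy score, and the random search, is an illustrative assumption rather than Composer's actual implementation, which trains and evaluates real small-scale models.

```python
import random

PRIMITIVES = ["attention", "mlp"]

def search_engine(depth, rng):
    """Propose a candidate small-scale architecture as a layer sequence."""
    return [rng.choice(PRIMITIVES) for _ in range(depth)]

def evaluator(arch):
    """Stand-in proxy score; the real system would train the small model
    and measure validation loss. Toy heuristic: reward a roughly 1:2
    attention-to-MLP ratio."""
    return -abs(arch.count("attention") / len(arch) - 1 / 3)

def aggregator(scored, k=3):
    """Keep the top-k candidates by score."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def extrapolator(arch, factor):
    """Scale a small architecture up by repeating its layer pattern."""
    return arch * factor

def composer_search(rounds=50, depth=6, seed=0):
    rng = random.Random(seed)
    candidates = (search_engine(depth, rng) for _ in range(rounds))
    scored = [(arch, evaluator(arch)) for arch in candidates]
    top = aggregator(scored)
    return [extrapolator(arch, 4) for arch, _ in top]
```

In the real framework, the evaluator trains each candidate and the extrapolator applies the paper's scaling strategies rather than naive repetition; this sketch only shows how the four components compose into a pipeline.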
Novel scaling strategies for extrapolating small-scale architectures

The authors develop two extrapolation techniques (stacking and stretching) that scale up discovered small hybrid architectures to target model sizes approximately 1000 times larger while preserving their performance characteristics and interleaving patterns of computational primitives.

10 retrieved papers
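Although the paper's precise definitions are not reproduced in this report, the two extrapolation strategies can be sketched on a layer sequence. This toy version assumes "stacking" tiles the whole small-model pattern and "stretching" repeats each layer in place; both assumptions preserve the attention-to-MLP ratio.

```python
def stack(pattern, target_depth):
    """Stacking (sketch): tile the entire small-model layer pattern
    until the target depth is reached."""
    assert target_depth % len(pattern) == 0
    return pattern * (target_depth // len(pattern))

def stretch(pattern, target_depth):
    """Stretching (sketch): repeat each layer in place, preserving the
    coarse order of primitives rather than the fine interleaving."""
    assert target_depth % len(pattern) == 0
    repeat = target_depth // len(pattern)
    return [layer for layer in pattern for _ in range(repeat)]

small = ["attention", "mlp", "mlp"]  # a 1:2 attention-to-MLP pattern
deep_stacked = stack(small, 6)    # tiles the 3-layer pattern twice
deep_stretched = stretch(small, 6)  # doubles each layer in place
```

Note the difference: stacking keeps the interleaving pattern intact across the deeper model, while this reading of stretching preserves only the block-level order; which properties the paper's actual strategies preserve, beyond the ratio and interleaving mentioned above, cannot be determined from this summary.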
Composite hybrid LLM architectures with advanced interleaving patterns

The authors discover novel hybrid architectures (termed Composite architectures) featuring a 1:2 ratio of attention to MLP layers with sophisticated interleaving patterns that consistently outperform Llama 3.2 and other state-of-the-art baselines across multiple scales and metrics.

5 retrieved papers
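The reported 1:2 attention-to-MLP ratio is easy to check mechanically on any candidate layer sequence. The pattern below is a hypothetical placeholder; the actual Composite interleavings discovered by the paper are more sophisticated and are not reproduced here.

```python
from collections import Counter

def primitive_ratio(arch):
    """Count attention and MLP layers in a layer sequence."""
    counts = Counter(arch)
    return counts["attention"], counts["mlp"]

# One hypothetical 24-layer sequence with a 1:2 attention-to-MLP ratio.
composite = ["mlp", "attention", "mlp"] * 8

attn, mlp = primitive_ratio(composite)
assert attn * 2 == mlp  # the 1:2 ratio reported for Composite models
```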

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Composer: A modular hybrid neural architecture search framework

The authors introduce Composer, a systematic framework for discovering hybrid neural architectures by searching at small scale and extrapolating to larger scales. The framework consists of four core components: a search engine, evaluator, aggregator, and extrapolator that work together to efficiently navigate the design space.

Contribution

Novel scaling strategies for extrapolating small-scale architectures

The authors develop two extrapolation techniques (stacking and stretching) that scale up discovered small hybrid architectures to target model sizes approximately 1000 times larger while preserving their performance characteristics and interleaving patterns of computational primitives.

Contribution

Composite hybrid LLM architectures with advanced interleaving patterns

The authors discover novel hybrid architectures (termed Composite architectures) featuring a 1:2 ratio of attention to MLP layers with sophisticated interleaving patterns that consistently outperform Llama 3.2 and other state-of-the-art baselines across multiple scales and metrics.