ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
Overview
Overall Novelty Assessment
The paper proposes ADEPT, a two-stage framework for domain-adaptive continual pretraining that selectively expands model layers based on functional importance and applies asymmetric learning rates to balance general and domain knowledge. It resides in the Adaptive and Selective Parameter Expansion leaf, which contains only two papers within the broader Continual Pretraining Methodologies and Frameworks branch. This is a relatively sparse research direction compared to more crowded areas like Medical and Healthcare Domains (ten papers) or General Training Strategies (seven papers), suggesting the work targets a less explored methodological niche.
The taxonomy reveals that ADEPT's immediate neighbors include Catastrophic Forgetting Mitigation Techniques (five papers) and General Training Strategies (seven papers), both addressing stability and optimization during continual pretraining. The sibling paper in the same leaf, AdapterSwap, focuses on modular adapter mechanisms rather than base-model expansion, highlighting a methodological divergence. Nearby branches such as Cross-Lingual Adaptation and Application Domains emphasize different axes (language transfer and domain-specific corpora), while ADEPT concentrates on architecture-level adaptation strategies. The scope note clarifies that this leaf excludes uniform full-parameter methods, positioning ADEPT as a selective, function-aware alternative.
Among the twenty-four candidates examined, none clearly refutes the three main contributions. For the functional specialization perspective, ten candidates were examined with zero refutable matches, suggesting limited prior work explicitly frames continual pretraining through layer-wise functional roles. For the ADEPT framework itself, four candidates were examined, again with no refutations, indicating that the two-stage design combining selective expansion and decoupled tuning may be novel within the search scope. For the empirical validation across mathematical and medical domains, ten candidates were examined without refutation, though this likely reflects the limited search scale rather than absolute novelty, since domain-specific benchmarking is common in the broader taxonomy.
Based on the limited search scope of twenty-four semantically similar candidates, the work appears to introduce a distinct methodological angle (function-aware parameter expansion) within a relatively sparse taxonomy leaf. The absence of refutable prior work across all three contributions suggests potential novelty, though the small candidate pool and narrow semantic search radius mean this analysis cannot confirm whether similar ideas exist in adjacent methodological spaces or in domain-specific literature not captured by top-K retrieval.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors demonstrate through pilot studies that LLMs exhibit functional specialization: layers and units differentially encode capabilities critical to the general domain. They argue that parameter expansion and optimization should therefore be function-aware, with targeted layer expansion and decoupled training as a principled solution to domain adaptation.
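The functional-specialization finding above can be illustrated with a minimal, hypothetical sketch of ablation-based importance scoring. This is not the paper's actual metric; the function names, the use of benchmark-score drops, and all numbers below are assumptions for illustration only.

```python
# Hypothetical sketch of function-aware importance scoring (not the paper's
# actual metric): a layer's importance to general capabilities is estimated
# as the drop in a general-benchmark score when that layer is ablated.

def layer_importance(baseline_score, ablated_scores):
    """Map layer index -> score drop when that layer is ablated.

    A larger drop means the layer is more critical to general-domain
    capabilities and should be protected during domain adaptation.
    """
    return {i: baseline_score - s for i, s in ablated_scores.items()}

def least_critical_layers(importance, k):
    """Return the k layers whose ablation hurts general performance least;
    under a function-aware scheme, these are candidates for expansion."""
    return sorted(importance, key=importance.get)[:k]

# Toy numbers: ablating layers 1 and 2 barely moves the general benchmark,
# so they are the least general-critical and the safest to expand.
imp = layer_importance(0.80, {0: 0.55, 1: 0.78, 2: 0.79, 3: 0.62})
print(least_critical_layers(imp, 2))  # -> [2, 1]
```

In practice the scores would come from running held-out general benchmarks with individual layers disabled or replaced by identity mappings; the dictionary here just stands in for those measurements.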
The authors introduce ADEPT, a two-stage continual pretraining framework. The first stage selectively duplicates the layers least critical to the general domain to increase capacity. The second stage decouples parameter units within the expanded layers and assigns asymmetric learning rates to balance knowledge injection and retention.
The authors perform comprehensive experiments showing that ADEPT outperforms full-parameter continual pretraining by up to 5.76% on general benchmarks and 5.58% on target-domain benchmarks, while updating only 15% of the parameters and using less than 50% of the training time.
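As a rough illustration of the two-stage design described in these contributions, the following pure-Python sketch duplicates the least general-critical layers and then assigns asymmetric learning rates to original versus duplicated units. All names, values, and the list-of-layers representation are illustrative assumptions, not the paper's implementation.

```python
import copy

# Stage 1 (hypothetical sketch): duplicate the k layers least critical to the
# general domain, placing each copy directly after its original.
def expand_least_critical(layers, importance, k):
    to_copy = set(sorted(importance, key=importance.get)[:k])
    expanded, is_new = [], []
    for i, layer in enumerate(layers):
        expanded.append(layer)
        is_new.append(False)
        if i in to_copy:
            expanded.append(copy.deepcopy(layer))
            is_new.append(True)  # duplicated unit: capacity for domain knowledge
    return expanded, is_new

# Stage 2 (hypothetical sketch): decoupled tuning via asymmetric learning
# rates, a large LR on duplicated (domain) units for knowledge injection and
# a small LR on original (general) units for retention.
def decoupled_param_groups(expanded, is_new, lr_domain=1e-4, lr_general=1e-6):
    return [{"params": layer, "lr": lr_domain if new else lr_general}
            for layer, new in zip(expanded, is_new)]

layers = ["L0", "L1", "L2", "L3"]
importance = {0: 0.25, 1: 0.02, 2: 0.01, 3: 0.18}
expanded, is_new = expand_least_critical(layers, importance, k=2)
print(expanded)  # -> ['L0', 'L1', 'L1', 'L2', 'L2', 'L3']
print([g["lr"] for g in decoupled_param_groups(expanded, is_new)])
```

In a real setting the strings would be model modules and the dictionaries would be per-parameter-group options fed to a `torch.optim`-style optimizer; the sketch only shows how expansion decisions and asymmetric learning rates compose.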
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[49] AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
Contribution Analysis
Detailed comparisons for each claimed contribution
Functional specialization perspective for continual pretraining
The authors demonstrate through pilot studies that LLMs exhibit functional specialization: layers and units differentially encode capabilities critical to the general domain. They argue that parameter expansion and optimization should therefore be function-aware, with targeted layer expansion and decoupled training as a principled solution to domain adaptation.
[51] Higpt: Heterogeneous graph language model
[52] Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation
[53] Need a Small Specialized Language Model? Plan Early!
[54] DEMix Layers: Disentangling Domains for Modular Language Modeling
[55] Resonant pattern shaping through iterative latency induction in contextual token expansion of transformer-based language models
[56] News without borders: Domain adaptation of multilingual sentence embeddings for cross-lingual news recommendation
[57] Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism
[58] Structural permutation layers: An unprecedented approach for modulating internal representations in large language models
[59] Lightllm: A versatile large language model for predictive light sensing
[60] Towards low-resource languages machine translation: A language-specific fine-tuning with LoRA for specialized large language models
ADEPT framework with two-stage design
The authors introduce ADEPT, a two-stage continual pretraining framework. The first stage selectively duplicates the layers least critical to the general domain to increase capacity. The second stage decouples parameter units within the expanded layers and assigns asymmetric learning rates to balance knowledge injection and retention.
[61] A Comprehensive Survey on Continual Learning in Generative Models
[62] Convolutional prompting meets language models for continual learning
[63] Deepfake Detection with Multi-Artifact Subspace Fine-Tuning and Selective Layer Masking
[64] LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models
Empirical validation across mathematical and medical domains
The authors perform comprehensive experiments showing that ADEPT outperforms full-parameter continual pretraining by up to 5.76% on general benchmarks and 5.58% on target-domain benchmarks, while updating only 15% of the parameters and using less than 50% of the training time.