Understanding the Learning Phases in Self-Supervised Learning via Critical Periods

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Learning Phases, Critical Periods, Self-Supervised Learning
Abstract:

Self-supervised learning (SSL) has emerged as a powerful pretraining strategy for learning transferable representations from unlabeled data. Yet it remains unclear how long SSL models should be pretrained for such representations to emerge. Contrary to the prevailing heuristic that longer pretraining translates to better downstream performance, we identify a transferability trade-off: across diverse SSL settings, intermediate checkpoints often yield stronger out-of-domain (OOD) generalization, whereas additional pretraining primarily benefits in-domain (ID) accuracy. From this observation, we hypothesize that SSL progresses through learning phases that can be characterized through the lens of critical periods (CP). Prior work on CP has shown that supervised models exhibit early phases of high plasticity, followed by a consolidation phase in which adaptability declines while task-specific performance keeps increasing. Since traditional CP analysis depends on supervised labels, we rethink CP for SSL in two ways. First, we inject deficits that perturb the pretraining data and measure the quality of the learned representations via downstream tasks. Second, to estimate network plasticity during pretraining, we compute the Fisher Information matrix on the pretext objectives, quantifying the sensitivity of model parameters to the supervisory signal defined by the pretext tasks. Our experiments demonstrate that SSL models do exhibit their own CP, with CP closure marking a sweet spot where representations are neither underdeveloped nor overfitted to the pretext task. Leveraging these insights, we propose CP-guided checkpoint selection as a mechanism for identifying intermediate checkpoints during SSL that improve OOD transferability.
Finally, to balance the transferability trade-off, we propose CP-guided self-distillation, which selectively distills layer representations from the sweet-spot (CP closure) checkpoint into their overspecialized counterparts in the final pretrained model.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates the temporal dynamics of self-supervised pretraining, identifying a transferability trade-off in which intermediate checkpoints yield stronger out-of-domain generalization while extended pretraining benefits in-domain accuracy. It resides in the 'Learning Phase Characterization' leaf under 'Theoretical Foundations and Mechanisms', which contains only this paper among the 50 papers spread across 19 leaf nodes. This placement indicates a relatively sparse research direction focused specifically on temporal phase analysis during SSL pretraining, distinguishing it from the more populated methodological and application-oriented branches.

The taxonomy reveals neighboring theoretical work in 'Transferability Analysis and Measurement' (3 papers) and 'Representation Learning Principles' (2 papers), which examine transfer capability and feature learning mechanisms but without explicit temporal phase characterization. The broader 'Pretraining Methodologies' branch contains 13 papers across contrastive, generative, and architectural innovations, while 'Transfer Learning Strategies' encompasses 11 papers on adaptation techniques. The paper's focus on learning phases during pretraining positions it at the intersection of theoretical analysis and practical transfer concerns, bridging mechanistic understanding with downstream performance implications.

Among 27 candidates examined across three contributions, no clearly refuting prior work was identified. The transferability trade-off analysis examined 10 candidates with 0 refutations, the critical period reformulation for SSL examined 7 candidates with 0 refutations, and the checkpoint selection intervention examined 10 candidates with 0 refutations. This limited search scope suggests that within the top semantic matches and citation expansions, no prior work explicitly documents the same temporal trade-off phenomenon or applies critical period analysis to self-supervised settings, though the search does not claim exhaustive coverage of all potentially relevant literature.

Based on examination of 27 semantically related candidates, the work appears to occupy a distinct position within SSL research by explicitly characterizing learning phases and their differential impact on in-domain versus out-of-domain transfer. The sparse population of its taxonomy leaf and absence of refuting candidates among examined papers suggest novelty in this specific analytical framing, though the limited search scope means potentially relevant work outside the top-K semantic neighborhood may exist but was not captured in this analysis.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: understanding learning phases and transferability in self-supervised pretraining. The field has organized itself around four major branches. Theoretical Foundations and Mechanisms investigates the underlying principles governing how self-supervised models learn and generalize, including phase transitions and critical periods during training. Pretraining Methodologies and Architectures encompasses the diverse algorithmic strategies—contrastive methods like MoCo Chest Xray[29], masked modeling approaches, and hybrid techniques—that enable models to extract useful representations from unlabeled data. Transfer Learning Strategies and Adaptation focuses on how pretrained representations are fine-tuned or adapted to downstream tasks, exploring questions of domain shift, few-shot learning as in Surgical Phases Few-Shot[50], and parameter-efficient adaptation methods such as those studied in BatchNorm Finetuning Transfer[5]. Application Domains and Empirical Studies documents the breadth of real-world deployments, from medical imaging in SSL Skin Cancer[10] and Retinal Multimodal SSL[13] to remote sensing in Consecutive Pretraining Remote Sensing[23] and specialized domains like Seismic Fault Transformer[17].

A particularly active line of work examines the temporal dynamics of pretraining: when and how representations become useful, and whether certain learning windows are more critical than others. Critical Periods SSL[0] sits squarely within this theoretical inquiry, characterizing distinct phases during self-supervised training and their impact on downstream transferability. This contrasts with more application-driven studies that take pretrained models as given and focus on adaptation strategies, such as BatchNorm Finetuning Transfer[5], which explores efficient fine-tuning by selectively updating normalization layers.

Meanwhile, works like Big SSL Semi-Supervised[3] bridge pretraining and semi-supervised learning, highlighting trade-offs between label efficiency and representation quality. By situating learning phase characterization within the broader theoretical branch, Critical Periods SSL[0] complements empirical transfer studies and offers mechanistic insights into why certain pretraining regimes yield more robust or adaptable features.

Claimed Contributions

Identification of transferability trade-off in SSL pretraining

The authors demonstrate that extended SSL pretraining creates a trade-off where intermediate checkpoints achieve better out-of-domain generalization, whereas longer pretraining primarily benefits in-domain accuracy. This challenges the prevailing heuristic that longer pretraining always improves downstream performance.

10 retrieved papers
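The selection logic behind this claimed trade-off can be sketched in a few lines: given linear-probe accuracies for a set of saved checkpoints on ID and OOD benchmarks, the checkpoint maximizing each metric is picked independently, and the two need not coincide. The accuracy numbers below are synthetic placeholders for illustration, not results from the paper.

```python
def best_checkpoint(scores, metric):
    """Return the epoch whose checkpoint maximizes the given probe metric."""
    return max(scores, key=lambda epoch: scores[epoch][metric])

# epoch -> {"id": in-domain probe accuracy, "ood": out-of-domain probe accuracy}
# (synthetic values shaped to illustrate the claimed trade-off)
probe_scores = {
    100: {"id": 0.62, "ood": 0.48},
    300: {"id": 0.71, "ood": 0.55},   # hypothetical CP-closure sweet spot
    600: {"id": 0.75, "ood": 0.53},
    1000: {"id": 0.77, "ood": 0.50},  # longest pretraining: best ID, weaker OOD
}

id_best = best_checkpoint(probe_scores, "id")    # -> 1000
ood_best = best_checkpoint(probe_scores, "ood")  # -> 300
print(id_best, ood_best)
```

Under the paper's claim, `ood_best` would systematically precede `id_best`, which is exactly the pattern CP-guided checkpoint selection is meant to exploit.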
Reformulation of critical period analysis for SSL

The authors adapt critical period analysis from supervised learning to SSL by injecting deficits into pretraining data and computing Fisher Information on pretext objectives rather than class labels. This reformulation enables tracking plasticity phases during SSL pretraining without requiring downstream supervision.

7 retrieved papers
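One common way to operationalize "Fisher Information on a pretext objective" is the diagonal empirical Fisher: the mean of squared per-sample gradients of the pretext loss, with its trace used as a scalar plasticity signal. The sketch below assumes this form with a toy linear model and a toy regression-style pretext target; it is not the authors' implementation, only a minimal illustration of the quantity.

```python
import numpy as np

def fisher_diag(w, X, t):
    """Diagonal empirical Fisher: mean of squared per-sample loss gradients.

    Per-sample pretext loss: 0.5 * (x @ w - t)^2, so the per-sample
    gradient w.r.t. w is (x @ w - t) * x.
    """
    residuals = X @ w - t                      # shape (n,)
    per_sample_grads = residuals[:, None] * X  # shape (n, d)
    return (per_sample_grads ** 2).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
t = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=256)

w_early = np.zeros(4)                      # untrained model: large residuals
w_late = np.array([1.0, -2.0, 0.5, 0.0])   # near-converged: small residuals

trace_early = fisher_diag(w_early, X, t).sum()
trace_late = fisher_diag(w_late, X, t).sum()
print(trace_early, trace_late)  # trace shrinks as the pretext task is solved
```

Tracking this trace over pretraining epochs is the kind of signal that can mark high-plasticity phases and their decline at CP closure, without ever touching downstream labels.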
CP-guided checkpoint selection and self-distillation interventions

The authors introduce two practical methods leveraging critical period insights: CP-guided checkpoint selection identifies intermediate checkpoints at CP closure for improved OOD transfer, and CP-guided self-distillation selectively distills early-layer representations from CP checkpoints into final models to balance the transferability trade-off.

10 retrieved papers
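The self-distillation part of this contribution can be sketched as a feature-matching penalty: the final model (student) keeps its own pretext objective, while a subset of layers flagged as overspecialized is pulled toward the corresponding representations of the CP-closure checkpoint (teacher). The loss form, layer choice, and `alpha` weighting below are assumptions for illustration, not the authors' exact objective.

```python
import numpy as np

def distill_loss(student_feats, teacher_feats, layers, alpha=0.5):
    """Alpha-weighted mean-squared error between student and teacher
    feature maps, averaged over the selected layers only."""
    total = 0.0
    for name in layers:
        diff = student_feats[name] - teacher_feats[name]
        total += np.mean(diff ** 2)
    return alpha * total / len(layers)

rng = np.random.default_rng(1)
# teacher: features from the hypothetical CP-closure checkpoint
teacher = {f"layer{i}": rng.normal(size=(8, 16)) for i in range(4)}
# student: final pretrained model, drifted away from the teacher
student = {k: v + 0.3 * rng.normal(size=v.shape) for k, v in teacher.items()}

# distill only the layers flagged as overspecialized (hypothetical choice)
loss = distill_loss(student, teacher, layers=["layer2", "layer3"])
print(loss)
```

In training, this term would be added to the student's pretext loss, so that OOD-friendly structure from the sweet-spot checkpoint is retained without discarding the ID gains of continued pretraining.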

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Identification of transferability trade-off in SSL pretraining

The authors demonstrate that extended SSL pretraining creates a trade-off where intermediate checkpoints achieve better out-of-domain generalization, whereas longer pretraining primarily benefits in-domain accuracy. This challenges the prevailing heuristic that longer pretraining always improves downstream performance.

Contribution

Reformulation of critical period analysis for SSL

The authors adapt critical period analysis from supervised learning to SSL by injecting deficits into pretraining data and computing Fisher Information on pretext objectives rather than class labels. This reformulation enables tracking plasticity phases during SSL pretraining without requiring downstream supervision.

Contribution

CP-guided checkpoint selection and self-distillation interventions

The authors introduce two practical methods leveraging critical period insights: CP-guided checkpoint selection identifies intermediate checkpoints at CP closure for improved OOD transfer, and CP-guided self-distillation selectively distills early-layer representations from CP checkpoints into final models to balance the transferability trade-off.