Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
Overview
Overall Novelty Assessment
The paper introduces a theoretical framework for quantifying the effective reuse rate E(K,N) in multi-epoch linear regression, characterizing how repeated passes over N samples compare to single-pass training on fresh data. It resides in the 'Scaling Laws and Effective Reuse Rates' leaf of the taxonomy, which contains only two papers total (including this one). This leaf sits within the broader 'Theoretical Analysis of Multi-Epoch Training and Data Reuse' branch, indicating a relatively sparse research direction focused on fundamental scaling properties rather than algorithmic or applied concerns.
The taxonomy reveals neighboring branches addressing related but distinct problems. 'Sensitivity and Complexity Analysis' examines model robustness to data perturbations, while 'Model Transfer and Reuse for Efficient Training' focuses on leveraging pretrained representations across tasks rather than iterative training on fixed data. The 'Linear Mixed-Effects Models' branch handles repeated measurements through hierarchical random effects, representing a statistical modeling tradition fundamentally different from the iterative optimization perspective adopted here. The paper's theoretical focus on scaling laws positions it at the intersection of classical statistical learning theory and modern concerns about data efficiency in large-scale training.
Among 21 candidates examined across three contributions, the analysis found limited prior work overlap. The core effective reuse rate characterization (8 candidates examined, 0 refutable) and scaling behavior analysis for strongly convex and Zipf cases (10 candidates examined, 0 refutable) appear relatively novel within this search scope. However, the optimal learning rate derivation and risk approximation for multi-epoch SGD (3 candidates examined, 1 refutable) shows more substantial prior work, suggesting this technical component may have existing coverage in the optimization literature.
Based on the limited search scope of 21 semantically similar candidates, the work appears to occupy a sparsely populated research direction with only one sibling paper in its taxonomy leaf. The core theoretical contributions around effective reuse rates show minimal overlap with examined candidates, though the learning rate analysis component has at least one overlapping prior result. The analysis does not cover exhaustive citation networks or domain-specific venues that might contain additional relevant theoretical work on data reuse in iterative training.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors theoretically analyze how the effective reuse rate E(K,N)—the multiplicative factor by which a dataset must grow under one-pass training to match K-epoch training performance—depends on both the number of epochs K and dataset size N. They prove that for small K, E(K,N) is approximately K, while for large K it plateaus at a problem-dependent value that grows with N (order log N for strongly convex cases).
The authors establish precise scaling laws for E(K,N) in two settings: strongly convex linear regression where saturation occurs at order log N, and Zipf-distributed data where saturation scales as a power of N. These results reveal a phase transition between an effective-reuse regime and a limited-reuse regime.
The authors derive the optimal learning rate for multi-epoch stochastic gradient descent in linear regression and provide an approximation formula for the expected excess risk with multiplicative error n^{o(1)}. These technical results enable a precise characterization of multi-epoch training dynamics.
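To make the central quantity concrete, the sketch below estimates E(K,N) by simulation: it runs K-epoch SGD on N samples, then scans for the fresh-data size M at which a single pass reaches the same excess risk, reporting M/N. The Gaussian data model, noise level, fixed step size, and search grid are illustrative assumptions, not the paper's exact setting or derivation.

```python
import numpy as np

def run_sgd(n, k_epochs, lr, w_star, rng):
    """K epochs of single-sample SGD on a fresh size-n dataset.
    With isotropic Gaussian covariates, excess risk = ||w - w*||^2."""
    d = w_star.size
    X = rng.standard_normal((n, d))
    y = X @ w_star + 0.1 * rng.standard_normal(n)
    w = np.zeros(d)
    for _ in range(k_epochs):
        for i in rng.permutation(n):
            w += lr * X[i] * (y[i] - X[i] @ w)
    return float(np.sum((w - w_star) ** 2))

def avg_risk(n, k, lr, w_star, seed, trials=30):
    rng = np.random.default_rng(seed)
    return float(np.mean([run_sgd(n, k, lr, w_star, rng) for _ in range(trials)]))

d, n, k, lr = 4, 100, 4, 0.02
w_star = np.ones(d) / np.sqrt(d)
risk_k = avg_risk(n, k, lr, w_star, seed=0)     # K epochs over N samples
for m in range(n, k * n + 1, n // 4):           # one pass over M fresh samples
    if avg_risk(m, 1, lr, w_star, seed=1) <= risk_k:
        break
print(f"estimated effective reuse rate E({k},{n}) ~ {m / n:.2f}")
```

Consistent with the claimed small-K behavior, the estimate lands between 1 and K; the plateau at order log N would only show up when sweeping much larger K and N.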
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Improved Scaling Laws in Linear Regression via Data Reuse
Contribution Analysis
Detailed comparisons for each claimed contribution
Theoretical characterization of effective reuse rate E(K,N) in linear regression
The authors theoretically analyze how the effective reuse rate E(K,N)—the multiplicative factor by which a dataset must grow under one-pass training to match K-epoch training performance—depends on both the number of epochs K and dataset size N. They prove that for small K, E(K,N) is approximately K, while for large K it plateaus at a problem-dependent value that grows with N (order log N for strongly convex cases).
[1] Improved Scaling Laws in Linear Regression via Data Reuse
[28] RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics
[29] The Distribution of Tax Collectability, Quality of Tax Services Efforts to Tax Coverage Ratio
[30] A Linear Regression-Based Resource Utilization Prediction Policy for Live Migration in Cloud Computing
[31] Overview of Multivariate Regression Model Analysis and Application
[32] Examining Associations Between Physician Data Utilization for Practice Improvement and Lifelong Learning
[33] A Novel Roundness Error Evaluation Method for High-Speed EMU Train Axles
[34] Enhanced Data Utilization Approach to Improve the Prediction Performance of Groundwater Level Using Semianalytical and Data Process Models
Scaling behavior analysis for strongly convex and Zipf-distributed data cases
The authors establish precise scaling laws for E(K,N) in two settings: strongly convex linear regression where saturation occurs at order log N, and Zipf-distributed data where saturation scales as a power of N. These results reveal a phase transition between an effective-reuse regime and a limited-reuse regime.
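The claimed phase transition can be pictured with a deliberately stylized toy model of the two regimes: E(K,N) grows roughly like K until it hits a problem-dependent ceiling, of order log N in the strongly convex case and a power of N in the Zipf case. The constant c and exponent alpha below are placeholder values for illustration, not quantities derived in the paper.

```python
import math

def effective_reuse_model(k, n, regime, c=1.0, alpha=0.3):
    """Stylized E(K,N): ~K in the effective-reuse regime, capped by a
    ceiling of order log N (strongly convex) or N^alpha (Zipf)."""
    ceiling = c * math.log(n) if regime == "strongly_convex" else c * n ** alpha
    return min(k, ceiling)

for n in (10**3, 10**6, 10**9):
    sc = effective_reuse_model(100, n, "strongly_convex")
    zf = effective_reuse_model(100, n, "zipf")
    print(f"N=10^{round(math.log10(n))}: strongly convex ~{sc:5.1f}, Zipf ~{zf:5.1f}")
```

The crossover epoch count at which the minimum switches from K to the ceiling is where the effective-reuse regime ends and the limited-reuse regime begins; larger N pushes that crossover out, i.e., larger datasets can be repeated more.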
[38] Rapid Overfitting of Multi-Pass Stochastic Gradient Descent in Stochastic Convex Optimization
[39] SGD and Hogwild! convergence without the bounded gradients assumption
[40] Large deviations rates for stochastic gradient descent with strongly convex functions
[41] Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
[42] Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
[43] Performative control for linear dynamical systems
[44] Stochastic versus Deterministic in Stochastic Gradient Descent
[45] Learning with interactive models over decision-dependent distributions
[46] Convergence and concentration properties of constant step-size SGD through Markov chains
[47] Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction
Optimal learning rate derivation and risk approximation formula for multi-epoch SGD
The authors derive the optimal learning rate for multi-epoch stochastic gradient descent in linear regression and provide an approximation formula for the expected excess risk with multiplicative error n^{o(1)}. These technical results enable a precise characterization of multi-epoch training dynamics.
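The paper derives the optimal step size analytically; as a purely empirical stand-in, the sketch below grid-searches a constant step size for K-epoch SGD on synthetic Gaussian data, exposing the trade-off the derivation resolves (too small leaves residual bias, too large is unstable). All problem parameters here are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def multi_epoch_risk(lr, n=100, k=4, d=4, noise=0.1, trials=20, seed=0):
    """Average excess risk of K-epoch constant-step SGD; with isotropic
    Gaussian covariates the excess risk is ||w - w*||^2."""
    rng = np.random.default_rng(seed)
    w_star = np.ones(d) / np.sqrt(d)
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = X @ w_star + noise * rng.standard_normal(n)
        w = np.zeros(d)
        for _ in range(k):
            for i in rng.permutation(n):
                w += lr * X[i] * (y[i] - X[i] @ w)
        risks.append(float(np.sum((w - w_star) ** 2)))
    return float(np.mean(risks))

grid = [0.001, 0.003, 0.01, 0.03, 0.1]
risks = {lr: multi_epoch_risk(lr) for lr in grid}
best = min(risks, key=risks.get)
print(f"best constant step size on this grid: {best}")
```

An analytic formula for the optimal rate, as claimed by the paper, replaces this kind of sweep and is what makes the n^{o(1)}-accurate risk approximation usable in closed form.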