Can Language Models Discover Scaling Laws?

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: scaling law; agent; LLM
Abstract:

Discovering scaling laws for predicting model performance at scale is a fundamental and open-ended challenge, one that still relies largely on slow, case-specific human experimentation. To investigate the potential for LLMs to automate this process, we collect over 5,000 experiments from the existing literature and curate seven diverse scaling law discovery tasks. While existing agents struggle to produce accurate law formulas, this paper introduces SLDAgent, an evolution-based agent that co-optimizes the scaling law model and its parameters, enabling it to autonomously explore complex relationships between variables. For the first time, we demonstrate that SLDAgent can automatically discover laws that exhibit consistently more accurate extrapolation than their established, human-derived counterparts across all tasks. Through comprehensive analysis, we elucidate why these discovered laws are superior and verify their practical utility in both pretraining and finetuning applications. This work establishes a new paradigm for agentic scientific discovery, showing that AI systems can understand their own scaling behavior and contribute novel, practical knowledge back to the research community.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces SLDAgent, an evolution-based system that autonomously discovers scaling law formulas from experimental data, and SLDBench, a benchmark comprising over 5,000 experiments across seven tasks. This work occupies the 'Automated Scaling Law Discovery' leaf in the taxonomy, which currently contains no sibling papers—making it the sole representative of this research direction. While the broader taxonomy encompasses 50 papers across 33 leaf nodes, this particular branch remains sparse, suggesting that automated discovery of scaling laws is an emerging rather than crowded area.

The taxonomy reveals substantial activity in adjacent branches: empirical characterization methods (Fundamental Compute-Loss Scaling, Temporal Dynamics), predictive modeling approaches (Observational Scaling Law Inference, Downstream Performance Prediction), and hyperparameter optimization (Hyperparameter and Training Configuration Scaling). The original paper diverges from these by proposing meta-level automation—using language models to discover laws rather than manually fitting empirical data or observationally inferring relationships. This positions the work at the intersection of predictive modeling and training method optimization, but with a fundamentally different mechanism: agentic exploration rather than human-guided experimentation or statistical extrapolation.

Among 26 candidates examined, the contribution-level analysis reveals mixed novelty signals. The benchmark contribution (SLDBench) examined 10 candidates with no clear refutations, suggesting this curation effort addresses a gap in standardized evaluation. The agent contribution (SLDAgent) examined 6 candidates and found 1 refutable match, indicating some overlap with prior automated discovery or optimization methods within this limited search scope. The superhuman performance claim examined 10 candidates without refutation, though this reflects the search scale rather than exhaustive validation. The statistics suggest moderate prior work density for the agent mechanism, but sparser coverage for benchmark construction and performance claims.

Given the limited search scope of 26 semantically similar papers, this analysis captures nearby work but cannot claim exhaustive coverage of all relevant optimization, meta-learning, or symbolic regression methods. The taxonomy structure and contribution-level statistics together suggest the work occupies a genuinely sparse research direction (automated scaling law discovery), though individual technical components (evolution-based search, formula optimization) may connect to broader literatures in automated machine learning and symbolic discovery not fully represented in this domain-specific search.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Paper: 1

Research Landscape Overview

Core task: automated discovery of scaling laws for language model performance.

The field has matured into a rich taxonomy spanning empirical characterization of how loss and performance scale with compute, data, and model size; architecture-specific investigations into transformers, mixture-of-experts, and quantized models; data-centric studies examining quality, diversity, and synthetic data effects; and capability-specific analyses for reasoning, memorization, and multilingual performance. Branches also address training methods and hyperparameter tuning, predictive modeling techniques that enable observational inference without exhaustive training, model composition and merging dynamics, system-level considerations for distributed training, theoretical foundations rooted in information theory, large-scale empirical studies from industry labs, robustness and safety implications, and applications extending scaling insights to new domains.

Representative works illustrate this breadth: Neural Scaling Laws[14] and Observational Scaling Laws[5] anchor empirical and predictive methods, while Pythia[29] and DeepSeek LLM[7] exemplify large-scale empirical studies, and Inference Scaling Laws[4] and Test-Time Compute Scaling[39] explore compute allocation beyond pretraining. Recent activity highlights tensions between observational efficiency and experimental rigor, with Observational Scaling Laws[5] enabling low-cost prediction while works like Algorithmic Progress LMs[1] and Temporal Scaling Law[2] track how algorithmic improvements shift scaling curves over time.

The original paper, LMs Discover Scaling[0], sits squarely within the Automated Scaling Law Discovery branch, proposing that language models themselves can identify and formulate scaling relationships, a meta-level approach contrasting with manual empirical fitting or observational extrapolation.
This automation theme connects to AutoScale[43] and Optimal Hyperparameter Scaling[46], which similarly seek to reduce human effort in characterizing scaling behavior. By leveraging models' own reasoning capabilities, LMs Discover Scaling[0] offers a novel complement to traditional methods, potentially accelerating the discovery process as models grow more capable and the space of architectural and training choices expands.

Claimed Contributions

SLDBench: A comprehensive scaling law discovery benchmark

The authors introduce SLDBench, a benchmark containing seven diverse scaling law discovery tasks derived from over 5,000 experiments in existing literature. Each task requires identifying a symbolic expression that accurately extrapolates to unseen test data, providing a rigorous testbed for evaluating agentic scientific discovery systems.

10 retrieved papers
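The evaluation protocol this benchmark implies, fitting a candidate law on small-scale runs and scoring it by extrapolation error on held-out larger-scale runs, can be sketched as follows. This is a minimal, stdlib-only illustration with synthetic data and a hypothetical pure power-law candidate; it is not SLDBench's actual data, task set, or scoring code.

```python
import math
from statistics import mean

# Synthetic "experiments": (model size N, loss L) drawn from an assumed
# ground truth L = 2.0 + 400 / N**0.3 (irreducible loss plus power-law term).
runs = [(n, 2.0 + 400.0 / n**0.3)
        for n in (1e6, 3e6, 1e7, 3e7, 1e8, 1e9, 1e10)]
train, test = runs[:5], runs[5:]          # held-out points are the largest N

def fit_power_law(points):
    """Least-squares fit of log L = log A - alpha * log N (log-log regression)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(loss) for _, loss in points]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope   # A, alpha

A, alpha = fit_power_law(train)

def pred(n):
    return A * n ** (-alpha)

# Score the candidate by mean relative error on the held-out large-N runs;
# the error stays well above zero because the form lacks the irreducible term.
err = mean(abs(pred(n) - loss) / loss for n, loss in test)
print(f"fitted A={A:.3g}, alpha={alpha:.3g}, extrapolation error={err:.1%}")
```

Because the candidate form omits the irreducible-loss term present in the synthetic ground truth, it fits the training points closely yet extrapolates poorly, which is exactly the kind of form misspecification a held-out large-scale test split is designed to expose.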
SLDAgent: An evolution-based agent for scaling law discovery

SLDAgent: An evolution-based agent for scaling law discovery

The authors propose SLDAgent, a novel evolution-based agent that co-optimizes both the scaling law expression and its parameter fitting routine. This evolutionary approach enables autonomous exploration of complex variable relationships and achieves state-of-the-art performance on scaling law discovery tasks.

6 retrieved papers (Can Refute)

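The co-optimization idea described above, an outer evolutionary search over law forms combined with an inner parameter-fitting routine, can be sketched as follows. Everything here is illustrative: the two candidate forms, the mutation scheme, and the synthetic data are assumptions for the sketch, not SLDAgent's actual implementation.

```python
import math
import random
from statistics import mean

random.seed(0)
# Synthetic runs from an assumed ground truth L = 2.0 + 400 / N**0.3.
data = [(n, 2.0 + 400.0 / n**0.3) for n in (1e6, 3e6, 1e7, 3e7, 1e8)]

def fit(form, p):
    """Inner loop: for a fixed exponent p, solve y ~ E + A*form(N, p)
    exactly via the 2x2 normal equations (E, A enter linearly)."""
    phi = [form(n, p) for n, _ in data]
    y = [loss for _, loss in data]
    k, sp, spp = len(phi), sum(phi), sum(v * v for v in phi)
    sy, spy = sum(y), sum(v * t for v, t in zip(phi, y))
    A = (k * spy - sp * sy) / (k * spp - sp * sp)
    E = (sy - A * sp) / k
    mse = mean((E + A * v - t) ** 2 for v, t in zip(phi, y))
    return E, A, mse

# Outer loop's search space: two hypothetical law families.
forms = {
    "power":  lambda n, p: n ** (-p),
    "logpow": lambda n, p: math.log(n) ** (-p),
}

best = None
for name, form in forms.items():
    p = 0.5
    for _ in range(200):                  # mutate exponent, keep improvements
        cand = max(1e-3, p + random.gauss(0, 0.05))
        if fit(form, cand)[2] <= fit(form, p)[2]:
            p = cand
    E, A, mse = fit(form, p)
    if best is None or mse < best[-1]:
        best = (name, p, E, A, mse)

print(best)  # the "power" family should win, with p near the true 0.3
```

Fixing the nonlinear exponent makes the inner fit ordinary least squares over the two linear parameters, a common trick for keeping parameter fitting cheap and exact while the outer evolutionary loop explores the space of law forms.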
Demonstration of superhuman scaling law discovery

The authors demonstrate for the first time that an AI agent can autonomously discover scaling laws that consistently outperform human-derived counterparts in extrapolation accuracy across all benchmark tasks. They validate the practical utility of these discovered laws in pretraining and fine-tuning applications.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty that remains constrained by search coverage and taxonomy granularity.
