Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: unlearnable examples, data protection, linear model, shortcut, linearity
Abstract:

Collecting web data to train deep models has become increasingly common, raising concerns about unauthorized data usage. To mitigate this issue, unlearnable examples introduce imperceptible perturbations into data, preventing models from learning effectively. However, existing methods typically rely on deep neural networks as surrogate models for perturbation generation, resulting in significant computational costs. In this work, we propose Perturbation-Induced Linearization (PIL), a computationally efficient yet effective method that generates perturbations using only linear surrogate models. PIL achieves comparable or better performance than existing surrogate-based methods while reducing computational time dramatically. We further reveal a key mechanism underlying unlearnable examples: inducing linearization in deep models, which explains why PIL achieves competitive results in a very short training time. Beyond this, we analyze how unlearnable examples behave under percentage-based partial perturbation. Our work not only provides a practical approach for data protection but also offers insights into what makes unlearnable examples effective.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Perturbation-Induced Linearization (PIL), a method for generating unlearnable examples using linear surrogate models rather than deep networks. According to the taxonomy tree, this work occupies the 'Linearization-Based Perturbation Methods' leaf under 'Core Perturbation Generation Methods'. Notably, this leaf contains only the original paper itself—no sibling papers are present. This indicates a relatively sparse research direction within the broader field of unlearnable example generation, which encompasses fifty papers across multiple branches including error-minimizing approaches, adversarial methods, and domain-specific applications.

The taxonomy reveals that PIL's closest neighbors are error-minimizing noise approaches (four papers) and adversarial-based perturbation generation (two papers), both sibling leaves under the same parent category. The error-minimizing branch explicitly suppresses informative features by minimizing training error, while adversarial methods leverage adversarial training dynamics. PIL diverges by using linear approximations to induce linearization in deep models, positioning it between computational efficiency concerns and mechanistic understanding. The broader 'Core Perturbation Generation Methods' category also includes conditional and transferable methods (three papers), suggesting the field balances fundamental technique development with practical deployment considerations.

Among the fifteen candidates examined, the 'Linearization mechanism underlying unlearnable examples' contribution yielded one refutable candidate out of ten compared, while the 'PIL method' contribution yielded zero refutable candidates among five. The 'Theoretical analysis of partial perturbation property' contribution was not evaluated against prior work. This suggests that while the core algorithmic approach appears relatively novel within the limited search scope, the mechanistic insight about linearization may overlap with existing literature. The analysis covers only the top-K semantic matches plus citation expansion, not an exhaustive field survey.

Based on the limited search scope of fifteen candidates, PIL appears to introduce a distinct computational approach within a sparsely populated research direction. The single-paper leaf status and absence of refutable candidates for the method itself suggest potential novelty, though the mechanistic explanation shows modest prior overlap. The analysis does not cover the full spectrum of unlearnable example research, particularly recent work in robustness enhancements or domain-specific adaptations that might employ similar linearization insights.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Paper: 1

Research Landscape Overview

Core task: generating unlearnable examples to prevent unauthorized data usage. The field has organized itself around several complementary directions. Core Perturbation Generation Methods explore fundamental techniques for crafting imperceptible noise that disrupts model training, including error-minimization approaches like Unlearnable Time Series[1] and linearization strategies such as Perturbation Linearization[0]. Domain-Specific Unlearnable Example Applications extend these ideas to specialized settings: temporal data (Temporal Unlearnable Examples[3]), multimodal contexts (Multimodal Unlearnable Examples[4]), medical imaging (Medical Image Safeguarding[5], Medical Unlearnable Examples[7][8]), and even graph or code domains. Robustness and Countermeasures investigate adversarial dynamics, examining how defenses can be bypassed and how to build more resilient protections. Alternative Data Protection Paradigms consider orthogonal mechanisms like backdoor watermarking and differential privacy, while Theoretical Foundations and Surveys (Unlearnable Data Survey[9]) provide formal guarantees and consolidate emerging insights.

Recent work reveals a tension between transferability and robustness: some methods prioritize cross-architecture generalization (Transferable Unlearnable Examples[11]), while others focus on stability under data augmentation (Robust Unlearnable Examples[19]) or adversarial scrubbing (Provably Unlearnable Data[6]). Perturbation Linearization[0] sits within the linearization-based branch, emphasizing efficient gradient approximations to generate perturbations that remain effective across diverse training pipelines. Compared to error-minimization techniques like Temporal Unlearnable Examples[3], which explicitly minimize prediction error on perturbed samples, Perturbation Linearization[0] leverages linear surrogate models for scalability and interpretability. This positions it alongside works that balance computational efficiency with broad applicability, addressing open questions about how to maintain unlearnability when attackers employ adaptive defenses or novel architectures.

Claimed Contributions

Perturbation-Induced Linearization (PIL) method

The authors introduce PIL, a novel method for generating unlearnable examples that uses simple linear classifiers instead of deep neural networks as surrogate models. This approach achieves comparable or better performance than existing methods while dramatically reducing computational time, requiring less than one GPU minute for CIFAR-10 compared to over 15 GPU hours for existing methods.

Retrieved papers compared: 5

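The report describes PIL only at a high level. A minimal sketch of the core idea, crafting a bounded perturbation that makes each sample trivially easy for a linear softmax surrogate, might look as follows; the function name, the sign-based update rule, the joint surrogate update, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def linear_surrogate_perturbations(X, y, num_classes, eps=8 / 255, lr=0.1, steps=20):
    """Sketch: craft L-inf-bounded perturbations that minimize a linear
    softmax classifier's loss on each sample, so that deep models trained
    on the perturbed data can rely on a linear shortcut.

    X: (n, d) flattened inputs in [0, 1]; y: (n,) integer labels.
    eps=8/255 is a commonly used budget, not necessarily the paper's.
    """
    n, d = X.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(d, num_classes))  # linear surrogate weights
    delta = np.zeros_like(X)

    for _ in range(steps):
        logits = (X + delta) @ W
        logits -= logits.max(axis=1, keepdims=True)     # stable softmax
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = p.copy()
        g[np.arange(n), y] -= 1.0                       # d(CE loss)/d(logits)

        # Error-minimizing step on the perturbation (make samples "easy").
        grad_delta = g @ W.T                            # per-sample d(loss)/d(input)
        delta -= lr * np.sign(grad_delta)
        delta = np.clip(delta, -eps, eps)               # L-inf budget
        delta = np.clip(X + delta, 0.0, 1.0) - X        # keep pixels valid

        # Also fit the surrogate itself with one gradient step on the same loss.
        W -= lr * (X + delta).T @ g / n
    return delta
```

The sign-based descent step mirrors common error-minimizing noise generators; PIL's actual optimization may differ.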
Linearization mechanism underlying unlearnable examples

The authors uncover that unlearnable examples work by forcing deep neural networks to behave more like linear models, which reduces their capacity to learn meaningful representations. This mechanism is shown to be present not only in PIL but also in existing unlearnable example methods, providing a fundamental explanation for their effectiveness.

Retrieved papers compared: 10
Status: Can Refute

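The claimed mechanism, that unlearnable examples push deep networks toward linear behavior, suggests a simple empirical probe. The sketch below is an illustrative metric, not one taken from the paper: fit the best least-squares linear map from inputs to a trained model's logits and report how often the linear fit reproduces the model's predicted class, so a score near 1.0 indicates near-linear behavior on that data.

```python
import numpy as np

def linear_agreement(f, X):
    """Illustrative linearity probe: how well does the best linear map
    from inputs to logits reproduce the model's predictions?

    f: any callable mapping an (n, d) array to (n, k) logits.
    X: (n, d) inputs. Returns the fraction of samples where the linear
    fit and the model agree on the predicted class.
    """
    logits = f(X)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # add a bias column
    W, *_ = np.linalg.lstsq(Xb, logits, rcond=None)  # least-squares linear fit
    approx = Xb @ W
    return np.mean(logits.argmax(axis=1) == approx.argmax(axis=1))
```

For an exactly linear model the score is 1.0 by construction; the interesting comparison would be deep networks trained on clean versus unlearnable data.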
Theoretical analysis of partial perturbation property

The authors provide theoretical and empirical analysis explaining why unlearnable examples cannot substantially reduce test accuracy when only part of the dataset is perturbed. They introduce Assumption 1 regarding gradient orthogonality and prove in Theorem 1 that unlearnable examples do not interfere with learning from clean data, revealing a fundamental limitation of this protection approach.

Retrieved papers compared: 0
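The report does not reproduce the formal statements, so the following is only a schematic of how such an orthogonality-based argument typically proceeds; the notation and the exact form of Assumption 1 and Theorem 1 are assumptions, not the paper's.

```latex
% Schematic only: the exact statements of Assumption 1 and Theorem 1 are
% not reproduced in this report; notation below is illustrative.
\textbf{Assumption (gradient orthogonality, schematic).}
Throughout training, the gradients induced by the clean and the perturbed
subsets are nearly orthogonal:
\[
  \big\langle \nabla_\theta L_{\mathrm{clean}}(\theta),\;
              \nabla_\theta L_{\mathrm{pert}}(\theta) \big\rangle \approx 0 .
\]
\textbf{Claim (schematic).}
Under the assumption above, gradient updates driven by the perturbed subset
do not impede descent on $L_{\mathrm{clean}}$, so a model trained on the
mixed dataset still fits the clean fraction; test accuracy therefore
degrades little when only a percentage of the dataset is perturbed.
```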

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper occupies a single-paper leaf: no other retrieved work shares its leaf. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though the signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Perturbation-Induced Linearization (PIL) method

Five candidate papers were compared; none was judged able to refute this contribution.

Contribution

Linearization mechanism underlying unlearnable examples

Ten candidate papers were compared; one was judged able to refute this contribution, consistent with the partial overlap noted in the overall assessment.

Contribution

Theoretical analysis of partial perturbation property

No candidate papers were retrieved, so this contribution was not evaluated against prior work.