Secret-Protected Evolution for Differentially Private Synthetic Text Generation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: synthetic data, differential privacy
Abstract:

Text data has become extremely valuable for training large language models (LLMs) and may even drive progress toward artificial general intelligence (AGI). However, much high-quality text in the real world is private and cannot be freely used due to privacy concerns. Differentially private (DP) synthetic text generation has therefore been proposed, aiming to produce high-utility synthetic data while protecting sensitive information. Existing DP synthetic text generation methods impose uniform guarantees that often overprotect non-sensitive content, resulting in substantial utility loss and computational overhead. We propose Secret-Protected Evolution (SecPE), a novel framework that extends private evolution with secret-aware protection. Theoretically, we show that SecPE satisfies (p, r)-secret protection, a relaxation of Gaussian DP that enables tighter utility–privacy trade-offs while substantially reducing computational complexity relative to baseline methods. Empirically, across the OpenReview, PubMed, and Yelp benchmarks, SecPE consistently achieves lower Fréchet Inception Distance (FID) and higher downstream task accuracy than GDP-based Aug-PE baselines, while requiring less noise to attain the same level of protection. Our results highlight that secret-aware guarantees can unlock more practical and effective privacy-preserving synthetic text generation.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. The results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Secret-Protected Evolution (SecPE), a framework that extends private evolution with secret-aware protection for differentially private synthetic text generation. It resides in the 'Secret-Aware and Selective DP' leaf under 'Evolutionary and Iterative DP Text Synthesis', which contains only two papers including this one. This places the work in a relatively sparse research direction within the broader field of differentially private text generation, suggesting that secret-aware evolutionary approaches remain underexplored compared to more established methods like DP fine-tuning or GAN-based synthesis.

The taxonomy reveals that SecPE's nearest neighbors include genetic and distribution-alignment methods in a sibling leaf, as well as DP fine-tuning approaches and private next-token prediction techniques in parallel branches. While evolutionary synthesis methods exist (e.g., genetic algorithms for distribution alignment), the secret-aware dimension distinguishes this work from uniform-privacy approaches. The framework diverges from end-to-end generative models and knowledge distillation techniques that apply global privacy budgets, instead targeting selective protection of sensitive content—a boundary explicitly noted in the taxonomy's scope definitions.

Among thirty candidates examined, the analysis found limited prior work overlap. The SecPE framework itself shows one refutable candidate out of ten examined, suggesting some evolutionary privacy mechanisms exist but are not densely represented. The secret-protected clustering method appears more novel, with zero refutable candidates among ten examined. However, the theoretical formalization of secret protection encountered four refutable candidates out of ten, indicating that formal privacy relaxations and secret-aware guarantees have received prior theoretical attention, though the specific application to evolutionary text synthesis may be less explored.

Based on the limited search scope of thirty semantically similar papers, the work appears to occupy a relatively novel position within secret-aware evolutionary synthesis. The framework's combination of selective privacy and iterative refinement addresses a gap between uniform-noise methods and application-specific approaches, though the theoretical foundations draw on existing relaxations of differential privacy. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond the top-K semantic matches examined.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 5

Research Landscape Overview

Core task: differentially private synthetic text generation. The field has organized itself into several major branches that reflect both methodological diversity and application-driven concerns. Core DP Text Generation Methods encompass foundational techniques—ranging from evolutionary and iterative synthesis approaches (such as Secret Protected Evolution[0] and Selective Privacy[35]) to knowledge distillation and GAN-based frameworks (e.g., Private GAN[3], Private Knowledge Distillation[4])—that directly tackle the challenge of generating text under formal privacy guarantees. Application-Specific DP Text Generation addresses domain needs in healthcare (Healthcare Synthetic Data[18], Term2Note[21]), recommendation systems (Recommendation Systems[20]), and instruction tuning (Private Instructions[10]), while DP Generative Models for Non-Text Modalities extends similar privacy mechanisms to images and tabular data (Medical Convolutional GANs[7], Echonet Synthetic[9]). Meanwhile, Evaluation, Privacy Metrics, and Theoretical Foundations (Synthetic Privacy Metrics[17], Evaluation Metrics Review[33]) provide the analytical backbone, and Specialized DP Mechanisms and Extensions explore advanced privacy notions (User Level Privacy[36], Formal Privacy Guarantees[37]) alongside Privacy-Utility Trade-offs and Practical Considerations that guide real-world deployment.

Within the evolutionary and iterative synthesis cluster, a particularly active line of work focuses on secret-aware and selective privacy mechanisms that protect only sensitive portions of data rather than applying uniform noise. Secret Protected Evolution[0] exemplifies this direction by evolving synthetic text while explicitly safeguarding designated secrets, contrasting with broader approaches like Selective Privacy[35] that allow fine-grained control over which attributes receive protection.
These methods sit at the intersection of iterative refinement and targeted privacy, differing from end-to-end generative models (Private GAN[3]) that treat all data uniformly and from distillation-based techniques (Private Knowledge Distillation[4]) that transfer knowledge under global privacy budgets. The trade-off between granular secret protection and overall utility remains a central open question, as does the scalability of evolutionary strategies compared to one-shot generation pipelines like those in Synthetic Text APIs[5] or RAG-based frameworks (RAG Differential Privacy[2]).

Claimed Contributions

Secret-Protected Evolution (SecPE) framework

The authors introduce SecPE, a framework that shifts from uniform differential privacy guarantees to secret-aware protection. This framework provides (p,r)-secret protection, which relaxes Gaussian DP by requiring protection only at specific prior points rather than over the entire trade-off curve, enabling tighter utility-privacy trade-offs.

Retrieved papers: 10 · Verdict: Can Refute
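The private-evolution paradigm that SecPE extends can be illustrated with a minimal selection step: each private record votes for its nearest synthetic candidate, Gaussian noise on the vote histogram supplies the privacy guarantee, and the top-voted candidates seed the next generation. The sketch below is a hedged illustration of this general Aug-PE-style loop, not the authors' implementation; all names, dimensions, and parameters are assumptions for demonstration.

```python
import numpy as np

def private_evolution_step(private_emb, candidate_emb, sigma, rng):
    """One selection step of a private-evolution loop (illustrative sketch).

    Each private embedding votes for its nearest synthetic candidate;
    Gaussian noise added to the vote histogram provides the DP guarantee.
    """
    # Nearest-candidate vote for every private record: cost O(M * N_syn).
    dists = np.linalg.norm(
        private_emb[:, None, :] - candidate_emb[None, :, :], axis=-1
    )
    votes = np.bincount(
        dists.argmin(axis=1), minlength=len(candidate_emb)
    ).astype(float)
    votes += rng.normal(0.0, sigma, size=votes.shape)  # privacy noise
    # Keep the top-voted half of the candidates for the next generation.
    keep = np.argsort(votes)[::-1][: len(candidate_emb) // 2]
    return candidate_emb[keep]

rng = np.random.default_rng(0)
private_emb = rng.normal(size=(100, 8))  # M = 100 private embeddings
candidates = rng.normal(size=(20, 8))    # N_syn = 20 synthetic candidates
survivors = private_evolution_step(private_emb, candidates, sigma=1.0, rng=rng)
print(survivors.shape)  # (10, 8)
```

In a full pipeline the surviving candidates would be re-expanded by an LLM and the step repeated; the sketch only shows the privately scored selection that the secret-aware variant modifies.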
Secret-protected clustering method

The authors propose a clustering-based method that detects sensitive attributes and forms representative centers by updating public clusters with noisy private data. This approach reduces computational complexity from O(M·N_syn) to O(K·N_syn), where K ≪ M, enabling scalability to larger datasets.

10 retrieved papers
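The claimed O(M·N_syn) → O(K·N_syn) reduction comes from scoring synthetic candidates against K representative centers rather than all M private records. Below is a minimal sketch of how such centers might be formed, assuming embedding vectors, a public-data initialization, and a Gaussian-noised private update; the initialization scheme, blend weight, and function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def secret_protected_centers(public_emb, private_emb, k, sigma, rng):
    """Form K representative centers (illustrative sketch).

    Start from centers drawn from public data, then shift each center
    toward the Gaussian-noised mean of the private points assigned to it.
    Candidate scoring against these centers costs O(K * N_syn) instead
    of O(M * N_syn).
    """
    # Crude public-data initialization (a stand-in for proper clustering).
    centers = public_emb[rng.choice(len(public_emb), size=k, replace=False)]
    # Assign each private point to its nearest public center.
    assign = np.linalg.norm(
        private_emb[:, None, :] - centers[None, :, :], axis=-1
    ).argmin(axis=1)
    for j in range(k):
        pts = private_emb[assign == j]
        if len(pts):
            noisy_mean = pts.mean(axis=0) + rng.normal(0, sigma, size=pts.shape[1])
            centers[j] = 0.5 * (centers[j] + noisy_mean)  # public/private blend
    return centers

rng = np.random.default_rng(1)
public = rng.normal(size=(200, 8))    # public reference embeddings
private = rng.normal(size=(1000, 8))  # M = 1000 private records
centers = secret_protected_centers(public, private, k=16, sigma=0.5, rng=rng)
print(centers.shape)  # (16, 8)
```

With K = 16 centers standing in for M = 1000 records, every subsequent scoring pass over the synthetic candidates touches 16 rather than 1000 reference points, which is the source of the claimed speedup.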
Theoretical formalization of secret protection for text generation

The authors provide a theoretical framework showing that their method satisfies (p,r)-secret protection, which is a relaxation of Gaussian differential privacy. This formalization bounds the reconstruction success probability calibrated to specific secrets rather than enforcing uniform protection across all records.

Retrieved papers: 10 · Verdict: Can Refute
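The report does not reproduce the formal definition of (p, r)-secret protection. One plausible reading, consistent with the statement that the guarantee "bounds the reconstruction success probability calibrated to specific secrets," is sketched below; the mechanism M, adversary A, secret s, and exact quantifiers are assumptions for illustration, not the paper's statement.

```latex
% Hypothetical sketch, not the paper's exact definition.
% A mechanism M gives (p, r)-secret protection for a designated secret s
% if every reconstruction adversary A holding prior r on s succeeds with
% probability at most p after observing the output M(D):
\[
  \Pr\bigl[\mathcal{A}(\mathcal{M}(D)) = s\bigr] \;\le\; p .
\]
% Gaussian DP would instead constrain the full trade-off curve, i.e.
% protection at every prior rather than only at the designated point r,
% which is strictly stronger and hence requires more noise.
```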

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Secret-Protected Evolution (SecPE) framework

Contribution: Secret-protected clustering method

Contribution: Theoretical formalization of secret protection for text generation