Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 5.6

Keywords: speech separation, speech enhancement, deep learning, early exit, dynamic neural networks

Abstract:

In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however, designed with a fixed compute and parameter budget, and consequently cannot scale to varying compute demands or resources, which limits their use in embedded and heterogeneous devices such as mobile phones and hearables. To enable such use-cases we design a neural network architecture for speech separation and enhancement capable of early-exit, and we propose an uncertainty-aware probabilistic framework to jointly model the clean speech signal and error variance which we use to derive probabilistic early-exit conditions in terms of desired signal-to-noise ratios. We evaluate our methods on both speech separation and enhancement tasks where we demonstrate that early-exit capabilities can be introduced without compromising reconstruction, and that our early-exit conditions are well-calibrated on training data and can easily be post-calibrated on validation data, leading to large energy savings when used with early-exit over single-exit baselines. Our framework enables fine-grained dynamic compute-scaling of neural networks while achieving state-of-the-art performance and interpretable exit conditions.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces an uncertainty-aware probabilistic framework for early-exit speech separation, positioning itself within the 'Probabilistic Early Exit for Speech Separation' leaf of the taxonomy. This leaf contains only two papers total, including the original work, indicating a relatively sparse research direction. The core contribution combines dynamic compute scaling with probabilistic modeling of clean speech and error variance, enabling SNR-based exit conditions. This sits at the intersection of adaptive computation and speech separation, addressing real-time deployment constraints through principled uncertainty quantification rather than fixed-architecture approaches.

The taxonomy reveals that early exit mechanisms for speech separation remain an emerging area, with neighboring leaves exploring self-supervised early exit for speech representations and multi-channel spatial processing. The broader 'Early Exit Mechanisms for Adaptive Computation' branch contains only four papers across three leaves, suggesting limited prior exploration of adaptive inference in this domain. Related work in 'Neural Architecture Design' focuses on temporal modeling and speaker-informed networks without dynamic compute capabilities, while 'Scalable Training and Inference Frameworks' addresses multi-loss supervision rather than inference-time adaptation. The paper's probabilistic approach diverges from deterministic layer-wise classifiers common in other domains.

Across the three contributions, 21 candidates were examined, and none was found to clearly refute the proposed methods. For the uncertainty-aware probabilistic framework, 10 candidates were examined with no refutable overlap; for the PRESS-Net architecture, 10 candidates were examined with the same result; and for the SNR-based exit conditions, 1 candidate was examined. Within this limited search scope, the specific combination of probabilistic uncertainty modeling with early-exit speech separation appears to have minimal direct precedent. The framework contribution appears most distinctive, while the architecture and exit conditions build on this probabilistic foundation and likewise show little overlap with the examined prior work.

Based on the limited search of 21 semantically related candidates, the work appears to occupy a relatively unexplored niche combining probabilistic uncertainty quantification with adaptive inference for speech separation. The sparse taxonomy leaf and the absence of refutable candidates suggest novelty, but the analysis is not an exhaustive literature review and does not cover adjacent fields, such as general early-exit methods in computer vision or NLP, that might inform similar approaches.

Taxonomy

Research Landscape Overview

Core task: dynamic compute scaling for speech separation using early exit. The field of speech separation has evolved to address both accuracy and computational efficiency, and the taxonomy reflects several complementary research directions. The Early Exit Mechanisms branch explores adaptive computation strategies that let a model stop processing once a confidence threshold is met, reducing latency for easier inputs while maintaining quality for harder ones. Scalable Training and Inference Frameworks focus on system-level optimizations and distributed approaches that enable practical deployment. Neural Architecture Design covers foundational model structures such as Dual-path RNN[1] and specialized topologies that balance separation quality with parameter efficiency. Regularization and Generalization Approaches address overfitting and domain robustness, while Spectral Factorization and Optimization Methods tackle the mathematical foundations of source separation through signal processing techniques.

Within the Early Exit Mechanisms branch, a handful of works have begun exploring probabilistic and confidence-based strategies for adaptive inference. Probabilistic Early Exits[0] introduces a framework in which exit decisions are guided by uncertainty estimates, allowing the model to allocate computation dynamically according to input difficulty. This approach contrasts with deterministic early-exit schemes such as Early Exit Transformer[3], which rely on fixed layer-wise classifiers, and complements recent efforts such as Knowing When to Quit[9] that explore stopping criteria in related domains. The original paper sits at the intersection of adaptive computation and speech separation, addressing the challenge of balancing real-time constraints with separation quality. That concern is also present in works such as Low-latency Deep Clustering[4] and Low-Latency Single-Channel[11], though those typically employ fixed architectures rather than dynamic exit strategies. The probabilistic framing offers a principled way to navigate the accuracy-efficiency trade-off that remains central to practical speech separation systems.

Claimed Contributions

Uncertainty-aware probabilistic framework for early-exit speech separation

10 retrieved papers

The authors introduce a probabilistic modeling approach that jointly predicts both the clean speech signal and the variance of the prediction error using a Student t-likelihood. This framework enables deriving interpretable early-exit conditions based on desired signal-to-noise ratios with quantified uncertainty.
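To make the idea of jointly predicting a signal and its error scale more concrete, the following is a minimal sketch of a Student's t negative log-likelihood, written with the Python standard library only. It is not the authors' training objective: the function name `student_t_nll`, the use of a single shared `scale` and `dof` for all samples, and the averaging over samples are assumptions made purely for illustration.

```python
import math


def student_t_nll(target, mean, scale, dof):
    """Average negative log-likelihood of `target` under a Student's t model.

    ASSUMPTION: one shared `scale` (error spread) and `dof` (degrees of
    freedom) for all samples; a real model could predict these per frame.
    """
    if scale <= 0 or dof <= 0:
        raise ValueError("scale and dof must be positive")
    if len(target) != len(mean) or not target:
        raise ValueError("target and mean must be non-empty and equal length")

    # Log of the normalising constant of the location-scale t density.
    log_norm = (
        math.lgamma((dof + 1.0) / 2.0)
        - math.lgamma(dof / 2.0)
        - 0.5 * math.log(dof * math.pi)
        - math.log(scale)
    )

    total_log_prob = 0.0
    for t, m in zip(target, mean):
        z = (t - m) / scale  # standardised residual
        total_log_prob += log_norm - 0.5 * (dof + 1.0) * math.log1p(z * z / dof)

    return -total_log_prob / len(target)
```

Minimising such a loss rewards both an accurate `mean` and a `scale` that matches the actual residuals; the heavier tails of the t distribution (small `dof`) penalise occasional large errors less severely than a Gaussian would.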

PRESS-Net architecture for early-exit speech separation

10 retrieved papers

The authors propose a new neural network architecture based on linear recurrent neural networks designed to achieve state-of-the-art reconstruction performance while supporting high-quality reconstructions from multiple early-exit points throughout the network depth.
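As a generic illustration of what "multiple early-exit points throughout the network depth" means at inference time, the sketch below runs a stack of blocks one at a time and stops as soon as a caller-supplied predicate is satisfied, so later blocks are never computed. It does not reproduce PRESS-Net: `run_with_early_exit`, `blocks`, `heads`, and `should_exit` are hypothetical names, and the blocks are plain Python callables rather than linear recurrent layers.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def run_with_early_exit(
    x: Sequence[float],
    blocks: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    should_exit: Callable[[Vector, int], bool],
) -> Tuple[Vector, int]:
    """Apply blocks in order and stop at the first exit that is accepted.

    After block i, heads[i] decodes the hidden state into an estimate and
    should_exit(estimate, i) decides whether to stop. Returns the estimate
    and the number of blocks that were actually evaluated.
    """
    if not blocks or len(blocks) != len(heads):
        raise ValueError("need one exit head per block and at least one block")

    hidden: Vector = list(x)
    estimate: Vector = hidden
    for index, (block, head) in enumerate(zip(blocks, heads)):
        hidden = block(hidden)    # compute is spent only on blocks reached
        estimate = head(hidden)   # exit-specific reconstruction
        if should_exit(estimate, index):
            return estimate, index + 1

    # No exit accepted: fall back to the final (deepest) estimate.
    return estimate, len(blocks)
```

The returned block count is a simple proxy for the compute spent on a given input, which is the quantity an early-exit scheme tries to reduce for easy inputs.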

Probabilistic SNR-like early-exit conditions

1 retrieved paper

The authors derive multiple probabilistic signal-to-noise ratio distributions (SNR, SNRi, and a unified exit-SNR) from their probabilistic framework that provide interpretable and calibratable conditions for deciding when to exit computation based on desired performance levels.
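One way to read a probabilistic SNR exit condition is "exit at the first point where the probability of reaching the desired SNR is high enough". The sketch below illustrates that decision rule only. It assumes, for simplicity, that each exit supplies a Gaussian predictive distribution over SNR in dB (a mean and a standard deviation); this Gaussian form is a stand-in chosen for illustration and is not the SNR, SNRi, or exit-SNR distribution derived in the paper. The names `prob_snr_at_least` and `choose_exit` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def prob_snr_at_least(target_db: float, mean_db: float, std_db: float) -> float:
    """P(SNR >= target_db) under an ASSUMED Gaussian over SNR in dB."""
    if std_db <= 0:
        raise ValueError("std_db must be positive")
    z = (target_db - mean_db) / (std_db * math.sqrt(2.0))
    # Gaussian survival function: 1 - Phi((target - mean) / std).
    return 0.5 * (1.0 - math.erf(z))


def choose_exit(
    exit_stats: Sequence[Tuple[float, float]],
    target_db: float,
    confidence: float,
) -> int:
    """Index of the first exit whose P(SNR >= target_db) >= confidence.

    `exit_stats` holds one (mean_db, std_db) pair per exit, shallowest first.
    If no exit qualifies, the last (deepest) exit is returned.
    """
    if not exit_stats:
        raise ValueError("exit_stats must not be empty")
    for index, (mean_db, std_db) in enumerate(exit_stats):
        if prob_snr_at_least(target_db, mean_db, std_db) >= confidence:
            return index
    return len(exit_stats) - 1
```

Under this reading, `target_db` expresses the desired quality and `confidence` expresses how sure the system must be before stopping; calibration then amounts to checking that the stated probabilities match how often the target is actually met on held-out data.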

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[9] Knowing When to Quit: Probabilistic Early Exits for Speech Separation

Ostergaard Mads, Jensen, Bjørn Sand, Mørup, Morten (2025) • arXiv.org

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Uncertainty-aware probabilistic framework for early-exit speech separation

[9] Knowing When to Quit: Probabilistic Early Exits for Speech Separation

Cannot Refute

[21] Audio Source Separation: Advances and Challenges

Cannot Refute

[22] Informed audio source separation with deep learning in limited data settings

Cannot Refute

[23] Integrating uncertainty into neural network-based speech enhancement

Cannot Refute

[24] Estimation of Output SI-SDR of Speech Signals Separated From Noisy Input by Conv-Tasnet

Cannot Refute

[25] 3D Neural Beamforming for Multi-channel Speech Separation Against Location Uncertainty

Cannot Refute

[26] A Bayesian Hierarchical Model for Blind Audio Source Separation

Cannot Refute

[27] Auxiliary-Function-Based Independent Vector Analysis Using Generalized Inter-Clique Dependence Source Models With Clique Variance Estimation

Cannot Refute

[28] A unified probabilistic view on spatially informed source separation and extraction based on independent vector analysis

Cannot Refute

[29] Probabilistic permutation invariant training for speech separation

Cannot Refute

Contribution

PRESS-Net architecture for early-exit speech separation

[2] DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models

Cannot Refute

[12] Towards a flexible and unified architecture for speech enhancement

Cannot Refute

[13] VoiceFilter-Lite: Streaming targeted voice separation for on-device speech recognition

Cannot Refute

[14] Dynamic Slimmable Network for Speech Separation

Cannot Refute

[15] Selector-enhancer: learning dynamic selection of local and non-local attention operation for speech enhancement

Cannot Refute

[16] Latent iterative refinement for modular source separation

Cannot Refute

[17] Temporally Dynamic Spiking Transformer Network for Speech Enhancement

Cannot Refute

[18] Cross-Modal Knowledge Distillation With Multi-Stage Adaptive Feature Fusion for Speech Separation

Cannot Refute

[19] Continual audio-visual sound separation

Cannot Refute

[20] Enhancing the MUSE Speech Enhancement Framework with Mamba-Based Architecture and Extended Loss Functions

Cannot Refute

Contribution

Probabilistic SNR-like early-exit conditions

[9] Knowing When to Quit: Probabilistic Early Exits for Speech Separation

Cannot Refute
