Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: speech separation, speech enhancement, deep learning, early exit, dynamic neural networks
Abstract:

In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however, designed with a fixed compute and parameter budget, and consequently cannot scale to varying compute demands or resources, which limits their use in embedded and heterogeneous devices such as mobile phones and hearables. To enable such use cases, we design a neural network architecture for speech separation and enhancement capable of early exit, and we propose an uncertainty-aware probabilistic framework that jointly models the clean speech signal and the error variance, which we use to derive probabilistic early-exit conditions in terms of desired signal-to-noise ratios. We evaluate our methods on both speech separation and enhancement tasks, where we demonstrate that early-exit capabilities can be introduced without compromising reconstruction quality, and that our early-exit conditions are well-calibrated on training data and can easily be post-calibrated on validation data, leading to large energy savings over single-exit baselines when early exits are used. Our framework enables fine-grained dynamic compute scaling of neural networks while achieving state-of-the-art performance and interpretable exit conditions.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces an uncertainty-aware probabilistic framework for early-exit speech separation, positioning itself within the 'Probabilistic Early Exit for Speech Separation' leaf of the taxonomy. This leaf contains only two papers total, including the original work, indicating a relatively sparse research direction. The core contribution combines dynamic compute scaling with probabilistic modeling of clean speech and error variance, enabling SNR-based exit conditions. This sits at the intersection of adaptive computation and speech separation, addressing real-time deployment constraints through principled uncertainty quantification rather than fixed-architecture approaches.

The taxonomy reveals that early exit mechanisms for speech separation remain an emerging area, with neighboring leaves exploring self-supervised early exit for speech representations and multi-channel spatial processing. The broader 'Early Exit Mechanisms for Adaptive Computation' branch contains only four papers across three leaves, suggesting limited prior exploration of adaptive inference in this domain. Related work in 'Neural Architecture Design' focuses on temporal modeling and speaker-informed networks without dynamic compute capabilities, while 'Scalable Training and Inference Frameworks' addresses multi-loss supervision rather than inference-time adaptation. The paper's probabilistic approach diverges from deterministic layer-wise classifiers common in other domains.

Among 21 candidates examined across three contributions, none were found to clearly refute the proposed methods. The uncertainty-aware probabilistic framework examined 10 candidates with no refutable overlap, the PRESS-Net architecture examined 10 candidates with similar results, and the SNR-based exit conditions examined 1 candidate. This limited search scope suggests the specific combination of probabilistic uncertainty modeling with early-exit speech separation has minimal direct precedent in the examined literature. The framework contribution appears most distinctive, while the architecture and exit conditions build upon this probabilistic foundation with less extensive prior work overlap.

Based on the limited search of 21 semantically-related candidates, the work appears to occupy a relatively unexplored niche combining probabilistic uncertainty quantification with adaptive inference for speech separation. The sparse taxonomy leaf and absence of refutable candidates suggest novelty, though the analysis does not cover exhaustive literature review or adjacent fields like general early-exit methods in computer vision or NLP that might inform similar approaches.

Taxonomy

Core-task Taxonomy Papers: 11
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: dynamic compute scaling for speech separation using early exit.

The field of speech separation has evolved to address both accuracy and computational efficiency, and the taxonomy reflects several complementary research directions. The Early Exit Mechanisms branch explores adaptive computation strategies that let models terminate processing once confidence thresholds are met, reducing latency for easier inputs while maintaining quality for harder ones. Scalable Training and Inference Frameworks focus on system-level optimizations and distributed approaches that enable practical deployment. Neural Architecture Design encompasses foundational model structures such as Dual-path RNN[1] and specialized topologies that balance separation quality with parameter efficiency. Regularization and Generalization Approaches address overfitting and domain robustness, while Spectral Factorization and Optimization Methods tackle the mathematical foundations of source separation through signal-processing techniques.

Within the Early Exit Mechanisms branch, a small handful of works have begun exploring probabilistic and confidence-based strategies for adaptive inference. Probabilistic Early Exits[0] introduces a framework in which exit decisions are guided by uncertainty estimates, allowing the model to allocate computation dynamically according to input difficulty. This approach contrasts with deterministic early-exit schemes such as Early Exit Transformer[3], which rely on fixed layer-wise classifiers, and complements recent efforts such as Knowing When to Quit[9] that explore stopping criteria in related domains.

The original paper sits at the intersection of adaptive computation and speech separation, addressing the challenge of balancing real-time constraints with separation quality, a concern also present in works like Low-latency Deep Clustering[4] and Low-Latency Single-Channel[11], though those typically employ fixed architectures rather than dynamic exit strategies. The probabilistic framing offers a principled way to navigate the accuracy-efficiency trade-off that remains central to practical speech separation systems.
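The deterministic, layer-wise-classifier style of early exit mentioned above can be sketched in a few lines. Everything here (`blocks`, `heads`, the max-softmax confidence proxy, and the threshold value) is an illustrative placeholder, not a component of any cited system:

```python
import math

def softmax_max(logits):
    """Max softmax probability, a common confidence proxy for exit decisions."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return max(exps) / sum(exps)

def early_exit_forward(blocks, heads, x, threshold=0.9):
    """Generic deterministic early exit: run blocks sequentially and stop as
    soon as an intermediate head is confident enough. Returns the head output
    and the depth at which computation stopped."""
    h = x
    out = None
    for depth, (block, head) in enumerate(zip(blocks, heads)):
        h = block(h)
        out = head(h)
        if softmax_max(out) >= threshold:
            return out, depth  # confident enough: exit early
    return out, len(blocks) - 1  # fell through to the final exit
```

Easy inputs trip the threshold at a shallow depth and skip the remaining blocks, which is exactly the compute saving the probabilistic approaches in this branch try to achieve with uncertainty estimates instead of a fixed confidence score.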

Claimed Contributions

Uncertainty-aware probabilistic framework for early-exit speech separation

The authors introduce a probabilistic modeling approach that jointly predicts both the clean speech signal and the variance of the prediction error using a Student's t likelihood. This framework enables deriving interpretable early-exit conditions based on desired signal-to-noise ratios with quantified uncertainty.

10 retrieved papers
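As a rough sketch of how such a condition could be evaluated (the estimator and the 15 dB target below are hypothetical, not the paper's exact formulation): a model that predicts both the signal and its per-sample error variance can estimate its own SNR without access to the clean reference, and exit once that estimate reaches the desired level.

```python
import math

def estimated_snr_db(s_hat, var_hat):
    """Self-estimated SNR from the model's own outputs: predicted signal
    power over predicted error power. Under a heteroscedastic likelihood,
    var_hat stands in for E[(s - s_hat)^2], so no clean reference signal
    is needed at inference time. Hypothetical sketch, not the paper's
    exact estimator."""
    signal_power = sum(v * v for v in s_hat)
    error_power = sum(var_hat)
    return 10.0 * math.log10(signal_power / error_power)

def should_exit(s_hat, var_hat, target_snr_db=15.0):
    """Exit once the self-estimated SNR reaches the desired target."""
    return estimated_snr_db(s_hat, var_hat) >= target_snr_db
```

The appeal of phrasing the threshold in dB is interpretability: a practitioner specifies a desired reconstruction quality rather than an opaque confidence score.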
PRESS-Net architecture for early-exit speech separation

The authors propose a new neural network architecture based on linear recurrent neural networks designed to achieve state-of-the-art reconstruction performance while supporting high-quality reconstructions from multiple early-exit points throughout the network depth.

10 retrieved papers
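A heavily simplified sketch of such a multi-exit recurrent stack is shown below: each layer is a toy diagonal linear recurrence, and each depth's output can serve directly as that exit's reconstruction. All coefficients and the identity "heads" are illustrative assumptions, not PRESS-Net's actual design:

```python
class LinearRecurrentLayer:
    """Toy diagonal linear recurrence h_t = a * h_{t-1} + b * x_t, a much
    simplified stand-in for a linear RNN block."""
    def __init__(self, a=0.9, b=0.5):
        self.a, self.b = a, b

    def forward(self, xs):
        h, out = 0.0, []
        for x in xs:
            h = self.a * h + self.b * x
            out.append(h)
        return out

class MultiExitStack:
    """Stack of recurrent layers where every depth exposes an output, so a
    usable reconstruction is available at each early-exit point."""
    def __init__(self, depth=4):
        self.layers = [LinearRecurrentLayer() for _ in range(depth)]

    def forward(self, xs, exit_at=None):
        h = xs
        for i, layer in enumerate(self.layers):
            h = layer.forward(h)
            if exit_at is not None and i == exit_at:
                return h  # early reconstruction from exit i
        return h  # deepest (full-compute) reconstruction
```

The design point this illustrates is that every intermediate depth must produce a signal in the output domain; otherwise an exit condition has nothing to evaluate against.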
Probabilistic SNR-like early-exit conditions

The authors derive multiple probabilistic signal-to-noise ratio distributions (SNR, SNRi, and a unified exit-SNR) from their probabilistic framework that provide interpretable and calibratable conditions for deciding when to exit computation based on desired performance levels.

1 retrieved paper
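For reference, the standard deterministic quantities that such distributions generalize are (the paper's exact probabilistic definitions may differ; here $s$ is the clean signal, $\hat{s}$ the estimate at a given exit, and $x$ the input mixture):

```latex
\mathrm{SNR}(\hat{s}) = 10 \log_{10} \frac{\lVert s \rVert^{2}}{\lVert s - \hat{s} \rVert^{2}},
\qquad
\mathrm{SNRi}(\hat{s}) = \mathrm{SNR}(\hat{s}) - \mathrm{SNR}(x),
```

where SNRi measures improvement over simply outputting the mixture. A probabilistic variant can be obtained by replacing the unknown error power $\lVert s - \hat{s} \rVert^{2}$ with the model's total predicted variance $\sum_t \hat{\sigma}_t^{2}$, which is what makes the condition evaluable at inference time, when $s$ is unavailable.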

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Uncertainty-aware probabilistic framework for early-exit speech separation


Contribution

PRESS-Net architecture for early-exit speech separation


Contribution

Probabilistic SNR-like early-exit conditions
