Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Unsupervised Visual Anomaly Segmentation, Self-supervised Learning, Density Estimation, Computed Tomography
Abstract:

Accurate detection of all pathological findings in 3D medical images remains a significant challenge, as supervised models are limited to detecting only the few pathology classes annotated in existing datasets. To address this, we frame pathology detection as an unsupervised visual anomaly segmentation (UVAS) problem, leveraging the inherent rarity of pathological patterns compared to healthy ones. We enhance the existing density-based UVAS framework with two key innovations: (1) dense self-supervised learning for feature extraction, eliminating the need for supervised pretraining, and (2) learned, masking-invariant dense features as conditioning variables, replacing hand-crafted positional encodings. Trained on over 30,000 unlabeled 3D CT volumes, our fully self-supervised model, Screener, outperforms existing UVAS methods on four large-scale test datasets comprising 1,820 scans with diverse pathologies. Furthermore, in a low-shot supervised fine-tuning setting, Screener surpasses existing self-supervised pretraining methods, establishing it as a state-of-the-art foundation for pathology segmentation. The code and pretrained models will be made publicly available.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Screener, a fully self-supervised model for pathology detection in 3D CT volumes, framed as unsupervised visual anomaly segmentation. It sits within the 'Density-Based Anomaly Detection with Self-Supervised Features' leaf of the taxonomy, which contains only three papers total. This is a relatively sparse research direction compared to broader categories like supervised lesion segmentation or contrastive pretraining. The core innovation lies in combining dense self-supervised feature learning with density-based anomaly modeling, eliminating reliance on supervised pretraining or hand-crafted positional encodings.

The taxonomy reveals that neighboring approaches diverge in their anomaly detection strategies. The sibling leaf 'Diffusion and Generative Model-Based Anomaly Segmentation' uses reconstruction error from diffusion models or GANs, while 'Pseudo-Healthy Image Synthesis and Subtraction' generates synthetic healthy tissue for comparison. Screener's density-based approach contrasts with these generative methods by directly modeling normal tissue distributions in feature space. The broader 'Self-Supervised Representation Learning for Segmentation' branch focuses on pretraining for downstream tasks rather than direct anomaly detection, highlighting Screener's dual role as both a pretraining method and an end-to-end pathology detector.

Among 28 candidates examined across three contributions, none were found to clearly refute the paper's claims. The first contribution (dense self-supervised features for UVAS) examined 10 candidates with zero refutable matches, suggesting limited prior work directly combining these elements. The second contribution (learned masking-invariant conditioning) examined 8 candidates, also with no refutations, indicating novelty in replacing positional encodings with learned features. The third contribution (self-supervised pretraining via distillation) examined 10 candidates without refutation. This limited search scope means the analysis captures top semantic matches but may not reflect the full breadth of related work in medical imaging or computer vision.

Given the sparse taxonomy leaf and absence of refutations among examined candidates, the work appears to occupy a relatively unexplored niche within density-based anomaly detection for medical CT. However, the search examined only 28 papers from a field with at least 50 relevant works in the taxonomy. The novelty assessment is thus constrained by this limited scope and should be interpreted as indicative rather than definitive.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 0

Research Landscape Overview

Core task: unsupervised pathology segmentation in medical CT images. The field addresses the challenge of identifying and delineating pathological regions without relying on pixel-level annotations, which are expensive and time-consuming to obtain in clinical settings.

The taxonomy reveals several complementary strategies. Anomaly-Based Pathology Segmentation leverages the idea that pathologies deviate from learned normal patterns, often combining reconstruction-based or density-based anomaly detection with self-supervised features (e.g., Diffusion Anomaly Segmentation[1], Screener Pathology[2]). Self-Supervised Representation Learning for Segmentation focuses on pretraining robust feature extractors through tasks like contrastive learning or masked prediction (e.g., Self Supervised 3D CT[20], Masked Attentive Predicting[25]). Unsupervised Domain Adaptation for CT Segmentation tackles distribution shifts across scanners or institutions, while Weakly-Supervised and Semi-Supervised Segmentation methods exploit partial labels or pseudo-labels. Supervised Pathology and Lesion Segmentation includes fully annotated approaches for comparison; Specialized Clinical Applications and Risk Prediction targets organ-specific or prognostic tasks (e.g., Lung Cancer Risk[10], Stroke Perfusion[19]); and Traditional and Hybrid Segmentation Methods encompasses classical techniques and their modern integrations.

A particularly active line of work explores density-based anomaly detection combined with self-supervised feature learning, where methods model the distribution of normal tissue and flag deviations as pathological. Pixel Density Pathology[0] exemplifies this approach by using pixel-level density estimation on self-supervised representations, closely aligning with Screener Pathology[2] and Screener 3D[8], which similarly detect anomalies through learned normality models. These methods contrast with reconstruction-based techniques like Diffusion Anomaly Segmentation[1] or Synthetic Healthy Subtraction[21], which generate healthy reference images and identify pathology via image differencing.

A key trade-off emerges between density-based approaches, which can be more sensitive to subtle distributional shifts, and reconstruction-based methods, which provide more interpretable visual evidence of abnormality. Open questions include how best to integrate anatomical priors, handle rare pathologies with extreme appearance variability, and bridge the gap between unsupervised detection and clinically actionable segmentation boundaries.

Claimed Contributions

Dense self-supervised features for density-based UVAS

The authors propose using dense self-supervised learning to pretrain a descriptor model that produces discriminative feature maps for CT images, eliminating the need for supervised pretraining in the density-based unsupervised visual anomaly segmentation framework. This enables a fully self-supervised UVAS approach suitable for domains with limited labeled data.

10 retrieved papers
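The density-based UVAS recipe this contribution builds on can be illustrated with a toy sketch: fit a density model over per-pixel embeddings extracted from healthy scans, then score test pixels by how unlikely their embeddings are. The sketch below uses a single Gaussian and a Mahalanobis-distance score as a stand-in; all names, shapes, and the choice of density model are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_gaussian(feats):
    """feats: (N, D) per-pixel embeddings pooled from healthy images."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_score(feats, mu, cov_inv):
    """Mahalanobis distance to the healthy distribution (proxy for -log p)."""
    d = feats - mu
    return np.einsum("nd,de,ne->n", d, cov_inv, d)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(5000, 8))    # "healthy" embeddings
mu, cov_inv = fit_gaussian(normal)
test_normal = rng.normal(0.0, 1.0, size=(100, 8))
test_anom = rng.normal(4.0, 1.0, size=(100, 8))  # shifted -> anomalous
assert anomaly_score(test_anom, mu, cov_inv).mean() > \
       anomaly_score(test_normal, mu, cov_inv).mean()
```

In the paper's setting the embeddings come from a dense self-supervised descriptor model rather than a supervised backbone, and the density model is learned rather than a fixed Gaussian; the sketch only shows the fit-then-score structure.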
Learned masking-invariant conditioning variables

The authors introduce a self-supervised condition model that learns pixel-wise contextual embeddings which are invariant to image masking, replacing hand-crafted positional encodings. These learned conditioning variables capture global characteristics like anatomical position while remaining agnostic to local pathology presence, thereby simplifying density estimation.

8 retrieved papers
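The role of conditioning variables here is that the density model estimates p(feature | condition) rather than p(feature), so context such as anatomical position is explained away instead of being flagged as anomalous. A toy stand-in, assuming a Gaussian whose mean is linear in the conditioning embedding (the real model and all names below are hypothetical):

```python
import numpy as np

def fit_conditional(feats, conds):
    """Least-squares fit of per-pixel features f on conditioning variables c."""
    c1 = np.hstack([conds, np.ones((len(conds), 1))])  # add a bias column
    W, *_ = np.linalg.lstsq(c1, feats, rcond=None)
    return W

def conditional_score(feats, conds, W):
    """Residual norm under the conditional model: higher = less likely p(f|c)."""
    c1 = np.hstack([conds, np.ones((len(conds), 1))])
    resid = feats - c1 @ W
    return np.sum(resid ** 2, axis=1)

rng = np.random.default_rng(1)
conds = rng.normal(size=(2000, 4))                 # contextual embeddings c
feats = conds @ rng.normal(size=(4, 8)) + 0.1 * rng.normal(size=(2000, 8))
W = fit_conditional(feats, conds)
scores = conditional_score(feats, conds, W)
anom = feats + 3.0                                 # break the f-c relation
assert conditional_score(anom, conds, W).mean() > scores.mean()
```

The paper's point is that these conditioning embeddings are themselves learned with a masking-invariance objective (same embedding with or without local content visible), so they encode global context while staying agnostic to local pathology; the linear model above only illustrates why good conditioning simplifies the density estimation step.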
Novel self-supervised pretraining method via distillation

The authors develop a distillation procedure that transfers knowledge from the pretrained modular Screener pipeline into a single UNet architecture, enabling end-to-end supervised fine-tuning. This establishes Screener as a competitive self-supervised pretraining approach for pathology segmentation tasks.

10 retrieved papers
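The distillation step amounts to training a single student network to reproduce the anomaly maps emitted by the frozen modular teacher pipeline, after which the student can be fine-tuned end to end. A toy stand-in with a linear "student" and synthetic teacher outputs (everything below is illustrative, not the authors' UNet training code):

```python
import numpy as np

def distill_step(student_W, x, teacher_map, lr=0.05):
    """One gradient step on the MSE between student and teacher outputs."""
    pred = x @ student_W                      # toy linear "student"
    grad = 2.0 * x.T @ (pred - teacher_map) / len(x)
    return student_W - lr * grad

rng = np.random.default_rng(2)
x = rng.normal(size=(256, 16))                # input features
teacher_map = x @ rng.normal(size=(16, 1))    # frozen teacher's outputs
W = np.zeros((16, 1))
losses = []
for _ in range(200):
    W = distill_step(W, x, teacher_map)
    losses.append(float(np.mean((x @ W - teacher_map) ** 2)))
assert losses[-1] < losses[0]  # student converges toward the teacher
```

The design motivation stated in the contribution is practical: the modular pipeline (descriptor model, condition model, density model) is awkward to fine-tune jointly, whereas the distilled single network supports standard supervised fine-tuning, which is how Screener is evaluated as a pretraining method.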

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Dense self-supervised features for density-based UVAS


Contribution

Learned masking-invariant conditioning variables


Contribution

Novel self-supervised pretraining method via distillation

