ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: whole slide image; multiple instance learning
Abstract:

Attention-based multiple instance learning (MIL) has emerged as a powerful framework for whole slide image (WSI) diagnosis, leveraging attention to aggregate instance-level features into bag-level predictions. Despite this success, we find that such methods exhibit a new failure mode: unstable attention dynamics. Across four representative attention-based MIL methods and two public WSI datasets, we observe that attention distributions oscillate across epochs rather than converging to a consistent pattern, degrading performance. This instability adds to two previously reported challenges: overfitting and over-concentrated attention distributions. To simultaneously overcome these three limitations, we introduce attention-stabilized multiple instance learning (ASMIL), a novel unified framework. ASMIL uses an anchor model to stabilize attention, replaces softmax with a normalized sigmoid function in the anchor to prevent over-concentration, and applies token random dropping to mitigate overfitting. Extensive experiments demonstrate that ASMIL achieves up to a 6.49% F1 score improvement over state-of-the-art methods. Moreover, integrating the anchor model and normalized sigmoid into existing attention-based MIL methods consistently boosts their performance, with F1 score gains up to 10.73%. All code and data are publicly available at https://anonymous.4open.science/r/ASMIL-5018/.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes the paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ASMIL, a framework addressing three challenges in attention-based MIL for WSI diagnosis: unstable attention dynamics, overfitting, and over-concentrated attention. It resides in the 'Attention Refinement and Localization Improvement' leaf, which contains four papers including the original work. This leaf sits within the broader 'Attention Mechanism Design and Enhancement' branch, one of ten major research directions in a taxonomy spanning fifty papers. The leaf represents a moderately active research area focused specifically on correcting and refining attention mechanisms, distinct from general attention architectures or hierarchical modeling approaches.

The taxonomy reveals neighboring research directions that share overlapping concerns but pursue different strategies. Adjacent leaves include 'Attention Regularization and Entropy-Based Methods' (one paper using entropy maximization), 'Channel and Multi-Dimensional Attention' (two papers on cross-channel dependencies), and 'Top-K and Selective Attention Mechanisms' (two papers on instance selection). The sibling papers in the same leaf (Focus your attention, Attention-Challenging MIL, and AEM) all target attention quality improvement, but through distinct mechanisms such as spatial constraints or error mitigation. ASMIL's anchor-based stabilization approach represents a different technical path within this shared goal of refining attention localization and preventing degradation.

Among the twenty-four candidates examined via semantic search, the contribution-level analysis reveals mixed novelty signals. For the identification of unstable attention dynamics, five candidates were examined with zero refutations, suggesting this diagnostic observation may be relatively fresh. For the anchor model mechanism, ten candidates were examined and two refutable overlaps were found, indicating that prior work on attention stabilization exists within the limited search scope. For the normalized sigmoid function, nine candidates were examined with one refutation, pointing to some precedent for addressing over-concentration. These statistics reflect a targeted literature search, not exhaustive coverage, and suggest that the technical components have varying degrees of prior exploration.

Based on the limited search of twenty-four semantically similar papers, ASMIL appears to combine known attention refinement strategies in a novel configuration targeting a specific failure mode. The unstable dynamics observation seems less explored, while the stabilization and normalization techniques show partial overlap with existing work. The taxonomy context indicates this sits in an active but not overcrowded research direction, with room for incremental contributions that integrate multiple refinement strategies. A broader literature search beyond top-K semantic matches would be needed to assess whether the specific combination and empirical validation represent a substantive advance.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 24
Refutable papers: 3

Research Landscape Overview

Core task: whole slide image diagnosis using attention-based multiple instance learning. The field addresses the challenge of classifying gigapixel pathology images by treating each slide as a bag of smaller patches (instances) and learning which patches are diagnostically relevant.

The taxonomy reveals a rich landscape organized around ten major branches. Attention Mechanism Design and Enhancement focuses on refining how models weight informative patches, including localization improvements and novel attention formulations. Transformer-Based MIL Architectures such as TransMIL[7] and HiViT[3] leverage self-attention for richer contextual modeling. Instance Selection and Hard Example Mining targets the identification of critical or challenging patches, while Feature Representation and Aggregation explores how to combine patch-level embeddings into slide-level predictions. Training Strategies and Learning Paradigms encompass diverse supervision schemes, from pseudo-labeling approaches like PAMIL[11] to iterative refinement methods. Hierarchical and Multi-Scale MIL methods capture tissue structure at multiple resolutions, and Domain-Specific Extensions tailor architectures to particular cancer types or clinical tasks. Generalization and Robustness Enhancement addresses overfitting and domain shift, Interactive and Adaptive frameworks incorporate feedback or dynamic selection, and Comparative Studies provide empirical benchmarks across methods.

Several active lines of work highlight ongoing trade-offs between model complexity, interpretability, and generalization. Attention refinement methods like Focus your attention[4] and Attention-Challenging Multiple Instance Learning[6] aim to sharpen localization and reduce noise in attention maps, a concern shared by AEM[17], which emphasizes error mitigation. ASMIL[0] sits within this Attention Refinement and Localization Improvement cluster, addressing similar goals of improving attention quality and diagnostic precision.
Compared to neighbors like Focus your attention[4], which may emphasize spatial constraints, or AEM[17], which targets error-aware mechanisms, ASMIL[0] offers its own strategy for refining attention to better isolate relevant tissue regions. Meanwhile, transformer-based approaches and hierarchical methods pursue complementary directions, capturing long-range dependencies or multi-scale context, and illustrate the field's exploration of both local precision and global structure. Open questions remain around balancing attention sharpness with robustness, and around integrating these refinements into clinically deployable systems.

Claimed Contributions

Identification and analysis of unstable attention dynamics in attention-based MIL

The authors identify a previously overlooked failure mode where attention distributions in attention-based multiple instance learning oscillate across training epochs rather than converging to consistent patterns. They quantify this instability using Jensen-Shannon divergence and demonstrate its negative impact on performance and interpretability.
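The instability measurement described above can be sketched in a few lines. Nothing below is taken from the ASMIL code; it simply illustrates how Jensen-Shannon divergence flags attention mass migrating between epochs, using made-up attention vectors:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two discrete distributions;
    # symmetric, and bounded by log(2) under the natural logarithm.
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Attention over the same bag at two consecutive epochs (toy values):
attn_epoch_t = np.array([0.70, 0.20, 0.05, 0.05])
attn_epoch_t1 = np.array([0.05, 0.05, 0.20, 0.70])  # mass has migrated

js_divergence(attn_epoch_t, attn_epoch_t1)  # markedly positive: unstable
js_divergence(attn_epoch_t, attn_epoch_t)   # essentially zero: stable
```

Tracked per bag across consecutive epochs, this value traces exactly the kind of oscillation the authors describe: it stays high while attention keeps jumping between patches and decays toward zero once attention converges.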

Retrieved papers: 5
Anchor model for stabilizing attention distributions

The authors propose an anchor model that mirrors the attention block of the online model but is updated via exponential moving average rather than backpropagation. The online model is encouraged to align with the anchor's attention distribution through KL divergence minimization, providing stable training dynamics.
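The mechanism described above maps onto a short sketch. Everything below (the name `AnchorAttention`, the linear attention scoring, the KL direction, and the momentum value) is a hypothetical illustration of the EMA-anchor idea under simplified assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q) for two attention distributions over one bag.
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

class AnchorAttention:
    """EMA 'anchor' copy of the online attention parameters (sketch)."""

    def __init__(self, w_online, momentum=0.99):
        self.w = w_online.copy()  # anchor starts as a copy of the online weights
        self.momentum = momentum

    def ema_update(self, w_online):
        # w_anchor <- m * w_anchor + (1 - m) * w_online; no gradients flow here.
        self.w = self.momentum * self.w + (1.0 - self.momentum) * w_online

    def alignment_loss(self, feats, w_online):
        # Penalize the online attention for drifting from the anchor's.
        a_anchor = softmax(feats @ self.w)
        a_online = softmax(feats @ w_online)
        return kl(a_anchor, a_online)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))       # one bag: 6 instances, 4-dim features
w = rng.normal(size=4)                # online attention weights
anchor = AnchorAttention(w)
w_new = w + 0.5 * rng.normal(size=4)  # pretend one optimizer step happened
loss = anchor.alignment_loss(feats, w_new)  # added to the training objective
anchor.ema_update(w_new)              # anchor drifts slowly toward the online model
```

Because the anchor moves only a small fraction toward the online weights each step, its attention distribution changes slowly, so the KL term pulls the online model back toward a consistent pattern rather than letting it oscillate.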

Retrieved papers: 10
Status: can refute
Normalized sigmoid function to prevent attention over-concentration

The authors introduce a normalized sigmoid function as a replacement for softmax in the anchor model to prevent over-concentrated attention distributions. They provide theoretical analysis showing that NSF achieves selective flattening of attention among informative tokens while suppressing weak ones, which cannot be achieved by softmax with a single temperature parameter.
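A minimal sketch of the idea, assuming the normalized sigmoid is simply an element-wise sigmoid renormalized to sum to one (the paper's exact parameterization may differ). The toy scores show the selective-flattening effect relative to softmax: informative tokens saturate the sigmoid and end up with near-equal weight, while weak tokens stay suppressed:

```python
import numpy as np

def normalized_sigmoid(scores, tau=1.0, eps=1e-12):
    # Element-wise sigmoid, then renormalize so the weights sum to 1.
    s = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=np.float64) / tau))
    return s / (s.sum() + eps)

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Three informative tokens with similar scores, two clearly weak tokens:
scores = np.array([4.0, 3.5, 3.8, -5.0, -6.0])
softmax(scores)             # mass skews toward the single top score
normalized_sigmoid(scores)  # near-uniform over informative tokens, weak ones ~0
```

Softmax with one temperature cannot reproduce this: lowering the temperature sharpens everything (over-concentration), while raising it flattens the weak tokens along with the informative ones. The saturating sigmoid flattens only the tokens whose scores are already large.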

Retrieved papers: 9
Status: can refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Identification and analysis of unstable attention dynamics in attention-based MIL

The authors identify a previously overlooked failure mode where attention distributions in attention-based multiple instance learning oscillate across training epochs rather than converging to consistent patterns. They quantify this instability using Jensen-Shannon divergence and demonstrate its negative impact on performance and interpretability.

Contribution

Anchor model for stabilizing attention distributions

The authors propose an anchor model that mirrors the attention block of the online model but is updated via exponential moving average rather than backpropagation. The online model is encouraged to align with the anchor's attention distribution through KL divergence minimization, providing stable training dynamics.

Contribution

Normalized sigmoid function to prevent attention over-concentration

The authors introduce a normalized sigmoid function as a replacement for softmax in the anchor model to prevent over-concentrated attention distributions. They provide theoretical analysis showing that NSF achieves selective flattening of attention among informative tokens while suppressing weak ones, which cannot be achieved by softmax with a single temperature parameter.