MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Anomaly detection, Zero-shot anomaly detection, Memory retrieval, CLIP
Abstract:

Zero-shot anomaly detection (ZSAD) often leverages pretrained vision or vision-language models, but many existing methods rely on prompt learning or complex modeling to fit the data distribution, resulting in high training or inference cost and limited cross-domain stability. To address these limitations, we propose the Memory-Retrieval Anomaly Detection (MRAD) method, a unified framework that replaces parametric fitting with direct memory retrieval. The train-free base model, MRAD-TF, freezes the CLIP image encoder and constructs a two-level memory bank (image-level and pixel-level) from auxiliary data, in which feature-label pairs are explicitly stored as keys and values. During inference, anomaly scores are obtained directly by similarity retrieval over the memory bank. Building on MRAD-TF, we further propose two lightweight variants: (i) MRAD-FT fine-tunes the retrieval metric with two linear layers to enhance the discriminability between normal and anomalous samples; (ii) MRAD-CLIP injects normal and anomalous region priors from MRAD-FT as dynamic biases into CLIP's learnable text prompts, strengthening generalization to unseen categories. Across 16 industrial and medical datasets, the MRAD framework consistently demonstrates superior performance in anomaly classification and segmentation under both train-free and training-based settings. Our work shows that fully leveraging the empirical distribution of raw data, rather than relying only on model fitting, can achieve stronger anomaly detection performance. Code will be released.
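The abstract's core mechanism, scoring a test sample by similarity retrieval over stored feature-label pairs, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of k, the cosine similarity measure, and the similarity-weighted label average are all assumptions made for concreteness.

```python
import numpy as np

class MemoryBank:
    """Hypothetical sketch: stores (feature, label) pairs as keys and values.
    Label 1.0 marks anomalous exemplars, 0.0 marks normal ones."""

    def __init__(self, features: np.ndarray, labels: np.ndarray):
        # L2-normalize keys so a dot product equals cosine similarity.
        self.keys = features / np.linalg.norm(features, axis=1, keepdims=True)
        self.values = labels.astype(float)

    def score(self, query: np.ndarray, k: int = 5) -> float:
        """Anomaly score: similarity-weighted mean label of the k nearest keys."""
        q = query / np.linalg.norm(query)
        sims = self.keys @ q                 # cosine similarity to every key
        top = np.argsort(sims)[-k:]          # indices of the k most similar keys
        weights = np.exp(sims[top])          # soft weighting (assumed, not from paper)
        return float((weights * self.values[top]).sum() / weights.sum())
```

A query landing near stored anomalous features retrieves mostly label-1 neighbours and scores close to 1; a query near normal features scores close to 0, with no parametric model fit in between.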

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes MRAD, a memory-retrieval framework for zero-shot anomaly detection that replaces parametric fitting with direct similarity-based retrieval from two-level memory banks. It resides in the 'Pseudo-Anomaly and Correlation-Weighted Approaches' leaf under Vision-Language Model-Based Industrial Anomaly Detection, alongside two sibling papers. This leaf represents a focused research direction within the broader taxonomy of 25 papers across multiple modalities, suggesting a moderately active but not overcrowded area where CLIP-based industrial defect detection methods explore different strategies for zero-shot generalization.

The taxonomy reveals that MRAD's leaf sits within a larger branch of Vision-Language Model-Based Industrial Anomaly Detection, which also includes Multi-Scale Memory Comparison Frameworks and Additive Manufacturing Anomaly Detection. Neighboring branches address Video Anomaly Detection with Temporal Memory and Log Anomaly Detection with Retrieval Augmentation, indicating that memory-driven retrieval is a cross-cutting theme across modalities. The scope note for MRAD's leaf emphasizes pseudo-anomaly generation and correlation weighting, while explicitly excluding multi-scale memory comparison methods that appear in a sibling leaf, suggesting MRAD's single-scale retrieval approach occupies a distinct methodological niche.

Among the three contributions analyzed, the core MRAD framework examined ten candidates and found one refutable prior work, indicating some overlap in the memory-retrieval paradigm within the limited search scope. The MRAD-FT variant examined four candidates with no clear refutations, suggesting its lightweight fine-tuning approach may be more novel. The MRAD-CLIP variant examined ten candidates and found two refutable instances, implying that region-prior-guided dynamic prompts have more substantial prior exploration. These statistics reflect a search of 24 total candidates, not an exhaustive literature review, so the presence of refutable work indicates overlap within this specific sample rather than definitive lack of novelty.

Based on the limited search scope of 24 semantically similar candidates, the work appears to offer incremental refinements to memory-driven retrieval in zero-shot anomaly detection, with the MRAD-FT variant showing the least prior overlap. The taxonomy structure suggests the paper operates in a moderately explored area where CLIP-based industrial methods are actively being developed, though the specific combination of train-free retrieval and lightweight variants may differentiate it from existing pseudo-anomaly and correlation-weighted approaches.

Taxonomy

Core-task Taxonomy Papers: 25
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 3

Research Landscape Overview

Core task: zero-shot anomaly detection with memory-driven retrieval. The field organizes around several major branches that reflect different data modalities and application contexts. Vision-Language Model-Based Industrial Anomaly Detection leverages large-scale pretrained models (e.g., CLIP) to identify defects in manufacturing settings without task-specific training, often employing pseudo-anomaly synthesis or correlation-weighted strategies to bridge the gap between normal reference images and unseen anomalies. Video Anomaly Detection with Temporal Memory focuses on sequential data, using memory banks to capture normal motion and appearance patterns over time. Log Anomaly Detection with Retrieval Augmentation applies retrieval-augmented generation techniques to system logs, enabling zero-shot identification of operational faults. Unsupervised Memory-Based Anomaly Detection encompasses broader memory architectures that store prototypical normal features, while Multimodal Language Model-Driven Anomaly Detection integrates textual reasoning with visual or sensor inputs. Cross-Domain and Specialized Anomaly Detection addresses niche scenarios such as spectrum analysis or domain transfer, where memory retrieval helps generalize across diverse settings.

Within Vision-Language Model-Based Industrial Anomaly Detection, a particularly active line of work explores pseudo-anomaly generation and correlation weighting to improve zero-shot performance. PA-CLIP[1] synthesizes artificial defects to guide the model's attention, while Correlation-Weighted Model[2] refines feature alignment between text prompts and visual patches. MRAD[0] sits squarely in this cluster, emphasizing memory-driven retrieval to dynamically select relevant normal exemplars and contrast them against test samples. Compared to PA-CLIP[1], which relies heavily on synthetic anomaly augmentation, MRAD[0] prioritizes retrieval mechanisms that adapt to varying defect types without explicit anomaly simulation. This approach contrasts with Correlation-Weighted Model[2], which focuses on optimizing prompt-image correlation rather than maintaining a structured memory bank.

Across branches, a recurring theme is the trade-off between leveraging large pretrained models for generalization and designing specialized memory structures to capture domain-specific normality, with open questions around scalability and interpretability of retrieved references.

Claimed Contributions

MRAD framework with memory-driven retrieval paradigm

The authors introduce MRAD, a framework that constructs a two-level memory bank (image-level and pixel-level) from auxiliary data and performs anomaly detection through direct similarity retrieval rather than parametric model fitting. This approach stores feature-label pairs explicitly and obtains anomaly scores via retrieval during inference.

10 retrieved papers
Can Refute
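The pixel-level half of the two-level memory bank described in this contribution can be sketched as per-patch retrieval that yields a segmentation-style anomaly map. This is an illustrative assumption: the grid size, neighbourhood size k, and cosine similarity are placeholders, not details taken from the paper.

```python
import numpy as np

def anomaly_map(patch_feats, mem_keys, mem_labels, grid, k=3):
    """Hypothetical pixel-level retrieval sketch.
    patch_feats: (H*W, d) test-image patch features.
    mem_keys:    (N, d) stored patch features; mem_labels: (N,) 0/1 labels.
    Returns an (H, W) map of retrieval-based anomaly scores."""
    q = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    m = mem_keys / np.linalg.norm(mem_keys, axis=1, keepdims=True)
    sims = q @ m.T                              # (H*W, N) cosine similarities
    top = np.argsort(sims, axis=1)[:, -k:]      # k nearest memory entries per patch
    scores = mem_labels[top].mean(axis=1)       # mean neighbour label per patch
    return scores.reshape(grid)
```

Because each patch is scored independently against the stored pairs, localization falls out of the same retrieval rule used for image-level classification.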
MRAD-FT variant with lightweight fine-tuning

Building on the train-free base model, the authors propose MRAD-FT which adds only two linear layers to calibrate the retrieval metric. This lightweight fine-tuning improves discriminative ability for both classification and segmentation tasks while maintaining low training cost.

4 retrieved papers
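The "two linear layers that calibrate the retrieval metric" can be pictured as learned projections applied before similarity is computed. The placement below (one layer for queries, one for memory keys) and the untrained placeholder weights are assumptions for illustration; the paper's actual architecture and training loss are not specified in this report.

```python
import numpy as np

class CalibratedRetrieval:
    """Hypothetical MRAD-FT-style sketch: two linear maps reshape the
    embedding space, then retrieval uses cosine similarity as before."""

    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Two learnable linear layers; initialized near identity here.
        # In a real system these weights would be fine-tuned on auxiliary data.
        self.W_query = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))
        self.W_key = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))

    def similarity(self, query: np.ndarray, keys: np.ndarray) -> np.ndarray:
        """Cosine similarity computed in the calibrated space."""
        q = self.W_query @ query
        k = keys @ self.W_key.T
        q = q / np.linalg.norm(q)
        k = k / np.linalg.norm(k, axis=1, keepdims=True)
        return k @ q
```

The appeal of this design is that only the two projection matrices are trained; the frozen encoder and the memory bank itself are untouched, which keeps training cost low.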
MRAD-CLIP variant with region-prior-guided dynamic prompts

The authors develop MRAD-CLIP which enhances traditional prompt learning by injecting normal and anomalous region priors from MRAD-FT into learnable CLIP text prompts as dynamic biases. This approach improves cross-modal alignment, anomaly localization, and generalization to unseen categories compared to conventional dynamic prompt methods.

10 retrieved papers
Can Refute
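The dynamic-bias idea in this contribution, injecting region priors into learnable text prompts, can be sketched as a per-image additive offset on shared context tokens. All names, shapes, and the scalar mixing weight below are illustrative assumptions; the report does not specify where in CLIP's text pipeline the bias is applied.

```python
import numpy as np

def build_dynamic_prompts(
    learnable_tokens: np.ndarray,  # (n_tokens, dim) shared learnable context
    normal_prior: np.ndarray,      # (dim,) pooled feature of regions scored normal
    anomalous_prior: np.ndarray,   # (dim,) pooled feature of regions scored anomalous
    alpha: float = 0.1,            # mixing weight (assumed hyperparameter)
):
    """Hypothetical sketch: add each region prior as a dynamic bias to the
    shared prompt tokens, yielding image-conditioned normal/anomaly prompts."""
    normal_prompt = learnable_tokens + alpha * normal_prior      # broadcast over tokens
    anomaly_prompt = learnable_tokens + alpha * anomalous_prior
    return normal_prompt, anomaly_prompt
```

Because the priors vary per test image while the context tokens are shared, the text prompts adapt to each image without retraining, which is the claimed route to better generalization on unseen categories.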

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MRAD framework with memory-driven retrieval paradigm

The authors introduce MRAD, a framework that constructs a two-level memory bank (image-level and pixel-level) from auxiliary data and performs anomaly detection through direct similarity retrieval rather than parametric model fitting. This approach stores feature-label pairs explicitly and obtains anomaly scores via retrieval during inference.

Contribution

MRAD-FT variant with lightweight fine-tuning

Building on the train-free base model, the authors propose MRAD-FT which adds only two linear layers to calibrate the retrieval metric. This lightweight fine-tuning improves discriminative ability for both classification and segmentation tasks while maintaining low training cost.

Contribution

MRAD-CLIP variant with region-prior-guided dynamic prompts

The authors develop MRAD-CLIP which enhances traditional prompt learning by injecting normal and anomalous region priors from MRAD-FT into learnable CLIP text prompts as dynamic biases. This approach improves cross-modal alignment, anomaly localization, and generalization to unseen categories compared to conventional dynamic prompt methods.