Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Open-Vocabulary Object Detection, Model Editing, Continual Learning, Knowledge Injection, Few-Shot Learning, Catastrophic Forgetting
Abstract:

Recent advances in Open-Vocabulary Object Detection (OVOD) have shown strong performance on standard benchmarks, but performance drops sharply under out-of-distribution (OOD) shifts. Continual learning offers a potential remedy by sequentially integrating new tasks, yet existing methods often struggle to balance retaining pre-trained capabilities with adapting to new tasks, and usually require retraining under specific task orders. To address these limitations, we observe that model editing naturally lends itself to this setting, as it enables efficient knowledge injection while retaining prior capabilities. Building on this insight, we introduce Automatically Balanced Model Editing (ABME), which injects new task knowledge into powerful OVOD models while preserving their original abilities. ABME first stores compact key–value representations whose storage cost is independent of task volume. It then leverages the stored KV matrices to automatically balance new and old knowledge across varying learning scenarios, supporting order-agnostic task insertion or removal without additional retraining. Experiments show that ABME consistently achieves a better trade-off between maintaining pre-trained performance and adapting to diverse OOD tasks than existing continual learning approaches for open-vocabulary object detection, and generalizes seamlessly across different models and task scales.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces model editing techniques to open-vocabulary object detection (OVOD) for continual learning under domain shifts. It occupies the 'Knowledge Editing with Key-Value Storage' leaf within the 'Modular and Parameter-Efficient Adaptation Methods' branch. Notably, this leaf contains only the original paper itself—no sibling papers appear in the taxonomy. This positioning suggests the work explores a relatively sparse research direction, combining model editing paradigms with OVOD continual learning in a way that distinguishes it from neighboring prompt-based or modular expert approaches.

The taxonomy reveals three sibling leaves under the same parent branch: 'Modular Expert Systems with Version Control' (1 paper), 'Textual and Prompt-Based Adaptation' (3 papers), and the original paper's leaf. Neighboring branches include 'Dual Incremental Learning Frameworks' (2 papers) and 'Open-World Continual Detection Systems' (2 papers). The field structure shows concentrated activity in prompt-based methods, while knowledge editing with key-value storage remains less populated. The scope note explicitly excludes methods without key-value mechanisms, clarifying that the paper's approach differs from prompt tuning or module libraries by storing compact representations for automatic knowledge balancing.

Among 21 candidates examined across three contributions, no refutable prior work was identified. Contribution A ('Introducing model editing to OVOD') examined 2 candidates with 0 refutations. Contribution B ('Auto-balanced editing strategy') examined 9 candidates, also yielding 0 refutations. Contribution C ('ABME framework') examined 10 candidates with the same outcome. This limited search scope suggests that within the top-21 semantically similar papers, none provide clear overlapping prior work on model editing for OVOD continual learning. However, the analysis does not claim exhaustive coverage of the broader literature.

Given the sparse taxonomy leaf and absence of refutable candidates among 21 examined papers, the work appears to occupy a relatively unexplored niche at the intersection of model editing and open-vocabulary detection. The limited search scope means this assessment reflects top-K semantic matches rather than comprehensive field coverage. Future reviewers may wish to examine whether related model editing techniques in other vision domains (e.g., image classification) could inform novelty judgments, as the current analysis focuses primarily on OVOD-specific continual learning literature.

Taxonomy

Core-task Taxonomy Papers: 10
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: continual learning for open-vocabulary object detection under domain shifts. The field addresses how detection models can incrementally acquire new visual concepts and adapt to changing environments without catastrophic forgetting.

The taxonomy organizes research into four main branches. Modular and Parameter-Efficient Adaptation Methods focus on lightweight mechanisms such as prompt tuning, adapter modules, and knowledge editing with key-value storage to preserve prior knowledge while learning new domains. Dual Incremental Learning Frameworks tackle the simultaneous expansion of both class vocabularies and domain coverage, often employing memory replay or distillation strategies. Open-World Continual Detection Systems emphasize handling novel object categories in unconstrained settings, bridging closed-set incremental detection with open-vocabulary recognition. Application-Specific Continual Detection explores domain-targeted scenarios like robotics or autonomous driving, where task structure and data distributions follow particular patterns.

Recent work reveals tension between parameter efficiency and adaptation flexibility. Some studies pursue modular designs with frozen backbones and learnable side networks, as seen in MR-GDINO[1] and DitHub[2], while others like Textual Inversion[3] and Coleclip[4] edit text embeddings or visual prompts to encode domain-specific knowledge compactly. Retain and Adapt[0] sits within the knowledge editing cluster, employing key-value storage to selectively update representations without full retraining, closely aligning with approaches like D-Know[6] that maintain external memory structures. Compared to Scene Task Adaptive[5], which emphasizes task-specific tuning, Retain and Adapt[0] prioritizes retaining cross-domain generalization through controlled editing.
This landscape highlights an ongoing exploration of how to balance continual adaptation with open-vocabulary robustness, particularly as models encounter both novel categories and shifting visual distributions.

Claimed Contributions

Introducing model editing to open-vocabulary object detection

The authors introduce model editing techniques, previously used in large language models, to the open-vocabulary object detection domain. They propose a method to construct key-value knowledge pairs from FFN layers to enable efficient adaptation to new concepts while preserving original model capabilities.
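The FFN-as-key-value view this contribution borrows from LLM model editing can be sketched as follows. This is a hedged illustration, not the authors' implementation: `collect_kv_pairs`, the shapes, and the toy data are all assumptions; the only idea taken from the text is that keys and values are read off an FFN layer.

```python
import numpy as np

# Hedged sketch: under the linear-associative-memory view used in LLM model
# editing, an FFN's output projection W_out maps post-activation "keys" to
# stored "values". Names and shapes here are illustrative assumptions.

def collect_kv_pairs(W_out, hidden_states):
    """Keys are post-activation FFN hidden states; the value paired with
    each key k is simply W_out @ k."""
    keys = np.asarray(hidden_states)      # (n, d_ffn)
    values = keys @ W_out.T               # (n, d_model)
    return keys, values

rng = np.random.default_rng(0)
W_out = rng.normal(size=(16, 32))         # toy output projection: d_model=16, d_ffn=32
H = rng.normal(size=(4, 32))              # toy hidden states from new-task samples
K, V = collect_kv_pairs(W_out, H)
print(K.shape, V.shape)                   # (4, 32) (4, 16)
```

Editing then amounts to changing which value a given key retrieves, which is why the pairs can stand in for new-task knowledge without touching the rest of the network.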

2 retrieved papers
Auto-balanced model editing strategy

The authors develop an automatic balancing mechanism that eliminates the need for manual hyperparameter tuning by using the key-value matrices themselves to adjust the trade-off between retaining pre-trained knowledge and adapting to new tasks. This strategy works across different models and task volumes without requiring task-specific parameter search.
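One way such a mechanism could work is sketched below: the retain/adapt weight is derived from the stored KV matrices themselves (here, a Frobenius-norm ratio, which is an assumption, the paper's actual rule may differ) and the edit is solved in closed form as a weighted least-squares problem.

```python
import numpy as np

def auto_balanced_edit(K_old, V_old, K_new, V_new):
    """Closed-form weighted least-squares edit. Keys/values are stored as
    columns: K_* has shape (d_ffn, n), V_* has shape (d_model, n).
    The balancing weight lam comes from the matrices themselves rather
    than manual tuning; this particular rule is illustrative."""
    lam = np.linalg.norm(K_old) / (np.linalg.norm(K_new) + 1e-8)
    # Normal equations of  min_W ||W K_old - V_old||^2 + lam ||W K_new - V_new||^2
    A = K_old @ K_old.T + lam * (K_new @ K_new.T)   # (d_ffn, d_ffn)
    B = V_old @ K_old.T + lam * (V_new @ K_new.T)   # (d_model, d_ffn)
    return B @ np.linalg.inv(A)
```

Because old and new pairs enter a single quadratic objective, the returned weight fits both sets jointly; when the pairs are consistent with one underlying weight, that weight is recovered exactly regardless of lam.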

9 retrieved papers
Automatically Balanced Model Editing (ABME) framework

The authors present ABME, a complete framework that stores compact key-value representations with storage cost independent of task volume, supports order-agnostic task insertion or removal without retraining, and achieves effective knowledge injection while maintaining base model performance on open-vocabulary object detection tasks.
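Order-agnostic insertion and removal falls out naturally if each task's compact (K, V) pair is stored separately and the edited weight is recomputed from whatever is currently stored. A minimal sketch, with a generic ridge-style solver standing in for the paper's auto-balanced update (class name, `lam`, and shapes are all hypothetical):

```python
import numpy as np

class KVTaskStore:
    """Per-task key-value storage with order-agnostic insert/remove.
    W0 is the frozen pre-trained weight; each task contributes a compact
    (K, V) pair with K of shape (d_ffn, n_t) and V of shape (d_model, n_t)."""

    def __init__(self, W0, lam=1.0):
        self.W0 = np.asarray(W0)
        self.lam = lam          # ridge strength pulling the edit back toward W0
        self.tasks = {}

    def insert(self, name, K, V):
        self.tasks[name] = (np.asarray(K), np.asarray(V))

    def remove(self, name):
        self.tasks.pop(name, None)

    def edited_weight(self):
        # Closed form of  min_W ||W K - V||^2 + lam ||W - W0||^2 ; this
        # generic solver is a placeholder for the paper's actual update.
        if not self.tasks:
            return self.W0
        K = np.concatenate([k for k, _ in self.tasks.values()], axis=1)
        V = np.concatenate([v for _, v in self.tasks.values()], axis=1)
        A = K @ K.T + self.lam * np.eye(K.shape[0])
        B = V @ K.T + self.lam * self.W0
        return B @ np.linalg.inv(A)
```

Because the edit is recomputed from the current task set, inserting task A, then B, then removing A gives the same weight as inserting B alone, and storage grows only with the few columns kept per task, not with the training data.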

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts | Novelty Validation