Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Open-Vocabulary Object Detection, Model Editing, Continual Learning, Knowledge Injection, Few-Shot Learning, Catastrophic Forgetting
Abstract:

Recent advances in Open-Vocabulary Object Detection (OVOD) have shown strong performance on standard benchmarks, but performance drops sharply under out-of-distribution (OOD) shifts. Continual learning offers a potential remedy by sequentially integrating new tasks, yet existing methods often struggle to balance retaining pre-trained capabilities with adapting to new tasks, and usually require retraining under specific task orders. To address these limitations, we observe that model editing naturally lends itself to this setting, as it enables efficient knowledge injection while retaining prior capabilities. Building on this insight, we introduce Automatically Balanced Model Editing (ABME), which injects new task knowledge into powerful OVOD models while preserving their original abilities. ABME first stores compact key–value representations whose storage cost is independent of task volume. It then leverages the stored KV matrices to automatically balance new and old knowledge across varying learning scenarios, supporting order-agnostic task insertion or removal without additional retraining. Experiments show that ABME consistently achieves a better trade-off between maintaining pre-trained performance and adapting to diverse OOD tasks than existing continual learning approaches for open-vocabulary object detection, and generalizes seamlessly across different models and task scales.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces model editing techniques to open-vocabulary object detection (OVOD) for continual learning under domain shifts. It occupies the 'Knowledge Editing with Key-Value Storage' leaf within the 'Modular and Parameter-Efficient Adaptation Methods' branch. Notably, this leaf contains only the original paper itself—no sibling papers appear in the taxonomy. This positioning suggests the work explores a relatively sparse research direction, combining model editing paradigms with OVOD continual learning in a way that distinguishes it from neighboring prompt-based or modular expert approaches.

The taxonomy reveals three sibling leaves under the same parent branch: 'Modular Expert Systems with Version Control' (1 paper), 'Textual and Prompt-Based Adaptation' (3 papers), and the original paper's leaf. Neighboring branches include 'Dual Incremental Learning Frameworks' (2 papers) and 'Open-World Continual Detection Systems' (2 papers). The field structure shows concentrated activity in prompt-based methods, while knowledge editing with key-value storage remains less populated. The scope note explicitly excludes methods without key-value mechanisms, clarifying that the paper's approach differs from prompt tuning or module libraries by storing compact representations for automatic knowledge balancing.

Among 21 candidates examined across three contributions, no refutable prior work was identified. Contribution A ('Introducing model editing to OVOD') examined 2 candidates with 0 refutations. Contribution B ('Auto-balanced editing strategy') examined 9 candidates, also yielding 0 refutations. Contribution C ('ABME framework') examined 10 candidates with the same outcome. This limited search scope suggests that within the top-21 semantically similar papers, none provide clear overlapping prior work on model editing for OVOD continual learning. However, the analysis does not claim exhaustive coverage of the broader literature.

Given the sparse taxonomy leaf and absence of refutable candidates among 21 examined papers, the work appears to occupy a relatively unexplored niche at the intersection of model editing and open-vocabulary detection. The limited search scope means this assessment reflects top-K semantic matches rather than comprehensive field coverage. Future reviewers may wish to examine whether related model editing techniques in other vision domains (e.g., image classification) could inform novelty judgments, as the current analysis focuses primarily on OVOD-specific continual learning literature.

Taxonomy

Core-task Taxonomy Papers: 10
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: continual learning for open-vocabulary object detection under domain shifts. The field addresses how detection models can incrementally acquire new visual concepts and adapt to changing environments without catastrophic forgetting.

The taxonomy organizes research into four main branches. Modular and Parameter-Efficient Adaptation Methods focus on lightweight mechanisms such as prompt tuning, adapter modules, and knowledge editing with key-value storage to preserve prior knowledge while learning new domains. Dual Incremental Learning Frameworks tackle the simultaneous expansion of both class vocabularies and domain coverage, often employing memory replay or distillation strategies. Open-World Continual Detection Systems emphasize handling novel object categories in unconstrained settings, bridging closed-set incremental detection with open-vocabulary recognition. Application-Specific Continual Detection explores domain-targeted scenarios like robotics or autonomous driving, where task structure and data distributions follow particular patterns.

Recent work reveals tension between parameter efficiency and adaptation flexibility. Some studies pursue modular designs with frozen backbones and learnable side networks, as seen in MR-GDINO[1] and DitHub[2], while others like Textual Inversion[3] and Coleclip[4] edit text embeddings or visual prompts to encode domain-specific knowledge compactly. Retain and Adapt[0] sits within the knowledge editing cluster, employing key-value storage to selectively update representations without full retraining, closely aligning with approaches like D-Know[6] that maintain external memory structures. Compared to Scene Task Adaptive[5], which emphasizes task-specific tuning, Retain and Adapt[0] prioritizes retaining cross-domain generalization through controlled editing.
This landscape highlights an ongoing exploration of how to balance continual adaptation with open-vocabulary robustness, particularly as models encounter both novel categories and shifting visual distributions.

Claimed Contributions

Introducing model editing to open-vocabulary object detection

The authors introduce model editing techniques, previously used in large language models, to the open-vocabulary object detection domain. They propose a method to construct key-value knowledge pairs from FFN layers to enable efficient adaptation to new concepts while preserving original model capabilities.
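The FFN-as-key-value view this contribution borrows from LLM model editing can be sketched as follows. This is a hedged illustration, not the authors' implementation: `collect_kv_pairs`, the shapes, and the toy data are all assumptions; the only idea taken from the text is that keys and values are read off an FFN layer.

```python
import numpy as np

# Hedged sketch: under the linear-associative-memory view used in LLM model
# editing, an FFN's output projection W_out maps post-activation "keys" to
# stored "values". Names and shapes here are illustrative assumptions.

def collect_kv_pairs(W_out, hidden_states):
    """Keys are post-activation FFN hidden states; the value paired with
    each key k is simply W_out @ k."""
    keys = np.asarray(hidden_states)      # (n, d_ffn)
    values = keys @ W_out.T               # (n, d_model)
    return keys, values

rng = np.random.default_rng(0)
W_out = rng.normal(size=(16, 32))         # toy output projection: d_model=16, d_ffn=32
H = rng.normal(size=(4, 32))              # toy hidden states from new-task samples
K, V = collect_kv_pairs(W_out, H)
print(K.shape, V.shape)                   # (4, 32) (4, 16)
```

Editing then amounts to changing which value a given key retrieves, which is why the pairs can stand in for new-task knowledge without touching the rest of the network.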

2 retrieved papers
Auto-balanced model editing strategy

The authors develop an automatic balancing mechanism that eliminates the need for manual hyperparameter tuning by using the key-value matrices themselves to adjust the trade-off between retaining pre-trained knowledge and adapting to new tasks. This strategy works across different models and task volumes without requiring task-specific parameter search.
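One way such a mechanism could work is sketched below: the retain/adapt weight is derived from the stored KV matrices themselves (here, a Frobenius-norm ratio, which is an assumption, the paper's actual rule may differ) and the edit is solved in closed form as a weighted least-squares problem.

```python
import numpy as np

def auto_balanced_edit(K_old, V_old, K_new, V_new):
    """Closed-form weighted least-squares edit. Keys/values are stored as
    columns: K_* has shape (d_ffn, n), V_* has shape (d_model, n).
    The balancing weight lam comes from the matrices themselves rather
    than manual tuning; this particular rule is illustrative."""
    lam = np.linalg.norm(K_old) / (np.linalg.norm(K_new) + 1e-8)
    # Normal equations of  min_W ||W K_old - V_old||^2 + lam ||W K_new - V_new||^2
    A = K_old @ K_old.T + lam * (K_new @ K_new.T)   # (d_ffn, d_ffn)
    B = V_old @ K_old.T + lam * (V_new @ K_new.T)   # (d_model, d_ffn)
    return B @ np.linalg.inv(A)
```

Because old and new pairs enter a single quadratic objective, the returned weight fits both sets jointly; when the pairs are consistent with one underlying weight, that weight is recovered exactly regardless of lam.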

9 retrieved papers
Automatically Balanced Model Editing (ABME) framework

The authors present ABME, a complete framework that stores compact key-value representations with storage cost independent of task volume, supports order-agnostic task insertion or removal without retraining, and achieves effective knowledge injection while maintaining base model performance on open-vocabulary object detection tasks.
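Order-agnostic insertion and removal falls out naturally if each task's compact (K, V) pair is stored separately and the edited weight is recomputed from whatever is currently stored. A minimal sketch, with a generic ridge-style solver standing in for the paper's auto-balanced update (class name, `lam`, and shapes are all hypothetical):

```python
import numpy as np

class KVTaskStore:
    """Per-task key-value storage with order-agnostic insert/remove.
    W0 is the frozen pre-trained weight; each task contributes a compact
    (K, V) pair with K of shape (d_ffn, n_t) and V of shape (d_model, n_t)."""

    def __init__(self, W0, lam=1.0):
        self.W0 = np.asarray(W0)
        self.lam = lam          # ridge strength pulling the edit back toward W0
        self.tasks = {}

    def insert(self, name, K, V):
        self.tasks[name] = (np.asarray(K), np.asarray(V))

    def remove(self, name):
        self.tasks.pop(name, None)

    def edited_weight(self):
        # Closed form of  min_W ||W K - V||^2 + lam ||W - W0||^2 ; this
        # generic solver is a placeholder for the paper's actual update.
        if not self.tasks:
            return self.W0
        K = np.concatenate([k for k, _ in self.tasks.values()], axis=1)
        V = np.concatenate([v for _, v in self.tasks.values()], axis=1)
        A = K @ K.T + self.lam * np.eye(K.shape[0])
        B = V @ K.T + self.lam * self.W0
        return B @ np.linalg.inv(A)
```

Because the edit is recomputed from the current task set, inserting task A, then B, then removing A gives the same weight as inserting B alone, and storage grows only with the few columns kept per task, not with the training data.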

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts | Novelty Validation