Automatic Image-Level Morphological Trait Annotation for Organismal Images
Overview
Overall Novelty Assessment
The paper introduces a trait annotation pipeline combining sparse autoencoders on foundation-model features with vision-language prompting to generate morphological descriptions from insect images. It resides in the Foundation Model and Vision-Language Approaches leaf, which contains only two papers within the broader Deep Learning-Based Trait Extraction and Annotation branch. This leaf represents an emerging research direction, contrasting with the more populated Supervised Segmentation and Classification leaf (eight papers) that focuses on task-specific architectures. The sparse population suggests the work enters a relatively nascent area where foundation models are being adapted for biological trait discovery.
The taxonomy reveals that neighboring leaves pursue distinct strategies: Supervised Segmentation and Classification emphasizes domain-tailored networks trained on annotated datasets, while Interactive and Semi-Supervised Learning (three papers) enables non-expert annotation through corrective feedback. The paper's approach diverges by leveraging pretrained representations to bypass extensive manual labeling, aligning more closely with the vision-language paradigm than with classical supervised pipelines. Its sibling paper in the same leaf, CellFlow, targets cellular-scale phenotyping with flow-based representations, whereas this work addresses organism-level insect morphology, indicating complementary scopes within the foundation model category.
Among the twenty candidates examined across three contributions, none were flagged as clearly refuting the proposed methods. The trait annotation pipeline (ten candidates examined, zero refutable) and the BIOSCAN-TRAITS dataset (ten candidates examined, zero refutable) both appear to lack direct prior work within this limited search scope. The species-contrastive ranking method was not evaluated against any candidates. This absence of overlapping prior work, combined with the sparse leaf population, suggests the contributions occupy a relatively unexplored intersection of sparse autoencoders, vision-language models, and morphological trait extraction, though the search examined only top-twenty semantic matches rather than an exhaustive survey.
Given the limited search scope and the nascent state of the Foundation Model and Vision-Language Approaches leaf, the work appears to introduce novel technical components—particularly the use of sparse autoencoders for monosemantic neuron discovery in biological imaging—that have not been directly addressed in the examined candidates. However, the analysis reflects top-twenty semantic matches and does not cover the full breadth of foundation model or vision-language research outside this specific biological context. The dataset contribution also appears distinct within the examined scope, though broader ecological or entomological datasets may exist beyond the search perimeter.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose a three-step pipeline that uses sparse autoencoders trained on foundation-model features to identify monosemantic, spatially grounded neurons corresponding to morphological parts, then localizes these regions and prompts a multimodal language model to generate trait descriptions. This approach addresses the bottleneck of extracting morphological traits from biological images without requiring manual expert annotation.
The authors create a large-scale dataset containing 80,000 morphological trait annotations across 19,000 insect images by applying their pipeline to the BIOSCAN-5M corpus. This dataset provides structured, interpretable trait-level supervision at scale for training and evaluating biological foundation models.
The authors develop a ranking approach that identifies SAE units by comparing their activation strength within a focal species against closely related species (congeners). This method isolates taxonomically diagnostic features that correspond to the fine-scale morphological structures recorded by taxonomists as traits.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[30] CellFlow: Simulating Cellular Morphology Changes via Flow Matching
Contribution Analysis
Detailed comparisons for each claimed contribution
Trait annotation pipeline using sparse autoencoders and vision-language prompting
The authors propose a three-step pipeline that uses sparse autoencoders trained on foundation-model features to identify monosemantic, spatially grounded neurons corresponding to morphological parts, then localizes these regions and prompts a multimodal language model to generate trait descriptions. This approach addresses the bottleneck of extracting morphological traits from biological images without requiring manual expert annotation.
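The three-step pipeline above can be sketched in miniature. The snippet below is an illustrative sketch only, not the authors' implementation: the tensor sizes (4 images, a 4x4 patch grid, 32-dim features, 64 SAE units), the random untrained encoder, and the top-k sparsity scheme are all assumptions for illustration, and step 3 (prompting a multimodal language model) is an external call indicated only by a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for frozen foundation-model patch features:
# 4 images x 16 patches (a 4x4 grid) x 32-dim features (hypothetical sizes).
features = rng.normal(size=(4, 16, 32)).astype(np.float32)

def sae_encode(x, W_enc, b_enc, k=4):
    """Top-k sparse autoencoder encoder: keep the k strongest unit
    activations per patch and zero the rest (a common SAE sparsity scheme)."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)      # ReLU pre-activations
    idx = np.argsort(pre, axis=-1)[..., :-k]      # indices of all but the top k
    np.put_along_axis(pre, idx, 0.0, axis=-1)
    return pre

n_units = 64
W_enc = rng.normal(scale=0.1, size=(32, n_units)).astype(np.float32)
b_enc = np.zeros(n_units, dtype=np.float32)

# Step 1: sparse unit activations over all patches.
acts = sae_encode(features, W_enc, b_enc)         # shape (4, 16, 64)

# Step 2 (localization): for one unit, find the patch where it fires
# hardest in each image; on a 4x4 patch grid this gives a coarse region.
unit = int(acts.sum(axis=(0, 1)).argmax())        # globally most active unit
peak_patch = acts[..., unit].argmax(axis=1)       # (4,) peak patch per image
rows, cols = np.divmod(peak_patch, 4)

# Step 3 (description): a real pipeline would crop around each peak region
# and prompt a multimodal LM for a trait description; omitted here.
for i, (r, c) in enumerate(zip(rows, cols)):
    print(f"image {i}: unit {unit} peaks at patch ({r}, {c})")
```

The top-k step is what makes individual units candidates for monosemanticity: each patch is explained by only a few units, so a unit that repeatedly peaks on the same anatomical region can be read as part-selective.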
[61] Avltrack: Dynamic sparse learning for aerial vision-language tracking
[62] Interpreting CLIP with Hierarchical Sparse Autoencoders
[63] VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
[64] Sparse attention vectors: Generative multimodal model features are discriminative vision-language classifiers
[65] Patch-level phenotype identification via weakly supervised neuron selection in sparse autoencoders for CLIP-derived pathology embeddings
[66] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
[67] A novel multimodal framework for automatic recognition of individual cattle based on hybrid features using sparse stacked denoising autoencoder and group sparse …
[68] Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation
[69] SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
[70] debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias
BIOSCAN-TRAITS dataset
The authors create a large-scale dataset containing 80,000 morphological trait annotations across 19,000 insect images by applying their pipeline to the BIOSCAN-5M corpus. This dataset provides structured, interpretable trait-level supervision at scale for training and evaluating biological foundation models.
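To make the dataset contribution concrete, the sketch below shows what image-level trait supervision of this kind could look like. The record layout is hypothetical (the actual BIOSCAN-TRAITS schema is not specified here); the field names, example species, and trait strings are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical record layout for one image-level trait annotation.
@dataclass
class TraitAnnotation:
    image_id: str      # source image in the BIOSCAN-5M corpus
    taxon: str         # species or lowest assigned rank
    part: str          # localized morphological part (e.g. "antenna")
    description: str   # model-generated trait description

# Invented example records; one image can carry several annotations.
records = [
    TraitAnnotation("img_00001", "Drosophila sp.", "wing", "hyaline, unmarked"),
    TraitAnnotation("img_00001", "Drosophila sp.", "antenna", "aristate"),
    TraitAnnotation("img_00002", "Apis mellifera", "leg",
                    "pollen basket on hind tibia"),
]

# Trait-level supervision can then be grouped per image for training.
by_image = {}
for r in records:
    by_image.setdefault(r.image_id, []).append(r.part)
print(by_image)
```

Grouping annotations per image is the natural access pattern for training: each image contributes a set of (part, description) pairs rather than a single label.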
[51] Artificial intelligence correctly classifies developmental stages of monarch caterpillars enabling better conservation through the use of community science photographs
[52] Formalizing invertebrate morphological data: A descriptive model for cuticle-based skeleto-muscular systems, an ontology for insect anatomy, and their potential …
[53] A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level
[54] Utilizing CNNs for classification and uncertainty quantification for 15 families of European fly pollinators
[55] Classification and morphological analysis of vector mosquitoes using deep convolutional neural networks
[56] Zero-shot insect detection via weak language supervision
[57] Worldwide revision of synanthropic silverfish (Insecta: Zygentoma: Lepismatidae) combining morphological and molecular data
[58] Identification of species by combining molecular and morphological data using convolutional neural networks
[59] MAPHIS – Measuring arthropod phenotypes using hierarchical image segmentations
[60] STARdbi: A pipeline and database for insect monitoring based on automated image analysis
Species-contrastive ranking method for trait selection
The authors develop a ranking approach that identifies SAE units by comparing their activation strength within a focal species against closely related species (congeners). This method isolates taxonomically diagnostic features that correspond to the fine-scale morphological structures recorded by taxonomists as traits.
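A minimal sketch of such a contrastive ranking, under the assumption (hypothetical here) that each image has already been reduced to a vector of per-unit SAE activations: score each unit by its mean activation in the focal species minus its mean over congeners, then sort descending. The data sizes, species labels, and the planted "diagnostic" unit are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-image SAE unit activations (e.g. max over patches),
# with a species label per image; species 0 is the focal species and
# species 1-2 are its congeners.
n_units = 32
species = np.array([0] * 5 + [1] * 5 + [2] * 5)
acts = rng.random(size=(15, n_units)).astype(np.float32)
# Plant a diagnostic unit that fires mainly in the focal species.
acts[species == 0, 7] += 2.0

def contrastive_rank(acts, species, focal):
    """Rank SAE units by mean activation in the focal species minus the
    mean over all other (congener) images; high scores flag units that
    are candidates for taxonomically diagnostic structures."""
    focal_mean = acts[species == focal].mean(axis=0)
    congener_mean = acts[species != focal].mean(axis=0)
    score = focal_mean - congener_mean
    return np.argsort(score)[::-1], score

order, score = contrastive_rank(acts, species, focal=0)
print("top diagnostic unit:", order[0])
```

Contrasting against congeners rather than against all insects is the key design choice: it suppresses units that respond to features shared across the genus and keeps only the fine-scale differences a taxonomist would record as traits.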