Automatic Image-Level Morphological Trait Annotation for Organismal Images

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: morphological traits, morphological trait annotation, ecology, trait description generation
Abstract:

Morphological traits are physical characteristics of biological organisms that provide vital clues on how organisms interact with their environment. Yet extracting these traits remains a slow, expert-driven process, limiting their use in large-scale ecological studies. A major bottleneck is the absence of high-quality datasets linking biological images to trait-level annotations. In this work, we demonstrate that sparse autoencoders trained on foundation-model features yield monosemantic, spatially grounded neurons that consistently activate on meaningful morphological parts. Leveraging this property, we introduce a trait annotation pipeline that localizes salient regions and uses vision-language prompting to generate interpretable trait descriptions. Using this approach, we construct Bioscan-Traits, a dataset of 80K trait annotations spanning 19K insect images from BIOSCAN-5M. Human evaluation confirms the biological plausibility of the generated morphological descriptions. When used to fine-tune BioCLIP, a biologically grounded vision-language model, Bioscan-Traits improves zero-shot species classification on the in-the-wild Insects benchmark, underscoring the value of trait-level supervision for enhancing model generalization.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a trait annotation pipeline combining sparse autoencoders on foundation-model features with vision-language prompting to generate morphological descriptions from insect images. It resides in the Foundation Model and Vision-Language Approaches leaf, which contains only two papers within the broader Deep Learning-Based Trait Extraction and Annotation branch. This leaf represents an emerging research direction, contrasting with the more populated Supervised Segmentation and Classification leaf (eight papers) that focuses on task-specific architectures. The sparse population suggests the work enters a relatively nascent area where foundation models are being adapted for biological trait discovery.

The taxonomy reveals that neighboring leaves pursue distinct strategies: Supervised Segmentation and Classification emphasizes domain-tailored networks trained on annotated datasets, while Interactive and Semi-Supervised Learning (three papers) enables non-expert annotation through corrective feedback. The paper's approach diverges by leveraging pretrained representations to bypass extensive manual labeling, aligning more closely with the vision-language paradigm than with classical supervised pipelines. Its sibling paper in the same leaf, CellFlow Morphology, targets cellular-scale phenotyping with flow-based representations, whereas this work addresses organism-level insect morphology, indicating complementary scopes within the foundation model category.

Among the twenty candidates examined across three contributions, none were flagged as clearly refuting the proposed methods. The trait annotation pipeline (ten candidates examined, zero refutable) and the BIOSCAN-TRAITS dataset (ten candidates examined, zero refutable) both appear to lack direct prior work within this limited search scope. The species-contrastive ranking method was not evaluated against any candidates. This absence of overlapping prior work, combined with the sparse leaf population, suggests the contributions occupy a relatively unexplored intersection of sparse autoencoders, vision-language models, and morphological trait extraction, though the search examined only top-twenty semantic matches rather than an exhaustive survey.

Given the limited search scope and the nascent state of the Foundation Model and Vision-Language Approaches leaf, the work appears to introduce novel technical components—particularly the use of sparse autoencoders for monosemantic neuron discovery in biological imaging—that have not been directly addressed in the examined candidates. However, the analysis reflects top-twenty semantic matches and does not cover the full breadth of foundation model or vision-language research outside this specific biological context. The dataset contribution also appears distinct within the examined scope, though broader ecological or entomological datasets may exist beyond the search perimeter.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 0

Research Landscape Overview

Core task: automatic morphological trait annotation from biological images. The field encompasses a diverse set of approaches organized around six main branches:

- Deep Learning-Based Trait Extraction and Annotation includes both specialized architectures for segmentation and measurement (e.g., Stomata Morphology Measurement[1], Maize Leaf Angle Detection[20]) and emerging foundation model or vision-language strategies that leverage large-scale pretraining.
- Geometric Morphometrics and Shape Analysis focuses on landmark-based methods and shape correspondence techniques (Automated Geometric Morphometrics[8], 3D Shape Correspondence[10]), often applied to evolutionary or comparative studies.
- Software Toolkits and Platforms provide interactive or modular frameworks (Ilastik Interactive Learning[18], PhenoLearn Toolkit[2]) that enable biologists to annotate and analyze images without deep technical expertise.
- Imaging Modalities and Preprocessing addresses the acquisition and preparation of data from microscopy, flow imaging, or advanced optical systems (Flow Imaging Microscopy[17], Polarization Thermoacoustic Imaging[5]).
- Domain-Specific Applications target particular organisms or tissues, ranging from plankton (Plankton Trait Characterization[14]) to cancer spheroids (Cancer Spheroid Assessment[6]) and insect morphology (Black Soldier Fly Prediction[3]).
- Methodological Foundations and Reviews consolidate theoretical perspectives and survey the state of the art (Phenomics Promise[42], Computational Cell Biology[43]).

Several active lines of work highlight contrasting emphases and open questions. Traditional geometric morphometrics remains valuable for hypothesis-driven shape analysis, yet deep learning methods increasingly dominate when large annotated datasets are available, raising questions about interpretability and generalization across imaging conditions.
Meanwhile, foundation model and vision-language approaches, exemplified by Automatic Morphological Trait Annotation[0] and its neighbor CellFlow Morphology[30], seek to unify diverse annotation tasks under a single pretrained framework, potentially reducing the need for task-specific labeled data. These works differ in scope: CellFlow Morphology[30] emphasizes cellular-scale phenotyping with flow-based representations, whereas Automatic Morphological Trait Annotation[0] targets broader biological imaging contexts by integrating vision-language alignment. Both contrast with more narrowly scoped deep learning pipelines (e.g., Fruit Phenotyping AI[7]) that optimize for a single organism or trait. The central trade-off lies between the flexibility and data efficiency of foundation models versus the precision and domain knowledge embedded in specialized tools, a tension that continues to shape the trajectory of automated morphological analysis.

Claimed Contributions

Trait annotation pipeline using sparse autoencoders and vision-language prompting

The authors propose a three-step pipeline that uses sparse autoencoders trained on foundation-model features to identify monosemantic, spatially grounded neurons corresponding to morphological parts, then localizes these regions and prompts a multimodal language model to generate trait descriptions. This approach addresses the bottleneck of extracting morphological traits from biological images without requiring manual expert annotation.
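A minimal numpy sketch of the first two steps, under assumed shapes: a hypothetical ReLU sparse-autoencoder encoder maps per-patch foundation-model features to unit activations, and a unit's activation map over the patch grid is thresholded into a localization mask. The third step (cropping the localized region and prompting a multimodal language model) is indicated in a comment only. All names (`sae_encode`, `localize_unit`, the toy dimensions) are illustrative, not taken from the paper.

```python
import numpy as np

def sae_encode(patch_feats, W_enc, b_enc):
    """Map patch features (P x D) to sparse unit activations (P x U) via ReLU."""
    return np.maximum(patch_feats @ W_enc + b_enc, 0.0)

def localize_unit(acts, unit, grid, threshold=0.5):
    """Boolean mask over the patch grid where one SAE unit fires strongly."""
    a = acts[:, unit].reshape(grid)
    if a.max() <= 0:                      # unit never fires on this image
        return np.zeros(grid, dtype=bool)
    return a >= threshold * a.max()       # keep patches near the peak activation

# Toy example: 16 patches (4x4 grid), 8-dim features, 32 SAE units.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
W_enc = rng.normal(size=(8, 32))
b_enc = -1.0 * np.ones(32)               # negative bias encourages sparsity

acts = sae_encode(feats, W_enc, b_enc)
mask = localize_unit(acts, unit=0, grid=(4, 4))
# Step 3 (not shown): crop the image region under `mask` and prompt a
# vision-language model for a trait description of that part.
```

In practice the SAE would be trained (e.g., with an L1 sparsity penalty) on features from a frozen foundation model; the fixed random weights here only illustrate the data flow.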

10 retrieved papers
BIOSCAN-TRAITS dataset

The authors create a large-scale dataset containing 80,000 morphological trait annotations across 19,000 insect images by applying their pipeline to the BIOSCAN-5M corpus. This dataset provides structured, interpretable trait-level supervision at scale for training and evaluating biological foundation models.

10 retrieved papers
Species-contrastive ranking method for trait selection

The authors develop a ranking approach that identifies SAE units by comparing their activation strength within a focal species against closely related species (congeners). This method isolates taxonomically diagnostic features that correspond to the fine-scale morphological structures recorded by taxonomists as traits.
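The described comparison can be sketched as follows. The paper's exact scoring function is not reproduced here, so a simple relative-difference score (focal mean activation versus congener mean activation, per SAE unit) is assumed; `contrastive_rank` and the toy data are illustrative only.

```python
import numpy as np

def contrastive_rank(focal_acts, congener_acts, eps=1e-6):
    """Rank SAE units by how much more strongly they activate on images of
    the focal species than on images of closely related species (congeners).
    Returns unit indices (best first) and the per-unit scores."""
    focal_mean = focal_acts.mean(axis=0)
    congener_mean = congener_acts.mean(axis=0)
    score = (focal_mean - congener_mean) / (congener_mean + eps)
    return np.argsort(score)[::-1], score

# Toy setup: 5 SAE units; unit 3 is made diagnostic by construction.
rng = np.random.default_rng(1)
focal = rng.random((20, 5))        # 20 focal-species images x 5 units
congeners = rng.random((50, 5))    # 50 congener images x 5 units
focal[:, 3] += 2.0                 # unit 3 fires much more on the focal species

order, score = contrastive_rank(focal, congeners)
```

Comparing against congeners rather than all species is the key design choice: it suppresses units that encode family- or genus-level structure shared by relatives, leaving only the fine-scale, species-diagnostic features.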

0 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Trait annotation pipeline using sparse autoencoders and vision-language prompting

The authors propose a three-step pipeline that uses sparse autoencoders trained on foundation-model features to identify monosemantic, spatially grounded neurons corresponding to morphological parts, then localizes these regions and prompts a multimodal language model to generate trait descriptions. This approach addresses the bottleneck of extracting morphological traits from biological images without requiring manual expert annotation.

Contribution

BIOSCAN-TRAITS dataset

The authors create a large-scale dataset containing 80,000 morphological trait annotations across 19,000 insect images by applying their pipeline to the BIOSCAN-5M corpus. This dataset provides structured, interpretable trait-level supervision at scale for training and evaluating biological foundation models.

Contribution

Species-contrastive ranking method for trait selection

The authors develop a ranking approach that identifies SAE units by comparing their activation strength within a focal species against closely related species (congeners). This method isolates taxonomically diagnostic features that correspond to the fine-scale morphological structures recorded by taxonomists as traits.