Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: fMRI-to-Image Reconstruction, Brain Decoding, fMRI Decoding, Multiple Brains
Abstract:

Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally similar brain voxels. These functional clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters and subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features, which steer the diffusion model toward the correct semantic content of the image; and (ii) low-level structural features, which help to initialize the diffusion process with the correct coarse layout of the image. BIT's design enables direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that faithfully reconstruct the seen images and surpass current SotA approaches both visually and by standard objective metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

Brain-IT proposes a supervised end-to-end architecture for fMRI-to-image reconstruction, introducing a Brain Interaction Transformer (BIT) that maps functional brain-voxel clusters to localized image features. The paper resides in the 'Supervised End-to-End Architectures' leaf, which contains five papers, including Brain-IT itself. This leaf sits within the broader 'Neural Architecture and Representation Learning' branch, a moderately populated research direction focused on direct neural mapping strategies rather than generative-model-heavy approaches. The taxonomy suggests this is an active but not overcrowded area, with clear methodological boundaries separating it from diffusion-based and self-supervised methods.

The taxonomy structure shows Brain-IT's leaf neighbors include works on adversarial training, multitask learning, and graph-based architectures for fMRI decoding. Adjacent leaves address multi-subject generalization and self-supervised learning, suggesting the field explores diverse architectural strategies for handling inter-subject variability and limited supervision. The 'Generative Model-Based Reconstruction Approaches' branch, particularly diffusion-based methods with seven papers in single-stage decoding alone, represents a parallel and arguably more crowded research direction. Brain-IT's positioning emphasizes architectural innovation over reliance on pretrained generative priors, distinguishing it from the diffusion-heavy paradigm that dominates recent reconstruction efforts.

Three contributions were analyzed. For the Brain Interaction Transformer and for the overall Brain-IT framework, ten candidates each were examined, and no clear refutations were found within the limited search scope. For the Deep Image Prior component, only two candidates were examined, one of which appears to provide overlapping prior work. This suggests the core architectural contributions may be more novel within the examined literature, while the low-level reconstruction strategy has more substantial precedent. However, these findings reflect a search of twenty-two total candidates, not an exhaustive review, and the taxonomy shows that related work exists in adjacent leaves that may not have been fully captured by semantic search.
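The search-scope figures above can be cross-checked with a short tally. This is only a bookkeeping sketch: the counts are copied from this report, while the dictionary keys and the `summarize` helper are names introduced here for illustration.

```python
# Candidate counts per contribution, as stated in this report.
candidates_examined = {
    "Brain-IT framework": 10,
    "Brain Interaction Transformer (BIT)": 10,
    "Deep Image Prior low-level branch": 2,
}
# Candidates flagged as potentially overlapping prior work, as stated in this report.
flagged_overlap = {
    "Brain-IT framework": 0,
    "Brain Interaction Transformer (BIT)": 0,
    "Deep Image Prior low-level branch": 1,
}

def summarize(examined, flagged):
    """Return total candidates, total flagged, and the flagged fraction per contribution."""
    total = sum(examined.values())
    total_flagged = sum(flagged.values())
    rates = {name: flagged[name] / examined[name] for name in examined}
    return total, total_flagged, rates

total, total_flagged, rates = summarize(candidates_examined, flagged_overlap)
print(total, total_flagged)
```

The tally agrees with the twenty-two candidates and one refutable paper reported here. The flagged fraction for the Deep Image Prior branch rests on only two candidates, so it should be read as a pointer for manual review rather than as a rate.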

Based on the limited search scope, Brain-IT appears to introduce distinctive architectural elements for fMRI-to-image mapping, particularly in its functional-cluster-based interaction mechanism. The taxonomy context suggests it occupies a meaningful but not isolated position within supervised end-to-end approaches, with clear differentiation from generative-model-heavy methods. The analysis covers top-K semantic matches and does not exhaustively survey all related work in multi-subject generalization or alternative architectural strategies that may share conceptual overlap.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Paper: 1

Research Landscape Overview

Core task: image reconstruction from fMRI brain recordings. The field has evolved into several major branches that reflect different methodological philosophies. Generative Model-Based Reconstruction Approaches leverage powerful pretrained diffusion models and GANs to synthesize images from neural signals, as seen in works like High-Resolution Latent Diffusion[3] and Brain-Diffuser[8]. Neural Architecture and Representation Learning focuses on designing specialized encoders and end-to-end pipelines that map brain activity directly to visual features, exemplified by End-to-End Deep Reconstruction[4] and related supervised architectures. Multimodal and Extended Decoding Tasks broaden the scope beyond static images to language, video, and cross-modal representations, while Specialized Reconstruction Contexts address domain-specific constraints such as retinal imaging or rapid decoding scenarios. Methodological Foundations and Benchmarking provide the datasets, evaluation protocols, and theoretical underpinnings that support reproducible progress across these diverse directions.

Recent work reveals a tension between fully supervised end-to-end learning and hybrid approaches that incorporate strong generative priors. Brain-IT[0] sits within the Neural Architecture and Representation Learning branch, specifically among Supervised End-to-End Architectures, where it emphasizes direct mapping from fMRI voxels to image space without relying heavily on pretrained generative models. This contrasts with neighbors like Cognitive Representation Adversarial[20] and Multitask Learning Decoding[28], which explore adversarial training and multitask objectives to enrich learned representations. Meanwhile, the generative branch continues to push reconstruction fidelity using diffusion guidance, raising questions about the trade-offs between architectural simplicity, interpretability, and dependence on large-scale pretraining. Brain-IT[0] represents an effort to achieve competitive reconstruction quality through carefully designed supervised architectures, positioning itself as an alternative to the generative-heavy paradigm while remaining closely aligned with other end-to-end learning strategies.

Claimed Contributions

Brain-IT: Brain-inspired approach for faithful fMRI-to-image reconstruction

The authors introduce Brain-IT, a novel framework that reconstructs images from fMRI brain recordings with higher fidelity to the actual seen images than previous methods. The approach achieves state-of-the-art performance on both visual quality and quantitative metrics by addressing limitations in how brain representations are extracted and mapped to image features.

10 retrieved papers
Brain Interaction Transformer (BIT) for mapping functional brain-voxel clusters to localized image features

The authors develop BIT, a transformer-based model that clusters functionally similar brain-voxels across subjects and maps these shared clusters directly to localized image features. This design enables direct information flow from functional brain regions to image features and supports effective cross-subject information integration.

10 retrieved papers
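
The idea of functional clusters shared across subjects can be illustrated with a minimal sketch. The report does not say how BIT turns voxels into cluster-level inputs, so the mean pooling below is an assumption made purely for illustration, and `cluster_tokens`, the voxel values, and the cluster assignments are all hypothetical.

```python
from collections import defaultdict

def cluster_tokens(voxel_values, voxel_to_cluster, num_clusters):
    """Mean-pool one subject's voxel activations into one value per shared cluster.

    Mean pooling is an illustrative assumption, not the paper's stated method.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, cluster in zip(voxel_values, voxel_to_cluster):
        sums[cluster] += value
        counts[cluster] += 1
    # A cluster with no voxels for this subject gets 0.0 (an arbitrary choice).
    return [sums[c] / counts[c] if counts[c] else 0.0 for c in range(num_clusters)]

# Two hypothetical subjects with different voxel counts but the same 3 clusters.
subject_a = cluster_tokens([1.0, 3.0, 2.0, 4.0, 6.0], [0, 0, 1, 2, 2], 3)
subject_b = cluster_tokens([2.0, 5.0, 7.0], [0, 1, 2], 3)
print(subject_a, subject_b)
```

Both subjects end up with a representation of the same length even though they contribute different numbers of voxels, which is what would let shared model components operate across brains. A real model would presumably carry a feature vector per cluster rather than a single scalar.
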
Deep Image Prior approach for low-level image reconstruction from fMRI

The authors propose a complementary low-level reconstruction branch that predicts VGG-based features from fMRI and inverts them through a Deep Image Prior framework. This produces accurate coarse image layouts that initialize the diffusion process, combining structural fidelity with semantic guidance.

2 retrieved papers, 1 marked Can Refute
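
The inversion step described for this branch can be sketched as a toy optimization: keep a generator's input fixed and adjust the generator's parameters until the features of its output match the predicted features. Everything below is an assumption for illustration. The fixed matrix `A` is a hypothetical linear stand-in for a VGG-like feature extractor, and the linear generator `W` stands in for the convolutional network that a Deep Image Prior setup would use, so this shows only the optimization structure, not the architectural prior itself.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def feature_loss(A, W, z, target):
    """Squared error between features of the generated image and the target features."""
    image = matvec(W, z)       # generator output (toy "image")
    feats = matvec(A, image)   # fixed feature extractor
    return sum((f - t) ** 2 for f, t in zip(feats, target))

def invert_features(A, z, target, steps=500, lr=0.01, seed=0):
    """Gradient descent on generator weights W; the input z stays fixed."""
    rng = random.Random(seed)
    n, k = len(A[0]), len(z)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        residual = [f - t for f, t in zip(matvec(A, matvec(W, z)), target)]
        # dL/dW = 2 * (A^T residual) z^T
        back = [2 * sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(n)]
        for j in range(n):
            for c in range(k):
                W[j][c] -= lr * back[j] * z[c]
    return W

rng = random.Random(1)
A = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # hypothetical extractor
z = [rng.uniform(-1, 1) for _ in range(2)]                      # fixed generator input
true_image = [rng.uniform(-1, 1) for _ in range(4)]
target = matvec(A, true_image)  # stands in for features predicted from fMRI

loss_before = feature_loss(A, invert_features(A, z, target, steps=0), z, target)
loss_after = feature_loss(A, invert_features(A, z, target, steps=500), z, target)
print(loss_before, loss_after)
```

The feature-matching loss decreases as the weights are fitted. In an actual Deep Image Prior setting the structure of the convolutional generator is what regularizes the result toward natural-looking images, which this linear toy does not capture.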
