Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

fMRI-to-Image Reconstruction, Brain Decoding, fMRI Decoding, Multiple Brains

Abstract:

Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally similar brain voxels. These functional clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters and subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features, which steer the diffusion model toward the correct semantic content of the image; and (ii) low-level structural features, which help to initialize the diffusion process with the correct coarse layout of the image. BIT's design enables direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that faithfully reconstruct the seen images and surpass current SotA approaches both visually and by standard objective metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

Brain-IT proposes a supervised end-to-end architecture for fMRI-to-image reconstruction, introducing a Brain Interaction Transformer (BIT) that maps functional brain-voxel clusters to localized image features. The paper resides in the 'Supervised End-to-End Architectures' leaf, which contains five papers including the original work. This leaf sits within the broader 'Neural Architecture and Representation Learning' branch, indicating a moderately populated research direction focused on direct neural mapping strategies rather than generative-model-heavy approaches. The taxonomy reveals this is an active but not overcrowded area, with clear methodological boundaries separating it from diffusion-based and self-supervised methods.

The taxonomy structure shows Brain-IT's leaf neighbors include works on adversarial training, multitask learning, and graph-based architectures for fMRI decoding. Adjacent leaves address multi-subject generalization and self-supervised learning, suggesting the field explores diverse architectural strategies for handling inter-subject variability and limited supervision. The 'Generative Model-Based Reconstruction Approaches' branch, particularly diffusion-based methods with seven papers in single-stage decoding alone, represents a parallel and arguably more crowded research direction. Brain-IT's positioning emphasizes architectural innovation over reliance on pretrained generative priors, distinguishing it from the diffusion-heavy paradigm that dominates recent reconstruction efforts.

Among the three contributions analyzed, the Brain Interaction Transformer and overall Brain-IT framework each examined ten candidates with no clear refutations found in the limited search scope. The Deep Image Prior component examined only two candidates, with one appearing to provide overlapping prior work. This suggests the core architectural contributions may be more novel within the examined literature, while the low-level reconstruction strategy has more substantial precedent. However, these findings reflect a search of twenty-two total candidates, not an exhaustive review, and the taxonomy shows related work exists in adjacent leaves that may not have been fully captured by semantic search.

Based on the limited search scope, Brain-IT appears to introduce distinctive architectural elements for fMRI-to-image mapping, particularly in its functional-cluster-based interaction mechanism. The taxonomy context suggests it occupies a meaningful but not isolated position within supervised end-to-end approaches, with clear differentiation from generative-model-heavy methods. The analysis covers top-K semantic matches and does not exhaustively survey all related work in multi-subject generalization or alternative architectural strategies that may share conceptual overlap.

Taxonomy

Research Landscape Overview

Core task: image reconstruction from fMRI brain recordings. The field has evolved into several major branches that reflect different methodological philosophies. Generative Model-Based Reconstruction Approaches leverage powerful pretrained diffusion models and GANs to synthesize images from neural signals, as seen in works like High-Resolution Latent Diffusion[3] and Brain-Diffuser[8]. Neural Architecture and Representation Learning focuses on designing specialized encoders and end-to-end pipelines that map brain activity directly to visual features, exemplified by End-to-End Deep Reconstruction[4] and related supervised architectures. Multimodal and Extended Decoding Tasks broaden the scope beyond static images to language, video, and cross-modal representations, while Specialized Reconstruction Contexts address domain-specific constraints such as retinal imaging or rapid decoding scenarios. Methodological Foundations and Benchmarking provide the datasets, evaluation protocols, and theoretical underpinnings that support reproducible progress across these diverse directions.

Recent work reveals a tension between fully supervised end-to-end learning and hybrid approaches that incorporate strong generative priors. Brain-IT[0] sits within the Neural Architecture and Representation Learning branch, specifically among Supervised End-to-End Architectures, where it emphasizes direct mapping from fMRI voxels to image space without relying heavily on pretrained generative models. This contrasts with neighbors like Cognitive Representation Adversarial[20] and Multitask Learning Decoding[28], which explore adversarial training and multitask objectives to enrich learned representations. Meanwhile, the generative branch continues to push reconstruction fidelity using diffusion guidance, raising questions about the trade-offs between architectural simplicity, interpretability, and dependence on large-scale pretraining. Brain-IT[0] represents an effort to achieve competitive reconstruction quality through carefully designed supervised architectures, positioning itself as an alternative to the generative-heavy paradigm while remaining closely aligned with other end-to-end learning strategies.

Claimed Contributions

Brain-IT: Brain-inspired approach for faithful fMRI-to-image reconstruction

10 retrieved papers

The authors introduce Brain-IT, a novel framework that reconstructs images from fMRI brain recordings with higher fidelity to the actual seen images than previous methods. The approach achieves state-of-the-art performance on both visual quality and quantitative metrics by addressing limitations in how brain representations are extracted and mapped to image features.

Brain Interaction Transformer (BIT) for mapping functional brain-voxel clusters to localized image features

10 retrieved papers

The authors develop BIT, a transformer-based model that clusters functionally similar brain-voxels across subjects and maps these shared clusters directly to localized image features. This design enables direct information flow from functional brain regions to image features and supports effective cross-subject information integration.
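
The data flow described above can be pictured with a small sketch. The following is a minimal, hypothetical illustration in plain Python, not the authors' implementation: voxels are mean-pooled into cluster tokens using a shared cluster assignment, and each image-patch query then attends over those cluster tokens with softmax cross-attention. The function names, the mean pooling, the hand-written cluster embeddings, and the toy numbers are all assumptions made for illustration.

```python
import math

def pool_voxels_to_clusters(voxels, assignment, n_clusters):
    """Average voxel activations within each functional cluster.
    assignment[i] is the (assumed, precomputed) cluster index of voxel i."""
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for value, c in zip(voxels, assignment):
        sums[c] += value
        counts[c] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Each query (one per image patch) receives a softmax-weighted mix of values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(dim)])
    return outputs

# Toy data: 6 voxels from one subject, grouped into 3 shared clusters.
voxels = [0.2, 0.4, 1.0, 1.2, -0.5, -0.3]
assignment = [0, 0, 1, 1, 2, 2]
cluster_act = pool_voxels_to_clusters(voxels, assignment, 3)

# Hypothetical per-cluster embeddings (these would be learned in a real model).
cluster_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Value token = cluster activation scaled onto its embedding.
cluster_values = [[a * e for e in emb] for a, emb in zip(cluster_act, cluster_keys)]

# Two hypothetical patch queries, one per localized image feature.
patch_queries = [[1.0, 0.0], [0.0, 1.0]]
patch_features = cross_attention(patch_queries, cluster_keys, cluster_values)
print(patch_features)
```

Because the cluster assignment and embeddings are shared, the same attention step could in principle be applied to any subject's voxels once they are pooled into the common clusters, which is the property the report highlights for cross-subject integration.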

Deep Image Prior approach for low-level image reconstruction from fMRI

Can Refute

2 retrieved papers

The authors propose a complementary low-level reconstruction branch that predicts VGG-based features from fMRI and inverts them through a Deep Image Prior framework. This produces accurate coarse image layouts that initialize the diffusion process, combining structural fidelity with semantic guidance.
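
The inversion step described above follows the general Deep Image Prior recipe: keep a generator's input fixed and optimize only the generator's parameters until the features of the generated image match the target features. The sketch below is a deliberately tiny, hypothetical stand-in in plain Python, not the authors' method. A linear generator replaces the convolutional network (so the structural prior that a CNN contributes is not modeled), average pooling replaces the VGG feature extractor, and the target features are hard-coded rather than predicted from fMRI.

```python
def pool_features(image, width):
    """Stand-in feature extractor: non-overlapping average pooling (assumption)."""
    return [sum(image[i:i + width]) / width for i in range(0, len(image), width)]

def generate(weights, z):
    """Toy linear generator: pixel_i = sum_j weights[i][j] * z[j]."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]

def fit_generator(target, n_pixels=8, steps=200, lr=0.1):
    """Optimize generator weights so pooled features of the output match target."""
    # Fixed input code. In a real Deep Image Prior this is random noise;
    # it is hard-coded here so the sketch is deterministic.
    z = [1.0, -0.5, 0.25, 0.75]
    weights = [[0.0] * len(z) for _ in range(n_pixels)]
    width = n_pixels // len(target)
    for _ in range(steps):
        feats = pool_features(generate(weights, z), width)
        residual = [f - t for f, t in zip(feats, target)]
        for i in range(n_pixels):
            # Gradient of the squared feature error with respect to pixel i.
            grad_pixel = 2.0 * residual[i // width] / width
            for j, zj in enumerate(z):
                # Chain rule through the linear generator; z itself is never updated.
                weights[i][j] -= lr * grad_pixel * zj
    return generate(weights, z)

# Hypothetical coarse layout features for an 8-pixel, 1-D "image".
target = [0.0, 1.0, 0.5, 0.2]
image = fit_generator(target)
print(pool_features(image, 2))
```

In the setting the report describes, the resulting coarse image would then serve as the starting point for the diffusion process rather than as the final reconstruction.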

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[9] Natural Image Reconstruction from fMRI Based on Node-Edge Interaction and Multi-Scale Constraint

Mei Kuang, Zongyi Zhan, Shaobing Gao (2024)

[20] Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning

Ziqi Ren, Jie Li, Xuetong Xue, Xin Li, Fan Yang, Zhicheng Jiao, Xinbo Gao, Z. Jiao (2021)

[28] Decoding with Purpose: Improving Image Reconstruction from fMRI with Multitask Learning

A. Lad, Abhi Lad, Reema Patel (2021)

[46] Constraint-free natural image reconstruction from fMRI signals based on convolutional neural network

Chi Zhang, Kai Qiao, Linyuan Wang, Li Tong, Ying Zeng, Bin Yan (2018)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Brain-IT: Brain-inspired approach for faithful fMRI-to-image reconstruction

[3] High-resolution image reconstruction with latent diffusion models from human brain activity

Cannot Refute

[7] Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

Cannot Refute

[8] Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion

Cannot Refute

[12] Mind Reader: Reconstructing complex images from brain activities

Cannot Refute

[15] MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Cannot Refute

[51] DREAM: Visual Decoding from REversing HumAn Visual SysteM

Cannot Refute

[52] Visual Image Reconstruction from Brain Activity via Latent Representation

Cannot Refute

[53] Brain-driven facial image reconstruction via StyleGAN inversion with improved identity consistency

Cannot Refute

[54] MinD-3D++: Advancing fMRI-Based 3D Reconstruction with High-Quality Textured Mesh Generation and a Comprehensive Dataset

Cannot Refute

[55] MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Cannot Refute

Contribution

Brain Interaction Transformer (BIT) for mapping functional brain-voxel clusters to localized image features

[56] Transformer brain encoders explain human high-level visual responses

Cannot Refute

[57] BrainMT: A Hybrid Mamba-Transformer Architecture for Modeling Long-Range Dependencies in Functional MRI Data

Cannot Refute

[58] Dynamic Graph Transformer for Brain Disorder Diagnosis

Cannot Refute

[59] Predicting task-related brain activity from resting-state brain dynamics with fMRI Transformer

Cannot Refute

[60] In Silico Mapping of Visual Categorical Selectivity Across the Whole Brain

Cannot Refute

[61] SwiFT: Swin 4D fMRI Transformer

Cannot Refute

[62] st-DenseViT: A Weakly Supervised Spatiotemporal Vision Transformer for Dense Prediction of Dynamic Brain Networks

Cannot Refute

[63] Brain Tumor Segmentation Through Supervoxel Transformer

Cannot Refute

[64] Disentangling spatial-temporal functional brain networks via twin-transformers

Cannot Refute

[65] A common network of functional areas for attention and eye movements

Cannot Refute

Contribution

Deep Image Prior approach for low-level image reconstruction from fMRI

[66] Visual encoding and decoding of the human brain based on shared features

Can Refute

[67] Feature space-guided denoising of noisy 4D data: applications to dynamic PET imaging and dual-calibrated functional MRI

Cannot Refute
