Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: multi-region neural recordings, shared/private disentanglement, transformer sequence models, coupled autoencoders, latent variable dynamics, Neuropixels, neural dynamics, representation learning
Abstract:

Simultaneous recordings from thousands of neurons across multiple brain areas reveal rich mixtures of activity that is shared between regions and dynamics that are unique to each region. Existing alignment or multi-view methods neglect temporal structure, whereas dynamical latent-variable models capture temporal dependencies but are usually restricted to a single area, assume linear read-outs, or conflate shared and private signals. We introduce the Coupled Transformer Autoencoder (CTAE), a sequence model that addresses both (i) non-stationary, non-linear dynamics and (ii) the separation of shared from region-specific structure, in a single framework. CTAE employs Transformer encoders and decoders to capture long-range neural dynamics, and explicitly partitions each region's latent space into orthogonal shared and private subspaces. We demonstrate the effectiveness of CTAE on two high-density electrophysiology datasets of simultaneous recordings from multiple regions, one from motor cortical areas and the other from sensory areas. CTAE extracts meaningful representations that support better decoding of behavioral variables than existing approaches.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a Coupled Transformer Autoencoder (CTAE) that combines transformer-based sequence modeling with explicit shared-private latent space partitioning for multi-region neural recordings. It resides in the 'Deep Learning Architectures for Shared-Private Factorization' leaf, which contains four papers total including the original work. This leaf sits within the broader 'Latent Variable Models for Multi-Region Neural Dynamics' branch, indicating a moderately populated research direction focused on deep learning approaches to neural decomposition rather than classical probabilistic methods.

The taxonomy reveals neighboring leaves addressing related but distinct challenges: 'Behavior-Aligned Latent Dynamics Modeling' incorporates behavioral variables explicitly during factorization, while 'Classical Latent Dynamical Models' employ state-space frameworks without deep architectures. The sibling papers in the same leaf (SPIRE, Disentangled Low-Rank RNN, and one other) emphasize recurrent or low-rank structures with probabilistic priors. CTAE diverges by adopting transformer attention mechanisms for long-range dependencies, positioning itself at the intersection of modern sequence modeling and neural decomposition rather than relying on RNN-based or explicitly probabilistic frameworks.

Among eight candidates examined across three contributions, none were flagged as clearly refuting the work. The core CTAE framework examined two candidates with zero refutations, the scalable architecture examined zero candidates, and the behavior-agnostic latent space examined six candidates with zero refutations. This limited search scope—eight papers rather than an exhaustive survey—suggests the analysis captures immediate neighbors but may not reveal all overlapping prior work. The absence of refutations across contributions indicates that within this small sample, no single paper directly anticipates the combination of transformer encoders, orthogonal subspace partitioning, and multi-region electrophysiology applications.

Given the constrained literature search and the moderately populated taxonomy leaf, the work appears to occupy a recognizable niche within deep learning-based neural decomposition. The transformer-based approach differentiates it from recurrent or low-rank methods among its siblings, though the fundamental task of shared-private factorization is well-established in this research area. A broader search beyond the top-eight semantic matches would be necessary to assess whether similar transformer-based multi-region architectures exist in adjacent communities or recent preprints.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 8
Refutable Papers: 0

Research Landscape Overview

Core task: Disentangling shared and private neural dynamics across multiple brain regions.

The field has organized itself around several complementary perspectives. Latent Variable Models for Multi-Region Neural Dynamics form a central branch, developing probabilistic and deep learning architectures that explicitly factorize neural activity into components common across regions versus those unique to each area. Multi-View and Multi-Modal Integration Methods address the challenge of combining heterogeneous data sources, such as fMRI, EEG, and behavioral recordings, by learning joint representations that respect both shared structure and modality-specific features. Task-Specific Neural Decomposition Approaches focus on isolating dynamics tied to particular cognitive or motor functions, while Clinical and Translational Disentanglement Applications extend these techniques to disease states and therapeutic contexts. Network Connectivity and Structural Decomposition methods emphasize graph-theoretic and anatomical constraints, Foundation Models and Large-Scale Neural Representations explore scalable architectures for population-level inference, and Specialized Neural Representation Studies examine domain-specific phenomena such as sensory processing or social cognition.

Within the latent variable branch, a particularly active line of work employs deep learning architectures for shared-private factorization. Methods like SPIRE[2] and Disentangled Low-Rank RNN[21] use recurrent or low-rank structures to separate trial-shared dynamics from region-private variability, often emphasizing interpretability and alignment with known neural constraints. The Coupled Transformer Autoencoder[0] sits naturally within this cluster, leveraging transformer-based attention mechanisms to model dependencies across regions while maintaining distinct private subspaces.
Compared to SPIRE[2], which relies on explicit probabilistic priors, the transformer approach offers greater flexibility in capturing long-range temporal dependencies. Relative to Intrinsic Input-Driven Dynamics[3], which emphasizes input-driven versus intrinsic components, the coupled autoencoder framework prioritizes the spatial decomposition of multi-region recordings. Open questions remain about how to balance model expressiveness with biological plausibility and how to scale these architectures to whole-brain recordings.

Claimed Contributions

Coupled Transformer Autoencoder (CTAE) framework

CTAE is a novel sequence modeling framework that uses Transformer encoders and decoders to capture long-range, non-stationary neural dynamics while explicitly partitioning each brain region's latent space into orthogonal shared and private subspaces. This addresses limitations of existing methods that either neglect temporal structure or fail to separate shared and region-specific signals.

2 retrieved papers
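The contribution above centers on partitioning each region's latent space into orthogonal shared and private subspaces. One common way to encourage such orthogonality is a cross-covariance penalty between the two subspaces; the minimal sketch below illustrates that idea with hypothetical dimensions and random stand-in latents, and is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T time steps, d_s shared and d_p private latent dims.
T, d_s, d_p = 100, 8, 8

# Per-region latent trajectories as an encoder might produce (random stand-ins).
z_shared = rng.standard_normal((T, d_s))
z_private = rng.standard_normal((T, d_p))

def orthogonality_penalty(zs, zp):
    """Squared Frobenius norm of the cross-covariance between two latent blocks.

    Driving this toward zero during training encourages the shared and
    private latents to occupy (statistically) orthogonal subspaces.
    """
    zs_c = zs - zs.mean(axis=0)          # center each latent dimension
    zp_c = zp - zp.mean(axis=0)
    cross = zs_c.T @ zp_c / zs.shape[0]  # (d_s, d_p) cross-covariance
    return float(np.sum(cross ** 2))

penalty = orthogonality_penalty(z_shared, z_private)
```

In a full model this scalar would be added, with some weight, to the reconstruction loss; here it simply quantifies how far two given latent blocks are from orthogonal.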
Scalable multi-region architecture with mixing weights

The architecture employs region-specific weight masks and a weighted latent fusion mechanism that enables scalable extension to more than two brain regions without exponential parameter growth, unlike existing multi-region methods that suffer from scalability issues.

0 retrieved papers
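The scalability claim above rests on mixing weights whose count grows linearly with the number of regions rather than with region pairs. A small sketch of such weighted latent fusion, with hypothetical dimensions and fixed stand-in weights in place of learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: N regions, each contributing a candidate shared latent.
N, T, d = 3, 50, 6
region_latents = [rng.standard_normal((T, d)) for _ in range(N)]

# One mixing weight per region (learned in a real model): parameter count
# grows linearly in N, avoiding per-pair coupling terms.
raw_weights = rng.standard_normal(N)
weights = np.exp(raw_weights) / np.exp(raw_weights).sum()  # softmax over regions

# Weighted fusion into a single shared latent trajectory.
fused = sum(w * z for w, z in zip(weights, region_latents))
```

Adding a fourth region under this scheme costs one more weight and one more encoder, which is the sense in which fusion of this kind sidesteps exponential parameter growth.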
Behavior-agnostic latent space for downstream decoding

CTAE produces generic latent representations that can support multiple downstream behavioral decoding tasks such as kinematics, forces, or cognitive variables using simple linear decoders, without requiring retraining of the model for each specific task.

6 retrieved papers
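The behavior-agnostic claim above amounts to: compute the latents once, freeze them, then fit a separate simple linear decoder per downstream task. A sketch under that assumption, using closed-form ridge regression on synthetic latents and targets (nothing here comes from the paper itself):

```python
import numpy as np

rng = np.random.default_rng(2)

# Frozen latents from a pretrained model (random stand-ins) and two
# hypothetical downstream targets decoded from the SAME latents.
T, d = 200, 10
Z = rng.standard_normal((T, d))
velocity = Z @ rng.standard_normal((d, 2)) + 0.1 * rng.standard_normal((T, 2))
force = Z @ rng.standard_normal((d, 1))

def ridge_fit(Z, Y, lam=1e-2):
    """Closed-form ridge regression: (Z^T Z + lam I)^{-1} Z^T Y."""
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ Y)

# Separate linear read-outs per task; the latent model is never retrained.
W_vel = ridge_fit(Z, velocity)
W_force = ridge_fit(Z, force)
pred_vel = Z @ W_vel
```

If the latents carry the relevant structure, each new behavioral variable costs only one such linear fit, which is the practical content of "behavior-agnostic" here.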

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Coupled Transformer Autoencoder (CTAE) framework

CTAE is a novel sequence modeling framework that uses Transformer encoders and decoders to capture long-range, non-stationary neural dynamics while explicitly partitioning each brain region's latent space into orthogonal shared and private subspaces. This addresses limitations of existing methods that either neglect temporal structure or fail to separate shared and region-specific signals.

Contribution

Scalable multi-region architecture with mixing weights

The architecture employs region-specific weight masks and a weighted latent fusion mechanism that enables scalable extension to more than two brain regions without exponential parameter growth, unlike existing multi-region methods that suffer from scalability issues.

Contribution

Behavior-agnostic latent space for downstream decoding

CTAE produces generic latent representations that can support multiple downstream behavioral decoding tasks such as kinematics, forces, or cognitive variables using simple linear decoders, without requiring retraining of the model for each specific task.