Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Spiking neural network, Energy efficient, Time-to-First-Spike Encoding, Optoelectronic Synapse
Abstract:

Spiking neural networks (SNNs) promise high energy efficiency, particularly with time-to-first-spike (TTFS) encoding, which maximizes sparsity by emitting at most one spike per neuron. However, this energy advantage is often unrealized because inference requires evaluating a temporal decay function and then multiplying the result by the synaptic weights. This paper challenges that costly approach by repurposing a physical hardware 'bug', namely the natural signal decay in optoelectronic devices, as the core computation of TTFS. We fabricated a custom indium oxide optoelectronic synapse and show how its natural physical decay directly implements the required temporal function. By treating the device's analog output as the fused product of the synaptic weight and temporal decay, optoelectronic synaptic TTFS (named Otters) eliminates these expensive digital operations. To use the Otters paradigm in complex architectures like the transformer, which are challenging to train directly due to spike sparsity, we introduce a novel quantized neural network-to-SNN conversion algorithm. This complete hardware-software co-design enables our model to achieve state-of-the-art accuracy across seven GLUE benchmark datasets and demonstrates a 1.77× improvement in energy efficiency over previous leading SNNs, based on a comprehensive analysis of compute, data movement, and memory access costs using energy measurements from a commercial 22 nm process. Our work thus establishes a new paradigm for energy-efficient SNNs, translating fundamental device physics directly into powerful computational primitives. All code and data are open source.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a hardware-software co-design approach that repurposes natural signal decay in indium oxide optoelectronic devices to implement time-to-first-spike encoding, eliminating costly digital operations for temporal decay and weight multiplication. It sits in a sparse taxonomy leaf ('Hardware-Software Co-Design for Transformer-Based SNNs with Optoelectronic TTFS') with no sibling papers, indicating this specific intersection of optoelectronic TTFS and Transformer architectures is relatively unexplored. The broader taxonomy contains only eight papers across seven leaves, suggesting the entire field of optoelectronic TTFS-based SNNs is nascent.

The taxonomy reveals three main branches: device implementations (three leaves, four papers), neuron circuit designs (two leaves, three papers), and system-level architectures (two leaves, two papers including this work). Neighboring work focuses on material-level innovations (MoS2 phototransistors, memristive synapses) or circuit-level neuron designs (Izhikevich models, single-transistor implementations), whereas this paper targets end-to-end system integration. The closest related direction is 'Optical-Electrical Hybrid SNNs for Speech Recognition,' which addresses application-specific architectures but not Transformer-based models, highlighting a gap this work attempts to fill.

Among thirty candidates examined, the Otters paradigm (Contribution A) shows no clear refutation across ten candidates, suggesting novelty in leveraging physical device decay for TTFS computation. However, the QNN-to-SNN conversion algorithm (Contribution B) encountered two refutable candidates among ten examined, and hardware-aware training for device variability (Contribution C) found six refutable candidates among ten, indicating more substantial prior work in these areas. The limited search scope means these statistics reflect top-thirty semantic matches rather than exhaustive coverage, so unexamined literature may contain additional overlaps.

The analysis suggests the core optoelectronic TTFS paradigm appears relatively novel within the examined scope, while the conversion and robustness training components build on more established techniques. The sparse taxonomy structure and absence of sibling papers reinforce that this specific hardware-software co-design for Transformers occupies an underexplored niche, though the limited candidate pool (thirty papers) and focused search methodology mean broader literature may reveal additional context not captured here.

Taxonomy

8 Core-task Taxonomy Papers
3 Claimed Contributions
30 Contribution Candidate Papers Compared
8 Refutable Papers

Research Landscape Overview

Core task: energy-efficient spiking neural networks using optoelectronic time-to-first-spike encoding. The field centers on exploiting optoelectronic components to implement time-to-first-spike (TTFS) coding in spiking neural networks, aiming to reduce energy consumption while preserving computational expressiveness.

The taxonomy reveals three main branches. The first, Optoelectronic Device Implementations for TTFS Encoding, explores novel materials and device physics—ranging from two-dimensional semiconductors like MoS2 Phototransistors[1] to perovskite-based memristors such as FA2PbI4 Memristors[5] and laser-based encoders like DFB-SA Laser[3]—that directly convert optical signals into spike-timing information. The second branch, Optoelectronic Neuron Circuit Designs for SNNs, focuses on circuit-level realizations of spiking neurons, including compact designs like Single Transistor Neuron[4] and biologically inspired models such as Izhikevich Optoelectronic[6], as well as hybrid approaches using memristive elements for programmable delays (Memristor Delays[7]). The third branch, System-Level Architectures and Applications, addresses how these devices and circuits integrate into complete SNN systems, including hardware-software co-design strategies and application-specific implementations like tactile sensing with IGZO Tactile Encoders[8].

A central theme across these branches is the trade-off between device simplicity, encoding fidelity, and system scalability. Device-level innovations often prioritize ultra-low power and fast photoresponse, while circuit designs must balance biological realism with practical manufacturability, and system architectures grapple with interfacing optoelectronic front-ends to digital or analog back-ends.

Otters[0] sits within the System-Level Architectures branch, specifically targeting hardware-software co-design for Transformer-based SNNs with optoelectronic TTFS. Unlike device-focused works such as MoS2 Phototransistors[1] or DFB-SA Laser[3], Otters[0] emphasizes end-to-end system integration, leveraging TTFS encoding to enable energy-efficient attention mechanisms. This positions it closer to application-driven efforts like Sparse Photoencoder[2], which also explores system-level encoding strategies, yet Otters[0] uniquely addresses the computational demands of Transformers—a relatively underexplored intersection of optoelectronic SNNs and modern deep architectures.

Claimed Contributions

Optoelectronic Time-to-First-Spike (Otters) paradigm

The authors introduce Otters, a hardware-software co-design that repurposes the natural signal decay of a custom-fabricated In2O3 optoelectronic synapse to physically implement the temporal decay function required for TTFS encoding. This approach eliminates the costly digital computation of decay functions and multiplications, fusing computation and memory into a single physical process.
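To make the claimed saving concrete, the sketch below contrasts conventional digital TTFS inference (evaluate the decay, then multiply by the weight) with the Otters reading, where the device's analog output is already the fused product. The exponential form exp(-Δt/τ) and the constant τ are illustrative assumptions, not the paper's stated device model.

```python
import numpy as np

def ttfs_contribution_digital(w, t_spike, t_read, tau=0.5):
    """Conventional TTFS inference: explicitly evaluate the temporal
    decay function, then multiply by the synaptic weight (two costly
    digital operations per synapse)."""
    decay = np.exp(-(t_read - t_spike) / tau)  # assumed exponential decay
    return w * decay

def ttfs_contribution_otters(device_readout):
    """Otters-style inference: the optoelectronic synapse's analog
    output is treated as the fused product w * decay(t), so a single
    device read replaces both the decay evaluation and the multiply."""
    return device_readout

# Toy check: if the device physically realizes w * exp(-dt/tau),
# its readout matches the digitally computed contribution.
w, t_spike, t_read, tau = 0.8, 0.2, 1.0, 0.5
digital = ttfs_contribution_digital(w, t_spike, t_read, tau)
fused = ttfs_contribution_otters(w * np.exp(-(t_read - t_spike) / tau))
assert np.isclose(digital, fused)
```

The point of the sketch is only the operation count: the digital path performs an exponential and a multiply per synapse, while the fused path performs neither.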

10 retrieved papers
QNN-to-SNN conversion algorithm for spiking Transformers

The authors develop a conversion methodology that trains a quantized neural network (QNN) with 1-bit weights and 1-bit key/value projections using knowledge distillation, then converts it to an equivalent Otters SNN. This approach circumvents the challenges of direct SNN training while enabling deployment in complex Transformer architectures.
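A minimal sketch of the two ingredients named above, under stated assumptions: 1-bit weights via a sign quantizer whose latent full-precision copy receives gradients (the straight-through estimator, a standard recipe; the authors' exact quantizer may differ), and a soft-target distillation loss with a hypothetical temperature T.

```python
import numpy as np

def binarize_ste_forward(w_latent):
    """1-bit quantizer: the forward pass uses sign(w); the latent
    full-precision weights are kept so gradient updates can pass
    through unchanged (straight-through estimator)."""
    return np.where(w_latent >= 0, 1.0, -1.0)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target cross-entropy distillation loss: the student is
    trained to match the teacher's softened output distribution.
    T is an illustrative temperature, not a value from the paper."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p_teacher = softmax(teacher_logits / T)
    p_student = softmax(student_logits / T)
    return -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean() * T * T

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)) * 0.1
w_bin = binarize_ste_forward(w)            # every weight is exactly +/-1
loss = distillation_loss(rng.normal(size=(2, 5)), rng.normal(size=(2, 5)))
```

After QNN training converges, the binary weights map onto the discrete states the Otters hardware can realize, which is what makes the subsequent conversion to an SNN lossless in principle.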

10 retrieved papers
Can Refute
Hardware-Aware Training for robustness to device variability

The authors propose Hardware-Aware Training (HAT), which injects simulated Gaussian noise during QNN training to build robustness against hardware non-idealities. This method enables the model to tolerate device-to-device variability in analog optoelectronic synapses, demonstrating practical deployment viability.
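The mechanism can be sketched in a few lines: during training each forward pass sees weights perturbed by zero-mean Gaussian noise standing in for device-to-device variability, while evaluation uses the clean weights. The noise scale sigma here is a placeholder; the paper's calibrated noise model may differ.

```python
import numpy as np

rng = np.random.default_rng(42)

def forward_with_hat(x, w, sigma=0.05, training=True):
    """Hardware-Aware Training (HAT) forward pass: in training mode,
    inject zero-mean Gaussian noise into the weights to simulate the
    analog optoelectronic synapses' variability; in eval mode, use the
    clean weights (on hardware, the real noisy devices take their place)."""
    if training:
        w = w + rng.normal(0.0, sigma, size=w.shape)
    return x @ w

x = rng.normal(size=(2, 3))
w = rng.normal(size=(3, 4))
y_train = forward_with_hat(x, w, training=True)   # noisy forward
y_eval = forward_with_hat(x, w, training=False)   # clean forward
assert y_train.shape == (2, 4)
assert np.allclose(y_eval, x @ w)
```

Because the network is optimized under many random weight perturbations, it settles into parameters whose outputs are insensitive to per-device deviations of roughly the injected magnitude.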

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Optoelectronic Time-to-First-Spike (Otters) paradigm

The authors introduce Otters, a hardware-software co-design that repurposes the natural signal decay of a custom-fabricated In2O3 optoelectronic synapse to physically implement the temporal decay function required for TTFS encoding. This approach eliminates the costly digital computation of decay functions and multiplications, fusing computation and memory into a single physical process.

Contribution

QNN-to-SNN conversion algorithm for spiking Transformers

The authors develop a conversion methodology that trains a quantized neural network (QNN) with 1-bit weights and 1-bit key/value projections using knowledge distillation, then converts it to an equivalent Otters SNN. This approach circumvents the challenges of direct SNN training while enabling deployment in complex Transformer architectures.

Contribution

Hardware-Aware Training for robustness to device variability

The authors propose Hardware-Aware Training (HAT), which injects simulated Gaussian noise during QNN training to build robustness against hardware non-idealities. This method enables the model to tolerate device-to-device variability in analog optoelectronic synapses, demonstrating practical deployment viability.