Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Overview
Overall Novelty Assessment
The paper proposes re-incorporating the imaginary component of complex-valued dot products in RoPE to preserve phase information for long-range dependencies. It resides in the 'Imaginary Component Utilization' leaf under 'Complex-Plane and Higher-Dimensional RoPE Extensions', where it is currently the sole paper. This leaf is part of a broader taxonomy containing sixteen papers across ten distinct research directions, indicating a moderately explored field with multiple competing approaches to extending RoPE.
The taxonomy reveals several neighboring directions: 'Geometric Space Augmentation' extends RoPE into 3D Bloch spheres or hyperbolic spaces, while 'RoPE Extension via Base and Frequency Manipulation' adjusts fundamental parameters without altering algebraic structure. 'Hierarchical and Grouped Positional Encoding' partitions positions into multi-scale representations. The paper's focus on complex-plane arithmetic distinguishes it from frequency-based methods like Resonance RoPE and hierarchical schemes like HiRoPE, which operate within different mathematical frameworks to address context extension.
Among twenty-seven candidates examined via top-K semantic search and citation expansion, none clearly refutes the three core contributions. For the RoPE++ method, ten candidates were examined with zero refutations; for the dual-configuration approach, ten candidates with zero refutations; and for the theoretical analysis, seven candidates with zero refutations. This suggests that, within the limited search scope, the specific mechanism of leveraging imaginary components for dual-component attention scores is relatively unexplored, though the broader complex-plane extension direction has some prior work in geometric augmentation.
Based on the limited literature search covering twenty-seven candidates, the work appears to occupy a sparse niche within complex-plane RoPE extensions. The analysis does not cover exhaustive prior work in attention mechanisms or positional encoding more broadly, focusing instead on RoPE-specific extensions. The absence of sibling papers in the same taxonomy leaf and zero refutations across contributions suggest novelty within the examined scope, though a comprehensive assessment would require a broader search.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose RoPE++, which reintroduces the previously discarded imaginary component of the complex-valued attention computation in Rotary Position Embeddings. This creates a dual-component attention mechanism that preserves more positional information by using both real and imaginary parts of the complex dot product.
The authors develop two variants of RoPE++: RoPE++EH maintains the same number of attention heads while reducing KV cache and parameters by half, and RoPE++EC maintains the same cache size while doubling the number of attention heads. Both configurations preserve the unified absolute-relative position embedding format.
The authors provide theoretical analysis showing that imaginary attention captures longer-range dependencies through its sine integral characteristic curve and exposes query-key pairs to a wider positional information range. They empirically validate that imaginary heads attend more to long-context information and play a dominant role in long-context modeling.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
RoPE++ method re-incorporating imaginary component of complex attention
The authors propose RoPE++, which reintroduces the previously discarded imaginary component of the complex-valued attention computation in Rotary Position Embeddings. This creates a dual-component attention mechanism that preserves more positional information by using both real and imaginary parts of the complex dot product.
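The mechanism can be illustrated with a minimal sketch (this is not the authors' implementation; the channel pairing and base value are standard RoPE assumptions): adjacent channels are paired into complex numbers, rotated by position, and the complex dot product yields both the real part (the standard RoPE attention score) and the imaginary part that vanilla RoPE discards.

```python
import numpy as np

def rope_pp_scores(q, k, m, n, base=10000.0):
    """Return (real, imag) attention scores for a query at position m
    and a key at position n. The real part equals the standard RoPE
    score; RoPE++ additionally keeps the imaginary part."""
    d = q.shape[-1]
    # Standard RoPE frequencies: theta_j = base^(-2j/d), j = 0..d/2-1
    theta = base ** (-2.0 * np.arange(d // 2) / d)
    # Pair adjacent channels into complex numbers and rotate by position
    qc = (q[0::2] + 1j * q[1::2]) * np.exp(1j * m * theta)
    kc = (k[0::2] + 1j * k[1::2]) * np.exp(1j * n * theta)
    # sum_j qc_j * conj(kc_j); the result depends only on m - n
    s = np.vdot(kc, qc)
    return s.real, s.imag
```

Because both components depend only on m − n, the relative-position property of RoPE carries over to the imaginary score as well.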
[27] FCAFormer: Multivariate Time Series Forecasting Combining Channel Attention and Transformer in the Frequency Domain (B. Xiao et al.)
[28] T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement
[29] A Complex Attention Transformer for Bearing Fault Diagnosis Based on Motor Current Signals
[30] ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
[31] A Complex-Valued Transformer for Automatic Modulation Recognition
[32] Signal Transformer: Complex-Valued Attention and Meta-Learning for Signal Recognition
[33] Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images
[34] AI-Driven Channel State Information (CSI) Extrapolation for 6G: Current Situations, Challenges and Future Research
[35] Phaseper: A Complex-Valued Transformer for Automatic Speech Recognition
[36] A Complex Hermitian Positive Definite Manifold Embedding Transformer Network for Time-Varying Direction of Arrival Tracking
Two RoPE++ configurations with different efficiency trade-offs
The authors develop two variants of RoPE++: RoPE++EH maintains the same number of attention heads while reducing KV cache and parameters by half, and RoPE++EC maintains the same cache size while doubling the number of attention heads. Both configurations preserve the unified absolute-relative position embedding format.
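The stated trade-offs admit a back-of-envelope accounting (the layer, head, and dimension counts below are illustrative, not the paper's configurations): since each complex head produces two scores, one can either halve the stored KV channels while keeping the effective head count (RoPE++EH) or keep the cache size and double the effective heads (RoPE++EC).

```python
def kv_cache_elems(n_layers, n_kv_heads, head_dim, seq_len):
    # K and V tensors per layer: 2 * heads * head_dim * seq_len elements
    return 2 * n_layers * n_kv_heads * head_dim * seq_len

# Illustrative baseline: 32 layers, 32 KV heads of dim 128, 4k context
base = kv_cache_elems(32, 32, 128, 4096)
# RoPE++EH: same effective head count, half the cached KV channels
eh = kv_cache_elems(32, 16, 128, 4096)
# RoPE++EC: same cache size, twice the effective heads
ec = kv_cache_elems(32, 32, 128, 4096)
```

Under this accounting, `eh` is half of `base` while `ec` matches it, mirroring the halved-cache versus doubled-heads framing of the two variants.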
[17] Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
[18] FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
[19] MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
[20] WuNeng: Hybrid State with Attention
[21] Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
[22] Task-KV: Task-Aware KV Cache Optimization via Semantic Differentiation of Attention Heads
[23] Lossless KV Cache Compression to 2%
[24] Multi-Matrix Factorization Attention
[25] SpikingBrain Technical Report: Spiking Brain-Inspired Large Models
[26] ChunkAttention: Efficient Attention on KV Cache with Chunking Sharing and Batching
Theoretical and empirical analysis of imaginary attention properties
The authors provide theoretical analysis showing that imaginary attention captures longer-range dependencies through its sine integral characteristic curve and exposes query-key pairs to a wider positional information range. They empirically validate that imaginary heads attend more to long-context information and play a dominant role in long-context modeling.
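The shape of these claims can be previewed numerically (a sketch under standard RoPE frequency assumptions, not a reproduction of the paper's derivation): averaging cos and sin over the frequency spectrum gives real and imaginary characteristic curves as a function of relative distance m. The cosine curve starts at 1 and the sine curve at 0; the sine-integral form of the latter is what the analysis ties to longer-range behavior.

```python
import numpy as np

def characteristic_curves(max_dist, d=128, base=10000.0):
    """Mean of cos(m * theta_j) and sin(m * theta_j) over the RoPE
    frequency spectrum: the expected real and imaginary score
    contributions at relative distance m, assuming uncorrelated
    unit-norm channels."""
    theta = base ** (-2.0 * np.arange(d // 2) / d)  # shape (d/2,)
    m = np.arange(max_dist + 1)[:, None]            # shape (max_dist+1, 1)
    real_curve = np.cos(m * theta).mean(axis=1)
    imag_curve = np.sin(m * theta).mean(axis=1)
    return real_curve, imag_curve
```

Plotting the two curves over a long window is one way to eyeball the claimed difference in how quickly positional signal decays with distance.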