Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Large Language Model, Long-Context LLM, Position Embedding
Abstract:

Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-range dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's task and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes re-incorporating the imaginary component of complex-valued dot products in RoPE to preserve phase information for long-range dependencies. It resides in the 'Imaginary Component Utilization' leaf under 'Complex-Plane and Higher-Dimensional RoPE Extensions', where it is currently the sole paper. This leaf is part of a broader taxonomy containing sixteen papers across ten distinct research directions, indicating a moderately explored field with multiple competing approaches to extending RoPE.

The taxonomy reveals several neighboring directions: 'Geometric Space Augmentation' extends RoPE into 3D Bloch spheres or hyperbolic spaces, while 'RoPE Extension via Base and Frequency Manipulation' adjusts fundamental parameters without altering algebraic structure. 'Hierarchical and Grouped Positional Encoding' partitions positions into multi-scale representations. The paper's focus on complex-plane arithmetic distinguishes it from frequency-based methods like Resonance RoPE and hierarchical schemes like Hirope, which operate within different mathematical frameworks to address context extension.

Among twenty-seven candidates examined via top-K semantic search and citation expansion, none clearly refute the three core contributions. Ten candidates were compared against the RoPE++ method, ten against the dual-configuration contribution, and seven against the theoretical analysis, with zero refutations in each case. This suggests that, within the limited search scope, the specific mechanism of leveraging imaginary components for dual-component attention scores appears relatively unexplored, though the broader complex-plane extension direction has some prior work in geometric augmentation.

Based on the limited literature search covering twenty-seven candidates, the work appears to occupy a sparse niche within complex-plane RoPE extensions. The analysis does not cover exhaustive prior work in attention mechanisms or positional encoding more broadly, focusing instead on RoPE-specific extensions. The absence of sibling papers in the same taxonomy leaf and zero refutations across contributions suggest novelty within the examined scope, though comprehensive assessment would require broader search.

Taxonomy

Core-task Taxonomy Papers: 16
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: Extending rotary position embeddings with imaginary components for long-context language modeling.

The field of RoPE extensions has diversified into several major branches, each addressing distinct aspects of positional encoding for transformers and related architectures. One cluster focuses on base and frequency manipulation, adjusting the fundamental parameters of RoPE to stretch context windows. Another explores complex-plane and higher-dimensional extensions, leveraging richer geometric structures to encode position information more expressively. Hierarchical and grouped approaches partition positions into multiple scales, while input-dependent and selective methods adapt embeddings dynamically based on token content. Additional branches tackle multi-scale plug-and-play designs, unified encodings for hybrid architectures combining transformers with state-space models, single-stage continual pretraining strategies, stability and noise mitigation, inference-only decay resilience, dimension efficiency analysis, and domain-specific applications. Representative works such as Extending Context Window in [1], 3D-RPE [4], and Hirope [8] illustrate the breadth of these directions, each proposing distinct mechanisms to enhance or generalize the original RoPE framework.

Within this landscape, a particularly active line of research investigates how to exploit the complex plane and higher-dimensional representations to capture positional relationships more robustly. Beyond Real [0] sits squarely in this branch, introducing imaginary components to RoPE in order to enrich the embedding space and improve long-context modeling. This approach contrasts with frequency-based methods like Resonance RoPE [5], which modulates base frequencies without altering the underlying algebraic structure, and with hierarchical schemes such as Hirope [8], which decompose positions into nested groups rather than extending the number field.

By moving beyond purely real-valued rotations, Beyond Real [0] explores whether additional degrees of freedom in the complex domain can mitigate interpolation artifacts and better preserve relative position information at extended sequence lengths. This direction complements other stability-focused efforts and reflects ongoing interest in whether geometric generalizations of RoPE can unlock more scalable positional encoding.

Claimed Contributions

RoPE++ method re-incorporating imaginary component of complex attention

The authors propose RoPE++, which reintroduces the previously discarded imaginary component of the complex-valued attention computation in Rotary Position Embeddings. This creates a dual-component attention mechanism that preserves more positional information by using both real and imaginary parts of the complex dot product.

10 retrieved papers
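The dual-component score described in this contribution can be sketched numerically. The head dimension, the pairing of consecutive vector components into complex numbers, and the function name below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rope_dual_score(q, k, m, n, base=10000.0):
    """Complex-valued RoPE logit between a query at position m and a key
    at position n. Consecutive vector components are paired into complex
    numbers (an assumed layout). Standard RoPE keeps only the real part
    of the rotated inner product; the imaginary part is what RoPE++
    re-incorporates."""
    d = len(q)
    freqs = base ** (-np.arange(0, d, 2) / d)               # per-pair angular frequencies
    qc = (q[0::2] + 1j * q[1::2]) * np.exp(1j * m * freqs)  # rotate query pairs by m * theta
    kc = (k[0::2] + 1j * k[1::2]) * np.exp(1j * n * freqs)  # rotate key pairs by n * theta
    score = np.sum(qc * np.conj(kc))                        # phase depends only on m - n
    return score.real, score.imag                           # real: standard RoPE; imag: usually discarded

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
print(rope_dual_score(q, k, m=5, n=2))
```

Because each rotated pair contributes a factor exp(i (m - n) θ_j), both returned components depend only on the relative offset m - n, so the imaginary part carries relative positional information that the standard real-only score discards.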

Two RoPE++ configurations with different efficiency trade-offs

The authors develop two variants of RoPE++: RoPE++EH maintains the same number of attention heads while reducing KV cache and parameters by half, and RoPE++EC maintains the same cache size while doubling the number of attention heads. Both configurations preserve the unified absolute-relative position embedding format.

10 retrieved papers
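Back-of-the-envelope arithmetic makes the stated trade-off concrete. The head counts, head dimension, and sequence length below are hypothetical values chosen only to illustrate the halving and doubling relationships; they are not taken from the paper.

```python
def kv_cache_elems(n_heads, head_dim, seq_len):
    # Per layer: one K and one V vector cached per head per token.
    return 2 * n_heads * head_dim * seq_len

seq = 4096
baseline = kv_cache_elems(n_heads=32, head_dim=128, seq_len=seq)

# RoPE++EH (hypothetical shapes): same number of attention heads, but each
# cached key/value serves both the real and the imaginary score, so the
# stored width per head can be halved.
eh = kv_cache_elems(n_heads=32, head_dim=64, seq_len=seq)

# RoPE++EC (hypothetical shapes): same total cache as the baseline, with the
# head count doubled by pairing each real head with an imaginary one.
ec = kv_cache_elems(n_heads=64, head_dim=64, seq_len=seq)

print(baseline, eh, ec)
```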

Theoretical and empirical analysis of imaginary attention properties

The authors provide theoretical analysis showing that imaginary attention captures longer-range dependencies through its sine integral characteristic curve and exposes query-key pairs to a wider positional information range. They empirically validate that imaginary heads attend more to long-context information and play a dominant role in long-context modeling.

7 retrieved papers
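The long-range claim can be illustrated by averaging the rotary phase response over RoPE's frequency spectrum: the cosine average (feeding real attention) concentrates its mass near zero offset, while the sine average (feeding imaginary attention), whose cumulative form follows a sine-integral curve, retains comparatively more of its value at long range. This frequency-averaged kernel is a simplified proxy for the paper's analysis, not a reproduction of it.

```python
import numpy as np

d, base = 128, 10000.0
freqs = base ** (-np.arange(0, d, 2) / d)   # standard RoPE frequency spectrum

def real_kernel(offset):
    """Average of cos(offset * theta) over frequencies: the positional
    phase response behind real (standard) attention."""
    return np.mean(np.cos(offset * freqs))

def imag_kernel(offset):
    """Average of sin(offset * theta): the phase response behind
    imaginary attention."""
    return np.mean(np.sin(offset * freqs))

# Average over a band of long-range offsets to smooth out oscillations.
long_real = np.mean([real_kernel(t) for t in range(1500, 2500)])
long_imag = np.mean([imag_kernel(t) for t in range(1500, 2500)])

print(real_kernel(2), long_real)   # large at short range, decays at long range
print(imag_kernel(2), long_imag)   # retains most of its short-range value
```

Under this proxy, the imaginary response decays far more slowly relative to its short-range value than the real response does, which is consistent with the claim that imaginary heads expose query-key pairs to a wider positional range.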

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: RoPE++ method re-incorporating imaginary component of complex attention

Contribution 2: Two RoPE++ configurations with different efficiency trade-offs

Contribution 3: Theoretical and empirical analysis of imaginary attention properties

The full description of each contribution appears under Claimed Contributions above; none of the retrieved candidate papers refuted any of the three.