LIME: Link-based user-item Interaction Modeling with decoupled XOR attention for Efficient test-time scaling

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: decoupled attention, recommendation system, test-time scaling, XOR attention, linear attention, ranking
Abstract:

Scaling large recommendation systems requires advancing three major frontiers: processing longer user histories, expanding candidate sets, and increasing model capacity. Transformers are a promising foundation, but their computational cost scales quadratically with user sequence length and linearly with the number of candidates. This makes it prohibitively expensive to expand candidate sets or lengthen sequences at inference, despite the significant performance improvements doing so would bring.

We introduce LIME, a novel architecture that resolves this trade-off. Through two key innovations, LIME fundamentally reduces computational complexity. First, low-rank "link embeddings" enable pre-computation of attention weights by decoupling user and candidate interactions, making inference cost nearly independent of candidate set size. Second, a linear attention mechanism, LIME-XOR, reduces the complexity with respect to user sequence length from quadratic (O(N^2)) to linear (O(N)).
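To make the complexity claim concrete, here is a back-of-envelope count of query-key pairs under standard self-attention versus an XOR-style mask. This is an illustration only; `num_links` (the number of link tokens) is an assumed hyperparameter, not a figure taken from the paper.

```python
# Illustrative pair counts only; `num_links` is an assumed hyperparameter,
# not a value reported in the paper.

def full_attention_pairs(n: int) -> int:
    """Query-key pairs in standard self-attention over n history tokens: O(n^2)."""
    return n * n

def xor_attention_pairs(n: int, num_links: int) -> int:
    """Pairs when history tokens attend only to link tokens and vice versa:
    2 * n * num_links, i.e. O(n) for a fixed number of link tokens."""
    return 2 * n * num_links

for n in (1_000, 10_000):
    print(n, full_attention_pairs(n), xor_attention_pairs(n, num_links=16))
```

Doubling the history length quadruples the full-attention count but only doubles the XOR-style count, which is the source of the linear scaling claimed above.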

Experiments on public and industrial datasets show LIME achieves near-parity with state-of-the-art transformers, with a 10× inference speedup on large candidate sets or long sequences. When tested on a major recommendation platform, LIME improved user engagement while keeping inference cost nearly flat with respect to candidate set size and user history length, establishing a new paradigm for efficient and expressive recommendation systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LIME, an architecture combining low-rank link embeddings and linear attention (LIME-XOR) to reduce computational complexity in large-scale recommendation. It resides in the 'Linear Attention and Complexity Reduction' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy of 24 papers across ~13 leaf nodes. This leaf focuses specifically on achieving sub-quadratic complexity through novel attention formulations, distinguishing it from training optimizations or non-attention efficiency methods found in sibling categories.

The taxonomy reveals that LIME's parent branch, 'Efficient Attention Mechanisms for Sequential Modeling', sits alongside five other major branches addressing complementary challenges: user representation, LLM integration, scalable retrieval, cold-start handling, and domain-specific methods. Within the attention efficiency branch, LIME's leaf neighbors 'Transformer Scaling and Training Optimization' (two papers on batching and training speedups), suggesting the field separates inference-time complexity reduction from training-time optimizations. The scope notes clarify that LIME's linear attention focus excludes standard quadratic transformers and non-attention approaches, positioning it as a direct alternative to full self-attention for sequential recommendation.

Among 21 candidates examined across three contributions, no clearly refuting prior work was identified. The 'Link Embedding Mechanism' examined one candidate with no overlap; 'XOR Attention Masking' and 'LIME Architecture Framework' each examined ten candidates, again with zero refutable matches. This limited search scope—drawn from top-K semantic retrieval—suggests the specific combination of decoupled user-candidate attention via low-rank embeddings and XOR-based linear masking has not been directly addressed in the examined literature. However, the small candidate pool (21 papers) and sparse leaf occupancy (three papers) mean the analysis covers a focused slice rather than exhaustive prior work.

Given the sparse taxonomy leaf and absence of refuting candidates among 21 examined papers, LIME appears to occupy a relatively unexplored niche within attention-based sequential recommendation. The analysis is constrained by its top-K semantic search scope and does not capture potential overlaps in adjacent fields (e.g., general linear attention research outside recommendation systems). The contribution's novelty hinges on the joint application of link embeddings and linear attention to recommendation-specific scaling challenges, which the limited search did not contradict.

Taxonomy

24 Core-task Taxonomy Papers
3 Claimed Contributions
21 Contribution Candidate Papers Compared
0 Refutable Papers

Research Landscape Overview

Core task: efficient large-scale recommendation with long user histories and large candidate sets. The field has evolved into several distinct branches that address complementary aspects of this challenge. Efficient Attention Mechanisms for Sequential Modeling focuses on reducing the computational burden of modeling lengthy interaction sequences, often through linear-complexity variants or sliding-window approaches that avoid quadratic scaling. User Representation and Multi-Interest Modeling tackles the problem of capturing diverse user preferences from rich behavioral data, while Large Language Model Integration explores how pretrained language models can enhance semantic understanding and cold-start performance. Scalable Retrieval and Ranking Architectures emphasizes system-level optimizations for handling massive candidate pools, and Cold-Start and Long-Tail Item Handling addresses the perennial difficulty of recommending less popular or newly introduced items. Domain-Specific and Contextual Recommendation adapts these techniques to particular verticals such as music or location-aware services, and Foundational Reviews and Surveys provide overarching perspectives on collaborative filtering and related paradigms.

Within the Efficient Attention Mechanisms branch, a particularly active line of work centers on linear-complexity transformations that preserve expressive power while dramatically reducing runtime. LIME[0] exemplifies this direction by proposing a linear attention variant tailored to sequential recommendation, sitting alongside methods like ELASTIC[23] that also pursue complexity reduction through architectural innovations. Nearby efforts such as Faster Sequential Training[2] and Sliding Window Training[11] explore alternative strategies (batching optimizations and localized context windows) that complement attention-based approaches.
The main trade-off across these works involves balancing model expressiveness against computational overhead: while full self-attention captures long-range dependencies most directly, linear approximations and windowing schemes offer practical scalability at the potential cost of missing distant interactions. LIME[0] occupies a middle ground by retaining global receptive fields through its linear formulation, contrasting with purely local methods yet avoiding the quadratic cost of standard transformers.

Claimed Contributions

Link Embedding Mechanism for Decoupled Attention

The authors propose a novel mechanism using globally learned link embeddings that act as an intermediary between user history and candidate items. This design decouples user and item representations, enabling pre-computation of attention weights offline and making inference cost nearly independent of candidate set size.

1 retrieved paper
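As a rough illustration of the decoupling idea described above (a sketch under assumed shapes, not the authors' implementation): link embeddings attend over the user's history once, offline, producing a fixed-size summary; candidates then score against that summary, so per-candidate cost does not grow with history length. The max-over-links pooling at the end is one plausible choice, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, L, C = 32, 500, 8, 10_000  # embed dim, history len, link count, candidates (assumed sizes)

history = rng.normal(size=(N, d))  # user interaction sequence embeddings
links   = rng.normal(size=(L, d))  # globally learned link embeddings (hypothetical values)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Offline: links attend over the user's history once, yielding a compact
# user summary whose size (L x d) does not depend on history length N.
attn = softmax(links @ history.T / np.sqrt(d))  # (L, N) attention weights
user_summary = attn @ history                   # (L, d), precomputable and cacheable per user

# Online: each candidate interacts only with the L-row summary, so
# per-candidate scoring cost is O(L * d), independent of N.
candidates = rng.normal(size=(C, d))
scores = (candidates @ user_summary.T).max(axis=1)  # (C,) one score per candidate
```

The key property is that the expensive pass over `history` happens once per user rather than once per candidate, which matches the claim that inference cost is nearly independent of candidate set size.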
XOR Attention Masking for Linear Complexity

The authors introduce a novel XOR attention mask that reduces self-attention complexity from quadratic O(N²) to linear O(N) by eliminating direct history-to-history interactions and instead facilitating efficient bidirectional attention between link embeddings and user history.

10 retrieved papers
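The masking idea can be sketched with token-type indicators: XOR-ing a 0/1 type vector against itself yields a mask that permits attention only between tokens of different types, which eliminates history-to-history pairs. This is an illustrative reading of the name "XOR attention", not necessarily the paper's exact formulation.

```python
import numpy as np

N, L = 6, 2  # history tokens and link tokens (toy sizes, chosen for illustration)

# Token type per position: 0 = history token, 1 = link token.
types = np.array([0] * N + [1] * L)

# XOR mask: position i may attend to position j only when their types differ,
# so history<->link attention is allowed but history<->history is not.
mask = types[:, None] ^ types[None, :]  # ((N+L), (N+L)) matrix of 0/1

# Allowed pairs: 2 * N * L, linear in N for fixed L (vs (N+L)^2 for full attention).
assert mask.sum() == 2 * N * L
assert mask[:N, :N].sum() == 0  # no direct history-to-history interactions
```

Because every allowed pair involves a link token, the number of attended pairs grows linearly with the history length once the number of link tokens is fixed.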
LIME Architecture Framework

The authors present LIME, a comprehensive architectural framework that combines link embeddings and XOR attention to achieve cross-attention-like expressiveness with two-tower efficiency, enabling scalable recommendation systems that process longer histories and larger candidate sets.

10 retrieved papers
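A back-of-envelope serving-cost model makes the claimed two-tower-style efficiency concrete (constant factors omitted; all sizes below are assumptions for illustration): cross-attention re-reads the full history for every candidate, while a LIME-style pipeline pays for one pass over the history (cacheable per user) plus cheap per-candidate scoring against an L-row summary.

```python
# Back-of-envelope serving-cost model (illustrative; constant factors omitted,
# and N, C, d, L below are assumed sizes, not figures from the paper).

def cross_attention_cost(N: int, C: int, d: int) -> int:
    """Every candidate attends over the full history: O(C * N * d)."""
    return C * N * d

def lime_cost(N: int, C: int, d: int, L: int) -> int:
    """One XOR-attention pass over the history (O(N * L * d), cacheable per user)
    plus per-candidate scoring against the L-row summary (O(C * L * d))."""
    return N * L * d + C * L * d

N, C, d, L = 10_000, 5_000, 64, 8
ratio = cross_attention_cost(N, C, d) / lime_cost(N, C, d, L)
print(f"approximate cost ratio: {ratio:.1f}x")
```

Under these assumed sizes the model predicts a cost reduction of several hundred fold; the exact ratio depends entirely on N, C, and L, but the qualitative point stands: the dominant O(C * N * d) term is replaced by terms linear in N and in C separately.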

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
