LIME: Link-based user-item Interaction Modeling with decoupled XOR attention for Efficient test-time scaling

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: decoupled attention, recommendation system, test-time scaling, XOR attention, linear attention, ranking
Abstract:

Scaling large recommendation systems requires advancing three major frontiers: processing longer user histories, expanding candidate sets, and increasing model capacity. Transformers are a promising foundation, but their computational cost scales quadratically with user sequence length and linearly with the number of candidates. This makes it prohibitively expensive to expand candidate sets or lengthen sequences at inference, despite the significant performance improvements doing so would bring.

We introduce LIME, a novel architecture that resolves this trade-off. Through two key innovations, LIME fundamentally reduces computational complexity. First, low-rank "link embeddings" enable pre-computation of attention weights by decoupling user and candidate interactions, making inference cost nearly independent of candidate set size. Second, a linear attention mechanism, LIME-XOR, reduces the complexity with respect to user sequence length from quadratic (O(N^2)) to linear (O(N)).
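To make the complexity claim concrete, here is a back-of-envelope count of query-key pairs under standard self-attention versus an XOR-style mask. This is an illustration only; `num_links` (the number of link tokens) is an assumed hyperparameter, not a figure taken from the paper.

```python
# Illustrative pair counts only; `num_links` is an assumed hyperparameter,
# not a value reported in the paper.

def full_attention_pairs(n: int) -> int:
    """Query-key pairs in standard self-attention over n history tokens: O(n^2)."""
    return n * n

def xor_attention_pairs(n: int, num_links: int) -> int:
    """Pairs when history tokens attend only to link tokens and vice versa:
    2 * n * num_links, i.e. O(n) for a fixed number of link tokens."""
    return 2 * n * num_links

for n in (1_000, 10_000):
    print(n, full_attention_pairs(n), xor_attention_pairs(n, num_links=16))
```

Doubling the history length quadruples the full-attention count but only doubles the XOR-style count, which is the source of the linear scaling claimed above.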

Experiments on public and industrial datasets show LIME achieves near-parity with state-of-the-art transformers, with a 10× inference speedup on large candidate sets or long sequences. When tested on a major recommendation platform, LIME improved user engagement while keeping inference cost nearly flat with respect to candidate set size and user history length, establishing a new paradigm for efficient and expressive recommendation systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LIME, an architecture combining low-rank link embeddings and linear attention (LIME-XOR) to reduce computational complexity in large-scale recommendation. It resides in the 'Linear Attention and Complexity Reduction' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy of 24 papers across ~13 leaf nodes. This leaf focuses specifically on achieving sub-quadratic complexity through novel attention formulations, distinguishing it from training optimizations or non-attention efficiency methods found in sibling categories.

The taxonomy reveals that LIME's parent branch, 'Efficient Attention Mechanisms for Sequential Modeling', sits alongside five other major branches addressing complementary challenges: user representation, LLM integration, scalable retrieval, cold-start handling, and domain-specific methods. Within the attention efficiency branch, LIME's leaf neighbors 'Transformer Scaling and Training Optimization' (two papers on batching and training speedups), suggesting the field separates inference-time complexity reduction from training-time optimizations. The scope notes clarify that LIME's linear attention focus excludes standard quadratic transformers and non-attention approaches, positioning it as a direct alternative to full self-attention for sequential recommendation.

Among 21 candidates examined across three contributions, no clearly refuting prior work was identified. The 'Link Embedding Mechanism' examined one candidate with no overlap; 'XOR Attention Masking' and 'LIME Architecture Framework' each examined ten candidates, again with zero refutable matches. This limited search scope—drawn from top-K semantic retrieval—suggests the specific combination of decoupled user-candidate attention via low-rank embeddings and XOR-based linear masking has not been directly addressed in the examined literature. However, the small candidate pool (21 papers) and sparse leaf occupancy (three papers) mean the analysis covers a focused slice rather than exhaustive prior work.

Given the sparse taxonomy leaf and absence of refuting candidates among 21 examined papers, LIME appears to occupy a relatively unexplored niche within attention-based sequential recommendation. The analysis is constrained by its top-K semantic search scope and does not capture potential overlaps in adjacent fields (e.g., general linear attention research outside recommendation systems). The contribution's novelty hinges on the joint application of link embeddings and linear attention to recommendation-specific scaling challenges, which the limited search did not contradict.

Taxonomy

24 Core-task Taxonomy Papers
3 Claimed Contributions
21 Contribution Candidate Papers Compared
0 Refutable Papers

Research Landscape Overview

Core task: efficient large-scale recommendation with long user histories and large candidate sets. The field has evolved into several distinct branches that address complementary aspects of this challenge. Efficient Attention Mechanisms for Sequential Modeling focuses on reducing the computational burden of modeling lengthy interaction sequences, often through linear-complexity variants or sliding-window approaches that avoid quadratic scaling. User Representation and Multi-Interest Modeling tackles the problem of capturing diverse user preferences from rich behavioral data, while Large Language Model Integration explores how pretrained language models can enhance semantic understanding and cold-start performance. Scalable Retrieval and Ranking Architectures emphasizes system-level optimizations for handling massive candidate pools, and Cold-Start and Long-Tail Item Handling addresses the perennial difficulty of recommending less popular or newly introduced items. Domain-Specific and Contextual Recommendation adapts these techniques to particular verticals such as music or location-aware services, and Foundational Reviews and Surveys provide overarching perspectives on collaborative filtering and related paradigms.

Within the Efficient Attention Mechanisms branch, a particularly active line of work centers on linear-complexity transformations that preserve expressive power while dramatically reducing runtime. LIME[0] exemplifies this direction by proposing a linear attention variant tailored to sequential recommendation, sitting alongside methods like ELASTIC[23] that also pursue complexity reduction through architectural innovations. Nearby efforts such as Faster Sequential Training[2] and Sliding Window Training[11] explore alternative strategies (batching optimizations and localized context windows) that complement attention-based approaches.
The main trade-off across these works involves balancing model expressiveness against computational overhead: while full self-attention captures long-range dependencies most directly, linear approximations and windowing schemes offer practical scalability at the potential cost of missing distant interactions. LIME[0] occupies a middle ground by retaining global receptive fields through its linear formulation, contrasting with purely local methods yet avoiding the quadratic cost of standard transformers.

Claimed Contributions

Link Embedding Mechanism for Decoupled Attention

The authors propose a novel mechanism using globally learned link embeddings that act as an intermediary between user history and candidate items. This design decouples user and item representations, enabling pre-computation of attention weights offline and making inference cost nearly independent of candidate set size.

1 retrieved paper
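As a rough illustration of the decoupling idea described above (a sketch under assumed shapes, not the authors' implementation): link embeddings attend over the user's history once, offline, producing a fixed-size summary; candidates then score against that summary, so per-candidate cost does not grow with history length. The max-over-links pooling at the end is one plausible choice, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, L, C = 32, 500, 8, 10_000  # embed dim, history len, link count, candidates (assumed sizes)

history = rng.normal(size=(N, d))  # user interaction sequence embeddings
links   = rng.normal(size=(L, d))  # globally learned link embeddings (hypothetical values)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Offline: links attend over the user's history once, yielding a compact
# user summary whose size (L x d) does not depend on history length N.
attn = softmax(links @ history.T / np.sqrt(d))  # (L, N) attention weights
user_summary = attn @ history                   # (L, d), precomputable and cacheable per user

# Online: each candidate interacts only with the L-row summary, so
# per-candidate scoring cost is O(L * d), independent of N.
candidates = rng.normal(size=(C, d))
scores = (candidates @ user_summary.T).max(axis=1)  # (C,) one score per candidate
```

The key property is that the expensive pass over `history` happens once per user rather than once per candidate, which matches the claim that inference cost is nearly independent of candidate set size.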
XOR Attention Masking for Linear Complexity

The authors introduce a novel XOR attention mask that reduces self-attention complexity from quadratic O(N²) to linear O(N) by eliminating direct history-to-history interactions and instead facilitating efficient bidirectional attention between link embeddings and user history.

10 retrieved papers
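The masking idea can be sketched with token-type indicators: XOR-ing a 0/1 type vector against itself yields a mask that permits attention only between tokens of different types, which eliminates history-to-history pairs. This is an illustrative reading of the name "XOR attention", not necessarily the paper's exact formulation.

```python
import numpy as np

N, L = 6, 2  # history tokens and link tokens (toy sizes, chosen for illustration)

# Token type per position: 0 = history token, 1 = link token.
types = np.array([0] * N + [1] * L)

# XOR mask: position i may attend to position j only when their types differ,
# so history<->link attention is allowed but history<->history is not.
mask = types[:, None] ^ types[None, :]  # ((N+L), (N+L)) matrix of 0/1

# Allowed pairs: 2 * N * L, linear in N for fixed L (vs (N+L)^2 for full attention).
assert mask.sum() == 2 * N * L
assert mask[:N, :N].sum() == 0  # no direct history-to-history interactions
```

Because every allowed pair involves a link token, the number of attended pairs grows linearly with the history length once the number of link tokens is fixed.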
LIME Architecture Framework

The authors present LIME, a comprehensive architectural framework that combines link embeddings and XOR attention to achieve cross-attention-like expressiveness with two-tower efficiency, enabling scalable recommendation systems that process longer histories and larger candidate sets.

10 retrieved papers
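A back-of-envelope serving-cost model makes the claimed two-tower-style efficiency concrete (constant factors omitted; all sizes below are assumptions for illustration): cross-attention re-reads the full history for every candidate, while a LIME-style pipeline pays for one pass over the history (cacheable per user) plus cheap per-candidate scoring against an L-row summary.

```python
# Back-of-envelope serving-cost model (illustrative; constant factors omitted,
# and N, C, d, L below are assumed sizes, not figures from the paper).

def cross_attention_cost(N: int, C: int, d: int) -> int:
    """Every candidate attends over the full history: O(C * N * d)."""
    return C * N * d

def lime_cost(N: int, C: int, d: int, L: int) -> int:
    """One XOR-attention pass over the history (O(N * L * d), cacheable per user)
    plus per-candidate scoring against the L-row summary (O(C * L * d))."""
    return N * L * d + C * L * d

N, C, d, L = 10_000, 5_000, 64, 8
ratio = cross_attention_cost(N, C, d) / lime_cost(N, C, d, L)
print(f"approximate cost ratio: {ratio:.1f}x")
```

Under these assumed sizes the model predicts a cost reduction of several hundred fold; the exact ratio depends entirely on N, C, and L, but the qualitative point stands: the dominant O(C * N * d) term is replaced by terms linear in N and in C separately.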

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
