Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Recommender Systems, Large Language Models, Mixture of Experts
Abstract:

While collaborative filtering delivers predictive accuracy and efficiency, and Large Language Models (LLMs) enable expressive and generalizable reasoning, modern recommendation systems must bring these strengths together. Growing user expectations, such as natural-language queries and transparent explanations, further highlight the need for a unified approach. However, doing so is nontrivial. Collaborative signals are often token-efficient but semantically opaque, while LLMs are semantically rich but struggle to model implicit user preferences when trained only on textual inputs. This paper introduces Item-ID + Natural-language Mixture-of-Experts Language Model (IDIOMoE), which treats item interaction histories as a native dialect within the language space, enabling collaborative signals to be understood in the same way as natural language. By splitting the Feed Forward Network of each block of a pretrained LLM into a separate text expert and an item expert with token-type gating, our method avoids destructive interference between text and catalog modalities. IDIOMoE demonstrates strong recommendation performance across both public and proprietary datasets, while preserving the text understanding of the pretrained model.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces IDIOMoE, a mixture-of-experts architecture that treats item interaction histories as a native dialect within the language space. It resides in the 'Tokenization and Encoding Strategies' leaf, which contains five papers exploring how to convert collaborative embeddings or item identifiers into discrete tokens or text-like sequences compatible with LLM vocabularies. This leaf sits within the broader 'Collaborative Signal Integration Mechanisms' branch, indicating a moderately crowded research direction focused on encoding collaborative signals for LLM consumption. The taxonomy shows this is an active area with multiple competing approaches to the same fundamental challenge.

The taxonomy reveals neighboring leaves addressing related integration challenges through different mechanisms. 'Embedding Projection and Alignment' (six papers) focuses on continuous mapping rather than discrete tokenization, while 'Multimodal and Cross-Modal Fusion' (three papers) extends integration to multiple modalities. The scope note for the paper's leaf explicitly excludes continuous projection methods, positioning IDIOMoE's token-type gating and expert splitting as a distinct approach. Nearby branches like 'Semantic and Prompting Approaches' and 'Hybrid and Collaborative-LLM Architectures' tackle the integration problem from complementary angles, suggesting the field explores multiple pathways rather than converging on a single solution.

Among the twenty-six candidates examined in total, the ten compared against the core IDIOMoE architecture yield no clear refutation, suggesting novelty in its specific mixture-of-experts design. The disentangled MoE architecture similarly appears novel across the six candidates examined. However, the FFN key-value memory analysis framework encountered two refutable candidates among the ten examined, indicating that this analytical contribution has more substantial prior work. The limited search scope means these findings reflect top-K semantic matches rather than exhaustive coverage, but within the examined literature the pattern suggests the architectural contributions are more distinctive than the analysis framework.

Based on the limited search of twenty-six candidates, IDIOMoE appears to offer a novel architectural approach within an active research area. The mixture-of-experts design with token-type gating distinguishes it from sibling papers in the same taxonomy leaf, though the analysis framework shows overlap with existing work. The taxonomy structure reveals this contribution sits at the intersection of tokenization strategies and architectural innovation, addressing a well-recognized challenge through a distinct mechanism not clearly anticipated by the examined prior work.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: Integrating collaborative filtering with large language models for recommendation. The field has evolved into several complementary directions that address different aspects of this integration challenge. Collaborative Signal Integration Mechanisms explore how to encode user-item interaction patterns into formats digestible by LLMs, with branches focusing on tokenization strategies, graph-based representations, and fusion techniques that preserve collaborative information. Semantic and Prompting Approaches leverage the natural language understanding of LLMs through carefully designed prompts and textual representations of user preferences. Hybrid and Collaborative-LLM Architectures develop systems that combine traditional collaborative filtering modules with LLM components, balancing the strengths of both paradigms. Agent-Based and Interactive Recommendation treats recommendation as a conversational or multi-agent problem, while Domain-Specific and Application-Oriented Methods tailor solutions to particular contexts like e-commerce or music. Finally, Optimization, Evaluation, and Supporting Techniques address practical concerns around efficiency, scalability, and measurement.

Within Collaborative Signal Integration Mechanisms, the Tokenization and Encoding Strategies branch has attracted considerable attention, exploring how to represent collaborative signals as tokens or embeddings that LLMs can process effectively. Works like Collaborative LLM[3] and Text Encoding Collaborative[2] investigate different encoding schemes, while TokenRec[23] and User Item Graph[38] propose novel tokenization methods that capture interaction patterns. Catalog Native LLM[0] situates itself in this active area by focusing on catalog-native representations that preserve item relationships and collaborative structure.
Compared to approaches like Collaborative LLM[3], which may emphasize general-purpose encoding, and TokenRec[23], which explores specific tokenization architectures, Catalog Native LLM[0] appears to prioritize representations that align naturally with catalog structures, offering a distinct perspective on how collaborative signals can be made accessible to language models while maintaining the semantic richness of item catalogs.

Claimed Contributions

Item-ID + Natural-language Mixture-of-Experts Language Model (IDIOMoE)

The authors propose a Mixture-of-Experts architecture that treats item IDs as a distinct dialect from natural language. The model splits the Feed Forward Network of each transformer block into separate text and item experts with token-type gating, avoiding destructive interference between text and catalog modalities while preserving pretrained language understanding.
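As a concrete illustration of the splitting described above, the sketch below routes each token through a text-expert or item-expert FFN according to its token type. The dimensions, the GELU feed-forward form, and all names here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16

def make_ffn():
    """Random two-layer feed-forward block (stand-in for a transformer FFN)."""
    W1 = rng.standard_normal((d_ff, d_model)) * 0.1
    W2 = rng.standard_normal((d_model, d_ff)) * 0.1
    return W1, W2

def ffn_forward(x, params):
    """x @ W1.T -> GELU -> @ W2.T, the usual FFN shape."""
    W1, W2 = params
    h = x @ W1.T
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (h + 0.044715 * h**3)))
    return h @ W2.T

text_expert = make_ffn()  # would hold the pretrained FFN weights (assumption)
item_expert = make_ffn()  # a fresh expert for item-ID tokens (assumption)

def moe_ffn(hidden, token_types):
    """Deterministic token-type gate: type 0 -> text expert, type 1 -> item expert."""
    out = np.empty_like(hidden)
    is_item = token_types == 1
    out[~is_item] = ffn_forward(hidden[~is_item], text_expert)
    out[is_item] = ffn_forward(hidden[is_item], item_expert)
    return out

# Toy sequence: three text tokens followed by two item-ID tokens.
hidden = rng.standard_normal((5, d_model))
token_types = np.array([0, 0, 0, 1, 1])
out = moe_ffn(hidden, token_types)
print(out.shape)  # (5, 8)
```

Because the gate here is a deterministic function of token type rather than a learned softmax router, text tokens never pass through the item expert, which is one way the destructive interference between modalities could be avoided.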

10 retrieved papers
Disentangled MoE architecture for recommendation

The authors introduce a novel architectural design that explicitly separates collaborative filtering signals from semantic language processing using dedicated experts. A router activates text experts only when useful, enabling modality-specific specialization without parameter interference.
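One way to read "activates text experts only when useful" is a learned per-token gate that blends the two experts' outputs. The sigmoid gate below is a hypothetical sketch of that idea under that assumption; it is not the paper's router, and `w_gate` is an invented illustrative parameter.

```python
import numpy as np

rng = np.random.default_rng(2)
d_model = 8
w_gate = rng.standard_normal(d_model) * 0.1  # learned gating vector (assumed)

def gated_mix(hidden, text_out, item_out):
    """Per-token sigmoid gate g in (0, 1): g weights the text expert's output,
    (1 - g) the item expert's, so the router can shut the text path off."""
    g = 1.0 / (1.0 + np.exp(-(hidden @ w_gate)))  # shape (n_tokens,)
    return g[:, None] * text_out + (1.0 - g[:, None]) * item_out

hidden = rng.standard_normal((4, d_model))
text_out = rng.standard_normal((4, d_model))  # placeholder expert outputs
item_out = rng.standard_normal((4, d_model))
mixed = gated_mix(hidden, text_out, item_out)
print(mixed.shape)  # (4, 8)
```

Since the gate is a convex combination, each output element lies between the two experts' outputs, and a gate near zero effectively silences the text expert for that token.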

6 retrieved papers
FFN key-value memory analysis framework

The authors develop an analysis framework that views FFN neurons as key-value memories to demonstrate that their MoE separation produces more interpretable and modular representations. They introduce metrics for item-text affinity, category purity, and neuron clustering to quantify expert specialization.
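The metrics named above are not defined in this summary. As one plausible instantiation, a "category purity" score for an FFN neuron can measure how concentrated its activation mass is on its dominant item category; the formula below is an assumed illustration, not the authors' definition.

```python
import numpy as np

def category_purity(activations, categories, n_categories):
    """activations: (n_tokens, n_neurons) activations on item tokens.
    categories: (n_tokens,) category id per item token.
    Returns, per neuron, the fraction of (non-negative) activation mass
    captured by its single most-activating category."""
    acts = np.maximum(activations, 0.0)
    mass = np.zeros((n_categories, acts.shape[1]))
    for c in range(n_categories):
        mass[c] = acts[categories == c].sum(axis=0)
    total = mass.sum(axis=0) + 1e-12
    return mass.max(axis=0) / total

rng = np.random.default_rng(1)
# Toy data: neuron 0 fires only for category 0; neuron 1 fires for everything.
acts = rng.random((100, 2))
cats = rng.integers(0, 4, size=100)
acts[cats != 0, 0] = 0.0  # make neuron 0 category-specific
purity = category_purity(acts, cats, n_categories=4)
print(purity[0] > purity[1])  # True: the specialised neuron is purer
```

A neuron firing for a single category scores near 1, while a neuron firing uniformly across all categories scores near 1/n_categories, matching the intuition of expert specialisation the report describes.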

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Item-ID + Natural-language Mixture-of-Experts Language Model (IDIOMoE)

Contribution

Disentangled MoE architecture for recommendation

Contribution

FFN key-value memory analysis framework
