Pre-training Limited Memory Language Models with Internal and External Knowledge
Overview
Claimed Contributions
Limited Memory Language Models (LMLM): a class of models that offload entity-level factual knowledge to an external database during pre-training rather than storing it in model parameters
Pre-training with lookup masking: a modified loss that excludes retrieved factual values, teaching the model to issue database lookup calls instead of memorizing facts
Integrated framework: automated knowledge extraction for data preparation, pre-training with masked retrieved values, and inference that interleaves generation with database lookups
Contribution Analysis
Detailed comparisons for each claimed contribution
Limited Memory Language Models (LMLM)
LMLM is a novel class of language models designed to offload entity-level factual knowledge to an external database during pre-training instead of storing it in model parameters. This approach decouples factual memorization from language understanding, enabling more efficient use of model capacity and providing an explicit, editable, and verifiable knowledge base.
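The offloading idea can be illustrated with a minimal sketch. The `<db>`-style lookup syntax, the relation names, and the `DATABASE` contents below are illustrative assumptions, not the paper's actual format:

```python
# Hypothetical sketch of LMLM-style knowledge offloading. The external
# database maps (entity, relation) pairs to factual values; the model is
# trained to emit lookup calls whose values the database supplies.
DATABASE = {
    ("Marie Curie", "birth_year"): "1867",
    ("Marie Curie", "field"): "physics and chemistry",
}

def resolve_lookup(entity: str, relation: str) -> str:
    """Return the factual value for an (entity, relation) lookup call."""
    return DATABASE[(entity, relation)]

# A fact in the training text would be rewritten as a lookup call, e.g.:
#   "Marie Curie was born in <db>[Marie Curie | birth_year] -> 1867</db>."
print(resolve_lookup("Marie Curie", "birth_year"))  # prints 1867
```

Because the facts live in a plain key-value store rather than in the weights, they can be inspected, edited, or verified directly.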
Pre-training approach with lookup masking
The authors propose a modified pre-training procedure that excludes retrieved factual values from the loss computation. This design discourages the model from memorizing facts and instead teaches it to generate database lookup calls, systematically separating factual knowledge from neural weights during training.
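A minimal sketch of this masking step, assuming the common convention of marking ignored label positions with -100 (the span indices and toy token ids below are hypothetical):

```python
IGNORE_INDEX = -100  # conventional ignore index for cross-entropy losses

def mask_retrieved_values(token_ids, value_spans):
    """Build training labels that exclude retrieved factual values.

    token_ids:   list of token ids for one training sequence
    value_spans: list of (start, end) half-open index ranges covering the
                 database-returned values spliced into the text
    Returns labels where value positions are IGNORE_INDEX, so the loss is
    computed only on ordinary text and on the lookup-call tokens themselves.
    """
    labels = list(token_ids)
    for start, end in value_spans:
        for i in range(start, end):
            labels[i] = IGNORE_INDEX
    return labels

tokens = [11, 42, 7, 99, 13, 5]                   # toy sequence
labels = mask_retrieved_values(tokens, [(2, 4)])  # tokens 2-3 are a retrieved value
print(labels)  # [11, 42, -100, -100, 13, 5]
```

Since masked positions contribute nothing to the gradient, the model is never rewarded for predicting the factual value itself, only for producing the lookup call that fetches it.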
Integrated solution for data preparation, pre-training, and inference
The authors develop a complete framework that includes automated knowledge extraction using a distilled annotator model to prepare training data, a modified pre-training loss that masks retrieved values, and an inference procedure in which the model interleaves text generation with database lookups to ground its outputs in retrieved facts.
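The inference-time interleaving can be sketched as a simple control loop. The stand-in `model_steps` stream and the tuple-based lookup signal are assumptions for illustration; a real system would detect the model's lookup tokens and feed the retrieved value back into its context:

```python
DATABASE = {("Ada Lovelace", "birth_year"): "1815"}

def generate_with_lookups(model_steps, database):
    """Interleave generation with database lookups (hypothetical sketch).

    model_steps stands in for the language model: it yields either plain
    text chunks or ("LOOKUP", entity, relation) requests. On a lookup
    request, the retrieved value is spliced into the output (and, in a real
    system, appended to the model's context) before generation resumes.
    """
    output = []
    for step in model_steps:
        if isinstance(step, tuple) and step[0] == "LOOKUP":
            _, entity, relation = step
            output.append(database[(entity, relation)])  # ground on the DB
        else:
            output.append(step)
    return "".join(output)

steps = ["Ada Lovelace was born in ", ("LOOKUP", "Ada Lovelace", "birth_year"), "."]
print(generate_with_lookups(steps, DATABASE))
# prints: Ada Lovelace was born in 1815.
```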