Critical Confabulation: Can LLMs Hallucinate for Social Good?
Overview
Overall Novelty Assessment
The paper introduces critical confabulation, a framework that deliberately uses LLM hallucinations to reconstruct missing historical narratives for marginalized figures, evaluated through a narrative cloze task on unpublished texts. According to the taxonomy tree, this work sits in the 'Critical Confabulation and Narrative Cloze Approaches' leaf under 'Controlled Hallucination Methods for Historical Reconstruction'. Notably, this leaf contains only the paper itself; no sibling papers are listed. This suggests the paper occupies a relatively sparse research direction within the broader field of using LLMs for historical reconstruction, a field that comprises 21 papers across multiple branches.
The taxonomy reveals several neighboring research directions that contextualize this work's position. The sibling leaf 'Recursive and Generative Ancestral Reconstruction Systems' explores iterative human-AI methods for ancestral narratives, while 'AI-Mediated Voice Recreation for Specific Historical Figures' focuses on recreating individual voices. The broader 'Bias Analysis and Representation Studies' branch examines how LLMs encode historical inequities, and 'Community-Level Oral History and Archive Analysis' addresses collective memory preservation. The scope note for the paper's leaf explicitly excludes 'general creative generation without historical grounding', positioning critical confabulation as evidence-bound speculation rather than unconstrained creativity.
Among the 30 candidates examined through a limited semantic search, none clearly refuted any of the three main contributions. For each contribution in turn (the critical confabulation framework, the narrative cloze task, and the contamination-audited dataset of unpublished texts), 10 candidates were examined and none constituted clear prior work. This absence of refutable candidates across all contributions, combined with the paper being the sole occupant of its taxonomy leaf, suggests that the specific combination of controlled hallucination for historical reconstruction, narrative cloze evaluation, and contamination-audited unpublished sources represents a relatively unexplored configuration within the limited search scope.
Based on the limited literature search of 30 candidates, the work appears to occupy a novel position combining theoretical framing (critical confabulation), methodological innovation (the narrative cloze task), and dataset construction (contamination-audited unpublished texts). However, the analysis cannot assess whether more extensive searches in adjacent fields, such as digital humanities, archival studies, or computational creativity, would reveal closer precedents. The taxonomy structure suggests active research in related areas such as bias analysis and oral history, but the specific intersection this paper targets remains sparsely populated within the examined scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce critical confabulation as a framework that repurposes LLM hallucinations to reconstruct evidence-bounded narratives for historically under-documented figures. This approach adapts Hartman's critical fabulation methodology to leverage controlled confabulations for addressing archival silence and recovering divergent historical narratives.
The authors operationalize critical confabulation as a narrative cloze task where LLMs must reconstruct masked events in character timelines. This task serves as a proxy for fragmentary historical records and enables systematic evaluation of models' ability to perform evidence-bounded confabulation.
The authors construct a dataset from the Black Writing and Thought Collection with rigorous data contamination auditing procedures. They perform sentence-level string searches and behavioral probes to ensure the dataset represents genuinely unseen history, enabling reliable evaluation of confabulation rather than memorization.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
No comparisons are listed here: as noted in the novelty assessment, the paper is the sole occupant of its taxonomy leaf, so there are no same-category papers to compare against.
Contribution Analysis
Detailed comparisons for each claimed contribution
Critical confabulation framework for LLM hallucinations
The authors introduce critical confabulation as a framework that repurposes LLM hallucinations to reconstruct evidence-bounded narratives for historically under-documented figures. This approach adapts Hartman's critical fabulation methodology to leverage controlled confabulations for addressing archival silence and recovering divergent historical narratives.
[20] From Herodotus to Algorithms: Rethinking Historical Inquiry in the Age of AI.
[22] ROGER: Extracting Narratives Using Large Language Models from Robert Gerstmann's Historical Photo Archive of the Sacambaya Expedition in 1928.
[23] Speculative Historiography in the Age of Hallucinations.
[24] Knowledge Extraction from LLMs for Scalable Historical Data Annotation.
[25] Ghosts and the machine: testing the use of Artificial Intelligence to deliver historical life course biographies from big data.
[26] Can Generative AI Uncover Hidden Patterns in Historical Domestic Traffic Ads Through Data Analysis? A ChatLoS-DTA Exploration.
[27] Kongzi: A Historical Large Language Model with Fact Enhancement.
[28] The Performance of Artificial Intelligence in the Use of Indigenous American Languages.
[29] When Language Fails: Tragedy and Thucydides.
[30] Histactor: Summon Your Favorite Historical Persona.
Narrative cloze task for evaluating critical confabulation
The authors operationalize critical confabulation as a narrative cloze task where LLMs must reconstruct masked events in character timelines. This task serves as a proxy for fragmentary historical records and enables systematic evaluation of models' ability to perform evidence-bounded confabulation.
[41] SNAP: semantic stories for next activity prediction.
[42] Language models outperform cloze predictability in a cognitive model of reading.
[43] Automatic story generation: A survey of approaches.
[44] What do large language models learn about scripts?
[45] Constructing Narrative Event Evolutionary Graph for Script Event Prediction.
[46] Conditional generation of temporally-ordered event sequences.
[47] News event prediction by trigger evolution graph and event segment.
[48] LUPIN: A LLM Approach for Activity Suffix Prediction in Business Process Event Logs.
[49] Goal-directed story generation: Augmenting generative language models with reinforcement learning.
[50] Storyimager: A unified and efficient framework for coherent story visualization and completion.
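To make the narrative cloze setup described for this contribution concrete: one event in a character's timeline is masked, and the model must reconstruct it from the surrounding evidence. The sketch below is a minimal, assumption-laden illustration, not the authors' implementation; the `make_cloze_instance` helper, the prompt wording, and the example timeline are all invented here.

```python
import random

def make_cloze_instance(timeline, mask_index=None, mask_token="[MASKED EVENT]", seed=0):
    """Hide one event in a character timeline and build an evidence-bounded prompt.

    timeline: ordered list of event descriptions for one historical figure.
    Returns (prompt, held_out_event), so a model's completion for the masked
    slot can later be scored against the held-out event.
    """
    if mask_index is None:
        mask_index = random.Random(seed).randrange(len(timeline))
    held_out = timeline[mask_index]
    visible = [mask_token if i == mask_index else e for i, e in enumerate(timeline)]
    prompt = (
        "The following timeline has one missing event. "
        "Using only the evidence given, reconstruct the masked event.\n"
        + "\n".join(f"{i + 1}. {e}" for i, e in enumerate(visible))
    )
    return prompt, held_out

# Illustrative timeline (invented, not from the paper's dataset):
timeline = [
    "Born in Virginia, 1842.",
    "Taught herself to read from a discarded primer.",
    "Opened a school for freed children, 1867.",
]
prompt, answer = make_cloze_instance(timeline, mask_index=1)
```

The held-out event serves as the reference for evaluation, which is what lets the task distinguish evidence-bounded confabulation from free generation.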
Contamination-audited dataset from unpublished historical texts
The authors construct a dataset from the Black Writing and Thought Collection with rigorous data contamination auditing procedures. They perform sentence-level string searches and behavioral probes to ensure the dataset represents genuinely unseen history, enabling reliable evaluation of confabulation rather than memorization.
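A minimal version of the sentence-level string search mentioned above can be sketched as follows. This is an illustrative assumption about what such an audit might look like, not the authors' exact procedure: the normalization, the `min_tokens` threshold, and the example sentences are all invented here.

```python
import re

def normalize(sentence):
    """Lowercase, collapse whitespace, and strip punctuation so that
    near-verbatim copies still match exactly."""
    collapsed = re.sub(r"\s+", " ", sentence.lower())
    return re.sub(r"[^a-z0-9 ]+", "", collapsed).strip()

def contaminated_sentences(document_sentences, corpus_text, min_tokens=8):
    """Flag sentences that reappear verbatim (after normalization) in a corpus.

    Very short sentences are skipped: they can match by chance and so are weak
    evidence that a model's training data contained the unpublished text.
    """
    corpus_norm = normalize(corpus_text)
    hits = []
    for sent in document_sentences:
        norm = normalize(sent)
        if len(norm.split()) >= min_tokens and norm in corpus_norm:
            hits.append(sent)
    return hits

# Illustrative usage with invented sentences:
doc = [
    "She carried the ledger across the frozen river before dawn broke over the camp.",
    "It rained.",
]
corpus = "... she carried the ledger across the frozen river before dawn broke over the camp ..."
flagged = contaminated_sentences(doc, corpus)
```

The behavioral probes the authors also mention would complement such a search, since string matching alone cannot detect paraphrased or partially memorized passages.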