Critical Confabulation: Can LLMs Hallucinate for Social Good?

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Models; AI for Social Good; Hallucination and Confabulation; Narrative Modeling; Data Contamination and Memorization; Computational Creativity; Evidence-Grounded Generation
Abstract:

LLMs hallucinate, yet some confabulations can have social affordances if carefully bounded. We propose critical confabulation (inspired by critical fabulation from literary and social theory): the use of LLM hallucinations to "fill in the gaps" left in archives by social and political inequality, and to reconstruct divergent yet evidence-bound narratives for history's "hidden figures". We simulate these gaps with an open-ended narrative cloze task, asking LLMs to generate a masked event in a character-centric timeline sourced from a novel corpus of unpublished texts. We evaluate fully open models audited for data contamination (the OLMo-2 family) alongside unaudited open-weight and proprietary baselines, under a range of prompts designed to elicit controlled and useful hallucinations. Our findings confirm that LLMs have the foundational narrative-understanding capabilities needed for critical confabulation, and show how controlled, well-specified hallucinations can support LLM applications for knowledge production without letting speculation collapse into historical inaccuracy and loss of fidelity.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces critical confabulation, a framework that deliberately uses LLM hallucinations to reconstruct missing historical narratives for marginalized figures, evaluated through a narrative cloze task on unpublished texts. According to the taxonomy tree, this work sits in the 'Critical Confabulation and Narrative Cloze Approaches' leaf under 'Controlled Hallucination Methods for Historical Reconstruction'. Notably, this leaf contains only the original paper; no sibling papers are listed. This suggests the paper occupies a relatively sparse research direction within the broader field of using LLMs for historical reconstruction, which comprises 21 papers across multiple branches.

The taxonomy reveals several neighboring research directions that contextualize this work's position. The sibling leaf 'Recursive and Generative Ancestral Reconstruction Systems' explores iterative human-AI methods for ancestral narratives, while 'AI-Mediated Voice Recreation for Specific Historical Figures' focuses on recreating individual voices. The broader 'Bias Analysis and Representation Studies' branch examines how LLMs encode historical inequities, and 'Community-Level Oral History and Archive Analysis' addresses collective memory preservation. The scope note for the paper's leaf explicitly excludes 'general creative generation without historical grounding', positioning critical confabulation as evidence-bound speculation rather than unconstrained creativity.

Among the 30 candidates examined through a limited semantic search, none clearly refuted any of the three main contributions. For the critical confabulation framework, 10 candidates were examined with no refutable matches; for the narrative cloze task, 10 candidates with no refutations; and for the contamination-audited dataset of unpublished texts, 10 candidates with no clear prior work. This absence of refutable candidates across all contributions, combined with the paper being the sole occupant of its taxonomy leaf, suggests that the specific combination of controlled hallucination for historical reconstruction, narrative cloze evaluation, and contamination-audited unpublished sources is a relatively unexplored configuration within the limited search scope.

Based on the limited literature search of 30 candidates, the work appears to occupy a novel position combining theoretical framing (critical confabulation), methodological innovation (narrative cloze), and dataset construction (contamination-audited unpublished texts). However, the analysis cannot assess whether more extensive searches in adjacent fields (such as digital humanities, archival studies, or computational creativity) might reveal closer precedents. The taxonomy structure suggests active research in related areas like bias analysis and oral history, but the specific intersection this paper targets remains sparsely populated within the examined scope.

Taxonomy

Core-task Taxonomy Papers: 21
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: using LLM hallucinations to reconstruct missing historical narratives for marginalized figures. This emerging field addresses the challenge of recovering voices and stories systematically excluded from traditional archives by leveraging large language models' generative capacities in ethically grounded ways. The taxonomy reveals four main branches:

- Controlled Hallucination Methods for Historical Reconstruction explores techniques that deliberately harness LLM confabulation to fill archival gaps, including approaches such as narrative cloze and critical confabulation.
- Bias Analysis and Representation Studies examines how LLMs encode and reproduce historical inequities, investigating cultural biases and representational harms in model outputs (e.g., Cultural Bias Asian[5], Unequal Voices[1]).
- Community-Level Oral History and Archive Analysis focuses on integrating oral traditions and community knowledge with computational methods (Oral History Understanding[3], Archival Photographs Multimodal[7]).
- Methodological and Theoretical Frameworks addresses the philosophical and ethical foundations needed for responsible AI-assisted historiography, drawing on concepts such as epistemic injustice and structural oppression (Epistemic Injustice[10], Historical Structural Oppression[17]).

Particularly active tensions emerge between controlled generation methods and critical scholarship on representation. Works like Designing Invisible[2] and Moses Williams Representation[8] highlight how marginalized figures remain underrepresented or distorted even in computational reconstructions, while studies such as Simulating Social Perception[9] and Contextualizing Harmful Language[12] probe how models perpetuate historical biases. Critical Confabulation[0] situates itself within the controlled hallucination branch, proposing methods that intentionally use model confabulation as a historiographic tool rather than treating it as error. This approach contrasts with more cautious frameworks like Prosthetic Denial[15] and Spectral Imaginings[18], which emphasize the risks of fabricating narratives for communities already subjected to erasure. The central question across these lines of work remains how to balance generative reconstruction with epistemic humility, ensuring that computational methods amplify rather than replace marginalized voices.

Claimed Contributions

Critical confabulation framework for LLM hallucinations

The authors introduce critical confabulation as a framework that repurposes LLM hallucinations to reconstruct evidence-bounded narratives for historically under-documented figures. This approach adapts Hartman's critical fabulation methodology to leverage controlled confabulations for addressing archival silence and recovering divergent historical narratives.

Candidate papers retrieved: 10

Narrative cloze task for evaluating critical confabulation

The authors operationalize critical confabulation as a narrative cloze task where LLMs must reconstruct masked events in character timelines. This task serves as a proxy for fragmentary historical records and enables systematic evaluation of models' ability to perform evidence-bounded confabulation.

Candidate papers retrieved: 10

Contamination-audited dataset from unpublished historical texts

The authors construct a dataset from the Black Writing and Thought Collection with rigorous data contamination auditing procedures. They perform sentence-level string searches and behavioral probes to ensure the dataset represents genuinely unseen history, enabling reliable evaluation of confabulation rather than memorization.

Candidate papers retrieved: 10
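As a concrete illustration of the narrative cloze setup described above (not the authors' actual implementation), one event in a character-centric timeline can be masked and a model asked to regenerate it; the timeline format, mask token, and example events below are hypothetical:

```python
# Illustrative sketch of an open-ended narrative cloze prompt; the
# mask token, prompt wording, and events are hypothetical, not taken
# from the paper's dataset.
MASK = "[MASKED EVENT]"

def build_cloze_prompt(character, events, mask_index):
    """Hide one event in a character-centric timeline and ask a model
    to reconstruct it from the surrounding documented events."""
    masked = [MASK if i == mask_index else e for i, e in enumerate(events)]
    timeline = "\n".join(f"{i + 1}. {e}" for i, e in enumerate(masked))
    return (
        f"Timeline for {character}:\n{timeline}\n\n"
        "Propose a plausible account of the masked event, staying "
        "consistent with the documented events above."
    )

events = [
    "1921: moves to Chicago",
    "1923: publishes a first essay in a small journal",
    "1925: joins a writers' collective",
]
prompt = build_cloze_prompt("the subject", events, mask_index=1)
```

The model's completion for the masked slot can then be scored against the held-out event, which is what makes the task open-ended yet evidence-bounded.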

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Critical confabulation framework for LLM hallucinations

The authors introduce critical confabulation as a framework that repurposes LLM hallucinations to reconstruct evidence-bounded narratives for historically under-documented figures. This approach adapts Hartman's critical fabulation methodology to leverage controlled confabulations for addressing archival silence and recovering divergent historical narratives.

Comparison outcome: 10 candidate papers examined; none clearly refuted this contribution.

Contribution 2: Narrative cloze task for evaluating critical confabulation

The authors operationalize critical confabulation as a narrative cloze task in which LLMs must reconstruct masked events in character timelines. This task serves as a proxy for fragmentary historical records and enables systematic evaluation of models' ability to perform evidence-bounded confabulation.

Comparison outcome: 10 candidate papers examined; none clearly refuted this contribution.

Contribution 3: Contamination-audited dataset from unpublished historical texts

The authors construct a dataset from the Black Writing and Thought Collection with rigorous data contamination auditing procedures. They perform sentence-level string searches and behavioral probes to ensure the dataset represents genuinely unseen history, enabling reliable evaluation of confabulation rather than memorization.

Comparison outcome: 10 candidate papers examined; none clearly refuted this contribution.
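A minimal sketch of the sentence-level string search mentioned above, assuming access to a normalized text dump of the pretraining corpus; the normalization choices and example strings are my own, and real audits typically add n-gram overlap checks and behavioral probes:

```python
import re

def normalize(text):
    """Lowercase and strip punctuation/extra whitespace so trivial
    formatting differences don't hide verbatim overlap."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def contaminated_sentences(eval_sentences, corpus_text):
    """Return evaluation sentences that appear verbatim (after
    normalization) in the pretraining corpus text."""
    haystack = normalize(corpus_text)
    return [
        s for s in eval_sentences
        if normalize(s) and normalize(s) in haystack
    ]

# Hypothetical corpus snippet and probe sentences.
corpus = "In 1923 she published a first essay in a small journal."
hits = contaminated_sentences(
    [
        "In 1923, she published a FIRST essay in a small journal.",
        "This sentence does not appear in the corpus.",
    ],
    corpus,
)
```

Sentences surviving this filter (no hits) are the ones that can plausibly be treated as unseen history rather than memorized text.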