Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: LLMs, cognitive science, interpretability, common sense reasoning
Abstract:

Language models (LMs) are used for a diverse range of tasks, from question answering to writing fantastical stories. In order to reliably accomplish these tasks, LMs must be able to discern the modal category of a sentence (i.e., whether it describes something that is possible, impossible, completely nonsensical, etc.). However, recent studies have called into question the ability of LMs to categorize sentences according to modality. In this work, we identify linear representations that discriminate between modal categories within a variety of LMs, or modal difference vectors. Analysis of modal difference vectors reveals that LMs have access to more reliable modal categorization judgments than previously reported. Furthermore, we find that modal difference vectors emerge in a consistent order as models become more competent (i.e., through training steps, layers, and parameter count). Notably, we find that modal difference vectors identified within LM activations can be used to model fine-grained human categorization behavior. This potentially provides a novel view into how human participants distinguish between modal categories, which we explore by correlating projections along modal difference vectors with human participants' ratings of interpretable features. In summary, we derive new insights into LM modal categorization using techniques from mechanistic interpretability, with the potential to inform our understanding of modal categorization in humans.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates whether language models can reliably distinguish modal categories—such as possible, impossible, or nonsensical events—by identifying linear 'modal difference vectors' within model activations. It resides in the 'Modal and Deontic Categorization in Text' leaf, which contains only four papers total, including this one. This is a notably sparse research direction within the broader taxonomy of fifty papers, suggesting that fine-grained modal categorization remains an underexplored niche compared to the crowded multimodal architecture and application domains that dominate the field.

The taxonomy tree reveals that this work sits within 'Semantic Representation and Linguistic Analysis,' a branch focused on how models encode meaning, distinct from the heavily populated 'Multimodal Architectures' and 'Application Domains' branches. Neighboring leaves include 'Conceptual and Geometric Representations' (three papers on categorical structures) and 'Lexical Semantic Change' (one paper on temporal semantics). The sibling papers in the same leaf address deontic modality detection and modal sense classification, indicating a shared interest in linguistic nuance rather than cross-modal fusion or task-specific applications.

Among twenty-seven candidates examined via limited semantic search, none were found to clearly refute any of the three contributions. Contribution A (modal difference vectors) examined seven candidates with zero refutations; Contribution B (developmental characterization across training and scale) examined ten candidates with zero refutations; Contribution C (modeling human categorization behavior) also examined ten candidates with zero refutations. This suggests that within the scope of top-K semantic matches, the specific combination of linear probing for modal categories, developmental analysis, and human behavior modeling appears relatively unexplored.

Given the limited search scope and the sparse population of the taxonomy leaf, the work appears to occupy a distinct position within modal categorization research. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of relevant prior work outside the twenty-seven semantic matches examined here.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: modal categorization in language models. The field encompasses a broad spectrum of research directions, organized into six main branches. Multimodal Large Language Model Architectures and Frameworks (e.g., CogVLM[1], mPLUG-Owl2[8]) focus on building unified systems that integrate vision, language, and other modalities. Evaluation and Benchmarking of Multimodal Models addresses the challenge of measuring performance across diverse tasks and modalities, while Prompt Tuning and Adaptation Techniques explores efficient methods for tailoring models to specific contexts. Application Domains and Specialized Tasks spans areas from medical imaging to recommendation systems, demonstrating the versatility of multimodal approaches. Semantic Representation and Linguistic Analysis examines how models capture meaning, including modal and deontic distinctions in text. Cross-Modal and Domain-Specific Language Models investigates transfer learning and specialized architectures for particular data types or problem settings.

Within Semantic Representation and Linguistic Analysis, a small but active cluster examines modal and deontic categorization in text: how language models distinguish necessity, possibility, permission, and obligation. This line of work contrasts with the broader multimodal architectures by focusing on fine-grained linguistic phenomena rather than cross-modal fusion. Is This Just Fantasy[0] situates itself in this specialized niche, addressing modal categorization with an emphasis that aligns closely with Agent-Specific Deontic Modality Detection[42] and Revisiting Modal Sense Classification[44], which similarly tackle nuanced semantic distinctions. Compared to Low-Resource Deontic Modality Classification[49], which explores resource-constrained settings, Is This Just Fantasy[0] engages with the theoretical and representational challenges of capturing modal meaning.

This cluster remains relatively compact, raising open questions about how insights from large-scale multimodal systems might inform, or benefit from, advances in fine-grained semantic analysis.

Claimed Contributions

Modal difference vectors for categorizing event plausibility

The authors introduce modal difference vectors, which are linear representations extracted from language model hidden states that distinguish between modal categories (probable, improbable, impossible, inconceivable). These vectors are created using Contrastive Activation Addition and enable more reliable modal categorization than probability-based methods.

7 retrieved papers
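
The contrastive construction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the activations below are synthetic stand-ins for LM hidden states, and the dimensions, sample counts, and two-category setup are illustrative assumptions. It shows the core idea of a Contrastive-Activation-Addition-style difference vector: the mean activation of one modal category minus the mean of another, used as a linear classifier by projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Synthetic stand-ins for LM hidden states at the final token of sentences
# from two modal categories (e.g., "possible" vs. "impossible").
possible_acts = rng.normal(loc=0.0, scale=1.0, size=(100, d))
impossible_acts = rng.normal(loc=0.5, scale=1.0, size=(100, d))

# Difference vector: mean activation of one category minus the other.
diff_vec = impossible_acts.mean(axis=0) - possible_acts.mean(axis=0)
diff_vec /= np.linalg.norm(diff_vec)

def project(acts, v):
    """Scalar projection of each activation onto the difference vector."""
    return acts @ v

# Classify by thresholding projections at the midpoint of the class means.
threshold = (project(possible_acts, diff_vec).mean()
             + project(impossible_acts, diff_vec).mean()) / 2
preds = project(np.vstack([possible_acts, impossible_acts]), diff_vec) > threshold
labels = np.array([False] * 100 + [True] * 100)
accuracy = (preds == labels).mean()
print(f"held-in accuracy: {accuracy:.2f}")
```

On real hidden states the separation would be measured on held-out sentences; here the point is only that a single mean-difference direction suffices to discriminate the two categories.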
Characterization of modal representation development across training and scale

The authors analyze how modal difference vectors develop systematically across model training steps, layer depth, and parameter count. They find that coarse-grained distinctions (e.g., inconceivable vs. other categories) emerge earlier than fine-grained distinctions (e.g., improbable vs. impossible).

10 retrieved papers
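
The developmental claim, that coarse distinctions become linearly decodable before fine ones, can be sketched as a per-layer probing procedure. Everything here is synthetic and hypothetical: the "layers" are simulated by letting category means drift apart with depth, with a larger per-layer gap standing in for a coarse distinction and a smaller one for a fine distinction. The measurement logic (fit a mean-difference vector per layer, report the first layer where it decodes reliably) is the part being illustrated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_layers = 200, 16, 8

def layer_accuracy(acts_a, acts_b):
    """Balanced accuracy of a mean-difference-vector classifier."""
    v = acts_b.mean(axis=0) - acts_a.mean(axis=0)
    v /= np.linalg.norm(v)
    thr = ((acts_a @ v).mean() + (acts_b @ v).mean()) / 2
    return ((acts_a @ v < thr).mean() + (acts_b @ v > thr).mean()) / 2

def emergence_layer(gap_per_layer, threshold=0.9):
    """First layer at which the distinction is reliably decodable.
    Synthetic activations: class means drift apart as depth grows."""
    for layer in range(n_layers):
        gap = gap_per_layer * (layer + 1)
        a = rng.normal(0.0, 1.0, size=(n, d))
        b = rng.normal(gap, 1.0, size=(n, d))
        if layer_accuracy(a, b) >= threshold:
            return layer
    return n_layers

# Coarse distinction (large representational gap) vs. fine (small gap).
coarse = emergence_layer(gap_per_layer=0.3)
fine = emergence_layer(gap_per_layer=0.1)
print(f"coarse distinction decodable at layer {coarse}, fine at layer {fine}")
```

Under these assumptions the coarse distinction crosses the decodability threshold at an earlier layer than the fine one, mirroring the ordering the contribution reports across layers, training steps, and scale.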
Feature space for modeling human categorization behavior

The authors demonstrate that projections of sentences onto modal difference vectors create a feature space that accurately models human participants' graded categorization judgments. This feature space outperforms baseline methods in predicting human response distributions and entropy patterns.

10 retrieved papers
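
The feature-space construction can be sketched end to end, again with synthetic data rather than the authors' activations or human ratings. Each sentence is projected onto the difference vectors between adjacent modal categories; those projections form a low-dimensional feature space, and a softmax readout (plain gradient descent, a hypothetical stand-in for whatever regression the paper uses) is fit to graded response distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 16, 300, 4  # hidden size, sentences, modal categories

# Synthetic activations: four modal categories with distinct mean directions.
means = rng.normal(0, 1, size=(k, d))
cats = rng.integers(0, k, size=n)
acts = means[cats] + rng.normal(0, 1.0, size=(n, d))

# Difference vectors between adjacent categories along the plausibility
# scale (probable -> improbable -> impossible -> inconceivable).
diff_vecs = np.stack([means[i + 1] - means[i] for i in range(k - 1)])
diff_vecs /= np.linalg.norm(diff_vecs, axis=1, keepdims=True)

# Feature space: projection of each sentence onto each difference vector.
features = acts @ diff_vecs.T                 # shape (n, k - 1)

# Synthetic graded "human" responses: a soft distribution over categories
# peaked at the true category, with mass spilling onto neighbors.
human = np.exp(-0.5 * (np.arange(k) - cats[:, None]) ** 2)
human /= human.sum(axis=1, keepdims=True)

# Fit a softmax readout from projection features to response distributions.
X = np.hstack([features, np.ones((n, 1))])    # add bias column
W = np.zeros((X.shape[1], k))
for _ in range(500):                          # plain gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - human) / n

logits = X @ W
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
top_match = (p.argmax(axis=1) == human.argmax(axis=1)).mean()
print(f"modal-response agreement: {top_match:.2f}")
```

Because the readout is fit to full distributions rather than hard labels, the same setup can also be scored on distributional measures such as response entropy, which is the comparison the contribution describes.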

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
