Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: LLMs, cognitive science, interpretability, common sense reasoning
Abstract:

Language models (LMs) are used for a diverse range of tasks, from question answering to writing fantastical stories. In order to reliably accomplish these tasks, LMs must be able to discern the modal category of a sentence (i.e., whether it describes something that is possible, impossible, completely nonsensical, etc.). However, recent studies have called into question the ability of LMs to categorize sentences according to modality. In this work, we identify linear representations that discriminate between modal categories within a variety of LMs, or modal difference vectors. Analysis of modal difference vectors reveals that LMs have access to more reliable modal categorization judgments than previously reported. Furthermore, we find that modal difference vectors emerge in a consistent order as models become more competent (i.e., through training steps, layers, and parameter count). Notably, we find that modal difference vectors identified within LM activations can be used to model fine-grained human categorization behavior. This potentially provides a novel view into how human participants distinguish between modal categories, which we explore by correlating projections along modal difference vectors with human participants' ratings of interpretable features. In summary, we derive new insights into LM modal categorization using techniques from mechanistic interpretability, with the potential to inform our understanding of modal categorization in humans.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates whether language models can reliably distinguish modal categories—such as possible, impossible, or nonsensical events—by identifying linear 'modal difference vectors' within model activations. It resides in the 'Modal and Deontic Categorization in Text' leaf, which contains only four papers total, including this one. This is a notably sparse research direction within the broader taxonomy of fifty papers, suggesting that fine-grained modal categorization remains an underexplored niche compared to the crowded multimodal architecture and application domains that dominate the field.

The taxonomy tree reveals that this work sits within 'Semantic Representation and Linguistic Analysis,' a branch focused on how models encode meaning, distinct from the heavily populated 'Multimodal Architectures' and 'Application Domains' branches. Neighboring leaves include 'Conceptual and Geometric Representations' (three papers on categorical structures) and 'Lexical Semantic Change' (one paper on temporal semantics). The sibling papers in the same leaf address deontic modality detection and modal sense classification, indicating a shared interest in linguistic nuance rather than cross-modal fusion or task-specific applications.

Among twenty-seven candidates examined via limited semantic search, none were found to clearly refute any of the three contributions. Contribution A (modal difference vectors) examined seven candidates with zero refutations; Contribution B (developmental characterization across training and scale) examined ten candidates with zero refutations; Contribution C (modeling human categorization behavior) also examined ten candidates with zero refutations. This suggests that within the scope of top-K semantic matches, the specific combination of linear probing for modal categories, developmental analysis, and human behavior modeling appears relatively unexplored.

Given the limited search scope and the sparse population of the taxonomy leaf, the work appears to occupy a distinct position within modal categorization research. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of relevant prior work outside the twenty-seven semantic matches examined here.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: modal categorization in language models. The field encompasses a broad spectrum of research directions, organized into six main branches. Multimodal Large Language Model Architectures and Frameworks (e.g., CogVLM[1], mPLUG-Owl2[8]) focus on building unified systems that integrate vision, language, and other modalities. Evaluation and Benchmarking of Multimodal Models addresses the challenge of measuring performance across diverse tasks and modalities, while Prompt Tuning and Adaptation Techniques explores efficient methods for tailoring models to specific contexts. Application Domains and Specialized Tasks spans areas from medical imaging to recommendation systems, demonstrating the versatility of multimodal approaches. Semantic Representation and Linguistic Analysis examines how models capture meaning, including modal and deontic distinctions in text. Cross-Modal and Domain-Specific Language Models investigates transfer learning and specialized architectures for particular data types or problem settings.

Within Semantic Representation and Linguistic Analysis, a small but active cluster examines modal and deontic categorization in text: how language models distinguish necessity, possibility, permission, and obligation. This line of work contrasts with the broader multimodal architectures by focusing on fine-grained linguistic phenomena rather than cross-modal fusion. Is This Just Fantasy[0] situates itself in this specialized niche, addressing modal categorization with an emphasis that aligns closely with Agent-Specific Deontic Modality Detection[42] and Revisiting Modal Sense Classification[44], which similarly tackle nuanced semantic distinctions. Compared to Low-Resource Deontic Modality Classification[49], which explores resource-constrained settings, Is This Just Fantasy[0] engages with the theoretical and representational challenges of capturing modal meaning.

This cluster remains relatively compact, raising open questions about how insights from large-scale multimodal systems might inform, or benefit from, advances in fine-grained semantic analysis.

Claimed Contributions

Modal difference vectors for categorizing event plausibility

The authors introduce modal difference vectors, which are linear representations extracted from language model hidden states that distinguish between modal categories (probable, improbable, impossible, inconceivable). These vectors are created using Contrastive Activation Addition and enable more reliable modal categorization than probability-based methods.

7 retrieved papers
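
The contrastive construction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the activations below are synthetic stand-ins for LM hidden states, and the dimensions, sample counts, and two-category setup are illustrative assumptions. It shows the core idea of a Contrastive-Activation-Addition-style difference vector: the mean activation of one modal category minus the mean of another, used as a linear classifier by projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Synthetic stand-ins for LM hidden states at the final token of sentences
# from two modal categories (e.g., "possible" vs. "impossible").
possible_acts = rng.normal(loc=0.0, scale=1.0, size=(100, d))
impossible_acts = rng.normal(loc=0.5, scale=1.0, size=(100, d))

# Difference vector: mean activation of one category minus the other.
diff_vec = impossible_acts.mean(axis=0) - possible_acts.mean(axis=0)
diff_vec /= np.linalg.norm(diff_vec)

def project(acts, v):
    """Scalar projection of each activation onto the difference vector."""
    return acts @ v

# Classify by thresholding projections at the midpoint of the class means.
threshold = (project(possible_acts, diff_vec).mean()
             + project(impossible_acts, diff_vec).mean()) / 2
preds = project(np.vstack([possible_acts, impossible_acts]), diff_vec) > threshold
labels = np.array([False] * 100 + [True] * 100)
accuracy = (preds == labels).mean()
print(f"held-in accuracy: {accuracy:.2f}")
```

On real hidden states the separation would be measured on held-out sentences; here the point is only that a single mean-difference direction suffices to discriminate the two categories.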
Characterization of modal representation development across training and scale

The authors analyze how modal difference vectors develop systematically across model training steps, layer depth, and parameter count. They find that coarse-grained distinctions (e.g., inconceivable vs. other categories) emerge earlier than fine-grained distinctions (e.g., improbable vs. impossible).

10 retrieved papers
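
The developmental claim, that coarse distinctions become linearly decodable before fine ones, can be sketched as a per-layer probing procedure. Everything here is synthetic and hypothetical: the "layers" are simulated by letting category means drift apart with depth, with a larger per-layer gap standing in for a coarse distinction and a smaller one for a fine distinction. The measurement logic (fit a mean-difference vector per layer, report the first layer where it decodes reliably) is the part being illustrated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_layers = 200, 16, 8

def layer_accuracy(acts_a, acts_b):
    """Balanced accuracy of a mean-difference-vector classifier."""
    v = acts_b.mean(axis=0) - acts_a.mean(axis=0)
    v /= np.linalg.norm(v)
    thr = ((acts_a @ v).mean() + (acts_b @ v).mean()) / 2
    return ((acts_a @ v < thr).mean() + (acts_b @ v > thr).mean()) / 2

def emergence_layer(gap_per_layer, threshold=0.9):
    """First layer at which the distinction is reliably decodable.
    Synthetic activations: class means drift apart as depth grows."""
    for layer in range(n_layers):
        gap = gap_per_layer * (layer + 1)
        a = rng.normal(0.0, 1.0, size=(n, d))
        b = rng.normal(gap, 1.0, size=(n, d))
        if layer_accuracy(a, b) >= threshold:
            return layer
    return n_layers

# Coarse distinction (large representational gap) vs. fine (small gap).
coarse = emergence_layer(gap_per_layer=0.3)
fine = emergence_layer(gap_per_layer=0.1)
print(f"coarse distinction decodable at layer {coarse}, fine at layer {fine}")
```

Under these assumptions the coarse distinction crosses the decodability threshold at an earlier layer than the fine one, mirroring the ordering the contribution reports across layers, training steps, and scale.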
Feature space for modeling human categorization behavior

The authors demonstrate that projections of sentences onto modal difference vectors create a feature space that accurately models human participants' graded categorization judgments. This feature space outperforms baseline methods in predicting human response distributions and entropy patterns.

10 retrieved papers
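
The feature-space construction can be sketched end to end, again with synthetic data rather than the authors' activations or human ratings. Each sentence is projected onto the difference vectors between adjacent modal categories; those projections form a low-dimensional feature space, and a softmax readout (plain gradient descent, a hypothetical stand-in for whatever regression the paper uses) is fit to graded response distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 16, 300, 4  # hidden size, sentences, modal categories

# Synthetic activations: four modal categories with distinct mean directions.
means = rng.normal(0, 1, size=(k, d))
cats = rng.integers(0, k, size=n)
acts = means[cats] + rng.normal(0, 1.0, size=(n, d))

# Difference vectors between adjacent categories along the plausibility
# scale (probable -> improbable -> impossible -> inconceivable).
diff_vecs = np.stack([means[i + 1] - means[i] for i in range(k - 1)])
diff_vecs /= np.linalg.norm(diff_vecs, axis=1, keepdims=True)

# Feature space: projection of each sentence onto each difference vector.
features = acts @ diff_vecs.T                 # shape (n, k - 1)

# Synthetic graded "human" responses: a soft distribution over categories
# peaked at the true category, with mass spilling onto neighbors.
human = np.exp(-0.5 * (np.arange(k) - cats[:, None]) ** 2)
human /= human.sum(axis=1, keepdims=True)

# Fit a softmax readout from projection features to response distributions.
X = np.hstack([features, np.ones((n, 1))])    # add bias column
W = np.zeros((X.shape[1], k))
for _ in range(500):                          # plain gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - human) / n

logits = X @ W
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
top_match = (p.argmax(axis=1) == human.argmax(axis=1)).mean()
print(f"modal-response agreement: {top_match:.2f}")
```

Because the readout is fit to full distributions rather than hard labels, the same setup can also be scored on distributional measures such as response entropy, which is the comparison the contribution describes.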

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
