Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Overview
Overall Novelty Assessment
The paper investigates whether language models can reliably distinguish modal categories—such as possible, impossible, or nonsensical events—by identifying linear 'modal difference vectors' within model activations. It resides in the 'Modal and Deontic Categorization in Text' leaf, which contains only four papers total, including this one. This is a notably sparse research direction within the broader taxonomy of fifty papers, suggesting that fine-grained modal categorization remains an underexplored niche compared to the crowded multimodal architecture and application domains that dominate the field.
The taxonomy tree reveals that this work sits within 'Semantic Representation and Linguistic Analysis,' a branch focused on how models encode meaning, distinct from the heavily populated 'Multimodal Architectures' and 'Application Domains' branches. Neighboring leaves include 'Conceptual and Geometric Representations' (three papers on categorical structures) and 'Lexical Semantic Change' (one paper on temporal semantics). The sibling papers in the same leaf address deontic modality detection and modal sense classification, indicating a shared interest in linguistic nuance rather than cross-modal fusion or task-specific applications.
Among the twenty-seven candidates examined via limited semantic search, none clearly refuted any of the three contributions. For Contribution A (modal difference vectors), seven candidates were examined with zero refutations; for Contribution B (developmental characterization across training and scale), ten candidates with zero refutations; for Contribution C (modeling human categorization behavior), likewise ten candidates with zero refutations. This suggests that, within the scope of the top-K semantic matches, the specific combination of linear probing for modal categories, developmental analysis, and human behavior modeling remains relatively unexplored.
Given the limited search scope and the sparse population of the taxonomy leaf, the work appears to occupy a distinct position within modal categorization research. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of relevant prior work outside the twenty-seven top-ranked semantic matches examined here.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce modal difference vectors, which are linear representations extracted from language model hidden states that distinguish between modal categories (probable, improbable, impossible, inconceivable). These vectors are created using Contrastive Activation Addition and enable more reliable modal categorization than probability-based methods.
The authors analyze how modal difference vectors develop systematically across model training steps, layer depth, and parameter count. They find that coarse-grained distinctions (e.g., inconceivable vs. other categories) emerge earlier than fine-grained distinctions (e.g., improbable vs. impossible).
The authors demonstrate that projections of sentences onto modal difference vectors create a feature space that accurately models human participants' graded categorization judgments. This feature space outperforms baseline methods in predicting human response distributions and entropy patterns.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[42] Agent-Specific Deontic Modality Detection in Legal Language
[44] Revisiting modal sense classification with contextual word embeddings
[49] Low-Resource Deontic Modality Classification in EU Legislation
Contribution Analysis
Detailed comparisons for each claimed contribution
Modal difference vectors for categorizing event plausibility
The authors introduce modal difference vectors, which are linear representations extracted from language model hidden states that distinguish between modal categories (probable, improbable, impossible, inconceivable). These vectors are created using Contrastive Activation Addition and enable more reliable modal categorization than probability-based methods.
[61] The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
[62] Unifying visual-semantic embeddings with multimodal neural language models
[63] Linearly mapping from image to text space
[64] Language quantized autoencoders: Towards unsupervised text-image alignment
[65] Strengths and Limitations of Word-Based Task Explainability in Vision Language Models: a Case Study on Biological Sex Biases in the Medical Domain
[66] The Biblical Hebrew Verbal System in Light of Grammaticalization: The Second Generation
[67] The BH weqatal - a homogenous form with no haphazard functions (part one)
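The difference-of-means construction behind Contrastive Activation Addition can be sketched as follows. The data is synthetic and the function names (`modal_difference_vector`, `classify`) are illustrative, not the paper's implementation.

```python
import numpy as np

def modal_difference_vector(acts_a, acts_b):
    """Unit-norm difference-of-means direction separating two modal
    categories: average the hidden states per category, then subtract.
    This is the contrastive idea behind Contrastive Activation Addition."""
    v = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return v / np.linalg.norm(v)

def classify(acts, v, threshold=0.0):
    """Assign category A iff the projection onto v exceeds the threshold."""
    return acts @ v > threshold

# Synthetic stand-in for hidden states of "possible" vs. "impossible" events.
rng = np.random.default_rng(0)
d = 16
offset = rng.normal(size=d)
possible = rng.normal(size=(50, d)) + offset
impossible = rng.normal(size=(50, d)) - offset

v = modal_difference_vector(possible, impossible)
acc = (classify(possible, v).mean() + (~classify(impossible, v)).mean()) / 2
print(f"separation accuracy on the toy data: {acc:.2f}")
```

In the paper's setting, `possible` and `impossible` would instead be hidden-state matrices collected at a single layer for contrastive sentence sets, and the projection test would be run on held-out sentences.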
Characterization of modal representation development across training and scale
The authors analyze how modal difference vectors develop systematically across model training steps, layer depth, and parameter count. They find that coarse-grained distinctions (e.g., inconceivable vs. other categories) emerge earlier than fine-grained distinctions (e.g., improbable vs. impossible).
[68] Matryoshka representation learning
[69] The geometry of hidden representations of large transformer models
[70] Understanding deep representation learning via layerwise feature compression and discrimination
[71] Code Representation Learning At Scale
[72] Deep high-resolution representation learning for visual recognition
[73] Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation
[74] Layer by Layer: Uncovering Hidden Representations in Language Models
[75] Multi-Scale Representation Learning on Proteins
[76] EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
[77] Similarity of Neural Network Representations Revisited
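One way to operationalize the developmental claim is to measure, per layer or per training checkpoint, how strongly a single linear direction separates two modal categories. The sketch below uses a pooled effect size on synthetic activations whose signal grows with "depth"; the setup and magnitudes are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def separation_score(acts_a, acts_b):
    """Effect size along the difference-of-means direction: larger values
    mean the modal distinction is more linearly decodable here."""
    v = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    v /= np.linalg.norm(v)
    pa, pb = acts_a @ v, acts_b @ v
    pooled_sd = np.sqrt((pa.var() + pb.var()) / 2)
    return float((pa.mean() - pb.mean()) / pooled_sd)

rng = np.random.default_rng(1)
d, n = 32, 100
scores = []
# Toy stand-in for three successive layers: the category signal strengthens
# with depth, mimicking a distinction that emerges in later layers.
for strength in (0.2, 0.8, 1.6):
    signal = np.zeros(d)
    signal[0] = strength
    acts_a = rng.normal(size=(n, d)) + signal
    acts_b = rng.normal(size=(n, d)) - signal
    scores.append(separation_score(acts_a, acts_b))
print([round(s, 2) for s in scores])
```

Tracking the same score across training checkpoints instead of layers gives the training-step version of the analysis; under the paper's finding, coarse distinctions (inconceivable vs. the rest) would reach high scores earlier than fine ones (improbable vs. impossible).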
Feature space for modeling human categorization behavior
The authors demonstrate that projections of sentences onto modal difference vectors create a feature space that accurately models human participants' graded categorization judgments. This feature space outperforms baseline methods in predicting human response distributions and entropy patterns.
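A minimal sketch of the feature-space idea: project a sentence's hidden state onto each modal difference vector, turn the k projections into a graded category distribution with a softmax, and compare entropies. Everything here (the random vectors, the two example activations, the temperature) is an illustrative assumption rather than the paper's fitted model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predicted_distribution(act, V, temperature=1.0):
    """Graded distribution over modal categories from the projections of
    one activation onto the k modal difference vectors (columns of V)."""
    return softmax(act @ V / temperature)

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

rng = np.random.default_rng(2)
d, k = 24, 4                      # hidden size, number of modal categories
V = rng.normal(size=(d, k))
V /= np.linalg.norm(V, axis=0)    # unit-norm difference vectors

# A clear-cut sentence aligns with one vector; a borderline sentence
# projects weakly onto all of them, yielding a flatter distribution.
clear = 3.0 * V[:, 0] + 0.1 * rng.normal(size=d)
borderline = 0.2 * V.sum(axis=1) + 0.1 * rng.normal(size=d)

p_clear = predicted_distribution(clear, V)
p_border = predicted_distribution(borderline, V)
print("clear entropy:", round(entropy(p_clear), 2),
      "borderline entropy:", round(entropy(p_border), 2))
```

On real data, the per-sentence predicted distributions would be scored against human response distributions; higher predicted entropy on exactly the sentences where human raters disagree is the pattern this contribution claims to capture.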