Uncovering the computational ingredients that support human-like conceptual representations in large language models

ICLR 2026 Conference Submission
Anonymous Authors
cognitive science, transformers, large language models, human-AI alignment, human-centered AI, benchmarking, cognitive benchmarking
Abstract:

The ability to translate diverse patterns of input into structured patterns of behavior is thought to rest on both humans' and machines' ability to learn robust representations of relevant concepts. The rapid advancement of transformer-based large language models (LLMs) has produced a diversity of computational ingredients (architectures, fine-tuning methods, and training datasets, among others), but it remains unclear which of these ingredients are most crucial for building models that develop human-like representations. Further, most current LLM benchmarks are not suited to measuring representational alignment between humans and models, making existing benchmark scores unreliable for assessing whether current LLMs are making progress towards becoming useful cognitive models. Here, we address these limitations by first evaluating a set of over 70 models that vary widely in their computational ingredients on a triplet similarity task, a method well established in the cognitive sciences for measuring human conceptual representations, using concepts from the THINGS database. Comparing human and model representations, we find that models that undergo instruction fine-tuning and that have larger attention-head dimensionality are among the most human-aligned. We also find that factors such as choice of activation function, multimodal pretraining, and parameter count have limited bearing on alignment. Correlations between alignment scores and scores on existing benchmarks reveal that while some benchmarks (e.g., MMLU) are better suited than others (e.g., MUSR) for capturing representational alignment, no existing benchmark fully accounts for the variance in alignment scores, demonstrating their insufficiency for capturing human-AI alignment. Taken together, our findings highlight the computational ingredients most essential for advancing LLMs towards models of human conceptual representation and address a key benchmarking gap in LLM evaluation.
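To make the evaluation setup concrete, the sketch below illustrates the triplet (odd-one-out) task as it might be posed to an LLM and scored against human responses. This is a minimal illustration only; the prompt wording, concept names, and scoring rule are assumptions, not the authors' exact protocol.

```python
# Minimal sketch of a triplet odd-one-out query and a simple alignment score.
# Prompt wording, concepts, and scoring are illustrative assumptions.

def make_triplet_prompt(a: str, b: str, c: str) -> str:
    """Ask which of three THINGS-style concepts is least like the other two."""
    return (
        f"Consider the three objects: {a}, {b}, {c}.\n"
        "Which one is the odd one out, i.e., least similar to the other two? "
        "Answer with exactly one of the three words."
    )

def alignment_accuracy(model_choices, human_choices) -> float:
    """Fraction of triplets where the model picks the same odd-one-out as humans."""
    matches = sum(m == h for m, h in zip(model_choices, human_choices))
    return matches / len(human_choices)

print(make_triplet_prompt("aardvark", "abacus", "accordion"))
print(alignment_accuracy(["abacus", "dog"], ["abacus", "cat"]))  # -> 0.5
```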

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper systematically evaluates over 70 language models on triplet similarity judgments using concepts from the THINGS database, examining which computational ingredients (architecture, instruction-finetuning, training data) predict human-LLM representational alignment. It resides in the Similarity-Based Alignment Metrics leaf, which contains four papers total. This leaf sits within the broader Alignment Assessment Methodologies branch, indicating a moderately populated research direction focused on quantifying human-model correspondence through distance-based and similarity measures rather than behavioral or neural approaches.

The taxonomy reveals several neighboring methodological branches: Abstraction and Relational Alignment (2 papers) uses graph-based structural representations, while Cross-Linguistic and Cross-Cultural Alignment (4 papers) examines consistency across languages. The parent branch Alignment Assessment Methodologies excludes studies of emergent representations without measurement (those belong under Conceptual Representation Emergence) and behavioral comparisons (Behavioral Alignment). The paper's focus on triplet tasks positions it squarely within similarity-based methods, distinct from the brain-based approaches in Neural and Brain-Based Alignment (5 papers) or the multimodal studies in Multimodal Conceptual Alignment (3 papers).

Among the 27 candidates examined across the three contributions, none were identified as clearly refuting the work. The systematic evaluation of computational ingredients was checked against 10 candidates (0 refutable), the model-fair comparison methodology against 7 candidates (0 refutable), and the benchmark-alignment relationship analysis against 10 candidates (0 refutable). Within this limited search scope, the absence of refuting candidates suggests that the specific combination of large-scale model comparison (70+ models), triplet similarity methodology, and computational ingredient analysis may be a relatively underexplored configuration within the similarity-based alignment literature, though the small candidate pool prevents definitive conclusions about novelty.

The analysis covers top-K semantic matches and citation expansion within a 27-paper scope, not an exhaustive field survey. The absence of refutable candidates may reflect either genuine novelty in the specific methodological combination or limitations in search coverage. The taxonomy context suggests the paper contributes to an active but not overcrowded research direction, with the Similarity-Based Alignment Metrics leaf representing one of several complementary approaches to measuring human-model representational correspondence.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: representational alignment between language models and human conceptual knowledge. The field examines how internal representations in language models correspond to human conceptual structures, spanning multiple methodological and theoretical branches. Alignment Assessment Methodologies develop techniques to measure correspondence through similarity metrics, behavioral probing, and psycholinguistic benchmarks, while Conceptual Representation Emergence and Structure investigates how abstract concepts arise during training. Multimodal Conceptual Alignment extends these questions to vision-language systems, and Neural and Brain-Based Alignment compares model activations directly to neural recordings. Grounding and Embodied Semantics explores whether models capture perceptual and physical aspects of meaning, contrasting with purely distributional approaches. Value and Preference Alignment addresses normative concepts and ethical reasoning, and Theoretical Frameworks propose dual-level or hybrid architectures to explain representational capacities. Applied branches examine domain-specific contexts such as medical or legal reasoning, while Compression-Meaning Trade-offs study how efficiency constraints shape conceptual fidelity.

Within Alignment Assessment Methodologies, a particularly active line of work develops similarity-based metrics that quantify structural correspondence between model and human representations. Computational Ingredients Conceptual[0] contributes to this effort by proposing new computational measures for assessing conceptual alignment, situated among studies that use representational similarity analysis and geometric comparisons. Nearby works such as Behavioral Alignment Measurement[31] emphasize task-based probing to validate alignment claims, while Design Similarity Alignment[35] explores how alignment metrics can inform model design choices.

A central tension across these branches concerns whether similarity in representational geometry suffices to demonstrate genuine conceptual understanding, or whether behavioral and grounding criteria are necessary. Some studies like Dissociating Language Thought[3] argue for dissociations between linguistic competence and deeper conceptual knowledge, raising questions about what alignment metrics actually capture and how they relate to human-like reasoning.

Claimed Contributions

Systematic evaluation of computational ingredients predicting human-LLM representational alignment

The authors systematically evaluate 77+ language models varying in architecture, fine-tuning methods, training data, and other computational ingredients using a triplet similarity task with concepts from the THINGS database. They identify which ingredients (e.g., instruction fine-tuning, attention head dimensionality) most strongly predict alignment between model and human conceptual representations.

10 retrieved papers
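As a rough illustration of this ingredient analysis, one can regress per-model alignment scores on model metadata. The sketch below uses scikit-learn with entirely hypothetical feature values and scores; the paper's actual predictors, model set, and statistical analysis may differ.

```python
# Toy sketch: relate per-model alignment scores to computational ingredients.
# All feature values and alignment scores are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [instruction_finetuned (0/1), attention_head_dim, log10(n_params), multimodal (0/1)]
X = np.array([
    [1, 128,  9.8, 0],
    [0,  64,  9.3, 0],
    [1, 256, 10.5, 1],
    [0, 128, 10.0, 0],
    [1,  64,  9.5, 0],
    [0, 256, 10.3, 1],
])
y = np.array([0.61, 0.48, 0.66, 0.52, 0.58, 0.50])  # hypothetical alignment scores

reg = LinearRegression().fit(X, y)
for name, coef in zip(
    ["instruction_finetuned", "head_dim", "log_params", "multimodal"], reg.coef_
):
    print(f"{name}: {coef:+.4f}")
print("R^2 on these toy points:", reg.score(X, y))
```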
Model-fair comparison methodology using triadic similarity judgments

The authors develop a model-fair ("species-fair") comparison approach by administering the same triadic similarity judgment task to both models and humans and then deriving semantic embeddings using analogous methods. This ensures that discrepancies in alignment are not attributable to differences in embedding methods or to unfair comparisons across model families.

7 retrieved papers
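A minimal sketch of the model-fair idea: derive a pairwise similarity structure for humans and for a model from the same triplet judgments, then compare the two structures. The co-selection counting below is a simplified stand-in for the embedding method the paper actually uses, and all judgments shown are hypothetical.

```python
# Build pairwise similarity matrices from the same triplet judgments for humans
# and a model, then correlate their structure. Counting co-selections is a
# simplified stand-in for the paper's embedding procedure.
import numpy as np
from scipy.stats import spearmanr

def similarity_from_triplets(triplets, odd_one_out, n_items):
    """Each triplet (i, j, k) with a chosen odd-one-out implies the remaining
    two items were judged more similar; count those co-selections."""
    sim = np.zeros((n_items, n_items))
    for (i, j, k), odd in zip(triplets, odd_one_out):
        pair = [x for x in (i, j, k) if x != odd]
        sim[pair[0], pair[1]] += 1
        sim[pair[1], pair[0]] += 1
    return sim

triplets = [(0, 1, 2), (0, 2, 3), (1, 2, 3), (0, 1, 3)]
human_odd = [2, 3, 3, 1]   # hypothetical human majority choices
model_odd = [2, 3, 1, 1]   # hypothetical model choices

h = similarity_from_triplets(triplets, human_odd, 4)
m = similarity_from_triplets(triplets, model_odd, 4)
iu = np.triu_indices(4, k=1)  # compare only the upper-triangular entries
rho, _ = spearmanr(h[iu], m[iu])
print(f"Spearman alignment between human and model similarity structure: {rho:.2f}")
```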
Analysis of alignment-benchmark relationships revealing benchmarking gaps

The authors demonstrate that existing LLM benchmarks (e.g., BigBenchHard, MMLU) correlate with representational alignment to varying degrees, but none fully captures alignment variance. This reveals a key gap in current LLM evaluation practices and highlights the insufficiency of standard benchmarks for measuring human-AI alignment.

10 retrieved papers
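The benchmark-alignment analysis can be sketched as a correlation between per-model alignment scores and per-model benchmark scores, checking how much alignment variance each benchmark explains. The numbers below are hypothetical and only illustrate the computation, not the paper's results.

```python
# Toy sketch: correlate alignment scores with existing benchmark scores.
# All scores are hypothetical; only the computation is illustrative.
import numpy as np
from scipy.stats import pearsonr

alignment = np.array([0.61, 0.48, 0.66, 0.52, 0.58, 0.45])
benchmarks = {
    "MMLU":         np.array([0.70, 0.55, 0.78, 0.60, 0.66, 0.50]),
    "BigBenchHard": np.array([0.52, 0.40, 0.60, 0.49, 0.55, 0.38]),
    "MUSR":         np.array([0.41, 0.44, 0.39, 0.47, 0.36, 0.43]),
}

for name, scores in benchmarks.items():
    r, p = pearsonr(scores, alignment)
    print(f"{name}: r = {r:+.2f}, variance explained (r^2) = {r**2:.2f}, p = {p:.3f}")
```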

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
Systematic evaluation of computational ingredients predicting human-LLM representational alignment

Contribution
Model-fair comparison methodology using triadic similarity judgments

Contribution
Analysis of alignment-benchmark relationships revealing benchmarking gaps
