Fig. 2 (IMAGE)
Caption
Example comparisons between human and LLM ratings. Each scatterplot compares ratings from humans and an LLM for a given psychological feature. Points closer to the diagonal line (from bottom left to top right) indicate stronger agreement between humans and the LLM. For Concreteness, ratings from humans and the LLM generally show high agreement. In contrast, for Iconicity (the degree to which a word’s sound resembles its meaning), the rating patterns differ substantially. Notably, even for Concreteness, which shows high overall agreement, human ratings vary widely for function words such as prepositions and conjunctions, whereas the LLM consistently assigns them low concreteness values. This highlights systematic differences in how humans and AI “perceive” certain types of words.
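The caption's "closer to the diagonal" has a simple geometric meaning. Take a point (h, m), where h is a word's human rating and m is the LLM's rating. Its perpendicular distance from the line y = x is |h − m| / √2, so words on which the two sources disagree sit far from the line.

The Python sketch below makes this concrete. It assumes both sets of ratings are on a common scale, and it uses made-up values for a handful of words, not data from the study. It also computes Pearson's r as one common summary of overall agreement. That is an illustrative choice and may not match the measures the authors actually used. The helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def distance_from_diagonal(h, m):
    """Perpendicular distance of the point (h, m) from the line y = x."""
    return abs(h - m) / math.sqrt(2)


def largest_disagreements(human, llm, k):
    """Return the k words whose points lie farthest from the diagonal."""
    words = sorted(
        human,
        key=lambda w: distance_from_diagonal(human[w], llm[w]),
        reverse=True,
    )
    return words[:k]


# Hypothetical concreteness ratings, for illustration only (not study data).
# The function words "and" and "of" mimic the pattern the caption describes:
# the human value sits mid-scale while the LLM value is uniformly low.
human = {"apple": 5.0, "dog": 4.8, "idea": 1.6, "and": 2.5, "of": 3.0}
llm = {"apple": 4.9, "dog": 4.9, "idea": 1.5, "and": 1.0, "of": 1.0}

if __name__ == "__main__":
    words = list(human)
    r = pearson_r([human[w] for w in words], [llm[w] for w in words])
    print(f"Pearson r = {r:.3f}")
    for w in largest_disagreements(human, llm, 2):
        print(w, round(distance_from_diagonal(human[w], llm[w]), 3))
```

The two measures can disagree with each other. A high correlation can coexist with a few points far from the diagonal, which is the situation the caption describes for function words under Concreteness.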
Credit
2025, Hiromichi Hagihara et al., How well do large language models mirror human cognition of word concepts?: A comparison of psychological ratings for early-acquired English words, Behavior Research Methods (Publisher: Springer Nature)
Usage Restrictions
Credit must be given to the creator.
License
CC BY