AIDB Daily Papers

Evaluating the Developmental Cognition of LLMs: A New Method Applying Developmental Psychology Theory

Original title: Evaluating Developmental Cognition Capabilities of LLMs

Authors: Xiao Xiao, Hayoun Noh, Mar Gonzalez-Franco

Published: 2026-05-08 | Fields: LLM, cognition, evaluation, psychology, cs.AI

Note: The Japanese title and key points were generated automatically by AI. Consult the original paper for accurate details.

Key Points

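This study introduces the Developmental Sentence Completion Test (DSCT), a 20-item instrument grounded in Robert Kegan's constructive-developmental theory, for evaluating how well large language models (LLMs) can read developmental-stage signal from text.
As a scalable alternative to expert interviews, DSCT aims to extract stage-like signal from three kinds of elicited responses: simulated personas, real human respondents, and the LLMs' own default answers.
In experiments, top frontier LLMs recovered the simulator-intended stage labels from simulated responses with high accuracy, but on real human responses their agreement with human ratings was only fair. When answering without a persona, model families showed stable stage-like differences in their responses.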

Abstract

Conversational AI is increasingly personalized around users' preferences, histories, goals, and knowledge, but much less around how users interpret and take up model outputs to construct and understand their reality. We draw on Robert Kegan's constructive-developmental theory as a complementary lens on this dimension. Existing methods for assessing developmental stage in the Keganian tradition rely either on expert interviews that do not scale or on sentence-completion instruments that are proprietary, lengthy, or invasive. To make this perspective tractable for LLM evaluation, we introduce the Developmental Sentence Completion Test (DSCT), a 20-item instrument designed to elicit developmental signal in self-administered text. Throughout, we treat the resulting labels as characterizations of stage-like structure in elicited responses, not as validated person-level developmental stage. We then ask how much of that signal can be recovered by LLMs across three elicited response regimes: simulated personas, real human respondents, and default model-generated answers. On simulated personas, top frontier models recover simulator-intended labels with high accuracy. On real human DSCT responses, human-LLM agreement is fair, with much stronger within-neighborhood than exact agreement. Finally, when LLMs answer DSCT prompts without persona-conditioning, their responses exhibit stable stage-like differences across model families, with larger and newer models tending to generate higher-rated text. These results suggest that stage-conditioned signal is cleaner in synthetic responses than in human-written DSCT text, and that the core constraint for stage-aware conversational AI is not classifier accuracy alone, but the availability of developmental signal from elicited text.
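The abstract contrasts exact agreement with the much higher within-neighborhood agreement on real human responses. A minimal Python sketch of that distinction follows, assuming stage labels have been mapped to integer positions on an ordered scale and that a label's "neighborhood" means positions within one step (the radius parameter). The function name, the integer encoding, and the radius are hypothetical illustrations, not the paper's scoring procedure, and the agreement statistic the paper uses to call agreement "fair" is not specified here.

```python
from typing import Sequence, Tuple


def agreement_rates(
    rater_a: Sequence[int], rater_b: Sequence[int], radius: int = 1
) -> Tuple[float, float]:
    """Return (exact, within-neighborhood) agreement for two ordinal label lists.

    Hypothetical encoding: each label is an integer position on an ordered
    developmental-stage scale, so |a - b| counts scale steps apart.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(rater_a, rater_b))
    # Exact agreement: both raters chose the same scale position.
    exact = sum(a == b for a, b in pairs) / len(pairs)
    # Neighborhood agreement: positions differ by at most `radius` steps.
    near = sum(abs(a - b) <= radius for a, b in pairs) / len(pairs)
    return exact, near


if __name__ == "__main__":
    # Toy labels invented for illustration; not data from the paper.
    human = [2, 3, 3, 4, 5, 4]
    model = [3, 3, 4, 4, 5, 2]
    exact, near = agreement_rates(human, model)
    print(f"exact={exact:.2f} within-1={near:.2f}")
```

With these toy labels, 3 of 6 items match exactly while 5 of 6 fall within one step, which shows how a rater can be "close" far more often than "right" on an ordinal stage scale. Setting radius=0 makes the two rates coincide.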


Metadata

arXiv ID: 2605.08549
Category: cs.AI
