AIDB Daily Papers

Are Theory of Mind and Self-Attribution of Mentality Dissociable in LLMs?

Original title: Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

Authors: Junsol Kim, Winnie Street, Roberta Rocca, Daine M. Korngiebel, Adam Waytz, James Evans, Geoff Keeling

Published: 2026-03-30 | Fields: LLM, NLP, Safety, Interpretability, Fine-tuning, Cognitive Psychology, Ethics

Note: The translated title and key points were generated automatically by AI. Please consult the original paper for accurate details.

Key Points

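Safety fine-tuning of LLMs aims to suppress potentially harmful mind-attributions, such as a model claiming to be conscious or to experience emotions.
The study tests whether suppressing these tendencies degrades closely related socio-cognitive abilities such as Theory of Mind (ToM), and finds that attributions of mind to the model itself and to technological artefacts are behaviorally and mechanistically dissociable from ToM.
However, safety fine-tuned models attribute less mind to non-human animals than human baselines do and are less likely to express spiritual belief.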

Abstract

Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of mind-attribution such as models asserting their own consciousness or claiming to experience emotions. We investigate whether suppressing mind-attribution tendencies degrades intimately related socio-cognitive abilities such as Theory of Mind (ToM). Through safety ablation and mechanistic analyses of representational similarity, we demonstrate that LLM attributions of mind to themselves and to technological artefacts are behaviorally and mechanistically dissociable from ToM capabilities. Nevertheless, safety fine-tuned models under-attribute mind to non-human animals relative to human baselines and are less likely to exhibit spiritual belief, suppressing widely shared perspectives regarding the distribution and nature of non-human minds.

Metadata

arXiv ID: 2603.28925
Categories: cs.CL, cs.AI