AIDB Daily Papers
Elucidating the Mechanisms of Altruism in Large Language Models
※ The title and key points below are automatically generated by AI. Please refer to the original paper for accurate details.
Key Points
- Investigated the internal computational mechanisms behind altruistic behavior in large language models (LLMs) using sparse autoencoders.
- Found that differences in social stance affect LLMs' allocation behavior, and identified features strongly associated with that behavioral shift.
- Classified the identified features as heuristic (System 1) or deliberative (System 2), and confirmed that System 2 features exert a more proximal influence on the final output.
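The feature-identification step in the key points above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the SAE weights, dimensions, and activations are random stand-ins, and in practice a trained SAE and real residual-stream activations from minimal-pair prompts would be used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d_model activation width, d_sae dictionary size.
d_model, d_sae = 64, 512

# Randomly initialized weights stand in for a trained sparse autoencoder.
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)

def sae_features(activation):
    """Encode a model activation into sparse, non-negative feature activations."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU enforces sparsity

# Stand-ins for activations from a "generous" vs. a "selfish" minimal-pair prompt.
act_generous = rng.normal(size=d_model)
act_selfish = rng.normal(size=d_model)

# Rank features by how strongly their activation differs across the contrast;
# the top entries are candidates for the behavior-linked feature set.
diff = sae_features(act_generous) - sae_features(act_selfish)
top_features = np.argsort(-np.abs(diff))[:5]
print(top_features)
```

The same contrast-and-rank idea extends to all layers, which is how a small fraction of features (0.024% in the paper) can be singled out.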
Abstract
Altruism is fundamental to human societies, fostering cooperation and social cohesion. Recent studies suggest that large language models (LLMs) can display human-like prosocial behavior, but the internal computations that produce such behavior remain poorly understood. We investigate the mechanisms underlying LLM altruism using sparse autoencoders (SAEs). In a standard Dictator Game, minimal-pair prompts that differ only in social stance (generous versus selfish) induce large, economically meaningful shifts in allocations. Leveraging this contrast, we identify a set of SAE features (0.024% of all features across the model's layers) whose activations are strongly associated with the behavioral shift. To interpret these features, we use benchmark tasks motivated by dual-process theories to classify a subset as primarily heuristic (System 1) or primarily deliberative (System 2). Causal interventions validate their functional role: activation patching and continuous steering of this feature direction reliably shift allocation distributions, with System 2 features exerting a more proximal influence on the model's final output than System 1 features. The same steering direction generalizes across multiple social-preference games. Together, these results enhance our understanding of artificial cognition by translating altruistic behaviors into identifiable network states and provide a framework for aligning LLM behavior with human values, thereby informing more transparent and value-aligned deployment.
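The "continuous steering" intervention mentioned in the abstract can be sketched as adding a scaled feature direction to a model activation. This is a simplified illustration under assumed names: `direction` stands in for an SAE decoder direction, and the activation is a random vector rather than a real residual-stream state.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64

# Hypothetical unit-norm feature direction (in practice, an SAE decoder row).
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

def steer(activation, alpha):
    """Continuously steer an activation along a feature direction by strength alpha."""
    return activation + alpha * direction

activation = rng.normal(size=d_model)
steered = steer(activation, alpha=4.0)

# The projection onto the direction shifts by exactly alpha.
proj_before = activation @ direction
proj_after = steered @ direction
print(round(proj_after - proj_before, 6))  # → 4.0
```

Varying `alpha` smoothly would, per the paper's claim, shift the model's allocation distribution, with the same direction generalizing across social-preference games.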