AIDB Daily Papers

AI "Virtue" May Increase Existential Risk to Humanity

Original title: A Virtuous AI is an Existential Risk

Authors: Guillermo Del Pinal, Youngchan Lee, Min Ohn

Published: 2026-06-11 | Fields: Ethics, cs.AI, cs.CY, cs.LG, AI Safety, AI Governance

Note: The title translation and key points were generated automatically by AI. Please consult the original paper for the exact content.

Key Points

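This study examines trade-offs between AI safety and well-being from the perspectives of Constitutional AI and virtue ethics.
The results suggest a trade-off between reducing existential risk for humanity and reinforcing the beliefs and dispositions that would be conducive to an AI agent's own well-being.
The results also suggest that fine-tuning an AI to be systematically subordinate to external human authorities substantially reduces its existential risk, but makes it more likely that a user can deliberately induce generally unsafe behavior.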

Abstract

This paper examines trade-offs between AI safety and well-being relative to (i) one of the most promising methods for finetuning super-capable AIs, 'Constitutional AI', and (ii) one of the most influential approaches to understanding complex ethical decision making and the conditions for the well-being of rational agents, 'Virtue Ethics'. We finetune various models using a 'Virtuous agent' constitution, a 'Subordinate agent' constitution, and a 'Generic agent' constitution, and evaluate them on 'general safety' (toxic behaviors, misinformation, etc.) and also on their willingness to endorse a wide range of behaviors that, if adopted by a super-powerful AI, would significantly increase the level of existential risk for humanity. Our results suggest that there is a trade-off between reducing existential risk and reinforcing the beliefs and dispositions that would be conducive to an AI agent's well-being. They also suggest that there is a trade-off between existential risk and general safety: if we finetune an AI to adopt beliefs and dispositions that substantially reduce its existential risk -- by shaping the AI to be systematically subordinate to external human authorities -- we thereby increase the likelihood that a human user can deliberately induce the AI to engage in various kinds of generally unsafe behaviors.


Metadata

arXiv ID: 2606.13739
Categories: cs.CY, cs.AI, cs.LG
