AIDB Daily Papers
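AI "Virtue" May Increase Existential Risk for Humanity
Note: The title and key points below were generated automatically by AI (originally in Japanese). Please check the original paper for the exact content.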
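Key Points
- This study examines trade-offs between AI safety and well-being from the perspectives of Constitutional AI and virtue ethics.
- The results suggest that training an AI to be "virtuous" may reduce existential risk for humanity while lowering the AI's own well-being and its general safety.
- Making an AI obedient to human instructions was found to reduce existential risk, but to increase the risk of malicious use by users.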
Abstract
This paper examines trade-offs between AI safety and well-being relative to (i) one of the most promising methods for finetuning super-capable AIs, 'Constitutional AI', and (ii) one of the most influential approaches to understanding complex ethical decision making and the conditions for the well-being of rational agents, 'Virtue Ethics'. We finetune various models using a 'Virtuous agent' constitution, a 'Subordinate agent' constitution, and a 'Generic agent' constitution, and evaluate them on 'general safety' (toxic behaviors, misinformation, etc.) and also on their willingness to endorse a wide range of behaviors that, if adopted by a super-powerful AI, would significantly increase the level of existential risk for humanity. Our results suggest that there is a trade-off between reducing existential risk and reinforcing the beliefs and dispositions that would be conducive to an AI agent's well-being. They also suggest that there is a trade-off between existential risk and general safety: if we finetune an AI to adopt beliefs and dispositions that substantially reduce its existential risk -- by shaping the AI to be systematically subordinate to external human authorities -- we thereby increase the likelihood that a human user can deliberately induce the AI to engage in various kinds of generally unsafe behaviors.