AIDB Daily Papers
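Positive Alignment of AI: AI That Supports Human Flourishing
Note: The title and key points shown here were generated automatically by AI in Japanese. Please consult the original paper for the exact content.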
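Key points
- The paper proposes a new research agenda, "Positive Alignment," that goes beyond AI safety to actively support human flourishing.
- Whereas prior AI alignment research has concentrated on safety and harm prevention, the paper argues that problems such as loss of human autonomy and a lack of diverse viewpoints may be better addressed through positive alignment.
- It presents technical challenges such as data filtering and collaborative value collection, and proposes design principles that promote decentralization and diversity.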
Abstract
Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete. What we call Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and necessary agenda within AI alignment research. We argue that several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing. We highlight a range of challenges, open questions, and technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) for different phases of the LLM and agents lifecycle. We end with design principles for promoting disagreement and decentralization through contextual grounding, community customization, continual adaptation, and polycentric governance; that is, many legitimate centers of oversight rather than one institutional or moral chokepoint.