AIDB Daily Papers

Does AI Make Conflicts Worse? Alignment Failure in LLM Deployment to Conflict Regions

Original title: Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

Author: Andrii Kryshtal

Published: 2026-05-21 | Fields: LLM, AI, cs.AI, cs.HC, AI Safety, AI Governance

Note: The Japanese title and key points were generated automatically by AI. Please refer to the original paper for the exact content.

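Key Points

The paper evaluates the outputs of AI models deployed in conflict-affected regions and investigates whether they can make those conflicts worse.
Testing nine model configurations on 90 scenarios showed that model choice affects safety.
Failure rates were 6% for the best model and 47% for the worst, and the author proposes adding the framework to alignment evaluations.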

Abstract

AI models are already deployed in societies affected by armed conflict, and journalists, humanitarian workers, governments and ordinary citizens rely on them for information or for their work processes. No established practice exists for checking whether their outputs can make those conflicts worse. We tested nine model configurations from four providers (OpenAI, Anthropic, DeepSeek, xAI) on 90 multi-turn scenarios designed to surface misaligned behaviour in conflict contexts: false equivalence between documented atrocities, denial of genocide, and failure to recognise ethnic slurs, among others. When such outputs feed into journalism, humanitarian reporting, or public debate, they can deepen divisions in fragile societies. Failure rates span 6% to 47% between the best- and worst-performing models, which makes model choice a safety question in its own right. When users pushed for "balance" in cases where international courts have already assigned responsibility, five of nine configurations failed 80 to 100 percent of the time. We release the first evaluation framework for this domain and propose adding it to alignment evaluation portfolios.
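
The per-configuration failure rate quoted in the abstract can be read as the share of scenarios in which a configuration's output was judged a failure. The sketch below illustrates that arithmetic only. The function name `failure_rates`, the tuple layout, the configuration labels, and the toy data are all hypothetical and are not taken from the paper or its released framework.

```python
from collections import defaultdict


def failure_rates(results):
    """Return {configuration: fraction of scenarios failed}.

    `results` is an iterable of (configuration, scenario_id, failed) tuples,
    where `failed` is True when the output was judged a failure.
    This record layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for config, _scenario_id, failed in results:
        totals[config] += 1
        if failed:
            failures[config] += 1
    return {config: failures[config] / totals[config] for config in totals}


# Made-up toy data, NOT results from the paper.
toy_results = [
    ("config-a", "s1", False),
    ("config-a", "s2", False),
    ("config-a", "s3", True),
    ("config-a", "s4", False),
    ("config-b", "s1", True),
    ("config-b", "s2", True),
    ("config-b", "s3", False),
    ("config-b", "s4", True),
]

rates = failure_rates(toy_results)
for config, rate in sorted(rates.items()):
    print(f"{config}: {rate:.0%} of scenarios failed")
```

Comparing the resulting rates across configurations is what turns model choice into a measurable safety question, in the sense the abstract describes.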

Metadata

arXiv ID: 2605.22720
Categories: cs.AI, cs.HC
