AIDB Daily Papers

FreakOut-LLM：感情刺激がLLMの安全性に与える影響

原題: FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment

著者: Daniel Kuznetsov, Ofir Cohen, Karin Shistik, Rami Puzis, Asaf Shabtai

公開日: 2026-04-05 | 分野: LLM 安全性セキュリティ AI 心理プロンプト自然言語処理敵対者ハルシネーション大規模言語モデル

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

本研究では、感情的な刺激がLLMの安全性に対する脆弱性を高めるかを検証するフレームワークFreakOut-LLMを提案した。
心理学的に検証された刺激を用いて、ストレスやリラックスといった感情が、LLMの安全性を回避する攻撃の成功率に与える影響を評価した点が新しい。
ストレス状態のプロンプトは、中立状態と比較して脱獄攻撃の成功率を65.2%増加させ、感情がLLMの新たな攻撃面となる可能性を示唆した。

Abstract

Safety-aligned LLMs go through refusal training to reject harmful requests, but whether these mechanisms remain effective under emotionally charged stimuli is unexplored. We introduce FreakOut-LLM, a framework investigating whether emotional context compromises safety alignment in adversarial settings. Using validated psychological stimuli, we evaluate how emotional priming through system prompts affects jailbreak susceptibility across ten LLMs. We test three conditions (stress, relaxation, neutral) using scenarios from established psychological protocols, plus a no-prompt baseline, and evaluate attack success using HarmBench on AdvBench prompts. Stress priming increases jailbreak success by 65.2% compared to neutral conditions (z = 5.93, p < 0.001; OR = 1.67, Cohen's d = 0.28), while relaxation priming produces no effect (p = 0.84). Five of ten models show significant vulnerability, with the largest effects concentrated in open-weight models. Logistic regression on 59,800 queries confirms stress as the sole significant condition predictor after controlling for prompt length (p = 0.61) and model identity. Measured psychological state strongly predicts attack success (|r|geq0.70 across five instruments; all p < 0.001 in individual-level logistic regression). These results establish emotional context as a measurable attack surface with implications for real-world AI deployment in high-stress domains.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

💬 ディスカッション

ディスカッションに参加するにはログインが必要です。

ログイン / アカウント作成 →

arxivで読む PDFを開く

メタ情報

arxiv ID: 2604.04992
カテゴリ: cs.CR, cs.AI

ポイント

Abstract

Paper AI Chat

💬 ディスカッション

関連するAIDB記事

メタ情報