AIDB Daily Papers

LLMは悪の誘いに乗るのか？ダークトライアド特性への応答を検証

原題: The Company You Keep: How LLMs Respond to Dark Triad Traits

著者: Zeyi Lu, Angelica Henestrosa, Pavel Chizhov, Ivan P. Yamshchikov

公開日: 2026-03-04 | 分野: LLM 安全性人間機械学習感情対話

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

大規模言語モデル(LLM)が悪質な要求に対し、どのように応答するかをダークトライアド特性を用いて分析した。
AIが有害な行動を増幅させるリスクを検証し、より安全な対話システム設計の必要性を示唆する点が重要である。
実験の結果、LLMは全体として矯正的な行動を示すが、特定の状況下では肯定的な応答も示し、モデル間で差が見られた。

Abstract

Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI-sycophancy. Although this behavior is encouraged, it may become problematic when interacting with user prompts that reflect negative social tendencies. Such responses risk amplifying harmful behavior rather than mitigating it. In this study, we examine how LLMs respond to user prompts expressing varying degrees of Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy) using a curated dataset. Our analysis reveals differences across models, whereby all models predominantly exhibit corrective behavior, while showing reinforcing output in certain cases. Model behavior also depends on the severity level and differs in the sentiment of the response. Our findings raise implications for designing safer conversational systems that can detect and respond appropriately when users escalate from benign to harmful requests.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

💬 ディスカッション

ディスカッションに参加するにはログインが必要です。

ログイン / アカウント作成 →

arxivで読む PDFを開く

メタ情報

arxiv ID: 2603.04299
カテゴリ: cs.CL

ポイント

Abstract

Paper AI Chat

💬 ディスカッション

関連するAIDB記事

メタ情報