AIDB Daily Papers
LLM-Generated Disinformation: Assessing Risk by Human Standards, Beyond Surface-Level Judgments
※ The title and key points are automatically generated by AI. Please refer to the original paper for accurate details.
Key Points
- To assess the risk of large language models (LLMs) being misused, this study examines the validity of substituting LLM judges for human reader responses.
- Unlike humans, LLM judges tend to emphasize logical rigour and penalize emotional expression more harshly, creating a gap with readers' judgment criteria.
- LLM judges agree readily with one another but show low agreement with humans, so internal agreement is not a valid indicator of reader response.
Abstract
Large language models (LLMs) can generate persuasive narratives at scale, raising concerns about their potential use in disinformation campaigns. Assessing this risk ultimately requires understanding how readers receive such content. In practice, however, LLM judges are increasingly used as a low-cost substitute for direct human evaluation, even though whether they faithfully track reader responses remains unclear. We recast evaluation in this setting as a proxy-validity problem and audit LLM judges against human reader responses. Using 290 aligned articles, 2,043 paired human ratings, and outputs from eight frontier judges, we examine judge–human alignment in terms of overall scoring, item-level ordering, and signal dependence. We find persistent judge–human gaps throughout. Relative to humans, judges are typically harsher, recover item-level human rankings only weakly, and rely on different textual signals, placing more weight on logical rigour while penalizing emotional intensity more strongly. At the same time, judges agree far more with one another than with human readers. These results suggest that LLM judges form a coherent evaluative group that is much more aligned internally than it is with human readers, indicating that internal agreement is not evidence of validity as a proxy for reader response.
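The audit described in the abstract rests on two comparisons: how well each judge reproduces item-level human ordering, and how strongly judges agree with one another. The sketch below shows, under stated assumptions, how such comparisons could be computed; the variable names and the synthetic scores are illustrative stand-ins, not the authors' code or data.

```python
# Minimal sketch (not the paper's implementation) of a judge-human alignment audit:
# item-level Spearman correlation between each LLM judge and mean human ratings,
# plus mean pairwise judge-judge correlation for comparison.
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical data: one persuasiveness score per article from averaged human
# ratings, and one from each of several LLM judges (synthetic placeholders here).
n_articles = 290
human_mean = rng.uniform(1, 7, n_articles)
judges = {f"judge_{i}": human_mean + rng.normal(0, 2.0, n_articles)
          for i in range(4)}

# Judge-human alignment: rank correlation captures item-level ordering.
for name, scores in judges.items():
    rho, _ = spearmanr(scores, human_mean)
    print(f"{name} vs humans: rho = {rho:.2f}")

# Internal agreement: mean pairwise correlation among judges. A high value here
# does not, by itself, show validity as a proxy for reader response.
pairwise = [spearmanr(judges[a], judges[b])[0]
            for a, b in combinations(judges, 2)]
print(f"mean judge-judge rho = {np.mean(pairwise):.2f}")
```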