AIDB Daily Papers

Exposing "Inattentional Blindness" in LLMs: The MixRea Benchmark and Causal-Relation Completion Prompting

Original title: MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

Authors: Yuanqing Cai, Ziyi Huang, Minhao Liu, Lixin Duan, Wen Li, Yanru Zhang

Published: 2026-05-19 | Fields: LLM, NLP, AI, cs.CL, AI Safety, AI Evaluation

Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the exact content.

Key Points

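Inspired by "inattentional blindness" in human psychology, the authors test whether LLMs overlook important contextual cues when given explicit instructions.
They introduce MixRea, a benchmark of 2,246 questions, and evaluate 21 advanced LLMs; even the best-performing model reaches only 42.8% consistency, revealing widespread inattentional blindness.
They propose PRCP, a prompting method that recovers overlooked causal relations, and point to the need for model development that mitigates cognitive biases in LLMs.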

Abstract

Large language models (LLMs) are increasingly integrated into high-stakes decision-making. Inspired by the theory of inattentional blindness in human cognition, we investigate whether LLMs, trained on human-preferred corpora that embed attentional biases, exhibit a similar limitation: failing to attend to subtle yet important contextual cues under explicit task instructions. To evaluate this, we introduce the task of explicit-implicit reasoning and present MixRea, a benchmark of 2,246 multiple-choice questions across 9 reasoning types with varying distributions of explicit and implicit information. Evaluation of 21 advanced LLMs shows that even the best-performing reasoning model (Gemini 2.5 Pro) achieves only 42.8% consistency, revealing widespread inattentional blindness. To mitigate this, we propose Potential Relation Completion Prompting (PRCP), a prompting method that improves reasoning by recovering overlooked causal relations. Further analysis shows that this limitation persists across diverse multi-source reasoning tasks, highlighting the need for more cognitively aligned models.

Metadata

arXiv ID: 2605.20128
Category: cs.CL
