AIDB Daily Papers

Solving the Mystery of Gaze Guidance: Emergent Human-like Fixation Patterns in a Visual Language Model That Optimizes Scene Understanding

Original title: Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding

Authors: Shravan Murlidaran, Ziqi Wen, Sana Shehabi, Miguel P. Eckstein

Published: 2026-05-18 | Fields: Computer Vision, Cognition, Artificial Intelligence, cs.AI, cs.CV, Visual Agents

Note: The translated title and key points were generated automatically by AI. Please check the original paper for the exact content.

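Key Points

This study shows that a visual language model with simulated foveation, trained to optimize scene comprehension, exhibits emergent human-like fixation patterns.
The finding suggests that human free-viewing (eye movements made without a specific task) may be a functional byproduct of optimizing scene comprehension under the constraints of foveated vision.
Versions of the model trained to search or classify scenes, or equipped with peripheral vision better or worse than human vision, predicted human fixation patterns less accurately, which points to scene comprehension under human-like visual constraints as the key factor.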

Abstract

When humans view scenes without a specific task (free-viewing), they initially direct their eye movements toward the scene center and then fixate on people, text, objects being gazed at or grasped, and semantically meaningful regions. What these signature fixation patterns reflect and whether they optimize an underlying perceptual task remain unknown. We show that a computational agent with simulated foveation, trained to optimize scene comprehension, exhibits emergent human fixation signature patterns. In contrast, versions of the agent trained to search or classify scenes, or equipped with peripheral vision that was better or worse than human vision, predicted human fixation patterns less accurately. Thus, human free-viewing fixation patterns may emerge as a functional byproduct of optimizing scene comprehension under the biological constraints of foveated vision.

Metadata

arXiv ID: 2605.17823
Categories: cs.CV, cs.AI
