AIDB Daily Papers
Decoding Stories with Theory of Mind: StoryTR Transforms Video Moment Retrieval
※ The title and key points below are auto-generated by AI. Please refer to the original paper for accurate details.
Key Points
- Proposes StoryTR, a benchmark for narrative-centric video moment retrieval that requires Theory of Mind (ToM) reasoning.
- Existing models understand only surface-level events and fail to infer narrative causality or characters' intentions.
- A ToM-guided data generation pipeline trains the proposed model, showing that narrative reasoning capability matters more than parameter size.
Abstract
Current video moment retrieval excels at action-centric tasks but struggles with narrative content. Models can see *what is happening* but fail to reason *why it matters*. This semantic gap stems from the lack of **Theory of Mind (ToM)**: the cognitive ability to infer implicit intentions, mental states, and narrative causality from surface-level observations. We introduce **StoryTR**, the first video moment retrieval benchmark requiring ToM reasoning, comprising 8.1k samples from narrative short-form videos (shorts/reels). These videos present an ideal testbed. Their high information density encodes meaning through subtle multimodal cues. For instance, a glance paired with a sigh carries entirely different semantics than the glance alone. Yet multimodal perception alone is insufficient; ToM is required to decode that a character "smiling" may actually be "concealing hostility." To teach models this reasoning capability, we propose an **Agentic Data Pipeline** that generates training data with explicit three-tier ToM chains (intent decoding, narrative reasoning, boundary localization). Experiments reveal the severity of the reasoning gap: Gemini-3.0-Pro achieves only 0.53 Avg IoU on StoryTR. However, our 7B **Shorts-Moment** model, trained on ToM-guided data, improves +15.1% relative IoU over baselines, demonstrating that *narrative reasoning capability matters more than parameter scale*.
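The Avg IoU metric mentioned above is the standard score for moment retrieval: the temporal intersection-over-union between a predicted time span and the ground-truth span, averaged over the benchmark. The paper does not provide code, so the sketch below is only an illustration of how that metric is conventionally computed; the function and variable names are our own.

```python
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) spans in seconds.

    Illustrative only -- not code from the StoryTR paper.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def avg_iou(preds: list[tuple[float, float]],
            gts: list[tuple[float, float]]) -> float:
    """Mean temporal IoU over a set of (prediction, ground-truth) pairs."""
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Example: prediction overlaps half of the union with the ground truth.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # -> 0.5
```

On this scale, Gemini-3.0-Pro's reported 0.53 Avg IoU means its predicted moments overlap the annotated narrative moments only about half as much as a perfect localizer would.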