AIDB Daily Papers

SVAgent: Long Video Understanding via Storyline-Based Cross-Modal Multi-Agent Collaboration

Original title: SVAgent: Storyline-Guided Long Video Understanding via Cross-Modal Multi-Agent Collaboration

Authors: Zhongyu Yang, Zuhao Yang, Shuo Zhan, Tan Yue, Wei Pang, Yingfang Yuan

Published: 2026-04-06 | Fields: multimodal, computer vision, agents, video, story, question answering, natural language processing, deep learning, animation, VideoQA

Note: The Japanese title and key points were generated automatically by AI. Please check the original paper for the exact content.

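Key Points

This work proposes SVAgent, a new framework that emulates human-like storyline reasoning in video understanding.
SVAgent integrates visual and textual information and analyzes past failure cases to achieve more advanced video understanding.
Experimental results show that SVAgent surpasses existing methods in both performance and interpretability, pointing to new possibilities for video understanding.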

Abstract

Video question answering (VideoQA) is a challenging task that requires integrating spatial, temporal, and semantic information to capture the complex dynamics of video sequences. Although recent advances have introduced various approaches for video understanding, most existing methods still rely on locating relevant frames to answer questions rather than reasoning through the evolving storyline as humans do. Humans naturally interpret videos through coherent storylines, an ability that is crucial for making robust and contextually grounded predictions. To address this gap, we propose SVAgent, a storyline-guided cross-modal multi-agent framework for VideoQA. The storyline agent progressively constructs a narrative representation based on frames suggested by a refinement suggestion agent that analyzes historical failures. In addition, cross-modal decision agents independently predict answers from visual and textual modalities under the guidance of the evolving storyline. Their outputs are then evaluated by a meta-agent to align cross-modal predictions and enhance reasoning robustness and answer consistency. Experimental results demonstrate that SVAgent achieves superior performance and interpretability by emulating human-like storyline reasoning in video understanding.
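The agent roles named in the abstract can be pictured as a simple control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: every function name, the `LoopState` container, the round limit, and the rule "stop when the two modalities agree" are hypothetical choices made for illustration, and the toy agents stand in for what would presumably be model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LoopState:
    """Hypothetical container for what the loop accumulates."""
    storyline: List[str] = field(default_factory=list)  # narrative built so far
    failures: List[str] = field(default_factory=list)   # history of failed rounds


def run_storyline_loop(
    question: str,
    suggest_frames: Callable[[str, List[str]], List[int]],
    extend_storyline: Callable[[List[str], List[int]], List[str]],
    visual_agent: Callable[[str, List[int], List[str]], str],
    textual_agent: Callable[[str, List[str]], str],
    meta_agent: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,  # assumed limit, not taken from the paper
) -> Tuple[Optional[str], LoopState]:
    state = LoopState()
    for round_idx in range(max_rounds):
        # Refinement suggestion agent: pick frames using past failures.
        frames = suggest_frames(question, state.failures)
        # Storyline agent: grow the narrative from the suggested frames.
        state.storyline = extend_storyline(state.storyline, frames)
        # Cross-modal decision agents: independent predictions.
        visual = visual_agent(question, frames, state.storyline)
        textual = textual_agent(question, state.storyline)
        # Meta-agent: accept an answer or signal that another round is needed.
        answer = meta_agent(visual, textual)
        if answer is not None:
            return answer, state
        state.failures.append(
            f"round {round_idx}: visual={visual!r}, textual={textual!r}"
        )
    return None, state


# Toy stand-ins so the sketch runs end to end.
def toy_suggest(question: str, failures: List[str]) -> List[int]:
    start = 2 * len(failures)  # look at two new frames per round
    return [start, start + 1]


def toy_storyline(storyline: List[str], frames: List[int]) -> List[str]:
    return storyline + [f"event at frame {i}" for i in frames]


def toy_visual(question: str, frames: List[int], storyline: List[str]) -> str:
    return "B" if 3 in frames else "A"  # only frame 3 shows the decisive cue


def toy_textual(question: str, storyline: List[str]) -> str:
    return "B"


def toy_meta(visual: str, textual: str) -> Optional[str]:
    return visual if visual == textual else None  # assumed agreement rule


answer, state = run_storyline_loop(
    "Which option describes the ending?",
    toy_suggest, toy_storyline, toy_visual, toy_textual, toy_meta,
)
```

In this toy run the two modalities disagree in the first round, the disagreement is recorded as a failure, and the second round selects different frames. The real system's frame selection, storyline representation, and arbitration logic are described in the paper itself and may differ substantially from this sketch.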


Metadata

arXiv ID: 2604.05079
Category: cs.CV
