AIDB Daily Papers

Beyond Words: Grounding Referring Expressions via Hand Pointing in Egocentric Vision

Original title: Beyond Language: Grounding Referring Expressions with Hand Pointing in Egocentric Vision

Authors: Ling Li, Bowen Liu, Zinuo Zhan, Peng Jie, Jianhui Zhong, Kenglun Chang, Zhidong Deng

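Published: 2026-03-27 | Fields: multimodal, robotics, computer vision, datasets, machine learning, images, dialogue, language, vision, research, natural language processing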

Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the authoritative content.

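Key Points

The authors build EgoPoint-Ground, a new visual grounding dataset that pairs hand-pointing gestures with visual information in egocentric (first-person) views.
The dataset is large-scale and multimodal; it resolves linguistic ambiguity and captures the non-verbal deictic cues that matter in real-world interaction.
The proposed method, SV-CoT, achieves an 11.7% absolute improvement over existing methods and strengthens multimodal intent understanding.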

Abstract

Traditional Visual Grounding (VG) predominantly relies on textual descriptions to localize objects, a paradigm that inherently struggles with linguistic ambiguity and often ignores non-verbal deictic cues prevalent in real-world interactions. In natural egocentric engagements, hand-pointing combined with speech forms the most intuitive referring mechanism. To bridge this gap, we introduce EgoPoint-Ground, the first large-scale multimodal dataset dedicated to egocentric deictic visual grounding. Comprising over 15k interactive samples in complex scenes, the dataset provides rich, multi-grained annotations including hand-target bounding box pairs and dense semantic captions. We establish a comprehensive benchmark for hand-pointing referring expression resolution, evaluating a wide spectrum of mainstream Multimodal Large Language Models (MLLMs) and state-of-the-art VG architectures. Furthermore, we propose SV-CoT, a novel baseline framework that reformulates grounding as a structured inference process, synergizing gestural and linguistic cues through a Visual Chain-of-Thought paradigm. Extensive experiments demonstrate that SV-CoT achieves an 11.7% absolute improvement over existing methods, effectively mitigating semantic ambiguity and advancing the capability of agents to comprehend multimodal physical intents. The dataset and code will be made publicly available.
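
The abstract does not specify the annotation schema or the evaluation metric. As a rough illustration only, the sketch below assumes a hypothetical sample record (`PointingSample`, with a caption plus hand and target boxes) and the IoU-thresholded accuracy that is common in visual grounding work; the field names, the box format, and the 0.5 threshold are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class PointingSample:
    """Hypothetical record layout; the real EgoPoint-Ground schema may differ."""
    caption: str
    hand_box: Box
    target_box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(
    samples: List[PointingSample],
    preds: List[Box],
    threshold: float = 0.5,  # assumed threshold, not stated in the abstract
) -> float:
    """Fraction of predictions whose IoU with the annotated target meets the threshold."""
    if not samples:
        return 0.0
    hits = sum(iou(s.target_box, p) >= threshold for s, p in zip(samples, preds))
    return hits / len(samples)
```

Under this reading, an "absolute improvement" of 11.7% would mean the accuracy value itself rises by 11.7 percentage points, rather than by 11.7% of the previous score.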


Metadata

arXiv ID: 2603.26646
Category: cs.CV
