AIDB Daily Papers

SSR3D-LLM: Precisely Identifying Objects in 3D Scenes Through Structured Spatial Reasoning

Original title: SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

Authors: Jiawei Li, Ziyi Liu, Weijie Shi, Long Chen, Jiajie Xu, Xiaofang Zhou

Published: 2026-05-27 | Fields: LLM, Computer Vision, AI, 3D, cs.AI, cs.CV

Note: The Japanese title and key points were generated automatically by AI. Please refer to the original paper for accurate content.

Key Points

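3D object grounding, the task of identifying objects in a 3D scene from natural language, is made more precise through structured reasoning steps.
The key contribution is overcoming the limits of the conventional single-selection approach by enabling multi-step reasoning that accounts for context objects and spatial relations.
On benchmarks such as ReferIt3D, it achieves the strongest results among unified 3D-LLMs, with substantial gains over the single-pointer baseline, and it handles fine-grained instructions particularly well.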

Abstract

3D object grounding localizes referred objects in a 3D scene from natural language. Unified instance-centric 3D-LLMs aim to solve grounding together with dialog, QA, and captioning, yet many rely on a single pointer-style grounding decision that compresses a relational instruction into one selection. This is brittle for fine-grained queries where multiple same-class candidates must be ruled out by context objects and spatial relations. We propose Structured Spatial Reasoning 3D-LLM (SSR3D-LLM), a structured grounding interface for unified 3D-LLMs. Given fixed Mask3D object proposals, the LLM writes a sequence of latent spatial reasoning steps and memory tokens from the query, and a geometry-aware scorer reads these latent steps in order to refine candidate rankings step by step with step-length masking. The latent steps are learned from standard benchmark target supervision with auxiliary referential-cue supervision during training, while inference uses only the input query and Mask3D proposals. Across ReferIt3D, ScanRefer, and Multi3DRef, SSR3D-LLM achieves the strongest results among unified 3D-LLM baselines, with substantial gains over the single-pointer QPG baseline on fine-grained grounding and consistent improvements over prior unified 3D-LLMs, while preserving the default language-task route.
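The step-by-step refinement described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the dot-product scoring, the additive accumulation of scores, the hand-written three-dimensional features, and every name below (`refine_rankings`, `proposal_feats`, `step_vectors`, `step_mask`) are assumptions made purely for illustration. It only shows the general idea of reading reasoning steps in order, re-ranking a fixed set of candidates after each one, and ignoring padded steps through a step-length mask.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def refine_rankings(
    proposal_feats: List[Sequence[float]],
    step_vectors: List[Sequence[float]],
    step_mask: List[bool],
) -> Tuple[List[float], List[List[int]]]:
    """Toy step-wise re-ranking (hypothetical, not the paper's scorer).

    Each active step adds a dot-product score to every candidate;
    steps whose mask entry is False are treated as padding and skipped.
    Returns the final scores and the candidate ranking after each active step.
    """
    scores = [0.0] * len(proposal_feats)
    history: List[List[int]] = []
    for vec, active in zip(step_vectors, step_mask):
        if not active:
            continue  # padded step beyond this query's step length
        scores = [s + dot(f, vec) for s, f in zip(scores, proposal_feats)]
        # Rank candidate indices from highest to lowest score.
        history.append(sorted(range(len(scores)), key=lambda i: -scores[i]))
    return scores, history


# Hand-made features (assumed): [is_chair, near_window, unused_dimension]
proposal_feats = [
    [1, 0, 1],  # candidate 0: a chair away from the window
    [1, 1, 0],  # candidate 1: a chair near the window
    [0, 1, 0],  # candidate 2: a table near the window
]
# Query "the chair near the window" as two steps plus one padded slot.
step_vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 5]]
step_mask = [True, True, False]

scores, history = refine_rankings(proposal_feats, step_vectors, step_mask)
print(scores)       # final accumulated score per candidate
print(history[-1])  # ranking after the last active step
```

After the first step both chairs tie, and the second step separates them using the context cue. Without the mask, the padded third vector would push candidate 0 to the top, which is why padded steps have to be excluded in this sketch.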


Metadata

arXiv ID: 2605.28490
Categories: cs.CV, cs.AI
