AIDB Daily Papers
Lightweight Visual Reasoning for Social Robots
Note: The title and key points are auto-generated by AI. Please refer to the original paper for accurate details.
Key Points
- Develops a lightweight module that lets robots sharing space with humans understand their surroundings and respond to human behaviour.
- Strengthens the coupling between the LLM and the vision encoder in a VLM, reinterpreting visual information in light of textual context to handle more complex HRI.
- Confirms performance gains on navigation, scene description, and human-intention recognition tasks in a simulated environment, with a particularly large accuracy improvement in intention recognition.
Abstract
Robots operating in shared human environments must not only navigate, interact, and detect their surroundings, but also interpret and respond to dynamic, and often unpredictable, human behaviours. Although recent advances have shown promise in enhancing robotic perception and instruction-following using Vision-Language Models (VLMs), they remain limited in addressing the complexities of multimodal human-robot interaction (HRI). Motivated by this challenge, we introduce a lightweight language-to-vision feedback module that closes the loop between an LLM and the vision encoder in VLMs. The module projects image-token hidden states through a gated Multi-Layer Perceptron (MLP) back into the encoder input, prompting a second pass that reinterprets the scene under text context. We evaluate this approach on three robotics-centred tasks: navigation in a simulated environment (Habitat), sequential scene description (Mementos-Robotics), and human-intention recognition (our HRI dataset). Results show that our method improves Qwen 2.5 (7B) by 3.3% (shorter distance), +0.057 description score, and +2.93% accuracy, with less than 3% extra parameters; Gemma 3 (4B) and LLaVA OV 1.5 (4B) show mixed navigation results but gain +0.111/+0.055 and +10.81%/+4.79% on the latter two tasks. Code is available at https://github.com/alessioGalatolo/VLM-Reasoning-for-Robotics.
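To make the mechanism concrete, below is a minimal PyTorch-style sketch of such a language-to-vision feedback module. The class name, layer sizes, and the tanh-gated residual injection are illustrative assumptions; the abstract only specifies a gated MLP that projects image-token hidden states back into the vision-encoder input to trigger a second, text-conditioned pass.

```python
import torch
import torch.nn as nn


class LanguageToVisionFeedback(nn.Module):
    """Gated MLP that maps LLM image-token hidden states back into the
    vision-encoder input space, enabling a second, text-conditioned pass.

    Hypothetical sketch: dimensions and wiring are assumptions, not the
    authors' exact implementation.
    """

    def __init__(self, llm_dim: int, vis_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(llm_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, vis_dim),
        )
        # Gate starts at zero so the feedback path is initially a no-op.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, image_token_hidden: torch.Tensor,
                vision_inputs: torch.Tensor) -> torch.Tensor:
        # image_token_hidden: (B, N_img, llm_dim) LLM hidden states taken at
        #                     the image-token positions after the first pass.
        # vision_inputs:      (B, N_img, vis_dim) original vision-encoder
        #                     inputs (e.g. patch embeddings).
        feedback = self.proj(image_token_hidden)
        # Inject text-conditioned feedback into the encoder input for pass 2.
        return vision_inputs + torch.tanh(self.gate) * feedback
```

In use, the VLM would run a normal first pass, apply this module to the LLM's image-token hidden states, then re-run the vision encoder on the modified inputs before decoding the final answer. Only the small MLP and the gate add trainable parameters, which is consistent with the under-3% parameter overhead reported in the abstract.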