AIDB Daily Papers

See, Infer, Intervene: A Proactive World Model for Goal-Oriented Social Intelligence

Original title: See, Infer, Intervene: Proactive World Modeling for Goal-Oriented Social Intelligence

Authors: Honghui Zhang, Chenmeinian Guo, Yichen Yu, Guanyu Liu, Yongming Qin, Chongguo Song, Mengyue Yang, Lei Yu, Tianyu Shi

Published: 2026-06-02 | Fields: AI, cs.CL, cs.AI, AI agents, AI assistance, AI evaluation

Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the exact content.

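Key Points

Proposes a framework that not only recognizes what a customer is doing but also decides whether and how to assist before an explicit request is made.
The approach models the customer's purchasing phase and psychological state, and selects the most suitable intervention from five response classes.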
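When conditioned on ground-truth customer state, the proposed method reaches 0.641 macro F1 on a smart-retail benchmark, outperforming a zero-shot Qwen2.5-VL-7B baseline, and a preliminary staged real-store pilot reaches 0.579 action macro F1; end-to-end video-only selection, however, drops to 0.295.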

Abstract

Multimodal retail agents should not only recognize what a customer is doing, but also decide whether and how to assist before an explicit request is made. We study this setting through the See-Infer-Intervene (SII) framework, where a device must see pre-interaction behavior, infer latent customer intent, and act by selecting an appropriate service intervention or choosing to wait. We instantiate SII with the Proactive Intent World Model (PIWM), which represents customer state with AIDA (Attention, Interest, Desire, Action) purchasing phases and BDI (belief, desire, intention) psychological fields, predicts action-conditioned intent transitions, and selects from five response classes: Greet, Elicit, Inform, Recommend, and Hold. We further construct GuidanceSalesBench, a smart-retail benchmark containing state manifests, pre-interaction videos, candidate responses, action-conditioned outcomes, and best-action labels. When conditioned on ground-truth customer state to isolate action selection, PIWM achieves 0.641 macro F1 on 30 held-out target videos, outperforming a zero-shot Qwen2.5-VL-7B baseline and training variants without balanced action supervision; end-to-end video-only selection drops to 0.295, below the 5-class balanced random baseline of 0.414, identifying video-to-state grounding as the dominant deployment-time bottleneck. A preliminary staged real-store pilot (recorded with paid participants performing scripted customer behaviors) reaches 0.579 action macro F1 on 20 fully annotated videos, with 10 additional accessible videos released with index-level labels.
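
The abstract reports results as macro F1 over the five response classes. As a rough illustration of that metric only, the sketch below computes an unweighted mean of per-class F1 scores in Python. The `Action` enum mirrors the five class names from the abstract; the `macro_f1` helper, the toy labels, and the convention of scoring an absent class as 0 are assumptions made for this example and are not taken from the paper's evaluation code.

```python
from enum import Enum


class Action(Enum):
    # The five response classes named in the abstract.
    GREET = "Greet"
    ELICIT = "Elicit"
    INFORM = "Inform"
    RECOMMEND = "Recommend"
    HOLD = "Hold"


def macro_f1(y_true, y_pred, classes=tuple(Action)):
    """Unweighted mean of per-class F1 scores.

    Assumption: a class with no true and no predicted instances
    contributes 0.0 (one common convention, not confirmed by the paper).
    """
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: one Recommend case is mispredicted as Hold.
toy_true = [Action.GREET, Action.ELICIT, Action.INFORM,
            Action.RECOMMEND, Action.HOLD, Action.HOLD]
toy_pred = [Action.GREET, Action.ELICIT, Action.INFORM,
            Action.HOLD, Action.HOLD, Action.HOLD]

print(round(macro_f1(toy_true, toy_pred), 3))  # 0.76
```

Because every class carries equal weight, a single missed Recommend case lowers the toy score from 1.0 to 0.76 even though five of six predictions are correct. That sensitivity to rare classes is why macro F1 is often preferred when action labels are imbalanced.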

Metadata

arXiv ID: 2606.03371
Category: cs.CL
