AIDB Daily Papers
Resonating Minds: Closed-Loop Social Avatars with Theory of Mind
Note: The translated title and key points were generated automatically by AI. Please refer to the original paper for the exact content.
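Key Points
- Realizes digital humans with human-like social intelligence through a closed-loop framework that integrates cognitive reasoning and multimodal generation.
- Integrates perception, Theory of Mind-based social reasoning, and emotion-controllable video generation to bridge the gap between previously separate approaches.
- Constructs a hierarchical dataset with psychologically grounded personas and private social goals for evaluation under information asymmetry, and achieves competitive or superior results on both dialogue quality and video generation.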
Abstract
Creating lifelike digital humans with genuine social intelligence requires unifying cognitive reasoning and multimodal generation within a coherent framework. Current approaches treat these as separate tasks: Large Language Models excel at dialogue but lack embodied expression, while diffusion-based talking head models achieve visual fidelity but ignore social cognition. To bridge this gap, we propose a closed-loop dual-agent framework integrating perception, social reasoning, and expression into a continuous interaction cycle. The perception module analyzes partners' multimodal behaviors from video, while the social reasoning module infers hidden mental states through Theory of Mind and selects responses via an ensemble mechanism. The expression module then generates emotion-controllable dual-agent videos synthesizing both speaker speech and expression alongside listener reactive behaviors, capturing bidirectional dynamics absent in prior work. We construct a hierarchical Persona-Scenario dataset with psychologically grounded personas and private social goals to support evaluation under information asymmetry. Experiments on this dataset demonstrate competitive or superior performance on both dialogue quality and video generation metrics. Notably, our method surpasses even the full-information Script mode on key dialogue quality dimensions, suggesting that explicit mental state inference under uncertainty can elicit more thoughtful dialogue than unrestricted information access.
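The perception, social reasoning, and expression cycle described in the abstract can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class names, the toy Theory-of-Mind rule, the two scorers, and the averaging ensemble are assumptions made for illustration only, since the abstract does not specify how any module is implemented. The expression step returns control signals rather than video frames.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Observation:
    """Hypothetical perception output for one partner turn."""
    utterance: str
    emotion: str


@dataclass
class MentalState:
    """Hypothetical Theory-of-Mind estimate of the partner's hidden state."""
    inferred_goal: str
    inferred_emotion: str


@dataclass
class Expression:
    """Hypothetical control signals for a dual-agent video generator."""
    speaker_text: str
    speaker_emotion: str
    listener_reaction: str


Scorer = Callable[[str, MentalState], float]


def perceive(utterance: str, emotion: str) -> Observation:
    # Stand-in for video analysis: the perceived cues are passed in directly.
    return Observation(utterance=utterance, emotion=emotion)


def infer_mental_state(obs: Observation) -> MentalState:
    # Toy rule (assumption): guess a hidden goal from the surface emotion only.
    goal = "seek reassurance" if obs.emotion == "anxious" else "share information"
    return MentalState(inferred_goal=goal, inferred_emotion=obs.emotion)


def select_response(candidates: Sequence[str], state: MentalState,
                    scorers: Sequence[Scorer]) -> str:
    # Assumed ensemble rule: average all scorers, keep the best candidate.
    def mean_score(candidate: str) -> float:
        return sum(s(candidate, state) for s in scorers) / len(scorers)
    return max(candidates, key=mean_score)


def express(text: str, state: MentalState) -> Expression:
    # Stand-in for video generation: emits emotion controls, not frames.
    speaker_emotion = "calm" if state.inferred_emotion == "anxious" else "neutral"
    return Expression(text, speaker_emotion, listener_reaction="nod")


def interaction_step(utterance: str, emotion: str, candidates: Sequence[str],
                     scorers: Sequence[Scorer]) -> Expression:
    # One pass of the cycle: perception -> social reasoning -> expression.
    obs = perceive(utterance, emotion)
    state = infer_mental_state(obs)
    reply = select_response(candidates, state, scorers)
    return express(reply, state)


def empathy_scorer(candidate: str, state: MentalState) -> float:
    # Rewards reassuring wording when reassurance is the inferred goal.
    wants_comfort = state.inferred_goal == "seek reassurance"
    return 1.0 if wants_comfort and "okay" in candidate else 0.0


def brevity_scorer(candidate: str, state: MentalState) -> float:
    # Mildly prefers shorter replies.
    return 1.0 / (1 + len(candidate.split()))


result = interaction_step(
    "I am not sure this will work.",
    "anxious",
    ["It will be okay, we can try together.", "Here are the facts."],
    [empathy_scorer, brevity_scorer],
)
print(result)
```

In a closed-loop setting, the `Expression` produced by one agent would become the input to the other agent's perception step on the next turn. The sketch runs only a single step to keep the data flow visible.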