AIDB Daily Papers
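Achieving Human-Like Social Reasoning in LLMs: Social-R1
Note: The title and key points (originally in Japanese) were generated automatically by AI. Refer to the original paper for the exact content.
Key Points
- The study aims to give LLMs the ability to understand social situations and generate appropriate responses.
- To overcome existing models' reliance on superficial patterns, it proposes training on hard cases.
- With the reinforcement learning framework Social-R1, a 4B-parameter model outperforms much larger models.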
Abstract
While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs. Current models often rely on superficial patterns rather than genuine social reasoning. We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. To this end, we introduce ToMBench-Hard, an adversarial benchmark designed to provide hard training examples for social reasoning. Building on this, we propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards. Unlike outcome-based RL, Social-R1 supervises the entire reasoning process, enforcing structural alignment, logical integrity, and information density. Results show that our approach enables a 4B parameter model to surpass much larger counterparts and generalize robustly across eight diverse benchmarks. These findings demonstrate that challenging training cases with trajectory-level alignment offer a path toward efficient and reliable social intelligence.