AIDB Daily Papers
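CollabBench: An LLM Benchmark for Unlocking Collaboration with Diverse Players
Note: The title and key points on this page were automatically generated by AI (originally in Japanese). Please refer to the original paper for the exact content.
Key Points
- Proposes CollabBench, a benchmark for evaluating and training the collaborative abilities of LLMs.
- Features a pipeline that simulates the behavior of diverse players and a training paradigm that unifies reasoning, communication, and action.
- In experiments, the proposed method improved efficiency by 19.5% and affective performance by 24.4% compared with base models.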
Abstract
While LLM-based agents excel at individual tasks, effective collaboration with realistic human partners remains challenging. Most existing conversation-level studies of collaboration lack grounded interaction and behavioral execution, motivating the need for cooperative game environments that enable contextualized and immersive collaboration. To this end, this paper proposes CollabBench, a benchmark for evaluating and training collaborative agents in cooperative games. CollabBench features a Diverse Player Profile Simulation pipeline to model varied player behaviors, and a Collaborative Agentic Training paradigm that unifies reasoning, communication, and action via agentic rollouts, optimized with a hybrid reward balancing task efficiency and affective adaptation. We further extend classic environments to CWAH-MultiPlayer and Cook-MultiPlayer for systematic evaluation under diverse personalities. Experiments with efficiency and affective metrics show that our trained models outperform base models, achieving 19.5% higher efficiency and 24.4% improved affective performance. Further analysis reveals key collaborative limitations of existing models and offers insights for future collaborative training.