AIDB Daily Papers

CollabBench：多様なプレイヤーとの協調能力を解き放つLLMベンチマーク

原題: CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement

著者: Hong Qian, Yuanhao Liu, Zihan Zhou, Zongbao Zhang, Hanjie Ge, Haotian Shi, Liang Dou, Xiangfeng Wang, Jingwen Yang, Aimin Zhou

公開日: 2026-06-04 | 分野: LLM cs.CL cs.AI cs.CY cs.LG AIエージェント

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

LLMの協調能力を評価・訓練するためのベンチマーク「CollabBench」を提案した。
多様なプレイヤーの行動を模倣するパイプラインと、推論・通信・行動を統合する訓練パラダイムが特徴である。
実験の結果、提案手法はベースモデルと比較して効率性を19.5%、感情的パフォーマンスを24.4%向上させた。

Abstract

While LLM-based agents excel at individual tasks, effective collaboration with realistic human partners remains challenging. Most of the existing conversation-level collaborative studies lack grounded interaction and behavioral execution, motivating the need for cooperative game environments that enable contextualized and immersive collaboration. To this end, this paper proposes CollabBench, a benchmark for evaluating and training collaborative agents in cooperative games. CollabBench features a Diverse Player Profile Simulation pipeline to model varied players behaviors, and a Collaborative Agentic Training paradigm that unifies reasoning, communication, and action via agentic rollouts, optimized with a hybrid reward balancing task efficiency and affective adaptation. We further extend classic environments to CWAH-MultiPlayer and Cook-MultiPlayer for systematic evaluation under diverse personalities. Experiments with efficiency and affective metrics show that our trained models outperform base models, achieving 19.5% higher efficiency and 24.4% improved affective performance. Further analysis reveals key collaborative limitations of existing models and offers insights for future collaborative training.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

arXivで読む PDFを開く

メタ情報

arXiv ID: 2606.05793
カテゴリ: cs.CL, cs.AI, cs.CY, cs.LG

ポイント

Abstract

Paper AI Chat

関連するAIDB記事

メタ情報