AIDB Daily Papers

Beyond the "Jack-of-All-Trades" Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

Original title: Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

Authors: Tao Yu, Hao Wang, Changyu Li, Shenghua Chai, Minghui Zhang, Zhongtian Luo, Yuxuan Zhou, Haopeng Jin, Zhaolu Kang, Jiabing Yang, YiFan Zhang, Xinming Wang, Hongzhu Yi, Zheqi He, Jing-Shu Zheng, Xi Yang, Yan Huang, Liang Wang

Published: 2026-05-09 | Fields: Workflow, Enterprise, cs.MA, cs.LG, AI Agents, Multi-Agent Systems

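Note: The Japanese-language title and key points on the source page were generated automatically by AI. Consult the original paper for the exact content.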

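Key Points

Proposes EntCollabBench, a new benchmark for evaluating role-specialized multi-agent collaboration in enterprise workflows.
Simulates realistic constraints that existing benchmarks lack, such as role division, access control, state management, and approval policies.
Experiments show that LLM agents still struggle with delegation, context transfer, and decision commitment, among other areas.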

Abstract

Large language model (LLM) agents are increasingly expected to operate in enterprise environments, where work is distributed across specialized roles, permission-controlled systems, and cross-departmental procedures. However, existing enterprise benchmarks largely evaluate single agents with broad tool access, while existing multi-agent benchmarks rarely capture realistic enterprise constraints such as role specialization, access control, stateful business systems, and policy-based approvals. We introduce EntCollabBench, a benchmark for evaluating enterprise multi-agent collaboration. EntCollabBench simulates a permission-isolated organization with 11 role-specialized agents across six departments and contains two evaluation subsets: a Workflow subset, where agents collaboratively modify enterprise system states, and an Approval subset, where agents make policy-grounded decisions. Evaluation is based on execution traces, database state verification, and deterministic policy adjudication rather than natural-language response judging. Experiments with representative LLM agents show that current models still struggle with end-to-end enterprise collaboration, especially in delegation, context transfer, parameter grounding, workflow closure, and decision commitment. EntCollabBench provides a reproducible testbed for measuring and improving agent systems intended for realistic organizational environments.

Metadata

arXiv ID: 2605.08761
Categories: cs.MA, cs.LG