AIDB Daily Papers
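SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?
Note: The translated title and key points are generated automatically by AI. Please consult the original paper for the exact content.
Key Points
- Proposes a benchmark that mimics real-world tool use and tests whether LLM agents can acquire and reuse higher-level tool-composition skills.
- Whereas existing benchmarks focus on success under static tool sets, this work is novel in emphasizing the evaluation of agents' ability to acquire reusable skills.
- Saving and reusing skills reduced token usage by up to 80%, and success rate was found to correlate strongly with tool-composition ability.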
Abstract
Real-world tool-using agents operate over long-horizon workflows with recurring structure and diverse demands, where effective behavior requires not only invoking atomic tools but also abstracting and reusing higher-level tool compositions. However, existing benchmarks mainly measure instance-level success under static tool sets, offering limited insight into agents' ability to acquire such reusable skills. We address this gap by introducing SkillCraft, a benchmark that explicitly stress-tests agents' ability to form and reuse higher-level tool compositions, which we call Skills. SkillCraft features realistic, highly compositional tool-use scenarios with difficulty scaled along both quantitative and structural dimensions, designed to elicit skill abstraction and cross-task reuse. We further propose a lightweight evaluation protocol that enables agents to auto-compose atomic tools into executable Skills and to cache and reuse them within and across tasks, thereby improving efficiency while accumulating a persistent library of reusable skills. Evaluating state-of-the-art agents on SkillCraft, we observe substantial efficiency gains, with token usage reduced by up to 80% through skill saving and reuse. Moreover, success rate strongly correlates with tool composition ability at test time, underscoring compositional skill acquisition as a core capability.
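The compose, cache, and reuse idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SkillCraft's actual protocol: the `SkillLibrary` class, its method names, the toy string tools, and the `agent_actions` counter (a crude stand-in for token usage) are all hypothetical and chosen only for this example.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[[Any], Any]


class SkillLibrary:
    """Hypothetical persistent store of named tool compositions ("Skills")."""

    def __init__(self, tools: Dict[str, Tool]) -> None:
        self.tools = dict(tools)
        self.skills: Dict[str, List[str]] = {}
        self.agent_actions = 0  # calls the agent itself had to issue

    def call_tool(self, name: str, value: Any) -> Any:
        """One agent-issued atomic tool call."""
        self.agent_actions += 1
        return self.tools[name](value)

    def save_skill(self, name: str, steps: List[str]) -> None:
        """Cache a sequence of atomic tool names as a reusable Skill."""
        unknown = [s for s in steps if s not in self.tools]
        if unknown:
            raise KeyError(f"unknown tools: {unknown}")
        self.skills[name] = list(steps)

    def run_skill(self, name: str, value: Any) -> Any:
        """One agent-issued call that executes every step of a cached Skill."""
        self.agent_actions += 1
        for step in self.skills[name]:
            value = self.tools[step](value)
        return value


def demo() -> tuple:
    # Toy atomic tools (assumed for illustration only).
    lib = SkillLibrary({"strip": str.strip, "lower": str.lower, "split": str.split})

    # First task: the agent solves it step by step, then saves the composition.
    first: Any = "  Hello World "
    for step in ("strip", "lower", "split"):
        first = lib.call_tool(step, first)
    lib.save_skill("tokenize", ["strip", "lower", "split"])
    actions_without_skill = lib.agent_actions

    # Second task: a single Skill call replaces three atomic calls.
    second = lib.run_skill("tokenize", "  Reuse ME ")
    actions_with_skill = lib.agent_actions - actions_without_skill
    return first, second, actions_without_skill, actions_with_skill
```

Calling `demo()` runs the first task with three agent-issued tool calls and the second with a single Skill call. In a real agent the saving would show up as fewer generated tokens rather than fewer counted actions, so the counter here only hints at the effect reported in the abstract.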