AIDB Daily Papers

SkillJuror: Measuring How AI Agent Skill Organization Affects Runtime Behavior

Original title: SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

Authors: Zhiyu Chen, Zihan Guo, Bo Huang, Bingwei Lu, Jianghao Lin, Yuanjian Zhou, Weinan Zhang

Published: 2026-06-10 | Fields: LLM, cs.AI, cs.SE, AI agents, AI evaluation

Note: The Japanese title and key points were generated automatically by AI. Please consult the original paper for the exact content.

Key points

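Proposes SkillJuror, a framework for evaluating how the procedural knowledge supplied to AI agents is organized and used.
Existing evaluations tend to focus only on what a Skill says; this study shows that how a Skill is organized also affects agent behavior.
Progressive Disclosure leads agents to access more Skill resources and raises the number of verifier-passing trials, but the benefit depends on the task.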

Abstract

Agent Skills augment large language model (LLM) agents with procedural knowledge at inference time, but current benchmarks rarely distinguish what a Skill says from how it is organized. We study this distinction through Progressive Disclosure, where a concise root file points agents to supporting resources on demand, and compare it with a normalized flat baseline. We present SkillJuror, a framework for evaluating Skill writing paradigms through semantically controlled variants, matched multi-trial evaluations, and trajectory evidence while holding task knowledge fixed. In an 82-task SkillsBench study, Progressive Disclosure changes runtime behavior before aggregate outcomes: distinct Skill resources touched per trajectory rise from 1.18 to 3.85, and effective uptake events rise from 1.33 to 3.92. It also yields 17 additional verifier-passing trials out of 410 matched trials (+4.1%) over the normalized flat baseline. The benefit is task-dependent. Progressive Disclosure helps when supporting resources guide implementation, checking, or repair, but is weaker when success hinges on exact output conventions, numerical thresholds, or long artifact-generation pipelines. These results show that Skill organization is not mere presentation: it can change how agents search and apply procedural knowledge, while outcome gains depend on whether the exposed resources are actionable for the task. Code is available at https://github.com/zhiyuchen-ai/skill-juror.
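To make the trajectory-level quantities in the abstract concrete, here is a minimal Python sketch. The trajectory format (a list of event dicts with `kind` and `path` keys), the event kind `read_skill_resource`, the file names, and the function names are all illustrative assumptions, not SkillJuror's actual schema or API. The toy data is invented for the example and does not reproduce the paper's measurements; only the final line uses reported numbers, re-deriving the +4.1 figure from 17 additional passes over 410 matched trials.

```python
from statistics import mean

# Hypothetical trajectory format: each trajectory is a list of event dicts.
# The "kind" and "path" keys are assumptions made for this illustration.

def distinct_resources_touched(trajectory):
    """Count the distinct Skill resource paths read in one trajectory."""
    return len({e["path"] for e in trajectory if e["kind"] == "read_skill_resource"})

def mean_resources_touched(trajectories):
    """Average the per-trajectory distinct-resource counts."""
    return mean(distinct_resources_touched(t) for t in trajectories)

def pass_rate_delta_points(extra_passes, matched_trials):
    """Pass-rate difference between two arms, in percentage points."""
    return 100.0 * extra_passes / matched_trials

# Toy trajectories (invented): a flat Skill exposes a single file,
# while a progressively disclosed Skill points to supporting resources.
flat = [
    [{"kind": "read_skill_resource", "path": "SKILL.md"},
     {"kind": "tool_call", "path": None}],
    [{"kind": "read_skill_resource", "path": "SKILL.md"}],
]
progressive = [
    [{"kind": "read_skill_resource", "path": "SKILL.md"},
     {"kind": "read_skill_resource", "path": "references/checks.md"},
     {"kind": "read_skill_resource", "path": "SKILL.md"}],  # repeat read, counted once
    [{"kind": "read_skill_resource", "path": "SKILL.md"},
     {"kind": "read_skill_resource", "path": "references/checks.md"},
     {"kind": "read_skill_resource", "path": "scripts/repair.py"}],
]

print(mean_resources_touched(flat))                  # 1
print(mean_resources_touched(progressive))           # 2.5
print(round(pass_rate_delta_points(17, 410), 1))     # 4.1
```

Counting distinct paths rather than raw read events keeps repeated reads of the same file from inflating the metric, which is one plausible reading of "distinct Skill resources touched per trajectory".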

Metadata

arXiv ID: 2606.11543
Categories: cs.AI, cs.SE
