AIDB Daily Papers

SkillAttack: Automated Red Teaming of Agent Skills via Attack Path Refinement

Original title: SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

Authors: Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang, Jingcheng Deng, Zihao Wei, Shicheng Xu, Yuyao Ge, Xueqi Cheng

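Published: 2026-04-05 | Fields: LLM, security, machine learning, AI, agents, risk assessment, prompts, automation, testing, natural language processing, vulnerabilities, deep learning, artificial intelligence, adversaries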

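Note: The Japanese-language title and key points on this page were generated automatically by AI. Please refer to the original paper for the exact content.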

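Key Points

LLM agents use skills to extend their capabilities, but the openness of skill ecosystems makes thorough vetting difficult.
SkillAttack is a red-teaming framework that dynamically verifies skill vulnerabilities through adversarial prompting.
In experiments, SkillAttack substantially outperformed existing methods and revealed security risks in realistic agent interactions.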

Abstract

LLM-based agent systems increasingly rely on agent skills sourced from open registries to extend their capabilities, yet the openness of such ecosystems makes skills difficult to thoroughly vet. Existing attacks rely on injecting malicious instructions into skills, making them easily detectable by static auditing. However, non-malicious skills may also harbor latent vulnerabilities that an attacker can exploit solely through adversarial prompting, without modifying the skill itself. We introduce SkillAttack, a red-teaming framework that dynamically verifies skill vulnerability exploitability through adversarial prompting. SkillAttack combines vulnerability analysis, surface-parallel attack generation, and feedback-driven exploit refinement into a closed-loop search that progressively converges toward successful exploitation. Experiments across 10 LLMs on 71 adversarial and 100 real-world skills show that SkillAttack outperforms all baselines by a wide margin (ASR 0.73–0.93 on adversarial skills, up to 0.26 on real-world skills), revealing that even well-intended skills pose serious security risks under realistic agent interactions.
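To make the closed-loop idea in the abstract more concrete, here is a minimal Python sketch of a generate, test, and refine loop over several attack surfaces. It is an illustration only and not the authors' implementation: the names `Attempt`, `closed_loop_search`, `attack_success_rate`, and the `toy_*` helpers are all hypothetical, the surfaces are probed serially rather than in parallel, and the ASR definition used here (fraction of skills with at least one successful attempt) is an assumption rather than the paper's stated metric.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    surface: str    # hypothetical name of the attack surface being probed
    prompt: str     # adversarial prompt sent to the agent
    success: bool   # whether the (stubbed) judge counted the exploit as successful
    feedback: str   # judge feedback used to refine the next prompt


def closed_loop_search(
    surfaces: List[str],
    generate: Callable[[str], str],
    run_and_judge: Callable[[str, str], Attempt],
    refine: Callable[[Attempt], str],
    max_rounds: int = 3,
) -> Optional[Attempt]:
    """Return the first successful attempt, or None if the round budget runs out."""
    # One initial prompt per surface (run serially here for simplicity).
    prompts = {s: generate(s) for s in surfaces}
    for _ in range(max_rounds):
        for surface in surfaces:
            attempt = run_and_judge(surface, prompts[surface])
            if attempt.success:
                return attempt
            # Feedback-driven refinement: rewrite the prompt from the failed attempt.
            prompts[surface] = refine(attempt)
    return None


def attack_success_rate(results: List[Optional[Attempt]]) -> float:
    """Assumed ASR: fraction of skills for which some attempt succeeded."""
    if not results:
        return 0.0
    return sum(r is not None for r in results) / len(results)


# Toy stand-ins so the sketch runs; a real system would call an LLM and an agent sandbox.
def toy_generate(surface: str) -> str:
    return f"probe {surface}"


def toy_run_and_judge(surface: str, prompt: str) -> Attempt:
    # Pretend only the "file_path" surface is exploitable, after two refinements.
    ok = surface == "file_path" and prompt.count("+") >= 2
    return Attempt(surface, prompt, ok, "blocked")


def toy_refine(attempt: Attempt) -> str:
    return attempt.prompt + "+"
```

With the toy helpers, `closed_loop_search(["shell_arg", "file_path"], toy_generate, toy_run_and_judge, toy_refine)` succeeds on the third round for the `file_path` surface, while the same call with `max_rounds=2` returns `None`. The loop stops at the first success, which is one simple way to model a search that converges toward a working exploit under a fixed budget.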

Metadata

arXiv ID: 2604.04989
Category: cs.CR
