AIDB Daily Papers

Webエージェントをプロンプトインジェクションから守る敵対的頑健防御「WARD」

原題: WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

著者: Tri Cao, Yulin Chen, Hieu Cao, Yibo Li, Khoi Le, Thong Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng Yan, Bryan Hooi

公開日: 2026-05-14 | 分野: セキュリティ cs.AI cs.CR AIエージェント AI安全性

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

Webエージェントのプロンプトインジェクション攻撃に対する防御モデル「WARD」を提案した。
既存モデルの汎化性能、誤検知率、効率性の課題を解決し、進化する攻撃にも対応可能である。
大規模データセットと敵対的学習により、高い防御性能と低遅延を実現した。

Abstract

Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content or visual interfaces. Existing guard models still suffer from limited generalization to unseen domains and attack patterns, high false positive rates on benign content, reduced deployment efficiency due to added latency at each step, and vulnerability to adversarial attacks that evolve over time or directly target the guard itself. To address these limitations, we propose WARD (Web Agent Robust Defense against Prompt Injection), a practical guard model for secure and efficient web agents. WARD is built on WARD-Base, a large-scale dataset with around 177K samples collected from 719 high-traffic URLs and platforms, and WARD-PIG, a dedicated dataset designed for prompt injection attacks targeting the guard model. We further introduce A3T, an adaptive adversarial attack training framework that iteratively strengthens WARD through a memory-based attacker and guard co-evolution process. Extensive experiments show that WARD achieves nearly perfect recall on out-of-distribution benchmarks, maintains low false positive rates to preserve agent utility, remains robust against guard-targeted and adaptive attacks under substantial distribution shifts, and runs efficiently in parallel with the agent without introducing additional latency.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

arXivで読む PDFを開く

メタ情報

arXiv ID: 2605.15030
カテゴリ: cs.CR, cs.AI

ポイント

Abstract

Paper AI Chat

関連するAIDB記事

メタ情報