AIDB Daily Papers
SafeHarness: A Lifecycle-Integrated Security Architecture for Deploying LLM Agents
Note: the title and key points below are AI-generated summaries; refer to the original paper for accurate details.
Key Points
- The harness that drives LLM agent execution is a high-value attack surface, yet existing security measures are insufficient to protect it.
- SafeHarness integrates four defense layers into the agent lifecycle, monitoring harness-internal state to stop attacks before they take effect.
- SafeHarness reduces the unsafe behavior rate (UBR) by an average of 38% and the attack success rate (ASR) by an average of 42% while preserving task utility.
Abstract
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from a structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce SafeHarness, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address these limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. The proposed cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate SafeHarness on benchmark datasets across diverse harness configurations, comparing against four security baselines under five attack scenarios spanning six threat categories. Compared to the unprotected baseline, SafeHarness achieves an average reduction of approximately 38% in UBR and 42% in ASR, substantially lowering both the unsafe behavior rate and the attack success rate while preserving core task utility.
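To make the lifecycle integration described in the abstract concrete, the following is a minimal sketch of how four defense layers could hook into a single agent step, with a cross-layer monitor that escalates verification rigor and tightens tool privileges after sustained anomalies. All names, thresholds, and checks here are hypothetical illustrations, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class CrossLayerMonitor:
    """Hypothetical cross-layer state: tracks sustained anomalies
    and coordinates escalation across the four defense layers."""
    anomaly_streak: int = 0
    escalation_threshold: int = 3
    verification_tier: int = 1
    allowed_tools: set = field(default_factory=lambda: {"search", "write_file"})

    def report(self, anomalous: bool) -> None:
        self.anomaly_streak = self.anomaly_streak + 1 if anomalous else 0
        if self.anomaly_streak >= self.escalation_threshold:
            self.verification_tier = 2                 # escalate verification rigor
            self.allowed_tools.discard("write_file")   # tighten tool privileges

def filter_context(prompt: str) -> tuple[str, bool]:
    """Layer 1: adversarial context filtering at input processing
    (a toy pattern check standing in for a real filter)."""
    anomalous = "ignore previous instructions" in prompt.lower()
    return prompt, anomalous

def verify_decision(plan: str, tier: int) -> bool:
    """Layer 2: tiered causal verification; higher tiers run more checks."""
    checks = [len(plan) > 0]
    if tier >= 2:
        checks.append("rm -rf" not in plan)  # extra scrutiny once escalated
    return all(checks)

def agent_step(prompt: str, tool: str,
               monitor: CrossLayerMonitor, state: dict) -> dict:
    clean, anomalous = filter_context(prompt)
    monitor.report(anomalous)                  # cross-layer anomaly signal
    if not verify_decision(clean, monitor.verification_tier):
        return state                           # plan rejected; state untouched
    if tool not in monitor.allowed_tools:      # Layer 3: privilege separation
        return state
    snapshot = dict(state)                     # Layer 4: safe rollback point
    try:
        state["last_tool"] = tool              # apply the state update
    except Exception:
        state = snapshot                       # roll back on failure
    return state
```

Under this sketch, three consecutive injection-style prompts push the monitor past its threshold, after which the write-capable tool is revoked and decision verification runs at the stricter tier, mirroring the escalation behavior the abstract attributes to the cross-layer mechanisms.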