AIDB Daily Papers
LLMエージェントの記憶汚染を事後的に監査するフレームワーク「MemAudit」
※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。
ポイント
- LLMエージェントの記憶に不正な記録が注入される脆弱性に対し、事後的な監査手法を提案しました。
- 因果的帰属と構造的異常検知を組み合わせ、有害な記憶の影響を特定する点が新規性です。
- QAと推論タスクにおいて、攻撃成功率を大幅に低下させる有効性が確認されました。
Abstract
Large language model agents increasingly rely on persistent memory to store past interactions, retrieve relevant demonstrations, and improve long-horizon task execution. However, this memory mechanism also creates a practical security vulnerability: an adversarial user may inject malicious records into the agent's memory through ordinary interaction, and these records can later be retrieved to steer the agent's reasoning and actions. Existing defenses primarily focus on online intervention, such as prompt filtering or output blocking, but they do not address the post-hoc question of which stored memories are responsible after harmful behavior has already been observed. We propose textbf{MemAudit}, a post-hoc causal memory auditing framework for memory-augmented LLM agents. The framework combines two complementary signals: (1) a counterfactual memory influence score that measures each memory's causal contribution to harmful outputs, and (2) a memory consistency graph that identifies structurally anomalous memories within the broader memory store. We evaluate MemAudit against MINJA, a query-only memory injection attack in which malicious records are generated and stored through normal agent interactions rather than direct memory-bank modification. Across both QA and reasoning-agent settings, MemAudit substantially reduces attack success rates under realistic post-hoc auditing scenarios. The results show that QA attack success is reduced from $70%$ to $0%$, while RAP attack success drops from $83.3%$ to $0%$.
Paper AI Chat
この論文のPDF全文を対象にAIに質問できます。
質問の例: