AIDB Daily Papers

長文LLMエージェントにおける「禁止」制約は減衰し、「指示」制約は持続する

原題: Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

著者: Yeran Gamage

公開日: 2026-04-22 | 分野: LLM ロボティクスセキュリティ AI cs.AI cs.CR 信頼性

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

本研究は、LLMエージェントが長文コンテキスト下で、禁止事項に関する指示（例：情報漏洩禁止）の遵守率が低下する現象を発見した。
この「セキュリティ・リコール・ダイバージェンス」は、従来の安全評価では見過ごされがちな、指示の種類による遵守率の非対称性を示す。
禁止事項の遵守率は会話が進むにつれて大幅に低下するが、指示事項は維持され、監視システムでは異常が検知されにくい。

Abstract

LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers at six conversation depths, omission compliance falls from 73% at turn 5 to 33% at turn 16 while commission compliance holds at 100% (Mistral Large 3, $p < 10^{-33}$). In the two models with token-matched padding controls, schema semantic content accounts for 62-100% of the dilution effect. Re-injecting constraints before the per-model Safe Turn Depth (STD) restores compliance without retraining. Production security policies consist of prohibitions such as never revealing credentials, never executing untrusted code, and never forwarding user data. Commission-type audit signals remain healthy while omission constraints have already failed, leaving the failure invisible to standard monitoring.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

💬 ディスカッション

ディスカッションに参加するにはログインが必要です。

ログイン / アカウント作成 →

arxivで読む PDFを開く

メタ情報

arxiv ID: 2604.20911
カテゴリ: cs.CR, cs.AI

ポイント

Abstract

Paper AI Chat

💬 ディスカッション

関連するAIDB記事

メタ情報