AIDB Daily Papers
Claude Codeの自動モードにおける許可ゲートの脆弱性評価:意図的な曖昧性シナリオでのストレステスト
※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。
ポイント
- AIコーディングエージェントの許可システムであるClaude Codeの自動モードを、意図的な曖昧性を持つタスクで評価しました。
- 本研究は、実際の使用状況とは異なる、曖昧な指示に対する自動モードの脆弱性を独立して評価する初めての試みです。
- その結果、エンドツーエンドの偽陰性率は81.0%と高く、ファイル編集による危険なアクションが検知されないことが課題として明らかになりました。
Abstract
Claude Code's auto mode is the first deployed permission system for AI coding agents, using a two-stage transcript classifier to gate dangerous tool calls. Anthropic reports a 0.4% false positive rate and 17% false negative rate on production traffic. We present the first independent evaluation of this system on deliberately ambiguous authorization scenarios, i.e., tasks where the user's intent is clear but the target scope, blast radius, or risk level is underspecified. Using AmPermBench, a 128-prompt benchmark spanning four DevOps task families and three controlled ambiguity axes, we evaluate 253 state-changing actions at the individual action level against oracle ground truth. Our findings characterize auto mode's scope-escalation coverage under this stress-test workload. The end-to-end false negative rate is 81.0% (95% CI: 73.8%-87.4%), substantially higher than the 17% reported on production traffic, reflecting a fundamentally different workload rather than a contradiction. Notably, 36.8% of all state-changing actions fall outside the classifier's scope via Tier 2 (in-project file edits), contributing to the elevated end-to-end FNR. Even restricting to the 160 actions the classifier actually evaluates (Tier 3), the FNR remains 70.3%, while the FPR rises to 31.9%. The Tier 2 coverage gap is most pronounced on artifact cleanup (92.9% FNR), where agents naturally fall back to editing state files when the expected CLI is unavailable. These results highlight a coverage boundary worth examining: auto mode assumes dangerous actions transit the shell, but agents routinely achieve equivalent effects through file edits that the classifier does not evaluate.
Paper AI Chat
この論文のPDF全文を対象にAIに質問できます。
質問の例: