AIDB Daily Papers

A Surgical Approach to Repairing Insecure Code Generation in LLMs

Original title: Surgical Repair of Insecure Code Generation in LLMs

Authors: Gustavo Sandoval, Brendan Dolan-Gavitt, Siddharth Garg

Published: 2026-04-17 | Fields: LLM, interpretability, security, code generation, cs.CR, cs.LG

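Note: The Japanese title and key points on the source page were generated automatically by AI. Refer to the original paper for the exact content.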

Key Points

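The study attributes vulnerabilities in code generated by large language models to a conflict with format compliance rather than to a lack of knowledge.
It exposes a gap between a model's ability to identify vulnerabilities and its ability to generate secure code (the Format-Reliability Gap), which suggests the issue is an interpretability problem.
Applying steering vectors specialized for each vulnerability substantially reduces insecure code generation, and the effect was confirmed across multiple models and vulnerability types.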

Abstract

Large language models write production code, and yet they routinely introduce well-known vulnerabilities. We show that this is not a knowledge deficit: the same models that generate insecure code correctly identify and explain the vulnerability when asked directly, a gap we call the Format-Reliability Gap. Mechanistic analysis reveals the cause: security representations are encoded from the earliest layers but remain computationally inert until the final layer, where format-compliance demands compete with them. Because the failure is localized to a single layer, per-vulnerability steering vectors reduce insecure generation by up to 74% with negligible overhead. The mechanism and the fix generalize across five models, three architecture families, and six vulnerability types, suggesting insecure code generation is an interpretability problem, not a training artifact.
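
The abstract does not say how the steering vectors are built or applied, so the following is only a minimal sketch of one common construction, assuming a difference-of-means vector added to a single layer's hidden state. All function names, the `vuln_type_a` key, the `alpha` scale, and the toy 3-dimensional activations are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def steering_vector(secure_acts, insecure_acts):
    """Difference of means: points from the insecure cluster toward the secure one."""
    mu_secure = mean_vector(secure_acts)
    mu_insecure = mean_vector(insecure_acts)
    return [s - i for s, i in zip(mu_secure, mu_insecure)]


def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to one hidden state (assumed: final layer)."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy "final-layer activations" with made-up numbers, for illustration only.
secure_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
insecure_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

# One vector per vulnerability type; "vuln_type_a" is a placeholder label.
vectors = {"vuln_type_a": steering_vector(secure_acts, insecure_acts)}

# Nudge a hidden state halfway along the steering direction.
steered = steer([0.0, 2.0, 2.0], vectors["vuln_type_a"], alpha=0.5)
```

In a real model the same addition would typically be applied to the chosen layer's activations during generation. Because it is a single vector addition per step, the extra cost is small, which is consistent with the "negligible overhead" the abstract reports.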


Metadata

arXiv ID: 2604.16697
Categories: cs.CR, cs.LG
