AIDB Daily Papers

初期設定は脆弱性あり：AI生成コードのセキュリティ脆弱性に関する形式検証研究

原題: Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code

著者: Dominik Blain, Maxime Noiseux

公開日: 2026-04-07 | 分野: LLM 安全性セキュリティ AI ソフトウェア検証プログラミングテストデバッグ脆弱性ソフトウェアエンジニアリング形式化大規模言語モデルコード生成

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

AIコーディング支援ツールが生成するコードの脆弱性を、形式検証によって定量的に評価した。
セキュリティが重要な領域でAI生成コードの利用が拡大する中、その安全性を数学的に証明する点が新しい。
主要モデルの55.8%に脆弱性があり、GPT-4oは62.4%と最も高い割合を示した（D評価以下）。

Abstract

AI coding assistants are now used to generate production code in security-sensitive domains, yet the exploitability of their outputs remains unquantified. We address this gap with Broken by Default: a formal verification study of 3,500 code artifacts generated by seven widely-deployed LLMs across 500 security-critical prompts (five CWE categories, 100 prompts each). Each artifact is subjected to the Z3 SMT solver via the COBALT analysis pipeline, producing mathematical satisfiability witnesses rather than pattern-based heuristics. Across all models, 55.8% of artifacts contain at least one COBALT-identified vulnerability; of these, 1,055 are formally proven via Z3 satisfiability witnesses. GPT-4o leads at 62.4% (grade F); Gemini 2.5 Flash performs best at 48.4% (grade D). No model achieves a grade better than D. Six of seven representative findings are confirmed with runtime crashes under GCC AddressSanitizer. Three auxiliary experiments show: (1) explicit security instructions reduce the mean rate by only 4 points; (2) six industry tools combined miss 97.8% of Z3-proven findings; and (3) models identify their own vulnerable outputs 78.7% of the time in review mode yet generate them at 55.8% by default.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

💬 ディスカッション

ディスカッションに参加するにはログインが必要です。

ログイン / アカウント作成 →

arxivで読む PDFを開く

メタ情報

arxiv ID: 2604.05292
カテゴリ: cs.CR, cs.AI, cs.SE

ポイント

Abstract

Paper AI Chat

💬 ディスカッション

関連するAIDB記事

メタ情報