AIDB Daily Papers

Is Vibe Coding the Future of Construction? An Empirical Evaluation of the Safety of LLM-Generated Code

Original title: Is Vibe Coding the Future? An Empirical Assessment of LLM Generated Codes for Construction Safety

Author: S M Jamil Uddin

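Published: 2026-04-14 | Topics: Errors, GPT-4, Automatic generation, Code generation, Software engineering, Construction, Human-computer interaction, Deep learning, Natural language processing, LLM, Python, Prompts, Evaluation, Risk, AI, Machine learning, Safety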

Note: The translated title and the key points below were generated automatically by AI. Refer to the original paper for accurate details.

Key Points

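This study evaluates the use of code that LLMs generate from natural-language instructions (vibe coding) for construction safety applications.
It quantifies the risk of "silent failures", a consequence of the probabilistic nature of LLMs, in which code appears to run normally but contains flawed safety logic.
Among GPT-4o-Mini scripts that executed successfully, about 56% produced mathematically incorrect outputs, indicating that standalone use of LLMs for safety engineering requires strict governance.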
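The two headline metrics use different denominators: execution viability is measured over all generated scripts, while the silent failure rate is measured only over scripts that executed successfully. The sketch below makes that distinction concrete. It is a minimal illustration assuming each script has already been labeled with two hypothetical booleans (`executed_ok` from the sandbox, `logic_correct` from the judge); the record type, field names, and toy data are not from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScriptResult:
    # Hypothetical labels for one generated script.
    executed_ok: bool    # ran to completion in the sandbox
    logic_correct: bool  # judged numerically correct (meaningful only if executed_ok)


def execution_viability(results: Iterable[ScriptResult]) -> float:
    """Share of ALL generated scripts that executed successfully."""
    results = list(results)
    if not results:
        raise ValueError("no results to score")
    return sum(r.executed_ok for r in results) / len(results)


def silent_failure_rate(results: Iterable[ScriptResult]) -> float:
    """Share of SUCCESSFULLY EXECUTED scripts whose output is wrong."""
    executed = [r for r in results if r.executed_ok]
    if not executed:
        raise ValueError("no successfully executed scripts")
    return sum(not r.logic_correct for r in executed) / len(executed)


# Toy data (not the paper's): 10 scripts, 8 run, 3 of those 8 are wrong.
toy = (
    [ScriptResult(True, True)] * 5
    + [ScriptResult(True, False)] * 3
    + [ScriptResult(False, False)] * 2
)
print(execution_viability(toy))  # 0.8
print(silent_failure_rate(toy))  # 0.375
```

Because the denominators differ, the share of all scripts that run but return wrong results is the product of the two rates. Under this reading, the abstract's overall figures (~85% and ~45%) would correspond to roughly 38% of all generated scripts.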

Abstract

The emergence of vibe coding, a paradigm where non-technical users instruct Large Language Models (LLMs) to generate executable codes via natural language, presents both significant opportunities and severe risks for the construction industry. While empowering construction personnel such as the safety managers, foremen, and workers to develop tools and software, the probabilistic nature of LLMs introduces the threat of silent failures, wherein generated code compiles perfectly but executes flawed mathematical safety logic. This study empirically evaluates the reliability, software architecture, and domain-specific safety fidelity of 450 vibe-coded Python scripts generated by three frontier models, Claude 3.5 Haiku, GPT-4o-Mini, and Gemini 2.5 Flash. Utilizing a persona-driven prompt dataset (n=150) and a bifurcated evaluation pipeline comprising isolated dynamic sandboxing and an LLM-as-a-Judge, the research quantifies the severe limits of zero-shot vibe codes for construction safety. The findings reveal a highly significant relationship between user persona and data hallucination, demonstrating that less formal prompts drastically increase the AI's propensity to invent missing safety variables. Furthermore, while the models demonstrated high foundational execution viability (~85%), this syntactic reliability actively masked logic deficits and a severe lack of defensive programming. Among successfully executed scripts, the study identified an alarming ~45% overall Silent Failure Rate, with GPT-4o-Mini generating mathematically inaccurate outputs in ~56% of its functional code. The results demonstrate that current LLMs lack the deterministic rigor required for standalone safety engineering, necessitating the adoption of deterministic AI wrappers and strict governance for cyber-physical deployments.
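The abstract recommends "deterministic AI wrappers" but does not describe their design here. One generic reading is a fixed, hand-written function that owns the safety arithmetic and fails closed when an input is missing or invalid, instead of letting generated code invent a plausible default. The sketch below assumes a simple load-to-capacity check. The field names, the `limit` parameter, and the check itself are hypothetical illustrations, not the paper's method or any engineering standard.

```python
import math
from typing import Mapping, Optional


class MissingSafetyInput(ValueError):
    """Raised instead of silently inventing a value for a required input."""


# Hypothetical required inputs for an illustrative lift check.
REQUIRED_FIELDS = ("applied_load_kg", "rated_capacity_kg")


def load_utilization(inputs: Mapping[str, Optional[float]]) -> float:
    """Return applied load / rated capacity, refusing to guess missing data."""
    # Fail closed: every required field must be explicitly provided.
    missing = [f for f in REQUIRED_FIELDS if inputs.get(f) is None]
    if missing:
        raise MissingSafetyInput(f"missing required inputs: {', '.join(missing)}")

    load = float(inputs["applied_load_kg"])
    capacity = float(inputs["rated_capacity_kg"])

    # Reject values that would make the ratio meaningless.
    if not (math.isfinite(load) and math.isfinite(capacity)):
        raise ValueError("inputs must be finite numbers")
    if load < 0:
        raise ValueError("applied_load_kg must be non-negative")
    if capacity <= 0:
        raise ValueError("rated_capacity_kg must be positive")
    return load / capacity


def check_lift(inputs: Mapping[str, Optional[float]], limit: float = 1.0) -> bool:
    """True only if utilization is at or below the (hypothetical) limit."""
    return load_utilization(inputs) <= limit


print(check_lift({"applied_load_kg": 500, "rated_capacity_kg": 1000}))   # True
print(check_lift({"applied_load_kg": 1200, "rated_capacity_kg": 1000}))  # False
```

The design choice is that an informal prompt omitting the rated capacity produces an explicit `MissingSafetyInput` error rather than a number, which is the opposite of the hallucinated-variable behavior the study reports.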


Metadata

arXiv ID: 2604.12311
Categories: cs.SE, cs.AI, cs.HC
