AIDB Daily Papers

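Benchmarking Power Flow Computation with Large Language Models: Are Structured Prompts Effective?

Original title: A Benchmark on LLM-Based Power Flow Computation: Do More Structured Prompts Help?

Authors: Tingwei Chen, Kaiyang Huang, Kai Sun

Published: 2026-05-18 | Fields: LLM, Benchmark, Prompt Engineering, AI Assistance

Note: The translated title and the key points were generated automatically by AI. Consult the original paper for the exact content.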

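Key Points

The effect of prompt format on power flow computation was evaluated using three large language models (LLMs).
Structured prompts were shown to reduce LLM computational accuracy in some cases compared with simple narrative prompts.
None of the LLMs currently achieved enough reliability to be used as a direct numerical solver.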

Abstract

We present a controlled benchmark evaluating three LLMs -- Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-3.5 Turbo -- across four prompt formats (from concise narrative to structured JSON with explicit iteration trace) on Gauss--Seidel AC power flow computation for a three-bus system. Against 50 test cases with reference solutions computed numerically, Gemini 2.5 Pro with the simplest narrative prompt achieves the lowest mean absolute error (MAE = 0.257 MW/MVar, 54% of cases within 5% relative error), while the same model with a JSON-structured prompt raises MAE to 0.789 -- a 3.1$\times$ increase. Adding a worked example degrades accuracy for Gemini but provides a marginal gain for Claude. GPT-3.5 Turbo fails on at least 90% of cases under all prompt formats. An independent 100-case replication with related prompt-format families confirms the qualitative ordering (Gemini $>$ Claude $>$ GPT-3.5): the best 100-case configuration (Gemini with explicit iteration trace) achieves MAE = 0.402 and 53% within 5%, while Claude Sonnet 4.5's near-flat accuracy profile ($\approx$38% within 5% across formats) and GPT-3.5's near total ineffectiveness (92--97% above 20% error) both replicate. In neither evaluation does any configuration achieve sufficient reliability for use as a direct numerical solver. These findings offer a diagnostic baseline for practitioners and researchers evaluating LLMs for smart-grid decision-support assistance.
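
For readers unfamiliar with the task, the following is a minimal sketch of how a numerical reference solution for a three-bus Gauss--Seidel AC power flow could be computed, and how MAE and a "within 5%" rate could be scored. It is not the authors' code. The line impedances, loads, slack voltage, bus types (one slack bus plus two PQ buses), and the function names `build_ybus`, `gauss_seidel_pq`, and `error_metrics` are all hypothetical, and the per-case definition of "within 5% relative error" used here is an assumption, since the abstract does not spell it out.

```python
import numpy as np


def build_ybus(n_bus, lines):
    """Assemble the bus admittance matrix from (from_bus, to_bus, series_z) tuples."""
    ybus = np.zeros((n_bus, n_bus), dtype=complex)
    for i, j, z in lines:
        y = 1.0 / z
        ybus[i, i] += y
        ybus[j, j] += y
        ybus[i, j] -= y
        ybus[j, i] -= y
    return ybus


def gauss_seidel_pq(ybus, s_inj, v_slack=1.0 + 0.0j, tol=1e-10, max_iter=500):
    """Gauss-Seidel power flow with bus 0 as slack and every other bus as PQ.

    s_inj[i] is the net complex power injection (generation minus load) in
    per unit; the entry for the slack bus is ignored.
    """
    n = ybus.shape[0]
    v = np.ones(n, dtype=complex)  # flat start
    v[0] = v_slack
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n):
            # Sum of Y_ik * V_k over k != i, using the most recent voltages.
            coupling = ybus[i] @ v - ybus[i, i] * v[i]
            v_new = (np.conj(s_inj[i]) / np.conj(v[i]) - coupling) / ybus[i, i]
            max_change = max(max_change, abs(v_new - v[i]))
            v[i] = v_new
        if max_change < tol:
            return v, iteration
    raise RuntimeError("Gauss-Seidel did not converge")


def error_metrics(predicted, reference, rel_tol=0.05):
    """Return (MAE, fraction of entries within rel_tol relative error).

    Assumed scoring rule: an entry counts as 'within' when its absolute
    error is at most rel_tol times the magnitude of the reference value.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    abs_err = np.abs(predicted - reference)
    within = np.mean(abs_err <= rel_tol * np.abs(reference))
    return abs_err.mean(), within


# Hypothetical three-bus network in per unit (illustrative values only).
lines = [
    (0, 1, 0.02 + 0.06j),
    (0, 2, 0.08 + 0.24j),
    (1, 2, 0.06 + 0.18j),
]
ybus = build_ybus(3, lines)

# Loads at buses 1 and 2 enter as negative injections; bus 0 is the slack.
s_inj = np.array([0.0, -(0.40 + 0.20j), -(0.60 + 0.25j)])

v, iterations = gauss_seidel_pq(ybus, s_inj, v_slack=1.05 + 0.0j)
s_calc = v * np.conj(ybus @ v)  # complex power injected at every bus

print("iterations:", iterations)
print("|V| (p.u.):", np.abs(v))
print("angle (deg):", np.degrees(np.angle(v)))
print("slack injection (p.u.):", s_calc[0])
```

In a benchmark of this kind, a model's reported bus powers or voltages would be passed to `error_metrics` together with the solver output. As a check on the abstract's arithmetic, the reported 3.1$\times$ increase is the ratio 0.789 / 0.257, which is about 3.07.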


Metadata

arXiv ID: 2605.18642
Category: eess.SY
