AIDB Daily Papers

Optimization Before Evaluation: Evaluating with Unoptimized Prompts Can Be Misleading

Original title: Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

Authors: Nicholas Sadjoli, Tim Siefken, Atin Ghosh, Yifan Mai, Daniel Dahlmeier

Published: 2026-04-30 | Fields: LLM, AI, evaluation optimization, cs.AI, prompt engineering

Note: The Japanese title and key points were generated automatically by AI. Please consult the original paper for the exact content.

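Key points

This study investigates how prompt optimization affects LLM evaluation.
Conventional evaluation frameworks use the same static prompt for every model, whereas in production the prompt is optimized for each model.
Because prompt optimization strongly affects how models rank in an evaluation, the study finds that optimizing the prompt per model is important when evaluating.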

Abstract

Current Large Language Model (LLM) evaluation frameworks utilize the same static prompt template across all models under evaluation. This differs from the common industry practice of using prompt optimization (PO) techniques to optimize the prompt for each model to maximize application performance. In this paper, we investigate the effect of PO towards LLM evaluations. Our results on public academic and internal industry benchmarks show that PO greatly affects the final ranking of models. This highlights the importance of practitioners performing PO per model when conducting evaluations to choose the best model for a given task.
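
The contrast the abstract draws, one static prompt for all models versus a prompt optimized per model, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the model and prompt names are hypothetical, the scores are made-up numbers rather than results from the paper, and picking the best-scoring candidate prompt is only a stand-in for a real PO technique.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scoring function maps (model, prompt) to a benchmark score.
Score = Callable[[str, str], float]


def rank_static(models: Sequence[str], prompt: str, score: Score) -> List[str]:
    """Rank models when every model is evaluated with the same prompt."""
    return sorted(models, key=lambda m: score(m, prompt), reverse=True)


def rank_optimized(
    models: Sequence[str], prompts: Sequence[str], score: Score
) -> List[str]:
    """Rank models after choosing, per model, its best-scoring candidate prompt."""
    best = {m: max(score(m, p) for p in prompts) for m in models}
    return sorted(models, key=lambda m: best[m], reverse=True)


# Made-up scores for illustration only; these are NOT results from the paper.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("model_a", "prompt_1"): 0.70,
    ("model_a", "prompt_2"): 0.72,
    ("model_b", "prompt_1"): 0.65,
    ("model_b", "prompt_2"): 0.80,
}


def toy_score(model: str, prompt: str) -> float:
    """Look up the hypothetical score for a (model, prompt) pair."""
    return TOY_SCORES[(model, prompt)]


models = ["model_a", "model_b"]
prompts = ["prompt_1", "prompt_2"]

static_ranking = rank_static(models, "prompt_1", toy_score)
optimized_ranking = rank_optimized(models, prompts, toy_score)
print("static prompt:    ", static_ranking)
print("per-model prompts:", optimized_ranking)
```

With these toy numbers the two rankings disagree, which is the kind of reversal the abstract warns about. In a real evaluation the prompt would normally be selected on a development split that is separate from the data used for the final comparison.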

Metadata

arXiv ID: 2604.27637
Category: cs.AI
