AIDB Daily Papers
Beyond Precision: Importance-Aware Recall for Evaluating Factuality in Long-Form LLM Generation
Note: The Japanese title and key points were generated automatically by AI. Please consult the original paper for accurate details.
Key Points
- Proposes a new framework for evaluating the factuality of long-form text generated by LLMs.
- Whereas existing methods focus mainly on precision, this work also emphasizes coverage (recall) and introduces importance-based weighting.
- The analysis finds that while LLMs achieve high precision, recall remains a weakness, with particular room for improvement in covering the full set of relevant facts.
Abstract
Evaluating the factuality of long-form output generated by large language models (LLMs) remains challenging, particularly when responses are open-ended and contain many fine-grained factual statements. Existing evaluation methods primarily focus on precision: they decompose a response into atomic claims and verify each claim against external knowledge sources such as Wikipedia. However, this overlooks an equally important dimension of factuality: recall, whether the generated response covers the relevant facts that should be included. We propose a comprehensive factuality evaluation framework that jointly measures precision and recall. Our method leverages external knowledge sources to construct reference facts and determine whether they are captured in generated text. We further introduce an importance-aware weighting scheme based on relevance and salience. Our analysis reveals that current LLMs perform substantially better on precision than on recall, suggesting that factual incompleteness remains a major limitation of long-form generation and that models are generally better at covering highly important facts than the full set of relevant facts.
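The abstract describes scoring a response along two axes: precision (the fraction of its atomic claims that are supported) and an importance-weighted recall (how much of the weighted reference-fact set the response covers). As a minimal sketch of that idea, not the paper's actual implementation, the helper below assumes claim verification and fact matching have already been done, and that each reference fact carries an importance weight derived from relevance and salience (the function name and data layout here are hypothetical):

```python
def factuality_scores(generated_claims, reference_facts):
    """Compute precision and importance-weighted recall.

    generated_claims: list of (claim_text, is_supported) pairs, where
        is_supported indicates the claim was verified against a
        knowledge source.
    reference_facts: list of (fact_text, weight, is_covered) triples,
        where weight is an importance score and is_covered indicates
        the fact appears in the generated response.
    """
    # Precision: share of generated claims that are supported.
    precision = sum(1 for _, ok in generated_claims if ok) / len(generated_claims)
    # Weighted recall: covered importance mass over total importance mass.
    total_weight = sum(w for _, w, _ in reference_facts)
    recall = sum(w for _, w, covered in reference_facts if covered) / total_weight
    # Harmonic mean combines the two into a single score.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 2 of 3 claims supported; covered facts carry 1.5 of 2.0 weight.
claims = [("claim A", True), ("claim B", True), ("claim C", False)]
facts = [("fact 1", 1.0, True), ("fact 2", 0.5, False), ("fact 3", 0.5, True)]
p, r, f1 = factuality_scores(claims, facts)
```

With uniform weights this reduces to ordinary recall; raising the weights of salient facts makes missing an important fact cost more than missing a peripheral one, which matches the paper's observation that models cover highly important facts better than the full relevant set.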