AIDB Daily Papers
Spoiler Alert: Story Prediction as a Metric for Narrative Tension in LLM Stories
Note: The title and key points are auto-generated by AI. Please refer to the original paper for accurate details.
Key Points
- Proposes 100-Endings, a new metric for evaluating narrative tension in LLM-generated stories.
- Its novelty lies in measuring narrative tension, a dimension overlooked by existing evaluation rubrics, via the failure rate of ending predictions.
- 100-Endings distinguishes LLM-generated stories from human-written ones and contributes to improving story-generation pipelines.
Abstract
LLMs have so far failed both to generate consistently compelling stories and to recognize this failure--on the leading creative-writing benchmark (EQ-Bench), LLM judges rank zero-shot AI stories above New Yorker short stories, a gold standard for literary fiction. We argue that existing rubrics overlook a key dimension of compelling human stories: narrative tension. We introduce the 100-Endings metric, which walks through a story sentence by sentence: at each position, a model predicts how the story will end 100 times given only the text so far, and we measure tension as how often predictions fail to match the ground truth. Beyond the mismatch rate, the sentence-level curve yields complementary statistics, such as inflection rate, a geometric measure of how frequently the curve reverses direction, tracking twists and revelations. Unlike rubric-based judges, 100-Endings correctly ranks New Yorker stories far above LLM outputs. Grounded in narratological principles, we design a story-generation pipeline using structural constraints, including analysis of story templates, idea formulation, and narrative scaffolding. Our pipeline significantly increases narrative tension as measured by the 100-Endings metric, while maintaining performance on the EQ-Bench leaderboard.
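The core computation described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: `sample_endings` is an assumed stand-in for an LLM sampling call, and exact string match is an assumed stand-in for whatever matching criterion the paper actually uses.

```python
def tension_curve(sentences, sample_endings, true_ending, n_samples=100):
    """For each sentence-level prefix of the story, sample n_samples ending
    predictions and record the fraction that fail to match the true ending."""
    curve = []
    for i in range(1, len(sentences) + 1):
        prefix = " ".join(sentences[:i])
        guesses = sample_endings(prefix, n_samples)  # e.g. 100 LLM samples
        mismatches = sum(g != true_ending for g in guesses)
        curve.append(mismatches / n_samples)
    return curve

def inflection_rate(curve):
    """Fraction of interior points where the tension curve reverses
    direction, a rough proxy for twists and revelations."""
    if len(curve) < 3:
        return 0.0
    reversals = sum(
        1 for a, b, c in zip(curve, curve[1:], curve[2:])
        if (b - a) * (c - b) < 0
    )
    return reversals / (len(curve) - 2)
```

Tension is then summarized from the curve, e.g. as the mean mismatch rate, with the inflection rate as a complementary statistic.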