AIDB Daily Papers

LLMを活用したストーリーポイントの見積もり：アジャイル開発を効率化

原題: Story Point Estimation Using Large Language Models

著者: Pranam Prakash Shetty, Adarsh Balakrishnan, Mengqiao Xu, Xiaoyin Xi, Zhe Yu

公開日: 2026-03-06 | 分野: LLM 効率化機械学習ソフトウェアアノテーション開発プロジェクトマネジメント

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

大規模言語モデル(LLM)を用いて、アジャイル開発におけるストーリーポイントの見積もりを試みました。
従来は教師あり学習が主流でしたが、LLMは少ないデータでも高い予測精度を発揮し、開発者の負担軽減に繋がります。
実験の結果、LLMは教師あり学習モデルを上回る精度を示し、比較判断をFew-shot学習に用いることで更なる向上が見られました。

Abstract

This study investigates the use of large language models (LLMs) for story point estimation. Story points are unitless, project-specific effort estimates that help developers on the scrum team forecast which product backlog items they plan to complete in a sprint. To facilitate this process, machine learning models, especially deep neural networks, have been applied to predict the story points based on the title and description of each item. However, such machine learning models require sufficient amounts of training data (with ground truth story points annotated by human developers) from the same software project to achieve decent prediction performance. This motivated us to explore whether LLMs are capable of (RQ1) predicting story points without training data or (RQ2) with only a few training data points. Our empirical results with four LLMs on 16 software projects show that, without any training data (zero-shot prompting), LLMs can predict story points better than supervised deep learning models trained on 80% of the data. The prediction performance of LLMs can be further improved with a few training examples (few-shot prompting). In addition, a recent study explored the use of comparative judgments (between a given pair of items which one requires more effort to implement) instead of directly annotating the story points to reduce the cognitive burden on developers. Therefore, this study also explores (RQ3) whether comparative judgments are easier to predict than story points for LLMs and (RQ4) whether comparative judgments can serve as few-shot examples for LLMs to improve their predictions of story points. Empirical results suggest that it is not easier for LLMs to predict comparative judgments than to directly estimate the story points, but comparative judgments can serve as few-shot examples to improve the LLMs' prediction performance as well as the human-annotated story points.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

💬 ディスカッション

ディスカッションに参加するにはログインが必要です。

ログイン / アカウント作成 →

arxivで読む PDFを開く

メタ情報

arxiv ID: 2603.06276
カテゴリ: cs.SE

ポイント

Abstract

Paper AI Chat

💬 ディスカッション

関連するAIDB記事

メタ情報