AIDB Daily Papers
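A Multi-LLM Pipeline for Extracting Community Knowledge from GitHub Discussions: From Threads to Trajectories
Note: The Japanese title and key points were generated automatically by AI. Please consult the original paper for accurate details.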
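Key Points
- The authors developed an automated pipeline that extracts, from complex GitHub discussion threads, the information developers need to resolve an issue.
- The work is novel in assigning separate tasks to multiple LLMs: analyzing individual comments, taking externally linked resources into account, and synthesizing labeled trajectories.
- Evaluated on 800 GitHub issues, the pipeline extracted high-fidelity trajectories with a 91.7% success rate, which can help developers diagnose problems and serve as training data for LLM agents.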
Abstract
Resolving complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, as developers must first work through long, unstructured, and fragmented issue discussion threads. In this paper, we present SWE-MIMIC-Bench, an issue trajectory dataset generated from raw GitHub discussions using an automated multi-LLM pipeline. Unlike simple summarization, this pipeline uses a group of closed-source LLMs to perform granular tasks: analyzing individual comments with awareness of externally linked resources, classifying comment analyses into label-specific fields (e.g., root cause, solution plan, implementation progress), and synthesizing label-aware trajectories that capture a structured and coherent narrative of the entire discussion thread. Our pipeline uses five closed-source LLM configurations for distinct purposes: label classification, inline code block and external link summarization, comment analysis, label-specific field classification, and trajectory synthesis. By generating concise and reliable trajectories from complex conversation threads, this system can help developers and researchers in the broader software engineering community understand the experience-driven, collaborative approach to issue diagnosis. Furthermore, the generated trajectories can be used to train modern LLM agents to think and act like expert developers. We evaluated our system on 800 real-world GitHub issues drawn from the SWE-Bench-Pro, SWE-Bench-Multilingual, and SWE-Bench-Verified datasets, achieving a 91.7% success rate in extracting 734 high-fidelity reasoning trajectories.
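The five-stage flow described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the names `PipelineConfig`, `Trajectory`, and `run_pipeline`, the prompt wording, the regex-based detection of code blocks and links, and the fixed set of three field names (borrowed from the abstract's examples, whereas the paper's fields are label-specific) are all assumptions. Each LLM configuration is modeled as a plain `str -> str` function so that any model client could be plugged in.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: an "LLM configuration" is any function mapping a prompt to a reply.
LLM = Callable[[str], str]

# Hypothetical field names taken from the abstract's examples; a real schema
# would likely vary with the issue label.
FIELDS = ("root_cause", "solution_plan", "implementation_progress")

CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)  # fenced code inside a comment
LINK = re.compile(r"https?://\S+")                # external links inside a comment


@dataclass
class PipelineConfig:
    """One LLM configuration per stage, mirroring the five purposes listed."""
    label_classifier: LLM
    artifact_summarizer: LLM  # inline code blocks and external links
    comment_analyzer: LLM
    field_classifier: LLM
    trajectory_synthesizer: LLM


@dataclass
class Trajectory:
    label: str
    fields: Dict[str, List[str]] = field(default_factory=dict)
    narrative: str = ""


def run_pipeline(title: str, comments: List[str], cfg: PipelineConfig) -> Trajectory:
    # Stage 1: assign a single label to the whole issue.
    label = cfg.label_classifier(f"Label this issue: {title}").strip()

    fields: Dict[str, List[str]] = {name: [] for name in FIELDS}
    for comment in comments:
        # Stage 2: summarize code blocks and links, only if the comment has any.
        artifacts = CODE_BLOCK.findall(comment) + LINK.findall(comment)
        context = cfg.artifact_summarizer("Summarize:\n" + "\n".join(artifacts)) if artifacts else ""
        # Stage 3: analyze the comment together with its artifact summary.
        analysis = cfg.comment_analyzer(f"Context: {context}\nComment: {comment}")
        # Stage 4: route the analysis into one label-specific field.
        target = cfg.field_classifier(f"[{label}] Which of {FIELDS}? {analysis}").strip()
        if target in fields:  # silently drop analyses routed to unknown fields
            fields[target].append(analysis)

    # Stage 5: synthesize a label-aware narrative from the populated fields.
    summary = "\n".join(f"{k}: {' | '.join(v)}" for k, v in fields.items() if v)
    narrative = cfg.trajectory_synthesizer(f"[{label}] Write a trajectory:\n{summary}")
    return Trajectory(label=label, fields=fields, narrative=narrative)
```

Keeping each stage behind its own callable reflects the abstract's point that each purpose gets a distinct LLM configuration, and it lets the control flow be exercised offline with plain lambdas standing in for real model calls.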