AIDB Daily Papers

Sakura: An Approach for Generating Complex Tests from Natural Language Descriptions

Original title: Sakura: An Approach for Generating Complex Tests from Natural Language Test Descriptions

Authors: Tyler Stennett, Rangeet Pan, Bridget McGinn, Alessandro Orso, Saurabh Sinha

Published: 2026-05-30 | Fields: Natural Language Processing, Software Engineering, cs.SE, Test Generation, AI Assistance

Note: The Japanese title and key points were generated automatically by AI. Please refer to the original paper for accurate details.

Key Points

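Proposes Sakura, a framework for generating structurally complex test cases from natural language descriptions.
Its significance is that it aims to overcome the limitations of existing approaches by generating complex tests that span multiple methods, closer to the tests developers actually write.
Sakura achieves substantially higher test compilability and coverage overlap than off-the-shelf agentic tools (50-78% and 38-66% higher, respectively, than baselines using the same models).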

Abstract

Testing is a core activity in software development workflows, and research on its automation has spanned several decades. Most existing approaches generate unit tests for individual methods, validate isolated API endpoints, or target user interface (UI) layers, with non-API and non-UI automated test generators typically exercising only a single focal method. Recent empirical evidence shows a substantial gap between such generated tests and developer-written ones, which often span multiple focal methods, involve complex call sequences, and contain elaborate assertions that current automated approaches fail to capture. To address this gap, we propose generating tests from natural language (NL) descriptions of developer intent. We present Sakura, the first agent-based framework for generating structurally complex test cases from NL descriptions. Sakura decomposes NL descriptions into structured blocks and processes them using a multi-agent system consisting of a localization agent that grounds test steps in concrete application code via static analysis, a composition agent that synthesizes compilable test code and iteratively refines it using execution feedback, and a supervisor agent that coordinates agent interactions. To evaluate Sakura, we curate a novel dataset of NL test descriptions at three levels of abstraction, systematically generated from developer-written tests mined from Apache Commons projects. Across 20 applications and 1,464 test scenarios, Sakura significantly outperforms off-the-shelf agentic tools such as Gemini CLI. Specifically, Sakura achieves 50-78% higher test compilability and 38-66% higher coverage overlap with ground-truth tests compared to baselines using the same models. Moreover, Sakura paired with small open-source models such as Devstral Small 2 and Qwen3-Coder outperforms Gemini CLI using large proprietary models, while also being more cost-effective.
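The abstract describes a pipeline in which an NL description is decomposed into structured blocks, a localization agent grounds each block in application code, a composition agent drafts a test and repairs it from execution feedback, and a supervisor coordinates the loop. To make that division of labour concrete, here is a minimal Python sketch under heavy assumptions: the `Block` schema, the keyword-based `CODE_INDEX`, the stubbed `compile_check`, and every function name are hypothetical stand-ins, not Sakura's implementation. In the real system the agents are LLM-driven and localization uses actual static analysis rather than a dictionary lookup.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a static-analysis index: keyword -> application method.
CODE_INDEX = {
    "push": "Stack.push(Object)",
    "pop": "Stack.pop()",
    "empty": "Stack.isEmpty()",
}

@dataclass
class Block:
    kind: str   # "action" or "assert" (assumed schema, not the paper's)
    text: str   # the NL fragment for this step
    targets: list[str] = field(default_factory=list)  # grounded method signatures

def decompose(description: str) -> list[Block]:
    """Split an NL description into blocks (naive sentence split, assumption)."""
    blocks = []
    for sentence in filter(None, (s.strip() for s in description.split("."))):
        lowered = sentence.lower()
        kind = "assert" if lowered.startswith(("check", "verify", "assert")) else "action"
        blocks.append(Block(kind, sentence))
    return blocks

def localize(block: Block) -> Block:
    """Localization step: ground the block in code via the hypothetical index."""
    words = block.text.lower().split()
    block.targets = [sig for key, sig in CODE_INDEX.items() if key in words]
    return block

def compose(blocks: list[Block]) -> list[str]:
    """Composition step: first-draft Java-like test lines from grounded blocks."""
    lines = []
    for block in blocks:
        for sig in block.targets:
            call = "stack." + sig.split(".", 1)[1].replace("(Object)", "(1)")
            lines.append(f"assertTrue({call});" if block.kind == "assert" else f"{call};")
    return lines

def compile_check(lines: list[str]) -> list[str]:
    """Stub for compiler/execution feedback: flags an undeclared `stack` receiver."""
    declared = any(line.startswith("Stack stack =") for line in lines)
    used = any("stack." in line for line in lines)
    return ["cannot find symbol: stack"] if used and not declared else []

def refine(lines: list[str], errors: list[str]) -> list[str]:
    """Composition repair step driven by the reported error message."""
    if "cannot find symbol: stack" in errors:
        return ["Stack stack = new Stack();"] + lines
    return lines

def supervise(description: str, max_rounds: int = 3) -> tuple[list[str], int]:
    """Supervisor: decompose -> localize -> compose, then refine until it compiles."""
    lines = compose([localize(b) for b in decompose(description)])
    rounds = 0
    while errors := compile_check(lines):
        if rounds == max_rounds:
            raise RuntimeError(f"still failing after {max_rounds} refinements: {errors}")
        lines = refine(lines, errors)
        rounds += 1
    return lines, rounds

if __name__ == "__main__":
    body, rounds = supervise("Push 1 onto the stack. Pop it. Verify the stack is empty.")
    print("\n".join(body))
    print("refinement rounds:", rounds)
```

On the example description, the first draft omits the `Stack` fixture, the stub checker reports it, and one refinement round prepends the declaration. Separating grounding from composition in this way means compile errors can be repaired without re-deciding which methods the test should call.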


Metadata

arXiv ID: 2606.00530
Category: cs.SE
