AIDB Daily Papers

Reproducibility Assessment by AI Agents: ARA Supports Scientific Peer Review at Scale

Original title: ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review

Authors: Kevin Riehl, Andres L. Marin, Nikofors Zacharof, Fan Wu, Patrick Langer, Robert Jakob, Anastasios Kouvelas, Georgios Fontaras, Michail A. Makridis

Published: 2026-05-04 | Fields: LLM, AI, reproducibility, cs.LG, cs.DL, AI agents, peer review

Note: The Japanese title and key points were generated automatically by AI. Please check the original paper for the exact content.

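Key points

ARA formalizes the reproducibility assessment of a paper as a structured reasoning task carried out by AI agents.
It matters because it automates reproducibility assessment, which human reviewers struggle to provide as modern research output grows in volume and complexity.
In experiments on 213 ReScience C articles, ARA reconstructed workflows consistently across LLMs, model temperatures, and scientific domains, and reported higher accuracy than earlier results on ReproBench (60.71% vs. 36.84%) and GoldStandardDB (61.68% vs. 43.56%).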

Abstract

Scientific peer review increasingly struggles to assess reproducibility at the scale and complexity of modern research output. Evaluating reproducibility requires reconstructing experimental dependencies, methodological choices, data flows, and result-generating procedures, which often exceeds what human reviewers can provide. Agentic Reproducibility Assessment (ARA) formalizes reproducibility assessment as a structured reasoning task over scientific documents. Given a paper, ARA extracts a directed workflow graph linking sources, methods, experiments, and outputs, then evaluates its reconstructability using structural and content-based scores for reproducibility assessments. Experiments on 213 ReScience C articles - the largest cross-domain benchmark of human-validated computational reproducibility studies considered to date - demonstrate ARA's generalizability and consistent workflow reconstruction and assessment across LLMs, model temperatures, and scientific domains. ARA achieves ~61% accuracy on three benchmarks, and the highest accuracy reported on ReproBench (60.71% vs. 36.84%) and GoldStandardDB (61.68% vs. 43.56%), highlighting its potential to complement human review at scale and enabling next-generation peer review. Code and Data available: https://github.com/AndresLaverdeMarin/agentic_reproducibility_assessment.
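
To make the idea of a directed workflow graph concrete, here is a minimal Python sketch. It is not ARA's implementation: the node names, the `traceable_output_fraction` function, and the scoring rule (the share of outputs that can be traced back to at least one source) are all assumptions chosen for illustration. The abstract states only that ARA uses structural and content-based scores, without defining them.

```python
from collections import deque

# Hypothetical node kinds, named after the four categories in the abstract.
KINDS = {"source", "method", "experiment", "output"}


def reachable_from_sources(nodes, edges):
    """Return every node reachable from a 'source' node along directed edges."""
    adjacency = {name: [] for name in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)
    seen = {name for name, kind in nodes.items() if kind == "source"}
    queue = deque(seen)
    while queue:  # breadth-first traversal
        current = queue.popleft()
        for nxt in adjacency[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def traceable_output_fraction(nodes, edges):
    """Illustrative structural score (an assumption, not ARA's metric):
    the fraction of output nodes that can be traced back to a source."""
    outputs = [name for name, kind in nodes.items() if kind == "output"]
    if not outputs:
        return 0.0
    seen = reachable_from_sources(nodes, edges)
    return sum(1 for name in outputs if name in seen) / len(outputs)


# Toy workflow extracted from an imaginary paper.
nodes = {
    "dataset": "source",
    "preprocessing": "method",
    "training_run": "experiment",
    "table_1": "output",
    "figure_2": "output",  # no incoming path, so it cannot be traced to a source
}
assert set(nodes.values()) <= KINDS

edges = [
    ("dataset", "preprocessing"),
    ("preprocessing", "training_run"),
    ("training_run", "table_1"),
]

score = traceable_output_fraction(nodes, edges)
print(score)
```

In this toy graph one of the two outputs has no path from any source, so the score drops below 1. A real assessment would also need content-based checks, for example whether each method node is described in enough detail to be rerun.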

Metadata

arXiv ID: 2605.02651
Categories: cs.DL, cs.LG
