AIDB Daily Papers
Can AI Learn Scientific Taste? Reinforcement Learning from Community Feedback
※ The Japanese title and key points were generated automatically by AI. Please refer to the original paper for accurate details.
Key Points
- This work proposes RLCF, a new training paradigm that uses large-scale community signals as supervised data.
- Improving an AI scientist's scientific taste remains underexplored; learning the ability to judge and propose research ideas with high potential impact is essential.
- Experiments show the proposed method outperforms existing LLMs and generalizes to future years, unseen fields, and peer-reviewer preferences.
Abstract
Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's executive capability, while enhancing an AI's scientific taste remains underexplored. In this work, we propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers to judge ideas. For preference alignment, using Scientific Judge as a reward model, we train a policy model, Scientific Thinker, to propose research ideas with high potential impact. Experiments show Scientific Judge outperforms SOTA LLMs (e.g., GPT-5.2, Gemini 3 Pro) and generalizes to a future-year test, unseen fields, and peer-review preference. Furthermore, Scientific Thinker proposes research ideas with higher potential impact than baselines. Our findings show that AI can learn scientific taste, marking a key step toward reaching human-level AI scientists.
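The abstract does not specify the training objective, but preference modeling over matched paper pairs is conventionally done with a Bradley-Terry-style pairwise loss. The sketch below is an assumption for illustration (the function name and scalar-score setup are hypothetical, not from the paper): the judge is trained so that the high-citation paper in each field- and time-matched pair receives a higher score than its low-citation counterpart.

```python
import math


def pairwise_preference_loss(score_high: float, score_low: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(score_high - score_low).

    Minimizing this pushes the judge to score the high-citation paper
    in a matched pair above the low-citation one. Hypothetical sketch,
    not the paper's actual implementation.
    """
    margin = score_high - score_low
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Toy usage: a correctly ordered pair yields a small loss,
# an inverted pair a large one.
good = pairwise_preference_loss(2.0, -1.0)   # judge prefers the right paper
bad = pairwise_preference_loss(-1.0, 2.0)    # judge prefers the wrong paper
assert good < bad
```

In the alignment stage, such a judge would then serve as the reward signal for policy-gradient training of the idea-proposing model, in the usual RLHF pattern with community signals standing in for human labels.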