AIDB Daily Papers
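Scientific Discovery via LLM-Driven Equation Search: Influence-Guided Symbolic Regression
Note: The title and key points here were automatically generated by AI. Refer to the original paper for the authoritative content.
Key Points
- Proposes a method for LLM-based scientific discovery that addresses two weaknesses of current equation search: inefficient search strategies and coarse feedback signals.
- The method scores the influence of each candidate function term and uses those scores to refine the model structure systematically, outperforming prior approaches that rely only on global error metrics.
- Effectiveness is demonstrated on diverse benchmarks and real-world genomic data, where the method uncovered a novel biological relationship that was then validated experimentally.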
Abstract
Large Language Models (LLMs) offer a promising avenue for scientific discovery, yet their application to symbolic regression is often constrained by inefficient search strategies and coarse feedback signals. Current methods typically guide LLMs using scalar metrics (e.g., global Mean Squared Error), which fail to identify which components of a proposed equation are driving performance or causing error. We introduce Influence-Guided Symbolic Regression (IGSR), a method that frames equation discovery as an iterative two-step process combining diverse term generation with rigorous selection: an LLM generates candidate basis functions $\psi_j(\mathbf{x})$ for a linear model, which are then evaluated using granular influence scores $\Delta_j$. These scores quantify each term's marginal contribution to generalization accuracy, enabling an influence-guided pruning process that systematically refines the model structure. Integrating this mechanism into a Monte Carlo Tree Search (MCTS) enables navigating the combinatorial search space while balancing exploration of novel functional forms with exploitation of high-influence components. We demonstrate IGSR's effectiveness on a diverse suite of benchmarks, including LLM-SRBench, pharmacological PKPD models, an epidemiological simulation, and real-world genomic data. Notably, we validate the framework's capacity for genuine discovery in a case study using a high-dimensional biological dataset, in which IGSR identified a novel relationship between DNA methylation and RNA Polymerase II pausing, a hypothesis that was subsequently supported via wet-lab experimentation.
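To make the selection step concrete, here is a minimal sketch of influence-guided pruning for a linear model $\hat{y} = \sum_j \beta_j \psi_j(\mathbf{x})$. The abstract does not give the exact definition of $\Delta_j$, so this sketch assumes a leave-one-term-out definition: $\Delta_j$ is the increase in validation MSE when term $j$ is dropped and the model is refit. The coefficient symbol $\beta_j$, the helper names (`val_mse`, `influence_scores`, `prune`), the threshold, and the synthetic data are all illustrative assumptions, and the hand-written candidate terms stand in for what an LLM would propose. The MCTS component is not shown.

```python
import numpy as np

def val_mse(terms, X_tr, y_tr, X_va, y_va):
    """Fit a linear model over the given basis terms; return validation MSE."""
    if not terms:
        return float(np.mean(y_va ** 2))  # empty model predicts zero
    Phi_tr = np.column_stack([f(X_tr) for f in terms])
    Phi_va = np.column_stack([f(X_va) for f in terms])
    beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    return float(np.mean((Phi_va @ beta - y_va) ** 2))

def influence_scores(terms, X_tr, y_tr, X_va, y_va):
    """ASSUMED definition: Delta_j = MSE(without term j) - MSE(all terms)."""
    full = val_mse(terms, X_tr, y_tr, X_va, y_va)
    return [
        val_mse(terms[:j] + terms[j + 1:], X_tr, y_tr, X_va, y_va) - full
        for j in range(len(terms))
    ]

def prune(names, terms, X_tr, y_tr, X_va, y_va, threshold=1e-3):
    """Keep only terms whose removal hurts validation MSE by more than threshold."""
    scores = influence_scores(terms, X_tr, y_tr, X_va, y_va)
    keep = [j for j, d in enumerate(scores) if d > threshold]
    return [names[j] for j in keep], [terms[j] for j in keep], scores

# Synthetic demo (hypothetical data): y = 2*x0^2 + 3*sin(x1) + small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(400, 2))
y = 2.0 * X[:, 0] ** 2 + 3.0 * np.sin(X[:, 1]) + 0.01 * rng.normal(size=400)
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]

# Candidate basis functions; in IGSR these would come from the LLM.
names = ["x0**2", "sin(x1)", "x0*x1", "x0"]
terms = [
    lambda X: X[:, 0] ** 2,
    lambda X: np.sin(X[:, 1]),
    lambda X: X[:, 0] * X[:, 1],
    lambda X: X[:, 0],
]

kept_names, kept_terms, scores = prune(names, terms, X_tr, y_tr, X_va, y_va)
for name, delta in zip(names, scores):
    print(f"{name:8s} Delta = {delta:.6f}")
print("kept:", kept_names)
```

Unlike a single global MSE, the per-term scores show which components carry the fit: the two true terms receive large scores, while the spurious ones score near zero and are pruned. Measuring the scores on held-out data rather than training data is what ties them to generalization accuracy, as the abstract describes.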