AIDB Daily Papers
ゲノム情報に基づいた微生物の生命限界予測エージェント「GGBound」
※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。
ポイント
- 微生物の生理的生命限界をゲノム情報から予測する新しいAIエージェント「GGBound」を開発しました。
- 従来の計算手法の限界を克服し、ゲノムと生理的特性の関連性を統合的に扱う点で重要かつ新規性があります。
- 開発したエージェントは、既存の大規模言語モデルを凌駕する性能を示し、ゲノム埋め込みやツール利用が有効でした。
Abstract
Characterizing the physiological life boundaries of microbial strains, including viable temperature, pH, salinity, substrate utilization, and morphology, is central to biotechnology and ecology, yet traditionally requires exhaustive in vitro screening. Existing computational approaches either treat physiological traits as isolated supervised targets or repurpose biological foundation models as static encoders, leaving the genotype-to-physiology gap largely unbridged. We formulate microbial life-boundary prediction as a unified genome-to-physiology task and address it with a genome-conditioned, tool-augmented LLM agent. To support this task, we curate a strain-centric benchmark from IJSEM, NCBI, and BacDive covering 1,525 strains and 6,448 instances across viability intervals, environmental optima, substrate utilization, categorical traits, and morphology. Architecturally, the agent injects frozen LucaOne genome embeddings into a Qwen backbone via lightweight token fusion, and reasons over a similarity-based RAG module and a Genome-scale Metabolic Model (GEM) perturbation tool. We optimize the agent through a three-stage pipeline of gene-text alignment, agentic SFT on distilled trajectories, and GRPO with a novel counterfactual gene-grounding reward that reinforces the policy only when the authentic genome embedding causally improves correct-token generation relative to a zero-gene ablation. The resulting 4B-parameter agent matches or surpasses substantially larger frontier LLMs, with ablations confirming that genome-token fusion, dynamic tool use, and the counterfactual reward each yield distinct, significant gains.
Paper AI Chat
この論文のPDF全文を対象にAIに質問できます。
質問の例: