AIDB Daily Papers
Learning to Self-Evolve: A Novel Framework in Which LLMs Improve Their Own Contexts
※ The title and key points are AI-generated. Please consult the original paper for the accurate content.
Key Points
- Introduces LSE, a reinforcement learning framework in which large language models (LLMs) improve their own contexts at test time.
- Unlike existing approaches, which rely on the model's inherent reasoning ability, LSE is novel in that it explicitly trains the model for this task, improving performance.
- A 4B-parameter model trained with LSE outperforms GPT-5 and Claude Sonnet 4.5, and can guide other models without additional training.
Abstract
We introduce Learning to Self-Evolve (LSE), a reinforcement learning framework that trains large language models (LLMs) to improve their own contexts at test time. We situate LSE in the setting of test-time self-evolution, where a model iteratively refines its context from feedback on seen problems to perform better on new ones. Existing approaches rely entirely on the inherent reasoning ability of the model and never explicitly train it for this task. LSE reduces the multi-step evolution problem to a single-step RL objective, where each context edit is rewarded by the improvement in downstream performance. We pair this objective with a tree-guided evolution loop. On Text-to-SQL generation (BIRD) and general question answering (MMLU-Redux), a 4B-parameter model trained with LSE outperforms self-evolving policies powered by GPT-5 and Claude Sonnet 4.5, as well as prompt optimization methods including GEPA and TextGrad, and transfers to guide other models without additional training. Our results highlight the effectiveness of treating self-evolution as a learnable skill.
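The abstract's core idea — rewarding each context edit by the improvement in downstream performance — can be illustrated with a minimal sketch. This is not the authors' code: `downstream_score` is a hypothetical stand-in for real task evaluation (the paper runs the LLM on benchmarks such as BIRD and MMLU-Redux), and the keyword-matching scorer exists only to make the reward computable here.

```python
# Hypothetical sketch of LSE's single-step reward, assuming a toy
# performance metric in place of actual LLM evaluation.

def downstream_score(context: str, problems: list[str]) -> float:
    """Toy stand-in for downstream performance: fraction of problem
    topics covered by the context. Real LSE would evaluate the LLM
    on held-out problems instead."""
    hits = sum(1 for p in problems if p in context)
    return hits / len(problems)

def edit_reward(before: str, after: str, problems: list[str]) -> float:
    """Single-step RL objective: reward one context edit by the
    resulting improvement in downstream performance."""
    return downstream_score(after, problems) - downstream_score(before, problems)

problems = ["join", "groupby", "subquery"]
before = "Notes: use explicit join conditions."
after = before + " Prefer groupby over repeated subquery scans."

r = edit_reward(before, after, problems)
print(r)  # positive: the edit covers two additional problem topics
```

Under this framing, the multi-step evolution loop decomposes into independent single-step decisions, each scored by its own performance delta — which is what makes the objective trainable with standard RL.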