AIDB Daily Papers

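Efficient Reinforcement Learning for LLMs: Reducing Training Cost with Experience Replay

Original title: Efficient RL Training for LLMs with Experience Replay

Authors: Charles Arnal, Vivien Cabannes, Taco Cohen, Julia Kempe, Remi Munos

Published: 2026-04-09 | Topics: LLM, reinforcement learning, Transformer, machine learning, AI, optimization, deep learning, efficiency, experience replay

Note: The Japanese title and key points on this page were generated automatically by AI. Consult the original paper for the exact content.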

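Key points

This work examines how effective experience replay (reusing past experience) is in reinforcement learning for LLMs.
Fresh data has traditionally been considered essential, but the paper shows that experience replay becomes effective once the trade-off against compute cost is optimized.
It demonstrates that a well-designed replay buffer can substantially reduce inference cost while maintaining or improving performance.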

Abstract

While Experience Replay - the practice of storing rollouts and reusing them multiple times during training - is a foundational technique in general RL, it remains largely unexplored in LLM post-training due to the prevailing belief that fresh, on-policy data is essential for high performance. In this work, we challenge this assumption. We present a systematic study of replay buffers for LLM post-training, formalizing the optimal design as a trade-off between staleness-induced variance, sample diversity and the high computational cost of generation. We show that strict on-policy sampling is suboptimal when generation is expensive. Empirically, we show that a well-designed replay buffer can drastically reduce inference compute without degrading - and in some cases even improving - final model performance, while preserving policy entropy.
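To make the idea of "storing rollouts and reusing them multiple times" concrete, here is a minimal Python sketch of a replay buffer with a staleness cutoff. It is not the paper's algorithm: the names `Rollout`, `ReplayBuffer`, `max_staleness`, `gen_every`, and `count_generations`, the FIFO eviction rule, and uniform sampling are all assumptions made for illustration. The sketch only shows how generating less often and reusing stored rollouts lowers the number of generated samples for a fixed number of training steps.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt: str
    response: str
    reward: float
    policy_version: int  # training step at which this rollout was generated


class ReplayBuffer:
    """Hypothetical FIFO buffer that discards rollouts older than `max_staleness` steps."""

    def __init__(self, capacity: int, max_staleness: int):
        self.items = deque(maxlen=capacity)  # oldest entries fall off when full
        self.max_staleness = max_staleness

    def add(self, rollouts):
        self.items.extend(rollouts)

    def sample(self, batch_size: int, current_version: int, rng: random.Random):
        # Keep only rollouts generated recently enough (assumed staleness rule).
        fresh = [
            r for r in self.items
            if current_version - r.policy_version <= self.max_staleness
        ]
        self.items = deque(fresh, maxlen=self.items.maxlen)
        # Uniform sampling without replacement from the surviving rollouts.
        return rng.sample(fresh, min(batch_size, len(fresh)))


def count_generations(num_steps, gen_every, rollouts_per_gen, batch_size, seed=0):
    """Count generated rollouts when new data is produced only every `gen_every` steps."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity=64, max_staleness=8)
    generated = 0
    for step in range(num_steps):
        if step % gen_every == 0:
            # Stand-in for expensive LLM generation; rewards here are random placeholders.
            new = [
                Rollout(f"prompt-{step}-{i}", "response", rng.random(), step)
                for i in range(rollouts_per_gen)
            ]
            buffer.add(new)
            generated += len(new)
        batch = buffer.sample(batch_size, step, rng)
        # A real trainer would apply a policy update on `batch` here.
    return generated


if __name__ == "__main__":
    print("on-policy (generate every step):", count_generations(8, 1, 4, 4))
    print("with replay (generate every 4 steps):", count_generations(8, 4, 4, 4))
```

In this toy setup, generating every fourth step instead of every step produces a quarter as many rollouts for the same number of updates. Whether that saving comes without a performance cost depends on the staleness, diversity, and compute trade-off that the abstract describes, which this sketch does not model.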

Metadata

arXiv ID: 2604.08706
Category: cs.LG
