AIDB Daily Papers
Discovering Multi-Agent Learning Algorithms with Large Language Models
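Note: The Japanese title and key points on the source page were generated automatically by AI. Please consult the original paper for the exact content.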
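Key Points
- Uses AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multi-agent learning algorithms.
- Conventional MARL algorithm design has relied on manual trial and error; this work automates the process, enabling a more efficient search over algorithms.
- Evolves two new algorithms, Volatility-Adaptive Discounted CFR (VAD-CFR) and Smoothed Hybrid Optimistic Regret PSRO (SHOR-PSRO), and reports performance exceeding existing methods.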
Abstract
Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid theoretical ground, the design of their most effective variants often relies on human intuition to navigate a vast algorithmic design space. In this work, we propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multi-agent learning algorithms. We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning. First, in the domain of iterative regret minimization, we evolve the logic governing regret accumulation and policy derivation, discovering a new algorithm, Volatility-Adaptive Discounted (VAD-)CFR. VAD-CFR employs novel, non-intuitive mechanisms (including volatility-sensitive discounting, consistency-enforced optimism, and a hard warm-start policy accumulation schedule) to outperform state-of-the-art baselines like Discounted Predictive CFR+. Second, in the regime of population-based training algorithms, we evolve training-time and evaluation-time meta-strategy solvers for PSRO, discovering a new variant, Smoothed Hybrid Optimistic Regret (SHOR-)PSRO. SHOR-PSRO introduces a hybrid meta-solver that linearly blends Optimistic Regret Matching with a smoothed, temperature-controlled distribution over best pure strategies. By dynamically annealing this blending factor and diversity bonuses during training, the algorithm automates the transition from population diversity to rigorous equilibrium finding, yielding superior empirical convergence compared to standard static meta-solvers.
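The hybrid meta-solver described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's SHOR-PSRO implementation: the abstract gives no formulas, so every function name, the use of the last instantaneous regret as the optimistic prediction, the softmax form of the smoothed distribution, and the linear annealing schedule are assumptions made purely for illustration.

```python
import math


def regret_matching(regrets):
    """Play each strategy in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret anywhere: fall back to the uniform distribution.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def softmax(values, temperature):
    """Temperature-controlled distribution; low temperature favors the best value."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def blended_meta_strategy(cum_regret, last_regret, payoffs, blend, temperature):
    """Linearly blend an optimistic regret-matching policy with a softmax policy.

    Assumption: "optimistic" here means adding the most recent instantaneous
    regret to the cumulative regret as a prediction of the next one.
    """
    predicted = [c + l for c, l in zip(cum_regret, last_regret)]
    optimistic = regret_matching(predicted)
    smoothed_best = softmax(payoffs, temperature)
    return [(1.0 - blend) * o + blend * s
            for o, s in zip(optimistic, smoothed_best)]


def linear_anneal(step, total_steps, start, end):
    """Hypothetical schedule: move linearly from start to end over total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)
```

In this sketch, a training loop would call `linear_anneal` once per PSRO iteration to obtain the current `blend` (and, by the same pattern, a temperature or diversity bonus), then pass it to `blended_meta_strategy` to obtain the meta-strategy over the current population. Which direction the blend is annealed in, and over what range, is not stated in the abstract and is left to the caller here.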