AIDB Daily Papers

Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion Control

Original title: Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion

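Authors: Nimesh Khandelwal, Shakti S. Gupta

Published: 2026-03-28 | Fields: Reinforcement Learning, Robotics, AI Agents, Automation, Control, Simulation

Note: The Japanese title and key points are generated automatically by AI. Please check the original paper for the exact content.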

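Key Points

Presents a case study of agent-driven autonomous reinforcement learning research for quadruped locomotion control.
The agent autonomously runs experiments, diagnoses failures, and edits reward and terrain configurations, aiming to accelerate research while keeping human intervention limited.
Across more than 70 experiments on the DHAV1 quadruped in Isaac Lab, the agent autonomously refined the reward function and adjusted environment settings, improving locomotion performance.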

Abstract

This paper documents a case study in agent-driven autonomous reinforcement learning research for quadruped locomotion. The setting was not a fully self-starting research system. A human provided high-level directives through an agentic coding environment, while an agent carried out most of the execution loop: reading code, diagnosing failures, editing reward and terrain configurations, launching and monitoring jobs, analyzing intermediate metrics, and proposing the next wave of experiments. Across more than 70 experiments organized into fourteen waves on a DHAV1 12-DoF quadruped in Isaac Lab, the agent progressed from early rough-terrain runs with mean reward around 7 to a best logged Wave 12 run, exp063, with velocity error 0.263 and 97% timeout over 2000 iterations, independently reproduced five times across different GPUs. The archive also records several concrete autonomous research decisions: isolating PhysX deadlocks to terrain sets containing boxes and stair-like primitives, porting four reward terms from openly available reference implementations \cite{deeprobotics, rlsar}, correcting Isaac Sim import and bootstrapping issues, reducing environment count for diagnosis, terminating hung runs, and pivoting effort away from HIM after repeated terrain=0.0 outcomes. Relative to the AutoResearch paradigm \cite{autoresearch}, this case study operates in a more failure-prone robotics RL setting with multi-GPU experiment management and simulator-specific engineering constraints. The contribution is empirical and documentary: it shows that an agent can materially execute the iterative RL research loop in this domain with limited human intervention, while also making clear where human direction still shaped the agenda.

Metadata

arXiv ID: 2603.27416
Categories: cs.RO, cs.AI
