AIDB Daily Papers

行動シミュレーションのための基盤モデル構築：OdysSim

原題: OdysSim: Building Foundation Models for Human Behavior Simulation

著者: Xuhui Zhou, Weiwei Sun, Weihua Du, Jiarui Liu, Haojia Sun, Qianou Ma, Tongshuang Wu, Yiming Yang, Maarten Sap

公開日: 2026-06-12 | 分野: LLM cs.CL cs.AI cs.LG AIエージェント AI評価

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

人間行動のシミュレーションを目的とした大規模基盤モデルOdysSimを提案し、62のデータセットと23のタスクを統合するSOUL分類体系を開発した。
既存のLLMは均質化されたアシスタントになりがちだが、OdysSimはより人間らしい行動を再現し、特に会話や社会的タスクで優れた性能を示した。
OdysSimは、人間らしい応答の長さ、フォーマット、語彙選択で優位性を示し、ユーザーシミュレーションにおいて実際のユーザーに近い反応一致率を達成した。

Abstract

Large language models are increasingly deployed as human simulators for interactive evaluation and social simulation. Yet helpfulness-driven post-training pulls them toward a homogeneous, overly agreeable assistant register, creating a behavioral Sim2Real gap. We present OdysSim, the largest open systematic investigation of behavioral foundation models, i.e., models trained to simulate human behavior at scale. We propose SOUL, a taxonomy of five capability axes (CONV, SS, COG, ROLE, EVAL) that unifies 62 datasets and 23 benchmark tasks under one framework. Specifically, we curate the OdysSim corpus (21.4M interactions, 10B tokens, retrofitted with back-generated social contexts), construct the SOUL-Index benchmark, and develop an end-to-end training recipe combining midtraining, task-specific RL, and expert distillation. The resulting open 8B OSim model ranks first or tied-first on 8 of 23 tasks, outperforming any individual frontier model by this count, with the strongest gains on conversational and social tasks. Its outputs are also more human-like in length, formatting, and word choice, and it transfers zero-shot to out-of-distribution user simulation on $τ$-bench, nearly matching real users on reaction alignment (93.2 vs. 93.5). We further show that LLM-as-judge RL induces reward-hacking patterns, and that our detectors can mitigate them during post-training. Together, our findings suggest that behavioral foundation models require rethinking the LLM training paradigm. We release all artifacts to support future research.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

arXivで読む PDFを開く

メタ情報

arXiv ID: 2606.14199
カテゴリ: cs.CL, cs.AI, cs.LG

ポイント

Abstract

Paper AI Chat

関連するAIDB記事

メタ情報