AIDB Daily Papers

Poker Arena: Multi-Axis Profiling of Strategic Reasoning and Memory in LLMs

Authors: Pratham Singla, Shivank Garg, Vihan Singh

Published: 2026-06-11 | Fields: LLM, AI, strategy, cs.CL, cs.AI, AI evaluation

Note: The translated title and key points on this page were generated automatically by AI. Consult the original paper for the exact content.

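Key Points

To evaluate strategic reasoning under uncertainty, the study built a platform called Poker Arena.
It proposes a new evaluation method that accounts for memory architecture and nine reasoning axes, which conventional evaluations tend to overlook.
Evaluating seven frontier LLMs revealed differences in capability structure between models that a single metric does not show.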

Abstract

Strategic reasoning under uncertainty underpins consequential decisions in negotiation, finance, and policy, but prevailing game-play benchmarks collapse heterogeneous reasoning dimensions into a single scalar, leaving the capability structure of frontier LLMs unexamined. We introduce Poker Arena, a no-limit Texas Hold'em tournament platform that couples a three-layer memory architecture (within-hand, session, and cross-session) with a nine-axis cognitive profile decomposing strategic reasoning into interpretable dimensions such as bet-sizing calibration and positional awareness. We evaluate seven frontier models across 50 sessions of 1,000 hands and a controlled memory ablation; tournament chips and aggregate axis score order the field differently: Claude Opus 4.6 wins +$15,730 chips with 14 first-place finishes, yet ranks only fifth of seven on mean axis score, while persistent memory helps some models and hurts others. These findings show that multi-axis evaluation surfaces capability structure that scalar leaderboards systematically misrank, with cross-dimensional consistency outweighing peak performance on any single axis.
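
The ranking disagreement described in the abstract can be illustrated with a minimal sketch. All model names and numbers below are hypothetical placeholders, not results from the paper, and only the two axes named in the abstract are included; the remaining seven axes and the paper's actual scoring procedure are not specified here, so a plain mean over axes is an assumption.

```python
from statistics import mean

# Hypothetical data for illustration only; none of these values come from the paper.
# Each model maps to its net tournament chips and its per-axis scores in [0, 1].
# Only the two axes named in the abstract appear; the other seven are omitted.
results = {
    "model_a": {"chips": 1200, "axes": {"bet_sizing_calibration": 0.55,
                                        "positional_awareness": 0.60}},
    "model_b": {"chips": -300, "axes": {"bet_sizing_calibration": 0.90,
                                        "positional_awareness": 0.60}},
    "model_c": {"chips": 400, "axes": {"bet_sizing_calibration": 0.70,
                                       "positional_awareness": 0.72}},
}


def rank_by(scores):
    """Return model names ordered from best to worst for a name -> score mapping."""
    return sorted(scores, key=scores.get, reverse=True)


# Scalar leaderboard: order models by net chips alone.
chip_rank = rank_by({name: r["chips"] for name, r in results.items()})

# Multi-axis view: order models by the mean of their axis scores (assumed aggregate).
axis_rank = rank_by({name: mean(r["axes"].values()) for name, r in results.items()})


def rank_shift(name):
    """Positions a model drops (positive) or gains (negative) when moving
    from the chip ranking to the mean-axis ranking."""
    return axis_rank.index(name) - chip_rank.index(name)


for name in chip_rank:
    print(name, "chip rank:", chip_rank.index(name) + 1,
          "axis rank:", axis_rank.index(name) + 1,
          "shift:", rank_shift(name))
```

In this toy data the chip leader falls to last place on mean axis score, which is the kind of reordering the abstract reports for the real tournament.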

Metadata

arXiv ID: 2606.13815
Categories: cs.AI, cs.CL
