AIDB Daily Papers
FinBoardBench: Evaluating LLMs' Dynamic Wealth Management and Strategic Financial Reasoning with Board Game Simulations
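Note: The title and key points on this page were generated automatically by AI (originally in Japanese). Please consult the original paper for the exact content.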
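Key points
- Proposes FinBoardBench, an evaluation suite based on three classic board games, to assess dynamic wealth management and financial decision-making capabilities.
- Its importance lies in enabling evaluation of LLMs' complex financial skills in real-world environments, which existing static financial benchmarks cannot assess.
- Experiments show that LLMs exhibit long-term planning and investment logic but fail to turn complex interactions into profit, and they tend to prioritize immediate asset acquisition over maintaining liquidity.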
Abstract
Recently, large language models (LLMs) have achieved superior performance in static financial reasoning and simple dynamic trading tasks. However, existing static financial benchmarks are insufficient to assess the dynamic wealth management and financial decision-making capabilities of LLMs in real-world environments. To bridge this gap, we present FinBoardBench, an evaluation suite based on three classic financial board games: Cashflow, Acquire, and Monopoly. FinBoardBench assesses a comprehensive set of financial skills, including personal cash flow management with debt balancing, corporate investment and acquisition forecasting, and competitive trade negotiations with asset auctions. Our experiments with 9 advanced LLMs reveal that while exhibiting basic long-term planning and investment logic, they fail to effectively leverage complex interactions for profit, and their strong static reasoning performance does not transform into successful dynamic decision-making. Notably, they tend to prioritize immediate asset acquisition over maintaining sufficient liquidity, making them vulnerable to financial crises triggered by random events. We hope that FinBoardBench can provide a valuable reference for more intelligent LLM-based decision-making systems in the future.