AIDB Daily Papers
Boosting the Spatial Reasoning Ability of Language Models: The Surprising Effect of Learning to Draw ASCII Art
※ The title and key points are AI-generated summaries. Please consult the original paper for accurate details.
Key Points
- This work introduces Text2Space, a dataset for having language models generate ASCII-art layouts from spatial descriptions.
- It reveals a "Read-Write Asymmetry": language models can interpret ASCII art but struggle to generate it, and shows that closing this gap matters.
- Training on layout construction improves spatial reasoning even when no ASCII art is generated at inference time, with gains confirmed on external benchmarks as well.
Abstract
When faced with complex spatial problems, humans naturally sketch layouts to organize their thinking, and the act of drawing further sharpens their understanding. In this work, we ask whether a similar principle holds for Large Language Models (LLMs): can learning to construct explicit visual layouts from spatial descriptions instill genuine spatial understanding? We introduce Text2Space, a dataset that pairs natural language descriptions with ground-truth ASCII grid layouts and spatial QA pairs, enabling us to separate failures in constructing spatial representations from failures in reasoning over them. We adopt ASCII because it is human-readable, operates entirely within the token space of language models, and encodes spatial relations in a structurally verifiable form. Our evaluation reveals a pronounced "Read-Write Asymmetry": LLMs interpret ASCII representations effectively but struggle to produce them from text, and these construction errors propagate to incorrect answers downstream. To address this limitation, we train models on layout construction (Text→ASCII) and find that it significantly improves spatial reasoning from text alone, even without producing any ASCII at inference time. Combining construction with comprehension training further amplifies these gains. Crucially, these improvements transfer to three external spatial reasoning benchmarks, demonstrating that, much as sketching sharpens human spatial thinking, learning to construct explicit layouts instills spatial understanding that generalizes beyond the training format.
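To make the "structurally verifiable" claim concrete, here is a minimal Python sketch of the Text→ASCII idea: a spatial description is rendered as an ASCII grid, and a spatial relation can then be checked mechanically by parsing the grid. The grid size, symbols, and helper functions are illustrative assumptions, not the actual Text2Space dataset schema.

```python
# Hypothetical illustration of pairing a spatial description with an
# ASCII grid layout and verifying a relation from the grid itself.
# Not the actual Text2Space format.

def render_grid(objects, width=5, height=5):
    """Place single-character objects at (row, col) positions on a grid."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for symbol, (row, col) in objects.items():
        grid[row][col] = symbol
    return "\n".join("".join(row) for row in grid)

def is_left_of(grid_text, a, b):
    """Verify 'a is left of b' directly from the ASCII representation."""
    cols = {ch: line.index(ch)
            for line in grid_text.splitlines()
            for ch in (a, b) if ch in line}
    return cols[a] < cols[b]

# Description: "The apple (A) is to the left of the box (B)."
layout = render_grid({"A": (2, 1), "B": (2, 3)})
print(layout)
print(is_left_of(layout, "A", "B"))  # True
```

Because the layout lives entirely in token space, a model's generated grid can be checked against ground truth (or against the stated relations) without any image rendering, which is what lets construction errors be separated from reasoning errors.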