AIDB Daily Papers
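"Serialization Friction" When 2D Tasks Are Processed Through 1D Serialization
Note: The Japanese title and key points were automatically generated by AI. Refer to the original paper for the exact content.
Key points
- This study investigates "serialization friction," which arises when LLMs convert tasks with 2D structure into 1D token sequences.
- It compares the added burden of losing an explicit representation of 2D structure across two pathways: a text-only pathway and a pathway that preserves the 2D layout by rendering it as an image.
- The image-based pathway that preserves the 2D layout consistently outperformed the text-only pathway, and the gap often widened at larger dimensions.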
Abstract
Large language models (LLMs) conventionally process structured inputs as 1D token sequences. While natural for prose, such linearization may introduce additional representational burden for tasks whose computation depends directly on explicit 2D structure, because row-column alignment and local neighborhoods are no longer directly expressed in the input. We study this setting, which we refer to as serialization friction, on a small diagnostic testbed of synthetic tasks with explicit 2D structure: matrix transpose, Conway's Game of Life, and LU decomposition. To examine this question, we compare a text-only language pathway over serialized inputs with a vision-augmented pathway, built on the same language backbone, that receives the same underlying content rendered in task-faithful 2D layout, yielding a system-level comparison between two end-to-end input pathways. Across the tasks and settings we study, the visual pathway consistently outperforms the textual pathway; the gap often widens at larger dimensions, and error patterns under serialization become increasingly spatially structured. These findings indicate that the relationship between input representation and model performance on such tasks warrants further investigation, and suggest that preserving task-relevant 2D layout is a promising direction for structured 2D tasks.
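
The loss of neighborhood structure described in the abstract can be illustrated with a minimal sketch. It assumes a plain row-major flattening with one position per cell, which is a simplification: a real tokenizer adds separators and may split a value into several tokens, and the paper's actual serialization format is not given here. The helper names `serialize`, `flat_index`, and `neighbor_distances` are hypothetical and exist only for this illustration.

```python
def serialize(grid):
    """Flatten a 2D grid into a row-major 1D list (a stand-in for a token sequence)."""
    return [cell for row in grid for cell in row]


def flat_index(row, col, n_cols):
    """Position of cell (row, col) in the row-major flattened sequence."""
    return row * n_cols + col


def neighbor_distances(row, col, n_rows, n_cols):
    """Sorted 1D distances from (row, col) to each in-bounds 8-neighbor.

    These are the neighbors a Game of Life update reads. In 2D they are all
    adjacent; in the flattened sequence, vertical and diagonal neighbors sit
    roughly n_cols positions away.
    """
    here = flat_index(row, col, n_cols)
    distances = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # skip the cell itself
            r, c = row + dr, col + dc
            if 0 <= r < n_rows and 0 <= c < n_cols:
                distances.append(abs(flat_index(r, c, n_cols) - here))
    return sorted(distances)


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(serialize(grid))
print(neighbor_distances(1, 1, 3, 3))    # center of a 3x3 grid
print(neighbor_distances(5, 5, 10, 10))  # interior cell of a 10x10 grid
```

Under this simplified model, the farthest neighbor of an interior cell is `n_cols + 1` positions away, so the separation grows with grid width. That is consistent with, though not a demonstration of, the abstract's observation that the gap often widens at larger dimensions.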