AIDB Daily Papers
Can AI Understand What Humans Cannot Read? A Study Probing the "Secret Language" of Language Models
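Note: The title and key points on this page were generated automatically by AI (originally in Japanese). Refer to the original paper for the exact content.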
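Key Points
- Tests whether large language models (LLMs) can still recover the meaning of text that has been compressed to the point of being unreadable for humans.
- Proposes "BabelTele", a form of representation that is hard for humans to read but whose meaning LLMs can recover, and evaluates its information density and semantic fidelity.
- Compressed BabelTele text maintained 99.5% semantic fidelity at 27.9% of the original length, suggesting it may be useful for communication between models and for agent memory.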
Abstract
Large language models (LLMs) are commonly prompted and interfaced with human-readable natural language, even when the intended reader is another model. This paper investigates whether semantic information can be encoded in compact, non-standard textual forms that sacrifice human readability while remaining recoverable by LLMs. We refer to this class of model-centric textual representations as BabelTele, approached here not as a fixed protocol but as an empirical probe into LLMs' capacity to generate and interpret such representations. Through readability diagnostics, model likelihood measures, human questionnaires, and downstream task evaluations, we find that BabelTele can substantially depart from ordinary natural language while preserving core semantics for instruction-tuned LLMs. As a task-agnostic representational paradigm, BabelTele demonstrates high information density, maintaining 99.5% semantic fidelity even when the text volume is condensed to 27.9% of its original length. We further evaluate its semantic robustness in cross-model transfer, agent memory, and multi-agent communication. Results suggest that BabelTele can reduce context overhead while generally maintaining reliable downstream performance, although its effectiveness depends on the compressor-reader pair and task setting. These findings indicate that human readability, natural-language typicality, and model-side semantic recoverability can be partially decoupled, opening a path toward model-native representations in future exploration of LLM systems.
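To make the reported 27.9% figure concrete, the following is a minimal Python sketch of how a length ratio and the resulting context size could be computed. It is an illustration under stated assumptions, not the paper's evaluation code: the abstract does not say whether "text volume" is measured in characters or tokens, so character count is assumed here, and the helper names `length_ratio` and `projected_length` as well as the sample strings are hypothetical.

```python
def length_ratio(original: str, compressed: str) -> float:
    """Return the compressed length as a fraction of the original length.

    Assumption: length is measured in characters; the abstract does not
    specify the unit behind its 27.9% figure.
    """
    if not original:
        raise ValueError("original text must be non-empty")
    return len(compressed) / len(original)


def projected_length(original_units: int, ratio: float = 0.279) -> int:
    """Estimate the compressed size of a context at a given length ratio.

    The default ratio is the 27.9% reported in the abstract.
    """
    return round(original_units * ratio)


# Made-up strings for illustration only; this is not real BabelTele output.
original = "Please summarize the following meeting notes and list every action item."
compressed = "summ mtg notes; list action items"

print(f"length ratio: {length_ratio(original, compressed):.3f}")
print(f"1000 units would shrink to about {projected_length(1000)} units")
```

At the reported ratio, a context of 1,000 units would occupy roughly 279 units. This sketch covers only the length side; the 99.5% semantic fidelity figure comes from the paper's own evaluation and is not reproduced here.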