AIDB Daily Papers

LLM Character-Counting Failures: From Early Encoding to Late Suppression

Original title: From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks

Authors: Ayan Datta, Mounika Marreddy, Alexander Mehler, Zhixue Zhao, Radhika Mamidi

Published: 2026-04-01 | Fields: LLM, Transformer, interpretability, reasoning, language, text, model, error, natural language processing, deep learning

Note: The Japanese title and key points on this page were generated automatically by AI. Consult the original paper for the precise content.

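Key points

The paper investigates why LLMs fail at simple tasks such as character counting even though they excel at complex ones.
Using a minimal probe that isolates character-level reasoning, the authors find a consistent phenomenon: the model computes the correct answer internally but fails to express it at the output layer.
The cause is interference between pieces of information inside the model; the results suggest that a form of competitive decoding suppresses the correct information.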
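
As a rough illustration of what such a minimal probe could look like, the sketch below builds a counting question in the style of the paper's example ("How many p's are in apple?") and scores a free-text answer against the ground truth. The function names `build_prompt`, `true_count`, and `is_correct` are hypothetical helpers introduced here, and both the case-insensitive matching and the "first integer in the answer" parsing rule are assumptions, not the authors' evaluation protocol.

```python
import re


def build_prompt(word: str, char: str) -> str:
    """Format a counting question in the style of the paper's example."""
    return f"How many {char}'s are in {word}?"


def true_count(word: str, char: str) -> int:
    """Ground truth: case-insensitive occurrences of one character (assumed convention)."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    return word.lower().count(char.lower())


def is_correct(model_answer: str, word: str, char: str) -> bool:
    """Score a free-text answer by its first integer (an assumed parsing rule)."""
    match = re.search(r"\d+", model_answer)
    return match is not None and int(match.group()) == true_count(word, char)
```

For instance, `build_prompt("apple", "p")` reproduces the example question, and `is_correct("There are 2.", "apple", "p")` evaluates to `True` under these assumptions.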

Abstract

Large language models (LLMs) exhibit failures on elementary symbolic tasks such as character counting in a word, despite excelling on complex benchmarks. Although this limitation has been noted, the internal reasons remain unclear. We use character counting (e.g., "How many p's are in apple?") as a minimal, controlled probe that isolates token-level reasoning from higher-level confounds. Using this setting, we uncover a consistent phenomenon across modern architectures, including LLaMA, Qwen, and Gemma: models often compute the correct answer internally yet fail to express it at the output layer. Through mechanistic analysis combining probing classifiers, activation patching, logit lens analysis, and attention head tracing, we show that character-level information is encoded in early and mid-layer representations. However, this information is attenuated by a small set of components in later layers, especially the penultimate and final layer MLP. We identify these components as negative circuits: subnetworks that downweight correct signals in favor of higher-probability but incorrect outputs. Our results lead to two contributions. First, we show that symbolic reasoning failures in LLMs are not due to missing representations or insufficient scale, but arise from structured interference within the model's computation graph. This explains why such errors persist and can worsen under scaling and instruction tuning. Second, we provide evidence that LLM forward passes implement a form of competitive decoding, in which correct and incorrect hypotheses coexist and are dynamically reweighted, with final outputs determined by suppression as much as by amplification. These findings carry implications for interpretability and robustness: simple symbolic reasoning exposes weaknesses in modern LLMs, underscoring need for design strategies that ensure information is encoded and reliably used.
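
The late-suppression idea in the abstract can be pictured with a toy calculation. The sketch below is not the paper's method and uses no real model: every number is invented for illustration, and it assumes each component's contribution is already expressed in logit space and simply adds up, which ignores layer normalization and the unembedding step that an actual logit lens applies. The names `CANDIDATES`, `CONTRIBUTIONS`, `running_predictions`, and `final_answer` are hypothetical.

```python
# Toy illustration only: all numbers are invented, not results from the paper.
CANDIDATES = ["1", "2", "3"]  # candidate answers to "How many p's are in apple?"

# Hypothetical contributions of each component to the candidates' logits,
# in the order they are added to the running total.
CONTRIBUTIONS = [
    ("early layers", [0.5, 1.5, 0.2]),
    ("mid layers", [0.3, 1.2, 0.1]),
    ("penultimate MLP", [0.8, -1.0, 0.0]),  # pushes the correct "2" down
    ("final MLP", [0.9, -1.2, 0.1]),        # pushes the correct "2" down further
]


def _argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def running_predictions(contributions, candidates):
    """Record the top candidate after each component is added (logit-lens-style readout)."""
    totals = [0.0] * len(candidates)
    trace = []
    for name, delta in contributions:
        totals = [t + d for t, d in zip(totals, delta)]
        trace.append((name, candidates[_argmax(totals)]))
    return trace


def final_answer(contributions, candidates, skip=()):
    """Top candidate when the named components are left out (a crude ablation)."""
    totals = [0.0] * len(candidates)
    for name, delta in contributions:
        if name not in skip:
            totals = [t + d for t, d in zip(totals, delta)]
    return candidates[_argmax(totals)]
```

With these made-up values, the running readout stays on the correct answer "2" through the penultimate MLP and flips to "1" only after the final MLP is added, while `final_answer(CONTRIBUTIONS, CANDIDATES, skip=("penultimate MLP", "final MLP"))` returns "2". That mirrors, in miniature, the claim that outputs are determined by suppression as much as by amplification.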

Metadata

arXiv ID: 2604.00778
Category: cs.CL
