AIDB Daily Papers

A Single Layer That Explains Massive Activations in LLMs: Discovery of the ME Layer and Performance Gains

Original title: A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models

Authors: Zeru Shi, Zhenting Wang, Fan Yang, Qifan Wang, Ruixiang Tang

Published: 2026-05-08 | Fields: LLM, Natural Language Processing, Deep Learning, cs.CL

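Note: The Japanese title and key points on this page were generated automatically by AI. Please refer to the original paper for the exact content.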

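Key points

The paper investigates the origin of massive activations in LLMs, identifies the ME Layer, and explains the mechanism behind it.
The ME Layer matters because, within it, RMSNorm and the FFN parameters jointly produce massive activations, which reduces the diversity of hidden representations.
The proposed method suppresses activations at the ME Layer, which improves performance on instruction-following and math-reasoning tasks and also mitigates attention sinks.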

Abstract

We investigate the origins of massive activations in large language models (LLMs) and identify a specific layer, named the Massive Emergence Layer (ME Layer), that is consistently observed across model families, where massive activations first emerge and subsequently propagate to deeper layers through residual connections. We show that, within the ME Layer, both the RMSNorm and the FFN parameters jointly contribute to the emergence of massive activations. Once formed, the massive activation token representation remains largely invariant across layers, reducing the diversity of hidden representations passed to the attention module. Motivated by this limitation, we propose a simple and effective method to reduce the rigidity of the massive activation token. Our approach consistently improves LLM performance across multiple tasks, including instruction following and math reasoning, in both training-free and fine-tuning settings. Moreover, we show that our method mitigates attention sinks by selectively weakening their influence, elucidating their origin at the hidden state level and shedding new light on principled mitigation strategies.

Metadata

arXiv ID: 2605.08504
Category: cs.CL
