AIDB Daily Papers
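A Single Layer That Explains Massive Activations in LLMs: Discovery of the ME Layer and Performance Gains
Note: The title and key points were automatically generated by AI. Please refer to the original paper for accurate details.
Key Points
- The paper investigates the origin of massive activations in LLMs, identifies the ME Layer, and explains its mechanism.
- In the ME Layer, the RMSNorm and FFN parameters jointly produce massive activations, which reduces the diversity of hidden representations.
- The proposed method suppresses the massive activation formed at the ME Layer, improving performance on instruction-following and math reasoning tasks and mitigating attention sinks.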
Abstract
We investigate the origins of massive activations in large language models (LLMs) and identify a specific layer, named the Massive Emergence Layer (ME Layer), that is consistently observed across model families, where massive activations first emerge and subsequently propagate to deeper layers through residual connections. We show that, within the ME Layer, both the RMSNorm and the FFN parameters jointly contribute to the emergence of massive activations. Once formed, the massive activation token representation remains largely invariant across layers, reducing the diversity of hidden representations passed to the attention module. Motivated by this limitation, we propose a simple and effective method to reduce the rigidity of the massive activation token. Our approach consistently improves LLM performance across multiple tasks, including instruction following and math reasoning, in both training-free and fine-tuning settings. Moreover, we show that our method mitigates attention sinks by selectively weakening their influence, elucidating their origin at the hidden-state level and shedding new light on principled mitigation strategies.
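As a rough illustration of what "the layer where massive activations first emerge" could mean in practice, the sketch below scans per-layer hidden states and reports the first layer whose largest activation magnitude dwarfs that layer's median magnitude. This is not the paper's procedure: the function name `find_first_massive_layer`, the `ratio` threshold of 100, and the toy data are all assumptions made for illustration.

```python
from statistics import median


def find_first_massive_layer(hidden_states, ratio=100.0):
    """Return the index of the first layer containing a massive activation.

    hidden_states: list of layers, each a list of token vectors (lists of floats).
    ratio: hypothetical threshold; an activation counts as "massive" here when
           its magnitude exceeds ratio * the layer's median magnitude.
    Returns None if no layer meets the criterion.
    """
    for layer_idx, layer in enumerate(hidden_states):
        # Flatten all token vectors in this layer into one list of magnitudes.
        magnitudes = [abs(v) for token in layer for v in token]
        med = median(magnitudes)
        if med > 0 and max(magnitudes) > ratio * med:
            return layer_idx
    return None


# Toy hidden states (illustrative only): 3 layers, 2 tokens, 3 dimensions.
# A single large value appears at layer 1 and persists at layer 2,
# mimicking propagation through residual connections.
toy = [
    [[0.5, -0.3, 0.2], [0.1, 0.4, -0.2]],
    [[0.6, -0.2, 0.3], [0.2, 900.0, -0.1]],
    [[0.4, -0.1, 0.2], [0.3, 905.0, -0.2]],
]

first_layer = find_first_massive_layer(toy)
print(first_layer)
```

With a real model, the same scan could be run over the hidden states collected from each layer; the threshold would need tuning per model, since the criterion here is only a heuristic.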