AIDB Daily Papers
A Deep Dive into LLM Memorization Mechanisms: A Comparative Analysis at the Statistical and Internal Levels
※ The title and key points are automatically generated by AI. Please refer to the original paper for accurate details.
Key Points
- A study comparing multiple LLMs to identify both the shared mechanisms of memorization and the characteristics specific to each model.
- Despite limited access to LLMs' pre-training data, the work is novel in analyzing multiple model series cross-sectionally, aiming at a universal understanding of memorization.
- The memorization rate increases log-linearly with model size, memorized sequences are further compressible, and the distribution of important attention heads differs between models.
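The log-linear scaling noted above means the memorization rate grows roughly linearly in the logarithm of model size. The sketch below illustrates how such a relationship can be fit with least squares; the model sizes and rates are hypothetical numbers for illustration only, not figures from the paper.

```python
import numpy as np

# Hypothetical measurements: model sizes (parameters) and memorization rates.
# These values are illustrative only and are not taken from the paper.
sizes = np.array([70e6, 160e6, 410e6, 1.0e9, 2.8e9, 6.9e9])
mem_rates = np.array([0.008, 0.011, 0.015, 0.019, 0.023, 0.027])

# "Log-linear" scaling: rate ≈ a * log(size) + b.
# Fit a straight line in log(size) space with least squares.
a, b = np.polyfit(np.log(sizes), mem_rates, deg=1)

# Extrapolate to a hypothetical 12B-parameter model.
pred = a * np.log(12e9) + b
print(f"slope={a:.5f}, intercept={b:.4f}, predicted rate at 12B={pred:.4f}")
```

Under this form, each order-of-magnitude increase in parameters adds a roughly constant increment to the memorization rate (about `a * ln(10)` per decade).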
Abstract
Memorization is a fundamental component of intelligence for both humans and LLMs. However, while LLM performance scales rapidly, our understanding of memorization lags behind. Due to limited access to the pre-training data of LLMs, most previous studies focus on a single model series, leading to isolated observations and making it unclear which findings are general and which are model-specific. In this study, we collect multiple model series (Pythia, OpenLLaMa, StarCoder, OLMo1/2/3) and analyze their shared and unique memorization behavior at both the statistical and internal levels, connecting individual observations while presenting new findings. At the statistical level, we reveal that the memorization rate scales log-linearly with model size and that memorized sequences can be further compressed. Further analysis demonstrates a shared frequency and domain distribution pattern for memorized sequences, although individual models also show distinctive features within these trends. At the internal level, we find that LLMs can remove certain injected perturbations, while memorized sequences are more sensitive to them. By decoding middle layers and ablating attention heads, we reveal the general decoding process and the important heads shared for memorization. However, the distribution of those important heads differs between families, a unique family-level feature. By bridging these experiments and revealing new findings, this study paves the way for a universal and fundamental understanding of memorization in LLMs.