AIDB Daily Papers

Generalization Error Bounds for Next-Token Prediction in Transformer-Based Language Models

Original title: Generalization Bounds for Transformer-Based Next-Token Prediction in a Language Model

Authors: Insung Kong, Niklas Dexheimer, Johannes Schmidt-Hieber

Published: 2026-06-11 | Fields: LLM, NLP, Transformer, statistical machine learning

Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the exact content.

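Key points

Derives generalization error bounds for next-token prediction in Transformer-based language models.
Proposes a data-generating process that extends the log-bilinear language model from the natural language processing literature, and shows its importance.
Makes explicit how the bounds depend on the network architecture, the vocabulary size, the number of documents, and the document length.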

Abstract

A refined statistical understanding of LLM pre-training requires the analysis of the transformer architecture for data distributions that encapsulate key characteristics of text data. To address this, we propose a text data distribution based on an extension of the log-bilinear language model from the natural language processing literature. For this data generating process, we derive generalization bounds for deep transformer architectures, highlighting the dependence on the network architecture, the vocabulary size, the number of documents and the document length.


Metadata

arXiv ID: 2606.13280
Category: math.ST
