AIDB Daily Papers

Large Language Models That Aim to Move Beyond English-Centric Development

Original title: Toward LLMs Beyond English-Centric Development

Authors: Sho Takase, Ukyo Honda

Published: 2026-05-15 | Fields: LLM, NLP, multilingual, natural language processing, large language models, cs.CL

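Note: The Japanese title and key points on the source page were generated automatically by AI. Please refer to the original paper for the exact content.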

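Key Points

An analysis of sequences generated by open-weight large language models (LLMs) shows that the models are heavily biased toward English.
For language adaptation, including improving cultural understanding in the target language, continual pre-training showed no cost advantage over training from scratch.
The findings suggest that future LLM development may depend increasingly on dedicated per-language investment rather than primarily on expanding English-centric resources.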

Abstract

Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English. While continual pre-training is commonly used to adapt LLMs to a target language, we show that it does not offer a cost advantage over training from scratch, even for improving cultural understanding in the target language. These findings suggest that dedicated per-language investment may become increasingly important for future LLM development, rather than relying primarily on the expansion of English-centric resources.

Metadata

arXiv ID: 2605.15613
Category: cs.CL
