AIDB Daily Papers
Self-Convergence of ChatGPT Models: Experimental Evidence and an Examination of Declining Diversity
Note: The title and key points were generated automatically by AI. Please refer to the original paper for accurate details.
Key Points
- The diversity of text generated by ChatGPT models was evaluated over time using a text similarity measure.
- Recursive training on self-generated data can cause model collapse, making long-term investigation of this effect important.
- Recent ChatGPT models show a reduced ability to generate diverse text, exhibiting signs of self-convergence.
Abstract
Large Language Models (LLMs) that undergo recursive training on synthetically generated data are susceptible to model collapse, a phenomenon marked by the generation of meaningless output. Existing research has examined this issue from either theoretical or empirical perspectives, often focusing on a single model trained recursively on its own outputs. While prior studies have cautioned against the potential degradation of LLM output quality under such conditions, no longitudinal investigation has yet been conducted to assess this effect over time. In this study, we employ a text similarity metric to evaluate different ChatGPT models' capacity to generate diverse textual outputs. Our findings indicate a measurable decline in recent ChatGPT releases' ability to produce varied text, even when explicitly prompted to do so by setting the temperature parameter to one. The observed reduction in output diversity may be attributed to the growing amount of synthetic data incorporated into their training datasets as a result of the infiltration of LLM-generated content across the internet. This phenomenon is termed model self-convergence, reflecting the gradual increase in similarity of texts produced across different ChatGPT versions.
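The evaluation described in the abstract (scoring output diversity via pairwise text similarity across sampled generations) can be sketched as follows. This is a minimal illustration only: the paper's actual similarity metric is not specified in this summary, so a simple token-level Jaccard similarity is assumed here, and the sample texts are hypothetical.

```python
# Minimal sketch of a pairwise-similarity diversity check.
# Assumption: token-level Jaccard similarity stands in for the
# paper's (unspecified) text similarity metric.
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average similarity over all pairs; higher values suggest
    lower output diversity (i.e., possible self-convergence)."""
    pairs = list(combinations(texts, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs sampled from one model at temperature = 1
samples = [
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox leaps over a sleepy dog",
    "the fast brown fox jumps over the lazy dog",
]
print(round(mean_pairwise_similarity(samples), 3))
```

In a longitudinal setting like the paper's, one would repeat this per model version and compare the averages over time: a rising mean pairwise similarity across releases would be the signature of self-convergence.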