AIDB Daily Papers
"Will You Understand Me?": Computational-Linguistic and Sociolinguistic Perspectives on GenAI, LLMs, and Non-Standard Language
Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the exact content.
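Key Points
- Examines how the design of large language models (LLMs) and generative AI disadvantages minority languages and widens the digital language divide.
- Points out that these technologies are made possible by socio-historical processes of linguistic standardisation and in turn reinforce the view of language as a monolithic system, and brings an interdisciplinary perspective to these arguments.
- Using South Tyrolean dialects and Kurdish as examples, discusses how LLMs can be made to handle non-standard language and whether this can contribute to democratic and decolonial digital strategies.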
Abstract
The design of Large Language Models and generative artificial intelligence has been shown to be "unfair" to less-spoken languages and to deepen the digital language divide. Critical sociolinguistic work has also argued that these technologies are not only made possible by prior socio-historical processes of linguistic standardisation, often grounded in European nationalist and colonial projects, but also exacerbate epistemologies of language as "monolithic, monolingual, syntactically standardized systems of meaning". In our paper, we draw on earlier work on the intersections of technology and language policy and bring our respective expertise in critical sociolinguistics and computational linguistics to bear on an interrogation of these arguments. We take two different complexes of non-standard linguistic varieties in our respective repertoires--South Tyrolean dialects, which are widely used in informal communication in South Tyrol, Italy, as well as varieties of Kurdish--as starting points to an interdisciplinary exploration of the intersections between GenAI and linguistic variation and standardisation. We discuss both how LLMs can be made to deal with nonstandard language from a technical perspective, and whether, when or how this can contribute to "democratic and decolonial digital and machine learning strategies", which has direct policy implications.