AIDB Daily Papers
A New Method for Transmitting Audio in Natural Language: Lexical Acoustic Coding (LAC)
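Note: The title and key points (originally in Japanese) were generated automatically by AI. Please consult the original paper for the exact content.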
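Key Points
- The paper proposes Lexical Acoustic Coding (LAC), a framework in which LLM agents send and receive audio using natural language alone.
- The work is novel in that it represents and transmits audio information as text, which makes it a natural fit for LLMs.
- LAC produces interpretable, editable text that preserves acoustic structure, and it exposes a trade-off between communication rate and fidelity.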
Abstract
Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, shared vocabulary, and optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves as both a rich caption and as the transport representation itself. We frame LAC as a finite-rate lossy quantizer, exposing trade-offs between vocabulary size, rate, and fidelity. Experiments on short sounds and symbolic music transfer show that plain text preserves measurable acoustic structure while remaining interpretable, editable, and native to LLM-mediated communication.
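To make the "finite-rate lossy quantizer" framing concrete, here is a minimal Python sketch of the sender and receiver steps the abstract describes: quantize each descriptor with a feature-specific interval vocabulary, verbalize the code as an English sentence, then parse the sentence back into interval constraints. Everything specific in it is an assumption for illustration and is not taken from the paper: the descriptor names, the interval edges, the words in `VOCAB`, the sentence template, and the helper names `quantize`, `encode`, `decode`, and `rate_bits`. It also skips the waveform analysis, the closed-loop rendering, and the LLM agents themselves.

```python
import bisect
import math
import re

# Hypothetical shared vocabulary: per descriptor, interval edges and one word
# per interval. These values are illustrative, not the paper's.
VOCAB = {
    "pitch": ([0.0, 200.0, 800.0, math.inf], ["low", "mid", "high"]),            # Hz
    "duration": ([0.0, 0.3, 1.5, math.inf], ["brief", "medium", "long"]),        # seconds
    "loudness": ([-math.inf, -30.0, -12.0, math.inf], ["quiet", "moderate", "loud"]),  # dB
}


def quantize(name, value):
    """Map a descriptor value to the word of the interval that contains it."""
    edges, words = VOCAB[name]
    idx = bisect.bisect_right(edges, value) - 1
    idx = min(max(idx, 0), len(words) - 1)  # clamp out-of-range values
    return words[idx]


def encode(descriptors):
    """Sender side: verbalize the quantized descriptors as one English sentence."""
    phrases = [f"{quantize(name, value)} {name}" for name, value in descriptors.items()]
    return "The sound has " + ", ".join(phrases) + "."


def decode(sentence):
    """Receiver side: parse the sentence back into (low, high) interval constraints."""
    pattern = r"(\w+) (" + "|".join(VOCAB) + r")\b"
    constraints = {}
    for word, name in re.findall(pattern, sentence):
        edges, words = VOCAB[name]
        i = words.index(word)
        constraints[name] = (edges[i], edges[i + 1])
    return constraints


def rate_bits():
    """Bits per sentence if every word of each vocabulary is equally likely."""
    return sum(math.log2(len(words)) for _, words in VOCAB.values())


sentence = encode({"pitch": 950.0, "duration": 0.2, "loudness": -6.0})
constraints = decode(sentence)
```

The receiver recovers only an interval per descriptor, not the original value, which is where the loss comes from. Adding words to a vocabulary narrows the intervals (higher fidelity) while raising `rate_bits()`, which is the vocabulary size, rate, and fidelity trade-off the abstract refers to.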