AIDB Daily Papers

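A New Method for Communicating Sound Through Natural Language: Lexical Acoustic Coding (LAC)

Original title: Communicating Sound Through Natural Language

Authors: Emanuele Rossi, Emanuele Rodolà

Published: 2026-05-09 | Fields: LLM, natural language processing, cs.CL, cs.AI, cs.MA, cs.LG, AI agents, speech recognition, speech synthesis

Note: The translated title and key points were generated automatically by AI. Please refer to the original paper for the exact content.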

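Key Points

The paper proposes Lexical Acoustic Coding (LAC), a framework in which LLM agents send and receive sound using natural language alone.
The work represents and transmits audio itself as text, rather than using text only to describe or control audio, which makes it a natural fit for LLMs.
LAC produces interpretable, editable text that preserves acoustic structure, and it exposes the trade-off between communication rate and fidelity.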

Abstract

Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and receiver agents transmit sound through natural language. Under fixed system prompts, the agents write their own analysis and synthesis code, communicating only through a lexical sentence, shared vocabulary, and optional symbolic music structure. The sender analyzes an input waveform into interpretable, non-learned acoustic descriptors, quantizes each with a feature-specific interval vocabulary, and verbalizes the lexical code as English. The receiver parses the sentence back into lexical-acoustic constraints and renders a waveform through closed-loop refinement. The transmitted text serves as both a rich caption and as the transport representation itself. We frame LAC as a finite-rate lossy quantizer, exposing trade-offs between vocabulary size, rate, and fidelity. Experiments on short sounds and symbolic music transfer show that plain text preserves measurable acoustic structure while remaining interpretable, editable, and native to LLM-mediated communication.
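
The sender and receiver steps described in the abstract (quantize each descriptor with a feature-specific interval vocabulary, verbalize as English, parse back into constraints) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature names `timbre` and `level`, their value ranges, the word lists, the sentence template, and the uniform binning. The sketch also reconstructs each value as a bin midpoint, whereas the paper's receiver renders a waveform through closed-loop refinement, which is not shown.

```python
import math
import re

# Hypothetical interval vocabularies: feature -> (low, high, words).
# Each range is split into len(words) equal-width bins, one word per bin.
VOCAB = {
    "timbre": (0.0, 8000.0, ["dark", "warm", "bright", "brilliant"]),  # spectral centroid, Hz
    "level": (-60.0, 0.0, ["faint", "quiet", "moderate", "loud"]),     # loudness, dB
}


def quantize(feature: str, value: float) -> str:
    """Map a scalar descriptor to the word naming its interval."""
    low, high, words = VOCAB[feature]
    clamped = min(max(value, low), high)
    index = int((clamped - low) / (high - low) * len(words))
    return words[min(index, len(words) - 1)]  # value == high falls in the last bin


def dequantize(feature: str, word: str) -> float:
    """Map a word back to the midpoint of its interval (lossy)."""
    low, high, words = VOCAB[feature]
    width = (high - low) / len(words)
    return low + (words.index(word) + 0.5) * width


def encode(descriptors: dict[str, float]) -> str:
    """Sender side: verbalize the quantized descriptors as one English sentence."""
    parts = [f"{quantize(name, descriptors[name])} {name}" for name in VOCAB]
    return "A sound with " + " and ".join(parts) + "."


def decode(sentence: str) -> dict[str, float]:
    """Receiver side: parse the sentence back into numeric constraints."""
    constraints = {}
    for name in VOCAB:
        match = re.search(rf"(\w+) {re.escape(name)}\b", sentence)
        if match is None:
            raise ValueError(f"no word found for feature {name!r}")
        constraints[name] = dequantize(name, match.group(1))
    return constraints


def rate_bits() -> float:
    """Bits carried by one sentence: sum of log2(vocabulary size) per feature."""
    return sum(math.log2(len(words)) for _, _, words in VOCAB.values())


sentence = encode({"timbre": 3200.0, "level": -12.0})
print(sentence)
print(decode(sentence))
print(rate_bits())
```

In this toy setup, enlarging a word list narrows its intervals and reduces reconstruction error, while raising the number of bits each sentence carries. That is the vocabulary size, rate, and fidelity trade-off the abstract refers to, shown here only in its simplest uniform-quantizer form.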

Metadata

arXiv ID: 2605.08750
Categories: cs.LG, cs.AI, cs.CL, cs.MA
