AIDB Daily Papers

Silent Speech Interfaces in the Era of Large Language Models: A Comprehensive Taxonomy and Systematic Review

Original title: Silent Speech Interfaces in the Era of Large Language Models: A Comprehensive Taxonomy and Systematic Review

Authors: Kele Xu, Yifan Wang, Ming Feng, Qisheng Xu, Wuyang Chen, Yutao Dou, Cheng Yang, Huaimin Wang

Published: 2026-03-12 | Fields: LLM, NLP, speech, safety, brain, machine learning, dialogue, language, automation, interface, personalization

* The title translation and key points were generated automatically by AI. Please consult the original paper for accurate details.

Key Points

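This study comprehensively analyzes the current state of silent speech interfaces (SSIs), which decode linguistic intent directly from neuromuscular and articulatory activity.
A paradigm shift from conventional signal processing to latent semantic alignment powered by large language models (LLMs) is bringing practical SSIs closer to reality.
The review lays out future directions, including integration into wearable devices, overcoming user dependency through self-supervised learning, and defining ethical boundaries that protect cognitive liberty.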

Abstract

Human-computer interaction has traditionally relied on the acoustic channel, a dependency that introduces systemic vulnerabilities to environmental noise, privacy constraints, and physiological speech impairments. Silent Speech Interfaces (SSIs) emerge as a transformative paradigm that bypasses the acoustic stage by decoding linguistic intent directly from the neuro-muscular-articulatory continuum. This review provides a high-level synthesis of the SSI landscape, transitioning from traditional transducer-centric analysis to a holistic intent-to-execution taxonomy. We systematically evaluate sensing modalities across four critical physiological interception points: neural oscillations, neuromuscular activation, articulatory kinematics (ultrasound/magnetometry), and pervasive active probing via acoustic or radio-frequency sensing. Critically, we analyze the current paradigm shift from heuristic signal processing to Latent Semantic Alignment. In this new era, Large Language Models (LLMs) and deep generative architectures serve as high-level linguistic priors to resolve the "informational sparsity" and non-stationarity of biosignals. By mapping fragmented physiological gestures into structured semantic latent spaces, modern SSI frameworks have, for the first time, approached the Word Error Rate usability threshold required for real-world deployment. We further examine the transition of SSIs from bulky laboratory instrumentation to "invisible interfaces" integrated into commodity-grade wearables, such as earables and smart glasses. Finally, we outline a strategic roadmap addressing the "user-dependency paradox" through self-supervised foundation models and define the ethical boundaries of "neuro-security" to protect cognitive liberty in an increasingly interfaced world.

Metadata

arXiv ID: 2603.11877
Category: eess.AS
