AIDB Daily Papers

MindAlign: Linking EEG, Vision, and Language for Zero-Shot Decoding of Visual Information

Original title: MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding

Authors: Zexuan Chen, Sichao Liu, Runhao Lu, Huichao Qi, Alexandra Woolgar, Xi Vincent Wang, Lihui Wang

Published: 2026-05-23 | Fields: AI, cs.CL, cs.LG, q-bio.NC, AI assistance, AI evaluation

Note: The Japanese title and key points on this page were generated automatically by AI. Refer to the original paper for the exact content.

Key Points

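Proposes a tri-modal contrastive learning framework that places EEG signals, images, and LLM-generated text descriptions in a unified latent space.
When decoding visual information from non-invasive EEG signals, uses language information as a semantic regularizer and reaches accuracy well above prior work.
On the Things-EEG2 200-way zero-shot visual decoding benchmark, reaches 54.1% Top-1 and 83.4% Top-5 accuracy against 32.4% / 64.0% for the strongest prior baseline, and shows that decoding aligns with the established neurophysiology of visual processing.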
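
To make the idea of "text as a semantic regularizer" concrete, here is a minimal sketch of a tri-modal contrastive objective: a symmetric InfoNCE term between EEG and image embeddings, plus a down-weighted EEG-text term. This is an illustration under stated assumptions, not the paper's loss. The function names, the `text_weight` value, and the temperature of 0.07 are all hypothetical, and the abstract does not say how the terms are actually combined.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(q)


def symmetric_info_nce(a, b, temperature=0.07):
    """Average of the a-to-b and b-to-a InfoNCE losses."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))


def tri_modal_loss(eeg, image, text, text_weight=0.1, temperature=0.07):
    """EEG-image alignment plus a down-weighted EEG-text term.

    text_weight is a hypothetical hyperparameter; keeping it small is one
    way to stop the text term from dominating the EEG-image signal.
    """
    return (symmetric_info_nce(eeg, image, temperature)
            + text_weight * symmetric_info_nce(eeg, text, temperature))
```

With `text_weight=0.0` the sketch reduces to plain EEG-image contrastive alignment, which makes the effect of the text term easy to isolate in an ablation.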

Abstract

Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. We introduce a tri-modal contrastive framework for EEG-based visual decoding that aligns EEG, visual, and textual representations within a unified latent space. Our approach follows a two-stage design. First, we pre-train an EEG encoder via masked reconstruction on unlabeled trials, learning spatio-temporal regularities that transfer robustly to downstream tasks. Second, we jointly align EEG, image, and LLM-generated textual descriptions through contrastive learning, where text supervision acts as a semantic regularizer that injects linguistic structure into the shared space without overwhelming the primary EEG-image signal. The encoder integrates subject-specific adaptation, graph-attention over channels, and temporal-spatial convolutional embeddings. On the Things-EEG2 200-way zero-shot benchmark, our framework achieves 54.1% Top-1 and 83.4% Top-5 accuracy, substantially exceeding the strongest prior baseline (32.4% / 64.0%), with paired Wilcoxon tests confirming significance (p < 0.01) over all in-subject baselines. We validate generalization on Things-MEG. Analysis reveals that compact embedding geometries (CN-CLIP) outperform much larger backbones, and that decoding aligns with established neurophysiology of visual processing. This work is a critical step towards robust, semantically-grounded visual decoding from non-invasive temporal neural signals. The source code is publicly available in https://github.com/anon-eeg/eeg_image_decoding.
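
For readers unfamiliar with how Top-1 and Top-5 figures like those above are computed, the sketch below shows one common way to score N-way zero-shot retrieval: each EEG embedding is compared against one candidate embedding per class, and a trial counts as a hit if the true class is among the k most similar candidates. The function names and toy data are hypothetical, and the paper's exact evaluation protocol is assumed rather than confirmed. For scale, chance level in a 200-way task is 0.5% Top-1 and 2.5% Top-5.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_accuracy(eeg_embeddings, class_embeddings, labels, k=1):
    """Fraction of trials whose true class is among the k nearest candidates."""
    hits = 0
    for emb, label in zip(eeg_embeddings, labels):
        sims = [cosine(emb, c) for c in class_embeddings]
        # Candidate indices ordered from most to least similar.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy 3-way example (illustrative data only).
classes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
trials = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.0], [0.1, 0.2, 0.9]]
labels = [0, 1, 2]

top1 = top_k_accuracy(trials, classes, labels, k=1)  # 2 of 3 trials are hits
top2 = top_k_accuracy(trials, classes, labels, k=2)  # all 3 trials are hits
```

The second toy trial is closest to class 0 but labelled class 1, so it misses at k=1 and is recovered at k=2, which is the same gap that separates Top-1 from Top-5 in the reported results.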

Metadata

arXiv ID: 2605.24523
Categories: cs.LG, cs.CL, q-bio.NC
