AIDB Daily Papers
Voice, Face, and Emotion: Multimodal Emotion-Cognition Caption Generation for Mental Health Understanding
※ The title and key points above were generated automatically by AI. Please refer to the original paper for accurate details.
Key Points
- Proposes ECMC, a task that describes emotional and cognitive states from multimodal data in natural language, improving both the accuracy and interpretability of mental health assessment.
- Addresses the limitation of existing methods, which treat multimodal data as classification tasks and restrict the interpretability of emotion and cognition, by extracting emotional and cognitive features with an LLM and contrastive learning.
- In objective and subjective evaluations, ECMC outperformed existing models, and the generated emotion-cognition profiles substantially improved assistive diagnosis and interpretability in mental health analysis.
Abstract
Emotional and cognitive factors are essential for understanding mental health disorders. However, existing methods often treat multi-modal data as classification tasks, limiting interpretability, especially for emotion and cognition. Although large language models (LLMs) offer opportunities for mental health analysis, they mainly rely on textual semantics and overlook fine-grained emotional and cognitive cues in multi-modal inputs. While some studies incorporate emotional features via transfer learning, their connection to mental health conditions remains implicit. To address these issues, we propose ECMC, a novel task that aims to generate natural language descriptions of emotional and cognitive states from multi-modal data, producing emotion-cognition profiles that improve both the accuracy and interpretability of mental health assessments. We adopt an encoder-decoder architecture in which modality-specific encoders extract features that are fused by a dual-stream BridgeNet based on Q-former. Contrastive learning enhances the extraction of emotional and cognitive features. A LLaMA decoder then aligns these features with annotated captions to produce detailed descriptions. Extensive objective and subjective evaluations demonstrate that: 1) ECMC outperforms existing multi-modal LLMs and mental health models in generating emotion-cognition captions; 2) the generated emotion-cognition profiles significantly improve assistive diagnosis and interpretability in mental health analysis.
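The paper does not include code, but the fusion stage it describes can be pictured concretely. Below is a minimal, hypothetical PyTorch sketch of the architecture as the abstract outlines it: learnable queries (Q-Former style) cross-attend to concatenated modality features in two parallel streams, one for emotion and one for cognition, with an InfoNCE-style loss standing in for the paper's unspecified contrastive objective. All class names (`QFormerStream`, `BridgeNetSketch`, `info_nce`), dimensions, and the loss form are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QFormerStream(nn.Module):
    """One Q-Former-style stream (assumed design): learnable queries
    cross-attend to modality features and summarize them into a
    fixed-length set of tokens."""
    def __init__(self, dim=256, num_queries=16, num_heads=4, num_layers=2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, feats):                       # feats: (B, T, dim)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        return self.decoder(q, feats)               # (B, num_queries, dim)

class BridgeNetSketch(nn.Module):
    """Dual-stream fusion over concatenated audio/visual/text tokens:
    one stream for emotional features, one for cognitive features."""
    def __init__(self, dim=256):
        super().__init__()
        self.emotion_stream = QFormerStream(dim)
        self.cognition_stream = QFormerStream(dim)

    def forward(self, audio, video, text):          # each: (B, T_m, dim)
        fused = torch.cat([audio, video, text], dim=1)
        return self.emotion_stream(fused), self.cognition_stream(fused)

def info_nce(anchor, positive, temperature=0.07):
    """Standard InfoNCE contrastive loss over mean-pooled stream
    outputs, using in-batch negatives."""
    a = F.normalize(anchor.mean(dim=1), dim=-1)     # (B, dim)
    p = F.normalize(positive.mean(dim=1), dim=-1)
    logits = a @ p.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Shape-level usage demo with random pre-encoded modality tokens.
B, dim = 4, 256
bridge = BridgeNetSketch(dim)
audio = torch.randn(B, 50, dim)
video = torch.randn(B, 30, dim)
text  = torch.randn(B, 20, dim)
emo, cog = bridge(audio, video, text)               # each (B, 16, dim)
# In the paper's setup, positive pairs would come from matched samples
# (details unspecified); pairing the output with itself here only
# demonstrates the call signature.
loss = info_nce(emo, emo.detach())
```

In the full pipeline, the query tokens produced by the two streams would then be fed (after projection) into the LLaMA decoder as a prefix, which the abstract says is aligned with annotated captions to generate the emotion-cognition descriptions.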