AIDB Daily Papers
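Capturing the Hierarchical Structure of Emotions: Hierarchical Hyperbolic RAG for Multimodal Emotion Recognition
※ The Japanese title and key points were generated automatically by AI. Please consult the original paper for accurate details.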
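Key Points
- Proposes a method that recognizes emotions by integrating text, audio, and video while accounting for the hierarchical structure of emotions.
- By combining embedding in hyperbolic space, hierarchical beam search, and injection of a structured evidence graph, the method substantially outperforms existing approaches.
- By exploiting the hierarchical nature of emotions and incorporating external knowledge, the proposed method achieves finer-grained emotion recognition.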
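The second point relies on embedding emotion labels and samples in hyperbolic space. The following Python is a minimal sketch, not the authors' code. It implements the standard Poincaré-ball distance (curvature -1) and the exponential map at the origin, which sends a Euclidean feature vector into the ball. The function names, the clipping radius, and the 2-D example points are hypothetical. Placing coarse labels near the origin and fine labels near the boundary is a common convention for hyperbolic hierarchy embeddings. It is assumed here and is not taken from the paper.

```python
import math
from typing import List, Sequence


def squared_norm(x: Sequence[float]) -> float:
    """Squared Euclidean norm of a vector."""
    return sum(xi * xi for xi in x)


def expmap0(v: Sequence[float], max_radius: float = 1.0 - 1e-7) -> List[float]:
    """Exponential map at the origin of the unit Poincare ball (curvature -1).

    Sends a Euclidean (tangent) vector v to tanh(||v||) * v / ||v||.
    """
    norm = math.sqrt(squared_norm(v))
    if norm == 0.0:
        return [0.0 for _ in v]
    # tanh(norm) rounds to 1.0 for large norms in floating point, so clip the
    # radius to keep the result strictly inside the ball (an assumed safeguard).
    radius = min(math.tanh(norm), max_radius)
    return [radius * vi / norm for vi in v]


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points inside the unit Poincare ball."""
    uu, vv = squared_norm(u), squared_norm(v)
    if uu >= 1.0 or vv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = squared_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - uu) * (1.0 - vv)))


# Hypothetical 2-D points: a coarse label at the origin, fine labels near the boundary.
coarse = [0.0, 0.0]
fine_a = [0.9, 0.0]
fine_b = [0.0, 0.9]
print(poincare_distance(coarse, fine_a))  # analytically 2 * atanh(0.9)
print(poincare_distance(fine_a, fine_b))
```

Distances grow rapidly as points approach the boundary. This leaves room for many well-separated fine-grained labels under a few coarse ones, which is why hyperbolic spaces are a natural fit for tree-shaped taxonomies.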
Abstract
Multimodal emotion recognition aims to integrate text, audio, and video sources to understand human affective states. Although multimodal large language models excel at multimodal reasoning, they typically treat emotion categories as independent labels, ignoring the rich hierarchical taxonomy of human psychology. Moreover, without external contextual knowledge, they are highly susceptible to over-interpreting noisy cues, which further complicates fine-grained emotion classification. To address these issues, we propose HyperEmo-RAG, a retrieval-augmented generation framework that leverages a structured emotional knowledge base. Our framework introduces two key innovations. 1) Hierarchical hyperbolic grounding. Recognizing the inherent hierarchical tree structure of emotion taxonomies, we jointly embed hierarchical emotion labels and multimodal samples into a continuous hyperbolic space (Poincaré ball) and design a hierarchical beam-search deliberation process that progressively retrieves samples from coarse to fine-grained levels. 2) Structured evidence injection. Based on the retrieved evidence, we construct an evidence graph and inject the structured knowledge as explicit cognitive context into the LLM through a Tree-Aware Attention mechanism and an EmotionGraphFormer, preserving the integrity of graph-structured information. Experiments on multiple datasets demonstrate that HyperEmo-RAG significantly outperforms existing methods.
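The abstract describes a hierarchical beam-search deliberation that retrieves samples from coarse to fine-grained levels. The following is a minimal Python sketch of that idea under several assumptions. The two-level taxonomy, the label names, the 2-D embeddings, the sample IDs, the `beam_width`/`top_k` parameters, and the rule of scoring each node by Poincaré distance to the query are all hypothetical. The paper's actual scoring and retrieval steps are not specified here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance in the unit Poincare ball (curvature -1).

    Assumes both points lie strictly inside the unit ball.
    """
    uu = sum(x * x for x in u)
    vv = sum(x * x for x in v)
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - uu) * (1.0 - vv)))


@dataclass
class LabelNode:
    """A node in a hypothetical emotion taxonomy embedded in the ball."""
    name: str
    emb: Tuple[float, ...]
    children: List["LabelNode"] = field(default_factory=list)
    # (sample_id, sample_embedding) pairs stored under this label
    samples: List[Tuple[str, Tuple[float, ...]]] = field(default_factory=list)


def hierarchical_beam_search(root, query, beam_width=2, top_k=3):
    """Descend the taxonomy coarse-to-fine, keeping the beam_width labels
    closest to the query at each level, then return the top_k samples
    stored under the surviving labels."""
    beam = [root]
    while any(node.children for node in beam):
        # Expand internal nodes; leaves already in the beam carry over.
        candidates = []
        for node in beam:
            candidates.extend(node.children or [node])
        candidates.sort(key=lambda n: poincare_distance(query, n.emb))
        beam = candidates[:beam_width]
    # Rank the samples attached to the final beam by distance to the query.
    pool = [s for node in beam for s in node.samples]
    pool.sort(key=lambda s: poincare_distance(query, s[1]))
    return [node.name for node in beam], [sid for sid, _ in pool[:top_k]]


# Hypothetical taxonomy: coarse labels near the origin, fine labels near the boundary.
taxonomy = LabelNode("emotion", (0.0, 0.0), children=[
    LabelNode("positive", (0.4, 0.0), children=[
        LabelNode("joy", (0.8, 0.2), samples=[("s1", (0.82, 0.18)), ("s2", (0.7, 0.3))]),
        LabelNode("excitement", (0.8, -0.2), samples=[("s3", (0.78, -0.25))]),
    ]),
    LabelNode("negative", (-0.4, 0.0), children=[
        LabelNode("anger", (-0.8, 0.2), samples=[("s4", (-0.8, 0.25))]),
        LabelNode("sadness", (-0.8, -0.2), samples=[("s5", (-0.75, -0.3))]),
    ]),
])

labels, retrieved = hierarchical_beam_search(taxonomy, (0.75, 0.15), beam_width=2, top_k=2)
print(labels, retrieved)  # → ['joy', 'excitement'] ['s1', 's2']
```

A beam width greater than one keeps a runner-up category at each level. As a result, a wrong decision at the coarse level does not necessarily rule out the correct fine-grained label.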