AIDB Daily Papers

Seeing the World from the Character's Perspective: Resolving Role Interference in Multimodal Role-Playing Agents

Original title: Through the Lens of Character: Resolving Modality-Role Interference in Multimodal Role-Playing Agent

Authors: Yihong Tang, Kehai Chen, Xuefeng Bai, Min Zhang

Published: 2026-05-10 | Fields: LLM, multimodal role-play, cs.CL, cs.CV, AI agents

Note: The Japanese title and key points were generated automatically by AI. Please consult the original paper for accurate details.

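Key Points

This study proposes a framework that resolves the interference between visual information and character settings in multimodal role-playing agents.
In existing models, general-purpose, character-agnostic visual feature extraction undermines character consistency.
With the proposed method, agents integrate character-aligned visual information, and their ability to hold consistent dialogue improves substantially.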

Abstract

The advancement of Multimodal Large Language Models (MLLMs) has expanded Role-Playing Agents (RPAs) into visually grounded environments. However, human vision is inherently subjective and identity-driven, whereas existing MLLMs extract objective, character-agnostic features for general tasks. In RPAs, this generic visual noise overpowers fragile character traits, causing Modality-Role Interference (MRI), where agents struggle to integrate visual grounding and character consistency. To address this, we introduce the training-free Character-Aware Visual Intervention (CAVI) framework, enabling agents to perceive the world through the lens of character. CAVI systematically targets MRI: macroscopically, Character-Guided Token Pruning (CTP) restricts the visual receptive field to role-relevant entities; microscopically, Orthogonal Feature Modulation (OFM) projects tokens onto a character-context subspace to extract aligned facts; and during decoding, Modality-Adaptive Role Steering (MARS) dynamically optimizes steering intensity based on visual reliance. Extensive experiments show CAVI effectively alleviates MRI, significantly enhancing character-consistent multimodal interactions.
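
The abstract names its components but gives no equations, so the following is only a generic illustration of two ideas it mentions: keeping the visual tokens most relevant to a character, and projecting tokens onto a character-context subspace. It is a minimal sketch, not the authors' CTP or OFM implementation. The function names, the cosine-similarity scoring rule, the `keep_ratio` parameter, and the use of a QR-based orthogonal projector are all assumptions made for illustration. MARS is not sketched here.

```python
import numpy as np


def prune_tokens(visual_tokens, character_vec, keep_ratio=0.5):
    """Keep the visual tokens most cosine-similar to a character embedding.

    Assumption: relevance is scored by cosine similarity; the paper may
    score tokens differently. Tokens are assumed to have non-zero norm.
    """
    v = visual_tokens / np.linalg.norm(visual_tokens, axis=1, keepdims=True)
    c = character_vec / np.linalg.norm(character_vec)
    scores = v @ c
    k = max(1, int(round(keep_ratio * len(visual_tokens))))
    # Take the top-k scores, then sort indices to preserve token order.
    keep = np.sort(np.argsort(-scores)[:k])
    return visual_tokens[keep], keep


def project_onto_subspace(tokens, context_vectors):
    """Orthogonally project tokens onto the span of the context vectors.

    Assumption: the "character-context subspace" is the row span of
    context_vectors (shape (r, d), r <= d, full rank).
    """
    # Reduced QR of the transpose yields an orthonormal basis q of shape (d, r).
    q, _ = np.linalg.qr(context_vectors.T)
    return tokens @ q @ q.T


# Toy usage with random data (hypothetical shapes, not from the paper).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))      # 8 visual tokens, 16 dimensions
character = rng.normal(size=16)        # one character embedding
context = rng.normal(size=(3, 16))     # 3 character-context vectors

kept, kept_idx = prune_tokens(tokens, character, keep_ratio=0.5)
aligned = project_onto_subspace(kept, context)
```

In this toy setup, projecting `aligned` a second time leaves it unchanged, and the discarded residual `kept - aligned` is orthogonal to every context vector, which is what distinguishes an orthogonal projection from a simple reweighting.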

Metadata

arXiv ID: 2605.09443
Categories: cs.CV, cs.CL
