AIDB Daily Papers
視線は語る:パーソナライズされたユーザーエミュレーションのための視線一致チューニング
※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。
ポイント
- 推薦システムの評価にLLMエージェントが活用されているが、視覚的な注意が欠けていた。
- ユーザーの視線パターンとVLMの視覚的注意を一致させ、シミュレーションの精度向上を目指す。
- FixATEは、視線パターンに基づいてモデルの注意を誘導し、クリック予測精度が向上した。
Abstract
Large language model (LLM) agents are increasingly deployed as scalable user simulators for recommender system evaluation. Yet existing simulators perceive recommendations through text or structured metadata rather than the visual interfaces real users browse-a critical gap, since attention over recommendation layouts is both visually driven and highly personalized. We investigate whether aligning a vision-language model's (VLM's) visual attention with user-specific gaze patterns can improve simulation fidelity. Analysis of a real-world eye-tracking dataset collected in a carousel-based recommendation setting reveals that users exhibit stable individual gaze patterns strongly predictive of click behavior. Building on this finding, we propose Fixation-Aligned Tuning for user Emulation (FixATE). Our approach first probes the VLM's internal visual attention via interpretability operators to obtain a slot-level relevance distribution comparable with human fixation, and then learns personalized soft prompts to steer the model's attention toward each user's characteristic fixation pattern. Experiments across three interpretability-based probing operators and two architecturally distinct VLM backbones demonstrate consistent improvements in both attention alignment and click prediction accuracy. These results suggest that making the model "see like the user" is a viable path toward simulators that more faithfully reproduce how users perceive and act in recommendation interfaces.
Paper AI Chat
この論文のPDF全文を対象にAIに質問できます。
質問の例: