AIDB Daily Papers

Same Results, Different Paths: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems

Original title: Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems

Authors: Maria Movin, Claudia Hauff, Aron Henriksson, Panagiotis Papapetrou

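Published: 2026-04-09 | Fields: LLM, machine learning, AI, search, agents, evaluation, behavior, systems, users, tasks, natural language processing, deep learning, human-computer interaction, interaction, GUI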

Note: The translated title and the key points were generated automatically by AI. Consult the original paper for the exact content.

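Key points

Presents an evaluation framework that compares the behavior of LLM-driven GUI agents and humans in terms of task outcome, queries, and navigation.
The GUI agent matches humans in task success rate, but its navigation strategies differ, revealing a gap at the behavioral level.
Behavior-level diagnostics are important when GUI agents stand in for users in production search systems.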

Abstract

LLM-driven GUI agents are increasingly used in production systems to automate workflows and simulate users for evaluation and optimization. Yet most GUI-agent evaluations emphasize task success and provide limited evidence on whether agents interact in human-like ways. We present a trace-level evaluation framework that compares human and agent behavior across (i) task outcome and effort, (ii) query formulation, and (iii) navigation across interface states. We instantiate the framework in a controlled study in a production audio-streaming search application, where 39 participants and a state-of-the-art GUI agent perform ten multi-hop search tasks. The agent achieves task success comparable to participants and generates broadly aligned queries, but follows systematically different navigation strategies: participants exhibit content-centric, exploratory behavior, while the agent is more search-centric and low-branching. These results show that outcome and query alignment do not imply behavioral alignment, motivating trace-level diagnostics when deploying GUI agents as proxies for users in production search systems.
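
The third dimension of the framework, navigation across interface states, can be illustrated with a small sketch. The abstract does not say which metrics the paper uses, so everything here is an assumption made for illustration: the state names, the toy traces, the helper functions `transition_distribution` and `js_divergence`, and the choice of Jensen-Shannon divergence as the comparison measure. It shows one plausible way to quantify how differently two groups move between states, not the authors' method.

```python
from collections import Counter
from math import log2


def transition_distribution(traces):
    """Normalized frequency of (state, next_state) pairs over all traces."""
    counts = Counter()
    for trace in traces:
        # Pair each state with its successor within the same trace.
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as dicts; the result lies in [0, 1]."""
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        return sum(v * log2(v / mixture[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical traces: each is the ordered list of interface states visited.
human_traces = [
    ["home", "search", "results", "album", "artist", "album"],
    ["home", "search", "results", "playlist", "track"],
]
agent_traces = [
    ["home", "search", "results", "search", "results", "track"],
]

human = transition_distribution(human_traces)
agent = transition_distribution(agent_traces)
divergence = js_divergence(human, agent)
print(f"Navigation divergence (0 = identical, 1 = disjoint): {divergence:.3f}")
```

A score near 0 would mean the two groups use the same transitions with similar frequency. A higher score flags the kind of behavioral gap that task success alone would not reveal.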


Metadata

arXiv ID: 2604.07929
Categories: cs.IR, cs.AI
