AIDB Daily Papers

GroupTravelBench: A Benchmark for Evaluating Multi-User Travel Planning

Original title: GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning

Authors: Xiang Cheng, Yulan Hu, Lulu Zheng, Zheng Pan, Xin Li, Yong Liu

Published: 2026-05-24 | Fields: LLM, NLP, Planning, Interaction, cs.CL, AI Agents

Note: The Japanese title and key points on this page were generated automatically by AI. Refer to the original paper for accurate details.

Key Points

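The authors built a benchmark that evaluates dialogue and conflict-resolution abilities in multi-user travel planning.
They synthesized 650 tasks, split into three difficulty levels, from realistic user profiles and POI data.
Even state-of-the-art LLMs struggled with fairness across users and with covering every user's preferences.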

Abstract

Travel planning is a realistic task for evaluating the planning and tool-use abilities of LLM agents. However, existing benchmarks typically assume only a single user, thereby avoiding one of the most challenging aspects of real-world scenarios: an agent's ability to identify and resolve conflicts among multiple users. To address this gap, we introduce GroupTravelBench, the first benchmark for multi-user, multi-turn travel planning. Based on real user profiles, POI data, and ticket price data, we synthesize 650 tasks and divide them into three difficulty levels. Beyond standard abilities in single-user itinerary planning, such as multi-step reasoning and tool use, our benchmark further evaluates three key capabilities required for travel agents: (i) elicitation: proactively engaging in multi-turn dialogue to gather preferences from each user; (ii) coordination: resolving conflicts among users through compromise or subgrouping strategies; and (iii) planning: searching for travel plans that maximize overall group utility while maintaining fairness and feasibility. To simulate real-world conversational itinerary planning while enabling reliable tool use and offline evaluation, we build an interactive sandbox environment with cached real-world tool data. We evaluate a wide range of LLMs and find that even frontier models still show substantial weaknesses in preference coverage and group fairness. GroupTravelBench provides a practical and reproducible benchmark for advancing research on LLM agents for real-world travel planning.
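The planning objective in the abstract (maximize overall group utility while keeping the plan fair) can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not the benchmark's actual metric: it assumes each user's utility is the fraction of their preference tags covered by at least one POI in the plan (a simple stand-in for "preference coverage"), and it uses the worst-off user's coverage as the fairness term, an egalitarian choice. The names `POI`, `coverage`, `score`, `best_plan`, and `fairness_weight`, and the sample users and POIs, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class POI:
    """A point of interest with descriptive tags (hypothetical schema)."""
    name: str
    tags: frozenset


def coverage(prefs: set, plan: tuple) -> float:
    """Fraction of one user's preferences matched by at least one POI in the plan."""
    if not prefs:
        return 1.0
    hit = sum(1 for p in prefs if any(p in poi.tags for poi in plan))
    return hit / len(prefs)


def score(users: dict, plan: tuple, fairness_weight: float) -> float:
    """Mean coverage (group utility) plus weighted minimum coverage (fairness term)."""
    per_user = [coverage(prefs, plan) for prefs in users.values()]
    mean_u = sum(per_user) / len(per_user)
    return mean_u + fairness_weight * min(per_user)


def best_plan(users: dict, pois: list, max_stops: int, fairness_weight: float) -> tuple:
    """Exhaustively try every plan of 1..max_stops POIs; on ties, keep the first found."""
    best_score, best = float("-inf"), ()
    for k in range(1, max_stops + 1):
        for plan in combinations(pois, k):
            s = score(users, plan, fairness_weight)
            if s > best_score:
                best_score, best = s, plan
    return best


# Hypothetical group: three users share tastes, user B only wants hiking.
users = {
    "A": {"museum", "cafe"},
    "B": {"hiking"},
    "C": {"museum", "cafe"},
    "D": {"museum", "cafe"},
}
pois = [
    POI("Art Museum", frozenset({"museum"})),
    POI("Mountain Trail", frozenset({"hiking"})),
    POI("Corner Cafe", frozenset({"cafe"})),
]

utilitarian = [p.name for p in best_plan(users, pois, 2, fairness_weight=0.0)]
fair = [p.name for p in best_plan(users, pois, 2, fairness_weight=1.0)]
print(utilitarian)  # ['Art Museum', 'Corner Cafe']
print(fair)         # ['Art Museum', 'Mountain Trail']
```

With `fairness_weight=0.0`, the search maximizes mean coverage (0.75) by fully satisfying A, C, and D while leaving B with nothing. With `fairness_weight=1.0`, it accepts a lower mean (0.625) so that every user gets at least half of their preferences covered. Exhaustive search is only feasible for tiny candidate sets; a real agent would also have to respect feasibility constraints such as opening hours and budget, which this sketch ignores.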

Metadata

arXiv ID: 2605.25200
Category: cs.CL