AIDB Daily Papers

The Evolution of Shopping-Assistant AI: A Blueprint for Building, Evaluating, and Optimizing

Original title: Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Authors: Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood, Hongtai Wei, Sudeep Das

Published: 2026-03-03 | Fields: LLM, machine learning, AI agents, information retrieval, dialogue, evaluation, optimization

Note: The Japanese title and key points on this page were generated automatically by AI. Consult the original paper for the exact content.

Key Points

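Presents a practical blueprint for evaluating and optimizing conversational shopping assistants.
Its novelty lies in tackling two problems together: evaluating multi-turn dialogues and optimizing multi-agent systems whose agents are tightly coupled.
Uses LLMs as judges and develops prompt-optimization strategies, improving the quality of the shopping assistant.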

Abstract

Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly coupled multi-agent systems. Grocery shopping further amplifies these difficulties, as user requests are often underspecified, highly preference-sensitive, and constrained by factors such as budget and inventory. In this paper, we present a practical blueprint for evaluating and optimizing conversational shopping assistants, illustrated through a production-scale AI grocery assistant. We introduce a multi-faceted evaluation rubric that decomposes end-to-end shopping quality into structured dimensions and develop a calibrated LLM-as-judge pipeline aligned with human annotations. Building on this evaluation foundation, we investigate two complementary prompt-optimization strategies based on a SOTA prompt-optimizer called GEPA (Shao et al., 2025): (1) Sub-agent GEPA, which optimizes individual agent nodes against localized rubrics, and (2) MAMuT (Multi-Agent Multi-Turn) GEPA (Herrera et al., 2026), a novel system-level approach that jointly optimizes prompts across agents using multi-turn simulation and trajectory-level scoring. We release rubric templates and evaluation design guidance to support practitioners building production CSAs.
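The evaluation side of the abstract (decomposing end-to-end shopping quality into rubric dimensions, then checking that an LLM judge agrees with human annotators) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's pipeline: the dimension names, the binary pass/fail scale, the equal-weight average, and the use of Cohen's kappa as the judge-versus-human agreement measure are all hypothetical choices. The authors' actual rubric templates are the ones released with the paper.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical rubric dimensions for a grocery assistant (illustrative only).
DIMENSIONS = (
    "intent_understanding",
    "constraint_adherence",  # e.g. budget, inventory, dietary preferences
    "product_relevance",
    "conversation_quality",
)


@dataclass
class RubricScore:
    """Assumed scale: a binary pass (1) / fail (0) verdict per dimension for one conversation."""
    verdicts: dict[str, int]

    def overall(self) -> float:
        # Equal-weight mean over dimensions; the paper's aggregation may differ.
        return sum(self.verdicts[d] for d in DIMENSIONS) / len(DIMENSIONS)


def cohens_kappa(judge: list[int], human: list[int]) -> float:
    """Chance-corrected agreement between LLM-judge and human labels on one dimension."""
    if not judge or len(judge) != len(human):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(judge)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(j == h for j, h in zip(judge, human)) / n
    # Expected agreement if each rater labeled at random with its own label frequencies.
    judge_counts, human_counts = Counter(judge), Counter(human)
    labels = set(judge) | set(human)
    expected = sum(judge_counts[l] * human_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    score = RubricScore({
        "intent_understanding": 1,
        "constraint_adherence": 0,
        "product_relevance": 1,
        "conversation_quality": 1,
    })
    print(score.overall())  # 0.75

    # Per-dimension calibration check: judge verdicts vs. human verdicts on six conversations.
    judge = [1, 1, 0, 1, 0, 1]
    human = [1, 0, 0, 1, 0, 1]
    print(round(cohens_kappa(judge, human), 3))  # 0.667
```

In this sketch, a low kappa on one dimension would flag that the judge prompt for that dimension needs recalibration before its scores are trusted as an optimization signal; per-dimension checks make such gaps visible where a single overall score would hide them.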


Metadata

arXiv ID: 2603.03565
Categories: cs.AI, cs.CL, cs.LG
