AIDB Daily Papers

From Failed Trajectories to Trustworthy LLM Agents: Diagnosing and Repairing Harness Flaws

Original title: From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws

Authors: Mengzhuo Chen, Junjie Wang, Zhe Liu, Yawen Wang, Qing Wang

Published: 2026-06-04 | Fields: LLM, cs.MA, cs.SE, AI agents, software engineering, AI assistance

Note: The Japanese title and key points were generated automatically by AI. Please refer to the original paper for accurate details.

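Key Points

Proposes HarnessFix, a framework for identifying and repairing flaws in the harness, the layer that provides an LLM agent's execution environment, tool interfaces, and related infrastructure.
Existing self-improvement methods often fail to localize the cause of a failure; HarnessFix analyzes execution traces to attribute each failure to the responsible trajectory steps and harness layers.
Across the evaluated benchmarks, HarnessFix improves held-out test performance over the initial harnesses by 15.2% to 50.0% and outperforms human-designed and self-evolution baselines.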

Abstract

LLM-based agents increasingly rely on harnesses that provide execution environments, tool interfaces, context, lifecycle orchestration, observability, verification, and governance. Existing self-improving agents and automatic harness evolution methods mainly improve agents through runtime supervision, prompt optimization, workflow search, or harness modification based on final outcomes. However, they often fail to diagnose where the responsible evidence lies in failed trajectories and which harness layer causes the unreliable behavior, resulting in broad, indirect, or poorly scoped changes. This paper proposes HarnessFix, a trace-guided framework for diagnosing agent failures and repairing agent harnesses. HarnessFix compiles raw execution traces and harness code into a Harness-aware Trace Intermediate Representation (HTIR), which normalizes fragmented trajectory evidence and captures step-level provenance and control-flow relations. It then attributes failures to responsible trajectory steps and harness layers, consolidates recurring diagnoses into actionable flaw records, and maps them to scoped repair operators. Finally, HarnessFix generates and validates harness patches under flaw-specific repair specifications to reduce target flaws without introducing unacceptable regressions. We evaluate HarnessFix on SWE-Bench Verified, Terminal-Bench 2.0 Verified, GAIA, and AppWorld. Across these benchmarks, HarnessFix improves held-out test performance over the initial harnesses by 15.2% to 50.0%, outperforms human-designed and self-evolution baselines, and reveals recurring harness-flaw patterns across ETCLOVG layers.
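
The diagnose, consolidate, and repair flow described in the abstract can be pictured with a small toy sketch. Everything below is an illustrative assumption rather than HarnessFix's actual design: the `Step`, `Diagnosis`, and `Layer` types, the "blame the earliest failing step" heuristic, the `min_support` threshold, and the repair operator names are all hypothetical. The expansion of ETCLOVG into seven layers is also inferred from the list of harness concerns in the abstract's first sentence.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    # Assumed expansion of "ETCLOVG"; the abstract does not spell it out.
    EXECUTION = "E"
    TOOL = "T"
    CONTEXT = "C"
    LIFECYCLE = "L"
    OBSERVABILITY = "O"
    VERIFICATION = "V"
    GOVERNANCE = "G"


@dataclass(frozen=True)
class Step:
    """One normalized trajectory step (a hypothetical stand-in for an HTIR node)."""
    index: int
    layer: Layer
    action: str
    error: Optional[str] = None  # None means the step succeeded


@dataclass(frozen=True)
class Diagnosis:
    trace_id: str
    step_index: int
    layer: Layer
    symptom: str


def attribute_failure(trace_id, steps):
    """Toy attribution: blame the earliest failing step in the trace."""
    for step in sorted(steps, key=lambda s: s.index):
        if step.error is not None:
            return Diagnosis(trace_id, step.index, step.layer, step.error)
    return None  # no failing step found


def consolidate(diagnoses, min_support=2):
    """Group diagnoses by (layer, symptom) and keep the recurring ones."""
    counts = Counter((d.layer, d.symptom) for d in diagnoses)
    return [
        {"layer": layer, "symptom": symptom, "support": n}
        for (layer, symptom), n in counts.most_common()
        if n >= min_support
    ]


# Hypothetical operator names, one per layer that this toy example covers.
REPAIR_OPERATORS = {
    Layer.TOOL: "tighten_tool_schema",
    Layer.CONTEXT: "adjust_context_policy",
    Layer.VERIFICATION: "add_verification_check",
}


def plan_repairs(flaws):
    """Map each flaw record to a scoped operator, falling back to manual review."""
    return [
        (f["layer"].value, REPAIR_OPERATORS.get(f["layer"], "manual_review"))
        for f in flaws
    ]


# Made-up traces for illustration only.
traces = {
    "t1": [Step(0, Layer.CONTEXT, "load_repo"),
           Step(1, Layer.TOOL, "run_tests", "timeout")],
    "t2": [Step(0, Layer.TOOL, "run_tests", "timeout"),
           Step(1, Layer.VERIFICATION, "check_patch", "no_assertion")],
    "t3": [Step(0, Layer.CONTEXT, "load_repo")],
}

diagnoses = [d for tid, steps in traces.items()
             if (d := attribute_failure(tid, steps)) is not None]
flaws = consolidate(diagnoses)
plan = plan_repairs(flaws)
```

In this toy run, two of the three traces fail first at a tool-layer step with the same symptom, so they collapse into one flaw record that maps to a single tool-layer operator. The point of the sketch is only the shape of the pipeline: per-step evidence first, layer attribution second, and a repair scoped to the attributed layer last.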

Metadata

arXiv ID: 2606.06324
Categories: cs.SE, cs.MA
