AIDB Daily Papers

Avoiding RAG Performance Degradation as Documents Grow: Domain-Scoped, Model-Agnostic Retrieval

Original title: When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval

Authors: Nabaraj Subedi, Ahmed Abdelaty, Shivanand Venkanna Sheshappanavar

Published: 2026-06-09 | Fields: LLM, NLP, RAG, cs.CL, cs.IR

Note: The Japanese title and key points on this page were generated automatically by AI. Refer to the original paper for the exact content.

Key Points

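The paper identifies "vector search dilution": in large, heterogeneous document collections, similarity search loses discriminative power and RAG accuracy degrades.
To address this, the authors propose domain-scoped retrieval based on organizational metadata, which raised P@10 from 0.77 to 0.86.
Multi-agent orchestration proved highly configuration-dependent, so the authors recommend domain-scoped retrieval followed by a single synthesis call.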
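
The idea of scoping retrieval by metadata before ranking can be illustrated with a minimal sketch. This is not the authors' MASDR-RAG implementation: the `Chunk` type, the `domain` tag, the toy three-dimensional vectors, and the domain names are all hypothetical, chosen only to show how a near-duplicate chunk from the wrong domain can outrank the right one when the search is unscoped.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    domain: str           # hypothetical organizational metadata tag
    vector: list[float]   # toy embedding, not produced by a real model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k, domain=None):
    """Return the top-k chunks by cosine similarity.

    If `domain` is given, candidates are filtered by metadata first,
    so chunks from other domains never compete for a top-k slot.
    """
    pool = [c for c in chunks if domain is None or c.domain == domain]
    ranked = sorted(pool, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


# Invented example data: two chunks are nearly identical in embedding
# space but belong to different domains.
chunks = [
    Chunk("Bridge inspection interval", "bridges", [0.9, 0.1, 0.0]),
    Chunk("Pavement inspection interval", "pavement", [0.92, 0.08, 0.0]),
    Chunk("Bridge load rating", "bridges", [0.5, 0.5, 0.0]),
    Chunk("Snow plow schedule", "maintenance", [0.0, 0.2, 0.9]),
]
query = [0.95, 0.05, 0.0]  # stands in for a question about bridges

unscoped = retrieve(query, chunks, k=2)
scoped = retrieve(query, chunks, k=2, domain="bridges")

print([c.text for c in unscoped])
print([c.text for c in scoped])
```

In this toy data the unscoped search ranks the pavement chunk first, while the scoped search returns only bridge chunks. How the domain is chosen for a query (a classifier, a router, or user selection) is left open here.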

Abstract

Retrieval-augmented generation degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power and top-k retrieval increasingly returns semantically similar but contextually incorrect chunks. We refer to this failure mode as vector search dilution. We observed this firsthand, even when using hybrid dense+sparse retrieval, in a deployed Wyoming Department of Transportation corpus, where scaling from 54 to 1,128 documents (88,907 chunks) reduced accuracy from 75% to below 40%. To address this dilution, we propose MASDR-RAG (Multi-Agent Scoped Domain Retrieval for RAG) and evaluate it on 200 expert-validated queries across five LLM backbones, six corpora, and two index stacks. Our results indicate that domain scoping using organizational metadata is the key fix, significantly improving P@10 from 0.77 to 0.86 ($p < 0.05$). Furthermore, our investigation of multi-agent orchestration revealed a high degree of configuration dependence, creating what we call the precision-faithfulness paradox. Based on these varied outcomes, our practical recommendation is simple: scope first, then perform a single synthesis call, reserving full multi-agent orchestration for genuinely multi-domain corpora paired with native-tool-call backbones. Code and data will be made public upon acceptance.
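
For readers unfamiliar with the P@10 figure quoted in the abstract, the following is a minimal sketch of the standard precision-at-k definition: the fraction of the top k retrieved items that are relevant. The abstract does not describe the paper's exact evaluation protocol, so the function name, the identifiers, and the sample data are assumptions for illustration only.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant.

    Uses the common convention of dividing by k even when fewer
    than k items were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Invented example: 8 of the 10 retrieved chunks are judged relevant.
retrieved = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "b", "c", "d", "e", "f", "g", "h", "x"}

print(precision_at_k(retrieved, relevant, k=10))
```

A reported score such as 0.86 would then be this per-query value averaged over all evaluation queries, assuming the usual macro-averaging convention.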


Metadata

arXiv ID: 2606.11350
Categories: cs.CL, cs.IR