AIDB Daily Papers

MultiDocFusion: Enhancing RAG for Long Industrial Documents with Hierarchical Multimodal Chunking

Original title: MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents

Authors: Joongmin Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo, Heuiseok Lim

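Published: 2026-04-14 | Topics: question answering, hierarchy, industry, data extraction, structure, natural language processing, accuracy, OCR, RAG, LLM, parsing, documents, information retrieval, search, AI, benchmark, Transformer, multimodal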

Note: The Japanese title and key points were generated automatically by AI. Please check the original paper for the exact content.

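Key points

The paper proposes MultiDocFusion, a new multimodal chunking pipeline that takes the structure of industrial documents into account.
Conventional RAG tends to ignore document structure, whereas this method parses the structure hierarchically to reduce information loss.
Experiments on industrial benchmarks confirm that retrieval precision improves by 8-15% and QA scores by 2-3%.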

Abstract

RAG-based QA has emerged as a powerful method for processing long industrial documents. However, conventional text chunking approaches often neglect complex and long industrial document structures, causing information loss and reduced answer quality. To address this, we introduce MultiDocFusion, a multimodal chunking pipeline that integrates: (i) detection of document regions using vision-based document parsing, (ii) text extraction from these regions via OCR, (iii) reconstruction of document structure into a hierarchical tree using large language model (LLM)-based document section hierarchical parsing (DSHP-LLM), and (iv) construction of hierarchical chunks through DFS-based grouping. Extensive experiments across industrial benchmarks demonstrate that MultiDocFusion improves retrieval precision by 8-15% and ANLS QA scores by 2-3% compared to baselines, emphasizing the critical role of explicitly leveraging document hierarchy for multimodal document-based QA. These significant performance gains underscore the necessity of structure-aware chunking in enhancing the fidelity of RAG-based QA systems.
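Step (iv) of the abstract, building hierarchical chunks through DFS-based grouping, can be illustrated with a minimal sketch. The abstract does not give the paper's actual grouping rule, so everything here is an assumption for illustration: the `Section` node type, the `dfs_chunks` function, the heading-path prefix, and the character budget `max_chars` are all hypothetical names and choices, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical node of a document hierarchy tree (not from the paper)."""
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def dfs_chunks(root: Section, max_chars: int = 200) -> list[str]:
    """Walk the tree depth-first and pack sections into chunks.

    Each section is rendered with its full heading path so that a chunk
    keeps its position in the hierarchy. Sections are added to the
    current chunk in DFS order until max_chars would be exceeded.
    The size budget is an assumed grouping rule, chosen for simplicity.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0

    def flush() -> None:
        # Close the current chunk, if it holds anything.
        nonlocal current, size
        if current:
            chunks.append("\n".join(current))
            current, size = [], 0

    def visit(node: Section, path: tuple[str, ...]) -> None:
        nonlocal size
        path = path + (node.title,)
        if node.text:
            piece = " > ".join(path) + ": " + node.text
            # Start a new chunk when this piece would overflow the budget.
            if current and size + len(piece) > max_chars:
                flush()
            current.append(piece)
            size += len(piece)
        for child in node.children:
            visit(child, path)

    visit(root, ())
    flush()
    return chunks


# Toy document tree standing in for the output of the parsing steps.
doc = Section("Manual", children=[
    Section("1 Safety", "Wear gloves.", [
        Section("1.1 Electrical", "Disconnect power first."),
    ]),
    Section("2 Maintenance", "Inspect monthly."),
])

for chunk in dfs_chunks(doc, max_chars=100):
    print(chunk)
    print("---")
```

With a budget of 100 characters, the "1 Safety" section and its "1.1 Electrical" subsection share one chunk, while "2 Maintenance" starts a new one. Because the traversal is depth-first, a parent section is always grouped with its own descendants before any sibling section is considered, which is the property a structure-aware chunker relies on.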

Metadata

arXiv ID: 2604.12352
Categories: cs.AI, cs.CL
