AIDB Daily Papers

Using LLMs: Schema-Guided Extraction and Validation of Missing-Person Information from Heterogeneous Data Sources

Original title: LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources

Authors: Joshua Castillo, Ravi Mukkamala

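Published: 2026-04-08 | Fields: LLM, NLP, machine learning, AI, validation, information extraction, law, OCR, natural language processing, deep learning, crime, data extraction, PDF, schema, missing persons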

Note: The translated title and the key points were generated automatically by AI. Please consult the original paper for the exact content.

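Key Points

This study developed the Guardian Parser Pack, an AI pipeline that parses the diverse documents used in missing-person investigations.
Its novelty lies in normalizing variations in layout and terminology into a schema-compliant representation, which supports rapid analysis and search planning.
With LLM assistance, extraction quality rose substantially over the deterministic parser (F1 = 0.8664 vs. 0.2578 on 75 manually aligned cases), and key-field completeness also improved (96.97% vs. 93.23% across 517 parsed records per pathway).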

Abstract

Missing-person and child-safety investigations rely on heterogeneous case documents, including structured forms, bulletin-style posters, and narrative web profiles. Variations in layout, terminology, and data quality impede rapid triage, large-scale analysis, and search-planning workflows. This paper introduces the Guardian Parser Pack, an AI-driven parsing and normalization pipeline that transforms multi-source investigative documents into a unified, schema-compliant representation suitable for operational review and downstream spatial modeling. The proposed system integrates (i) multi-engine PDF text extraction with Optical Character Recognition (OCR) fallback, (ii) rule-based source identification with source-specific parsers, (iii) schema-first harmonization and validation, and (iv) an optional Large Language Model (LLM)-assisted extraction pathway incorporating validator-guided repair and shared geocoding services. We present the system architecture, key implementation decisions, and output design, and evaluate performance using both gold-aligned extraction metrics and corpus-level operational indicators. On a manually aligned subset of 75 cases, the LLM-assisted pathway achieved substantially higher extraction quality than the deterministic comparator (F1 = 0.8664 vs. 0.2578), while across 517 parsed records per pathway it also improved aggregate key-field completeness (96.97% vs. 93.23%). The deterministic pathway remained much faster (mean runtime 0.03 s/record vs. 3.95 s/record for the LLM pathway). In the evaluated run, all LLM outputs passed initial schema validation, so validator-guided repair functioned as a built-in safeguard rather than a contributor to the observed gains. These results support controlled use of probabilistic AI within a schema-first, auditable pipeline for high-stakes investigative settings.
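The schema-first validation and validator-guided repair described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SCHEMA` fields, the `validate` and `extract_with_repair` helpers, and the stub extractor and repair functions are hypothetical and are not taken from the Guardian Parser Pack. The sample record is made-up placeholder data. In a real pipeline, the `extract` and `repair` callables would presumably wrap LLM calls.

```python
from typing import Callable

# Hypothetical schema (not from the paper): field -> (expected type, required?)
SCHEMA = {
    "name": (str, True),
    "age": (int, True),
    "last_seen_city": (str, True),
    "last_seen_date": (str, False),
}


def validate(record: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record is valid."""
    errors = []
    for field, (ftype, required) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(value, ftype):
            errors.append(
                f"{field}: expected {ftype.__name__}, got {type(value).__name__}"
            )
    for field in record:
        if field not in SCHEMA:
            errors.append(f"unexpected field: {field}")
    return errors


def extract_with_repair(
    text: str,
    extract: Callable[[str], dict],
    repair: Callable[[str, dict, list[str]], dict],
    max_rounds: int = 2,
) -> tuple[dict, list[str], int]:
    """Extract a record, then feed validator errors back to `repair` until valid."""
    record = extract(text)
    errors = validate(record)
    rounds = 0
    while errors and rounds < max_rounds:
        record = repair(text, record, errors)
        errors = validate(record)
        rounds += 1
    return record, errors, rounds


# Stand-ins for the LLM calls, using made-up placeholder data.
def stub_extract(text: str) -> dict:
    return {"name": "Jane Doe", "age": "14", "last_seen_city": "Example City"}


def stub_repair(text: str, record: dict, errors: list[str]) -> dict:
    fixed = dict(record)
    if isinstance(fixed.get("age"), str) and fixed["age"].isdigit():
        fixed["age"] = int(fixed["age"])
    return fixed


record, errors, rounds = extract_with_repair("poster text", stub_extract, stub_repair)
print(record, errors, rounds)
```

When the first extraction already satisfies the schema, the loop body never runs and `rounds` stays at 0. That matches the situation the abstract reports for its evaluated run, where repair acted as a safeguard rather than a source of the measured gains.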


Metadata

arXiv ID: 2604.06571
Categories: cs.CL, cs.AI, cs.IR, cs.LG
