AIDB Daily Papers
Structural Hallucination in Large Language Models: A Network-Based Evaluation of Knowledge Structure and Citation Integrity
Note: The title and key points below are AI-generated summaries. Please consult the original paper for the precise content.
Key Points
- The authors develop a new evaluation method for detecting distortions in the knowledge structures produced by large language models (LLMs).
- They define structural hallucination as distortion of conceptual organization, relational structure, and bibliographic references that sentence-level accuracy evaluation overlooks, and attempt to detect it.
- Experiments show that LLMs exhibit large distortions when reproducing knowledge structures, revealing problems that conventional evaluation metrics fail to capture.
Abstract
Large Language Models (LLMs) increasingly mediate access to scholarly information, yet their outputs are typically evaluated at the level of individual statements rather than knowledge structure. This paper introduces structural hallucination: systematic distortion of conceptual organization, relational architecture, and bibliographic grounding that remains invisible to sentence-level accuracy metrics. To detect such distortions, we develop a network-based hallucination stress test grounded in knowledge graph extraction, graph similarity analysis, centrality comparison, and citation integrity verification. The protocol is applied to three structured domains representing core forms of scholarly knowledge: Roget's Thesaurus (1911) as a lexical ontology, Wikidata philosophers as a biographical knowledge graph, and bibliographic citation records retrieved from the Dimensions.ai database. Across all domains, substantial structural divergence is observed. In the lexical benchmark, macro-averaged F1 scores fall below 0.05; in the biographical benchmark, hallucination rates exceed 93%; and in the bibliometric benchmark, citation omission reaches 91.9%. Network-level comparison in the Roget reconstruction further reveals node-set Jaccard similarity of 0.028 and fabrication rates above 94%. These findings show that structural fidelity cannot be inferred from local fluency alone. The proposed stress test provides a reproducible instrument for evaluating the structural integrity of LLM-generated knowledge representations within knowledge organization and information quality research.
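The abstract's network-level comparison rests on two simple set-based metrics: node-set Jaccard similarity (how much of the reference graph's vocabulary the generated graph shares) and fabrication rate (what fraction of generated nodes have no counterpart in the reference). A minimal sketch of both, with purely illustrative node names that are not taken from the paper's data:

```python
# Hedged sketch of the node-set metrics described in the abstract.
# The concept names below are hypothetical examples, not the paper's data.

def jaccard_similarity(reference: set, generated: set) -> float:
    """|A ∩ B| / |A ∪ B| over the two graphs' node sets."""
    if not reference and not generated:
        return 1.0
    return len(reference & generated) / len(reference | generated)

def fabrication_rate(reference: set, generated: set) -> float:
    """Fraction of generated nodes absent from the reference graph."""
    if not generated:
        return 0.0
    return len(generated - reference) / len(generated)

# Illustrative reference vs. LLM-reconstructed node sets.
reference_nodes = {"existence", "relation", "quantity", "order"}
generated_nodes = {"existence", "being", "essence", "entity"}

print(round(jaccard_similarity(reference_nodes, generated_nodes), 3))  # 0.143
print(round(fabrication_rate(reference_nodes, generated_nodes), 2))    # 0.75
```

On this toy pair, only one of seven distinct nodes is shared (Jaccard ≈ 0.143) and three of four generated nodes are fabricated (0.75), mirroring in miniature the low Jaccard (0.028) and high fabrication (>94%) reported for the Roget reconstruction.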