AIDB Daily Papers
Can LLMs Predict Academic Collaborations? Topology Heuristics vs. LLM Link Prediction on Co-Authorship Networks
Note: the title and key points were automatically generated by AI. Please refer to the original paper for accurate details.
Key Points
- The authors test whether large language models (LLMs) can predict researcher collaborations, using link prediction on co-authorship networks from OpenAlex.
- The LLM predicts future collaborations from author profiles alone, without access to graph structure, and captures information distinct from topology.
- The LLM accurately predicts new collaborations between authors who share no common neighbors, and research concepts turn out to be the dominant signal.
Abstract
Can large language models (LLMs) predict which researchers will collaborate? We study this question through link prediction on real-world co-authorship networks from OpenAlex (9.96M authors, 108.7M edges), evaluating whether LLMs can predict future scientific collaborations using only author profiles, without access to graph structure. Using Qwen2.5-72B-Instruct across three historical eras of AI research, we find that LLMs and topology heuristics capture distinct signals and are strongest in complementary settings. On new-edge prediction under natural class imbalance, the LLM achieves AUROC 0.714--0.789, outperforming Common Neighbors, Jaccard, and Preferential Attachment, with recall up to 92.9%; under balanced evaluation, the LLM outperforms *all* topology heuristics in every era (AUROC 0.601--0.658 vs. best-heuristic 0.525--0.538); on continued edges, the LLM (0.687) is competitive with Adamic-Adar (0.684). Critically, 78.6--82.7% of new collaborations occur between authors with no common neighbor -- a blind spot where all topology heuristics score zero but the LLM still achieves AUROC 0.652 by reasoning from author metadata alone. A temporal metadata ablation reveals that research concepts are the dominant signal (removing concepts drops AUROC by 0.047--0.084). Providing pre-computed graph features to the LLM *degrades* performance due to anchoring effects, confirming that LLMs and topology methods should operate as separate, complementary channels. A socio-cultural ablation finds that name-inferred ethnicity and institutional country do not predict collaboration beyond topology, reflecting the demographic homogeneity of AI research. A node2vec baseline achieves AUROC comparable to Adamic-Adar, establishing that LLMs access a fundamentally different information channel -- author metadata -- rather than encoding the same structural signal differently.
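To make the baselines concrete, here is a minimal sketch of the topology heuristics the abstract compares against (Common Neighbors, Jaccard, Preferential Attachment, Adamic-Adar), using a toy adjacency dict with hypothetical authors rather than the paper's OpenAlex data. It also illustrates the "blind spot": any pair with no common neighbor scores zero on neighborhood-based heuristics, no matter how compatible the authors' research profiles are.

```python
import math

# Toy co-authorship graph (hypothetical authors), as an adjacency dict.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),  # newcomer with no prior collaborations
}

def common_neighbors(g, u, v):
    """Number of co-authors shared by u and v."""
    return len(g[u] & g[v])

def jaccard(g, u, v):
    """Shared co-authors normalized by the combined neighborhood size."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def preferential_attachment(g, u, v):
    """Product of degrees: prolific authors are assumed more likely to link."""
    return len(g[u]) * len(g[v])

def adamic_adar(g, u, v):
    """Sum 1/log(degree) over shared neighbors; rare collaborators weigh more."""
    return sum(1 / math.log(len(g[w])) for w in g[u] & g[v] if len(g[w]) > 1)

# A and D share neighbor C, so every heuristic gives a nonzero score.
print(common_neighbors(graph, "A", "D"))  # 1
# A and E share no neighbor: all four heuristics score 0, the blind spot
# where the paper's LLM can still rank pairs from author metadata alone.
print(common_neighbors(graph, "A", "E"))  # 0
```

In the paper's setting these scores are computed per candidate pair and ranked to produce the AUROC numbers quoted above; 78.6--82.7% of true new collaborations fall into the zero-score case.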