AIDB Daily Papers

Automated, Scalable Development of Open Scientific Databases Using Large Language Models

Original title: Leveraging Large Language Models for Automated Scalable Development of Open Scientific Databases

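Authors: Nikita Gautam, Doina Caragea, Ignacio Ciampitti, Federico Gomez

Published: 2026-03-07 | Fields: LLM, NLP, dataset search, information retrieval, scientific text, automation, API

Note: The translated title and the key points were generated automatically by AI. Refer to the original paper for the exact content.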

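Key Points

The authors developed an automated framework that uses large language models (LLMs) and integrates keyword-based search, API-based data retrieval, and text classification.
The framework matters because it addresses the problems of manual data collection and allows reliable, domain-specific scientific databases to be built efficiently.
On tasks related to agriculture and crop yield, the results showed 90% overlap with small expert-curated databases, indicating that manual workload can be reduced significantly.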

Abstract

With the exponential increase in online scientific literature, identifying reliable domain-specific data has become increasingly important but also very challenging. Manual data collection and filtering for domain-specific scientific literature is not only time-consuming but also labor-intensive and prone to errors and inconsistencies. To facilitate automated data collection, the paper introduces a web-based tool that leverages Large Language Models (LLMs) for automated and scalable development of open scientific databases. More specifically, the tool is based on an automated and unified framework that combines keyword-based querying, API-enabled data retrieval, and LLM-powered text classification to construct domain-specific scientific databases. Data is collected from multiple reliable data sources and search engines using a parallel querying technique to construct a combined unified dataset. The dataset is subsequently filtered using LLMs queried with prompts tailored for each keyword-based query to extract the relevant data to a scientific query of interest. The approach was tested across a set of variable keyword-based searches for different domain-specific tasks related to agriculture and crop yield. The results and analysis show 90% overlap with small domain expert-curated databases, suggesting that the proposed tool can be used to significantly reduce manual workload. Furthermore, the proposed framework is both scalable and domain-agnostic and can be applied across diverse fields for building scalable open scientific databases.
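
The pipeline described in the abstract (parallel querying of several sources, merging into one dataset, LLM-based filtering, and comparison against an expert-curated set) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the source functions, the prompt wording, the `fake_llm` stand-in, and the way overlap is computed (the fraction of curated titles recovered) are all assumptions made for the example. A real system would replace the stubs with calls to literature APIs and to an actual LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]                 # assumed shape: {"title": ..., "abstract": ...}
Source = Callable[[str], List[Record]]  # hypothetical: keyword query -> records
AskLLM = Callable[[str], str]           # hypothetical: prompt -> "yes"/"no" style answer


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so the same paper from two sources matches."""
    return " ".join(title.lower().split())


def collect(query: str, sources: Iterable[Source]) -> List[Record]:
    """Query every source in parallel and merge results, dropping duplicate titles."""
    sources = list(sources)
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        batches = list(pool.map(lambda src: src(query), sources))
    merged: Dict[str, Record] = {}
    for batch in batches:
        for record in batch:
            merged.setdefault(normalize_title(record["title"]), record)
    return list(merged.values())


def build_prompt(query: str, record: Record) -> str:
    """Build a query-specific relevance prompt (wording is an assumption)."""
    return (
        f"Is the following paper relevant to the query '{query}'? Answer yes or no.\n"
        f"Title: {record['title']}\n"
        f"Abstract: {record['abstract']}"
    )


def filter_relevant(query: str, records: List[Record], ask_llm: AskLLM) -> List[Record]:
    """Keep only the records for which the model's answer starts with 'yes'."""
    return [
        r for r in records
        if ask_llm(build_prompt(query, r)).strip().lower().startswith("yes")
    ]


def overlap(selected: List[Record], curated_titles: Iterable[str]) -> float:
    """Fraction of expert-curated titles that also appear among the selected records."""
    curated = {normalize_title(t) for t in curated_titles}
    if not curated:
        return 0.0
    found = {normalize_title(r["title"]) for r in selected}
    return len(curated & found) / len(curated)


# --- Stubs standing in for real APIs and a real LLM (illustrative data only) ---
def source_a(query: str) -> List[Record]:
    return [
        {"title": "Maize yield response to nitrogen", "abstract": "Field trials of maize yield."},
        {"title": "Graph neural networks survey", "abstract": "A survey of GNN methods."},
    ]


def source_b(query: str) -> List[Record]:
    return [
        {"title": "Maize Yield Response to  Nitrogen", "abstract": "Field trials of maize yield."},
        {"title": "Soybean yield under drought", "abstract": "Drought effects on soybean yield."},
    ]


def fake_llm(prompt: str) -> str:
    # Crude keyword check on the abstract part of the prompt, in place of a model call.
    abstract = prompt.split("Abstract:", 1)[1]
    return "yes" if "yield" in abstract.lower() else "no"


records = collect("crop yield", [source_a, source_b])
selected = filter_relevant("crop yield", records, fake_llm)
score = overlap(selected, ["Maize yield response to nitrogen", "Soybean yield under drought"])
print(len(records), len(selected), score)
```

Passing the sources and the model call in as plain functions keeps the sketch domain-agnostic: changing the field of study only means changing the keyword query, the source list, and the prompt text.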

Metadata

arXiv ID: 2603.07050
Category: cs.IR
