AIDB Daily Papers

A Large-Scale, Publicly Released Dataset of AI Coding Tool Configurations

Original title: A Dataset of Agentic AI Coding Tool Configurations

Authors: Matthias Galster, Seyedmoein Mohsenimofidi, Levi Böhme, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, Sebastian Baltes

Published: 2026-05-08 | Topics: LLM, Datasets, AI, GitHub, Software Engineering, cs.SE, AI Agents

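Note: The Japanese title and key points on this page were generated automatically by AI. Please consult the original paper for the exact content.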

Key Points

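The authors built a large-scale, curated dataset of configuration artifacts for AI coding tools.
It is the first comprehensive dataset on the configuration mechanisms of AI coding tools, which no existing dataset covers.
The dataset supports research on AI-based code generation and analysis of AI tool adoption patterns.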

Abstract

Agentic AI coding tools such as Claude Code and OpenAI Codex execute multi-step coding tasks with limited human oversight. To steer these tools, developers create repository-level configuration artifacts (e.g., Markdown files) for configuration mechanisms such as Context Files, Skills, Rules, and Hooks. There is no curated dataset yet that captures these configurations at scale. This dataset, collected from open-source GitHub repositories, fills that gap. We selected 40,585 actively maintained repositories through metadata filtering, classified them using GPT-5.2 to identify 36,710 as belonging to engineered software projects, and systematically detected configuration artifacts in these repositories. The dataset covers 4,738 repositories across five tools (Claude Code, GitHub Copilot, OpenAI Codex, Cursor, Gemini) and eight configuration mechanisms. We collected 15,591 configuration artifacts, the full content of 18,167 configuration files associated with these configuration artifacts, and 148,519 AI-co-authored commits. The dataset and the construction pipeline are publicly available on Zenodo under CC BY 4.0. An interactive website allows researchers to browse and explore the data. This data supports research on context engineering, AI tool adoption patterns, and human-AI collaboration.
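
The artifact-detection step described in the abstract can be pictured as matching repository file paths against per-tool naming conventions. The sketch below is only an illustration under stated assumptions: the `RULES` table and the `detect_artifacts` helper are hypothetical, the file names are commonly used conventions for these tools, and none of it is taken from the paper's actual pipeline, which may use different or additional rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical (tool, mechanism, pattern) rules for illustration only.
# These are NOT the paper's detection rules.
RULES = [
    ("Claude Code", "Context File", "CLAUDE.md"),
    ("OpenAI Codex", "Context File", "AGENTS.md"),
    ("Gemini", "Context File", "GEMINI.md"),
    ("GitHub Copilot", "Context File", ".github/copilot-instructions.md"),
    ("Cursor", "Rules", ".cursor/rules/*.mdc"),
    ("Claude Code", "Skills", ".claude/skills/*/SKILL.md"),
]


def detect_artifacts(paths):
    """Return (tool, mechanism, path) for every path matching a rule.

    Patterns without a slash are compared against the file name only, so
    they match at any depth. Patterns with a slash are compared against
    the whole repository-relative path (note that `*` also matches `/`).
    """
    found = []
    for path in paths:
        name = PurePosixPath(path).name
        for tool, mechanism, pattern in RULES:
            target = path if "/" in pattern else name
            if fnmatchcase(target, pattern):
                found.append((tool, mechanism, path))
    return found


# Example repository listing (made-up paths).
repo_files = [
    "README.md",
    "CLAUDE.md",
    "docs/AGENTS.md",
    ".cursor/rules/style.mdc",
    "src/main.py",
]
artifacts = detect_artifacts(repo_files)
for tool, mechanism, path in artifacts:
    print(f"{tool:15} {mechanism:13} {path}")
```

Keeping the rules in a flat table makes it easy to count artifacts per tool or per mechanism afterwards, which is the kind of aggregation the reported figures imply. A real pipeline would also need to handle case variants, legacy file names, and settings files that hold Hooks.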


Metadata

arXiv ID: 2605.08435
Category: cs.SE
