AIDB Daily Papers
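An LLM-Assisted Target Selection Method for Large-Scale Windows Vulnerability Research
* The title and key points below were generated automatically by AI (originally in Japanese). Please check the original paper for the precise content.
Key points
- This work develops a method for efficiently selecting, from the vast set of functions in the Windows OS, those worth targeting in vulnerability research.
- In existing approaches, identifying what to analyze was the bottleneck. This work enables LLM-based prioritization by recovering symbol information and attaching structural features to each function.
- The resulting pipeline narrows more than 7 million functions down to about 22,000 candidate functions, providing a foundation for efficient vulnerability research.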
Abstract
The attack surface of a modern operating system is a haystack: thousands of signed binaries and millions of functions, almost none relevant to any given vulnerability. A human analyst or an LLM agent must pick the function worth reading before analyzing it. At whole-OS scope, this target selection, not the analysis, is the binding constraint. We present Symbolicate-Enrich-Sample, a low-cost batch pipeline that turns a corpus of production Windows binaries into a queryable, priority-ranked research queue. We (i) recover function-level symbols for stripped vendor binaries by auto-fetching the public symbol files and joining them to a recovered call graph; (ii) attach cheap, deterministic structural features to each named function and, conditioned on those features, use a low-cost language model to assign a reachability tier, a risk level, a bug-class hypothesis, and a rationale; and (iii) draw diverse, prioritized batches via a priority-weighted importance sampler. The contribution is a selection substrate: the prioritization layer a downstream detector or LLM agent runs on top of. Across a whole Windows image of 7,231,419 functions, the labels are markedly selective, and stacking deterministic filters on them leaves a ~22K-function shortlist: the candidate needles, few enough for a human or agent to work through. We characterize the pipeline's selectivity and its failure modes, describe the methodology, and report aggregate statistics; we withhold the derived dataset for legal and dual-use reasons.
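Step (iii) of the abstract draws prioritized batches with a priority-weighted sampler. The abstract does not specify how that sampler works, so the following is only a minimal sketch of one standard technique, weighted sampling without replacement using Efraimidis-Spirakis keys, with a hypothetical per-bug-class cap standing in for diversity. The record fields, the risk weights, and the priority formula are all assumptions made for illustration, not the authors' definitions.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRecord:
    """Hypothetical labelled function; field names are illustrative only."""
    name: str
    reachability_tier: int  # assumed: 0 = most directly reachable
    risk: str               # assumed labels: "low", "medium", "high"
    bug_class: str          # assumed free-form bug-class hypothesis


# Assumed mapping from risk label to a numeric weight.
RISK_WEIGHT = {"low": 1.0, "medium": 4.0, "high": 16.0}


def priority(rec: FunctionRecord) -> float:
    """Assumed priority: higher risk and lower tier give a larger weight."""
    return RISK_WEIGHT[rec.risk] / (1 + rec.reachability_tier)


def sample_batch(records, k, max_per_class=None, seed=0):
    """Draw up to k distinct records, favouring high-priority ones.

    Each record gets the key u ** (1 / w) with u uniform in [0, 1) and
    w its priority; taking the largest keys is a weighted sample without
    replacement. The optional cap limits how many records of one bug
    class may enter a batch.
    """
    rng = random.Random(seed)
    keyed = []
    for rec in records:
        w = priority(rec)
        if w <= 0:
            continue  # zero-priority records are never sampled
        keyed.append((rng.random() ** (1.0 / w), rec))
    keyed.sort(key=lambda pair: pair[0], reverse=True)

    batch, per_class = [], Counter()
    for _, rec in keyed:
        if max_per_class is not None and per_class[rec.bug_class] >= max_per_class:
            continue  # skip to keep the batch diverse across bug classes
        batch.append(rec)
        per_class[rec.bug_class] += 1
        if len(batch) == k:
            break
    return batch
```

Sampling rather than taking a strict top-k means lower-priority functions still appear occasionally, which guards against systematic blind spots in the labels. Fixing the seed makes a batch reproducible.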