AIDB Daily Papers
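Network-of-Thought: A New Paradigm for Complex Reasoning Tasks
Note: The title and key points on this page were generated automatically by AI (originally in Japanese). Consult the original paper for the exact content.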
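Key Points
- Proposes Network-of-Thought (NoT), a framework that models complex reasoning as a directed graph with typed nodes and edges, guided by a heuristic-based controller policy.
- NoT supports merging intermediate results, revisiting hypotheses, and integrating evidence from multiple sources, operations that complex reasoning often requires.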
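- NoT surpasses Tree-of-Thought on multi-hop reasoning (91.0% vs. 88.0% on HotpotQA with LLM-as-Judge) and, with 72B open-source models, achieves the highest accuracy on GSM8K (91.5%). Chain-of-Thought remains effective for sequential tasks with GPT-4o-mini (89.5% on GSM8K).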
Abstract
Existing prompting paradigms structure LLM reasoning in limited topologies: Chain-of-Thought (CoT) produces linear traces, while Tree-of-Thought (ToT) performs branching search. Yet complex reasoning often requires merging intermediate results, revisiting hypotheses, and integrating evidence from multiple sources. We propose Network-of-Thought (NoT), a framework that models reasoning as a directed graph with typed nodes and edges, guided by a heuristic-based controller policy. Across four benchmarks (GSM8K, Game of 24, HotpotQA, ProofWriter) and three models (GPT-4o-mini, Llama-3.3-70B-Instruct, Qwen2.5-72B-Instruct), we investigate when network topology outperforms chain or tree structures, whether LLM-generated heuristics can guide graph-based reasoning search, and the computation-accuracy tradeoff across topologies, evaluating each method on accuracy, topology simplicity, and token efficiency. Our results show that CoT remains effective for sequential tasks with GPT-4o-mini (89.5% on GSM8K), while NoT surpasses ToT on multi-hop reasoning (91.0% vs. 88.0% on HotpotQA with LLM-as-Judge). With 72B open-source models, NoT achieves the highest accuracy on GSM8K (91.5%), and Qwen2.5-72B achieves the best multi-hop QA result overall (91.7% on HotpotQA). Self-generated controller heuristics outperform fixed and random strategies on logical reasoning, with uncertainty-only weighting achieving 57.0% on ProofWriter. We also find that evaluation methodology significantly impacts method rankings: string-match underestimates all methods on open-ended QA, with the largest gap for NoT, a pattern consistent across all three models (14–18 percentage point gap on HotpotQA).
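As a rough illustration of the idea in the abstract, the sketch below represents reasoning as a directed graph with typed nodes and edges and picks the next node to expand with a simple heuristic. It is not the paper's implementation: the node types ("evidence", "conclusion"), the edge type ("supports"), the `uncertainty` field, and the rule "expand the most uncertain open node" are all assumptions made for this example. The point it shows is structural: a conclusion node can have two parents, a merge that a chain or a tree cannot express.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    kind: str           # hypothetical node type, e.g. "evidence" or "conclusion"
    text: str
    uncertainty: float  # hypothetical score in [0, 1]; higher means less certain


@dataclass
class ThoughtGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src, dst, edge_type) triples

    def add_node(self, kind, text, uncertainty):
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, kind, text, uncertainty)
        return node_id

    def add_edge(self, src, dst, edge_type):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must already be in the graph")
        self.edges.append((src, dst, edge_type))

    def parents(self, node_id):
        # All nodes with an edge pointing into node_id, in insertion order.
        return [s for s, d, _ in self.edges if d == node_id]


def select_next(graph, open_ids):
    """Assumed controller rule: expand the open node with the highest uncertainty."""
    return max(open_ids, key=lambda i: graph.nodes[i].uncertainty)


# Two pieces of evidence merge into a single conclusion node.
g = ThoughtGraph()
a = g.add_node("evidence", "Fact found in document A", 0.2)
b = g.add_node("evidence", "Fact found in document B", 0.6)
c = g.add_node("conclusion", "Answer that combines A and B", 0.4)
g.add_edge(a, c, "supports")
g.add_edge(b, c, "supports")

merged_from = g.parents(c)            # the conclusion has two parents
next_id = select_next(g, [a, b, c])   # node b has the highest uncertainty
```

In a real system the uncertainty scores and the selection rule would come from the model or a learned heuristic. Here they are fixed by hand so the control flow stays visible.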