AIDB Daily Papers
Thinking Boundary: Quantifying Reasoning Suitability for Multimodal Tasks via Dual Tuning
※ The title and key points are AI-generated. Please refer to the original paper for accurate details.
Key Points
- While the reasoning ability of large language models continues to improve, their effectiveness on multimodal tasks remains unclear; this work addresses that question.
- The authors propose Dual Tuning, which leverages paired Chain-of-Thought and Direct-Answer data, and establish the "Thinking Boundary" as a criterion for assessing which tasks benefit from reasoning.
- Validation across diverse tasks, including spatial, mathematical, and multi-disciplinary domains, suggests that reasoning is not always beneficial, providing guidance for data selection and training strategies.
Abstract
While reasoning-enhanced Large Language Models (LLMs) have demonstrated remarkable advances in complex tasks such as mathematics and coding, their effectiveness across universal multimodal scenarios remains uncertain. The trend of releasing parallel "Instruct" and "Thinking" models by leading developers serves merely as a resource-intensive workaround, stemming from the lack of a criterion for determining when reasoning is truly beneficial. In this paper, we propose Dual Tuning, a framework designed to assess whether reasoning yields positive gains for target tasks under given base models and datasets. By jointly fine-tuning on paired Chain-of-Thought (CoT) and Direct-Answer (DA) data under controlled prompts, we systematically quantify and compare the gains of both training modes using the proposed metrics, and establish the "Thinking Boundary" to evaluate the suitability of reasoning training across diverse multimodal tasks, including spatial, mathematical, and multi-disciplinary domains. We further explore the impact of reinforcement training and thinking patterns on reasoning suitability, and validate whether the "Thinking Boundary" can guide data refinement. Our findings challenge the "reasoning-for-all" paradigm, providing practical guidance for identifying appropriate data and training strategies, and motivating the development of resource-efficient, adaptive auto-think systems.
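The abstract describes comparing the gains of CoT and DA fine-tuning to judge whether reasoning training suits a given task. Below is a minimal sketch of that comparison, assuming accuracy-based gains over a shared base model; the paper's actual metrics, thresholds, and task names are not given in the abstract, so every name and number here is illustrative.

```python
# Minimal sketch of the gain-comparison idea behind the "Thinking Boundary".
# Assumptions (not from the paper): gains are measured as accuracy deltas over
# the shared base model, and a task is judged "reasoning-suitable" when the
# Chain-of-Thought (CoT) gain exceeds the Direct-Answer (DA) gain by a margin.

from dataclasses import dataclass


@dataclass
class TaskResult:
    name: str
    base_acc: float   # accuracy of the untuned base model
    cot_acc: float    # accuracy after fine-tuning on Chain-of-Thought data
    da_acc: float     # accuracy after fine-tuning on Direct-Answer data


def reasoning_suitable(result: TaskResult, margin: float = 0.0) -> bool:
    """Return True if CoT training yields a larger gain than DA training."""
    cot_gain = result.cot_acc - result.base_acc
    da_gain = result.da_acc - result.base_acc
    return cot_gain - da_gain > margin


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    tasks = [
        TaskResult("spatial", base_acc=0.42, cot_acc=0.47, da_acc=0.51),
        TaskResult("math", base_acc=0.35, cot_acc=0.58, da_acc=0.44),
    ]
    for t in tasks:
        verdict = "within" if reasoning_suitable(t) else "outside"
        print(f"{t.name}: {verdict} the thinking boundary")
```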