AIDB Daily Papers

Collective Truth-Seeking Dynamics in Large Language Model Debate: Social Reasoning in Machines

Original title: Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate

Author: Tom Pecher

Published: 2026-05-28 | Fields: LLM, AI, cs.CL, cs.AI, cs.MA, AI agents

※ The Japanese title and key points were generated automatically by AI. Consult the original paper for accurate details.

Key Points

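This work is the first to simulate the Argumentative Theory of Reasoning (ATR), a theory of human social reasoning, through multi-agent debate (MAD) among large language models (LLMs).
It shows that even when individual LLMs perform poorly on their own, a carefully designed, diverse set of models improves truth-seeking performance on question-answering tasks.
Building on an analysis of debate dynamics, it proposes a new LLM-MAD-based benchmarking method that can measure model properties such as hallucination propensity.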

Abstract

Human reasoning has long been theorised to operate socially, not through isolated individual cognition, but through collective adversarial discourse, a framework known as the Argumentative Theory of Reasoning (ATR). Rather than relying on individual "intellectualist reasoners" as the primary vehicle for truth-seeking, ATR reconceptualises truth as an emergent property of social epistemology: the product of imperfect individual reasoning refined under the adversarial pressure of debate. This distributed method of collective intelligence has guided humanity to ever-greater epistemic heights and underpins the foundational principles of all democratic systems. This thesis breaks new ground by, for the first time, simulating ATR through the multi-agent debate (MAD) of large language models (LLMs). With rigorous empirical analysis, we demonstrate that, when correctly engineering an epistemically diverse set of models, LLM-MAD can significantly improve truth-seeking performance on questionnaire-based tasks, even when individual debate participants exhibit limited standalone performance. Furthermore, we present strong empirical evidence that this performance gain is mechanistically grounded in the central principles of ATR, suggesting that collective reasoning may be universally favourable over individualist reasoning, rather than a quirk in biology or evolution. Finally, drawing on our analysis of debate dynamics, we propose a novel benchmarking methodology that leverages LLM-MAD to measure intrinsic model properties (such as hallucination propensity) in order to compare models in ways that current static benchmarking approaches cannot support.
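The abstract does not specify the debate protocol. As a rough illustration of the control flow a multi-agent debate involves (independent opening answers, rounds in which each agent sees its peers' previous answers, then aggregation), here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Agent` interface, the hypothetical `stubborn` and `conformist` stub agents (real participants would be LLM calls), and the plurality-vote aggregation are placeholders, not the thesis's method.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical agent interface: takes the question and the other agents'
# answers from the previous round (None in the opening round) and returns
# an answer. In a real system each agent would wrap an LLM call.
Agent = Callable[[str, Optional[List[str]]], str]


def run_debate(question: str, agents: List[Agent], rounds: int = 2) -> str:
    """Run a simple multi-round debate and return the plurality answer."""
    if not agents:
        raise ValueError("need at least one agent")
    # Opening round: every agent answers independently.
    answers = [agent(question, None) for agent in agents]
    for _ in range(rounds):
        new_answers = []
        for i, agent in enumerate(agents):
            # Each agent sees only its peers' answers from the previous round.
            peers = answers[:i] + answers[i + 1:]
            new_answers.append(agent(question, peers))
        answers = new_answers
    # Aggregate the final answers by plurality vote.
    return Counter(answers).most_common(1)[0][0]


def stubborn(answer: str) -> Agent:
    """Hypothetical stub: always gives the same answer, ignoring peers."""
    return lambda question, peers: answer


def conformist(initial: str) -> Agent:
    """Hypothetical stub: opens with `initial`, then adopts the peers' plurality answer."""
    def agent(question: str, peers: Optional[List[str]]) -> str:
        if not peers:  # opening round, or no peers to listen to
            return initial
        return Counter(peers).most_common(1)[0][0]
    return agent


panel = [stubborn("B"), stubborn("B"), conformist("A"), conformist("A")]
print(run_debate("Which option is correct?", panel))  # prints "B"
```

The stub agents only exercise the loop: after the first debate round both conformists switch to "B" because their peers lean that way. They are not a model of truth-seeking; the effect the abstract attributes to epistemic diversity would only appear with genuinely different LLM participants.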


Metadata

arXiv ID: 2605.30391
Categories: cs.MA, cs.AI, cs.CL
