AIDB Daily Papers
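Trust Between AI Agents: Measuring Formation, Breakage, and Recovery, and Implications for the Governance of Multi-Agent Systems
Note: This title and the key points were generated automatically by AI. Please refer to the original paper for the exact content.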
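Key Points
- Proposes a behavior-based measure of trust between AI agents, built on costly verification, and demonstrates it in a cooperative game.
- An analysis of six frontier model snapshots shows that models differ in how they form, break, and recover trust.
- Trust formation lowers verification costs and speeds up decisions; for governance, the authors conclude that calibrating trust appropriately matters more than excessive distrust.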
Abstract
As language-model agents increasingly work in teams, each agent must decide how much to trust its teammates. Yet we lack a standard way to measure trust between AI agents. We propose a behavioral measure based on costly verification. In a cooperative survival game, checking a teammate's work consumes resources, while trusting a wrong answer can be fatal. Relative to a memoryless version of the same model, reduced verification provides an observable measure of trust. Using this framework, we study trust formation, breakage, and recovery across six frontier model snapshots. When paired with a consistently reliable teammate, four snapshots (Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.1, and Gemini 3.1 Pro) reduce verification by roughly 60-85%, whereas two smaller snapshots show little or no such adjustment. Failures reverse this discount, but models differ in how they respond. Some concentrate renewed scrutiny on the culprit, while others become more cautious toward the entire team. Recovery is slower than formation, and clustered failures sustain suspicion far longer than the same number of failures spread apart. These differences have practical consequences. Models that form trust verify less, decide more quickly, and achieve higher payoffs in our environment. By contrast, persistent over-verification is associated with indecision rather than safety. Our results show that trust dispositions can be measured before deployment and suggest that calibration, rather than maximal suspicion, should be the central concern in the governance of multi-agent AI systems.
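The abstract describes trust as reduced verification relative to a memoryless version of the same model. As a rough illustration only, the sketch below computes such a relative reduction from logged check decisions. The abstract does not give the paper's exact formula, so the definition used here (one minus the ratio of the two verification rates) and the names `verification_rate` and `trust_discount` are assumptions chosen to match the "reduce verification by roughly 60-85%" phrasing.

```python
from typing import Sequence


def verification_rate(checks: Sequence[bool]) -> float:
    """Fraction of rounds in which the agent chose to verify a teammate.

    `checks` is a hypothetical log: True means the agent paid to verify.
    """
    if not checks:
        raise ValueError("need at least one round")
    return sum(checks) / len(checks)


def trust_discount(baseline_rate: float, observed_rate: float) -> float:
    """Relative reduction in verification versus a memoryless baseline.

    Assumed definition (not taken from the paper): 1 - observed / baseline.
    A value of 0 means no adjustment; 0.75 means 75% fewer checks.
    """
    if not 0.0 < baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be in (0, 1]")
    if not 0.0 <= observed_rate <= 1.0:
        raise ValueError("observed_rate must be in [0, 1]")
    return 1.0 - observed_rate / baseline_rate


# Illustrative, made-up logs: the memoryless agent verifies in 8 of 10
# rounds, the agent with memory of a reliable teammate in 2 of 10.
memoryless_log = [True] * 8 + [False] * 2
with_memory_log = [True] * 2 + [False] * 8

discount = trust_discount(
    verification_rate(memoryless_log),
    verification_rate(with_memory_log),
)
print(f"trust discount: {discount:.2f}")
```

Under this assumed definition, a negative value would indicate that the agent verifies more than its memoryless baseline, which is one way to represent the renewed scrutiny the abstract reports after failures.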