AIDB Daily Papers
Response Times Improve Alignment with Heterogeneous Preferences
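Note: The title above and the key points below were generated automatically by AI in Japanese. Please check the original paper for the exact content.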
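Key Points
- The study points out the limits of the conventional approach to aligning large language models, which aggregates pooled feedback into a single reward model.
- It shows that adding response time as a secondary signal restores the identifiability of the true population-average preference when labelers are anonymous.
- The new response-time-based estimator consistently outperforms methods that rely on choice-only data and reduces their bias.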
Abstract
Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal -- the user's response time -- can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in extreme cases where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines that otherwise fail and plateau at a bias floor. Because response times are essentially free to record and require zero user tracking or identification, our results bring promises and open up new opportunities for future data-collection pipelines to improve the social benefit without requiring user-level identifiers or repeated elicitations.
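The identifiability claim in the abstract can be illustrated with a small closed-form calculation. The sketch below is not the paper's estimator. It assumes a symmetric Drift-Diffusion Model with barriers at +A and -A, unit noise, and a start at zero, for which the choice probability is 1 / (1 + exp(-2·A·v)) and the expected decision time is (A / v)·tanh(A·v). The barrier height `A`, the drift values in `mixed`, and all function names are hypothetical, chosen only for illustration. It compares a mixed population of anonymous labelers with a homogeneous population tuned to produce the same pooled choice frequency.

```python
import math

A = 1.0  # hypothetical barrier height: a decision ends at +A or -A


def choice_prob(v, a=A):
    """P(upper barrier) for a DDM with drift v, barriers at +/-a, unit noise."""
    return 1.0 / (1.0 + math.exp(-2.0 * a * v))


def mean_rt(v, a=A):
    """Expected decision time for the same DDM (limit a**2 as v -> 0)."""
    return a * a if v == 0 else (a / v) * math.tanh(a * v)


def pooled(drifts):
    """Pooled choice probability and mean response time, one choice per labeler."""
    n = len(drifts)
    p = sum(choice_prob(v) for v in drifts) / n
    rt = sum(mean_rt(v) for v in drifts) / n
    return p, rt


# Hypothetical heterogeneous population: weak and strong preferences, equal shares.
mixed = [0.2, 2.0]
true_mean = sum(mixed) / len(mixed)
p_mixed, rt_mixed = pooled(mixed)

# Drift of a homogeneous population that matches the pooled choice frequency
# (inverting the logistic choice rule, as a choice-only fit would).
v_fit = math.log(p_mixed / (1.0 - p_mixed)) / (2.0 * A)
p_homog, rt_homog = pooled([v_fit])

print(f"true mean drift:          {true_mean:.4f}")
print(f"choice-only implied drift: {v_fit:.4f}")
print(f"pooled choice prob (mixed vs homogeneous): {p_mixed:.4f} vs {p_homog:.4f}")
print(f"mean response time (mixed vs homogeneous): {rt_mixed:.4f} vs {rt_homog:.4f}")
```

Under these assumptions the two populations produce identical choice frequencies, so choice-only data cannot tell them apart, yet their mean drifts differ. Their mean response times also differ, which is the kind of extra information the abstract says response times provide.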