AIDB Daily Papers

LLMs in Social Services: How Does Chatbot Accuracy Affect Human Accuracy?

Original title: LLMs in social services: How does chatbot accuracy affect human accuracy?

Authors: Jennah Gosciak, Eric Giannella, Zhaowen Guo, Michael Chen, Allison Koenecke

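Published: 2026-03-11 | Tags: LLM, NLP, benchmark, human-AI dialogue, evaluation, information, psychology, social behavior, question answering, welfare, accuracy, experiment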

Note: The translated title and the key points below are generated automatically by AI. Consult the original paper for accurate details.

Key Points

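The authors built a question-answering benchmark on complex social service programs such as SNAP (food stamps) and measured how LLM chatbot suggestions affect human accuracy.
In an experiment with caseworkers from nonprofit organizations, incorrect chatbot suggestions lowered caseworker accuracy, while high-accuracy chatbots (96-100%) raised it by 27 percentage points.
Gains in caseworker accuracy level off as chatbot accuracy rises, a phenomenon the authors call the "AI underreliance plateau," which raises concerns for real-world deployment.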

Abstract

Social service programs like the Supplemental Nutrition Assistance Program (SNAP, or food stamps) have eligibility rules that can be challenging to understand. For nonprofit caseworkers who often support clients in navigating a dozen or more complex programs, LLM-based chatbots may offer a means to provide better, faster help to clients whose situations may be less common. In this paper, we measure the potential effects of LLM-based chatbot suggestions on caseworkers' ability to provide accurate guidance. We first created a 770-question multiple-choice benchmark dataset of difficult, but realistic questions that a caseworker might receive. Next, using these benchmark questions and corresponding expert-verified answers, we conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles. Caseworkers in the control condition did not see chatbot suggestions and had a mean accuracy of 49%. Caseworkers in the treatment condition saw chatbot suggestions that we artificially varied to range in aggregate accuracy from low (53%) to high (100%). Caseworker performance significantly improves as chatbot quality improves: high-quality chatbots (96-100% accurate) improved caseworker accuracy by 27 percentage points. At the question level, incorrect chatbot suggestions substantially reduce caseworker accuracy, with a two-thirds reduction on easy questions where the control group performed best (without chatbot suggestions). Finally, improvements in caseworker accuracy level off as chatbot accuracy increases, a phenomenon that we call the "AI underreliance plateau," which is a concern for real-world deployment and highlights the importance of evaluating human-in-the-loop tools with their users.
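
To make the size of the reported effect concrete, the sketch below separates a gain in percentage points from a relative gain. The helpers `accuracy`, `pp_change`, and `relative_change` are hypothetical illustrations, not code from the paper. The 76% treated figure is not reported in the abstract; it is only implied by adding the 27-point gain to the 49% control mean, assuming both refer to the same question set.

```python
# Hypothetical helpers (not from the paper) for comparing accuracy between
# experimental conditions in percentage points versus relative percent.

def accuracy(responses: list[str], answer_key: list[str]) -> float:
    """Fraction of multiple-choice responses that match the answer key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(r == a for r, a in zip(responses, answer_key))
    return correct / len(answer_key)


def pp_change(baseline: float, treated: float) -> float:
    """Absolute change in percentage points (accuracies given as fractions)."""
    return (treated - baseline) * 100


def relative_change(baseline: float, treated: float) -> float:
    """Change relative to the baseline accuracy, in percent."""
    return (treated - baseline) / baseline * 100


# Toy answers for four questions (made up for illustration): 3 of 4 correct.
print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75

# From the abstract: 49% control accuracy and a 27-point gain with
# high-quality chatbots. The 76% treated value is implied, not reported.
control = 0.49
high_quality = control + 0.27

print(round(pp_change(control, high_quality), 1))        # 27.0
print(round(relative_change(control, high_quality), 1))  # 55.1
```

In other words, a 27-percentage-point gain over a 49% baseline corresponds to roughly a 55% relative improvement, which is why reports of accuracy effects should state which of the two measures they use.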

Metadata

arXiv ID: 2603.11213
Categories: cs.HC, cs.CY
