AIDB Daily Papers

The Barrier Facing AI Therapists: The Limits of Content Moderation in Emotional Support

Original title: AI Content Moderation in Therapy Conversations

Authors: Jiwon Kim, Claire Wang, Taeung Yoon, Sabelle Huang, Koustuv Saha

Published: 2026-05-25 | Fields: LLM, AI, cs.AI, cs.HC, AI safety, AI support

Note: The Japanese title and key points were generated automatically by AI. Please consult the original paper for accurate details.

Key Points

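To assess how well LLMs can serve as AI therapists, this study audits three state-of-the-art content moderation systems.
The safety guardrails built into LLMs keep them from engaging with sensitive topics, which may limit their capacity to act as therapists.
The audit suggests that these systems may flag content from real therapy sessions as inappropriate.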

Abstract

Large language models (LLMs) are increasingly being used for emotional support. They are also being developed for formal therapy purposes. However, LLMs like ChatGPT or Llama are often developed with content moderation guardrails that prevent them from discussing sensitive subjects with users for both liability and safety purposes, and this inability to broach these subjects may affect their capacity as therapists. In this study, we perform an algorithm audit on three state-of-the-art moderation systems (OpenAI's moderation endpoint, Meta's Llama Guard, and Google's ShieldGemma) to investigate the extent to which these systems flag the content of real-life therapy sessions as undesirable. Our results raise implications for the limitations that users and organizations may encounter when designing LLMs to play the part of a therapist.
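The audit described above comes down to measuring how often each moderation system flags therapy content. The sketch below illustrates that flag-rate calculation under loudly stated assumptions. The names `flag_rates` and `keyword_moderator` are hypothetical. The keyword stubs stand in for the real systems (OpenAI's moderation endpoint, Llama Guard, ShieldGemma), whose APIs are not reproduced here. The utterances are invented, not data from the paper. This is not the authors' method, which the abstract does not detail.

```python
from typing import Callable, Dict, List

# A "moderator" maps one utterance to True (flagged) or False (allowed).
# Real systems would be wrapped behind this signature; the keyword stub
# below is a hypothetical stand-in and does not imitate any of them.
Moderator = Callable[[str], bool]


def keyword_moderator(terms: List[str]) -> Moderator:
    """Hypothetical stub: flag an utterance if it contains any listed term."""
    lowered = [t.lower() for t in terms]

    def moderate(text: str) -> bool:
        text = text.lower()
        return any(t in text for t in lowered)

    return moderate


def flag_rates(utterances: List[str], moderators: Dict[str, Moderator]) -> Dict[str, float]:
    """Return, for each moderator, the fraction of utterances it flags."""
    if not utterances:
        raise ValueError("need at least one utterance to audit")
    return {
        name: sum(moderate(u) for u in utterances) / len(utterances)
        for name, moderate in moderators.items()
    }


if __name__ == "__main__":
    # Toy, invented utterances; not data from the paper.
    session = [
        "I haven't been sleeping well since the accident.",
        "Sometimes I think about hurting myself.",
        "My therapist suggested I keep a journal.",
        "I was so angry I wanted to scream.",
    ]
    moderators = {
        "strict_stub": keyword_moderator(["hurt", "angry", "accident"]),
        "lenient_stub": keyword_moderator(["hurting myself"]),
    }
    print(flag_rates(session, moderators))  # {'strict_stub': 0.75, 'lenient_stub': 0.25}
```

Comparing rates across systems on the same transcripts is what exposes differences in strictness. A per-utterance breakdown (which lines each system flags) would be the natural next step for qualitative analysis.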

Metadata

arXiv ID: 2605.25454
Categories: cs.HC, cs.AI, cs.CL, cs.CY, cs.SI