AIDB Daily Papers

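CR4T: Critique-and-Revise Guardrails for Improving Adolescent LLM Safety

Original title: CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

Authors: Heajun An, Qi Zhang, Vedanth Achanta, Jin-Hee Cho

Published: 2026-05-20 | Fields: LLM, cs.CL, cs.AI, cs.CY, AI Safety, AI Assistance

Note: The Japanese title and key points were generated automatically by AI. Consult the original paper for the exact content.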

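Key Points

This study proposes the Critique-and-Revise-for-Teenagers (CR4T) framework to improve LLM safety for adolescents, for whom adult-centric safety measures fall short.
CR4T rewrites inappropriate or refusal-style outputs into age-appropriate, guidance-oriented responses, preserving conversational continuity while accounting for developmental needs.
In experiments, CR4T substantially reduced unsafe outputs while avoiding excessive intervention in acceptable interactions.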

Abstract

Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.
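
The control flow the abstract describes (detect risk, then rewrite only when needed) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword detector, the domain names, the refusal markers, and the `stub_rewrite` function are all hypothetical stand-ins, and a real system would presumably use a trained classifier and an LLM-based rewriter in their place.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword lists; a real detector would be a trained classifier.
RISK_KEYWORDS = {
    "dieting": ("skip meals",),
    "substances": ("vape",),
}
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm sorry, but")


@dataclass
class Verdict:
    label: str                    # "acceptable", "unsafe", or "refusal"
    domain: Optional[str] = None  # risk domain, set only for unsafe outputs


def detect(response: str) -> Verdict:
    """Toy stand-in for lightweight risk detection."""
    lowered = response.lower()
    for domain, keywords in RISK_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return Verdict("unsafe", domain)
    if any(m in lowered for m in REFUSAL_MARKERS):
        return Verdict("refusal")
    return Verdict("acceptable")


def guard(response: str, rewrite: Callable[[str, Verdict], str]) -> str:
    """Pass acceptable responses through untouched; rewrite the rest."""
    verdict = detect(response)
    if verdict.label == "acceptable":
        return response
    return rewrite(response, verdict)


def stub_rewrite(response: str, verdict: Verdict) -> str:
    """Placeholder rewriter; assumed to be an LLM call in practice."""
    domain = verdict.domain or "general"
    return f"[rewritten:{verdict.label}:{domain}] Here is some guidance instead."
```

The point of the sketch is the selectivity: acceptable responses are returned unchanged, so the rewriter only runs on outputs flagged as unsafe or refusal-style, and the detected domain is passed along so the rewrite can be conditioned on it.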


Metadata

arXiv ID: 2605.21609
Categories: cs.CL, cs.AI, cs.CY
