AIDB Daily Papers
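CR4T: A Critique-and-Revise Guardrail for Safer Adolescent-Facing LLMs
Note: The title and key points here were automatically generated by AI. Please consult the original paper for the exact content.
Key Points
- This study proposes the critique-and-revise (CR4T) framework to improve LLM safety for adolescents, for whom adult-centric safety measures are insufficient.
- CR4T rewrites inappropriate or refusal-style outputs into age-appropriate, guidance-oriented responses, balancing conversational continuity with developmental sensitivity.
- Experiments show that CR4T substantially reduces unsafe outputs while avoiding excessive intervention in acceptable interactions.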
Abstract
Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.
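The control flow described in the abstract (detect risk, then rewrite only the responses that need it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword lists, the `GUIDANCE` templates, and the `detect` and `safeguard` functions are all hypothetical stand-ins. In the paper the rewriting step is domain-conditioned rewriting, which this sketch replaces with fixed templates so that it stays self-contained.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for a lightweight risk detector.
RISK_KEYWORDS = {
    "self_harm": ("hurt myself", "self-harm"),
    "substances": ("get drunk", "vape"),
}

# Hypothetical phrases that mark a refusal-style output.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")

# Hypothetical per-domain templates standing in for a domain-conditioned rewriter.
GUIDANCE = {
    "self_harm": (
        "It sounds like things are really hard right now. "
        "Talking with a trusted adult or a counselor can help."
    ),
    "substances": (
        "This can be risky, especially for teenagers. "
        "A parent, school nurse, or doctor can answer questions honestly."
    ),
    "refusal": (
        "I can't go into detail on that, but I can share general information "
        "and point you to people who can help."
    ),
}


@dataclass
class Verdict:
    domain: Optional[str]  # risk domain that matched, or None
    is_refusal: bool       # True if the response reads as a flat refusal


def detect(response: str) -> Verdict:
    """Flag a risk domain and/or a refusal by simple substring matching."""
    text = response.lower()
    domain = next(
        (d for d, kws in RISK_KEYWORDS.items() if any(k in text for k in kws)),
        None,
    )
    is_refusal = any(marker in text for marker in REFUSAL_MARKERS)
    return Verdict(domain, is_refusal)


def safeguard(response: str) -> str:
    """Rewrite unsafe or refusal-style responses; pass benign ones through."""
    verdict = detect(response)
    if verdict.domain is not None:
        return GUIDANCE[verdict.domain]
    if verdict.is_refusal:
        return GUIDANCE["refusal"]
    return response  # selective: acceptable output is left untouched
```

The final branch is the point of the sketch: a response that triggers neither check is returned unchanged, which mirrors the abstract's claim of avoiding unnecessary intervention on acceptable interactions.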