AIDB Daily Papers

"Ignore All Instructions": Jailbreaking as Non-Violent Peacebuilding to Counter LLM Social Media Bots

Original title: Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots

Authors: Huw Day, Adrianna Jezierska, Jessica Woodgate

Published: 2026-03-02 | Fields: LLM, NLP, Safety, Human, Security, AI, Politics, Information Extraction, Dialogue

Note: The Japanese title and key points on this page were generated automatically by AI. Consult the original paper for the exact content.

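Key Points

LLMs have intensified the scale and strategic manipulation of political discourse on social media, contributing to conflict escalation.
Existing countermeasures are mostly platform-led moderation; this paper instead proposes user-led jailbreaking as a non-violent de-escalation practice.
Users engage with suspected LLM-powered accounts to circumvent the models' safeguards, exposing automated behaviour and disrupting the spread of misleading narratives.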

Abstract

Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation. The existing literature largely focuses on platform-led moderation as a countermeasure. In this paper, we propose a user-centric view of "jailbreaking" as an emergent, non-violent de-escalation practice. Online users engage with suspected LLM-powered accounts to circumvent large language model safeguards, exposing automated behaviour and disrupting the circulation of misleading narratives.

Metadata

arXiv ID: 2603.01942
Categories: cs.HC, cs.AI
