AIDB Daily Papers
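Can Irrelevant Instructions Steer Large Language Models?
Note: The title and key points (originally in Japanese) were generated automatically by AI. Please refer to the original paper for accurate content.
Key points
- This study examines whether instructions unrelated to the task (spurious prompts) can steer the behavior of large language models.
- Spurious prompts matter because they may improve performance more than task-relevant instructions do, or may trigger unintended behaviors.
- Spurious prompts were found to improve model performance, steer models toward particular answer options, and elicit incorrect answers.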
Abstract
Large language models are highly sensitive to prompts, but this sensitivity is usually studied through task-relevant instructions, demonstrations, or reasoning cues. In this paper, we study a different form of prompt sensitivity: whether prompts that are semantically unrelated to the task can nevertheless steer model behavior. We call them spurious prompts and show their surprising efficacy. We also propose a simple black-box search procedure for discovering them. Across reasoning and question-answering benchmarks, using models ranging from 0.8B to 27B parameters and spanning three model families, we show that spurious prompts can improve performance, often matching or outperforming standard prompting baselines and task-aware prompt optimization. We further show that they can steer models toward unintended behaviors, such as repeatedly selecting the first answer option, producing incorrect answers, or returning an even, prime, or small number, all without the model being explicitly instructed to do so. These findings reveal a new kind of prompt sensitivity: LLMs can be systematically steered by prompts that are unrelated to the task they are asked to solve. Our code is available at https://github.com/Batorskq/spurious
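The abstract mentions a black-box search for spurious prompts but does not describe it. As a rough illustration of what "black-box" can mean here, the sketch below uses plain random search: it samples task-unrelated word sequences, prepends each to every question, and keeps the prefix with the highest accuracy, using only the model's outputs. This is a stand-in technique, not the authors' procedure. All names (`accuracy`, `search_spurious_prompt`, `toy_model`, the vocabulary, and the dataset) are hypothetical, and the toy model is rigged to respond to one arbitrary word purely so the example runs without a real LLM.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a "model" is any callable from prompt text to answer text.
Model = Callable[[str], str]
Dataset = Sequence[Tuple[str, str]]  # (question, gold answer) pairs


def accuracy(model: Model, prefix: str, dataset: Dataset) -> float:
    """Fraction of questions answered correctly when `prefix` is prepended."""
    correct = sum(model(f"{prefix}\n{question}") == answer
                  for question, answer in dataset)
    return correct / len(dataset)


def search_spurious_prompt(model: Model, dataset: Dataset, vocab: List[str],
                           length: int = 3, iterations: int = 50,
                           seed: int = 0) -> Tuple[str, float]:
    """Random search over task-unrelated prefixes (assumed stand-in method).

    Only model outputs are used, so no gradients or weights are needed.
    """
    rng = random.Random(seed)
    best_prefix = ""
    best_score = accuracy(model, best_prefix, dataset)  # no-prefix baseline
    for _ in range(iterations):
        candidate = " ".join(rng.choice(vocab) for _ in range(length))
        score = accuracy(model, candidate, dataset)
        if score > best_score:
            best_prefix, best_score = candidate, score
    return best_prefix, best_score


def toy_model(prompt: str) -> str:
    """Toy stand-in for an LLM: adds correctly only if 'teapot' is in the prompt."""
    question = prompt.splitlines()[-1]
    if "teapot" in prompt:
        left, right = question.rstrip("=?").split("+")
        return str(int(left) + int(right))
    return "0"


toy_dataset = [("2+2=?", "4"), ("3+5=?", "8")]
toy_vocab = ["teapot", "violet", "harbor", "seven"]  # words unrelated to arithmetic

prefix, score = search_spurious_prompt(toy_model, toy_dataset, toy_vocab)
print(f"best prefix: {prefix!r}, accuracy: {score:.2f}")
```

In a real setting, `toy_model` would be replaced by a call to an actual LLM, and the selected prefix would need to be evaluated on held-out questions, since a prefix chosen on a small set may simply overfit to it.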