AIDB Daily Papers

Activation Steering That Mimics LLM Instructions: Reproducing Prompt Steering

Original title: Steer Like the LLM: Activation Steering that Mimics Prompting

Authors: Geert Heyman, Frederik Vandeputte

Published: 2026-05-05 | Fields: LLM NLP Transformer Machine Learning AI cs.CL cs.AI cs.LG

Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the accurate content.

Key points

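The paper treats prompt steering as a form of activation steering and proposes models that imitate its behavior.
Existing activation steering methods do not faithfully reproduce the mechanics of prompt steering and tend to underperform it.
Experiments show that the proposed method outperforms existing activation steering methods and performs on par with or better than prompting on AxBench and persona steering.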

Abstract

Large language models can be steered at inference time through prompting or activation interventions, but activation steering methods often underperform compared to prompt-based approaches. We propose a framework that formulates prompt steering as a form of activation steering and investigates whether distilling successful prompt steering behavior into simpler, interpretable models can close this gap. Our analysis reveals that popular activation steering methods are not faithful to the mechanics of prompt steering, which applies strong interventions on some tokens while barely affecting others. Based on these insights, we introduce Prompt Steering Replacement (PSR) models that estimate token-specific steering coefficients from the activations themselves and are trained to imitate prompt-based interventions. Experiments on three steering benchmarks across multiple language models show that PSR models outperform existing activation steering methods, especially when controlling for high-coherence completions, and also compare favorably to prompting on AxBench and persona steering.
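
The contrast the abstract draws, between one fixed steering strength for every token and a coefficient estimated per token from the activation itself, can be illustrated with a small sketch. This is not the paper's PSR parameterization, which the abstract does not specify. The softplus gate, the mean-squared imitation loss, and every name below (`constant_steer`, `token_coefficients`, `token_specific_steer`, `imitation_loss`, `gate_w`, `gate_b`) are assumptions chosen for illustration only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softplus(z):
    """Numerically stable log(1 + exp(z)); always non-negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def constant_steer(activations, direction, coeff):
    """Baseline: add the same multiple of `direction` to every token."""
    return [[h_i + coeff * d_i for h_i, d_i in zip(h, direction)]
            for h in activations]


def token_coefficients(activations, gate_w, gate_b):
    """Hypothetical gate: one coefficient per token, computed from that
    token's own activation as softplus(gate_w . h + gate_b)."""
    return [softplus(dot(gate_w, h) + gate_b) for h in activations]


def token_specific_steer(activations, direction, gate_w, gate_b):
    """Add `direction` to each token, scaled by that token's coefficient."""
    coeffs = token_coefficients(activations, gate_w, gate_b)
    return [[h_i + c * d_i for h_i, d_i in zip(h, direction)]
            for h, c in zip(activations, coeffs)]


def imitation_loss(steered, target):
    """Mean squared error between steered activations and target
    activations (assumed here to be recorded with a steering prompt)."""
    total, count = 0.0, 0
    for s_row, t_row in zip(steered, target):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy data: three tokens with two-dimensional activations.
    acts = [[2.0, 0.0], [-2.0, 1.0], [0.0, -1.0]]
    direction = [1.0, 0.0]
    coeffs = token_coefficients(acts, gate_w=[3.0, 0.0], gate_b=0.0)
    print("per-token coefficients:", coeffs)
    print("token-specific:", token_specific_steer(acts, direction, [3.0, 0.0], 0.0))
    print("constant:", constant_steer(acts, direction, 2.0))
```

With these toy values the gate gives the first token a large coefficient and the second a coefficient near zero, which mirrors the abstract's observation that prompt steering intervenes strongly on some tokens while barely affecting others. In a trained model the gate parameters would be fitted by minimizing something like `imitation_loss` against prompt-steered activations; that training loop is omitted here.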


Metadata

arXiv ID: 2605.03907
Categories: cs.CL, cs.AI, cs.LG
