AIDB Daily Papers

敵対的コンセプト探索：特徴幾何学からの合成エラー予測

原題: Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry

著者: Jennifer Meng Lu, Ruochen Zhang, Isabelle Lee, David Alvarez-Melis, Ellie Pavlick, Naomi Saphra

公開日: 2026-06-11 | 分野: LLM 機械学習自然言語処理 cs.AI AI評価

※ 日本語タイトル・ポイントはAIによる自動生成です。正確な内容は原論文をご確認ください。

ポイント

LLMの内部表現の幾何学的構造を利用して、モデルが失敗する概念の組み合わせを予測する手法を提案した。
モデルの内部表現における特徴間の干渉が、合成エラーの主な原因であることを発見した点が新しい。
この手法により、具体的な入力評価なしに、モデルの失敗モードを高い精度で予測できることを示した。

Abstract

Humans cannot always intuit what scenarios are most challenging to LLMs. Hoping to capture challenging edge cases, developers either design problems to be difficult for humans or curate extensive benchmarks. What if we could instead anticipate which scenarios a model will fail on? In this paper, we use an LLM's representational geometry to predict which concept combinations it will fail on. We attribute this compositional failure to interference between salient features. In tasks that require systematic composition - toy programmatic settings, multihop reasoning, multilingual factual recall - we find that when a pair of concepts is encoded near-orthogonally, the model reliably composes them. When their linear encodings are close, producing interference, the model fails to compose them. Our method reliably anticipates failure modes across different compositional tasks, without evaluating specific inputs. These results lay the groundwork to use representational geometry to identify high-risk examples, construct targeted stress tests, and provide a scalable foundation for active learning in real-world deployment.

Paper AI Chat

この論文のPDF全文を対象にAIに質問できます。

質問の例:

AIチャット機能を利用するには、ログインまたは会員登録（無料）が必要です。

会員登録 / ログイン

arXivで読む PDFを開く

メタ情報

arXiv ID: 2606.13934
カテゴリ: cs.AI

ポイント

Abstract

Paper AI Chat

関連するAIDB記事

メタ情報