AIDB Daily Papers

Injecting Creativity: "CAT" Dramatically Advances Text-to-Image Models with Just 64 Tokens

Original title: A Creative Agent is Worth a 64-Token Template

Authors: Ruixiao Shi, Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng

Published: 2026-03-18 | Fields: image generation, Transformer, machine learning, design, AI, image creation

Note: The Japanese title and key points were generated automatically by AI. Refer to the original paper for the exact content.

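Key Points

Proposes the CAT framework, which addresses the failure of text-to-image models to interpret creative intent when generating images from creative text.
CAT tokenizes the knowledge of an agent that understands creativity, producing a reusable template that requires no iterative reasoning.
Experiments on architecture design, furniture design, and other tasks show that it generates high-quality images faster and at lower cost than existing methods.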

Abstract

Text-to-image (T2I) models have substantially improved image fidelity and prompt adherence, yet their creativity remains constrained by reliance on discrete natural language prompts. When presented with fuzzy prompts such as "a creative vinyl record-inspired skyscraper", these models often fail to infer the underlying creative intent, leaving creative ideation and prompt design largely to human users. Recent reasoning- or agent-driven approaches iteratively augment prompts but incur high computational and monetary costs, as their instance-specific generation makes "creativity" costly and non-reusable, requiring repeated queries or reasoning for subsequent generations. To address this, we introduce CAT, a framework for Creative Agent Tokenization that encapsulates agents' intrinsic understanding of "creativity" through a Creative Tokenizer. Given the embeddings of fuzzy prompts, the tokenizer generates a reusable token template that can be directly concatenated with them to inject creative semantics into T2I models without repeated reasoning or prompt augmentation. To enable this, the tokenizer is trained via creative semantic disentanglement, leveraging relations among partially overlapping concept pairs to capture the agent's latent creative representations. Extensive experiments on Architecture Design, Furniture Design, and Nature Mixture tasks demonstrate that CAT provides a scalable and effective paradigm for enhancing creativity in T2I generation, achieving a 3.7× speedup and a 4.8× reduction in computational cost, while producing images with superior human preference and text-image alignment compared to state-of-the-art T2I models and creative generation methods.
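
The concatenation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_creative_tokenizer`, `inject_template`, `fake_embed`, and the embedding width of 8 are all hypothetical names and values chosen for illustration, and the "tokenizer" here is a fixed pseudo-random function rather than a trained network. Only the template length of 64 is taken from the paper's title. The sketch shows the shape bookkeeping: a template is produced once from the prompt embeddings, appended along the token axis, and then reused without recomputation.

```python
import random

TEMPLATE_LEN = 64  # template length, taken from the paper's title
EMBED_DIM = 8      # toy embedding width (assumption; real text encoders are much wider)


def fake_embed(n_tokens, dim=EMBED_DIM, seed=1):
    """Hypothetical stand-in for a text encoder: n_tokens random vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]


def toy_creative_tokenizer(prompt_emb, template_len=TEMPLATE_LEN, seed=0):
    """Hypothetical stand-in for a trained Creative Tokenizer.

    Each template vector is the mean prompt vector plus a small
    pseudo-random offset. A real tokenizer would be a learned model.
    """
    dim = len(prompt_emb[0])
    mean = [sum(tok[d] for tok in prompt_emb) / len(prompt_emb) for d in range(dim)]
    rng = random.Random(seed)
    return [[m + rng.uniform(-0.1, 0.1) for m in mean] for _ in range(template_len)]


def inject_template(prompt_emb, template):
    """Concatenate the template after the prompt embeddings (token axis)."""
    return prompt_emb + template


# Embed a 7-token fuzzy prompt, build the template once, then reuse it.
prompt_emb = fake_embed(n_tokens=7)
template = toy_creative_tokenizer(prompt_emb)
conditioning = inject_template(prompt_emb, template)

print(len(conditioning), len(conditioning[0]))  # prints "71 8"
```

In an actual pipeline, `conditioning` would take the place of the plain prompt embeddings passed to the T2I model. Per the abstract, the saving comes from generating the template once and reusing it, instead of querying an agent again for every subsequent generation.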

Metadata

arXiv ID: 2603.17895
Category: cs.CV
