AIDB Daily Papers

WebVR: Benchmarking Multimodal LLMs on Webpage Recreation from Videos (Using Human-Aligned Visual Rubrics)

Original title: WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics

Authors: Yuhong Dai, Yanlin Lai, Mitt Huang, Hangyu Guo, Dingming Li, Hongbo Peng, Haodong Li, Yingxiu Zhao, Haoran Lyu, Zheng Ge, Xiangyu Zhang, Daxin Jiang

Published: 2026-03-11 | Fields: LLM, Multimodal, Computer Vision, Benchmark, Video, Evaluation, Web

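Note: The Japanese title and key points on the source page were generated automatically by AI. Refer to the original paper for the exact content.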

Key Points

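Focusing on the task of generating webpages from videos, the authors propose WebVR, a new benchmark that addresses limitations of prior work.
WebVR matters because it draws on the rich information that video carries, such as interaction and timing, which faithful webpage recreation depends on.
Experiments on 19 models show that reproducing style and motion remains difficult, while the automatic evaluation agrees closely with human preferences.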

Abstract

Existing web-generation benchmarks rely on text prompts or static screenshots as input. However, videos naturally convey richer signals such as interaction flow, transition timing, and motion continuity, which are essential for faithful webpage recreation. Despite this potential, video-conditioned webpage generation remains largely unexplored, with no dedicated benchmark for this task. To fill this gap, we introduce WebVR, a benchmark that evaluates whether MLLMs can faithfully recreate webpages from demonstration videos. WebVR contains 175 webpages across diverse categories, all constructed through a controlled synthesis pipeline rather than web crawling, ensuring varied and realistic demonstrations without overlap with existing online pages. We also design a fine-grained, human-aligned visual rubric that evaluates the generated webpages across multiple dimensions. Experiments on 19 models reveal substantial gaps in recreating fine-grained style and motion quality, while the rubric-based automatic evaluation achieves 96% agreement with human preferences. We release the dataset, evaluation toolkit, and baseline results to support future research on video-to-webpage generation.
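The abstract reports 96% agreement between the rubric-based automatic evaluation and human preferences but does not say here how that figure is computed. As a rough illustration only, one common way to measure such agreement is pairwise: for each pair of generated pages that humans compared, check whether the automatic score ranks the human-preferred page higher. The function name, the page identifiers, and the scores below are all hypothetical and are not taken from the paper or its toolkit.

```python
from typing import Dict, Sequence, Tuple


def pairwise_agreement(
    auto_scores: Dict[str, float],
    human_prefs: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of human-judged pairs where the automatic score ranks
    the human-preferred item strictly higher.

    human_prefs holds (preferred_id, other_id) pairs. This is an assumed
    protocol for illustration, not the paper's documented procedure.
    """
    if not human_prefs:
        raise ValueError("need at least one human preference pair")
    agree = sum(
        1
        for preferred, other in human_prefs
        if auto_scores[preferred] > auto_scores[other]
    )
    return agree / len(human_prefs)


# Toy data, invented for illustration only.
scores = {"page_a": 0.82, "page_b": 0.61, "page_c": 0.70}
prefs = [("page_a", "page_b"), ("page_a", "page_c"), ("page_b", "page_c")]
rate = pairwise_agreement(scores, prefs)
print(f"agreement: {rate:.2f}")
```

In this toy case the automatic scores agree with two of the three human judgments. Ties in the automatic score count as disagreement here; a real evaluation would need to state how ties are handled.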

Metadata

arXiv ID: 2603.13391
Category: cs.CV
