AIDB Daily Papers

Real-Time Game Commentary Generation with Multimodal LLMs: Pause-Aware Decoding

Original title: Real-Time Generation of Game Video Commentary with Multimodal LLMs: Pause-Aware Decoding Approaches

Authors: Anum Afzal, Yuki Saito, Hiroya Takamura, Katsuhito Sudoh, Shinnosuke Takamichi, Graham Neubig, Florian Matthes, Tatsuya Ishigaki

Published: 2026-03-03 | Fields: LLM, NLP, multimodal, benchmark, game, AI, video

Note: The Japanese title and key points were generated automatically by AI. Please check the original paper for the exact content.

Key Points

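This work develops a system that uses multimodal LLMs to produce real-time textual commentary on events in game videos.
Existing prompting-based methods are strong at content generation but weak on timing; this work improves timing without any fine-tuning.
Dynamic interval-based decoding yields commentary that aligns more closely with human utterance timing and content, and a multilingual benchmark is released alongside it.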

Abstract

Real-time video commentary generation provides textual descriptions of ongoing events in videos. It supports accessibility and engagement in domains such as sports, esports, and livestreaming. Commentary generation involves two essential decisions: what to say and when to say it. While recent prompting-based approaches using multimodal large language models (MLLMs) have shown strong performance in content generation, they largely ignore the timing aspect. We investigate whether in-context prompting alone can support real-time commentary generation that is both semantically relevant and well-timed. We propose two prompting-based decoding strategies: 1) a fixed-interval approach, and 2) a novel dynamic interval-based decoding approach that adjusts the next prediction timing based on the estimated duration of the previous utterance. Both methods enable pause-aware generation without any fine-tuning. Experiments on Japanese and English datasets of racing and fighting games show that the dynamic interval-based decoding can generate commentary more closely aligned with human utterance timing and content using prompting alone. We release a multilingual benchmark dataset, trained models, and implementations to support future research on real-time video commentary generation.
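
The two decoding strategies named in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `generate` callable stands in for a prompted MLLM, and the `WAIT` marker, the `min_gap` parameter, and the word-count duration estimate at 2.5 words per second are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical marker meaning "stay silent at this step" (an assumption,
# not a token taken from the paper).
WAIT = "<WAIT>"

# Stand-in for a prompted MLLM: (current time in seconds, past utterances) -> text.
Generate = Callable[[float, List[str]], str]


def estimate_duration(utterance: str, words_per_second: float = 2.5) -> float:
    """Rough speaking-time estimate from word count (assumed speaking rate)."""
    return len(utterance.split()) / words_per_second


def fixed_interval_decode(
    video_length: float, interval: float, generate: Generate
) -> List[Tuple[float, str]]:
    """Query the model every `interval` seconds, regardless of utterance length."""
    t, output = 0.0, []
    while t < video_length:
        text = generate(t, [u for _, u in output])
        if text != WAIT:
            output.append((t, text))
        t += interval
    return output


def dynamic_interval_decode(
    video_length: float,
    min_gap: float,
    generate: Generate,
    estimate: Callable[[str], float] = estimate_duration,
) -> List[Tuple[float, str]]:
    """Schedule the next query after the estimated end of the previous utterance."""
    t, output = 0.0, []
    while t < video_length:
        text = generate(t, [u for _, u in output])
        if text == WAIT:
            # Nothing was said, so only advance by the minimum gap.
            t += min_gap
        else:
            output.append((t, text))
            # Skip ahead by the estimated speaking time of what was just said.
            t += max(min_gap, estimate(text))
    return output
```

In this sketch the only difference between the two schedulers is how far the clock advances after an utterance: the fixed variant always steps by `interval`, while the dynamic variant steps by the estimated speaking time, so a long utterance delays the next query and a short one does not.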

Metadata

arXiv ID: 2603.02655
Categories: cs.CL, cs.AI
