AIDB Daily Papers

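Edge LLM Inference: Performance-Efficiency Trade-offs of Mobile, NPU, and GPU Platforms Under Sustained Load

Original title: LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load

Authors: Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan

Published: 2026-03-24 | Fields: LLM, efficiency, benchmark, inference, machine learning, systems, implementation, energy, hardware, mobile, NPU, GPU, edge, performance

Note: The Japanese title and key points on the source page were generated automatically by AI. Refer to the original paper for the exact content.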

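Key Points

With a view to on-device LLM deployment in power-constrained environments, the paper benchmarks Qwen 2.5 1.5B across several platforms.
On mobile devices, thermal management is the bottleneck: the iPhone 16 Pro and Galaxy S24 Ultra showed throughput degradation or halted inference.
The Hailo-10H NPU delivered stable performance at low power, while the RTX 4050 GPU sustained high throughput under its battery power limit.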

Abstract

Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we measure throughput, latency, power, and thermal behaviour. For mobile platforms, thermal management supersedes peak compute as the primary constraint: the iPhone 16 Pro loses nearly half its throughput within two iterations, and the S24 Ultra suffers a hard OS-enforced GPU frequency floor that terminates inference entirely. On dedicated hardware, distinct constraints dominate: the RTX 4050 is bounded by its battery power ceiling, while the Hailo-10H is limited by on-module memory bandwidth. The RTX 4050 sustains 131.7 tok/s at 34.1 W; the Hailo-10H sustains 6.9 tok/s at under 2 W with near-zero variance, matching the RTX 4050 in energy proportionality at 19x lower throughput. Results should be interpreted as platform-level deployment characterisations for a single model and prompt type, reflecting hardware and software combined, rather than general claims about hardware capability alone.
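The energy-proportionality claim in the abstract can be checked with simple arithmetic. The sketch below uses only the figures quoted above (131.7 tok/s at 34.1 W, 6.9 tok/s at under 2 W). It assumes 2.0 W as an upper bound for the Hailo-10H, since the abstract gives only "under 2 W", so the resulting Hailo-10H efficiency is a lower bound rather than a measured value. The function and variable names are illustrative and do not come from the paper.

```python
def tokens_per_joule(tok_per_s: float, watts: float) -> float:
    """Energy efficiency: (tokens/second) / (joules/second) = tokens/joule."""
    return tok_per_s / watts


# Figures quoted in the abstract.
RTX_TOK_S, RTX_W = 131.7, 34.1
HAILO_TOK_S = 6.9
# Assumption: the abstract says "under 2 W"; 2.0 W is used as an upper bound.
HAILO_W_UPPER = 2.0

rtx_eff = tokens_per_joule(RTX_TOK_S, RTX_W)
# Lower bound, because the true power draw is below HAILO_W_UPPER.
hailo_eff_lower = tokens_per_joule(HAILO_TOK_S, HAILO_W_UPPER)
throughput_ratio = RTX_TOK_S / HAILO_TOK_S

print(f"RTX 4050:  {rtx_eff:.2f} tok/J")
print(f"Hailo-10H: at least {hailo_eff_lower:.2f} tok/J")
print(f"Throughput ratio (RTX / Hailo): {throughput_ratio:.1f}x")
```

Under these assumptions the RTX 4050 comes out at roughly 3.9 tok/J and the Hailo-10H at 3.45 tok/J or better, with a throughput ratio of about 19x, which is consistent with the abstract's statement that the two match in energy proportionality despite the throughput gap.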

Metadata

arxiv ID: 2603.23640
Categories: cs.DC, cs.LG
