BriefGPT.xyz
Jan, 2023
SMDP-Based Dynamic Batching for Efficient Inference on GPU-Based Platforms
Yaodan Xu, Jingzhou Sun, Sheng Zhou, Zhisheng Niu
TL;DR
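This paper proposes a dynamic batching policy that balances processing efficiency against response latency for inference on GPUs. It models GPU inference serving as a batch-service queue, formulates the policy design problem as a semi-Markov decision process (SMDP), and obtains the optimal policy by solving an associated discrete-time Markov decision process (MDP).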
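To make the idea of solving a discrete-time MDP for a batching policy concrete, here is a minimal toy sketch. It is not the paper's SMDP formulation: the slotted time model, the Bernoulli arrivals, the assumption that a batch finishes within one slot, the discounted cost, and every name and parameter (`solve_batching_mdp`, `hold`, `setup`, `per_req`, and so on) are illustrative assumptions. The state is the queue length, the action is how many waiting requests to batch now (0 means wait), and plain value iteration finds the cost-minimizing action per state.

```python
def solve_batching_mdp(N=20, B=8, p=0.5, hold=1.0, setup=5.0, per_req=0.1,
                       gamma=0.95, tol=1e-8, max_iter=10_000):
    """Value iteration on a toy discounted batching MDP (all parameters hypothetical).

    N: queue capacity, B: maximum batch size, p: arrival probability per slot,
    hold: waiting cost per queued request per slot (latency proxy),
    setup + per_req * b: cost of launching a batch of size b (efficiency proxy).
    """
    V = [0.0] * (N + 1)
    policy = [0] * (N + 1)
    for _ in range(max_iter):
        V_new = [0.0] * (N + 1)
        for s in range(N + 1):
            best_q, best_b = float("inf"), 0
            for b in range(min(s, B) + 1):
                left = s - b  # requests still waiting after the batch is served
                cost = hold * left + (setup + per_req * b if b > 0 else 0.0)
                nxt = min(left + 1, N)  # an arrival is dropped when the queue is full
                q = cost + gamma * (p * V[nxt] + (1 - p) * V[left])
                if q < best_q:  # strict comparison keeps the smallest batch on ties
                    best_q, best_b = q, b
            V_new[s], policy[s] = best_q, best_b
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    return V, policy


if __name__ == "__main__":
    values, policy = solve_batching_mdp()
    for s, b in enumerate(policy):
        print(f"queue length {s:2d} -> batch size {b}")
```

In this sketch, the fixed `setup` cost rewards waiting to fill larger batches, while the `hold` cost penalizes delay, so the resulting policy illustrates the efficiency-versus-latency trade-off described above under the stated toy assumptions.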
Abstract
In up-to-date machine learning (ML) applications on cloud or edge computing platforms, batching is an important technique for providing efficient and economical services at scale. In particular, parallel computin
→