CN115756789B - GPU scheduling optimization method for deep learning reasoning service system - Google Patents

GPU scheduling optimization method for deep learning reasoning service system

Info

Publication number
CN115756789B
Authority
CN
China
Prior art keywords
model
time
throughput
models
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211456890.8A
Other languages
Chinese (zh)
Other versions
CN115756789A (en)
Inventor
彭亚琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202211456890.8A
Publication of CN115756789A
Application granted
Publication of CN115756789B
Active (current legal status)
Anticipated expiration

Abstract


The present invention discloses a GPU scheduling optimization method for a deep learning reasoning service system, including: initializing the deep learning reasoning service system; assigning all models contained in the initialized system to prediction threads; starting the prediction threads, which periodically execute a throughput demand prediction procedure and predict the throughput demand of their assigned models in the new cycle; and starting scheduling threads which, using the obtained predicted throughput demands, execute a throughput adjustment procedure based on a feedback control strategy at each scheduling moment to optimize the actual throughput allocation of each model in the system. The present invention can dynamically predict the throughput of each model service when it monopolizes the GPU, effectively adapting to complex and changeable workloads; it satisfies the different latency and throughput requirements of the model requests deployed on the same server and remedies the deficiencies of the task scheduling strategies in existing model serving systems.

Description

GPU scheduling optimization method for deep learning reasoning service system
Technical Field
The invention belongs to the technical fields of computer architecture and artificial intelligence, and particularly relates to a GPU scheduling optimization method for a deep learning reasoning service system.
Background
With the explosive growth of data, improvements in core algorithms, and increases in hardware computing power, deep learning has brought immeasurable value to the development and application of artificial intelligence and is widely applied in fields such as computer vision, speech recognition, and natural language processing. Deep learning mainly includes two stages: training and reasoning. The training stage is the process of continuously adjusting the weights of a deep neural network (Deep Neural Networks, DNN) with an optimization algorithm according to the input data, i.e. the process of constructing a model. Because of the large amount of input data and the large number of DNN weight parameters, training these DNN models typically requires significant computational effort and takes several hours to days to complete.
Hardware accelerators such as the GPU (Graphics Processing Unit) and TPU (Tensor Processing Unit) have become the mainstream hardware for accelerating DNN model training by virtue of their powerful ability to perform simple repetitive operations on data. After a DNN model is trained, it can make predictions on input data, which is the reasoning stage. Online reasoning service for DNN models is an important practical application scenario of deep learning: DNN models are deployed in cloud data centers to provide reasoning services for tenants, so that mobile devices with limited computing resources can also support deep learning applications through such services. In the early stage, reasoning tasks were mostly deployed on the CPU (Central Processing Unit). As DNN models grow, it becomes difficult to meet end-to-end real-time requirements (latency below 100 ms) by running reasoning tasks on the CPU. Therefore, a great deal of current work favors using GPUs to accelerate DNN reasoning tasks.
In order to improve the utilization of GPU resources, a current approach is to deploy multiple deep learning models on the same GPU server. In this scenario, requirements such as low reasoning latency and performance isolation among multiple tenants pose new challenges for the task scheduling mechanism in a shared GPU environment. Currently, task scheduling in a shared GPU environment is mainly classified into spatial sharing and time sharing. Spatial sharing, represented by NVIDIA Multi-Process Service (MPS), allows the GPU to run multiple tasks at any time and effectively improves the resource utilization of the reasoning service system. However, under a spatial sharing strategy, the performance of concurrently running tasks is highly uncertain, and performance isolation and real-time requirements among multiple tenants cannot be guaranteed. Under a time sharing strategy, the GPU runs only a single reasoning task in any scheduling unit, so the parallel execution capability of the GPU cores is not fully utilized. Compared with spatial sharing, the GPU resource utilization under time sharing is lower, but the execution time of reasoning tasks is more stable and the real-time requirements of reasoning service requests can be better guaranteed. However, when heterogeneous models are deployed on the same server, the service performance of relatively smaller models is more easily interfered with, which seriously affects the service experience of the corresponding tenants.
In summary, for the acceleration of DNN reasoning tasks, currently used shared GPU environments cannot schedule multiple tasks well, which introduces uncertainty and interference, and the real-time requirements of reasoning service requests cannot be well guaranteed.
Summary of the invention
The invention aims to provide a GPU scheduling optimization method for a deep learning reasoning service system, which is used for supporting a multi-tenant isolated deep learning reasoning service system in a shared GPU environment, dynamically predicting throughput requirements of various model services in the shared GPU environment in real time under complex and changeable workload conditions, and carrying out dynamic real-time GPU scheduling based on a prediction result so as to solve the problem of performance isolation among multiple tenants.
The GPU scheduling optimization method for the deep learning reasoning service system provided by the invention comprises the following steps:
s1, initializing a scheduling optimization parameter;
S2, acquiring the tasks to be processed of the deep learning reasoning service system at the current moment and predicting the throughput demand of each model among the tasks to be processed in the new cycle based on the system parameters of the deep learning reasoning service system, wherein model throughput refers to the number of reasoning requests successfully responded to by the model; each reasoning request has a deadline, and if and only if the client obtains the corresponding reasoning result before the deadline of the request, the request counts as successfully responded and is counted toward the throughput of the model;
s3, starting a new cycle, and adjusting and distributing the throughput of each model in the cycle duration based on the throughput requirement of each model in the new cycle, which is obtained in the step S2;
s4, after the current period is finished, repeating the steps S2-S3 until the deep learning reasoning service system stops running, and completing GPU scheduling optimization aiming at the deep learning reasoning service system.
The initialization of the scheduling optimization parameters in step S1 specifically comprises: the globally shared data defined by the system comprise a model state variable array models[2][m], a model state index si, and a model state lock variable si_lock corresponding to the model state index si, and a simulation queue and a request queue are set for each model; models[2][m] is used for storing state information, where the stored state information comprises the estimated throughput demand of each model and its actual throughput in the current period, m is the number of models, and each column of the array corresponds to the state information of one model; the initial values of the elements of models[2][m] and of si are set to 0, and the value of si is restricted to 0 or 1; to avoid read-write conflicts between the prediction processes and the scheduling processes on the model state information, the scheduling processes only read and write the model states stored in the columns of models[2][m] from the row designated by the value of si, while the prediction processes read and write the model states stored in the columns of models[2][m] from the row designated by the value of si', where si' = (si+1)%2 and % is the remainder operator; the number of prediction processes executed in parallel is n and the number of scheduling processes executed in parallel is k, where m, n, and k are natural numbers not less than 1 whose specific values are set according to the hardware configuration of the system, and each prediction process or scheduling process is executed by one thread; the request queue of each model is used for storing the reasoning requests of that model that are waiting to be responded to, and the simulation queue is used for storing the requests involved in the simulated scheduling of the prediction flow in step S2.
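As a concrete illustration of this layout, the following minimal Python sketch models the double-buffered state array, the state index, and its lock. The names models, si, si_lock, sim_solo, and goodput follow the text above, while the ModelState class, the deque-based queues, and the example value of m are illustrative assumptions rather than part of the patent.

```python
import threading
from collections import deque

class ModelState:
    """One element of models[2][m]: the per-model state kept in one buffer row."""
    def __init__(self):
        self.sim_solo = 0   # estimated throughput demand when the model monopolizes the GPU
        self.goodput = 0    # actual throughput (successfully answered requests) in the current period

m = 4                                                           # number of deployed models (example value)
models = [[ModelState() for _ in range(m)] for _ in range(2)]   # models[2][m], all fields initialized to 0
si = 0                                                          # model state index, restricted to 0 or 1
si_lock = threading.Lock()                                      # model state lock variable

# One request queue (pending inference requests) and one simulation queue per model.
request_queues = [deque() for _ in range(m)]
simulation_queues = [deque() for _ in range(m)]

def publish_new_states():
    """Prediction side of step D below: after filling row si' = (si + 1) % 2,
    flip the index under the lock so scheduling threads read the new row."""
    global si
    with si_lock:
        si = (si + 1) % 2
```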
In step S2, the throughput demand of each model among the tasks to be processed in the new cycle is predicted based on the system parameters of the deep learning reasoning service system, specifically by the following steps:
A. All models are evenly distributed to the prediction processes, where the number of models in the system is m and the number of prediction processes executed in parallel is n; m and n are natural numbers, the value of m is not less than 1, and the specific value of n is set according to the hardware configuration of the system and is not less than 1; the number of models allocated to each of the first n-1 prediction processes is ⌈m/n⌉, where ⌈·⌉ denotes rounding up, and the number of models allocated to the last prediction process is m % n, where % is the remainder operation (a sketch of this split appears after this list);
B. Starting the prediction processes to estimate the throughput demand of all models in the new cycle: for the current model, the prediction process assumes that the model monopolizes the GPU computing resources, simulates the scheduling of the model's uncompleted requests, computes the number of reasoning requests that can be responded to successfully within one period duration under this simulated scheduling, and takes the result as the throughput demand of the model in the new cycle;
C. all prediction processes are finished after the estimation task is completed;
D. Obtaining the model state lock variable si_lock, updating the model state index si to si = si' so that the model state data of the new cycle are released to the scheduling processes, and releasing the model state lock variable after the update is finished;
E. ending the prediction flow of the round;
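The split in step A can be written down directly. In the sketch below, the last prediction thread is given all remaining models so that every model is assigned exactly once, which is how the remainder stated in the text is interpreted here; the function name and the example values are illustrative assumptions.

```python
import math

def assign_models_to_predictors(m, n):
    """Distribute model indices 0..m-1 over n prediction threads: the first
    n-1 threads each take ceil(m/n) models, the last thread takes the rest."""
    per_thread = math.ceil(m / n)
    assignment, start = [], 0
    for t in range(n):
        end = m if t == n - 1 else min(start + per_thread, m)
        assignment.append(list(range(start, end)))
        start = end
    return assignment

# Example: 10 models over 3 prediction threads -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(assign_models_to_predictors(10, 3))
```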
The refined flow by which a prediction process estimates the throughput demand of a single model specifically comprises the following steps (a minimal sketch of these steps follows the list):
(1) Copying all requests in a request queue of a current model to a simulation queue of the model;
(2) Clearing the to-be-released state data of the current model, wherein the current model is numbered i and models[(si+1)%2][i] is the to-be-released state data of the model; each model state datum comprises two member variables, sim_solo and goodput, wherein models[(si+1)%2][i].sim_solo is the sim_solo member variable of the model numbered i and records the throughput demand estimated for the current model, and models[(si+1)%2][i].goodput is the goodput member variable of the model numbered i and records the actual throughput of the current model; assigning 0 to the sim_solo and goodput member variables of the state datum clears the to-be-released state data of the current model;
(3) Deleting from the simulation queue of the current model the requests that cannot be completed before their own deadlines;
(4) Setting a start variable and an end variable, respectively indicating the start time and the end time of the new cycle, reading the current system time and assigning it to start, and then assigning the result of adding the cycle duration to start to end;
(5) Assuming that the system has N GPUs, setting a simulated scheduling time sched_time for each GPU, and initializing sched_time to start;
(6) Judging whether the simulation queue is empty, if so, executing step (13), otherwise executing step (7);
(7) Acquiring the minimum sched_time value among all the GPUs and assigning the result to a variable min_sched_time, wherein min_sched_time is a temporary variable recording the minimum sched_time value among all the GPUs;
(8) Judging whether min_sched_time is greater than or equal to end, if so, executing step (13), otherwise executing step (9);
(9) Searching the simulation queue for as many consecutive requests as possible forming a request block that satisfies:
min_sched_time + inferTime(batch_size) < deadline
wherein deadline is the deadline of the first request in the request block, batch_size is the number of requests in the block, and inferTime(batch_size) is the completion time of executing batch_size requests of the current model as one batch; the above formula judges whether the request block can be executed on the GPU as a batch when the current model monopolizes the GPU;
(10) Updating the throughput of the current model under simulated scheduling to sim_solo + batch_size, wherein sim_solo is a member variable of the model state data;
(11) Updating the simulated scheduling time sched_time of GPU i under simulated scheduling to min_sched_time + inferTime(batch_size);
(12) Deleting all requests of the consecutive request block found in step (9) from the simulation queue and jumping to step (6);
(13) Ending the prediction flow for the current model.
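The simulated scheduling of steps (1)-(13) can be sketched as follows. The Request class, the infer_time latency model, and the handling of an infeasible head-of-queue request are illustrative assumptions; only the overall structure (copy the queue, walk the per-GPU simulated clocks, batch as many consecutive requests as still meet the first request's deadline, and count them into sim_solo) follows the steps above.

```python
import time
from collections import deque

class Request:
    def __init__(self, deadline):
        self.deadline = deadline        # absolute time by which the client needs the result

def infer_time(batch_size):
    # Illustrative latency model (seconds): fixed overhead plus a per-request cost.
    return 0.004 + 0.001 * batch_size

def estimate_throughput_demand(request_queue, period, num_gpus):
    """Simulate scheduling the model's pending requests as if the model
    monopolized all GPUs for one period, and return the number of requests
    that could be answered before their deadlines (sim_solo)."""
    sim_queue = deque(request_queue)                             # (1) copy the request queue
    sim_solo = 0                                                 # (2) cleared estimate
    start = time.time()                                          # (4) period start
    end = start + period
    sched_time = [start] * num_gpus                              # (5) per-GPU simulated clock

    # (3) drop requests that cannot meet their deadline even if run alone right now
    sim_queue = deque(r for r in sim_queue if start + infer_time(1) < r.deadline)

    while sim_queue:                                             # (6)
        gpu = min(range(num_gpus), key=lambda g: sched_time[g])  # (7) earliest-free GPU
        min_sched_time = sched_time[gpu]
        if min_sched_time >= end:                                # (8) the period is exhausted
            break
        # (9) grow a block of consecutive requests while the whole batch still
        # finishes before the deadline of the block's first request
        first_deadline = sim_queue[0].deadline
        batch_size = 0
        while (batch_size < len(sim_queue)
               and min_sched_time + infer_time(batch_size + 1) < first_deadline):
            batch_size += 1
        if batch_size == 0:                                      # head request is infeasible; skip it
            sim_queue.popleft()
            continue
        sim_solo += batch_size                                   # (10) count the batch
        sched_time[gpu] = min_sched_time + infer_time(batch_size)  # (11) advance this GPU's clock
        for _ in range(batch_size):                              # (12) remove the scheduled requests
            sim_queue.popleft()
    return sim_solo                                              # (13)
```

Under this sketch, calling estimate_throughput_demand(request_queues[i], period, N) for model i yields the sim_solo value that step D later publishes to the scheduling processes.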
Step S3, adjusting and allocating the throughput of each model within the cycle duration based on the throughput demand of each model in the new cycle obtained in step S2, specifically includes the following steps (a minimal sketch of these steps follows the list):
1) Obtaining a model state lock variable si_lock;
2) Copying all elements in the row indicated by the model state index si of the variable array models[2][m] into a local one-dimensional array ms, and then releasing the model state lock variable si_lock;
3) Traversing each element in the array ms and judging whether the actual throughput obtained by each model is lower than the standard value, wherein mi is the (i+1)-th element in the array ms;
4) Adding the models whose actual throughput is lower than the standard value to a list M whose initial value is empty; if no model's actual throughput is lower than the standard value, adding all models to the list M;
5) Searching the request queue of each model for as many consecutive requests as possible forming a request block that satisfies:
curTime + inferTime(batch_size) < deadline
wherein the arrival time of the first request in the request block is arrival_time, its deadline is deadline, the number of requests in the block is batch_size, the current system time is curTime, and inferTime(batch_size) is the time to execute batch_size requests of the current model; deadline - inferTime(batch_size) is the latest scheduling time at which all requests in the request block can still be successfully responded to, and the next time the current GPU executes the scheduling flow is curTime + inferTime(batch_size);
6) And exiting the scheduling process.
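A minimal sketch of one pass of steps 1)-6) is given below. Because the standard-value formula in step 3) is not reproduced in the text, the under-served test here (each model's goodput-to-sim_solo ratio compared with the average ratio) is only an illustrative stand-in; dispatch, infer_time, and the request objects (deques of items with a deadline attribute) are likewise assumptions.

```python
import time

def adjust_throughput_once(models, si, si_lock, request_queues, infer_time, dispatch):
    """One pass of the scheduling thread (steps 1-6 above)."""
    with si_lock:                                            # 1) acquire the model state lock
        ms = [(s.sim_solo, s.goodput) for s in models[si]]   # 2) snapshot row si, then release the lock
    if not ms:
        return

    # 3)-4) collect models whose actual throughput is below the standard value;
    # here: goodput/sim_solo ratio below the average ratio across all models (assumption).
    ratios = [g / s if s > 0 else 1.0 for s, g in ms]
    avg_ratio = sum(ratios) / len(ratios)
    M = [i for i, r in enumerate(ratios) if r < avg_ratio]
    if not M:
        M = list(range(len(ms)))                             # otherwise consider every model

    cur_time = time.time()
    for i in M:
        queue = request_queues[i]
        if not queue:
            continue
        # 5) batch as many consecutive requests as still meet the deadline of the
        # block's first request: curTime + inferTime(batch_size) < deadline
        first_deadline = queue[0].deadline
        batch_size = 0
        while (batch_size < len(queue)
               and cur_time + infer_time(batch_size + 1) < first_deadline):
            batch_size += 1
        if batch_size > 0:
            batch = [queue.popleft() for _ in range(batch_size)]
            dispatch(i, batch)                               # hand the batch to the GPU
    # 6) exit; the thread runs again at the next scheduling moment,
    #    cur_time + infer_time(batch_size) for the batch it just issued.
```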
According to the GPU scheduling optimization method for the deep learning reasoning service system provided by the invention, the deep learning reasoning service system is initialized and, in the shared GPU environment, the throughput of each model service when it monopolizes the GPU is dynamically predicted based on the real-time load conditions of the models; the prediction adapts effectively to complex and changeable workloads without introducing an offline profiling process. At the same time, the throughput of each model when placed alone is used as the measurement standard for adjusting the weight of each heterogeneous model in GPU resource allocation, so that the different latency and throughput requirements of the model requests deployed on the same server are satisfied, remedying the deficiency of the task scheduling strategies in existing model serving systems with respect to performance isolation among heterogeneous model requests.
Drawings
FIG. 1 is a diagram of a GPU scheduling framework for a deep learning reasoning service system.
FIG. 2 is a schematic flow chart of the method of the present invention.
Fig. 3 is a general flow diagram of throughput demand prediction.
FIG. 4 is a detailed flow diagram of a prediction thread estimating throughput requirements for a new round of cycles for a single model.
Fig. 5 is a schematic diagram of a throughput adjustment flow based on a feedback control strategy.
FIG. 6 is a schematic diagram comparing the performance of example 1 of the present invention with the prior art.
Detailed Description
The GPU scheduling framework for the deep learning reasoning service system is shown in FIG. 1. It mainly comprises two parts, a throughput demand prediction module and a throughput adjustment module based on a feedback control strategy, which interact with each other to jointly complete the overall operation. The scheduling system maintains an inference request queue for each model, continuously receives inference requests from clients, and inserts each request into the inference request queue of its target model. In the multi-model GPU sharing scenario, the throughput demand prediction module periodically estimates the throughput that each model would achieve if it ran alone under the current workload conditions, determines the throughput demand proportions among the models based on the estimation results, and provides this as guidance information to the throughput adjustment module based on the feedback control strategy. The throughput adjustment module dynamically monitors the actual throughput of each model in the current period and optimizes the actual throughput distribution at fine granularity based on the throughput demand proportions among the models, minimizing the difference in performance loss that GPU sharing causes among heterogeneous models and ensuring performance isolation among the models.
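The per-model request queues and the two cooperating thread pools described above can be sketched as follows; all of the names (InferenceRequest, receive_request, start_system) and the use of Python threads are illustrative assumptions, not terms from the patent.

```python
import threading
from collections import deque
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    model_id: int          # target model
    arrival_time: float    # when the request reached the server
    deadline: float        # absolute time by which the client needs the result
    payload: object = None

pending = {}               # model_id -> deque of InferenceRequest awaiting a response

def receive_request(req: InferenceRequest):
    """Front end: insert each incoming request into its target model's queue."""
    pending.setdefault(req.model_id, deque()).append(req)

def start_system(prediction_loop, scheduling_loop, n_predictors=1, k_schedulers=1):
    """Launch the two cooperating components; the text runs each prediction or
    scheduling process as one thread."""
    for _ in range(n_predictors):
        threading.Thread(target=prediction_loop, daemon=True).start()
    for _ in range(k_schedulers):
        threading.Thread(target=scheduling_loop, daemon=True).start()
```

A front end would call receive_request for every incoming inference request, while prediction_loop and scheduling_loop would implement the throughput demand prediction flow of FIG. 3 and the feedback-controlled adjustment flow of FIG. 5, respectively.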
The method flow diagram of the method of the invention is shown in fig. 2, the GPU scheduling optimization method for the deep learning reasoning service system provided by the invention comprises the following steps:
S1, initializing a scheduling optimization parameter, which specifically comprises the following steps:
The globally shared data defined by the system comprise a model state variable array models[2][m], a model state index si, and a model state lock variable si_lock corresponding to the model state index si, and a simulation queue and a request queue are set for each model; models[2][m] is used for storing state information, where the stored state information comprises the estimated throughput demand of each model and its actual throughput in the current period; to avoid read-write conflicts between the prediction processes and the scheduling processes on the model state information, the scheduling processes only read and write the model states stored in the columns of the models[2][m] array from the row designated by the value of si, while the prediction processes read and write the model states stored in the columns of the models[2][m] array from the row designated by the value of si', where si' = (si+1)%2 and % is the remainder operator; the number of prediction processes executed in parallel is n and the number of scheduling processes executed in parallel is k, where m, n, and k are natural numbers and the specific values of n and k are set according to the hardware configuration of the system; each prediction process or scheduling process is executed by one thread; the request queue of each model is used for storing the reasoning requests of that model that are waiting to be responded to, and the simulation queue is used for storing the requests involved in the simulated scheduling of the prediction flow in step S2;
S2, acquiring a task to be processed of the deep learning reasoning service system at the current time, and predicting throughput requirements of each model in the task to be processed in a new cycle based on system parameters of the deep learning reasoning service system, wherein the method specifically comprises the following steps:
FIG. 3 is a general flow chart of the throughput demand prediction. Here the model throughput refers to the number of reasoning requests successfully responded to by the model; each reasoning request has a deadline, and if and only if the client obtains the corresponding reasoning result before the deadline of the request, the request counts as successfully responded and is counted toward the throughput of the model;
the throughput requirements of each model in a new cycle are predicted by the following steps:
A. All models are evenly distributed to the prediction processes, where the number of models in the system is m and the number of prediction processes executed in parallel is n; m and n are natural numbers, the value of m is not less than 1, and the specific value of n is set according to the hardware configuration of the system and is not less than 1; the number of models allocated to each of the first n-1 prediction processes is ⌈m/n⌉, where ⌈·⌉ denotes rounding up, and the number of models allocated to the last prediction process is m % n, where % is the remainder operation;
B. For the current model, the prediction thread assumes that the model monopolizes the GPU computing resources, simulates the scheduling of the model's uncompleted requests, computes the number of reasoning requests that can be responded to successfully within one period duration under this simulated scheduling, and takes the result as the throughput demand of the model in the new cycle;
C. all prediction processes are finished after the estimation task is completed;
D. Obtaining the model state lock variable si_lock, updating the model state index si to si = si' so that the model state data of the new cycle are released to the scheduling processes, and releasing the model state lock variable after the update is finished;
E. ending the prediction flow of the round;
a refined flow chart of the prediction process for estimating throughput requirements in a new round of cycles for a single model is shown in fig. 4:
the refinement flow of the prediction thread for estimating the throughput demand of a single model specifically comprises the following steps:
(1) Copying all requests in a request queue of a current model to a simulation queue of the model;
(2) Clearing the to-be-released state data of the current model, wherein the current model is numbered i and models[(si+1)%2][i] is the to-be-released state data of the model; each model state datum comprises two member variables, sim_solo and goodput, wherein models[(si+1)%2][i].sim_solo is the sim_solo member variable of the model numbered i and records the throughput demand estimated for the current model, and models[(si+1)%2][i].goodput is the goodput member variable of the model numbered i and records the actual throughput of the current model; assigning 0 to the sim_solo and goodput member variables of the state datum clears the to-be-released state data of the current model;
(3) Deleting from the simulation queue of the current model the requests that cannot be completed before their own deadlines;
(4) Setting a start variable and an end variable, respectively indicating the start time and the end time of the new cycle, reading the current system time and assigning it to start, and then assigning the result of adding the cycle duration to start to end;
(5) Assuming that the system has N GPUs, setting a simulated scheduling time sched_time for each GPU, and initializing sched_time to start;
(6) Judging whether the simulation queue is empty, if so, executing step (13), otherwise executing step (7);
(7) Acquiring the minimum sched_time value among all the GPUs and assigning the result to a variable min_sched_time, wherein min_sched_time is a temporary variable recording the minimum sched_time value among all the GPUs;
(8) Judging whether min_sched_time is greater than or equal to end, if so, executing step (13), otherwise executing step (9);
(9) Searching the simulation queue for as many consecutive requests as possible forming a request block that satisfies:
min_sched_time + inferTime(batch_size) < deadline
wherein deadline is the deadline of the first request in the request block, batch_size is the number of requests in the block, and inferTime(batch_size) is the completion time of executing batch_size requests of the current model as one batch; the above formula judges whether the request block can be executed on the GPU as a batch when the current model monopolizes the GPU;
(10) Updating the throughput of the current model under simulated scheduling to sim_solo + batch_size, wherein sim_solo is a member variable of the model state data;
(11) Updating the simulated scheduling time sched_time of GPU i under simulated scheduling to min_sched_time + inferTime(batch_size);
(12) Deleting all requests of the consecutive request block found in step (9) from the simulation queue and jumping to step (6);
(13) Ending the prediction flow for the current model;
S3, starting a new cycle, and adjusting and distributing the throughput of each model in the cycle duration based on the throughput requirement of each model in the new cycle obtained in the step S2, wherein the method specifically comprises the following steps:
As shown in fig. 5, which is a schematic diagram of a throughput adjustment flow based on a feedback control policy, a flow of each scheduling thread to perform a single throughput adjustment specifically includes:
1) Obtaining a model state lock variable si_lock;
2) Copying all elements in the row indicated by the model state index si of the variable array models[2][m] into a local one-dimensional array ms, and then releasing the model state lock variable si_lock;
3) Traversing each element in the array ms and judging whether the actual throughput obtained by each model is lower than the standard value, wherein mi is the (i+1)-th element in the array ms;
4) Adding the models whose actual throughput is lower than the standard value to a list M whose initial value is empty; if no model's actual throughput is lower than the standard value, adding all models to the list M;
5) Searching the request queue of each model for as many consecutive requests as possible forming a request block that satisfies:
curTime + inferTime(batch_size) < deadline
wherein the arrival time of the first request in the request block is arrival_time, its deadline is deadline, the number of requests in the block is batch_size, the current system time is curTime, and inferTime(batch_size) is the time to execute batch_size requests of the current model; deadline - inferTime(batch_size) is the latest scheduling time at which all requests in the request block can still be successfully responded to, and the next time the current GPU executes the scheduling flow is curTime + inferTime(batch_size);
6) And exiting the scheduling process.
In Embodiment 1, a GoogleNet model and a ResNet model were placed on a single NVIDIA Tesla V100 GPU; each model was then loaded with 325 inference requests per second, and the effective throughput per second (goodput) of each model was measured under Clockwork and under the method of the present invention.
FIG. 6 compares the performance of Embodiment 1 of the present invention with that of the prior art. It can be observed from the experimental results that the invention greatly improves the goodput of the GoogleNet model while hardly affecting the relatively large ResNet model, thereby meeting the real-time requirements of more reasoning service requests.
S4, after the current period is finished, repeating the steps S2-S3 until the deep learning reasoning service system stops running, and completing GPU scheduling optimization aiming at the deep learning reasoning service system.

Claims (3)

The globally shared data defined by the system comprise a model state variable array models[2][m], a model state index si, and a model state lock variable si_lock corresponding to the model state index si, and a simulation queue and a request queue are set for each model; models[2][m] is used for storing state information, where the stored state information comprises the estimated throughput demand of each model and its actual throughput in the current period; to avoid read-write conflicts between the prediction processes and the scheduling processes on the model state information, the scheduling processes only read and write the model states stored in the columns of the models[2][m] array from the row designated by the value of si, while the prediction processes read and write the model states stored in the columns of the models[2][m] array from the row designated by the value of si', where si' = (si+1)%2 and % is the remainder operator; the number of prediction processes executed in parallel is n and the number of scheduling processes executed in parallel is k, where m, n, and k are natural numbers and the specific values of n and k are set according to the hardware configuration of the system; the request queue of each model is used for storing the reasoning requests of that model that are waiting to be responded to, and the simulation queue is used for storing the requests involved in the simulated scheduling of the prediction flow in step S2;
(2) Clearing the to-be-released state data of the current model, wherein the current model is numbered i and models[(si+1)%2][i] is the to-be-released state data of the model; each model state datum comprises two member variables, sim_solo and goodput, wherein models[(si+1)%2][i].sim_solo is the sim_solo member variable of the model numbered i and records the throughput demand estimated for the current model, and models[(si+1)%2][i].goodput is the goodput member variable of the model numbered i and records the actual throughput of the current model; assigning 0 to the sim_solo and goodput member variables of the state datum clears the to-be-released state data of the current model;
CN202211456890.8A | 2022-11-21 | 2022-11-21 | GPU scheduling optimization method for deep learning reasoning service system | Active | CN115756789B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211456890.8A (CN115756789B) | 2022-11-21 | 2022-11-21 | GPU scheduling optimization method for deep learning reasoning service system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211456890.8A (CN115756789B) | 2022-11-21 | 2022-11-21 | GPU scheduling optimization method for deep learning reasoning service system

Publications (2)

Publication Number | Publication Date
CN115756789A (en) | 2023-03-07
CN115756789B (en) | 2025-07-25

Family

ID=85333689

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211456890.8A (Active, CN115756789B) | GPU scheduling optimization method for deep learning reasoning service system | 2022-11-21 | 2022-11-21

Country Status (1)

Country | Link
CN | CN115756789B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
TWI866184B (en)* | 2023-04-28 | 2024-12-11 | 緯創資通股份有限公司 | Resource allocation system and method for cloud environment
CN117349032B (en)* | 2023-12-05 | 2024-02-20 | 城云科技(中国)有限公司 | Method and device for improving throughput of large language model
CN119378693B (en)* | 2024-12-27 | 2025-04-18 | 杭州海康威视数字技术股份有限公司 | Engine parameter adjustment method and device of reasoning engine

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113434303A (en)* | 2021-08-27 | 2021-09-24 | 湖北星地智链科技有限公司 | Batch-processed remote sensing image intelligent processing model prediction performance optimization system and method
CN115237586A (en)* | 2022-03-24 | 2022-10-25 | 华东师范大学 | GPU resource configuration method for deep learning inference performance interference perception

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
KR20210024993A (en)* | 2018-03-30 | 2021-03-08 | 엑스포지션 파크 홀딩스 에스이지씨 | Digital asset exchange
CN110795228B (en)* | 2018-08-03 | 2023-08-25 | 伊姆西IP控股有限责任公司 | Method and article of manufacture for training deep learning model, and computing system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113434303A (en)* | 2021-08-27 | 2021-09-24 | 湖北星地智链科技有限公司 | Batch-processed remote sensing image intelligent processing model prediction performance optimization system and method
CN115237586A (en)* | 2022-03-24 | 2022-10-25 | 华东师范大学 | GPU resource configuration method for deep learning inference performance interference perception

Also Published As

Publication number | Publication date
CN115756789A (en) | 2023-03-07

Similar Documents

Publication | Title
CN115756789B (en) | GPU scheduling optimization method for deep learning reasoning service system
Guo et al. | Cloud resource scheduling with deep reinforcement learning and imitation learning
CN110737529B (en) | Short-time multi-variable-size data job cluster scheduling adaptive configuration method
CN105956021B (en) | A kind of automation task suitable for distributed machines study parallel method and its system
US20220012089A1 | System for computational resource prediction and subsequent workload provisioning
CN112416585A (en) | GPU resource management and intelligent scheduling method for deep learning
US9934071B2 | Job scheduler for distributed systems using pervasive state estimation with modeling of capabilities of compute nodes
CN109857534A (en) | A kind of intelligent task scheduling strategy training method based on Policy-Gradient Reinforcement Learning
US11513866B1 | Method and system for managing resource utilization based on reinforcement learning
CN112732444A (en) | Distributed machine learning-oriented data partitioning method
CN119537032A (en) | Large model reasoning scheduling method based on off-grid computing server
Lin et al. | A scheduling algorithm based on reinforcement learning for heterogeneous environments
WO2021220616A1 | Information processing device and information processing method, computer program, and distributed training system
CN117194025A (en) | GPU spatio-temporal sharing method for deep learning services
Gong et al. | Chic: Experience-driven scheduling in machine learning clusters
US11551095B2 | Sharing preprocessing, computations, and hardware resources between multiple neural networks
Yao et al. | Workload-aware performance model based soft preemptive real-time scheduling for neural processing units
CN118245809B (en) | Batch size adjustment method in distributed data parallel online asynchronous training
CN119292771A (en) | Scheduling method, device, equipment and storage medium
CN119201443A (en) | A method and system for allocating and scheduling computing power of edge computing platform
WO2025001472A1 | Data reasoning method, model training method and device
CN118981360A (en) | Task scheduling method, device, storage medium, system and program product
CN118445036A (en) | Intelligent scheduling method for data sharing exchange task
CN117556933A (en) | Logistics robot cluster task scheduling method and device based on Double DQN and readable medium
Le Hai et al. | Irls: An improved reinforcement learning scheduler for high performance computing systems

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
