CN113535365B - Deep learning training job resource placement system and method based on reinforcement learning - Google Patents

Deep learning training job resource placement system and method based on reinforcement learning

Info

Publication number
CN113535365B
CN113535365B
Authority
CN
China
Prior art keywords
job
drl
neural network
network model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110874519.2A
Other languages
Chinese (zh)
Other versions
CN113535365A (en)
Inventor
周悦媛
杨康
章家维
邵恩
谭光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Research Institute Of China Science And Technology Computing Technology
Original Assignee
Western Research Institute Of China Science And Technology Computing Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Western Research Institute Of China Science And Technology Computing Technology
Priority to CN202110874519.2A
Publication of CN113535365A
Application granted
Publication of CN113535365B
Active (current legal status)
Anticipated expiration


Abstract


The present invention relates to the technical field of computing resource scheduling, and specifically discloses a system and method for placing deep learning training job resources based on reinforcement learning. The method comprises the following steps: randomly initializing the parameters of a DRL neural network model; generating a state vector for a batch of jobs; feeding the state vector into the DRL neural network model to infer placement position information for the batch of jobs, placing the jobs according to that placement position information, and recording the resulting maximum completion time of the batch as T_RL; randomly generating several pieces of placement position information, placing the jobs according to each of them, obtaining several maximum completion times for the batch, and recording the smallest of these as T_Random; calculating a reward based on the maximum completion times T_RL and T_Random; and updating the parameters of the DRL neural network model by backward gradient. The technical solution of the present invention can adaptively place DLT jobs in resource-error scenarios.

Description

Deep learning training job resource placement system and method based on reinforcement learning
Technical Field
The invention relates to the technical field of computing resource scheduling, in particular to a deep learning training job resource placement system and method based on reinforcement learning.
Background
Deep learning training (Deep Learning Training, DLT) jobs are typically computationally intensive tasks that require powerful and expensive computing resources such as GPU devices. To process training data of ever-increasing size, most mainstream IT companies and enterprises currently run DLT jobs on clusters of GPU servers and perform distributed deep learning (Distributed Deep Learning, DDL) training to use multiple GPUs in parallel, thereby reducing the load on any single GPU and speeding up model training.
Multi-machine, multi-card training is a main characteristic of large-scale distributed DLT jobs, and the probability of job errors rises with the complexity of the system. Moreover, DLT jobs generally train for a long time, and long running times further increase the probability of job errors. In addition, frequent submissions in multi-tenant, multi-job scenarios also tend to raise the job error probability. DLT job errors are one of the important causes of reduced system resource utilization: the time cost caused by job errors is not negligible, and the more errors occur, the greater the job restart cost and resource recovery cost, and the lower the resource utilization.
To place DLT jobs more reasonably when cluster resources can fail, methods based on cluster capacity sensing and methods based on load interference sensing have been proposed in the prior art. The cluster-capacity-sensing method does not consider the error characteristics of different GPUs in the cluster: for example, when the GPU devices with lower error probability stay in a relatively high-load state for a long time, the scheduling policy is very likely to repeatedly place large multi-card DLT jobs on GPU devices with higher error probability, causing the jobs to restart multiple times and reducing resource utilization. The load-interference-sensing method largely avoids the training performance degradation and resource utilization loss caused by interference among DLT jobs, but it still ignores the error characteristics of each GPU device in the cluster: for example, if the GPUs with higher error probability are scattered across the cluster, a distributed multi-card DLT job with a high interference level is likely to be placed, when split up, onto those error-prone GPUs, so the job restarts frequently and training performance and resource utilization degrade even more severely.
Reinforcement learning (Reinforcement Learning, RL), like traditional deep learning, is a self-learning method, but deep learning is a static learning algorithm that learns features from existing data to make predictions on unknown data. RL, in contrast, is a dynamic learning algorithm: it builds a decision model and learns an optimal policy by continuously exploring an unknown environment. To some extent, RL is therefore more consistent with human thinking and learning processes, and RL combined with deep learning techniques, namely deep reinforcement learning (Deep Reinforcement Learning, DRL), is recognized as a paradigm closest to true artificial intelligence.
Therefore, how to apply a DRL algorithm to the decision problem of resource scheduling, i.e., deciding the placement positions of jobs, and to place DLT jobs reasonably when cluster resources can fail, so as to maximize resource utilization and improve the quality of service for users, becomes the problem to be solved.
Disclosure of Invention
The invention aims to provide a deep learning training job resource placement method based on reinforcement learning that can adaptively place DLT jobs in resource-error scenarios.
In order to solve the technical problems, the application provides the following technical scheme:
The deep learning training job resource placement method based on reinforcement learning comprises the following steps:
an initialization step: randomly initializing the parameters of the DRL neural network model;
a state vector generation step: generating the state vectors of a batch of jobs;
a reasoning step: sending the state vector into the DRL neural network model to infer placement position information for the batch of jobs, placing the jobs according to that placement position information, and recording the resulting maximum completion time of the batch as T_RL;
a random generation step: randomly generating several pieces of placement position information, placing the jobs according to the randomly generated placement position information, obtaining several maximum completion times for the batch, and recording the smallest of these as T_Random;
a reward calculation step: calculating a reward based on the maximum completion times T_RL and T_Random;
a parameter updating step: updating the parameters of the DRL neural network model by backward gradient.
The principle and beneficial effects of this basic scheme are as follows:
In this scheme, the DRL neural network model is trained to infer job placement positions. Compared with traditional heuristic algorithms, the DRL neural network model can automatically analyze and extract more effective and more accurate features of cluster faults and DLT jobs without manually selecting certain parameters as features, which reduces the influence of errors introduced by manual feature selection.
The reward of the DRL neural network model is calculated with the minimum completion time T_Random over multiple random schedules of the batch as the reference; this use of randomness yields a larger reward range and improves the learning ability of the DRL neural network model.
The training process of this scheme can use a simulator for pre-training or complete training to save time and money, can use historical data from a real cluster system to obtain a scheduling policy better suited to that system, or can be trained online directly on a prototype system to obtain an even more accurate scheduling policy.
In summary, for the placement decision problem of DLT jobs under cluster errors, the method trains a DRL neural network model to adaptively place DLT jobs in resource-error scenarios, reduces the maximum completion time of batches of large-scale distributed DLT jobs, and improves resource utilization.
Further, the method also comprises an experience replay step: sampling the four-tuple samples generated during training of the DRL neural network model for experience replay.
Through experience replay, on one hand, the correlation among samples can be eliminated to meet the basic requirements of neural network training; on the other hand, dynamic experience replay can maximize the replay range and thereby ensure the effectiveness of experience replay.
Further, in the state vector generation step, a state vector is generated based on the DLT job information and the cluster information and is written as s = (N, T, S), where N is the number of computing units required by the current job, T is the estimated running time of the current job in the error-free case, and S is the usage state of each computing unit in the current cluster.
In this preferred scheme, DLT job information and cluster information are acquired, processed into a state vector, and input as features into the DRL neural network model for training. The maximum completion time of the batch obtained by the best random schedule is combined with that obtained by the schedule inferred by the current DRL neural network model and used as the evaluation criterion to guide the neural network to adaptively make DLT job placement decisions, thereby reducing the maximum completion time of batches of large-scale distributed DLT jobs and improving resource utilization.
Further, the reasoning step specifically includes (a sketch of this inference loop is given after the list):
A1, inputting the state vector into the value network of the DRL neural network model to obtain a long-term value metric V;
A2, inputting the state vector into the policy network of the DRL neural network model to obtain selection probabilities Pi for the N computing units, i = 1, 2, ..., N;
A3, setting the probabilities Pj corresponding to occupied computing units and faulty computing units to zero, obtaining P'i;
A4, selecting the k-th computing unit as one of the computing units on which the job is to be placed, where Pk = max(P'i);
A5, if the number of computing units selected for the job equals the number of computing units required by the job, the inference of this job's placement position information is complete and the position information of the next job is inferred; otherwise, jump to step A1.
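As an illustration of steps A1-A5, the following minimal Python sketch shows one way the masking and selection loop could look. The names policy_net, value_net, and unavailable are assumptions for illustration and are not taken from the patent; the two networks are assumed to be callables returning the probabilities Pi and the metric V.

```python
import numpy as np

def infer_placement(policy_net, value_net, state, unavailable, num_required):
    """Sketch of steps A1-A5: query the networks, mask occupied/faulty units,
    and greedily pick the highest-probability computing unit until the job's
    requirement is met."""
    chosen = []
    v = None
    for _ in range(num_required):                     # A5: repeat until enough units are chosen
        v = value_net(state)                          # A1: long-term value metric V
        probs = np.asarray(policy_net(state), float)  # A2: selection probabilities Pi
        probs[list(unavailable) + chosen] = 0.0       # A3: zero occupied / faulty units -> P'i
        k = int(np.argmax(probs))                     # A4: Pk = max(P'i)
        chosen.append(k)
    return chosen, v
```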
Further, the experience replay step specifically includes:
B1, creating a replay buffer pool stack;
B2, pushing the four-tuple samples generated during training onto the replay buffer pool stack;
B3, if the replay buffer pool stack is full, overflowing the earliest pushed four-tuple sample;
B4, selecting X four-tuple samples as a batch for the next training round, where X is the total number of four-tuple samples currently in the replay buffer pool stack.
In the DRL neural network model, a series of inference actions produces many four-tuple samples. These samples are strongly correlated, which does not meet the deep neural network's requirement that training samples be independently and identically distributed; moreover, the sample sequence generated in one interval cannot represent the global experience, and the forgetting characteristic of neural networks makes training prone to falling into local optima. The preferred scheme solves this by experience replay. Because resource error times are highly uncertain, the number of samples generated in different scheduling intervals can differ greatly, so dynamic batches are used for sampling: each time the DRL neural network model is trained, the random sampling batch size equals the number of samples inferred in the current scheduling period, and this batch is input into the DRL neural network model for training. This fully exploits the effect of experience replay and greatly reduces the correlation among samples.
Further, in the reward calculation step, the reward is computed from the maximum completion time T_RL relative to the reference value T_Random.
The smaller T_RL the better; however, T_RL is only meaningful relative to the actual running time of the jobs, and when the jobs themselves run long, T_RL cannot be arbitrarily small. In this scheme, T_Random is therefore used as the relative value against which T_RL is compared.
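The reward formula referred to in this step did not survive extraction of the original document. One plausible form that matches the description above (the reward grows as T_RL shrinks, with T_Random as the relative baseline) is the normalized difference below; this is an assumption for illustration, not the patent's exact formula:

```latex
r = \frac{T_{\mathrm{Random}} - T_{\mathrm{RL}}}{T_{\mathrm{Random}}}
```

Under this form the reward is positive when the DRL placement finishes the batch faster than the best of the random placements and negative otherwise.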
Further, a training judgment step is included: judging whether the training of the DRL neural network model is complete; if not, returning to the state vector generation step; otherwise, ending the training.
Further, the method also comprises a using step: using the trained DRL neural network model to infer the placement position of each job in the batch.
Further, the using step specifically includes:
C1, acquiring the job information and cluster information of the batch of jobs;
C2, generating a state vector based on the information collected in step C1;
C3, inputting the state vector from step C2 into the policy network of the DRL neural network model to obtain the placement position information output by the policy network;
C4, if the number of computing units inferred so far is smaller than the number of computing units required by the current job, repeating step C3; otherwise, jumping to step C5;
C5, placing the corresponding job according to the placement position information inferred in step C3.
The second object of the invention is to provide a deep learning training job resource placement system based on reinforcement learning, which comprises a DRL neural network model and a job scheduling module. The job scheduling module trains the DRL neural network model using the steps of the above method, acquires the placement position information from the trained DRL neural network model, and places the corresponding jobs according to that placement position information.
In this scheme, a DRL neural network model is used to schedule the computing units. Cluster information and the currently submitted job information are periodically acquired, processed, and input as features into the DRL neural network model for training. The maximum completion time of the batch obtained by the best random schedule is combined with that obtained by the schedule inferred by the current DRL neural network model and used as the evaluation criterion to guide the DRL neural network model to adaptively make DLT job placement decisions, thereby reducing the maximum completion time of batches of large-scale distributed DLT jobs, that is, improving resource utilization.
Drawings
FIG. 1 is a schematic diagram of a cluster operation lifecycle;
FIG. 2 is a flow chart of DRL neural network model training;
FIG. 3 is a schematic diagram of a structural design of a DRL neural network model;
FIG. 4 is a schematic diagram of experience replay;
FIG. 5 is a schematic diagram of the DRL neural network model reasoning process.
Detailed Description
The following is a further detailed description of the embodiments:
Examples
As shown in fig. 1, the method of this embodiment is applied to the job scheduling process of a cluster and aims to decide on which nodes and on which computing resources in the cluster each job should be placed. In this embodiment, taking the GPU as a common computing unit, a deep learning training job resource placement method based on reinforcement learning is introduced, comprising the following steps:
Training the neural network model parameters by reinforcement learning, as shown in fig. 2, specifically includes:
An initialization step: randomly initializing the parameters of the DRL neural network model.
A state vector generation step: generating the state vector of the workloads (batch of jobs) based on the DLT job information and the cluster information. The state vector is noted as s = (N, T, S); its components are as follows (a short construction sketch is given after the component list):
N: the number of GPUs required by the current job.
T: the estimated running time of the current job in the error-free case.
S: the usage state of each GPU in the current cluster. For example, for a cluster with 4 GPU devices in total where the first two GPUs are available and the last two are unavailable because of errors or because they are occupied, S = [0, 0, 1, 1].
In this embodiment, DLT job information and cluster information are periodically acquired.
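To make the state vector concrete, the following short Python sketch builds s = (N, T, S) for the 4-GPU example above. The flat list layout and the time unit are assumptions for illustration; the text only specifies the three components N, T, and S.

```python
def build_state_vector(num_gpus_required, est_runtime, gpu_status):
    """State vector s = (N, T, S): job GPU demand, estimated error-free runtime,
    and per-GPU usage flags (0 = available, 1 = occupied or failed)."""
    return [float(num_gpus_required), float(est_runtime)] + [float(x) for x in gpu_status]

# A 2-GPU job with an estimated error-free runtime of 120 (time units),
# in the 4-GPU cluster above where the last two GPUs are unavailable:
s = build_state_vector(2, 120.0, [0, 0, 1, 1])   # -> [2.0, 120.0, 0.0, 0.0, 1.0, 1.0]
```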
A reasoning step: the state vector s is sent into the DRL neural network model, the placement position information of the workloads is obtained by inference, and the jobs are placed according to that placement position information to obtain the maximum completion time T_RL of the workloads.
The DRL neural network model of the reasoning step is shown in fig. 3, and the step specifically includes (see the network sketch after this list):
A1, the state vector s is input into the value network (Value Network) of the DRL neural network model and passes through five fully connected layers, with 256, 196, 128, and 1 neurons in the listed layers, to obtain the long-term value metric V.
A2, the state vector s is input into the policy network (Policy Network) of the DRL neural network model and passes through five fully connected layers, with 256, 196, 128, and N neurons in the listed layers, followed by a softmax layer, to obtain the selection probabilities Pi of the N GPUs, i = 1, 2, ..., N.
A3, the probabilities Pj corresponding to occupied GPUs and failed GPUs are set to zero, obtaining P'i.
A4, the k-th GPU is selected as one of the GPUs on which the job is to be placed, where Pk = max(P'i).
A5, if the number of GPUs selected for the job equals the number of GPUs required by the job, the placement position inference for this job is complete and the position information of the next job is inferred; otherwise, jump to step A1.
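The following PyTorch sketch illustrates one possible realization of the value and policy networks described in A1 and A2. The ReLU activations and the exact layer arrangement are assumptions (the text lists the neuron counts 256, 196, 128, and 1 or N); this is a sketch, not the patent's implementation.

```python
import torch
import torch.nn as nn

class DRLNet(nn.Module):
    """Value head producing the long-term metric V and policy head producing
    selection probabilities over the N GPUs (softmax on the last layer)."""
    def __init__(self, state_dim, num_gpus):
        super().__init__()
        def mlp(out_dim):
            return nn.Sequential(
                nn.Linear(state_dim, 256), nn.ReLU(),
                nn.Linear(256, 196), nn.ReLU(),
                nn.Linear(196, 128), nn.ReLU(),
                nn.Linear(128, out_dim),
            )
        self.value_net = mlp(1)          # A1: outputs the scalar metric V
        self.policy_net = mlp(num_gpus)  # A2: outputs logits for the N GPUs

    def forward(self, s):
        v = self.value_net(s)
        probs = torch.softmax(self.policy_net(s), dim=-1)   # selection probabilities Pi
        return probs, v
```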
A random generation step: a series of placement position information is generated randomly, jobs are placed according to each of them to obtain a series of maximum completion times of the workloads, and finally the smallest maximum completion time is recorded as T_Random.
A reward calculation step: the reward (reward) is calculated based on the maximum completion time T_RL and the maximum completion time T_Random, with T_Random serving as the relative reference against which T_RL is compared.
An experience replay step: the four-tuple samples (s, a, r, s') generated during DRL training are sampled to apply experience replay. In the four-tuple sample (s, a, r, s'), s is the environment state, a is the action selected by the actor based on the current policy, s' is the next environment state reached after executing action a in state s, and r is the reward fed back by the environment, i.e., the return.
The experience replay step is shown in fig. 4 and specifically includes (see the buffer sketch after this list):
B1, creating a replay buffer (Replay Buffer) pool stack.
B2, pushing the four-tuple samples (s, a, r, s') generated during training onto the replay buffer pool stack.
B3, if the buffer pool stack is full, the earliest pushed data overflows.
B4, selecting X four-tuple samples as the batch for the next training round, where X is the total number of four-tuple samples currently in the replay buffer pool stack.
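A minimal Python sketch of the replay buffer pool stack in steps B1-B4 follows. The capacity parameter and class name are assumptions; the key behaviors are that the oldest sample overflows when the stack is full and that each training batch uses every sample currently held (the dynamic batch size X).

```python
from collections import deque

class ReplayBufferStack:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # B1/B3: bounded; oldest entry is dropped when full

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))  # B2: push a four-tuple sample

    def dynamic_batch(self):
        return list(self.buffer)               # B4: X = all samples currently in the stack
```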
A parameter updating step: the parameters of the DRL neural network model are updated by backward gradient. In this embodiment, the backward gradient update is driven by the reward; this is prior art and is not described in detail here (an illustrative sketch of one such update follows).
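The patent leaves the backward-gradient update itself to the prior art. For illustration only, the sketch below shows a generic actor-critic style update in which the reward drives both the policy and value losses; the discount factor, the one-step bootstrapped target, and the loss form are assumptions, not the patent's formula.

```python
import torch

def update_parameters(model, optimizer, batch, gamma=0.99):
    """Generic actor-critic update over a dynamic batch of (s, a, r, s') samples."""
    policy_losses, value_losses = [], []
    for s, a, r, s_next in batch:
        probs, v = model(s)
        with torch.no_grad():
            _, v_next = model(s_next)
            target = r + gamma * v_next              # bootstrapped target (assumption)
        advantage = target - v
        policy_losses.append(-torch.log(probs[a]) * advantage.detach())
        value_losses.append(advantage.pow(2))
    loss = torch.stack(policy_losses).sum() + torch.stack(value_losses).sum()
    optimizer.zero_grad()
    loss.backward()                                  # backward gradient through the DRL model
    optimizer.step()
```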
A training judgment step: if training is not complete, return to step S12 (the state vector generation step) and continue; otherwise, end the training. In this embodiment, a planned number of training iterations is preset, and training is considered complete when the actual number of training iterations equals the planned number.
A using step: the trained DRL neural network model is used to infer the placement position of each job in the workloads, which specifically includes (see the deployment sketch after this list):
C1, acquiring the job information of the workloads and the GPU information of the cluster.
C2, generating the state vector s from the information collected in step C1.
C3, inputting the state vector from step C2 into the policy network to obtain the placement policy output by the policy network, i.e., the placement position information.
C4, if the number of GPUs inferred so far is smaller than the number of GPUs required by the current job, repeating step C3.
C5, placing the corresponding job according to the placement position information inferred in step C3.
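As an illustration of steps C1-C5, the sketch below places a whole batch of jobs with the trained policy network, reusing the build_state_vector helper from the earlier sketch. The job tuple layout and the in-place update of the cluster status are assumptions for illustration.

```python
def place_workload(policy_net, jobs, cluster_status):
    """C1-C5: for each job, query the trained policy network until the required
    number of GPUs has been inferred, then record the placement.
    jobs: list of (num_gpus_required, est_runtime); cluster_status: 0 free, 1 busy/failed."""
    placements = []
    for num_required, est_runtime in jobs:
        state = build_state_vector(num_required, est_runtime, cluster_status)   # C1-C2
        chosen = []
        while len(chosen) < num_required:                                       # C4
            probs = list(policy_net(state))                                     # C3
            for i in range(len(probs)):
                if i in chosen or cluster_status[i]:
                    probs[i] = 0.0                                              # mask unusable GPUs
            chosen.append(max(range(len(probs)), key=probs.__getitem__))
        for k in chosen:
            cluster_status[k] = 1        # the chosen GPUs are now occupied
        placements.append(chosen)        # C5: place the job on the chosen GPUs
    return placements
```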
In this embodiment, the GPU is the minimum scheduling unit, that is, one GPU cannot be allocated to multiple jobs. In other embodiments, the computing unit may also be a TPU (tensor processing unit), an MLU (machine learning processor), or the like.
Based on the above deep learning training job resource placement method based on reinforcement learning, this embodiment also provides a deep learning training job resource placement system based on reinforcement learning, comprising a DRL neural network model and a job scheduling module. The job scheduling module trains the DRL neural network model using the steps of the above method, obtains the placement position information from the trained DRL neural network model, and places the corresponding jobs according to that placement position information.
The foregoing is merely an embodiment of the present application, and the present application is not limited to the field of this embodiment. Specific structures and features well known in the schemes are not described in detail here; those skilled in the art know the prior art in the field before the application date or priority date, can apply the conventional experimental means available at that date, and, in light of the present application, can combine their own abilities to complete and implement this scheme, so some typical known structures or known methods should not become an obstacle to practicing the present application. It should be noted that those skilled in the art can also make modifications and improvements without departing from the structure of the present application; these should also be regarded as falling within the protection scope of the present application and do not affect the effect of implementing the present application or the utility of the patent. The protection scope of the present application is defined by the claims, and the description of the specific embodiments in the specification can be used to interpret the content of the claims.

Claims (10)

CN202110874519.2A, priority date 2021-07-30, filing date 2021-07-30: Deep learning training job resource placement system and method based on reinforcement learning. Active. Granted as CN113535365B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110874519.2A (CN113535365B (en)) | 2021-07-30 | 2021-07-30 | Deep learning training job resource placement system and method based on reinforcement learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110874519.2A (CN113535365B (en)) | 2021-07-30 | 2021-07-30 | Deep learning training job resource placement system and method based on reinforcement learning

Publications (2)

Publication Number | Publication Date
CN113535365A (en) | 2021-10-22
CN113535365B (en) | 2024-12-03

Family

ID=78121687

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110874519.2A (Active, CN113535365B (en)) | Deep learning training job resource placement system and method based on reinforcement learning | 2021-07-30 | 2021-07-30

Country Status (1)

Country | Link
CN (1) | CN113535365B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116155750B (en)* | 2023-04-19 | 2023-08-01 | 之江实验室 | Deep learning job resource placement method, system, device and storage medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109388484B (en)* | 2018-08-16 | 2020-07-28 | 广东石油化工学院 | Multi-resource cloud job scheduling method based on Deep Q-network algorithm
CN109857534A (en)* | 2019-02-12 | 2019-06-07 | 浙江方正印务有限公司 | A kind of intelligent task scheduling strategy training method based on Policy-Gradient Reinforcement Learning
CN110515735A (en)* | 2019-08-29 | 2019-11-29 | 哈尔滨理工大学 | A Multi-objective Cloud Resource Scheduling Method Based on Improved Q-Learning Algorithm
CN110533183B (en)* | 2019-08-30 | 2021-08-20 | 东南大学 | A Task Placement Method for Heterogeneous Network Awareness in Pipelined Distributed Deep Learning
CN110580196B (en)* | 2019-09-12 | 2021-04-06 | 北京邮电大学 | Multi-task reinforcement learning method for realizing parallel task scheduling
US12293283B2 (en)* | 2019-09-25 | 2025-05-06 | Deepmind Technologies Limited | Reinforcement learning using meta-learned intrinsic rewards
CN110990135B (en)* | 2019-11-28 | 2023-05-12 | 中国人民解放军国防科技大学 | Spark job time prediction method and device based on deep transfer learning
CN111966484A (en)* | 2020-06-23 | 2020-11-20 | 北京大学 | Cluster resource management and task scheduling method and system based on deep reinforcement learning
CN112035251B (en)* | 2020-07-14 | 2023-09-26 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning job layout
CN112016811A (en)* | 2020-08-04 | 2020-12-01 | 四叶草(苏州)智能科技有限公司 | AGV intelligent scheduling system and method based on reinforcement learning
CN112511336B (en)* | 2020-11-05 | 2022-11-18 | 上海大学 | Online service placement method in edge computing system
CN112598309B (en)* | 2020-12-29 | 2022-04-19 | 浙江工业大学 | Job shop scheduling method based on Keras
CN112685165B (en)* | 2021-01-08 | 2022-08-23 | 北京理工大学 | Multi-target cloud workflow scheduling method based on joint reinforcement learning strategy
CN112732444A (en)* | 2021-01-12 | 2021-04-30 | 北京工业大学 | Distributed machine learning-oriented data partitioning method
CN112861442B (en)* | 2021-03-10 | 2021-12-03 | 中国人民解放军国防科技大学 | Multi-machine collaborative air combat planning method and system based on deep reinforcement learning
CN113163451B (en)* | 2021-04-23 | 2022-08-02 | 中山大学 | A D2D Communication Network Slice Allocation Method Based on Deep Reinforcement Learning

Also Published As

Publication numberPublication date
CN113535365A (en)2021-10-22

Similar Documents

Publication | Title
Guo et al. | Cloud resource scheduling with deep reinforcement learning and imitation learning
Han et al. | Tailored learning-based scheduling for kubernetes-oriented edge-cloud system
US11989647B2 (en) | Self-learning scheduler for application orchestration on shared compute cluster
CN110737529B (en) | Short-time multi-variable-size data job cluster scheduling adaptive configuration method
CN110427261A (en) | A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree
CN115066694A (en) | Computation graph optimization
CN112052081B (en) | Task scheduling method and device and electronic equipment
US12056525B2 (en) | Hybrid scheduling method for deep learning workloads, and computing apparatus with hybrid scheduling
CN113535365B (en) | Deep learning training job resource placement system and method based on reinforcement learning
Elsayed et al. | Deep reinforcement learning based actor-critic framework for decision-making actions in production scheduling
Kumaran et al. | Deep reinforcement learning algorithms for low latency edge computing systems
CN115827225A (en) | Distribution method of heterogeneous operation, model training method, device, chip, equipment and medium
CN116128028A (en) | An Efficient Deep Reinforcement Learning Algorithm for Combinatorial Optimization of Continuous Decision Spaces
CN120256087A (en) | A method and system for optimizing computing resources for analytical tasks
CN118170524B (en) | Task scheduling method, device, equipment, medium and product based on reinforcement learning
JP2020181318A (en) | Optimizer, optimization method, and program
Ji et al. | Resource Aware Multi-User Task Offloading In Mobile Edge Computing
CN118963954A (en) | Method, device, equipment and medium for determining computing power scheduling plan
CN118193209A (en) | A predictive cloud platform resource scheduling method based on phased policy gradient
KR20250105613A (en) | Hardware-aware generation of machine learning models
CN115907000B (en) | Small sample learning method for power system optimal power flow prediction
Rehaiem et al. | A neural networks based approach for the real-time scheduling of reconfigurable embedded systems with minimization of power consumption
CN120087433A (en) | Parallel strategy optimization method and neural network solver training method and device
Funika et al. | Automatic management of cloud applications with use of proximal policy optimization
US20230085116A1 (en) | Method for scheduling a set of computing tasks in a supercomputer

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
