CN116415682A - Operator processing method, processor, device and medium of machine learning processor - Google Patents


Publication number: CN116415682A
Application number: CN202111672390.3A
Authority: CN (China)
Prior art keywords: operator, distribution, code template, replaced, registration
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: name withheld at the inventor's request
Current Assignee: Shanghai Cambricon Information Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Shanghai Cambricon Information Technology Co Ltd
Application filed by: Shanghai Cambricon Information Technology Co Ltd
Priority to: CN202111672390.3A


Abstract

The present application discloses an operator processing method, processor, device and medium of a machine learning processor. The method includes: obtaining a first operator configuration information dictionary; obtaining a registration code template and/or a distribution code template of an operator based on different operator library names, where the registration code template and/or the distribution code template include parameters to be replaced, and the parameters to be replaced identify the correspondence between the operator and a kernel library; filtering and merging the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator, to obtain a second operator configuration information dictionary; and updating the parameters to be replaced in the registration code template and/or the distribution code template according to the second operator configuration information dictionary, so as to register and/or distribute the operator to one or more kernel libraries. The method can improve the efficiency of operator registration and/or distribution and allows an operator to be distributed to multiple kernel libraries within the same backend.

Description

Operator processing method, processor, device and medium of machine learning processor

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to an operator processing method, processor, device and medium of a machine learning processor.

Background

Before a machine learning processor (MLU, Machine Learning Unit) can call an operator to perform a computation, the operator must be registered and distributed in advance to a backend, such as the CPU (Central Processing Unit) or CUDA (Compute Unified Device Architecture), and to the corresponding underlying kernel. Currently, an operator of a given backend can only be automatically registered and distributed to the single underlying kernel that corresponds to it; an operator of one backend cannot be automatically registered and/or distributed to different underlying kernels. As a result, registering and/or distributing operators across different underlying kernels is inefficient.

Summary of the invention

In view of this, embodiments of the present application provide an operator processing method, processor, device and medium of a machine learning processor, to address the low efficiency of registering and/or distributing operators across different underlying kernels.

In a first aspect, an embodiment of the present application provides an operator processing method of a machine learning processor, the method including:

obtaining a first operator configuration information dictionary, where the first operator configuration information dictionary includes an operator library name, an operator name, a kernel library name and an operator parameter list;

obtaining a registration code template and/or a distribution code template of an operator based on the different operator library names, where the registration code template and/or the distribution code template include parameters to be replaced, and the parameters to be replaced identify the correspondence between the operator and the kernel library;

filtering and merging the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator, to obtain a second operator configuration information dictionary;

updating the parameters to be replaced in the registration code template and/or the distribution code template according to the second operator configuration information dictionary, so as to register and/or distribute the operator to one or more of the kernel libraries.

According to the above aspect and any possible implementation, an implementation is further provided in which obtaining the first operator configuration information dictionary includes:

obtaining an operator configuration file;

parsing the operator configuration file to obtain the first operator configuration information dictionary.

According to the above aspect and any possible implementation, an implementation is further provided in which parsing the operator configuration file to obtain the first operator configuration information dictionary includes:

matching the operator library name, the operator name, the kernel library name and the operator parameter list in the operator configuration file, to obtain the first operator configuration information dictionary.
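The matching step above can be sketched in Python as follows. This is only an illustration: the source does not specify the layout of the operator configuration file, so the one-line-per-operator format, the regular expression, and the key names `op_lib`, `op_name`, `kernel_lib` and `params` are all assumptions made for the example.

```python
import re

# Hypothetical config format (not specified by the source), one operator per line:
#   <operator_library>::<operator_name>(<parameter list>) -> <kernel_library>
CONFIG_TEXT = """
aten::add(Tensor self, Tensor other) -> cnnl
aten::sub(Tensor self, Tensor other) -> bang
torch_mlu::add(Tensor self, Tensor other) -> cnml
"""

# One named group per field that the first dictionary is said to contain.
LINE_PATTERN = re.compile(
    r"^(?P<op_lib>\w+)::(?P<op_name>\w+)"
    r"\((?P<params>[^)]*)\)\s*->\s*(?P<kernel_lib>\w+)$"
)


def parse_operator_config(text):
    """Match each non-empty line and return one key-value dictionary per operator."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"unrecognized config line: {line!r}")
        params = [p.strip() for p in match.group("params").split(",") if p.strip()]
        entries.append({
            "op_lib": match.group("op_lib"),
            "op_name": match.group("op_name"),
            "kernel_lib": match.group("kernel_lib"),
            "params": params,
        })
    return entries


first_dicts = parse_operator_config(CONFIG_TEXT)
```

Raising on an unmatched line, rather than skipping it, is a design choice of this sketch so that a malformed entry cannot silently drop an operator from registration.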

According to the above aspect and any possible implementation, an implementation is further provided in which, before obtaining the registration code template and/or the distribution code template of the operator, the method further includes:

obtaining a registration definition function and/or a distribution definition function input by a user, where the parameters to be replaced include registration parameters to be replaced and/or distribution parameters to be replaced, the registration definition function includes the registration parameters to be replaced, and the distribution definition function includes the distribution parameters to be replaced;

establishing the registration code template and/or the distribution code template of the operator according to the registration definition function and/or the distribution definition function.

According to the above aspect and any possible implementation, an implementation is further provided in which filtering and merging the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator, to obtain the second operator configuration information dictionary, includes:

parsing the registration code template and/or the distribution code template to obtain the registration parameters to be replaced and/or the distribution parameters to be replaced;

obtaining replacement parameters from the registration parameters to be replaced and/or the distribution parameters to be replaced, based on key-value pair correspondence;

generating a group of key-value pairs to be merged according to the replacement parameters;

merging the group of key-value pairs to be merged into the first operator configuration information dictionary, to obtain the second operator configuration information dictionary.

According to the above aspect and any possible implementation, an implementation is further provided in which parsing the registration code template and/or the distribution code template to obtain the registration parameters to be replaced and the distribution parameters to be replaced includes:

matching and identifying replacement identifiers in the registration code template and/or the distribution code template;

taking the parameter carried after each replacement identifier as a registration parameter to be replaced and/or a distribution parameter to be replaced, thereby obtaining the registration parameters to be replaced and/or the distribution parameters to be replaced.
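A minimal sketch of the identifier-matching step, assuming that the replacement identifier is `$` and that the parameter it carries is written as `${name}`. The source does not say which identifier is used, and both template strings below (including the `REGISTER_OP` macro name) are hypothetical.

```python
import re

# Hypothetical templates; only the placeholder names return_type, func_name and
# type_formals_c appear in the source, everything else is illustrative.
REGISTER_TEMPLATE = 'REGISTER_OP("${func_name}", ${kernel_lib}_${func_name});'
DISPATCH_TEMPLATE = "${return_type} ${func_name}(${type_formals_c});"

# Assumed replacement identifier: "$", followed by the parameter name in braces.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def params_to_replace(template):
    """Return the distinct parameter names carried after each identifier, in order."""
    names = []
    for name in PLACEHOLDER.findall(template):
        if name not in names:
            names.append(name)
    return names


register_params = params_to_replace(REGISTER_TEMPLATE)
dispatch_params = params_to_replace(DISPATCH_TEMPLATE)
```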

According to the above aspect and any possible implementation, an implementation is further provided in which obtaining the replacement parameters by filtering and merging the first operator configuration information dictionary, according to the registration parameters to be replaced and/or the distribution parameters to be replaced, includes:

performing key-value pair filtering according to the operator library name, the operator name, the kernel library name and the operator parameter list, to obtain the replacement parameters.
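The filter-and-merge step could look roughly like the sketch below. The mapping from placeholder names to dictionary fields (for example, `func_name` taken from the operator name) is an assumption for illustration; the source states only that replacement parameters are obtained by key-value filtering and then merged into the first dictionary.

```python
# Hypothetical first dictionary for one operator; the key names are illustrative.
first_dict = {
    "op_lib": "aten",
    "op_name": "add",
    "kernel_lib": "cnnl",
    "params": ["Tensor self", "Tensor other"],
}

# Names that a template parse is assumed to have produced.
needed_names = ["func_name", "kernel_lib", "type_formals_c"]


def build_second_dict(first, needed):
    """Filter the values the templates need, then merge them into a copy of `first`."""
    # Assumed correspondence between placeholder names and first-dictionary fields.
    candidates = {
        "func_name": first["op_name"],
        "kernel_lib": first["kernel_lib"],
        "type_formals_c": ", ".join(first["params"]),
    }
    missing = [name for name in needed if name not in candidates]
    if missing:
        raise KeyError(f"no replacement value for: {missing}")
    # Key-value pairs to be merged: only what the templates actually reference.
    to_merge = {name: candidates[name] for name in needed}
    second = dict(first)  # leave the first dictionary untouched
    second.update(to_merge)
    return second


second_dict = build_second_dict(first_dict, needed_names)
```

Copying the first dictionary before merging keeps the parsed configuration reusable when the same operator is expanded for several templates.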

According to the above aspect and any possible implementation, an implementation is further provided in which the distribution definition function includes an entry function, an operator library determination function and a kernel library determination function.

According to the above aspect and any possible implementation, an implementation is further provided in which the method further includes:

writing the updated registration code template and the updated distribution code template into a file to be compiled;

compiling the file to be compiled, and determining, according to the entry function, that an operator distribution operation is to be executed;

after determining that the operator distribution operation is to be executed, determining the operator library to distribute to according to the operator library determination function;

after determining the operator library to distribute to, determining the kernel library to distribute to according to the kernel library determination function.
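The three-stage chain described above (entry function, then operator library determination, then kernel library determination) can be illustrated with a small runtime model in Python. The registry layout and all function names are hypothetical; in the source, the corresponding functions are generated from templates and compiled, which this sketch does not attempt to reproduce.

```python
# Hypothetical registry: (operator library, operator name) -> {kernel library: kernel}.
KERNELS = {
    ("aten", "add"): {
        "cnnl": lambda a, b: ("cnnl", a + b),
        "bang": lambda a, b: ("bang", a + b),
    },
}


def select_operator_library(op_lib):
    """Operator library determination: keep only the operators of `op_lib`."""
    operators = {name: ks for (lib, name), ks in KERNELS.items() if lib == op_lib}
    if not operators:
        raise KeyError(f"unknown operator library: {op_lib}")
    return operators


def select_kernel_library(kernels, kernel_lib):
    """Kernel library determination: pick the kernel registered for `kernel_lib`."""
    if kernel_lib not in kernels:
        raise KeyError(f"operator not registered for kernel library: {kernel_lib}")
    return kernels[kernel_lib]


def dispatch_entry(op_lib, op_name, kernel_lib, *args):
    """Entry function: resolve the library, then the kernel, then call it."""
    operators = select_operator_library(op_lib)
    if op_name not in operators:
        raise KeyError(f"unknown operator: {op_lib}::{op_name}")
    kernel = select_kernel_library(operators[op_name], kernel_lib)
    return kernel(*args)


result_cnnl = dispatch_entry("aten", "add", "cnnl", 1, 2)
result_bang = dispatch_entry("aten", "add", "bang", 1, 2)
```

The point of the example is that the same operator in the same operator library resolves to two different kernel libraries, which is the behavior the method aims for.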

In a second aspect, an embodiment of the present application provides an operator processing device of a machine learning processor, including:

a first obtaining module, configured to obtain a first operator configuration information dictionary, where the first operator configuration information dictionary includes an operator library name, an operator name, a kernel library name and an operator parameter list;

a second obtaining module, configured to obtain a registration code template and/or a distribution code template of an operator based on the different operator library names, where the registration code template and/or the distribution code template include parameters to be replaced, and the parameters to be replaced identify the correspondence between the operator and the kernel library;

a dictionary merging module, configured to filter and merge the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator, to obtain a second operator configuration information dictionary;

an update module, configured to update the parameters to be replaced in the registration code template and/or the distribution code template according to the second operator configuration information dictionary, so as to register and/or distribute the operator to one or more of the kernel libraries.

In a third aspect, an embodiment of the present application provides a machine learning processor that performs the following steps when executing computer-readable instructions:

obtaining a first operator configuration information dictionary, where the first operator configuration information dictionary includes an operator library name, an operator name, a kernel library name and an operator parameter list;

obtaining a registration code template and/or a distribution code template of an operator based on the different operator library names, where the registration code template and/or the distribution code template include parameters to be replaced, and the parameters to be replaced identify the correspondence between the operator and the kernel library;

filtering and merging the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator, to obtain a second operator configuration information dictionary;

updating the parameters to be replaced in the registration code template and/or the distribution code template according to the second operator configuration information dictionary, so as to register and/or distribute the operator to one or more of the kernel libraries.

Further, when obtaining the first operator configuration information dictionary, the machine learning processor specifically performs the following steps when executing the computer-readable instructions:

obtaining an operator configuration file;

parsing the operator configuration file to obtain the first operator configuration information dictionary.

Further, when parsing the operator configuration file to obtain the first operator configuration information dictionary, the machine learning processor specifically performs the following steps when executing the computer-readable instructions:

matching the operator library name, the operator name, the kernel library name and the operator parameter list in the operator configuration file, to obtain the first operator configuration information dictionary.

Further, after matching the operator library name, the operator name, the kernel library name and the operator parameter list in the operator configuration file to obtain the first operator configuration information dictionary, the machine learning processor also performs the following steps when executing the computer-readable instructions:

obtaining a registration definition function and/or a distribution definition function input by a user, where the parameters to be replaced include registration parameters to be replaced and/or distribution parameters to be replaced, the registration definition function includes the registration parameters to be replaced, and the distribution definition function includes the distribution parameters to be replaced;

establishing the registration code template and/or the distribution code template of the operator according to the registration definition function and/or the distribution definition function.

Further, when filtering and merging the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator to obtain the second operator configuration information dictionary, the machine learning processor specifically performs the following steps when executing the computer-readable instructions:

parsing the registration code template and/or the distribution code template to obtain the registration parameters to be replaced and/or the distribution parameters to be replaced;

obtaining replacement parameters from the registration parameters to be replaced and/or the distribution parameters to be replaced, based on key-value pair correspondence;

generating a group of key-value pairs to be merged according to the replacement parameters;

merging the group of key-value pairs to be merged into the first operator configuration information dictionary, to obtain the second operator configuration information dictionary.

Further, when parsing the registration code template and/or the distribution code template to obtain the registration parameters to be replaced and the distribution parameters to be replaced, the machine learning processor specifically performs the following steps when executing the computer-readable instructions:

matching and identifying replacement identifiers in the registration code template and/or the distribution code template;

taking the parameter carried after each replacement identifier as a registration parameter to be replaced and/or a distribution parameter to be replaced, thereby obtaining the registration parameters to be replaced and/or the distribution parameters to be replaced.

Further, when obtaining the replacement parameters by filtering and merging the first operator configuration information dictionary according to the registration parameters to be replaced and/or the distribution parameters to be replaced, the machine learning processor specifically performs the following step when executing the computer-readable instructions:

performing key-value pair filtering according to the operator library name, the operator name, the kernel library name and the operator parameter list, to obtain the replacement parameters.

Further, the distribution definition function includes an entry function, an operator library determination function and a kernel library determination function.

Further, the machine learning processor also performs the following steps when executing the computer-readable instructions:

writing the updated registration code template and the updated distribution code template into a file to be compiled;

compiling the file to be compiled, and determining, according to the entry function, that an operator distribution operation is to be executed;

after determining that the operator distribution operation is to be executed, determining the operator library to distribute to according to the operator library determination function;

after determining the operator library to distribute to, determining the kernel library to distribute to according to the kernel library determination function.

In a fourth aspect, an embodiment of the present application provides a computer device, including a memory, the machine learning processor according to the third aspect, and computer-readable instructions stored in the memory and executable on the machine learning processor, where the machine learning processor, when executing the computer-readable instructions, performs the steps of the operator processing method of the machine learning processor according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-readable instructions which, when executed by a machine learning processor, implement the steps of the operator processing method of the machine learning processor according to the first aspect.

In the embodiments of the present application, the first operator configuration information dictionary is obtained first, so that the key-value pairs needed for registration and/or distribution, such as the operator library name, the operator name, the kernel library name and the operator parameter list, can be extracted from it. Next, the registration code template and/or the distribution code template of the operator are obtained based on the different operator library names, so that these preconfigured templates determine the registration and/or distribution logic of the operator. Then the first operator configuration information dictionary is filtered and merged according to the registration code template and/or the distribution code template of the operator to obtain the second operator configuration information dictionary; this extends the first operator configuration information dictionary and further determines which parameters in the registration code template and/or the distribution code template need to be replaced. Finally, the parameters to be replaced in the registration code template and/or the distribution code template are updated according to the second operator configuration information dictionary, so as to register and/or distribute the operator to one or more of the kernel libraries. The present application can improve the efficiency of operator registration and/or distribution and allows an operator to be distributed to multiple kernel libraries within the same backend.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an operator processing method of a machine learning processor in an embodiment of the present application;

Fig. 2 is a functional block diagram of a device corresponding one-to-one to the operator processing method of a machine learning processor in an embodiment of the present application;

Fig. 3 is a schematic diagram of a computer device equipped with a machine learning processor in an embodiment of the present application.

Detailed description

For a better understanding of the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the drawings.

It should be clear that the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.

The terms used in the embodiments of the present application are only for the purpose of describing particular embodiments and are not intended to limit the present application. The singular forms "a", "said" and "the" used in the embodiments of this application and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" used herein merely describes an association between the associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

It should be understood that although the terms first, second, third and so on may be used in the embodiments of the present application to describe preset ranges and the like, these preset ranges should not be limited by these terms. The terms are only used to distinguish preset ranges from one another. For example, without departing from the scope of the embodiments of the present application, a first preset range may also be called a second preset range, and similarly a second preset range may also be called a first preset range.

Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if determined" or "if (a stated condition or event) is detected" may be interpreted as "when determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".

Existing automatic generation techniques can only distribute a given operator of one backend to the single kernel that corresponds to it; they do not support distribution to multiple underlying kernels.

For example, for the operator batch_norm_stats (an operator in the aten library), the CUDA backend can only distribute it to batch_norm_stats_cuda and cannot distribute it to any other underlying kernel.

Fig. 1 is a flowchart of an operator processing method of a machine learning processor in an embodiment of the present application. The method can be applied in scenarios where a machine learning processor automatically registers and/or distributes operators to different underlying kernels. As shown in Fig. 1, the operator processing method of the machine learning processor includes the following steps:

S10: Obtain a first operator configuration information dictionary, where the first operator configuration information dictionary includes an operator library name, an operator name, a kernel library name and an operator parameter list.

The first operator configuration information dictionary stores data in the form of key-value pairs, for example a group of key-value pairs holding the operator library name, the operator name, the kernel library name and the operator parameter list. The operator library may be an existing operator library or a user-defined one, which is not limited in this application, and each operator library contains multiple operators. Operator library names can include names such as aten (the library that ships with PyTorch), torchvision and torch_mlu (a custom library name). Operator names can be short names such as add and sub that denote different arithmetic functions. Kernel library names can include names such as cnml, cnnl and bang. The parameter list is a list of the parameters an operator requires for its inputs and outputs, together with their types.
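As an illustration of the data described in S10, the sketch below writes a few entries as Python dictionaries and checks which kernel libraries a single operator maps to. The key names and the list-of-dictionaries layout are assumptions; the library, operator and kernel names are the examples given in the text.

```python
# Hypothetical layout: one dictionary of key-value pairs per (operator, kernel library).
REQUIRED_KEYS = ("op_lib", "op_name", "kernel_lib", "params")

TENSOR_PAIR = [("Tensor", "self"), ("Tensor", "other")]  # (type, name) pairs

first_config = [
    {"op_lib": "aten", "op_name": "add", "kernel_lib": "cnnl", "params": TENSOR_PAIR},
    {"op_lib": "aten", "op_name": "add", "kernel_lib": "bang", "params": TENSOR_PAIR},
    {"op_lib": "torch_mlu", "op_name": "sub", "kernel_lib": "cnml", "params": TENSOR_PAIR},
]


def validate(entries):
    """Check that every entry carries the four fields named in S10."""
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise KeyError(f"entry {entry!r} is missing {missing}")


def kernel_libs_for(entries, op_lib, op_name):
    """List the kernel libraries one operator of one operator library targets."""
    return [
        entry["kernel_lib"]
        for entry in entries
        if entry["op_lib"] == op_lib and entry["op_name"] == op_name
    ]


validate(first_config)
add_targets = kernel_libs_for(first_config, "aten", "add")
```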

在一实施例中,第一算子配置信息字典中的算子库名、算子名称、内核库名和算子参数列表都是用户注册和/或分发所需的关键信息,利用这些关键信息可将算子指定发送到用户想要注册和分发的内核库。In an embodiment, the operator library name, operator name, kernel library name, and operator parameter list in the first operator configuration information dictionary are all key information required for user registration and/or distribution, and these key information can be used to Send the operator specification to the kernel library that the user wants to register and distribute.

S20: Based on the different operator library names, obtain a registration code template and/or a distribution code template of the operator, where the registration code template and/or distribution code template includes parameters to be replaced, and the parameters to be replaced identify the correspondence between the operator and the kernel library.

The registration code template and/or distribution code template may be established in advance and is used to register and/or distribute operators to one or more kernel libraries. The registration code template and/or distribution code template is a general-purpose template for operator registration and/or distribution. Once the specific operator library name, operator name, kernel library name, and similar information have been determined, the template is updated by substitution so that the operator is accurately registered in and/or distributed to one or more kernel libraries of the specified operator library.

In an embodiment, the operator library name corresponds to the backend of the machine learning processor. To distribute operators to different kernel libraries (underlying kernel libraries) within the backend, the registration code template and/or distribution code template of an operator can first be distinguished and obtained based on the operator library name.

The registration code template and the distribution code template include parameters to be replaced, which identify the correspondence between the operator and the kernel library. Specifically, a parameter to be replaced is a parameter in the registration code template or distribution code template that determines either the registration and/or distribution path of the operator or the algorithmic content of the operator itself (such as the contents of the parameter list). The registration and/or distribution path determines the specific kernel library, within the operator library, in which the operator is registered and/or to which it is distributed. The parameters to be replaced may be return_type (the returned type), func_name (the operator name), type_formals_c (the parameter types of the operator), and so on. When the user needs to register and distribute an operator, the first operator configuration information dictionary can be used to determine the target kernel library, and the parameters to be replaced in the registration code template and distribution code template are replaced with the parameters used for registration and/or distribution, achieving automatic registration, automatic distribution, and targeted distribution of the operator.

S30: According to the registration code template and/or distribution code template of the operator, filter and merge the first operator configuration information dictionary to obtain a second operator configuration information dictionary.

In an embodiment, the second operator configuration information dictionary is obtained by extending the first operator configuration information dictionary. Besides the key-value pairs of the first operator configuration information dictionary, it also contains the key-value pairs corresponding to the parameters to be replaced in the registration code template and/or distribution code template. The first operator configuration information dictionary includes, for example, the operator library name, operator name, kernel library name, and the parameter list related to the algorithmic content of the operator itself. The registration code template and/or distribution code template starts out as a general-purpose template, that is, it can be created before the user has decided which kernel libraries the operator will be distributed to. Once the first operator configuration information dictionary is available, that is, once the operators to be registered and/or distributed have been determined, the parameters to be replaced that correspond to this operator information can be looked up in the registration code template and/or distribution code template. In other words, the registration code template and/or distribution code template cannot be compiled initially because it contains parameters to be replaced; after the operators to be registered and/or distributed and the related information have been determined, the values of these parameters to be replaced can be determined accordingly.

In an embodiment, so that the registration code template and distribution code template meet the requirements for compilation, the parameters to be replaced in the two templates are determined using the first operator configuration information dictionary, and the key-value pairs corresponding to those parameters are merged into the first operator configuration information dictionary to obtain the second operator configuration information dictionary.

S40: Update the parameters to be replaced in the registration code template and/or distribution code template according to the second operator configuration information dictionary, so that the operator is registered in and/or distributed to one or more kernel libraries.

The second operator configuration information dictionary contains the key-value pairs required for operator registration and/or distribution. Every parameter that must be resolved during compilation can be found in the second operator configuration information dictionary.

Further, before the update, the parameters to be replaced in the registration code template and/or distribution code template have no corresponding key-value pairs, so the templates do not meet the requirements for compilation. In the embodiments of the present application, the second operator configuration information dictionary is therefore used to replace and update the registration code template and distribution code template. After the update, the templates contain the parameters, corresponding to the key-value pairs, that are used for registration and/or distribution. These parameters determine the specific path of operator registration and/or distribution, so that the operator can be registered in and/or distributed to one or more kernel libraries.

In an embodiment, the replacement parameters in the second operator configuration information dictionary can be used to update the registration code template and/or distribution code template. Take the parameters to be replaced in the distribution code template: if the template contains the distribution parameter to be replaced type_formals_c (the formal parameters, that is, the parameter types of the operator), it can be replaced with the value from the first operator configuration information dictionary ('const at::Tensor& input', 'const at::Tensor& position'). Likewise, the parameter to be replaced schema_order_type_formals in the distribution code template can be replaced with the value from the second operator configuration information dictionary ('const at::Tensor& input', 'const at::Tensor& position').

Note that type_formals_c is a default notation used in the distribution code template and will not compile as written. It must be replaced with the value of the key-value pair, in the first operator configuration information dictionary, that corresponds to the formal parameters of the operator.
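The update in S40 can be sketched with Python's standard `string.Template`, which uses `$` as its placeholder marker. This is only an illustration under stated assumptions: the one-line template text is invented for the example, the formal parameter list is stored as an already joined string, and the description does not say which templating mechanism is actually used.

```python
from string import Template

# Hypothetical one-line distribution code template with three
# parameters to be replaced.
DISPATCH_TEMPLATE = Template("$return_type $wrapper_name($type_formals_c);")

# The first dictionary alone does not cover every placeholder.
first_config = {"return_type": "at::Tensor"}

# The second dictionary adds the key-value pairs the template needs.
second_config = {
    **first_config,
    "wrapper_name": "wrapper_dequantize",
    "type_formals_c": "const at::Tensor& input, const at::Tensor& position",
}

# Substituting with an incomplete dictionary fails on the first
# placeholder that has no value.
try:
    DISPATCH_TEMPLATE.substitute(first_config)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

# With the second dictionary, every placeholder is resolved.
generated_code = DISPATCH_TEMPLATE.substitute(second_config)
```

The failing first call mirrors the statement that the template cannot be compiled until all parameters to be replaced have values.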

As can be seen, in addition to the contents of the first operator configuration information dictionary, the second operator configuration information dictionary contains the key-value pairs corresponding to the parameters to be replaced in the distribution code template, so that at the compilation stage the key-value pair information in the second operator configuration information dictionary can be substituted accurately into the distribution code template.

Further, before step S10, that is, before obtaining the first operator configuration information dictionary, the method further includes the following steps:

S11: Obtain an operator configuration file.

The operator configuration file records the basic information of an operator, such as the operator name and information related to the registration and/or distribution path of the operator.

In an embodiment, the machine learning processor (MLU) obtains the operator configuration file when automatically generating the registration and distribution code of an operator. The operator configuration file may be created in advance by the user according to the registration and/or distribution requirements. After obtaining the operator configuration file, the machine learning processor can determine how to register and/or distribute the operator from the configuration information in the file.

S12: Parse the operator configuration file to obtain the first operator configuration information dictionary.

In an embodiment, the machine learning processor can extract the data required for registration and/or distribution from the operator configuration file according to a preset parsing strategy, and store each operator together with its information as key-value pairs, yielding the first operator configuration information dictionary. The configuration file may use the YAML (YAML Ain't Markup Language) file format. The YAML configuration file stores the operator library name, operator name, kernel library name, and operator parameter list in a highly readable form, and the user can modify the configuration information in the file at any time as needed.

In steps S11-S12, operators are stored in a configuration file. When the user needs to distribute an operator to a specified kernel library, the operator configuration file can be parsed quickly to obtain the first operator configuration information dictionary.

Further, step S12, that is, parsing the operator configuration file to obtain the first operator configuration information dictionary, specifically includes the following step:

Match the operator library name, operator name, kernel library name, and operator parameter list in the operator configuration file to obtain the first operator configuration information dictionary.

In an embodiment, if the user needs to register and/or distribute an operator (for example, the operator named dequantize) to different kernel libraries (for example, the bang kernel library), string matching can be used to extract from the operator configuration file the operator library name (e.g. namespace: torch_mlu), the operator name (e.g. name: dequantize), the kernel library name (e.g. derived_type: bang), and the operator parameter list. The parameter list covers the parameters of the operation and their types: for example, the input parameter name: input with type: const at::Tensor&, the input parameter name: position with type: const at::Tensor&, and the return type of the operator, return_type: at::Tensor. From this information, the operator library name, operator name, kernel library name, and operator parameter list are turned into the corresponding key-value pairs to generate the first operator configuration information dictionary. Each of these entries has a key and a value, so the first operator configuration information dictionary is a dictionary made up of multiple key-value pairs. Because the key-value pairs used for registration and/or distribution are stored together in one dictionary, any step that needs a parameter from the first operator configuration information dictionary can access it directly and read the corresponding field.
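The string-matching step can be sketched with regular expressions from the Python standard library. The configuration text below is a hypothetical layout assembled from the fields quoted above (the `arguments:` section header is an assumption), and the two patterns are one possible matching strategy rather than the parsing rules of the application.

```python
import re

# Hypothetical operator configuration text in a YAML-like layout.
CONFIG_TEXT = """\
name: dequantize
namespace: torch_mlu
derived_type: bang
return_type: at::Tensor
arguments:
  - name: input
    type: const at::Tensor&
  - name: position
    type: const at::Tensor&
"""

def parse_operator_config(text):
    """Build a first operator configuration dictionary by string matching."""
    # Split the scalar fields from the argument list.
    head, _, arg_text = text.partition("arguments:")
    # Top-level "key: value" lines become key-value pairs.
    config = dict(re.findall(r"^(\w+):\s*(.+?)\s*$", head, flags=re.MULTILINE))
    # Each "- name: ... / type: ..." pair becomes one parameter entry.
    arg_pattern = r"-\s*name:\s*(\w+)\s*\n\s*type:\s*(.+?)\s*$"
    config["arguments"] = [
        {"name": name, "type": type_}
        for name, type_ in re.findall(arg_pattern, arg_text, flags=re.MULTILINE)
    ]
    return config

parsed = parse_operator_config(CONFIG_TEXT)
```

A real YAML parser would be more robust than hand-written patterns; the regular expressions are used here only to keep the sketch dependency-free.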

Further, before step S20, that is, before obtaining the registration code template and/or distribution code template of the operator, the method further includes the following steps:

S21: Obtain a registration definition function and/or a distribution definition function input by the user, where the parameters to be replaced include registration parameters to be replaced and/or distribution parameters to be replaced, the registration definition function includes the registration parameters to be replaced, and the distribution definition function includes the distribution parameters to be replaced.

The registration definition function and the distribution definition function define the registration logic and the distribution logic of an operator, respectively. An operator is registered and/or distributed by executing the registration definition function and/or distribution definition function, which makes it easy to set the kernel library in which the operator is registered and/or to which it is distributed.

The registration parameters to be replaced and the distribution parameters to be replaced are the parameters to be replaced, that is, the parameters that the registration code template and distribution code template express in a generic form while the user has not yet decided on a specific registration and/or distribution operation. When the registration definition function and/or distribution definition function is processed at the compilation stage, the registration parameters to be replaced and/or distribution parameters to be replaced are replaced with parameters related to the kernel library in which the user expects the operator to be registered and/or to which the user expects it to be distributed.

In an embodiment, the user determines the registration and/or distribution logic of an operator through the registration definition function and/or distribution definition function that the user inputs. Once the user has determined the registration and/or distribution path of the operator, replacing the registration parameters to be replaced and/or distribution parameters to be replaced allows different operators to be distributed to different kernel libraries flexibly and quickly.

S22: Establish the registration code template and/or distribution code template of the operator according to the registration definition function and/or distribution definition function.

In an embodiment, the registration and/or distribution logic of the operator is set on the basis of the registration definition function and/or distribution definition function, and the registration code template and/or distribution code template is established from that function accordingly. At the registration and/or distribution stage, only the registration parameters to be replaced and/or distribution parameters to be replaced in the template need to be substituted in order to register and/or distribute the operator automatically to the intended target.

Steps S21-S22 provide a specific implementation for preparing the registration code template and distribution code template in advance. By substituting the registration parameters to be replaced and distribution parameters to be replaced, the templates can be adapted flexibly to the registration and/or distribution requirements of the user.

Further, step S30, that is, filtering and merging the first operator configuration information dictionary according to the registration code template and/or distribution code template of the operator to obtain the second operator configuration information dictionary, specifically includes the following steps:

S31: Parse the registration code template and/or distribution code template to obtain the registration parameters to be replaced and/or distribution parameters to be replaced.

In an embodiment, the registration code template and/or distribution code template is parsed, and the registration parameters to be replaced and/or distribution parameters to be replaced in the template are determined according to preset parsing rules.

S32: Based on the key-value correspondence, obtain the replacement parameters according to the registration parameters to be replaced and/or distribution parameters to be replaced.

A replacement parameter is the parameter data used to replace a registration parameter to be replaced or a distribution parameter to be replaced. The replacement parameters corresponding to the registration parameters to be replaced and/or distribution parameters to be replaced can be determined by filtering on the operator name, operator library name, kernel library name, and other information in the first operator configuration information dictionary.

For example, when the distribution code template contains the parameter to be replaced "operator_name" (the name of the operator to operate on), the key-value pair 'operator_name': 'dequantize' can be generated automatically from the name of the operator being distributed, here "dequantize", which is determined from the first operator configuration information dictionary. The value 'dequantize' serves as the replacement parameter for the corresponding parameter to be replaced in the registration code template and/or distribution code template. The generated key-value pair is merged into the first operator configuration information dictionary to obtain the second operator configuration information dictionary. Different key-value pairs in the second operator configuration information dictionary may share the same value while having different keys; this follows from the definitions in the distribution code template, whose distribution definition functions include their own definition of operator_name. Similarly, the distribution code template may contain other parameters to be replaced that relate to the distribution definition functions, such as "wrapper_name" (the operator name used for distribution) and "bang_func_name" (the operator name distributed in the bang kernel library). The corresponding key-value pairs are 'wrapper_name': 'wrapper_dequantize' (the operator used for distribution is dequantize) and 'bang_func_name': 'bang_dequantize' (the operator distributed in the bang kernel library is dequantize).
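The derivation of these key-value pairs can be sketched in Python. The helper name `derive_dispatch_keys` and the naming rule (prefixing the operator name with `wrapper_` or with the kernel library name) are assumptions inferred from the example values above, not rules stated by the application.

```python
def derive_dispatch_keys(first_config):
    """Derive key-value pairs for the template placeholders from the
    operator name and kernel library name (hypothetical naming rule)."""
    operator = first_config["name"]
    kernel_library = first_config["derived_type"]
    return {
        "operator_name": operator,
        "wrapper_name": f"wrapper_{operator}",
        f"{kernel_library}_func_name": f"{kernel_library}_{operator}",
    }

first_config = {"namespace": "torch_mlu", "name": "dequantize", "derived_type": "bang"}
derived = derive_dispatch_keys(first_config)

# Merging the derived pairs yields a dictionary that extends the first one.
second_config = {**first_config, **derived}
```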

S33: Generate a group of key-value pairs to be merged according to the replacement parameters.

The group of key-value pairs to be merged is the group of key-value pairs that will be merged with the first operator configuration information dictionary.

In a single operator registration and distribution process there are multiple registration parameters to be replaced and distribution parameters to be replaced, and correspondingly multiple replacement parameters. These replacement parameters are assembled into the group of key-value pairs to be merged.

S34: Merge the group of key-value pairs to be merged into the first operator configuration information dictionary to obtain the second operator configuration information dictionary.

In an embodiment, the first operator configuration information dictionary and the group of key-value pairs to be merged are combined to form the more comprehensive second operator configuration information dictionary. Because the registration code template and distribution code template contain multiple parameters to be replaced, the key-value pairs in the first operator configuration information dictionary alone are not sufficient, and the group of key-value pairs to be merged is also required.

Specifically, taking operator distribution as an example, the second operator configuration information dictionary adds to the first operator configuration information dictionary the key-value pairs related to the parameters to be replaced in the distribution code template. The newly merged key-value pairs may include the pair that maps a distribution definition function in the distribution code template to the parameter list, 'tensor_list': 'input,position'; the pair for the parameter types of that distribution definition function, 'type_formals_c': ['const at::Tensor& input', 'const at::Tensor& position']; and the key-value pairs that other distribution definition functions in the distribution code template define for the parameter list, parameter types, operator name, operator library name, kernel library name, and so on. In an embodiment, the distribution code template includes several different types of distribution definition functions. Each encapsulates a different level of distribution logic and gives its own definition of items such as the parameter list, parameter types, operator name, operator library name, and kernel library name. The parameters to be replaced (the keys of the key-value pairs) therefore differ between distribution functions, while the replacement parameters (the values) may coincide. For example, for the distribution definition function at the entry layer (the function that determines that a distribution operation will be performed), the key related to the parameter types may be defined as type_formals_c; for the distribution layer (the function that actually performs the distribution operation), the key related to the parameter types may be defined as schema_order_type_formals. Both keys map to the same value, namely the parameter types of the operator, 'const at::Tensor& input', 'const at::Tensor& position'.

As can be seen, in addition to the contents of the first operator configuration information dictionary, the second operator configuration information dictionary contains the key-value pairs corresponding to the parameters to be replaced in the distribution code template, so that at the compilation stage the key-value pair information in the second operator configuration information dictionary can be substituted accurately into the distribution code template.
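The merge described in S33-S34 can be sketched as follows. The helper `build_second_config` and the `arguments` key are hypothetical; the keys `tensor_list`, `type_formals_c`, and `schema_order_type_formals` and their example values are taken from the description, with a space inserted between type and parameter name as an assumption about the intended C++ text.

```python
def build_second_config(first_config):
    """Extend the first dictionary with the key-value pairs to be merged."""
    arguments = first_config["arguments"]
    typed_formals = [f'{arg["type"]} {arg["name"]}' for arg in arguments]
    pending_pairs = {
        # Parameter names passed on to the kernel library function.
        "tensor_list": ",".join(arg["name"] for arg in arguments),
        # Entry-layer key for the typed formal parameters.
        "type_formals_c": typed_formals,
        # Distribution-layer key; different key, same value.
        "schema_order_type_formals": list(typed_formals),
    }
    # The original pairs are kept and the new ones are added.
    return {**first_config, **pending_pairs}

first_config = {
    "namespace": "torch_mlu",
    "name": "dequantize",
    "derived_type": "bang",
    "arguments": [
        {"name": "input", "type": "const at::Tensor&"},
        {"name": "position", "type": "const at::Tensor&"},
    ],
}
second_config = build_second_config(first_config)
```

Two keys holding equal values is intentional here: each layer of the template refers to the formals under its own name.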

Steps S31-S34 provide a specific implementation for obtaining the second operator configuration information dictionary. The parameters to be replaced in the registration code template and distribution code template can be adjusted flexibly, so that a given operator is registered in and/or distributed to the kernel library the user expects.

Further, step S31, that is, parsing the registration code template and/or distribution code template to obtain the registration parameters to be replaced and/or distribution parameters to be replaced, specifically includes the following steps:

S311: Match and determine the replacement identifiers in the registration code template and/or distribution code template.

In an embodiment, the replacement identifier is a preset identifier, which may be a symbol, a letter, an expression, or the like; this is not limited in this application. In an optional embodiment, the symbol $ is used as the replacement identifier, and the $ identifier is matched while parsing the registration code template and/or distribution code template to obtain the registration parameters to be replaced and/or distribution parameters to be replaced.

S312: Take the parameter that follows each replacement identifier as a registration parameter to be replaced and/or a distribution parameter to be replaced, thereby obtaining the registration parameters to be replaced and/or distribution parameters to be replaced.

In an embodiment, the parameter following each $ identifier is taken as a registration parameter to be replaced and/or a distribution parameter to be replaced. Depending on whether the template is a registration code template or a distribution code template, the parameters are classified and collected as registration parameters to be replaced or as distribution parameters to be replaced.

In steps S311-S312, the registration parameters to be replaced and distribution parameters to be replaced can be determined quickly by matching the replacement identifier.
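Steps S311-S312 can be sketched with a regular expression that matches the `$` identifier and captures the name that follows it. The template text is invented for the example, and the pattern assumes that placeholder names consist of letters, digits, and underscores.

```python
import re

# "$" followed by an identifier marks a parameter to be replaced.
PLACEHOLDER = re.compile(r"\$(\w+)")

def find_placeholders(template_text):
    """Return the parameters to be replaced, in order of first appearance."""
    found = []
    for name in PLACEHOLDER.findall(template_text):
        if name not in found:  # keep each placeholder only once
            found.append(name)
    return found

# Hypothetical distribution code template.
DISPATCH_TEMPLATE_TEXT = (
    "$return_type $wrapper_name($type_formals_c) {\n"
    "  return $bang_func_name($tensor_list);\n"
    "}\n"
)

placeholders = find_placeholders(DISPATCH_TEMPLATE_TEXT)
```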

Further, step S32, that is, filtering the first operator configuration information dictionary to obtain the replacement parameters based on the key-value correspondence and according to the registration parameters to be replaced and/or distribution parameters to be replaced, specifically includes the following step:

根据算子库名、算子名称、内核库名和算子参数列表进行键值对筛选,得到替换参数。Filter key-value pairs based on the operator library name, operator name, kernel library name, and operator parameter list to obtain replacement parameters.

在一实施例中,通过算子库名、算子名称、内核库名等键值的筛选,可确定用户期望算子注册和/或分发的信息,再结合待替换注册参数和待替换分发参数的参数含义,可确定模板中待替换参数需具体采用的替换参数。In one embodiment, through the screening of key values such as operator library name, operator name, kernel library name, etc., the information that the user expects operator registration and/or distribution can be determined, and then combined with registration parameters to be replaced and distribution parameters to be replaced The meaning of the parameter can determine the specific replacement parameter to be used for the parameter to be replaced in the template.
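The filtering step can be illustrated with a short Python sketch. The dictionary layout, the key names (`op_library`, `op_name`, `kernel_lib`, `op_args`), the library names `example_ops` and `other_ops`, and the function `select_replacements` are all assumptions made for the example; the description only states that the key-value pairs are filtered on these four kinds of information.

```python
from typing import Any

# Hypothetical entries of the first operator configuration information dictionary.
FIRST_CONFIG: list[dict[str, Any]] = [
    {"op_library": "example_ops", "op_name": "dequantize",
     "kernel_lib": "cnnl", "op_args": ["Tensor self"]},
    {"op_library": "example_ops", "op_name": "dequantize",
     "kernel_lib": "cnml", "op_args": ["Tensor self"]},
    {"op_library": "other_ops", "op_name": "dequantize",
     "kernel_lib": "bang", "op_args": ["Tensor self"]},
]


def select_replacements(entries, placeholders, **criteria):
    """Keep entries that match every criterion, then keep only the keys
    named by the parameters to be replaced."""
    matched = [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in criteria.items())
    ]
    return [
        {name: entry[name] for name in placeholders if name in entry}
        for entry in matched
    ]


replacements = select_replacements(
    FIRST_CONFIG,
    placeholders=["op_name", "kernel_lib"],
    op_library="example_ops",
    op_name="dequantize",
)
print(replacements)
```

Here the two `example_ops` entries survive the filter, so the operator would be distributed to both cnnl and cnml.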

Further, the distribution definition function includes an entry function, an operator library determination function, and a kernel library determination function.

In an embodiment, the entry function determines how the operator is processed, for example whether the operator is distributed, registered, or handled by some other operation. The operator library determination function determines the operator library for distribution/registration. The kernel library determination function determines the kernel library for distribution/registration.

Specifically, the dequantize operator can be distributed with several distribution code templates built from multiple levels of nested functions. These may include a distribution code template built on the entry function, a distribution code template built on a wrapper function for the distribution logic, and a distribution code template built on the kernel library distribution function. The entry-function template is the function that starts the distribution operation; it determines the operator library to distribute to and leads into the subsequent distribution steps. The wrapper-function template encapsulates the distribution logic, and its execution logic may itself contain several kernel library distribution functions. From the wrapper function, execution enters a kernel library distribution function, which distributes the operator to one or more specific kernel libraries.

Specifically, the distribution code template built on the kernel library distribution function is where the operator branches out to a specific kernel library; derived_type determines which distribution code template is used. In an embodiment, the CNNL_OPS_DEFINITION template is used when the kernel library is cnnl, the BANG_OPS_DEFINITION template is used when the kernel library is bang, and cnml is handled in the same way. Further, if several kernel libraries are specified at once, for example cnml&&cnnl, two code segments are generated, each from a distribution code template built on a kernel library distribution function.
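The template selection driven by derived_type might look like the following Python sketch. The template names CNNL_OPS_DEFINITION and BANG_OPS_DEFINITION come from the description; the name CNML_OPS_DEFINITION, the template bodies, the function `render_distribution`, and the assumption that derived_type joins several kernel libraries with `&&` are hypothetical.

```python
from string import Template

# Names for cnnl and bang come from the description; the cnml name and all
# template bodies are assumptions made for illustration.
DISTRIBUTION_TEMPLATES = {
    "cnnl": ("CNNL_OPS_DEFINITION", Template("void ${op_name}_cnnl(${op_args});")),
    "bang": ("BANG_OPS_DEFINITION", Template("void ${op_name}_bang(${op_args});")),
    "cnml": ("CNML_OPS_DEFINITION", Template("void ${op_name}_cnml(${op_args});")),
}


def render_distribution(op_name: str, op_args: str, derived_type: str) -> list[str]:
    """Render one code segment per kernel library named in derived_type.

    derived_type is assumed to join several libraries with "&&",
    for example "cnml&&cnnl".
    """
    segments = []
    for lib in derived_type.split("&&"):
        template_name, template = DISTRIBUTION_TEMPLATES[lib.strip()]
        code = template.substitute(op_name=op_name, op_args=op_args)
        segments.append(f"// {template_name}\n{code}")
    return segments


for segment in render_distribution("dequantize", "const Tensor& self", "cnml&&cnnl"):
    print(segment)
```

With `cnml&&cnnl` the loop emits two segments, one per kernel library, which mirrors the two code segments mentioned above.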

Further, the operator processing method of the machine learning processor also includes the following steps:

S41: Write the updated registration code template and distribution code template into a file to be compiled.

In an embodiment, the updated registration code template and distribution code template can be written into a file to be compiled, such as a C++ source file. At compile time, the operator is automatically registered and/or distributed to the kernel library the user expects, according to the parameters in the two templates that correspond to the key-value pairs used for registration and/or distribution.

S42: Compile the file to be compiled, and determine from the entry function that the operator distribution operation is to be executed.

S43: After determining that the operator distribution operation is to be executed, determine the operator library to distribute to according to the operator library determination function.

S44: After determining the operator library to distribute to, determine the kernel library to distribute to according to the kernel library determination function.

In steps S41-S44, the distribution definition functions are nested in multiple levels, so the distribution code template can have several concrete implementations. To distribute to a specified kernel library, it is enough to modify the relevant parameters in the entry function, the operator library determination function, and the kernel library determination function.
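The three nested levels of steps S42-S44 can be pictured with a small Python analogue. In the description these functions live in generated code that is compiled; the runtime sketch below only mirrors the control flow. The registry contents, the operator library name `example_ops`, the action string `"distribute"`, and all function names are assumptions.

```python
# Hypothetical registry: operator library -> operator -> kernel libraries.
REGISTRY = {"example_ops": {"dequantize": ["cnml", "cnnl"]}}


def determine_kernel_libraries(op_library: str, op_name: str) -> list[str]:
    """Innermost level (S44): kernel libraries the operator is distributed to."""
    return REGISTRY[op_library][op_name]


def determine_operator_library(op_name: str) -> str:
    """Middle level (S43): operator library that holds the operator."""
    for op_library, operators in REGISTRY.items():
        if op_name in operators:
            return op_library
    raise KeyError(op_name)


def entry(action: str, op_name: str) -> list[tuple[str, str]]:
    """Outermost level (S42): decide whether the distribution operation runs."""
    if action != "distribute":
        return []  # registration or other handling is out of scope here
    op_library = determine_operator_library(op_name)
    return [
        (op_library, kernel_lib)
        for kernel_lib in determine_kernel_libraries(op_library, op_name)
    ]


print(entry("distribute", "dequantize"))
```

Changing the target kernel library only requires editing the data each level consults, which is the point the paragraph above makes about modifying the parameters of the three functions.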

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process is determined by its function and internal logic, and the numbering does not limit the implementation of the embodiments of the present application in any way.

In the embodiment of the present application, the first operator configuration information dictionary is obtained first, from which key-value pairs needed for registration and/or distribution, such as the operator library name, the operator name, the kernel library name, and the operator parameter list, can be extracted. Next, the registration code template and/or the distribution code template of the operator is obtained based on the different operator library names, so that these preconfigured templates determine the registration and/or distribution logic of the operator. Then, according to the registration code template and/or the distribution code template of the operator, the first operator configuration information dictionary is filtered and merged to obtain the second operator configuration information dictionary, which extends the first dictionary and further determines the parameters that need to be replaced in the registration code template and/or the distribution code template. Finally, the parameters to be replaced in the registration code template and/or the distribution code template are updated according to the second operator configuration information dictionary, so that the operator is registered with and/or distributed to one or more kernel libraries. The present application improves the efficiency of operator registration and/or distribution and lets an operator be distributed to multiple kernel libraries within the same backend.

FIG. 2 is a functional block diagram of a device that corresponds one-to-one to the operator processing method of the machine learning processor in an embodiment of the present application. As shown in FIG. 2, the operator processing device of the machine learning processor includes a first acquisition module 10, a second acquisition module 20, a dictionary merging module 30, and an update module 40.

The first acquisition module 10 is configured to obtain a first operator configuration information dictionary, where the first operator configuration information dictionary includes an operator library name, an operator name, a kernel library name, and an operator parameter list;

The second acquisition module 20 is configured to obtain a registration code template and/or a distribution code template of the operator based on different operator library names, where the registration code template and/or the distribution code template includes parameters to be replaced, and the parameters to be replaced identify the correspondence between the operator and the kernel library;

The dictionary merging module 30 is configured to filter and merge the first operator configuration information dictionary according to the registration code template and/or the distribution code template of the operator, so as to obtain a second operator configuration information dictionary;

The update module 40 is configured to update the parameters to be replaced in the registration code template and/or the distribution code template according to the second operator configuration information dictionary, so as to register the operator with and/or distribute the operator to one or more kernel libraries.

Optionally, the first acquisition module 10 is specifically configured to:

Obtain an operator configuration file.

Parse the operator configuration file to obtain the first operator configuration information dictionary.

Optionally, the first acquisition module 10 is further specifically configured to:

Match the operator library name, the operator name, the kernel library name, and the operator parameter list in the operator configuration file to obtain the first operator configuration information dictionary.
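The matching step performed by the first acquisition module could be sketched in Python as below. The description does not specify the format of the operator configuration file, so the `key: value` layout with blank-line-separated blocks, the key names, the `quantize` entry, and the function `parse_config` are all assumptions chosen to keep the example small.

```python
import re

# Hypothetical operator configuration file content.
CONFIG_TEXT = """\
op_library: example_ops
op_name: dequantize
kernel_lib: cnml&&cnnl
op_args: Tensor self

op_library: example_ops
op_name: quantize
kernel_lib: bang
op_args: Tensor self, float scale
"""

# Match only the four kinds of information named in the description.
FIELD = re.compile(
    r"^(op_library|op_name|kernel_lib|op_args):\s*(.*)$", re.MULTILINE
)


def parse_config(text: str) -> list[dict]:
    """Turn each blank-line-separated block into one dictionary entry."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = dict(FIELD.findall(block))
        # Split the parameter list into individual parameters.
        raw_args = entry.get("op_args", "")
        entry["op_args"] = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
        entries.append(entry)
    return entries


first_config = parse_config(CONFIG_TEXT)
print(first_config[0])
```

Each resulting entry holds the four keys as plain strings or lists, which is the shape the later filtering and merging steps work on in these sketches.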

Optionally, the operator processing device of the machine learning processor is further specifically configured to:

Obtain a registration definition function and/or a distribution definition function input by the user, where the parameters to be replaced include registration parameters to be replaced and/or distribution parameters to be replaced, the registration definition function includes the registration parameters to be replaced, and the distribution definition function includes the distribution parameters to be replaced.

Build the registration code template and/or the distribution code template of the operator according to the registration definition function and/or the distribution definition function.

Optionally, the dictionary merging module 30 is further specifically configured to:

Parse the registration code template and/or the distribution code template to obtain the registration parameters to be replaced and/or the distribution parameters to be replaced.

Obtain the replacement parameters based on the key-value pair correspondence and according to the registration parameters to be replaced and/or the distribution parameters to be replaced.

Generate a group of key-value pairs to be merged according to the replacement parameters.

Merge the group of key-value pairs to be merged into the first operator configuration information dictionary to obtain the second operator configuration information dictionary.
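The last two operations of the dictionary merging module can be illustrated as follows. The derived keys `op_decl_args` and `op_call_args`, the helper names, and the rule that existing keys of the first dictionary win on conflict are assumptions; the description only states that a group of key-value pairs is generated and merged into the first dictionary to form the second one.

```python
# Hypothetical single-operator view of the first configuration dictionary.
first = {
    "op_library": "example_ops",
    "op_name": "dequantize",
    "kernel_lib": "cnnl",
    "op_args": ["Tensor self", "float scale"],
}


def build_pending(config: dict) -> dict:
    """Derive values a template needs but the configuration does not state directly."""
    arg_names = [arg.split()[-1] for arg in config["op_args"]]
    return {
        "op_decl_args": ", ".join(config["op_args"]),  # for a declaration
        "op_call_args": ", ".join(arg_names),          # for a call site
    }


def merge_config(first_config: dict, pending: dict) -> dict:
    """Return the second dictionary without mutating the first.

    Assumption: keys already present in the first dictionary are kept.
    """
    second = dict(first_config)
    for key, value in pending.items():
        second.setdefault(key, value)
    return second


second = merge_config(first, build_pending(first))
print(second["op_decl_args"], "|", second["op_call_args"])
# Tensor self, float scale | self, scale
```

The second dictionary then carries every value needed to fill the `$` parameters of a template, for example through `string.Template.substitute`.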

Optionally, the operator processing device of the machine learning processor is further specifically configured to:

Match and determine the replacement identifiers in the registration code template and/or the distribution code template.

Take the parameter that follows the replacement identifier as the registration parameter to be replaced and/or the distribution parameter to be replaced.

Optionally, the operator processing device of the machine learning processor is further specifically configured to:

Filter the key-value pairs according to the operator library name, the operator name, the kernel library name, and the operator parameter list to obtain the replacement parameters.

Optionally, the distribution definition function includes an entry function, an operator library determination function, and a kernel library determination function.

Optionally, the operator processing device of the machine learning processor is further specifically configured to:

Write the updated registration code template and distribution code template into a file to be compiled.

Compile the file to be compiled, and determine from the entry function that the operator distribution operation is to be executed.

After determining that the operator distribution operation is to be executed, determine the operator library to distribute to according to the operator library determination function.

After determining the operator library to distribute to, determine the kernel library to distribute to according to the kernel library determination function.

In the embodiment of the present application, the first operator configuration information dictionary is obtained first, from which key-value pairs needed for registration and/or distribution, such as the operator library name, the operator name, the kernel library name, and the operator parameter list, can be extracted. Next, the registration code template and/or the distribution code template of the operator is obtained based on the different operator library names, so that these preconfigured templates determine the registration and/or distribution logic of the operator. Then, according to the registration code template and/or the distribution code template of the operator, the first operator configuration information dictionary is filtered and merged to obtain the second operator configuration information dictionary, which extends the first dictionary and further determines the parameters that need to be replaced in the registration code template and/or the distribution code template. Finally, the parameters to be replaced in the registration code template and/or the distribution code template are updated according to the second operator configuration information dictionary, so that the operator is registered with and/or distributed to one or more kernel libraries. The present application improves the efficiency of operator registration and/or distribution and lets an operator be distributed to multiple kernel libraries within the same backend.

The present application also provides a machine learning processor, which implements the steps of the operator processing method of the machine learning processor when executing computer-readable instructions.

The present application also provides a computer device, including a memory, the machine learning processor of the above embodiment, and computer-readable instructions stored in the memory and executable on the machine learning processor. When the machine learning processor executes the computer-readable instructions, it performs the steps of the operator processing method of the machine learning processor in the above embodiments.

FIG. 3 shows a computer device equipped with a machine learning processor in an embodiment of the present application. As shown in FIG. 3, the computer device 110 includes a machine learning processor 111, a memory 112, and computer-readable instructions 113 stored in the memory 112 and executable on the machine learning processor 111. When the machine learning processor 111 executes the computer-readable instructions 113, it implements the steps of the operator processing method of the machine learning processor.

Exemplarily, the computer-readable instructions 113 may be divided into one or more modules/units, which are stored in the memory 112 and executed by the machine learning processor 111 to carry out the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer-readable instructions 113 in the computer device 110.

The computer device 110 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device may include, but is not limited to, the machine learning processor 111 and the memory 112. Those skilled in the art will understand that FIG. 3 is only an example of the computer device 110 and does not limit it. The computer device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and so on.

Depending on the implementation, the machine learning processor 111 may include one or more types of general-purpose and/or special-purpose processors, such as a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), or an artificial intelligence processor. These processors may include, but are not limited to, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like, and their number can be determined according to actual needs. The memory 112 may be an internal storage unit of the computer device 110, such as a hard disk or main memory of the computer device 110. The memory 112 may also be an external storage device of the computer device 110, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the computer device 110. Further, the memory 112 may include both an internal storage unit and an external storage device of the computer device 110. The memory 112 is used to store the computer-readable instructions and other programs and data required by the computer device. The memory 112 can also be used to temporarily store data that has been output or is about to be output.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly cover computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.

In the embodiments of the present application, the server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.

The present application also provides a computer-readable storage medium that stores computer-readable instructions. When the computer-readable instructions are executed by a machine learning processor, the operator processing method of the machine learning processor is implemented.

The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and all of them fall within the protection scope of the present application.

Claims (13)

Application number: CN202111672390.3A; priority date: 2021-12-31; filing date: 2021-12-31; title: Operator processing method, processor, device and medium of machine learning processor; status: Pending; publication: CN116415682A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111672390.3A (CN116415682A (en)) | 2021-12-31 | 2021-12-31 | Operator processing method, processor, device and medium of machine learning processor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111672390.3A (CN116415682A (en)) | 2021-12-31 | 2021-12-31 | Operator processing method, processor, device and medium of machine learning processor

Publications (1)

Publication Number | Publication Date
CN116415682A (en) | 2023-07-11

Family

Family ID: 87051851

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202111672390.3A (CN116415682A (en)) | Operator processing method, processor, device and medium of machine learning processor | 2021-12-31 | 2021-12-31 | Pending

Country Status (1)

Country | Link
CN (1) | CN116415682A (en)



Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
