CN115759294B - Data processing methods, devices, electronic equipment and storage media - Google Patents

Data processing methods, devices, electronic equipment and storage media

Info

Publication number
CN115759294B
CN115759294B (application CN202211488824.9A)
Authority
CN
China
Prior art keywords
einstein
operator
operand
tensor
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211488824.9A
Other languages
Chinese (zh)
Other versions
CN115759294A (en)
Inventor
熊昆
张留杰
刘红雨
蓝翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211488824.9A
Publication of CN115759294A
Application granted
Publication of CN115759294B
Legal status: Active
Anticipated expiration

Abstract

The present disclosure provides a data processing method, apparatus, electronic device, and storage medium, relating to the field of artificial intelligence and in particular to machine learning, natural language processing, computer vision, and protein structure prediction. The specific implementation is: obtain the first tensor required by a first Einstein operator; call a kernel to transpose the first tensor, obtaining the first-tensor transpose result, and store it in a target storage medium; in the planning and scheduling stage, generate the subscripts of the first tensor of a second Einstein operator according to a preset marking rule; when the first-tensor transpose result needs to be reused, read it from the target storage medium; and perform the operation of the second Einstein operator based on the read first-tensor transpose result and the subscripts of the first tensor. In the embodiments of the present disclosure, reusing the first-tensor transpose result of the first Einstein operator avoids launching and calling kernels multiple times, reduces resource consumption, and improves computational efficiency.

Description

Data processing methods, devices, electronic equipment and storage media

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular to machine learning, natural language processing, computer vision, and protein structure prediction.

Background

Einsum (Einstein summation convention), also known as Einstein notation or the Einstein operator, provides a compact way to express a wide variety of linear operations.

In deep learning tasks, linear operations are used widely, for example in Linear layers and Matmul (matrix multiplication) layers. The Einstein operator is therefore very important and can greatly speed up a user's model design and implementation.

In the related art, evaluating an Einstein operator requires calling kernels that support the corresponding operations, but launching and calling kernels consumes a large amount of computing resources and time.

Summary

The present disclosure provides a data processing method, apparatus, electronic device, and storage medium.

According to one aspect of the present disclosure, a data processing method is provided, including:

obtaining a first tensor required by a first Einstein operator;

calling a kernel to perform a transpose operation on the first tensor to obtain a first-tensor transpose result;

storing the first-tensor transpose result in a target storage medium;

in the stage of planning and scheduling the first tensor of a second Einstein operator, generating the subscripts of the first tensor of the second Einstein operator according to a preset marking rule, where the preset marking rule satisfies the requirement that the first-tensor transpose result can be reused;

when the second Einstein operator needs to reuse the first-tensor transpose result, reading the first-tensor transpose result from the target storage medium; and

performing the operation of the second Einstein operator based on the read first-tensor transpose result and the subscripts of the first tensor of the second Einstein operator.

According to another aspect of the present disclosure, a data processing apparatus is provided, including:

a first acquisition module configured to obtain a first tensor required by a first Einstein operator;

a calling module configured to call a kernel to perform a transpose operation on the first tensor to obtain a first-tensor transpose result;

a first storage module configured to store the first-tensor transpose result in a target storage medium;

a generation module configured to generate, in the stage of planning and scheduling the first tensor of a second Einstein operator, the subscripts of the first tensor of the second Einstein operator according to a preset marking rule, where the preset marking rule satisfies the requirement that the first-tensor transpose result can be reused;

a reading module configured to read the first-tensor transpose result from the target storage medium when the second Einstein operator needs to reuse it; and

an execution module configured to perform the operation of the second Einstein operator based on the read first-tensor transpose result and the subscripts of the first tensor of the second Einstein operator.

According to another aspect of the present disclosure, an electronic device is provided, including:

at least one processor; and

a memory communicatively connected to the at least one processor, wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any of the methods in the embodiments of the present disclosure.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to perform any of the methods in the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements any of the methods in the embodiments of the present disclosure.

In the embodiments of the present disclosure, reusing the first-tensor transpose result of the first Einstein operator during the computation of the second Einstein operator avoids repeatedly launching and calling kernels, which saves resources and improves computational efficiency.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of the Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present disclosure. In the drawings:

Figure 1 is a schematic flowchart of a data processing method according to an embodiment of the present disclosure;

Figure 2 is a schematic flowchart of transposing an obtained gradient into a target gradient according to another embodiment of the present disclosure;

Figure 3a is a schematic diagram of one example of a data processing method according to another embodiment of the present disclosure;

Figure 3b is a schematic diagram of another example of a data processing method according to another embodiment of the present disclosure;

Figure 4 is a schematic diagram of the application of a data processing method in the field of natural language processing according to another embodiment of the present disclosure;

Figure 5 is a schematic diagram of the application of a data processing method in the field of protein structure prediction according to another embodiment of the present disclosure;

Figure 6 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present disclosure;

Figure 7 is a block diagram of an electronic device used to implement the data processing method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the disclosure. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.

The terms "first", "second", and the like in the embodiments of the present disclosure are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Furthermore, the terms "including" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, for example the inclusion of a series of steps or units. Methods, systems, products, or devices are not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such processes, methods, products, or devices.

The Einstein operator can concisely express a wide variety of linear operations. For example, ij->ji can express a transpose, and ij,jk->ik can express a matrix multiplication, as well as outer products, inner products, and the like. Expressing all linear operations with a unified Einstein summation notation greatly reduces the user's memorization burden.

In a deep learning framework, an Einstein operator over two operands mainly accepts three parameters: two input operands (for example, tensor A and tensor B) and a string expression, and finally outputs a tensor C. For example, in the expression c=paddle.einsum(a,b,"ij,jk->ik"), a and b are the operands, ij,jk->ik is the expression, and c is the output of the Einstein operator; paddle.einsum can be understood as the method that executes the Einstein operator.
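
For illustration, here is a minimal sketch of the same idea using numpy.einsum rather than any framework-specific API (the array names are arbitrary):

```python
import numpy as np

a = np.random.rand(2, 3)
b = np.random.rand(3, 4)

# "ij,jk->ik" contracts over the shared subscript j: a matrix multiplication.
c = np.einsum("ij,jk->ik", a, b)
assert np.allclose(c, a @ b)

# "ij->ji" permutes the subscripts: a transpose.
t = np.einsum("ij->ji", a)
assert np.allclose(t, a.T)
```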

In general, the forward implementation of almost every Einstein operator can logically be divided into two parts: the Planner (the planning and scheduling stage) and the Executor (the execution stage). The Planner uses the Equation (expression) to plan how to combine related operations such as Transpose (transposition), Matmul, and Sum (summation). A large Einstein operator is thus decomposed into very many small computations, and executing these small computations in a certain order produces the final result.
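
As a hedged sketch of such a decomposition (the function and the chosen steps are illustrative assumptions, not the patent's actual Planner), the equation ik,jk->ij can be planned as a Transpose followed by a Matmul:

```python
import numpy as np

def einsum_ik_jk_to_ij(a, b):
    """Planner/Executor sketch for "ik,jk->ij" (a hypothetical decomposition)."""
    # Planner: the equation is planned as Transpose(B) followed by Matmul.
    tb = np.transpose(b)   # jk -> kj
    # Executor: run the planned steps in order.
    return a @ tb          # ik @ kj -> ij

a = np.random.rand(2, 4)
b = np.random.rand(3, 4)
assert np.allclose(einsum_ik_jk_to_ij(a, b), np.einsum("ik,jk->ij", a, b))
```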

There are currently two main ways to implement the Einstein operator: the Trace method and the composition method.

The composition method usually implements an efficient C++-side forward function for the Einstein operator, called EinsumKernel, and then calls the EinsumGradKernel (Einstein gradient kernel) function when gradients need to be computed. The EinsumGradKernel function differentiates with respect to the two input operands (for example, tensor A and tensor B) by calling EinsumKernel twice. Implementing the Einstein operator by composition achieves good modularity, lets the backward pass share the forward logic, and is simple to implement and amenable to subsequent optimization.

The Trace method is simpler than the composition method. Concretely, during the forward computation the order of the operators involved is recorded, and the corresponding backward operators are then called once in the reverse of the forward order. For example, if the forward pass calls the Transpose operator and then the Matmul operator, the backward pass calls the Matmul gradient operator and then the Transpose gradient operator.

Compared with the composition method, the Trace method has the advantage of a fast backward pass, but the disadvantages that it is complex to implement and depends heavily on the basic components provided by the framework. If the framework does not support backward Ops (operation codes) and provides no basic Trace component, the implementation cost is enormous, because a backward-Trace component for Ops is essentially the core mechanism of a deep learning framework.

In view of this, the embodiments of the present disclosure modify the composition method so as to increase the speed at which it executes Einstein operators and reduce the consumption of kernel resources. To achieve this goal, the embodiments of the present disclosure propose the technical idea of letting the composition method reuse forward-pass results, aiming to accelerate the computation of Einstein operators by reducing the number of forward kernel calls and thereby save computing resources.

Based on this technical idea, the present disclosure proposes a data processing method that can be applied in any scenario requiring the execution of Einstein operators, for example natural language processing, machine vision, and protein structure prediction.

Figure 1 shows the flowchart of the data processing method in an embodiment of the present disclosure, which includes:

S101: obtain the first tensor required by the first Einstein operator.

The first tensor is an operand of the first Einstein operator, such as tensor A or tensor B in the earlier example.

S102: call a kernel to perform a transpose operation on the first tensor to obtain the first-tensor transpose result.

S103: store the first-tensor transpose result in the target storage medium.

The target storage medium may be, for example, a cache.

S104: in the stage of planning and scheduling the first tensor of the second Einstein operator, generate the subscripts of the first tensor of the second Einstein operator according to the preset marking rule, where the preset marking rule satisfies the requirement that the first-tensor transpose result can be reused.

As explained above, every Einstein operator includes a Planner stage and an Executor stage. To execute an Einstein operator, its subscripts must first be marked in the Planner stage, and the Executor stage then performs the computation based on the marked result. Table 1 shows an example of marking subscripts; it lists each set and the subscripts belonging to it. When there are two operands A and B, for example in the equation mik,mjk->mij, the Planner stage first classifies and marks the subscripts. There are four common categories:

batch (hereinafter also referred to as the first marking set);

free (hereinafter also referred to as the second or third marking set);

contraction; and reduction.

In an Einstein operator, A and B are the operands and O is the output. Accordingly, batch denotes the subscripts shared by A, B, and O; free denotes the subscripts that appear only in AO or only in BO; contraction denotes the subscripts that appear in both A and B but not in O; and reduction denotes the subscripts that appear only in A or only in B.

Under this classification rule, in the equation mik,mjk->mij, m belongs to batch, i and j belong to free, and k belongs to contraction. As for reduction, in einsum("i,j->j",A,B) for example, the input subscripts include i but the output after the operation does not, which indicates a reduction over i.

Set:       ABO    AO     BO     AB           A          B
Category:  batch  freeA  freeB  contraction  reduction  reduction

Table 1
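
The classification can be written down directly from the set definitions above; the following is a small sketch (the function name is illustrative):

```python
def classify_subscripts(a_labels, b_labels, out_labels):
    """Classify einsum subscripts into the categories of Table 1."""
    a, b, o = set(a_labels), set(b_labels), set(out_labels)
    return {
        "batch": a & b & o,                      # ABO: in A, B, and O
        "freeA": (a & o) - b,                    # AO: in A and O only
        "freeB": (b & o) - a,                    # BO: in B and O only
        "contraction": (a & b) - o,              # AB: in A and B, not in O
        "reduction": (a - b - o) | (b - a - o),  # only in A or only in B
    }

# Equation mik,mjk->mij: m is batch, i and j are free, k is contraction.
print(classify_subscripts("mik", "mjk", "mij"))
```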

In the embodiments of the present disclosure, designing the marking rule of the Planner stage enables the second Einstein operator to reuse the forward-pass results of the first Einstein operator. For example, the computation of the second Einstein operator may call a kernel to perform the same operation as the computation of the first Einstein operator: the operands of both operators may include the first tensor, and both computations may need the transpose of the first tensor. The preset marking rule can then be arranged so that the first tensor in the second Einstein operator can reuse the first-tensor transpose result of the first Einstein operator. In that case, whenever the computation of the second Einstein operator needs the first-tensor transpose result, it can read that result directly from the target storage medium. The first-tensor transpose result is thus reused directly, instead of calling the transpose kernel again and re-executing the transpose operation, which eliminates part of the intermediate computation of the second Einstein operator.

Therefore, S105 can be executed: when the second Einstein operator needs to reuse the first-tensor transpose result, read the first-tensor transpose result from the target storage medium.

S106: based on the read first-tensor transpose result and the subscripts of the first tensor of the second Einstein operator, perform the operation of the second Einstein operator.

In the related art, every time the first-tensor transpose result is needed, a kernel must be launched and called to perform the transpose operation on the first tensor. In practice, however, calling a kernel to perform an operation is expensive, and calling kernels many times severely degrades computational efficiency and burdens the hardware. In the embodiments of the present disclosure, reusing the first-tensor transpose result of the first Einstein operator during the computation of the second Einstein operator avoids launching and calling kernels many times, which greatly reduces resource consumption and improves computational efficiency.
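
The reuse amounts to memoizing the transpose in the target storage medium; a minimal sketch, assuming a simple in-memory dict as the cache (names hypothetical):

```python
import numpy as np

_transpose_cache = {}  # stands in for the "target storage medium"

def cached_transpose(name, tensor, perm):
    """Transpose once, then serve later requests from the cache."""
    key = (name, perm)
    if key not in _transpose_cache:  # only the first call "launches a kernel"
        _transpose_cache[key] = np.transpose(tensor, perm)
    return _transpose_cache[key]

a = np.random.rand(5, 2, 3, 4)                   # subscripts ibnd
ta_fwd = cached_transpose("A", a, (1, 2, 0, 3))  # forward: ibnd -> bnid
ta_bwd = cached_transpose("A", a, (1, 2, 0, 3))  # backward: reused, no new transpose
assert ta_fwd is ta_bwd
```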

In some embodiments, the first Einstein operator includes two operands, and the first tensor is either of the two operands. That is, the first operand of the first Einstein operator can be treated as the first tensor, and the transpose result of that first operand stored for reuse by the second Einstein operator. The second operand of the first Einstein operator can likewise be treated as the first tensor, and the transpose result of that second operand stored for reuse by the second Einstein operator. Alternatively, the first and second operands of the first Einstein operator can each be treated as a first tensor, with the transpose results of both operands stored for reuse by the second Einstein operator.

In the embodiments of the present disclosure, any operand of the Einstein operator can serve as the first tensor; there is no need to restrict the first tensor to one specific operand, so the target storage medium can be used sensibly on demand. Moreover, reusing the first-tensor transpose result effectively reduces the number of kernel calls and lowers resource consumption. In addition, determining the first tensor on demand during model training makes the data processing more flexible and efficient.

For ease of understanding, the derivation of the preset marking rule in the embodiments of the present disclosure, and the specific content of the rule, are described in detail below.

For example, the backward pass can be computed with two forward calls. When computing the backward pass of O=paddle.einsum("ibnd,jbnd->bnij",A,B), one can set dA=paddle.einsum("bnij,jbnd->ibnd",dO,B): operand A is replaced with dO, and the Equation of operand A is swapped with the Equation of O, which amounts to exchanging the roles of A and O. To reuse the transpose results, consider the three EinsumKernel substitutions: if the subscripts TA1, TA2, and TA3 are exactly the same, reuse is possible. The reason why identical subscripts for TA1, TA2, and TA3 enable reuse is analyzed below, taking the following three forward calls as an example, with expressions 1), 2), and 3):

O=paddle.einsum(A,B)    1)

dA=paddle.einsum(B,dO)    2)

dB=paddle.einsum(A,dO)    3)

Here TA1, TA2, and TA3 denote the subscripts of the transpose of operand A in expressions 1), 2), and 3), respectively.

Because operand A does not participate in expression 2) as an input operand, expression 2) does not need to reuse the transpose of operand A. But operand A is an input operand of expression 3), so expression 3) should ideally reuse the transpose of operand A from expression 1). TA1 of operand A in 1) is: ABO, AO, AB, while TA3 of operand A in 3) is: ABO, AB, AO. TA1 and TA3 differ, so the transpose of operand A cannot be reused. The embodiments of the present disclosure therefore modify the marking rule so that expression 3) can reuse the transpose of operand A from expression 1).

In view of this, in the embodiments of the present disclosure, in the stage of planning and scheduling the first tensor of the second Einstein operator, the subscripts of the first tensor of the second Einstein operator are generated according to the preset marking rule, which includes:

(1) The same elements in the same marking set of the first Einstein operator and of the second Einstein operator are sorted in the same order.

The marking sets are those shown in Table 1, namely ABO, AO, BO, AB, A, and B.

If operand A contains the elements "ibnd", operand B contains the elements "jbnd", and O contains the elements "bnij", then the order of the shared elements within each set must be the same in expression 1) and expression 3): for example "bn" in both ABO sets, rather than "bn" in the ABO set of expression 1) but "nb" in the ABO set of expression 3). The embodiments of the present disclosure therefore require that the same elements of the same marking set be arranged in the same order.

(2) The forward subscripts of the two operands are constructed differently, namely:

When the first tensor is the first operand of the second Einstein operator, the subscripts of the transpose of the first operand follow the order ABO, AO, AB; that is, the transpose of the first operand is obtained by concatenating the elements of the ABO, AO, and AB sets in that order. For example, if operand A is ibnd, the ABO set contains bn, the AO set contains i, and the AB set contains d, then concatenating the elements of the ABO, AO, and AB sets in order yields the transpose of operand A: bnid.

When the first tensor is the second operand of the second Einstein operator, the subscripts of the transpose of the second operand follow the order ABO, AB, BO; that is, when the operand is the second operand, its transpose is obtained by concatenating the elements of the ABO, AB, and BO sets in that order. For example, if operand B is jbnd, the ABO set contains bn, the AB set contains d, and the BO set contains j, then concatenating the elements of the ABO, AB, and BO sets in order yields the transpose of operand B: bndj.
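
A sketch of this concatenation rule, assuming the marking sets have already been computed and ordered (the helper name is hypothetical):

```python
def transpose_subscripts(sets, first_operand):
    """Concatenate marking sets: ABO|AO|AB for the first operand, ABO|AB|BO for the second."""
    order = ("ABO", "AO", "AB") if first_operand else ("ABO", "AB", "BO")
    return "".join(sets[name] for name in order)

# O = einsum("ibnd,jbnd->bnij", A, B): ABO = "bn", AO = "i", BO = "j", AB = "d".
sets = {"ABO": "bn", "AO": "i", "BO": "j", "AB": "d"}
assert transpose_subscripts(sets, first_operand=True) == "bnid"   # transpose of A
assert transpose_subscripts(sets, first_operand=False) == "bndj"  # transpose of B
```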

The meanings of the sets ABO, AO, and BO are as described above. For example, ABO is the first marking set, whose elements appear in both operands of the second Einstein operator and in the output of the second Einstein operator.

AO is the second marking set, whose elements appear in the first operand of the second Einstein operator and in the output of the second Einstein operator.

BO is the third marking set, whose elements appear in the second operand of the second Einstein operator and in the output of the second Einstein operator.

In the embodiments of the present disclosure, formulating the preset marking rule makes the second Einstein operator satisfy the condition for reusing the first-tensor transpose result of the first Einstein operator, ensuring that the first-tensor transpose result is reusable and providing the basis for simplifying the computation of the second Einstein operator. Caching the transpose results in the storage medium for reuse also reduces the number of calls to the transpose kernel.

In implementation, all three transposes of the forward computation can be stored in the target storage medium; for example, the transpose TA of operand A, the transpose TB of operand B, and the transpose TdO of the output gradient dO in expression 1) can all be stored for easy reuse.

In some embodiments, when the first tensor is the first operand of the first Einstein operator, and the second Einstein operator is used to determine the gradient of the second operand of the first Einstein operator, then in order to reuse as many forward-pass transposes as possible, the embodiments of the present disclosure determine the expression of the second Einstein operator to be the first target expression, shown as equation 4):

dO×A->dB    4)

In equation 4), dO denotes the gradient of the output result of the first Einstein operator, A denotes the first tensor, and dB denotes the gradient of the second operand of the first Einstein operator.

The read first-tensor transpose result is used as the transpose of A in the first target expression, the transpose of the gradient of the output result of the first Einstein operator, read from the target storage medium, is used as the transpose of dO in the first target expression, and the first target expression is executed based on the subscripts of the first tensor of the second Einstein operator to obtain the gradient of the second operand of the first Einstein operator.

Thus, in the embodiments of the present disclosure, the first target expression is deliberately designed for solving the gradient of the second operand of the first Einstein operator so that as many forward-stage transposes as possible can be reused. Increasing the number of reused transposes reduces the number of kernel calls and thereby saves computing resources.

Similarly, when the first tensor is the second operand of the first Einstein operator, and the second Einstein operator is used to determine the gradient of the first operand of the first Einstein operator, the expression of the second Einstein operator is determined to be the second target expression, which is the same as expression 2), namely B×dO->dA, where dO denotes the gradient of the output result of the first Einstein operator, B denotes the first tensor, and dA denotes the gradient of the first operand of the first Einstein operator.

The read first-tensor transpose result is used as the transpose of B in the second target expression, the transpose of the gradient of the output result of the first Einstein operator, read from the target storage medium, is used as the transpose of dO in the second target expression, and the second target expression is executed based on the subscripts of the first tensor of the second Einstein operator to obtain the gradient of the first operand of the first Einstein operator.

Thus, in the embodiments of the present disclosure, the second target expression is used for solving the gradient of the first operand of the first Einstein operator so that as many forward-stage transposes as possible can be reused. Increasing the number of reused transposes reduces the number of kernel calls and thereby saves computing resources.

In summary, in order to reuse the three transposes of expression 1) as much as possible, the embodiments of the present disclosure modify expression 3) of the original forward computation into expression 4), so that in the backward stage both expression 2) and expression 4) can reuse the three transposes of expression 1).
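
An end-to-end sketch of this reuse in NumPy for the running equation ibnd,jbnd->bnij (a hedged illustration assuming the layouts TA=bnid and TB=bndj produced by the marking rule; it is not the patent's kernel code):

```python
import numpy as np

# Forward: O = einsum("ibnd,jbnd->bnij", A, B), caching TA and TB.
A = np.random.rand(5, 2, 3, 4)                  # ibnd
B = np.random.rand(6, 2, 3, 4)                  # jbnd
TA = np.transpose(A, (1, 2, 0, 3))              # ibnd -> bnid (ABO|AO|AB)
TB = np.transpose(B, (1, 2, 3, 0))              # jbnd -> bndj (ABO|AB|BO)
O = TA @ TB                                     # (bn,i,d) @ (bn,d,j) -> bnij

# Backward, given the output gradient dO (subscripts bnij).
dO = np.random.rand(2, 3, 5, 6)
# Expression 2): dA reuses the cached TB; no new transpose of B is launched.
dA_mid = dO @ np.swapaxes(TB, -1, -2)           # bnij @ bnjd -> bnid
dA = np.transpose(dA_mid, (2, 0, 1, 3))         # bnid -> ibnd
# Expression 4): dB reuses the cached TA; no new transpose of A is launched.
dB_mid = np.swapaxes(TA, -1, -2) @ dO           # bndi @ bnij -> bndj
dB = np.transpose(dB_mid, (3, 0, 1, 2))         # bndj -> jbnd

assert np.allclose(dA, np.einsum("bnij,jbnd->ibnd", dO, B))
assert np.allclose(dB, np.einsum("bnij,ibnd->jbnd", dO, A))
```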

In some embodiments, because of the transpose operations and the preset marking rule, the finally determined gradient of an operand may not be directly usable for subsequent computation. The embodiments of the present disclosure therefore transpose the obtained gradient into a target gradient usable for subsequent computation. As shown in Figure 2, this can be implemented as:

S201: when the second Einstein operator performs the backward computation relative to the first Einstein operator, store the intermediate gradient in the target storage medium.

S202: read the intermediate gradient from the target storage medium, and call a kernel to transpose the subscripts of the intermediate gradient into the subscript order of the corresponding target operand, obtaining the target gradient of the target operand.

Here:

when the target operand is the first operand of the first Einstein operator, the gradient of the first operand of the first Einstein operator determined by the second Einstein operator is the intermediate gradient;

when the target operand is the second operand of the first Einstein operator, the gradient of the second operand of the first Einstein operator determined by the second Einstein operator is the intermediate gradient.

For example, suppose the intermediate gradient dB computed by reusing TdO and TA has the subscripts bndj. Expression 1) shows that the required target gradient dB has the subscripts jbnd, so a transpose operation is needed to convert the subscripts of the intermediate gradient into the subscript order of the target gradient, that is, to transpose bndj into jbnd.

In some implementations, the target storage medium stores at least the gradient of the first operand of the first Einstein operator determined by the second Einstein operator and the gradient of the second operand of the first Einstein operator determined by the second Einstein operator. When the intermediate gradients are transposed, the gradient of the first operand, or the gradient of the second operand, or both, can be read from the target storage medium.

In the embodiments of the present disclosure, the intermediate gradient obtained after the second Einstein operator performs the backward computation relative to the first Einstein operator is stored in the target storage medium, so that it can later be reused and transposed into the target gradient, which helps improve the accuracy of the computation.

It should be noted that, regardless of whether the solution of the embodiments of the present disclosure is used to reuse transpose results, the related art also needs to transpose the obtained dB and dA. In terms of the overall flow, therefore, the transpose of the intermediate gradients in the embodiments of the present disclosure does not increase the number of kernel calls.

In addition to the case where an Einstein operator has two operands, the embodiments of the present disclosure also consider the case where an Einstein operator has multiple operands. When an Einstein operator has multiple operands, the method can be extended as follows to reuse transpose results (a sketch follows the example below):

When the target Einstein operator includes n operands arranged in order, the target Einstein operator is decomposed into multiple first Einstein operators executed in order, as follows:

The two operands ranked at the first and second positions among the n operands are determined to be the two operands of the first Einstein operator corresponding to the operand at the second position, yielding the output result of the first Einstein operator corresponding to the operand at the second position.

When unprocessed operands remain among the n operands, the first-ranked unprocessed operand is determined to be the target operand, and the output result of the first Einstein operator corresponding to the operand preceding the target operand, together with the target operand itself, are used as the first operand and the second operand, respectively, of the first Einstein operator corresponding to the target operand, yielding the output result of the first Einstein operator corresponding to the target operand.

In some embodiments, for example, the target Einstein operator has three operands D, E, and F. The two operands D and E, ranked at the first and second positions, serve as the two operands of the first Einstein operator corresponding to the operand E at the second position, yielding the output result of the first Einstein operator corresponding to E.

At this point, the unprocessed operand F remains among the three operands D, E, and F. Since only operand F is left, F is the first-ranked unprocessed operand and hence the target operand. The operand preceding the target operand F is E. The output result of the first Einstein operator corresponding to E and the target operand F then serve as the first operand and the second operand, respectively, of the first Einstein operator corresponding to F. Following the two-operand procedure described above, the output result of the first Einstein operator corresponding to E is combined with the target operand F to obtain the output result of the first Einstein operator corresponding to F, and the loop ends when no unprocessed operands remain.
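
A minimal sketch of this pairwise folding (the function and the per-step equations are illustrative assumptions):

```python
import numpy as np

def chained_einsum(step_equations, operands):
    """Fold an n-operand Einstein operator into ordered two-operand ones."""
    result = operands[0]
    for equation, operand in zip(step_equations, operands[1:]):
        result = np.einsum(equation, result, operand)  # two-operand einsum
    return result

# Three operands D, E, F: (D, E) -> intermediate, then (intermediate, F) -> output.
D = np.random.rand(2, 3)
E = np.random.rand(3, 4)
F = np.random.rand(4, 5)
out = chained_einsum(["ij,jk->ik", "ik,kl->il"], [D, E, F])
assert np.allclose(out, np.einsum("ij,jk,kl->il", D, E, F))
```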

In the embodiments of the present disclosure, when an Einstein operator may have multiple operands, the complex Einstein operator can be decomposed into Einstein operators that take only two operands, and the transpose-reuse method of the embodiments of the present disclosure can then be applied. This reduces the number of kernel calls, saves computing resources, enables reuse of the output results of the Einstein operators, and improves the computational efficiency of the Einstein operators.

To facilitate understanding of the embodiments of the present disclosure, the overall flow of the data processing method of the embodiments is described with reference to Figure 3a:

As shown in Figure 3a, A and B are input as the two operands; the output C obtained from the first computation over A and B is differentiated, and dC is also taken as an input. A, B, and dC are transposed once, yielding TA, TB, and TdC in the intermediate layer; TA, TB, and TdC are the three reusable caches. Three Transpose operations are saved here: in the backward pass, TA, TB, and TdC need not be computed again. TA, TB, and TdC then undergo BMM (BatchedMatMul, batched matrix multiplication), producing the intermediate output results C, dB, and dA. These intermediate output results cannot be used directly as outputs; one more transpose operation is needed, and the transposed C, dB, and dA are the final usable output results.

For the specific computation process, take Figure 3b as an example (note that the table header in Figure 3b denotes the marking sets, which are distinct from the specific operands):

If the forward computation is O=paddle.einsum("ibnd,jbnd->bnij",A,B), then, since A is the first operand, when the forward subscripts of the two operands are required to differ, the transpose of the first operand is: ABO set, AO set, AB set, and the transpose of the second operand is: ABO set, AB set, BO set. This gives ABO|AO|AB->bnid, so the subscripts of TA1 are bnid.

The backward computation is dB=paddle.einsum("bnij,ibnd->jbnd",dO,A). Since operand A is now the second operand, the transpose of the second operand is: ABO set, AB set, BO set, and A, B, and O in the table header correspond to: A->dO, B->A, O->dB. The transpose of operand A is then: ABO|AB|BO->OAB|OA|AB->bnid, so TA2=bnid.

In summary, combining the forward and backward computation processes and adjusting the subscripts of the corresponding operands in the backward computation according to the marking rule finally makes TA1=TA2, satisfying the reuse condition.
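
This equality can be checked mechanically from the set definitions; the following sketch derives the sets from the equations and confirms TA1=TA2 (the function names are illustrative):

```python
def ordered(labels, subset):
    """Keep the characters of labels that belong to subset, preserving their order."""
    return "".join(c for c in labels if c in subset)

def transpose_subscripts(op, other, out, is_first):
    """Marking rule: the first operand uses ABO|AO|AB, the second uses ABO|AB|BO."""
    a, b, o = set(op), set(other), set(out)
    abo, ab, free = a & b & o, (a & b) - o, (a & o) - b
    if is_first:
        return ordered(op, abo) + ordered(op, free) + ordered(op, ab)
    return ordered(op, abo) + ordered(op, ab) + ordered(op, free)

# Forward O = einsum("ibnd,jbnd->bnij", A, B): A is the first operand.
TA1 = transpose_subscripts("ibnd", "jbnd", "bnij", is_first=True)
# Backward dB = einsum("bnij,ibnd->jbnd", dO, A): A is now the second operand.
TA2 = transpose_subscripts("ibnd", "bnij", "jbnd", is_first=False)
assert TA1 == TA2 == "bnid"  # the cached transpose of A can be reused
```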

After the above optimization, the TA, TB, and TdC reused in the backward pass are guaranteed to be correct, and the backward speed of the Einstein operator can be brought almost level with the Trace method. Experiments show that the optimized backward processing takes 35 ms, a 16% improvement over the 40 ms of the original composition method without transpose reuse.

In summary, experiments prove that by optimizing the backward-propagation process of the Einstein operator, the embodiments of the present disclosure can greatly increase the backward computation speed while consuming the same amount of device memory. At the same time, in the theoretically optimal case, the backward-propagation process of the Einstein operator can be shortened by six kernel operations, namely three Transpose operations and three ReduceSum operations.

Experiments show that the number of Transpose operations can be reduced to a very low level. Let A and B be the two inputs and O the output, so that dO is the gradient of the output. Since the three matrices O, dA, and dB must be obtained, and ignoring broadcasting and reduction, in the extreme (theoretically worst) case the kernel-call count is as follows: the theoretical minimum for the forward pass is one matmul plus two Transposes (producing TA and TB from A and B); the backward pass adds two matmul operations and one Transpose of dO; finally, there are the three Transposes of the outputs in Figure 3a. The total is therefore three input Transposes, three output Transposes, and three matmuls. It should be noted that this is a theoretical, worst-case conclusion; since the best planning algorithms in the related art achieve at best this same total number of kernel calls, the solution of the embodiments of the present disclosure retains its advantage. There are also other cases, for example when an output or input requires no Transpose, in which the method of the embodiments of the present disclosure can reduce the number of kernel calls even further. In short, in the embodiments of the present disclosure each variable theoretically needs to be Transposed only once, whereas in the related art the same variable may be Transposed multiple times. When Reduction and broadcasting are combined, the results of Reduction+Transpose and of broadcasting+Transpose can likewise be saved, so reducing the number of kernel calls through reuse also applies when Reduction and broadcasting are involved.

Taking the field of natural language processing as an example, the data processing method of the embodiments of the present disclosure is described with reference to Figure 4:

Figure 4 shows the XLNet (Generalized Autoregressive Pretraining for Transformer-XL) model; the einsum operator is applied in XLNet's Relative Attention, specifically in the two Masked Two-Stream Attention modules in Figure 4. Concretely, the einsum operator combines multiple complex linear operations on the embedding tensor of the text input, and finally outputs the attention weights (output attentions).

以在蛋白质结构预测领域为例,对本公开实施例的数据处理方法进行说明。例如在图5的灰色区域中,该模型包括:PLM(Protein Language Model,蛋白质语言模型)、适配器和几何模型。基于该蛋白结构预测模型,预测候选蛋白的结构可实施为,如图5的灰色区域所示:Taking the field of protein structure prediction as an example, the data processing method of the embodiment of the present disclosure will be described. For example, in the gray area of Figure 5, the model includes: PLM (Protein Language Model, protein language model), adapter and geometric model. Based on this protein structure prediction model, the structure of the candidate protein can be predicted as follows, as shown in the gray area of Figure 5:

基于蛋白结构预测模型中的蛋白语言模型,构建候选蛋白的一级结构信息和注意力图;其中,本公开实施例采用3亿条单序列(如图5的灰色区域中~300M primarysequences)对PLM模型进行训练。以使得PLM能够准确的提取一级结构信息和注意力图。Based on the protein language model in the protein structure prediction model, the primary structure information and attention map of the candidate protein are constructed; among them, the embodiment of the present disclosure uses 300 million single sequences (~300M primary sequences in the gray area of Figure 5) for the PLM model Conduct training. This enables PLM to accurately extract primary structure information and attention maps.

将候选蛋白的一级结构信息和注意力图输入蛋白结构预测模型的适配器层(Adaptor),得到候选蛋白的二级结构信息。如图5的灰色区域所示,适配器层后面虚线框示出了二级结构信息,该二级结构信息中包括单序列表示1(singel repr.)和配对表示1(pair repr.)。其中适配器层可包括两个线性层,其中一级结构信息输入其中一个线性层得到二级结构信息中的单序列表示1,注意力图输入另一个线性层得到二级结构信息中的配对表示1。Input the primary structure information and attention map of the candidate protein into the adapter layer (Adaptor) of the protein structure prediction model to obtain the secondary structure information of the candidate protein. As shown in the gray area of Figure 5, the dotted box behind the adapter layer shows secondary structure information, which includes single sequence representation 1 (singel repr.) and paired representation 1 (pair repr.). The adapter layer may include two linear layers, in which the primary structure information is input into one linear layer to obtain a single sequence representation 1 in the secondary structure information, and the attention map is input into the other linear layer to obtain a paired representation 1 in the secondary structure information.

将候选蛋白的二级结构信息输入几何模型,得到候选蛋白的三级结构信息。如图5的灰色区域所示,几何模型可以为AlphaFold(阿尔法折叠)模型的Geometric Modeling(几何模型)。以便于使用几何模型的结构预测能力准确的预测候选蛋白的结构。Input the secondary structure information of the candidate protein into the geometric model to obtain the tertiary structure information of the candidate protein. As shown in the gray area of Figure 5, the geometric model can be Geometric Modeling of the AlphaFold (Alpha fold) model. In order to use the structure prediction ability of the geometric model to accurately predict the structure of the candidate protein.

需要说明的是,AlphaFold模型中原始的EvoFormer(伊瓦创造模块)采用搜索到的MSA(Measurement Systems Analysis,测量系统分析)作为输入。作为替代方案,本公开实施例中采用适配器层的输出作为MSA,由此省去搜索MSA的过程,提高预测速度。其次,本公开实施例中Evoformer采用各种注意机制来交换单序列表示和配对表示中的信息,以学习到空间关系。It should be noted that the original EvoFormer (Eva creation module) in the AlphaFold model uses the searched MSA (Measurement Systems Analysis) as input. As an alternative, in the embodiment of the present disclosure, the output of the adapter layer is used as the MSA, thereby eliminating the process of searching for the MSA and improving the prediction speed. Secondly, in the embodiment of the present disclosure, Evoformer uses various attention mechanisms to exchange information in single sequence representation and paired representation to learn spatial relationships.

本公开实施例中,结构模块(Structure Module)采用EvoFormer产生的单序列表示和配对表示,并利用不变点注意和其他几何变换算子实现端到端的预测对接结构中原子的3D坐标。In the embodiment of the present disclosure, the Structure Module uses the single sequence representation and paired representation generated by EvoFormer, and uses invariant point attention and other geometric transformation operators to achieve end-to-end prediction of the 3D coordinates of atoms in the docking structure.

As noted above, the embodiments of the present disclosure train the PLM on 300 million single sequences (~300M primary sequences in the gray area of Figure 5). Since relying solely on the PLM to predict structures is insufficient to fully capture the required feature information, the PLMBase (i.e., the PLM) and Geometric Modeling modules in the protein structure prediction model (HelixFold-Single) are jointly optimized. The optimization training uses 100,000 experimentally determined protein structures (~120K determined structures in the gray area of Figure 5), together with an additional one million estimated protein structures (~1M estimated structures in the gray area of Figure 5). The network is trained end to end with the primary losses, including the Frame Aligned Point Error (FAPE) loss and other auxiliary losses. By combining the computationally efficient PLMBase module (compared with MSA search) and the Geometric Modeling module, HelixFold-Single is able to provide efficient and accurate protein structure prediction.

The Einstein operator is used in the protein structure prediction model shown in the gray area of Figure 5. The input represents the protein sequence, that is, the operand of the Einstein operator is the protein sequence; by composing multiple einsum operators, the attention weights (i.e., the attention map in the gray area of Figure 5) are output.
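As an illustration of composing einsum operators to produce attention weights, a toy single-head example follows; the shapes, weight matrices, and equation strings are assumptions for exposition, not the model's actual configuration:

```python
import numpy as np

# Toy single-head attention over a protein sequence, built from einsum calls.
L, d = 8, 16                                   # sequence length, head width
x = np.random.randn(L, d)                      # sequence representation (the operand)
Wq, Wk = np.random.randn(d, d), np.random.randn(d, d)

q = np.einsum('ld,de->le', x, Wq)              # queries
k = np.einsum('ld,de->le', x, Wk)              # keys
logits = np.einsum('id,jd->ij', q, k) / np.sqrt(d)

# Softmax over the last axis yields the (L, L) attention map.
attn = np.exp(logits - logits.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)
```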

In the field of protein structure prediction, reusing the transpose of the forward-propagated protein sequence by the method of the embodiments of the present disclosure reduces the number of kernel launch calls, saves computing resources, and improves the efficiency of model pre-training.

Taking the field of machine vision as an example, the data processing method of the embodiments of the present disclosure is described below. For example, in autonomous driving control, images of the vehicle's surroundings are captured by the vehicle's image acquisition device, and features of the surroundings are extracted from the images by the feature extraction module of a vision model. In implementation, the extracted features may serve as an operand of an Einstein operator, and the transpose of the features may be cached so that, in the subsequent process of adjusting the model parameters of the vision model, the cached transpose result can be reused when back-propagating to solve for the gradients of the model parameters.
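A minimal sketch of this cache-and-reuse pattern, with NumPy standing in for the kernel and a dictionary standing in for the target storage medium; the equation 'ji,jk->ik', the shapes, and the names are illustrative assumptions, not the framework's actual scheduler:

```python
import numpy as np

cache = {}  # illustrative stand-in for the target storage medium

def forward(feat, w):
    # Forward einsum 'ji,jk->ik': planning brings `feat` into the canonical
    # (i, j) layout, so the transpose kernel runs once and is cached.
    feat_t = np.ascontiguousarray(feat.T)
    cache['feat_t'] = feat_t
    return feat_t @ w                      # out[i,k] = sum_j feat[j,i] w[j,k]

def grad_w(d_out):
    # Backward einsum dO x A -> dB ('ik,ji->jk') also needs `feat` in the
    # (i, j) layout, so the cached transpose is reused: no new kernel launch.
    feat_t = cache['feat_t']
    return np.einsum('ij,ik->jk', feat_t, d_out)  # dW[j,k] = sum_i feat[j,i] dO[i,k]

feat = np.random.randn(4, 3)   # features, labels 'ji'
w = np.random.randn(4, 5)      # weights, labels 'jk'
out = forward(feat, w)
dW = grad_w(np.ones_like(out))
assert np.allclose(dW, np.einsum('ik,ji->jk', np.ones_like(out), feat))
```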

In the field of machine vision, reusing the transpose of the forward-propagated image features by the method of the embodiments of the present disclosure reduces the number of kernel calls, saves computing resources, and improves the efficiency of model pre-training.

It should be noted that reusing the cached transpose result when back-propagating to solve for the gradients of model parameters is applicable not only to the field of machine vision, but also to other fields, such as natural language processing and protein structure prediction.

Based on the same technical concept, embodiments of the present disclosure further provide a data processing apparatus, as shown in Figure 6, including:

a first acquisition module, configured to acquire the first tensor required by the first Einstein operator;

a calling module, configured to call a kernel to perform a transpose operation on the first tensor to obtain a first tensor transpose result;

a first storage module, configured to store the first tensor transpose result in a target storage medium;

a generation module, configured to generate, in the planning and scheduling stage for the first tensor of the second Einstein operator, the subscripts of the first tensor of the second Einstein operator according to preset marking rules, where the preset marking rules satisfy the requirement that the first tensor transpose result can be reused;

a reading module, configured to read the first tensor transpose result from the target storage medium when the second Einstein operator needs to reuse the first tensor transpose result; and

an execution module, configured to execute the operation process of the second Einstein operator based on the read first tensor transpose result and the subscripts of the first tensor of the second Einstein operator.

In some embodiments, the preset marking rules include:

the same elements in the same marking set of the first Einstein operator and the second Einstein operator are sorted in the same order;

when the first tensor is the first operand of the second Einstein operator, the subscripts of the transpose of the first operand follow the order ABO, AO, AB; and

when the first tensor is the second operand of the second Einstein operator, the subscripts of the transpose of the second operand follow the order ABO, AB, BO;

where ABO is a first marking set, whose elements are contained in both operands of the second Einstein operator and in the output result of the second Einstein operator;

AO is a second marking set, whose elements are contained in the first operand of the second Einstein operator and in the output result of the second Einstein operator; and

BO is a third marking set, whose elements are contained in the second operand of the second Einstein operator and in the output result of the second Einstein operator.
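These marking sets can be read directly off an einsum equation's label strings. A minimal sketch under that assumption (the contracted set AB is included for completeness, since it appears in the subscript orders above):

```python
def label_sets(lhs_a: str, lhs_b: str, out: str):
    """Classify einsum labels into the marking sets described above (sketch)."""
    A, B, O = set(lhs_a), set(lhs_b), set(out)
    ABO = A & B & O        # in both operands and in the output (batch dims)
    AB = (A & B) - O       # in both operands, contracted away
    AO = (A & O) - B       # only in the first operand and the output
    BO = (B & O) - A       # only in the second operand and the output
    return ABO, AO, BO, AB

# e.g. for 'bik,bkj->bij': ABO={'b'}, AO={'i'}, BO={'j'}, AB={'k'},
# so the canonical first-operand order ABO, AO, AB is (b, i, k).
print(label_sets('bik', 'bkj', 'bij'))
```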

In some embodiments, the execution module is configured to:

when the first tensor is the first operand of the first Einstein operator and the second Einstein operator is used to determine the gradient of the second operand of the first Einstein operator, determine the expression of the second Einstein operator as a first target expression, dO×A->dB, where dO denotes the gradient of the output result of the first Einstein operator, A denotes the first tensor, and dB denotes the gradient of the second operand of the first Einstein operator; and

use the read first tensor transpose result as the transpose of A in the first target expression, use the transpose of the gradient of the output result of the first Einstein operator read from the target storage medium as the transpose of dO in the first target expression, and execute the first target expression based on the subscripts of the first tensor of the second Einstein operator to obtain the gradient of the second operand of the first Einstein operator.

In some embodiments, the execution module is further configured to:

when the first tensor is the second operand of the first Einstein operator and the second Einstein operator is used to determine the gradient of the first operand of the first Einstein operator, determine the expression of the second Einstein operator as a second target expression, B×dO->dA, where dO denotes the gradient of the output result of the first Einstein operator, B denotes the first tensor, and dA denotes the gradient of the first operand of the first Einstein operator; and

use the read first tensor transpose result as the transpose of B in the second target expression, use the transpose of the gradient of the output result of the first Einstein operator read from the target storage medium as the transpose of dO in the second target expression, and execute the second target expression based on the subscripts of the first tensor of the second Einstein operator to obtain the gradient of the first operand of the first Einstein operator.
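The two target expressions correspond to the standard einsum gradient identities. A small NumPy check, with shapes and labels chosen purely for illustration:

```python
import numpy as np

A = np.random.randn(3, 4)   # first operand, labels 'ik'
B = np.random.randn(4, 5)   # second operand, labels 'kj'
C = np.einsum('ik,kj->ij', A, B)
dO = np.ones_like(C)        # upstream gradient of the output

# First target expression  dO x A -> dB  (gradient of the second operand)
dB = np.einsum('ij,ik->kj', dO, A)
# Second target expression B x dO -> dA  (gradient of the first operand)
dA = np.einsum('kj,ij->ik', B, dO)

assert np.allclose(dB, A.T @ dO) and np.allclose(dA, dO @ B.T)
```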

In some embodiments, the data processing apparatus further includes:

a second storage module, configured to store an intermediate gradient in the target storage medium when the second Einstein operator performs a reverse operation relative to the first Einstein operator; and

a second acquisition module, configured to read the intermediate gradient from the target storage medium, and call the kernel to perform a transpose operation so that the subscripts of the gradient follow the subscript order of the corresponding target operand, thereby obtaining the target gradient of the target operand;

where:

when the target operand is the first operand of the first Einstein operator, the gradient of the first operand of the first Einstein operator determined by the second Einstein operator is the intermediate gradient; and

when the target operand is the second operand of the first Einstein operator, the gradient of the second operand of the first Einstein operator determined by the second Einstein operator is the intermediate gradient.
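In other words, the intermediate gradient is produced in the planner's layout, and one final transpose restores the operand's own subscript order. A minimal sketch, assuming single-character einsum labels (the labels and shapes are illustrative):

```python
import numpy as np

def finalize_grad(intermediate, grad_labels, operand_labels):
    # The backward einsum yields the gradient with subscripts `grad_labels`;
    # one final transpose kernel reorders it to the operand's own subscript
    # order, giving the target gradient.
    perm = [grad_labels.index(c) for c in operand_labels]
    return np.ascontiguousarray(np.transpose(intermediate, perm))

# e.g. an intermediate dA produced as 'ki' while operand A is laid out 'ik':
dA_inter = np.random.randn(4, 3)            # axes (k, i)
dA = finalize_grad(dA_inter, 'ki', 'ik')    # axes (i, k)
assert dA.shape == (3, 4)
```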

In some embodiments, the first Einstein operator has two operands, and the apparatus further includes a splitting module, configured to:

when a target Einstein operator includes n operands arranged in order, decompose the target Einstein operator into a plurality of first Einstein operators executed in order based on the following method:

determine the two operands at the first and second positions among the n operands as the operands of the first Einstein operator corresponding to the operand at the second position, to obtain the output result of the first Einstein operator corresponding to the operand at the second position; and

when there are unprocessed operands among the n operands, determine the first-ranked operand among the unprocessed operands as the target operand; and

take the output result of the first Einstein operator corresponding to the operand preceding the target operand, and the target operand itself, as the first operand and the second operand, respectively, of the first Einstein operator corresponding to the target operand, to obtain the output result of the first Einstein operator corresponding to the target operand.
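A minimal sketch of this left-to-right decomposition, where the binary equations are written out by hand for clarity (a real planner would derive them from the n-operand equation):

```python
import numpy as np

def chained_einsum(binary_eqs, operands):
    # Fold an n-operand einsum into n-1 binary einsums executed in order:
    # each output becomes the first operand of the next first Einstein
    # operator, and the next unprocessed operand becomes the second.
    acc = operands[0]
    for eq, op in zip(binary_eqs, operands[1:]):
        acc = np.einsum(eq, acc, op)
    return acc

# 'ab,bc,cd->ad' decomposed into two ordered binary operators:
a, b, c = np.random.randn(2, 3), np.random.randn(3, 4), np.random.randn(4, 5)
out = chained_einsum(['ab,bc->ac', 'ac,cd->ad'], [a, b, c])
assert np.allclose(out, np.einsum('ab,bc,cd->ad', a, b, c))
```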

In some embodiments, the first Einstein operator includes two operands, and the first tensor is either of the two operands.

For descriptions of the specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the relevant description of the corresponding steps in the above method embodiments, which will not be repeated here.

In the technical solution of the present disclosure, the acquisition, storage, and application of the user's personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

Figure 7 shows a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in Figure 7, the device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Multiple components in the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays or speakers; a storage unit 708, such as a magnetic disk or an optical disc; and a communication unit 709, such as a network card, a modem, or a wireless communication transceiver. The communication unit 709 allows the device 700 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.

The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, and the like. The computing unit 701 performs the methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the data processing method in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described herein above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (for example, a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact over a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.

The above specific embodiments do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, improvement, or the like made within the principles of the present disclosure shall be included in the protection scope of the present disclosure.
