CN114443259B - Data processing method, device, computer equipment and storage medium - Google Patents

Data processing method, device, computer equipment and storage medium

Info

Publication number
CN114443259B
Authority
CN
China
Prior art keywords
node
constant
output
nodes
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011193626.0A
Other languages
Chinese (zh)
Other versions
CN114443259A (en)
Inventor
Name withheld at inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd
Priority to CN202011193626.0A
Publication of CN114443259A
Application granted
Publication of CN114443259B
Status: Active
Anticipated expiration

Abstract

The present disclosure relates to a data processing method, apparatus, computer device, and storage medium. The apparatus includes a processor comprising a plurality of processing units for executing sequences of instructions that update a computational graph and process data, and a memory unit for storing data, which may include random access memory (RAM) and a register file. The plurality of processing units in the processor may share part of the memory space, such as a shared portion of the RAM memory space and the register file, while also having separate memory spaces. By using this processor, the method and apparatus can improve the operating efficiency of related products when running a neural network model.

Description

Data processing method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, data-flow programming has emerged; for convenience and flexibility of programming, fine-grained operators are used to express computation and are spliced together into a deep neural network. This means that hundreds or thousands of nodes in the network must be executed in turn during inference. For each operator, the processor must call its kernel function and copy data from global memory on-chip, so the performance overhead includes, in addition to the computation itself, the data copying between nodes and the kernel start-up. To reduce these overheads, it is common to combine fine-grained nodes that can all execute on the same device into one large fused node before actually performing the computation. At execution time only the fused node needs to be run, which reduces the number of kernel-function calls and also reduces the data-copying cost inside the fused node. However, among the nodes of the computational graph there may be nodes that do not support fusion, causing the fused graph to be segmented and degrading the optimization.
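The overhead trade-off described above can be illustrated with a minimal sketch (the operators and names are illustrative, not from the patent): two fine-grained elementwise operators each cost a "kernel call" and an intermediate buffer, while the fused form does the same work in a single pass.

```python
# Illustrative sketch: fusing fine-grained elementwise operators into one
# "fused node" so a single kernel call replaces many.
def scale(xs, k):       # fine-grained operator 1
    return [x * k for x in xs]

def shift(xs, b):       # fine-grained operator 2
    return [x + b for x in xs]

def run_unfused(xs):
    # Two separate "kernel calls"; the intermediate list is written out
    # and read back, analogous to copies through global memory.
    return shift(scale(xs, 2), 1)

def run_fused(xs):
    # One fused node: a single pass, no intermediate buffer.
    return [x * 2 + 1 for x in xs]

assert run_unfused([1, 2, 3]) == run_fused([1, 2, 3]) == [3, 5, 7]
```

A node that cannot participate in fusion splits such a chain in two, so both halves pay their own launch and copy costs.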
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, computer device, and storage medium capable of improving the operation efficiency.
According to one aspect of the disclosure, a data processing method is provided, which comprises the steps of determining a first constant node which does not support fusion in a computational graph according to types of a plurality of nodes in the computational graph, performing constant elimination processing according to the first constant node to obtain an elimination result, and updating the computational graph according to the elimination result to obtain an updated computational graph.
In one possible implementation manner, the performing of constant elimination processing according to the first constant node to obtain an elimination result includes removing the first constant node and the input edges and output edges of the first constant node to obtain the elimination result.
In one possible implementation manner, the first constant node comprises a third constant node, wherein the node pointed by the output edge of the third constant node comprises a non-constant node, and the removing of the first constant node, the input edge and the output edge of the first constant node comprises obtaining output information of the third constant node according to data to be processed of the input calculation graph, storing the output information of the third constant node, and removing the input edge and the output edge of the third constant node.
In one possible implementation manner, the first constant node comprises a second constant node, wherein nodes pointed by output edges of the second constant node are all the first constant node, and the removing of the input edges and the output edges of the first constant node comprises directly removing the input edges and the output edges of the second constant node.
In one possible implementation, the first constant node includes a second constant node and a third constant node, and the removing the first constant node, the input edge and the output edge of the first constant node includes removing the second constant node and the third constant node, and the input edge and the output edge of the second constant node and the third constant node simultaneously.
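The elimination step described in the implementations above can be sketched as follows. The data structures and function names are hypothetical (the patent does not prescribe an implementation): a "third" constant node, whose output feeds non-constant nodes, has its output precomputed and stored before its edges are removed; a "second" constant node, whose outputs feed only other first constant nodes, is removed directly.

```python
def eliminate_constants(graph, second_nodes, third_nodes, evaluate):
    """Remove constant nodes and their edges from a graph given as
    {node: set_of_successor_nodes}. `evaluate(node)` precomputes the
    output of a third constant node on the batch's input data."""
    stored = {}
    for n in third_nodes:
        stored[n] = evaluate(n)          # keep the constant output for consumers
    for n in second_nodes | third_nodes:
        graph.pop(n, None)               # drop the node and its output edges
        for succs in graph.values():
            succs.discard(n)             # drop edges pointing at it
    return graph, stored

# c1 -> c2 -> add <- x : c1 is a second constant node, c2 a third one.
g = {"c1": {"c2"}, "c2": {"add"}, "x": {"add"}, "add": set()}
g2, stored = eliminate_constants(g, {"c1"}, {"c2"}, lambda n: 42)
assert "c1" not in g2 and "c2" not in g2 and stored == {"c2": 42}
```

Storing the third node's output lets the downstream non-constant node (`add` here) read a cached value instead of re-running the eliminated subgraph.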
In one possible implementation manner, the method for determining the first constant node which does not support fusion in the computation graph according to the types of the plurality of nodes in the computation graph comprises the steps of determining a first target node of a preset type in the plurality of nodes in the computation graph according to the types of the plurality of nodes in the computation graph, performing constant discrimination processing on the output node of the first target node to obtain a second target node with constant input information, wherein the output node of the first target node comprises a direct output node and an indirect output node, the direct output node comprises a node to which an output edge of the first target node directly points, the indirect output node comprises a node to which an output edge of the first target node indirectly points through other nodes, and obtaining the first constant node according to the first target node and the second target node.
In one possible implementation manner, constant judging processing is carried out on the output node of the first target node to obtain a second target node with constant input information, the constant judging processing comprises judging whether input information of an ith direct output node of the first target node only comprises constant information or not, i is a positive integer, determining the ith direct output node of the first target node as a second target node when the input information of the ith direct output node of the first target node only comprises constant information, judging whether input information of a jth indirect output node of the first target node corresponding to the ith direct output node of the first target node only comprises constant information or not, j is a positive integer, and determining the jth indirect output node of the first target node as the second target node when the input information of the jth indirect output node of the first target node only comprises constant information.
In one possible implementation manner, the constant judging process is performed on the output node of the first target node to obtain a second target node with constant input information, and the constant judging process is stopped on the remaining indirect output nodes corresponding to the i direct output node of the first target node or the j indirect output node of the first target node, wherein the remaining indirect output nodes comprise indirect output nodes which do not perform the constant judging process in the i direct output node of the first target node or the j indirect output node of the first target node when the i direct output node of the first target node or the j indirect output node of the first target node comprises non-constant information.
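The constant discrimination described in the three implementations above, including the early-stop rule for remaining indirect output nodes, can be sketched roughly as a frontier walk (a simplified single-pass sketch with assumed data structures, not the patent's implementation):

```python
def find_second_targets(graph, inputs, constant):
    """Walk the direct and indirect output nodes of the first target
    nodes. A node whose inputs are all constant becomes a second target;
    traversal stops below any node with non-constant input.
    `graph` maps node -> successors, `inputs` maps node -> predecessors,
    `constant` is the initial set of known-constant (first target) nodes."""
    const = set(constant)
    targets = set()
    frontier = [s for n in constant for s in graph.get(n, ())]
    while frontier:
        node = frontier.pop()
        if node in const:
            continue
        if all(p in const for p in inputs.get(node, ())):
            const.add(node)                  # inputs all constant -> second target
            targets.add(node)
            frontier.extend(graph.get(node, ()))
        # else: non-constant input; stop descending through this node,
        # skipping its remaining indirect output nodes.
    return targets

# s (Shape) -> a -> b, with b also fed by non-constant x: only a qualifies.
graph = {"s": {"a"}, "a": {"b"}, "b": set(), "x": {"b"}}
inputs = {"a": {"s"}, "b": {"a", "x"}}
assert find_second_targets(graph, inputs, {"s"}) == {"a"}
```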
In one possible implementation, the data to be processed includes at least one of an image, a video, a voice, and a text.
According to one aspect of the disclosure, a data processing device is provided, which comprises a determining module, a canceling module and an updating module, wherein the determining module is used for determining a first constant node which does not support fusion in a computational graph according to types of a plurality of nodes in the computational graph, the canceling module is used for performing constant canceling processing according to the first constant node to obtain a canceling result, and the updating module is used for updating the computational graph according to the canceling result to obtain an updated computational graph.
In one possible implementation, the cancellation module is further configured to remove the first constant node, the input edge and the output edge of the first constant node, and obtain the cancellation result.
In one possible implementation manner, the first constant node comprises a third constant node, wherein the node pointed by the output edge of the third constant node comprises a non-constant node, and the elimination module is further used for obtaining the output information of the third constant node according to the data to be processed input into the calculation graph, storing the output information of the third constant node, and eliminating the input edge and the output edge of the third constant node.
In one possible implementation manner, the first constant node comprises a second constant node, wherein nodes pointed by output edges of the second constant node are all the first constant node, and the elimination module is further used for directly eliminating the second constant node, input edges and output edges of the second constant node.
In one possible implementation, the first constant node includes a second constant node and a third constant node, and the cancellation module is further configured to simultaneously remove the second constant node and the third constant node, and input edges and output edges of the second constant node and the third constant node.
In one possible implementation manner, the determining module is further configured to determine a first target node of a preset type among the plurality of nodes in the computation graph according to the types of the plurality of nodes in the computation graph, perform constant discrimination processing on an output node of the first target node to obtain a second target node with constant input information, where the output node of the first target node includes a direct output node and an indirect output node, the direct output node includes a node to which an output edge of the first target node points directly, the indirect output node includes a node to which an output edge of the first target node points indirectly via other nodes, and obtain the first constant node according to the first target node and the second target node.
In one possible implementation manner, the determining module is further configured to determine whether input information of an ith direct output node of the first target node includes only constant information, i is a positive integer, determine the ith direct output node of the first target node as a second target node if the input information of the ith direct output node of the first target node includes only constant information, and determine whether input information of a jth indirect output node of the first target node corresponding to the ith direct output node of the first target node includes only constant information, j is a positive integer, and determine the jth indirect output node of the first target node as a second target node if the input information of the jth indirect output node of the first target node includes only constant information.
In one possible implementation manner, the determining module is further configured to stop performing the constant discrimination processing on the remaining indirect output nodes corresponding to the i-th direct output node of the first target node or the j-th indirect output node of the first target node, where the i-th direct output node of the first target node or the j-th indirect output node of the first target node includes non-constant information, and the remaining indirect output nodes include indirect output nodes that do not perform the constant discrimination processing in the i-th direct output node of the first target node or the j-th indirect output node of the first target node.
In one possible implementation, the data to be processed includes at least one of an image, a video, a voice, and a text.
According to an aspect of the present disclosure, there is provided an artificial intelligence chip comprising the data processing apparatus.
According to an aspect of the present disclosure, there is provided an electronic device including the artificial intelligence chip.
According to one aspect of the disclosure, a board card is provided, which comprises a storage device, an interface device, a control device and the artificial intelligent chip, wherein the artificial intelligent chip is respectively connected with the storage device, the control device and the interface device, the storage device is used for storing data, the interface device is used for realizing data transmission between the artificial intelligent chip and external equipment, and the control device is used for monitoring the state of the artificial intelligent chip.
According to an aspect of the disclosure, a computer device is provided, comprising a processor, a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the data processing method.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the data processing method.
According to the embodiments of the disclosure, to address the problem that nodes of specific types do not support fusion, causing fusion segmentation and thus a poor fusion effect, the computational graph is optimized by eliminating the first constant nodes that do not support fusion. This reduces segmentation of the computational graph, improves the fusion effect, reduces the processing overhead of data calls and transfers, improves the computational efficiency of the graph, and brings the computing performance of the processor into full play.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a schematic diagram of a processor of a data processing method according to an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a first constant node according to an embodiment of the present disclosure;
fig. 4A and 4B are diagrams showing an application example of a data processing method according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating a combination processing device according to an embodiment of the present disclosure;
fig. 7 is a schematic view showing the structure of a board according to an embodiment of the present disclosure;
Fig. 8 illustrates a block diagram of a computer device, according to an embodiment of the present disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
It should be understood that the terms "first," "second," and the like in the claims, specification and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of this disclosure are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in this disclosure and in the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting the [described condition or event]", or "in response to detecting the [described condition or event]".
The data processing method of the embodiments of the disclosure can be applied to a processor to improve the processing efficiency of the processor. The processor may be a general-purpose processor such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like; the machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field-Programmable Gate Array) chip. The present disclosure does not limit the specific type of processor.
In one possible implementation, the processor referred to in this disclosure may comprise multiple processing units, each of which may independently run the various tasks assigned to it, such as convolution tasks, pooling tasks, or fully-connected tasks. The present disclosure does not limit the tasks that a processing unit runs.
Fig. 1 shows a schematic diagram of a processor of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the processor 100 includes a plurality of processing units 101 and a memory unit 102; the plurality of processing units 101 are configured to execute sequences of instructions, and the memory unit 102 is configured to store data, which may include random access memory (RAM) and a register file. The plurality of processing units 101 in the processor 100 may share part of the memory space, such as a shared portion of the RAM memory space and the register file, while also having separate memory spaces.
Fig. 2 shows a flow chart of a data processing method according to an embodiment of the present disclosure. As shown in fig. 2, the method is applied to the above processor, and includes:
in step S11, a first constant node that does not support fusion in the computational graph is determined according to the types of a plurality of nodes in the computational graph;
in step S12, constant elimination processing is performed according to the first constant node to obtain an elimination result;
in step S13, the computational graph is updated according to the elimination result to obtain an updated computational graph.
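Steps S11-S13 can be sketched end to end as a single pass over the graph. This is a hedged simplification: it treats every node of a non-fusable type as a first constant node, whereas the discrimination of which such nodes are actually constant is elaborated later in the disclosure; the type names and graph representation are illustrative.

```python
def optimize_graph(graph, node_types, fusable_types):
    """Sketch of steps S11-S13: find nodes that do not support fusion
    (S11), eliminate them and their edges (S12), and return the updated
    computational graph (S13)."""
    # S11: candidate first constant nodes - types that cannot be fused
    first_const = {n for n, t in node_types.items() if t not in fusable_types}
    # S12 + S13: drop those nodes and every edge pointing at them
    return {n: {s for s in succs if s not in first_const}
            for n, succs in graph.items() if n not in first_const}

g = {"shape": {"reshape"}, "conv": {"reshape"}, "reshape": set()}
types = {"shape": "Shape", "conv": "Conv", "reshape": "Reshape"}
assert optimize_graph(g, types, {"Conv", "Reshape"}) == \
    {"conv": {"reshape"}, "reshape": set()}
```

After the Shape node is eliminated, the remaining `conv -> reshape` chain is contiguous and can be fused without segmentation.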
According to the embodiments of the disclosure, to address the problem that nodes of specific types do not support fusion, causing fusion segmentation and thus a poor fusion effect, the computational graph is optimized by eliminating the first constant nodes that do not support fusion. This reduces segmentation of the computational graph, improves the fusion effect, reduces the processing overhead of data calls and transfers, improves the computational efficiency of the graph, and brings the computing performance of the processor into full play.
In one possible implementation, the above steps may be stored in the memory unit 102 in the form of computer-executable instructions, which the plurality of processing units 101 can access and execute to perform the above steps. For example, the processor 100 may include a plurality of processing units 101: a first processing unit 101 may perform step S11, a second processing unit may perform step S12, and a third processing unit may perform step S13. In another example, two or more steps may be performed by one processing unit 101; for example, the first processing unit 101 performs steps S11 and S12 while the second processing unit performs step S13, or steps S11, S12, and S13 are all performed by the same processing unit 101. The present disclosure does not limit the manner in which the steps are distributed.
In an example, the first processing unit 101 may be a CPU, the second processing unit may be a GPU, and the third processing unit may be an IPU; the CPU may perform step S11, the GPU may perform step S12, and the IPU may perform step S13. A single processor may also perform a plurality of steps; for example, the GPU may perform steps S11-S13. The present disclosure does not limit the type of processor.
In one possible implementation, the computational graph may be a data relationship graph representing an information processing flow. A processor such as a CPU, IPU, GPU, NPU, or machine learning unit (MLU) may process information using the information processing flow described by the computational graph and obtain the processing result.
In an example, the computational graph of the neural network may be executed by a GPU, an MLU, or the like, to process data to be processed that includes at least one of images, video, voice, and text. For example, if the data to be processed is an image, the image can be input into a GPU, and the GPU can process the image according to the information processing flow represented by the computational graph to obtain an image processing result.
In one possible implementation, the computation graph may include nodes and edges, where the nodes represent operators, the data may be processed by the operators, for example, the nodes may include predicate nodes, the predicate nodes may compare and/or determine the data by the predicate operators, the edges represent transmission paths of the data streams, the input edges of the nodes represent the data streams that input the nodes, the output edges of the nodes represent the data streams that output the processing results of the data by the nodes, and the output edges of one node may be the input edges of another node, for example, the node a transmits the processing results to the node B through the output edges, and the output edges of the node a are the input edges of the node B. In the process of processing according to the information processing flow represented by the computation graph, the data to be processed can be input into the computation graph through the input nodes, the input nodes can represent the input ports of the whole computation graph, and the input can enter the computation graph through the input nodes and be transmitted to other nodes for processing by the output edges of the input nodes. After the processing of the nodes, the processing results are transmitted to the nodes pointed by the nodes through the output edges of the nodes, the data to be processed can be processed according to the flow, and the output nodes output the processing results of the data to be processed, wherein the output nodes refer to the output ports of the whole computational graph.
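The node-and-edge structure just described (an output edge of node A being an input edge of node B) can be modeled minimally as follows; the class and function names are illustrative only.

```python
class Node:
    """A computational-graph node: an operator with input and output edges."""
    def __init__(self, name, op):
        self.name, self.op = name, op
        self.inputs, self.outputs = [], []   # edges held as node references

def connect(src, dst):
    # src's output edge is simultaneously dst's input edge (node A -> node B)
    src.outputs.append(dst)
    dst.inputs.append(src)

a = Node("A", op=lambda x: x + 1)
b = Node("B", op=lambda x: x * 2)
connect(a, b)
assert b.inputs[0] is a and a.outputs[0] is b
```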
In one possible implementation, the computational graph may include a plurality of types of nodes. For example, the node types include computation nodes used for computation, such as convolution nodes; logic processing nodes used for control, such as the nodes where operators like Switch and Merge are located; and nodes used for querying the shape of an input, such as Shape nodes, Rank nodes, and Size nodes. Optionally, a Shape node is used to obtain the size or dimensions of the input information (e.g., a matrix or an image), a Rank node is used to obtain the rank of the input information, and a Size node is used to obtain the total size of the input information. Optionally, an image classification or detection network may perform a Shape operation on the input to obtain its size and perform subsequent operations (e.g., reshape) based on that size. Since the specific shape of the input is variable, most of the subsequent operations are shape-dependent; a Shape or Rank operation that queries the input's shape cannot be set to a constant, and none of its subsequent operations can be constant-optimized. The present disclosure does not limit the type of node.
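For a nested-list "tensor", the three shape-query operators mentioned above behave roughly as follows (a pure-Python approximation of the semantics, not the patent's operators):

```python
def shape(t):
    """Shape: the size of each dimension, e.g. a 2x3 matrix -> [2, 3]."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims

def rank(t):
    """Rank: the number of dimensions."""
    return len(shape(t))

def size(t):
    """Size: the total number of elements."""
    n = 1
    for d in shape(t):
        n *= d
    return n

m = [[1, 2, 3], [4, 5, 6]]            # a 2x3 "matrix"
assert shape(m) == [2, 3] and rank(m) == 2 and size(m) == 6
```

When every input in a batch has the same shape, all three results are fixed values, which is exactly what makes these nodes candidates for constant elimination.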
In one possible implementation, a node may obtain attribute parameters of the input information, such as size and dimensions. Among the above nodes there are nodes that do not support fusion, such as the nodes used to query the input shape. Optionally, this may be because the data type of the node's output (for example, int64) does not support fusion processing on the processor, or because the node's output becomes a variable when the attribute parameters of the input information change; the present disclosure does not limit the reasons why fusion is not supported. In an example, the neural network may receive input images of different sizes, i.e., parameters such as the length (N), width (W), and height (H) of each input image (e.g., a three-dimensional image) are not consistent with one another. In such a node (e.g., a Shape node), a placeholder may be used in place of parameters such as length, width, and height, and the output of the node varies as the size of the input image changes. Therefore, such a node cannot be optimized as a constant node, and may cause fusion segmentation, which in turn affects the fusion effect.
However, when processing images using neural networks, multiple inputs are typically processed in batches, and the attribute parameters of input data in the same batch are typically kept consistent; for example, multiple video frames of a video may be input, with the sizes of the video frames kept consistent. If a video segment includes 10000 video frames of the same size, the node's processing result for those 10000 frames remains unchanged. In this case, recomputing the node for every input frame not only wastes computing resources but also worsens the fusion effect of the plurality of nodes in the neural network's computational graph, resulting in fusion segmentation and making the fusion effect difficult to improve.
In one possible implementation manner, for the above problem, a first constant node whose output information is constant may be determined among a plurality of nodes in the computation graph, so as to optimize the first constant node whose output information is constant when the attribute parameters of the batch input data remain unchanged, so as to improve the processing efficiency and the fusion effect.
In one possible implementation, the first constant node refers to a node that does not support fusion in the computation graph and whose output information is constant.
In one possible implementation, the first constant node may be determined among a plurality of nodes of the computational graph. The method further comprises the steps of determining a first target node of a preset type in the plurality of nodes of the computation graph according to the types of the plurality of nodes in the computation graph, performing constant discrimination processing on the output node of the first target node to obtain a second target node with constant input information, wherein the output node of the first target node comprises a direct output node and an indirect output node, the direct output node comprises a node directly pointed by an output edge of the first target node, the indirect output node comprises a node indirectly pointed by an output edge of the first target node through other nodes, and the first constant node is obtained according to the first target node and the second target node.
In one possible implementation, the preset-type first target node includes a node whose output information can be directly determined to be constant. As described above, if the attribute parameters of the batch data input to the computational graph are consistent, the output information of nodes of the above types is constant, and the preset-type first target nodes in the computational graph may be found according to the type of each node. The preset-type first target nodes may include the above-mentioned Shape, Rank, and Size nodes, whose output information is constant, as well as constant nodes, and may include other nodes whose output information is constant when the attribute parameters of the batch data are unchanged, for example a zero-setting node (ZerosLike) that sets the elements of the input image or matrix to zero, or a one-setting node (OnesLike) that sets the elements of the input image or matrix to one. The present disclosure does not limit the preset-type first target node.
Fig. 3 illustrates a schematic diagram of a first constant node according to an embodiment of the present disclosure.
In one possible implementation manner, after the first target node is determined, constant discrimination processing may be performed on the output nodes of the first target node to obtain a second target node whose input information is constant. The method comprises the steps of: judging whether the input information of the i-th direct output node of the first target node includes only constant information; when the input information of the i-th direct output node includes only constant information, determining the i-th direct output node as a second target node, and continuing to judge whether the input information of the j-th indirect output node corresponding to the i-th direct output node includes only constant information; and when the input information of the j-th indirect output node includes only constant information, determining the j-th indirect output node as a second target node. As shown in fig. 3, node 0 is an input node, which can be used as a port for inputting data such as an image or a video frame into the computation graph; node 1 and node 2 are first target nodes, for example, node 1 is a shape node (Shape) and node 2 is a size node (Size), which can obtain the dimensions and size of the input image or video frame respectively; and node 5 is a constant node. It may be determined whether the input information of the direct output nodes of the first target nodes (e.g., node 1 and node 2) includes only constant information.
For example, node 3 is the i-th direct output node of node 1, and the input information of node 3 includes only constant information (for example, the output information of node 1). Since the output information of node 1 is constant, the input information of node 3 is constant, that is, the output information obtained after node 3 operates on the constant is also constant, and thus node 3 is a second target node.
In one possible implementation, in the case where the i-th direct output node is a second target node, it may be determined whether the indirect output nodes subsequent to the i-th direct output node are second target nodes. For example, node 7 is the 1st indirect output node subsequent to the i-th direct output node, and it may be determined whether the input information of node 7 includes only constant information; for example, the input information of node 7 is the output information of node 3 and the output information of node 6, and if the output information of node 3 and the output information of node 6 are both constant, the output information of node 7 is also constant, that is, node 7 is a second target node. Likewise, node 8 is the 2nd indirect output node subsequent to the i-th direct output node; the input information of node 8 is the output information of node 7 and the output information of node 5, and if the output information of node 7 and the output information of node 5 are both constant, the output information of node 8 is also constant, that is, node 8 is a second target node. In this way, it can be judged whether each of the indirect output nodes subsequent to the i-th direct output node is a second target node, and if a certain indirect output node is judged not to be a second target node, the indirect output nodes subsequent to that node are no longer judged.
In one possible implementation manner, performing the constant discrimination processing on the output nodes of the first target node to obtain a second target node whose input information is constant further comprises: when the input information of the i-th direct output node or the j-th indirect output node of the first target node includes non-constant information, stopping the constant discrimination processing on the remaining indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, wherein the remaining indirect output nodes include those indirect output nodes, among the indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, on which the constant discrimination processing has not yet been performed.
In an example, the input information of node 8 is the output information of node 7 and the output information of node 5. If the output information of node 7 is constant but the output information of node 5 is not constant, node 8 is not a second target node, and the judgment of the indirect output nodes subsequent to node 8 need no longer be performed, while the other subsequent indirect output nodes of node 7 may continue to be judged. If none of the indirect output nodes subsequent to node 7 is a second target node, the indirect output nodes corresponding to the i-th direct output node need no longer be judged, and the judgment of the i+1-th direct output node and its corresponding indirect output nodes, for example, node 4 and its corresponding indirect output nodes, may begin. Alternatively, if the i-th direct output node is not a second target node (e.g., the input information of node 3 also includes other, non-constant information), the i+1-th direct output node (e.g., node 4) and its corresponding indirect output nodes may be judged directly.
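The discrimination procedure above can be sketched as a traversal that advances only past nodes already judged constant and stops at any node with non-constant input. This is a minimal illustration under an assumed encoding (a dict mapping each node to its list of input nodes), not the patented implementation:

```python
from collections import deque

def find_second_target_nodes(inputs_of, constant_nodes):
    """Constant discrimination over the output nodes of the first target
    nodes. inputs_of maps each node to the list of its input nodes (an
    illustrative encoding); constant_nodes holds the first target nodes and
    any other nodes whose output is already known to be constant."""
    outputs_of = {}
    for node, ins in inputs_of.items():
        for src in ins:
            outputs_of.setdefault(src, []).append(node)
    constant = set(constant_nodes)
    second_targets = []
    queue = deque(out for c in constant_nodes for out in outputs_of.get(c, []))
    while queue:
        node = queue.popleft()
        if node in constant:
            continue  # already judged constant
        if all(src in constant for src in inputs_of.get(node, [])):
            constant.add(node)
            second_targets.append(node)
            queue.extend(outputs_of.get(node, []))  # keep walking past constants
        # otherwise stop here: nodes past a non-constant node are not judged
    return second_targets
```

For the graph of fig. 3, with node 5 and node 6 assumed constant, this yields node 3, node 4, node 7 and node 8 as second target nodes.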
In one possible implementation, each of the first target node and the second target node in the computation graph may be determined in the manner described above, i.e., the first constant node in the computation graph is obtained.
By the method, the first constant node can be searched in the calculation graph, optimization of the calculation graph aiming at the first constant node is facilitated, fusion segmentation is reduced, and fusion effect is improved.
In a possible implementation manner, after the first constant node in the computation graph is determined, since the output information of the first constant node is constant, that is, a known constant quantity, computation need not be performed through the first constant node, thereby reducing computation steps and improving processing efficiency. In step S12, the first constant node and the input edges and output edges of the first constant node are removed, and the elimination result is obtained.
In one possible implementation manner, among the first constant nodes, those whose output edges all point to first constant nodes are determined to be second constant nodes. Removing the first constant node and the input edges and output edges of the first constant node to obtain the elimination result may include directly removing the second constant node and the input edges and output edges of the second constant node. For example, as shown in fig. 3, if node 1 and node 3 are first constant nodes and the output information of node 6 is constant, node 7 is also a first constant node; node 3 is then a second constant node, since the nodes its output edges point to are all first constant nodes, and node 3 and its input edges and output edges can be directly removed.
In one possible implementation, among the first constant nodes, it may further be determined that nodes whose output edges point to at least one non-constant node are third constant nodes. Removing the first constant node and the input edges and output edges of the first constant node to obtain the elimination result may include: obtaining the output information of the third constant node according to the data to be processed input into the computation graph, saving the output information of the third constant node, and removing the third constant node and the input edges and output edges of the third constant node. That is, the output information of the third constant node is first determined for use in the subsequent computation of the non-constant nodes. In an example, node 1 and node 3 are first constant nodes, and in the case where the output information of node 6 is non-constant, node 7 is not a first constant node; in this case, the nodes pointed to by the output edges of node 3 include a non-constant node, and node 3 is a third constant node. The output information of node 3 is a constant whose value can be determined for subsequent computation: the input data to be processed is processed through the computation graph until the output information of node 3 is determined, and after the output information of node 3 is saved, node 3 and its input edges and output edges can be deleted.
In one possible implementation manner, the second constant nodes in the computation graph can be removed directly, and the third constant nodes can be removed after their output information is determined, with the input edges and output edges of the second constant nodes and third constant nodes removed at the same time, so that the computation graph is simplified and the processing efficiency is improved. Removing the first constant node and the input edges and output edges of the first constant node may comprise simultaneously removing the second constant nodes and third constant nodes together with their input edges and output edges. In an example, when the computation graph includes both second constant nodes and third constant nodes, the output information of the third constant nodes may first be determined and saved, and then the second constant nodes and third constant nodes (that is, the first constant nodes) and their input edges and output edges are removed, obtaining the elimination result (that is, the computation graph after the first constant nodes and their input edges and output edges have been removed).
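The elimination step above can be sketched as follows. The graph encoding and the `evaluate` callback are assumptions for illustration: second constant nodes are dropped outright, while each third constant node is evaluated once against the data to be processed and its output information cached before removal:

```python
def eliminate_constant_nodes(inputs_of, first_constant, evaluate):
    """Remove first constant nodes from the graph. Nodes whose output edges
    all point to first constant nodes (second constant nodes) are dropped
    directly; nodes with a non-constant consumer (third constant nodes) are
    evaluated once and their output information cached. evaluate(node)
    stands in for one pass over the data to be processed."""
    outputs_of = {}
    for node, ins in inputs_of.items():
        for src in ins:
            outputs_of.setdefault(src, []).append(node)
    cached = {}
    for node in first_constant:
        consumers = outputs_of.get(node, [])
        if any(c not in first_constant for c in consumers):
            cached[node] = evaluate(node)  # third constant node: save output
    # remove all first constant nodes together with their input/output edges
    pruned = {n: [s for s in ins if s not in first_constant]
              for n, ins in inputs_of.items() if n not in first_constant}
    return pruned, cached
```

Applied to a graph shaped like fig. 4A, node 11 and node 14 would be removed directly, while the outputs of node 12 and node 17 would be cached before removal.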
In one possible implementation manner, in step S13, since the first constant nodes (including the second constant nodes, which do not support fusion) have been deleted, the other nodes in the computation graph that support fusion may be subjected to fusion processing to obtain a fusion node, and the updated computation graph includes at least the fusion node. That is, if, after the first constant nodes are deleted, the computation graph includes only nodes that support fusion, fusing the nodes in the computation graph yields the fusion node, which constitutes the updated computation graph. If, after the first constant nodes are deleted, the computation graph further includes nodes that do not support fusion, the updated computation graph may include the fusion node and the other nodes that do not support fusion.
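The fusion step can be illustrated as a simple partition of the remaining nodes; whether a node "supports fusion" is taken here as a given predicate, which is an assumption for illustration rather than the patented criterion:

```python
def build_updated_graph(remaining_nodes, supports_fusion):
    """Fuse the fusable remaining nodes into a single fusion node; nodes
    that do not support fusion stay as-is in the updated computation graph.
    supports_fusion is a predicate assumed to be supplied by the framework."""
    fusable = [n for n in remaining_nodes if supports_fusion(n)]
    unfusable = [n for n in remaining_nodes if not supports_fusion(n)]
    fused = [("FusionNode", tuple(fusable))] if fusable else []
    return fused + unfusable
```

When every remaining node supports fusion, the updated graph is the single fusion node; otherwise it contains the fusion node plus the unfusable nodes, as described above.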
In one possible implementation manner, the information to be processed can be processed through the updated computation graph to obtain a processing result. During processing, owing to the improved fusion effect, the cost of data transmission can be reduced, the operation cost of the constant nodes is reduced, and the processing efficiency is improved.
Fig. 4A and 4B are diagrams showing an application example of the data processing method according to the embodiment of the present disclosure. As shown in fig. 4A, the computation graph may be a computation graph of image processing performed by the GPU, and the node 10 is an input node, and the image may be input into the computation graph through the node 10 for processing.
In one possible implementation, a first constant node may be determined in a plurality of nodes of the computational graph, for example, the node 11 and the node 12 are first target nodes of a preset type, for example, the node 11 is a Shape node (Shape), the node 12 is a Size node (Size), and the output information of the node 11 and the node 12 is constant in the case that the image Size of the same batch is unchanged.
In one possible implementation, the input information of node 14 includes only the output information of node 12, and thus node 14 is a second target node; the input information of node 17 includes only the output information of node 14, and thus node 17 is also a second target node. The input information of nodes 13, 15 and 18 may include other non-constant information, and thus nodes 13, 15 and 18 are not second target nodes. The first constant nodes may include the first target nodes and the second target nodes, i.e., node 11, node 12, node 14 and node 17.
In one possible implementation, the second constant nodes may be determined among the first constant nodes, i.e., the first constant nodes whose output edges all point to first constant nodes. In an example, the second constant nodes may include node 11 and node 14. The input information of node 18, an output node of node 17, includes non-constant information, and the input information of node 13 and node 15, output nodes of node 12, also includes non-constant information; therefore, node 12 and node 17 are not second constant nodes but third constant nodes.
In one possible implementation, the output information of the third constant node may be determined according to the input image. For example, the input image may be processed by each node, and the output information of the nodes 12 and 17 may be obtained and stored. Then, the second constant node and the third constant node and the input edge and the output edge thereof (as shown in fig. 4B) may be deleted from the computation graph at the same time, and further, the remaining nodes may be fused to obtain an updated computation graph. And processing the input image and the output information of the nodes 12 and 17 through the updated calculation graph to obtain a processing result.
In an example, the remaining nodes (i.e., node 10, node 13, node 16, node 15, node 18 and other nodes) all support fusion and can be fused into a fusion node, which processes the other images of the same batch together with the saved output information of node 12 and node 17 to obtain a processing result. Alternatively, if some of the remaining nodes in the updated computation graph do not support fusion, the nodes that support fusion are fused into a fusion node, and the other images of the same batch are processed using both the fusion node and the nodes that do not support fusion to obtain a processing result.
In one possible implementation manner, the data processing method can remove the nodes that do not support fusion from the computation graph, reduce unnecessary operation cost, improve the fusion effect, reduce the processing cost of data transmission and invocation, improve the operating efficiency of the computation graph, and exploit the computing performance of the processor. The method can be used to optimize processing on processors such as a GPU or an MLU, for example, to optimize the processing of images through a neural network. The present disclosure does not limit the application scope of the data processing method.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowchart of fig. 2 are shown sequentially as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
Fig. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus comprises a determining module 11, an elimination module 12 and an updating module 13, wherein the determining module 11 is used for determining, according to the types of a plurality of nodes in a computation graph, a first constant node in the computation graph that does not support fusion; the elimination module 12 is used for performing constant elimination processing according to the first constant node to obtain an elimination result; and the updating module 13 is used for updating the computation graph according to the elimination result to obtain an updated computation graph.
In one possible implementation, the elimination module is further configured to remove the first constant node and the input edges and output edges of the first constant node, and obtain the elimination result.
In one possible implementation manner, the first constant node comprises a third constant node, wherein the node pointed by the output edge of the third constant node comprises a non-constant node, and the elimination module is further used for obtaining the output information of the third constant node according to the data to be processed input into the calculation graph, storing the output information of the third constant node, and eliminating the input edge and the output edge of the third constant node.
In one possible implementation manner, the first constant node comprises a second constant node, wherein nodes pointed by output edges of the second constant node are all the first constant node, and the elimination module is further used for directly eliminating the second constant node, input edges and output edges of the second constant node.
In one possible implementation, the first constant node includes a second constant node and a third constant node, and the elimination module is further configured to simultaneously remove the second constant node and the third constant node, together with the input edges and output edges of the second constant node and the third constant node.
In one possible implementation manner, the determining module is further configured to determine a first target node of a preset type among the plurality of nodes in the computation graph according to the types of the plurality of nodes in the computation graph, perform constant discrimination processing on an output node of the first target node to obtain a second target node with constant input information, where the output node of the first target node includes a direct output node and an indirect output node, the direct output node includes a node to which an output edge of the first target node points directly, the indirect output node includes a node to which an output edge of the first target node points indirectly via other nodes, and obtain the first constant node according to the first target node and the second target node.
In one possible implementation manner, the determining module is further configured to determine whether input information of an ith direct output node of the first target node includes only constant information, i is a positive integer, determine the ith direct output node as a second target node if the input information of the ith direct output node includes only constant information, determine whether input information of a jth indirect output node corresponding to the ith direct output node includes only constant information, j is a positive integer, and determine the jth indirect output node as the second target node if the input information of the jth indirect output node includes only constant information.
In one possible implementation manner, the determining module is further configured to, when the input information of the i-th direct output node or the j-th indirect output node includes non-constant information, stop performing the constant discrimination processing on the remaining indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, where the remaining indirect output nodes include those indirect output nodes, among the indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, on which the constant discrimination processing has not been performed.
In one possible implementation, the data to be processed includes at least one of an image, a video, a voice, and a text.
Fig. 6 is a block diagram illustrating a combination processing apparatus 1200 according to an embodiment of the present disclosure. As shown in fig. 6, the combined processing device 1200 includes a computing processing device 1202, an interface device 1204, other processing devices 1206, and a storage device 1208. Depending on the application scenario, one or more computing devices 1210 may be included in the computing processing device, which may be configured to perform the operations described herein in connection with fig. 2.
In various embodiments, the computing processing means of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or as a multi-core artificial intelligence processor. Similarly, one or more computing devices included within a computing processing device may be implemented as an artificial intelligence processor core or as part of the hardware architecture of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or parts of the hardware structures of artificial intelligence processor cores, the computing processing device of the present disclosure may be considered to have a single-core structure or a homogeneous multi-core structure.
In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively accomplish user-specified operations. Depending on the implementation, the other processing devices of the present disclosure may include one or more types of general purpose and/or special purpose processors, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an artificial intelligence processor, and/or the like. These processors may include, but are not limited to, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number may be determined according to actual needs. As previously mentioned, the computing processing device of the present disclosure, considered on its own, may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing processing device and the other processing devices are considered together, the two may be regarded as forming a heterogeneous multi-core structure.
In one or more embodiments, the other processing device may serve as an interface between the computing processing device of the present disclosure (which may be embodied as a computing device associated with artificial intelligence operations such as neural network operations) and external data and control, performing basic control including, but not limited to, data handling, and starting and/or stopping the computing device. In other embodiments, the other processing devices may also cooperate with the computing processing device to jointly accomplish computational tasks.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing device may obtain input data from other processing devices via the interface device, and write the input data to a storage device (or memory) on the computing device. Further, the computing processing device may obtain control instructions from other processing devices via the interface device, and write the control instructions into a control cache on the computing processing device chip. Alternatively or in addition, the interface device may also read data in a memory device of the computing processing device and transmit it to the other processing device.
Additionally or alternatively, the combined processing apparatus of the present disclosure may further comprise a storage device. As shown in the figure, the storage means are connected to the computing processing means and the other processing means, respectively. In one or more embodiments, a storage device may be used to store data for the computing processing device and/or the other processing devices. For example, the data may be data that cannot be stored entirely within an internal or on-chip memory device of a computing processing device or other processing device.
In some embodiments, the present disclosure also discloses an artificial intelligence chip (e.g., the chip 1302 shown in fig. 7) that includes the above-described data processing apparatus. In one implementation, the chip is a System on Chip (SoC) integrated with one or more combined processing devices as shown in fig. 6. The chip may be connected to other related components through an external interface device (such as the external interface device 1306 shown in fig. 7). The related component may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., DRAM interfaces) may also be integrated on the chip. In some embodiments, the disclosure further discloses a chip package structure including the chip. In some embodiments, the disclosure further discloses a board card including the chip package structure. The board card will be described in detail with reference to fig. 7.
Fig. 7 is a schematic diagram illustrating a board 1300 according to an embodiment of the disclosure. As shown in fig. 7, the board includes a memory device 1304 for storing data, which includes one or more memory cells 1310. The memory device may be connected and data transferred to the control device 1308 and the artificial intelligence chip 1302 described above by way of, for example, a bus. Further, the board card also includes an external interface device 1306 configured for data relay or transfer functions between the chip (or chips in the chip package structure) and an external device 1312 (e.g., a server or computer, etc.). For example, the data to be processed may be transferred by the external device to the chip through the external interface means. For another example, the calculation result of the chip may be transmitted back to the external device via the external interface device. The external interface device may have different interface forms according to different application scenarios, for example, it may use a standard PCIE interface or the like.
Each group of storage units is connected with the artificial intelligence chip through a bus. It is understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without increasing the clock frequency: data is transferred on both the rising and falling edges of the clock pulse, making DDR twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of the storage units, and each group of storage units may include a plurality of DDR4 particles (chips). In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used to transfer data and 8 bits are used for ECC checking. It is understood that when DDR4-3200 particles are employed in each group of storage units, the theoretical bandwidth of data transfer can reach 25600 MB/s.
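The quoted figure can be checked with a quick back-of-the-envelope calculation (a sketch only; it reflects the standard DDR4-3200 transfer rate and the 64 data bits per controller described above):

```python
# DDR4-3200: 3200 million transfers per second; each 72-bit controller
# carries 64 data bits (8 bytes) per transfer, the other 8 bits being ECC.
transfers_per_second = 3200 * 10**6
bytes_per_transfer = 64 // 8
bandwidth_mb_per_s = transfers_per_second * bytes_per_transfer // 10**6
print(bandwidth_mb_per_s)  # 25600, matching the theoretical bandwidth above
```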
In one embodiment, each set of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each storage unit.
The interface device is electrically connected with the artificial intelligence chip. The interface device is used for realizing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface, and the data to be processed is transferred from the server to the chip through the standard PCIE interface to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the computation results of the artificial intelligence chip are still transmitted back to the external device (e.g., a server) by the interface device.
The control device is electrically connected with the artificial intelligence chip and is used for regulating and controlling the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device can be electrically connected through an SPI interface. The control device may comprise a Micro Controller Unit (MCU). The artificial intelligence chip may comprise a plurality of processing chips, processing cores or processing circuits, and can drive a plurality of loads; therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load. The control device can regulate and control the working states of the plurality of processing chips, processing cores and/or processing circuits in the artificial intelligence chip.
From the above description in connection with fig. 6 and 7, those skilled in the art will appreciate that the present disclosure also discloses an electronic device or apparatus that may include one or more of the above-described boards, one or more of the above-described chips, and/or one or more of the above-described combination processing apparatuses.
According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a vehicle recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autopilot terminal, a vehicle, a household appliance, and/or a medical device. The vehicles include aircraft, ships and/or automobiles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves and range hoods; and the medical equipment includes nuclear magnetic resonance instruments, B-ultrasound instruments and/or electrocardiographs.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
The embodiments of the disclosure also provide an electronic device, which comprises a processor and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 8 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to fig. 8, electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions, such as application programs, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, processing component 1922 is configured to execute the instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
It should be noted that, for brevity, the present disclosure describes some methods and their embodiments as a series of actions and combinations thereof, but those skilled in the art will understand that the scheme of the present disclosure is not limited by the order of the described actions. Accordingly, in light of the disclosure or teachings herein, certain steps may be performed in other orders or concurrently. Further, the embodiments described in this disclosure may be considered alternative embodiments, in that the actions or modules involved are not necessarily required for implementing every aspect of the disclosure. In addition, depending on the solution, different embodiments emphasize different aspects; portions of one embodiment that are not described in detail may be understood with reference to the related descriptions of other embodiments.
In particular implementations, based on the disclosure and teachings herein, one of ordinary skill in the art will appreciate that the embodiments disclosed here may also be implemented in other ways. For example, in the foregoing embodiments of the electronic device or apparatus, the units are divided according to logic function, and other divisions are possible in actual implementation. For another example, multiple units or components may be combined or integrated into another system, or some features or functions in the units or components may be selectively disabled. As for the connection relationships between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components.
In the present disclosure, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, some or all of the units may be selected to achieve the purposes of the solution described in the embodiments of the disclosure. In addition, in some scenarios, multiple units in embodiments of the disclosure may be integrated into one unit or each unit may physically reside separately.
In the foregoing embodiments, each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but any combination of these technical features should be considered within the scope of the disclosure.
The electronic device or apparatus of the present disclosure may also be applied to the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medical care, and the like. Further, the electronic device or apparatus of the present disclosure may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as the cloud, the edge, and the terminal. In one or more embodiments, a computationally intensive electronic device or apparatus according to the present disclosure may be applied to a cloud device (e.g., a cloud server), while a low-power electronic device or apparatus may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera). In one or more embodiments, the hardware information of the cloud device and that of the terminal and/or edge device are mutually compatible, so that appropriate hardware resources of the cloud device can be matched, according to the hardware information of the terminal and/or edge device, to simulate the hardware resources of the terminal and/or edge device, thereby achieving unified management, scheduling, and collaborative work in end-cloud or edge-cloud integration.
The foregoing may be better understood in light of the following clauses:
For example, clause A1, a data processing method, comprising: determining, according to the types of a plurality of nodes in a computation graph, a first constant node in the computation graph that does not support fusion; performing constant elimination processing according to the first constant node to obtain an elimination result; and updating the computation graph according to the elimination result to obtain an updated computation graph.
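Clause A1 describes a three-step pass: classify nodes by type, eliminate the constant nodes that do not support fusion, and rebuild the graph. The following is a minimal, illustrative sketch of such a pass (not part of the claimed clauses); the `Node` type, the `is_constant` rule, and the `fusable_ops` set are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op: str                                     # e.g. "const", "input", "add", "mul"
    inputs: list = field(default_factory=list)  # upstream Node objects (input edges)
    value: object = None                        # literal value for "const" nodes

def is_constant(node: Node) -> bool:
    """A node is constant if it is a literal, or every input edge comes from a constant."""
    if node.op == "const":
        return True
    if node.op == "input":
        return False
    return bool(node.inputs) and all(is_constant(i) for i in node.inputs)

def eliminate_constants(nodes: list, fusable_ops: set) -> list:
    """Sketch of clause A1: drop constant nodes whose op type does not support
    fusion, returning the updated node list (the updated computation graph)."""
    return [n for n in nodes
            if not (is_constant(n) and n.op not in fusable_ops)]
```

For example, with `c1 = const 2`, `c2 = const 3`, `add(c1, c2)` feeding `mul(add, x)` where `x` is a runtime input, `eliminate_constants(..., fusable_ops=set())` removes `c1`, `c2`, and `add` and keeps only `x` and `mul`; the removed subgraph's output would then be supplied from a saved value, as in clause A3.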
Clause A2, the method of clause A1, wherein the performing constant elimination processing according to the first constant node to obtain an elimination result comprises: removing the first constant node and the input edges and output edges of the first constant node, and obtaining the elimination result.
Clause A3, the method of clause A2, wherein the first constant node comprises a third constant node, the nodes pointed to by the output edges of the third constant node include a non-constant node, and the removing the first constant node and the input edges and output edges of the first constant node comprises: obtaining the output information of the third constant node according to the data to be processed input into the computation graph; saving the output information of the third constant node; and removing the third constant node and the input edges and output edges of the third constant node.
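Clause A3 is, in effect, constant folding: the constant subgraph's output is computed once, saved, and the node's edges are cut so that non-constant consumers read the saved output instead. A self-contained sketch follows; the `Node` class and the two-operator `OPS` table are illustrative assumptions, not definitions from the disclosure.

```python
class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, list(inputs), value

# Illustrative operator semantics; a real graph would dispatch on many more op types.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def evaluate(node):
    """Recursively compute the output information of a constant subgraph."""
    if node.op == "const":
        return node.value
    return OPS[node.op](*(evaluate(i) for i in node.inputs))

def fold_to_literal(node):
    """Sketch of clause A3: save the third constant node's output, then remove
    its input edges by rewriting the node as a literal constant; downstream
    non-constant nodes now consume the saved output directly."""
    node.value = evaluate(node)
    node.inputs = []        # input edges removed
    node.op = "const"       # the node now carries only its saved output
    return node
```

For instance, folding `mul(add(2, 3), 4)` stores the value 20 on the node and leaves it with no input edges, so any non-constant consumer simply reads the cached literal.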
Clause A4, the method of clause A2, wherein the first constant node comprises a second constant node, the nodes pointed to by the output edges of the second constant node are all first constant nodes, and the removing the first constant node and the input edges and output edges of the first constant node comprises: directly removing the second constant node and the input edges and output edges of the second constant node.
Clause A5, the method of clause A2, wherein the first constant node comprises a second constant node and a third constant node, and the removing the first constant node and the input edges and output edges of the first constant node comprises: simultaneously removing the second constant node and the third constant node, together with their input edges and output edges.
Clause A6, the method of clause A1, wherein the determining, according to the types of the plurality of nodes in the computation graph, a first constant node in the computation graph that does not support fusion comprises: determining, according to the types of the plurality of nodes in the computation graph, a first target node of a preset type among the plurality of nodes; performing constant discrimination processing on the output nodes of the first target node to obtain a second target node whose input information is constant, wherein the output nodes of the first target node include direct output nodes and indirect output nodes, a direct output node being a node to which an output edge of the first target node directly points, and an indirect output node being a node to which an output edge of the first target node indirectly points via other nodes; and obtaining the first constant node according to the first target node and the second target node.
Clause A7, the method of clause A6, wherein the performing constant discrimination processing on the output nodes of the first target node to obtain a second target node whose input information is constant comprises: judging whether the input information of the i-th direct output node of the first target node includes only constant information, i being a positive integer; when the input information of the i-th direct output node includes only constant information, determining the i-th direct output node as a second target node, and judging whether the input information of the j-th indirect output node corresponding to the i-th direct output node includes only constant information, j being a positive integer; and when the input information of the j-th indirect output node includes only constant information, determining the j-th indirect output node as a second target node.
Clause A8, the method of clause A6, wherein the performing constant discrimination processing on the output nodes of the first target node to obtain a second target node whose input information is constant further comprises: when the input information of the i-th direct output node or the j-th indirect output node includes non-constant information, stopping the constant discrimination processing for the remaining indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, wherein the remaining indirect output nodes are those indirect output nodes, among the indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, on which constant discrimination processing has not yet been performed.
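Clauses A6-A8 together describe a forward traversal: starting from the preset-type first target node, each direct and then indirect output node is checked; a node whose inputs are all known constant becomes a second target node, and a branch is abandoned as soon as non-constant information appears. An illustrative sketch using plain adjacency maps; the `consumers`/`inputs` representation and the node labels are assumptions made here for the example.

```python
from collections import deque

def discriminate_constants(consumers, inputs, seed):
    """Sketch of clauses A6-A8: breadth-first constant discrimination.
    `consumers[n]` lists the nodes an output edge of n points to;
    `inputs[n]` lists the nodes feeding n; `seed` holds the nodes already
    known to be constant (the first target nodes)."""
    constant = set(seed)
    queue = deque(c for s in seed for c in consumers.get(s, ()))  # direct output nodes
    seen = set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if all(i in constant for i in inputs[node]):
            constant.add(node)                     # second target node (clause A7)
            queue.extend(consumers.get(node, ()))  # walk on to indirect output nodes
        # else: input includes non-constant information -> stop this branch (clause A8)
    return constant - set(seed)
```

For seed `{"c"}` with `consumers = {"c": ["a"], "a": ["b"], "x": ["b"]}` and `inputs = {"a": ["c"], "b": ["a", "x"]}`, node `"a"` is discriminated as a second target node, while `"b"` is rejected because its input `"x"` is non-constant, so traversal past `"b"` stops.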
Clause A9, the method of any one of clauses A1-A8, wherein the data to be processed comprises at least one of an image, a video, a voice, and a text.
Clause A10, a data processing apparatus, comprising: a determining module configured to determine, according to the types of a plurality of nodes in a computation graph, a first constant node in the computation graph that does not support fusion; an elimination module configured to perform constant elimination processing according to the first constant node to obtain an elimination result; and an updating module configured to update the computation graph according to the elimination result to obtain an updated computation graph.
Clause A11, the apparatus of clause A10, wherein the elimination module is further configured to remove the first constant node and the input edges and output edges of the first constant node, and obtain the elimination result.
Clause A12, the apparatus of clause A11, wherein the first constant node comprises a third constant node, the nodes pointed to by the output edges of the third constant node include a non-constant node, and the elimination module is further configured to obtain the output information of the third constant node according to the data to be processed input into the computation graph, save the output information of the third constant node, and remove the third constant node and the input edges and output edges of the third constant node.
Clause A13, the apparatus of clause A11, wherein the first constant node comprises a second constant node, the nodes pointed to by the output edges of the second constant node are all first constant nodes, and the elimination module is further configured to directly remove the second constant node and the input edges and output edges of the second constant node.
Clause A14, the apparatus of clause A11, wherein the first constant node comprises a second constant node and a third constant node, and the elimination module is further configured to simultaneously remove the second constant node and the third constant node, together with their input edges and output edges.
Clause A15, the apparatus of clause A10, wherein the determining module is further configured to: determine, according to the types of the plurality of nodes in the computation graph, a first target node of a preset type among the plurality of nodes; perform constant discrimination processing on the output nodes of the first target node to obtain a second target node whose input information is constant, wherein the output nodes of the first target node include direct output nodes and indirect output nodes, a direct output node being a node to which an output edge of the first target node directly points, and an indirect output node being a node to which an output edge of the first target node indirectly points via other nodes; and obtain the first constant node according to the first target node and the second target node.
Clause A16, the apparatus of clause A15, wherein the determining module is further configured to: judge whether the input information of the i-th direct output node of the first target node includes only constant information, i being a positive integer; when the input information of the i-th direct output node includes only constant information, determine the i-th direct output node as a second target node, and judge whether the input information of the j-th indirect output node corresponding to the i-th direct output node includes only constant information, j being a positive integer; and when the input information of the j-th indirect output node includes only constant information, determine the j-th indirect output node as a second target node.
Clause A17, the apparatus of clause A15, wherein the determining module is further configured to stop the constant discrimination processing for the remaining indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node when the input information of the i-th direct output node or the j-th indirect output node includes non-constant information, wherein the remaining indirect output nodes are those indirect output nodes, among the indirect output nodes corresponding to the i-th direct output node or the j-th indirect output node, on which constant discrimination processing has not yet been performed.
Clause A18, the apparatus of any one of clauses A10-A17, wherein the data to be processed comprises at least one of an image, a video, a voice, and a text.
While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. The appended claims are intended to define the scope of the disclosure and are therefore to cover all equivalents or alternatives falling within the scope of these claims.

Claims (10)

CN202011193626.0A | 2020-10-30 | 2020-10-30 | Data processing method, device, computer equipment and storage medium | Active | CN114443259B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011193626.0A CN114443259B (en) | 2020-10-30 | 2020-10-30 | Data processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011193626.0A CN114443259B (en) | 2020-10-30 | 2020-10-30 | Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN114443259A (en) | 2022-05-06
CN114443259B (en) | 2025-08-22

Family

ID=81358247

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011193626.0A | Active | CN114443259B (en) | 2020-10-30 | 2020-10-30 | Data processing method, device, computer equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN114443259B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109726800A (en)* | 2018-12-29 | 2019-05-07 | Beijing Zhongke Cambricon Technology Co., Ltd. | Operation method, device and related product
CN111401538A (en)* | 2019-09-24 | 2020-07-10 | Shanghai Cambricon Information Technology Co., Ltd. | A data processing method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10380140B2 (en)* | 2015-11-30 | 2019-08-13 | Tableau Software, Inc. | Systems and methods for implementing a virtual machine for interactive visual analysis
CN108764370B (en)* | 2018-06-08 | 2021-03-12 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Image processing method, image processing device, computer-readable storage medium and computer equipment
CN111260019B (en)* | 2020-02-18 | 2023-04-11 | Shenzhen Kunyun Information Technology Co., Ltd. | Data processing method, device and equipment of neural network model and storage medium


Also Published As

Publication number | Publication date
CN114443259A (en) | 2022-05-06

Similar Documents

Publication | Publication Date | Title
CN110096309B (en) Computing method, apparatus, computer equipment and storage medium
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN110458285B (en) Data processing method, data processing device, computer equipment and storage medium
CN109726800B (en) Operation method, device and related product
WO2023201947A1 (en) Methods, systems, and storage media for task dispatch
WO2021036893A1 (en) Data processing method and apparatus, computer device, and storage medium
CN114185667B (en) Data processing method and device and related products
CN110020720B (en) Operator splicing method and device
CN111047005A (en) Computing method, apparatus, computer equipment and storage medium
CN112084023B (en) Data parallel processing method, electronic device and computer readable storage medium
CN114443259B (en) Data processing method, device, computer equipment and storage medium
CN112766475B (en) Processing component and artificial intelligence processor
CN112463158B (en) Compiling method, compiling device, electronic equipment and storage medium
CN111258732B (en) Data processing method, data processing device and electronic equipment
WO2021223642A1 (en) Data processing method and apparatus, and related product
CN114691589B (en) A processing device and related products
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN111353595A (en) Computing method, device and related products
CN112306949B (en) Data processing method and device and related product
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN114692825B (en) A quantitative training method, device and equipment for neural network model
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN114662641A (en) Method and equipment for quantizing neural network on processing unit

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
TG01 | Patent term adjustment
