CN116523051B - A model mixed precision reasoning method, device, equipment and storage medium - Google Patents

A model mixed precision reasoning method, device, equipment and storage medium

Info

Publication number
CN116523051B
Authority
CN
China
Prior art keywords
precision
segment
model
computing
computing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310524663.2A
Other languages
Chinese (zh)
Other versions
CN116523051A (en)
Inventor
田宏泽
程伟
孙清阁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Suiyuan Intelligent Technology Co ltd
Original Assignee
Beijing Suiyuan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Suiyuan Intelligent Technology Co ltd
Priority to CN202310524663.2A
Publication of CN116523051A
Application granted
Publication of CN116523051B
Active
Anticipated expiration

Abstract

Translated from Chinese


The present invention discloses a model mixed precision reasoning method, device, equipment and storage medium, including: inputting input samples into a deep learning model in a chip, calculating the input samples through the computing nodes in the chip, and obtaining a target result of type float32; obtaining a segmentation list of the model, and adjusting the precision selection parameters of each segment according to the mixed precision results and target results of the model under preset precision selection parameters for each segment; inputting the target precision selection parameters of each computing node in each segment as a control signal into the control node, selecting a matching precision calculation branch through the control node in the chip, and completing mixed precision reasoning through the computing node according to the precision calculation branch. The technical solution of the embodiment of the present invention can effectively obtain a mixed precision reasoning scheme that meets the model precision requirements and improve the mixed precision reasoning efficiency of the model.

Description

Model mixed-precision reasoning method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for model mixed-precision reasoning.
Background
Deep learning mixed-precision reasoning accelerates deep neural network inference by mixing the float16 and float32 data types, reducing memory usage and memory access so that larger neural networks can be run.
For mixed-precision reasoning, the computational frameworks of existing chips (such as TensorFlow or PyTorch) generally offer two modes: in the first, the user specifies the reasoning precision (such as float32 or float16) used by each compute node in the model; in the second, precision is selected according to a black/white list defined by the framework.
The first mode requires the user to have strong theoretical knowledge of model computation, and the second mode does not necessarily find a mixed-precision reasoning scheme that meets the precision requirement. Moreover, existing inference frameworks (such as TensorRT) must search for an effective mixed-precision scheme iteratively when constructing one, and each iteration requires the chip to compile the model, so the compile time grows in proportion to the number of iterations.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for model mixed-precision reasoning, which can effectively acquire a mixed-precision reasoning scheme meeting the model precision requirement, improve the mixed-precision reasoning efficiency of a model and save the computing resources of a chip in the process of model mixed-precision reasoning.
According to an aspect of the present invention, there is provided a model mixed-precision reasoning method, including:
Inputting an input sample into a deep learning model in a chip, and calculating the input sample through a plurality of calculation nodes corresponding to the deep learning model in the chip to obtain a float32 type target result;
Obtaining a segmentation list corresponding to the model, and adjusting precision selection parameters corresponding to the segments according to a precision mixing result of the model under preset precision selection parameters and the target result, wherein each segment comprises at least one computing node;
The method comprises the steps that target precision selection parameters corresponding to all computing nodes in each segment are input into control nodes corresponding to all computing nodes as control signals, precision computing branches matched with the computing nodes are selected through the control nodes in the chip according to the control signals, and mixed precision reasoning is completed through the computing nodes according to the precision computing branches;
Wherein each computing node corresponds to a float32 precision computing branch and a float16 precision computing branch in advance.
Optionally, before inputting the input sample into the on-chip deep learning model, the method further comprises:
ordering a plurality of topologies included in the model;
And adding float16 precision calculation branches to the calculation nodes corresponding to each topological structure according to the topological sequencing result.
Optionally, after obtaining the segment list corresponding to the model, the method further includes:
Presetting precision selection parameters corresponding to the segments according to the segment types corresponding to the segments;
And inputting the input sample into the deep learning model, and processing the input sample through the model according to the preset precision selection parameters corresponding to each segment to obtain a mixed-precision result.
Optionally, the precision selection parameter corresponding to each segment is preset according to the segment type corresponding to each segment, including:
acquiring the longest segment, the known data type segment and the unknown data type segment from the segment list;
Setting the precision selection parameter corresponding to the longest segment as false;
Setting a precision selection parameter corresponding to the known data type segment as true or false according to the target data type corresponding to the known data type segment;
and setting the precision selection parameter corresponding to the unknown data type segment as true.
Optionally, according to the mixing result of the model under the preset precision selection parameters for each segment and the target result, adjusting the precision selection parameters corresponding to each segment, including:
constructing an evaluation standard according to the mixing result and the target result;
judging whether the mixing result is qualified or not according to the evaluation standard;
If yes, setting the precision selection parameter corresponding to each computing node in the longest segment as false, removing the longest segment in a segment list, and then returning to execute the operation of acquiring the longest segment in the segment list.
Optionally, after judging whether the mixed-precision result is qualified according to the evaluation standard, the method further includes:
If not, judging whether the longest segment has a fission condition or not;
If yes, the longest segment is split into a first segment and a second segment, the first segment and the second segment are added into a segment list, and then the operation of acquiring the longest segment in the segment list is carried out in a returning mode.
Optionally, before the target precision selection parameter corresponding to each computing node in each segment is input as the control signal to the control node corresponding to each computing node, the method further includes:
Judging whether the segmentation list is empty or not;
If yes, acquiring a current precision selection parameter corresponding to each computing node in each segment, and taking the current precision selection parameter as a target precision selection parameter.
According to another aspect of the present invention, there is provided a model mixed-precision reasoning apparatus, including:
The target result generation module is used for inputting an input sample into a deep learning model in a chip, and calculating the input sample through a plurality of calculation nodes corresponding to the deep learning model in the chip to obtain a target result of a float32 type, wherein the initial precision selection parameter of each calculation node is true;
the parameter adjustment module is used for acquiring a segmentation list corresponding to the model, and adjusting the precision selection parameters corresponding to the segments according to the precision mixing result of the model under the preset precision selection parameters and the target result, wherein each segment comprises at least one calculation node;
the branch selection module is used for inputting target precision selection parameters corresponding to each calculation node in each segment as control signals into control nodes corresponding to each calculation node, selecting precision calculation branches matched with the calculation nodes according to the control signals through the control nodes in the chip, and completing mixed precision reasoning according to the precision calculation branches through the calculation nodes;
Wherein each computing node corresponds to a float32 precision computing branch and a float16 precision computing branch in advance.
According to another aspect of the present invention, there is provided an electronic device, the device comprising:
At least one processor, and
A memory communicatively coupled to the at least one processor, wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model mixed-precision reasoning method described in any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions that, when executed, cause a processor to implement the model mixed-precision reasoning method according to any embodiment of the present invention.
According to the technical solution provided by the embodiments of the present invention, an input sample is fed into a deep learning model in a chip and computed by the model's compute nodes to obtain a float32-type target result. The segment list corresponding to the model is then obtained, and the precision selection parameters of each segment are adjusted according to the target result and the mixed-precision results produced under preset precision selection parameters. Finally, the target precision selection parameter of each compute node in each segment is input as a control signal to the corresponding control node, which selects the precision computing branch matched with the compute node, and the compute nodes complete mixed-precision reasoning along the selected branches. In this way, a mixed-precision reasoning scheme that meets the model's precision requirement can be obtained effectively, the model's mixed-precision reasoning efficiency is improved, and the chip's computing resources are saved during mixed-precision reasoning.
The technical solution provided by the embodiments of the present invention can be applied to fields such as text detection and image recognition. When a deep learning model reasons under fp16 precision and overflow or underflow causes reasoning errors, the overflowing part can be corrected automatically by the scheme of this embodiment, with the overflowing nodes reasoning at fp32 precision instead.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a model mixed-precision reasoning method provided according to an embodiment of the present invention;
FIG. 2 is a flowchart of another model mixed-precision reasoning method provided according to an embodiment of the present invention;
FIG. 3 is a flowchart of another model mixed-precision reasoning method provided according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a model mixed-precision reasoning apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an electronic device for implementing the model mixed-precision reasoning method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of a model mixed-precision reasoning method provided in an embodiment of the present invention. This embodiment is suitable for performing mixed-precision reasoning on a deep learning model. The method may be performed by a model mixed-precision reasoning device, which may be implemented in hardware and/or software and configured in an electronic apparatus. As shown in Fig. 1, the method includes:
Step 110, inputting the input sample into a deep learning model in the chip, and calculating the input sample through a plurality of calculation nodes corresponding to the deep learning model in the chip to obtain a float32 type target result.
In this embodiment, the input sample may be a learning sample corresponding to a deep learning model selected by a user. After the input sample is obtained, the input sample can be input into a model, and the input sample is calculated according to initial precision selection parameters through each calculation node in the chip, so that a float32 type target result is obtained. Specifically, the precision selection parameter is used for representing the calculation precision adopted by the calculation node when processing data.
In a particular embodiment, each compute node pre-corresponds to a float32 precision compute branch and a float16 precision compute branch. The initial precision selection parameter for each compute node may be true, i.e., in this step, each compute node may compute the input samples using float32 precision computation branches.
Step 120, obtaining a segment list corresponding to the model, and adjusting the precision selection parameters corresponding to the segments according to the precision mixing result of the model under the preset precision selection parameters and the target result.
In this embodiment, the plurality of computing nodes in the model may form a plurality of segments, the plurality of segments forming the segment list. Wherein each segment may include at least one compute node therein.
In this step, after the segment list corresponding to the model is obtained, the precision selection parameters of the compute nodes in each segment may be preset (for example, set to true or false); the compute nodes then use the float16 and float32 precision computing branches in a mixed manner under the preset parameters to process the input sample and obtain a mixed-precision result.
In a specific embodiment, optionally, after the mixing result is obtained, whether the mixing result meets a preset standard may be determined according to the target result, if not, the precision selection parameters corresponding to each segment are adjusted, and a new mixing result is obtained again until the mixing result meets the preset standard.
And 130, inputting target precision selection parameters corresponding to the calculation nodes in each segment as control signals into the control nodes corresponding to the calculation nodes, selecting precision calculation branches matched with the calculation nodes according to the control signals through the control nodes in the chip, and completing mixed precision reasoning through the precision calculation branches.
In this embodiment, the adjusted precision selection parameter of each segment may be used as the target precision selection parameter corresponding to the computing node in each segment. After the target precision selection parameter corresponding to the computing node is obtained, the target precision selection parameter can be used as a control signal to be input into a control node corresponding to the computing node.
In a specific embodiment, if the target precision selection parameter is "true", the control node may select a float32 precision calculation branch matched with the calculation node and process data through the calculation branch, whereas if the target precision selection parameter is "false", the control node may select a float16 precision calculation branch matched with the calculation node and process data through the calculation branch, thereby completing the mixed precision reasoning.
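This dispatch can be sketched as follows — a minimal NumPy stand-in for the on-chip control node, not the actual hardware implementation; the element-wise squaring op is a placeholder for a real compute node:

```python
import numpy as np

def control_node(x, precision_flag):
    """Dispatch a compute node's input to the matching precision branch.

    precision_flag is the node's precision selection parameter:
    True -> float32 branch, False -> float16 branch.
    The float16 branch converts in, computes, and converts back,
    mirroring the precision-conversion nodes described in the patent.
    """
    if precision_flag:
        return np.square(x.astype(np.float32))                  # float32 branch
    return np.square(x.astype(np.float16)).astype(np.float32)   # float16 branch

x = np.array([1.5, 2.0, 3.0])
hi = control_node(x, True)    # computed at float32 precision
lo = control_node(x, False)   # computed at float16 precision
```

Both branches return float32, so downstream nodes see a uniform dtype; only the internal computation precision differs.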
In this embodiment, the precision computing branches matched with all compute nodes in the model are determined before mixed-precision reasoning begins, so the model needs to be compiled only once during the process, avoiding the time cost of the chip compiling the model many times. Furthermore, by adjusting the precision selection parameters of each segment according to the target result and the mixed-precision result, a mixed-precision reasoning scheme that meets the model's precision requirement can be obtained effectively, without requiring the user to have strong theoretical knowledge of model computation.
In this method, an input sample is fed into a deep learning model in a chip and computed by the model's compute nodes to obtain a float32-type target result. The segment list corresponding to the model is obtained, and the precision selection parameters of each segment are adjusted according to the target result and the mixed-precision results produced under preset precision selection parameters. The target precision selection parameter of each compute node in each segment is then input as a control signal to the corresponding control node, which selects the precision computing branch matched with the compute node, and the compute nodes complete mixed-precision reasoning along the selected branches.
Fig. 2 is a flowchart of a model mixed-precision reasoning method provided in the second embodiment of the present invention; this embodiment further refines the foregoing embodiment. As shown in Fig. 2, the method includes:
Step 210, sorting a plurality of topological structures included in the model, and adding float16 precision computing branches to the computing nodes corresponding to the topological structures according to the topological sorting result.
In this step, a plurality of topologies included in the model may be acquired first, then the plurality of topologies may be ordered, and a plurality of computing nodes corresponding to each topology may be acquired in turn according to the topology ordering result. Each computing node corresponds to a float32 precision node in advance, and the computing node and the float32 precision node form a float32 precision computing branch.
After multiple compute nodes are acquired, one precision conversion node (i.e., f32/f16 data type conversion node) may be added to each compute node, thereby forming a float16 precision computation branch of the compute node.
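The construction described above can be sketched as a graph pass that pairs each op with a float16 branch built from f32/f16 conversion nodes. The representation below is a hypothetical stand-in for the chip's actual graph IR; the function names are illustrative:

```python
import numpy as np

def make_branches(op):
    """Return (float32_branch, float16_branch) for one compute op.

    The float16 branch wraps the op between an f32->f16 conversion
    node and an f16->f32 conversion node, as described in the patent.
    """
    def f32_branch(x):
        return op(x.astype(np.float32))
    def f16_branch(x):
        return op(x.astype(np.float16)).astype(np.float32)
    return f32_branch, f16_branch

def build_dual_precision_graph(topo_sorted_ops):
    """Attach both precision branches to every node, in topological order."""
    return [make_branches(op) for op in topo_sorted_ops]

# Toy two-node "model" using NumPy ufuncs as placeholder compute nodes.
graph = build_dual_precision_graph([np.exp, np.tanh])
f32_b, f16_b = graph[0]
out = f16_b(np.array([0.0, 1.0]))
```

Because the conversion nodes are added once per compute node before any search begins, the search over precision assignments never requires rebuilding or recompiling the graph.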
In this embodiment, each compute node in the chip may correspond to a control node. The control node can be implemented by fusing the node's high-precision and low-precision computing branches into a single operator, avoiding waste of the chip's computing power and storage resources.
Step 220, inputting an input sample into a deep learning model in a chip, and calculating the input sample through a plurality of calculation nodes corresponding to the deep learning model in the chip to obtain a float32 type target result, wherein the initial precision selection parameter of each calculation node is true.
Step 230, obtaining a segment list corresponding to the model, and presetting precision selection parameters corresponding to the segments according to the segment types corresponding to the segments.
In this step, optionally, different precision selection parameters may be preset for each segment according to different segment types, so that each computing node uses float16 and float32 precision computing branches in a mixed manner under the preset precision selection parameters, and processes an input sample to obtain a mixed precision result.
Step 240, inputting the input sample to a deep learning model, and processing the input sample to obtain a mixed precision result through the model according to preset precision selection parameters corresponding to each segment.
Step 250, adjusting the precision selection parameters corresponding to the segments according to the precision mixing result of the segments under the preset precision selection parameters and the target result.
And 260, inputting target precision selection parameters corresponding to each calculation node in each segment as control signals into control nodes corresponding to each calculation node, selecting precision calculation branches matched with the calculation nodes according to the control signals through control nodes in a chip, and completing mixed precision reasoning according to the precision calculation branches through the calculation nodes.
According to the technical solution provided by this embodiment, the topological structures included in the model are sorted, and a float16 precision computing branch is added to the compute nodes of each topology according to the topological sorting result. A preset input sample is fed into the model and computed by the compute nodes corresponding to the deep learning model in the chip to obtain a float32-type target result. The segment list of the model is obtained, the precision selection parameters of each segment are preset according to the segment types, and the input sample is processed by the model under the preset parameters to obtain a mixed-precision result. The precision selection parameters of each segment are then adjusted according to the mixed-precision result and the target result, the target precision selection parameters of the compute nodes are input as control signals to the control nodes, and the control nodes in the chip select the precision computing branches along which mixed-precision reasoning is completed. In this way, a mixed-precision reasoning scheme that meets the model's precision requirement can be obtained effectively, the model's mixed-precision reasoning efficiency is improved, and the chip's computing resources are saved during mixed-precision reasoning.
Fig. 3 is a flowchart of another model mixed-precision reasoning method according to the third embodiment of the present invention, which further refines the foregoing embodiment. As shown in Fig. 3, the method includes:
Step 310, inputting an input sample into a deep learning model in a chip, and calculating the input sample through a plurality of calculation nodes corresponding to the deep learning model in the chip to obtain a float32 type target result, wherein the initial precision selection parameter of each calculation node is true.
Step 320, obtaining a segment list corresponding to the model, and obtaining the longest segment, the known data type segment and the unknown data type segment in the segment list.
In this embodiment, the known data type segment may be a segment that has been determined to use a particular calculation accuracy. The unknown data type segments may be segments for which the calculation accuracy is unknown and for which an accuracy selection parameter adjustment is to be made.
Step 330, setting the precision selection parameter corresponding to the longest segment as false, setting the precision selection parameter corresponding to the known data type segment as true or false according to the target data type corresponding to the known data type segment, and setting the precision selection parameter corresponding to the unknown data type segment as true.
In this step, the precision selection parameter of the longest segment may be set to false, that is, each computing node in the longest segment uses float16 precision computing branches for computation.
If the float16 precision is determined to be used in the known data type segment, the corresponding precision selection parameter may be set to false, whereas if the float32 precision is determined to be used, the corresponding precision selection parameter may be set to true.
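Steps 320 and 330 can be sketched as follows; the segment representation and field names are assumptions made for illustration, not the patent's data structures:

```python
def preset_parameters(segments):
    """Preset each segment's precision selection parameter by segment type.

    segments: dict mapping segment id -> {'nodes': [...],
              'type': 'known' | 'unknown', 'target_dtype': ...}
    True selects the float32 branch, False the float16 branch.
    """
    # The longest segment is the one with the most compute nodes.
    longest = max(segments, key=lambda s: len(segments[s]['nodes']))
    params = {}
    for sid, seg in segments.items():
        if sid == longest:
            params[sid] = False                         # longest: try float16
        elif seg['type'] == 'known':
            params[sid] = seg['target_dtype'] == 'float32'
        else:
            params[sid] = True                          # unknown: keep float32
    return params

segs = {
    'a': {'nodes': [1, 2, 3], 'type': 'unknown', 'target_dtype': None},
    'b': {'nodes': [4], 'type': 'known', 'target_dtype': 'float16'},
    'c': {'nodes': [5], 'type': 'known', 'target_dtype': 'float32'},
}
params = preset_parameters(segs)
```

Trying float16 on the longest segment first targets the largest potential speedup, while unknown segments stay at float32 until the search proves float16 safe for them.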
Step 340, inputting the input sample into the deep learning model, and processing the input sample through the model according to the preset precision selection parameters corresponding to each segment to obtain a mixed-precision result.
Step 350, constructing an evaluation standard according to the mixed-precision result and the target result, and judging whether the mixed-precision result is qualified according to the evaluation standard; if so, executing step 360, and if not, executing step 370.
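The patent does not fix a particular evaluation standard. One plausible choice, sketched here purely as an assumption, is to bound the largest relative error of the mixed-precision result against the float32 target result:

```python
import numpy as np

def mixed_result_qualified(mixed, target, rel_tol=1e-2):
    """Illustrative evaluation standard (the patent does not specify one):
    the mixed-precision result qualifies if its largest element-wise
    relative error against the float32 target result stays below rel_tol.
    The small constant guards against division by zero."""
    err = np.max(np.abs(mixed - target) / (np.abs(target) + 1e-12))
    return bool(err < rel_tol)
```

Other metrics (mean error, cosine similarity of logits, task accuracy on a validation set) would slot into the same qualified/not-qualified role in the search loop.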
Step 360, setting the precision selection parameter of each compute node in the longest segment as false, removing the longest segment from the segment list, and returning to the operation of acquiring the longest segment from the segment list in step 320.
In this step, if the above-mentioned mixing result is qualified, the longest segment may be removed, and then the longest segment is re-acquired in the segment list, so as to obtain a new mixing result.
Step 370, if the longest segment can be split, splitting it into a first segment and a second segment, adding the first segment and the second segment to the segment list, and then returning to the operation of acquiring the longest segment in the segment list in step 320.
In one implementation of this embodiment, the longest segment may be considered splittable if its segment length (i.e., the number of compute nodes in the segment) is greater than 1. In this case, the longest segment may be split in half by bisection into a first segment and a second segment, which are then added to the segment list; the longest segment is acquired again from the segment list to obtain a new mixed-precision result, and this repeats until the result is qualified.
In one implementation of this embodiment, if the segment length of the longest segment is equal to 1, then the longest segment may be considered to be free of fission conditions. In this case, the precision selection parameter of the longest segment may be set to true, and the longest segment may be removed in the segment list.
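Putting steps 320–370 together, the greedy search with bisection splitting can be sketched as below. The `evaluate` callback stands in for running the model under the current parameters and applying the evaluation standard; all names are illustrative:

```python
def search_mixed_precision(segment_list, params, evaluate):
    """Greedy mixed-precision search over the segment list (sketch).

    segment_list: list of segments, each a list of compute-node ids.
    params: dict node_id -> bool (True = float32 branch, False = float16).
    evaluate(params) -> True if the mixed-precision result qualifies.
    """
    while segment_list:
        longest = max(segment_list, key=len)   # acquire the longest segment
        for n in longest:
            params[n] = False                  # try float16 on this segment
        segment_list.remove(longest)
        if not evaluate(params):
            for n in longest:
                params[n] = True               # revert the failed attempt
            if len(longest) > 1:               # splittable: bisect and retry
                mid = len(longest) // 2
                segment_list += [longest[:mid], longest[mid:]]
            # a single unqualified node stays at float32 and is dropped
    return params

# Toy example: node 3 only qualifies at float32 precision.
result = search_mixed_precision(
    [[1, 2, 3, 4]],
    {1: True, 2: True, 3: True, 4: True},
    lambda p: p[3],
)
```

Because every failed attempt is reverted before splitting, only segments that actually pass the evaluation standard are kept at float16; in the toy example the search isolates node 3 and leaves it at float32 while nodes 1, 2, and 4 drop to float16.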
Step 380, if the segment list is empty, inputting the target precision selection parameter of each compute node in each segment as a control signal to the corresponding control node; the control node selects the precision computing branch matched with the compute node according to the control signal, and mixed-precision reasoning is completed along the selected branches.
In this embodiment, if each segment is processed, it may be determined whether the segment list is empty, and if yes, the current precision selection parameter corresponding to each computing node in each segment may be obtained, and the current precision selection parameter is used as the target precision selection parameter.
According to the technical solution provided by this embodiment, an input sample is fed into the model to obtain a float32-type target result. The precision selection parameter of the longest segment is set to false, that of known-data-type segments is set to true or false according to their target data type, and that of unknown-data-type segments is set to true; the input sample is then processed by the model under these preset parameters to obtain a mixed-precision result. If the result is qualified, the precision selection parameter of each compute node in the longest segment is fixed as false, the longest segment is removed, and the operation of acquiring the longest segment is repeated. If the result is not qualified, it is judged whether the longest segment can be split; if so, it is split into a first segment and a second segment, which are added to the segment list before the operation of acquiring the longest segment is repeated. Once the segment list is empty, the target precision selection parameter of each compute node is input as a control signal to its control node, which selects the precision computing branch along which mixed-precision reasoning is completed. This effectively obtains a mixed-precision reasoning scheme that meets the model's precision requirement and improves the model's mixed-precision reasoning efficiency.
Fig. 4 is a schematic structural diagram of a model mixed precision reasoning device according to a fourth embodiment of the present invention, where the device is applied to an electronic apparatus. As shown in fig. 4, the apparatus includes a target result generation module 410, a parameter adjustment module 420, and a branch selection module 430.
The target result generation module 410 is configured to input an input sample into a deep learning model in a chip, and calculate the input sample through a plurality of calculation nodes corresponding to the deep learning model in the chip to obtain a target result of float32 type;
The parameter adjustment module 420 is configured to obtain a segment list corresponding to the model, and adjust the precision selection parameter corresponding to each segment according to the precision mixing result of each segment under the preset precision selection parameter of the model and the target result, where each segment includes at least one computing node;
the branch selection module 430 is configured to input, as a control signal, a target precision selection parameter corresponding to each computing node in each segment to a control node corresponding to each computing node, select, by using a control node in the chip, a precision computing branch matched with the computing node according to the control signal, and complete mixed precision reasoning according to the precision computing branch by using the computing node;
Wherein each computing node corresponds to a float32 precision computing branch and a float16 precision computing branch in advance.
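As a rough illustration of this dual-branch arrangement, the following sketch pairs each computing node with a float32 branch and a float16 branch and lets a control node route between them. The class names, the numpy-based kernels, and the matmul payload are assumptions for demonstration, not the chip-level implementation in the patent.

```python
# Illustrative sketch (not the chip implementation): each computing node holds
# a float32 branch and a float16 branch, and a control node routes the input
# to one of them according to the control signal.

import numpy as np


class ComputeNode:
    def __init__(self, weight):
        self.w32 = np.asarray(weight, dtype=np.float32)  # float32 branch weights
        self.w16 = self.w32.astype(np.float16)           # float16 branch weights

    def branch_fp32(self, x):
        # full-precision computing branch
        return np.asarray(x, dtype=np.float32) @ self.w32

    def branch_fp16(self, x):
        # half-precision computing branch
        return np.asarray(x, dtype=np.float16) @ self.w16


class ControlNode:
    """Selects the precision computing branch matching the control signal."""

    def __init__(self, node: ComputeNode):
        self.node = node

    def __call__(self, x, select_fp32: bool):
        # precision selection parameter true -> float32, false -> float16
        return self.node.branch_fp32(x) if select_fp32 else self.node.branch_fp16(x)
```

With an identity weight matrix, the float16 branch returns the same values as the float32 branch but with dtype `float16`; it is this lower-precision output that the mixed precision search compares against the float32 target result.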
According to the technical scheme provided by this embodiment of the invention, the input samples are input into the deep learning model in the chip and computed by a plurality of computing nodes corresponding to the deep learning model to obtain a target result of float32 type. The segment list corresponding to the model is obtained, and the precision selection parameters corresponding to the segments are adjusted according to the mixed precision result of each segment under the preset precision selection parameters and the target result. The target precision selection parameters corresponding to the computing nodes in each segment are then input as control signals into the corresponding control nodes; each control node in the chip selects the precision computing branch matched with its computing node according to the control signal, and the computing nodes complete mixed precision reasoning according to the selected branches. In this way, a mixed precision reasoning scheme satisfying the precision requirement of the model can be obtained effectively, the mixed precision reasoning efficiency of the model is improved, and the computing resources of the chip during model mixed precision reasoning are saved.
On the basis of the above embodiment, the apparatus further includes:
the topology ordering module is used for ordering a plurality of topological structures included in the model;
And the branch adding module is used for adding float16 precision calculation branches to the calculation nodes in each topological structure according to the topological sequencing result.
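The two modules above can be illustrated with Python's standard `graphlib`. The dict-based graph encoding (node mapped to its predecessors) and the tuple representation of a branch are assumptions made for demonstration only.

```python
# Sketch: order the model's structures topologically, then give each node a
# float16 branch alongside its existing float32 one. Node names and the
# (op, precision) tuple encoding of a branch are illustrative assumptions.

from graphlib import TopologicalSorter


def add_float16_branches(graph, fp32_branches):
    """graph maps each node to its predecessors; fp32_branches maps each node
    to its existing float32 kernel. Nodes are visited in topological order and
    each receives a second, float16 entry."""
    order = list(TopologicalSorter(graph).static_order())
    node_branches = {}
    for node in order:
        fp32 = fp32_branches[node]
        fp16 = (fp32[0], "float16")  # same op, lower-precision variant
        node_branches[node] = {"float32": fp32, "float16": fp16}
    return order, node_branches
```

Visiting nodes in topological order guarantees that, when a branch is added to a node, all of its upstream producers have already been processed.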
The parameter adjustment module 420 includes:
the parameter presetting unit is used for presetting the precision selection parameters corresponding to the segments according to the segment types corresponding to the segments;
The sample input unit is used for inputting the input samples into the deep learning model, and processing the input samples through the model according to the preset precision selection parameters corresponding to each segment to obtain a mixed precision result;
A segment obtaining unit, configured to obtain a longest segment, a known data type segment, and an unknown data type segment in the segment list;
A segment parameter setting unit, configured to set the precision selection parameter corresponding to the longest segment to false, set the precision selection parameter corresponding to each known data type segment to true or false according to its target data type, and set the precision selection parameter corresponding to each unknown data type segment to true;
the evaluation standard construction unit is used for constructing an evaluation standard according to the mixed precision result and the target result;
the mixed precision result judging unit is used for judging whether the mixed precision result is qualified according to the evaluation standard;
The segment removing unit is used for setting the precision selection parameter corresponding to each computing node in the longest segment to false when the mixed precision result is qualified, removing the longest segment from the segment list, and then returning to execute the operation of acquiring the longest segment from the segment list;
a segment judging unit, configured to judge whether the longest segment meets a fission condition;
The fission unit is used for splitting the longest segment into a first segment and a second segment when the longest segment meets the fission condition, adding the first segment and the second segment to the segment list, and then returning to execute the operation of acquiring the longest segment from the segment list;
And the segmentation list judging unit is used for judging whether the segmentation list is empty or not, if so, acquiring the current precision selection parameters corresponding to the calculation nodes in each segment, and taking the current precision selection parameters as target precision selection parameters.
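One plausible form of the evaluation standard built by these units is an element-wise tolerance test of the mixed precision result against the float32 target result. The concrete metric and tolerance values below are assumptions; the embodiments leave the standard open.

```python
# Sketch of an evaluation standard: the mixed precision output is judged
# against the float32 target result. The metric (element-wise error bounded
# by an absolute plus relative tolerance) is an assumption for demonstration.

import numpy as np


def build_standard(target, rel_tol=1e-2, abs_tol=1e-3):
    """Return a qualification test derived from the float32 target result."""
    target = np.asarray(target, dtype=np.float32)

    def is_qualified(mixed):
        mixed = np.asarray(mixed, dtype=np.float32)
        err = np.abs(mixed - target)
        # qualified iff every element stays within the combined tolerance
        return bool(np.all(err <= abs_tol + rel_tol * np.abs(target)))

    return is_qualified
```

A test built this way plugs directly into the precision search: when the mixed precision result passes, the longest segment may stay in float16; when it fails, the segment is split and retried.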
The device can execute the method provided by all the embodiments of the invention, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in the embodiments of the present invention can be found in the methods provided in all the foregoing embodiments of the present invention.
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including an input unit 16, such as a keyboard, mouse, etc., an output unit 17, such as various types of displays, speakers, etc., a storage unit 18, such as a magnetic disk, optical disk, etc., and a communication unit 19, such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the model mixed precision reasoning method.
In some embodiments, the model mixed precision reasoning method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the model mixed precision reasoning method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the model mixed precision reasoning method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), a blockchain network, and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

Translated from Chinese

1. A model mixed precision reasoning method, characterized in that the method comprises:
inputting input samples into a deep learning model in a chip, and computing the input samples through a plurality of computing nodes in the chip corresponding to the deep learning model to obtain a target result of float32 type, wherein the initial precision selection parameter of each computing node is true;
obtaining a segment list corresponding to the model, and adjusting the precision selection parameter corresponding to each segment according to the mixed precision result of the model for each segment under preset precision selection parameters and the target result, wherein each segment comprises at least one computing node;
inputting the target precision selection parameter corresponding to each computing node in each segment as a control signal into the control node corresponding to the computing node, selecting, by the control node in the chip, the precision computing branch matched with the computing node according to the control signal, and completing mixed precision reasoning by the computing node according to the selected precision computing branch;
wherein each computing node corresponds in advance to a float32 precision computing branch and a float16 precision computing branch.
2. The method according to claim 1, characterized in that before inputting the input samples into the deep learning model in the chip, the method further comprises:
sorting a plurality of topological structures included in the model;
adding a float16 precision computing branch to the computing nodes corresponding to each topological structure according to the topological sorting result.
3. The method according to claim 1, characterized in that after obtaining the segment list corresponding to the model, the method further comprises:
presetting the precision selection parameter corresponding to each segment according to the segment type corresponding to the segment;
inputting the input samples into the deep learning model, and processing the input samples through the model according to the preset precision selection parameters corresponding to each segment to obtain a mixed precision result.
4. The method according to claim 3, characterized in that presetting the precision selection parameter corresponding to each segment according to the segment type corresponding to the segment comprises:
obtaining the longest segment, known data type segments, and unknown data type segments from the segment list;
setting the precision selection parameter corresponding to the longest segment to false;
setting the precision selection parameter corresponding to each known data type segment to true or false according to the target data type corresponding to the segment;
setting the precision selection parameter corresponding to each unknown data type segment to true.
5. The method according to claim 4, characterized in that adjusting the precision selection parameter corresponding to each segment according to the mixed precision result of the model for each segment under the preset precision selection parameters and the target result comprises:
constructing an evaluation standard according to the mixed precision result and the target result;
judging whether the mixed precision result is qualified according to the evaluation standard;
if so, setting the precision selection parameter corresponding to each computing node in the longest segment to false, removing the longest segment from the segment list, and returning to execute the operation of obtaining the longest segment from the segment list.
6. The method according to claim 5, characterized in that after judging whether the mixed precision result is qualified according to the evaluation standard, the method further comprises:
if not, judging whether the longest segment meets a fission condition;
if so, splitting the longest segment into a first segment and a second segment, adding the first segment and the second segment to the segment list, and returning to execute the operation of obtaining the longest segment from the segment list.
7. The method according to claim 1, characterized in that before inputting the target precision selection parameter corresponding to each computing node in each segment as a control signal into the control node corresponding to the computing node, the method further comprises:
judging whether the segment list is empty;
if so, obtaining the current precision selection parameter corresponding to each computing node in each segment, and taking the current precision selection parameter as the target precision selection parameter.
8. A model mixed precision reasoning apparatus, characterized in that the apparatus comprises:
a target result generation module, configured to input input samples into a deep learning model in a chip, and compute the input samples through a plurality of computing nodes in the chip corresponding to the deep learning model to obtain a target result of float32 type, wherein the initial precision selection parameter of each computing node is true;
a parameter adjustment module, configured to obtain a segment list corresponding to the model, and adjust the precision selection parameter corresponding to each segment according to the mixed precision result of the model for each segment under preset precision selection parameters and the target result, wherein each segment comprises at least one computing node;
a branch selection module, configured to input the target precision selection parameter corresponding to each computing node in each segment as a control signal into the control node corresponding to the computing node, select, by the control node in the chip, the precision computing branch matched with the computing node according to the control signal, and complete mixed precision reasoning by the computing node according to the selected precision computing branch;
wherein each computing node corresponds in advance to a float32 precision computing branch and a float16 precision computing branch.
9. An electronic device, characterized in that the device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the model mixed precision reasoning method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a processor, when executing them, to implement the model mixed precision reasoning method according to any one of claims 1 to 7.
CN202310524663.2A | 2023-05-10 | 2023-05-10 | A model mixed precision reasoning method, device, equipment and storage medium | Active | CN116523051B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310524663.2A | CN116523051B (en) | 2023-05-10 | 2023-05-10 | A model mixed precision reasoning method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310524663.2A | CN116523051B (en) | 2023-05-10 | 2023-05-10 | A model mixed precision reasoning method, device, equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN116523051A (en) | 2023-08-01
CN116523051B | 2025-10-03

Family

ID=87408020

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310524663.2A | Active | CN116523051B (en) | 2023-05-10 | 2023-05-10 | A model mixed precision reasoning method, device, equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN116523051B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110543332A (en) * | 2017-04-24 | 2019-12-06 | Intel Corporation | Inference using a mix of low and high precision
CN115329939A (en) * | 2022-08-24 | 2022-11-11 | Wuxi Jiangnan Institute of Computing Technology | Method and device for realizing pulse array hardware supporting various different-precision operations

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11586883B2 (en) * | 2018-12-14 | 2023-02-21 | Microsoft Technology Licensing, LLC | Residual quantization for neural networks
CN112598078B (en) * | 2020-12-28 | 2024-04-19 | Beijing Dajia Internet Information Technology Co., Ltd. | Hybrid precision training method and device, electronic equipment and storage medium
CN114969446B (en) * | 2022-06-02 | 2023-05-05 | PLA Strategic Support Force Information Engineering University | Grouping hybrid precision configuration scheme searching method based on sensitivity model


Also Published As

Publication number | Publication date
CN116523051A (en) | 2023-08-01

Similar Documents

Publication | Publication Date | Title
CN115271218B (en) | Carbon emission prediction method, device, equipment and medium based on electric carbon factor
CN117827710B (en) | DMA bandwidth determining method, device, equipment and medium based on AI chip
CN116992150A (en) | Research and development component recommendation method, device, equipment and storage medium
CN119046124B (en) | Cost evaluation method, device, equipment, medium and product of distributed system
CN118981355B (en) | Topological structure generation method, device, electronic device and storage medium
CN114861039B (en) | Parameter configuration method, device, equipment and storage medium of search engine
CN118656221B (en) | Microservice merging method, device, equipment and storage medium
CN118033461B (en) | Method and device for evaluating battery health state and electronic equipment
CN116523051B (en) | A model mixed precision reasoning method, device, equipment and storage medium
CN118300997A (en) | DMA bandwidth determining method and medium based on deep neural learning model
CN116881280A (en) | Optimized database statement determination method, device, equipment and storage medium
CN115293083B (en) | Integrated circuit time sequence prediction method and device, electronic equipment and storage medium
CN116932348A (en) | Intelligent algorithm performance analysis method and device, electronic equipment and medium
CN116382658A (en) | Compiling method and device of AI model, computer equipment and storage medium
CN115511047B (en) | Quantification method, device, equipment and medium of Softmax model
CN117608589B (en) | Code generation method, device, electronic equipment and storage medium
CN116629810B (en) | Operation recommendation method, device, equipment and medium based on building office system
CN116108589B (en) | Method, device, equipment and medium for constructing core model
CN118468059B (en) | Power station characteristic determining method and device for electric power system, electronic equipment and medium
CN117271113B (en) | Task execution method, device, electronic device and storage medium
CN119829120A (en) | Optimization method, device, equipment and medium for configuration parameters of operating system
CN119443192A (en) | Model training and information query method, device, equipment, medium and product
CN120782601A (en) | Case complaint rate determining method and device, electronic equipment and storage medium
CN116205279A (en) | Hardware scheduling execution method, device, equipment and medium of deep learning model
CN119025359A (en) | A performance testing method, device, electronic device and medium for heterogeneous chips

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
