The present application discloses a simulation realization method based on a neural network compiler, the neural network compiler, and a computer-readable storage medium, and is a divisional application of application No. 202111653883.2, filed on December 31, 2021.
Disclosure of Invention
The application aims to provide a simulation implementation method for improving simulation efficiency, so as to solve the technical problems that in the prior art only the individual intermediate layers of a neural network model can be simulated and a precision test over a ten-thousand-person test set cannot be performed.
In order to achieve the technical purpose, the application adopts the following technical scheme:
A simulation implementation method for improving simulation efficiency comprises the following steps:
A neural network compiler is constructed and used for receiving quantized set pictures, a plurality of different types of neural network models and a ten-thousand-person test set, and after the neural network compiler performs accuracy verification, the neural network models are simulated layer by layer;
the quantization set picture quantizes the neural network model through the neural network compiler to generate an executable file, and the ten-thousand-person test set generates first input data, a first fixed-point characteristic file and a floating-point characteristic file through the neural network compiler;
comparing the first fixed-point characteristic file with the floating-point characteristic file, and outputting a precision table of statistics on the neural network model;
and if the statistical result of the precision table accords with a preset precision range, reading the executable file and the first input data to simulate the neural network model.
Preferably, the method further comprises the steps of:
Building an environment of the neural network compiler, installing the neural network compiler, and testing whether the neural network compiler is successfully installed;
the building environment of the neural network compiler is set to be the same operating system as that of the simulation system.
Preferably, the quantization set picture quantizes the neural network model through the neural network compiler to generate an executable file, which specifically includes the following steps:
preparing different types of neural network models and quantized set pictures in different scenes;
operating the neural network compiler, and quantizing the neural network model according to the quantized set picture to generate the executable file;
the executable file comprises a neural network name identifier, a layer identifier of an input layer, a layer identifier of an intermediate layer, a layer identifier of an output layer, a quantized weight value, a quantized offset value, a layer operation name, layer parameter information, layer association information and layer memory information.
Preferably, the method further comprises the steps of:
Presetting the number of the neural network models, setting the initial cycle count to 0, and judging whether the cycle count matches the preset number of neural network models;
if the cycle count does not match the preset number of neural network models, the quantization set picture quantizes the neural network models through the neural network compiler to generate the executable file, and the ten-thousand-person test set generates the first input data, the first fixed-point characteristic file and the floating-point characteristic file through the neural network compiler;
and if the cycle count matches the preset number of neural network models, ending the flow.
Preferably, the ten thousand person test set generates first input data, a first fixed point characteristic file and a floating point characteristic file through the neural network compiler, and specifically comprises the following steps:
preparing different ten-thousand-person test sets according to different neural network models;
The ten-thousand-person test set is scaled by a scaling function to generate first input data at the network resolution, and the ten-thousand-person test set is simulated to generate a first fixed-point characteristic file and a floating-point characteristic file.
Preferably, comparing the first fixed point feature file with the floating point feature file, outputting a precision table for counting the neural network model, and specifically comprising the following steps:
the floating point characteristic file comprises first floating point characteristic data, the fixed point characteristic data in the first fixed point characteristic file is converted into floating point characteristic data, and second floating point characteristic data is generated;
comparing the similarity of the first floating point characteristic data and the second floating point characteristic data, and if the similarity is within a preset variable, meeting the precision requirement; if the similarity is not in the preset variable, the accuracy requirement is not met;
And outputting the similarity statistical result of the first floating point characteristic data and the second floating point characteristic data in a form of a table.
Preferably, if the statistical result of the precision table accords with a preset precision range, the executable file and the first input data are read to simulate the neural network model, and the method specifically includes the following steps:
counting the precision table, wherein the counting result is required to accord with a preset precision range;
Reading the executable file, configuring hardware according to the executable file, reading the first input data, starting simulation of the neural network model according to the first input data, and generating a second fixed-point characteristic file;
and comparing the first fixed point characteristic file with the second fixed point characteristic file, and if the first fixed point characteristic file and the second fixed point characteristic file are different, storing error data in the second fixed point characteristic file.
Preferably, the method further comprises the steps of:
establishing a first folder, and automatically generating a first main folder under the first folder, wherein the first main folder is used for storing the executable files;
automatically generating a first sub-folder under a first folder, wherein the first sub-folder is used for storing the first fixed-point characteristic file;
and automatically generating an input data folder under a first folder, wherein the input data folder is used for storing the first input data.
Preferably, different types of neural network models and quantized set pictures are prepared, and the method specifically comprises the following steps of:
and establishing a second folder, and generating a second main folder under the second folder, wherein the second main folder is used for storing the neural network models of different types, the quantized set pictures and the floating point characteristic files.
Preferably, different ten thousand person test sets are prepared according to different neural network models, and the method specifically comprises the following steps:
and establishing a second auxiliary folder under the second main folder, wherein the second auxiliary folder is used for storing the ten-thousand-person test set.
A neural network compiler, applied to the above simulation implementation method for improving simulation efficiency, comprises: a network analysis module, a network quantization module, a network merging module, a network storage module and a network forward execution module which are connected in sequence;
The network analysis module is used for receiving the quantized set pictures, the multiple different types of neural network models and the ten-thousand-person test set, analyzing and reconstructing the structure of the neural network models layer by layer, and at least acquiring one of the input layer, the output layer and the layer operation name, the layer parameter information and the layer association information of the middle layer of the neural network model;
the network quantization module is used for generating an offset value and a conversion value according to the reconstructed neural network model and for converting floating-point weight values into fixed-point weight values;
the network merging module is used for merging the pipeline operation instructions of the convolution layer, the pooling layer and the activation layer in the neural network model;
The network storage module is used for storing the data in the network analysis module, the network quantization module and the network merging module to generate an executable file;
the network forward execution module is used for generating the first input data, the first fixed-point characteristic file and the floating-point characteristic file from the ten-thousand-person test set, comparing the first fixed-point characteristic file with the floating-point characteristic file, and outputting a precision table of statistics on the neural network model.
A computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method described above.
The beneficial effects provided by the application are as follows:
1. The quantization set picture quantizes different neural network models through the neural network compiler to generate different executable files, and if the statistical result of the precision table falls within a preset precision range, the executable files and the first input data are read to simulate the neural network models. Batch simulation of a plurality of different types of neural network models is thereby realized, various edge cases are taken into account in the simulation, and the correctness of the neural network models ported to a chip or an FPGA is ensured. The hardware is configured through the executable files and simulation is conducted layer by layer for the different types of neural network models, so that more simulation verification points are covered, the risk of chip tape-out is reduced, cost is saved and simulation efficiency is improved; at the same time, the precision table of statistics on the neural network models provides comprehensive accuracy verification.
2. The number of neural network models is preset, the initial cycle count is set to 0, and whether the cycle count matches the preset number of neural network models is judged. By checking the number of neural network models, the time for generating the executable file, the first input data, the first fixed-point characteristic file and the floating-point characteristic file is saved, and the time-consuming quantization of the neural network models in the forward process is avoided. The different generated data are automatically stored under different folders via pre-stored paths, providing the corresponding data for simulating multiple types of neural network models, which simplifies the simulation flow and accelerates simulation.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example 1:
As shown in fig. 1, the present embodiment includes a simulation implementation method for improving simulation efficiency, including the following steps:
And constructing a neural network compiler which is used for receiving the quantized set pictures, the neural network models of different types and the ten-thousand-person test set, and simulating the neural network models layer by layer after the neural network compiler performs accuracy verification.
The quantized set picture quantizes the neural network model through a neural network compiler to generate an executable file, and the ten-thousand-person test set generates first input data, a first fixed-point characteristic file and a floating-point characteristic file through the neural network compiler.
Comparing the first fixed-point characteristic file with the floating-point characteristic file, and outputting a precision table of statistics on the neural network model; if the statistical result of the precision table falls within the preset precision range, the executable file and the first input data are read to simulate the neural network model.
Batch simulation of a plurality of different types of neural network models is thereby realized, various edge cases are taken into account in the simulation, and the correctness of the neural network models ported to a chip or an FPGA is ensured. The hardware is configured through the executable files and simulation is conducted layer by layer for the different types of neural network models, so that more simulation verification points are covered, the risk of chip tape-out is reduced, cost is saved and simulation efficiency is improved; at the same time, the precision table of statistics on the neural network models provides comprehensive accuracy verification.
The method further comprises the steps of: setting up the environment of the neural network compiler, installing the neural network compiler, and testing whether the neural network compiler is successfully installed, wherein the environment of the neural network compiler is set up on the same operating system as the simulation system. Specifically, the neural network compiler is packaged in the whl format, a compressed file format that is convenient to install and test under the operating system.
The quantization set picture quantizes the neural network model through a neural network compiler to generate an executable file, and specifically comprises the following steps: different types of neural network models and quantized set pictures in different scenes are prepared.
And operating a neural network compiler, and quantizing the neural network model according to the quantized set picture to generate an executable file. The executable file comprises a neural network name identifier, a layer identifier of an input layer, a layer identifier of an intermediate layer, a layer identifier of an output layer, a quantized weight value, a quantized offset value, a layer operation name, layer parameter information, layer association information and layer memory information.
Specifically, the network analysis module of the neural network compiler analyzes and reconstructs the structure of the original neural network model layer by layer, generates offset values and conversion values according to the reconstructed neural network model, and converts floating point type weight values into fixed point type weight values. The network merging module and the network quantifying module operate simultaneously to merge pipeline operation instructions in a convolution layer, a pooling layer and an activation layer in the neural network model. And the network storage module generates executable files from the quantized data operated by the network analysis module, the network quantization module and the network merging module.
The formula for generating the offset value is as follows:
Equation one: x'_m = (x'_max - x'_min) * 2^bw
where x'_m denotes the offset value, x'_max denotes the maximum floating-point weight value, x'_min denotes the minimum floating-point weight value, and bw denotes the converted bit width; in this embodiment a bit width of 12 bits is currently supported.
The formula for generating the conversion value is as follows:
Formula two: f = max((bw - ceil(log2(x'_m) + 1)), bw)
where f denotes the conversion value, max denotes the built-in maximum function of the system library, bw denotes the converted bit width, log2 denotes the built-in base-2 logarithm function of the system library, x'_m denotes the offset value, and ceil denotes the built-in round-up function of the system library.
Converting a floating point type weight value into a fixed point type weight value, wherein the formula for converting floating point characteristic data into fixed point characteristic data is expressed as follows:
Formula three: X = round(X_float * 2^f) + x'_m
where X denotes fixed-point feature data (in this embodiment, a fixed-point weight value), X_float denotes floating-point feature data (in this embodiment, a floating-point weight value), round denotes the built-in rounding function of the system library, f denotes the conversion value, and x'_m denotes the offset value.
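Formulas one to three can be transcribed directly as Python helpers. The function names are illustrative, not the application's; the expressions are reproduced exactly as printed, including the max(..., bw) form of formula two:

```python
import math

def offset_value(x_max: float, x_min: float, bw: int = 12) -> float:
    """Formula one: x'_m = (x'_max - x'_min) * 2^bw."""
    return (x_max - x_min) * 2 ** bw

def conversion_value(x_m: float, bw: int = 12) -> int:
    """Formula two: f = max((bw - ceil(log2(x'_m) + 1)), bw)."""
    return max(bw - math.ceil(math.log2(x_m) + 1), bw)

def float_to_fixed(x_float: float, f: int, x_m: float) -> float:
    """Formula three: X = round(X_float * 2^f) + x'_m."""
    return round(x_float * 2 ** f) + x_m
```

With the 12-bit width of this embodiment, float_to_fixed(0.5, 12, 0.0) yields 2048.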
Specifically, the layer operation names include at least one of convolution, deconvolution, pooling, full join, culling, join, point addition, point multiplication, normalization, and activation layer operations. The layer parameter information includes at least one of a convolution kernel size, a convolution kernel span, a grouping, a padding value, whether to bring an active layer, a quantized weight value, and a quantized offset value. The layer association information includes at least one of an input layer operation name of the current layer, layer parameter information, an output layer operation name of the current layer, and layer parameter information. The intra-layer memory information includes at least one of a memory size of a current layer and whether to multiplex the memories of other layers.
Specifically, the different types of neural network models include a detection network, a recognition network, a classification network and the like, and at least 50 quantization set pictures from different scenes are prepared.
The method further comprises the steps of: presetting the number of neural network models, setting the initial cycle count to 0, and judging whether the cycle count matches the preset number of neural network models.
If the cycle count does not match the preset number of neural network models, the quantization set pictures quantize the neural network models through the neural network compiler to generate executable files, and the ten-thousand-person test set generates first input data, a first fixed-point characteristic file and a floating-point characteristic file through the neural network compiler.
If the cycle count matches the preset number of neural network models, the flow ends. The cycle count is increased by 1 each time an executable file is simulated.
By judging the number of the neural network models, the time for generating the executable file, the first input data, the first fixed point characteristic file and the floating point characteristic file is saved, and the time consumption of quantification of the neural network models in the forward process is avoided.
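The loop control described above can be sketched as follows; run_batch and compile_one are illustrative names, not identifiers from the application:

```python
def run_batch(model_names, preset_count, compile_one):
    """Start the cycle count at 0 and stop once it reaches the preset
    number of neural network models; the count is increased by 1 for
    each executable file produced."""
    cycles = 0
    compiled = []
    while cycles != preset_count:            # count does not yet match the preset number
        compiled.append(compile_one(model_names[cycles]))  # quantize -> executable
        cycles += 1                          # add 1 per executable file
    return compiled
```

A usage example: run_batch(["resnet", "yolo"], 2, quantize) compiles exactly the two preset models and then ends the flow.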
The ten-thousand-person test set generates first input data, a first fixed-point characteristic file and a floating-point characteristic file through a neural network compiler, and specifically comprises the following steps:
According to the different neural network models, different ten-thousand-person test sets are prepared; each ten-thousand-person test set is scaled by a scaling function to generate first input data at the network resolution, and the ten-thousand-person test set is simulated to generate a first fixed-point characteristic file and a floating-point characteristic file.
Specifically, each ten-thousand-person test set is a picture set containing ten thousand pictures, and the ten-thousand-person test set generates the first input data, the first fixed-point characteristic file and the floating-point characteristic file through the network forward execution module.
The method also comprises the steps of: and establishing a first folder, automatically generating a first main folder under the first folder, wherein the first main folder is used for storing executable files.
And automatically generating a first sub-folder under the first folder, wherein the first sub-folder is used for storing the first fixed-point characteristic file. And automatically generating an input data folder under the first folder, wherein the input data folder is used for storing the first input data.
Preparing different types of neural network models and quantized set pictures, and specifically comprising the following steps of: and establishing a second folder, and generating a second main folder under the second folder, wherein the second main folder is used for storing different types of neural network models, quantized set pictures and floating point characteristic files.
According to different neural network models, different ten thousand person test sets are prepared, and the method specifically comprises the following steps: and establishing a second auxiliary folder under the second main folder, wherein the second auxiliary folder is used for storing the ten-thousand-person test set.
Specifically, under the current PATH, a first folder and a second folder are established; the first folder is named SPE_PATH1 and the second folder SPE_PATH2. Under SPE_PATH2, a second main folder named after the neural network is established to store the neural network model and the quantization set pictures generated by the GPU, and a second auxiliary folder is established under the second main folder to store the ten-thousand-person test set.
Each time an executable file is generated, the neural network compiler generates under SPE_PATH1 a first main folder named after the neural network, which stores the executable file generated by the neural network compiler.
An input data folder is automatically generated under SPE_PATH1. In this embodiment the neural network name parsed by the neural network compiler is defined as resnet, so the input data folder is named SPE_PATH1/resnet/data_input; it stores the first input data generated from the ten-thousand-person test set, scaled to the network resolution by a scaling function. For convenience of simulation, the data are in hexadecimal format with one value per line.
A first sub-folder is automatically generated under SPE_PATH1; with the parsed neural network name resnet, the network layer name conv1_1 and the layer serial number 1, the first sub-folder is named SPE_PATH1/resnet/conv1_1. It stores the first fixed-point characteristic file generated by the intermediate layers and the output layer when the ten-thousand-person test set is simulated, so that the simulation can check the correctness of the data; the data are likewise in hexadecimal format with one value per line.
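As one hypothetical sketch of this layout (the path fragments data_input, resnet and conv1_1 appear in the text; the function names and helper itself are illustrative), the folders and the one-value-per-line hexadecimal files could be produced as follows:

```python
import os

def prepare_output_dirs(root, net_name, layer_names):
    # Folder layout described in this embodiment:
    #   <root>/<net>/            - main folder (executable file)
    #   <root>/<net>/data_input  - input data folder (first input data)
    #   <root>/<net>/<layer>     - sub-folders (fixed-point feature files)
    paths = {"main": os.path.join(root, net_name),
             "input": os.path.join(root, net_name, "data_input")}
    for layer in layer_names:
        paths[layer] = os.path.join(root, net_name, layer)
    for p in paths.values():
        os.makedirs(p, exist_ok=True)
    return paths

def write_hex_lines(path, values, width=4):
    # One value per line in hexadecimal, matching the input-data format above.
    with open(path, "w") as fh:
        for v in values:
            fh.write(f"{v:0{width}x}\n")
```

For example, prepare_output_dirs("SPE_PATH1", "resnet", ["conv1_1"]) creates SPE_PATH1/resnet/data_input and SPE_PATH1/resnet/conv1_1.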
The generated different data are automatically stored under different folders through pre-stored paths, corresponding data are provided for realizing simulation of multiple types of neural network models, the simulation flow is simplified, and the simulation efficiency is accelerated.
The method further comprises the steps of: presetting the number of executable files, and judging whether the number of executable files under the first main folder exceeds the preset number of executable files.
If the number of executable files under the first main folder does not exceed the preset number, the neural network compiler simulates the ten-thousand-person test set to generate a first fixed-point characteristic file.
If the number of executable files under the first main folder exceeds the preset number, the process of simulating the ten-thousand-person test set by the neural network compiler ends.
Whether the ten-thousand-person test set has been fully simulated is determined by judging the number of executable files under the first main folder; ending the simulation flow once simulation is finished improves simulation efficiency.
Comparing the first fixed-point characteristic file with the floating-point characteristic file and outputting a precision table of statistics on the neural network model specifically comprises the following steps:
the floating point characteristic file comprises first floating point characteristic data, the fixed point characteristic data in the first fixed point characteristic file is converted into floating point characteristic data, and second floating point characteristic data is generated;
Comparing the similarity of the first floating point characteristic data and the second floating point characteristic data, and if the similarity is within a preset variable, meeting the precision requirement; if the similarity is not in the preset variable, the accuracy requirement is not met;
And outputting the similarity statistics of the first floating point characteristic data and the second floating point characteristic data in the form of a table.
Specifically, the fixed point characteristic data in the first fixed point characteristic file is converted into floating point characteristic data through a conversion formula, wherein the conversion formula is as follows:
Equation four: x'.float=(X-x′m)/2f
where X'_float denotes floating-point feature data (in this embodiment, the second floating-point feature data), X denotes fixed-point feature data (in this embodiment, the fixed-point feature data in the first fixed-point characteristic file), x'_m denotes the offset value, and f denotes the conversion value.
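Equation four is the inverse of the float-to-fixed conversion of formula three, and can be transcribed directly (the function name is illustrative):

```python
def fixed_to_float(x_fixed: float, x_m: float, f: int) -> float:
    """Equation four: X'_float = (X - x'_m) / 2^f; undoes formula
    three, X = round(X_float * 2^f) + x'_m, up to rounding error."""
    return (x_fixed - x_m) / 2 ** f
```

Round-tripping an exactly representable value through formula three and back recovers it: with f = 12 and offset 0, the fixed-point value 2048 maps to 0.5.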
Specifically, the similarity between the first floating point feature data and the second floating point feature data is compared, and the similarity distance formula is as follows:
Formula five:
where n denotes the total number of floating-point feature data, x_i denotes the first floating-point feature data, and y_i denotes the second floating-point feature data, i.e. the values X'_float of equation four; θ denotes the similarity distance, and the closer it is to 1, the higher the accuracy.
In this embodiment, the ten-thousand-person test set corresponding to a neural network model is tested with the preset variable set to a similarity distance of 0.8. Comparing the similarity of the first and second floating-point feature data means counting the similarity for each picture in the ten-thousand-person test set; a similarity distance greater than or equal to 0.8 indicates that the precision requirement is met. The proportion of qualifying pictures is counted for each neural network model over the ten-thousand-person test set, and a precision table of statistics on the neural network model is output. From the statistical result of the precision table it can be seen intuitively whether the hardware design meets the precision requirement.
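The exact expression of formula five is not reproduced in the source text. Purely as an illustration, the sketch below uses cosine similarity, one common measure over n feature values that approaches 1 as the two feature sets agree, together with the 0.8 threshold and per-picture counting described above; the choice of cosine similarity and the function names are assumptions, not the patent's formula:

```python
import math

def cosine_similarity(xs, ys):
    # One plausible stand-in for the similarity distance theta
    # (formula five is not reproduced in the text); closer to 1
    # means higher accuracy.
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    return dot / (nx * ny)

def precision_table(pairs, threshold=0.8):
    # Count, per the embodiment, the proportion of test pictures whose
    # similarity distance is >= the preset variable (0.8).
    sims = [cosine_similarity(a, b) for a, b in pairs]
    passed = sum(1 for s in sims if s >= threshold)
    return {"count": len(sims), "pass_ratio": passed / len(sims)}
```

precision_table takes one (first, second) floating-point feature pair per picture and returns the pass ratio that the precision table would report for the model.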
If the statistical result of the precision table accords with the preset precision range, reading the executable file and the first input data to simulate the neural network model, wherein the method specifically comprises the following steps:
The precision table is counted, and the statistical result is required to fall within the preset precision range. The executable file is read, the hardware is configured according to the executable file, the first input data is read, and simulation of the neural network model is started according to the first input data to generate a second fixed-point characteristic file.
And comparing the first fixed point characteristic file with the second fixed point characteristic file, and if the first fixed point characteristic file and the second fixed point characteristic file are different, storing error data in the second fixed point characteristic file.
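The file comparison described above can be sketched as follows; the function and field names are illustrative, and the two feature files are modelled as lists of already-read lines:

```python
def compare_feature_files(expected_lines, simulated_lines):
    """Compare the first fixed-point characteristic file (compiler
    output) against the second (hardware simulation); mismatching
    entries are kept as error data for locating the simulation
    problem."""
    errors = []
    for i, (exp, got) in enumerate(zip(expected_lines, simulated_lines)):
        if exp != got:
            errors.append({"line": i, "expected": exp, "got": got})
    return errors
```

An empty return value means the two fixed-point characteristic files agree; otherwise each entry records where the simulated data diverged.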
The simulation problem can be conveniently located through the error data in the second fixed-point characteristic file, the simulation efficiency can be improved, and the simulation coverage is wider.
Example 2:
the embodiment includes a neural network compiler, which is applied to the simulation implementation method for improving the simulation efficiency of embodiment 1, and includes: the system comprises a network analysis module, a network quantification module, a network merging module, a network storage module and a network forward execution module which are connected in sequence.
The network analysis module is used for receiving the quantization set pictures, the plurality of different types of neural network models and the ten-thousand-person test set, analyzing and reconstructing the structure of the neural network model layer by layer, and acquiring at least one of the input layer, the output layer, and the layer operation names, layer parameter information and layer association information of the intermediate layers of the neural network model.
Specifically, the network analysis module analyzes the structure of the original neural network model layer by layer and acquires at least one of the input layer, the output layer, and the layer operation names, layer parameter information and layer association information of the intermediate layers. After analysis, the structure is reconstructed into its internally ordered execution form and the data structures of the relevant internal network layers are redefined; the network layers include the convolution layer, the pooling layer and the activation layer, and the layer execution sequence, layer operation type, layer operation name, layer parameter information, layer association information and the like are filled into these data structures.
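As one hypothetical shape for the redefined internal layer data structure (all field names below are illustrative; the application does not specify them):

```python
from dataclasses import dataclass, field

@dataclass
class LayerRecord:
    """Sketch of a reconstructed internal network-layer record holding
    the items the analysis module fills in."""
    order: int                                  # layer execution sequence
    op_type: str                                # layer operation type, e.g. "conv", "pool"
    op_name: str                                # layer operation name, e.g. "conv1_1"
    params: dict = field(default_factory=dict)  # layer parameter information
    inputs: list = field(default_factory=list)  # layer association information
```

A record such as LayerRecord(1, "conv", "conv1_1", {"kernel": 3, "stride": 1}) would then be filled in per layer during reconstruction.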
The network quantization module is used for generating an offset value and a conversion value according to the reconstructed neural network model and for converting floating-point weight values into fixed-point weight values.
Specifically, floating point characteristic data of the storage address space is converted into a data format supported by hardware, and conversion values are calculated, so that the calculated amount of hardware and the number of multipliers are reduced.
The network merging module is used for merging the pipeline operation instructions of the convolution layer, the pooling layer and the activation layer in the neural network model.
Specifically, guided by the principle of reducing external-memory bandwidth, the pipeline operation instructions in the convolution layer, the pooling layer and the activation layer are optimized: equivalent transformation optimization is performed on the three layers, and the internal data structures are optimized and merged again, reducing resource consumption and improving execution efficiency. Data interaction between internal and external memory is reduced, improving bandwidth utilization, and layers in the same pipeline stage are merged, the main merged layers being the convolution layer and the pooling layer.
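A minimal sketch of the merging rule, assuming layers are represented only by their operation names and that adjacent convolution and pooling layers in the same pipeline stage are fused into one instruction (this representation is illustrative, not the application's internal data structure):

```python
def merge_pipeline_layers(layers):
    """Fuse adjacent "conv"/"pool" pairs, the main merged layers named
    above, into a single pipeline instruction; other layers pass
    through unchanged."""
    merged = []
    i = 0
    while i < len(layers):
        if layers[i] == "conv" and i + 1 < len(layers) and layers[i + 1] == "pool":
            merged.append("conv+pool")  # one instruction, less memory traffic
            i += 2
        else:
            merged.append(layers[i])
            i += 1
    return merged
```

For example, a conv/pool/act/conv sequence merges to three instructions instead of four.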
The network storage module is used for storing the data from the network analysis module, the network quantization module and the network merging module to generate an executable file.
The network forward execution module is used for generating the first input data, the first fixed-point characteristic file and the floating-point characteristic file from the ten-thousand-person test set, comparing the first fixed-point characteristic file with the floating-point characteristic file, and outputting a precision table for statistics on the neural network model.
Specifically, the standardization part is implemented with an open-source deep learning framework to ensure a correct comparison baseline, while the simulation part keeps the forward logic of the network consistent with the logic of the network as executed by the hardware, so that the simulated data results are consistent with the hardware.
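The comparison described above can be sketched as follows: for each layer, the fixed-point feature data are dequantized with the layer's conversion value and offset value and compared against the floating-point reference, producing a per-layer entry of the precision table. Cosine similarity is used here purely as an illustrative metric; the disclosed implementation does not specify the statistic.

```python
import numpy as np

def precision_table(fixed_feats, float_feats, scale, offset):
    """Compare dequantized fixed-point features against floating-point
    references layer by layer; returns {layer_name: cosine_similarity}."""
    table = {}
    for name, q in fixed_feats.items():
        deq = (q.astype(np.float32) - offset) * scale   # back to float domain
        ref = float_feats[name]
        cos = float(np.dot(deq, ref)
                    / (np.linalg.norm(deq) * np.linalg.norm(ref)))
        table[name] = cos
    return table

fixed = {"conv1": np.array([2, 4], dtype=np.uint8)}
ref = {"conv1": np.array([1.0, 2.0], dtype=np.float32)}
table = precision_table(fixed, ref, scale=0.5, offset=0)
print(table)  # conv1 dequantizes exactly to the reference, similarity 1.0
```

If every entry of the resulting table falls within the preset precision range, the executable file and the first input data can then be read to simulate the model, as stated earlier in the disclosure.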
For related content, refer to the relevant description in Embodiment 1.
Embodiment 3:
A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of the method of Embodiment 2.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the application. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
In addition, the specific embodiments described in the present specification may differ in terms of parts, shapes of components, names, and the like. All equivalent or simple changes of the structure, characteristics and principle according to the inventive concept are included in the protection scope of the present application. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions in a similar manner without departing from the scope of the application as defined in the accompanying claims.