Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a linear programming-based neural network bias processing method, device, equipment and storage medium for a storage and computation integrated chip, which can at least partially solve the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, a linear programming based neural network bias processing method for a storage and computation integrated chip is provided, including:
acquiring input sample data, weight data, bias data and hardware parameters of a target storage and calculation integrated chip of a target neural network layer;
inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving, so as to obtain bias high-bit data and bias low-bit data; wherein,
the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into a digital domain of the target storage and calculation integrated chip and combined with an analog operation result output by the analog domain.
Further, the weight data includes: the weight array high-order data, the weight array low-order data and the scaling coefficient; the hardware parameters of the target storage and calculation integrated chip comprise: the input and output bit width of the flash memory unit array, the bit width of the digital domain and the maximum row number of the offset array.
Further, the linear programming solution model objective function includes:
summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, and dividing the sum by the scaling factor G_Scale, such that the total number of saturation truncations of the quotient is minimized; wherein truncation is performed if the quotient is lower than a saturation truncation lower limit or higher than a saturation truncation upper limit, the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width;
summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, and, where the quotient of the sum divided by the scaling coefficient is not saturation-truncated, making the maximum absolute value of the quotient as small as possible;
making the maximum absolute value of the bias low-order data as small as possible;
wherein the bias upper limit is m × (2^(n-1) - 1) × K, the bias lower limit is m × (-2^(n-1)) × K, m is the maximum number of rows of the bias array, and K is the amplification factor of the bias data.
Further, the constraint conditions of the linear programming solution model include:
the bias low-order data is located between the bias lower limit and the bias upper limit;
the bias high-order data is located between the bias high-order lower limit and the bias high-order upper limit;
the sum of the bias high-order data multiplied by the scaling factor and the bias low-order data is equal to the bias data;
wherein the bias high-order upper limit is 2^(w-1) - 1, the bias high-order lower limit is -2^(w-1), and w is the bit width of the digital domain.
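Collecting the objectives and constraints above, the solving model can be summarized in the notation of this disclosure, with q_j denoting the quotient for the j-th representative sample and the three objectives taken in order of priority (this is an illustrative summary, not the exact formulation of the embodiment):

```latex
\begin{aligned}
\min\quad & \#\{\, j : q_j < -2^{\,n-1} \ \text{or}\ q_j > 2^{\,n-1}-1 \,\},\quad
            \max_j |q_j|,\quad \max_i |\mathrm{BiasL}_i| \\
\text{s.t.}\quad
 & q_j = \bigl(\mathrm{Input}_j \cdot \mathrm{WeightL} + \mathrm{BiasL}\bigr) / G\_\mathrm{Scale} \\
 & m\,(-2^{\,n-1})\,K \;\le\; \mathrm{BiasL}_i \;\le\; m\,(2^{\,n-1}-1)\,K \\
 & -2^{\,w-1} \;\le\; \mathrm{BiasH}_i \;\le\; 2^{\,w-1}-1 \\
 & \mathrm{BiasL}_i + \mathrm{BiasH}_i \cdot G\_\mathrm{Scale} = \mathrm{Bias}_i
\end{aligned}
```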
In a second aspect, a linear programming based neural network bias processing apparatus for a storage and computation integrated chip is provided, including:
the parameter acquisition module is used for acquiring input sample data, weight data, bias data and hardware parameters of the target storage and calculation integrated chip of the target neural network layer;
the linear solving module is used for inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model to obtain bias high-bit data and bias low-bit data; wherein,
the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-order data is used for being stored into a digital domain of the target storage and calculation integrated chip and combined with an analog operation result output by an analog domain.
In a third aspect, a storage and computation integrated chip is provided, comprising an analog domain and a digital domain; the analog domain is used for executing a matrix multiply-add operation of input data and weight low-bit data, a summation of the matrix multiply-add result and the bias low-bit data, and a division of the summation result by a scaling parameter; the digital domain pre-stores the bias high-bit data and the scaling parameter, and sums the division result of the weight high-bit data by the scaling parameter with the division result output by the analog domain; the bias low-bit data and the bias high-bit data are generated according to the above linear programming based neural network bias processing method.
In a fourth aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the linear programming based neural network bias processing method when executing the program.
In a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the above-mentioned linear programming-based neural network bias processing method.
The embodiment of the invention provides a linear programming-based neural network bias processing method for a memory-computation integrated chip, which comprises the following steps: acquiring input sample data, weight data, bias data and hardware parameters of a target storage and calculation integrated chip of a target neural network layer; inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model to solve to obtain bias high-bit data and bias low-bit data; the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-order data is used for being stored in a digital domain of the target storage and calculation integrated chip to participate in digital domain calculation and then combined with an analog calculation result output by an analog domain. By adopting the technical scheme, the truncation error problem caused by saturation is reduced, and the operation precision of the chip is improved.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In the prior art, the summation of the whole bias with the matrix multiply-add result is performed directly in the analog domain; after the sum of the matrix multiply-add result and the bias is divided by the scaling coefficient, the result exceeds the bit width range preset by the storage and computation integrated chip, causing truncation errors and reducing the calculation accuracy of the chip. For example, with G_Scale equal to 1, suppose the multiply-add results of the representative samples and the low-order weights fall into three ranges, [-200, 0], [-100, 100] and [0, 200], and the original bias value is [0, 0]. If bias splitting is not considered, the three ranges after adding the bias and dividing by G_Scale remain [-200, 0], [-100, 100] and [0, 200]; truncation then occurs, and the final output ranges become [-128, 0], [-100, 100] and [0, 127], resulting in low operation precision of the chip.
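The example above can be reproduced as a minimal numerical sketch (assuming signed 8-bit output, so the saturation range is [-128, 127]; saturate_int8 is an illustrative helper, not part of the chip's actual interface):

```python
import numpy as np

def saturate_int8(x):
    """Clip values to the signed 8-bit output range [-128, 127]."""
    return np.clip(x, -128, 127)

# Endpoints of the three example ranges, with G_Scale = 1 and a bias of 0:
# the division leaves the sums unchanged, but the endpoints -200 and 200
# overflow the 8-bit range and are truncated.
g_scale = 1
sums = np.array([-200, -100, 0, 100, 200]) + 0  # multiply-add result + bias
out = saturate_int8(sums // g_scale)
print(out.tolist())  # [-128, -100, 0, 100, 127]
```

The truncated endpoints are exactly the precision loss the bias split is designed to avoid.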
FIG. 1 is a flow chart illustrating a linear programming based neural network bias processing method for a storage and computation integrated chip in an embodiment of the present invention; as shown in FIG. 1, the linear programming based neural network bias processing method for a storage and computation integrated chip may include the following steps:
step S100: acquiring input sample data, weight data, bias data and hardware parameters of the target storage and computation integrated chip of the target neural network layer.
It is worth noting that the target storage and computation integrated chip has been provided, at the hardware design stage, with a flash memory cell array for writing the weight array and a flash memory cell array for writing the bias, and the hardware parameters of the target storage and computation integrated chip are known; for a specific trained neural network, the weight data and the bias data of each layer are known.
In addition, the input sample data may be a plurality of samples, which are typical samples corresponding to the target neural network application scenario.
Step S200: and inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data.
It is worth noting that the bias low-bit data is used for being mapped to the flash memory unit array of the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into the digital domain of the target storage and computation integrated chip and is combined with the analog operation result output by the analog domain.
The embodiment of the invention provides a strict mathematical basis for bias splitting so as to find an optimal splitting result: the splitting of the bias data, performed to reduce the truncation error caused by saturation, is converted into the solving of a linear programming mathematical model. The solved bias low-bit data is mapped to the flash memory unit array of the analog domain of the target storage and computation integrated chip to participate in analog operation, and the bias high-bit data is stored into the digital domain of the target storage and computation integrated chip and combined with the analog operation result output by the analog domain, thereby reducing the truncation error caused by saturation and improving the operation precision of the chip.
In an alternative embodiment, the weight data includes: the weight array high-order data, the weight array low-order data and the scaling coefficient; the hardware parameters of the target storage and calculation integrated chip comprise: the input and output bit width of the flash memory unit array, the bit width of the digital domain and the maximum row number of the offset array.
It should be noted that the weight array high-order data and the weight array low-order data are obtained by splitting the weight array according to a preset method. Specifically, the weight array splitting technique may truncate the neural network weight array, taking the array formed by the low-order bits as the weight array low-order data and the array formed by the high-order bits as the weight array high-order data; alternatively, after the weight array is scaled up or down as a whole, the overflow bits may be truncated into the weight array high-order data, and the remaining data after truncation taken as the weight array low-order data.
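One possible instance of such a truncation-style split can be sketched as follows (an illustration only, not the patented procedure itself: here the high part is an arithmetic right shift by x bits and the low part is the unsigned remainder, so that WeightL + WeightH × 2^x = Weight holds exactly):

```python
import numpy as np

def split_weight(weight, x=8):
    """Split an integer weight array into high and low parts such that
    low + high * 2**x reconstructs the original array exactly."""
    high = weight >> x           # arithmetic shift: the sign lives in the high part
    low = weight - (high << x)   # unsigned remainder in [0, 2**x - 1]
    return high, low

w = np.array([-1000, -1, 0, 1, 1000], dtype=np.int32)
hi, lo = split_weight(w)
print(hi.tolist(), lo.tolist())  # [-4, -1, 0, 0, 3] [24, 255, 0, 1, 232]
assert np.array_equal(lo + (hi << 8), w)  # exact reconstruction
```

Other conventions (e.g. a signed low part) are equally possible; what matters for the method is only that the reconstruction identity holds.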
The principles of embodiments of the present invention are described in the following formula:
wherein Input represents the input samples, each sample being a one-dimensional vector; WeightL represents the low x-bit part of the weight array and is a two-dimensional matrix; WeightH represents the high y-bit part of the weight array and is a two-dimensional matrix. For example, x may be equal to 8 or 16, and y may likewise be 8 or 16; if the hardware supports 8 bits, x and y may be designed to be equal, both being 8.
BiasL represents the low-order bits of Bias after splitting and is a one-dimensional vector; its data type may be a 32-bit integer, assuming x is 8. BiasH represents the high-order bits of Bias after splitting and is a one-dimensional vector; its data type may be INT8, with a value range of -128 to 127 (the output data type also supports INT16, with a corresponding value range of -32768 to 32767). Wherein:
WeightL + WeightH × 2^x = Weight, where Weight is the weight array and x is the bit width of the low part of the weight split;
BiasL + BiasH × G_Scale = Bias, where G_Scale is the scaling factor;
"+" is the matrix/vector addition.
(WeightL × Input + BiasL) / G_Scale is computed in the analog part of the chip; the output data type may be INT8, data beyond the range representable by INT8 being truncated to -128 or 127 (it may also be INT16 and the like, which is not limited in the embodiment of the present invention). (WeightH × Input × 2^x) / G_Scale, and its sum with BiasH, are computed in the digital part of the chip.
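This placement can be sketched numerically as follows (integer arithmetic with floor division and int8 clipping stands in for the analog behaviour; all concrete values, and the helper names analog_part and digital_part, are made up for illustration):

```python
import numpy as np

def analog_part(inp, weight_l, bias_l, g_scale):
    """Analog-domain sketch: low-bit multiply-add plus BiasL, divided by
    G_Scale, saturated to the int8 output range."""
    acc = inp @ weight_l + bias_l
    return np.clip(acc // g_scale, -128, 127)

def digital_part(inp, weight_h, bias_h, g_scale, x=8):
    """Digital-domain sketch: high-bit multiply-add scaled by 2**x,
    divided by G_Scale, then summed with BiasH."""
    return (inp @ weight_h) * 2**x // g_scale + bias_h

# Bias = 10000 split against G_Scale = 128 as BiasL = 16, BiasH = 78
# (16 + 78 * 128 == 10000), so the large bias never enters the analog sum.
inp = np.array([2, 3])
weight_l = np.array([[100], [50]])
weight_h = np.array([[1], [0]])
out = analog_part(inp, weight_l, np.array([16]), 128) \
    + digital_part(inp, weight_h, np.array([78]), 128)
print(out.tolist())  # [84], close to (862 + 10000) / 128 = 84.86...
```

Keeping only the small BiasL in the analog sum is what keeps the pre-truncation quotient inside the int8 range.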
In an alternative embodiment, the objective function of the linear programming solution model includes three objectives, specifically:
(1) summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, such that the total number of saturation truncations of the sum divided by the scaling coefficient G_Scale is minimized.
Wherein truncation is performed if the quotient is lower than a saturation truncation lower limit or higher than a saturation truncation upper limit, the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width; saturation truncation may also be understood as data overflow.
See in particular the following formula:
wherein || represents logical OR, and InputNum represents the number of typical samples; WeightColumn represents the number of columns of the weight array. The formula takes hardware supporting 8-bit output as an example, -128 and 127 corresponding to the lower and upper limits of INT8; if the number of bits supported by the hardware changes, the corresponding limit values in the above formula change accordingly.
(2) summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, and, where the quotient of the sum divided by the scaling coefficient is not saturation-truncated, making the maximum absolute value of the quotient as small as possible;
see in particular the following formula:
by adopting the technical scheme, the robustness of the model can be improved. The target may be understood as a second solution target that is a distance between the quotient of the sum of the weighted low-order data and the representative input sample and the sum of the biased low-order bits divided by the scaling factor and the overflow upper and lower bounds.
(3) making the maximum absolute value of the bias low-order data as small as possible;
see in particular the following formula:
min(max(abs(BiasL_i)))
wherein the bias upper limit is m × (2^(n-1) - 1) × K, the bias lower limit is m × (-2^(n-1)) × K, m is the maximum number of rows of the bias array, and K is the amplification factor of the bias data, typically 128, which comes with the trained model.
In an alternative embodiment, the constraints of the linear programming solution model include:
(1) the bias low-order data is located between the bias lower limit and the bias upper limit;
(2) the bias high-order data is located between the bias high-order lower limit and the bias high-order upper limit, wherein the bias high-order upper limit is 2^(w-1) - 1, the bias high-order lower limit is -2^(w-1), and w is the bit width of the digital domain.
For example, if the hardware supports 8 bits, the value of each element of the bias low-order data cannot exceed the representation range of the bias array: -128 × (number of bias array rows) × 128 <= BiasL <= 127 × (number of bias array rows) × 128.
(3) the sum of the bias high-order data multiplied by the scaling coefficient and the bias low-order data is equal to the bias data;
By adopting the above technical scheme, the bias is split according to the multiply-add results of the low-order weights and typical inputs so as to reduce the truncation error caused by saturation, and the splitting is converted into the solving of a linear programming mathematical model: the low-order weight values of each layer of the neural network, the multiply-add results of typical inputs, the maximum number of rows of the bias array, the bias and the G_Scale value are converted into the constraints and objectives of the linear programming for solving. The sum of the low-order weight multiply-add results for the samples and the bias low-order bits, divided by the scaling factor G_Scale, is required to saturate a minimum number of times; the maximum absolute value of the split bias is required to be as small as possible (reducing the space occupied when the bias is placed in the bias array); and the maximum absolute value when not saturated is required to be as small as possible (reducing the possibility of saturation truncation for atypical inputs).
The scheme provided by the embodiment of the invention simultaneously considers several conditions, such as the minimum number of saturations, the minimum maximum absolute value of the bias, and the minimum maximum absolute value when unsaturated, and can obtain a theoretically optimal solution; the scheme is also easy to extend, allowing restrictions to be added or removed in the mapping.
It should be noted that the bias upper limit and the bias lower limit refer to the maximum representation range of the hardware bias array. For example, for certain hardware the bias array has 16 rows and a single row represents -128 to 127, so the total representation range is 16 × (-128 to 127); after multiplying by an amplification factor of 128, the final representation range is 16 × (-128) × 128 to 16 × 127 × 128.
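This representation range can be checked with a small helper (bias_bounds is an illustrative name, not a hardware interface; the defaults assume signed 8-bit rows and an amplification factor of 128):

```python
def bias_bounds(m, n=8, K=128):
    """Representable range of a bias array with m rows of signed n-bit
    cells, each value scaled by the amplification factor K."""
    lower = m * (-2 ** (n - 1)) * K
    upper = m * (2 ** (n - 1) - 1) * K
    return lower, upper

print(bias_bounds(16))  # (-262144, 260096) for the 16-row example above
```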
It should be noted that, in practical applications, the linear programming solution model may directly call an open-source linear programming solver in Python and the like; for example, Google's open-source linear programming solver may be used. To enable those skilled in the art to better understand the present application, FIG. 2 illustrates a splitting example of a bias array.
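As a much-simplified, single-channel sketch of calling such an off-the-shelf solver, the following uses scipy.optimize.linprog (with its HiGHS backend) rather than the Google solver mentioned above, and models only the split constraint and the third objective, minimizing max|BiasL| via an auxiliary variable t; the representative-sample truncation objectives are omitted:

```python
from scipy.optimize import linprog

# Variables: x = [BiasL, BiasH, t], minimizing t with t >= |BiasL|.
bias, g_scale = 10000, 128
c = [0, 0, 1]
A_ub = [[1, 0, -1],   #  BiasL - t <= 0
        [-1, 0, -1]]  # -BiasL - t <= 0
b_ub = [0, 0]
A_eq = [[1, g_scale, 0]]  # BiasL + BiasH * G_Scale == Bias
b_eq = [bias]
bounds = [(-262144, 260096),  # bias array range (16 rows, K = 128)
          (-128, 127),        # digital-domain int8 range for BiasH
          (0, None)]          # auxiliary variable t
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, integrality=[1, 1, 1])
print(round(res.x[0]), round(res.x[1]))  # 16 78  (16 + 78 * 128 == 10000)
```

With Bias = 10000 and G_Scale = 128 this yields BiasL = 16 and BiasH = 78, the integer split with the smallest low-order magnitude.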
In an alternative embodiment, an embodiment of the present invention further provides a storage and computation integrated chip, comprising an analog domain and a digital domain; the analog domain is used for executing a matrix multiply-add operation of input data and the weight low-bit data, a summation of the matrix multiply-add result and the bias low-bit data, and a division of the summation result by a scaling parameter; the digital domain pre-stores the bias high-bit data and the scaling parameter, and sums the division result of the weight high-bit data by the scaling parameter with the division result output by the analog domain; the bias low-bit data and the bias high-bit data are generated according to the above linear programming based neural network bias processing method.
It should be noted that the storage and computation integrated chip provided in the embodiment of the present invention may be applied to various electronic devices, such as smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, personal digital assistants (PDAs), vehicle-mounted devices, smart wearable devices, toys, smart home control devices, pipeline device controllers and the like. The smart wearable devices may include smart glasses, smart watches, smart bracelets and the like.
Based on the same inventive concept, the embodiment of the present application further provides a linear programming based neural network bias processing apparatus for a storage and computation integrated chip, which can be used to implement the methods described in the above embodiments, as described in the following embodiments. Since the principle by which the apparatus solves the problems is similar to that of the method, its implementation may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a linear programming based neural network bias processing apparatus for a storage and computation integrated chip according to an embodiment of the present invention. The apparatus includes: a parameter acquisition module 10 and a linear solving module 20.
The parameter acquisition module 10 acquires input sample data, weight data, bias data and hardware parameters of the target storage and computation integrated chip of the target neural network layer;
thelinear solving module 20 is used for solving the input sample data, the weight data, the bias data and the hardware parameter input into a pre-established linear programming solving model to obtain bias high-order data and bias low-order data; wherein,
the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-order data is used for being stored into a digital domain of the target storage and calculation integrated chip and combined with an analog operation result output by an analog domain.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing, when executing the program, the steps of the linear programming based neural network bias processing method for a storage and computation integrated chip described above.
Referring now to FIG. 4, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 4, the electronic device 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, the processes described above with reference to the flowcharts may be implemented as a computer software program according to an embodiment of the present invention. For example, an embodiment of the present invention includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described linear programming-based neural network bias processing method for storing a monolithic chip.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.