Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a linear programming-based neural network bias processing method, device, equipment and storage medium for a storage and computation integrated chip, which can at least partially solve the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, a linear programming based neural network bias processing method for a storage and computation integrated chip is provided, including:
acquiring input sample data, weight data, bias data and hardware parameters of a target storage and calculation integrated chip of a target neural network layer;
inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving, so as to obtain bias high-bit data and bias low-bit data; wherein,
the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into a digital domain of the target storage and calculation integrated chip and combined with an analog operation result output by the analog domain.
Further, the weight data includes: the weight array high-order data, the weight array low-order data and the scaling coefficient; the hardware parameters of the target storage and calculation integrated chip comprise: the input and output bit width of the flash memory unit array, the bit width of the digital domain and the maximum row number of the offset array.
Further, the linear programming solution model objective function includes:
summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, and dividing the sum by the scaling factor G_Scale, such that the total number of saturation truncations of the quotient is minimized; wherein truncation is performed if the quotient is lower than a saturation truncation lower limit or higher than a saturation truncation upper limit, the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width;
summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, and, where the quotient of the sum divided by the scaling coefficient is not saturation-truncated, making the maximum absolute value of the quotient as small as possible;
making the maximum absolute value of the bias low-order data as small as possible;
wherein the bias upper limit is m × (2^(n-1) - 1) × K, the bias lower limit is m × (-2^(n-1)) × K, m is the maximum number of rows of the bias array, and K is the amplification factor of the bias data.
Further, the constraint conditions of the linear programming solution model include:
the bias low-order data is located between the bias lower limit and the bias upper limit;
the bias high-order data is located between the bias high-order lower limit and the bias high-order upper limit;
the sum of the bias high-order data multiplied by the scaling factor and the bias low-order data is equal to the bias data;
wherein the bias high-order upper limit is 2^(w-1) - 1, the bias high-order lower limit is -2^(w-1), and w is the bit width of the digital domain.
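Collecting the objectives and constraints above, the solving model can be summarized in the notation of this disclosure, with q_j denoting the quotient for the j-th representative sample and the three objectives taken in order of priority (this is an illustrative summary, not the exact formulation of the embodiment):

```latex
\begin{aligned}
\min\quad & \#\{\, j : q_j < -2^{\,n-1} \ \text{or}\ q_j > 2^{\,n-1}-1 \,\},\quad
            \max_j |q_j|,\quad \max_i |\mathrm{BiasL}_i| \\
\text{s.t.}\quad
 & q_j = \bigl(\mathrm{Input}_j \cdot \mathrm{WeightL} + \mathrm{BiasL}\bigr) / G\_\mathrm{Scale} \\
 & m\,(-2^{\,n-1})\,K \;\le\; \mathrm{BiasL}_i \;\le\; m\,(2^{\,n-1}-1)\,K \\
 & -2^{\,w-1} \;\le\; \mathrm{BiasH}_i \;\le\; 2^{\,w-1}-1 \\
 & \mathrm{BiasL}_i + \mathrm{BiasH}_i \cdot G\_\mathrm{Scale} = \mathrm{Bias}_i
\end{aligned}
```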
In a second aspect, a linear programming based neural network bias processing apparatus for a storage and computation integrated chip is provided, including:
the parameter acquisition module is used for acquiring input sample data, weight data, bias data and hardware parameters of the target storage and calculation integrated chip of the target neural network layer;
the linear solving module is used for inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model to obtain bias high-bit data and bias low-bit data; wherein,
the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-order data is used for being stored into a digital domain of the target storage and calculation integrated chip and combined with an analog operation result output by an analog domain.
In a third aspect, a storage and computation integrated chip is provided, comprising an analog domain and a digital domain; the analog domain is used for executing a matrix multiply-add operation of input data and weight low-bit data, a summation of the matrix multiply-add result and the bias low-bit data, and a division of the summation result by a scaling parameter; the digital domain pre-stores the bias high-bit data and the scaling parameter, and sums the division result of the weight high-bit data by the scaling parameter with the division result output by the analog domain; the bias low-bit data and the bias high-bit data are generated according to the above linear programming based neural network bias processing method.
In a fourth aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the linear programming based neural network bias processing method when executing the program.
In a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the above-mentioned linear programming-based neural network bias processing method.
The embodiment of the invention provides a linear programming-based neural network bias processing method for a memory-computation integrated chip, which comprises the following steps: acquiring input sample data, weight data, bias data and hardware parameters of a target storage and calculation integrated chip of a target neural network layer; inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model to solve to obtain bias high-bit data and bias low-bit data; the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-order data is used for being stored in a digital domain of the target storage and calculation integrated chip to participate in digital domain calculation and then combined with an analog calculation result output by an analog domain. By adopting the technical scheme, the truncation error problem caused by saturation is reduced, and the operation precision of the chip is improved.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In the prior art, the summation of the whole bias with the matrix multiply-add result is performed directly in the analog domain; after the sum of the matrix multiply-add result and the bias is divided by the scaling coefficient, the result exceeds the bit width range preset by the storage and computation integrated chip, causing truncation errors and reducing the calculation accuracy of the chip. For example, with G_Scale equal to 1, suppose the multiply-add results of the representative samples and the low-order weights fall into three ranges, [-200, 0], [-100, 100] and [0, 200], and the original bias value is [0, 0]. If bias splitting is not considered, the three ranges after adding the bias and dividing by G_Scale remain [-200, 0], [-100, 100] and [0, 200]; truncation then occurs, and the final output ranges become [-128, 0], [-100, 100] and [0, 127], resulting in low operation precision of the chip.
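The example above can be reproduced as a minimal numerical sketch (assuming signed 8-bit output, so the saturation range is [-128, 127]; saturate_int8 is an illustrative helper, not part of the chip's actual interface):

```python
import numpy as np

def saturate_int8(x):
    """Clip values to the signed 8-bit output range [-128, 127]."""
    return np.clip(x, -128, 127)

# Endpoints of the three example ranges, with G_Scale = 1 and a bias of 0:
# the division leaves the sums unchanged, but the endpoints -200 and 200
# overflow the 8-bit range and are truncated.
g_scale = 1
sums = np.array([-200, -100, 0, 100, 200]) + 0  # multiply-add result + bias
out = saturate_int8(sums // g_scale)
print(out.tolist())  # [-128, -100, 0, 100, 127]
```

The truncated endpoints are exactly the precision loss the bias split is designed to avoid.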
FIG. 1 is a flow chart illustrating a linear programming based neural network bias processing method for a storage and computation integrated chip in an embodiment of the present invention; as shown in FIG. 1, the linear programming based neural network bias processing method for a storage and computation integrated chip may include the following steps:
step S100: acquiring input sample data, weight data, bias data and hardware parameters of the target storage and computation integrated chip of the target neural network layer.
It is worth noting that the target storage and computation integrated chip has been provided, at the hardware design stage, with a flash memory cell array for writing the weight array and a flash memory cell array for writing the bias, and the hardware parameters of the target storage and computation integrated chip are known; for a specific trained neural network, the weight data and the bias data of each layer are known.
In addition, the input sample data may be a plurality of samples, which are typical samples corresponding to the target neural network application scenario.
Step S200: and inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data.
It is worth noting that the bias low-bit data is used for being mapped to the flash memory unit array of the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into the digital domain of the target storage and computation integrated chip and is combined with the analog operation result output by the analog domain.
The embodiment of the invention provides a strict mathematical basis for bias splitting so as to find an optimal splitting result: the splitting of the bias data, performed to reduce the truncation error caused by saturation, is converted into the solving of a linear programming mathematical model. The solved bias low-bit data is mapped to the flash memory unit array of the analog domain of the target storage and computation integrated chip to participate in analog operation, and the bias high-bit data is stored into the digital domain of the target storage and computation integrated chip and combined with the analog operation result output by the analog domain, thereby reducing the truncation error caused by saturation and improving the operation precision of the chip.
In an alternative embodiment, the weight data includes: the weight array high-order data, the weight array low-order data and the scaling coefficient; the hardware parameters of the target storage and calculation integrated chip comprise: the input and output bit width of the flash memory unit array, the bit width of the digital domain and the maximum row number of the offset array.
It should be noted that the weight array high-order data and the weight array low-order data are obtained by splitting the weight array according to a preset method. Specifically, the weight array splitting technique may truncate the neural network weight array, taking the array formed by the low-order bits as the weight array low-order data and the array formed by the high-order bits as the weight array high-order data; alternatively, after the weight array is scaled up or down as a whole, the overflow bits may be truncated into the weight array high-order data, and the remaining data after truncation taken as the weight array low-order data.
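One possible instance of such a truncation-style split can be sketched as follows (an illustration only, not the patented procedure itself: here the high part is an arithmetic right shift by x bits and the low part is the unsigned remainder, so that WeightL + WeightH × 2^x = Weight holds exactly):

```python
import numpy as np

def split_weight(weight, x=8):
    """Split an integer weight array into high and low parts such that
    low + high * 2**x reconstructs the original array exactly."""
    high = weight >> x           # arithmetic shift: the sign lives in the high part
    low = weight - (high << x)   # unsigned remainder in [0, 2**x - 1]
    return high, low

w = np.array([-1000, -1, 0, 1, 1000], dtype=np.int32)
hi, lo = split_weight(w)
print(hi.tolist(), lo.tolist())  # [-4, -1, 0, 0, 3] [24, 255, 0, 1, 232]
assert np.array_equal(lo + (hi << 8), w)  # exact reconstruction
```

Other conventions (e.g. a signed low part) are equally possible; what matters for the method is only that the reconstruction identity holds.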
The principles of embodiments of the present invention are described in the following formula:
wherein Input represents the input samples, each sample being a one-dimensional vector; WeightL represents the low x-bit part of the weight array and is a two-dimensional matrix; WeightH represents the high y-bit part of the weight array and is a two-dimensional matrix. For example, x may be equal to 8 or 16, and y may likewise be 8 or 16; if the hardware supports 8 bits, x and y may be designed to be equal, both being 8.
BiasL represents the low-order bits of Bias after splitting and is a one-dimensional vector; its data type may be a 32-bit integer, assuming x is 8. BiasH represents the high-order bits of Bias after splitting and is a one-dimensional vector; its data type may be INT8, with a value range of -128 to 127 (the output data type also supports INT16, with a corresponding value range of -32768 to 32767). Wherein:
WeightL + WeightH × 2^x = Weight, where Weight is the weight array and x is the bit width of the low part of the weight split;
BiasL + BiasH × G_Scale = Bias, where G_Scale is the scaling factor;
"+" is the matrix/vector addition.
(WeightL × Input + BiasL) / G_Scale is computed in the analog part of the chip; the output data type may be INT8, data beyond the range representable by INT8 being truncated to -128 or 127 (it may also be INT16 and the like, which is not limited in the embodiment of the present invention). (WeightH × Input × 2^x) / G_Scale, and its sum with BiasH, are computed in the digital part of the chip.
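This placement can be sketched numerically as follows (integer arithmetic with floor division and int8 clipping stands in for the analog behaviour; all concrete values, and the helper names analog_part and digital_part, are made up for illustration):

```python
import numpy as np

def analog_part(inp, weight_l, bias_l, g_scale):
    """Analog-domain sketch: low-bit multiply-add plus BiasL, divided by
    G_Scale, saturated to the int8 output range."""
    acc = inp @ weight_l + bias_l
    return np.clip(acc // g_scale, -128, 127)

def digital_part(inp, weight_h, bias_h, g_scale, x=8):
    """Digital-domain sketch: high-bit multiply-add scaled by 2**x,
    divided by G_Scale, then summed with BiasH."""
    return (inp @ weight_h) * 2**x // g_scale + bias_h

# Bias = 10000 split against G_Scale = 128 as BiasL = 16, BiasH = 78
# (16 + 78 * 128 == 10000), so the large bias never enters the analog sum.
inp = np.array([2, 3])
weight_l = np.array([[100], [50]])
weight_h = np.array([[1], [0]])
out = analog_part(inp, weight_l, np.array([16]), 128) \
    + digital_part(inp, weight_h, np.array([78]), 128)
print(out.tolist())  # [84], close to (862 + 10000) / 128 = 84.86...
```

Keeping only the small BiasL in the analog sum is what keeps the pre-truncation quotient inside the int8 range.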
In an alternative embodiment, the objective function of the linear programming solution model includes three objectives, specifically:
(1) summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, such that the total number of saturation truncations of the sum divided by the scaling coefficient G_Scale is minimized.
Wherein truncation is performed if the quotient is lower than a saturation truncation lower limit or higher than a saturation truncation upper limit, the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width; saturation truncation may also be understood as data overflow.
See in particular the following formula:
wherein || represents logical OR, and InputNum represents the number of typical samples; WeightColumn represents the number of columns of the weight array. The formula takes hardware supporting 8-bit output as an example, -128 and 127 corresponding to the lower and upper limits of INT8; if the number of bits supported by the hardware changes, the corresponding limit values in the above formula change accordingly.
(2) summing the multiply-add result of the input sample data and the weight array low-order data with the bias low-order data, and, where the quotient of the sum divided by the scaling coefficient is not saturation-truncated, making the maximum absolute value of the quotient as small as possible;
see in particular the following formula:
by adopting the technical scheme, the robustness of the model can be improved. The target may be understood as a second solution target that is a distance between the quotient of the sum of the weighted low-order data and the representative input sample and the sum of the biased low-order bits divided by the scaling factor and the overflow upper and lower bounds.
(3) making the maximum absolute value of the bias low-order data as small as possible;
see in particular the following formula:
min(max(abs(BiasL_i)))
wherein the bias upper limit is m × (2^(n-1) - 1) × K, the bias lower limit is m × (-2^(n-1)) × K, m is the maximum number of rows of the bias array, and K is the amplification factor of the bias data, typically 128, which comes with the trained model.
In an alternative embodiment, the constraints of the linear programming solution model include:
(1) the bias low-order data is located between the bias lower limit and the bias upper limit;
(2) the bias high-order data is located between the bias high-order lower limit and the bias high-order upper limit, wherein the bias high-order upper limit is 2^(w-1) - 1, the bias high-order lower limit is -2^(w-1), and w is the bit width of the digital domain.
For example, if the hardware supports 8 bits, the value of each element of the bias low-order data cannot exceed the representation range of the bias array: -128 × (number of bias array rows) × 128 <= BiasL <= 127 × (number of bias array rows) × 128.
(3) the sum of the bias high-order data multiplied by the scaling coefficient and the bias low-order data is equal to the bias data;
By adopting the above technical scheme, the bias is split according to the multiply-add results of the low-order weights and typical inputs so as to reduce the truncation error caused by saturation, and the splitting is converted into the solving of a linear programming mathematical model: the low-order weight values of each layer of the neural network, the multiply-add results of typical inputs, the maximum number of rows of the bias array, the bias and the G_Scale value are converted into the constraints and objectives of the linear programming for solving. The sum of the low-order weight multiply-add results for the samples and the bias low-order bits, divided by the scaling factor G_Scale, is required to saturate a minimum number of times; the maximum absolute value of the split bias is required to be as small as possible (reducing the space occupied when the bias is placed in the bias array); and the maximum absolute value when not saturated is required to be as small as possible (reducing the possibility of saturation truncation for atypical inputs).
The scheme provided by the embodiment of the invention simultaneously considers several conditions, such as the minimum number of saturations, the minimum maximum absolute value of the bias, and the minimum maximum absolute value when unsaturated, and can obtain a theoretically optimal solution; the scheme is also easy to extend, allowing restrictions to be added or removed in the mapping.
It should be noted that the bias upper limit and the bias lower limit refer to the maximum representation range of the hardware bias array. For example, for certain hardware the bias array has 16 rows and a single row represents -128 to 127, so the total representation range is 16 × (-128 to 127); after multiplying by an amplification factor of 128, the final representation range is 16 × (-128) × 128 to 16 × 127 × 128.
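This representation range can be checked with a small helper (bias_bounds is an illustrative name, not a hardware interface; the defaults assume signed 8-bit rows and an amplification factor of 128):

```python
def bias_bounds(m, n=8, K=128):
    """Representable range of a bias array with m rows of signed n-bit
    cells, each value scaled by the amplification factor K."""
    lower = m * (-2 ** (n - 1)) * K
    upper = m * (2 ** (n - 1) - 1) * K
    return lower, upper

print(bias_bounds(16))  # (-262144, 260096) for the 16-row example above
```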
It should be noted that, in practical applications, the linear programming solution model may directly call an open-source linear programming solver in Python and the like; for example, Google's open-source linear programming solver may be used. To enable those skilled in the art to better understand the present application, FIG. 2 illustrates a splitting example of a bias array.
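As a much-simplified, single-channel sketch of calling such an off-the-shelf solver, the following uses scipy.optimize.linprog (with its HiGHS backend) rather than the Google solver mentioned above, and models only the split constraint and the third objective, minimizing max|BiasL| via an auxiliary variable t; the representative-sample truncation objectives are omitted:

```python
from scipy.optimize import linprog

# Variables: x = [BiasL, BiasH, t], minimizing t with t >= |BiasL|.
bias, g_scale = 10000, 128
c = [0, 0, 1]
A_ub = [[1, 0, -1],   #  BiasL - t <= 0
        [-1, 0, -1]]  # -BiasL - t <= 0
b_ub = [0, 0]
A_eq = [[1, g_scale, 0]]  # BiasL + BiasH * G_Scale == Bias
b_eq = [bias]
bounds = [(-262144, 260096),  # bias array range (16 rows, K = 128)
          (-128, 127),        # digital-domain int8 range for BiasH
          (0, None)]          # auxiliary variable t
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, integrality=[1, 1, 1])
print(round(res.x[0]), round(res.x[1]))  # 16 78  (16 + 78 * 128 == 10000)
```

With Bias = 10000 and G_Scale = 128 this yields BiasL = 16 and BiasH = 78, the integer split with the smallest low-order magnitude.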
In an alternative embodiment, an embodiment of the present invention further provides a storage and computation integrated chip, comprising an analog domain and a digital domain; the analog domain is used for executing a matrix multiply-add operation of input data and the weight low-bit data, a summation of the matrix multiply-add result and the bias low-bit data, and a division of the summation result by a scaling parameter; the digital domain pre-stores the bias high-bit data and the scaling parameter, and sums the division result of the weight high-bit data by the scaling parameter with the division result output by the analog domain; the bias low-bit data and the bias high-bit data are generated according to the above linear programming based neural network bias processing method.
It should be noted that the storage and computation integrated chip provided in the embodiment of the present invention may be applied to various electronic devices, such as smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, personal digital assistants (PDAs), vehicle-mounted devices, smart wearable devices, toys, smart home control devices, pipeline device controllers and the like. The smart wearable devices may include smart glasses, smart watches, smart bracelets and the like.
Based on the same inventive concept, the embodiment of the present application further provides a linear programming based neural network bias processing apparatus for a storage and computation integrated chip, which can be used to implement the methods described in the above embodiments, as described in the following embodiments. Since the principle by which the apparatus solves the problems is similar to that of the method, its implementation may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a linear programming based neural network bias processing apparatus for a storage and computation integrated chip according to an embodiment of the present invention. The apparatus includes: a parameter acquisition module 10 and a linear solving module 20.
The parameter acquisition module 10 acquires input sample data, weight data, bias data and hardware parameters of the target storage and computation integrated chip of the target neural network layer;
thelinear solving module 20 is used for solving the input sample data, the weight data, the bias data and the hardware parameter input into a pre-established linear programming solving model to obtain bias high-order data and bias low-order data; wherein,
the bias low-bit data is used for mapping to a flash memory unit array of an analog domain of the target storage and calculation integrated chip to participate in analog operation; and the bias high-order data is used for being stored into a digital domain of the target storage and calculation integrated chip and combined with an analog operation result output by an analog domain.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing, when executing the program, the steps of the linear programming based neural network bias processing method for a storage and computation integrated chip described above.
Referring now to FIG. 4, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 4, the electronic device 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, the processes described above with reference to the flowcharts may be implemented as a computer software program according to an embodiment of the present invention. For example, an embodiment of the present invention includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described linear programming-based neural network bias processing method for storing a monolithic chip.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.