Disclosure of Invention
In view of the problems in the prior art, the present invention provides a neural network mapping method, apparatus and device for a storage-computation integrated chip, which can at least partially solve the problems in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, a neural network mapping method for a storage-computation integrated chip is provided, including:
mapping and ordering the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped;
arranging the weight matrices corresponding to the layers, in sequence according to the mapping-ordering result, into a main array of the storage-computation integrated chip, and arranging the corresponding Bias, according to the arrangement position of each weight matrix and the minimum number of Bias rows, in the Bias array of the storage-computation integrated chip at the columns corresponding to that arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
Further, the main array comprises a plurality of storage-computation integrated unit blocks distributed in an array, and the weight matrices corresponding to the layers are arranged on the storage-computation integrated unit blocks in the main array;
for the weight matrix corresponding to each layer of the neural network, the storage-computation integrated unit blocks in the main array are polled in turn in the serpentine order to find its arrangement position.
Further, the mapping and ordering of the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped includes:
mapping and ordering the layers according to the minimum number of Bias rows corresponding to each layer; and
mapping and ordering layers having the same minimum number of Bias rows according to the number of columns of their weight matrices.
Further, the neural network mapping method for the storage-computation integrated chip further comprises:
writing the weight matrix and the Bias of each layer of the neural network to be mapped into the storage-computation integrated chip according to the arrangement result.
Further, the polling of the storage-computation integrated unit blocks in the main array in the serpentine order to find the arrangement position includes:
polling, in the serpentine order, whether a storage-computation integrated unit block in the main array has a position satisfying the arrangement condition of the current layer;
if yes, arranging the weight matrix of the current layer at the position satisfying the arrangement condition of the current layer;
if not, continuing to poll the next storage-computation integrated unit block until a position satisfying the arrangement condition is found;
wherein the arrangement condition is that the position lies side by side with the weight matrices already arranged on the current storage-computation integrated unit block in the current arrangement period and can accommodate the weight matrix of the current layer.
Further, the weight matrix is arranged starting from the column immediately following the occupied (non-idle) columns.
Further, the polling of the storage-computation integrated unit blocks in the main array in the serpentine order to find the arrangement position further includes:
if no position satisfying the arrangement condition is found after all the storage-computation integrated unit blocks have been polled in the serpentine order, returning to the first storage-computation integrated unit block and entering the next arrangement period:
judging whether the idle position of the first storage-computation integrated unit block can accommodate the weight matrix of the current layer;
if yes, arranging the weight matrix of the current layer at the idle position of the first storage-computation integrated unit block;
if not, polling the storage-computation integrated unit blocks in turn in the serpentine order until a block that can accommodate the weight matrix is found, and arranging the weight matrix of the current layer at the idle position of that block;
wherein, when the weight matrix is arranged, it is arranged starting from the row immediately following the occupied (non-idle) rows.
Further, when the corresponding Bias is arranged in the Bias array of the storage-computation integrated chip at the columns corresponding to the arrangement position of the weight matrix, the Bias is arranged, within those columns, starting from the row immediately following the occupied (non-idle) rows.
Further, the neural network mapping method for the storage-computation integrated chip further comprises:
expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array, so as to obtain the final Bias arrangement result.
Further, the expanding of the Bias arrangement according to the Bias arrangement result and the idle condition of the Bias array includes:
judging, according to the number of idle rows of the Bias array, whether the numbers of rows occupied by all the Bias can be multiplied;
if yes, multiplying the numbers of rows occupied by all the Bias;
if not, selecting Bias for expansion according to a preset rule.
Further, the neural network mapping method for the storage-computation integrated chip further comprises:
dividing the storage-computation integrated unit array of the storage-computation integrated chip into a main array and a Bias array; and
dividing the main array into a plurality of storage-computation integrated unit blocks.
Further, the neural network mapping method for the storage-computation integrated chip further comprises:
acquiring parameters of the neural network to be mapped and parameters of the target storage-computation integrated chip, wherein the parameters of the neural network to be mapped comprise the weight matrix and the Bias corresponding to each layer; and
obtaining the minimum number of Bias rows corresponding to each layer according to the Bias of each layer and the parameters of the storage-computation integrated chip.
In a second aspect, a storage-computation integrated chip is provided, comprising a storage-computation integrated unit array for performing neural network operations, the storage-computation integrated unit array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array;
the weight matrices corresponding to the layers are ordered based on the minimum number of Bias rows and the number of columns of the weight matrices, are arranged on the main array in a serpentine manner according to the ordering result, and each Bias is arranged in the Bias array at the columns corresponding to the arrangement position of its weight matrix.
Further, the main array comprises a plurality of storage-computation integrated unit blocks distributed in an array, and the weight matrices corresponding to the layers are arranged on the storage-computation integrated unit blocks in the main array;
for the weight matrix corresponding to each layer of the neural network, the storage-computation integrated unit blocks in the main array are polled in turn in the serpentine order to find its arrangement position.
Further, the ordering principle is as follows:
the layers are mapped and ordered according to the minimum number of Bias rows corresponding to each layer; and
layers having the same minimum number of Bias rows are mapped and ordered according to the number of columns of their weight matrices.
Furthermore, in the Bias arrangement, the number of rows occupied by each Bias is expanded, based on the minimum number of Bias rows and the idle condition of the Bias array, starting from the columns of the Bias array corresponding to the arrangement position of its weight matrix.
In a third aspect, a storage-computation integrated chip is provided, comprising a storage-computation integrated unit array for performing neural network operations, the storage-computation integrated unit array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array;
the arrangement of the weight matrices and the corresponding Bias is generated according to the above neural network mapping method.
In a fourth aspect, a neural network mapping apparatus for a storage-computation integrated chip is provided, including:
an ordering module, configured to map and order the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped; and
an arrangement module, configured to arrange, in sequence according to the mapping-ordering result, the weight matrices corresponding to the layers into a main array of the storage-computation integrated chip, and to arrange the corresponding Bias, according to the arrangement position of each weight matrix and the minimum number of Bias rows, in the Bias array of the storage-computation integrated chip at the columns corresponding to that arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
In a fifth aspect, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the neural network mapping method described above when the program is executed.
In a sixth aspect, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the neural network mapping method described above.
In the neural network mapping method, apparatus and device for the storage-computation integrated chip, the layers are mapped and ordered according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped; the weight matrices corresponding to the layers are then arranged, in sequence according to the mapping-ordering result, into a main array of the storage-computation integrated chip, and the corresponding Bias is arranged, according to the arrangement position of each weight matrix and the minimum number of Bias rows, in the Bias array of the storage-computation integrated chip at the columns corresponding to that arrangement position, the weight matrices being arranged in a serpentine order. By ordering the layers and then arranging them in a serpentine manner, the storage-computation integrated units are used effectively, and the size of the storage-computation integrated unit array required for a given scale of operation can be reduced.
In addition, in the embodiments of the invention, the number of rows occupied by each Bias is expanded based on the minimum number of Bias rows and the idle condition of the Bias array, so that the Bias value stored on a single storage-computation integrated unit is reduced, current noise is reduced, and the operation precision is improved.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present application and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
FIG. 1 shows a flowchart of a neural network mapping method for a storage-computation integrated chip according to an embodiment of the present invention. As shown in FIG. 1, the method may include the following steps:
Step S100: mapping and ordering the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped.
For a given trained neural network, the weight matrix of each layer, its Bias parameters, the Bias size, and the minimum number of rows the Bias occupies on the chip have been obtained. The layers of the network are ordered according to the minimum number of Bias rows and the number of columns of their weight matrices, and this order is used as the mapping arrangement order.
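By way of illustration only, the per-layer parameters used by the ordering step can be collected in a simple structure such as the following Python sketch; the names LayerParams, weight, bias and min_bias_rows are illustrative assumptions and not part of the claimed method.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LayerParams:
    """Per-layer parameters gathered from a trained network (illustrative names only)."""
    index: int              # original layer index in the network
    weight: np.ndarray      # weight matrix, shape (rows, cols): rows are inputs, cols are outputs
    bias: np.ndarray        # Bias vector, one value per output column
    min_bias_rows: int      # minimum number of Bias rows this layer occupies on the chip

# Example: a layer with a 128-row by 1024-column weight matrix whose Bias needs at least 2 rows
layer0 = LayerParams(index=0,
                     weight=np.zeros((128, 1024)),
                     bias=np.zeros(1024),
                     min_bias_rows=2)
```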
Step S200: arranging, in sequence according to the mapping-ordering result, the weight matrices corresponding to the layers into the main array of the storage-computation integrated chip.
When the weight matrices corresponding to the layers are arranged in sequence in the order obtained in step S100, they are arranged in a serpentine order.
Specifically, consider the case in which the rows of the storage-computation integrated units serve as inputs and the columns serve as outputs (on this premise, those skilled in the art will understand that rows and columns are only relative concepts; in specific applications the columns may serve as inputs and the rows as outputs, the principle being the same, which is not described in detail here). When the weight matrices are arranged, for the blocks of the main array (the main array is divided into a plurality of storage-computation integrated unit blocks distributed in an array), the first row of blocks (rows of blocks, not rows of storage-computation integrated units) is traversed along the +X direction, the second row of blocks along the -X direction, the third row of blocks along the +X direction again, and so on, so that the blocks are traversed in a serpentine order; after the last block is reached, the traversal returns to the first block and the next round begins.
The storage-computation integrated unit may be a flash memory unit.
Step S300: arranging the corresponding Bias, according to the arrangement position of each weight matrix and the minimum number of Bias rows, in the Bias array of the storage-computation integrated chip at the columns corresponding to that arrangement position.
Specifically, owing to the way computation is performed on the storage-computation integrated chip, a weight matrix and its corresponding Bias must be aligned by column; the Bias is therefore placed in the same columns as its weight matrix, and in the row direction it is arranged according to the minimum number of Bias rows.
An arrangement scheme is generated through steps S100 to S300, and the parameters of each layer of the neural network are written into the storage-computation integrated unit array of the chip by a compilation tool according to this scheme. In the inference stage, when the operation of a given layer is executed, the rows and columns holding that layer's weight matrix and Bias are gated by the row and column decoders according to the arrangement scheme and the control requirements; the input signals of the layer are applied to the rows corresponding to the weight matrix, a matrix multiply-accumulate operation with the weight matrix is performed, and the result is superposed with the corresponding Bias, so that the computation result of the layer is obtained on the corresponding columns.
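The computation performed for one layer can be summarized by the following non-limiting Python sketch, which assumes the row-input/column-output convention described above and models the 1/m scaling of a Bias spread over m physical rows; the function name and shapes are illustrative only.

```python
import numpy as np

def layer_forward(x, W, bias, m_rows=1):
    """Illustrative model of one layer's operation on the chip.

    x:      input vector applied to the rows holding the weight matrix, shape (rows,)
    W:      weight matrix, shape (rows, cols)
    bias:   logical Bias vector, shape (cols,)
    m_rows: number of physical Bias rows carrying this Bias; each row stores bias / m_rows
    """
    mac = x @ W                            # matrix multiply-accumulate on the main array
    per_row_bias = bias / m_rows           # each physical Bias row carries 1/m of the logical value
    return mac + per_row_bias * m_rows     # the m rows are summed, restoring the logical Bias

# Example: 128 inputs, 1024 outputs, Bias spread over 2 rows
y = layer_forward(np.ones(128), np.zeros((128, 1024)), np.ones(1024), m_rows=2)
```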
By adopting the above technical solution, the parameters of the layers are first ordered and then arranged in a serpentine manner so that they complement one another effectively. For example, if the layers are ordered by occupied row count 9, 8, 7, 6, 5, 4, 3, 2, 1, then after the serpentine arrangement the layers with fewer rows end up opposite the layers with more rows, which reduces the number of rows occupied in the Bias array and leaves more rows available for expansion. The utilization efficiency of the storage-computation integrated unit array is thereby increased, and for the same scale of operation the required storage-computation integrated unit array can be greatly reduced, meeting the demand for chip miniaturization. In an alternative embodiment, the neural network mapping method for the storage-computation integrated chip further comprises dividing the storage-computation integrated unit array of the chip into a main array and a Bias array, and dividing the main array into a plurality of storage-computation integrated unit blocks. The division may be made with reference to the usage scenario and the scale of the corresponding neural network, so that performance is guaranteed while resource utilization is effectively improved.
Specifically, as shown in (a) of FIG. 2, the actual physical architecture of the chip consists of a main array and a Bias array. The applicant found in practice that, because an excessively large current in analog computation significantly affects the computation result, the array may be logically partitioned in the embodiments of the present invention. As shown in (b) of FIG. 2, the main array may be divided into 2x4 blocks (by way of example only; in practice the Bias array may or may not be partitioned as well). For ease of description, each block of the main array is labeled with two-dimensional coordinates starting from (0, 0). If the chip is small, the main array may be left unpartitioned, but since networks are generally large, partitioning is necessary in most scenarios. It should be added that the partitioning is performed according to the actual performance of the chip; the blocks may all be of the same size or of different sizes, which is not limited in the embodiments of the present invention. The sizes of the layers of the neural network are, however, taken into account before partitioning, so that the layer occupying the most space can be mapped within a single block.
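By way of example only, the logical partitioning described above can be sketched as follows, assuming a 2x4 grid of equally sized blocks as in (b) of FIG. 2; the array dimensions and the fit check against the largest layer are illustrative assumptions.

```python
def partition_main_array(array_rows, array_cols, block_grid=(2, 4)):
    """Split the main array into a grid of equally sized blocks labeled (i, j) from (0, 0)."""
    n_block_rows, n_block_cols = block_grid
    block_h = array_rows // n_block_rows
    block_w = array_cols // n_block_cols
    return {(i, j): (block_h, block_w)
            for i in range(n_block_rows) for j in range(n_block_cols)}

blocks = partition_main_array(array_rows=1024, array_cols=4096, block_grid=(2, 4))

# The layer occupying the most space must fit within one block (checked before partitioning).
largest_layer_shape = (128, 1024)                      # (rows, cols), illustrative
block_h, block_w = blocks[(0, 0)]
assert largest_layer_shape[0] <= block_h and largest_layer_shape[1] <= block_w
```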
In an alternative embodiment, the neural network mapping method may further include writing the weight matrix and Bias of each layer of the neural network to be mapped into the storage-computation integrated chip according to the arrangement result.
Specifically, the mapping method is executed in a tool chain, which can be understood as a program running on a terminal device, a server, or chip-programming equipment. The arrangement scheme is generated by the mapping method, and the weight matrices and Bias are then written into the storage-computation integrated chip according to that scheme. The chip can then be installed on the circuit board of a corresponding device for inference, so that neural network operations can be performed. For example, the chip may be installed in a toy for speech recognition, in which case the neural network parameters written into the chip are those of a speech-recognition network; it may equally be installed in face-recognition equipment, in which case the parameters are those of an image-recognition network. These application scenarios are merely examples and do not limit the application of the chip, which may be used in any device or scenario requiring neural network operations.
In an alternative embodiment, referring to fig. 3, the step S100 may include the following:
Step S110: mapping and ordering the layers according to the minimum number of Bias rows corresponding to each layer; and
Step S120: mapping and ordering layers having the same minimum number of Bias rows according to the number of columns of their weight matrices.
Specifically, (a) in FIG. 4 shows the parameters of a neural network: the upper row of 5 matrices are the weight matrices of 5 network layers, and the lower row of 5 matrices are the corresponding Bias. Here 1024x128 denotes the scale of a weight matrix, namely 1024 columns and 128 rows, and the corresponding Bias of 1024x2 denotes 1024 columns with a minimum of 2 occupied rows. Owing to the operational characteristics of the storage-computation integrated chip, the number of columns of a layer's weight matrix and the number of columns of its Bias are the same.
When the layers of the neural network are ordered, they may be ordered according to the minimum number of Bias rows corresponding to each layer. Referring to (b) in FIG. 4, the layers are ordered by minimum number of Bias rows from largest to smallest; in practice they may equally be ordered from smallest to largest, the principle being the same.
In addition, for the two layers whose Bias sizes are 1024x2 and 768x2, the layers with the same minimum number of Bias rows are further ordered by the number of columns of their weight matrices, from largest to smallest; the final ordering result is shown in (b) of FIG. 4.
It should be noted that, in the embodiments of the present invention, for a given trained neural network, the Bias size of each layer and the minimum number of rows it occupies on the chip are obtained. The layers are first ordered from largest to smallest by the minimum number of rows occupied by their Bias, and layers with the same number of occupied Bias rows are then sub-ordered by the number of columns of their weight matrices. The first purpose is to give priority, during network mapping, to reducing the Bias carried by each physical row, because this improves the accuracy of the chip's computation: the input received by the embodiments of the invention contains the minimum number of rows occupied by each layer's Bias, and after mapping according to this number of rows the Bias rows on the array generally still have unmapped space, so if m rows are used to carry a Bias value originally designed to be carried by 1 row, the Bias value on each row becomes 1/m of the original. Secondly, in applications, each layer of the network typically has few input channels and many output channels; as shown in FIG. 4, a layer with m input rows and n output columns usually has m smaller than n. To better ensure that the whole network fits in a limited area, the layers are therefore ordered from largest to smallest by their size in the X direction, with the Bias size as the first priority and the X-direction size as the second priority.
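A minimal sketch of this two-level ordering, using the illustrative LayerParams structure introduced earlier, is given below; it simply sorts by the minimum number of Bias rows and, within equal row counts, by the number of weight-matrix columns, both in descending order.

```python
def order_layers(layers):
    """Sort layers by (minimum Bias rows descending, weight-matrix columns descending)."""
    return sorted(layers,
                  key=lambda lp: (lp.min_bias_rows, lp.weight.shape[1]),
                  reverse=True)
```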
In an alternative embodiment, the main array includes a plurality of storage-computation integrated unit blocks distributed in an array (see FIG. 2). When the weight matrices corresponding to the layers are arranged on these blocks, they are arranged in sequence in a serpentine order; specifically, for the weight matrix of each layer of the neural network, the storage-computation integrated unit blocks in the main array are polled in turn in the serpentine order to find an arrangement position.
The blocks are polled in the serpentine order to determine whether any of them has a position satisfying the arrangement condition of the current layer, the arrangement condition being that the position lies side by side with the weight matrices already arranged on the current block in the current arrangement period and can accommodate the weight matrix of the current layer.
If yes, the weight matrix of the current layer is arranged at the position satisfying the arrangement condition of the current layer;
if not, the next storage-computation integrated unit block is polled, and so on, until a position satisfying the arrangement condition is found.
It should be noted that, in order to minimize idle resources, when a position satisfying the condition is found, the weight matrix is arranged starting from the column immediately following the occupied (non-idle) columns; in special applications, or to reduce interference, a certain number of columns may of course be left as a gap, chosen according to the actual application requirements.
If no position satisfying the arrangement condition is found after all the storage-computation integrated unit blocks have been polled in the serpentine order, the procedure returns to the first storage-computation integrated unit block and enters the next arrangement period.
In the new arrangement period, it is first judged whether the idle position of the first storage-computation integrated unit block can accommodate the weight matrix of the current layer. If so, the weight matrix of the current layer is arranged at the idle position of the first block; otherwise, the blocks are polled in turn in the serpentine order until a block that can accommodate the weight matrix is found, and the weight matrix of the current layer is arranged at the idle position of that block.
It should likewise be noted that, in order to minimize idle resources, when a position satisfying the condition is found, the weight matrix is arranged starting from the row immediately following the occupied (non-idle) rows; in special applications, or to reduce interference, a certain number of rows may be left as a gap, chosen according to the actual application requirements.
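The serpentine polling and period handling described above can be summarized by the following non-limiting Python sketch; it assumes equally sized blocks, that every layer fits within a single block, that no guard rows or columns are inserted, and it uses the illustrative LayerParams structure from the earlier sketch.

```python
def serpentine_block_order(n_rows, n_cols):
    """Yield block coordinates in serpentine order: row 0 left-to-right, row 1 right-to-left, ..."""
    for i in range(n_rows):
        cols = range(n_cols) if i % 2 == 0 else reversed(range(n_cols))
        for j in cols:
            yield (i, j)

def place_layers(ordered_layers, block_shape, block_grid):
    """Place already-ordered weight matrices onto the blocks of the main array.

    Returns {layer index: (block, row_offset, col_offset)}. Sketch only: every layer is
    assumed to fit in one block, and no guard rows or columns are left between layers.
    """
    block_h, block_w = block_shape
    order = list(serpentine_block_order(*block_grid))
    used_rows = {b: 0 for b in order}     # rows consumed by completed arrangement periods
    cur_rows = {b: 0 for b in order}      # tallest matrix placed on the block in the current period
    cur_cols = {b: 0 for b in order}      # next free column on the block in the current period
    placement = {}

    for lp in ordered_layers:
        rows, cols = lp.weight.shape
        placed = False
        # Poll blocks in serpentine order for a side-by-side position in the current period.
        for b in order:
            if cur_cols[b] + cols <= block_w and used_rows[b] + rows <= block_h:
                placement[lp.index] = (b, used_rows[b], cur_cols[b])
                cur_cols[b] += cols
                cur_rows[b] = max(cur_rows[b], rows)
                placed = True
                break
        if not placed:
            # No block can take the layer side by side: close the period and start a new one.
            for b in order:
                used_rows[b] += cur_rows[b]
                cur_rows[b] = 0
                cur_cols[b] = 0
            # Retry in the fresh period, starting again from the first block.
            for b in order:
                if cur_cols[b] + cols <= block_w and used_rows[b] + rows <= block_h:
                    placement[lp.index] = (b, used_rows[b], cur_cols[b])
                    cur_cols[b] += cols
                    cur_rows[b] = max(cur_rows[b], rows)
                    placed = True
                    break
            if not placed:
                raise ValueError(f"layer {lp.index} does not fit on the array")
    return placement
```

In this sketch an arrangement period closes only when no block can accept the current layer side by side, which matches the polling rule described above.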
By adopting the above technical solution, after the ordered weight matrices have been arranged on the main array, and following the principle that a weight matrix and its Bias share the same columns, the corresponding Bias is arranged, within the columns of the Bias array corresponding to the arrangement position of the weight matrix, starting from the row immediately following the occupied (non-idle) rows.
In order for those skilled in the art to better understand the embodiments of the present invention, the mapping arrangement process will be described in detail with reference to fig. 5.
First, taking a 20-layer neural network as an example, each layer of the network is mapped onto the storage-computation integrated unit array of the chip. After ordering is completed, the layers are numbered 0-19 in order, the weight matrices are numbered D0-D19, and the Bias are numbered B0-B19. The weight matrices D0-D19 are arranged first; then, in this example, the Bias B0-B19 are arranged according to the arrangement result of the weight matrices. During arrangement, the first row of storage-computation integrated unit blocks in the main array is traversed from left to right, the second row from right to left, the third row from left to right, and so on, after which the traversal returns to the first row of blocks, so that polling follows this serpentine cycle.
For example, for D0, block (0, 0) is polled first; it has a position that can hold the weight matrix of the current layer, so D0 is arranged at (0, 0). For D1, block (0, 0) is polled first; there is a position satisfying the condition, so D1 is arranged at (0, 0) side by side with D0. For D2, block (0, 0) is polled first; since there is no position at (0, 0) side by side with D0 and D1 that can hold D2, block (0, 1) is polled next and D2 is arranged at (0, 1). For D3, block (0, 0) is polled first, and D3 cannot be accommodated side by side with D0 and D1; block (0, 1) is then polled, and D3 cannot be accommodated side by side with D2; following the serpentine order, D3 is therefore arranged at (1, 1). For D4, the polling proceeds in the same way through (0, 0) and (0, 1), and D4 is arranged at (1, 1) side by side with D3. The subsequent layers D5, D6 and so on are arranged in the same manner, until D16 is reached. If, after all the blocks have been polled, no block has a position meeting the condition (a position that can accommodate D16 and lies side by side with the matrices placed on that block in the present arrangement period), the procedure jumps back to (0, 0) and enters the next arrangement period.
In the new arrangement period, it is first judged whether the idle position of (0, 0) can accommodate D16. If so, D16 is arranged at (0, 0), starting from the row immediately following the occupied rows, that is, from the row below D1. In this example, the arrangement of D17-D19 follows the same pattern as D1-D5 and is not repeated here.
It should be noted that, in an alternative embodiment, when D17 is placed, block (0, 0) is first polled for a free position that can accommodate D17, where the free positions include positions in the row occupied by D0 (i.e., the row belonging to the previous arrangement period); the polling proceeds in sequence, and it is then checked whether there is a position to the right of D16 in the current arrangement period that can accommodate D17, and so on. The principle of arrangement is that, within one round of mapping, two layers placed in the same array block are adjacent in the X direction and cannot be staggered, while in the Y direction each layer is placed from top to bottom wherever it fits. Every gap on the array is therefore taken into consideration.
In another alternative embodiment, when D17 is placed, block (0, 0) is first polled to check whether the idle position to the right of D16 can accommodate D17; in this case the rows occupied in the previous arrangement period (i.e., the row where D0 is located) are not polled, and only positions in the rows of the current period are considered. It is noted that, on one block, an earlier matrix is placed in front of a later matrix, where "in front of" is defined by the polling order: for the first row of blocks, (0, 0) and (0, 1), the left position is in front of the right position, so within a block of the first row the earlier matrix lies to the left (for example, D0 lies to the left of D1); for the second row of blocks, (1, 0) and (1, 1), the right position is in front of the left position, so within a block of the second row the earlier matrix lies to the right (for example, D3 lies to the right of D4).
It should be noted that the mapping is performed in a serpentine order: the 0th and other even-numbered rows of blocks are traversed from left to right, the odd-numbered rows of blocks from right to left, and the rows of blocks are traversed from top to bottom overall; after one round is finished, the procedure returns to block (0, 0) and repeats until the whole neural network has been mapped. (The directions given above are merely examples: in practice, the 0th row may be traversed from right to left, or the rows may be traversed from the bottom of the array to the top; the serpentine principle, however, must always be followed. This is the principle for the case in which rows serve as inputs and columns as outputs; those skilled in the art will understand that if columns serve as inputs and rows as outputs, the same principle applies with the directions exchanged.) Because the layers of the neural network are arranged in the order, from largest to smallest, obtained in the foregoing steps, the serpentine arrangement essentially ensures that the layer occupying the most Bias rows and the layer occupying the fewest Bias rows correspond to each other in the Y direction, and likewise for the layers occupying intermediate numbers of rows. The Bias of the layers are therefore distributed on the final Bias array from the top downward (or mapped from bottom to top), leaving more space for the subsequent Bias expansion and giving the chip higher computation accuracy.
The following points need attention in the main-array mapping process. Unless specifically required otherwise, adjacent layers are placed without gaps; the purpose of seamless placement is to make full use of the array space in the early stage of mapping, because each layer must be placed in the array as a whole, and if gaps are left between layers they are difficult to reuse, so the later stages of mapping may fail to fit. In one round of mapping, each layer looks for a position starting from block (0, 0) in the serpentine order, and if the current block has insufficient space the next block is considered. If, in one round of mapping, a layer cannot be placed in any block, a new round of mapping begins. For Bias mapping, each Bias only needs to be mapped onto the Bias array from top to bottom at the columns of its corresponding weights, with the preset minimum number of occupied rows; the number in parentheses after each Bias in FIG. 5 indicates the number of occupied rows, e.g. B1 (9) indicates that B1 occupies 9 rows.
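The preliminary Bias mapping can be sketched as follows, by way of illustration only; it reuses the placement dictionary produced by the weight-placement sketch above and stacks each Bias, with its minimum number of rows, below any Bias already mapped in the same columns.

```python
import numpy as np

def place_bias(ordered_layers, placement, block_w, bias_array_rows, total_cols):
    """Preliminary Bias mapping: same columns as the weights, minimum rows, stacked from the top.

    placement maps layer index -> ((block_row, block_col), row_offset, col_offset), as in the
    weight-placement sketch above; total_cols is the column width of the whole main array.
    """
    next_free = np.zeros(total_cols, dtype=int)    # next free Bias-array row for every column
    bias_placement = {}
    for lp in ordered_layers:
        (block_row, block_col), _, col_offset = placement[lp.index]
        n_cols = lp.weight.shape[1]
        c0 = block_col * block_w + col_offset       # absolute column where the weights start
        span = slice(c0, c0 + n_cols)
        start = int(next_free[span].max())          # row immediately below the occupied rows
        if start + lp.min_bias_rows > bias_array_rows:
            raise ValueError("Bias array too small for the preliminary mapping")
        bias_placement[lp.index] = (start, lp.min_bias_rows, c0)
        next_free[span] = start + lp.min_bias_rows
    return bias_placement
```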
In an alternative embodiment, the neural network mapping method for the storage-computation integrated chip further comprises: acquiring the parameters of the neural network to be mapped and the parameters of the target storage-computation integrated chip, the parameters of the neural network to be mapped including the weight matrix and the Bias corresponding to each layer; and obtaining the minimum number of Bias rows corresponding to each layer according to the Bias of each layer and the parameters of the storage-computation integrated chip.
Specifically, after the neural network model is determined, the weight matrix and Bias value of each layer are known, and the minimum number of Bias rows of each layer is calculated from the Bias values and the parameters of the target chip (the properties of each Bias row).
The minimum number of Bias rows can be given in advance by circuit engineers according to the accuracy requirement, generally such that the minimum number of rows still meets the worst-case accuracy requirement, or it can be calculated according to a preset rule, which is not repeated in the embodiments of the present invention.
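Purely as an illustration of what such a preset rule might look like (the rule itself is not specified in this document), the minimum number of rows could be taken as the smallest number of rows over which the Bias can be spread so that no single row exceeds an assumed accuracy-driven per-row limit; the parameter max_bias_per_row below is an assumption, not a defined chip parameter.

```python
import math

def min_bias_rows(bias, max_bias_per_row):
    """Illustrative preset rule only: spread the Bias over enough rows that no row
    carries more than max_bias_per_row (an assumed, accuracy-driven per-row limit)."""
    return max(1, math.ceil(max(abs(b) for b in bias) / max_bias_per_row))
```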
In an alternative embodiment, the neural network mapping method for the storage-computation integrated chip may further include expanding the Bias arrangement according to the Bias arrangement result and the idle condition of the Bias array, so as to obtain the final Bias arrangement result.
Specifically, after the arrangement is completed, the Bias array may still contain idle rows. In order to reduce as far as possible the Bias values stored in all or some of the storage-computation integrated units of the Bias array, the Bias arrangement is expanded a second time using the idle rows of the Bias array to obtain the final arrangement scheme, and the neural network parameters are then written into the storage-computation integrated chip according to this final scheme.
Because the storage-computation integrated chip essentially performs analog computation, the larger the Bias value on each storage-computation integrated unit of the Bias array, the larger the noise in the final computation, and the excessive noise introduced by an oversized Bias can have a decisive influence on computation accuracy. The number of physical Bias-array rows occupied by one logical Bias row can therefore be expanded as much as the array size allows: if the actual number of occupied rows is m, the Bias stored on each row is 1/m of the logical Bias, which improves computation accuracy.
In an alternative embodiment, expanding the Bias arrangement according to the Bias arrangement result and the idle condition of the Bias array includes:
judging, according to the number of idle rows of the Bias array, whether the numbers of rows occupied by all the Bias can be multiplied;
if yes, multiplying the numbers of rows occupied by all the Bias;
if not, selecting Bias for expansion according to a preset rule.
Specifically, while the weights of each layer are mapped onto the main array, the preliminary mapping of each layer's Bias onto the Bias array is completed according to the pre-specified minimum number of rows occupied by each layer's Bias; the mapping result is shown in (a) of FIG. 6. The mapped Bias rows are then expanded by the largest possible integer factor: if the Bias array has 10 rows in total and 3 rows have been mapped, each row is expanded to 3 times its original number; if 4 rows have been mapped, each row is expanded to 2 times. For the arrangement shown in (a) of FIG. 6, the expansion can be exactly doubled, with the result shown in (b) of FIG. 6. If idle rows remain in the Bias array after this, a number of mapped Bias rows equal to the number of idle rows is selected from the bottom up and each of them is expanded into two rows; with 2 idle rows remaining, for example, the third and fourth rows from the bottom are each expanded into two rows, as shown in (c) of FIG. 6. Here an idle row is a row onto which no Bias has been mapped at all; a row partially mapped with Bias counts as a mapped Bias row.
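A minimal sketch of this expansion step is given below; the handling of the leftover idle rows is simplified to handing out one extra row per layer from the bottom up, rather than the per-row doubling of (c) in FIG. 6, and is illustrative only.

```python
def expand_bias_rows(row_counts, total_rows):
    """Expand per-layer Bias row counts to use the Bias array as fully as possible.

    row_counts: list of minimum row counts, ordered from the top of the Bias array down.
    Returns the expanded row counts. Sketch only: leftover idle rows are handed out one
    per layer from the bottom up, a simplification of the per-row doubling in FIG. 6(c).
    """
    mapped = sum(row_counts)
    if mapped == 0 or mapped > total_rows:
        return list(row_counts)
    factor = total_rows // mapped                 # largest whole-number expansion
    expanded = [n * factor for n in row_counts]
    leftover = total_rows - sum(expanded)         # idle rows remaining after the expansion
    for i in reversed(range(len(expanded))):      # hand them out from the bottom up
        if leftover == 0:
            break
        expanded[i] += 1
        leftover -= 1
    return expanded

# Example: 10 rows in total, 4 rows initially mapped -> each doubled, 2 rows left over
print(expand_bias_rows([1, 1, 1, 1], total_rows=10))   # [2, 2, 3, 3]
```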
In general, the principle of expansion is to use the resources of the Bias array as fully as possible and to reduce idle rows: if the Bias of all layers can be expanded, all of them are expanded; if not, the Bias of some layers are expanded; and if even that is not possible, some individual Bias rows are expanded.
When the Bias of only some layers are expanded, those layers may be selected, in the ordering described above, from front to back or from back to front, or according to a preset priority; for example, the Bias of layers that have a large influence on accuracy may be expanded preferentially, or the Bias of layers whose single-unit Bias value is large after the preliminary mapping may be expanded, the choice being made according to the actual application requirements.
In summary, the embodiments of the invention provide a method for automatically mapping neural network weights onto a storage-computation integrated chip, which can be integrated into the tool chain for storage-computation integrated chip design and thus provides convenience to users of such chips.
By expanding the Bias distribution, the Bias value stored on any single storage-computation integrated unit of the Bias array is reduced. On the one hand, the Bias array is used fully, improving resource utilization and reducing idle resources; on the other hand, reducing the Bias value stored on a single unit reduces the conductance and thus the current noise, which improves computation accuracy. In addition, after the parameters of the layers are ordered, they are arranged in a serpentine manner so that they complement one another effectively; for the same scale of operation, the required storage-computation integrated unit array can therefore be greatly reduced, meeting the demand for chip miniaturization.
The embodiment of the invention also provides a storage-computation integrated chip, comprising a storage-computation integrated unit array for performing neural network operations, the storage-computation integrated unit array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array;
the weight matrices corresponding to the layers are ordered based on the minimum number of Bias rows and the number of columns of the weight matrices, are arranged on the main array in a serpentine manner according to the ordering result, and each Bias is arranged in the Bias array at the columns corresponding to the arrangement position of its weight matrix.
In an alternative embodiment, the main array comprises a plurality of storage-computation integrated unit blocks distributed in an array, and the weight matrices corresponding to the layers are arranged on the storage-computation integrated unit blocks in the main array;
for the weight matrix corresponding to each layer of the neural network, the storage-computation integrated unit blocks in the main array are polled in turn in the serpentine order to find its arrangement position.
In an alternative embodiment, the ordering principle is that the layers are mapped and ordered according to the minimum number of Bias rows corresponding to each layer, and layers having the same minimum number of Bias rows are mapped and ordered according to the number of columns of their weight matrices.
In an alternative embodiment, in the Bias arrangement, the number of rows occupied by each Bias is expanded, based on the minimum number of Bias rows and the idle condition of the Bias array, starting from the columns of the Bias array corresponding to the arrangement position of its weight matrix.
In the storage-computation integrated chip provided by the embodiments of the invention, the arrangement of the weight matrices and the corresponding Bias is generated according to the neural network mapping method described above.
It should be noted that the storage-computation integrated chip provided by the embodiments of the invention can be applied to various electronic devices, such as smartphones, tablet devices, network set-top boxes, portable computers, desktop computers, personal digital assistants (PDAs), vehicle-mounted devices, smart wearable devices, toys, smart home control devices, production-line equipment controllers, and the like. The smart wearable devices may include smart glasses, smart watches, smart bracelets, and the like.
Based on the same inventive concept, the embodiments of the present application also provide a neural network mapping apparatus for a storage-computation integrated chip, which can be used to implement the method described in the above embodiments, as described below. Since the principle by which the neural network mapping apparatus solves the problem is similar to that of the above method, reference may be made to the implementation of the method for the implementation of the apparatus, and repeated description is omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 7 is a block diagram of a neural network mapping apparatus for a storage-computation integrated chip in an embodiment of the present invention. The apparatus comprises an ordering module 10, a weight arrangement module 20 and a Bias arrangement module 30.
The ordering module maps and orders the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped;
the weight arrangement module arranges, in sequence according to the mapping-ordering result, the weight matrices corresponding to the layers into the main array of the storage-computation integrated chip;
the Bias arrangement module arranges the corresponding Bias, according to the arrangement position of each weight matrix and the minimum number of Bias rows, in the Bias array of the storage-computation integrated chip at the columns corresponding to that arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
The apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the program, implements the steps of the neural network mapping method for the storage-computation integrated chip described above.
Referring now to fig. 8, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application is shown.
As shown in fig. 8, the electronic device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom can be installed into the storage section 608 as needed.
In particular, according to the embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present invention include a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the neural network mapping method for the storage-computation integrated chip described above.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments, and reference may be made to the corresponding parts of the description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.