
Neural network mapping method, device and equipment for memory and calculation integrated chip

Info

Publication number: CN113988277B (application CN202111184060.XA)
Authority: CN (China)
Prior art keywords: bias, storage, neural network, array, layer
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113988277A
Inventors: 康卓栋, 张爱飞, 陆振亮
Current assignee: Beijing Witinmem Technology Co ltd; Hangzhou Zhicun Computing Technology Co ltd
Original assignee: Beijing Witinmem Technology Co ltd; Hangzhou Zhicun Computing Technology Co ltd
Application filed by Beijing Witinmem Technology Co ltd and Hangzhou Zhicun Computing Technology Co ltd
Priority to CN202111184060.XA
Publication of CN113988277A
Application granted; publication of CN113988277B

Abstract

An embodiment of the invention provides a neural network mapping method, device and equipment for a memory integrated chip. The method comprises: mapping and sorting the layers of the neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer; sequentially arranging the weight matrices corresponding to the layers into the main array of the chip according to the sorting result; and arranging the corresponding Bias in the Bias array of the chip according to the arrangement positions of the weight matrices and the minimum number of Bias rows. Because the weight matrices of the layers are placed in a serpentine order, the memory-computing cells are used effectively when the matrices are arranged in sequence. In addition, the number of rows occupied by each Bias is expanded based on its minimum number of rows and the idle space of the Bias array, which reduces the Bias value stored on a single memory-computing cell, reduces current noise and improves operation precision.

Description

Neural network mapping method, device and equipment for memory and calculation integrated chip
Technical Field
The present invention relates to the field of semiconductor technologies, and in particular, to a method, an apparatus, and a device for mapping a neural network for a memory integrated chip.
Background
In recent years, with the continuous development of algorithms, computing power and data volume, machine learning has shown strong advantages in solving various problems, and artificial neural networks in particular have attracted wide attention in fields such as image recognition, object detection and semantic segmentation. However, as neural networks grow larger, the traditional CPU+GPU architecture for running neural network algorithms gradually hits speed and power-consumption bottlenecks. The root cause is that, under the von Neumann architecture, the separation of memory and computation forces data-centric neural network algorithms to incur excessive data-transfer overhead, which reduces speed and increases power consumption.
In-memory computing addresses the problem caused by the separation of memory and computation. The weights of the neural network are stored as the conductances of the flash array nodes in a memory-computing integrated (NPU) chip, and the input data, expressed as voltages, are fed into the array; the current output by the array is then the product of voltage and conductance, which completes the matrix multiply-add operation of the input data with the network weights. This is essentially analog computation rather than traditional digital computation.
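As a rough illustration of this analog model (a sketch added for clarity, not part of the patent; the function and variable names are assumptions), the current on each output column is the sum over rows of input voltage times cell conductance, i.e. a matrix-vector product:

```python
import numpy as np

def analog_mac(voltages, conductances):
    """Idealized crossbar model: column current I[j] = sum_i V[i] * G[i][j].

    voltages:     1-D array of input voltages, one per array row (the data source).
    conductances: 2-D array of cell conductances, rows x columns (the stored weights).
    Returns the per-column output currents, i.e. the matrix multiply-add result.
    """
    return voltages @ conductances

# Example: 3 inputs, 2 outputs.
v = np.array([0.1, 0.2, 0.3])                 # input voltages
g = np.array([[1.0, 0.5],
              [0.2, 0.4],
              [0.3, 0.6]])                    # weight conductances
print(analog_mac(v, g))                       # [0.23 0.31]
```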
Tool-chain design is an important link in the whole process from design to production of a memory integrated chip. Within the tool chain, the technology that automatically maps the weight parameters of a given neural network onto the chip array as required is a key technology. When a trained neural network is mapped onto the memory-computing cell array of the chip, the weights and biases are conventionally mapped onto the array in the order of the network layers. On the one hand this enlarges the required memory-computing cell array; on the other hand, because the bias is mapped directly onto the array, a larger bias value corresponds to a larger cell conductance and hence a larger cell current under the same voltage, which produces more noise and affects the operation precision.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a neural network mapping method, a device and equipment for a memory integrated chip, which can at least partially solve the problems in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
in a first aspect, a neural network mapping method for a memory integrated chip is provided, including:
Mapping and sequencing all layers according to the Bias minimum line number and the weight matrix corresponding to all layers of the neural network to be mapped;
according to the mapping and sorting result, sequentially arranging weight matrixes corresponding to all layers into a main array of the integrated memory chip, and arranging corresponding Bias into positions corresponding to the arrangement position columns in the Bias array of the integrated memory chip according to the arrangement positions of the weight matrixes and the minimum number of Bias rows;
wherein, when the weight matrixes corresponding to the layers are sequentially arranged, the weight matrixes are arranged according to a snake-shaped sequence.
Further, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and weight matrixes corresponding to all layers are arranged on the storage and calculation integrated unit blocks in the main array;
The method comprises the step of sequentially polling the storage and calculation integrated unit blocks in the main array according to the serpentine sequence for the weight matrix corresponding to each layer of neural network to find the arrangement position.
Further, the mapping and sorting the layers according to the Bias minimum line number and the weight matrix corresponding to the layers of the neural network to be mapped includes:
Mapping and sequencing the layers according to the Bias minimum line number corresponding to the layers;
And mapping and sequencing the layers with the same Bias minimum line number according to the column number of the weight matrix.
Further, the neural network mapping method for the memory integrated chip further comprises the following step:
And writing the weight matrix and Bias of each layer of the neural network to be mapped into the memory integrated chip according to the arrangement result.
Further, the sequentially polling the memory integrated unit blocks in the main array in a serpentine order to find the arrangement position includes:
Sequentially polling whether the storage and calculation integrated unit blocks in the main array have positions meeting the current layer arrangement condition according to a serpentine sequence;
if yes, arranging the weight matrix of the current layer to a position meeting the arrangement condition of the current layer;
If not, continuing to poll the next integrated unit block until finding out the position meeting the arrangement condition;
The arrangement condition is that the position is side by side with the weight matrices already arranged on the current storage and calculation integrated unit block in the current arrangement period and can accommodate the weight matrix of the current layer.
Further, the weight matrix is arranged from the next column of the non-idle columns.
Further, the sequentially polling the memory integrated unit blocks in the main array according to a serpentine sequence to find the arrangement position, further includes:
If no position meeting the arrangement condition is found after all the storage and calculation integrated unit blocks are polled in the serpentine sequence, returning to the first storage and calculation integrated unit block and entering the next arrangement period:
judging whether the idle position of the first storage and calculation integrated unit block can accommodate the weight matrix of the current layer;
if yes, arranging the weight matrix of the current layer at the idle position of the first storage and calculation integrated unit block;
If not, sequentially polling all the storage and calculation integrated unit blocks according to the serpentine sequence until the storage and calculation integrated unit blocks capable of accommodating the weight matrix are found, and arranging the weight matrix of the current layer at the idle position of the storage and calculation integrated unit blocks;
wherein, when arranging the weight matrix, the weight matrix is arranged from the next row of the non-idle rows.
Further, when the corresponding Bias is arranged at the position corresponding to the arrangement position column in the Bias array of the integrated memory chip, the Bias is arranged at the next row of the non-idle rows in the position corresponding to the column.
Further, the neural network mapping method for the memory integrated chip further comprises the following steps:
Expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a Bias final arrangement result.
Further, the expanding the Bias arrangement according to the Bias arrangement result and the idle condition of the Bias array includes:
judging whether the occupied line numbers of all Bias can be multiplied according to the idle line numbers of the Bias array;
if yes, expanding the number of occupied lines of all Bias by multiple times;
if not, bias is selected for expansion according to a preset rule.
Further, the neural network mapping method for the memory integrated chip further comprises the following steps:
Dividing the integrated memory unit array of the integrated memory chip into a main array and a Bias array;
The main array is divided into a plurality of memory integrated unit blocks.
Further, the neural network mapping method for the memory integrated chip further comprises the following steps:
acquiring parameters of a neural network to be mapped and parameters of a target memory integrated chip, wherein the parameters of the neural network to be mapped comprise weight matrixes and Bias corresponding to all layers;
and obtaining the minimum number of the Bias lines corresponding to each layer according to the Bias of each layer and the parameters of the integrated memory chip.
The second aspect provides a memory-calculation integrated chip, which comprises a memory-calculation integrated unit array for executing neural network operation, wherein the memory-calculation integrated unit array comprises a main array and a Bias array, wherein a weight matrix corresponding to each layer of neural network is mapped in the main array;
The weight matrix corresponding to each layer is ordered based on the minimum number of the Bias rows and the number of columns of the weight matrix, and is arranged in a serpentine shape on the main array according to the ordering result, and the Bias is arranged to a position corresponding to the arrangement position column of the corresponding weight matrix in the Bias array.
Further, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and weight matrixes corresponding to all layers are arranged on the storage and calculation integrated unit blocks in the main array;
and sequentially polling the storage and calculation integrated unit blocks in the main array according to a serpentine sequence aiming at the weight matrix corresponding to each layer of neural network so as to find the corresponding arrangement positions.
Further, the principle of the sorting is as follows:
Mapping and sequencing the layers according to the Bias minimum line number corresponding to the layers;
And mapping and sequencing the layers with the same Bias minimum line number according to the column number of the weight matrix.
Furthermore, the Bias arrangement mode expands the number of lines occupied by each Bias based on the minimum number of lines of the Bias and the idle condition of the Bias array on the basis of the positions corresponding to the arrangement position columns of the corresponding weight matrix in the Bias array.
The third aspect provides a memory-calculation integrated chip, which comprises a memory-calculation integrated unit array for executing neural network operation, wherein the memory-calculation integrated unit array comprises a main array and a Bias array, wherein a weight matrix corresponding to each layer of neural network is mapped in the main array;
The weight matrix and the corresponding Bias arrangement mode are generated according to the neural network mapping method.
In a fourth aspect, a neural network mapping apparatus for a memory integrated chip is provided, including:
the sequencing module is used for mapping and sequencing the layers according to the Bias minimum line number and the weight matrix corresponding to the layers of the neural network to be mapped;
the arrangement module is used for sequentially arranging the weight matrixes corresponding to all layers into a main array of the integrated memory chip according to the mapping and sorting result, and arranging the corresponding Bias into the position corresponding to the arrangement position column in the Bias array of the integrated memory chip according to the arrangement position of the weight matrixes and the minimum number of Bias rows;
wherein, when the weight matrixes corresponding to the layers are sequentially arranged, the weight matrixes are arranged according to a snake-shaped sequence.
In a fifth aspect, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the neural network mapping method described above when the program is executed.
In a sixth aspect, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the neural network mapping method described above.
The neural network mapping method, device and equipment for the memory integrated chip comprise: mapping and sorting the layers according to the minimum number of Bias rows and the weight matrices corresponding to the layers of the neural network to be mapped; sequentially arranging the weight matrices corresponding to the layers into the main array of the memory integrated chip according to the mapping and sorting result; and arranging the corresponding Bias at the positions whose columns correspond to the arrangement positions of the weight matrices in the Bias array of the memory integrated chip, according to those arrangement positions and the minimum number of Bias rows, the weight matrices being placed in a serpentine order when they are arranged in sequence. By sorting the layers and then arranging them in a serpentine order, the memory-computing cells are used effectively, and the size of the memory-computing cell array can be reduced under the same operation scale.
In addition, in the embodiment of the invention, the number of lines occupied by each Bias is expanded based on the minimum number of lines of the Bias and the idle condition of the Bias array, so that the offset value on a single memory integrated unit is reduced, the current noise is reduced, and the operation precision is improved.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 shows a flow chart of a neural network mapping method for a memory integrated chip in an embodiment of the invention;
FIG. 2 illustrates a memory integrated cell array division in an embodiment of the present invention;
FIG. 3 shows specific steps of step S100 in an embodiment of the invention;
FIG. 4 illustrates a process for ordering layers of a neural network in an embodiment of the present invention;
FIG. 5 illustrates the weight matrix and the corresponding Bias arrangement results in the embodiment of the present invention;
FIG. 6 illustrates an arrangement extension process of Bias in an embodiment of the present invention;
FIG. 7 is a block diagram of a neural network mapping device for a memory chip in an embodiment of the invention;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present application and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
FIG. 1 shows a flowchart of a neural network mapping method for a memory integrated chip according to an embodiment of the present invention. As shown in FIG. 1, the neural network mapping method for a memory integrated chip may include the following:
Step S100, mapping and sequencing all layers according to the Bias minimum line number and the weight matrix corresponding to all layers of the neural network to be mapped;
For a particular neural network that has been trained, the weight matrix parameters of each layer, the Bias parameters, the Bias size and the minimum number of rows the Bias occupies on the chip have all been obtained. The layers of the network are sorted according to the minimum number of Bias rows and the number of columns of the weight matrix, and this order is used as the mapping arrangement order.
Step S200, according to the mapping and sorting result, sequentially arranging the weight matrices corresponding to the layers into the main array of the memory integrated chip;
wherein, when the weight matrixes corresponding to the layers are sequentially arranged according to the sequence obtained in the step S100, the weight matrixes are arranged according to a serpentine sequence.
Specifically, consider the case where the rows of the memory-computing cells are the inputs of the chip and the columns are the outputs (on this premise, those skilled in the art will understand that rows and columns are only relative concepts; in a specific application the columns may be the inputs and the rows the outputs, and the principle is the same, so it is not repeated here). The main array is divided into a plurality of storage and calculation integrated unit blocks laid out as an array of blocks. When the weight matrices are arranged, the blocks are visited in a serpentine order: the first row of blocks (rows of blocks, not rows of memory-computing cells) is traversed along the +X direction, the second row of blocks along the -X direction, the third row of blocks along the +X direction again, and so on, alternating the direction row by row. After the last row of blocks has been traversed, the traversal returns to the first block, and the next arrangement period proceeds in the same serpentine order.
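A minimal sketch of this serpentine block traversal (illustrative only; the block-grid dimensions and names are assumptions, not taken from the patent):

```python
def serpentine_block_order(n_block_rows, n_block_cols):
    """Yield block coordinates (row, col) in serpentine order:
    even block-rows left to right (+X), odd block-rows right to left (-X)."""
    for r in range(n_block_rows):
        cols = range(n_block_cols) if r % 2 == 0 else reversed(range(n_block_cols))
        for c in cols:
            yield (r, c)

# For a 2 x 4 block partition of the main array (as in the example of fig. 2(b)):
print(list(serpentine_block_order(2, 4)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2), (1, 1), (1, 0)]
```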
The integrated memory unit may be a flash memory unit.
Step S300, according to the arrangement position of the weight matrix and the minimum number of Bias rows, arranging the corresponding Bias to the position corresponding to the arrangement position column in the Bias array of the integrated memory chip;
Specifically, owing to the computational characteristics of the memory integrated chip, the weight matrix must be column-aligned with its corresponding Bias. The Bias is therefore placed so that its columns align with the columns of the corresponding weight matrix, and in the row direction it is arranged according to the minimum number of Bias rows.
An arrangement scheme is generated through steps S100-S300, and the parameters of each layer of the neural network are written into the memory-computing cell array of the memory integrated chip by a compilation tool according to this scheme. In the inference stage, when the operation of a given layer is executed, the rows and columns where that layer's weight matrix and Bias are located are gated through the row and column decoders according to the arrangement scheme and the control requirement; the input signals of the layer are applied to the rows corresponding to the weight matrix, the matrix multiply-add operation with the weight matrix is performed, and the result is superposed with the corresponding Bias, so that the calculation result of the layer is obtained on the corresponding columns.
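A simplified functional model of this per-layer computation (a sketch under assumed names; on the real chip the multiply-add is performed in the analog domain as described above):

```python
import numpy as np

def run_layer(main_array, bias_array, row_slice, col_slice, bias_row_slice, x):
    """Simulate one layer of inference: gate the rows/columns holding the
    layer's weight matrix, multiply-add the layer inputs with that gated
    sub-array, then superpose the Bias stored (possibly over several rows)
    in the same columns of the Bias array."""
    w = main_array[row_slice, col_slice]                    # gated weight matrix
    b = bias_array[bias_row_slice, col_slice].sum(axis=0)   # Bias spread over rows
    return x @ w + b                                        # result on the gated columns
```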
By adopting the above technical scheme, the parameters of the layers are first sorted and then arranged in a serpentine order, so the layers complement one another effectively. For example, if the layers are sorted by occupied row number as 9, 8, 7, 6, 5, 4, 3, 2, 1, then after the serpentine arrangement the layers occupying fewer rows line up with the layers occupying more rows, which reduces the number of rows occupied in the Bias array and leaves more rows available for expansion. This increases the utilization efficiency of the memory-computing cell array; under the same operation scale, the required memory-computing cell array can be greatly reduced, which suits the demand for chip miniaturization. In an alternative embodiment, the neural network mapping method for the memory integrated chip further comprises dividing the memory-computing cell array of the chip into a main array and a Bias array, and dividing the main array into a plurality of storage and calculation integrated unit blocks. The division can be made with reference to the usage scenario and the scale of the corresponding neural network, which guarantees performance while effectively improving resource utilization.
Specifically, as shown in (a) of fig. 2, the actual physical architecture of the chip consists of a main array and a Bias array. The applicant found in practice that, because an excessive current in the analog computation process significantly affects the calculation result, the array may be partitioned logically in the embodiment of the present invention. As shown in (b) of fig. 2, the main array may be divided into 2 x 4 blocks; this is only an example, and in practice the Bias array may or may not be divided into blocks as well. For convenience of description, each block in the main array is labeled with two-dimensional coordinates starting from (0, 0). If the chip is small, the main array may be left unpartitioned, but because networks are generally large, partitioning is necessary in most scenarios. It should be added that the partition is made according to the actual performance of the chip; the blocks may all be the same size or may differ in size, which is not limited in the embodiment of the present invention, but the sizes of the layers of the neural network are considered before partitioning, so that the layer occupying the largest space can still be mapped within one block.
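One minimal way to represent this logical partition in a mapping tool (a sketch; the field names and the 512 x 256 block size are assumptions, not values from the patent):

```python
from dataclasses import dataclass

@dataclass
class Block:
    row: int                # block coordinates in the main array, starting from (0, 0)
    col: int
    height: int             # memory-computing cell rows in this block
    width: int              # memory-computing cell columns in this block
    used_rows: int = 0      # cell rows consumed by previous arrangement periods
    used_cols: int = 0      # cell columns consumed in the current arrangement period
    period_height: int = 0  # tallest matrix placed in the current arrangement period

# A 2 x 4 block partition of the main array, matching the example of fig. 2(b).
main_blocks = [Block(r, c, 512, 256) for r in range(2) for c in range(4)]
```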
In an optional embodiment, the neural network mapping method may further include writing the weight matrix and Bias of each layer of the neural network to be mapped on the integrated memory chip according to the arrangement result.
Specifically, the mapping method is executed on a tool chain and can be understood as a program running on a terminal device, a server or a chip-burning device. An arrangement scheme is generated by the mapping method, and the weight matrices and Bias are then written into the memory integrated chip according to that scheme. The memory integrated chip can be installed on the circuit board of a corresponding device for inference, so that neural network operations can be performed. For example, it can be installed in a toy for voice recognition, in which case the neural network parameters written into the chip are those of a voice-recognition network; it can also be installed in face-recognition equipment, in which case the parameters written in are those of an image-recognition network. These application scenarios are only examples and do not limit the application of the chip, which can serve any device or scenario that needs to perform neural network operations.
In an alternative embodiment, referring to fig. 3, the step S100 may include the following:
step S110, mapping and sequencing the layers according to the Bias minimum line number corresponding to the layers;
and step S120, mapping and sequencing the layers with the same Bias minimum line number according to the column number of the weight matrix.
Specifically, (a) in fig. 4 shows the parameters of a neural network: the upper row of 5 matrices is the weight matrices corresponding to 5 neural network layers, and the lower row of 5 matrices is the corresponding Biases. Here 1024 x 128 denotes the scale of a weight matrix, i.e. 1024 columns and 128 rows, and the corresponding Bias 1024 x 2 denotes 1024 columns with a minimum of 2 occupied rows. Owing to the operational characteristics of the memory integrated chip, the number of columns of the weight matrix of a layer and the number of columns of its Bias are the same.
When the layers of the neural network are ordered, the layers can be ordered according to the minimum number of Bias lines corresponding to the layers, referring to (b) in fig. 4, the layers are ordered according to the minimum number of Bias lines from large to small, and of course, in practical application, the layers can be ordered according to the minimum number of Bias lines from small to large, and the principle is the same.
In addition, for the two Bias minimum lines 1024×2 and 768×2, the layers with the same Bias minimum line number are partially sorted from large to small according to the number of columns of the weight matrix, and the final sorting result is shown in fig. 4 (b).
It should be noted that, in the embodiment of the present invention, for a specific trained neural network, the Bias size of each layer and its minimum number of occupied rows on the chip have been obtained. The layers are first sorted from large to small by the minimum number of rows occupied by their Bias, and layers with the same number of Bias rows are then sub-sorted from large to small by the number of columns. The first purpose is to give priority, during mapping, to reducing the Bias carried by each physical row, because this improves the accuracy of the chip calculation: the input received by the embodiment of the invention contains the minimum number of rows occupied by each layer's Bias, and after mapping with this number of rows the Bias array generally still has unmapped rows; if m rows are used to carry a Bias value originally designed to be carried by 1 row, the value stored on each row becomes 1/m of the original. Secondly, in applications the number of inputs of each layer is usually small and the number of outputs is large; as shown in fig. 4, a layer has m input rows and n output columns, and m is often smaller than n. Therefore, to better ensure that the whole network fits into a limited area, the size of each layer in the X direction is sorted from large to small, and each layer is placed with the Bias size as the first priority and the X-direction size as the second priority.
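A sketch of this two-level sort key (illustrative; the layer record fields are assumed names):

```python
# Each layer carries its minimum Bias rows and its weight-matrix column count.
layers = [
    {"id": 0, "min_bias_rows": 2, "n_cols": 1024},
    {"id": 1, "min_bias_rows": 9, "n_cols": 512},
    {"id": 2, "min_bias_rows": 2, "n_cols": 768},
]

# First priority: minimum Bias rows, largest first.
# Second priority: weight-matrix columns (X-direction size), largest first.
order = sorted(layers, key=lambda l: (-l["min_bias_rows"], -l["n_cols"]))
print([l["id"] for l in order])   # -> [1, 0, 2]
```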
In an alternative embodiment, the main array includes a plurality of storage and calculation unit blocks distributed in an array, referring to fig. 2, when the weight matrixes corresponding to each layer are arranged on the storage and calculation unit blocks in the main array, the weight matrixes corresponding to each layer are sequentially arranged according to a serpentine sequence, and specifically, the method includes sequentially polling the storage and calculation unit blocks in the main array according to the serpentine sequence for the weight matrixes corresponding to each layer of neural network to find the arrangement position.
The method comprises sequentially polling, in a serpentine sequence, whether the storage and calculation integrated unit blocks in the main array have a position meeting the arrangement condition of the current layer, wherein the arrangement condition is that the position is side by side with the weight matrices already arranged on the current storage and calculation integrated unit block in the current arrangement period and can accommodate the weight matrix of the current layer.
If yes, arranging the weight matrix of the current layer to a position meeting the arrangement condition of the current layer;
If not, continuing to poll the next integrated unit block until finding out the position meeting the arrangement condition;
It should be noted that, in order to minimize resource idling, when a position satisfying the condition is found, the weight matrix is arranged from the next column of the non-idle columns, and of course, in a special application occasion, or in order to reduce interference, a certain number of columns may be spaced, and specifically, the weight matrix may be selected according to the actual application requirement.
And if the positions meeting the arrangement conditions are not found after all the storage and calculation integrated unit blocks are polled according to the serpentine sequence, returning to the first storage and calculation integrated unit block, and entering the next arrangement period.
In a new arrangement period, firstly judging whether the idle position of the first integrative unit block can accommodate the weight matrix of the current layer, if so, arranging the weight matrix of the current layer at the idle position of the first integrative unit block, otherwise, sequentially polling all integrative unit blocks according to a serpentine sequence until the integrative unit block capable of accommodating the weight matrix is found, and arranging the weight matrix of the current layer at the idle position of the integrative unit block;
it should be noted that, in order to minimize resource idling, when a position satisfying the condition is found, the weight matrix is arranged from the next row of the non-idle rows, and of course, in a special application occasion, or in order to reduce interference, a certain number of rows may be spaced, and specifically, the selection may be performed according to the actual application requirement.
By adopting the technical scheme, after the ordered weight matrix is arranged on the main array, according to the principle that the weight matrix corresponds to the corresponding Bias column, when the corresponding Bias is arranged at the position corresponding to the arrangement position column in the Bias array of the integrated memory chip, the Bias is arranged to the next row of non-idle rows in the position corresponding to the column.
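The placement loop described above can be condensed as follows (a sketch built on the Block structure introduced earlier and on a block list already in serpentine order; it is an illustration, not the patent's reference implementation):

```python
def place_layers(blocks, layers_in_order):
    """Place each layer's weight matrix by polling the blocks in serpentine
    order. Within one arrangement period, matrices on a block sit side by side
    in the X direction, starting at the next free column; when no block can
    accept a layer, every block opens a new period below its occupied rows.
    The partition guarantees the largest layer fits in one empty block, so
    the loop terminates."""
    placements = {}
    for layer in layers_in_order:
        rows, cols = layer["n_rows"], layer["n_cols"]
        while True:
            target = next((b for b in blocks
                           if b.width - b.used_cols >= cols
                           and b.height - b.used_rows >= rows), None)
            if target is not None:
                placements[layer["id"]] = (target.row, target.col,
                                           target.used_rows, target.used_cols)
                target.used_cols += cols
                target.period_height = max(target.period_height, rows)
                break
            for b in blocks:                  # no room anywhere: next period
                b.used_rows += b.period_height
                b.used_cols = 0
                b.period_height = 0
    return placements
```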
In order for those skilled in the art to better understand the embodiments of the present invention, the mapping arrangement process will be described in detail with reference to fig. 5.
First, taking a neural network of 20 layers as an example, each layer of the network is mapped onto the memory-computing cell array of the memory integrated chip. After sorting is completed, the layers are recorded as 0-19 in order, the weight matrices are numbered D0-D19 and the Biases are numbered B0-B19. The weight matrices D0-D19 are arranged first, and then, in this example, the Biases B0-B19 are arranged according to the arrangement result of the weight matrices. During arrangement, the first row of storage and calculation integrated unit blocks in the main array is visited from left to right, the second row from right to left, the third row from left to right, and the polling then returns to the first row of blocks, following this serpentine cyclic order;
For example, for D0, block (0, 0) is polled first; there is a position that can hold the weight matrix of the current layer, so D0 is arranged at (0, 0). For D1, block (0, 0) is polled first; there is a position satisfying the condition, so D1 is arranged at (0, 0) side by side with D0. For D2, block (0, 0) is polled first; D2 cannot be accommodated in the position side by side with D0 and D1 on block (0, 0), so block (0, 1) is polled next and D2 is arranged in (0, 1). For D3, block (0, 0) is polled first and D3 cannot be accommodated side by side with D0 and D1; block (0, 1) is then polled and D3 cannot be accommodated side by side with D2; D3 is therefore arranged in (1, 1), the next block in the serpentine order. For D4, blocks (0, 0) and (0, 1) are likewise polled without success, and D4 is arranged in (1, 1) side by side with D3. The following layers are arranged in the same manner, polling the blocks in serpentine order. When D16 is reached, no block has a position that meets the condition (a position that can accommodate D16 side by side with the matrices already arranged in the present arrangement period), so the process jumps back to (0, 0) and enters the next arrangement period.
In the new arrangement period, it is first judged whether the idle position of (0, 0) can accommodate D16; if so, D16 is arranged at (0, 0), starting from the next row of the non-idle rows, i.e. the row below D1 in this example. D17-D19 are then arranged in the same way as D1-D5, which is not repeated here.
It should be noted that in an alternative embodiment, when D17 polls (0, 0) for a free position, the free positions considered include positions in the row where D0 is located (i.e. the row of the previous arrangement period); the polling proceeds in order, and then checks whether there is a position to the right of D16 in the current arrangement period that can accommodate D17, and so on. The principle of arrangement is that, within one round of mapping, two layers in an array block must be adjacent in the X direction and cannot be staggered, but in the Y direction each layer is placed from top to bottom as long as it fits. Every gap on the array is therefore taken into consideration.
In another alternative embodiment, when D17 polls (0, 0), only whether the idle position to the right of D16 can accommodate D17 is checked; the positions in the row where D0 is located (i.e. the row of the previous arrangement period) are not polled, and only positions in the row of the current period are polled. It is noted that on one block the preceding matrix is placed before the succeeding matrix, where "preceding" and "succeeding" follow the polling order: for the first row of blocks, (0, 0) and (0, 1), the left position precedes the right one, so within a block of the first row the preceding matrix is to the left of the succeeding one, e.g. D0 is to the left of D1; for the second row of blocks, (1, 0) and (1, 1), the right position precedes the left one, so within a block of the second row the preceding matrix is to the right of the succeeding one, e.g. D3 is to the right of D4.
It should be noted that the mapping is performed in a serpentine sequence: even-numbered rows of blocks (starting from row 0) from left to right, odd-numbered rows from right to left, and top to bottom as a whole; after one round is finished, the process returns to block (0, 0), and this repeats until the whole neural network is mapped. (The above is merely an example: in practice row 0 may also be traversed from right to left, or the whole traversal may snake from the lowermost row of blocks to the uppermost, but the serpentine principle must be followed. This is the principle when rows are taken as inputs and columns as outputs; those skilled in the art will understand that if rows are outputs and columns are inputs, the converse applies.) Since the embodiments of the present invention arrange the layers in order from large to small in the above step, the serpentine arrangement basically ensures that the layers occupying the most and the fewest Bias rows correspond in the Y direction, and likewise those occupying more and fewer rows correspond, so that on the final Bias array the Bias of each layer is gathered toward the top (or, equivalently, mapped in order from bottom to top). This leaves more space for the subsequent Bias expansion and gives the chip a higher calculation precision.
The following points apply during main-array mapping. Unless specifically required, adjacent placements are seamless; the purpose of seamless connection is to use the array space as fully as possible in the early stage of mapping, because each layer must be put into the array as a whole, and gaps left between layers are hard to reuse, so the later stages of mapping may fail to find room. In one round of mapping, each layer looks for a position in serpentine order starting from block (0, 0), and if the current block has insufficient space the next block is considered. If, in one round, some layer cannot be placed in any block, a new round of mapping is entered. In Bias mapping, only the column position of the corresponding weights and the preset minimum number of occupied rows are needed, and the Bias is mapped onto the Bias array from top to bottom; the number in brackets after each Bias in fig. 5 indicates its number of occupied rows, e.g. B1 (9) indicates that B1 occupies 9 rows.
In an optional embodiment, the neural network mapping method for the integrated memory chip further comprises the steps of obtaining parameters of a neural network to be mapped and parameters of a target integrated memory chip, wherein the parameters of the neural network to be mapped comprise weight matrixes and Bias corresponding to all layers, and obtaining the minimum number of Bias lines corresponding to all layers according to the Bias of all layers and the parameters of the integrated memory chip.
Specifically, after the neural network model is determined, the weight array and the Bias value of each layer are known, and the Bias minimum line number of each layer is calculated according to the Bias value and the parameter (attribute of each line of Bias) of the target chip.
The minimum number of Bias rows can be given in advance by circuit engineers according to the precision requirement (generally, the standard is that this minimum number of rows meets the worst-case precision requirement), or it can be calculated according to a preset rule, which is not elaborated in the embodiment of the present invention.
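One possible preset rule, given here purely as an assumption since the patent does not fix one, is to take enough rows that no single cell carries more than a precision-determined per-cell maximum:

```python
import math

def min_bias_rows(layer_bias, max_value_per_cell):
    """Hypothetical rule: use enough Bias rows that no single memory-computing
    cell has to carry more than max_value_per_cell of its column's Bias."""
    return max(1, math.ceil(max(abs(b) for b in layer_bias) / max_value_per_cell))

print(min_bias_rows([3.0, 7.5, 5.2], max_value_per_cell=4.0))   # -> 2
```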
In an optional embodiment, the neural network mapping method for the integrated memory chip may further include expanding the Bias arrangement according to the Bias arrangement result and the idle condition of the Bias array to obtain a Bias final arrangement result.
Specifically, after the arrangement is completed, the Bias array may have free rows, so as to reduce the Bias values stored in all or part of the memory integrated units in the Bias array as far as possible, the arrangement of the Bias needs to be secondarily expanded by using the free rows in the Bias array to obtain a final arrangement scheme, and then the neural network parameters are written into the memory integrated chips according to the final arrangement scheme.
Because the memory integrated chip essentially performs analog computation, the larger the Bias value on each memory-computing cell of the Bias array, the larger the noise in the final calculation, and the excessive noise introduced by an oversized Bias can have a decisive influence on calculation precision. Therefore, the actual number of Bias-array rows occupied by one logical row of Bias should be expanded as much as the size of the array allows: if the actual number of occupied rows is m, the Bias stored on each row is 1/m of the logical Bias, and the calculation precision is improved.
In an alternative embodiment, expanding the Bias arrangement according to the Bias arrangement result and the idle condition of the Bias array includes:
judging whether the occupied line numbers of all Bias can be multiplied according to the idle line numbers of the Bias array;
if yes, expanding the number of occupied lines of all Bias by multiple times;
if not, bias is selected for expansion according to a preset rule.
Specifically, while the weights of each layer are mapped onto the main array, a preliminary mapping of each layer's Bias is completed on the Bias array according to the minimum number of rows given in advance for that layer; the result is shown in (a) of fig. 6. The mapped Bias rows are then expanded by the largest integral multiple that fits: if the Bias array has 10 rows in total and 3 rows are mapped, each row is expanded to 3 times the original; if 4 rows are mapped, each row is expanded to 2 times the original. For the arrangement shown in fig. 6 (a), the expansion can be exactly doubled, and the result is shown in fig. 6 (b). Then, if free rows remain in the Bias array, a number of mapped Bias rows equal to the number of free rows is selected from the bottom up, and each of them is expanded into two rows; for example, if 2 more free rows remain, the third and fourth rows from the bottom are each expanded into two rows, as shown in fig. 6 (c). Here an idle row is a row onto which no Bias is mapped over its whole length; a row partially mapped with Bias counts as a mapped Bias row.
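A sketch of this two-stage expansion (illustrative bookkeeping only; each entry is the number of rows currently occupied by one layer's Bias, listed top to bottom, and the names are assumptions):

```python
def expand_bias(occupied, total_rows):
    """Stage 1: multiply every Bias's occupied rows by the largest integral
    factor that still fits in the Bias array. Stage 2: hand the remaining
    free rows to the bottom-most mapped Bias rows, doubling them one by one."""
    factor = total_rows // sum(occupied)          # largest integral expansion
    occupied = [r * factor for r in occupied]
    leftover = total_rows - sum(occupied)
    for i in range(len(occupied) - 1, -1, -1):    # from the bottom up
        if leftover <= 0:
            break
        extra = min(leftover, occupied[i])        # at most double this Bias
        occupied[i] += extra
        leftover -= extra
    return occupied

# Example from the text: a 10-row Bias array with 3 mapped Bias rows.
print(expand_bias([1, 1, 1], total_rows=10))      # -> [3, 3, 4]
```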
In general, the principle of expansion is to utilize the resources of the Bias array as much as possible, reduce idle rows, expand Bias corresponding to all layers if Bias corresponding to all layers can be expanded, expand Bias corresponding to part of layers if Bias corresponding to all layers cannot be expanded, and expand Bias of part of rows if Bias corresponding to part of layers cannot be expanded.
When expanding the Bias corresponding to the partial layer, the Bias corresponding to the partial layer may be selected from the front to the back or from the back to the front according to the above-mentioned order of ordering the layers, or the Bias corresponding to the partial layer may be expanded according to a preset priority, for example, the Bias corresponding to the layer with a great influence on the precision may be preferentially expanded, or the Bias corresponding to the layer with a great Bias value corresponding to a single calculation integrated unit may be expanded after preliminary mapping, and specifically selected according to the actual application requirement.
In summary, the embodiment of the invention provides a method for automatically mapping the neural network weight to the integrated memory chip, which can be integrated into a tool chain of the integrated memory chip design, thereby providing convenience for the integrated memory chip user.
By expanding the arrangement of the Bias, the Bias value stored on a single memory-computing cell is reduced. On the one hand the rows of the Bias array are fully used, which improves resource utilization and reduces idle resources; on the other hand, a smaller Bias value on a single cell means a smaller conductance, less current noise and higher operation precision. In addition, after the parameters of each layer of the neural network are sorted, they are arranged in a serpentine order, so the layers complement one another effectively and the utilization of the memory-computing cell array is increased; under the same operation scale the required array can be greatly reduced, which suits the demand for chip miniaturization.
The embodiment of the invention also provides a memory integrated chip, which comprises a memory integrated unit array for executing the operation of the neural network, wherein the memory integrated unit array comprises a main array and a Bias array, wherein the main array is mapped with a weight matrix corresponding to each layer of the neural network;
The weight matrix corresponding to each layer is ordered based on the minimum number of the Bias rows and the number of columns of the weight matrix, and is arranged in a serpentine shape on the main array according to the ordering result, and the Bias is arranged to a position corresponding to the arrangement position column of the corresponding weight matrix in the Bias array.
In an alternative embodiment, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and weight matrixes corresponding to all layers are arranged on the storage and calculation integrated unit blocks in the main array;
and sequentially polling the storage and calculation integrated unit blocks in the main array according to a serpentine sequence aiming at the weight matrix corresponding to each layer of neural network so as to find the corresponding arrangement positions.
In an alternative embodiment, the principle of ordering is that mapping ordering is performed on each layer according to the Bias minimum line number corresponding to each layer, and mapping ordering is performed on the layers with the same Bias minimum line number according to the column number of the weight matrix.
In an alternative embodiment, the Bias arrangement manner expands the number of lines occupied by each Bias based on the minimum number of lines of the Bias and the idle condition of the Bias array on the basis of the positions corresponding to the arrangement position columns of the corresponding weight matrix in the Bias array.
The memory integrated chip, the weight matrix and the corresponding Bias arrangement mode provided by the embodiment of the invention are generated according to the neural network mapping method.
It should be noted that the integrated memory chip provided by the embodiment of the invention can be applied to various electronic devices, such as smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, personal Digital Assistants (PDAs), vehicle-mounted devices, intelligent wearable devices, toys, intelligent home control devices, pipeline device controllers and the like. Wherein, intelligent wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
Based on the same inventive concept, the embodiment of the present application also provides a neural network mapping device for a memory integrated chip, which can be used to implement the method described in the above embodiment, as described in the following embodiment. The principle of solving the problem by the neural network mapping device for the integrated memory chip is similar to that of the above method, so that the implementation of the neural network mapping device for the integrated memory chip can be referred to the implementation of the above method, and the repetition is omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a block diagram of a neural network mapping device for a memory chip in an embodiment of the present invention. The neural network mapping device for the integrated memory chip comprises a sequencing module 10, a weight arrangement module 20 and a bias arrangement module 30.
The sequencing module performs mapping sequencing on each layer according to the Bias minimum line number and the weight matrix corresponding to each layer of the neural network to be mapped;
The weight arrangement module sequentially arranges weight matrixes corresponding to all layers into a main array of the integrated memory chip according to the mapping and sorting result;
The Bias arrangement module arranges the corresponding Bias to the position corresponding to the arrangement position column in the Bias array of the integrated memory chip according to the arrangement position of the weight matrix and the minimum number of Bias rows;
wherein, when the weight matrixes corresponding to the layers are sequentially arranged, the weight matrixes are arranged according to a snake-shaped sequence.
The apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the program, implements the steps of the neural network mapping method for a memory integrated chip described above.
Referring now to fig. 8, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application is shown.
As shown in fig. 8, the electronic apparatus 600 includes a central processing unit (CPU) 601, which can perform various appropriate operations and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the apparatus 600. The CPU 601, ROM 602 and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Connected to the I/O interface 605 are an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, as well as a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the neural network mapping method for a memory and calculation integrated chip described above.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being functionally divided into various units. Of course, the functions of the units may be implemented in one or more pieces of software and/or hardware when implementing the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the relevant description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the application shall be included in the scope of the claims of the present application.

Claims (20)

Translated from Chinese
1. A neural network mapping method for a storage-computing integrated chip, characterized by comprising: mapping and sorting the layers according to the minimum number of Bias rows and the number of columns of the weight matrix corresponding to each layer of the neural network to be mapped; according to the mapping and sorting result, arranging the weight matrices corresponding to the layers in turn into the main array of the storage-computing integrated chip, and, according to the arrangement position of the weight matrix and the minimum number of Bias rows, arranging the corresponding Bias at the position in the Bias array of the storage-computing integrated chip corresponding to the columns of the arrangement position, wherein the arrangement position of the Bias is aligned with the columns of the arrangement position of the corresponding weight matrix, and the arrangement in the row direction is performed according to the minimum number of Bias rows; wherein, when the weight matrices corresponding to the layers are arranged in turn, they are arranged in a serpentine order.
2. The neural network mapping method for a storage-computing integrated chip according to claim 1, characterized in that the main array comprises a plurality of storage-computing integrated unit blocks distributed in an array, and the weight matrices corresponding to the layers are arranged on the storage-computing integrated unit blocks in the main array; arranging the weight matrices corresponding to the layers in turn in a serpentine order comprises: for the weight matrix corresponding to each layer of the neural network, polling the storage-computing integrated unit blocks in the main array in a serpentine order to find the arrangement position.
3. The neural network mapping method for a storage-computing integrated chip according to claim 2, characterized in that mapping and sorting the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped comprises: mapping and sorting the layers according to the minimum number of Bias rows corresponding to each layer; and mapping and sorting the layers having the same minimum number of Bias rows according to the number of columns of the weight matrix.
4. The neural network mapping method for a storage-computing integrated chip according to claim 2, characterized by further comprising: writing the weight matrices and Biases of the layers of the neural network to be mapped into the storage-computing integrated chip according to the arrangement result.
5. The neural network mapping method for a storage-computing integrated chip according to claim 2, characterized in that polling the storage-computing integrated unit blocks in the main array in a serpentine order to find the arrangement position comprises: polling the storage-computing integrated unit blocks in the main array in a serpentine order to determine whether there is a position satisfying the arrangement condition of the current layer; if so, arranging the weight matrix of the current layer at the position satisfying the arrangement condition of the current layer; if not, continuing to poll the next storage-computing integrated unit block until a position satisfying the arrangement condition is found; wherein the arrangement condition is: being side by side with the weight matrices already arranged on the current storage-computing integrated unit block in the current arrangement cycle and being able to accommodate the weight matrix of the current layer.
6. The neural network mapping method for a storage-computing integrated chip according to claim 5, characterized in that, when a weight matrix is arranged, the arrangement starts from the column next to the non-idle columns.
7. The neural network mapping method for a storage-computing integrated chip according to claim 5, characterized in that polling the storage-computing integrated unit blocks in the main array in a serpentine order to find the arrangement position further comprises: if no position satisfying the arrangement condition has been found after all the storage-computing integrated unit blocks have been polled in the serpentine order, returning to the first storage-computing integrated unit block and entering the next arrangement cycle: determining whether the idle position of the first storage-computing integrated unit block can accommodate the weight matrix of the current layer; if so, arranging the weight matrix of the current layer at the idle position of the first storage-computing integrated unit block; if not, polling the storage-computing integrated unit blocks in turn in the serpentine order until a storage-computing integrated unit block capable of accommodating the weight matrix is found, and arranging the weight matrix of the current layer at the idle position of that storage-computing integrated unit block; wherein, when the weight matrix is arranged, the arrangement starts from the row next to the non-idle rows.
8. The neural network mapping method for a storage-computing integrated chip according to claim 1, characterized in that, when the corresponding Bias is arranged at the position in the Bias array of the storage-computing integrated chip corresponding to the columns of the arrangement position, the Bias is arranged in the row next to the non-idle rows at the position corresponding to the columns.
9. The neural network mapping method for a storage-computing integrated chip according to claim 1, characterized by further comprising: expanding the arrangement of the Biases according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
10. The neural network mapping method for a storage-computing integrated chip according to claim 9, characterized in that expanding the arrangement of the Biases according to the Bias arrangement result and the idle condition of the Bias array comprises: determining, according to the number of idle rows of the Bias array, whether the number of rows occupied by all the Biases can be expanded by a multiple; if so, expanding the number of rows occupied by all the Biases by the multiple; if not, selecting Biases for expansion according to a preset rule.
11. The neural network mapping method for a storage-computing integrated chip according to any one of claims 1 to 9, characterized by further comprising: dividing the storage-computing integrated unit array of the storage-computing integrated chip into a main array and a Bias array; and dividing the main array into a plurality of storage-computing integrated unit blocks.
12. The neural network mapping method for a storage-computing integrated chip according to any one of claims 1 to 9, characterized by further comprising: obtaining parameters of the neural network to be mapped and parameters of the target storage-computing integrated chip, the parameters of the neural network to be mapped including the weight matrix and the Bias corresponding to each layer; and obtaining the minimum number of Bias rows corresponding to each layer according to the Bias of each layer and the parameters of the storage-computing integrated chip.
13. A storage-computing integrated chip, characterized by comprising: a storage-computing integrated unit array for performing neural network operations, the storage-computing integrated unit array comprising a main array and a Bias array, the main array being mapped with the weight matrix corresponding to each layer of the neural network, and the Bias array being mapped with the Bias corresponding to each layer of the neural network; the weight matrices corresponding to the layers are sorted based on the minimum number of Bias rows and the number of columns of the weight matrix and are arranged on the main array in a serpentine manner according to the sorting result; the Bias is arranged at the position in the Bias array corresponding to the columns of the arrangement position of the corresponding weight matrix, wherein the arrangement position of the Bias is aligned with the columns of the arrangement position of the corresponding weight matrix, and the arrangement in the row direction is performed according to the minimum number of Bias rows.
14. The storage-computing integrated chip according to claim 13, characterized in that the main array comprises a plurality of storage-computing integrated unit blocks distributed in an array, and the weight matrices corresponding to the layers are arranged on the storage-computing integrated unit blocks in the main array; wherein, for the weight matrix corresponding to each layer of the neural network, the storage-computing integrated unit blocks in the main array are polled in a serpentine order to find the corresponding arrangement position.
15. The storage-computing integrated chip according to claim 13, characterized in that the sorting principle is: mapping and sorting the layers according to the minimum number of Bias rows corresponding to each layer; and mapping and sorting the layers having the same minimum number of Bias rows according to the number of columns of the weight matrix.
16. The storage-computing integrated chip according to claim 13, characterized in that, in addition to being arranged at the position in the Bias array corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement further expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
17. A storage-computing integrated chip, comprising: a storage-computing integrated unit array for performing neural network operations, the storage-computing integrated unit array comprising a main array and a Bias array, the main array being mapped with the weight matrix corresponding to each layer of the neural network, and the Bias array being mapped with the Bias corresponding to each layer of the neural network; wherein the arrangement of the weight matrices and the corresponding Biases is generated according to the neural network mapping method of any one of claims 1 to 12.
18. A neural network mapping device for a storage-computing integrated chip, characterized by comprising: a sorting module, which maps and sorts the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped; a weight arrangement module, which arranges the weight matrices corresponding to the layers in turn into the main array of the storage-computing integrated chip according to the mapping and sorting result; and a bias arrangement module, which, according to the arrangement position of the weight matrix and the minimum number of Bias rows, arranges the corresponding Bias at the position in the Bias array of the storage-computing integrated chip corresponding to the columns of the arrangement position; wherein, when the weight matrices corresponding to the layers are arranged in turn, they are arranged in a serpentine order.
19. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the neural network mapping method according to any one of claims 1 to 12.
20. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the steps of the neural network mapping method according to any one of claims 1 to 12 are implemented.
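As an illustration of the Bias-row expansion described in claims 9, 10 and 16 above, a minimal Python sketch follows. The doubling policy, the "largest Bias first" fallback rule, and all names are assumptions made for the example; the sketch also simplifies the Bias array to a single shared row budget and ignores the column alignment of each Bias, so it should be read as a rough illustration of the idea rather than the claimed method.

    # Illustrative sketch only -- policy, names and the single row budget are assumptions.
    from typing import Dict


    def expand_bias_rows(bias_rows: Dict[str, int], total_bias_rows: int) -> Dict[str, int]:
        """Spread each layer's Bias over more rows when the Bias array has idle rows,
        which lowers the value stored per cell and hence the current noise."""
        used = sum(bias_rows.values())
        free = total_bias_rows - used
        if free < 0:
            raise ValueError("Bias arrangement exceeds the Bias array")
        if free >= used:
            # Enough idle rows: double the rows occupied by every Bias.
            return {name: rows * 2 for name, rows in bias_rows.items()}
        # Otherwise expand selected Biases by a preset rule (here: largest first).
        expanded = dict(bias_rows)
        for name in sorted(expanded, key=expanded.get, reverse=True):
            if free >= expanded[name]:
                free -= expanded[name]
                expanded[name] *= 2
        return expanded


    if __name__ == "__main__":
        print(expand_bias_rows({"fc1": 2, "fc2": 2, "fc3": 4}, total_bias_rows=12))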
CN202111184060.XA | 2021-10-11 | 2021-10-11 | Neural network mapping method, device and equipment for memory and calculation integrated chip | Active | CN113988277B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111184060.XA | CN113988277B (en) | 2021-10-11 | 2021-10-11 | Neural network mapping method, device and equipment for memory and calculation integrated chip

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111184060.XA | CN113988277B (en) | 2021-10-11 | 2021-10-11 | Neural network mapping method, device and equipment for memory and calculation integrated chip

Publications (2)

Publication Number | Publication Date
CN113988277A (en) | 2022-01-28
CN113988277B (en) | 2025-07-25

Family

ID=79738159

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111184060.XA | Active | CN113988277B (en) | 2021-10-11 | 2021-10-11 | Neural network mapping method, device and equipment for memory and calculation integrated chip

Country Status (1)

Country | Link
CN (1) | CN113988277B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114723024B (en)* | 2022-03-08 | 2025-05-02 | Hangzhou Zhicun Computing Technology Co., Ltd. | A linear programming-based neural network mapping method for storage-computing integrated chip
CN114997388B (en)* | 2022-06-30 | 2024-05-07 | Hangzhou Zhicun Computing Technology Co., Ltd. | Neural network bias processing method based on linear programming for memory and calculation integrated chip

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111241028A (en)* | 2018-11-28 | 2020-06-05 | Beijing Witinmem Technology Co., Ltd. | A digital-analog hybrid storage and computing integrated chip and computing device
WO2021163866A1 (en)* | 2020-02-18 | 2021-08-26 | Hangzhou Zhicun Intelligent Technology Co., Ltd. | Neural network weight matrix adjustment method, writing control method, and related device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9971965B2 (en)* | 2015-03-18 | 2018-05-15 | International Business Machines Corporation | Implementing a neural network algorithm on a neurosynaptic substrate based on metadata associated with the neural network algorithm
EP3735658A1 (en)* | 2018-07-12 | 2020-11-11 | Huawei Technologies Co. Ltd. | Generating a compressed representation of a neural network with proficient inference speed and power consumption
CN113344170B (en)* | 2020-02-18 | 2023-04-25 | Hangzhou Zhicun Intelligent Technology Co., Ltd. | Neural network weight matrix adjustment method, write-in control method and related device
CN112231631A (en)* | 2020-10-29 | 2021-01-15 | Beijing Witinmem Technology Co., Ltd. | Assembly line control method for parallel work of storage and calculation integrated chip

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111241028A (en)* | 2018-11-28 | 2020-06-05 | Beijing Witinmem Technology Co., Ltd. | A digital-analog hybrid storage and computing integrated chip and computing device
WO2021163866A1 (en)* | 2020-02-18 | 2021-08-26 | Hangzhou Zhicun Intelligent Technology Co., Ltd. | Neural network weight matrix adjustment method, writing control method, and related device

Also Published As

Publication number | Publication date
CN113988277A (en) | 2022-01-28

Similar Documents

Publication | Publication Date | Title
JP7430744B2 (en) Improving machine learning models to improve locality
EP4036724A1 (en)Method for splitting neural network model by using multi-core processor, and related product
CN107437110B (en) Block convolution optimization method and device for convolutional neural network
CN114723024B (en) A linear programming-based neural network mapping method for storage-computing integrated chip
CN113703775A (en)Compiling method, device, equipment and storage medium
CN113988277B (en)Neural network mapping method, device and equipment for memory and calculation integrated chip
CN117494816B (en)Model reasoning method, device, equipment and medium based on computing unit deployment
CN110826708A (en) A method for splitting a neural network model with a multi-core processor and related products
CN114138231B (en) Method, circuit and SOC for performing matrix multiplication
CN109582911A (en)For carrying out the computing device of convolution and carrying out the calculation method of convolution
CN106295670A (en)Data processing method and data processing equipment
CN114626552A (en) Method and device for segmentation of machine learning model
CN107391564B (en)Data conversion method and device and electronic equipment
CN114968182B (en) Operator splitting method, control method and device for storage and computing integrated chip
CN112434817B (en) Method, device and computer storage medium for constructing communication algorithm database
CN111645687A (en)Lane changing strategy determining method, device and storage medium
CN114676132B (en) Data table association method, device, storage medium and electronic device
CN115600658A (en)Sampling method and sampling acceleration device applied to graph neural network training
CN107273207A (en)A kind of related data storage method based on hypergraph partitioning algorithm
CN115774577A (en)Spark GraphX parameter optimization method and device, electronic equipment and storage medium
CN112750074A (en)Small sample image feature enhancement method and system and image classification method and system
CN115329925A (en) Neural network structure determination method and device and related products
CN111260036A (en) A kind of neural network acceleration method and device
CN116755714B (en)Method, device, equipment and storage medium for operating deep neural network model
CN115905569B (en) A node-adaptive small-sample knowledge graph completion method and device

Legal Events

Date | Code | Title | Description
PB01 | Publication
PB01 | Publication
SE01 | Entry into force of request for substantive examination
SE01 | Entry into force of request for substantive examination
CB02 | Change of applicant information
CB02 | Change of applicant information

Country or region after: China

Address after: Room 213-175, 2nd Floor, Building 1, No. 180 Kecheng Street, Qiaosi Street, Linping District, Hangzhou City, Zhejiang Province, 311100

Applicant after: Hangzhou Zhicun Computing Technology Co., Ltd.

Address before: 100080, 15/F, West Block, Brilliant Times Building, Haidian District, Beijing

Applicant before: BEIJING WITINMEM TECHNOLOGY Co., Ltd.

Country or region before: China

GR01 | Patent grant
GR01 | Patent grant
