
Neural network mapping method, device and equipment for storage and computation integrated chip

Info

Publication number: CN113988277A
Authority: CN (China)
Prior art keywords: bias, layer, array, neural network, storage
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202111184060.XA
Other languages: Chinese (zh)
Other versions: CN113988277B (en)
Inventors: 康卓栋, 张爱飞, 陆振亮
Current Assignee: Beijing Witinmem Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Witinmem Technology Co., Ltd.

Events: application filed by Beijing Witinmem Technology Co., Ltd.; priority to CN202111184060.XA; publication of CN113988277A; application granted; publication of CN113988277B; status active; anticipated expiration.


Abstract

Embodiments of the present invention provide a neural network mapping method, apparatus, and device for a storage and computation integrated chip. The method includes: sorting the layers of the neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer; according to the sorting result, sequentially arranging the weight matrix of each layer into the main array of the storage and computation integrated chip, and, according to the arrangement position of the weight matrix and the minimum number of Bias rows, arranging the corresponding Bias at the position in the Bias array of the chip corresponding to the columns of that arrangement position. The weight matrices of the layers are arranged in a serpentine order, which makes effective use of the storage and computation integrated cells. In addition, the number of rows occupied by each Bias is expanded based on the minimum number of Bias rows and the idle capacity of the Bias array, which reduces the bias value carried by any single storage and computation integrated cell, reduces current noise, and improves operation accuracy.

Description

Neural network mapping method, device and equipment for storage and computation integrated chip
Technical Field
The invention relates to the technical field of semiconductors, in particular to a neural network mapping method, device and equipment for a storage and computation integrated chip.
Background
In recent years, with the continuous development of algorithms, computing power, and data scale, machine learning has shown strong advantages in solving many problems. Among its techniques, the artificial neural network has drawn attention for its outstanding performance in fields such as image recognition, object detection, and semantic segmentation. However, as neural networks grow in scale, the traditional approach of processing neural network algorithms on a CPU + GPU architecture is hitting bottlenecks in speed and power consumption. The root cause is the separation of storage and computation in the von Neumann architecture: data-centric neural network algorithms impose excessive data-transfer overhead on the computing system, reducing speed while increasing power consumption.
Storage and computation integrated (compute-in-memory) technology addresses the problems caused by this separation. The weights of a neural network are stored as the conductances of flash memory array cells in a storage and computation integrated (NPU) chip; a data source expressed as voltages is applied to the array, and by Ohm's law the current output by the array is the product of voltage and conductance. This completes the matrix multiply-add of the data source with the network weights, performing what is essentially analog rather than traditional digital computation.
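As an illustrative sketch (not part of the patent text; the array sizes and values are invented for the example), the multiply-add performed by such an array can be modeled as a matrix-vector product in which conductances play the role of weights:

import numpy as np

# Model of the analog multiply-add performed by a storage and computation array.
# Each cell stores a weight as a conductance G[i, j]; an input voltage V[i] is
# applied to row i, and by Ohm's law plus Kirchhoff's current law the current
# collected on column j is sum_i V[i] * G[i, j].

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(128, 1024))  # conductances (weights): 128 rows, 1024 columns
V = rng.uniform(0.0, 0.5, size=128)          # input voltages, one per row

I = V @ G                                    # column currents = the matrix multiply-add result
print(I.shape)                               # (1024,) -- one output current per column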
Tool-chain design is an important link in the overall flow from design to production of a storage and computation integrated chip. Within such a tool chain, automatically mapping the weight parameters of a given neural network onto the chip array as required is a key technology. When a trained neural network is mapped onto the storage and calculation integrated cell array by simply placing the weights and biases layer by layer in network order, two problems arise. First, the storage and calculation integrated cells are not used effectively, which inflates the required size of the cell array. Second, the bias is mapped directly onto the array, and the larger the bias value, the larger the corresponding cell conductance; at the same voltage the cell current, and with it the noise, is therefore larger, degrading operation accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a neural network mapping method, device, and equipment for a storage and computation integrated chip, which can at least partially solve the problems in the prior art.
To this end, the invention adopts the following technical solutions:
In a first aspect, a neural network mapping method for a storage and computation integrated chip is provided, including:
sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix;
according to the mapping sorting result, sequentially arranging the weight matrices corresponding to the layers into a main array of the storage and computation integrated chip, and arranging the corresponding Bias at the position in the Bias array of the chip corresponding to the columns of the arrangement position, according to the arrangement position of the weight matrix and the minimum number of Bias rows;
and when the weight matrixes corresponding to the layers are arranged in sequence, the weight matrixes are arranged in a snake-shaped sequence.
Furthermore, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged on the storage and calculation integrated unit blocks in the main array;
when sequentially arranging the weight matrices corresponding to the layers in a serpentine order, the method includes: sequentially polling the storage and calculation integrated unit blocks in the main array in a serpentine order for the weight matrix corresponding to each layer of the neural network, so as to find the arrangement position.
Further, the mapping and sorting of the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped includes:
sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer;
sorting the layers whose Bias has the same minimum number of rows according to the number of columns of the weight matrix.
Further, the neural network mapping method for the storage and computation integrated chip further comprises:
writing the weight matrix and the Bias of each layer of the neural network to be mapped into the storage and computation integrated chip according to the arrangement result.
Further, the sequentially polling of the storage and calculation integrated unit blocks in the main array in a serpentine order to find the arrangement position includes:
sequentially polling whether the storage and calculation integrated unit blocks in the main array have positions meeting the arrangement condition of the current layer according to a snake-shaped sequence;
if so, arranging the weight matrix of the current layer to a position meeting the arrangement condition of the current layer;
if not, continuing polling the next storage and calculation integrated unit block until finding the position meeting the arrangement condition;
wherein the arrangement condition is: a position on the current storage and calculation integrated unit block that is side by side with the weight matrices already arranged in the current arrangement period and that can accommodate the weight matrix of the current layer.
Further, the weight matrix is arranged starting from the column next to the last non-idle column.
Further, the sequentially polling of the storage and calculation integrated unit blocks in the main array in a serpentine order to find the arrangement position further includes:
if the position meeting the arrangement condition is not found after all the storage and calculation integrated unit blocks are polled according to the snake-shaped sequence, returning to the first storage and calculation integrated unit block, and entering the next arrangement period:
judging whether the idle position of the first storage and calculation integrated unit block can accommodate the weight matrix of the current layer;
if so, arranging the weight matrix of the current layer at the idle position of the first storage and calculation integrated unit block;
if not, sequentially polling all the storage and calculation integrated unit blocks according to the snake-shaped sequence until the storage and calculation integrated unit blocks capable of containing the weight matrix are found, and arranging the weight matrix of the current layer at the idle position of the storage and calculation integrated unit blocks;
wherein the weight matrix is arranged starting from the row next to the last non-idle row.
Further, when the corresponding Bias is arranged at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, the Bias is arranged starting from the row next to the last non-idle row at that column-aligned position.
Further, the neural network mapping method for the storage-computation-integrated chip further comprises the following steps:
and expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
Further, the expanding the configuration of the Bias according to the Bias configuration result and the idle condition of the Bias array includes:
judging whether the number of rows occupied by all the Bias can be expanded in multiples according to the number of idle rows of the Bias array;
if yes, multiplying the number of rows occupied by all the Bias;
if not, selecting the Bias according to a preset rule for expansion.
Further, the neural network mapping method for the storage-computation-integrated chip further comprises the following steps:
dividing a storage and calculation integrated unit array of the storage and calculation integrated chip into a main array and a Bias array;
the main array is divided into a plurality of storage and calculation integrated unit blocks.
Further, the neural network mapping method for the storage-computation-integrated chip further comprises the following steps:
acquiring parameters of the neural network to be mapped and parameters of the target storage and computation integrated chip, wherein the parameters of the neural network to be mapped comprise the weight matrix and Bias corresponding to each layer;
and obtaining the minimum number of rows of the Bias corresponding to each layer according to the Bias of each layer and the parameters of the storage and computation integrated chip.
In a second aspect, a storage and computation integrated chip is provided, comprising: a storage and calculation integrated cell array for performing neural network operations, the cell array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array, and the Bias corresponding to each layer of the neural network is mapped in the Bias array;
and the weight matrixes corresponding to the layers are arranged in a snake shape on the main array based on the minimum row number of the Bias and the column number of the weight matrixes, and the Bias is arranged to the position, corresponding to the arrangement position column of the corresponding weight matrix, in the Bias array.
Further, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array mode, and the weight matrix corresponding to each layer is arranged on the storage and calculation integrated unit blocks in the main array;
and for the weight matrix corresponding to each layer of the neural network, the storage and calculation integrated unit blocks in the main array are polled sequentially in a serpentine order to find the corresponding arrangement position.
Further, the principle of the sorting is as follows:
mapping and sequencing each layer according to the minimum number of rows of the Bias corresponding to each layer;
and carrying out mapping sorting on the layers with the same minimum row number of the Bias according to the column number of the weight matrix.
Further, on the basis of the position in the Bias array corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
In a third aspect, a storage and computation integrated chip is provided, comprising: a storage and calculation integrated cell array for performing neural network operations, the cell array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array, and the Bias corresponding to each layer of the neural network is mapped in the Bias array;
the weight matrix and the corresponding Bias arrangement mode are generated according to the neural network mapping method.
In a fourth aspect, a neural network mapping apparatus for a storage and computation integrated chip is provided, comprising:
the sorting module is used for sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix;
the arrangement module is used for sequentially arranging the weight matrixes corresponding to the layers into the main array of the storage and computation integrated chip according to the mapping sequencing result, and arranging the corresponding Bias to the position, corresponding to the arrangement position column, in the Bias array of the storage and computation integrated chip according to the arrangement position of the weight matrixes and the minimum number of rows of the Bias;
and when the weight matrixes corresponding to the layers are arranged in sequence, the weight matrixes are arranged in a snake-shaped sequence.
In a fifth aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the neural network mapping method described above.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the neural network mapping method described above.
The embodiment of the invention provides a neural network mapping method, apparatus, and device for a storage and computation integrated chip, wherein the method includes: sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix; according to the mapping sorting result, sequentially arranging the weight matrices corresponding to the layers into the main array of the storage and computation integrated chip, and arranging the corresponding Bias at the column-aligned position in the Bias array of the chip according to the arrangement position of the weight matrix and the minimum number of Bias rows; when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order. By arranging the layers in a serpentine pattern after sorting, the storage and calculation integrated cells are used effectively, and the size of the cell array can be reduced at the same operation scale.
In addition, in the embodiment of the invention, the number of rows occupied by each Bias is expanded based on the minimum number of rows of the Bias and the idle condition of the Bias array, so that the offset value on a single storage and calculation integrated unit is reduced, the current noise is reduced, and the calculation accuracy is improved.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. In the drawings:
FIG. 1 shows a flow diagram of a neural network mapping method for a storage and computation integrated chip in an embodiment of the invention;
FIG. 2 illustrates the partitioning of a storage and calculation integrated cell array in an embodiment of the present invention;
FIG. 3 shows the specific steps of step S100 in an embodiment of the present invention;
FIG. 4 illustrates the process of ordering the layers of a neural network in an embodiment of the present invention;
FIG. 5 illustrates the arrangement result of the weight matrices and the corresponding Bias in an embodiment of the present invention;
FIG. 6 illustrates the arrangement expansion process of the Bias in an embodiment of the present invention;
FIG. 7 is a block diagram of a neural network mapping apparatus for a storage and computation integrated chip according to an embodiment of the present invention;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 shows a flow diagram of a neural network mapping method for a storage and computation integrated chip in an embodiment of the invention; as shown in FIG. 1, the method may include the following:
step S100: mapping and sequencing each layer according to the minimum line number of the Bias corresponding to each layer of the neural network to be mapped and the weight matrix;
and aiming at the trained specific neural network, acquiring the weight matrix parameters, the Bias size and the minimum number of lines occupied on the chip of each layer. Firstly, sorting each layer of neural network according to the minimum row number of the Bias and the column number of the weight matrix, and taking the sorting as the mapping arrangement sequence.
Step S200: according to the mapping sorting result, sequentially arranging the weight matrixes corresponding to the layers into a main array of a storage and computation integrated chip;
when the weight matrices corresponding to the layers are sequentially arranged according to the order obtained in step S100, the weight matrices are arranged in a serpentine order.
Specifically, consider the scenario in which the storage and computation integrated chip takes its inputs on cell rows and produces its outputs on cell columns (the drawings in the embodiments of the present invention are based on this premise; those skilled in the art will understand that rows and columns of the cells are relative concepts, and in a specific application the input may equally be by columns and the output by rows, with the same principle, not repeated here). When the weight matrices are arranged, they are placed block by block over the main array, which is divided into a plurality of storage and calculation integrated unit blocks laid out in an array. The serpentine order over the block rows (rows of blocks, not rows of cells) may run: first block row in the X direction, second in the -X direction, third in the X direction, fourth in the -X direction, and so on; or first block row in the -X direction, second in the X direction, third in the -X direction, fourth in the X direction, and so on; or, starting from the last block row: last row in the -X direction, second-to-last in the X direction, third-to-last in the -X direction, fourth-to-last in the X direction, and so on.
The storage and calculation integrated cell may be, for example, a flash memory cell.
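A minimal sketch of such a serpentine polling order over a grid of blocks (the function name and grid size are illustrative, not taken from the patent):

def serpentine_order(block_rows: int, block_cols: int):
    """Yield block coordinates (row, col) in serpentine order: even block
    rows in the X direction, odd block rows in the -X direction."""
    for r in range(block_rows):
        cols = range(block_cols) if r % 2 == 0 else range(block_cols - 1, -1, -1)
        for c in cols:
            yield (r, c)

# For a main array divided into 2 x 4 blocks, as in the example of FIG. 2:
print(list(serpentine_order(2, 4)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2), (1, 1), (1, 0)]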
Step S300: arranging the corresponding Bias at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, according to the arrangement position of the weight matrix and the minimum number of Bias rows;
Specifically, due to the particular way the storage and computation integrated chip computes, the weight matrix and its corresponding Bias need to be column-aligned; the arrangement position of the Bias is therefore column-aligned with the arrangement position of the corresponding weight matrix, while in the row direction the Bias is arranged according to its minimum number of rows.
An arrangement scheme is generated through steps S100 to S300, and the parameters of each layer of the neural network are written into the storage and calculation integrated cell array of the chip by a compiling tool according to this scheme. In the inference stage, when a certain layer of the neural network is executed according to the arrangement scheme and control requirements, a row-column decoder gates the rows and columns holding the weight matrix and the Bias of that layer; the input signals of the layer are applied to the rows corresponding to the weight matrix and undergo the matrix multiply-add with the weight matrix, the result is then superposed with the corresponding Bias, and the computation result of the layer is obtained on the corresponding columns.
By adopting the above technical solution, the parameters of the layers of the neural network are sorted and then arranged in a serpentine pattern, so the layers complement one another effectively. For example, if the layers are sorted by Bias row counts 9, 8, 7, 6, 5, 4, 3, 2, 1, the serpentine arrangement pairs smaller row counts with larger ones, which reduces the number of rows occupied in the Bias array and leaves more rows available for expansion. This increases the utilization efficiency of the storage and calculation integrated cell array; at the same operation scale, the required cell array can be greatly reduced, meeting the demand for chip miniaturization.
In an optional embodiment, the neural network mapping method for a storage and computation integrated chip may further include: dividing the storage and calculation integrated cell array of the chip into a main array and a Bias array, and dividing the main array into a plurality of storage and calculation integrated unit blocks. The division can be made with reference to the usage scenario and the scale of the corresponding neural networks, so that performance is guaranteed while resource utilization is effectively improved.
Specifically, as shown in fig. 2 (a), the actual physical architecture of the chip consists of a main array and a Bias array. The applicant has found in practice that an excessively large current in the analog computation can significantly affect the result, so in the embodiment of the present invention the array may be logically partitioned: as shown in fig. 2 (b), the main array may be divided into 2 × 4 blocks, for example only; in practical applications the array may be partitioned as needed, and the Bias array may or may not be partitioned. If the chip is small, the main array may be left unpartitioned, but in most practical scenarios partitioning is necessary because networks are generally large. It should be added that the partitioning is performed according to the actual performance of the chip, and the blocks may be of equal or unequal size.
In an optional embodiment, the neural network mapping method may further include: writing the weight matrix and the Bias of each layer of the neural network to be mapped into the storage and computation integrated chip according to the arrangement result.
Specifically, the mapping method is executed in a tool chain and may be understood as a program running on a terminal device, a server, or a chip-burning apparatus. After the arrangement scheme is generated by the mapping method, the weight matrices and the Bias are written onto the storage and computation integrated chip according to the scheme. The chip can then be installed on the circuit board of a device and used for inference to realize neural network operations. For example, the chip may be installed in a toy for voice recognition, in which case the neural network parameters written into the chip are those of a voice recognition network; equally, the chip may be installed in a face recognition device, with the parameters of an image recognition network written into it. These are, of course, only a few example application scenarios.
In an alternative embodiment, referring to fig. 3, this step S100 may include the following:
step S110: mapping and sequencing each layer according to the minimum number of rows of the Bias corresponding to each layer;
step S120: and carrying out mapping sorting on the layers with the same minimum row number of the Bias according to the column number of the weight matrix.
Specifically, fig. 4 (a) shows the parameters of a neural network: the upper row of 5 matrices are the weight matrices of 5 neural network layers, and the lower row of 5 matrices are the corresponding Bias. Taking the first weight matrix as an example, 1024 × 128 denotes its scale, namely 1024 columns and 128 rows; the corresponding Bias of 1024 × 2 denotes 1024 columns with a minimum of 2 occupied rows. The number of columns of a layer's weight matrix equals the number of columns of its Bias, which is determined by the operational characteristics of the storage and computation integrated chip.
When the layers of the neural network are sorted, they can be sorted according to the minimum number of Bias rows corresponding to each layer. In fig. 4 (b), the sorting runs from the largest to the smallest minimum number of Bias rows; in practical applications, sorting from smallest to largest may also be used, and the principle is the same.
In this example, two Bias have the same minimum number of rows, namely 1024 × 2 and 768 × 2. In such a case, the layers whose Bias has the same minimum number of rows are sorted from large to small by the number of columns of the weight matrix; the final sorting result is shown in fig. 4 (b).
It is worth noting that, for the trained neural network, the embodiment of the present invention obtains the size of each layer's Bias and the minimum number of rows it occupies on the chip, first sorts the layers from large to small by the minimum number of rows occupied by each layer's Bias, and then locally sorts layers whose Bias occupies the same number of rows from large to small by the number of columns. The first priority is to reduce the value carried by each physical Bias row during network mapping, because this improves the chip's computation accuracy: the input received by the embodiment contains the minimum number of rows occupied by each layer's Bias, the Bias rows on the array generally still have unmapped space after mapping by that number of rows, and if m rows are used to carry a Bias value originally designed to be carried by 1 row, the value on each row becomes 1/m of the original. Second, in applications the number of input paths of each network layer is small and the number of output paths is large; as shown in fig. 4, a layer has m input rows and n output columns, and m is often smaller than n. Therefore, to better ensure that the whole network fits in a limited region, the X-direction sizes of all layers are sorted from large to small: with the Bias size as the first priority, the layers are placed with the X-direction size as the second priority.
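A minimal sketch of this two-key sort (the layer names and parameter values below are illustrative; only the sort keys come from the description above):

# Each layer is described by the minimum number of rows its Bias occupies and
# the number of columns of its weight matrix (the X-direction size).
layers = [
    {"name": "layer0", "bias_min_rows": 2, "weight_cols": 1024},
    {"name": "layer1", "bias_min_rows": 9, "weight_cols": 512},
    {"name": "layer2", "bias_min_rows": 2, "weight_cols": 768},
    {"name": "layer3", "bias_min_rows": 4, "weight_cols": 896},
]

# First priority: minimum Bias rows, descending; second priority (on ties):
# weight-matrix columns, descending.
order = sorted(layers, key=lambda l: (-l["bias_min_rows"], -l["weight_cols"]))
print([l["name"] for l in order])  # ['layer1', 'layer3', 'layer0', 'layer2']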
In an optional embodiment, the main array includes a plurality of storage and calculation integrated unit blocks distributed in an array (see fig. 2). The weight matrices corresponding to the layers are arranged on these blocks in sequence, in a serpentine order; specifically: for the weight matrix corresponding to each layer of the neural network, the blocks in the main array are polled sequentially in serpentine order to find the arrangement position.
Specifically, the storage and calculation integrated unit blocks in the main array are polled sequentially in serpentine order for a position satisfying the arrangement condition of the current layer, where the arrangement condition is: a position on the current block that is side by side with the weight matrices already arranged in the current arrangement period and that can accommodate the weight matrix of the current layer.
If so, arranging the weight matrix of the current layer to a position meeting the arrangement condition of the current layer;
if not, continuing polling the next storage and calculation integrated unit block until finding the position meeting the arrangement condition;
It should be noted that, to minimize idle resources, once a position satisfying the condition is found, the weight matrix is arranged starting from the column next to the last non-idle column; of course, in special applications or to reduce interference, a certain number of columns may be left as spacing, chosen according to the actual application requirements.
And if the positions meeting the arrangement condition are not found after all the storage and calculation integrated unit blocks are polled according to the snake-shaped sequence, returning to the first storage and calculation integrated unit block and entering the next arrangement period.
In a new arrangement period, it is first judged whether the idle position of the first storage and calculation integrated unit block can accommodate the weight matrix of the current layer; if so, the weight matrix of the current layer is arranged at the idle position of the first block; if not, all blocks are polled sequentially in serpentine order until a block that can accommodate the weight matrix is found, and the weight matrix of the current layer is arranged at the idle position of that block;
It should be noted that, to minimize idle resources, when a position satisfying the condition is found, the weight matrix is arranged starting from the row next to the last non-idle row; of course, in special applications or to reduce interference, a certain number of rows may be left as spacing, chosen according to the actual application requirements.
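The following sketch models this polling under simplifying assumptions (each block tracks a single placement cursor, and a new arrangement period is opened on all blocks at once); all names and data structures are illustrative, not the patent's exact layout:

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Block:
    """Simplified storage and calculation integrated unit block: within one
    arrangement period, weight matrices are placed side by side starting from
    the column next to the last non-idle column; a new period starts on the
    row next to the last non-idle row."""
    width: int                 # columns of cells in the block
    height: int                # rows of cells in the block
    cursor_col: int = 0        # next free column in the current period
    period_row: int = 0        # first row of the current period
    period_height: int = 0     # tallest matrix placed in this period

    def try_place(self, rows: int, cols: int) -> Optional[Tuple[int, int]]:
        if self.cursor_col + cols <= self.width and self.period_row + rows <= self.height:
            pos = (self.period_row, self.cursor_col)
            self.cursor_col += cols
            self.period_height = max(self.period_height, rows)
            return pos
        return None

    def new_period(self) -> None:
        self.period_row += self.period_height
        self.cursor_col = 0
        self.period_height = 0

def place_layer(blocks: Dict[Tuple[int, int], Block],
                order: List[Tuple[int, int]],
                rows: int, cols: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Poll the blocks in serpentine order; if no block has a position that
    satisfies the arrangement condition, return to the first block and enter
    the next arrangement period, then poll again."""
    for _ in range(max(b.height for b in blocks.values())):
        for coord in order:
            pos = blocks[coord].try_place(rows, cols)
            if pos is not None:
                return coord, pos
        for b in blocks.values():
            b.new_period()
    raise RuntimeError("weight matrix does not fit on the main array")

Combined with the serpentine_order generator from the earlier sketch, place_layer(blocks, list(serpentine_order(2, 2)), rows, cols) reproduces the spirit of the D0-D19 walk-through described below.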
With the above technical scheme, after the sorted weight matrices have been arranged on the main array, each corresponding Bias is arranged at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, starting from the row next to the last non-idle row at that column-aligned position, following the principle that each weight matrix is column-aligned with its corresponding Bias.
In order to make the embodiment of the present invention better understood by those skilled in the art, the mapping process is described in detail below with reference to fig. 5.
First, take a 20-layer neural network as an example, each layer of which is mapped onto the storage and calculation integrated cell array of a storage and computation integrated chip. After sorting, the layers are denoted 0-19 in order, the weight matrices are numbered D0-D19, and the Bias are numbered B0-B19. The weight matrices D0-D19 are arranged first; in this example the Bias B0-B19 are then arranged according to the weight arrangement result. During arrangement, the first row of blocks in the main array is traversed left to right, the second row right to left, the third row left to right, the fourth row right to left, and the polling then returns to the first row, following this serpentine cycle;
For example, for D0, block (0,0) is polled first; it has a position that can accommodate the weight matrix of the current layer, so D0 is arranged at (0,0). For D1, block (0,0) is polled first; there is a position satisfying the condition, so D1 is arranged at (0,0) side by side with D0. For D2, block (0,0) is polled first, but the position side by side with D0 and D1 on block (0,0) cannot accommodate D2, so block (0,1) is polled next, and D2 is arranged at (0,1). For D3, block (0,0) is polled and the position side by side with D0 and D1 cannot accommodate D3; block (0,1) is polled and the position side by side with D2 cannot accommodate D3; block (1,1) is then polled following the serpentine order, and D3 is arranged at (1,1). For D4, blocks (0,0) and (0,1) likewise cannot accommodate it, block (1,1) is polled following the serpentine order, and D4 is arranged at (1,1) side by side with D3. For D5, block (0,0) is polled first, and the position side by side with D0 and D1 can accommodate D5, so D5 is arranged at (0,0) side by side with D0 and D1. And so on: after D15 is arranged, for D16 the polling starts again from (0,0), no block has a position satisfying the condition (a position that can accommodate D16 side by side with the matrices of the current arrangement period), so the process jumps back to (0,0) and enters the next arrangement period.
In the new arrangement period, it is first determined whether the idle position of (0,0) can accommodate D16; if so, D16 is arranged at (0,0), starting from the row next to the last non-idle row, in this case the row below D1. The arrangement of D17 to D19 follows the same pattern as D1 to D5 and is not repeated here.
It is worth noting that, in an alternative embodiment, D17 first polls (0,0) for an idle position that can accommodate it, where the idle positions include the position in the rows of D0 (i.e., the rows of the previous arrangement period), then polls in turn whether the right side of D16 in the current arrangement period can accommodate D17, and so on. The arrangement principle is that, within one round of mapping, two layers placed in the same array block are adjacent in the X direction and cannot be staggered, while in the Y direction each layer is placed top to bottom wherever it fits; in this way every free slot on the array is considered.
In another alternative embodiment, D17 first polls (0,0) for an idle position to the right of D16 that can accommodate it; the positions in the rows of D0 (i.e., the rows of the previous arrangement period) are not polled, and only the positions of the current period's rows are polled. It should be noted that, on a block, a matrix ordered earlier is placed before a matrix ordered later, where "before" follows the polling order. For the first row of blocks, (0,0) and (0,1) are polled left to right, so a left position is before a right position, and within a first-row block the earlier matrix is to the left of the later one (e.g., D0 is to the left of D1); for the second row of blocks, (1,0) and (1,1) are polled right to left, so a right position is before a left position, and within a second-row block the earlier matrix is to the right of the later one (e.g., D3 is to the right of D4).
It should be noted that, during mapping, the position of each layer's weights on the main array is considered first, and then the position of each layer's Bias on the Bias array. The general principle of mapping is: starting from block (0,0), map left to right in even block rows (block rows are numbered from 0, so the 0th row counts as even) and right to left in odd block rows, proceeding top to bottom in a serpentine order overall; after one round is finished, return to block (0,0) and repeat the process until the whole neural network is mapped. (This general principle is merely an example: in practice the 0th block row may run right to left, or the serpentine may run from the bottom block row of the array to the top; but the serpentine principle must be followed, and the serpentine must snake by rows, not by columns. This holds for row input and column output; as those skilled in the art will understand, with column input and row output the situation is the reverse, which is not detailed here.) This provides a larger space for the subsequent Bias expansion and higher computation precision for the chip's results.
Several issues need attention in the main-array mapping process. Unless specifically required, adjacent placements are seamless; the purpose of seamless joining is to make full use of the array space in the early stage of mapping, because each layer must be placed in one piece, and gaps left between layers are hard to use and may make later layers impossible to place. In one round of mapping, each layer looks for a position in serpentine order starting from block (0,0), and if the current block has no sufficient position, the next block is considered. If within a round some layer cannot be placed in any block, a new round of mapping is entered. For the Bias mapping, it suffices to map sequentially from top to bottom on the Bias array according to the corresponding weight positions and the minimum numbers of occupied rows, which are given in advance. The number in parentheses after each Bias in fig. 5 indicates the number of occupied rows; for example, B1(9) indicates that B1 occupies 9 rows.
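A sketch of this Bias placement rule (column alignment with the weight matrix, filling downward from the last non-idle row); the per-column free-row bookkeeping is an assumption of this sketch, not a structure named in the patent:

def place_bias(next_free_row: list, col_start: int, col_end: int, min_rows: int) -> int:
    """Place one layer's Bias in the Bias array, column-aligned with its
    weight matrix (columns [col_start, col_end)), starting from the row next
    to the last non-idle row in those columns and occupying min_rows rows.
    next_free_row[c] tracks the next idle row of Bias-array column c."""
    start_row = max(next_free_row[c] for c in range(col_start, col_end))
    for c in range(col_start, col_end):
        next_free_row[c] = start_row + min_rows
    return start_row

# E.g. a 1024-column Bias array, placing B0 (2 rows) over columns 0..1024:
rows_free = [0] * 1024
print(place_bias(rows_free, 0, 1024, 2))  # -> 0; those columns are now free from row 2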
In an optional embodiment, the neural network mapping method for a storage and computation integrated chip may further include: acquiring the parameters of the neural network to be mapped and the parameters of the target storage and computation integrated chip, where the parameters of the neural network to be mapped comprise the weight matrix and Bias corresponding to each layer; and then obtaining the minimum number of Bias rows corresponding to each layer from each layer's Bias and the chip parameters.
Specifically, once the neural network model is determined, the weight matrix and bias values of each layer are known, and the minimum number of Bias rows of each layer is calculated from the bias values and the parameters of the target chip (the capacity of each Bias row).
The minimum number of Bias rows may also be given in advance by circuit engineers according to the precision requirement, generally taking the minimum precision requirement as the standard, or computed according to a preset rule, which is not detailed in the embodiments of the present invention.
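As a sketch of how such a preset rule might look (the per-row cell capacity parameter is an assumption of this example; in practice the criterion is set by circuit engineers from the precision requirement):

import math

def min_bias_rows(bias_values, max_per_cell: float) -> int:
    """Minimum number of Bias rows such that, when a logical bias value is
    split evenly over m rows, no single storage and calculation cell carries
    more than max_per_cell (an assumed capacity parameter)."""
    worst = max(abs(v) for v in bias_values)
    return max(1, math.ceil(worst / max_per_cell))

print(min_bias_rows([3.2, -1.5, 7.9], max_per_cell=4.0))  # -> 2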
In an optional embodiment, the neural network mapping method for a computation-integrated chip may further include: and expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
Specifically, after the arrangement is completed, the Bias array may still have idle rows. To reduce as far as possible the bias values stored in all or some of the storage and calculation integrated cells of the Bias array, the idle rows are used to expand the Bias arrangement a second time, yielding the final arrangement scheme, and the neural network parameters are then written into the chip according to that final scheme.
Because the storage and computation integrated chip essentially performs analog computation, the larger the bias value on each cell of the Bias array, the larger the noise in the final computation, and excessive noise introduced by an excessive bias can decisively affect computation accuracy. Therefore the actual number of Bias-array rows occupied by one logical Bias row is expanded as far as the array size allows: if the actual number of occupied rows is m, each row stores 1/m of the logical Bias value, which improves computation accuracy.
In an optional embodiment, expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array includes:
judging whether the number of rows occupied by all the Bias can be expanded in multiples according to the number of idle rows of the Bias array;
if yes, multiplying the number of rows occupied by all the Bias;
if not, selecting the Bias according to a preset rule for expansion.
Specifically, while each layer's weights are mapped onto the main array, the preliminary mapping of each layer's Bias is completed on the Bias array according to the pre-given minimum number of occupied rows; the mapping result is shown in fig. 6 (a). The mapped Bias rows are then expanded by the largest integer multiple possible: if the Bias array has 10 rows in total, then with 3 mapped rows each row is expanded to 3 times the original, and with 4 mapped rows each row is expanded to 2 times the original. For the arrangement shown in fig. 6 (a), a full doubling is possible, with the result shown in fig. 6 (b). After that, if idle rows remain in the Bias array, a number of mapped Bias rows equal to the number of idle rows is selected from bottom to top and each is expanded into two rows; as shown in fig. 6 (c), with 2 idle rows remaining, the third and fourth rows from the bottom are each expanded into two rows. An idle row is a row onto which no Bias is mapped at all; a row onto which Bias is partly mapped counts as a mapped Bias row.
In general, the principle of expansion is to use the resources of the Bias array as fully as possible and reduce idle rows: if the Bias of all layers can be expanded, expand them all; if not, expand the Bias of some layers; and if even that is not possible, expand some individual Bias rows.
When only the Bias of some layers is expanded, the layers may be selected from front to back or from back to front in the order described above, or according to a preset priority; for example, the Bias of layers with a large influence on precision may be expanded first, or the Bias of layers whose single-cell bias value after the preliminary mapping is large, chosen according to the actual application requirements.
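A sketch of the expansion strategy of fig. 6, under the simplifying assumption that expansion is tracked per layer rather than per physical row (the list contents are illustrative):

def expand_bias(rows_per_layer: list, total_rows: int) -> list:
    """First expand every layer's Bias by the largest uniform integer factor
    that fits in the Bias array; then consume any remaining idle rows by
    growing individual layers (at most doubling each), from the bottom of
    the array upward."""
    used = sum(rows_per_layer)
    factor = total_rows // used
    counts = [n * factor for n in rows_per_layer]
    idle = total_rows - sum(counts)
    for i in range(len(counts) - 1, -1, -1):  # bottom-up
        if idle == 0:
            break
        grow = min(counts[i], idle)           # at most double this layer's rows
        counts[i] += grow
        idle -= grow
    return counts

# E.g. three layers whose Bias occupies 1 row each, on a 10-row Bias array:
print(expand_bias([1, 1, 1], 10))  # uniform x3 -> [3, 3, 3]; one idle row -> [3, 3, 4]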
In summary, the embodiments of the present invention provide a method for automatically mapping the weights of a neural network onto a storage and computation integrated chip, which can be integrated into the tool chain of a storage and computation integrated chip design to provide convenience for chip users.
Expanding the Bias arrangement, on the one hand, makes full use of the storage and calculation integrated cells, improving resource utilization and reducing idleness; on the other hand, it reduces the bias values stored in some or all of the cells of the Bias array, and the smaller the bias value stored in a single cell, the smaller its conductance, so that at the same voltage the current and the current noise are smaller and the operation precision is higher. In addition, sorting the parameters of the neural network layers and then arranging them in a serpentine pattern makes them complement one another effectively, increases the utilization efficiency of the cell array, and greatly reduces the cell array required at the same operation scale, meeting the demand for chip miniaturization.
The embodiment of the invention also provides a storage and computation integrated chip, comprising: a storage and calculation integrated cell array for performing neural network operations, the cell array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array, and the Bias corresponding to each layer of the neural network is mapped in the Bias array;
and the weight matrixes corresponding to the layers are arranged in a snake shape on the main array based on the minimum row number of the Bias and the column number of the weight matrixes, and the Bias is arranged to the position, corresponding to the arrangement position column of the corresponding weight matrix, in the Bias array.
In an optional embodiment, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged on the storage and calculation integrated unit blocks in the main array;
and for the weight matrix corresponding to each layer of the neural network, the storage and calculation integrated unit blocks in the main array are polled sequentially in a serpentine order to find the corresponding arrangement position.
In an alternative embodiment, the principle of ordering is: mapping and sequencing each layer according to the minimum number of rows of the Bias corresponding to each layer; and carrying out mapping sorting on the layers with the same minimum row number of the Bias according to the column number of the weight matrix.
In an optional embodiment, on the basis of the position in the Bias array corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement further expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
In the storage and computation integrated chip provided by the embodiment of the invention, the weight matrices and the corresponding Bias arrangement are generated according to the neural network mapping method described above.
It should be noted that the memory integrated chip provided in the embodiment of the present invention may be applied to various electronic devices, such as: smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, Personal Digital Assistants (PDAs), vehicle-mounted devices, smart wearable devices, toys, smart home control devices, pipeline device controllers, and the like. Wherein, intelligence wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
Based on the same inventive concept, the present application further provides a neural network mapping apparatus for a storage and computation integrated chip, which can be used to implement the methods described in the foregoing embodiments, as described below. Since the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a block diagram of a neural network mapping apparatus for a storage and computation integrated chip according to an embodiment of the present invention. The apparatus comprises: a sorting module 10, a weight arrangement module 20, and an offset (Bias) arrangement module 30.
The sorting module sorts the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix;
the weight arrangement module sequentially arranges the weight matrices corresponding to the layers into the main array of the storage and computation integrated chip according to the mapping sorting result;
the Bias arrangement module arranges the corresponding Bias at the position in the Bias array of the chip corresponding to the columns of the arrangement position, according to the arrangement position of the weight matrix and the minimum number of Bias rows;
and when the weight matrixes corresponding to the layers are arranged in sequence, the weight matrixes are arranged in a snake-shaped sequence.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the neural network mapping method for a storage and computation integrated chip described above.
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 8, the electronic device 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate operations and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present invention includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described neural network mapping method for the storage and computation integrated chip.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (20)

1. A neural network mapping method for a storage and computation integrated chip, comprising:
mapping and sorting the layers of a neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer;
according to the mapping and sorting result, sequentially arranging the weight matrices corresponding to the layers into a main array of the storage and computation integrated chip, and arranging the corresponding Bias, according to the arrangement position of the weight matrix and the minimum number of Bias rows, at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
2. The neural network mapping method according to claim 1, wherein the main array comprises a plurality of storage and computation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged onto the storage and computation integrated unit blocks in the main array;
arranging the weight matrices in a serpentine order comprises: for the weight matrix corresponding to each layer of the neural network, polling the storage and computation integrated unit blocks in the main array in a serpentine order to find the arrangement position.
3. The neural network mapping method according to claim 2, wherein mapping and sorting the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer comprises:
mapping and sorting the layers according to the minimum number of Bias rows corresponding to each layer;
mapping and sorting layers having the same minimum number of Bias rows according to the number of columns of their weight matrices.
4. The neural network mapping method according to claim 2, further comprising:
writing the weight matrices and the Bias of the layers of the neural network to be mapped onto the storage and computation integrated chip according to the arrangement result.
5. The neural network mapping method according to claim 2, wherein polling the storage and computation integrated unit blocks in the main array in a serpentine order to find the arrangement position comprises:
polling the storage and computation integrated unit blocks in the main array in a serpentine order to determine whether a position satisfying the arrangement condition of the current layer exists;
if so, arranging the weight matrix of the current layer at the position satisfying the arrangement condition of the current layer;
if not, continuing to poll the next storage and computation integrated unit block until a position satisfying the arrangement condition is found;
wherein the arrangement condition is: being side by side with the weight matrices already arranged on the current storage and computation integrated unit block in the current arrangement cycle, and being able to accommodate the weight matrix of the current layer.
6. The neural network mapping method according to claim 5, wherein, when a weight matrix is arranged, the arrangement starts from the column following the non-idle columns.
7. The neural network mapping method according to claim 5, wherein polling the storage and computation integrated unit blocks in the main array in a serpentine order to find the arrangement position further comprises:
if no position satisfying the arrangement condition is found after all the storage and computation integrated unit blocks have been polled in the serpentine order, returning to the first storage and computation integrated unit block and entering the next arrangement cycle:
judging whether the idle position of the first storage and computation integrated unit block can accommodate the weight matrix of the current layer;
if so, arranging the weight matrix of the current layer at the idle position of the first storage and computation integrated unit block;
if not, polling the storage and computation integrated unit blocks in the serpentine order until a storage and computation integrated unit block that can accommodate the weight matrix is found, and arranging the weight matrix of the current layer at the idle position of that block;
wherein, when a weight matrix is arranged, the arrangement starts from the row following the non-idle rows.
8. The neural network mapping method according to claim 1, wherein, when the corresponding Bias is arranged at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, the Bias is arranged in the row following the non-idle rows at the position corresponding to the columns.
9. The neural network mapping method according to claim 1, further comprising:
expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
10. The neural network mapping method according to claim 9, wherein expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array comprises:
judging, according to the number of idle rows of the Bias array, whether the number of rows occupied by all the Bias can be doubled;
if so, doubling the number of rows occupied by all the Bias;
if not, selecting Bias for expansion according to a preset rule.
11. The neural network mapping method according to any one of claims 1 to 9, further comprising:
dividing the storage and computation integrated unit array of the storage and computation integrated chip into a main array and a Bias array;
dividing the main array into a plurality of storage and computation integrated unit blocks.
12. The neural network mapping method according to any one of claims 1 to 9, further comprising:
obtaining parameters of the neural network to be mapped and parameters of the target storage and computation integrated chip, the parameters of the neural network to be mapped including the weight matrix and the Bias corresponding to each layer;
obtaining the minimum number of Bias rows corresponding to each layer according to the Bias of each layer and the parameters of the storage and computation integrated chip.
13. A storage and computation integrated chip, comprising a storage and computation integrated unit array for performing neural network operations, the storage and computation integrated unit array comprising a main array and a Bias array, wherein the weight matrices corresponding to the layers of a neural network are mapped in the main array and the Bias corresponding to the layers is mapped in the Bias array;
the weight matrices corresponding to the layers are sorted based on the minimum number of Bias rows and the number of columns of the weight matrices, and are arranged on the main array in a serpentine manner according to the sorting result; the Bias is arranged in the Bias array at the position corresponding to the columns of the arrangement position of the corresponding weight matrix.
14. The storage and computation integrated chip according to claim 13, wherein the main array comprises a plurality of storage and computation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged onto the storage and computation integrated unit blocks in the main array;
wherein, for the weight matrix corresponding to each layer of the neural network, the storage and computation integrated unit blocks in the main array are polled in a serpentine order to find the corresponding arrangement position.
15. The storage and computation integrated chip according to claim 13, wherein the sorting principle is:
mapping and sorting the layers according to the minimum number of Bias rows corresponding to each layer;
mapping and sorting layers having the same minimum number of Bias rows according to the number of columns of their weight matrices.
16. The storage and computation integrated chip according to claim 13, wherein, in addition to being arranged in the Bias array at the position corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
17. A storage and computation integrated chip, comprising a storage and computation integrated unit array for performing neural network operations, the storage and computation integrated unit array comprising a main array and a Bias array, wherein the weight matrices corresponding to the layers of a neural network are mapped in the main array and the Bias corresponding to the layers is mapped in the Bias array;
the arrangement of the weight matrices and the corresponding Bias is generated according to the neural network mapping method of any one of claims 1 to 12.
18. A neural network mapping apparatus for a storage and computation integrated chip, comprising:
a sorting module, which maps and sorts the layers of a neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer;
a weight arrangement module, which sequentially arranges the weight matrices corresponding to the layers into a main array of the storage and computation integrated chip according to the mapping and sorting result;
an offset arrangement module, which arranges the corresponding Bias, according to the arrangement position of the weight matrix and the minimum number of Bias rows, at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
19. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the neural network mapping method of any one of claims 1 to 12.
20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the neural network mapping method of any one of claims 1 to 12.
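As a companion to claims 9, 10, and 16, the following is a minimal sketch of the Bias row expansion, not the claimed implementation: when the Bias array has enough idle rows, every Bias doubles the rows it occupies, so each storage and computation cell holds a smaller share of the bias value and contributes less current noise; otherwise only some entries are expanded, and the selection rule used here (smallest occupied-row counts first) is purely an assumption standing in for the unspecified "preset rule".

```python
def expand_bias_rows(bias_plan, total_bias_rows):
    """bias_plan: list of dicts {"name": str, "rows": int}, where "rows" is
    the number of Bias rows a layer currently occupies in the Bias array
    (initially its minimum row count)."""
    used = sum(entry["rows"] for entry in bias_plan)
    free = total_bias_rows - used
    if used <= free:
        # Enough idle rows: double the rows occupied by every Bias, halving
        # the value each cell must hold.
        for entry in bias_plan:
            entry["rows"] *= 2
        return bias_plan
    # Not enough to double everything: expand selected entries while idle
    # rows remain (assumed preset rule: smallest occupied-row counts first).
    for entry in sorted(bias_plan, key=lambda e: e["rows"]):
        if entry["rows"] <= free:
            free -= entry["rows"]
            entry["rows"] *= 2
    return bias_plan

# Hypothetical usage with an 8-row Bias array:
plan = [{"name": "fc1", "rows": 2}, {"name": "fc2", "rows": 1},
        {"name": "fc3", "rows": 1}]
print(expand_bias_rows(plan, 8))  # all three entries double: rows 4, 2, 2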