
Neural network mapping method, device and equipment for storage and computation integrated chip

Info

Publication number: CN113988277A
Authority: CN (China)
Prior art keywords: bias, layer, array, neural network, storage
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202111184060.XA
Other languages: Chinese (zh)
Other versions: CN113988277B (en)
Inventors: 康卓栋, 张爱飞, 陆振亮
Current Assignee: Beijing Witinmem Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Witinmem Technology Co., Ltd.

Events: application filed by Beijing Witinmem Technology Co., Ltd.; priority to CN202111184060.XA; publication of CN113988277A; application granted; publication of CN113988277B; status active; anticipated expiration.


Abstract

Embodiments of the present invention provide a neural network mapping method, apparatus, and device for a storage and computation integrated chip. The method includes: sorting the layers of the neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer; according to the sorting result, sequentially arranging the weight matrix of each layer into the main array of the storage and computation integrated chip, and, according to the arrangement position of the weight matrix and the minimum number of Bias rows, arranging the corresponding Bias at the position in the Bias array of the chip corresponding to the columns of that arrangement position. The weight matrices of the layers are arranged in a serpentine order, which makes effective use of the storage and computation integrated cells. In addition, the number of rows occupied by each Bias is expanded based on the minimum number of Bias rows and the idle capacity of the Bias array, which reduces the bias value carried by any single storage and computation integrated cell, reduces current noise, and improves operation accuracy.

Description

Neural network mapping method, device and equipment for storage and computation integrated chip
Technical Field
The invention relates to the technical field of semiconductors, in particular to a neural network mapping method, device and equipment for a storage and computation integrated chip.
Background
In recent years, with the continuous development of algorithms, computing power, and data scale, machine learning has shown strong advantages in solving many problems. Among its techniques, the artificial neural network has drawn attention for its outstanding performance in fields such as image recognition, object detection, and semantic segmentation. However, as neural networks grow in scale, the traditional approach of processing neural network algorithms on a CPU + GPU architecture is hitting bottlenecks in speed and power consumption. The root cause is the separation of storage and computation in the von Neumann architecture: data-centric neural network algorithms impose excessive data-transfer overhead on the computing system, reducing speed while increasing power consumption.
Storage and computation integrated (compute-in-memory) technology addresses the problems caused by this separation. The weights of a neural network are stored as the conductances of flash memory array cells in a storage and computation integrated (NPU) chip; a data source expressed as voltages is applied to the array, and by Ohm's law the current output by the array is the product of voltage and conductance. This completes the matrix multiply-add of the data source with the network weights, performing what is essentially analog rather than traditional digital computation.
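As an illustrative sketch (not part of the patent text; the array sizes and values are invented for the example), the multiply-add performed by such an array can be modeled as a matrix-vector product in which conductances play the role of weights:

import numpy as np

# Model of the analog multiply-add performed by a storage and computation array.
# Each cell stores a weight as a conductance G[i, j]; an input voltage V[i] is
# applied to row i, and by Ohm's law plus Kirchhoff's current law the current
# collected on column j is sum_i V[i] * G[i, j].

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(128, 1024))  # conductances (weights): 128 rows, 1024 columns
V = rng.uniform(0.0, 0.5, size=128)          # input voltages, one per row

I = V @ G                                    # column currents = the matrix multiply-add result
print(I.shape)                               # (1024,) -- one output current per column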
Tool-chain design is an important link in the overall flow from design to production of a storage and computation integrated chip. Within such a tool chain, automatically mapping the weight parameters of a given neural network onto the chip array as required is a key technology. When a trained neural network is mapped onto the storage and calculation integrated cell array by simply placing the weights and biases layer by layer in network order, two problems arise. First, the storage and calculation integrated cells are not used effectively, which inflates the required size of the cell array. Second, the bias is mapped directly onto the array, and the larger the bias value, the larger the corresponding cell conductance; at the same voltage the cell current, and with it the noise, is therefore larger, degrading operation accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a neural network mapping method, device, and equipment for a storage and computation integrated chip, which can at least partially solve the problems in the prior art.
To this end, the invention adopts the following technical solutions:
In a first aspect, a neural network mapping method for a storage and computation integrated chip is provided, including:
sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix;
according to the mapping sorting result, sequentially arranging the weight matrices corresponding to the layers into a main array of the storage and computation integrated chip, and arranging the corresponding Bias at the position in the Bias array of the chip corresponding to the columns of the arrangement position, according to the arrangement position of the weight matrix and the minimum number of Bias rows;
and when the weight matrixes corresponding to the layers are arranged in sequence, the weight matrixes are arranged in a snake-shaped sequence.
Furthermore, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged on the storage and calculation integrated unit blocks in the main array;
when sequentially arranging the weight matrices corresponding to the layers in a serpentine order, the method includes: sequentially polling the storage and calculation integrated unit blocks in the main array in a serpentine order for the weight matrix corresponding to each layer of the neural network, so as to find the arrangement position.
Further, the mapping and sorting of the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer of the neural network to be mapped includes:
sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer;
sorting the layers whose Bias has the same minimum number of rows according to the number of columns of the weight matrix.
Further, the neural network mapping method for the storage and computation integrated chip further comprises:
writing the weight matrix and the Bias of each layer of the neural network to be mapped into the storage and computation integrated chip according to the arrangement result.
Further, the sequentially polling of the storage and calculation integrated unit blocks in the main array in a serpentine order to find the arrangement position includes:
sequentially polling whether the storage and calculation integrated unit blocks in the main array have positions meeting the arrangement condition of the current layer according to a snake-shaped sequence;
if so, arranging the weight matrix of the current layer to a position meeting the arrangement condition of the current layer;
if not, continuing polling the next storage and calculation integrated unit block until finding the position meeting the arrangement condition;
wherein the arrangement condition is: a position on the current storage and calculation integrated unit block that is side by side with the weight matrices already arranged in the current arrangement period and that can accommodate the weight matrix of the current layer.
Further, the weight matrix is arranged starting from the column next to the last non-idle column.
Further, the sequentially polling of the storage and calculation integrated unit blocks in the main array in a serpentine order to find the arrangement position further includes:
if the position meeting the arrangement condition is not found after all the storage and calculation integrated unit blocks are polled according to the snake-shaped sequence, returning to the first storage and calculation integrated unit block, and entering the next arrangement period:
judging whether the idle position of the first storage and calculation integrated unit block can accommodate the weight matrix of the current layer;
if so, arranging the weight matrix of the current layer at the idle position of the first storage and calculation integrated unit block;
if not, sequentially polling all the storage and calculation integrated unit blocks according to the snake-shaped sequence until the storage and calculation integrated unit blocks capable of containing the weight matrix are found, and arranging the weight matrix of the current layer at the idle position of the storage and calculation integrated unit blocks;
wherein the weight matrix is arranged starting from the row next to the last non-idle row.
Further, when the corresponding Bias is arranged at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, the Bias is arranged starting from the row next to the last non-idle row at that column-aligned position.
Further, the neural network mapping method for the storage-computation-integrated chip further comprises the following steps:
and expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
Further, the expanding the configuration of the Bias according to the Bias configuration result and the idle condition of the Bias array includes:
judging whether the number of rows occupied by all the Bias can be expanded in multiples according to the number of idle rows of the Bias array;
if yes, multiplying the number of rows occupied by all the Bias;
if not, selecting the Bias according to a preset rule for expansion.
Further, the neural network mapping method for the storage-computation-integrated chip further comprises the following steps:
dividing a storage and calculation integrated unit array of the storage and calculation integrated chip into a main array and a Bias array;
the main array is divided into a plurality of storage and calculation integrated unit blocks.
Further, the neural network mapping method for the storage-computation-integrated chip further comprises the following steps:
acquiring parameters of the neural network to be mapped and parameters of the target storage and computation integrated chip, wherein the parameters of the neural network to be mapped comprise the weight matrix and Bias corresponding to each layer;
and obtaining the minimum number of rows of the Bias corresponding to each layer according to the Bias of each layer and the parameters of the storage and computation integrated chip.
In a second aspect, a storage and computation integrated chip is provided, comprising: a storage and calculation integrated cell array for performing neural network operations, the cell array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array, and the Bias corresponding to each layer of the neural network is mapped in the Bias array;
and the weight matrixes corresponding to the layers are arranged in a snake shape on the main array based on the minimum row number of the Bias and the column number of the weight matrixes, and the Bias is arranged to the position, corresponding to the arrangement position column of the corresponding weight matrix, in the Bias array.
Further, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array mode, and the weight matrix corresponding to each layer is arranged on the storage and calculation integrated unit blocks in the main array;
and for the weight matrix corresponding to each layer of the neural network, the storage and calculation integrated unit blocks in the main array are polled sequentially in a serpentine order to find the corresponding arrangement position.
Further, the principle of the sorting is as follows:
mapping and sequencing each layer according to the minimum number of rows of the Bias corresponding to each layer;
and carrying out mapping sorting on the layers with the same minimum row number of the Bias according to the column number of the weight matrix.
Further, on the basis of the position in the Bias array corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
In a third aspect, a storage and computation integrated chip is provided, comprising: a storage and calculation integrated cell array for performing neural network operations, the cell array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array, and the Bias corresponding to each layer of the neural network is mapped in the Bias array;
the weight matrix and the corresponding Bias arrangement mode are generated according to the neural network mapping method.
In a fourth aspect, a neural network mapping apparatus for a storage and computation integrated chip is provided, comprising:
the sorting module is used for sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix;
the arrangement module is used for sequentially arranging the weight matrixes corresponding to the layers into the main array of the storage and computation integrated chip according to the mapping sequencing result, and arranging the corresponding Bias to the position, corresponding to the arrangement position column, in the Bias array of the storage and computation integrated chip according to the arrangement position of the weight matrixes and the minimum number of rows of the Bias;
and when the weight matrixes corresponding to the layers are arranged in sequence, the weight matrixes are arranged in a snake-shaped sequence.
In a fifth aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the neural network mapping method described above.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the neural network mapping method described above.
The embodiment of the invention provides a neural network mapping method, apparatus, and device for a storage and computation integrated chip, wherein the method includes: sorting the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix; according to the mapping sorting result, sequentially arranging the weight matrices corresponding to the layers into the main array of the storage and computation integrated chip, and arranging the corresponding Bias at the column-aligned position in the Bias array of the chip according to the arrangement position of the weight matrix and the minimum number of Bias rows; when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order. By arranging the layers in a serpentine pattern after sorting, the storage and calculation integrated cells are used effectively, and the size of the cell array can be reduced at the same operation scale.
In addition, in the embodiment of the invention, the number of rows occupied by each Bias is expanded based on the minimum number of rows of the Bias and the idle condition of the Bias array, so that the offset value on a single storage and calculation integrated unit is reduced, the current noise is reduced, and the calculation accuracy is improved.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. In the drawings:
FIG. 1 shows a flow diagram of a neural network mapping method for a storage and computation integrated chip in an embodiment of the invention;
FIG. 2 illustrates the partitioning of a storage and calculation integrated cell array in an embodiment of the present invention;
FIG. 3 shows the specific steps of step S100 in an embodiment of the present invention;
FIG. 4 illustrates the process of ordering the layers of a neural network in an embodiment of the present invention;
FIG. 5 illustrates the arrangement result of the weight matrices and the corresponding Bias in an embodiment of the present invention;
FIG. 6 illustrates the arrangement expansion process of the Bias in an embodiment of the present invention;
FIG. 7 is a block diagram of a neural network mapping apparatus for a storage and computation integrated chip according to an embodiment of the present invention;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 shows a flow diagram of a neural network mapping method for a storage and computation integrated chip in an embodiment of the invention; as shown in FIG. 1, the method may include the following:
step S100: mapping and sequencing each layer according to the minimum line number of the Bias corresponding to each layer of the neural network to be mapped and the weight matrix;
and aiming at the trained specific neural network, acquiring the weight matrix parameters, the Bias size and the minimum number of lines occupied on the chip of each layer. Firstly, sorting each layer of neural network according to the minimum row number of the Bias and the column number of the weight matrix, and taking the sorting as the mapping arrangement sequence.
Step S200: according to the mapping sorting result, sequentially arranging the weight matrixes corresponding to the layers into a main array of a storage and computation integrated chip;
when the weight matrices corresponding to the layers are sequentially arranged according to the order obtained in step S100, the weight matrices are arranged in a serpentine order.
Specifically, consider the scenario in which the storage and computation integrated chip takes its inputs on cell rows and produces its outputs on cell columns (the drawings in the embodiments of the present invention are based on this premise; those skilled in the art will understand that rows and columns of the cells are relative concepts, and in a specific application the input may equally be by columns and the output by rows, with the same principle, not repeated here). When the weight matrices are arranged, they are placed block by block over the main array, which is divided into a plurality of storage and calculation integrated unit blocks laid out in an array. The serpentine order over the block rows (rows of blocks, not rows of cells) may run: first block row in the X direction, second in the -X direction, third in the X direction, fourth in the -X direction, and so on; or first block row in the -X direction, second in the X direction, third in the -X direction, fourth in the X direction, and so on; or, starting from the last block row: last row in the -X direction, second-to-last in the X direction, third-to-last in the -X direction, fourth-to-last in the X direction, and so on.
The storage and calculation integrated cell may be, for example, a flash memory cell.
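A minimal sketch of such a serpentine polling order over a grid of blocks (the function name and grid size are illustrative, not taken from the patent):

def serpentine_order(block_rows: int, block_cols: int):
    """Yield block coordinates (row, col) in serpentine order: even block
    rows in the X direction, odd block rows in the -X direction."""
    for r in range(block_rows):
        cols = range(block_cols) if r % 2 == 0 else range(block_cols - 1, -1, -1)
        for c in cols:
            yield (r, c)

# For a main array divided into 2 x 4 blocks, as in the example of FIG. 2:
print(list(serpentine_order(2, 4)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2), (1, 1), (1, 0)]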
Step S300: arranging the corresponding Bias at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, according to the arrangement position of the weight matrix and the minimum number of Bias rows;
Specifically, due to the particular way the storage and computation integrated chip computes, the weight matrix and its corresponding Bias need to be column-aligned; the arrangement position of the Bias is therefore column-aligned with the arrangement position of the corresponding weight matrix, while in the row direction the Bias is arranged according to its minimum number of rows.
An arrangement scheme is generated through steps S100 to S300, and the parameters of each layer of the neural network are written into the storage and calculation integrated cell array of the chip by a compiling tool according to this scheme. In the inference stage, when a certain layer of the neural network is executed according to the arrangement scheme and control requirements, a row-column decoder gates the rows and columns holding the weight matrix and the Bias of that layer; the input signals of the layer are applied to the rows corresponding to the weight matrix and undergo the matrix multiply-add with the weight matrix, the result is then superposed with the corresponding Bias, and the computation result of the layer is obtained on the corresponding columns.
By adopting the above technical solution, the parameters of the layers of the neural network are sorted and then arranged in a serpentine pattern, so the layers complement one another effectively. For example, if the layers are sorted by Bias row counts 9, 8, 7, 6, 5, 4, 3, 2, 1, the serpentine arrangement pairs smaller row counts with larger ones, which reduces the number of rows occupied in the Bias array and leaves more rows available for expansion. This increases the utilization efficiency of the storage and calculation integrated cell array; at the same operation scale, the required cell array can be greatly reduced, meeting the demand for chip miniaturization.
In an optional embodiment, the neural network mapping method for a storage and computation integrated chip may further include: dividing the storage and calculation integrated cell array of the chip into a main array and a Bias array, and dividing the main array into a plurality of storage and calculation integrated unit blocks. The division can be made with reference to the usage scenario and the scale of the corresponding neural networks, so that performance is guaranteed while resource utilization is effectively improved.
Specifically, as shown in fig. 2 (a), the actual physical architecture of the chip consists of a main array and a Bias array. The applicant has found in practice that an excessively large current in the analog computation can significantly affect the result, so in the embodiment of the present invention the array may be logically partitioned: as shown in fig. 2 (b), the main array may be divided into 2 × 4 blocks, for example only; in practical applications the array may be partitioned as needed, and the Bias array may or may not be partitioned. If the chip is small, the main array may be left unpartitioned, but in most practical scenarios partitioning is necessary because networks are generally large. It should be added that the partitioning is performed according to the actual performance of the chip, and the blocks may be of equal or unequal size.
In an optional embodiment, the neural network mapping method may further include: writing the weight matrix and the Bias of each layer of the neural network to be mapped into the storage and computation integrated chip according to the arrangement result.
Specifically, the mapping method is executed in a tool chain and may be understood as a program running on a terminal device, a server, or a chip-burning apparatus. After the arrangement scheme is generated by the mapping method, the weight matrices and the Bias are written onto the storage and computation integrated chip according to the scheme. The chip can then be installed on the circuit board of a device and used for inference to realize neural network operations. For example, the chip may be installed in a toy for voice recognition, in which case the neural network parameters written into the chip are those of a voice recognition network; equally, the chip may be installed in a face recognition device, with the parameters of an image recognition network written into it. These are, of course, only a few example application scenarios.
In an alternative embodiment, referring to fig. 3, this step S100 may include the following:
step S110: mapping and sequencing each layer according to the minimum number of rows of the Bias corresponding to each layer;
step S120: and carrying out mapping sorting on the layers with the same minimum row number of the Bias according to the column number of the weight matrix.
Specifically, fig. 4 (a) shows the parameters of a neural network: the upper row of 5 matrices are the weight matrices of 5 neural network layers, and the lower row of 5 matrices are the corresponding Bias. Taking the first weight matrix as an example, 1024 × 128 denotes its scale, namely 1024 columns and 128 rows; the corresponding Bias of 1024 × 2 denotes 1024 columns with a minimum of 2 occupied rows. The number of columns of a layer's weight matrix equals the number of columns of its Bias, which is determined by the operational characteristics of the storage and computation integrated chip.
When the layers of the neural network are sorted, they can be sorted according to the minimum number of Bias rows corresponding to each layer. In fig. 4 (b), the sorting runs from the largest to the smallest minimum number of Bias rows; in practical applications, sorting from smallest to largest may also be used, and the principle is the same.
In this example, two Bias have the same minimum number of rows, namely 1024 × 2 and 768 × 2. In such a case, the layers whose Bias has the same minimum number of rows are sorted from large to small by the number of columns of the weight matrix; the final sorting result is shown in fig. 4 (b).
It is worth noting that, for the trained neural network, the embodiment of the present invention obtains the size of each layer's Bias and the minimum number of rows it occupies on the chip, first sorts the layers from large to small by the minimum number of rows occupied by each layer's Bias, and then locally sorts layers whose Bias occupies the same number of rows from large to small by the number of columns. The first priority is to reduce the value carried by each physical Bias row during network mapping, because this improves the chip's computation accuracy: the input received by the embodiment contains the minimum number of rows occupied by each layer's Bias, the Bias rows on the array generally still have unmapped space after mapping by that number of rows, and if m rows are used to carry a Bias value originally designed to be carried by 1 row, the value on each row becomes 1/m of the original. Second, in applications the number of input paths of each network layer is small and the number of output paths is large; as shown in fig. 4, a layer has m input rows and n output columns, and m is often smaller than n. Therefore, to better ensure that the whole network fits in a limited region, the X-direction sizes of all layers are sorted from large to small: with the Bias size as the first priority, the layers are placed with the X-direction size as the second priority.
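A minimal sketch of this two-key sort (the layer names and parameter values below are illustrative; only the sort keys come from the description above):

# Each layer is described by the minimum number of rows its Bias occupies and
# the number of columns of its weight matrix (the X-direction size).
layers = [
    {"name": "layer0", "bias_min_rows": 2, "weight_cols": 1024},
    {"name": "layer1", "bias_min_rows": 9, "weight_cols": 512},
    {"name": "layer2", "bias_min_rows": 2, "weight_cols": 768},
    {"name": "layer3", "bias_min_rows": 4, "weight_cols": 896},
]

# First priority: minimum Bias rows, descending; second priority (on ties):
# weight-matrix columns, descending.
order = sorted(layers, key=lambda l: (-l["bias_min_rows"], -l["weight_cols"]))
print([l["name"] for l in order])  # ['layer1', 'layer3', 'layer0', 'layer2']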
In an optional embodiment, the main array includes a plurality of storage and calculation integrated unit blocks distributed in an array (see fig. 2). The weight matrices corresponding to the layers are arranged on these blocks in sequence, in a serpentine order; specifically: for the weight matrix corresponding to each layer of the neural network, the blocks in the main array are polled sequentially in serpentine order to find the arrangement position.
Specifically, the storage and calculation integrated unit blocks in the main array are polled sequentially in serpentine order for a position satisfying the arrangement condition of the current layer, where the arrangement condition is: a position on the current block that is side by side with the weight matrices already arranged in the current arrangement period and that can accommodate the weight matrix of the current layer.
If so, arranging the weight matrix of the current layer to a position meeting the arrangement condition of the current layer;
if not, continuing polling the next storage and calculation integrated unit block until finding the position meeting the arrangement condition;
It should be noted that, to minimize idle resources, once a position satisfying the condition is found, the weight matrix is arranged starting from the column next to the last non-idle column; of course, in special applications or to reduce interference, a certain number of columns may be left as spacing, chosen according to the actual application requirements.
And if the positions meeting the arrangement condition are not found after all the storage and calculation integrated unit blocks are polled according to the snake-shaped sequence, returning to the first storage and calculation integrated unit block and entering the next arrangement period.
In a new arrangement period, it is first judged whether the idle position of the first storage and calculation integrated unit block can accommodate the weight matrix of the current layer; if so, the weight matrix of the current layer is arranged at the idle position of the first block; if not, all blocks are polled sequentially in serpentine order until a block that can accommodate the weight matrix is found, and the weight matrix of the current layer is arranged at the idle position of that block;
It should be noted that, to minimize idle resources, when a position satisfying the condition is found, the weight matrix is arranged starting from the row next to the last non-idle row; of course, in special applications or to reduce interference, a certain number of rows may be left as spacing, chosen according to the actual application requirements.
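The following sketch models this polling under simplifying assumptions (each block tracks a single placement cursor, and a new arrangement period is opened on all blocks at once); all names and data structures are illustrative, not the patent's exact layout:

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Block:
    """Simplified storage and calculation integrated unit block: within one
    arrangement period, weight matrices are placed side by side starting from
    the column next to the last non-idle column; a new period starts on the
    row next to the last non-idle row."""
    width: int                 # columns of cells in the block
    height: int                # rows of cells in the block
    cursor_col: int = 0        # next free column in the current period
    period_row: int = 0        # first row of the current period
    period_height: int = 0     # tallest matrix placed in this period

    def try_place(self, rows: int, cols: int) -> Optional[Tuple[int, int]]:
        if self.cursor_col + cols <= self.width and self.period_row + rows <= self.height:
            pos = (self.period_row, self.cursor_col)
            self.cursor_col += cols
            self.period_height = max(self.period_height, rows)
            return pos
        return None

    def new_period(self) -> None:
        self.period_row += self.period_height
        self.cursor_col = 0
        self.period_height = 0

def place_layer(blocks: Dict[Tuple[int, int], Block],
                order: List[Tuple[int, int]],
                rows: int, cols: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Poll the blocks in serpentine order; if no block has a position that
    satisfies the arrangement condition, return to the first block and enter
    the next arrangement period, then poll again."""
    for _ in range(max(b.height for b in blocks.values())):
        for coord in order:
            pos = blocks[coord].try_place(rows, cols)
            if pos is not None:
                return coord, pos
        for b in blocks.values():
            b.new_period()
    raise RuntimeError("weight matrix does not fit on the main array")

Combined with the serpentine_order generator from the earlier sketch, place_layer(blocks, list(serpentine_order(2, 2)), rows, cols) reproduces the spirit of the D0-D19 walk-through described below.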
With the above technical scheme, after the sorted weight matrices have been arranged on the main array, each corresponding Bias is arranged at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, starting from the row next to the last non-idle row at that column-aligned position, following the principle that each weight matrix is column-aligned with its corresponding Bias.
In order to make the embodiment of the present invention better understood by those skilled in the art, the mapping process is described in detail below with reference to fig. 5.
First, take a 20-layer neural network as an example, each layer of which is mapped onto the storage and calculation integrated cell array of a storage and computation integrated chip. After sorting, the layers are denoted 0-19 in order, the weight matrices are numbered D0-D19, and the Bias are numbered B0-B19. The weight matrices D0-D19 are arranged first; in this example the Bias B0-B19 are then arranged according to the weight arrangement result. During arrangement, the first row of blocks in the main array is traversed left to right, the second row right to left, the third row left to right, the fourth row right to left, and the polling then returns to the first row, following this serpentine cycle;
For example, for D0, block (0,0) is polled first; it has a position that can accommodate the weight matrix of the current layer, so D0 is arranged at (0,0). For D1, block (0,0) is polled first; there is a position satisfying the condition, so D1 is arranged at (0,0) side by side with D0. For D2, block (0,0) is polled first, but the position side by side with D0 and D1 on block (0,0) cannot accommodate D2, so block (0,1) is polled next, and D2 is arranged at (0,1). For D3, block (0,0) is polled and the position side by side with D0 and D1 cannot accommodate D3; block (0,1) is polled and the position side by side with D2 cannot accommodate D3; block (1,1) is then polled following the serpentine order, and D3 is arranged at (1,1). For D4, blocks (0,0) and (0,1) likewise cannot accommodate it, block (1,1) is polled following the serpentine order, and D4 is arranged at (1,1) side by side with D3. For D5, block (0,0) is polled first, and the position side by side with D0 and D1 can accommodate D5, so D5 is arranged at (0,0) side by side with D0 and D1. And so on: after D15 is arranged, for D16 the polling starts again from (0,0), no block has a position satisfying the condition (a position that can accommodate D16 side by side with the matrices of the current arrangement period), so the process jumps back to (0,0) and enters the next arrangement period.
In the new arrangement period, it is first determined whether the idle position of (0,0) can accommodate D16; if so, D16 is arranged at (0,0), starting from the row next to the last non-idle row, in this case the row below D1. The arrangement of D17 to D19 follows the same pattern as D1 to D5 and is not repeated here.
It is worth noting that, in an alternative embodiment, D17 first polls (0,0) for an idle position that can accommodate it, where the idle positions include the position in the rows of D0 (i.e., the rows of the previous arrangement period), then polls in turn whether the right side of D16 in the current arrangement period can accommodate D17, and so on. The arrangement principle is that, within one round of mapping, two layers placed in the same array block are adjacent in the X direction and cannot be staggered, while in the Y direction each layer is placed top to bottom wherever it fits; in this way every free slot on the array is considered.
In another alternative embodiment, D17 first polls (0,0) for an idle position to the right of D16 that can accommodate it; the positions in the rows of D0 (i.e., the rows of the previous arrangement period) are not polled, and only the positions of the current period's rows are polled. It should be noted that, on a block, a matrix ordered earlier is placed before a matrix ordered later, where "before" follows the polling order. For the first row of blocks, (0,0) and (0,1) are polled left to right, so a left position is before a right position, and within a first-row block the earlier matrix is to the left of the later one (e.g., D0 is to the left of D1); for the second row of blocks, (1,0) and (1,1) are polled right to left, so a right position is before a left position, and within a second-row block the earlier matrix is to the right of the later one (e.g., D3 is to the right of D4).
It should be noted that, during mapping, the position of each layer's weights on the main array is considered first, and then the position of each layer's Bias on the Bias array. The general principle of mapping is: starting from block (0,0), map left to right in even block rows (block rows are numbered from 0, so the 0th row counts as even) and right to left in odd block rows, proceeding top to bottom in a serpentine order overall; after one round is finished, return to block (0,0) and repeat the process until the whole neural network is mapped. (This general principle is merely an example: in practice the 0th block row may run right to left, or the serpentine may run from the bottom block row of the array to the top; but the serpentine principle must be followed, and the serpentine must snake by rows, not by columns. This holds for row input and column output; as those skilled in the art will understand, with column input and row output the situation is the reverse, which is not detailed here.) This provides a larger space for the subsequent Bias expansion and higher computation precision for the chip's results.
Several issues need attention in the main-array mapping process. Unless specifically required, adjacent placements are seamless; the purpose of seamless joining is to make full use of the array space in the early stage of mapping, because each layer must be placed in one piece, and gaps left between layers are hard to use and may make later layers impossible to place. In one round of mapping, each layer looks for a position in serpentine order starting from block (0,0), and if the current block has no sufficient position, the next block is considered. If within a round some layer cannot be placed in any block, a new round of mapping is entered. For the Bias mapping, it suffices to map sequentially from top to bottom on the Bias array according to the corresponding weight positions and the minimum numbers of occupied rows, which are given in advance. The number in parentheses after each Bias in fig. 5 indicates the number of occupied rows; for example, B1(9) indicates that B1 occupies 9 rows.
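A sketch of this Bias placement rule (column alignment with the weight matrix, filling downward from the last non-idle row); the per-column free-row bookkeeping is an assumption of this sketch, not a structure named in the patent:

def place_bias(next_free_row: list, col_start: int, col_end: int, min_rows: int) -> int:
    """Place one layer's Bias in the Bias array, column-aligned with its
    weight matrix (columns [col_start, col_end)), starting from the row next
    to the last non-idle row in those columns and occupying min_rows rows.
    next_free_row[c] tracks the next idle row of Bias-array column c."""
    start_row = max(next_free_row[c] for c in range(col_start, col_end))
    for c in range(col_start, col_end):
        next_free_row[c] = start_row + min_rows
    return start_row

# E.g. a 1024-column Bias array, placing B0 (2 rows) over columns 0..1024:
rows_free = [0] * 1024
print(place_bias(rows_free, 0, 1024, 2))  # -> 0; those columns are now free from row 2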
In an optional embodiment, the neural network mapping method for a storage and computation integrated chip may further include: acquiring the parameters of the neural network to be mapped and the parameters of the target storage and computation integrated chip, where the parameters of the neural network to be mapped comprise the weight matrix and Bias corresponding to each layer; and then obtaining the minimum number of Bias rows corresponding to each layer from each layer's Bias and the chip parameters.
Specifically, once the neural network model is determined, the weight matrix and bias values of each layer are known, and the minimum number of Bias rows of each layer is calculated from the bias values and the parameters of the target chip (the capacity of each Bias row).
The minimum number of Bias rows may also be given in advance by circuit engineers according to the precision requirement, generally taking the minimum precision requirement as the standard, or computed according to a preset rule, which is not detailed in the embodiments of the present invention.
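As a sketch of how such a preset rule might look (the per-row cell capacity parameter is an assumption of this example; in practice the criterion is set by circuit engineers from the precision requirement):

import math

def min_bias_rows(bias_values, max_per_cell: float) -> int:
    """Minimum number of Bias rows such that, when a logical bias value is
    split evenly over m rows, no single storage and calculation cell carries
    more than max_per_cell (an assumed capacity parameter)."""
    worst = max(abs(v) for v in bias_values)
    return max(1, math.ceil(worst / max_per_cell))

print(min_bias_rows([3.2, -1.5, 7.9], max_per_cell=4.0))  # -> 2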
In an optional embodiment, the neural network mapping method for a computation-integrated chip may further include: and expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
Specifically, after the arrangement is completed, the Bias array may still have idle rows. To reduce as far as possible the bias values stored in all or some of the storage and calculation integrated cells of the Bias array, the idle rows are used to expand the Bias arrangement a second time, yielding the final arrangement scheme, and the neural network parameters are then written into the chip according to that final scheme.
Because the storage and computation integrated chip essentially performs analog computation, the larger the bias value on each cell of the Bias array, the larger the noise in the final computation, and excessive noise introduced by an excessive bias can decisively affect computation accuracy. Therefore the actual number of Bias-array rows occupied by one logical Bias row is expanded as far as the array size allows: if the actual number of occupied rows is m, each row stores 1/m of the logical Bias value, which improves computation accuracy.
In an optional embodiment, expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array includes:
judging whether the number of rows occupied by all the Bias can be expanded in multiples according to the number of idle rows of the Bias array;
if yes, multiplying the number of rows occupied by all the Bias;
if not, selecting the Bias according to a preset rule for expansion.
Specifically, while each layer's weights are mapped onto the main array, the preliminary mapping of each layer's Bias is completed on the Bias array according to the pre-given minimum number of occupied rows; the mapping result is shown in fig. 6 (a). The mapped Bias rows are then expanded by the largest integer multiple possible: if the Bias array has 10 rows in total, then with 3 mapped rows each row is expanded to 3 times the original, and with 4 mapped rows each row is expanded to 2 times the original. For the arrangement shown in fig. 6 (a), a full doubling is possible, with the result shown in fig. 6 (b). After that, if idle rows remain in the Bias array, a number of mapped Bias rows equal to the number of idle rows is selected from bottom to top and each is expanded into two rows; as shown in fig. 6 (c), with 2 idle rows remaining, the third and fourth rows from the bottom are each expanded into two rows. An idle row is a row onto which no Bias is mapped at all; a row onto which Bias is partly mapped counts as a mapped Bias row.
In general, the principle of expansion is to use the resources of the Bias array as fully as possible and reduce idle rows: if the Bias of all layers can be expanded, expand them all; if not, expand the Bias of some layers; and if even that is not possible, expand some individual Bias rows.
When only the Bias of some layers is expanded, the layers may be selected from front to back or from back to front in the order described above, or according to a preset priority; for example, the Bias of layers with a large influence on precision may be expanded first, or the Bias of layers whose single-cell bias value after the preliminary mapping is large, chosen according to the actual application requirements.
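A sketch of the expansion strategy of fig. 6, under the simplifying assumption that expansion is tracked per layer rather than per physical row (the list contents are illustrative):

def expand_bias(rows_per_layer: list, total_rows: int) -> list:
    """First expand every layer's Bias by the largest uniform integer factor
    that fits in the Bias array; then consume any remaining idle rows by
    growing individual layers (at most doubling each), from the bottom of
    the array upward."""
    used = sum(rows_per_layer)
    factor = total_rows // used
    counts = [n * factor for n in rows_per_layer]
    idle = total_rows - sum(counts)
    for i in range(len(counts) - 1, -1, -1):  # bottom-up
        if idle == 0:
            break
        grow = min(counts[i], idle)           # at most double this layer's rows
        counts[i] += grow
        idle -= grow
    return counts

# E.g. three layers whose Bias occupies 1 row each, on a 10-row Bias array:
print(expand_bias([1, 1, 1], 10))  # uniform x3 -> [3, 3, 3]; one idle row -> [3, 3, 4]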
In summary, the embodiments of the present invention provide a method for automatically mapping the weights of a neural network onto a storage and computation integrated chip, which can be integrated into the tool chain of a storage and computation integrated chip design to provide convenience for chip users.
Expanding the Bias arrangement, on the one hand, makes full use of the storage and calculation integrated cells, improving resource utilization and reducing idleness; on the other hand, it reduces the bias values stored in some or all of the cells of the Bias array, and the smaller the bias value stored in a single cell, the smaller its conductance, so that at the same voltage the current and the current noise are smaller and the operation precision is higher. In addition, sorting the parameters of the neural network layers and then arranging them in a serpentine pattern makes them complement one another effectively, increases the utilization efficiency of the cell array, and greatly reduces the cell array required at the same operation scale, meeting the demand for chip miniaturization.
The embodiment of the invention also provides a storage and computation integrated chip, comprising: a storage and calculation integrated cell array for performing neural network operations, the cell array comprising a main array and a Bias array, wherein the weight matrix corresponding to each layer of the neural network is mapped in the main array, and the Bias corresponding to each layer of the neural network is mapped in the Bias array;
and the weight matrixes corresponding to the layers are arranged in a snake shape on the main array based on the minimum row number of the Bias and the column number of the weight matrixes, and the Bias is arranged to the position, corresponding to the arrangement position column of the corresponding weight matrix, in the Bias array.
In an optional embodiment, the main array comprises a plurality of storage and calculation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged on the storage and calculation integrated unit blocks in the main array;
and for the weight matrix corresponding to each layer of the neural network, the storage and calculation integrated unit blocks in the main array are polled sequentially in a serpentine order to find the corresponding arrangement position.
In an alternative embodiment, the principle of ordering is: mapping and sequencing each layer according to the minimum number of rows of the Bias corresponding to each layer; and carrying out mapping sorting on the layers with the same minimum row number of the Bias according to the column number of the weight matrix.
In an optional embodiment, on the basis of the position in the Bias array corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement further expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
In the storage and computation integrated chip provided by the embodiment of the invention, the weight matrices and the corresponding Bias arrangement are generated according to the neural network mapping method described above.
It should be noted that the memory integrated chip provided in the embodiment of the present invention may be applied to various electronic devices, such as: smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, Personal Digital Assistants (PDAs), vehicle-mounted devices, smart wearable devices, toys, smart home control devices, pipeline device controllers, and the like. Wherein, intelligence wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
Based on the same inventive concept, the present application further provides a neural network mapping apparatus for a storage and computation integrated chip, which can be used to implement the methods described in the foregoing embodiments, as described below. Since the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a block diagram of a neural network mapping apparatus for a storage and computation integrated chip according to an embodiment of the present invention. The apparatus comprises: a sorting module 10, a weight arrangement module 20, and an offset (Bias) arrangement module 30.
The sorting module sorts the layers for mapping according to the minimum number of Bias rows corresponding to each layer of the neural network to be mapped and the weight matrix;
the weight arrangement module sequentially arranges the weight matrices corresponding to the layers into the main array of the storage and computation integrated chip according to the mapping sorting result;
the Bias arrangement module arranges the corresponding Bias at the position in the Bias array of the chip corresponding to the columns of the arrangement position, according to the arrangement position of the weight matrix and the minimum number of Bias rows;
and when the weight matrixes corresponding to the layers are arranged in sequence, the weight matrixes are arranged in a snake-shaped sequence.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the neural network mapping method for a storage and computation integrated chip described above.
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 8, the electronic device 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate operations and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present invention includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described neural network mapping method for the storage and computation integrated chip.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (20)

1. A neural network mapping method for a storage and computation integrated chip, comprising:
mapping and sorting the layers of a neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer;
according to the mapping and sorting result, sequentially arranging the weight matrices corresponding to the layers into a main array of the storage and computation integrated chip, and arranging the corresponding Bias, according to the arrangement position of the weight matrix and the minimum number of Bias rows, at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
2. The neural network mapping method according to claim 1, wherein the main array comprises a plurality of storage and computation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged onto the storage and computation integrated unit blocks in the main array;
arranging the weight matrices in a serpentine order comprises: for the weight matrix corresponding to each layer of the neural network, polling the storage and computation integrated unit blocks in the main array in a serpentine order to find the arrangement position.
3. The neural network mapping method according to claim 2, wherein mapping and sorting the layers according to the minimum number of Bias rows and the weight matrix corresponding to each layer comprises:
mapping and sorting the layers according to the minimum number of Bias rows corresponding to each layer;
mapping and sorting layers having the same minimum number of Bias rows according to the number of columns of their weight matrices.
4. The neural network mapping method according to claim 2, further comprising:
writing the weight matrices and the Bias of the layers of the neural network to be mapped onto the storage and computation integrated chip according to the arrangement result.
5. The neural network mapping method according to claim 2, wherein polling the storage and computation integrated unit blocks in the main array in a serpentine order to find the arrangement position comprises:
polling the storage and computation integrated unit blocks in the main array in a serpentine order to determine whether a position satisfying the arrangement condition of the current layer exists;
if so, arranging the weight matrix of the current layer at the position satisfying the arrangement condition of the current layer;
if not, continuing to poll the next storage and computation integrated unit block until a position satisfying the arrangement condition is found;
wherein the arrangement condition is: being side by side with the weight matrices already arranged on the current storage and computation integrated unit block in the current arrangement cycle, and being able to accommodate the weight matrix of the current layer.
6. The neural network mapping method according to claim 5, wherein, when a weight matrix is arranged, the arrangement starts from the column following the non-idle columns.
7. The neural network mapping method according to claim 5, wherein polling the storage and computation integrated unit blocks in the main array in a serpentine order to find the arrangement position further comprises:
if no position satisfying the arrangement condition is found after all the storage and computation integrated unit blocks have been polled in the serpentine order, returning to the first storage and computation integrated unit block and entering the next arrangement cycle:
judging whether the idle position of the first storage and computation integrated unit block can accommodate the weight matrix of the current layer;
if so, arranging the weight matrix of the current layer at the idle position of the first storage and computation integrated unit block;
if not, polling the storage and computation integrated unit blocks in the serpentine order until a storage and computation integrated unit block that can accommodate the weight matrix is found, and arranging the weight matrix of the current layer at the idle position of that block;
wherein, when a weight matrix is arranged, the arrangement starts from the row following the non-idle rows.
8. The neural network mapping method according to claim 1, wherein, when the corresponding Bias is arranged at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position, the Bias is arranged in the row following the non-idle rows at the position corresponding to the columns.
9. The neural network mapping method according to claim 1, further comprising:
expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array to obtain a final Bias arrangement result.
10. The neural network mapping method according to claim 9, wherein expanding the arrangement of the Bias according to the Bias arrangement result and the idle condition of the Bias array comprises:
judging, according to the number of idle rows of the Bias array, whether the number of rows occupied by all the Bias can be doubled;
if so, doubling the number of rows occupied by all the Bias;
if not, selecting Bias for expansion according to a preset rule.
11. The neural network mapping method according to any one of claims 1 to 9, further comprising:
dividing the storage and computation integrated unit array of the storage and computation integrated chip into a main array and a Bias array;
dividing the main array into a plurality of storage and computation integrated unit blocks.
12. The neural network mapping method according to any one of claims 1 to 9, further comprising:
obtaining parameters of the neural network to be mapped and parameters of the target storage and computation integrated chip, the parameters of the neural network to be mapped including the weight matrix and the Bias corresponding to each layer;
obtaining the minimum number of Bias rows corresponding to each layer according to the Bias of each layer and the parameters of the storage and computation integrated chip.
13. A storage and computation integrated chip, comprising a storage and computation integrated unit array for performing neural network operations, the storage and computation integrated unit array comprising a main array and a Bias array, wherein the weight matrices corresponding to the layers of a neural network are mapped in the main array and the Bias corresponding to the layers is mapped in the Bias array;
the weight matrices corresponding to the layers are sorted based on the minimum number of Bias rows and the number of columns of the weight matrices, and are arranged on the main array in a serpentine manner according to the sorting result; the Bias is arranged in the Bias array at the position corresponding to the columns of the arrangement position of the corresponding weight matrix.
14. The storage and computation integrated chip according to claim 13, wherein the main array comprises a plurality of storage and computation integrated unit blocks distributed in an array, and the weight matrix corresponding to each layer is arranged onto the storage and computation integrated unit blocks in the main array;
wherein, for the weight matrix corresponding to each layer of the neural network, the storage and computation integrated unit blocks in the main array are polled in a serpentine order to find the corresponding arrangement position.
15. The storage and computation integrated chip according to claim 13, wherein the sorting principle is:
mapping and sorting the layers according to the minimum number of Bias rows corresponding to each layer;
mapping and sorting layers having the same minimum number of Bias rows according to the number of columns of their weight matrices.
16. The storage and computation integrated chip according to claim 13, wherein, in addition to being arranged in the Bias array at the position corresponding to the columns of the arrangement position of the corresponding weight matrix, the Bias arrangement expands the number of rows occupied by each Bias based on the minimum number of Bias rows and the idle condition of the Bias array.
17. A storage and computation integrated chip, comprising a storage and computation integrated unit array for performing neural network operations, the storage and computation integrated unit array comprising a main array and a Bias array, wherein the weight matrices corresponding to the layers of a neural network are mapped in the main array and the Bias corresponding to the layers is mapped in the Bias array;
the arrangement of the weight matrices and the corresponding Bias is generated according to the neural network mapping method of any one of claims 1 to 12.
18. A neural network mapping apparatus for a storage and computation integrated chip, comprising:
a sorting module, which maps and sorts the layers of a neural network to be mapped according to the minimum number of Bias rows and the weight matrix corresponding to each layer;
a weight arrangement module, which sequentially arranges the weight matrices corresponding to the layers into a main array of the storage and computation integrated chip according to the mapping and sorting result;
an offset arrangement module, which arranges the corresponding Bias, according to the arrangement position of the weight matrix and the minimum number of Bias rows, at the position in the Bias array of the storage and computation integrated chip corresponding to the columns of the arrangement position;
wherein, when the weight matrices corresponding to the layers are arranged in sequence, they are arranged in a serpentine order.
19. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the neural network mapping method of any one of claims 1 to 12.
20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the neural network mapping method of any one of claims 1 to 12.
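As a companion to claims 9, 10, and 16, the following is a minimal sketch of the Bias row expansion, not the claimed implementation: when the Bias array has enough idle rows, every Bias doubles the rows it occupies, so each storage and computation cell holds a smaller share of the bias value and contributes less current noise; otherwise only some entries are expanded, and the selection rule used here (smallest occupied-row counts first) is purely an assumption standing in for the unspecified "preset rule".

```python
def expand_bias_rows(bias_plan, total_bias_rows):
    """bias_plan: list of dicts {"name": str, "rows": int}, where "rows" is
    the number of Bias rows a layer currently occupies in the Bias array
    (initially its minimum row count)."""
    used = sum(entry["rows"] for entry in bias_plan)
    free = total_bias_rows - used
    if used <= free:
        # Enough idle rows: double the rows occupied by every Bias, halving
        # the value each cell must hold.
        for entry in bias_plan:
            entry["rows"] *= 2
        return bias_plan
    # Not enough to double everything: expand selected entries while idle
    # rows remain (assumed preset rule: smallest occupied-row counts first).
    for entry in sorted(bias_plan, key=lambda e: e["rows"]):
        if entry["rows"] <= free:
            free -= entry["rows"]
            entry["rows"] *= 2
    return bias_plan

# Hypothetical usage with an 8-row Bias array:
plan = [{"name": "fc1", "rows": 2}, {"name": "fc2", "rows": 1},
        {"name": "fc3", "rows": 1}]
print(expand_bias_rows(plan, 8))  # all three entries double: rows 4, 2, 2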