Disclosure of Invention
The invention provides a multi-bit in-memory computing array structure and electronic equipment, to solve the technical problem that existing in-memory computing array structures support only multiply-accumulate in-memory computation with single-bit inputs and weights, and can therefore provide only limited system-level inference precision.
The invention adopts the following technical scheme: a multi-bit in-memory computing array structure comprises:
A plurality of voltage-controlled delay circuits arranged in a rectangular array, wherein each column corresponds to a row of bit input values and each row corresponds to a column of bit weight values; each voltage-controlled delay circuit has an input end, an output end, a voltage-controlled end and at least one control end; in each row, the output end of each voltage-controlled delay circuit is connected with the input end of the voltage-controlled delay circuit in the next column; the same control ends of each column are connected with and receive the corresponding bit weight values, and the voltage-controlled ends of each row are connected with and receive the corresponding bit input values; when the control end signal is 0, a reference signal passing through the corresponding input end and output end undergoes a delay one; when the control end signal is 1, the reference signal passing through the corresponding input end and output end undergoes a delay two; the delay two is the sum of the delay one and a delay adjustment quantity, and the delay adjustment quantity is in linear positive correlation with the corresponding voltage-controlled end signal; each voltage-controlled delay circuit comprises a high-bit voltage-controlled delay unit and a low-bit voltage-controlled delay unit; each voltage-controlled delay unit has the input end, the output end, the voltage-controlled end and two control ends with opposite signals, namely a control end one and a control end two; the output end of each high-bit voltage-controlled delay unit is connected with the input end of the next high-bit voltage-controlled delay unit, and the output end of each low-bit voltage-controlled delay unit is connected with the input end of the next low-bit voltage-controlled delay unit; and in each voltage-controlled delay circuit, the control end one of the high-bit voltage-controlled delay unit is connected with the control end one of the low-bit voltage-controlled delay unit, and the control end two of the high-bit voltage-controlled delay unit is connected with the control end two of the low-bit voltage-controlled delay unit;
the control end one of each row of voltage-controlled delay circuits is connected with a local bit line LBL of the memory array, and the control end two is connected with a local bit line LBLB;
Wherein, the multiply-accumulate calculation result of a plurality of bit input values and a plurality of bit weight values is represented by the accumulation of the delay adjustment amounts of the multi-column voltage-controlled delay circuits; the computational array structure implements multiply-accumulate computation by:
(1) Taking the sum of four times of delay adjustment quantity generated by each row of high-bit voltage-controlled delay units and delay adjustment quantity generated by each row of low-bit voltage-controlled delay units as a total adjustment quantity;
(2) Taking the number of unit adjustment amounts contained in the total adjustment amount as a multiplication accumulation result of a plurality of bit input values and a single bit weight value;
(3) The multi-row voltage-controlled delay circuit is used for corresponding to a plurality of bit weight values, and a plurality of total adjustment amounts are added according to weight ratio to obtain multiplication calculation results of a plurality of bit input values and a plurality of bit weight values;
(4) The total adjustment quantity generated by combining the multiple rows of voltage-controlled delay circuits in a column form is used for representing the multiply-accumulate calculation result of the multiple bit input values and the multiple bit weight values.
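Steps (1) to (4) can be written compactly as follows. This is only a sketch under stated assumptions: the symbols N (number of cascaded columns), x_j and W_j (the multi-bit input and weight of column j), h_j and l_j (the values of the high and low two-bit halves of x_j), w_{j,b} (bit b of W_j) and D^H_b, D^L_b, T_b (accumulated adjustment quantities for weight bit b) are introduced here for illustration and do not appear in the original text. The factor of four presumes a two-bit low half, as in the four-bit case described later.

```latex
\begin{aligned}
x_j &= 4h_j + l_j, \qquad h_j,\, l_j \in \{0,1,2,3\}, \qquad W_j = \sum_{b} 2^{b}\, w_{j,b},\\
D^{H}_{b} &= \Delta t \sum_{j=1}^{N} w_{j,b}\, h_j, \qquad
D^{L}_{b} = \Delta t \sum_{j=1}^{N} w_{j,b}\, l_j,\\
T_{b} &= 4\,D^{H}_{b} + D^{L}_{b} = \Delta t \sum_{j=1}^{N} w_{j,b}\, x_j,\\
\sum_{j=1}^{N} x_j W_j &= \sum_{b} 2^{b}\, \frac{T_{b}}{\Delta t}.
\end{aligned}
```

Here T_b / Δt is the number of unit adjustment quantities of step (2), and the last line is the weighted addition of step (3) carried over the N cascaded columns of step (4).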
The invention characterizes the calculation result by the rising-edge delay adjustment quantity of the reference signal. Because rising-edge delay adjustments accumulate, multiply-accumulate calculation of multi-bit inputs and multi-bit weights only requires combining multiple columns of the multi-bit in-memory computing array in row form: the reference signal output by each voltage-controlled delay circuit in the former column is the reference signal received by the corresponding voltage-controlled delay circuit in the latter column, so that the bit calculation result of the former column is carried into the calculation result of the corresponding bit in the next column. The number of columns of voltage-controlled delay circuits determines the maximum number of accumulated terms that the structure can handle, and also determines the number of groups of multi-bit multiply-accumulate operations that the structure can carry out simultaneously. The computing array structure can thus provide higher system-level inference precision and efficiency, and solves the problem that existing nonvolatile in-memory computing circuits generally support only multiply-accumulate in-memory computation with single-bit inputs and weights and can provide only limited system-level inference precision.
Further, each voltage controlled delay unit includes:
The input control circuit is used for generating an input voltage analog quantity of the voltage control end according to signals of the first control end and the second control end;
The trigger circuit is connected with the input control circuit and is used for outputting delay adjustment quantity of the pulse width of the reference signal according to the signal of the input end;
And the inverter circuit is connected with the trigger circuit, is used for shaping the waveform output by the trigger circuit and is provided with the output end.
Still further, the trigger circuit comprises NMOS transistors N1-N3 and a PMOS transistor P1, wherein the gate of N1 is connected with the reference signal, its source is grounded and its drain is connected with node X1; the gate of N2 is connected with the reference signal, its source with node X1 and its drain with node X2; the gate of N3 is connected with node X2, its source with node X1 and its drain with node X3; the gate of P1 is connected with the reference signal, its source with node X2 and its drain with a power supply VDD; and the gates of N1, N2 and P1 serve as the input end;
the input control circuit comprises NMOS transistors N4 and N5, wherein the gate of N4 serves as the control end one and is connected with a local bit line LBL, its source is connected with node X3 and its drain serves as the voltage-controlled end; the gate of N5 serves as the control end two and is connected with a local bit line LBLB, its source is grounded and its drain is connected with node X3;
the inverter circuit comprises an NMOS transistor N6 and a PMOS transistor P2, wherein the gate of N6 is connected with node X2, its source is grounded and its drain serves as the output end; the gate of P2 is connected with node X2, its source serves as the output end and its drain is connected with the power supply VDD.
Further, the in-memory computing array structure further includes:
The quantization module is used for converting the pulse width of the reference signal output by any voltage-controlled delay circuit in the last column from a time analog quantity into a binary digital signal;
And the digital shift adder module is used for amplifying the quantization result of the high-order voltage-controlled delay unit by four times and then adding the quantization result of the low-order voltage-controlled delay unit to obtain a multiplication result of the single-bit input value and the single-bit weight value, and carrying out weighted summation on the multiplication results of different bits according to the weight value to obtain a multiplication accumulation calculation result of the multi-bit input value and the multi-bit weight value.
As a further improvement of the scheme, the reference signal is a rectangular-wave reference signal; when the signal of control end one is 0 and the signal of control end two is 1, delay one is a fixed value t0 and is not affected by the voltage of the voltage-controlled end; when the signal of control end one is 1 and the signal of control end two is 0, delay two is t0 + kΔt, where k is the binary value of the voltage-controlled end signal and Δt is the unit adjustment quantity, representing the minimum adjustment of the reference signal by the voltage-controlled delay circuit;
The computing array structure obtains the multiply-accumulate calculation result of a plurality of bit input values and a plurality of bit weight values from the number of unit adjustment quantities Δt contained in the delay adjustment quantity.
Further, the number of the bit input values and the bit weight values is four, and the in-memory computing array structure further comprises:
And the input signal column channel module is used for dividing four bit input values into a high-order two-bit signal and a low-order two-bit signal, respectively controlling the output voltage of the input column channel I of the high-order voltage-controlled delay unit and the output voltage of the input column channel II of the low-order voltage-controlled delay unit, and determining the output voltage value by the corresponding two-bit signal.
As a further improvement of the above solution, the in-memory computing array structure further includes:
the storage array comprises a plurality of storage units which are arranged in a rectangular shape and used for storing a plurality of bit weight values and supporting the switching of a standard read-write mode and a multi-bit multiply-accumulate computing mode, wherein in the multi-bit multiply-accumulate computing mode, the computing array structure realizes multiply-accumulate computation of a plurality of bit input values and a plurality of bit weight values, and a first storage node and a second storage node of each storage unit are respectively connected with local bit lines LBL and LBLB.
Further, the in-memory computing array structure further includes:
The local read-write module is used for transmitting read-write signals through a horizontal word line HWL and global bit lines GBL/GBLB to complete read-write operations on the memory cells in the standard read-write mode, and comprises NMOS transistors N7 and N8, wherein the gate of N7 is connected with the horizontal word line HWL, its drain with the local bit line LBL and its source with the global bit line GBL; the gate of N8 is connected with the horizontal word line HWL, its drain with the local bit line LBLB and its source with the global bit line GBLB; when the horizontal word line HWL and the word line WL of the n-th row of memory cells are turned on, write signals are transmitted to the local bit lines LBL/LBLB through the global bit lines GBL/GBLB and data are then written into the n-th row of memory cells, n being any row number of the memory cells.
The invention also provides electronic equipment which comprises a memory and a processor, wherein the memory comprises the multi-bit in-memory computing array structure.
Compared with the existing in-memory computing array structure and electronic equipment, the multi-bit in-memory computing array structure and electronic equipment have the following beneficial effects:
The multi-bit in-memory computing array structure characterizes the calculation result by adjusting the rising-edge delay of a reference signal. Because rising-edge delays accumulate, multiply-accumulate calculation of multi-bit inputs and multi-bit weights only requires combining multiple columns of the multi-bit in-memory computing array in row form: the signal output by each voltage-controlled delay circuit in the former column serves as the input of the corresponding voltage-controlled delay circuit in the latter column, so that the bit calculation result of the former column is carried into the calculation result of the corresponding bit in the next column. The number of columns of voltage-controlled delay circuits determines the maximum number of accumulated terms that the structure can handle, and also determines the number of groups of multi-bit multiply-accumulate operations that the structure can carry out simultaneously. The computing array structure can thus provide higher system-level inference precision and efficiency, and solves the problem that existing nonvolatile in-memory computing circuits generally support only multiply-accumulate in-memory computation with single-bit inputs and weights and can provide only limited system-level inference precision.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Referring to fig. 1 to 5, the present embodiment provides a multi-bit in-memory computing array structure, where the core structure of the in-memory computing array structure is a plurality of voltage-controlled delay circuits, in this embodiment, the in-memory computing array structure further includes a memory array, a quantization module, a digital shift adder module, an input signal column channel module, and a local read-write module, and in other embodiments, the in-memory computing array structure may further include other modules or circuits, which may be set according to actual needs.
The voltage-controlled delay circuits are arranged in a rectangular shape, i.e. the voltage-controlled delay circuits are arranged in a matrix array, similar to a memory array. The columns of the matrix array correspond to a row of bit input values and the rows correspond to a column of bit weight values. That is, the voltage-controlled delay circuits in each column respectively correspond to the bit input values, and the voltage-controlled delay circuits in each row respectively correspond to the bit weight values. Each voltage-controlled delay circuit has an input terminal, an output terminal, a voltage-controlled terminal, and at least one control terminal. In each row, the output end of each voltage-controlled delay circuit is connected with the input end of the next column. The same control terminal of each column is connected to and receives corresponding bit weight values, and the voltage control terminal of each row is connected to and receives corresponding bit input values. When the control terminal signal is 0, the reference signal generates a delay one through the corresponding input terminal and output terminal. When the control end signal is 1, the reference signal generates a second delay through the corresponding input end and output end. The second delay is the sum of the first delay and the delay adjustment quantity, and the delay adjustment quantity and the corresponding voltage-controlled end signal are in linear positive correlation. The in-memory calculation array structure is used for representing multiply-accumulate calculation results of a plurality of bit input values and a plurality of bit weight values through delay adjustment quantity accumulation of a plurality of columns of voltage-controlled delay circuits.
With continued reference to fig. 2, the memory array includes a plurality of memory cells arranged in a rectangular shape and configured to store a plurality of bit weight values. The storage arrays may be arranged in groups, each group for storing a bit weight value. The memory unit is preferably a 6T-SRAM in this embodiment, and the memory array is configured to switch between a standard read-write mode and a multi-bit multiply-accumulate mode, where the in-memory computing array structure implements multiply-accumulate computations for a plurality of bit input values and a plurality of bit weight values. The first and second storage nodes of the memory cell are connected to local bit lines LBL and LBLB, respectively.
In some embodiments, including the present embodiment, the reference signal is a rectangular-wave reference signal. When the signal at control end one is 0 and the signal at control end two is 1, delay one is not affected by the voltage value of the voltage-controlled end and is defined as t0. When the signal at control end one is 1 and the signal at control end two is 0, delay two is t0 + 0Δt when the voltage-controlled end signal is 00, t0 + 1Δt when it is 01, t0 + 2Δt when it is 10, and t0 + 3Δt when it is 11. Δt is defined as the unit adjustment quantity, that is, the minimum adjustment of the reference signal by the voltage-controlled delay circuit. The in-memory computing array structure obtains the multiply-accumulate calculation result of a plurality of bit input values and a plurality of bit weight values from the number of unit adjustment quantities contained in the delay adjustment quantity.
With continued reference to fig. 3, 4 and 5, in this embodiment each voltage-controlled delay circuit includes two voltage-controlled delay units, namely a high-order voltage-controlled delay unit and a low-order voltage-controlled delay unit. Each voltage-controlled delay unit has an input end IN, an output end OUT, a voltage-controlled end Vin (denoted A for the high-order voltage-controlled delay unit and B for the low-order voltage-controlled delay unit) and two control ends with opposite signals, namely a control end C and a control end CB. The delay adjustment quantity of a voltage-controlled delay unit is determined jointly by the stored weight and the voltage value of the input signal, so that the multiplication of a 2-bit input and a 1-bit weight can be realized. In the multi-bit in-memory computing mode, the voltage-controlled delay units delay the rising edge of the input rectangular-wave reference signal, and because the units are connected in series stage by stage, multiply-accumulate computation of multi-bit inputs and multi-bit weights can be realized.
The output end OUT of each high-order voltage-controlled delay unit is connected with the input end IN of the next high-order voltage-controlled delay unit; the control ends of a high-order voltage-controlled delay unit are connected only with the control ends of the low-order voltage-controlled delay unit of the same circuit and not with the control ends of the next high-order voltage-controlled delay unit. Likewise, the output end OUT of each low-order voltage-controlled delay unit is connected with the input end IN of the next low-order voltage-controlled delay unit; the control ends of a low-order voltage-controlled delay unit are connected only with the control ends of the high-order voltage-controlled delay unit of the same circuit and not with the control ends of the next low-order voltage-controlled delay unit. In each voltage-controlled delay circuit, the first control end C of the high-order voltage-controlled delay unit is connected with the first control end C of the low-order voltage-controlled delay unit, and the second control end CB of the high-order voltage-controlled delay unit is connected with the second control end CB of the low-order voltage-controlled delay unit. The first control end of each row of voltage-controlled delay circuits is connected with a local bit line LBL of the memory array, and the second control end of each row of voltage-controlled delay circuits is connected with a local bit line LBLB of the memory array. The number of unit adjustment quantities contained in the rising-edge delay adjustment that a high-order or low-order voltage-controlled delay unit applies to the reference signal is the multiplication result of the corresponding 2-bit input and 1-bit weight.
The calculation array structure of the present embodiment realizes multiply-accumulate calculation by:
(1) Taking the sum of four times of delay adjustment quantity generated by each row of high-bit voltage-controlled delay units and delay adjustment quantity generated by each row of low-bit voltage-controlled delay units as a total adjustment quantity;
(2) Taking the number of unit adjustment amounts contained in the total adjustment amount as a multiplication accumulation result of a plurality of bit input values and a single bit weight value;
(3) The multi-row voltage-controlled delay circuit is used for corresponding to a plurality of bit weight values, and a plurality of total adjustment amounts are added according to weight ratio to obtain multiplication calculation results of a plurality of bit input values and a plurality of bit weight values;
(4) The total adjustment quantity generated by combining the multiple rows of voltage-controlled delay circuits in a column form is used for representing the multiply-accumulate calculation result of the multiple bit input values and the multiple bit weight values.
Each column realizes the multiplication of one multi-bit input and one multi-bit weight; different columns realize the multiplication of other groups of multi-bit inputs and multi-bit weight values, and the multiply-accumulate result of the multi-bit inputs and multi-bit weight values is obtained by cascading multiple columns. Within the same column, all voltage-controlled delay circuits share the same voltage-controlled input voltages A and B, whereas the input controls C and CB differ between the voltage-controlled delay units of the column and carry the data-weight bits of the corresponding bit positions. The final outputs of the different groups of voltage-controlled delay units therefore characterize the multiplication results of the data input and the data-weight bit of the corresponding position. Take four times the adjustment that the high-order voltage-controlled delay unit applies to the rising-edge time of the reference signal, plus the adjustment that the low-order voltage-controlled delay unit applies to it: the number of Δt contained in this sum is the multiplication result of the input and the weight bit. Here Δt is the unit adjustment quantity, namely the minimum adjustment of the reference signal by the voltage-controlled delay circuit.
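The delay-domain multiply-accumulate described above can be checked with a small behavioural model. The sketch below is illustrative only and is not the circuit itself: the function names, the list-based interface and the choice of four-bit operands are assumptions made here, and the values t0 = 26 ps and Δt = 6 ps are the ones quoted for fig. 9.

```python
T0_PS = 26  # fixed delay per unit (value quoted for fig. 9)
DT_PS = 6   # unit adjustment quantity (value quoted for fig. 9)


def unit_delay(c, k):
    """Behavioural delay of one unit: t0 if C = 0, t0 + k*dt if C = 1."""
    return T0_PS + (k * DT_PS if c else 0)


def mac_by_delay(inputs, weights, n_bits=4):
    """Multiply-accumulate of 4-bit inputs and n_bits-wide weights via delays.

    Each (input, weight) pair models one column; each weight bit models
    one row made of a high-order and a low-order delay chain.
    """
    n_cols = len(inputs)
    result = 0
    for b in range(n_bits):
        # Accumulated delay along the high-order and low-order chains of row b.
        high = sum(unit_delay((w >> b) & 1, x >> 2) for x, w in zip(inputs, weights))
        low = sum(unit_delay((w >> b) & 1, x & 0b11) for x, w in zip(inputs, weights))
        # Remove the fixed n_cols * t0 part to keep only the adjustment.
        adj_high = high - n_cols * T0_PS
        adj_low = low - n_cols * T0_PS
        total = 4 * adj_high + adj_low          # step (1)
        count = total // DT_PS                  # step (2)
        result += count << b                    # step (3), weights 8/4/2/1
    return result


inputs = [3, 15, 7, 9]
weights = [5, 2, 15, 8]
print(mac_by_delay(inputs, weights))
```

With these sample operands the model reproduces the ordinary dot product of the two lists, which is the property the cascaded columns are meant to provide.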
As one design, each voltage-controlled delay unit in this embodiment includes an input control circuit, a trigger circuit and an inverter circuit; in other embodiments, the structure of the voltage-controlled delay unit may be adjusted as appropriate. The input control circuit is used for generating the analog value of the input voltage at the voltage-controlled end, and has the control end one C, the control end two CB and the voltage-controlled end Vin. The trigger circuit is connected with the input control circuit, is used for outputting the delay adjustment quantity of the pulse width of the reference signal, and has the input end IN. The inverter circuit is connected with the trigger circuit, is used for shaping the waveform output by the trigger circuit, and has the output end OUT. C and CB are connected with LBL and LBLB of the memory array, the output of the input control circuit is connected with the trigger circuit, the output of the trigger circuit is connected with the input of the inverter, the input of the trigger circuit is the rectangular-wave reference signal IN, and the output port of the inverter is the output port OUT of the voltage-controlled delay unit.
The trigger circuit is a Schmitt trigger circuit and comprises NMOS transistors N1-N3 and a PMOS transistor P1. The grid electrode of N1 is connected with the reference signal, the source electrode is grounded, the drain electrode is connected with the node X1, the grid electrode of N2 is connected with the reference signal, the source electrode is connected with the node X1, and the drain electrode is connected with the node X2. N3 has a gate connected to node X2, a source connected to node X1, and a drain connected to node X3. The grid electrode of P1 is connected with the reference signal, the source electrode is connected with the node X2, and the drain electrode is connected with the power supply VDD. Wherein the gates of N1, N2, P1 are used as input terminals.
The input control circuit comprises NMOS transistors N4 and N5. The gate of N4 is used as the first control terminal and is connected to the local bit line LBL, the source is connected to node X3, and the drain is used as the voltage control terminal. The gate of N5 is used as the second control terminal and is connected to the local bit line LBLB, the source is grounded, and the drain is connected to the node X3. When the input control C is 0, the voltage-controlled delay circuit delays the rising edge of the reference signal by t0. When the input control C is 1, the circuit further adjusts the rising-edge time on top of the delay t0, such that the rising-edge delay of the reference signal is in linear positive correlation with the voltage-controlled input Vin.
The inverter circuit comprises an NMOS tube N6 and a PMOS tube P2. The grid electrode of N6 is connected with the node X2, the source electrode is grounded, and the drain electrode is used as the output end. The gate of P2 is connected to node X2, the source is the output, and the drain is connected to power supply VDD. I.e. the drain of N6 is connected to the source of P2 and serves as output OUT.
The voltage-controlled delay unit has the following functions:
When C = 0 and CB = 1 (CB being the inverse of C), the delay of the voltage-controlled delay circuit is not affected by the voltage value at the voltage-controlled port, and the rising edge of the rectangular wave is delayed by t0 after passing through the voltage-controlled delay circuit.
When C = 1 and CB = 0, there are the following cases:
When Vin = V00, the rising-edge delay is t0 + 0Δt;
When Vin = V01, the rising-edge delay is t0 + 1Δt;
When Vin = V10, the rising-edge delay is t0 + 2Δt;
When Vin = V11, the rising-edge delay is t0 + 3Δt.
That is, when the signal at control end one is 0 and the signal at control end two is 1, delay one is a fixed value t0 and is not affected by the voltage at the voltage-controlled end. When the signal at control end one is 1 and the signal at control end two is 0, delay two is t0 + kΔt, where k is the binary value of the voltage-controlled end signal and Δt is the unit adjustment quantity, representing the minimum adjustment of the reference signal by the voltage-controlled delay circuit. The computing array structure of the present embodiment obtains the multiply-accumulate calculation result of the plurality of bit input values and the plurality of bit weight values from the number of unit adjustment quantities Δt contained in the delay adjustment quantities.
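Table 1. Function table of the voltage-controlled delay unit
C    CB    Vin    Rising-edge delay
0    1     any    t0
1    0     V00    t0 + 0Δt
1    0     V01    t0 + 1Δt
1    0     V10    t0 + 2Δt
1    0     V11    t0 + 3Δt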
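The function of a single voltage-controlled delay unit can be expressed as a short behavioural function. This is a sketch under assumptions: the function name and its argument names are hypothetical, k stands for the binary value selected by Vin (0 to 3 for V00 to V11), and the numeric values t0 = 26 ps and Δt = 6 ps used in the example are the ones quoted for fig. 8 and fig. 9.

```python
def rising_edge_delay(c, cb, k, t0, dt):
    """Rising-edge delay of one voltage-controlled delay unit (behavioural).

    c, cb : complementary control signals (stored weight bit and its inverse)
    k     : binary value selected by Vin, 0..3 for V00..V11
    t0    : fixed delay
    dt    : unit adjustment quantity
    """
    if c not in (0, 1) or cb != 1 - c:
        raise ValueError("C must be 0 or 1 and CB must be its inverse")
    if k not in (0, 1, 2, 3):
        raise ValueError("k must be in 0..3")
    # C = 0: Vin has no effect. C = 1: delay grows linearly with k.
    return t0 + (k * dt if c == 1 else 0)


# Enumerate the function table for t0 = 26 ps and dt = 6 ps.
for c in (0, 1):
    for k in range(4):
        print(c, 1 - c, k, rising_edge_delay(c, 1 - c, k, t0=26, dt=6))
```

The check on C and CB reflects the requirement that the two control ends carry opposite signals; a real unit has no such guard, so it is purely a modelling convenience.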
The quantization module is used for converting the pulse width of the reference signal output by any voltage-controlled delay circuit in the last column from a time analog quantity into a binary digital signal. In this embodiment, the quantization module is a TDC (time-to-digital converter) quantization module, which quantizes the rising-edge delay of the output of the last column of the array and converts the time quantity into a binary code. The quantized data may be the multiply-accumulate result of multi-bit inputs and multi-bit weights over 32 columns (the number of columns may be changed as needed).
The digital shift adder module is used for amplifying the quantization result of the high-order voltage-controlled delay unit by four times and then adding the quantization result of the corresponding low-order voltage-controlled delay unit to obtain the multiplication result of the corresponding bit input and weight. The digital shift adder module is further used for carrying out a weighted summation of the multiplication results of the different bit inputs and weights according to their bit relation, so as to obtain the multiply-accumulate calculation result of the multi-bit input values and multi-bit weight values.
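The shift-and-add step can be sketched in a few lines. The function name and the list layout (index b holding the quantized count for weight bit b, with b = 0 the least significant bit) are assumptions for illustration; the hardware module may order or pipeline these operations differently.

```python
def shift_add(high_counts, low_counts):
    """Combine quantized unit counts into one multiply-accumulate value.

    high_counts[b], low_counts[b]: number of unit adjustments measured on
    the high-order and low-order chains of weight bit b (b = 0 is the LSB).
    """
    if len(high_counts) != len(low_counts):
        raise ValueError("one high and one low count is needed per weight bit")
    total = 0
    for b, (high, low) in enumerate(zip(high_counts, low_counts)):
        per_bit = (high << 2) + low   # high-order result amplified four times
        total += per_bit << b         # weighted by the bit position
    return total


print(shift_add([1, 0, 2, 3], [2, 3, 0, 1]))
```

For four weight bits the left shift by b applies exactly the 8/4/2/1 weighting from high bit to low bit.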
The local read-write module is used for transmitting read-write signals through the horizontal word line HWL and the global bit line GBL/GBLB in a standard read-write mode so as to complete read-write operation of the memory cell. The local read-write module comprises a seventh NMOS tube N7 and an eighth NMOS tube N8. The seventh NMOS transistor N7 has a gate connected to the horizontal word line HWL, a drain connected to the local bit line LBL, and a source connected to the global bit line GBL. The eighth NMOS transistor N8 has a gate connected to the horizontal word line HWL, a drain connected to the local bit line LBLB, and a source connected to the global bit line GBLB. When the horizontal word lines HWL and the word lines WL of the n-th row of memory cells are turned on, write signals are transmitted to the local bit lines LBL/LBLB through the global bit lines GBL/GBLB, then data are written into the n-th row of memory cells, and n is any row number of the memory cells.
The present embodiment takes as an example the case in which the number of bit input values and of bit weight values is four, that is, the multiply-accumulate result of the four-bit input IN3IN2IN1IN0 and the four-bit weight W3W2W1W0 is realized. Four groups of voltage-controlled delay units are used, one for each of the four bits, so that the multiplication result of the data input with each bit of the data weight is obtained from the corresponding group. The multiplication result of the four-bit input IN3IN2IN1IN0 and the four-bit weight W3W2W1W0 is then obtained by adding the per-bit multiplication results, from high bit to low bit, according to the weight ratio 8/4/2/1.
The input signal column channel module is used for dividing four bit input values IN3IN2IN1IN0 into a high-order two-bit signal IN3IN2 and a low-order two-bit signal IN1IN0, the high-order two-bit signal IN3IN2 and the low-order two-bit signal IN1IN0 respectively control output voltages of an input column channel I connected with the high-order voltage-controlled delay unit and an input column channel II connected with the low-order voltage-controlled delay unit, and the voltage values are determined by the corresponding two-bit signals.
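The splitting performed by the input signal column channel module can be illustrated as follows. This is a sketch with loudly stated assumptions: the function names are hypothetical, and the four voltage levels are the VIN values quoted for fig. 8 (0, 175 mV, 338 mV, 576 mV), which are assumed here to correspond to the two-bit codes 00, 01, 10 and 11 in that order.

```python
# Assumed mapping from a two-bit code to a channel voltage in mV,
# taken from the VIN values quoted for fig. 8.
LEVELS_MV = (0, 175, 338, 576)


def split_input(x):
    """Split a four-bit input into (IN3IN2, IN1IN0) as two-bit values."""
    if not 0 <= x <= 15:
        raise ValueError("input must be a four-bit value")
    return x >> 2, x & 0b11


def channel_voltages(x):
    """Voltages driven on input column channel one (A) and two (B)."""
    high, low = split_input(x)
    return LEVELS_MV[high], LEVELS_MV[low]


print(split_input(0b1101))
print(channel_voltages(0b1101))
```

Channel one feeds the high-order voltage-controlled delay units and channel two feeds the low-order ones, so one four-bit input produces one pair of analog levels per column.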
Referring to figs. 6-13, this embodiment can implement multiplication of the four-bit input IN3IN2IN1IN0 and the four-bit weight W3W2W1W0. The core idea is that the calculation result is represented by the rising-edge delay adjustment of the reference signal. Because rising-edge delay adjustments accumulate, multiply-accumulate calculation of the four-bit input IN3IN2IN1IN0 and the four-bit weight W3W2W1W0 only requires combining multiple multi-bit in-memory computing arrays column by column: the reference signal output by each voltage-controlled delay circuit in the former column is the reference signal received by the corresponding voltage-controlled delay circuit in the latter column. Here "corresponding" refers to the high-order or low-order delay circuits of the same bit; for example, the high-order delay circuit of the second bit in the former column corresponds to the high-order delay circuit of the second bit in the latter column. The bit calculation result of the former column is thereby carried into the calculation result of the corresponding bit in the next column. The multi-bit in-memory computing array of this embodiment can be combined in this way into a multi-bit in-memory computing array structure, so that multiply-accumulate computation of the four-bit input IN3IN2IN1IN0 and the four-bit weight W3W2W1W0 is realized and higher system-level inference precision and efficiency are provided. This solves the problem that existing non-volatile in-memory computing circuits usually support only multiply-accumulate computation of single-bit inputs and weights and can therefore provide only limited system-level inference precision.
In fig. 8, VIN is 0, 175 mV, 338 mV and 576 mV, and the corresponding delays are 26 ps, 32 ps, 38 ps and 44 ps, respectively. In fig. 9, INPUT = 503 ps, t0 = 26 ps and Δt = 6 ps, and the outputs are OUT7 = 649 ps, OUT6 = 637 ps, OUT5 = 625 ps, OUT4 = 637 ps, OUT3 = 643 ps, OUT2 = 643 ps, OUT1 = 631 ps and OUT0 = 631 ps. The quantized results are therefore: OUT7: [(649-503)-4t0]/Δt = 7; OUT6: [(637-503)-4t0]/Δt = 5; OUT5: [(625-503)-4t0]/Δt = 3; OUT4: [(637-503)-4t0]/Δt = 5; OUT3: [(643-503)-4t0]/Δt = 6; OUT2: [(643-503)-4t0]/Δt = 6; OUT1: [(631-503)-4t0]/Δt = 4; OUT0: [(631-503)-4t0]/Δt = 4. In figs. 10-13, the standard deviations (std) are 2 ps, 2.5 ps, 3.1 ps and 4.2 ps, respectively; over about 100 Monte Carlo runs the fluctuation stays below 5 ps, which is very small.
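The fig. 9 arithmetic can be reproduced with a few lines of Python. The sketch below only re-evaluates the expression [(OUT - INPUT) - 4t0]/Δt with the values given above; the variable and function names are illustrative, and the four-stage count is taken from the factor 4 in that expression.

```python
INPUT_PS = 503  # rising-edge time of the input reference signal (fig. 9)
T0_PS = 26      # base delay t0 of one stage
DT_PS = 6       # unit delay adjustment
STAGES = 4      # number of cascaded stages, from the 4*t0 term

OUT_PS = {
    "OUT7": 649, "OUT6": 637, "OUT5": 625, "OUT4": 637,
    "OUT3": 643, "OUT2": 643, "OUT1": 631, "OUT0": 631,
}


def quantize(out_ps: int) -> int:
    """Return the number of unit adjustments carried by one output edge."""
    adjustment_ps = (out_ps - INPUT_PS) - STAGES * T0_PS
    count, remainder = divmod(adjustment_ps, DT_PS)
    if remainder != 0:
        raise ValueError("delay is not a whole multiple of the unit adjustment")
    return count


if __name__ == "__main__":
    for name, out_ps in OUT_PS.items():
        print(name, quantize(out_ps))
```

The same constants also reproduce the fig. 8 delays, since t0 + kΔt for k = 0 to 3 gives 26 ps, 32 ps, 38 ps and 44 ps.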
The control inputs of the four voltage-controlled delay units corresponding to the same bit in the four columns are 1, 0, 1 and 1, respectively, and their input voltages are A1 = V11, A2 = V10, A3 = V00 and A4 = V01. The theoretical quantization result of the TDC is 3×1 + 2×0 + 0×1 + 1×1 = 4. The rising edge of the rectangular wave is delayed by t0 + 3Δt in the first-column voltage-controlled delay circuit, by t0 + 0Δt in the second-column voltage-controlled delay circuit, by t0 + 0Δt in the third-column voltage-controlled delay circuit, and by t0 + Δt in the fourth-column voltage-controlled delay circuit, so the total delay is 4t0 + 4Δt.
Only the rising-edge delay of the rectangular wave output by the last column needs to be converted into a binary number (the count of unit delay adjustments Δt). The difference between this number and the binary number set according to the rising-edge time of the initial rectangular wave gives the multiply-accumulate result of the four columns of 2-bit × 1-bit products in this example.
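The four-column example can be followed step by step in a small behavioural model. This is a sketch under stated assumptions: the numeric values of t0, Δt and the starting edge are borrowed from fig. 9 purely for illustration, the voltages V11, V10, V00 and V01 are represented by their two-bit codes 3, 2, 0 and 1, and all names are hypothetical.

```python
T0_PS = 26  # base delay per stage (value borrowed from fig. 9)
DT_PS = 6   # unit delay adjustment (value borrowed from fig. 9)


def stage_delay_ps(control: int, level: int) -> int:
    """Delay of one stage: t0 when control is 0, t0 + level*dt when it is 1."""
    return T0_PS + (level * DT_PS if control else 0)


def propagate(start_ps: int, controls, levels) -> int:
    """Pass a rising edge through the cascaded stages and return its time."""
    edge_ps = start_ps
    for control, level in zip(controls, levels):
        edge_ps += stage_delay_ps(control, level)
    return edge_ps


def tdc_decode(end_ps: int, start_ps: int, stages: int) -> int:
    """Remove the start time and base delays, then count unit adjustments."""
    return (end_ps - start_ps - stages * T0_PS) // DT_PS


if __name__ == "__main__":
    controls = [1, 0, 1, 1]  # control inputs of the four columns
    levels = [3, 2, 0, 1]    # codes of V11, V10, V00, V01
    start_ps = 503           # assumed starting edge time
    end_ps = propagate(start_ps, controls, levels)
    print(end_ps - start_ps)                            # total delay in ps
    print(tdc_decode(end_ps, start_ps, len(controls)))  # accumulated result
```

With these inputs the total delay is 4 × 26 + 4 × 6 = 128 ps and the decoded value is 4, matching 3×1 + 2×0 + 0×1 + 1×1.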
This embodiment has described the multi-bit multiply-accumulate process of the multi-bit in-memory computing array structure, showing that the multiplication result of the former column is accumulated into the multiplication result of the next column. Further, in the multi-bit in-memory computing array structure, the number of columns of multi-bit in-memory computing arrays determines the maximum number of terms that the structure can accumulate, and the number of rows determines the number of groups of four-bit multiply-accumulate operations that the structure can perform simultaneously. For example, in this embodiment the number of multi-bit in-memory computing arrays in each column is 4, which means that 4 groups of multi-bit multiply-accumulate computations can run in parallel, and the total number of multi-bit in-memory computing arrays is 32, which means that at most 32 multi-bit multiplication results can be accumulated.
Compared with the existing in-memory computing array structure, the multi-bit in-memory computing array structure and the electronic device of the embodiment have the following beneficial effects:
The multi-bit in-memory computing array structure represents a calculation result by adjusting the rising-edge delay of a reference signal. Because rising-edge delays accumulate, multiply-accumulate computation of multi-bit inputs and multi-bit weights only requires combining multiple multi-bit in-memory computing arrays column by column: the signal output by each voltage-controlled delay circuit in the former column serves as the input of the corresponding voltage-controlled delay circuit in the latter column, so that the bit calculation result of the former column is carried into the calculation result of the corresponding bit in the next column. The number of columns of voltage-controlled delay circuits determines the maximum number of terms that the structure can accumulate, and the number of rows determines the number of groups of multi-bit multiply-accumulate operations that the structure can perform simultaneously. The computing array structure provides higher system-level inference precision and efficiency, and solves the problem that existing non-volatile in-memory computing circuits generally support only multiply-accumulate computation of single-bit inputs and weights and can provide only limited system-level inference precision.
Example 2
The present embodiment provides a Static Random Access Memory (SRAM) that uses the multi-bit in-memory computing array structure of embodiment 1 to implement multiply-accumulate computation of multi-bit inputs and multi-bit weights, and can implement multiply-accumulate computation of four-bit inputs and four-bit weights.
Based on the multi-bit in-memory computing array structure of embodiment 1, the SRAM of this embodiment completes multiply-accumulate operations directly in the memory cells, thereby reducing data movement and significantly reducing power consumption. The SRAM can process multiply-accumulate operations of a plurality of inputs and weights simultaneously, which greatly improves computing efficiency and makes it particularly suitable for application scenarios requiring high throughput. The read-write speed of SRAM is far higher than that of DRAM and flash memory, so low-latency multiply-accumulate computation can be realized, which suits applications with strict real-time requirements such as edge computing and Internet of Things devices. The SRAM can be integrated on the same chip with other computing units (such as a CPU or GPU) to form an efficient integrated storage-and-computing architecture.
The SRAM of this embodiment is suitable for artificial intelligence and machine learning. The inference and training processes of neural networks involve a large number of multiply-accumulate operations, and SRAM in-memory computing can significantly accelerate these operations and improve overall performance. Multi-bit inputs and weights enable the SRAM to support models ranging from simple linear models to complex deep neural networks. The in-memory computing of the SRAM of this embodiment removes the complex interface between memory and processor found in conventional computing architectures, simplifying system design. By reducing data movement and simplifying the architecture, the SRAM can reduce the overall cost and power consumption of the system.
Example 3
This embodiment provides an electronic device including a memory and a processor. The memory includes the multi-bit in-memory computing array structure of embodiment 1. Compared with existing electronic devices, this electronic device can significantly improve computing efficiency, reduce power consumption, and support high-precision computation. It has broad application prospects in fields such as artificial intelligence and edge computing and, although it still faces some technical challenges, its advantages make it an important technical direction for in-memory computing.
Example 4
The present embodiment provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor. Wherein, the memory is the static random access memory in embodiment 2.
The computer device may take various forms. It may be an embedded chip or module, or a general-purpose data processing device capable of executing a program, such as an intelligent terminal, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster formed by multiple servers).
The computer device of this embodiment includes at least, but is not limited to, a memory and a processor that can be communicatively connected to each other via a system bus. The memory (i.e., a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory may be an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device.
In some embodiments, the processor may be a central processing unit (CPU), a graphics processing unit (GPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to execute the program code stored in the memory or to process data.
Example 5
This embodiment provides an artificial intelligence chip that comprises the multi-bit in-memory computing array structure of embodiment 1 and has a standard read-write mode and a multi-bit multiply-accumulate computing mode. In the standard read-write mode, the chip performs read-write operations on the data in the storage array. In the multi-bit multiply-accumulate computing mode, the chip performs multiply-accumulate operations on a plurality of bit input values and a plurality of bit weight values. The chip can therefore process artificial intelligence tasks efficiently, is less prone to excessive energy consumption and latency, avoids the memory wall, and greatly reduces data migration and memory access overhead. Because the artificial intelligence chip of this embodiment performs multiply-accumulate operations with multi-bit input (IN) and output (OUT), it helps advance the realization of artificial intelligence chips with high inference precision.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.