

Technical Field
The present invention relates to the technical field of accumulator buffers, and in particular to an accumulator buffer structure and a data accumulation and unloading method thereof.
Background Art
A systolic array is a two-dimensional computing structure that accelerates computation in a dataflow-driven manner. Each computing unit of the systolic array can pass data to its adjacent computing units; this data reuse reduces the number of accesses to input/output data and thus lowers the memory access bandwidth requirement. A systolic array can therefore achieve high computing throughput with a small memory access bandwidth, which addresses the memory access bottleneck faced by most processors. Its advantage is especially pronounced in compute-intensive and memory-intensive workloads such as neural networks.
The matrix multiplication acceleration unit is such a two-dimensional systolic array. Its two dimensions are flexibly configurable according to performance and application requirements. As shown in Figure 1, the matrix multiplication acceleration unit consists of multiple homogeneous computing units. North-side data is transmitted through the systolic array from north to south and cached in the computing units as needed, while west-side data is transmitted through the array from west to east. When west-side data reaches a computing unit, it is multiplied by the north-side data cached inside that unit, and the product is added to the accumulated data passed in from the computing unit to the north, which completes the accumulation operation. The multiply-add result is then passed to the computing unit to the south, so that multiply-add results flow from north to south.
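The north-to-south flow of multiply-add results through one column can be illustrated with a short behavioral sketch. It is not taken from the source: the function name `column_multiply_add` and its parameters are hypothetical, and the cycle-by-cycle skew of a real systolic array is ignored so that only the arithmetic is shown.

```python
def column_multiply_add(cached_north, west_inputs, partial_sum_in=0):
    """Model one column of computing units, traversed from north to south.

    cached_north: the north-side value cached in each computing unit.
    west_inputs:  the west-side value arriving at each computing unit.
    """
    if len(cached_north) != len(west_inputs):
        raise ValueError("one west-side value is needed per computing unit")
    partial_sum = partial_sum_in
    for north_value, west_value in zip(cached_north, west_inputs):
        # Each unit multiplies its cached north-side value by the arriving
        # west-side value and adds the sum received from its north neighbor.
        partial_sum = partial_sum + north_value * west_value
    # Value leaving the southern exit of the column.
    return partial_sum


print(column_multiply_add([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The value returned here corresponds to what the accumulator buffer receives at the southern exit of the column.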
As also shown in Figure 1, the accumulator buffer is arranged at the southern exit of the matrix multiplication acceleration unit. It receives the multiply-add results transmitted by the matrix multiplication acceleration unit, accumulates them, and caches the accumulated values. When the accumulator buffer receives an unload signal, the current round of accumulation ends and the data in the accumulator buffer is written back to the local data memory, after which the data in the accumulator buffer must be unloaded. Only when the unloading operation has finished can the accumulator buffer begin the next round of accumulation and caching, which makes the accumulator buffer inefficient.
Summary of the Invention
In view of the problems in the prior art, the present invention provides an accumulator buffer structure and a data accumulation and unloading method thereof, in which a first buffer and a second buffer cache accumulation results without interruption, effectively improving the working efficiency of the accumulator buffer.
The technical solution adopted by the present invention to solve the technical problem is an accumulator buffer structure, comprising:
an accumulator buffer control logic; and
a plurality of accumulator buffer modules, each of the accumulator buffer modules comprising:
a double buffer unit comprising a first buffer and a second buffer, wherein when the first buffer is in a first working mode, the second buffer is in a second working mode; when the first buffer is in the second working mode, the second buffer is in the first working mode; the first working mode is caching accumulation results, and the second working mode is unloading accumulation results; and
a control register, electrically connected to the accumulator buffer control logic and the double buffer unit, for receiving and temporarily storing a control signal issued by the accumulator buffer control logic, wherein the double buffer unit determines the working modes of the first buffer and the second buffer according to the control signal.
Preferably, the accumulator buffer module further comprises:
a first data register, connected in one-to-one correspondence with a computing unit at the exit of the matrix multiplication acceleration unit, for acquiring and temporarily storing the accumulation result output by the corresponding computing unit;
a second data register, electrically connected to the double buffer unit, for acquiring and temporarily storing the latest accumulation result in whichever of the first buffer and the second buffer is in the first working mode;
an adder, electrically connected to the first data register and the second data register, for adding the accumulation result in the first data register to the accumulation result in the second data register to obtain an accumulation result; and
a third data register, electrically connected to the adder and the double buffer unit, for acquiring and temporarily storing the accumulation result output by the adder, wherein the double buffer unit caches the accumulation result in the third data register into whichever of the first buffer and the second buffer is in the first working mode.
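The data path formed by the three data registers and the adder can be sketched as a single read-add-write step. This is an illustrative model only: the function name `accumulate_step` and the `address` argument are assumptions, since the source does not state how entries inside a buffer are indexed.

```python
def accumulate_step(incoming_result, active_buffer, address):
    """One accumulate-and-cache step on the buffer in the first working mode.

    `address` is a hypothetical index selecting the buffer entry to update.
    """
    first_register = incoming_result                    # value from the computing unit
    second_register = active_buffer[address]            # latest cached accumulation result
    third_register = first_register + second_register   # adder output
    active_buffer[address] = third_register             # cached back into the active buffer
    return third_register


buffer_in_caching_mode = [0, 0]
accumulate_step(5, buffer_in_caching_mode, 0)
accumulate_step(7, buffer_in_caching_mode, 0)
print(buffer_in_caching_mode)  # [12, 0]
```

Each call reads the latest cached value, adds the incoming value, and writes the sum back, which is the sequence the registers and adder are described as performing.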
Preferably, the accumulator buffer module further comprises:
a result acquisition unit, electrically connected to the double buffer unit, wherein when the first buffer or the second buffer switches from the first working mode to the second working mode, the result acquisition unit acquires the accumulation results in that buffer; and
an unloading execution unit, electrically connected to the result acquisition unit and the double buffer unit, wherein after the result acquisition unit has finished acquiring the accumulation results, the unloading execution unit performs the unloading operation of the accumulation results on the first buffer or the second buffer.
Preferably, the accumulator buffer structure further comprises:
a result write-back module, connected to the result acquisition unit and a local data memory, for writing the accumulation results acquired by the result acquisition unit into the local data memory.
Preferably, the accumulator buffer module further comprises:
a control signal validity determination unit, electrically connected to the unloading execution unit and the control register, wherein a control signal received by the control register while the unloading execution unit has not completed the unloading operation is determined to be invalid, and a control signal received after the unloading execution unit has completed the unloading operation is determined to be valid; and
a control signal comparison unit, electrically connected to the control signal validity determination unit and the control register, for comparing a newly received control signal, when that signal is valid, with the control signal last stored in the control register.
A data accumulation and unloading method for an accumulator buffer, comprising the following steps:
S1: the accumulator buffer control logic issues a control signal;
S2: the control register receives and temporarily stores the control signal issued by the accumulator buffer control logic;
S3: the double buffer unit determines the working modes of the first buffer and the second buffer according to the control signal in the control register, wherein when the first buffer is in the first working mode, the second buffer is in the second working mode; when the first buffer is in the second working mode, the second buffer is in the first working mode; the first working mode is caching accumulation results, and the second working mode is unloading accumulation results.
Preferably, the data accumulation and unloading method further comprises the following steps:
S4: the first data register acquires and temporarily stores the accumulation result output by the computing unit;
S5: the second data register acquires and temporarily stores the latest accumulation result in whichever of the first buffer and the second buffer is in the first working mode;
S6: the adder adds the accumulation result in the first data register to the accumulation result in the second data register to obtain an accumulation result;
S7: the third data register acquires and temporarily stores the accumulation result output by the adder;
S8: the double buffer unit caches the accumulation result in the third data register into whichever of the first buffer and the second buffer is in the first working mode.
Preferably, the data accumulation and unloading method further comprises the following steps:
L1: when the first buffer or the second buffer switches from the first working mode to the second working mode, the result acquisition unit acquires the accumulation results in that buffer;
L2: after the result acquisition unit has finished acquiring the accumulation results, the unloading execution unit performs the unloading operation of the accumulation results on the first buffer or the second buffer.
Preferably, the data accumulation and unloading method further comprises the following step:
L3: the result write-back module writes the accumulation results acquired by the result acquisition unit into the local data memory.
Preferably, S2 specifically comprises:
S21: the control register receives the control signal issued by the accumulator buffer control logic;
S22: the control signal validity determination unit determines whether the received control signal is valid; a control signal received while the unloading execution unit has not completed the unloading operation is determined to be invalid, and a control signal received after the unloading execution unit has completed the unloading operation is determined to be valid;
S23: when the received control signal is invalid, the control register discards it;
S24: when the received control signal is valid, the control signal comparison unit compares it with the control signal last stored in the control register; if the working mode of the first buffer in the received control signal is the same as in the last stored control signal, the control register discards the received control signal; if the working mode of the first buffer differs, the control register stores the received control signal;
S25: the control register sends the newly stored control signal to the double buffer unit.
Beneficial Effects
In the embodiments of the present invention, the accumulator buffer module can cache accumulation results in the first buffer while unloading the accumulation results already cached in the second buffer. When the next control signal is received, the module can immediately cache accumulation results in the second buffer while unloading those cached in the first buffer. By cycling in this way, the time that the prior art spends waiting for the accumulation results to be unloaded is eliminated, which effectively improves the working efficiency of the accumulator buffer.
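The saving can be made concrete with a rough cycle-count model. The source gives no timing figures, so everything here is an assumption for illustration: each round is taken to need a fixed `accumulate_cycles` and `unload_cycles`, and the function names are hypothetical. With a single buffer the two phases run back to back in every round; with two buffers the unloading of one round overlaps the accumulation of the next, and only the final unload is exposed.

```python
def single_buffer_cycles(rounds, accumulate_cycles, unload_cycles):
    """Prior-art style: each round accumulates, then waits for its own unload."""
    return rounds * (accumulate_cycles + unload_cycles)


def double_buffer_cycles(rounds, accumulate_cycles, unload_cycles):
    """Double-buffered: unloading round k overlaps accumulating round k + 1.

    A mode switch is assumed to happen only once both the current accumulation
    and the previous unload have finished, hence the max() per overlapped round.
    """
    if rounds == 0:
        return 0
    overlapped = max(accumulate_cycles, unload_cycles)
    return accumulate_cycles + (rounds - 1) * overlapped + unload_cycles


print(single_buffer_cycles(4, 100, 30))  # 520
print(double_buffer_cycles(4, 100, 30))  # 430
```

Under these assumed numbers, three of the four unload phases are hidden behind accumulation. For a single round the two models coincide, since there is nothing to overlap.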
Brief Description of the Drawings
Figure 1 is a schematic diagram of the working principle of a matrix multiplication acceleration unit and an accumulator buffer in the prior art;
Figure 2 is a schematic structural diagram of the accumulator buffer module in an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are further described below through specific embodiments with reference to the accompanying drawings.
Embodiment 1: As shown in Figure 2, an accumulator buffer structure comprises one accumulator buffer control logic and a plurality of accumulator buffer modules. The number of accumulator buffer modules equals the number of columns in the matrix multiplication acceleration unit, each column of computing units corresponds to one accumulator buffer module, and all of the accumulator buffer modules are controlled by the single accumulator buffer control logic.
Specifically, each accumulator buffer module comprises a double buffer unit and a control register 7.
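The double buffer unit comprises a first buffer 1 and a second buffer 2 of identical structure. When the first buffer 1 is in the first working mode, the second buffer 2 is in the second working mode; when the first buffer 1 is in the second working mode, the second buffer 2 is in the first working mode. The first working mode is caching accumulation results, and the second working mode is unloading accumulation results.
In other words, whenever the first buffer 1 is caching accumulation results, the second buffer 2 must be unloading accumulation results, and whenever the first buffer 1 is unloading accumulation results, the second buffer 2 must be caching accumulation results. The first buffer 1 and the second buffer 2 are therefore always in different working modes.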
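The control register 7 is electrically connected to the accumulator buffer control logic and the double buffer unit, and receives and temporarily stores the control signal issued by the accumulator buffer control logic. The double buffer unit determines the working modes of the first buffer 1 and the second buffer 2 according to the control signal.
In the initial state, neither the first buffer 1 nor the second buffer 2 holds cached data, so the accumulator buffer control logic may issue a control signal that assigns the working modes of the first buffer 1 and the second buffer 2 arbitrarily. For example, the first control signal may place the first buffer 1 in the second working mode and the second buffer 2 in the first working mode. Thereafter, each control signal issued by the accumulator buffer control logic assigns working modes opposite to those of the previous one. For example, the second control signal places the first buffer 1 in the first working mode and the second buffer 2 in the second working mode.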
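The alternating sequence of control signals can be sketched as a generator. The encoding of a control signal as a pair of mode names is an assumption made for illustration, as are the names `CACHE`, `UNLOAD`, `opposite` and `control_signals`; the source does not specify a signal format.

```python
from itertools import islice

CACHE = "cache"    # first working mode: cache accumulation results
UNLOAD = "unload"  # second working mode: unload accumulation results


def opposite(mode):
    """Return the other working mode."""
    return UNLOAD if mode == CACHE else CACHE


def control_signals(first_buffer_mode=UNLOAD):
    """Yield (first buffer mode, second buffer mode) pairs, flipping each time.

    The initial assignment is arbitrary; the default mirrors the example in
    the text, where the first signal puts the first buffer in unload mode.
    """
    mode = first_buffer_mode
    while True:
        # The two buffers are always given different modes.
        yield (mode, opposite(mode))
        mode = opposite(mode)


first_three = list(islice(control_signals(), 3))
print(first_three)
```

Every pair produced assigns different modes to the two buffers, and consecutive pairs are mirror images of each other.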
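In this embodiment, the accumulator buffer module can first cache accumulation results in the first buffer 1 while unloading the accumulation results already cached in the second buffer 2 (after unloading completes, the second buffer 2 is in a standby state). When the next control signal is received, the module can then immediately cache accumulation results in the second buffer 2 while unloading those cached in the first buffer 1. Unlike the prior art, it does not have to wait, after receiving the data unload signal, for the accumulator buffer to unload all of its internal data before the next round of data can be cached. The accumulator buffer module of this embodiment can cycle in this way continuously, which eliminates the time spent waiting for the data (that is, the accumulation results) to be unloaded and thereby effectively improves the working efficiency of the accumulator buffer of this embodiment.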
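Further, the accumulator buffer module also comprises a first data register 3, a second data register 4, an adder 5 and a third data register 6.
The first data register 3 is connected in one-to-one correspondence with a computing unit at the southern exit of the matrix multiplication acceleration unit and acquires and temporarily stores the multiply-add result output by that computing unit. The second data register 4 is electrically connected to the double buffer unit and acquires and temporarily stores the latest accumulation result in whichever of the first buffer 1 and the second buffer 2 is in the first working mode. The adder 5 is electrically connected to the first data register 3 and the second data register 4 and adds the multiply-add result in the first data register 3 to the accumulation result in the second data register 4 to obtain the latest accumulation result. The third data register 6 is electrically connected to the adder 5 and the double buffer unit and acquires and temporarily stores the accumulation result output by the adder 5. The double buffer unit caches the accumulation result in the third data register 6 into whichever of the first buffer 1 and the second buffer 2 is in the first working mode.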
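Suppose that during the first round of accumulation and caching, the first buffer 1 is in the first working mode and the second buffer 2 is in the second working mode. First, the first data register 3 acquires and temporarily stores the multiply-add result output by the computing unit connected to it, while the second data register 4 acquires and temporarily stores the latest accumulation result cached in the first buffer 1. Next, the adder 5 adds the multiply-add result in the first data register 3 to the accumulation result in the second data register 4 to obtain the latest accumulation result, which is temporarily stored in the third data register 6. Finally, once the third data register 6 holds the latest accumulation result, the double buffer unit caches it into the first buffer 1, which completes the accumulate-and-cache step for one accumulation result. The accumulator buffer module then repeats the same step for the next accumulation result. Meanwhile, while the first buffer 1 keeps accumulating and caching, the second buffer 2 unloads any accumulation results it cached earlier, so that the second buffer 2 is cleared to a standby state.
After a certain time, the accumulator buffer control logic sends a new control signal to the control register 7 (this control signal switches the first buffer 1 to the second working mode and the second buffer 2 to the first working mode), so that the accumulator buffer module enters the second round of accumulation and caching. The accumulate-and-cache step is the same as described above, and after each step finishes, the module repeats it for the next accumulation result. Meanwhile, while the second buffer 2 keeps accumulating and caching, the first buffer 1 unloads the accumulation results it cached during the first round, so that the first buffer 1 is cleared to a standby state.
After a further period of time, the accumulator buffer control logic sends another new control signal to the control register 7 (this control signal switches the first buffer 1 to the first working mode and the second buffer 2 to the second working mode), so that the accumulator buffer module enters the third round of accumulation and caching, and the cycle continues in this way.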
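The rounds described above can be walked through with a small functional model. It is a sketch under stated assumptions, not the circuit of the embodiment: the class name `DoubleBufferModule`, the `depth` parameter, entry addressing and the method names are all hypothetical, and timing is not modelled.

```python
class DoubleBufferModule:
    """Functional model of one accumulator buffer module with two buffers."""

    def __init__(self, depth):
        self.buffers = [[0] * depth, [0] * depth]
        self.caching = 0  # index of the buffer in the first working mode

    def accumulate(self, address, incoming_result):
        # Read-add-write on the buffer that is currently caching.
        self.buffers[self.caching][address] += incoming_result

    def switch(self):
        # A new control signal swaps the working modes of the two buffers.
        self.caching = 1 - self.caching

    def unload(self):
        # Drain the buffer in the second working mode and clear it to standby.
        index = 1 - self.caching
        results = list(self.buffers[index])
        self.buffers[index] = [0] * len(results)
        return results


module = DoubleBufferModule(depth=2)
module.accumulate(0, 1)       # round 1 goes into the first buffer
module.accumulate(0, 2)
module.accumulate(1, 5)
module.switch()               # round 2 begins
module.accumulate(0, 10)      # second buffer caches immediately
round_one = module.unload()   # first buffer is drained in the meantime
print(round_one)              # [3, 5]
```

Note that the accumulation of round 2 starts before round 1 has been unloaded, which is the behavior the double buffer unit is meant to allow.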
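In this embodiment, the first data register 3, the second data register 4, the adder 5 and the third data register 6 work together with the double buffer unit and the control register 7, so that the accumulator buffer module can accumulate and cache the multiply-add results transmitted by the matrix multiplication acceleration unit without interruption, which gives the accumulator buffer a high working efficiency.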
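Further, the accumulator buffer module also comprises a result acquisition unit and an unloading execution unit.
The result acquisition unit is electrically connected to the double buffer unit. When the first buffer 1 or the second buffer 2 switches from the first working mode to the second working mode, the result acquisition unit acquires the accumulation results in that buffer. The unloading execution unit is electrically connected to the result acquisition unit and the double buffer unit. After the result acquisition unit has finished acquiring the accumulation results, the unloading execution unit performs the unloading operation of the accumulation results on the first buffer 1 or the second buffer 2.
When the accumulator buffer control logic sends a new control signal to the control register 7 so that the first buffer 1 (or the second buffer 2) switches from the first working mode to the second working mode, the accumulation results in the first buffer 1 (or the second buffer 2) are unloaded only after it has been ensured that they have been extracted.
Suppose that before the new control signal is received, the first buffer 1 is in the first working mode and the second buffer 2 is in the second working mode. After the new control signal is received, the first buffer 1 switches to the second working mode and the second buffer 2 switches to the first working mode. Once the first buffer 1 is in the second working mode, the result acquisition unit first extracts the accumulation results in the first buffer 1, and only then does the unloading execution unit perform the unloading operation on the first buffer 1. At the same time, once the second buffer 2 is in the first working mode, it can immediately accumulate and cache the multiply-add results transmitted by the matrix multiplication acceleration unit.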
Further, the accumulator buffer structure also comprises a result write-back module. One accumulator buffer includes one result write-back module, whose input is connected to the result acquisition units of all the accumulator buffer modules and whose output is connected to the local data memory. Through the result write-back module, the accumulation results acquired by the result acquisition units can be written into the local data memory.
Each time the result acquisition unit acquires one accumulation result, the unloading execution unit unloads that accumulation result. Once the result acquisition unit has acquired all of the accumulation results, the unloading execution unit has just finished unloading all of them. By means of the result acquisition unit, the result write-back module can write all of the accumulation results into the local data memory in one pass.
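The acquire-then-unload ordering and the single write-back can be sketched as follows. The helper names `drain` and `write_back`, the list used as local data memory and the `base_address` parameter are assumptions for illustration; the source does not describe the memory layout.

```python
def drain(buffer):
    """Yield the entries of an unloading buffer one at a time.

    Each entry is cleared only after it has been read, mirroring the rule
    that a result is acquired before it is unloaded.
    """
    for address in range(len(buffer)):
        value = buffer[address]  # result acquisition unit reads the entry
        buffer[address] = 0      # unloading execution unit then clears it
        yield value


def write_back(unloading_buffers, local_memory, base_address=0):
    """Collect the results of every module and store them in one pass."""
    results = [value for buffer in unloading_buffers for value in drain(buffer)]
    local_memory[base_address:base_address + len(results)] = results
    return results


memory = [0] * 6
buffers = [[1, 2], [3, 4]]   # one unloading buffer per accumulator buffer module
write_back(buffers, memory, base_address=1)
print(memory)                # [0, 1, 2, 3, 4, 0]
print(buffers)               # [[0, 0], [0, 0]]
```

Because `drain` is a generator, an entry is never cleared before its value has been handed to the caller.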
Further, the accumulator buffer module also comprises a control signal validity determination unit and a control signal comparison unit.
The control signal validity determination unit is electrically connected to the unloading execution unit and the control register 7. A control signal received by the control register 7 while the unloading execution unit has not completed the unloading operation is determined to be invalid; a control signal received after the unloading execution unit has completed the unloading operation is determined to be valid.
An invalid control signal indicates that the unloading execution unit is still working and that the first buffer 1 (or the second buffer 2) has not yet been cleared to a standby state. The first buffer 1 (or the second buffer 2) therefore cannot change its working mode at this time, and the control register 7 discards the invalid control signal. A valid control signal indicates that the unloading execution unit has finished unloading and that the first buffer 1 (or the second buffer 2) has been cleared to a standby state. The first buffer 1 (or the second buffer 2) can therefore change its working mode at this time, so the control register 7 replaces the control signal it last stored with the valid control signal and sends the valid control signal to the double buffer unit, which changes the working mode of the first buffer 1 (or the second buffer 2).
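The control signal comparison unit is electrically connected to the control signal validity determination unit and the control register 7. When a newly received control signal is valid, the control signal comparison unit compares the working modes of the first buffer 1 and the second buffer 2 in the new control signal with those in the control signal last stored in the control register 7. If the working modes are the same, the control register 7 still discards the newly received control signal; if they differ, the control register 7 replaces the last stored control signal with the valid control signal and sends the valid control signal to the double buffer unit.
With the control signal validity determination unit and the control signal comparison unit of this embodiment, the accumulator buffer module never switches the working mode of a first buffer 1 or second buffer 2 that is in the second working mode and has not finished unloading its data. This prevents the module from malfunctioning because of arbitrary mode changes, which would keep it from effectively accumulating and caching the multiply-add results transmitted by the matrix multiplication acceleration unit and would lower the working efficiency of the accumulator buffer.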
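The two checks applied to an incoming control signal can be summarised in a short model. It is a sketch with assumed names: the class `ControlRegister`, the `receive` method and the `unload_done` flag are hypothetical, a control signal is represented only by the working mode it assigns to the first buffer, and an empty register before the first signal is also an assumption.

```python
class ControlRegister:
    """Filter control signals by validity, then by comparison with the last one."""

    def __init__(self):
        self.stored = None  # nothing stored before the first signal (assumption)

    def receive(self, first_buffer_mode, unload_done):
        """Return the signal forwarded to the double buffer unit, or None."""
        # Validity check: a signal arriving before unloading completes is invalid.
        if not unload_done:
            return None
        # Comparison check: a valid signal that repeats the stored modes is dropped.
        if first_buffer_mode == self.stored:
            return None
        # Otherwise the new signal replaces the stored one and is forwarded.
        self.stored = first_buffer_mode
        return first_buffer_mode


register = ControlRegister()
print(register.receive("cache", unload_done=True))    # forwarded: "cache"
print(register.receive("unload", unload_done=False))  # invalid: None
print(register.receive("cache", unload_done=True))    # unchanged modes: None
print(register.receive("unload", unload_done=True))   # forwarded: "unload"
```

Only the first and last calls change the stored signal, so the buffers switch modes exactly twice in this sequence.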
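Embodiment 2: A data accumulation and unloading method for an accumulator buffer, using the accumulator buffer structure of Embodiment 1, specifically comprises the following steps.
S1: the accumulator buffer control logic issues a control signal.
In the initial state, neither the first buffer 1 nor the second buffer 2 holds cached data, so the accumulator buffer control logic may issue a control signal that assigns the working modes of the first buffer 1 and the second buffer 2 arbitrarily. For example, the first control signal may place the first buffer 1 in the second working mode and the second buffer 2 in the first working mode. Thereafter, each control signal issued by the accumulator buffer control logic assigns working modes opposite to those of the previous one. For example, the second control signal places the first buffer 1 in the first working mode and the second buffer 2 in the second working mode.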
S2: the control register 7 receives and temporarily stores the control signal issued by the accumulator buffer control logic.
S2 specifically comprises the following. S21: the control register 7 receives the control signal issued by the accumulator buffer control logic. S22: the control signal validity determination unit determines whether the received control signal is valid; a control signal received while the unloading execution unit has not completed the unloading operation is determined to be invalid, and a control signal received after the unloading execution unit has completed the unloading operation is determined to be valid. S23: when the received control signal is invalid, the control register 7 discards it.
An invalid control signal indicates that the unloading execution unit is still working and that the first buffer 1 (or the second buffer 2) has not yet been cleared to a standby state. The first buffer 1 (or the second buffer 2) therefore cannot change its working mode at this time, and the control register 7 discards the invalid control signal. A valid control signal indicates that the unloading execution unit has finished unloading and that the first buffer 1 (or the second buffer 2) has been cleared to a standby state. The first buffer 1 (or the second buffer 2) can therefore change its working mode at this time, so the control register 7 replaces the control signal it last stored with the valid control signal and sends the valid control signal to the double buffer unit, which changes the working mode of the first buffer 1 (or the second buffer 2).
S24: when the received control signal is valid, the control signal comparison unit compares it with the control signal last stored in the control register 7. If the working mode of the first buffer 1 in the received control signal is the same as in the last stored control signal, the control register 7 discards the received control signal; if the working mode of the first buffer 1 differs, the control register 7 stores the received control signal. S25: the control register 7 sends the newly stored control signal to the double buffer unit.
When a newly received control signal is valid, the control signal comparison unit compares the working modes of the first buffer 1 and the second buffer 2 in the new control signal with those in the control signal last stored in the control register 7. If the working modes are the same, the control register 7 still discards the newly received control signal; if they differ, the control register 7 replaces the last stored control signal with the valid control signal and sends the valid control signal to the double buffer unit.
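S3: the double buffer unit determines the working modes of the first buffer 1 and the second buffer 2 according to the control signal in the control register 7. When the first buffer 1 is in the first working mode, the second buffer 2 is in the second working mode; when the first buffer 1 is in the second working mode, the second buffer 2 is in the first working mode. The first working mode is caching accumulation results, and the second working mode is unloading accumulation results.
In other words, whenever the first buffer 1 is caching accumulation results, the second buffer 2 must be unloading accumulation results, and whenever the first buffer 1 is unloading accumulation results, the second buffer 2 must be caching accumulation results. The first buffer 1 and the second buffer 2 are therefore always in different working modes.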
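S4: the first data register 3 acquires and temporarily stores the accumulation result (that is, the multiply-add result) output by the computing unit. The first data register 3 is connected in one-to-one correspondence with a computing unit at the southern exit of the matrix multiplication acceleration unit and acquires and temporarily stores the multiply-add result output by that computing unit.
S5: the second data register 4 acquires and temporarily stores the latest accumulation result in whichever of the first buffer 1 and the second buffer 2 is in the first working mode. The second data register 4 is electrically connected to the double buffer unit for this purpose.
S6: the adder 5 adds the multiply-add result in the first data register 3 to the latest accumulation result in the second data register 4 to obtain the new latest accumulation result. The adder 5 is electrically connected to the first data register 3 and the second data register 4 for this purpose.
S7: the third data register 6 acquires and temporarily stores the accumulation result output by the adder 5. The third data register 6 is electrically connected to the adder 5 and the double buffer unit for this purpose.
S8: the double buffer unit caches the accumulation result in the third data register 6 into whichever of the first buffer 1 and the second buffer 2 is in the first working mode.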
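Suppose that during the first round of accumulation and caching, the first buffer 1 is in the first working mode and the second buffer 2 is in the second working mode. First, the first data register 3 acquires and temporarily stores the multiply-add result output by the computing unit connected to it, while the second data register 4 acquires and temporarily stores the latest accumulation result cached in the first buffer 1. Next, the adder 5 adds the multiply-add result in the first data register 3 to the accumulation result in the second data register 4 to obtain an accumulation result, which is temporarily stored in the third data register 6. Finally, once the third data register 6 holds the latest accumulation result, the double buffer unit caches it into the first buffer 1, which completes the accumulate-and-cache step for one accumulation result. The accumulator buffer module then repeats the same step for the next accumulation result. Meanwhile, while the first buffer 1 keeps accumulating and caching, the second buffer 2 unloads any accumulation results it cached earlier, so that the second buffer 2 is cleared to a standby state.
After a certain time, the accumulator buffer control logic sends a new control signal to the control register 7 (this control signal switches the first buffer 1 to the second working mode and the second buffer 2 to the first working mode), so that the accumulator buffer module enters the second round of accumulation and caching. The accumulate-and-cache step is the same as described above, and after each step finishes, the module repeats it for the next accumulation result. Meanwhile, while the second buffer 2 keeps accumulating and caching, the first buffer 1 unloads the accumulation results it cached during the first round, so that the first buffer 1 is cleared to a standby state.
After a further period of time, the accumulator buffer control logic sends another new control signal to the control register 7 (this control signal switches the first buffer 1 to the first working mode and the second buffer 2 to the second working mode), so that the accumulator buffer module enters the third round of accumulation and caching, and the cycle continues in this way.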
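The data accumulation and unloading method further comprises the following steps.
L1: when the first buffer 1 or the second buffer 2 switches from the first working mode to the second working mode, the result acquisition unit acquires the accumulation results in that buffer.
L2: after the result acquisition unit has finished acquiring the accumulation results, the unloading execution unit performs the unloading operation of the accumulation results on the first buffer 1 or the second buffer 2.
When the accumulator buffer control logic sends a new control signal to the control register 7 so that the first buffer 1 (or the second buffer 2) switches from the first working mode to the second working mode, the accumulation results in the first buffer 1 (or the second buffer 2) are unloaded only after it has been ensured that they have been extracted.
Suppose that before the new control signal is received, the first buffer 1 is in the first working mode and the second buffer 2 is in the second working mode. After the new control signal is received, the first buffer 1 switches to the second working mode and the second buffer 2 switches to the first working mode. Once the first buffer 1 is in the second working mode, the result acquisition unit first extracts the accumulation results in the first buffer 1, and only then does the unloading execution unit perform the unloading operation on the first buffer 1. Once the second buffer 2 is in the first working mode, it can immediately accumulate and cache the multiply-add results transmitted by the matrix multiplication acceleration unit.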
L3: the result write-back module writes the accumulation results acquired by the result acquisition unit into the local data memory. One accumulator buffer includes one result write-back module, whose input is connected to the result acquisition units of all the accumulator buffer modules and whose output is connected to the local data memory. Through the result write-back module, the accumulation results acquired by the result acquisition units can be written into the local data memory.
In this embodiment, the accumulator buffer module can first cache, in the first buffer 1, the multiply-add results output by the matrix multiplication acceleration unit while unloading the accumulation results already cached in the second buffer 2 (after unloading completes, the second buffer 2 is in a standby state). When the next control signal is received, the module can then immediately cache the multiply-add results output by the matrix multiplication acceleration unit in the second buffer 2 while unloading the accumulation results cached in the first buffer 1. Unlike the prior art, it does not have to wait, after receiving the data unload signal, for the accumulator buffer to unload all of its internal data before the next round of data can be cached. The accumulator buffer module of this embodiment can cycle in this way continuously, which eliminates the time spent waiting for the data (that is, the accumulation results) to be unloaded and thereby effectively improves the working efficiency of the data accumulation method of the accumulator buffer in this embodiment.
The embodiments described above merely describe preferred implementations of the present invention and do not limit its concept or scope. Any variations and improvements that a person of ordinary skill in the art makes to the technical solutions of the present invention without departing from its design concept shall fall within the scope of protection of the present invention. The technical content for which protection is claimed is fully set out in the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210966726.5A, CN115268838B (en) | 2022-08-12 | An accumulator buffer structure and data accumulation and unloading method thereof |
| Publication Number | Publication Date |
|---|---|
| CN115268838A (en) | 2022-11-01 |
| CN115268838B (en) | 2025-10-14 |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |