Data-intensive application-oriented network-on-chip approximate communication system
Technical Field
The invention relates to the technical field of Network on chip (NoC), in particular to a data-intensive application-oriented Network on chip approximate communication system.
Background
With the continuous shrinking of semiconductor process nodes, the industry has entered the "dark silicon" era. The performance of a single processing core is approaching its limit, and multi-core design has become a focus of chip industry research and the main direction of future development. Compared with a bus structure, a multi-core communication architecture connected through a network on chip (NoC) offers high bandwidth, good scalability and low latency, and the NoC has become the standard paradigm for multi-core interconnection. At the same time, however, more time is spent on data communication.
At present, some large multi-core chips already have hundreds of cores, and future chips are predicted to have even more. As the number of processing cores increases, the communication time cost of several representative massively parallel applications increases rapidly; that is, communication is rapidly becoming a major bottleneck for applications with extreme parallelism.
However, some data-intensive applications, such as image processing and scientific computing, are error-tolerant, which gives designers a new approach to the problem of network-on-chip communication congestion. These applications can tolerate modest errors while still producing results that are acceptable to the user. In image processing applications, for example, the difference is not significant to the end user even if the output is not one hundred percent correct. Therefore, conventional NoC designs that transmit all data with absolute accuracy are not the only choice for these applications.
Chinese patent application No. CN202010765578.1, published on December 18, 2020, discloses a method for optimizing the energy consumption and performance of a many-core system based on collaborative approximate computing. On the premise that the output of an application program stays within a certain error range, the method combines several approximation techniques across different abstraction layers: it reduces the computational workload of the application at the application level, selectively deletes data at the network layer to reduce network congestion, and applies approximate computing to the different abstraction layers of the many-core system through the optimization regulation of a global controller and the resource configuration of local controllers. That method measures the importance of discarded data based on a quality model, considers the cooperative management of communication and computation, and formulates a multi-objective optimization problem that minimizes network congestion and application running time while constraining result quality, thereby providing a new way to shorten application running time, reduce energy consumption and improve chip-level energy efficiency for many-core systems. However, it truncates and restores all approximable data in the network according to the application quality alone and does not consider the network congestion condition when compressing data. This causes unnecessary precision loss when the network is uncongested and cannot deliver a better effect when the network is congested.
Disclosure of Invention
1. Technical problem to be solved
Aiming at the problems of easy congestion, delay and the like of a network-on-chip for data intensive application in the prior art, the invention provides a data intensive application-oriented network-on-chip approximate communication system, which can effectively improve the congestion condition of the network-on-chip, dynamically reduce the communication delay of the data intensive application, and can adapt to the topological structures of two-dimensional and three-dimensional networks and various dynamic routing algorithms.
2. Technical scheme
The invention discloses a scheme capable of improving the communication bandwidth of a network on chip and reducing the time delay and power consumption generated by data intensive application in the network on chip.
The purpose of the invention is realized by the following technical scheme.
A data intensive application-oriented network-on-chip approximate communication system comprises a plurality of approximate communication architectures, wherein each approximate communication architecture comprises a processing core, a router and a network interface, and the network interfaces are respectively connected with the processing core and the router;
a main control node is arranged in the processing core, and a global controller is arranged in the main control node; the router is provided with a network congestion state monitoring unit, and the network congestion state monitoring unit is used for transmitting network congestion information to the global controller in real time; the network interface is provided with a data screening unit and a data compression and decompression unit, and system data is approximately processed by the data screening unit and the data compression and decompression unit in the network interface and then transmitted to the network through the router.
Preferably, the global controller determines the congestion status of network communication through the accumulated waiting time of each router in an area of the network, the area being formed by all router paths that data may pass through between the packet's source node and destination node in the network on chip.
Preferably, the global controller counts the congestion number of the router nodes in the area, and sends different data approximation rate information to the source node sending the packet according to the congestion number of the router nodes.
Preferably, the network congestion status monitoring unit converts the communication congestion status of the network into router waiting time, judges the network congestion status according to the waiting time, and sends the congestion node information to the global controller.
Preferably, the network congestion state monitoring unit records the cases in which a plurality of input ports compete for the same output port in the network topology, and counts the number of times an input port fails to obtain the transmission priority into the waiting time; meanwhile, when the data back pressure signal is zero, namely when the output port matched with the current input port has no idle virtual channel, the number of times the input port fails to obtain the transmission right is also counted into the waiting time; router nodes whose waiting time exceeds the set threshold value are defined as congestion nodes.
Preferably, the congestion node information transmitted by the network congestion status monitoring unit is transmitted to the global controller to match the corresponding data compression rate, and the original waiting time of the network congestion status monitoring unit is cleared.
The invention adopts the network congestion monitoring unit in the router to convert the competitive congestion of communication and the data back pressure signal into waiting time which is used for judging the congestion state of the network area, thereby effectively monitoring the communication congestion degree of the network on chip.
Preferably, the data screening unit steplessly adjusts the compression rate of data transmitted to the network within an allowable approximation rate threshold, so that the system adapts well to changes in the network congestion condition.
Preferably, the data screening unit screens the data to be compressed by means of a pseudo-random number generated by a linear feedback shift register, and data packets that do not acquire the transmission right are compressed.
Preferably, the communication architecture employs a data transmission mode that approximates communication.
Preferably, the network is a two-dimensional or three-dimensional network. The global controller of the invention takes the number of congestion nodes in a specific area as a basis for judging the network communication congestion condition, can support various network-on-chip dynamic routing algorithms, and is suitable for data intensive application running in two-dimensional and three-dimensional network topologies.
The communication system analyzes the communication competition condition and the data back pressure signal in the router, converts the communication congestion condition of the network into router waiting time through the network congestion state monitoring unit, transmits the congestion node information to the global controller through the network, and dynamically compresses the data in real time according to the network congestion condition. The global controller judges the congestion state of network communication through the accumulated waiting time of the routers, generates a data compression rate corresponding to the congestion information in the network-on-chip area, and sends different data approximation rate information to the source node sending a packet according to the number of congestion nodes.
3. Advantageous effects
Compared with the prior art, the invention has the advantages that:
(1) the network congestion monitoring unit in the router is adopted to convert the competitive congestion of communication and the data back pressure signal into waiting time which is used for judging the congestion state of a network area, so that the communication congestion degree of the network on chip is effectively monitored;
(2) pseudo random numbers are generated through a linear feedback shift register to screen data needing to be compressed, and the compression rate of the data is adjusted in a stepless mode within an error threshold allowed by application, so that the change of network congestion conditions can be well adapted;
(3) the global controller takes the number of congestion nodes in a specific area as a basis for judging the network communication congestion condition, dynamically compresses data in real time according to the network congestion condition, can support various network-on-chip dynamic routing algorithms, and is suitable for data intensive application running in two-dimensional and three-dimensional network topologies;
(4) the invention is oriented to any approximable application and does not need to run the application first to obtain a model; it supports dynamic data approximation and can realize any data compression ratio in a single router, and it automatically generates different compression strengths for different applications according to the network conditions, thereby having wider applicability.
In conclusion, by dynamically reducing data accuracy according to the congestion condition in the network, the invention can better fit the actual delay curve of a data-intensive application. The invention effectively improves the congestion condition of the network on chip, dynamically reduces the communication delay of data-intensive applications, and can adapt to two-dimensional and three-dimensional network topologies and various dynamic routing algorithms. It can directly control applications that have not previously been run on the network without obtaining a performance model of those applications, and it has good practical value and wide application prospects.
Drawings
FIG. 1 is a schematic cross-sectional view of a three-dimensional network topology;
FIG. 2 is a schematic diagram of the multi-core dynamic approximation communication architecture of the present invention;
FIG. 3 is a schematic diagram of a network congestion supervision unit of the present invention;
FIG. 4 is a schematic diagram of the control logic of the present invention;
FIG. 5 is a schematic diagram of the simulation delay results for the integrated flow model;
FIG. 6 is a schematic diagram of the simulation power consumption results for the integrated flow model.
Detailed Description
The invention is described in detail below with reference to the drawings and specific examples.
Examples
The invention discloses a data-intensive application-oriented network-on-chip dynamic approximate communication system, which adopts an approximate communication data transmission mode and is provided with a global controller of the data approximation rate in the network. A network congestion state monitoring unit is arranged in the router. The port connecting the router and the processing core is provided with an on-chip network data compression and decompression unit and a screening unit that generates the dynamic data compression probability. The architecture analyzes the communication competition condition and the data back pressure signal in the router, converts the communication congestion condition of the network into router waiting time through the monitoring unit, and judges the network congestion condition according to the waiting time; the network congestion state monitoring unit then sends the congestion node information to the global controller.
The data screening unit screens the approximable data transmitted to the network using pseudo-random numbers generated by an 8-bit linear feedback shift register, thereby steplessly adjusting the data compression rate within the maximum allowable data approximation rate threshold. The control logic of the global controller judges the congestion condition of network communication through the accumulated waiting time of each router in an area, the area being formed by all routing paths that data may pass through between the packet's source node and destination node in the network on chip.
A network congestion state supervision unit in the router records the phenomenon that a plurality of ports compete for the same output port in a two-dimensional or three-dimensional network topology, and counts the times that the input port does not acquire the transmission priority into the waiting time; meanwhile, when the data back pressure signal is zero, namely when the output port matched with the current input port has no idle virtual channel, counting the times that the input port does not obtain the transmission right into the waiting time; and defining the router nodes with the waiting time exceeding the set threshold value as congestion nodes.
In order to steplessly adjust the compression rate of data entering the network within the error threshold allowed by the application, hardware-implementable pseudo-random numbers are generated in the ports by linear feedback shift registers and compared with the dynamic network data approximation rate fed back by the global controller. All approximable data are screened by this comparison, and data packets that do not win the transmission right are compressed, thereby dynamically controlling the traffic entering the network in real time.
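The screening step described above can be illustrated with a minimal Python sketch. It is not the patented hardware: the feedback taps, the seed, the scaling of the approximation rate to an 8-bit threshold, and the names lfsr8_step and DataScreeningUnit are all assumptions made for illustration.

```python
def lfsr8_step(state: int) -> int:
    """Advance an 8-bit Fibonacci LFSR by one step (state must be non-zero)."""
    # Assumed taps: recurrence x^8 + x^4 + x^3 + x^2 + 1, a primitive
    # polynomial, so the state visits all 255 non-zero values before repeating.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)


class DataScreeningUnit:
    """Hypothetical model of the screening unit in the network interface."""

    def __init__(self, seed: int = 0x5A):
        if not 0 < seed < 256:
            raise ValueError("seed must be a non-zero 8-bit value")
        self.state = seed
        self.rate = 0.0  # approximation rate fed back by the global controller

    def set_rate(self, rate: float) -> None:
        # Clamp to [0, 1]; the controller is expected to respect the
        # application's maximum error threshold before sending the rate.
        self.rate = min(max(rate, 0.0), 1.0)

    def should_compress(self) -> bool:
        """Draw one pseudo-random number and compare it with the rate."""
        self.state = lfsr8_step(self.state)
        # Assumed scaling: the rate is mapped onto the 1..255 state range.
        return self.state <= round(self.rate * 255)


if __name__ == "__main__":
    unit = DataScreeningUnit()
    unit.set_rate(0.2)
    selected = sum(unit.should_compress() for _ in range(255))
    print(f"{selected} of 255 packets routed to the compression unit")
```

Because the register cycles through every non-zero 8-bit value once per period, the fraction of packets selected over a full period tracks the requested rate in steps of 1/255, which is one way to read "stepless" adjustment in hardware.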
The control logic of the global controller generates a data compression rate corresponding to congestion information in an on-chip network area in an adjustable unit time interval according to the congestion information, wherein the area is formed by all possible paths from a data packet source node to a destination node through a dynamic routing algorithm; and the global controller counts the congestion number of the router nodes in the area and sends different data approximation rate information to the source node sending the packet according to the congestion number of the router nodes.
FIG. 1 is a single-sided cross-sectional view of a three-dimensional network topology. The network on chip in this embodiment is a three-dimensional mesh structure; in the figure, core represents a processing core, router represents a router, and NI (network interface) represents a network interface. The size of the three-dimensional network structure in this embodiment is 4 × 4 × 4, and each network node includes an approximate communication architecture as shown in FIG. 2. The buffers of the router store flits waiting to be transmitted; the size of each buffer is 8 flits, and each input port has 1 virtual channel. Virtual channels time-division multiplex the physical links of the network on chip, so that the link bandwidth is better utilized and a link is not continuously occupied.
The dynamic approximate communication architecture disclosed in this embodiment is shown in FIG. 2. A processing core includes a master node, and the master node is provided with a global controller. A router in the system is provided with a network congestion state monitoring unit, which accumulates delay into the waiting time according to the communication competition conditions that affect network communication delay and the cases in which an input port cannot obtain the output right because the output port is occupied. A router node whose accumulated waiting time is more than 5 in every 2000 clock cycles is defined as a congestion node; the congestion node information is then transmitted to the global controller to match the corresponding data compression rate, and the original waiting time of the monitoring unit is cleared.
The port NI (network interface) between the processing core and the router is provided with a data compression and decompression unit and a data screening unit. Input data transmitted into the network first passes through the data screening unit, which compares a pseudo-random number generated by the linear feedback shift register with the data compression ratio sent by the global controller. For example, if the data compression ratio sent by the global controller is 0.2, a random number between 0 and 1 generated by the linear feedback shift register is compared with 0.2; when the random number is smaller than the data compression ratio, the data packet transmitted by the application is passed to the compression unit, otherwise the data packet is transmitted into the network directly. When data in the network reaches the port, it is first recovered by the decompression unit in the port using linear interpolation and then transmitted into the processing core.
The specific structure of the network congestion state monitoring unit is shown in FIG. 3. The accumulator counts two situations that cause transmission delay. In the first situation, the heads of the buffers of input ports in several directions compete for the same output port, and the number of times an input port fails to obtain the priority of the output port is accumulated into the waiting time. In the second situation, the back pressure signal of the output port to which the input port buffer is to send data is zero, that is, the output port has no free buffer to hold data, and the number of times the data packet stalls is accumulated into the waiting time.
As shown in FIG. 4, the control logic of the global controller counts the number of congested router nodes in the area formed by all possible paths through which a data packet may travel from its source node to its destination node in the network. In a three-dimensional network, the area that a shortest-path dynamic routing algorithm may pass through forms a cuboid. The approximation rate of network data increases linearly with the number of congested nodes until it reaches the maximum error threshold allowed by the application, after which the data approximation rate is held constant.
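As a rough illustration of recovery by linear interpolation, the sketch below pairs it with one possible compression scheme. The text only states that decompression uses linear interpolation; dropping every second value on the sending side, and the function names compress and decompress, are assumptions chosen to keep the example short.

```python
def compress(values):
    """Assumed scheme: keep even-indexed values and remember the length."""
    return values[::2], len(values)


def decompress(kept, length):
    """Rebuild a sequence of the original length by linear interpolation."""
    out = []
    for i in range(length):
        j, dropped = divmod(i, 2)
        if not dropped:
            out.append(kept[j])                      # value was transmitted
        elif j + 1 < len(kept):
            out.append((kept[j] + kept[j + 1]) / 2)  # midpoint of neighbours
        else:
            out.append(kept[j])                      # no right neighbour: repeat
    return out


if __name__ == "__main__":
    original = [10, 12, 14, 16, 18, 20]
    kept, length = compress(original)
    print(kept)                      # half of the payload is sent
    print(decompress(kept, length))  # the last value is only approximated
```

In this sketch a smoothly varying payload is recovered almost exactly, while the final dropped value is approximated by its left neighbour, which shows the kind of bounded error an error-tolerant application is expected to absorb.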
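The accumulation of waiting time from these two situations can be sketched in Python as follows. This is a behavioral model, not the circuit of FIG. 3: the class name CongestionMonitor, the per-cycle tick interface, and the choice to count at most one stall per cycle are assumptions, while the default threshold of 5 and window of 2000 clock cycles follow the embodiment.

```python
class CongestionMonitor:
    """Hypothetical per-router model of the congestion monitoring unit."""

    def __init__(self, threshold: int = 5, window: int = 2000):
        self.threshold = threshold  # waiting time above this marks congestion
        self.window = window        # reporting interval in clock cycles
        self.waiting_time = 0
        self.cycle = 0

    def tick(self, lost_arbitration: bool, backpressure: int):
        """Process one clock cycle.

        lost_arbitration: the input port lost the competition for an output port.
        backpressure: 0 means the matched output port has no free buffer.
        Returns None inside a window, or True/False (congested or not) at
        the end of each window, after which the waiting time is cleared.
        """
        # Assumption: a cycle in which both situations occur counts once.
        if lost_arbitration or backpressure == 0:
            self.waiting_time += 1
        self.cycle += 1
        if self.cycle < self.window:
            return None
        congested = self.waiting_time > self.threshold
        self.waiting_time = 0  # cleared once the result is reported
        self.cycle = 0
        return congested


if __name__ == "__main__":
    monitor = CongestionMonitor()
    report = None
    for cycle in range(2000):
        report = monitor.tick(lost_arbitration=(cycle < 6), backpressure=1)
    print("congested node:", report)
```

The value returned at the end of each window stands in for the congestion node information that the router sends to the global controller.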
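A minimal Python sketch of this control logic is given below. It assumes that the area of possible shortest paths is the axis-aligned box spanned by the source and destination coordinates, and the slope rate_per_node and cap max_rate are illustrative values only; the function names are hypothetical.

```python
def in_region(node, src, dst):
    """True if node lies in the box spanned by src and dst (x, y, z tuples)."""
    return all(min(s, d) <= n <= max(s, d) for n, s, d in zip(node, src, dst))


def approximation_rate(congested_nodes, src, dst,
                       rate_per_node=0.05, max_rate=0.2):
    """Rate grows linearly with the congested-node count, then saturates.

    rate_per_node and max_rate are assumed values; max_rate plays the role
    of the maximum error threshold allowed by the application.
    """
    count = sum(in_region(node, src, dst) for node in congested_nodes)
    return min(max_rate, rate_per_node * count)


if __name__ == "__main__":
    congested = {(1, 1, 1), (2, 0, 1), (3, 3, 3)}
    # Only the first two congested nodes lie between (0,0,0) and (2,2,2).
    print(approximation_rate(congested, src=(0, 0, 0), dst=(2, 2, 2)))
```

Because the decision depends only on how many congested nodes fall inside the box and not on the path actually taken, the same logic applies to any minimal adaptive routing algorithm, which matches the stated motivation for counting nodes per area.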
In order to verify the effect of the system on improving data congestion, the communication architecture is simulated on the Noxim simulation platform. The simulation adopts a 4 × 4 × 4 network topology, the buffer depth is 8 flits, each input port has 1 virtual channel, and the size of a data packet transmitted in the network is 8 flits. The routing algorithms used in the simulation are the XYZ routing algorithm and the OE_Z routing algorithm, and the traffic patterns used for comparison are Random, Transpose and Shuffle. The simulation mainly verifies whether the dynamic approximate communication architecture can relieve network congestion and reduce delay when the injection rate reaches saturation; the architecture also shows good adaptability to the dynamic routing algorithm OE_Z.
FIG. 5 shows the delay results of the integrated traffic simulation, where XYZ_original and OE_Z_original represent the delay results of a common three-dimensional network running the integrated traffic pattern under the routing algorithms XYZ and OE_Z; XYZ_ABDTR represents the delay result of a conventional approximate communication architecture, which is suited to fixed routing algorithms, running the integrated traffic pattern under the routing algorithm XYZ; and XYZ_DCBW and OE_Z_DCBW represent the delay results of the invention running the integrated traffic pattern under the routing algorithms XYZ and OE_Z. It is easy to see that the invention effectively improves the saturation behavior of the injection rate when network communication approaches congestion, and maintains a good control effect under the dynamic routing algorithm.
FIG. 6 shows the power consumption results of the integrated traffic simulation, where XYZ_original and OE_Z_original represent the power consumption results of a common three-dimensional network running the integrated traffic pattern under the routing algorithms XYZ and OE_Z; XYZ_ABDTR represents the power consumption result of a conventional approximate communication architecture, which is suited to fixed routing algorithms, running the integrated traffic pattern under the routing algorithm XYZ; and XYZ_DCBW and OE_Z_DCBW represent the power consumption results of the invention running the integrated traffic pattern under the routing algorithms XYZ and OE_Z. The results in FIG. 6 show that the invention effectively reduces the communication power consumption of the network on chip and maintains a good control effect under the dynamic routing algorithm.
The invention and its embodiments have been described above schematically and without limitation, and the invention can be embodied in other specific forms without departing from its spirit or essential characteristics. The drawings show only one embodiment of the invention, the actual construction is not limited thereto, and any reference signs in the claims shall not limit the claims concerned. Therefore, structures and embodiments similar to the above technical solution that a person skilled in the art, taught by the present invention, devises without inventive effort shall fall within the protection scope of the present patent. Furthermore, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Several of the elements recited in the product claims may also be implemented by one element in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.