Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for on-chip external bus communication. On one hand, the external bus in some embodiments of the present application supports multiple data communication protocols and thereby meets the data communication requirements of different scenarios. On the other hand, some embodiments of the present application use a dynamically configurable external bus read-write timing to address the technical problem that, in actual application of a chip, differences between circuit boards lead to different setup and hold time requirements for data transmission.
In a first aspect, an embodiment of the present application provides a method for on-chip external bus communication, where the method includes: receiving a data transmission instruction from an on-chip target device, wherein the data transmission instruction is used for accessing an external storage device; configuring a target read-write time sequence for an external bus, and sending the data transmission instruction to the external storage device through the external bus, wherein the external bus supports multiple external bus protocols.
On one hand, in practical application of a chip, differences between circuit boards lead to different setup and hold time requirements for data transmission, and a fixed data transmission protocol may not meet the needs of some application scenarios; this problem can be solved by the configurable timing of the embodiments of the present application. On the other hand, some embodiments of the present application satisfy the needs of data communication in different scenarios by supporting multiple protocols.
In some embodiments, prior to receiving the data transfer instruction from the on-chip target device, the method further comprises: receiving data transmission instructions of a plurality of devices on the chip, wherein one device corresponds to one data transmission instruction; assigning usage rights of the external bus to the target device of the plurality of devices according to an arbitration policy.
Some embodiments of the present application ensure that multiple devices can use the same external bus in a time-sharing manner through an arbitration policy, thereby improving the use efficiency of the external bus.
In some embodiments, before said assigning the usage right of the external bus to the target device of the plurality of devices according to an arbitration policy, the method comprises: caching the data transmission instruction of each device to obtain cache information; wherein the receiving a data transmission instruction from an on-chip target device includes: reading the data transmission instruction corresponding to the target device from the cache information.
Some embodiments of the present application first cache the data transmission instructions from the multiple on-chip devices, so that the execution of subsequent instructions of the kernels and DMA devices is not blocked, making program execution more efficient.
In some embodiments, the types of devices include at least: an external host, a kernel, a DMA, and other devices, wherein the arbitration policy comprises: the data transmission instruction from the external host belongs to a first priority, the data transmission instruction from the kernel belongs to a second priority, the data transmission instruction from the DMA belongs to a third priority, and the data transmission instruction from the other equipment belongs to a fourth priority.
Some embodiments of the present application may ensure that important devices can preferentially use the external bus by setting different priorities for different kinds of devices.
In some embodiments, the arbitration policy further comprises: serving multiple data transmission instructions of the same priority using a polling mechanism; and, if multiple devices of the same priority request the bus, limiting the amount of data that the device granted the bus may transfer to a maximum data transfer amount.
Some embodiments of the present application ensure that every device of the same priority has an opportunity to use the external bus by setting the maximum data transfer amount bmax and providing the polling function.
In some embodiments, when the target device is a kernel or a DMA, the sending the data transfer instruction to the external storage device through an external bus includes: the external bus is used by locking the external bus.
According to some embodiments of the application, when the kernel and the DMA adopt the external bus for data transmission, the use right of the external bus is occupied for a long time in a bus locking mode, and the design is mainly used for solving the problem of atomic operation and ensuring the consistency of required data.
In some embodiments, the plurality of external bus protocols includes a first external bus protocol, and the control signals corresponding to the first external bus protocol include valid chip select information, which comprises setup, active, and hold periods, wherein the number of cycles of each of the setup, active, and hold periods can be dynamically configured.
In the external bus of some embodiments of the present application, setup, active, and hold supported by the first external bus protocol (i.e., the slow protocol) are dynamically configurable, and the requirements of chip setup and hold times in different scenarios can be met by setting different setup and hold times.
In some embodiments, the control signals corresponding to the first external bus protocol further include ready information, wherein the ready information is a signal from the external storage device and is used to extend the cycle time of the active period.
Some embodiments of the present application use the ready signal as an input from the external storage device to the external bus interface, which can extend the active period of a data transfer and solve the problem that data cannot be processed in time during data transmission.
In some embodiments, the plurality of external bus protocols includes a second external bus protocol, and the control signal corresponding to the second external bus protocol further includes a delay signal, wherein a value of the delay signal delay is configured to control a delay between the output control address signal and the data signal.
In some embodiments of the present application, the second external bus protocol (i.e., the fast protocol) supports a configurable delay between the control phase and the data phase: by configuring the value of delay, the delay between the output control and address signals and the output data signal can be controlled.
In a second aspect, some embodiments of the present application provide an apparatus for on-chip external bus communication, the apparatus comprising: an external bus interface comprising a first external bus protocol module and a second external bus protocol module, wherein the first external bus protocol module can complete the dynamic configuration of the setup, active, and hold periods included in the valid chip select information, and the second external bus protocol module can complete the dynamic configuration of the delay between the control and address signals and the data signal.
In some embodiments, the apparatus further comprises: an arbiter connected to an input of the external bus interface, the arbiter configured to determine a target device that can use an external bus from among a plurality of on-chip devices.
In some embodiments, the arbiter is provided with: input interfaces for receiving external bus use requests from the various devices, a kernel access lock signal input interface, a DMA access lock signal input interface, and a maximum transferable data amount input interface; and output interfaces for outputting external bus use request response signals for the various devices.
In a third aspect, some embodiments of the present application provide a chip, the chip comprising: a plurality of external hosts, a plurality of kernels, and a plurality of DMAs; and the apparatus of any embodiment of the second aspect, wherein the plurality of external hosts, the plurality of kernels, and the plurality of DMAs are connected to an input of the arbiter included in the apparatus.
In a fourth aspect, some embodiments of the present application provide a system comprising: a plurality of external hosts, a plurality of kernels, and a plurality of DMAs; the apparatus of any embodiment of the second aspect, wherein the plurality of external hosts, the plurality of kernels, and the plurality of DMAs are connected to an input of the arbiter of the apparatus; and an external storage device connected to an output of the external bus interface included in the apparatus.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
As described in the background section, most prior-art external bus interfaces support only a single protocol for communicating with the external storage device, whereas the external bus of some embodiments of the present application supports multiple data communication protocols (for example, a slow protocol and a fast protocol), so that the technical solution of the present application can meet the data communication needs of different scenarios. In practical application of related-art chips, differences between circuit boards lead to different setup and hold time requirements for data transmission, and a fixed data transmission protocol may not meet the requirements of some application scenarios; in some embodiments of the present application, this problem is addressed by a dynamically configurable external bus read-write timing.
Referring to fig. 2, fig. 2 is a block diagram of a system according to an embodiment of the present application, which includes at least the apparatus 10 for on-chip external bus communication, an external storage device 300, and an external host 400.
As shown in fig. 2, some embodiments of the present application provide an apparatus 10 for on-chip external bus communication, comprising: an external bus interface 200.
The external bus interface 200 includes a first external bus protocol module (corresponding to the slow protocol module of fig. 2) and a second external bus protocol module (corresponding to the fast protocol module of fig. 2), wherein the first external bus protocol module can complete the dynamic configuration of the setup, active, and hold periods included in the valid chip select information, and the second external bus protocol module can complete the dynamic configuration of the delay between the control and address signals and the data signal.
The slow protocol module and the fast protocol module are exemplarily set forth below.
Compared with the prior art, some embodiments of the present application add multiple external bus protocols (i.e., a slow protocol and a fast protocol) to the external bus interface module to support device accesses at different rates, and improve the external bus protocols accordingly to accommodate the different setup and hold time requirements that a chip encounters in different applications.
The slow protocol is illustratively described below in conjunction with the timing diagram of fig. 3.
The slow protocol of the embodiments of the present application supports dynamic configuration of the setup, active, and hold cycle times. By configuring these three parameters, the different setup and hold time requirements of data transmission in different application scenarios can be met.
As shown in fig. 3, when a data transfer is performed, the chip select signal MS_n is first asserted and the address and data signals are driven. The read/write control signal becomes valid after the number of cycles given by the setup parameter; in fig. 3, setup is 1, so the read/write control signal becomes valid after 1 cycle. In fig. 3, active is 3; after the active period has lasted 3 clock cycles, the ready signal is sampled, and if it is 1, the data can be written out or read back normally. In fig. 3, the ready signal is low at that point, which indicates that the active period needs to continue because the data cannot yet be written out or read back normally. When the data can be written out or read back normally, ready goes high, the active period ends, and the read/write control signal is pulled high to its inactive state. The hold period is then entered, with a length given by the hold parameter; in fig. 3, hold is 1, so the data and the address are held for one more cycle, after which the whole data transfer ends.
The meaning of each signal related to fig. 3 is as follows:
clk characterizes the data transfer clock.
MS_n characterizes the memory cell chip select signal, e.g., active low.
WR_n characterizes the write operation signal, e.g., active low.
RD_n characterizes the read operation signal, e.g., active low.
ADDR characterizes the access address.
DATA characterizes the data signal.
READY characterizes the data transfer handshake signal.
The difference between the protocol in fig. 3 and the existing protocol is that a ready signal is added to the slow protocol in fig. 3, and handshaking between the master device and the slave device is realized through the ready signal, so that reliability of data transmission is ensured.
It can be understood that the ready signal of some embodiments of the present application is an input signal (sent by the external storage device to the external bus interface) that can extend the active period of a data transfer and thereby solve the problem that data cannot be processed in time during data transmission. For example, when data is being read and the data at the read address cannot be returned in time, the data access period is extended by means of the ready signal; once the data is prepared, the ready signal is pulled high to its valid level, and the master device reads the data back upon determining that the ready signal is valid.
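The slow-protocol sequence described above (setup, then active stretched while ready is low, then hold) can be illustrated with a minimal behavioural sketch. This is not the circuit of the application; the names `SlowTiming` and `slow_access_phases` are hypothetical, and the sketch assumes that ready is sampled once per cycle after the configured active period has elapsed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SlowTiming:
    """Hypothetical container for the three configurable cycle counts."""
    setup: int   # cycles before the read/write control signal becomes valid
    active: int  # configured (minimum) length of the active period
    hold: int    # cycles address/data are held after the control signal is released


def slow_access_phases(timing: SlowTiming, ready_samples: Iterable[int]) -> List[str]:
    """Return the phase of every clock cycle of one slow-protocol access.

    ready_samples holds the READY level (1 = ready) seen on each sample
    taken after the configured active period has elapsed (an assumption).
    """
    phases = ["setup"] * timing.setup + ["active"] * timing.active
    for ready in ready_samples:
        if ready:
            break                 # data can be written out or read back
        phases.append("active")   # READY low: stretch the active period
    else:
        raise ValueError("READY was never asserted")
    phases += ["hold"] * timing.hold
    return phases
```

With setup = 1, active = 3, and hold = 1, a device that is ready immediately yields a 5-cycle access, while `slow_access_phases(SlowTiming(1, 3, 1), [0, 0, 1])` stretches the active period by two cycles.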
The fast protocol is exemplarily described below in connection with the timing diagram of fig. 4. It should be noted that the specific meanings of the signals in fig. 4 are the same as those of the signals in fig. 3, and are not described herein for avoiding redundancy. The difference from the existing protocol is that in the fast protocol in fig. 4, a ready signal is added, and handshaking between master and slave devices is realized through the ready signal, so that reliability of data transmission is ensured.
The fast protocol of some embodiments of the present application supports a configurable delay between the control phase and the data phase. By configuring the value of delay, the delay between the output control and address signals and the data signal can be controlled. This function can solve the problem of insufficient setup and hold times in fast data transmission. The data ready function is also supported so that different devices can be accommodated, and the fast protocol can issue addresses continuously while data is returned continuously.
In fig. 4, the delay in the fast protocol is set to 2, i.e., the delay between the control signals (MS_n, WR_n/RD_n) and the address signal (ADDR) on the one hand and the data signal (DATA) on the other is 2 cycles. A fast transfer conveys multiple data items in succession. When the ready signal is pulled low during the transfer (at addr3/data1 in fig. 4), the address and the data must be held, and the next address and data operation can proceed only after the ready signal is pulled high again.
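The pipelined behaviour of the fast protocol can be sketched as follows. This is a simplified model under stated assumptions, not the timing of fig. 4 itself: the function name `fast_burst` is hypothetical, addresses and data are numbered from 1, and a cycle listed in `stall_cycles` stands for a cycle in which ready is low, so that the address and data shown in that cycle are held for one more cycle.

```python
from typing import Iterable, List, Optional, Tuple


def fast_burst(n_words: int, delay: int,
               stall_cycles: Iterable[int] = ()) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return (address index, data index) for each cycle of a burst.

    Data item k appears `delay` cycles after address k. None means the
    corresponding bus carries nothing in that cycle.
    """
    stalls = set(stall_cycles)
    trace = []
    step = 0    # number of times the pipeline has advanced
    cycle = 0
    while step < n_words + delay:
        addr = step + 1 if step < n_words else None
        data = step - delay + 1 if step >= delay else None
        trace.append((addr, data))
        if cycle not in stalls:   # READY high: move on to the next address/data
            step += 1
        cycle += 1
    return trace
```

For a 4-word burst with delay = 2 and ready low in cycle 2, the pair (addr3, data1) is repeated once before the burst continues, which mirrors the hold at addr3/data1 described above.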
It will be appreciated that in some embodiments of the present application, an arbiter may be required if there are multiple devices sharing the same external bus. The arbiter provided by some embodiments of the present application is illustratively set forth below.
As shown in fig. 2, in some embodiments of the present application, the apparatus 10 for on-chip external bus communication further comprises: an arbiter 100, the arbiter 100 being connected to an input of the external bus interface 200 and configured to determine, from among the plurality of on-chip devices, a target device that may use the external bus.
For example, in some embodiments of the present application, the arbiter is provided with: input interfaces for receiving external bus use requests from the various devices, a kernel access lock signal input interface, a DMA access lock signal input interface, and a maximum transferable data amount input interface; and output interfaces for outputting external bus use request response signals for the various devices, wherein the data transmission instruction from the target device can be selected by means of the request response signals.
That is, some embodiments of the present application provide an arbiter for an on-chip external bus, as compared to the prior art.
As shown in fig. 5, the input and output signals of the arbiter 100 are as follows.
core_lock: the kernel lock signal. Each kernel can lock the external bus; a kernel that has acquired the external bus can enable the lock function, whereas a kernel that has not acquired the external bus cannot lock it.
cbr: the kernel bus request signal, by which a kernel requests use of the external bus.
cbg: the response signal indicating that the kernel has obtained the right to use the external bus.
dma_lock: the DMA lock signal. Each DMA channel can lock the external bus; a DMA that has acquired the external bus can enable the lock function, whereas a DMA that has not acquired the external bus cannot lock it.
dbr: the DMA bus request signal, by which a DMA requests use of the external bus.
dbg: the response signal indicating that the DMA has obtained the right to use the external bus.
br: the bus request signal of an ordinary device.
bg: the response signal indicating that the ordinary device has obtained the right to use the external bus.
hbr: the bus request signal of the external host.
hbg: the response signal indicating that the external host has obtained the right to use the external bus.
bmax: the maximum amount of data that the device holding the external bus may transfer when multiple devices of the same priority request the external bus.
The arbitration process of the arbiter is exemplarily set forth below.
The arbiter is implemented as follows. When the system contains multiple processor kernels, multiple DMA host channels, and multiple other master devices that can initiate accesses to external devices, the right to use the external bus is assigned according to the following arbitration scheme.
The external host has the first priority. As shown in fig. 6, when the external host requests the external bus and the external bus is not locked (neither core_lock nor dma_lock is asserted), the external bus is granted to the external host once the current data transfer completes (trans_finish); while the external bus is locked, the bus is not released.
Kernel access has the second priority. When there is no request from the external host and multiple kernels access the external bus, bus access is allocated by polling, with kernel 1 through kernel n polled in sequence. A kernel access can interrupt an ordinary DMA access but cannot interrupt a DMA data access for which the external bus is locked by dma_lock.
DMA access has the third priority. DMA accesses of the same priority are served using a polling mechanism (as shown in fig. 7), and a DMA access can interrupt ordinary data transfers.
Other accesses have the fourth priority, and the polling mechanism is likewise used among accesses of the same priority.
The maximum external bus data transfer amount bmax defines the maximum amount of data that each master device may transfer once it obtains the bus among devices of the same priority. When the maximum transfer amount is reached, the bus must be handed to the next master device of the same priority.
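The interaction between polling and bmax among masters of the same priority can be sketched as below. This is an illustrative model only: the function name `share_bus`, the master names, and the use of "words" as the unit of data are assumptions that do not come from the application.

```python
from collections import deque
from typing import Dict, List, Tuple


def share_bus(pending: Dict[str, int], bmax: int) -> List[Tuple[str, int]]:
    """Return the ordered list of (master, words transferred) bus grants.

    pending maps each same-priority master to the number of words it still
    has to transfer. No grant moves more than bmax words; a master with data
    left over rejoins the end of the polling order.
    """
    if bmax <= 0:
        raise ValueError("bmax must be positive")
    queue = deque((m, w) for m, w in pending.items() if w > 0)
    grants = []
    while queue:
        master, words = queue.popleft()
        burst = min(words, bmax)
        grants.append((master, burst))
        if words > burst:
            queue.append((master, words - burst))  # hand the bus to the next master
    return grants
```

For example, with bmax = 2, a master needing 5 words and another needing 2 words alternate, so the second master is served after the first 2-word burst rather than after all 5 words.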
When the kernel or the DMA performs data transmission, it can occupy the external bus for an extended period by locking the external bus. This design mainly addresses atomic operations and ensures the consistency of the required data. For example, a kernel may store data in an external device during one calculation and need that data in the next calculation. If the stored data must not be changed before the next calculation, core_lock is enabled to lock the external bus, so that other devices and the external host are not allowed to use the external bus. After the data has been read back from the external storage unit for the next calculation, core_lock is released and normal bus arbitration resumes. The DMA bus lock works in the same way. A conventional external bus does not have this function.
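The effect of the lock on the four-level priority scheme can be summarised in a small decision function. This is a sketch under assumptions: the class labels "host", "core", "dma", and "other" and the function `next_owner` are hypothetical, and the decision is assumed to be taken at a transfer boundary (when the current transfer has finished).

```python
from typing import Optional, Set

# First to fourth priority, as described in the text.
PRIORITY = ("host", "core", "dma", "other")


def next_owner(requests: Set[str], owner: Optional[str] = None,
               locked: bool = False) -> Optional[str]:
    """Choose which class of device owns the external bus next.

    requests: classes currently requesting the bus.
    owner:    class that currently holds the bus, if any.
    locked:   True when the current owner asserts core_lock or dma_lock.
    """
    # Only a kernel or DMA that already holds the bus can keep it by locking.
    if locked and owner in ("core", "dma"):
        return owner
    for cls in PRIORITY:
        if cls in requests:
            return cls
    return None
```

With the bus held and locked by a kernel, `next_owner({"host", "core"}, owner="core", locked=True)` keeps the kernel on the bus; once the lock is released, the same request set hands the bus to the external host.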
A host bus request mechanism is added to the external bus so that an external host can use the shared storage device. The external host requests the external bus, obtains the right to use it, and can then access the external storage device.
Fig. 7 shows the sequential decision of the same priority arbitration.
Arbitration among requests of the same priority is performed by round-robin arbitration. In fig. 7, br1 is the bus request of processor 1, br2 is the bus request of processor 2, and br3 is the bus request of processor 3; for a processor with ID X, the bus request signal is brX.
At system power-up, the processor with ID 0 has the highest priority by default, and polling proceeds through the IDs in order from low to high.
As shown in fig. 7, three processor devices with IDs 1, 2, and 3 request the bus at the same time through br1, br2, and br3. br1 has the highest priority; br2 obtains the bus when br1 relinquishes it, and br3 obtains the bus when br2 relinquishes it.
FIG. 8 shows the arbitration mechanism for the core, DMA, and general access request bus.
When an ordinary access requests the bus through br, the ordinary access obtains the right to use the bus and bg goes high.
When DMA and kernel accesses occur, the dbr and cbr bus request signals are generated. cbr has a higher priority than dbr, which in turn has a higher priority than ordinary access, so the ordinary data transfer is interrupted. Once the ordinary transfer has finished transferring the current data, it relinquishes the bus to the kernel, i.e., cbg is pulled high and bg is pulled low.
When the kernel finishes its access and relinquishes the bus (cbr is 0), the DMA is still requesting the bus and has a higher priority than the ordinary transfer, so the bus is granted to the DMA and dbg goes high. After the DMA completes its data transfer and relinquishes the bus, the bus is granted to the ordinary transfer again and bg goes high.
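The hand-over sequence described for fig. 8 can be reproduced at signal level with a simple priority encoder. The sketch below is illustrative only: the function name `grant_waveforms` is hypothetical, each step is assumed to be a transfer boundary at which the requests are sampled, and locks and the external host are left out.

```python
from typing import Dict, List


def grant_waveforms(steps: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Map sampled request levels (cbr, dbr, br) to grant levels (cbg, dbg, bg).

    At most one grant is high per step: kernel first, then DMA, then
    ordinary access.
    """
    out = []
    for s in steps:
        cbg = int(bool(s["cbr"]))
        dbg = int(bool(s["dbr"]) and not cbg)
        bg = int(bool(s["br"]) and not cbg and not dbg)
        out.append({"cbg": cbg, "dbg": dbg, "bg": bg})
    return out
```

Feeding it the four situations in the text (ordinary request only; kernel and DMA requests arrive; kernel request drops; DMA request drops) yields bg, cbg, dbg, and bg high in turn.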
The following exemplifies a method of on-chip external bus communication implemented based on the above-described apparatus for on-chip external bus communication.
As shown in fig. 9, an embodiment of the present application provides a method for on-chip external bus communication, where the method includes: s101, receiving a data transmission instruction from a target device on a chip, wherein the data transmission instruction is used for accessing an external storage device; s102, configuring a target read-write time sequence for an external bus, and sending the data transmission instruction to the external storage device through the external bus.
It can be understood that, on one hand, in practical application of a chip, differences between circuit boards lead to different setup and hold time requirements for data transmission, and a fixed data transmission protocol may not meet the needs of some application scenarios; this problem can be solved by the configurable timing of the embodiments of the present application. On the other hand, some embodiments of the present application satisfy the needs of data communication in different scenarios by supporting multiple protocols.
In some embodiments of the present application, before performing S101, the method of on-chip external bus communication further comprises: receiving data transmission instructions of a plurality of devices on the chip, wherein one device corresponds to one data transmission instruction; assigning usage rights of the external bus to the target device of the plurality of devices according to an arbitration policy. Some embodiments of the present application ensure that multiple devices can use the same external bus in a time-sharing manner through an arbitration policy, thereby improving the use efficiency of the external bus.
In some embodiments of the present application, prior to the assigning the usage right of the external bus to the target device of the plurality of devices according to an arbitration policy, the method of on-chip external bus communication further comprises: caching the data transmission instruction of each device to obtain cache information; wherein the receiving a data transmission instruction from an on-chip target device includes: reading the data transmission instruction corresponding to the target device from the cache information.
Some embodiments of the present application first cache the data transmission instructions from the multiple on-chip devices, so that the execution of subsequent instructions of the kernels and DMA devices is not blocked, making program execution more efficient.
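The caching step can be pictured as a buffer into which each device posts its data transmission instruction and then continues, with the arbiter later reading the instruction of the granted device. The sketch below is one possible software analogy, not the hardware of the application: the class `CommandBuffer`, its method names, and the use of one FIFO per device are all assumptions.

```python
from collections import deque
from typing import Deque, Dict, List


class CommandBuffer:
    """Hypothetical cache of pending data transmission instructions."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def post(self, device: str, command: str) -> None:
        """Cache a command; the issuing device does not wait for the bus."""
        self._queues.setdefault(device, deque()).append(command)

    def requesters(self) -> List[str]:
        """Devices that currently have a cached command, in first-post order."""
        return [dev for dev, q in self._queues.items() if q]

    def fetch(self, device: str) -> str:
        """Read the oldest cached command of the device granted the bus."""
        return self._queues[device].popleft()
```

In this analogy, the list returned by `requesters()` plays the role of the bus requests seen by the arbiter, and `fetch()` corresponds to reading the target device's instruction from the cache information.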
In some embodiments of the present application, the types of devices include at least: an external host, a kernel, a DMA, and other devices, wherein the arbitration policy comprises: the data transmission instruction from the external host belongs to a first priority, the data transmission instruction from the kernel belongs to a second priority, the data transmission instruction from the DMA belongs to a third priority, and the data transmission instruction from the other devices belongs to a fourth priority. Some embodiments of the present application may ensure that important devices can preferentially use the external bus by setting different priorities for different kinds of devices.
In some embodiments of the present application, the arbitration policy further comprises: serving multiple data transmission instructions of the same priority using a polling mechanism; and, if multiple devices of the same priority request the bus, limiting the amount of data that the device granted the bus may transfer to a maximum data transfer amount. Some embodiments of the present application ensure that every device of the same priority has an opportunity to use the external bus by setting the maximum data transfer amount bmax and providing the polling function.
In some embodiments of the present application, when the target device is a kernel or a DMA, the sending the data transfer instruction to the external storage device through an external bus includes: the external bus is used by locking the external bus. According to some embodiments of the application, when the kernel and the DMA adopt the external bus for data transmission, the use right of the external bus is occupied for a long time in a bus locking mode, and the design is mainly used for solving the problem of atomic operation and ensuring the consistency of required data.
In some embodiments of the present application, the plurality of external bus protocols includes a first external bus protocol, and the control signals corresponding to the first external bus protocol include valid chip select information, which comprises setup, active, and hold periods, wherein the number of cycles of each of the setup, active, and hold periods can be dynamically configured. In the external bus of some embodiments of the present application, the setup, active, and hold periods supported by the first external bus protocol (i.e., the slow protocol) are dynamically configurable, and the requirements of chip setup and hold times in different scenarios can be met by setting different setup and hold times.
In some embodiments of the present application, the control signals corresponding to the first external bus protocol further include ready information, wherein the ready information is a signal from the external storage device and is used to extend the cycle time of the active period. Some embodiments of the present application use the ready signal as an input from the external storage device to the external bus interface, which can extend the active period of a data transfer and solve the problem that data cannot be processed in time during data transmission.
In some embodiments of the present application, the plurality of external bus protocols includes a second external bus protocol, and the control signals corresponding to the second external bus protocol further include a delay signal, wherein the value of the delay signal is configured to control the delay between the output control and address signals and the output data signal. In some embodiments of the present application, the second external bus protocol (i.e., the fast protocol) thus supports a configurable delay between the control phase and the data phase.
Some embodiments of the present application provide a chip, comprising: a plurality of external hosts, a plurality of kernels, and a plurality of DMAs; and the apparatus for on-chip external bus communication described above, wherein the plurality of external hosts, the plurality of kernels, and the plurality of DMAs are connected to an input of the arbiter included in the apparatus.
Some embodiments of the present application provide a system comprising: a plurality of external hosts, a plurality of kernels, and a plurality of DMAs; the apparatus for on-chip external bus communication described above, wherein the plurality of external hosts, the plurality of kernels, and the plurality of DMAs are connected to an input of the arbiter of the apparatus; and an external storage device connected to an output of the external bus interface included in the apparatus.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.