CROSS-REFERENCES TO RELATED APPLICATIONS
This application claims priority from provisional U.S. patent application Ser. No. 60/211,094, filed Jun. 12, 2000, which is incorporated by reference into this application for all purposes.
A related patent application is filed concurrently with the present application as U.S. patent application Ser. No. 09/668,704, filed on Sep. 22, 2000, in the names of May et al., and entitled “Setting Up Memory and Registers from a Serial Device” and assigned to the present assignee. Another related patent application is filed concurrently with the present application as U.S. patent application Ser. No. 09/668,202, filed on Sep. 22, 2000, in the names of May et al., entitled “Re-configurable Memory Map for a System on a Chip,” and assigned to the present assignee.
BACKGROUND OF THE INVENTION
The present invention relates to digital systems. More specifically, the present invention relates to a bus architecture for an integrated digital system.
Since their inception, digital systems have progressed towards higher levels of integration. Higher integration offers several benefits to the system designer, including lower development costs, shorter design cycles, increased performance and generally lower power consumption. At the device level, this integration has been achieved by the accumulation of functions once performed by multiple, individual devices into more capable, higher density devices. Additionally, the need for design flexibility has increased due to more challenging time-to-market pressures and changes in system specifications.
Often at the heart of a digital system is the microprocessor, also known as a CPU. A microprocessor is an integrated circuit implemented on a semiconductor chip, which typically includes, among other things, an instruction execution unit, a register file, an arithmetic logic unit (ALU), a multiplier, etc. Microprocessors execute instructions in digital systems such as personal computers, and can also be employed to control the operation of most digital devices.
Microprocessors have evolved, most notably, in two directions. The first is towards higher performance and the second is towards greater ease of use. The path to higher performance has produced microprocessors with wider data paths and longer instructions. Greater integration has also improved speed, as many microprocessors now incorporate on-board structures such as memory for caching. Finally, like all semiconductors, microprocessors have benefited from architectural and process enhancements, allowing higher speed through better clock rates and more efficient logic operations.
Another digital device, which has evolved over its lifetime to meet the needs of system designers, is the programmable logic device (PLD). A programmable logic device is a logic element whose logic function is not restricted to a specific function; rather, the logic function of a PLD is programmed by a user. PLDs combine the advantages of fixed integrated circuits with the flexibility of custom integrated circuits. Demands for greater capacity and performance have been met with larger PLD devices, architecture changes, and process improvements. Similar to microprocessors, the road to greater integration has also led to memory structures being incorporated into PLD architectures.
The traditional approach to system design involves combining a microprocessor and other off-the-shelf devices on a board, while partitioning the board's functions into the components that are best suited to perform them. While this method seems to be straightforward, it ignores the advantages to be gained by higher device-level integration. With higher device-level integration, the elimination of on-chip/off-chip delays enhances performance. Power consumption and overall manufacturing and design costs are often improved as well. Yet, integration presents problems of its own. For example, since a microprocessor will normally be clocked at a faster rate than other elements, a method and apparatus are needed to address this difference in clock speeds.
SUMMARY OF THE INVENTION
According to an embodiment of the present invention, a system integrated on a single chip is disclosed. The system includes a combination of an embedded processor, reprogrammable memory, a programmable logic device (PLD) and a multiple bus architecture. The multiple bus architecture includes bus bridges that span adjacent clock domains while still allowing communication among the PLD, reprogrammable memory, processor, etc.
The bus architecture of the present invention, in particular, is embodied as a multiple bus master system, which allows communication among all peripherals in the system via bridges that de-couple the clock frequencies of the individual bus masters from those of the peripherals they are accessing. The bus architecture of the present invention therefore allows the system components, for example the processor, peripherals and PLD, to run at their optimal speeds.
In a first aspect of the invention, a digital system integrated on a semiconductor chip is disclosed. The system includes one or more first bus masters coupled to a first bus in a first clock domain, and a PLD coupled to a second bus in a second clock domain. A first bridge is coupled between the first and second buses and is operable to de-couple the first clock domain from the second clock domain. Additionally, one or more masters on the first bus are configured to communicate with one or more slaves on the second bus. The second bus may also contain a number of masters, including the PLD.
In a second aspect of the invention, a digital system on a semiconductor chip includes a central processing unit coupled to a first bus, a programmable logic device coupled to a second bus and a bus bridge coupled between the first and second buses. In this aspect of the invention, the first bus operates within a first clock domain and the second bus operates within a second clock domain.
In a third aspect of the invention, a digital system on a semiconductor chip includes a central processing unit (CPU) coupled to a first bus in a first clock domain defined by a first bus clock frequency; a plurality of electronic devices coupled to a second bus in a second clock domain defined by a second bus clock frequency; a bus bridge coupled between the first and second buses and operable to allow communication between the CPU at the first bus clock frequency and one of the plurality of electronic devices at the second bus clock frequency; a programmable logic device (PLD) coupled to a third bus in a third clock domain; and a PLD bridge coupled between the second and third buses.
The following detailed description and the accompanying drawings provide a better understanding of the nature and advantages of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of a digital system with a programmable logic integrated circuit;
FIG. 2 is a block diagram of a digital system according to an embodiment of the present invention;
FIG. 3 is a block diagram of a system having a multiple bus architecture according to an embodiment of the present invention;
FIG. 4 shows a more detailed and exemplary diagram of a first bus in FIG. 3, and its connectivity to exemplary components and peripherals, according to an embodiment of the present invention;
FIG. 5 shows a more detailed and exemplary diagram of a second bus in FIG. 3, and its connectivity to exemplary components and peripherals, according to an embodiment of the present invention; and
FIG. 6 shows an exemplary block diagram of a bridge according to an embodiment of the present invention.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
FIG. 1 shows a block diagram of a digital system within which the present invention may be embodied. The system may be provided on a single board, on multiple boards, or even within multiple enclosures. FIG. 1 illustrates a system 10 in which a programmable logic device 106 may be utilized. Programmable logic devices are currently represented by, for example, Altera's MAX®, FLEX®, and APEX™ series of PLDs.
In the particular embodiment of FIG. 1, a semiconductor device 100 is coupled to a memory 102 and an I/O 104 and comprises a programmable logic device (PLD) 106 and embedded logic, which may include, among other components, a processor 109. The system may be a digital computer system, digital signal processing system, specialized digital switching network, or other processing system. Moreover, such systems may be designed for a wide variety of applications such as, merely by way of example, telecommunications systems, automotive systems, control systems, consumer electronics, personal computers, and others.
Referring now to FIG. 2, there is shown a diagram of a system 20 having a multiple bus architecture, according to an embodiment of the present invention. The bus architecture comprises bus masters 200, 201, 202 and 204, each of which can communicate with one or more of the peripherals in the system, e.g., memory 206 and other peripherals 208-216 such as I/O devices, via bridges 218-224. The principal function of each bus master is to manage the bus it is associated with and control what devices can access the bus. Bridges 218-224 allow a bus master in a first clock domain to communicate with a peripheral in a second clock domain, thereby allowing the components on each bus to operate at their individually optimal speeds. A bridge accomplishes this by preferably including a first-in first-out (FIFO) buffer, which accepts data at the clock rate of a first bus and writes it out to a second bus at the clock rate of the second bus. So long as each bus master is accessing a different peripheral on a different bus, employment of bus bridges 218-224 enhances system performance, since multiple bus masters can communicate with different peripherals on different buses simultaneously without bus access contention. In other words, this embodiment of the present invention divides processing elements into their own clock domains 226-232 and provides bridges 218-224, which allow communication with devices on buses across clock domains 226-232. Nevertheless, the bus architecture of system 20 is flexible enough to accommodate multiple bus masters, e.g. bus masters 200 and 202, sharing the same bus, provided that those bus masters run at the same frequency. Each clock domain can derive from an independent clock source or from a division of one or more clock sources. Whereas the embodiment in FIG. 2 is shown to have a certain number of bus masters and peripheral devices, it should be realized that this number is merely exemplary and that a design having any number of bus masters, buses, bridges and peripherals is possible and, therefore, within the scope of the present invention.
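The decoupling role of bridges 218-224 can be illustrated with a small software model. This is only a sketch: the class name `ClockDomainFifo`, the function `simulate`, the FIFO depth and the 3:1 clock ratio are hypothetical choices, not values taken from this description. The producer pushes one word per first-bus clock, the consumer pops one word per second-bus clock, and the producer stalls only when the FIFO is full.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


class ClockDomainFifo:
    """Hypothetical model of a bridge FIFO between two clock domains."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: deque = deque()

    def push(self, word: int) -> bool:
        """Called on a first-bus clock edge; returns False (stall) if full."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append(word)
        return True

    def pop(self) -> Optional[int]:
        """Called on a second-bus clock edge; returns None if empty."""
        return self.entries.popleft() if self.entries else None


def simulate(words: Iterable[int], depth: int = 4,
             ratio: int = 3) -> Tuple[List[int], int]:
    """Fast producer, consumer `ratio` times slower (hypothetical ratio).

    Returns (words received in order, producer stall cycles).
    """
    fifo = ClockDomainFifo(depth)
    pending = list(words)
    received: List[int] = []
    stalls = 0
    cycle = 0
    while pending or fifo.entries:
        if pending:
            if fifo.push(pending[0]):
                pending.pop(0)
            else:
                stalls += 1  # FIFO full: first-bus master must wait
        if cycle % ratio == ratio - 1:  # one second-bus edge per `ratio` fast cycles
            word = fifo.pop()
            if word is not None:
                received.append(word)
        cycle += 1
    return received, stalls
```

With four words and four entries, `simulate(range(4))` delivers every word with zero producer stalls even though the second bus drains at one third of the rate; with ten words and two entries the stall count becomes non-zero, which illustrates the trade-off between buffer depth and stalls.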
FIG. 3 shows a portion of embedded logic illustrating an exemplary implementation of the multiple bus architecture shown in FIG. 2. Access to a peripheral is controlled by a number of bus masters connected by a bus structure comprising two or more buses, which is described in greater detail below. In this exemplary implementation, there are three bus masters: processor 300, PLD Master 302 and Configuration Logic 304. These bus masters 300-304 are capable of initiating read and write operations by providing address and control information. Processor 300 is connected to a first bus 306 (e.g. a 32-bit AHB bus). First bus 306 also connects to one or more peripheral devices such as a synchronous dynamic random access memory (SDRAM) controller 330, on-chip static random access memory (SRAM) (single-port 310 and dual-port 312), and processor-only peripherals, for example an interrupt controller 314, which receives an interrupt signal from another peripheral and reports it to processor 300, and a watchdog timer 316, which causes the system to reset if, for example, certain logic states within processor 300 do not toggle within a predefined time period. A test interface controller (TIC) 318 can also be connected to first bus 306 for functional testing.
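The watchdog behavior described for watchdog timer 316 can be sketched as a monitor that counts clock ticks during which a watched processor signal fails to toggle. The class `WatchdogTimer`, its `tick()` method and the timeout expressed in ticks are assumptions for illustration; the description states only that a reset occurs if certain logic states do not toggle within a predefined period.

```python
class WatchdogTimer:
    """Hypothetical model: assert reset if a monitored signal stops toggling."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks  # hypothetical "predefined time period"
        self.last_level = None
        self.idle_ticks = 0
        self.reset_asserted = False

    def tick(self, level: int) -> bool:
        """Sample the monitored signal on one clock tick; return reset state."""
        if level != self.last_level:
            # Any toggle (or the first sample) restarts the timeout window.
            self.last_level = level
            self.idle_ticks = 0
        else:
            self.idle_ticks += 1
            if self.idle_ticks >= self.timeout_ticks:
                # Once asserted, reset stays asserted in this simple model.
                self.reset_asserted = True
        return self.reset_asserted
```

A signal that keeps toggling never trips the timer; holding it constant for `timeout_ticks` samples asserts reset.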
The remaining bus masters, which in this example are PLD Master 302 and Configuration Logic 304, share a second bus 307. Second bus 307 can be, for example, a standard 32-bit AHB bus that provides PLD Master 302 and Configuration Logic 304 with a lower memory access speed than may be required by processor 300, which, as described above, is connected to first bus 306. Similarly, peripherals that can tolerate relatively greater latency can be connected to second bus 307. Some of the modules connected to second bus 307 may include, for example, a universal asynchronous receiver/transmitter (UART) 320, a bus expansion 322, a timer 324, a clock generator 326, a reset/mode controller 328, an SDRAM memory controller 330 for controlling external SDRAM, and single-port and dual-port on-chip SRAMs 310 and 312. Bus expansion 322 is used primarily to connect to external memory, for example Flash memory from which processor 300 can boot. Clock generator 326 is preferably programmable so that a desired clock frequency can be set for second bus 307. Both single-port SRAM 310 and dual-port SRAM 312 may be divided into multiple blocks (e.g. divided in two, as in FIG. 4), each having its own bus arbitration. This division permits concurrent access to different blocks by bus masters on first bus 306 and second bus 307. Second bus 307 is also connected to a PLD slave bridge 332 and a PLD master bridge 334, each of which is interfaced to a PLD in the system (not shown in FIG. 3) via third bus 336 and fourth bus 338, respectively. Third bus 336 and fourth bus 338 can be, for example, standard 32-bit AHB buses. (Alternatively, a bridge to and from the PLD may be configured in a single device.) In this particular embodiment, the PLD may be, for example, an APEX™ 20KE, which is manufactured by Altera Corporation and described in the Altera Data Book (1999), which is incorporated by reference.
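Because each SRAM block has its own arbitration, two accesses from masters on different buses conflict only when they target the same block. The sketch below assumes a hypothetical 16 KB SRAM split into two equal blocks selected by address offset; the sizes and function names are illustrative, not taken from the description.

```python
SRAM_SIZE = 16 * 1024              # hypothetical SRAM size in bytes
NUM_BLOCKS = 2                     # "divided in two", as in the example
BLOCK_SIZE = SRAM_SIZE // NUM_BLOCKS


def block_of(offset: int) -> int:
    """Return the independently arbitrated block an SRAM offset falls in."""
    if not 0 <= offset < SRAM_SIZE:
        raise ValueError(f"offset {offset:#x} outside SRAM")
    return offset // BLOCK_SIZE


def can_proceed_concurrently(offset_a: int, offset_b: int) -> bool:
    """Accesses from masters on different buses proceed in parallel only if
    they hit different blocks; otherwise that block's arbiter serializes them."""
    return block_of(offset_a) != block_of(offset_b)
```

For example, offsets `0x0000` and `0x2000` fall in different blocks and can proceed together, while `0x0010` and `0x1FF0` share block 0 and must be arbitrated.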
FIG. 4 shows first bus 306 in greater detail. First bus 306 is clocked by, for example, a dedicated phase locked loop (PLL), which allows processor 300 to achieve its maximum possible performance. The clock frequency can be made selectable by writing to clock generator module 326. An address decoder 440 provides selection of bus bridge 325, SDRAM memory controller 330, on-chip SRAM 310 and 312, interrupt controller 314 and watchdog timer 316 in accordance with the memory maps of the various modules. Address decoder 440 selects one of these elements by comparing address information encoded in memory map registers (not shown in FIG. 3) on second bus 307 to an address output by processor 300. If the address output by processor 300 is within the address range of any one of the elements on first bus 306, the select line for the corresponding element is activated. If the access is not directed to an element coupled exclusively to first bus 306 (e.g. memory controller 330, interrupt controller 314, watchdog timer 316) or to SRAM 310 or 312, then it is directed to an element on second bus 307 via bus bridge 325.
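The decoding rule for address decoder 440 (activate the select line of the element whose range contains the address, otherwise forward the access through bus bridge 325) can be sketched as a range lookup. The base addresses and sizes below are hypothetical placeholders; in the described system they would come from the memory map registers.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical first-bus memory map; real ranges come from memory map registers.
FIRST_BUS_MAP = [
    Region("sdram_controller_330", 0x0000_0000, 0x1000_0000),
    Region("sram_310", 0x2000_0000, 0x0001_0000),
    Region("sram_312", 0x2001_0000, 0x0001_0000),
    Region("interrupt_controller_314", 0x7F00_0000, 0x0000_1000),
    Region("watchdog_timer_316", 0x7F00_1000, 0x0000_1000),
]


def decode(addr: int, regions=FIRST_BUS_MAP) -> str:
    """Return the element whose select line is activated for `addr`.

    Any address not claimed by a first-bus element is forwarded to the
    second bus through bus bridge 325.
    """
    for region in regions:
        if region.contains(addr):
            return region.name
    return "bus_bridge_325"
```

With this map, `decode(0x2000_0004)` selects `sram_310`, while an unclaimed address such as `0x4000_0000` falls through to `bus_bridge_325`.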
FIG. 5 shows second bus 307 from FIG. 3 in greater detail. Second bus 307 may be clocked by, for example, a divided-down version of the clock that clocks first bus 306, or by a clock unrelated to the first bus clock. A register for selecting this frequency is located within clock generator module 326. Address decoder 340 provides for selection of SDRAM memory controller 330, bus expansion 322, on-chip SRAM 310 and 312, UART 320, clock generator 326, timer 324, reset/mode controller 328, PLD slave bridge 332, etc. according to the system's memory map. Reset/mode controller 328 resets the system and controls its mode of operation. It may also contain memory map registers that a user can access to configure a memory map for the system. Second bus 307 also includes an arbiter 542 for determining which bus master (PLD Master 302, Configuration Logic 304, or a bus master on first bus 306 via bus bridge 325) has access to second bus 307.
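The description does not specify the arbitration policy of arbiter 542. As one common possibility, a round-robin arbiter over the three requesters can be sketched as follows; the policy, the requester names and the `grant()` interface are all assumptions.

```python
from typing import Optional, Sequence

# Hypothetical requester order for arbiter 542 on second bus 307.
REQUESTERS = ("pld_master_302", "configuration_logic_304", "bus_bridge_325")


class RoundRobinArbiter:
    """Grant the bus to one requesting master, rotating priority after each grant."""

    def __init__(self, masters: Sequence[str] = REQUESTERS) -> None:
        self.masters = list(masters)
        self.next_index = 0  # master that currently has highest priority

    def grant(self, requests: set) -> Optional[str]:
        n = len(self.masters)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            if self.masters[idx] in requests:
                # The winner becomes lowest priority for the next arbitration.
                self.next_index = (idx + 1) % n
                return self.masters[idx]
        return None  # no requests: bus stays idle
```

Round-robin is used here only because it prevents any one master from starving the others when all request continuously; a fixed-priority scheme would be an equally plausible reading of the text.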
First bus 306 and second bus 307 are coupled to each other by bus bridge 325. PLD master bridge 334 and PLD slave bridge 332 are substantially identical to bus bridge 325, with only minor differences related to the chosen address decoding scheme and bus structure. An exemplary embodiment of a bridge 60 is shown in FIG. 6. The originating bus 600 of a transaction is connected to the bridge's slave 602, while the bridge's master 604 is connected to a destination bus 606. Bridge 60 includes synchronization logic 608, which allows the master and slave interfaces to reside in different clock domains. The master and slave interfaces of bridge 60 can be synchronous or asynchronous relative to each other. If they are synchronous, bridge 60 can be configured to bypass synchronization logic 608 to reduce the latency through bridge 60.
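The internals of synchronization logic 608 are not detailed here. A common way to carry a single-bit request across asynchronous clock domains is a two-flip-flop synchronizer clocked by the destination clock, which delays the request by extra destination-clock edges; bypassing it when the domains are synchronous removes that delay. The sketch below models that assumption; the class name, function name and two-stage depth are illustrative.

```python
class RequestSynchronizer:
    """Hypothetical model of a request crossing into the master clock domain.

    With bypass=False the request passes through two flip-flops clocked by
    the destination (master) clock; with bypass=True it is seen immediately,
    as when both interfaces of the bridge are synchronous.
    """

    def __init__(self, bypass: bool = False) -> None:
        self.bypass = bypass
        self.stage1 = 0
        self.stage2 = 0

    def clock(self, request_in: int) -> int:
        """Advance one master-clock edge and return the synchronized request."""
        if self.bypass:
            return request_in
        # Both flops update on the same edge: stage2 takes the old stage1.
        self.stage2, self.stage1 = self.stage1, request_in
        return self.stage2


def cycles_until_seen(bypass: bool) -> int:
    """Count master-clock edges until a request held high is observed."""
    sync = RequestSynchronizer(bypass)
    edges = 0
    while True:
        edges += 1
        if sync.clock(1):
            return edges
```

In this model the request is observed on the second master-clock edge through the synchronizer, versus the first edge when the synchronizer is bypassed.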
A write buffer 610 is configured to accept bursts of posted write data from the slave interface. Preferably, the bus protocol allows several transfers of write data to be concatenated to enhance bus performance. No wait states are inserted so long as a buffer entry is free to accept the data. A write request is generated by the slave interface and is synchronized to the master clock domain. Master 604 de-queues data from write buffer 610, writes it out to destination bus 606 and then asserts an acknowledge signal to slave 602 to indicate that a buffer entry is now free for re-use by slave 602. Sending the acknowledge signal back to slave 602 accounts for the difference between the clock frequencies of the slave and master clock domains. Without write posting, if, for example, processor 300 on first bus 306 were writing to one of the slaves on second bus 307, as in FIG. 3, processor 300 would have to wait for each transfer to complete before sending the next. Since processor 300 normally runs at a higher frequency than the slaves on second bus 307, write posting allows processor 300 to run at its optimal speed. In an exemplary embodiment, write posting is controlled by the bridge coupled between the two buses. Preferably, each bridge includes a first-in first-out (FIFO) buffer, which accepts data at the clock rate of the first bus, buffers it and writes it out to the second bus at the clock rate of the second bus. The FIFO thereby allows processor 300, for example, to carry out its next action at its own optimal clock rate instead of stalling while data is written to second bus 307.
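The benefit of write posting can be estimated with a simple cycle count. The sketch assumes, hypothetically, that each transfer on the slower destination bus occupies `slow_ratio` processor cycles, that posting a write into a free entry takes one processor cycle, and that the write buffer has `depth` entries; synchronization latency and acknowledge delay are ignored, and none of these figures comes from the description.

```python
from typing import Optional


def processor_cycles(n_writes: int, slow_ratio: int,
                     depth: Optional[int] = None) -> int:
    """Processor cycles until all writes have been handed off.

    depth=None models no write posting: the processor waits for every
    transfer on the slow bus to complete before issuing the next one.
    """
    if depth is None:
        return n_writes * slow_ratio
    occupancy = 0    # entries currently held in the (hypothetical) write buffer
    issued = 0
    drain_timer = 0  # cycles the bridge master has spent on the current transfer
    cycle = 0
    while issued < n_writes:
        cycle += 1
        # Bridge master side: finish one transfer every slow_ratio cycles.
        if occupancy:
            drain_timer += 1
            if drain_timer == slow_ratio:
                occupancy -= 1  # acknowledge frees an entry for re-use
                drain_timer = 0
        # Processor side: post a write with no wait state if an entry is free.
        if occupancy < depth:
            occupancy += 1
            issued += 1
    return cycle
```

Under these assumptions, four writes with `slow_ratio=4` take 16 processor cycles without posting but only 4 with a four-entry buffer; with a two-entry buffer, eight writes take 25 cycles instead of 32, because once the buffer fills the processor is paced by the slower bus.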
When selected by a read transaction, slave 602 asserts a read request that is synchronized to the master clock domain. Master 604 performs a read transaction (pre-fetching data to fill a read buffer 612 if enabled) and asserts an acknowledge signal to indicate when data is available. Read buffer tags are used to return the status of the transaction (e.g. OK, ERROR, RETRY).
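The read path can be sketched similarly. The tag values OK, ERROR and RETRY follow the description, while the `ReadEntry` structure, the `master_read` function, the `prefetch` count, the 4-byte word stride and the dictionary standing in for the destination bus are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Tag(Enum):
    OK = "OK"
    ERROR = "ERROR"
    RETRY = "RETRY"


@dataclass
class ReadEntry:
    address: int
    data: int
    tag: Tag


def master_read(bus: Dict[int, int], address: int, prefetch: int = 0,
                busy: bool = False) -> List[ReadEntry]:
    """Read from a stand-in destination bus into tagged read-buffer entries.

    Unmapped addresses return ERROR; busy=True models a target that
    answers RETRY. Optionally pre-fetches `prefetch` following words.
    """
    if busy:
        return [ReadEntry(address, 0, Tag.RETRY)]
    entries: List[ReadEntry] = []
    for offset in range(prefetch + 1):
        addr = address + 4 * offset  # assumes 32-bit word addressing
        if addr in bus:
            entries.append(ReadEntry(addr, bus[addr], Tag.OK))
        else:
            entries.append(ReadEntry(addr, 0, Tag.ERROR))
            break  # stop pre-fetching after an error
    return entries
```

The originating slave would return the first entry's data and tag to the requesting master and could serve subsequent sequential reads from the pre-fetched entries.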
The slave interface also provides access to a bridge status register and an address status register (not shown in FIG. 6). These registers contain information pertaining to a posted write transaction that resulted in an ERROR response, could not arbitrate for the destination bus, or could not complete an access that had a RETRY response. When slave 602 indicates that a transfer is pending, master 604 uses the address and control information to perform the requested transaction on destination bus 606. Master 604 will only read data from destination bus 606 if there is a free entry in read buffer 612 to receive it; if no free entries are available, master 604 inserts BUSY cycles. Similarly, if no data is available from write buffer 610 during a write transaction, master 604 inserts BUSY cycles.
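The flow-control rule above (insert BUSY when there is nowhere to put read data or nothing to write) and the capture of failed posted writes can be sketched as follows. Status registers of this kind are needed because a posted write has already been acknowledged to its originating master before the destination bus responds. The `BridgeStatus` fields, the function names and the string cycle labels are assumptions; the description does not give the register layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BridgeStatus:
    """Hypothetical stand-in for the bridge and address status registers."""
    error: bool = False
    reason: Optional[str] = None   # e.g. "ERROR", "NO_ARBITRATION", "RETRY"
    address: Optional[int] = None


def master_cycle(is_read: bool, read_free: int, write_available: int) -> str:
    """Decide the bridge master's action for one destination-bus data cycle."""
    if is_read and read_free == 0:
        return "BUSY"      # no free read-buffer entry to receive data
    if not is_read and write_available == 0:
        return "BUSY"      # posted write data not yet in the write buffer
    return "TRANSFER"


def record_posted_write_failure(status: BridgeStatus, address: int,
                                reason: str) -> None:
    """Latch details of a posted write that could not complete; the
    originating master was already released, so it cannot see the response."""
    status.error = True
    status.reason = reason
    status.address = address
```

Software on the originating side would then read these hypothetical fields to discover which posted write failed and why.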
In conclusion, the present invention discloses a bus architecture embodied as a multiple bus master system, which allows communication among all peripherals in the system via bridges that de-couple the clock frequencies of the individual bus masters from those of the peripherals they are accessing. The bus architecture of the present invention therefore allows various system units to run at their optimal speeds and reduces bus contention.
The foregoing description of preferred exemplary embodiments has been presented for the purposes of description. It is not intended to be exhaustive or to limit the invention to the precise form described herein, and modifications and variations are possible in light of the teaching above. Accordingly, the true scope and spirit of the invention is instead indicated by the following claims and their equivalents.