CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of co-pending U.S. patent application Ser. No. 11/737,030, filed Apr. 18, 2007, which is a continuation of U.S. patent application Ser. No. 11/297,088, filed Dec. 7, 2005, now issued as U.S. Pat. No. 7,227,380, which is a continuation of U.S. patent application Ser. No. 10/948,010, filed Sep. 22, 2004, now U.S. Pat. No. 6,980,027, which is a continuation of U.S. patent application Ser. No. 10/448,259, filed May 28, 2003, now U.S. Pat. No. 6,838,902, issued Jan. 4, 2005, which are hereby incorporated by reference as if set forth herein.
BACKGROUND OF THE SYSTEM
1. Field of the System
The present system relates to field programmable gate array (FPGA) devices. More specifically, the system relates to a synchronous first in/first out memory module for an FPGA.
2. Background
FPGAs are known in the art. An FPGA comprises any number of logic modules, an interconnect routing architecture and programmable elements that may be programmed to selectively interconnect the logic modules to one another and to define the functions of the logic modules. To implement a particular circuit function, the circuit is mapped into the array and the appropriate programmable elements are programmed to implement the necessary wiring connections that form the user circuit.
An FPGA core tile may be employed as a stand-alone FPGA, repeated in a rectangular array of core tiles, or included with other functions in a system-on-a-chip (SOC). The core FPGA tile may include an array of logic modules, and input/output modules. An FPGA circuit may also include other components such as static random access memory (SRAM) blocks. Horizontal and vertical routing channels provide interconnections between the various components within an FPGA core tile. Programmable connections are provided by programmable elements between the routing resources.
An FPGA circuit can be programmed to implement virtually any set of digital functions. Input signals are processed by the programmed circuit to produce the desired set of outputs. Such inputs flow from the user's system, through input buffers and through the circuit, and finally back out to the user's system via output buffers. The bonding pad, input buffer and output buffer combination is referred to as an input/output port (I/O). Such buffers provide any or all of the following input/output (I/O) functions: voltage gain, current gain, level translation, delay, signal isolation or hysteresis.
As stated above, many FPGA designers incorporate blocks of SRAM into their architecture. In some applications, the SRAM blocks are configured to function as a first-in/first-out (FIFO) memory. A FIFO is basically a SRAM memory with automatic read and write address generation and some additional control logic. The logic needed to implement a FIFO, in addition to the SRAM blocks, consists of address generating logic and flag generating logic.
Counters are used for address generation. Two separate counters are used in this application for independent read and write operations. By definition, a counter circuit produces a deterministic sequence of unique states. The sequence of states generated by a counter is circular such that after the last state has been reached the sequence repeats starting at the first state. The circular characteristic of a counter is utilized to generate the SRAM's write and read addresses so that data is sequenced as the first data written to the SRAM is the first data read. The size of the sequence produced by the counters is matched to the SRAM address space size. Assuming no read operation, when the write counter sequence has reached the last count, the SRAM has data written to all its addresses. Without additional control logic, further write operations would overwrite existing data starting at the first address.
Additional logic is needed to control the circular sequence of the read and write address counters in order to implement a FIFO. The control logic enables and disables the counters when appropriate and generates status flags. The read and write counters are initialized to produce a common start location. The control logic inhibits reading at any location until a write operation has been performed. When the write counter pulls ahead of the read counter by the entire length of the address space, the SRAM has data written to all its addresses. The control logic inhibits overwriting an address until its data has been read. Once the data has been read, the control permits overwriting at that address. When the read counter catches up to the write counter, the SRAM no longer contains valid data and the control logic inhibits reading until a write operation is performed.
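The counter and control behavior described above can be illustrated with a small behavioral model. The following Python sketch is illustrative only and is not the circuit of any embodiment. The class name `FifoModel`, its method names, and the use of an occupancy count in place of a comparison between the two counters are assumptions made for brevity.

```python
class FifoModel:
    """Behavioral sketch of a FIFO built from a memory and two circular counters."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth   # stands in for the SRAM
        self.wr = 0                 # write address counter
        self.rd = 0                 # read address counter (same start location)
        self.count = 0              # words currently stored (assumed bookkeeping)

    def full(self):
        return self.count == self.depth

    def empty(self):
        return self.count == 0

    def write(self, word):
        """Store a word; return False if the write is inhibited."""
        if self.full():
            return False            # never overwrite data that has not been read
        self.mem[self.wr] = word
        self.wr = (self.wr + 1) % self.depth   # circular sequence
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None if the read is inhibited."""
        if self.empty():
            return None             # no valid data to output
        word = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return word
```

In this sketch, a fifth write to a four-word `FifoModel` is refused until one word has been read, after which the freed address may be overwritten.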
Output signals, known to those of ordinary skill in the art as flags, provide the system with status on the SRAM capacity available. The full and empty conditions are indicated through full and empty flags. Two additional flags are generated to warn of approaching empty or full conditions.
FPGAs have programmable logic to implement this control logic. With the availability of a SRAM block, an FPGA application may be configured to operate as a FIFO memory. Many prior art FPGAs use this approach. However, considerable FPGA gates are consumed when implementing the control logic for a FIFO in this manner and this increases the cost of the application. Also, the performance of the FIFO is likely to be limited by the speed of the control logic and not the SRAM.
Hence, there is a need for an FPGA that has dedicated logic specifically included to implement a FIFO. The FIFO logic may be included among the SRAM components in an FPGA core tile. The result is improved performance and a decrease in the silicon area needed to implement the functions, compared with implementing the FIFO function with FPGA gates.
SUMMARY OF THE SYSTEM
A field programmable gate array having a plurality of random access memory blocks coupled to a plurality of dedicated first-in/first-out memory logic components and a plurality of random access memory clusters programmably coupled to the rest of the FPGA is described.
A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description of the invention and accompanying drawings which set forth an illustrative embodiment in which the principles of the invention are utilized.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a one-tile FPGA of the present system.
FIG. 2 is a block diagram of an FPGA including multiple core tiles 102 as shown in FIG. 1.
FIG. 3 is a simplified block diagram of a static random access memory (SRAM) module of the present system.
FIG. 4 is a simplified schematic diagram illustrating the FIFO logic component of the present system.
FIG. 5 is a simplified block diagram illustrating the architecture of a RAM cluster of the present system.
FIG. 6 is a simplified schematic diagram illustrating RT module, RN module, RI module and RO module of a RAM cluster of FIG. 5.
FIG. 7 is a simplified schematic diagram illustrating RC module of a RAM cluster of FIG. 5.
DETAILED DESCRIPTION OF THE INVENTION
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and not in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons.
In the present disclosure, Vcc is used to define the positive power supply for the digital circuit as designed. As one of ordinary skill in the art will readily recognize, the size of a digital circuit may vary greatly depending on a user's particular circuit requirements. Thus, Vcc may change depending on the size of the circuit elements used.
Moreover, in this disclosure, various circuits and logical functions are described. It is to be understood that designations such as “1” and “0” in these descriptions are arbitrary logical designations. In a first implementation of the invention, “1” may correspond to a voltage high, while “0” corresponds to a voltage low or ground, while in a second implementation, “0” may correspond to a voltage high, while “1” corresponds to a voltage low or ground. Likewise, where signals are described, a “signal” as used in this disclosure may represent the application, or pulling “high” of a voltage to a node in a circuit where there was low or no voltage before, or it may represent the termination, or the bringing “low” of a voltage to the node, depending on the particular implementation of the invention.
FIG. 1 is a block diagram of an illustrative core tile 102 in an FPGA 100 of the present system. FPGA core tile 102 comprises an array of logic clusters 104, static random access memory (SRAM) clusters 106 and static random access memory (SRAM) modules 108. Logic clusters 104 and SRAM clusters 106 are connected together by a routing interconnect architecture (not shown) that may comprise multiple levels of routing interconnects. FPGA core tile 102 is surrounded by input/output (I/O) clusters 110, input/output (I/O) FIFO control blocks 114 and input/output banks 112. There are two rows of I/O clusters 110 on the top and bottom edges of FPGA 100 and one column of I/O clusters on the left and right edge of FPGA 100. In the present example, for illustrative purposes only, there are seven SRAM clusters 106 adjacent to and interacting with each SRAM module 108.
FIG. 2 is a block diagram of an illustrative FPGA including multiple core tiles 102 as shown as an example in FIG. 1. As shown in FIG. 2, FPGA 120 comprises four core tiles 102, though other numbers of tiles are possible. Core tiles 102 are surrounded by I/O clusters 110, input/output FIFO control blocks 114 and I/O banks 112.
FIG. 3 is a simplified block diagram of a static random access memory (SRAM) block 108 of the present system. The present system combines dedicated control logic with a two port SRAM to produce a FIFO. As set forth in FIGS. 1 and 2, there are four SRAM blocks 108 along one side of FPGA tile 102. Each SRAM block 108 may be configured to operate as an individual SRAM module or modules may be cascaded together to produce wider or deeper memory combinations. As set forth in greater detail below, dedicated FIFO control logic has been added to each SRAM block.
Referring still to FIG. 3, SRAM block 108 comprises a SRAM component 150. SRAM component 150 is a memory component. Memory components are well known to those of ordinary skill in the relevant art and can vary greatly depending on the application. Write data bus 152 and write address bus 156 are coupled to SRAM component 150 through register 154. Write data enable signal lines 158 are each coupled to SRAM component 150 through one input of two-input XOR gates 160, 162, 164, 166, 168, AND gate 170 and register 154. The second input of two-input XOR gates 160, 162, 164, 166, 168 is provided by write enable control lines 172. Register 154 receives a clock signal through write clock signal line 159. Read address bus 174 is coupled to SRAM component 150 through register 176. Read enable signal lines 178 are each coupled to SRAM component 150 through one input of XOR gates 180, 182, 184, 186, 188, AND gate 190 and register 176. The second input of register 176 receives a clock signal through read clock signal line 192. Input signal busses 194 and 196 provide the signals for determining the write word width and read word width respectively. Read data bus 198 is coupled to the output of SRAM component 150 through register 199 and two-input multiplexer 197.
In the present example, for illustrative purposes only, SRAM block 108 has multiple bits accessible by two independent ports: a read-only port (all circuitry on the right of SRAM block 108) and a write-only port (all circuitry on the left of SRAM block 108). Both ports may be independently configured in multiple words by bits per word combinations. For example, both ports may be configured as 4,096×1, 2,048×2, 1,024×4, 512×9, 256×18 and 128×36. In addition, a plurality of SRAM blocks may be cascaded together by means of busses 152, 156, 158, 174, 178, 198. In the present example, there are five enable lines for each port, one for real enable and four for higher order address bits. The ten XOR gates are used to invert or not invert the lines on a block-by-block basis, effectively making AND gates 170 and 190 decoders with programmable bubbles on the inputs. The write port is synchronous to the write clock and the read port is synchronous to the read clock. As one of ordinary skill in the art would readily recognize, the above example is illustrative only; many other configurations or memory blocks could be used.
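The decoding role of the XOR gates and the AND gate can be shown with a short sketch. The following Python fragment is a hedged illustration only. The function names, the ordering of the five lines (real enable first, then four address bits, least significant first), and the convention that a control bit of 1 inverts a line are assumptions, not details taken from the figures.

```python
def block_enabled(enable_lines, invert_bits):
    """AND together the five enable lines after each passes through an XOR gate."""
    return all(line ^ inv for line, inv in zip(enable_lines, invert_bits))


def invert_bits_for_block(block_index, address_bits=4):
    """Choose XOR control bits so that only `block_index` decodes as selected."""
    bits = [0]  # real enable passes through uninverted (assumed active high)
    for j in range(address_bits):
        wanted = (block_index >> j) & 1
        bits.append(1 - wanted)  # invert every line that must be 0 for this block
    return bits


def lines_for(enable, address, address_bits=4):
    """Build the five enable lines for a given real enable and block address."""
    return [enable] + [(address >> j) & 1 for j in range(address_bits)]
```

Under these assumptions, sixteen cascaded blocks can share the same five lines, and each block responds only when the four higher order address bits match the pattern programmed into its own XOR gates.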
FIG. 4 is a simplified schematic diagram illustrating the FIFO logic component 200 of the present invention. FIFO logic component 200 is coupled between static random access memory (SRAM) clusters 106 and static random access memory (SRAM) block 108. In the present example, for illustrative purposes only, FIFO logic component 200 is coupled between seven static random access memory (SRAM) clusters 106 and static random access memory (SRAM) block 108. Two-input AND gate 202 has its non-inverting and inverting inputs coupled to random access memory cluster 106 via signal lines 240 and 242 respectively and an output coupled to address comparator 238, and to subtractor circuit 222 through counter 210 and to address comparator 232 through registers 218 and 220. The output of two-input AND gate 202 may also be coupled to RAM module 108 through tri-state buffer 206. The output of counter 210 may also be coupled to SRAM block 108 through tri-state buffer 214. Two-input AND gate 204 has its non-inverting and inverting inputs coupled to SRAM cluster 106 through signal lines 244 and 246 respectively and its output coupled to address comparator 232 through counter 212. Two-input AND gate 204 also has its output coupled to address comparator 238 through counter 212, register 224 and register 226 and its output is also coupled to subtractor 222 through counter 212. The output of two-input AND gate 204 may also be coupled to SRAM module 108 through tri-state buffer 208. The output of counter 212 may also be coupled to SRAM block 108 through tri-state buffer 216. Buffers 206, 208, 214 and 216 receive their control signals from SRAM clusters 106 programmable configuration bits 248.
Referring still to FIG. 4, subtractor circuit 222 has its output coupled to one input of magnitude comparators 234 and 236. Magnitude comparators 234 and 236 receive their second input from the programmable configuration bits 228 and 230 respectively. The configuration bits in 228 and 230 are programmable threshold values needed to generate the almost full and almost empty flags respectively.
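The subtractor and magnitude comparator arrangement can be sketched as follows. This Python fragment is an illustration under stated assumptions. The function name, the parameter names, and in particular the sense of each comparison (almost full when the difference is at or above its threshold, almost empty when it is at or below its threshold) are assumptions, since the comparison sense is not specified above.

```python
def almost_flags(write_count, read_count, modulus,
                 almost_full_threshold, almost_empty_threshold):
    """Return (almost_full, almost_empty) from the difference of the two counters."""
    # Subtractor 222: number of words written but not yet read, taken modulo
    # the counter range so that counter wrap-around is handled.
    occupancy = (write_count - read_count) % modulus
    # Magnitude comparator 234 with threshold bits 228 (assumed sense: >=).
    almost_full = occupancy >= almost_full_threshold
    # Magnitude comparator 236 with threshold bits 230 (assumed sense: <=).
    almost_empty = occupancy <= almost_empty_threshold
    return almost_full, almost_empty
```

For example, with a counter range of 16 and thresholds of 6 and 2, a write count of 1 and a read count of 15 give a difference of 2, so only the almost empty flag would be asserted in this sketch.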
Read data bus 250 and write data bus 252 are coupled directly to SRAM block 108. When the FIFO logic component is not active, controller bits 248 are set at 0, disabling the tri-state buffers 206, 208, 214 and 216. When the SRAM is not configured as a FIFO, all input signals originate from adjacent SRAM clusters 106. When a SRAM is configured as a FIFO, a select set of signals from the RAM cluster modules are set to high impedance and FIFO logic component 200 seizes control of the signal lines. When FIFO logic component 200 is active, it seizes control of the write enable signals 158, the read enable signals 178 and the read and write address lines 174 and 156 respectively as shown in FIG. 3.
Counters 210 and 212 are binary counters; however, they also generate gray code. Gray code or “single distance code” is an ordering of 2^n binary numbers such that only one bit changes between any two consecutive elements. The binary value is sent to subtractor 222 to calculate the difference between the read and write counters for the almost full and almost empty flags. The gray code is sent to address comparators 232 and 238 as well as to tri-state buffers 214 and 216. In gray code, one and only one bit changes between any two consecutive codes in the sequence. The purpose of registers 218 and 220 is to synchronize the read counter address in 210 to the write clock signal and the purpose of registers 224 and 226 is to synchronize the write counter address to the read clock signal for comparison purposes. Because there is no requirement that read clock signal 253 and the write clock signal be synchronous, there is no guarantee that the outputs of 210 will not be changing during the setup and hold time windows of register 218. Because of the likelihood of change during the register setup and hold time window, there is a chance of an uncertain result. The chance of an uncertain result is limited by using gray code to make sure that only one bit can change at a time. However the uncertainty on that one bit resolves itself, the sampled value will be either the last address or the next address, and no other address, when comparing the read and write addresses.
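The single-bit-change property can be checked with a short sketch. The following Python fragment uses the binary-reflected Gray code, which is one common construction. It is shown as an assumption for illustration, since the particular code generated by counters 210 and 212 is not specified above, and the function names are hypothetical.

```python
def binary_to_gray(value):
    """Convert a binary count to binary-reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Convert binary-reflected Gray code back to a binary count."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")
```

For a four-bit counter, `bits_changed(binary_to_gray(i), binary_to_gray((i + 1) % 16))` is 1 for every `i`, including the wrap from the last count back to the first, whereas the binary counts 7 and 8 differ in four bits.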
When the memory is full, writing must be inhibited to prevent overwriting valid data in the SRAM. To control this, the comparison between the read and write addresses is done in the write clock (WCK) time domain since write operations are synchronous to WCK. The read address counter 210 gray code sampled two WCK cycles in the past by registers 218 and 220 is compared to the current write address counter 212 gray code by comparator 232. If the result is equal, then the SRAM may be full and writing is inhibited. There is no way to reliably know for certain if the SRAM is really full. The read address being compared is two WCK cycles old and one or more read operations may have occurred during that time. However, by erring on the side of safety when it is possible that the memory might be full, overwriting of data can be reliably prevented.
In a similar manner, when the memory is empty, reading must be inhibited to prevent outputting invalid data from the SRAM. To control this, the comparison between the write and read addresses is done in the read clock (RCK) time domain since read operations are synchronous to RCK. The write address counter 212 gray code sampled two RCK cycles in the past by registers 224 and 226 is compared to the current read address counter 210 gray code by comparator 238. If the result is equal, then the SRAM may be empty and reading is inhibited. There is no way to reliably know for certain if the SRAM is really empty. The write address being compared is two RCK cycles old and one or more write operations may have occurred during that time. However, by erring on the side of safety when it is possible that the memory might be empty, reading of invalid data can be reliably inhibited.
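The conservative nature of the empty comparison can be illustrated with a cycle-level sketch of the read clock domain. The following Python fragment is a simplified model under stated assumptions: the class and function names are hypothetical, pointers are plain integers rather than gray code, counter wrap-around is omitted, and metastability is not modeled. It shows only the two-cycle delay introduced by two registers in series.

```python
class TwoStageSync:
    """Two registers in series, both clocked by the destination clock."""

    def __init__(self, initial=0):
        self.stage1 = initial
        self.stage2 = initial

    @property
    def output(self):
        return self.stage2

    def clock(self, value):
        """One destination clock edge: shift the sampled value along."""
        self.stage2, self.stage1 = self.stage1, value


def simulate_reads(write_ptr_per_edge):
    """Attempt a read on every read clock edge; return the 'may be empty' trace."""
    sync = TwoStageSync()
    read_ptr = 0
    trace = []
    for write_ptr in write_ptr_per_edge:
        maybe_empty = sync.output == read_ptr   # comparison on a stale write pointer
        if not maybe_empty:
            read_ptr += 1                       # read permitted
        trace.append(maybe_empty)
        sync.clock(write_ptr)                   # registers sample on this edge
    return trace
```

With a single word written before the first read clock edge, `simulate_reads([1, 1, 1, 1])` returns `[True, True, False, True]`: reading is inhibited for two edges even though data is present, then one read is permitted, after which the memory is again treated as possibly empty.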
Since both a full and an empty condition are detected by equality between the read and write addresses, a way to tell the difference between the two conditions is required. This is accomplished by having an extra most significant bit (MSB) in counters 210 and 212 which is not part of the address space sent to the SRAM block (and not shown in FIG. 4 to avoid overcomplicating the disclosure and obscuring the invention). Additional logic (also not shown) inside each comparator 232 and 238 compares the read and write MSBs. When the two MSBs are equal and the read and write addresses are equal in comparator 238, this implies a possible empty condition. When the two MSBs are not equal and the read and write addresses are equal in comparator 232, this implies a possible full condition.
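The role of the extra MSB can be sketched as follows. This Python fragment is illustrative only. It assumes a hypothetical eight-word memory (three address bits), uses plain binary counter values rather than gray code, and ignores the synchronizing registers; the function names are not taken from the figures.

```python
ADDRESS_BITS = 3  # hypothetical 8-word memory; counters carry one extra MSB


def split(counter):
    """Split a counter value into (extra MSB, address sent to the memory)."""
    msb = (counter >> ADDRESS_BITS) & 1
    address = counter & ((1 << ADDRESS_BITS) - 1)
    return msb, address


def possible_empty(write_counter, read_counter):
    """Addresses equal and MSBs equal: the read counter has caught up."""
    w_msb, w_addr = split(write_counter)
    r_msb, r_addr = split(read_counter)
    return w_addr == r_addr and w_msb == r_msb


def possible_full(write_counter, read_counter):
    """Addresses equal and MSBs differ: the write counter is one full lap ahead."""
    w_msb, w_addr = split(write_counter)
    r_msb, r_addr = split(read_counter)
    return w_addr == r_addr and w_msb != r_msb
```

In this sketch, counter values 0 and 0 indicate a possible empty condition, while a write counter of 8 against a read counter of 0 presents the same address to the memory but indicates a possible full condition.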
FIG. 5 is a simplified block diagram illustrating the architecture of a RAM cluster 106 of the present system. As would be clear to those of ordinary skill in the art having the benefit of this disclosure, RAM cluster 106 may comprise any number of the logic components as indicated below. The examples set forth below are for illustrative purposes only and in no way limit the scope of the present invention. Random access memory clusters 106(0-6) further comprise two sub-clusters 300 and 302. Each sub cluster 300 and 302 has two transmitter modules 314 and two receiver modules 312. Right sub cluster 302 has a buffer module 316.
To avoid overcomplicating the disclosure and thereby obscuring the present invention, receiver modules 312, transmitter modules 314 and buffer module 316 are not described in detail herein. The implementation of receiver modules 312 and transmitter modules 314 suitable for use according to the present system is disclosed in co-pending U.S. patent application Ser. No. 10/323,613, filed on Dec. 18, 2002, and hereby incorporated herein by reference. The implementation of buffer modules 316 suitable for use according to the present system is disclosed in U.S. Pat. No. 6,727,726, issued Apr. 27, 2004, and hereby incorporated herein by reference.
In the present example, for illustrative purposes only, the interface to each SRAM block 108 is logically one RAM cluster 106 wide and seven rows long. Thus, there is a column of seven RAM clusters 106(0) through 106(6) for every SRAM block 108. Sub-clusters 300 and 302 of RAM cluster 106(0) each have one RAM clock interface input (RC) module 304, six single ended input (RT) modules 306 and two RAM interface output (RO) modules 308 in addition to the two transmitter modules 314 and two receiver modules 312 as set forth above. Right sub cluster 302 also has a buffer module 316. RC modules 304 in RAM cluster 106(0) select the write and read clock signals from all the HCLK and RCLK networks or from signals in either of two adjacent routed channels and determine their polarity. RC modules 304 will be discussed in greater detail below. Each RT module 306 provides a control signal to SRAM module 108 which is either routed from a single channel or tied off to logic 1 or logic 0. RO modules 308 transmit read-data or FIFO flags from SRAM module 108 into an individual output track. RT modules 306 and RO modules 308 will be discussed in greater detail below.
Sub-clusters 300 and 302 of RAM clusters 106(1-6) each have three two-input RAM channel-up/channel-down non-cascadable signal (RN) modules 310, three RO modules 308 and six two-input RAM channel-up/channel-down cascadable signal (RI) modules 309 in addition to the two transmitter modules 314 and two receiver modules 312 as set forth above. Right sub cluster 302 also has a buffer module 316. RN modules 310 and RI modules 309 provide an input signal to SRAM module 108 that can be routed from two rows, the one in which it is located and the row immediately above it.
FIG. 6 is a simplified schematic diagram illustrating RT module 306, RN module 310, RI module 309 and RO module 308 of a RAM cluster of FIG. 5. RT module 306 comprises a buffer 354 that has an input programmably coupled to a horizontal routing track in routing architecture row 352. As is known to those of ordinary skill in the art, there are many types of programmable elements. Illustrative examples of such programmable elements include, but are not limited to, MOS transistors, flash memory cells and antifuses. Buffer 354 has an output that is coupled to SRAM block 108.
RN module 310 comprises a two-input AND gate 356 and a buffer 358. One input of two-input AND gate 356 is programmably coupled to a horizontal routing track in routing architecture row 350. The second input of two-input AND gate 356 is programmably coupled to a horizontal routing track in routing architecture row 352. The output of two-input AND gate 356 is coupled to SRAM module 108 through buffer 358.
RI module 309 comprises a two-input NAND gate 376 having the ability to select a signal from routing architecture row 350 or 352. Two-input NAND gate 376 has an output coupled to SRAM block 108 through tri-state buffer 380 and one inverted signal input of a two-input OR gate 378. Two-input OR gate 378 has a second input coupled to Vcc or ground and its output coupled to SRAM module 108 through tri-state buffer 380.
RO module 308 comprises a buffer 360 having an input coupled to FIFO control block 200 or SRAM block 108. The output of buffer 360 requires programming voltage protection and drives an output track which is in routing architecture row 352.
FIG. 7 is a simplified schematic diagram illustrating RC module 304 of a RAM cluster of FIG. 5. RC module 304 comprises a four-input multiplexer 362 having inputs coupled to the clock network bus 370 (not shown). Multiplexer 362 has an output coupled to a first input of a two-input multiplexer 365. The second input of two-input multiplexer 365 is selectively programmably coupled to the routing architecture in rows 372 and 374 through two-input AND gate 364. Two-input multiplexer 365 has an output coupled to an input of a two-input XNOR gate 366 that has a second input programmably coupled to Vcc or ground in routing architecture row 372. The output of XNOR gate 366 is coupled to SRAM block 108 through buffer 368.
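The clock selection and polarity path of the RC module can be sketched at the logic level. The following Python fragment is a hedged illustration. The function name, the parameter names, and the mapping of the select value to multiplexer inputs are assumptions. It is also assumed that an unused input of the AND gate is tied to logic 1 so that the other routed signal passes through.

```python
def rc_module(clock_bus, clock_select, routed_a, routed_b, use_routed, polarity):
    """Logic-level sketch of the RC module clock path (signal values are 0 or 1)."""
    network_clock = clock_bus[clock_select]      # four-input multiplexer 362
    routed_clock = routed_a & routed_b           # two-input AND gate 364
    selected = routed_clock if use_routed else network_clock  # multiplexer 365
    # XNOR gate 366: a second input at logic 1 passes the clock unchanged,
    # a second input at logic 0 inverts it.
    return 1 - (selected ^ polarity)
```

Because the XNOR of a signal with logic 1 is the signal itself and the XNOR with logic 0 is its complement, tying the second input of the XNOR gate to Vcc or ground is sufficient to determine the clock polarity delivered to the SRAM block.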
While embodiments and applications of this system have been shown and described, it would be apparent to those skilled in the art that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The system, therefore, is not to be restricted except in the spirit of the appended claims.