FIELD OF THE INVENTION
The present invention relates to computer systems; more particularly, the present invention relates to interfacing with memory devices.
BACKGROUND
A memory controller is an integrated circuit located on the motherboard, or processor die, within a computer system that manages the flow of data to and from a main memory device. Particularly, memory controllers include logic necessary to read and write data to dynamic RAM (DRAM). A component of the logic includes a clocking architecture to carry out transactions with the DRAM.
The clocking architecture typically includes special delay locked loops (DLLs) that are used for transmit de-skew and receive de-skew. However, the conventional clocking architecture requires a relatively large number of logic components to control all de-skewing for a single memory controller channel.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
FIG. 1 is a block diagram of one embodiment of a computer system;
FIGS. 2A and 2B illustrate a conventional transmit delay locked loop architecture;
FIG. 3 illustrates a conventional receive delay locked loop architecture;
FIGS. 4A and 4B illustrate one embodiment of a global clocking architecture;
FIG. 5 illustrates one embodiment of a modular clocking architecture;
FIG. 6 illustrates another embodiment of a modular clocking architecture; and
FIG. 7 is a block diagram of another embodiment of a computer system.
DETAILED DESCRIPTION
A modular memory controller clocking architecture is described. In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
FIG. 1 is a block diagram of one embodiment of a computer system 100. Computer system 100 includes a central processing unit (CPU) 102 coupled to interconnect 105. In one embodiment, CPU 102 is a processor in the Pentium® family of processors available from Intel Corporation of Santa Clara, Calif. Alternatively, other CPUs may be used. For instance, CPU 102 may be implemented as multiple processors, or multiple processor cores.
In a further embodiment, a chipset 107 is also coupled to interconnect 105. Chipset 107 may include a memory control hub (MCH) 110. MCH 110 may include a memory controller 112 that is coupled to a main system memory 115. Main system memory 115 stores data and sequences of instructions that are executed by CPU 102 or any other device included in system 100.
In one embodiment, main system memory 115 includes one or more DIMMs incorporating dynamic random access memory (DRAM) devices; however, main system memory 115 may be implemented using other memory types. Additional devices may also be coupled to interconnect 105, such as multiple CPUs and/or multiple system memories.
MCH 110 may be coupled to an input/output control hub (ICH) 140 via a hub interface. ICH 140 provides an interface to input/output (I/O) devices within computer system 100. ICH 140 may support standard I/O operations on I/O interconnects such as peripheral component interconnect (PCI), accelerated graphics port (AGP), universal serial bus (USB), low pin count (LPC) interconnect, or any other kind of I/O interconnect (not shown). In one embodiment, ICH 140 is coupled to a wireless transceiver 160.
FIG. 7 illustrates another embodiment of computer system 100. In this embodiment, memory controller 112 is included within CPU 102. As a result, memory 115 is coupled to CPU 102. Further, chipset 107 includes a control hub 740.
Regardless of the embodiment, memory controller 112 performs memory transactions with main memory 115 by transferring data between computer system 100 and memory 115. To perform the memory transactions, memory controller 112 includes a clocking mechanism having delay locked loops (DLLs) that are used for transmit de-skew and receive de-skew. FIG. 2A illustrates a conventional transmit delay locked loop architecture.
On the transmit side shown in FIG. 2A, the mechanism includes a DLL coupled to a phase locked loop (PLL) and several slave delay lines. The DLL serves as the component that maintains delay tracking over process, voltage and temperature (PVT) variations. Each slave delay line is coupled to a phase interpolator (PI) and a CMOS converter, which is further coupled to a transmitter.
The DLL sets the requisite delay in each of a number of delay elements within the DLL. This delay tracks PVT variations, is converted to an analog voltage (bias) and is coupled to the slave delay lines. The PI coupled to each slave delay line creates a finer delay step and distributes the resultant clocks to each of the high speed IO transmitters, such as a Stub Series Terminated Logic (SSTL) driver.
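The locking behavior described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the circuit of FIG. 2A: the analog bias is stood in for by an integer control code, the line is assumed to lock when its total delay spans one reference clock period, and the names lock_dll, element_delay_ps and pvt_factor, as well as all numeric constants, are hypothetical.

```python
def element_delay_ps(code, pvt_factor, min_delay_ps=20.0, step_ps=0.5):
    """Delay of one element for a given control code (assumed linear model).

    pvt_factor scales the delay to mimic a slow (>1.0) or fast (<1.0) corner.
    """
    return pvt_factor * (min_delay_ps + code * step_ps)


def lock_dll(ref_period_ps, n_elements, pvt_factor, max_code=1023):
    """Raise the control code until the whole line spans one reference period.

    The comparison plays the role of the phase detector; the returned code
    stands in for the bias that would be shared with the slave delay lines.
    """
    code = 0
    for _ in range(max_code + 1):
        total_ps = n_elements * element_delay_ps(code, pvt_factor)
        if total_ps >= ref_period_ps:  # line is no longer early: locked
            return code
        code += 1
    raise ValueError("delay line cannot reach one reference period")


nominal_code = lock_dll(1000.0, 8, pvt_factor=1.0)
slow_code = lock_dll(1000.0, 8, pvt_factor=1.2)
```

With these assumed constants, the slow corner locks at a smaller code than the nominal corner while the per-element delay stays close to one eighth of the reference period, which is the sense in which the locked delay tracks PVT.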
In a memory controller implementing a conventional clocking mechanism, there are typically eleven groups of transmitters that are skewed independently. Hence, there are eleven slave delay lines and corresponding clock buffers in the transmit direction. These clocking circuits are located at a centralized location, as shown in FIG. 2B. Thus, in the conventional design, the high speed drivers are physically located far from the clocking circuitry (e.g., ~3000 μm away).
FIG. 3 illustrates a conventional receive delay locked loop architecture. On the receive side, there are slave delay lines receiving a channel strobe or clock from the DRAMs. The slave delay lines are pre-programmed to a specific delay such that the internal strobe or clock is centered with respect to the receive data. Another DLL and slave delay lines are used to create the requisite delay for every 8 bits (or byte) of receive data. In a typical one channel memory controller, there are 8 bytes of receive data. As a result, there are eight sets of slave delay lines.
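As a rough illustration of what centering the strobe means, the following Python sketch computes a nominal pre-programmed delay per byte lane. It assumes, beyond what the description states, that data is transferred twice per clock and that the incoming strobe arrives edge-aligned with the data, so that a delay of half a bit time places the strobe in the middle of the data window. The function names and the per-byte skew values are hypothetical.

```python
def centering_delay_ps(clock_period_ps, transfers_per_clock=2):
    """Delay that moves an edge-aligned strobe to the middle of one bit time."""
    bit_time_ps = clock_period_ps / transfers_per_clock
    return bit_time_ps / 2.0


def per_byte_delays_ps(clock_period_ps, byte_skews_ps):
    """One pre-programmed delay per byte lane: nominal centering plus lane skew."""
    base_ps = centering_delay_ps(clock_period_ps)
    return [base_ps + skew_ps for skew_ps in byte_skews_ps]


# Eight byte lanes of a one channel controller, with made-up skews in picoseconds.
lane_delays = per_byte_delays_ps(2500.0, [0.0, 10.0, -5.0, 0.0, 15.0, -10.0, 5.0, 0.0])
```

Each of the eight values corresponds to one set of slave delay lines in the conventional receive architecture.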
The problem with the conventional memory controller clocking mechanism is that the memory controller uses a total of nine DLLs and nineteen slave delay lines to control all of the de-skewing in a one channel memory controller. Further, the transmit de-skew delays are generated at one location and then transmitted to the individual I/O transmitters, which are far away from the generation location. This results in area and power inefficiencies, as well as a loss of de-skew setting accuracy when the data rate is scaled up.
According to one embodiment, memory controller 112 includes a clocking architecture for both transmit and receive clock circuitries that reduces the number of delay locked loops and the number of slave delay lines, resulting in a reduction in silicon area and power, while providing resolution comparable to or better than that of the conventional mechanism.
FIG. 4A illustrates one embodiment of a global clocking mechanism 400. Clocking mechanism 400 includes a PLL 410 and data/command modules 420. Each module 420 includes a Master DLL (MDLL). According to one embodiment, PLL 410 supplies a true differential reference clock to the MDLLs, which provides a low jitter reference. Clocking mechanism 400 also includes high speed input/output (HSIO) interfaces that facilitate data transfers with memory 115.
FIG. 4B illustrates another embodiment of global clocking mechanism 400, where the MDLL in each module 420 is located so that it can be shared between the transmit and receive circuitry. This sharing improves accuracy and reduces both the number of clock components and the power consumed.
FIG. 5 illustrates one embodiment of a module 420 coupled to PLL 410. As shown in FIG. 5, module 420 includes both transmit and receive clocking circuitry. The transmit side is shown in the top half of FIG. 5, while the receive side is shown in the bottom half. Module 420 includes MDLL 510, slave delay lines 520, as well as additional components (e.g., PIs, converters, etc.).
On the transmit side of module 420, MDLL 510, together with a set of PIs, generates the de-skew clocks and maintains the required delay. The PIs are now used for transmit bit de-skew. Therefore, in one embodiment, eleven PIs are implemented, as opposed to the eleven slave delay lines employed in conventional transmit clocking components. Because each PI is smaller than each slave delay line, there is a reduction in the silicon area needed to fabricate module 420.
In one embodiment, the delay generated by MDLL 510 is converted to an analog bias voltage, as shown in FIG. 5. The bias voltage is connected to slave delay lines 520 for receive data de-skewing. In such an embodiment, no additional DLL is required for the receive direction, which further reduces the needed silicon area.
FIG. 6 illustrates one embodiment of a detailed view of module 420. The transmit component at the bottom of FIG. 6 shows a phase detector (PD) 600 and the delay elements of MDLL 510. Each of the delay elements, other than the last, has its output coupled to the next delay element and to a multiplexer. The last delay element has an output coupled to the multiplexer and to PD 600. Thus, the PI is capable of receiving, via the multiplexer, the full delay setting of all of the delay elements, or finer delay settings.
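The coarse and fine selection described for FIG. 6 can be illustrated with a small Python model. It is a sketch under assumptions that the description does not state: every delay element contributes the same locked delay, the PI divides one element delay into a fixed number of linear steps (eight here), and tap 0 denotes the undelayed input clock. The name select_delay and its parameters are hypothetical.

```python
def select_delay(target_ps, element_delay_ps, n_elements, pi_steps=8):
    """Pick a multiplexer tap and PI code approximating a target delay.

    Returns (tap, pi_code, achieved_ps), where tap counts whole delay
    elements and pi_code counts fine steps beyond that tap.
    """
    fine_step_ps = element_delay_ps / pi_steps
    max_code = n_elements * pi_steps  # full delay of all elements
    code = round(target_ps / fine_step_ps)
    code = max(0, min(code, max_code))  # clamp to the range of the line
    tap, pi_code = divmod(code, pi_steps)
    return tap, pi_code, code * fine_step_ps


# Eight elements of 125 ps each; ask for roughly 400 ps of de-skew.
tap, pi_code, achieved_ps = select_delay(400.0, 125.0, 8)
```

In this model the multiplexer supplies the coarse step (whole elements) and the PI supplies the finer settings, so a request for the full line delay resolves to the last tap with a PI code of zero.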
The bias voltage is then transmitted from the transmit component to the slave delay line 520 of the receive component. The slave delay lines also include delay elements coupled to a PI via a multiplexer. The slave delay lines receive a channel receive/clock strobe.
As shown above, the modular clocking mechanism enables a reduction in the number of DLLs from nine to four, and in the number of slave delay lines from nineteen to eight. The de-skew resolution is recovered by the additional PIs. Therefore, the modular clocking mechanism has superior power-to-data-rate scaling compared with conventional architectures, due to its more efficient use of circuit components.
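The component counts quoted above can be tallied with a short Python sketch. The split of the nine conventional DLLs into one transmit DLL plus one receive DLL per byte, and the figure of four modules each carrying one MDLL, are inferences from the totals given in the description rather than statements made by it; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventory:
    dlls: int
    slave_delay_lines: int


def conventional(tx_groups=11, rx_bytes=8):
    # Assumed split: one transmit DLL plus one receive DLL per byte of data;
    # one slave delay line per transmit group and per receive byte.
    return Inventory(dlls=1 + rx_bytes, slave_delay_lines=tx_groups + rx_bytes)


def modular(modules=4, rx_bytes=8):
    # Assumed: one shared MDLL per module; only the receive side keeps
    # slave delay lines, since transmit de-skew is handled by PIs.
    return Inventory(dlls=modules, slave_delay_lines=rx_bytes)


def savings(before, after):
    return Inventory(before.dlls - after.dlls,
                     before.slave_delay_lines - after.slave_delay_lines)


saved = savings(conventional(), modular())
```

Under these assumptions the tally reproduces the stated totals: nine DLLs and nineteen slave delay lines conventionally, four and eight in the modular mechanism.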
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims, which in themselves recite only those features regarded as essential to the invention.