FIELD Embodiments of the invention relate generally to memory, and specifically to merging data from a memory buffer onto serial data channels.
BACKGROUND INFORMATION In memory circuits there is typically a memory read latency, which is the time period it takes for valid data to be read out of a memory circuit. There is typically also a memory write latency, which is the time period for which valid data must be held for a memory circuit to write the data into memory. The memory read latency and the memory write latency may sometimes be buffered from a processor by a cache memory. However, there are occasions when the desired data is not found in the cache memory. In those cases, a processor may need to read or write data with the memory circuits, and thus experience the respective memory read latency or memory write latency. If memory circuits differ, the memory read latencies and memory write latencies may be inconsistent from one memory circuit to the next, in which case the memory read latency and memory write latency experienced by a processor will differ.
Previously, memory modules were plugged into a mother or host printed circuit board and coupled in parallel to a parallel data bus over which parallel data could be read from and written into memory. The parallel data bus had parallel data bit lines that were synchronized together to transfer one or more data bytes or words of data at a time. The parallel data bit lines are typically routed over a distance on a printed circuit board (PCB) from one memory module socket to another. This introduces a first parasitic capacitive load. As the memory modules are plugged into memory sockets, an additional parasitic capacitive load is introduced onto the parallel data bit lines of the parallel data bus. As there may be a number of memory modules plugged in, the additional parasitic capacitive load may be significant and bog down high frequency memory circuits.
Typically, only one memory module at a time is addressed by an address on the address lines, and only the addressed memory module writes data onto the parallel data bus. Other memory modules typically have to wait to write data onto the parallel data bus in order to avoid collisions.
While parallel data bit lines may speed data flow in certain instances, a parallel data bus in a memory may slow the read and write access of data between a memory circuit and a processor.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1A illustrates a block diagram of a typical computer system in which embodiments of the invention may be utilized.
FIG. 1B illustrates a block diagram of a client-server system in which embodiments of the invention may be utilized.
FIG. 2A illustrates a block diagram of a central processing unit in which embodiments of the invention may be utilized.
FIG. 2B illustrates a block diagram of another central processing unit in which embodiments of the invention may be utilized.
FIG. 3 illustrates a simplified block diagram of a buffered memory controller to couple data into and out of banks of buffered memory modules.
FIG. 4 illustrates a block diagram of a buffered memory module including a buffer that may merge data with feed through data.
FIG. 5 illustrates a detailed block diagram of a buffered memory controller coupling to a bank of buffered memory modules.
FIG. 6 (FIGS. 6-1 and 6-2) illustrates a functional block diagram of a buffer of a buffered memory module.
FIG. 7A illustrates a simplified block diagram of the data merge logic including lanes of data merge logic slices coupled to transmitters.
FIG. 7B illustrates a schematic diagram of a data merge logic slice for one lane of serial data.
FIG. 8 illustrates a timing diagram of signals for a data merge logic slice functioning in a twelve-bit mode.
FIG. 9 illustrates a timing diagram of signals for a data merge logic slice functioning in a six-bit mode.
FIG. 10 illustrates a flow chart for the initialization, training, and functioning of the buffer in merging local data and feed through data together into a serial data stream output.
DETAILED DESCRIPTION In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be obvious to one skilled in the art that the embodiments of the invention may be practiced without these specific details. In other instances well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the invention.
Generally the embodiments of the invention provide a data merge feature, referred to as a Northbound Data Merge (NBDM), that replaces parts of the data on a high speed link with its own data, on the fly. That is, the embodiments of the invention replace part of the incoming serial data traffic (e.g., “idle packets or frames”) over a serial data link with its local data, without having internal core logic process (e.g., serial-to-parallel conversion, assemblage into frames, and depacketize/deinterleave data) the incoming serial data traffic to determine where to insert the local data and retransmit the incoming data traffic with the local data inserted therein.
Previously, the incoming serial data had to be assembled into frames and received by the core logic in order to transmit local data. Without having to process the incoming serial data in order to transmit local data, an input/output (IO) interface of a memory module may simply retransmit the incoming serial data stream received from other memory modules or the memory controller over the serial data link, bypassing the internal core logic of a buffer integrated circuit. This can reduce data latency in the serial data stream. The portion of the serial data stream that is to be retransmitted is sometimes referred to as “Feed-thru Data” or “Feed-through data” (FTD).
Without any local data to transmit, the IO interface normally retransmits the received serial data stream, bypassing the core logic of the chip. When the core logic of the buffer memory chip needs to transmit local data, it sends a merge request along with the local data to the IO interface. Because the core clock that generates the local data is aligned during training to a frame clock of the high speed serial data link in the embodiments of the invention, the IO interface can readily merge the data at the appropriate frame boundary to replace an idle packet or frame.
Previously it was contemplated that the received serial data would be assembled into frames and received by the core logic and then re-transmitted on the outbound link. In this case, if the core logic had local data to send on the outbound link, it would then replace some of the incoming data with its own data, repacketize and serialize the data onto the outbound link. This would incur a data latency of at least two frames of data. The embodiments of the invention set up the merge timing during initial training so that the local data can be merged into the outbound link without having to receive and analyze the incoming data during normal operation in order to replace idle packets. The embodiments of the invention can reduce the data latency through the buffer memory integrated circuit from at least two frames of data down to a few bit intervals.
In one embodiment of the invention, an integrated circuit is provided that includes a serial input/output interface with one or more lanes. Each lane of the serial communication channel may include a first parallel-in-serial-output (PISO) shift register, a first multiplexer, and a serial transmitter coupled together.
The first parallel-in-serial-output (PISO) shift register has a parallel input coupled to a local data bus, a clock input coupled to a first clock signal, a load/shift-bar input coupled to a load signal. The first PISO shift register serializes parallel data on the local data bus into serialized local data on a first serial output.
The first multiplexer has a first data input coupled to the first serial output, a second data input to receive feed-through data, and a first select control input coupled to a local data select signal. The multiplexer selectively merges the serialized local data and the feed-through data into a serial data stream on a multiplexed output in response to the local data select signal.
The serial transmitter has an input coupled to the multiplexed output of the multiplexer to receive the serial data stream. The serial transmitter drives the serial data stream onto a serial data link.
The feed-through data may be two bits wide while the parallel input to the PISO shift register may be six bits wide and the serial output of the PISO shift register may be two bits wide. In this case, the first multiplexer may be a two bit bus multiplexer such that the serial data stream at the multiplexed output is two bits wide so that the serial transmitter receives a two bit serial data stream and serializes it onto the serial data link as a single bit serial data stream.
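The lane datapath described above can be pictured with a small behavioral model. The following is a minimal sketch under stated assumptions, not the circuit itself: the function names (`piso_pairs`, `merge_lane`, `transmit`), the list-of-bits representation, and the bit ordering (lowest index shifted out first) are all introduced here for illustration and are not taken from the embodiments.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]  # one two-bit-wide beat of a serial stream


def piso_pairs(word: List[int]) -> Iterator[Pair]:
    """Model a six-bit-in, two-bit-out PISO shift register.

    `word` holds six bits loaded in parallel; each step shifts out two bits.
    Shifting index 0 out first is an assumption of this sketch.
    """
    if len(word) != 6:
        raise ValueError("expected a six-bit parallel word")
    for i in range(0, 6, 2):
        yield (word[i], word[i + 1])


def merge_lane(local: Iterable[Pair], feed_through: Iterable[Pair],
               local_select: Iterable[bool]) -> Iterator[Pair]:
    """Model the two-bit bus multiplexer.

    Both inputs are present on every beat; the local data select signal
    chooses which one reaches the multiplexed output.
    """
    for loc, ft, sel in zip(local, feed_through, local_select):
        yield loc if sel else ft


def transmit(pairs: Iterable[Pair]) -> List[int]:
    """Model the transmitter: turn two-bit beats into a single-bit stream."""
    out: List[int] = []
    for first, second in pairs:
        out.extend((first, second))
    return out
```

As a usage example, feeding three beats of feed-through data followed by three beats in which the select signal is asserted yields a single-bit stream whose first six bits come from the feed-through input and whose last six bits are the serialized local word. The PISO output on beats where the select signal is deasserted is a don't-care value in this model.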
Each lane may further include a second multiplexer having a first input to receive resynchronized data, a second input to receive re-sampled data, and a select input coupled to a local clock mode signal. The second multiplexer selects between the re-sampled data and the resynchronized data for output as the feed-through data in response to the local clock mode signal. Each lane may further include control logic coupled to the first multiplexer and the first PISO shift register. The control logic may include merge control logic and mode control logic. The control logic may receive the first clock signal and a merge enable signal to generate the local data select signal to merge the serialized local data and the serialized feed-through data into the serial data stream in response to the merge enable signal and the first clock signal.
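One way to picture the second multiplexer and the control logic is the behavioral sketch below. It is illustrative only: the polarity of the local clock mode signal, the class name `MergeControl`, the `beats_per_frame` parameter, and the choice to latch the merge enable signal only at a frame boundary are assumptions of this sketch rather than details given above.

```python
from dataclasses import dataclass


def select_feed_through(resynchronized: int, resampled: int,
                        local_clock_mode: bool) -> int:
    """Model the second multiplexer.

    Assumption: an asserted local clock mode signal selects the
    resynchronized data; the actual polarity is not stated in the text.
    """
    return resynchronized if local_clock_mode else resampled


@dataclass
class MergeControl:
    """Hypothetical merge control: merge enable in, local data select out.

    In this sketch the merge enable signal is sampled only on the first
    beat of a frame and the resulting select is held for the whole frame.
    """
    beats_per_frame: int
    _beat: int = 0
    _select: bool = False

    def clock(self, merge_enable: bool) -> bool:
        """Advance one cycle of the first clock; return local data select."""
        if self._beat == 0:  # frame boundary: latch the request
            self._select = merge_enable
        self._beat = (self._beat + 1) % self.beats_per_frame
        return self._select
```

With `beats_per_frame=3`, a merge enable that rises in the middle of a frame has no effect until the next frame boundary, at which point the select signal is held for exactly three beats.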
In another embodiment of the invention, a method for a memory module is provided including receiving an input serial data stream representing feed-through frames of data interspersed between idle frames of data; merging local frames of data and the feed-through frames of data together into an output serial data stream in response to a merge enable signal without decoding the input serial data stream; and transmitting the output serial data stream on a northbound data output to a next memory module or a memory controller. The local frames of data can be merged into the output serial data stream by replacing idle frames of data in the input serial data stream. In receiving the input serial data stream, sampling (also may be referred to as re-sampling) of the bits of data in the input serial data stream or re-synchronizing the bits of data in the input serial data stream may be provided. In merging of the local frames of data and the feed-through frames of data together, serializing parallel bits of the local frames of data into serial bits of data and multiplexing the serial bits of data of the local frames of data and serial bits of the feed-through frames of data into serial bits of the output serial data stream in response to the merge enable signal may be provided. A local frame of data may be selectively received in parallel over a local bus in six bit or twelve bit packets in response to a bus mode signal.
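The method above merges by timing rather than by inspecting the incoming stream. A minimal frame-level sketch of that idea follows; the function name `merge_frames`, the use of strings to stand in for frames, and the per-slot merge enable list are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List


def merge_frames(incoming: Iterable[str], local: Deque[str],
                 merge_enable: Iterable[bool]) -> List[str]:
    """Merge local frames into a frame stream purely by timing.

    The incoming frame is never examined. Whenever merge enable is asserted
    for a frame slot and local data is pending, the slot is overwritten with
    the next local frame. Correct operation therefore relies on merge enable
    being asserted only in slots expected to carry idle frames.
    """
    out: List[str] = []
    for frame, enable in zip(incoming, merge_enable):
        if enable and local:
            out.append(local.popleft())  # replace the slot with local data
        else:
            out.append(frame)            # retransmit the feed-through frame
    return out
```

For example, with an incoming stream of `["FT0", "IDLE", "FT1", "IDLE"]`, two pending local frames, and merge enable asserted in the second and fourth slots, the output carries both feed-through frames unchanged and both local frames in place of the idle frames.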
In another embodiment of the invention, a system is provided including: a processor, a memory controller coupled to the processor, and at least one bank of memory coupled to the memory controller. The processor is provided to execute instructions and process data. The memory controller is provided to receive write memory instructions with write data from the processor and to receive read memory instructions from the processor and supply read data thereto.
The one bank of memory includes one or more memory modules each of which has a buffer integrated circuit and a random access memory integrated circuit coupled together. The buffer integrated circuit includes a southbound serial input/output interface with one or more serial lanes to receive the write data from the memory controller, and a northbound serial input/output interface with one or more serial lanes of a northbound serial input and a northbound serial output to transmit the read data to the memory controller.
Each serial lane in the northbound input/output interface has a parallel-to-serial converter and a first multiplexer. The parallel-to-serial converter has a parallel input coupled to parallel bits of a local data bus, a clock input coupled to a first clock signal, a load/shift-bar input coupled to a load signal. The parallel-to-serial converter serializes the parallel bits of data on the local data bus into serialized local data on a first serial output. The first multiplexer has a first data input coupled to the serial output of the parallel-to-serial converter, a second data input to receive serial feed-through data from the northbound serial input, and a select input coupled to a local data select signal. The first multiplexer to selectively merge the serialized local data and the serial feed-through data into a serial data stream on the northbound serial output in response to the local data select signal.
Each serial lane in the northbound serial input/output interface may further have a transmitter with an input coupled to the multiplexed output of the first multiplexer to receive the serial data stream, the transmitter to drive the serial data stream onto the northbound serial data output towards the memory controller.
Each serial lane of the northbound serial input/output interface may further include control logic coupled to the multiplexer and the first parallel-to-serial converter. The control logic to receive the first clock signal and a merge enable signal in order to generate the local data select signal and merge the serialized local data and the serial feed-through data into the serial data stream in response to the merge enable signal and the first clock signal.
For each bank of memory in the system, the memory controller includes a northbound serial input interface to receive one or more lanes of serial data from the one or more memory modules, and a southbound serial output interface to transmit one or more lanes of serial data to the one or more memory modules.
In another embodiment of the invention, a buffered memory module is provided including a printed circuit board, a plurality of random access memory (RAM) integrated circuits, and a buffer integrated circuit. The printed circuit board has an edge connection to couple to a receptacle of a host system. The plurality of random access memory (RAM) integrated circuits and the buffer integrated circuit are coupled to the printed circuit board. The buffer integrated circuit is electrically coupled to the plurality of RAM integrated circuits and the edge connection. The buffer integrated circuit has a southbound input/output interface and a northbound input/output interface with data merge logic having a plurality of merge logic slices for a plurality of lanes of serial data streams.
Each merge logic slice of the buffer integrated circuit includes a first parallel-in-serial-output (PISO) shift register and a first multiplexer. The first parallel-in-serial-output (PISO) shift register has a parallel input coupled to a local data bus, a clock input coupled to a first clock signal, a load/shift-bar input coupled to a first load signal. The first PISO shift register to serialize parallel data on the local data bus into serialized local data on a first serial output. The first multiplexer has a first data input coupled to the first serial output of the first PISO shift register, a second data input to receive serialized feed-through data, and a first select input coupled to a local data select signal. The first multiplexer selectively merges the serialized local data and the serialized feed-through data into a serial data stream on a multiplexed output in response to the local data select signal.
Each merge logic slice may further include control logic coupled to the first multiplexer and the first PISO shift register. The control logic receives the first clock signal and a merge enable signal to generate the local data select signal to merge the serialized local data and the serialized feed-through data into the serial data stream in response to the merge enable signal and the first clock signal.
The northbound input/output interface of the buffer integrated circuit in the buffered memory module may further include a plurality of transmitters each having an input coupled to a corresponding output of the first multiplexer in each merge logic slice, the plurality of transmitters to receive the serial data stream and drive it onto a serial data link.
In another embodiment of the invention, a memory system is provided including a plurality of buffered memory modules daisy chained together to form a bank of memory. Each buffered memory module includes a plurality of memory integrated circuits, and a buffer integrated circuit coupled to the plurality of memory integrated circuits. The buffer integrated circuit includes a southbound input/output serial interface to receive and retransmit southbound serial data from a memory controller or a prior buffered memory module to a next buffered memory module, a northbound input/output serial interface to receive northbound serial data from at least one buffered memory module as serialized feed-through data and retransmit it out towards the memory controller, a write data first-in-first-out (FIFO) buffer to store write data from the southbound input/output serial interface addressed to the buffered memory module by a write command, memory input/output interface to transfer write data stored in the write data FIFO buffer into at least one of the plurality of memory integrated circuits and to transfer read data from at least one of the plurality of memory integrated circuits into a read data FIFO buffer, and the read data FIFO buffer to store read data from at least one of the plurality of memory integrated circuits as the local data addressed from the buffered memory module by a read command.
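The local data path through the two FIFO buffers can be sketched as follows. This is a rough behavioral model under stated assumptions: the class name `BufferModel`, the dictionary-based command format with `module`, `op`, `addr`, and `data` keys, and the dictionary standing in for the memory integrated circuits are all hypothetical and do not reflect an actual command encoding.

```python
from collections import deque


class BufferModel:
    """Hypothetical behavioral model of one buffer's local data path."""

    def __init__(self, module_id: int):
        self.module_id = module_id
        self.write_fifo = deque()  # write data awaiting transfer to memory
        self.read_fifo = deque()   # read data awaiting the northbound merge
        self.memory = {}           # stands in for the memory integrated circuits

    def southbound(self, command: dict) -> dict:
        """Accept a command addressed to this module, then pass it along."""
        if command["module"] == self.module_id:
            if command["op"] == "write":
                self.write_fifo.append((command["addr"], command["data"]))
            elif command["op"] == "read":
                self.read_fifo.append(self.memory.get(command["addr"]))
        return command  # retransmitted toward the next module

    def drain_writes(self) -> None:
        """Memory interface: move queued write data into memory."""
        while self.write_fifo:
            addr, data = self.write_fifo.popleft()
            self.memory[addr] = data

    def next_local_data(self):
        """Return the next read data to merge northbound, or None."""
        return self.read_fifo.popleft() if self.read_fifo else None
```

In this sketch, commands addressed to other modules pass through without touching either FIFO, which mirrors the retransmit behavior of the southbound interface; the value returned by `next_local_data` is what the northbound interface would serialize as local data.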
The northbound input/output serial interface serializes the local data from the plurality of memory integrated circuits and merges it into a northbound serial data stream with the serialized feed-through data on a timing basis without decoding the received northbound serial data. The northbound input/output serial interface includes a third FIFO buffer, data merge logic coupled to the third FIFO buffer, and a plurality of transmitters coupled to the data merge logic.
The data merge logic has a plurality of merge logic slices each including a first parallel-in-serial-output (PISO) shift register to serialize parallel data on the local data bus into serialized local data on a first serial output, and a first multiplexer to selectively merge serialized local data and serialized feed-through data into a serial data stream on a multiplexed output in response to the local data select signal. The PISO shift register has a parallel input coupled to a local data bus, a clock input coupled to a first clock signal, and a load/shift-bar input coupled to a first load signal. The first multiplexer has a first data input coupled to the first serial output of the first PISO shift register, a second data input to receive serialized feed-through data, and a first select input coupled to a local data select signal.
Each of the plurality of transmitters has an input coupled to a corresponding output of the first multiplexer in each merge logic slice. The plurality of transmitters receive data from the serial data stream and drive it onto a serial data link.
In the memory system, each merge logic slice of the data merge logic may further include control logic coupled to the first multiplexer and the first PISO shift register to receive the first clock signal and a merge enable signal and generate the local data select signal to merge the serialized local data and the serialized feed-through data into the serial data stream.
The memory system may further include a memory controller coupled to at least one of the plurality of buffered memory modules. The memory controller has a southbound output serial interface to transmit the southbound serial data stream to the at least one of the plurality of buffered memory modules and a northbound input serial interface to receive the northbound serial data stream from the least one of the plurality of buffered memory modules.
Referring now toFIG. 1A, a block diagram of a typical computer system100 in which embodiments of the invention may be utilized is illustrated. Thecomputer system100A includes a central processing unit (CPU)101; input/output devices (I/O)102 such as keyboard, modem, printer, external storage devices and the like; and monitoring devices (M)103, such as a CRT or graphics display. The monitoring devices (M)103 may provide computer information in a human intelligible format such as visual or audio formats. The system100 may be a number of different electronic systems other than a computer system.
Referring now toFIG. 1B, aclient server system100B in which embodiments of the invention may be utilized is illustrated. Theclient server system100B includes one ormore clients110A-110M coupled to anetwork112 and aserver114 coupled to thenetwork112. Theclients110A-110M communicate with theserver114 through thenetwork112 in order to transmit or receive information and gain access to any database and/or application software that may be needed on the server. Theserver114 has a central processing unit with memory and may further include one or more disk drive storage devices. Theserver114 may be used in a storage area network (SAN) as a network attached storage (NAS) device, for example, and have an array of disks. The data access to theserver114 is shared over thenetwork112 with themultiple clients110A-110C.
Referring now toFIG. 2A, a block diagram of acentral processing unit101A in which embodiments of the invention may be utilized is illustrated. Thecentral processing unit101A includes aprocessor201, amemory controller202, and afirst memory204A of a first memory channel coupled together as shown and illustrated. Thecentral processing unit101A may further include acache memory203, coupled between thememory controller202 and theprocessor201 and adisk storage device206 coupled to theprocessor201. Thecentral processing unit101A may further include a second memory channel with asecond memory204B coupled to thememory controller202. As illustrated by thecentral processing unit101A, thememory controller202 and thecache memory203 may be external to theprocessor201.
Referring now toFIG. 2B, a block diagram of anothercentral processing unit101B in which embodiments of the invention may be utilized is illustrated. Thecentral processing unit101B includes aprocessor201′ with aninternal memory controller202′ and a first memory channel with amemory204A coupled to theinternal memory controller202′ of theprocessor201′. Theprocessor201′ may further include aninternal cache memory203′. Thecentral processing unit101B may further include asecond memory204B for a second memory channel, and adisk storage device206 coupled to theprocessor201′.
Thedisk storage device206 may be a floppy disk, zip disk, DVD disk, hard disk, rewritable optical disk, flash memory or other non-volatile storage device.
Theprocessor201,201′ may further include one or more execution units and one or more levels of cache memory. Other levels of cache memory may be external to the processor and interface to the memory controller. The processor, the one or more execution units, or the one or more levels of cache memory may read or write data (including instructions) through the memory controller with thememory204A-204B. In interfacing to thememory controller202,202′, there may be address, data, control and clocking signals coupled to the memory as part of the memory interface. Theprocessors201,201′ and thedisk storage device206 may both read and write information into thememories204A,204B.
Each of thememories204A and204B illustrated inFIGS. 2A-2B may include one or more buffered memory modules (MM1-MMn), such as a fully buffered (FB) dual in-line memory module (DIMM), (FBDIMM), or a fully buffered (FB) single in-line memory module (SIMM), (FBSIMM), for example.
Thememory controller202,202′ interfaces to eachmemory204A-240B. In one embodiment of the invention, thememory controller202,202′ particularly interfaces to a buffer (not shown inFIGS. 2A-2B, but seebuffer450A inFIG. 5) in a first buffered memory module MM1 of eachmemory204A-204B. With thememory controller202,202′ interfacing to the buffers of the memory modules, direct interface to the memory devices of the buffered memory modules (MM1-MMn) can be avoided. In this manner, different types of memory devices may be used to provide memory storage while the interface between the buffer and the memory controller can remain consistent.
Referring now toFIG. 3, a buffered memory module (BMM) memory controller (BMMMC)302 coupled to one ormore memory banks304A-304F (generally referred to asmemory bank304 or memory banks304) is illustrated. Thememory controller302 can support more than two channels of memory and more than two memory banks of memory modules. Eachmemory bank304 is made up of a plurality of bufferedmemory modules310A-310H coupled together in a serial chain. This serial chain of bufferedmemory modules310A-310H is also sometimes referred to as a daisy chain. Adjacent memory modules are coupled to each other, sometimes referred to as being daisy-chained together, such asmemory module310A being coupled toadjacent memory module310B, for example.
Each of thememory modules310A-310H in each bank bidirectionally communicate in a serial fashion with thememory controller302 along the serial chain ofmemory modules310A-310H. There is a southbound serial data link (SB) from thememory controller302 to eachmemory bank304 that may also be referred to as an outbound data link with outbound commands (e.g., read and write) and data. All write data from the memory controller that is to be written into the memory modules is sent over the southbound serial data link. There is a northbound serial data link (NB) from eachmemory bank304 to thememory controller302 that may also be referred to as an inbound data link with inbound data. All read data from the memory modules is sent to the memory controller over the northbound serial data link.
In the southbound serial data link (SB), data out from thememory controller302 to amemory bank304 is first coupled to thefirst memory module310A which can read the data and pass it tomemory module310B.Memory module310B can read the data and pass it to the next memory module in the serial chain, and so on and so forth until the last memory module in the southbound serial chain is reached. The last memory module in the southbound serial chain,memory module310H, has no further memory module to pass data to and so the southbound serial data link terminates.
In the northbound serial data link (NB), data is serially communicated in a direction from thememory bank304 to thememory controller302. Each memory module in each memory bank communicates back towards the memory controller on the northbound serial data link (NB).Memory module310H begins a serial chain of memory modules passing data towards the memory controller. Serial data transmitted bymemory module310H passes through or is otherwise retransmitted bymemory module310G. Whilememory module310G may pass or retransmit the serial data from theprior memory module310H, it may also add or merge its own local data to the northbound serial data stream heading to thememory controller302. Similarly, each memory module down the chain passes or retransmits the serial data from the prior memory module and may add or merge their own local data to the northbound serial data stream heading to thememory controller302. The last memory module in the northbound serial chain,memory module310A, transmits the final northbound serial data stream to thememory controller302.
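The northbound flow along the chain can be illustrated with a short simulation. This is a sketch under stated assumptions: the function name `northbound_chain`, the string frame labels, and the per-module slot schedule are hypothetical, and the model assumes the module farthest from the controller originates a stream of idle frames that each module then passes on or overwrites in its own slots.

```python
from typing import Dict, List

IDLE = "IDLE"  # placeholder for an idle frame


def northbound_chain(num_slots: int,
                     schedule: Dict[str, Dict[int, str]],
                     order: List[str]) -> List[str]:
    """Propagate a northbound frame stream through a daisy chain.

    `order` lists modules from farthest to nearest the memory controller
    (for example 310H first, 310A last). `schedule[module][slot]` is the
    local frame that module merges into that frame slot. The slot schedule
    is an assumption of this sketch.
    """
    stream = [IDLE] * num_slots  # stream as originated at the far end
    for module in order:
        local = schedule.get(module, {})
        # Retransmit each frame unless this module merges into that slot.
        stream = [local.get(slot, frame) for slot, frame in enumerate(stream)]
    return stream
```

For instance, with modules ordered `["310H", "310G", "310B", "310A"]` and local frames scheduled in distinct slots for 310H, 310G, and 310A, the stream arriving at the controller carries each module's local frame in its slot and idle frames everywhere else, while 310B acts purely as a repeater.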
The northbound and southbound serial data links may be considered as providing point to point communication from one memory module to another memory module and so on and so forth along the serial chain. The serial data flow from thememory controller302 out tomemory module310A throughmemory module310H may be referred to as a south data flow. The serial data flow frommemory module310H through memory module310Z to thememory controller302 may be referred to as a northbound data flow. InFIG. 3, the southbound data flow is illustrated by an arrow labeled SB, while the northbound data flow is illustrated by an arrow labeled NB.
Referring now toFIG. 4, a buffered memory module (BMM)310 is illustrated that is exemplary of thememory modules310A-310H. The bufferedmemory module310 may be of any type such as a SIMM or DIMM, for example. The bufferedmemory module310 includes a buffer integrated circuit chip (“buffer”)450 and memory integrated circuit chips (“memory devices”)452 coupled to a printedcircuit board451. Printedcircuit board451 includes an edge connector oredge connection454 that couples to an edge connector of the host printed circuit board. A southbound data input (SBDI) and a northbound data output (NBDO) of thememory module310 is respectively received from or transmitted to a prior buffered memory module or the buffered memory controller. A northbound data input (NBDI) and a southbound data output (SBDO) of thememory module310 is respectively received from or transmitted to a next buffered memory module, if any.
Referring now to bothFIGS. 3 and 4, thememory controller302 communicates with thebuffers450 of eachmemory module310A-310H in eachmemory bank304 by using the southbound data flow and the northbound data blow. Theedge connection454 of the first memory module being the closest to the memory controller in each bank,memory module310A, couples thebuffer450 of eachmemory module310A to thememory controller302.Memory module310A has no adjacent memory module in the northbound data flow path. The northbound data flow frommemory module310A is coupled to thememory controller302. Theadjacent memory module310A-310H in each bank are coupled together so that data can be read, written, and passed through eachbuffer450 of each memory module. The last memory module being the furthest from the memory controller in each bank,memory module310H, has no adjacent memory module in the southbound data flow path. Thus,memory module310H does not pass southbound data flow further along the serial chain of memory modules.
Thememory controller302 does not directly couple to thememory devices452 in any memory module. Thebuffer450 in eachmemory module310A-310H in eachmemory bank304 couples directly to thememory devices452 on the printed circuit board351. Thebuffer450 provides data buffering to all the memory integrated circuit chips ordevices452 on the same printedcircuit board451 of thememory module310. Thebuffer450 further performs serial to parallel conversion and parallel to serial conversion of the data, as well as interleaving/deinterleaving and packetizing/depacketizing of data as needed. Thebuffer450 also controls its portion of the serial chain of the northbound and southbound data links with adjacent memory modules. Additionally, in the case of the first memory module,memory module310A, thebuffer450 also controls its portion of the serial chain of the northbound and southbound data links with thememory controller302. Additionally, in the case of the last memory module,memory module310H, thebuffer450 also controls the initialization of the serial chain of memory modules and the generation of idle frames or idle packets of data in the northbound data link and northbound data flow to thememory controller302.
Without a direct coupling between thememory controller302 and thememory devices452 of the memory modules, the memory chips ordevices452 may be of different types, speeds, sizes, etc. to which thebuffer450 may communicate. This allows improved memory chips to be used in a memory module without needing to update the hardware interface between the memory controller and the memory modules by purchasing a new host or motherboard printed circuit board. The memory module that plugs into the host or motherboard printed circuit board is updated instead. In one embodiment of the invention, the memory chips, integrated circuits, ordevices452 are DDR memory chips with dynamic random access memory (DRAM). Otherwise, in other embodiments of the invention, the memory chips, integrated circuits, ordevices452 can be any other type of memory or storage device.
Referring now toFIG. 5, onememory bank304 of thememory banks304A-304F of a memory system is illustrated in greater detail coupled to the buffered memory module (BMM)memory controller302. In one embodiment of the invention, theBMM memory controller302 is a fully buffered dual inline (FBD) memory controller and each of thememory modules310A-310H is a fully buffered dual inline (FBD) memory module (FBDIMM). Thememory bank304 includes one ormore memory modules310A-310ndaisy chained together. Eachmemory module310 functions like a repeater for the valid data flowing in the serial bit streams along the northbound data link (NB) and the southbound data link (SB).
Each memory module 310A-310n in the memory bank 304 includes a buffer 450A-450n, respectively. Each buffered memory module 310A-310N respectively includes memory devices 452A-452N, which may differ from each other. For example, the memory devices 452A in buffered memory module 310A may differ from the memory devices 452B in buffered memory module 310B. That is, the buffer 450 in each memory module makes the type of memory used for the memory devices transparent to the memory controller 302.
The buffer 450 in each memory module functions like a repeater for the data flowing in the serial bit streams along the northbound data link (NB) and the southbound data link (SB). Additionally, the buffer 450 in each memory module may insert or merge its own local data into lanes of serial bit streams flowing along the northbound data link (NB) in place of frames or partial frames of idle or invalid data.
In order to synchronize the timing of the memory controller 302 and the memory modules 310A-310n together in the memory bank 304, a clock generator 500 is provided that is coupled to each memory module and the memory controller. A clock signal 501 from the clock generator 500 is coupled to the memory controller 302. Clock signals 502A-502n are respectively coupled to the buffers 450A-450n in memory modules 310A-310n.
Memory controller302 communicates through the memory modules in thememory bank304 over the southbound data links SB1-SBn.Memory controller302 may receive data from eachmemory module310 within thememory bank304 over the northbound data links NB1-NBn. The southbound data links SB1-SBn may consist of one or more lanes of serial data. Similarly, the northbound data links NB1-NBn may consist of one or more lanes of serial data. In one embodiment of the invention, there are fourteen lanes of serial data in the northbound data links NB1-NBn.
The last memory module, memory module 310n, regardless of whether or not it has data to send, generates a pseudorandom bit stream and starts it flowing towards the memory controller 302 on the northbound link NBn. The pseudorandom bit stream may be passed from one memory module to the next on the northbound links NB1-NBn. If the memory module 310n has local data to send to the memory controller 302, it generates a frame of data including the local data and places it on the northbound link NBn instead of a frame of data of the pseudorandom bit stream. The pseudorandom bit stream may include a sequence of bits packetized into a frame of data that indicates an idle frame of data. An idle frame of data may be replaced by the other memory modules further down the line (memory modules 310A-310(n-1)) in order to merge a frame of local data into the serial bit stream flowing on the northbound links NB1-NBn. For example, memory module 310B may receive an idle frame on the incoming northbound link NB3 and merge a frame of local data in place of the idle frame into the serial bit stream on the outgoing northbound link NB2.
The memory system illustrated in FIG. 5 may further include an SM bus (SMBus) 506 coupled from the memory controller 302 to each of the memory modules 310A-310N. The SM bus 506 may be a serial data bus. The SM bus 506 is a sideband mechanism to access internal registers of the buffer. Certain link parameters may be set up by a BIOS in the buffer before bringing up the northbound and southbound serial data links. The SM bus may also be used to debug the system through access to the internal registers of the buffer.
The memory controller 302 may be a part of a processor (as illustrated by processor 201′ and memory controller 202′ in FIG. 2B) or may be a separate integrated circuit (as illustrated by processor 201 and memory controller 202 in FIG. 2A). In either case, the memory controller 302 can receive write memory instructions with write data from the processor, and can receive read memory instructions from the processor and supply read data to the processor, in order to respectively write or read data to or from memory. The memory controller 302 may include a southbound serial output interface (SBO) 510 to transmit one or more lanes of serial data to the one or more memory modules in each bank of memory. The memory controller 302 may further include a northbound serial input interface (NBI) 511 to receive one or more lanes of serial data from the one or more memory modules in each bank of memory.
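The idle-frame replacement described above can be sketched for a single frame slot as follows. This is a minimal illustration only: the function name `northbound_pass`, the `IDLE` marker, and the use of `None` for "no local data" are hypothetical, and the explicit idle check stands in for the slot timing that the system establishes during training rather than for any comparison performed by the buffer.

```python
IDLE = "IDLE"  # hypothetical marker for an idle frame


def northbound_pass(local_frames):
    """Follow one frame slot from the last module toward the controller.

    local_frames[0] is the module nearest the controller (e.g. 310A) and
    local_frames[-1] is the last module (e.g. 310n). None means the module
    has no local data for this slot.
    """
    # The last module originates the slot: local data if any, else idle.
    frame = local_frames[-1] if local_frames[-1] is not None else IDLE
    # Walk the remaining modules in the northbound direction.
    for local in reversed(local_frames[:-1]):
        # Assumption: training guarantees a module only merges into a slot
        # that is idle; the check below models that guarantee.
        if local is not None and frame == IDLE:
            frame = local
    return frame
```

For example, `northbound_pass([None, "B-data", None])` models memory module 310B replacing an idle frame that originated at the last module.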
Referring now toFIG. 6 (FIGS. 6-1 and6-2), a functional block diagram of thebuffer450 for the bufferedmemory module310 is illustrated.Buffer450 is an integrated circuit that can be mounted to the printedcircuit board451 of the bufferedmemory module310. To couple data into and out of bufferedmemory module310,buffer450 includes a southbound buffer I/O interface600A and a northbound buffer I/O interface600B.
The northbound buffer I/O interface600B interfaces to the northbound data out (NBDO)601 and the northbound data in (NBDI)602. The southbound buffer I/O interface600A interfaces to the southbound data in (SBDI)603 and the southbound data out (SBDO)604. Northbound data in602 and the northbound data out601 includes fourteen lanes of a serial data stream in one embodiment of the invention. The southbound data in603 and the southbound data out604 includes ten lanes of serial data streams in one embodiment of the invention.
To interface to the memory devices 452, buffer 450 includes a memory I/O interface 612. At the memory I/O interface 612, DRAM data is bidirectionally passed over a DRAM DATA/STROBE bus 605 while addresses and commands are sent out over DRAM ADDRESS/COMMAND buses 606A-606B to the memory devices. Memory devices 452 are clocked by the DRAM clock buses 607A-607B in order to synchronize data transfer with the memory I/O interface 612. From the core logic of the buffer 450, the memory I/O interface 612 receives commands over the CMD OUT bus 692 from the multiplexer 635; addresses over the ADD OUT bus 693 from the multiplexer 637; and write data over the DATA OUT bus 691 from the multiplexer 636. The write data on the DATA OUT bus 691 is communicated to the appropriate memory devices over the DRAM DATA/STROBE bus 605. Address data on the ADD OUT bus 693 is communicated to the appropriate memory devices over the DRAM ADDRESS/COMMAND buses 606A-606B. The commands on the CMD OUT bus 692 are communicated to the appropriate memory devices over the DRAM ADDRESS/COMMAND buses 606A-606B.
In order to generate the core_clock signal 611 for the functional blocks of the buffer 450, the buffer 450 receives a reference clock (REF CLOCK) 502 that is coupled into a phase lock loop (PLL) 613. The reference clock (REF CLOCK) 502 may be a differential input signal and appropriately received by a differential input receiver. Buffer 450 further receives an SM bus 506 coupled to an SM bus controller 629. A reset signal (Reset#) 608 is coupled into a reset control block 625 in order to reset the buffer 450 and the functional blocks when it goes active low.
Between the memory I/O interface612 and the buffer I/O interfaces600A-600B is the core logic of thebuffer450. The core logic of thebuffer450 is used to read data out from the memory devices and drive it out as local data through thenorthbound data interface600B. Additionally, any other response from a memory module is driven out by the buffer and into the northbound serial data stream through thenorthbound data interface600B. The core logic of thebuffer450 is also used to write data into the memory devices that is received from thesouthbound data interface600A. The commands to read and write data are received from thesouthbound data interface600A. If thememory devices452 of the givenbuffered memory module310 are not to be accessed, serial data on thenorthbound data input602 and thesouthbound data input603 may pass through the buffer I/O interface600A-600B onto thenorthbound data output601 and thesouthbound data output604, respectively. In this manner, data from another bufferedmemory module310 will be passed through to the memory controller on thenorthbound data interface600B without having to be processed by the core logic of thebuffer450. Similarly, data from the memory controller may be passed on to another memory module on thesouthbound data interface600A without having to be processed by the core logic of thebuffer450.
The core logic of the buffer 450 includes functional blocks to read data from and write data into the memory devices 452. The core logic of the buffer 450 includes a phase lock loop (PLL) 613, a data CRC generator 614, a read FIFO buffer 633, a five into one bus multiplexer 616, a sync and idle pattern generator 618, a NB LAI buffer 620, an integrated built in self-tester for the link (IBIST) 622B, a link initialization SM and control and configuration status registers (CSRs) 624B, a reset controller 625, a core control and configuration status registers (CSRs) block 627, an LAI controller block 628, an SMbus controller 629, an external MEMBIST memory calibration block 630, and a failover block 646B coupled together as shown in FIG. 6. The core logic of the buffer 450 may further include a command decoder and CRC checker block 626, an idle built in self-tester (IBIST) block 622A, a link initialization SM and control and CSRs block 624A, a memory state controller and CSRs 632, a write data FIFO buffer 634, a four into one bus multiplexer 635, a four into one bus multiplexer 636, a three into one bus multiplexer 637, an LAI logic block 638, an initialization patterns block 640, a two into one bus multiplexer 642, and a failover block 646A coupled together as shown in FIG. 6.
A multiplexer includes at least two data inputs, an output, and at least one control or select input to select the data input that is to be provided at the output of the multiplexer. For a two input multiplexer, one control or select input is used to select the data that is output at the multiplexer. A bus multiplexer receives a plurality of bits at each data input and has an output with a plurality of bits as well. A two into one bus multiplexer has two buses as its data input and a single bus output. A three into one bus multiplexer has three buses as its data input and a single bus output. A four into one bus multiplexer has four buses as its data input and a single bus output.
Within the buffer 450, each of the buffer I/O interfaces 600A-600B includes a FIFO buffer 651, data merge logic 650, a transmitter 652, a receiver 654, a re-synchronization block 653, and a demultiplexer/serial parallel converter block 656. Data can pass through each of the buffer I/O interfaces 600A-600B through a re-synchronization path 661 or a re-sample path 662 without interfacing to the core logic. Through the embodiments of the invention, local data associated with the buffer 450 can be merged into the serial data stream to overwrite an idle frame without having the core logic receive a serial data stream and determine where the idle frames are located therein.
Themultiplexer616 selects what data is directed towards theFIFO buffer651 of the northbound buffer I/O interface600B for driving out as local data on the serial lanes of thenorthbound data output601. Generally, themultiplexer616 may select status or other control information from the core control and CSR block627, read data from the readFIFO buffer633, read data with attached CRC data from theCRC generator614, synchronization or idle patterns from thepattern generator618, or test pattern data from the IBIST block622B.
Themultiplexer642 selects what data is directed towards theFIFO buffer651 of the southbound buffer I/O interface600A for driving out on the serial lanes of thesouthbound data output604. Generally, themultiplexer642 may select initialization patterns from the init patterns block640 or test pattern data from theIBIST block622A.
Referring now toFIG. 7A, a block diagram of the data mergelogic650 coupled to thetransmitter652 is illustrated. Thetransmitter652 is made up of N lanes oftransmitters752A-752n.As discussed previously, in one embodiment of the invention the number of lanes is ten. In another embodiment of the invention, the number of lanes is fourteen. In the data mergelogic650 there is a datamerge logic slice700A-700nfor each one of the N lanes.
A parallellocal data bus660 from the first-in-first-out (FIFO) buffer651 couples into each data mergelogic slice700A-700n.Respective lanes of serial data of there-synch bus661 couple into each respective data mergelogic slice700A-700n.The bit width of there-synch bus661 is two times the number of lanes. Two bits of each respective lane of there-synch bus661 are coupled into each respective data mergelogic slice700A-700N. Respective lanes of serial data of there-sample bus662 couple into each respective data mergelogic slice700A-700n.The bit width of there-sample bus662 is two times the number of lanes. Two bits of each respective lane of there-sample bus662 are coupled into each respective data mergelogic slice700A-700N.
The re-sample bus 662 and the re-synch bus 661 both transfer a two bit serial data stream for each lane into each respective data merge logic slice 700A-700N. In contrast, the parallel local data bus 660 couples six or twelve bits for each lane into each respective data merge logic slice 700A-700N. The bit width of the parallel local data bus 660 is twelve times the number of lanes. However, in a six-bit mode, only six bits of the twelve may be active per lane. The output from each of the data merge logic slices 700A-700N is a two bit serial data stream which is respectively coupled into the serial transmitters 752A-752N. Each serial transmitter 752 converts two parallel bits of serial data into a single bit serial data stream on the respective lane 601A-601N of the northbound data output (NBDO) 601 or the respective lane 604A-604N of the southbound data output (SBDO) 604 as shown in FIG. 7A.
Referring now to FIG. 7B, a schematic diagram of a data merge logic slice 700i is illustrated coupled to a transmitter 752i. The data merge logic slice 700i represents one of the data merge logic slices 700A-700n for each of the N lanes illustrated in FIG. 7A. The transmitter 752i represents one of the transmitters 752A-752n for each of the N lanes illustrated in FIG. 7A.
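The bus widths stated above follow directly from the lane count. The short calculation below is only an illustration; the function name `bus_widths` and the dictionary keys are hypothetical, while the multipliers (two bits per lane for the re-synch and re-sample buses, twelve bits per lane for the parallel local data bus) are taken from the description.

```python
def bus_widths(lanes):
    """Return the bit width of each bus entering the data merge logic."""
    return {
        "resynch_bus_661": 2 * lanes,    # two bits per lane
        "resample_bus_662": 2 * lanes,   # two bits per lane
        "local_data_bus_660": 12 * lanes,  # twelve bits per lane
    }


# Fourteen lanes (northbound) and ten lanes (southbound) in one embodiment.
northbound = bus_widths(14)
southbound = bus_widths(10)
```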
Each data mergelogic slice700ican operate in one of two bit width modes, a full frame mode of twelve bits width (also referred to as a 12 bit mode) or a half frame mode of six bits width (also referred to as a six-bit mode). A mode control signal (6bit_mode)722 indicates and controls which of the two bit width modes the data mergelogic slice700iis to function with the core logic.
In the full frame mode or twelve bit mode, the core logic uses a full frame of twelve bits to communicate data over bus 660i with the data merge logic slice 700i. The lower six bits of bus 660i are represented by the Data[5:0] bus 726 while the upper six bits of bus 660i are represented by a Delayed_data[5:0] bus 727. The twelve bits of local data (Data[5:0] and Delayed_data[5:0]) that are to be merged into the serial data stream and transmitted are respectively latched into a lower parallel-in-serial-output (PISO) converter 708B and an upper parallel-in-serial-output (PISO) converter 708A at the beginning of the frame by an “Early_Load_Pulse” control signal 720.
The lower parallel-in-serial-output (PISO) converter 708B and the upper parallel-in-serial-output (PISO) converter 708A are parallel-in-serial-output (PISO) shift registers and may also be referred to herein as such. Each of the PISO converters 708A-708B, also referred to as PISO shift registers 708A-708B, has a parallel data input (PIN), a clock input, a load/shift-bar input, a serial input (SIN), and a serial output (SOUT). The serial output of the upper PISO shift register 708A is coupled into the serial input of the lower PISO shift register 708B to support serializing twelve parallel bits of the local data bus 660i. The serial input of the upper PISO shift register 708A may be coupled to a logical low (e.g., ground) in one embodiment of the invention or a logical high (e.g., VDD) in another embodiment of the invention. The serial output (SOUT) of the PISO shift registers 708A-708B is two bits at a time in one embodiment of the invention. In another embodiment of the invention, the serial output (SOUT) of the PISO shift registers 708A-708B may be one bit at a time.
In the twelve bit mode, the six bits ofbus726 are coupled to the parallel data input (PIN) of the lowerPISO shift register708B while the six bits ofbus727 are coupled to the parallel data input (PIN) of the upperPISO shift register708A. These twelve bits are respectively loaded into each PISO shift register during theearly load pulse720 with themode control signal722 indicating a twelve bit bus mode (e.g.,mode control signal722 indicates twelve bit mode by being a logical low level and a six-bit mode by being a logical high level in one embodiment of the invention). In the twelve bit mode, the clear input to D-type flip flop706A is logically high setting the Q output of the D-type flip flop706A to logical zero so that the control input to themultiplexer703 selects thebus726 to be output ontobus728.
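The cascaded PISO shift registers in the twelve bit mode can be modeled behaviorally as follows. This is a sketch under stated assumptions: the class `Piso6` and the function `serialize_full_frame` are hypothetical names, the bit at list index 0 is assumed to shift out first (the description does not state the bit order), and the serial input of the upper register is tied to logical low as in one described embodiment.

```python
class Piso6:
    """Six-bit parallel-in, two-bits-at-a-time serial-out shift register."""

    def __init__(self):
        self.bits = [0] * 6

    def load(self, parallel_in):
        # Models the load phase of the load/shift-bar input.
        self.bits = list(parallel_in)

    def shift2(self, serial_in):
        # Shift two bits out of the front and two bits in at the back.
        out = (self.bits[0], self.bits[1])
        self.bits = self.bits[2:] + list(serial_in)
        return out


def serialize_full_frame(data, delayed_data):
    """Serialize twelve local bits into six two-bit symbols."""
    lower, upper = Piso6(), Piso6()
    # Early load pulse: both registers are loaded at the start of the frame.
    lower.load(data)
    upper.load(delayed_data)
    symbols = []
    for _ in range(6):
        from_upper = upper.shift2((0, 0))  # upper SIN tied low (assumed)
        symbols.append(lower.shift2(from_upper))
    return symbols
```

With this bit ordering, the six bits of Data[5:0] leave first, followed by the six bits of Delayed_data[5:0].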
In the half frame mode or six-bit mode, the core logic only uses a half frame of six bits to communicate data overbus660iwith the data mergelogic slice700iat a time. The core logic sends six bits of data at a time or early data (Data[5:0]726) and late data (Delayed_data[5:0]) offset by half of a frame. In the half frame mode, only the lowerPISO shift register708B of the data mergelogic slice700iis used to merge data into the serial data stream for transmission.
In the six-bit mode, the multiplexer 703 selectively couples the six bits of bus 726 to the parallel data input (PIN) of the lower PISO shift register 708B during the early load pulse 720 and the six bits of bus 727 to the parallel data input (PIN) of the lower PISO shift register 708B during the late load pulse 721. The six bits of bus 726 are loaded into the PISO shift register 708B during the early load pulse 720 with the mode control signal 722 indicating a six-bit bus mode. The six bits of bus 727 are loaded into the PISO shift register 708B during the late load pulse 721 with the mode control signal 722 indicating a six-bit bus mode.
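The half frame loading sequence can be sketched in the same behavioral style. The function name `serialize_half_frames` is hypothetical, and the bit order (list index 0 first) is an assumption. Only a single six-bit register is modeled, corresponding to the lower PISO shift register 708B, with the early data loaded at the frame boundary and the late data loaded half a frame later.

```python
def serialize_half_frames(data, delayed_data):
    """Serialize early and late six-bit halves through one register."""
    symbols = []
    # The multiplexer selects Data[5:0] on the early load pulse and
    # Delayed_data[5:0] on the late load pulse.
    for selected_bus in (data, delayed_data):
        register = list(selected_bus)  # parallel load into the lower PISO
        for _ in range(3):             # three two-bit shifts per half frame
            symbols.append((register[0], register[1]))
            register = register[2:] + [0, 0]
    return symbols
```

Under these assumptions the output symbol order matches the twelve bit mode: early data first, late data second.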
The data mergeslice700iincludes data path logic andcontrol logic701i.The data path logic selectively merges the local data and the feed-through data into the serial bit stream. Thecontrol logic701icontrols the data path logic in each data merge slice in order to properly synchronize the merging of local data and feed-through data into the serial bit stream.
The control logic 701i, with mode control logic and merge control logic, includes three single bit two to one multiplexers 702A-702C, set/reset D flip-flops 706A-706B, an OR gate 710, an AND gate 711, and an inverter 712 coupled together as shown and illustrated in FIG. 7B. The signals generated by the control logic 701i are coupled to the data path logic. The multiplexers 702A-702B, the D-type flip flop 706A, the OR gate 710, the AND gate 711, and the inverter 712 provide the mode control logic. The multiplexer 702C and the D-type flip flop 706B provide the merge control logic.
The data path logic includes a six-bit two to onebus multiplexer703, two bit two-to-one bus multiplexers704-705, and a pair of six-bit in/two bit out parallel in serial out (PISO)converters708A-708B coupled together as shown and illustrated inFIG. 7B.
Eachslice700iof the data mergelogic650 may receive a two bit serial lane of resynch data661i,a two bit serial lane ofre-sample data662i,and a twelve bit parallel lane oflocal data660i.The parallel lane oflocal data660iis from the core logic of thebuffer450 and may be various types of data. For example, thelocal data660imay be read data frommemory devices452, cyclic redundancy check (CRC) data, test data, status data, or any other data that is to be received, transmitted, or generated by the core logic of the buffer.
The two bit lane of re-sync data661iand the two bit lane ofre-sample data662ihave no contact with the core logic of the givenbuffer450 and are multiplexed into feed-through data (also referred to herein as “feedthru data”)725 bymultiplexer705 in response to a localclock mode signal736. If thebuffer450 is operating in the local clock mode, the resynch data is multiplexed onto thefeedthru data725. If thebuffer450 is not operating in the local clock mode, there-sample data662iis multiplexed onto thefeedthru data725. In a local clock mode, a phase locked loop (PLL) clock generator is used to generate a local clock signal in the buffer that is used to resynchronize the input serial data stream to generate the re-synch data. If not in the local clock mode, a received clock is generated from and synchronized with the frames of data in the received serial data stream that is used to sample the input serial data stream to generate the re-sample data. The clock2UI signal723 is switched between the locally generated clock signal and the received clock signal in response to the localclock mode signal736. The source of thefeedthru data725 may be from thebuffer450 of anothermemory module310 on the northbound (NB) side (also referred to as forwarded northbound data); or from thebuffer450 of anothermemory module310 on the southbound (SB) side (also referred to as forwarded southbound data) or alternatively from thememory controller302 on the southbound (SB) side.
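The selection of the feed-through source can be expressed as a two-input multiplexer. The function name `select_feedthru` is hypothetical; the selection rule (re-synch data in the local clock mode, re-sample data otherwise) is as described above.

```python
def select_feedthru(local_clock_mode, resynch_bits, resample_bits):
    """Model of the two-bit feed-through multiplexer for one lane."""
    # Local clock mode: use data resynchronized to the PLL-generated clock.
    # Otherwise: use data re-sampled with the received clock.
    return resynch_bits if local_clock_mode else resample_bits
```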
The two to onebus multiplexer704 receives the two bits of serialfeedthru data725 as a first input, a two bit serial output from the 6-2PISO shift register708B as a second input, and a local data select signal (PISO_SEL)732 at its control input. The two bitserial output735 from the 6-2PISO shift register708B is two serialized bits oflocal data735 from theparallel data bus660i. Thus in response to the local data select signal (PISO_SEL)732, themultiplexer704 either selects to output two bits offeedthru data725 or two bits of serializedlocal data735 from theparallel data bus660igenerated by the6-2PISO shift register708B. The twobit output730 from themultiplexer704 is coupled to the transmitter752 and further serialized into a single bit onto the lane NBDOi/SBDOi601i,604i. In this manner, local data from the core logic can be multiplexed with feed-through data and merged into a lane of the serial bit stream at NBDOi/SBDOi601i,604i.
The local data select signal (PISO_SEL)732 that controls the merging of data into the serial bit stream is generated by the D flip-flop706B. In response to a merge enablesignal724, the D flip-flop706B generates the local data select signal (PISO_SEL)732 on the rising edge of theclock signal Clock_2UI723. The merge enablesignal724 is coupled into a first input of themultiplexer702C. The local data select signal (PISO_SEL)732 is fed back and coupled into a second input of themultiplexer702C. The output ofmultiplexer702C is coupled into the D input of the D flip-flop706B. An early load pulse (EARLY_LD_PULSE) signal720 is coupled into the select control input of themultiplexer702C. If theearly load pulse720 is active high, the merge enablesignal724 is driven out by themultiplexer702C and coupled into the D input of the D flip-flop706B. If theearly load pulse720 is low, the local data select signal (PISO_SEL)732 is fed back through themultiplexer702C and coupled into the D input of D flip-flop706B to retain the current state of the local data select signal (PISO_SEL)732. As theearly load pulse720 is periodically clocked, if the merge enablesignal724 is low, it clears theD flip flop706B so its Q output is a low logic level signal that terminates the merge of data at the appropriate time.
The merge enablesignal724 is synchronized into the local data select signal (PISO_SEL)732 on the edge of theclock signal Clock_2UI723. As the merge_enable signal724 is sampled during theearly_load_pulse720 to generate the local data select signal (PISO_SEL)732, themultiplexer704 is switched on frame boundaries (12 bits of data per lane in a frame). If the merge enablesignal724 is active high on the rising edge of theclock signal Clock_2UI723, the local data select signal (PISO_SEL)732 goes active high to control themultiplexer704 to select the two serialized bits oflocal data735 as its twobit output730. If the merge enablesignal724 is low on the rising edge of theclock signal Clock_2UI723, the local data select signal (PISO_SEL)732 remains low to control themultiplexer704 to select the two feedthru bits ofdata725 as its twobit output730.
In response to the local data select signal (PISO_SEL) 732 being a logical high, the two serialized bits of local data 735 from the parallel data bus 660i are merged into the lane NBDOi/SBDOi 601i, 604i. In response to the local data select signal (PISO_SEL) 732 being a logical low, the two bits of feedthru data 725 are selected by multiplexer 704 to be output onto the lane NBDOi/SBDOi 601i, 604i.
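The merge control logic and the output multiplexer can be modeled together at the clock-cycle level. The names `MergeControl`, `clock`, and `output_mux` are hypothetical. The model captures only the behavior described above: PISO_SEL takes the value of the merge enable signal when the early load pulse is high and holds its state otherwise, so that the output selection changes only on frame boundaries.

```python
class MergeControl:
    """Behavioral model of the select multiplexer feeding a D flip-flop."""

    def __init__(self):
        self.piso_sel = 0  # flip-flop Q output, assumed cleared at reset

    def clock(self, early_load_pulse, merge_enable):
        """Advance one rising clock edge and return the new PISO_SEL."""
        # Pulse high: sample merge enable. Pulse low: feed back PISO_SEL.
        d_input = merge_enable if early_load_pulse else self.piso_sel
        self.piso_sel = d_input
        return self.piso_sel


def output_mux(piso_sel, feedthru_bits, local_bits):
    """Select local serialized bits when PISO_SEL is high."""
    return local_bits if piso_sel else feedthru_bits


ctrl = MergeControl()
trace = [
    ctrl.clock(early_load_pulse=0, merge_enable=1),  # mid-frame: ignored
    ctrl.clock(early_load_pulse=1, merge_enable=1),  # frame start: merge on
    ctrl.clock(early_load_pulse=0, merge_enable=0),  # mid-frame: held
    ctrl.clock(early_load_pulse=1, merge_enable=0),  # frame start: merge off
]
```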
As the local data select signal (PISO_SEL)732 is responsive to the merge enablesignal724, the generation of the merge enablesignal724 allows the parallel data ofbus660ito be merged onto the serial data stream of the lane NBDOi/SBDOi601i,604i. The merge enablesignal724 is generated by link control logic (in the link init SM and control and CSRfunctional block624B illustrated inFIG. 6) in time to allow local data to be merged into the serial data stream at the appropriate time.
Referring momentarily back toFIG. 5, the timing of the merge enable signal is established for eachmemory module310 during initialization and training of the system. Note that for thelast memory module310nin abank304, the merge enable signal is more of a data transmit signal as there are no further memory modules in the chain generating data in the northbound data link.
Referring now toFIG. 10, a flow chart is illustrated for the initialization, training, and functioning of the buffer in merging local data and feed through data together into a serial data stream output. The flow chart starts atblock1000.
Atblock1002, the buffer in each memory module of each memory bank is initialized. During initialization of amemory bank304, each memory module has its southbound and northbound serial data links initialized (may also be referred to as being part of link training). Thememory controller302 sends out an initialization pattern on the southbound (SB) data link SB1-SBn. During initialization, the buffer450nin thelast memory module310nreceives the initialization pattern on the southbound data link SBn and retransmits it back onto the northbound (NB) data link NB1-NBn through other memory modules back to thememory controller302. As each buffer has its own clock, the initialization pattern received on the northbound (NB) data link NB1-NBn by the buffer is used for bit locking and frame alignment purposes in each lane of serial data. The clock in the buffer may be synchronized to the initialization pattern. The timing of logic may be aligned with the initialization pattern in order to receive packets of data in the serial data stream as well as parse a header from a frame of data and any error correction/detection or other data fields within a packet. The generation of theEarly_Ld_Pulse720 is set up to be coincident with the beginning of frames of data received by a given memory module. The generation of theLate_LD_Pulse721 is set up to be at a half frame boundary of frames of data received by a given memory module.
Next atblock1004, each buffer in each memory module of each memory bank is trained. After sending out the initialization pattern, thememory controller302 sends out a training pattern through to thelast memory module310nin a givenbank304 during training. During training, the buffer450nin thelast memory module310nreceives the training pattern on the southbound data link SBn and retransmits it back onto the northbound (NB) data link NB1-NBn through other memory modules back to thememory controller302. Each memory module observes one of the training patterns on the southbound (SB) data link and determines the amount of time or clock cycles for it to return to the same memory module on the northbound (NB) data link. A roundtrip time is determined for a given position of each memory module.
Provided that the requests are not overly bunched together, the roundtrip time represents a slot in time where it is safe for a given memory module to merge data onto the northbound data link without colliding with valid data of another memory module. At a given memory module, an idle data packet is expected to be received at this point in time on the northbound data link after seeing a memory request command on the southbound data link. At this point in time, the idle data packet can be replaced by a local data packet. The roundtrip time and the command to data delay time for a given memory module are the basis for setting up the timing of the merge enable signal that is used to control the merging of local data into the northbound data link. If the roundtrip time is long, data can be fetched in advance and placed in a FIFO buffer waiting for the proper moment to be merged into the northbound data stream. The distance between read and write FIFO buffer pointers in the northbound interface of the buffer can be set based on the roundtrip timing.
The roundtrip time may be determined as a function of a whole number of periods of the bit rate clock, clock_2UI 723. The number of memory modules in a channel and the command to data delay of the last memory module in the channel determine the round-trip time for that channel.
A command to data delay for each memory module may be further determined to assist in establishing the timing of the merge enable signal in each memory module. The command to data delay timing may include one or more of the following time periods: the time for a command to be transferred from the southbound IO interface 600A to the memory IO interface 612; the time for the command to be transferred from the memory IO interface 612 to the memory devices 452; differences in clock timing for the memory IO interface 612 and the memory devices 452; routing delays in the clock signals and command signals to the memory devices 452; any set-up/hold times for the buffer 450 and the memory devices 452; the read latency in the memory devices 452 (e.g., CAS timing and any added latency); routing delays in the data signals and strobe signals from the memory devices 452 to the buffer 450; data delay skew between memory devices; delays through the memory IO interface 612; any set-up/hold times for the buffer 450 and the memory devices 452; and time for data to be transferred from the memory IO interface 612 to the northbound IO interface 600B (this may include buffering and clocking delays for data within the buffer 450). The command to data delay timing may be determined as a multiple of frames or a fraction thereof, with the granularity of the delay time being a whole number of periods (bit times such as frame/12 or clock_2ui/2) of a bit rate clock. The command to data delay timing of a memory module, such as the last memory module 310n, can be programmatically increased by a register setting if additional delay time is desired.
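The accumulation of a command to data delay and its expression in frames and bit times can be sketched as follows. This is an illustration under stated assumptions: the function name, the picosecond units, the rounding up to the next whole bit time, and the numeric values are all hypothetical and are not taken from the description. Only the twelve bit times per frame and the register-programmable additional delay are from the text.

```python
BITS_PER_FRAME = 12  # twelve bits per lane in a frame


def command_to_data_delay(component_delays_ps, bit_period_ps, extra_bit_times=0):
    """Return the delay as (whole frames, remaining bit times).

    component_delays_ps: individual delay contributions in picoseconds.
    bit_period_ps: one bit time in picoseconds.
    extra_bit_times: additional delay added by a register setting.
    """
    total_ps = sum(component_delays_ps)
    # Round up to a whole number of bit times (ceiling division).
    bit_times = -(-total_ps // bit_period_ps) + extra_bit_times
    return divmod(bit_times, BITS_PER_FRAME)


# Hypothetical contributions: transfer to memory I/O, read latency, return path.
delay = command_to_data_delay([1000, 2500, 700], bit_period_ps=250)
padded = command_to_data_delay([1000, 2500, 700], bit_period_ps=250, extra_bit_times=7)
```

Here 4200 ps rounds up to 17 bit times, or one frame plus five bit times; adding seven bit times by register setting aligns the delay to exactly two frames.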
Next at block 1006, after the initialization and training, each buffer is ready to receive an input serial data stream from a serial data input. However, the buffer in the last memory module 310n in the memory bank 304 either transmits idle packets or read requested data packets on the northbound data link towards the memory controller 302. Otherwise, an input serial data stream is received that represents feed-through frames of data interspersed between idle frames of data.
Next at block 1008, a determination may be made with respect to the availability of local data. If there is local data to merge into the serial data stream, then the control flow jumps to block 1010. If there is no local data to merge into the serial data stream, then the control flow jumps to block 1014.
At block 1014 with no local data to merge, the feed-through data is transmitted onto the serial data output. The feed-through data may have its bits of data in the input serial data stream re-sampled. Alternatively, feed-through data may have its bits of data in the input serial data stream resynchronized. Then the control flow jumps back to block 1006 to continuously receive the input serial data stream.
At block 1010 with local data to merge, frames of the local data replace the feed-through data in the output serial data stream. That is, if local data needs to be sent by a buffer, frames of data in the incoming serial data stream are dropped and frames of local data are sent in place thereof in response to the merge enable signal. The frames of the local data and the feed-through data may be merged together by serializing parallel bits of the local frames of data into serial bits of data and then multiplexing the serial bits of data of the local frames of data and the serial bits of the feed-through frames of data into serial bits of the output serial data stream in response to the merge enable signal. During initialization and training, the host and memory controller ensure that it is idle frames of data in the input serial data stream that are replaced by local frames of data. The buffer therefore does not need to check whether the incoming frame in the input serial data stream that is being replaced is an idle frame of data or not.
At block 1012, the output serial data stream, including the merged data, is transmitted onto the serial data output to the next memory module up the chain or alternatively to the memory controller.
Next, the control process jumps back to block 1006 to continue receiving the input serial data stream from the serial data input.
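The control flow of blocks 1006 through 1014 can be sketched at frame granularity as follows. This is a minimal behavioral sketch, not the circuit itself: the function name `merge_frames`, the string frame labels, and the per-frame list of merge enable values are assumptions made for illustration. It shows that the incoming frame is dropped without being inspected whenever the merge enable is asserted.

```python
IDLE = "IDLE"  # placeholder label for an idle frame (illustrative only)


def merge_frames(input_frames, local_frames, merge_enable):
    """Frame-level model of the merge loop.

    For each incoming frame slot, emit the next local frame when the merge
    enable is asserted (block 1010), otherwise pass the feed-through frame
    along unchanged (block 1014).
    """
    local = iter(local_frames)
    output = []
    for frame, enable in zip(input_frames, merge_enable):
        if enable:
            # The incoming frame is dropped without checking that it is idle;
            # training is relied on to line the slot up with an idle frame.
            output.append(next(local))
        else:
            output.append(frame)
    return output


stream_in = ["A0", IDLE, "A1", IDLE]
print(merge_frames(stream_in, ["B0"], [False, True, False, False]))
```

Because the merge decision depends only on timing, the model needs no decode step for the incoming frames.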
As discussed previously, the local data from the core logic and the buffer 450 may be output in six-bit chunks or twelve-bit chunks at a time. The mode control signal (6bit_mode) 722 determines whether the data merge logic slice 700i is to function in a six-bit mode (half frame mode) or a twelve-bit mode (full frame mode). The mode control signal (6bit_mode) 722 is coupled into the selection or control input of the multiplexer 702A, a first input of the AND gate 711, and the input to the inverter 712.
The early load pulse signal 720 controls the loading of the first six bits on the parallel data bus 660i. A late load pulse signal 721 controls the loading of the second six bits on the parallel data bus 660i. The late load pulse signal 721 is coupled into a first input of the OR gate 710. The early load pulse control signal 720 is coupled into the first input of the multiplexer 702B, the second input of the OR gate 710, the first input of the multiplexer 702A, a load/shift-bar input of the 6-2 PISO shift register 708A, and the select input of the multiplexer 702C.
The clock signal Clock_2UI 723 couples into the clock inputs of the D flip-flops 706A-706B, and the clock inputs of the 6-2 PISO shift registers 708A-708B. The output of multiplexer 702A is coupled into the load/shift-bar input of the 6-2 PISO shift register 708B.
The parallel input of the 6-2 PISO shift register 708A is coupled to the six bit delayed data bus 727. The two bit serial output of the 6-2 PISO shift register 708A is coupled into the two bit serial input of the 6-2 PISO shift register 708B. The parallel input of the 6-2 PISO shift register 708B is coupled to the six-bit output from the multiplexer 703. In this manner, when a data merge logic slice 700i is in a 12 bit mode, 12 bits of data can be loaded into the 6-2 PISO shift registers 708A-708B and then shifted serially out from the 2 bit serial output of the 6-2 PISO shift register 708B, through the multiplexer 704, and coupled into the transmitter 752i.
The serial transmitter 752i is double clocked by a clock signal in order to convert the 2 parallel bit sets into the serial single bit at its output 601i, 604i.
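The chaining of the two 6-2 PISO shift registers and the final two-to-one serialization in the transmitter can be modeled behaviorally as sketched below. The class and function names are hypothetical, and the bit ordering (lowest-indexed bit leaves first, lower six bits before upper six bits) is an assumption made for illustration; the description does not state the ordering. The upper register's serial input is tied to zero, corresponding to the grounded serial input option.

```python
class Piso6to2:
    """Behavioral model of a six-bit parallel-in register shifting out two bits per clock."""

    def __init__(self):
        self.bits = [0] * 6  # bits[0] and bits[1] form the current two-bit output

    def load(self, six_bits):
        self.bits = list(six_bits)

    def shift(self, serial_in=(0, 0)):
        """Return the current two-bit output, then shift in two new bits behind the data."""
        out = tuple(self.bits[:2])
        self.bits = self.bits[2:] + list(serial_in)
        return out


def serialize_frame_12bit(frame12):
    """Load twelve bits into the chained registers and shift them out two at a time."""
    upper, lower = Piso6to2(), Piso6to2()  # stand-ins for 708A and 708B
    lower.load(frame12[:6])   # lower six bits, via the bus multiplexer
    upper.load(frame12[6:])   # upper six bits
    serial_out = []
    for _ in range(6):        # six Clock_2UI periods per twelve-bit frame
        pair = lower.shift(upper.shift((0, 0)))  # upper output feeds lower serial input
        serial_out.extend(pair)  # double-clocked transmitter sends the pair one bit at a time
    return serial_out


frame = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1]
print(serialize_frame_12bit(frame))
```

Under the assumed ordering, the single bit stream reproduces the twelve loaded bits in index order, with the upper register's contents following the lower register's contents out of the same two-bit output.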
The data merge logic slice 700i is in a 12 bit mode when the 6bit_mode control signal 722 is a logical low. The data merge logic slice 700i is in a 6 bit mode when the 6bit_mode control signal 722 is a logical high. Control logic 710-712 in conjunction with the multiplexer 702B and D flip-flop 706A generate a data bus select (Data_Sel) signal 729, which is coupled to the select input of the multiplexer 703 in order to establish a 12 bit mode or a 6 bit mode in response to the 6bit_mode control signal 722. When the data bus select signal 729 is a logical low, 12 bits of data are to be loaded in parallel into the 6-2 PISO shift registers 708A-708B. When the data bus select signal 729 is a logical high, 6 bits of the data bus 727 are to be coupled into the 6-2 PISO shift register 708B.
In a 6 bit mode, either the early load pulse signal 720 or the late load pulse 721 can load parallel data into the 6-2 PISO shift register 708B. In either the 6 bit or 12 bit mode, the early load pulse 720 is only used to load parallel data from the data bus 727 into the 6-2 PISO shift register 708A.
The serial input of the 6-2 PISO shift register 708A is coupled to ground such that only zeros will be serially shifted in behind the data to be transmitted. Alternatively, the serial input of the 6-2 PISO shift register 708A may be connected to VDD such that only logical ones are serially shifted in behind the data being transmitted.
The Q output of D flip-flop 706A is coupled into the second input of the multiplexer 702B such that when the output of AND gate 711 is a logical low, the Q output couples into the D input of the D flip-flop 706A to retain the loaded logic state therein of the data bus select (DATA_SEL) signal 729.
Referring now to FIG. 8, a timing diagram of waveforms depicting the data merge logic slice 700i functioning in a twelve-bit mode is illustrated. That is, the 6bit_mode control signal 722 is a logical low in the timing diagram of FIG. 8.
In FIG. 8, the Clock_2UI signal 723 is illustrated by the waveform 823. The core clock signal 611 is illustrated by the waveform 811. The lower six bits of data (MEM_DATA IN [5:0]) 690A on the parallel data bus 690 are illustrated by the waveform 890A. The upper six bits of data (MEM_DATA IN [11:6]) 690B on the parallel data bus 690 are illustrated by the waveform diagram 890B. The lower six bits of data (FBD_DATA [5:0]) 726 on the parallel data bus 660i are illustrated by the waveform diagram 826. The upper six bits of data (FBD_DATA [11:6]) 727 on the parallel data bus 660i are illustrated by the waveform diagram 827. The merge enable control signal 724 is illustrated by the waveform diagram 824. The early load pulse control signal 720 is illustrated by the waveform 820. The late load pulse control signal 721 is illustrated by the waveform 821. The local data select control signal (PISO_SEL) 732 is illustrated by the waveform 832. The single bit serial output data stream NBDOi 601i is illustrated by the waveform 801.
Without any local data to merge into the northbound serial data stream, the buffer 450 passes the received bits on the northbound data input 602 (“Feedthru Data” 725) to the transmitter 752i in the high speed clock domain, bypassing the core logic of the buffer 450. The local data select control signal (PISO_SEL) 732 is low when the Feedthru Data 725 is multiplexed into the transmitter 752i, as is illustrated by the waveform 832.
As discussed previously, the “Early_Ld_Pulse” 720 is set up to be coincident with the beginning of a frame (as seen on the link) and the “Late_Ld_Pulse” 721 is set up to be at the half frame boundary during the initial training of a lane of the serial data link. A frame of data is a logical unit of data over the link when in a full frame operational mode and is made up of twelve bits of data in one embodiment of the invention.
In full frame operational mode, twelve bits of a frame are loaded into the PISO shift registers using the “Early_Ld_Pulse” signal 720. The “Late_Ld_Pulse” signal 721 is not used to load bits into the PISO shift registers. Both the upper and lower PISO shift registers 708A-708B are used in this mode. The 6bit_mode control signal 722, being low in the twelve bit mode, causes the “Data_Sel” signal 729 to be low in twelve bit mode by clearing the output of the D flip-flop 706A. With the “Data_Sel” signal 729 being low in the twelve bit mode, the six lower data bits (FBD_DATA[5:0]) 726 of bus 660i are coupled into the lower PISO shift register 708B through the multiplexer 703.
The periodic generation of the Early_Ld_Pulse 720 also enables sampling of the “Merge_enable” signal 724 by the D flip-flop 706B. The periodic generation of the Early_Ld_Pulse 720, being active high, selectively controls the multiplexer 702C to select the merge_enable signal 724 as its output data that is coupled into the data input D of the D flip-flop 706B.
As discussed previously, the merge enable signal 724 is generated at an appropriate time to insert local data from a given memory module into a lane of northbound serial data, replacing an idle frame or packet of data in the serial data stream. Waveform 824 illustrates an active high pulse 844 being generated when local data is made available on the upper bits (FBD_DATA [11:6]) 727 and lower bits (FBD_DATA [5:0]) 726 of the data bus 660i.
When the active high pulse 844 is generated in the waveform 824 of the merge enable signal 724, the pulses 840A-840B in the early_ld_pulse signal 720 allow the active high pulse 844 of the merge enable signal 724 to be sampled by the D flip-flop 706B using the clock_2UI signal 723. This causes an active high pulse 842 to be generated in the waveform 832 of the local data select signal (PISO_SEL) 732. The active high pulse 842 of the local data select signal (PISO_SEL) 732 causes the multiplexer 704 to switch from providing the two-bit “Feedthru Data” 725 at its output to provide the two-bit serialized local data 735 at its output instead. The switch from feed-through data 725 to local data 735 occurs at the frame boundary when the active high pulse 842 is first generated. This is because the falling edge of the “Early_Ld_Pulse” 720 that allows the PISO shift registers 708A-708B to start shifting is coincident with the frame starting point.
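The frame-aligned switching described above can be sketched per Clock_2UI period as follows. This is a behavioral approximation under stated assumptions: the early load pulse is modeled as high during the last Clock_2UI period of each frame so that its falling edge coincides with the frame start, and the flip-flop is assumed to hold its value whenever the pulse is low (the description does not state what the other multiplexer input carries). The function name and the "F"/"L" labels are hypothetical.

```python
TICKS_PER_FRAME = 6  # twelve bits per frame, two bits per Clock_2UI period


def select_output(feedthru_pairs, local_pairs, merge_enable):
    """Per-tick model of the local data select (PISO_SEL) path.

    PISO_SEL is re-sampled from merge_enable only while the early load
    pulse is high, so the output switches on frame boundaries only.
    """
    piso_sel = 0
    out = []
    for t, (ft, loc, en) in enumerate(zip(feedthru_pairs, local_pairs, merge_enable)):
        out.append(loc if piso_sel else ft)  # output multiplexer: local vs feed-through
        early_ld_pulse = (t % TICKS_PER_FRAME == TICKS_PER_FRAME - 1)
        if early_ld_pulse:
            piso_sel = en  # flip-flop captures merge_enable at the clock edge
        # Otherwise the flip-flop is assumed to hold its previous value.
    return out


ticks = 3 * TICKS_PER_FRAME
enable = [1 if 3 <= t <= 8 else 0 for t in range(ticks)]  # rises mid-frame
result = select_output(["F"] * ticks, ["L"] * ticks, enable)
print("".join(result))
```

Even though the merge enable in this example rises in the middle of the first frame, the output carries local data for exactly the second frame, which is the frame-boundary behavior the paragraph above attributes to the early load pulse.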
When merging data with the “Early_Ld_Pulse” 720 and the multiplexer output 731 both low, the PISO shift registers 708A-708B serially shift out the twelve bits of local data two bits at a time on the serial output 735 using the “Clock_2ui” clock signal 723. The transmitter 752i further serializes the two bits into a single bit serial data stream on the NBDOi output 601i as illustrated by the local data indicated above the waveform 801.
Referring now to FIG. 9, a timing diagram of waveforms depicting the data merge logic slice 700i functioning in a six-bit mode is illustrated. That is, the six-bit mode control signal (6BIT_MODE) 722 is a logical high as illustrated by the waveform 922 in the timing diagram of FIG. 9.
In FIG. 9, the Clock_2UI signal 723 is illustrated by the waveform 923. The core clock signal (core_clk) 611 is illustrated by the waveform 901. The lower six parallel data bits (MEM_DATA IN [5:0]) 690A on the memory data bus 690 are illustrated by the waveform 990A. The upper six parallel data bits (MEM_DATA IN [11:6]) 690B of the memory data bus 690 are illustrated by the waveform 990B. The lower six bits of data (FBD_DATA [5:0]) 726 on the parallel data bus 660i are illustrated by the waveform diagram 926. The upper six bits of data (FBD_DATA [11:6]) 727 on the parallel data bus 660i are illustrated by the waveform diagram 927. The merge enable control signal 724 is illustrated by the waveform diagram 924, which occurs earlier than that of the waveform 824 in FIG. 8. The early load pulse control signal (EARLY_LD_PULSE) 720 is illustrated by the waveform 920. The late load pulse control signal (LATE_LD_PULSE) 721 is illustrated by the waveform 921. The data bus select control signal (DATA_SEL) 729 is illustrated by the waveform 929. The local data select control signal (PISO_SEL) 732 is illustrated by the waveform 932. The single bit serial output data stream NBDOi 601i is illustrated by the waveform 901.
In the six-bit mode, the lower PISO shift register 708B is used to convert parallel bits of data into serial data by shifting bits out. The data bus select signal (DATA_SEL) 729 toggles whether the least significant six bits of the frame, FBD_Data[5:0] 726, or the most significant six bits of the frame, FBD_Data[11:6] 727, are loaded into the lower PISO shift register 708B through the selected output of the bus multiplexer 703.
Both the “Early_Ld_Pulse” 720 and the “Late_Ld_Pulse” 721 can cause the lower PISO shift register 708B to either load data or shift data out, because the output of the OR gate 710 is coupled into the load/shift-bar input of the lower PISO shift register 708B through the multiplexer 702A when the 6BIT_MODE signal 722 is active high.
When the “Early_Ld_Pulse” 720 and the “Late_Ld_Pulse” 721 are low, bits are shifted out from the lower PISO shift register 708B. Also, during the parallel load of bits into the lower PISO shift register 708B when the load/shift-bar control input is high, bits previously loaded continue to be shifted out. When the load/shift-bar control input returns to low after a parallel load of data bits, the newly loaded bits are then shifted out by the lower PISO shift register 708B. In this manner, all six bits of data may be shifted out while a new set of parallel bits is being loaded.
The least significant six bits of the frame, FBD_Data[5:0] 726, are loaded into the lower PISO shift register 708B by the pulses 940A and 940B in waveform 920 of the “Early_Ld_Pulse” 720 when the data bus select signal (DATA_SEL) 729 is low, such as at low points 949C, 949D for example. The most significant six bits of the frame, FBD_Data[11:6] 727, are loaded into the lower PISO shift register 708B by the pulses 941A and 941B in waveform 921 of the “Late_Ld_Pulse” 721 when the data bus select signal (DATA_SEL) 729 is high, during pulses 949A, 949B for example.
In the six-bit mode, the switching between the serialized “Feedthru_Data” 725 and the serialized local data 735 is similar to the twelve bit mode of operation described previously and is not repeated here for reasons of brevity.
When merging data, the PISO shift register 708B alternates between serially shifting out the six least significant bits and the six most significant bits of local data two bits at a time onto the serial output 735 using the Clock_2UI clock signal 723. The transmitter 752i further serializes the two bits into a single bit serial data stream on the NBDOi output 601i as illustrated by the local data indicated above the waveform 901.
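The alternating half-frame loads of the six-bit mode can be sketched as shown below. As with any such model, this is a simplified behavioral sketch: the function name is hypothetical, the bit ordering (lowest-indexed bit first) is an assumption, and the overlap of loading and shifting in the real register is collapsed into sequential steps.

```python
def serialize_frame_6bit(frame12):
    """Six-bit mode sketch: only the lower register is used, loaded twice per frame.

    The early load pulse loads the lower six bits (DATA_SEL low) and the late
    load pulse loads the upper six bits (DATA_SEL high).
    """
    halves = (frame12[:6], frame12[6:])  # the two inputs of the bus multiplexer
    serial_out = []
    for data_sel in (0, 1):              # early pulse, then late pulse
        reg = list(halves[data_sel])     # parallel load into the lower register
        for _ in range(3):               # three Clock_2UI periods per half frame
            serial_out.extend(reg[:2])   # two bits leave per period, sent one at a time
            reg = reg[2:] + [0, 0]       # zeros shift in behind the data
    return serial_out


frame = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1]
print(serialize_frame_6bit(frame))
```

In this model only six bits need to be available before shifting can begin, which is consistent with the merge enable occurring earlier in FIG. 9 than in FIG. 8.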
While a full frame of data is still transmitted in the six-bit mode, embodiments of the invention further reduce the latency of local data being merged into the serial data stream. In comparing FIGS. 8 and 9, the merging of local data occurs one frame time earlier in FIG. 9.
Embodiments of the invention enable merging of feed-through data and local data together into the serial data link on-the-fly without having to decode incoming packets of the serial input data stream to determine the location of an idle packet. Previously, the incoming serial data stream was received, depacketized/decoded, and reassembled into frames by the core logic before being re-transmitted. Embodiments of the invention avoid the depacketizing/decoding of the incoming serial data stream and its reassembly into frames of data and then encoding/packetizing for retransmission. The embodiments of the invention enable re-transmission of the incoming serial data stream and the merging of local data into the serial data stream without involving the core logic of a buffer integrated circuit. In a multi memory module system, the serial communication channels may continue to function even though a memory integrated circuit in one of the daisy chained memory modules is non-functional.
Embodiments of the invention are designed to provide for low latency memory access operations. This can allow a larger memory with more memory modules to be provided in each bank without memory access latency degrading the system performance as the number of memory modules in a channel increases.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention is not to be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art. For example, one embodiment of the invention has been described to provide a serial data link for a fully buffered dual inline memory module. However, embodiments of the invention may be implemented in other types of memory modules and systems. As another example, data was serialized two bits at a time on two bit buses around the PISO shift registers 708A-708B within the merge logic to provide relaxed data timing in one embodiment of the invention. However, embodiments of the invention may use a single bit output PISO with different clock timing and serialize the local data into a single bit serial data stream, with the feed-through data and multiplexers 704, 705 being provided to support a single bit serial data stream.