BACKGROUND OF THE INVENTION

The present invention relates to data processing, and more specifically, to a coherent proxy for an attached processor.
A conventional distributed shared memory computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Because multiple processor cores may request write access to a same memory block (e.g., cache line or sector) and because cached memory blocks that are modified are not immediately synchronized with system memory, the cache hierarchies of multiprocessor computer systems typically implement a cache coherency protocol to ensure at least a minimum required level of coherence among the various processor cores' "views" of the contents of system memory. The minimum required level of coherence is determined by the selected memory consistency model, which defines rules for the apparent ordering and visibility of updates to the distributed shared memory. In all memory consistency models in the continuum between weak consistency models and strong consistency models, cache coherency requires, at a minimum, that after a processing unit accesses a copy of a memory block and subsequently accesses an updated copy of the memory block, the processing unit cannot again access the old ("stale") copy of the memory block.
A cache coherency protocol typically defines a set of cache states stored in association with cached copies of memory blocks, as well as the events triggering transitions between the cache states and the cache states to which transitions are made. Coherency protocols can generally be classified as directory-based or snoop-based protocols. In directory-based protocols, a common central directory maintains coherence by controlling accesses to memory blocks by the caches and by updating or invalidating copies of the memory blocks held in the various caches. Snoop-based protocols, on the other hand, implement a distributed design paradigm in which each cache maintains a private directory of its contents, monitors (“snoops”) the system interconnect for memory access requests targeting memory blocks held in the cache, and responds to the memory access requests by updating its private directory, and if required, by transmitting coherency message(s) and/or its copy of the memory block.
The cache states of the coherency protocol can include, for example, those of the well-known MESI (Modified, Exclusive, Shared, Invalid) protocol or a variant thereof. The MESI protocol allows a cache line of data to be tagged with one of four states: “M” (Modified), “E” (Exclusive), “S” (Shared), or “I” (Invalid). The Modified state indicates that a memory block is valid only in the cache holding the Modified memory block and that the memory block is not consistent with system memory. The Exclusive state indicates that the associated memory block is consistent with system memory and that the associated cache is the only cache in the data processing system that holds the associated memory block. The Shared state indicates that the associated memory block is resident in the associated cache and possibly one or more other caches and that all of the copies of the memory block are consistent with system memory. Finally, the Invalid state indicates that the data and address tag associated with a coherency granule are both invalid.
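For illustration, the per-line state transitions of the MESI protocol described above can be sketched in software. This is a simplified model, not the hardware implementation: the coherency messages that accompany each transition, and the request phase needed before a local write to an S or I line, are omitted.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def on_local_write(state: State) -> State:
    """After a local write completes, the line is Modified (an S or I
    line must first obtain a unique copy via the coherency protocol)."""
    return State.M

def on_snooped_read(state: State) -> State:
    """Another cache reads the line: M and E degrade to Shared (a
    Modified copy is sourced to the requester and/or written back)."""
    if state in (State.M, State.E):
        return State.S
    return state

def on_snooped_write(state: State) -> State:
    """Another cache gains write ownership: the local copy is stale."""
    return State.I
```

The invariant shown is the one stated above: once a line has been updated elsewhere, a cache can never again supply its old copy, because any snooped write invalidates it.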
BRIEF SUMMARY

In at least one embodiment, a coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP.
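The expected-state handshake summarized above can be sketched as follows. The names here (MemoryAccessRequest, FailureMessage, issue_on_system_fabric) are hypothetical stand-ins for illustration, not names taken from the embodiments.

```python
from dataclasses import dataclass

@dataclass
class MemoryAccessRequest:
    target_address: int
    ttype: str            # e.g., "RWITM"

@dataclass
class FailureMessage:
    target_address: int
    determined_state: str  # the state the CAPP actually found

def issue_on_system_fabric(request):
    # Placeholder for issuing the corresponding fabric request.
    return ("fabric", request.ttype, request.target_address)

def handle_ap_request(capp_directory, request, expected_state):
    """The CAPP looks up its own (shadow) coherence state for the
    target address and proceeds only if it matches the state the AP
    expects; otherwise it reports failure without a fabric request."""
    determined = capp_directory.get(request.target_address, "I")
    if determined == expected_state:
        return issue_on_system_fabric(request)
    return FailureMessage(request.target_address, determined)
```

The design point this illustrates is that the coherence check happens entirely at the CAPP, so the decision fits within the bounded response window regardless of the AP link's latency.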
In at least one embodiment, in response to receiving a memory access request and an expected coherence state from an attached processor (AP) at a coherent attached processor proxy (CAPP), the CAPP determines that a conflicting request is being serviced. In response to determining that the CAPP is servicing a conflicting request and that the expected state matches the coherence state determined by the CAPP, a master machine of the CAPP is allocated in a Parked state to service the memory access request after completion of service of the conflicting request. The Parked state prevents servicing by the CAPP of a further conflicting request snooped on the system fabric. In response to completion of service of the conflicting request, the master machine transitions out of the Parked state and issues on the system fabric a memory access request corresponding to that received from the AP.
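A minimal model of the Parked-state behavior might look like the following; the class and method names are illustrative only, chosen to mirror the description rather than any actual implementation.

```python
from enum import Enum, auto

class MMState(Enum):
    IDLE = auto()
    PARKED = auto()   # waiting for a conflicting request to drain
    ACTIVE = auto()   # issuing the deferred request on the fabric

class MasterMachine:
    """While Parked, the machine holds the target address so that a
    further conflicting request snooped on the fabric is not serviced;
    when the conflicting request completes, the machine activates and
    issues the deferred AP request."""
    def __init__(self, address):
        self.address = address
        self.state = MMState.PARKED
        self.issued = False

    def blocks_snoop(self, snooped_address):
        # A snooped request conflicts if it targets the same address.
        return self.state is MMState.PARKED and snooped_address == self.address

    def conflicting_request_done(self):
        self.state = MMState.ACTIVE
        self.issued = True  # stands in for issuing on the system fabric
```

This captures the ordering guarantee: the parked request is serviced immediately after the conflicting one, ahead of any later conflicting snoop.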
In at least one embodiment, a coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP, including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a high level block diagram of an exemplary data processing system in which a coherent device participates with a primary coherent system across a communication link through a proxy;
FIG. 2 is a more detailed block diagram of an exemplary embodiment of the data processing system of FIG. 1;
FIG. 3 is a more detailed block diagram of an exemplary embodiment of a processing unit in the data processing system of FIG. 2;
FIG. 4 is a time-space diagram of an exemplary operation on the system fabric of the data processing system of FIG. 2;
FIG. 5 is a more detailed block diagram of an exemplary embodiment of the coherent attached processor proxy (CAPP) in the processing unit of FIG. 3;
FIG. 6 is a high level logical flowchart of an exemplary process by which a CAPP coherently handles a memory access request received from an attached processor (AP) in accordance with one embodiment;
FIG. 7 is a high level logical flowchart of an exemplary process by which a CAPP coherently handles a snooped memory access request in accordance with one embodiment;
FIG. 8 is a first time-space diagram of an exemplary processing scenario in which an AP requests to coherently update a memory block within the primary coherent system to which it is attached;
FIG. 9 is a second time-space diagram of an exemplary processing scenario in which an AP requests to coherently update a memory block within the primary coherent system to which it is attached;
FIG. 10 is a third time-space diagram of an exemplary processing scenario in which an AP requests to coherently update a memory block within the primary coherent system to which it is attached; and
FIG. 11 is a data flow diagram of an exemplary design process.
DETAILED DESCRIPTION

With reference now to the figures and with particular reference to FIG. 1, there is illustrated a high level block diagram of an exemplary data processing system 100 in which a coherent device participates with a primary coherent system across a communication link through a proxy. As shown, data processing system 100 includes a primary coherent system 102 in which coherency of a distributed shared memory is maintained by implementation of a coherency protocol, such as the well-known MESI protocol or a variant thereof. The coherency protocol, which in various embodiments can be directory-based or snoop-based, is characterized by a bounded time frame in which a system-wide coherency response is determined for each memory access request.
As shown, the functionality of data processing system 100 can be expanded by coupling an attached processor (AP) 104 to primary coherent system 102 by a communication link 108. AP 104 may be implemented, for example, as a field programmable gate array (FPGA), application specific integrated circuit (ASIC), or other general or special-purpose processor or system. In various embodiments, AP 104 may, for example, serve as a co-processor that off-loads predetermined processing tasks from primary coherent system 102, provide low cost expansion of the general-purpose processing capabilities of data processing system 100, and/or provide an interface with a heterogeneous system external to primary coherent system 102. In support of these and other possible functions of AP 104, AP 104 preferably includes a cache 106 that holds local copies of memory blocks in the coherent memory address space of primary coherent system 102 to enable low latency access to those memory blocks by AP 104.
In many cases, the technology utilized to implement AP 104, cache 106, and/or communication link 108 has insufficient speed, bandwidth and/or reliability to guarantee that AP 104 can participate in the determination of the system-wide coherency responses for memory access requests within the bounded time frame required by the coherency protocol of primary coherent system 102. Accordingly, primary coherent system 102 further includes a coherent attached processor proxy (CAPP) 110 that participates on behalf of AP 104 in the determination of the system-wide coherency responses for AP 104 within a timeframe that satisfies the timing requirements of the coherency protocol of primary coherent system 102. Although not required, it is preferable if CAPP 110 is programmable and can therefore be programmed to support any of multiple different implementations of AP 104.
Referring now to FIG. 2, there is depicted a more detailed block diagram of a data processing system 200 that is one of the numerous possible embodiments of data processing system 100 of FIG. 1. Data processing system 200 may be implemented, for example, with one of the IBM Power servers, a product line of International Business Machines Corporation of Armonk, N.Y.
In the depicted embodiment, data processing system 200 is a distributed shared memory multiprocessor (MP) data processing system including a plurality of processing units 202a-202m. Each of processing units 202a-202m is supported by a respective one of shared system memories 204a-204m, the contents of which may generally be accessed by any of processing units 202a-202m. Processing units 202a-202m are further coupled for communication to a system fabric 206, which may include one or more bused, switched and/or wireless communication links. The communication on system fabric 206 includes memory access requests by processing units 202 requesting coherent access to various memory blocks within various shared system memories 204a-204m.
As further shown in FIG. 2, one or more of processing units 202a-202m are further coupled to one or more communication links 210 providing expanded connectivity. For example, processing units 202a and 202m are respectively coupled to communication links 210a-210k and 210p-210v, which may be implemented, for example, with Peripheral Component Interconnect Express (PCIe) local buses. As shown, communication links 210 can be utilized to support the direct or indirect coupling of input/output adapters (IOAs) such as IOAs 212a, 212p and 212v, which can be, for example, network adapters, storage device controllers, display adapters, peripheral adapters, etc. For example, IOA 212p, which is a network adapter coupled to an external data network 214, is coupled to communication link 210p optionally through an I/O fabric 216p, which may comprise one or more switches and/or bridges. In a similar manner, IOA 212v, which is a storage device controller that controls storage device 218, is coupled to communication link 210v optionally through an I/O fabric 216v. As discussed with reference to FIG. 1, communication links 210 can also be utilized to support the attachment of one or more APs 104, either directly to a processing unit 202, as is the case for AP 104k, which is coupled to processing unit 202a by communication link 210k, or indirectly to a processing unit 202 through an intermediate I/O fabric 216, as can be the case for AP 104w, which can be coupled to processing unit 202m through communication link 210v and optional I/O fabric 216v.
Data processing system 200 further includes a service processor 220 that manages the boot process of data processing system 200 and thereafter monitors and reports on the performance of and error conditions detected in data processing system 200. Service processor 220 is coupled to system fabric 206 and is supported by a local memory 222, which may include volatile (e.g., dynamic random access memory (DRAM)) and non-volatile memory (e.g., non-volatile random access memory (NVRAM) or static random access memory (SRAM)). Service processor 220 is further coupled to a mailbox interface 224 through which service processor 220 communicates I/O operations with communication link 210a.
Those of ordinary skill in the art will appreciate that the architecture and components of a data processing system can vary between embodiments. For example, other devices and interconnects may alternatively or additionally be used. Accordingly, the exemplary data processing system 200 given in FIG. 2 is not meant to imply architectural limitations with respect to the claimed invention.
With reference now to FIG. 3, there is illustrated a more detailed block diagram of an exemplary embodiment of a processing unit 202 in data processing system 200 of FIG. 2. In the depicted embodiment, each processing unit 202 is preferably realized as a single integrated circuit chip having a substrate in which semiconductor circuitry is fabricated as is known in the art.
Each processing unit 202 includes multiple processor cores 302a-302n for independently processing instructions and data. Each processor core 302 includes at least an instruction sequencing unit (ISU) 304 for fetching and ordering instructions for execution and one or more execution units 306 for executing instructions. The instructions executed by execution units 306 may include, for example, fixed and floating point arithmetic instructions, logical instructions, and instructions that request read and write access to a memory block in the coherent address space of data processing system 200.
The operation of each processor core 302a-302n is supported by a multi-level volatile memory hierarchy having at its lowest level one or more shared system memories 204 (only one of which is shown in FIG. 3) and, at its upper levels, one or more levels of cache memory. As depicted, processing unit 202 includes an integrated memory controller (IMC) 324 that controls read and write access to an associated system memory 204 in response to requests received from processor cores 302a-302n and operations received on system fabric 206.
In the illustrative embodiment, the cache memory hierarchy of processing unit 202 includes a store-through level one (L1) cache 308 within each processor core 302a-302n and a store-in level two (L2) cache 310. As shown, L2 cache 310 includes an L2 array and directory 314, masters 312 and snoopers 316. Masters 312 initiate transactions on system fabric 206 and access L2 array and directory 314 in response to memory access (and other) requests received from the associated processor cores 302. Snoopers 316 detect operations on system fabric 206, provide appropriate responses, and perform any accesses to L2 array and directory 314 required by the operations. Although the illustrated cache hierarchy includes only two levels of cache, those skilled in the art will appreciate that alternative embodiments may include additional levels (L3, L4, etc.) of private or shared, on-chip or off-chip, in-line or lookaside cache, which may be fully inclusive, partially inclusive, or non-inclusive of the contents of the upper levels of cache.
As further shown in FIG. 3, processing unit 202 includes integrated interconnect logic 320 by which processing unit 202 is coupled to system fabric 206, as well as an instance of response logic 322, which, in embodiments employing snoop-based coherency, implements a portion of a distributed coherency messaging mechanism that maintains coherency of the cache hierarchies of processing unit 202. Processing unit 202 further includes one or more integrated I/O (input/output) controllers 330 (e.g., PCI host bridges (PHBs)) supporting I/O communication via one or more communication links 210. Processing unit 202 additionally includes a CAPP 110 as previously described. As shown, CAPP 110 may optionally include a dedicated I/O controller 332 (e.g., a PHB) by which CAPP 110 supports communication over an external communication link 210k to which an AP 104k is also coupled. In alternative embodiments, dedicated I/O controller 332 can be omitted, and CAPP 110 can communicate with AP 104 via a shared I/O controller 330.
Those skilled in the art will appreciate that data processing system 200 can include many additional or alternative components. Because such additional components are not necessary for an understanding of the present invention, they are not illustrated in FIG. 3 or discussed further herein.
Referring now to FIG. 4, there is depicted a time-space diagram of an exemplary operation on the system fabric 206 of data processing system 200 of FIG. 2 in accordance with one embodiment of a snoop-based coherence protocol. The operation begins when a master 400 (e.g., a master 312 of an L2 cache 310, a master within an I/O controller 330, or a master in CAPP 110) issues a request 402 on system fabric 206. Request 402 preferably includes at least a transaction type indicating a type of desired access and a resource identifier (e.g., real address) indicating a resource to be accessed by the request. Common types of requests preferably include those set forth below in Table I.
TABLE I

READ: Requests a copy of the image of a memory block for query purposes.

RWITM (Read-With-Intent-To-Modify): Requests a unique copy of the image of a memory block with the intent to update (modify) it and requires destruction of other copies, if any.

BKILL (Background Kill): Requests invalidation of all cached copies of a target memory block and cancellation of all reservations for the target memory block.

DCLAIM (Data Claim): Requests authority to promote an existing query-only copy of a memory block to a unique copy with the intent to update (modify) it and requires destruction of other copies, if any.

DCBZ (Data Cache Block Zero): Requests authority to create a new unique copy of a memory block without regard to its present state and subsequently modify its contents; requires destruction of other copies, if any.

CASTOUT: Copies the image of a memory block from a higher level of memory to a lower level of memory in preparation for the destruction of the higher level copy.

WRITE: Requests authority to create a new unique copy of a memory block without regard to its present state and immediately copy the image of the memory block from a higher level memory to a lower level memory in preparation for the destruction of the higher level copy.
Further details regarding these operations and an exemplary cache coherency protocol that facilitates efficient handling of these operations may be found in U.S. Pat. No. 7,389,388, which is incorporated by reference.
Request 402 is received by snoopers 404 distributed throughout data processing system 200, including, for example, snoopers 316 of L2 caches 310, snoopers 326 of IMCs 324, and snoopers within CAPPs 110 (see, e.g., snoop machines (SNMs) 520 of FIG. 5). In general, with some exceptions, snoopers 316 in the same L2 cache 310 as the master 312 of request 402 do not snoop request 402 (i.e., there is generally no self-snooping) because a request 402 is transmitted on system fabric 206 only if the request 402 cannot be serviced internally by a processing unit 202. Snoopers 404 that receive and process requests 402 each provide a respective partial response (Presp) 406 representing the response of at least that snooper 404 to request 402. A snooper 326 within an IMC 324 determines the partial response 406 to provide based, for example, upon whether the snooper 326 is responsible for the request address and whether it has resources available to service the request. A snooper 316 of an L2 cache 310 may determine its partial response 406 based on, for example, the availability of its L2 array and directory 314, the availability of a snoop machine instance within snooper 316 to handle the request, and the coherence state associated with the request address in L2 array and directory 314.
The partial responses 406 of snoopers 404 are logically combined either in stages or all at once by one or more instances of response logic 322 to determine a systemwide coherence response to request 402, referred to herein as a combined response (Cresp) 410. In one preferred embodiment, which will be assumed hereinafter, the instance of response logic 322 responsible for generating combined response 410 is located in the processing unit 202 containing the master 400 that issued request 402. Response logic 322 provides combined response 410 to master 400 and snoopers 404 via system fabric 206 to indicate the response (e.g., success, failure, retry, etc.) to request 402. If combined response 410 indicates success of request 402, combined response 410 may indicate, for example, a data source for a requested memory block, a cache state in which the requested memory block is to be cached by master 400, and whether "cleanup" operations invalidating the requested memory block in one or more caches are required.
In response to receipt of combined response 410, one or more of master 400 and snoopers 404 typically perform one or more actions in order to service request 402. These actions may include supplying data to master 400, invalidating or otherwise updating the coherence state of data cached in one or more caches, performing castout operations, writing back data to a system memory 204, etc. If required by request 402, a requested or target memory block may be transmitted to or from master 400 before or after the generation of combined response 410 by response logic 322.
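The request/partial-response/combined-response flow can be illustrated with a toy combining function. The response encodings ("retry", "ack", "ack_data") are invented for this example; the essential property shown is that any retry partial response forces a retry combined response, while data-sourcing responses are reflected in the outcome.

```python
def combine(partial_responses):
    """Merge snoopers' partial responses into one system-wide
    combined response, as response logic would (illustrative only)."""
    if any(p == "retry" for p in partial_responses):
        return "retry"             # some snooper could not service it now
    if "ack_data" in partial_responses:
        return "success_with_data"  # a snooper will source the block
    return "success"
```

Because the combination is associative, it can be computed "in stages or all at once" exactly as the paragraph above describes, with intermediate instances of response logic merging subsets of partial responses.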
In the following description, the partial response 406 of a snooper 404 to a request 402 and the actions performed by the snooper 404 in response to the request 402 and/or its combined response 410 will be described with reference to whether that snooper is a Highest Point of Coherency (HPC), a Lowest Point of Coherency (LPC), or neither with respect to the request address specified by the request. An LPC is defined herein as a memory device or I/O device that serves as the repository for a memory block. In the absence of an HPC for the memory block, the LPC holds the true image of the memory block and has authority to grant or deny requests to generate an additional cached copy of the memory block. For a typical request in the data processing system embodiment of FIG. 2, the LPC will be the memory controller 324 for the system memory 204 holding the referenced memory block. An HPC is defined herein as a uniquely identified device that caches a true image of the memory block (which may or may not be consistent with the corresponding memory block at the LPC) and has the authority to grant or deny a request to modify the memory block. Descriptively, the HPC may also provide a copy of the memory block to a requestor in response to an operation that does not modify the memory block. Thus, for a typical request in the data processing system embodiment of FIG. 2, the HPC, if any, will be an L2 cache 310 or CAPP 110. Although other indicators may be utilized to designate an HPC for a memory block, a preferred embodiment of the present invention designates the HPC, if any, for a memory block utilizing selected cache coherency state(s), which may be held, for example, in a cache directory.
Still referring to FIG. 4, the HPC, if any, for a memory block referenced in a request 402, or in the absence of an HPC, the LPC of the memory block, preferably has the responsibility of protecting the transfer of ownership of a memory block, if necessary, in response to a request 402. In the exemplary scenario shown in FIG. 4, a snooper 404n at the HPC (or in the absence of an HPC, the LPC) for the memory block specified by the request address of request 402 protects the transfer of ownership of the requested memory block to master 400 during a protection window 412a that extends from the time that snooper 404n determines its partial response 406 until snooper 404n receives combined response 410 and during a subsequent window extension 412b extending (preferably, for a programmable time) beyond receipt by snooper 404n of combined response 410. During protection window 412a and window extension 412b, snooper 404n protects the transfer of ownership by providing partial responses 406 to other requests specifying the same request address that prevent other masters from obtaining ownership (e.g., a retry partial response) until ownership has been successfully transferred to master 400. If necessary, master 400 may also likewise initiate a protection window 413 to protect its ownership of the memory block requested in request 402 following receipt of combined response 410.
As will be appreciated by those skilled in the art, the snoop-based coherence protocol illustrated in FIG. 4 may be implemented utilizing multiple diverse sets of coherence states. In a preferred embodiment, the cache coherence states employed within the protocol, in addition to providing (1) an indication of whether a cache is the HPC for a memory block, also indicate at least (2) whether the cached copy is unique (i.e., is the only cached copy system-wide), (3) whether and when the cache can provide a copy of the memory block to a master of a memory access request for the memory block, and (4) whether the cached image of the memory block is consistent with the corresponding memory block at the LPC (system memory). These attributes can be expressed, for example, in a variant of the well-known MESI (Modified, Exclusive, Shared, Invalid) protocol including at least the coherence states summarized below in Table II.
TABLE II

Coherence state    HPC?   Unique?   Data Source?         Consistent with LPC?
M (Modified)       Yes    Yes       Yes (before Cresp)   No
T (Shared-Owner)   Yes    Unknown   Yes (after Cresp)    No
S (Shared)         No     Unknown   No                   Unknown
I (Invalid)        No     No        No                   N/A (data is invalid)
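Restating Table II as a lookup structure makes the per-state attributes easy to query; this is purely an illustrative encoding of the table, not part of the protocol definition.

```python
# (HPC?, unique?, data source phase or None, consistent with LPC?)
# "unknown"/None mirror the Unknown and N/A entries of Table II.
ATTRS = {
    "M": (True,  True,      "before Cresp", False),
    "T": (True,  "unknown", "after Cresp",  False),
    "S": (False, "unknown", None,           "unknown"),
    "I": (False, False,     None,           None),
}

def is_hpc(state: str) -> bool:
    """Attribute (1): is a cache in this state the HPC for the block?"""
    return ATTRS[state][0]

def can_source_data(state: str) -> bool:
    """Attribute (3): can this state supply the block to a requester?"""
    return ATTRS[state][2] is not None
```

Note how the T (Shared-Owner) state separates HPC status from uniqueness: it retains modification ownership while other shared copies may exist, which is why it sources data only after the combined response.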
In addition to the coherence states listed in Table II, the coherence protocol may include one or more additional transitional coherence states that can be employed, among other things, to implement protection window 412a, window extension 412b, and protection window 413. For example, the coherence protocol may include an HPC Protect state that master 400 may assume in response to combined response 410 to protect transfer of HPC status (i.e., coherence ownership) to that master 400 during protection window 413. Similarly, the coherence protocol may additionally include a Shared Protect state that a master 400 or a snooper 404n may assume in response to issuing or snooping a DClaim request, respectively, in order to implement protection window 413 or protection window 412a and window extension 412b. Further, the coherence protocol may include a Shared Protect Noted state that may be assumed to facilitate assumption of HPC status by another master 400, as described further herein.
Referring now to FIG. 5, there is depicted a more detailed block diagram of an exemplary embodiment of the coherent attached processor proxy (CAPP) 110 in processing unit 202 of FIG. 3. As shown, CAPP 110 is coupled to interconnect logic 320 to permit CAPP 110 to transmit and receive address, control and coherency communication via system fabric 206 on behalf of (i.e., as a proxy for) an AP 104 (e.g., AP 104k) to which it is coupled by a communication link (e.g., communication link 210k).
CAPP 110 includes snooper logic 500, master logic 502, transport logic 504, and, as discussed above, an optional I/O controller 332. Transport logic 504 has two interfaces: a first by which transport logic 504 manages communication over communication link 210k as necessary to comport with the messaging protocol employed by communication link 210k and/or AP 104, and a second by which transport logic 504 manages data communication with system fabric 206. Thus, transport logic 504 may packetize data, may apply message encapsulation/decapsulation or encryption/decryption, may compute, append and/or verify checksums, etc., as is known in the art.
Snooper logic 500 includes a decoder 510, a directory 512 of the contents of the data array 552 of the cache 106 of the associated AP 104, a snoop table 514, a dispatcher 516, and a set of snoop machines (SNMs) 520. Decoder 510 of snooper logic 500 receives memory access requests from system fabric 206 via interconnect logic 320 and optionally but preferably decodes the snooped memory access requests into a corresponding set of internal snoop requests. The set of internal snoop requests implemented by decoder 510 is preferably programmable (and in some embodiments dynamically reprogrammable) to decouple the design of CAPP 110 from that of AP 104 and to allow flexibility in mapping the memory access requests of the primary coherent system 102 to the request set of the associated AP 104. Following decoding by decoder 510, the target address specified by the memory access request is utilized to access directory 512 in order to look up the coherence state of the target address with respect to AP 104. It should be noted that the coherence state indicated by directory 512 may not match or correspond to that indicated by directory 550 of cache 106 in AP 104. Nevertheless, the use of the coherence state information in directory 512 in CAPP 110 rather than directory 550 enables the bounded time frame in which a system-wide coherency response is to be determined for each memory access request in primary coherent system 102 to be met, regardless of whether communication link 210 and/or AP 104 have lower speed or reliability than other components of the data processing system (e.g., CAPP 110).
The coherence state specified by directory 512 and the internal request determined by decoder 510 are then utilized by snoop table 514 to determine an appropriate partial response (Presp) to the snooped memory access request. In response to at least the internal snoop request determined by decoder 510, the coherence state output by directory 512, and the Presp output by snoop table 514, dispatcher 516 determines whether or not any further action is or may possibly be required in response to the memory access request (e.g., update of directory 512, sourcing the target cache line to the requester, etc.), and if so, dispatches a snoop machine 520 to manage performance of that action.
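The decode, directory lookup, snoop-table lookup, and dispatch sequence described above might be modeled as follows. The table contents, internal request names, and action names here are hypothetical, shown only to illustrate how programmable tables determine both the Presp and the dispatched action.

```python
# Programmable decoder: fabric transaction type -> internal snoop request.
DECODER = {"RWITM": "internal_rwitm", "READ": "internal_read"}

# Programmable snoop table:
# (internal request, coherence state) -> (Presp, follow-up action or None).
SNOOP_TABLE = {
    ("internal_rwitm", "M"): ("ack_data", "source_and_invalidate"),
    ("internal_rwitm", "S"): ("ack",      "invalidate"),
    ("internal_rwitm", "I"): ("null",     None),
    ("internal_read",  "M"): ("ack_data", "source_line"),
    ("internal_read",  "I"): ("null",     None),
}

def dispatch_snoop_machine(action, address):
    pass  # placeholder for allocating an SNM to perform the action

def snoop(directory, ttype, address):
    """Decode the snooped request, look up the CAPP directory, derive
    the Presp from the snoop table, and dispatch an SNM if needed."""
    internal = DECODER[ttype]
    state = directory.get(address, "I")
    presp, action = SNOOP_TABLE[(internal, state)]
    if action is not None:
        dispatch_snoop_machine(action, address)
    return presp
```

Reprogramming DECODER or SNOOP_TABLE changes both the responses and the dispatched behavior without altering the pipeline itself, which is the decoupling property the paragraph describes.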
Master logic 502 optionally but preferably includes a master table 530 that maps memory access and other requests originated by AP 104k and received by CAPP 110 to internal master requests. As with the mapping performed by decoder 510 of snooper logic 500, the mapping performed by master table 530 decouples the design of CAPP 110 and AP 104 and enables CAPP 110 to programmably support a wide variety of diverse APs 104. In at least some embodiments, master table 530 supports dynamic reprogramming. Master logic 502 further includes a set of master machines (MMs) 532 that services internal master requests output by master table 530. In a typical case, a master machine 532 allocated to service an internal master request determines and manages an action to be performed to service the internal request (e.g., initiating a directory update and/or memory access request on system fabric 206) based at least in part on the coherence state indicated for the target address of the master request by directory 512. Data transfers to and from AP 104 via CAPP 110 in response to the operation of snooper logic 500 and master logic 502 are tracked via operation tags allocated from tag pool 540.
As further indicated in FIG. 5, master logic 502 includes a combined response (Cresp) table 534. In response to receipt of a combined response representing the systemwide coherence response to a request, Cresp table 534 translates the combined response received from system fabric 206 into an internal Cresp message and distributes the internal Cresp message to master machines 532 and snoop machines 520. Again, the translation of combined responses to internal Cresp messages by Cresp table 534 decouples the design of AP 104 from that of primary coherent system 102 and enables the interface provided by CAPP 110 to be programmable and thus support a variety of diverse APs 104.
As noted above, several data structures (e.g., decoder 510, snoop table 514, master table 530 and Cresp table 534) within CAPP 110 are preferably programmable, and in some embodiments, dynamically programmable. In one implementation, a control processor (e.g., service processor 220 or any of processing units 202 running supervisory code (e.g., hypervisor)) dynamically updates the data structures by first instructing AP 104 to invalidate its directory 550 and quiesce. The control processor then updates one or more of the data structures within CAPP 110. In response to completion of the updates, the control processor instructs AP 104 to resume normal processing. It should also be noted that the configurations of master table 530 and snoop table 514 affect not only the mapping (translation) of incoming AP requests and snooped requests, respectively, but also the behavior of MMs 532 and SNMs 520. That is, the behavior of MMs 532 in response to AP requests and the messages transmitted on system fabric 206 and to AP 104 are also preferably determined by the configuration of master table 530. Similarly, the behavior of SNMs 520 in response to snooped requests and the messages transmitted on system fabric 206 and to AP 104 are preferably determined by the configuration of snoop table 514. Thus, the behaviors and messages of MMs 532 and SNMs 520 can be selectively changed by appropriate reprogramming of master table 530 and snoop table 514.
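The quiesce/update/resume sequence described above can be sketched as follows. This is a minimal illustrative sketch: the class and function names (Ap, Capp, reprogram) and the dictionary-backed stand-ins for AP directory 550 and master table 530 are assumptions, not structures defined in this description.

```python
# Minimal sketch of the control-processor reprogramming sequence, assuming
# simple dictionary stand-ins for AP directory 550 and master table 530.

class Ap:
    """Illustrative attached processor holding a directory 550 stand-in."""
    def __init__(self):
        self.directory = {0x1000: "Shared"}
        self.quiesced = False

    def invalidate_and_quiesce(self):
        self.directory.clear()   # invalidate directory 550
        self.quiesced = True     # stop issuing new requests

    def resume(self):
        self.quiesced = False

class Capp:
    """Illustrative CAPP holding one programmable structure (master table 530)."""
    def __init__(self):
        self.master_table = {"update": "RWITM"}

def reprogram(ap, capp, new_master_table):
    """Control-processor sequence: quiesce the AP, update the CAPP data
    structures, then instruct the AP to resume normal processing."""
    ap.invalidate_and_quiesce()
    capp.master_table = dict(new_master_table)
    ap.resume()
```

The same three-step shape would apply to updating decoder 510, snoop table 514, or Cresp table 534; only the structure being replaced changes.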
Referring now to FIG. 6, there is depicted a high level logical flowchart of an exemplary process by which a CAPP 110 coherently handles a memory access request received from an AP 104 in accordance with one embodiment. As with the other logical flowcharts presented herein, it should be appreciated that steps are presented in a logical rather than strictly chronological order, and at least some of the illustrated steps may be performed concurrently or in a different order than that illustrated.
The process shown in FIG. 6 begins at block 600 and then proceeds to block 602, which illustrates an AP 104 generating a target address within the coherent address space of primary coherent system 102. The target address identifies a coherent storage location to which some type of access is desired, for example, an access to obtain a query-only copy of a cache line, to update or invalidate the contents of a storage location identified by the target address, to write back a cache line to system memory 204, to invalidate a page table entry utilized to perform address translation, etc. AP 104 additionally performs a lookup of the coherence state of the target address in AP directory 550 (block 604). AP 104 then transmits to CAPP 110 a memory access request specifying the desired access, together with the coherence state read from AP directory 550 and any associated data (block 606).
The coherence state transmitted with the AP memory access request is referred to herein as the “expected state,” in that in many cases the type of memory access request selected by AP 104 is predicated on the coherence state indicated by AP directory 550. In a preferred embodiment, AP 104 transmits the memory access request to CAPP 110 even in cases in which the expected state is or corresponds to an HPC state that, if held in an L2 cache 310, would permit the associated processor core 302 to unilaterally access the storage location identified by the target address prior to receipt of a combined response. This is the case because the coherence state determination made by AP 104 is only preliminary, with the final coherence state determination being made by CAPP 110 as described below.
In response to receipt of the AP memory access request, master table 530 of master logic 502 optionally translates the AP memory access request into an internal master request (e.g., one of the set of requests within the communication protocol specified for system fabric 206) (block 610). In a typical embodiment, the translation includes mapping the transaction type (ttype) indicated by the AP memory access request to a ttype utilized on system fabric 206. In addition, CAPP 110 determines a coherence state for the target address specified by the memory access request with respect to AP 104 (block 616). In a preferred embodiment, the coherence state is determined from multiple sources of coherence information according to a predetermined prioritization of the sources, which include (in order of increasing priority): directory 512, MMs 532 and SNMs 520. Thus, if CAPP 110 determines at block 616 that one of SNMs 520 is processing a snooped memory access request that collides with the target address, the coherence state indicated by that SNM 520 is determinative. Similarly, if CAPP 110 determines at block 616 that no SNM 520 is actively processing a request that collides with the target address, but the target address of the AP memory access request collides with the target address of a master request being processed by one of MMs 532, the coherence state indicated by that MM 532 is determinative. If the request address does not collide with an active SNM 520 or MM 532, the coherence state indicated by CAPP directory 512 is determinative.
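The prioritized determination of block 616 can be sketched as a simple lookup. The dict-based machine model ("busy", "addr", "state" keys) and the state strings are illustrative assumptions; only the priority order (active colliding SNM, then active colliding MM, then CAPP directory 512) follows the text.

```python
# Hedged sketch of the composite coherence-state determination at block 616.
# Machines are modeled as dicts; the exact state encodings are assumptions.

def determine_coherence_state(target, snms, mms, capp_directory):
    """Return the CAPP coherence state for `target` using the stated
    priority order: a colliding active SNM 520 wins, then a colliding
    active MM 532, then the entry in CAPP directory 512 (Invalid if absent)."""
    for snm in snms:
        if snm["busy"] and snm["addr"] == target:
            return snm["state"]
    for mm in mms:
        if mm["busy"] and mm["addr"] == target:
            return mm["state"]
    return capp_directory.get(target, "Invalid")
```

This ordering reflects the intuition that an in-flight snooped operation represents the most current view of the line, an in-flight master request the next most current, and the directory the resting state.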
At block 620, master logic 502 determines whether or not the expected state communicated with the AP memory access request matches the coherence state determined by CAPP 110 at block 616. If so, master logic 502 allocates an MM 532 to service the AP memory access request in an Active state, in which the MM 532 begins its activities to service the AP memory access request (block 621). At block 622, the MM 532 allocated to service the AP memory access request determines whether or not servicing the AP memory access request includes initiating a memory access request on system fabric 206. If not, the process passes through page connector B to block 650, which is described further below.
If, however, the MM 532 determines at block 622 that servicing the AP memory access request includes initiating a memory access request on system fabric 206, the MM 532 initiates the required memory access request on system fabric 206 on behalf of AP 104 (block 624). Within a bounded time, master logic 502 receives the combined response (Cresp) for the request (block 626), which Cresp table 534 optionally translates to an internal Cresp message (block 628) and distributes to the MM 532 that initiated the memory access request. As indicated at block 630, if the combined response indicates Retry, meaning that at least one necessary participant could not service the request (e.g., was not available to service the request or was already processing another request having an address collision with the target address), the process returns to block 616, which has been described. If, on the other hand, the combined response indicates that the request succeeded, the MM 532 that initiated the request performs any data handling actions, cleanup actions, and/or directory update actions required to complete servicing the request (block 632). The data handling actions can include, for example, MM 532 receiving requested data and forwarding the data to AP 104 or transmitting data from AP 104 on system fabric 206. The cleanup actions can include, for example, MM 532 issuing one or more kill requests on system fabric 206 to invalidate one or more copies of a cache line identified by the target address cached elsewhere within data processing system 200. The directory update actions include making any coherence update required by the request to both CAPP directory 512 and AP directory 550. Thereafter, the process shown in FIG. 6 ends at block 634.
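The issue/Cresp/Retry loop of blocks 624-630 can be sketched as follows. Note one assumption: the description loops back to block 616 on Retry without stating a bound, so the `max_attempts` cap is added here only so the sketch always terminates, and the state re-determination of block 616 is folded into the `issue_request` callback.

```python
# Sketch of the issue/Cresp/Retry loop of blocks 624-630; max_attempts is
# an assumption, not part of the described process.

def service_on_fabric(issue_request, max_attempts=8):
    """Issue the master request on the system fabric until the combined
    response is something other than Retry, or attempts are exhausted."""
    for _ in range(max_attempts):
        cresp = issue_request()      # blocks 624-628: issue, receive Cresp
        if cresp != "Retry":         # block 630: non-Retry means proceed
            return cresp             # caller then performs block 632 actions
    return "Failure"
```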
Returning to block 620, in response to a determination that the expected coherence state specified with the AP memory access request does not match the coherence state determined by CAPP 110, the process proceeds to blocks 640-644. In one embodiment in which optional blocks 640-642 are omitted, the MM 532 allocated to service the request transmits a Failure message to AP 104 (block 644). In addition to the Failure message, MM 532 optionally further indicates, with the Failure message or in a separate directory update message, the coherence state for the target address determined by CAPP 110, thus enabling AP 104 to update its AP directory 550 and to subsequently initiate an appropriate AP memory access request together with the appropriate expected state. Thereafter, the process shown in FIG. 6 ends at block 634. In this embodiment, AP 104 may require numerous requests to access the target memory block if the target memory block is highly contended by snoopers in primary coherent system 102. Accordingly, in an alternative embodiment including blocks 640-642, master logic 502 is able to increase its priority for the target memory block with respect to snoopers in primary coherent system 102 by entering a Parked state. In particular, master logic 502 determines at block 640 whether or not the coherence state mismatch detected at block 620 is due to one of SNMs 520 being active servicing a snooped memory access request that has an address collision with the target address. If not, the process proceeds to block 644, which has been described.
If, however, master logic 502 determines at block 640 that the coherence state mismatch detected at block 620 is due to one of SNMs 520 being active servicing a snooped memory access request that has an address collision with the target address, the process passes to block 642. Block 642 depicts master logic 502 allocating an MM 532 in the Parked state. In the Parked state, the MM 532 does not actively begin to service the AP memory access request and does not inhibit the SNM 520 that is active on the target address from completing its processing of the snooped memory access request, but does (in one embodiment) inhibit any other of the SNMs 520 and MMs 532 in the same CAPP 110 from transitioning to an active state to service a request specifying an address that collides with the target address of the AP memory access request. The allocated MM 532 remains in the Parked state until the SNM 520 that is active servicing the conflicting snooped memory access request transitions to an Idle state and, in response to this transition, itself transitions from the Parked state to an Active state. The process then passes to block 616 and following blocks, which have been described. Returning to block 616 ensures that the SNM 520 that was active on the target address did not change the CAPP coherence state from the expected state.
In at least some embodiments, the allocation of an MM 532 in the Parked state does not absolutely inhibit any other of the SNMs 520 and MMs 532 in the same CAPP 110 from transitioning to an active state. Instead, the effects of an MM 532 in the Parked state (and/or an active state) on the dispatch of other SNMs 520 and MMs 532 to service selected types of conflicting requests can be varied, for example, via program control (i.e., via execution of an appropriate CAPP control instruction by one of processor cores 302 or AP 104) of the composite coherence state determination described above with reference to block 616. For example, to eliminate unnecessary traffic on system fabric 206, dispatcher 516 can be permitted by programmable control to dispatch an SNM 520 in an active state to service a snooped BKill request that invalidates the target memory block of a conflicting request being handled by an MM 532 in the Parked state or an active state. In cases in which another machine is dispatched to service a conflicting request while an MM 532 is in the Parked state, the MM 532 in the Parked state re-enters the Parked state when the process of FIG. 6 proceeds along the path from block 642 to blocks 616, 620 and 640 and returns to block 642. Master logic 502 further preferably implements a counter to bound the number of times an MM 532 is forced to re-enter the Parked state in this manner for a single AP request. When a threshold value of the counter is reached, the dispatch of other SNMs 520 and MMs 532 to service conflicting requests is then inhibited to permit the MM 532 to exit the Parked state and manage servicing of its AP request.
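The counter-bounded Parked-state behavior described above can be sketched as a small class. The class shape and the threshold value are assumptions; only the counter/inhibit behavior (conflicting dispatches allowed until the re-entry counter reaches a threshold) is taken from the description.

```python
# Sketch of the Parked-state re-entry bounding; threshold=4 is illustrative.

class ParkedMM:
    """Tracks how often an MM 532 re-enters the Parked state for one AP
    request; past a threshold, conflicting dispatches are inhibited so the
    MM can exit the Parked state and service its request."""
    def __init__(self, threshold=4):
        self.reentries = 0
        self.threshold = threshold

    def reenter_parked(self):
        self.reentries += 1

    def conflicting_dispatch_allowed(self):
        # Other SNMs/MMs may be dispatched on conflicting requests only
        # until the re-entry counter reaches the threshold.
        return self.reentries < self.threshold
```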
Referring now to block 650, in response to determining that servicing the AP memory access request does not require issuing a memory access request on system fabric 206, the MM 532 updates CAPP directory 512 as indicated by the AP memory access request. MM 532 then transmits a Success message to AP 104 to confirm the update to CAPP directory 512 (block 652). The process thereafter terminates at block 634.
With reference now to FIG. 7, there is illustrated a high level logical flowchart of an exemplary process by which a CAPP 110 coherently handles a snooped memory access request in accordance with one embodiment. The illustrated process begins at block 700 and then proceeds to block 702, which depicts snooper logic 500 of CAPP 110 receiving a memory access request on system fabric 206 via interconnect logic 320. At block 704, decoder 510 decodes the snooped memory access request to determine the type of the request. In addition, at block 706, CAPP 110 determines a coherence state for the address referenced by the snooped memory access request, for example, utilizing the methodology previously described with reference to block 616.
Based on the decoded type of the snooped memory access request as determined at block 704 and the coherence state for the referenced address as determined at block 706, snoop table 514 determines and transmits on system fabric 206 a partial response representing the coherence response of AP 104 to the snooped memory access request (block 710).
Referring now to block 712, dispatcher 516 of snooper logic 500 determines, based on the partial response determined at block 710 and the decoded memory access request, whether or not further action by CAPP 110 may be required to service the snooped memory access request. In general, if the coherence state determined at block 706 is Invalid, meaning that AP cache 106 does not hold a valid copy of the memory block identified by the referenced memory address, no further action on the part of CAPP 110 or AP 104 is required to service the snooped memory access request. If the coherence state determined at block 706 is other than Invalid, at least some additional action may be required on the part of CAPP 110 and/or AP 104 to service the snooped memory access request.
In response to a negative determination at block 712, the process depicted in FIG. 7 ends at block 730. If, however, dispatcher 516 determines at block 712 that further action by CAPP 110 and/or AP 104 may be required to service the snooped memory access request, dispatcher 516 dispatches one of SNMs 520 to manage any action required to service the snooped memory access request (block 714). At block 716, the dispatched SNM 520 determines whether the action required to service the snooped memory access request can be determined without the combined response representing the systemwide coherence response to the memory access request, or whether the combined response is required to determine the appropriate action. In response to a determination at block 716 that the combined response is not required, the dispatched SNM 520 manages performance of any data handling and/or directory update actions required by the decoded memory access request and coherence state to service the memory access request (block 718). Thereafter, the process illustrated in FIG. 7 ends at block 730.
In response to a determination at block 716 that the combined response is required to determine the action to be performed to service the snooped memory access request, the dispatched SNM 520 waits for the combined response, as shown at block 720. In response to receiving the combined response, Cresp table 534 optionally translates the combined response into an internal Cresp message employed by CAPP 110 (block 722). The dispatched SNM 520 then manages performance of any data handling and/or directory update actions required by the combined response to service the memory access request (block 724). Thereafter, the process illustrated in FIG. 7 ends at block 730.
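The snoop-handling flow of FIG. 7 can be condensed into a short control-flow sketch. The partial-response strings and callback shapes are illustrative assumptions; only the control flow (an Invalid state short-circuits, and the combined response is awaited only when needed) follows the description.

```python
# Hedged sketch of the FIG. 7 snoop flow; "Null"/"Shared" presp values are
# placeholders, not the actual snoop table 514 mappings.

def handle_snooped_request(coherence_state, needs_cresp, get_cresp):
    """Return (partial_response, action) for a snooped request, given the
    coherence state determined at block 706."""
    if coherence_state == "Invalid":
        # Block 712: AP cache 106 holds no valid copy; nothing more to do.
        return "Null", None
    if not needs_cresp:
        # Block 718: the SNM acts without waiting for the combined response.
        return "Shared", "handled"
    # Blocks 720-724: wait for the combined response, then act on it.
    return "Shared", "handled-after-" + get_cresp()
```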
Referring now to FIG. 8, there is depicted a first time-space diagram of an exemplary processing scenario in which an AP 104 requests to coherently update a memory block within the primary coherent system 102 to which it is attached. For purposes of illustration, the exemplary processing scenario given in FIG. 8 and other similar figures will be described with reference to the illustrative hardware embodiments given in FIGS. 2-3 and 5.
As the exemplary processing scenario begins, an AP 104 processes a command (e.g., a software or firmware instruction executed within AP 104) specifying an update to a memory block identified by a target address within the coherent address space of primary coherent system 102. In response to the command, AP 104 allocates one of its idle finite state machines (FSMs) to manage performance of the command and performs a lookup of the target address in AP directory 550, as indicated by arrow 800. The AP FSM transitions from an Idle state (indicated by “X”) to an Update Active state and, based on a determination that the target address has an Invalid coherence state with respect to AP directory 550, transmits to CAPP 110 an update request with an expected state of Invalid, as shown at reference numeral 802.
In response to receipt from AP 104 of the update request, CAPP 110 translates the AP update request into a RWITM request, which, as indicated in Table I, is one of the set of requests within the communication protocol specified for system fabric 206. In addition, CAPP 110 determines a coherence state for the target address specified by the memory access request. Because in this case the target address of the RWITM request does not collide with an address that an MM 532 or SNM 520 is currently processing, the coherence state of the target address for CAPP 110 is determined by CAPP directory 512, which returns Invalid.
The previously idle MM 532 allocated to service the RWITM request, in response to determining a coherence state match between the expected state and the coherence state determined by CAPP 110, transitions to a Valid state and initiates the RWITM request on system fabric 206, as shown at reference numeral 806. The RWITM request requests a copy of the target memory block and further requests invalidation of all other cached copies of the memory block (to permit AP 104 to modify the memory block). Within a bounded time, MM 532 receives a combined response indicating success of the RWITM request, as indicated at reference numeral 808. MM 532 also receives a copy of the requested memory block, possibly prior to, concurrently with, or after the combined response.
In response to receiving the combined response indicating success of the RWITM request, MM 532 transitions to the HPC Protect state, thus initiating a protection window 413 for the target address. In addition, as indicated by arrow 810, MM 532 updates the coherence state for the target address in CAPP directory 512 to Modified. In addition, as indicated by arrow 812, MM 532 transmits the copy of the requested memory block and a Complete message to AP 104. Thereafter, MM 532 returns to the Idle state. In response to receipt of the requested memory block and the Complete message, the AP FSM directs the requested update to the target memory block, storage of the updated target memory block in array 552, and update of the coherence state for the target address in AP directory 550 to Modified. The updates to AP cache 106 are performed asynchronously to the update to CAPP directory 512 and, due to the possibly unreliable connection provided by communication link 210, may require CAPP 110 to retransmit the Complete message one or more times. Thereafter, the AP FSM returns to the Idle state.
It can also be appreciated by reference to FIG. 8 that (depending on the presence or absence of other colliding requests) the processing of a read request of AP 104 could be handled similarly to the illustrated processing scenario, with the following exceptions: the AP FSM would assume the Read Active state rather than the Update Active state, MM 532 would assume the Shared Protect state following receipt of the combined response indicated by arrow 808 rather than the HPC Protect state, and CAPP directory 512 and AP directory 550 would be updated to the Shared state rather than the Modified state.
With reference now to FIG. 9, there is depicted a second time-space diagram of an exemplary processing scenario in which an AP 104 requests to coherently update a memory block within the primary coherent system 102 to which it is attached.
As the exemplary processing scenario begins, an AP 104 processes a command (e.g., a software or firmware instruction executed within AP 104) specifying an update to a memory block identified by a target address within the coherent address space of primary coherent system 102. In response to the command, AP 104 allocates one of its idle finite state machines (FSMs) to manage performance of the command and performs a lookup of the target address in AP directory 550, as indicated by arrow 900. The AP FSM transitions from an Idle state (indicated by “X”) to an Update Active state and, based on a determination that the target address has a Shared-Owner (T) coherence state with respect to AP directory 550, transmits to CAPP 110 an update request with an expected state of T, as shown at reference numeral 902.
In response to receipt from AP 104 of the update request, CAPP 110 translates the update request to a BKill request. As described above with reference to Table I, the BKill request requests invalidation of all other cached copies of the memory block to permit AP 104 to modify its existing HPC copy of the target memory block. CAPP 110 additionally determines a coherence state for the target address specified by the update request with respect to CAPP 110, as shown at reference numeral 904. Because in this case the target address of the update request collides with an address that an SNM 520 is currently processing, the state of that SNM 520 is determinative, meaning that CAPP 110 determines an HPC Protect state. Thus, the coherence state determined by CAPP 110 does not match the expected state. In embodiments in which the optional functionality described above with reference to blocks 640-642 of FIG. 6 is not implemented, CAPP 110 would respond to the update request by transmitting a Failure message to AP 104. However, in the illustrated case in which the optional functionality described above with reference to blocks 640-642 of FIG. 6 is implemented, CAPP 110 allocates an idle MM 532 to service the BKill request in the Parked state, as indicated by arrow 906. As noted above, the Parked state of the MM 532 inhibits any other SNM 520 from transitioning to an active state to service a snooped memory access request for the target address.
In response to the SNM 520 that is active working on the conflicting address transitioning to the Idle state without modifying the matching T coherence state in CAPP directory 512 (e.g., as would be the case if the snooped memory access request is a Read request), the MM 532 verifies that the coherence state determined for CAPP 110 (which is the T state recorded in CAPP directory 512 in the absence of an SNM 520 or MM 532 active on a conflicting address) matches the expected state, as discussed previously with reference to block 616 of FIG. 6. In response to verifying that the coherence state of CAPP directory 512 matches the expected state, the MM 532 allocated to service the BKill request transitions to the HPC Protect state (thus initiating a protection window 413 for the target address) and initiates the BKill request on system fabric 206, as shown at reference numeral 910. In other scenarios (not illustrated) in which the SNM 520 modifies the coherence state in CAPP directory 512 (e.g., as would be the case if the snooped memory access request is a RWITM request), MM 532 instead returns a Failure message to AP 104 and returns to the Idle state.
Returning to the scenario shown in FIG. 9, in response to the BKill request, MM 532 receives a combined response indicating success of the BKill request, as indicated at reference numeral 912. In response to receiving the combined response indicating success of the BKill request, MM 532 updates the coherence state for the target address in CAPP directory 512 to Modified. In addition, as indicated by arrow 914, MM 532 transmits a Complete message to AP 104. Thereafter, MM 532 returns to the Idle state. In response to receipt of the Complete message, the AP FSM directs the update of the coherence state for the target address in AP directory 550 from T to Modified and the update of the corresponding cache line in AP array 552. Thereafter, the AP FSM returns to the Idle state.
Referring now to FIG. 10, there is depicted a third time-space diagram of an exemplary processing scenario in which an AP 104 requests to coherently update a memory block within the primary coherent system 102 to which it is attached.
As the exemplary processing scenario shown in FIG. 10 begins, an AP 104 processes a command (e.g., a software or firmware instruction executed within AP 104) specifying an update to a memory block identified by a target address within the coherent address space of primary coherent system 102. In response to the command, AP 104 allocates one of its idle finite state machines (FSMs) to manage performance of the command and performs a lookup of the target address in AP directory 550, as indicated by arrow 1000. The AP FSM transitions from an Idle state (indicated by “X”) to an Update Active state and, based on a determination that the target address has a Shared (S) coherence state with respect to AP directory 550, transmits to CAPP 110 an update request with an expected state of S, as shown at reference numeral 1002.
In response to receipt from AP 104 of the update request, CAPP 110 translates the update request to a DClaim request. As described above with reference to Table I, the DClaim request requests invalidation of all other cached copies of the target memory block to permit AP 104 to modify its existing Shared copy of the target memory block. CAPP 110 additionally determines a coherence state for the target address specified by the update request with respect to CAPP 110, as shown at reference numeral 1004. Because in this case the target address of the update request collides with an address of a snooped DClaim request that an SNM 520 is currently processing, the state of that SNM 520 is determinative, meaning that CAPP 110 determines the Shared Protect (SP) state. Thus, the coherence state determined by CAPP 110 does not match the expected state of Shared (see, e.g., block 620 of FIG. 6). Consequently, CAPP 110 allocates an idle MM 532 to the DClaim request in the Parked (P) state, as previously described with reference to block 642 of FIG. 6.
In response to the snooped DClaim request, the SNM 520 that is active working on the snooped DClaim request updates the coherence state of the target address in CAPP directory 512 to the Shared Protect Noted state, as indicated by arrow 1010, and additionally transmits a Kill message to AP 104 to cause the coherence state in AP directory 550 to be updated to the Invalid state, as indicated by arrow 1012. As shown in FIG. 10, the SNM 520 thereafter returns to the Idle state.
In response to the SNM 520 returning to the Idle state, the MM 532 allocated to the DClaim request transitions from the Parked state to an active state and again determines the coherence state of the target memory address with respect to CAPP 110, as described above with reference to block 616 of FIG. 6. Because the Parked state inhibits the dispatch of any other SNM 520 to service a conflicting address, the coherence state specified by CAPP directory 512 (i.e., Shared Protect Noted) is determinative of the coherence state of the target memory address with respect to CAPP 110. In response to detecting a mismatch of the coherence state in CAPP directory 512 (Shared Protect Noted) with the expected state (Shared), the MM 532 provides a Failure message to AP 104 to indicate failure of the DClaim request of AP 104, as indicated by arrow 1014.
Due to the potential unreliability of communication link 210, the invalidation in AP directory 550 initiated by SNM 520 is preferably confirmed by receipt by MM 532 of a full handshake from AP 104, as indicated by arrow 1018. If MM 532 does not receive a handshake from AP 104 confirming invalidation of the target memory address in AP directory 550 within a predetermined time period, MM 532 preferably retries the Kill message until the handshake is returned by AP 104 or a failure threshold is reached. In response to receipt of the handshake from AP 104, the MM 532 allocated to the DClaim request returns to the Idle state.
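The handshake retry just described can be sketched as a bounded retry loop. The retry bound and the callback shapes are assumptions standing in for the "predetermined time period" and "failure threshold" of the text.

```python
# Sketch of the Kill-message handshake retry over an unreliable link;
# max_tries stands in for the failure threshold described above.

def confirm_invalidation(send_kill, handshake_received, max_tries=8):
    """Retry the Kill message until AP 104 returns the handshake confirming
    invalidation in AP directory 550, or the failure threshold is reached."""
    for _ in range(max_tries):
        send_kill()
        if handshake_received():
            return True   # invalidation confirmed; MM 532 may return to Idle
    return False          # failure threshold reached
```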
As will be appreciated, in an alternative embodiment, CAPP 110 can instead accommodate the possible unreliability of communication link 210 by leaving the SNM 520 allocated to service the conflicting DClaim request in the Shared Protect state until the SNM 520 receives the handshake from AP 104. However, this alternative embodiment consumes more resources in that it requires both the SNM 520 and MM 532 to remain active for longer periods of time, thus reducing the availability of resources to service other memory access requests received from AP 104 or snooped on system fabric 206.
The AP FSM, in response to receiving Kill message 1012, transitions from the Update Active state to a Kill Active state, reflecting a need to invalidate the target memory block in CAPP directory 512. Accordingly, the AP FSM performs a lookup in AP directory 550 (as indicated by arrow 1020) and transmits a Kill request 1022 to CAPP 110 specifying the same target memory address as its earlier update request and indicating an expected coherence state of Shared Protect Noted (which the AP FSM received in Kill message 1012). In response to the Kill request, master logic 502 again determines the coherence state of the target memory address with respect to CAPP 110, as described above with respect to block 616 of FIG. 6 and as indicated in FIG. 10 by arrow 1024. In response to determining that the coherence state of the target memory address with respect to CAPP 110 (i.e., the Shared Protect Noted state indicated by CAPP directory 512) matches the expected state indicated by AP 104, master logic 502 allocates an MM 532 (which could be the same MM 532 or a different MM 532) in an Active (A) state to service the AP Kill request, as described above with reference to block 621 of FIG. 6. Because the Kill request does not require a memory access request to be issued on system fabric 206, the MM 532 updates CAPP directory 512 as indicated by the AP memory access request, as described above with reference to block 650 of FIG. 6, in this case by invalidating the target memory address in CAPP directory 512. This update to CAPP directory 512 is illustrated in FIG. 10 by arrow 1010. On completion of the update to CAPP directory 512, MM 532 also transmits a Success message to AP 104 to confirm the update, as indicated in FIG. 10 by arrow 1012 and as described above with respect to block 652 of FIG. 6.
After the scenario illustrated in FIG. 10, the processing scenario illustrated in FIG. 8 can be performed in order to allow AP 104 to update the target memory block of primary coherent system 102.
Referring now to FIG. 11, there is depicted a block diagram of an exemplary design flow 1100 used, for example, in semiconductor IC logic design, simulation, test, layout, and manufacture. Design flow 1100 includes processes, machines and/or mechanisms for processing design structures or devices to generate logically or otherwise functionally equivalent representations of the design structures and/or devices described above and shown in FIGS. 1-3 and 5. The design structures processed and/or generated by design flow 1100 may be encoded on machine-readable transmission or storage media to include data and/or instructions that, when executed or otherwise processed on a data processing system, generate a logically, structurally, mechanically, or otherwise functionally equivalent representation of hardware components, circuits, devices, or systems. Machines include, but are not limited to, any machine used in an IC design process, such as designing, manufacturing, or simulating a circuit, component, device, or system. For example, machines may include: lithography machines, machines and/or equipment for generating masks (e.g., e-beam writers), computers or equipment for simulating design structures, any apparatus used in the manufacturing or test process, or any machines for programming functionally equivalent representations of the design structures into any medium (e.g., a machine for programming a programmable gate array).
Design flow 1100 may vary depending on the type of representation being designed. For example, a design flow 1100 for building an application specific IC (ASIC) may differ from a design flow 1100 for designing a standard component or from a design flow 1100 for instantiating the design into a programmable array, for example a programmable gate array (PGA) or a field programmable gate array (FPGA) offered by Altera® Inc. or Xilinx® Inc.
FIG. 11 illustrates multiple such design structures including an input design structure 1120 that is preferably processed by a design process 1110. Design structure 1120 may be a logical simulation design structure generated and processed by design process 1110 to produce a logically equivalent functional representation of a hardware device. Design structure 1120 may also or alternatively comprise data and/or program instructions that when processed by design process 1110, generate a functional representation of the physical structure of a hardware device. Whether representing functional and/or structural design features, design structure 1120 may be generated using electronic computer-aided design (ECAD) such as implemented by a core developer/designer. When encoded on a machine-readable data transmission, gate array, or storage medium, design structure 1120 may be accessed and processed by one or more hardware and/or software modules within design process 1110 to simulate or otherwise functionally represent an electronic component, circuit, electronic or logic module, apparatus, device, or system such as those shown in FIGS. 1-3 and 5. As such, design structure 1120 may comprise files or other data structures including human and/or machine-readable source code, compiled structures, and computer-executable code structures that when processed by a design or simulation data processing system, functionally simulate or otherwise represent circuits or other levels of hardware logic design. Such data structures may include hardware-description language (HDL) design entities or other data structures conforming to and/or compatible with lower-level HDL design languages such as Verilog and VHDL, and/or higher level design languages such as C or C++.
Design process 1110 preferably employs and incorporates hardware and/or software modules for synthesizing, translating, or otherwise processing a design/simulation functional equivalent of the components, circuits, devices, or logic structures shown in FIGS. 1-3 and 5 to generate a netlist 1180 which may contain design structures such as design structure 1120. Netlist 1180 may comprise, for example, compiled or otherwise processed data structures representing a list of wires, discrete components, logic gates, control circuits, I/O devices, models, etc. that describes the connections to other elements and circuits in an integrated circuit design. Netlist 1180 may be synthesized using an iterative process in which netlist 1180 is resynthesized one or more times depending on design specifications and parameters for the device. As with other design structure types described herein, netlist 1180 may be recorded on a machine-readable storage medium or programmed into a programmable gate array. The medium may be a non-volatile storage medium such as a magnetic or optical disk drive, a programmable gate array, a compact flash, or other flash memory. Additionally, or in the alternative, the medium may be a system or cache memory, or buffer space.
Design process 1110 may include hardware and software modules for processing a variety of input data structure types including netlist 1180. Such data structure types may reside, for example, within library elements 1130 and include a set of commonly used elements, circuits, and devices, including models, layouts, and symbolic representations, for a given manufacturing technology (e.g., different technology nodes, 32 nm, 45 nm, 90 nm, etc.). The data structure types may further include design specifications 1140, characterization data 1150, verification data 1160, design rules 1170, and test data files 1185 which may include input test patterns, output test results, and other testing information. Design process 1110 may further include, for example, standard mechanical design processes such as stress analysis, thermal analysis, mechanical event simulation, process simulation for operations such as casting, molding, and die press forming, etc. One of ordinary skill in the art of mechanical design can appreciate the extent of possible mechanical design tools and applications used in design process 1110 without deviating from the scope and spirit of the invention. Design process 1110 may also include modules for performing standard circuit design processes such as timing analysis, verification, design rule checking, place and route operations, etc.
Design process 1110 employs and incorporates logic and physical design tools such as HDL compilers and simulation model build tools to process design structure 1120 together with some or all of the depicted supporting data structures along with any additional mechanical design or data (if applicable), to generate a second design structure 1190. Design structure 1190 resides on a storage medium or programmable gate array in a data format used for the exchange of data of mechanical devices and structures (e.g., information stored in an IGES, DXF, Parasolid XT, JT, DRG, or any other suitable format for storing or rendering such mechanical design structures). Similar to design structure 1120, design structure 1190 preferably comprises one or more files, data structures, or other computer-encoded data or instructions that reside on transmission or data storage media and that when processed by an ECAD system generate a logically or otherwise functionally equivalent form of one or more of the embodiments of the invention shown in FIGS. 1-3 and 5. In one embodiment, design structure 1190 may comprise a compiled, executable HDL simulation model that functionally simulates the devices shown in FIGS. 1-3 and 5.
Design structure 1190 may also employ a data format used for the exchange of layout data of integrated circuits and/or symbolic data format (e.g., information stored in a GDSII (GDS2), GL1, OASIS, map files, or any other suitable format for storing such design data structures). Design structure 1190 may comprise information such as, for example, symbolic data, map files, test data files, design content files, manufacturing data, layout parameters, wires, levels of metal, vias, shapes, data for routing through the manufacturing line, and any other data required by a manufacturer or other designer/developer to produce a device or structure as described above and shown in FIGS. 1-3 and 5. Design structure 1190 may then proceed to a stage 1195 where, for example, design structure 1190: proceeds to tape-out, is released to manufacturing, is released to a mask house, is sent to another design house, is sent back to the customer, etc.
As has been described, in at least one embodiment, a coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP.
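The match/mismatch behavior summarized above can be sketched as follows. This is a simplified model under stated assumptions; the function name, state strings, and the list used to stand in for the system fabric are hypothetical illustrations, not the disclosed implementation:

```python
# Simplified model of the CAPP expected-state check: a memory access
# request from the AP is forwarded onto the system fabric only when
# the AP's expected coherence state matches the state the CAPP
# determines; otherwise a failure message is returned and no fabric
# request is issued. All identifiers are illustrative.

def service_ap_request(capp_directory, addr, expected_state, fabric_log):
    determined = capp_directory.get(addr, "Invalid")
    if determined != expected_state:
        return "Failure"                    # no fabric request is issued
    fabric_log.append(("request", addr))    # issue corresponding fabric request
    return "Success"

# Usage: a stale expected state fails without generating fabric traffic.
fabric = []
directory = {0x40: "Shared"}
print(service_ap_request(directory, 0x40, "Shared", fabric))    # Success
print(service_ap_request(directory, 0x40, "Modified", fabric))  # Failure
print(len(fabric))                                              # 1
```

The design point modeled here is that the AP's cached view can lag the primary coherent system, so the CAPP validates the AP's expected state before consuming fabric bandwidth on its behalf.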
In at least one embodiment, in response to receiving, at a coherent attached processor proxy (CAPP), a memory access request and an expected coherence state from an attached processor (AP), the CAPP determines that a conflicting request is being serviced. In response to determining that the CAPP is servicing a conflicting request and that the expected state matches a coherence state determined by the CAPP, a master machine of the CAPP is allocated in a Parked state to service the memory access request after completion of service of the conflicting request. The Parked state prevents servicing by the CAPP of a further conflicting request snooped on the system fabric. In response to completion of service of the conflicting request, the master machine transitions out of the Parked state and issues on the system fabric a memory access request corresponding to that received from the AP.
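The Parked-state allocation and handoff described above can be sketched with the following simplified model. The class, state names, and helper functions here are hypothetical illustrations (the embodiment describes the behavior, not this code):

```python
# Simplified model of Parked-state master machines: a request that
# conflicts with one already in flight is allocated Parked and only
# becomes Active (and may issue its fabric request) once the
# conflicting request completes. All identifiers are illustrative.

class MasterMachine:
    def __init__(self, addr):
        self.addr = addr
        self.state = "Idle"

def allocate(machines, addr):
    # A conflict exists if another machine is Active for the same address.
    conflict = any(m.addr == addr and m.state == "Active" for m in machines)
    m = MasterMachine(addr)
    m.state = "Parked" if conflict else "Active"
    machines.append(m)
    return m

def complete(machines, finished):
    # On completion, hand the address off to one Parked machine, which
    # transitions out of Parked and can now issue its fabric request.
    machines.remove(finished)
    for m in machines:
        if m.state == "Parked" and m.addr == finished.addr:
            m.state = "Active"
            break

# Usage: the second request for the same address parks behind the first.
machines = []
first = allocate(machines, 0x80)    # no conflict: Active
second = allocate(machines, 0x80)   # conflicts with first: Parked
print(first.state, second.state)    # Active Parked
complete(machines, first)
print(second.state)                 # Active
```

As in the passage above, parking reserves the target address so that a further conflicting request snooped on the fabric is not serviced ahead of the waiting AP request.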
In at least one embodiment, a coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP, including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP.
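One way to picture such a programmable data structure is as a reprogrammable translation table consulted when forming an operation. The table contents, key names, and request types below are purely hypothetical illustrations of the idea, not the disclosed structure:

```python
# Hypothetical sketch of a programmable translation structure: a table
# mapping AP request types to the fabric request and coherence message
# the CAPP uses for the operation. Because it is a data structure
# rather than fixed logic, it can be reprogrammed at run time.
# All table contents and names here are illustrative.

OP_TABLE = {
    "ap_read": {"fabric_request": "READ", "coherence_msg": "Shared"},
    "ap_kill": {"fabric_request": "KILL", "coherence_msg": "Invalid"},
}

def translate(ap_request):
    """Determine the operation components for an AP request by table lookup."""
    entry = OP_TABLE[ap_request]
    return entry["fabric_request"], entry["coherence_msg"]

print(translate("ap_kill"))  # ('KILL', 'Invalid')

# Reprogramming the table changes how future operations are formed,
# without altering the CAPP logic that consults it:
OP_TABLE["ap_read"]["fabric_request"] = "READ_WITH_INTENT"
```

The benefit modeled here is flexibility: updating the table adapts the CAPP to a different AP or protocol variant without redesigning the hardware that performs the lookup.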
While various embodiments have been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the claims. For example, although aspects have been described with respect to a computer system executing program code that directs the functions of the present invention, it should be understood that the present invention may alternatively be implemented as a program product including a computer-readable storage device (e.g., volatile or non-volatile memory, optical or magnetic disk, or other statutory manufacture) that stores program code that can be processed by a data processing system. Further, the term “coupled” as used herein is defined to encompass embodiments employing a direct electrical connection between coupled elements or blocks, as well as embodiments employing an indirect electrical connection between coupled elements or blocks achieved using one or more intervening elements or blocks. In addition, the term “exemplary” is defined herein as meaning one example of a feature, not necessarily the best or preferred example.