USRE44129E1 - System and method for instruction-level parallelism in a programmable multiple network processor environment


Info

Publication number: USRE44129E1
Application number: US11/862,815
Authority: US (United States)
Prior art keywords: instruction, thread, dependency, counter, execution
Legal status: Expired - Lifetime (the legal status is an assumption and is not a legal conclusion)
Inventors: Joel Zvi Apisdorf, Sam Brandon Sandbote, Michael Daniel Poole
Current assignee: US Department of Navy
Original assignee: US Department of Navy
Application filed by US Department of Navy

Abstract

A system and method process data elements with instruction-level parallelism. An instruction buffer holds a first instruction and a second instruction, the first instruction being associated with a first thread, and the second instruction being associated with a second thread. A dependency counter counts satisfaction of dependencies of instructions of the second thread on instructions of the first thread. An instruction control unit is coupled to the instruction buffer and the dependency counter, the instruction control unit increments and decrements the dependency counter according to dependency information included in instructions. An execution switch is coupled to the instruction control unit and the instruction buffer, and the execution switch routes instructions to instruction execution units.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
The present invention is related to patent applications “System And Method For Processing Overlapping Tasks In A Programmable Network Processor Environment” (Ser. No. 09/833,581) and “System and Method for Data Forwarding in a Programmable Multiple Network Processor Environment” (Ser. No. 09/833,578), both of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to digital computing. More specifically, the present invention relates to network processors for processing network data elements.
2. Discussion of the Related Art
Network switches and routers, or network switch elements, form the backbone of digital networks, such as the Internet. Network switch elements connect network segments by receiving network data from ingress network segments and transferring the network data to egress network segments. Because large telecommunications switching facilities and central offices aggregate network traffic from extensive networks and many network segments, they require high-speed and high-availability switches and routers.
Network switch elements select the egress network segment by processing the address or destination included in the network data according to network data processing program logic. Traditionally, network switch elements included Application Specific Integrated Circuits (ASICs) that provided the program logic. Because ASICs are “hard-coded” with program logic for handling network traffic, they provide the high speed necessary to process a large volume of network data. ASICs, however, make it difficult to upgrade or reconfigure a network switch element, and it is expensive to design and fabricate a new ASIC for each new type of network switch element.
In response to these drawbacks, manufacturers of network switch elements are turning to programmable network processors to enable network switch elements to process network data. Programmable network processors process network data according to program instructions, or software, stored in a memory. The software allows manufacturers and users to define the functionality of the network switch elements: functionality that can be altered and changed as needed. With programmable network processors, manufacturers and users can change the software to respond to new services quickly, without costly system upgrades, as well as implement new designs quickly.
To the extent that there is a drawback to the use of programmable network processors in network switch elements, that drawback relates to speed. Because programmable network processors process network data using software, they are usually slower than a comparable hard-coded ASIC. One of the major design challenges, therefore, is developing programmable network processors fast enough to process the large volume of network data at large telecommunications switching facilities.
One technique used to increase speed in traditional processor design is “instruction-level parallelism,” or processing multiple threads of instructions on a processing element in parallel. However, traditional instruction-level parallelism techniques are either highly complex or introduce unacceptable delays and timing problems into the processing of network data, which must be processed on a time-critical basis.
SUMMARY OF THE INVENTION
The present invention provides a system and method for processing information using instruction-level parallelism. In the system, an instruction buffer holds a first instruction and a second instruction, the first instruction being associated with a first thread, and the second instruction being associated with a second thread. In this system, one or more instructions from the second thread may be dependent on the execution of one or more instructions in the first thread. A dependency counter is used to record dependencies of instructions between the first thread and the second thread. An instruction control unit is coupled to the instruction buffer and the dependency counter, the instruction control unit increments and decrements the dependency counter on the basis of information in the instructions. An execution switch is coupled to the instruction control unit and the instruction buffer, the execution switch sends instructions to an execution unit.
In the method, a first instruction associated with a first thread is loaded on a processing element. The processing element determines that execution of a second instruction depends on the execution of the first instruction, where the second instruction is associated with a second thread. A dependency counter associated with the second thread is incremented if the processing element determines that execution of a second instruction depends on the execution of the first instruction.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
FIG. 1 illustrates a system block diagram of a data communications system.
FIG. 2 illustrates a system block diagram of a programmable network processor.
FIG. 3 illustrates a system block diagram of a multiprocessor core.
FIG. 4 illustrates a system block diagram of an exemplary processing element.
FIG. 5 is a diagram illustrating concurrent processing of three threads of instructions.
FIG. 6 illustrates concurrent processing of two threads of instructions.
FIG. 7 illustrates dependency counter groups.
FIG. 8 illustrates an exemplary instruction.
FIG. 9 illustrates an exemplary process for executing instructions.
FIG. 10 illustrates an exemplary process for executing instructions.
DETAILED DESCRIPTION
Exemplary embodiments of the invention are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustrative purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the invention.
Programmable network processors offer a number of advantages including flexibility, low cost, maintenance ease, decreased time to market, and increased service life. It is difficult, however, to develop a programmable network processor capable of meeting the demand for ever-increasing speed. One technique for increasing the speed of a programmable network processor is instruction-level parallelism. In instruction-level parallelism, threads of parallel programs can execute concurrently on a single processing element. Instruction-level parallelism allows a processing element to continue processing instructions, even if one or more threads are waiting for long-latency operations to complete.
One problem with instruction-level parallelism is maintaining synchronization of dependent instructions between the threads running on a processing element. Often, an instruction in one thread is dependent on the execution of instructions in another thread. Examples of instruction dependency are control dependency (i.e., the execution of one instruction is conditioned on the execution of another) and data dependency (i.e., one instruction uses the results of the execution of another instruction). Unfortunately, conventional techniques for synchronizing the execution of instructions among multiple threads do not lend themselves to programmable network processor applications. Conventional techniques introduce significant delays to processing, delays that are unsuitable for processing time critical network data elements.
The present invention is directed to a system and method for synchronizing the execution of multiple threads of instructions on a single processing element at high speed. An instruction in a first thread can include dependence indicators, such as a bit or bits, that indicate dependence of the instruction on the execution of a second thread. When a processing element encounters an instruction that includes dependence indicators that indicate dependence between threads, the processing element checks, decrements, or increments one or more dependency counters that record satisfaction of dependencies between instructions and threads. If a dependency indicator indicates that an instruction in a first thread is dependent upon the execution of a second thread, a dependency counter is checked. If the dependency counter is not above a threshold, the processing element suspends the execution of the first thread until the dependency counter is incremented by a second thread to above the threshold. This allows the processing element to maintain synchronized execution of dependent instructions between threads in a highly efficient manner. It should be recognized that the concepts described below are not restricted to processing network data elements but are extensible to a generic form of data processing. Prior to discussing the features of the present invention, a brief description of a data communications system is provided.
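The patent describes this counter mechanism in prose only; the following is a minimal Python sketch of the idea, assuming a single counter with a threshold of zero (the class and method names are illustrative, not from the patent):

```python
# Hypothetical sketch of the dependency-counter idea described above.
# All names (DependencyCounter, satisfy, consume, etc.) are assumptions.

class DependencyCounter:
    """Records satisfaction of dependencies of one thread on another."""

    def __init__(self, threshold=0):
        self.value = 0
        self.threshold = threshold

    def satisfy(self):
        """Called when a dependee instruction in the other thread executes."""
        self.value += 1

    def may_execute(self):
        """A dependent instruction may issue only if the counter is above threshold."""
        return self.value > self.threshold

    def consume(self):
        """Issuing the dependent instruction decrements the counter."""
        self.value -= 1

counter = DependencyCounter()
assert not counter.may_execute()   # dependency unsatisfied: thread suspends
counter.satisfy()                  # dependee instruction executes in the other thread
assert counter.may_execute()       # dependent thread may resume
counter.consume()                  # dependency consumed on issue
```

In hardware terms this is simply a counting semaphore per thread pair; the suspend/resume behavior falls out of re-checking `may_execute` each cycle.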
FIG. 1 illustrates a block diagram of a network data communications system, according to an embodiment of the present invention. Data communications system 100 can be, for example, of the type used by network service providers and telecommunication carriers to provide voice and data communications services to consumers. Data communications system 100 includes network 102, network line modules 104₁–104ₙ, and switch fabric 106. Note that a subscript “N” in the figures denotes a plurality of elements generally, and not a specific number or equality of number between different elements with a subscript “N.”
Network 102 is connected to network line modules 104₁–104ₙ which, in turn, are connected to switch fabric 106. Although data communications system 100 is shown as including physical connections between the various components, other configurations are possible, such as wireless connections. Connections between network 102, network line modules 104₁–104ₙ, and switch fabric 106 can be, for example, wireless data connections, data over copper, fiber optic connections (e.g., OC-48, OC-192, OC-768), or other data communications connections as would be apparent.
Network line modules 104₁–104ₙ send and receive network data elements to (from) network 102. Network line modules 104₁–104ₙ process the network data elements and communicate the processed network data elements with switch fabric 106. Network data elements are signals carrying information, including communications information. Examples of network data elements are asynchronous transfer mode (“ATM”) cells, Frame Relay frames, Internet Protocol (“IP”) packets, etc., and portions (segments) of these. Processing includes the concepts of performing a calculation or manipulation involving a network data element. Processing can include, for example, determining the next hop or egress port to which the network data element should be routed, network management, such as traffic shaping or policing, network monitoring, etc. Network 102 is a network for communicating network data elements. Network 102 can be, for example, the Internet, a telecommunications data network, an intranet, an extranet, a voice over data communications network, etc., and combinations thereof.
For descriptive clarity, operation of data communications system 100 is described in terms of network line module 104₁. Network line module 104₁ includes network line module ingress port 108, network line module egress port 110, and programmable network processors 112₁–112₂. Note that the configuration of network line modules 104₁–104ₙ is shown for illustrative purposes only, and alternate configurations for network line modules 104₁–104ₙ are possible. Alternate configurations include, for example, single or additional programmable network processors per network line module, additional network line module ingress ports, multiple egress ports, additional connections to network 102, etc.
Network line module 104₁ receives network data elements from network 102 at network line module ingress port 108. Programmable network processor 112₁ receives network data elements from network line module ingress port 108. Programmable network processor 112₁ enables network line module 104₁ to process the received network data elements. Programmable network processor 112₁ provides the network data elements to switch fabric 106 after processing.
Switch fabric 106 includes switch fabric ingress ports 114₁–114ₙ and switch fabric egress ports 116₁–116ₙ. Switch fabric ingress ports 114₁–114ₙ receive data from network line modules 104₁–104ₙ, and switch fabric egress ports 116₁–116ₙ provide data to network line modules 104₁–104ₙ. Switch fabric 106 outputs network data elements received from network processor 112₁ on the desired switch fabric egress port 116₁–116ₙ. Network line module 104₁ receives processed network data elements from switch fabric egress port 116₁, performs additional processing as required, and transmits the network data element to network 102 via network line module egress port 110. Note that network line module ingress port 108, network line module egress port 110, switch fabric ingress ports 114₁–114ₙ, and switch fabric egress ports 116₁–116ₙ are logical representations of physical devices, and other combinations, such as single ports that transmit and receive network data elements, are possible.
FIG. 2 illustrates a system block diagram of a programmable network processor, according to an embodiment of the present invention. Programmable network processor 200 can be considered an exemplary embodiment of both ingress and egress programmable network processors 112₁–112ₙ, as described above. Programmable network processor 200 includes memory controller 204, input interface 206, multiprocessor core 202, and output interface 208. Multiprocessor core 202 is connected to input interface 206, output interface 208, and memory controller 204. Note that the particular configuration, number, and type of elements of programmable network processor 200 are shown for illustrative purposes only, and other configurations of programmable network processor 200 are possible as would be apparent.
For the purposes of this description, it is presumed that the programmable network processor 200 of FIG. 2 corresponds to programmable network processor 112₁. In operation, such a programmable network processor 200 receives network data elements from network line module ingress port 108 via input interface 206. Input interface 206 receives the network data elements and provides them to multiprocessor core 202 for processing as described above. Multiprocessor core 202 processes the network data elements and provides the result to output interface 208. Output interface 208 receives processed network data elements from multiprocessor core 202 and forwards them to switch fabric 106 for routing. Multiprocessor core 202 accesses storage located off programmable network processor 200 via memory controller 204.
Multiprocessor core 202 is connected to host control processor 210. Host control processor 210 provides network management logic and information for programmable network processor 200. Such network management logic and information includes, for example, generating and receiving network data elements for controlling switch fabric 106, network line modules 104₁–104ₙ, and other network components. Host control processor 210 performs other functions, such as generating network data elements for switch fabric control, setting up network connections, and loading programs into multiprocessor core 202 for operation.
FIG. 3 illustrates a system block diagram of a multiprocessor core, according to an embodiment of the present invention. Multiprocessor core 300 is an exemplary embodiment of multiprocessor core 202, as described above. Although multiprocessor core 300 can be used for a generic form of data processing, multiprocessor core 300 can also be of the type employed in data communications system 100. Multiprocessor core 300 includes processing elements (PE) 302₁–302ₙ, data memories (DM) 304₁–304ₙ, program memories (PM) 306₁–306ₙ, intraswitch 314, and host control interface 308. Processing elements 302₁–302ₙ are connected to program memories 306₁–306ₙ and intraswitch 314. Data memories 304₁–304ₙ are connected to intraswitch 314. Program memories 306₁–306ₙ are connected to processing elements 302₁–302ₙ and intraswitch 314. Host control interface 308 is connected to intraswitch 314. Intraswitch 314 is connected to on-chip peripheral units 310 and 312. Examples of on-chip peripheral units 310 and 312 are input interface 206, output interface 208, and memory controller 204 of FIG. 2.
Processing elements 302₁–302ₙ process network data elements, thereby providing the processing functionality for multiprocessor core 300. Processing elements 302₁–302ₙ execute program instructions from program memories 306₁–306ₙ, and load and store data in data memories 304₁–304ₙ. Each of processing elements 302₁–302ₙ can process multiple threads of instructions concurrently, according to an embodiment of the present invention.
Program memories 306₁–306ₙ and data memories 304₁–304ₙ provide data storage functionality for the various elements of multiprocessor core 300. Program memories 306₁–306ₙ store program instructions for the processing of network data elements by processing elements 302₁–302ₙ. Although FIG. 3 depicts groups of four processing elements directly connected to one of program memories 306₁–306ₙ, other configurations connecting program memory to processing elements are possible, including, for example, additional processing elements or program memories as would be apparent. Data memories 304₁–304ₙ provide on-chip storage for data, such as intermediate-results data from processing network data elements, for the operation of processing elements 302₁–302ₙ.
Intraswitch 314 enables communication between the various components of multiprocessor core 300. For example, processing elements 302₁–302ₙ access data memories 304₁–304ₙ through intraswitch 314. Intraswitch 314 can be, for example, a switching fabric in multiprocessor core 300, or individual trace connections in multiprocessor core 300. Host control interface 308 connects multiprocessor core 300 to host control processor 210. Multiprocessor core 300 is connected to on-chip peripheral units 310 and 312 via intraswitch 314.
In operation, multiprocessor core 300 receives network data elements from on-chip peripheral units 310 and 312. Processing elements 302₁–302ₙ receive the network data elements and process them according to the programs stored as instructions in program memories 306₁–306ₙ. The intermediate results and final results of the processing operations are stored in data memories 304₁–304ₙ. After a network data element has been processed, it is sent to on-chip peripheral units 310 and 312.
FIG. 4 illustrates a system block diagram of an exemplary processing element, according to an embodiment of the present invention. Processing element 400 is an example of one of the processing elements shown in FIG. 3, and can be employed in a generic form of data processing or can be of the type that is employed in data communications system 100.
Moreover, exemplary processing element 400 is an instruction-level parallel processing element, in which two or more threads of parallel programs execute concurrently. Processing element 400 can, therefore, maintain high utilization under conditions where the processing element would otherwise idle waiting for long-latency operations to complete. Note that processing element 400 is provided for illustrative purposes only and that other processing element configurations are possible.
Processing element 400 includes instruction fetch unit 402 and instruction buffers 404A, 404B, 404C, and 404D. Processing element 400 also includes function decode and execution switch 406, dependency counters 410, instruction issue control 408, memory/peripheral interface unit 412, primary function unit 414, auxiliary function unit 416, and register file 418. Note that although dependency counters 410 are shown as being part of instruction issue control 408, other configurations are possible. For example, dependency counters 410 can also be connected to, but not part of, instruction issue control 408.
Instruction fetch unit 402 is connected to each of instruction buffers 404A–404D. Each of the connections between fetch unit 402 and instruction buffers 404A–404D provides a path for instructions from a program thread. Instruction buffers 404A–404D are, in turn, connected to function decode and execution switch 406. Instruction buffers 404A–404D are also connected to instruction issue control 408. Instruction issue control 408 is connected to function decode and execution switch 406. Function decode and execution switch 406 is connected to memory/peripheral interface unit 412, primary function unit 414, and auxiliary function unit 416. Memory/peripheral interface unit 412, primary function unit 414, and auxiliary function unit 416 are also referred to herein as execution units 412–416. Memory/peripheral interface unit 412 is connected to intraswitch 314 and register file 418. Primary function unit 414 is connected to register file 418. Auxiliary function unit 416 is connected to register file 418.
Register file 418 includes read ports 420 and write port 422. Read ports 420 allow execution units 412–416 to read data from the various registers in register file 418. Write port 422 allows execution units 412–416 to write data to register file 418.
Exemplary processing element 400 is shown as supporting four concurrent threads of instructions. Instruction fetch unit 402 fetches instructions from program memory 306. The instructions are entered in the four instruction buffers 404A–404D according to the program thread they belong to. Each of instruction buffers 404A–404D is associated with one of four threads. For descriptive clarity, the convention of associating thread 0 (T0) with instruction buffer 404A, thread 1 (T1) with instruction buffer 404B, thread 2 (T2) with instruction buffer 404C, and thread 3 (T3) with instruction buffer 404D is adopted.
Function decode and execution switch 406 receives the instructions associated with the four threads from instruction buffers 404A–404D. Function decode and execution switch 406 provides the instructions to execution units 412–416.
FIG. 5 is a diagram illustrating concurrent processing of three threads of instructions. Instruction processing diagram 500 illustrates the problem of instruction synchronization between multiple threads. The instructions of one thread can be dependent on the results of instructions in another thread. For example, the contents of a register that is set by a first instruction in one thread can be used by a second instruction in another thread. In such a case, if the first instruction is not executed before the second instruction, the register will not include data valid for the execution of the second instruction. These types of problems are referred to as synchronization problems, and may result in a program execution error.
Instruction processing diagram 500 shows three threads of instructions: thread 502, thread 504, and thread 506. Threads 502–506 can be of the type employed in a generic form of data processing or can be of the type that are employed in data communications system 100. Note that three threads are shown for descriptive clarity only, and other configurations are possible. A processing element can process as few as two threads, and as many threads as is accommodated by a processing element architecture. For example, processing element 400 accommodates four concurrent threads of instructions.
Each of threads 502–506 is shown including two instructions. Thread 502 includes instruction 508 (i1) and instruction 510 (i2). Thread 504 includes instruction 512 (i3) and instruction 514 (i4). Thread 506 includes instruction 516 (i5) and instruction 518 (i6). Note that instruction processing diagram 500 shows two instructions per thread for descriptive clarity only, and other configurations are possible. For example, each of threads 502–506 can include additional instructions (not shown) before the first instruction (e.g., instruction 508 in thread 502), between the first and second instructions (e.g., instructions 508 and 510 in thread 502), and after the second instruction (e.g., instruction 510 in thread 502). Threads 502–506 can include as many instructions as are required to perform generic data processing or perform processing for data communications system 100.
Generally, a processing element processes the three threads by executing their respective instructions. Instruction processing diagram 500 shows instruction execution proceeding from left to right, and the relative spacing of instructions indicates when an instruction is being executed. For example, instruction processing diagram 500 shows instruction 508 is executed before instruction 510 of thread 502. Note also the chronological relationships between instructions of different threads. For example, the processing element executes instruction 508 of thread 502 before instruction 512 of thread 504, and instruction 512 before instruction 516 of thread 506.
Additionally, instruction processing diagram 500 shows the dependency between the instructions of threads 502–506. Dependency is when the execution of a second instruction is conditional on the execution of a first instruction. Consider, for example, a situation in which a first instruction in a first thread writes a value to a register file, such as register file 418, and a second instruction in a second thread subsequently reads the value from the register file and uses the value as an operand in a calculation. In this situation, the first instruction is referred to as the dependee instruction, and the second instruction is referred to as the dependent instruction. A dependent instruction is an instruction that must not be executed before the instruction on which it depends. A dependee instruction is an instruction on which a dependent instruction depends. As long as the dependee instruction is executed before the dependent instruction, the register file includes the correct value for the execution of the dependent instruction.
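As a concrete illustration of the register-file scenario just described, the following hypothetical Python fragment models a dependee write and a dependent read; the names `register_file`, `dependee`, and `dependent` are invented for this sketch:

```python
# Illustrative only: a register-file value shared between two threads.
register_file = {}

def dependee():
    """Thread A: the dependee instruction writes the value."""
    register_file["r1"] = 42

def dependent():
    """Thread B: the dependent instruction reads the value as an operand."""
    return register_file["r1"] + 1   # raises KeyError if run before dependee()

dependee()            # the dependee must execute first...
result = dependent()  # ...so the dependent instruction sees valid data
```

If the order were reversed, the read would find no valid value, which is exactly the synchronization error the dependency counters are designed to prevent.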
Depends indicators 520–526 are used to show dependencies between the instructions of threads 502–506. Depends indicators are drawn from a dependent instruction to a dependee instruction (i.e., the arrow of the depends indicator points to the dependee instruction). Depends indicator 520 indicates that the execution of instruction 512 depends on the execution of instruction 508. Depends indicator 522 indicates that the execution of instruction 510 depends on the execution of instruction 514. Depends indicator 524 indicates that the execution of instruction 516 depends on the execution of instruction 510. Depends indicator 526 indicates that the execution of instruction 518 depends on the execution of instruction 514.
As described above, if a first instruction depends on a second, earlier executed, instruction, processing may proceed normally. Instruction processing diagram 500 shows instruction 512 and instruction 516 dependent on earlier executed instructions. Program errors may occur, however, if a first instruction depends on a later executed instruction. Instruction processing diagram 500 shows the synchronization problem as instruction 510 depending on a later executed instruction. As such, it is important for a processing element to synchronize the execution order of dependent and dependee instructions between threads to avoid such program errors.
The present invention provides a system and method that maintains the order of instruction execution between threads. Generally, a processing element processes multiple threads of instructions. Instructions in the threads can include dependence indicators that indicate dependencies between instructions and threads. When the processing element encounters instructions that include dependence indicators identifying a dependent instruction or thread, it checks, decrements, or increments one or more dependency counters. If the dependency counter is not above a threshold, it indicates that a dependency has not been satisfied, and the processing element can suspend the execution of a thread until the dependency counter is incremented to above the threshold. This allows the processing element to maintain a form of synchronized execution of dependent instructions between threads.
In one embodiment, instructions can include the dependence indicators as bits, called “depends” bits and “tells” bits. A depends bit is an indicator in a dependent instruction that a particular other thread includes an instruction on which this one depends. A tells bit is an indicator in a dependee instruction that a particular other thread includes an instruction dependent on this one. The additional bits can be included with the instruction in a number of ways. For example, a compiler for instruction-level parallel processors can include the bits at compile time based on dependencies, or a programmer may specify the instruction execution order by including “depends” and “tells” bits when coding, etc.
An exemplary embodiment is described herein to provide context for discussion, and the present invention encompasses other embodiments, as are described further below. Consider an exemplary processing element processing four threads of instructions. Each of the instructions in the four threads can include depends bits and tells bits. In an exemplary embodiment each instruction in a thread can include three depends bits, each of which indicates that the instruction is dependent on one of the other three threads. Similarly, each instruction in a thread can include three tells bits, each of which indicates that one of the other three threads depends on the execution of the instruction.
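The patent does not specify a bit layout for the depends and tells bits; the following Python sketch assumes a hypothetical encoding in which the low six bits of an instruction word hold three depends bits and three tells bits, one per peer thread:

```python
# Hypothetical instruction-word encoding; the field layout is an assumption.

def encode(opcode, depends, tells):
    """Pack an opcode with 3 depends bits and 3 tells bits (low 6 bits)."""
    assert len(depends) == 3 and len(tells) == 3
    word = opcode << 6
    for i, bit in enumerate(depends):
        word |= bit << i             # bits 0-2: depends on peer thread i
    for i, bit in enumerate(tells):
        word |= bit << (3 + i)       # bits 3-5: tells peer thread i
    return word

def decode(word):
    """Unpack (opcode, depends, tells) from an instruction word."""
    depends = [(word >> i) & 1 for i in range(3)]
    tells = [(word >> (3 + i)) & 1 for i in range(3)]
    return word >> 6, depends, tells

# An instruction that depends on peer thread 0 and tells peer thread 1:
word = encode(0x2A, depends=[1, 0, 0], tells=[0, 1, 0])
assert decode(word) == (0x2A, [1, 0, 0], [0, 1, 0])
```

A compiler, as the text notes, could set these bits automatically at compile time from the inter-thread dependence graph.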
In the exemplary embodiment, the processing element can include four groups of dependency counters, each of which is associated with one of the four threads. Each of the groups of dependency counters includes three individual dependency counters, each of which is associated with one of the other three threads. For instance, consider four exemplary threads, thread 0, thread 1, thread 2, and thread 3, each having an associated group of dependency counters. The exemplary group of dependency counters associated with thread 0 includes three individual dependency counters, each of which is associated with one of thread 1, thread 2, or thread 3.
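The grouping described above can be pictured as a small data structure. The following Python sketch is illustrative only: the patent describes hardware counters, and the function and variable names here are ours, not the patent's.

```python
NUM_THREADS = 4

def make_counter_groups(num_threads=NUM_THREADS):
    """One dependency counter group per thread; each group holds one
    counter per *other* thread (written T01, T02, ... in the text),
    all starting at zero."""
    return {i: {j: 0 for j in range(num_threads) if j != i}
            for i in range(num_threads)}

groups = make_counter_groups()
# Thread 0's group has counters for threads 1, 2, and 3 only.
```

A four-thread element thus carries twelve counters in total, three per group.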
In operation, the exemplary processing element processes the instructions of the four threads. When the exemplary processing element encounters an instruction in a first thread that includes a tells bit identifying a second thread (i.e., one of the other three threads), the exemplary processing element increments the dependency counter associated with the first thread of the group of dependency counters associated with the second thread.
When the exemplary processing element processes an instruction in a first thread that includes a depends bit identifying a second thread, the processing element checks the dependency counter associated with the second thread of the group of dependency counters associated with the first thread to determine whether the instruction can be executed. If the value of the exemplary dependency counter is above a threshold (e.g., non-zero), the processing element executes the instruction. If, on the other hand, the value of the exemplary dependency counter is not above the threshold, processing of the first thread is inhibited. The processing element increments the dependency counter when instructions including tells bits in the second thread are executed, and processing of the first thread is resumed once the dependency counter is above the threshold. Note that an instruction can include multiple dependency indicators, such as one or more tells bits in combination with one or more depends bits. When an instruction includes more than one depends bit, all of the associated dependency counters must be above the threshold before the instruction is executed.
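The check-then-decrement behavior described above resembles a counting-semaphore wait. A minimal Python sketch — the dictionary representation and all names are our assumptions; in hardware this is performed by the instruction control unit:

```python
THRESHOLD = 0   # e.g. zero: a counter must be incremented before use

def try_execute_dependent(groups, thread, depends_on, threshold=THRESHOLD):
    """Check every counter named by the instruction's depends bits; if all
    are above the threshold, decrement each and execute, else suspend."""
    if any(groups[thread][d] <= threshold for d in depends_on):
        return "suspend"
    for d in depends_on:
        groups[thread][d] -= 1
    return "execute"

g = {0: {1: 1, 2: 0, 3: 0}}  # thread 0's group; only the thread-1 counter is raised
```

Note that an instruction with two depends bits suspends unless both counters are above the threshold, matching the multi-depends rule above.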
The threshold is a dependency counter value chosen to ensure that dependent instructions are not executed before the instructions in other threads on which they depend. The threshold value can be set to ensure correct instruction-level synchronization. For example, the threshold can be chosen to be zero, so that a dependency counter must be incremented before a dependent instruction can be executed, as is described in further detail below. Network data element processing is often repetitive and predictable. As such, a programmer, or compiler, can determine the value at which the threshold should be set. Note that although one embodiment of the present invention is explained in terms of a “threshold,” “above a threshold,” and “not above a threshold,” other configurations that record dependency between instructions and threads are possible. For example, in an alternate embodiment, the processing element can suspend processing a thread if a dependency counter falls below a threshold.
According to an embodiment of the present invention, depends bits, tells bits, and dependency counters are used to record the satisfaction of dependencies between instructions in a first thread and the processing of a second thread. This is in contrast to instruction processing diagram 500 of FIG. 5, which shows dependencies between individual instructions. It is sufficient to record dependency at this level because the present invention provides a system and method that ensures that dependent instructions are executed after the instructions on which they depend.
Consider, for example, the application of “depends” bits and “tells” bits to instruction processing diagram 500 of FIG. 5. In this example, instruction 512 would include a depends bit identifying instruction 512 as dependent upon instructions in thread 502. In one embodiment, the depends bit identifies the thread that includes the instruction on which instruction 512 is dependent, which is, in this case, thread 502. In another embodiment, the depends bits can identify the type or particular one of the instructions in thread 502. For example, the instruction can include more bits (i.e., more information) that identify instruction characteristics (such as type, priority, etc.). For descriptive clarity, however, depends bits and tells bits are described herein as identifying threads, and not instructions. As such, instruction 508 would include a tells bit that identifies thread 504 as including an instruction or instructions that are dependent upon the execution of instruction 508.
Similarly, instruction 510 would include a tells bit identifying thread 506 as including instructions dependent upon the execution of instruction 510. Instruction 510 would also include a depends bit identifying instruction 510 as dependent on the execution of instructions in thread 504. Instruction 514 would include a tells bit identifying thread 502 as including instructions that are dependent on the execution of instruction 514. Instruction 514 also would include a tells bit identifying thread 506 as including instructions dependent on the execution of instruction 514. Instruction 516 would include a depends bit identifying instruction 516 as dependent on instructions in thread 502. Instruction 518 would include a depends bit identifying instruction 518 as dependent on the execution of instructions in thread 504.
FIG. 8 illustrates an exemplary instruction, according to an embodiment of the present invention. Instruction 800 includes opcode 802, source 0 804, source 1 806, result 808, depends bit 810, depends bit 812, depends bit 814, tells bit 816, tells bit 818, and tells bit 820. Opcode 802 is the operator for instruction 800. Source 0 804 specifies a first operand operated upon by opcode 802. Source 1 806 specifies a second operand operated upon by opcode 802. Result 808 identifies a register to which the result of opcode 802 is stored.
Depends bits 810-814 indicate that instruction 800 depends upon the execution of instructions in other threads. Instruction 800 is configured for a processing element that supports the operation of four threads. Note that although instruction 800 includes three depends bits, which identify three other threads, and three tells bits, which also identify three other threads, other configurations are possible. By adding additional bits or changing how the bits are used, instruction 800 can be configured for a processing element that supports more than four threads. Consider, for example, binary coding of depends bits 810-814 and tells bits 816-820. In such an example, depends bits 810-814 can represent up to eight other threads, extending instruction 800 to a processing element supporting nine threads. Similarly, additional depends and tells bits can be added as is necessary for a given processing element architecture.
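One way to picture the fields of instruction 800 is as packed bit fields. The field widths below are invented for illustration — the patent does not specify widths — but the three depends bits and three tells bits match the four-thread configuration described above.

```python
OP_W, REG_W, MASK_W = 6, 5, 3   # assumed widths (not from the patent)

def encode(opcode, src0, src1, result, depends_mask, tells_mask):
    """Pack opcode 802, source 0 804, source 1 806, result 808, depends
    bits 810-814, and tells bits 816-820 into one instruction word."""
    word = opcode
    for field, width in ((src0, REG_W), (src1, REG_W), (result, REG_W),
                         (depends_mask, MASK_W), (tells_mask, MASK_W)):
        word = (word << width) | field
    return word

def decode_flags(word):
    """Recover (depends_mask, tells_mask) from the low bits of the word."""
    return (word >> MASK_W) & 0b111, word & 0b111

w = encode(0x12, 1, 2, 3, 0b101, 0b010)
```

Each mask bit corresponds to one of the three other threads, so a four-thread element needs only six flag bits per instruction.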
Consider, for example, the case in which instruction 800 is executing in thread 1. If instruction 800 is executing in thread 1, the other three threads on which the execution of instruction 800 may depend are thread 0, thread 2, and thread 3. In this case, depends bit 810 can identify instruction 800 as dependent on thread 0, depends bit 812 can identify instruction 800 as dependent on thread 2, and depends bit 814 can identify instruction 800 as dependent on thread 3. Likewise, tells bit 816 can identify thread 0 as dependent on instruction 800. Tells bit 818 can identify thread 2 as dependent on instruction 800. Tells bit 820 can identify thread 3 as dependent on instruction 800.
As suggested by the relationships described above, dependency counter groups are sets of dependency counters, one group associated with each thread. Each of threads 502-506 of instruction processing diagram 500, for example, would have, or be associated with, a dependency counter group. Each dependency counter group could include a number of individual dependency counters, each of which is associated with one of the other threads executing on the processing element. For example, the dependency counter group associated with thread 502 of instruction processing diagram 500 would include two dependency counters, one related to, or associated with, thread 504, and one related to, or associated with, thread 506.
FIG. 7 illustrates exemplary dependency counter groups, according to an embodiment of the present invention. FIG. 7 shows four dependency counter groups, each of which is associated with one of four threads. Dependency counter group 702 (T0) is associated with thread 0, dependency counter group 704 (T1) is associated with thread 1, dependency counter group 706 (T2) is associated with thread 2, and dependency counter group 708 (T3) is associated with thread 3. Each of dependency counter groups 702-708 includes three dependency counters, each of which is associated with one of the other three threads. Dependency counter group 702 includes dependency counter T01, dependency counter T02, and dependency counter T03. Dependency counter T01 is that dependency counter of thread 0 that is related to, or associated with, thread 1. Similarly, dependency counter T02 and dependency counter T03 are thread 0 dependency counters associated with, or related to, threads 2 and 3, respectively. In the same manner, dependency counter group 704 includes dependency counter T10, dependency counter T12, and dependency counter T13. Dependency counter T10 is associated with thread 0, dependency counter T12 is associated with thread 2, and dependency counter T13 is associated with thread 3. Also, dependency counter group 706 includes dependency counter T20, dependency counter T21, and dependency counter T23. Dependency counter T20 is associated with thread 0, dependency counter T21 is associated with thread 1, and dependency counter T23 is associated with thread 3. Dependency counter group 708 includes dependency counter T30, dependency counter T31, and dependency counter T32. Dependency counter T30 is associated with thread 0, dependency counter T31 is associated with thread 1, and dependency counter T32 is associated with thread 2.
Note that although four dependency counter groups are shown (as are implemented in one embodiment to support four threads), and the dependency counter groups include three dependency counters each, other configurations are possible. For example, greater or fewer than four dependency counter groups can be used according to the number of threads a processing element can execute concurrently. Additionally, dependency counter groups 702-708 can include more or fewer dependency counters, depending on the processing element architecture.
Moreover, although the invention and illustrative examples are described in terms of dependency counter groups and dependency counters, other configurations are possible. Consider, for example, bi-state or tri-state elements substituted for the dependency counters of groups 702-708. A bi-state element associated with a first thread can be set when a corresponding dependee instruction in a second thread is executed, and reset when the dependent instruction is executed. In this example, a processing element suspends processing the first thread when it encounters an instruction including a depends bit if the bi-state element is not set. Similarly, tri-state elements and other state-retaining elements can be set and reset by the processing element. In this embodiment, however, care should be taken to avoid overflowing the state elements. For example, a bi-state element may be incremented, or changed, only once in response to an instruction that includes a tells bit.
Similarly, the implementation of the present invention should account for the size of the dependency counters to avoid overflow. Consider, for example, the case in which multiple instructions including tells bits identifying one thread are executed. In such a case, it is possible to overflow the dependency counter. Dependency counters, therefore, should be specified large enough to ensure that overflow will never occur, or limits should be set on the number of times a dependency counter can be incremented. For example, a first thread that includes many instructions that include tells bits identifying a second thread can be suspended once the dependency counter associated with the second thread has reached a limit. The limit can ensure that the dependency counter does not overflow, and can also ensure that a dependee thread does not get too far ahead of a dependent thread.
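The overflow-avoidance limit described above can be modeled as a saturating counter that suspends the dependee thread instead of overflowing. A hypothetical sketch — the 3-bit width, the dictionary representation, and all names are assumptions:

```python
COUNTER_BITS = 3                        # assumed hardware counter width
COUNTER_LIMIT = (1 << COUNTER_BITS) - 1

def increment_with_limit(groups, dependee_thread, dependent_thread,
                         limit=COUNTER_LIMIT):
    """Increment the dependent thread's counter for the dependee thread,
    unless it is already at the limit; in that case the dependee thread
    should be suspended, so the counter cannot overflow and the dependee
    thread cannot run arbitrarily far ahead of the dependent thread."""
    if groups[dependent_thread][dependee_thread] >= limit:
        return "suspend_dependee"
    groups[dependent_thread][dependee_thread] += 1
    return "ok"

g = {1: {0: COUNTER_LIMIT - 1}}         # thread 1's counter for thread 0
```

The limit thus doubles as a flow-control bound between the two threads, as the paragraph above notes.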
In operation, a tells bit affects one or more dependency counters of threads other than the one on which the tells bit appears. By contrast, a depends bit affects one or more dependency counters of the thread on which the depends bit appears. Thus, when the processing element detects a first instruction in a first thread as including a tells bit that identifies a second thread, the processing element increments one of the dependency counters in the dependency counter group of the second thread. In particular, it increments that dependency counter of the second thread that is associated with the first thread. Consider, for example, the case in which thread 1 is executing a stream of instructions. One of the instructions in thread 1 includes a tells bit that identifies thread 0. In response to the tells bit, the processing element increments the particular dependency counter in dependency counter group 702 that is associated with thread 1. In the example of dependency counter group 702, dependency counter T01 is associated with thread 1. The processing element, therefore, increments T01 of dependency counter group 702 when the thread 1 instruction tells bit is detected. Similarly, when the processing element detects an instruction in a thread that includes a depends bit, the dependency counters are checked, and the processing element either suspends the dependent thread or executes the instruction and decrements the associated dependency counter.
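The routing rule of this paragraph — a tells bit updates a counter in the *identified* thread's group, in the slot belonging to the telling thread — can be sketched as follows (representation and names are our assumptions):

```python
def on_tells(groups, telling_thread, told_threads):
    """For each thread t named by a tells bit, increment the counter of
    t's group that is associated with the telling thread (e.g., a thread 1
    tells bit naming thread 0 increments T01 in group 702)."""
    for t in told_threads:
        groups[t][telling_thread] += 1

groups = {0: {1: 0, 2: 0, 3: 0}, 1: {0: 0, 2: 0, 3: 0},
          2: {0: 0, 1: 0, 3: 0}, 3: {0: 0, 1: 0, 2: 0}}
on_tells(groups, 1, [0])  # thread 1 executes an instruction telling thread 0
```

Note the asymmetry: tells bits write into other threads' groups, while depends bits read (and decrement) the executing thread's own group.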
For example, thread 1 can include an instruction that includes a depends bit that identifies the instruction as depending on the execution of thread 0. In this case, when the processing element detects the depends bit, the dependency counter associated with thread 0 of the dependency counter group associated with thread 1 is checked. In this case, dependency counter T10 of dependency counter group 704 is associated with thread 0. Depending on the value of dependency counter T10, the processing element either suspends processing thread 1 or both decrements T10 and continues processing thread 1, thereby executing the instruction. Once suspended, the processing element resumes processing thread 1 when dependency counter T10 is incremented by the processing element (i.e., when an instruction in thread 0 with a tells bit is executed).
FIG. 6 illustrates concurrent processing of two threads of instructions, according to an embodiment of the present invention. Thread synchronization diagram 600 shows thread 602 and thread 604 as a series of processing steps. A processing step is an action or actions performed by a processing element in the implementation of one embodiment of the present invention. A processing step can be, for example, the execution of an instruction, incrementing a dependency counter, decrementing a dependency counter, etc. Thread 602 includes processing steps 606, 608, 610, 612, 614, and 616. Thread 604 includes processing steps 618, 620, and 622. Although synchronization diagram 600 shows only two threads of instructions, other configurations are possible. For example, the system and method of the present invention can be extended to three, four, and more than four threads, as described above.
For the purpose of descriptive clarity, the instructions of thread synchronization diagram 600 are referred to as instruction 508 (i1), instruction 510 (i2), instruction 512 (i3), and instruction 514 (i4). Note, however, that instruction processing diagram 500 shows instruction 512 as dependent on instruction 508, and shows instruction 510 as dependent on instruction 514. Thread synchronization diagram 600, on the other hand, shows the instructions of thread 602 dependent on the execution of instructions in thread 604 generally, and the instructions of thread 604 dependent on the execution of instructions in thread 602 generally. The dependencies between instructions 508-514 shown in instruction processing diagram 500 are implemented in the operation of one embodiment of the present invention through the general dependency of instructions within one thread on the processing of another thread (i.e., rather than particular instructions). This concept is illustrated in further detail below.
Additionally, thread synchronization diagram 600 shows tells bits 624 and 630 and depends bits 626 and 628 as arrows pointing from processing steps to the threads that the bits identify. The arrows indicate that an instruction being processed in a processing step includes a tells bit or depends bit identifying the thread to which the arrow points. Either the thread pointed to depends on the instruction (i.e., tells bit), or the instruction depends on the thread (i.e., depends bit). For example, tells bit 624 identifies thread 604 as dependent on instruction 508 of processing step 606. Similarly, depends bit 626 identifies instruction 512 of processing step 618 as dependent on thread 602.
Processing of thread 602 and thread 604 begins when the processing element executes instruction 508, in processing step 606. Instruction 508 includes tells bit 624, which identifies thread 604 as dependent on instruction 508. The processing element detects tells bit 624 and increments a dependency counter in dependency counter group 704, which is associated with thread 604 (T1).
As described above, a dependency counter group is associated with a thread, and the dependency counter group includes dependency counters, each of which is associated with one of the other threads executing on the processing element. Thread synchronization diagram 600 is described in terms of dependency counter group 702 (associated with thread 602) and dependency counter group 704 (associated with thread 604). Dependency counter T10 of dependency counter group 704 is associated with thread 602, and dependency counter T01 of dependency counter group 702 is associated with thread 604.
After processing step 606, the processing element receives instruction 512, in processing step 618. Instruction 512 includes depends bit 626, identifying instruction 512 as dependent on the execution of instructions in thread 602. The processing element determines whether dependency counter T10 is above a predefined threshold. For the purposes of explanation, dependency counter T10 is assumed to have been at or above the threshold, so that it is above the threshold after being incremented. Since the processing element has incremented dependency counter T10, when the dependency counter is checked in response to instruction 512, the processing element determines that dependency counter T10 is above the threshold.
Since dependency counter T10 is above the threshold, the processing element continues processing instruction 512, at processing step 620. In processing step 620, the processing element executes instruction 512 and decrements dependency counter T10.
Meanwhile, the processing element processes thread 602 in processing step 608. In processing step 608, the processing element receives instruction 510 from program memory. Instruction 510 includes depends bit 628, which identifies instruction 510 as dependent on the execution of instructions in thread 604. In response to detecting depends bit 628, the processing element checks the dependency counter group of thread 602, particularly the dependency counter related to thread 604. This corresponds to dependency counter T01. The value can be, for example, zero, or some other number representing a predetermined threshold. For exemplary purposes, however, dependency counter T01 is defined as having a value equal to the predetermined threshold. In any case, the value of dependency counter T01 indicates that the instructions in thread 604 upon which instruction 510 depends have not yet been executed. In response to detecting that dependency counter T01 is not above the threshold, the processing element suspends execution of thread 602 in processing step 610.
Meanwhile, the processing element continues processing thread 604. The processing element receives instruction 514 in processing step 622. Instruction 514 includes tells bit 630, which identifies thread 602 as including instructions dependent on instruction 514. The processing element increments that dependency counter of the thread 602 dependency counter group that is related to thread 604 (namely, dependency counter T01) in response to detecting tells bit 630, and executes instruction 514, in processing step 622. Note that the order of executing the instruction and incrementing or decrementing dependency counters is chosen for illustrative purposes only, and the same outcome can be achieved with the reversed order.
After processing step 622, the processing element detects that dependency counter T01 has been incremented to above the threshold, in processing step 612. As such, the processing element resumes processing thread 602 at instruction 510 in processing step 614. After resuming processing of thread 602, the processing element executes instruction 510, decrements dependency counter T01, and continues processing the instructions of thread 602, in step 616. Note that in the example of FIG. 6, dependency counter T01 is now equal to the threshold value, and any additional instructions in thread 602 that include depends bits identifying thread 604 will cause the processing element to suspend execution of the thread (absent prior instructions in thread 604 with tells bits identifying thread 602).
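The FIG. 6 walkthrough above can be reproduced with a small round-robin simulation. This is a behavioral sketch only — the scheduler, the tuple encoding of instructions, and all names are our assumptions, and the loop presumes a deadlock-free program — but it shows i1 executing before i3, and i4 before i2, as the dependencies require.

```python
from collections import deque

def run(programs, num_threads=2, threshold=0):
    """programs[t] is a list of (name, depends_on_threads, tells_threads).
    Round-robin: a thread whose depends check fails is re-queued
    (suspended until another thread's tells bit raises its counter)."""
    groups = {i: {j: 0 for j in range(num_threads) if j != i}
              for i in range(num_threads)}
    pc = [0] * num_threads
    order = []
    ready = deque(range(num_threads))
    while ready:
        t = ready.popleft()
        if pc[t] >= len(programs[t]):
            continue                      # thread finished
        name, depends, tells = programs[t][pc[t]]
        if any(groups[t][d] <= threshold for d in depends):
            ready.append(t)               # suspend: dependency not satisfied
            continue
        for d in depends:
            groups[t][d] -= 1             # consume the satisfied dependency
        for other in tells:
            groups[other][t] += 1         # tells bit: raise other's counter
        order.append(name)
        pc[t] += 1
        ready.append(t)
    return order

# Thread 602 (here 0): i1 tells thread 1; i2 depends on thread 1.
# Thread 604 (here 1): i3 depends on thread 0; i4 tells thread 0.
thread0 = [("i1", [], [1]), ("i2", [1], [])]
thread1 = [("i3", [0], []), ("i4", [], [0])]
print(run([thread0, thread1]))  # → ['i1', 'i3', 'i4', 'i2']
```

The execution order matches the processing steps of diagram 600: thread 602 stalls at i2 until the tells bit of i4 raises T01.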
The operation of thread synchronization diagram 600 is now described with reference to the elements of exemplary processing element 400. The execution of thread 602 begins in processing step 606. For descriptive clarity, thread 602 is associated with instruction buffer 404A, and thread 604 is associated with instruction buffer 404B. In general, instruction fetch unit 402 fetches program instructions from program memory 306. Instruction fetch unit 402 distributes the instructions associated with the four threads to one of instruction buffers 404A, 404B, 404C, or 404D. In one embodiment, each of instruction buffers 404A–404D is associated with a particular thread.
Instruction issue control 408 detects the presence of depends bits, such as depends bits 810-814, or the presence of tells bits, such as tells bits 816-820, included in instructions in instruction buffers 404A–404D. Based on the presence or absence of depends bits and tells bits in the instruction, instruction issue control 408 controls function decode and execution switch 406. Based on signals from instruction issue control 408, function decode and execution switch 406 issues instructions from instruction buffers 404A–404D to one of execution units 412-416 (i.e., memory peripheral interface unit 412, primary function unit 414, or auxiliary function unit 416).
In processing step 606, instruction 508 is received in instruction buffer 404A. Instruction issue control 408 detects the presence of tells bit 624 in instruction 508. In response to detecting the presence of tells bit 624, instruction issue control 408 increments one of the dependency counters in dependency counters 410. As described above, instruction issue control 408 increments dependency counter T10. Instruction issue control 408 then causes function decode and execution switch 406 to provide instruction 508 to one of execution units 412-416 for execution. Meanwhile, processing element 400 is also processing thread 604. Instruction buffer 404B receives instruction 512, in processing step 618. Instruction issue control 408 detects the existence of depends bit 626 in instruction 512. Depends bit 626 identifies instruction 512 as dependent on instructions in thread 602. In response to detecting depends bit 626, instruction issue control 408 checks dependency counter T10 in processing step 618. Since dependency counter T10 is above the threshold (as described above), instruction issue control 408 enables function decode and execution switch 406 to provide instruction 512 to one of execution units 412-416 for execution. Additionally, instruction issue control 408 decrements dependency counter T10 in dependency counters 410.
Meanwhile, processing element 400 receives instruction 510 in processing step 608. Instruction issue control 408 detects the existence of depends bit 628 in instruction buffer 404A. Depends bit 628 identifies instruction 510 as dependent on instructions in thread 604. In response to detecting depends bit 628, instruction issue control 408 checks dependency counter T01 in dependency counters 410. In this particular example, dependency counter T01 is equal to the threshold necessary to continue processing instruction 510. Since dependency counter T01 is not above the threshold, instruction issue control 408 suspends execution of thread 602 by holding instruction 510 in function decode and execution switch 406.
Processing element 400 continues processing thread 604, and receives instruction 514 in processing step 622. Instruction 514 includes tells bit 630, identifying thread 602 as dependent on the execution of instruction 514. Instruction issue control 408 increments dependency counter T01 in response to detecting tells bit 630, in processing step 622. Instruction issue control 408 causes function decode and execution switch 406 to send instruction 514 to one of execution units 412-416 for execution. After dependency counter T01 has been incremented in processing step 622, instruction issue control 408 detects that dependency counter T01 has been incremented. Instruction issue control 408 checks dependency counter T01 to determine if it is above the threshold. In the example of thread synchronization diagram 600, instruction issue control 408 determines that dependency counter T01 is above the threshold, in processing step 612. In response to detecting dependency counter T01 above the threshold, instruction issue control 408 resumes processing thread 602 by issuing instruction 510 to one of execution units 412-416, in processing step 614. Instruction 510 is executed, and instruction issue control 408 decrements dependency counter T01 in processing step 616.
FIG. 9 illustrates a process for executing instructions, according to an embodiment of the present invention. After method 900 starts in step 902, a processing element receives an instruction in a first thread, in step 904. In step 906, the processing element determines if the execution of the instruction in the first thread is dependent on the execution of instructions in a second thread.
If the processing element determines that the execution of the instruction in the first thread is not dependent on the execution of instructions in a second thread, method 900 ends in step 916.
If, on the other hand, the processing element determines that the execution of the instruction in the first thread is dependent on the execution of instructions in a second thread, the process of method 900 continues in step 908. In step 908, the processing element examines a dependency counter group that includes a dependency counter associated with the second thread.
In step 910, the processing element determines whether the dependency counter includes a value above a threshold. If the dependency counter includes a value above a threshold, method 900 continues in step 914. In step 914, the processing element executes the first thread instruction and decrements the dependency counter.
If, on the other hand, the processing element determines that the dependency counter does not include a value above a threshold, method 900 continues in step 912. In step 912, the processing element suspends execution of the first thread until the dependency counter is incremented to above the threshold. Once the dependency counter is incremented to above the threshold, processing of the first thread resumes, and method 900 continues in step 914. In step 914, the processing element executes the first thread instruction.
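For a single dependency counter, steps 906-916 of FIG. 9 reduce to the following sketch (the function name and return convention are ours, not the patent's):

```python
def method900_step(counter, depends_on_second_thread, threshold=0):
    """One instruction, one dependency counter: returns the action taken
    and the counter's new value."""
    if not depends_on_second_thread:       # step 906: no dependency
        return "end", counter              # step 916
    if counter > threshold:                # step 910: value above threshold?
        return "execute", counter - 1      # step 914: execute and decrement
    return "suspend", counter              # step 912: suspend first thread

action, value = method900_step(1, True)    # dependency already satisfied
```

A suspended thread simply re-enters at step 910 once the counter has been incremented by the second thread.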
FIG. 10 illustrates an exemplary process for executing instructions, according to an embodiment of the present invention. After method 1000 starts in step 1002, a processing element receives a first thread instruction, in step 1004. After the first thread instruction has been received, the processing element determines whether a second thread is dependent on the first thread instruction, in step 1006.
If a second thread is dependent on the execution of the first thread instruction, method 1000 continues in step 1008. In step 1008, the processing element increments a dependency counter included in a dependency counter group associated with the second thread. After the dependency counter is incremented, the processing element executes the first thread instruction, in step 1010.
If, on the other hand, the processing element determines that a second thread is not dependent on the first thread instruction, the process of method 1000 continues in step 1010. In step 1010, the processing element executes the first thread instruction.
After step 1010, method 1000 ends in step 1012.
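Method 1000 of FIG. 10 is the dependee-side counterpart of FIG. 9: check for tells bits, increment the identified threads' counters, then execute. A sketch under the same assumed dictionary representation (names are ours):

```python
def method1000_step(groups, first_thread, tells_threads):
    """Step 1006 asks whether any other thread depends on the first thread
    instruction; step 1008 increments each such thread's counter for the
    first thread; step 1010 executes the instruction."""
    for t in tells_threads:                # step 1008 (skipped when empty)
        groups[t][first_thread] += 1
    return "executed"                      # step 1010

g = {0: {1: 0}, 1: {0: 0}}
result = method1000_step(g, 0, [1])        # thread 0 tells thread 1
```

When `tells_threads` is empty, the loop body never runs and the method falls straight through to step 1010, matching the "not dependent" branch above.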
The present invention provides a system and method for high speed processing of network data elements. A network line module, such as network line module 1041, receives network data elements from a network or switch fabric via a network line module ingress port. The network line module provides the network data elements to a multiprocessor core. The received network data elements are distributed to multiple processing elements within the multiprocessor core for processing according to a program.
The processing elements process the network data elements according to program instructions stored in program memory. Each of the processing elements uses instruction-level parallelism to process multiple threads of instructions concurrently. Instruction execution is synchronized by recording dependencies between instructions and threads. Instructions in the threads can include dependence indicators identifying dependencies between instructions and threads. When a processing element encounters an instruction that includes dependence indicators identifying a dependent instruction or thread, the processing element checks, decrements, or increments one or more dependency counters that record dependency between instructions and threads. If an instruction in a first thread is dependent upon the execution of instructions in a second thread, a dependency counter is checked. If the dependency counter is not above a predetermined threshold, the processing element suspends the execution of the first thread until the dependency counter is incremented by the second thread to above the threshold.
After processing, the multiprocessor core provides processed network data elements to the network line module. The network line module provides the processed network data element to an egress port connected to a network or switch fabric.
It will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (26)

What is claimed is:
1. The apparatus of claim 7, wherein the instruction buffer comprises:
a first instruction buffer coupled to said instruction control unit, the first instruction buffer configured to hold the first instruction, and
a second instruction buffer coupled to said instruction control unit, the second instruction buffer configured to hold the second instruction; and wherein
the execution switch is coupled to said instruction control unit, said first instruction buffer, and said second instruction buffer.
2. The apparatus of claim 1, wherein said dependency counter includes a first counter associated with the first instruction buffer and a second counter associated with the second instruction buffer.
3. The apparatus of claim 1, wherein said instruction control unit identifies instruction dependency bits in said first instruction buffer, the instruction dependency bits being associated with instructions.
4. The apparatus of claim 1, said instruction control unit generating control signals based on the dependency bits and values included in said dependency counter.
5. The apparatus of claim 4, said execution switch providing instructions from said first instruction buffer to said execution unit based on control signals from said instruction control unit.
6. The apparatus of claim 1, said execution switch providing instructions from said first instruction buffer to said execution unit based on control signals from said instruction control unit.
7. An apparatus for processing instructions in multiple threads in an execution unit, comprising:
an instruction buffer configured to hold a first instruction and a second instruction, the first instruction being associated with a first thread, and the second instruction being associated with a second thread, the first instruction and the second instruction including one or more instruction dependency bits;
a dependency counter;
an instruction control unit coupled to said instruction buffer and said dependency counter, said instruction control unit configured to detect the instruction dependency bits and to increment and decrement said dependency counter in response to detecting the instruction dependency bits, said instruction control unit configured to disallow execution of the first instruction in response to said dependency counter including a value less than a threshold value; and
an execution switch coupled to said instruction control unit and said instruction buffer, said execution switch configured to send instructions to the execution unit.
8. The apparatus of claim 7, wherein said dependency counter includes a first counter associated with the first thread and a second counter associated with the second thread.
9. The apparatus of claim 7, wherein said instruction buffer includes the instruction dependency bits, the instruction dependency bits being associated with instructions.
10. The apparatus of claim 7, wherein said instruction control unit detects dependency between the first instruction and the second thread based on dependency bits in said instruction buffer and a value of said dependency counter.
11. A method for processing instructions in multiple threads, comprising:
receiving a first instruction associated with a first thread;
determining whether execution of the first instruction depends on execution of a second instruction, the second instruction being associated with a second thread;
examining a logic element associated with the first thread in response to said determining indicating that the first instruction depends on the execution of the second instruction, wherein the logic element comprises a single bi-state element or a tri-state element;
modifying the logic element in response to said examining indicating that the second instruction has already been executed;
executing the first instruction; and
suspending the processing of the first thread until said examining indicates that the second instruction has already been executed and then resuming processing.
12. The method of claim 11, further comprising suspending the processing of the first thread until said examining indicates that the second instruction has already been executed.
13. A method for processing instructions in multiple threads, comprising:
receiving a first instruction associated with a first thread;
determining whether execution of a second instruction depends on the execution of the first instruction, the second instruction being associated with a second thread;
incrementing a counter associated with the second thread in response to said determining indicating that execution of the second instruction depends on the execution of the first instruction;
executing the first instruction; and
suspending the processing of the second thread in response to the counter associated with the second thread not exceeding a threshold and resuming the processing of the second thread in response to the counter associated with the second thread exceeding the threshold;
wherein the first instruction and the second instruction include one or more instruction dependency bits.
14. The method of claim 13, further comprising suspending the processing of the second thread if the counter associated with the second thread does not exceed a threshold.
15. A method for processing instructions in multiple threads, comprising:
receiving a first instruction associated with a first thread, the first instruction including one or more instruction dependency bits;
determining whether a second thread depends on said first instruction;
incrementing a counter associated with the second thread in response to the second thread depending on said first instruction;
loading a second instruction associated with a second thread; and
processing the second instruction in a manner related to the value of the counter associated with the second thread; and
suspending the processing of the second thread in response to the counter not exceeding a threshold and resuming the processing of the second thread in response to the counter exceeding said threshold.
16. The method of claim 15, further comprising suspending the processing of the second thread if the counter indicates that a dependent thread has not been executed.
17. The method of claim 15, further comprising executing the second instruction if the counter indicates that said first instruction has been executed.
18. An apparatus for processing instructions in multiple threads, comprising:
an instruction buffer configured to hold a first instruction and a second instruction, the first instruction including a dependency indicator and being associated with a first thread, and the second instruction including a dependency indicator and being associated with a second thread;
an instruction control unit coupled to said instruction buffer;
a dependency counter coupled to said instruction control unit, said dependency counter associated with the first thread;
said instruction control unit configured to detect the dependency indicators and to increment and decrement said dependency counter in response to detecting the dependency indicators; and
said instruction control unit configured to disallow execution of the first instruction in response to said dependency counter including a value less than a threshold value.
19. The apparatus of claim 18, wherein said instruction control unit is configured to determine that the dependency indicator included in the first instruction indicates that the second thread includes an instruction on which the first instruction depends.
20. The apparatus of claim 18, wherein the dependency indicator included in the first instruction is a depends bit.
21. The apparatus of claim 18, wherein said instruction control unit is configured to determine that the dependency indicator included in the second instruction indicates that the first thread includes an instruction that is dependent on the second instruction.
22. The apparatus of claim 18, wherein the dependency indicator included in the second instruction is a tells bit.
23. The apparatus of claim 25, wherein said instruction control unit is configured to increment said dependency counter in response to detecting the dependency indicator included in the second instruction.
24. The apparatus of claim 25, wherein said instruction control unit is configured to decrement said dependency counter in response to detecting the dependency indicator included in the first instruction.
25. The apparatus according to claim 18, wherein said dependency counter is coupled to said instruction control unit and associated with the first thread.
26. A method for processing instructions in multiple threads, comprising:
receiving a first instruction associated with a first thread;
determining that execution of the first instruction depends on execution of a second instruction, the second instruction being associated with a second thread;
examining a dependency counter associated with the first thread to determine whether the second instruction has already been executed;
incrementing the dependency counter in response to said determining indicating that execution of the first instruction depends on execution of the second instruction; and
suspending the processing of the first thread when examining indicates that the dependency counter does not exceed a threshold and resuming the processing after the dependency counter exceeds said threshold,
wherein the first instruction and the second instruction include one or more instruction dependency bits.
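As a rough software model of the depends/tells bits named in claims 20 and 22, the sketch below shows how a control unit might update and test a per-thread counter when deciding whether to issue an instruction. The `Instruction` encoding and the `step` function are hypothetical illustrations, not the patented hardware design.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    # Hypothetical encoding of the dependency indicators in the claims:
    # a "depends" bit on the waiting instruction and a "tells" bit on
    # the instruction whose completion satisfies the dependency.
    opcode: str
    depends: bool = False   # must wait on the dependency counter
    tells: bool = False     # increments the dependency counter

def step(instr, counter, threshold=0):
    """Model of the control unit's decision for one instruction.

    Returns (issue, counter): issue is False when the thread must be
    suspended because the counter does not exceed the threshold."""
    if instr.tells:
        counter += 1                   # satisfy one dependency
    if instr.depends:
        if counter <= threshold:
            return False, counter      # disallow execution; suspend thread
        counter -= 1                   # dependency consumed
    return True, counter
```

Run against the claimed behavior: a depends-bit instruction is held while the counter is at the threshold, and issues once a tells-bit instruction has raised the counter above it.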
US11/862,815 | 2001-04-13 | 2007-09-27 | System and method for instruction-level parallelism in a programmable multiple network processor environment | Expired - Lifetime | USRE44129E1 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US11/862,815 | USRE44129E1 (en) | 2001-04-13 | 2007-09-27 | System and method for instruction-level parallelism in a programmable multiple network processor environment

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US09/833,580 | US6950927B1 (en) | 2001-04-13 | 2001-04-13 | System and method for instruction-level parallelism in a programmable multiple network processor environment
US11/862,815 | USRE44129E1 (en) | 2001-04-13 | 2007-09-27 | System and method for instruction-level parallelism in a programmable multiple network processor environment

Related Parent Applications (1)

Application Number | Title | Priority Date | Filing Date
US09/833,580 | Reissue | US6950927B1 (en) | 2001-04-13 | 2001-04-13 | System and method for instruction-level parallelism in a programmable multiple network processor environment

Publications (1)

Publication Number | Publication Date
USRE44129E1 (en) | 2013-04-02

Family

ID=34992770

Family Applications (2)

Application NumberTitlePriority DateFiling Date
US09/833,580CeasedUS6950927B1 (en)2001-04-132001-04-13System and method for instruction-level parallelism in a programmable multiple network processor environment
US11/862,815Expired - LifetimeUSRE44129E1 (en)2001-04-132007-09-27System and method for instruction-level parallelism in a programmable multiple network processor environment

Family Applications Before (1)

Application NumberTitlePriority DateFiling Date
US09/833,580CeasedUS6950927B1 (en)2001-04-132001-04-13System and method for instruction-level parallelism in a programmable multiple network processor environment

Country Status (1)

Country | Link
US (2) | US6950927B1 (en)


Families Citing this family (60)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6968447B1 (en)2001-04-132005-11-22The United States Of America As Represented By The Secretary Of The NavySystem and method for data forwarding in a programmable multiple network processor environment
US7225301B2 (en)*2002-11-222007-05-29Quicksilver TechnologiesExternal memory controller node
US7574439B2 (en)*2004-05-202009-08-11International Business Machines CorporationManaging a nested request
US20060212450A1 (en)*2005-03-182006-09-21Microsoft CorporationTemporary master thread
US8645959B2 (en)*2005-03-302014-02-04Intel CorporaitonMethod and apparatus for communication between two or more processing elements
CN101449256B (en)2006-04-122013-12-25索夫特机械公司Apparatus and method for processing instruction matrix specifying parallel and dependent operations
US8766995B2 (en)*2006-04-262014-07-01Qualcomm IncorporatedGraphics system with configurable caches
US20070260856A1 (en)*2006-05-052007-11-08Tran Thang MMethods and apparatus to detect data dependencies in an instruction pipeline
US20070268289A1 (en)*2006-05-162007-11-22Chun YuGraphics system with dynamic reposition of depth engine
US8884972B2 (en)*2006-05-252014-11-11Qualcomm IncorporatedGraphics processor with arithmetic and elementary function units
US8869147B2 (en)*2006-05-312014-10-21Qualcomm IncorporatedMulti-threaded processor with deferred thread output control
US8644643B2 (en)*2006-06-142014-02-04Qualcomm IncorporatedConvolution filtering in a graphics processor
US8766996B2 (en)*2006-06-212014-07-01Qualcomm IncorporatedUnified virtual addressed register file
US8291431B2 (en)*2006-08-292012-10-16Qualcomm IncorporatedDependent instruction thread scheduling
DE112006004005T5 (en)*2006-10-272009-06-10Intel Corporation, Santa Clara Communication between multiple execution sequences in a processor
EP2527972A3 (en)2006-11-142014-08-06Soft Machines, Inc.Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
US9317290B2 (en)*2007-05-042016-04-19Nvidia CorporationExpressing parallel execution relationships in a sequential programming language
US8200947B1 (en)*2008-03-242012-06-12Nvidia CorporationSystems and methods for voting among parallel threads
WO2009118731A2 (en)2008-03-272009-10-01Rocketick Technologies LtdDesign simulation using parallel processors
US9032377B2 (en)*2008-07-102015-05-12Rocketick Technologies Ltd.Efficient parallel computation of dependency problems
US8564616B1 (en)2009-07-172013-10-22Nvidia CorporationCull before vertex attribute fetch and vertex lighting
US8542247B1 (en)2009-07-172013-09-24Nvidia CorporationCull before vertex attribute fetch and vertex lighting
US8468539B2 (en)*2009-09-032013-06-18International Business Machines CorporationTracking and detecting thread dependencies using speculative versioning cache
US8976195B1 (en)2009-10-142015-03-10Nvidia CorporationGenerating clip state for a batch of vertices
US8384736B1 (en)2009-10-142013-02-26Nvidia CorporationGenerating clip state for a batch of vertices
US10241799B2 (en)2010-07-162019-03-26Qualcomm IncorporatedOut-of-order command execution with sliding windows to maintain completion statuses
US9830157B2 (en)*2010-08-182017-11-28Wisconsin Alumni Research FoundationSystem and method for selectively delaying execution of an operation based on a search for uncompleted predicate operations in processor-associated queues
KR101685247B1 (en)2010-09-172016-12-09소프트 머신즈, 인크.Single cycle multi-branch prediction including shadow cache for early far branch prediction
KR101966712B1 (en)2011-03-252019-04-09인텔 코포레이션Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR101620676B1 (en)2011-03-252016-05-23소프트 머신즈, 인크.Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9766893B2 (en)2011-03-252017-09-19Intel CorporationExecuting instruction sequence code blocks by using virtual cores instantiated by partitionable engines
EP2710480B1 (en)2011-05-202018-06-20Intel CorporationAn interconnect structure to support the execution of instruction sequences by a plurality of engines
US9940134B2 (en)2011-05-202018-04-10Intel CorporationDecentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US9529596B2 (en)*2011-07-012016-12-27Intel CorporationMethod and apparatus for scheduling instructions in a multi-strand out of order processor with instruction synchronization bits and scoreboard bits
US10191746B2 (en)2011-11-222019-01-29Intel CorporationAccelerated code optimizer for a multiengine microprocessor
CN104040491B (en)2011-11-222018-06-12英特尔公司 Microprocessor-accelerated code optimizer
WO2013101229A1 (en)*2011-12-302013-07-04Intel CorporationStructure access processors, methods, systems, and instructions
US20130246761A1 (en)*2012-03-132013-09-19International Business Machines CorporationRegister sharing in an extended processor architecture
US9400653B2 (en)2013-03-142016-07-26Samsung Electronics Co., Ltd.System and method to clear and rebuild dependencies
WO2014150806A1 (en)2013-03-152014-09-25Soft Machines, Inc.A method for populating register view data structure by using register template snapshots
WO2014151043A1 (en)2013-03-152014-09-25Soft Machines, Inc.A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10275255B2 (en)2013-03-152019-04-30Intel CorporationMethod for dependency broadcasting through a source organized source view data structure
US9569216B2 (en)2013-03-152017-02-14Soft Machines, Inc.Method for populating a source view data structure by using register template snapshots
EP2972845B1 (en)2013-03-152021-07-07Intel CorporationA method for executing multithreaded instructions grouped onto blocks
US9904625B2 (en)2013-03-152018-02-27Intel CorporationMethods, systems and apparatus for predicting the way of a set associative cache
WO2014150991A1 (en)2013-03-152014-09-25Soft Machines, Inc.A method for implementing a reduced size register view data structure in a microprocessor
WO2014150971A1 (en)2013-03-152014-09-25Soft Machines, Inc.A method for dependency broadcasting through a block organized source view data structure
US10140138B2 (en)2013-03-152018-11-27Intel CorporationMethods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US9811342B2 (en)2013-03-152017-11-07Intel CorporationMethod for performing dual dispatch of blocks and half blocks
US9886279B2 (en)2013-03-152018-02-06Intel CorporationMethod for populating and instruction view data structure by using register template snapshots
WO2014142704A1 (en)2013-03-152014-09-18Intel CorporationMethods and apparatus to compile instructions for a vector of instruction pointers processor architecture
US9891924B2 (en)2013-03-152018-02-13Intel CorporationMethod for implementing a reduced size register view data structure in a microprocessor
JP6515771B2 (en)*2015-10-072019-05-22富士通コネクテッドテクノロジーズ株式会社 Parallel processing device and parallel processing method
US10339063B2 (en)*2016-07-192019-07-02Advanced Micro Devices, Inc.Scheduling independent and dependent operations for processing
GB2576457B (en)*2017-06-162020-09-23Imagination Tech LtdQueues for inter-pipeline data hazard avoidance
GB2563582B (en)2017-06-162020-01-01Imagination Tech LtdMethods and systems for inter-pipeline data hazard avoidance
US10564989B2 (en)2017-11-282020-02-18Microsoft Technology LicensingThread independent parametric positioning for rendering elements
US10424041B2 (en)*2017-12-112019-09-24Microsoft Technology Licensing, LlcThread independent scalable vector graphics operations
US11740907B2 (en)*2020-03-162023-08-29Arm LimitedSystems and methods for determining a dependency of instructions
US11740908B2 (en)*2020-03-162023-08-29Arm LimitedSystems and methods for defining a dependency of preceding and succeeding instructions


Patent Citations (38)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US4941143A (en)1987-11-101990-07-10Echelon Systems Corp.Protocol for network having a plurality of intelligent cells
US5274809A (en)1988-05-261993-12-28Hitachi, Ltd.Task execution control method for a multiprocessor system with enhanced post/wait procedure
US5029169A (en)1989-07-111991-07-02Bell Communications Research, Inc.Methods and apparatus for fault detection
US5522083A (en)1989-11-171996-05-28Texas Instruments IncorporatedReconfigurable multi-processor operating in SIMD mode with one processor fetching instructions for use by remaining processors
US5197130A (en)1989-12-291993-03-23Supercomputer Systems Limited PartnershipCluster architecture for a highly parallel scalar/vector multiprocessor system
US5640524A (en)1989-12-291997-06-17Cray Research, Inc.Method and apparatus for chaining vector instructions
US5488730A (en)*1990-06-291996-01-30Digital Equipment CorporationRegister conflict scoreboard in pipelined computer using pipelined reference counts
US5524212A (en)1992-04-271996-06-04University Of WashingtonMultiprocessor system with write generate method for updating cache
US5805915A (en)1992-05-221998-09-08International Business Machines CorporationSIMIMD array processing system
US5978855A (en)1994-05-271999-11-02Bell Atlantic Network Services, Inc.Downloading applications software through a broadcast channel
US5623670A (en)1995-02-171997-04-22Lucent Technologies Inc.Method and apparatus for crash safe enforcement of mutually exclusive access to shared resources in a multitasking computer system
US5710902A (en)*1995-09-061998-01-20Intel CorporationInstruction dependency chain indentifier
US5761474A (en)*1996-05-241998-06-02Hewlett-Packard Co.Operand dependency tracking system and method for a processor that executes instructions out of order
US5913925A (en)*1996-12-161999-06-22International Business Machines CorporationMethod and system for constructing a program including out-of-order threads and processor and method for executing threads out-of-order
US5978900A (en)*1996-12-301999-11-02Intel CorporationRenaming numeric and segment registers using common general register pool
US6016540A (en)*1997-01-082000-01-18Intel CorporationMethod and apparatus for scheduling instructions in waves
US6065105A (en)*1997-01-082000-05-16Intel CorporationDependency matrix
US5964841A (en)1997-03-031999-10-12Cisco Technology, Inc.Technique for handling forwarding transients with link state routing protocol
US6065112A (en)1997-06-182000-05-16Matsushita Electric Industrial Co., Ltd.Microprocessor with arithmetic processing units and arithmetic execution unit
US5850533A (en)*1997-06-251998-12-15Sun Microsystems, Inc.Method for enforcing true dependencies in an out-of-order processor
US6044438A (en)1997-07-102000-03-28International Business Machiness CorporationMemory controller for controlling memory accesses across networks in distributed shared memory processing systems
US6493804B1 (en)1997-10-012002-12-10Regents Of The University Of MinnesotaGlobal file system and data storage device locks
US6463522B1 (en)*1997-12-162002-10-08Intel CorporationMemory system for ordering load and store instructions in a processor that performs multithread execution
US6182210B1 (en)*1997-12-162001-01-30Intel CorporationProcessor having multiple program counters and trace buffers outside an execution pipeline
US6493820B2 (en)*1997-12-162002-12-10Intel CorporationProcessor having multiple program counters and trace buffers outside an execution pipeline
US6108770A (en)*1998-06-242000-08-22Digital Equipment CorporationMethod and apparatus for predicting memory dependence using store sets
US6496871B1 (en)1998-06-302002-12-17Nec Research Institute, Inc.Distributed agent software system and method having enhanced process mobility and communication in a computer network
US6334182B2 (en)*1998-08-182001-12-25Intel CorpScheduling operations using a dependency matrix
US6212623B1 (en)*1998-08-242001-04-03Advanced Micro Devices, Inc.Universal dependency vector/queue entry
US6763519B1 (en)1999-05-052004-07-13Sychron Inc.Multiprogrammed multiprocessor system with lobally controlled communication and signature controlled scheduling
US6567840B1 (en)1999-05-142003-05-20Honeywell Inc.Task scheduling and message passing
US6557095B1 (en)*1999-12-272003-04-29Intel CorporationScheduling operations using a dependency matrix
US6629233B1 (en)2000-02-172003-09-30International Business Machines CorporationSecondary reorder buffer microprocessor
US20010023479A1 (en)2000-03-162001-09-20Michihide KimuraInformation processing unit, and exception processing method for specific application-purpose operation instruction
US6782469B1 (en)*2000-09-292004-08-24Intel CorporationRuntime critical load/data ordering
US20020138714A1 (en)*2001-03-222002-09-26Sun Microsystems, Inc.Scoreboard for scheduling of instructions in a microprocessor that provides out of order execution
US6968447B1 (en)2001-04-132005-11-22The United States Of America As Represented By The Secretary Of The NavySystem and method for data forwarding in a programmable multiple network processor environment
US6978459B1 (en)2001-04-132005-12-20The United States Of America As Represented By The Secretary Of The NavySystem and method for processing overlapping tasks in a programmable network processor environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou et al., "Thread Scheduling for Out-of-Core Applications with Memory Server on Multicomputers," IOPADS '99: Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, 11 pages, 1999.*

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20150035843A1 (en)* | 2013-07-31 | 2015-02-05 | Sandia Corporation | Graphics processing unit management system for computed tomography
US10062135B2 (en)* | 2013-07-31 | 2018-08-28 | National Technology & Engineering Solutions Of Sandia, LLC | Graphics processing unit management system for computed tomography

Also Published As

Publication number | Publication date
US6950927B1 (en) | 2005-09-27

Similar Documents

Publication | Publication Date | Title
USRE44129E1 (en)System and method for instruction-level parallelism in a programmable multiple network processor environment
US6978459B1 (en)System and method for processing overlapping tasks in a programmable network processor environment
USRE43825E1 (en)System and method for data forwarding in a programmable multiple network processor environment
US7965624B2 (en)Data link fault tolerance
US6330584B1 (en)Systems and methods for multi-tasking, resource sharing and execution of computer instructions
US5524250A (en)Central processing unit for processing a plurality of threads using dedicated general purpose registers and masque register for providing access to the registers
US8055879B2 (en)Tracking network contention
US9110714B2 (en)Systems and methods for multi-tasking, resource sharing, and execution of computer instructions
US8752051B2 (en)Performing an allreduce operation using shared memory
US7673011B2 (en)Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks
US9882801B2 (en)Providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer
US7376952B2 (en)Optimizing critical section microblocks by controlling thread execution
KR20040010789A (en)A software controlled content addressable memory in a general purpose execution datapath
US8270295B2 (en)Reassigning virtual lane buffer allocation during initialization to maximize IO performance
US20090125703A1 (en)Context Switching on a Network On Chip
US20090040946A1 (en)Executing an Allgather Operation on a Parallel Computer
US20060136681A1 (en)Method and apparatus to support multiple memory banks with a memory block
US10558485B2 (en)Information processing apparatus and method for shifting buffer
EP0473777A4 (en)High-speed packet switching apparatus and method
US9246792B2 (en)Providing point to point communications among compute nodes in a global combining network of a parallel computer
US7702717B2 (en)Method and apparatus for controlling management agents in a computer system on a packet-switched input/output network
US8296457B2 (en)Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer
US20040246956A1 (en)Parallel packet receiving, routing and forwarding
US7280539B2 (en)Data driven type information processing apparatus
US8418129B1 (en)Method for automatically generating code to define a system of hardware elements

Legal Events

Date | Code | Title | Description
CC | Certificate of correction
FPAY | Fee payment

Year of fee payment: 12

