US5636353A - Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing - Google Patents


Info

Publication number
US5636353A
US5636353A
Authority
US
United States
Prior art keywords
instruction
data
executing
pipeline
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US08/225,265
Inventor
Chikako Ikenaga
Hideki Ando
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to US08/225,265 (US5636353A)
Priority to US08/865,308 (US6233670B1)
Application granted
Publication of US5636353A
Anticipated expiration
Expired - Fee Related (current legal status)


Abstract

Disclosed is an improved superscalar processor that reduces the time required for execution of an instruction. The superscalar processor includes an instruction fetching stage, an instruction decoding stage, and function units each having a pipeline structure. A function unit includes an execution stage, a memory access stage, and a write back stage. The function units are connected through a newly provided bypass line. Data obtained by preceding execution in another function unit (the other pipeline) is applied through the bypass line to the function unit (pipeline) executing a later instruction. Because executed data is transmitted between pipelines without passing through a register file, the pipeline requesting the executed data need not wait for termination of execution in the other pipeline. As a result, the time required for execution of an instruction is reduced.

Description

This application is a continuation of application Ser. No. 07/828,277 filed Jan. 30, 1992 now abandoned.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to superscalar processors and, more particularly, to a superscalar processor capable of directly transferring data used in a plurality of instructions executed in parallel between pipelines.
2. Description of the Background Art
"A superscalar" is known as one of the architectures for increasing the processing speed of a microprocessor. Instructions which can be executed simultaneously are detected out of given plurality of instructions, and the detected instructions are processed simultaneously or in parallel by a plurality of pipelines in a microprocessor using a superscalar.
FIG. 7 is a block diagram of a superscalar processor illustrating the background of the present invention. Referring to FIG. 7, a superscalar processor 20 includes an instruction fetching stage 2 for fetching a plurality of instructions stored in an instruction memory 1, an instruction decoding stage 3 for decoding the instructions fetched in instruction fetching stage 2, function units 14 to 17 each having a pipeline structure, and a register file 9 for temporarily holding data used for executing the instructions. Function units 14 to 17 can access an external data memory 8 through a data bus 11. Register file 9 is implemented with a RAM and is accessed from function units 14 to 17.
Instruction fetching stage 2 includes a program counter (not shown) and gives an address signal generated from the program counter to instruction memory 1. The plurality of instructions designated by the given address signal are fetched and held in instruction fetching stage 2.
Instruction decoding stage 3 receives the plurality of instructions from instruction fetching stage 2 and decodes them. Simultaneously executable instructions are detected out of the given plurality of instructions by decoding the instructions. In addition, instruction decoding stage 3 relays data between function units 14 to 17 and register file 9. Specifically, instruction decoding stage 3 reads data to be used by function units 14 to 17 for executing the given instructions from register file 9 and gives the read data to function units 14 to 17.
Each of function units 14 to 17 has a pipeline structure. Specifically, superscalar processor 20 has four pipelines implemented with the four function units 14 to 17.
The four function units 14 to 17 perform predetermined arithmetic operations, for example as follows. Function units 14 and 15 perform integer arithmetic operations. Function unit 16 carries out loading and storing of data into data memory 8. Function unit 17 performs floating-point arithmetic operations. Each of function units 14 and 15 includes an execution stage (EXC) and a write back stage (WB) to register file 9. Function unit 16 includes an address processing stage (ADR), a memory accessing stage (MEM), and a write back stage (WB). Function unit 17 includes three execution stages (EX1, EX2, EX3) and a write back stage (WB). Generally, the execution stages perform arithmetic operations and address calculation, while the memory access stage performs reading/writing from/into data memory 8.
Superscalar processor 20 operates in response to externally applied two-phase non-overlap clock signals φ1 and φ2. Specifically, instruction fetching stage 2, instruction decoding stage 3, and the various stages in function units 14 to 17 are operated in response to clock signals φ1 and φ2 under pipeline control. An example of two-phase non-overlap clock signals is illustrated in FIG. 6.
In operation, instruction decoding stage 3 detects simultaneously executable instructions out of a given plurality of instructions and gives the detected instructions to function units 14 to 17 (or, according to circumstances, to some of function units 14 to 17). Function units 14 to 17 have a pipeline structure, so that they can execute the given instructions simultaneously or in parallel.
Now, it is assumed that a superscalar processor has three function units (pipelines), and each function unit has an execution stage (EXC), a memory access stage (MEM), and a write back stage (WB). An example of the progress of pipeline processing in this case is illustrated in FIG. 8A. Referring to FIG. 8A, it is assumed that three pipelines PL1, PL2, and PL3 execute instructions 1, 2, and 3, respectively. In pipeline PL1, processing in instruction fetching stage 2 is performed in a period T1, and processing in instruction decoding stage 3 is performed in a period T2. Processing in the execution stage, the memory access stage, and the write back stage is executed in periods T3, T4, and T5, respectively. In pipeline PL2, on the other hand, processing in instruction fetching stage 2 is started in period T2, and the remaining stages (ID, EXC, MEM, WB) are performed in periods T3 to T6, respectively, as in pipeline PL1. In pipeline PL3, after processing in instruction fetching stage 2 is started in period T3, processing in the respective stages is performed in periods T4 to T7. As seen from FIG. 8A, each of pipelines PL1 to PL3 executes a corresponding one of the given instructions 1 to 3, so the respective stages proceed simultaneously and in parallel. However, a problem arises from the viewpoint of processing time in the following case.
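The stage progression of FIG. 8A can be sketched as a small timetable model (a hypothetical Python sketch, not part of the patent; the stage names and period numbering follow the figure):

```python
# Hypothetical model of FIG. 8A: three independent instructions issued
# one period apart, each passing through IF, ID, EXC, MEM, WB in turn.
STAGES = ["IF", "ID", "EXC", "MEM", "WB"]

def schedule(issue_period):
    """Map each stage to the period (T1, T2, ...) in which it runs."""
    return {stage: issue_period + i for i, stage in enumerate(STAGES)}

# Pipelines PL1, PL2, PL3 start fetching in periods T1, T2, T3.
pl1, pl2, pl3 = schedule(1), schedule(2), schedule(3)

print(pl1)          # {'IF': 1, 'ID': 2, 'EXC': 3, 'MEM': 4, 'WB': 5}
print(pl3["WB"])    # 7 -> instruction 3 writes back in period T7
```

In period T3, for example, PL1 is in EXC, PL2 is in ID, and PL3 is in IF, which is the overlap the figure illustrates.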
Referring to FIG. 8B, it is assumed that two instructions 11 and 12 are given, and they are processed by pipelines PL1 and PL2. In addition, it is assumed that the data of a result obtained by executing instruction 11 is used in processing of instruction 12. In other words, it is assumed that instruction 12, which executes its own processing using the data obtained by executing instruction 11, is given.
Conventionally, instruction 11 is executed and terminated first in such a case. Specifically, in pipeline PL1, instruction fetching stage 2 is executed in period T1, and instruction decoding stage 3 is executed in period T2. The execution stage, the memory access stage, and the write back stage are executed in periods T3, T4, and T5, respectively. Data obtained by executing instruction 11 is once stored in register file 9 illustrated in FIG. 7 according to execution of the write back stage. On the other hand, in pipeline PL2, instruction fetching stage 2 is executed in period T2, and instruction decoding stage 3 is executed in period T3. However, execution of instruction 12 is stopped in periods T4 and T5. The reason is that instruction 12 uses the data obtained by executing instruction 11 as described above, so that it must wait for termination of execution of instruction 11. Accordingly, processing in pipeline PL2 is stopped until the write back stage in pipeline PL1 terminates in period T5. In other words, pipeline PL2 is brought to a standby state (pipeline interlock) in periods T4 and T5.
After period T5, the data obtained by executing instruction 11 is stored in register file 9. Therefore, execution of instruction 12 is restarted in pipeline PL2 in period T6. Specifically, after instruction decoding stage 3 is executed in period T6, the execution stage, the memory access stage, and the write back stage are executed in periods T7 to T9, respectively.
As described above, after the data obtained by executing instruction 11 is once written in register file 9, register file 9 is accessed in processing of another instruction 12. In other words, the data obtained by executing processing in one pipeline PL1 is given to another pipeline PL2 through register file 9. However, as illustrated in FIG. 8B, although the data obtained by executing instruction 11 is already available after processing in the execution stage in period T3, transmission of data between the two pipelines PL1 and PL2 is performed through register file 9, so that pipeline PL2 must wait for termination of execution of the write back stage in pipeline PL1. As a result, a long time is required to complete execution of the instruction. In other words, the processing speed of the superscalar processor is reduced.
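The cost of the interlock can be checked with a hypothetical period calculation (a Python sketch of FIG. 8B, not part of the patent; one period per stage is assumed):

```python
# Hypothetical sketch of the interlock in FIG. 8B: instruction 12 needs
# the result of instruction 11, which only reaches the register file
# after pipeline PL1's write back stage completes.

def completion_period(issue, wb_of_producer=None):
    """Period in which an instruction's write back stage finishes.
    Stages IF, ID, EXC, MEM, WB take one period each.  If the
    instruction depends on a producer, its ID stage must be re-run in
    the period after the producer's WB (register-file transfer)."""
    if wb_of_producer is None:
        return issue + 4                 # IF..WB over five periods
    restart = wb_of_producer + 1         # ID restarts after producer WB
    return restart + 3                   # ID, EXC, MEM, WB

wb_11 = completion_period(issue=1)                     # instruction 11
wb_12 = completion_period(issue=2, wb_of_producer=wb_11)
print(wb_11, wb_12)  # 5 9 -> instruction 12 only completes in T9
```

The two stalled periods T4 and T5 are exactly the gap between the producer's EXC result (T3) and its WB (T5).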
SUMMARY OF THE INVENTION
An object of the present invention is to reduce the time required for executing an instruction using the data obtained by executing another instruction in a superscalar processor capable of executing two or more instructions simultaneously or in parallel.
Another object of the present invention is to prevent pipeline interlock in a superscalar processor capable of executing two or more instructions simultaneously or in parallel.
To be brief, a superscalar processor according to the present invention executes a first plurality of instructions given earlier and a second plurality of instructions given later under pipeline control, using data stored in a data storage circuit. Each instruction includes a source address indicating an address in the data storage circuit in which data to be used for executing the instruction is stored and a destination address indicating an address in the data storage circuit in which the executed data should be stored. The superscalar processor includes a simultaneously executable instruction detecting circuit for detecting simultaneously executable instructions out of the given first or second plurality of instructions and a plurality of pipeline processing executing circuits for executing the respective instructions detected by the simultaneously executable instruction detecting circuit simultaneously and in parallel under pipeline control. Each of the pipeline processing executing circuits includes an earlier instruction executing circuit and a later instruction executing circuit for sequentially executing the given instructions under pipeline control. The later instruction executing circuit provided in at least one of the plurality of pipeline processing executing circuits includes an address coincidence detecting circuit for detecting coincidence between the source address included in a given instruction and the destination address included in the instruction executed in the earlier instruction executing circuit in another pipeline processing executing circuit, and a direct application circuit responsive to the address coincidence detecting circuit for applying the data executed in the earlier instruction executing circuit in another pipeline processing executing circuit directly to the later instruction executing circuit provided in the at least one pipeline processing executing circuit.
In operation, in response to the address coincidence detecting circuit, the direct application circuit applies the data executed in the earlier instruction executing circuit in another pipeline processing executing circuit directly to the later instruction executing circuit provided in at least one pipeline processing executing circuit. Specifically, the later instruction executing circuit in the at least one pipeline processing executing circuit directly receives the data executed in the earlier instruction executing circuit in another pipeline processing executing circuit without going through the data storage circuit, so that it is possible to complete the instruction in a short time.
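The direct-application rule above reduces to an address match: if a source address of the later instruction coincides with the destination address of a result still in flight, that result is taken directly instead of the register-file copy. A hypothetical Python sketch of this decision (the function and variable names are illustrative, not from the patent):

```python
# Hypothetical sketch of the direct application rule: an operand whose
# source address matches an in-flight destination address is taken
# from the bypass; everything else comes from the data storage circuit.

def select_operand(source_addr, in_flight, register_file):
    """in_flight maps destination address -> just-executed data."""
    if source_addr in in_flight:
        return in_flight[source_addr]    # bypassed result, no WB wait
    return register_file[source_addr]    # normal register-file read

regs = {3: 10, 4: 20}
bypass = {3: 99}   # another pipeline just produced register 3's value
print(select_operand(3, bypass, regs))  # 99 (forwarded result)
print(select_operand(4, bypass, regs))  # 20 (from the register file)
```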
The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a superscalar processor illustrating an embodiment of the present invention.
FIG. 2 is a block diagram of one of the function units illustrated in FIG. 1.
FIG. 3 is a block diagram of the address comparator illustrated in FIG. 2.
FIG. 4 is a block diagram of the data selector illustrated in FIG. 2.
FIG. 5 is a block diagram of the arithmetic operation executing unit illustrated in FIG. 2.
FIG. 6 is a timing chart of a two-phase non-overlap clock signal.
FIG. 7 is a block diagram of a superscalar processor illustrating the background of the present invention.
FIGS. 8A, 8B, and 8C are timing charts illustrating progress of pipeline processing.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to FIG. 1, a superscalar processor 10 includes an instruction fetching stage 2, an instruction decoding stage 3, improved function units 4, 5, 6, and 7, a register file 9, and a newly provided bypass line 12. Each of function units 4 to 7 includes an execution stage (EXC) 41, a memory access stage (MEM) 42, and a write back stage (WB) 43. Superscalar processor 10 is operated in response to externally applied two-phase non-overlap clock signals φ1 and φ2 under pipeline control. The basic operation is the same as that of the conventional superscalar processor 20 illustrated in FIG. 7, so its description will be omitted.
Bypass line 12 is provided between the respective function units 4 to 7, and data obtained in the execution stage in the function units is transmitted through bypass line 12. In addition, although not particularly illustrated, the destination addresses included in the instructions processed in the respective function units are transmitted between function units 4 to 7.
One of the improved function units illustrated in FIG. 1 is illustrated in FIG. 2. Referring to FIG. 2, the improved function unit includes an execution stage 41 for executing an instruction, a memory access stage 42 for accessing data memory 8 or register file 9, and a write back stage 43 for writing the data obtained by executing the instruction into register file 9.
Execution stage 41 includes an address comparator 80 for comparing a source address and a destination address, a data selector 83 responsive to signals S11 to S18 and S21 to S28 indicating comparison results for selecting data to be used in an arithmetic operation, and an arithmetic operation executing unit 84 for executing the arithmetic operation on the basis of the selected data. An instruction given from decoding stage 3 includes an instruction code OP, two source addresses SA1 and SA2, and a destination address DA. Address comparator 80 receives the source addresses SA1 and SA2 included in the given instruction. In addition, address comparator 80 receives destination addresses DAb1, DAb2, DAc1, DAc2, DAd1, and DAd2 included in instructions handled in the other function units 5, 6, and 7 illustrated in FIG. 1. Furthermore, address comparator 80 receives a destination address DAa1 in execution stage 41 and a destination address DAa2 in memory access stage 42. Address comparator 80 compares source addresses SA1 and SA2 with destination addresses DAa1 to DAd2 and detects coincidence between them, the details of which will be described later.
Destination register 81 holds a destination address DA given from decoding stage 3. The held destination address DAa1 is given to address comparator 80 and to destination register 82 in memory access stage 42.
Data selector 83 is connected to receive two data DT1 and DT2 given by register file 9 through decoding stage 3. Data DT1 and DT2 are required by the instruction in execution stage 41. In addition, data selector 83 receives data Db1, Db2, Dc1, Dc2, Dd1, and Dd2 having been executed in the execution stages in the other function units 5, 6, and 7 illustrated in FIG. 1 through the newly provided bypass line 12. Furthermore, data selector 83 receives data Da1 held by data register 86 in memory access stage 42 and data Da2 held by data register 85 in execution stage 41. Data selector 83 is operated in response to selection signals S11 to S18 and S21 to S28 given from address comparator 80, the details of which will be described later.
Arithmetic operation executing unit 84 is connected through data lines 31 and 32 to data selector 83. Data selected by data selector 83 is applied through data lines 31 and 32 to arithmetic operation executing unit 84, and the arithmetic operation based on the given instruction is executed there. Data indicating the execution result of the arithmetic operation is held by data register 85.
Memory access stage 42 includes a destination register 82 for holding a destination address and a data register 86 for holding data indicating an arithmetic operation result. Write back stage 43 receives the destination address held by destination register 82 and the executed data held by data register 86. Write back stage 43 writes the executed data into register file 9 in accordance with the given destination address.
An example of the address comparator illustrated in FIG. 2 is illustrated in FIG. 3. Referring to FIG. 3, address comparator 80 includes coincidence detectors 811 to 818 for detecting coincidence between a source address SA1 and a destination address, and coincidence detectors 821 to 828 for detecting coincidence between a source address SA2 and a destination address. Source addresses SA1 and SA2 are included in an instruction given from decoding stage 3. Coincidence detectors 811 to 818 receive destination addresses DAb1, DAb2, DAc1, DAc2, DAd1, and DAd2 given from the other function units and the unit's own destination addresses DAa1 and DAa2, respectively. Coincidence detectors 821 to 828 receive the destination addresses in the same way. A coincidence detector, for example detector 811, detects coincidence between the given source address SA1 and destination address DAb1 and generates coincidence detecting signal S11 at a high level. Coincidence detecting signals S11 to S18 and S21 to S28 are given to data selector 83 as selection signals for selecting the data used in an arithmetic operation.
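The comparator bank of FIG. 3 can be modeled as eight equality checks per source address (a hypothetical Python sketch, not part of the patent; the dictionary keys mirror the signal names in the figure):

```python
# Hypothetical model of address comparator 80 (FIG. 3): each source
# address is compared against all eight destination addresses, and a
# high selection signal (True) marks each coincidence.

DEST_KEYS = ["DAb1", "DAb2", "DAc1", "DAc2", "DAd1", "DAd2", "DAa1", "DAa2"]

def compare(sa, dests):
    """Return the eight coincidence signals for one source address."""
    return [sa == dests[k] for k in DEST_KEYS]

dests = dict(DAb1=7, DAb2=2, DAc1=5, DAc2=9, DAd1=1, DAd2=4, DAa1=3, DAa2=6)
s1x = compare(7, dests)   # SA1 matches DAb1 -> S11 goes high
s2x = compare(8, dests)   # SA2 matches nothing -> all signals low
print(s1x)       # [True, False, False, False, False, False, False, False]
print(any(s2x))  # False
```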
An example of data selector 83 is illustrated in FIG. 4. Referring to FIG. 4, data selector 83 includes tri-state buffers 910 to 918 connected to a data line 31, tri-state buffers 920 to 928 connected to a data line 32, and two NOR gates 901 and 902. Tri-state buffers 910 and 920 are connected to receive data DT1 and DT2 given from register file 9, respectively. Tri-state buffers 911 to 916 receive data Db1, Db2, Dc1, Dc2, Dd1, and Dd2 applied from the other function units 5, 6, and 7 through bypass line 12, respectively. Tri-state buffers 917 and 918 receive the unit's own executed data Da1 and Da2, respectively. Tri-state buffers 921 to 928 receive the given data in the same way as tri-state buffers 911 to 918. Tri-state buffers 911 to 918 are controlled in response to data selection signals S11 to S18 given from address comparator 80, respectively. For example, when data selection signal S11 at a high level is applied, tri-state buffer 911 applies data Db1 on bypass line 12 to data line 31. On the other hand, tri-state buffer 910 operates in response to an output signal from NOR gate 901. If all data selection signals S11 to S18 are at a low level, NOR gate 901 applies an output signal at a high level to tri-state buffer 910, so that data DT1 given from register file 9 is applied to data line 31.
Tri-state buffers 921 to 928 operate in response to data selection signals S21 to S28, respectively. Tri-state buffer 920 operates in response to an output signal from NOR gate 902. Data selected by tri-state buffers 910 to 918 and 920 to 928 is applied through data lines 31 and 32 to arithmetic operation executing unit 84.
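The selector's behavior, one data line at a time, amounts to a priority mux with the register file as the NOR-gate fallback (a hypothetical Python sketch, not part of the patent; signal and data orderings follow FIG. 4):

```python
# Hypothetical model of data selector 83 (FIG. 4) for one data line:
# the tri-state buffer whose selection signal is high drives the line;
# if all signals are low, the NOR gate enables the register-file buffer.

def select(signals, bypass_data, register_data):
    """signals: S*1..S*8; bypass_data: Db1..Da2 in matching order."""
    for sig, data in zip(signals, bypass_data):
        if sig:
            return data          # bypassed result drives the data line
    return register_data         # NOR-gate fallback: register-file data

bypass = [11, 12, 13, 14, 15, 16, 17, 18]          # Db1..Dd2, Da1, Da2
print(select([False, True] + [False] * 6, bypass, 50))  # 12 (Db2 wins)
print(select([False] * 8, bypass, 50))                  # 50 (DT1 wins)
```

Note the model assumes at most one selection signal is high at a time, which the address comparison guarantees when each in-flight destination address is unique.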
An example of arithmetic operation executing unit 84 is illustrated in FIG. 5. Referring to FIG. 5, arithmetic operation executing unit 84 includes registers 841 and 842 for holding data applied through data lines 31 and 32, respectively, and an arithmetic operation unit (ALU) 843 for performing an arithmetic operation using the data held by registers 841 and 842. The data obtained by performing the arithmetic operation in arithmetic operation unit 843 is applied to data register 85 illustrated in FIG. 2.
Now, referring to FIGS. 1 to 5 and FIG. 8C, the operation of superscalar processor 10 illustrated in FIG. 1 will be described. In the following description, it is assumed that superscalar processor 10 illustrated in FIG. 1 also executes instructions 11 and 12 described above with reference to FIG. 8B. Referring to FIG. 8C, instruction fetching stage 2 executes instruction 11 in period T1 (pipeline PL1). In period T2, instruction decoding stage 3 executes instruction 11 (pipeline PL1), while instruction fetching stage 2 executes instruction 12. In period T3, instruction 11 is executed in pipeline PL1, and the executed data is obtained. The executed data is scheduled to be written into register file 9 in accordance with the destination address included in instruction 11. However, the executed data is also to be used in the execution stage of instruction 12 in pipeline PL2, so the executed data is applied through bypass line 12 illustrated in FIG. 1 to pipeline PL2, as described in the following. In this description, pipeline PL1 corresponds to function unit 5, and pipeline PL2 corresponds to function unit 4.
Function unit 4 constituting pipeline PL2 has the structure illustrated in FIG. 2. Referring to FIG. 2, address comparator 80 compares source addresses SA1 and SA2 included in the given instruction 12 with destination addresses DAa1, DAa2, . . . , DAd1, and DAd2. Referring to FIG. 3 again, when coincidence is detected, coincidence detection signals S11 to S18 or S21 to S28 at a high level are generated as selection signals.
Data selector 83 illustrated in FIG. 4 selectively applies data given through bypass line 12 to data lines 31 and 32 in response to data selection signals S11 to S18 and S21 to S28. If all the data selection signals S11 to S18 and S21 to S28 are at a low level, in other words, if no coincidence between the source addresses and the destination addresses is detected, data DT1 and DT2 given from register file 9 are applied to data lines 31 and 32. Accordingly, when at least one coincidence is detected in address comparator 80 illustrated in FIG. 3, data in the other function unit (i.e., pipeline PL1) is transmitted through bypass line 12 (i.e., not through register file 9) to function unit 4 (i.e., pipeline PL2). Because the data executed in the other pipeline PL1 is applied to pipeline PL2 without passing through register file 9, it becomes unnecessary for pipeline PL2 to wait for termination of execution of the memory access stage and the write back stage in pipeline PL1.
Specifically, referring to FIG. 8C again, the executed data obtained in the execution stage in pipeline PL1 in period T3 is applied through bypass line 12 to pipeline PL2, so that the execution stage in pipeline PL2 can operate in period T4. While execution of processing in pipeline PL2 was made to stand by in periods T4 and T5 (pipeline interlock) in the conventional superscalar processor 20 illustrated in FIG. 7, as illustrated in FIG. 8B, it is possible to continue execution of processing in the execution stage and the memory access stage in periods T4 and T5 in superscalar processor 10 illustrated in FIG. 1. In other words, pipeline interlock does not occur. The reason is that the data executed in pipeline PL1 is applied through bypass line 12 to pipeline PL2, so that it becomes unnecessary to wait for termination of execution of the memory access stage and the write back stage in pipeline PL1.
As a result, as seen from a comparison between FIGS. 8B and 8C, the period (T2 to T6) required for pipeline PL2 to complete execution of instruction 12 is shorter than the period (T2 to T9) required in the conventional superscalar processor. Accordingly, higher-speed processing in a superscalar processor is achieved.
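The three-period saving can be checked with a hypothetical period calculation contrasting FIGS. 8B and 8C (a Python sketch, not part of the patent; one period per stage is assumed):

```python
# Hypothetical comparison of completion periods for instruction 12
# (fetched in period T2) with and without the bypass line.

def finish_without_bypass(producer_wb):
    # FIG. 8B: ID must be re-run in the period after the producer's
    # write back, then EXC, MEM, WB follow.
    return (producer_wb + 1) + 3

def finish_with_bypass(issue):
    # FIG. 8C: the result is forwarded as soon as the producer's EXC
    # ends, so the consumer runs IF, ID, EXC, MEM, WB without a stall.
    return issue + 4

print(finish_without_bypass(producer_wb=5))  # 9 -> completes in T9
print(finish_with_bypass(issue=2))           # 6 -> completes in T6
```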
While a superscalar processor 10 whose function units each have the same pipeline structure (i.e., the same stages) has been described in the example illustrated in FIG. 1, it is pointed out that application of the present invention is not limited thereto. Specifically, even if the pipelines have different structures, it is possible to obtain the above-described advantages with respect to processing time by providing a bypass line between the pipelines.
Although the present invention has been described and illustrated in detail, it is clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of the present invention being limited only by the terms of the appended claims.

Claims (5)

What is claimed is:
1. A superscalar processor including means for storing data and executing a first plurality of instructions given earlier and a second plurality of instructions given later under pipeline control using data stored in said data storage means, wherein each of said instructions includes a source address indicating an address in said data storage means in which data to be used for executing the instruction is stored and a destination address indicating an address in said data storage means in which the executed data should be stored, said superscalar processor comprising:
simultaneously executable instruction detecting means for detecting simultaneously executable instructions out of said given first and second plurality of instructions; and
a plurality of pipeline processing executing means for executing respective instructions detected by said simultaneously executable instruction detecting means in parallel and simultaneously under pipeline control, each of said plurality of pipeline processing executing means including an instruction executing means for executing sequentially the given instructions under pipeline control
and including
address coincidence detecting means for detecting coincidence between the source address included in the given instruction and the destination address included in an instruction executed earlier by an instruction executing means in the same pipeline processing executing means or another pipeline processing executing means, and
direct application means responsive to said address coincidence detecting means for directly applying the data executed earlier by an instruction executing means in one pipeline processing executing means to the instruction executing means provided in at least one other pipeline processing executing means.
2. The superscalar processor according to claim 1, wherein said simultaneously executable instruction detecting means includes decoding means for decoding said first and second plurality of instructions to detect said simultaneously executable instructions.
3. The superscalar processor according to claim 1, wherein said direct application means includes
a data transmission line connected between each of said plurality of pipeline processing executing means for transmitting the data executed earlier by an instruction executing means to the instruction executing means provided in said at least one other pipeline processing executing means, and
data selecting means for receiving data transmitted through said data transmission line and selecting the transmitted data in response to said address coincidence detecting means.
4. The superscalar processor according to claim 1, wherein
each instruction executing means includes a memory access stage for executing memory access in accordance with the given instruction, and an execution stage for executing an arithmetic operation in accordance with the given instruction.
5. The superscalar processor according to claim 1, wherein said data storage means includes a register file for storing temporarily the data to be used in said first or second plurality of instructions.
US08/225,265, priority 1991-06-17, filed 1994-04-07: Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing, Expired - Fee Related, US5636353A (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
US08/225,265 | 1991-06-17 | 1994-04-07 | US5636353A (en): Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing
US08/865,308 | 1991-06-17 | 1997-05-29 | US6233670B1 (en): Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating result bypassing

Applications Claiming Priority (4)

Application Number | Priority Date | Filing Date | Title
JP3144560A | 1991-06-17 | 1991-06-17 | JPH04367936A (en): Super scalar processor
JP3-144560 | 1991-06-17 | |
US82827792A | 1992-01-30 | 1992-01-30 |
US08/225,265 | 1991-06-17 | 1994-04-07 | US5636353A (en): Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing

Related Parent Applications (1)

Application Number | Title | Priority Date | Filing Date
US82827792A | Continuation | 1991-06-17 | 1992-01-30

Related Child Applications (1)

Application Number | Title | Priority Date | Filing Date
US08/865,308 | Continuation: US6233670B1 (en), Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating result bypassing | 1991-06-17 | 1997-05-29

Publications (1)

Publication Number | Publication Date
US5636353A (en) | 1997-06-03

Family

ID=15365101

Family Applications (2)

Application Number | Title | Priority Date | Filing Date
US08/225,265 | Expired - Fee Related | US5636353A (en) | 1991-06-17 | 1994-04-07 | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing
US08/865,308 | Expired - Fee Related | US6233670B1 (en) | 1991-06-17 | 1997-05-29 | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating result bypassing

Family Applications After (1)

Application Number | Title | Priority Date | Filing Date
US08/865,308 | Expired - Fee Related | US6233670B1 (en) | 1991-06-17 | 1997-05-29 | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating result bypassing

Country Status (3)

Country | Link
US (2) | US5636353A (en)
JP (1) | JPH04367936A (en)
DE (1) | DE4207148A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5546554A (en)* | 1994-02-02 | 1996-08-13 | Sun Microsystems, Inc. | Apparatus for dynamic register management in a floating point unit
GB2287108B (en)* | 1994-02-28 | 1998-05-13 | Intel Corp | Method and apparatus for avoiding writeback conflicts between execution units sharing a common writeback path
US5559976A (en)* | 1994-03-31 | 1996-09-24 | International Business Machines Corporation | System for instruction completion independent of result write-back responsive to both exception free completion of execution and completion of all logically prior instructions
JP2636789B2 (en)* | 1995-03-31 | 1997-07-30 | NEC Corporation | Microprocessor
US6014759A (en)* | 1997-06-13 | 2000-01-11 | Micron Technology, Inc. | Method and apparatus for transferring test data from a memory array
US6044429A (en) | 1997-07-10 | 2000-03-28 | Micron Technology, Inc. | Method and apparatus for collision-free data transfers in a memory device with selectable data or address paths
JP4201927B2 (en)* | 1999-08-25 | 2008-12-24 | Renesas Technology Corp. | Data processing management device
WO2003036468A1 (en) | 2001-10-24 | 2003-05-01 | Telefonaktiebolaget Lm Ericsson (Publ) | An arrangement and a method in processor technology
US20040128482A1 (en)* | 2002-12-26 | 2004-07-01 | Sheaffer Gad S. | Eliminating register reads and writes in a scheduled instruction cache
US7937557B2 (en)* | 2004-03-16 | 2011-05-03 | Vns Portfolio Llc | System and method for intercommunication between computers in an array
US7904695B2 (en)* | 2006-02-16 | 2011-03-08 | Vns Portfolio Llc | Asynchronous power saving computer
JP4243271B2 (en)* | 2005-09-30 | 2009-03-25 | Fujitsu Microelectronics Ltd. | Data processing apparatus and data processing method
US7752422B2 (en)* | 2006-02-16 | 2010-07-06 | Vns Portfolio Llc | Execution of instructions directly from input source
US7904615B2 (en)* | 2006-02-16 | 2011-03-08 | Vns Portfolio Llc | Asynchronous computer communication
US7966481B2 (en) | 2006-02-16 | 2011-06-21 | Vns Portfolio Llc | Computer system and method for executing port communications without interrupting the receiving computer
ATE495491T1 (en)* | 2006-02-16 | 2011-01-15 | Vns Portfolio Llc | Execution of instructions directly from the input source

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JPH0769822B2 (en)* | 1988-11-08 | 1995-07-31 | NEC Corporation | Calculation register bypass check method
JP2693651B2 (en)* | 1991-04-30 | 1997-12-24 | Toshiba Corporation | Parallel processor
US5539911A (en)* | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US4594655A (en)* | 1983-03-14 | 1986-06-10 | International Business Machines Corporation | (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions
US5043868A (en)* | 1984-02-24 | 1991-08-27 | Fujitsu Limited | System for by-pass control in pipeline operation of computer
US4916652A (en)* | 1987-09-30 | 1990-04-10 | International Business Machines Corporation | Dynamic multiple instruction stream multiple data multiple pipeline apparatus for floating-point single instruction stream single data architectures
US5333281A (en)* | 1988-03-04 | 1994-07-26 | Nec Corporation | Advanced instruction execution system for assigning different indexes to instructions capable of parallel execution and same indexes to instructions incapable of parallel execution
US5051940A (en)* | 1990-04-04 | 1991-09-24 | International Business Machines Corporation | Data dependency collapsing hardware apparatus
US5214763A (en)* | 1990-05-10 | 1993-05-25 | International Business Machines Corporation | Digital computer system capable of processing two or more instructions in parallel and having a cache and instruction compounding mechanism
US5197135A (en)* | 1990-06-26 | 1993-03-23 | International Business Machines Corporation | Memory management for scalable compound instruction set machines with in-memory compounding

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
"Computer Architecture and Parallel Processing", K. Hwang et al., McGraw-Hill Book Company, 1984, pp. 200-203.
"Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers", by Gurindar S. Sohi, IEEE Transactions on Computers, vol. 39, No. 3, Mar. 1990, pp. 349-359.
Groves et al., "An IBM Second Generation RISC Processor Architecture", IEEE 1990, pp. 166-172.
Horowitz et al., "MIPS-X: A 20-MIPS Peak, 32-bit Microprocessor with On-Chip Cache", IEEE Journal of Solid-State Circuits, vol. SC-22, No. 5, Oct. 1987, pp. 790-799.
McGeady, "The i960CA SuperScalar Implementation of the 80960 Architecture", IEEE 1990, pp. 232-240.

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5768554A (en)* | 1995-02-02 | 1998-06-16 | Ricoh Company, Ltd. | Central processing unit
US5864691A (en)* | 1995-02-02 | 1999-01-26 | Ricoh Company, Ltd. | Central processing unit with a selector that bypasses circuits where processing is not required
US6092184A (en)* | 1995-12-28 | 2000-07-18 | Intel Corporation | Parallel processing of pipelined instructions having register dependencies
US5813033A (en)* | 1996-03-08 | 1998-09-22 | Advanced Micro Devices, Inc. | Superscalar microprocessor including a cache configured to detect dependencies between accesses to the cache and another cache
US5805852A (en)* | 1996-05-13 | 1998-09-08 | Mitsubishi Denki Kabushiki Kaisha | Parallel processor performing bypass control by grasping portions in which instructions exist
US5812845A (en)* | 1996-05-13 | 1998-09-22 | Mitsubishi Denki Kabushiki Kaisha | Method for generating an object code for a pipeline computer process to reduce swapping instruction set
US5778248A (en)* | 1996-06-17 | 1998-07-07 | Sun Microsystems, Inc. | Fast microprocessor stage bypass logic enable
US5802386A (en)* | 1996-11-19 | 1998-09-01 | International Business Machines Corporation | Latency-based scheduling of instructions in a superscalar processor
US5941984A (en)* | 1997-01-31 | 1999-08-24 | Mitsubishi Denki Kabushiki Kaisha | Data processing device
US6016543A (en)* | 1997-05-14 | 2000-01-18 | Mitsubishi Denki Kabushiki Kaisha | Microprocessor for controlling the conditional execution of instructions
US6430679B1 (en)* | 1997-09-30 | 2002-08-06 | Intel Corporation | Pre-arbitrated bypassing in a speculative execution microprocessor
US5872986A (en)* | 1997-09-30 | 1999-02-16 | Intel Corporation | Pre-arbitrated bypassing in a speculative execution microprocessor
US6851044B1 (en)* | 2000-02-16 | 2005-02-01 | Koninklijke Philips Electronics N.V. | System and method for eliminating write backs with buffer for exception processing
US6862677B1 (en)* | 2000-02-16 | 2005-03-01 | Koninklijke Philips Electronics N.V. | System and method for eliminating write back to register using dead field indicator
WO2002029554A3 (en)* | 2000-10-06 | 2002-08-01 | Intel Corp | Register move operations
US6728870B1 (en) | 2000-10-06 | 2004-04-27 | Intel Corporation | Register move operations
US20080270763A1 (en)* | 2005-12-16 | 2008-10-30 | Freescale Semiconductor, Inc. | Device and Method for Processing Instructions
US8078845B2 (en)* | 2005-12-16 | 2011-12-13 | Freescale Semiconductor, Inc. | Device and method for processing instructions based on masked register group size information
US20070277014A1 (en)* | 2006-05-19 | 2007-11-29 | International Business Machines Corporation | Move data facility with optional specifications
US7594094B2 (en) | 2006-05-19 | 2009-09-22 | International Business Machines Corporation | Move data facility with optional specifications
US20090070568A1 (en)* | 2007-09-11 | 2009-03-12 | Texas Instruments Incorporated | Computation parallelization in software reconfigurable all digital phase lock loop
US7809927B2 (en)* | 2007-09-11 | 2010-10-05 | Texas Instruments Incorporated | Computation parallelization in software reconfigurable all digital phase lock loop

Also Published As

Publication number | Publication date
JPH04367936A | 1992-12-21
US6233670B1 | 2001-05-15
DE4207148A1 | 1992-12-24

Similar Documents

Publication | Publication Date | Title
US5636353A (en) | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing
US5043868A (en) | System for by-pass control in pipeline operation of computer
US5293500A (en) | Parallel processing method and apparatus
US5404552A (en) | Pipeline RISC processing unit with improved efficiency when handling data dependency
KR930004214B1 (en) | Data processing system
JP3151444B2 (en) | Method for processing load instructions and superscalar processor
EP0491693B1 (en) | Improved CPU pipeline having register file bypass on update/access address compare
JP3400458B2 (en) | Information processing device
US5226166A (en) | Parallel operation processor with second command unit
US5522084A (en) | Method and system for invalidating instructions utilizing validity and write delay flags in parallel processing apparatus
US5469552A (en) | Pipelined data processor having combined operand fetch and execution stage to reduce number of pipeline stages and penalty associated with branch instructions
US4758949A (en) | Information processing apparatus
US5678016A (en) | Processor and method for managing execution of an instruction which determine subsequent to dispatch if an instruction is subject to serialization
RU2150738C1 (en) | Information processing system and method for its operations
US5504870A (en) | Branch prediction device enabling simultaneous access to a content-addressed memory for retrieval and registration
KR100241970B1 (en) | Data processing apparatus for performing pipeline processing
KR900002436B1 (en) | Bypass control system for pipeline processing
US20040098564A1 (en) | Status register update logic optimization
JPH06139071A (en) | Parallel computer
JP3461887B2 (en) | Variable length pipeline controller
JPH08272608A (en) | Pipeline processing equipment
JP2511063B2 (en) | Pipeline control method
JP2920968B2 (en) | Instruction processing order control method
JP2000305782A (en) | Arithmetic unit
JPH05307483A (en) | Method and circuit for controlling write-in to register

Legal Events

Date | Code | Title | Description

FEPP | Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY | Fee payment

Year of fee payment: 4

FPAY | Fee payment

Year of fee payment: 8

REMI | Maintenance fee reminder mailed
LAPS | Lapse for failure to pay maintenance fees
STCH | Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP | Lapsed due to failure to pay maintenance fee

Effective date: 20090603

