This application is a continuation of U.S. patent application Ser. No. 14/326,828, filed Jul. 9, 2014, which is hereby incorporated by reference herein in its entirety.
BACKGROUND
This invention relates to integrated circuits and, more particularly, to configurable specialized processing blocks in an integrated circuit.
Considering a programmable logic device (PLD) as one example of an integrated circuit, as applications for which PLDs are used increase in complexity, it has become more common to design PLDs to include configurable specialized blocks such as configurable specialized storage blocks and configurable specialized processing blocks in addition to blocks of generic programmable logic. Such specialized blocks may include circuitry that has been partly or fully hardwired to perform one or more specific tasks, such as a logical or a mathematical operation.
A specialized block may also contain one or more specialized structures. Examples of structures that are commonly implemented in such specialized blocks include multipliers, arithmetic logic units (ALUs), memory elements such as random-access memory (RAM) blocks, read-only memory (ROM) blocks, content-addressable memory (CAM) blocks and register files, AND/NAND/OR/NOR arrays, etc., or combinations thereof.
One particularly useful type of configurable specialized processing block that has been provided on programmable logic devices (PLDs) is a specialized processing block (SPB) that can be used to process audio signals (as an example). Such blocks may sometimes be referred to as multiply-accumulate (MAC) blocks, when they include structures to perform multiplication operations, summing operations, and/or accumulations of multiplication operations.
SUMMARY
In accordance with certain aspects of the invention, a circuit may have an output port, input ports, and configuration circuitry. The configuration circuitry may configure the circuit to perform an arithmetic function based on first, second, and third signals that were received at the input ports and to provide the result of the arithmetic function at the output port in a first mode. The configuration circuitry may further configure the circuit to perform a multiplexing function based on the first, second, and third signals and provide a selected one of the first, second, and third signals at the output port in a second mode.
It is appreciated that the present invention can be implemented in numerous ways, such as a process, an apparatus, a system, a device, or a method on a computer readable medium. Several inventive embodiments of the present invention are described below.
In certain embodiments, the above-mentioned configuration circuitry may configure the circuit to perform a register pipeline function of the first signal in a third mode. In this mode, the circuit may include at least one pipeline register between one of the input ports and the output port.
If desired, the circuit may further include two multipliers. The first multiplier may receive first and second signals from the input ports and perform a portion of the arithmetic function in the first mode and a portion of the multiplexing function in the second mode. The second multiplier may receive second and third signals from the input ports and perform an additional portion of the arithmetic function in the first mode and an additional portion of the multiplexing function in the second mode.
If desired, the circuit may further include an adder that is coupled to the first and second multipliers. The adder and the two multipliers may implement a sum of product function in the first mode and another portion of the multiplexing function in the second mode.
Further features of the invention, its nature and various advantages, will be more apparent from the accompanying drawings and the following detailed description of the preferred embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of an illustrative integrated circuit having specialized processing blocks in accordance with an embodiment.
FIG. 2 is a diagram of an illustrative integrated circuit with storage, processing, and input-output circuitry in accordance with an embodiment.
FIG. 3 is a diagram of an illustrative specialized processing block in accordance with an embodiment.
FIG. 4 is a diagram of an illustrative specialized processing block that is configured to operate as a 4:1 multiplexer in accordance with an embodiment.
FIG. 5 is a diagram of an illustrative specialized processing block that is configured to operate as a 4:1 multiplexer with a scaling factor in accordance with an embodiment.
FIG. 6 is a diagram of five illustrative specialized processing blocks that are arranged and configured to operate as a 16:1 multiplexer in accordance with an embodiment.
FIG. 7 is a diagram of an illustrative specialized processing block that is configured to operate as two independent 16-bit pipeline registers in accordance with an embodiment.
FIG. 8 is a diagram of an illustrative specialized processing block that is configured to operate as a 36-bit pipeline register in accordance with an embodiment.
FIG. 9 is a flow chart showing illustrative steps for operating a specialized processing block as an arithmetic circuit in a first mode and as a multiplexer in a second mode in accordance with an embodiment.
FIG. 10 is a flow chart showing illustrative steps for operating a specialized processing block as an arithmetic circuit in a first mode and as pipeline registers in a second mode in accordance with an embodiment.
DETAILED DESCRIPTION
The present invention relates to integrated circuits and, more particularly, to integrated circuits with configurable specialized processing blocks.
Configurable specialized processing blocks are often configured to support a pure arithmetic use model in which input data is processed by arithmetic operators such as multipliers and adders or accumulators to implement an arithmetic function such as a multiply-accumulate function.
Typically, only a portion of a user design implements such arithmetic functions and this portion may vary depending on the design, whereas some integrated circuits may provide a fixed number of configurable specialized processing blocks. Thus, situations frequently arise where the implementation of a user design on an integrated circuit leaves some of the available configurable specialized processing blocks unused. Consequently, in an effort to use the available circuit area more efficiently, it would be desirable to implement other portions of the user design on these otherwise unused configurable specialized processing blocks.
For this purpose, a configurable specialized processing block may be configured to operate in different modes. For example, a configurable specialized processing block may be configured as an arithmetic circuit, as a multiplexer, and/or as a register pipeline.
It will be recognized by one skilled in the art that the present exemplary embodiments may be practiced without some or all of these specific details. In other instances, well-known operations have not been described in detail in order not to unnecessarily obscure the present embodiments.
An illustrative embodiment of an integrated circuit such as programmable logic device (PLD) 100 having exemplary interconnect circuitry is shown in FIG. 1. As shown in FIG. 1, the programmable logic device (PLD) may include a two-dimensional array of functional blocks, including logic array blocks (LABs) 110 and other functional blocks, such as random access memory (RAM) blocks 130 and configurable specialized processing blocks such as specialized processing blocks (SPB) 120, for example. Functional blocks such as LABs 110 may include smaller programmable regions (e.g., logic elements, configurable logic blocks, or adaptive logic modules) that receive input signals and perform custom functions on the input signals to produce output signals.
Programmable logic device 100 may contain programmable memory elements. Memory elements may be loaded with configuration data (also called programming data) using input/output elements (IOEs) 102. Once loaded, the memory elements each provide a corresponding static control signal that controls the operation of an associated functional block (e.g., LABs 110, SPB 120, RAM 130, or input/output elements 102).
In a typical scenario, the outputs of the loaded memory elements are applied to the gates of metal-oxide-semiconductor transistors in a functional block to turn certain transistors on or off and thereby configure the logic in the functional block including the routing paths. Programmable logic circuit elements that may be controlled in this way include parts of multiplexers (e.g., multiplexers used for forming routing paths in interconnect circuits), look-up tables, logic arrays, AND, OR, NAND, and NOR logic gates, pass gates, etc.
The memory elements may use any suitable volatile and/or non-volatile memory structures such as random-access-memory (RAM) cells, fuses, antifuses, programmable read-only-memory memory cells, mask-programmed and laser-programmed structures, combinations of these structures, etc. Because the memory elements are loaded with configuration data during programming, the memory elements are sometimes referred to as configuration memory, configuration RAM (CRAM), configuration memory elements, or programmable memory elements.
In addition, the programmable logic device may have input/output elements (IOEs) 102 for driving signals off of the PLD and for receiving signals from other devices. Input/output elements 102 may include parallel input/output circuitry, serial data transceiver circuitry, differential receiver and transmitter circuitry, or other circuitry used to connect one integrated circuit to another integrated circuit. As shown, input/output elements 102 may be located around the periphery of the chip. If desired, the programmable logic device may have input/output elements 102 arranged in different ways. For example, input/output elements 102 may form one or more columns of input/output elements that may be located anywhere on the programmable logic device (e.g., distributed evenly across the width of the PLD). If desired, input/output elements 102 may form one or more rows of input/output elements (e.g., distributed across the height of the PLD). Alternatively, input/output elements 102 may form islands of input/output elements that may be distributed over the surface of the PLD or clustered in selected areas.
The PLD may also include programmable interconnect circuitry in the form of vertical routing channels 140 (i.e., interconnects formed along a vertical axis of PLD 100) and horizontal routing channels 150 (i.e., interconnects formed along a horizontal axis of PLD 100), each routing channel including at least one track to route at least one wire. If desired, the interconnect circuitry may include double data rate interconnections and/or single data rate interconnections.
If desired, routing wires may be shorter than the entire length of the routing channel. A length L wire may span L functional blocks. For example, a length four wire may span four blocks. Length four wires in a horizontal routing channel may be referred to as “H4” wires, whereas length four wires in a vertical routing channel may be referred to as “V4” wires.
Different PLDs may have different functional blocks which connect to different numbers of routing channels. A three-sided routing architecture is depicted in FIG. 1 where input and output connections are present on three sides of each functional block to the routing channels. Other routing architectures are also intended to be included within the scope of the present invention. Examples of other routing architectures include 1-sided, 1½-sided, 2-sided, and 4-sided routing architectures.
In a direct drive routing architecture, each wire is driven at a single logical point by a driver. The driver may be associated with a multiplexer which selects a signal to drive on the wire. In the case of channels with a fixed number of wires along their length, a driver may be placed at each starting point of a wire.
Note that other routing topologies, besides the topology of the interconnect circuitry depicted in FIG. 1, are intended to be included within the scope of the present invention. For example, the routing topology may include diagonal wires, horizontal wires, and vertical wires along different parts of their extent as well as wires that are perpendicular to the device plane in the case of three dimensional integrated circuits, and the driver of a wire may be located at a different point than one end of a wire. The routing topology may include global wires that span substantially all of PLD 100, fractional global wires such as wires that span part of PLD 100, staggered wires of a particular length, smaller local wires, or any other suitable interconnection resource arrangement.
Furthermore, it should be understood that embodiments of the present invention may be implemented in any integrated circuit. If desired, the functional blocks of such an integrated circuit may be arranged in more levels or layers in which multiple functional blocks are interconnected to form still larger blocks. Other device arrangements may use functional blocks that are not arranged in rows and columns.
FIG. 2 shows a block diagram of another embodiment of an integrated circuit 200 in accordance with the present invention. Integrated circuit 200 may include a memory block 260, a specialized processing (SP) block 220, a versatile processing block 270, and input/output circuitry 240.
SP block 220 may include multipliers, adders, accumulators, shifters, and other arithmetic circuitry. SP block 220 may also include storage elements such as registers, latches, memory arrays, or other storage circuitry. Such storage elements may serve different purposes. For instance, storage elements may store coefficients for implementing FIR filters or to select and mask input data when implementing a multiplexing function. Alternatively, storage elements may be used to pipeline a critical path or to synchronize data before it is processed. If desired, SP block 220 may be configurable to operate in different modes. For example, SP block 220 may be configurable to operate as an arithmetic circuit, as a multiplexer, and/or as a register pipeline.
Memory block 260 may include random-access memory (RAM), first-in first-out (FIFO) circuitry, stack or last-in first-out (LIFO) circuitry, read-only memory (ROM), registers, latches, or other storage circuitry suitable to store data. Input/output circuitry 240 may include parallel input/output circuitry, differential input/output circuitry, serial data transceiver circuitry, or other input/output circuitry suitable to transmit and receive data.
Versatile processing block 270 may include embedded microprocessors, microcontrollers, or other processing circuitry. Versatile processing block 270 may have combinational and sequential logic circuitry such as logical function blocks and storage elements such as registers. Versatile processing block 270 may be configurable or programmable to perform any arbitrary function. In comparison, SP block 220 may have limited functionality due to specialized processing components and limited configurability of interconnect resources. For example, SP block 220 may include multipliers and adders to facilitate the efficient implementation of arithmetic functions, but may not be configurable to implement combinational functions such as a combinational sum-of-products (i.e., a logical OR function of several logical AND functions). As another example, interconnect resources may be arranged such that SP block 220 can be configured to implement an arithmetic sum-of-products (i.e., an addition of several multiplications), but not an arithmetic product-of-sums (i.e., a multiplication of several additions).
In contrast, versatile processing block 270 may be configured to perform any function including arithmetic functions and combinational functions. However, versatile processing block 270 may be much less efficient in executing a function that SP block 220 can implement.
Internal interconnection resources 230 such as conductive lines and busses may be used to send data from one component to another component or to broadcast data from one component to one or more other components. External interconnection resources 250 such as conductive lines and busses, optical interconnect infrastructure, or wired and wireless networks with optional intermediate switches may be used to communicate with other devices. In certain embodiments, the internal interconnection resources 230 and/or the external interconnection resources 250 may be implemented using configurable interconnect circuitry.
FIG. 3 shows an embodiment of a configurable specialized processing block such as SP block 220 of FIG. 2. Configurable specialized processing block 300 may include input registers 320 and output registers 380, multiplexers 310, 335, 340, 360, and 390, configuration memory 385, coefficient storage circuitry 330, multipliers 350, and adder 370.
Configurable specialized processing block 300 shown in FIG. 3 is merely illustrative and is not intended to limit the scope of the present invention. If desired, some multiplexers may be omitted to trade off flexibility for circuit area. For example, omitting multiplexers 310 may reduce the area of configurable specialized processing block 300 by four 2:1 multiplexers and the respective configuration bits and wires at the expense that all input data is registered, thereby adding one clock cycle of latency to all incoming data. Similarly, omitting multiplexers 390 may reduce the area of configurable specialized processing block 300 by three 2:1 multiplexers and the respective configuration bits and wires at the expense that all output data is registered, thereby adding one clock cycle of latency to all outgoing data.
If desired, configurable specialized processing block 300 may include additional circuitry. For example, configurable specialized processing block 300 may include additional circuitry for pattern detection, rounding, saturation, overflow and underflow handling, and/or additional arithmetic circuitry such as accumulator circuitry (e.g., circuitry implemented as a feedback loop from an output register 380 to adder 370) or pre-adder circuitry (e.g., to add input signals in symmetrical filter implementations), just to name a few. Arithmetic circuitry may perform integer arithmetic, fixed-point arithmetic, and/or floating-point arithmetic (e.g., single-precision floating-point, double-precision floating-point, etc.) operations.
If desired, configurable specialized processing block 300 may have more or fewer inputs and outputs. For example, configurable specialized processing block 300 may have only a single output and six inputs. In this example, the single output may be driven by a multiplexer that may choose between an adder output and a multiplier output (not shown).
Every signal in configurable specialized processing block 300 may include multiple bits of data. For example, input signals IN_0, IN_1, IN_2, and IN_3 may all include nine bits, 12 bits, 16 bits, 18 bits, 25 bits, 27 bits, 32 bits, 36 bits, etc. If desired, each pair of signals (i.e., IN_0 and IN_1, or IN_2 and IN_3) may have a different number of bits. For example, IN_0 and IN_2 may have 18 bits while IN_1 and IN_3 have 25 bits. As another example, IN_0 and IN_2 may have 27 bits while IN_1 and IN_3 have 18 bits or vice versa. These examples are merely illustrative.
Similarly, output signals OUT_0, OUT_1, and OUT_2 may all have the same number of bits. Alternatively, the output signals may have different bit widths. As an example, OUT_1 may have one more bit than OUT_0 and OUT_2. For example, the multiplication of two 18-bit numbers in each of multipliers 350 may produce two 36-bit numbers. Adding these two 36-bit numbers in adder 370 may produce a 37-bit number because of a potential carry bit.
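The bit-width relationship described above can be checked with a short sketch. It assumes unsigned integer operands, which the text does not specify, and the helper name max_unsigned is hypothetical.

```python
def max_unsigned(bits: int) -> int:
    """Largest value representable in an unsigned field of the given width."""
    return (1 << bits) - 1

# Assumption: unsigned 18-bit operands (the description does not fix signedness).
a = max_unsigned(18)

# Worst-case product of two 18-bit numbers, as computed by one multiplier.
product = a * a
product_bits = product.bit_length()

# Worst-case sum of two such products, as computed by the adder.
total = product + product
sum_bits = total.bit_length()

print(product_bits, sum_bits)
```

The largest product fits in 36 bits, and the sum of two such products needs one additional bit for the carry.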
Configurable specialized processing block 300 may have a different number of multipliers 350 and/or adders 370 than shown in FIG. 3. For example, configurable specialized processing block 300 may include four multipliers 350 and three adders 370, thereby allowing for four independent multiplications, two independent sums of two products, or a sum of four products.
Multipliers 350 may be any type of multiplier. For example, multiplier 350 may be a floating-point multiplier, a fixed-point multiplier, or an integer multiplier, just to name a few. Similarly, adder 370 may be any type of adder. For example, adder 370 may be a floating-point adder, a fixed-point adder, or an integer adder, just to name a few.
If desired, configurable specialized processing block 300 may have a feedback path (not shown) from the output register 380 that is driven by adder 370 back to the input of adder 370 to allow for the implementation of a multiply-accumulate function.
Configuration circuitry such as configuration memory 385 may control the selection in multiplexers 310, 340, 360, and 390 and thereby configure configurable specialized processing block 300 to operate in different modes. For example, configurable specialized processing block 300 may be configured to operate as an arithmetic operator, as a multiplexer, as a register pipeline, or in some combinations thereof such as a register pipelined multiplexer, a register pipelined arithmetic operator, or a multiplexer with arithmetic operation execution.
As an example, configuration circuitry may configure multiplexers 310 and 360 to serve as bypass circuitry, which routes inputs IN_1 and IN_3 to adder 370, thereby bypassing multipliers 350.
When configured to operate as an arithmetic operator, configurable specialized processing block 300 may implement two independent multiplications, an addition, a sum of products, and some combinations thereof, just to name a few. For example, configurable specialized processing block 300 may compute the product of signals IN_0 and IN_1 and the product of the signals IN_2 and IN_3 by selecting the respective signals in multiplexers 310, selecting the output of multiplexers 310 in multiplexers 340, multiplying each pair of signals in multipliers 350, selecting the output of multipliers 350 in multiplexers 360, and selecting the output of multiplexers 360 in multiplexers 390 to produce signals OUT_0 and OUT_2, respectively.
If desired, multiplexers 310 and 390 may be configured to store input signals IN_0, IN_1, IN_2, and IN_3 in input registers 320 and the results of the multiplication in output registers 380. Adder 370 may compute the sum of the outputs of multiplexers 360, thereby producing a sum of products as output signal OUT_1.
When configured to operate as a multiplexer, signals IN_0 and IN_2 may be used as select signals and select between signals IN_1 and IN_3. For example, configurable specialized processing block 300 may select signal IN_1 by directing multiplexers 335 with signals IN_0 and IN_2 to select a one and a zero in coefficient storage 330, respectively. Multiplexers 340 may select the output of multiplexers 335. Multiplying IN_1 with one and IN_3 with zero produces IN_1 and zero at the output of multipliers 350, respectively. Selecting the output of multipliers 350 in multiplexers 360 and computing the sum in adder 370 may produce IN_1 as output signal OUT_1.
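The multiplexing behavior described above reduces to a sum of two products with coefficients of one and zero. The following is a minimal behavioral sketch, not a description of the actual circuit. The COEFF_STORAGE layout (a zero at address 0 and a one at address 1) and the function name spb_mux2 are assumptions made for illustration.

```python
# Hypothetical contents of the coefficient storage: address 0 holds 0, address 1 holds 1.
COEFF_STORAGE = (0, 1)

def spb_mux2(in_0: int, in_1: int, in_2: int, in_3: int) -> int:
    """Behavioral model of a 2:1 multiplexer built from two multipliers and an adder.

    in_0 and in_2 act as select signals that address the coefficient storage;
    in_1 and in_3 are the data signals.
    """
    coeff_a = COEFF_STORAGE[in_0]   # coefficient chosen by the first select signal
    coeff_b = COEFF_STORAGE[in_2]   # coefficient chosen by the second select signal
    product_a = in_1 * coeff_a      # first multiplier
    product_b = in_3 * coeff_b      # second multiplier
    return product_a + product_b    # adder output, i.e. OUT_1

# Selecting a one for IN_1 and a zero for IN_3 passes IN_1 through.
print(spb_mux2(1, 7, 0, 9))
```

Swapping the select values (in_0 = 0, in_2 = 1) passes IN_3 through instead.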
FIG. 4 shows an embodiment of a configurable specialized processing block 400 that is configured to operate as a 4:1 multiplexer. As shown, configurable specialized processing block 400 may include coefficient storage blocks 430, 432, 434, and 436, multiplexers 440, multipliers 410, first stage adders 420, and second stage adder 425. Configurable specialized processing block 400 may include additional circuitry, which has been omitted in order not to unnecessarily obscure the present embodiment.
As an example, coefficient storage blocks 430, 432, 434, and 436 may each be configured to store a ‘1’ at a first, second, third, and fourth address, respectively, and to store zeroes at the second, third, and fourth addresses, at the first, third, and fourth addresses, at the first, second, and fourth addresses, and at the first, second, and third addresses, respectively.
For example, coefficient storage block 430 may store a ‘1’ at address Bank 0, while coefficient storage blocks 432, 434, and 436 store a ‘0’ at address Bank 0. Similarly, coefficient storage blocks 432, 434, and 436 may store a ‘1’ at addresses Bank 1, Bank 2, and Bank 3, respectively, while coefficient storage blocks 430, 434, and 436 store a ‘0’ at address Bank 1, coefficient storage blocks 430, 432, and 436 store a ‘0’ at address Bank 2, and coefficient storage blocks 430, 432, and 434 store a ‘0’ at address Bank 3.
Configurable specialized processing block 400 may receive input signals IN_0, IN_1, IN_2, IN_3, and select signal SEL. Select signal SEL may control multiplexers 440 to select a coefficient from coefficient storage blocks 430, 432, 434, and 436 and thereby control which input signal may be selected. For example, select signal SEL may be ‘0’, which may direct multiplexers 440 to select the coefficients stored at address Bank 0 in coefficient storage blocks 430, 432, 434, and 436. Thus, multiplexers 440 may select ‘1’, ‘0’, ‘0’, and ‘0’ from coefficient storage blocks 430, 432, 434, and 436, respectively. Multipliers 410 may compute products as IN_0, ‘0’, ‘0’, and ‘0’, respectively, which adders 420 and 425 may sum together resulting in IN_0 being produced at the output of adder 425.
Similarly, select signal SEL may be ‘1’, ‘2’, or ‘3’, which may direct multiplexers 440 to select the coefficients stored at addresses Bank 1, Bank 2, and Bank 3, respectively, in coefficient storage blocks 430, 432, 434, and 436, thereby producing IN_1, IN_2, and IN_3, respectively, at the output of adder 425.
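The 4:1 multiplexer of FIG. 4 can be sketched behaviorally as a dot product of the inputs with a one-hot coefficient vector. In this sketch the list COEFF_BLOCKS, the function name mux4, and the pairing of products in the first adder stage are illustrative assumptions rather than details taken from the figure.

```python
# One row per coefficient storage block; column index is the bank address.
# Each block stores a 1 at its own bank and 0 elsewhere.
COEFF_BLOCKS = [
    [1, 0, 0, 0],  # models coefficient storage block 430
    [0, 1, 0, 0],  # models coefficient storage block 432
    [0, 0, 1, 0],  # models coefficient storage block 434
    [0, 0, 0, 1],  # models coefficient storage block 436
]

def mux4(inputs: list, sel: int):
    """Select inputs[sel] using only multiplications and additions."""
    # The select signal picks the same bank address in every storage block.
    coeffs = [block[sel] for block in COEFF_BLOCKS]
    # One multiplier per input signal.
    products = [x * c for x, c in zip(inputs, coeffs)]
    # First adder stage (pairing is assumed), then second adder stage.
    first_stage = (products[0] + products[1], products[2] + products[3])
    return first_stage[0] + first_stage[1]

print([mux4([11, 22, 33, 44], sel) for sel in range(4)])
```

Only one product is non-zero for any value of the select signal, so the adder tree simply forwards the selected input.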
In some applications, it may be desirable to multiply the selected input signal with a constant scaling factor (e.g., if the selected input signal is multiplied with a constant number in a later operation). FIG. 5 shows an embodiment of a configurable specialized processing block that is configured to operate as a 4:1 multiplexer in which the selected input signal is multiplied with constant scaling factor SCALE.
As shown, configurable specialized processing block 500 may include coefficient storage blocks 530, 532, 534, and 536, multiplexers 440, multipliers 410, first stage adders 420, and second stage adder 425. Configurable specialized processing block 500 may include additional circuitry, which has been omitted in order not to unnecessarily obscure the present embodiment.
As an example, coefficient storage blocks 530, 532, 534, and 536 may each be configured to store a scaling factor SCALE (e.g., a factor which may be equal to 2, 4, 5, 8, 10, 2.5, etc.) at a first, second, third, and fourth address, respectively, and to store zeroes at the second, third, and fourth addresses, at the first, third, and fourth addresses, at the first, second, and fourth addresses, and at the first, second, and third addresses, respectively. For example, coefficient storage block 530 may store a scaling factor SCALE at address Bank 0, while coefficient storage blocks 532, 534, and 536 store a ‘0’ at address Bank 0. Similarly, coefficient storage blocks 532, 534, and 536 may store a scaling factor SCALE at addresses Bank 1, Bank 2, and Bank 3, respectively, while coefficient storage blocks 530, 534, and 536 store a ‘0’ at address Bank 1, coefficient storage blocks 530, 532, and 536 store a ‘0’ at address Bank 2, and coefficient storage blocks 530, 532, and 534 store a ‘0’ at address Bank 3.
Configurable specialized processing block 500 may receive input signals IN_0, IN_1, IN_2, IN_3, and select signal SEL. Select signal SEL may control multiplexers 440 to select a coefficient from coefficient storage blocks 530, 532, 534, and 536 and thereby control which input signal may be selected and multiplied with scaling factor SCALE. For example, select signal SEL may be ‘0’, which may direct multiplexers 440 to select the coefficients stored at address Bank 0 in coefficient storage blocks 530, 532, 534, and 536. Thus, multiplexers 440 may select scaling factor SCALE, ‘0’, ‘0’, and ‘0’ from coefficient storage blocks 530, 532, 534, and 536, respectively. Multipliers 410 may compute products as (SCALE*IN_0), ‘0’, ‘0’, and ‘0’, respectively, which adders 420 and 425 may sum together resulting in (SCALE*IN_0) being produced at the output of adder 425.
Similarly, select signal SEL may be ‘1’, ‘2’, or ‘3’, which may direct multiplexers 440 to select the coefficients stored at addresses Bank 1, Bank 2, and Bank 3, respectively, in coefficient storage blocks 530, 532, 534, and 536, thereby producing (SCALE*IN_1), (SCALE*IN_2), and (SCALE*IN_3), respectively, at the output of adder 425.
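The scaled multiplexer of FIG. 5 differs from the plain 4:1 multiplexer only in the stored coefficients. The sketch below is a behavioral illustration under that reading. The function names make_scaled_blocks and scaled_mux4 are hypothetical, and the coefficient table is built at call time purely for convenience.

```python
def make_scaled_blocks(scale):
    """Build four coefficient storage blocks that hold `scale` on the diagonal
    (block i stores the scaling factor at bank i) and zero everywhere else."""
    return [[scale if bank == block else 0 for bank in range(4)]
            for block in range(4)]

def scaled_mux4(inputs, sel, scale):
    """Return scale * inputs[sel], computed as a sum of four products."""
    blocks = make_scaled_blocks(scale)
    coeffs = [block[sel] for block in blocks]           # bank selected by SEL
    products = [x * c for x, c in zip(inputs, coeffs)]  # one multiplier per input
    return sum(products)                                # adder tree

print(scaled_mux4([3, 5, 7, 9], 2, 4))
```

Folding the scaling factor into the coefficient storage means the selection and the multiplication by the constant share the same multiplier, so no additional arithmetic stage is needed for the scaling.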
If desired, larger multiplexers may be formed by cascading smaller multiplexers. As an example, a 16:1 multiplexer may be formed by two stages of 4:1 multiplexers in which the first stage includes four 4:1 multiplexers and the second stage includes one 4:1 multiplexer. An embodiment of such an arrangement that uses configurable specialized processing block 400 of FIG. 4 to implement each of the 4:1 multiplexers is shown in FIG. 6.
As shown, the 16:1 multiplexer of FIG. 6 includes a first stage of four configurable specialized processing blocks 400 and a second stage of one configurable specialized processing block 400 that are each configured to operate as 4:1 multiplexers as shown in FIG. 4 and described above. Each of the four configurable specialized processing blocks 400 of the first stage receives four input signals and a select signal, whereby the select signal is shared among all four configurable specialized processing blocks 400. For example, the top-most configurable specialized processing block 400 may receive input signals IN_0, IN_1, IN_2, IN_3 and select signal SEL[1:0], the next configurable specialized processing block 400 may receive input signals IN_4, IN_5, IN_6, IN_7, and select signal SEL[1:0], etc.
The select signal together with the coefficients that are stored in the coefficient storage may select one signal from each of the configurable specialized processing blocks 400 of the first stage. As an example, the select signal together with the coefficients stored in the coefficient storage may select input signals IN_0, IN_4, IN_8, and IN_12, respectively.
The configurable specialized processing block 400 that forms the second stage of the 16:1 multiplexer may receive select signal SEL[3:2] and the selected input signals from the first stage. Select signal SEL[3:2] together with the coefficients that are stored in the coefficient storage may select one signal from the signals received from configurable specialized processing blocks 400 of the first stage. As an example, the second-stage configurable specialized processing block 400 may receive signals IN_0, IN_4, IN_8, and IN_12 from the first stage, and the select signal SEL[3:2] together with the coefficients stored in the coefficient storage may select signal IN_0 as the output signal of the 16:1 multiplexer.
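The two-stage cascade can be sketched as five instances of the same 4:1 building block. This is a behavioral illustration only. The function names mux4 and mux16 are hypothetical, and the 4:1 block is modeled compactly with a one-hot coefficient vector.

```python
def mux4(inputs, sel):
    """4:1 multiplexer modeled as a sum of products with one-hot coefficients."""
    coeffs = [1 if bank == sel else 0 for bank in range(4)]
    return sum(x * c for x, c in zip(inputs, coeffs))

def mux16(inputs, sel):
    """16:1 multiplexer built from five 4:1 multiplexers in two stages."""
    sel_low = sel & 0b11          # SEL[1:0], shared by all first-stage blocks
    sel_high = (sel >> 2) & 0b11  # SEL[3:2], used by the second-stage block
    # First stage: block i sees inputs 4*i .. 4*i+3 and the shared select.
    first_stage = [mux4(inputs[4 * i:4 * i + 4], sel_low) for i in range(4)]
    # Second stage: choose among the four first-stage results.
    return mux4(first_stage, sel_high)

data = list(range(100, 116))      # stand-ins for IN_0 .. IN_15
print(mux16(data, 0), mux16(data, 13))
```

With SEL[1:0] equal to zero the first stage forwards IN_0, IN_4, IN_8, and IN_12, and SEL[3:2] equal to zero then yields IN_0, which matches the example in the text.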
If desired, some of the 4:1 multiplexers may be implemented using different circuitry. For example, 4:1 multiplexers may be implemented as dedicated circuitry, using functional blocks (e.g., using LABs 110 of FIG. 1), or using the embodiment of configurable specialized processing block 500 of FIG. 5, thereby multiplying the selected input signal with a scaling factor, just to name a few alternative implementations.
If desired, a configurable specialized processing block such as configurable specialized processing blocks 300, 400, or 500 may be configured to implement registered multiplexers or registered arithmetic operators (e.g., using input registers 320 and/or output registers 380 of FIG. 3). In certain embodiments, a configurable specialized processing block such as configurable specialized processing block 300 may be configured to operate as a register pipeline.
In the example of configurable specialized processing block 300 of FIG. 3, configuration bits may configure multiplexers 310, 360, and 390 such that input signals IN_1 and/or IN_3 are register pipelined. Consider the scenario in which configurable specialized processing block 300 is configured to implement two register pipeline stages for signals IN_1 and IN_3. In this scenario, input registers 320 may operate as a first register pipeline stage and store input signals IN_1 and IN_3.
Multiplexers 310 may select the stored signals from input registers 320, multiplexers 360 may select the signals from multiplexers 310, and output registers 380 may operate as a second register pipeline stage and store the signals from multiplexers 360. Multiplexers 390 may select the stored signals from output registers 380, and thereby provide the signals as output signals OUT_0 and OUT_2.
Configurable specialized processing block 300 may be configured differently and still operate as a register pipeline. An embodiment of such a configurable specialized processing block is shown in FIG. 7. As an example, configurable specialized processing block 300 of FIG. 3 may be configured as shown in FIG. 7 if multiplexers 360 were omitted.
Configurable specialized processing block 700 may include input registers 712, 714, 716, and 718, multipliers 732 and 736, and output registers 722 and 726. As shown, configurable specialized processing block 700 may be configured to operate as two independent, two-stage register pipelines, each having 16 bits of data.
Configurable specialized processing block 700 may receive input signals IN_A and IN_B, and ‘1’ at the two remaining input ports. Input registers 712 and 716 may store input signals IN_A and IN_B, respectively, while input registers 714 and 718 may store ‘1’.
Multiplier 732 may multiply IN_A that is stored in input registers 712 with ‘1’ stored in input registers 714, thereby producing IN_A, which may be stored in output registers 722. Similarly, multiplier 736 may multiply IN_B that is stored in input registers 716 with ‘1’ stored in input registers 718, thereby producing IN_B, which may be stored in output registers 726.
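As a rough behavioral illustration of one of the two pipelines of FIG. 7, the following sketch models the input registers, the multiplier, and the output registers as plain Python state. The class name, the `clock` method, and the reset value of zero are assumptions made for the example; the actual circuit is clocked hardware, and bit widths are not modeled.

```python
class SingleWidthPipeline:
    """Behavioral model of one two-stage register pipeline (e.g., the IN_A path)."""

    def __init__(self):
        self.data_reg = 0  # models input registers 712 (assumed reset value 0)
        self.one_reg = 1   # models input registers 714 holding the constant '1'
        self.out_reg = 0   # models output registers 722 (assumed reset value 0)

    def clock(self, in_a):
        """Apply one clock edge: both register stages update simultaneously."""
        # Second stage captures the product of the values held by the first stage.
        self.out_reg = self.data_reg * self.one_reg  # models multiplier 732
        # First stage captures the new input and the constant '1'.
        self.data_reg = in_a
        self.one_reg = 1
        return self.out_reg


pipe = SingleWidthPipeline()
outputs = [pipe.clock(value) for value in [5, 7, 9, 0]]
```

In this model each input value appears unchanged at the output register one clock edge after it was captured by the input register, so `outputs` is the input sequence delayed by one position.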
Instead of receiving a ‘1’ at input ports, configurable specialized processing block 700 may receive a select signal that may direct a multiplexer to retrieve a ‘1’ stored in coefficient storage circuitry such as coefficient storage circuitry 330 of FIG. 3.
Another embodiment of a configurable specialized processing block that is configured to operate as a register pipeline is shown in FIG. 8. Configurable specialized processing block 800 may include input registers 812, 814, and 816, multiplier 832, adder 842, and output registers 822. As shown, configurable specialized processing block 800 may be configured to operate as a two-stage register pipeline having 36 bits of data.
Configurable specialized processing block 800 may receive input signals IN_B, IN_C, and ‘0’ at the input ports. Input registers 814 and 816 may store input signals IN_B and IN_C, respectively, while input registers 812 may store ‘0’.
Multiplier 832 may multiply IN_B that is stored in input registers 814 with ‘0’ stored in input registers 812, thereby producing ‘0’. Adder 842 may add IN_C that is stored in input registers 816 to ‘0’ from multiplier 832, thereby producing IN_C, which may be stored in output registers 822.
If desired, configurable specialized processing block 800 may include coefficient storage circuitry coupled to a multiplexer, such as coefficient storage circuitry 330 that is coupled to multiplexer 335 of FIG. 3, and an input signal (e.g., signal IN_A, not shown) may direct the multiplexer to retrieve a ‘0’ stored in the coefficient storage circuitry.
FIG. 9 is a flow chart showing illustrative steps for operating a configurable specialized processing block such as configurable specialized processing block 300 of FIG. 3 as an arithmetic circuit in a first mode and as a multiplexer in a second mode in accordance with an embodiment.
During step 910, the configurable specialized processing block may receive first, second, and third signals. Depending on whether the configurable specialized processing block is configured to operate in a first or second mode, the configurable specialized processing block may operate as an arithmetic operator or as a multiplexer, respectively.
In response to determining that the configurable specialized processing block is configured to operate in the first mode, the configurable specialized processing block may compute a product of the first and second signals during step 920.
In response to determining that the configurable specialized processing block is not configured to operate in the first mode, the configurable specialized processing block may select first and second coefficients by addressing first and second storage circuits based on the third signal during step 930. For example, the third signal may direct a multiplexer such as multiplexer 335 of FIG. 3 to select an appropriate coefficient from coefficient storage circuitry such as coefficient storage circuitry 330 of FIG. 3.
During step 940, the configurable specialized processing block may compute a first product of the first signal and the first selected coefficient, and during step 950, a second product of the second signal and the second selected coefficient. In the event that one of the first and second selected coefficients is ‘1’ and the other is ‘0’, one of the first and second products may be ‘0’ while the other may be the first or second signal.
During step 960, the configurable specialized processing block may compute a sum of the first and second products. In the event that one of the first and second products is ‘0’ and the other is the first or second signal, the adder may produce either the first or the second signal at an output of the configurable specialized processing block, thereby selecting between the first and second signals based on the third signal.
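The steps of FIG. 9 can be summarized with a small software model. This is a sketch under stated assumptions rather than a description of the circuit: the function name, the `first_mode` flag, and the contents of the two coefficient storage circuits (complementary values of 1 and 0 addressed by a one-bit third signal) are hypothetical choices made for the example.

```python
# Assumed contents of the first and second storage circuits, addressed by the third signal.
FIRST_STORAGE = [1, 0]   # coefficient applied to the first signal
SECOND_STORAGE = [0, 1]  # coefficient applied to the second signal


def operate_fig9(first, second, third, first_mode):
    """Model the flow of FIG. 9 for integer-valued signals."""
    if first_mode:
        # Step 920: arithmetic operation, product of the first and second signals.
        return first * second
    # Step 930: address both storage circuits with the third signal.
    first_coeff = FIRST_STORAGE[third]
    second_coeff = SECOND_STORAGE[third]
    # Steps 940 and 950: one product per input signal.
    first_product = first * first_coeff
    second_product = second * second_coeff
    # Step 960: the sum equals whichever signal was multiplied by 1.
    return first_product + second_product


product = operate_fig9(6, 7, 0, first_mode=True)       # arithmetic mode
picked_first = operate_fig9(6, 7, 0, first_mode=False)  # third signal 0 selects the first signal
picked_second = operate_fig9(6, 7, 1, first_mode=False) # third signal 1 selects the second signal
```

Because exactly one of the two coefficients is 1 for each value of the third signal, the sum in the second mode always reduces to one of the two input signals.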
FIG. 10 is a flow chart showing illustrative steps for operating a configurable specialized processing block such as configurable specialized processing block 300 of FIG. 3 as a single register pipeline in a first mode and as one and two pipeline registers in a second mode in accordance with an embodiment.
During step 1010, the configurable specialized processing block may receive first and second signals. During step 1020, the configurable specialized processing block may compute the product of the first and second signals in both the first and second modes.
In the event that the configurable specialized processing block is configured to operate in a first mode, which may include executing an arithmetic function, the configurable specialized processing block may optionally receive a third signal during step 1065 and compute the sum of the product of the first and second signals and the third signal during step 1075.
In the event that the configurable specialized processing block is not configured to operate in a first mode, the configurable specialized processing block may be configured to operate in a second mode, which may include register pipelining a signal. Register pipelining may be performed in single width mode as shown in FIG. 7 or in double width mode as shown in FIG. 8.
In response to determining that the configurable specialized processing block operates in single width mode, the configurable specialized processing block may receive a second signal that is asserted to ‘1’ during step 1035, which has the effect that the product of the first and second signals computed during step 1020 is the first signal. The configurable specialized processing block may store the product to perform register pipelining of the first signal (e.g., using output registers 722 of FIG. 7) during step 1040. In order to implement a second register pipeline stage, the configurable specialized processing block may store the first and second signals to perform register pipelining of the first signal (e.g., using input registers 712 and 714 of FIG. 7) during step 1050.
In response to determining that the configurable specialized processing block does not operate in single width mode, the configurable specialized processing block may receive a second signal that is asserted to ‘0’ during step 1025. Alternatively, the second signal may serve as a control signal that retrieves ‘0’ from coefficient storage and overrides the second signal during step 1030 for the purpose of computing the product of the first and second signals during step 1020, which as a result is ‘0’.
During step 1060, the configurable specialized processing block may receive a third signal and compute the sum of the product of the first and second signals and the third signal (e.g., using adder 842 of FIG. 8) during step 1070. Since the product of the first and second signals computed during step 1020 is ‘0’, the result of the sum is the third signal. The configurable specialized processing block may store the sum to perform register pipelining of the third signal (e.g., using output registers 822 of FIG. 8) during step 1090. In order to implement a second register pipeline stage, the configurable specialized processing block may store the first, second, and third signals to perform register pipelining of the third signal (e.g., using input registers 812, 814, and 816 of FIG. 8) during step 1095.
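The arithmetic that underlies the three branches of FIG. 10 can be illustrated with a short combinational model. This is only a sketch: the `Mode` enumeration and the function name are hypothetical, the register stages of steps 1040, 1050, 1090, and 1095 are deliberately not modeled, and signals are treated as plain integers without bit widths.

```python
from enum import Enum


class Mode(Enum):
    ARITHMETIC = "arithmetic"        # first mode
    SINGLE_WIDTH = "single_width"    # second mode, as in FIG. 7
    DOUBLE_WIDTH = "double_width"    # second mode, as in FIG. 8


def fig10_datapath(mode, first, second=0, third=0):
    """Return the value that would be presented to the output registers."""
    if mode is Mode.SINGLE_WIDTH:
        second = 1  # step 1035: second signal asserted to '1'
    elif mode is Mode.DOUBLE_WIDTH:
        second = 0  # step 1025 or 1030: second signal is, or is overridden by, '0'
    product = first * second  # step 1020, common to all modes
    if mode is Mode.SINGLE_WIDTH:
        return product  # equals the first signal
    # Step 1075 (arithmetic mode) or step 1070 (double width mode).
    return product + third


mac_result = fig10_datapath(Mode.ARITHMETIC, 3, 4, 5)
single_width_result = fig10_datapath(Mode.SINGLE_WIDTH, 9)
double_width_result = fig10_datapath(Mode.DOUBLE_WIDTH, 9, third=21)
```

The model highlights that the same multiply and add resources serve all three cases: forcing the second signal to 1 passes the first signal through the multiplier, while forcing it to 0 lets the third signal pass through the adder.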
The method and apparatus described herein may be incorporated into any suitable electronic device or system of electronic devices. For example, the method and apparatus may be incorporated into numerous types of devices such as microprocessors or other ICs. Exemplary ICs include programmable array logic (PAL), programmable logic arrays (PLAs), field programmable logic arrays (FPLAs), electrically programmable logic devices (EPLDs), electrically erasable programmable logic devices (EEPLDs), logic cell arrays (LCAs), field programmable gate arrays (FPGAs), application specific standard products (ASSPs), application specific integrated circuits (ASICs), digital signal processors (DSPs), and graphics processing units (GPUs), just to name a few.
The integrated circuit described herein may be part of a data processing system that includes one or more of the following components: a processor; memory; I/O circuitry; and peripheral devices. The integrated circuit can be used in a wide variety of applications, such as computer networking, data networking, instrumentation, video processing, digital signal processing, or any other suitable application where the advantage of using configurable specialized processing circuits that may operate as multiplexers and/or register pipelines is desirable.
Although the method operations were described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times, or described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of the overlay operations is performed in a desired way.
The foregoing is merely illustrative of the principles of this invention and various modifications can be made by those skilled in the art without departing from the scope and spirit of the invention.