Background
A programmable gate array is a semi-custom integrated circuit, i.e., its logic circuit is customized through a back-end process or through field programming. U.S. Pat. No. 4,870,302 discloses a programmable gate array. It contains a plurality of programmable logic units (also called configurable logic blocks) and programmable connections (also called programmable interconnects). Under the control of a setting signal, a programmable logic unit can selectively implement functions such as shift, logical NOT, AND (logical AND), OR (logical OR), NOR (NOT-OR), NAND (NOT-AND), XOR (exclusive OR), arithmetic addition (+), arithmetic subtraction (-), and the like; under the control of a setting signal, a programmable connection can selectively implement functions such as connection and disconnection between two interconnect lines.
Currently, many applications involve the computation of complex mathematical functions. Examples of complex mathematical functions include transcendental functions such as exponential (exp), logarithmic (log), and trigonometric (sin, cos) functions. To guarantee execution speed, high-performance applications require complex mathematical functions to be implemented in hardware. In existing programmable gate arrays, complex mathematical functions are implemented by fixed computing units. These fixed computing units are part of a hard core (hard block) whose circuitry is fixed and cannot be reconfigured. Clearly, fixed computing units limit further applications of the programmable gate array. To overcome this difficulty, the present invention generalizes the concept of the programmable gate array so that the fixed computing unit becomes programmable. In particular, the programmable gate array contains programmable computing units in addition to programmable logic units. A programmable computing unit can selectively implement any one of a plurality of mathematical functions.
Disclosure of Invention
The primary object of the present invention is to extend the application of programmable gate arrays to the field of complex mathematical computation.
It is a further object of the present invention to provide a programmable gate array in which not only the logic functions but also the computational functions can be customized.
It is another object of the present invention to provide a programmable gate array with more flexible and more powerful computing capability.
It is another object of the present invention to provide a programmable gate array with a smaller chip area and lower cost.
To achieve these and other objects, the present invention proposes a programmable gate array based on three-dimensional printed memory (3D-P). It contains a programmable computing unit array, a programmable logic unit array, and a plurality of programmable connections. Each programmable computing unit contains a plurality of 3D-P arrays, which store the look-up tables (LUTs) of a library of basic mathematical functions. Each programmable computing unit also contains a plurality of programmable connections. Through these programmable connections, the value of the desired function is looked up in the corresponding LUT. In the present invention, a complex mathematical function refers to any mathematical function other than arithmetic addition (+) and arithmetic subtraction (-), including exponential, logarithmic, and trigonometric functions, and the like.
For high-performance programmable computing units, 3D-P is particularly suitable for storing LUTs. The 3D-P is a kind of three-dimensional memory (3D-M) whose stored information is recorded by a printing method (such as photolithography or nanoimprint) during factory production. The information is permanently fixed and cannot be changed after the chip leaves the factory. Since a 3D-P memory element does not need to support electrical programming, it can withstand a larger read voltage and read current than a three-dimensional writable memory (3D-W). Therefore, a 3D-P can be read much faster than a 3D-W.
In addition to the programmable computing units, the programmable gate array contains a plurality of programmable logic units and programmable connections. In an implementation, the complex mathematical function is first decomposed into a plurality of basic mathematical functions. Then, for each basic mathematical function, a corresponding programmable computing unit is set to implement that basic mathematical function. Finally, the desired complex mathematical function is implemented by setting the programmable logic units and the programmable connections.
Implementing a programmable computing unit with 3D-P has several advantages. First, a 3D-P reads faster than a 3D-W, so a high-performance computing unit can be realized. Second, the 3D-P array sizes required by different basic mathematical functions are either the same or differ by integer multiples, so 3D-P arrays representing different basic mathematical functions can be placed in different memory layers and integrated into the same 3D-M module by three-dimensional stacking. This can greatly reduce the substrate area occupied by the programmable computing unit. Finally, because a 3D-P array occupies substantially no substrate area, programmable logic units and/or programmable connections can be integrated below the 3D-P array, which can further reduce the substrate area occupied by the programmable computing unit.
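As one illustration of such a decomposition (a sketch under stated assumptions, not a formula given in this passage), a product term can be rewritten so that it needs only logarithm, exponential, and arithmetic addition, provided both factors are positive:

```latex
a \cdot \sin(b) \;=\; \exp\bigl(\log a + \log[\sin b]\bigr),
\qquad a > 0,\ \sin b > 0 .
```

The positivity conditions are an assumption of this sketch; operands that are zero or negative would need additional handling (for example, separate sign logic) that the text does not describe.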
Accordingly, the invention proposes a programmable computing unit (100) for selectively implementing a first or a second mathematical function, characterized in that it comprises: a semiconductor substrate (0) containing transistors; first and second three-dimensional printed memory (3D-P) arrays (110, 120) stacked on the semiconductor substrate (0), the first 3D-P array (110) storing at least a portion of a look-up table (LUT A) of the first mathematical function, the second 3D-P array (120) storing at least a portion of a look-up table (LUT B) of the second mathematical function; and at least one programmable connection (150 or 160) coupled to the first and second 3D-P arrays, the programmable computing unit (100) selectively implementing the first or second mathematical function based on a setting signal (125) of the programmable connection.
The invention also proposes a programmable gate array (400) for implementing a complex mathematical function, characterized in that it comprises: a programmable computing unit array (100AA-100AD) including at least one programmable computing unit that selectively implements a basic mathematical function from a library of basic mathematical functions; a programmable logic unit array (200AA-200AD) containing at least one programmable logic unit that selectively implements a logic operation from a library of logic operations; and a plurality of programmable connections (300) coupling the programmable computing unit array and the programmable logic unit array; the programmable gate array (400) implements the complex mathematical function by programming the programmable computing units (100AA-100AD), the programmable logic units (200AA-200AD), and the programmable connections (300), the complex mathematical function being a combination of the basic mathematical functions.
Detailed Description
Fig. 1 is a cross-sectional view of a three-dimensional printed memory (3D-P) 10. The 3D-P is a kind of three-dimensional memory (3D-M) whose stored information is entered by printing during factory production. This information is permanently fixed and cannot be changed after shipment. The printing method may be photolithography, nanoimprint, electron-beam lithography, DUV scanning exposure, laser patterning, or the like. A 3D-M whose data is entered by photolithography is also known as a three-dimensional mask-programmed read-only memory (3D-MPROM), which is a common 3D-M.
The 3D-P 10 contains a substrate circuit layer 0K formed on a substrate 0. A memory layer 16A is stacked over the substrate circuit layer 0K, and a memory layer 16B is stacked over the memory layer 16A. The substrate circuit layer 0K contains the peripheral circuits of the memory layers 16A, 16B, which include transistors 0t and their interconnect lines 0i (including 0M1-0M2). The transistors 0t are formed in the semiconductor substrate 0; the interconnect lines 0i contain interconnect layers 0M1-0M3. Each memory layer (e.g., 16A) has a plurality of first address lines (e.g., 2a, in the y-direction), a plurality of second address lines (e.g., 1a, in the x-direction), and a plurality of 3D-P memory elements (e.g., 1aa). The memory layers 16A, 16B are coupled to the substrate 0 through contact vias 1av, 3av, respectively.
In a 3D-P, each memory layer contains a plurality of 3D-P arrays. A 3D-P array is a collection of memory elements in one memory layer that all share at least one address line. Within one 3D-P array, all address lines are continuous and are not shared with any other 3D-P array. In addition, one 3D-P chip contains a plurality of 3D-P modules. Each 3D-P module comprises all memory layers in the 3D-P; its top memory layer contains only one 3D-P array, and the projection of that 3D-P array onto the substrate determines the boundary of the 3D-P module.
The 3D-P contains at least two memory elements 1aa, 2aa. Each memory element comprises a diode 14. The diode 14 has the following broad characteristic: at the read voltage, its resistance is small; when the applied voltage is smaller than the read voltage or has the opposite polarity, its resistance is large. The diode may be a P-i-N diode, a metal-oxide (e.g., TiO2) diode, or the like. The memory element 2aa is a low-resistance memory element; the memory element 1aa is a high-resistance memory element. The high-resistance memory element 1aa has one more high-resistance film 12 than the low-resistance memory element 2aa. As a simple example, the high-resistance film 12 may be a silicon dioxide film. During the process flow, the high-resistance film 12 is physically removed at the memory element 2aa by printing.
Since the stored data is entered during manufacturing and cannot be changed afterwards, the 3D-P need not support electrical programming. In contrast, a three-dimensional writable memory (3D-W) must support electrical programming. Because the read voltage/read current of a 3D-W must not exceed its programming voltage/programming current, the read voltage/read current that a 3D-W can withstand is smaller than that of a 3D-P. As a result, the read speed of the 3D-W is much lower than that of the 3D-P, and the 3D-P is more suitable for high-performance computation.
Fig. 2 is a symbol of a programmable computing unit 100. Its input terminal IN carries input data 115, its output terminal OUT carries output data 135, and its setting terminal CFG carries a setting signal 125. Under the control of the setting signal 125, the programmable computing unit 100 selects the desired basic mathematical function from a library of basic mathematical functions.
Fig. 3 is a circuit block diagram of a programmable computing unit 100; it also discloses the library of basic mathematical functions that the programmable computing unit 100 can implement. The unit contains first and second programmable connections 150, 160 and LUTs A-D, which store the library of basic mathematical functions. In this embodiment, the first programmable connection 150 is a 1-to-4 demultiplexer (demux) and the second programmable connection 160 is a 4-to-1 multiplexer (mux); the basic mathematical function library includes logarithm log(), exponential exp(), log-sine log[sin()], and log-cosine log[cos()]. LUT A stores a logarithm table log(), LUT B stores an exponential table exp(), LUT C stores a log-sine table log[sin()], and LUT D stores a log-cosine table log[cos()]. For example, to implement the function exp(), the first programmable connection 150 sends the input data 115 to the corresponding LUT B as an address. A table look-up is performed at this address, i.e., the stored value of exp() is read out of LUT B. The second programmable connection 160 then sends this value to the output as output data 135. As those skilled in the art will appreciate, the library of basic mathematical functions may contain more basic mathematical functions. For example, it may contain eight basic mathematical functions: log(), exp(), sin(), cos(), sqrt(), cbrt(), tan(), and atan(). Of course, various other combinations are possible.
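The look-up datapath described for Fig. 3 can be sketched in software as follows. This is only an illustrative model: the 8-bit address width, the input ranges of each table, the numeric encoding of the setting signal, and the names `build_lut`, `LUTS`, and `compute_unit` are all assumptions of the sketch and do not come from the specification.

```python
import math

ADDRESS_BITS = 8              # hypothetical address width
N = 1 << ADDRESS_BITS         # number of entries per LUT


def build_lut(fn, lo, hi):
    """Tabulate fn at N evenly spaced points in [lo, hi] (assumed ranges)."""
    step = (hi - lo) / (N - 1)
    return tuple(fn(lo + i * step) for i in range(N))


# Tables are built once and never modified, mimicking data that is fixed
# at fabrication. Index = assumed encoding of the setting signal 125.
LUTS = (
    build_lut(math.log, 1.0, 2.0),                             # LUT A: log()
    build_lut(math.exp, 0.0, 1.0),                             # LUT B: exp()
    build_lut(lambda x: math.log(math.sin(x)), 0.1, 1.5),      # LUT C: log[sin()]
    build_lut(lambda x: math.log(math.cos(x)), 0.0, 1.4),      # LUT D: log[cos()]
)


def compute_unit(cfg, address):
    """Model of unit 100: the demux (150) routes the address to the LUT
    chosen by cfg, and the mux (160) forwards that LUT's output."""
    selected = LUTS[cfg]      # role of connections 150 and 160
    return selected[address]  # table look-up


# Setting 1 selects LUT B; address 0 corresponds to the input 0.0.
print(compute_unit(1, 0))
```

In this model the same address lines feed every table and only the setting changes which table answers, which mirrors the demux/mux arrangement of the embodiment.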
Fig. 4 shows a first implementation of the programmable computing unit 100; it is a layout of its substrate circuit 0K. Since the 3D-P arrays are stacked above the substrate circuit 0K rather than formed in the substrate 0, only their projections onto the substrate 0 are shown, as dashed lines. In this embodiment, each LUT is stored in one 3D-P array: LUT A is stored in 3D-P array 110, LUT B is stored in 3D-P array 120, LUT C is stored in 3D-P array 130, and LUT D is stored in 3D-P array 140. These 3D-P arrays 110-140 are arranged side by side. The substrate circuit 0K includes the programmable connections 150, 160 and the peripheral circuits of each of the 3D-P arrays 110-140, such as the X-decoder 15A and the Y-decoder (including readout circuitry) 17A of the 3D-P array 110.
To reduce the substrate area occupied by the programmable computing unit 100, the present invention exploits three-dimensional stacking and stacks 3D-P arrays that store different basic mathematical functions on top of one another. Figs. 5A-5B illustrate a second implementation of the programmable computing unit 100. In the cross-sectional view of Fig. 5A, the 3D-P array 110 storing LUT A of the function log() is stacked above the substrate circuit 0K (in the +z direction), the 3D-P array 120 storing LUT B of the function exp() is stacked above the 3D-P array 110, the 3D-P array 130 storing LUT C of the function log[sin()] is stacked above the 3D-P array 120, and the 3D-P array 140 storing LUT D of the function log[cos()] is stacked above the 3D-P array 130. As can be seen more clearly from the substrate circuit layout of Fig. 5B, the projections onto the substrate 0 of the 3D-P arrays 110, 120, 130, and 140 (storing LUTs A, B, C, and D, respectively) overlap in this embodiment, so they occupy only 1/4 of the substrate area of the embodiment in Fig. 4. At the same time, the Z-decoder 19 functions as the programmable connections 150, 160.
Fig. 6 shows a programmable gate array 400. It contains regularly arranged programmable modules 400A, 400B, etc. Each programmable module (e.g., 400A) contains a plurality of programmable computing units (e.g., 100AA-100AD) and programmable logic units (e.g., 200AA-200AD). Programmable channels 320, 340 lie between the programmable computing units (e.g., 100AA-100AD) and the programmable logic units (e.g., 200AA-200AD); programmable channels 310, 330, 350 lie between the programmable module 400A and the programmable module 400B. The programmable channels 310-350 contain a plurality of programmable connections 300. As those skilled in the art will appreciate, other structures such as sea-of-gates may be used in addition to programmable channels.
Fig. 7A discloses a connection library that can be implemented by a programmable connection 300. The programmable connection 300 is similar to the programmable connection disclosed in U.S. Pat. No. 4,870,302. It can adopt any one of the following connection modes from the connection library: a) interconnect lines 302/304 are connected and interconnect lines 306/308 are connected, but 302/304 is not connected to 306/308; b) interconnect lines 302/304/306/308 are all connected; c) interconnect lines 306/308 are connected, while interconnect lines 302, 304 are not connected to each other or to 306/308; d) interconnect lines 302/304 are connected, while interconnect lines 306, 308 are not connected to each other or to 302/304; e) none of the interconnect lines 302, 304, 306, 308 are connected. In this specification, the symbol "/" between two interconnect lines indicates that the two interconnect lines are connected, and the symbol "," between two interconnect lines indicates that the two interconnect lines are not connected.
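One way to picture the connection library is as a table that maps each setting to groups of mutually connected interconnect lines. The sketch below is a hypothetical software model, not part of the specification: the names `CONNECTION_LIBRARY` and `connected` are invented here, and the five settings follow the reading that each mode leaves the fourth line 308 in the same state as its listed partner.

```python
# Each setting maps to a list of groups; lines in the same group are
# electrically connected, and lines in no group are isolated.
# Line numbers 302-308 follow Fig. 7A; the letter keys are the modes a)-e).
CONNECTION_LIBRARY = {
    "a": [{302, 304}, {306, 308}],    # two separate connected pairs
    "b": [{302, 304, 306, 308}],      # all four lines connected
    "c": [{306, 308}],                # only 306/308 connected
    "d": [{302, 304}],                # only 302/304 connected
    "e": [],                          # nothing connected
}


def connected(setting, x, y):
    """Return True if lines x and y are joined under the given setting."""
    return any(x in group and y in group
               for group in CONNECTION_LIBRARY[setting])


print(connected("a", 302, 304))   # True: 302/304 form a pair in mode a)
print(connected("a", 302, 306))   # False: the two pairs stay separate
```

Representing a setting as a list of groups keeps the model independent of how the physical switches are arranged, which the text does not specify.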
Fig. 7B discloses a library of logic operations that can be implemented by the programmable logic unit 200. The inputs A and B are input data 210, 220, and the output C is output data 230. The programmable logic unit 200 is similar to the programmable logic unit disclosed in U.S. Pat. No. 4,870,302. It can implement at least one of the following operations from the logic operation library: C = A, logical NOT of A, shift of A, AND(A, B), OR(A, B), NAND(A, B), NOR(A, B), XOR(A, B), arithmetic addition A + B, arithmetic subtraction A - B, etc. The programmable logic unit 200 may also contain sequential circuit elements such as registers and flip-flops to implement pipelining and the like.
Fig. 8 shows a third implementation of the programmable computing unit 100. Since the 3D-P arrays 110-140 do not occupy substrate area, the programmable logic unit 200 can be integrated below the 3D-P arrays 110-140 and be at least partially covered by them. In addition, programmable connections can be integrated below the 3D-P arrays 110-140 and be at least partially covered by them. All of these measures can further reduce the chip area of the programmable gate array 400.
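The logic operation library can likewise be modelled as a dispatch table selected by a setting. The following is a minimal sketch under assumptions that are not in the specification: an 8-bit data width, a one-position left shift as the "shift" operation, and the names `LOGIC_LIBRARY` and `logic_unit`.

```python
WIDTH = 8                     # hypothetical data width
MASK = (1 << WIDTH) - 1       # keeps results within WIDTH bits

# Setting name -> operation on inputs A and B (unary operations ignore B).
LOGIC_LIBRARY = {
    "PASS": lambda a, b: a,
    "NOT":  lambda a, b: ~a & MASK,
    "SHL":  lambda a, b: (a << 1) & MASK,   # assumed shift: left by one
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: ~(a & b) & MASK,
    "NOR":  lambda a, b: ~(a | b) & MASK,
    "XOR":  lambda a, b: a ^ b,
    "ADD":  lambda a, b: (a + b) & MASK,    # wraps around on overflow
    "SUB":  lambda a, b: (a - b) & MASK,    # wraps around on underflow
}


def logic_unit(cfg, a, b=0):
    """Model of unit 200: apply the operation chosen by the setting cfg."""
    return LOGIC_LIBRARY[cfg](a & MASK, b & MASK)


print(logic_unit("NAND", 0b1100, 0b1010))   # 247
print(logic_unit("ADD", 200, 100))          # 44 (300 wrapped to 8 bits)
```

Wrap-around arithmetic is a choice made for the sketch; a real unit might instead expose a carry or use a wider result.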
Fig. 9 shows a specific implementation of a programmable gate array 400 that implements a complex mathematical function: e = a·sin(b) + c·cos(d). The programmable connections 300 in the programmable channels 310-350 are drawn using the representation of Fig. 7A: a dot at an intersection means that the intersecting lines are connected; an intersection without a dot means that the intersecting lines are not connected; and a break in an interconnect line means that the line is divided into two segments that are not connected to each other. In this embodiment, the programmable computing unit 100AA is set to log(), and its result log(a) is sent to the first input of the programmable logic unit 200AA. The programmable computing unit 100AB is set to log[sin()], and its result log[sin(b)] is sent to the second input of the programmable logic unit 200AA. The programmable logic unit 200AA is set to "arithmetic addition", and its result log(a) + log[sin(b)] is sent to the programmable computing unit 100BA. The programmable computing unit 100BA is set to exp(), and its result exp{log(a) + log[sin(b)]} = a·sin(b) is sent to the first input of the programmable logic unit 200BA. Similarly, with appropriate settings of the programmable computing units 100AC, 100AD, the programmable logic unit 200AC, and the programmable computing unit 100BC, the result c·cos(d) is sent to the second input of the programmable logic unit 200BA. The programmable logic unit 200BA is set to "arithmetic addition"; a·sin(b) and c·cos(d) are added there, and the final result is sent to the output e. Clearly, other complex mathematical functions can also be implemented by the programmable gate array 400 by changing the settings.
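The dataflow of Fig. 9 can be checked numerically with a short sketch. It is a simplified model under stated assumptions: the computing units are stood in for by exact library calls rather than table look-ups, the inputs are chosen so that a, c, sin(b), and cos(d) are all positive (the logarithms are otherwise undefined), and the function names `compute_unit`, `add`, and `gate_array` are invented for the illustration.

```python
import math

# Stand-in for the basic function library of a programmable computing unit.
# A real unit would read these values from 3D-P look-up tables.
FUNCTION_LIBRARY = {
    "log":     math.log,
    "exp":     math.exp,
    "log_sin": lambda x: math.log(math.sin(x)),
    "log_cos": lambda x: math.log(math.cos(x)),
}


def compute_unit(cfg, x):
    """Programmable computing unit set to the function named by cfg."""
    return FUNCTION_LIBRARY[cfg](x)


def add(x, y):
    """Programmable logic unit set to 'arithmetic addition'."""
    return x + y


def gate_array(a, b, c, d):
    """e = a*sin(b) + c*cos(d); assumes a, c, sin(b), cos(d) > 0."""
    s1 = add(compute_unit("log", a), compute_unit("log_sin", b))  # 100AA, 100AB, 200AA
    p1 = compute_unit("exp", s1)                                  # 100BA -> a*sin(b)
    s2 = add(compute_unit("log", c), compute_unit("log_cos", d))  # 100AC, 100AD, 200AC
    p2 = compute_unit("exp", s2)                                  # 100BC -> c*cos(d)
    return add(p1, p2)                                            # 200BA -> e


e = gate_array(2.0, 0.5, 3.0, 0.25)
direct = 2.0 * math.sin(0.5) + 3.0 * math.cos(0.25)
print(math.isclose(e, direct, rel_tol=1e-9))   # True
```

Each multiplication is replaced by two look-ups, one addition, and one further look-up, which is why the array needs only computing units and adders to form the products.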
The present specification takes a field-programmable gate array (FPGA) as an example. In an FPGA, wafer fabrication completes all process steps (including all programmable computing units, programmable logic units, and programmable connections). In the field, the function of the FPGA is defined by setting the programmable connections. The FPGA example described above can be easily generalized to a conventional programmable gate array. In a conventional programmable gate array, the wafer is only semi-finished, i.e., wafer fabrication completes only the programmable computing units and the programmable logic units, but not the programmable connections. Once the chip function is determined, the programmable channels 310-350 are customized by back-end processing.
It will be understood that changes in form and detail may be made without departing from the spirit and scope of the invention, and that such changes do not prevent the practice of the invention. The invention, therefore, is not to be limited except in the spirit of the appended claims.