US6609188B1 - Data flow processor - Google Patents

Data flow processor

Publication number
US6609188B1
Authority
US
United States
Prior art keywords
data
devices
bus
memory
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US09/540,196
Inventor
Randy R. Dunton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US09/540,196 (US6609188B1)
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: DUNTON, RANDY R.
Priority to US10/461,847 (US6904512B2)
Application granted
Publication of US6609188B1
Anticipated expiration
Expired - Lifetime


Abstract

A data flow processor includes a number of hardware units each having more than one mode. A plurality of hardware units may be connected together to implement a flow made up of a series of processes. The flows, initiated by a central processing unit, may proceed independently and substantially at their own pace. Thus, the flows may operate in parallel, independently with respect to one another. Each of the hardware units may be configured differently to operate with each of the different flows.

Description

BACKGROUND
This invention relates generally to digital signal and graphics processors.
A digital signal processor generally modifies or analyzes information measured as a discrete sequence of numbers. Digital signal processors are utilized for a wide variety of signal processing applications such as television, multimedia, audio, digital image processing and telephony, as examples. Most of these applications involve a certain amount of mathematical manipulation, usually multiplying and adding signals.
A large number of digital signal processors are available from a large number of vendors. Generally, each of these processors is fixed in the sense that it comes with certain capabilities. The users attempt to acquire those processors which best fit their needs and budget. However, the user's ability to modify the overall architecture of the digital signal processor is relatively limited. Thus, these products are packaged as units having generally fixed and immutable sets of capabilities.
In a number of cases, it would be desirable to have the ability to create a digital signal processor that performs complex functions that are specifically adapted to particular problems to be solved. Thus, it would be desirable that the hardware or software of the digital signal processor be adaptable to a particular function. However, such a digital signal processor might enjoy a relatively limited market. Given the investment in silicon processing, it may not be feasible to provide a digital signal processor that has been designed to meet relatively specific needs. However, such a device would be highly desirable. It would provide the greatest performance for the expense incurred, since only those features that are needed are provided. Moreover, those features may be provided that result in the highest performance without unduly increasing costs.
Processor speed has increased dramatically over the last few years. However, the ability of memories to keep pace with high speed processors has lagged. One way to get around this problem is to use caches. However, caches do not work well when the data is usually different from one access to the next. Thus, systems that work with data intense operations generally do not scale in speed with improving processor speed.
In addition, many processing devices access memory at a high frequency. Each time memory is accessed, the system processing speed is decreased. Moreover, memory accesses commonly result in power consumption. In some battery operated systems, it would be desirable to reduce power consumption. Therefore, it would be desirable to find a way to reduce the number of memory accesses in the course of a processing routine.
Thus, there is a need for a processor that is readily adaptable to handling a variety of intense data manipulation operations.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of one embodiment of the present invention;
FIG. 2 is a block diagram of the I/O interface shown in FIG. 1 in accordance with one embodiment of the present invention;
FIG. 3 is a schematic depiction of a data flow in accordance with one embodiment of the present invention utilizing the I/O interface shown in FIG. 2;
FIG. 4 shows a portion of a mode table in accordance with one embodiment of the present invention;
FIG. 5 is a schematic depiction of another data flow in accordance with one embodiment of the present invention;
FIG. 6 is a flow chart for software in accordance with one embodiment of the present invention;
FIG. 7 is a more detailed flow chart for software for implementing the data flow processor shown in FIG. 1 in accordance with one embodiment of the present invention;
FIG. 8 is still another depiction of a data flow in accordance with one embodiment of the present invention;
FIG. 9 is a transmitter in accordance with one embodiment of the present invention;
FIG. 10 is a passive receiver in accordance with one embodiment of the present invention;
FIG. 11 is an active receiver in accordance with one embodiment of the present invention; and
FIG. 12 shows how the arbiter and the DMA engine communicate with the bus in one embodiment of the present invention.
DETAILED DESCRIPTION
Referring to FIG. 1, a digital signal processor (DSP) 10 may include a bus 12 that couples a number of hardware units 14-28. A data input may be received by input/output (I/O) interface 14. The interface 14 is coupled to the bus 12 through receiving first in first out (FIFO) registers 14a and transmitting FIFO registers 14b.
An arithmetic logic unit 16 is coupled to the bus 12 through receiving FIFO registers 16a and transmitting FIFO registers 16b. One or more DSP engines 18 may be coupled to the bus 12 through a receiving FIFO register 18a and a transmitting FIFO register 18b. In addition, specialized DSP engines such as a lookup table 20 may be coupled to the bus 12 through FIFO registers 20a and 20b. A bus arbiter 22 may be coupled to the bus 12 through a simple request/grant signal pair, operating over a request grant bus that is part of the bus 12. The arbiter 22 may include a register 23 that stores bus access priorities.
A direct memory access (DMA) engine 24 is coupled to the bus 12 through receiving and transmitting FIFO registers 24a and 24b. Address, data pairs may be sent as packets over the same bus 12 that carries other data, with the DMA engine 24 having the highest priority in one embodiment of the invention. A random access memory (RAM) controller 26 is coupled to the bus 12 through receiving and transmitting FIFO registers 26a and 26b. The RAM controller 26 is also coupled to an external RAM memory 30. In one embodiment of the present invention, the RAM 30 may be dynamic random access memory.
Finally, a general purpose central processing unit (CPU) 28 may be coupled to the bus 12 through receiving FIFO registers 28a and transmitting FIFO registers 28b. The CPU 28 may be coupled to input/output devices or peripherals 32 to enable user interfaces with the DSP 10.
The CPU 28 manages certain DSP 10 tasks. For example, it may handle interrupts, manage the system and may be responsible for initial set up of the various hardware units 14-26. Thus, the CPU 28 may not control the step by step execution of the process steps implemented by the various hardware units making up the rest of the DSP 10. Instead, during digital signal processing it may be responsible for more limited applications in the sense of a service provider to the remaining hardware units which actually provide the functional results of the DSP 10. For example, the CPU 28 may perform complex logic tasks such as operating a real time operating system (RTOS), implementing file management, and providing user interfaces. In some cases, the CPU 28 may help or substitute for other hardware units 14-26 as the need arises.
The DSP 10 utilizes a data flow architecture in which a plurality of parallel data flows progress through various units 14-20 relatively independently of any central control from the CPU 28 or any other central resource. In fact, the units 14-20 may perform their operations on data without the use of a central memory resource in the course of data flow processing. At the end of any given data flow, information may be written to an external memory and at the initiation of a data flow, data may be read from an external memory. However, in the course of any given data flow, there may be no need to transmit addresses since generally the data moves with the data flow.
In general, decoupling the processing operations from the need for frequent memory accesses may greatly increase processing speed, simplify processing operations, and in some cases reduce power consumption. In addition, by reducing the number of memory accesses in the course of a processing operation, it may be possible to structure the memory addresses in different configurations. For example, memory addresses may be arranged in two or three dimensional spaces. For example, in connection with imaging arrays, it may be advantageous to manipulate addresses in two dimensions which correspond to the x and y pixels of the imaging array. Similarly, in dealing with complex three dimensional shapes, it may be advantageous to utilize memory addresses in three dimensions. Conventional memories operate in one dimensional memory spaces. However, in systems with limited memory accesses, the one dimensional data from the memory may be converted into a space of more than one dimension. The data may utilize a multi-dimensional array, and then the results data may be converted for storage in a one dimensional memory thereafter. In many processing systems of conventional design the use of multi-dimensional data is not feasible because the numerous memory accesses would require constant conversion between one and multi-dimensional memory spaces.
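The conversion between a one dimensional memory space and a two dimensional pixel space can be illustrated with a short Python sketch. It assumes a row-major layout, which the description does not specify; the function names and the `width` parameter are hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def linear_to_xy(address: int, width: int) -> tuple[int, int]:
    """Map a one dimensional address to (x, y), assuming row-major storage."""
    if width <= 0:
        raise ValueError("width must be positive")
    if address < 0:
        raise ValueError("address must be non-negative")
    y, x = divmod(address, width)
    return x, y


def xy_to_linear(x: int, y: int, width: int) -> int:
    """Map (x, y) back to a one dimensional address, assuming row-major storage."""
    if not 0 <= x < width:
        raise ValueError("x must lie inside the row")
    if y < 0:
        raise ValueError("y must be non-negative")
    return y * width + x
```

In a flow that reads once at the start and writes once at the end, this conversion would be needed only at those two points rather than on every intermediate step.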
Each of the hardware units 14-20 and any modules contained within those units may have a plurality of modes. Each of these modes may be used in different flows at different times. Thus, the same hardware unit or module may act differently in different modes. The modes may be selected through information precoded into the units by the CPU 28 in a set up stage in one embodiment of the invention. Thus, a hardware unit may be adapted to accomplish a variety of different functional variations on a central operating theme. No central synchronization may be required. Instead, the data flows may progress through one or more hardware units at a rate determined by those particular hardware units and the manipulations they perform. When more than one data flow must come together to create a result, the faster data flow may wait for the slower data flow to arrive.
Each of the hardware units or modules in the DSP 10 may be re-programmable. Even when those units or modules have a variety of programmable modes for one application, they may be reprogrammed with other modes for new applications.
The nature of the hardware units used in any given DSP implementation is subject to wide variability. The units shown in FIG. 1 constitute one potential set of units adapted for image processing applications, for example.
Referring to FIG. 2, in one embodiment of the present invention the interface 14 may receive input data from a sensor 33. The sensor 33 may be an imaging array as one example. Alternatively, the interface 14 may receive input information through FIFO registers 14a1 or 14a2 coupled to the bus 12. The input data from the sensor 33 initially goes to a capture module 34. The capture module 34 may perform discrete functions such as sensor control and timing. The capture module 34 is coupled to an ALU module 36, a pixel substitution module 38 and a three color look up table module 40. An output or transmit FIFO 14b may also be coupled to the bus 12. Thus, the input/output interface 14 may perform complex functions associated with the capture of video data in one embodiment of the present invention.
As shown in FIG. 2, each of the modules 34, 36, 38 and 40 also includes a command register 35, 37, 39 and 41. During an initial setup mode, the CPU 28 programs each of these registers with information about the way a particular module is to operate. For example, the CPU may set the bits in these registers to determine the mode of operation of each module 34 to 40.
An exemplary data flow, shown in FIG. 3, that may be implemented on the input/output interface 14, begins by collecting a reset frame as indicated in block 42. This process may be accomplished by the capture module 34. In particular, the illustrated flow uses a first mode of the capture module 34. The capture module 34 may have any number of capture modes, each of which performs a different function. Thus, the flow illustrated in FIG. 3 acquires data from the sensor 33 by initially collecting a reset frame 42.
The ALU module 36 is in a mode five that is a bypass mode wherein the module 36 is not utilized. Similarly, the pixel substitution module 38, in a mode two, is also bypassed. Thus, as indicated in block 44, the look up table module 40 is utilized to scale the pixel values of the reset or background noise image. The look up table module 40 may be in a mode one in accordance with one embodiment of the present invention. The output data is then stored in the RAM 30.
Thus, a variety of reprogrammable hardware units may be utilized in a particular mode to accomplish a given function. Other flows, in addition to the one shown in FIG. 3, may be ongoing at the same time as the illustrated flow and may use many of the same modules in different modes to achieve different results. In this way, a given module may be used variably and its mode assignment may be preprogrammed.
The preprogramming of a given data flow or segment such as the segment 1.1 illustrated in FIG. 3 may be accomplished through a mode table shown in FIG. 4. The mode table may be a table stored in the memory associated with the CPU 28 that sets the selected modes for each of a plurality of modules or hardware units involved in a given flow. Thus, the capture module 34 is illustrated as being in mode one, the ALU module 36 is in a mode five, the module 38 is in mode two and the LUT module 40 is in mode one for the flow shown in FIG. 3.
The information stored in the mode table is transferred by the CPU to the individual units or modules. In particular, command registers such as the registers 35-41 may be preset with desired operating characteristics such as the particular modes that are desired in a given data flow. Thus, the information in the mode table is transferred to the individual modules or units over the bus 12 to set the internal command registers for each mode of operation. The command registers in each unit or module monitor the bus 12 for information that relates to their units. When a command coded for its unit is identified, the command register causes the command to be stored in an appropriate register. Thus, each command may be identified with a transmit identifier (TXID) for a particular module or unit together with type information. The type information may identify whether the information is data, address, or command information, as a few examples.
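The mode table and its transfer to the command registers can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary layout, the numeric unit identifiers (the figure reference numerals are reused here purely for convenience), and the `Packet`, `CommandRegister` and `program_segment` names are all hypothetical. Only the mode values for segment 1.1 come from the description.

```python
from dataclasses import dataclass

# Modes for segment 1.1 as described for the flow of FIG. 3.
MODE_TABLE = {"1.1": {"capture": 1, "alu": 5, "pixel_substitution": 2, "lut": 1}}

# Hypothetical unit identifiers; the description gives no numeric values.
UNIT_IDS = {"capture": 34, "alu": 36, "pixel_substitution": 38, "lut": 40}


@dataclass(frozen=True)
class Packet:
    txid: int      # identifier of the unit the cycle is addressed to
    kind: str      # type information: "command", "data" or "address"
    payload: int


class CommandRegister:
    """Watches every bus cycle and latches commands addressed to its unit."""

    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.mode = None

    def snoop(self, packet):
        if packet.txid == self.unit_id and packet.kind == "command":
            self.mode = packet.payload


def program_segment(segment, registers):
    """Broadcast one command packet per mode table entry to all registers."""
    for name, mode in MODE_TABLE[segment].items():
        packet = Packet(UNIT_IDS[name], "command", mode)
        for register in registers:  # every register sees every bus cycle
            register.snoop(packet)
```

Because each register filters on both the TXID and the type field, data and address cycles addressed to the same unit leave the stored mode untouched.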
The mode table may also provide additional information about the operation of the direct memory access engine 24, interrupt registers and FIFO registers. For example, as indicated under the entry “LUT” for the segment 1.1, a direct memory access (DMA) engine may be in mode one, an interrupt register may be in mode one and a transmit FIFO may be in mode one. The transmit FIFO is the FIFO 14b in FIG. 2, the DMA engine is the unit 24 and the interrupt register is a register associated with the CPU 28.
DMA mode one, for example, may be a write and a move (i.e., write progressively to addresses in the X direction). Other possible modes for the DMA include read, move in the X direction burst; read, move in the Y direction; write, move in the X direction; skip by one in the X direction; read, move in the X direction; skip by one in the X direction; skip by one in the Y direction and the like.
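How a DMA mode might translate into a sequence of addresses can be sketched as follows. The parameterization by direction and skip count is an assumption made for illustration, as are the function name, the `pitch` parameter (the number of addresses per row) and the row-major layout. The description names the modes but does not define their encoding.

```python
def dma_addresses(base, pitch, count, direction="x", skip=0):
    """Generate `count` addresses starting at `base`.

    direction="x" steps along a row, direction="y" steps down a column;
    skip=1 means "skip by one", so every other location is visited.
    """
    if direction not in ("x", "y"):
        raise ValueError("direction must be 'x' or 'y'")
    if skip < 0 or count < 0 or pitch <= 0:
        raise ValueError("skip and count must be non-negative, pitch positive")
    unit_step = 1 if direction == "x" else pitch
    step = unit_step * (1 + skip)
    return [base + i * step for i in range(count)]
```

For example, a move in the X direction visits consecutive addresses, while a move in the Y direction advances by one full row per step.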
In one embodiment of the present invention, there may be seven DMA channels. Each of the channels may be in a different mode than other channels at any given time.
The DMA interrupt registers may have two modes in one embodiment. In a first mode, an interrupt may be on a write end and in the second mode an interrupt may be on a read end.
The transmit FIFO registers may have several modes in one embodiment of the present invention. For example, in one mode, the transmit FIFO registers transmit to two different units and monitor both for busy signals. Thus, for example, in FIGS. 3 and 4, the transmit FIFO registers are in a mode one. In this mode, the transmit FIFO registers fill in a unit identifier for the unit that will be receiving data from the transmit FIFO registers. The unit that will be receiving transmitted data is the RAM controller 26. Thus, the transmit FIFO 14b provides the transmit directions to transmit data to the RAM controller 26.
The mode table may also assign the highest bus access priority 59, as shown in FIG. 4. The highest priority for bus accesses is assigned to the LUT module 40 in the illustrated example.
Those skilled in the art will appreciate that a large number of segments each corresponding to different data flows may be produced in the mode table for any given complex process resulting in an ending result. Moreover, the number of hardware modules in the mode table may be much higher than the four modules illustrated. Thus, a large number of units or modules and a large number of segments may be operated in parallel and relatively independently of one another.
In some embodiments of the present invention, all the modules or units shown in FIG. 1 may be formed as one integrated circuit, potentially with the exception of the RAM 30 and the input/output unit 32. Within the one integrated circuit, bandwidth is necessarily abundant. While more than one bus may be utilized, one bus 12 may be utilized in some cases because the use of one bus allows easy reconfiguration of a plurality of units that may be readily configured together.
Referring to FIG. 5, in a more complex data flow, utilizing a multi-stage pipe, the unit 62 is a data source (such as the cluster of a capture module 34 and a three color LUT 40). The unit 64 may accomplish a general math function such as a multiply performed in a fixed function DSP. The final unit 68 is the RAM controller 26. When the unit 64 requests the bus 12, the DMA 24 recognizes the activity by looking at the bus grants and automatically generates the needed RAM address identified as an address signal on the bus 12. The DMA 24 channel was programmed with its instructions during the set up stage. Thus data may flow without addresses between units. When storage is involved, either for source data reads or destination data writes, an address may be required. The DMA controller 24 with its multiple channels may be used for automatic address generation. Thus, the RAM controller 26 receives the needed address to write to the RAM 30.
The data flow software 72, shown in FIG. 6 in accordance with one embodiment, begins by programming the various selected modes into two or more hardware units such as any of the units 14-20 shown in FIG. 1, as indicated in block 74. Parallel, independent data flows are then initiated starting from input data or stored data as indicated in block 78. Generally, the parallel data flows may be initiated automatically upon the receipt of new data or under the control of the CPU 28.
During a set up stage for each of the parallel data flows, the hardware units are placed in different modes for the different data flows as indicated in block 80. In this way, a given unit may selectively operate in different modes. As a result, the same hardware device may be effectively reconfigured on the fly and reused for different functions. Once all the data flows are complete, the results are produced and stored as indicated in block 82.
A device identification (ID) is used to communicate on the bus 12. Each transaction on the bus 12 has a transmit ID (TXID) that indicates where the cycle is going. Each unit that initiates a cycle on the bus 12 sends a TXID. Each unit that receives data from the bus responds to a specific TXID and captures the current cycle on the bus when there is a match between the TXID and a unit ID for that particular unit. A cycle may consist of an address and/or data and the TXID.
Each cycle on the DSP 10 may include address, command or data and may include type information as well. Again, the type information indicates whether the information is address, data, command or some other form of information. Flag information may be information that indicates the last address, in an x or y field for example, so the system knows when no more addresses will be forthcoming.
Thus, a variety of different types of information may be sent as packets along the same packetized bus. In some cases, it may be more desirable to have a separate bus for information that is time sensitive. For example, the arbiter 22 may operate with its own bus in one embodiment of the present invention. A cycle may also include flag information. A receiver ID (RXID) is also used. The bus 12 carries the return path for the originator of the current bus cycle. This return path is used only for posted reads as the RAM controller 26 needs to send the read data back at a later time to this ID.
A unit or module transmitter 118, shown in FIG. 9, is responsible for requesting the bus 12, unloading its FIFO 122 and sending data to the proper place. To perform this function, the transmitter 118 has a TXTID register 120 to store the identifier (i.e., link) of the next module or unit in the flow where the data is to be sent. Thus, the TXTID is sent on the transmit identifier path 12a and the data, type and flags are transmitted from a FIFO 122 to the data path 12b.
A passive receiver 124, shown in FIG. 10, is responsible for receiving data, commands or addresses on the bus 12. The FIFO 126 is loaded with this data in a final step. To perform the data receiving function, the receiver 124 has a TXRID register 128 to store the identifier to match the transmit identifier as indicated in block 130.
An active receiver 132, shown in FIG. 11, is programmed by the CPU 28 to initiate a memory read cycle. The receiver 132 waits for the posted read it initiated to create an inter-unit cycle on the bus 12 and receives data for its FIFO 126. To create the read cycle, the receiver 132 sends a request to a unit in its TXTID register 120 and the return ID is sent from the RXID register 136 along with the request. This active participation is set with a register bit in the register 134. If set as passive, the receiver 132 operates as a passive receiver. Since the active receiver is likely to be the first unit of the pipe, it may trigger the whole processing chain. A receiver identifier constant 129 is used to identify the return half of a split read transaction on the path 12a and b.
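The posted (split) read initiated by an active receiver can be sketched in Python. The class names, the dictionary used to represent a bus cycle, and the numeric identifiers are assumptions made for illustration. What the sketch takes from the description is that the request carries both the target's identifier and a return identifier, and that the read data comes back later addressed to that return identifier.

```python
from collections import deque


class RamController:
    """Queues posted reads and answers them later, addressed to the return ID."""

    def __init__(self, unit_id, memory):
        self.unit_id = unit_id
        self.memory = memory
        self.pending = deque()

    def accept(self, cycle):
        # Capture the request only when the TXID matches this unit.
        if cycle["txid"] == self.unit_id:
            self.pending.append((cycle["address"], cycle["rxid"]))

    def complete_one(self):
        # The return half of the split transaction: TXID is the saved return ID.
        address, return_id = self.pending.popleft()
        return {"txid": return_id, "data": self.memory[address]}


class ActiveReceiver:
    """Initiates a read and later captures the data sent back to its own ID."""

    def __init__(self, rxid, target_txid):
        self.rxid = rxid
        self.txid = target_txid
        self.fifo = deque()

    def request(self, address):
        return {"txid": self.txid, "rxid": self.rxid, "address": address}

    def accept(self, cycle):
        if cycle["txid"] == self.rxid:
            self.fifo.append(cycle["data"])
```

Between `request` and `complete_one` the bus is free for other cycles, which is the point of posting the read rather than holding the bus until the memory responds.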
A busy state may be used to convey any receiver's full state back to a transmitter or active receiver. A separate busy signal bus may be the feedback path in the pipeline that allows a receiver to signal back to a transmitter when it is too full to receive more data. Each transmitter looks for the busy signal of the receiver it is sending to and prior to requesting the bus, checks to make sure the receiver's FIFO is not busy. The transmitter is able to identify the busy signals on the busy bus of interest based on TXIDs.
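The busy feedback path can be sketched as follows. The FIFO depth, the class names, and the dictionary standing in for the busy signal bus are assumptions made for illustration. The behavior modeled is the one described above: a transmitter checks the busy signal of the receiver it is sending to before it requests the bus.

```python
from collections import deque


class Receiver:
    def __init__(self, unit_id, depth):
        self.unit_id = unit_id
        self.depth = depth          # assumed FIFO capacity in words
        self.fifo = deque()

    @property
    def busy(self):
        """Busy is asserted when the receive FIFO cannot take another word."""
        return len(self.fifo) >= self.depth


class Transmitter:
    def __init__(self, txid, words):
        self.txid = txid            # identifier of the receiver being sent to
        self.fifo = deque(words)

    def try_send(self, receivers):
        """Send one word unless the target is busy; return True if a word moved."""
        busy_bus = {unit_id: rx.busy for unit_id, rx in receivers.items()}
        if not self.fifo or busy_bus[self.txid]:
            return False            # do not even request the bus
        receivers[self.txid].fifo.append(self.fifo.popleft())
        return True
```

A stalled transmitter simply retries later, so the pace of the flow is set by the slowest stage without any central synchronization.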
A more detailed version 84 of the data flow software, shown in FIG. 7, may be stored in association with the CPU 28. The stored flow begins by identifying the data sources as indicated in block 86. The data sources are the sources of data to be processed. Rectangular or two dimensional addresses to the RAM 30 as well as linear or one dimensional addresses to the RAM 30 are loaded into the DMA channels that are to be used for source data as indicated by block 88. Since in many cases the source data is in the form of a two dimensional array such as a pixel array, two dimensional addresses in the RAM 30 may be utilized in some embodiments of the present invention.
The units that are required to read the source data are then linked to the DMA channels by setting one DMA channel to correspond to a given unit's identifier as indicated in block 90. In other words, each of the units is assigned a unit identification in a bus grant in response to a bus request. Thus, the DMA channels may be programmed during the set up stage to automatically provide the memory addresses shortly following a read request to the RAM controller 26.
Connections between the various units shown in FIG. 1 are made by the CPU 28 during the setup stage by setting the output unit's transmit identifier (TXID) equal to the value of the downstream unit's receiver identifier (RXID), as indicated in block 92. A unit that stores the final results in memory may then be linked to a DMA channel. When the last unit performs a write, the DMA channel address is attached to the write command as the write command is sent over the bus 12 to the RAM controller 26, as indicated in block 94. A DMA to unit link is established by configuring a DMA channel to belong to a certain output stage of a unit. The DMA channel monitors the bus grants from the arbiter 22 to make the match.
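Linking units by identifier can be sketched in Python. The `Unit` class, the `link` and `run_flow` helpers, and the numeric identifiers are hypothetical. The sketch only illustrates the rule stated above: the upstream unit's TXID is set equal to the downstream unit's RXID, after which a word follows the chain without any addresses.

```python
class Unit:
    def __init__(self, name, rxid, operation):
        self.name = name
        self.rxid = rxid
        self.txid = None            # set during setup; None marks the end of the flow
        self.operation = operation


def link(upstream, downstream):
    """Connect two units by copying the downstream RXID into the upstream TXID."""
    upstream.txid = downstream.rxid


def run_flow(units, first, word):
    """Pass one word through the chain, following TXID to RXID matches."""
    by_rxid = {unit.rxid: unit for unit in units}
    visited = []
    unit = first
    while unit is not None:
        word = unit.operation(word)
        visited.append(unit.name)
        unit = by_rxid.get(unit.txid)
    return word, visited
```

Reconfiguring the flow amounts to rewriting a few TXID values during setup, which is why a single shared bus is sufficient to express many different pipelines.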
The various bus priorities are then set up in the registers 23 of the arbiter 22 as indicated in block 96. The bus access priority is generally set so that the last data flow segment step has the highest priority and the second to the last step is the second to highest priority, and so on. This assures that there will be no blockage in the pipe, which might cause the system to fail.
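The priority rule can be sketched as follows. The function names and the convention that rank 0 is the highest priority are assumptions. The ordering itself follows the description: the last step of a flow receives the highest priority, the second to last step the next highest, and so on.

```python
def assign_priorities(stages):
    """Return {stage: rank} for stages listed in flow order.

    Rank 0 is the highest priority and goes to the last stage.
    """
    last = len(stages) - 1
    return {stage: last - position for position, stage in enumerate(stages)}


def grant(requests, priorities):
    """Pick the requesting stage with the highest priority, or None if idle."""
    if not requests:
        return None
    return min(requests, key=lambda stage: priorities[stage])
```

Favoring downstream stages lets the end of the pipe drain first, so an upstream stage never fills a FIFO that its consumer is unable to empty.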
If required, interrupts are set up in the DMA controller 24 such that on the end of the last write of the processed data, the CPU 28 is interrupted. The DMA controller 24 monitors the data bus tag fields such as end of field in the x direction (EOX) or the end of field in the y direction (EOY), in a two dimensional data field such as a pixel array. Thus the DMA controller 24 looks for an EOX and EOY defining the last data or the last pixel. The end of the field in the x direction corresponds to the end of the row and the end of the field in the y direction corresponds to the end of the column and the end of the entire field in one embodiment of the present invention. The CPU 28 also has the option to poll the DMA registers to monitor progress.
The EOX and EOY are set by the CPU 28 during the initiation of a data flow. The DMA engine 24 is the only unit that knows when all the addresses are done. It is the DMA engine 24 that attaches EOX and EOY tags at the end of a data field.
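The tagging of a two dimensional field can be sketched as follows. The exact tag semantics are an assumption: here EOX marks the last address of each row and EOY marks every address in the last row, so that only the final pixel carries both tags. The function names and the `pitch` parameter (addresses per row of memory) are hypothetical.

```python
def tagged_addresses(base, width, height, pitch):
    """Yield (address, eox, eoy) for a width x height field in row-major memory."""
    for y in range(height):
        for x in range(width):
            address = base + y * pitch + x
            eox = x == width - 1     # end of field in the x direction (end of row)
            eoy = y == height - 1    # assumed: set for the whole last row
            yield address, eox, eoy


def is_last(eox, eoy):
    """The last data of the field is the one carrying both tags."""
    return eox and eoy
```

A unit downstream of the DMA engine can then detect the end of the field from the tags alone, without knowing the field dimensions or any addresses.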
The receiving FIFO registers of the receiving units are set up to be active or passive receivers as indicated in block 98. Only one active receiver is needed for the beginning of each data flow. The other receiving FIFO registers of other units may be passive.
Finally, the data sources are triggered. This triggers the first unit in a data flow to begin processing. Each unit capable of being the first unit has a specific register to designate that unit to respond to the source data trigger. In order to trigger the unit, an active bit in its receive FIFO registers is set as indicated in block 100.
Referring to FIG. 8, an example of another data flow involves two DSP units 102 and 108 and memory controller 104 coupled in series. The flow begins when the active receiver of the unit 102 requests data be read from the device 104 which may be the RAM controller 26 coupled to the RAM 30. The DMA controller 24 supplies an address over one of its two illustrated DMA channels (channels one and two). For example, channel one of the DMA controller 24 may be assigned to a process implemented by the unit 104.
When the read has occurred, the data flow begins. The device 104 then sends the data to the device 102 (as indicated by the arrow 101) that made the original request, which then transfers the data on to the device 108. Assuming unit 104 (the RAM controller 26 for example) accomplishes the last step, the data is written back to the storage 30 using DMA channel two for address creation.
The unit 102 negotiated for the read to take place and the device 104 performed the read offline from the bus's perspective (i.e., for a posted read or split cycle). When the read data was ready, the unit 102 requested the bus to deliver the data to the unit 102.
In the example illustrated in FIG. 8, two sources of information (data/address) merge to become one at unit 104. It is also an example of two units (units 102 and 108) feeding off of one data source. The trigger elements must be determined. That is, the device that is to begin the flow must be set. In the case of capture and ALU modules, the capture may be the overriding process that determines the pace of the data flow, and the ALU simply keeps pace. For this case, it is advantageous to trigger both modules, one after the other, with the ALU triggered first since it is a slave. On the other end of the pipe, the module receiving the data listens in on the same channel.
Referring to FIG. 12, data and address/grant information may be sent over the same packet bus 12 using an arbiter 22 which communicates with the address/grant bus 12d. The address/grant bus 12d within the overall bus 12 provides for given units or modules to request access to the bus 12 and for the arbiter 22 to grant that access, as appropriate based on the unit's priority and the current requests for the bus by other units. At the same time, the DMA engine 24 also accesses the address/grant bus 12d so that it can determine when any given unit is seeking data from memory. The DMA engine 24 normally communicates over the data bus 12b. In other embodiments of the present invention, the address/grant information may be packetized with the other data.
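The way a DMA channel might watch the grants and supply addresses can be sketched in Python. The `DmaChannel` class, its `on_grant` method, and the numeric unit identifiers are hypothetical. The sketch assumes that each grant to the linked unit consumes exactly one preprogrammed address, which the description does not state explicitly.

```python
class DmaChannel:
    """A channel linked to one unit; it supplies an address when that unit is granted the bus."""

    def __init__(self, unit_id, addresses):
        self.unit_id = unit_id
        self._addresses = iter(addresses)   # programmed during the set up stage

    def on_grant(self, granted_unit_id):
        """Return the next address if the grant is for the linked unit, else None.

        None is also returned once the programmed addresses are exhausted.
        """
        if granted_unit_id != self.unit_id:
            return None
        return next(self._addresses, None)
```

Because the channel reacts to grants rather than to requests from the unit itself, the processing units never need to compute or transmit memory addresses.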
A series of data flows may operate relatively independently of one another and in parallel. After an initial set up phase, a given flow may be implemented that begins with a read, involves a series of process steps and ends in a write. In each case, any number of these data flows may be operating at the same time. In some cases, these data flows may use the same hardware units at indeterminate times. Flow control may be achieved simply by feedback to the various units from the flow. When the data units are busy, the data flow awaiting access to a unit simply awaits the removal of the unit's busy flag. The data flows may progress without constantly seeking data from a central memory. Instead, data may be read at the beginning of a data flow and written at the end of the data flow. Within the data flow, the data may be simply carried with the data flow without requiring any kind of addressing mechanism.
Because the data flows may progress relatively independently of memory accesses, a much more flexible operation is achieved. In particular, reducing the number of memory accesses may increase the speed of operation of some embodiments of the present invention. Likewise, it may decrease the power consumption in some embodiments of the present invention. Moreover, by reducing the need to constantly return to the memory for data, multi-dimensional data structures may be constructed from uni-dimensional memories. Thus, a memory address structure with two dimensions may be utilized which corresponds to the data structure from an imaging array as one example. In addition, a three dimensional data structure may be utilized to represent a three dimensional structure. These multi-dimensional data structures facilitate the operation of the individual units or modules.
While the present invention has been described as operating in a data flow mode, the present invention is also applicable to embodiments in which data flow processors are incorporated into non-data flow processor-based systems, such as conventional, sequentially controlled processor-based systems. For example, in one embodiment of the present invention, a data flow processor of the type described herein may be utilized to implement a graphics accelerator coupled to an accelerated graphics port (AGP) bus. The graphics accelerator may have a plurality of modules that work together as a data flow processor. In addition, the graphics accelerator may communicate with system memory through data flow processing. The use of data flows to manipulate complex graphics data may be more efficient than conventional systems in some embodiments. Reducing the need to access the memory may increase the speed of operation. Thus, a graphics accelerator may operate in whole or in part as a data flow processor within a conventional, sequentially operated computer system.
In addition, the present invention may utilize a programming model, in some embodiments of the present invention, that facilitates the design of complex data handling systems. Initially, a graphical depiction of the type shown in FIG. 3 may be developed that captures the various operations that must be implemented in software and hardware. The needed modules or units are identified and the modes of those units are recorded in a mode table as illustrated in FIG. 4. At this point, the desired characteristics may be transferred from the CPU 28 into command registers, such as the registers 35-41, in the various modules or units during a setup stage. In this way, distinct operations, graphically depicted and set up in a mode table may be mapped into hardware units without the need to use real time operating systems or the like.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.

Claims (25)

What is claimed is:
1. A method comprising:
programming at least two selectable functional modes for each of two reprogrammable digital signal processing devices; and
initiating at least two independent data flows for each of said devices, such that each of said flows use a different mode of at least one of said devices.
2. The method of claim 1 including identifying a source of data to begin each data flow.
3. The method of claim 1 including providing the data before initiating the flow.
4. The method of claim 1 including accessing data in a random access memory and loading said data into channels to be utilized as source data.
5. The method of claim 4 including enabling said devices to directly access source data.
6. The method of claim 5 including providing a direct memory access controller and assigning a plurality of channels to provide source data for said flows.
7. The method of claim 6 including assigning one of said direct memory access channels to respond to read requests from one of said devices.
8. The method of claim 6 including providing a designated direct memory access channel for the final result of a data flow.
9. The method of claim 1 including coupling said devices together through a common bus.
10. The method of claim 9 including coupling said devices together over a packet bus.
11. The method of claim 10 including providing device identifiers for devices on said bus and setting the output of one of said devices to the device identifier for the other of said devices.
12. The method of claim 9 including providing an arbiter that assigns priorities for accessing said bus.
13. The method of claim 12 including assigning the bus access priorities before initiating said at least two independent data flows.
14. The method of claim 1 including designating each of said devices to either send, receive or send and receive data.
15. The method of claim 1 including receiving a string of memory addresses, and converting those addresses into a data structure having at least two dimensions.
16. The method of claim 15 including forming a data structure having at least three dimensions and using said data structure to model a three-dimensional object.
17. A method of claim 15 including converting a one dimensional data structure from a memory into a data structure having at least two dimensions for use by said devices and writing data back to said memory as a one dimensional data structure.
18. An article comprising a medium storing instructions that cause a processor-based system to:
program at least two selectable functional modes for each of two reprogrammable digital signal processing devices; and
initiate at least two independent data flows for each of said devices, such that each of said flows use a different mode of at least one of said devices.
19. The article of claim 18 wherein said processor-based system identifies a source of data to begin each data flow.
20. The article of claim 18 further storing instructions that cause a processor-based system to provide the data before initiating the flow.
21. The article of claim 18 further storing instructions that cause a processor-based system to access data in a random access memory and load said data into channels to be utilized as source data.
22. The article of claim 21 further storing instructions that cause a processor-based system to assign a plurality of direct memory access channels to provide source data for said flows.
23. The article of claim 21 further storing instructions that cause said processor-based system to convert a string of memory addresses into a data structure having at least two dimensions.
24. The article of claim 23 further storing instructions that cause a processor-based system to form a data structure having at least three dimensions and use said data structure to model a three dimensional object.
25. The article of claim 23 further storing instructions that cause a processor-based system to convert a one dimensional data structure from a memory into a data structure having at least two dimensions for use by said devices and to cause said data to be written back to said memory as a one dimensional data structure.
US09/540,196 | 2000-03-31 | 2000-03-31 | Data flow processor | Expired - Lifetime | US6609188B1 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
US09/540,196 (US6609188B1, en) | 2000-03-31 | 2000-03-31 | Data flow processor
US10/461,847 (US6904512B2, en) | 2000-03-31 | 2003-06-13 | Data flow processor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
US09/540,196 (US6609188B1, en) | 2000-03-31 | 2000-03-31 | Data flow processor

Related Child Applications (1)

Application Number | Title | Priority Date | Filing Date
US10/461,847 (Division; US6904512B2, en) | Data flow processor | 2000-03-31 | 2003-06-13

Publications (1)

Publication Number | Publication Date
US6609188B1 (en) | 2003-08-19

Family

ID=27734879

Family Applications (2)

Application Number | Title | Priority Date | Filing Date
US09/540,196 (Expired - Lifetime; US6609188B1, en) | Data flow processor | 2000-03-31 | 2000-03-31
US10/461,847 (Expired - Fee Related; US6904512B2, en) | Data flow processor | 2000-03-31 | 2003-06-13

Family Applications After (1)

Application Number | Title | Priority Date | Filing Date
US10/461,847 (Expired - Fee Related; US6904512B2, en) | Data flow processor | 2000-03-31 | 2003-06-13

Country Status (1)

Country | Link
US (2) | US6609188B1 (en)

Also Published As

Publication number | Publication date
US6904512B2 | 2005-06-07
US20030208671A1 | 2003-11-06

Legal Events

Date | Code | Title | Description

AS | Assignment
Owner name: INTEL CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DUNTON, RANDY R.;REEL/FRAME:010720/0237
Effective date: 2000-03-24

STCF | Information on status: patent grant
Free format text: PATENTED CASE

FEPP | Fee payment procedure
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY | Fee payment
Year of fee payment: 4

FPAY | Fee payment
Year of fee payment: 8

FPAY | Fee payment
Year of fee payment: 12

