JP2007035063A


Info

Publication number: JP2007035063A
Application number: JP2006259487A
Authority: JP
Inventors: Shinichi Yamaura; 慎一山浦; Kazuhiko Hara; 和彦原; Takao Katayama; 貴雄片山; Kazuhiko Iwanaga; 和彦岩永; Kosuke Takato; 浩資高藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2006-09-25
Filing date: 2006-09-25
Publication date: 2007-02-08
Anticipated expiration: 2019-09-10
Also published as: JP4413905B2

Abstract

PROBLEM TO BE SOLVED: To accelerate data transfer and to process data flexibly according to its bit count.

SOLUTION: A SIMD type processor comprises: a plurality of processor elements 3a, each including a plurality of registers 31b that hold data to be processed and data that has been processed; data transfer buses 41d connected to the respective processor elements 3a; and a register controller 31a that gives read signals or write signals to the registers 31b. In response to an instruction signal from an external interface 4, the register controller 31a gives a signal to a specified register 31b of a specified processor element 3a, and that register 31b reads or writes data according to the signal it receives.

COPYRIGHT: (C)2007,JPO&INPIT

Description


The present invention relates to a SIMD (Single Instruction stream Multiple Data stream) type processor that processes a plurality of image data items and the like in parallel with a single arithmetic instruction.

In recent years, digital copying machines, facsimile machines and similar devices have improved image quality by increasing the number of pixels or by supporting color. As image quality improves, the amount of data to be processed increases. Data processing in a copying machine or the like often applies the same arithmetic operation to every pixel, so SIMD type processors, which perform the same operation simultaneously on a plurality of data items with one instruction, have come into use. The arithmetic processing itself can be parallelized by arranging a plurality of arithmetic units, but the data to be operated on must be fetched from memory or the like at a rate that matches the arithmetic speed; if access cannot keep up, processor performance is limited by the data access speed. In an ordinary SISD (Single Instruction Single Data) type processor, operand data is accessed sequentially from memory by the processor's program, and the access speed is determined by the memory bit width and the transfer time. If a SIMD type processor used the same method, computation would be parallel while data access would remain sequential, and processing capability would drop to roughly that of a SISD type processor.

For this reason, a SIMD type processor is configured so that operand data is not accessed by processor instructions; instead, an external memory data transfer device accesses the processor's internal input/output registers directly. That is, while the processor executes an operation, the external memory data transfer device transfers the next data to be processed into the input registers and transfers processed data from the output registers to memory, which increases data processing speed.

The processing flow between the processor and the external memory data transfer device is as follows.
(1) The external memory data transfer device transfers the operand data to the input registers.
(2) The processor transfers the operand data from the input registers, which have already been filled from outside, to its operation registers and starts the operation.
(3) The processor executes the predetermined operation. Meanwhile, the external memory data transfer device transfers the next operand data to the input registers. If processed data (result data) is present in the output registers, the external memory data transfer device also transfers it from the output registers to memory.
(4) The processor finishes the operation and transfers the result data to the output registers.
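To make the benefit of steps (1) to (4) concrete, the sketch below compares total time with and without overlap. It is a simplified timing model, not the patent's method: it assumes unit costs for load, compute and store, assumes the transfer device performs its load and store one after the other, treats the register-to-register move in step (2) as free, and lets each stage end only when both the processor and the transfer device are done.

```python
def sequential_time(n_batches: int, load: int, compute: int, store: int) -> int:
    """Total time when transfers and computation never overlap."""
    return n_batches * (load + compute + store)


def overlapped_time(n_batches: int, load: int, compute: int, store: int) -> int:
    """Total time when the processor computes batch k while the transfer
    device loads batch k+1 and stores the result of batch k-1.

    Assumes the transfer device does its load and store one after the
    other, and that every stage ends only when both sides are done.
    """
    if n_batches <= 0:
        return 0
    total = load  # step (1): the first batch is loaded before any computation
    for k in range(n_batches + 1):  # one extra stage drains the last result
        cpu = compute if k < n_batches else 0
        dma = (load if k + 1 < n_batches else 0) + (store if k >= 1 else 0)
        total += max(cpu, dma)
    return total


if __name__ == "__main__":
    # Illustrative unit costs: load 1, compute 2, store 1, four batches.
    print(sequential_time(4, 1, 2, 1))   # 16
    print(overlapped_time(4, 1, 2, 1))   # 10
```

With a compute-bound workload the overlapped total approaches n_batches * compute, i.e. the transfers are hidden behind the arithmetic, which is the speed-up the flow above aims for.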

As described above, speed is increased by having the external memory data transfer device transfer operand data at the same time as the processor executes its operations.

A shift register method or a serial access memory method has been used for this data transfer. In the shift register method, as described for example in Patent Document 1 (JP-A-5-67203), the data held in the registers is shifted one stage at a time in synchronization with a clock input. For a SIMD type processor having 256 processor elements, the first datum transferred is held in the input register of the 0th processor element, and on the next clock input it is shifted one stage into the input register of the 1st processor element. A total of 256 clock inputs is therefore required before the first datum reaches the input register of the 255th processor element.
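The clock count above can be checked with a small simulation of a shift chain. This is an illustrative model only; the function name and the list-based register chain are assumptions, with 256 processor elements as in the example.

```python
from typing import List, Optional, Sequence, Tuple


def shift_in(data: Sequence[int], n_pe: int = 256) -> Tuple[List[Optional[int]], int]:
    """Shift-register transfer into a chain of n_pe input registers.

    On each clock a new value enters the register of PE 0 and every value
    already held moves one PE further along the chain. Returns the register
    contents and the number of clocks used.
    """
    regs: List[Optional[int]] = [None] * n_pe
    clocks = 0
    for value in data:
        regs = [value] + regs[:-1]  # PE i takes over what PE i-1 held
        clocks += 1
    return regs, clocks


if __name__ == "__main__":
    regs, clocks = shift_in(list(range(256)))
    print(clocks, regs[255], regs[0])  # 256 0 255
```

Note that the first datum ends up in the farthest PE and the last datum in PE 0, and no PE can be skipped: every transfer moves data through every stage in front of it.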

In the serial access memory method, as described for example in Patent Document 2 (JP-A-6-4690), an input pointer generates an input pointer signal in which logic "H" is set for one processor element, and input data is written into the input SAM (serial access memory) of the processor element designated by the "H". The input pointer signal shifts one stage at a time in synchronization with the clock input. For a SIMD type processor having 256 processor elements, in the first data transfer the input pointer signal designates the 0th processor element, and the datum is held in that element's input SAM. In the second transfer, the input pointer signal has shifted one stage with the clock and designates the 1st processor element, whose input SAM then holds the datum. In this way, a total of 256 clock inputs is required before data is held in the input SAM of the 255th processor element.

Patent Document 1: JP-A-5-67203
Patent Document 2: JP-A-6-4690
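The pointer-driven serial access memory transfer can be modeled the same way. In this sketch the pointer is a one-hot list, and its reset position at PE 0 and the function name are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def sam_write(data: Sequence[int], n_pe: int = 256) -> Tuple[List[Optional[int]], int]:
    """Serial-access-memory transfer driven by a one-hot input pointer.

    The pointer holds logic "H" (True) for exactly one PE; each transfer
    writes into that PE's input SAM, then the "H" shifts one PE on the clock.
    """
    if len(data) > n_pe:
        raise ValueError("the pointer runs off the end after n_pe transfers")
    sam: List[Optional[int]] = [None] * n_pe
    pointer = [False] * n_pe
    pointer[0] = True  # assumed reset state: "H" at PE 0
    clocks = 0
    for value in data:
        sam[pointer.index(True)] = value
        pointer = [False] + pointer[:-1]  # move "H" to the next PE
        clocks += 1
    return sam, clocks


if __name__ == "__main__":
    sam, clocks = sam_write(list(range(256)))
    print(clocks, sam[0], sam[255])  # 256 0 255
```

Unlike the shift register, the data does not move once written (datum i stays in PE i), but the pointer still has to visit the PEs strictly in order, so reaching PE 255 costs 256 clocks.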

With these methods, however, even when data needs to go only to the even-numbered processor elements, it must also be transferred to the odd-numbered ones. Likewise, even when data needs to go only to the latter half of the processor elements (the 128th to 255th), it must be transferred to all of them. In other words, data cannot be transferred directly to specific processor elements only. As a result, transferring the necessary data takes longer than necessary, and data processing slows down.
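A rough count shows what is lost. The sketch below follows the text's framing that the sequential schemes must pass through all processor elements, and compares that with a hypothetical per-PE addressed scheme that touches only the targets; it ignores any cost of setting up the address.

```python
from typing import Iterable


def full_pass_clocks(n_pe: int = 256) -> int:
    """Shift-register / pointer schemes step through every PE in order, so
    per the text a transfer costs one clock per PE whatever the targets."""
    return n_pe


def addressed_accesses(targets: Iterable[int], n_pe: int = 256) -> int:
    """With per-PE addressing, only the targeted PEs need a bus access."""
    unique = set(targets)
    if any(not 0 <= t < n_pe for t in unique):
        raise ValueError("PE address out of range")
    return len(unique)


if __name__ == "__main__":
    even = range(0, 256, 2)
    latter_half = range(128, 256)
    print(full_pass_clocks(), addressed_accesses(even))         # 256 128
    print(full_pass_clocks(), addressed_accesses(latter_half))  # 256 128
```

In both of the cases named above, direct addressing halves the number of transfers.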

In addition, in the data processing performed by the processor, the bit width needed by the input registers to hold input data, by the output registers to hold output data, and by registers that hold data temporarily differs depending on the application. In a conventional SIMD type processor, the bit widths that the input registers, output registers and temporary registers can hold are fixed, so data whose width exceeds what these registers can hold cannot be processed.

In the prior art, the input/output registers and the input/output ports also have the same bit width, so transferring the data of all processor elements (PEs) requires as many accesses as there are PEs, which lengthens transfer time.

Furthermore, some applications need many line buffers, and the registers built into the processor elements are used for this purpose. Because the number of registers is fixed, however, applications that need more line buffers than that number cannot be supported.

The present invention has been made in view of these conventional problems. One object is to speed up data transfer, and thereby data processing, by allowing data to be transferred directly to any processor element. Another object is to make the use of registers flexible so that data processing can adapt to the bit count of the data.

A SIMD type processor according to the present invention comprises: a plurality of processor elements, each including arithmetic means for processing data and data holding means for holding both data to be processed by the arithmetic means and data that has been processed by it; a data transfer bus connected to each of the processor elements; and designation means for designating a given processor element by the address assigned to it. When the designation means addresses a given processor element, the data holding means of that addressed processor element acquires data from, or outputs data to, the data transfer bus.
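The addressing idea can be illustrated with a small functional model. This is a hypothetical sketch, not the claimed circuit: the class and method names are invented, and the defaults of 256 PEs with 24 externally accessible 8-bit registers are borrowed from the first embodiment.

```python
from typing import List


class AddressedPEArray:
    """Hypothetical model of PEs sharing one data transfer bus.

    The designation means puts a PE address on the bus; only the data
    holding register of that PE takes the bus value (write) or drives
    the bus (read). Every access costs one bus cycle.
    """

    def __init__(self, n_pe: int = 256, n_regs: int = 24, width: int = 8) -> None:
        self.n_pe = n_pe
        self.mask = (1 << width) - 1
        self.regs: List[List[int]] = [[0] * n_regs for _ in range(n_pe)]
        self.bus_cycles = 0

    def _check(self, pe_addr: int) -> None:
        if not 0 <= pe_addr < self.n_pe:
            raise ValueError(f"no PE at address {pe_addr}")

    def write(self, pe_addr: int, reg: int, value: int) -> None:
        self._check(pe_addr)
        self.regs[pe_addr][reg] = value & self.mask  # bus is `width` bits wide
        self.bus_cycles += 1

    def read(self, pe_addr: int, reg: int) -> int:
        self._check(pe_addr)
        self.bus_cycles += 1
        return self.regs[pe_addr][reg]


if __name__ == "__main__":
    pes = AddressedPEArray()
    pes.write(200, 3, 0x1AB)   # only PE 200 latches; value truncated to 8 bits
    print(hex(pes.read(200, 3)), pes.regs[199][3], pes.bus_cycles)  # 0xab 0 2
```

Reaching PE 200 costs one bus cycle here, whereas both sequential schemes described earlier need 201 clocks before PE 200 is reached.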

The data holding means may comprise first data holding means for holding data to be processed by the arithmetic means, and second data holding means for holding data that has been processed by the arithmetic means.

With this configuration, data to be processed is held in the data holding means of the addressed processor element, so data can be transferred directly to any processor element. Likewise, when processed data is output, the data held in the data holding means of the addressed processor element is output. Data transfer, and thereby data processing, can therefore be made faster.

Another SIMD type processor according to the present invention comprises: a plurality of processor elements, each including arithmetic means for processing data and data holding means for holding both data to be processed by the arithmetic means and data that has been processed by it; a data transfer bus connected to each of the processor elements; designation means for designating a given processor element; and signal generation means for giving the data holding means either an acquisition signal, which causes data to be processed to be taken from the data transfer bus and held in the data holding means, or an output signal, which causes processed data held in the data holding means to be output to the data transfer bus. The designation means designates a given processor element, and the signal generation means gives a signal to a given data holding means of that processor element; the data holding means that receives the signal then acquires data from, or outputs data to, the data transfer bus according to that signal.

With this configuration, when the signal generation means gives a data holding means the acquisition signal, that data holding means functions as one that acquires and holds data to be processed. When the signal generation means gives it the output signal, the same data holding means functions as one that outputs processed data. Making the use of registers flexible in this way allows data processing that adapts to the bit counts of the input and output data.
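Because any register can act as input or output depending on the signal it receives, an application can decide how many registers each role gets. The allocator below is a hypothetical illustration of that freedom, not part of the described hardware; it assumes 24 externally accessible 8-bit registers (as in the first embodiment) and assigns consecutive indices per role.

```python
import math
from typing import Dict, List


def allocate_registers(in_bits: int, out_bits: int, tmp_bits: int,
                       n_regs: int = 24, reg_width: int = 8) -> Dict[str, List[int]]:
    """Assign consecutive register indices to input, output and temporary
    roles according to the bit widths a given application needs."""
    plan: Dict[str, List[int]] = {}
    next_reg = 0
    for role, bits in (("input", in_bits), ("output", out_bits), ("temp", tmp_bits)):
        count = math.ceil(bits / reg_width)  # e.g. 16-bit input -> 2 registers
        plan[role] = list(range(next_reg, next_reg + count))
        next_reg += count
    if next_reg > n_regs:
        raise ValueError(f"needs {next_reg} registers, only {n_regs} available")
    return plan


if __name__ == "__main__":
    print(allocate_registers(16, 8, 24))
    # {'input': [0, 1], 'output': [2], 'temp': [3, 4, 5]}
```

With fixed-role registers, a 16-bit input would not fit an 8-bit input register; with role-agnostic registers it simply occupies two.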

Preferably, even or odd numbers are assigned to the processor elements, and a processor element with an even number and a processor element with an odd number form one pair. The data transfer bus for even-numbered processor elements and the data transfer bus for odd-numbered processor elements are assigned to the respective elements of each pair, and the data holding means of the processor elements in the pair designated by the designation means each acquire or output data through their assigned data transfer bus.

With this arrangement, designating a pair of processor elements once lets both the even-numbered and the odd-numbered processor element transfer data over their own data transfer buses. Because more data moves per transfer, the number of transfers decreases, which speeds up data transfer and data processing.
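With 8-bit registers and a pair of buses, one access carries 16 bits, so 256 processor elements need 128 accesses instead of 256. The sketch below packs and unpacks such words; which PE of a pair occupies the low byte is not stated in the text, so the lane order used here (even PE in bits 0-7) is an assumption.

```python
from typing import List, Sequence


def pack_pairs(row: Sequence[int]) -> List[int]:
    """Pack one 8-bit value per PE into 16-bit bus words, one word per
    even/odd pair: PE 2k in bits 0-7, PE 2k+1 in bits 8-15 (assumed order)."""
    if len(row) % 2:
        raise ValueError("expected an even number of PEs")
    return [(row[i] & 0xFF) | ((row[i + 1] & 0xFF) << 8)
            for i in range(0, len(row), 2)]


def unpack_pairs(words: Sequence[int]) -> List[int]:
    """Inverse of pack_pairs: split each 16-bit word back into two PE values."""
    out: List[int] = []
    for w in words:
        out.extend((w & 0xFF, (w >> 8) & 0xFF))
    return out


if __name__ == "__main__":
    row = [i % 256 for i in range(256)]      # one byte for each of 256 PEs
    words = pack_pairs(row)
    print(len(words), unpack_pairs(words) == row)  # 128 True
    print(hex(pack_pairs([0x12, 0x34])[0]))         # 0x3412
```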

Further, a predetermined number of data holding means separate from those of the processor elements may be provided. The processing unit may then be divided, with each divided unit fetched from the separate data holding means and processed in turn.

This makes it possible to process data that exceeds the capacity of the processor elements' data holding means. For example, even if the number of items (pixels) to be processed per line increases, the line can be held in the external data holding means and fetched in divided units, with the same processing repeated for each unit, so an increase in the number of pixels is easily accommodated.
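The divided processing amounts to strip-mining a long line into chunks no larger than the PE array. The sketch below is a behavioral illustration under assumed names; it applies the per-pixel operation in ordinary Python rather than on real PEs, and the 600-pixel line is an arbitrary example.

```python
from typing import Callable, List, Sequence, Tuple


def process_line(pixels: Sequence[int], op: Callable[[int], int],
                 n_pe: int = 256) -> Tuple[List[int], int]:
    """Process a line longer than the PE array by repeating the same SIMD
    operation on successive chunks of at most n_pe pixels."""
    result: List[int] = []
    passes = 0
    for start in range(0, len(pixels), n_pe):
        chunk = pixels[start:start + n_pe]      # fetch one array's worth
        result.extend(op(p) for p in chunk)     # same operation on every PE
        passes += 1
    return result, passes


if __name__ == "__main__":
    line = list(range(600))                     # 600 pixels > 256 PEs
    out, passes = process_line(line, lambda p: p * 2)
    print(passes, out[:3], out[-1])             # 3 [0, 2, 4] 1198
```

The number of passes is the line length divided by the PE count, rounded up; the last pass simply leaves some PEs idle.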

As described in detail above, according to the present invention, data to be processed is held in the data holding means of the addressed processor element, so data can be transferred directly to any processor element. When processed data is output, the data held in the data holding means of the addressed processor element is likewise output. Data transfer, and thereby data processing, can therefore be made faster.

The data holding means can also function both as an input register and as an output register. Making the use of the data holding means flexible in this way allows data processing that adapts to the bit count of the data.

In addition, designating a pair of processor elements once lets the two processor elements with even and odd numbers each transfer data over their assigned data transfer bus, which reduces the number of transfers and speeds up data transfer, and therefore data processing.

Furthermore, because data holding means separate from those of the processor elements are provided, data exceeding the number of processor elements can also be handled: the processing unit is divided, each part is fetched from the separate data holding means, and the same processing is repeated, so processing remains easy even as the amount of data grows.

(First embodiment)
An embodiment of the SIMD type processor 1 according to the present invention will be described below with reference to FIGS. 1 to 4.

As shown in FIG. 1, the SIMD type processor 1 comprises a global processor 2; a processor element block 3 made up of processor elements 3a (described later), 256 in this embodiment; and an external interface 4 connected to a memory controller 5. Based on instructions from the global processor 2, the memory controller 5 directly accesses the input/output register file 31 inside the processor to move operand data from the memory 6.

First, the memory controller 5 will be described. As shown in FIG. 1, the memory controller 5 is connected to the register file 31 of the SIMD type processor 1 through the data transfer port of the external interface 4, and transfers data from the register file 31 to the memory 6 and from the memory 6 to the register file 31. The registers controlled by the memory controller 5 are mapped into the I/O space and can be read and written by outputting addresses, clocks and read/write control according to instructions from the global processor 2.

I/O addresses, data and control signals are supplied from the global processor 2 to the memory controller 5 over a bus. The global processor 2 sets commands, such as the operation method, in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code into a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to the settings. With this configuration, data in the register file 31 is input and output at the same time as the processor performs operations under instruction control.
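The configure-then-start protocol can be sketched as an I/O-mapped register model. The register addresses, the start code value and the meaning of the setting registers are all hypothetical, since the text does not give them; the point is only that setting writes are passive and the start write triggers the run.

```python
from typing import Dict, List


class MemoryControllerModel:
    """Hypothetical model of an I/O-mapped memory controller: writes to the
    operation setting registers only record the configuration; writing the
    start code to the start register launches a transfer with it."""

    START_REG = 0x00   # hypothetical I/O address of the start register
    START_CODE = 0x01  # hypothetical start code

    def __init__(self) -> None:
        self.settings: Dict[int, int] = {}
        self.runs: List[Dict[int, int]] = []  # configurations actually started

    def io_write(self, addr: int, value: int) -> None:
        if addr == self.START_REG:
            if value == self.START_CODE:
                self.runs.append(dict(self.settings))  # snapshot the settings
        else:
            self.settings[addr] = value


if __name__ == "__main__":
    mc = MemoryControllerModel()
    mc.io_write(0x10, 5)          # hypothetical setting register
    mc.io_write(0x14, 1)          # hypothetical setting register
    print(mc.runs)                # []
    mc.io_write(MemoryControllerModel.START_REG, MemoryControllerModel.START_CODE)
    print(mc.runs)                # [{16: 5, 20: 1}]
```

Once started, the controller runs on its own, which is what lets the global processor continue issuing arithmetic instructions while the register file is filled and drained.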

FIG. 2 shows the configuration of the memory controller 5 used in the present invention. The memory controller 5 comprises a write buffer unit 54 that writes data to the memory 6, a read buffer unit 55 that reads data from the memory 6, a PE control unit 52 that controls the PE register files, a RAM control unit 53 that controls the memory 6, and a sequence unit (SCU) 51.

The output port of the external interface 4 of the SIMD type processor 1 is connected to the write buffer unit 54, and the input port of the external interface 4 is connected to the read buffer unit 55.

As shown in FIG. 3, the global processor 2 includes a program RAM 21 that stores a program for controlling the global processor 2, the processor element block 3, the external interface 4 and the memory controller 5, and a sequence unit 22 that controls these four components based on the program in the program RAM 21. Specifically, the sequence unit 22 controls an arithmetic logic unit 23 (hereinafter "ALU 23"), described later, and other units provided in the global processor 2.

The sequence unit 22 also controls the register file 31 and the arithmetic array 36, described later, which make up the processor element block 3. The arithmetic array 36 includes a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34") and a register 35. The global processor 2 itself is a so-called SISD type processor, performing one arithmetic operation per arithmetic instruction.

Further, the sequence unit 22 sends operation setting data, commands and the like for data transfer to the memory controller 5. Based on these, the memory controller 5 supplies the external interface 4 with an address control signal for addressing the processor elements 3a, a read/write control signal for instructing data reads and writes to the registers 31b (described later) of the processor elements 3a, and a clock control signal for supplying a clock signal.

Of the read/write control signals, the write control signal causes data to be processed to be taken from the data bus 41d (described later) and held in a register 31b of a processor element 3a. The read control signal instructs a register 31b of a processor element 3a to place the processed data it holds onto the data bus 41d.

On receiving a command from the global processor 2, the memory controller 5 generates a signal designating the address of a processor element 3a in the processor element block 3 (hereinafter "address designation signal") and sends it from the external interface 4 over the address bus 41a to the register controllers 31a (described later) of the processor elements 3a. The memory controller 5 also supplies a signal instructing data reads and writes to the registers 31b of the processor elements 3a (hereinafter "read/write instruction signal") to the register controllers 31a over the read/write signal line 41b. In addition, the memory controller 5 supplies a clock signal from the external interface 4 to the register controllers 31a over the clock signal line 41c.
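At the signal level, each register controller 31a only needs to compare the address on the address bus with its own PE address and act on the read/write line when the clock rises. The model below is a hypothetical illustration: edge-triggered sampling, the class and helper names, and an 8-bit data bus are assumptions, and one controller stands in for one PE's controller for a given register.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegisterController:
    """Hypothetical model of one register controller 31a and its register 31b.

    It samples the address bus, the read/write line and the data bus on a
    rising clock edge and acts only when the address equals its own PE address.
    """
    pe_address: int
    value: int = 0
    last_clk: int = 0

    def tick(self, clk: int, addr: int, write: bool, data_bus: int) -> Optional[int]:
        """Returns the byte driven onto the data bus on a read, else None."""
        rising = clk == 1 and self.last_clk == 0
        self.last_clk = clk
        if not rising or addr != self.pe_address:
            return None
        if write:                       # write control: latch the bus value
            self.value = data_bus & 0xFF
            return None
        return self.value               # read control: drive the held value


def clock_pulse(ctrls: List[RegisterController], addr: int, write: bool,
                data_bus: int = 0) -> List[Optional[int]]:
    """Drive one low-then-high clock pulse to every controller."""
    for c in ctrls:
        c.tick(0, addr, write, data_bus)
    return [c.tick(1, addr, write, data_bus) for c in ctrls]


if __name__ == "__main__":
    ctrls = [RegisterController(a) for a in range(4)]
    clock_pulse(ctrls, addr=2, write=True, data_bus=0x5A)
    print([c.value for c in ctrls])                 # [0, 0, 90, 0]
    print(clock_pulse(ctrls, addr=2, write=False))  # [None, None, 90, None]
```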

As described above, the memory controller 5 also places data stored in the memory 6, which is provided outside the SIMD type processor 1, onto the data bus 41d as 8-bit parallel data in this embodiment; this width may be changed as appropriate for the data. The data bus 41d is also used when processed data held in the registers 31b is sent to the memory 6 outside the SIMD type processor 1.

The memory 6 stores both data to be processed and processed data, and it may equally be provided inside the SIMD type processor 1. Data transfer between the memory controller 5 and the memory 6 is likewise treated as 8-bit parallel in this embodiment, but the width may be changed as appropriate for the data. Other operations of the memory controller 5 are described later.

The global processor 2 further includes the ALU 23, which performs arithmetic and logic operations according to instructions from the sequence unit 22, and a data RAM 24 that stores operand data. It also includes a register group 25 for holding data to be processed and the like.

The register group 25 contains a program counter (PC) that holds the program address; general-purpose registers G0 to G3 for storing data during arithmetic processing; a stack pointer (SP) that holds the address in the data RAM used when registers are saved and restored; a link register (LS) that holds the caller's address on a subroutine call; LI and LN registers that hold the branch-source address on an IRQ and on an NMI, respectively; and a processor status register (P) that holds the processor state.

The register group 25 is connected to the register 35 of the processor element block 3, described later, and data is exchanged with the register 35 under the control of the sequence unit 22.

As shown in FIGS. 1 and 3, the processor element block 3 comprises a plurality of processor elements 3a, each of which consists of a register file 31, a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34") and a register 35. The register file 31 contains thirty-two 8-bit registers per processor element 3a, and in this embodiment 256 such processor element sets form an array. The registers in each processor element (PE) 3a are called R0, R1, R2, ..., R31. Each register file 31 has one read port and one write port toward the arithmetic array 36 and is accessed from the arithmetic array 36 over an 8-bit bidirectional read/write bus. Of the 32 registers, 24 can be accessed from outside the processor: any of them can be read or written by supplying a clock, an address and read/write control from outside.

For external access, one external port can reach one register in each processor element (PE), and the externally supplied address specifies the PE number (0 to 255). Accordingly, 24 external ports for register access are provided in total. External access handles 16-bit data for a pair consisting of an even-numbered PE and an odd-numbered PE, so that two registers are accessed simultaneously in a single access.
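The paired 16-bit external access can be sketched as follows. The function names are hypothetical, and the byte order (low byte to the even PE, high byte to the odd PE) is an assumption; the source states only that an even/odd pair forms 16-bit data.

```python
NUM_PES = 256
EXT_PORTS = 24   # one external port per externally accessible register

# regs[pe][port]: the 8-bit register that external port `port` reaches in PE `pe`
regs = [[0] * EXT_PORTS for _ in range(NUM_PES)]


def _check_pair(even_pe: int) -> None:
    if even_pe % 2 or not 0 <= even_pe < NUM_PES:
        raise ValueError("pair access must start at an even PE number 0..254")


def write_pair(port: int, even_pe: int, word: int) -> None:
    """Write one 16-bit word into the even/odd PE pair starting at even_pe.

    Assumption: the low byte goes to the even PE and the high byte to the odd PE.
    """
    _check_pair(even_pe)
    regs[even_pe][port] = word & 0xFF
    regs[even_pe + 1][port] = (word >> 8) & 0xFF


def read_pair(port: int, even_pe: int) -> int:
    """Read the even/odd PE pair back as one 16-bit word."""
    _check_pair(even_pe)
    return regs[even_pe][port] | (regs[even_pe + 1][port] << 8)


# Filling one register position in all 256 PEs through a single port
# takes 128 accesses, because each access covers two PEs.
accesses = 0
for pe in range(0, NUM_PES, 2):
    write_pair(0, pe, 0xBEEF)
    accesses += 1
```

Because each of the 24 ports reaches a different register position, the ports can in principle be driven independently; the sketch exercises only port 0.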

In this embodiment the number of processor elements 3a is 256; however, the number is not limited to this and may be changed as appropriate. The sequence unit 22 of the global processor 2 assigns addresses 0 to 255 to the processor elements 3a in order of proximity to the external interface 4.

The register file 31 of each processor element 3a includes register controllers 31a and two types of registers, 31b and 31c. In this embodiment, as shown in FIGS. 3 and 4, each processor element 3a has 24 pairs of a register controller 31a and a register 31b, plus eight registers 31c. FIG. 4 shows part of the register files 31 of two processor elements 3a, and "1PE" in FIGS. 3 and 4 denotes one processor element 3a. In this embodiment the registers 31b and 31c are treated as 8-bit registers, but the invention is not limited to this and the width may be changed as appropriate.

As shown in FIG. 4, the register controller 31a is connected to the external interface 4 via the address bus 41a, the read/write signal line 41b, and the clock signal line 41c described above. When the memory controller 5 supplies an address designation signal to the external interface 4 and it arrives via the address bus 41a, the register controller 31a decodes it. If the decoded address matches the address assigned to its own processor element 3a, the register controller 31a receives the read/write instruction signal sent from the memory controller 5 via the read/write line 41b, in synchronization with the clock that the memory controller 5 supplies through the external interface 4 on the clock line 41c. This read/write instruction signal is applied to the register 31b.

The register 31b holds externally input data that is about to be operated on by the ALU 34 (described later), or holds data processed by the ALU 34 for output to the outside; it thus functions as either a so-called input register or an output register. It can also serve the function of the register 31c described later, temporarily holding data to be processed or data already processed. In this embodiment the register 31b is treated as holding 8-bit data, but the width may be changed to suit the data. When a write instruction signal is given from the register controller 31a, the register 31b takes the data to be processed from the data bus 41d and holds it. When a read instruction signal is sent from the register controller 31a, the register 31b places the processed data it holds on the data bus 41d. This data passes from the external interface 4 to the write buffer unit 54 of the memory controller 5 and is stored from the write buffer unit 54 into the memory 6.

In this embodiment the register 31b is also connected to the multiplexer 32 via a data bus 36 that transfers 8-bit data in parallel. Data to be processed by the ALU 34, and data already processed by it, is transferred to and from the register 31b over this data bus 36. The transfer is performed, on instruction from the sequence unit 22 of the global processor 2, via a read signal line 26a and a write signal line 26b connected to the global processor 2. Specifically, when the sequence unit 22 of the global processor 2 sends a read instruction signal via the read signal line 26a, the register 31b holds the data processed by the ALU 34 that arrives over the data bus 36. When the sequence unit 22 sends a write instruction signal via the write signal line 26b, the register 31b places the data it holds for processing on the data bus 36, and this data is sent to the ALU 34 for processing.

The register 31c temporarily holds data to be processed that has been supplied from the register 31b, or processed data before it is supplied to the register 31b. Unlike the register 31b, the register 31c does not exchange data with the memory 6 via the memory controller 5.

The arithmetic array 36 includes the multiplexer 32, the shift/extension circuit 33, the 16-bit ALU 34, and the 16-bit register 35. The register 35 contains a 16-bit A register and an F register.

An operation executed by an instruction on a processor element (PE) 3a basically takes data read from the register file 31 as one input of the ALU 34 and the contents of the A register in the register 35 as the other input, and stores the result in the A register. Operations are therefore performed between the A register and registers R0 to R31 of the register file 31. A 7-to-1 multiplexer 32 is placed between the register file 31 and the arithmetic array 36; it selects as the operand the data of the PE itself (the center) or of a PE one, two, or three positions to the left or right along the PE direction. The 8-bit data from the register file 31 is shifted left by an arbitrary number of bits by the shift/extension circuit 33 before it is input to the ALU 34. In addition, an 8-bit condition register (T), not shown, enables or disables execution for each processor element 3a, so that only specific processor elements 3a can be selected to take part in an operation.
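One instruction of this kind can be sketched as a loop over the PEs. The sketch uses addition as the example ALU operation and a hypothetical function name; the sign convention for "left" and "right", and the value supplied by PEs beyond the ends of the array, are assumptions not stated in the source.

```python
def simd_add_shifted(a_regs, reg_files, n, offset, shift, t_enable):
    """Hypothetical model of one instruction 'A <- A + (R[n] << shift)' on every PE.

    a_regs[i]    : 16-bit A register of PE i (register 35)
    reg_files[i] : list of 8-bit registers R0..R31 of PE i
    offset       : -3..+3, the neighbor chosen by the 7-to-1 multiplexer 32
                   (negative = lower PE number; this direction convention is assumed)
    shift        : left shift applied by the shift/extension circuit 33
    t_enable[i]  : condition register T; False leaves PE i's A register unchanged
    """
    if not -3 <= offset <= 3:
        raise ValueError("multiplexer 32 reaches at most three PEs to each side")
    num = len(a_regs)
    result = list(a_regs)
    for i in range(num):
        if not t_enable[i]:
            continue
        src = i + offset
        # Assumption: PEs beyond either end of the array supply 0.
        operand = reg_files[src][n] if 0 <= src < num else 0
        shifted = (operand << shift) & 0xFFFF          # 16-bit data path
        result[i] = (a_regs[i] + shifted) & 0xFFFF     # 16-bit ALU 34, result back to A
    return result


# Eight PEs for illustration (the embodiment has 256); R0 of PE i holds 10 * i.
reg_files = [[10 * i] + [0] * 31 for i in range(8)]
a = [0] * 8
t = [True] * 8
t[3] = False                        # PE 3 masked off by its condition register
a = simd_add_shifted(a, reg_files, n=0, offset=-1, shift=1, t_enable=t)
# a == [0, 0, 20, 0, 60, 80, 100, 120]
```

Every enabled PE reads its left neighbor's R0, doubles it, and adds it to its own A register in the same step, which is the single-instruction, multiple-data behavior the processor is built around.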

As described above, the multiplexer 32 is connected to the data bus 36 of its own processor element 3a and also to the data buses 36 of the three processor elements 3a on each side. The multiplexer 32 selects one of these seven processor elements 3a and sends the data held in the registers 31b and 31c of the selected element to the ALU 34, or sends data processed by the ALU 34 to the registers 31b and 31c of the selected element. This allows operations that use data held in the registers 31b and 31c of neighboring processor elements 3a, enhancing the arithmetic capability of the SIMD type processor 1.

The shift/extension circuit 33 shifts the data received from the multiplexer 32 by a predetermined number of bits and sends it to the ALU 34, or shifts processed data received from the ALU 34 by a predetermined number of bits and sends it to the multiplexer 32.

The ALU 34 performs arithmetic and logic operations on the data received from the shift/extension circuit 33 and the data held in the register 35. In this embodiment the ALU 34 is treated as handling 16-bit data, but the width may be changed to suit the data. The processed data is held in the register 35 and then either transferred to the shift/extension circuit 33 or transferred to the general-purpose register 25 of the global processor 2.

Next, external access to the register file 31 of the processor elements 3a will be described with reference to FIG. 4. In FIG. 4, the external port of the external interface 4 consists of an 8-bit address, a read/write select signal that indicates a read when high and a write when low, a clock that indicates transfer timing, and 8-bit transfer data. These signals are connected to the external interface 4 of the processor, where they are retimed and buffered and converted into the internal address, read/write, clock, and data signals.

These signals are supplied to every register of the register file 31, but each processor element 3a decodes the address, and only the processor element 3a whose own address matches performs the read or write. For this purpose each processor element 3a has a register controller 31a that performs address decoding and read/write control. The input/output register 31b exchanges data with the data bus 41d, which is connected to the external interface 4, according to the read/write instruction signals (write signal W1, read signal R1) supplied on the read/write line 41b. Because the input/output register 31b also exchanges data with the arithmetic array 36, it has a second input/output port: under the write (W2) and read (R2) control signals, which the global processor 2 generates from instructions and supplies via the read signal line 26a and write signal line 26b, it transfers data to and from the data bus 37 (D2) connected to the arithmetic array 36.
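The address decode and the two ports of the input/output register can be sketched as below. All class and function names are hypothetical; the read-high/write-low polarity follows the description of FIG. 4, and the two ports are modeled as plain method calls rather than as timed bus signals.

```python
from dataclasses import dataclass

READ, WRITE = 1, 0   # read/write select line: high = read, low = write


@dataclass
class IORegister:
    """Hypothetical model of input/output register 31b with its two ports."""
    value: int = 0

    def _access(self, rw: int, data: int = 0):
        if rw == WRITE:
            self.value = data & 0xFF
            return None
        return self.value

    def external_port(self, rw: int, data: int = 0):
        """W1/R1 side: data bus 41d toward the external interface 4."""
        return self._access(rw, data)

    def array_port(self, rw: int, data: int = 0):
        """W2/R2 side: data bus 37 (D2) toward the arithmetic array 36."""
        return self._access(rw, data)


@dataclass
class RegisterController:
    """Register controller 31a: decodes the address and gates the access."""
    my_address: int
    reg: IORegister

    def on_clock(self, address: int, rw: int, data: int = 0):
        if address != self.my_address:
            return None                  # not selected: ignore this cycle
        return self.reg.external_port(rw, data)


controllers = [RegisterController(addr, IORegister()) for addr in range(256)]


def bus_cycle(address: int, rw: int, data: int = 0):
    """One clock: every controller sees the signals, only the match acts."""
    result = None
    for ctrl in controllers:
        out = ctrl.on_clock(address, rw, data)
        if out is not None:
            result = out
    return result
```

For example, `bus_cycle(42, WRITE, 0x1234)` stores `0x34` only in PE 42, and a value written through `array_port` is visible to a later external read of the same PE.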

FIG. 4 shows the configuration for only two processor elements 3a; to match the 256 processor elements 3a of FIG. 3, 256 sets of register controllers 31a and registers 31b are required. Selecting among 256 sets requires an 8-bit address, so the address width changes as the number of processor elements 3a increases or decreases. The data width is likewise 8 bits here, but it varies with the amount of data transferred at one time.
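The relationship between the number of processor elements and the address width is simply the number of bits needed to name PE numbers 0 to N−1. A minimal sketch with hypothetical helper names:

```python
def address_bits(num_pes: int) -> int:
    """Smallest address width that can name PE numbers 0..num_pes-1."""
    if num_pes < 2:
        raise ValueError("need at least two PEs to require an address")
    return (num_pes - 1).bit_length()


def data_bits(bytes_per_transfer: int) -> int:
    """Data-bus width when each transfer moves this many 8-bit register values."""
    return 8 * bytes_per_transfer


# 256 PEs need an 8-bit address; 257 to 512 PEs would need 9 bits.
widths = {n: address_bits(n) for n in (200, 256, 257, 512)}
```

A transfer of one register value needs an 8-bit data path, and moving two register values at once would need 16 bits.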

Configured as described above, the SIMD type processor 1 of this embodiment operates as follows and obtains the following advantages.

When the memory controller 5 sends data stored in the memory 6 to a processor element 3a, it designates the address assigned to that processor element 3a, so the data can be delivered to the designated element with a single clock input. For example, to transfer data only to the even-numbered processor elements 3a, only the even-numbered elements need be addressed. Since no data need be transferred to the odd-numbered processor elements 3a, data transfer, and in turn data processing, becomes faster.
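The saving from addressing only the PEs that need data can be illustrated by counting clocks. This is a minimal sketch under the assumption of one clock per addressed PE, as stated above; the function name and the comparison with a scheme that must visit every PE in order are illustrative only.

```python
NUM_PES = 256
EXT_PORTS = 24
regs = [[0] * EXT_PORTS for _ in range(NUM_PES)]   # regs[pe][port]


def addressed_transfer(targets, words, port):
    """Hypothetical model: one clock per designated PE address.

    Only the PEs listed in `targets` are visited, so skipped PEs cost nothing.
    Returns the number of clocks used.
    """
    clocks = 0
    for pe, word in zip(targets, words):
        regs[pe][port] = word & 0xFF   # the addressed PE latches the byte
        clocks += 1
    return clocks


even_pes = range(0, NUM_PES, 2)
clocks = addressed_transfer(even_pes, range(len(even_pes)), port=0)
# 128 clocks, versus 256 for a scheme that had to step through every PE in order
```

The odd-numbered PEs are never touched, so their registers keep whatever they held before.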

Conversely, when processed data held in a register 31b is transferred to the memory 6, the memory controller 5 likewise designates the address assigned to a processor element 3a, and the data held in the register 31b of the designated element can be transferred to the memory 6 with a single clock input. In this case too only the necessary data is transferred, so data transfer, and in turn data processing, becomes faster.

As described above, the 24 registers 31b provided in each processor element 3a hold data to be processed or processed data, functioning as so-called input registers or output registers. Consider, for example, an application in which the data sent from the memory controller 5 to a processor element 3a (the input data) is 56 bits, the data sent from the processor element 3a to the memory controller 5 (the output data) is 32 bits, and 80 bits of data must be held temporarily. In this case seven registers 31b can be used to hold the 56-bit input data (8 bits × 7 = 56 bits) and four registers 31b to hold the 32-bit output data (8 bits × 4 = 32 bits). Thus, whatever the individual widths of the input data and the output data, the application can be executed as long as their sum does not exceed 8 bits × 24 = 192 bits.

In this embodiment each processor element 3a also has eight registers 31c for holding data temporarily, giving 8 bits × 8 = 64 bits. When 80 bits must be held temporarily, as in this example, the registers 31c alone cannot hold the remaining 16 bits (= 80 bits − 64 bits). Even so, because the register 31b in this embodiment can also hold data temporarily, as described above, two (8 bits × 2 = 16 bits) of the 13 (= 24 − 7 − 4) unused registers 31b can be used for temporary data.
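The allocation reasoning in the two paragraphs above can be expressed as a small feasibility check. This is a sketch with hypothetical names; the policy of filling the registers 31c first and spilling the remainder into spare registers 31b follows the example in the text.

```python
import math

IO_REGS = 24     # registers 31b (externally accessible)
TEMP_REGS = 8    # registers 31c (arithmetic array only)
REG_BITS = 8


def allocate(in_bits: int, out_bits: int, temp_bits: int) -> dict:
    """Hypothetical per-PE register allocation for one application.

    Input and output data must live in registers 31b; temporary data fills
    registers 31c first and spills into whatever registers 31b are left.
    """
    n_in = math.ceil(in_bits / REG_BITS)
    n_out = math.ceil(out_bits / REG_BITS)
    n_temp = math.ceil(temp_bits / REG_BITS)
    if n_in + n_out > IO_REGS:
        raise ValueError("input + output exceed 24 x 8 = 192 bits")
    spare_31b = IO_REGS - n_in - n_out
    temp_31c = min(n_temp, TEMP_REGS)
    temp_31b = n_temp - temp_31c
    if temp_31b > spare_31b:
        raise ValueError("input + output + temporary exceed 32 x 8 = 256 bits")
    return {
        "input_31b": n_in,
        "output_31b": n_out,
        "temp_31c": temp_31c,
        "temp_31b": temp_31b,
        "unused_31b": spare_31b - temp_31b,
    }


plan = allocate(56, 32, 80)
# {'input_31b': 7, 'output_31b': 4, 'temp_31c': 8, 'temp_31b': 2, 'unused_31b': 11}
```

The two checks correspond to the two limits in the text: input plus output must fit in 192 bits, and the grand total, including temporary data, must fit in 256 bits.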

Because the registers 31b can be used this flexibly, data processing can adapt to the bit widths of the data. This broadens the range of applications the SIMD type processor 1 can execute, and hence its uses.

In the embodiment above, the external port of the external interface 4 has been described as a set of external terminals. However, even when, as in the embodiment of FIG. 5, the transfer-destination memory 6 and the memory transfer block 7 are mounted on the same chip and the external port is not brought out as external terminals, the per-processor-element address decoding and read/write control of FIG. 3 allow the memory transfer block 7 or other circuitry on the same chip to access any register of each processor element 3a.

Next, a modification of the above embodiment will be described with reference to FIG. 6. The configuration shown in FIG. 6 incorporates two of the basic configurations of FIG. 4. In the embodiment shown in FIG. 3, there are 24 input/output registers 31b in total, and the other 8 registers are operation registers 31c, accessible only from the arithmetic array 36 and used to hold data temporarily during processing. Since there are 32 registers of these two kinds in total, an application that requires, for example, 56 bits of input data, 32 bits of output data, and 80 bits of temporary storage can be realized by assigning seven input/output registers 31b as external input registers, four input/output registers 31b as external output registers, and a total of ten registers (the eight operation registers 31c and two input/output registers 31b) to temporary storage. In other words, any application whose input and output data together need at most 192 bits, and whose total including temporary storage is at most 256 bits, can be realized with a freely chosen register allocation. In a conventional processor, by contrast, the input, output, and operation registers each have a fixed bit width, so an application exceeding any one of those widths cannot be realized.

(Second Embodiment)
A second embodiment of the SIMD type processor 1 according to the present invention will be described below with reference to FIG. 7. Only the differences from the first embodiment are described; description of common points is omitted. Components identical to those of the first embodiment are given the same reference numerals.

The SIMD type processor 1 of the second embodiment is characterized in that two adjacent processor elements 3a are assigned an even number and an odd number to form a pair, and both elements of the pair are assigned the same address. It is further characterized in that an even data bus 46a for the even-numbered processor element 3a and an odd data bus 46b for the odd-numbered processor element 3a are provided for each pair. It is also characterized in that data is transferred between the memory controller 4 and the memories 5 and 6 provided outside the SIMD type processor 1 as 16 bits in parallel, rather than 8 bits as in the first embodiment. The 16-bit data consists of 8 bits given to the even-numbered processor element 3a and 8 bits given to the odd-numbered processor element 3a. This embodiment is described in detail below.

First, the global processor 2 supplies I/O addresses, data, and control signals to the memory controller 5 via a bus. The global processor 2 sets commands, such as the operating mode, in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code into a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to those settings.

On receiving an address control signal from the memory controller 5, the external interface 4 sends an address designation signal to the processor element block 3 via the address bus 41a. A pair of processor elements 3a, that is, two processor elements 3a, is thereby addressed simultaneously. The register controller 31a decodes the received address designation signal and, if the decoded address matches its own assigned address, receives the read/write instruction signal sent from the memory controller 4 via the read/write line 45a or 45b, in synchronization with the clock sent from the memory controller 5 on the clock line 41c. Specifically, the register controller 31a assigned an even number receives the read/write instruction signal from the memory controller 4 via the even read/write line 45a, while the register controller 31a assigned an odd number receives it via the odd read/write line 45b. The read/write instruction signals sent to the register controllers 31a of the two processor elements 3a in a pair may differ: when the instruction sent to the even-numbered register controller 31a is a read, the instruction sent to the odd-numbered register controller 31a may be a write. The read/write instruction signal is then applied to the register 31b.

When write instruction signals are sent from the register controllers 31a to both processor elements 3a, the register 31b of the even-numbered processor element 3a takes the data to be processed (8 bits) from the even data bus 46a and holds it, and the register 31b of the odd-numbered processor element 3a takes the data to be processed (8 bits) from the odd data bus 46b and holds it. When read instruction signals are sent to both processor elements 3a, the register 31b of the even-numbered processor element 3a sends its processed data (8 bits) to the even data bus 46a, and the register 31b of the odd-numbered processor element 3a sends its processed data (8 bits) to the odd data bus 46b.
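One clock of the paired scheme, including the case where the two PEs of a pair receive different instructions, can be sketched as follows. The function name is hypothetical, the pair address is modeled as an index 0 to 127, and the assignment of the low byte to the even data bus 46a is an assumption.

```python
NUM_PES = 256
EXT_PORTS = 24
regs = [[0] * EXT_PORTS for _ in range(NUM_PES)]   # regs[pe][port]


def paired_cycle(port, pair_addr, even_rw, odd_rw, word_in=0):
    """Hypothetical model of one clock in the second embodiment.

    pair_addr : the address shared by PEs 2*pair_addr (even) and 2*pair_addr+1 (odd)
    even_rw   : "R" or "W" on the even read/write line 45a
    odd_rw    : "R" or "W" on the odd read/write line 45b (may differ from even_rw)
    word_in   : 16-bit value from the memory side; the low byte is assumed to
                travel on the even data bus 46a and the high byte on the odd bus 46b
    Returns the 16-bit value driven back by whichever PEs were reading.
    """
    even, odd = 2 * pair_addr, 2 * pair_addr + 1
    word_out = 0
    for pe, rw, shift in ((even, even_rw, 0), (odd, odd_rw, 8)):
        if rw == "W":
            regs[pe][port] = (word_in >> shift) & 0xFF
        elif rw == "R":
            word_out |= regs[pe][port] << shift
        else:
            raise ValueError("read/write instruction must be 'R' or 'W'")
    return word_out


regs[7][0] = 0x5A
# Pair 3 = PEs 6 and 7: PE 6 is written while PE 7 is read in the same clock.
out = paired_cycle(0, 3, "W", "R", word_in=0x00C3)
# out == 0x5A00, regs[6][0] == 0xC3
```

Because every clock serves two PEs, filling one register position across all 256 PEs takes 128 clocks instead of 256.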

Thus a single address designation transfers data both to the even-numbered processor element 3a and to the odd-numbered processor element 3a. This reduces the number of transfers, making data transfer, and hence data processing, faster. Because this embodiment also addresses the processor elements 3a, as in the first embodiment, it also obtains the same advantages as the first embodiment.

Next, a modification of the above embodiment will be described with reference to FIG. 8. The configuration shown in FIG. 8 incorporates two of the basic configurations of FIG. 7. In the embodiment shown in FIG. 3, there are 24 input/output registers 31b in total, and the other 8 registers are operation registers 31c, accessible only from the arithmetic array 36 and used to hold data temporarily during processing. Since there are 32 registers of these two kinds in total, an application that requires, for example, 56 bits of input data, 32 bits of output data, and 80 bits of temporary storage can be realized by assigning seven input/output registers 31b as external input registers, four input/output registers 31b as external output registers, and a total of ten registers (the eight operation registers 31c and two input/output registers 31b) to temporary storage. In other words, any application whose input and output data together need at most 192 bits, and whose total including temporary storage is at most 256 bits, can be realized with a freely chosen register allocation.

(Third Embodiment)
A third embodiment of the SIMD type processor 1 according to the present invention will be described below with reference to FIG. 9. In the second embodiment the processor elements 3a are designated by address; in this embodiment they are designated not by address but by a pointer, that is, the invention is applied to a serial access memory scheme. Only the differences from the second embodiment are described; description of common points is omitted. Components identical to those of the second embodiment are given the same reference numerals.

First, the global processor 2 supplies I/O addresses, data, and control signals to the memory controller 5 via a bus. The global processor 2 sets commands, such as the operating mode, in several operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code into a start register (not shown) of the memory controller 5, whereupon the memory controller 5 automatically operates according to those settings. Based on the command from the global processor 2, the memory controller 5 generates a reset signal and sends it from the external interface 4 to the processor element block 3 via the reset line 47, which resets the register controllers 31a. A clock signal is then sent from the memory controller 5, via the external interface 4 and the clock line 41c, to the register controller 31a closest to the external interface 4. In synchronization with this clock, that register controller 31a receives the read/write instruction signal sent from the memory controller 5 via the read/write line 45a or 45b. This read/write instruction signal is applied both to the register 31b of the even-numbered processor element 3a and to the register 31b of the odd-numbered processor element 3a. As in the second embodiment, the read/write instruction signals sent to the register controllers 31a of the two processor elements 3a in a pair may differ.
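Pointer designation can be sketched as a cursor that the reset returns to the pair nearest the external interface. The source does not spell out how the pointer moves on; this sketch assumes that each clock services the current pair and then advances to the next one, and all names are hypothetical.

```python
NUM_PES = 256
EXT_PORTS = 24
regs = [[0] * EXT_PORTS for _ in range(NUM_PES)]   # regs[pe][port]


class SerialPairPointer:
    """Hypothetical model of pointer (serial-access) designation.

    Assumption: after the reset line 47 is asserted the pointer selects the
    PE pair nearest the external interface 4, and each clock services the
    current pair and then advances the pointer by one pair. No address is sent.
    """

    def __init__(self, port: int):
        self.port = port
        self.pair = 0

    def reset(self) -> None:
        self.pair = 0

    def clock(self, even_rw: str, odd_rw: str, word_in: int = 0) -> int:
        even, odd = 2 * self.pair, 2 * self.pair + 1
        if odd >= NUM_PES:
            raise IndexError("pointer has passed the last PE pair")
        word_out = 0
        for pe, rw, shift in ((even, even_rw, 0), (odd, odd_rw, 8)):
            if rw == "W":
                regs[pe][self.port] = (word_in >> shift) & 0xFF
            elif rw == "R":
                word_out |= regs[pe][self.port] << shift
            else:
                raise ValueError("read/write instruction must be 'R' or 'W'")
        self.pair += 1               # assumed: advance one pair per clock
        return word_out


ptr = SerialPairPointer(port=0)
ptr.reset()
for k in range(NUM_PES // 2):
    ptr.clock("W", "W", word_in=(k << 8) | k)   # both bytes carry the pair index
```

Compared with the address-based sketch of the second embodiment, no address bus is needed here, but PEs can be reached only in order, starting from the interface side.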

As a result, as in the second embodiment described above, a single pointer designation transfers data both to the even-numbered processor elements 3a and to the odd-numbered processor elements 3a. This reduces the number of data transfers and speeds up data transfer, which in turn speeds up data processing.
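The saving can be checked with simple arithmetic. The sketch below assumes that one pointer designation moves one word per bus, with the even data bus 46a and the odd data bus 46b operating together. It counts the designations needed to load every processor element. The function names are illustrative only.

```python
def transfers_needed(num_pes: int, buses: int) -> int:
    # One word per bus per pointer designation (assumption), so the
    # number of designations is the ceiling of num_pes / buses.
    return -(-num_pes // buses)


def pair_transfer(line, num_pes):
    """Hypothetical paired load: each designation fills one even and one odd PE."""
    regs = [None] * num_pes
    pointer_steps = 0
    for p in range(0, num_pes, 2):
        regs[p] = line[p]              # via even data bus 46a
        if p + 1 < num_pes:
            regs[p + 1] = line[p + 1]  # via odd data bus 46b
        pointer_steps += 1
    return regs, pointer_steps


print(transfers_needed(256, 1), transfers_needed(256, 2))  # 256 128
```

With 256 processor elements, paired even/odd transfers halve the number of pointer designations from 256 to 128.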

(Fourth embodiment)
A fourth embodiment of the SIMD type processor 1 according to the present invention is described below with reference to FIGS. 11 and 12. Only the differences from the first embodiment are described, and description of the common points is omitted. Components identical to those of the first embodiment are given the same reference numerals.

In the present embodiment, as shown in FIG. 10, line buffers 61 are provided separately, outside the processor elements 3a. FIG. 10 shows two line buffers 61, but their number may be changed as appropriate. A line buffer 61 can hold data whose arithmetic processing has finished but which is still needed to refer to the pixels above and below the target pixel. When one line contains many pixels, it can also hold the pixels in excess of what the processor elements 3a can process at once. In FIG. 10, the line buffers 61 are connected to the input/output register file 31, and part of the data held in the input/output register file 31 is sent to and held in the line buffers 61. Conversely, data held in the line buffers 61 is returned to the input/output register file 31 as needed and used as operand data for arithmetic processing. Each block of the input/output register file 31 denotes the 256 register controllers 31a and registers 31b arranged in one horizontal row in FIG. 2.
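The use of line buffers for upper and lower neighbours can be sketched as a three-tap vertical filter. In this hypothetical model, the two line buffers of FIG. 10 are a two-entry `deque` holding the previous lines, and each incoming row stands for the current register-file contents. The averaging kernel is an assumption chosen for illustration, and the per-column list comprehension stands in for one SIMD step across the processor elements.

```python
from collections import deque


def vertical_filter(image, pes=256):
    """Average each pixel with its upper and lower neighbours (illustrative kernel)."""
    if len(image[0]) > pes:
        raise ValueError("row wider than the PE array")
    line_buffers = deque(maxlen=2)  # models the two line buffers 61
    out = []
    for row in image:               # current line in the register file
        if len(line_buffers) == 2:
            upper, middle = line_buffers
            # One SIMD step: every PE computes its own column at once.
            out.append([(u + m + d) // 3 for u, m, d in zip(upper, middle, row)])
        line_buffers.append(row)    # oldest line is evicted automatically
    return out


print(vertical_filter([[0, 0], [3, 3], [6, 6], [9, 9]]))  # [[3, 3], [6, 6]]
```

Only the interior rows produce output, because the first and last rows lack a full set of vertical neighbours.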

As in the embodiment described above, a processor with 256 processor elements 3a can hold up to 256 pixels of data in its internal register file 31. For lines with more pixels, each line must be divided and held across several registers. With the external line buffers 61, data can be fetched from the line buffers 61 in units of 256 pixels. Lines longer than 256 pixels are then handled by repeating the same processing, so the number of pixels per line can be increased arbitrarily. The upper limit on the number of pixels is set by the capacity of the line buffers 61. Thus, the external line buffers 61 make it easy to process lines containing many pixels.
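The 256-pixel tiling described above reduces to a loop over chunks. The sketch below is a hypothetical model. `op` stands for the per-pixel SIMD operation, and `line_buffer_capacity` for the capacity of the line buffers 61, which bounds the line length. In hardware, each chunk would be processed by all processor elements simultaneously rather than element by element.

```python
PES = 256  # number of processor elements 3a in the embodiment


def process_long_line(line, op, line_buffer_capacity):
    """Apply `op` to a line longer than the PE array, 256 pixels per pass."""
    if len(line) > line_buffer_capacity:
        raise ValueError("line exceeds line-buffer capacity")
    result, passes = [], 0
    for start in range(0, len(line), PES):
        chunk = line[start:start + PES]      # fetched from the line buffer
        result.extend(op(x) for x in chunk)  # same operation on every chunk
        passes += 1
    return result, passes


res, passes = process_long_line(list(range(600)), lambda x: 2 * x, 1024)
print(passes)  # 3
```

A 600-pixel line needs three passes (256 + 256 + 88 pixels). The capacity check reflects the statement that the line-buffer size sets the upper limit.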

In addition, moving data held in the input/output register file 31 into the line buffers 61 frees space in the input/output register file 31 for other arithmetic processing, which makes processing more efficient. In other words, data exceeding the capacity of the registers 31b of the processor elements 3a can be processed.

The line buffers 61 can be provided outside the processor elements 3a regardless of the type of register file. For example, as shown in FIG. 11, they may be connected to an input register file and to an output register file. The input register file only acquires and holds the data to be processed, and the output register file only outputs processed data to the data bus 41d. In this case, part of the data held in the output register file is sent to and held in the line buffers 61. Data held in the line buffers 61 is sent to the input register file as needed and used as operand data for arithmetic processing.
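The output-to-input path of FIG. 11 matters when a line's result feeds the next line's computation. The sketch below illustrates this with a recursive vertical filter, y[n] = x[n] + y[n-1] // 2. The filter itself is an assumption chosen for illustration, not taken from the patent. The previous output line travels from the output register file through the line buffer back to the input side.

```python
def recursive_vertical(image):
    """y[n] = x[n] + y[n-1] // 2 per column; y[n-1] returns via the line buffer."""
    line_buffer = None             # holds the previous output line
    out = []
    for x in image:                # x: contents of the input register file
        prev = line_buffer if line_buffer is not None else [0] * len(x)
        y = [xi + pi // 2 for xi, pi in zip(x, prev)]  # output register file
        out.append(y)
        line_buffer = y            # output register file -> line buffer
    return out


print(recursive_vertical([[4, 8], [4, 8], [4, 8]]))  # [[4, 8], [6, 12], [7, 14]]
```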

A block diagram showing the SIMD type processor according to an embodiment of the present invention.
A block diagram showing the configuration of the memory controller 5 used in the present invention.
A diagram showing the internal structure of the SIMD type processor in the first embodiment of the present invention.
A diagram showing the internal structure of the processor element in the first embodiment.
A block diagram showing an embodiment in which the transfer-destination memory and the memory transfer block are mounted on the same chip.
A diagram showing the internal structure of the processor element in the first embodiment.
A diagram showing the internal structure of the processor element in the second embodiment.
A diagram showing the internal structure of the processor element in the second embodiment.
A diagram showing the internal structure of the processor element in the third embodiment.
A block diagram explaining the connection of the line buffers in the fourth embodiment.
A block diagram explaining the connection of the line buffers in the fourth embodiment.

Explanation of symbols

1 SIMD type processor
2 Global processor
4 External interface
5 Memory controller
26a Read signal
26b Write signal
31a Register controller
31b Register
34 ALU
41a Address bus
41b Read/write signal
41d Clock signal
45a Even read/write signal
45b Odd read/write signal
46a Even data bus
46b Odd data bus
47 Reset signal

Claims


A SIMD type processor comprising: a plurality of processor elements, each having computing means for performing arithmetic processing on data and data holding means for holding data to be processed by the computing means and for holding data that has been processed by the computing means; a data transfer bus connected to each of the processor elements; and designation means for designating a predetermined processor element by an address assigned to the processor elements,
wherein, when the designation means addresses a predetermined processor element, the data holding means of the addressed processor element acquires data from, or outputs data to, the data transfer bus.
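The addressing mechanism of the claim can be sketched as a simple address decode. In this hypothetical model, every processor element sees the shared data transfer bus, but only the element whose assigned address matches the designation latches a write or drives a read. The class and method names are illustrative assumptions.

```python
class ProcessorElement:
    """Hypothetical PE with an assigned address and one data-holding register."""

    def __init__(self, address: int):
        self.address = address
        self.register = None


class SimdArray:
    def __init__(self, n: int):
        self.pes = [ProcessorElement(a) for a in range(n)]

    def write(self, address: int, bus_value) -> None:
        # Designation means: only the addressed PE acquires the bus value.
        for pe in self.pes:
            if pe.address == address:
                pe.register = bus_value

    def read(self, address: int):
        # Only the addressed PE outputs its held data onto the bus.
        for pe in self.pes:
            if pe.address == address:
                return pe.register
        raise KeyError(address)


arr = SimdArray(8)
arr.write(5, 42)
print(arr.read(5))  # 42
```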