JP2011221840A


Info

Publication number: JP2011221840A
Application number: JP2010091233A
Authority: JP
Inventors: Taizo Umezaki; 太造梅崎; Motoyasu Tanaka; 基康田中
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2010-04-12
Filing date: 2010-04-12
Publication date: 2011-11-04
Anticipated expiration: 2030-04-12
Also published as: JP5513960B2

Abstract

PROBLEM TO BE SOLVED: To obtain an image processing apparatus capable of improving the accuracy of face detection even when only a small number of images can be prepared.

SOLUTION: The image processing apparatus 1 includes: a neural network 2 that has a plurality of processing layers including an input layer 10 and an output layer 12, each processing layer including a plurality of units, and that outputs from the output layer 12 an output image showing the position of a person's face included in an input image input to the input layer 10; an acquisition unit 4 that acquires a first image including a person's face; a processing unit 5 that generates a plurality of second images including the person's face by applying predetermined processing to the first image; and a setting unit 6 that inputs the plurality of second images to the input layer 10 and, through learning that uses them as teacher images, sets a weighting value W between units belonging to different processing layers.

Description

Translated from Japanese

The present invention relates to an image processing apparatus, and more particularly to an image processing apparatus using a hierarchical neural network.

Image processing apparatuses that use a hierarchical neural network to detect a human face included in an image are under development. In such an apparatus, an input image is input to the input layer of the neural network, and the output layer outputs an output image indicating the center position of a human face included in the input image (for example, an image in which the pixel corresponding to the center position of the face is displayed in white and the pixels in all other regions are displayed in black).

Face detection techniques using a hierarchical neural network are disclosed in, for example, Patent Documents 1 and 2 below.

Patent Document 1: JP 2006-31440 A
Patent Document 2: JP 2006-11978 A

A hierarchical neural network has a plurality of processing layers (an input layer, an intermediate layer, and an output layer), each including a plurality of neurons (hereinafter referred to as "units"). A weighting value indicating the coupling strength between units is set between each unit in the input layer and each unit in the intermediate layer, and likewise between each unit in the intermediate layer and each unit in the output layer. In training the neural network, a teacher image in which the position of a human face is known is input to the input layer, and the weighting values between the units are set so that an appropriate output image reflecting that face position is output from the output layer.

In general, training a neural network with a larger number of images yields better weighting values between the units and improves the accuracy of face detection. Collecting many images, however, can be difficult in practice. When only a small number of images can be prepared, the weighting values between the units are set inadequately and the accuracy of face detection decreases.

The present invention has been made in view of these circumstances, and its object is to obtain an image processing apparatus capable of improving the accuracy of face detection even when only a small number of images can be prepared.

An image processing apparatus according to a first aspect of the present invention comprises: a neural network that has a plurality of processing layers including an input layer and an output layer, each processing layer including a plurality of units, and that outputs from the output layer an output image indicating the position of a human face included in an input image input to the input layer; acquisition means for acquiring a first image including a human face; processing means for generating a plurality of second images including a human face by applying predetermined processing to the first image; and setting means for inputting the plurality of second images to the input layer and, through learning that uses the plurality of second images as teacher images, setting a weighting value between the units belonging to different processing layers.

In the image processing apparatus according to the first aspect, the processing means applies predetermined processing to the first image acquired by the acquisition means, thereby generating a plurality of second images including a human face. The setting means then inputs the plurality of second images generated by the processing means to the input layer and, through learning that uses them as teacher images, sets the weighting values between the units belonging to different processing layers. Consequently, even when only a small number of first images can be prepared, learning can be performed using the plurality of second images generated from the first images as teacher images. The weighting values between the units can therefore be set appropriately, and as a result the accuracy of face detection can be improved.

An image processing apparatus according to a second aspect of the present invention is the image processing apparatus according to the first aspect, wherein the predetermined processing includes at least one of: enlarging or reducing the image, rotating the image, changing the face position within the image, applying lens distortion, adding noise, and changing the light source.

In the image processing apparatus according to the second aspect, a plurality of second images can be generated from the first image acquired by the acquisition means by applying processing such as enlarging or reducing the image, rotating the image, changing the face position within the image, applying lens distortion, adding noise, and changing the light source.

An image processing apparatus according to a third aspect of the present invention is the image processing apparatus according to the first or second aspect, wherein the output value Y of each unit is defined, using a parameter μ (≥ 1) and the unit's own unit value X, as

[equation not reproduced]

and the setting means obtains a set of weighting values from each of a plurality of processing systems in which the parameter μ is set to mutually different values, and selects the optimum set from the plurality of sets obtained.

In the image processing apparatus according to the third aspect, the setting means obtains a set of weighting values between the units from each of a plurality of processing systems in which the parameter μ is set to mutually different values, and then selects the optimum set from the plurality of sets obtained. Compared with setting the weighting values using only a single processing system with a fixed value of μ, better weighting values can therefore be set.

An image processing apparatus according to a fourth aspect of the present invention is the image processing apparatus according to the third aspect, wherein the setting means selects, as the optimum set, the set among the plurality of sets for which the error between the teacher signal and the output signal decreases as the number of learning iterations increases and for which the human face detection rate is highest.

In the image processing apparatus according to the fourth aspect, the setting means selects, as the optimum set, the set among the plurality of sets for which the error between the teacher signal and the output signal decreases as the number of learning iterations increases and for which the human face detection rate is highest. This makes it possible to improve the accuracy of face detection.

An image processing apparatus according to a fifth aspect of the present invention is the image processing apparatus according to any one of the first to fourth aspects, further comprising storage means for storing a third image, wherein, when the degree to which the error between the teacher signal and the output signal decreases as the number of learning iterations increases falls below a predetermined value, the setting means continues the weighting value setting process by inputting the third image read from the storage means to the input layer.

In the image processing apparatus according to the fifth aspect, when the degree to which the error between the teacher signal and the output signal decreases as the number of learning iterations increases falls below a predetermined value, the setting means continues the weighting value setting process by inputting the third image read from the storage means to the input layer. Automatically adding a new teacher image once the error characteristic has converged in this way allows learning to proceed further, and as a result more appropriate weighting values can be set.

An image processing apparatus according to a sixth aspect of the present invention is the image processing apparatus according to the fifth aspect, wherein the third image is an image that does not include a human face.

In the image processing apparatus according to the sixth aspect, the third image that is newly added once the error characteristic has converged is an image that does not include a human face. Using an image that does not include a human face enables suppression learning. Moreover, for an image that does not include a human face, there is no need to supply the position of a face in the image as a teacher signal, so the processing load associated with adding a new image can be reduced.

An image processing apparatus according to a seventh aspect of the present invention comprises: a neural network that has a plurality of processing layers including an input layer and an output layer, each processing layer including a plurality of units, and that outputs from the output layer an output image indicating the position of a human face included in an input image input to the input layer; acquisition means for acquiring a first image including a human face; and setting means for inputting the first image to the input layer and, through learning that uses the first image as a teacher image, setting a weighting value between the units belonging to different processing layers, wherein the output value Y of each unit is defined, using a parameter μ (≥ 1) and the unit's own unit value X, as

[equation not reproduced]

and the setting means obtains a set of weighting values from each of a plurality of processing systems in which the parameter μ is set to mutually different values, and selects the optimum set from the plurality of sets obtained.

In the image processing apparatus according to the seventh aspect, the setting means obtains a set of weighting values between the units from each of a plurality of processing systems in which the parameter μ is set to mutually different values, and then selects the optimum set from the plurality of sets obtained. Compared with setting the weighting values using only a single processing system with a fixed value of μ, better weighting values can therefore be set.

An image processing apparatus according to an eighth aspect of the present invention comprises: a neural network that has a plurality of processing layers including an input layer and an output layer, each processing layer including a plurality of units, and that outputs from the output layer an output image indicating the position of a human face included in an input image input to the input layer; acquisition means for acquiring a first image including a human face; setting means for inputting the first image to the input layer and, through learning that uses the first image as a teacher image, setting a weighting value between the units belonging to different processing layers; and storage means for storing a second image, wherein, when the degree to which the error between the teacher signal and the output signal decreases as the number of learning iterations increases falls below a predetermined value, the setting means continues the weighting value setting process by inputting the second image read from the storage means to the input layer.

In the image processing apparatus according to the eighth aspect, when the degree to which the error between the teacher signal and the output signal decreases as the number of learning iterations increases falls below a predetermined value, the setting means continues the weighting value setting process by inputting the second image read from the storage means to the input layer. Automatically adding a new teacher image once the error characteristic has converged in this way allows learning to proceed further, and as a result more appropriate weighting values can be set.

According to the present invention, it is possible to obtain an image processing apparatus capable of improving the accuracy of face detection even when only a small number of images can be prepared.

FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the processing unit shown in FIG. 1.
FIG. 3 is a diagram showing the configuration of the neural network shown in FIG. 1.
FIG. 4 is a diagram showing an image input to the input layer and an image output from the output layer.
FIG. 5 is a diagram showing a plurality of input units and one intermediate unit extracted from the neural network.
FIG. 6 is a diagram showing the relationship between the unit value and the output value shown in FIG. 5.
FIG. 7 is a diagram showing the configuration of the neural network.
FIG. 8 is a diagram showing an example of how the error changes with the number of learning iterations.
FIG. 9 is a diagram showing an example of how the error changes with the number of learning iterations.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Elements given the same reference numerals in different drawings denote the same or corresponding elements.

FIG. 1 is a block diagram showing the configuration of an image processing apparatus 1 according to an embodiment of the present invention. As shown by the connections in FIG. 1, the image processing apparatus 1 includes a neural network 2, a storage unit 3, an acquisition unit 4, a processing unit 5, a setting unit 6, and a detection rate calculation unit 7.

When an input image S6 includes a human face, the neural network 2 outputs an output image S7 indicating the position of that face. The storage unit 3 stores a plurality of images, including both images that contain a human face and images that do not. The acquisition unit 4 reads an image stored in the storage unit 3 as image data S1 and inputs the read image to the processing unit 5 as image data S2. The processing unit 5 applies predetermined processing (described in detail later) to the image input from the acquisition unit 4 and inputs the resulting plurality of images to the setting unit 6 as image data S3. The processing unit 5 may also input the unmodified image received from the acquisition unit 4 to the setting unit 6. Based on the images input from the processing unit 5, the setting unit 6 inputs teacher images used for training the neural network 2 to the neural network 2 as image data S5. The detection rate calculation unit 7 inputs a plurality of input images S6 to the neural network 2, obtains the ratio of the number of output images S7 in which the position of the human face was correctly detected to the total number of input images S6 (the detection rate), and inputs data S4 on that detection rate to the setting unit 6.

FIG. 2 is a diagram showing the processing unit 5 shown in FIG. 1. Image data S2 is input to the processing unit 5 from the acquisition unit 4 shown in FIG. 1. The processing unit 5 applies various kinds of processing (randomization) to the image data S2, which includes a human face, and outputs a plurality of image data S31, S32, S33, ..., S3M that include the human face (corresponding to the image data S3 shown in FIG. 1). Randomization includes, for example: enlarging or reducing the image (including changing the aspect ratio or the resolution); rotating the image by an arbitrary angle; changing the face position within the image (including changing the position or size of the trimming region); arbitrarily applying lens distortion to the image; arbitrarily adding noise to the image; and arbitrarily changing the light source in the image (including changing the illuminance or the color temperature). Each of these processes can be implemented with well-known image processing techniques.
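The randomization step can be illustrated with a minimal sketch. It covers only three of the listed processes (changing the face position, changing the illuminance, and adding noise) and represents a grayscale image as a list of rows. The function names, the shift range, the gain range, and the noise level are all assumptions chosen for illustration and do not come from the patent.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Return a copy of `image` translated by (dx, dy); uncovered pixels get `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                shifted[ny][nx] = image[y][x]
    return shifted


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clamp to 0..255."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


def change_illuminance(image, gain):
    """Scale every pixel by `gain` and clamp to 0..255."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in image]


def randomize(image, face_center, count, max_shift=2, seed=0):
    """Generate `count` (image, face_center) pairs from one labelled image."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    cx, cy = face_center
    variants = []
    for _ in range(count):
        # Limit the shift so that the face center stays inside the frame.
        dx = rng.randint(max(-max_shift, -cx), min(max_shift, width - 1 - cx))
        dy = rng.randint(max(-max_shift, -cy), min(max_shift, height - 1 - cy))
        variant = shift_image(image, dx, dy)
        variant = change_illuminance(variant, rng.uniform(0.7, 1.3))  # assumed gain range
        variant = add_noise(variant, sigma=5.0, rng=rng)              # assumed noise level
        # The known face position moves with the image, so the label is updated too.
        variants.append((variant, (cx + dx, cy + dy)))
    return variants
```

Each generated image is returned together with its shifted face center, because the teacher signal depends on the face position remaining known after processing.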

FIG. 3 is a diagram showing the configuration of the neural network 2 shown in FIG. 1. The neural network 2 is a hierarchical neural network having a plurality of processing layers: an input layer 10 including a plurality of input units, an intermediate layer 11 including a plurality of intermediate units, and an output layer 12 including a plurality of output units. Each input unit receives one pixel value (for example, a luminance value) of the image data S5 input from the setting unit 6. Each output unit outputs one pixel value (for example, white or black) of the image data S7.

FIG. 4 is a diagram showing an image 20 input to the input layer 10 and an image 21 output from the output layer 12. The image 20 includes a human face. The image 21 indicates the center position of the human face included in the image 20. In the example of the image 21 shown in FIG. 4, the pixel corresponding to the center position of the face is displayed in white, and the pixels in all other regions are displayed in black.
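As a minimal sketch of the output format described for FIG. 4, the following helper builds a target image with one white pixel at the face center and black pixels elsewhere. The function name, the use of 1.0 for white and 0.0 for black, and the single-pixel marker are assumptions made for illustration.

```python
def make_teacher_image(width, height, face_center=None, white=1.0, black=0.0):
    """Build a target output image: white at the face center, black elsewhere.

    Passing face_center=None yields an all-black image, which is one way to
    represent an image that contains no face (an assumption of this sketch).
    """
    target = [[black] * width for _ in range(height)]
    if face_center is not None:
        x, y = face_center
        target[y][x] = white
    return target
```

For example, `make_teacher_image(3, 2, (1, 0))` marks the middle pixel of the top row.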

FIG. 5 is a diagram showing a plurality of input units 401, 402, ..., 40N and one intermediate unit 50 extracted from the neural network 2. The intermediate unit 50 receives the output values Y1, Y2, ..., YN from the input units 401, 402, ..., 40N. Weighting values W1, W2, ..., WN are set between the intermediate unit 50 and the input units 401, 402, ..., 40N, respectively.

The intermediate unit 50 obtains its own unit value X by executing the calculation

[equation not reproduced]

where θ is an offset value set for each intermediate unit 50.

The intermediate unit 50 then obtains and outputs its own output value Y by executing the calculation

[equation not reproduced]

where μ is a parameter set in the neural network 2. Although FIG. 5 shows the relationship between a plurality of input units and one intermediate unit, the relationship between a plurality of intermediate units and one output unit is the same.
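The expressions for X and Y are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes that the unit value is the weighted sum of the incoming output values plus the offset θ, and that the output is a sigmoid of X/μ. The sigmoid form is chosen because it matches the stated behavior that a larger μ gives a gentler slope near X = 0. The sign convention for θ and the exact form of the output function are assumptions, not the patent's confirmed equations.

```python
import math


def unit_value(inputs, weights, theta):
    """Assumed form: X = sum_i(W_i * Y_i) + theta (the sign of theta is an assumption)."""
    return sum(w * y for w, y in zip(weights, inputs)) + theta


def unit_output(x, mu):
    """Assumed form: Y = 1 / (1 + exp(-X / mu)), with mu >= 1."""
    return 1.0 / (1.0 + math.exp(-x / mu))
```

Under this assumed form, the slope of Y at X = 0 is 1/(4μ), so μ = 9 responds more gently to the same unit value than μ = 3 does.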

In the image processing apparatus 1, the position of the human face in each teacher image is known. By supplying that face position as a teacher signal, the neural network 2 is trained (that is, the weighting values W between the units are set) so that an appropriate output image (see FIG. 4) is obtained from each teacher image.

That is, for each output unit, the error E (squared error) between the teacher signal and the output signal is obtained by the calculation

[equation not reproduced]

where T is the value of the teacher signal and Y is the value of the output signal. Then, using the error E, the correction amount of the weighting value W is obtained by the calculation

[equation not reproduced]

where α is a correction coefficient.
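Because the expressions for E and for the correction amount are not reproduced in this text, the sketch below substitutes a common choice: E = (T − Y)²/2 with a gradient-descent correction ΔW = −α ∂E/∂W, applied to a single output unit. It also assumes a sigmoid output Y = 1/(1 + exp(−X/μ)). All of these forms, and the names used, are assumptions for illustration rather than the patent's confirmed formulas.

```python
import math


def sigmoid(x, mu):
    """Assumed output function: Y = 1 / (1 + exp(-X / mu))."""
    return 1.0 / (1.0 + math.exp(-x / mu))


def train_step(weights, theta, inputs, target, alpha, mu):
    """One correction of the weights of a single output unit.

    Assumes E = 0.5 * (T - Y)**2 and delta_W_i = -alpha * dE/dW_i.
    Returns the corrected weights, the corrected offset, and the error
    measured before the correction.
    """
    x = sum(w * v for w, v in zip(weights, inputs)) + theta
    y = sigmoid(x, mu)
    error = 0.5 * (target - y) ** 2
    # dE/dX = -(T - Y) * Y * (1 - Y) / mu for the assumed sigmoid.
    delta = (target - y) * y * (1.0 - y) / mu
    new_weights = [w + alpha * delta * v for w, v in zip(weights, inputs)]
    new_theta = theta + alpha * delta
    return new_weights, new_theta, error
```

Repeating `train_step` on the same example drives the error down. With these assumed forms the gradient carries a 1/μ factor, which is consistent with the source's remark that a larger μ makes learning take longer.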

FIG. 6 is a diagram showing the relationship between the unit value X and the output value Y shown in FIG. 5. How strongly the output value Y responds to the unit value X depends on the value of the parameter μ (≥ 1). The larger the value of μ, the gentler the slope of the curve near X = 0. That is, the slopes of the curves satisfy characteristic L1 > characteristic L2 > characteristic L3. In a neural network, setting μ to a larger value increases the time required for learning but improves generalization ability.

Accordingly, in the image processing apparatus 1 according to the present embodiment, the neural network 2 is provided with a plurality of processing systems having different values of the parameter μ, and learning is performed in parallel in each processing system. FIG. 7 is a diagram showing the configuration of the neural network 2. In this example, the neural network 2 includes a processing unit 30A in which μ is set to 3, a processing unit 30B in which μ is set to 9, and a processing unit 30C in which μ is set to 11. Each of the processing units 30A to 30C has the input layer 10, the intermediate layer 11, and the output layer 12 shown in FIG. 3. The processing units 30A to 30C each receive the image data S5 from the setting unit 6 and output image data S7A to S7C, respectively. The setting unit 6 shown in FIG. 1 then obtains a set of weighting values W from each of the processing units 30A to 30C and selects the optimum set from the plurality of sets obtained.

As an example, the setting unit 6 selects, as the optimum set, the set among the plurality of sets obtained for which the error E decreases as the number of learning iterations increases and for which the human face detection rate is highest.

FIG. 8 is a diagram showing an example of how the error E changes with the number of learning iterations. In the example shown in FIG. 8, for the error characteristics K2 and K3, which correspond to the processing units 30B and 30C in which μ is set to 9 and 11, the error E decreases as the number of learning iterations P increases. In contrast, for the error characteristic K1, which corresponds to the processing unit 30A in which μ is set to 3, the error E does not decrease even as P increases. The setting unit 6 therefore excludes the set of weighting values W obtained by the processing unit 30A from the selection candidates. In practice the error characteristics K1 to K3 oscillate in small increments, but these oscillations are omitted from FIG. 8 to simplify the drawing.

Next, when the number of learning iterations P reaches a predetermined value (for example, 1000), the detection rate calculation unit 7 shown in FIG. 1 inputs a plurality of input images S6 (preferably images different from the teacher images already used) to each of the processing units 30B and 30C. For each of the processing units 30B and 30C, it then obtains the ratio of the number of output images S7 in which the position of the human face was correctly detected to the total number of input images S6 (the detection rate). The detection rate calculation unit 7 inputs data S4 on the detection rates of the processing units 30B and 30C to the setting unit 6.

The setting unit 6 selects, as the optimum set, the set of weighting values W from whichever of the processing units 30B and 30C has the higher detection rate, and sets it in the neural network 2. If three or more selection candidates remain at this stage, the set with the highest detection rate among them is selected as the optimum set.
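The two-stage selection described above (exclude systems whose error does not decrease, then pick the highest detection rate) can be sketched as follows. The dictionary layout, the function names, and the window-averaging test for "error decreases" are assumptions for illustration. Averaging over a window is used here only because the source notes that the error curves oscillate, and the patent does not specify how the decrease is judged.

```python
def error_decreased(history, window):
    """Compare the mean error of the first and last `window` iterations."""
    if len(history) < 2 * window:
        raise ValueError("history is too short for the chosen window")
    head = sum(history[:window]) / window
    tail = sum(history[-window:]) / window
    return tail < head


def detection_rate(results):
    """Fraction of test images whose face position was detected correctly.

    `results` is a list of booleans, one per input image.
    """
    return sum(results) / len(results)


def select_optimum(candidates, window=3):
    """Return the candidate with the highest detection rate among those
    whose error decreased, or None if no candidate qualifies."""
    eligible = [c for c in candidates if error_decreased(c["errors"], window)]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c["detection_rate"])
```

In a usage example mirroring FIG. 8, a candidate with μ = 3 and a flat error history would be excluded, and the choice between μ = 9 and μ = 11 would depend only on their measured detection rates.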

The image processing apparatus 1 according to the present embodiment also has a function of continuing the learning of the neural network 2 by automatically adding a new teacher image when the learning of the neural network 2 (or of the processing units 30A to 30C shown in FIG. 7) has progressed and the error characteristic has converged.

FIG. 9 is a diagram showing an example of how the error E changes with the number of learning iterations. Referring to FIGS. 1 and 9, when learning has progressed and the error characteristic K has converged (that is, when the degree ΔE to which the error E decreases as the number of learning iterations P increases has fallen below a predetermined value), information to that effect is input to the acquisition unit 4, and the acquisition unit 4 reads from the storage unit 3 a new image different from the images already used as teacher images. The image that the acquisition unit 4 reads from the storage unit 3 is preferably an image that does not include a human face. This enables suppression learning, in which the network learns to recognize a non-face pattern as not being a face. The teacher signal for suppression learning is, for example, 0 for all output units. In addition, among the plurality of input images S6 used to calculate the detection rate described above, images containing a pattern that is not a face but was erroneously detected as a face may be stored in the storage unit 3 and used for suppression learning. Images containing a pattern that resembles a human face but is not a face may likewise be stored in the storage unit 3 and used for suppression learning.

The acquisition unit 4 inputs the image read from the storage unit 3 to the neural network 2 as a new teacher image, and the neural network 2 continues learning based on the new teacher image input from the acquisition unit 4.
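The convergence check and the addition of a non-face teacher image described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the list-based image pool, and the rule that a rising error does not count as convergence are hypothetical choices that do not appear in the specification. The all-zero teacher signal follows the example given for suppression learning.

```python
def error_has_converged(error_history, min_drop):
    """Return True when the latest decrease dE in the error E is below min_drop.

    Assumption: a rise in E (negative dE), as seen right after a new teacher
    image is added, is not treated as convergence.
    """
    if len(error_history) < 2:
        return False
    delta_e = error_history[-2] - error_history[-1]
    return 0.0 <= delta_e < min_drop


def suppression_teacher_signal(num_output_units):
    """Teacher signal for a non-face image: "0" for every output unit."""
    return [0.0] * num_output_units


def maybe_add_teacher_image(error_history, min_drop, unused_images,
                            teacher_set, num_output_units):
    """Move one unused non-face image into the teacher set once E has converged.

    Returns True if an image was added, False otherwise.
    """
    if not unused_images or not error_has_converged(error_history, min_drop):
        return False
    image = unused_images.pop(0)  # an image not yet used as a teacher image
    teacher_set.append((image, suppression_teacher_signal(num_output_units)))
    return True
```

In a training loop, `maybe_add_teacher_image` would be called after each learning iteration with the running list of error values; learning then simply continues on the enlarged teacher set.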

Referring to FIG. 9, a new teacher image is added at each of the points where the number of learning iterations P is P1 to P3. Immediately after a new teacher image is added, the error E rises, but as learning progresses the error E gradually decreases and eventually falls below its value before the addition. By adding a new teacher image and continuing learning each time the error characteristic K converges, the error E gradually decreases overall.

As described above, in the image processing apparatus 1 according to the present embodiment, the processing unit 5 applies predetermined processing to the image (image data S2) input from the acquisition unit 4, thereby generating a plurality of images (image data S3) each including a human face. The setting unit 6 then inputs the plurality of images (image data S3) generated by the processing unit 5 to the input layer 10 of the neural network 2 and sets the weighting values W between the units through learning that uses those images as teacher images. Therefore, even when only a small number of images can be prepared (that is, few images are stored in the storage unit 3), learning can be performed using, as teacher images, the plurality of images generated from them. The weighting values W between the units can thus be set appropriately, and as a result the accuracy of face detection can be improved.

Further, in the image processing apparatus 1 according to the present embodiment, the processing unit 5 applies processing such as image enlargement or reduction, image rotation, changing the face position within the image, adding lens distortion, adding noise, and changing the light source to the image (image data S2) input from the acquisition unit 4. This makes it possible to generate a plurality of images (image data S3) from the input image (image data S2).
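As a minimal sketch of how several teacher images might be derived from one source image, the following Python code implements three of the listed kinds of processing on a grayscale image stored as a list of rows: changing the face position (a pixel shift), adding noise, and a uniform brightness gain used here as a crude stand-in for a light source change. All function names, the 0 to 255 pixel range, and the specific shift and gain values are assumptions made for illustration. Enlargement, rotation, and lens distortion are omitted for brevity.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Move the image content by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def add_noise(image, amplitude, rng):
    """Add uniform integer noise in [-amplitude, amplitude], clamped to 0..255."""
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in image]


def change_brightness(image, gain):
    """Scale every pixel by `gain`, clamped to 0..255."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in image]


def generate_variants(image, face_center, rng):
    """Return (image, face_center) pairs derived from one source image.

    The face center is shifted together with the image so that the teacher
    signal stays aligned with the processed image.
    """
    cx, cy = face_center
    variants = []
    for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        variants.append((shift_image(image, dx, dy), (cx + dx, cy + dy)))
    variants.append((add_noise(image, 8, rng), (cx, cy)))
    for gain in (0.8, 1.2):
        variants.append((change_brightness(image, gain), (cx, cy)))
    return variants
```

Each returned pair couples a processed image with the face center that would be taught for it, so one stored image yields seven teacher images in this sketch.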

Further, in the image processing apparatus 1 according to the present embodiment, the setting unit 6 obtains a set of weighting values W between the units from each of the plurality of processing units 30A to 30C (see FIG. 7), in which the parameter μ is set to mutually different values. The optimum set is then selected from the plurality of sets obtained. Therefore, better weighting values W can be set than when the weighting values W are set by only a single processing system in which the value of the parameter μ is fixed.

Further, in the image processing apparatus 1 according to the present embodiment, the setting unit 6 selects, as the optimum set, the set among the plurality of sets for which the error E between the teacher signal and the output signal decreases as the number of learning iterations P increases and for which the human face detection rate is highest. This makes it possible to improve the accuracy of face detection.
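The selection rule described here can be illustrated with a short Python sketch. The `TrainingResult` record, the function names, and the interpretation of "the error E decreases" as "the final error is lower than the initial error" are assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class TrainingResult:
    """Outcome of one processing system trained with a fixed parameter mu."""
    mu: float
    weights: list          # the set of weighting values W (opaque here)
    error_history: list    # error E recorded over the learning iterations
    detection_rate: float  # face detection rate measured on test images


def error_decreased(error_history):
    """Assumed criterion: the final error is lower than the initial error."""
    return len(error_history) >= 2 and error_history[-1] < error_history[0]


def select_optimal_set(results):
    """Pick the result whose error decreased and whose detection rate is highest.

    Returns None when no processing system showed a decreasing error.
    """
    candidates = [r for r in results if error_decreased(r.error_history)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.detection_rate)
```

With three results corresponding to the processing units 30A to 30C, a result whose error rose during training would be excluded even if its detection rate happened to be the highest.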

Further, in the image processing apparatus 1 according to the present embodiment, when the degree of decrease in the error E between the teacher signal and the output signal accompanying an increase in the number of learning iterations P falls below a predetermined value, the setting unit 6 continues the process of setting the weighting values W by inputting a new image read from the storage unit 3 to the input layer 10 of the neural network 2. By automatically adding a new teacher image in this way when the error characteristic K has converged, learning can be advanced further, and as a result more appropriate weighting values W can be set.

Further, in the image processing apparatus 1 according to the present embodiment, the image newly added when the error characteristic K has converged is an image that does not include a human face. Using an image that does not include a human face enables suppression learning. In addition, for an image that does not include a human face, there is no need to teach the position of a face within the image as a teacher signal, so the processing load associated with adding a new image can be reduced.

DESCRIPTION OF SYMBOLS
1 Image processing apparatus
2 Neural network
3 Storage unit
4 Acquisition unit
5 Processing unit
6 Setting unit
7 Detection rate calculation unit
10 Input layer
11 Intermediate layer
12 Output layer
30A to 30C Processing units

Claims

An image processing apparatus comprising:
a neural network that has a plurality of processing layers including an input layer and an output layer, each of the processing layers including a plurality of units, and that outputs from the output layer an output image indicating the position of a human face included in an input image input to the input layer;
acquisition means for acquiring a first image including a human face;
processing means for generating a plurality of second images each including a human face by applying predetermined processing to the first image; and
setting means for setting weighting values between the units belonging to different ones of the processing layers through learning that uses the plurality of second images as teacher images, by inputting the plurality of second images to the input layer.

The image processing apparatus according to claim 1, wherein the predetermined processing includes at least one of image enlargement or reduction, image rotation, changing the face position within the image, adding lens distortion, adding noise, and changing the light source.

The image processing apparatus according to claim 1 or 2, wherein
the output value Y of each of the units is defined, using a parameter μ (≥ 1) and the unit's own unit value X, as

[formula omitted in source]

and the setting means obtains a set of the weighting values from each of a plurality of processing systems in which the parameter μ is set to mutually different values, and selects an optimum set from the plurality of sets obtained.

The image processing apparatus according to claim 3, wherein the setting means selects, as the optimum set, the set among the plurality of sets for which the error between the teacher signal and the output signal decreases as the number of learning iterations increases and for which the human face detection rate is highest.

The image processing apparatus according to any one of claims 1 to 4, further comprising storage means for storing a third image, wherein
the setting means continues the process of setting the weighting values by inputting the third image read from the storage means to the input layer when the degree of decrease in the error between the teacher signal and the output signal accompanying an increase in the number of learning iterations falls below a predetermined value.

The image processing apparatus according to claim 5, wherein the third image is an image that does not include a human face.

An image processing apparatus comprising:
a neural network that has a plurality of processing layers including an input layer and an output layer, each of the processing layers including a plurality of units, and that outputs from the output layer an output image indicating the position of a human face included in an input image input to the input layer;
acquisition means for acquiring a first image including a human face; and
setting means for setting weighting values between the units belonging to different ones of the processing layers through learning that uses the first image as a teacher image, by inputting the first image to the input layer, wherein
the output value Y of each of the units is defined, using a parameter μ (≥ 1) and the unit's own unit value X, as

[formula omitted in source]

and the setting means obtains a set of the weighting values from each of a plurality of processing systems in which the parameter μ is set to mutually different values, and selects an optimum set from the plurality of sets obtained.

An image processing apparatus comprising:
a neural network that has a plurality of processing layers including an input layer and an output layer, each of the processing layers including a plurality of units, and that outputs from the output layer an output image indicating the position of a human face included in an input image input to the input layer;
acquisition means for acquiring a first image including a human face;
setting means for setting weighting values between the units belonging to different ones of the processing layers through learning that uses the first image as a teacher image, by inputting the first image to the input layer; and
storage means for storing a second image, wherein
the setting means continues the process of setting the weighting values by inputting the second image read from the storage means to the input layer when the degree of decrease in the error between the teacher signal and the output signal accompanying an increase in the number of learning iterations falls below a predetermined value.