Technical Field
The present invention relates to the technical field of visual object detection, and more particularly to a visual object detection method using a denoising stacked autoencoder network.
Background
There are three main approaches to detecting a target in a scene image: 1) remove the background, so that what remains is the target; 2) enhance the target by template convolution and then locate it directly through image segmentation; 3) transform the image into a feature space using features that suppress the background and highlight the target, and judge whether a defect exists by machine-learning pattern recognition. Whether the background is removed, the target is located directly, or the image is converted to a feature space, a threshold is typically used to distinguish defects, background, and interference, yet a fixed threshold can hardly adapt to complex and changeable scenes. Edge detection struggles with blurred edges and low-contrast objects; morphological methods are sensitive to non-uniform illumination and contrast; template matching has difficulty accommodating target deformation and choosing suitable scale parameters. These methods therefore perform poorly under intra-class variation, inter-class similarity, and complex interference, and lack robustness in complex environments.
An autoencoder captures the most important factors that represent the input data in order to reproduce the input signal; like PCA, it finds the principal components of the original information, and these components are the features of the input signal, i.e., the output of the intermediate layer. A denoising autoencoder sets the model to reconstruct the noise-free original input from input data that contains partial noise; it finds the relationships between different dimensions of the samples and recovers lost information from partial data, so the model is robust to noise and occlusion.
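The denoising behavior described above can be sketched in a few lines. The following toy example is an illustration only, not part of the invention: it trains a one-hidden-layer autoencoder in numpy to reconstruct clean vectors from inputs whose entries are randomly masked, the masking playing the role of the partial noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 8-dim vectors near a 3-dim subspace, so a
# small hidden layer can capture the "principal components".
basis = rng.normal(size=(3, 8))
X = rng.normal(size=(200, 3)) @ basis

n_in, n_hid = 8, 5
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))  # encoder weights
b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in))  # decoder weights
b2 = np.zeros(n_in)

def forward(Xin):
    H = np.tanh(Xin @ W1 + b1)      # encode to the intermediate layer
    return H, H @ W2 + b2           # decode (linear output)

loss_before = np.mean((forward(X)[1] - X) ** 2)

lr = 0.05
for _ in range(2000):
    X_noisy = X * (rng.random(X.shape) > 0.3)    # mask ~30% of entries
    H, X_rec = forward(X_noisy)
    err = X_rec - X                              # target is the CLEAN input
    # Gradients of the mean squared reconstruction error
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)           # tanh derivative
    gW1 = X_noisy.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

loss_after = np.mean((forward(X)[1] - X) ** 2)
```

Because the target of the squared error is always the clean input, the network must learn the relations between dimensions in order to fill in the masked entries, which is exactly the anti-noise, anti-occlusion property relied on below.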
Therefore, substantial improvement over the prior art is urgently needed.
Summary of the Invention
The technical problem to be solved by the present invention is, in view of the above defects of the prior art, to provide a visual object detection method using a denoising stacked autoencoder network, comprising the steps of:
S1. Take the scene image of a training sample and the label image marking the target position as a joint input; after multi-layer encoding and decoding, obtain the same output, and take the label image in the output as the object detection result;
S2. The denoising stacked autoencoder network comprises multiple layers. The first layer serves as the input and output end and performs plain encoding and decoding without the denoising function. The intermediate layers, through repeated encoding and decoding, find the relationships between different dimensions and learn from the samples how to recover the missing label image from the scene image, thereby obtaining the label image of the scene image;
S3. The denoising stacked autoencoder network extracts features layer by layer and recovers lost information.
Step S2 further comprises the steps of:
A1. Generate the first-layer autoencoder, which encodes and decodes the input to reproduce it as output. Take the scene image of the training sample and the label image of the target position as the joint input F1; encoding O1 = s1(W1F1 + b1) produces the intermediate layer O1, which is then decoded and reconstructed as F1' = s1(W2O1 + b2). The model parameters should make the reconstructed data approximate the original vector as closely as possible, i.e. minimize Loss = ||F1 - F1'||^2.
Taking the squared difference as the loss between the reconstructed data and the original vector, and adding an L1 constraint, i.e. the sparsity requirement that most nodes in each layer be zero and only a few be nonzero, the above expression becomes Loss = ||F1 - F1'||^2 + λ||O1||_1, where λ weights the sparsity term.
With clean, noise-free original data, W2 ≈ W1^T.
A2. Treat the output of the first-layer encoder as the input of the second-layer denoising autoencoder, and likewise minimize the reconstruction error of the second-layer denoising autoencoder, so that the output reconstructed by the second layer after encoding and decoding equals the second layer's input;
A3. Generate the intermediate layers of denoising autoencoders in the same way;
A4. Stack the layers of denoising autoencoders. The input passes in turn through the first-layer encoding, the second-layer encoding, and so on up to the n-th-layer encoding, then back through the n-th-layer decoding, down to the second-layer decoding and the first-layer decoding, and the output is the same information as the input;
A5. In use, take a scene image and a blank label image as the joint input, treating the label image as information lost to noise or occlusion. Through the multi-layer denoising autoencoders, the lost information is recovered from the scene image; the last layer yields both the scene image and the label image, but only the label image is taken as the output.
Implementing the visual object detection method of the present invention using the denoising stacked autoencoder network, the network comprises multiple layers, extracts features layer by layer, and recovers lost information, which improves detection accuracy. The method can be widely applied in license plate detection, character detection in natural scenes, pedestrian detection, defect detection, and other detection applications.
Brief Description of the Drawings
The present invention will be further described below with reference to the accompanying drawings and embodiments, in which:
FIG. 1 is a flow chart of a first embodiment of the visual object detection method using a denoising stacked autoencoder network according to the present invention.
FIG. 2 is a flow chart of the sub-steps of step S2 in FIG. 1.
Detailed Description
Please refer to FIG. 1, which is a flow chart of the first embodiment of the visual object detection method using a denoising stacked autoencoder network according to the present invention. FIG. 2 is a flow chart of the sub-steps of step S2 in FIG. 1. As shown in FIG. 1 and FIG. 2, the visual object detection method using a denoising stacked autoencoder network provided by the first embodiment of the present invention comprises at least the following steps:
S1. Take the scene image of a training sample and the label image marking the target position as a joint input; after multi-layer encoding and decoding, obtain the same output, and take the label image in the output as the object detection result;
S2. The denoising stacked autoencoder network comprises multiple layers. The first layer serves as the input and output end and performs plain encoding and decoding without the denoising function. The intermediate layers, through repeated encoding and decoding, find the relationships between different dimensions and learn from the samples how to recover the missing label image from the scene image, thereby obtaining the label image of the scene image;
In a specific implementation, S2 further comprises the steps of:
A1. Generate the first-layer autoencoder, which encodes and decodes the input to reproduce it as output. Take the scene image of the training sample and the label image of the target position as the joint input F1; encoding O1 = s1(W1F1 + b1) produces the intermediate layer O1, which is then decoded and reconstructed as F1' = s1(W2O1 + b2). The model parameters should make the reconstructed data approximate the original vector as closely as possible, i.e. minimize Loss = ||F1 - F1'||^2.
Taking the squared difference as the loss between the reconstructed data and the original vector, and adding an L1 constraint, i.e. the sparsity requirement that most nodes in each layer be zero and only a few be nonzero, the above expression becomes Loss = ||F1 - F1'||^2 + λ||O1||_1, where λ weights the sparsity term.
With clean, noise-free original data, W2 ≈ W1^T.
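As a concrete reading of the loss in step A1, the squared reconstruction error plus the L1 sparsity penalty can be computed as follows. This is an illustrative sketch; the weighting factor lam is an assumed hyperparameter, not specified in the text.

```python
import numpy as np

def sparse_recon_loss(F1, F1_rec, O1, lam=1e-3):
    """Squared difference between input F1 and reconstruction F1_rec,
    plus an L1 penalty that drives most entries of the code O1 to zero."""
    recon = np.sum((F1 - F1_rec) ** 2)      # ||F1 - F1'||^2
    sparsity = lam * np.sum(np.abs(O1))     # lam * ||O1||_1
    return recon + sparsity
```

The L1 term is what enforces the constraint that most nodes in each layer are zero while only a few remain active.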
A2. Treat the output of the first-layer encoder as the input of the second-layer denoising autoencoder, and likewise minimize the reconstruction error of the second-layer denoising autoencoder, so that the output reconstructed by the second layer after encoding and decoding equals the second layer's input;
A3. Generate the intermediate layers of denoising autoencoders in the same way;
A4. Stack the layers of denoising autoencoders. The input passes in turn through the first-layer encoding, the second-layer encoding, and so on up to the n-th-layer encoding, then back through the n-th-layer decoding, down to the second-layer decoding and the first-layer decoding, and the output is the same information as the input;
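The mirrored encode/decode order of step A4 can be expressed directly. In this sketch, `encoders[i]` and `decoders[i]` are assumed placeholders for the i-th layer's trained encode and decode functions:

```python
def stacked_forward(x, encoders, decoders):
    """Apply encoders 1..n in order, then decoders n..1 in reverse,
    so that decoder i undoes encoder i and the output mirrors the input."""
    for encode in encoders:           # first-layer encoding ... n-th-layer encoding
        x = encode(x)
    for decode in reversed(decoders): # n-th-layer decoding ... first-layer decoding
        x = decode(x)
    return x
```

With well-trained layers, each decoder approximately inverts its paired encoder, so the stack reproduces its input end to end.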
A5. In use, take a scene image and a blank label image as the joint input, treating the label image as information lost to noise or occlusion. Through the multi-layer denoising autoencoders, the lost information is recovered from the scene image; the last layer yields both the scene image and the label image, but only the label image is taken as the output;
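The inference procedure of step A5 can be sketched as follows. This is illustrative only: `network` stands in for the trained stacked model, and the images are flattened as they would be for a fully connected network.

```python
import numpy as np

def detect(scene, network, label_shape):
    """Concatenate the scene image with an all-zero (blank) label image,
    run the joint vector through the stacked denoising autoencoder, and
    return only the reconstructed label half as the detection result."""
    blank = np.zeros(label_shape)                         # label treated as lost information
    joint = np.concatenate([scene.ravel(), blank.ravel()])
    out = network(joint)                                  # encode 1..n, decode n..1
    return out[scene.size:].reshape(label_shape)          # keep only the label image
```

The scene half of the output is discarded; only the recovered label image is reported as the detection result.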
S3. The denoising stacked autoencoder network extracts features layer by layer and recovers lost information. Extracting features layer by layer and recovering lost information improves the detection accuracy of the denoising stacked autoencoder network.
Through the design of the above embodiments, the present invention provides a denoising stacked autoencoder network that comprises multiple layers, extracts features layer by layer, and recovers lost information, thereby improving detection accuracy; it can be widely applied in license plate detection, character detection in natural scenes, pedestrian detection, defect detection, and other detection applications.
The present invention has been described with reference to specific embodiments, but those skilled in the art will appreciate that various changes and equivalent substitutions may be made without departing from its scope. In addition, many modifications may be made to adapt the invention to particular situations without departing from its protection scope. Therefore, the invention is not limited to the particular embodiments disclosed herein, but includes all embodiments falling within the scope of the appended claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610959069.6A | 2016-11-03 | 2016-11-03 | Visual object detection method employing de-noising stacked automatic encoder network |
| Publication Number | Publication Date |
|---|---|
| CN106529589A | 2017-03-22 |
Patent Citations

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104361328A* | 2014-11-21 | 2015-02-18 | Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences | Facial image normalization method based on self-adaptive multi-column depth model |
| CN104641644A* | 2012-05-14 | 2015-05-20 | Luca Rossato | Encoding and decoding based on mixing of sample sequences along time |
Non-Patent Citations

| Title |
|---|
| PASCAL VINCENT et al.: "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion", Journal of Machine Learning Research* |
| XUGANG LU et al.: "Speech Enhancement Based on Deep Denoising Autoencoder", https://www.researchgate.net/publication/283600839* |
| 王宪保 et al.: "基于堆叠降噪自动编码器的胶囊缺陷检测方法" (Capsule defect detection method based on stacked denoising autoencoders), 《计算机科学》 (Computer Science)* |
Cited By

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107194418A* | 2017-05-10 | 2017-09-22 | Hefei Institutes of Physical Science, Chinese Academy of Sciences | Rice aphid detection method based on adversarial feature learning |
| CN107194418B* | 2017-05-10 | 2021-09-28 | Hefei Institutes of Physical Science, Chinese Academy of Sciences | Rice aphid detection method based on adversarial feature learning |
| US10726525B2 | 2017-09-26 | 2020-07-28 | Samsung Electronics Co., Ltd. | Image denoising neural network architecture and method of training the same |
| CN109886210A* | 2019-02-25 | 2019-06-14 | Baidu Online Network Technology (Beijing) Co., Ltd. | Traffic image recognition method and apparatus, computer device, and medium |
| CN109886210B* | 2019-02-25 | 2022-07-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Traffic image recognition method and apparatus, computer device, and medium |
| CN112861625A* | 2021-01-05 | 2021-05-28 | Shenzhen Technology University | Method for determining a stacked denoising autoencoder model |
| CN112861625B* | 2021-01-05 | 2023-07-04 | Shenzhen Technology University | Method for determining a stacked denoising autoencoder model |
| WO2023050433A1* | 2021-09-30 | 2023-04-06 | Zhejiang University | Video encoding and decoding method, encoder, decoder and storage medium |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Address after: 325000 Science Park, Dongfang South Road, Ouhai District, Wenzhou, Zhejiang. Applicant after: Wenzhou University. Address before: 325000 Wenzhou Higher Education Park (Chashan Town, Ouhai District), Wenzhou, Zhejiang. Applicant before: Wenzhou University. |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2017-03-22 |