

Technical Field
The present invention relates to the technical field of image recognition, and in particular to a three-dimensional (3D) target recognition method based on a curvature-feature recurrent neural network.
Background Art
3D target recognition refers to the process of automatically detecting, locating, and recognizing a specified target pattern in an arbitrarily given two-dimensional (2D) image scene, and it is one of the key problems in computer vision research. With the continuous development of computer vision technology, 3D target recognition is increasingly widely applied in fields such as industrial inspection, augmented reality, and medical imaging. However, owing to factors such as illumination changes, image noise, and target occlusion, it is difficult to extract the common features of a 3D target and its 2D images under different viewing angles, and this has become an urgent problem for 3D target recognition.
The key to 3D target recognition is to find a 2D representation of the 3D target model and to extract the common features of the 3D target and the 2D image. Existing 3D target recognition methods mainly include methods based on manually marked points, methods based on geometric features, and methods based on deep learning. Methods based on manually marked points require manual initialization of the feature points in the 2D image; because of this human interaction, such methods are not repeatable. Methods based on geometric features achieve recognition by extracting information such as the medial-axis skeleton and contour shape of the target, but they perform poorly when the image is noisy. Methods based on deep learning use deep neural networks to fuse low-level image features into high-level features carrying semantic information and can therefore handle the image noise of the 2D images used in 3D target recognition well; however, the commonly used deep convolutional neural networks cannot express sequence properties and thus cannot effectively characterize a 3D target under different viewing angles. Therefore, there is an urgent need for an automated 3D target recognition method that is robust to image noise in images from different viewing angles.
Summary of the Invention
The purpose of the present invention is to characterize the features of a 3D target under different viewing angles more effectively, to reduce the sensitivity of the feature extraction process to image noise, and to improve the accuracy of 3D target recognition. To this end, the present invention proposes a 3D target recognition method based on a curvature-feature recurrent neural network.
The technical solution adopted by the present invention to achieve the above object is a 3D target recognition method based on a curvature-feature recurrent neural network, comprising the following steps:
Step 1: Compute the joint curvature of the target 3D model and extract the local maxima of the joint curvature to form the curvature sketch RSketch of the 3D model; then apply a perspective projection transformation to the curvature sketch RSketch to generate 360° two-dimensional images Pm, where m = 1, 2, ..., 360.
Step 2: Input the 360° two-dimensional images into a BRNN and use multi-angle feature learning to compute their sequence attributes under multiple viewing angles; at the softmax layer, use the softmax function to obtain the recognition category for which the correct-classification probability of the sequence attribute is largest. The BRNN is a bidirectional recurrent neural network.
Computing the joint curvature of the target 3D model comprises the following steps:
Let n be the normal vector at a given point (x, y, z) on the target 3D model R, and let p and q be the gradient components derived from n; px, py, qx, and qy then denote the partial derivatives of p and q with respect to x and y.
For each point of the 3D model R, compute the average Gaussian curvature and the average mean curvature over the 3×3 neighborhood around its normal vector.
Here the average curvature matrix and the averaged values of p, q, px, py, qx, and qy are computed over the 3×3 neighborhood, and trace(·) denotes the trace of a matrix.
The joint curvature of the target 3D model R is then defined from the average Gaussian curvature and the average mean curvature.
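The formulas for these curvature quantities appear only as drawings in the original publication and are not reproduced in the text above. For reference, the following LaTeX sketch gives one standard gradient-space formulation that is consistent with the surrounding definitions (p and q derived from the surface normal, px, py, qx, qy their partial derivatives, GK = |C|, and MK obtained via trace(·)); the assumption that a normal n = (n_x, n_y, n_z) yields p = -n_x/n_z and q = -n_y/n_z, the specific matrix form of C, and the factor 1/2 in MK are introduced here for illustration and may differ from the patent's exact drawings.

p = -\frac{n_x}{n_z}, \quad q = -\frac{n_y}{n_z}, \qquad
p_x = \frac{\partial p}{\partial x}, \quad p_y = \frac{\partial p}{\partial y}, \quad
q_x = \frac{\partial q}{\partial x}, \quad q_y = \frac{\partial q}{\partial y}

C = \frac{1}{(1+p^{2}+q^{2})^{3/2}}
\begin{pmatrix} 1+q^{2} & -pq \\ -pq & 1+p^{2} \end{pmatrix}
\begin{pmatrix} p_x & p_y \\ q_x & q_y \end{pmatrix},
\qquad GK = |C|, \qquad MK = \tfrac{1}{2}\,\operatorname{trace}(C)

\overline{GK} = |\bar{C}|, \qquad \overline{MK} = \tfrac{1}{2}\,\operatorname{trace}(\bar{C}),

where \bar{C} denotes C evaluated with the 3×3-neighborhood averages of p, q, px, py, qx, and qy.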
Inputting the 360° two-dimensional images into the BRNN and using multi-angle feature learning to compute their sequence attributes under multiple viewing angles comprises the following steps:
Obtain the one-dimensional feature sequence TS of the 360° two-dimensional images, s = 1, 2, ..., 360. The output of the feature sequence TS at the i-th BRNN layer is split into a forward output and a backward output: the forward output is related to the forward output of the previous sequence element in the same BRNN layer and to the forward and backward outputs of the previous BRNN layer, while the backward output is related to the backward output of the next sequence element in the same BRNN layer and to the forward and backward outputs of the previous BRNN layer. Each is obtained by applying the tanh neuron activation function to these inputs weighted by the corresponding weight matrices, plus a bias b.
The total output Os of the feature sequence TS from the BRNN, i.e., the input Ifc to the fully connected layer fc, is obtained by combining the forward output and the backward output weighted by their respective connection weights at the fully connected layer.
Therefore, the accumulated output of the feature sequence TS at the fully connected layer fc is the sequence attribute.
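The BRNN relations referred to above are likewise given only as drawings in the original. The following is a sketch of one standard bidirectional-RNN formulation that matches the verbal description (tanh activation, per-connection weight matrices, bias b, and a weighted fusion of forward and backward outputs feeding the fully connected layer); the symbol names h, W1..W6 and the exact arrangement of terms are assumptions made here for illustration.

\overrightarrow{h}^{\,i}_{s} = \tanh\!\left(W_{1}\,\overrightarrow{h}^{\,i}_{s-1} + W_{2}\,\overrightarrow{h}^{\,i-1}_{s} + W_{3}\,\overleftarrow{h}^{\,i-1}_{s} + b\right),
\qquad
\overleftarrow{h}^{\,i}_{s} = \tanh\!\left(W_{4}\,\overleftarrow{h}^{\,i}_{s+1} + W_{5}\,\overrightarrow{h}^{\,i-1}_{s} + W_{6}\,\overleftarrow{h}^{\,i-1}_{s} + b\right)

O_{s} = I_{fc} = W_{\rightarrow}\,\overrightarrow{h}^{\,i}_{s} + W_{\leftarrow}\,\overleftarrow{h}^{\,i}_{s},
\qquad
A = \sum_{s=1}^{360} O_{s}

where \overrightarrow{h}^{\,i}_{s} and \overleftarrow{h}^{\,i}_{s} are the forward and backward outputs of layer i at view s, and A is the accumulated output at the fully connected layer fc, i.e., the sequence attribute.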
Using the softmax function at the softmax layer to obtain the recognition category for which the correct-classification probability of the sequence attribute is largest comprises the following steps:
At the softmax layer, the softmax function is used to compute the probability p(Ck) that the recognition result belongs to the k-th class,
where C is the total number of recognition categories and Ak is the accumulated output, at the fully connected layer fc, of the sequence attribute of the k-th class of 3D target.
The maximum-likelihood estimation method is then used to find the recognition category k at which the loss function is minimal, i.e., at which the correct probability p(Ck) is largest; the loss function is written in terms of the Kronecker delta function δ(·), where r denotes the correct recognition category of the feature sequence TS.
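For concreteness, the softmax probability, the Kronecker-delta loss, and the selected category described above admit the following standard form; this is a reconstruction from the verbal description rather than a copy of the patent drawings.

p(C_k) = \frac{e^{A_k}}{\sum_{c=1}^{C} e^{A_c}},
\qquad
Loss = -\sum_{k=1}^{C} \delta(k, r)\,\log p(C_k),
\qquad
k^{*} = \arg\max_{k}\, p(C_k),

so that minimizing the loss under maximum-likelihood estimation corresponds to maximizing the probability assigned to the correct category r.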
The present invention has the following beneficial effects and advantages:
1. The joint-curvature sketch feature extraction method designed in the present invention can automatically extract the common features of a 3D model and a 2D image, and the local average Gaussian curvature and local average mean curvature used in the joint curvature can effectively handle image noise.
2. The multi-angle feature-learning bidirectional recurrent neural network designed in the present invention can consider the feature sequences of a 3D model under multiple viewing angles simultaneously and can accurately recognize the 3D target in a 2D image taken from an arbitrary angle.
Description of the Drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a framework diagram of the multi-angle feature-learning bidirectional recurrent neural network used in the method of the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The present invention consists mainly of two parts; Fig. 1 shows the flow chart of the method, and the specific implementation process is described below.
Step 1: Compute the joint curvature of the target 3D model, form the curvature sketch of the 3D model by extracting the local maxima of the joint curvature, and generate 360° two-dimensional images by perspective projection transformation as the input for training the recurrent neural network.
Step 1.1: Let n be the normal vector at a given point (x, y, z) on the 3D model, and let p and q be the gradient components derived from n, with px, py, qx, and qy denoting the partial derivatives of p and q with respect to x and y. The Gaussian curvature GK of the 3D model is then
GK = |C|,
where C is the curvature matrix, the mean curvature MK of the 3D model is derived from the trace of C, and trace(·) denotes the trace of a matrix. To eliminate the influence of noise, the present invention computes, for each point of the 3D model, the average Gaussian curvature and the average mean curvature over the 3×3 neighborhood around its normal vector,
where the average curvature matrix and the averaged values of p, q, px, py, qx, and qy are computed over the 3×3 neighborhood. From these averaged quantities, the joint curvature of the 3D model is defined.
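A minimal numerical sketch of Step 1.1 follows, assuming the model surface is locally represented as a depth map z(x, y) whose gradients p, q and their derivatives are estimated by finite differences; the function names, the use of scipy for the 3×3 averaging, and the final combination of the two averaged curvatures into a single joint-curvature map are illustrative assumptions rather than the patent's exact implementation.

import numpy as np
from scipy.ndimage import uniform_filter

def average_curvatures(z):
    """Average Gaussian and mean curvature over 3x3 neighborhoods of a depth map z."""
    p, q = np.gradient(z)                          # surface gradients (finite differences)
    p_x, p_y = np.gradient(p)
    q_x, q_y = np.gradient(q)
    avg = lambda a: uniform_filter(a, size=3)      # 3x3 neighborhood average
    p, q, p_x, p_y, q_x, q_y = map(avg, (p, q, p_x, p_y, q_x, q_y))
    w = 1.0 + p**2 + q**2
    gk = (p_x * q_y - p_y * q_x) / w**2            # average Gaussian curvature
    mk = ((1 + q**2) * p_x - p * q * (p_y + q_x)
          + (1 + p**2) * q_y) / (2 * w**1.5)       # average mean curvature
    return gk, mk

def joint_curvature(z):
    """Hypothetical combination of the averaged curvatures into one joint-curvature map."""
    gk, mk = average_curvatures(z)
    return np.sqrt(gk**2 + mk**2)                  # illustrative choice only

On a full 3D mesh rather than a depth map, the same quantities would be estimated per vertex from a local tangent-plane parameterization, but the 3×3 averaging step plays the same noise-suppressing role.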
Step 1.2: Extract the local maximum points of the joint curvature to form the curvature sketch RSketch of the 3D model R. Through perspective projection transformation, generate the 360° two-dimensional projection images Pm, m = 1, 2, ..., 360, of the 3D curvature sketch RSketch as the input to the BRNN.
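The sketch below illustrates Step 1.2 under the assumption that the curvature sketch RSketch is available as an N×3 array of 3D points that is rotated about the vertical axis in 1° increments and imaged with a simple pinhole (perspective) camera; the focal length, viewing distance, image size, and rotation axis are placeholder choices for illustration only.

import numpy as np

def project_views(rsketch_points, focal=500.0, distance=3.0, size=224):
    """Perspective-project the curvature-sketch points into 360 views, one per degree."""
    views = []
    for m in range(360):
        a = np.deg2rad(m)
        rot = np.array([[ np.cos(a), 0.0, np.sin(a)],   # rotation about the vertical (y) axis
                        [ 0.0,       1.0, 0.0      ],
                        [-np.sin(a), 0.0, np.cos(a)]])
        cam = rsketch_points @ rot.T
        cam[:, 2] += distance                            # place the model in front of the camera
        uv = focal * cam[:, :2] / cam[:, 2:3]            # pinhole perspective projection
        img = np.zeros((size, size), dtype=np.float32)
        px = np.clip((uv + size / 2).astype(int), 0, size - 1)
        img[px[:, 1], px[:, 0]] = 1.0                    # binary projection image P_m
        views.append(img)
    return views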
Step 2: The present invention adopts a deep recurrent neural network (DRNN) as the curvature-feature recognition method; the DRNN framework is shown in Fig. 2. The multi-angle feature-learning BRNN is used to characterize the sequence attributes of the 3D model under multiple viewing angles, and the softmax function is used at the softmax layer to obtain the recognition category with the largest correct-classification probability.
Step 2.1: To characterize the sequential nature of the features of the 3D model under different viewing angles, define the one-dimensional feature sequence of the 3D model under multiple viewing angles as TS, s = 1, 2, ..., 360. The output of the feature sequence TS at the i-th BRNN layer is split into a forward output and a backward output: the forward output is related to the forward output of the previous sequence element in the same BRNN layer and to the forward and backward outputs of the previous BRNN layer, while the backward output is related to the backward output of the next sequence element in the same BRNN layer and to the forward and backward outputs of the previous BRNN layer; each is obtained by applying the tanh neuron activation function to these inputs weighted by the corresponding weight matrices, plus a bias b. The total output Os of the feature sequence TS from the BRNN, i.e., the input Ifc to the fully connected layer fc, is then obtained by combining the forward output and the backward output weighted by their respective connection weights at the fully connected layer.
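Step 2.1 can be sketched in plain numpy as below so that the forward and backward recurrences over the 360 views and the weighted fusion at the fully connected layer are explicit; the weight shapes, random initialization, and single-layer setup are assumptions for illustration, and training of the weights is omitted.

import numpy as np

class BRNNLayer:
    """One bidirectional recurrent layer over the 360-view feature sequence."""
    def __init__(self, d_in, d_hid, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.Wf = scale * rng.standard_normal((d_hid, d_hid))   # forward recurrent weights
        self.Uf = scale * rng.standard_normal((d_hid, d_in))    # forward input weights
        self.Wb = scale * rng.standard_normal((d_hid, d_hid))   # backward recurrent weights
        self.Ub = scale * rng.standard_normal((d_hid, d_in))    # backward input weights
        self.b = np.zeros(d_hid)

    def forward(self, seq):                                     # seq: (360, d_in)
        n, d_hid = len(seq), self.b.size
        hf = np.zeros((n, d_hid))
        hb = np.zeros((n, d_hid))
        for s in range(n):                                      # forward pass, views 1..360
            prev = hf[s - 1] if s > 0 else np.zeros(d_hid)
            hf[s] = np.tanh(self.Wf @ prev + self.Uf @ seq[s] + self.b)
        for s in reversed(range(n)):                            # backward pass, views 360..1
            nxt = hb[s + 1] if s < n - 1 else np.zeros(d_hid)
            hb[s] = np.tanh(self.Wb @ nxt + self.Ub @ seq[s] + self.b)
        return hf, hb

def sequence_attribute(seq, layer, w_fwd, w_bwd):
    """Fuse forward/backward outputs into O_s (= I_fc) and accumulate over all views."""
    hf, hb = layer.forward(seq)
    o = hf @ w_fwd.T + hb @ w_bwd.T                             # total output O_s per view
    return o.sum(axis=0)                                        # accumulated output (sequence attribute)

A deep BRNN as in Fig. 2 would stack several such layers, feeding the per-view outputs of one layer as the input sequence of the next.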
Step 2.2: The accumulated output of the feature sequence TS at the fully connected layer fc is the sequence attribute. At the softmax layer, the softmax function is used to compute the probability p(Ck) that the recognition result belongs to the k-th class,
where C is the total number of recognition categories and Ak is the accumulated output, at the fully connected layer fc, of the sequence attribute of the k-th class of 3D target. The maximum-likelihood estimation method is then used to find the recognition category k at which the loss function is minimal, i.e., at which the correct probability p(Ck) is largest; the loss function is written in terms of the Kronecker delta function δ(·), where r denotes the correct recognition category of the feature sequence TS.
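Finally, Step 2.2 reduces to the familiar softmax and negative-log-likelihood computation sketched below, where A is assumed to be the vector of accumulated fully connected outputs (one Ak per class) and r the index of the ground-truth class; the Kronecker delta appears as a one-hot vector, and the snippet illustrates inference and loss evaluation only, not training.

import numpy as np

def softmax(a):
    e = np.exp(a - a.max())                 # subtract the maximum for numerical stability
    return e / e.sum()

def classify(a, r=None):
    """Return the predicted class k*; if the true class r is given, also return the loss."""
    p = softmax(a)                          # p[k] = p(C_k)
    k_star = int(np.argmax(p))              # category with the largest probability
    if r is None:
        return k_star, None
    delta = np.eye(len(a))[r]               # Kronecker delta d(k, r) as a one-hot vector
    loss = -np.sum(delta * np.log(p))       # negative log-likelihood (cross-entropy)
    return k_star, loss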