CN110210480B


Info

Publication number: CN110210480B
Application number: CN201910488332.1A
Authority: CN
Inventors: 万昭祎; 刘毅博; 谢锋明; 姚聪; 杨沐
Original assignee: Beijing Kuangshi Technology Co Ltd
Current assignee: Yuanli Jinzhi (Chongqing) Technology Co.,Ltd.
Priority date: 2019-06-05
Filing date: 2019-06-05
Publication date: 2021-08-10
Anticipated expiration: 2039-06-05
Also published as: CN110210480A

Abstract

Translated from Chinese

The present invention provides a character recognition method, apparatus, electronic device and computer-readable storage medium. The method includes: acquiring an image to be detected, and extracting feature information of the image to be detected by using a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information. The first feature information includes at least one of the following: a first character distribution probability, indicating the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first text sequence; a first path transition probability, indicating the path selection probability in the height dimension of the first two-dimensional spatial feature distribution; and a first initial path probability, indicating the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path. The first text sequence in the image to be detected is then determined by using the first feature information of the image to be detected. The present application alleviates the technical problem of low sequence prediction accuracy caused by attention drift in existing image-based sequence recognition methods.

Description


Character recognition method, apparatus, electronic device and computer-readable storage medium

Technical Field

The present invention relates to the technical field of image processing, and in particular to a character recognition method, apparatus, electronic device and computer-readable storage medium.

Background

The recognition of text in natural scenes, hereinafter referred to as scene text recognition, is the technology of using computer algorithms to recognize the content of text in natural scene images. It is widely used in fields such as autonomous driving, assistance for the visually impaired, and identity authentication. Unlike text recognition in scanned documents, text recognition in natural scenes faces greater challenges: complex natural backgrounds, uncertain text orientation and arrangement, large color variations, and so on. These factors make high recognition accuracy much harder to achieve, and implementation much more difficult, than for scanned documents.

In the prior art, the widely used image-based sequence recognition methods are attention-based models. These models typically use a recurrent neural network with an attention mechanism to produce sequence predictions. Specifically, at each time step the attention mechanism focuses on one character region and produces one character prediction. A model built on this framework is in essence also a per-frame output algorithm, in which the attention mechanism provides the alignment between the feature representation and the sequence prediction. However, such models usually suffer from serious attention drift: because the output and hidden state of the previous step directly take part in computing the next prediction, an erroneous prediction early in the sequence often shifts the subsequent attention regions and thus causes a run of consecutive recognition errors.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a character recognition method, apparatus, electronic device and computer-readable storage medium, so as to alleviate the technical problem of low sequence prediction accuracy caused by attention drift in existing image-based sequence recognition methods.

In a first aspect, an embodiment of the present invention provides a character recognition method, including: acquiring an image to be detected, and extracting feature information of the image to be detected by using a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information; wherein the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability and a first initial path probability; the first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first text sequence; the first path transition probability represents the path selection probability in the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path, the first path being a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first text sequence; and determining the first text sequence in the image to be detected by using the first feature information of the image to be detected.

Further, the fully convolutional neural network includes: a first convolutional network, a pyramid pooling module and a second convolutional network.

Further, the first convolutional network is a residual convolutional neural network that includes a plurality of convolution modules, and some of the convolution modules contain dilated convolution layers.

Further, extracting the feature information of the image to be detected by using the fully convolutional neural network trained with the two-dimensional CTC model, to obtain the first feature information, includes: performing feature extraction on the image to be detected by using the first convolutional network, to obtain first convolution feature information; performing pooling on the first convolution feature information by using the pyramid pooling module, to obtain pooled features at different scales, and concatenating the pooled features at different scales to obtain pooled feature information; and performing convolution on the pooled feature information by using the second convolutional network, to obtain the first feature information of the image to be detected.
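The pooling-and-concatenation step described above can be illustrated with a minimal pure-Python sketch. It is not the patented network: the bin sizes `(1, 2)`, the single-channel input, the nearest-neighbour upsampling used to bring every scale back to the input size before concatenation, and the function names are all assumptions made for illustration.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool an H x W map (list of lists) into a bins x bins grid."""
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        r0 = (i * H) // bins                 # floor of the window start
        r1 = -(-((i + 1) * H) // bins)       # ceil of the window end
        row = []
        for j in range(bins):
            c0 = (j * W) // bins
            c1 = -(-((j + 1) * W) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def upsample_nearest(grid, H, W):
    """Resize a small grid back to H x W by nearest-neighbour lookup."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r * rows) // H][(c * cols) // W] for c in range(W)]
            for r in range(H)]


def pyramid_pool(fmap, bin_sizes=(1, 2)):
    """Pool at several scales and stack the results as channels (assumed layout)."""
    H, W = len(fmap), len(fmap[0])
    return [upsample_nearest(adaptive_avg_pool(fmap, b), H, W) for b in bin_sizes]


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
channels = pyramid_pool(fmap)  # two channels, each 2 x 4
```

In a real network each scale would carry many channels and the concatenated tensor would be passed to the second convolutional network; the sketch only shows how features at different scales end up on a common grid.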

Further, the method also includes: acquiring a training sample image; extracting feature information of the training sample image by using an initial fully convolutional neural network, to obtain second feature information; wherein the second feature information includes at least one of the following: a second character distribution probability, a second path transition probability and a second initial path probability; the second character distribution probability is the probability that each feature point in a second two-dimensional spatial feature distribution of the training sample image belongs to a character in a second text sequence; the second path transition probability represents the path selection probability in the height dimension of the second two-dimensional spatial feature distribution; the second initial path probability represents the probability that each feature point of the second two-dimensional spatial feature distribution is the starting feature point of a second path, the second path being a valid path predicted in the second two-dimensional spatial feature distribution that can be aligned to the second text sequence; processing the second feature information of the training sample image by using the two-dimensional CTC model, to obtain a target loss function; and training the initial fully convolutional neural network with the target loss function, to obtain the fully convolutional neural network.

Further, processing the second feature information of the training sample image by using the two-dimensional CTC model, to obtain the target loss function, includes: processing the second feature information by using the two-dimensional CTC model, to obtain the conditional probability of the second path; and determining the target loss function based on the conditional probability of the second path.

Further, calculating on the second feature information by using the two-dimensional CTC model, to obtain the conditional probability of the second path, includes: calculating a target conditional probability β_{s,h,w} by combining a dynamic programming algorithm with the information in the second feature information, where β_{s,h,w} represents the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second text sequence at position (h, w) of the second two-dimensional spatial feature distribution, the second two-dimensional spatial feature distribution being the spatial feature distribution of the training sample image; and calculating the conditional probability of the second path by using the target conditional probability β_{s,h,w}.

Further, calculating the target conditional probability by combining the dynamic programming algorithm with the information in the second feature information includes: calculating the target conditional probability β_{s,h,w} by using a target formula, the target formula being expressed as:

where Ψ_{j,w-1,h} represents the second path transition probability, that is, the transition probability from feature point (j, w-1) to feature point (h, w) in the second two-dimensional spatial feature distribution; j represents one height coordinate in the second two-dimensional spatial feature distribution; Y^* and X' respectively represent the annotated text sequence obtained by extending the second text sequence and the second two-dimensional spatial feature distribution; s represents the index of a character in Y^*; h represents another height coordinate in the second two-dimensional spatial feature distribution; w represents the width coordinate in the second two-dimensional spatial feature distribution; h ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1], where H represents the height and W the width of the second two-dimensional spatial feature distribution;

the character probability term in the target formula belongs to the second character distribution probability and represents the probability that the feature point at position (h, w) belongs to a character in the second text sequence; and Ψ_{j,0,h} is calculated from the second initial path probability Ψ_{j,-1,h}.
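The dynamic-programming idea can be sketched in Python as follows. The target formula itself is not reproduced in this text, so the recursion below is an assumption rather than the patented formula: it is a standard CTC forward pass over the blank-extended sequence Y^*, with an extra sum over the previous height j weighted by the transition probability. The names `two_d_ctc_probability`, `char_prob`, `trans` and `init`, the blank index 0 and the 0-based indexing are all hypothetical.

```python
BLANK = 0  # assumed index of the CTC blank class


def extend_with_blanks(label):
    """Y -> Y*: insert a blank before, between and after the characters."""
    ext = [BLANK]
    for ch in label:
        ext += [ch, BLANK]
    return ext


def two_d_ctc_probability(char_prob, trans, init, label):
    """Return P(Y|X) from a forward recursion over (s, h, w).

    char_prob[h][w][k]: probability that feature point (h, w) emits class k
    trans[w][j][h]:     probability of moving from (j, w) to (h, w + 1)
    init[h]:            probability that a path starts at height h in column 0
    """
    H, W = len(char_prob), len(char_prob[0])
    ext = extend_with_blanks(label)
    S = len(ext)
    # beta[s][h] holds the forward values for the current column w.
    beta = [[0.0] * H for _ in range(S)]
    for s in range(min(2, S)):          # a path may start on blank or first char
        for h in range(H):
            beta[s][h] = init[h] * char_prob[h][0][ext[s]]
    for w in range(1, W):
        new = [[0.0] * H for _ in range(S)]
        for s in range(S):
            for h in range(H):
                total = 0.0
                for j in range(H):      # sum over the previous height
                    reach = beta[s][j]
                    if s >= 1:
                        reach += beta[s - 1][j]
                    # Skipping a blank is allowed only between different characters.
                    if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                        reach += beta[s - 2][j]
                    total += trans[w - 1][j][h] * reach
                new[s][h] = char_prob[h][w][ext[s]] * total
        beta = new
    # Valid paths end on the last character or on the trailing blank.
    tail = range(max(0, S - 2), S)
    return sum(beta[s][h] for s in tail for h in range(H))
```

With a single row (H = 1) and a transition probability of 1, the sketch reduces to the ordinary one-dimensional CTC forward algorithm, which is a convenient sanity check for any implementation of the two-dimensional variant.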

Further, determining the target loss function based on the conditional probability of the second path includes: determining the target loss function by using a formula that expresses the target loss function in terms of the conditional probability of the second path.
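The loss formula is not reproduced in this text. As a hedged illustration only, assuming the loss takes the negative log-likelihood form customary for CTC training, and writing P(Y | X') for the conditional probability of the second path and \mathcal{L} for the target loss function (both symbols are assumptions), it would read:

```latex
\mathcal{L} = -\ln P(Y \mid X')
```

Under this assumption, minimizing the loss maximizes the total probability of all valid paths that align to the annotated text sequence.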

In a second aspect, an embodiment of the present invention further provides a character recognition apparatus, including: an acquisition unit, configured to acquire an image to be detected; an extraction unit, configured to extract feature information of the image to be detected by using a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information; wherein the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability and a first initial path probability; the first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first text sequence; the first path transition probability represents the path selection probability in the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path, the first path being a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first text sequence; and a determining unit, configured to determine the first text sequence in the image to be detected by using the first feature information of the image to be detected.

In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program, when run by a processor, performs the steps of the method according to any one of the first aspect.

In the embodiment of the present invention, an image to be detected is first acquired, and feature information of the image to be detected is extracted by using a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information, where the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability and a first initial path probability. Finally, the first text sequence in the image to be detected is determined by using the first feature information of the image to be detected. As described above, the prior art uses attention models to recognize sequences in images, but such models usually suffer from serious attention drift, which shifts the subsequent attention regions and causes consecutive recognition errors. In the present application, by contrast, the two-dimensional CTC model used to train the fully convolutional neural network preserves the first feature information of the image, and the text sequence is predicted directly from this first feature information. Preserving the first feature information and using it to predict the text sequence improves the recognition accuracy of the fully convolutional network, and thereby alleviates the technical problem of low sequence prediction accuracy caused by attention drift in existing image-based sequence recognition methods.

Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be learned by practicing the invention. The objectives and other advantages of the invention are realized and attained by the structures particularly pointed out in the description, the claims and the accompanying drawings.

To make the above objectives, features and advantages of the present invention more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

FIG. 2 is a flowchart of a character recognition method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a two-dimensional feature distribution according to an embodiment of the present invention;

FIG. 4 is a sub-distribution diagram within the schematic structural diagram of a two-dimensional feature distribution according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a fully convolutional neural network according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a predicted sequence according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a character recognition apparatus according to an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Example 1:

First, an example electronic device 100 for implementing the character recognition method of the embodiment of the present invention is described with reference to FIG. 1.

As shown in FIG. 1, the electronic device 100 includes one or more processors 102 and one or more storage devices 104. Optionally, the electronic device may also include an input device 106, an output device 108 and a camera 110, these components being interconnected by a bus system 112 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in FIG. 1 are only exemplary and not restrictive; the electronic device may have other components and structures as required.

The processor 102 may be implemented in at least one of the following hardware forms: a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA) and an application-specific integrated circuit (ASIC). The processor 102 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.

The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.

The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen and the like.

The output device 108 may output various information (for example, images or sounds) to the outside (for example, to a user), and may include one or more of a display, a speaker and the like.

The camera 110 is used to acquire the image to be detected. The image acquired by the camera is processed by the character recognition method to obtain the text sequence in the image to be detected. For example, the camera may capture an image desired by the user (such as a photo or a video), and the image is then processed by the character recognition method to obtain the text sequence it contains. The camera may also store the captured image in the storage device 104 for use by other components.

Exemplarily, the example electronic device for implementing the character recognition method according to the embodiment of the present invention may be implemented on a mobile terminal such as a smartphone or a tablet computer.

Example 2:

According to an embodiment of the present invention, an embodiment of a character recognition method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

FIG. 2 is a flowchart of a character recognition method according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:

Step S202: acquire an image to be detected.

In this embodiment, the image to be detected may be an image captured by the camera 110 of the electronic device described in Example 1, or an image received from another electronic device.

Step S204: extract feature information of the image to be detected by using a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information.

The first feature information includes at least one of the following: a first character distribution probability, a first path transition probability and a first initial path probability. The first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first text sequence. The first path transition probability represents the path selection probability in the height dimension of the first two-dimensional spatial feature distribution. The first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path, the first path being a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first text sequence.

As described above, the prior art uses attention-based models to recognize text sequences in images. The inventors also considered applying Connectionist Temporal Classification (CTC) to character recognition. However, the CTC model was originally designed for speech recognition. Because the speech signal to be recognized is one-dimensional, the formulas of the traditional CTC model can only process one-dimensional signals similar to speech. In image-based text recognition, the two-dimensional image features conflict with the one-dimensional distribution that the CTC model requires, so applying the CTC model directly to text recognition may lose important features and introduce additional noise.

For this reason, in the present application the inventors extend the traditional CTC model and propose a new CTC model, namely the two-dimensional CTC model. The two-dimensional CTC model can process the two-dimensional features of an image, so that these features are preserved and the fully convolutional neural network predicts a more accurate text sequence. The two-dimensional features of the image can be represented as a two-dimensional matrix in which each vector characterizes the feature information of one pixel of the image.

As described above, in the present application the fully convolutional neural network can perform feature extraction on the image to be detected to obtain the first feature information, where the first feature information includes: the first character distribution probability, the first path transition probability and the first initial path probability.

It should be noted that, in this embodiment, the first two-dimensional spatial feature distribution is the feature distribution of the image to be detected, and may have the structure shown in FIG. 3. That is, in the present application, the two-dimensional spatial feature distribution of the image to be detected may be a feature distribution structure with height H and width W.

In this embodiment, the first character distribution probability represents the probability that each feature point in the first two-dimensional spatial feature distribution contains a character of the first text sequence; for example, the probability value is set to 1 if it does and to 0 otherwise. The first path transition probability represents the path selection probability in the height dimension of the first two-dimensional spatial feature distribution; it can also be understood as the probability that each feature point in the first two-dimensional spatial feature distribution lies on the first path, where the first path is a predicted path that can be aligned to the first text sequence. The first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of the first path; it can also be understood as the value at the leftmost position in the character distribution probabilities of the feature points of the first two-dimensional spatial feature distribution.
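To make "path selection probability in the height dimension" concrete, the following sketch turns a map of raw scores into probabilities that sum to 1 over the height of each column. The text does not state how the network normalizes its outputs, so the use of a softmax, the score values and the function names are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def normalize_over_height(score_map):
    """score_map[h][w] -> probs[h][w], each column summing to 1 over h."""
    H, W = len(score_map), len(score_map[0])
    cols = [softmax([score_map[h][w] for h in range(H)]) for w in range(W)]
    return [[cols[w][h] for w in range(W)] for h in range(H)]


# Hypothetical scores for an H = 2, W = 2 feature distribution.
probs = normalize_over_height([[0.0, 1.0],
                               [0.0, 3.0]])
```

Read this way, the first column of such a map plays the role of an initial path probability: it says how likely a path is to start at each height.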

Step S206: determine the first character sequence in the image to be detected by using the first feature information of the image to be detected.

In this embodiment, once the first feature information has been determined, it can be used to determine the character sequence contained in the image to be detected (i.e., the first character sequence).

In this embodiment, the conditional probability P(Y|X) may be calculated from the first feature information, where

$$P(Y \mid X) = \sum_{\pi \in A_{X,Y}} \prod_{t=1}^{T} p_t(\pi_t \mid X)$$

Then a method such as greedy search or segment search can be used to find the path with the highest probability, and that path is determined to be the first character sequence. Here A_{X,Y} is the set of all possible paths of the label sequence Y under the predicted distribution X, and T is the length of X. The greedy search is computed as

$$\pi^{*} = \arg\max_{\pi} \prod_{t=1}^{T} p_t(\pi_t \mid X)$$

where the product is the result of multiplying the probabilities of all characters on path π.

It should be noted that, in the conditional probability P(Y|X) of this embodiment, $\prod_{t=1}^{T} p_t(\pi_t \mid X)$ is the product of the probabilities of all characters on one path, and $\sum_{\pi \in A_{X,Y}} \prod_{t=1}^{T} p_t(\pi_t \mid X)$ is the sum of these products over all paths in A_{X,Y}.
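The conditional probability and the greedy search described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `"-"` character standing in for the blank, and the toy probabilities are all assumptions made for illustration, and the brute-force enumeration is only practical for very short inputs.

```python
from itertools import product

BLANK = "-"  # hypothetical stand-in for the blank symbol


def collapse(path):
    """Merge consecutive repeats, then drop blanks (the alignment rule of CTC)."""
    out, prev = [], None
    for ch in path:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)


def conditional_probability(probs, alphabet, target):
    """Sum over all paths aligning to `target` of the product of per-step probabilities.

    probs[t][k] is the predicted probability of alphabet[k] at step t.
    """
    total = 0.0
    for path in product(range(len(alphabet)), repeat=len(probs)):
        if collapse(alphabet[k] for k in path) == target:
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


def greedy_decode(probs, alphabet):
    """Pick the most probable symbol at every step, then collapse the path."""
    best = [max(range(len(alphabet)), key=lambda k: row[k]) for row in probs]
    return collapse(alphabet[k] for k in best)


alphabet = [BLANK, "a"]
probs = [[0.6, 0.4], [0.3, 0.7]]  # two time steps, toy values
print(conditional_probability(probs, alphabet, "a"))
print(greedy_decode(probs, alphabet))
```

With these toy values three paths align to "a" ("-a", "a-" and "aa"), so the sum has three terms, while the greedy search follows only the single most probable path.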

As described above, the prior art uses attention models to recognize sequences in images, but such models often suffer from a serious attention shift problem: once the attention region shifts, the subsequent attention regions shift as well, producing a run of consecutive recognition errors. In this application, by contrast, the two-dimensional CTC model used to train the fully convolutional neural network retains the first feature information of the image and predicts the character sequence directly from it. Retaining the first feature information and predicting the character sequence from it improves the recognition accuracy of the fully convolutional network, and thereby alleviates the technical problem of low sequence prediction accuracy caused by attention shift in existing image sequence recognition methods.

Further, the inventors recognized that a CTC model could be used to recognize sequences in images; however, the formulation of the traditional CTC model can only handle one-dimensional signals. For this reason, this application extends the traditional CTC model: the extended two-dimensional CTC model processes the two-dimensional features of the image, so that those features are preserved and the fully convolutional neural network predicts a more accurate character sequence.

As described above, in this application the feature information of the image to be detected is extracted by a fully convolutional neural network.

In an optional embodiment, the fully convolutional neural network includes a first convolutional network, a pyramid pooling module, and a second convolutional network. In this embodiment, the fully convolutional neural network has a pyramid-like structure.

In this application, the first convolutional network may be a multi-layer residual convolutional neural network, for example a 50-layer residual convolutional neural network. The multi-layer residual convolutional neural network includes a plurality of convolution modules, some of which contain atrous (dilated) convolution layers.

It should be noted that, in this embodiment, the multi-layer residual convolutional neural network includes convolution modules in multiple stages, and some of these convolution modules include atrous convolution layers. Optionally, atrous convolution layers may be placed in the convolution modules of the last two stages. Atrous convolution layers may also be placed in the convolution modules of other stages; this embodiment does not specifically limit this.

FIG. 5 is a schematic structural diagram of an optional fully convolutional neural network. In the fully convolutional neural network shown in FIG. 5, the image to be detected passes in sequence through the first convolutional network (i.e., the multi-layer residual convolutional neural network shown in the figure), the pyramid pooling module, and the second convolutional network, finally yielding the feature information of the image to be detected, i.e., the first feature information.

As shown in FIG. 5, in this embodiment the first convolutional network is a multi-layer residual convolutional neural network containing five stages of convolution modules (for example, a 50-layer residual convolutional neural network). It should be noted that atrous convolution may be used in the convolution modules of the fourth and fifth stages to prevent the resolution of the feature representation of the image to be detected from decreasing too quickly. After several stages of convolution modules, the feature representation of the image to be detected has acquired a sufficiently large receptive field. Like most segmentation models, the fully convolutional neural network uses a pyramid-like structure: after the last convolution layer, the feature representation of the image to be detected is average-pooled to several different sizes, the features at the different scales are then concatenated, and a shared convolution produces a unified feature. From this unified feature, each of the three outputs is obtained by applying one 3x3 convolution layer followed by one 1x1 convolution layer.

It should be noted that, in this embodiment, the second convolutional network may include two convolution layers, whose kernels may be a 3x3 kernel and a 1x1 kernel, respectively. Kernels of other sizes may also be chosen; this embodiment does not specifically limit this.

Based on this, in this embodiment, step S204 (extracting the feature information of the image to be detected with the fully convolutional neural network trained by the two-dimensional CTC model to obtain the first feature information) includes the following steps:

Step S2041: perform feature extraction on the image to be detected by using the first convolutional network to obtain first convolutional feature information;

Step S2042: perform pooling on the first convolutional feature information by using the pyramid pooling module to obtain pooled features at different scales, and concatenate the pooled features at the different scales to obtain pooled feature information;

Step S2043: perform convolution on the pooled feature information by using the second convolutional network to obtain the first feature information of the image to be detected.

Specifically, in this embodiment, the 50-layer residual convolutional neural network in the fully convolutional neural network shown in FIG. 5 can be used to extract features from the image to be detected, yielding the first convolutional feature information. Because atrous convolution is used in stages 4 and 5 of the 50-layer residual convolutional neural network, the resolution of the feature representation of the image to be detected does not drop too quickly, while the feature representation still obtains a sufficiently large receptive field.

After the first convolutional feature information has been obtained with the 50-layer residual convolutional neural network, the pyramid pooling module can be used to pool the first convolutional feature information, producing multi-scale pooled features. The pooled features of the individual scales can then be concatenated to obtain the pooled feature information.
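The pooling and concatenation step can be sketched in plain Python on a single-channel feature map. This is only an illustration under several assumptions that the text does not state: the bin sizes (1 and 2), the nearest-neighbour upsampling used to bring every scale back to the input size before concatenation, and all function names are hypothetical, and each bin size must not exceed the height or width of the map.

```python
import math


def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    H, W = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * H) // bins, math.ceil((i + 1) * H / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * W) // bins, math.ceil((j + 1) * W / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, H, W):
    """Resize a bins x bins grid back to H x W by nearest-neighbour lookup."""
    bins = len(pooled)
    return [[pooled[(r * bins) // H][(c * bins) // W] for c in range(W)]
            for r in range(H)]


def pyramid_pool(fmap, bin_sizes=(1, 2)):
    """Return the input map plus one pooled-and-upsampled map per scale,
    stacked as a list of channels (the concatenation step)."""
    H, W = len(fmap), len(fmap[0])
    channels = [fmap]
    for b in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, b), H, W))
    return channels


features = [[1, 2, 3, 4],
            [5, 6, 7, 8]]
for channel in pyramid_pool(features):
    print(channel)
```

In a real network each scale would carry many channels and the concatenated result would be passed to the second convolutional network; the sketch keeps one channel so that the arithmetic stays visible.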

After the pooled feature information has been obtained, the second convolutional network can be used to perform convolution on it to obtain the first feature information of the image to be detected. If the second convolutional network includes two convolution layers (i.e., a 3x3 convolution layer and a 1x1 convolution layer), the 3x3 layer and the 1x1 layer are applied to the pooled feature information in turn to obtain the first feature information of the image to be detected.

In this embodiment, before the feature information of the image to be detected is extracted by the fully convolutional neural network trained with the two-dimensional CTC model, the two-dimensional CTC model may be used to train an initial fully convolutional neural network, yielding the fully convolutional neural network described in step S204.

Before describing the training process of the initial fully convolutional neural network, the traditional one-dimensional CTC is introduced first. The traditional one-dimensional CTC model introduces the symbol "ε" to represent a blank in a sequence, and aligns the predicted sequence with the label sequence by padding them with blanks and repetitions. Here, the label sequence is the annotated character sequence in the image, and the predicted sequence is a sequence predicted from the image that may correspond to that character sequence. In the sequences shown in FIG. 6, each row is a predicted sequence, and the symbol "□" denotes "ε"; this is not repeated in the subsequent embodiments. As shown in FIG. 6, the predicted sequences in rows 1, 3 and 4 can be correctly aligned to the target sequence "FREE", whereas the predicted sequence in row 2 cannot. For a given position i in a predicted sequence, i can be skipped if and only if the prediction at i is ε or is the same as the prediction at the previous position. Take the first predicted sequence in FIG. 6, "F□RE□EEE", as an example. If i is the 2nd character, "□", it denotes ε and can therefore be skipped during alignment. If i is the 7th character, "E", it is the same as the 6th character and can therefore be skipped. Likewise, the 8th character is the same as the 7th and can be skipped. The predicted sequence "F□RE□EEE" is thus aligned to "FREE". Once all skippable positions have been removed from a prediction, the aligned predicted sequence is obtained.
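The skipping rule can be written directly as a short Python function. This is a minimal sketch: the name `align` is hypothetical, and "□" is used for the blank as in FIG. 6.

```python
BLANK = "□"  # the blank symbol as drawn in FIG. 6


def align(prediction):
    """Remove every skippable position from a predicted sequence.

    Position i is skippable if and only if the prediction at i is the blank
    or is the same as the prediction at position i - 1.
    """
    kept = []
    for i, ch in enumerate(prediction):
        skippable = ch == BLANK or (i > 0 and ch == prediction[i - 1])
        if not skippable:
            kept.append(ch)
    return "".join(kept)


print(align("F□RE□EEE"))  # the first predicted sequence of FIG. 6
```

The blank between the two runs of "E" is what keeps the double letter: without it, the rule would merge them into a single "E".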

As described above, the CTC model measures the similarity between the label sequence and the predicted sequence by calculating the conditional probability of the label given the predicted distribution. By definition, this conditional probability is:

$$P(Y \mid X) = \sum_{\pi \in A_{X,Y}} \prod_{t=1}^{T} p_t(\pi_t \mid X)$$

Specifically, Y and X are the label sequence and the predicted distribution, respectively; A_{X,Y} is the set of all possible paths of the label sequence Y under the predicted distribution X; and T is the length of X. Because the number of possible paths is enormous, exhaustively computing and summing the probabilities of all paths is very inefficient; the embodiments of this application can therefore use dynamic programming to solve this problem.

First, because it makes no difference whether each symbol in the target sequence is preceded or followed by the blank ε, the target sequence Y is extended as follows to make the description clearer: Y* = [ε, y_1, ε, y_2, ε, …, y_L, ε]. Here Y* is the extended target sequence, obtained by inserting one ε before and after each symbol, so that the original target sequence Y of length L is extended to Y* of length 2L+1.

For a given s ∈ [1, 2, …, 2L+1], let Y*[1:s] denote the first s characters of Y*. Then α_{s,t} is defined as the probability of Y*[1:s] at time t, i.e., the sum of the probabilities of all possible sub-paths that reach the s-th position of Y* at time t.

Therefore, when the (s-1)-th symbol cannot be skipped, i.e., when Y*_s = ε or Y*_s = Y*_{s-2}, α_{s,t} satisfies:

$$\alpha_{s,t} = \left(\alpha_{s,t-1} + \alpha_{s-1,t-1}\right) X_t^{Y^*_s}$$

In the remaining cases the (s-1)-th symbol can be skipped, i.e., when Y*_s ≠ ε and Y*_s ≠ Y*_{s-2}, and α_{s,t} is calculated as:

$$\alpha_{s,t} = \left(\alpha_{s,t-1} + \alpha_{s-1,t-1} + \alpha_{s-2,t-1}\right) X_t^{Y^*_s}$$

where Y*_s is the s-th character of the extended target sequence, Y*_{s-2} is the (s-2)-th character of the extended target sequence, and X_t^{Y*_s} is the probability that the predicted distribution X assigns to the character Y*_s at time t.

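To sum up, the dynamic programming state transition equation of the CTC model can be expressed as:

$$\alpha_{s,t} = \begin{cases} \left(\alpha_{s,t-1} + \alpha_{s-1,t-1}\right) X_t^{Y^*_s}, & Y^*_s = \epsilon \ \text{or} \ Y^*_s = Y^*_{s-2} \\ \left(\alpha_{s,t-1} + \alpha_{s-1,t-1} + \alpha_{s-2,t-1}\right) X_t^{Y^*_s}, & \text{otherwise} \end{cases}$$

with X_t^{Y*_s} the predicted probability of the character Y*_s at time t.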
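The one-dimensional recursion can be sketched in Python as follows. The recursion itself follows the two cases described above; the initialisation (a path may start at the leading blank or at the first character) and the termination (a path may end at the last character or at the trailing blank) are the usual CTC conventions and are assumptions here, since the text does not spell them out. The function name and the 0-based indexing are also choices of this sketch.

```python
def ctc_forward(probs, target, blank=0):
    """Return P(Y|X) for a 1D CTC model by dynamic programming.

    probs[t][k] is the predicted probability of class k at step t;
    target is the label sequence Y as a list of class indices.
    Indices are 0-based, so alpha[s][t] corresponds to alpha_{s+1,t+1}.
    """
    # Extended target Y*: a blank before and after every symbol (length 2L+1).
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(probs)

    alpha = [[0.0] * T for _ in range(S)]
    # Assumed initial state: start at the leading blank or the first character.
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[1][0] = probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            total = alpha[s][t - 1]
            if s >= 1:
                total += alpha[s - 1][t - 1]
            # Position s-1 may be skipped only if Y*_s is not blank
            # and differs from Y*_{s-2}.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2][t - 1]
            alpha[s][t] = total * probs[t][ext[s]]

    # Assumed termination: end at the last character or the trailing blank.
    result = alpha[S - 1][T - 1]
    if S > 1:
        result += alpha[S - 2][T - 1]
    return result


probs = [[0.6, 0.4], [0.3, 0.7]]  # two steps, classes: 0 = blank, 1 = "a"
print(ctc_forward(probs, [1]))
```

The table has (2L+1) × T entries and each entry needs at most three additions, which is what makes the dynamic program cheap compared with enumerating every path.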

Building on the traditional one-dimensional CTC model, the embodiments provided in this application extend it along the height dimension. Similarly, for a given two-dimensional distribution X' of height H and width W, a path transition probability ψ ∈ R^{H×(W-1)×H} is defined. The entry ψ_{h,w,h'} is the probability of the path moving from position (h, w) of the predicted distribution to position (h', w+1), where h, h' ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1].

Take the two-dimensional spatial feature distribution shown in FIG. 3 as an example. FIG. 3 shows a spatial feature distribution map of size Q×H×W; any one of its H×W sub-maps is a sub-distribution map as shown in FIG. 4. Suppose the position with coordinates (h, w) is the one marked "1" in FIG. 4; then the positions with coordinates (h', w+1) are those marked "2", "3", "4" and "5".

It follows that

$$\sum_{h'=1}^{H} \psi_{h,w,h'} = 1$$

This formula states that the path transition probabilities from one position of the predicted distribution to all heights in the next column sum to 1. In FIG. 4, for example, the transition probabilities from the position marked "1" to the positions marked "2", "3", "4" and "5" sum to 1.
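One way to obtain transition probabilities that satisfy this constraint is to normalise raw network scores with a softmax over the destination height h'. The text does not say how the normalisation is performed, so the softmax, the function name and the score layout below are assumptions of this sketch.

```python
import math


def normalize_transitions(scores):
    """Turn raw scores into path transition probabilities.

    scores[h][w][h2] is a raw score for moving from (h, w) to (h2, w + 1);
    the result psi has the same shape H x (W-1) x H, and psi[h][w] sums to 1.
    """
    psi = []
    for row in scores:
        psi_row = []
        for cell in row:
            m = max(cell)  # subtract the maximum for numerical stability
            exps = [math.exp(v - m) for v in cell]
            z = sum(exps)
            psi_row.append([e / z for e in exps])
        psi.append(psi_row)
    return psi


# Toy scores for H = 2, W = 3 (so W - 1 = 2 transition columns).
scores = [[[0.0, 1.0], [2.0, -1.0]],
          [[0.5, 0.5], [3.0, 3.0]]]
psi = normalize_transitions(scores)
print([sum(cell) for row in psi for cell in row])
```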

As with the one-dimensional CTC, the target sequence is extended in the same way to obtain the extended target sequence Y*. A similar derivation then gives the state transition equation of the two-dimensional CTC model:

$$\beta_{s,h,w} = \begin{cases} {X'}^{Y^*_s}_{h,w} \sum_{j=1}^{H} \left(\beta_{s,j,w-1} + \beta_{s-1,j,w-1}\right) \psi_{j,w-1,h}, & Y^*_s = \epsilon \ \text{or} \ Y^*_s = Y^*_{s-2} \\ {X'}^{Y^*_s}_{h,w} \sum_{j=1}^{H} \left(\beta_{s,j,w-1} + \beta_{s-1,j,w-1} + \beta_{s-2,j,w-1}\right) \psi_{j,w-1,h}, & \text{otherwise} \end{cases}$$

Specifically, ψ_{j,w-1,h} is the second path transition probability, i.e., the probability of moving from feature point (j, w-1) to feature point (h, w) in the second two-dimensional spatial feature distribution; j is one height coordinate in the second two-dimensional spatial feature distribution; Y* and X' are, respectively, the extended label sequence of the second character sequence and the second two-dimensional spatial feature distribution; s is the index of a character in Y*; h is another height coordinate and w is the width coordinate in the second two-dimensional spatial feature distribution, with h ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1]; H is the height and W is the width of the second two-dimensional spatial feature distribution.

The term {X'}^{Y*_s}_{h,w} belongs to the second character distribution probability and is the probability that the feature point at position (h, w) belongs to the character Y*_s of the second character sequence. β_{s,h,w} is the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence at position (h, w) of the second two-dimensional spatial feature distribution.

Finally, since the two-dimensional CTC model has H points along the height dimension of the two-dimensional spatial feature distribution that can serve as starting points, the initial state of β can be defined as:

$$\beta_{s,h,1} = \begin{cases} \Gamma_h \, {X'}^{Y^*_s}_{h,1}, & s \in \{1, 2\} \\ 0, & \text{otherwise} \end{cases}$$

where Γ ∈ R^H, with entries Γ_h satisfying

$$\sum_{h=1}^{H} \Gamma_h = 1$$

and R^H denotes the H-dimensional vectors over the real numbers.

With the formulas above, the two-dimensional CTC model can train the initial fully convolutional neural network end-to-end using sequence annotations. In the testing phase, the path with the highest probability can be found in a manner similar to the one-dimensional CTC model, i.e., by a greedy algorithm or segment search; finding the path with the highest probability is the process of finding the second character sequence.

Based on the above, in this embodiment the process of training the initial fully convolutional neural network is as follows:

Step S301: acquire a training sample image;

Step S302: extract feature information of the training sample image through the initial fully convolutional neural network to obtain second feature information;

Step S303: process the second feature information of the training sample image by using the two-dimensional CTC model to obtain a target loss function;

Step S304: train the initial fully convolutional neural network with the target loss function to obtain the fully convolutional neural network.

Specifically, in this embodiment, when training the initial fully convolutional neural network, a training sample image is first acquired, and the feature information of the training sample image is then extracted by the initial fully convolutional neural network to obtain the second feature information. The second feature information likewise includes the second character distribution probability, the second path transition probability, and the second initial path probability.

Here, the second character distribution probability is the probability that each feature point in the second two-dimensional spatial feature distribution of the training sample image belongs to a character of the second character sequence; the second path transition probability represents the path selection probability along the height dimension of the second two-dimensional spatial feature distribution; and the second initial path probability represents the probability that each feature point of the second two-dimensional spatial feature distribution is the starting feature point of the second path, where the second path is a valid path, predicted in the second two-dimensional spatial feature distribution, that can be aligned to the second character sequence.

After the second feature information has been obtained as described above, the two-dimensional CTC model can be applied to the second feature information of the training sample image to obtain the target loss function. The initial fully convolutional neural network can then be trained with this target loss function to obtain the fully convolutional neural network described in step S204.

In an optional embodiment, processing the second feature information of the training sample image with the two-dimensional CTC model to obtain the target loss function specifically includes the following steps:

First, the second feature information is processed with the two-dimensional CTC model to obtain the conditional probability of the second path. The second path is a valid path, predicted by the initial fully convolutional neural network in the two-dimensional spatial feature distribution of the training sample image, that can be aligned to the second character sequence in the training sample image.

To calculate the conditional probability of the second path, a target conditional probability β_{s,h,w} can first be computed by combining a dynamic programming algorithm with the information in the second feature information. Here β_{s,h,w} is the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence at position (h, w) of the second two-dimensional spatial feature distribution, and the second two-dimensional spatial feature distribution is the spatial feature distribution of the training sample image.

Specifically, the target conditional probability β_{s,h,w} is calculated with the following target formula:

$$\beta_{s,h,w} = \begin{cases} {X'}^{Y^*_s}_{h,w} \sum_{j=1}^{H} \left(\beta_{s,j,w-1} + \beta_{s-1,j,w-1}\right) \psi_{j,w-1,h}, & Y^*_s = \epsilon \ \text{or} \ Y^*_s = Y^*_{s-2} \\ {X'}^{Y^*_s}_{h,w} \sum_{j=1}^{H} \left(\beta_{s,j,w-1} + \beta_{s-1,j,w-1} + \beta_{s-2,j,w-1}\right) \psi_{j,w-1,h}, & \text{otherwise} \end{cases}$$

After the target conditional probability β_{s,h,w} has been obtained, it can be used to calculate the conditional probability of the second path.

Here, ψ_{j,w-1,h} is the second path transition probability, i.e., the probability of moving from feature point (j, w-1) to feature point (h, w) in the second two-dimensional spatial feature distribution; j is one height index in the second two-dimensional spatial feature distribution; Y* and X' are, respectively, the extended label sequence of the second character sequence and the second two-dimensional spatial feature distribution; {X'}^{Y*_s}_{h,w} is the probability that the feature point at position (h, w) belongs to the character Y*_s; s is the index of a character in Y*; h is another height coordinate and w is the width coordinate in the second two-dimensional spatial feature distribution, with h ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1]; H is the height and W is the width of the second two-dimensional spatial feature distribution.

It should be noted that, in this embodiment, the conditional probability of the second path can be calculated with the following formula:

$$P(Y \mid X') = \sum_{h=1}^{H} \left(\beta_{2L+1,h,W} + \beta_{2L,h,W}\right)$$

where L = |Y|, Y is the vector representing the target sequence before its extension to Y*, and L is the length of Y.

After the conditional probability P(Y|X') of the second path has been obtained as described above, the target loss function can be calculated as follows.

Then, the target loss function Loss is determined from the conditional probability of the second path by the formula Loss = -ln P(Y|X'), where P(Y|X') is the conditional probability of the second path and Loss is the target loss function.
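The computation from the second feature information to the loss can be sketched in Python as below. The recursion over β follows the description above; the initial state (each height h starts with weight equal to its initial path probability, at the leading blank or the first character) and the termination (summing, over all heights of the last column, the states for the last character and the trailing blank) are assumptions modelled on the one-dimensional CTC conventions. All names, the 0-based indexing and the toy inputs are hypothetical.

```python
import math


def ctc2d_forward(X, psi, gamma, target, blank=0):
    """Return P(Y|X') for a 2D CTC model by dynamic programming.

    X[h][w][k]   : character distribution, probability of class k at (h, w)
    psi[j][w][h] : path transition probability from (j, w) to (h, w + 1)
    gamma[h]     : initial path probability of starting at height h
    target       : label sequence Y as a list of class indices
    """
    ext = [blank]
    for y in target:
        ext += [y, blank]  # extended target Y*, length 2L+1
    S, H, W = len(ext), len(X), len(X[0])

    beta = [[[0.0] * W for _ in range(H)] for _ in range(S)]
    # Assumed initial state: only the first two positions of Y* are reachable.
    for h in range(H):
        for s in range(min(2, S)):
            beta[s][h][0] = gamma[h] * X[h][0][ext[s]]

    for w in range(1, W):
        for s in range(S):
            skip = s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]
            for h in range(H):
                total = 0.0
                for j in range(H):  # sum over every source height j
                    incoming = beta[s][j][w - 1]
                    if s >= 1:
                        incoming += beta[s - 1][j][w - 1]
                    if skip:
                        incoming += beta[s - 2][j][w - 1]
                    total += incoming * psi[j][w - 1][h]
                beta[s][h][w] = total * X[h][w][ext[s]]

    # Assumed termination: last character or trailing blank, any height.
    prob = 0.0
    for h in range(H):
        prob += beta[S - 1][h][W - 1]
        if S > 1:
            prob += beta[S - 2][h][W - 1]
    return prob


def ctc2d_loss(X, psi, gamma, target, blank=0):
    """Loss = -ln P(Y|X')."""
    return -math.log(ctc2d_forward(X, psi, gamma, target, blank))


# Toy input: H = 2, W = 2, classes 0 = blank and 1 = "a"; both rows identical.
X = [[[0.6, 0.4], [0.3, 0.7]],
     [[0.6, 0.4], [0.3, 0.7]]]
psi = [[[0.5, 0.5]], [[0.5, 0.5]]]
gamma = [0.5, 0.5]
print(ctc2d_loss(X, psi, gamma, [1]))
```

With H = 1, `psi` equal to 1 and `gamma` equal to [1.0], the function reduces to the one-dimensional CTC probability. A practical implementation would normally run the recursion in the log domain to avoid underflow on wide feature maps.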

As described above, this embodiment uses a CTC model to recognize sequences in images. To overcome the limitations of the traditional CTC model, the inventors extended it and propose a new two-dimensional CTC model that computes the conditional probability of the target sequence directly from a two-dimensional probability distribution. More specifically, on top of the traditional CTC model, the method provided by this application adds a height dimension to the search path in addition to the time dimension, so that path search can move between different heights. Paths that make different choices of height can still point to the same target sequence, and, as before, the sum of the conditional probabilities of all such paths is the conditional probability of the target sequence.

By extending the traditional one-dimensional CTC model to two dimensions, image-based sequence recognition can retain the two-dimensional features of the image and compute the similarity to the label directly from the two-dimensional distribution, which greatly improves recognition accuracy. In addition, because two-dimensional information is available, this extension also makes it possible to handle curved, rotated, and perspective-distorted text. The two-dimensional CTC model proposed in this application offers a new perspective on text recognition: it treats image-based sequence recognition in a more natural way and makes it possible to preserve the two-dimensional distribution of the image.

In addition, computing the CTC probability by naively evaluating and summing the probabilities of all paths is very expensive. The present invention proposes a dynamic programming algorithm that greatly reduces the computational complexity of calculating the two-dimensional conditional probability, so that the computational cost of using two-dimensional CTC in a recognition network is almost negligible.

Embodiment 3:

An embodiment of the present invention further provides a character recognition apparatus, which is mainly used to perform the character recognition method provided above. The character recognition apparatus provided by the embodiment of the present invention is described in detail below.

FIG. 7 is a schematic diagram of a character recognition apparatus according to an embodiment of the present invention. As shown in FIG. 7, the character recognition apparatus mainly includes an acquisition unit 10, an extraction unit 20, and a determination unit 30, wherein:

获取单元10，用于获取待检测图像；an acquisition unit 10, configured to acquire an image to be detected;

提取单元20，用于通过采用二维CTC模型训练之后的全卷积神经网络提取所述待检测图像的特征信息，得到第一特征信息；The extraction unit 20 is used for extracting the feature information of the image to be detected by adopting the full convolutional neural network after the training of the two-dimensional CTC model to obtain the first feature information;

其中，所述第一特征信息包括以下至少之一：第一字符分布概率、第一路径转移概率和第一初始路径概率；所述第一字符分布概率为所述待检测图像的第一二维空间特征分布中各个特征点属于第一文字序列的概率，所述第一路径转移概率表示在第一二维空间特征分布中高度维度上的路径选择概率；所述第一初始路径概率表示第一二维空间特征分布的各个特征点为第一路径上的起始特征点的概率，所述第一路径为在第一二维空间特征分布中预测出的能够对齐到第一文字序列的路径；The first feature information includes at least one of the following: a first character distribution probability, a first path transition probability, and a first initial path probability; the first character distribution probability is a first two-dimensional image of the to-be-detected image The probability that each feature point in the spatial feature distribution belongs to the first text sequence, the first path transition probability represents the path selection probability in the height dimension in the first two-dimensional spatial feature distribution; the first initial path probability represents the first two The probability that each feature point of the dimensional space feature distribution is a starting feature point on a first path, and the first path is a path predicted in the first two-dimensional space feature distribution that can be aligned to the first text sequence;

确定单元30，用于利用所述待检测图像的第一特征信息确定所述待检测图像中的所述第一文字序列。The determining unit 30 is configured to determine the first character sequence in the image to be detected by using the first feature information of the image to be detected.
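As a minimal sketch of how the three units fit together, the apparatus can be modeled as three callables chained in order. The class name, type aliases, and dictionary keys below are hypothetical and chosen only for illustration; the real units would wrap an image source, the trained fully convolutional network, and a decoder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type aliases, not taken from the patent.
Image = List[List[float]]        # grayscale image as rows of pixel values
FeatureInfo = Dict[str, object]  # e.g. {"char_prob": ..., "transition": ..., "initial": ...}


@dataclass
class CharacterRecognitionApparatus:
    acquire: Callable[[], Image]             # acquisition unit 10
    extract: Callable[[Image], FeatureInfo]  # extraction unit 20
    determine: Callable[[FeatureInfo], str]  # determination unit 30

    def run(self) -> str:
        """Acquire an image, extract its feature information, decode the text."""
        image = self.acquire()
        features = self.extract(image)
        return self.determine(features)
```

Keeping the units as separate callables mirrors the unit boundaries of FIG. 7 and lets each one be replaced or tested on its own.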

In this embodiment of the present invention, an image to be detected is first acquired, and its feature information is extracted through a fully convolutional neural network trained with a two-dimensional CTC model to obtain first feature information, where the first feature information includes at least one of the following: a character distribution probability, a path transition probability, and an initial path probability. Finally, the first character sequence in the image to be detected is determined by using the first feature information. As the above description shows, this application trains the fully convolutional neural network with the two-dimensional CTC model and uses the trained network to perform sequence recognition on the image to be detected. This improves the recognition accuracy of the fully convolutional network and thereby alleviates the technical problem of low sequence prediction accuracy caused by attention shift in existing image sequence recognition methods.

Optionally, the fully convolutional neural network includes a first convolutional network, a pyramid pooling module, and a second convolutional network.

Optionally, the first convolutional network is a residual convolutional neural network that includes a plurality of convolution modules, some of which contain dilated (atrous) convolution layers.

Optionally, the extraction unit 20 is configured to: perform feature extraction on the image to be detected by using the first convolutional network to obtain first convolution feature information; perform pooling on the first convolution feature information by using the pyramid pooling module to obtain pooled features at different scales, and concatenate the pooled features at different scales to obtain pooled feature information; and perform convolution on the pooled feature information by using the second convolutional network to obtain the first feature information of the image to be detected.
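The pooling-and-concatenation step can be sketched as follows. This is an illustration under assumptions that the text does not specify: a single-channel feature map stored as nested lists, average pooling, bin sizes (1, 2, 4), nearest-neighbor upsampling back to the input size before concatenation, and keeping the input map as the first channel. All function names are hypothetical.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def adaptive_avg_pool(fmap: Map2D, bins: int) -> Map2D:
    """Average-pool fmap into a bins x bins grid of evenly spaced cells."""
    h, w = len(fmap), len(fmap[0])
    if bins > h or bins > w:
        raise ValueError("bins must not exceed the feature map size")
    pooled = []
    for by in range(bins):
        y0, y1 = by * h // bins, (by + 1) * h // bins
        row = []
        for bx in range(bins):
            x0, x1 = bx * w // bins, (bx + 1) * w // bins
            cell = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled: Map2D, h: int, w: int) -> Map2D:
    """Resize a bins x bins map to h x w by nearest-neighbor lookup."""
    bins = len(pooled)
    return [[pooled[y * bins // h][x * bins // w] for x in range(w)]
            for y in range(h)]


def pyramid_pool(fmap: Map2D, bin_sizes: Sequence[int] = (1, 2, 4)) -> List[Map2D]:
    """Return the input map plus one pooled-and-upsampled map per scale,
    stacked as a list of channels (the concatenation step)."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return channels
```

Each extra channel carries context at a different scale, from a global average (one bin) down to local averages, which the second convolutional network can then combine.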

Optionally, the apparatus is further configured to: acquire a training sample image; extract feature information of the training sample image through an initial fully convolutional neural network to obtain second feature information, where the second feature information includes at least one of the following: a second character distribution probability, a second path transition probability, and a second initial path probability; the second character distribution probability is the probability that each feature point in a second two-dimensional spatial feature distribution of the training sample image belongs to a character in a second character sequence; the second path transition probability represents the path selection probability along the height dimension of the second two-dimensional spatial feature distribution; the second initial path probability represents the probability that each feature point of the second two-dimensional spatial feature distribution is the starting feature point of a second path; and the second path is a valid path predicted in the second two-dimensional spatial feature distribution that can be aligned to the second character sequence; process the second feature information of the training sample image by using the two-dimensional CTC model to obtain a target loss function; and train the initial fully convolutional neural network with the target loss function to obtain the fully convolutional neural network.

Optionally, the apparatus is further configured to: process the second feature information by using the two-dimensional CTC model to obtain the conditional probability of the second path; and determine the target loss function based on the conditional probability of the second path.

Optionally, the apparatus is further configured to: calculate a target conditional probability β_{s,h,w} by combining a dynamic programming algorithm with the second feature information, where β_{s,h,w} denotes the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence at position (h, w) of the second two-dimensional spatial feature distribution, and the second two-dimensional spatial feature distribution is the spatial feature distribution of the training sample image; and calculate the conditional probability of the second path by using the target conditional probability β_{s,h,w}.

Optionally, the apparatus is further configured to calculate the target conditional probability β_{s,h,w} by using a target formula, the target formula being expressed as:

[formula image not reproduced in this text]

where

[formula image not reproduced in this text]

Ψ_{j,w-1,h} denotes the second path transition probability, that is, the transition probability from feature point (j, w-1) to feature point (h, w) in the second two-dimensional spatial feature distribution; j denotes a height index in the second two-dimensional spatial feature distribution; Y^* and X' denote, respectively, the extended annotated character sequence of the second character sequence and the second two-dimensional spatial feature distribution; s denotes the index of a character in Y^*; h denotes another height coordinate in the second two-dimensional spatial feature distribution; w denotes the width coordinate in the second two-dimensional spatial feature distribution; h ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1], where H denotes the height and W the width of the second two-dimensional spatial feature distribution;

the character probability term of the formula (its symbol is likewise an image that is not reproduced) belongs to the second character distribution probability and denotes the probability that the feature point at position (h, w) belongs to the character Y_s^* of the second character sequence; Ψ_{j,0,h} is calculated from the second initial path probability Ψ_{j,-1,h}.
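Because the target formula is not reproduced in the text above, the following is a sketch under stated assumptions rather than the patented formula: it extends the standard one-dimensional CTC forward recursion with a sum over the heights of the previous column, using the three quantities defined above. The blank index, the 0-based array layout (`char_prob[h][w][k]`, `trans[w][j][h]`, `init[h]`), and the function names are illustrative choices.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def extend_label(label: Sequence[int]) -> List[int]:
    """Y -> Y*: insert a blank before, between, and after the characters."""
    ext = [BLANK]
    for c in label:
        ext += [c, BLANK]
    return ext


def ctc2d_probability(char_prob, trans, init, label) -> float:
    """Sum the probability of every 2D path that aligns to `label`.

    char_prob[h][w][k]: probability of class k at feature point (h, w)
    trans[w][j][h]:     probability of moving from (j, w) to (h, w + 1)
    init[h]:            probability that a path starts at height h in column 0
    """
    H, W = len(char_prob), len(char_prob[0])
    ext = extend_label(label)
    S = len(ext)
    # beta[s][h]: total probability of all sub-paths that are at symbol
    # ext[s] and height h in the current column.
    beta = [[0.0] * H for _ in range(S)]
    for s in range(min(2, S)):  # a path starts on the leading blank or first character
        for h in range(H):
            beta[s][h] = init[h] * char_prob[h][0][ext[s]]
    for w in range(1, W):
        new = [[0.0] * H for _ in range(S)]
        for s in range(S):
            # Predecessor symbols allowed by CTC: stay, advance by one, or
            # skip the blank between two different characters.
            prev = [s]
            if s >= 1:
                prev.append(s - 1)
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                prev.append(s - 2)
            for h in range(H):
                total = 0.0
                for j in range(H):  # sum over source heights in column w - 1
                    total += trans[w - 1][j][h] * sum(beta[p][j] for p in prev)
                new[s][h] = char_prob[h][w][ext[s]] * total
        beta = new
    # A complete path ends on the last character or on the trailing blank.
    return sum(beta[s][h] for s in range(max(0, S - 2), S) for h in range(H))
```

Only one column of β values is kept at a time, so memory grows with S·H rather than with S·H·W; a training implementation would normally run the same recursion in log space to avoid underflow.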

Optionally, the apparatus is further configured to determine the target loss function by using the formula Loss = -ln P(Y|X'), where P(Y|X') is the conditional probability of the second path and Loss is the target loss function.
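A worked instance of this loss, as a minimal sketch: the function name and the `eps` floor are assumptions added here to keep the logarithm finite, and are not part of the formula.

```python
import math


def ctc2d_loss(cond_prob: float, eps: float = 1e-12) -> float:
    """Loss = -ln P(Y|X'); eps is an assumed safeguard against log(0)."""
    return -math.log(max(cond_prob, eps))


print(ctc2d_loss(0.5))  # about 0.693: a path probability of 1/2 costs ln 2
```

The loss is zero when the annotated sequence receives probability 1 and grows without bound as that probability approaches 0, so minimizing it pushes probability mass onto paths that align to the annotation.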

This application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the method described in any one of the foregoing method embodiments are executed.

The implementation principle and technical effects of the apparatus provided in this embodiment are the same as those of the foregoing embodiments. For brevity, for anything not mentioned in the apparatus embodiment, refer to the corresponding content in the foregoing method embodiments.

In addition, this embodiment provides a processing device that includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the character recognition method provided by the foregoing embodiments.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the system described above may be understood by reference to the corresponding process in the foregoing embodiments, and is not repeated here.

Embodiments of the present invention provide a character recognition method, an apparatus, an electronic device, and a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here. If the functions are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are merely specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features therein. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A character recognition method, comprising:

acquiring an image to be detected, and extracting feature information of the image to be detected through a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information;

wherein the first feature information comprises: a first character distribution probability, a first path transition probability, and a first initial path probability; the first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first character sequence; the first path transition probability represents the path selection probability along the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path; and the first path is a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first character sequence; and

determining the first character sequence in the image to be detected by using the first feature information of the image to be detected.

2. The method according to claim 1, wherein the fully convolutional neural network comprises: a first convolutional network, a pyramid pooling module, and a second convolutional network.

3. The method according to claim 2, wherein the first convolutional network is a residual convolutional neural network comprising a plurality of convolution modules, and some of the plurality of convolution modules contain dilated (atrous) convolution layers.

4. The method according to claim 2 or 3, wherein extracting the feature information of the image to be detected through the fully convolutional neural network trained with the two-dimensional CTC model to obtain the first feature information comprises:

performing feature extraction on the image to be detected by using the first convolutional network to obtain first convolution feature information;

performing pooling on the first convolution feature information by using the pyramid pooling module to obtain pooled features at different scales, and concatenating the pooled features at different scales to obtain pooled feature information; and

performing convolution on the pooled feature information by using the second convolutional network to obtain the first feature information of the image to be detected.

5. The method according to claim 1, further comprising:

acquiring a training sample image;

extracting feature information of the training sample image through an initial fully convolutional neural network to obtain second feature information, wherein the second feature information comprises at least one of the following: a second character distribution probability, a second path transition probability, and a second initial path probability; the second character distribution probability is the probability that each feature point in a second two-dimensional spatial feature distribution of the training sample image belongs to a character in a second character sequence; the second path transition probability represents the path selection probability along the height dimension of the second two-dimensional spatial feature distribution; the second initial path probability represents the probability that each feature point of the second two-dimensional spatial feature distribution is the starting feature point of a second path; and the second path is a valid path predicted in the second two-dimensional spatial feature distribution that can be aligned to the second character sequence;

processing the second feature information of the training sample image by using the two-dimensional CTC model to obtain a target loss function; and

training the initial fully convolutional neural network with the target loss function to obtain the fully convolutional neural network.

6. The method according to claim 5, wherein processing the second feature information of the training sample image by using the two-dimensional CTC model to obtain the target loss function comprises:

processing the second feature information by using the two-dimensional CTC model to obtain the conditional probability of the second path; and

determining the target loss function based on the conditional probability of the second path.

7. The method according to claim 6, wherein processing the second feature information by using the two-dimensional CTC model to obtain the conditional probability of the second path comprises:

calculating a target conditional probability β_{s,h,w} by combining a dynamic programming algorithm with the second feature information, wherein β_{s,h,w} denotes the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence at position (h, w) of the second two-dimensional spatial feature distribution, and the second two-dimensional spatial feature distribution is the spatial feature distribution of the training sample image; and

calculating the conditional probability of the second path by using the target conditional probability β_{s,h,w}.

8. The method according to claim 7, wherein calculating the target conditional probability by combining the dynamic programming algorithm with the second feature information comprises:

calculating the target conditional probability β_{s,h,w} by using a target formula, the target formula being expressed as:

[formula image not reproduced in this text]

where

[formula image not reproduced in this text]

Ψ_{j,w-1,h} denotes the second path transition probability, that is, the transition probability from feature point (j, w-1) to feature point (h, w) in the second two-dimensional spatial feature distribution; j denotes a height coordinate in the second two-dimensional spatial feature distribution; Y^* and X' denote, respectively, the extended annotated character sequence of the second character sequence and the second two-dimensional spatial feature distribution; s denotes the index of a character in Y^*; h denotes another height coordinate in the second two-dimensional spatial feature distribution; w denotes the width coordinate in the second two-dimensional spatial feature distribution; h ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1], where H denotes the height and W the width of the second two-dimensional spatial feature distribution;

the character probability term of the formula (its symbol is likewise an image that is not reproduced) belongs to the second character distribution probability and denotes the probability that the feature point at position (h, w) belongs to the character Y_s^* of the second character sequence; Ψ_{j,0,h} is calculated from the second initial path probability Ψ_{j,-1,h}.

9. The method according to claim 6, wherein determining the target loss function based on the conditional probability of the second path comprises:

determining the target loss function by using the formula Loss = -ln P(Y|X'), where P(Y|X') is the conditional probability of the second path and Loss is the target loss function.

10. A character recognition apparatus, comprising:

an acquisition unit, configured to acquire an image to be detected;

an extraction unit, configured to extract feature information of the image to be detected through a fully convolutional neural network trained with a two-dimensional CTC model, to obtain first feature information;

wherein the first feature information comprises: a first character distribution probability, a first path transition probability, and a first initial path probability; the first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first character sequence; the first path transition probability represents the path selection probability along the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path; and the first path is a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first character sequence; and

a determination unit, configured to determine the first character sequence in the image to be detected by using the first feature information of the image to be detected.

11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.

12. A computer-readable storage medium storing a computer program, wherein the computer program, when run by a processor, executes the steps of the method according to any one of claims 1 to 9.