Technical Field

The present invention relates to the technical field of image processing, and in particular to a character recognition method and apparatus, an electronic device, and a computer-readable storage medium.
Background

Recognition of text in natural scenes, hereinafter referred to as scene text recognition, is the technology of using computer algorithms to recognize the content of text appearing in natural-scene images. It is widely used in fields such as autonomous driving, assistance for the visually impaired, and identity authentication. Unlike text recognition in scanned documents, text recognition in natural scenes faces greater challenges: complex natural backgrounds, uncertain text orientations and arrangements, and large variations in color. These factors make the recognition accuracy and implementation difficulty of scene text recognition far more demanding than those of scanned-document recognition.

In the prior art, the widely used image-based sequence recognition methods are attention-based models. In these models, a recurrent neural network with an attention mechanism is typically used to generate sequence predictions: at each time step, the attention mechanism focuses on one character region and produces one character prediction. A model based on this framework is essentially a per-frame output algorithm, and the attention mechanism provides an alignment between the feature representation and the sequence prediction. However, such models usually suffer from a serious attention-drift problem: because the output and hidden state of the previous step directly participate in the computation of the next prediction, an erroneous prediction early in the sequence often causes the subsequent attention regions to drift, leading to consecutive recognition errors.
Summary of the Invention

In view of the above, an object of the present invention is to provide a character recognition method and apparatus, an electronic device, and a computer-readable storage medium, so as to alleviate the technical problem of low sequence prediction accuracy caused by attention drift in existing image sequence recognition methods.
In a first aspect, an embodiment of the present invention provides a character recognition method, including: acquiring an image to be detected, and extracting feature information of the image to be detected through a fully convolutional neural network trained with a two-dimensional CTC model to obtain first feature information, where the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability, and a first initial path probability; the first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first character sequence; the first path transition probability represents a path selection probability along the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path, where the first path is a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first character sequence; and determining the first character sequence in the image to be detected by using the first feature information of the image to be detected.
Further, the fully convolutional neural network includes a first convolutional network, a pyramid pooling module, and a second convolutional network.

Further, the first convolutional network is a residual convolutional neural network, the residual convolutional neural network includes a plurality of convolution modules, and some of the plurality of convolution modules contain dilated convolution layers.

Further, extracting the feature information of the image to be detected through the fully convolutional neural network trained with the two-dimensional CTC model to obtain the first feature information includes: performing feature extraction on the image to be detected by using the first convolutional network to obtain first convolutional feature information; performing pooling computation on the first convolutional feature information by using the pyramid pooling module to obtain pooled features of different scales, and concatenating the pooled features of different scales to obtain pooled feature information; and performing convolution computation on the pooled feature information by using the second convolutional network to obtain the first feature information of the image to be detected.
Further, the method also includes: acquiring a training sample image; extracting feature information of the training sample image through an initial fully convolutional neural network to obtain second feature information, where the second feature information includes at least one of the following: a second character distribution probability, a second path transition probability, and a second initial path probability; the second character distribution probability is the probability that each feature point in a second two-dimensional spatial feature distribution of the training sample image belongs to a character in a second character sequence; the second path transition probability represents a path selection probability along the height dimension of the second two-dimensional spatial feature distribution; the second initial path probability represents the probability that each feature point of the second two-dimensional spatial feature distribution is the starting feature point of a second path, where the second path is a valid path predicted in the second two-dimensional spatial feature distribution that can be aligned to the second character sequence; processing the second feature information of the training sample image by using the two-dimensional CTC model to obtain a target loss function; and training the initial fully convolutional neural network with the target loss function to obtain the fully convolutional neural network.

Further, processing the second feature information of the training sample image by using the two-dimensional CTC model to obtain the target loss function includes: processing the second feature information by using the two-dimensional CTC model to obtain a conditional probability of the second path; and determining the target loss function based on the conditional probability of the second path.

Further, computing the second feature information by using the two-dimensional CTC model to obtain the conditional probability of the second path includes: combining a dynamic programming algorithm with the information in the second feature information to compute a target conditional probability β_{s,h,w}, where β_{s,h,w} denotes the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence from position (h, w) of the second two-dimensional spatial feature distribution, and the second two-dimensional spatial feature distribution is the spatial feature distribution of the training sample image; and computing the conditional probability of the second path by using the target conditional probability β_{s,h,w}.
Further, combining the dynamic programming algorithm with the information in the second feature information to compute the target conditional probability includes: computing the target conditional probability β_{s,h,w} by a target formula, the target formula being expressed as

β_{s,h,w} = p(Y*_s | h, w) · Σ_{j=1}^{H} Ψ_{j,w-1,h} · (β_{s,j,w-1} + β_{s-1,j,w-1}), if Y*_s = ∈ or Y*_s = Y*_{s-2};

β_{s,h,w} = p(Y*_s | h, w) · Σ_{j=1}^{H} Ψ_{j,w-1,h} · (β_{s,j,w-1} + β_{s-1,j,w-1} + β_{s-2,j,w-1}), otherwise,

where:

Ψ_{j,w-1,h} denotes the second path transition probability, that is, the transition probability from feature point (j, w-1) of the second two-dimensional spatial feature distribution to feature point (h, w) of the second two-dimensional spatial feature distribution; j denotes a height coordinate in the second two-dimensional spatial feature distribution; Y* and X' denote the expanded labeled character sequence of the second character sequence and the second two-dimensional spatial feature distribution, respectively; s denotes the index of a character in Y*; h denotes another height coordinate in the second two-dimensional spatial feature distribution; w denotes a width coordinate in the second two-dimensional spatial feature distribution; h ∈ [1, 2, …, H], w ∈ [1, 2, …, W-1], where H denotes the height of the second two-dimensional spatial feature distribution and W denotes its width; p(Y*_s | h, w) belongs to the second character distribution probability and denotes the probability that the feature point at position (h, w) is a character of the second character sequence; and Ψ_{j,0,h} is computed from the second initial path probability Ψ_{j,-1,h}.
Further, determining the target loss function based on the conditional probability of the second path includes: determining the target loss function by the formula Loss = -ln P(Y | X'), where P(Y | X') is the conditional probability of the second path and Loss is the target loss function.
In a second aspect, an embodiment of the present invention further provides a character recognition apparatus, including: an acquisition unit configured to acquire an image to be detected; an extraction unit configured to extract feature information of the image to be detected through a fully convolutional neural network trained with a two-dimensional CTC model to obtain first feature information, where the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability, and a first initial path probability; the first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first character sequence; the first path transition probability represents a path selection probability along the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path, where the first path is a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first character sequence; and a determination unit configured to determine the first character sequence in the image to be detected by using the first feature information of the image to be detected.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program, when run by a processor, performs the steps of the method according to any one of the first aspect.
In the embodiments of the present invention, an image to be detected is first acquired, and feature information of the image to be detected is extracted through a fully convolutional neural network trained with a two-dimensional CTC model to obtain first feature information, where the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability, and a first initial path probability; finally, the first character sequence in the image to be detected is determined by using the first feature information of the image to be detected. As described above, in the prior art, sequences in images are recognized with attention models, but such models usually suffer from a serious attention-drift problem, which causes subsequent attention regions to drift and leads to consecutive recognition errors. In the present application, by contrast, the adopted two-dimensional CTC model retains the first feature information of the image during the training of the fully convolutional neural network and predicts the character sequence directly from that first feature information. Retaining the first feature information of the image and using it to predict the character sequence improves the recognition accuracy of the fully convolutional network, thereby alleviating the technical problem of low sequence prediction accuracy caused by attention drift in existing image sequence recognition methods.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the present invention. The objects and other advantages of the present invention are realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.

To make the above objects, features, and advantages of the present invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings

To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

Fig. 2 is a flowchart of a character recognition method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a two-dimensional feature distribution according to an embodiment of the present invention;

Fig. 4 is a sub-distribution diagram within the schematic structural diagram of a two-dimensional feature distribution according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a fully convolutional neural network according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of predicted sequences according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a character recognition apparatus according to an embodiment of the present invention.
Detailed Description of the Embodiments

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1:

First, an example electronic device 100 for implementing the character recognition method of an embodiment of the present invention is described with reference to Fig. 1.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102 and one or more storage devices 104. Optionally, the electronic device may further include an input device 106, an output device 108, and a camera 110, and these components are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are merely exemplary rather than limiting, and the electronic device may have other components and structures as required.
The processor 102 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), and an application-specific integrated circuit (ASIC). The processor 102 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored in the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, and a touch screen.

The output device 108 may output various kinds of information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.

The camera 110 is configured to acquire an image to be detected; after the image acquired by the camera is processed by the character recognition method, the character sequence in the image to be detected is obtained. For example, the camera may capture an image desired by the user (for example, a photo or a video), and the image is then processed by the character recognition method to obtain the character sequence in the image to be detected; the camera may also store the captured image in the storage device 104 for use by other components.

Exemplarily, the example electronic device for implementing the character recognition method according to an embodiment of the present invention may be implemented as a mobile terminal such as a smart phone or a tablet computer.
Embodiment 2:

According to an embodiment of the present invention, an embodiment of a character recognition method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that described herein.

Fig. 2 is a flowchart of a character recognition method according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S202: acquire an image to be detected.

In this embodiment, the image to be detected may be an image captured by the camera 110 of the electronic device described in Embodiment 1, or may be received from another electronic device.
Step S204: extract feature information of the image to be detected through a fully convolutional neural network trained with a two-dimensional CTC model to obtain first feature information.

The first feature information includes at least one of the following: a first character distribution probability, a first path transition probability, and a first initial path probability. The first character distribution probability is the probability that each feature point in a first two-dimensional spatial feature distribution of the image to be detected belongs to a first character sequence; the first path transition probability represents a path selection probability along the height dimension of the first two-dimensional spatial feature distribution; the first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of a first path, where the first path is a path predicted in the first two-dimensional spatial feature distribution that can be aligned to the first character sequence.
As described above, in the prior art, character sequences in images are recognized with attention models. In addition, the inventors considered applying connectionist temporal classification (CTC) to character recognition. However, the CTC model was originally designed for speech recognition; since the speech signal to be recognized is a one-dimensional signal, the processing formulas of the conventional CTC model can only handle one-dimensional signals similar to speech. For image-based character recognition, this creates a conflict between the two-dimensional features of the image and the one-dimensional distribution required by the CTC model, so directly applying the CTC model to character recognition may lose important features and introduce additional noise.

On this basis, in the present application the inventors extend the conventional CTC model and propose a new CTC model (that is, a two-dimensional CTC model) that can process the two-dimensional features of an image, so that these two-dimensional features are preserved and the fully convolutional neural network predicts a more accurate character sequence. The two-dimensional features of an image can be represented as a two-dimensional matrix in which each vector characterizes the feature information of one pixel of the image.
As described above, in the present application, feature extraction may be performed on the image to be detected through the fully convolutional neural network to obtain the first feature information, where the first feature information includes the first character distribution probability, the first path transition probability, and the first initial path probability.

It should be noted that, in this embodiment, the first two-dimensional spatial feature distribution is the feature distribution of the image to be detected and may have the distribution structure shown in Fig. 3. That is, in the present application, the two-dimensional spatial feature distribution of the image to be detected may be a feature distribution structure of height H and width W.

In this embodiment, the first character distribution probability represents the probability that each feature point in the first two-dimensional spatial feature distribution contains a character of the first character sequence; for example, the probability value is set to 1 if it does and to 0 otherwise. The first path transition probability represents a path selection probability along the height dimension of the first two-dimensional spatial feature distribution; it can also be understood as the probability that each feature point of the first two-dimensional spatial feature distribution lies on the first path, where the first path is a predicted path that can be aligned to the first character sequence. The first initial path probability represents the probability that each feature point of the first two-dimensional spatial feature distribution is the starting feature point of the first path; it can also be understood as the value at the leftmost position of the character distribution probability of each feature point in the first two-dimensional spatial feature distribution.
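For concreteness, the three kinds of feature information described above can be pictured as probability tensors. The following sketch shows one possible set of shapes; the sizes H, W and C and the variable names are illustrative assumptions rather than values taken from this application.

```python
import numpy as np

# Illustrative sketch only: assumed tensor shapes for the three outputs described
# above, for a feature map of height H, width W and an alphabet of C characters.
H, W, C = 8, 32, 36            # hypothetical sizes, not taken from this application
num_classes = C + 1            # the characters plus the blank symbol

# Character distribution probability: one distribution over the classes
# at every feature point (normalised over the class axis).
char_prob = np.random.rand(num_classes, H, W)
char_prob /= char_prob.sum(axis=0, keepdims=True)

# Path transition probability: for every source height h at column w, a
# distribution over the destination heights h' at column w+1.
trans_prob = np.random.rand(H, W - 1, H)
trans_prob /= trans_prob.sum(axis=2, keepdims=True)

# Initial path probability: a distribution over the H possible starting heights.
init_prob = np.random.rand(H)
init_prob /= init_prob.sum()
```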
Step S206: determine the first character sequence in the image to be detected by using the first feature information of the image to be detected.

In this embodiment, after the first feature information is determined, the character sequence contained in the image to be detected (that is, the first character sequence) can be determined in combination with the first feature information.

In this embodiment, the conditional probability P(Y | X) may be computed from the first feature information as

P(Y | X) = Σ_{π ∈ A_{X,Y}} Π_{t=1}^{T} p(π_t | X),

where A_{X,Y} is the set of all possible paths of the labeled sequence Y under the predicted distribution X and t ranges over the length T of X. Afterwards, a method such as greedy search or segment search can be used to find the path with the highest probability, and the path with the highest probability is determined as the first character sequence. The greedy search method computes

π* = argmax_{π} Π_{t} p(π_t | X),

where Π_{t} p(π_t | X) is the product of the probabilities of all characters on path π.
It should be noted that, in this embodiment, in the conditional probability above, Π_{t} p(π_t | X) denotes the product of the probabilities of all characters on one path, and the outer sum denotes summing these products over all paths in A_{X,Y}.
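A minimal sketch of the greedy-search decoding just described is given below. It assumes the height dimension has already been reduced to a single best row, so the input is a (num_classes, W) distribution; the function name and this simplification are assumptions made for illustration only.

```python
import numpy as np

def greedy_search(char_prob, blank=0):
    """Greedy-search sketch: pick the most probable class at every width step of
    a (num_classes, W) distribution, take the product of those probabilities as
    the path score, then collapse repeats and drop blanks to get the sequence."""
    best = char_prob.argmax(axis=0)                  # most probable class per step
    score = float(np.prod(char_prob.max(axis=0)))    # product of per-step probabilities
    seq, prev = [], None
    for c in best:
        if c != prev and c != blank:
            seq.append(int(c))
        prev = c
    return seq, score
```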
As described above, in the prior art, sequences in images are recognized with attention models, but such models usually suffer from a serious attention-drift problem, which causes subsequent attention regions to drift and leads to consecutive recognition errors. In the present application, by contrast, the adopted two-dimensional CTC model retains the first feature information of the image during the training of the fully convolutional neural network and predicts the character sequence directly from that first feature information. Retaining the first feature information of the image and using it to predict the character sequence improves the recognition accuracy of the fully convolutional network, thereby alleviating the technical problem of low sequence prediction accuracy caused by attention drift in existing image sequence recognition methods.

Further, the inventors considered combining the CTC model to recognize sequences in images; however, the processing formulas of the conventional CTC model can only handle one-dimensional signals. On this basis, the present application extends the conventional CTC model, and the extended two-dimensional CTC model processes the two-dimensional features of the image so that these features are preserved and the fully convolutional neural network predicts a more accurate character sequence.
As described above, in the present application the feature information of the image to be detected is extracted through a fully convolutional neural network.

In an optional implementation, the fully convolutional neural network includes a first convolutional network, a pyramid pooling module, and a second convolutional network. In this embodiment, the fully convolutional neural network has a pyramid-like structure.
In the present application, the first convolutional network may be a multi-layer residual convolutional neural network, for example a 50-layer residual convolutional neural network. The multi-layer residual convolutional neural network includes a plurality of convolution modules, and some of the plurality of convolution modules contain dilated convolution layers.

It should be noted that, in this embodiment, the multi-layer residual convolutional neural network includes convolution modules of multiple stages, and some of these convolution modules contain dilated convolution layers. Optionally, the dilated convolution layers may be placed in the convolution modules of the last two stages; they may also be placed in the convolution modules of other stages, which is not specifically limited in this embodiment.
Fig. 5 is a schematic structural diagram of an optional fully convolutional neural network. In the fully convolutional neural network shown in Fig. 5, the image to be detected passes in sequence through the first convolutional network (that is, the multi-layer residual convolutional neural network shown in the figure), the pyramid pooling module, and the second convolutional network, finally yielding the feature information of the image to be detected, that is, the first feature information.

As shown in Fig. 5, in this embodiment the first convolutional network is a multi-layer residual convolutional neural network (for example, a 50-layer residual convolutional neural network) containing convolution modules of five stages. It should be noted that, in this embodiment, dilated convolutions may be used in the convolution modules of the fourth and fifth stages to prevent the resolution of the feature representation of the image to be detected from decreasing too quickly. After several stages of convolution modules, the feature representation of the image to be detected has obtained a sufficient receptive field. As in most segmentation models, the fully convolutional neural network uses a pyramid-like structure: after the last convolution layer, the feature representation of the image to be detected is average-pooled to different sizes, the features of different scales are then concatenated, and a unified feature is obtained through a shared convolution operation. From this feature, each of the three different outputs is obtained through one 3x3 convolution layer and one 1x1 convolution layer to produce the final outputs.

It should be noted that, in this embodiment, the second convolutional network may include two convolution layers whose kernels may be chosen as a 3x3 kernel and a 1x1 kernel, respectively; kernels of other sizes may also be chosen, which is not specifically limited in this embodiment.
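The following is a minimal sketch, in PyTorch, of the pyramid-style network just described, assuming a torchvision ResNet-50 backbone with dilation in its last two stages. Channel counts, pooling sizes and the head layout are assumptions made for illustration; only the character-distribution head is written out, and the transition and initial-path outputs would be produced by analogous 3x3 + 1x1 heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class PyramidPooling(nn.Module):
    """Average-pool the backbone feature to several sizes, project, upsample
    back to the input resolution and concatenate (cascade) with the input."""
    def __init__(self, in_ch, sizes=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(s),
                          nn.Conv2d(in_ch, in_ch // len(sizes), kernel_size=1))
            for s in sizes])

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode='bilinear',
                                align_corners=False) for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)

class TwoDCTCRecognizer(nn.Module):
    """Sketch of the fully convolutional recognizer: a ResNet-50 backbone with
    dilated convolutions replacing the strides of its last two stages, a pyramid
    pooling module, and a 3x3 + 1x1 head producing the character distribution."""
    def __init__(self, num_chars):
        super().__init__()
        backbone = resnet50(replace_stride_with_dilation=[False, True, True])
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool/fc
        self.ppm = PyramidPooling(2048)
        self.char_head = nn.Sequential(
            nn.Conv2d(4096, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_chars + 1, kernel_size=1))   # +1 for the blank symbol
        # the transition and initial-path heads (not shown) would follow the same
        # 3x3 + 1x1 pattern on top of the pooled feature

    def forward(self, img):
        feat = self.ppm(self.backbone(img))
        return self.char_head(feat).softmax(dim=1)   # (N, num_chars+1, H, W)
```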
On this basis, in this embodiment, step S204 of extracting the feature information of the image to be detected through the fully convolutional neural network trained with the two-dimensional CTC model to obtain the first feature information includes the following steps:

Step S2041: perform feature extraction on the image to be detected by using the first convolutional network to obtain first convolutional feature information;

Step S2042: perform pooling computation on the first convolutional feature information by using the pyramid pooling module to obtain pooled features of different scales, and concatenate the pooled features of different scales to obtain pooled feature information;

Step S2043: perform convolution computation on the pooled feature information by using the second convolutional network to obtain the first feature information of the image to be detected.
Specifically, in this embodiment, the 50-layer residual convolutional neural network in the fully convolutional neural network shown in Fig. 5 may be used to perform feature extraction on the image to be detected to obtain the first convolutional feature information. Since dilated convolutions are used in the fourth and fifth stages of the 50-layer residual convolutional neural network, they prevent the resolution of the feature representation of the image to be detected from decreasing too quickly, so that the feature representation obtains a sufficient receptive field.

After the first convolutional feature information is obtained with the 50-layer residual convolutional neural network, the pyramid pooling module may be used to perform pooling computation on it; the resulting pooled features are multi-scale features. After the multi-scale pooled features are obtained, the pooled features of the different scales are concatenated to obtain the pooled feature information.

After the pooled feature information is obtained, the second convolutional network may be used to perform convolution computation on it to obtain the first feature information of the image to be detected. If the second convolutional network includes two convolution layers (that is, a 3x3 convolution layer and a 1x1 convolution layer), the 3x3 convolution layer and the 1x1 convolution layer may be applied to the pooled feature information in sequence to obtain the first feature information of the image to be detected.
In this embodiment, before the feature information of the image to be detected is extracted through the fully convolutional neural network trained with the two-dimensional CTC model, the two-dimensional CTC model may also be used to train an initial fully convolutional neural network to obtain the fully convolutional neural network described in step S204.

Before the training process of the initial fully convolutional neural network is introduced, the conventional one-dimensional CTC is first introduced. The conventional one-dimensional CTC model introduces the symbol "∈" to describe blanks in a sequence, and aligns the predicted sequence with the labeled sequence by filling blanks and repetitions in both. Here the labeled sequence is the character sequence annotated in an image, and the predicted sequence is a sequence predicted for that image which may correspond to the character sequence. In the sequences shown in Fig. 6, each row is a predicted sequence; in these predicted sequences, the symbol "□" denotes "∈", which will not be explained again in the subsequent embodiments. As shown in Fig. 6, the predicted sequences in rows 1, 3, and 4 can be correctly aligned to the target sequence "FREE", whereas the predicted sequence in row 2 cannot. For a given position i in a predicted sequence, i can be skipped if and only if the prediction at i is ∈ or is the same as the prediction at the previous position. For example, for the first predicted sequence "F□RE□EEE" in Fig. 6, suppose i is the second character "□"; since that character is "□", that is, ∈, the second character can be skipped during alignment. As another example, suppose i is the seventh character "E" of the predicted sequence "F□RE□EEE"; since the seventh character is the same as the sixth character, both being "E", the seventh character can be skipped. Likewise, the eighth character is the same as the seventh and can also be skipped. Finally, the predicted sequence "F□RE□EEE" aligns to "FREE". After all skippable positions are removed from the prediction, the aligned predicted sequence is obtained.
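The skipping rule just described can be written as a small function; the sketch below reproduces the first row of Fig. 6 and is for illustration only (the symbol '□' stands for ∈, and the second example is an additional alignment of the same target, not taken from the figure).

```python
def collapse(pred, blank='□'):
    """Apply the alignment rule described above: a position is skipped if and
    only if it is the blank or repeats the previous prediction."""
    out, prev = [], None
    for ch in pred:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return ''.join(out)

print(collapse('F□RE□EEE'))    # -> FREE  (row 1 of Fig. 6 aligns to the target)
print(collapse('□FF□R□EE□E'))  # -> FREE  (another valid alignment of the target)
```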
As described above, the CTC model measures the similarity between the labeled sequence and the predicted sequence by computing the conditional probability of the label under the predicted distribution. By definition, this conditional probability is

P(Y | X) = Σ_{π ∈ A_{X,Y}} Π_{t=1}^{T} p(π_t | X),

where Y and X are the labeled sequence and the predicted distribution, respectively, A_{X,Y} is the set of all possible paths of the labeled sequence Y under the predicted distribution X, and t ranges over the length T of X. Because the number of possible paths is enormous, exhaustively computing and summing the probabilities of all paths is very inefficient; therefore, in the embodiments of the present application, dynamic programming can be used to solve this kind of problem.
First, since it is equivalent whether or not ∈ appears before and after each symbol of the target sequence, the target sequence Y is expanded as follows to make the description clearer: Y* = [∈, y_1, ∈, y_2, ∈, …, y_L, ∈]. Here Y* is the expanded target sequence; that is, one ∈ is inserted before and after each symbol, so that the original target sequence Y of length L is expanded into Y* of length 2L+1.

For a given s ∈ [1, 2, …, 2L+1], let Y*[1:s] be the first s characters of Y*, and define α_{s,t} as the probability of Y*[1:s] at time t, which is the sum of the probabilities of all possible sub-paths that reach the s-th position of the sequence Y* at time t.
Therefore, for the case in which the (s-1)-th symbol cannot be skipped, that is, Y*_s = ∈ or Y*_s = Y*_{s-2}, α_{s,t} satisfies

α_{s,t} = (α_{s-1,t-1} + α_{s,t-1}) · p(Y*_s | t).

For the remaining cases, in which the (s-1)-th symbol can be skipped, that is, Y*_s ≠ ∈ and Y*_s ≠ Y*_{s-2}, α_{s,t} can be computed by

α_{s,t} = (α_{s-2,t-1} + α_{s-1,t-1} + α_{s,t-1}) · p(Y*_s | t),

where Y*_s denotes the s-th character of the expanded target sequence, Y*_{s-2} denotes the (s-2)-th character of the expanded target sequence, and p(Y*_s | t) denotes the probability of the character Y*_s at time step t of the predicted distribution.

In summary, the dynamic programming state transition equation of the CTC model can be expressed as

α_{s,t} = (α_{s-1,t-1} + α_{s,t-1}) · p(Y*_s | t), if Y*_s = ∈ or Y*_s = Y*_{s-2};

α_{s,t} = (α_{s-2,t-1} + α_{s-1,t-1} + α_{s,t-1}) · p(Y*_s | t), otherwise.
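For illustration only, the recursion above can be implemented directly. The sketch below assumes a per-step distribution of shape (T, num_classes) with the blank at index 0 and makes no claim about the exact implementation used in this application.

```python
import numpy as np

EPS = 0   # index used here for the blank symbol "∈"

def expand(labels):
    """Insert a blank before and after every symbol: Y -> Y* of length 2L+1."""
    ystar = [EPS]
    for y in labels:
        ystar += [y, EPS]
    return ystar

def ctc_forward(probs, labels):
    """Sketch of the one-dimensional dynamic programme described above.
    probs  : (T, num_classes) per-step character distribution,
    labels : target sequence without blanks.
    Returns P(Y|X), summed over the two valid end states of alpha."""
    T = probs.shape[0]
    ystar = expand(labels)
    S = len(ystar)
    alpha = np.zeros((S, T))
    alpha[0, 0] = probs[0, ystar[0]]   # a path may start with the leading blank ...
    alpha[1, 0] = probs[0, ystar[1]]   # ... or directly with the first character
    for t in range(1, T):
        for s in range(S):
            a = alpha[s, t - 1] + (alpha[s - 1, t - 1] if s >= 1 else 0.0)
            # the preceding blank may additionally be skipped only when Y*_s is
            # not blank and differs from Y*_{s-2}
            if s >= 2 and ystar[s] != EPS and ystar[s] != ystar[s - 2]:
                a += alpha[s - 2, t - 1]
            alpha[s, t] = a * probs[t, ystar[s]]
    return alpha[S - 1, T - 1] + alpha[S - 2, T - 1]
```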
Based on the conventional one-dimensional CTC model, the embodiments of the present application extend the one-dimensional CTC model along the height dimension. Similarly, for a given two-dimensional distribution X' whose height and width are H and W respectively, a path transition probability ψ ∈ R^{H×(W-1)×H} is defined. The path transition probability ψ_{h,w,h'} denotes the probability of transitioning from position (h, w) of the predicted distribution to position (h', w+1), where h, h' ∈ [1, 2, …, H] and w ∈ [1, 2, …, W-1].

Take the two-dimensional spatial feature distribution shown in Fig. 3 as an example. Fig. 3 shows a spatial feature distribution of size Q*H*W; any one of its H*W sub-distributions is the sub-distribution shown in Fig. 4. Suppose coordinate (h, w) is the position marked "1" in Fig. 4; then coordinates (h', w+1) are the positions marked "2", "3", "4", and "5" in Fig. 4.

It follows directly that

Σ_{h'=1}^{H} ψ_{h,w,h'} = 1,

that is, the path transition probabilities from one position of the predicted distribution to all heights of the next column sum to 1. Accordingly, as can be seen from Fig. 4, the sum of the path transition probabilities from the position marked "1" to the positions marked "2", "3", "4", and "5" is 1.
Similarly to the one-dimensional CTC, the target sequence is expanded in the same way to obtain the expanded target sequence Y*. A similar derivation then gives the state transition equation of the two-dimensional CTC model:

β_{s,h,w} = p(Y*_s | h, w) · Σ_{j=1}^{H} Ψ_{j,w-1,h} · (β_{s,j,w-1} + β_{s-1,j,w-1}), if Y*_s = ∈ or Y*_s = Y*_{s-2};

β_{s,h,w} = p(Y*_s | h, w) · Σ_{j=1}^{H} Ψ_{j,w-1,h} · (β_{s,j,w-1} + β_{s-1,j,w-1} + β_{s-2,j,w-1}), otherwise.

Specifically, Ψ_{j,w-1,h} denotes the second path transition probability, that is, the transition probability from feature point (j, w-1) of the second two-dimensional spatial feature distribution to feature point (h, w) of the second two-dimensional spatial feature distribution; j denotes a height coordinate in the second two-dimensional spatial feature distribution; Y* and X' denote the expanded labeled character sequence of the second character sequence and the second two-dimensional spatial feature distribution, respectively; s denotes the index of a character in Y*; h denotes another height coordinate in the second two-dimensional spatial feature distribution; w denotes a width coordinate in the second two-dimensional spatial feature distribution; h ∈ [1, 2, …, H], w ∈ [1, 2, …, W-1], where H denotes the height of the second two-dimensional spatial feature distribution and W denotes its width; p(Y*_s | h, w) belongs to the second character distribution probability and denotes the probability that the feature point at position (h, w) is the character Y*_s of the second character sequence; and β_{s,h,w} denotes the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence from position (h, w) of the second two-dimensional spatial feature distribution.

Finally, since the two-dimensional CTC model has H points along the height dimension of the two-dimensional spatial feature distribution that may serve as starting points, the initial state of β is defined from a vector Γ ∈ R^H, the initial path probability, where Γ_h is the probability that a path starts at height h and R^H denotes an H-dimensional vector over the real number field.
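A rough sketch of the two-dimensional forward computation, as read from the state transition equation above, is given below. The 0-based indexing, the way the starting column is initialised from Γ, and the function name are assumptions made purely for illustration.

```python
import numpy as np

EPS = 0   # index used here for the blank symbol "∈"

def ctc2d_forward(char_prob, trans_prob, init_prob, labels):
    """Sketch of the two-dimensional forward recursion described above.
    char_prob : (num_classes, H, W) character distribution probability,
    trans_prob: (H, W-1, H) path transition probability,
    init_prob : (H,) initial path probability over starting heights,
    labels    : target sequence without blanks."""
    _, H, W = char_prob.shape
    ystar = [EPS]
    for y in labels:
        ystar += [y, EPS]
    S = len(ystar)
    beta = np.zeros((S, H, W))
    for h in range(H):                 # any of the H heights may start a path
        beta[0, h, 0] = init_prob[h] * char_prob[ystar[0], h, 0]
        beta[1, h, 0] = init_prob[h] * char_prob[ystar[1], h, 0]
    for w in range(1, W):
        for s in range(S):
            skip_ok = s >= 2 and ystar[s] != EPS and ystar[s] != ystar[s - 2]
            for h in range(H):
                acc = 0.0
                for j in range(H):     # sum over the heights of the previous column
                    prev = beta[s, j, w - 1] + (beta[s - 1, j, w - 1] if s >= 1 else 0.0)
                    if skip_ok:
                        prev += beta[s - 2, j, w - 1]
                    acc += trans_prob[j, w - 1, h] * prev
                beta[s, h, w] = acc * char_prob[ystar[s], h, w]
    # conditional probability: sum over the heights of the last column, ending on
    # the final blank or on the last character
    return float((beta[S - 1, :, W - 1] + beta[S - 2, :, W - 1]).sum())
```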
Through the above formulas, the two-dimensional CTC model can train the initial fully convolutional neural network end-to-end from sequence labels. In the testing stage, the path with the highest probability can be found in a manner similar to the one-dimensional CTC model, that is, through a greedy algorithm or segment search, where finding the path with the highest probability is the process of finding the second character sequence.

Based on the above, in this embodiment the process of training the initial fully convolutional neural network is described as follows:
Step S301: acquire a training sample image;

Step S302: extract feature information of the training sample image through the initial fully convolutional neural network to obtain second feature information;

Step S303: process the second feature information of the training sample image by using the two-dimensional CTC model to obtain a target loss function;

Step S304: train the initial fully convolutional neural network with the target loss function to obtain the fully convolutional neural network.
Specifically, in this embodiment, when the initial fully convolutional neural network is trained, a training sample image is first acquired, and then the feature information of the training sample image is extracted through the initial fully convolutional neural network to obtain the second feature information. The second feature information likewise includes the second character distribution probability, the second path transition probability, and the second initial path probability.

The second character distribution probability is the probability that each feature point in the second two-dimensional spatial feature distribution of the training sample image belongs to a character in the second character sequence; the second path transition probability represents a path selection probability along the height dimension of the second two-dimensional spatial feature distribution; the second initial path probability represents the probability that each feature point of the second two-dimensional spatial feature distribution is the starting feature point of the second path, where the second path is a valid path predicted in the second two-dimensional spatial feature distribution that can be aligned to the second character sequence.

After the second feature information is obtained in the manner described above, the two-dimensional CTC model can be used to compute on the second feature information of the training sample image to obtain the target loss function. The initial fully convolutional neural network can then be trained with the target loss function to obtain the fully convolutional neural network described in step S204.
In an optional implementation, the second feature information of the training sample image may be processed with the two-dimensional CTC model to obtain the target loss function through the following steps.

First, the second feature information is processed with the two-dimensional CTC model to obtain the conditional probability of the second path, where the second path is a valid path, predicted by the initial fully convolutional neural network in the two-dimensional spatial feature distribution of the training sample image, that can be aligned to the second character sequence in the training sample image.

When computing the conditional probability of the second path, the dynamic programming algorithm may first be combined with the information in the second feature information to compute the target conditional probability β_{s,h,w}, where β_{s,h,w} denotes the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence from position (h, w) of the second two-dimensional spatial feature distribution, and the second two-dimensional spatial feature distribution is the spatial feature distribution of the training sample image.
具体地,计算目标条件概率β_{s,h,w}的过程可以描述为:利用目标公式计算所述目标条件概率β_{s,h,w},所述目标公式表示为:
$$\beta_{s,h,w}=p_{h,w}\!\left(Y^{*}_{s}\right)\sum_{j=1}^{H}\Psi_{j,w-1,h}\left(\beta_{s,j,w-1}+\beta_{s-1,j,w-1}+\lambda_{s}\,\beta_{s-2,j,w-1}\right)$$
其中,当Y*_s为空白符或Y*_s=Y*_{s-2}时λ_s=0,否则λ_s=1。在得到目标条件概率β_{s,h,w}之后,就可以利用所述目标条件概率β_{s,h,w}计算所述第二路径的条件概率。Specifically, the process of calculating the target conditional probability β_{s,h,w} can be described as: the target conditional probability β_{s,h,w} is calculated by using a target formula, the target formula being the recursion above, in which λ_s = 0 when Y*_s is the blank symbol or Y*_s = Y*_{s-2}, and λ_s = 1 otherwise. After the target conditional probability β_{s,h,w} is obtained, the conditional probability of the second path can be calculated by using the target conditional probability β_{s,h,w}.
其中,in,
Ψ_{j,w-1,h}表示第二路径转移概率,表示从所述第二二维空间特征分布中的特征点(j,w-1)到所述第二二维空间特征分布中的特征点(h,w)的转移概率,j表示所述第二二维空间特征分布中的一个高度序号,Y*和X'分别表示所述第二文字序列扩展后的标注文字序列和所述第二二维空间特征分布,s表示Y*中字符的序号,h表示所述第二二维空间特征分布中的另一个高度坐标,w表示所述第二二维空间特征分布中的宽度坐标;h∈[1,2,…,H],w∈[1,2,…,W-1],H表示所述第二二维空间特征分布中的高度信息,W表示所述第二二维空间特征分布中的宽度信息;p_{h,w}(Y*_s)属于所述第二字符分布概率,表示在位置(h,w)处的特征点属于第二文字序列中的字符的概率;Ψ_{j,0,h}是根据所述第二初始路径概率Ψ_{j,-1,h}计算得到的。Here, Ψ_{j,w-1,h} denotes the second path transition probability, that is, the transition probability from the feature point (j, w-1) in the second two-dimensional spatial feature distribution to the feature point (h, w) in the second two-dimensional spatial feature distribution; j denotes a height index in the second two-dimensional spatial feature distribution; Y* and X' respectively denote the annotated text sequence obtained by expanding the second character sequence and the second two-dimensional spatial feature distribution; s denotes the index of a character in Y*; h denotes another height coordinate in the second two-dimensional spatial feature distribution, and w denotes a width coordinate in the second two-dimensional spatial feature distribution; h∈[1,2,…,H], w∈[1,2,…,W-1], where H denotes the height information and W denotes the width information of the second two-dimensional spatial feature distribution; p_{h,w}(Y*_s) belongs to the second character distribution probability and denotes the probability that the feature point at position (h, w) corresponds to a character in the second character sequence; Ψ_{j,0,h} is calculated from the second initial path probability Ψ_{j,-1,h}.
需要说明的是,在本实施例中,可以根据如下公式计算第二路径的条件概率:
$$P\!\left(Y\mid X'\right)=\sum_{h=1}^{H}\left(\beta_{2L+1,\,h,\,W-1}+\beta_{2L,\,h,\,W-1}\right)$$
其中,L=|Y|,Y为Y*扩展之前用于表征目标序列的向量,L为向量Y的长度,即目标序列中字符的个数,扩展后的标注文字序列Y*的长度为2L+1。It should be noted that, in this embodiment, the conditional probability of the second path can be calculated according to the above formula, where L=|Y|, Y is the vector used to represent the target sequence before the expansion into Y*, and L is the length of the vector Y, that is, the number of characters in the target sequence; the expanded annotated text sequence Y* has length 2L+1.
在按照上述所描述的方式得到第二路径的条件概率P(Y/X')之后,就可以按照如下公式计算目标损失函数。After obtaining the conditional probability P(Y/X') of the second path in the manner described above, the target loss function can be calculated according to the following formula.
然后,基于所述第二路径的条件概率确定所述目标损失函数Loss。其中,该公式为:Loss=-lnP(Y/X')。P(Y/X')为所述第二路径的条件概率,Loss为所述目标损失函数。Then, the target loss function Loss is determined based on the conditional probability of the second path, where the formula is: Loss = -ln P(Y/X'); P(Y/X') is the conditional probability of the second path, and Loss is the target loss function.
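To make the dynamic-programming recursion and the loss above concrete, the following is a minimal PyTorch-style sketch. It assumes the standard CTC label expansion (a blank inserted before, between and after the characters, so that |Y*| = 2L + 1) and the usual skip rule between different non-blank characters; the function name, tensor layouts and the absence of log-space stabilisation are simplifications for illustration, not the prescribed implementation of this embodiment.

```python
# Illustrative dynamic-programming forward pass for the two-dimensional CTC loss.
# Assumptions: standard CTC blank expansion and skip rule; inputs are probabilities, not logits;
# a practical implementation would work in log space for numerical stability.
import torch

def two_dim_ctc_loss(char_probs, trans_probs, start_probs, label, blank=0):
    """char_probs:  (H, W, C) tensor, p_{h,w}(c), the character distribution probability
       trans_probs: (W, H, H) tensor, trans_probs[w, j, h] ~ Psi_{j, w-1, h}, height transitions
       start_probs: (H,) tensor, initial path probability for each height
       label:       list[int], target character indices without blanks (the sequence Y)"""
    H, W, _ = char_probs.shape
    y_ext = [blank]
    for c in label:                              # Y* = blank, y_1, blank, y_2, ..., y_L, blank
        y_ext += [c, blank]
    S = len(y_ext)                               # S = 2L + 1

    # First column (w = 0): a path may start on the leading blank or on the first character
    rows = []
    for s in range(S):
        if s <= 1:
            rows.append(start_probs * char_probs[:, 0, y_ext[s]])
        else:
            rows.append(char_probs.new_zeros(H))
    beta = torch.stack(rows)                     # beta[s, h] corresponds to beta_{s, h, w}

    # Recursion over the width dimension, summing over all source heights j
    for w in range(1, W):
        rows = []
        for s in range(S):
            prev = beta[s]
            if s >= 1:
                prev = prev + beta[s - 1]
            if s >= 2 and y_ext[s] != blank and y_ext[s] != y_ext[s - 2]:
                prev = prev + beta[s - 2]        # skip allowed between different non-blank chars
            # sum_j Psi_{j, w-1, h} * prev_j, then multiply by p_{h,w}(Y*_s)
            rows.append(char_probs[:, w, y_ext[s]] * (prev @ trans_probs[w]))
        beta = torch.stack(rows)

    # Valid paths end on the trailing blank or on the last character of Y*
    p = beta[S - 1].sum()
    if S > 1:
        p = p + beta[S - 2].sum()
    return -torch.log(p + 1e-12)                 # Loss = -ln P(Y | X')
```

Compared with enumerating every path, whose number grows exponentially with the width, this recursion visits each (s, h, w) cell once, so its cost is on the order of S·H²·W operations, which is what makes the two-dimensional CTC loss cheap enough to use during training.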
通过上述描述可知,在本实施例中,结合了CTC模型来实现图像中序列的识别,同时,为了解决现有的传统CTC模型的限制,发明人还拓展了传统的CTC模型,提出新的二维CTC模型以直接从二维概率分布计算目标序列的条件概率。更具体地说,在传统CTC模型的基础上,本申请所提供的方法在搜索路径中除时间维之外加入了高度维,路径搜索可以在不同高度之间进行。搜索路径在不同高度上的选择依然可以指向同一个目标序列,同样地,所有路径的条件概率之和为目标序列的条件概率。It can be seen from the above description that, in this embodiment, the CTC model is combined to realize sequence recognition in images. At the same time, in order to overcome the limitations of the existing traditional CTC model, the inventors extended the traditional CTC model and proposed a new two-dimensional CTC model that computes the conditional probability of the target sequence directly from a two-dimensional probability distribution. More specifically, on the basis of the traditional CTC model, the method provided in this application adds a height dimension to the search path in addition to the time dimension, so that the path search can move between different heights. Path choices at different heights can still point to the same target sequence, and likewise the sum of the conditional probabilities of all such paths is the conditional probability of the target sequence.
通过将传统一维CTC模型拓展到二维,基于图像的序列识别可以保留图像的二维特征,从二维分布直接计算和标注的相似度,从而大幅提高识别准确率。此外,由于二维信息的存在,这种扩展还提供了处理曲形、偏转和透视变形文字的能力。本申请中二维CTC模型的提出给文字识别方法带来了新的角度,以更加自然的方式处理基于图像的序列识别问题,使得该问题中保留图像的二维分布成为可能。By extending the traditional one-dimensional CTC model to two dimensions, image-based sequence recognition can preserve the two-dimensional features of the image and compute the similarity to the annotation directly from the two-dimensional distribution, thereby greatly improving recognition accuracy. In addition, because the two-dimensional information is retained, this extension also provides the ability to handle curved, slanted and perspective-distorted text. The two-dimensional CTC model proposed in this application brings a new perspective to text recognition methods, handling the image-based sequence recognition problem in a more natural way and making it possible to preserve the two-dimensional distribution of the image in this problem.
此外,对于CTC概率的计算过程,简单地计算所有路径的概率再求和的计算方式计算代价非常大,本发明提出了一种动态规划算法,大幅降低了计算二维条件概率的计算复杂度,使得在识别网络中使用二维CTC的计算代价可以几乎不计。In addition, for the calculation process of CTC probability, the calculation method of simply calculating the probability of all paths and then summing is very expensive. The present invention proposes a dynamic programming algorithm, which greatly reduces the computational complexity of calculating two-dimensional conditional probability. The calculation cost of using two-dimensional CTC in the recognition network can be almost negligible.
实施例3:Example 3:
本发明实施例还提供了一种文字识别装置,该文字识别装置主要用于执行本发明实施例上述内容所提供的文字识别方法,以下对本发明实施例提供的文字识别装置做具体介绍。The embodiment of the present invention also provides a character recognition device, the character recognition device is mainly used to implement the character recognition method provided in the above-mentioned content of the embodiment of the present invention, and the character recognition device provided by the embodiment of the present invention will be described in detail below.
图7是根据本发明实施例的一种文字识别装置的示意图,如图7所示,该文字识别装置主要包括获取单元10、提取单元20和确定单元30,其中:Fig. 7 is a schematic diagram of a character recognition device according to an embodiment of the present invention. As shown in Fig. 7, the character recognition device mainly includes an acquisition unit 10, an extraction unit 20 and a determination unit 30, wherein:
获取单元10,用于获取待检测图像;An acquisition unit 10, configured to acquire an image to be detected;
提取单元20,用于通过采用二维CTC模型训练之后的全卷积神经网络提取所述待检测图像的特征信息,得到第一特征信息;The extraction unit 20 is used to extract the feature information of the image to be detected by using the fully convolutional neural network trained by the two-dimensional CTC model to obtain the first feature information;
其中,所述第一特征信息包括以下至少之一:第一字符分布概率、第一路径转移概率和第一初始路径概率;所述第一字符分布概率为所述待检测图像的第一二维空间特征分布中各个特征点属于第一文字序列的概率,所述第一路径转移概率表示在第一二维空间特征分布中高度维度上的路径选择概率;所述第一初始路径概率表示第一二维空间特征分布的各个特征点为第一路径上的起始特征点的概率,所述第一路径为在第一二维空间特征分布中预测出的能够对齐到第一文字序列的路径;Wherein, the first feature information includes at least one of the following: a first character distribution probability, a first path transition probability, and a first initial path probability; the first character distribution probability is the first two-dimensional The probability that each feature point in the spatial feature distribution belongs to the first character sequence, the first path transition probability represents the path selection probability on the height dimension in the first two-dimensional spatial feature distribution; the first initial path probability represents the first two Each feature point of the three-dimensional space feature distribution is the probability of the initial feature point on the first path, and the first path is a path that can be aligned to the first character sequence predicted in the first two-dimensional space feature distribution;
确定单元30,用于利用所述待检测图像的第一特征信息确定所述待检测图像中的所述第一文字序列。The determining unit 30 is configured to determine the first character sequence in the image to be detected by using the first characteristic information of the image to be detected.
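Purely to illustrate how the three units could cooperate at inference time, the following is a hypothetical sketch; the class, parameter and method names (TextRecognizer, decode_fn, recognize) are illustrative assumptions and are not part of this embodiment.

```python
# Hypothetical composition of the acquisition, extraction and determination units (names assumed).
class TextRecognizer:
    def __init__(self, trained_fcn, decode_fn):
        self.fcn = trained_fcn        # extraction unit 20: FCN trained with the 2D-CTC loss
        self.decode = decode_fn       # determination unit 30: maps probability maps to a string

    def recognize(self, image):
        # The acquisition unit 10 would supply `image` (e.g. a (1, 3, H0, W0) tensor).
        char_probs, trans_probs, start_probs = self.fcn(image)    # first feature information
        return self.decode(char_probs, trans_probs, start_probs)  # first character sequence
```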
在本发明实施例中,首先,获取待检测图像,并通过采用二维CTC模型训练之后的全卷积神经网络提取待检测图像的特征信息,得到第一特征信息,其中,第一特征信息包括以下至少之一:字符分布概率、路径转移概率和初始路径概率;最后,利用待检测图像的第一特征信息确定待检测图像中的所述第一文字序列。通过上述描述可知,本申请采用二维CTC模型对全卷积神经网络进行训练,并利用训练之后的全卷积神经网络对待检测图像进行序列识别的方式,能够提高全卷积网络的识别精度,进而缓解了现有的图像序列识别方法由于出现注意力偏移导致的序列预测准确度低的技术问题。In the embodiment of the present invention, firstly, the image to be detected is obtained, and the feature information of the image to be detected is extracted by using the fully convolutional neural network trained by the two-dimensional CTC model to obtain the first feature information, wherein the first feature information includes At least one of the following: character distribution probability, path transition probability and initial path probability; finally, using the first feature information of the image to be detected to determine the first character sequence in the image to be detected. It can be seen from the above description that the present application adopts the two-dimensional CTC model to train the fully convolutional neural network, and uses the trained fully convolutional neural network to perform sequence recognition on the image to be detected, which can improve the recognition accuracy of the fully convolutional network. Furthermore, the technical problem of low sequence prediction accuracy caused by attention shift in existing image sequence recognition methods is alleviated.
可选地,所述全卷积神经网络包括:第一卷积网络、金字塔池化模块和第二卷积网络。Optionally, the fully convolutional neural network includes: a first convolutional network, a pyramid pooling module, and a second convolutional network.
可选地,所述第一卷积网络为残差卷积神经网络,所述残差卷积神经网络中包括多个卷积模块,且所述多个卷积模块中的部分卷积模块包含空洞卷积层。Optionally, the first convolutional network is a residual convolutional neural network, the residual convolutional neural network includes a plurality of convolution modules, and some convolution modules in the plurality of convolution modules include Dilated convolutional layers.
可选地,提取单元20用于:利用所述第一卷积网络对所述待检测图像进行特征提取,得到第一卷积特征信息;利用所述金字塔池化模块对所述第一卷积特征信息进行池化计算,得到不同尺度的池化特征,并对所述不同尺度的池化特征进行级联处理,得到池化特征信息;利用所述第二卷积网络对所述池化特征信息进行卷积计算,得到所述待检测图像的第一特征信息。Optionally, the extraction unit 20 is configured to: perform feature extraction on the image to be detected by using the first convolutional network to obtain first convolutional feature information; perform pooling calculation on the first convolutional feature information by using the pyramid pooling module to obtain pooled features at different scales, and cascade the pooled features at different scales to obtain pooled feature information; and perform convolution calculation on the pooled feature information by using the second convolutional network to obtain the first feature information of the image to be detected.
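A hedged PyTorch sketch of this backbone is given below. The channel widths, pooling scales, number of residual blocks and output heads are assumptions chosen only to show the overall wiring (residual blocks with dilated convolutions, pyramid pooling with cascading, then prediction convolutions); they are not the parameters of this embodiment, and the per-sample outputs would need to be permuted to the layouts used by the loss sketch above.

```python
# Illustrative backbone: residual/dilated convolutions -> pyramid pooling -> prediction heads.
# All sizes and scales below are assumed for the sketch, not specified by this embodiment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, ch, dilation=1):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class PyramidPooling(nn.Module):
    """Pools the feature map at several scales, upsamples and concatenates (cascades) the results."""
    def __init__(self, in_ch, scales=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(s), nn.Conv2d(in_ch, in_ch // len(scales), 1))
            for s in scales])
        self.out_ch = in_ch + (in_ch // len(scales)) * len(scales)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
                  for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)     # cascade original and pooled features

class TextRecognitionFCN(nn.Module):
    def __init__(self, in_ch=3, feat_ch=128, num_classes=37, out_height=8):
        super().__init__()
        # First convolutional network: stem + residual blocks, some with dilated convolutions
        self.first_conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            ResidualBlock(feat_ch), ResidualBlock(feat_ch, dilation=2))
        self.ppm = PyramidPooling(feat_ch)
        # Second convolutional network: 1x1 convolutions predicting the three probability maps
        self.char_head = nn.Conv2d(self.ppm.out_ch, num_classes, 1)  # character distribution
        self.trans_head = nn.Conv2d(self.ppm.out_ch, out_height, 1)  # height-transition scores
        self.start_head = nn.Conv2d(self.ppm.out_ch, 1, 1)           # initial path scores

    def forward(self, x):
        feats = self.ppm(self.first_conv(x))
        char_probs = self.char_head(feats).softmax(dim=1)    # (N, C, H, W)
        trans_probs = self.trans_head(feats).softmax(dim=1)  # (N, H_src, H, W); H_src assumed = H
        start_probs = torch.sigmoid(self.start_head(feats))  # (N, 1, H, W)
        return char_probs, trans_probs, start_probs
```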
可选地,所述装置还用于:获取训练样本图像;通过初始全卷积神经网络提取所述训练样本图像的特征信息,得到第二特征信息;所述第二特征信息包括以下至少之一:第二字符分布概率、第二路径转移概率和第二初始路径概率,所述第二字符分布概率为所述训练样本图像的第二二维空间特征分布中各个特征点属于第二文字序列中的字符的概率,所述第二路径转移概率表示在第二二维空间特征分布中高度维度上的路径选择概率;所述第二初始路径概率表示第二二维空间特征分布的各个特征点为第二路径上的起始特征点的概率,所述第二路径为在第二二维空间特征分布中预测出的能够对齐到第二文字序列的有效路径;利用所述二维CTC模型对所述训练样本图像的第二特征信息进行处理,得到目标损失函数;通过所述目标损失函数训练所述初始全卷积神经网络,得到所述全卷积神经网络。Optionally, the device is further configured to: acquire a training sample image; extract feature information of the training sample image through an initial fully convolutional neural network to obtain second feature information; the second feature information includes at least one of the following : the second character distribution probability, the second path transition probability and the second initial path probability, the second character distribution probability is that each feature point in the second two-dimensional space feature distribution of the training sample image belongs to the second character sequence The probability of the characters, the second path transition probability represents the path selection probability on the height dimension in the second two-dimensional space feature distribution; the second initial path probability represents each feature point of the second two-dimensional space feature distribution as The probability of the starting feature point on the second path, the second path is an effective path that can be aligned to the second character sequence predicted in the second two-dimensional spatial feature distribution; use the two-dimensional CTC model to the Processing the second feature information of the training sample image to obtain a target loss function; training the initial fully convolutional neural network through the target loss function to obtain the fully convolutional neural network.
可选地,所述装置还用于:利用所述二维CTC模型对所述第二特征信息进行处理,得到第二路径的条件概率;基于所述第二路径的条件概率确定所述目标损失函数。Optionally, the device is further configured to: use the two-dimensional CTC model to process the second feature information to obtain the conditional probability of the second path; determine the target loss based on the conditional probability of the second path function.
可选地,所述装置还用于:结合动态规划算法和所述第二特征信息,计算得到目标条件概率β_{s,h,w},其中,β_{s,h,w}表示从第二二维空间特征分布的位置(h,w)上到达第二文字序列中位于第s个位置的字符的所有子路径的概率和,所述第二二维空间特征分布为所述训练样本图像的空间特征分布;利用所述目标条件概率β_{s,h,w}计算所述第二路径的条件概率。Optionally, the apparatus is further configured to: calculate the target conditional probability β_{s,h,w} by combining the dynamic programming algorithm with the second feature information, where β_{s,h,w} denotes the sum of the probabilities of all sub-paths that reach the character at the s-th position of the second character sequence at the position (h, w) of the second two-dimensional spatial feature distribution, the second two-dimensional spatial feature distribution being the spatial feature distribution of the training sample image; and calculate the conditional probability of the second path by using the target conditional probability β_{s,h,w}.
可选地,所述装置还用于:利用目标公式计算所述目标条件概率β_{s,h,w},所述目标公式表示为:
$$\beta_{s,h,w}=p_{h,w}\!\left(Y^{*}_{s}\right)\sum_{j=1}^{H}\Psi_{j,w-1,h}\left(\beta_{s,j,w-1}+\beta_{s-1,j,w-1}+\lambda_{s}\,\beta_{s-2,j,w-1}\right)$$
其中,当Y*_s为空白符或Y*_s=Y*_{s-2}时λ_s=0,否则λ_s=1。Optionally, the apparatus is further configured to: calculate the target conditional probability β_{s,h,w} by using a target formula, the target formula being the recursion above, in which λ_s = 0 when Y*_s is the blank symbol or Y*_s = Y*_{s-2}, and λ_s = 1 otherwise.
其中,in,
Ψ_{j,w-1,h}表示第二路径转移概率,表示从所述第二二维空间特征分布中的特征点(j,w-1)到所述第二二维空间特征分布中的特征点(h,w)的转移概率,j表示所述第二二维空间特征分布中的一个高度序号,Y*和X'分别表示所述第二文字序列扩展后的标注文字序列和所述第二二维空间特征分布,s表示Y*中字符的序号,h表示所述第二二维空间特征分布中的另一个高度坐标,w表示所述第二二维空间特征分布中的宽度坐标,h∈[1,2,…,H],w∈[1,2,…,W-1],H表示所述第二二维空间特征分布中的高度信息,W表示所述第二二维空间特征分布中的宽度信息;p_{h,w}(Y*_s)属于所述第二字符分布概率,表示在位置(h,w)处的特征点属于第二文字序列中的字符Y*_s的概率;Ψ_{j,0,h}是根据所述第二初始路径概率Ψ_{j,-1,h}计算得到的。Here, Ψ_{j,w-1,h} denotes the second path transition probability, that is, the transition probability from the feature point (j, w-1) in the second two-dimensional spatial feature distribution to the feature point (h, w) in the second two-dimensional spatial feature distribution; j denotes a height index in the second two-dimensional spatial feature distribution; Y* and X' respectively denote the annotated text sequence obtained by expanding the second character sequence and the second two-dimensional spatial feature distribution; s denotes the index of a character in Y*; h denotes another height coordinate in the second two-dimensional spatial feature distribution, and w denotes a width coordinate in the second two-dimensional spatial feature distribution; h∈[1,2,…,H], w∈[1,2,…,W-1], where H denotes the height information and W denotes the width information of the second two-dimensional spatial feature distribution; p_{h,w}(Y*_s) belongs to the second character distribution probability and denotes the probability that the feature point at position (h, w) corresponds to the character Y*_s in the second character sequence; Ψ_{j,0,h} is calculated from the second initial path probability Ψ_{j,-1,h}.
可选地,所述装置还用于:利用公式Loss=-lnP(Y/X')确定所述目标损失函数,其中,P(Y/X')为所述第二路径的条件概率,Loss为所述目标损失函数。Optionally, the device is further configured to: use the formula Loss=-lnP(Y/X') to determine the target loss function, where P(Y/X') is the conditional probability of the second path, and Loss is the objective loss function.
本申请还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器运行时执行上述方法实施例中任一实施例所述的方法的步骤。The present application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the method described in any one of the above method embodiments is executed. step.
本实施例所提供的装置,其实现原理及产生的技术效果和前述实施例相同,为简要描述,装置实施例部分未提及之处,可参考前述方法实施例中相应内容。The implementation principle and technical effects of the device provided in this embodiment are the same as those of the foregoing embodiments. For brief description, for the parts not mentioned in the device embodiments, reference may be made to the corresponding content in the foregoing method embodiments.
此外,本实施例提供了一种处理设备,该设备包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,处理器执行计算机程序时实现上述实施例提供的文字识别方法。In addition, this embodiment provides a processing device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the character recognition method provided by the above embodiments is implemented.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统具体工作过程,可以参考前述实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that for the convenience and brevity of description, the specific working process of the system described above can refer to the corresponding process in the foregoing embodiments, and details are not repeated here.
本发明实施例所提供的一种文字识别方法、装置、电子设备和存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可用于执行前面方法实施例中所述的方法,具体实现可参见方法实施例,在此不再赘述。所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。A character recognition method, device, electronic device, and computer-readable storage medium storing program codes provided by the embodiments of the present invention, the instructions included in the program codes can be used to execute the methods described in the foregoing method embodiments, specifically For implementation, reference may be made to the method embodiments, which will not be repeated here. If the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes. .
最后应说明的是:以上所述实施例,仅为本发明的具体实施方式,用以说明本发明的技术方案,而非对其限制,本发明的保护范围并不局限于此,尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本发明实施例技术方案的精神和范围,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。Finally, it should be noted that the above-described embodiments are only specific implementations of the present invention, intended to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may still, within the technical scope disclosed by the present invention, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features therein; such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.