Technical Field
The present disclosure relates to the technical field of image recognition, and in particular to a text recognition method and apparatus.
Background
In the related art, text recognition refers to technology that uses a computer to recognize, verify, and record the text in an image. People handle large volumes of text, reports, and documents in production and daily life, and text recognition technology can greatly reduce that workload. At present, the accuracy of text recognition on incomplete text images is low, so improving it is of considerable importance.
Summary
To overcome the problems in the related art, the present disclosure provides a text recognition method and apparatus.
According to a first aspect of the embodiments of the present disclosure, a text recognition method is provided, including:
acquiring each single-character image included in a text image to be recognized;
for each single-character image included in the text image to be recognized, when the single-character image is determined to be incomplete, inputting the single-character image into a character image generation network to obtain a first repaired single-character image, and performing text recognition on the first repaired single-character image to obtain a text recognition result; wherein the character image generation network is obtained by training on complete single-character images and incomplete single-character images;
obtaining the text recognition result corresponding to the text image to be recognized from the text recognition results corresponding to the single-character images included in the text image to be recognized.
In a possible implementation, the method further includes:
performing incompleteness processing on the complete single-character image to obtain the incomplete single-character image;
training a discrimination network and a generation network according to the complete single-character image and the incomplete single-character image, the discrimination network being used to judge the consistency between a repaired single-character image and the complete single-character image;
repeatedly training the discrimination network and the generation network, and, when the number of training iterations reaches a preset threshold or the discrimination result of the discrimination network indicates that the consistency between the repaired single-character image and the complete single-character image satisfies a preset condition, determining the current generation network as the character image generation network.
In a possible implementation, training the discrimination network and the generation network according to the complete single-character image and the incomplete single-character image includes:
inputting the incomplete single-character image into the generation network to obtain a second repaired single-character image;
inputting the complete single-character image and the second repaired single-character image into the discrimination network to obtain a discrimination result indicating whether the second repaired single-character image is consistent with the complete single-character image;
adjusting the values of parameters in the discrimination network or the generation network according to the discrimination result.
In a possible implementation, the generation network includes a plurality of encoding modules and a plurality of decoding modules connected in a residual manner; each encoding module includes a convolution layer, a rectified linear unit (ReLU) layer, and a max pooling layer, and each decoding module includes a convolution layer, a rectified linear unit layer, and a max pooling layer.
In a possible implementation, the discrimination network includes a plurality of encoding modules, a plurality of fully connected layers, and a threshold function layer connected in sequence; each encoding module includes a convolution layer, a rectified linear unit layer, and a max pooling layer.
In a possible implementation, the method further includes: for each single-character image included in the text image to be recognized, inputting the single-character image into a text classifier to obtain the ratios at which the single-character image belongs to each character class; and, when the ratios at which the single-character image belongs to each character class are all less than or equal to a first threshold, determining that the single-character image is incomplete.
According to a second aspect of the embodiments of the present disclosure, a text recognition apparatus is provided, including:
a first acquisition module, configured to acquire each single-character image included in a text image to be recognized;
a repair module, configured to, for each single-character image included in the text image to be recognized, when the single-character image is determined to be incomplete, input the single-character image into a character image generation network to obtain a first repaired single-character image, and perform text recognition on the first repaired single-character image to obtain a text recognition result; wherein the character image generation network is obtained by training on complete single-character images and incomplete single-character images;
a recognition module, configured to obtain the text recognition result corresponding to the text image to be recognized from the text recognition results corresponding to the single-character images included in the text image to be recognized.
In a possible implementation, the apparatus further includes:
a processing module, configured to perform incompleteness processing on the complete single-character image to obtain the incomplete single-character image;
a training module, configured to train a discrimination network and a generation network according to the complete single-character image and the incomplete single-character image, the discrimination network being used to judge the consistency between a repaired single-character image and the complete single-character image;
a first determination module, configured to repeatedly train the discrimination network and the generation network and, when the number of training iterations reaches a preset threshold or the discrimination result of the discrimination network indicates that the consistency between the repaired single-character image and the complete single-character image satisfies a preset condition, determine the current generation network as the character image generation network.
In a possible implementation, the training module is configured to:
input the incomplete single-character image into the generation network to obtain a second repaired single-character image;
input the complete single-character image and the second repaired single-character image into the discrimination network to obtain a discrimination result indicating whether the second repaired single-character image is consistent with the complete single-character image;
adjust the values of parameters in the discrimination network or the generation network according to the discrimination result.
In a possible implementation, the generation network includes a plurality of encoding modules and a plurality of decoding modules connected in a residual manner; each encoding module includes a convolution layer, a rectified linear unit (ReLU) layer, and a max pooling layer, and each decoding module includes a convolution layer, a rectified linear unit layer, and a max pooling layer.
In a possible implementation, the discrimination network includes a plurality of encoding modules, a plurality of fully connected layers, and a threshold function layer connected in sequence; each encoding module includes a convolution layer, a rectified linear unit layer, and a max pooling layer.
In a possible implementation, the apparatus further includes:
a second acquisition module, configured to, for each single-character image included in the text image to be recognized, input the single-character image into a text classifier to obtain the ratios at which the single-character image belongs to each character class;
a second determination module, configured to determine that the single-character image is incomplete when the ratios at which the single-character image belongs to each character class are all less than or equal to a first threshold.
According to a third aspect of the embodiments of the present disclosure, a text recognition apparatus is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the method described above.
According to a fourth aspect of the embodiments of the present disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored, the computer program instructions implementing the method described above when executed by a processor.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. The text recognition method and apparatus of the present disclosure acquire each single-character image included in a text image to be recognized; for each such single-character image, when it is determined to be incomplete, they input it into the character image generation network to obtain a first repaired single-character image and perform text recognition on the first repaired single-character image to obtain a text recognition result; and they obtain the text recognition result corresponding to the text image to be recognized from the text recognition results corresponding to its single-character images. The text image to be recognized can thus be divided into a plurality of single-character images, incomplete single-character images can be repaired, and the text in the repaired single-character images can then be recognized, which can greatly improve the accuracy of text recognition.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a text recognition method according to an exemplary embodiment.
Fig. 2 is a flowchart of a text recognition method according to an exemplary embodiment.
Fig. 3 is a schematic block diagram of a generation network according to an exemplary embodiment.
Fig. 4 is a schematic block diagram of a discrimination network according to an exemplary embodiment.
Fig. 5 is a block diagram of a text recognition apparatus according to an exemplary embodiment.
Fig. 6 is a schematic block diagram of a text recognition apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram of an apparatus 800 for text recognition according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a text recognition method according to an exemplary embodiment. The method is used in a text recognition device, which the present disclosure does not limit. As shown in Fig. 1, the method includes steps S11 to S13.
In step S11, each single-character image included in the text image to be recognized is acquired.
The text image to be recognized may be a text image in which all of the single-character images are complete, or a text image in which some or all of the single-character images are incomplete; the present disclosure does not limit this. Incomplete may refer to a case in which part of a character, such as a stroke or a radical, is missing because of occlusion, staining, fading, incomplete printing, or similar causes.
A single-character image may refer to an image that includes only one character. Characters may include Chinese characters, English characters, numeric characters, and so on; the present disclosure does not limit this. As an example of this embodiment, when the text image to be recognized includes 200 characters, the 200 single-character images included in the text image to be recognized may be acquired.
In a possible implementation, acquiring each single-character image included in the text image to be recognized (step S11) includes: acquiring a color histogram corresponding to the text image to be recognized; and segmenting the text image to be recognized according to that color histogram to obtain each single-character image included in the text image to be recognized.
In a possible implementation, the color histogram corresponding to the text image to be recognized may be obtained with the imhist(I, n) function of the MATLAB (Matrix Laboratory) image processing module, where I denotes a grayscale image and n is the specified number of gray levels. It should be noted that, although the imhist(I, n) function is used above as an example of how to obtain the color histogram corresponding to the text image to be recognized, those skilled in the art will understand that the present disclosure is not limited to it.
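The following is a minimal Python sketch of this step, not the implementation of the disclosure. The function gray_histogram is a rough NumPy counterpart of imhist(I, n). The function split_columns is a hypothetical helper that swaps in a simpler technique than the histogram-based segmentation described above: it cuts a single text line wherever a column contains no dark pixels. The names, the 8-bit grayscale input, and ink_threshold are all assumptions made for illustration.

```python
import numpy as np

def gray_histogram(image, n_bins=256):
    """Count pixels per gray-level bin (rough counterpart of imhist(I, n))."""
    counts, _ = np.histogram(image, bins=n_bins, range=(0, 256))
    return counts

def split_columns(image, ink_threshold=128):
    """Hypothetical splitter: cut a text line at columns without dark pixels."""
    has_ink = (image < ink_threshold).any(axis=0)  # one flag per column
    pieces, start = [], None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                            # a character begins here
        elif not ink and start is not None:
            pieces.append(image[:, start:col])     # the character ended
            start = None
    if start is not None:                          # character touches the right edge
        pieces.append(image[:, start:])
    return pieces

# Toy line image: white background with two dark two-column "characters".
line = np.full((4, 7), 255, dtype=np.uint8)
line[:, 1:3] = 0
line[:, 4:6] = 0
histogram = gray_histogram(line, n_bins=2)
characters = split_columns(line)
```

In practice the dark/light threshold could be chosen from the histogram itself, for example at the valley between its two peaks; that choice is left open here.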
In step S12, for each single-character image included in the text image to be recognized, when the single-character image is determined to be incomplete, the single-character image is input into the character image generation network to obtain a first repaired single-character image, and text recognition is performed on the first repaired single-character image to obtain a text recognition result; the character image generation network is obtained by training on complete single-character images and incomplete single-character images.
The input of the character image generation network may be a single-character image, and its output may be the first repaired single-character image. The first repaired single-character image is the repaired image corresponding to the input single-character image, and it has the same size and resolution as that input image.
In a possible implementation, for each single-character image included in the text image to be recognized, the single-character image is input into a text classifier to obtain the ratios at which the single-character image belongs to each character class; when those ratios are all less than or equal to a first threshold, the single-character image is determined to be incomplete.
The text classifier may be used to classify single-character images. Its input is a single-character image, and its output is the ratios at which the single-character image belongs to each character class. The text classifier may be trained using VGG16 (a neural network). The character classes may include Chinese characters, English characters, numeric characters, and so on; the present disclosure does not limit this. The first threshold may be a preset value, for example 0.1; the present disclosure does not limit this.
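The incompleteness test described above reduces to a single comparison. The sketch below assumes the classifier output is available as a plain sequence of per-class ratios; the function name is_incomplete is hypothetical, and the default of 0.1 is the example value given in the text.

```python
def is_incomplete(class_ratios, first_threshold=0.1):
    """Return True when no character class scores above the first threshold."""
    return all(ratio <= first_threshold for ratio in class_ratios)

# A confident prediction is treated as a complete character ...
confident = is_incomplete([0.05, 0.90, 0.05])
# ... while uniformly low ratios flag the image for repair.
uncertain = is_incomplete([0.05, 0.08, 0.02])
```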
In step S13, the text recognition result corresponding to the text image to be recognized is obtained from the text recognition results corresponding to the single-character images included in the text image to be recognized.
In a possible implementation, after the ratios at which a single-character image belongs to each character class are obtained, if those ratios are all less than or equal to the first threshold, the single-character image is determined to be incomplete; the single-character image is then input into the character image generation network to obtain a first repaired single-character image, and text recognition is performed on the first repaired single-character image to obtain a text recognition result. When the single-character image is determined not to be incomplete, text recognition may be performed on the single-character image directly to obtain a text recognition result.
As an example of this embodiment, when the single-character image is determined not to be incomplete, the character class with the largest ratio may be determined as the character class corresponding to the single-character image. The character library corresponding to that character class is obtained, and a template matching method is used to perform text recognition on the single-character image to obtain a text recognition result. It should be noted that, although template matching is used above as an example of a recognition method, those skilled in the art will understand that the present disclosure is not limited to it.
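The per-character flow of steps S12 and S13 can be sketched as follows. This is an illustrative outline under stated assumptions rather than the implementation of the disclosure: classify, repair, and recognize are hypothetical callables standing in for the text classifier, the character image generation network, and the recognizer (for example, template matching), and joining the per-character results into one string is an assumed way of combining them.

```python
def recognize_text_image(char_images, classify, repair, recognize,
                         first_threshold=0.1):
    """Repair only the characters judged incomplete, then recognize each one."""
    results = []
    for char_image in char_images:
        ratios = classify(char_image)
        if all(ratio <= first_threshold for ratio in ratios):
            # Incomplete character: recognize the repaired image instead.
            char_image = repair(char_image)
        results.append(recognize(char_image))
    # Assumption: the overall result is the concatenation of the characters.
    return "".join(results)

# Toy stand-ins: "images" are strings, and "?" marks a damaged character.
text = recognize_text_image(
    ["A", "?"],
    classify=lambda img: [0.9] if img != "?" else [0.05],
    repair=lambda img: "B",
    recognize=lambda img: img,
)
```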
The text recognition method and apparatus of the present disclosure can divide the text image to be recognized into a plurality of single-character images, repair the incomplete single-character images to obtain repaired single-character images, and then recognize the text in the repaired single-character images, which can greatly improve the accuracy of text recognition.
Fig. 2 is a flowchart of a text recognition method according to an exemplary embodiment. As shown in Fig. 2, the method includes steps S21 to S26.
In step S21, incompleteness processing is performed on a complete single-character image to obtain an incomplete single-character image.
A complete single-character image may refer to an image in which the single character is complete, and an incomplete single-character image may refer to an image in which part of the single character, such as a stroke or a radical, is missing.
In a possible implementation, a complete single-character image is selected and a random occluder is placed on the selected image to obtain an incomplete single-character image.
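One way to place a random occluder is sketched below with NumPy. The rectangular shape of the occluder, its maximum size (max_fraction), and its fill value are assumptions made for illustration; the text only requires that the occluder be random.

```python
import numpy as np

def occlude(image, rng, max_fraction=0.5, fill_value=255):
    """Return a copy of `image` with one random rectangle painted over it."""
    height, width = image.shape[:2]
    occ_h = int(rng.integers(1, max(2, int(height * max_fraction) + 1)))
    occ_w = int(rng.integers(1, max(2, int(width * max_fraction) + 1)))
    top = int(rng.integers(0, height - occ_h + 1))
    left = int(rng.integers(0, width - occ_w + 1))
    damaged = image.copy()                 # keep the complete image untouched
    damaged[top:top + occ_h, left:left + occ_w] = fill_value
    return damaged

rng = np.random.default_rng(0)
complete = np.zeros((32, 32), dtype=np.uint8)   # stand-in for a character image
incomplete = occlude(complete, rng)
```

Each call yields one (complete, incomplete) training pair; calling it repeatedly on the same complete image gives several differently damaged versions.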
In step S22, a discrimination network and a generation network are trained according to the complete single-character image and the incomplete single-character image; the discrimination network is used to judge the consistency between a repaired single-character image and the complete single-character image.
It should be noted, as those skilled in the art will understand, that the complete single-character image and the incomplete single-character image in step S21 are used to train the discrimination network and the generation network. In actual training, the discrimination network and the generation network are trained alternately for each pair of a complete single-character image and an incomplete single-character image. In addition, multiple different pairs of complete and incomplete single-character images need to be obtained and the discrimination network and the generation network trained repeatedly, so as to improve the stability and adaptability of the resulting character image generation network.
Alternately training the discrimination network and the generation network may mean that, for each pair of a complete single-character image and an incomplete single-character image, the discrimination network is trained while the parameters of the generation network are held fixed, and the generation network is then trained while the parameters of the discrimination network are held fixed. The two networks are trained alternately until, when the generation network produces a second repaired single-character image from the incomplete single-character image, the discrimination network cannot tell whether the complete single-character image and the second repaired single-character image are consistent. For example, the discrimination network outputs 0.5, meaning that the second repaired single-character image has a 50% probability of being consistent with the complete single-character image and a 50% probability of being inconsistent with it.
In step S23, the discrimination network and the generation network are trained repeatedly; when the number of training iterations reaches a preset threshold, or the discrimination result of the discrimination network indicates that the consistency between the repaired single-character image and the complete single-character image satisfies a preset condition, the current generation network is determined as the character image generation network.
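The stopping rule of step S23 can be written as a small predicate. The text does not specify the preset condition; the sketch below assumes, following the 0.5 example above, that the condition is met when the discrimination output lies within a tolerance of 0.5. The function name and the tolerance value are hypothetical.

```python
def should_stop(iteration, max_iterations, d_output, tolerance=0.05):
    """Stop on the iteration limit or when D can no longer tell images apart."""
    reached_limit = iteration >= max_iterations
    # Assumed preset condition: the discrimination output is close to 0.5.
    indistinguishable = abs(d_output - 0.5) <= tolerance
    return reached_limit or indistinguishable

keep_going = should_stop(iteration=10, max_iterations=1000, d_output=0.9)
hit_limit = should_stop(iteration=1000, max_iterations=1000, d_output=0.9)
converged = should_stop(iteration=10, max_iterations=1000, d_output=0.52)
```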
In a possible implementation, the generation network includes a plurality of encoding modules (Encode) and a plurality of decoding modules (Decode) connected in a residual manner; each encoding module includes a convolution layer, a rectified linear unit (ReLU, Rectified Linear Unit) layer, and a max pooling (Max Pooling) layer, and each decoding module includes a convolution layer, a rectified linear unit layer, and a max pooling layer.
The encoding modules are used to encode an image. The decoding modules are used to decode the image encoded by the encoding modules. Both the encoding modules and the decoding modules can change the image resolution and the number of image channels, for example by increasing the resolution and reducing the number of channels, or by reducing the resolution and increasing the number of channels. The convolution layer, the rectified linear unit layer, and the max pooling layer are each a basic operation unit in the encoding and decoding modules.
Fig. 3 is a schematic block diagram of a generation network according to an exemplary embodiment. As shown in Fig. 3, the generation network is a ten-layer encoding module-decoding module structure that includes 5 encoding modules and 5 decoding modules connected in a residual manner. Each encoding module and each decoding module includes 1 convolution layer, 1 rectified linear unit layer, and 1 max pooling layer. The encoding modules are Encode1 (32*32*3), Encode2 (16*16*64), Encode3 (8*8*128), Encode4 (4*4*256), and Encode5 (2*2*512). The decoding modules are Decode1 (2*2*512), Decode2 (4*4*256), Decode3 (8*8*128), Decode4 (16*16*64), and Decode5 (32*32*3). It can be understood that, in 32*32*3, 32*32 may denote the image resolution and 3 may denote the number of image channels.
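The shape labels listed for Fig. 3 follow a regular pattern: the resolution halves at each encoding module while the channel count doubles from 64, and the decoding modules mirror the sequence. The sketch below only reproduces that bookkeeping; it does not build the network, and the function name and parameters are hypothetical.

```python
def encoder_shapes(input_size=32, input_channels=3, first_width=64, n_modules=5):
    """List the (height, width, channels) label of each encoding module."""
    shapes = [(input_size, input_size, input_channels)]
    size, channels = input_size, first_width
    for _ in range(n_modules - 1):
        size //= 2                       # resolution halves at each module
        shapes.append((size, size, channels))
        channels *= 2                    # channel count doubles
    return shapes

encode = encoder_shapes()
decode = list(reversed(encode))          # the decoding modules mirror the encoder
```

With input_channels=6 the same function gives the labels listed for the encoding modules of the discrimination network in Fig. 4.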
In a possible implementation, the discrimination network includes a plurality of encoding modules, a plurality of fully connected (FC, Fully Connected) layers, and a threshold function (Sigmoid) layer connected in sequence; each encoding module includes a convolution layer, a rectified linear unit layer, and a max pooling layer.
The encoding modules are used to encode an image. The fully connected layers are used to map the learned distributed feature representation to the sample label space. The threshold function layer is used to map a variable into the range [0, 1]. The encoding modules can change the image resolution and the number of image channels, for example by reducing the resolution and increasing the number of channels. The convolution layer, the rectified linear unit layer, and the max pooling layer are each a basic operation unit in the encoding modules.
Fig. 4 is a schematic block diagram of a discrimination network according to an exemplary embodiment. As shown in Fig. 4, the discrimination network includes 5 encoding modules, 2 fully connected layers, and 1 threshold function layer connected in sequence. Each encoding module includes 1 convolution layer, 1 rectified linear unit layer, and 1 max pooling layer. The encoding modules are Encode1' (32*32*6), Encode2' (16*16*64), Encode3' (8*8*128), Encode4' (4*4*256), and Encode5' (2*2*512). It can be understood that, in 32*32*6, 32*32 may denote the image resolution and 6 may denote the number of image channels.
在一种可能的实现方式中,根据完整单个字符图像和残缺单个字符图像,训练判别网络和生成网络(步骤S22)可以包括:将残缺单个字符图像输入生成网络,得到第二修复单个字符图像;将完整单个字符图像和第二修复单个字符图像输入判别网络,得到用于表示第二修复单个字符图像与完整单个字符图像是否一致的判别结果;根据判别结果,调整生成网络或判别网络中参数的取值。In a possible implementation, according to the complete single character image and the incomplete single character image, training the discrimination network and the generation network (step S22) may include: inputting the incomplete single character image into the generation network to obtain a second repaired single character image; Input the complete single character image and the second repaired single character image into the discriminant network to obtain a discriminant result indicating whether the second repaired single character image is consistent with the complete single character image; according to the discriminant result, adjust the parameters in the generation network or the discriminative network value.
In one possible implementation, the input of the discrimination network may be a fused image obtained by fusing the complete single-character image with the second repaired single-character image, and the output may be a discrimination result indicating whether the second repaired single-character image is a complete single-character image. For example, if the complete single-character image and the second repaired single-character image each have 3 image channels, fusing them yields a fused image with 6 image channels, and this fused image is used as the input of the discrimination network.
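The 3 + 3 = 6 channel fusion can be illustrated as a concatenation along the channel axis. This is a minimal sketch that assumes "fusing" means channel-wise concatenation and represents images as nested lists (height x width x channels); `fuse_channels` is a hypothetical helper, not a name from the disclosure.

```python
def fuse_channels(image_a, image_b):
    """Concatenate two H x W x C images (nested lists) along the channel axis."""
    same_rows = len(image_a) == len(image_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(image_a, image_b)):
        raise ValueError("images must share the same resolution")
    return [
        [pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Two 32 x 32 RGB stand-ins: a complete image and a repaired image.
complete = [[[255, 255, 255] for _ in range(32)] for _ in range(32)]
repaired = [[[250, 251, 252] for _ in range(32)] for _ in range(32)]
fused = fuse_channels(complete, repaired)  # 32 x 32 x 6
```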
In one possible implementation, the discrimination network and the generation network are trained alternately, and the values of the parameters in both networks are adjusted by backpropagation according to the discrimination result until both networks converge. Here, convergence of both networks may mean that the discrimination result has become stable or that the number of training iterations has reached a preset threshold.
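The alternating schedule and its two stop criteria can be sketched as a plain loop. This is only an outline under stated assumptions: `d_step` and `g_step` are hypothetical callables that each perform one parameter update, and "stable" is interpreted here as the last few discrimination results varying by less than a tolerance, which the text does not define.

```python
def train_alternately(d_step, g_step, max_rounds=1000, window=5, tolerance=1e-3):
    """Alternate discriminator and generator updates until a stop criterion is met.

    d_step and g_step are hypothetical callables performing one update each;
    g_step returns the current discrimination result D(G(z)) as a float.
    """
    history = []
    for round_index in range(1, max_rounds + 1):
        d_step()                  # update the discriminator, generator held fixed
        history.append(g_step())  # update the generator, discriminator held fixed
        recent = history[-window:]
        if len(recent) == window and max(recent) - min(recent) < tolerance:
            return round_index, "stable"
    return max_rounds, "max_rounds"
```

The loop returns the round at which it stopped together with the reason, so a caller can tell a stable discrimination result from an exhausted iteration budget.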
In one possible implementation, the generation network G is determined using Formula 1.
Here, G denotes the generation network and D denotes the discrimination network; the minimization over G (min_G) corresponds to the loss of the generation network, and the maximization over D (max_D) corresponds to the loss of the discrimination network. x denotes the complete single-character image, and D(x) denotes the discrimination result obtained with x as input. z denotes the incomplete single-character image, and G(z) denotes the generation result obtained with z as input, that is, the second repaired single-character image. D(G(z)) denotes the discrimination result obtained with G(z) as input. E[||x - G(z)||_1] denotes the smooth loss (Smooth L1 Loss) of the difference between the complete single-character image and the second repaired single-character image, and E[||canny(x) - canny(G(z))||_1] denotes the smooth loss of the difference between the Canny features of the complete single-character image and the Canny features of the second repaired single-character image.
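Formula 1 itself does not appear in the text. One form consistent with the symbol definitions given for it, assuming the standard adversarial objective and unit weights on the two smooth-loss terms (any actual weighting coefficients are unknown), would be:

```latex
G^{*} = \arg\min_{G}\max_{D}\;
  \mathbb{E}_{x}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
  + \mathbb{E}\bigl[\lVert x - G(z)\rVert_{1}\bigr]
  + \mathbb{E}\bigl[\lVert \mathrm{canny}(x) - \mathrm{canny}(G(z))\rVert_{1}\bigr]
```

This is a reconstruction for orientation only and should not be read as the exact formula of the disclosure.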
It can be understood that the generation network G is a network for generating images. In a general generative adversarial network, G receives random noise z and generates an image G(z) from it; in the present method, the input z is the incomplete single-character image. The discrimination network D is a network for discrimination and outputs a discrimination result, which represents the probability that the image input to the discrimination network is a real image. A discrimination result of 1 means that the input image is certainly a real image, and a discrimination result of 0 means that the input image cannot be a real image.
During actual training, the goal of the generation network G is to generate images that look as real as possible in order to deceive the discrimination network D, while the goal of the discrimination network D is to distinguish the images generated by G from real images as well as possible. The generation network G and the discrimination network D thus form a dynamic game. In the ideal state, the outcome of the game is that G generates images G(z) realistic enough to pass for real ones, and D can hardly tell whether an image generated by G is real, so that D(G(z)) = 0.5.
It should be noted that, as those skilled in the art will understand, x denotes a real image, z denotes the input to the generation network G, and G(z) denotes the image generated by G. D(x) denotes the probability, as judged by the discrimination network D, that the real image x is real. D(G(z)) denotes the probability, as judged by D, that the image generated by G is real. Because x is a real image, for the discrimination network D, the closer D(x) is to 1 the better. Because G(z) is an image generated by G, for the discrimination network D, the closer D(G(z)) is to 0 the better. The purpose of the generation network G: since D(G(z)) is the probability that D judges the generated image to be real, G wants its generated images to be as close to real as possible. In other words, G wants D(G(z)) to be as large as possible, in which case the value of the objective V(D, G) becomes smaller. The leading operator of Formula 1 is therefore min_G. The purpose of the discrimination network D: the stronger D is, the larger D(x) should be and the smaller D(G(z)) should be, in which case the value of V(D, G) becomes larger. The leading operator of Formula 1 with respect to D is therefore max_D.
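A small numeric check makes the direction of the two goals concrete. The sketch below assumes the standard adversarial value log D(x) + log(1 - D(G(z))) for a single real/generated pair; the probabilities are made-up illustrative numbers, not results from the disclosure.

```python
import math


def value(d_real, d_fake):
    """Adversarial value log D(x) + log(1 - D(G(z))) for one real/generated pair."""
    return math.log(d_real) + math.log(1.0 - d_fake)


# Generator improves: D(G(z)) rises from 0.1 to 0.5, so the value drops.
weak_generator = value(d_real=0.9, d_fake=0.1)
strong_generator = value(d_real=0.9, d_fake=0.5)

# Discriminator improves: D(x) rises and D(G(z)) falls, so the value rises.
strong_discriminator = value(d_real=0.99, d_fake=0.01)

# Ideal end state of the game: the discriminator outputs 0.5 for both inputs.
equilibrium = value(d_real=0.5, d_fake=0.5)
```

A better generator lowers the value and a better discriminator raises it, which matches the minimization over G and the maximization over D.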
In step S24, each single-character image included in the text image to be recognized is acquired.
For a description of this step, see step S11.
In step S25, for each single-character image included in the text image to be recognized, when the single-character image is determined to be incomplete, the single-character image is input into the character image generation network to obtain a first repaired single-character image, and text recognition is performed on the first repaired single-character image to obtain a text recognition result; the character image generation network is obtained by training on complete single-character images and incomplete single-character images.
For a description of this step, see step S12.
In step S26, the text recognition result for the text image to be recognized is obtained from the text recognition results of the single-character images it includes.
For a description of this step, see step S13.
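Steps S24 to S26 can be outlined as a short pipeline. This is a minimal sketch, not the disclosed implementation: `is_incomplete`, `repair`, and `recognize_char` are hypothetical callables standing in for the incompleteness check, the character image generation network, and the single-character recognizer, and joining the per-character results into a string is an assumption about how they are combined.

```python
def recognize_text(char_images, is_incomplete, repair, recognize_char):
    """Repair incomplete character images, recognize each one, then join the results.

    is_incomplete, repair and recognize_char are hypothetical callables; they are
    placeholders for components described in the text, not real library APIs.
    """
    results = []
    for image in char_images:
        if is_incomplete(image):
            image = repair(image)  # first repaired single-character image
        results.append(recognize_char(image))
    return "".join(results)


# Toy stand-ins: strings play the role of images, "~" marks a damaged one.
text = recognize_text(
    ["o", "k~"],
    is_incomplete=lambda s: s.endswith("~"),
    repair=lambda s: s[:-1],
    recognize_char=lambda s: s.upper(),
)
```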
It should be noted that, as those skilled in the art will understand, steps S21 to S23 are the process of training the character image generation network, and steps S24 to S26 are the process of actually using it. Training is an occasional process, whereas use is the routine process.
In one possible implementation, the trained character image generation network is packaged in a text recognition device, so that the device can reuse the network to repair single-character images, obtain repaired single-character images, and recognize the character information in them, which can improve the accuracy of text recognition.
The text recognition method of the present disclosure trains the generation network with a generative adversarial network, so that the generation network is well able to repair incomplete single-character images and can produce repaired single-character images that are the same as or similar to the complete single-character images, which can improve the accuracy of text recognition.
Fig. 5 is a block diagram of a text recognition apparatus according to an exemplary embodiment. Referring to Fig. 5, the apparatus includes: a first acquisition module 51, configured to acquire each single-character image included in the text image to be recognized; a repair module 52, configured to, for each single-character image included in the text image to be recognized, when the single-character image is determined to be incomplete, input the single-character image into the character image generation network to obtain a first repaired single-character image, and perform text recognition on the first repaired single-character image to obtain a text recognition result, where the character image generation network is obtained by training on complete single-character images and incomplete single-character images; and a recognition module 53, configured to obtain the text recognition result for the text image to be recognized from the text recognition results of the single-character images it includes.
Fig. 6 is a schematic block diagram of a text recognition apparatus according to an exemplary embodiment. Referring to Fig. 6:
In one possible implementation, the apparatus further includes: a processing module 54, configured to apply incompleteness processing to the complete single-character image to obtain the incomplete single-character image; a training module 55, configured to train a discrimination network and a generation network according to the complete single-character image and the incomplete single-character image, where the discrimination network is used to judge the consistency between the repaired single-character image and the complete single-character image; and a first determination module 56, configured to train the discrimination network and the generation network repeatedly and, when the number of training iterations reaches a preset threshold or the discrimination result indicates that the consistency between the repaired single-character image and the complete single-character image satisfies a preset condition, to determine the current generation network as the character image generation network.
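The incompleteness processing performed by the processing module can be pictured as deliberately damaging a complete image to create a training pair. The sketch below assumes one simple form of damage, overwriting a rectangular region of a grayscale image, which is an illustrative choice and not stated in the text; `mask_region` is a hypothetical helper.

```python
def mask_region(image, top, left, height, width, fill=255):
    """Return a copy of a 2-D grayscale image with a rectangle overwritten by `fill`.

    The rectangle is clipped to the image bounds; the input image is not modified.
    """
    damaged = [row[:] for row in image]
    for r in range(max(top, 0), min(top + height, len(image))):
        for c in range(max(left, 0), min(left + width, len(image[r]))):
            damaged[r][c] = fill
    return damaged


complete = [[0] * 8 for _ in range(8)]  # all-black 8 x 8 stand-in for a character image
incomplete = mask_region(complete, top=2, left=3, height=4, width=10)
```

Each (complete, incomplete) pair produced this way could then serve as one training example for the two networks.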
In one possible implementation, the training module 55 is configured to: input the incomplete single-character image into the generation network to obtain a second repaired single-character image; input the complete single-character image and the second repaired single-character image into the discrimination network to obtain a discrimination result indicating whether the second repaired single-character image is consistent with the complete single-character image; and adjust the values of the parameters in the discrimination network or the generation network according to the discrimination result.
In one possible implementation, the generation network includes a plurality of encoding modules and a plurality of decoding modules connected in a residual manner; the encoding module includes a convolutional layer, a ReLU layer, and a max pooling layer, and the decoding module includes a convolutional layer, a ReLU layer, and a max pooling layer.
In one possible implementation, the discrimination network includes a plurality of encoding modules, a plurality of fully connected layers, and a threshold function layer, connected in sequence; the encoding module includes a convolutional layer, a ReLU layer, and a max pooling layer.
In one possible implementation, the apparatus further includes: a second acquisition module 57, configured to, for each single-character image included in the text image to be recognized, input the single-character image into a text classifier to obtain the probability that the single-character image belongs to each character class; and a second determination module 58, configured to determine that the single-character image is incomplete when its probability of belonging to each character class is less than or equal to a first threshold.
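The incompleteness rule applied by the second determination module reduces to a single comparison over the classifier output. In this minimal sketch the default threshold of 0.5 is an assumed value chosen for illustration; the text only refers to "a first threshold".

```python
def is_incomplete(class_probabilities, first_threshold=0.5):
    """Treat a character image as incomplete when no class probability exceeds the threshold."""
    return all(p <= first_threshold for p in class_probabilities)


uncertain = is_incomplete([0.2, 0.3, 0.5])    # no class stands out
confident = is_incomplete([0.05, 0.9, 0.05])  # one class clearly dominates
```

A confident classifier output leaves the image unrepaired, while a flat output routes it to the character image generation network.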
As for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not repeated here.
The text recognition apparatus of the present disclosure can segment the text image to be recognized into a plurality of single-character images, repair the incomplete single-character images to obtain repaired single-character images, and then recognize the text in the repaired single-character images, which can improve the accuracy of text recognition.
Fig. 7 is a block diagram of an apparatus 800 for text recognition according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another device with a text recognition function.
Referring to Fig. 7, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to complete all or some of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the apparatus 800. Examples of such data include instructions for any application or method operating on the apparatus 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 806 supplies power to the various components of the apparatus 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the apparatus 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 can detect the open/closed state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 can also detect a change in position of the apparatus 800 or of one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and temperature changes of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions can be executed by the processor 820 of the apparatus 800 to perform the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710891330.8A (published as CN107609560A) | 2017-09-27 | 2017-09-27 | Character recognition method and device |
| Publication Number | Publication Date |
|---|---|
| CN107609560A | 2018-01-19 |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108427950A (en)* | 2018-02-01 | 2018-08-21 | 北京捷通华声科技股份有限公司 | A kind of literal line detection method and device |
| CN108460830A (en)* | 2018-05-09 | 2018-08-28 | 厦门美图之家科技有限公司 | Image repair method, device and image processing equipment |
| CN108550118A (en)* | 2018-03-22 | 2018-09-18 | 深圳大学 | Fuzzy processing method, device, equipment and the storage medium of motion blur image |
| CN108710896A (en)* | 2018-04-24 | 2018-10-26 | 浙江工业大学 | The field learning method of learning network is fought based on production |
| CN109117843A (en)* | 2018-08-01 | 2019-01-01 | 百度在线网络技术(北京)有限公司 | Character occlusion detection method and device |
| CN109978805A (en)* | 2019-03-18 | 2019-07-05 | Oppo广东移动通信有限公司 | Photographing processing method, device, mobile terminal, and storage medium |
| CN110335212A (en)* | 2019-06-28 | 2019-10-15 | 西安理工大学 | Repair method of missing ancient Chinese characters based on conditional confrontation network |
| CN110363189A (en)* | 2018-04-09 | 2019-10-22 | 珠海金山办公软件有限公司 | A method, device, electronic device, and readable storage medium for restoring document content |
| CN111402156A (en)* | 2020-03-11 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Restoration method and device for smear image, storage medium and terminal equipment |
| CN112287930A (en)* | 2020-11-02 | 2021-01-29 | 深圳市童书王国际文化传媒有限公司 | A kind of intelligent point reading text system and using method thereof |
| CN112801911A (en)* | 2021-02-08 | 2021-05-14 | 苏州长嘴鱼软件有限公司 | Method and device for removing Chinese character noise in natural image and storage medium |
| CN113610082A (en)* | 2021-08-12 | 2021-11-05 | 北京有竹居网络技术有限公司 | A character recognition method and related equipment |
| CN118691512A (en)* | 2024-08-26 | 2024-09-24 | 浙江鸟潮供应链管理有限公司 | Image processing method, storage medium and electronic device based on large model |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2006277149A (en)* | 2005-03-28 | 2006-10-12 | Fuji Xerox Co Ltd | Character and image segmentation device, character and image segmentation method, and program |
| CN104899588A (en)* | 2015-06-26 | 2015-09-09 | 小米科技有限责任公司 | Method and device for recognizing characters in image |
| CN104966097A (en)* | 2015-06-12 | 2015-10-07 | 成都数联铭品科技有限公司 | Complex character recognition method based on deep learning |
| CN106251312A (en)* | 2016-08-09 | 2016-12-21 | 央视国际网络无锡有限公司 | Incomplete automatically benefit of a kind of picture paints method |
| CN106548169A (en)* | 2016-11-02 | 2017-03-29 | 重庆中科云丛科技有限公司 | Fuzzy literal Enhancement Method and device based on deep neural network |
| CN107133934A (en)* | 2017-05-18 | 2017-09-05 | 北京小米移动软件有限公司 | Image completion method and device |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2006277149A (en)* | 2005-03-28 | 2006-10-12 | Fuji Xerox Co Ltd | Character and image segmentation device, character and image segmentation method, and program |
| CN104966097A (en)* | 2015-06-12 | 2015-10-07 | 成都数联铭品科技有限公司 | Complex character recognition method based on deep learning |
| CN104899588A (en)* | 2015-06-26 | 2015-09-09 | 小米科技有限责任公司 | Method and device for recognizing characters in image |
| CN106251312A (en)* | 2016-08-09 | 2016-12-21 | 央视国际网络无锡有限公司 | Incomplete automatically benefit of a kind of picture paints method |
| CN106548169A (en)* | 2016-11-02 | 2017-03-29 | 重庆中科云丛科技有限公司 | Fuzzy literal Enhancement Method and device based on deep neural network |
| CN107133934A (en)* | 2017-05-18 | 2017-09-05 | 北京小米移动软件有限公司 | Image completion method and device |
| Title |
|---|
| RAYMOND A. YEH等: "Semantic Image Inpainting with Deep Generative Models", 《CVPR2017》* |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108427950B (en)* | 2018-02-01 | 2021-02-19 | 北京捷通华声科技股份有限公司 | Character line detection method and device |
| CN108427950A (en)* | 2018-02-01 | 2018-08-21 | 北京捷通华声科技股份有限公司 | A kind of literal line detection method and device |
| CN108550118A (en)* | 2018-03-22 | 2018-09-18 | 深圳大学 | Fuzzy processing method, device, equipment and the storage medium of motion blur image |
| CN108550118B (en)* | 2018-03-22 | 2022-02-22 | 深圳大学 | Method, device, device and storage medium for blurring motion blurred image |
| CN110363189A (en)* | 2018-04-09 | 2019-10-22 | 珠海金山办公软件有限公司 | A method, device, electronic device, and readable storage medium for restoring document content |
| CN110363189B (en)* | 2018-04-09 | 2021-09-24 | 珠海金山办公软件有限公司 | A document content restoration method, device, electronic device and readable storage medium |
| CN108710896A (en)* | 2018-04-24 | 2018-10-26 | 浙江工业大学 | The field learning method of learning network is fought based on production |
| CN108710896B (en)* | 2018-04-24 | 2021-10-29 | 浙江工业大学 | Domain learning method based on generative adversarial learning network |
| CN108460830A (en)* | 2018-05-09 | 2018-08-28 | 厦门美图之家科技有限公司 | Image repair method, device and image processing equipment |
| CN109117843A (en)* | 2018-08-01 | 2019-01-01 | 百度在线网络技术(北京)有限公司 | Character occlusion detection method and device |
| CN109117843B (en)* | 2018-08-01 | 2022-04-15 | 百度在线网络技术(北京)有限公司 | Character occlusion detection method and device |
| CN109978805A (en)* | 2019-03-18 | 2019-07-05 | Oppo广东移动通信有限公司 | Photographing processing method, device, mobile terminal, and storage medium |
| CN110335212B (en)* | 2019-06-28 | 2021-01-15 | 西安理工大学 | Defect ancient book Chinese character repairing method based on condition confrontation network |
| CN110335212A (en)* | 2019-06-28 | 2019-10-15 | 西安理工大学 | Repair method of missing ancient Chinese characters based on conditional confrontation network |
| CN111402156B (en)* | 2020-03-11 | 2021-08-03 | 腾讯科技(深圳)有限公司 | Restoration method and device for smear image, storage medium and terminal equipment |
| CN111402156A (en)* | 2020-03-11 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Restoration method and device for smear image, storage medium and terminal equipment |
| CN112287930A (en)* | 2020-11-02 | 2021-01-29 | 深圳市童书王国际文化传媒有限公司 | Intelligent point-reading text system and method of use thereof |
| CN112801911A (en)* | 2021-02-08 | 2021-05-14 | 苏州长嘴鱼软件有限公司 | Method and device for removing text noise in natural image and storage medium |
| CN112801911B (en)* | 2021-02-08 | 2024-03-26 | 苏州长嘴鱼软件有限公司 | Method and device for removing text noise in natural image and storage medium |
| CN113610082A (en)* | 2021-08-12 | 2021-11-05 | 北京有竹居网络技术有限公司 | A character recognition method and related equipment |
| CN113610082B (en)* | 2021-08-12 | 2024-09-06 | 北京有竹居网络技术有限公司 | Character recognition method and related equipment thereof |
| CN118691512A (en)* | 2024-08-26 | 2024-09-24 | 浙江鸟潮供应链管理有限公司 | Image processing method, storage medium and electronic device based on large model |
| CN118691512B (en)* | 2024-08-26 | 2025-01-14 | 浙江鸟潮供应链管理有限公司 | Image processing method, storage medium and electronic device based on large model |

| Publication | Title | Publication Date |
|---|---|---|
| CN107609560A (en) | Character recognition method and device | |
| CN107679483A (en) | License plate recognition method and device | |
| CN107679533A (en) | Character recognition method and device | |
| CN109871883B (en) | Neural network training method and device, electronic equipment and storage medium | |
| CN107527059B (en) | Character recognition method and device and terminal | |
| CN106651955A (en) | Method and device for positioning object in picture | |
| CN107798327A (en) | Character identifying method and device | |
| CN106503617A (en) | Model training method and device | |
| CN105809704A (en) | Method and device for identifying image definition | |
| CN106295511A (en) | Face tracking method and device | |
| CN107845062A (en) | Image generation method and device | |
| CN107798669A (en) | Image defogging method, device and computer-readable recording medium | |
| CN107993210A (en) | Image repair method, device and computer-readable recording medium | |
| CN107527053A (en) | Object detection method and device | |
| CN107944447A (en) | Image classification method and device | |
| CN105528078B (en) | Method and device for controlling electronic devices | |
| CN106980840A (en) | Face shape matching method, device and storage medium | |
| CN107527024A (en) | Face attractiveness evaluation method and device | |
| CN108898591A (en) | Image quality scoring method and device, electronic equipment, and readable storage medium | |
| CN107133354A (en) | Method and device for acquiring image description information | |
| CN108717542A (en) | Method, apparatus and computer-readable storage medium for identifying text regions | |
| CN111210844B (en) | Method, device and equipment for determining speech emotion recognition model and storage medium | |
| CN107766820A (en) | Image classification method and device | |
| CN107798314A (en) | Skin color detection method and device | |
| CN107424130A (en) | Picture beautification method and apparatus | |

| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||