CN114255467B - Text recognition method and device, feature extraction neural network training method and device - Google Patents


Info

Publication number
CN114255467B
Authority
CN
China
Prior art keywords
text
characters
neural network
sample
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011003216.5A
Other languages
Chinese (zh)
Other versions
CN114255467A (en)
Inventor
朱远志
王天玮
何梦超
王永攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202011003216.5A
Publication of CN114255467A
Application granted
Publication of CN114255467B
Legal status: Active (Current)
Anticipated expiration


Abstract

Embodiments of this specification provide a text recognition method and device, and a feature extraction neural network training method and device. The text recognition method comprises: obtaining a text image to be recognized; inputting the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image; predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and processing the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image.

Description

Translated from Chinese
Text recognition method and device, feature extraction neural network training method and device

Technical Field

The embodiments of this specification relate to the field of computer technology, and in particular to a text recognition method. One or more embodiments of this specification also relate to a text recognition device, a feature extraction neural network training method, a feature extraction neural network training device, a computing device, and a computer-readable storage medium.

Background Art

With the development of deep learning and the expansion of its application scenarios, the main research object in the field of document OCR (Optical Character Recognition) has shifted from recognizing document text with simple layouts, mainly printed, to recognizing document text with complex layouts, mainly handwritten. At present, the mainstream text recognition decoders in the industry (such as CTC and Attention) can generally only solve the problem of recognizing one-dimensional text or short sequences of two-dimensional text.

Therefore, there is an urgent need for a text recognition method that can accurately recognize long sequences of dense two-dimensional text.

Summary of the Invention

In view of this, the embodiments of this specification provide a text recognition method. One or more embodiments of this specification also relate to a text recognition device, a feature extraction neural network training method, a feature extraction neural network training device, a computing device, and a computer-readable storage medium, so as to solve the technical defects in the prior art.

According to a first aspect of the embodiments of this specification, a text recognition method is provided, comprising:

obtaining a text image to be recognized;

inputting the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image;

predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and

processing the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image.
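The middle step of this aspect can be sketched in a few lines. The function below is an illustrative assumption, not the patent's implementation: it takes a two-dimensional grid of per-cell class scores (what a feature extraction network would output) and produces the two-dimensional prediction result map by taking the argmax class in every cell, with a blank/background class for cells holding no character.

```python
BLANK = ""  # illustrative blank / background class at index 0

def predict_result_map(score_grid, charset):
    """score_grid: H x W cells, each a list of class scores where index 0
    is the blank class. Returns an H x W grid of predicted characters,
    with BLANK for background cells."""
    result = []
    for row in score_grid:
        out_row = []
        for scores in row:
            k = max(range(len(scores)), key=lambda i: scores[i])
            out_row.append(BLANK if k == 0 else charset[k - 1])
        result.append(out_row)
    return result

# Toy 1x3 "feature map" over the charset ["a", "b"]:
grid = [[[0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.2, 0.1, 0.7]]]
print(predict_result_map(grid, ["a", "b"]))  # [['a', '', 'b']]
```

In a real system the score grid would come from the trained network and the final step would merge and order these per-cell characters; here the cell-wise argmax alone already illustrates what the "two-dimensional prediction result map" holds.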

According to a second aspect of the embodiments of this specification, a feature extraction neural network training method is provided, comprising:

constructing an initial feature extraction neural network, and obtaining a training set of sample images containing text, wherein the training set includes sample images and the text labels corresponding to the sample images;

processing a sample image with the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determining a character-position heat map of the sample image based on the two-dimensional feature map;

obtaining a one-dimensional feature map of the sample image according to its two-dimensional feature map and character-position heat map;

classifying the one-dimensional feature map to obtain a text prediction result for the sample image; and

calculating a loss function based on the text label corresponding to the sample image and the text prediction result, and training the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.
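The least obvious step in this aspect is collapsing the two-dimensional feature map into a one-dimensional one using the character-position heat map. The patent only states that the 1-D map is obtained "according to" the two, so the weighting scheme below is a hedged sketch: each column of the 2-D map is pooled over its height, weighted by the heat map so that high-heat (character) positions dominate.

```python
def pool_to_1d(feature_map, heat_map):
    """feature_map: H x W x C nested lists; heat_map: H x W weights.
    Returns a W x C one-dimensional feature sequence, each column of the
    2-D map reduced to one vector by heat-weighted averaging."""
    H, W, C = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    seq = []
    for x in range(W):
        norm = sum(heat_map[y][x] for y in range(H)) or 1.0
        seq.append([
            sum(heat_map[y][x] * feature_map[y][x][c] for y in range(H)) / norm
            for c in range(C)
        ])
    return seq

# 2x2 map with 1 channel; heat concentrated on the top row, so only the
# top-row features survive the pooling:
fm = [[[1.0], [2.0]], [[3.0], [4.0]]]
heat = [[1.0, 1.0], [0.0, 0.0]]
print(pool_to_1d(fm, heat))  # [[1.0], [2.0]]
```

The resulting W-length sequence is what the per-timestep classifier and the CTC-style loss of the last two steps would consume during training on single-line samples.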

According to a third aspect of the embodiments of this specification, a text recognition method is provided, comprising:

displaying an image input interface to a user based on the user's call request;

acquiring the text image to be recognized that the user inputs through the image input interface;

inputting the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image;

predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and

processing the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image, and returning it to the user.

According to a fourth aspect of the embodiments of this specification, a text recognition method is provided, comprising:

receiving a call request sent by a user, wherein the call request carries a text image to be recognized;

inputting the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image;

predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and

processing the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image, and returning it to the user.

According to a fifth aspect of the embodiments of this specification, a feature extraction neural network training method is provided, comprising:

displaying an image input interface to a user based on the user's call request;

receiving a training set of sample images containing text that the user inputs through the image input interface, wherein the training set includes sample images and the text labels corresponding to the sample images;

constructing an initial feature extraction neural network, processing a sample image with the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determining a character-position heat map of the sample image based on the two-dimensional feature map;

obtaining a one-dimensional feature map of the sample image according to its two-dimensional feature map and character-position heat map;

classifying the one-dimensional feature map to obtain a text prediction result for the sample image; and

calculating a loss function based on the text label corresponding to the sample image and the text prediction result, and training the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

According to a sixth aspect of the embodiments of this specification, a feature extraction neural network training method is provided, comprising:

receiving a call request sent by a user, wherein the call request carries a training set of sample images containing text, the training set including sample images and the text labels corresponding to the sample images;

constructing an initial feature extraction neural network;

processing a sample image with the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determining a character-position heat map of the sample image based on the two-dimensional feature map;

obtaining a one-dimensional feature map of the sample image according to its two-dimensional feature map and character-position heat map;

classifying the one-dimensional feature map to obtain a text prediction result for the sample image; and

calculating a loss function based on the text label corresponding to the sample image and the text prediction result, training the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network, and returning it to the user.

According to a seventh aspect of the embodiments of this specification, a text recognition device is provided, comprising:

a first acquisition module, configured to acquire a text image to be recognized;

a first obtaining module, configured to input the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image;

a second obtaining module, configured to predict the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and

a third obtaining module, configured to process the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image.

According to an eighth aspect of the embodiments of this specification, a feature extraction neural network training device is provided, comprising:

a second acquisition module, configured to construct an initial feature extraction neural network and obtain a training set of sample images containing text, wherein the training set includes sample images and the text labels corresponding to the sample images;

a third obtaining module, configured to process a sample image with the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determine a character-position heat map of the sample image based on the two-dimensional feature map;

a fourth obtaining module, configured to obtain a one-dimensional feature map of the sample image according to its two-dimensional feature map and character-position heat map;

a fifth obtaining module, configured to classify the one-dimensional feature map to obtain a text prediction result for the sample image; and

a sixth obtaining module, configured to calculate a loss function based on the text label corresponding to the sample image and the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

According to a ninth aspect of the embodiments of this specification, a text recognition device is provided, comprising:

a first interface display module, configured to display an image input interface to a user based on the user's call request;

a third acquisition module, configured to acquire the text image to be recognized that the user inputs through the image input interface;

a seventh obtaining module, configured to input the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image;

an eighth obtaining module, configured to predict the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and

a ninth obtaining module, configured to process the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image and return it to the user.

According to a tenth aspect of the embodiments of this specification, a text recognition device is provided, comprising:

a first request receiving module, configured to receive a call request sent by a user, wherein the call request carries a text image to be recognized;

a first processing module, configured to input the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image;

a first prediction module, configured to predict the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and

a second processing module, configured to process the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image and return it to the user.

According to an eleventh aspect of the embodiments of this specification, a feature extraction neural network training device is provided, comprising:

a second interface display module, configured to display an image input interface to a user based on the user's call request;

a sample receiving module, configured to receive a training set of sample images containing text that the user inputs through the image input interface, wherein the training set includes sample images and the text labels corresponding to the sample images;

a third processing module, configured to construct an initial feature extraction neural network, process a sample image with the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determine a character-position heat map of the sample image based on the two-dimensional feature map;

a tenth obtaining module, configured to obtain a one-dimensional feature map of the sample image according to its two-dimensional feature map and character-position heat map;

a first classification module, configured to classify the one-dimensional feature map to obtain a text prediction result for the sample image; and

a first training module, configured to calculate a loss function based on the text label corresponding to the sample image and the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

According to a twelfth aspect of the embodiments of this specification, a feature extraction neural network training device is provided, comprising:

a second request receiving module, configured to receive a call request sent by a user, wherein the call request carries a training set of sample images containing text, the training set including sample images and the text labels corresponding to the sample images;

a fourth processing module, configured to construct an initial feature extraction neural network, process a sample image with the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determine a character-position heat map of the sample image based on the two-dimensional feature map;

an eleventh obtaining module, configured to obtain a one-dimensional feature map of the sample image according to its two-dimensional feature map and character-position heat map;

a second classification module, configured to classify the one-dimensional feature map to obtain a text prediction result for the sample image; and

a second training module, configured to calculate a loss function based on the text label corresponding to the sample image and the text prediction result, train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network, and return it to the user.

According to a thirteenth aspect of the embodiments of this specification, a computing device is provided, comprising:

a memory and a processor;

the memory being used to store computer-executable instructions, and the processor being used to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the text recognition method or the steps of the feature extraction neural network training method.

According to a fourteenth aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores computer-executable instructions that, when executed by a processor, implement the steps of the text recognition method or the steps of the feature extraction neural network training method.

One embodiment of this specification implements a text recognition method and device. The text recognition method includes: obtaining a text image to be recognized; inputting the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image; predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters; and processing the predicted characters in the two-dimensional prediction result map to obtain a recognition result of the characters in the text image. Using a feature extraction neural network trained on single-line text images, the method produces an accurate two-dimensional feature map prediction for a text image containing multiple lines of text, and by processing the predicted characters in this map it can recover the exact text characters in the image.

Brief Description of the Drawings

FIG. 1 is an example diagram of a specific application scenario of a text recognition method provided by an embodiment of this specification;

FIG. 2 is a flowchart of a first text recognition method provided by an embodiment of this specification;

FIG. 3 is a flowchart of obtaining the recognition result of a text image from the two-dimensional prediction result map in a text recognition method provided by an embodiment of this specification;

FIG. 4 is a flowchart of a first feature extraction neural network training method provided by an embodiment of this specification;

FIG. 5 is a schematic diagram of a specific training process of a feature extraction neural network training method provided by an embodiment of this specification;

FIG. 6 is a flowchart of a second text recognition method provided by an embodiment of this specification;

FIG. 7 is a flowchart of a third text recognition method provided by an embodiment of this specification;

FIG. 8 is a flowchart of a second feature extraction neural network training method provided by an embodiment of this specification;

FIG. 9 is a flowchart of a third feature extraction neural network training method provided by an embodiment of this specification;

FIG. 10 is a schematic structural diagram of a first text recognition device provided by an embodiment of this specification;

FIG. 11 is a schematic structural diagram of a second text recognition device provided by an embodiment of this specification;

FIG. 12 is a schematic structural diagram of a third text recognition device provided by an embodiment of this specification;

FIG. 13 is a schematic structural diagram of a first feature extraction neural network training device provided by an embodiment of this specification;

FIG. 14 is a schematic structural diagram of a second feature extraction neural network training device provided by an embodiment of this specification;

FIG. 15 is a schematic structural diagram of a third feature extraction neural network training device provided by an embodiment of this specification;

FIG. 16 is a structural block diagram of a computing device provided by an embodiment of this specification.

Detailed Description

Many specific details are set forth in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its substance; this specification is therefore not limited to the specific implementations disclosed below.

The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit them. The singular forms "a", "said" and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used to describe various information in one or more embodiments of this specification, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".

First, the terms involved in one or more embodiments of this specification are explained.

OCR: Optical Character Recognition.

CTC: Connectionist Temporal Classification, a sequence decoder that aligns per-timestep label predictions with an unsegmented target sequence.

Attention: the attention mechanism.

Dense prediction: predicting a class for every feature point on the feature map.

This specification provides a text recognition method. One or more embodiments of this specification also relate to a text recognition device, a feature extraction neural network training method, a feature extraction neural network training device, a computing device, and a computer-readable storage medium, each of which is described in detail in the following embodiments.

In the field of document OCR, for paragraph-level text (i.e., multi-line text), existing recognition techniques can generally only handle one-dimensional text or short sequences of two-dimensional text. To recognize paragraph-level dense text, they must first perform text detection to find the detection box or features of each single line of text, then run one-dimensional text recognition on each line, and finally concatenate the per-line results in order to form the overall recognition result. However, the performance of this detect-then-recognize mechanism is limited by the accuracy and speed of the detector; in particular, for dense text with many lines, it is difficult to separate the lines precisely with quadrilateral boxes, and inaccurately detected text severely degrades the accuracy of the subsequent recognition. Therefore, the embodiments of this specification propose a paragraph-level dense text recognition method based on extended CTC. The method needs no detector: it learns a character-position heat map on single-line text solely by constructing pseudo labels, and trains the feature extraction neural network (for example, a convolutional neural network) by CTC decoding. In actual application, it can directly perform dense prediction on paragraph-level dense text and obtain the final text recognition result through post-processing. The text recognition method of the embodiments of this specification can effectively improve both the recognition accuracy and the recognition efficiency for paragraph-level dense text.
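The CTC decoding mentioned above can be illustrated in its simplest (greedy, best-path) form: collapse consecutive repeated labels, then drop blanks. This is standard CTC decoding shown as a sketch of what the training relies on, not the patent's exact implementation.

```python
BLANK = "-"  # illustrative blank symbol

def ctc_greedy_decode(labels):
    """labels: the best label at each timestep, e.g. list('hh-e-ll-llo').
    Collapses consecutive repeats, removes blanks, and returns the string."""
    out, prev = [], None
    for ch in labels:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_greedy_decode(list("hh-e-ll-llo")))  # hello
```

Note how the blank lets CTC represent genuinely repeated characters: "ll" survives only because a blank separates the two runs of "l", which is exactly why per-timestep predictions can be trained against an unsegmented transcript.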

参见图1,图1示出了根据本说明书一个实施例提供的一种文本识别方法的具体应用场景的示例图。Referring to FIG. 1 , FIG. 1 shows an example diagram of a specific application scenario of a text recognition method provided according to an embodiment of the present specification.

图1的应用场景中包括终端和服务器，具体的，用户通过终端将包含多行文本的文本图片发送给服务器，服务器在接收到包含多行文本的文本图片后，将该文本图片输入预先训练的卷积神经网络中，卷积获得该文本图片的二维特征图，然后对该二维特征图中每个特征点对应的字符进行分类(即字符预测)，获得二维密集预测结果图，其中，该二维密集预测结果图中存放的为该文本图片中的文本字符；最后对该二维密集预测结果图中的文本字符进行八邻域合并、遍历、排序等后处理，形成最终的该文本图片中的字符，即该文本图片中的多行文本内容。The application scenario of Figure 1 includes a terminal and a server. Specifically, a user sends a text image containing multiple lines of text to a server through a terminal. After receiving the text image containing multiple lines of text, the server inputs the text image into a pre-trained convolutional neural network, convolves to obtain a two-dimensional feature map of the text image, and then classifies the characters corresponding to each feature point in the two-dimensional feature map (i.e., character prediction) to obtain a two-dimensional dense prediction result map, wherein the two-dimensional dense prediction result map stores the text characters in the text image; finally, the text characters in the two-dimensional dense prediction result map undergo post-processing such as eight-neighborhood merging, traversal, and sorting to form the final characters in the text image, i.e., the multiple lines of text content in the text image.

参见图2,图2示出了根据本说明书一个实施例提供的第一种文本识别方法的流程图,具体包括以下步骤。Referring to FIG. 2 , FIG. 2 shows a flow chart of a first text recognition method provided according to an embodiment of the present specification, which specifically includes the following steps.

步骤202:获取待识别的文本图片。Step 202: Obtain a text image to be recognized.

其中,文本图片包括但不限于包含任何文本字符的图片,例如包含文字、数字、符号和/或特殊字符等文本的文本图片。实际应用中,该文本图片为包含篇幅级密集文本的文本图片,即包含多行文本的文本图片,例如50行、100行或者200行文本的文本图片。The text image includes but is not limited to an image containing any text characters, such as a text image containing texts such as words, numbers, symbols and/or special characters. In practical applications, the text image is a text image containing dense text, that is, a text image containing multiple lines of text, such as a text image of 50 lines, 100 lines or 200 lines of text.

具体的,获取待识别的文本图片即可理解为获取待识别的包含多行文本的文本图片,在后续处理时,均是针对该包含多行文本的文本图片进行的处理。Specifically, obtaining a text image to be recognized can be understood as obtaining a text image to be recognized containing multiple lines of text. In subsequent processing, the processing is performed on the text image containing multiple lines of text.

步骤204:将所述文本图片输入特征提取神经网络进行处理,获得所述文本图片的二维特征图,其中,所述二维特征图中包括所述文本图片中的字符。Step 204: Input the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image.

其中,特征提取神经网络可以为卷积神经网络,也可以是其他可以实现对文本图片进行处理进行二维特征图获取的机器学习网络,本申请对此不做任何限定,为了便于理解,本说明书实施例中均以特征提取神经网络为卷积神经网络为例进行详细介绍。Among them, the feature extraction neural network can be a convolutional neural network, or it can be other machine learning networks that can process text images to obtain two-dimensional feature maps. The present application does not impose any limitation on this. For ease of understanding, the embodiment of this specification uses the convolutional neural network as an example for detailed introduction.

具体实施时,在将文本图片输入特征提取神经网络之前,会对该特征提取神经网络进行训练,以使得该特征提取神经网络在具体使用时可以准确的输出该文本图片的二维特征图。In a specific implementation, before the text image is input into the feature extraction neural network, the feature extraction neural network will be trained so that the feature extraction neural network can accurately output the two-dimensional feature map of the text image when it is used specifically.

具体的,所述特征提取神经网络的训练步骤如下:Specifically, the training steps of the feature extraction neural network are as follows:

构建初始特征提取神经网络,并获取包含文本的样本图片训练集,其中,所述样本图片训练集中包括样本图片以及所述样本图片对应的文本标签;Constructing an initial feature extraction neural network and obtaining a sample image training set containing text, wherein the sample image training set includes sample images and text labels corresponding to the sample images;

基于所述初始特征提取神经网络对所述样本图片进行处理,获得所述样本图片的二维特征图,并基于所述二维特征图确定所述样本图片的字符位置高热图;Processing the sample image based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determining a character position high heat map of the sample image based on the two-dimensional feature map;

根据所述样本图片的二维特征图以及字符位置高热图获得所述样本图片的一维特征图;Obtaining a one-dimensional feature map of the sample image according to the two-dimensional feature map of the sample image and the character position high heat map;

对所述一维特征图进行分类,以获得所述样本图片的文本预测结果;Classifying the one-dimensional feature map to obtain a text prediction result of the sample image;

基于所述样本图片对应的文本标签以及文本预测结果计算损失函数,且根据所述损失函数对所述初始特征提取神经网络进行训练,获得所述特征提取神经网络。A loss function is calculated based on the text label corresponding to the sample image and the text prediction result, and the initial feature extraction neural network is trained according to the loss function to obtain the feature extraction neural network.

其中，样本图片与上述待识别的文本图片不同，样本图片为包含单行文本的样本图片，即每个样本图片中仅包含一行由文字、数字、符号和/或特殊字符等组成的文本，而待识别的文本图片则为包含多行文本的篇幅级密集文本图片。Among them, the sample image is different from the above-mentioned text image to be recognized. The sample image contains a single line of text, that is, each sample image contains only one line of text consisting of words, numbers, symbols and/or special characters, while the text image to be recognized is a paragraph-level dense text image containing multiple lines of text.

在样本图片为单行样本图片的情况下,每个样本图片对应的样本标签则为每个样本图片中的真实的文本内容,例如样本图片中包括“我爱中国”,那么该样本图片对应的文本标签即为“我爱中国”。When the sample images are single-row sample images, the sample label corresponding to each sample image is the actual text content in each sample image. For example, if the sample image includes "I love China", then the text label corresponding to the sample image is "I love China".

具体实施时,首先构建初始特征提取神经网络,该初始特征提取神经网络包括输入层、隐含层、卷积层、池化层以及全连接层等,然后将获取的样本图片通过初始特征提取神经网络的输入层,输入到初始特征提取神经网络中去,通过初始特征提取神经网络的卷积层对其进行卷积处理,获得该样本图片对应的二维特征图,再对该二维特征图进行一些卷积操作,获得该样本图片的字符位置高热图;根据该样本图片的二维特征图以及字符位置高热图计算获得该样本图片的一维特征图;通过该一维特征图对该样本图片中的文字进行分类预测,获得该样本图片的文本预测结果;最后基于该样本图片对应的真实的文本标签以及该样本图片的文本预测结果计算该初始特征提取神经网络的损失函数,利用该损失函数通过反向传播的原理对该初始特征提取神经网络的各层参数进行调整,以获得最终训练后的特征提取神经网络。In the specific implementation, firstly, an initial feature extraction neural network is constructed, which includes an input layer, a hidden layer, a convolution layer, a pooling layer and a fully connected layer, etc., then the obtained sample image is input into the initial feature extraction neural network through the input layer of the initial feature extraction neural network, and the sample image is convolved through the convolution layer of the initial feature extraction neural network to obtain a two-dimensional feature map corresponding to the sample image, and then some convolution operations are performed on the two-dimensional feature map to obtain a character position high heat map of the sample image; a one-dimensional feature map of the sample image is calculated based on the two-dimensional feature map of the sample image and the character position high heat map; the text in the sample image is classified and predicted through the one-dimensional feature map to obtain a text prediction result of the sample image; finally, the loss function of the initial feature extraction neural network is calculated based on the real text label corresponding to the sample image and the text prediction result of the sample image, and the loss function is used to adjust the parameters of each layer of the initial feature extraction neural network through the principle of back propagation to obtain the feature extraction neural network after final training.

本说明书实施例中,在对特征提取神经网络训练时,仅使用单行文本的文本图片对该特征提取神经网络进行训练,即可使得训练后获得的该特征提取神经网络应用于对多行、篇幅级密集文本的识别方法中,避免了文本检测器的使用,从而避免了检测器性能带来的精度损失,且通过该特征提取神经网络可以极大的提高实际应用中对篇幅级密集文本的文本图片的识别效率。In the embodiments of the present specification, when training the feature extraction neural network, only text images of a single line of text are used to train the feature extraction neural network, so that the feature extraction neural network obtained after training can be applied to the recognition method of multi-line, paragraph-level dense text, avoiding the use of a text detector, thereby avoiding the loss of accuracy caused by the detector performance, and the feature extraction neural network can greatly improve the recognition efficiency of text images of paragraph-level dense text in practical applications.

实际使用中，为了实现在生成字符位置高热图的时候对该字符位置高热图进行监督，保证特征提取神经网络的推理准确性，则可以通过生成伪标签的形式对字符位置高热图进行监督，具体实现方式如下所述：In actual use, in order to supervise the character position high heat map when generating it and ensure the inference accuracy of the feature extraction neural network, the character position high heat map can be supervised in the form of pseudo labels. The specific implementation method is as follows:

所述获取包含文本的样本图片训练集之后,还包括:After obtaining the sample image training set containing text, the method further includes:

对所述样本图片进行预处理,以生成所述样本图片的伪标签。The sample image is preprocessed to generate a pseudo label for the sample image.

具体的，对样本图片进行预处理，包括但不限于对样本图片做5*5腐蚀、Otsu二值化、反相以及等比例缩放等，其中，Otsu二值化方法是一种全局阈值分割方法，是一种对图像进行二值化的高效算法，通过对样本图片的上述一系列预处理，获得预处理后的样本图片，而预处理后的样本图片即为伪标签。Specifically, the sample images are preprocessed, including but not limited to 5×5 erosion, Otsu binarization, inversion, and proportional scaling of the sample images. The Otsu binarization method is a global threshold segmentation method and an efficient algorithm for binarizing images. By performing the above series of preprocessing steps on the sample images, the preprocessed sample images are obtained, and the preprocessed sample images serve as the pseudo labels.
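The Otsu step of this preprocessing can be sketched in pure Python as follows. This is an illustrative minimal implementation, not the patent's exact code; a production pipeline would more likely call an image library (e.g., OpenCV's threshold with the Otsu flag) and would also apply the 5×5 erosion and proportional scaling steps:

```python
def otsu_threshold(pixels):
    """Return the global threshold (Otsu's method) that maximizes the
    between-class variance over a list of 8-bit grayscale pixel values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0   # running intensity sum of the background class
    w_bg = 0       # running pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize_and_invert(pixels):
    """Binarize with the Otsu threshold, then invert so dark text strokes
    become the activated (1) class, as a character-position pseudo label."""
    t = otsu_threshold(pixels)
    return [0 if p > t else 1 for p in pixels]
```

Inversion is folded into `binarize_and_invert`: on a typical document image (dark ink on light paper), the pixels at or below the Otsu threshold are the strokes, and they end up as the "hot" class of the pseudo label.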

实际应用时,在对样本图片的二维特征图进行卷积获得字符位置高热图的情况下,会利用伪标签对该字符位置高热图进行监督,以保证字符位置高热图的准确性,具体实现方式如下所述:In actual application, when the two-dimensional feature map of the sample image is convolved to obtain the character position high heat map, the character position high heat map will be supervised by pseudo labels to ensure the accuracy of the character position high heat map. The specific implementation method is as follows:

所述基于所述二维特征图确定所述样本图片的字符位置高热图包括:The determining of the character position high heat map of the sample image based on the two-dimensional feature map comprises:

基于所述样本图片的伪标签以及二维特征图确定所述样本图片的字符位置高热图。A character position heat map of the sample image is determined based on the pseudo label of the sample image and the two-dimensional feature map.

具体实施时,预处理获得的样本图片的伪标签与基于二维特征图卷积生成的字符位置高热图为同样尺寸,通过字符位置高热图中的像素点的激活值与伪标签中的像素点的激活值计算出一个均方误差损失,基于该均方误差损失对该字符位置高热图进行调整,以获得最终的、准确的字符位置高热图。In the specific implementation, the pseudo-label of the sample image obtained by preprocessing is of the same size as the character position high heat map generated based on the convolution of the two-dimensional feature map. A mean square error loss is calculated by the activation value of the pixel point in the character position high heat map and the activation value of the pixel point in the pseudo-label. The character position high heat map is adjusted based on the mean square error loss to obtain the final and accurate character position high heat map.
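A minimal sketch of this per-pixel supervision, assuming the heat map and the same-sized pseudo label have been flattened into equal-length lists of activation values:

```python
def heatmap_mse(heatmap, pseudo_label):
    """Mean squared error between the predicted character-position heat map
    activations and the pseudo-label activations (flattened, equal length)."""
    assert len(heatmap) == len(pseudo_label), "heat map and pseudo label must be the same size"
    n = len(heatmap)
    return sum((h, y) == (h, y) and (h - y) ** 2 for h, y in zip(heatmap, pseudo_label)) / n
```

During training, this loss term would be added to the CTC loss so that the heat-map branch is pulled toward the pseudo label.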

实际应用中，由于通过二维特征图仅可以实现对每个字符的高热，但是不清楚每个高热字符的位置，那么此时就需要通过一定的字符位置序列才能实现对每个字符位置的准确排列，因此生成伪标签就可以在生成字符位置高热图的时候确定哪些位置的字符进行高热，从而生成准确的字符位置高热图。In practical applications, the two-dimensional feature map alone can only produce a high activation for each character, while the position of each highly activated character remains unclear; a character position sequence is therefore required to accurately arrange the characters. Generating pseudo labels makes it possible to determine which positions should be highly activated when generating the character position high heat map, thereby producing an accurate character position high heat map.

本说明书实施例中,通过生成伪标签,监督学习出单行文本图片中字符位置高热图,在后续使用该特征提取神经网络对文本识别时,可以直接对篇幅级密集文本上的每一个特征点进行预测分类,从而每个有字符的位置上就会产生一个相应的结果,保证识别文本的完整性。In the embodiments of the present specification, pseudo labels are generated and supervised learning is performed to obtain a heat map of the character positions in a single-line text image. When the feature extraction neural network is subsequently used for text recognition, each feature point on the dense text at the page level can be directly predicted and classified, so that a corresponding result will be generated at each position where a character is located, thereby ensuring the integrity of the recognized text.

本说明书另一实施例中,所述根据所述样本图片的二维特征图以及字符位置高热图获得所述样本图片的一维特征图包括:In another embodiment of the present specification, obtaining a one-dimensional feature map of the sample image according to the two-dimensional feature map of the sample image and the character position high heat map includes:

将所述样本图片的二维特征图以及字符位置高热图进行垂直维度求和,获得所述样本图片的一维特征图。The two-dimensional feature map of the sample image and the character position high heat map are summed in the vertical dimension to obtain a one-dimensional feature map of the sample image.

具体的,由于通过单行文本的样本图片对特征提取神经网络进行训练,因此需要获得一维图片上的文本的识别结果,那么在获得样本图片的二维特征图以及字符位置高热图之后,将样本图片的二维特征图以及字符位置高热图进行垂直求和,以获得该样本图片的一维特征图,便于后续可以基于一维特征图中的预测结果对该特征提取神经网络的训练参数进行调整,保证该特征提取神经网络的预测准确性。Specifically, since the feature extraction neural network is trained through sample images of single-line text, it is necessary to obtain the recognition results of the text on the one-dimensional image. Therefore, after obtaining the two-dimensional feature map and the character position high heat map of the sample image, the two-dimensional feature map and the character position high heat map of the sample image are vertically summed to obtain the one-dimensional feature map of the sample image, so that the training parameters of the feature extraction neural network can be adjusted based on the prediction results in the one-dimensional feature map to ensure the prediction accuracy of the feature extraction neural network.
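One plausible reading of this vertical-dimension sum, sketched with plain Python lists (the exact weighting used by the original method may differ): each column of the H×W map is collapsed into a single value, weighted cell-by-cell by the character-position heat map, yielding a length-W one-dimensional sequence:

```python
def collapse_vertical(feature_map, heatmap):
    """Collapse an H x W feature map into a length-W sequence by summing each
    column, with each cell weighted by the character-position heat map.
    feature_map and heatmap are both H x W nested lists of floats."""
    H = len(feature_map)
    W = len(feature_map[0])
    return [sum(feature_map[r][c] * heatmap[r][c] for r in range(H))
            for c in range(W)]
```

In the real network this collapse would be applied per channel, producing a W×C sequence that the classifier then decodes left to right.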

具体实施时,所述对所述一维特征图进行分类,以获得所述样本图片的文本预测结果包括:In a specific implementation, the classifying of the one-dimensional feature map to obtain the text prediction result of the sample image includes:

对所述一维特征图进行分类,获得所述样本图片的初始文本预测结果;Classifying the one-dimensional feature map to obtain an initial text prediction result of the sample image;

对所述初始文本预测结果进行CTC解码,以获得所述样本图片的文本预测结果。The initial text prediction result is subjected to CTC decoding to obtain a text prediction result of the sample image.

具体的,在获得样本图片的一维特征图之后,还会对一维特征图进行分类,即对一维特征图中的字符进行预测,以获得样本图片的初始文本预测结果,然后对该样本图片的初始文本预测结果进行CTC解码,即去除重复预测的字符,以获得样本图片的最终的文本预测结果。Specifically, after obtaining the one-dimensional feature map of the sample image, the one-dimensional feature map will be classified, that is, the characters in the one-dimensional feature map will be predicted to obtain the initial text prediction result of the sample image, and then the initial text prediction result of the sample image will be CTC decoded, that is, the repeatedly predicted characters will be removed to obtain the final text prediction result of the sample image.
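The CTC decoding step described here (removing repeatedly predicted characters) is the standard greedy CTC collapse; a minimal sketch, assuming label id 0 is reserved for the CTC blank symbol:

```python
BLANK = 0  # index reserved for the CTC blank symbol (an assumption of this sketch)

def ctc_greedy_decode(frame_label_ids):
    """Standard CTC greedy decoding: collapse runs of repeated labels, then
    drop blanks, yielding the final label sequence."""
    out, prev = [], None
    for label in frame_label_ids:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out
```

Note that a blank between two identical labels keeps them separate, so genuinely doubled characters (e.g. "ll") survive decoding.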

本说明书实施例中,通过对一维特征图分类预测的初始文本预测结果进行CTC解码,去除重复预测的字符,以保证预测获得的样本图片的文本预测结果的准确性。In the embodiments of the present specification, CTC decoding is performed on the initial text prediction result of the one-dimensional feature map classification prediction to remove the repeatedly predicted characters, so as to ensure the accuracy of the text prediction result of the sample image obtained by prediction.

而在获得最终的样本图片的文本预测结果后,将该样本图片的文本预测结果与该样本图片真实的文本标签进行比对,对该初始特征提取神经网络进行CTC损失计算,通过计算获得的损失函数实现对初始特征提取神经网络的训练,以获得最终的调整后的精确的特征提取神经网络。After obtaining the final text prediction result of the sample image, the text prediction result of the sample image is compared with the actual text label of the sample image, and the CTC loss is calculated for the initial feature extraction neural network. The initial feature extraction neural network is trained through the calculated loss function to obtain the final adjusted and accurate feature extraction neural network.

本说明书实施例中,在对特征提取神经网络训练时,使用单行文本的文本图片基于CTC损失函数对特征提取神经网络进行训练,通过生成伪标签的方式,监督学习出单行文本中字符位置的高热图,以及通过CTC解码的方式对预测的一维特征图中的初始文本预测结果进行处理,保证训练获得的特征提取神经网络的识别精度以及识别效率。In the embodiments of the present specification, when training the feature extraction neural network, a text image of a single line of text is used to train the feature extraction neural network based on the CTC loss function. By generating pseudo labels, a high heat map of the character positions in the single line of text is supervised and learned, and the initial text prediction results in the predicted one-dimensional feature map are processed by CTC decoding to ensure the recognition accuracy and efficiency of the feature extraction neural network obtained by training.

步骤206:对所述二维特征图中的字符进行预测,获得包含预测字符的二维预测结果图。Step 206: predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters.

其中,预测字符可以理解为预测获得的文本图片中的字符。The predicted characters may be understood as characters in the predicted text image.

具体的,对文本图片的二维特征图中的字符进行预测,以获得该文本图片的二维预测结果图,该二维预测结果图中存放的为对二维特征图中的字符进行预测获得的预测结果。Specifically, characters in a two-dimensional feature map of a text image are predicted to obtain a two-dimensional prediction result map of the text image, wherein the two-dimensional prediction result map stores prediction results obtained by predicting the characters in the two-dimensional feature map.

步骤208:对所述二维预测结果图中的预测字符进行处理,获得所述文本图片中的字符的识别结果。Step 208: Process the predicted characters in the two-dimensional prediction result image to obtain recognition results of the characters in the text image.

具体的,所述对所述二维预测结果图中的预测字符进行处理,获得所述文本图片中的字符的识别结果包括:Specifically, the processing of the predicted characters in the two-dimensional prediction result image to obtain the recognition results of the characters in the text image includes:

对所述二维预测结果图中的预测字符进行八邻域合并,获得所述预测字符合并后生成的目标字符,其中,所述目标字符携带有在所述二维预测结果图中的横坐标和纵坐标;Merging eight neighborhoods of the predicted characters in the two-dimensional prediction result graph to obtain a target character generated after the predicted characters are merged, wherein the target character carries a horizontal coordinate and a vertical coordinate in the two-dimensional prediction result graph;

基于所述目标字符的横坐标和纵坐标对所述目标字符进行排列,根据所述目标字符的排列结果确定所述文本图片中的字符的识别结果。The target characters are arranged based on the horizontal coordinates and the vertical coordinates of the target characters, and the recognition results of the characters in the text image are determined according to the arrangement results of the target characters.

其中，八邻域合并的作用是为了将重复预测获得的字符合并为一个字符，而不会影响不同邻域的其他相同字符。Among them, the purpose of eight-neighborhood merging is to merge the characters obtained by repeated prediction into one character without affecting other identical characters in different neighborhoods.

举例说明,二维预测结果图中的预测字符为“我我爱中国,我是中国人”,那么对该二维预测结果图中的预测字符进行八邻域合并之后,获得的预测字符合并后的目标字符则为“我爱中国,我是中国人”。其中,每个目标字符均携带有在二维预测结果图中的纵坐标和横坐标。For example, if the predicted characters in the two-dimensional prediction result graph are "I love China, I am Chinese", then after the eight-neighborhood merging of the predicted characters in the two-dimensional prediction result graph, the target characters after the merging of the predicted characters are "I love China, I am Chinese". Each target character carries the ordinate and abscissa in the two-dimensional prediction result graph.

具体实施时,先对二维预测结果图中的预测字符进行八邻域合并,获得目标字符,然后根据目标字符的横坐标、纵坐标对该目标字符进行排列,将排列后的目标字符作为该文本图片中的字符的目标识别结果,即该文本图片中的具体文本内容。In the specific implementation, the predicted characters in the two-dimensional prediction result image are first merged into eight neighborhoods to obtain the target characters, and then the target characters are arranged according to the horizontal and vertical coordinates of the target characters, and the arranged target characters are used as the target recognition results of the characters in the text image, that is, the specific text content in the text image.
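The eight-neighborhood merge can be sketched as connected-component grouping over the prediction grid: 8-connected cells carrying the same predicted character are merged into one target character whose coordinates are the mean of the merged cells. This is an illustrative reading of the step, not the patent's exact implementation:

```python
from collections import deque

# Offsets of the 8 neighbours of a cell (the "eight-neighborhood").
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def merge_eight_neighborhood(pred_grid):
    """Merge 8-connected cells that predict the same character into one target
    character, keeping the mean (x, y) of the merged cells as its coordinates.
    pred_grid[row][col] is a character, or None where nothing was predicted."""
    H, W = len(pred_grid), len(pred_grid[0])
    seen = [[False] * W for _ in range(H)]
    targets = []
    for r in range(H):
        for c in range(W):
            if pred_grid[r][c] is None or seen[r][c]:
                continue
            ch, cells = pred_grid[r][c], []
            q = deque([(r, c)])
            seen[r][c] = True
            while q:  # BFS over the 8-connected component of identical predictions
                cr, cc = q.popleft()
                cells.append((cr, cc))
                for dr, dc in NEIGHBORS:
                    nr, nc = cr + dr, cc + dc
                    if 0 <= nr < H and 0 <= nc < W and not seen[nr][nc] \
                            and pred_grid[nr][nc] == ch:
                        seen[nr][nc] = True
                        q.append((nr, nc))
            x = sum(cc for _, cc in cells) / len(cells)   # abscissa
            y = sum(cr for cr, _ in cells) / len(cells)   # ordinate
            targets.append((ch, x, y))
    return targets
```

Because merging only follows 8-connected cells with the same label, duplicated predictions of one character collapse to a single target, while the same character appearing elsewhere in the image (a different neighborhood) stays separate.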

本说明书实施例中,采用八邻域合并的方式对二维预测结果图中的预测字符进行去重,在获得准确的目标字符的同时,也可以节省对目标字符的处理时间,提高目标字符的排列效率。In the embodiment of the present specification, the predicted characters in the two-dimensional prediction result image are deduplicated by merging eight neighborhoods. While obtaining accurate target characters, the processing time of the target characters can also be saved, and the arrangement efficiency of the target characters can be improved.

本说明书另一实施例中,所述基于所述目标字符的横坐标和纵坐标对所述目标字符进行排列,根据所述目标字符的排列结果确定所述文本图片中的字符的识别结果包括:In another embodiment of the present specification, the arranging the target characters based on the horizontal coordinates and the vertical coordinates of the target characters, and determining the recognition results of the characters in the text image according to the arrangement results of the target characters includes:

基于所述目标字符的横坐标对所述目标字符按照从小到大的顺序进行排序,以形成排序后的第一目标字符集合;sorting the target characters in ascending order based on the horizontal coordinates of the target characters to form a sorted first target character set;

将所述第一目标字符集合中的第一位目标字符存放至第二目标字符集合中,且将所述第一位目标字符的纵坐标确定为当前滑动平均值,以及删除所述第一目标字符集合中的第一位目标字符;storing the first target character in the first target character set into the second target character set, determining the ordinate of the first target character as the current sliding average, and deleting the first target character in the first target character set;

根据所述当前滑动平均值对所述第一目标字符集合中的、除所述第一位目标字符之外的其他目标字符进行遍历,以获得第三目标字符集合,其中,所述第三目标字符集合中包括至少一个第二目标字符集合;traversing the other target characters in the first target character set except the first target character according to the current sliding average value to obtain a third target character set, wherein the third target character set includes at least one second target character set;

基于每个第二目标字符集合中的第一位目标字符的纵坐标,对所述第三目标字符集合中的第二目标字符集合进行排序后合并,并根据合并结果确定所述文本图片中的字符的识别结果。Based on the ordinate of the first target character in each second target character set, the second target character sets in the third target character set are sorted and then merged, and the recognition result of the characters in the text image is determined according to the merged result.

其中,第一目标字符集合与第二目标字符集合均为存放目标字符的集合,不同的是,第一目标字符集合中初始包含的是全部的目标字符,而第二目标字符集合中包含的是基于横坐标和纵坐标确定的位于同一行的目标字符,且第一目标字符集合为一个,第二目标字符集合为至少一个,一般对于篇幅级密集文本的文本图片来讲,第二目标字符集合为多个。Among them, the first target character set and the second target character set are both sets for storing target characters. The difference is that the first target character set initially contains all target characters, while the second target character set contains target characters located in the same row determined based on the horizontal and vertical coordinates. There is one first target character set and at least one second target character set. Generally, for text images with dense text at the page level, there are multiple second target character sets.

具体的，首先基于每个目标字符的横坐标对所有的目标字符按照从小到大的顺序进行排序，将排序后的目标字符存放至第一目标字符集合，将第一目标字符集合中的第一位目标字符，即横坐标最小的目标字符存放至第二目标字符集合中，并将该第一位目标字符的纵坐标作为当前滑动平均值，且删除第一目标字符集合中的第一位目标字符。Specifically, firstly, all target characters are sorted in ascending order based on the horizontal coordinate of each target character, and the sorted target characters are stored in the first target character set. The first target character in the first target character set, that is, the target character with the smallest horizontal coordinate, is stored in the second target character set, and the vertical coordinate of the first target character is used as the current sliding average, and the first target character in the first target character set is deleted.

然后基于当前滑动平均值对该第一目标字符集合中的、除第一位目标字符之外的其他目标字符进行遍历,以获得包含至少一个第二目标字符集合的第三目标字符集合;最后,获取每个第二目标字符集合中的第一位目标字符的纵坐标,基于该纵坐标对所有的第二目标字符集合进行排序合并,将排序合并后的所有的第二目标字符集合中的目标字符作为该文本图片中的字符的识别结果。Then, based on the current sliding average, the other target characters in the first target character set except the first target character are traversed to obtain a third target character set including at least one second target character set; finally, the vertical coordinate of the first target character in each second target character set is obtained, and all the second target character sets are sorted and merged based on the vertical coordinate, and the target characters in all the sorted and merged second target character sets are used as the recognition results of the characters in the text image.

本说明书实施例中，基于目标字符的横坐标和纵坐标，先获取出每一行的目标字符，再对每一行的目标字符基于纵坐标进行排序合并，以实现对文本图片中的字符的准确识别，避免出现行字符遗漏以及列文本字符不对应的情况发生，提升用户体验。In the embodiments of the present specification, based on the horizontal and vertical coordinates of the target characters, the target characters of each row are first obtained, and then the target characters of each row are sorted and merged based on the vertical coordinates, so as to achieve accurate recognition of the characters in the text image, avoid the omission of row characters and the mismatch of column text characters, and improve the user experience.

本说明书另一实施例中，所述第一位目标字符为第i位目标字符，其中，i∈[1,n]，且i为正整数；In another embodiment of the present specification, the first target character is the i-th target character, where i∈[1,n], and i is a positive integer;

相应的,所述根据所述当前滑动平均值对所述第一目标字符集合中的、除所述第一位目标字符之外的其他目标字符进行遍历,以获得第三目标字符集合包括:Correspondingly, traversing other target characters in the first target character set except the first target character according to the current sliding average value to obtain a third target character set includes:

S1、获取所述第一目标字符集合中的第i+1位目标字符,基于所述当前滑动平均值计算获得所述i+1位目标字符的滑动平均值;S1, obtaining the i+1th target character in the first target character set, and calculating the sliding average value of the i+1th target character based on the current sliding average value;

S2、判断所述i+1位目标字符的滑动平均值是否小于预设滑动阈值,S2: Determine whether the sliding average value of the i+1 target character is less than a preset sliding threshold value.

若是,则将所述i+1位目标字符存放至所述第二目标字符集合,且删除所述第一目标字符集合中的第i+1位目标字符,以及基于所述i+1位目标字符的滑动平均值更新所述当前滑动平均值;If yes, then storing the i+1th target character into the second target character set, deleting the i+1th target character in the first target character set, and updating the current sliding average value based on the sliding average value of the i+1th target character;

判断i+1是否大于n,Determine whether i+1 is greater than n,

若是,则将所述第二目标字符集合存放至所述第三目标字符集合,且判断所述第一目标字符集合是否为空,If yes, then the second target character set is stored in the third target character set, and it is determined whether the first target character set is empty.

若是,则获得第三目标字符集合;If yes, then obtain a third target character set;

若否,则新建第二目标字符集合,将所述第一目标字符集合中的第i位目标字符存放至新建的第二目标字符集合中,且将所述第i位目标字符的纵坐标确定为当前滑动平均值,删除所述第一目标字符集合中的第i位目标字符,以及继续执行步骤S1,If not, a new second target character set is created, the i-th target character in the first target character set is stored in the newly created second target character set, the ordinate of the i-th target character is determined as the current sliding average, the i-th target character in the first target character set is deleted, and step S1 is continued.

若否,则将i自增1,继续执行步骤S1,If not, then i is incremented by 1 and step S1 is continued.

若否,则将i自增1,继续执行步骤S1。If not, i is incremented by 1 and step S1 is continued.

具体的,以i为1进行说明,在基于所有目标字符的横坐标对目标字符按照从小到大的顺序进行排序,形成排序后的第一目标字符集合,且将第一目标字符集合中的第一位目标字符存放至第二目标字符集合中,将第一位目标字符的纵坐标确定为当前滑动平均值,以及删除第一目标字符集合中的第一位目标字符后,继续获取第一目标字符集合中的第2位的目标字符,即紧邻第一位目标字符的下一位目标字符,基于当前滑动平均值以及预设计算公式获得第2位目标字符的滑动平均值,其中,第一位目标字符的滑动平均值为:Specifically, i is 1 for illustration, the target characters are sorted in ascending order based on the horizontal coordinates of all target characters to form a sorted first target character set, and the first target character in the first target character set is stored in the second target character set, the vertical coordinate of the first target character is determined as the current sliding average value, and after deleting the first target character in the first target character set, the second target character in the first target character set is continued to be obtained, that is, the next target character next to the first target character, and the sliding average value of the second target character is obtained based on the current sliding average value and a preset calculation formula, wherein the sliding average value of the first target character is:

y滑动(1)=第一位目标字符的y,其中,y滑动(1)表示第一位目标字符的滑动平均值,第一位目标字符的y表示第一位目标字符的纵坐标。yslide(1) =y of the first target character, whereyslide(1) represents the sliding average of the first target character, and y of the first target character represents the ordinate of the first target character.

预设计算公式为:The default calculation formula is:

y滑动(n)=0.9*y滑动(n-1)+0.1*第n个字符的yyslide(n) = 0.9*yslide(n-1) + 0.1*y of the nth character

其中，y滑动(n)表示第n位目标字符的滑动平均值，y滑动(n-1)表示第n-1位目标字符的滑动平均值，即当前滑动平均值，第n个字符的y表示第n个字符的纵坐标。Among them, yslide(n) represents the sliding average of the n-th target character, yslide(n-1) represents the sliding average of the (n-1)-th target character, that is, the current sliding average, and y of the n-th character represents the vertical coordinate of the n-th character.

即以n为2为例,那么在计算第2位目标字符的滑动平均值的时候,将第2位目标字符的纵坐标以及第一位目标字符的滑动平均值代入到预设计算公式中,即可计算出第2位目标字符的滑动平均值。That is, taking n as 2 as an example, when calculating the sliding average of the second target character, the vertical coordinate of the second target character and the sliding average of the first target character are substituted into the preset calculation formula to calculate the sliding average of the second target character.

然后判断第2位目标字符的滑动平均值是否小于预设滑动阈值，其中，预设滑动阈值可以根据实际应用进行设置，例如设置为2、3等，本说明书并不对此进行任何限定；若是，则将第2位目标字符也存放至第一位目标字符所在的第二目标字符集合中，且删除第一目标字符集合中的第2位目标字符，以及将第2位目标字符的滑动平均值作为当前滑动平均值；若否，则说明第2位目标字符与第一位目标字符不是同一行的文字，这时则先不处理第2位目标字符，继续采用上述方式对第3位目标字符进行处理。Then, it is determined whether the sliding average value of the second target character is less than a preset sliding threshold value, wherein the preset sliding threshold value can be set according to the actual application, for example, set to 2, 3, etc., and this specification does not impose any limitation on this; if so, the second target character is also stored in the second target character set where the first target character is located, and the second target character in the first target character set is deleted, and the sliding average value of the second target character is used as the current sliding average value; if not, it means that the second target character and the first target character are not in the same line of text. In this case, the second target character is not processed first, and the third target character is processed in the above manner.

而在第2位目标字符的滑动平均值小于预设滑动阈值，将第2位目标字符存放至第二目标字符集合之后，还要判断第2位字符是否为第一目标字符集合中的最后一位目标字符，若是，将包含了第一位、第二位目标字符的第二目标字符集合存放至第三目标字符集合中，且判断第一目标字符集合是否为空，若第一目标字符集合为空，则说明所有的目标字符均已经遍历完成，直接获得第三目标字符集合即可，若不为空，则说明虽然第一目标字符集合中还存在目标字符，但是其他的目标字符与第一位、第二位目标字符不在同一行，此时，就另外新建一个第二目标字符集合，将第一目标字符集合中剩余的第一位目标字符（即横坐标最小的目标字符）存放至新建的第二目标字符集合中，继续重复上述步骤，直至第一目标字符集合中无目标字符，即目标字符均已经存放至对应的第二目标字符集合中，然后将所有的第二目标字符集合集中存放至第三目标字符集合中。When the sliding average value of the second target character is less than the preset sliding threshold value, after the second target character is stored in the second target character set, it is necessary to determine whether the second character is the last target character in the first target character set. If so, the second target character set including the first and second target characters is stored in the third target character set, and it is determined whether the first target character set is empty. If the first target character set is empty, it means that all target characters have been traversed and the third target character set can be directly obtained. If it is not empty, it means that although there are still target characters in the first target character set, other target characters are not in the same row as the first and second target characters. At this time, a new second target character set is created, and the first target character remaining in the first target character set (i.e., the one with the smallest abscissa) is stored in the newly created second target character set. The above steps are repeated until there are no target characters in the first target character set, that is, all target characters have been stored in the corresponding second target character set, and then all second target character sets are stored in the third target character set.
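The traversal described above can be sketched as follows. One ambiguity is hedged in the code: the text compares "the sliding average" against a preset threshold, which this sketch reads as comparing the distance between a candidate character's ordinate and the running average yslide(n) = 0.9*yslide(n-1) + 0.1*y_n; both this reading and the threshold value (2 or 3 per the text) are assumptions:

```python
SLIDE_THRESHOLD = 3.0  # preset sliding threshold; the text suggests 2 or 3 (tunable)

def group_into_lines(targets):
    """Group target characters (char, x, y) into text lines: sort by abscissa,
    seed a line (a "second target character set") with the first remaining
    character, and absorb each later character whose ordinate stays close to
    the line's running sliding average y = 0.9*prev + 0.1*y_n."""
    remaining = sorted(targets, key=lambda t: t[1])  # ascending abscissa
    lines = []                                       # the "third target character set"
    while remaining:
        seed = remaining.pop(0)
        line = [seed]          # new second target character set, seeded
        avg = seed[2]          # current sliding average = seed's ordinate
        kept = []
        for ch, x, y in remaining:
            new_avg = 0.9 * avg + 0.1 * y
            if abs(y - new_avg) < SLIDE_THRESHOLD:   # same line: absorb it
                line.append((ch, x, y))
                avg = new_avg
            else:                                    # other line: revisit later
                kept.append((ch, x, y))
        remaining = kept
        lines.append(line)
    return lines
```

Characters on the same physical line keep the running average close to their own ordinate, while a character from another line shifts it by roughly 0.9 of the ordinate gap, which exceeds the threshold and defers that character to a later pass.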

本说明书实施例中,先将所有的目标字符基于横坐标和纵坐标进行单行排列,再将单行排列出来的目标字符存放在一个大的集合中,后续通过对每行的目标字符进行列排列后即可获得文本图片的最终字符识别结果,通过此种方式可以完整、快速的获得该文本图片的识别结果。In the embodiments of the present specification, all target characters are first arranged in a single row based on the horizontal and vertical coordinates, and then the target characters arranged in a single row are stored in a large set. Subsequently, the final character recognition result of the text image can be obtained by arranging the target characters in each row in columns. In this way, the recognition result of the text image can be obtained completely and quickly.

本说明书另一实施例中,所述基于每个第二目标字符集合中的第一位目标字符的纵坐标,对所述第三目标字符集合中的第二目标字符集合进行排序后合并,并根据合并结果确定所述文本图片中的字符的识别结果包括:In another embodiment of the present specification, the step of sorting and merging the second target character sets in the third target character set based on the ordinate of the first target character in each second target character set, and determining the recognition result of the characters in the text image according to the merging result includes:

获得所述第三目标字符集合中的所有第二目标字符集合;Obtain all second target character sets in the third target character set;

确定每个第二目标字符集合中的第一位目标字符的纵坐标;Determine the ordinate of the first target character in each second target character set;

基于每个第二目标字符集合中的第一位目标字符的纵坐标对所有第二目标字符集合中的目标字符进行排列,将所有第二目标字符集合中的目标字符的排列结果确定为所述文本图片中的字符的识别结果。The target characters in all the second target character sets are arranged based on the ordinate of the first target character in each second target character set, and the arrangement result of the target characters in all the second target character sets is determined as the recognition result of the characters in the text picture.

具体实施时,对第三目标字符集合中的每行目标字符进行排列时,先获取每行的第一位目标字符的纵坐标,然后基于每行的第一位目标字符的纵坐标实现对所有行目标字符的排列,以获得准确的、整齐的文本图片的最终识别结果。In specific implementation, when arranging each row of target characters in the third target character set, first obtain the vertical coordinate of the first target character in each row, and then arrange all rows of target characters based on the vertical coordinate of the first target character in each row to obtain an accurate and neat final recognition result of the text image.

本说明书实施例中,所述文本识别方法通过基于单行文本图片训练获得的特征提取神经网络对待识别的包含多行文本的文本图片进行精确的二维特征图预测,并可以通过对精确的二维特征图中的预测字符的处理,获得文本图片中准确的文本字符。In the embodiments of the present specification, the text recognition method uses a feature extraction neural network obtained through training based on a single-line text image to accurately predict a two-dimensional feature map of a text image to be recognized containing multiple lines of text, and can obtain accurate text characters in the text image by processing the predicted characters in the accurate two-dimensional feature map.

参见图3,图3示出了根据本说明书一个实施例提供的一种文本识别方法中根据二维预测结果图获得文本图片的识别结果的流程图,具体包括以下步骤:Referring to FIG. 3 , FIG. 3 shows a flowchart of obtaining a recognition result of a text image according to a two-dimensional prediction result graph in a text recognition method provided according to an embodiment of the present specification, which specifically includes the following steps:

步骤302:获得二维预测结果图。Step 302: Obtain a two-dimensional prediction result graph.

步骤304:对二维预测结果图中的预测字符进行八邻域合并,获得合并后的目标字符。Step 304: performing eight-neighborhood merging on the predicted characters in the two-dimensional prediction result image to obtain a merged target character.

步骤306:根据目标字符的横坐标将目标字符按照从小到大的顺序进行排序,得到序列S。Step 306: Sort the target characters in ascending order according to the horizontal coordinates of the target characters to obtain a sequence S.

步骤308:取序列S中的第一位目标字符,放入序列D,且将该第一位目标字符的纵坐标作为当前滑动平均值。Step 308: Take the first target character in sequence S, put it into sequence D, and use the vertical coordinate of the first target character as the current sliding average.

步骤310:按顺序遍历序列S中的下一位字符,基于下一位字符的纵坐标以及当前滑动平均值计算下一位字符的滑动平均值。Step 310: traverse the next character in the sequence S in order, and calculate the sliding average value of the next character based on the ordinate of the next character and the current sliding average value.

步骤312:判断下一位字符的滑动平均值的绝对值是否小于3,若是,则执行步骤314,若否,则继续执行步骤310。Step 312: Determine whether the absolute value of the sliding average value of the next character is less than 3. If so, execute step 314; if not, continue to execute step 310.

步骤314:将下一位字符放入序列D,将计算获得的下一位字符的滑动平均值更新为当前滑动平均值,并将下一位字符从序列S中移除。Step 314: put the next character into sequence D, update the calculated sliding average value of the next character as the current sliding average value, and remove the next character from sequence S.

步骤316:判断是否已经遍历到序列S的最后一个目标字符,若是,则执行步骤318,若否,则继续执行步骤310。Step 316: Determine whether the last target character of the sequence S has been traversed. If so, execute step 318; if not, continue to execute step 310.

即判断是否遍历到序列S的终点。That is, determine whether the traversal has reached the end of the sequence S.

步骤318:将序列D放入集合D中。Step 318: Put sequence D into set D.

步骤320:判断序列S是否为空,若是,则执行步骤322,若否,则继续执行步骤308。Step 320: Determine whether the sequence S is empty. If so, execute step 322; if not, continue to execute step 308.

步骤322:获取集合D中的每一个序列D中的第一位字符的纵坐标,且基于每一个序列D中的第一位字符的纵坐标对所有的序列D进行排序、合并。Step 322: Obtain the ordinate of the first character in each sequence D in the set D, and sort and merge all the sequences D based on the ordinate of the first character in each sequence D.

步骤324:将排序、合并后的所有的序列D中的目标字符作为文本图片最终的识别结果。Step 324: taking the target characters in all the sorted and merged sequences D as the final recognition result of the text image.
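上述步骤306~324的行聚合解码过程可以用如下示意代码概括,其中字符以(x, y)坐标近似表示,滑动平均的具体更新方式(此处取简单均值)与阈值均为示例性假设,并非专利的原始实现。The row-grouping decode of steps 306-324 above can be summarized by the following illustrative sketch, where characters are approximated as (x, y) points; the sliding-average update rule (a plain running mean here) and the threshold are assumptions for illustration, not the patent's original implementation:

```python
def group_into_rows(chars, threshold=3.0):
    """chars: list of (x, y); returns rows of (x, y) in reading order."""
    s = sorted(chars, key=lambda c: c[0])      # 步骤306: 按横坐标升序排序 / sort by x
    rows = []                                  # 集合D / the set D of sequences
    while s:                                   # 步骤320: 序列S非空则继续 / loop while S non-empty
        first = s.pop(0)                       # 步骤308: 取第一位目标字符 / take first character
        row, avg = [first], float(first[1])    # 当前滑动平均值=首字符纵坐标 / init sliding average
        for c in s[:]:                         # 步骤310: 按序遍历剩余字符 / scan remaining chars
            new_avg = (avg + c[1]) / 2.0       # 示例性滑动平均更新 / assumed average update
            if abs(new_avg - avg) < threshold: # 步骤312: 与阈值比较 / compare with threshold
                row.append(c)                  # 步骤314: 放入序列D并从S移除 / move into D
                s.remove(c)
                avg = new_avg
        rows.append(row)                       # 步骤318: 序列D放入集合D / store sequence D
    rows.sort(key=lambda r: r[0][1])           # 步骤322: 按每行首字符纵坐标排序 / sort rows by y
    return rows
```

例如,对两行字符点调用该函数,会先按横坐标排出每行,再按首字符纵坐标排出行序。For example, calling it on points from two text rows first arranges each row by x and then orders the rows by the y of their first character.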

本说明书实施例中,采用对二维预测结果图中的预测字符进行上述处理的方法,可以解决任意字符数的篇幅级密集文本的解码,避免注意力机制中注意力图漂移的问题,解决篇幅级密集文本上无法数据驱动的问题,实现对篇幅级密集文本的准确、快速识别。其中,"数据驱动"可以理解为:当前很多卷积神经网络模型可以通过增加大量标注数据来提升性能,即可以用数据驱动性能;而"无法数据驱动"是指不管怎样增加数据,上述问题依旧无法解决,因为这是建模上存在的无法解决的问题。In the embodiments of the present specification, applying the above processing to the predicted characters in the two-dimensional prediction result graph makes it possible to decode page-level dense text with any number of characters, avoid the attention-map drift problem of attention mechanisms, and overcome the problem that page-level dense text cannot be handled in a data-driven way, thereby achieving accurate and fast recognition of page-level dense text. Here, "data-driven" means that many current convolutional neural network models can improve their performance by adding a large amount of labeled data, that is, performance can be driven by data; "cannot be data-driven" means that no matter how much data is added, the above problem still cannot be solved, because it is an inherent problem of the modeling itself.
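其中,步骤304的"八邻域合并"可以理解为对二维预测结果图中被激活的像素做8连通域合并,下面给出一个基于广度优先搜索的示意实现(非专利原始代码)。Step 304's "eight-neighborhood merging" can be understood as merging the activated pixels of the prediction map into 8-connected components; below is an illustrative breadth-first-search sketch (not the patent's original code):

```python
from collections import deque

def merge_8_neighborhood(grid):
    """grid: 2D list of 0/1; returns components, each a list of (row, col)."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] and not seen[r][c]:
                comp, q = [], deque([(r, c)])
                seen[r][c] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):          # 遍历八邻域 / scan the 8 neighbours
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and grid[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                q.append((ny, nx))
                comps.append(comp)                 # 一个连通域即一个合并后的目标字符
    return comps
```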

参见图4,图4示出了根据本说明书一个实施例提供的第一种特征提取神经网络训练方法的流程图,具体包括以下步骤。Referring to FIG. 4 , FIG. 4 shows a flow chart of a first feature extraction neural network training method provided according to an embodiment of the present specification, which specifically includes the following steps.

步骤402:构建初始特征提取神经网络,并获取包含文本的样本图片训练集,其中,所述样本图片训练集中包括样本图片以及所述样本图片对应的文本标签。Step 402: construct an initial feature extraction neural network, and obtain a sample image training set containing text, wherein the sample image training set includes sample images and text labels corresponding to the sample images.

其中,样本图片与上述待识别的文本图片不同,样本图片为包含单行文本的图片,即每个样本图片中仅包含一行由文字、数字、符号和/或特殊字符等组成的文本,而待识别的文本图片则为包含多行文本的篇幅级密集文本图片。The sample images are different from the above-mentioned text images to be recognized: each sample image contains only a single line of text composed of characters, digits, symbols and/or special characters, while the text image to be recognized is a page-level dense text image containing multiple lines of text.

在样本图片为单行样本图片的情况下,每个样本图片对应的文本标签则为每个样本图片中的真实的文本内容,例如样本图片中包括"我爱中国",那么该样本图片对应的文本标签即为"我爱中国"。When the sample images are single-line sample images, the text label corresponding to each sample image is the actual text content in that sample image. For example, if a sample image contains "I love China", then the text label corresponding to that sample image is "I love China".

步骤404:基于所述初始特征提取神经网络对所述样本图片进行处理,获得所述样本图片的二维特征图,并基于所述二维特征图确定所述样本图片的字符位置高热图。Step 404: Process the sample image based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample image, and determine a character position high heat map of the sample image based on the two-dimensional feature map.

步骤406:根据所述样本图片的二维特征图以及字符位置高热图获得所述样本图片的一维特征图。Step 406: Obtain a one-dimensional feature map of the sample image according to the two-dimensional feature map of the sample image and the character position high heat map.

步骤408:对所述一维特征图进行分类,以获得所述样本图片的文本预测结果。Step 408: Classify the one-dimensional feature map to obtain a text prediction result of the sample image.

步骤410:基于所述样本图片对应的文本标签以及文本预测结果计算损失函数,且根据所述损失函数对所述初始特征提取神经网络进行训练,获得所述特征提取神经网络。Step 410: Calculate a loss function based on the text label corresponding to the sample image and the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

具体实施时,特征提取神经网络可以为卷积神经网络,也可以是其他可以实现对样本图片进行处理进行二维特征图获取的机器学习网络,本申请对此不做任何限定。In specific implementation, the feature extraction neural network can be a convolutional neural network, or other machine learning networks that can process sample images to obtain two-dimensional feature maps. This application does not impose any limitations on this.

具体实施时,首先构建初始特征提取神经网络,该初始特征提取神经网络包括输入层、隐含层、卷积层、池化层以及全连接层等,然后将获取的样本图片通过初始特征提取神经网络的输入层,输入到初始特征提取神经网络中去,通过初始特征提取神经网络的卷积层对其进行卷积处理,获得该样本图片对应的二维特征图,再对该二维特征图进行一些卷积操作,获得该样本图片的字符位置高热图;根据该样本图片的二维特征图以及字符位置高热图计算获得该样本图片的一维特征图;通过该一维特征图对该样本图片中的文字进行分类预测,获得该样本图片的文本预测结果;最后基于该样本图片对应的真实的文本标签以及该样本图片的文本预测结果计算该初始特征提取神经网络的损失函数,利用该损失函数通过反向传播的原理对该初始特征提取神经网络的各层参数进行调整,以获得最终训练后的特征提取神经网络。In the specific implementation, firstly, an initial feature extraction neural network is constructed, which includes an input layer, a hidden layer, a convolution layer, a pooling layer and a fully connected layer, etc., then the obtained sample image is input into the initial feature extraction neural network through the input layer of the initial feature extraction neural network, and the sample image is convolved through the convolution layer of the initial feature extraction neural network to obtain a two-dimensional feature map corresponding to the sample image, and then some convolution operations are performed on the two-dimensional feature map to obtain a character position high heat map of the sample image; a one-dimensional feature map of the sample image is calculated based on the two-dimensional feature map of the sample image and the character position high heat map; the text in the sample image is classified and predicted through the one-dimensional feature map to obtain a text prediction result of the sample image; finally, the loss function of the initial feature extraction neural network is calculated based on the real text label corresponding to the sample image and the text prediction result of the sample image, and the loss function is used to adjust the parameters of each layer of the initial feature extraction neural network through the principle of back propagation to obtain the feature extraction neural network after final training.

本说明书实施例中,在对特征提取神经网络训练时,仅使用单行文本的文本图片对该特征提取神经网络进行训练,即可使得训练后获得的该特征提取神经网络应用于对多行、篇幅级密集文本的识别方法中,避免了文本检测器的使用,从而避免了检测器性能带来的精度损失,且通过该特征提取神经网络可以极大的提高实际应用中对篇幅级密集文本的文本图片的识别效率。In the embodiments of the present specification, when training the feature extraction neural network, only text images of a single line of text are used to train the feature extraction neural network, so that the feature extraction neural network obtained after training can be applied to the recognition method of multi-line, paragraph-level dense text, avoiding the use of a text detector, thereby avoiding the loss of accuracy caused by the detector performance, and the feature extraction neural network can greatly improve the recognition efficiency of text images of paragraph-level dense text in practical applications.

实际使用中,为了实现在生成字符位置高热图的时候对该字符位置高热图进行监督,保证特征提取神经网络的推理准确性,则可以通过生成伪标签的形式对字符位置高热图进行监督,具体实现方式如下所述:In actual use, in order to supervise the character position high heat map when generating it and ensure the inference accuracy of the feature extraction neural network, the character position high heat map can be supervised in the form of generated pseudo labels. The specific implementation is as follows:

所述获取包含文本的样本图片训练集之后,还包括:After obtaining the sample image training set containing text, the method further includes:

对所述样本图片进行预处理,以生成所述样本图片的伪标签。The sample image is preprocessed to generate a pseudo label for the sample image.

具体的,对样本图片进行预处理,包括但不限于对样本图片做5*5腐蚀、Otsu二值化、反相以及等比例缩放等,其中,Otsu二值化方法是一种全局阈值分割方法,是一种对图像进行二值化的高效算法,通过对样本图片的上述一系列预处理,获得预处理后的样本图片,而预处理后的样本图片即为伪标签。Specifically, the sample images are preprocessed, including but not limited to 5*5 erosion, Otsu binarization, inversion, and proportional scaling of the sample images. The Otsu binarization method is a global threshold segmentation method and an efficient algorithm for binarizing images. Through the above series of preprocessing on the sample images, the preprocessed sample images are obtained, and the preprocessed sample images are the pseudo labels.
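上述预处理流程(腐蚀、Otsu二值化、反相)可以用如下仅依赖NumPy的示意代码表达,其中5*5腐蚀以最小值滤波实现;具体参数与实现细节均为示例性假设,并非专利的原始代码。The preprocessing pipeline above (erosion, Otsu binarization, inversion) can be sketched with NumPy only, implementing the 5*5 erosion as a minimum filter; the parameters and implementation details are illustrative assumptions, not the patent's original code:

```python
import numpy as np

def otsu_threshold(img):
    """在0..255灰度直方图上求使类间方差最大的阈值 / maximise between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    mean_total = (np.arange(256) * hist).sum()
    best_t, best_var = 0, -1.0
    w0 = cum0 = 0.0
    for t in range(256):
        w0 += hist[t]
        cum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        m0 = cum0 / w0                             # 前景均值 / class-0 mean
        m1 = (mean_total - cum0) / (total - w0)    # 背景均值 / class-1 mean
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def make_pseudo_label(img):
    """img: uint8灰度图; 5*5腐蚀 -> Otsu二值化 -> 反相 / erode, binarise, invert."""
    h, w = img.shape
    pad = np.pad(img, 2, mode='edge')
    # 5*5腐蚀: 每个像素取其5*5邻域的最小值 / minimum over each 5*5 window
    eroded = np.minimum.reduce([pad[i:i + h, j:j + w]
                                for i in range(5) for j in range(5)])
    t = otsu_threshold(eroded)
    binary = (eroded > t).astype(np.uint8) * 255
    return 255 - binary                            # 反相: 文字区域高亮 / highlight text
```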

实际应用时,在对样本图片的二维特征图进行卷积获得字符位置高热图的情况下,会利用伪标签对该字符位置高热图进行监督,以保证字符位置高热图的准确性,具体实现方式如下所述:In actual application, when the two-dimensional feature map of the sample image is convolved to obtain the character position high heat map, the character position high heat map will be supervised by pseudo labels to ensure the accuracy of the character position high heat map. The specific implementation method is as follows:

所述基于所述二维特征图确定所述样本图片的字符位置高热图包括:The determining of the character position high heat map of the sample image based on the two-dimensional feature map comprises:

基于所述样本图片的伪标签以及二维特征图确定所述样本图片的字符位置高热图。A character position heat map of the sample image is determined based on the pseudo label of the sample image and the two-dimensional feature map.

具体实施时,预处理获得的样本图片的伪标签与基于二维特征图卷积生成的字符位置高热图为同样尺寸,通过字符位置高热图中的像素点的激活值与伪标签中的像素点的激活值计算出一个均方误差损失,基于该均方误差损失对该字符位置高热图进行调整,以获得最终的、准确的字符位置高热图。In the specific implementation, the pseudo-label of the sample image obtained by preprocessing is of the same size as the character position high heat map generated based on the convolution of the two-dimensional feature map. A mean square error loss is calculated by the activation value of the pixel point in the character position high heat map and the activation value of the pixel point in the pseudo-label. The character position high heat map is adjusted based on the mean square error loss to obtain the final and accurate character position high heat map.
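该均方误差损失的计算可以示意如下(假设高热图与伪标签同尺寸且激活值已对齐到同一量纲;非专利原始实现)。The mean square error loss can be sketched as follows (assuming the heat map and the pseudo label have the same size and comparable activation ranges; not the patent's original implementation):

```python
import numpy as np

def heatmap_mse_loss(heatmap, pseudo_label):
    """逐像素均方误差 / per-pixel mean squared error between the two maps."""
    diff = heatmap.astype(np.float64) - pseudo_label.astype(np.float64)
    return float(np.mean(diff ** 2))
```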

实际应用中,由于仅通过二维特征图只能实现对每个字符的高热,但并不清楚每个高热字符的位置,因此需要借助一定的字符位置序列才能实现对每个字符位置的准确排列,生成伪标签便可以在生成字符位置高热图的时候确定哪些位置的字符进行高热,从而生成准确的字符位置高热图。In practical applications, the two-dimensional feature map alone can only indicate the high heat of each character, but the position of each high-heat character remains unclear; a certain character position sequence is therefore required to arrange each character position accurately. Generating pseudo labels makes it possible to determine, when generating the character position high heat map, which positions should be highly heated, thereby producing an accurate character position high heat map.

本说明书实施例中,通过生成伪标签,监督学习出单行文本图片中字符位置高热图,在后续使用该特征提取神经网络对文本识别时,可以直接对篇幅级密集文本上的每一个特征点进行预测分类,从而每个有字符的位置上就会产生一个相应的结果,保证识别文本的完整性。In the embodiments of the present specification, pseudo labels are generated and supervised learning is performed to obtain a heat map of the character positions in a single-line text image. When the feature extraction neural network is subsequently used for text recognition, each feature point on the dense text at the page level can be directly predicted and classified, so that a corresponding result will be generated at each position where a character is located, thereby ensuring the integrity of the recognized text.

本说明书另一实施例中,所述根据所述样本图片的二维特征图以及字符位置高热图获得所述样本图片的一维特征图包括:In another embodiment of the present specification, obtaining a one-dimensional feature map of the sample image according to the two-dimensional feature map of the sample image and the character position high heat map includes:

将所述样本图片的二维特征图以及字符位置高热图进行垂直维度求和,获得所述样本图片的一维特征图。The two-dimensional feature map of the sample image and the character position high heat map are summed in the vertical dimension to obtain a one-dimensional feature map of the sample image.

具体的,由于通过单行文本的样本图片对特征提取神经网络进行训练,因此需要获得一维图片上的文本的识别结果,那么在获得样本图片的二维特征图以及字符位置高热图之后,将样本图片的二维特征图以及字符位置高热图进行垂直求和,以获得该样本图片的一维特征图,便于后续可以基于一维特征图中的预测结果对该特征提取神经网络的训练参数进行调整,保证该特征提取神经网络的预测准确性。Specifically, since the feature extraction neural network is trained through sample images of single-line text, it is necessary to obtain the recognition results of the text on the one-dimensional image. Therefore, after obtaining the two-dimensional feature map and the character position high heat map of the sample image, the two-dimensional feature map and the character position high heat map of the sample image are vertically summed to obtain the one-dimensional feature map of the sample image, so that the training parameters of the feature extraction neural network can be adjusted based on the prediction results in the one-dimensional feature map to ensure the prediction accuracy of the feature extraction neural network.
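"垂直维度求和"的一种理解方式如下面的示意代码所示:按图5的描述,先将字符位置高热图与二维特征图逐点相乘,再沿高度维求和得到一维特征图(张量形状与加权方式均为示例性假设)。One reading of the "vertical-dimension summation" is sketched below: following the description of FIG. 5, the character position high heat map is pointwise-multiplied with the two-dimensional feature map and then summed over the height dimension to obtain the one-dimensional feature map (tensor shapes and the weighting scheme are illustrative assumptions):

```python
import numpy as np

def to_1d_feature(feat_2d, heatmap):
    """feat_2d: (C, H, W) 二维特征图; heatmap: (H, W); 返回 (C, W) 一维特征图."""
    weighted = feat_2d * heatmap[None, :, :]   # 逐点相乘 / pointwise multiply
    return weighted.sum(axis=1)                # 沿垂直(高度)维求和 / sum over H
```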

具体实施时,所述对所述一维特征图进行分类,以获得所述样本图片的文本预测结果包括:In a specific implementation, the classifying of the one-dimensional feature map to obtain the text prediction result of the sample image includes:

对所述一维特征图进行分类,获得所述样本图片的初始文本预测结果;Classifying the one-dimensional feature map to obtain an initial text prediction result of the sample image;

对所述初始文本预测结果进行CTC解码,以获得所述样本图片的文本预测结果。The initial text prediction result is subjected to CTC decoding to obtain a text prediction result of the sample image.

具体的,在获得样本图片的一维特征图之后,还会对一维特征图进行分类,即对一维特征图中的字符进行预测,以获得样本图片的初始文本预测结果,然后对该样本图片的初始文本预测结果进行CTC解码,即去除重复预测的字符,以获得样本图片的最终的文本预测结果。Specifically, after obtaining the one-dimensional feature map of the sample image, the one-dimensional feature map will be classified, that is, the characters in the one-dimensional feature map will be predicted to obtain the initial text prediction result of the sample image, and then the initial text prediction result of the sample image will be CTC decoded, that is, the repeatedly predicted characters will be removed to obtain the final text prediction result of the sample image.
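CTC解码中"去除重复预测的字符"的贪心折叠过程可以示意如下:先合并相邻的重复预测,再去掉空白符blank(此处约定blank的类别编号为0,这是一个示例性假设)。The greedy collapse used in CTC decoding to remove repeated predictions can be sketched as follows: merge adjacent repeats, then drop the blank symbol (the blank class id is assumed to be 0 for illustration):

```python
def ctc_greedy_collapse(ids, blank=0):
    """ids: 每个时间步的预测类别序列; 返回折叠后的字符序列 / collapsed labels."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank:   # 跳过相邻重复与blank / skip repeats and blank
            out.append(i)
        prev = i
    return out
```

注意被blank隔开的相同字符会被保留为两个字符,这正是CTC区分重复字符与重复预测的方式。Note that identical labels separated by a blank are kept as two characters, which is exactly how CTC distinguishes repeated characters from repeated predictions.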

本说明书实施例中,通过对一维特征图分类预测的初始文本预测结果进行CTC解码,去除重复预测的字符,以保证预测获得的样本图片的文本预测结果的准确性。In the embodiments of the present specification, CTC decoding is performed on the initial text prediction result of the one-dimensional feature map classification prediction to remove the repeatedly predicted characters, so as to ensure the accuracy of the text prediction result of the sample image obtained by prediction.

而在获得最终的样本图片的文本预测结果后,将该样本图片的文本预测结果与该样本图片真实的文本标签进行比对,对该初始特征提取神经网络进行CTC损失计算,通过计算获得的损失函数实现对初始特征提取神经网络的训练,以获得最终的调整后的精确的特征提取神经网络。After obtaining the final text prediction result of the sample image, the text prediction result of the sample image is compared with the actual text label of the sample image, and the CTC loss is calculated for the initial feature extraction neural network. The initial feature extraction neural network is trained through the calculated loss function to obtain the final, adjusted, and accurate feature extraction neural network.

本说明书实施例中,在对特征提取神经网络训练时,使用单行文本的文本图片基于CTC损失函数对特征提取神经网络进行训练,通过生成伪标签的方式,监督学习出单行文本中字符位置的高热图,以及通过CTC解码的方式对预测的一维特征图中的初始文本预测结果进行处理,保证训练获得的特征提取神经网络的识别精度以及识别效率。In the embodiments of the present specification, when training the feature extraction neural network, a text image of a single line of text is used to train the feature extraction neural network based on the CTC loss function. By generating pseudo labels, a high heat map of the character positions in the single line of text is supervised and learned, and the initial text prediction results in the predicted one-dimensional feature map are processed by CTC decoding to ensure the recognition accuracy and efficiency of the feature extraction neural network obtained by training.

参见图5,图5示出了根据本说明书一个实施例提供的一种特征提取神经网络训练方法的具体训练流程示意图,具体包括以下步骤。Referring to FIG. 5 , FIG. 5 shows a specific training process diagram of a feature extraction neural network training method provided according to an embodiment of the present specification, which specifically includes the following steps.

其中,该特征提取神经网络为卷积神经网络。Among them, the feature extraction neural network is a convolutional neural network.

步骤502:构建初始卷积神经网络,获取单行输入图片以及该单行输入图片对应的文本标签,且将单行输入图片输入初始卷积神经网络的卷积层进行卷积。Step 502: construct an initial convolutional neural network, obtain a single-line input image and a text label corresponding to the single-line input image, and input the single-line input image into the convolutional layer of the initial convolutional neural network for convolution.

步骤504:获得单行输入图片卷积后的二维特征图。Step 504: Obtain a two-dimensional feature map after convolution of a single row of input images.

步骤506:对单行输入图片进行腐蚀、二值化、反相以及等比例缩放等预处理,生成伪标签。Step 506: Perform preprocessing such as corrosion, binarization, inversion, and proportional scaling on the single-line input image to generate a pseudo label.

步骤508:基于二维特征图生成字符位置高热图。Step 508: Generate a character position heat map based on the two-dimensional feature map.

步骤510:基于伪标签计算字符位置高热图的均方误差损失,基于该均方误差损失调整字符位置高热图。Step 510: Calculate the mean square error loss of the character position high heat map based on the pseudo-labels, and adjust the character position high heat map based on the mean square error loss.

步骤512:将二维特征图和字符位置高热图进行垂直维度求和,获得一维特征图。Step 512: vertically sum the two-dimensional feature map and the character position heat map to obtain a one-dimensional feature map.

步骤514:对一维特征图进行分类,获得分类后的初始预测序列。Step 514: classify the one-dimensional feature map to obtain a classified initial prediction sequence.

步骤516:对分类后的初始预测序列进行CTC解码,获得目标预测序列。Step 516: Perform CTC decoding on the classified initial prediction sequence to obtain a target prediction sequence.

步骤518:根据单行输入图片的目标预测序列与对应的文本标签计算CTC损失函数,基于CTC损失函数对初始卷积神经网络进行训练,获得卷积神经网络。Step 518: Calculate the CTC loss function according to the target prediction sequence of the single-line input image and the corresponding text label, and train the initial convolutional neural network based on the CTC loss function to obtain a convolutional neural network.

本说明书实施例中,该卷积神经网络采用ResNet为backbone(主干网),通过将单行输入图片输入卷积神经网络得到二维特征图,并生成相应的文本高热图,将文本高热图与二维特征图点乘,在列维度上求和得到一维特征图,对一维特征图进行分类及CTC解码,对输入图片进行伪标签生成,与文本高热图做均方误差损失进行监督;以及对预测序列和文本标签使用CTC损失进行监督,实现卷积神经网络的快速、精确训练。此外,本说明书实施例中的卷积神经网络的backbone可以有更多的选择,例如DenseNet、ResNest等;且生成伪标签的过程可以通过其它图像处理的方式,只要能基本表示出文本图片中所有字符的中心位置即可,在此不做任何限定。In the embodiment of this specification, the convolutional neural network uses ResNet as the backbone, and obtains a two-dimensional feature map by inputting a single row of input images into the convolutional neural network, and generates a corresponding text high heat map, multiplies the text high heat map with the two-dimensional feature map, and sums it in the column dimension to obtain a one-dimensional feature map, classifies the one-dimensional feature map and CTC decodes it, generates pseudo labels for the input images, and supervises the mean square error loss with the text high heat map; and supervises the predicted sequence and text labels using CTC loss to achieve fast and accurate training of the convolutional neural network. In addition, the backbone of the convolutional neural network in the embodiment of this specification can have more choices, such as DenseNet, ResNest, etc.; and the process of generating pseudo labels can be done by other image processing methods, as long as it can basically represent the center position of all characters in the text image, and no limitation is made here.

参见图6,图6示出了根据本说明书一个实施例提供的第二种文本识别方法的流程图,具体包括以下步骤。Referring to FIG. 6 , FIG. 6 shows a flow chart of a second text recognition method provided according to an embodiment of the present specification, which specifically includes the following steps.

步骤602:基于用户的调用请求为所述用户展示图片输入界面。Step 602: Displaying a picture input interface to the user based on the user's call request.

步骤604:获取所述用户基于所述图片输入界面输入的待识别的文本图片。Step 604: Obtain the text image to be recognized input by the user based on the image input interface.

步骤606:将所述文本图片输入特征提取神经网络进行处理,获得所述文本图片的二维特征图,其中,所述二维特征图中包括所述文本图片中的字符。Step 606: Input the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image.

步骤608:对所述二维特征图中的字符进行预测,获得包含预测字符的二维预测结果图。Step 608: predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters.

步骤610:对所述二维预测结果图中的预测字符进行处理,获得所述文本图片中的字符的识别结果并返回给所述用户。Step 610: Process the predicted characters in the two-dimensional prediction result image, obtain recognition results of the characters in the text image and return them to the user.

需要说明的是,本说明书实施例提供的第二种文本识别方法中与上述第一种文本识别方法的实施例相对应的部分,可以参见上述第一种文本识别方法的实施例中的详细描述,在此不再赘述。It should be noted that, for the part of the second text recognition method provided in the embodiments of this specification that corresponds to the embodiment of the first text recognition method mentioned above, reference can be made to the detailed description in the embodiment of the first text recognition method mentioned above, and will not be repeated here.

本说明书实施例中,所述文本识别方法通过基于单行文本图片训练获得的特征提取神经网络对待识别的包含多行文本的文本图片进行精确的二维特征图预测,并可以通过对精确的二维特征图中的预测字符的处理,获得文本图片中准确的文本字符。In the embodiments of the present specification, the text recognition method uses a feature extraction neural network obtained through training based on a single-line text image to accurately predict a two-dimensional feature map of a text image to be recognized containing multiple lines of text, and can obtain accurate text characters in the text image by processing the predicted characters in the accurate two-dimensional feature map.

参见图7,图7示出了根据本说明书一个实施例提供的第三种文本识别方法的流程图,具体包括以下步骤。Referring to FIG. 7 , FIG. 7 shows a flow chart of a third text recognition method provided according to an embodiment of the present specification, which specifically includes the following steps.

步骤702:接收用户发送的调用请求,其中,所述调用请求中携带待识别的文本图片。Step 702: Receive a call request sent by a user, wherein the call request carries a text image to be recognized.

步骤704:将所述文本图片输入特征提取神经网络进行处理,获得所述文本图片的二维特征图,其中,所述二维特征图中包括所述文本图片中的字符。Step 704: Input the text image into a feature extraction neural network for processing to obtain a two-dimensional feature map of the text image, wherein the two-dimensional feature map includes the characters in the text image.

步骤706:对所述二维特征图中的字符进行预测,获得包含预测字符的二维预测结果图。Step 706: predicting the characters in the two-dimensional feature map to obtain a two-dimensional prediction result map containing the predicted characters.

步骤708:对所述二维预测结果图中的预测字符进行处理,获得所述文本图片中的字符的识别结果并返回给所述用户。Step 708: Process the predicted characters in the two-dimensional prediction result image, obtain recognition results of the characters in the text image and return them to the user.

需要说明的是,本说明书实施例提供的第三种文本识别方法中与上述第一种文本识别方法的实施例相对应的部分,可以参见上述第一种文本识别方法的实施例中的详细描述,在此不再赘述。It should be noted that, for the part of the third text recognition method provided in the embodiments of this specification that corresponds to the embodiment of the first text recognition method mentioned above, reference can be made to the detailed description in the embodiment of the first text recognition method mentioned above, and will not be repeated here.

本说明书实施例中,所述文本识别方法通过基于单行文本图片训练获得的特征提取神经网络对待识别的包含多行文本的文本图片进行精确的二维特征图预测,并可以通过对精确的二维特征图中的预测字符的处理,获得文本图片中准确的文本字符。In the embodiments of the present specification, the text recognition method uses a feature extraction neural network obtained through training based on a single-line text image to accurately predict a two-dimensional feature map of a text image to be recognized containing multiple lines of text, and can obtain accurate text characters in the text image by processing the predicted characters in the accurate two-dimensional feature map.

参见图8,图8示出了根据本说明书一个实施例提供的第二种特征提取神经网络训练方法的流程图,具体包括以下步骤。Referring to FIG. 8 , FIG. 8 shows a flow chart of a second feature extraction neural network training method provided according to an embodiment of the present specification, which specifically includes the following steps.

步骤802:基于用户的调用请求为所述用户展示图片输入界面。Step 802: Displaying a picture input interface to the user based on the user's call request.

步骤804:接收所述用户基于所述图片输入界面输入的包含文本的样本图片训练集,其中,所述样本图片训练集中包括样本图片以及所述样本图片对应的文本标签。Step 804: receiving a sample picture training set containing text input by the user based on the picture input interface, wherein the sample picture training set includes sample pictures and text labels corresponding to the sample pictures.

步骤806:构建初始特征提取神经网络,基于所述初始特征提取神经网络对所述样本图片进行处理,获得所述样本图片的二维特征图,并基于所述二维特征图确定所述样本图片的字符位置高热图。Step 806: construct an initial feature extraction neural network, process the sample image based on the initial feature extraction neural network, obtain a two-dimensional feature map of the sample image, and determine a character position high heat map of the sample image based on the two-dimensional feature map.

步骤808:根据所述样本图片的二维特征图以及字符位置高热图获得所述样本图片的一维特征图。Step 808: Obtain a one-dimensional feature map of the sample image according to the two-dimensional feature map of the sample image and the character position high heat map.

步骤810:对所述一维特征图进行分类,以获得所述样本图片的文本预测结果。Step 810: Classify the one-dimensional feature map to obtain a text prediction result of the sample image.

步骤812:基于所述样本图片对应的文本标签以及文本预测结果计算损失函数,且根据所述损失函数对所述初始特征提取神经网络进行训练,获得所述特征提取神经网络。Step 812: Calculate a loss function based on the text label corresponding to the sample image and the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

需要说明的是,本说明书实施例提供的第二种特征提取神经网络训练方法中与上述第一种特征提取神经网络训练方法的实施例相对应的部分,可以参见上述第一种特征提取神经网络训练方法的实施例中的详细描述,在此不再赘述。It should be noted that, for the part of the second feature extraction neural network training method provided in the embodiments of this specification that corresponds to the embodiment of the first feature extraction neural network training method mentioned above, reference can be made to the detailed description in the embodiment of the first feature extraction neural network training method mentioned above, and will not be repeated here.

本说明书实施例中,在对特征提取神经网络训练时,使用单行文本的文本图片基于CTC损失函数对特征提取神经网络进行训练,通过生成伪标签的方式,监督学习出单行文本中字符位置的高热图,以及通过CTC解码的方式对预测的一维特征图中的初始文本预测结果进行处理,保证训练获得的特征提取神经网络的识别精度以及识别效率。In the embodiments of the present specification, when training the feature extraction neural network, a text image of a single line of text is used to train the feature extraction neural network based on the CTC loss function. By generating pseudo labels, a high heat map of the character positions in the single line of text is supervised and learned, and the initial text prediction results in the predicted one-dimensional feature map are processed by CTC decoding to ensure the recognition accuracy and efficiency of the feature extraction neural network obtained by training.

Referring to FIG. 9, FIG. 9 shows a flowchart of a third feature extraction neural network training method provided according to an embodiment of this specification, which specifically includes the following steps.

Step 902: receive a call request sent by a user, where the call request carries a training set of sample pictures containing text, and the training set includes sample pictures and the text labels corresponding to the sample pictures.

Step 904: construct an initial feature extraction neural network.

Step 906: process the sample picture based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample picture, and determine a character-position heat map of the sample picture based on the two-dimensional feature map.

Step 908: obtain a one-dimensional feature map of the sample picture according to the two-dimensional feature map and the character-position heat map of the sample picture.

Step 910: classify the one-dimensional feature map to obtain a text prediction result of the sample picture.

Step 912: calculate a loss function based on the text label corresponding to the sample picture and on the text prediction result, train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network, and return the feature extraction neural network to the user.

It should be noted that, for the parts of the third feature extraction neural network training method provided in the embodiments of this specification that correspond to the embodiment of the first feature extraction neural network training method described above, reference may be made to the detailed description in the embodiment of the first method, which is not repeated here.

In the embodiments of this specification, when the feature extraction neural network is trained, text pictures each containing a single line of text are used to train the network based on a CTC loss function; by generating pseudo labels, a heat map of the character positions in the single line of text is learned under supervision; and the initial text prediction result in the predicted one-dimensional feature map is processed by CTC decoding. Together, these measures ensure the recognition accuracy and the recognition efficiency of the feature extraction neural network obtained by training.

Corresponding to the above method embodiments, this specification further provides embodiments of a text recognition apparatus. FIG. 10 shows a schematic structural diagram of a first text recognition apparatus provided by an embodiment of this specification. As shown in FIG. 10, the apparatus includes:

a first acquisition module 1002, configured to acquire a text picture to be recognized;

a first obtaining module 1004, configured to input the text picture into a feature extraction neural network for processing, to obtain a two-dimensional feature map of the text picture, where the two-dimensional feature map includes the characters in the text picture;

a second obtaining module 1006, configured to predict the characters in the two-dimensional feature map, to obtain a two-dimensional prediction result map containing predicted characters; and

a third obtaining module 1008, configured to process the predicted characters in the two-dimensional prediction result map, to obtain a recognition result of the characters in the text picture.

Optionally, the third obtaining module 1008 is further configured to:

perform eight-neighborhood merging on the predicted characters in the two-dimensional prediction result map, to obtain target characters generated by merging the predicted characters, where each target character carries its abscissa and ordinate in the two-dimensional prediction result map; and

arrange the target characters based on their abscissas and ordinates, and determine the recognition result of the characters in the text picture according to the arrangement result of the target characters.
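The eight-neighborhood merging described above can be sketched as a connected-component pass over a grid of per-cell predictions: cells of the same predicted class that touch horizontally, vertically, or diagonally are merged into one target character. The grid representation, the `None` background marker, and the use of the component centroid as the character's coordinates are illustrative assumptions, not the patent's exact data layout.

```python
def merge_eight_neighbourhood(pred):
    """Merge 8-connected cells of the same predicted class into one
    target character; return (char, x, y) tuples sorted left-to-right,
    with (x, y) taken as the component centroid.

    pred: 2-D grid of predicted classes; None marks background.
    """
    h, w = len(pred), len(pred[0])
    seen = [[False] * w for _ in range(h)]
    chars = []
    for y in range(h):
        for x in range(w):
            if pred[y][x] is None or seen[y][x]:
                continue
            cls, stack, cells = pred[y][x], [(y, x)], []
            seen[y][x] = True
            while stack:                      # iterative flood fill
                cy, cx = stack.pop()
                cells.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):     # the 8 neighbours (and self)
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and pred[ny][nx] == cls):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            xs = sum(c[1] for c in cells) / len(cells)
            ys = sum(c[0] for c in cells) / len(cells)
            chars.append((cls, xs, ys))
    # arrange merged characters left-to-right by horizontal coordinate
    return sorted(chars, key=lambda c: c[1])
```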

Optionally, the third obtaining module 1008 is further configured such that:

the arranging of the target characters based on their abscissas and ordinates, and the determining of the recognition result of the characters in the text picture according to the arrangement result of the target characters, include:

sorting the target characters in ascending order of their abscissas to form a sorted first target character set;

storing the first target character of the first target character set into a second target character set, determining the ordinate of that first target character as the current sliding average, and deleting that first target character from the first target character set;

traversing, according to the current sliding average, the target characters in the first target character set other than the first target character, to obtain a third target character set, where the third target character set includes at least one second target character set; and

based on the ordinate of the first target character in each second target character set, sorting and then merging the second target character sets in the third target character set, and determining the recognition result of the characters in the text picture according to the merging result.

Optionally, the first target character is the i-th target character, where i ∈ [1, n] and i is a positive integer;

accordingly, the third obtaining module 1008 is further configured to:

S1: acquire the (i+1)-th target character in the first target character set, and calculate the sliding average of the (i+1)-th target character based on the current sliding average;

S2: determine whether the sliding average of the (i+1)-th target character is less than a preset sliding threshold;

if so, store the (i+1)-th target character into the second target character set, delete the (i+1)-th target character from the first target character set, and update the current sliding average based on the sliding average of the (i+1)-th target character; then determine whether i+1 is greater than n:

if i+1 is greater than n, store the second target character set into the third target character set, and determine whether the first target character set is empty: if it is empty, the third target character set is obtained; if it is not empty, create a new second target character set, store the i-th target character of the first target character set into the newly created second target character set, determine the ordinate of that i-th target character as the current sliding average, delete that i-th target character from the first target character set, and continue with step S1;

if i+1 is not greater than n, increment i by 1 and continue with step S1;

if the sliding average of the (i+1)-th target character is not less than the preset sliding threshold, increment i by 1 and continue with step S1.
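Read as a whole, steps S1 and S2 group the x-sorted target characters into text lines: a character joins the current line while its ordinate stays close to a running (sliding) average of the line's ordinates, and characters that deviate are left for a later pass that opens a new line. The sketch below follows that reading; the exact sliding-average update and threshold test in the patent may differ, so the deviation test |y − mean| < threshold is an assumption.

```python
def group_into_lines(chars, threshold):
    """Split x-sorted characters into text lines via a running average
    of the y coordinate, then read the lines top-to-bottom.

    chars: iterable of (char, x, y) tuples.
    threshold: maximum |y - running mean| for a character to join
    the current line.
    """
    remaining = sorted(chars, key=lambda c: c[1])   # first set: sort by x
    lines = []                                      # third set
    while remaining:
        first = remaining.pop(0)
        line, mean, n = [first], first[2], 1        # second set + its average
        kept = []
        for ch in remaining:
            if abs(ch[2] - mean) < threshold:       # close to this line
                line.append(ch)
                n += 1
                mean += (ch[2] - mean) / n          # update running mean
            else:
                kept.append(ch)                     # leave for a later line
        remaining = kept
        lines.append(line)
    lines.sort(key=lambda ln: ln[0][2])             # order lines by first y
    return ''.join(ch[0] for ln in lines for ch in ln)
```

With two lines of characters at y ≈ 0 and y ≈ 5, the function first collects the upper line, then the lower one, and concatenates them in reading order.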

Optionally, the third obtaining module 1008 is further configured to:

obtain all the second target character sets in the third target character set;

determine the ordinate of the first target character in each second target character set; and

arrange the target characters of all the second target character sets based on the ordinate of the first target character in each second target character set, and determine the arrangement result of the target characters of all the second target character sets as the recognition result of the characters in the text picture.

Optionally, the feature extraction neural network is obtained through training by a network training module, and the network training module is configured to:

construct an initial feature extraction neural network, and acquire a training set of sample pictures containing text, where the training set includes sample pictures and the text labels corresponding to the sample pictures;

process the sample picture based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample picture, and determine a character-position heat map of the sample picture based on the two-dimensional feature map;

obtain a one-dimensional feature map of the sample picture according to the two-dimensional feature map and the character-position heat map of the sample picture;

classify the one-dimensional feature map to obtain a text prediction result of the sample picture; and

calculate a loss function based on the text label corresponding to the sample picture and on the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

Optionally, the apparatus further includes:

a label generation module, configured to preprocess the sample picture to generate a pseudo label for the sample picture.

Optionally, the network training module is further configured to:

determine the character-position heat map of the sample picture based on the pseudo label and the two-dimensional feature map of the sample picture.

Optionally, the network training module is further configured to:

sum the two-dimensional feature map and the character-position heat map of the sample picture along the vertical dimension, to obtain the one-dimensional feature map of the sample picture.
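The vertical-dimension summation can be sketched as follows, with pure-Python lists standing in for tensors. Treating the character-position heat map as per-cell weights applied to each row's features before the column-wise sum is one plausible reading of "summing the two-dimensional feature map and the heat map along the vertical dimension"; the patent does not fix the exact operation, so this weighting is an assumption.

```python
def collapse_to_1d(feature_map, heat_map):
    """Collapse an H x W x C feature map into a W x C sequence by a
    heat-map-weighted sum over the vertical (H) dimension.

    heat_map: H x W character-position weights.
    """
    H, W = len(feature_map), len(feature_map[0])
    C = len(feature_map[0][0])
    out = [[0.0] * C for _ in range(W)]
    for x in range(W):                       # one output step per column
        for y in range(H):
            w = heat_map[y][x]               # weight of this cell
            for c in range(C):
                out[x][c] += w * feature_map[y][x][c]
    return out
```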

Optionally, the network training module is further configured to:

classify the one-dimensional feature map to obtain an initial text prediction result of the sample picture; and

perform CTC decoding on the initial text prediction result to obtain the text prediction result of the sample picture.
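A minimal sketch of the CTC decoding step: under greedy decoding, the per-column argmax classes are collapsed by removing consecutive repeats and then dropping blanks. The patent does not specify greedy versus beam-search decoding, so greedy decoding is an illustrative assumption here.

```python
def ctc_greedy_decode(frame_classes, blank=0):
    """Greedy CTC decoding: drop repeated classes, then drop blanks.

    frame_classes: per-column argmax class indices taken from the
    classified one-dimensional feature map.
    """
    out, prev = [], None
    for c in frame_classes:
        if c != prev and c != blank:   # keep only class changes, skip blanks
            out.append(c)
        prev = c
    return out
```

Note that a blank between two identical classes separates them, so the frame sequence 0,1,1,0,1,2,2,0 decodes to the label sequence 1,1,2 rather than 1,2.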

The above is a schematic solution of a text recognition apparatus of this embodiment. It should be noted that the technical solution of this text recognition apparatus belongs to the same concept as the technical solution of the first text recognition method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of that text recognition method.

Corresponding to the above method embodiments, this specification further provides embodiments of a text recognition apparatus. FIG. 11 shows a schematic structural diagram of a second text recognition apparatus provided by an embodiment of this specification. As shown in FIG. 11, the apparatus includes:

a first interface display module 1102, configured to display a picture input interface to a user based on a call request of the user;

a third acquisition module 1104, configured to acquire a text picture to be recognized that is input by the user through the picture input interface;

a seventh obtaining module 1106, configured to input the text picture into a feature extraction neural network for processing, to obtain a two-dimensional feature map of the text picture, where the two-dimensional feature map includes the characters in the text picture;

an eighth obtaining module 1108, configured to predict the characters in the two-dimensional feature map, to obtain a two-dimensional prediction result map containing predicted characters; and

a ninth obtaining module 1110, configured to process the predicted characters in the two-dimensional prediction result map, to obtain a recognition result of the characters in the text picture and return the recognition result to the user.

The above is a schematic solution of a text recognition apparatus of this embodiment. It should be noted that the technical solution of this text recognition apparatus belongs to the same concept as the technical solution of the second text recognition method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of that text recognition method.

Corresponding to the above method embodiments, this specification further provides embodiments of a text recognition apparatus. FIG. 12 shows a schematic structural diagram of a third text recognition apparatus provided by an embodiment of this specification. As shown in FIG. 12, the apparatus includes:

a first request receiving module 1202, configured to receive a call request sent by a user, where the call request carries a text picture to be recognized;

a first processing module 1204, configured to input the text picture into a feature extraction neural network for processing, to obtain a two-dimensional feature map of the text picture, where the two-dimensional feature map includes the characters in the text picture;

a first prediction module 1206, configured to predict the characters in the two-dimensional feature map, to obtain a two-dimensional prediction result map containing predicted characters; and

a second processing module 1208, configured to process the predicted characters in the two-dimensional prediction result map, to obtain a recognition result of the characters in the text picture and return the recognition result to the user.

The above is a schematic solution of a text recognition apparatus of this embodiment. It should be noted that the technical solution of this text recognition apparatus belongs to the same concept as the technical solution of the third text recognition method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of that text recognition method.

Corresponding to the above method embodiments, this specification further provides embodiments of a feature extraction neural network training apparatus. FIG. 13 shows a schematic structural diagram of a first feature extraction neural network training apparatus provided by an embodiment of this specification. As shown in FIG. 13, the apparatus includes:

a second acquisition module 1302, configured to construct an initial feature extraction neural network, and acquire a training set of sample pictures containing text, where the training set includes sample pictures and the text labels corresponding to the sample pictures;

a third obtaining module 1304, configured to process the sample picture based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample picture, and determine a character-position heat map of the sample picture based on the two-dimensional feature map;

a fourth obtaining module 1306, configured to obtain a one-dimensional feature map of the sample picture according to the two-dimensional feature map and the character-position heat map of the sample picture;

a fifth obtaining module 1308, configured to classify the one-dimensional feature map to obtain a text prediction result of the sample picture; and

a sixth obtaining module 1310, configured to calculate a loss function based on the text label corresponding to the sample picture and on the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

Optionally, the apparatus further includes:

a preprocessing module, configured to preprocess the sample picture to generate a pseudo label for the sample picture.

Optionally, the third obtaining module 1304 is further configured to:

determine the character-position heat map of the sample picture based on the pseudo label and the two-dimensional feature map of the sample picture.

Optionally, the fourth obtaining module 1306 is further configured to:

sum the two-dimensional feature map and the character-position heat map of the sample picture along the vertical dimension, to obtain the one-dimensional feature map of the sample picture.

Optionally, the fifth obtaining module 1308 is further configured to:

classify the one-dimensional feature map to obtain an initial text prediction result of the sample picture; and

perform CTC decoding on the initial text prediction result to obtain the text prediction result of the sample picture.

The above is a schematic solution of a feature extraction neural network training apparatus of this embodiment. It should be noted that the technical solution of this training apparatus belongs to the same concept as the technical solution of the first feature extraction neural network training method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of that training method.

Corresponding to the above method embodiments, this specification further provides embodiments of a feature extraction neural network training apparatus. FIG. 14 shows a schematic structural diagram of a second feature extraction neural network training apparatus provided by an embodiment of this specification. As shown in FIG. 14, the apparatus includes:

a second interface display module 1402, configured to display a picture input interface to a user based on a call request of the user;

a sample receiving module 1404, configured to receive a training set of sample pictures containing text that is input by the user through the picture input interface, where the training set includes sample pictures and the text labels corresponding to the sample pictures;

a third processing module 1406, configured to construct an initial feature extraction neural network, process the sample picture based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample picture, and determine a character-position heat map of the sample picture based on the two-dimensional feature map;

a tenth obtaining module 1408, configured to obtain a one-dimensional feature map of the sample picture according to the two-dimensional feature map and the character-position heat map of the sample picture;

a first classification module 1410, configured to classify the one-dimensional feature map to obtain a text prediction result of the sample picture; and

a first training module 1412, configured to calculate a loss function based on the text label corresponding to the sample picture and on the text prediction result, and train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network.

The above is a schematic solution of a feature extraction neural network training apparatus of this embodiment. It should be noted that the technical solution of this training apparatus belongs to the same concept as the technical solution of the second feature extraction neural network training method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of that training method.

Corresponding to the above method embodiments, this specification further provides embodiments of a feature extraction neural network training apparatus. FIG. 15 shows a schematic structural diagram of a third feature extraction neural network training apparatus provided by an embodiment of this specification. As shown in FIG. 15, the apparatus includes:

a second request receiving module 1502, configured to receive a call request sent by a user, where the call request carries a training set of sample pictures containing text, and the training set includes sample pictures and the text labels corresponding to the sample pictures;

a fourth processing module 1504, configured to construct an initial feature extraction neural network, process the sample picture based on the initial feature extraction neural network to obtain a two-dimensional feature map of the sample picture, and determine a character-position heat map of the sample picture based on the two-dimensional feature map;

an eleventh obtaining module 1506, configured to obtain a one-dimensional feature map of the sample picture according to the two-dimensional feature map and the character-position heat map of the sample picture;

a second classification module 1508, configured to classify the one-dimensional feature map to obtain a text prediction result of the sample picture; and

a second training module 1510, configured to calculate a loss function based on the text label corresponding to the sample picture and on the text prediction result, train the initial feature extraction neural network according to the loss function to obtain the feature extraction neural network, and return the feature extraction neural network to the user.

The above is a schematic solution of a feature extraction neural network training apparatus of this embodiment. It should be noted that the technical solution of this training apparatus belongs to the same concept as the technical solution of the third feature extraction neural network training method described above; for details not described in the technical solution of the apparatus, reference may be made to the description of the technical solution of that training method.

图16示出了根据本说明书一个实施例提供的一种计算设备1600的结构框图。该计算设备1600的部件包括但不限于存储器1610和处理器1620。处理器1620与存储器1610通过总线1630相连接,数据库1650用于保存数据。Fig. 16 shows a block diagram of a computing device 1600 according to an embodiment of the present specification. The components of the computing device 1600 include but are not limited to a memory 1610 and a processor 1620. The processor 1620 is connected to the memory 1610 via a bus 1630, and the database 1650 is used to store data.

计算设备1600还包括接入设备1640,接入设备1640使得计算设备1600能够经由一个或多个网络1660通信。这些网络的示例包括公用交换电话网(PSTN)、局域网(LAN)、广域网(WAN)、个域网(PAN)或诸如因特网的通信网络的组合。接入设备1640可以包括有线或无线的任何类型的网络接口(例如,网络接口卡(NIC))中的一个或多个,诸如IEEE802.11无线局域网(WLAN)无线接口、全球微波互联接入(Wi-MAX)接口、以太网接口、通用串行总线(USB)接口、蜂窝网络接口、蓝牙接口、近场通信(NFC)接口,等等。The computing device 1600 also includes an access device 1640 that enables the computing device 1600 to communicate via one or more networks 1660. Examples of these networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 1640 may include one or more of any type of network interface (e.g., a network interface card (NIC)) whether wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, a World Wide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and the like.

在本说明书的一个实施例中,计算设备1600的上述部件以及图16中未示出的其他部件也可以彼此相连接,例如通过总线。应当理解,图16所示的计算设备结构框图仅仅是出于示例的目的,而不是对本说明书范围的限制。本领域技术人员可以根据需要,增添或替换其他部件。In one embodiment of the present specification, the above components of the computing device 1600 and other components not shown in FIG. 16 may also be connected to each other, for example, through a bus. It should be understood that the computing device structure block diagram shown in FIG. 16 is only for illustrative purposes and is not intended to limit the scope of the present specification. Those skilled in the art may add or replace other components as needed.

计算设备1600可以是任何类型的静止或移动计算设备,包括移动计算机或移动计算设备(例如,平板计算机、个人数字助理、膝上型计算机、笔记本计算机、上网本等)、移动电话(例如,智能手机)、可佩戴的计算设备(例如,智能手表、智能眼镜等)或其他类型的移动设备,或者诸如台式计算机或PC的静止计算设备。计算设备1600还可以是移动式或静止式的服务器。Computing device 1600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, a netbook, etc.), a mobile phone (e.g., a smart phone), a wearable computing device (e.g., a smart watch, smart glasses, etc.), or other types of mobile devices, or a stationary computing device such as a desktop computer or PC. Computing device 1600 may also be a mobile or stationary server.

其中,处理器1620用于执行如下计算机可执行指令,该指令被处理器执行时实现所述文本识别方法的步骤或者实现所述特征提取神经网络训练方法的步骤。The processor 1620 is used to execute the following computer executable instructions, which, when executed by the processor, implement the steps of the text recognition method or the steps of the feature extraction neural network training method.

上述为本实施例的一种计算设备的示意性方案。需要说明的是,该计算设备的技术方案与上述的文本识别方法或特征提取神经网络训练方法的技术方案属于同一构思,计算设备的技术方案未详细描述的细节内容,均可以参见上述文本识别方法或特征提取神经网络训练方法的技术方案的描述。The above is a schematic scheme of a computing device of this embodiment. It should be noted that the technical scheme of the computing device and the technical scheme of the above-mentioned text recognition method or feature extraction neural network training method belong to the same concept, and the details not described in detail in the technical scheme of the computing device can be referred to the description of the technical scheme of the above-mentioned text recognition method or feature extraction neural network training method.

本说明书一实施例还提供一种计算机可读存储介质,其存储有计算机指令,,该指令被处理器执行时实现所述文本识别方法的步骤或者实现所述特征提取神经网络训练方法的步骤。An embodiment of the present specification also provides a computer-readable storage medium storing computer instructions, which, when executed by a processor, implement the steps of the text recognition method or the steps of the feature extraction neural network training method.

上述为本实施例的一种计算机可读存储介质的示意性方案。需要说明的是,该存储介质的技术方案与上述的文本识别方法或特征提取神经网络训练方法的技术方案属于同一构思,存储介质的技术方案未详细描述的细节内容,均可以参见上述文本识别方法或特征提取神经网络训练方法的技术方案的描述。The above is a schematic scheme of a computer-readable storage medium of this embodiment. It should be noted that the technical scheme of the storage medium and the technical scheme of the above-mentioned text recognition method or feature extraction neural network training method belong to the same concept, and the details not described in detail in the technical scheme of the storage medium can be referred to the description of the technical scheme of the above-mentioned text recognition method or feature extraction neural network training method.

上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。The above is a description of a specific embodiment of the specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or continuous order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium. It should be noted that the content encompassed by the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice exclude electrical carrier signals and telecommunication signals from computer-readable media.

It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations. Those skilled in the art should appreciate, however, that the embodiments of this specification are not limited by the described order of actions, because according to these embodiments certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the embodiments of this specification.

Each of the above embodiments is described with its own emphasis; for parts not detailed in a given embodiment, refer to the relevant descriptions of the other embodiments.

The preferred embodiments of this specification disclosed above are intended only to help explain this specification. The optional embodiments neither describe every detail exhaustively nor limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the embodiments of this specification. These embodiments were selected and described in detail in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the relevant art can understand and use this specification well. This specification is limited only by the claims and their full scope and equivalents.

Claims (25)

CN202011003216.5A | Filed 2020-09-22 | Text recognition method and device, feature extraction neural network training method and device | Active | CN114255467B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011003216.5A (CN114255467B) | 2020-09-22 | 2020-09-22 | Text recognition method and device, feature extraction neural network training method and device

Publications (2)

Publication Number | Publication Date
CN114255467A (en) | 2022-03-29
CN114255467B (en) | 2024-11-08

Family

ID=80788450

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011003216.5A (Active; CN114255467B) | Text recognition method and device, feature extraction neural network training method and device | 2020-09-22 | 2020-09-22

Country Status (1)

Country | Link
CN (1) | CN114255467B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116824609B (en)* | 2023-06-29 | 2024-05-24 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Document format detection method and device and electronic equipment
CN117115842A (en)* | 2023-08-02 | 2023-11-24 | Alibaba (China) Co., Ltd. | Image text recognition method and feature decoding model training method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110378334A (en)* | 2019-06-14 | 2019-10-25 | South China University of Technology | A natural scene text recognition method based on a two-dimensional character attention mechanism

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104809481B (en)* | 2015-05-21 | 2017-10-20 | Central South University | A natural scene text detection method based on adaptive color clustering
CN106156773A (en)* | 2016-06-27 | 2016-11-23 | Hunan University | A text image segmentation method and device
CN110717366A (en)* | 2018-07-13 | 2020-01-21 | Hangzhou Hikvision Digital Technology Co., Ltd. | Text information identification method, device, equipment and storage medium
CN109086769B (en)* | 2018-07-19 | 2021-11-02 | Wuhan University of Science and Technology | A method for recognizing broken and adhered laser-printed digit strings
CN110991265B (en)* | 2019-11-13 | 2022-03-04 | Sichuan University | Layout extraction method for train ticket images
CN111079658B (en)* | 2019-12-19 | 2023-10-31 | Beijing Haiguo Huachuang Cloud Technology Co., Ltd. | Video-based multi-target continuous behavior analysis method, system and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter; Tianwei Wang et al.; arXiv; 2021-06-13; pp. 1-10 *

Also Published As

Publication number | Publication date
CN114255467A (en) | 2022-03-29

Similar Documents

Publication | Publication Date | Title
US12346827B2 (en) | Generating scene graphs from digital images using external knowledge and image reconstruction
CN112818975A (en) | Text detection model training method and device and text detection method and device
CN114429566B (en) | Image semantic understanding method, device, equipment and storage medium
CN112016682B (en) | Video characterization learning and pre-training method and device, electronic equipment and storage medium
CN114495129B (en) | Character detection model pre-training method and device
CN114898266B (en) | Training methods, image processing methods, devices, electronic equipment and storage media
CN114581710B (en) | Image recognition method, device, equipment, readable storage medium and program product
CN112749695A (en) | Text recognition method and device
CN113240071A (en) | Graph neural network processing method and device, computer equipment and storage medium
CN114255467B (en) | Text recognition method and device, feature extraction neural network training method and device
CN110276351A (en) | Multilingual scene text detection and recognition method
CN113326842A (en) | Financial form character recognition method
WO2022247448A1 (en) | Data processing method and apparatus, computing device, and computer readable storage medium
CN113742525A (en) | Self-supervision video hash learning method, system, electronic equipment and storage medium
Lyu et al. | The early Japanese books reorganization by combining image processing and deep learning
WO2024100591A1 (en) | Machine learning models for video object segmentation
CN115908363B (en) | Tumor cell statistics method, device, equipment and storage medium
CN119152312B (en) | Image amplification method and system based on controllable background mixing
Jishan et al. | Hybrid deep neural network for Bangla automated image descriptor
CN111291758A (en) | Method and device for identifying characters of a seal
Ali et al. | Context awareness based sketch-deepnet architecture for hand-drawn sketches classification and recognition in AIoT
CN115588111A (en) | Network data supervised fine-grained image recognition method based on attention data augmentation
Song et al. | Text siamese network for video textual keyframe detection
CN113205149A (en) | Picture processing method and device
CN109614463B (en) | Text matching processing method and device

Legal Events

Date | Code | Title | Description
— | PB01 | Publication
— | SE01 | Entry into force of request for substantive examination
— | GR01 | Patent grant
