CN111428717A - Text recognition method and device, electronic equipment and computer readable storage medium - Google Patents

Text recognition method and device, electronic equipment and computer readable storage medium

Publication number
CN111428717A
Authority
CN
China
Prior art keywords
text
text box
initial
detection model
inclination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010226050.7A
Other languages
Chinese (zh)
Other versions
CN111428717B (en)
Inventor
李月
黄光伟
史新艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN202010226050.7A
Publication of CN111428717A
Application granted
Publication of CN111428717B
Active
Anticipated expiration

Abstract

The present application provides a text recognition method, apparatus, electronic device, and computer-readable storage medium. The method includes: acquiring a picture to be recognized that contains text information; processing the picture with a pre-trained text detection model to determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box; correcting the text orientation of each text box according to its inclination direction, to obtain a corrected text box; and recognizing the text information in the corrected text box. The embodiments of the present application can improve the precision with which the orientation of text content is identified, and thereby improve the accuracy of picture text recognition.

Description

Text recognition method, apparatus, electronic device, and computer-readable storage medium

Technical field

The present application relates to the technical field of image processing, and in particular to a text recognition method, apparatus, electronic device, and computer-readable storage medium.

Background

To recognize the text content of a picture, a text detection method is first used to detect all text boxes in the picture; each detected text box is then rotated to horizontal; finally, each text box image (with its text horizontal and upright) is fed into a recognition model, which recognizes the text inside the box.

One text detection method can obtain the position information of each text box in the picture (center coordinates, width and height, and angle), but it cannot reflect the true orientation of the text. As shown in Figure 1, text boxes can have the same shape and orientation while the orientation of the text inside them differs greatly. As a result, when each text box is rotated, the text may end up rotated to the wrong orientation, as shown in Figure 2, which leads to errors in recognizing the text content.

Summary of the invention

The present application provides a text recognition method, apparatus, electronic device, and computer-readable storage medium, to solve the problem that failing to identify the true orientation of text easily leads to errors in text content recognition.

To solve the above problem, the present application discloses a text recognition method, including:

acquiring a picture to be recognized that contains text information;

processing the picture to be recognized with a pre-trained text detection model to determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box;

correcting the text orientation of each text box according to its inclination direction, to obtain a corrected text box;

recognizing the text information in the corrected text box.

Optionally, before acquiring the picture to be recognized that contains text information, the method further includes:

determining the pre-trained text detection model;

where determining the pre-trained text detection model includes:

acquiring sample pictures, each sample picture containing at least one pre-annotated initial text box, the initial position information of each initial text box in the sample picture, and the initial inclination direction of each initial text box;

inputting the sample pictures in turn into an initial text detection model to train it, and determining at least one predicted text box for each sample picture, the predicted position information of each predicted text box in the sample picture, and the predicted inclination direction of each predicted text box;

calculating a loss value of the initial text detection model from the initial position information, the predicted position information, the initial inclination directions, and the predicted inclination directions;

when the loss value is within a preset range, using the trained initial text detection model as the text detection model.

Optionally, calculating the loss value of the initial text detection model from the initial position information, the predicted position information, the initial inclination directions, and the predicted inclination directions includes:

calculating a position loss value from the initial position information and the predicted position information;

calculating an inclination loss value from the initial inclination directions and the predicted inclination directions;

calculating the loss value of the initial text detection model from the position loss value, a position weight, the inclination loss value, and an inclination weight.
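The loss combination can be sketched as follows. The text names the four inputs but not the formula, so the weighted sum below is an assumption, as are the function names, the default weights, and the closed-interval check used for the "preset range".

```python
def combined_loss(position_loss, tilt_loss, position_weight=1.0, tilt_weight=1.0):
    """Combine the position loss and the inclination (tilt) loss.

    Assumption: a weighted sum. The patent text lists the inputs
    (two losses, two weights) but does not state the formula.
    """
    return position_weight * position_loss + tilt_weight * tilt_loss


def loss_in_preset_range(loss, lower=0.0, upper=0.05):
    """Return True when the loss lies inside a hypothetical preset range."""
    return lower <= loss <= upper


# Illustrative values only; they are not taken from the patent.
loss = combined_loss(position_loss=0.02, tilt_loss=0.04,
                     position_weight=1.0, tilt_weight=0.5)
print(round(loss, 4), loss_in_preset_range(loss))
```

The weights let training trade box localization against direction classification; how each individual loss is computed is not specified in this passage.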

Optionally, the text detection model includes a classification result acquisition layer and an inclination direction acquisition layer, and processing the picture to be recognized with the pre-trained text detection model to determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box, includes:

invoking the classification result acquisition layer to process the picture to be recognized, and obtaining a pixel classification result and a connectivity classification result for the picture;

determining at least one text box in the picture according to the pixel classification result and the connectivity classification result;

invoking the inclination direction acquisition layer to process the picture, and determining the candidate inclination directions of the text in the at least one text box, and the inclination threshold corresponding to each candidate inclination direction;

determining the inclination direction corresponding to the at least one text box according to the maximum of the inclination thresholds.
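The final selection step can be sketched as an argmax over the candidate directions. This reads the "inclination threshold" as a per-direction score, which is an interpretation; the four 90° intervals and the function name are likewise assumptions for illustration.

```python
# Hypothetical labels for four direction intervals, in degrees.
DIRECTION_INTERVALS = [(0, 90), (90, 180), (180, 270), (270, 360)]


def pick_direction(scores):
    """Return (interval, score) for the highest-scoring candidate direction.

    `scores` holds one value per candidate direction, in the same order as
    DIRECTION_INTERVALS. Ties resolve to the first maximum.
    """
    if len(scores) != len(DIRECTION_INTERVALS):
        raise ValueError("expected one score per direction interval")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return DIRECTION_INTERVALS[best], scores[best]


interval, score = pick_direction([0.05, 0.80, 0.10, 0.05])
print(interval, score)
```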

Optionally, correcting each text box according to the inclination direction to obtain a corrected text box includes:

when the inclination angle of the text box, determined from the inclination direction, is between 90° and 180°, rotating the text box counterclockwise by 90° to obtain the corrected text box;

when the inclination angle of the text box, determined from the inclination direction, is between 180° and 270°, rotating the text box counterclockwise by 180° to obtain the corrected text box;

when the inclination angle of the text box, determined from the inclination direction, is between 270° and 360°, rotating the text box counterclockwise by 270° to obtain the corrected text box.
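The three rules above can be sketched as a lookup from angle interval to rotation, followed by a 90°-step rotation of the cropped box. The half-open interval boundaries, the no-op for angles below 90°, and the use of a nested list as a stand-in for an image crop are assumptions; the text does not say which interval a boundary angle belongs to.

```python
def correction_rotation(tilt_angle):
    """Map an inclination angle in degrees to the counterclockwise correction.

    Assumptions: intervals are half-open, [90, 180), [180, 270), [270, 360),
    and angles in [0, 90) need no coarse correction.
    """
    angle = tilt_angle % 360
    return int(angle // 90) * 90


def rotate_ccw(grid, degrees):
    """Rotate a 2D list counterclockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("degrees must be a multiple of 90")
    for _ in range((degrees // 90) % 4):
        # Transpose, then reverse the row order: one 90-degree CCW turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


crop = [[1, 2],
        [3, 4]]
turn = correction_rotation(200)
print(turn, rotate_ccw(crop, turn))
```

With a real image, the same quarter-turn rotation would normally be done by an image library rather than on nested lists.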

Optionally, recognizing the text information in the corrected text box includes:

inputting the corrected text box into a text recognition model, and determining the text information contained in the corrected text box through the text recognition model.

To solve the above problem, the present application discloses a text recognition apparatus, including:

a picture acquisition module, configured to acquire a picture to be recognized that contains text information;

a picture recognition module, configured to process the picture to be recognized with a pre-trained text detection model to determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box;

a corrected text box acquisition module, configured to correct the text orientation of each text box according to its inclination direction, to obtain a corrected text box;

a text information determination module, configured to recognize the text information in the corrected text box.

Optionally, the apparatus further includes:

a text detection model determination module, configured to determine the pre-trained text detection model;

where the text detection model determination module includes:

a sample picture acquisition unit, configured to acquire sample pictures, each sample picture containing at least one pre-annotated initial text box, the initial position information of each initial text box in the sample picture, and the initial inclination direction of each initial text box;

a predicted text box determination unit, configured to input the sample pictures in turn into an initial text detection model to train it, and to determine at least one predicted text box for each sample picture, the predicted position information of each predicted text box in the sample picture, and the predicted inclination direction of each predicted text box;

a loss value calculation unit, configured to calculate a loss value of the initial text detection model from the initial position information, the predicted position information, the initial inclination directions, and the predicted inclination directions;

a text detection model acquisition unit, configured to use the trained initial text detection model as the text detection model when the loss value is within a preset range.

Optionally, the loss value calculation unit includes:

a position loss value calculation subunit, configured to calculate a position loss value from the initial position information and the predicted position information;

an inclination loss value calculation subunit, configured to calculate an inclination loss value from the initial inclination directions and the predicted inclination directions;

a loss value calculation subunit, configured to calculate the loss value of the initial text detection model from the position loss value, a position weight, the inclination loss value, and an inclination weight.

Optionally, the text detection model includes a classification result acquisition layer and an inclination direction acquisition layer, and the picture recognition module includes:

a positive pixel acquisition unit, configured to invoke the classification result acquisition layer to process the picture to be recognized, and to obtain a pixel classification result and a connectivity classification result for the picture;

a text box determination unit, configured to determine at least one text box in the picture according to the pixel classification result and the connectivity classification result;

an inclination threshold determination unit, configured to invoke the inclination direction acquisition layer to process the picture, and to determine the candidate inclination directions of the text in the at least one text box, and the inclination threshold corresponding to each candidate inclination direction;

an inclination direction determination unit, configured to determine the inclination direction corresponding to the at least one text box according to the maximum of the inclination thresholds.

Optionally, the corrected text box acquisition module includes:

a first corrected box acquisition unit, configured to rotate the text box counterclockwise by 90° to obtain the corrected text box when the inclination angle of the text box, determined from the inclination direction, is between 90° and 180°;

a second corrected box acquisition unit, configured to rotate the text box counterclockwise by 180° to obtain the corrected text box when the inclination angle of the text box, determined from the inclination direction, is between 180° and 270°;

a third corrected box acquisition unit, configured to rotate the text box counterclockwise by 270° to obtain the corrected text box when the inclination angle of the text box, determined from the inclination direction, is between 270° and 360°.

Optionally, the text information determination module includes:

a text information determination unit, configured to input the corrected text box into a text recognition model, and to determine the text information contained in the corrected text box through the text recognition model.

To solve the above problem, the present application discloses an electronic device, including:

a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor implements any of the text recognition methods described above when executing the program.

To solve the above problem, the present application discloses a computer-readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform any of the text recognition methods described above.

Compared with the prior art, the present application has the following advantages:

In the text recognition solution provided by the embodiments of the present application, a picture to be recognized that contains text information is acquired; the picture is processed with a pre-trained text detection model to determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box; the text orientation of each text box is corrected according to its inclination direction to obtain a corrected text box; and the text information in the corrected text box is recognized. The embodiments of the present application use a model that fuses a text direction classification network with a text box position detection network, and then combine the coarse angle classification result with the detected text box position to obtain the precise orientation of the text content, which improves the accuracy of picture text recognition.

Description of drawings

Figure 1a shows a schematic diagram of a text picture;

Figure 1b shows a schematic diagram of a rotated text picture;

Figure 2 shows a flowchart of the steps of a text recognition method provided by an embodiment of the present application;

Figure 3 shows a flowchart of the steps of a text recognition method provided by an embodiment of the present application;

Figure 4a shows a schematic diagram of a word to be rotated, provided by an embodiment of the present application;

Figure 4b shows a schematic diagram of data annotation provided by an embodiment of the present application;

Figure 4c shows a schematic diagram of a network structure provided by an embodiment of the present application;

Figure 4d shows a schematic diagram of a text box result provided by an embodiment of the present application;

Figure 4e shows a schematic diagram of a text image provided by an embodiment of the present application;

Figure 4f shows a schematic diagram of a virtual pen tip frame provided by an embodiment of the present application;

Figure 5 shows a schematic structural diagram of a text recognition apparatus provided by an embodiment of the present application;

Figure 6 shows a schematic structural diagram of a text recognition apparatus provided by an embodiment of the present application.

Detailed description

To make the above objects, features, and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.

Referring to Figure 2, a flowchart of the steps of a text recognition method provided by an embodiment of the present application is shown. In some embodiments the method may be executed by a processor. The method may specifically include the following steps:

Step 101: Acquire a picture to be recognized that contains text information.

In some embodiments, the present application may be applied to scenarios in which text in a picture is recognized.

The picture to be recognized is a picture that contains text information and on which text recognition is to be performed.

In some examples, the picture to be recognized may be a picture randomly selected from the Internet, for example a picture containing text information selected from a website.

In some examples, the picture to be recognized may be a picture input by a user, for example a picture the user supplies when searching for information by image.

It can be understood that the above examples are given only to aid understanding of the technical solutions of the embodiments of the present application, and do not limit the embodiments.

After the picture to be recognized that contains text information is acquired, step 102 is performed.

Step 102: Process the picture to be recognized with a pre-trained text detection model, and determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box.

The text detection model is a model obtained by training in advance, used at least to detect the text inclination direction and the connectivity information of the text in a picture. The training process of the text detection model is described in detail in the embodiments below.

A text box is the box formed by connecting four image coordinates, identified by the text detection model, that enclose text in the picture. As shown in the left half of Figure 4a, the word "word" is enclosed by the coordinates labeled 1, 2, 3, and 4; connecting these four coordinates with straight lines forms a text box. It can be understood that the above example is given only to aid understanding of the technical solutions of the embodiments of the present application, and does not limit the embodiments.

The inclination direction is the direction in which the text is tilted, measured relative to the direction in which the picture is displayed upright. In this embodiment, pictures may be taken in landscape or in portrait; the upright display direction of a picture can be determined from the shooting direction, and the inclination direction of the text in the picture is then determined relative to that upright display direction. As shown in Figure 4a, with the picture displayed upright, the word in the upper image of the left half of Figure 4a is inclined toward the upper right, and the word in the lower image of the left half of Figure 4a is inclined toward the upper left.

After the picture to be recognized is acquired, it can be input into the text detection model, which processes the picture to determine at least one text box contained in it and the inclination direction corresponding to each text box.

After the at least one text box containing text in the picture, and the inclination direction corresponding to each text box, are determined, step 103 is performed.

Step 103: Correct the text orientation of each text box according to its inclination direction, to obtain a corrected text box.

Correction processing is the operation of correcting the tilt of an inclined text box; specifically, it calibrates the coordinates of the four corners of the text box and the text direction.

A corrected text box is the text box obtained after the text orientation of an inclined text box has been corrected. As shown in Figure 4a, correcting the text boxes in the left half of Figure 4a yields the corrected text boxes shown in the right half of Figure 4a.

It can be understood that the above examples are given only to aid understanding of the technical solutions of the embodiments of the present application, and do not limit the embodiments.

After the text orientation of each text box has been corrected according to its inclination direction and the corrected text boxes have been obtained, step 104 is performed.

Step 104: Recognize the text information in the corrected text box.

The target text information is the text information contained in the picture to be recognized, obtained by recognizing the corrected text boxes.

After each text box has been corrected according to its inclination direction, the corrected text boxes can be recognized to determine the text information contained in each text box, and the recognition results are then combined to obtain the target text information.
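Steps 101 to 104 can be sketched as a small pipeline. The three stage functions are passed in by the caller and are entirely hypothetical stand-ins for the text detection model, the correction step, and the text recognition model; joining the per-box results with newlines is also an assumption, since the text only says the results are combined.

```python
def recognize_picture(picture, detect, correct, recognize):
    """Run steps 101-104 with caller-supplied stages (all hypothetical).

    detect(picture)     -> iterable of (text_box, tilt_direction)
    correct(box, tilt)  -> corrected text box
    recognize(box)      -> text string for one corrected box
    """
    texts = []
    for text_box, tilt_direction in detect(picture):   # step 102
        corrected = correct(text_box, tilt_direction)  # step 103
        texts.append(recognize(corrected))             # step 104
    # Assumption: per-box results are combined by joining with newlines.
    return "\n".join(texts)


# Toy stand-ins: a "box" is a string, and a tilt of 180 means it is reversed.
def fake_detect(picture):
    return picture


def fake_correct(box, tilt):
    return box[::-1] if tilt == 180 else box


result = recognize_picture([("drow", 180), ("text", 0)],
                           fake_detect, fake_correct, str.upper)
print(result)
```

Passing the stages as callables keeps the control flow visible without committing to any particular model or image library.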

By using the text detection model to identify inclined text boxes, this embodiment avoids the text recognition errors caused by inaccurately identifying the orientation of the text content.

In the text recognition method provided by the embodiments of the present application, a picture to be recognized that contains text information is acquired; the picture is processed with a pre-trained text detection model to determine at least one text box containing text in the picture, and the inclination direction corresponding to each text box; the text orientation of each text box is corrected according to its inclination direction to obtain a corrected text box; and the text information in the corrected text box is recognized. The embodiments of the present application use a model that fuses a text direction classification network with a text box position detection network, and then combine the coarse angle classification result with the detected text box position to obtain the precise orientation of the text content, which improves the accuracy of picture text recognition.

Next, the implementation of this embodiment is described in detail with reference to Figure 3.

Referring to Figure 3, a flowchart of the steps of a text recognition method provided by an embodiment of the present application is shown. The method may specifically include the following steps:

Step 201: Determine a pre-trained text detection model.

The embodiments of the present application may be applied to the process of recognizing the text in a picture that contains text.

When a picture containing text needs to be recognized, the text detection model can be determined first, as described in the specific implementation below.

In a specific implementation of the present application, step 201 above may include:

Sub-step S1: Acquire sample pictures.

When recognizing inclined text in a picture, a pre-trained text detection model can be used. The text detection model may be as shown in Figure 4c. The backbone network on the left extracts features and uses the lightweight MobileNet-V2 model, which balances model depth against parameter size. The network on the right is a feature map fusion network: the output feature maps of bottleneck1, bottleneck2, bottleneck3, bottleneck5, and conv2d are taken out, and feature maps of different sizes are fused through convolution and upsampling, giving three final outputs: X1: 112x112x2, X2: 112x112x16, and X3: 112x112x4. X1 gives, at each pixel position, the predicted probability that the pixel is a positive pixel (text pixel) and the probability that it is a negative pixel (background pixel). X2 gives, at each pixel position, the predicted probabilities of being connected and of not being connected to each of the 8 adjacent pixels. X3 gives, at each pixel position, the predicted probabilities of the 4 text direction intervals.
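One way to turn outputs shaped like X1 and X2 into text boxes is to link neighbouring positive pixels with a union-find structure, sketched below. This is an illustration of the general idea, not the patent's stated procedure: the 0.5 thresholds, the axis-aligned boxes, the neighbour ordering, and the use of a single "connected" probability per neighbour (rather than the paired connected/not-connected values in X2) are all assumptions.

```python
# Offsets (dy, dx) of the 8 neighbours; the ordering is an assumption.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def group_text_pixels(text_prob, link_prob, pixel_thr=0.5, link_thr=0.5):
    """Group positive pixels into boxes (x_min, y_min, x_max, y_max).

    text_prob[y][x]    : probability that the pixel is text (from X1).
    link_prob[y][x][k] : probability that the pixel is connected to its
                         k-th neighbour (the "connected" half of X2).
    """
    height, width = len(text_prob), len(text_prob[0])
    positive = {(y, x) for y in range(height) for x in range(width)
                if text_prob[y][x] >= pixel_thr}
    parent = {p: p for p in positive}

    def find(p):
        # Follow parent pointers to the root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (y, x) in positive:
        for k, (dy, dx) in enumerate(NEIGHBOURS):
            q = (y + dy, x + dx)
            if q in positive and link_prob[y][x][k] >= link_thr:
                parent[find((y, x))] = find(q)  # merge the two groups

    boxes = {}
    for (y, x) in positive:
        root = find((y, x))
        x0, y0, x1, y1 = boxes.get(root, (x, y, x, y))
        boxes[root] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return sorted(boxes.values())


text_prob = [[0.9, 0.9, 0.1, 0.8, 0.8],
             [0.1, 0.1, 0.1, 0.1, 0.1]]
link_prob = [[[0.9] * 8 for _ in row] for row in text_prob]
print(group_text_pixels(text_prob, link_prob))
```

In the toy grid, the two runs of text pixels are separated by a background pixel, so they form two separate boxes even though every link probability is high.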

In this embodiment, the different sizes may correspond to three sizes (large, medium and small). The specific numerical boundaries between them may be set according to service requirements and are not limited in this embodiment.

First, the training process of the text detection model is described.

A sample picture is a picture that contains slanted text and is used to train the text detection model.

Each sample picture contains at least one pre-annotated initial text box, together with the initial position information of each initial text box in the sample picture and the initial inclination direction of each initial text box.

In this embodiment, the initial text box may be annotated in advance by business personnel. Specifically, according to the position of the text in the sample picture, the annotator marks four vertices that just enclose the text and connects them to form a rectangular box, which is the initial text box. For example, as shown in FIG. 4b, the annotation may take the upright display direction of the picture as the reference and the left vertex of the picture as the origin, and then mark four vertices that just enclose the illustrated text "ABC". The coordinates of these four vertices together define an initial text box.

After the sample picture is acquired, sub-step S2 is performed.

Sub-step S2: input the sample pictures in turn into the initial text detection model, and determine at least one predicted text box corresponding to the sample picture, together with the predicted position information of each predicted text box in the sample picture and the predicted inclination direction of each predicted text box.

The initial text detection model is a text detection model that is able to recognize text in pictures containing text but has not yet been trained.

A predicted text box is a text box, enclosed by four coordinates, that the initial text detection model produces when it performs text recognition on the sample picture. As shown in FIG. 4b, the slanted text is "ABC", and the box enclosed by the coordinates of the four points 1, 2, 3 and 4 is the text box.

The predicted position information is the position of the predicted text box in the sample picture as obtained by the initial text detection model. Specifically, this position can be determined from the coordinates of the four vertices of the predicted text box.

The predicted inclination direction is the inclination direction of the predicted text box as obtained by the initial text detection model. As shown in FIG. 4b, after the initial text detection model identifies the predicted text box in the sample picture, the inclination direction of the predicted text box can be predicted with reference to the display direction of the sample picture. For example, the word "ABC" in the picture is inclined toward the upper right.

Of course, this is not limiting. The inclination direction that the initial text detection model predicts for a text box is not necessarily the same as the inclination direction of the box that actually encloses the text in the sample picture. The above example is given only to aid understanding of the technical solutions of the embodiments of the present application and is not the only limitation on them.

When a text box is determined, the connectivity information between the pixels of the picture also needs to be determined. Connectivity information describes the connectivity relationships between pixels, and text pixels that are connected to one another form a text box.

After the sample pictures are acquired, they may be input in turn into the initial text detection model, which determines at least one predicted text box corresponding to the sample picture, the predicted position information of each predicted text box in the sample picture, and the predicted inclination direction of each predicted text box. Sub-step S3 is then performed.

Sub-step S3: calculate the loss value of the initial text detection model according to each piece of initial position information, each piece of predicted position information, each initial inclination direction and each predicted inclination direction.

The loss value represents the degree of deviation between the predicted and initial position information, and between the predicted and initial inclination directions, of the sample picture.

After the initial text detection model has determined at least one predicted text box corresponding to the sample picture, the predicted position information of each predicted text box in the sample picture, and the predicted inclination direction of each predicted text box, the loss value of the initial text detection model can be calculated from the initial position information, the predicted position information, the initial inclination directions and the predicted inclination directions, as described in detail in the following specific implementation.

In a specific implementation of the present application, the above sub-step S3 may include:

Sub-step S31: calculate a position loss value according to each piece of initial position information and each piece of predicted position information.

In this embodiment, the position loss value is the loss value calculated from the predicted position information and the initial position information. Specifically, the position loss value may include a pixel classification loss value and a connectivity classification loss value. It may be calculated as the cross-entropy loss between the initial position information and the predicted position information.

After the predicted position information of each predicted text box is obtained, the position loss value can be calculated from the predicted position information and the initial position information.

Sub-step S32: calculate an inclination loss value according to each initial inclination direction and each predicted inclination direction.

The inclination loss value is the loss value calculated from the predicted inclination direction and the initial inclination direction.

After the predicted inclination direction of each predicted text box is obtained, the inclination loss value can be calculated from the predicted and initial inclination directions. Specifically, it can be obtained as the cross-entropy loss between the initial inclination direction and the predicted inclination direction.

Sub-step S33: calculate the loss value of the initial text detection model according to the position loss value, the position weight, the inclination loss value and the inclination weight.

In this embodiment, the position loss value may include a connectivity classification loss value and a pixel classification loss value. That is, the position loss value is jointly determined by the connectivity classification loss value and the pixel classification loss value.

Pixels are divided into positive pixels and negative pixels. All pixels that fall inside a text area are marked as positive pixels, and all pixels that fall outside the text areas are marked as negative pixels. Regions where several text areas overlap are also marked as negative pixels.

A connectivity relationship is determined jointly by two pixels. For a given pixel and each of its eight neighboring pixels: if both pixels are positive, the connectivity between them is positive; if one pixel is positive and the other is negative, the connectivity between them is positive; if both pixels are negative, the connectivity between them is negative.
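The two labeling rules above can be sketched on a small grid. This is an illustration under assumptions, not the annotation tooling of the application: text areas are simplified to axis-aligned boxes `(x0, y0, x1, y1)` with inclusive bounds, a neighbor outside the picture is given a negative link, and the link rule is implemented exactly as worded in the preceding paragraph (a link is positive when at least one of the two pixels is positive). The names `pixel_labels` and `link_labels` are hypothetical.

```python
# Offsets (dy, dx) of the eight neighbors, in a fixed (assumed) order.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def pixel_labels(height, width, boxes):
    """1 for pixels inside exactly one text box, 0 for background and overlaps."""
    cover = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                cover[y][x] += 1
    return [[1 if count == 1 else 0 for count in row] for row in cover]

def link_labels(pix):
    """links[y][x][k] is 1 when the link to neighbor k is positive."""
    h, w = len(pix), len(pix[0])
    links = [[[0] * 8 for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                # Rule as worded in the text: negative only if both are negative.
                if inside and (pix[y][x] or pix[ny][nx]):
                    links[y][x][k] = 1
    return links
```

For example, `pixel_labels(4, 4, [(0, 0, 1, 1), (1, 1, 2, 2)])` marks the shared pixel at row 1, column 1 as negative because two boxes overlap there.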

In this embodiment, the connectivity classification loss value and the pixel classification loss value may be calculated first, and the position loss value is then calculated from the two together. The connectivity classification loss value is obtained with the cross-entropy function from the connectivity classes between pixels in the annotated data and the connectivity classes predicted by the network. The pixel classification loss value is calculated from the pixel classification results.

After the position loss value and the inclination loss value are obtained, the loss value can be calculated by combining the position weight corresponding to the position loss value and the inclination weight corresponding to the inclination loss value, as shown in the following formula (1):

$$L = \lambda_1 L_{pixel} + \lambda_2 L_{link} + \lambda_3 L_{direction} \qquad (1)$$

Here, $L$ is the loss value, $L_{direction}$ is the inclination loss value, $L_{link}$ is the connectivity classification loss value, $L_{pixel}$ is the pixel classification (positive pixel or negative pixel) loss value, $\lambda_3$ is the inclination weight, $\lambda_2$ is the connectivity classification weight, and $\lambda_1$ is the pixel classification weight.

According to the pixel classification results and the connectivity classification results (0 or 1), a union-find (disjoint-set) algorithm groups pixels that are all positive and mutually connected, and each group forms one text box. The connectivity weight is the weight corresponding to the connectivity loss value.

$L_{pixel}$ is obtained with the cross-entropy loss function from the annotated data and the network prediction X1. $L_{link}$ is obtained with the cross-entropy loss function from the annotated data and the network prediction X2. $L_{direction}$ is obtained with the cross-entropy loss function from the annotated data and the network prediction X3. $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weights of the respective loss terms and can be tuned according to the observed training results.
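Formula (1) can be sketched in a few lines. The code below is an illustration only: it assumes each head's prediction has already been turned into a probability distribution per sample, uses a mean cross-entropy, and defaults all three weights to 1.0, which the text does not specify. `cross_entropy` and `total_loss` are hypothetical names.

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i] is a distribution, labels[i] the true class index."""
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p[t] + eps) for p, t in zip(probs, labels)) / len(labels)

def total_loss(l_pixel, l_link, l_direction, lam1=1.0, lam2=1.0, lam3=1.0):
    """Weighted sum of the three loss terms, following formula (1)."""
    return lam1 * l_pixel + lam2 * l_link + lam3 * l_direction
```

In use, `l_pixel`, `l_link` and `l_direction` would each come from `cross_entropy` applied to X1, X2 and X3 against the annotated labels, and the three weights would be adjusted by hand while watching the training results.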

After the loss value of the initial text detection model has been calculated from the initial position information, the predicted position information, the initial inclination directions and the predicted inclination directions, sub-step S4 is performed.

Sub-step S4: when the loss value is within a preset range, use the trained initial text detection model as the text detection model.

The preset range may be set in advance by developers according to the actual application scenario and requirements, for example 3 to 5. The embodiments of the present application do not limit the specific values of the preset range.

When the loss value is within the preset range, the trained initial text detection model can be considered to meet the preset requirements when recognizing text pictures, and it can be used as the final text detection model. For example, if the preset range is 3 to 5 and the loss value falls within 3 to 5, training of the initial text detection model is considered complete and the trained model can be used as the final text detection model. If the loss value falls outside 3 to 5, the initial text detection model is considered not yet successfully trained, and training samples can be added to continue training the initial text detection model.
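The stopping rule of sub-step S4 amounts to a range check inside the training loop. The sketch below assumes a hypothetical callable `train_step` that runs one training pass (possibly with added samples) and returns the current loss; the range 3 to 5 is the example value from the text, and `max_steps` is an added safety limit.

```python
def train_until_in_range(train_step, low=3.0, high=5.0, max_steps=1000):
    """Call train_step() until the returned loss lies in [low, high].

    Returns (step_number, loss) on success, or None if max_steps is exhausted.
    """
    for step in range(1, max_steps + 1):
        loss = train_step()
        if low <= loss <= high:
            return step, loss
    return None
```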

After the pre-trained text detection model is determined, step 202 is performed.

Step 202: acquire a to-be-recognized picture containing text information.

The to-be-recognized picture is a picture that contains text information and is used for text recognition.

In some examples, the to-be-recognized picture may be a picture selected at random from the Internet, for example a picture containing text information chosen from a website.

In some examples, the to-be-recognized picture may be a picture input by the user, for example a picture the user supplies when searching for information by image. This is described with reference to FIG. 4e and FIG. 4f. Application background: the user points at the word to be translated with a pen tip. The relative position of the pen tip and the camera is fixed, so the position of the pen tip in the image is fixed. Selection method: (1) taking the position of the pen tip as the center of the bottom edge, set a rectangular region of fixed size according to the size of the text in the image (as shown in FIG. 4f) to serve as a virtual pen tip; (2) calculate the overlap area between this rectangular region and each detected text box; (3) find the text box whose overlap area accounts for the largest proportion of the pen-tip rectangular region, and select it as the word the user wants translated.
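The three selection steps can be sketched as follows. This is a simplified illustration: all rectangles are treated as axis-aligned `(x0, y0, x1, y1)` tuples in image coordinates with y growing downward, whereas detected text boxes in the application may be slanted. The function names and the region size passed in are assumptions.

```python
def pen_tip_region(tip_x, tip_y, width, height):
    """Fixed-size rectangle whose bottom edge is centered on the pen tip."""
    return (tip_x - width / 2, tip_y - height, tip_x + width / 2, tip_y)

def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def select_word_box(region, boxes):
    """Index of the box covering the largest share of the pen-tip region."""
    region_area = (region[2] - region[0]) * (region[3] - region[1])
    best_index, best_ratio = None, 0.0
    for index, box in enumerate(boxes):
        ratio = overlap_area(region, box) / region_area
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index  # None when no box overlaps the region
```

Dividing by the region area does not change which box wins, since the region is the same for every box, but it keeps the code aligned with the "proportion of the pen-tip rectangular region" wording.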

It can be understood that the above examples are given only to aid understanding of the technical solutions of the embodiments of the present application and are not the only limitation on them.

After the to-be-recognized picture containing text information is acquired, step 203 is performed.

Step 203: invoke the classification result acquisition layer to process the to-be-recognized picture and acquire the positive pixels in the to-be-recognized picture.

In this embodiment, the initial text detection model may include a classification result acquisition layer and an inclination direction acquisition layer. The classification result acquisition layer may be used to obtain the pixel classification results and connectivity classification results of a picture, and the inclination direction acquisition layer may be used to obtain the inclination direction of the text in the picture.

The network outputs a predicted value of whether each pixel in the picture is a positive pixel (that is, part of the text) and a predicted value of whether pixels are connected. Based on these values (0 or 1), a union-find (disjoint-set) algorithm groups pixels that are all positive and mutually connected, and each group forms one text box.

Step 204: determine at least one text box in the to-be-recognized picture according to the positive pixels.

After the predicted values of the positive pixels in the to-be-recognized picture are obtained, these values, together with the predicted values of whether pixels are connected, can be used to group pixels that are all positive and mutually connected, and each group forms one text box.
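The grouping step can be sketched with a small union-find (disjoint-set) structure. The sketch assumes `pix[y][x]` holds the 0/1 pixel prediction and `links[y][x][k]` the 0/1 link prediction toward neighbor `k`, with the neighbor order below chosen for illustration. It joins two positive pixels when the link from the first to the second is predicted positive; the text does not spell out how the two directions of a link are combined, so that choice is an assumption.

```python
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def find(parent, i):
    """Root of i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, a, b):
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a

def group_pixels(pix, links):
    """Return a list of groups; each group is a list of (x, y) positive pixels."""
    h, w = len(pix), len(pix[0])
    parent = list(range(h * w))
    for y in range(h):
        for x in range(w):
            if not pix[y][x]:
                continue
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and pix[ny][nx] and links[y][x][k]:
                    union(parent, y * w + x, ny * w + nx)
    groups = {}
    for y in range(h):
        for x in range(w):
            if pix[y][x]:
                groups.setdefault(find(parent, y * w + x), []).append((x, y))
    return list(groups.values())
```

Each returned group corresponds to one candidate text box; a bounding rectangle can then be fitted around its pixels.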

Step 205: invoke the inclination direction acquisition layer to process the to-be-recognized picture, and determine the to-be-confirmed inclination directions of the at least one text box and the inclination threshold corresponding to each to-be-confirmed inclination direction.

A to-be-confirmed inclination direction is an inclination direction of the at least one text box obtained when the second neural network layer of the text detection model (that is, the inclination direction acquisition layer) processes the pixels in the at least one text box.

The inclination threshold is the value that the second neural network layer assigns to a to-be-confirmed inclination direction for the text pixels in the at least one text box.

By invoking the inclination direction acquisition layer to process the to-be-recognized picture, the to-be-confirmed inclination directions of the text in the at least one text box, and the inclination threshold corresponding to each of them, can be determined.

Step 206: determine the inclination direction corresponding to the at least one text box according to the largest of the inclination thresholds.

After the inclination thresholds corresponding to the inclination directions of the at least one text box are obtained, they can be compared with one another, and the inclination direction of the at least one text box is determined from the comparison. As shown in FIG. 4c, based on X3, the largest of the probability values for the four text direction intervals is selected, and the direction interval with the largest probability is taken as the text direction of the text box to which the pixel belongs. In other words, the text direction at that pixel is classified into one of the four direction intervals.
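Choosing the direction interval with the largest probability is an argmax over the four channels of X3. The sketch below assumes the channels map to the intervals 0-90, 90-180, 180-270 and 270-360 degrees in that order, and it aggregates the pixels of one text box by majority vote. Both the channel order and the voting rule are assumptions, since the text only describes the per-pixel choice.

```python
INTERVALS = [(0, 90), (90, 180), (180, 270), (270, 360)]

def pixel_direction(probs):
    """Index of the direction interval with the largest probability."""
    return max(range(4), key=lambda i: probs[i])

def box_direction(pixel_probs):
    """Direction interval of a text box, by majority vote over its pixels."""
    votes = [0] * 4
    for probs in pixel_probs:
        votes[pixel_direction(probs)] += 1
    return INTERVALS[max(range(4), key=lambda i: votes[i])]
```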

Step 207: when the inclination angle of the text box is determined, according to the inclination direction, to be between 90° and 180°, rotate the text box counterclockwise by 90° to obtain the corrected text box.

When the inclination angle of the text box is determined to be between 90° and 180°, the text box is rotated counterclockwise by 90° to obtain the corrected text box. Specifically, if the text direction lies between 90° and 180°, the picture is rotated counterclockwise by 90° about its upper-left corner to obtain the corrected text box.

Step 208: when the inclination angle of the text box is determined, according to the inclination direction, to be between 180° and 270°, rotate the text box counterclockwise by 180° to obtain the corrected text box.

When the inclination angle of the text box is determined to be between 180° and 270°, the text box is rotated counterclockwise by 180° to obtain the corrected text box. Specifically, if the text direction lies between 180° and 270°, the picture is rotated counterclockwise by 180° about its upper-left corner to obtain the corrected text box.

Step 209: when the inclination angle of the text box is determined, according to the inclination direction, to be between 270° and 360°, rotate the text box counterclockwise by 270° to obtain the corrected text box.

When the inclination angle of the text box is determined to be between 270° and 360°, the text box is rotated counterclockwise by 270° to obtain the corrected text box. Specifically, if the text direction lies between 270° and 360°, the picture is rotated counterclockwise by 270° about its upper-left corner to obtain the corrected text box.

Of course, when the inclination angle of the text box is determined to be between 0° and 90°, no correction of the text box is needed.
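Steps 207 to 209 map each direction interval to a number of counterclockwise quarter turns. The sketch below applies that mapping to a picture stored as a list of rows. It assumes half-open intervals such as [90°, 180°), which the text does not specify, and it rotates the pixel grid as a whole rather than about the upper-left corner, so the content orientation matches while the position of the result in a larger canvas may differ. The function names are hypothetical.

```python
def quarter_turns(angle_degrees):
    """Number of counterclockwise 90-degree turns for a given inclination angle."""
    return int(angle_degrees % 360 // 90)

def rotate_ccw_90(grid):
    """Rotate a list-of-rows picture counterclockwise by 90 degrees."""
    return [list(row) for row in zip(*grid)][::-1]

def correct_orientation(grid, angle_degrees):
    """Apply 0, 1, 2 or 3 quarter turns according to the direction interval."""
    for _ in range(quarter_turns(angle_degrees)):
        grid = rotate_ccw_90(grid)
    return grid
```

After this coarse correction the remaining inclination lies within a single 90° interval, which the later fine rotation by the box angle removes.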

Step 210: input the corrected text box into a text recognition model, and determine the text information contained in the corrected text box through the text recognition model.

After the text box has been corrected to obtain the corrected text box, OpenCV's minAreaRect can be used to obtain the rectangular text box corresponding to the group of pixels. The text box is represented by its center point coordinates, width W, height H and inclination angle a (as shown in FIG. 4d). The picture is rotated counterclockwise by a about the center point, after which the text in the text box is horizontal. Finally, the part of the picture within the corrected text box is cropped out and fed to the subsequent text recognition model, which recognizes the text content. That is, this embodiment uses two models: a text detection model, which detects and corrects slanted text in the picture, and a text recognition model, which recognizes the text information in the corrected text box.
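The effect of rotating about the box center by the inclination angle can be checked on the four corners of a box given as (center, W, H, a). The sketch below uses plain trigonometry in mathematical axes (x to the right, y upward, positive angles counterclockwise) and does not call OpenCV. The sign convention of the angle returned by minAreaRect depends on the OpenCV version and on image coordinates, so the direction of the undoing rotation here is an assumption.

```python
import math

def rotate_point(x, y, cx, cy, degrees):
    """Rotate (x, y) about (cx, cy) by the given angle in degrees."""
    t = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))

def box_corners(cx, cy, w, h, a):
    """Corners of a w-by-h box centered at (cx, cy) and inclined by a degrees."""
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [rotate_point(cx + dx, cy + dy, cx, cy, a) for dx, dy in offsets]

def level_box(cx, cy, w, h, a):
    """Undo the inclination: the returned corners form an axis-aligned box."""
    return [rotate_point(x, y, cx, cy, -a) for x, y in box_corners(cx, cy, w, h, a)]
```

Once the corners are axis-aligned, the region they enclose can be cropped and passed to the text recognition model.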

After at least one piece of text information in the corrected text boxes is obtained, the target text information contained in the to-be-recognized picture can be determined from it. Specifically, it can be determined which pieces of text are connected, and the connected pieces are then joined to form the final target text information.

The text recognition method provided by the embodiments of the present application acquires a to-be-recognized picture containing text information, processes it with a pre-trained text detection model to determine at least one text box containing text in the picture and the inclination direction corresponding to each text box, corrects the text direction of each text box according to its inclination direction to obtain a corrected text box, and recognizes the text information in the corrected text box. The embodiments use a model that fuses a text direction classification network with a text box position detection network, and then combine the coarse angle classification result with the text box position detection result to obtain an accurate direction for the text content, which improves the accuracy of text recognition in pictures.

Referring to FIG. 5, a schematic structural diagram of a text recognition device provided by an embodiment of the present application is shown. The text recognition device may specifically include the following modules:

a to-be-recognized picture acquisition module 310, configured to acquire a to-be-recognized picture containing text information;

a to-be-recognized picture recognition module 320, configured to process the to-be-recognized picture with a pre-trained text detection model, and determine at least one text box containing text in the to-be-recognized picture and the inclination direction corresponding to each text box;

a corrected text box acquisition module 330, configured to correct the text direction of each text box according to the inclination direction, to obtain a corrected text box whose text direction has been corrected;

a text information determination module 340, configured to recognize the text information in the corrected text box.

The text recognition device provided by the embodiments of the present application acquires a to-be-recognized picture containing text information, processes it with a pre-trained text detection model to determine at least one text box containing text in the picture and the inclination direction corresponding to each text box, corrects the text direction of each text box according to its inclination direction to obtain a corrected text box, and recognizes the text information in the corrected text box. The embodiments use a model that fuses a text direction classification network with a text box position detection network, and then combine the coarse angle classification result with the text box position detection result to obtain an accurate direction for the text content, which improves the accuracy of text recognition in pictures.

Referring to FIG. 6, a schematic structural diagram of a text recognition device provided by an embodiment of the present application is shown. The text recognition device may specifically include the following modules:

a text detection model determination module 410, configured to determine the pre-trained text detection model;

a to-be-recognized picture acquisition module 420, configured to acquire a to-be-recognized picture containing text information;

a to-be-recognized picture recognition module 430, configured to process the to-be-recognized picture with the pre-trained text detection model, and determine at least one text box containing text in the to-be-recognized picture and the inclination direction corresponding to each text box;

a corrected text box acquisition module 440, configured to correct the text direction of each text box according to the inclination direction, to obtain a corrected text box whose text direction has been corrected;

a text information determination module 450, configured to recognize the text information in the corrected text box.

Optionally, the text detection model determination module 410 includes:

a sample picture obtaining unit, configured to obtain a sample picture, wherein the sample picture contains at least one pre-annotated initial text box, initial position information of each initial text box in the sample picture, and an initial inclination direction of each initial text box;

a predicted text box determination unit, configured to input the sample pictures in sequence into an initial text detection model so as to train the initial text detection model, and to determine at least one predicted text box corresponding to the sample picture, predicted position information of each predicted text box in the sample picture, and a predicted inclination direction of each predicted text box;

a loss value calculation unit, configured to calculate a loss value of the initial text detection model according to each piece of initial position information, each piece of predicted position information, each initial inclination direction and each predicted inclination direction;

a text detection model obtaining unit, configured to take the trained initial text detection model as the text detection model when the loss value is within a preset range.
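The stopping condition applied by the text detection model obtaining unit can be illustrated with a short sketch. It is not the implementation disclosed in the application: `train_step` and `max_loss` are hypothetical names, and the "preset range" is assumed here to be the interval [0, max_loss].

```python
from typing import Any, Callable, Iterable, Tuple


def train_until_in_range(
    samples: Iterable[Any],
    train_step: Callable[[Any], float],
    max_loss: float,
) -> Tuple[bool, float]:
    """Feed sample pictures one by one and stop once the loss is in range.

    train_step is a hypothetical callable that performs one model update on
    a sample picture and returns the resulting loss value.
    Returns (converged, last_loss).
    """
    last_loss = float("inf")
    for sample in samples:
        last_loss = train_step(sample)
        # Assumed preset range: [0, max_loss].
        if 0.0 <= last_loss <= max_loss:
            return True, last_loss
    return False, last_loss
```

If the samples run out before the loss enters the range, the sketch reports that training did not converge rather than accepting the model.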

Optionally, the loss value calculation unit includes:

a position loss value calculation subunit, configured to calculate a position loss value according to each piece of initial position information and each piece of predicted position information;

an inclination loss value calculation subunit, configured to calculate an inclination loss value according to each initial inclination direction and each predicted inclination direction;

a loss value calculation subunit, configured to calculate the loss value of the initial text detection model according to the position loss value, a position weight, the inclination loss value and an inclination weight.
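One plausible way to combine the two loss terms is a weighted sum, sketched below. The text above only states that the loss is calculated from the two loss values and the two weights, so the weighted-sum form, the mean-absolute-error position loss, and the cross-entropy inclination loss are all assumptions made for illustration, as are the function and parameter names.

```python
import math
from typing import Sequence


def position_loss(true_boxes: Sequence[Sequence[float]],
                  pred_boxes: Sequence[Sequence[float]]) -> float:
    """Mean absolute error over all box coordinates (assumed loss form)."""
    diffs = [abs(t - p)
             for tb, pb in zip(true_boxes, pred_boxes)
             for t, p in zip(tb, pb)]
    return sum(diffs) / len(diffs)


def inclination_loss(true_classes: Sequence[int],
                     pred_probs: Sequence[Sequence[float]]) -> float:
    """Cross-entropy over inclination-direction classes (assumed loss form)."""
    eps = 1e-12  # avoids log(0)
    total = -sum(math.log(p[c] + eps) for c, p in zip(true_classes, pred_probs))
    return total / len(true_classes)


def total_loss(pos: float, inc: float,
               position_weight: float = 1.0,
               inclination_weight: float = 1.0) -> float:
    """Weighted sum of the position and inclination losses (assumed form)."""
    return position_weight * pos + inclination_weight * inc
```

The weights let training trade off box localization against direction classification; their values are not given in this section.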

Optionally, the text detection model includes a classification result acquisition layer and an inclination direction acquisition layer, and the to-be-recognized picture recognition module 430 includes:

a positive pixel acquisition unit 431, configured to invoke the classification result acquisition layer to process the to-be-recognized picture, and to obtain a pixel classification result and a connectivity classification result for the to-be-recognized picture;

a text box determination unit 432, configured to determine at least one text box in the to-be-recognized picture according to the pixel classification result and the connectivity classification result;

an inclination threshold determination unit 433, configured to invoke the inclination direction acquisition layer to process the to-be-recognized picture, and to determine the to-be-confirmed inclination directions of the text in the at least one text box and the inclination threshold corresponding to each to-be-confirmed inclination direction;

an inclination direction determination unit 434, configured to determine the inclination direction corresponding to the at least one text box according to the largest of the inclination thresholds.
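The work of units 432 and 434 can be sketched as follows, under stated assumptions. The pixel classification result is taken to be a boolean map of text pixels, the connectivity classification result is taken to be two boolean maps saying whether a pixel is linked to its right and lower neighbour, and the "inclination threshold" is read as a per-direction score over four candidate directions. None of these representations is specified in this section, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]          # (row, col)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right)

# Assumed candidate directions, in degrees.
DIRECTIONS = (0, 90, 180, 270)


def group_pixels(positive: List[List[bool]],
                 link_right: List[List[bool]],
                 link_down: List[List[bool]]) -> List[Box]:
    """Merge linked text pixels with union-find; return sorted bounding boxes."""
    rows, cols = len(positive), len(positive[0])
    parent: Dict[Pixel, Pixel] = {
        (r, c): (r, c)
        for r in range(rows) for c in range(cols) if positive[r][c]
    }

    def find(p: Pixel) -> Pixel:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def union(a: Pixel, b: Pixel) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r, c in list(parent):
        if c + 1 < cols and positive[r][c + 1] and link_right[r][c]:
            union((r, c), (r, c + 1))
        if r + 1 < rows and positive[r + 1][c] and link_down[r][c]:
            union((r, c), (r + 1, c))

    boxes: Dict[Pixel, Box] = {}
    for r, c in list(parent):
        root = find((r, c))
        top, left, bottom, right = boxes.get(root, (r, c, r, c))
        boxes[root] = (min(top, r), min(left, c), max(bottom, r), max(right, c))
    return sorted(boxes.values())


def pick_direction(scores: Sequence[float]) -> int:
    """Return the candidate direction with the largest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return DIRECTIONS[best]
```

Each connected group of text pixels yields one axis-aligned box here; a real detector would more likely fit a rotated rectangle to each group.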

Optionally, the corrected text box obtaining module 440 includes:

a first corrected box obtaining unit 441, configured to rotate the text box counterclockwise by 90° to obtain the corrected text box when the inclination angle of the text box, determined according to the inclination direction, is between 90° and 180°;

a second corrected box obtaining unit 442, configured to rotate the text box counterclockwise by 180° to obtain the corrected text box when the inclination angle of the text box, determined according to the inclination direction, is between 180° and 270°;

a third corrected box obtaining unit 443, configured to rotate the text box counterclockwise by 270° to obtain the corrected text box when the inclination angle of the text box, determined according to the inclination direction, is between 270° and 360°.
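The three correction rules can be expressed as a small lookup followed by a rotation, as in the sketch below. The text does not say whether the range boundaries are inclusive, nor what happens below 90°; the sketch assumes half-open ranges and no rotation for angles in [0°, 90°). The text box is modelled as a plain 2-D list of pixel values, and the function names are hypothetical.

```python
from typing import List, Sequence


def correction_rotation(inclination_angle: float) -> int:
    """Map an inclination angle to the counterclockwise correction in degrees."""
    angle = inclination_angle % 360
    if 90 <= angle < 180:
        return 90
    if 180 <= angle < 270:
        return 180
    if 270 <= angle < 360:
        return 270
    return 0  # assumed: boxes tilted by less than 90 degrees are left as-is


def rotate_ccw(grid: Sequence[Sequence[int]], degrees: int) -> List[List[int]]:
    """Rotate a 2-D grid counterclockwise by a multiple of 90 degrees."""
    result = [list(row) for row in grid]
    for _ in range((degrees // 90) % 4):
        # Transpose, then reverse the row order: one 90-degree CCW turn.
        result = [list(row) for row in zip(*result)][::-1]
    return result


def correct_text_box(grid: Sequence[Sequence[int]],
                     inclination_angle: float) -> List[List[int]]:
    """Apply the correction rule for the given inclination angle."""
    return rotate_ccw(grid, correction_rotation(inclination_angle))
```

Only multiples of 90° are applied, which matches the coarse correction described above; any residual tilt within a quadrant is left to later stages.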

Optionally, the text information determination module 450 includes:

a text information determination unit 451, configured to input the corrected text box into a text recognition model, and to determine the text information contained in the corrected text box through the text recognition model.

The text recognition apparatus provided by the embodiments of the present application obtains a to-be-recognized picture containing text information, processes the to-be-recognized picture with a pre-trained text detection model to determine at least one text box containing text in the picture and the inclination direction corresponding to each text box, corrects the text direction of each text box according to the inclination direction to obtain a corrected text box, and recognizes the text information in the corrected text box. The embodiments of the present application fuse a text direction classification network and a text box position detection network into a single model, and then use the coarse angle classification result together with the text box position detection result to obtain an accurate direction of the text content, which improves the accuracy of text recognition in pictures.
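The overall flow summarized above (detect, correct, recognize) can be sketched as a short pipeline. The three callables stand in for the text detection model, the correction step and the text recognition model; their names, signatures and the `DetectedBox` type are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class DetectedBox:
    crop: Any                 # picture content inside the text box
    inclination_angle: float  # angle derived from the inclination direction


def recognize_text(
    picture: Any,
    detect: Callable[[Any], Iterable[DetectedBox]],
    correct: Callable[[Any, float], Any],
    recognize: Callable[[Any], str],
) -> List[str]:
    """Run detection, direction correction and recognition for each text box."""
    results: List[str] = []
    for box in detect(picture):
        corrected = correct(box.crop, box.inclination_angle)
        results.append(recognize(corrected))
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular detection or recognition model.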

For simplicity of description, the foregoing method embodiments are each presented as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.

In addition, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the text recognition method of any one of the above embodiments when executing the program.

An embodiment of the present application further provides a non-volatile computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor, enable the processor to perform the text recognition method of any one of the above embodiments.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.

Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

The text recognition method, text recognition apparatus, electronic device and computer-readable storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A text recognition method, characterized by comprising:
obtaining a to-be-recognized picture containing text information;
processing the to-be-recognized picture with a pre-trained text detection model, and determining at least one text box containing text in the to-be-recognized picture and the inclination direction corresponding to each text box;
correcting the text direction of each text box according to the inclination direction, to obtain a corrected text box whose text direction has been corrected;
recognizing the text information in the corrected text box.

2. The method according to claim 1, characterized in that, before obtaining the to-be-recognized picture containing text information, the method further comprises:
determining the pre-trained text detection model;
wherein determining the pre-trained text detection model comprises:
obtaining a sample picture, wherein the sample picture contains at least one pre-annotated initial text box, initial position information of each initial text box in the sample picture, and an initial inclination direction of each initial text box;
inputting the sample pictures in sequence into an initial text detection model so as to train the initial text detection model, and determining at least one predicted text box corresponding to the sample picture, predicted position information of each predicted text box in the sample picture, and a predicted inclination direction of each predicted text box;
calculating a loss value of the initial text detection model according to each piece of initial position information, each piece of predicted position information, each initial inclination direction and each predicted inclination direction;
taking the trained initial text detection model as the text detection model when the loss value is within a preset range.

3. The method according to claim 2, characterized in that calculating the loss value of the initial text detection model according to each piece of initial position information, each piece of predicted position information, each initial inclination direction and each predicted inclination direction comprises:
calculating a position loss value according to each piece of initial position information and each piece of predicted position information;
calculating an inclination loss value according to each initial inclination direction and each predicted inclination direction;
calculating the loss value of the initial text detection model according to the position loss value, a position weight, the inclination loss value and an inclination weight.

4. The method according to claim 3, characterized in that the text detection model comprises a classification result acquisition layer and an inclination direction acquisition layer, and processing the to-be-recognized picture with the pre-trained text detection model and determining at least one text box containing text in the to-be-recognized picture and the inclination direction corresponding to each text box comprises:
invoking the classification result acquisition layer to process the to-be-recognized picture, and obtaining a pixel classification result and a connectivity classification result for the to-be-recognized picture;
determining at least one text box in the to-be-recognized picture according to the pixel classification result and the connectivity classification result;
invoking the inclination direction acquisition layer to process the to-be-recognized picture, and determining the to-be-confirmed inclination directions of the text in the at least one text box and the inclination threshold corresponding to each to-be-confirmed inclination direction;
determining the inclination direction corresponding to the at least one text box according to the largest of the inclination thresholds.

5. The method according to claim 1, characterized in that correcting each text box according to the inclination direction to obtain a corrected text box comprises:
when the inclination angle of the text box, determined according to the inclination direction, is between 90° and 180°, rotating the text box counterclockwise by 90° to obtain the corrected text box;
when the inclination angle of the text box, determined according to the inclination direction, is between 180° and 270°, rotating the text box counterclockwise by 180° to obtain the corrected text box;
when the inclination angle of the text box, determined according to the inclination direction, is between 270° and 360°, rotating the text box counterclockwise by 270° to obtain the corrected text box.

6. The method according to claim 1, characterized in that recognizing the text information in the corrected text box comprises:
inputting the corrected text box into a text recognition model, and determining the text information contained in the corrected text box through the text recognition model.

7. A text recognition apparatus, characterized by comprising:
a to-be-recognized picture obtaining module, configured to obtain a to-be-recognized picture containing text information;
a to-be-recognized picture recognition module, configured to process the to-be-recognized picture with a pre-trained text detection model, and to determine at least one text box containing text in the to-be-recognized picture and the inclination direction corresponding to each text box;
a corrected text box obtaining module, configured to correct the text direction of each text box according to the inclination direction, to obtain a corrected text box whose text direction has been corrected;
a text information determination module, configured to recognize the text information in the corrected text box.

8. The apparatus according to claim 7, characterized by further comprising:
a text detection model determination module, configured to determine the pre-trained text detection model;
wherein the text detection model determination module comprises:
a sample picture obtaining unit, configured to obtain a sample picture, wherein the sample picture contains at least one pre-annotated initial text box, initial position information of each initial text box in the sample picture, and an initial inclination direction of each initial text box;
a predicted text box determination unit, configured to input the sample pictures in sequence into an initial text detection model so as to train the initial text detection model, and to determine at least one predicted text box corresponding to the sample picture, predicted position information of each predicted text box in the sample picture, and a predicted inclination direction of each predicted text box;
a loss value calculation unit, configured to calculate a loss value of the initial text detection model according to each piece of initial position information, each piece of predicted position information, each initial inclination direction and each predicted inclination direction;
a text detection model obtaining unit, configured to take the trained initial text detection model as the text detection model when the loss value is within a preset range.

9. An electronic device, characterized by comprising:
a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the text recognition method according to any one of claims 1 to 6 when executing the program.

10. A computer-readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the text recognition method according to any one of claims 1 to 6.
Priority Applications (1)

Application Number: CN202010226050.7A
Publication: CN111428717B (en)
Priority Date: 2020-03-26
Filing Date: 2020-03-26
Title: Text recognition method, device, electronic device and computer-readable storage medium
Status: Active

Publications (2)

CN111428717A (en), published 2020-07-17
CN111428717B (en), published 2024-04-26

Family

ID: 71555698

Country Status (1)

CN (1): CN111428717B (en)



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
