CN113887375A - Text recognition method, device, equipment and storage medium


Info

Publication number
CN113887375A
Authority
CN
China
Prior art keywords
text
text recognition
regions
target
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111137451.6A
Other languages
Chinese (zh)
Inventor
刘秩铭
邵明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202111137451.6A
Publication of CN113887375A


Abstract

Translated from Chinese

The present application provides a text recognition method, apparatus, device and storage medium, relating to the technical field of image processing and used to improve the accuracy of text recognition. The method includes: determining a plurality of target text regions of an image to be recognized according to a trained text detection model; performing text recognition on the plurality of target text regions according to a trained first text recognition model to obtain first text recognition results corresponding to the plurality of target text regions; performing text recognition on the plurality of target text regions according to a trained second text recognition model to obtain second text recognition results corresponding to the plurality of target text regions; and determining a plurality of target text recognition results corresponding to the image to be recognized according to first confidences contained in the plurality of first text recognition results and the plurality of second text recognition results.


Description

Text recognition method, device, equipment and storage medium
Technical Field
The application relates to the technical field of image processing, and provides a text recognition method, a text recognition device, text recognition equipment and a storage medium.
Background
In everyday office work, situations requiring text recognition arise frequently: for example, when text from pictures, scans or PDFs needs to be captured quickly, the characters cannot be copied and pasted directly, and typing them by hand is too laborious and time-consuming, so text recognition offers a fast way to record them. At present, existing text recognition methods often use a Region-based Convolutional Neural Network (R-CNN) to detect text regions in an image, and then use a Back Propagation (BP) neural network to recognize the text characters in those regions.
However, when R-CNN is used for text detection, character spacing may be too large or too small and text lines may run in arbitrary directions, so missed detections and duplicate detections are likely, the edges of text lines cannot be located accurately, and the accuracy of text detection ends up low. When a BP neural network is used for text recognition, the recognition effect on simplified and traditional Chinese characters is not good enough, because Chinese has a very large number of character categories, simplified and traditional forms differ, and Chinese and English punctuation marks look alike; only a specific small set of characters can be recognized, few application scenarios are covered, and special symbols and punctuation are hard to recognize. The accuracy of text recognition is therefore low, which affects normal use by the user.
Therefore, how to improve the accuracy of text recognition is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a text recognition method, a text recognition device, text recognition equipment and a storage medium, which are used for improving the accuracy of text recognition.
In one aspect, a text recognition method is provided, and the method includes:
determining a plurality of target text regions of the image to be recognized according to the trained text detection model;
performing text recognition on the plurality of target text regions according to the trained first text recognition model to obtain first text recognition results corresponding to the plurality of target text regions; the first text recognition model determines the first text recognition result according to text semantic information;
performing text recognition on the plurality of target text regions according to the trained second text recognition model to obtain second text recognition results corresponding to the plurality of target text regions; the second text recognition model determines the second text recognition result according to the text length and the text semantic information;
determining a plurality of target text recognition results corresponding to the image to be recognized according to first confidences contained in the first text recognition results and the second text recognition results; the first confidence indicates the probability that a specific character exists in the target text region corresponding to a text recognition result; one target text region corresponds to a plurality of recognition results, and the recognition result with the highest confidence among them is the target text recognition result of that target text region.
In one aspect, an apparatus for text recognition is provided, the apparatus comprising:
the text region determining unit is used for determining a plurality of target text regions of the image to be recognized according to the trained text detection model;
a first recognition result determining unit, configured to perform text recognition on the multiple target text regions according to a trained first text recognition model, and obtain first text recognition results corresponding to the multiple target text regions; the first text recognition model determines the first text recognition result according to text semantic information;
a second recognition result determining unit, configured to perform text recognition on the multiple target text regions according to a trained second text recognition model, and obtain second text recognition results corresponding to the multiple target text regions; the second text recognition model determines the second text recognition result according to the text length and the text semantic information;
the target recognition result determining unit is used for determining a plurality of target text recognition results corresponding to the image to be recognized according to first confidences contained in the plurality of first text recognition results and the plurality of second text recognition results; the first confidence indicates the probability that a specific character exists in the target text region corresponding to a text recognition result; one target text region corresponds to a plurality of recognition results, and the recognition result with the highest confidence among them is the target text recognition result of that target text region.
Optionally, the text region determining unit is specifically configured to:
determining a first probability that each pixel point in the image to be recognized is a central point of a single character and a second probability that each pixel point is a central point between any two adjacent characters according to the trained text detection model;
obtaining a plurality of local image areas according to the first probability;
for each local image region, segmenting each local image region according to the second probability, and determining a plurality of candidate text regions corresponding to each local image region;
and determining a plurality of target text regions of the image to be recognized according to a plurality of candidate text regions corresponding to the plurality of local image regions respectively.
Optionally, the text region determining unit is further specifically configured to:
determining a second confidence degree corresponding to each of the candidate text regions; wherein the second confidence level is used to indicate a probability that text is present in the candidate text region;
determining, for one candidate text region of the plurality of candidate text regions, whether a second confidence corresponding to the one candidate text region is greater than a set second confidence threshold;
determining the one candidate text region as the target text region upon determining that the one candidate text region is greater than the set second confidence threshold.
Optionally, the text region determining unit is further specifically configured to:
when the confidence coefficient is determined to be larger than a set second confidence coefficient threshold value, performing binarization processing on the candidate text region to obtain a first candidate text region;
performing connected domain analysis on the first candidate text region, and determining whether the first candidate text region is a connected region; the connected region is an image region which has the same pixel value and is formed by non-background pixel points adjacent in position;
and if the first candidate text region is determined to be the connected region, determining the first candidate text region as a target text region.
Optionally, the text region determining unit is further specifically configured to:
after the connected region is determined, determining a plurality of included angles between a plurality of text sub-regions in the first candidate text region and a preset first coordinate axis; wherein one included angle corresponds to one text subarea; when any two adjacent included angles in the plurality of included angles are different, determining that a text region part formed by text subregions corresponding to any two adjacent included angles in the first candidate text region has a bending phenomenon;
sequentially determining whether the difference value between two adjacent included angles in the plurality of included angles is larger than a set angle threshold value;
when the difference value between two adjacent included angles is larger than a set angle threshold value, determining a boundary line between text sub-regions corresponding to the two adjacent included angles respectively corresponding to the difference value larger than the set angle threshold value in the first candidate text region as a dividing line;
and acquiring a plurality of target text sub-regions according to the dividing lines, and determining the target text sub-regions as target text regions.
Optionally, the apparatus further includes a text recognition preprocessing unit, where the text recognition preprocessing unit is configured to:
based on the text direction classification function of the trained first text recognition model, performing text direction classification on the plurality of target text regions to obtain a plurality of first target text regions;
based on the text correction sub-function of the trained first text recognition model, performing text correction on the plurality of first target text regions to obtain a plurality of second target text regions;
performing text typesetting direction classification on the plurality of second target text regions based on the text typesetting direction classification function of the trained first text recognition model to obtain a plurality of third target text regions;
and respectively inputting the plurality of third target text areas into the trained first text recognition model for text recognition, and/or respectively inputting the trained second text recognition model for text recognition.
Optionally, the target recognition result determining unit is specifically configured to:
respectively performing text recognition on the plurality of target text regions according to a plurality of trained second text recognition models to obtain second text recognition results corresponding to the plurality of target text regions in each trained second text recognition model in the plurality of trained second text recognition models;
and determining a plurality of target text recognition results corresponding to the image to be recognized according to the respective corresponding confidence degrees of the plurality of first text recognition results and the respective corresponding confidence degrees of the plurality of second text recognition results of each trained second text recognition model.
Optionally, the target recognition result determining unit is further specifically configured to:
determining whether the same text recognition result exists in a first text recognition result and a plurality of second text recognition results corresponding to one target text region in the target text regions;
when the same text recognition result is determined to exist, increasing the confidence of the same text recognition result;
determining the text recognition result corresponding to the maximum confidence as the target text recognition result of the target text region according to the confidence of the same text recognition result and the confidence of the other text recognition results of the target text region;
and determining a plurality of target text recognition results corresponding to the image to be recognized according to the target text recognition results corresponding to the plurality of target text regions respectively.
In one aspect, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the above aspect when executing the computer program.
In one aspect, a computer storage medium is provided having computer program instructions stored thereon that, when executed by a processor, implement the steps of the method of the above aspect.
In the embodiment of the application, a plurality of target text regions of an image to be recognized can be determined according to a trained text detection model; text recognition can then be performed on the plurality of target text regions according to the trained first text recognition model to obtain the corresponding first text recognition results, and according to the trained second text recognition model to obtain the corresponding second text recognition results; finally, the target text recognition results corresponding to the image to be recognized can be determined according to the first confidences contained in the first and second text recognition results. Because the first text recognition model determines the first text recognition result according to text semantic information, it can reason about the characters in the target text region and improve the recognition accuracy of Chinese and English punctuation marks and words; because the second text recognition model determines the second text recognition result according to the text length and the text semantic information, it can solve the alignment problem of indefinite-length sequences. When the target text recognition result is determined jointly by the two models, relatively many text recognition results with different confidences are obtained for the same target text region; on that basis, selecting the recognition result with the highest confidence as that region's target text recognition result further improves the accuracy of text recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, it is obvious that the drawings in the following description are only the embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a text recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the network structure of a text detection model whose backbone network adopts VGG 16;
FIG. 4 is a flowchart illustrating a process of screening text regions according to an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating screening a text region according to an embodiment of the present application;
FIG. 6 is a schematic view of an irregular quadrilateral provided in accordance with an embodiment of the present application;
FIG. 7 is a schematic diagram of a text sample provided in an embodiment of the present application;
FIG. 8 is a flowchart illustrating a text region splitting process according to an embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating a text recognition preprocessing provided by an embodiment of the present application;
FIG. 10 is a schematic flow chart illustrating text recognition provided by an embodiment of the present application;
FIG. 11 is a schematic flow chart illustrating text recognition according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a text recognition apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
At present, existing text recognition methods often use R-CNN to detect text regions in an image and then use a BP neural network to recognize the text characters in those regions. However, when R-CNN is used for text detection, character spacing may be too large or too small and text lines may run in arbitrary directions, so missed detections and duplicate detections are likely, the edges of text lines cannot be located accurately, and the accuracy of text detection ends up low. When a BP neural network is used for text recognition, the recognition effect on simplified and traditional Chinese characters is not good enough, because Chinese has a very large number of character categories, simplified and traditional forms differ, and Chinese and English punctuation marks look alike; only a specific small set of characters can be recognized, few application scenarios are covered, and special symbols and punctuation are hard to recognize. The accuracy of text recognition is therefore low, which affects normal use by the user.
Based on this, in the embodiment of the application, a plurality of target text regions of the image to be recognized may be determined according to the trained text detection model; text recognition can then be performed on the plurality of target text regions according to the trained first text recognition model to obtain the corresponding first text recognition results, and according to the trained second text recognition model to obtain the corresponding second text recognition results; finally, the target text recognition results corresponding to the image to be recognized can be determined according to the first confidences contained in the first and second text recognition results. Because the first text recognition model determines the first text recognition result according to text semantic information, it can reason about the characters in the target text region and improve the recognition accuracy of Chinese and English punctuation marks and words; because the second text recognition model determines the second text recognition result according to the text length and the text semantic information, it can solve the alignment problem of indefinite-length sequences. When the target text recognition result is determined jointly by the two models, relatively many text recognition results with different confidences are obtained for the same target text region; on that basis, selecting the recognition result with the highest confidence as that region's target text recognition result further improves the accuracy of text recognition.
After introducing the design concept of the embodiment of the present application, some brief descriptions of application scenarios to which the technical solution of the embodiment can be applied are given below. It should be noted that the application scenarios described here only illustrate the embodiment of the present application and are not limiting. In a specific implementation, the technical scheme provided by the embodiment of the application can be applied flexibly according to actual needs.
As shown in fig. 1, a schematic view of an application scenario provided in the embodiment of the present application, the application scenario for text recognition may include a text recognition device 10 and another device 11.
The other device 11 may be a device storing the image to be recognized, for example a device containing a database. Alternatively, the other device 11 may also be a device for generating the image to be recognized, such as a mobile phone, a camera, or the like.
The text recognition device 10 may be a computer apparatus with a certain processing capability, for example a Personal Computer (PC), a notebook computer, or a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms, but is not limited thereto. The text recognition device 10 may include one or more processors 101, a memory 102, and an I/O interface 103 for interacting with other devices, among other components. In addition, the text recognition device 10 may be configured with a database 104, which may be used to store data such as network model parameters and confidences involved in the scheme provided by the embodiment of the present application. The memory 102 of the text recognition device 10 may store program instructions of the text recognition method provided in the embodiment of the present application; when executed by the processor 101, these instructions implement the steps of the text recognition method and thereby improve the accuracy of text recognition.
In the embodiment of the present application, when the I/O interface 103 detects an image to be recognized input from another device 11, the program instructions of the text recognition method stored in the memory 102 are called and executed by the processor 101 to perform text recognition on the image. The accuracy of text recognition is improved while the recognition result is obtained, and data such as the confidences generated during execution and the text recognition results are stored in the database 104.
Of course, the method provided in the embodiment of the present application is not limited to be used in the application scenario shown in fig. 1, and may also be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 1 will be described in the following method embodiments, and will not be described in detail herein. Hereinafter, the method of the embodiment of the present application will be described with reference to the drawings.
As shown in fig. 2, a flowchart of a text recognition method provided in an embodiment of the present application, which can be executed by the text recognition device 10 in fig. 1, the specific flow is described as follows.
Step 201: and determining a plurality of target text regions of the image to be recognized according to the trained text detection model.
In the embodiment of the present application, to facilitate subsequent screening of text regions and to better handle text boundary regions that are not strictly enclosed, the image to be recognized may be a processed saliency image, for example an image similar to a heatmap or a Gaussian map. The text detection model may be based on a Visual Geometry Group (VGG) network; for example, fig. 3 shows the network structure of a text detection model whose backbone network adopts VGG 16. Image features of the image to be recognized may be extracted by the VGG 16 structure in the trained text detection model and then regressed through alternating deconvolution (UpConv Block) and upsampling (UpSample) stages, yielding a 2-channel feature map at 1/2 the size of the image to be recognized. From this map, a first probability that each pixel is the center point of a single character and a second probability that each pixel is the center point between any two adjacent characters can be determined; these are the Region score and Affinity score shown in fig. 3. Specifically, when the image to be recognized is a saliency image, the Region score may be represented as a character-level Gaussian heat map, and the Affinity score as a Gaussian heat map of the connections between characters.
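As a point of reference only, the following is a minimal PyTorch sketch of such a detection head, in which a single interpolation step stands in for the alternating UpConv/UpSample decoder; the class name, layer widths, and use of the batch-norm VGG 16 variant are illustrative assumptions rather than the patent's exact configuration.

import torch
import torch.nn as nn
import torchvision

class TextDetectionHead(nn.Module):
    # VGG 16 backbone; features are upsampled back to half the input resolution
    # and regressed into a 2-channel map (Region score + Affinity score).
    def __init__(self):
        super().__init__()
        self.backbone = torchvision.models.vgg16_bn(weights=None).features
        self.upconv = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, kernel_size=3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(128, 2, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2], x.shape[3]
        f = self.backbone(x)                                  # (N, 512, h/32, w/32)
        f = nn.functional.interpolate(f, size=(h // 2, w // 2),
                                      mode="bilinear", align_corners=False)
        scores = torch.sigmoid(self.head(self.upconv(f)))     # (N, 2, h/2, w/2)
        return scores[:, 0], scores[:, 1]                     # region score, affinity score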
Furthermore, a plurality of local image areas containing text lines can be obtained according to the determined first probability (Region score); then each local image region is segmented according to the determined second probability (Affinity score), which determines a plurality of candidate text regions for each local image region; finally, a plurality of target text regions of the image to be recognized may be determined from the candidate text regions corresponding to the respective local image regions.
In this way, single characters (Region score) and the connections between characters (Affinity score) are detected first, and the final text line is then assembled from the connections between characters. Because only character-level content needs attention, rather than the whole text instance, even a small receptive field can predict large and long text.
Step 202: and performing text recognition on the plurality of target text regions according to the trained first text recognition model to obtain first text recognition results corresponding to the plurality of target text regions.
In the embodiment of the application, the first text recognition model determines the first text recognition result according to text semantic information. For example, the first text recognition model may be a text recognition model containing a Transformer network structure based on an attention mechanism, with a residual network (ResNet 34) as its main structure, so that punctuation categories and English and Chinese words can be accurately inferred from the semantic information of the text shown in a target text region.
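A hedged sketch of such a recognizer is given below: ResNet 34 features serve as memory for a small Transformer decoder that attends over them while emitting characters. The class name, decoder depth, and head count are assumptions for illustration, not the patent's specification.

import torch.nn as nn
import torchvision

class AttentionRecognizer(nn.Module):
    # ResNet 34 feature map is the memory for an attention-based character decoder.
    def __init__(self, num_classes, d_model=512):
        super().__init__()
        resnet = torchvision.models.resnet34(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-2])  # (N, 512, H', W')
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.embed = nn.Embedding(num_classes, d_model)
        self.out = nn.Linear(d_model, num_classes)

    def forward(self, image, prev_chars):                     # prev_chars: (N, L) token ids
        memory = self.cnn(image).flatten(2).permute(0, 2, 1)  # (N, H'*W', 512)
        tgt = self.embed(prev_chars)                          # (N, L, 512)
        out = self.decoder(tgt, memory)   # causal mask omitted: stepwise greedy decoding assumed
        return self.out(out)              # logits over the character set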
In practical application, the target text region may be input into the trained first text recognition model, and the trained first text recognition model may perform text recognition on the input target text region, so as to obtain a first text recognition result corresponding to the target text region.
Step 203: and performing text recognition on the plurality of target text regions according to the trained second text recognition model to obtain second text recognition results corresponding to the plurality of target text regions.
In the embodiment of the application, the second text recognition model determines the second text recognition result according to the text length and the text semantic information. For example, the second text recognition model is a recognition model formed from the first text recognition model and a third text recognition model, where the network structure of the third text recognition model may consist, in order, of a residual network (ResNet 34), a Long Short-Term Memory (LSTM) artificial neural network, a fully connected layer, and a Connectionist Temporal Classification (CTC) loss. The CTC loss can solve the alignment problem of indefinite-length sequences. Horizontally and vertically typeset text images are scaled and passed through the text recognition model to obtain a recognition result that contains the confidence of each character.
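The third model's described stack (ResNet 34 features, a bidirectional LSTM, a fully connected layer, trained with a CTC loss) could be sketched in PyTorch as follows; the pooling scheme, hidden sizes, and names are illustrative assumptions.

import torch.nn as nn
import torchvision

class CRNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        resnet = torchvision.models.resnet34(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-2])  # keep conv features only
        self.pool = nn.AdaptiveAvgPool2d((1, None))  # collapse height, keep width as time axis
        self.lstm = nn.LSTM(512, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)        # num_classes includes the CTC blank

    def forward(self, x):                            # x: (N, 3, 40, 320) for horizontal text
        f = self.pool(self.cnn(x))                   # (N, 512, 1, T)
        f = f.squeeze(2).permute(0, 2, 1)            # (N, T, 512)
        out, _ = self.lstm(f)                        # (N, T, 512)
        return self.fc(out)                          # per-timestep class logits for CTC

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)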
Specifically, when the second recognition model is used, the text recognition results of the first and third text recognition models can be determined separately and then combined as a weighted sum to produce the text recognition result of the second recognition model. The second recognition model can thus both solve the alignment problem of indefinite-length sequences and determine the text recognition result according to the semantic information of the text, improving the accuracy of text recognition from several angles at once.
In a possible implementation, to further improve the accuracy of text recognition, model fusion may be adopted: the second text recognition model may be obtained by fusing a plurality of models. For example, 5 trained second text recognition models with different weight parameters may be selected, and averaging their weight parameters yields a new set of weight parameters and hence a new second text recognition model; performing text recognition with this new model further improves model performance and the accuracy of text recognition.
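Averaging the weight parameters of several trained copies can be done by averaging their state dictionaries, as in the hedged sketch below (the helper name is hypothetical):

import copy
import torch

def average_models(models):
    # Average the weight parameters of several same-architecture models.
    fused = copy.deepcopy(models[0])
    state = fused.state_dict()
    for key in state:
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        state[key] = stacked.mean(dim=0).to(state[key].dtype)  # keep original dtype (e.g. integer buffers)
    fused.load_state_dict(state)
    return fused

# e.g. new_model = average_models([model_1, model_2, model_3, model_4, model_5])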
Step 204: and determining a plurality of target text recognition results corresponding to the image to be recognized according to the first confidence degrees contained in the first text recognition results and the second text recognition results.
In this embodiment of the present application, the first confidence may be used to indicate a probability that a specific character exists in a target text region corresponding to the text recognition result; one target text area may correspond to a plurality of recognition results, and the recognition result with the highest confidence level among the plurality of recognition results is the target text recognition result of one target text area.
In practical application, when a text recognition result is obtained through a text recognition model, a first confidence is obtained at the same time and is included in the text recognition result. Since the first confidence indicates the probability that a specific character exists in the target text region corresponding to the result, that is, the recognition accuracy of the text, the target text result for each target text region may be determined by comparing the first confidences contained in the respective text recognition results. For example, suppose target text region 1 corresponds to 1 first text recognition result and 1 second text recognition result, where the confidence that character "A" exists is 0.9 in the first result and 0.8 in the second; then the first text recognition result is used as the target text recognition result of target text region 1.
In one possible embodiment, text region detection may produce false detections, for example a region containing no characters detected as a character region. If text recognition were performed directly on such detected regions, they would affect the recognition accuracy of the text and, in serious cases, lower it.
Therefore, in the embodiment of the application, the text regions can be screened by determining the probability that text exists in each region, further preparing the ground for improving the accuracy of text recognition. Since the screening process is the same for every candidate text region, candidate text region A is taken as an example below. As shown in fig. 4, a schematic flow chart for screening text regions provided in the embodiment of the present application, the specific flow is described as follows.
Step 401: determining a second confidence degree corresponding to each of the plurality of candidate text regions.
In implementations of the present application, the second confidence level may be used to indicate a probability that text is present in the candidate text region.
In practical applications, when the image to be recognized is a saliency image such as a Gaussian map, the center point of each text region may be determined first, and then a confidence map (e.g., a Gaussian-distributed confidence) corresponding to the text region may be calculated from the center point and the length and width of the region.
Step 402: and determining whether the second confidence corresponding to the candidate text region A is greater than a set second confidence threshold value or not for the candidate text region A in the candidate text regions.
Step 403: and when the determination is larger than the set second confidence threshold, determining the candidate text region A as the target text region.
In practical applications, the second confidence threshold may for example be set to 0.6; then, when the second confidence of candidate text region A is greater than 0.6, region A is determined as a target text region. Otherwise, candidate text region A has a high probability of being a falsely detected text region, so it cannot be determined as a target text region and can be removed from the set of candidate text regions. This achieves the goal of filtering and screening the text regions and further improves the accuracy of subsequent text recognition.
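A brief sketch of steps 401-403, under the assumption that the second confidence is read off a 2-D Gaussian map peaked at the region center; the spreads tied to region size and the data layout are assumptions.

import numpy as np

def gaussian_confidence_map(h, w):
    # Hypothetical 2-D Gaussian confidence for an h x w text region,
    # peaked at the region's center point (cf. step 401).
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = h / 4.0, w / 4.0          # spreads tied to region size (assumption)
    return np.exp(-0.5 * (((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2))

def filter_candidate_regions(regions, threshold=0.6):
    # Keep only candidate regions whose second confidence exceeds the threshold (steps 402-403).
    return [r for r in regions if r["second_confidence"] > threshold]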
In one possible implementation, the accuracy of text recognition may be reduced due to the larger spacing between individual characters in the text region. Therefore, in the embodiment of the present application, in order to further improve the accuracy of text recognition, after determining that the second confidence of the candidate text region is greater than the second confidence threshold, the text region may be further screened according to connected domain analysis, that is, whether the text region is connected is analyzed, so as to further improve the accuracy of subsequent text recognition. Since the screening process for each candidate text region is the same, the following description also takes the candidate text region a as an example, and as shown in fig. 5, another schematic flow chart for screening the text region provided in the embodiment of the present application is provided, and a specific flow chart is described as follows.
Step 501: and when the confidence coefficient is determined to be larger than the set second confidence coefficient threshold value, carrying out binarization processing on the candidate text region A to obtain a first candidate text region.
In this embodiment of the application, after determining that the second confidence of the candidate text region is greater than the second confidence threshold, the candidate text region may be subjected to binarization processing, for example, the pixel value of each pixel point corresponding to the character may be set to 1, and the pixel value of each pixel point corresponding to the background image may be set to 0, so as to convert the original candidate text region into the first candidate text region in the form of a binarized image.
Step 502: and performing connected component analysis on the first candidate text region to determine whether the first candidate text region is a connected component.
In this embodiment, the connected region may be an image region composed of non-background pixels having the same pixel value and adjacent positions.
Following the above example, since the pixel value of each pixel in the first candidate text region is 0 or 1, once the first candidate text region is available in binarized form, whether it is a connected region can be determined by connected component analysis, that is, by checking whether it is an image region composed of positionally adjacent pixels sharing the same pixel value, for example all pixels with value 1.
Step 503: and if the first candidate text region is determined to be the connected region, determining the first candidate text region as the target text region.
In an embodiment of the present application, when the first candidate text region is determined to be a connected region, then the first candidate text region may be determined to be the target text region. Otherwise, it indicates that the distance between the characters in the candidate text region is large, which is easy to affect the accuracy of text recognition, so that the first candidate text region cannot be determined as the target text region, that is, the first candidate text region can be removed from the total candidate text region, thereby achieving the purpose of filtering and screening the text regions, and further improving the accuracy of subsequent text recognition.
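Steps 501-502 can be sketched with OpenCV's connected component analysis; treating the region as connected exactly when a single foreground component is found is an interpretive assumption.

import cv2
import numpy as np

def is_connected_region(region_patch, thresh=0.5):
    # Binarize the candidate text region: text pixels -> 1, background -> 0 (step 501).
    binary = (region_patch > thresh).astype(np.uint8)
    # Count 8-connected components; label 0 is the background (step 502).
    num_labels, _ = cv2.connectedComponents(binary, connectivity=8)
    return num_labels == 2   # exactly one foreground component -> connected region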
In the embodiment of the present application, in order to improve the accuracy of text recognition, in the model training, the shape of the text region marked in the sample image is an irregular quadrilateral, as shown in fig. 6, which is a schematic diagram of an irregular quadrilateral provided in the embodiment of the present application, and the irregular quadrilateral is determined by at least 4 coordinate points. Furthermore, the bounding box of the text region after the connected component filtering is not necessarily a text box determined by four coordinate points, but may be an irregular quadrilateral text box determined by a plurality of coordinate points as shown in fig. 6.
In one possible implementation, as shown in fig. 7 (a schematic view of a text sample provided in the embodiment of the present application), text in daily life may be displayed in any direction: besides conventional horizontal and vertical display it may, for example, appear inclined or curved. Among these display cases, curved text regions are the most difficult to identify, so the accuracy of recognizing them is low.
Furthermore, in the embodiment of the present application, in order to improve the accuracy of text recognition, a method of splitting a curved text region into a plurality of non-curved text regions may be adopted to further improve the accuracy of text recognition. Fig. 8 is a schematic flow chart of splitting a text region according to an embodiment of the present application, and a specific flow is described as follows.
Step 801: after the connected region is determined, a plurality of included angles between a plurality of text sub-regions in the first candidate text region and a preset first coordinate axis are determined.
In the embodiment of the present application, the direction indicated by the first coordinate axis may be the horizontal direction. As shown in fig. 6, the curved text region "STOP" can be divided into a plurality of text sub-regions according to its degree of curvature. For example, the text region corresponding to "STOP" has 3 different inclinations, that is, 3 different included angles with the horizontal direction: included angle 1, included angle 2 and included angle 3. Accordingly, the region can be divided into 3 text sub-regions: text sub-region 1 corresponding to "S", text sub-region 2 corresponding to "TO", and text sub-region 3 corresponding to "P".
Furthermore, it can be seen that each included angle of the text region corresponding to "STOP" corresponds to one text sub-region, and whenever two adjacent included angles differ, the text region portion formed by the text sub-regions corresponding to those two adjacent angles exhibits a bending phenomenon.
Step 802: and sequentially determining whether the difference value between two adjacent included angles in the plurality of included angles is larger than a set angle threshold value.
In the practice of the present application, the set angle threshold may be set at 10 °.
In order to further improve the accuracy of text recognition, when it is determined that the degree of curvature between two adjacent text sub-regions is large, the two adjacent text sub-regions need to be split, that is, it is determined whether the two adjacent text sub-regions need to be split by determining whether a difference value between included angles corresponding to the two adjacent text sub-regions is greater than a set angle threshold.
Continuing with the above example, when splitting the text region corresponding to "STOP", it may first be determined whether the difference between included angle 1 and included angle 2 is greater than the set angle threshold, and then whether the difference between included angle 2 and included angle 3 is greater than the set angle threshold.
Step 803: and when the difference value between the two adjacent included angles is determined to be larger than the set angle threshold value, determining a boundary between text sub-regions corresponding to the two adjacent included angles respectively corresponding to the difference value larger than the set angle threshold value in the first candidate text region as a dividing line.
In this embodiment of the application, when it is determined that the difference between two adjacent included angles is greater than the set angle threshold, that is, it indicates that a text region portion jointly formed by text sub-regions corresponding to the two adjacent included angles has a bending phenomenon, and the bending degree has affected the accuracy of text recognition, at this time, the text sub-regions corresponding to the two adjacent included angles need to be split, and then, a boundary between the text sub-regions corresponding to the two adjacent included angles can be determined as a dividing line.
As shown in fig. 6, if included angle 1 is 45°, included angle 2 is 0°, and the set angle threshold is 10°, the difference between included angle 1 and included angle 2 is 45°, which is obviously greater than the 10° threshold; therefore text sub-region 1 corresponding to "S" and text sub-region 2 corresponding to "TO" need to be split. There is a boundary between text sub-region 1 and text sub-region 2: as shown in fig. 6, the dashed line between "S" and "TO" is that boundary, and when splitting is performed, this boundary can be determined as the dividing line between text sub-region 1 and text sub-region 2.
Step 804: and acquiring a plurality of target text sub-regions according to the dividing lines, and determining the plurality of target text sub-regions as target text regions.
In the implementation of the present application, after a dividing line corresponding to a first candidate text region is determined, the first candidate text region may be divided based on the dividing line, and then, a plurality of target text sub-regions may be obtained, and the plurality of target text sub-regions may be determined as target text regions, so that text recognition may be performed based on the determined target text regions.
For example, as shown in fig. 6, assuming that included angle 1 is 45°, included angle 2 is 0°, included angle 3 is -45°, and the set angle threshold is 10°, the text region corresponding to "STOP" is finally divided into 3 target text sub-regions: target text sub-region 1 corresponding to "S", target text sub-region 2 corresponding to "TO", and target text sub-region 3 corresponding to "P". These three target text sub-regions can then be determined as the target text regions to be subjected to text recognition.
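Steps 801-804 amount to cutting the ordered sub-regions wherever the angle jumps by more than the threshold; a hedged sketch (the data layout is assumed):

def split_curved_region(subregions, angle_threshold=10.0):
    # subregions: ordered list of (sub_region, angle_deg) pairs, each angle measured
    # against the horizontal first coordinate axis. Cut at every boundary whose
    # adjacent angles differ by more than the threshold (steps 802-804).
    pieces, current = [], [subregions[0][0]]
    for (_, prev_angle), (cur, cur_angle) in zip(subregions, subregions[1:]):
        if abs(cur_angle - prev_angle) > angle_threshold:
            pieces.append(current)   # the shared boundary acts as the dividing line
            current = [cur]
        else:
            current.append(cur)
    pieces.append(current)
    return pieces

# "STOP" example: angles 45, 0, -45 -> three target text regions ("S", "TO", "P")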
In a possible implementation, the typesetting direction of characters is not necessarily standard: text is not always displayed horizontally or vertically, but may run in any direction. In addition, characters in a text may be sheared or rotated; when taking a picture, for example, a rectangle may be captured as a parallelogram, i.e. a normal object is captured as an inclined one. Therefore, to further improve the accuracy of text recognition, in the embodiment of the present application the characters in the target text region may be preprocessed before text recognition is formally performed. Fig. 9 is a schematic flow chart of the text recognition preprocessing provided in the embodiment of the present application; the specific flow is described as follows.
Step 901: and carrying out text direction classification on the plurality of target text regions based on a text direction classification function of the trained first text recognition model to obtain a plurality of first target text regions.
In the embodiment of the present application, a text direction detection model composed of a Convolutional Neural Network (CNN) and fully connected layers may be used to classify the text directions in the target text region, specifically, the text directions may be classified into 4 types, i.e., 0 °, 90 °, 180 °, and 270 °, or the text directions may be classified into 8 types, i.e., 0 °, 45 °, 90 °, 135 °, 180 °, 225 °, 270 °, and 315 °. Of course, how to perform the direction classification can be set according to the user's needs.
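A minimal sketch of such a CNN-plus-fully-connected direction classifier follows; the layer widths are illustrative assumptions, and num_classes=8 would give the 45° variant.

import torch.nn as nn

class TextDirectionClassifier(nn.Module):
    # Classify a text crop into 4 directions (0, 90, 180, 270 degrees).
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)   # fully connected classification head

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))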
Step 902: and performing text correction on the plurality of first target text regions based on a text correction sub-function of the trained first text recognition model to obtain a plurality of second target text regions.
In the embodiment of the application, a text correction network can be constructed based on affine transformation and interpolation principles. With this network, a text image acquires spatial invariance, so that sheared or rotated text characters are corrected into a normal typesetting form while the text content remains unchanged. That is, an angled text region may be rectified into a horizontal or vertical text box. At this point, a text region described by many points can be processed so that the coordinates of the text box are represented by four corner points.
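The patent describes a learned correction network; as a stand-in illustration of the rectifying effect only, a classical OpenCV perspective warp of the four corner points would look like this (function name and output size are assumptions):

import cv2
import numpy as np

def rectify_text_box(image, corners, out_w=320, out_h=40):
    # corners: four corner points ordered top-left, top-right, bottom-right, bottom-left.
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)   # interpolation-based warp to an axis-aligned box
    return cv2.warpPerspective(image, M, (out_w, out_h))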
Step 903: and performing text typesetting direction classification on the plurality of second target text regions based on the text typesetting direction classification function of the trained first text recognition model to obtain a plurality of third target text regions.
In the embodiment of the present application, a text typesetting direction classification detection model formed by a convolutional neural network and a full connection layer may be used to perform text typesetting direction detection on a corrected text region, and specifically, the text typesetting directions may be classified into 2 types of horizontal typesetting and vertical typesetting (similar to the vertical typesetting manner of ancient languages).
Furthermore, during subsequent text recognition, horizontally typeset text boxes can be sent to a horizontal text recognition network and vertically typeset text boxes to a vertical text recognition network. The two networks share the same structure and differ only in input image size: the input of the horizontal text recognition model may be width 320, height 40; the input of the vertical text recognition model may be width 40, height 320.
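The routing step thus reduces to a resize per typesetting class, as in this small sketch (helper name assumed):

import cv2

def resize_for_recognition(crop, horizontal=True):
    # 320 x 40 (w x h) for horizontally typeset text, 40 x 320 for vertical text.
    w, h = (320, 40) if horizontal else (40, 320)
    return cv2.resize(crop, (w, h))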
Step 904: and respectively inputting the plurality of third target text areas into the trained first text recognition model for text recognition, and/or respectively inputting the trained second text recognition model for text recognition.
In the embodiment of the present application, after the target text region has passed through text typesetting direction classification to give a third target text region, the third target text region may be input into the trained first text recognition model, so that punctuation marks, English words, Chinese words and other text can be accurately inferred from the semantic information of the text shown in the region.
And/or the third target text region can be input into the trained second text recognition model, which determines the second text recognition result according to the text length and the text semantic information; this solves the alignment problem of indefinite-length sequences while still determining the result from semantic information, improving the accuracy of text recognition.
In one possible implementation, in order to further improve the accuracy of text recognition, when the target text recognition result is determined according to the confidence degrees corresponding to the first text recognition result and the second text recognition result, more than one trained second text recognition model may be used, that is, the target text recognition result may be determined from a plurality of second text recognition results corresponding to the plurality of trained second text recognition models and a plurality of first text recognition results of one trained first text recognition model. Fig. 10 is a schematic view of another flow of text recognition provided in the embodiment of the present application, and a specific flow is described as follows.
Step 1001: and respectively carrying out text recognition on the plurality of target text regions according to the plurality of trained second text recognition models to obtain second text recognition results corresponding to the plurality of target text regions in each of the plurality of trained second text recognition models.
In practical application, for example, text recognition needs to be performed on a target text region a, there are currently 3 trained second text recognition models with different weight parameters and 1 trained first text recognition model, so that for the target text region a, the 3 second text recognition models respectively obtain 1 second text recognition result, that is, 3 second text recognition results can be obtained in total, and the first text recognition model obtains 1 first text recognition result, that is, for the target text region a, 4 text recognition results can be obtained in total.
Step 1002: and determining a plurality of target text recognition results corresponding to the image to be recognized according to the confidence degrees corresponding to the first text recognition results and the confidence degrees corresponding to the second text recognition results of each trained second text recognition model.
In practical application, a preferential voting scheme can be applied to the character results to further improve the accuracy of the text recognition result. Since the preferential voting process is the same for all target text regions, a target text region A in which the same text recognition result B appears repeatedly is taken as the example below. As shown in fig. 11, another schematic flow chart of text recognition provided in the embodiment of the present application, the specific flow is described as follows.
Step 1101: for a target text region a of the plurality of target text regions, it is determined whether the same text recognition result B exists in the first text recognition result and the plurality of second text recognition results corresponding to the target text region a.
Step 1102: when it is determined that the same text recognition result B exists, the confidence of the same text recognition result B is increased.
In the embodiment of the present application, increasing the confidence of the same text recognition result amounts to giving that result a higher weight. Specifically, assuming that 2 of the text recognition results for the target text region A are both the text recognition result B, i.e., the same text recognition result exists, the confidence of the text recognition result B may be increased according to the following equation:
P = (p1 + p2 + … + pn) × 1.1^(n-1)

where P is the increased confidence of the text recognition result B, p1, …, pn are the confidences of the n occurrences of the text recognition result B, and n is the number of repetitions of the text recognition result B.
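As a minimal sketch, the boost can be computed as follows (the helper name is ours, not the patent's):

    def boosted_confidence(confidences):
        """confidences: the n confidence values of one repeated text
        recognition result B. Sum them and multiply by 1.1^(n-1), so the
        more models agree on B, the larger its final confidence."""
        n = len(confidences)
        return sum(confidences) * 1.1 ** (n - 1)

    print(boosted_confidence([0.90, 0.85]))  # 2 repetitions -> (0.90 + 0.85) * 1.1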
Step 1103: Determine the text recognition result corresponding to the maximum confidence as the target text recognition result of the target text region A, according to the confidence of the same text recognition result B and the confidences of the remaining text recognition results of the target text region A.
Step 1104: Determine the plurality of target text recognition results corresponding to the image to be recognized according to the target text recognition results corresponding to the plurality of target text regions respectively.
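Putting steps 1101 to 1104 together for one target text region, a self-contained sketch of the preferential voting might look like this (function and variable names are illustrative):

    from collections import defaultdict

    def vote(results):
        """results: (text, confidence) pairs produced for one target text
        region by the first model and every second model. Identical texts
        pool their confidences with the 1.1^(n-1) boost; the text with the
        highest final confidence is the target text recognition result."""
        groups = defaultdict(list)
        for text, conf in results:
            groups[text].append(conf)
        scored = {t: sum(c) * 1.1 ** (len(c) - 1) for t, c in groups.items()}
        return max(scored, key=scored.get)

    # Region A: 1 first-model result + 3 second-model results.
    print(vote([("hello", 0.90), ("hello", 0.85), ("he11o", 0.95), ("hello", 0.80)]))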
In one possible embodiment, in order to recognize as many kinds of text as possible (for example, simplified Chinese characters, traditional Chinese characters, English, and Chinese and English punctuation), and because the fonts and text layout directions of characters in web images vary widely, the texts in the data set include data in horizontal, vertical, slanted, and curved (for example, circular) layouts. Therefore, before text recognition is performed, a basic database for model training needs to be established, which provides every sample used for model training. In the embodiment of the present application, web images may be organized into a training set, where the annotation label of a web image may include text region coordinates and text content. The labeling rule is to label texts that lie close together in a text area as one text box; for example, for a semantic text, when the gap between characters exceeds one character width, the characters are regarded as belonging to two text regions and are labeled as two text boxes. In practical application, the labels of the training-set texts can be drawn from a dictionary library of more than 6000 entries, which contains simplified Chinese characters, traditional Chinese characters, English letters, numbers, Chinese and English punctuation, special characters, and the like. Further, after text recognition by the text recognition model, a label belonging to the above-described dictionary library may be output.
In summary, in the embodiment of the present application, the first text recognition model determines the first text recognition result according to text semantic information, so that the characters in a target text region can be inferred, which improves the recognition accuracy of punctuation marks and of Chinese and English words; the second text recognition model determines the second text recognition result according to the text length and the text semantic information, which solves the alignment problem of variable-length sequences. Furthermore, when the target text recognition result is determined jointly by the first text recognition model and the second text recognition models, a relatively large number of text recognition results with different confidences is obtained for the same target text region; selecting, on this basis, the recognition result with the largest confidence as the target text recognition result of that target text region further improves the accuracy of text recognition.
As shown in fig. 12, based on the same inventive concept, an embodiment of the present application provides a text recognition apparatus 120, including:
a text region determining unit 1201, configured to determine a plurality of target text regions of the image to be recognized according to the trained text detection model;
a first recognition result determining unit 1202, configured to perform text recognition on the multiple target text regions according to the trained first text recognition model, and obtain first text recognition results corresponding to the multiple target text regions; the first text recognition model determines a first text recognition result according to the text semantic information;
a second recognition result determining unit 1203, configured to perform text recognition on the multiple target text regions according to the trained second text recognition model, and obtain second text recognition results corresponding to the multiple target text regions respectively; the second text recognition model determines a second text recognition result according to the text length and the text semantic information;
a target recognition result determining unit 1204, configured to determine, according to first confidence levels included in the plurality of first text recognition results and the plurality of second text recognition results, a plurality of target text recognition results corresponding to the image to be recognized; the first confidence level is used for indicating the probability that a specific character exists in the target text region corresponding to a text recognition result; one target text region corresponds to a plurality of recognition results, and the recognition result with the highest confidence among the plurality of recognition results is the target text recognition result of that target text region.
Optionally, the text region determining unit 1201 is specifically configured to:
determining a first probability that each pixel point in the image to be recognized is a central point of a single character and a second probability that each pixel point is a central point between any two adjacent characters according to the trained text detection model;
obtaining a plurality of local image areas according to the first probability;
for each local image region, segmenting each local image region according to the second probability, and determining a plurality of candidate text regions corresponding to each local image region;
and determining a plurality of target text regions of the image to be recognized according to a plurality of candidate text regions corresponding to the plurality of local image regions respectively.
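One plausible, CRAFT-style reading of these four steps, sketched with illustrative thresholds (the patent does not fix the exact thresholding or grouping rule):

    import numpy as np
    import cv2

    def candidate_text_regions(char_prob, link_prob, char_thr=0.5, link_thr=0.5):
        """char_prob[i, j]: first probability (pixel is a character center);
        link_prob[i, j]: second probability (pixel is a center between two
        adjacent characters). Character-center pixels joined by
        between-character pixels merge into one candidate text region;
        unlinked character blobs fall into separate regions."""
        mask = ((char_prob > char_thr) | (link_prob > link_thr)).astype(np.uint8)
        num, labels = cv2.connectedComponents(mask)
        return [np.argwhere(labels == i) for i in range(1, num)]  # pixel lists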
Optionally, the text region determining unit 1201 is further specifically configured to:
determining a second confidence degree corresponding to each of the candidate text regions; wherein the second confidence level is used for indicating the probability of text existence in the candidate text region;
determining whether a second confidence corresponding to one candidate text region in the plurality of candidate text regions is greater than a set second confidence threshold;
and determining the one candidate text region as a target text region when the second confidence is determined to be greater than the set second confidence threshold.
Optionally, the text region determining unit 1201 is further specifically configured to:
when the second confidence is determined to be greater than the set second confidence threshold, performing binarization processing on the one candidate text region to obtain a first candidate text region;
performing connected domain analysis on the first candidate text region to determine whether the first candidate text region is a connected region; the connected region is an image region which has the same pixel value and is formed by non-background pixel points adjacent in position;
and if the first candidate text region is determined to be the connected region, determining the first candidate text region as the target text region.
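A sketch of this filter, with an illustrative confidence threshold (the patent leaves the value open):

    import cv2

    def is_connected_text_region(region_img, second_conf, conf_thr=0.7):
        """Keep a candidate region only if its second confidence clears the
        threshold and, after Otsu binarization, its non-background pixels
        form exactly one connected component."""
        if second_conf <= conf_thr:
            return False
        gray = cv2.cvtColor(region_img, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        num, _ = cv2.connectedComponents(binary)
        return num == 2  # label 0 is background, so 2 labels = 1 component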
Optionally, the text region determining unit 1201 is further specifically configured to:
after the first candidate text region is determined to be a connected region, determining a plurality of included angles between a plurality of text sub-regions in the first candidate text region and a preset first coordinate axis, wherein one included angle corresponds to one text sub-region; when any two adjacent included angles among the plurality of included angles differ, determining that the text region portion formed in the first candidate text region by the text sub-regions corresponding to those two adjacent included angles is curved;
sequentially determining whether the difference value between two adjacent included angles in the plurality of included angles is larger than a set angle threshold value;
when the difference between two adjacent included angles is greater than the set angle threshold, determining, as a dividing line, the boundary in the first candidate text region between the text sub-regions corresponding to those two adjacent included angles;
and acquiring a plurality of target text sub-regions according to the dividing lines, and determining the plurality of target text sub-regions as target text regions.
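The angle-based split can be sketched as follows; sub_regions and angles come from the connected-domain analysis above, and the 15-degree threshold is illustrative only:

    def split_curved_region(sub_regions, angles, angle_thr=15.0):
        """angles[i] is the included angle (degrees) between text sub-region
        i and the preset first coordinate axis. A jump larger than angle_thr
        between adjacent sub-regions marks a dividing line; each resulting
        group of sub-regions becomes one target text region."""
        groups, current = [], [sub_regions[0]]
        for prev_a, cur_a, region in zip(angles, angles[1:], sub_regions[1:]):
            if abs(cur_a - prev_a) > angle_thr:
                groups.append(current)  # dividing line between the two angles
                current = [region]
            else:
                current.append(region)
        groups.append(current)
        return groups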
Optionally, the apparatus further comprises a text recognition preprocessing unit 1205, configured to:
based on a text direction classification function of the trained first text recognition model, performing text direction classification on the plurality of target text regions to obtain a plurality of first target text regions;
based on a text correction sub-function of the trained first text recognition model, performing text correction on the plurality of first target text regions to obtain a plurality of second target text regions;
performing text typesetting direction classification on the plurality of second target text regions based on a text typesetting direction classification function of the trained first text recognition model to obtain a plurality of third target text regions;
and inputting the plurality of third target text regions into the trained first text recognition model for text recognition, and/or into the trained second text recognition model for text recognition, respectively.
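The three-stage preprocessing reads naturally as a pipeline. In this sketch, classify_direction, rectify, and classify_layout are hypothetical stand-ins for the corresponding sub-functions of the trained first text recognition model, not its actual API:

    def preprocess_regions(regions, model):
        """Direction classification -> text correction -> layout-direction
        classification, producing the third target text regions that are
        fed to the first and/or second recognition model. All three method
        names are hypothetical."""
        first = [model.classify_direction(r) for r in regions]   # e.g. 0/90/180/270
        second = [model.rectify(r) for r in first]               # straighten text
        third = [model.classify_layout(r) for r in second]       # horizontal/vertical
        return third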
Optionally, the target recognition result determining unit 1204 is specifically configured to:
performing text recognition on the plurality of target text regions according to each of the plurality of trained second text recognition models, to obtain, for each trained second text recognition model, the second text recognition results corresponding to the plurality of target text regions;
and determining a plurality of target text recognition results corresponding to the image to be recognized according to the confidence degrees corresponding to the first text recognition results and the confidence degrees corresponding to the second text recognition results of each trained second text recognition model.
Optionally, the target recognition result determining unit 1204 is further specifically configured to:
determining whether the same text recognition result exists in a first text recognition result and a plurality of second text recognition results corresponding to one target text region in the target text regions;
when the same text recognition result is determined to exist, increasing the confidence coefficient of the same text recognition result;
determining the text recognition result corresponding to the maximum confidence as the target text recognition result of the target text region according to the confidence of the same text recognition result and the confidence of the other text recognition results of the target text region;
and determining a plurality of target text recognition results corresponding to the image to be recognized according to the target text recognition results corresponding to the plurality of target text regions respectively.
The apparatus may be configured to execute the methods described in the embodiments shown in fig. 2 to 11; therefore, for the functions that can be realized by each functional module of the apparatus, reference may be made to the description of the embodiments shown in fig. 2 to 11, which is not repeated here. It should be noted that the functional units shown by dashed boxes in fig. 12 are optional functional units of the apparatus.
Referring to fig. 13, based on the same technical concept, an embodiment of the present application further provides a computer device 130, which may include a memory 1301 and a processor 1302.
The memory 1301 is used for storing computer programs executed by the processor 1302. The memory 1301 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the computer device, and the like. The processor 1302 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 1301 and the processor 1302 is not limited in this embodiment. In the embodiment of the present application, the memory 1301 and the processor 1302 are connected through a bus 1303 in fig. 13; the bus 1303 is shown by a thick line in fig. 13, and the connection manner between other components is merely an illustrative description and is not limited thereto. The bus 1303 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 13, but this does not mean that there is only one bus or one type of bus.
The memory 1301 may be a volatile memory, such as a random-access memory (RAM); the memory 1301 may also be a non-volatile memory, such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1301 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1301 may also be a combination of the above.
A processor 1302, configured to execute the method performed by the apparatus in the embodiments shown in fig. 2 to fig. 11 when calling the computer program stored in the memory 1301.
In some possible embodiments, various aspects of the methods provided herein may also be implemented in the form of a program product including program code; when the program product is run on a computer device, the program code causes the computer device to perform the steps of the methods according to the various exemplary embodiments of the present application described above in this specification. For example, the computer device may perform the methods described in the embodiments shown in fig. 2 to 11.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium, and when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may essentially, or in the part contributing to the prior art, be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

Translated from Chinese
1. A text recognition method, characterized in that the method comprises:
determining a plurality of target text regions of an image to be recognized according to a trained text detection model;
performing text recognition on the plurality of target text regions according to a trained first text recognition model, and obtaining first text recognition results respectively corresponding to the plurality of target text regions, wherein the first text recognition model determines the first text recognition result according to text semantic information;
performing text recognition on the plurality of target text regions according to a trained second text recognition model, and obtaining second text recognition results respectively corresponding to the plurality of target text regions, wherein the second text recognition model determines the second text recognition result according to text length and text semantic information;
determining a plurality of target text recognition results corresponding to the image to be recognized according to first confidence levels respectively contained in the plurality of first text recognition results and the plurality of second text recognition results, wherein the first confidence level is used for indicating the probability that a specific character exists in the target text region corresponding to a text recognition result; one target text region corresponds to a plurality of recognition results, and the recognition result with the highest confidence among the plurality of recognition results is the target text recognition result of the one target text region.
2. The method according to claim 1, characterized in that determining a plurality of target text regions of the image to be recognized according to the trained text detection model comprises:
determining, according to the trained text detection model, a first probability that each pixel point in the image to be recognized is the center point of a single character, and a second probability that each pixel point is the center point between any two adjacent characters;
obtaining a plurality of local image regions according to the first probability;
for each local image region, segmenting the local image region according to the second probability, and determining a plurality of candidate text regions corresponding to the local image region;
determining the plurality of target text regions of the image to be recognized according to the plurality of candidate text regions respectively corresponding to the plurality of local image regions.
3. The method according to claim 2, characterized in that determining the plurality of target text regions of the image to be recognized according to the plurality of candidate text regions comprises:
determining second confidence levels respectively corresponding to the plurality of candidate text regions, wherein the second confidence level is used for indicating the probability that text exists in the candidate text region;
for one candidate text region among the plurality of candidate text regions, determining whether the second confidence level corresponding to the one candidate text region is greater than a set second confidence threshold;
when it is determined to be greater than the set second confidence threshold, determining the one candidate text region as a target text region.
4. The method according to claim 3, characterized in that determining the one candidate text region as a target text region when it is determined to be greater than the set second confidence threshold comprises:
when it is determined to be greater than the set second confidence threshold, performing binarization processing on the one candidate text region to obtain a first candidate text region;
performing connected domain analysis on the first candidate text region to determine whether the first candidate text region is a connected region, wherein the connected region is an image region composed of positionally adjacent non-background pixel points having the same pixel value;
if it is determined to be a connected region, determining the first candidate text region as a target text region.
5. The method according to claim 4, characterized in that determining the first candidate text region as a target text region if it is determined to be a connected region comprises:
after it is determined to be a connected region, determining a plurality of included angles between a plurality of text sub-regions in the first candidate text region and a preset first coordinate axis, wherein one included angle corresponds to one text sub-region; when any two adjacent included angles among the plurality of included angles differ, it is determined that the text region portion jointly formed in the first candidate text region by the text sub-regions corresponding to those two adjacent included angles is curved;
sequentially determining whether the difference between two adjacent included angles among the plurality of included angles is greater than a set angle threshold;
when it is determined that the difference between two adjacent included angles is greater than the set angle threshold, determining, as a dividing line, the boundary in the first candidate text region between the text sub-regions corresponding to those two adjacent included angles;
obtaining a plurality of target text sub-regions according to the dividing line, and determining the plurality of target text sub-regions as target text regions.
6. The method according to claim 1, characterized in that before performing text recognition on the plurality of target text regions according to the trained first text recognition model to obtain the first text recognition results respectively corresponding to the plurality of target text regions, and/or before performing text recognition on the plurality of target text regions according to the trained second text recognition model to obtain the second text recognition results respectively corresponding to the plurality of target text regions, the method further comprises:
performing text direction classification on the plurality of target text regions based on a text direction classification function of the trained first text recognition model to obtain a plurality of first target text regions;
performing text correction on the plurality of first target text regions based on a text correction sub-function of the trained first text recognition model to obtain a plurality of second target text regions;
performing text layout direction classification on the plurality of second target text regions based on a text layout direction classification function of the trained first text recognition model to obtain a plurality of third target text regions;
inputting the plurality of third target text regions into the trained first text recognition model for text recognition, and/or into the trained second text recognition model for text recognition, respectively.
7. The method according to claim 1, characterized in that when there are a plurality of trained second text recognition models, determining the plurality of target text recognition results corresponding to the image to be recognized according to the confidence levels respectively corresponding to the plurality of first text recognition results and the plurality of second text recognition results comprises:
performing text recognition on the plurality of target text regions according to each of the plurality of trained second text recognition models, and obtaining, for each trained second text recognition model among the plurality of trained second text recognition models, the second text recognition results respectively corresponding to the plurality of target text regions;
determining the plurality of target text recognition results corresponding to the image to be recognized according to the confidence levels respectively corresponding to the plurality of first text recognition results and the confidence levels respectively corresponding to the plurality of second text recognition results of each trained second text recognition model.
8. The method according to claim 7, characterized in that determining the plurality of target text recognition results corresponding to the image to be recognized according to the confidence levels respectively corresponding to the plurality of first text recognition results and the confidence levels respectively corresponding to the plurality of second text recognition results of each trained second text recognition model comprises:
for one target text region among the plurality of target text regions, determining whether the same text recognition result exists among the first text recognition result and the plurality of second text recognition results corresponding to the one target text region;
when it is determined that the same text recognition result exists, increasing the confidence level of the same text recognition result;
determining the text recognition result corresponding to the maximum confidence level as the target text recognition result of the one target text region according to the confidence level of the same text recognition result and the confidence levels of the remaining text recognition results of the one target text region;
determining the plurality of target text recognition results corresponding to the image to be recognized according to the target text recognition results respectively corresponding to the plurality of target text regions.
9. A text recognition apparatus, characterized in that the apparatus comprises:
a text region determining unit, configured to determine a plurality of target text regions of an image to be recognized according to a trained text detection model;
a first recognition result determining unit, configured to perform text recognition on the plurality of target text regions according to a trained first text recognition model, and obtain first text recognition results respectively corresponding to the plurality of target text regions, wherein the first text recognition model determines the first text recognition result according to text semantic information;
a second recognition result determining unit, configured to perform text recognition on the plurality of target text regions according to a trained second text recognition model, and obtain second text recognition results respectively corresponding to the plurality of target text regions, wherein the second text recognition model determines the second text recognition result according to text length and text semantic information;
a target recognition result determining unit, configured to determine a plurality of target text recognition results corresponding to the image to be recognized according to first confidence levels respectively contained in the plurality of first text recognition results and the plurality of second text recognition results, wherein the first confidence level is used for indicating the probability that a specific character exists in the target text region corresponding to a text recognition result; one target text region corresponds to a plurality of recognition results, and the recognition result with the highest confidence among the plurality of recognition results is the target text recognition result of the one target text region.
10. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
CN202111137451.6A | 2021-09-27 | 2021-09-27 | Text recognition method, device, equipment and storage medium | Pending | CN113887375A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111137451.6A (CN113887375A (en)) | 2021-09-27 | 2021-09-27 | Text recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111137451.6A (CN113887375A (en)) | 2021-09-27 | 2021-09-27 | Text recognition method, device, equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN113887375A | 2022-01-04

Family

ID=79007225

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111137451.6A (Pending, CN113887375A (en)) | Text recognition method, device, equipment and storage medium | 2021-09-27 | 2021-09-27

Country Status (1)

Country | Link
CN (1) | CN113887375A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114677693A (en)* | 2022-03-29 | 2022-06-28 | 京东科技信息技术有限公司 | Book inventory method and device
CN114693717A (en)* | 2022-02-24 | 2022-07-01 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and computer readable storage medium
CN114937143A (en)* | 2022-04-01 | 2022-08-23 | 广东小天才科技有限公司 | Rotary shooting method and device, electronic equipment and storage medium
CN115512380A (en)* | 2022-09-30 | 2022-12-23 | 三一汽车起重机械有限公司 | Text recognition method, system, device, equipment and storage medium
CN115527226A (en)* | 2022-09-30 | 2022-12-27 | 中电金信软件有限公司 | Method and device for reliably identifying characters and electronic equipment
CN119232863A (en)* | 2024-09-25 | 2024-12-31 | 北京优酷科技有限公司 | Subtitle generation method, display method, electronic device and storage medium

Citations (28)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20030189603A1 (en)* | 2002-04-09 | 2003-10-09 | Microsoft Corporation | Assignment and use of confidence levels for recognized text
CN102298696A (en)* | 2010-06-28 | 2011-12-28 | 方正国际软件(北京)有限公司 | Character recognition method and system
US20120134589A1 (en)* | 2010-11-27 | 2012-05-31 | Prakash Reddy | Optical character recognition (OCR) engines having confidence values for text types
WO2014026483A1 (en)* | 2012-08-15 | 2014-02-20 | 广州广电运通金融电子股份有限公司 | Character identification method and relevant device
US20140355835A1 (en)* | 2013-05-28 | 2014-12-04 | Xerox Corporation | System and method for OCR output verification
CN105095842A (en)* | 2014-05-22 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Method and device for identifying information of bill
CN105975955A (en)* | 2016-05-27 | 2016-09-28 | 北京好运到信息科技有限公司 | Detection method of text area in image
CN107403130A (en)* | 2017-04-19 | 2017-11-28 | 北京粉笔未来科技有限公司 | A kind of character identifying method and character recognition device
CN108446621A (en)* | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium
WO2019192397A1 (en)* | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape
CN109086756A (en)* | 2018-06-15 | 2018-12-25 | 众安信息技术服务有限公司 | A kind of text detection analysis method, device and equipment based on deep neural network
CN110245545A (en)* | 2018-09-26 | 2019-09-17 | 浙江大华技术股份有限公司 | A kind of character recognition method and device
CN109635627A (en)* | 2018-10-23 | 2019-04-16 | 中国平安财产保险股份有限公司 | Pictorial information extracting method, device, computer equipment and storage medium
CN111353497A (en)* | 2018-12-21 | 2020-06-30 | 顺丰科技有限公司 | Identification method and device for identity card information
CN112001406A (en)* | 2019-05-27 | 2020-11-27 | 杭州海康威视数字技术股份有限公司 | Text region detection method and device
CN112749695A (en)* | 2019-10-31 | 2021-05-04 | 北京京东尚科信息技术有限公司 | Text recognition method and device
WO2021098861A1 (en)* | 2019-11-21 | 2021-05-27 | 上海高德威智能交通系统有限公司 | Text recognition method, apparatus, recognition device, and storage medium
CN111275038A (en)* | 2020-01-17 | 2020-06-12 | 平安医疗健康管理股份有限公司 | Image text recognition method and device, computer equipment and computer storage medium
CN111259889A (en)* | 2020-01-17 | 2020-06-09 | 平安医疗健康管理股份有限公司 | Image text recognition method and device, computer equipment and computer storage medium
CN111368902A (en)* | 2020-02-28 | 2020-07-03 | 北京三快在线科技有限公司 | Data labeling method and device
CN111353484A (en)* | 2020-02-28 | 2020-06-30 | 深圳前海微众银行股份有限公司 | Image character recognition method, device, equipment and readable storage medium
CN111814785A (en)* | 2020-06-11 | 2020-10-23 | 浙江大华技术股份有限公司 | Invoice recognition method, training method of related model, related equipment and device
CN112016547A (en)* | 2020-08-20 | 2020-12-01 | 上海天壤智能科技有限公司 | Image character recognition method, system and medium based on deep learning
CN112818979A (en)* | 2020-08-26 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Text recognition method, device, equipment and storage medium
CN112418216A (en)* | 2020-11-18 | 2021-02-26 | 湖南师范大学 | A text detection method in complex natural scene images
CN112541491A (en)* | 2020-12-07 | 2021-03-23 | 沈阳雅译网络技术有限公司 | End-to-end text detection and identification method based on image character region perception
CN112668580A (en)* | 2020-12-28 | 2021-04-16 | 南京航天数智科技有限公司 | Text recognition method, text recognition device and terminal equipment
CN112836522A (en)* | 2021-01-29 | 2021-05-25 | 青岛海尔科技有限公司 | Method and device for determining speech recognition result, storage medium and electronic device


Similar Documents

Publication | Title
US12321686B2 (en) | Determining functional and descriptive elements of application images for intelligent screen automation
CN113887375A (en) | Text recognition method, device, equipment and storage medium
US10572725B1 (en) | Form image field extraction
US9235759B2 (en) | Detecting text using stroke width based text detection
US8965127B2 (en) | Method for segmenting text words in document images
CN111488826A (en) | Text recognition method and device, electronic equipment and storage medium
AU2006252019B2 (en) | Method and Apparatus for Dynamic Connector Analysis
CN110276366A (en) | Object detection using a weakly supervised model
US10643094B2 (en) | Method for line and word segmentation for handwritten text images
CN111027563A (en) | Text detection method, device and recognition system
WO2017202232A1 (en) | Business card content identification method, electronic device and storage medium
JP2004318879A (en) | Automation technology for comparing image contents
CN109389115B (en) | Text recognition method, device, storage medium and computer equipment
CN109189965A (en) | Pictograph search method and system
CN111368632A (en) | Signature identification method and device
CN111652144A (en) | Item segmentation method, device, equipment and medium based on target region fusion
CN110399877A (en) | Optical character recognition of connected characters
WO2023147717A1 (en) | Character detection method and apparatus, electronic device and storage medium
CN114399782A (en) | Text image processing method, device, equipment, storage medium and program product
CN115004261B (en) | Text line detection
US12288406B2 (en) | Utilizing machine-learning based object detection to improve optical character recognition
CN113033531B (en) | A method, device and electronic equipment for text recognition in images
CN114387254A (en) | A document quality analysis method, device, computer equipment and storage medium
CN116584100A (en) | Image space detection suitable for overlay media content
CN114926852B (en) | Table identification reconstruction method, apparatus, device, medium and program product

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
