Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The certificate image identification method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 10 communicates with the server 20 through a network. The terminal 10 may obtain the image to be recognized locally or from the server 20 and produce the certificate image recognition result for it; when a network model is needed during recognition, such as the angle recognition model and the graph neural network in the following embodiments, the model may be trained by the terminal 10 itself, or trained by the server 20 and then provided to the terminal 10. In some embodiments, the server 20 may instead obtain the image to be recognized from the terminal 10 and produce the certificate image recognition result; here the required network models, such as the angle recognition model and the graph neural network in the following embodiments, may be trained by the server 20 itself or, if needed, trained by the terminal 10 and then provided to the server 20. The terminal 10 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 20 may be implemented as an independent server or as a server cluster composed of a plurality of servers.
In some embodiments, aspects of embodiments of the present application may involve computer vision techniques of artificial intelligence, for example, identifying the category information and specific content of structured fields in a document image by means of artificial intelligence.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, spanning both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to perform machine vision tasks such as identification, tracking, and measurement on a target, and further processing the result into an image better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
In one embodiment, as shown in fig. 2 and fig. 3, a certificate image recognition method is provided. The method is described by taking as an example its application to the terminal 10 or the server 20 in fig. 1, and includes the following steps S201 to S204.
Step S201: acquire an image to be recognized.
The image to be recognized is the target image for which certificate image recognition is required, and it can be obtained in various possible ways. In some embodiments, when the method is executed by a terminal, the terminal may obtain the image to be recognized from the terminal device itself, from a server, or from other devices; for example, it may obtain an image captured in real time from a camera connected to the terminal device. In some embodiments, when the method is executed by a server, the server may obtain the image to be recognized from the terminal device, from its own database, or from another server or database.
The obtained image to be recognized may be in any possible image format, and the image format of the image to be recognized is not limited in the embodiment of the present application.
In an embodiment, after obtaining the image to be recognized and before proceeding to the subsequent detection of the text in the image to be recognized, the method may further include:
Step S2012: perform image preprocessing on the image to be recognized.
The image preprocessing to be performed on the image to be recognized may include any possible image preprocessing operation. In a specific example, the image preprocessing on the image to be recognized may include: and correcting the angle of the image to be recognized. In the following example, an image preprocessing process of correcting the angle of an image to be recognized is explained as an example.
When the angle of the image to be recognized is corrected, any possible angle correction manner may be used, for example an angle processing method based on text jump features (such as the MSER (Maximally Stable Extremal Regions) method, or a processing method based on card-edge straight-line detection), an angle estimation method based on deep learning, and the like. In the following specific example of the present application, the angle of the image to be recognized is corrected by an angle estimation method based on deep learning.
Accordingly, in a specific example, correcting the angle of the image to be recognized may include the following steps 1 and 2.
Step 1: identify the rotation angle of the image to be recognized using a pre-trained angle recognition model.
Step 2: correct the image to be recognized based on the rotation angle to obtain a corrected image to be recognized.
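As a concrete illustration, the two steps above can be sketched as follows; for simplicity, the sketch assumes the predicted rotation is a multiple of 90 degrees (an assumption of this illustration, not a limitation of the method, which may predict arbitrary angles):

```python
import numpy as np

def correct_rotation(image: np.ndarray, predicted_angle: int) -> np.ndarray:
    """Rotate the image back by the predicted rotation angle.

    This sketch only handles multiples of 90 degrees; an
    arbitrary-angle correction would use an affine warp instead.
    """
    if predicted_angle % 90 != 0:
        raise ValueError("this sketch only supports multiples of 90 degrees")
    # np.rot90 rotates counter-clockwise; undo a clockwise rotation of
    # `predicted_angle` by rotating counter-clockwise the same amount.
    k = (predicted_angle // 90) % 4
    return np.rot90(image, k=k)

# A 2x3 "image": undoing a 90-degree rotation makes its shape 3x2.
img = np.arange(6).reshape(2, 3)
corrected = correct_rotation(img, 90)
```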
The angle recognition model can be obtained through training using machine learning technology. Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in how a computer simulates or realizes human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
Referring to fig. 4, the model structure of an angle recognition model in an embodiment of the present application may be that of a CNN (Convolutional Neural Network). A typical CNN comprises an input layer, convolutional layers, pooling layers, and a fully connected layer. The input layer normalizes the input image and converts it into one or more matrices; a convolutional layer performs convolution, aiming to extract image features from the matrices produced by the input layer; a pooling layer performs pooling to reduce the dimensionality of the features output by the convolutional layer, reducing the amount of computation while retaining the corresponding features; and the fully connected layer performs a dimension transformation to output a classification result. In practice there may be multiple convolutional and pooling layers (for example, the output of one convolutional layer may feed the next convolutional layer, and the output of a pooling layer may feed a subsequent convolutional layer for convolution), and the arrangement of convolutional and pooling layers may be set differently based on different technical needs; the embodiments of the present application are not specifically limited in this respect. As shown in fig. 4, in the embodiment of the present application the angle recognition model takes the image to be recognized as its input and, after processing, outputs the recognized rotation angle of the image.
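The layer sequence described above can be illustrated with a minimal NumPy forward pass; the kernel size, dimensions, and four-way angle classification here are illustrative assumptions, not the actual model structure of fig. 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Downsample by taking the max over non-overlapping size x size windows."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Input layer: normalize an 8x8 grayscale "image" to [0, 1].
image = rng.integers(0, 256, size=(8, 8)) / 255.0
# Convolutional layer: extract features with a 3x3 kernel, then ReLU.
features = np.maximum(conv2d(image, rng.standard_normal((3, 3))), 0)  # 6x6
# Pooling layer: reduce dimensionality while keeping salient responses.
pooled = max_pool(features)                                           # 3x3
# Fully connected layer: map flattened features to 4 angle classes.
logits = pooled.reshape(-1) @ rng.standard_normal((9, 4))
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
```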
In one embodiment, the method for training the angle recognition model comprises the following steps.
first, an initial sample image with a rotation angle of 0 degrees is acquired. The initial sample image with the rotation angle of 0 degree refers to a certificate image that has not been rotated, and the initial sample image can be obtained through various possible ways, for example, after the certificate image is obtained, the certificate image is artificially adjusted to obtain the initial sample image with the rotation angle of 0 degree. In other embodiments, these initial sample images may be obtained in other ways as well.
Secondly, sample expansion is performed on the initial sample images to obtain expanded samples, where the expanded samples include the initial sample images and samples obtained by rotating the initial sample images by predetermined angles. The initial sample images can be rotated in various possible ways, for example by a predetermined angle through a data enhancement algorithm. A data enhancement algorithm generates more training data from the existing training samples by applying various random transformations that produce believable images, with the goal that the model never sees the exact same image twice during training; this yields more data samples, exposes the model to more of the data's variation, and gives better generalization. The predetermined angle may include more than one different angle; for example, a given initial sample image may be rotated by a plurality of different predetermined angles to obtain a plurality of different expanded samples.
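A minimal sketch of this sample-expansion step, assuming the predetermined angles are the four multiples of 90 degrees (any other angle set could be substituted with an arbitrary-angle rotation routine):

```python
import numpy as np

def expand_samples(initial_images, angles=(0, 90, 180, 270)):
    """Expand each 0-degree sample into (image, rotation_angle) pairs.

    The angle set here is an illustrative assumption; each produced
    pair records the predetermined angle used as its training label.
    """
    expanded = []
    for img in initial_images:
        for angle in angles:
            rotated = np.rot90(img, k=angle // 90)
            expanded.append((rotated, angle))
    return expanded

# Two initial 0-degree samples expand into eight labeled samples.
samples = expand_samples([np.zeros((4, 4)), np.ones((4, 4))])
```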
Then, the expanded samples are used to train the angle recognition model to be trained, yielding the trained angle recognition model. Specifically, after the expanded samples are input into the model to be trained, the model recognizes a training recognition angle for each expanded sample, i.e., the rotation angle of the expanded sample as recognized by the model to be trained; based on the difference between the recognized training recognition angle and the predetermined angle corresponding to that expanded sample, the model parameters are adjusted until the model training end condition is reached. The predetermined angle corresponding to an expanded sample is its actual rotation angle, which is known when the expanded sample is generated: it is 0 for the initial sample images, and for the other images in the expanded set it is the rotation angle applied during sample expansion. The model training end condition may be that the difference between the training recognition angle obtained in training and the corresponding predetermined angle falls within an acceptable range, or that a predetermined number of training iterations is reached; in other embodiments, the end condition may be set otherwise.
With an angle recognition model obtained by this training method, a small set of initial sample images is expanded into many more expanded samples, giving strong generalization capability: the training process can be completed with small-scale sample training, and on that basis the rotation angle of certificate photos can be accurately and efficiently predicted for various certificates. The predicted rotation angle can then be used to rotationally correct the image, facilitating subsequent detection and recognition.
In the process of training the angle recognition model with the expanded samples, the network can be trained by a back-propagation algorithm through an angle regression loss function; in some embodiments, the minimum mean square error may be used as the loss function. In one specific example of the present application, since the rotation angle of an image ranges from 0 to 360° and its values are symmetric about 360° (for example, when the true angle is 0° and the prediction is 359°, the angular deviation should be treated as 1° rather than 359°), in one embodiment the loss function may be set as the square of the remainder, modulo 360°, of the difference between the training recognition angle of an expanded sample and the rotation angle (i.e., the above-mentioned predetermined angle) corresponding to that expanded sample. If the training recognition angle recognized by the model to be trained is denoted Prediction and the predetermined angle of the expanded sample is denoted Target, the loss function of the embodiment of the present application can be expressed as ((Prediction − Target) mod 360°)², where mod denotes the remainder operation.
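A sketch of this angle regression loss; to honor the stated symmetry (a prediction of 359° against a true angle of 0° should count as a 1° error), the wrapped difference is folded to the shorter arc, which is an interpretation of the formula above rather than a literal transcription:

```python
def angle_loss(prediction: float, target: float) -> float:
    """Squared wrapped angular error.

    Computes ((prediction - target) mod 360)^2, folding the wrapped
    difference to the shorter arc so that e.g. predicting 359 degrees
    for a true angle of 0 degrees costs 1^2 rather than 359^2.
    """
    d = abs(prediction - target) % 360.0
    d = min(d, 360.0 - d)  # fold to the shorter arc around the circle
    return d * d
```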
Step S202: detect texts in the image to be recognized, and recognize the text field of each detected text.
It is to be understood that, when detecting text in the image to be recognized, in the case where angle correction has been performed as described above, the text is detected in the angle-corrected image. Detection may be performed in various possible manners; in some embodiments, OCR text detection methods may be used, for example single-stage object detectors such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector), detectors such as Faster R-CNN (Faster Region-CNN), or dedicated text detectors such as CTPN (Connectionist Text Proposal Network) and DB (Differentiable Binarization) for scene text detection. During text detection, enhancement processing can be applied to the certificate image so that the fields are well segmented.
When recognizing the text field of each detected text, the text field may be recognized in various possible manners; in some embodiments, OCR character recognition methods may be used, such as a character recognizer based on a CNN+RNN structure or one based on an attention-mechanism structure.
Step S203: perform category analysis processing on the text content and the text position of each text field to determine the text category of each text field.
In some embodiments, performing category analysis on the text content and the text position of each text field to determine the text category of each text field includes: and processing the text content and the text position of each text field by adopting a graph neural network to obtain the text type of each text field.
A graph neural network is a type of artificial neural network whose input is graph-structured data and whose output is a characterization vector, a highly generalized representation of the properties of the graph. By using a graph neural network to classify and recognize text fields from their text content and text position combined, no specific image features are used, dependence on real samples is reduced, and for highly sensitive certificate image data the cost of recognition is lower and the security is better.
When the graph neural network is used to process the text content and the text position of each text field to obtain the text category of each text field, as shown in fig. 5, the method may include steps S501 and S502.
Step S501: for each recognized text field, form a node code corresponding to the text field according to the text content and the text position of the text field.
In one embodiment, when forming the node codes, as shown in fig. 6, the text content and the text position may be encoded separately, and then the two encoding results are fused to obtain the final node codes. Specifically, for each identified text field, a node code corresponding to the text field is formed according to the text content and the text position of the text field, including the following steps.
First, the text content of the text field is encoded to obtain a content code. In a specific example, the text content may be encoded based on a natural language model, for example a Bert natural language model or a word-vector natural language model. Specifically, each word in the text content is first encoded to obtain a word code for each word, and then the word codes are fused; for example, the word codes are spliced end to end in word order to obtain the sentence code (i.e., the content code) of the text content.
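The word-code splicing described above can be sketched as follows; the toy checksum-seeded embedding merely stands in for a real language model such as Bert, and the embedding dimension is an illustrative assumption:

```python
import zlib
import numpy as np

EMBED_DIM = 4  # toy dimension; real language-model embeddings are far larger

def embed_word(word: str) -> np.ndarray:
    """Deterministic toy word embedding, a hypothetical stand-in for a
    natural language model such as Bert or word vectors."""
    rng = np.random.default_rng(zlib.crc32(word.encode("utf-8")))
    return rng.standard_normal(EMBED_DIM)

def encode_content(text: str) -> np.ndarray:
    """Sentence (content) code: word codes concatenated end to end
    in word order, as described above."""
    return np.concatenate([embed_word(w) for w in text.split()])

code = encode_content("date of birth")  # 3 words -> 12-dimensional code
```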
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Second, for the text position of the text field, after the text position is normalized to a predetermined size, the normalized text position is encoded to obtain a position code of a predetermined dimension. The predetermined size may be set according to actual needs, for example 1000 × 1000 in the following embodiments; that is, the text position of the text field is normalized to the 1000 × 1000 predetermined size and then encoded to a predetermined fixed dimension. Any possible normalization manner may be used, and the embodiment of the present application is not particularly limited.
Finally, the content code and the position code are fused to obtain the node code. In one embodiment, fusing the content code and the position code includes connecting the content code with the position code, for example splicing the two end to end, to obtain the node code.
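A minimal sketch of the position normalization and code fusion steps, assuming text positions are axis-aligned boxes (x1, y1, x2, y2), which is an assumption of this illustration:

```python
import numpy as np

TARGET = 1000  # the predetermined 1000 x 1000 size from the text

def encode_position(box, img_w, img_h):
    """Normalize an (x1, y1, x2, y2) text box onto a 1000x1000 canvas,
    giving a fixed 4-dimensional position code."""
    x1, y1, x2, y2 = box
    return np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]) * TARGET

def node_code(content_code, position_code):
    """Fuse content and position codes end to end into the node code."""
    return np.concatenate([content_code, position_code])

pos = encode_position((100, 50, 300, 90), img_w=2000, img_h=1000)
node = node_code(np.ones(12), pos)  # 12-dim content + 4-dim position
```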
Step S502: classify the node codes of the text fields using the graph neural network to obtain the classification category of each text field.
The core idea of the graph convolutional network (GCN) is to learn a mapping by which each node in a graph aggregates its own features and those of its neighbors to form a new representation of the node. Referring to fig. 7, in the embodiment of the present application, after the nodes are classified and recognized through the graph neural network by combining each node's own features with the relationships between nodes, the specific text category of each text field can be determined. When outputting the text categories, useless categories may be eliminated and not output; for example, in the example shown in fig. 7, after 9 nodes are analyzed, the useful categories corresponding to four nodes may be output: name, gender, certificate number, and date of birth.
In the embodiment of the present application, the graph used by the graph neural network is G = (V, E), where V denotes the set of nodes (each text field corresponds to a node, and the specific content of a node may be the node code of that text field) and E denotes the set of edges. Each node is assumed to have an edge with itself, i.e., (v, v) ∈ E. Let X ∈ R^(n×m) represent the features of all vertices, where m is the feature dimension of each node. A denotes the adjacency matrix of G, representing the connection relations between nodes, and D is the degree matrix, where the degree of a node is the number of nodes connected to it, i.e., D_ii = Σ_j A_ij.
In the classification processing by the GCN, the graph G = (V, E) initially contains the information of each node, and edges exist between any two nodes, namely each node's edge with itself and its edges with the other nodes, with each edge's weight given an initial value. During GCN processing, neighbor information in the graph is aggregated through a layer of convolution; for example, the new k-dimensional vertex feature matrix produced by one GCN layer is L^(1) = ρ(Ã X W_0), where Ã = D^(−1/2) A D^(−1/2) is the symmetrically normalized adjacency matrix, ρ is the activation function, X is the hidden-layer feature matrix, and W_0 is a parameter matrix; in one embodiment, ReLU may be selected as the activation function. In a specific processing run there may be multiple convolution operations, i.e., the GCN may contain multiple convolutional layers.
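A single GCN layer of this form can be sketched in NumPy; the fully connected toy graph and the dimensions are illustrative:

```python
import numpy as np

def gcn_layer(A, X, W, activation=lambda z: np.maximum(z, 0)):
    """One GCN layer: rho(A_hat @ X @ W), with the symmetrically
    normalized adjacency A_hat = D^(-1/2) A D^(-1/2)."""
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))  # degrees from A
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt
    return activation(A_hat @ X @ W)

# 4 nodes, 3 input features, 2 output features; edges between every
# pair of nodes including self-loops, as in the text above.
A = np.ones((4, 4))
X = np.eye(4, 3)  # toy node features
rng = np.random.default_rng(1)
H = gcn_layer(A, X, rng.standard_normal((3, 2)))
```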
In the embodiment of the application, during GCN processing the weight between nodes may be determined according to the distance between the nodes; specifically, the farther apart two nodes are, the smaller the weight between them. In some embodiments, the distance between nodes may be determined from the positions of the nodes in the image to be recognized (it is understood that this is the angle-corrected image to be recognized); for example, the distance between two nodes may be determined from the positions of the two nodes together with the height and width of the image to be recognized. In a specific example, the distance between two nodes may be determined by a formula (not reproduced in this text) in which l_ij represents the distance between node i and node j, h_max represents the height of the image to be recognized, w_max represents the width of the image to be recognized, i_x and i_y represent the position coordinates of node i on the x-axis and y-axis, and j_x and j_y represent the position coordinates of node j on the x-axis and y-axis.
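Since the distance formula itself is not reproduced above, the sketch below uses a normalized Euclidean distance consistent with the symbols the text defines; treat the exact form as an assumption:

```python
import math

def node_distance(ix, iy, jx, jy, w_max, h_max):
    """Distance between node i and node j, with each axis normalized by
    the image width/height. The exact formula is not reproduced in the
    source text; this normalized Euclidean form is a plausible reading
    consistent with the symbols it defines."""
    return math.sqrt(((ix - jx) / w_max) ** 2 + ((iy - jy) / h_max) ** 2)

d = node_distance(0, 0, 300, 400, w_max=1000, h_max=1000)
```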
In one specific example of the present application, two GCN layers may be employed, with the probability that each node belongs to each class computed and trained using a cross-entropy loss, thereby obtaining the text category of each text field.
For a certificate image, the certificate OCR field categories usually have a certain arrangement relation in addition to their content characteristics; for example, Chinese and English names are usually close to each other, and the field label "date of birth" is usually printed near the birth date itself. Such relations can be well captured by performing classification through a graph neural network, improving certificate image recognition performance. Because the processing depends only on text content and position and uses no specific image features, dependence on real samples is reduced; for highly sensitive certificate data, the cost is lower and the security is better. Meanwhile, since the classification performed after recognizing the text fields in the image depends on the text content and text position of the fields rather than on a specific layout structure of the image, different layouts and different versions of the same type of certificate can be well accommodated. The graph neural network may be a graph convolutional network or a graph attention network; the embodiment of the present application is not particularly limited.
Step S204: acquire a certificate image recognition result of the image to be recognized based on the text category of each text field.
Based on the recognized text category of each text field, the text category and the corresponding text content of each field can be output in a structured form, for example via a table or a list, so that each piece of structured information is obtained intuitively and the structured data can conveniently be used in subsequent applications.
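A trivial sketch of this structured-output step, collecting (category, content) pairs into a record; the field names are illustrative:

```python
def structure_output(fields):
    """Group recognized (category, content) pairs into a structured
    record, e.g. for display as a table or list."""
    record = {}
    for category, content in fields:
        record[category] = content
    return record

record = structure_output([("name", "Zhang San"),
                           ("gender", "M"),
                           ("certificate number", "A1234567")])
```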
In one embodiment, obtaining a certificate image recognition result of the image to be recognized based on the text category of each text field includes:
and performing field correction processing on the text fields based on the text types of the text fields to obtain a final certificate image recognition result.
Field correction processing is a form of field post-processing, mainly used to correct the obtained recognition result afterwards and to determine whether the certificate meets the specification. In some specific examples, field correction processing may include both field-level post-processing and global post-processing. Field-level post-processing refers to correction of a single text field, verifying and correcting whether the text field meets the corresponding rules, for example whether a certificate number meets the certificate number rule, or whether a certificate gender belongs to the enumerated type. Global post-processing refers to correction that checks whether the text fields are mutually consistent, for example whether the birth date is earlier than the issue date, or, when the certificate number encodes gender information, whether that code matches the recognized certificate gender.
In one embodiment, performing field correction processing on each text field based on its text category includes: when the text category is certificate number, checking whether the certificate number in the text content of the text field conforms to the certificate number rule, and if not, judging that the text field is recognized wrongly or that the certificate is illegal. Because a certificate number usually needs to satisfy a certain check rule, numbers that do not satisfy the check rule can be flagged as recognition errors or as belonging to an illegal certificate.
In one embodiment, performing field correction processing on each text field based on its text category includes: when the text category is a gender field, checking whether the text content of the text field belongs to a preset enumerated type, and if not, correcting the text content of the text field. For example, the certificate gender field is typically of an enumerated type; thus, for common errors of enumerated types, error correction may be performed by an error correction mechanism such as candidate-set correction. The specific error correction mechanism is not particularly limited in the embodiments of the present application.
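A candidate-set correction for enumerated fields can be sketched with the standard library's difflib; the enumeration values and similarity cutoff are illustrative assumptions:

```python
import difflib

GENDER_VALUES = ["male", "female"]  # illustrative enumeration set

def correct_enum(value, candidates=GENDER_VALUES):
    """Candidate-set correction for an enumerated field: if the
    recognized value is not a legal enumeration member, snap it to the
    closest candidate when one is similar enough, else leave it as-is
    for manual review."""
    if value in candidates:
        return value
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else value

fixed = correct_enum("fema1e")  # OCR confusion of 'l' with '1'
```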
In one embodiment, the field modification processing is performed on each text field based on the text category of the text field, and includes: and when the text category is the address category, checking and correcting errors of the text content of the text field based on an address library.
In one embodiment, performing field correction processing on each text field based on its text category includes: when the text category is birth date, taking the text content of that text field as first date information and checking whether the first date information is earlier than the text content of the text field whose text category is issue date; if not, it is judged that the certificate recognition is wrong.
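The date-ordering check in this global post-processing step can be sketched as:

```python
from datetime import date

def check_dates(birth_date: date, issue_date: date) -> bool:
    """Global post-processing check: the birth date must be earlier
    than the issue date, otherwise recognition is judged to be wrong."""
    return birth_date < issue_date

ok = check_dates(date(1990, 5, 1), date(2015, 3, 20))
bad = check_dates(date(2020, 1, 1), date(2015, 3, 20))
```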
Therefore, the overall identification accuracy can be improved and the unqualified certificates can be rejected through the field correction processing.
According to the certificate image recognition method above, after the text fields of the texts in the image are detected and recognized, category analysis is performed based on the text content and text position of each recognized text field to determine its text category, and the certificate image recognition result of the image to be recognized is determined on that basis. Because the text category is determined by combining text content with text position, structured information extraction from images without a fixed layout becomes possible, improving the recognition performance on certificate images.
The certificate image identification method can be applied to any technical scene for identifying the certificate image. Referring to fig. 8, the certificate image recognition method in a specific application scenario is exemplified. Before the certificate image is actually recognized, relevant models, such as an angle recognition model and a graph neural network, need to be trained and obtained. The process of training the model can be performed by any equipment, such as a terminal or a server, as long as the certificate image recognition equipment can obtain the model obtained by training of the equipment performing the model training when finally performing the certificate image recognition.
In one embodiment, when the angle recognition model is trained, an initial sample image with a rotation angle of 0 degrees is obtained, the initial sample image is rotated by predetermined angles through a data enhancement algorithm to obtain extended samples, and the angle recognition model to be trained is then trained on the extended samples to obtain the trained angle recognition model. In this way, a small set of initial sample images is expanded into a larger set of extended samples, giving strong generalization capability: the training of the angle recognition model can be completed with small-scale samples, and on this basis the model can accurately and efficiently predict the rotation angle of photos of various certificates.
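The data enhancement step above can be sketched as follows. Rotating by multiples of 90 degrees via `np.rot90` is an assumption for illustration; arbitrary predetermined angles would need an interpolating rotation from an image library.

```python
import numpy as np

def expand_samples(image: np.ndarray, angles=(0, 90, 180, 270)):
    """Expand one upright (0-degree) sample image into rotated copies.

    Returns (rotated_image, angle_label) pairs that can serve as
    training data for the angle recognition model. The 90-degree-step
    angle set is an illustrative assumption.
    """
    samples = []
    for angle in angles:
        k = angle // 90  # number of 90-degree counter-clockwise turns
        samples.append((np.rot90(image, k), angle))
    return samples
```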
In one embodiment, when the graph neural network is trained, corresponding graph structure data is obtained from each sample certificate image. Specifically, after the text content and the text position of each text field of the sample certificate image are obtained, the text content and the text position are encoded separately and combined into the node code of the text field. Each text field is then taken as a node, with its node code as the node vector, and the graph neural network to be trained is trained on this graph. In each training iteration, the parameters of the graph neural network are adjusted to update the weights determined based on the distances between nodes, until a training end condition is reached and the trained graph neural network is obtained.
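Building the graph structure data from per-field node vectors and positions might look like the following sketch. The Gaussian kernel and the `sigma` parameter are illustrative assumptions for the distance-based weights, not the formulation of the embodiment.

```python
import numpy as np

def build_graph(node_vectors, positions, sigma=100.0):
    """Build graph-structure data: each text field is a node.

    Edge weights decay with the Euclidean distance between field
    centres (hypothetical Gaussian kernel); self-loops are removed.
    Returns (node matrix, weight matrix).
    """
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # pairwise distances
    weights = np.exp(-(dist ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)                  # no self-loops
    return np.asarray(node_vectors, dtype=float), weights
```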
Assuming that the images currently to be recognized are as shown in fig. 9-1 and fig. 9-3, the image shown in fig. 9-1 can be recognized through the following process:
First, image preprocessing is performed using the trained angle recognition model. Specifically, after the angle recognition model identifies the rotation angle of the image to be recognized, the image is angle-corrected based on the identified rotation angle to obtain the corrected image to be recognized.
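The angle-correction step can be sketched as below, under the assumption (for illustration) that the model predicts one of 0/90/180/270 degrees of counter-clockwise rotation; undoing the rotation is then a matter of applying the remaining turns to 360 degrees.

```python
import numpy as np

def correct_rotation(image: np.ndarray, predicted_angle: int) -> np.ndarray:
    """Rotate the image back so its content is upright again.

    `predicted_angle` is the counter-clockwise rotation (in degrees,
    a multiple of 90) identified by the angle recognition model.
    Arbitrary angles would need an interpolating rotation instead.
    """
    k = (360 - predicted_angle) % 360 // 90  # 90-degree turns to apply
    return np.rot90(image, k)
```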
Then, the texts in the corrected image to be recognized are detected and the text field of each detected text is recognized. The text content of each text field is encoded to obtain a content code; the text position of the text field is normalized to a preset size and the normalized text position is encoded to obtain a position code of a preset dimension; and the content code and the position code are fused to obtain the node code of each text field.
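A minimal sketch of the encoding-and-fusion step follows. `content_vec` stands in for the text-content embedding, the box `(x1, y1, x2, y2)` is normalized to the [0, 1] range as the fixed-dimension position code, and concatenation is used as the fusion; all names and dimensions are illustrative assumptions.

```python
import numpy as np

def encode_node(content_vec, box, img_w, img_h):
    """Fuse a content code and a position code into one node code.

    The position code is the bounding box normalized by the image
    size; the node code is the concatenation of the two codes, one
    simple way to connect content and position codes.
    """
    x1, y1, x2, y2 = box
    pos_code = np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h])
    return np.concatenate([np.asarray(content_vec, dtype=float), pos_code])
```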
Next, the node codes of the text fields are input into the trained graph neural network, and classification processing is performed by the graph neural network to obtain the category of each text field.
On the basis of the obtained text field categories, field correction processing is performed, for example checking whether the certificate number conforms to the certificate number rules, to obtain the final certificate recognition result. The final result, with its various field categories combined, can be displayed in a list, a table, or any other suitable form for the user to view. The result of certificate image recognition on the image of fig. 9-1 in one specific example is shown in fig. 9-2 and includes: certificate type (Type): Hong Kong permanent resident identity card; Chinese name (CnName): Li ·; English name (EnName): LEE, #; Chinese telex code (TelexCode): 2621 ×.×; sex (Sex): M (male); date of birth (Birthday): 11-04-1989; symbol (Symbol): AZ; and the like.
Similarly, for the certificate image shown in fig. 9-3, similar processing is applied to obtain the recognition result shown in fig. 9-4, which includes: certificate type (Type): Hong Kong resident identity card; Chinese name (CnName): Guo ·; English name (EnName): GUO, ·; Chinese telex code (TelexCode): 6753A; sex (Sex): F (female); date of birth (Birthday): 01-10-1992; symbol (Symbol): CX; and the like.
It should be understood that, although the steps in the flowcharts of the embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated otherwise, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the order of performing these sub-steps or stages is not necessarily sequential either, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, a certificate image recognition apparatus is provided, which may be part of a computer device in the form of a software module, a hardware module, or a combination of the two. The apparatus specifically includes:
the image acquisition module 101, configured to acquire an image to be recognized;
the text detection module 102, configured to detect texts in the image to be recognized;
the text field identification module 103, configured to identify a text field of each text detected by the text detection module;
the classification module 104, configured to perform category analysis processing on the text content and the text position of each text field and determine the text category of each text field;
and the result determining module 105, configured to obtain a certificate image recognition result of the image to be recognized based on the text category of each text field.
In one embodiment, the apparatus further comprises an image preprocessing module for preprocessing the image to be recognized;
the text detection module 102 detects the texts in the image to be recognized after the image preprocessing module performs the image preprocessing.
In one embodiment, the image preprocessing module comprises an angle correction module for correcting the angle of the image to be recognized.
In one embodiment, the angle correction module identifies the rotation angle of the image to be recognized by using a pre-trained angle recognition model, and corrects the image to be recognized based on the rotation angle to obtain the corrected image to be recognized.
In one embodiment, the apparatus further comprises an angle recognition model training module for training the angle recognition model.
In one embodiment, the angle recognition model training module comprises:
the sample acquisition module, configured to acquire an initial sample image with a rotation angle of 0 degrees;
the sample expansion module, configured to perform sample expansion on the initial sample image to obtain extended samples, wherein the extended samples comprise the initial sample image and samples obtained by rotating the initial sample image by predetermined angles;
and the training module, configured to train the angle recognition model to be trained on the extended samples to obtain the trained angle recognition model.
In one embodiment, the training module trains the angle recognition model to be trained on the extended samples with a loss function equal to the square of the difference between the angle recognized for an extended sample during training and the rotation angle corresponding to that extended sample, taken modulo 360 degrees.
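A direct reading of that loss function can be written as below; note that a symmetric wrap (taking min(d, 360 - d) before squaring) is a common variant, but it is not what the text states.

```python
def angle_loss(predicted: float, target: float) -> float:
    """Squared angular difference taken modulo 360 degrees.

    `predicted` is the angle recognized during training and `target`
    is the rotation angle corresponding to the extended sample.
    """
    d = (predicted - target) % 360
    return d ** 2
```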
In one embodiment, the classification module 104 is configured to process the text content and the text position of each text field by using a neural network to obtain the text category of each text field.
In one embodiment, the classification module 104 includes:
the node coding module is used for forming a node code corresponding to each identified text field according to the text content and the text position of the text field;
and the classification processing module, configured to take the node code of each text field as a node of a graph neural network and perform classification processing with the graph neural network to obtain the classification category of each text field.
In one embodiment, the node encoding module includes:
the content coding module is used for coding the text content of the text field to obtain a content code;
the position coding module, configured to normalize the text position of the text field to a preset size and then encode the normalized text position to obtain a position code of a preset dimension;
and the code fusion module is used for fusing the content codes and the position codes to obtain the node codes.
In one embodiment, the code fusion module connects the content code and the position code to obtain the node code.
In one embodiment, the result determining module 105 performs field correction processing on the text fields based on the text category of each text field to obtain the final certificate image recognition result.
In one embodiment, when the text category is a certificate number category, the result determining module 105 checks whether the certificate number in the text content of the text field conforms to the certificate number rules, and if not, determines that the text field is recognized incorrectly or that the certificate is illegitimate.
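One way such a rule check could look is sketched below. The pattern (one or two leading letters, six digits, and a check character in parentheses) loosely follows the Hong Kong identity card number format suggested by the examples above, but it is an illustrative assumption; each certificate type would have its own rule, possibly including check-digit verification.

```python
import re

# Hypothetical format rule for a certificate number.
_NUMBER_PATTERN = re.compile(r"^[A-Z]{1,2}\d{6}\([0-9A]\)$")

def check_certificate_number(number: str) -> bool:
    """Return True if the recognized number matches the format rule."""
    return _NUMBER_PATTERN.fullmatch(number) is not None
```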
In one embodiment, when the text category is a gender category, the result determining module 105 checks whether the text content of the text field belongs to a predetermined enumeration type, and if not, corrects the text content of the text field.
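The enumeration check and correction could be sketched as follows; the exact enumeration values and the correction strategy (strip noise, uppercase) are illustrative assumptions.

```python
# Hypothetical allowed values for the gender field.
GENDER_VALUES = {"M", "F"}

def check_or_correct_gender(text: str) -> str:
    """Return the text if it is a valid enumeration value; otherwise
    attempt a simple correction, falling back to an empty string when
    no correction applies."""
    if text in GENDER_VALUES:
        return text
    cleaned = text.strip().upper()
    return cleaned if cleaned in GENDER_VALUES else ""
```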
In one embodiment, when the text category is an address category, the result determining module 105 performs checking and error correction on the text content of the text field based on an address library.
In one embodiment, when the text category is a birth date category, the result determining module 105 checks whether the text content of the text field is earlier than the text content of the text field whose text category is an issue date category, and if not, determines that the certificate recognition is incorrect.
For specific limitations of the certificate image recognition apparatus, reference may be made to the limitations of the certificate image recognition method above, which are not repeated here. All or part of the modules in the certificate image recognition apparatus can be implemented by software, by hardware, or by a combination of the two. The modules can be embedded in, or independent from, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 11. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement a certificate image recognition method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing relevant data, such as the angle recognition model, the graph neural network, the initial sample images, and the extended samples. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a certificate image recognition method.
It will be appreciated by those skilled in the art that the structures shown in fig. 11 and fig. 12 are merely block diagrams of partial structures relevant to the solution of the present application and do not constitute a limitation on the computer devices to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.