Disclosure of Invention
The invention provides a character recognition method, a character recognition apparatus, an electronic device, and a computer-readable storage medium, and mainly aims to improve the accuracy of character recognition.
In order to achieve the above object, the present invention provides a character recognition method, including:
acquiring a text image, and performing character detection on the text image to obtain a character detection box;
screening and combining the character detection boxes to obtain a target character box;
cropping the non-character regions of the target character box to obtain a cropped character box;
extracting characters from the cropped character box to obtain an initial character set;
and extracting key characters from the initial character set, verifying the key characters by using a regular-expression verification technique, and taking the key characters that pass verification as the character recognition result of the text image.
Optionally, the performing character detection on the text image to obtain a character detection box includes:
performing image feature extraction on the text image by using a convolutional layer in a character target box detection model to obtain a feature image, wherein the character target box detection model is trained in advance;
performing a normalization operation on the feature image by using a batch normalization layer in the character target box detection model to obtain a standard feature image;
fusing the bottom-layer features of the text image with the standard feature image by using a fusion layer in the character target box detection model to obtain a target feature image;
and outputting the detection result of the target feature image by using an activation function in the character target box detection model, and generating a character detection box according to the detection result.
Optionally, the cropping of non-character regions of the target character box to obtain a cropped character box includes:
performing binarization processing on the target character box to obtain a binarized character box;
querying the character start position and the character end position along the vertical axis of the binarized character box, as well as its length along the vertical axis, and vertically cropping the binarized character box according to the character start position, the character end position, and the vertical-axis length to obtain a vertically cropped character box;
and querying the character start position and the character end position along the horizontal axis of the vertically cropped character box, as well as its length along the horizontal axis, and horizontally cropping the vertically cropped character box according to the horizontal-axis character start position, character end position, and length to obtain the cropped character box.
Optionally, the extracting characters from the cropped character box to obtain an initial character set includes:
performing feature extraction on the cropped character box by using a convolutional neural network in a character extraction model to obtain a feature character box, wherein the character extraction model is trained in advance;
performing character position sequence recognition on the feature character box by using a long short-term memory network in the character extraction model to generate an original character set;
and performing character alignment on the original character set by using a temporal classification network in the character extraction model to generate the initial character set.
Optionally, the performing feature extraction on the cropped character box by using a convolutional neural network in the character extraction model to obtain a feature character box includes:
performing convolutional feature extraction on the cropped character box by using convolutional layers in the convolutional neural network to obtain an initial feature character box;
reducing the dimensionality of the initial feature character box by using a pooling layer in the convolutional neural network to obtain a dimension-reduced feature character box;
and outputting the dimension-reduced feature character box through a fully connected layer in the convolutional neural network to obtain the feature character box.
Optionally, the performing character position sequence recognition on the feature character box by using the long short-term memory network in the character extraction model to generate an original character set includes:
calculating the state value of the feature character box by using the input gate of the long short-term memory network;
calculating the activation value of the feature character box by using the forget gate of the long short-term memory network;
calculating the state update value of the feature character box according to the state value and the activation value;
and calculating the character position sequence from the state update value by using the output gate of the long short-term memory network to generate the original character set.
Optionally, the extracting key characters from the initial character set includes:
deleting stop words in the initial character set to obtain a standard character set;
and calculating the weight of each standard character in the standard character set, and screening out, from the standard character set, the standard characters whose weight is greater than a preset weight as the key characters.
In order to solve the above problem, the present invention further provides a character recognition apparatus, including:
the detection module is used for acquiring a text image and carrying out character detection on the text image to obtain a character detection box;
the merging module is used for screening and merging the character detection boxes to obtain a target character box;
the cropping module is used for cropping non-character regions of the target character box to obtain a cropped character box;
the extraction module is used for extracting characters from the cropped character box to obtain an initial character set;
and the recognition module is used for extracting key characters from the initial character set, verifying the key characters by using a regular-expression verification technique, and taking the key characters that pass verification as the character recognition result of the text image.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to implement the character recognition method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the above-mentioned character recognition method.
In the embodiments of the present invention, a text image is first acquired and subjected to character detection to obtain character detection boxes, that is, detection boxes carrying character position coordinates in the text image; secondly, the character detection boxes are screened, merged, and cropped of their non-character regions to obtain cropped character boxes, which improves character recognition performance and thus greatly improves the accuracy of character recognition on the image; further, characters are extracted from the cropped character boxes to obtain an initial character set, key characters are extracted from the initial character set and verified by a regular-expression verification technique, and the key characters that pass verification are taken as the character recognition result of the text image. Therefore, the character recognition method, the character recognition apparatus, the electronic device, and the storage medium of the present invention can improve the accuracy of character recognition.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a character recognition method. The execution subject of the character recognition method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiment of the present application. In other words, the character recognition method may be executed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a text recognition method according to an embodiment of the present invention. In the embodiment of the present invention, the character recognition method includes:
and S1, acquiring a text image, and performing character detection on the text image to obtain a character detection box.
In the embodiment of the invention, the text image is obtained by performing image conversion on a document text, and the document text may be a PDF text, such as a government message text. Further, the embodiment of the present invention uses a trained character target box detection model to perform character detection on the text image, wherein the character target box detection model is constructed from a YOLO (You Only Look Once) neural network, is used for detecting the detection boxes carrying character position coordinates in the text image, and includes convolutional layers, batch normalization layers, fusion layers, activation functions, and the like. Further, before the character detection is performed on the text image by using the trained character target box detection model, the embodiment of the present invention further includes: training the pre-constructed character target box detection model with a training text image set until the pre-constructed model tends to be stable, thereby completing the training and obtaining the trained character target box detection model. It should be noted that the training process of the pre-constructed character target box detection model belongs to a currently mature technology and is not further described herein.
In detail, the performing character detection on the text image by using the trained character target box detection model to obtain character detection boxes includes: performing image feature extraction on the text image by using the convolutional layer to obtain a feature image; normalizing the feature image by using the batch normalization layer (Batch Normalization, BN) to obtain a standard feature image; fusing the bottom-layer features of the text image with the standard feature image by using the fusion layer to obtain a target feature image; and outputting the detection result of the target feature image by using the activation function, and generating the character detection boxes according to the detection result.
Wherein, the image feature extraction can be realized by performing convolution operation on the tensor of the input image; the batch normalization layer normalizes the extracted image features, and can accelerate the convergence of the model.
In a preferred embodiment, the normalization operation can be expressed as:

x'_i = (x_i − μ) / √(σ² + ε)

wherein x'_i is the batch-normalized feature image in the standard feature image set, x_i is a feature image, μ is the mean of the feature image set, σ² is the variance of the feature image set, and ε is an infinitesimally small random number.
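The batch-normalization step above can be illustrated with a minimal Python sketch; the sample values, the function name, and the epsilon value are illustrative assumptions, not the patented model:

```python
import numpy as np

def batch_normalize(x, eps=1e-5):
    """Normalize a batch of feature values: subtract the mean (mu), then
    divide by the square root of the variance (sigma^2) plus a small eps."""
    mu = x.mean()
    var = x.var()
    return (x - mu) / np.sqrt(var + eps)

features = np.array([2.0, 4.0, 6.0, 8.0])
normalized = batch_normalize(features)
# The normalized values have (approximately) zero mean and unit variance.
```

Normalizing each feature batch this way keeps activations in a stable range, which is why the description notes it accelerates model convergence.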
The fusion layer fuses the bottom layer features of the image into the extracted image features, so that the influence on image gray scale change caused by different gains can be reduced. The underlying features refer to basic features of the text image, such as color, length, width, and the like, and preferably, the fusion is implemented by a Cross-Stage-Partial-Connections (CSP) module in the fusion layer in the embodiment of the present invention.
In a preferred embodiment, the activation function includes:
where s' represents the activated target feature image and s represents the target feature image.
Further, in a preferred embodiment of the present invention, the detection result includes x, y, height, width, category, and the like, wherein x and y represent the center point of the target feature image and the category indicates whether the target feature image is a character region: category 0 indicates that the target feature image is not a character region, and category 1 indicates that it is. Accordingly, the embodiment of the present invention selects the target feature images whose category is 1 as character regions, thereby generating the character detection boxes.
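For illustration only, the selection of category-1 regions described above could look like the following sketch; the dictionary keys mirror the detection-result fields (x, y, width, height, category) and are assumptions of this example:

```python
def select_text_regions(detections):
    """Keep only the detections whose category flag is 1 (a character region)."""
    return [d for d in detections if d["category"] == 1]

detections = [
    {"x": 10, "y": 20, "width": 50, "height": 16, "category": 1},  # character region
    {"x": 90, "y": 40, "width": 30, "height": 30, "category": 0},  # background
]
text_boxes = select_text_regions(detections)  # only the first detection remains
```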
And S2, screening and combining the character detection boxes to obtain a target character box.
The embodiment of the invention screens and merges the character detection boxes to obtain the target character box, so as to filter out the low-confidence and duplicate boxes among the character detection boxes and improve the speed of subsequent character extraction. The confidence refers to the probability that a character falls within the detected character detection box; that is, the higher the confidence, the higher the probability that the detected box contains a character.
Further, before the screening and merging of the character detection boxes, the embodiment of the present invention further includes: performing non-maximum suppression on the character detection boxes by using a non-maximum suppression (NMS) algorithm, so as to suppress elements that are not maxima among the character detection boxes and improve the speed of the subsequent character detection box processing. It should be noted that the non-maximum suppression algorithm belongs to a currently mature technology and is not described herein again.
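As a hedged sketch of the standard NMS procedure referred to above (the (x1, y1, x2, y2) box convention and the 0.5 overlap threshold are illustrative assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedily keep the highest-scoring box, suppressing boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the near-duplicate second box is suppressed
```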
Further, in an optional embodiment of the present invention, the screening of the character detection boxes may be implemented by a currently known interval estimation method.
Further, it should be understood that the screened character detection boxes may include boxes containing the same characters; therefore, the embodiment of the present invention merges the screened character detection boxes according to a preset merging rule to avoid duplicate character detection boxes. The preset merging rule includes: merging the screened character detection boxes that have the same adjacent-character spacing, the same character height ratio, and the same character content.
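The merging rule could be sketched as follows; merging by horizontal gap and height similarity is a simplified stand-in for the spacing, height-ratio, and content conditions above, and the tolerance values are assumptions:

```python
def merge_adjacent_boxes(boxes, max_gap=5, height_tol=2):
    """Merge horizontally adjacent boxes (x1, y1, x2, y2) whose heights match
    and whose horizontal gap is small, approximating the preset merging rule."""
    boxes = sorted(boxes, key=lambda b: b[0])
    merged = [list(boxes[0])]
    for b in boxes[1:]:
        last = merged[-1]
        similar_height = abs((b[3] - b[1]) - (last[3] - last[1])) <= height_tol
        adjacent = b[0] - last[2] <= max_gap
        if similar_height and adjacent:
            # extend the previous box to cover this one
            last[1] = min(last[1], b[1])
            last[2] = max(last[2], b[2])
            last[3] = max(last[3], b[3])
        else:
            merged.append(list(b))
    return [tuple(b) for b in merged]

boxes = [(0, 0, 10, 12), (12, 0, 22, 12), (100, 0, 110, 12)]
merged = merge_adjacent_boxes(boxes)  # the first two boxes merge into one
```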
And S3, cropping the non-character regions of the target character box to obtain a cropped character box.
In the embodiment of the invention, the non-character regions of the target character box are cropped so as to improve the character recognition performance of the subsequent model.
In detail, the cropping of non-character regions of the target character box to obtain a cropped character box includes: performing binarization processing on the target character box to obtain a binarized character box; querying the character start position and the character end position along the vertical axis of the binarized character box, as well as its vertical-axis length, and vertically cropping the binarized character box accordingly to obtain a vertically cropped character box; and querying the character start position and the character end position along the horizontal axis of the vertically cropped character box, as well as its horizontal-axis length, and horizontally cropping the vertically cropped character box accordingly to obtain the cropped character box.
The binarization processing of the target character box includes: marking the character regions in the target character box as 1 and the background regions as 0.
In an optional embodiment, the querying of the character start position, the character end position, the vertical-axis length, and the horizontal-axis length may be implemented by a pre-written query script, and the query script may be written in the JavaScript scripting language.
In an alternative embodiment, the vertical cropping and the horizontal cropping may be implemented by a currently known image cropping tool, such as the Photoshop cropping tool.
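Combining the binarization convention and the vertical/horizontal cropping described above, a minimal NumPy sketch (the array sizes and region placement are illustrative):

```python
import numpy as np

def crop_non_character_regions(binary_box):
    """Given a binarized box (character pixels 1, background 0), crop away
    the leading and trailing all-background rows and columns."""
    rows = np.flatnonzero(binary_box.any(axis=1))  # character start/end on the vertical axis
    cols = np.flatnonzero(binary_box.any(axis=0))  # character start/end on the horizontal axis
    return binary_box[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

box = np.zeros((6, 8), dtype=int)
box[2:4, 3:6] = 1                          # a small character region
cropped = crop_non_character_regions(box)  # tight crop around the character pixels
```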
And S4, extracting characters from the cropped character box to obtain an initial character set.
In the embodiment of the invention, character extraction is performed on the cropped character box by using a trained character extraction model to obtain an initial character set. The pre-trained character extraction model comprises a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), and a Connectionist Temporal Classification (CTC) network, wherein the CNN is used for recognizing the feature character box of the cropped character box, the LSTM is used for extracting the character sequence in the feature character box, and the CTC is used for solving the problem that the characters in the character feature sequence cannot be aligned. Further, the CNN includes a convolutional layer, a pooling layer, and a fully connected layer, and the LSTM includes an input gate, a forget gate, and an output gate.
Further, before the character extraction is performed on the cropped character box by using the trained character extraction model, the embodiment of the present invention further includes: training the pre-constructed character extraction model with a set of training cropped character boxes until the pre-constructed model tends to be stable, thereby completing the training and obtaining the trained character extraction model. It should be noted that the training process of the pre-constructed character extraction model belongs to a currently mature technology and is not further described herein.
In detail, referring to fig. 2, the extracting characters from the cropped character box by using the pre-trained character extraction model to obtain an initial character set includes:
S20, performing feature extraction on the cropped character box by using the convolutional neural network in the character extraction model to obtain a feature character box;
S21, performing character position sequence recognition on the feature character box by using the long short-term memory network in the character extraction model to generate an original character set;
and S22, performing character alignment on the original character set by using the temporal classification network in the character extraction model to generate the initial character set.
Further, the S20 includes: performing convolutional feature extraction on the cropped character box by using the convolutional layers in the convolutional neural network to obtain an initial feature character box; reducing the dimensionality of the initial feature character box by using the pooling layer in the convolutional neural network to obtain a dimension-reduced feature character box; and outputting the dimension-reduced feature character box through the fully connected layer in the convolutional neural network to obtain the feature character box.
Further, the S21 includes: calculating the state value of the feature character box by using the input gate of the long short-term memory network; calculating the activation value of the feature character box by using the forget gate of the long short-term memory network; calculating the state update value of the feature character box according to the state value and the activation value; and calculating the character position sequence from the state update value by using the output gate of the long short-term memory network to generate the original character set.
Further, it should be noted that the training processes of the convolutional neural network, the long short-term memory network, and the temporal classification network belong to currently mature technologies and are not further described herein.
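The character-alignment role of the temporal classification (CTC) network in S22 can be illustrated with the standard CTC decoding collapse: merge adjacent repeated symbols, then drop the blank symbol. The blank marker "-" and the sample path are assumptions of this sketch:

```python
BLANK = "-"

def ctc_collapse(path):
    """Collapse a frame-by-frame CTC output: merge adjacent repeats,
    then remove blank symbols, yielding the aligned character string."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)

aligned = ctc_collapse("hh-e-ll-lo")  # per-frame outputs collapse to "hello"
```

The blank symbol lets the network emit "no character" between frames, which is how repeated characters (the double "l" here) survive the collapse.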
And S5, extracting key characters from the initial character set, verifying the key characters by using a regular-expression verification technique, and taking the key characters that pass verification as the character recognition result of the text image.
It should be understood that the initial character set obtained in step S4 contains many characters that are of no use to the user; therefore, the embodiment of the present invention extracts the key characters from the initial character set to better assist the user in information processing, thereby improving work efficiency.
In detail, the extracting key characters from the initial character set includes: deleting stop words in the initial character set to obtain a standard character set, calculating the weight of each standard character in the standard character set, and screening out, from the standard character set, the standard characters whose weight is greater than a preset weight as the key characters.
In an optional embodiment, the deletion of stop words may be performed by filtering against a stop-word list: if a word appears in the stop-word list, all of its occurrences in the initial character set are deleted.
In an optional embodiment, the weight of a standard character may be obtained by calculating the proportion of each standard character in the standard character set, and the preset weight may be 0.6 or may be set to another value according to the specific scenario.
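As an illustrative sketch of the weighting just described (frequency proportion as the weight; the threshold value and the sample words are assumptions):

```python
from collections import Counter

def extract_key_characters(words, preset_weight=0.25):
    """Weight each word by its proportion of the standard set and keep
    those whose weight exceeds the preset weight."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w for w, c in counts.items() if c / total > preset_weight}

standard_set = ["invoice", "invoice", "invoice", "date", "misc"]
keys = extract_key_characters(standard_set)  # only "invoice" (weight 0.6) passes
```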
Further, it should be understood that the extracted key characters may contain format errors, for example an incorrect Chinese character; therefore, the embodiment of the present invention verifies the key characters by using a regular-expression verification technique, and the key characters that pass verification are taken as the character recognition result of the text image.
In an optional embodiment, the regular-expression verification technique includes: digit check expressions (e.g., ^[0-9]*$), Chinese character check expressions (e.g., ^[\u4e00-\u9fa5]{0,}$), and special-requirement check expressions (e.g., date format: ^\d{4}-\d{1,2}-\d{1,2}$).
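The three expression families above can be exercised directly with Python's re module; the patterns here are reconstructed from the garbled source text and should be treated as representative examples rather than the exact claimed expressions:

```python
import re

PATTERNS = {
    "digits": r"^[0-9]*$",                 # digit check
    "chinese": r"^[\u4e00-\u9fa5]{0,}$",   # Chinese character check
    "date": r"^\d{4}-\d{1,2}-\d{1,2}$",    # special requirement: date format
}

def verify(text, kind):
    """Return True when the text matches the expression for the given kind."""
    return re.match(PATTERNS[kind], text) is not None

ok_date = verify("2023-4-15", "date")   # a valid date format passes
bad_digits = verify("12a", "digits")    # a stray letter fails the digit check
```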
Further, in order to ensure the privacy and reusability of the key characters, the key characters may also be stored in a blockchain node.
To sum up, the embodiment of the present invention first acquires a text image and performs character detection on it to obtain character detection boxes, that is, detection boxes carrying character position coordinates in the text image; secondly, the character detection boxes are screened, merged, and cropped of their non-character regions to obtain cropped character boxes, which improves character recognition performance and thus greatly improves the accuracy of character recognition on the image; further, characters are extracted from the cropped character boxes to obtain an initial character set, key characters are extracted from the initial character set and verified by a regular-expression verification technique, and the key characters that pass verification are taken as the character recognition result of the text image. Therefore, the present invention can improve the accuracy of character recognition.
Fig. 3 is a functional block diagram of the character recognition apparatus according to the present invention.
The character recognition apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the character recognition device may include a detection module 101, a merging module 102, a clipping module 103, an extraction module 104, and a recognition module 105. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the detection module 101 is configured to obtain a text image, and perform text detection on the text image to obtain a text detection box;
the merging module 102 is configured to screen and merge the text detection boxes to obtain a target text box;
the cropping module 103 is configured to crop non-character regions of the target character box to obtain a cropped character box;
the extraction module 104 is configured to perform character extraction on the cropped character box to obtain an initial character set;
the recognition module 105 is configured to extract key characters from the initial character set, verify the key characters by using a regular-expression verification technique, and take the key characters that pass verification as the character recognition result of the text image.
In detail, when the modules in the text recognition apparatus 100 in the embodiment of the present invention are used, the same technical means as the text recognition method described in fig. 1 and fig. 2 are adopted, and the same technical effects can be produced, which is not described herein again.
Fig. 4 is a schematic structural diagram of an electronic device implementing the character recognition method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, and a bus, and may further comprise a computer program, such as a character recognition program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the character recognition program, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., performing character recognition) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 4 only shows an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The memory 11 of the electronic device 1 stores a character recognition program 12, which is a combination of computer programs and which, when executed by the processor 10, can implement:
acquiring a text image, and performing character detection on the text image to obtain a character detection box;
screening and combining the character detection boxes to obtain a target character box;
cropping the non-character regions of the target character box to obtain a cropped character box;
extracting characters from the cropped character box to obtain an initial character set;
and extracting key characters from the initial character set, verifying the key characters by using a regular-expression verification technique, and taking the key characters that pass verification as the character recognition result of the text image.
Specifically, the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring a text image, and performing character detection on the text image to obtain character detection boxes;
screening and combining the character detection boxes to obtain a target character box;
cropping non-character regions from the target character box to obtain a cropped character box;
extracting characters from the cropped character box to obtain an initial character set;
and extracting keywords from the initial character set, verifying the keywords using a regular-expression verification technique, and taking the successfully verified keywords as the character recognition result of the text image.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the modules is only one logical functional division, and other division manners may be adopted in practical implementation.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a series of data blocks associated using cryptographic methods, each of which contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
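The hash-linking idea described above can be sketched briefly. The field names and serialization choices below are illustrative assumptions, not a prescribed format; the sketch only shows how each block references its predecessor's hash so that tampering with any block invalidates every subsequent one.

```python
import hashlib
import json

def make_block(prev_hash, transactions):
    """Build a block whose hash covers the previous block's hash and its transactions."""
    block = {"prev_hash": prev_hash, "transactions": transactions}
    # sort_keys gives a canonical serialization so the hash is reproducible
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def chain_is_valid(chain):
    """Check every link and recompute every hash in the chain."""
    for prev, cur in zip(chain, chain[1:]):
        if cur["prev_hash"] != prev["hash"]:
            return False
        payload = json.dumps(
            {"prev_hash": cur["prev_hash"], "transactions": cur["transactions"]},
            sort_keys=True,
        ).encode()
        if hashlib.sha256(payload).hexdigest() != cur["hash"]:
            return False
    return True

genesis = make_block("0" * 64, ["tx0"])
block1 = make_block(genesis["hash"], ["tx1", "tx2"])
```

Because each block's hash is computed over the previous block's hash, altering any recorded transaction changes that block's hash and breaks the link to every block after it, which is the anti-counterfeiting property the passage refers to.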
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention, not for limiting them; although the present invention is described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.