CN110245545A - A kind of character recognition method and device - Google Patents

Character recognition method and device

Info

Publication number
CN110245545A
Authority
CN
China
Prior art keywords
proposal box
candidate
text
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811126275.4A
Other languages
Chinese (zh)
Inventor
任宇鹏
卢维
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN201811126275.4A
Publication of CN110245545A
Legal status: Pending (current)


Abstract

The invention discloses a character recognition method and device, for solving the problem that text in images is recognized with low accuracy. The method comprises: inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the position of each proposal box in the image together with a first score indicating whether the content of each proposal box is text; selecting as candidate proposal boxes those whose score exceeds a preset scoring threshold; merging the candidate proposal boxes according to their positions to obtain target proposal boxes; and inputting each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text contained in each target proposal box.

Description

Character recognition method and device
Technical field
The present invention relates to the fields of deep learning and character recognition technology, and in particular to a character recognition method and device.
Background technique
With the rapid development of image capture devices, ever more image information needs to be managed. Automating the management of image information with Internet technology is currently the most effective means of doing so.
Before the text in an image can be recognized, it must first be located. Current text localization methods fall broadly into two categories. The first is bounding-box regression based on networks such as Faster RCNN (Faster Region Convolutional Neural Networks), YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector); such methods directly output text-line scores and bounding boxes. The second is segmentation based on fully convolutional networks (Fully Convolutional Networks, FCN); such methods predict pixel-level text classification results and post-process them into enclosing rectangles. The Faster RCNN method, which offers both high speed and high precision, uses a Region Proposal Networks (RPN) method to generate candidate boxes for different text regions on the convolutional feature map, and then classifies each candidate region and regresses its bounding box with a neural network. However, because text-line length varies drastically, conventional candidate-box schemes struggle to localize such objects accurately; at the same time, because of computational cost and real-time constraints, the required precision cannot be met simply by multiplying the candidate-box sizes and shapes, so the existing RPN scheme needs improvement.
In the field of image character recognition, the closest existing implementation to the present invention is the patent "A complex character recognition method based on deep learning" filed by Chengdu Shulian Mingpin Technology Co., Ltd. That scheme recognizes single characters with a single convolutional neural network and does not take into account the context and semantic information carried by a character sequence, so its recognition accuracy is low.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a character recognition method and device, for solving the problem that text in images is recognized with low accuracy.
An embodiment of the invention provides a character recognition method, comprising:
inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the position of each proposal box in the image together with a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine window features, and predicts proposal boxes at each window position according to preset widths and heights; each row of window features of the feature map is fed as a sequence into the recurrent neural network, which outputs the position of each proposal box in the image and the first score indicating whether its content is text;
selecting as candidate proposal boxes those whose first score exceeds a preset scoring threshold;
merging the candidate proposal boxes according to their positions to obtain target proposal boxes;
inputting each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text contained in each target proposal box.
Further, before the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method also includes:
processing the image with threshold segmentation and connected-component analysis;
and performing text-orientation correction on the processed image.
Further, merging the candidate proposal boxes according to their positions to obtain target proposal boxes includes:
for a first candidate proposal box among the candidate proposal boxes, determining whether there exists a second candidate proposal box whose horizontal distance from the first candidate proposal box is less than a preset first threshold, whose vertical overlap with it exceeds a preset second threshold, and whose shape similarity to it exceeds a preset third threshold; if so, merging the first candidate proposal box and the second candidate proposal box into a new first candidate proposal box; if not, taking the first candidate proposal box as a target proposal box.
Further, determining the vertical overlap includes:
according to the first height and first vertical coordinate of the first candidate proposal box and the second height and second vertical coordinate of the second candidate proposal box, determining the vertical overlap with the formula overlap = |yA2 - yD1| / min(h1, h2), where yA2 is the second vertical coordinate of the second candidate proposal box, yD1 is the first vertical coordinate of the first candidate proposal box, and h1 and h2 are the first height of the first candidate proposal box and the second height of the second candidate proposal box respectively.
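As an illustration, the overlap formula can be sketched in Python. The helper below is a hypothetical reading of the patent's notation in which |yA2 - yD1| denotes the length of the vertical segment the two boxes share, so the ratio becomes the shared vertical extent divided by the smaller height:

```python
def vertical_overlap(y1, h1, y2, h2):
    """Vertical overlap of two boxes given as (top-edge y, height).

    Interprets overlap = |yA2 - yD1| / min(h1, h2) as the length of the
    shared vertical segment divided by the smaller of the two heights.
    """
    top = max(y1, y2)                # lower of the two top edges
    bottom = min(y1 + h1, y2 + h2)   # upper of the two bottom edges
    return max(0.0, bottom - top) / min(h1, h2)
```

Under this reading, two boxes of equal height 10 whose top edges differ by 5 pixels overlap by 0.5, and disjoint boxes overlap by 0.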
Further, determining the shape similarity includes:
according to the first height of the first candidate proposal box and the second height of the second candidate proposal box, determining the shape similarity with the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 are the heights of the first candidate proposal box and the second candidate proposal box respectively.
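The shape-similarity formula translates directly into code; a minimal sketch:

```python
def shape_similarity(h1, h2):
    """similarity = min(h1, h2) / max(h1, h2): 1.0 for equal heights,
    approaching 0.0 as the two heights diverge."""
    return min(h1, h2) / max(h1, h2)
```

The measure is symmetric in the two boxes, so the merging decision does not depend on which box is taken as "first".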
Further, the process of pre-training the first model includes:
obtaining sample images, each annotated with the position of every proposal box and a second score indicating whether the content of each proposal box is text;
inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to its output.
Further, the process of pre-training the second model includes:
obtaining the text lines annotated in the sample images;
inputting each sample image, with its corresponding text lines, into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to its output.
An embodiment of the invention provides a character recognition device, comprising:
an acquisition module, configured to input an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the position of each proposal box in the image together with a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine window features, and predicts proposal boxes at each window position according to preset widths and heights; each row of window features of the feature map is fed as a sequence into the recurrent neural network submodel, which outputs the position of each proposal box in the image and the first score indicating whether its content is text;
a screening module, configured to select as candidate proposal boxes those whose first score exceeds a preset scoring threshold;
a merging module, configured to merge the candidate proposal boxes according to their positions to obtain target proposal boxes;
a recognition module, configured to input each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text contained in each target proposal box.
Further, the device also includes:
a correction module, configured to process the image with threshold segmentation and connected-component analysis, and to perform text-orientation correction on the processed image.
Further, the merging module is specifically configured to: for a first candidate proposal box among the candidate proposal boxes, determine whether there exists a second candidate proposal box whose horizontal distance from the first candidate proposal box is less than a preset first threshold, whose vertical overlap exceeds a preset second threshold, and whose shape similarity exceeds a preset third threshold; if so, merge the first candidate proposal box and the second candidate proposal box into a new first candidate proposal box; if not, take the first candidate proposal box as a target proposal box.
Further, the merging module is specifically configured to determine the vertical overlap from the first height and first vertical coordinate of the first candidate proposal box and the second height and second vertical coordinate of the second candidate proposal box, using the formula overlap = |yA2 - yD1| / min(h1, h2), where yA2 is the second vertical coordinate of the second candidate proposal box, yD1 is the first vertical coordinate of the first candidate proposal box, and h1 and h2 are the heights of the first candidate proposal box and the second candidate proposal box respectively.
Further, the merging module is specifically configured to determine the shape similarity from the first height of the first candidate proposal box and the second height of the second candidate proposal box, using the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 are the first height of the first candidate proposal box and the second height of the second candidate proposal box respectively.
Further, the device also includes:
a first training module, configured to obtain sample images, each annotated with the position of every proposal box and a second score indicating whether the content of each proposal box is text; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network and train the first model according to its output.
Further, the device also includes:
a second training module, configured to obtain the text lines annotated in the sample images; and to input each sample image, with its corresponding text lines, into the second model comprising a convolutional neural network and a recurrent neural network and train the second model according to its output.
An embodiment of the present invention provides a character recognition method and device. The method inputs an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtains the position of each proposal box in the image together with a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine window features, and predicts proposal boxes at each window position according to preset widths and heights; each row of window features of the feature map is fed as a sequence into the recurrent neural network, which outputs the position of each proposal box and the first score indicating whether its content is text. Candidate proposal boxes whose first score exceeds a preset scoring threshold are then selected; the candidate proposal boxes are merged according to their positions to obtain target proposal boxes; and each target proposal box is input into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text contained in each target proposal box.
In the embodiments of the present invention, the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, which outputs the position of each proposal box in the image and a first score indicating whether its content is text. This first model can effectively capture the context of the character sequence and incorporate it into localization; specifically, the score of a proposal box covering the blank space between characters in the same line is raised by the sequence features of the surrounding characters, so the resulting text-line bounding boxes better match the positional characteristics of character sequences and text-line localization is more accurate. Secondly, each target proposal box is input into the pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text it contains; because this second model includes a recurrent neural network, it strengthens the extraction of character-sequence context, making the predicted text sequence more accurate.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a character recognition method provided by embodiment 1 of the present invention;
Fig. 2 is a diagram of the specific implementation of the text-line localization operation provided by embodiment 1 of the present invention;
Fig. 3 is a diagram of the effect of the recurrent neural network operation provided by embodiment 1 of the present invention;
Fig. 4 is a diagram of the specific implementation of the text-line recognition operation provided by embodiment 1 of the present invention;
Fig. 5 is a diagram of the expected position information of proposal boxes provided by embodiment 3 of the present invention;
Fig. 6 is an overall flow diagram of express-waybill text recognition provided by embodiment 7 of the present invention;
Fig. 7 is a structural diagram of a character recognition device provided by embodiment 8 of the present invention.
Specific embodiment
The present invention will be described below in further detail with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1:
Fig. 1 is a flow diagram of a character recognition method provided by an embodiment of the present invention; the method includes the following steps:
S101: inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the position of each proposal box in the image together with a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine window features, and predicts proposal boxes at each window position according to preset widths and heights; each row of window features of the feature map is fed as a sequence into the recurrent neural network, which outputs the position of each proposal box in the image and the first score indicating whether its content is text.
Text in an image may appear at any position, and possibly only a small region of the image contains the text to be recognized. Therefore, before the text in the image can be recognized, it must first be located: a localization operation yields the position of each text line in the image, and the recognition operation is then applied to the text contained in the located lines.
The two neural networks contained in the first model are a convolutional neural network and a recurrent neural network. Since the operations of both networks serve the localization of text-line positions in the image, the two networks are collectively referred to as the first model. The image containing the text to be recognized is input into the convolutional neural network; after several layers of convolution and pooling, the feature map of the image is obtained. A sliding-window convolution is performed on this feature map to obtain window features, and during the sliding-window operation a set of proposal boxes is predicted at each sliding-window centre according to the set widths and heights. The window features obtained by the sliding-window convolution are then input into the recurrent neural network, which finally outputs the coordinates of each proposal box and the first score that the box contains text; this score is used to judge whether the proposal box is a candidate proposal box.
When determining proposal boxes, the embodiment of the present invention predicts proposal boxes at each sliding-window centre using set widths and heights. Since the height of text lines in an image is not fixed, the prior-art method of generating proposal boxes with fixed sizes and shapes causes inaccurate text-line localization; the proposal-box generation method provided by the embodiment of the present invention solves this problem. Moreover, judging by threshold whether a proposal box is a candidate proposal box and removing redundant proposal boxes reduces the computational overhead that enlarging the set of box sizes and shapes would bring. Meanwhile, a recurrent neural network model is introduced into the first model to locate text lines; because a recurrent network inherently has memory, it can effectively capture the context of the character sequence and incorporate it into localization. In a specific implementation, the score of a proposal box covering the blank space between characters in the same line can be raised by the sequence features of the surrounding characters, so the resulting text-line proposal boxes better match the positional characteristics of character sequences and the localization result is more accurate.
For example, taking express-waybill text recognition as an example, the specific implementation of the text-line localization operation on a waybill image to be recognized is shown in Fig. 2, where Convx_x denotes the convolution operations of the different modules and the dotted connections between convolution modules denote pooling operations. BLSTM (Bidirectional Long Short-term Memory) is a bidirectional long short-term memory neural network, and FC (Fully Connected) denotes a fully connected layer. k proposal boxes in total are predicted on feature map conv5_3; after the BLSTM and FC layers, the predicted position of each proposal box and the score that its content is text are output.
The image to be recognized is first input into a pre-trained VGGNet-based convolutional neural network to extract image features. The network alternates convolution and pooling operations: the image passes through thirteen 3 × 3 convolutional layers and four 2 × 2 max-pooling layers in total, finally yielding a feature map conv5_3 of shape W × H × C, where W, H and C are the feature map's width, height and number of channels respectively;
A sliding-window convolution with stride 1 and kernel size 3 × 3 is performed on the feature map conv5_3 obtained above, and k proposal boxes of certain shapes and sizes are predicted at each sliding-window centre;
In a specific implementation, k is set to 10, and the shapes and sizes are as follows: a small fixed width is used, and only the height varies. Specifically, the width can be fixed at 16 pixels, and the heights are obtained by repeatedly reducing 283 pixels by a ratio of 0.7 down to 11 pixels, predicting 10 proposal boxes in total by this method.
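The height schedule described above (fixed 16-pixel width, heights shrinking from 283 pixels by a factor of 0.7 down to roughly 11 pixels) can be sketched as follows; the function names are illustrative, not from the patent:

```python
def anchor_heights(h_max=283.0, h_min=11.0, ratio=0.7):
    """Heights of the k proposal boxes: start at h_max and multiply by
    ratio until h_min is reached.  With the defaults this yields k = 10."""
    heights = []
    h = h_max
    while round(h) >= h_min:
        heights.append(round(h))
        h *= ratio
    return heights

def proposals_at(cx, cy, width=16):
    """Fixed-width proposal boxes (x, y, w, h) centred on a
    sliding-window position (cx, cy)."""
    return [(cx - width // 2, cy - h // 2, width, h)
            for h in anchor_heights()]
```

With the defaults, the heights run 283, 198, 139, ..., 11, matching the k = 10 fixed-width, variable-height proposals the embodiment describes.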
Next, the t-th 3 × 3 × C window feature of feature map conv5_3 obtained by the sliding-window convolution is input as a feature sequence into the BLSTM neural network, which cyclically updates the internal hidden state Ht according to the formula: Ht = φ(Ht-1, Xt), t = 1, 2, ..., W,
where Xt ∈ R3×3×C is the feature obtained from the t-th sliding window in each row of feature map conv5_3, W is the width of conv5_3, C is its number of channels, and φ is a nonlinear function. The effective context information obtained is passed to the FC layer, which outputs the position of each proposal box and the first score that its content is text.
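A minimal sketch of the recurrent update Ht = φ(Ht-1, Xt): the toy cell below substitutes an elementwise tanh combination for the real (B)LSTM weights, purely to show how the hidden state threads context left to right through a row of window features:

```python
import math

def rnn_step(h_prev, x, w_h=0.5, w_x=0.5):
    # Toy phi: H_t = tanh(w_h * H_{t-1} + w_x * X_t), elementwise.
    return [math.tanh(w_h * h + w_x * xi) for h, xi in zip(h_prev, x)]

def run_row(window_features):
    """Scan one row of window features, carrying the hidden state so each
    output depends on every window feature seen so far."""
    h = [0.0] * len(window_features[0])
    states = []
    for x in window_features:
        h = rnn_step(h, x)
        states.append(h)
    return states

# three toy 2-dimensional window features standing in for the 3x3xC ones
states = run_row([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

A BLSTM additionally runs the same scan right to left and concatenates both states, so each position sees context from both sides of the row.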
For example, Fig. 3 shows an example from a specific implementation: the proposal boxes predicted after the BLSTM operation together with their first scores. The boxes in the third row represent proposal boxes, the numbers in the second row are the first scores of the proposal boxes, and the numbers in the first row are the position index values of the corresponding proposal boxes, used to traverse the proposal boxes.
S102: selecting as candidate proposal boxes those whose first score exceeds a preset scoring threshold.
Among the proposal boxes determined above, some may contain no text information. Therefore, using the score that the content of each proposal box obtained in the previous step is text, redundant proposal boxes are eliminated by a preset scoring threshold, leaving the candidate proposal boxes. Specifically, if a proposal box's score exceeds the preset scoring threshold, it is taken as a candidate proposal box; otherwise it is regarded as a redundant proposal box and removed.
For example, in a specific implementation, the preset scoring threshold can be set to 0.7: if a proposal box's score exceeds 0.7, it is a candidate proposal box; otherwise it is regarded as a redundant proposal box and eliminated.
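The screening step S102 is a plain threshold filter; a sketch, with an assumed proposal representation (a dict with "box" and "score" keys, which is not the patent's own data structure):

```python
def filter_candidates(proposals, threshold=0.7):
    """Keep proposals whose text score exceeds the threshold; the rest
    are discarded as redundant."""
    return [p for p in proposals if p["score"] > threshold]

proposals = [
    {"box": (0, 0, 16, 40),  "score": 0.95},  # text -> kept
    {"box": (16, 0, 16, 40), "score": 0.71},  # text -> kept
    {"box": (64, 0, 16, 40), "score": 0.30},  # background -> dropped
]
candidates = filter_candidates(proposals)
```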
S103: merging the candidate proposal boxes according to their positions to obtain target proposal boxes.
To locate the text of each line in the image, the candidate proposal boxes obtained above must be merged to find the target proposal boxes. Therefore, according to the positions of the candidate proposal boxes found above, the candidate proposal boxes are merged one by one to obtain the target proposal boxes.
As for the process of merging two candidate proposal boxes, in a specific implementation one possible embodiment takes the minimum enclosing rectangle of the two candidate proposal boxes as the box obtained after merging, i.e. the target proposal box.
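The minimum enclosing rectangle of two boxes in (x, y, w, h) form can be sketched as:

```python
def merge_boxes(a, b):
    """Minimum enclosing rectangle of two boxes given as (x, y, w, h)."""
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (x, y, right - x, bottom - y)
```

Applied repeatedly along a line, this grows one wide box spanning the whole text line out of the narrow fixed-width proposals.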
In another possible embodiment, for any two candidate proposal boxes, it is judged whether their horizontal distance is less than a set threshold; if so, the two candidate proposal boxes are merged.
S104: inputting each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text contained in each target proposal box.
After the text-line localization result is obtained, the text within each line must be recognized, and recognition accuracy is crucial to the automatic management of images. Therefore, after the text lines in the image have been located by the above operations and the target proposal boxes obtained, in order to recognize the text in the located target proposal boxes, each target proposal box obtained above is input into the pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text information in the target proposal box.
The two neural networks contained in the second model are a convolutional neural network and a recurrent neural network. Since the operations of both networks serve the recognition of text information in the image, the two networks are collectively referred to as the second model.
The target proposal box obtained above is taken as the input of the convolutional neural network; after several convolution and max-pooling operations, image convolution features are obtained. These features are then taken as the input of the recurrent neural network, and the classification scores corresponding to the width dimension are computed from the recurrent layer's output. A connectionist temporal classification method converts the recurrent neural network's output into a label sequence, defines a probability over label sequences from the per-frame predictions, and uses the negative log-likelihood of that probability as the objective function for training the network; the image can thus be matched directly to its label sequence, without annotating individual characters.
For example, taking an express waybill image as an example, the input image is the target Suggestion box obtained by the above operations, and the specific procedure of the text-line recognition operation on the express waybill image to be recognized is shown in Fig. 4, in which Convolution denotes a 3 × 3 convolutional layer, Dense Blocks denote combined 1 × 1 and 3 × 3 convolutional layers, Transition Layers denote 2 × 2 max-pooling layers, and BGRU (Bidirectional Gated Recurrent Unit) is a recurrent neural network model based on bidirectional GRUs.
The detailed process is as follows: the located express waybill image is input into a pre-trained ultra-deep network structure based on DenseNet to extract image features. The image first passes through a 3 × 3 convolutional layer, then alternately through several combined 1 × 1 and 3 × 3 convolutional layers, 1 × 1 convolutional layers and 2 × 2 max-pooling layers; the depth of the network model reaches 120 layers.
The image features obtained above are used as the input of a recurrent neural network layer based on bidirectional GRUs, which produces the layer output and computes the classification score corresponding to each position along the width dimension;
Using the connectionist temporal classification (CTC) method, the output of the recurrent neural network layer is converted into a label sequence. A probability is defined over label sequences from the per-frame predictions, and the negative log-likelihood of this probability is used as the objective function for training the network; the image can thus be directly associated with its label sequence without annotating individual characters.
In the embodiment of the present invention, in contrast to the Transition Layers of a conventional DenseNet, which use average pooling, max-pooling layers are used to preserve the texture information of the feature maps; in the last two max-pooling layers, a pooling stride of 1 is used in the width dimension to retain more feature information along the width, making the detection of narrow characters more robust. The GRU used in the embodiment of the present invention is a recurrent neural network that is more efficient than an LSTM network; it enhances the extraction of contextual information from the character sequence, so that the prediction of the text sequence is more accurate. The CTC method used in the embodiment of the present invention is a common transformation method for processing the output of a recurrent neural network: the output is converted into a label sequence, and the final text result is obtained by operations such as removing duplicates and eliminating blanks. The object of processing is the entire label sequence rather than individual characters.
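As a rough illustration only (not the patented implementation), the "remove duplicates, then eliminate blanks" step of greedy CTC decoding described above can be sketched in Python; the blank index of 0 and the integer labels are assumptions for the example:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence: merge consecutive repeats,
    then drop the CTC blank symbol.

    frame_labels: best class index per width position, i.e. the argmax
    over the recurrent network's per-frame classification scores.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:  # new, non-blank symbol
            decoded.append(label)
        prev = label                          # remember for repeat merging
    return decoded

# frames "a a - b b - - b" (with '-' as blank) decode to "a b b"
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 2]))  # [1, 2, 2]
```

Note that the blank between the two runs of label 2 is what allows a genuinely repeated character to survive the duplicate-merging step.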
Embodiment 2:
On the basis of the above embodiment, in the embodiment of the present invention, in order to make text recognition more accurate, before the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method further comprises:
processing the image using a threshold segmentation method and a connected-component analysis method;
and performing text-orientation correction on the processed image.
The image containing the text to be recognized is captured by an image acquisition device such as a camera. Since the text information may be distributed anywhere in the image, and possibly only some region of the image contains the text to be recognized, before the text in the image is recognized the image is first processed using a threshold segmentation method and a connected-component analysis method: redundant areas are removed, and the region image containing the text to be recognized is retained. In addition, to make the text recognition result of the image more accurate, text-orientation correction is performed on the region image containing the text to be recognized so that the text lines are horizontal. The processes of threshold segmentation, connected-component analysis and text-orientation correction of the region image belong to the prior art and are not repeated here.
Embodiment 3:
Since each candidate Suggestion box obtained above contains only a small fraction of the text information, in order to obtain the complete text-line information corresponding to each line, on the basis of the above embodiments, in the embodiment of the present invention, merging the candidate Suggestion boxes into target Suggestion boxes according to the position of each candidate Suggestion box comprises:
for a first candidate Suggestion box among the candidate Suggestion boxes, identifying whether there exists a second candidate Suggestion box whose horizontal distance from the first candidate Suggestion box is less than a preset first threshold, whose overlap with it in the vertical direction is greater than a preset second threshold, and whose shape similarity with it is greater than a preset third threshold; if so, merging the first candidate Suggestion box and the second candidate Suggestion box into a new first candidate Suggestion box; if not, taking the first candidate Suggestion box as a target Suggestion box.
Specifically, the horizontal distance is the absolute value of the difference between the minimum abscissa of the four corner points of the first candidate Suggestion box and the minimum abscissa of the four corner points of the second candidate Suggestion box, or the absolute value of the difference between the maximum abscissa of the four corner points of the first candidate Suggestion box and the maximum abscissa of the four corner points of the second candidate Suggestion box; the smaller this absolute value, the more likely the two candidate Suggestion boxes are a pair of linked boxes. The vertical overlap is determined from the overlapping part of the first and second candidate Suggestion boxes in the vertical direction; the larger the overlap, the more likely the two boxes are a pair of linked boxes. The shape similarity is the similarity of the overall shapes of the first and second candidate Suggestion boxes; the larger the shape similarity, the more likely the two boxes are a pair of linked boxes.
Specifically, whether the first candidate Suggestion box and the second candidate Suggestion box form a pair of linked boxes is determined as follows: for each first candidate Suggestion box, judge whether there exists a second candidate Suggestion box whose horizontal distance d from the first candidate Suggestion box is less than a preset first threshold thresh1, whose vertical overlap overlap exceeds a preset second threshold thresh2, and whose shape similarity similarity is greater than a preset third threshold thresh3. If such a box exists, the first and second candidate Suggestion boxes are regarded as a pair of linked boxes, and the minimum enclosing rectangle of the pair is taken as the new first candidate Suggestion box; otherwise, the first candidate Suggestion box is taken as a target Suggestion box.
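A minimal sketch of this linking test and merge step, under stated assumptions: boxes are axis-aligned (x_min, y_min, x_max, y_max) tuples, the threshold values are illustrative, the horizontal distance takes the smaller of the left-edge and right-edge differences, and the vertical overlap here uses the min-height variant of the later embodiment:

```python
def is_linked(b1, b2, thresh1=50, thresh2=0.7, thresh3=0.7):
    """Decide whether two candidate boxes form a pair of linked boxes."""
    x1a, y1a, x2a, y2a = b1
    x1b, y1b, x2b, y2b = b2
    h1, h2 = y2a - y1a, y2b - y1b
    d = min(abs(x1a - x1b), abs(x2a - x2b))   # horizontal distance d
    inter = min(y2a, y2b) - max(y1a, y1b)     # vertical intersection length
    overlap = max(inter, 0) / min(h1, h2)     # vertical overlap
    similarity = min(h1, h2) / max(h1, h2)    # height-based shape similarity
    return d < thresh1 and overlap > thresh2 and similarity > thresh3

def merge(b1, b2):
    """Minimum enclosing rectangle of a linked pair."""
    return (min(b1[0], b2[0]), min(b1[1], b2[1]),
            max(b1[2], b2[2]), max(b1[3], b2[3]))

a, b = (10, 20, 60, 40), (55, 21, 110, 41)
if is_linked(a, b):
    a = merge(a, b)       # becomes the new first candidate box
print(a)                  # (10, 20, 110, 41)
```

In a full pass, merging repeats with the enclosing rectangle as the new first candidate box until no linkable second box remains, at which point the box becomes a target Suggestion box.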
As shown in Fig. 5, the dotted boxes are the candidate Suggestion boxes obtained according to the above process in the embodiment of the present invention, the dashed box is the target Suggestion box, and the two boxes below the arrow are two candidate Suggestion boxes, namely the first candidate Suggestion box and the second candidate Suggestion box. A1, B1, C1, D1 and A2, B2, C2, D2 denote the four corner points of the first and second candidate Suggestion boxes respectively, and h1 and h2 denote the first height of the first candidate Suggestion box and the second height of the second candidate Suggestion box.
When computing the vertical overlap of the first candidate Suggestion box and the second candidate Suggestion box, one possible implementation is to divide the length of the overlapping part of their vertical coordinates by the maximum of h1 and h2, i.e. according to the formula overlap = |yA2 - yD1| / max(h1, h2).
Another possible implementation is to divide the overlapping part of the vertical coordinates by the average of h1 and h2, i.e. according to the formula overlap = |yA2 - yD1| / mean(h1, h2).
A third possible implementation is to divide the overlapping part of the vertical coordinates by the union of h1 and h2, i.e. according to the formula overlap = |yA2 - yD1| / union(h1, h2).
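The three normalizations above can be sketched in one helper; the vertical extents and the interpretation of "union" as the combined vertical span are assumptions for the example:

```python
def vertical_overlap(e1, e2, mode="max"):
    """Overlap of two boxes' vertical extents, normalised three ways.

    e1, e2: (y_min, y_max) vertical extents of the two candidate boxes.
    mode picks the denominator: the larger height ("max"), the mean
    height ("mean"), or the height of the union of the two extents.
    """
    h1 = e1[1] - e1[0]
    h2 = e2[1] - e2[0]
    inter = max(0, min(e1[1], e2[1]) - max(e1[0], e2[0]))  # shared length
    if mode == "max":
        denom = max(h1, h2)
    elif mode == "mean":
        denom = (h1 + h2) / 2
    else:  # "union": total vertical span covered by either box
        denom = max(e1[1], e2[1]) - min(e1[0], e2[0])
    return inter / denom

e1, e2 = (20, 40), (25, 40)   # heights 20 and 15, intersection 15
print(vertical_overlap(e1, e2, "max"))    # 0.75
print(vertical_overlap(e1, e2, "mean"))   # ~0.857
print(vertical_overlap(e1, e2, "union"))  # 0.75
```

The "union" variant is the most conservative when the extents barely touch, since the denominator grows as the boxes drift apart.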
Embodiment 4:
In order to determine the vertical overlap more accurately, on the basis of the above embodiments, in the embodiment of the present invention, determining the vertical overlap comprises:
according to the first height and the first vertical coordinate of the first candidate Suggestion box and the second height and the second vertical coordinate of the second candidate Suggestion box, determining the vertical overlap using the formula overlap = |yA2 - yD1| / min(h1, h2), where yA2 denotes the second vertical coordinate of the second candidate Suggestion box, yD1 denotes the first vertical coordinate of the first candidate Suggestion box, and h1 and h2 denote the heights of the first and second candidate Suggestion boxes respectively.
In order to accurately determine whether the first candidate Suggestion box and the second candidate Suggestion box are a pair of linked boxes, in the embodiment of the present invention, after all candidate Suggestion boxes have been determined according to the above process, for any two candidate Suggestion boxes, the two boxes are taken as the first and second candidate Suggestion boxes respectively, and the first height and first vertical coordinate of the first candidate Suggestion box and the second height and second vertical coordinate of the second candidate Suggestion box are identified. First, the absolute value of the difference between the first and second vertical coordinates is computed; second, the minimum of the first height and the second height is computed; finally, the ratio of the absolute difference to the minimum height is computed. This ratio is the overlap of the first and second candidate Suggestion boxes in the vertical direction; the larger its value, the more likely the two boxes are a pair of linked boxes.
Specifically, according to the first height and first vertical coordinate of the first candidate Suggestion box and the second height and second vertical coordinate of the second candidate Suggestion box, the overlap of the first and second candidate Suggestion boxes is determined by the formula overlap = |yA2 - yD1| / min(h1, h2), where yA2 denotes the second vertical coordinate of the second candidate Suggestion box, yD1 denotes the first vertical coordinate of the first candidate Suggestion box, and h1 and h2 denote the heights of the first and second candidate Suggestion boxes respectively.
Embodiment 5:
In order to determine the shape similarity more accurately, on the basis of the above embodiments, in the embodiment of the present invention, determining the shape similarity comprises:
according to the first height of the first candidate Suggestion box and the second height of the second candidate Suggestion box, determining the shape similarity using the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 denote the heights of the first and second candidate Suggestion boxes respectively.
In order to determine more accurately whether the first candidate Suggestion box and the second candidate Suggestion box are a pair of linked boxes, in the embodiment of the present invention, after all candidate Suggestion boxes have been determined according to the above process, for any two candidate Suggestion boxes, the two boxes are taken as the first and second candidate Suggestion boxes respectively, and the first height of the first candidate Suggestion box and the second height of the second candidate Suggestion box are identified. First, the minimum of the two heights is determined; second, the maximum of the two heights is determined; finally, the ratio of the minimum to the maximum is determined. This ratio is the shape similarity of the first and second candidate Suggestion boxes; the larger its value, the more likely the two boxes are a pair of linked boxes.
Specifically, according to the first height of the first candidate Suggestion box and the second height of the second candidate Suggestion box, the shape similarity of the two boxes is determined by the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 denote the heights of the first and second candidate Suggestion boxes respectively.
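The two formulas of Embodiments 4 and 5 translate directly into code; the numeric inputs in the example are illustrative assumptions:

```python
def overlap_min(yA2, yD1, h1, h2):
    """Vertical overlap of Embodiment 4: |yA2 - yD1| / min(h1, h2)."""
    return abs(yA2 - yD1) / min(h1, h2)

def shape_similarity(h1, h2):
    """Shape similarity of Embodiment 5: min(h1, h2) / max(h1, h2)."""
    return min(h1, h2) / max(h1, h2)

# two boxes of heights 20 and 16 whose vertical extents share 14 pixels
print(overlap_min(34, 20, 20, 16))   # 0.875
print(shape_similarity(20, 16))      # 0.8
```

Both values are then compared against thresh2 and thresh3 respectively when deciding whether the pair of candidate boxes is linked.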
Embodiment 6:
In order to locate text in a newly input image containing text to be recognized, a pre-training process is performed before localization. On the basis of the above embodiments, in the embodiment of the present invention, the process of pre-training the first model comprises:
obtaining sample images, in which the location information of each Suggestion box and a second score, indicating that the content contained in each position Suggestion box is text, are annotated;
inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to the output of the first model.
Since the purpose of the first model is to locate the text lines in an image to be recognized, the image is input into the first model in order to obtain the location information of each position Suggestion box in the image and the second score indicating that the content contained in each position Suggestion box is text; this score is used to determine whether the position Suggestion box is a candidate Suggestion box. Therefore, before pre-training the first model, the image data must first be annotated to obtain sample images. Specifically, the location information of each position Suggestion box in each image and the second score indicating that the content contained in each position Suggestion box is text are annotated.
In a specific implementation, a certain number of sample images are input as a batch each time, and the model parameters are updated through the steps of forward propagation, error computation, backward propagation and weight update; batches of samples are input continually and the above steps are repeated, the parameters are adjusted continuously, and the error between the network output and the reference value is corrected, finally yielding the optimized network parameters, i.e. the trained network model.
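The batch training cycle described above (forward propagation, error computation, backward propagation, weight update) can be illustrated with a minimal NumPy sketch; the tiny linear model, synthetic data and learning rate are stand-in assumptions, not the actual first model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))                 # synthetic inputs
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true                                # reference values

w = np.zeros(3)                               # initial parameter values
lr = 0.1
for step in range(200):                       # repeated batches
    idx = rng.integers(0, len(X), size=32)    # sample one batch
    xb, yb = X[idx], y[idx]
    pred = xb @ w                             # forward propagation
    err = pred - yb                           # error computation
    grad = xb.T @ err / len(xb)               # backward propagation
    w -= lr * grad                            # weight update

print(np.round(w, 2))                         # approaches w_true
```

Each cycle nudges the parameters against the batch gradient, so the error between the output and the reference values shrinks as batches are fed in repeatedly.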
In particular, before the network model starts training, the usual training method sets the initial parameter values of the model by random initialization. Although randomly initialized model parameters can in theory converge to the optimum, the drawbacks are also obvious: the training time required for the model to converge is long, the model easily falls into local optima, and a high-precision network model is not easy to obtain. Therefore, in the embodiment of the present invention, a transfer learning method is adopted: the parameters of a model already trained in the prior art are transferred to the new model in place of random initialization of the original model parameters, which accelerates and optimizes the learning efficiency and convergence speed of the new model. Specifically, the parameters of a text recognition model trained on some general data are used as the initial parameters of the model of the embodiment of the present invention for training.
Further, an incremental learning training method is adopted. Since the quantities of simulated annotated samples and real annotated data differ greatly, in the embodiment of the present invention, tens of millions of simulated annotated samples are trained on first, and the real annotated data are then learned incrementally. When real samples increase dynamically, repeated learning of the massive simulated annotated samples is avoided while the historical training results are fully utilized; the final model is continuously adjusted and optimized, reducing the time and storage-space requirements of model training.
Embodiment 7:
In order to recognize the located image, a pre-training process is also performed before recognition. On the basis of the above embodiments, in the embodiment of the present invention, the process of pre-training the second model comprises:
obtaining each text line annotated in the sample images;
inputting each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to the output of the second model.
Since the purpose of the second model is to recognize the text lines in an image to be recognized, the image is input into the second model in order to obtain the text lines in the image; once a text line has been determined, the text information in the text line can be obtained through a Chinese dictionary. Therefore, before pre-training the second model, the image data must first be annotated to obtain sample images; specifically, each text line in each image is annotated. The second model is then trained in the same way as the first model, finally obtaining a trained second model for text recognition of newly input images.
For example, taking text recognition of an express waybill image as an example, the entire flow of waybill text recognition is shown in Fig. 6.
First, for the input express waybill image, the waybill region is extracted using threshold segmentation and connected-component analysis, and preliminary text-orientation correction is performed on the extracted waybill region so that the text lines are all horizontal.
The waybill region image after the above operations is input into the text-line locating module. The specific procedure is as follows: the waybill region image is used as the input of a convolutional neural network to obtain a feature map; a sliding-window operation is performed on the feature map, and at each sliding-window centre k position Suggestion boxes are predicted according to certain shapes and sizes; the feature map obtained above is used as the input of a recurrent neural network, which outputs the location information of the position Suggestion boxes and the score indicating that the content contained in each position Suggestion box is text; candidate Suggestion boxes are obtained by applying a threshold to the scores of the position Suggestion boxes and are merged according to the candidate-box merging algorithm described above to obtain target Suggestion boxes, which are the text-line location results finally produced by the locating module.
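The step of predicting k position Suggestion boxes of preset shapes at each sliding-window centre can be sketched as follows; the fixed width and the particular height values are illustrative assumptions, not the patent's preset sizes:

```python
def anchors_at(cx, cy, width=16, heights=(11, 16, 23, 33, 48)):
    """Predict k position boxes at one sliding-window centre.

    Each box shares a fixed width and takes one of k preset heights;
    boxes are returned as (x_min, y_min, x_max, y_max) tuples.
    """
    boxes = []
    for h in heights:
        boxes.append((cx - width / 2, cy - h / 2,
                      cx + width / 2, cy + h / 2))
    return boxes

boxes = anchors_at(100, 50)
print(len(boxes))   # k = 5 boxes at this centre
print(boxes[0])     # (92.0, 44.5, 108.0, 55.5)
```

Repeating this at every feature-map position yields the dense set of position Suggestion boxes whose text scores are then thresholded to obtain candidate Suggestion boxes.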
The text-line location results obtained by the above operations are input into the text-line recognition module. The specific procedure is as follows: the text-line location results are used as the input of a convolutional neural network to extract a feature map; the feature map is used as the input of a recurrent neural network to obtain the layer output and compute the classification score corresponding to each position along the width dimension; the CTC method converts the output of the recurrent neural network into a label sequence, which is compared with a Chinese dictionary to obtain the final text information. The obtained text information is sorted by name, telephone number, address and so on, yielding structured, quickly readable electronic waybill information.
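The final structuring step (sorting recognized lines into name, phone and address fields) could look roughly like the naive rule-based sketch below; the rules, field names and sample lines are all assumptions for illustration, not the patent's classification logic:

```python
import re

def structure_waybill(lines):
    """Sort recognised text lines into name / phone / address fields.

    Naive rules: an 11-digit line is treated as a mobile number, the
    first short digit-free line as the name, everything else as address.
    """
    info = {"name": None, "phone": None, "address": []}
    for line in lines:
        digits = re.sub(r"\D", "", line)      # strip non-digits
        if len(digits) == 11:                 # mainland mobile number length
            info["phone"] = digits
        elif info["name"] is None and len(line.split()) <= 2 and not digits:
            info["name"] = line
        else:
            info["address"].append(line)
    return info

result = structure_waybill(["Zhang San", "138 0013 8000", "Hangzhou, Zhejiang"])
print(result["phone"])  # 13800138000
```

A production system would use far stronger cues (keyword labels on the waybill, field positions), but the sketch shows how the recognized label sequences become structured electronic waybill records.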
Embodiment 8:
Fig. 7 shows a character recognition device provided by an embodiment of the present invention. The device comprises:
an obtaining module 701, configured to input the image containing the text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the location information of each Suggestion box contained in the image and a first score indicating that the content contained in each Suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts each position Suggestion box in each window feature according to preset widths and heights; the window-feature sequence corresponding to each row of the feature map is used as the input of the recurrent neural network, and the location information of each Suggestion box contained in the image and the first score indicating that the content contained in each Suggestion box is text are obtained based on the recurrent neural network;
a screening module 702, configured to screen the candidate Suggestion boxes whose first score is greater than a preset score threshold;
a merging module 703, configured to merge the candidate Suggestion boxes into target Suggestion boxes according to the position of each candidate Suggestion box;
an identification module 704, configured to input each target Suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and identify the text contained in each target Suggestion box.
The device further comprises: a correction module 705, configured to process the image using a threshold segmentation method and a connected-component analysis method, and perform text-orientation correction on the processed image.
The merging module 703 is specifically configured to, for a first candidate Suggestion box among the candidate Suggestion boxes, identify whether there exists a second candidate Suggestion box whose horizontal distance from the first candidate Suggestion box is less than a preset first threshold, whose vertical overlap is greater than a preset second threshold, and whose shape similarity is greater than a preset third threshold; if so, merge the first and second candidate Suggestion boxes into a new first candidate Suggestion box; if not, take the first candidate Suggestion box as a target Suggestion box.
The merging module 703 is specifically configured to determine the vertical overlap according to the first height and first vertical coordinate of the first candidate Suggestion box and the second height and second vertical coordinate of the second candidate Suggestion box, using the formula overlap = |yA2 - yD1| / min(h1, h2), where yA2 denotes the second vertical coordinate of the second candidate Suggestion box, yD1 denotes the first vertical coordinate of the first candidate Suggestion box, and h1 and h2 denote the first height of the first candidate Suggestion box and the second height of the second candidate Suggestion box respectively.
The merging module 703 is specifically configured to determine the shape similarity according to the first height of the first candidate Suggestion box and the second height of the second candidate Suggestion box, using the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 denote the heights of the first and second candidate Suggestion boxes respectively.
The device further comprises:
a first training module 706, configured to obtain sample images in which the location information of each Suggestion box and the second score indicating that the content contained in each position Suggestion box is text are annotated; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and train the first model according to the output of the first model.
The device further comprises:
a second training module 707, configured to obtain each text line annotated in the sample images; and to input each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and train the second model according to the output of the second model.
In summary, the embodiments of the present invention provide a character recognition method and device, comprising: inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the location information of each Suggestion box contained in the image and a first score indicating that the content contained in each Suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts each position Suggestion box in each window feature according to preset widths and heights, and the window-feature sequence corresponding to each row of the feature map is used as the input of the recurrent neural network, based on which the location information of each Suggestion box and the first score are obtained; screening the candidate Suggestion boxes whose first score is greater than a preset score threshold; merging the candidate Suggestion boxes into target Suggestion boxes according to the position of each candidate Suggestion box; and inputting each target Suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and identifying the text contained in each target Suggestion box.
In the embodiments of the present invention, the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, and the location information of each Suggestion box contained in the image and the first score indicating that the content contained in each Suggestion box is text are obtained. The first model can effectively obtain the contextual information of the text sequence and add it to the localization process; specifically, the score assigned to a blank-space Suggestion box between text in the same row can be boosted by the sequence features of the surrounding text, so that the resulting text-line position boxes better match the positional features of the text sequence and the text-line location results are more accurate. Then each target Suggestion box is input into the pre-trained second model comprising a convolutional neural network and a recurrent neural network, and the text contained in each target Suggestion box is identified. Because this second model includes a recurrent neural network, it enhances the extraction of contextual information from the character sequence, making the prediction of the text sequence more accurate.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.

Claims (14)

Inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the location information of each suggestion box contained in the image and a first score indicating that the content contained in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts a suggestion box at each position in each window feature according to a preset width and height; the window feature sequence corresponding to each row of the feature map is used as the input of the recurrent neural network, and the location information of each suggestion box contained in the image and the first score indicating that the content contained in each suggestion box is text are obtained based on the recurrent neural network;
An obtaining module, configured to input an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and to obtain the location information of each suggestion box contained in the image and a first score indicating that the content contained in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts a suggestion box at each position in each window feature according to a preset width and height; the window feature sequence corresponding to each row of the feature map is used as the input of the recurrent neural network, and the location information of each suggestion box contained in the image and the first score indicating that the content contained in each suggestion box is text are obtained based on the recurrent neural network;
11. The device as claimed in claim 10, wherein the merging module is specifically configured to determine the degree of overlapping in the vertical direction according to the first height and the first vertical coordinate of the first candidate suggestion box and the second height and the second vertical coordinate of the second candidate suggestion box, using the following formula: overlap = |yA2 - yD1| / min(h1, h2), wherein yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
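The vertical-overlap test in claim 11 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `Box` type, the top-left coordinate origin, the reading of yA2 as the lower edge of the second box and yD1 as the upper edge of the first box, and the merge threshold of 0.7 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Box:
    x: float  # left edge
    y: float  # upper vertical coordinate (top-left origin assumed)
    w: float  # width
    h: float  # height

def vertical_overlap(first: Box, second: Box) -> float:
    """Degree of vertical overlap per the claim's formula:
    overlap = |yA2 - yD1| / min(h1, h2).
    Interpreting yA2 as the lower edge of the second candidate box and
    yD1 as the upper edge of the first is an assumption."""
    y_a2 = second.y + second.h
    y_d1 = first.y
    return abs(y_a2 - y_d1) / min(first.h, second.h)

def merge_if_same_line(first: Box, second: Box,
                       threshold: float = 0.7) -> Optional[Box]:
    """Merge two candidate suggestion boxes into one target box when their
    vertical overlap exceeds the (assumed) threshold; otherwise None."""
    if vertical_overlap(first, second) <= threshold:
        return None
    left = min(first.x, second.x)
    top = min(first.y, second.y)
    right = max(first.x + first.w, second.x + second.w)
    bottom = max(first.y + first.h, second.y + second.h)
    return Box(left, top, right - left, bottom - top)
```

For two boxes sitting on the same text line, e.g. `Box(0, 0, 16, 10)` and `Box(16, 1, 16, 9)`, the overlap is 10/9 and the boxes merge into a single `Box(0, 0, 32, 10)` spanning both.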
CN201811126275.4A · Priority date 2018-09-26 · Filing date 2018-09-26 · A kind of character recognition method and device · Pending · CN110245545A (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN201811126275.4A (CN110245545A) · 2018-09-26 · 2018-09-26 · A kind of character recognition method and device

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
CN201811126275.4A (CN110245545A) · 2018-09-26 · 2018-09-26 · A kind of character recognition method and device

Publications (1)

Publication Number · Publication Date
CN110245545A (en) · 2019-09-17

Family

ID=67882838

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN201811126275.4A (CN110245545A, Pending) · A kind of character recognition method and device · 2018-09-26 · 2018-09-26

Country Status (1)

Country · Link
CN (1) · CN110245545A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN110991435A (en)* · 2019-11-27 · 2020-04-10 · 南京邮电大学 · A method and device for locating key information of express waybill based on deep learning
CN111310762A (en)* · 2020-03-16 · 2020-06-19 · 天津得迈科技有限公司 · Intelligent medical bill identification method based on Internet of things
CN111611985A (en)* · 2020-04-23 · 2020-09-01 · 中南大学 · An OCR recognition method based on model fusion
CN111639566A (en)* · 2020-05-19 · 2020-09-08 · 浙江大华技术股份有限公司 · Method and device for extracting form information
CN111666937A (en)* · 2020-04-17 · 2020-09-15 · 广州多益网络股份有限公司 · Method and system for recognizing text in image
CN112016547A (en)* · 2020-08-20 · 2020-12-01 · 上海天壤智能科技有限公司 · Image character recognition method, system and medium based on deep learning
CN112749695A (en)* · 2019-10-31 · 2021-05-04 · 北京京东尚科信息技术有限公司 · Text recognition method and device
CN113139539A (en)* · 2021-03-16 · 2021-07-20 · 中国科学院信息工程研究所 · Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
CN113392844A (en)* · 2021-06-15 · 2021-09-14 · 重庆邮电大学 · Deep learning-based method for identifying text information on medical film
CN113762237A (en)* · 2021-04-26 · 2021-12-07 · 腾讯科技(深圳)有限公司 · Text image processing method, device and equipment and storage medium
CN113887375A (en)* · 2021-09-27 · 2022-01-04 · 浙江大华技术股份有限公司 · Text recognition method, device, equipment and storage medium
CN114627456A (en)* · 2020-12-10 · 2022-06-14 · 航天信息股份有限公司 · Bill text information detection method, device and system
CN114743200A (en)* · 2022-04-18 · 2022-07-12 · 福建捷宇电脑科技有限公司 · A recognition-based handwriting segmentation method for electronic signatures

Citations (4)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN106570497A (en)* · 2016-10-08 · 2017-04-19 · 中国科学院深圳先进技术研究院 · Text detection method and device for scene image
CN108073898A (en)* · 2017-12-08 · 2018-05-25 · 腾讯科技(深圳)有限公司 · Number of people area recognizing method, device and equipment
CN108288078A (en)* · 2017-12-07 · 2018-07-17 · 腾讯科技(深圳)有限公司 · Character identifying method, device and medium in a kind of image
CN108564084A (en)* · 2018-05-08 · 2018-09-21 · 北京市商汤科技开发有限公司 · Character detecting method, device, terminal and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN106570497A (en)* · 2016-10-08 · 2017-04-19 · 中国科学院深圳先进技术研究院 · Text detection method and device for scene image
CN108288078A (en)* · 2017-12-07 · 2018-07-17 · 腾讯科技(深圳)有限公司 · Character identifying method, device and medium in a kind of image
CN108073898A (en)* · 2017-12-08 · 2018-05-25 · 腾讯科技(深圳)有限公司 · Number of people area recognizing method, device and equipment
CN108564084A (en)* · 2018-05-08 · 2018-09-21 · 北京市商汤科技开发有限公司 · Character detecting method, device, terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHI TIAN et al.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", 14th European Conference on Computer Vision (ECCV)*
CAI Wenzhe et al.: "Image text detection method based on dual-threshold gradient pattern" (基于双门限梯度模式的图像文字检测方法), Computer Science (计算机科学)*

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN112749695A (en)* · 2019-10-31 · 2021-05-04 · 北京京东尚科信息技术有限公司 · Text recognition method and device
CN110991435A (en)* · 2019-11-27 · 2020-04-10 · 南京邮电大学 · A method and device for locating key information of express waybill based on deep learning
CN111310762A (en)* · 2020-03-16 · 2020-06-19 · 天津得迈科技有限公司 · Intelligent medical bill identification method based on Internet of things
CN111666937A (en)* · 2020-04-17 · 2020-09-15 · 广州多益网络股份有限公司 · Method and system for recognizing text in image
CN111611985A (en)* · 2020-04-23 · 2020-09-01 · 中南大学 · An OCR recognition method based on model fusion
CN111639566A (en)* · 2020-05-19 · 2020-09-08 · 浙江大华技术股份有限公司 · Method and device for extracting form information
CN111639566B (en)* · 2020-05-19 · 2024-08-09 · 浙江大华技术股份有限公司 · Method and device for extracting form information
CN112016547A (en)* · 2020-08-20 · 2020-12-01 · 上海天壤智能科技有限公司 · Image character recognition method, system and medium based on deep learning
CN114627456A (en)* · 2020-12-10 · 2022-06-14 · 航天信息股份有限公司 · Bill text information detection method, device and system
CN113139539A (en)* · 2021-03-16 · 2021-07-20 · 中国科学院信息工程研究所 · Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
CN113762237A (en)* · 2021-04-26 · 2021-12-07 · 腾讯科技(深圳)有限公司 · Text image processing method, device and equipment and storage medium
CN113762237B (en)* · 2021-04-26 · 2023-08-18 · 腾讯科技(深圳)有限公司 · Text image processing method, device, equipment and storage medium
CN113392844A (en)* · 2021-06-15 · 2021-09-14 · 重庆邮电大学 · Deep learning-based method for identifying text information on medical film
CN113887375A (en)* · 2021-09-27 · 2022-01-04 · 浙江大华技术股份有限公司 · Text recognition method, device, equipment and storage medium
CN114743200A (en)* · 2022-04-18 · 2022-07-12 · 福建捷宇电脑科技有限公司 · A recognition-based handwriting segmentation method for electronic signatures

Similar Documents

Publication · Title
CN110245545A (en) · A kind of character recognition method and device
Yuliang et al. · Detecting curve text in the wild: New dataset and new solution
CN106683091B (en) · A kind of target classification and attitude detecting method based on depth convolutional neural networks
CN111445488B (en) · A Weakly Supervised Learning Approach to Automatically Identify and Segment Salt Bodies
CN108918536B (en) · Tire mold surface character defect detection method, device, equipment and storage medium
CN110991435A (en) · A method and device for locating key information of express waybill based on deep learning
CN109377445A (en) · Model training method, method, apparatus and electronic system for replacing image background
CN108898047A (en) · The pedestrian detection method and system of perception are blocked based on piecemeal
CN114549507B (en) · Improved Scaled-YOLOv4 fabric defect detection method
CN110414344B (en) · A video-based person classification method, intelligent terminal and storage medium
CN106548151B (en) · Target analyte detection track identification method and system towards intelligent robot
CN107871124A (en) · A remote sensing image target detection method based on deep neural network
CN113780087B (en) · A postal package text detection method and device based on deep learning
CN104881673B (en) · The method and system of pattern-recognition based on information integration
KR20220125719A (en) · Method and equipment for training target detection model, method and equipment for detection of target object, electronic equipment, storage medium and computer program
CN110598698B (en) · Natural scene text detection method and system based on adaptive regional suggestion network
CN113221956B (en) · Target identification method and device based on improved multi-scale depth model
CN110796018A (en) · A Hand Motion Recognition Method Based on Depth Image and Color Image
CN107609575A (en) · Calligraphy evaluation method, calligraphy evaluating apparatus and electronic equipment
CN113673482B (en) · Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN113420727B (en) · Training method and device for table detection model, and table detection method and device
CN112734803A (en) · Single target tracking method, device, equipment and storage medium based on character description
CN111696079A (en) · Surface defect detection method based on multi-task learning
CN110163208A (en) · A kind of scene character detecting method and system based on deep learning
CN110348423A (en) · A kind of real-time face detection method based on deep learning

Legal Events

Code · Title · Description
PB01 · Publication
SE01 · Entry into force of request for substantive examination
RJ01 · Rejection of invention patent application after publication · Application publication date: 2019-09-17

