Summary of the Invention
To this end, the present invention provides a method for recognizing key information in an invoice, and a computing device, in an effort to solve, or at least alleviate, at least one of the problems described above.
According to an aspect of the invention, there is provided a method for recognizing key information in an invoice, suitable for being executed in a computing device, including the steps of: obtaining an invoice image to be recognized; locating at least one piece of key information in the invoice image by an information locating model, and marking the position and type of each piece of key information; cutting out, from the invoice image, an image containing each piece of key information according to the marked position of that key information; performing character segmentation processing on each image containing key information to obtain a plurality of character images corresponding to each piece of key information; recognizing, by a character recognition model, the characters in the character images corresponding to each piece of key information; and combining the type of each piece of key information with the recognized characters to obtain the corresponding invoice information.
Alternatively, in the recognition method according to the present invention, the types of key information include at least one or more of the following types: taxpayer identifier, invoice identifier, invoicing date, check code, invoice amount, and purchased product specification and type.
Alternatively, in the recognition method according to the present invention, the step of performing character segmentation processing on each image containing key information to obtain a plurality of character images corresponding to each piece of key information includes: for each piece of key information, performing character segmentation processing on the image containing that key information to obtain sub-images, each containing one character, as the images of that key information; and scaling each sub-image to obtain character images that meet a predetermined size.
Alternatively, the recognition method according to the present invention further includes the step of training the information locating model in advance: for all training images in a first training image set, marking the position and type of each piece of key information in each training image; and inputting the marked training images into an initial information locating model, and training the information locating model using asynchronous stochastic gradient descent until convergence.
Alternatively, in the recognition method according to the present invention, the information locating model uses a convolutional neural network structure, including at least one first convolutional layer, a second convolutional layer, a pooling layer, a fully connected layer and a classification layer, wherein the second convolutional layer is adapted to perform convolution processing with convolution kernels of multiple different scales and to concatenate the convolution results of the multiple kernels as the output result of the second convolutional layer.
Alternatively, in the recognition method according to the present invention, the second convolutional layer is further adapted to perform max-pooling processing with a 3*3 pooling window.
Alternatively, in the recognition method according to the present invention, the step of training the information locating model in advance further includes the step of generating the first training image set: preprocessing each initial training image to generate one or more images related to that initial training image, so as to obtain the first training image set.
Alternatively, in the recognition method according to the present invention, the preprocessing includes one or more of the following operations: adding noise, rotation, horizontal flipping, scaling, horizontal translation, and vertical translation.
Alternatively, the recognition method according to the present invention further includes the step of training the character recognition model in advance: inputting the training images in a second training image set into an initial character recognition model to output recognition results; and calculating the error between the output recognition results and the true results, and back-propagating by the method of minimizing the error to adjust the parameters of the character recognition model, the training ending when the error meets a predetermined condition.
Alternatively, in the recognition method according to the present invention, the character recognition model uses a convolutional neural network structure, including three convolutional layers, two pooling layers, one fully connected layer and one output layer.
Alternatively, in the recognition method according to the present invention, the step of training the character recognition model in advance further includes the step of generating the second training image set: collecting images containing various characters as sample images, wherein the characters include letters and digits; performing character segmentation processing on the sample images to obtain a plurality of images each containing a single character; performing mirroring processing on the images containing a single character to generate a plurality of related images so as to form a sample set; and scaling the images in the sample set to obtain images that meet a predetermined size, to be used as the second training image set.
Alternatively, in the recognition method according to the present invention, the predetermined size is 32*32.
According to another aspect of the present invention, there is provided a computing device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the recognition methods described above.
In accordance with a further aspect of the present invention, there is provided a computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any of the recognition methods described above.
According to the method for recognizing key information in an invoice of the present invention, by using an information locating model, the key information can still be located even when the quality of the invoice image to be recognized is poor, and no specific hardware is required to generate the invoice image to be recognized, which not only saves cost but is also convenient for the user. On the other hand, using a character recognition model further improves the recognition accuracy for images of low image quality.
Embodiment
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-one cache 110 and a level-two cache 112, a processor core 114 and registers 116. An exemplary processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An exemplary memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In some embodiments, the computing device 100 is configured to perform a method 200 for recognizing key information in an invoice; the method 200 can recognize the key information in an invoice image, and instructions for performing the method 200 are contained in the program data 124.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144 and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. The example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 152. The example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or touch input device) or other peripherals (such as printers, scanners, etc.). The example communication devices 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164. In the present embodiment, the invoice image to be recognized may be obtained through an interface device.
The network communication link may be one example of a communication medium. A communication medium may typically be embodied as computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal one or more of whose characteristics are set or changed in such a manner as to encode information in the signal. As non-limiting examples, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer-readable medium, the one or more programs including instructions for performing certain methods.
The computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal head-mounted device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 may also be implemented as a personal computer including both desktop computer and notebook computer configurations, or as a server having the above configuration.
Fig. 2 shows a flowchart of the method 200 for recognizing key information in an invoice according to an embodiment of the present invention.
The method 200 starts at step S210, in which an invoice image to be recognized is obtained.
According to one implementation of the present invention, the computing device 100 is configured as a server and receives the invoice image to be recognized uploaded by a client. The invoice image may be a photo of the invoice to be recognized taken by the user in any environment with any device (e.g., a mobile terminal such as a mobile phone or tablet, a camera, a scanner, etc.).
According to another implementation of the present invention, the computing device 100 is configured as a personal computer, and the image of the invoice to be recognized is obtained through an image input device (such as a camera) on the personal computer or through other peripheral apparatus (such as a printer or scanner) as the invoice image to be recognized.
The present invention is not limited in this respect, nor is it limited with respect to the image quality of the invoice image.
Then, in step S220, at least one piece of key information in the invoice image is located by the information locating model, and the position and type of each piece of key information are marked.
According to an embodiment of the present invention, the types of key information in the invoice image include at least one or more of the following types: taxpayer identifier, invoice identifier, invoicing date, check code, invoice amount, purchased product specification and type, etc.
Fig. 3 shows a schematic diagram of locating a piece of key information in an invoice image 300 by the information locating model. The invoice image 300 is input into the information locating model, a piece of key information 310 is located, and the position coordinates of the key information 310 are generated. According to one embodiment of the present invention, the located key information is enclosed with a rectangular bounding box, and the four vertex coordinates of the rectangular bounding box may be used to mark the position of the key information 310. At the same time, the type of the key information 310 is output.
Fig. 4A shows a network diagram of the information locating model 400 according to an embodiment of the present invention. According to one embodiment of the present invention, the information locating model 400 uses a convolutional neural network structure, including at least one first convolutional layer 410, a second convolutional layer 420, a pooling layer 430, a fully connected layer 440 and a classification layer 450. It should be noted that the numbers of first convolutional layers 410, second convolutional layers 420, pooling layers 430, fully connected layers 440 and classification layers 450 in Fig. 4A are only examples, and embodiments of the present invention are not limited thereto.
According to one implementation, the first convolutional layer 410 may use 3*3 or 7*7 convolution kernels, and the pooling layer 430 may use a 2*2 or 3*3 kernel for max pooling as needed; the result finally passes through the fully connected layer 440 into the classification layer 450, which outputs the type of each piece of key information that has been located; optionally, the classification layer 450 is a softmax layer. Optionally, the second convolutional layer 420 uses an Inception module structure, performing convolution processing with convolution kernels of multiple different scales within the same convolutional layer and concatenating the convolution results of the multiple kernels as the output result of the second convolutional layer 420. At the same time, max-pooling processing with a 3*3 pooling window is also performed in the second convolutional layer 420.
Fig. 4B shows a network structure of the second convolutional layer 420 in the information locating model 400. As shown in Fig. 4B, convolution is performed with 1*1, 3*3 and 5*5 convolution kernels; meanwhile, to avoid the huge amount of computation brought by convolution (particularly by large kernels such as 5*5), dimensionality reduction is first performed with 1*1 convolution kernels. For example, if the output of the previous layer is 100*100*128 and it passes through a 5*5 convolutional layer with 256 outputs (stride=1, pad=2), the output data is 100*100*256 and the convolutional layer has 128*5*5*256 parameters. If the output of the previous layer first passes through a 1*1 convolutional layer with 32 outputs and then through a 5*5 convolutional layer with 256 outputs, the final output data is still 100*100*256, but the number of convolution parameters is reduced to 128*1*1*32+32*5*5*256, a reduction of roughly a factor of 4. It can thus be seen that, by using the second convolutional layer 420, the information locating model 400 improves network performance without greatly increasing the amount of computation.
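The following is a minimal sketch, in Keras, of an Inception-style block of the kind described for the second convolutional layer 420: parallel 1*1, 3*3 and 5*5 convolutions, with the larger kernels preceded by 1*1 dimensionality reduction, plus a 3*3 max-pooling branch, and the branch outputs concatenated along the channel axis. The branch widths are illustrative assumptions, not values taken from this specification.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1=64, f3_reduce=32, f3=128, f5_reduce=32, f5=64, f_pool=32):
    # 1*1 branch
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)

    # 1*1 reduction followed by 3*3 convolution
    b3 = layers.Conv2D(f3_reduce, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b3)

    # 1*1 reduction followed by 5*5 convolution
    b5 = layers.Conv2D(f5_reduce, 1, padding='same', activation='relu')(x)
    b5 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b5)

    # 3*3 max pooling followed by 1*1 projection
    bp = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    bp = layers.Conv2D(f_pool, 1, padding='same', activation='relu')(bp)

    # Concatenate the results of the multiple kernels as the block output.
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])
```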
Optionally, the information locating model 400 may follow the GoogLeNet network structure; for specific information on GoogLeNet, see "Going deeper with convolutions", which is not described in detail here.
According to an embodiment of the present invention, the method 200 should also include steps a)~b) of training the information locating model 400 in advance:
a) For all training images in the first training image set, mark the position and type of each piece of key information in each training image.
b) Input the marked training images into the initial information locating model, and train the information locating model using asynchronous stochastic gradient descent until convergence. According to one embodiment of the present invention, the information locating model is trained using the training facilities in Tensorflow, with the main parameters set to: batch=64, momentum=0.9, decay=0.0005, learning_rate=0.001.
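As a rough illustration, and assuming a Keras-style training loop rather than the distributed asynchronous SGD setup itself, the main hyper-parameters above could be wired up as follows; `locating_model`, `train_images` and `train_labels` are assumed to exist (the marked first training image set).

```python
import tensorflow as tf

# momentum SGD with learning_rate=0.001 and momentum=0.9 as stated above;
# the decay=0.0005 can be applied as a learning-rate schedule or weight decay
# depending on the framework version, and is omitted from this sketch.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)

locating_model.compile(optimizer=optimizer,
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

# batch=64 as stated above; the number of epochs is an illustrative assumption.
locating_model.fit(train_images, train_labels, batch_size=64, epochs=50)
```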
According to another embodiment of the present invention, the step of training the information locating model in advance further includes the step of generating the first training image set: various invoice images are collected as initial training images; meanwhile, each initial training image is preprocessed to generate one or more images related to that initial training image, and the initial training images together with the related images are used as the first training image set. Optionally, the preprocessing includes one or more of the following operations: adding noise, rotation, horizontal flipping, scaling, horizontal translation, and vertical translation. In this way, the training images in the first training image set cover invoice images of different image quality and different shooting angles. That is, no matter what uncertainties appear in the invoice image, such as printing misalignment or offset, an unclear image, a skewed invoice, or an invoice not placed squarely, the trained information locating model 400 can still accurately locate the key information.
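A minimal sketch of such preprocessing, assuming OpenCV and NumPy, is given below; it expands one initial invoice image into several related training images using the operations listed above. The parameter ranges (noise level, rotation angle, scale factor, translation offsets) are illustrative assumptions, not values from this specification.

```python
import cv2
import numpy as np

def augment(img):
    """Generate several related images from one initial training image."""
    h, w = img.shape[:2]
    out = []
    # add Gaussian noise
    noisy = np.clip(img + np.random.normal(0, 10, img.shape), 0, 255).astype(np.uint8)
    out.append(noisy)
    # small rotation about the image centre
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 5, 1.0)
    out.append(cv2.warpAffine(img, m, (w, h)))
    # horizontal flip
    out.append(cv2.flip(img, 1))
    # scaling
    out.append(cv2.resize(img, None, fx=0.9, fy=0.9))
    # horizontal translation
    t = np.float32([[1, 0, 20], [0, 1, 0]])
    out.append(cv2.warpAffine(img, t, (w, h)))
    # vertical translation
    t = np.float32([[1, 0, 0], [0, 1, 20]])
    out.append(cv2.warpAffine(img, t, (w, h)))
    return out
```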
Then, in step S230, the image containing each piece of key information is cut out from the invoice image according to the marked position of that key information.
As noted above, the image containing a piece of key information is cut out along the rectangular bounding box according to the output position coordinates of that key information. For example, if 5 pieces of key information are located in the invoice image to be recognized, 5 key information images may correspondingly be cut out, and the sizes of these 5 key information images need not be consistent.
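A minimal sketch of this cropping step follows, assuming the locating model returns axis-aligned boxes as (x1, y1, x2, y2) pixel coordinates together with a type label, and that the invoice image is a NumPy array; the names used here are hypothetical.

```python
def crop_key_information(invoice_img, detections):
    """detections: list of (box, info_type), where box = (x1, y1, x2, y2)."""
    crops = []
    for (x1, y1, x2, y2), info_type in detections:
        # cut out the region enclosed by the rectangular bounding box
        crops.append((info_type, invoice_img[y1:y2, x1:x2]))
    return crops
```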
Then, in step S240, character segmentation processing is performed on each image containing key information to obtain a plurality of character images corresponding to each piece of key information.
Specifically, for each piece of key information, character segmentation processing is first performed on the image containing that key information to obtain sub-images, each containing one character, as the images of that key information, wherein the characters include digits and/or letters. Each sub-image is then scaled to obtain character images that meet a predetermined size. It should be noted that embodiments of the present invention are not limited with respect to the character segmentation method: for example, the image containing the key information may be binarized and each character then obtained by connected-component analysis, or each character may be segmented directly using a projection histogram method. Any character segmentation algorithm can be combined with embodiments of the present invention to implement the method 200 of the present invention.
Taking the taxpayer identifier as an example of the key information, the taxpayer identifier may be an ID card number, which generally contains 18 characters. After character segmentation processing, 18 sub-images are obtained, and these 18 sub-images are then each scaled to a unified predetermined size; optionally, the predetermined size is 32*32.
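The sketch below, assuming OpenCV, shows one possible character segmentation of the kind mentioned above: binarize the key-information image, use a vertical projection histogram to find the column gaps between characters, and scale each segment to the predetermined 32*32 size. This is only an illustration; the specification deliberately leaves the choice of segmentation algorithm open.

```python
import cv2

def segment_characters(key_info_img, size=(32, 32)):
    gray = cv2.cvtColor(key_info_img, cv2.COLOR_BGR2GRAY)
    # Otsu binarization, characters as foreground (white)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    col_sums = binary.sum(axis=0)          # vertical projection histogram
    in_char, start, chars = False, 0, []
    for x, s in enumerate(col_sums):
        if s > 0 and not in_char:          # entering a character column run
            in_char, start = True, x
        elif s == 0 and in_char:           # leaving a character column run
            in_char = False
            chars.append(cv2.resize(binary[:, start:x], size))
    if in_char:                            # character touching the right edge
        chars.append(cv2.resize(binary[:, start:], size))
    return chars
```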
Then, in step S250, the characters in the character images corresponding to each piece of key information are recognized by the character recognition model.
Fig. 5 shows the network structure of the character recognition model 500 according to an embodiment of the present invention. The character recognition model 500 uses a convolutional neural network structure, consisting in turn of a first convolutional layer 510, a first pooling layer 520, a second convolutional layer 530, a second pooling layer 540, a third convolutional layer 550, a fully connected layer 560 and an output layer 570.
As described above, the input character image size is 32*32. Optionally, the first convolutional layer 510 performs convolution on the input image with 6 convolution kernels of size 5*5; the feature map size is 32-5+1=28, so 6 feature maps of size 28*28 are produced in total. The first pooling layer 520 uses max pooling with a pooling size of 2*2, and after pooling 6 feature maps of 14*14 are obtained. The second convolutional layer 530 uses 16 different convolution kernels of size 5*5, and each feature map in the second convolutional layer 530 is obtained as a weighted combination of all 6, or some, of the feature maps in the first pooling layer 520, so it outputs 16 feature maps of 10*10. The second pooling layer 540 is the same as the first pooling layer 520 and finally outputs 16 feature maps of 5*5. The third convolutional layer 550 continues to convolve the output of the second pooling layer 540 with 5*5 convolution kernels, the number of kernels increasing to 120, so the output size of the third convolutional layer 550 is 5-5+1=1 and 120 feature maps of 1*1 are finally output. The fully connected layer 560 is connected to the third convolutional layer 550 and outputs 84 feature values. Finally, the output layer 570 outputs a vector indicating which class the extracted features belong to. Taking recognition of the digits 0~9 as an example, if the output layer 570 outputs [0,0,0,1,0,0,0,0,0,0], the input character image is 3.
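A minimal sketch of the character recognition model 500 as described above (three convolutional layers, two pooling layers, one fully connected layer and one output layer, essentially a LeNet-5-style network), assuming a Keras-style API, is given below. A 10-class digit output is used for illustration; a real model would need as many output classes as there are recognizable characters (digits plus letters).

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_char_recognition_model(num_classes=10):
    inputs = tf.keras.Input(shape=(32, 32, 1))             # 32*32 character image
    x = layers.Conv2D(6, 5, activation='relu')(inputs)      # first conv 510: 28*28*6
    x = layers.MaxPooling2D(2)(x)                            # first pool 520: 14*14*6
    x = layers.Conv2D(16, 5, activation='relu')(x)           # second conv 530: 10*10*16
    x = layers.MaxPooling2D(2)(x)                            # second pool 540: 5*5*16
    x = layers.Conv2D(120, 5, activation='relu')(x)          # third conv 550: 1*1*120
    x = layers.Flatten()(x)
    x = layers.Dense(84, activation='relu')(x)               # fully connected layer 560
    outputs = layers.Dense(num_classes, activation='softmax')(x)  # output layer 570
    return tf.keras.Model(inputs, outputs)
```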
According to an embodiment of the present invention, the method 200 further includes steps 1)~2) of training the character recognition model 500 in advance:
1) Input the training images in the second training image set into the initial character recognition model to output recognition results.
2) Calculate the error between the output recognition results and the true results, and back-propagate by the method of minimizing the error to adjust the parameters of the character recognition model; training ends when the error meets a predetermined condition. It should be noted that only one exemplary method of training the character recognition model is given here; embodiments of the present invention are not limited with respect to the predetermined condition on the error, which may be set according to actual needs.
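The sketch below illustrates steps 1)~2) with a Keras-style API, continuing from the model-building function above: forward the second training image set through the model, compute the error against the true labels, back-propagate to adjust the parameters, and stop once the error meets a predetermined condition (here a simple loss threshold; the specification leaves the condition open). The optimizer, threshold and data names are illustrative assumptions.

```python
import tensorflow as tf

class StopAtLoss(tf.keras.callbacks.Callback):
    """End training once the training error meets a predetermined condition."""
    def __init__(self, threshold=0.01):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('loss', float('inf')) < self.threshold:
            self.model.stop_training = True

model = build_char_recognition_model(num_classes=36)   # e.g. 10 digits + 26 letters
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# train_chars and train_char_labels (the second training image set) are assumed.
model.fit(train_chars, train_char_labels, batch_size=64, epochs=100,
          callbacks=[StopAtLoss(threshold=0.01)])
```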
According to still another embodiment of the present invention, the step of training the character recognition model in advance further includes the step of generating the second training image set: images containing various characters are collected as sample images (where the characters include letters and digits), and character segmentation processing is performed on the sample images to obtain a plurality of images each containing a single character; likewise, the present embodiment is not limited with respect to the character segmentation method. Then, to expand the training samples, mirroring processing is performed on the images containing a single character to generate a plurality of related images so as to form a sample set, and the images in the sample set are scaled to the predetermined size (e.g., 32*32) and used as the second training image set.
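A minimal sketch of this generation step, assuming OpenCV and that the single-character images have already been segmented out, is given below: each image is mirrored to expand the samples, and everything is scaled to the predetermined size.

```python
import cv2

def build_second_training_set(single_char_images, size=(32, 32)):
    samples = []
    for img in single_char_images:
        samples.append(img)
        samples.append(cv2.flip(img, 1))       # mirroring processing
    # scale every image in the sample set to the predetermined size
    return [cv2.resize(s, size) for s in samples]
```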
Then, in step S260, the corresponding invoice information is obtained by combining the type of each piece of key information with the recognized characters.
Again taking the taxpayer identifier as an example of the key information, after the 18 characters are recognized, the taxpayer identifier of the invoice image to be recognized can be obtained in combination with the key information type.
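A minimal sketch of step S260 follows: for each located piece of key information, the characters recognized from its character images are joined and recorded under that key information's type, yielding the invoice information. The function and field names are hypothetical.

```python
def assemble_invoice_info(recognized):
    """recognized: list of (info_type, list_of_characters)."""
    invoice_info = {}
    for info_type, chars in recognized:
        invoice_info[info_type] = ''.join(chars)
    return invoice_info

# e.g. [('check code', list('12345'))] -> {'check code': '12345'}
```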
According to the method for recognizing key information in an invoice of the present invention, by using an information locating model, the key information can still be located even when the quality of the invoice image to be recognized is poor, and no specific hardware is required to generate the invoice image to be recognized, which not only saves cost but is also convenient for the user. On the other hand, using a character recognition model further improves the recognition accuracy for images of low image quality.
It should be appreciated that, in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to simplify the disclosure and to aid in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art should understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiment, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may additionally be divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may additionally be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.
The present invention also discloses:
A9. The method as in any one of A1-8, further including the step of training the character recognition model in advance: inputting the training images in the second training image set into an initial character recognition model to output recognition results; and calculating the error between the output recognition results and the true results, and back-propagating by the method of minimizing the error to adjust the parameters of the character recognition model, the training ending when the error meets a predetermined condition.
A10. The method as described in A9, wherein the character recognition model uses a convolutional neural network structure, including three convolutional layers, two pooling layers, one fully connected layer and one output layer.
A11. The method as described in A9 or A10, wherein the step of training the character recognition model in advance further includes the step of generating the second training image set: collecting images containing various characters as sample images, wherein the characters include letters and digits; performing character segmentation processing on the sample images to obtain a plurality of images each containing a single character; performing mirroring processing on the images containing a single character to generate a plurality of related images so as to form a sample set; and scaling the images in the sample set to obtain images that meet a predetermined size, to be used as the second training image set.
A12. The method as in any one of A3-11, wherein the predetermined size is 32*32.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include some features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination thereof. Thus, the methods and apparatus of the present invention, or certain aspects or portions of the methods and apparatus of the present invention, may take the form of program code (i.e., instructions) embodied in tangible media such as floppy disks, CD-ROMs, hard disk drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
In the case where the program code is executed on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to perform the method of the present invention according to the instructions in said program code stored in the memory.
By way of example, and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media generally embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor having the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc., to describe an ordinary object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given order, whether temporally, spatially, in ranking, or in any other manner.
Although the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be envisioned within the scope of the invention thus described. Additionally, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Therefore, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the present disclosure is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.