CN109726715A


Info

Publication number: CN109726715A
Application number: CN201811614263.6A
Authority: CN
Inventors: 雷钧; 林路; 林康; 王慜骊; 安通鉴
Original assignee: SUNYARD SYSTEM ENGINEERING Co Ltd
Current assignee: SUNYARD SYSTEM ENGINEERING Co Ltd
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2019-05-07

Abstract

The present invention discloses a method for sequential recognition of text images and structured data output. The method comprises: obtaining a plurality of text image blocks; extracting text image features from each text image block with a deep fully convolutional neural network, so that each text image block is represented as a feature vector; processing the feature vector with a deep neural network and outputting a probability distribution over the character set; using a connectionist temporal classification (CTC) layer as the transcription layer, which applies a dynamic programming algorithm with forward computation and backward gradient propagation to the probability distribution over the character set and outputs computer-readable text; and correcting errors in the computer-readable text with a language model to obtain and output structured data. The method of the invention has high recognition accuracy, good robustness, and a high recognition rate.

Description

A method for sequential recognition of text images and structured data output

Technical field

The present invention relates to the technical field of image recognition in computer software, and in particular to a method for sequential recognition of text images and structured data output.

Background art

OCR-based text region detection, localization, and recognition technology for the financial field refers to using devices such as computers, together with OCR (optical character recognition) technology, to automatically extract and recognize the useful information in paper materials and to process it accordingly. It is one of the key technologies for automated, paperless computer processing in banks. In financial-industry OCR, text usually appears as sequences rather than in isolation. Traditional document OCR techniques have very weak resistance to interference, cannot recognize pictures with complex backgrounds, and are inefficient when outputting "row text information" and "column text information".

Summary of the invention

In view of the deficiencies of the prior art, the present invention provides a method for sequential recognition of text images and structured data output. The specific technical solution is as follows:

A method for sequential recognition of text images and structured data output, characterized in that the method comprises the following steps:

S1: obtain a plurality of text image blocks;

S2: extract text image features from each text image block with a deep fully convolutional neural network, so that each text image block is represented as a feature vector;

S3: process the feature vector with a deep neural network and output a probability distribution over the character set;

S4: use a connectionist temporal classification (CTC) layer as the transcription layer, apply a dynamic programming algorithm with forward computation and backward gradient propagation to the probability distribution over the character set, and output computer-readable text;

S5: correct errors in the computer-readable text with a language model, then obtain and output structured data.

Further, S2 is specifically:

A picture of arbitrary size is taken as input and a response map of corresponding size is output. Each position in the response map corresponds to a receptive field in the original image, and the fully convolutional neural network shares the convolutional response map, which serves as the feature vector.
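The relationship between input size, response map size, and receptive field can be illustrated with a small sketch. It is not taken from the patent: the layer stack of (kernel, stride) pairs is hypothetical, and reading the response map column by column into a sequence of feature vectors is an assumption about how the map feeds the next step.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length of one convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def receptive_field(layers):
    """Receptive field and overall stride of stacked (kernel, stride) layers."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride                # distance in input pixels between adjacent outputs
    return field, jump


def response_map_to_sequence(response_map):
    """Read a C x H x W response map as W feature vectors, one per column.

    Assumption: each column of the map is flattened over channels and rows.
    """
    channels = len(response_map)
    height = len(response_map[0])
    width = len(response_map[0][0])
    return [
        [response_map[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]


# Hypothetical stack: 3x3 conv, 2x2 pool, 3x3 conv, 2x2 pool.
HYPOTHETICAL_LAYERS = [(3, 1), (2, 2), (3, 1), (2, 2)]
field, jump = receptive_field(HYPOTHETICAL_LAYERS)
```

Because every layer is convolutional, the same weights apply to any input width; only the width of the response map, and therefore the number of feature vectors, changes.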

Further, the deep neural network is a two-layer recurrent neural network.
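A minimal sketch of a two-layer recurrent network that maps a sequence of feature vectors to one probability distribution per step is given below. The patent does not specify the cell type, hidden size, or training procedure, so the plain tanh cell, the hidden size of 8, and the untrained random weights are all assumptions made only to show the data flow.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix; stands in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_layer(inputs, w_in, w_rec):
    """One recurrent layer: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for x in inputs:
        a = matvec(w_in, x)
        b = matvec(w_rec, hidden)
        hidden = [math.tanh(p + q) for p, q in zip(a, b)]
        outputs.append(hidden)
    return outputs


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_layer_rnn(features, num_classes, hidden_size=8, seed=0):
    """Stack two recurrent layers, then project each step onto the character set."""
    rng = random.Random(seed)
    in_size = len(features[0])
    w1_in = make_matrix(hidden_size, in_size, rng)
    w1_rec = make_matrix(hidden_size, hidden_size, rng)
    w2_in = make_matrix(hidden_size, hidden_size, rng)
    w2_rec = make_matrix(hidden_size, hidden_size, rng)
    w_out = make_matrix(num_classes, hidden_size, rng)
    first = rnn_layer(features, w1_in, w1_rec)
    second = rnn_layer(first, w2_in, w2_rec)
    return [softmax(matvec(w_out, h)) for h in second]


# Five feature vectors in, five distributions over 11 classes out
# (for example ten digits plus a blank symbol).
distributions = two_layer_rnn([[0.1, 0.2, 0.3]] * 5, num_classes=11)
```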

Further, S5 is specifically:

S5.1: build a corpus and use it to train word vectors and a language model;

S5.2: feed the computer-readable text obtained in S4 into the trained language model, with beam search embedded in the language model, and output the corrected text.

The beneficial effects of the present invention are as follows:

(1) sequence labels can be learned directly, without any other annotation;

(2) features are extracted directly from the original image pixels, so image preprocessing operations such as binarization, character segmentation, and character localization are not needed;

(3) a recurrent neural network is used for label prediction, so the character sequence prediction result can be output directly;

(4) the length of the recognition result is not restricted, and because the loss is computed with CTC, the positions of characters within the string are not restricted either;

(5) using a recurrent neural network requires fewer network weights and less storage space than a traditional fully connected layer and gives good recognition accuracy and robustness; in addition, decoding the fully convolutional recurrent network with beam search and an embedded language model further improves the recognition rate.

Brief description of the drawings

Fig. 1 is a flow diagram of the method for sequential recognition of text images and structured data output of the present invention.

Specific embodiment

The present invention is described in detail below with reference to the drawing and preferred embodiments, from which its objects and effects will become clearer. It should be understood that the specific embodiments described here only serve to explain the invention and are not intended to limit it.

As shown in Fig. 1, a method for sequential recognition of text images and structured data output comprises the following steps:

S1: obtain a plurality of text image blocks;

S2: use a deep fully convolutional neural network (the deep neural network being a two-layer recurrent neural network) to extract text image features from each text image block, so that each text image block is represented as a feature vector. Specifically, a picture of arbitrary size is taken as input and a response map of corresponding size is output; each position in the response map corresponds to a receptive field in the original image, and the fully convolutional neural network shares the convolutional response map, which serves as the feature vector;

S4: use a connectionist temporal classification layer (Connectionist Temporal Classification, hereinafter CTC) as the transcription layer, apply a dynamic programming algorithm with forward computation and backward gradient propagation to the probability distribution over the character set, and output computer-readable text. CTC is a probability function that converts prediction results into a label sequence. For time-series problems in which the alignment between input features and output labels is uncertain, it can automatically optimize the model parameters and the alignment and segmentation boundaries together, end to end.

In the example, a picture of size 32x256 can be split into at most 256 columns, so the input feature sequence has a maximum length of 256, while the maximum length of the output label is set to 18; this is optimized with the CTC model.
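The input and label lengths in this example can be checked against the usual CTC length constraint: the input must have at least one column per label character, plus one extra column for a blank between every pair of identical adjacent characters. The sketch below uses the 256-column and 18-character limits from the example; the function names are hypothetical.

```python
def min_ctc_input_length(label):
    """Fewest input columns from which CTC can emit `label`.

    One column per character, plus one blank between identical neighbours,
    since repeated symbols without a blank between them collapse to one.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits(label, num_columns=256, max_label_length=18):
    """True if the label respects both limits used in the example."""
    return (
        len(label) <= max_label_length
        and min_ctc_input_length(label) <= num_columns
    )
```

Even the worst case of 18 identical characters needs only 35 columns, so a 256-column input leaves ample room for blanks and repeated predictions.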

Regarding the CTC model, suppose a 32x256 picture has the digit-string label "123". The picture is split by columns (CTC optimizes the segmentation model), and each resulting block is then recognized as a digit: the probability that the block is each digit or the special character is computed (a block that cannot be recognized is labeled with the special character "-"). This yields, for the input feature sequence (the picture), the class probability distribution of each independently modeled unit (each split-off block), including the "-" node. From these distributions, the probability P(123) that the label sequence is "123" is computed. Here the probability of "123" is defined as the sum of the probabilities of all subsequences that map to it, where a subsequence may contain '-' and consecutive repeats of '1', '2', and '3'.
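The sum over subsequences described above can be sketched in two ways: by enumerating every column-wise path that collapses to the label, and by the standard CTC forward recursion, which computes the same sum with dynamic programming. This is an illustrative sketch of the general CTC technique, not the patent's implementation; the per-column probabilities are supplied by the caller as dictionaries, and the function names are hypothetical.

```python
from itertools import product

BLANK = "-"  # the special character used for unrecognized blocks


def collapse(path):
    """Merge consecutive repeats, then drop blanks: '11-2-33' -> '123'."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)


def ctc_label_probability(probs, label):
    """P(label) by the CTC forward recursion.

    probs: one dict per column, mapping each symbol (including BLANK)
    to its probability in that column.
    """
    extended = [BLANK]
    for ch in label:
        extended += [ch, BLANK]  # blanks around every label character
    size = len(extended)
    alpha = [0.0] * size
    alpha[0] = probs[0][BLANK]
    if size > 1:
        alpha[1] = probs[0][extended[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            if s >= 2 and extended[s] != BLANK and extended[s] != extended[s - 2]:
                total += alpha[s - 2]        # skip the blank between different characters
            new[s] = total * probs[t][extended[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def brute_force_probability(probs, label):
    """Same quantity by explicit enumeration; only practical for tiny inputs."""
    symbols = sorted(probs[0])
    total = 0.0
    for path in product(symbols, repeat=len(probs)):
        if collapse(path) == label:
            p = 1.0
            for t, symbol in enumerate(path):
                p *= probs[t][symbol]
            total += p
    return total
```

The enumeration grows exponentially with the number of columns, whereas the forward recursion is linear in the number of columns times the label length, which is what makes a 256-column input tractable.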

S5: correct errors in the computer-readable text with a language model, then obtain and output structured data. Specifically:

Build a corpus and use it to train word vectors and a language model;

Feed the computer-readable text obtained in S4 into the trained language model, with beam search embedded in the language model, and output the corrected text.
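One way beam search and a language model can work together is sketched below. The patent does not state the language model type or how scores are combined, so the character bigram model with add-one smoothing, the log-linear score, and all names (`train_bigram`, `beam_search_correct`, `lm_weight`) are assumptions for illustration only.

```python
import math


def train_bigram(corpus, smoothing=1.0):
    """Count character bigrams in a corpus; return a smoothed log-probability function."""
    counts, totals, vocab = {}, {}, set()
    for line in corpus:
        padded = "^" + line  # "^" marks the start of a line
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            counts[(prev, cur)] = counts.get((prev, cur), 0) + 1
            totals[prev] = totals.get(prev, 0) + 1
    size = len(vocab)

    def log_prob(prev, cur):
        num = counts.get((prev, cur), 0) + smoothing
        den = totals.get(prev, 0) + smoothing * size
        return math.log(num / den)

    return log_prob


def beam_search_correct(candidates, log_prob, beam_width=3, lm_weight=1.0):
    """Pick the text that best balances recognizer and language model scores.

    candidates: one dict per character position, mapping each candidate
    character to the recognizer's probability for it.
    """
    beams = [("", 0.0)]
    for options in candidates:
        expanded = []
        for text, score in beams:
            prev = text[-1] if text else "^"
            for ch, p in options.items():
                new_score = score + math.log(p) + lm_weight * log_prob(prev, ch)
                expanded.append((text + ch, new_score))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]  # keep only the best partial texts
    return beams[0][0]


# Hypothetical corpus and recognizer output with 'o'/'0' and 'l'/'1' confusions.
log_prob = train_bigram(["total amount", "total", "amount total"])
recognized = [
    {"t": 0.9, "f": 0.1},
    {"0": 0.55, "o": 0.45},
    {"t": 0.9, "l": 0.1},
    {"a": 0.9, "o": 0.1},
    {"1": 0.55, "l": 0.45},
]
corrected = beam_search_correct(recognized, log_prob)
```

Taking the most probable character at each position independently would keep the visually confusable digits, while the bigram scores steer the beam toward character pairs seen in the corpus.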

The image data generated for routine training comes from a fairly ideal, noise-free environment, so accuracy easily reaches 100%. Pictures from the actual production environment may contain noise such as line segments or scattered points, so some noise can be added deliberately when generating the training set to improve the training and testing of the model.
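A minimal sketch of such augmentation is shown below, assuming a grayscale image stored as a list of rows with pixel values from 0 to 255. The noise fraction, segment lengths, and function names are hypothetical choices, not values from the patent.

```python
import random


def add_point_noise(image, fraction, rng):
    """Set a random fraction of pixels to black (0) or white (255)."""
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]  # copy so the clean image is kept
    for _ in range(int(fraction * height * width)):
        y, x = rng.randrange(height), rng.randrange(width)
        noisy[y][x] = rng.choice((0, 255))
    return noisy


def add_line_noise(image, num_lines, rng, value=0):
    """Draw short horizontal or vertical line segments onto a copy of the image."""
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]
    for _ in range(num_lines):
        y, x = rng.randrange(height), rng.randrange(width)
        length = rng.randint(1, max(1, width // 4))
        if rng.random() < 0.5:
            for dx in range(length):       # horizontal segment, clipped at the edge
                if x + dx < width:
                    noisy[y][x + dx] = value
        else:
            for dy in range(length):       # vertical segment, clipped at the edge
                if y + dy < height:
                    noisy[y + dy][x] = value
    return noisy


# A blank 32x256 training image with both kinds of noise applied.
rng = random.Random(0)
clean = [[255] * 256 for _ in range(32)]
augmented = add_line_noise(add_point_noise(clean, 0.02, rng), 3, rng)
```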

Those of ordinary skill in the art will understand that the foregoing is only a preferred embodiment of the invention and is not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art can still modify the technical solutions recorded in the foregoing examples or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, and the like made within the spirit and principle of the invention shall fall within the protection scope of the invention.

Claims

1. A method for sequential recognition of text images and structured data output, characterized in that the method comprises the following steps:

S1: obtain a plurality of text image blocks;

S2: extract text image features from each text image block with a deep fully convolutional neural network, so that each text image block is represented as a feature vector;

S3: process the feature vector with a deep neural network and output a probability distribution over the character set;

S4: use a connectionist temporal classification layer as the transcription layer, apply a dynamic programming algorithm with forward computation and backward gradient propagation to the probability distribution over the character set, and output computer-readable text;

2. The method according to claim 1, wherein S2 is specifically:

A picture of arbitrary size is taken as input and a response map of corresponding size is output; each position in the response map corresponds to a receptive field in the original image, and the fully convolutional neural network shares the convolutional response map, which serves as the feature vector.

3. The method according to claim 1, wherein the deep neural network is a two-layer recurrent neural network.

4. The method according to claim 1, wherein S5 is specifically:

S5.1: build a corpus and use it to train word vectors and a language model;

S5.2: feed the computer-readable text obtained in S4 into the trained language model, with beam search embedded in the language model, and output the corrected text.