CN108304814A - A kind of construction method and computing device of literal type detection model - Google Patents

A kind of construction method and computing device of literal type detection model

Info

Publication number
CN108304814A
Authority
CN
China
Prior art keywords
picture
region
character area
original image
literal type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810128155.1A
Other languages
Chinese (zh)
Other versions
CN108304814B (en)
Inventor
徐行
刘辉
刘宁
张东祥
郭龙
陈李江
李启林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Avanti Technology Co ltd
Original Assignee
Hainan Cloud River Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Cloud River Technology Co Ltd
Priority to CN201810128155.1A
Publication of CN108304814A
Application granted
Publication of CN108304814B
Status: Active
Anticipated expiration

Abstract

The invention discloses a construction method for a literal type detection model and a literal type detection method, both suitable for execution in a computing device. The model construction method includes: acquiring training pictures; expanding each training picture into a square picture; obtaining the results of annotating the printed text regions and handwritten text regions of each square picture; and training a convolutional neural network on each training picture and its annotation results to obtain the literal type detection model. The detection method includes: obtaining an original picture to be recognized and cutting it into multiple subgraphs; detecting the printed text regions and handwritten text regions in each subgraph with the literal type detection model to obtain the coordinate information and literal type of each text region; and merging adjacent cut text regions of the same type that belong to different subgraphs to obtain the printed text regions and handwritten text regions of the original picture. The invention also discloses a corresponding computing device.

Description

A kind of construction method and computing device of literal type detection model
Technical field
The present invention relates to the field of image data processing, and more particularly to a construction method for a literal type detection model, a literal type detection method, and a computing device.
Background technology
With the development of computer and Internet technology, people increasingly use automated equipment to grade student examination papers. In examination paper analysis, it is often necessary to identify whether the text in each recognition region is in a handwritten font or a printed font. Current character recognition methods are typically based on character color or simple character features. Such methods place very high demands on image quality; if the image contains shadows, handwriting bleed-through or blur, detection accuracy suffers. Moreover, such methods can usually only segment and detect text along horizontal lines, and cannot handle rotated images well. In addition, text itself has many features; detecting and distinguishing handwriting based only on color features fails to fully exploit the features of handwriting, which limits the detection performance to a certain extent.
Accordingly, it is desirable to provide a more effective method for detecting handwritten text and printed text.
Summary of the invention
In view of the above problems, the present invention proposes a construction method for a literal type detection model, a literal type detection method, and a computing device, in an effort to solve, or at least alleviate, the problems above.
According to an aspect of the present invention, a construction method for a literal type detection model is provided, suitable for execution in a computing device. The method includes: acquiring training pictures, wherein each training picture contains at least one of printed text and handwritten text; expanding each training picture into a square picture according to its length and width values; obtaining the results of annotating the printed text regions and handwritten text regions of each square picture; and training a convolutional neural network on each training picture and its annotation results to obtain the literal type detection model.
Optionally, in the construction method of the literal type detection model according to the present invention, the convolutional neural network includes 6 convolutional layers and 2 fully connected layers.
Optionally, in the construction method of the literal type detection model according to the present invention, the convolution kernels of the intermediate convolutional layers include 3*3, 5*5 and 7*7 kernels, and the final output layer covers 3 classes: printed text region, handwritten text region and background region.
Optionally, in the construction method of the literal type detection model according to the present invention, the operation of annotating the printed text regions and handwritten text regions of a square picture includes: determining each text line in the square picture and the text regions within each text line; annotating the text region type of each text line line by line, the text region types including printed text region and handwritten text region; and saving the coordinate information of each text region in each text line together with its text class.
Optionally, in the construction method of the literal type detection model according to the present invention, the step of expanding a training picture into a square picture according to its length and width values includes: selecting the larger of the length and the width to frame a white background picture, and placing the training picture at the center of the white background picture.
According to a further aspect of the invention, a literal type detection method is provided, suitable for execution in a computing device in which a literal type detection model is stored, the literal type detection model being built with the construction method described above. The literal type detection method includes: obtaining an original picture whose literal types are to be recognized, and cutting the original picture into multiple subgraphs, wherein the subgraphs do not overlap and are adjacent to one another; detecting the printed text regions and handwritten text regions in each subgraph with the literal type detection model, obtaining the coordinate information of each text region and the literal type it belongs to; and merging adjacent cut text regions of the same type that belong to different subgraphs, and taking the set of printed text regions and the set of handwritten text regions of all subgraphs as the printed text regions and handwritten text regions of the original picture.
Optionally, in the literal type detection method according to the present invention, the step of merging adjacent cut text regions of the same type that belong to different subgraphs includes: obtaining the first coordinate information of the printed text regions and handwritten text regions of each subgraph within that subgraph, and converting the first coordinate information into second coordinate information based on the original picture; detecting, according to the second coordinate information of each text region, whether two or more text regions of the same type are adjacent cuts of one another; and if so, merging these adjacent cut regions to obtain all printed text regions and handwritten text regions of the original picture.
Optionally, in the literal type detection method according to the present invention, the step of cutting the original picture into multiple subgraphs includes: expanding the original picture into a square picture according to its length and width values, and cutting the square picture into multiple subgraphs.
Optionally, in the literal type detection method according to the present invention, the coordinate information of a text region includes the top-left vertex coordinate and the bottom-right vertex coordinate of the text region.
Optionally, in the literal type detection method according to the present invention, if the coordinate of the top-left vertex of the original picture within its square picture is (x, y), the coordinate of the top-left vertex of a subgraph within the square picture is (x1, y1), and the coordinate of the top-left vertex of a text region within that subgraph is (x2, y2), then the coordinate of that text region within the original picture is (x1 + x2 - x, y1 + y2 - y).
According to a further aspect of the invention, a computing device is provided, including: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor and include instructions for executing the construction method of the literal type detection model and/or the literal type detection method described above.
According to a further aspect of the invention, a readable storage medium storing program instructions is provided. When the program instructions are read and executed by a computing device, the computing device executes the construction method of the literal type detection model and/or the literal type detection method described above.
According to the technical scheme of the present invention, during model training, a large number of text pictures carrying printed and handwritten text are acquired, each is given square expansion processing, the printed text regions and handwritten text regions therein are manually annotated, and the results are fed into a convolutional neural network for learning, yielding the literal type detection model. Square expansion processing effectively prevents the model training effect from degrading in the subsequent training process because annotated regions are too small or irregular in size. Line-by-line manual annotation along the horizontal direction enables the subsequent model training to recognize the text regions of a single line, avoiding the coarseness of whole-picture detection and improving the granularity and precision of detection.
During model use, an original picture to be recognized can be cut into multiple subgraphs according to its actual size, and the printed text regions and handwritten text regions in each subgraph are detected separately. Finally, the printed text regions and handwritten text regions of the subgraphs are merged to obtain the printed text regions and handwritten text regions of the original picture. Cutting the original picture into subgraphs suits the region detection model better than recognizing the full picture directly, improving the granularity and precision of recognition. Merging the results of all subgraphs recovers more realistic printed and handwritten text regions, reducing the region fragments formed during subgraph detection and yielding regions that better match the text distribution of the full picture.
Description of the drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and the drawings. These aspects indicate the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the disclosure will become apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, like reference numerals generally refer to like components or elements.
Fig. 1 shows the structure diagram of computing device 100 according to an embodiment of the invention;
Fig. 2 shows the flow charts of the construction method 200 of literal type detection model according to an embodiment of the invention;
Fig. 3 shows the flow chart of literal type detection method 300 according to an embodiment of the invention;
Fig. 4A and Fig. 4B respectively illustrate sample pictures that meet the model training requirements;
Fig. 4C and 4D respectively illustrate sample pictures that do not meet the model training requirements;
Fig. 5A and Fig. 5B respectively illustrate schematic diagrams of square expansion processing of a picture;
Fig. 6 shows a schematic diagram of line-by-line annotation of the text regions of each text line according to an embodiment of the invention;
Fig. 7 shows the structural schematic diagram of convolutional neural networks according to an embodiment of the invention;
Fig. 8 shows a schematic diagram of adaptively cutting an original picture into multiple subgraphs according to an embodiment of the invention; and
Fig. 9 shows a schematic diagram of base coordinate system transformation according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described more fully below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114 and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM) or non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122 and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. The program data 124 includes instructions; in the computing device 100 according to the present invention, the program data 124 includes instructions for executing the construction method 200 of the literal type detection model and/or the literal type detection method 300.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144 and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or touch input device) or other peripherals (such as a printer or scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable media as used herein may include both storage media and communication media.
The computing device 100 may be implemented as a server, such as a file server, database server, application server or web server, or as part of a small-form-factor portable (or mobile) electronic device, such as a cellular phone, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 may also be implemented as a personal computer including both desktop and notebook configurations. In some embodiments, the computing device 100 is configured to execute the construction method 200 of the literal type detection model and/or the literal type detection method 300 according to the present invention.
Fig. 2 shows a construction method 200 of a literal type detection model according to an embodiment of the invention, which can be executed in a computing device, such as the computing device 100. As shown in Fig. 2, the method starts at step S220.
In step S220, training pictures are acquired, wherein each training picture contains at least one of printed text and handwritten text.
For a specific application scenario, text pictures containing printed and/or handwritten text under that scenario can be collected. It should be noted that the text lines in a picture should not be too many or too crowded, in order to reduce the labor cost of subsequent manual annotation. Fig. 4A and Fig. 4B respectively illustrate sample pictures that meet the model training requirements, with a suitable number of text lines and suitable spacing; Fig. 4C and 4D respectively illustrate sample pictures that do not meet the requirements, with text lines that are too many and too crowded.
Then, in step S240, each training picture is expanded into a square picture according to its length and width values.
The collected training pictures do not necessarily meet the training requirements of the subsequent detection model, so each picture undergoes square expansion processing. This reduces problems in later model training where the training effect degrades because annotated regions are too small or irregular in size. Square expansion can proceed from the original size of the picture (say length w and height h): choose the larger of w and h to frame a white background image, and place the picture at the center of the white image, so that the original picture is expanded into a square picture of w*w or h*h. Fig. 5A and Fig. 5B respectively illustrate two examples of square processing: in Fig. 5A the picture width w exceeds the height h, so the picture is expanded to a square according to the width value w; in Fig. 5B the picture width w is less than the height h, so it is expanded according to the height value h. Of course, if the picture itself is already square, no square expansion is needed.
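The square expansion step can be sketched as a small geometric helper. The function name and the centering arithmetic below are illustrative assumptions; the patent only specifies a white canvas sized by the larger of the two dimensions with the picture placed at its center.

```python
def expand_to_square(w, h):
    """Compute the side of the white square canvas and the (left, top)
    offset at which a w*h picture is pasted so it sits at the center.
    The canvas itself would be filled with white, RGB (255, 255, 255)."""
    side = max(w, h)                       # expand along the larger dimension
    offset = ((side - w) // 2, (side - h) // 2)
    return side, offset

print(expand_to_square(640, 480))  # landscape: expanded to a 640*640 square
print(expand_to_square(480, 640))  # portrait: expanded to a 640*640 square
print(expand_to_square(500, 500))  # already square: offset (0, 0)
```

In an actual pipeline the returned offset would be passed to an image library's paste operation; only the geometry is shown here.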
Then, in step S260, the results of annotating the printed text regions and handwritten text regions of each square picture are obtained.
The operation of annotating the printed text regions and handwritten text regions of a square picture includes: determining each text line in the square picture and the text regions within each text line; annotating the text region type of each text line line by line, the text region types including printed text region and handwritten text region; and saving the coordinate information of each text region in each text line together with its text class. The coordinate information of a text region generally includes the top-left vertex coordinate and the bottom-right vertex coordinate of the region, although other coordinate representations may of course be chosen, such as the bottom-left and top-right vertex coordinates, or the top-left vertex coordinate together with the length and width of the region, as long as the position of a text region can be represented accurately; the present invention does not limit this. In addition, it should be understood that text regions may be recognized with any existing region recognition method, such as an OCR recognition method; the invention is not limited in this regard.
Fig. 6 shows a schematic diagram of line-by-line annotation of the text regions of each text line according to an embodiment of the invention. All 4 text lines are printed text; each of the first 3 text lines contains one text region, while the 4th text line contains four text regions. This line-by-line annotation enables the subsequent model training to recognize the text regions of a single line, avoids coarse whole-picture detection results, and improves the granularity and precision of detection.
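The line-by-line annotations described above might be stored as simple records. The field names below are hypothetical; the patent only requires that each region's coordinate information and text class be saved per text line.

```python
# Each record: text line number, (x0, y0, x1, y1) top-left / bottom-right
# vertex coordinates, and the text class of the region. The values are
# made-up illustrations of the Fig. 6 layout (3 one-region lines, then
# one line with four regions).
annotations = [
    {"line": 1, "bbox": (12, 10, 310, 42), "type": "printed"},
    {"line": 2, "bbox": (12, 50, 310, 82), "type": "printed"},
    {"line": 3, "bbox": (12, 90, 310, 122), "type": "printed"},
    {"line": 4, "bbox": (12, 130, 80, 162), "type": "printed"},
    {"line": 4, "bbox": (95, 130, 160, 162), "type": "printed"},
]

def regions_in_line(annos, line_no):
    """Collect the annotated text regions of one text line."""
    return [a["bbox"] for a in annos if a["line"] == line_no]

print(len(regions_in_line(annotations, 1)))  # one region in line 1
print(len(regions_in_line(annotations, 4)))  # multiple regions in line 4
```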
Then, in step S280, a convolutional neural network is trained on each training picture and its annotation results to obtain the literal type detection model.
The present invention performs model training on a picture set of a certain scale that has already been annotated. Specifically, the square-processed picture set and the annotation information of each picture are used to train a detection model based on an improved fast region convolutional neural network. The training model is adapted from a detection model based on a fast region convolutional neural network (the ZF network). Those skilled in the art can set the structure of the convolutional neural network and the content of each layer as needed; the invention is not limited in this regard.
According to an embodiment of the present invention, the convolutional neural network includes 6 convolutional layers and 2 fully connected layers; Fig. 7 shows its structural schematic diagram. Since a deep neural network requires a fixed input picture size (pictures of different sizes must be cropped to a specified size), the present invention cuts the input w*w or h*h original pictures to a unified size, such as 224*224, through multi-scale processing, ensuring that the model can support multi-scale image input. In addition, the intermediate convolutional layers can add convolution kernels of various sizes, such as 3x3, 5x5 and 7x7 kernels, with a parameter reduction strategy applied as appropriate after the convolutional layers. The number of classes of the final output layer is set to 3, covering the three categories of printed text, handwritten text and background. Here, background refers to plain white background with pixel value RGB (255, 255, 255), which neither interferes with nor influences the original picture region during neural network computation. Of course, each layer structure of the convolutional neural network can also be set to other values as needed; the present invention does not limit this.
As shown in Fig. 7, the convolutional neural network contains a 12-layer structure, where the code names of the layers are Input Layer (input data layer), conv (convolutional layer), pool (pooling layer), fc (fully connected layer) and output (output layer). In Fig. 7 some convolutional layers are paired with pooling layers, such as conv2+pool2, conv3+pool3 and conv5+pool5, while others are standalone convolutional layers without pooling layers, such as conv1, conv4 and conv6. That is, the complete structure of the convolutional neural network is: input layer → first convolutional layer → second convolutional layer + second pooling layer → third convolutional layer + third pooling layer → fourth convolutional layer → fifth convolutional layer + fifth pooling layer → sixth convolutional layer → first fully connected layer → second fully connected layer → output layer. The parameters of each layer are as shown in the table:
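Since the parameter table itself is not reproduced in this text, the following sketch only illustrates the standard output-size arithmetic such a conv/pool stack relies on. The 7x7, 5x5 and 3x3 kernel sizes come from the text above; the strides, padding and the number of pooled stages are assumptions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Trace a 224x224 crop (the unified input size mentioned above) through
# three assumed conv+pool stages: "same"-style padding keeps the conv
# output size, and each 2x2 / stride-2 pooling roughly halves it.
size = 224
for k in (7, 5, 3):
    size = conv_out(size, k, stride=1, padding=k // 2)  # convolution
    size = conv_out(size, 2, stride=2)                  # max pooling
print(size)  # 224 -> 112 -> 56 -> 28
```

The same formula is what fixes the input size requirement: the fully connected layers at the end expect one specific spatial size, which is why differently sized pictures are first cropped to 224*224.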
In addition, cross validation may be used for model selection during training: the entire picture set is divided into three parts, a training set, a validation set and a test set; training is carried out on the training set pictures; training models at appropriate epochs are selected according to the decrease of the loss function over the iteration cycles, and their detection performance is tested on the validation set; and the training model that performs best on the validation set is chosen as the candidate optimal training model.
Fig. 3 shows a literal type detection method 300 according to an embodiment of the invention, which can be executed in a computing device, such as the computing device 100. A literal type detection model as described above is stored in the computing device, the literal type detection model being built with the construction method described above. As shown in Fig. 3, the method starts at step S320.
In step S320, an original picture whose literal types are to be recognized is obtained, and the original picture is cut into multiple subgraphs, wherein the subgraphs do not overlap and are adjacent to one another.
As noted above, prior-art methods for detecting printed and handwritten text place high demands on the image, usually requiring high-definition images obtained by scanner. The present invention provides a literal type detection model that effectively reduces the requirement on image definition. Therefore, the original picture to be recognized can be a high-definition text image obtained by a scanner, or an image captured by a mobile phone or camera. Moreover, picture acquisition has no strict environmental requirements (such as illumination, angle or paper texture); ordinary photographing of plain paper under natural lighting suffices, which effectively improves the universality of text image recognition and reduces the workload and cost of image recognition.
The original picture can be cut with an adaptive cutting method, i.e., the original picture is divided into regions according to its length and width, where the regions do not overlap and are adjacent to one another, and each region serves as one subgraph (as shown by the picture cutting in Fig. 8). Usually, the size of a subgraph can be limited to no more than 480*320, so that an original picture of 1920*1280 can be cut into 16-20 subgraphs. Cutting into subgraphs suits the region detection model better than recognizing the full picture directly, and improves the granularity and precision of recognition. Further, the original picture can also first be expanded into a square picture according to its length and width values, and the square picture then cut into multiple subgraphs. The square expansion method is as described above and is not repeated here.
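One way the adaptive cutting could work is sketched below. The text only fixes the 480*320 upper bound and the non-overlapping, adjacent tiling; the even-grid layout and the function name are assumptions.

```python
import math

def adaptive_cut(width, height, max_w=480, max_h=320):
    """Divide a picture into non-overlapping, edge-adjacent tiles no
    larger than max_w x max_h, spreading the sizes as evenly as possible.
    Each tile is (x0, y0, x1, y1) in picture coordinates."""
    cols = math.ceil(width / max_w)
    rows = math.ceil(height / max_h)
    return [
        (c * width // cols, r * height // rows,
         (c + 1) * width // cols, (r + 1) * height // rows)
        for r in range(rows) for c in range(cols)
    ]

tiles = adaptive_cut(1920, 1280)
print(len(tiles))   # a 1920*1280 picture yields a 4 x 4 grid: 16 subgraphs
print(tiles[0])     # first tile: (0, 0, 480, 320)
```

This lands at the low end of the 16-20 subgraphs mentioned above; a layout with a small overlap margin, for example, would give more tiles.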
Then, in step S340, the printed text regions and handwritten text regions in each subgraph are detected with the literal type detection model, obtaining the coordinate information of each text region and the literal type it belongs to. That is, printed and handwritten text region detection is carried out one by one on each subgraph obtained from the cutting in step S320, obtaining the coordinate information of the printed and handwritten text regions in each subgraph and the type of each detected region (printed or handwritten). Similarly, the coordinate information of a text region includes the top-left vertex coordinate and the bottom-right vertex coordinate of the region, but is not limited to this, as long as the position of the text region can be represented accurately.
Then, in step S360, adjacent cut text regions of the same type that belong to different subgraphs are merged, and the set of printed text regions and the set of handwritten text regions of all subgraphs are taken as the printed text regions and handwritten text regions of the original picture.
The printed regions and handwritten regions of all subgraphs are merged respectively, which recovers more realistic printed and handwritten text regions and reduces the region fragments formed during subgraph detection, yielding regions that better match the text distribution of the full picture. The rules for merging subgraph results include: 1) the regions of the same type in different subgraphs are gathered together as the regions of the corresponding type of the original picture; 2) since the detected (printed or handwritten) region information in each subgraph is first coordinate information based on that subgraph, the first coordinate information needs to be mapped to second coordinate information based on the original picture (which involves a base coordinate system transformation); 3) after conversion into the second coordinate information based on the original picture, it is detected whether two or more regions are adjacent cuts of one another, and if they overlap, those regions are merged; 4) finally, all non-overlapping printed and handwritten regions of the original picture are collated and obtained.
According to one embodiment of the present invention, if the coordinate of the top-left vertex of the original picture within its enclosing square picture is (x, y), the coordinate of the top-left vertex of a sub-image within that square picture is (x1, y1), and the coordinate of the top-left vertex of a text region within that sub-image is (x2, y2), then the coordinate of that text region in the original picture is (x1+x2-x, y1+y2-y).
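The mapping above can be written directly as a small function; variable names follow the formula in the text (this is only a sketch of the stated arithmetic):

```python
def to_original_coords(x, y, x1, y1, x2, y2):
    """Map a text region's top-left vertex from sub-image coordinates
    (x2, y2) to original-picture coordinates, given the original picture's
    top-left (x, y) and the sub-image's top-left (x1, y1), both measured
    within the square picture."""
    return (x1 + x2 - x, y1 + y2 - y)
```

For example, with the original picture at (x, y) = (100, 50) inside the square picture, a sub-image at (0, 0), and a region at (150, 80) in that sub-image, the region's top-left in the original picture is (50, 30).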
Fig. 9 is a schematic diagram of the base coordinate system conversion principle according to an embodiment of the present invention, mainly showing how the coordinates of a text region detected in a sub-image are converted to coordinates in the original w*w or h*h picture after square expansion. As shown in Fig. 9, in the square picture obtained by expansion (including the white background), the text picture region occupies only the central part, and the top-left vertex of this region (the five-pointed star at the left border) has coordinate (x, y). Since the printed/handwritten text detection of the present invention is performed on sub-images 1-4 (the square-expanded picture is cut into 4 pieces in the example figure; it could of course be cut into another number of sub-images, such as 8, 12 or 16), the coordinates of the detected printed or handwritten text are also based on the sub-images, i.e. the first coordinate information. For example, the rectangular handwritten region in sub-image 2 has top-left vertex coordinate (x2, y2); this coordinate value is relative to the vertex of sub-image 2 (the five-pointed star at the upper border in the figure). The goal of the present invention is to convert the coordinate (x2, y2) into the coordinate value (x2', y2') relative to the original picture's vertex (x, y) in the square picture, i.e. the second coordinate information based on the original picture. By calculation, x2'=x1+x2-x and y2'=y1+y2-y.
According to another embodiment of the present invention, after obtaining the second coordinate information of each text region relative to the original picture, it can be detected whether two or more regions are adjacently cut. Here, "adjacently cut" means that printed or handwritten regions at the edges of different sub-images are adjacent, which mainly covers the case where a single text region has been split apart by different sub-images. Such split text needs to be merged to obtain a complete line of text. In general, whether two text regions are adjacently cut can be determined from their top-left and bottom-right vertex coordinate values; when regions are adjacently cut, one abscissa or ordinate value is usually identical. For example, the rectangular boxes of sub-image 1 and sub-image 3 in Fig. 9 are adjacently cut: they form one whole region in the original picture and therefore need to be merged.
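One possible reading of this adjacency criterion is sketched below: two axis-aligned regions are adjacently cut when one edge coordinate coincides (the shared sub-image border) while the regions overlap along the other axis. This is an illustrative interpretation, not the patent's literal algorithm.

```python
# A region is ((x_top_left, y_top_left), (x_bottom_right, y_bottom_right))
# in original-picture (second) coordinates.

def adjacently_cut(a, b):
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    # stacked vertically: the bottom edge of one equals the top edge of the
    # other, with overlap in the horizontal direction
    vertical = (ay2 == by1 or by2 == ay1) and (ax1 < bx2 and bx1 < ax2)
    # side by side: the right edge of one equals the left edge of the other,
    # with overlap in the vertical direction
    horizontal = (ax2 == bx1 or bx2 == ax1) and (ay1 < by2 and by1 < ay2)
    return vertical or horizontal
```

In practice a small tolerance instead of exact equality may be needed, since detected boxes rarely align to the pixel.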
Specifically, adjacently cut text regions of the same type can be merged as follows: obtain, for each sub-image, the first coordinate information of its printed text regions and handwritten text regions in that sub-image, and convert this first coordinate information to second coordinate information based on the original picture; then, according to the second coordinate information of each text region, detect whether two or more text regions of the same type are adjacently cut, and if so, merge these adjacently cut regions to obtain all printed text regions and handwritten text regions in the original picture. Merging here may mean taking the maximal union region of the two or more text regions.
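Taking the "maximal union region" of two rectangles amounts to the smallest axis-aligned rectangle covering both; a sketch of that step:

```python
# Merge two adjacently cut regions of the same type into their maximal
# union region. Regions are ((x_tl, y_tl), (x_br, y_br)) pairs in
# original-picture coordinates.

def merge_union(a, b):
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    return ((min(ax1, bx1), min(ay1, by1)),
            (max(ax2, bx2), max(ay2, by2)))
```

Per the text, this merge is applied only to regions of the same type (printed with printed, handwritten with handwritten) that were detected as adjacently cut.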
According to the technical solution of the present invention, performing square expansion on each picture reduces the problem in subsequent model training of labeled regions being too small or irregular in size, which would degrade the training effect. Labeling the training pictures line by line in the horizontal direction allows subsequent model training to recognize single-line text regions, avoiding coarse whole-image detection results and improving the granularity and precision of detection. In view of the characteristics of the image data set in the present invention, the network structure is modified and model training is performed based on an improved fast region convolutional neural network, so that model performance is higher. Cutting into sub-images makes the input better suited to the region detection model; compared with recognizing directly on the original picture, the granularity and precision of recognition can be improved. Merging the sub-image results yields printed and handwritten text regions that better match reality, reducing the region fragments formed by per-sub-image detection and producing regions that better conform to the actual distribution of text in the original picture.
B9. The method of any one of B6-B8, wherein the coordinate information of a text region includes the top-left vertex coordinate and the bottom-right vertex coordinate of that text region.
B10. The method of B7, wherein if the coordinate of the top-left vertex of the original picture within its enclosing square picture is (x, y), the coordinate of the top-left vertex of a sub-image within that square picture is (x1, y1), and the coordinate of the top-left vertex of a text region within that sub-image is (x2, y2), then the coordinate of that text region in the original picture is (x1+x2-x, y1+y2-y).
Numerous specific details are set forth in the description provided here. It is understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may additionally be divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and may additionally be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination thereof. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy disks, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store program code; the processor is configured to execute, according to the instructions in the program code stored in the memory, the method for constructing a character type detection model and/or the character type detection method of the present invention.
In addition, some of the embodiments are described herein as methods, or as combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor having the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc. to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention as thus described. It should also be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive, the scope of the invention being defined by the appended claims.

Claims (10)

CN201810128155.1A | 2018-02-08 | 2018-02-08 | Method for constructing character type detection model and computing equipment | Active | CN108304814B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810128155.1A | 2018-02-08 | 2018-02-08 | CN108304814B (en) Method for constructing character type detection model and computing equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810128155.1A | 2018-02-08 | 2018-02-08 | CN108304814B (en) Method for constructing character type detection model and computing equipment

Publications (2)

Publication Number | Publication Date
CN108304814A (en) | 2018-07-20
CN108304814B (en) | 2020-07-14

Family

ID=62864779

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810128155.1A | Active, CN108304814B (en), Method for constructing character type detection model and computing equipment | 2018-02-08 | 2018-02-08

Country Status (1)

Country | Link
CN (1) | CN108304814B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109263271A (en)* | 2018-08-15 | 2019-01-25 | 同济大学 | Printing equipment determination method based on big data
CN109685055A (en)* | 2018-12-26 | 2019-04-26 | 北京金山数字娱乐科技有限公司 | Method and device for detecting text regions in an image
CN109740473A (en)* | 2018-12-25 | 2019-05-10 | 东莞市七宝树教育科技有限公司 | Picture content automatic marking method and system based on paper marking system
CN109766879A (en)* | 2019-01-11 | 2019-05-17 | 北京字节跳动网络技术有限公司 | Character detection model generation, character detection method, device, equipment and medium
CN109919037A (en)* | 2019-02-01 | 2019-06-21 | 汉王科技股份有限公司 | Text positioning method and device, text recognition method and device
CN109977762A (en)* | 2019-02-01 | 2019-07-05 | 汉王科技股份有限公司 | Text positioning method and device, text recognition method and device
CN110059559A (en)* | 2019-03-15 | 2019-07-26 | 深圳壹账通智能科技有限公司 | Processing method for OCR-identified files and electronic equipment thereof
CN110321788A (en)* | 2019-05-17 | 2019-10-11 | 平安科技(深圳)有限公司 | Training data processing method, device, equipment and computer-readable storage medium
CN110490232A (en)* | 2019-07-18 | 2019-11-22 | 北京捷通华声科技股份有限公司 | Method, device, equipment and medium for training a text line direction prediction model
CN111144191A (en)* | 2019-08-14 | 2020-05-12 | 广东小天才科技有限公司 | Font identification method and device, electronic equipment and storage medium
CN111191668A (en)* | 2018-11-15 | 2020-05-22 | 零氪科技(北京)有限公司 | Method for identifying disease content in medical record text
CN111275139A (en)* | 2020-01-21 | 2020-06-12 | 杭州大拿科技股份有限公司 | Handwritten content removal method, device and storage medium
CN111582267A (en)* | 2020-04-08 | 2020-08-25 | 北京皮尔布莱尼软件有限公司 | Text detection method, computing device and readable storage medium
CN111611421A (en)* | 2019-02-26 | 2020-09-01 | 鸿富锦精密工业(武汉)有限公司 | Image augmentation and annotation method, device and computer storage medium
CN111753830A (en)* | 2020-06-22 | 2020-10-09 | 作业不凡(北京)教育科技有限公司 | Job image correction method and computing device
CN113901952A (en)* | 2021-11-06 | 2022-01-07 | 浙江星算科技有限公司 | Character recognition method separating printed and handwritten text based on deep learning
CN114120305A (en)* | 2021-11-26 | 2022-03-01 | 北京百度网讯科技有限公司 | Training method of text classification model, and recognition method and device of text content

Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20050102135A1 (en)* | 2003-11-12 | 2005-05-12 | Silke Goronzy | Apparatus and method for automatic extraction of important events in audio signals
CN104966097A (en)* | 2015-06-12 | 2015-10-07 | 成都数联铭品科技有限公司 | Complex character recognition method based on deep learning
CN105574513A (en)* | 2015-12-22 | 2016-05-11 | 北京旷视科技有限公司 | Character detection method and device
CN105809164A (en)* | 2016-03-11 | 2016-07-27 | 北京旷视科技有限公司 | Character identification method and device
CN105956626A (en)* | 2016-05-12 | 2016-09-21 | 成都新舟锐视科技有限公司 | Deep-learning-based license plate recognition method insensitive to license plate position
CN106874902A (en)* | 2017-01-19 | 2017-06-20 | 博康智能信息技术有限公司北京海淀分公司 | License plate information recognition method and device
CN107346629A (en)* | 2017-08-22 | 2017-11-14 | 贵州大学 | Intelligent blind reading method and intelligent blind reader system
CN107403130A (en)* | 2017-04-19 | 2017-11-28 | 北京粉笔未来科技有限公司 | Character identification method and character recognition device


Cited By (27)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109263271A (en)* | 2018-08-15 | 2019-01-25 | 同济大学 | Printing equipment determination method based on big data
CN109263271B (en)* | 2018-08-15 | 2020-06-12 | 同济大学 | Printing equipment detection and analysis method based on big data
CN111191668A (en)* | 2018-11-15 | 2020-05-22 | 零氪科技(北京)有限公司 | Method for identifying disease content in medical record text
CN111191668B (en)* | 2018-11-15 | 2023-04-28 | 零氪科技(北京)有限公司 | Method for identifying disease content in medical record text
CN109740473A (en)* | 2018-12-25 | 2019-05-10 | 东莞市七宝树教育科技有限公司 | Picture content automatic marking method and system based on paper marking system
CN109740473B (en)* | 2018-12-25 | 2020-10-16 | 东莞市七宝树教育科技有限公司 | Picture content automatic marking method and system based on paper marking system
CN109685055A (en)* | 2018-12-26 | 2019-04-26 | 北京金山数字娱乐科技有限公司 | Method and device for detecting text regions in an image
CN109685055B (en)* | 2018-12-26 | 2021-11-12 | 北京金山数字娱乐科技有限公司 | Method and device for detecting text area in image
CN109766879A (en)* | 2019-01-11 | 2019-05-17 | 北京字节跳动网络技术有限公司 | Character detection model generation, character detection method, device, equipment and medium
CN109919037B (en)* | 2019-02-01 | 2021-09-07 | 汉王科技股份有限公司 | Text positioning method and device and text recognition method and device
CN109977762A (en)* | 2019-02-01 | 2019-07-05 | 汉王科技股份有限公司 | Text positioning method and device, text recognition method and device
CN109919037A (en)* | 2019-02-01 | 2019-06-21 | 汉王科技股份有限公司 | Text positioning method and device, text recognition method and device
CN111611421A (en)* | 2019-02-26 | 2020-09-01 | 鸿富锦精密工业(武汉)有限公司 | Image augmentation and annotation method, device and computer storage medium
CN110059559A (en)* | 2019-03-15 | 2019-07-26 | 深圳壹账通智能科技有限公司 | Processing method for OCR-identified files and electronic equipment thereof
CN110321788A (en)* | 2019-05-17 | 2019-10-11 | 平安科技(深圳)有限公司 | Training data processing method, device, equipment and computer-readable storage medium
CN110321788B (en)* | 2019-05-17 | 2024-07-02 | 平安科技(深圳)有限公司 | Training data processing method, device, equipment and computer-readable storage medium
CN110490232A (en)* | 2019-07-18 | 2019-11-22 | 北京捷通华声科技股份有限公司 | Method, device, equipment and medium for training a text line direction prediction model
CN110490232B (en)* | 2019-07-18 | 2021-08-13 | 北京捷通华声科技股份有限公司 | Method, device, equipment and medium for training a text line direction prediction model
CN111144191B (en)* | 2019-08-14 | 2024-03-22 | 广东小天才科技有限公司 | Font identification method, device, electronic equipment and storage medium
CN111144191A (en)* | 2019-08-14 | 2020-05-12 | 广东小天才科技有限公司 | Font identification method and device, electronic equipment and storage medium
CN111275139A (en)* | 2020-01-21 | 2020-06-12 | 杭州大拿科技股份有限公司 | Handwritten content removal method, device and storage medium
CN111275139B (en)* | 2020-01-21 | 2024-02-23 | 杭州大拿科技股份有限公司 | Handwritten content removal method, device and storage medium
CN111582267A (en)* | 2020-04-08 | 2020-08-25 | 北京皮尔布莱尼软件有限公司 | Text detection method, computing device and readable storage medium
CN111582267B (en)* | 2020-04-08 | 2023-06-02 | 北京皮尔布莱尼软件有限公司 | Text detection method, computing device and readable storage medium
CN111753830A (en)* | 2020-06-22 | 2020-10-09 | 作业不凡(北京)教育科技有限公司 | Job image correction method and computing device
CN113901952A (en)* | 2021-11-06 | 2022-01-07 | 浙江星算科技有限公司 | Character recognition method separating printed and handwritten text based on deep learning
CN114120305A (en)* | 2021-11-26 | 2022-03-01 | 北京百度网讯科技有限公司 | Training method of text classification model, and recognition method and device of text content

Also Published As

Publication number | Publication date
CN108304814B (en) | 2020-07-14

Similar Documents

Publication | Title
CN108304814A (en) | Construction method and computing device of a character type detection model
CN110443250B (en) | Method and device for identifying the category of a contract seal, and computing equipment
CN106780512B (en) | Method, application and computing device for segmenting an image
CN109829453A (en) | Method, device and computing equipment for recognizing text in a blocked card
CN110674804A (en) | Text image detection method and device, computer equipment and storage medium
Bovik | The essential guide to image processing
CN108416345A (en) | Answer card area recognition method and computing device
CN112949766A (en) | Target area detection model training method, system, device and medium
CN107798321A (en) | Examination paper analysis method and computing device
US9519734B2 (en) | Systems and methods for improved property inspection management
CN109978063B (en) | Method for generating an alignment model of a target object
CN108898142B (en) | Recognition method of handwritten formula and computing device
CN108762740B (en) | Page data generation method and device, and electronic equipment
US9099007B1 (en) | Computerized processing of pictorial responses in evaluations
CN109684980A (en) | Automatic marking method and device
CN110097059B (en) | Document image binarization method, system and device based on generative adversarial network
CN110427946B (en) | Document image binarization method and device, and computing equipment
CN111626295B (en) | Training method and device for license plate detection model
US8042039B2 (en) | Populating a dynamic page template with digital content objects according to constraints specified in the dynamic page template
CN109117760A (en) | Image processing method, device, electronic equipment and computer-readable medium
KR102239588B1 (en) | Image processing method and apparatus
CN107977624A (en) | Semantic segmentation method, device and system
CN106204424B (en) | Image watermark removal method, device and computing equipment
US20200175727A1 (en) | Color Handle Generation for Digital Image Color Gradients using Machine Learning
CN111768405B (en) | Method, device, equipment and storage medium for processing a marked image

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CP03 | Change of name, title or address

Address after: 571924 Hainan Ecological Software Park, Laocheng High-tech Industrial Demonstration Zone, Haikou City, Hainan Province

Patentee after: Hainan Avanti Technology Co., Ltd.

Address before: 571924 Hainan Ecological Software Park, Laocheng High-tech Industrial Demonstration Zone, Hainan Province

Patentee before: HAINAN YUNJIANG TECHNOLOGY CO., LTD.

