CN114092937B - Seal recognition method, device, equipment and medium - Google Patents

Seal recognition method, device, equipment and medium

Publication number
CN114092937B
Authority
CN
China
Prior art keywords
seal
text
feature vector
image
existing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111371790.0A
Other languages
Chinese (zh)
Other versions
CN114092937A (en)
Inventor
林俪
于淑英
王国悦
卜丽
张岩
徐云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111371790.0A
Publication of CN114092937A
Application granted
Publication of CN114092937B


Abstract



The present invention relates to the field of computer vision technology, and in particular to a seal recognition method, device, equipment and medium. The method comprises: obtaining a seal image to be recognized; performing text recognition on the seal image to be recognized to obtain recognized text, and performing feature vector extraction on the seal image to obtain a first feature vector; performing feature matching on the feature vector of an existing seal with the first feature vector to obtain seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold; performing similarity comparison on the seal text of one or more existing seals with the recognized text, and outputting the seal text with the highest similarity and exceeding a second predetermined threshold as a recognition result, and when the similarities do not exceed the second predetermined threshold, outputting the seal text of the existing seal with the highest matching degree as the recognition result.

Description

Seal identification method, device, equipment and medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a seal recognition method, device, equipment, and medium.
Background
With social and economic development, the volume of letter-of-credit transactions in international settlement has grown rapidly, and so has the number of document images involved. Computer technology can assist in processing these documents, which speeds up business handling and reduces the workload of business staff.
For business scenarios with strict consistency requirements, such as letters of credit, efficient and accurate extraction of image information is the primary task in document processing. A document image often contains printed text, watermarks, signatures, seals, and similar content. For printed text, relatively mature detection and recognition algorithms already exist. For watermarks, signatures, and seals, an object detection algorithm can locate each target and return its position coordinates. The text inside a seal generally has to be recognized character by character, and can be treated as a special case of text detection and recognition. However, seal styles vary, fonts differ, ink depth is uneven, and characters overlap with other content, so applying general-purpose OCR directly does not yield accurate results, which makes digitizing seal text challenging.
Disclosure of Invention
The application aims to provide a seal recognition method, device, equipment, and medium that address the poor accuracy and coverage of seal text recognition in document images and improve both.
An embodiment of the application discloses a seal recognition method for an electronic device, comprising:
acquiring a seal image to be recognized;
performing text recognition on the seal image to obtain recognized text, and extracting a feature vector from the seal image to obtain a first feature vector;
performing feature matching between the feature vectors of existing seals and the first feature vector to obtain the seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold;
comparing the seal text of the one or more existing seals with the recognized text for similarity; and
outputting, as the recognition result, the seal text with the highest similarity if that similarity exceeds a second predetermined threshold, or, if no similarity exceeds the second predetermined threshold, the seal text of the existing seal with the highest matching degree.
Optionally, acquiring the seal image to be recognized includes:
acquiring a document image;
performing rotation correction and/or tilt correction on the document image; and
detecting the seal image to be recognized in the corrected document image.
Optionally, performing text recognition on the seal image to be recognized to obtain the recognized text includes:
performing text detection on the seal image, wherein text in a curved arrangement is detected by the DRRG algorithm and text in a straight-line arrangement is detected by the DBNet algorithm; and
performing text recognition on the detected text by the CRNN algorithm.
Optionally, each existing seal is associated with attribution information, which includes a customer number or a public institution code.
Optionally, performing feature matching between the feature vectors of existing seals and the first feature vector to obtain the seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold includes:
acquiring the customer number associated with the document image, and performing feature matching between the feature vectors of the existing seals associated with that customer number and the first feature vector, to obtain the seal text of one or more existing seals whose matching degree exceeds the first predetermined threshold.
Optionally, the method further comprises:
if no existing seal is associated with the customer number, performing feature matching between the feature vectors of the existing seals associated with a public institution code and the first feature vector.
Optionally, the method further comprises:
outputting the recognized text as the recognition result if no matching degree reaches the first predetermined threshold.
An embodiment of the application discloses a seal recognition device, comprising:
a seal image acquisition module, configured to acquire a seal image to be recognized;
a processing module, configured to perform text recognition on the seal image to obtain recognized text, and to extract a feature vector from the seal image to obtain a first feature vector;
a feature matching module, configured to perform feature matching between the feature vectors of existing seals and the first feature vector to obtain the seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold;
a similarity comparison module, configured to compare the seal text of the one or more existing seals with the recognized text for similarity; and
an output module, configured to output, as the recognition result, the seal text with the highest similarity if that similarity exceeds a second predetermined threshold, or, if no similarity exceeds the second predetermined threshold, the seal text with the highest matching degree.
An embodiment of the application discloses seal recognition equipment comprising a memory and a processor, wherein the memory stores computer-executable instructions that, when executed by the processor, cause the equipment to implement any of the seal recognition methods above.
An embodiment of the application discloses a computer storage medium storing instructions that, when run on a computer, cause the computer to execute any of the seal recognition methods above.
Compared with the prior art, the main differences and effects of the embodiments of the application are as follows.
In the application, the feature vectors of existing seals are matched against the first feature vector to obtain the seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold, and that seal text is compared with the recognized text for similarity. The seal text with the highest similarity is output as the recognition result if it exceeds a second predetermined threshold; if no similarity exceeds the second predetermined threshold, the seal text with the highest matching degree is output; and if no matching degree reaches the first predetermined threshold, the recognized text itself is output. Text detection and recognition are thus combined with feature matching: when feature matching succeeds, the matched seal is used to correct the text recognition result, and when it fails, the text recognition result is output directly. This solves the problem of extracting seal text during data digitization and improves both accuracy and coverage.
In the application, text detection is performed on the seal image, with curved text detected by the DRRG algorithm and straight-line text detected by the DBNet algorithm, and the detected text is then recognized by the CRNN algorithm. Different algorithms are thus applied to the arc-shaped, circular, and horizontal text found in the round, oval, square, and other seals contained in a document image, which improves recognition accuracy while maintaining recognition speed.
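The division of labor described above can be sketched as a small dispatcher. This is a minimal sketch, not the patented implementation: the function name `recognize_seal_text`, the `Detector` and `Recognizer` type aliases, and the choice to run both detectors on the same image and concatenate their regions are all assumptions made for illustration. The stubs stand in for real DRRG, DBNet, and CRNN models.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical interfaces: a detector maps a seal image to text regions,
# a recognizer maps one text region to a string.
Detector = Callable[[Any], Iterable[Any]]
Recognizer = Callable[[Any], str]


def recognize_seal_text(seal_image: Any,
                        detect_curved: Detector,
                        detect_straight: Detector,
                        recognize: Recognizer) -> List[str]:
    """Run both detectors, then recognize every region they return."""
    regions = list(detect_curved(seal_image)) + list(detect_straight(seal_image))
    return [recognize(region) for region in regions]


# Stand-in stubs so the sketch runs without any model weights.
fake_image = {"curved": ["ACME TRADING CO LTD"], "straight": ["FINANCE DEPT"]}
lines = recognize_seal_text(
    fake_image,
    detect_curved=lambda img: img["curved"],      # would be a DRRG model
    detect_straight=lambda img: img["straight"],  # would be a DBNet model
    recognize=lambda region: region,              # would be a CRNN model
)
print(lines)
```

Passing the models in as callables keeps the dispatcher independent of any particular OCR framework.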
In the application, the customer number associated with the document image is acquired, and the feature vectors of the existing seals associated with that customer number are matched against the first feature vector to obtain the seal text of one or more existing seals whose matching degree exceeds the first predetermined threshold. If no existing seal is associated with the customer number, the feature vectors of the existing seals associated with a public institution code are matched against the first feature vector instead. A complete seal image library is built for customers and public institutions, and using the customer number as prior knowledge reduces matching search time and improves matching accuracy.
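The candidate selection described above might look like the following sketch. The data layout (two dictionaries keyed by customer number and by public institution code), the name `select_candidates`, and the decision to fall back to all institution seals at once are assumptions; the text only states that institution-associated seals are used when the customer has none.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical library entry: (seal text, feature vector).
SealEntry = Tuple[str, List[float]]


def select_candidates(customer_no: Optional[str],
                      by_customer: Dict[str, List[SealEntry]],
                      by_institution: Dict[str, List[SealEntry]]) -> List[SealEntry]:
    """Prefer the customer's own seals; otherwise fall back to institution seals."""
    entries = by_customer.get(customer_no, []) if customer_no else []
    if entries:
        return entries
    # Assumed fallback: every seal filed under any public institution code.
    return [entry for group in by_institution.values() for entry in group]


by_customer = {"C001": [("ACME TRADING CO LTD", [1.0, 0.0])]}
by_institution = {"P9": [("CUSTOMS INSPECTION", [0.0, 1.0])]}

print(select_candidates("C001", by_customer, by_institution))
print(select_candidates("C404", by_customer, by_institution))
```

Restricting the search to one customer's seals is what shortens the matching step. The fallback keeps coverage for seals issued by third parties.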
Drawings
Fig. 1 shows a flowchart of a stamp identifying method according to an embodiment of the present application.
Fig. 2 shows a schematic diagram of a document image and a stamp base in accordance with an embodiment of the present application.
Fig. 3 shows a schematic diagram of a stamp identifying apparatus according to an embodiment of the present application.
Fig. 4 shows a block diagram of a stamp identifying apparatus according to an embodiment of the present application.
Detailed Description
The application will be further described with reference to specific examples and figures. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. Furthermore, for convenience of description, only some, but not all, structures or processes related to the present application are shown in the drawings. It should be noted that in the present specification, like reference numerals and letters denote like items in the following drawings.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various features, these features should not be limited by these terms. These terms are used merely for distinguishing and are not to be construed as indicating or implying relative importance. For example, a first feature may be referred to as a second feature, and similarly a second feature may be referred to as a first feature, without departing from the scope of the example embodiments.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "coupled," and "connected" are to be construed broadly. A connection may be, for example, fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present embodiment according to the specific case.
Illustrative embodiments of the application include, but are not limited to, seal identification methods, apparatus, devices, and media.
Various aspects of the illustrative embodiments will be described using terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. However, it will be apparent to those skilled in the art that some alternative embodiments may be practiced using only some of the described features. For purposes of explanation, specific numbers and configurations are set forth in order to provide a more thorough understanding of the illustrative embodiments. However, it will be apparent to one skilled in the art that the alternative embodiments may be practiced without the specific details. In other instances, well-known features are omitted or simplified in order not to obscure the illustrative embodiments of the application.
Furthermore, various operations will be described as multiple discrete operations in a manner that is most helpful in understanding the illustrative embodiments, however, the order of description should not be construed as to imply that these operations are necessarily order dependent, and many of the operations may be performed in parallel, concurrently or with other operations. Furthermore, the order of the operations may also be rearranged. When the described operations are completed, the process may be terminated, but may also have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
References in the specification to "one embodiment," "an illustrative embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or property, but not every embodiment necessarily includes that feature, structure, or property. Moreover, such phrases do not necessarily refer to the same embodiment. Furthermore, when a particular feature is described in connection with a particular embodiment, it is within the knowledge of one skilled in the art to effect such a feature in connection with other embodiments, whether or not those embodiments are explicitly described.
The terms "comprising," "having," and "including" are synonymous, unless the context dictates otherwise. The phrase "A and/or B" means "(A), (B), or (A and B)".
As used herein, the term "module" may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that runs one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or ordering is not required. Rather, in some embodiments, these features may be described in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or methodological feature in a particular drawing does not imply that all embodiments need to include such feature, and in some embodiments may not be included or may be combined with other features.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
OCR (Optical Character Recognition) is a process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and translates those shapes into computer text using a character recognition method. In other words, printed text in a paper document is optically converted into a black-and-white bitmap image, and recognition software converts the text in the image into a text format that word-processing software can edit further. OCR is an important research direction in machine vision with many application areas, and mature OCR technology can convert many kinds of image content into computer-processable data accurately and efficiently. However, the images involved in export letters of credit cover many document types with differing styles and content. Recognition is especially difficult when information about a document has to be obtained from a seal: general-purpose OCR cannot be applied directly to seal content, so seal information extraction requires targeted research or application.
Because the seals in document images (such as export letter-of-credit images) vary in style, font, and ink depth, and their text often overlaps with other content, neural-network-based text detection and recognition algorithms perform unevenly on seals. Common seal shapes are square, round, and oval. Round and oval seals often contain both horizontal and curved text, so a single text detection and recognition algorithm rarely performs well. If the seal impression falls on a non-blank area, that is, it overlaps with text, signatures, or similar content, or if the impression is incomplete, the detection and recognition of the text in the seal are severely disturbed.
Ancient-book seal character recognition based on graph model matching targets engraved seal-script characters. Although such characters are complex to write, they are almost never bent or deformed, and graph models are built only for a limited set of characters. The applicable range is therefore limited, and a correct result is hard to obtain when a character absent from the graph model library appears.
In order to solve the technical problems, an embodiment of the application provides a seal identification method for electronic equipment.
Fig. 1 shows a flowchart of a stamp identifying method according to an embodiment of the present application. Fig. 2 shows a schematic diagram of a document image and a stamp base in accordance with an embodiment of the present application. This method is exemplarily illustrated below in connection with fig. 1 and 2.
As shown in fig. 1 and 2, the stamp identifying method 100 includes:
Step 102, performing rotation correction and tilt correction on the document image. For example, a document image requiring seal recognition may be rotated or tilted, in which case it is rotation-corrected and tilt-corrected into an upright document image. Such an image may be, for example, an export letter-of-credit image presented by a customer to a bank, where a letter of credit is a written document issued to the exporter (seller) at the request of the importer (buyer).
Step 104, detecting a seal image in the corrected document image. For example, seal detection is performed on the corrected upright document image 201, and the seal image 202 in the document image 201 is detected.
Those skilled in the art will appreciate that steps 102 and 104 are unnecessary when a seal image can be obtained directly.
Step 106, performing text detection on the seal image 202. Text detection is a precondition of text recognition: a text detection algorithm locates the text regions in a picture and finds the bounding box of each single character or text line. For example, the seal text region 2021 in the seal image 202 is detected by a text detection algorithm.
Step 108, performing text recognition on the detected text. The text recognition algorithm compares the text detection result with the characters in a character library one by one and outputs the best match. For example, the seal text region 2021 is recognized by a text recognition algorithm to obtain the recognized text 203.
According to one embodiment of the application, commonly used text detection methods include EAST, CTPN, SegLink, PixelLink, TextBoxes, TextSnake, and MSR.
For text recognition, horizontal text lines are generally recognized with CRNN or Seq2Seq, while irregular curved text is recognized with algorithms such as STN+RNN or Cascade R-CNN+LSTM.
An end-to-end text detection and recognition mechanism combines detection and recognition, so that text positions and text content are obtained directly by training a single network. The end-to-end model FOTS introduces RoIRotate, which lets detection and recognition share convolutional feature layers, generates oriented text regions from the convolutional feature map, supports detection and recognition of inclined text lines, and reaches real-time speed. The Mask R-CNN-based end-to-end model Mask TextSpotter can detect various non-horizontal text and performs well in natural scenes, but recognizes only digits and English letters. The semi-supervised method STN-OCR applies a spatial transformer network (STN) in the detection part to perform affine transformations on the input image, so that each detected text block is rotated, scaled, or deskewed as needed, which improves accuracy in the subsequent recognition stage.
Those skilled in the art will appreciate that other techniques commonly used in the art may also be used to detect and recognize text in the seal image; they are not described in detail here.
Step 110, extracting a feature vector from the seal image 202 to obtain a first feature vector. For example, feature vector extraction on the seal image 202 yields the first feature vector 204.
According to an embodiment of the present application, the feature vector of the seal image may be extracted with any of the conventional image processing techniques well known to those skilled in the art, which are not described here.
Step 112, performing feature matching between the feature vectors of existing seals and the first feature vector. Feature matching is an image matching method in which features extracted from images serve as conjugate entities and the attributes or descriptive parameters of those features (in effect, features of the features, which can also be regarded as features of the image) serve as matching entities. The conjugate entities are registered by computing a similarity measure between matching entities. Feature matching has seen limited use in seal text recognition and generally works well only for restricted seal scenarios. The ancient-book seal character recognition technique based on graph model matching builds a graph model from the stroke structure of seal characters in four main steps: binarization, skeleton extraction, skeleton pruning, and polygonal approximation. The graph model obtained after the optimal skeleton approximation serves as the structural feature of the seal character image. During matching, a node similarity matrix, a connection similarity matrix, and a global similarity matrix are computed, and the reference seal character sample with the highest global consistency matching score is selected as the recognition result.
For example, the seal feature vector library 205 holds the feature vectors of a number of seals, some of which, feature vectors 206-213, are shown in Fig. 2. Feature vectors 206 and 207 correspond to the seal identified by tag information 214 (tag information may include the seal text of the corresponding seal), feature vectors 208 and 209 correspond to the seal identified by tag information 215, feature vectors 210 and 211 correspond to the seal identified by tag information 216, and feature vectors 212 and 213 correspond to the seal identified by tag information 217. All (or some) of the feature vectors in the seal feature vector library 205, including feature vectors 206-213, are feature-matched against the first feature vector 204.
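As an illustration of computing a matching degree between the first feature vector and every library vector, here is a minimal sketch that uses cosine similarity. The patent does not name the similarity measure, so cosine similarity, the function names, and the toy two-dimensional vectors are all assumptions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_degrees(query: Sequence[float],
                  library: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Matching degree of the query vector against every library vector."""
    return {seal_id: cosine_similarity(query, vec) for seal_id, vec in library.items()}


# Hypothetical library keyed by seal identifier.
library = {"seal_a": [1.0, 0.0], "seal_b": [0.6, 0.8]}
degrees = match_degrees([1.0, 0.0], library)
print(degrees)
```

In practice the vectors would come from whatever feature extractor is used in step 110, and a large library would usually be searched with an approximate nearest-neighbor index rather than a linear scan.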
Step 114, determining whether there is a matching seal whose matching degree reaches a first predetermined threshold. For example, it is determined whether any of the feature vectors (all or some) in the seal feature vector library 205 has a matching degree with the first feature vector 204 that reaches the first predetermined threshold (for example, 0.7). Because each feature vector corresponds to a seal, this determines whether a matching seal exists.
Those skilled in the art will appreciate that the first predetermined threshold may be chosen according to the actual situation.
Step 116, if the result of step 114 is no, outputting the recognized text and ending the method. For example, if the matching degree between every feature vector in the seal feature vector library 205 and the first feature vector 204 is below the first predetermined threshold, the recognized text 203 is output directly.
Step 118, if the result of step 114 is yes, outputting the seal text of the one or more matching seals whose matching degree reaches the first predetermined threshold. The output may be limited to the seal text of at most a predetermined number of matching seals with the highest matching degrees. For example, suppose the first predetermined threshold is 0.7 and the predetermined number is 2, the matching degrees of feature vectors 207, 210, and 213 with the first feature vector 204 are 0.75, 0.95, and 0.8 respectively, and no other feature vector in the seal feature vector library 205 reaches 0.7. Then the seal text contained in tag information 216, corresponding to feature vector 210, and the seal text contained in tag information 217, corresponding to feature vector 213, are output.
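The thresholding and top-N limiting of step 118 can be sketched as follows. The function name `top_matches`, the seal identifiers, and the scores are hypothetical. The comparison uses `>=` because this step speaks of a matching degree that "reaches" the threshold, whereas the summary says "exceeds", so the boundary handling is an assumption.

```python
from typing import Dict, List, Tuple


def top_matches(degrees: Dict[str, float],
                threshold: float = 0.7,
                limit: int = 2) -> List[Tuple[str, float]]:
    """Keep seals whose matching degree reaches the threshold, best first, at most `limit`."""
    kept = [(seal_id, d) for seal_id, d in degrees.items() if d >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Hypothetical matching degrees for four library seals.
degrees = {"seal_a": 0.80, "seal_b": 0.95, "seal_c": 0.75, "seal_d": 0.40}
print(top_matches(degrees))  # [('seal_b', 0.95), ('seal_a', 0.8)]
```

An empty result corresponds to the "no" branch of step 114, in which the recognized text is output unchanged.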
Step 120, comparing the seal text of the one or more matching seals with the recognized text for similarity. For example, the seal text contained in tag information 216 and the seal text contained in tag information 217, both output in step 118, are each compared with the recognized text 203 obtained in step 108.
Step 122, determining whether any seal text has a similarity that reaches a second predetermined threshold. For example, with the second predetermined threshold set to 0.9, it is determined whether the similarity between the recognized text 203 and the seal text contained in tag information 216, or between the recognized text 203 and the seal text contained in tag information 217, reaches 0.9.
Likewise, those skilled in the art will appreciate that the second predetermined threshold may be chosen according to the actual situation.
Step 124, if the result of step 122 is no, outputting the seal text of the matching seal with the highest matching degree. For example, if both similarities are below 0.9, the seal text contained in tag information 216 is output, because its feature vector 210 has the highest matching degree among those output in step 118.
Step 126, if the result of step 122 is yes, outputting the seal text with the highest similarity among those reaching the second predetermined threshold. For example, if the similarity between the seal text contained in tag information 216 and the recognized text 203 is 0.97, and the similarity between the seal text contained in tag information 217 and the recognized text 203 is 0.98, the seal text contained in tag information 217 is output.
It is to be understood that the first predetermined threshold, the second predetermined threshold, and the predetermined number may all be set as desired.
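Steps 116 through 126 amount to a three-way decision, which the following sketch puts in one function. The patent does not specify how text similarity is computed, so the use of Python's `difflib.SequenceMatcher` ratio is an assumption, as are the function names and the sample strings. The default of 0.9 for the second threshold mirrors the example value above.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; an assumed stand-in for the unspecified measure."""
    return SequenceMatcher(None, a, b).ratio()


def decide(recognized: str,
           matches: List[Tuple[str, float]],
           second_threshold: float = 0.9) -> str:
    """`matches` holds (seal text, matching degree) pairs that passed the first threshold."""
    if not matches:
        return recognized                        # step 116: no matching seal
    best_text = max(matches, key=lambda m: text_similarity(m[0], recognized))[0]
    if text_similarity(best_text, recognized) >= second_threshold:
        return best_text                         # step 126: most similar seal text
    return max(matches, key=lambda m: m[1])[0]   # step 124: best feature match


matches = [("ACME TRAVEL CO LTD", 0.80), ("ACME TRADING CO LTD", 0.95)]
print(decide("ACME TRADlNG CO LTD", matches))  # OCR read "I" as "l"; library text wins
print(decide("XXXX", matches))                 # unusable OCR; best feature match wins
print(decide("RAW OCR TEXT", []))              # no match; OCR text is returned
```

The design lets the library correct small recognition errors while still returning the raw recognized text for seals that are not in the library at all.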
In the application, text detection and recognition are combined with feature matching: when feature matching succeeds, the matched seal is used to correct the text recognition result, and when it fails, the text recognition result is output directly. This solves the problem of extracting seal text during data digitization and improves both accuracy and coverage.
According to some embodiments of the present application, performing text recognition on the seal image to obtain a recognized text further includes:
performing text detection on the seal image, wherein text in a curved arrangement is detected by the DRRG algorithm and text in a straight-line arrangement is detected by the DBNET algorithm, and then performing text recognition on the detected text by the CRNN algorithm. Specifically:
Because seal text in export letter of credit documents is horizontal, arc-shaped or circular, the arc-shaped and circular text in a seal is detected with the DRRG algorithm, and the horizontal text is detected with DBNET. DRRG is an end-to-end network consisting of a CNN-based text proposal network and a GCN-based relational reasoning network. Conventional CNN-based relational reasoning can only integrate spatial and channel information, whereas GCN-based relational reasoning can also integrate sequence information among text components. GCN is therefore better suited than CNN to reasoning about the relationships among text components, and the network performs better on text of arbitrary shape. To detect more accurately, some text detection methods adopt complex network structures and post-processing operations, which costs time. In DBNET, a differentiable binarization module sets the binarization threshold adaptively, which simplifies post-processing and lets the network balance accuracy and speed.
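To make the differentiable binarization step concrete, the sketch below applies the approximate step function from the published DB method, B = 1 / (1 + exp(-k (P - T))), where P is the predicted probability, T is the learned per-pixel threshold and k is an amplifying factor (50 in the published method). Neither the formula nor the value of k is stated in this text, and the function names are chosen only for illustration.

```python
import math


def differentiable_binarization(probability: float, threshold: float, k: float = 50.0) -> float:
    """Approximate step function used in place of hard thresholding.

    Returns a value near 1 when probability > threshold and near 0 otherwise,
    while staying differentiable so the threshold can be learned.
    """
    return 1.0 / (1.0 + math.exp(-k * (probability - threshold)))


def binarize_map(prob_map, thresh_map, k: float = 50.0):
    """Apply the approximate step function element-wise to two 2-D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob_map, thresh_map)
    ]
```

Because the threshold map is predicted per pixel rather than fixed globally, the binarization adapts to local contrast, which is the property the paragraph above credits with simplifying post-processing.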
CRNN can be trained end to end, has a small model and is fast, and is mainly used to recognize text sequences of variable length. A CNN extracts a feature sequence from the input image; a bidirectional RNN then predicts over the feature sequence, learning each feature vector in the sequence and outputting a predicted label distribution; finally, the predicted label distribution is converted into the final label sequence through CTC.
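The last stage, turning per-frame label distributions into a label sequence, can be illustrated with greedy (best-path) CTC decoding: take the most probable class in each frame, collapse consecutive repeats, and drop the blank symbol. The text does not say which decoding strategy is used, so greedy decoding is an assumption here, as are the function name and the convention that index 0 is the blank class.

```python
BLANK = 0  # assumed convention: class index 0 is the CTC blank


def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy CTC decoding.

    `frame_probs` is a list of per-frame probability lists; class i (i >= 1)
    corresponds to alphabet[i - 1].
    """
    decoded = []
    previous = BLANK
    for probs in frame_probs:
        best = max(range(len(probs)), key=lambda i: probs[i])
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen in the previous frame.
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)
```

A blank frame between two identical classes is what allows genuinely repeated characters to survive the collapse step.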
In the present application, arc-shaped, circular and horizontal text appearing in seals of various shapes (circular, elliptical, square and so on) contained in the document image is detected by different algorithms. This improves recognition accuracy and precision while maintaining recognition speed.
According to some embodiments of the present application, the seal feature vector base 205 may be built and used as follows:
Multiple seal images of the same seal are stored in one subfolder, and the subfolder and the seal images in it are associated with unique seal tag information. The subfolder is then placed in a folder named with the attribution information of the seal. A seal image base is thereby established. Features are then extracted from each seal image to obtain seal feature vectors, and the feature vectors that share the same attribution information are stored, together with their associated tag information, in a single file named with that attribution information. A seal feature vector base is thereby established.
This is further described below with reference to Fig. 2, by way of example:
1) Establishment of seal image base
In letter of credit document examination practice, a single customer uses multiple seals; the text contained in different seals differs, but the seals all belong to the same company. For example, the customer with customer number 218 may have multiple seals, and each seal may have multiple seal images. Multiple seal images of the same seal (for example 10 to 20; 2 are shown) are placed in one subfolder named with the corresponding tag information: seal images 220 and 221 are placed in the subfolder named with tag information 214, and seal images 222 and 223 in the subfolder named with tag information 215. These subfolders are then placed in the same folder, named with customer number 218.
A customer's documents also include public document types issued by third parties, such as insurance policies, bills of lading and air waybills. The seals on these documents are not customer seals, so a corresponding image base and corresponding tag information must also be established for these public seals. Likewise, for example, seal images 224 and 225 are placed in the subfolder named with tag information 216, and seal images 226 and 227 in the subfolder named with tag information 217. These subfolders are then placed in the same folder, named with public institution 219 ("public institution 219" may be a public institution code). The seal image base 228 is thereby established.
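The two-level folder layout described above (attribution folder, then tag subfolder, then images) can be read back into memory with a short sketch like the one below. The accepted file suffixes, the function name and the example folder names are illustrative assumptions; the text only fixes the nesting order.

```python
from pathlib import Path

# Assumed set of image file types; the source does not list any.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def index_seal_image_base(root):
    """Return {attribution: {tag: [image paths]}} for the folder layout
    root/<customer number or public institution code>/<tag information>/<images>.
    """
    base = {}
    for owner_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        tags = {}
        for tag_dir in sorted(p for p in owner_dir.iterdir() if p.is_dir()):
            images = sorted(
                f for f in tag_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            if images:
                tags[tag_dir.name] = images
        base[owner_dir.name] = tags
    return base
```

For a layout such as `customer_218/label_214/220.png`, the result maps `"customer_218"` to `{"label_214": [...]}`, which mirrors how tag information 214 groups seal images 220 and 221 under customer number 218.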
2) Feature vector extraction
Based on the complete seal image base established earlier, features are extracted with a VGG16 feature extraction model from the seal images under each customer number folder and under each public institution folder. The feature vectors that share the same attribution information, together with their associated tag information, are stored in a single file named with that attribution information, thereby establishing the seal feature vector base. For example, feature vectors 206-213 are extracted from seal images 220-227, respectively. Feature vectors 206 and 207 with their associated tag information 214, and feature vectors 208 and 209 with their associated tag information 215, all belonging to the customer with customer number 218, are stored in the .h5 file named with customer number 218. Similarly, feature vectors 210 and 211 with their associated tag information 216, and feature vectors 212 and 213 with their associated tag information 217, all belonging to public institution 219, are stored in the .h5 file named with public institution 219. The seal feature vector base 205 is thus established, which facilitates matching, search and management.
3) Seal feature matching
Before seal matching, the customer number of the seal can be obtained from the message interface as prior knowledge. During matching, the seal is first compared with the seal feature vectors under that customer number to obtain a matching result; if it does not match any of the customer's seals, it is then compared with the feature vectors of public institutions (customs, inspection and quarantine institutions, the China Council for the Promotion of International Trade, shipping companies, insurance institutions) to obtain a matching result. For example, when the first feature vector 204 extracted in step 108 is matched against the feature vectors in the feature vector base 205, the customer number associated with the document image 201, for example customer number 218, is acquired first, and all feature vectors in the file "customer number 218.h5" are then matched against the first feature vector 204. If the matching degree is lower than the first predetermined threshold, or if the document image 201 is not associated with customer number 218 and the feature vector base 205 contains no file named with the associated customer number, the feature vectors in the files "public institution 219.h5" through "public institution n.h5" are matched against the first feature vector 204.
In the present application, a complete seal image base is established for customers (such as letter of credit customers) and public institutions, a feature vector is extracted for each seal image and given uniquely corresponding seal tag information, and the acquired customer number is used as prior knowledge, which reduces the time spent on matching and search and improves matching accuracy.
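The customer-first lookup with a fallback to public institutions can be sketched as below. Several things here are assumptions made for the example rather than details from the text: cosine similarity stands in for the unspecified matching degree, a plain dictionary stands in for the per-owner .h5 files, the threshold comparison is inclusive, and all function and parameter names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_in_file(query, entries, first_threshold):
    """Return [(tag, matching_degree)] for entries passing the threshold, best first.

    `entries` is a list of (feature_vector, tag) pairs from one owner's file.
    """
    hits = [(tag, cosine_similarity(query, vector)) for vector, tag in entries]
    hits = [hit for hit in hits if hit[1] >= first_threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def match_seal(query, feature_base, customer_number, public_keys, first_threshold):
    """Search the customer's vectors first, then the public institutions' vectors."""
    if customer_number is not None and customer_number in feature_base:
        hits = match_in_file(query, feature_base[customer_number], first_threshold)
        if hits:
            return hits
    # No customer file, or nothing under it passed the threshold.
    hits = []
    for key in public_keys:
        hits.extend(match_in_file(query, feature_base.get(key, []), first_threshold))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Restricting the first pass to one customer's vectors is what keeps the search small; the public institution files are scanned only when that pass produces nothing.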
Fig. 3 shows a schematic diagram of a seal recognition device according to an embodiment of the present application.
As shown in Fig. 3, the device 300 includes:
a memory 302 for storing computer-executable instructions; and
a processor 304 for executing the instructions to implement any of the possible methods of the first embodiment described above.
The first embodiment is a method embodiment corresponding to the present embodiment, and the present embodiment may be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in the present embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in the present embodiment also apply to the first embodiment.
Specifically, as shown in Fig. 3, the device 300 may include one or more (only one is shown in the figure) memories 302 and processors 304 (the processor 304 may include, but is not limited to, a central processing unit (CPU), a graphics processor (GPU), a digital signal processor (DSP), a microprocessor (MCU), a programmable logic device (FPGA), etc.). The specific connection medium between the memory 302 and the processor 304 is not limited in this embodiment of the application. In Fig. 3, the memory 302 and the processor 304 are connected by a bus 306, which is drawn as a thick line; the manner of connection between other components is merely illustrative and not limiting. The bus 306 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in Fig. 3, but this does not mean that there is only one bus or only one type of bus. Those skilled in the art will appreciate that the structure shown in Fig. 3 is merely illustrative and does not limit the structure of the electronic device. For example, the device 300 may include more or fewer components than shown in Fig. 3, or have a configuration different from that shown in Fig. 3.
Processor 304 executes software programs and modules stored in memory 302 to perform various functional applications and data processing, i.e., to implement the stamp identification methods described above.
Memory 302 may be used to store program instructions/modules that are executed by processor 304 corresponding to stamp identification methods in some embodiments of the present application. Memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 302 may further include memory located remotely from processor 304, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Fig. 4 shows a block diagram of a seal recognition device according to an embodiment of the present application. As shown in Fig. 4, the device 400 includes:
a seal image acquisition module 402, configured to acquire a seal image to be recognized;
a processing module 404, configured to perform text recognition on the seal image to be recognized to obtain recognized text, and to perform feature vector extraction on the seal image to obtain a first feature vector;
a feature matching module 406, configured to perform feature matching between feature vectors of existing seals and the first feature vector, so as to obtain the seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold;
a similarity comparison module 408, configured to compare the seal text of the one or more existing seals with the recognized text for similarity; and
an output module 410, configured to output, as a recognition result, the seal text whose similarity is the highest and exceeds a second predetermined threshold, and, when none of the similarities exceeds the second predetermined threshold, to output the seal text of the existing seal with the highest matching degree as the recognition result.
The first embodiment is a method embodiment corresponding to the present embodiment, and the present embodiment may be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in the present embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in the present embodiment also apply to the first embodiment.
According to some embodiments of the present application, a computer storage medium having stored thereon instructions which, when executed on a computer, cause the computer to perform any one of the possible methods of the first embodiment described above is disclosed.
The first embodiment is a method embodiment corresponding to the present embodiment, and the present embodiment may be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in the present embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in the present embodiment also apply to the first embodiment.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented in the form of instructions or a program loaded onto or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors, etc. When the instructions or programs are executed by a machine, the machine may perform the various methods described above. For example, the instructions may be distributed over a network or other computer readable medium. Thus, a machine-readable medium may include, but is not limited to, any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), such as a floppy disk, an optical disk, a compact disk read-only memory (CD-ROMs), a magneto-optical disk, a read-only memory (ROM), a Random Access Memory (RAM), an erasable programmable read-only memory (EPROM), an electronically erasable programmable read-only memory (EEPROM), a magnetic or optical card, or a flash memory or a tangible machine-readable memory for transmitting network information via electrical, optical, acoustical or other form of signal (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any form of machine-readable medium suitable for storing or transmitting electronic instructions or information readable by a machine (e.g., a computer).
The embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the use of the technical solution of the present application is not limited to the applications mentioned in the embodiments of the present application, and various structures and modifications can be easily implemented with reference to the technical solution of the present application to achieve the various advantageous effects mentioned herein. Various changes, which may be made by those skilled in the art without departing from the spirit of the application, are deemed to be within the scope of the application as defined by the appended claims.

Claims (10)

1. A seal recognition method for an electronic device, comprising:
acquiring a seal image to be recognized;
performing text recognition on the seal image to be recognized to obtain recognized text, and performing feature vector extraction on the seal image to obtain a first feature vector;
performing feature matching between feature vectors of existing seals and the first feature vector, wherein each feature vector of an existing seal corresponds to an existing seal identified by tag information, the tag information including the seal text of the corresponding seal, to obtain the seal text of the existing seals contained in the tag information corresponding to one or more feature vectors whose matching degree exceeds a first predetermined threshold;
comparing the seal text of the one or more existing seals with the recognized text for similarity; and
outputting, as a recognition result, the seal text whose similarity is the highest and exceeds a second predetermined threshold, and, when none of the similarities exceeds the second predetermined threshold, outputting the seal text of the existing seal with the highest matching degree as the recognition result.
2. The method according to claim 1, wherein acquiring the seal image to be recognized comprises:
acquiring a document image;
performing rotation correction and/or tilt correction on the document image;
detecting the seal image to be recognized in the corrected document image.
3. The method according to claim 2, wherein performing text recognition on the seal image to be recognized to obtain recognized text comprises:
performing text detection on the seal image to be recognized, wherein text in a curved arrangement is detected by the DRRG algorithm and text in a straight-line arrangement is detected by the DBNET algorithm;
performing the text recognition on the detected text by the CRNN algorithm.
4. The method according to claim 3, wherein the existing seal is associated with attribution information, the attribution information including a customer number or a public institution code.
5. The method according to claim 4, wherein performing feature matching between feature vectors of existing seals and the first feature vector to obtain the seal text of one or more existing seals whose matching degree exceeds a first predetermined threshold comprises:
acquiring the customer number associated with the document image, and performing feature matching between the feature vectors of the existing seals associated with the customer number and the first feature vector, to obtain the seal text of one or more existing seals whose matching degree exceeds the first predetermined threshold.
6. The method according to claim 5, further comprising:
when there is no existing seal associated with the customer number, performing the feature matching between the feature vectors of the existing seals associated with the public institution code and the first feature vector.
7. The method according to claim 6, further comprising:
when none of the matching degrees reaches the first predetermined threshold, outputting the recognized text as the recognition result.
8. A seal recognition device, comprising:
a seal image acquisition module, configured to acquire a seal image to be recognized;
a processing module, configured to perform text recognition on the seal image to be recognized to obtain recognized text, and to perform feature vector extraction on the seal image to obtain a first feature vector;
a feature matching module, configured to perform feature matching between feature vectors of existing seals and the first feature vector, wherein each feature vector of an existing seal corresponds to an existing seal identified by tag information, the tag information including the seal text of the corresponding seal, to obtain the seal text of the existing seals contained in the tag information corresponding to one or more feature vectors whose matching degree exceeds a first predetermined threshold;
a similarity comparison module, configured to compare the seal text of the one or more existing seals with the recognized text for similarity;
an output module, configured to output, as a recognition result, the seal text whose similarity is the highest and exceeds a second predetermined threshold, and, when none of the similarities exceeds the second predetermined threshold, to output the seal text of the existing seal with the highest matching degree as the recognition result.
9. A seal recognition apparatus, comprising a memory storing computer-executable instructions and a processor, wherein the instructions, when executed by the processor, cause the apparatus to implement the seal recognition method according to any one of claims 1 to 7.
10. A computer storage medium having instructions stored thereon, wherein the instructions, when run on a computer, cause the computer to execute the seal recognition method according to any one of claims 1 to 7.
Application CN202111371790.0A, priority date 2021-11-18, filing date 2021-11-18: Seal recognition method, device, equipment and medium. Status: Active. Publication: CN114092937B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111371790.0A (CN114092937B, en) | 2021-11-18 | 2021-11-18 | Seal recognition method, device, equipment and medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111371790.0A (CN114092937B, en) | 2021-11-18 | 2021-11-18 | Seal recognition method, device, equipment and medium

Publications (2)

Publication Number | Publication Date
CN114092937A (en) | 2022-02-25
CN114092937B (en) | 2025-04-11

Family

ID=80301977

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202111371790.0A | Active | CN114092937B (en) | 2021-11-18 | 2021-11-18 | Seal recognition method, device, equipment and medium

Country Status (1)

Country | Link
CN (1) | CN114092937B (en)



Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
