Disclosure of Invention
In view of the foregoing, it is desirable to provide an image recognition method, apparatus, device, and medium that can implement image recognition in combination with artificial intelligence, improve accuracy of image text recognition, and also improve recognition performance.
An image recognition method, the image recognition method comprising:
Acquiring a sample image, and carrying out graying treatment on the sample image to obtain a first sample;
Determining a target threshold according to the first sample by using a normal distribution algorithm;
Preprocessing the first sample according to the target threshold value to obtain a second sample;
obtaining a preset cutting template, and cutting the second sample according to the cutting template to obtain a third sample;
Acquiring digital characteristics, English characteristics and Chinese character characteristics from the third sample;
extracting characteristic values of each Chinese character characteristic by taking Chinese character strokes as basic elements, and constructing an initial Chinese character sample according to the characteristic values of each Chinese character characteristic;
obtaining a pre-configured uncommon word dictionary and a shape near word dictionary, and adding the uncommon word dictionary and the shape near word dictionary to the initial Chinese character sample to obtain a target Chinese character sample;
training a preset network according to the digital characteristics, the English characteristics and the target Chinese character samples to obtain an identification model;
and acquiring an image to be identified, identifying the image to be identified by using the identification model, and generating an identification result according to the output data of the identification model.
According to a preferred embodiment of the present invention, the performing gray-scale processing on the sample image to obtain a first sample includes:
Acquiring an R value, a G value and a B value of each sample image;
Determining a first weight corresponding to the R value, a second weight corresponding to the G value and a third weight corresponding to the B value;
Calculating a weighted average value according to the R value, the G value, the B value, the first weight, the second weight and the third weight of each sample image to obtain a gray value of each sample image;
And converting each sample image according to the gray value of each sample image to obtain the first sample.
According to a preferred embodiment of the present invention, said determining the target threshold value according to said first sample using a normal distribution algorithm comprises:
Identifying a text image and a background image of each first sample in the first samples;
acquiring the density of a text image of each first sample and the density of a background image of each first sample;
Acquiring the duty ratio of text image pixels of each first sample as a first duty ratio, and calculating the duty ratio of background image pixels of each first sample as a second duty ratio according to the first duty ratio;
calculating the mixed probability density of the text image and the background image of each first sample according to the first duty ratio, the second duty ratio, the density of the text image of each first sample and the density of the background image of each first sample;
Acquiring an initial threshold value;
Calculating the error probability sum of the text image and the background image of each first sample according to the initial threshold value and the mixed probability density of the text image and the background image of each first sample;
and when the error probability sum takes the minimum value, acquiring the value of the initial threshold value as the target threshold value.
According to a preferred embodiment of the present invention, the preprocessing the first sample according to the target threshold value, to obtain a second sample includes:
performing binarization processing on the first sample according to the target threshold value to obtain a first image set;
carrying out noise reduction treatment on the first image set to obtain a second image set;
calculating the angle of the characteristic connecting line in the second image set by adopting a Hough transformation algorithm;
and correcting the angle of the characteristic connecting line in the second image set to a horizontal position according to a rotation algorithm to obtain the second sample.
According to a preferred embodiment of the present invention, the extracting the feature value of each Chinese character feature using the strokes of the Chinese character as the basic elements includes:
Dividing the region of each Chinese character feature to obtain a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph of each Chinese character feature;
randomly acquiring pixel points from each Chinese character feature as initial pixel points;
Determining the initial pixel point as a starting point, and detecting black pixel points in a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph of each Chinese character characteristic;
determining transverse strokes according to the number and the length of black pixel points in the detected transverse subgraph of each Chinese character feature;
Determining longitudinal strokes according to the number and the length of black pixel points in the detected longitudinal subgraph of each Chinese character feature;
determining oblique strokes according to the number and the length of black pixel points in the detected oblique subgraph of each Chinese character characteristic;
and constructing the characteristic value of each Chinese character characteristic according to the transverse stroke of each Chinese character characteristic, the longitudinal stroke of each Chinese character characteristic and the oblique stroke of each Chinese character characteristic.
According to a preferred embodiment of the present invention, the constructing an initial Chinese character sample according to the feature value of each Chinese character feature includes:
Acquiring a length threshold value of a transverse stroke of each Chinese character feature as a transverse length threshold value, acquiring a length threshold value of a longitudinal stroke of each Chinese character feature as a longitudinal length threshold value, and acquiring a length threshold value of an oblique stroke of each Chinese character feature as an oblique length threshold value;
Acquiring the length of the transverse stroke, the length of the longitudinal stroke and the length of the oblique stroke of each Chinese character feature from the feature value of each Chinese character feature;
when the length of the detected transverse strokes with the Chinese character characteristics is greater than or equal to the transverse length threshold, determining the detected transverse strokes as target transverse strokes;
when the length of the detected longitudinal strokes with the Chinese character characteristics is greater than or equal to the longitudinal length threshold, determining the detected longitudinal strokes as target longitudinal strokes;
When the length of the detected oblique strokes with the Chinese character characteristics is larger than or equal to the threshold value of the oblique length, determining the detected oblique strokes as target oblique strokes;
And constructing Chinese character information corresponding to each Chinese character feature according to the target transverse strokes, the target longitudinal strokes and the target oblique strokes corresponding to each Chinese character feature, and obtaining the initial Chinese character sample.
According to a preferred embodiment of the present invention, the generating the recognition result according to the output data of the recognition model includes:
Invoking a pre-configured text library;
acquiring all features in the output data;
Matching in the text library by utilizing all the characteristics in the output data;
and determining the matched characters with all the characteristics as the identification result.
An image recognition apparatus, the image recognition apparatus comprising:
the processing unit is used for acquiring a sample image, and carrying out graying treatment on the sample image to obtain a first sample;
a determining unit, configured to determine a target threshold according to the first sample using a normal distribution algorithm;
The processing unit is further used for preprocessing the first sample according to the target threshold value to obtain a second sample;
the cutting unit is used for obtaining a preset cutting template, and cutting the second sample according to the cutting template to obtain a third sample;
the acquisition unit is used for acquiring digital characteristics, English characteristics and Chinese character characteristics from the third sample;
the construction unit is used for extracting the characteristic value of each Chinese character characteristic by taking the strokes of the Chinese characters as basic elements and constructing an initial Chinese character sample according to the characteristic value of each Chinese character characteristic;
The adding unit is used for obtaining a pre-configured uncommon word dictionary and a shape near word dictionary, and adding the uncommon word dictionary and the shape near word dictionary to the initial Chinese character sample to obtain a target Chinese character sample;
The training unit is used for training a preset network according to the digital characteristics, the English characteristics and the target Chinese character samples to obtain an identification model;
the identification unit is used for acquiring an image to be identified, identifying the image to be identified by utilizing the identification model, and generating an identification result according to the output data of the identification model.
A computer device, the computer device comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the image recognition method.
A computer-readable storage medium having stored therein at least one instruction for execution by a processor in a computer device to implement the image recognition method.
According to the above technical solution, the present invention acquires a sample image and performs grayscale processing on it to obtain a first sample; converting the color image to grayscale reduces the complexity of image processing. A target threshold is determined from the first sample using a normal distribution algorithm, so that the threshold for binarization is determined accurately, the distinction between the characters and the background is more pronounced, and misjudgment is avoided. The first sample is preprocessed according to the target threshold to obtain a second sample. A preconfigured cutting template is acquired and the second sample is cut according to the cutting template to obtain a third sample; because feature extraction is performed after cutting, recognition accuracy and recognition efficiency are improved in a more targeted way. Digital features, English features and Chinese character features are acquired from the third sample. The feature value of each Chinese character feature is extracted with Chinese character strokes as basic elements, and an initial Chinese character sample is constructed from these feature values; this reflects both the structural differences between Chinese characters and the structural common points of shape near words, so the extracted features are more accurate. A preconfigured uncommon word dictionary and shape near word dictionary are added to the initial Chinese character sample to obtain a target Chinese character sample, which allows targeted training and improves the recognition accuracy for uncommon words and shape near words. A preset network is trained on the digital features, the English features and the target Chinese character sample to obtain a recognition model. An image to be recognized is acquired and recognized with the recognition model, and a recognition result is generated from the output data of the recognition model. Image recognition is thus achieved in combination with artificial intelligence, improving both the accuracy of image text recognition and recognition performance.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a preferred embodiment of the image recognition method of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
The image recognition method is applied to one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like.
The computer device may be any electronic product capable of human-computer interaction with a user, such as a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an interactive Internet Protocol television (IPTV), a smart wearable device, etc.
The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
S10, acquiring a sample image, and carrying out graying treatment on the sample image to obtain a first sample.
In this embodiment, the sample image may be obtained by using a web crawler technology, or may be directly obtained from a specified database, which is not limited by the present invention.
The specified database can comprise a database of any enterprise or platform, and sufficient images are stored in the specified database for training.
In at least one embodiment of the present invention, the graying the sample image to obtain a first sample includes:
Acquiring an R (RED) value, a G (GREEN) value, and a B (BLUE) value of each sample image;
Determining a first weight corresponding to the R value, a second weight corresponding to the G value and a third weight corresponding to the B value;
Calculating a weighted average value according to the R value, the G value, the B value, the first weight, the second weight and the third weight of each sample image to obtain a gray value of each sample image;
And converting each sample image according to the gray value of each sample image to obtain the first sample.
It will be appreciated that the sample image is typically a color image. To reduce the amount of computation, the sample image therefore needs to be converted to grayscale, that is, the original picture is converted into a grayscale image.
For example, repeated tests give suitable weights with a good processing effect: the first weight wr = 0.36, the second weight wg = 0.54, and the third weight wb = 0.10. The weighted average is then calculated and assigned to all three channels: R = G = B = wr × R + wg × G + wb × B = 0.36R + 0.54G + 0.10B, where on the right-hand side R represents the R value, G represents the G value, and B represents the B value.
By the implementation mode, the gray level processing can be carried out on the color image, and the complexity of image processing is reduced.
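The weighted average above can be sketched in a few lines of Python. This is a minimal illustration only: the nested-list image representation and the function names `to_gray` and `gray_image` are assumptions made for the example, and the weights are the example values given above.

```python
# Example weights from the text; treated here as tunable defaults.
W_R, W_G, W_B = 0.36, 0.54, 0.10

def to_gray(pixel):
    """Return the weighted-average gray value of one (R, G, B) pixel."""
    r, g, b = pixel
    return int(round(W_R * r + W_G * g + W_B * b))

def gray_image(rgb_rows):
    """Convert a nested list of (R, G, B) tuples into a nested list of gray values."""
    return [[to_gray(p) for p in row] for row in rgb_rows]

# A 2 x 2 color image: white, black, pure red, pure green.
sample = [[(255, 255, 255), (0, 0, 0)],
          [(255, 0, 0), (0, 255, 0)]]
gray = gray_image(sample)
```

Each pixel is reduced from three channel values to one gray value, which is what lowers the cost of the later steps.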
S11, determining a target threshold according to the first sample by using a normal distribution algorithm.
It can be understood that, in the processing of a Chinese character image, the pixels of the Chinese characters are normally distributed and the pixels of the background are also normally distributed, so that the gray histogram of the image shows two peaks with a trough between them. The threshold is selected to separate the Chinese character pixels from the background pixels, and is chosen as the pixel value at the trough. In actual processing, environmental factors such as illumination in the image to be identified make the difference between the peaks and the trough less obvious, which affects the threshold. A threshold that is too high may cause Chinese character pixels to be treated as background and removed. Therefore, the target threshold needs to be determined first.
In at least one embodiment of the present invention, said determining the target threshold from said first sample using a normal distribution algorithm comprises:
Identifying a text image and a background image of each first sample in the first samples;
acquiring the density of a text image of each first sample and the density of a background image of each first sample;
Acquiring the duty ratio of text image pixels of each first sample as a first duty ratio, and calculating the duty ratio of background image pixels of each first sample as a second duty ratio according to the first duty ratio;
calculating the mixed probability density of the text image and the background image of each first sample according to the first duty ratio, the second duty ratio, the density of the text image of each first sample and the density of the background image of each first sample;
Acquiring an initial threshold value;
Calculating the error probability sum of the text image and the background image of each first sample according to the initial threshold value and the mixed probability density of the text image and the background image of each first sample;
and when the error probability sum takes the minimum value, acquiring the value of the initial threshold value as the target threshold value.
For example, let the density, mean and variance of the text image be P1(x), u and m^2, respectively, and let the density, mean and variance of the background image be P2(x), v and n^2, respectively. Assuming that the proportion of text image pixels (the first duty ratio) is Q, the proportion of background image pixels (the second duty ratio) is (1 - Q), and the mixed probability density of the text image and the background image of each first sample is:
P(x) = Q × P1(x) + (1 - Q) × P2(x)
Assuming that the selected initial threshold is T, the probability of misjudgment of the text image pixel points is as follows:
the probability of mistaking the background image for Chinese characters is as follows:
the error probability sum of the text image and the background image of each first sample can be obtained by the method:
C(T) = Q × C1(T) + (1 - Q) × C2(T)
wherein k is a positive integer.
To minimize the error probability sum, Q = 0.5 is obtained by mathematical calculation.
At this time, the current value of T is determined as the target threshold.
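The expressions for the two misjudgment probabilities are not reproduced in the text above. As a hedged sketch only, the standard minimum-error formulation would read as follows, assuming that text pixels are darker than background pixels (u < v), so that a pixel with gray value at or below T is classified as text. The integration limits and the Gaussian forms are assumptions of this sketch, not expressions taken from the specification.

```latex
\begin{aligned}
P_1(x) &= \frac{1}{\sqrt{2\pi}\,m}\exp\!\left(-\frac{(x-u)^2}{2m^2}\right), &
P_2(x) &= \frac{1}{\sqrt{2\pi}\,n}\exp\!\left(-\frac{(x-v)^2}{2n^2}\right),\\
C_1(T) &= \int_{T}^{\infty} P_1(x)\,dx, &
C_2(T) &= \int_{-\infty}^{T} P_2(x)\,dx,\\
C(T) &= Q\,C_1(T) + (1-Q)\,C_2(T), &
\frac{dC}{dT} &= -Q\,P_1(T) + (1-Q)\,P_2(T) = 0 .
\end{aligned}
```

Under these assumptions the minimizing threshold satisfies Q P_1(T) = (1 - Q) P_2(T). In the special case Q = 0.5 with equal variances (m = n, an additional assumption), this reduces to T = (u + v) / 2, the midpoint between the two means.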
Through the implementation mode, the threshold value of the binarization processing can be accurately determined, so that the distinction between the characters and the background is more obvious, and misjudgment is avoided.
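The threshold search can also be illustrated numerically. The sketch below assumes that text pixels are darker than the background, that both classes follow normal distributions, and that candidate thresholds are the 256 gray levels; the function names and the parameter values in the usage line are illustrative assumptions rather than values from this embodiment.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def error_sum(t, q, u, m, v, n):
    """C(T) = Q*C1(T) + (1-Q)*C2(T), assuming text is darker than background."""
    c1 = 1.0 - normal_cdf(t, u, m)   # text pixel brighter than T, read as background
    c2 = normal_cdf(t, v, n)         # background pixel darker than T, read as text
    return q * c1 + (1.0 - q) * c2

def target_threshold(q, u, m, v, n, levels=256):
    """Scan every candidate gray level and keep the one with the smallest C(T)."""
    return min(range(levels), key=lambda t: error_sum(t, q, u, m, v, n))

# Illustrative values: text centred at 60, background at 180, equal spread.
threshold = target_threshold(q=0.5, u=60.0, m=15.0, v=180.0, n=15.0)
```

With equal proportions and equal standard deviations, the scan settles on the gray level midway between the two means, which matches the intuition of placing the threshold at the trough of the histogram.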
S12, preprocessing the first sample according to the target threshold value to obtain a second sample.
In at least one embodiment of the present invention, the preprocessing the first sample according to the target threshold value, to obtain a second sample includes:
performing binarization processing on the first sample according to the target threshold value to obtain a first image set;
carrying out noise reduction treatment on the first image set to obtain a second image set;
calculating the angle of the characteristic connecting line in the second image set by adopting a Hough transformation algorithm;
and correcting the angle of the characteristic connecting line in the second image set to a horizontal position according to a rotation algorithm to obtain the second sample.
In this embodiment, binarization is used to convert the first sample into a picture containing only black and white, which makes targeted processing of the features easier.
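A minimal sketch of binarization against the target threshold is given below. It assumes a nested-list grayscale image and the convention that 1 marks a black (text) pixel and 0 a white (background) pixel; both conventions are assumptions of the example.

```python
def binarize(gray_rows, threshold):
    """Map gray values to 1 (black, text) or 0 (white, background)."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray_rows]

binary = binarize([[12, 200, 119], [121, 120, 255]], threshold=120)
```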
Further, in practice a digital image is often disturbed by noise from the imaging device and the external environment during digitization and transmission; such an image is called a noisy image. The process of reducing noise in a digital image is called image noise reduction. Noise in an image has many sources, arising in image acquisition, transmission, compression and other stages. The types of noise also differ, such as salt-and-pepper noise and Gaussian noise, and different processing algorithms are used for different types of noise, which are not described here.
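The embodiment does not fix a particular noise reduction algorithm. As one possible illustration, a 3 x 3 median filter is a common choice against salt-and-pepper noise; the sketch below is an assumed example of that technique on a binary image and is not stated to be the algorithm used here.

```python
from statistics import median

def median_filter(image):
    """Apply a 3x3 median filter; border pixels are copied unchanged."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            result[r][c] = int(median(window))
    return result

noisy = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],   # isolated black pixel: salt-and-pepper noise
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
clean = median_filter(noisy)
```

The isolated pixel is outvoted by its eight white neighbours and disappears, while larger connected strokes would survive the filter.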
Further, an uploaded photo may not be horizontal when image recognition is performed. In most image preprocessing, the photo therefore needs to be rotated by a program so that it is as close to horizontal as possible, which allows it to be cut properly and yields a photo with a good effect. For tilt correction, a Hough transform algorithm is used. The principle of the Hough transform here is to connect the discontinuous characters into a straight line, which facilitates straight-line detection. After the angle of the straight line is calculated, a rotation algorithm corrects the tilted picture to the horizontal position, thereby correcting the characters in the image.
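The tilt correction can be sketched as Hough voting followed by a rotation. The example below is a simplified illustration under stated assumptions: black pixels are given as (x, y) coordinates, the search is restricted to lines within 45 degrees of horizontal at 1 degree resolution, and the names `skew_angle` and `rotate_points` are hypothetical.

```python
import math
from collections import Counter

def skew_angle(points, angle_range=45):
    """Estimate the angle, in degrees, of the dominant near-horizontal line.

    Each point votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it, where theta is the direction of the line's normal.
    """
    votes = Counter()
    for theta_deg in range(90 - angle_range, 90 + angle_range + 1):
        theta = math.radians(theta_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)
            votes[(theta_deg, rho)] += 1
    (best_theta, _), _ = votes.most_common(1)[0]
    return best_theta - 90   # a horizontal line has a normal at theta = 90

def rotate_points(points, degrees):
    """Rotate (x, y) points about the origin by the given angle."""
    a = math.radians(degrees)
    return [(x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
            for x, y in points]

# Pixels along a line tilted by 10 degrees, rounded to the pixel grid.
tilted = [(x, round(x * math.tan(math.radians(10)))) for x in range(60)]
angle = skew_angle(tilted)
levelled = rotate_points(tilted, -angle)
```

Rotating by the negative of the detected angle brings the line back to the horizontal position. In image coordinates with the y axis pointing down, the sign convention of the angle would need to be checked against the rotation routine actually used.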
Through the embodiment, the sample can be preprocessed, and the subsequent use is convenient.
S13, acquiring a preset cutting template, and cutting the second sample according to the cutting template to obtain a third sample.
It can be understood that layout segmentation and text segmentation can be performed on the image after the tilt correction of the picture is completed.
Before segmentation, the image to be recognized may be a picture of an identity card, a bank card, a business license or various other documents. Because the formats of such pictures are relatively fixed and follow certain rules, cutting templates for these picture types are preconfigured. The second sample is cut according to the cutting template and feature extraction is performed afterwards, which improves recognition accuracy and recognition efficiency in a more targeted way.
For example, different feature values can be extracted for different types of images, since the layouts of identity cards, bank cards and driver's licenses differ. During image segmentation, positions where the pixel count is zero are the boundaries between fields, that is, the cutting positions. Column cutting points can be obtained from the vertical histogram and row cutting points from the horizontal histogram; information such as row height and column width can then be derived and used to segment the second sample quickly.
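The histogram-based cutting points can be sketched as follows. The example assumes a binary image in which 1 marks a black pixel; the function names `projection` and `segments` are illustrative.

```python
def projection(binary, axis):
    """Count black pixels per row (axis='row') or per column (axis='col')."""
    if axis == "row":
        return [sum(row) for row in binary]
    return [sum(col) for col in zip(*binary)]

def segments(histogram):
    """Return (start, end) pairs of consecutive non-zero bins; end is exclusive."""
    spans, start = [], None
    for i, count in enumerate(histogram):
        if count and start is None:
            start = i
        elif not count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(histogram)))
    return spans

binary = [[1, 1, 0, 1, 0],
          [1, 1, 0, 1, 0],
          [0, 0, 0, 0, 0],
          [1, 0, 0, 1, 1]]
rows = segments(projection(binary, "row"))   # row cutting points
cols = segments(projection(binary, "col"))   # column cutting points
```

The empty row and the empty column are where the histograms drop to zero, so they become the cutting positions; the span lengths give the row heights and column widths.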
S14, acquiring digital features, English features and Chinese character features from the third sample.
It can be understood that, because the number of English letters and digits is limited, the digital features and the English features are relatively easy to extract and process, whereas the complexity of Chinese characters makes errors likely if they are recognized together with the others. In this embodiment, the digital features, the English features and the Chinese character features are therefore acquired separately, so that each can be processed accordingly later and the processing effect is improved.
S15, extracting the characteristic value of each Chinese character characteristic by taking the strokes of the Chinese characters as basic elements, and constructing an initial Chinese character sample according to the characteristic value of each Chinese character characteristic.
In at least one embodiment of the present invention, the extracting the feature value of each Chinese character feature using the strokes of the Chinese character as the basic element includes:
Dividing the region of each Chinese character feature to obtain a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph of each Chinese character feature;
randomly acquiring pixel points from each Chinese character feature as initial pixel points;
Determining the initial pixel point as a starting point, and detecting black pixel points in a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph of each Chinese character characteristic;
determining transverse strokes according to the number and the length of black pixel points in the detected transverse subgraph of each Chinese character feature;
Determining longitudinal strokes according to the number and the length of black pixel points in the detected longitudinal subgraph of each Chinese character feature;
determining oblique strokes according to the number and the length of black pixel points in the detected oblique subgraph of each Chinese character characteristic;
and constructing the characteristic value of each Chinese character characteristic according to the transverse stroke of each Chinese character characteristic, the longitudinal stroke of each Chinese character characteristic and the oblique stroke of each Chinese character characteristic.
Specifically, the Chinese character features are extracted as follows. First, the recognized Chinese character is divided into regions and decomposed into a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph; the initial value of the intersection count is set to zero and an image handle is obtained. Pixel points are then detected in sequence from left to right, and the number and length of each basic stroke of the Chinese character, such as horizontal and vertical strokes, are extracted. For example, when counting horizontal strokes, the initial count is set to 0 and each pixel is detected and verified in sequence from the starting point. When a black pixel is detected, its position is marked as the starting point of a horizontal stroke and the count is increased by 1. The count is then increased by 1 for every further pixel point of the horizontal stroke, until a blank pixel is detected and detection stops.
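The pixel-by-pixel counting can be sketched as a run-length scan. The example assumes a binary grid in which 1 marks a black pixel, and it models the horizontal, longitudinal and oblique scans as fixed step directions; the `DIRECTIONS` table and the function name are illustrative assumptions.

```python
# Scan directions as (row step, column step); names are illustrative.
DIRECTIONS = {"horizontal": (0, 1), "vertical": (1, 0), "oblique": (1, 1)}

def run_length(grid, row, col, direction):
    """Count consecutive black pixels (1) from (row, col) along a direction."""
    d_row, d_col = DIRECTIONS[direction]
    length = 0
    while 0 <= row < len(grid) and 0 <= col < len(grid[0]) and grid[row][col] == 1:
        length += 1                      # one more pixel belongs to the stroke
        row, col = row + d_row, col + d_col
    return length                        # stops at the first blank pixel or the edge

glyph = [[1, 1, 1, 1],
         [0, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 1]]
horizontal = run_length(glyph, 0, 0, "horizontal")
vertical = run_length(glyph, 1, 1, "vertical")
oblique = run_length(glyph, 2, 2, "oblique")
```

Only one oblique direction is shown; a left-falling scan would use the step (1, -1) in the same way.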
Further, the constructing an initial Chinese character sample according to the feature value of each Chinese character feature comprises:
Acquiring a length threshold value of a transverse stroke of each Chinese character feature as a transverse length threshold value, acquiring a length threshold value of a longitudinal stroke of each Chinese character feature as a longitudinal length threshold value, and acquiring a length threshold value of an oblique stroke of each Chinese character feature as an oblique length threshold value;
Acquiring the length of the transverse stroke, the length of the longitudinal stroke and the length of the oblique stroke of each Chinese character feature from the feature value of each Chinese character feature;
when the length of the detected transverse strokes with the Chinese character characteristics is greater than or equal to the transverse length threshold, determining the detected transverse strokes as target transverse strokes;
when the length of the detected longitudinal strokes with the Chinese character characteristics is greater than or equal to the longitudinal length threshold, determining the detected longitudinal strokes as target longitudinal strokes;
When the length of the detected oblique strokes with the Chinese character characteristics is larger than or equal to the threshold value of the oblique length, determining the detected oblique strokes as target oblique strokes;
And constructing Chinese character information corresponding to each Chinese character feature according to the target transverse strokes, the target longitudinal strokes and the target oblique strokes corresponding to each Chinese character feature, and obtaining the initial Chinese character sample.
Wherein, the Chinese character information includes, but is not limited to: the number of strokes, length, number of intersections, etc.
For example, a 32 × 32 dot matrix can be used in the design. The count for a horizontal stroke is compared with one quarter of the dot-matrix length: if it is smaller, the current stroke is not considered a horizontal stroke and can be removed as noise; only if it is greater than or equal to one quarter of the length is the current stroke considered a horizontal stroke, and its length is then recorded. Vertical strokes and oblique strokes (left-falling, right-falling, etc.) are processed in a similar way.
Through this implementation, Chinese character strokes can be used as basic elements to construct the Chinese character information of each Chinese character feature. This reflects both the differences between Chinese character structures and the structural common points of shape near words, so that the extraction of Chinese character features is optimized and the extracted features are more accurate.
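The length check in this example can be sketched as follows, assuming a 32 x 32 binary grid in which 1 marks a black pixel. The function name and the returned tuple layout are assumptions made for the illustration.

```python
LATTICE = 32                     # dot-matrix size from the example
MIN_LENGTH = LATTICE // 4        # shorter runs are treated as noise

def horizontal_strokes(grid, min_length=MIN_LENGTH):
    """Return (row, start_col, length) for each horizontal run that is long enough."""
    strokes = []
    for r, row in enumerate(grid):
        start = None
        for c, value in enumerate(row + [0]):   # sentinel closes a run at the row end
            if value == 1 and start is None:
                start = c
            elif value != 1 and start is not None:
                if c - start >= min_length:
                    strokes.append((r, start, c - start))
                start = None
    return strokes

grid = [[0] * LATTICE for _ in range(LATTICE)]
for c in range(4, 24):
    grid[10][c] = 1              # 20-pixel run: kept as a horizontal stroke
for c in range(2, 7):
    grid[20][c] = 1              # 5-pixel run: discarded as noise
strokes = horizontal_strokes(grid)
```

The same filter applies to vertical and oblique runs by scanning columns or diagonals against their own length thresholds.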
S16, obtaining a pre-configured uncommon word dictionary and a shape near word dictionary, and adding the uncommon word dictionary and the shape near word dictionary to the initial Chinese character sample to obtain a target Chinese character sample.
In this embodiment, the uncommon word dictionary and the shape near word dictionary may be custom-configured according to the actual application scenario, which is not limited by the present invention.
In this embodiment, the uncommon word dictionary and the shape near word dictionary (a dictionary of characters with similar shapes) are further introduced on the basis of the constructed Chinese character sample, so that targeted training can be carried out and the accuracy of recognizing uncommon words and shape near words is effectively improved.
S17, training a preset network according to the digital features, the English features and the target Chinese character samples to obtain an identification model.
The preset network may be a support vector machine (SVM).
Specifically, in the training process, indexes such as accuracy, recall rate and F (F-Measure) value of the model can be continuously obtained until the indexes such as accuracy, recall rate and F value reach the requirements, and training is stopped, so that the identification model can be obtained.
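The stopping check can be sketched as below. The example treats "accuracy" as per-class precision and uses a required level of 0.9; both are assumptions made for the illustration, since the embodiment does not specify the metric definitions or the required values.

```python
def precision_recall_f(true_labels, predicted_labels, positive):
    """Compute precision, recall and F-measure for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_value

def meets_requirements(metrics, minimum=0.9):
    """Training can stop once every index reaches the required level."""
    return all(value >= minimum for value in metrics)

# Two shape near characters: one sample of the second is misread as the first.
truth = ["大", "大", "太", "大", "太"]
guess = ["大", "大", "大", "大", "太"]
metrics = precision_recall_f(truth, guess, positive="大")
keep_training = not meets_requirements(metrics)
```

In this toy run the precision for the first character is 0.75, so the requirement is not met and training would continue.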
S18, acquiring an image to be identified, identifying the image to be identified by using the identification model, and generating an identification result according to the output data of the identification model.
In at least one embodiment of the present invention, the generating the recognition result according to the output data of the recognition model includes:
Invoking a pre-configured text library;
acquiring all features in the output data;
Matching in the text library by utilizing all the characteristics in the output data;
and determining the matched characters with all the characteristics as the identification result.
For example: when the output data contains the features of a horizontal stroke, a left-falling stroke and a right-falling stroke, and the number of intersection points is one, the character 大 ("large") can be matched in the text library.
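The matching step can be illustrated with a small lookup. The library contents, the feature names and the exact-match rule below are assumptions made for illustration only; the embodiment does not specify how the text library is stored.

```python
# Hypothetical text library: each character maps to its stroke features.
TEXT_LIBRARY = {
    "大": {"horizontal": 1, "left_falling": 1, "right_falling": 1, "intersections": 1},
    "十": {"horizontal": 1, "vertical": 1, "intersections": 1},
}


def match_character(features, library=TEXT_LIBRARY):
    """Return the characters whose stored features equal all output features."""
    return [char for char, reference in library.items() if reference == features]
```

With the features from the example (one horizontal stroke, one left-falling stroke, one right-falling stroke and one intersection point), `match_character` returns the single candidate 大.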
This embodiment preprocesses the image through a complete pipeline of graying, binarization, image noise reduction, inclination correction and text segmentation. In the binarization step of the preprocessing stage, suitable parameters and a suitable method are used to calculate a threshold that fits the usage scenario, so the text content can be effectively distinguished from the rest of the image. The feature extraction method divides each Chinese character into regions by structure, so that feature values such as the number and length of the basic structures of the character (transverse strokes, vertical strokes, oblique strokes and the like) can be obtained separately. Furthermore, when the feature values are used for classification training, feature learning for uncommonly used Chinese characters (rare words) and similar-shaped characters is also taken into account, which improves both the accuracy of image text recognition and the recognition performance.
Further, the identification result can be fed back to the user for selection.
It should be noted that, in order to further improve the security of the data and prevent the data from being maliciously tampered with, the identification model may be stored in a blockchain node.
According to the technical scheme, the image recognition method and the device can be combined with an artificial intelligence means to realize image recognition, improve the accuracy of image text recognition and improve the recognition performance.
Fig. 2 is a functional block diagram of an image recognition apparatus according to a preferred embodiment of the present invention. The image recognition device 11 includes a processing unit 110, a determining unit 111, a segmentation unit 112, an acquisition unit 113, a construction unit 114, an adding unit 115, a training unit 116, and a recognition unit 117. The module/unit referred to in the present invention refers to a series of computer program segments capable of being executed by the processor 13 and of performing a fixed function, which are stored in the memory 12. In the present embodiment, the functions of the respective modules/units will be described in detail in the following embodiments.
The processing unit 110 acquires a sample image, and performs graying processing on the sample image to obtain a first sample.
In this embodiment, the sample image may be obtained by using a web crawler technology, or may be directly obtained from a specified database, which is not limited by the present invention.
The specified database can comprise a database of any enterprise or platform, and sufficient images are stored in the specified database for training.
In at least one embodiment of the present invention, the processing unit 110 performing graying processing on the sample image to obtain a first sample includes:
Acquiring an R (RED) value, a G (GREEN) value, and a B (BLUE) value of each sample image;
Determining a first weight corresponding to the R value, a second weight corresponding to the G value and a third weight corresponding to the B value;
Calculating a weighted average value according to the R value, the G value, the B value, the first weight, the second weight and the third weight of each sample image to obtain a gray value of each sample image;
And converting each sample image according to the gray value of each sample image to obtain the first sample.
It will be appreciated that the sample image is typically a colour image; therefore, to reduce the amount of computation, the sample image needs to be converted to grayscale, that is, the original picture is converted into a grayscale image.
For example: through multiple tests, suitable weights were obtained; the processing effect is better with the first weight wr=0.36, the second weight wg=0.54, and the third weight wb=0.10. The weighted average is then calculated as R=G=B=wr*R+wg*G+wb*B=0.36R+0.54G+0.10B, where R, G and B on the right-hand side represent the original R value, G value and B value, and the result is assigned to all three channels.
By the implementation mode, the gray level processing can be carried out on the color image, and the complexity of image processing is reduced.
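The weighted average above can be sketched in a few lines. The weights are those given in the example; the function names and the representation of an image as nested lists of (R, G, B) tuples are assumptions for illustration.

```python
# Weights from the example above.
W_R, W_G, W_B = 0.36, 0.54, 0.10


def to_gray(pixel):
    """Weighted average of one (R, G, B) pixel, rounded to an integer gray value."""
    r, g, b = pixel
    return round(W_R * r + W_G * g + W_B * b)


def gray_image(rgb_image):
    """Convert an image given as rows of (R, G, B) tuples into rows of gray values."""
    return [[to_gray(pixel) for pixel in row] for row in rgb_image]
```

Because the three weights sum to 1, a white pixel (255, 255, 255) stays at 255 and a black pixel stays at 0.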
The determination unit 111 determines a target threshold value from the first sample using a normal distribution algorithm.
It can be understood that, in a Chinese character image, the gray values of the character pixels are normally distributed and those of the background pixels are also normally distributed; plotted in the same gray histogram, they form two peaks with a trough between them. The threshold is chosen to separate the character pixels from the background pixels, and is taken as the pixel value at the trough. In actual processing, environmental factors such as illumination in the image to be identified make the difference between the peaks and the trough less obvious, which affects the threshold. A threshold that is too high may cause character pixels to be treated as background and removed. Therefore, the target threshold needs to be determined first.
In at least one embodiment of the present invention, the determining unit 111 determining the target threshold value according to the first sample by using a normal distribution algorithm includes:
Identifying a text image and a background image of each first sample in the first samples;
acquiring the density of a text image of each first sample and the density of a background image of each first sample;
Acquiring the proportion of text image pixels of each first sample as a first proportion, and calculating the proportion of background image pixels of each first sample as a second proportion according to the first proportion;
calculating the mixed probability density of the text image and the background image of each first sample according to the first proportion, the second proportion, the density of the text image of each first sample and the density of the background image of each first sample;
Acquiring an initial threshold value;
Calculating the error probability sum of the text image and the background image of each first sample according to the initial threshold value and the mixed probability density of the text image and the background image of each first sample;
and when the error probability sum takes the minimum value, acquiring the value of the initial threshold value as the target threshold value.
For example, let the density, mean and variance of the text image be P1(x), u and m^2, respectively, and let the density, mean and variance of the background image be P2(x), v and n^2, respectively. Assuming that the proportion of text image pixels is Q, the proportion of background image pixels is (1-Q), and the mixed probability density of the text image and the background image of each first sample is:
P(x)=QP1(x)+(1-Q)P2(x)
Assuming that the selected initial threshold is T, the probability of misjudgment of the text image pixel points is as follows:
the probability of mistaking the background image for Chinese characters is as follows:
the error probability sum of the text image and the background image of each first sample can be obtained by the method:
C(T)=QC1(T)+(1-Q)C2(T)
wherein k is a positive integer.
To minimize the error probability sum, Q=0.5 is obtained by calculation,
At this time, the value of the current T is determined as the target threshold.
Through the implementation mode, the threshold value of the binarization processing can be accurately determined, so that the distinction between the characters and the background is more obvious, and misjudgment is avoided.
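The search for the threshold that minimizes the error probability sum C(T) can be sketched numerically. The expressions for C1(T) and C2(T) are not reproduced in this text, so the sketch assumes that the text is darker than the background (u < v), that C1(T) is the probability of a text pixel lying above T, and that C2(T) is the probability of a background pixel lying at or below T. The function names and the 0 to 255 search range are likewise assumptions.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def error_sum(t, q, u, m, v, n):
    """C(T) = Q*C1(T) + (1-Q)*C2(T) under the assumptions stated above."""
    c1 = 1.0 - normal_cdf(t, u, m)  # assumed: text pixel misjudged as background
    c2 = normal_cdf(t, v, n)        # assumed: background pixel misjudged as text
    return q * c1 + (1.0 - q) * c2


def find_target_threshold(q, u, m, v, n, levels=range(256)):
    """Return the gray level at which the error probability sum is smallest."""
    return min(levels, key=lambda t: error_sum(t, q, u, m, v, n))
```

With Q=0.5 and equal standard deviations m=n, the minimum falls midway between the two means; for instance, u=50 and v=200 give a threshold of 125.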
The processing unit 110 pre-processes the first sample according to the target threshold value to obtain a second sample.
In at least one embodiment of the present invention, the processing unit 110 preprocessing the first sample according to the target threshold value to obtain a second sample includes:
performing binarization processing on the first sample according to the target threshold value to obtain a first image set;
carrying out noise reduction treatment on the first image set to obtain a second image set;
calculating the angle of the characteristic connecting line in the second image set by adopting a Hough transformation algorithm;
and correcting the angle of the characteristic connecting line in the second image set to a horizontal position according to a rotation algorithm to obtain the second sample.
In this embodiment, binarization processing converts the first sample into a picture containing only black and white, which makes targeted processing of the features easier.
Further, in practice a digital image is often disturbed during digitization and transmission by noise from the imaging device and the external environment; such an image is called a noisy image. The process of reducing noise in a digital image is called image noise reduction. Noise in images has many sources, arising in image acquisition, transmission, compression and other stages. The types of noise also differ, for example salt-and-pepper noise and Gaussian noise, and different noise calls for different processing algorithms, which are not described here.
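Binarization followed by a simple noise-reduction pass can be sketched as below. The embodiment does not name a specific noise-reduction algorithm; the 3x3 median filter is one common choice for salt-and-pepper noise and is used here purely as an assumption, as is the convention that pixels at or below the threshold are text (value 1).

```python
def binarize(gray, threshold):
    """Map gray values to 1 (text, assumed darker) or 0 (background)."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


def median_filter_3x3(binary):
    """Replace each interior pixel by the median of its 3x3 neighbourhood."""
    height, width = len(binary), len(binary[0])
    result = [row[:] for row in binary]  # border pixels are kept unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = sorted(
                binary[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            result[y][x] = window[4]  # middle of the nine sorted values
    return result
```

An isolated dark speck on a light background becomes a single 1 after binarization and is removed by the median filter, while larger connected regions such as strokes are largely preserved.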
Further, a photo uploaded for image recognition may not be level, so in most image preprocessing the photo needs to be rotated by the program to a position as close to horizontal as possible before it is cut, which yields a better result. For the inclination correction, a Hough transform algorithm is used to bring the picture to the horizontal. The principle of the Hough transform is to connect discontinuous characters into a straight line, which makes straight-line detection easier. After the angle of the straight line is calculated, a rotation algorithm corrects the inclined picture to the horizontal position, thereby correcting the characters in the image.
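The angle estimation can be sketched with a small Hough-style voting scheme. This is a simplified illustration, not the embodiment's implementation: the function name, the one-degree angle step, the search range of plus or minus 45 degrees and the representation of black pixels as (x, y) coordinates are all assumptions.

```python
import math


def estimate_skew_degrees(black_pixels, max_skew=45):
    """Return the line inclination (in whole degrees) that collects the most votes.

    Each black pixel (x, y) votes, for every candidate angle a, for the line
    whose signed distance from the origin is rho = -x*sin(a) + y*cos(a).
    Pixels on one straight line with inclination a share the same rho.
    """
    votes = {}
    for angle in range(-max_skew, max_skew + 1):
        sin_a = math.sin(math.radians(angle))
        cos_a = math.cos(math.radians(angle))
        for x, y in black_pixels:
            key = (angle, round(-x * sin_a + y * cos_a))
            votes[key] = votes.get(key, 0) + 1
    (best_angle, _), _ = max(votes.items(), key=lambda item: item[1])
    return best_angle
```

The picture would then be rotated by the negative of the returned angle to bring the detected line back to the horizontal. The sign of the angle depends on whether the y axis points up or down in the image coordinate system.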
Through the embodiment, the sample can be preprocessed, and the subsequent use is convenient.
The splitting unit 112 obtains a preset cutting template, and splits the second sample according to the cutting template to obtain a third sample.
It can be understood that layout and text segmentation can be performed on the image after the inclination correction of the picture is completed.
The images to be identified may be pictures of many kinds, such as identity cards, bank cards and business licenses. Because the formats of these pictures are relatively fixed and follow certain rules, a cutting template is pre-configured for each type of picture. The second sample is cut according to the cutting template before feature extraction is carried out, which improves the accuracy and efficiency of recognition in a more targeted way.
For example: different kinds of images, such as identity cards, bank cards and driver's licenses, have different layouts, so the feature values extracted from them differ. During image segmentation, a position where the pixel count is zero is a division position between fields, that is, a cutting position. Column cutting points can be obtained from the vertical histogram and row cutting points from the horizontal histogram; information such as the row height and the column width can then be obtained, and the second sample is segmented quickly using this information.
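The histogram-based cutting described above can be sketched as a projection profile followed by a search for zero positions. The function names and the binary image representation (rows of 0 and 1, with 1 for black pixels) are assumptions for illustration.

```python
def projection(binary, axis):
    """Count black pixels per column ("vertical") or per row ("horizontal")."""
    if axis == "vertical":
        return [sum(column) for column in zip(*binary)]
    return [sum(row) for row in binary]


def cut_ranges(profile):
    """Return (start, end) index ranges where the profile is non-zero.

    Positions where the profile is zero are treated as cutting positions;
    end is exclusive.
    """
    ranges, start = [], None
    for index, count in enumerate(profile):
        if count > 0 and start is None:
            start = index
        elif count == 0 and start is not None:
            ranges.append((start, index))
            start = None
    if start is not None:
        ranges.append((start, len(profile)))
    return ranges
```

The width of each column range and the height of each row range give the column width and row height mentioned above.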
The obtaining unit 113 obtains a digital feature, an English feature, and a Chinese character feature from the third sample.
It can be understood that, because the number of English letters and digits is limited, the digital features and the English features are relatively easy to extract and process, whereas Chinese characters are complex and errors readily occur if they are identified together with the others. This embodiment therefore acquires the digital features, the English features and the Chinese character features separately so that each can be processed accordingly, which improves the processing effect.
The construction unit 114 extracts a feature value of each Chinese character feature using Chinese character strokes as basic elements, and constructs an initial Chinese character sample according to the feature value of each Chinese character feature.
In at least one embodiment of the present invention, the extracting, by the constructing unit 114, the feature value of each Chinese character feature using the strokes of the Chinese character as the basic elements includes:
Dividing the region of each Chinese character feature to obtain a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph of each Chinese character feature;
randomly acquiring pixel points from each Chinese character feature as initial pixel points;
Determining the initial pixel point as a starting point, and detecting black pixel points in a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph of each Chinese character characteristic;
determining transverse strokes according to the number and the length of black pixel points in the detected transverse subgraph of each Chinese character feature;
Determining longitudinal strokes according to the number and the length of black pixel points in the detected longitudinal subgraph of each Chinese character feature;
determining oblique strokes according to the number and the length of black pixel points in the detected oblique subgraph of each Chinese character characteristic;
and constructing the characteristic value of each Chinese character characteristic according to the transverse stroke of each Chinese character characteristic, the longitudinal stroke of each Chinese character characteristic and the oblique stroke of each Chinese character characteristic.
Specifically, when the Chinese character features are extracted, the recognized Chinese character is first decomposed, according to its region division, into a horizontal sub-graph, a longitudinal sub-graph and an oblique sub-graph; the initial value of the number of intersection points is set to zero and an image handle is obtained. The pixel points are then detected in sequence from left to right, and the number and length of each basic stroke of the Chinese character, such as transverse strokes and vertical strokes, are extracted. For example, when transverse strokes are processed, the count value of the transverse stroke is initialized to 0 and each pixel is checked in order from the starting point. When a black pixel is detected, its position is marked as the starting point of a transverse stroke and the count value is incremented by 1. The count value is then incremented by 1 for each further pixel of the transverse stroke until a blank pixel is detected, at which point detection stops.
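The transverse-stroke scan can be sketched as a search for runs of consecutive black pixels in each row. This is one reading of the description above; the function name, the left-to-right scan over every row and the output format are assumptions.

```python
def horizontal_runs(binary):
    """Return (row, start_column, length) for each run of black pixels.

    binary is a list of rows containing 1 for black and 0 for blank pixels.
    A run starts at the first black pixel and ends at the next blank pixel.
    """
    runs = []
    for y, row in enumerate(binary):
        start = None
        # A trailing blank pixel closes any run that reaches the row end.
        for x, value in enumerate(list(row) + [0]):
            if value == 1 and start is None:
                start = x
            elif value != 1 and start is not None:
                runs.append((y, start, x - start))
                start = None
    return runs
```

Applying the same function to the transposed image (`list(zip(*binary))`) yields candidate longitudinal strokes; oblique strokes would need a scan along the diagonals.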
Further, the constructing unit 114 constructs an initial chinese character sample according to the feature value of each chinese character feature, including:
Acquiring a length threshold value of a transverse stroke of each Chinese character feature as a transverse length threshold value, acquiring a length threshold value of a longitudinal stroke of each Chinese character feature as a longitudinal length threshold value, and acquiring a length threshold value of an oblique stroke of each Chinese character feature as an oblique length threshold value;
Acquiring the length of the transverse stroke, the length of the longitudinal stroke and the length of the oblique stroke of each Chinese character feature from the feature value of each Chinese character feature;
when the detected length of a transverse stroke of a Chinese character feature is greater than or equal to the transverse length threshold, determining the detected transverse stroke as a target transverse stroke;
when the detected length of a longitudinal stroke of a Chinese character feature is greater than or equal to the longitudinal length threshold, determining the detected longitudinal stroke as a target longitudinal stroke;
when the detected length of an oblique stroke of a Chinese character feature is greater than or equal to the oblique length threshold, determining the detected oblique stroke as a target oblique stroke;
And constructing Chinese character information corresponding to each Chinese character feature according to the target transverse strokes, the target longitudinal strokes and the target oblique strokes corresponding to each Chinese character feature, and obtaining the initial Chinese character sample.
Wherein, the Chinese character information includes, but is not limited to: the number of strokes, length, number of intersections, etc.
For example: in the design, a 32 x 32 dot matrix may be adopted, and it is judged whether the count value of a transverse stroke reaches one quarter of the dot matrix length. If the count value is less than one quarter of the length, the current stroke is not regarded as a transverse stroke and can be removed as noise; only when it is greater than or equal to one quarter of the length is the current stroke regarded as a transverse stroke, and its length is then recorded. Vertical strokes and oblique strokes (left-falling, right-falling, etc.) are processed in a similar way.
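The length filtering and the construction of the Chinese character information can be sketched as follows. The quarter-length threshold on a 32 x 32 dot matrix comes from the example; using the same threshold for all three stroke directions, the function names and the dictionary layout are assumptions for illustration.

```python
LATTICE_SIZE = 32
MIN_STROKE_LENGTH = LATTICE_SIZE // 4  # one quarter of the dot matrix length


def target_strokes(lengths, threshold=MIN_STROKE_LENGTH):
    """Keep only strokes whose length reaches the threshold; shorter runs are noise."""
    return [length for length in lengths if length >= threshold]


def build_character_info(transverse, longitudinal, oblique, intersections=0):
    """Assemble stroke counts, lengths and the number of intersections."""
    info = {}
    for name, lengths in (
        ("transverse", transverse),
        ("longitudinal", longitudinal),
        ("oblique", oblique),
    ):
        kept = target_strokes(lengths)
        info[name] = {"count": len(kept), "lengths": kept}
    info["intersections"] = intersections
    return info
```

For instance, transverse runs of lengths 20 and 3 yield one target transverse stroke of length 20, because the run of length 3 falls below the threshold of 8.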
Through this implementation, Chinese character information for each Chinese character feature can be constructed with Chinese character strokes as the basic elements. This information reflects both the structural differences between Chinese characters and the structural commonalities of similar-shaped characters, so the extraction of Chinese character features is optimized and the extracted features are more accurate.
The adding unit 115 obtains a pre-configured uncommon word dictionary and shape near word dictionary, and adds the uncommon word dictionary and the shape near word dictionary to the initial Chinese character sample to obtain a target Chinese character sample.
In this embodiment, the uncommon word dictionary and the shape near word dictionary may be custom-configured according to the actual application scenario, which is not limited by the present invention.
In this embodiment, the uncommon word dictionary and the shape near word dictionary (a dictionary of visually similar characters) are added to the constructed Chinese character sample so that training can target these cases, which effectively improves the recognition accuracy for uncommon characters and similar-shaped characters.
The training unit 116 trains a preset network according to the digital features, the english features and the target chinese character samples, and obtains an identification model.
The preset network may be an SVM (Support Vector Machine).
Specifically, during training, indicators of the model such as the accuracy, the recall rate and the F value (F-Measure) may be obtained continuously; training stops once these indicators meet the requirements, and the identification model is thereby obtained.
The recognition unit 117 acquires an image to be recognized, recognizes the image to be recognized using the recognition model, and generates a recognition result from output data of the recognition model.
In at least one embodiment of the present invention, the identifying unit 117 generates an identification result according to output data of the identification model, including:
Invoking a pre-configured text library;
acquiring all features in the output data;
Matching in the text library by utilizing all the characteristics in the output data;
and determining the matched characters with all the characteristics as the identification result.
For example: when the output data contains the features of a horizontal stroke, a left-falling stroke and a right-falling stroke, and the number of intersection points is one, the character 大 ("large") can be matched in the text library.
This embodiment preprocesses the image through a complete pipeline of graying, binarization, image noise reduction, inclination correction and text segmentation. In the binarization step of the preprocessing stage, suitable parameters and a suitable method are used to calculate a threshold that fits the usage scenario, so the text content can be effectively distinguished from the rest of the image. The feature extraction method divides each Chinese character into regions by structure, so that feature values such as the number and length of the basic structures of the character (transverse strokes, vertical strokes, oblique strokes and the like) can be obtained separately. Furthermore, when the feature values are used for classification training, feature learning for uncommonly used Chinese characters (rare words) and similar-shaped characters is also taken into account, which improves both the accuracy of image text recognition and the recognition performance.
Further, the identification result can be fed back to the user for selection.
It should be noted that, in order to further improve the security of the data and prevent the data from being maliciously tampered with, the identification model may be stored in a blockchain node.
According to the technical scheme, the image recognition method and the device can be combined with an artificial intelligence means to realize image recognition, improve the accuracy of image text recognition and improve the recognition performance.
Fig. 3 is a schematic structural diagram of a computer device for implementing a preferred embodiment of the image recognition method according to the present invention.
The computer device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program, such as an image recognition program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the computer device 1 and does not constitute a limitation of the computer device 1. The computer device 1 may have a bus structure or a star structure, and may comprise more or fewer hardware or software components than illustrated, or a different arrangement of components; for example, the computer device 1 may further comprise an input-output device, a network access device, etc.
It should be noted that the computer device 1 is only an example; other existing or future electronic products that can be adapted to the present invention should also be included in the scope of protection of the present invention and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium including flash memory, a removable hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 12 may in some embodiments be an internal storage unit of the computer device 1, such as a removable hard disk of the computer device 1. The memory 12 may also be an external storage device of the computer device 1 in other embodiments, such as a plug-in mobile hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash memory card (Flash Card) or the like, which are provided on the computer device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the computer device 1. The memory 12 may be used not only for storing application software installed in the computer device 1 and various types of data, such as the code of the image recognition program, but also for temporarily storing data that has been output or is to be output.
The processor 13 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, various control chips, and the like. The processor 13 is the control unit (Control Unit) of the computer device 1. It connects the respective components of the entire computer device 1 using various interfaces and lines, and executes various functions of the computer device 1 and processes data by running or executing programs or modules (for example, executing an image recognition program or the like) stored in the memory 12, and calling data stored in the memory 12.
The processor 13 executes the operating system of the computer device 1 and various types of applications installed. The processor 13 executes the application program to implement the steps of the various image recognition method embodiments described above, such as the steps shown in fig. 1.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the computer device 1. For example, the computer program may be divided into a processing unit 110, a determining unit 111, a segmentation unit 112, an acquisition unit 113, a construction unit 114, an adding unit 115, a training unit 116, and an identification unit 117.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a computer device, or a network device, etc.) or a processor (processor) to perform portions of the image recognition method according to the embodiments of the present invention.
The modules/units integrated in the computer device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, the present invention may also be implemented by a computer program for instructing a relevant hardware device to implement all or part of the procedures of the above-mentioned embodiment method, where the computer program may be stored in a computer readable storage medium and the computer program may be executed by a processor to implement the steps of each of the above-mentioned method embodiments.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), or the like.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one straight line is shown in fig. 3, but this does not mean that there is only one bus or one type of bus. The bus is arranged to enable connection and communication between the memory 12 and the at least one processor 13, among other components.
Although not shown, the computer device 1 may further comprise a power source (such as a battery) for powering the various components. Preferably, the power source may be logically connected to the at least one processor 13 via a power management means, so that functions such as charge management, discharge management and power consumption management are achieved by the power management means. The power source may also include one or more direct current or alternating current power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components. The computer device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail herein.
Further, the computer device 1 may also comprise a network interface, optionally comprising a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the computer device 1 and other computer devices.
The computer device 1 may optionally further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard), and optionally a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the computer device 1 and for displaying a visual user interface.
It should be understood that the described embodiments are for illustrative purposes only, and the scope of the patent application is not limited by this configuration.
Fig. 3 shows only a computer device 1 with components 12-13, it being understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the computer device 1 and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
With reference to fig. 1, the memory 12 in the computer device 1 stores a plurality of instructions for implementing an image recognition method, and the processor 13 can execute the plurality of instructions to implement:
Acquiring a sample image, and carrying out graying treatment on the sample image to obtain a first sample;
Determining a target threshold according to the first sample by using a normal distribution algorithm;
Preprocessing the first sample according to the target threshold value to obtain a second sample;
obtaining a preset cutting template, and cutting the second sample according to the cutting template to obtain a third sample;
Acquiring digital characteristics, English characteristics and Chinese character characteristics from the third sample;
extracting characteristic values of each Chinese character characteristic by taking Chinese character strokes as basic elements, and constructing an initial Chinese character sample according to the characteristic values of each Chinese character characteristic;
obtaining a pre-configured uncommon word dictionary and a shape near word dictionary, and adding the uncommon word dictionary and the shape near word dictionary to the initial Chinese character sample to obtain a target Chinese character sample;
training a preset network according to the digital characteristics, the English characteristics and the target Chinese character samples to obtain an identification model;
and acquiring an image to be identified, identifying the image to be identified by using the identification model, and generating an identification result according to the output data of the identification model.
Specifically, for the way in which the processor 13 implements the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated herein.
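By way of illustration only, the first three steps listed above (graying, determining a target threshold, and preprocessing according to the threshold) might be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the gray-level weighting, the rule "threshold = mean + k × standard deviation" used to stand in for the normal distribution algorithm, the binarization used as the preprocessing, and the names `to_gray`, `normal_threshold`, `binarize` and `k` are all hypothetical choices made for this example.

```python
from statistics import fmean, pstdev


def to_gray(image):
    """Convert rows of (R, G, B) tuples into rows of gray levels (0-255)."""
    # Assumption: a common luma weighting; the weighting used by the
    # embodiment of fig. 1 may differ.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


def normal_threshold(gray, k=0.0):
    """Fit a normal distribution to the gray levels and return mu + k * sigma."""
    # Assumption: the target threshold is derived from the fitted mean and
    # standard deviation; k is a hypothetical tuning parameter.
    pixels = [p for row in gray for p in row]
    mu = fmean(pixels)
    sigma = pstdev(pixels)
    return mu + k * sigma


def binarize(gray, threshold):
    """Map each gray level to 255 if it reaches the threshold, otherwise 0."""
    # Assumption: the preprocessing is a plain binarization.
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


# A tiny 2x2 "sample image" made of light and dark pixels.
sample_image = [
    [(250, 250, 250), (10, 10, 10)],
    [(20, 20, 20), (240, 240, 240)],
]

first_sample = to_gray(sample_image)                      # graying
target_threshold = normal_threshold(first_sample)         # 130.0 for this input
second_sample = binarize(first_sample, target_threshold)  # [[255, 0], [0, 255]]
```

In this sketch the light pixels are mapped to 255 and the dark pixels to 0; a non-zero `k` would shift the threshold toward the lighter or darker side of the fitted distribution.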
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a division by logical function, and other manners of division are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means stated in the invention may also be implemented by a single unit or means, through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.