Disclosure of Invention
The embodiment of the invention provides an image detection method for fusing image quality, which can output an image quality score while performing detection and has little influence on the running speed of the terminal device.
In a first aspect, an embodiment of the present invention provides an image detection method for fusing image quality, including:
detecting an image to be detected through a pre-trained convolutional neural network and extracting feature information of a detection frame, wherein the feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object;
performing non-maximum suppression according to the detection frame information to obtain target detection frame information;
calculating the image quality information corresponding to the target detection frame information with a pre-trained quality weight to obtain an image quality score of the target detection frame; and
outputting the target object based on the target detection frame information and the image quality score of the target detection frame.
Optionally, the detection frame information includes a confidence, and the detecting the image to be detected through a pre-trained convolutional neural network and extracting feature information of the detection frame includes:
extracting feature images at different grid scales through the pre-trained convolutional neural network, wherein the feature images include a predetermined number of image quality channels;
performing detection frame prediction on the feature images at the different grid scales, and extracting the target classification probability of each grid's detection frame from the corresponding feature images as the confidence of the detection frame;
extracting the image quality information of each detection frame according to the image quality channels to obtain the image quality information corresponding to each detection frame; and
obtaining the feature information of the detection frame based on the confidence and the image quality information corresponding to the detection frame.
Optionally, the detection frame information includes coordinate information of the detection frames, and the performing non-maximum suppression according to the detection frame information to obtain target detection frame information includes:
calculating the intersection-over-union (IoU) ratio between every two detection frames according to the coordinate information of the detection frames, and judging whether the IoU ratio is greater than a preset IoU threshold;
if the IoU ratio is greater than the preset IoU threshold, comparing the confidences of the two detection frames and retaining the detection frame with the greater confidence; and
traversing all the detection frames and deleting the detection frames whose confidence is smaller than a preset confidence threshold to obtain the target detection frame information.
Optionally, the calculating the image quality information corresponding to the target detection frame information with the pre-trained quality weight to obtain the image quality score of the target detection frame includes:
performing a dot product calculation on the image quality information corresponding to the target detection frame and the pre-trained quality weight to obtain the image quality score of the target detection frame.
Optionally, the outputting the target object based on the target detection frame information and the image quality score of the target detection frame includes:
outputting the corresponding image within the target detection frame as the target object according to the target detection frame information; and
outputting the image quality score of the target detection frame as the image quality score of the target object.
Optionally, the training of the convolutional neural network includes:
training the convolutional neural network through a first sample data set to which first image quality labels are added, so that the convolutional neural network can extract the corresponding target object and image quality information;
and the training of the quality weight includes:
training the quality weight through a second sample data set to which second image quality labels are added, so that the quality weight can extract the corresponding image quality score;
wherein the training of the convolutional neural network is decoupled from the training of the quality weight.
In a second aspect, an embodiment of the present invention provides an image detection apparatus that fuses image quality, including:
The extraction module is used for detecting an image to be detected through a pre-trained convolutional neural network and extracting feature information of a detection frame, wherein the feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object;
the processing module is used for performing non-maximum suppression according to the detection frame information to obtain target detection frame information;
the calculating module is used for calculating the image quality information corresponding to the target detection frame with the pre-trained quality weight to obtain an image quality score of the target detection frame; and
the output module is used for outputting the target object based on the target detection frame information and the image quality score of the target detection frame.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the image detection method for fusing image quality provided by the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the image detection method for fusing image quality provided by the embodiments of the present invention.
In the embodiment of the invention, an image to be detected is detected through a pre-trained convolutional neural network, and feature information of a detection frame is extracted, where the feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object; non-maximum suppression is performed according to the detection frame information to obtain target detection frame information; the image quality information corresponding to the target detection frame information is calculated with a pre-trained quality weight to obtain an image quality score of the target detection frame; and the target object is output based on the target detection frame information and the image quality score of the target detection frame. When the image to be detected is detected, the feature information of the detection frame, which fuses the detection frame information and the image quality information, is extracted, the target detection frame information is obtained through non-maximum suppression, and the image quality of the target detection frame is evaluated through the quality weight. No separate image quality evaluation needs to be performed after the target object is output; only a single evaluation step between the quality weight and the image quality information is added, and no extra time is spent separately extracting image quality information, so the influence on the running speed of the terminal device is small.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of an image detection method for fusing image quality according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
101. Detecting the image to be detected through a pre-trained convolutional neural network, and extracting feature information of a detection frame.
The feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object.
The detection frame information includes a detection frame confidence, detection frame coordinates, and a classification confidence. The detection frame confidence may be represented by one dimension, the detection frame coordinates may be represented by four dimensions, and the classification confidence may be represented by the same number of dimensions as the number of classes. For example, in face detection, the detection frame information includes a face frame confidence, face frame coordinates, and a face classification confidence: the face frame confidence represents the probability that the corresponding detection frame is a face frame, the face frame coordinates represent the position of the corresponding detection frame in the feature image, and the face classification confidence represents the probability that the image in the corresponding detection frame is classified as a face versus the other classes.
Therefore, the feature information can be represented by a (1+4+D+E)-dimensional vector, where D is the number of classification confidence dimensions (equal to the number of classes) and E is the number of dimensions required for the image quality information. When there are 80 classes and the image quality information is 64-dimensional, the feature information is a (1+4+80+64) = 149-dimensional vector; when there is only one class, i.e., only faces versus non-faces are classified, the feature information is a (1+4+1+64) = 70-dimensional vector. Of course, the image quality information may have other dimensions, such as 32, 128, or 256.
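As a concrete illustration of this layout, the following is a minimal Python sketch of the (1+4+D+E)-dimensional per-frame vector for the single-class case; the field names are illustrative and not part of the embodiment.

```python
# Layout of the fused (1+4+D+E)-dimensional per-frame feature vector for the
# single-class case; the field names are illustrative.
import numpy as np

D = 1    # number of classes (e.g. face vs. non-face)
E = 64   # image quality information dimension

feature = np.zeros(1 + 4 + D + E, dtype=np.float32)
frame_conf   = feature[0:1]        # detection frame confidence (1 dim)
frame_coords = feature[1:5]        # x, y, w, h of the detection frame (4 dims)
cls_conf     = feature[5:5 + D]    # classification confidence (D dims)
quality_info = feature[5 + D:]     # fused image quality information (E dims)

assert feature.shape[0] == 70      # (1 + 4 + 1 + 64) = 70, as in the example above
```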
The image to be detected may be an image obtained by the terminal device in real time, for example, an image or video captured by a monitoring camera in real time; the image to be detected may also be an image or video uploaded by a user. It should be noted that the image to be detected may contain one or more target objects to be detected; of course, if there is no target object in the image to be detected, a null value is output after detection.
The target object may be any detectable object, such as a person, a face, a vehicle, an article, an animal, or the like.
The convolutional neural network is trained in advance. It can be obtained by improving an existing convolutional neural network for detection: convolution kernels for extracting image quality information are added so that the convolutional neural network can extract both the feature image and the image quality information through convolution calculations. The number of added kernels depends on the required dimension of the image quality information; for example, if 64-dimensional image quality information is required, 64 convolution kernels are added to extract the image quality information of the image to be detected, yielding 64 dimensions of image quality information. Of course, the convolutional neural network also includes convolution kernels for extracting the detection frame confidence and the detection frame coordinates, and may further include convolution kernels for extracting the target classification confidence, and so on. The convolutional neural network further includes pooling kernels, which downsample the feature images after the convolution calculation to obtain feature images at different scales (the smaller the image scale, the larger the receptive field, and the larger the targets that can be detected).
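By way of a hedged sketch only, the following PyTorch code shows one possible form of such a prediction head: a 1x1 convolution whose output channels cover the detection frame confidence, coordinates, classification confidence, and the added image quality channels. The class name, backbone channel count, and layer arrangement are assumptions, not the embodiment's exact network.

```python
# A sketch of a detection head whose output channels include the added
# image quality channels, assuming a backbone feature map as input.
import torch
import torch.nn as nn

class DetectionQualityHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int = 1,
                 quality_dim: int = 64, num_anchors: int = 3):
        super().__init__()
        per_frame = 1 + 4 + num_classes + quality_dim
        # One 1x1 convolution emits all per-frame fields for every anchor;
        # the last quality_dim channels per frame are the image quality channels.
        self.pred = nn.Conv2d(in_channels, num_anchors * per_frame, kernel_size=1)
        self.num_anchors, self.per_frame = num_anchors, per_frame

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        out = self.pred(x)
        # reshape to (batch, grid_h, grid_w, K, 1+4+D+E)
        return out.view(b, self.num_anchors, self.per_frame, h, w).permute(0, 3, 4, 1, 2)

head = DetectionQualityHead(in_channels=256)
feats = head(torch.randn(1, 256, 13, 13))   # -> shape (1, 13, 13, 3, 70)
```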
It should be noted that the convolutional neural network may also be referred to as a detector or a detection module.
Referring to FIG. 2, FIG. 2 is a flowchart of a method for extracting feature information of a detection frame according to an embodiment of the present invention. The method in FIG. 2 includes:
201. Extracting feature images at different grid scales through the pre-trained convolutional neural network.
The feature image includes a predetermined number of image quality channels.
In this step, the grid scale refers to dividing feature images of different sizes into grids; a feature image may be divided into S×S grids. A larger-scale feature image is divided with more grids, and a smaller-scale feature image is divided with fewer grids. The scale of a feature image is related to the number of downsampling operations: the more downsampling, the smaller the image scale. Specifically, the convolutional neural network can obtain feature images at three different scales through a first downsampling, a second downsampling, and a third downsampling. More specifically, the first downsampling may be 8x, the second downsampling may be 16x, and the third downsampling may be 32x.
In a possible embodiment, the convolutional neural network may also perform upsampling starting from the 32x downsampled feature image: a 2x upsampling yields a feature image corresponding to 16x downsampling, and a further 2x upsampling of that feature image yields a feature image corresponding to 8x downsampling.
Through the above downsampling, a first feature image corresponding to the first downsampling, a second feature image corresponding to the second downsampling, and a third feature image corresponding to the third downsampling can be obtained, which are divided by the convolution operations into S1×S1, S2×S2, and S3×S3 grids respectively, where S1 > S2 > S3, specifically S1 = 2S2 = 4S3. More specifically, assuming the input is a 416×416×3 image to be detected, S1 = 52, S2 = 26, and S3 = 13, meaning that the first feature image is divided into 52×52 grids, the second into 26×26 grids, and the third into 13×13 grids, and each grid can serve as an anchor point for predicting the target object.
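The grid sizes quoted above follow directly from the downsampling strides; a quick Python check (assuming strides of 8, 16, and 32):

```python
# Quick check of the grid sizes quoted above for a 416x416 input,
# assuming strides of 8, 16, and 32 for the three downsampling stages.
input_size = 416
strides = (8, 16, 32)
grids = [input_size // s for s in strides]
print(grids)  # [52, 26, 13], i.e. S1 = 52, S2 = 26, S3 = 13, so S1 = 2*S2 = 4*S3
```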
The image quality channels are the channels obtained by feature extraction with the added convolution kernels.
202. Predicting the feature images at the different grid scales through the detection frames, and extracting the target classification probability of each grid's detection frame in the corresponding feature images as the confidence of the detection frame.
The number of detection frames may be set to K, and the K detection frames differ from one another; that is, each grid predicts the target object once with each of the K detection frames. With each grid in the first, second, and third feature images predicting once per detection frame, a total of (13×13×K + 26×26×K + 52×52×K) predictions are made. The feature information of each predicted detection frame is a (1+4+D+E)-dimensional vector. When the number of classes is 1 and the image quality information is 64-dimensional, the feature information of each predicted detection frame is a (1+4+1+64)-dimensional vector; assuming K = 3, the output of the entire third feature image is then a 13×13×(1+4+1+64)×3-dimensional vector, the output of the second feature image is a 26×26×(1+4+1+64)×3-dimensional vector, and the output of the first feature image is a 52×52×(1+4+1+64)×3-dimensional vector. It should be noted that the above detection frames are preset based on anchor points and may also be referred to as anchor boxes. The anchor points add prior information and guide early training so that the model regresses better detection frames.
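The prediction counts and output shapes described above can be verified with a few lines of Python (K = 3 detection frames per grid, one class, 64-dimensional quality information, as in the running example):

```python
# Prediction counts and per-scale output shapes for K = 3 detection frames
# per grid, one class (D = 1), and 64-dimensional quality information (E = 64).
K, D, E = 3, 1, 64
per_frame = 1 + 4 + D + E                    # = 70 dimensions per detection frame
grids = (52, 26, 13)

total_predictions = sum(s * s * K for s in grids)
print(total_predictions)                     # (52*52 + 26*26 + 13*13) * 3 = 10647

for s in grids:
    print((s, s, K * per_frame))             # (52, 52, 210), (26, 26, 210), (13, 13, 210)
```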
The image in each grid corresponding to a detection frame is extracted for target classification prediction. Since each grid corresponds to K detection frames, each detection frame performs a target classification prediction: target classification is computed for the image in the detection frame, the probability that the target belongs to each class is calculated, and this probability is used as the confidence of the detection frame. Taking face detection as an example, the image in the detection frame corresponding to each grid is extracted for face prediction, and the probability that the image in the corresponding detection frame is a face is obtained as the face frame confidence; the higher the face frame confidence, the higher the confidence that the detection frame is a face detection frame.
203. Extracting the image quality information of each detection frame according to the image quality channels to obtain the image quality information corresponding to each detection frame.
In this step, the image quality information corresponding to each detection frame is also extracted: each grid extracts, through its K detection frames, the channel values of the image quality channels of the feature image, obtaining the image quality information corresponding to each detection frame.
204. Obtaining the feature information of the detection frame based on the confidence and the image quality information corresponding to the detection frame.
For example, a 416×416×3 image is input, where 3 denotes the three RGB channels. After the convolution calculations, pooling with the pooling kernels in the first, second, and third pooling layers yields the feature information of 52×52×K detection frames in the first-scale feature image, 26×26×K detection frames in the second-scale feature image, and 13×13×K detection frames in the third-scale feature image.
102. Performing non-maximum suppression according to the detection frame information to obtain the target detection frame information.
In this step, the feature information of the detection frame includes the detection frame information, and the detection frame information includes the confidence and the coordinate information of the detection frame. The coordinate information of the detection frame consists of four dimensions: the x and y coordinates of the center point (anchor point) in the pixel coordinate system, plus the height (h) and width (w) of the detection frame. In a possible embodiment, the coordinate information of the detection frame may also be the coordinates of two diagonal points in the pixel coordinate system, such as diagonal points (x1, y1) and (x2, y2), from which the center point x = (x1+x2)/2, y = (y1+y2)/2, height h = |y1-y2|, and width w = |x1-x2| can be obtained. It should be noted that, in the embodiment of the present invention, the detection frames are rectangular.
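A minimal Python sketch of the conversion between the two coordinate representations mentioned above (diagonal corner points to center point plus width and height):

```python
# Converting diagonal corner points (x1, y1), (x2, y2) into the
# center/width/height form used for the detection frame coordinates.
def corners_to_center(x1, y1, x2, y2):
    x = (x1 + x2) / 2.0       # center x
    y = (y1 + y2) / 2.0       # center y
    w = abs(x1 - x2)          # width
    h = abs(y1 - y2)          # height
    return x, y, w, h

print(corners_to_center(10, 20, 50, 100))   # (30.0, 60.0, 40, 80)
```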
When performing non-maximum suppression, the detection frames in the first, second, and third feature images may be mapped into an image of the same size. For example, the detection frames may be mapped into the original image, or the detection frames in the second and third feature images may be mapped into the largest-scale feature image, i.e., the first feature image.
Non-maximum suppression can be understood as suppressing the detection frames whose confidence is not the maximum. Specifically, the intersection-over-union (IoU) ratio between every two detection frames can be calculated according to their coordinate information, and whether the IoU ratio is greater than a preset IoU threshold is judged. The IoU ratio is the ratio of the area of the intersection of the two detection frames to the area of their union; its maximum value is 1, and the larger the IoU ratio, the closer and more similar the two detection frames. Since each grid generates K detection frames, and nearby grids also generate multiple intersecting detection frames, one target object may be covered by many detection frames clustered in one region, yet each target object has only one optimal detection frame. Therefore, the redundant detection frames of a target object can be suppressed by non-maximum suppression, retaining the detection frame with the highest confidence as the target detection frame.
If the IoU ratio is greater than the preset IoU threshold, the confidences of the two detection frames are compared and the detection frame with the greater confidence is retained; an IoU ratio above the threshold indicates that the two detection frames are sufficiently close and sufficiently similar in shape.
All the detection frames are traversed, and the detection frames whose confidence is smaller than the preset confidence threshold are deleted to obtain the target detection frame information. Since the image to be detected may contain one or more target objects, there may be one or more retained detection frames: after non-maximum suppression, each target object has exactly one detection frame, which is the target detection frame. The target detection frame information includes the confidence and the detection frame position.
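For illustration, the following is a hedged NumPy sketch of the non-maximum suppression procedure described above; the threshold values and corner-format boxes are assumptions for the example.

```python
# A greedy non-maximum suppression sketch. Boxes are (x1, y1, x2, y2);
# threshold values are illustrative.
import numpy as np

def iou(a, b):
    # intersection-over-union ratio of two corner-format boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5, conf_thresh=0.3):
    # visit frames in descending confidence; keep a frame only if it does not
    # overlap an already-kept frame beyond the IoU threshold, and drop frames
    # below the confidence threshold
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if scores[i] < conf_thresh:
            continue
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```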
103. Calculating the image quality information corresponding to the target detection frame information with the pre-trained quality weight to obtain the image quality score of the target detection frame.
Since each detection frame also carries the image quality information extracted in step 203, the target detection frame likewise includes image quality information; that is, the target detection frame includes the information of the detection frame itself (the detection frame confidence, the detection frame coordinates, and the target classification confidence) as well as the image quality information.
The image quality information is calculated with the pre-trained quality weight to obtain the image quality score of the target detection frame. For example, taking face detection as an example, the pre-trained quality weight is a face quality weight, and the dot product of the face quality information and the quality weight is computed to obtain a face quality score. The image quality score is a scalar.
In one possible embodiment, since the image quality information is fused into the feature information of the target detection frame, the detection frame information in the feature information may first be masked so that it does not participate in the dot product calculation with the quality weight. Alternatively, the dimensions corresponding to the image quality information may be extracted from the feature information of the target detection frame and dot-multiplied with the quality weight.
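A minimal NumPy sketch of this dot product, slicing the quality dimensions out of the fused (1+4+D+E) vector (the random values stand in for the real features and the pre-trained weight):

```python
# Slicing the quality dimensions out of the fused (1+4+D+E) feature vector
# and dot-multiplying with the quality weight; random values are stand-ins.
import numpy as np

D, E = 1, 64
feature = np.random.rand(1 + 4 + D + E).astype(np.float32)   # fused per-frame vector
quality_weight = np.random.rand(E).astype(np.float32)        # pre-trained weight (stand-in)

quality_info = feature[1 + 4 + D:]                           # only the E quality dims participate
quality_score = float(np.dot(quality_info, quality_weight))  # scalar image quality score
print(quality_score)
```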
104. Outputting the target object based on the target detection frame information and the image quality score of the target detection frame.
In this step, the target detection frame information has been obtained in step 102. According to the detection frame coordinates in the target detection frame information, the target detection frame can be projected into the image to be detected to obtain the position of the target object, and the corresponding detection frame and confidence are displayed in the image to be detected while the corresponding image quality score is also output to evaluate the image quality of the target object. Alternatively, the image region corresponding to the target detection frame in the image to be detected can be extracted as the target object, and the extracted image can be used for subsequent tasks, such as building a profile from an extracted face or performing face structuring on it.
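As a hedged illustration of extracting the image region corresponding to the target detection frame, the following sketch crops a center-format box out of an image array; the helper name and box values are assumptions.

```python
# Cropping the image region of a center-format target detection frame
# out of the image to be detected; crop_target is a hypothetical helper.
import numpy as np

def crop_target(image, box):
    x, y, w, h = box                              # center x, center y, width, height
    x1, y1 = int(x - w / 2), int(y - h / 2)
    x2, y2 = int(x + w / 2), int(y + h / 2)
    return image[max(y1, 0):y2, max(x1, 0):x2]    # numpy clamps the upper bounds

image = np.zeros((416, 416, 3), dtype=np.uint8)   # stand-in for the input image
face = crop_target(image, (208, 208, 80, 100))    # e.g. extract a face region
print(face.shape)                                 # (100, 80, 3)
```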
In the embodiment of the invention, an image to be detected is detected through a pre-trained convolutional neural network, and feature information of a detection frame is extracted, where the feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object; non-maximum suppression is performed according to the detection frame information to obtain target detection frame information; the image quality information corresponding to the target detection frame information is calculated with a pre-trained quality weight to obtain an image quality score of the target detection frame; and the target object is output based on the target detection frame information and the image quality score of the target detection frame. When the image to be detected is detected, the feature information of the detection frame, which fuses the detection frame information and the image quality information, is extracted, the target detection frame information is obtained through non-maximum suppression, and the image quality of the target detection frame is evaluated through the quality weight. No separate image quality evaluation needs to be performed after the target object is output; only a single evaluation step between the quality weight and the image quality information is added, and no extra time is spent separately extracting image quality information, so the influence on the running speed of the terminal device is small.
It should be noted that the image detection method for fusing image quality provided by the embodiment of the invention can be applied to devices that need to perform image detection, such as a mobile phone, a monitor, an access control machine, a computer, and a server.
Optionally, when calculating the image quality information of the target detection frames, the target objects may be ranked according to their image quality scores. The ranking of image quality scores can be converted into a classification problem through an SVM (support vector machine) or RankNet (a ranking network), and the ranking can be fused into the quality weight, so that once the image quality scores are calculated the target objects are ranked by image quality score.
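Once each target object carries an image quality score, the ranking itself reduces to a sort; a trivial Python illustration with placeholder scores:

```python
# Ranking detected target objects by image quality score; scores are placeholders.
targets = [("face_0", 0.62), ("face_1", 0.91), ("face_2", 0.47)]
ranked = sorted(targets, key=lambda t: t[1], reverse=True)
print(ranked)   # face_1 first: highest image quality score
```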
Alternatively, the convolutional neural network (the detector) and the quality weight may be trained separately; that is, the training of the convolutional neural network is decoupled from the training of the quality weight. This prevents the error feedback of the convolutional neural network and that of the quality weight from interfering with each other during training, which could ultimately keep the model from converging.
The convolutional neural network may be trained through a first sample data set, where the first sample data set includes sample object frames, sample classification labels, and first image quality labels, so that the convolutional neural network learns the corresponding target object classification and detection frames while also being able to extract image quality information.
The quality weight may be trained through a second sample data set, where the second sample data set includes second image quality labels. The sample data in the second sample data set may be the image quality information extracted by the convolutional neural network, and the second image quality labels may be the same as the first image quality labels.
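A hedged PyTorch-style sketch of this decoupled, two-stage schedule (the module shapes are placeholders; only the freeze-then-train pattern is the point):

```python
# Two-stage, decoupled training: train the detector first, then freeze it
# and train only the quality weight. Module shapes are placeholders.
import torch
import torch.nn as nn

detector = nn.Conv2d(256, 3 * 70, kernel_size=1)   # stand-in for the detection network
quality_weight = nn.Linear(64, 1, bias=False)      # the trainable quality weight

# Stage 1: optimize the detector (detection fields + quality channels) alone.
det_opt = torch.optim.SGD(detector.parameters(), lr=1e-3)

# Stage 2: freeze the detector so its error feedback cannot disturb the
# quality weight, then optimize the quality weight alone.
for p in detector.parameters():
    p.requires_grad = False
qw_opt = torch.optim.SGD(quality_weight.parameters(), lr=1e-3)
```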
In one possible embodiment, the convolution kernels used to extract the image quality information in the convolutional neural network may be taken from a pre-trained image quality evaluation network, and the extracted, already trained convolution kernels are then fused into the convolutional neural network. In this way, the convolution kernels used to extract the image quality information need not be retrained.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of an image detection device for fusing image quality according to an embodiment of the present invention. As shown in FIG. 3, the device includes:
the extracting module 301, configured to detect an image to be detected through a pre-trained convolutional neural network and extract feature information of a detection frame, where the feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object;
the processing module 302, configured to perform non-maximum suppression according to the detection frame information to obtain target detection frame information;
the calculating module 303, configured to calculate the image quality information corresponding to the target detection frame information with the pre-trained quality weight to obtain an image quality score of the target detection frame; and
the output module 304, configured to output the target object based on the target detection frame information and the image quality score of the target detection frame.
Optionally, as shown in FIG. 4, the detection frame information includes a confidence, and the extracting module 301 includes:
a first extraction unit 3011, configured to extract feature images at different grid scales through the pre-trained convolutional neural network, where the feature images include a predetermined number of image quality channels;
a second extraction unit 3012, configured to perform detection frame prediction on the feature images at the different grid scales and extract the target classification probability of each grid's detection frame in the corresponding feature images as the confidence of the detection frame;
a third extraction unit 3013, configured to extract the image quality information of each detection frame according to the image quality channels to obtain the image quality information corresponding to each detection frame; and
an obtaining unit 3014, configured to obtain the feature information of the detection frame based on the confidence and the image quality information corresponding to the detection frame.
Optionally, as shown in FIG. 5, the detection frame information includes coordinate information of the detection frames, and the processing module 302 includes:
a calculating unit 3021, configured to calculate the IoU ratio between every two detection frames according to the coordinate information of the detection frames, and judge whether the IoU ratio is greater than a preset IoU threshold;
a retaining unit 3022, configured to compare the confidences of the two detection frames and retain the detection frame with the greater confidence if the IoU ratio is greater than the preset IoU threshold; and
a deleting unit 3023, configured to traverse all the detection frames and delete the detection frames whose confidence is smaller than the preset confidence threshold to obtain the target detection frame information.
Optionally, the calculating module 303 is further configured to perform a dot product calculation on the image quality information corresponding to the target detection frame and the pre-trained quality weight to obtain the image quality score of the target detection frame.
Optionally, the output module 304 is further configured to output the corresponding image within the target detection frame as the target object according to the target detection frame information, and to output the image quality score of the target detection frame as the image quality score of the target object.
Optionally, as shown in FIG. 6, the device further includes:
a first training module 305, configured to train the convolutional neural network through a first sample data set to which first image quality labels are added, so that the convolutional neural network can extract the corresponding target object and image quality information; and
a second training module 306, configured to train the quality weight through a second sample data set to which second image quality labels are added, so that the quality weight can extract the corresponding image quality score;
wherein the training of the convolutional neural network is decoupled from the training of the quality weight.
It should be noted that the image detection device for fusing image quality provided by the embodiment of the invention can be applied to devices that need to perform image detection, such as a mobile phone, a monitor, an access control machine, a computer, and a server.
The image detection device for the fused image quality provided by the embodiment of the invention can realize all the processes realized by the image detection method for the fused image quality in the method embodiment, and can achieve the same beneficial effects. In order to avoid repetition, a description thereof is omitted.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 7, the electronic device includes: a memory 702, a processor 701, and a computer program stored on the memory 702 and executable on the processor 701, wherein:
the processor 701 is configured to call a computer program stored in the memory 702, and perform the following steps:
detecting an image to be detected through a pre-trained convolutional neural network and extracting feature information of a detection frame, wherein the feature information fuses detection frame information and image quality information corresponding to the detection frame, and the image to be detected includes a target object;
performing non-maximum suppression according to the detection frame information to obtain target detection frame information;
calculating the image quality information corresponding to the target detection frame information with the pre-trained quality weight to obtain an image quality score of the target detection frame; and
outputting the target object based on the target detection frame information and the image quality score of the target detection frame.
Optionally, the detection frame information includes a confidence, and the detecting the image to be detected through the pre-trained convolutional neural network and extracting feature information of the detection frame, as performed by the processor 701, includes:
extracting feature images at different grid scales through the pre-trained convolutional neural network, wherein the feature images include a predetermined number of image quality channels;
performing detection frame prediction on the feature images at the different grid scales, and extracting the target classification probability of each grid's detection frame from the corresponding feature images as the confidence of the detection frame;
extracting the image quality information of each detection frame according to the image quality channels to obtain the image quality information corresponding to each detection frame; and
obtaining the feature information of the detection frame based on the confidence and the image quality information corresponding to the detection frame.
Optionally, the detection frame information includes coordinate information of the detection frames, and the performing non-maximum suppression according to the detection frame information to obtain target detection frame information, as performed by the processor 701, includes:
calculating the IoU ratio between every two detection frames according to the coordinate information of the detection frames, and judging whether the IoU ratio is greater than a preset IoU threshold;
if the IoU ratio is greater than the preset IoU threshold, comparing the confidences of the two detection frames and retaining the detection frame with the greater confidence; and
traversing all the detection frames and deleting the detection frames whose confidence is smaller than a preset confidence threshold to obtain the target detection frame information.
Optionally, the calculating, performed by the processor 701, the image quality information corresponding to the target detection frame information with the pre-trained quality weight to obtain the image quality score of the target detection frame includes:
performing a dot product calculation on the image quality information corresponding to the target detection frame and the pre-trained quality weight to obtain the image quality score of the target detection frame.
Optionally, the outputting, performed by the processor 701, the target object based on the target detection frame information and the image quality score of the target detection frame includes:
outputting the corresponding image within the target detection frame as the target object according to the target detection frame information; and
outputting the image quality score of the target detection frame as the image quality score of the target object.
Optionally, the training of the convolutional neural network, as performed by the processor 701, includes:
training the convolutional neural network through a first sample data set to which first image quality labels are added, so that the convolutional neural network can extract the corresponding target object and image quality information;
and the training of the quality weight includes:
training the quality weight through a second sample data set to which second image quality labels are added, so that the quality weight can extract the corresponding image quality score;
wherein the training of the convolutional neural network is decoupled from the training of the quality weight.
The electronic device may be a mobile phone, a monitor, an access control device, a computer, a server, or another device that needs to perform image detection.
The electronic device provided by the embodiment of the invention can implement each process implemented by the image detection method for fusing image quality in the method embodiment and can achieve the same beneficial effects. To avoid repetition, details are not described here again.
The embodiment of the invention also provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the image detection method for fusing image quality provided by the embodiment of the invention and can achieve the same technical effects. To avoid repetition, details are not described here again.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.