Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides a training method and a training device for a pedestrian image quality evaluation model, which improve the accuracy and the reliability of pedestrian image quality evaluation.
In a first aspect, the present application provides a training method of a pedestrian image quality evaluation model, the method comprising:
acquiring a pedestrian image dataset, wherein the pedestrian image dataset comprises a plurality of pedestrian sample images and pedestrian label images corresponding to the pedestrian sample images;
inputting the pedestrian image data set into a pedestrian image quality evaluation model to be trained, and carrying out convolution processing through a backbone network of the pedestrian image quality evaluation model to obtain characteristic information of the pedestrian sample image and characteristic information of the pedestrian label image output by the backbone network, wherein the characteristic information of the pedestrian sample image comprises a sample fine granularity characteristic, a sample sharpness characteristic and a sample texture characteristic, and the characteristic information of the pedestrian label image comprises a label fine granularity characteristic, a label sharpness characteristic and a label texture characteristic;
acquiring feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image;
and updating network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain the trained pedestrian image quality evaluation model.
According to the training method of the pedestrian image quality evaluation model, the acquired pedestrian image data set is input into the pedestrian image quality evaluation model to be trained, and convolution processing is carried out through the backbone network of the pedestrian image quality evaluation model. The global semantic features and the local information features of the pedestrian image are mined, and feature information such as the fine granularity, sharpness and texture of the pedestrian sample image and the pedestrian label image is obtained, so that the pedestrian image quality is evaluated more comprehensively and objectively. The network parameters of the pedestrian image quality evaluation model are trained and updated based on the feature similarity, which improves the accuracy and reliability of the model in evaluating pedestrian image quality.
According to one embodiment of the present application, the convolution processing by the backbone network of the pedestrian image quality evaluation model includes:
performing multi-order center difference convolution on the images of the pedestrian image data set through the backbone network.
According to one embodiment of the present application, the performing multi-order center difference convolution on the image of the pedestrian image data set through the backbone network includes:
applying the formula:

y(p0) = θ · Σ_{pn∈R} ωn · x(p0 + pn) + λ · Σ_{pn∈R} ωn · Δx(p0 + pn)

wherein p0 is the convolution center position, θ and λ are hyperparameters, ωn is the weight of the convolution kernel at position pn, x(p0 + pn) is the value of the feature layer at position pn, R is the range of the convolution kernel, y(p0) is the feature output at position p0 after center difference convolution, and Δx(p0 + pn) is the multi-order center difference corresponding to x(p0 + pn).
According to one embodiment of the application, the backbone network is ResNet18, and the backbone network comprises a convolution layer, a central difference convolution layer, a batch normalization layer, an activation function layer, a global pooling layer and an average pooling layer, wherein the central difference convolution layer is used for performing multi-order central difference convolution.
According to an embodiment of the present application, the acquiring feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian tag image includes:
calculating the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image through a bulldozer distance loss function.
According to one embodiment of the application, the training strategy of the pedestrian image quality evaluation model is a cosine annealing learning rate strategy.
In a second aspect, the present application provides a pedestrian image quality evaluation method including:
acquiring a pedestrian image to be evaluated;
inputting the pedestrian image to be evaluated into the pedestrian image quality evaluation model to obtain the quality evaluation score of the pedestrian image to be evaluated, which is output by the pedestrian image quality evaluation model;
The pedestrian image quality evaluation model is trained based on the training method of the pedestrian image quality evaluation model in the first aspect.
According to the pedestrian image quality evaluation method, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity, sharpness and texture features of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output. This improves the accuracy of pedestrian image quality evaluation and facilitates subsequent analysis of pedestrian attributes.
In a third aspect, the present application provides a training apparatus of a pedestrian image quality evaluation model, the apparatus comprising:
the first acquisition module is used for acquiring a pedestrian image data set, wherein the pedestrian image data set comprises a plurality of pedestrian sample images and pedestrian label images corresponding to the pedestrian sample images;
the first processing module is used for inputting the pedestrian image dataset into a pedestrian image quality evaluation model to be trained, and carrying out convolution processing through a backbone network of the pedestrian image quality evaluation model to obtain characteristic information of the pedestrian sample image and characteristic information of the pedestrian label image output by the backbone network, wherein the characteristic information of the pedestrian sample image comprises a sample fine granularity characteristic, a sample sharpness characteristic and a sample texture characteristic, and the characteristic information of the pedestrian label image comprises a label fine granularity characteristic, a label sharpness characteristic and a label texture characteristic;
the second acquisition module is used for acquiring the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image;
and the second processing module is used for updating the network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain the trained pedestrian image quality evaluation model.
According to the training device of the pedestrian image quality evaluation model, the acquired pedestrian image data set is input into the pedestrian image quality evaluation model to be trained, and convolution processing is carried out through the backbone network of the pedestrian image quality evaluation model. The global semantic features and the local information features of the pedestrian image are mined, and feature information such as the fine granularity, sharpness and texture of the pedestrian sample image and the pedestrian label image is obtained, so that the pedestrian image quality is evaluated more comprehensively and objectively. The network parameters of the pedestrian image quality evaluation model are updated based on the feature similarity, which improves the accuracy and reliability of the model in evaluating pedestrian image quality.
In a fourth aspect, the present application provides a pedestrian image quality evaluation apparatus including:
the third acquisition module is used for acquiring the pedestrian image to be evaluated;
the third processing module is used for inputting the pedestrian image to be evaluated into the pedestrian image quality evaluation model to obtain the quality evaluation score of the pedestrian image to be evaluated, which is output by the pedestrian image quality evaluation model;
The pedestrian image quality evaluation model is trained based on the training method of the pedestrian image quality evaluation model in the first aspect.
According to the pedestrian image quality evaluation device, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity, sharpness and texture features of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output. This improves the accuracy of pedestrian image quality evaluation and facilitates subsequent analysis of pedestrian attributes.
In a fifth aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the training method of the pedestrian image quality evaluation model described in the first aspect when the processor executes the computer program.
In a sixth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of the pedestrian image quality evaluation model as described in the first aspect above.
In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the training method of the pedestrian image quality evaluation model as described in the first aspect above.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, where appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. The objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The training method, the pedestrian image quality evaluation method, the training device, the pedestrian image quality evaluation device, the electronic device and the readable storage medium of the pedestrian image quality evaluation model provided by the embodiment of the application are described in detail below by specific embodiments and application scenes thereof with reference to the accompanying drawings.
The training method of the pedestrian image quality evaluation model can be applied to the terminal, and can be specifically executed by hardware or software in the terminal.
The terminal includes, but is not limited to, a portable communication device such as a mobile phone or tablet having a touch sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be appreciated that in some embodiments, the terminal may not be a portable communication device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).
In the following various embodiments, a terminal including a display and a touch sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and joystick.
The execution subject of the training method of the pedestrian image quality evaluation model provided by the embodiment of the application may be the electronic device, or a functional module or functional entity in the electronic device capable of implementing the training method. The electronic device in the embodiment of the application includes, but is not limited to, a mobile phone, a tablet computer, a camera, a wearable device and the like. The training method of the pedestrian image quality evaluation model provided in the embodiment of the application is described below by taking the electronic device as the execution subject as an example.
The pedestrian image quality evaluation model is used for evaluating the pedestrian image quality, and accurate pedestrian image quality evaluation is obtained by training the pedestrian image quality evaluation model.
As shown in fig. 1, the training method of the pedestrian image quality evaluation model includes: steps 110 to 140.
Step 110, acquiring a pedestrian image dataset.
The pedestrian image data set comprises a plurality of pedestrian sample images and pedestrian label images corresponding to the pedestrian sample images.
The pedestrian image data set is a data set after preprocessing, and the data set for pedestrian image quality evaluation acquired by the image acquisition device can be preprocessed to obtain the pedestrian image data set.
In actual implementation, the original data collection and screening can be performed on the video monitoring scene, and the video of the corresponding scene is saved in a frame skipping manner, for example, an image is saved every 10 frames, so that a data set for pedestrian image quality evaluation is obtained.
And preprocessing such as pedestrian detection, data expansion and the like is carried out on the data set for pedestrian image quality evaluation, so that a pedestrian image data set is obtained.
In this embodiment, the effective pedestrian image may be extracted by using a pedestrian detection model for the image of the dataset for pedestrian image quality evaluation, and then data expansion may be performed on the effective pedestrian image by using image rotation, image plus noise, or the like.
The pedestrian detection model may be YOLOv, YOLOv5, SSD, or the like, and the effective pedestrian image is extracted by the pedestrian detection model.
When a pedestrian is captured through video monitoring, the surrounding scene is captured along with the pedestrian. Before the preprocessed data set is acquired, the collected images are passed through a pedestrian detection model to remove useless data such as the surrounding scene and retain only the pedestrian regions, namely the effective pedestrian images, which effectively reduces the interference of complex backgrounds on pedestrian image quality evaluation.
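The data expansion described above (rotation and added noise) can be sketched as follows. This is an illustrative sketch only: np.rot90 stands in for image rotation, and the Gaussian noise scale (sigma = 10) is an arbitrary placeholder, not a value from the application.

```python
import numpy as np

def expand(img, rng=None):
    """Generate extra training samples from one effective pedestrian image.

    Illustrative augmentations: a 90-degree rotation (np.rot90) and
    additive Gaussian noise with a hypothetical sigma of 10.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    rotated = np.rot90(img)                    # rotation-based sample
    noise = rng.normal(0.0, 10.0, img.shape)   # additive image noise
    noisy = np.clip(img.astype(float) + noise, 0, 255).astype(np.uint8)
    return rotated, noisy
```

In a real pipeline, arbitrary-angle rotation from an image library would replace np.rot90, but the shape of the expansion step is the same.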
It is understood that the obtained pedestrian image data set includes two types of images, namely a pedestrian sample image and a pedestrian label image corresponding to the pedestrian sample image, wherein the pedestrian label image includes label information of a pedestrian image quality evaluation of the corresponding pedestrian sample image.
And 120, inputting the pedestrian image data set into a pedestrian image quality evaluation model to be trained, and carrying out convolution processing through a backbone network of the pedestrian image quality evaluation model to obtain the characteristic information of the pedestrian sample image and the characteristic information of the pedestrian label image output by the backbone network.
The backbone network refers to a backbone part of the neural network model and is used for extracting characteristic information from the image.
The feature information of the pedestrian sample image comprises sample fine granularity feature information, sample sharpness feature information and sample texture feature information; the feature information of the pedestrian tag image includes tag fine-granularity feature information, tag sharpness feature information, and tag texture feature information.
In the embodiment, the acquired pedestrian image dataset is input into the pedestrian image quality evaluation model to be trained, convolution processing is carried out in the backbone network, and the characteristic information of the output pedestrian sample image and the characteristic information such as the fine granularity, the sharpness, the texture and the like of the pedestrian label image are acquired, so that the training of the pedestrian image quality evaluation model is facilitated.
It can be understood that the images in the pedestrian image dataset are processed in the backbone network, the images are processed in a convolution processing mode, and feature information such as fine granularity, sharpness and texture of the images is extracted.
After the pedestrian image dataset is obtained in step 110, the images in the pedestrian image dataset may be set to a uniform image specification before the pedestrian image dataset is input to the pedestrian image quality evaluation model to be trained in step 120, so that the pedestrian image quality evaluation model is convenient for processing the images.
For example, the size of all images of the pedestrian image data set may be set to 192×64 resolution, and the batch size of the pedestrian image data set may be set to 64.
The pixel values of the images of the pedestrian image data set are normalized by subtracting the mean [0, 0, 0] and dividing by the standard deviation [255, 255, 255], so that the image specification of the pedestrian image data set is unified and the pedestrian image quality evaluation model can process the images conveniently.
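The normalization step above can be sketched as follows; a minimal illustration assuming uint8 input images that have already been resized to 192x64.

```python
import numpy as np

def preprocess(img):
    """Normalize a pedestrian crop as described above: subtract the
    per-channel mean [0, 0, 0] and divide by the per-channel standard
    deviation [255, 255, 255], mapping uint8 pixels into [0, 1]."""
    mean = np.array([0.0, 0.0, 0.0])
    std = np.array([255.0, 255.0, 255.0])
    return (img.astype(np.float32) - mean) / std

# Usage on a dummy 192x64 RGB crop
img = np.full((192, 64, 3), 128, dtype=np.uint8)
out = preprocess(img)
```

With a zero mean and a std of 255, this is simply pixel/255; the mean/std form is kept to mirror the text.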
And 130, acquiring the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image.
The feature similarity is the similarity degree between the feature information of the pedestrian sample image and the feature information of the pedestrian label image, and the feature similarity can reflect the difference between the prediction output of the pedestrian image quality evaluation model and the label.
In actual execution, the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image is calculated using a loss function in the pedestrian image quality evaluation model.
For example, the feature similarity is calculated by using a loss function, and the feature similarity can be obtained by respectively calculating the similarity corresponding to the feature information such as fine granularity, sharpness, texture and the like in the pedestrian sample image and the pedestrian label image, and respectively carrying out weighted summation on the calculation results.
For another example, the feature similarity is calculated by using a loss function, and the feature similarity can be obtained by respectively obtaining corresponding comprehensive features from the feature information such as fine granularity, sharpness, texture and the like in the pedestrian sample image and the pedestrian label image and calculating according to the corresponding comprehensive features.
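The first variant above, computing one similarity per feature type (fine granularity, sharpness, texture) and weight-summing them, can be sketched as follows. The weights are illustrative placeholders, not values given in the application.

```python
def combined_similarity(sims, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of per-feature similarities.

    sims: (fine_granularity_sim, sharpness_sim, texture_sim), each in [0, 1].
    weights: hypothetical weights summing to 1; the application does not
    specify concrete values.
    """
    assert len(sims) == len(weights)
    return sum(s * w for s, w in zip(sims, weights))
```

The second variant would instead fuse the three features into one composite vector per image and compute a single similarity on the composites.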
And 140, updating network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain a trained pedestrian image quality evaluation model.
The network parameters refer to parameters capable of independently reflecting characteristics of the backbone network.
For example, based on the feature similarity, the weights of the pedestrian image quality evaluation model may be updated; a weight is a parameter characterizing the relationship used to map input samples to outputs.
In this embodiment, according to the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image, the network parameters such as the weight of the pedestrian image quality evaluation model are updated until the training is completed, and the trained pedestrian image quality evaluation model is obtained.
In actual implementation, the network parameters of the pedestrian image quality evaluation model are updated, and the network parameters can be updated through back propagation of the network.
In this embodiment, when training the pedestrian image quality evaluation model, a target threshold value or the number of iterations may be set as the condition for the training being completed.
For example, when the obtained feature similarity is greater than a target threshold, updating the network parameters is stopped, and a trained pedestrian image quality evaluation model is obtained.
For another example, when the iteration number of training the pedestrian image quality evaluation model reaches the preset iteration number, stopping training to obtain a trained pedestrian image quality evaluation model.
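The two stopping conditions above can be sketched as a single predicate. The target threshold (0.95) is an illustrative placeholder; the iteration cap of 300 is only one possible choice.

```python
def should_stop(similarity, epoch, target=0.95, max_epochs=300):
    """Return True when training of the quality evaluation model should stop:
    either the feature similarity exceeds the target threshold, or the
    preset number of iterations has been reached."""
    return similarity > target or epoch >= max_epochs
```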
In the related art, when evaluating the quality of a pedestrian image, regions such as the head, upper body, lower body and shoes of the pedestrian are generally extracted separately, the images of these local regions are scored, and the quality evaluation of the pedestrian is finally obtained. Since the global features of the pedestrian image are not considered, the reliability of the quality evaluation is poor.
In the embodiment of the application, the backbone network of the pedestrian image quality evaluation model acquires the characteristics of fine granularity, sharpness, texture and the like of the pedestrian sample image and the pedestrian label image through convolution processing, fully excavates the global semantic characteristics and the local information characteristics of the pedestrian image, carries out model training, can evaluate the pedestrian image quality more comprehensively and objectively, and improves the accuracy of the pedestrian image quality evaluation model in evaluating the pedestrian image quality.
According to the training method of the pedestrian image quality evaluation model provided by the embodiment of the application, the acquired pedestrian image dataset is input into the pedestrian image quality evaluation model to be trained, and convolution processing is carried out through the backbone network of the pedestrian image quality evaluation model. The global semantic features and the local information features of the pedestrian image are mined, and feature information such as the fine granularity, sharpness and texture of the pedestrian sample image and the pedestrian label image is obtained, so that the pedestrian image quality is evaluated more comprehensively and objectively. The network parameters of the pedestrian image quality evaluation model are trained and updated based on the feature similarity, which improves the accuracy and reliability of the model in evaluating pedestrian image quality.
In some embodiments, the convolution processing is performed through a backbone network of the pedestrian image quality evaluation model, including:
performing multi-order center difference convolution on the images of the pedestrian image data set through the backbone network.
The multi-order center difference convolution may be a multi-order calculation method such as a second-order center difference convolution, a third-order center difference convolution, and a fourth-order center difference convolution. The embodiment of the application carries out the convolution processing of the image through the multi-order center differential convolution, can realize the detection of the image edge and the extraction of the characteristic information, and can ensure the accuracy of the extracted image characteristic.
In the related art, the feature information of the images of the pedestrian image dataset is extracted by ordinary central difference convolution, which can only extract the fine-granularity features of the images and therefore easily leads to inaccurate pedestrian image quality evaluation.
In the embodiment of the application, multi-order central difference convolution is adopted to extract the feature information of the images of the pedestrian image data set; it can extract not only the fine-granularity features but also the sharpness and texture features of the images, and extracting more effective image features improves the accuracy and reliability of pedestrian image quality evaluation.
In this embodiment, the second-order center difference convolution is performed on the image of the pedestrian image dataset through the backbone network of the pedestrian image quality evaluation model, and the fine-granularity feature, the sharpness feature, and the texture feature of the image of the pedestrian image dataset are obtained.
In the embodiment of the application, the fine-granularity, sharpness and texture features of the images of the pedestrian image data set are obtained by performing multi-order central difference convolution on the pedestrian sample images and pedestrian label images in the backbone network of the pedestrian image quality evaluation model. This overcomes the limitation that existing central difference convolution can only extract fine-granularity features, and extracting more effective image features improves the accuracy and reliability of pedestrian image quality evaluation.
In some embodiments, performing a multi-order central differential convolution on an image of a pedestrian image dataset over a backbone network may include:
applying the formula:

y(p0) = θ · Σ_{pn∈R} ωn · x(p0 + pn) + λ · Σ_{pn∈R} ωn · Δx(p0 + pn)

wherein p0 is the convolution center position, θ and λ are hyperparameters, ωn is the weight of the convolution kernel at position pn, x(p0 + pn) is the value of the feature layer at position pn, R is the range of the convolution kernel, y(p0) is the feature output at position p0 after center difference convolution, and Δx(p0 + pn) is the multi-order center difference corresponding to x(p0 + pn).
In this embodiment, a second-order center difference convolution is taken as an example, where Δ²x(p0 + pn) is the second-order center difference corresponding to x(p0 + pn), computed as:

Δ²x(p0 + pn) = x(p0 + 2pn) − 2 · x(p0 + pn) + x(p0)

wherein x(p0 + pn) is the value of the feature layer at position pn.
In this embodiment, unlike the existing center differential convolution manner, the fine granularity, sharpness, and texture features of the image of the pedestrian image dataset can be obtained by performing a multi-order center differential convolution on the image of the pedestrian image dataset.
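Under the interpretation above, a second-order center difference convolution for a single channel and a 3x3 kernel can be sketched as follows. This is a hypothetical, unoptimized reference implementation of the formula as reconstructed here (y = θ · vanilla term + λ · second-order-difference term), not the application's exact operator.

```python
import numpy as np

def cdc2_conv(x, w, theta=0.7, lam=0.3):
    """Second-order center difference convolution (one channel, 3x3 kernel).

    y(p0) = theta * sum_n w_n * x(p0 + p_n)
          + lam   * sum_n w_n * [x(p0 + 2*p_n) - 2*x(p0 + p_n) + x(p0)]

    theta/lam defaults are illustrative hyperparameter choices.
    """
    h, wd = x.shape
    pad = 2  # offsets reach 2*p_n, so pad by 2 (zero padding)
    xp = np.pad(x, pad)
    y = np.zeros_like(x, dtype=float)
    offs = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    for r in range(h):
        for c in range(wd):
            r0, c0 = r + pad, c + pad
            acc = 0.0
            for wn, (di, dj) in zip(w.ravel(), offs):
                v = xp[r0 + di, c0 + dj]
                d2 = xp[r0 + 2 * di, c0 + 2 * dj] - 2 * v + xp[r0, c0]
                acc += theta * wn * v + lam * wn * d2
            y[r, c] = acc
    return y
```

Note that on a constant region every second-order difference vanishes, so the operator reduces to θ times a plain convolution there; the difference term only fires on edges and texture, which matches its intended role.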
In some embodiments, the backbone network is ResNet18, and the backbone network comprises a convolution layer, a central difference convolution layer, a batch normalization layer, an activation function layer, a global pooling layer and an average pooling layer, wherein the central difference convolution layer is configured to perform the multi-order central difference convolution.
The ResNet18 network takes ResNet as its basic framework and has a depth of 18 layers; functions such as recognizing pedestrian images and extracting feature information are realized through the ResNet18 network.
The convolution layer is used for extracting image features of the pedestrian image; the central difference convolution layer is used for performing multi-order central difference convolution; the batch normalization layer is used for standardizing the data; the activation function layer is used for adding nonlinear factors, capturing relationships that a purely linear model cannot; the global pooling layer is used for replacing a fully connected layer and can accept images of any size; the average pooling layer is likewise used for replacing a fully connected layer, reducing the computational burden of the learning process.
In this embodiment, ResNet18 is used as the backbone network, and the central difference convolution layer in the ResNet18 network performs the multi-order central difference convolution; in cooperation with the convolution layer, batch normalization layer, activation function layer, global pooling layer and average pooling layer, the extraction of feature information such as fine granularity, sharpness and texture of the pedestrian sample images and pedestrian label images is realized.
The loss function of the pedestrian image quality evaluation model is described below.
In some embodiments, obtaining feature similarity between feature information of a pedestrian sample image and feature information of a pedestrian tag image includes:
and calculating the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image through a bulldozer distance loss function.
The loss function adopts the bulldozer distance (Wasserstein distance); through this loss function, the image features are constrained to a unified feature space, which improves the training efficiency of the model.
In actual execution, the feature similarity between the feature information of the pedestrian sample image and that of the pedestrian label image is calculated using the bulldozer distance loss function, and the quality of the pedestrian image quality evaluation model is judged by the feature similarity.
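For one-dimensional feature histograms on a shared support, the bulldozer (earth mover's) distance reduces to a sum over cumulative-distribution differences. The sketch below illustrates that generic 1-D formulation; it is not necessarily the exact loss formulation used by the application.

```python
import numpy as np

def emd_1d(p, q):
    """1-D earth mover's (bulldozer / Wasserstein-1) distance between two
    histograms on the same support, via EMD = sum |cumsum(p) - cumsum(q)|.
    Both inputs are normalized to sum to 1 first."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return np.abs(np.cumsum(p - q)).sum()
```

Moving all mass from bin 0 to bin 2, for example, costs a distance of 2, matching the intuition of "earth moved times distance carried".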
The training strategy of the pedestrian image quality evaluation model is described below.
In some embodiments, the training strategy of the pedestrian image quality assessment model is a cosine annealing learning rate strategy.
The training strategy adopts a cosine annealing learning rate strategy (CosineAnnealingLR), and the learning rate is adjusted through the cosine annealing learning rate strategy.
The cosine annealing learning rate strategy is a training strategy in which the learning rate decreases following a cosine curve, descending slowly at first, then rapidly, and then slowly again.
In this embodiment, the training strategy of the pedestrian image quality evaluation model adopts the cosine annealing learning rate strategy; the learning rate is decreased following a cosine function, which helps maintain the accuracy of the pedestrian image quality evaluation model.
The following describes the formula of the cosine annealing learning rate strategy in detail.
The cosine annealing learning rate strategy is expressed by the formula:

lr = lmin + (1/2) · (lR − lmin) · (1 + cos(epoch / Tmax · π))

where lr denotes the learning rate at the current iteration, lR denotes the initial learning rate, lmin denotes the minimum learning rate, cos denotes the cosine function, epoch is the current iteration number, and Tmax denotes 1/2 of the cosine period.
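The schedule can be sketched directly from this formula; the defaults below mirror the values given in the specific embodiment (initial learning rate 0.1, minimum learning rate 0, Tmax = 300).

```python
import math

def cosine_lr(epoch, lr_init=0.1, lr_min=0.0, t_max=300):
    """Cosine annealing schedule:
    lr = lr_min + 0.5 * (lr_init - lr_min) * (1 + cos(pi * epoch / t_max))."""
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * epoch / t_max))
```

At epoch 0 this yields the initial learning rate, at epoch = Tmax it reaches the minimum, and halfway through it sits at the midpoint of the two.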
A specific embodiment is described below.
As shown in fig. 2, video monitoring data is collected and filtered, and the video of the monitoring scene is stored in a frame-skipping manner, saving one image every 10 frames.
A pedestrian image dataset is constructed: valid pedestrian images are extracted from the acquired images with pedestrian detection models such as YOLOv, YOLOv5 and SSD, and the valid pedestrian images are augmented by image rotation, noise injection and similar transformations.
The image specifications are unified: the images of the pedestrian image dataset are resized to 192×64, the batch size is set to 64, and the images are normalized by subtracting the mean [0, 0, 0] and dividing by the standard deviation [255, 255, 255].
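A minimal sketch of these preprocessing settings (the helper names and the sample pixel are illustrative, not part of the embodiment; resizing is omitted):

```python
MEAN = [0.0, 0.0, 0.0]        # per-channel mean subtracted from each pixel
STD = [255.0, 255.0, 255.0]   # per-channel standard deviation divisor

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel into [0, 1] using the mean/std above."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

def frames_to_save(total_frames, step=10):
    """Frame-skipping storage: keep one frame index every `step` frames."""
    return list(range(0, total_frames, step))

print(normalize_pixel([255, 128, 0]))  # [1.0, ~0.502, 0.0]
print(frames_to_save(25))              # [0, 10, 20]
```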
A uniformly distributed sampling method is adopted for the pedestrian image dataset, ResNet is selected as the backbone network, and the convolution mode in the pedestrian image quality evaluation model is set to multi-order central difference convolution.
A cosine annealing learning rate strategy, lr = lmin + (lR − lmin) × (1 + cos(π × epoch / Tmax)) / 2, is adopted as the training strategy, where lr denotes the learning rate at the current iteration, lR the initial learning rate, lmin the minimum learning rate, cos the cosine function, epoch the current iteration number, and Tmax 1/2 of the cosine period.
The total iteration number is set to 300, Tmax is set to 300, the minimum learning rate lmin is set to 0, and the initial learning rate lR is set to 0.1.
The feature information of the pedestrian sample image and the feature information of the pedestrian label image are extracted through the convolution layer, central difference convolution layer, batch normalization layer, activation function layer, global pooling layer and average pooling layer of the backbone network.
The similarity between the pedestrian sample image and the pedestrian label image is calculated with the Earth Mover's Distance loss function.
Network parameters are updated through back propagation, training the pedestrian image quality evaluation model.
The trained network is tested with test images, and the quality evaluation score of each test image is output.
In this embodiment, by using the multi-order central difference convolution mode, the fine granularity features, sharpness features and texture features of the pedestrian sample image can be accurately extracted, which improves the reliability of pedestrian image quality evaluation and facilitates subsequent pedestrian attribute analysis.
The embodiment of the application also provides a pedestrian image quality evaluation method.
As shown in fig. 3, the pedestrian image quality evaluation method includes: step 310 and step 320.
Step 310, an image of the pedestrian to be evaluated is acquired.
The pedestrian image to be evaluated is a pedestrian image subjected to data preprocessing.
In this step, pedestrian images from video monitoring are saved by frame skipping and preprocessed to obtain the pedestrian image to be evaluated.
Step 320, inputting the pedestrian image to be evaluated into the pedestrian image quality evaluation model to obtain the quality evaluation score of the pedestrian image to be evaluated output by the pedestrian image quality evaluation model.
The pedestrian image quality evaluation model is obtained by training based on the training method of the pedestrian image quality evaluation model.
In the embodiment, a pedestrian image to be evaluated is input into a trained pedestrian image quality evaluation model, fine granularity characteristics, sharpness characteristics and texture characteristics of the pedestrian image to be evaluated are obtained in a multi-order center difference convolution mode, and quality evaluation scores of the pedestrian image to be evaluated are output.
According to the pedestrian image quality evaluation method provided by the embodiment of the application, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity characteristic, the sharpness characteristic and the texture characteristic of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output, so that the accuracy of pedestrian image quality evaluation can be improved, and the subsequent analysis of pedestrian attributes is facilitated.
According to the training method for the pedestrian image quality evaluation model, provided by the embodiment of the application, the execution subject can be a training device for the pedestrian image quality evaluation model. In the embodiment of the application, a training method for executing the pedestrian image quality evaluation model by using the training device for the pedestrian image quality evaluation model is taken as an example, and the training device for the pedestrian image quality evaluation model provided by the embodiment of the application is described.
The embodiment of the application also provides a training device of the pedestrian image quality evaluation model.
As shown in fig. 4, the training device of the pedestrian image quality evaluation model includes:
A first obtaining module 410, configured to obtain a pedestrian image dataset, where the pedestrian image dataset includes a plurality of pedestrian sample images and pedestrian tag images corresponding to the pedestrian sample images;
The first processing module 420 is configured to input a pedestrian image dataset into a pedestrian image quality evaluation model to be trained, perform convolution processing through a backbone network of the pedestrian image quality evaluation model, obtain feature information of a pedestrian sample image and feature information of a pedestrian label image output by the backbone network, where the feature information of the pedestrian sample image includes a sample fine granularity feature, a sample sharpness feature and a sample texture feature, and the feature information of the pedestrian label image includes a label fine granularity feature, a label sharpness feature and a label texture feature;
a second obtaining module 430, configured to obtain feature similarity between feature information of the pedestrian sample image and feature information of the pedestrian tag image;
The second processing module 440 is configured to update network parameters of the pedestrian image quality evaluation model based on the feature similarity, and obtain a trained pedestrian image quality evaluation model.
According to the training device of the pedestrian image quality evaluation model provided by the embodiment of the application, the acquired pedestrian image dataset is input into the pedestrian image quality evaluation model to be trained, and convolution processing is carried out through the backbone network of the model to mine the global semantic features and local information features of the pedestrian images, obtaining feature information such as fine granularity, sharpness and texture of the pedestrian sample images and pedestrian label images, so that pedestrian image quality is evaluated more comprehensively and objectively; the network parameters of the model are then updated based on the feature similarity, improving the accuracy and reliability of the model in evaluating pedestrian image quality.
In some embodiments, the first processing module 420 is configured to perform convolution processing through a backbone network of the pedestrian image quality evaluation model, including:
And carrying out multi-order center differential convolution on the images of the pedestrian image data set through the backbone network.
In some embodiments, the first processing module 420 is configured to perform a multi-order central differential convolution on an image of a pedestrian image dataset over a backbone network, including:
Applying the formula:

y(p0) = θ × Σ_{pn∈R} ωn × x(p0 + pn) + λ × Σ_{pn∈R} ωn × Δx(p0 + pn)

where p0 is the convolution center position, θ and λ are hyperparameters, ωn is the weight of the convolution kernel at position pn, x(p0 + pn) is the value of the feature layer at position pn, R is the range of the convolution kernel, y(p0) is the feature output at position p0 after central difference convolution, and Δx(p0 + pn) is the corresponding multi-order center difference.
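Because the exact multi-order formula is not fully recoverable from the text, the sketch below assumes the simplest first-order form consistent with the listed symbols: a vanilla convolution term weighted by θ plus a central-difference term, Δx(p0 + pn) = x(p0 + pn) − x(p0), weighted by λ. The function name and the values of θ and λ are illustrative.

```python
def cdc_2d(x, w, theta=1.0, lam=0.7):
    """First-order central difference convolution sketch (valid padding):
    y(p0) = theta * sum_n w_n * x(p0+pn) + lam * sum_n w_n * (x(p0+pn) - x(p0))
    x: 2-D list (H x W), w: 2-D list (k x k, k odd)."""
    k = len(w)
    r = k // 2
    H, W = len(x), len(x[0])
    out = []
    for i in range(r, H - r):
        row = []
        for j in range(r, W - r):
            # Vanilla convolution response at center (i, j).
            vanilla = sum(w[a][b] * x[i + a - r][j + b - r]
                          for a in range(k) for b in range(k))
            # Central-difference response: each tap minus the center value.
            center = x[i][j]
            diff = sum(w[a][b] * (x[i + a - r][j + b - r] - center)
                       for a in range(k) for b in range(k))
            row.append(theta * vanilla + lam * diff)
        out.append(row)
    return out
```

With λ = 0 this reduces to an ordinary convolution; the difference term emphasizes local intensity changes, which is what makes it sensitive to sharpness and texture cues.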
In some embodiments, the backbone network used by the first processing module 420 is ResNet; the backbone network includes a convolution layer, a central difference convolution layer, a batch normalization layer, an activation function layer, a global pooling layer and an average pooling layer, where the central difference convolution layer is configured to perform the multi-order central difference convolution.
In some embodiments, the second obtaining module 430, configured to obtain a feature similarity between feature information of the pedestrian sample image and feature information of the pedestrian tag image, includes:
and calculating the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image through an Earth Mover's Distance loss function.
In some embodiments, the training strategy of the pedestrian image quality assessment model is a cosine annealing learning rate strategy.
The embodiment of the application also provides a pedestrian image quality evaluation device.
As shown in fig. 5, the pedestrian image quality evaluation device includes:
A third obtaining module 510, configured to obtain an image of a pedestrian to be evaluated;
The third processing module 520 is configured to input the pedestrian image to be evaluated to the pedestrian image quality evaluation model, and obtain a quality evaluation score of the pedestrian image to be evaluated output by the pedestrian image quality evaluation model;
the pedestrian image quality evaluation model is obtained by training based on the training method of the pedestrian image quality evaluation model.
According to the pedestrian image quality evaluation device provided by the embodiment of the application, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity characteristic, the sharpness characteristic and the texture characteristic of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output, so that the accuracy of pedestrian image quality evaluation can be improved, and the subsequent analysis of pedestrian attributes is facilitated.
The training device of the pedestrian image quality evaluation model in the embodiment of the application can be electronic equipment, and can also be a component in the electronic equipment, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a mobile internet device (Mobile Internet Device, MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and may also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, etc., which are not particularly limited in the embodiments of the present application.
The training device of the pedestrian image quality evaluation model in the embodiment of the application can be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, and the embodiment of the present application is not specifically limited.
The training device for the pedestrian image quality evaluation model provided by the embodiment of the application can realize each process realized by the method embodiments of fig. 1 to 2, and in order to avoid repetition, the description is omitted here.
In some embodiments, as shown in fig. 6, an electronic device 600 is further provided in the embodiments of the present application, which includes a processor 601, a memory 602, and a computer program stored in the memory 602 and capable of running on the processor 601, where the program, when executed by the processor 601, implements the respective processes of the training method embodiment of the pedestrian image quality evaluation model, and the same technical effects can be achieved, so that repetition is avoided and redundant description is omitted herein.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
The embodiment of the application also provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the respective processes of the training method embodiment of the pedestrian image quality evaluation model, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes computer readable storage medium such as computer readable memory ROM, random access memory RAM, magnetic or optical disk, etc.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program realizes the training method of the pedestrian image quality evaluation model when being executed by a processor.
Wherein the processor is the processor in the electronic device described in the above embodiment.
The embodiment of the application further provides a chip, the chip comprises a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running programs or instructions, the processes of the training method embodiment of the pedestrian image quality evaluation model can be realized, the same technical effects can be achieved, and the repetition is avoided, and the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; the functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the application, the scope of which is defined by the claims and their equivalents.