CN116844026B

Movatterモバイル変換

Info

Publication number: CN116844026B
Application number: CN202310818271.7A
Authority: CN
Inventors: 刘义南; 黄宇恒; 徐天适; 岳许要; 杜文凯
Original assignee: Guangdian Yuntong Group Co ltd
Current assignee: Guangdian Yuntong Group Co ltd
Priority date: 2023-07-04
Filing date: 2023-07-04
Publication date: 2024-11-08
Anticipated expiration: 2043-07-04
Also published as: CN116844026A

Abstract

The application discloses a training method and device for a pedestrian image quality evaluation model, and belongs to the technical field of image processing. The method comprises the following steps: acquiring a pedestrian image dataset; the method comprises the steps that convolution processing is carried out through a backbone network of a pedestrian image quality evaluation model, feature information of a pedestrian sample image and feature information of a pedestrian label image output by the backbone network are obtained, the feature information of the pedestrian sample image comprises sample fine granularity features, sample sharpness features and sample texture features, and the feature information of the pedestrian label image comprises label fine granularity features, label sharpness features and label texture features; acquiring feature similarity between feature information of the pedestrian sample image and feature information of the pedestrian label image; based on the feature similarity, updating network parameters of the pedestrian image quality evaluation model to obtain a trained pedestrian image quality evaluation model. The method can improve the accuracy and the credibility of pedestrian image quality evaluation.

Description

Training method and device for pedestrian image quality evaluation model

Technical Field

The application belongs to the technical field of image processing, and particularly relates to a training method and device for a pedestrian image quality evaluation model.

Background

With the continuous improvement of video monitoring technology, the use scene of video monitoring is wider and wider, and pedestrian images extracted through video streams can be effectively applied to the aspects of safety precautions, personnel statistics and the like.

When extracting the pedestrian image, the quality difference of the obtained pedestrian image is large often because of the factors of complex background of video monitoring, blocked pedestrians, limited resolution of video monitoring and the like, and the low-quality image cannot be directly subjected to subsequent personnel analysis and attribute analysis.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides a training method and a training device for a pedestrian image quality evaluation model, which improve the accuracy and the reliability of pedestrian image quality evaluation.

In a first aspect, the present application provides a training method of a pedestrian image quality evaluation model, the method comprising:

acquiring a pedestrian image dataset, wherein the pedestrian image dataset comprises a plurality of pedestrian sample images and pedestrian label images corresponding to the pedestrian sample images;

Inputting the pedestrian image data set into a pedestrian image quality evaluation model to be trained, and carrying out convolution processing through a backbone network of the pedestrian image quality evaluation model to obtain characteristic information of the pedestrian sample image and characteristic information of the pedestrian label image output by the backbone network, wherein the characteristic information of the pedestrian sample image comprises a sample fine granularity characteristic, a sample sharpness characteristic and a sample texture characteristic, and the characteristic information of the pedestrian label image comprises a label fine granularity characteristic, a label sharpness characteristic and a label texture characteristic;

acquiring feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image;

And updating network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain the trained pedestrian image quality evaluation model.

According to the training method of the pedestrian image quality evaluation model, the acquired pedestrian image data set is input into the pedestrian image quality evaluation model to be trained, the main network of the pedestrian image quality evaluation model is used for carrying out convolution processing, the global semantic features and the local information features of the pedestrian image are mined, the feature information such as the fine granularity, the sharpness and the texture of the pedestrian sample image and the pedestrian label image is obtained, the pedestrian image quality is evaluated more comprehensively and objectively, the network parameters of the pedestrian image quality evaluation model are trained and updated based on the feature similarity, and the accuracy and the reliability of the model for evaluating the pedestrian image quality are improved.

According to one embodiment of the present application, the convolution processing by the backbone network of the pedestrian image quality evaluation model includes:

And carrying out multi-order center difference convolution on the images of the pedestrian image data set through the backbone network.

According to one embodiment of the present application, the performing multi-order center difference convolution on the image of the pedestrian image data set through the backbone network includes:

Applying the formula

Wherein p₀ is the convolution center position, θ and λ are super parameters, ω_n is the weight of the convolution kernel at the position p_n,For a value of p_n for the feature layer at position, R is the range of convolution kernels,For the feature output at position p₀ after center differential convolution,Is thatCorresponding multi-order center differences.

According to one embodiment of the application, the backbone network is ResNet < 18 >, and the backbone network comprises a convolution layer, a central differential convolution layer, a batch normalization layer, an activation function layer, a global pooling layer and an average pooling layer, wherein the central differential convolution layer is used for performing multi-order central differential convolution.

According to an embodiment of the present application, the acquiring feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian tag image includes:

And calculating the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image through a bulldozer distance loss function.

According to one embodiment of the application, the training strategy of the pedestrian image quality evaluation model is a cosine annealing learning rate strategy.

In a second aspect, the present application provides a pedestrian image quality evaluation method including:

Acquiring a pedestrian image to be evaluated;

inputting the pedestrian image to be evaluated into the pedestrian image quality evaluation model to obtain the quality evaluation score of the pedestrian image to be evaluated, which is output by the pedestrian image quality evaluation model;

The pedestrian image quality evaluation model is trained based on the training method of the pedestrian image quality evaluation model in the first aspect.

According to the pedestrian image quality evaluation method, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity characteristic, the sharpness characteristic and the texture characteristic of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output, so that the accuracy of pedestrian image quality evaluation can be improved, and the subsequent analysis of pedestrian attributes is facilitated.

In a third aspect, the present application provides a training apparatus of a pedestrian image quality evaluation model, the apparatus comprising:

the first acquisition module is used for acquiring a pedestrian image data set, wherein the pedestrian image data set comprises a plurality of pedestrian sample images and pedestrian label images corresponding to the pedestrian sample images;

The first processing module is used for inputting the pedestrian image dataset into a pedestrian image quality evaluation model to be trained, carrying out convolution processing through a backbone network of the pedestrian image quality evaluation model to obtain characteristic information of the pedestrian sample image and characteristic information of the pedestrian label image output by the backbone network, wherein the characteristic information of the pedestrian sample image comprises a sample fine granularity characteristic, a sample sharpness characteristic and a sample texture characteristic, and the characteristic information of the pedestrian label image comprises a label fine granularity characteristic, a label sharpness characteristic and a label texture characteristic;

The second acquisition module is used for acquiring the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image;

and the second processing module is used for updating the network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain the trained pedestrian image quality evaluation model.

According to the training device of the pedestrian image quality evaluation model, the acquired pedestrian image data set is input into the pedestrian image quality evaluation model to be trained, the main network of the pedestrian image quality evaluation model is used for carrying out convolution processing, the global semantic features and the local information features of the pedestrian image are mined, the feature information such as the fine granularity, the sharpness and the texture of the pedestrian sample image and the pedestrian label image is obtained, the pedestrian image quality is evaluated more comprehensively and objectively, the network parameters of the pedestrian image quality evaluation model are updated based on the feature similarity, and the accuracy and the reliability of the model for evaluating the pedestrian image quality are improved.

In a fourth aspect, the present application provides a pedestrian image quality evaluation apparatus including:

the third acquisition module is used for acquiring the pedestrian image to be evaluated;

The third processing module is used for inputting the pedestrian image to be evaluated into the pedestrian image quality evaluation model to obtain the quality evaluation score of the pedestrian image to be evaluated, which is output by the pedestrian image quality evaluation model;

According to the pedestrian image quality evaluation device, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity characteristic, the sharpness characteristic and the texture characteristic of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output, so that the accuracy of pedestrian image quality evaluation can be improved, and the subsequent analysis of pedestrian attributes is facilitated.

In a fifth aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the training method of the pedestrian image quality evaluation model described in the first aspect when the processor executes the computer program.

In a sixth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of the pedestrian image quality evaluation model as described in the first aspect above.

In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the training method of the pedestrian image quality evaluation model as described in the first aspect above.

Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flowchart of a training method of a pedestrian image quality evaluation model according to an embodiment of the present application;

FIG. 2 is a second flow chart of a training method of a pedestrian image quality evaluation model according to an embodiment of the present application;

Fig. 3 is a flowchart of a pedestrian image quality evaluation method provided by an embodiment of the present application;

fig. 4 is a schematic structural diagram of a training device of a pedestrian image quality evaluation model provided by an embodiment of the present application;

Fig. 5 is a schematic structural view of a pedestrian image quality evaluation device provided by an embodiment of the present application;

fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.

The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.

The training method, the pedestrian image quality evaluation method, the training device, the pedestrian image quality evaluation device, the electronic device and the readable storage medium of the pedestrian image quality evaluation model provided by the embodiment of the application are described in detail below by specific embodiments and application scenes thereof with reference to the accompanying drawings.

The training method of the pedestrian image quality evaluation model can be applied to the terminal, and can be specifically executed by hardware or software in the terminal.

The terminal includes, but is not limited to, a portable communication device such as a mobile phone or tablet having a touch sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be appreciated that in some embodiments, the terminal may not be a portable communication device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).

In the following various embodiments, a terminal including a display and a touch sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and joystick.

The execution subject of the training method of the pedestrian image quality evaluation model provided by the embodiment of the application can be the electronic equipment or the functional module or the functional entity which can realize the training method of the pedestrian image quality evaluation model in the electronic equipment, the electronic device in the embodiment of the application includes, but is not limited to, a mobile phone, a tablet computer, a camera, a wearable device and the like, and the training method of the pedestrian image quality evaluation model provided in the embodiment of the application is described below by taking the electronic device as an execution main body as an example.

The pedestrian image quality evaluation model is used for evaluating the pedestrian image quality, and accurate pedestrian image quality evaluation is obtained by training the pedestrian image quality evaluation model.

As shown in fig. 1, the training method of the pedestrian image quality evaluation model includes: steps 110 to 140.

Step 110, acquiring a pedestrian image dataset.

The pedestrian image data set comprises a plurality of pedestrian sample images and pedestrian label images corresponding to the pedestrian sample images.

The pedestrian image data set is a data set after preprocessing, and the data set for pedestrian image quality evaluation acquired by the image acquisition device can be preprocessed to obtain the pedestrian image data set.

In actual implementation, the original data collection and screening can be performed on the video monitoring scene, and the video of the corresponding scene is saved in a frame skipping manner, for example, an image is saved every 10 frames, so that a data set for pedestrian image quality evaluation is obtained.

And preprocessing such as pedestrian detection, data expansion and the like is carried out on the data set for pedestrian image quality evaluation, so that a pedestrian image data set is obtained.

In this embodiment, the effective pedestrian image may be extracted by using a pedestrian detection model for the image of the dataset for pedestrian image quality evaluation, and then data expansion may be performed on the effective pedestrian image by using image rotation, image plus noise, or the like.

The pedestrian detection model may be YOLOv, YOLOv5, SSD, or the like, and the effective pedestrian image is extracted by the pedestrian detection model.

When a pedestrian is shot through video monitoring, the pedestrian and surrounding scenes are shot, the collected images are subjected to a pedestrian detection model before a preprocessed data set is acquired, useless data such as the surrounding scenes are removed, and part of pedestrian images, namely effective pedestrian images, are reserved, so that the interference of complex backgrounds on pedestrian image quality evaluation can be effectively reduced.

It is understood that the obtained pedestrian image data set includes two types of images, namely a pedestrian sample image and a pedestrian label image corresponding to the pedestrian sample image, wherein the pedestrian label image includes label information of a pedestrian image quality evaluation of the corresponding pedestrian sample image.

And 120, inputting the pedestrian image data set into a pedestrian image quality evaluation model to be trained, and carrying out convolution processing through a backbone network of the pedestrian image quality evaluation model to obtain the characteristic information of the pedestrian sample image and the characteristic information of the pedestrian label image output by the backbone network.

The backbone network refers to a backbone part of the neural network model and is used for extracting characteristic information from the image.

The feature information of the pedestrian sample image comprises sample fine granularity feature information, sample sharpness feature information and sample texture feature information; the feature information of the pedestrian tag image includes tag fine-granularity feature information, tag sharpness feature information, and tag texture feature information.

In the embodiment, the acquired pedestrian image dataset is input into the pedestrian image quality evaluation model to be trained, convolution processing is carried out in the backbone network, and the characteristic information of the output pedestrian sample image and the characteristic information such as the fine granularity, the sharpness, the texture and the like of the pedestrian label image are acquired, so that the training of the pedestrian image quality evaluation model is facilitated.

It can be understood that the images in the pedestrian image dataset are processed in the backbone network, the images are processed in a convolution processing mode, and feature information such as fine granularity, sharpness and texture of the images is extracted.

After the pedestrian image dataset is obtained in step 110, the images in the pedestrian image dataset may be set to a uniform image specification before the pedestrian image dataset is input to the pedestrian image quality evaluation model to be trained in step 120, so that the pedestrian image quality evaluation model is convenient for processing the images.

For example, the size of all images of the pedestrian image data set may be set to 192×64 resolution, and the batch size of the pedestrian image data set may be set to 64.

The image pixel range of the pedestrian image data set is normalized, the average value [0, 0] is subtracted, the standard deviation [255, 255, 255] is divided, and the image specification of the pedestrian image data set is unified, so that the pedestrian image quality evaluation model is beneficial to processing images.

And 130, acquiring the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image.

In actual execution, after the degree of similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian tag image is acquired, the feature similarity is calculated using a loss function in the pedestrian image quality evaluation model.

For example, the feature similarity is calculated by using a loss function, and the feature similarity can be obtained by respectively calculating the similarity corresponding to the feature information such as fine granularity, sharpness, texture and the like in the pedestrian sample image and the pedestrian label image, and respectively carrying out weighted summation on the calculation results.

For another example, the feature similarity is calculated by using a loss function, and the feature similarity can be obtained by respectively obtaining corresponding comprehensive features from the feature information such as fine granularity, sharpness, texture and the like in the pedestrian sample image and the pedestrian label image and calculating according to the corresponding comprehensive features.

And 140, updating network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain a trained pedestrian image quality evaluation model.

The network parameters refer to parameters capable of independently reflecting characteristics of the backbone network.

For example, based on the feature similarity, the weight of the pedestrian image quality evaluation model, which is a parameter representing the relationship used to calculate the input and output samples, may be updated.

In this embodiment, according to the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian label image, the network parameters such as the weight of the pedestrian image quality evaluation model are updated until the training is completed, and the trained pedestrian image quality evaluation model is obtained.

In actual implementation, the network parameters of the pedestrian image quality evaluation model are updated, and the network parameters can be updated through back propagation of the network.

In this embodiment, when training the pedestrian image quality evaluation model, a target threshold value or the number of iterations may be set as the condition for the training being completed.

For example, when the obtained feature similarity is greater than a target threshold, updating the network parameters is stopped, and a trained pedestrian image quality evaluation model is obtained.

For another example, when the iteration number of training the pedestrian image quality evaluation model reaches the preset iteration number, stopping training to obtain a trained pedestrian image quality evaluation model.

In the embodiment of the application, the backbone network of the pedestrian image quality evaluation model acquires the characteristics of fine granularity, sharpness, texture and the like of the pedestrian sample image and the pedestrian label image through convolution processing, fully excavates the global semantic characteristics and the local information characteristics of the pedestrian image, carries out model training, can evaluate the pedestrian image quality more comprehensively and objectively, and improves the accuracy of the pedestrian image quality evaluation model in evaluating the pedestrian image quality.

According to the training method of the pedestrian image quality evaluation model provided by the embodiment of the application, the acquired pedestrian image dataset is input into the pedestrian image quality evaluation model to be trained, the main network of the pedestrian image quality evaluation model is used for carrying out convolution processing, the global semantic features and the local information features of the pedestrian image are mined, the feature information such as the fine granularity, the sharpness and the texture of the pedestrian sample image and the pedestrian label image is obtained, the pedestrian image quality is evaluated more comprehensively and objectively, the network parameters of the pedestrian image quality evaluation model are trained and updated based on the feature similarity, and the accuracy and the reliability of the model for evaluating the pedestrian image quality are improved.

In some embodiments, the convolution processing is performed through a backbone network of the pedestrian image quality evaluation model, including:

And carrying out multi-order center differential convolution on the images of the pedestrian image data set through the backbone network.

The multi-order center difference convolution may be a multi-order calculation method such as a second-order center difference convolution, a third-order center difference convolution, and a fourth-order center difference convolution. The embodiment of the application carries out the convolution processing of the image through the multi-order center differential convolution, can realize the detection of the image edge and the extraction of the characteristic information, and can ensure the accuracy of the extracted image characteristic.

In the embodiment of the application, the characteristic information of the images of the pedestrian image data set is extracted by adopting multi-order central differential convolution, the fine granularity characteristics of the images of the pedestrian image data set can be extracted, the sharpness characteristics and the texture characteristics of the images of the pedestrian image data set can be extracted, and the accuracy and the reliability of the pedestrian image quality evaluation are improved by extracting more effective image characteristics.

In this embodiment, the second-order center difference convolution is performed on the image of the pedestrian image dataset through the backbone network of the pedestrian image quality evaluation model, and the fine-granularity feature, the sharpness feature, and the texture feature of the image of the pedestrian image dataset are obtained.

In the embodiment of the application, the fine granularity characteristics, the sharpness characteristics and the texture characteristics of the images of the pedestrian image data set are obtained by carrying out multi-order central difference convolution on the main network of the pedestrian sample image and the pedestrian label image in the pedestrian image quality evaluation model, so that the problem that the existing central difference convolution can only extract the fine granularity characteristics of the pedestrian image data set can be solved, and the accuracy and the reliability of pedestrian image quality evaluation can be improved by extracting more effective image characteristics.

In some embodiments, performing a multi-order central differential convolution on an image of a pedestrian image dataset over a backbone network may include:

Applying the formula

In this embodiment, a second order center differential convolution is taken as an example, where,Is thatThe second order center difference,The application formula of (2) is as follows:

wherein,Is the value of p_n for the feature layer at position.

In this embodiment, unlike the existing center differential convolution manner, the fine granularity, sharpness, and texture features of the image of the pedestrian image dataset can be obtained by performing a multi-order center differential convolution on the image of the pedestrian image dataset.

In some embodiments, the backbone network is ResNet, the backbone network comprising a convolutional layer, a central differential convolutional layer, a batch normalization layer, an activation function layer, a global pooling layer, and an average pooling layer, wherein the central differential convolutional layer is configured to perform a second order central differential convolution.

The ResNet network takes ResNet as a basic framework, has 18 layers of depth, and realizes the functions of identifying pedestrian images, extracting characteristic information and the like through the ResNet network.

The convolution layer is used for extracting image features of the pedestrian image; the center differential convolution layer is used for executing multi-order center differential convolution; the batch normalization layer is used for carrying out standardization processing on the data; the activation function layer is used for adding nonlinear factors and solving the problem which cannot be solved by the linear model; the global pooling layer is used for replacing the full-connection layer and can accept images with any size; the average pooling layer is used for replacing the full-connection layer, so that the pressure in the learning process is reduced.

In this embodiment, resNet is used as a backbone network, and a central differential convolution layer in the ResNet network is used to perform multi-order central differential convolution, and in cooperation with the convolution layer, the batch normalization layer, the activation function layer, the global pooling layer and the average pooling layer, the collection of characteristic information such as fine granularity, sharpness and texture of the pedestrian sample image and the pedestrian label image is realized.

The loss function of the pedestrian image quality evaluation model is described below.

In some embodiments, obtaining feature similarity between feature information of a pedestrian sample image and feature information of a pedestrian tag image includes:

The loss function adopts bulldozer distance (WASSERSTEIN DISTANCE), and image features are obeyed to a unified feature space through the loss function, so that training efficiency of the model is improved.

In actual execution, feature similarity between feature information of the pedestrian sample image and the pedestrian tag image is calculated by using a bulldozer loss function, and whether the pedestrian image quality evaluation model is good or bad is judged by the feature similarity.

The training strategy of the pedestrian image quality evaluation model is described below.

In some embodiments, the training strategy of the pedestrian image quality assessment model is a cosine annealing learning rate strategy.

The training strategy adopts a cosine annealing learning rate strategy (CosineAnnealingLR), and the learning rate is adjusted through the cosine annealing learning rate strategy.

The cosine annealing learning rate strategy is a training strategy adopting cosine function descent, and can enable learning to slowly descend first and then rapidly descend.

In the embodiment, the training strategy of the pedestrian image quality evaluation model adopts a cosine annealing learning rate strategy, and the learning rate is reduced by adopting a linear function, so that the accuracy of the pedestrian image quality evaluation model is maintained.

The following describes the formula of the cosine annealing learning rate strategy in detail.

The cosine annealing learning rate strategy has the formula expression:

Where l_r denotes the learning rate of the current iteration number, l_R denotes the initial learning rate, l_min denotes the minimum learning rate, cos denotes the cosine function, epoch is the current iteration number, and T_max denotes 1/2 of the cosine period.

A specific embodiment is described below.

As shown in fig. 2, the video monitoring data is collected and filtered, and the video of the video monitoring scene is stored in a frame-skipping manner, and an image is stored every 10 frames.

And constructing a pedestrian image dataset, extracting an effective pedestrian image from the acquired image by adopting a YOLOv, YOLOv5, SSD and other pedestrian detection models, and performing data expansion on the effective pedestrian image by adopting modes of image rotation, image noise and the like.

The image specification is unified, the size of the pedestrian image dataset image is set to 192x64, the batch size is set to 64, normalization is performed, the mean value [0, 0] is subtracted, and the standard deviation [255, 255, 255] is divided.

And adopting a uniformly distributed sampling method for the pedestrian image dataset, selecting ResNet as a main network, and setting a convolution mode in the pedestrian image quality evaluation model into a multi-order center differential convolution mode.

Adopts a cosine annealing learning rate strategy: As a training strategy, where l_r denotes the learning rate of the current iteration number, l_R denotes the initial learning rate, l_min denotes the minimum learning rate, cos denotes the cosine function, epoch is the current iteration number, and T_max denotes 1/2 of the cosine period.

The total iteration number is set to 300, T_max is set to 300, the minimum learning rate l_min is set to 0, and the initial learning rate l_R is set to 0.1.

And extracting the characteristic information of the pedestrian sample image and the characteristic information of the pedestrian label image by a convolution layer, a central difference convolution layer, a batch normalization layer, an activation function layer, a global pooling layer and an average pooling layer of a backbone network.

And calculating the similarity between the pedestrian sample image and the pedestrian label image by adopting a bulldozer distance loss function.

And updating network parameters through the back propagation of the network, and training the pedestrian image quality evaluation model.

And testing the trained network by adopting the test image, and outputting the test image quality evaluation score.

In the embodiment, by setting a convolution mode of multi-order center difference, fine granularity features, sharpness features and texture features in the pedestrian sample image can be accurately extracted, the credibility of pedestrian image quality evaluation can be improved, and subsequent pedestrian attribute analysis is facilitated.

The embodiment of the application also provides a pedestrian image quality evaluation method.

As shown in fig. 3, the pedestrian image quality evaluation method includes: step 310 and step 320.

Step 310, an image of the pedestrian to be evaluated is acquired.

The pedestrian image to be evaluated is a pedestrian image subjected to data preprocessing.

In the step, the image of the pedestrian monitored by the video is saved through frame skipping, and the pedestrian image is preprocessed to obtain the image of the pedestrian to be evaluated.

Step 320, inputting the pedestrian image to be evaluated into the pedestrian image quality evaluation model to obtain the quality evaluation score of the pedestrian image to be evaluated output by the pedestrian image quality evaluation model.

The pedestrian image quality evaluation model is obtained by training based on the training method of the pedestrian image quality evaluation model.

In the embodiment, a pedestrian image to be evaluated is input into a trained pedestrian image quality evaluation model, fine granularity characteristics, sharpness characteristics and texture characteristics of the pedestrian image to be evaluated are obtained in a multi-order center difference convolution mode, and quality evaluation scores of the pedestrian image to be evaluated are output.

According to the pedestrian image quality evaluation method provided by the embodiment of the application, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity characteristic, the sharpness characteristic and the texture characteristic of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output, so that the accuracy of pedestrian image quality evaluation can be improved, and the subsequent analysis of pedestrian attributes is facilitated.

According to the training method for the pedestrian image quality evaluation model, provided by the embodiment of the application, the execution subject can be a training device for the pedestrian image quality evaluation model. In the embodiment of the application, a training method for executing the pedestrian image quality evaluation model by using the training device for the pedestrian image quality evaluation model is taken as an example, and the training device for the pedestrian image quality evaluation model provided by the embodiment of the application is described.

The embodiment of the application also provides a training device of the pedestrian image quality evaluation model.

As shown in fig. 4, the training device of the pedestrian image quality evaluation model includes:

A first obtaining module 410, configured to obtain a pedestrian image dataset, where the pedestrian image dataset includes a plurality of pedestrian sample images and pedestrian tag images corresponding to the pedestrian sample images;

The first processing module 420 is configured to input a pedestrian image dataset into a pedestrian image quality evaluation model to be trained, perform convolution processing through a backbone network of the pedestrian image quality evaluation model, obtain feature information of a pedestrian sample image and feature information of a pedestrian label image output by the backbone network, where the feature information of the pedestrian sample image includes a sample fine granularity feature, a sample sharpness feature and a sample texture feature, and the feature information of the pedestrian label image includes a label fine granularity feature, a label sharpness feature and a label texture feature;

a second obtaining module 430, configured to obtain feature similarity between feature information of the pedestrian sample image and feature information of the pedestrian tag image;

The second processing module 440 is configured to update network parameters of the pedestrian image quality evaluation model based on the feature similarity, and obtain a trained pedestrian image quality evaluation model.

According to the training device of the pedestrian image quality evaluation model provided by the embodiment of the application, the acquired pedestrian image dataset is input into the pedestrian image quality evaluation model to be trained, the main network of the pedestrian image quality evaluation model is used for carrying out convolution processing, the global semantic features and the local information features of the pedestrian image are mined, the feature information such as the fine granularity, the sharpness and the texture of the pedestrian sample image and the pedestrian label image is obtained, the pedestrian image quality is evaluated more comprehensively and objectively, the network parameters of the pedestrian image quality evaluation model are updated based on the feature similarity, and the accuracy and the reliability of the model for evaluating the pedestrian image quality are improved.

In some embodiments, the first processing module 420 is configured to perform convolution processing through a backbone network of the pedestrian image quality evaluation model, including:

In some embodiments, the first processing module 420 is configured to perform a multi-order central differential convolution on an image of a pedestrian image dataset over a backbone network, including:

Applying the formula

In some embodiments, the first processing module 420 is configured for a backbone network ResNet, where the backbone network includes a convolution layer, a central differential convolution layer, a batch normalization layer, an activation function layer, a global pooling layer, and an average pooling layer, the central differential convolution layer configured to perform a multi-order central differential convolution.

In some embodiments, the second obtaining module 430, configured to obtain a feature similarity between feature information of the pedestrian sample image and feature information of the pedestrian tag image, includes:

The embodiment of the application also provides a pedestrian image quality evaluation device.

As shown in fig. 5, the pedestrian image quality evaluation device includes:

A third obtaining module 510, configured to obtain an image of a pedestrian to be evaluated;

The third processing module 520 is configured to input the pedestrian image to be evaluated to the pedestrian image quality evaluation model, and obtain a quality evaluation score of the pedestrian image to be evaluated output by the pedestrian image quality evaluation model;

According to the pedestrian image quality evaluation device provided by the embodiment of the application, the acquired pedestrian image to be evaluated is input into the trained pedestrian image quality evaluation model, the fine granularity characteristic, the sharpness characteristic and the texture characteristic of the pedestrian image are accurately extracted, and the quality evaluation score of the pedestrian image to be evaluated is output, so that the accuracy of pedestrian image quality evaluation can be improved, and the subsequent analysis of pedestrian attributes is facilitated.

The training device of the pedestrian image quality evaluation model in the embodiment of the application can be electronic equipment, and can also be a component in the electronic equipment, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be other devices than a terminal. The electronic device may be a Mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a Mobile internet appliance (Mobile INTERNET DEVICE, MID), an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a robot, a wearable device, an ultra-Mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and may also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., which are not particularly limited in the embodiments of the present application.

The training device of the pedestrian image quality evaluation model in the embodiment of the application can be a device with an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.

The training device for the pedestrian image quality evaluation model provided by the embodiment of the application can realize each process realized by the method embodiments of fig. 1 to 2, and in order to avoid repetition, the description is omitted here.

In some embodiments, as shown in fig. 6, an electronic device 600 is further provided in the embodiments of the present application, which includes a processor 601, a memory 602, and a computer program stored in the memory 602 and capable of running on the processor 601, where the program, when executed by the processor 601, implements the respective processes of the training method embodiment of the pedestrian image quality evaluation model, and the same technical effects can be achieved, so that repetition is avoided and redundant description is omitted herein.

The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.

The embodiment of the application also provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the respective processes of the training method embodiment of the pedestrian image quality evaluation model, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.

Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes computer readable storage medium such as computer readable memory ROM, random access memory RAM, magnetic or optical disk, etc.

The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program realizes the training method of the pedestrian image quality evaluation model when being executed by a processor.

The embodiment of the application further provides a chip, the chip comprises a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running programs or instructions, the processes of the training method embodiment of the pedestrian image quality evaluation model can be realized, the same technical effects can be achieved, and the repetition is avoided, and the description is omitted here.

It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the application, the scope of which is defined by the claims and their equivalents.

Claims

1. A training method of a pedestrian image quality evaluation model, comprising:

inputting the pedestrian image data set into a pedestrian image quality evaluation model to be trained, and carrying out multi-order central difference convolution on images of the pedestrian image data set through a backbone network of the pedestrian image quality evaluation model to obtain characteristic information of the pedestrian sample image and characteristic information of the pedestrian label image output by the backbone network, wherein the characteristic information of the pedestrian sample image comprises sample fine granularity characteristics, sample sharpness characteristics and sample texture characteristics, and the characteristic information of the pedestrian label image comprises label fine granularity characteristics, label sharpness characteristics and label texture characteristics;

updating network parameters of the pedestrian image quality evaluation model based on the feature similarity to obtain a trained pedestrian image quality evaluation model;

the performing multi-order center differential convolution on the image of the pedestrian image dataset through the backbone network includes:

Applying the formula

Wherein, p₀ is the convolution center position, θ, λ are the super-parameters,For the weight of the convolution kernel at position p_n,For a value of p_n for the feature layer at position, R is the range of convolution kernels,For the feature output at position p⁰ after center differential convolution,Is thatCorresponding multi-order center differences.

2. The training method of a pedestrian image quality evaluation model according to claim 1, wherein the backbone network is ResNet, and the backbone network includes a convolution layer, a central differential convolution layer, a batch normalization layer, an activation function layer, a global pooling layer, and an average pooling layer, and the central differential convolution layer is configured to perform multi-order central differential convolution.

3. The training method of the pedestrian image quality evaluation model according to any one of claims 1 to 2, characterized in that the obtaining of the feature similarity between the feature information of the pedestrian sample image and the feature information of the pedestrian tag image includes:

4. The training method of a pedestrian image quality evaluation model according to any one of claims 1 to 2, wherein the training strategy of the pedestrian image quality evaluation model is a cosine annealing learning rate strategy.

5. A pedestrian image quality evaluation method, characterized by comprising:

Acquiring a pedestrian image to be evaluated;

The pedestrian image quality evaluation model is trained based on the training method of the pedestrian image quality evaluation model according to any one of claims 1 to 4.

6. A training apparatus of a pedestrian image quality evaluation model, characterized in that the apparatus is for performing the training method of a pedestrian image quality evaluation model as claimed in any one of claims 1 to 4, and comprises:

7. A pedestrian image quality evaluation device characterized by comprising:

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the training method of the pedestrian image quality evaluation model of any one of claims 1-4 or the pedestrian image quality evaluation method of claim 6 when the program is executed by the processor.