Disclosure of Invention
In view of the above, it is necessary to provide a processing method, an apparatus, a computer device, and a storage medium for pedestrian re-identification that are capable of reducing the difficulty of recognition when a target object is occluded, in order to solve the above technical problems.
In a first aspect, an embodiment of the present invention provides a processing method for pedestrian re-identification, where the method includes:
inputting a recognition image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image; the recognition image contains a target object, the plurality of sub-features of the recognition image correspond to a plurality of partial regions of the target object, and the plurality of partial regions together form the target object; the visibility confidence is used for indicating the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded;
determining features of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image;
and searching a preset image database for a target image containing the target object according to the features of the recognition image.
In one embodiment, before the recognition image to be re-identified is input into the pre-trained prediction neural network, the method further includes:
acquiring a training sample set; the training sample set comprises a plurality of training samples and a visibility label of each training sample; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions together form the training object, and the visibility label is used for indicating whether the partial region of the training object corresponding to each sub-feature of the training sample is visible;
and training a neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, the obtaining of the training sample set includes:
obtaining a plurality of training samples;
and labeling each sub-feature in each training sample according to the pixel value to obtain the visibility label of each training sample.
In one embodiment, the labeling of each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample includes:
calculating an average pixel value of the training sample;
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, marking that sub-feature of the training sample as visible;
and if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, marking that sub-feature of the training sample as invisible.
In one embodiment, the training of the neural network based on the training sample set to obtain the prediction neural network includes:
performing feature extraction on the training sample through a neural network to obtain the features of the training sample, and dividing the features of the training sample into a plurality of sub-features;
performing convolution on the sub-features of the training sample using convolution layers whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer;
obtaining a visibility confidence corresponding to each sub-feature of the training sample according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample;
and training the neural network according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample to obtain the prediction neural network.
In one embodiment, the determining the feature of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image includes:
weighting and calculating a plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image to obtain a plurality of intermediate features;
and performing summation calculation on the plurality of intermediate features to obtain the features of the identification image.
In one embodiment, the searching for the target image containing the target object in the preset image database according to the features of the recognition image includes:
performing feature extraction on each candidate image in the image database to obtain the features of each candidate image;
searching out, from the features of the candidate images, a target feature that matches the features of the recognition image;
and determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the recognition image is a whole-body feature of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
In a second aspect, an embodiment of the present invention provides a processing apparatus for pedestrian re-identification, where the apparatus includes:
the visibility prediction module is used for inputting the recognition image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image; the recognition image contains a target object, the plurality of sub-features of the recognition image correspond to a plurality of partial regions of the target object, and the plurality of partial regions together form the target object; the visibility confidence is used for indicating the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded;
the recognition image feature determination module is used for determining the features of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image;
and the target image searching module is used for searching a preset image database for a target image containing the target object according to the features of the recognition image.
In one embodiment, the apparatus further comprises:
the training sample set acquisition module is used for acquiring a training sample set; the training sample set comprises a plurality of training samples and a visibility label of each training sample; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions together form the training object, and the visibility label is used for indicating whether the partial region of the training object corresponding to each sub-feature of the training sample is visible;
and the training module is used for training the neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, the training sample set obtaining module includes:
the training sample acquisition sub-module is used for acquiring a plurality of training samples;
and the visibility label acquisition sub-module is used for labeling each sub-feature in each training sample according to the pixel value to obtain the visibility label of each training sample.
In one embodiment, the visibility label acquiring sub-module is specifically configured to calculate an average pixel value of a training sample; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, mark that sub-feature of the training sample as visible; and if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, mark that sub-feature of the training sample as invisible.
In one embodiment, the training module is specifically configured to perform feature extraction on a training sample through a neural network to obtain the features of the training sample, and divide the features of the training sample into a plurality of sub-features; perform convolution on the sub-features of the training sample using convolution layers whose weights are not shared, and obtain a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer; obtain a visibility confidence corresponding to each sub-feature of the training sample according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample; and train the neural network according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample to obtain the prediction neural network.
In one embodiment, the recognition image feature determination module is specifically configured to perform weighted calculation on the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image to obtain a plurality of intermediate features; and perform summation on the plurality of intermediate features to obtain the features of the recognition image.
In one embodiment, the target image searching module is specifically configured to perform feature extraction on each candidate image in the image database to obtain the features of each candidate image; search out, from the features of the candidate images, a target feature that matches the features of the recognition image; and determine the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the recognition image is a whole-body feature of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
In a third aspect, an embodiment of the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the method as described above when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the method as described above.
According to the processing method, apparatus, computer device, and storage medium for pedestrian re-identification, the recognition image to be re-identified is input into the pre-trained prediction neural network to obtain a plurality of sub-features of the recognition image output by the prediction neural network and the visibility confidence corresponding to each sub-feature of the recognition image; the features of the recognition image are determined according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature; and a preset image database is searched for a target image containing the target object according to the features of the recognition image. In the embodiment of the present invention, the prediction neural network extracts features from the recognition image containing the target object and divides the extracted features, visibility prediction is then performed on the plurality of divided sub-features to obtain the probability that the partial region of the target object corresponding to each sub-feature is not occluded, the features of the recognition image are then determined according to these probabilities, and finally an image search is performed according to the features of the recognition image.
Because the visibility confidence of unoccluded sub-features is higher, that is, the sub-features corresponding to unoccluded partial regions account for a larger proportion of the resulting features of the recognition image, the subsequent image search is performed mainly according to the sub-features corresponding to the unoccluded partial regions. The sub-features of the occluded partial regions are nevertheless still taken into account during the search, so the accuracy of the image search is improved and the difficulty of the image search is reduced.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The processing method for pedestrian re-identification provided by the application can be applied to the application environment shown in fig. 1. The application environment includes a terminal 102 and a server 104, the terminal 102 communicating with the server 104 through a network. The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a processing method for pedestrian re-identification is provided. The method is described here as applied to the server in fig. 1, and includes the following steps:
Step 201, inputting the recognition image to be re-identified into a pre-trained prediction neural network, and obtaining a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image.
In this embodiment, the recognition image contains a target object, and the plurality of sub-features of the recognition image correspond to a plurality of partial regions of the target object; the plurality of partial regions together constitute the target object; the visibility confidence is used to indicate the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded. Specifically, a prediction neural network is trained in advance, and the recognition image to be re-identified is input into the prediction neural network. After receiving the recognition image, the prediction neural network extracts the features of the recognition image, divides the extracted features into a plurality of sub-features, and predicts whether the partial region of the target object corresponding to each sub-feature is occluded, so as to obtain a visibility confidence corresponding to each sub-feature.
For example, the target object contained in the recognition image may be a pedestrian, a vehicle, or the like. Taking a pedestrian as an example in the present embodiment, each sub-feature of the recognition image may correspond to a partial region of the pedestrian; for example, a sub-feature corresponds to the upper body or the lower body of the pedestrian, or to the head, the arms, the torso, the legs, and the like. After the prediction neural network performs feature extraction on the recognition image, the extracted feature A is divided into sub-features B1 and B2; prediction on sub-feature B1 outputs a visibility confidence of 0.92, and prediction on sub-feature B2 outputs a visibility confidence of 0.15. That is, the probability that the partial region X1 of the target object corresponding to sub-feature B1 is not occluded is 92%, and the probability that the partial region X2 corresponding to sub-feature B2 is not occluded is 15%; in other words, the partial region corresponding to sub-feature B1 of the recognition image is not occluded, while the partial region corresponding to sub-feature B2 is occluded.
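The division of the extracted feature A into sub-features such as B1 and B2 can be sketched as a horizontal split of a feature map. The tensor shapes and the two-part split below are illustrative assumptions, not the embodiment's actual layer dimensions:

```python
import numpy as np

def split_into_sub_features(feature_map, num_parts=2):
    """Split an extracted feature map of shape (C, H, W) into
    horizontal stripes along the height axis, one per body part."""
    c, h, w = feature_map.shape
    stripe = h // num_parts
    return [feature_map[:, i * stripe:(i + 1) * stripe, :]
            for i in range(num_parts)]

# A dummy feature map standing in for the network's output on the image.
feature_a = np.random.rand(256, 24, 8)
b1, b2 = split_into_sub_features(feature_a)  # upper / lower sub-features
print(b1.shape, b2.shape)  # (256, 12, 8) (256, 12, 8)
```

Each stripe then receives its own visibility prediction, as described in the training embodiment below.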
Step 202, determining the features of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image.
In this embodiment, the features of the recognition image may be calculated according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature. In one embodiment, determining the features of the recognition image may specifically include the following steps: weighting each sub-feature of the recognition image by its corresponding visibility confidence to obtain a plurality of intermediate features; and summing the plurality of intermediate features to obtain the features of the recognition image.
For example, sub-feature B1 of the recognition image is multiplied by its visibility confidence 0.92 to obtain a first intermediate feature, sub-feature B2 of the recognition image is multiplied by its visibility confidence 0.15 to obtain a second intermediate feature, and finally the two intermediate features are added to obtain the feature A of the recognition image.
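The weighted combination in this example can be sketched as follows; the 128-dimensional vectors and their constant values are illustrative stand-ins for the real sub-features:

```python
import numpy as np

# Sub-features of the recognition image (dimensions are illustrative).
b1 = np.ones(128)        # sub-feature for the unoccluded upper region
b2 = np.full(128, 2.0)   # sub-feature for the occluded lower region

# Visibility confidences output by the prediction network (from the example).
conf_b1, conf_b2 = 0.92, 0.15

# Weight each sub-feature by its confidence, then sum the intermediate
# features to obtain the feature of the recognition image.
intermediate_1 = conf_b1 * b1
intermediate_2 = conf_b2 * b2
feature_a = intermediate_1 + intermediate_2
print(feature_a[0])  # 0.92*1.0 + 0.15*2.0 = 1.22
```

Note that the occluded sub-feature still contributes, just with a small weight, rather than being removed outright.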
It can be understood that the visibility confidence of unoccluded sub-features is higher and that of occluded sub-features is lower, so the sub-features of the unoccluded partial regions account for a larger proportion of the features of the recognition image. In the subsequent image search, the search is therefore performed mainly according to the sub-features of the unoccluded partial regions, but the sub-features of the occluded partial regions are also taken into account, rather than being cut off and thereby impoverishing the features of the recognition image; this improves the accuracy of the image search.
Step 203, searching a target image containing the target object in a preset image database according to the characteristics of the recognition image.
In this embodiment, an image database is preset, a large number of candidate images are stored in the image database, and feature extraction is performed on the candidate images to be searched to obtain features of the candidate images. Wherein the feature of each candidate image has the same dimension as the feature of the recognition image. At this time, the features of the recognition images are compared with the features of the respective candidate images, and when the feature of one of the candidate images matches the feature of the recognition image, the candidate image is determined as the target image. I.e. the target image and the recognition image contain the same target object.
For example, feature extraction is performed on candidate images C1, C2, ..., C100 to be searched, so that features c1, c2, ..., c100 of the candidate images are obtained; the feature A of the recognition image is compared with the features c1, c2, ..., c100 of the candidate images one by one. If the similarity between the feature c15 of a candidate image and the feature A of the recognition image is greater than a preset similarity, the candidate image C15 is determined as the target image. The preset similarity is not specifically limited here and can be set according to actual conditions.
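A minimal sketch of this comparison step, assuming cosine similarity as the matching metric and 0.8 as the preset similarity (the embodiment fixes neither choice):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_target_images(query_feature, candidate_features, threshold=0.8):
    """Return the indices of candidates whose similarity to the query
    feature exceeds the preset similarity threshold."""
    return [i for i, feat in enumerate(candidate_features)
            if cosine_similarity(query_feature, feat) > threshold]

query = np.array([1.0, 0.0, 1.0])            # feature of the recognition image
candidates = [np.array([0.0, 1.0, 0.0]),      # dissimilar candidate
              np.array([2.0, 0.1, 2.0])]      # nearly parallel to the query
print(search_target_images(query, candidates))  # [1]
```

The features of each candidate image are assumed to have the same dimension as the feature of the recognition image, as the embodiment states.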
In the processing method for pedestrian re-identification, the prediction neural network extracts and divides the features of the recognition image containing the target object, performs visibility prediction on the plurality of divided sub-features to obtain the probability that each sub-feature is not occluded, then determines the features of the recognition image according to these probabilities, and performs an image search according to the features of the recognition image. In the embodiment of the present invention, the visibility confidence of the unoccluded sub-features is higher, that is, the sub-features corresponding to the unoccluded partial regions account for a larger proportion of the resulting features of the recognition image, so the subsequent image search is performed mainly according to the sub-features corresponding to the unoccluded partial regions. The sub-features of the occluded partial regions are nevertheless still taken into account during the search, so the accuracy of the image search is improved and the difficulty of the image search is reduced.
In one embodiment, the feature of the recognition image is a whole-body feature of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
For example, the target object contained in the recognition image is a pedestrian riding a bicycle. The prediction neural network performs feature extraction on the recognition image and then evenly divides the extracted feature into an upper sub-feature and a lower sub-feature, where the upper sub-feature corresponds to the upper-body feature of the pedestrian and the lower sub-feature corresponds to the lower-body feature of the pedestrian. For this recognition image, the prediction neural network can obtain the visibility confidence corresponding to the upper-body feature and the visibility confidence corresponding to the lower-body feature of the pedestrian, and further obtain the whole-body feature of the pedestrian according to the upper-body feature and its visibility confidence together with the lower-body feature and its visibility confidence. Because the upper body of the pedestrian is not occluded by the bicycle, the visibility confidence corresponding to the upper-body feature is high; because the lower body of the pedestrian is occluded by the bicycle, the visibility confidence corresponding to the lower-body feature is low. When searching according to this recognition image, the search is performed mainly according to the upper-body feature of the pedestrian, but the lower-body feature is also taken into account rather than cut off, which improves the accuracy of the image search and reduces its difficulty.
In another embodiment, as shown in fig. 3, this embodiment relates to an optional process of training the prediction neural network. On the basis of the embodiment shown in fig. 2, before step 201, the method may further include the following steps:
Step 301, acquiring a training sample set; the training sample set comprises a plurality of training samples and a visibility label of each training sample; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions together form the training object, and the visibility label is used for indicating whether the partial region of the training object corresponding to each sub-feature of the training sample is visible.
In this embodiment, step 301 includes: obtaining a plurality of training samples; and labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample. Since the pixel values of a training sample reflect the state of the partial regions of the training object, each sub-feature in the training sample can be labeled according to the pixel values. The training object may be a pedestrian contained in the training sample, and the plurality of sub-features of the training sample may include an upper-body feature and a lower-body feature of the pedestrian.
In one embodiment, labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample includes: calculating an average pixel value of the training sample; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, marking that sub-feature as visible; and if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, marking that sub-feature as invisible.
For example, the pixel value of each pixel in training sample M is obtained and the average pixel value is calculated to be 150. If the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than 150, that sub-feature of the training sample is marked as visible; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to 150, that sub-feature of the training sample is marked as invisible.
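The labeling rule above can be sketched as follows, assuming grayscale samples split into horizontal parts; the toy 8x8 sample with a bright upper half and dark lower half is illustrative:

```python
import numpy as np

def label_sub_regions(image, num_parts=2):
    """Label each horizontal part of a grayscale training sample as
    visible (True) when its mean pixel value exceeds the sample's
    overall mean pixel value, otherwise invisible (False)."""
    overall_mean = image.mean()
    stripe = image.shape[0] // num_parts
    labels = []
    for i in range(num_parts):
        region = image[i * stripe:(i + 1) * stripe]
        labels.append(bool(region.mean() > overall_mean))
    return labels

# Toy sample: bright upper half (unoccluded), dark lower half (occluded).
sample = np.vstack([np.full((4, 4), 200.0), np.full((4, 4), 100.0)])
print(label_sub_regions(sample))  # [True, False]
```

Here the overall mean is 150, the upper region's mean 200 exceeds it (visible), and the lower region's mean 100 does not (invisible), matching the worked example.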
Step 302, training the neural network based on the training sample set to obtain the prediction neural network.
In this embodiment, after the training sample set is acquired, the neural network is trained according to the plurality of training samples in the training sample set and the visibility label of each training sample, so as to obtain the prediction neural network.
In one embodiment, the training process may specifically include the following steps: performing feature extraction on the training sample through the neural network to obtain the features of the training sample, and dividing the features of the training sample into a plurality of sub-features; performing convolution on the sub-features of the training sample using convolution layers whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer; obtaining a visibility confidence corresponding to each sub-feature of the training sample according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample; and training the neural network according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample to obtain the prediction neural network.
For example, training sample M is input to the neural network, the neural network performs feature extraction on training sample M, and the extracted feature is divided into upper and lower sub-features N1 and N2. The neural network then performs convolution on sub-features N1 and N2 using convolution layers whose weights are not shared. Next, the dimensions of sub-features N1 and N2 are reduced through a global average pooling layer and a fully connected layer to obtain one-dimensional feature vectors. Finally, the visibility confidence corresponding to sub-feature N1 and the visibility confidence corresponding to sub-feature N2 are obtained through an activation function. The difference between the visibility confidence output by the neural network for each sub-feature and the corresponding visibility label is calculated through a loss function, and the neural network is optimized according to this difference. When the loss function tends to converge, the training is finished, and the prediction neural network adopted by the embodiment of the present invention is obtained.
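The visibility-prediction head described above can be sketched as follows. The layer sizes, the use of a 1x1 convolution to stand in for the unshared-weight convolution, the sigmoid activation, and binary cross-entropy as the loss function are all simplifying assumptions for illustration; the embodiment does not fix these choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def visibility_head(sub_feature, conv_w, fc_w):
    """Predict a visibility confidence for one sub-feature of shape
    (C, H, W). conv_w plays the role of a 1x1 convolution whose
    weights are NOT shared across parts; global average pooling and a
    fully connected layer reduce it to a single logit, and a sigmoid
    activation maps the logit to a probability."""
    conv_out = np.einsum('dc,chw->dhw', conv_w, sub_feature)  # 1x1 conv
    pooled = conv_out.mean(axis=(1, 2))                       # global average pooling
    logit = fc_w @ pooled                                     # fully connected layer
    return sigmoid(logit)

def bce_loss(confidence, label):
    """Binary cross-entropy between the predicted visibility
    confidence and the visibility label (1 = visible, 0 = invisible)."""
    eps = 1e-7
    p = np.clip(confidence, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Two sub-features N1 / N2 of training sample M, each with its own
# (unshared) head weights.
n1, n2 = rng.random((64, 6, 4)), rng.random((64, 6, 4))
heads = [(rng.standard_normal((16, 64)) * 0.1, rng.standard_normal(16) * 0.1)
         for _ in range(2)]
confidences = [visibility_head(f, w, v) for f, (w, v) in zip((n1, n2), heads)]
labels = [1.0, 0.0]  # N1 labeled visible, N2 labeled invisible
loss = sum(bce_loss(c, y) for c, y in zip(confidences, labels))
print(confidences, loss)
```

In a real implementation the loss would be back-propagated to update both the shared backbone and the per-part head weights until the loss converges.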
Through the above steps of training the prediction neural network, a training sample set is acquired and the neural network is trained based on the training sample set to obtain the prediction neural network. The prediction neural network trained in the embodiment of the present invention can predict the visibility of a plurality of parts in a complete image and thereby output the probability that each part is not occluded; it can then be applied to image retrieval, avoiding the problem that occluded parts of a complete image are cut off, which makes pedestrian re-identification difficult.
In another embodiment, as shown in FIG. 4, this embodiment relates to an alternative process of image searching. On the basis of the embodiment shown in fig. 2, the method may specifically include the following steps:
Step 401, cropping the original image to obtain a recognition image containing the target object.
In this embodiment, if the target object in the original image to be searched is not located in the center of the image, the original image may be cropped to obtain an identified image, so that the target object is located in the center of the identified image.
For example, if the portrait X is located in the lower half of the original image D, the original image is cropped to obtain the recognition image a, so that the portrait X is located in the center of the recognition image a.
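The cropping step can be sketched as follows, assuming the target's bounding box is known; border clipping is omitted for brevity, so the box is assumed to sit far enough inside the original image:

```python
import numpy as np

def center_crop_on_target(image, box, out_h, out_w):
    """Crop the original image so that the centre of the target's
    bounding box becomes the centre of the recognition image.
    `box` is (top, left, height, width)."""
    top, left, bh, bw = box
    cy, cx = top + bh // 2, left + bw // 2     # centre of the target
    y0, x0 = cy - out_h // 2, cx - out_w // 2  # top-left of the crop
    return image[y0:y0 + out_h, x0:x0 + out_w]

# Portrait X occupies the lower half of original image D (illustrative).
d = np.zeros((100, 60))
d[60:90, 20:40] = 1.0  # the target region
a = center_crop_on_target(d, (60, 20, 30, 20), 50, 40)
print(a.shape)  # (50, 40), with the target centred in the crop
```

After this step the target sits at the centre of the recognition image, which is then fed to the prediction neural network.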
Step 402, obtaining a plurality of training samples; labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions together form the training object, and the visibility label is used for indicating whether the partial region of the training object corresponding to each sub-feature of the training sample is visible.
In one embodiment, an average pixel value of the training sample is calculated; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, that sub-feature is marked as visible; and if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, that sub-feature is marked as invisible.
And 403, training the neural network based on the training sample set to obtain a predicted neural network.
In one embodiment, feature extraction is performed on a training sample through the neural network to obtain features of the training sample, and the features of the training sample are divided into a plurality of sub-features; the sub-features of the training sample are convolved by convolution layers with unshared weights, and a one-dimensional feature vector corresponding to each sub-feature of the training sample is obtained through a global average pooling layer and a fully connected layer; a visibility confidence corresponding to each sub-feature of the training sample is obtained according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample; and the neural network is trained according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample, to obtain the predicted neural network.
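The forward pass of the confidence branch described in step 403 (unshared convolution, global average pooling, fully connected layer, activation) can be sketched numerically as follows. The 1x1 kernel, the sigmoid activation, and all layer shapes are illustrative assumptions; the patent does not fix them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def visibility_confidences(sub_features, convs, fcs):
    """For each sub-feature map (C x H x W), apply its OWN 1x1
    convolution (weights not shared across parts), global average
    pooling, and a fully connected layer, then a sigmoid activation,
    yielding one confidence in (0, 1) per sub-feature.
    Shapes: convs[i] is (O x C), fcs[i] is (O,). Sketch only."""
    confs = []
    for f, w_conv, w_fc in zip(sub_features, convs, fcs):
        # unshared 1x1 convolution: mix channels at each spatial position
        mixed = np.einsum('oc,chw->ohw', w_conv, f)
        pooled = mixed.mean(axis=(1, 2))   # global average pooling -> 1-D vector
        logit = float(w_fc @ pooled)       # fully connected layer -> scalar
        confs.append(sigmoid(logit))       # activation -> probability-like score
    return confs
```

During training, these confidences would be compared against the binary visibility labels (e.g. with a cross-entropy loss) to update the network; the loss choice is likewise not specified by the disclosure.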
Step 404, inputting the recognition image to be re-recognized into a pre-trained prediction neural network, and obtaining a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image.
The identification image comprises a target object, and each sub-feature of the identification image corresponds to a partial region of the target object; the visibility confidence is used for indicating the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded.
Step 405, determining features of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image.
In one embodiment, the feature of the identification image is a whole-body feature of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
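The fusion described in step 405 (weight each sub-feature by its visibility confidence, then sum the intermediate features) can be sketched as:

```python
import numpy as np

def fuse_features(sub_features, confidences):
    """Weight each sub-feature of the recognition image by its
    visibility confidence to obtain the intermediate features, then
    sum the intermediate features into the final image feature.
    Minimal sketch of the weighted-sum fusion."""
    intermediates = [c * f for f, c in zip(sub_features, confidences)]
    return np.sum(intermediates, axis=0)
```

An occluded part (low confidence) thus contributes little but is not discarded entirely, which is what allows partially occluded pedestrians to be matched.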
Step 406, extracting the features of each candidate image in the image database to obtain the features of each candidate image; searching out target characteristics matched with the characteristics of the recognition images from the characteristics of the candidate images; and determining the candidate image corresponding to the target feature as the target image.
In this embodiment, feature extraction is performed on each candidate image in the image database to obtain the features of each candidate image; then, a difference value between the feature of the recognition image and the feature of each candidate image may be calculated, and when a difference value is smaller than a preset difference value, the corresponding candidate feature is the target feature matching the feature of the recognition image.
For example, difference values between the feature a of the recognition image and the features c1, c2, …, c100 of the candidate images C1, C2, …, C100 are calculated respectively. If the difference value between the feature a and the feature c15 is smaller than the preset difference value, the feature c15 is determined as the target feature, and the corresponding candidate image C15 is determined as the target image; the target image C15 contains the same target object as the recognition image. The preset difference value is not limited in this embodiment and may be set according to actual conditions.
Alternatively, the Euclidean distance between the feature of the recognition image and the feature of a candidate image may be calculated, and the two features are determined to match when the Euclidean distance is smaller than a preset distance. The matching manner is not limited in this embodiment and may be set according to actual conditions.
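The Euclidean-distance variant of step 406 can be sketched as follows; the function name and the threshold parameter are illustrative, and in practice the preset distance is application-dependent.

```python
import numpy as np

def find_target_images(query_feat, candidate_feats, max_dist):
    """Return the indices of candidate images whose feature lies
    within `max_dist` (Euclidean distance) of the recognition
    image's feature. `candidate_feats` is an N x D array; `max_dist`
    plays the role of the preset distance. Sketch only."""
    dists = np.linalg.norm(candidate_feats - query_feat, axis=1)
    return np.flatnonzero(dists < max_dist)
```

The same structure works for the difference-value variant by substituting another distance measure for `np.linalg.norm`.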
In the above processing method for pedestrian re-identification, the original image is cropped to obtain a recognition image including the target object; the recognition image to be re-recognized is input into the pre-trained prediction neural network to obtain a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image; then, the features of the recognition image are determined according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature, and the target image is obtained by searching according to the features of the recognition image. In the embodiment of the invention, the subsequent image search is performed mainly according to the sub-features of the unoccluded partial regions, while the sub-features of the occluded partial regions are still taken into account with a lower weight, so that the accuracy of the image search is improved and the difficulty of the image search is reduced.
It should be understood that, although the various steps in the flowcharts of FIGS. 2-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a processing apparatus for pedestrian re-identification, including:
the visibility prediction module 501, configured to input the recognition image to be re-recognized into a pre-trained prediction neural network, so as to obtain a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image; the recognition image comprises a target object, the plurality of sub-features of the recognition image correspond to a plurality of partial regions of the target object, and the plurality of partial regions of the target object form the target object; the visibility confidence is used for indicating the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded;
a recognition image feature determination module 502, configured to determine features of the recognition image based on the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image;
and a target image searching module 503, configured to search a target image containing the target object in a preset image database according to the features of the recognition image.
In one embodiment, the apparatus further comprises:
the training sample set acquisition module is used for acquiring a training sample set; the training sample set comprises a plurality of training samples and visibility labels of the training samples; the training sample comprises a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions of the training object form the training object, and the visibility label is used for indicating the visibility of each sub-feature of the training sample corresponding to the partial region of the training object;
and the training module is used for training the neural network based on the training sample set to obtain the predicted neural network.
In one embodiment, the training sample set obtaining module includes:
the training sample acquisition sub-module is used for acquiring a plurality of training samples;
and the visibility label acquisition sub-module is used for labeling each sub-feature in each training sample according to the pixel value to obtain the visibility label of each training sample.
In one embodiment, the visibility label acquiring sub-module is specifically configured to calculate an average pixel value of a training sample; if the average pixel value of the partial area of the training object corresponding to the sub-features of the training sample is larger than the average pixel value of the training sample, marking the sub-features of the training sample as visible; and if the average pixel value of the partial region of the training object corresponding to the sub-features of the training sample is less than or equal to the average pixel value of the training sample, marking the sub-features of the training sample as invisible.
In one embodiment, the training module is specifically configured to perform feature extraction on a training sample through a neural network to obtain features of the training sample, and segment the features of the training sample into a plurality of sub-features; convolve the sub-features of the training sample by convolution layers with unshared weights, and obtain a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer; obtain a visibility confidence corresponding to each sub-feature of the training sample according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample; and train the neural network according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample, to obtain the predicted neural network.
In one embodiment, the recognition image feature determining module is specifically configured to perform weighted calculation on a plurality of sub-features of the recognition image and a visibility confidence corresponding to each sub-feature of the recognition image to obtain a plurality of intermediate features; and performing summation calculation on the plurality of intermediate features to obtain the features of the identification image.
In one embodiment, the target image searching module is specifically configured to perform feature extraction on each candidate image in the image database to obtain features of each candidate image; searching out target characteristics matched with the characteristics of the recognition images from the characteristics of the candidate images; and determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the identification image is a whole-body feature of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
The specific definition of the processing device for pedestrian re-identification can be referred to the above definition of the processing method for pedestrian re-identification, and is not described herein again. The modules in the processing device for pedestrian re-identification can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing processing data of pedestrian re-identification. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a processing method of pedestrian re-identification.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
inputting the recognition image to be re-recognized into a pre-trained prediction neural network to obtain a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image; the identification image comprises a target object, a plurality of sub-features of the identification image correspond to a plurality of partial areas of the target object, and the plurality of partial areas of the target object form the target object; the visibility confidence coefficient is used for indicating the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded;
determining features of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image;
and searching a target image containing the target object in a preset image database according to the characteristics of the recognition image.
In one embodiment, the processor, when executing the computer program, performs the steps of:
acquiring a training sample set; the training sample set comprises a plurality of training samples and visibility labels of the training samples; the training sample comprises a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions of the training object form the training object, and the visibility label is used for indicating the visibility of each sub-feature of the training sample corresponding to the partial region of the training object;
and training the neural network based on the training sample set to obtain a predicted neural network.
In one embodiment, the processor, when executing the computer program, performs the steps of:
obtaining a plurality of training samples;
and labeling each sub-feature in each training sample according to the pixel value to obtain the visibility label of each training sample.
In one embodiment, the processor, when executing the computer program, performs the steps of:
calculating an average pixel value of the training samples;
if the average pixel value of the partial area of the training object corresponding to the sub-features of the training sample is larger than the average pixel value of the training sample, marking the sub-features of the training sample as visible;
and if the average pixel value of the partial region of the training object corresponding to the sub-features of the training sample is less than or equal to the average pixel value of the training sample, marking the sub-features of the training sample as invisible.
In one embodiment, the processor, when executing the computer program, performs the steps of:
extracting the features of the training sample through a neural network to obtain the features of the training sample, and dividing the features of the training sample into a plurality of sub-features;
convolution is carried out on the sub-features of the training sample by adopting a convolution layer with unshared weights, and a one-dimensional feature vector corresponding to each sub-feature of the training sample is obtained through a global average pooling layer and a fully connected layer;
obtaining a visibility confidence corresponding to each sub-feature of the training sample according to the activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample;
and training the neural network according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample to obtain a predicted neural network.
In one embodiment, the processor, when executing the computer program, performs the steps of:
weighting and calculating a plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image to obtain a plurality of intermediate features;
and performing summation calculation on the plurality of intermediate features to obtain the features of the identification image.
In one embodiment, the processor, when executing the computer program, performs the steps of:
extracting the characteristics of each candidate image in the image database to obtain the characteristics of each candidate image;
searching out target characteristics matched with the characteristics of the recognition images from the characteristics of the candidate images;
and determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the recognition image is a whole-body feature of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
inputting the recognition image to be re-recognized into a pre-trained prediction neural network to obtain a plurality of sub-features of the recognition image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the recognition image; the identification image comprises a target object, a plurality of sub-features of the identification image correspond to a plurality of partial areas of the target object, and the plurality of partial areas of the target object form the target object; the visibility confidence coefficient is used for indicating the probability that the partial region of the target object corresponding to each sub-feature of the recognition image is not occluded;
determining features of the recognition image according to the plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image;
and searching a target image containing the target object in a preset image database according to the characteristics of the recognition image.
In one embodiment, the computer program when executed by the processor implements the steps of:
acquiring a training sample set; the training sample set comprises a plurality of training samples and visibility labels of the training samples; the training sample comprises a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions of the training object form the training object, and the visibility label is used for indicating the visibility of each sub-feature of the training sample corresponding to the partial region of the training object;
and training the neural network based on the training sample set to obtain a predicted neural network.
In one embodiment, the computer program when executed by the processor implements the steps of:
obtaining a plurality of training samples;
and labeling each sub-feature in each training sample according to the pixel value to obtain the visibility label of each training sample.
In one embodiment, the computer program when executed by the processor implements the steps of:
calculating an average pixel value of the training samples;
if the average pixel value of the partial area of the training object corresponding to the sub-features of the training sample is larger than the average pixel value of the training sample, marking the sub-features of the training sample as visible;
and if the average pixel value of the partial region of the training object corresponding to the sub-features of the training sample is less than or equal to the average pixel value of the training sample, marking the sub-features of the training sample as invisible.
In one embodiment, the computer program when executed by the processor implements the steps of:
extracting the features of the training sample through a neural network to obtain the features of the training sample, and dividing the features of the training sample into a plurality of sub-features;
convolution is carried out on the sub-features of the training sample by adopting a convolution layer with unshared weights, and a one-dimensional feature vector corresponding to each sub-feature of the training sample is obtained through a global average pooling layer and a fully connected layer;
obtaining a visibility confidence corresponding to each sub-feature of the training sample according to the activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample;
and training the neural network according to the visibility label of the training sample and the visibility confidence corresponding to each sub-feature of the training sample to obtain a predicted neural network.
In one embodiment, the computer program when executed by the processor implements the steps of:
weighting and calculating a plurality of sub-features of the recognition image and the visibility confidence corresponding to each sub-feature of the recognition image to obtain a plurality of intermediate features;
and performing summation calculation on the plurality of intermediate features to obtain the features of the identification image.
In one embodiment, the computer program when executed by the processor implements the steps of:
extracting the characteristics of each candidate image in the image database to obtain the characteristics of each candidate image;
searching out target characteristics matched with the characteristics of the recognition images from the characteristics of the candidate images;
and determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the features of the identified image are full-body features of a pedestrian; the plurality of sub-features of the recognition image include an upper-body feature and a lower-body feature of the pedestrian.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and while their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.