CN113392786A - Cross-domain pedestrian re-identification method based on normalization and feature enhancement - Google Patents

Cross-domain pedestrian re-identification method based on normalization and feature enhancement

Info

Publication number
CN113392786A
CN113392786A (application CN202110689585.2A)
Authority
CN
China
Prior art keywords
feature
normalization
pedestrian
unit
nem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110689585.2A
Other languages
Chinese (zh)
Other versions
CN113392786B (en)
Inventor
殷光强
贾召钱
王文超
曾宇昊
王春雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110689585.2A
Publication of CN113392786A
Application granted
Publication of CN113392786B
Status: Active
Anticipated expiration


Abstract

The invention belongs to the technical field of pedestrian re-identification, and in particular relates to a cross-domain pedestrian re-identification method based on normalization and feature enhancement. Without using any target-domain data, the technical scheme effectively suppresses domain gaps and enhances discriminative pedestrian features, thereby strengthening the generalization ability of the recognition network model. By borrowing the idea of residual connection, instance normalization suppresses style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative. Spatial information is fused into the channels through the attention unit CAB, and the feature weight of each channel is adaptively adjusted by modeling the dependencies among channels, effectively enhancing the pedestrian features.

Description

Cross-domain pedestrian re-identification method based on normalization and feature enhancement
Technical Field
The invention belongs to the technical field of pedestrian re-identification, and particularly relates to a cross-domain pedestrian re-identification method based on normalization and feature enhancement.
Background
Cross-domain pedestrian re-identification refers to retrieving a target pedestrian from large-scale image or video data over different domains using computer vision techniques. The ideal cross-domain pedestrian re-recognition model can be trained once and tested at will, namely, the model is trained only by using the collected source domain data, and then the trained model can obtain good re-recognition effect on any other target domain. However, huge domain gaps often exist among data sets, which seriously hinder the generalization of the model from a source domain to a target domain and is also a main reason that the cross-domain pedestrian re-identification performance is difficult to improve.
Because of the inevitable domain differences between data domains, many advanced re-identification algorithms perform well when tested on a single data set but generalize poorly to another data domain. To improve the generalization ability of the model as much as possible, many cross-domain pedestrian re-identification methods have appeared in recent years, most of which require the model to adapt to the target domain. The typical approach is to collect data from part of the target domain, cluster the extracted features with some clustering algorithm to generate pseudo labels, train the model with the generated pseudo labels to update its parameters, and iterate these steps until convergence. Although many such methods do effectively improve the generalization ability of the model, collecting target-domain data is time-consuming and labor-intensive, and in practical applications the target-domain data may not be collectable at all.
Specifically, in the cross-domain pedestrian re-identification setting, the domain gap between data sets is mainly introduced during data collection; for example, differences in collection time cause differences in image brightness, and differences in collection location cause differences in image background. These style differences make the data distributions of different domains diverge, which in turn complicates the re-identification task. Currently, transfer learning is one of the mainstream means for addressing model generalization: knowledge or patterns learned in one field or task are applied to a different but related field or problem. Image style transfer is a transfer-learning method in the image field that can effectively mitigate the generalization problem caused by image style differences, and researchers have widely applied it to cross-domain pedestrian re-identification. However, style-transfer methods based on generative adversarial networks (GAN) require target-domain data during model training, adding extra collection and training costs. Instance Normalization (IN), in contrast, performs a form of style normalization by adjusting the feature statistics of the network. Yet IN dilutes the information carried by the global statistics of the feature response: introducing IN into the pedestrian re-identification task normalizes the image style and suppresses inter-domain differences, but the process also loses some discrimination information.
To improve the generalization ability of the model, enhancing pedestrian features with an attention mechanism is also an effective means, since attention lets the model focus on the region of interest. Attention is generally divided into spatial attention and channel attention. Spatial attention uses the spatial relationships among features to generate a spatial attention weight that locates the pedestrian information of interest in the spatial dimensions; channel attention improves the representational capacity of the network by modeling the dependencies of each channel. Different tasks usually call for different kinds of attention, and researchers must match the attention mechanism to the specific task; simply stacking both kinds introduces redundancy and wastes computing resources.
Disclosure of Invention
According to the problems in the prior art, the invention provides a cross-domain pedestrian re-identification method based on normalization and feature enhancement, which can effectively inhibit domain gaps and enhance pedestrian distinguishing features on the basis of not using target domain data, thereby enhancing the generalization capability of a model.
The method is realized by the following technical scheme:
the cross-domain pedestrian re-identification method based on normalization and feature enhancement is characterized by comprising the steps of establishing an identification network model, image feature normalization, image feature recovery and image feature output;
establishing a recognition network model, which comprises establishing a normalization enhancement module NEM with an instance normalization unit IN (Instance Normalization), a residual weight training unit CMS and an attention unit CAB, taking the ResNet50 model as the backbone network, and inserting the normalization enhancement module NEM into the ResNet50 model to form the recognition network model;
the image feature normalization comprises the following steps:
S11, extracting pedestrian image features x ∈ R^(c×h×w) based on the ResNet50 model (note: x, x1 and x2 in this embodiment are all image features), where x is the input feature of the normalization enhancement module NEM, c is the number of channels of the image features, h is the height of the image features, w is the width of the image features, R^(c×h×w) represents the real-number space of dimensions c × h × w, and x ∈ R^(c×h×w) means that the input feature x is a vector in the real-number space of dimensions c × h × w;
S12, using the instance normalization unit IN, obtain the mean μ(x) and standard deviation σ(x) of the input feature x ∈ R^(c×h×w) within each channel, and calculate the normalized feature x1 from them as follows:

x1 = γ × (x − μ(x)) / σ(x) + β

wherein γ and β are learnable parameter vectors, and γ ∈ R^c, β ∈ R^c means that γ and β are both vectors in the c-dimensional real-number space; the elements of γ and β are initialized to 1 and 0 respectively and are then updated automatically during training;
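Steps S11-S12 can be sketched as follows. This is a minimal NumPy illustration of per-channel instance normalization with learnable γ and β; the function name and the small eps term are our own additions, and the patent's IN unit of course operates inside a trained network rather than on raw arrays.

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance normalization over the spatial dims (h, w) of one image.

    x: feature map of shape (c, h, w); gamma, beta: shape (c,).
    Per-channel mean/std are computed over the h x w spatial positions,
    mirroring the formula x1 = gamma * (x - mu(x)) / sigma(x) + beta.
    """
    mu = x.mean(axis=(1, 2), keepdims=True)    # per-channel mean  mu(x)
    sigma = x.std(axis=(1, 2), keepdims=True)  # per-channel std   sigma(x)
    x1 = (x - mu) / (sigma + eps)              # style-normalized feature
    return gamma[:, None, None] * x1 + beta[:, None, None]

np.random.seed(0)
c, h, w = 4, 8, 6
x = np.random.randn(c, h, w) * 3.0 + 1.5       # a feature map with its own "style"
gamma = np.ones(c)                              # initialized to 1, as in S12
beta = np.zeros(c)                              # initialized to 0, as in S12
x1 = instance_norm(x, gamma, beta)
```

After normalization each channel has approximately zero mean and unit variance, i.e. the per-channel style statistics have been removed.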
the image feature recovery comprises the following steps:
S21, the residual weight training unit CMS is utilized to learn a residual weight Wr from the normalized feature x1, namely:
Wr=sigmoid(mean(conv(x1)))
wherein conv(·) represents convolution, mean(·) represents the global mean, and sigmoid(·) represents the activation function;
S22, based on the residual weight Wr, fuse the input feature x and the normalized feature x1 to recover the discrimination information lost from the image features through style normalization, with the fusion formula:

x2 = Wr × x1 + (1 − Wr) × x

wherein x2 is the restored image feature, named the recovered feature, and x2 ∈ R^(c×h×w) means that the recovered feature x2 is a vector in the real-number space of dimensions c × h × w;
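Steps S21-S22 can be illustrated as below. This is a hedged NumPy sketch: the single-output-channel 3 × 3 convolution with stride 2 follows the description given later in the embodiment, but the kernel values, the absence of padding, and the random inputs are illustrative assumptions, not the patent's trained CMS layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_weight_fusion(x, x1, kernel):
    """Sketch of the CMS unit: Wr = sigmoid(mean(conv(x1))),
    then x2 = Wr * x1 + (1 - Wr) * x.

    x, x1: shape (c, h, w); kernel: shape (c, 3, 3), a 3x3 convolution
    with one output channel and stride 2 (no padding, an assumption).
    """
    c, h, w = x1.shape
    out_h, out_w = (h - 3) // 2 + 1, (w - 3) // 2 + 1
    conv = np.zeros((out_h, out_w))
    for i in range(out_h):                      # naive stride-2 convolution
        for j in range(out_w):
            patch = x1[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3]
            conv[i, j] = np.sum(patch * kernel)  # compress c channels to 1
    wr = sigmoid(conv.mean())                    # scalar residual weight in (0, 1)
    x2 = wr * x1 + (1.0 - wr) * x                # fuse normalized and original
    return wr, x2

rng = np.random.default_rng(0)
c, h, w = 4, 9, 9
x = rng.normal(size=(c, h, w))                   # input feature
x1 = rng.normal(size=(c, h, w))                  # normalized feature
kernel = rng.normal(size=(c, 3, 3)) * 0.1
wr, x2 = residual_weight_fusion(x, x1, kernel)
```

Because Wr lies strictly between 0 and 1, x2 is an element-wise convex combination of x and x1, which is exactly how lost discrimination information is recovered.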
the image feature output comprises the steps of:
S31, using the attention unit CAB, explore the correlations between different channels of the recovered feature x2 and adaptively extract the channel attention weight Wc, namely:
Wc=ca(x2)
where ca(·) is the attention unit CAB, and the channel attention weight Wc measures the importance of the information in each channel of the recovered feature x2;
S32, filter the recovered feature x2 with the channel attention weight Wc to enhance the characterization ability of the pedestrian features, namely:
f=(Wc+1)×x2
wherein f is the output characteristic of the normalization and enhancement module NEM.
Further, the ResNet50 model comprises a Res1 unit, Res2 unit, Res3 unit, Res4 unit, Res5 unit and Head unit connected in sequence, and a normalization enhancement module NEM is inserted at the output end of each of the Res2, Res3, Res4 and Res5 units.
Further, the method also includes introducing a NEM loss function into the normalized enhancement module NEM at the output end of the Res5 unit, that is, the image feature output of the normalized enhancement module NEM further includes the following steps:
S33, respectively calculate the center loss Cx of the input feature x and the center loss Cf of the output feature f, in order to measure the intra-class dispersion of the input feature x and the output feature f in the feature space, with the calculation formulas:

Cx = (1 / (n·m)) · Σ_{j=1..n} Σ_{i=1..m} ‖x_ji − c_xj‖²
Cf = (1 / (n·m)) · Σ_{j=1..n} Σ_{i=1..m} ‖f_ji − c_fj‖²

wherein c_xj ∈ R^d represents the class center of the j-th pedestrian in the input feature x; c_fj ∈ R^d represents the class center of the j-th pedestrian in the output feature f; n represents the total number of pedestrians in the data set, m represents the total number of features of the j-th pedestrian, x_ji represents the i-th feature of the j-th pedestrian in the input feature x, d represents the dimension of each feature, and R^d represents the d-dimensional real-number space, i.e. c_xj and c_fj are both vectors in the d-dimensional real-number space;
S34, based on the center losses Cf and Cx, establish the NEM loss function:

L_NEM = Cf / Cx

wherein L_NEM is the loss value calculated from the input feature x and the output feature f of NEM5.
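Steps S33-S34 can be sketched in NumPy as below. Both conventions here are our assumptions: the center loss uses the standard unweighted squared-distance-to-class-center form averaged over all features, and the two losses are combined as a simple ratio so that NEM5's output is encouraged to be more compact than its input; the exact normalization and weighting in the patent may differ.

```python
import numpy as np

def center_loss(feats, labels):
    """Mean squared distance of each feature to its class (pedestrian) center.

    feats: (N, d) feature matrix; labels: (N,) pedestrian ids.
    """
    loss, count = 0.0, 0
    for j in np.unique(labels):
        fj = feats[labels == j]
        cj = fj.mean(axis=0)            # class center c_j of pedestrian j
        loss += np.sum((fj - cj) ** 2)  # intra-class dispersion for class j
        count += fj.shape[0]
    return loss / count

def nem_loss(x_feats, f_feats, labels, eps=1e-8):
    """Assumed ratio form: NEM5 output features should show smaller
    intra-class dispersion than its input features."""
    return center_loss(f_feats, labels) / (center_loss(x_feats, labels) + eps)

rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 5)     # n = 3 pedestrians, m = 5 features each
x_feats = rng.normal(size=(15, 8))      # input features, d = 8
# Simulate a good NEM5: pull each feature halfway toward its class center.
f_feats = x_feats.copy()
for j in range(3):
    cj = x_feats[labels == j].mean(axis=0)
    f_feats[labels == j] = 0.5 * (x_feats[labels == j] + cj)
loss = nem_loss(x_feats, f_feats, labels)
```

Halving every feature's offset from its class center quarters the squared dispersion, so the ratio loss comes out at 0.25, i.e. well below 1, signaling improved intra-class compactness.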
Further, in the step S11, the feature information carried by x ∈ R^(c×h×w) comprises a style and a shape; the style comprises the imaging style of the image and the clothing style of the pedestrian, and the shape is the contour shape of the pedestrian in the image.
Further, in the step S31, obtaining the channel attention weight Wc comprises the following steps:
S311, perform maximum pooling and average pooling along the channel dimension of the recovered feature x2 to obtain two 1 × h × w two-dimensional matrices, and multiply the recovered feature x2 element-wise with each of the two 1 × h × w matrices, so that the spatial information corresponding to each matrix is introduced into the channels of the recovered feature x2;
S312, respectively perform maximum pooling and average pooling along the spatial dimensions on the features with the introduced spatial information to generate two spatial aggregation masks F1 and F2, with F1 ∈ R^(c×1×1) and F2 ∈ R^(c×1×1); wherein R^(c×1×1) represents the real-number space of dimensions c × 1 × 1, i.e. F1 and F2 are both vectors in the real-number space of dimensions c × 1 × 1;
S313, perform a concat operation on the two spatial aggregation masks, and pass the concatenated result through convolution and sigmoid operations in turn to fuse them into the final channel attention weight Wc.
Further, the spatial information includes global information and saliency information on a space corresponding to the 1 × h × w two-dimensional matrix.
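Steps S311-S313, together with the filtering of S32, can be sketched in NumPy as follows. The learnable fusing convolution is modeled here as a 2c → c linear map, and all array shapes and weight values are illustrative assumptions rather than the patent's trained CAB layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cab_attention(x2, w_fuse, b_fuse):
    """Sketch of the CAB unit for one feature map x2 of shape (c, h, w).

    w_fuse: (c, 2c) and b_fuse: (c,) stand in for the fusing convolution.
    Returns the enhanced output feature f = (Wc + 1) * x2.
    """
    # S311: pool along the channel dimension -> two (1, h, w) spatial maps,
    # then inject each map's spatial information back into the channels.
    max_map = x2.max(axis=0, keepdims=True)
    avg_map = x2.mean(axis=0, keepdims=True)
    feat_max = x2 * max_map                 # salient spatial info per channel
    feat_avg = x2 * avg_map                 # global spatial info per channel

    # S312: pool along the spatial dims -> two (c,) aggregation masks F1, F2.
    f1 = feat_max.max(axis=(1, 2))
    f2 = feat_avg.mean(axis=(1, 2))

    # S313: concat, fuse (linear map standing in for the conv), sigmoid.
    wc = sigmoid(w_fuse @ np.concatenate([f1, f2]) + b_fuse)  # Wc, shape (c,)

    # S32: residual-style channel re-weighting, f = (Wc + 1) * x2.
    return (wc[:, None, None] + 1.0) * x2

rng = np.random.default_rng(2)
c, h, w = 4, 6, 5
x2 = rng.normal(size=(c, h, w))
w_fuse = rng.normal(size=(c, 2 * c)) * 0.1
b_fuse = np.zeros(c)
f = cab_attention(x2, w_fuse, b_fuse)
```

Since each element of Wc lies strictly in (0, 1), every channel of the output is scaled by a factor between 1 and 2: no channel is suppressed below its input, and the more important channels are amplified.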
The beneficial effects brought by this technical scheme:
1) Without using any target-domain data, the technical scheme effectively suppresses domain gaps and enhances discriminative pedestrian features, thereby strengthening the generalization ability of the recognition network model; by borrowing the idea of residual connection, instance normalization suppresses style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative; spatial information is fused into the channels through the attention unit CAB, and the feature weight of each channel is adaptively adjusted by modeling the dependencies among channels, effectively enhancing the pedestrian features.
2) The technical scheme introduces the NEM loss function as a constraint so that the recognition network model learns domain-invariant features, which reduces the intra-class feature distance and optimizes the feature distribution.
Drawings
The foregoing and following detailed description of the invention will be apparent when read in conjunction with the following drawings, in which:
FIG. 1 is a block diagram of the overall structure of a pedestrian re-identification model as described herein;
fig. 2 is a block diagram of the structure of the normalization enhancement module NEM;
FIG. 3 is a block diagram of the attention unit CAB;
fig. 4 is a comparison graph of the effect of different combinations of insertions of the normalized enhancement module NEM in the ResNet 50;
Detailed Description
The technical solutions for achieving the objects of the present invention are further illustrated by the following specific examples, and it should be noted that the technical solutions claimed in the present invention include, but are not limited to, the following examples.
Example 1
The embodiment discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement, and as a basic implementation scheme of the invention, the method comprises the steps of establishing an identification network model, normalizing image features, recovering image features and outputting the image features.
Establishing a recognition network model comprises establishing, as shown in FIG. 2, a normalization enhancement module NEM with an instance normalization unit IN (i.e., IN in FIG. 2), a residual weight training unit CMS (i.e., CMS in FIG. 2) and an attention unit CAB (i.e., CA in FIG. 2), taking the ResNet50 model as the backbone network, and inserting the normalization enhancement module NEM into the ResNet50 model to form the recognition network model.
Normalizing the image features, namely normalizing the style of the features by calculating the mean and variance in each channel of the image features, so that the style difference between different domains can be inhibited, and the method specifically comprises the following steps:
S11, extracting the feature information carried by the pedestrian image features x ∈ R^(c×h×w) based on the ResNet50 model, where x is the input feature of the normalization enhancement module NEM, c is the number of channels of the image features, h is the height of the image features, w is the width of the image features, R^(c×h×w) represents the real-number space of dimensions c × h × w, and x ∈ R^(c×h×w) means that the input feature x is a vector in the real-number space of dimensions c × h × w; the feature information carried by x ∈ R^(c×h×w) comprises a style and a shape, the style comprising the imaging style of the image and the clothing style of the pedestrian, and the shape being the contour shape of the pedestrian in the image;
S12, using the instance normalization unit IN, obtain the mean μ(x) and standard deviation σ(x) of the input feature x ∈ R^(c×h×w) within each channel, and calculate the normalized feature x1 from them as follows:

x1 = γ × (x − μ(x)) / σ(x) + β

wherein μ(x) and σ(x) represent the mean and standard deviation calculated over the spatial dimensions (h × w) of the image features; γ and β are both learnable parameter vectors, and γ ∈ R^c, β ∈ R^c means that γ and β are both vectors in the c-dimensional real-number space. The elements of γ and β are initialized to 1 and 0 respectively and are then updated automatically during training; specifically, γ is initialized as a vector of ones and β as a vector of zeros, and their values change automatically with the back-propagated gradients during training. Their role is to ensure that the originally learned features are preserved after each item of data is normalized, while the normalization operation is completed and training is accelerated.
Although image feature normalization helps reduce the inter-domain gap caused by style variation, if the style itself contains discriminative information for pedestrian re-identification, eliminating the style variation may also cause a serious loss of information. For example, the clothing of a pedestrian is important re-identification discrimination information, and the texture of the clothing fabric clearly belongs to the style; when the style is suppressed, the discriminability of the feature is weakened. Therefore, by borrowing the idea of residual connection, image feature normalization can suppress style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative. This is specifically realized through image feature recovery, which comprises the following steps:
S21, the residual weight training unit CMS is utilized to learn a residual weight Wr from the normalized feature x1, namely:
Wr=sigmoid(mean(conv(x1)))
where conv(·) represents convolution, mean(·) represents the global mean, and sigmoid(·) represents the activation function; that is, the normalized feature x1 is first passed through a convolution layer with a kernel size of 3 × 3 × c, a stride of 2 and one output channel, compressing the information contained in x1 along the spatial and channel dimensions; the mean within each channel is then calculated to further compress the spatial information; and finally, after sigmoid mapping, a residual weight Wr between 0 and 1 is obtained, i.e. the residual weight Wr ∈ R^1.
S22, based on the residual weight Wr, fuse the input feature x and the normalized feature x1 to recover the discrimination information lost from the image features through style normalization, with the fusion formula:

x2 = Wr × x1 + (1 − Wr) × x

wherein x2 is the restored image feature, named the recovered feature, and x2 ∈ R^(c×h×w) means that the recovered feature x2 is a vector in the real-number space of dimensions c × h × w.
Since, in the feature extraction process of the ResNet50 modules (referring to the feature extraction process as a whole, not a single stage), the spatial information is gradually compressed and the pedestrian-related information gradually shifts to the channel dimension, the pedestrian features need to be enhanced by means of channel attention; that is, the image feature output comprises the following steps:
S31, using the attention unit CAB, explore the correlations between different channels of the recovered feature x2 so that attention is focused on the most meaningful parts of the pedestrian image, and adaptively extract the channel attention weight Wc, namely:
Wc=ca(x2)
where ca(·) is the attention unit CAB, and the channel attention weight Wc measures the importance of the information in each channel of x2;
S32, filter the recovered feature x2 with the channel attention weight Wc to enhance the characterization ability of the pedestrian features, namely:
f=(Wc+1)×x2
wherein f is the output characteristic of the normalization and enhancement module NEM.
Without using any target-domain data, this technical scheme effectively suppresses domain gaps and enhances discriminative pedestrian features, thereby strengthening the generalization ability of the recognition network model; by borrowing the idea of residual connection, instance normalization suppresses style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative; spatial information is fused into the channels through the attention unit CAB, and the feature weight of each channel is adaptively adjusted by modeling the dependencies among channels, effectively enhancing the pedestrian features.
Example 2
The embodiment discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement, as a preferred implementation of the invention: in embodiment 1, the ResNet50 model comprises a Res1 unit, Res2 unit, Res3 unit, Res4 unit, Res5 unit and Head unit connected in sequence, and a normalization enhancement module NEM is inserted after each Res unit or after some of the Res units of the ResNet50 model; the normalization enhancement module NEM can enhance the features at the relevant stages, giving a good overall effect. In the ResNet50 model, the features produced by the Res1 unit are too shallow and contain essentially no semantic information such as style; inserting a normalization enhancement module NEM after the Res1 unit would hardly enhance the features while further increasing the complexity of the model, so no NEM is inserted after the Res1 unit when designing the recognition network model.
Specifically, the effect is best when a NEM is inserted after each of the Res2-Res5 units, which can be verified experimentally. As shown in fig. 4: NEM23 indicates inserting the normalization enhancement module NEM at the outputs of the Res2 and Res3 units, NEM234 at the outputs of the Res2, Res3 and Res4 units, NEM2345 at the outputs of the Res2, Res3, Res4 and Res5 units, NEM345 at the outputs of the Res3, Res4 and Res5 units, and NEM45 at the outputs of the Res4 and Res5 units; in addition, M, D and MS on the abscissa represent the three common pedestrian re-identification data sets Market1501, DukeMTMC-reID and MSMT17 respectively; M-D means training a model on Market1501 and then testing the trained model for pedestrian re-identification on DukeMTMC-reID, and likewise for D-M, MS-M and MS-D; the ordinate represents the mAP accuracy. As can be seen from fig. 4, NEM2345 has the best effect and can effectively enhance the cross-domain pedestrian re-identification performance of the model, so normalization enhancement modules NEM are inserted at the output ends of the Res2, Res3, Res4 and Res5 units respectively, namely NEM2, NEM3, NEM4 and NEM5 as shown in fig. 1.
Thus, the recognition network model in this technical scheme works as follows: the Res2 unit of the ResNet50 model extracts image features from the original image, which undergo image feature normalization, image feature recovery and image feature output through NEM2; the result is then sent to the Res3 unit of the ResNet50 model to extract deeper pedestrian features, on which NEM3 again performs image feature normalization, image feature recovery and image feature output; and so on through the Res4 unit with NEM4 and the Res5 unit with NEM5, until the Head unit of the ResNet50 model produces the final output.
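The stage-by-stage workflow just described can be sketched as plain function composition. Every function here is a hypothetical stand-in (the real Res units, NEM modules and Head are trained network blocks); the sketch only shows the routing: Res1 without a NEM, then Res2-Res5 each followed by its NEM, then the Head.

```python
def forward(image, res_stages, nem_modules, head):
    """Forward pass of the NEM2345 configuration.

    res_stages: [res1, res2, res3, res4, res5]; nem_modules: {2: nem2, ..., 5: nem5};
    head: the Head unit producing the final embedding.
    """
    x = res_stages[0](image)              # Res1: no NEM (features too shallow)
    for idx, stage in enumerate(res_stages[1:], start=2):
        x = stage(x)                      # Res2 .. Res5
        x = nem_modules[idx](x)           # NEM2 .. NEM5: normalize / recover / enhance
    return head(x)                        # Head unit

# Toy stand-ins so the routing is observable: each "Res" adds 1, each "NEM" doubles.
res = [lambda x: x + 1] * 5
nems = {i: (lambda x: x * 2) for i in range(2, 6)}
out = forward(0, res, nems, lambda x: x)
```

With these toy blocks the value threads through Res1 (0→1), then four Res+NEM pairs (1→4→10→22→46), confirming that every NEM sits directly after its Res stage.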
Example 3
This example discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement, as a preferred implementation of the invention: in example 2, in order to give the features better clustering characteristics, the method further includes introducing a NEM loss function into the normalization enhancement module NEM at the output end of the Res5 unit, so as to constrain that module (i.e., NEM5); the expectation is that the features extracted by NEM5 have better domain invariance and discriminability. Therefore, the image feature output of the normalization enhancement module NEM further comprises the following steps:
S33, respectively calculate the center loss Cx of the input feature x and the center loss Cf of the output feature f, in order to measure the intra-class dispersion of the input feature x and the output feature f in the feature space, with the calculation formulas:

Cx = (1 / (n·m)) · Σ_{j=1..n} Σ_{i=1..m} ‖x_ji − c_xj‖²
Cf = (1 / (n·m)) · Σ_{j=1..n} Σ_{i=1..m} ‖f_ji − c_fj‖²

wherein c_xj ∈ R^d represents the class center of the j-th pedestrian in the input feature x; c_fj ∈ R^d represents the class center of the j-th pedestrian in the output feature f; n represents the total number of pedestrians in the data set, m represents the total number of features of the j-th pedestrian, x_ji represents the i-th feature of the j-th pedestrian in the input feature x, d represents the dimension of each feature, and R^d represents the d-dimensional real-number space, i.e. c_xj and c_fj are both vectors in the d-dimensional real-number space;
S34, based on the center losses Cf and Cx, establish the NEM loss function:

L_NEM = Cf / Cx

wherein L_NEM is the loss value calculated from the input feature x and the output feature f of NEM5.
According to the technical scheme, NEM loss function constraint is introduced to identify the invariant features of the network model learning domain, so that the distance in the feature class is reduced, and the feature distribution is optimized.
Example 4
This example discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement, as a preferred implementation of the invention: in step S31 of example 1, as shown in fig. 3, obtaining the channel attention weight Wc comprises the following steps:
S311, perform maximum pooling and average pooling along the channel dimension of the recovered feature x2 to obtain two 1 × h × w two-dimensional matrices, and multiply the recovered feature x2 element-wise with each of the two 1 × h × w matrices, so that the spatial information corresponding to each matrix is introduced into the channels of the recovered feature x2;
S312, in order to calculate the channel attention efficiently, the spatial dimensions of the relevant features need to be compressed. Average pooling is generally used to aggregate the spatial information, attending more to global information; however, maximum pooling can capture the distinctive features of the pedestrian and thus infer more detailed information on the channels. Therefore, the features with the introduced spatial information are respectively subjected to maximum pooling and average pooling along the spatial dimensions to generate two spatial aggregation masks F1 and F2, with F1 ∈ R^(c×1×1) and F2 ∈ R^(c×1×1); the two masks attend respectively to the global information and to the distinctive pedestrian information in the feature map. Here R^(c×1×1) represents the real-number space of dimensions c × 1 × 1, i.e. F1 and F2 are both vectors in the real-number space of dimensions c × 1 × 1;
S313, perform a concat (vector concatenation) operation on the two spatial aggregation masks, and pass the concatenated result through convolution and sigmoid operations in turn to fuse them into the final channel attention weight Wc.

Claims (6)

1. The cross-domain pedestrian re-identification method based on normalization and feature enhancement is characterized by comprising the steps of establishing an identification network model, image feature normalization, image feature recovery and image feature output;
the establishing of the recognition network model comprises establishing a normalization enhancement module NEM with an instance normalization unit IN, a residual weight training unit CMS and an attention unit CAB, taking the ResNet50 model as the backbone network, and inserting the normalization enhancement module NEM into the ResNet50 model to form the recognition network model;
the image feature normalization comprises the following steps:
s11, extracting, based on the ResNet50 model, the feature information carried by the pedestrian image feature x ∈ R^(c×h×w), where x is the input feature of the normalization enhancement module NEM, c is the number of channels of the image feature, h is the height of the image feature, w is the width of the image feature, R^(c×h×w) represents a real number domain space of dimensions c×h×w, and x ∈ R^(c×h×w) indicates that the input feature x is a vector in a real number domain space of dimensions c×h×w;
s12, obtaining the mean μ(x) and standard deviation σ(x) of the input feature x ∈ R^(c×h×w) in each channel using the instance normalization unit IN, and calculating the normalized feature x_1 based on the obtained mean μ(x) and standard deviation σ(x); the calculation formula is as follows:
x_1 = γ × ((x − μ(x)) / σ(x)) + β
where γ and β are both learnable parameter vectors, and γ ∈ R^c, β ∈ R^c indicate that γ and β are both vectors in a c-dimensional real number domain space; the elements of γ and β are initialized to 1 and 0 respectively, and are then updated automatically during training;
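Steps s11–s12 can be sketched in NumPy as follows; the small constant eps added inside the square root is an implementation detail assumed for numerical stability and is not part of the formula in the claim:

```python
import numpy as np

def instance_norm(x, gamma=None, beta=None, eps=1e-5):
    """Instance normalization of a feature map x with shape (c, h, w):
    each channel is normalized with its own spatial mean and standard
    deviation, then rescaled by the learnable per-channel parameters
    gamma and beta (initialized to 1 and 0, as in step s12)."""
    c = x.shape[0]
    gamma = np.ones(c) if gamma is None else gamma
    beta = np.zeros(c) if beta is None else beta
    mu = x.mean(axis=(1, 2), keepdims=True)   # per-channel mean mu(x)
    var = x.var(axis=(1, 2), keepdims=True)   # per-channel variance
    x1 = (x - mu) / np.sqrt(var + eps)        # style-normalized feature
    return gamma[:, None, None] * x1 + beta[:, None, None]

x = np.random.randn(4, 8, 8)
x1 = instance_norm(x)
# With gamma = 1 and beta = 0, every channel of x1 has (near-)zero mean and
# (near-)unit variance, i.e. the per-channel image style is removed.
```

Because the statistics are computed per image and per channel rather than per batch, this normalization suppresses instance-specific style (illumination, color cast) that varies across camera domains.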
the image feature recovery comprises the following steps:
s21, using the residual weight training unit CMS, learning a residual weight W_r from the normalized feature x_1, namely:
W_r = sigmoid(mean(conv(x_1)))
wherein conv(·) represents convolution, mean(·) represents the global mean, and sigmoid(·) represents the activation function;
S22, based on the residual weight W_r, fusing the input feature x and the normalized feature x_1 to recover the discriminative information lost from the image feature due to style normalization, wherein the fusion formula is as follows:
x_2 = W_r × x_1 + (1 − W_r) × x
wherein x_2 is the restored image feature, named the recovered feature, and x_2 ∈ R^(c×h×w) indicates that the recovered feature x_2 is a vector in a real number domain space of dimensions c×h×w;
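Steps s21–s22 can be sketched as follows. The kernel size and output channel count of conv(·) are not fixed by the claim, so a 1×1 convolution with unchanged channel count, implemented as a per-pixel matrix product, is assumed here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def restore_feature(x, x1, conv_weight):
    """CMS-style restoration of steps s21-s22: learn a scalar residual
    weight W_r = sigmoid(mean(conv(x1))) and blend the normalized feature
    x1 back with the original input x as x2 = W_r*x1 + (1 - W_r)*x."""
    # 1x1 convolution: a (c_out, c_in) weight applied at every spatial position
    conv_out = np.einsum('oc,chw->ohw', conv_weight, x1)
    w_r = sigmoid(conv_out.mean())        # global mean -> scalar gate in (0, 1)
    x2 = w_r * x1 + (1.0 - w_r) * x       # recovered feature
    return x2, w_r

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))    # input feature of the NEM
x1 = rng.normal(size=(4, 8, 8))   # normalized feature from the IN unit
x2, w_r = restore_feature(x, x1, rng.normal(size=(4, 4)))
# x2 interpolates element-wise between x and x1 with the learned gate w_r.
```

The sigmoid keeps the gate strictly between 0 and 1, so the recovered feature is always a convex combination of the style-normalized feature and the original input, which is how the unit can reintroduce discriminative information that instance normalization removed.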
the image feature output comprises the steps of:
s31, using the attention unit CAB, exploring the correlations between the different channels of the recovered feature x_2 and adaptively extracting the channel attention weight W_c, namely:
W_c = ca(x_2)
where ca(·) denotes the attention unit CAB, and the channel attention weight W_c measures the importance of the information in each channel of x_2;
s32, filtering the recovered feature x_2 with the channel attention weight W_c to enhance the representational capability of the pedestrian features, namely:
f = (W_c + 1) × x_2
wherein f is the output feature of the normalization enhancement module NEM.
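The residual form of the weighting in step s32 can be checked numerically: since each channel attention weight lies in (0, 1), the factor W_c + 1 lies in (1, 2), so attended channels are amplified while no channel is suppressed toward zero, which plain weighting W_c × x_2 would do. A small sketch with randomly drawn weights standing in for the CAB output:

```python
import numpy as np

rng = np.random.default_rng(2)
x2 = rng.normal(size=(4, 8, 8))        # recovered feature
w_c = rng.uniform(size=(4, 1, 1))      # stand-in channel attention weights in (0, 1)

f = (w_c + 1.0) * x2                   # residual weighting of step s32
g = w_c * x2                           # plain weighting, for comparison

# Residual weighting never shrinks a channel: |f| >= |x2| element-wise,
# whereas plain weighting always shrinks: |g| <= |x2| element-wise.
```

This design choice preserves the full recovered feature and treats attention purely as an enhancement signal.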
2. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 1, wherein: the ResNet50 model comprises a Res1 unit, a Res2 unit, a Res3 unit, a Res4 unit, a Res5 unit and a Head unit which are connected in sequence, and a normalization enhancement module NEM is inserted at the output end of each of the Res2 unit, the Res3 unit, the Res4 unit and the Res5 unit.
3. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 2, wherein: the method further comprises introducing an NEM loss function in the normalization enhancement module NEM at the output end of said Res5 unit, i.e. the image feature output of that normalization enhancement module NEM further comprises the steps of:
s33, respectively calculating the center loss C_x of the input feature x and the center loss C_f of the output feature f, so as to measure the intra-class dispersion of the input feature x and the output feature f in the feature space; the calculation formulas are as follows:
C_x = Σ_{j=1..n} Σ_{i=1..m} ||x_ji − c_xj||₂²

C_f = Σ_{j=1..n} Σ_{i=1..m} ||f_ji − c_fj||₂²
wherein c_xj ∈ R^d represents the class center of the j-th pedestrian in the input feature x; c_fj ∈ R^d represents the class center of the j-th pedestrian in the output feature f; n represents the total number of pedestrians in the data set, m represents the total number of features of the j-th pedestrian, x_ji represents the i-th feature of the j-th pedestrian in the input feature x, d represents the dimension of each feature, and R^d represents a d-dimensional real number domain space, i.e. c_xj and c_fj are both vectors in a d-dimensional real number domain space;
s34, based on the center losses C_f and C_x, establishing the NEM loss function, wherein the NEM loss function is as follows:
Figure FDA0003124378270000023
wherein L_NEM is the loss value calculated from the input feature x and the output feature f of the NEM.
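The per-identity dispersion measure of step s33 can be sketched as follows; taking each class center as the mean feature of that pedestrian is an assumption, since the claim only defines c_xj and c_fj as class centers without fixing how they are maintained during training:

```python
import numpy as np

def center_loss(feats, labels):
    """Intra-class dispersion as in step s33: sum of squared L2 distances of
    each feature to its class (pedestrian identity) center, with the center
    assumed to be the per-identity mean feature."""
    loss = 0.0
    for j in np.unique(labels):
        class_feats = feats[labels == j]
        center = class_feats.mean(axis=0)           # class center of pedestrian j
        loss += np.sum((class_feats - center) ** 2)  # dispersion around the center
    return loss

feats = np.array([[0.0, 0.0], [2.0, 0.0],   # two features of pedestrian 0
                  [1.0, 1.0], [1.0, 3.0]])  # two features of pedestrian 1
labels = np.array([0, 0, 1, 1])
# Pedestrian 0: center (1, 0), dispersion 1 + 1 = 2
# Pedestrian 1: center (1, 2), dispersion 1 + 1 = 2
print(center_loss(feats, labels))  # -> 4.0
```

Computing this quantity for both the input feature x and the output feature f lets the NEM loss compare how tightly each pedestrian's features cluster before and after the module.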
4. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 1, wherein: in the step S11, the feature information carried by x ∈ R^(c×h×w) comprises a style and a shape; the style is the imaging style of the image, and the shape is the contour shape of the pedestrian in the image.
5. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 1, wherein in the step S31, obtaining the channel attention weight W_c comprises the following steps:
s311, performing maximum pooling and average pooling on the recovered feature x_2 along the channel dimension to obtain two 1×h×w two-dimensional matrices, and multiplying the recovered feature x_2 element by element with each of the two 1×h×w two-dimensional matrices, so that the spatial information corresponding to each of the two 1×h×w two-dimensional matrices is introduced into the channels of the recovered feature x_2;
s312, respectively performing maximum pooling and average pooling along the spatial dimensions on the features with the introduced spatial information to generate two spatial aggregation masks F_1 and F_2, with F_1 ∈ R^(c×1×1) and F_2 ∈ R^(c×1×1); wherein R^(c×1×1) represents a real number domain space of dimensions c×1×1, and F_1 ∈ R^(c×1×1), F_2 ∈ R^(c×1×1) indicate that F_1 and F_2 are both vectors in a real number domain space of dimensions c×1×1;
s313, performing a concat operation on the two spatial aggregation masks, and sequentially applying convolution and sigmoid operations to the concatenated result to fuse it into the final channel attention weight W_c.
6. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 5, wherein the spatial information includes global information and saliency information on the space corresponding to each 1×h×w two-dimensional matrix.
CN202110689585.2A | Priority date 2021-06-21 | Filing date 2021-06-21 | Cross-domain person re-identification method based on normalization and feature enhancement | Active | Granted publication: CN113392786B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110689585.2A (CN113392786B) | 2021-06-21 | 2021-06-21 | Cross-domain person re-identification method based on normalization and feature enhancement


Publications (2)

Publication Number | Publication Date
CN113392786A | 2021-09-14
CN113392786B | 2022-04-12

Family

ID=77623278

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110689585.2A (Active, CN113392786B) | 2021-06-21 | 2021-06-21 | Cross-domain person re-identification method based on normalization and feature enhancement

Country Status (1)

Country | Link
CN (1) | CN113392786B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115410223A (en)* | 2022-08-17 | 2022-11-29 | Hebei University of Technology | A Domain Generalized Person Re-Identification Method Based on Invariant Feature Extraction
CN117994822A (en)* | 2024-04-07 | 2024-05-07 | Nanjing University of Information Science and Technology | Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106815838A (en)* | 2017-01-22 | 2017-06-09 | Jinko Power Co., Ltd. | A kind of method and system of the detection of photovoltaic module hot spot
US20200285896A1 (en)* | 2019-03-09 | 2020-09-10 | Tongji University | Method for person re-identification based on deep model with multi-loss fusion training strategy
CN111739036A (en)* | 2020-07-22 | 2020-10-02 | Jilin University | A hyperspectral-based detection method for document handwriting forgery
CN111832514A (en)* | 2020-07-21 | 2020-10-27 | Inner Mongolia University of Science and Technology | Method and device for unsupervised pedestrian re-identification based on soft multi-label
CN112069920A (en)* | 2020-08-18 | 2020-12-11 | Wuhan University | Cross-domain pedestrian re-identification method based on attribute feature-driven clustering
CN112200764A (en)* | 2020-09-02 | 2021-01-08 | Chongqing University of Posts and Telecommunications | Photovoltaic power station hot spot detection and positioning method based on thermal infrared image


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
XIAN ZHONG et al.: "Grayscale Enhancement Colorization Network for Visible-infrared Person Re-identification", IEEE Transactions on Circuits and Systems for Video Technology (Early Access)*
YE LI et al.: "End-to-end Network Embedding Unsupervised Key Frame Extraction for Video-based Person Re-identification", 11th International Conference on Information Science and Technology (ICIST)*
YE LI et al.: "Triplet online instance matching loss for person re-identification", Neurocomputing*
WU Ruizhi et al.: "Location semantic inference based on graph convolutional neural networks", Journal of University of Electronic Science and Technology of China*
ZHANG Weixin et al.: "Research on feature-weighted person re-identification based on residual networks", Microelectronics & Computer*


Also Published As

Publication numberPublication date
CN113392786B (en) | 2022-04-12

Similar Documents

Publication | Title
CN113112416B (en) | A semantically guided face image restoration method
CN110728209A (en) | Gesture recognition method and device, electronic equipment and storage medium
CN114758288A (en) | A kind of distribution network engineering safety management and control detection method and device
CN110533024B (en) | Double-quadratic pooling fine-grained image classification method based on multi-scale ROI (region of interest) features
CN112434655A (en) | Gait recognition method based on adaptive confidence map convolution network
CN118230175B (en) | Real estate mapping data processing method and system based on artificial intelligence
CN110175248B (en) | A face image retrieval method and device based on deep learning and hash coding
CN110633624B (en) | A machine vision human abnormal behavior recognition method based on multi-feature fusion
CN112232184A (en) | A multi-angle face recognition method based on deep learning and spatial transformation network
CN113052017B (en) | Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain self-adaptive learning
CN105138998A (en) | Method and system for re-identifying pedestrian based on view angle self-adaptive subspace learning algorithm
CN112990316A (en) | Hyperspectral remote sensing image classification method and system based on multi-saliency feature fusion
CN113392724B (en) | Remote sensing scene classification method based on multi-task learning
CN113420731A (en) | Model training method, electronic device and computer-readable storage medium
CN113326748B (en) | A Neural Network Behavior Recognition Method Using Multidimensional Correlation Attention Model
CN114972904B (en) | A zero-shot knowledge distillation method and system based on adversarial triplet loss
CN114677558B (en) | An object detection method based on histogram of oriented gradients and improved capsule network
CN111311702A (en) | Image generation and identification module and method based on BlockGAN
CN111401149B (en) | Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN118097360B (en) | Image fusion method based on salient feature extraction and residual connection
CN113392786A (en) | Cross-domain pedestrian re-identification method based on normalization and feature enhancement
Hua et al. | Polarimetric SAR image classification based on ensemble dual-branch CNN and superpixel algorithm
CN114724245A (en) | CSI-based incremental learning human body action identification method
CN111754637A (en) | A Large-scale 3D Face Synthesis System with Sample Similarity Suppression
CN116189281B (en) | End-to-end human behavior classification method and system based on spatiotemporal adaptive fusion

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
