Disclosure of Invention
The embodiment of the invention aims to provide a target tracking method and a target tracking system fusing salient information and multi-granularity context features, so as to avoid losing the target when occlusion, deformation, rotation and the like occur during tracking.
The specific technical scheme is as follows:
In a first aspect of the present invention, there is provided a target tracking system that fuses salient information and multi-granularity context features, including a twin sub-neural network, a multi-branch fusion module, a global context module, an attention map module, a depth cross-correlation module, and a target position determination module, wherein:
The twin sub-neural network is used for acquiring a template picture and a search picture, extracting a plurality of features of the template picture as template branch features and extracting a plurality of features of the search picture as search branch features, wherein the template picture comprises appearance information of a target to be tracked;
the multi-branch fusion module is used for obtaining template features of the template picture according to the template branch features;
the global context module is used for obtaining search features of the search picture according to the search branch features;
the attention map module is used for obtaining an attention map of the search features and an attention map of the template features according to the search features and the template features;
the depth cross-correlation module is used for performing depth cross-correlation on the attention map of the template features and the attention map of the search features to obtain a score map;
and the target position determination module is used for classifying and regressing the score map to determine the position of the target in the search picture.
In a second aspect of the present invention, there is provided a target tracking method incorporating salient information and multi-granularity contextual features, the method being applied to a twin neural network, the method comprising:
the method comprises the steps of obtaining a template picture and a search picture, extracting a plurality of features of the template picture as template branch features, and extracting a plurality of features of the search picture as search branch features, wherein the template picture contains appearance information of a target to be tracked;
obtaining template features of the template picture according to the template branch features;
obtaining search features of the search picture according to the search branch features;
obtaining an attention map of the search features and an attention map of the template features according to the search features and the template features;
performing depth cross-correlation on the attention map of the template features and the attention map of the search features to obtain a score map;
and classifying and regressing the score map, and determining the position of the target in the search picture.
Optionally, the twin neural network includes a twin sub-neural network, and obtaining a template picture and a search picture, extracting a plurality of features of the template picture as template branch features, and extracting a plurality of features of the search picture as search branch features includes:
obtaining a template picture and a search picture through the twin sub-neural network, wherein the size of the search picture is larger than that of the template picture;
inputting the template picture into the ResNet network of the twin sub-neural network to obtain five convolution features of the template picture, denoted ft1, ft2, ft3, ft4 and ft5 respectively, and taking these as the template branch features;
inputting the search picture into the ResNet network of the twin sub-neural network to obtain five convolution features of the search picture, denoted fs1, fs2, fs3, fs4 and fs5 respectively, and taking these as the search branch features.
Optionally, the twin neural network includes a multi-branch fusion module, and obtaining the template features of the template picture according to the template branch features includes:
performing channel compression on the features ft3, ft4 and ft5 of the template branch features to obtain features fn3, fn4 and fn5;
passing the feature ft2 of the template branch features through the multi-branch fusion module to obtain a feature fn2 containing different receptive fields;
and adding fn2 to fn3, fn4 and fn5 respectively and performing a center-cropping operation to obtain the template features Ft3, Ft4 and Ft5 of the template picture.
Optionally, the twin neural network includes a global context module, and obtaining the search features of the search picture according to the search branch features includes:
performing channel compression on the features fs3, fs4 and fs5 of the search branch features to obtain features fm3, fm4 and fm5;
passing the features fm3, fm4 and fm5 through the global context module to obtain the search features Fs3, Fs4 and Fs5.
Optionally, the twin neural network includes an attention map module, and obtaining the attention map of the search features and the attention map of the template features according to the search features and the template features includes:
inputting the template features Ft3, Ft4 and Ft5 and the search features Fs3, Fs4 and Fs5 into the self-attention module and the cross-attention module of the attention map module respectively, to obtain the attention maps of the template features and the attention maps of the search features.
Optionally, the twin neural network includes a depth cross-correlation module, and performing depth cross-correlation on the attention map of the template features and the attention map of the search features to obtain a score map includes:
performing, through the depth cross-correlation module, a depth cross-correlation operation on the attention maps of the template features and the attention maps of the search features respectively, to obtain score maps φ3, φ4 and φ5.
Optionally, the twin neural network includes a target position determination module, and the classifying and regressing of the score map to determine the position of the target in the search picture includes:
inputting the score maps φ3, φ4 and φ5 into a classification branch and a regression branch of the target position determination module respectively;
passing the score maps φ3, φ4 and φ5 through the classification branch, where each is convolved with a kernel size of 1×1 and a stride of 1 to obtain a feature with 2k channels, and multiplying these features by preset weight values respectively to obtain classification features, wherein the classification features comprise foreground and background features of the target in the search picture;
passing the score maps φ3, φ4 and φ5 through the regression branch, where each is convolved with a kernel size of 1×1 and a stride of 1 to obtain a feature with 4k channels, and multiplying these features by preset weight values respectively to obtain regression features, wherein the regression features comprise features of the target;
and determining the position of the target in the search picture according to the classification features and the regression features.
In yet another aspect of the embodiment of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
A memory for storing a computer program;
and the processor is used for implementing any of the above target tracking methods fusing salient information and multi-granularity context features when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described target tracking methods that incorporate salient information and multi-granularity context features.
In yet another aspect of the invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above target tracking methods fusing salient information and multi-granularity context features.
The embodiment of the invention provides a target tracking system fusing salient information and multi-granularity context features, which comprises a twin sub-neural network, a multi-branch fusion module, a global context module, an attention map module, a depth cross-correlation module and a target position determination module. The system can acquire a template picture and a search picture through the twin sub-neural network, extract a plurality of features of the template picture as template branch features and a plurality of features of the search picture as search branch features, obtain the template features of the template picture from the template branch features through the multi-branch fusion module, obtain the search features of the search picture from the search branch features through the global context module, obtain the attention maps of the search features and the attention maps of the template features from the search features and the template features through the attention map module, perform depth cross-correlation on the attention maps of the template features and the attention maps of the search features through the depth cross-correlation module to obtain a score map, and classify and regress the score map through the target position determination module to determine the position of the target in the search picture. The system enhances the accuracy of template feature extraction through the multi-branch fusion module and enriches the relation between the search features and the template features through the attention map module, thereby avoiding the situation in which the target may be lost when occlusion, deformation, rotation and the like occur during tracking.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, target tracking algorithms based on twin neural networks suffer from insufficient extraction of template features and a lack of connection between video frames during tracking. When occlusion, deformation, rotation and the like occur while the target is being tracked, the target may be lost.
In order to solve the above problems, the embodiment of the invention provides a target tracking system integrating significant information and multi-granularity context features. The target tracking system provided by the embodiment of the invention can comprise:
The twin sub-neural network is used for acquiring a template picture and a search picture, extracting a plurality of features of the template picture as template branch features and extracting a plurality of features of the search picture as search branch features;
the multi-branch fusion module is used for obtaining template features of the template picture according to the template branch features;
the global context module is used for obtaining search features of the search picture according to the search branch features;
the attention map module is used for obtaining an attention map of the search features and an attention map of the template features according to the search features and the template features;
the depth cross-correlation module is used for performing depth cross-correlation on the attention map of the template features and the attention map of the search features to obtain a score map;
and the target position determination module is used for classifying and regressing the obtained score map to determine the position of the target in the search picture.
The target tracking system provided by the embodiment of the invention can enhance the accuracy of template feature extraction through the multi-branch fusion module and enrich the relation between the search features and the template features through the attention map module, thereby avoiding the situation in which the target may be lost when occlusion, deformation, rotation and the like occur during tracking.
Referring to fig. 1, fig. 1 is a flowchart of a target tracking method integrating salient information and multi-granularity context features, which is applied to a twin neural network and provided in an embodiment of the present invention, the method may include the following steps:
s101, acquiring a template picture and a search picture, extracting a plurality of features of the template picture as template branch features, and extracting a plurality of features of the search picture as search branch features.
S102, obtaining template characteristics of the template picture according to the template branch characteristics.
And S103, obtaining the search feature of the search picture according to the search branch feature.
S104, obtaining attention maps of the search features and attention maps of the template features according to the search features and the template features.
S105, performing depth cross-correlation on the attention maps of the template features and the attention maps of the search features to obtain a score map.
S106, classifying and regressing the score map, and determining the position of the target in the search picture.
The template picture comprises appearance information of a target to be tracked, and the search picture is a picture comprising the target.
According to the target tracking method fusing salient information and multi-granularity context features provided by the embodiment of the invention, the accuracy of template feature extraction can be enhanced through the multi-branch fusion module, and the relation between the search features and the template features is enriched through the attention map module, thereby avoiding the situation in which the target may be lost when occlusion, deformation, rotation and the like occur during tracking.
Referring to fig. 2, fig. 2 is a flow chart of a target tracking method integrating salient information and multi-granularity context features according to an embodiment of the present invention.
In one implementation, the size of the input template picture (template image) may be 127×127×3, i.e. a width and height of 127×127 pixels with 3 channels, and the size of the input search picture (search image) may be 255×255×3, i.e. a width and height of 255×255 pixels with 3 channels. The template picture and the search picture are input into the template branch and the search branch respectively, and the two branches are ResNet networks sharing parameters. The five convolution features output by the ResNet of the template branch and the search branch are ft1, ft2, ft3, ft4, ft5 and fs1, fs2, fs3, fs4, fs5 respectively. The sizes of ft1, ft2, ft3, ft4 and ft5 are 61×61×64, 31×31×256, 15×15×512, 15×15×1024 and 15×15×2048, and the sizes of fs1, fs2, fs3, fs4 and fs5 are 125×125×64, 61×61×256, 31×31×512, 31×31×1024 and 31×31×2048, respectively.
In one embodiment, the twin neural network comprises a twin sub-neural network, and step S101 includes:
Step one, obtaining a template picture and a search picture through a twin sub-neural network.
Step two, inputting the template picture into the ResNet network of the twin sub-neural network to obtain five convolution features of the template picture, denoted ft1, ft2, ft3, ft4 and ft5 respectively, as the template branch features.
Step three, inputting the search picture into the ResNet network of the twin sub-neural network to obtain five convolution features of the search picture, denoted fs1, fs2, fs3, fs4 and fs5 respectively, as the search branch features.
The size of the search picture is larger than the size of the template picture.
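For illustration, a minimal PyTorch-style sketch of the shared-parameter twin feature extraction is given below; the ResNet-50 backbone, the stage split, and the stock torchvision strides are assumptions introduced for the sketch (Siamese trackers typically modify strides, padding and dilation, so the spatial sizes quoted above differ from those of an unmodified ResNet-50).

```python
import torch
import torchvision

class SiameseBackbone(torch.nn.Module):
    """Shared-parameter ResNet backbone for the template and search branches.
    Assumes a stock torchvision ResNet-50; the embodiment's exact strides/dilations may differ."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # Split the network into five convolution stages whose outputs
        # correspond to f*1 ... f*5 in the text.
        self.stage1 = torch.nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)
        self.stage2 = torch.nn.Sequential(resnet.maxpool, resnet.layer1)
        self.stage3 = resnet.layer2
        self.stage4 = resnet.layer3
        self.stage5 = resnet.layer4

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        f5 = self.stage5(f4)
        return f1, f2, f3, f4, f5

backbone = SiameseBackbone()            # one network, shared by both branches
template = torch.randn(1, 3, 127, 127)  # template picture
search = torch.randn(1, 3, 255, 255)    # search picture
ft = backbone(template)                 # template branch features ft1..ft5
fs = backbone(search)                   # search branch features fs1..fs5
```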
In one embodiment, the twin neural network comprises a multi-branch fusion module, step S102 comprising:
Step one, performing channel compression on the features ft3, ft4 and ft5 of the template branch features to obtain features fn3, fn4 and fn5.
Step two, passing the feature ft2 of the template branch features through the multi-branch fusion module to obtain a feature fn2 containing different receptive fields.
Step three, adding fn2 to fn3, fn4 and fn5 respectively and performing a center-cropping operation to obtain the template features Ft3, Ft4 and Ft5 of the template picture.
In one implementation, the features ft3, ft4 and ft5 of the template branch features are each passed through a 1×1 convolution that compresses the number of channels to 256, yielding features fn3, fn4 and fn5, each of size 15×15×256. The template features Ft3, Ft4 and Ft5 of the template picture each have a size of 7×7×256.
Referring to fig. 3, fig. 3 is a block diagram of a multi-branch fusion module according to an embodiment of the present invention.
The multi-branch fusion module operates as follows. First, the feature ft2 is input into a two-step convolution (two stacked convolutions with a kernel size of 3×3 and a stride of 1), and a feature map of size 31×31×128 (hereinafter referred to as the first feature map) is output through this two-step convolution operation. Second, the first feature map is fed into two branches: in the first branch the input feature is kept unchanged, and the other branch is a convolution sub-network with the same two-step convolution. Through this operation, features of ft2 with different receptive fields are obtained, as well as deeper features containing rich semantic information of the target. Third, the features of the two branches are concatenated. Finally, a downsampling operation is performed to obtain the refined feature map fn2.
fn2 is added to fn3, fn4 and fn5 respectively, and then a center-cropping operation is performed, as shown in the following formula (1):
Ft3 = Crop(fn3 + fn2)
Ft4 = Crop(fn4 + fn2)    (1)
Ft5 = Crop(fn5 + fn2)
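For illustration, a sketch of the multi-branch fusion module and the add-then-crop operation of formula (1) is given below; the placement of the activations, the use of a strided 3×3 convolution as the downsampling operation, and the channel counts not explicitly stated above are assumptions.

```python
import torch
import torch.nn as nn

def two_step_conv(in_ch, out_ch):
    """Two stacked 3x3, stride-1 convolutions, as described for the fusion module."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
    )

class MultiBranchFusion(nn.Module):
    """Produces the refined feature fn2 from the template branch feature ft2."""
    def __init__(self, in_ch=256, mid_ch=128, out_ch=256):
        super().__init__()
        self.stem = two_step_conv(in_ch, mid_ch)     # first feature map, 31x31x128
        self.branch = two_step_conv(mid_ch, mid_ch)  # second branch: same two-step convolution
        # Downsample the concatenated branches to the size of fn3..fn5 (15x15x256 in the text).
        self.down = nn.Conv2d(2 * mid_ch, out_ch, 3, stride=2, padding=0)

    def forward(self, ft2):
        first = self.stem(ft2)
        fused = torch.cat([first, self.branch(first)], dim=1)  # identity branch + conv branch
        return self.down(fused)                                # refined feature fn2

def center_crop(x, size=7):
    """Crop(.) of formula (1): keep the central size x size region."""
    h, w = x.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return x[..., top:top + size, left:left + size]

# fn2 added to a compressed template feature, then center-cropped (formula (1)).
fusion = MultiBranchFusion()
ft2 = torch.randn(1, 256, 31, 31)
fn3 = torch.randn(1, 256, 15, 15)
fn2 = fusion(ft2)
Ft3 = center_crop(fn3 + fn2)   # 1 x 256 x 7 x 7
```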
in one embodiment, the twin neural network comprises a global context module, step S103 comprising:
Step one, performing channel compression on the features fs3, fs4 and fs5 of the search branch features to obtain features fm3, fm4 and fm5.
Step two, passing the features fm3, fm4 and fm5 through the global context module to obtain the search features Fs3, Fs4 and Fs5.
In one implementation, the features fs3, fs4 and fs5 of the search branch features are each passed through a 1×1 convolution that compresses the number of channels to 256, yielding features fm3, fm4 and fm5, each of size 31×31×256. The search features Fs3, Fs4 and Fs5 each have a size of 31×31×256.
Referring to fig. 4, fig. 4 is a block diagram of a global context module provided by an embodiment of the present invention.
The global context module includes three parts: a context modeling sub-module, a transform sub-module, and a fusion sub-module. Assume that x and z represent the input and output of the global context module respectively, and that Np represents the number of elements in the feature map.
The global context operation can be represented by equation (2), where W1, W2 and W3 represent the weight coefficients of the three convolution kernels in Fig. 4 respectively, LN() represents the layer normalization function (Layer Normalization), and ReLU() represents the piecewise linear activation function used for one-sided suppression.
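For illustration, a sketch of the global context module (context modeling, transform and fusion sub-modules) is given below; since equation (2) is not reproduced above, the sketch follows the standard global-context block formulation, and the assignment of W1/W2/W3 to particular convolutions as well as the reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class GlobalContext(nn.Module):
    """Context modeling + transform + fusion, in the spirit of Fig. 4.
    The exact form of equation (2) is assumed to follow the standard GC block."""
    def __init__(self, channels=256, reduction=4):
        super().__init__()
        self.context = nn.Conv2d(channels, 1, kernel_size=1)    # attention weights over the Np positions
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),      # channel bottleneck
            nn.LayerNorm([channels // reduction, 1, 1]),        # LN(.)
            nn.ReLU(inplace=True),                              # ReLU(.)
            nn.Conv2d(channels // reduction, channels, 1),      # channel expansion
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Context modeling: softmax-normalized global attention pooling over Np = h*w positions.
        attn = self.context(x).view(b, 1, h * w).softmax(dim=-1)                 # (b, 1, Np)
        context = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2)).view(b, c, 1, 1)
        # Fusion: broadcast-add the transformed context back onto every position.
        return x + self.transform(context)

gc = GlobalContext()
fm3 = torch.randn(1, 256, 31, 31)
Fs3 = gc(fm3)   # search feature, same spatial size as fm3
```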
In one embodiment, the twin neural network includes an attention map module, and step S104 is specifically:
inputting the template features Ft3, Ft4 and Ft5 and the search features Fs3, Fs4 and Fs5 into the self-attention module and the cross-attention module of the attention map module respectively, to obtain the attention maps of the template features and the attention maps of the search features.
Referring to fig. 5, fig. 5 is a block diagram of an attention seeking module according to an embodiment of the present invention.
As shown in Fig. 5, the attention map module includes a self-attention module and a cross-attention module. To learn finer semantic features over space and channels, self-attention and cross-attention sub-networks are proposed. There are four dashed boxes in Fig. 5 from top to bottom, where the contents of the first and fourth dashed boxes represent self-attention, and the contents of the second and third dashed boxes represent cross-attention. In detail, the template feature and the search feature are denoted Z and X respectively, where the sizes of Z and X are C×h×w and C×H×W respectively.
The self-attention module consists of spatial attention and channel attention.
For spatial attention, the search feature X is first divided by spatial position, where Xi,j ∈ R^(C×1×1) corresponds to the spatial position (i, j). Second, the channels are compressed by a 1×1 convolution, with the corresponding formula Q = Wsq * X, where Wsq ∈ R^(C×1×1×1) is the parameter of the convolution kernel, yielding Q ∈ R^(H×W), so that the value of Q at spatial position (i, j) is obtained from the corresponding Xi,j. Then a feature carrying spatial information is generated from σ(Q) and X, where σ() is the sigmoid activation function. Finally, this feature is weighted by a learnable parameter α and added to the original feature X to obtain the final feature Xsa, as shown in formula (3).
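For illustration, a sketch of the spatial attention branch is given below; since formula (3) is not reproduced above, the broadcast multiplication producing the spatially-attended feature (Xsa = α·(σ(Q)⊙X) + X) and the zero initialization of α are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial branch of the self-attention module, as understood from the description of formula (3)."""
    def __init__(self, channels=256):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, 1, kernel_size=1)  # Wsq: compress C channels to 1
        self.alpha = nn.Parameter(torch.zeros(1))             # learnable weight on the attended feature

    def forward(self, x):
        q = self.squeeze(x)               # Q: one value per spatial position (H x W)
        attended = torch.sigmoid(q) * x   # feature carrying spatial information
        return self.alpha * attended + x  # assumed form: X_sa = alpha * (sigma(Q) ⊙ X) + X

x = torch.randn(1, 256, 31, 31)
x_sa = SpatialAttention()(x)
```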
For channel attention, the input feature X is first divided by channel, where Xi ∈ R^(H×W). Second, a global average pooling operation over the spatial dimensions produces a vector V ∈ R^(C×1×1), where the value of the k-th channel can be obtained by the following equation (4):
Vk = (1 / (H × W)) Σ(i=1..H) Σ(j=1..W) Xk(i, j)    (4)
again, V is compressed and expanded using two convolution operations to obtain
Wherein, theAndParameters corresponding to two convolution kernels respectively, and then obtaining a characteristic diagram with channel aggregation characteristicsWherein σ () is a sigmoid activation function, and finallyA learnable parameter beta is given and added with the original characteristic X to obtain a final channel characteristic diagram Xca, as shown in the following formula (5):
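For illustration, a sketch of the channel attention branch is given below; since formula (5) is not reproduced above, the squeeze-and-expand ratio, the ReLU placed between the two convolutions, and the gating form Xca = β·(gate⊙X) + X are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel branch of the self-attention module: global average pooling (equation (4)),
    two 1x1 convolutions, sigmoid gating, and a learnable residual weight beta."""
    def __init__(self, channels=256, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                        # equation (4): V_k = mean over H x W
        self.compress = nn.Conv2d(channels, channels // reduction, 1)
        self.expand = nn.Conv2d(channels // reduction, channels, 1)
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        v = self.pool(x)                                           # V in R^(C x 1 x 1)
        gate = torch.sigmoid(self.expand(torch.relu(self.compress(v))))
        return self.beta * (gate * x) + x                          # assumed form of X_ca

x = torch.randn(1, 256, 31, 31)
x_ca = ChannelAttention()(x)
```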
Cross attention module
For the search branch, the template feature Z and the search feature X are input into the cross-attention module, which corresponds to the second and third dashed boxes in Fig. 5. The template feature is subjected to global average pooling and two 1×1 convolution operations to obtain a channel feature map, where C is the number of channels. A sigmoid activation is then applied to this channel feature map and the result is multiplied with the initial feature X to obtain a preliminary feature, as shown in formula (6).
The final cross-attention feature is then computed, and the cross-attention feature map Xcro is obtained as the sum of X and this feature weighted by λ, as shown in formula (7), where λ is a learnable parameter.
The attention map of the search feature is finally obtained by fusing the features Xsa, Xca and Xcro in parallel through an element-wise summation operation. The corresponding attention map of the template feature is obtained in the same way as for the search branch.
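For illustration, a sketch of the cross-attention branch and the parallel fusion by element-wise summation is given below; since formulas (6) and (7) are not reproduced above, the channel reduction ratio, the ReLU between the two 1×1 convolutions, and the exact form of the λ-weighted residual are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Cross-attention for the search branch: the template feature Z gates the channels
    of the search feature X (formulas (6)-(7) as understood from the text)."""
    def __init__(self, channels=256, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv1 = nn.Conv2d(channels, channels // reduction, 1)
        self.conv2 = nn.Conv2d(channels // reduction, channels, 1)
        self.lam = nn.Parameter(torch.zeros(1))   # lambda of formula (7)

    def forward(self, x, z):
        gate = torch.sigmoid(self.conv2(torch.relu(self.conv1(self.pool(z)))))
        x_prime = gate * x                        # preliminary feature of formula (6)
        return x + self.lam * x_prime             # assumed form of X_cro

x = torch.randn(1, 256, 31, 31)   # search feature
z = torch.randn(1, 256, 7, 7)     # template feature
x_cro = CrossAttention()(x, z)
# Final attention map of the search feature: element-wise sum x_sa + x_ca + x_cro,
# where x_sa and x_ca come from the spatial/channel attention sketches above.
```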
In one embodiment, the twin neural network includes a depth cross-correlation module, and step S105 is specifically:
performing, through the depth cross-correlation module, a depth cross-correlation operation on the attention maps of the template features and the attention maps of the search features respectively, to obtain the score maps φ3, φ4 and φ5.
In one implementation, the attention map of the template feature and the attention map of the search feature at the same level are subjected to a depth cross-correlation operation to obtain φ3, and φ4 and φ5 are obtained in the same way.
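For illustration, a sketch of the depth (channel-by-channel) cross-correlation that produces a score map is given below; treating the template attention map as a per-channel convolution kernel slid over the search attention map is the standard formulation assumed here, and the tensor sizes are taken from the examples above.

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search_attn, template_attn):
    """Depth cross-correlation: each channel of the template attention map is
    correlated with the corresponding channel of the search attention map."""
    b, c, h, w = search_attn.shape
    x = search_attn.reshape(1, b * c, h, w)
    kernel = template_attn.reshape(b * c, 1, *template_attn.shape[-2:])
    out = F.conv2d(x, kernel, groups=b * c)
    return out.reshape(b, c, out.shape[-2], out.shape[-1])

search_attn = torch.randn(1, 256, 31, 31)           # attention map of the search feature
template_attn = torch.randn(1, 256, 7, 7)           # attention map of the template feature
phi3 = depthwise_xcorr(search_attn, template_attn)  # score map, 1 x 256 x 25 x 25
```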
In one embodiment, the twin neural network includes a target position determination module, step S106 comprising:
Step one, the score maps φ3, φ4 and φ5 are respectively input into the classification branch and the regression branch of the target position determination module.
Step two, the score maps φ3, φ4 and φ5 are each convolved in the classification branch with a kernel size of 1×1 and a stride of 1 to obtain features with 2k channels, and these features are respectively multiplied by preset weight values to obtain the classification features. The classification features include foreground and background features of the target in the search picture.
Step three, the score maps φ3, φ4 and φ5 are each convolved in the regression branch with a kernel size of 1×1 and a stride of 1 to obtain features with 4k channels, and these features are respectively multiplied by preset weight values to obtain the regression features. The regression features contain features of the target.
Step four, the position of the target in the search picture is determined according to the classification features and the regression features.
In one implementation, the classification features may respectively be multiplied by different learnable weighting coefficients, and the regression features may likewise respectively be multiplied by different learnable weighting coefficients.
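For illustration, a sketch of the target position determination head is given below; the use of a single shared 1×1 convolution per branch across the three score maps, the softmax normalization of the per-level weights, and the value of k (the number of anchors) are assumptions introduced for the sketch.

```python
import torch
import torch.nn as nn

class TargetHead(nn.Module):
    """Classification (2k channels) and regression (4k channels) branches applied to
    the score maps phi3, phi4, phi5, combined with learnable per-level weights."""
    def __init__(self, in_channels=256, k=5):
        super().__init__()
        self.cls_conv = nn.Conv2d(in_channels, 2 * k, kernel_size=1, stride=1)
        self.reg_conv = nn.Conv2d(in_channels, 4 * k, kernel_size=1, stride=1)
        self.cls_weight = nn.Parameter(torch.ones(3))  # one learnable weight per score map
        self.reg_weight = nn.Parameter(torch.ones(3))

    def forward(self, score_maps):
        cls_w = torch.softmax(self.cls_weight, dim=0)
        reg_w = torch.softmax(self.reg_weight, dim=0)
        cls = sum(w * self.cls_conv(p) for w, p in zip(cls_w, score_maps))  # foreground/background
        reg = sum(w * self.reg_conv(p) for w, p in zip(reg_w, score_maps))  # box offsets
        return cls, reg

head = TargetHead()
phi = [torch.randn(1, 256, 25, 25) for _ in range(3)]  # phi3, phi4, phi5
cls_feat, reg_feat = head(phi)  # used to determine the target position in the search picture
```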
Referring to fig. 6, fig. 6 is a precision and success rate test chart of the target tracking method according to the embodiment of the present invention.
The target tracking method fusing salient information and multi-granularity context features provided by the embodiment of the invention and nine other mainstream target tracking methods were tested on the OTB2015 dataset to obtain the precision test chart (a) and the success rate test chart (b). As can be seen from Fig. 6, the target tracking method provided by the embodiment of the invention is optimal in both precision and success rate.
Referring to fig. 7, fig. 7 is an EAO value test chart of the object tracking method according to the embodiment of the present invention.
Fig. 7 shows the EAO value test chart obtained by testing the target tracking method fusing salient information and multi-granularity context features provided by the embodiment of the invention on the VOT2019 dataset, compared with current mainstream target tracking methods, where a larger EAO value indicates a better evaluation effect. As can be seen from Fig. 7, the target tracking method of the present invention performs best among the compared target tracking methods.
Referring to fig. 8, fig. 8 is an EAO value test chart of the target tracking method according to the embodiment of the present invention for various situations.
The target tracking method fusing salient information and multi-granularity context features provided by the embodiment of the invention was tested against other mainstream target tracking methods on VOT2019 to obtain EAO values under various conditions, including camera motion (camera motion), occlusion (occlusion), scale change (size change), illumination change (illumination change) and motion change (motion change). As can be seen from Fig. 8, the target tracking method of the present invention shows better performance in the face of camera motion, illumination change and motion change.
Referring to fig. 9, fig. 9 is a diagram of a representative visual result of a target tracking method according to an embodiment of the present invention.
Fig. 9 shows representative visual results of different target tracking methods on the OTB2015 dataset, i.e. 10 representative video sequences were selected from the OTB2015 dataset and the target tracking method of the present invention was compared with other mainstream target tracking methods on these video sequences. The mainstream target tracking methods used for comparison include Ocean, MDNet, DaSiamRPN, ATOM, SiamRPN++ and SiamBAN. The target tracking method of the present invention shows more accurate tracking when faced with motion blur, rapid motion and low-resolution targets. For example, in the three video sequences BlurOwl, Soccer and DragonBaby, some of the compared target tracking methods fail, while the target tracking method of the present invention tracks more robustly and maintains higher accuracy. Meanwhile, as can be seen from the other video sequences in the figure, the target tracking method also shows better tracking performance when facing rotation, scale change, deformation, occlusion and the like.
The embodiment of the invention also provides an electronic device, as shown in fig. 10, which comprises a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 complete communication with each other through the communication bus 1004,
A memory 1003 for storing a computer program;
the processor 1001 is configured to implement any of the above-described target tracking methods that combine salient information and multi-granularity context features when executing a program stored on the memory 1003.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc., or may be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the above-described target tracking methods incorporating salient information and multi-granularity contextual features.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the above embodiments of target tracking methods that incorporate salient information and multi-granularity contextual features.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk, SSD), etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system, electronic device, and computer-readable storage medium embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.