Summary of the invention
The purpose of the present invention is to provide a target tracking method, apparatus, electronic device, and computer-readable storage medium, which, when updating the template, combine the result fed back by the tracking model with the image quality of the target in the current frame to decide whether to update, thereby improving the precision of the tracking result while avoiding heavy computation.
To achieve the above object, the present invention provides a target tracking method, the method comprising:
cropping a search image based on a current frame image;
inputting the search image and a target template image into a network tracking model to obtain confidence values and coordinate boxes, wherein the network tracking model is a tracking model built from a CNN and an RPN, and the current frame image is any video frame image in a video to be processed;
selecting, according to the confidence values and the coordinate boxes, a first target number of coordinate boxes as first coordinate boxes;
obtaining a tracking result according to historical tracking values and the candidate boxes;
obtaining the confidence value corresponding to the tracking result, and judging whether the confidence value falls within a first preset range;
if so, cropping the current frame image according to the target template image;
in the case where the sharpness of the cropped image is judged to be not less than a preset sharpness, taking the cropped image corresponding to the current video frame image as a new target template image.
In an implementation of the invention, the step of cropping the current frame image according to the target template image comprises:
obtaining the frame interval between the current video frame and the video frame corresponding to the target template image;
when the frame interval is greater than a preset value, cropping the current frame image according to the target template image.
In an implementation of the invention, the step of selecting, according to the confidence values and the coordinate boxes, a first target number of coordinate boxes as first coordinate boxes comprises:
sorting the confidence values in descending order;
sequentially selecting the first target number of confidence values, and taking the coordinate boxes corresponding to the selected confidence values as tracking target candidate boxes.
In an implementation of the invention, the step of obtaining the tracking result according to the historical tracking values and the candidate boxes comprises:
calculating the distance between the first coordinate boxes and a second coordinate box, wherein the second coordinate box is the coordinate box corresponding to the tracking result of a first video frame image, and the first video frame image is the video frame image preceding the current video frame image;
sorting the calculated IoU values of the candidate boxes in descending order, and sequentially taking a second number of candidate boxes as new tracking target candidate boxes;
filtering the new tracking target candidate boxes according to historical tracking coordinate boxes to obtain a motion trajectory prediction of the tracking target;
selecting the coordinate box with the highest confidence value among them as the tracking result of the current frame.
In an implementation of the invention, the method further comprises:
adding the tracking result of the current frame to the historical tracking values;
judging whether the number of historical tracking values is greater than a second predetermined number;
if so, deleting the tracking values in order of their addition time, starting with the one added earliest.
In an implementation of the invention, in the case where the target template image is the first target template image of the video to be processed, the method of obtaining the target template image comprises:
obtaining the video to be processed, and cropping the target template image from the first frame image of the video to be processed.
In an implementation of the invention, the method further comprises:
when the confidence value is judged not to fall within the first preset range, not updating the target template image.
In addition, the invention also discloses a target tracking apparatus, the apparatus comprising:
a first cropping module, configured to crop a search image based on a current frame image;
a first obtaining module, configured to input the search image and a target template image into a network tracking model to obtain confidence values and coordinate boxes, wherein the network tracking model is a tracking model built from a CNN and an RPN, and the current frame image is any video frame image in a video to be processed;
a selection module, configured to select, according to the confidence values and the coordinate boxes, a first target number of coordinate boxes as first coordinate boxes;
a second obtaining module, configured to obtain a tracking result according to historical tracking values and the candidate boxes;
a judgment module, configured to obtain the confidence value corresponding to the tracking result, and judge whether the confidence value falls within a first preset range;
a second cropping module, configured to crop the current frame image according to the target template image when the judgment result of the judgment module is affirmative;
a replacement module, configured to, in the case where the sharpness of the cropped image is judged to be not less than a preset sharpness, take the cropped image corresponding to the current video frame image as a new target template image.
Also disclosed is an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the above target tracking methods.
Also disclosed is a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of any one of the above target tracking methods.
Therefore, with the target tracking method, apparatus, electronic device, and computer-readable storage medium of the embodiments of the present invention, a search image is first cropped from the current frame image and, together with the target template image, input into the network tracking model to obtain confidence values and coordinate boxes; the confidence values and coordinate boxes are then post-processed to obtain the tracking result. After each round of target tracking, the confidence value corresponding to the tracking result is obtained, and when the confidence value falls within the first preset range, the current frame image is cropped; in the case where the sharpness of the cropped image is judged to be not less than the preset sharpness, the cropped image corresponding to the current video frame image is taken as the new target template image. By applying this online template update strategy to each obtained tracking result and judging during the tracking process whether the template needs to be updated, the template update allows the tracker to better adapt to the various changes of the tracked target. Therefore, with the embodiments of the present invention, the decision on whether to update is made in the current frame by combining the result fed back by the tracking model with the image quality of the target, which improves the precision of the tracking result while avoiding heavy computation.
Detailed Description of the Embodiments
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention may also be implemented or applied through other different specific embodiments, and the details in this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention.
Referring to FIGS. 1-2, it should be noted that the drawings provided in this embodiment only illustrate the basic concept of the invention in a schematic way; the drawings show only the components related to the present invention rather than the number, shape, and size of the components in actual implementation. In actual implementation, the form, quantity, and proportion of each component may change arbitrarily, and the component layout may also be more complex.
As shown in FIG. 1 and FIG. 2, an embodiment of the present invention provides a target tracking method, the method comprising the following steps:
S101: crop a search image based on the current frame image.
In video target tracking, the format of the video to be tracked may be AVI, MP4, MKV, or the like. The current video frame is the video frame image being analyzed and may be any video frame image in the video to be processed. After the previous video frame has been processed, a tracking result is obtained; for a subsequent video frame, the search image can then be cropped from the current frame image after expanding a certain range around the tracking result of the previous frame. The specific procedure of expanding the current frame image to find the search image is an existing technique and is not repeated in the embodiments of the present invention.
It should be noted that because the motion of an object in a video is continuous from frame to frame, the target position does not shift much between adjacent frames, so the position of the target in the current frame can be estimated from the tracking result of the previous frame image, although this estimate is not the exact location. A certain range around this estimated position is then enlarged, and the image is cropped from the current frame to obtain the search image; the network then matches the target template against the search image to obtain the exact position of the target in the current frame.
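As an illustration of this step, the following is a minimal Python sketch that enlarges the previous frame's tracking box by a scale factor and crops the corresponding region from the current frame; the function name, the (cx, cy, w, h) box format, and the scale value are illustrative assumptions rather than the exact implementation of the invention.

    import numpy as np

    def crop_search_region(frame, prev_box, scale=2.0):
        """Crop a search region around the previous frame's tracking box.

        frame:    H x W x 3 image as a NumPy array.
        prev_box: (cx, cy, w, h) tracking result of the previous frame.
        scale:    how much the previous box is enlarged (assumed value).
        """
        h_img, w_img = frame.shape[:2]
        cx, cy, w, h = prev_box
        # Enlarge the previous box to cover possible motion between frames.
        sw, sh = w * scale, h * scale
        x1 = int(max(0, cx - sw / 2))
        y1 = int(max(0, cy - sh / 2))
        x2 = int(min(w_img, cx + sw / 2))
        y2 = int(min(h_img, cy + sh / 2))
        return frame[y1:y2, x1:x2]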
S102: input the search image and the target template image into the network tracking model to obtain confidence values and coordinate boxes, wherein the network tracking model is a tracking model built from a CNN and an RPN, and the current frame image is any video frame image in the video to be processed.
In practical applications, according to the target to be tracked that is specified in advance, a template image is cropped from the first frame image of the video after expanding its range, so as to obtain the initial target template image. Taking the second video frame image as the current video frame image as an example, after the search image is obtained from the second video frame image, as shown in FIG. 2, the target template image and the search image are separately input into the convolutional neural network (CNN); the output of the CNN is then input into the RPN, which outputs the confidence values and the coordinate boxes.
It should be noted that a CNN consists of multiple convolutional layers. After an image is input into the network, each convolutional layer produces a multi-channel feature map, and the resulting feature map is then fed into the next convolutional layer to compute a new feature map. There are many classical CNN architectures, such as AlexNet, VGG, Inception, and ResNet. Taking AlexNet as an example, it has five convolutional layers in total, interleaved with activation functions (ReLU), pooling layers (Max Pooling), fully connected layers, and so on. When the fully connected layers are omitted, the network outputs a 256-channel feature map of size 6x6.
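To make the above concrete, the following is a minimal sketch in Python (using PyTorch, as an assumption) of an AlexNet-style convolutional backbone with the fully connected layers omitted; the layer configuration is illustrative, loosely following a SiamRPN-style tracker, and is shown only so that a 127x127 template crop yields a 256-channel 6x6 feature map while a larger search crop yields a larger feature map.

    import torch
    import torch.nn as nn

    # AlexNet-style feature extractor without fully connected layers.
    backbone = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3), nn.ReLU(inplace=True),
        nn.Conv2d(384, 384, kernel_size=3), nn.ReLU(inplace=True),
        nn.Conv2d(384, 256, kernel_size=3),
    )

    template_feat = backbone(torch.randn(1, 3, 127, 127))  # -> [1, 256, 6, 6]
    search_feat = backbone(torch.randn(1, 3, 255, 255))    # -> [1, 256, 22, 22]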
The RPN generates a fixed number of anchors at each feature point of the feature map according to given parameters such as the feature map size, scale, and aspect ratio. After passing through the RPN, the multi-channel feature map output by the CNN produces two branches of output: a classification branch, which outputs the confidence value corresponding to each anchor, and a regression branch, which outputs the coordinate regression values corresponding to each anchor.
It should be noted that each coordinate box corresponds to one confidence value; in other words, confidence values and coordinate boxes are in one-to-one correspondence.
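A simplified sketch of the two RPN branches is given below; in a Siamese tracker the heads typically operate on a cross-correlation of template and search features, but this standalone head (the class name and anchor count are assumptions) is enough to show that each anchor receives one confidence score from the classification branch and four regression values from the regression branch.

    import torch
    import torch.nn as nn

    class SimpleRPNHead(nn.Module):
        """For k anchors per feature-map location, predict k confidence
        scores and k sets of (dx, dy, dw, dh) regression offsets."""

        def __init__(self, in_channels=256, num_anchors=5):
            super().__init__()
            self.cls_head = nn.Conv2d(in_channels, num_anchors, kernel_size=3, padding=1)
            self.reg_head = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=3, padding=1)

        def forward(self, feat):
            scores = torch.sigmoid(self.cls_head(feat))  # confidence per anchor
            deltas = self.reg_head(feat)                 # coordinate regression per anchor
            return scores, deltas

    head = SimpleRPNHead()
    scores, deltas = head(torch.randn(1, 256, 22, 22))
    # scores: [1, 5, 22, 22]   deltas: [1, 20, 22, 22]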
S103: select, according to the confidence values and the coordinate boxes, a first target number of coordinate boxes as first coordinate boxes.
It should be noted that there are multiple coordinate boxes: the deep neural network outputs a fixed number of bounding boxes, i.e., the coordinate boxes.
In an implementation of the invention, the coordinate boxes are selected as follows: the confidence values are sorted in descending order, and the coordinate boxes corresponding to the first target number of confidence values are sequentially selected as tracking target candidate boxes.
After the confidence values are sorted, the top first-target-number of confidence values are selected in descending order; since confidence values and coordinate boxes are in one-to-one correspondence, the first number of coordinate boxes corresponding to these confidence values is obtained, and these coordinate boxes are used as the target candidate boxes.
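A minimal sketch of this top-K selection, assuming the confidence values and coordinate boxes arrive as NumPy arrays and taking 5 as an illustrative first target number, is as follows:

    import numpy as np

    def select_top_k(scores, boxes, k=5):
        """Keep the k coordinate boxes with the highest confidence values.

        scores: (N,) array of confidence values output by the RPN.
        boxes:  (N, 4) array of coordinate boxes, one per score.
        k:      the first target number (assumed value).
        """
        order = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
        return scores[order], boxes[order]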
S104: obtain the tracking result according to the historical tracking values and the candidate boxes.
In the embodiment of the present invention, the distance between the first coordinate boxes and a second coordinate box is calculated, wherein the second coordinate box is the coordinate box corresponding to the tracking result of the first video frame image, and the first video frame image is the video frame image preceding the current video frame image; the calculated IoU values of the candidate boxes are sorted in descending order, and a second number of candidate boxes is sequentially taken as new tracking target candidate boxes; the new tracking target candidate boxes are filtered according to the historical tracking coordinate boxes to obtain a motion trajectory prediction of the tracking target; and the coordinate box with the highest confidence value among them is selected as the tracking result of the current frame.
It should be noted that the coordinate box distance in the embodiment of the present invention is measured using IoU. IoU (Intersection over Union) is a standard, simple metric for measuring how accurately a target is detected in a given dataset. The IoU score is the standard performance measure for object segmentation problems: given a set of images, IoU measures the similarity between the predicted region of an object and the ground-truth region, which here corresponds to computing the similarity between the first coordinate boxes and the second coordinate box.
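For reference, a plain-Python sketch of the IoU computation between two boxes, assuming the (x1, y1, x2, y2) corner format, is as follows:

    def iou(box_a, box_b):
        """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0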
The IoU values are then sorted, and the top second-number of candidate boxes is retained as the new tracking target candidate boxes. These newly generated tracking target candidate boxes are then further filtered in combination with the historical tracking coordinate box information. Based on the historical tracking coordinate boxes and the temporal characteristics of the video, a motion trajectory prediction can be made for the tracking target; after filtering, the coordinate box with the highest confidence value among them is selected as the best tracking result of the current frame.
Besides IoU, other measures may also be used, such as the distance between the center points of the coordinate boxes, or the ratio of the overlapping area to the coordinate box of the previous frame's tracking result.
Further, the best tracking result of the current frame is added to the historical tracking information table for use by subsequent video frames, while the number of entries in the historical tracking information table is kept bounded: once the second predetermined number is reached, whenever new tracking information is added, the tracking information that was added earliest is deleted, keeping the list length constant.
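A minimal sketch of such a fixed-length history table, using Python's collections.deque with an assumed maximum length standing in for the second predetermined number, is as follows:

    from collections import deque

    HISTORY_LEN = 10  # the second predetermined number (assumed value)
    # Once HISTORY_LEN entries exist, appending a new tracking result
    # automatically discards the entry that was added earliest.
    history = deque(maxlen=HISTORY_LEN)

    def record_result(box, score):
        history.append({"box": box, "score": score})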
S105: obtain the confidence value corresponding to the tracking result, and judge whether the confidence value falls within the first preset range; if so, execute step S106; otherwise, do not perform a template update.
The tracking result post-processing part outputs the best tracking result of the current frame, and at the same time it is judged whether the confidence value of this best tracking result falls within the first preset range. If the confidence value of the current frame's tracking result is relatively low, the tracked target is very likely lost or incorrect; updating the template in this case would introduce feature errors into the subsequent tracking network and cause the tracking process to fail. Conversely, if the confidence value of the current frame's tracking result is very high, the tracked target matches the template image well and has not changed much, so a template update need not be considered. In practical applications, the first preset range is chosen as [0.75, 0.95].
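This decision reduces to a simple range check; the sketch below uses the [0.75, 0.95] range mentioned above, and the function name is illustrative.

    CONF_RANGE = (0.75, 0.95)  # the first preset range

    def should_consider_update(score, conf_range=CONF_RANGE):
        """Consider a template update only when the confidence is neither so low
        that the target is probably lost nor so high that the current template
        still matches well."""
        low, high = conf_range
        return low <= score <= high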
S106: crop the current frame image according to the target template image.
Based on the tracking result of the current frame image, a region is cropped from the current frame image after expanding the range in the same way as for the template image, so as to obtain a new candidate template image.
It will be appreciated that replacing the target template image too frequently increases the amount of computation without bringing a corresponding improvement in tracking performance. Therefore, a specific implementation of step S106 comprises: obtaining the frame interval between the current video frame and the video frame corresponding to the target template image; and cropping the current frame image according to the target template image when the frame interval is greater than a preset value.
In an implementation of the invention, it is checked whether the interval between the current frame number and the frame number of the last update exceeds a set interval threshold I_th; if the interval between the two frames is too short, the template does not need to be updated. Considering the balance between performance and efficiency, the interval threshold I_th was set to 50 during testing.
In actual use, if the interval between the current frame image and the last updated frame image exceeds I_th, the current frame number is saved as the latest update frame number; to avoid frame-number overflow during use, a remainder (modulo) operation is applied to the frame number.
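A minimal sketch of this interval check, with I_th = 50 as described above and an assumed modulus for the remainder operation that keeps the frame numbers from overflowing, is as follows:

    I_TH = 50          # interval threshold from the description
    FRAME_MOD = 10**9  # modulus for the remainder operation (assumed value)

    last_update_frame = 0

    def interval_allows_update(frame_idx):
        """Return True when enough frames have passed since the last template update."""
        global last_update_frame
        # Taking the difference modulo FRAME_MOD keeps the interval correct
        # even after the frame counter wraps around.
        interval = (frame_idx - last_update_frame) % FRAME_MOD
        if interval > I_TH:
            last_update_frame = frame_idx % FRAME_MOD
            return True
        return False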
S107: in the case where the sharpness of the cropped image is judged to be not less than the preset sharpness, take the cropped image corresponding to the current video frame image as the new target template image.
After the image cropping, an image sharpness quality evaluation is performed on the cropped candidate template image, and the result is compared with the sharpness of the template image currently in use, to ensure that the sharpness quality of the template image does not decline too much. Image sharpness evaluation methods such as Brenner, Tenengrad, SMD2, or the energy gradient function may be used. If the sharpness quality of the candidate template image meets the sharpness requirement of the template image, the candidate template image is adopted as the new template image for processing the next video frame.
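As one example of such an evaluation, the sketch below uses the Brenner gradient and accepts the candidate template only if its sharpness is at least a fixed fraction of the current template's sharpness; the acceptance ratio is an assumed threshold, not a value specified by the invention.

    import numpy as np

    def brenner_sharpness(gray):
        """Brenner gradient: sum of squared differences between pixels two columns apart.
        gray: 2-D NumPy array holding a grayscale image."""
        diff = gray[:, 2:].astype(np.float64) - gray[:, :-2].astype(np.float64)
        return float(np.sum(diff ** 2))

    def sharp_enough(candidate_gray, template_gray, ratio=0.8):
        """Accept the candidate only if its sharpness is at least `ratio` times
        that of the current template (ratio is an assumed threshold)."""
        return brenner_sharpness(candidate_gray) >= ratio * brenner_sharpness(template_gray)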
The output of the tracking network model is post-processed in order to further improve the accuracy of the tracking result and the robustness of the tracking process. Through the post-processing, cases in which the target is lost due to occlusion, background interference, and other factors can be reduced. The tracking result post-processing part can further refine the tracking result of the current frame on the basis of the historical temporal information and reduce tracking errors. However, the post-processing part depends on the confidence values and coordinate boxes output by the tracking network model; for situations such as scene changes of the tracked target and target deformation, the online template update part of the embodiments of the present invention judges, through the update strategy, whether the template needs to be updated, and by updating the template it can reduce the interference to the tracking network model caused by changes of the tracked target relative to the template image.
An embodiment of the present invention also discloses a target tracking apparatus, the apparatus comprising:
a first cropping module, configured to crop a search image based on a current frame image;
a first obtaining module, configured to input the search image and a target template image into a network tracking model to obtain confidence values and coordinate boxes, wherein the network tracking model is a tracking model built from a CNN and an RPN, and the current frame image is any video frame image in a video to be processed;
a selection module, configured to select, according to the confidence values and the coordinate boxes, a first target number of coordinate boxes as first coordinate boxes;
a second obtaining module, configured to obtain a tracking result according to historical tracking values and the candidate boxes;
a judgment module, configured to obtain the confidence value corresponding to the tracking result, and judge whether the confidence value falls within a first preset range;
a second cropping module, configured to crop the current frame image according to the target template image when the judgment result of the judgment module is affirmative;
a replacement module, configured to, in the case where the sharpness of the cropped image is judged to be not less than a preset sharpness, take the cropped image corresponding to the current video frame image as a new target template image.
Also disclosed is an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the above target tracking methods.
Also disclosed is a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of any one of the above target tracking methods.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present invention shall be covered by the claims of the present invention.