CN109255382A - Neural network system, method and device for picture matching and positioning - Google Patents

Neural network system, method and device for picture matching and positioning

Info

Publication number
CN109255382A
Authority
CN
China
Prior art keywords
picture, frame, region, layer, convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811046086.6A
Other languages
Chinese (zh)
Other versions
CN109255382B (en)
Inventor
巢林林
徐娟
褚崴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811046086.6A
Publication of CN109255382A
Priority to TW108123369A
Priority to PCT/CN2019/098984
Application granted
Publication of CN109255382B
Status: Active
Anticipated expiration

Abstract

Embodiments of this specification provide a computer-executed neural network system for picture matching and positioning. The neural network system comprises a first convolutional network, a second convolutional network, a combination layer, and a frame regression layer. The first convolutional network performs convolution processing and a pooling operation on a first picture to obtain a first feature vector, corresponding to the first picture, whose dimension is a first number. The second convolutional network performs convolution processing on a second picture to obtain N feature vectors corresponding to the N regions contained in the second picture, their dimension also being the first number. The combination layer combines the first feature vector with each of the N feature vectors respectively, obtaining N combined vectors. The frame regression layer, based at least on the N combined vectors and using a frame regression algorithm, outputs in the second picture the information of a prediction frame, the prediction frame indicating the part of the second picture that contains the image content of the first picture.

Description

Neural network system, method and device for picture matching and positioning
Technical field
One or more embodiments of this specification relate to the field of computer image processing, and more particularly to a neural network system, method and apparatus for picture matching and positioning.
Background art
Artificial intelligence and machine learning have been widely applied in the field of computer image processing, for intelligent image analysis, comparison, matching, target recognition, and so on; among these tasks, image matching and matching-with-positioning are frequently encountered problems. Simply put, image matching is judging whether two images are similar or show the same content, while image matching-and-positioning is finding the position, within one image, of the content shown in another image.
Traditional matching-and-positioning algorithms generally first traverse and search image segments of all sizes, and then match and position by comparing these segments one by one. Such a scheme has very high time complexity, and a two-step scheme of this kind is difficult to subject to unified global optimization.
Accordingly, an improved scheme is desired that matches and positions images more quickly and efficiently.
Summary of the invention
One or more embodiments of this specification describe a neural network system and method for picture matching and positioning, so as to match and position pictures rapidly, efficiently, and in an integrated manner.
According to a first aspect, a computer-executed neural network system for picture matching and positioning is provided, comprising:
a first convolutional network, including a first convolutional layer and a pooling layer, where the first convolutional layer performs convolution processing on a first picture to obtain a first convolution feature map corresponding to the first picture, and the pooling layer performs a pooling operation on the first convolution feature map to generate a first feature vector whose dimension is a first number, the first picture being the picture to be matched;
a second convolutional network, which performs convolution processing on a second picture to obtain N feature vectors respectively corresponding to the N regions contained in the second picture, the dimension of the N feature vectors being the first number, the second picture being the picture to be searched;
a combination layer, which combines the first feature vector with each of the N feature vectors respectively to obtain N combined vectors;
a frame regression layer, which, based at least on the N combined vectors, outputs in the second picture the information of a prediction frame, the prediction frame indicating the region of the second picture that contains the image content of the first picture.
In one embodiment, the second convolutional network includes a second convolutional layer and a feature extraction layer, where the second convolutional layer performs convolution processing on the second picture to obtain a second convolution feature map corresponding to the second picture, and the feature extraction layer extracts the N feature vectors corresponding to the N regions based on the second convolution feature map.
Further, according to one design, the second convolutional layer and the first convolutional layer are a common, shared convolutional layer.
According to one embodiment, the N regions are obtained by division according to a predetermined division rule.
According to another embodiment, the N regions are generated by a selective search algorithm, or by a region proposal network.
According to one embodiment, the combination operation performed by the combination layer includes a dot-product operation.
According to one possible design, the frame regression layer includes a first hidden layer, a second hidden layer, and an output layer, where:
the first hidden layer determines the region probability that the first picture appears in each of the N regions;
the second hidden layer generates candidate frames in at least one region and obtains the confidence of each candidate frame;
the output layer outputs the information of the prediction frame according to the region probability of each region and the confidence of each candidate frame, the information of the prediction frame including the coordinates of the prediction frame and the region probability and confidence corresponding to the prediction frame.
Further, in one design, the second hidden layer generates candidate frames in the regions whose region probability is greater than a preset threshold.
In one embodiment, the output layer takes, as the prediction frame, the candidate frame with the maximum product of its corresponding region probability and confidence.
According to one embodiment, the neural network system is obtained by end-to-end training on training samples. The training samples include multiple picture pairs, each pair including a first training picture and a second training picture; a target frame is labeled in the second training picture, showing the region of the second training picture that contains the image content of the first training picture.
Further, in one embodiment, the frame regression layer includes a first hidden layer and a second hidden layer; in such a case, the end-to-end training includes:
determining, according to the position of the target frame, the specific region among the N regions of the second training picture in which the target frame is located, and determining the region label of the target frame according to that specific region;
predicting, by the first hidden layer, the predicted region probability that the first training picture is located in each region;
generating, by the second hidden layer, candidate frames in each region;
determining the intersection-over-union of each candidate frame with the target frame, as the confidence of that candidate frame;
adjusting the network-layer parameters of the first hidden layer and the second hidden layer, based at least on the region label, the predicted region probabilities, and the confidences of the candidate frames, thereby training the neural network system.
According to a second aspect, a computer-executed method for picture matching and positioning is provided, comprising:
performing first convolution processing on a first picture to obtain a first convolution feature map corresponding to the first picture, the first picture being the picture to be matched;
performing a pooling operation on the first convolution feature map to generate a first feature vector whose dimension is a first number;
performing convolution processing on a second picture to obtain N feature vectors respectively corresponding to the N regions contained in the second picture, the dimension of the N feature vectors being the first number, the second picture being the picture to be searched;
combining the first feature vector with each of the N feature vectors respectively to obtain N combined vectors;
based at least on the N combined vectors and using a frame regression algorithm, outputting in the second picture the information of a prediction frame, the prediction frame indicating the part of the second picture that contains the image content of the first picture.
According to a third aspect, an apparatus for picture matching and positioning is provided, comprising:
a first convolution unit, configured to perform first convolution processing on a first picture to obtain a first convolution feature map corresponding to the first picture, the first picture being the picture to be matched;
a pooling unit, configured to perform a pooling operation on the first convolution feature map, generating a first feature vector whose dimension is a first number;
a second convolution unit, configured to perform convolution processing on a second picture to obtain N feature vectors respectively corresponding to the N regions contained in the second picture, the dimension of the N feature vectors being the first number, the second picture being the picture to be searched;
a combination unit, configured to combine the first feature vector with each of the N feature vectors respectively, obtaining N combined vectors;
a prediction unit, configured to output in the second picture, based at least on the N combined vectors and using a frame regression algorithm, the information of a prediction frame, the prediction frame indicating the part of the second picture that contains the image content of the first picture.
According to a fourth aspect, a computing device is provided, including a memory and a processor, wherein executable code is stored in the memory and, when the processor executes the executable code, the neural network system of the first aspect is realized.
Through the scheme provided by the embodiments of this specification, rapid matching and positioning of pictures is realized by a neural network system with two branches, which frames, in the picture to be searched, the region containing the picture to be matched. In this process, matching and positioning are realized synchronously, which improves processing efficiency and processing performance.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 shows a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;
Fig. 2 shows a schematic structural diagram of a neural network system according to one embodiment;
Fig. 3 shows a schematic structural diagram of a second convolutional network according to one embodiment;
Fig. 4 shows a schematic structural diagram of a second convolutional network according to another embodiment;
Fig. 5 shows a schematic structural diagram of a frame regression layer according to one embodiment;
Fig. 6 shows a schematic diagram of a prediction result according to one embodiment;
Fig. 7 shows a method for picture matching and positioning according to one embodiment;
Fig. 8 shows a flowchart for determining a prediction frame according to one embodiment;
Fig. 9 shows a schematic block diagram of a picture matching and positioning apparatus according to one embodiment.
Detailed description of embodiments
The scheme provided in this specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
According to an embodiment of this specification, a neural network model is trained using labeled picture pairs as training samples. Once the neural network is trained, it can be used to match and position pictures. Specifically, the neural network has two branches, i.e. a two-branch structure. The picture to be matched is input into the first branch and the picture to be searched into the second branch; the trained neural network then directly outputs the matching-and-positioning prediction result, which generally at least outlines, with a prediction frame, the part of the picture to be searched that contains the content of the picture to be matched. The neural network can thus match and position simultaneously, directly outputting the matching-and-positioning result for the pictures.
To realize the above picture matching and positioning, the neural network extracts features from the two pictures, i.e. the picture to be matched and the picture to be searched, in its two branches respectively, then combines their features and predicts the frame based on the combined features. More specifically, the neural network processes the picture to be matched into a first feature vector using its first branch, and processes the picture to be searched into N feature vectors corresponding to N regions using its second branch. The first feature vector corresponding to the picture to be matched is then combined with each of the N feature vectors corresponding to the N regions of the picture to be searched, yielding N combined vectors; based on each of these N combined vectors, the frame regression algorithm from target detection is used to predict a frame and perform frame regression. The neural network thus directly outputs the matching-and-positioning result for the two input pictures. The specific structure and implementation of this neural network are described below.
Fig. 2 shows a schematic structural diagram of a neural network system according to one embodiment; the neural network system is used for matching and positioning pictures. It is understood that the neural network system can be realized by any device, apparatus, platform, or device cluster with computing and processing capability, such as the computing platform shown in Fig. 1. As shown in Fig. 2, the neural network system at least includes a first convolutional network 21, a second convolutional network 22, a combination layer 23, and a frame regression layer 24. The implementation of each of these network layers is described below.
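Before the layer-by-layer description, the overall data flow can be summarized in code. The following is a minimal, hedged sketch of the two-branch layout of Fig. 2 in PyTorch; the backbone depth, the feature dimension D=256, the 4*4 region grid, and all module names are illustrative assumptions, not the patent's concrete implementation.

```python
import torch
import torch.nn as nn

class MatchLocateNet(nn.Module):
    def __init__(self, d=256, grid=4):
        super().__init__()
        self.conv1 = nn.Sequential(               # first convolutional network 21
            nn.Conv2d(3, d, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))              # pooling layer 212 -> D-dim vector
        self.conv2 = nn.Sequential(               # second convolutional network 22
            nn.Conv2d(3, d, 3, padding=1), nn.ReLU())
        self.grid = grid                          # N = grid * grid regions
        self.region_prob = nn.Linear(d, 1)        # first hidden layer 241 of layer 24
        self.box_head = nn.Linear(d, 5)           # second hidden layer 242: (x, y, w, h, conf)

    def forward(self, pic1, pic2):
        fs = self.conv1(pic1).flatten(1)                       # (B, D) first feature vector Fs
        fmap = self.conv2(pic2)                                # (B, D, H, W)
        regions = nn.functional.adaptive_avg_pool2d(fmap, self.grid)
        f_n = regions.flatten(2).transpose(1, 2)               # (B, N, D) region vectors
        combined = fs.unsqueeze(1) * f_n                       # combination layer 23 (dot product)
        probs = self.region_prob(combined).squeeze(-1).softmax(-1)  # region probabilities
        boxes = self.box_head(combined)                        # per-region frame + confidence
        return probs, boxes

net = MatchLocateNet()
p, b = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 256, 256))
print(p.shape, b.shape)  # torch.Size([1, 16]) torch.Size([1, 16, 5])
```

The point of the sketch is only the data flow: one branch produces a single D-dim vector, the other produces N region vectors of the same dimension, and the element-wise product couples them before frame regression.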
The first convolutional network 21 performs feature processing on the picture to be matched, generating a corresponding feature vector; below, the picture to be matched is called the first picture. Generally, the first picture is a close-up or detail picture.
Specifically, the first convolutional network 21 includes a first convolutional layer 211 and a pooling layer 212, where the first convolutional layer performs convolution processing on the first picture to obtain a first convolution feature map corresponding to the first picture, and the pooling layer 212 performs a pooling operation on the first convolution feature map, generating a first feature vector corresponding to the first picture.
The convolutional layer is the most basic and important network layer in a convolutional neural network (CNN) and is used to perform convolution processing on an image. Convolution processing is a processing operation commonly used to analyze images. Specifically, it applies a series of operations with a convolution kernel to each pixel of the image. The convolution kernel (operator) is a matrix used in image processing whose entries are parameters to be combined with the pixels of the original image; it is usually a square grid (for example a 3*3 matrix or pixel region) in which each cell carries a weight value. When a picture is convolved with the kernel, the kernel slides over the pixel matrix of the picture one stride at a time; at each position, the products of each kernel element with the image pixel it covers are summed, and the resulting matrix of new feature values constitutes the convolution feature map, i.e. the feature map. Depending on the design of the kernel, the convolution operation can extract abstract features from the pixel matrix of the original picture, reflecting, for example, more global features such as the stripe shapes or color distribution of a region of the original image.
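For concreteness, the sliding-window computation just described can be sketched in a few lines of Python; the 4x4 input values and the edge-style kernel are arbitrary illustrative choices.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide the kernel over the image with stride 1 and sum the element-wise products."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16.0).reshape(4, 4)      # toy 4x4 "image"
edge = np.array([[1., 0., -1.]] * 3)     # 3x3 vertical-edge-style kernel
print(conv2d_valid(img, edge))           # 2x2 convolution feature map
```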
In one embodiment, the above first convolutional layer 211 may include one or more convolutional layers, each of which performs one convolution processing pass on the image. Through the processing of these convolutional layers, the convolution feature map (feature map) corresponding to the first picture is obtained.
In one embodiment, the first convolutional layer 211 may include multiple convolutional layers, with at least one ReLU (Rectified Linear Unit) excitation layer between them or after certain of them, which applies a non-linear mapping to the convolutional output. The result of the non-linear mapping can enter the next convolutional layer for further convolution processing, or serve as the output convolution feature map.
In one embodiment, the first convolutional layer 211 includes multiple convolutional layers with at least one pooling layer between them, which performs a pooling operation on the convolutional output; the pooled result can enter the next convolutional layer to continue the convolution operation.
Those skilled in the art realize that, as needed, the first convolutional layer 211 can be designed to include one or more convolutional layers, with ReLU excitation layers and/or pooling layers optionally added between the multiple convolutional layers. After the first convolutional layer 211 performs convolution processing on the first picture in this way, the first convolution feature map corresponding to the picture is obtained and input to the next pooling layer 212.
The pooling layer 212 performs an additional pooling operation on the first convolution feature map corresponding to the first picture. The pooling operation may include max pooling, average pooling, and so on.
Usually, in picture matching and positioning, the picture to be matched is a close-up or detail picture while the picture to be searched is a panoramic or global picture, so it is typically necessary to "shrink" the picture to be matched in order to compare and analyze it against each region of the picture to be searched. Here the first picture is the picture to be matched; therefore, in the first convolutional network 21, the additional pooling operation performed by the pooling layer 212 on the first convolution feature map obtained from convolution processing can reduce the dimensionality of the features of the first picture, which facilitates the subsequent combination with the region features of the picture to be searched as well as the subsequent network processing. Through this pooling processing, the pooling layer 212 obtains the feature vector corresponding to the first picture, called the first feature vector and denoted Fs. Assume the dimension of this feature vector is D.
On the other hand, the second convolutional network 22 performs convolution processing on the second picture, i.e. the picture to be searched, to obtain the N feature vectors corresponding to the N regions contained in the second picture, the dimension of these N feature vectors being the same as that of the above first feature vector Fs, namely D.
Fig. 3 shows a schematic structural diagram of a second convolutional network according to one embodiment. As shown in Fig. 3, the second convolutional network 22 includes a second convolutional layer 221 and a feature extraction layer 222.
In one embodiment, the second convolutional layer 221 performs convolution processing on the second picture to obtain a second convolution feature map corresponding to the second picture. As needed, the second convolutional layer 221 can be designed to include one or more convolutional layers, with ReLU excitation layers and/or pooling layers optionally added between the multiple convolutional layers. After the second convolutional layer 221 performs convolution processing on the second picture, the second convolution feature map corresponding to the picture is obtained.
In one embodiment, the structure and convolution operations of the second convolutional layer 221 are completely identical to those of the first convolutional layer 211. In that case, the second convolutional layer 221 and the first convolutional layer 211 can be multiplexed as the same convolutional layer, sharing weight parameters; in other words, they can be said to be a common convolutional layer, as shown by the dotted box in Fig. 3.
The second convolution feature map obtained by the second convolutional layer 221 is input to the feature extraction layer 222, which extracts, based on the second convolution feature map, the N feature vectors corresponding to the N regions contained in the second picture.
In one embodiment, the above N regions are divided according to a predetermined division rule. For example, in one example, the predetermined division rule divides the length and the width of the picture to be searched into 4 equal parts, so that the picture to be searched is divided into 4*4=16 regions.
It is understood that there is a certain mapping relationship between the convolution feature map obtained by convolution processing and the original picture. Therefore, when the second picture is divided according to the above division rule, the second convolution feature map obtained by the second convolutional layer 221 can be considered to be correspondingly divided into N regions as well; that is, the second convolution feature map can be divided into N sub-feature maps, each corresponding to one region of the original picture. In one embodiment, the second picture can first be divided into N regions which are then input into the second convolutional layer 221 separately, so that the second convolutional layer 221 performs convolution processing on each of the N regions, yielding N sub-feature maps whose whole constitutes the second convolution feature map corresponding to the second picture. In another embodiment, the second picture can also be input directly into the second convolutional layer 221, so that it performs convolution processing on the entire second picture to obtain the second convolution feature map, which is then divided into N sub-feature maps according to the division rule. The feature extraction layer 222 then performs feature extraction based on the second convolution feature map, more specifically on each sub-feature map obtained from the division corresponding to the division rule, so as to obtain the N feature vectors corresponding to the N regions of the second picture.
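Under the 4*4 division rule of the example above, the step from feature map to region vectors might look as follows; this is a hedged sketch in which the feature dimension, map size, and average-pooling choice are assumptions for illustration.

```python
import torch

def grid_region_vectors(fmap, grid=4):
    """Split a (D, H, W) feature map into grid*grid sub-feature maps and
    average-pool each into a D-dim region vector, per the 4*4 division rule."""
    d, h, w = fmap.shape
    vecs = []
    for band in fmap.split(h // grid, dim=1):      # grid horizontal bands
        for cell in band.split(w // grid, dim=2):  # grid cells per band
            vecs.append(cell.mean(dim=(1, 2)))     # one D-dim vector per region
    return torch.stack(vecs)                       # (N, D) with N = grid*grid

fmap = torch.randn(256, 32, 32)                    # toy second convolution feature map
print(grid_region_vectors(fmap).shape)             # torch.Size([16, 256])
```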
In another embodiment, the N regions of the second picture are generated within the neural network according to a predetermined algorithm.
Fig. 4 shows a schematic structural diagram of a second convolutional network according to another embodiment. The second convolutional network shown in Fig. 4 further comprises a region generation module, which generates the N regions in the second picture according to a predetermined algorithm.
In one example, the whole neural network system draws on the R-CNN (Region CNN) or Fast R-CNN network model used for target detection. Both network models use selective search to generate candidate regions (region proposals), also called regions of interest (ROI); the generated candidate regions can serve as the regions here. More specifically, in R-CNN the candidate regions are generated based on the original picture (shown by the dotted line), while in Fast R-CNN they are generated based on the extracted convolution feature map. In the case of R-CNN or Fast R-CNN, the function of the above region generation module can also be realized jointly by the second convolutional layer and the feature extraction layer, without being embodied as a separate module.
In another example, the whole neural network system draws on the further Faster R-CNN network model, in which the region proposal network (RPN) was proposed, dedicated to generating or suggesting candidate regions. In that case, the region generation module of Fig. 4 corresponds to the region proposal network RPN, which generates the N regions based on the convolution feature map obtained after convolution processing.
In yet another example, the whole neural network system is based on the Yolo network model, in which the second picture is divided into a*b regions, so that N=a*b. Correspondingly, the region generation module can use the algorithm in Yolo to generate the regions.
Although Fig. 3 and Fig. 4 show examples of the second convolutional network in which it is divided into a second convolutional layer and a feature extraction layer, the concrete implementation of the second convolutional network is not limited to this. In one example, the second convolutional network can also perform region feature extraction while performing convolution processing, so as to directly output the feature vector of each region.
Then the N feature vectors corresponding to the N regions output by the second convolutional network 22, together with the first feature vector Fs of the first picture output by the first convolutional network 21, are input into the combination layer 23, where the combination operation is performed. As described above, the first picture is processed into the first feature vector Fs by the first convolutional network 21, and the second picture is processed into the N feature vectors corresponding to the N regions by the second convolutional network 22, with all these vectors having the same dimension (dimension D); this processing makes the combination operation between the vectors very convenient.
Specifically, the combination layer 23 combines the first feature vector Fs with each of the N feature vectors corresponding to the N regions of the second picture, respectively, so as to obtain N combined vectors.
In one embodiment, the combination operation includes taking the difference, or the average, between the corresponding elements of the vectors.
More preferably, in one embodiment, the combination operation includes the dot product between the vectors, that is, multiplication between corresponding elements.
Specifically, assume the first feature vector Fs is expressed as
Fs = (a1, a2, …, aD)
and the N feature vectors corresponding to the N regions of the second picture are F1, F2, …, FN, where the i-th feature vector Fi is expressed as
Fi = (b1, b2, …, bD).
Then the dot product of the first feature vector Fs with the feature vector Fi of the i-th region yields the combined vector
Vi = (a1*b1, a2*b2, …, aD*bD).
In this way, the combined vector of the first feature vector Fs with the feature vector of each region is obtained, giving N combined vectors V1, V2, …, VN.
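As a quick sketch, this element-wise combination is a single broadcasted multiplication; the values of D and N below are assumed for illustration.

```python
import torch

D, N = 256, 16
fs = torch.randn(D)            # first feature vector Fs
f_regions = torch.randn(N, D)  # region feature vectors F1..FN
v = fs * f_regions             # broadcast: Vi = (a1*b1, ..., aD*bD), shape (N, D)
```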
These combined vectors are then input into the frame regression layer 24, which, based at least on the N combined vectors, outputs in the second picture the information of a prediction frame, the prediction frame indicating the region of the second picture that contains the image content of the first picture.
It is understood that the aforementioned R-CNN, Fast R-CNN, Faster R-CNN, Yolo, and some other network models may all be used to perform target detection. In conventional target detection, the picture to be detected is likewise first divided into regions and the feature vector corresponding to each region is obtained; these feature vectors are then input into the classification and regression layers of the network to perform target detection. The task of target detection can be divided into target classification and frame regression, where target classification predicts the class of the target object and frame regression determines the minimum rectangular frame (bounding box) containing the target object.
Drawing on the execution of target detection algorithms, the frame regression layer 24 of Fig. 2 can use the frame regression algorithm from target detection to provide, based on the N combined vectors, a prediction frame from the N regions of the second picture.
As mentioned above, the N combined vectors are obtained by combining the feature vector of the first picture (the picture to be matched) with the N feature vectors corresponding to the N regions of the second picture (the picture to be searched); the N combined vectors can therefore reflect the similarity between the first picture and each region of the second picture. In other words, the N combined vectors can serve as the feature vectors of N superimposed pictures obtained by superimposing the first picture on each region of the second picture. Treating the feature vectors of these N superimposed pictures as the feature vectors of picture regions awaiting target detection, the frame regression of target detection is then performed, and the resulting frame can serve as the region of the second picture that contains the content of the first picture.
Fig. 5 shows a schematic structural diagram of a frame regression layer according to one embodiment. As shown in Fig. 5, the frame regression layer 24 may include a first hidden layer 241, a second hidden layer 242, and an output layer 243.
The first hidden layer 241 determines the region probabilities P(R1), P(R2), …, P(RN) that the first picture appears in each of the N regions of the second picture.
In one embodiment, the above region probabilities are probabilities normalized by softmax; accordingly, the region probabilities of all regions sum to 1.
Then the second hidden layer 242 generates candidate frames in at least one region using a frame regression algorithm, and obtains the confidence of each candidate frame. It is understood that in the training process of a frame regression algorithm, the intersection-over-union (IoU) between the predicted frame and the labeled frame can be computed while predicted frames are generated, and this IoU can serve as the measure of confidence; correspondingly, in the prediction phase, the frame regression algorithm can similarly obtain, while generating a candidate frame, the estimated IoU of that candidate frame as its confidence.
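The IoU referred to here is the standard overlap measure; a small self-contained sketch follows, assuming (x, y, w, h) boxes with (x, y) at the frame center as used later in this description.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h), (x, y) the center."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

print(iou((5, 5, 4, 4), (6, 6, 4, 4)))  # overlapping boxes -> 9/23 ≈ 0.391
```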
In one embodiment, the second hidden layer 242 selects, among the regions, the region with the maximum region probability, and executes the frame regression algorithm for that region, generating candidate frames.
In another embodiment, the region probabilities of the regions are first filtered against a preset probability threshold, eliminating the regions whose region probability is below the threshold; the second hidden layer 242 then executes the frame regression algorithm separately on only the remaining regions, generating candidate frames.
In yet another embodiment, the second hidden layer 242 executes the frame regression algorithm for every region, generating candidate frames.
In one embodiment, the second hidden layer 242, by executing the frame regression algorithm, generates multiple candidate frames for each region it processes, and computes and provides the confidence of each candidate frame.
In another embodiment, the second hidden layer 242, by executing the frame regression algorithm, generates multiple preliminary frames for each region it processes, and then selects from these preliminary frames the frame with the highest confidence as the candidate frame.
Through the above various approaches, the first hidden layer 241 determines the region probability of each region, and the second hidden layer 242 generates candidate frames for at least some of the regions and obtains the confidence of each candidate frame. Then the output layer 243 outputs the information of the prediction frame according to the region probabilities of the regions and the confidences of the candidate frames.
Specifically, as described above, depending on how the second hidden layer 242 executes, it may output multiple candidate frames, which may be located in one region or in multiple regions. Under normal circumstances, the confidence of candidate frames generated in regions with larger region probability is also higher, but occasional special cases cannot be ruled out. Therefore, the output layer 243 jointly considers the region probability of the region in which each candidate frame is located and the confidence of the candidate frame itself, and selects the most probable frame as the prediction result.
In one embodiment, for the multiple candidate frames obtained by the second hidden layer 242, the output layer 243 separately computes the product of the region probability of the region in which each candidate frame is located and the confidence of that candidate frame, and selects the candidate frame corresponding to the maximum product as the prediction frame.
In another embodiment, the output layer 243 separately computes the sum of the region probability of the region in which each candidate frame is located and the confidence of that candidate frame, and selects the candidate frame corresponding to the maximum sum as the prediction frame.
In yet another embodiment, the output layer 243 first selects the region with the maximum region probability and, within that region, selects the candidate frame with the maximum confidence as the prediction frame.
Thus, the output layer 243 jointly considers region probability and confidence and outputs the information of the optimal prediction frame. Generally, the output information of the prediction frame at least includes the position coordinates of the prediction frame, typically expressed as (x, y, w, h), where x and y give the position of the frame center, w is the width of the frame, and h is its height.
In one embodiment, the output layer 243 also outputs the region probability and/or confidence of the prediction frame as supplementary information.
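A hedged sketch of the product-based selection embodiment above; the candidate layout and all numbers are illustrative assumptions.

```python
def select_prediction(candidates, region_probs):
    """Pick the candidate frame maximizing region_prob * confidence.
    candidates: list of (region_index, (x, y, w, h), confidence)."""
    best = max(candidates, key=lambda c: region_probs[c[0]] * c[2])
    region, box, conf = best
    return {"box": box, "region_prob": region_probs[region], "confidence": conf}

probs = [0.05, 0.72, 0.13, 0.10]                       # softmax-normalized region probabilities
cands = [(1, (40, 30, 20, 16), 0.81), (2, (70, 55, 18, 12), 0.64)]
print(select_prediction(cands, probs))                 # frame in region 1 wins: 0.72 * 0.81
```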
The foregoing describes specific examples of the implementation of the frame regression layer, but the implementation of the frame regression layer is not limited to these. For example, in one implementation, before the network layers that execute the frame regression algorithm, the frame regression layer can include several convolutional layers that perform further convolution processing on each combined vector, with frame regression executed only afterwards. In another implementation, the frame regression layer can also apply the frame regression algorithm directly to each region, generating candidate frames without determining the region probability of each region. In yet another implementation, the frame regression layer uses a combined network layer that, for each region, simultaneously estimates the region probability and generates candidate frames in it. Correspondingly, the frame regression layer may have other, different network structures.
As above, the frame regression layer 24 outputs the information of the prediction frame as the prediction result, based on the feature vector corresponding to each region.
Although in the above examples the combination layer 23 and the frame regression layer 24 are each shown as a single network layer, the implementation is not limited to this. For example, yolov3 proposes a method of multi-scale prediction; correspondingly, in a neural network system based on yolov3, there can be multiple "combination layer + frame regression layer" combinations. In that case, the convolution feature maps of one or more convolutional layers can be extracted separately from the multiple convolutional layers of the first convolutional network and the second convolutional network, and input into the corresponding "combination layer + frame regression layer" for processing.
Fig. 6 shows a schematic diagram of a prediction result according to one embodiment. In Fig. 6, the left picture is the first picture, i.e. the picture to be matched, and the right is the second picture, i.e. the picture to be searched. After the first picture and the second picture are input into the neural network system shown in Fig. 2, a prediction frame can be output in the second picture, showing the region of the second picture that contains the image content of the first picture. Moreover, as shown in Fig. 6, there are two numbers above the prediction frame: the first indicates the region probability of the region in which the prediction frame is located, and the second indicates the confidence (or predicted IoU) of the prediction frame.
In this way, the two-branch neural network system shown in Fig. 2 realizes rapid matching and positioning of pictures, framing in the picture to be searched the region that contains the picture to be matched. In this process, matching and positioning are realized synchronously, which improves processing efficiency and processing performance.
In one embodiment, the above neural network system is first trained in advance, end to end and jointly, on training samples. To train such a neural network system, the training samples used need to include multiple picture pairs, each pair including a first training picture and a second training picture; a target frame is labeled in the second training picture, showing the region of the second training picture that contains the image content of the first training picture. The target frame so labeled can serve as the ground truth for training the neural network system.
Specifically, the training process may include: inputting the first training picture and the second training picture into the first convolutional network and the second convolutional network of the neural network system respectively, and outputting a prediction frame through the frame regression layer; comparing the prediction frame with the labeled target frame and taking the comparison result as the prediction error; and back-propagating that error and adjusting the parameters of each network layer in the neural network system by gradient descent and similar methods, thereby training the neural network system.
In a specific embodiment, the frame regression layer 24 takes the structure shown in Fig. 5, including a first hidden layer, a second hidden layer, and an output layer. In this case, the process of training the neural network system specifically includes the following steps.
As before, the first training picture and the second training picture are input into the first convolutional network and the second convolutional network respectively, yielding the feature vector corresponding to the first training picture and the feature vectors corresponding to the N regions of the second training picture. These feature vectors are combined respectively, giving the N combined vectors corresponding to the N regions.
It is understood that a target frame is labeled in the second training picture; therefore, according to the position of the target frame, the region among the N regions of the second training picture in which the target frame is located can be determined, and according to the determined region, the region label of the target frame is determined.
Also, based on the above N combined vectors, the first hidden layer predicts the predicted region probability that the first training picture is located in each region.
Then candidate frames are generated in each region by the second hidden layer, and the intersection-over-union (IoU) of each candidate frame with the target frame is determined as the confidence of that candidate frame.
Then, based at least on the region label and the predicted region probabilities, together with the confidences of the candidate frames, the parameters of the first hidden layer, the second hidden layer, and the output layer are adjusted so as to train the neural network. It is understood that the above region label amounts to the ground truth of the region probability; therefore, the error related to the region probability can be determined by comparing the predicted region probabilities with the region label. In addition, the IoU of a candidate frame with the target frame embodies the error of the candidate frame's position and size. Therefore, based on the region label and the predicted region probabilities, together with the confidences of the candidate frames, these two parts of the error can be obtained. The prediction error further includes the error related to the size and position of the candidate frame relative to the target frame, such as the numerical error of (x, y, w, h). The error is then back-propagated so as to adjust the parameters in the neural network system, training the neural network system.
Through the above training process, the two-branch neural network system shown in Fig. 2 is obtained, for rapidly matching and positioning pictures.
According to an embodiment of another aspect, a method of picture matching and positioning is also proposed. Fig. 7 shows a method for picture matching and positioning according to one embodiment; the method can be executed by a computer. As shown in Fig. 7, the method at least includes the following steps.
In step 71, first convolution processing is performed on a first picture to obtain a first convolution feature map corresponding to the first picture, the first picture being the picture to be matched.
In step 72, a pooling operation is performed on the first convolution feature map, generating a first feature vector whose dimension is a first number.
In step 73, convolution processing is performed on a second picture to obtain N feature vectors respectively corresponding to the N regions contained in the second picture, the dimension of the N feature vectors being the first number; the second picture is the picture to be searched.
In step 74, the first feature vector is combined with each of the N feature vectors respectively, obtaining N combined vectors.
In step 75, based at least on the N combined vectors and using a frame regression algorithm, the information of a prediction frame is output in the second picture, the prediction frame indicating the part of the second picture that contains the image content of the first picture.
According to one embodiment, step 73 further comprises: performing second convolution processing on the second picture to obtain a second convolution feature map corresponding to the second picture; and then extracting, based on the second convolution feature map, the N feature vectors corresponding to the N regions.
In one embodiment, the above second convolution processing is identical to the first convolution processing in step 71.
According to one possible design, the N regions are obtained by division according to a predetermined division rule.
According to another design, the N regions are generated by a selective search algorithm, or by a region proposal network.
In one embodiment, the combination operation in step 74 includes a dot-product operation.
Fig. 8 shows a flowchart for determining the prediction frame according to one embodiment, i.e. the sub-steps of step 75. As shown in Fig. 8, according to one embodiment, step 75 further comprises:
step 751, determining the region probability that the first picture appears in each of the N regions;
step 752, generating candidate frames in at least one region and obtaining the confidence of each candidate frame;
step 753, outputting the information of the prediction frame according to the region probability of each region and the confidence of each candidate frame, the information of the prediction frame including the coordinates of the prediction frame and the region probability and confidence corresponding to the prediction frame.
According to one embodiment, step 752 includes generating candidate frames in the regions whose region probability is greater than a preset threshold.
According to one embodiment, step 753 further comprises taking, as the prediction frame, the candidate frame with the maximum product of its corresponding region probability and confidence.
According to one embodiment, the above method is realized by a neural network system obtained by end-to-end training on training samples; the training samples include multiple picture pairs, each pair including a first training picture and a second training picture, with a target frame labeled in the second training picture showing the region of the second training picture that contains the image content of the first training picture.
Further, in one possible design, the end-to-end training includes:
determining, according to the position of the target frame, the specific region among the N regions of the second training picture in which the target frame is located, and determining the region label of the target frame according to that specific region;
predicting the predicted region probability that the first training picture is located in each region;
generating candidate frames in each region;
determining the intersection-over-union of each candidate frame with the target frame, as the confidence of that candidate frame;
adjusting the network-layer parameters of the first hidden layer and the second hidden layer, based at least on the region label, the predicted region probabilities, and the confidences of the candidate frames, thereby training the neural network system.
According to an embodiment of another aspect, an apparatus for picture matching and positioning is also provided. Fig. 9 shows a schematic block diagram of a picture matching and positioning apparatus according to one embodiment. It is understood that the apparatus can be realized by any unit, platform, or device cluster with computing and processing capability.
As shown in Fig. 9, the apparatus 900 includes:
a first convolution unit 91, configured to perform first convolution processing on a first picture to obtain a first convolution feature map corresponding to the first picture, the first picture being the picture to be matched;
a pooling unit 92, configured to perform a pooling operation on the first convolution feature map, generating a first feature vector whose dimension is a first number;
a second convolution unit 93, configured to perform convolution processing on a second picture to obtain N feature vectors respectively corresponding to the N regions contained in the second picture, the dimension of the N feature vectors being the first number, the second picture being the picture to be searched;
a combination unit 94, configured to combine the first feature vector with each of the N feature vectors respectively, obtaining N combined vectors;
a prediction unit 95, configured to output in the second picture, based at least on the N combined vectors and using a frame regression algorithm, the information of a prediction frame, the prediction frame indicating the part of the second picture that contains the image content of the first picture.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to execute the neural network system described in conjunction with Fig. 2, or the method described in conjunction with Fig. 7.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, with executable code stored in the memory; when the processor executes the executable code, the neural network system described in conjunction with Fig. 2, or the method described in conjunction with Fig. 7, is realized.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention can be realized in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium, or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further detail the purpose, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made on the basis of the technical solution of the present invention shall be included within the scope of protection of the present invention.

Claims (24)

CN201811046086.6A | 2018-09-07 (priority) | 2018-09-07 (filing) | Neural network system, method and device for picture matching positioning | Active | CN109255382B (en)

Priority Applications (3)

Application Number | Priority Date | Filing Date | Title
CN201811046086.6A (CN109255382B) | 2018-09-07 | 2018-09-07 | Neural network system, method and device for picture matching positioning
TW108123369A (TWI701608B) | 2018-09-07 | 2019-07-03 | Neural network system, method and device for image matching and positioning
PCT/CN2019/098984 (WO2020048273A1) | 2018-09-07 | 2019-08-02 | Neural network system for image matching and location determination, method, and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811046086.6A (CN109255382B) | 2018-09-07 | 2018-09-07 | Neural network system, method and device for picture matching positioning

Publications (2)

Publication Number | Publication Date
CN109255382A | 2019-01-22
CN109255382B | 2020-07-17

Family

Family ID: 65047102

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201811046086.6A | Active | CN109255382B (en) | 2018-09-07 | 2018-09-07 | Neural network system, method and device for picture matching positioning

Country Status (3)

Country | Link
CN (1) | CN109255382B (en)
TW (1) | TWI701608B (en)
WO (1) | WO2020048273A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2020048273A1 (en) * | 2018-09-07 | 2020-03-12 | 阿里巴巴集团控股有限公司 | Neural network system for image matching and location determination, method, and device
WO2021042798A1 (en) * | 2019-09-02 | 2021-03-11 | 创新先进技术有限公司 | Method and device executed by computer and used for identifying vehicle component
CN114495053A (en) * | 2022-01-06 | 2022-05-13 | 北京地平线信息技术有限公司 | Label distribution method and device
CN115699058A (en) * | 2020-07-14 | 2023-02-03 | 阿里巴巴集团控股有限公司 | Feature interaction via edge search

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111325222A (en) * | 2020-02-27 | 2020-06-23 | 深圳市商汤科技有限公司 | Image normalization processing method and device, and storage medium
TWI785431B (en) * | 2020-12-07 | 2022-12-01 | 中華電信股份有限公司 | Network public opinion analysis method and server
TWI835562B (en) * | 2023-02-17 | 2024-03-11 | 台達電子工業股份有限公司 | Machine learning optimization circuit and method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106407891A (en) * | 2016-08-26 | 2017-02-15 | 东方网力科技股份有限公司 | Target matching method and device based on convolutional neural network
CN107451602A (en) * | 2017-07-06 | 2017-12-08 | 浙江工业大学 | Fruit and vegetable detection method based on deep learning
CN107871134A (en) * | 2016-09-23 | 2018-04-03 | 北京眼神科技有限公司 | Face detection method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5239593A (en) * | 1991-04-03 | 1993-08-24 | Nynex Science & Technology, Inc. | Optical pattern recognition using detector and locator neural networks
GB2549554A (en) * | 2016-04-21 | 2017-10-25 | Ramot At Tel-Aviv Univ Ltd | Method and system for detecting an object in an image
CN106355573B (en) * | 2016-08-24 | 2019-10-25 | 北京小米移动软件有限公司 | Target positioning method and device in pictures
US11113800B2 (en) * | 2017-01-18 | 2021-09-07 | Nvidia Corporation | Filtering image data using a neural network
TWI607389B (en) * | 2017-02-10 | 2017-12-01 | 耐能股份有限公司 | Pooling computation device and method for convolutional neural network
CN107038448B (en) * | 2017-03-01 | 2020-02-28 | 中科视语(北京)科技有限公司 | Target detection model construction method
TWI617993B (en) * | 2017-03-03 | 2018-03-11 | 財團法人資訊工業策進會 | Identification system and identification method
CN107562805B (en) * | 2017-08-08 | 2020-04-03 | 浙江大华技术股份有限公司 | Method and device for searching for pictures by picture
CN108038540A (en) * | 2017-11-08 | 2018-05-15 | 北京卓视智通科技有限责任公司 | Multi-scale neural network and image feature extraction method based on the network
CN109255382B (en) * | 2018-09-07 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Neural network system, method and device for picture matching positioning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QQ_18012981: "Image segmentation and matching" (图像分割与匹配), CSDN forum, https://bbs.csdn.net/topics/392225499 *
"Image positioning and matching method" (图像定位匹配方法), Baidu Wenku, https://wenku.baidu.com/view/94b3d0ac89eb172dec63b701.html *
Pi Yang (皮洋): "Research on video image content matching and retrieval" (视频图像内容匹配与检索研究), China Masters' Theses Full-text Database, Information Science and Technology series *


Also Published As

Publication number | Publication date
CN109255382B | 2020-07-17
WO2020048273A1 | 2020-03-12
TWI701608B | 2020-08-11
TW202011266A | 2020-03-16

Similar Documents

Publication | Title
CN109255382A (en) | Neural network system, method and device for picture matching positioning
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network
US11182644B2 (en) | Method and apparatus for pose planar constraining on the basis of planar feature extraction
CN110555481B (en) | Portrait style recognition method, device and computer readable storage medium
CN110033007B (en) | Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion
CN109558832A (en) | Human body posture detection method, device, equipment and storage medium
CN109325589A (en) | Convolution calculation method and device
CN115223239B (en) | Gesture recognition method, gesture recognition system, computer equipment and readable storage medium
CN111914809B (en) | Target object positioning method, image processing method, device and computer equipment
CN112101262B (en) | Multi-feature fusion sign language recognition method and network model
CN103390279A (en) | Target foreground collaborative segmentation method combining saliency detection and discriminative learning
Zhang et al. | PromptVT: Prompting for efficient and accurate visual tracking
CN110648311B (en) | Acne image focus segmentation and counting network model based on multi-task learning
CN107506792B (en) | Semi-supervised salient object detection method
CN114299382B (en) | Hyperspectral remote sensing image classification method and hyperspectral remote sensing image classification system
Xu et al. | CCFNet: Cross-complementary fusion network for RGB-D scene parsing of clothing images
CN114882267A (en) | Small-sample image classification method and system based on relevant region
He et al. | Structure-preserved self-attention for fusion image information in multiple color spaces
JP2023131117A | Joint perception model training, joint perception method, device, and medium
Chen et al. | Bilinear Parallel Fourier Transformer for Multimodal Remote Sensing Classification
Lee et al. | Boundary-aware camouflaged object detection via deformable point sampling
CN113688672B | Action recognition method based on the fusion of deep joints and manual appearance features
Chen et al. | Salient object detection via spectral graph weighted low rank matrix recovery
Mao et al. | ChaInNet: deep chain instance segmentation network for panoptic segmentation
CN109583584A | Method and system enabling a CNN with fully connected layers to accept input of indefinite shape

Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40003579; Country of ref document: HK
| GR01 | Patent grant |
2020-09-25 | TR01 | Transfer of patent right | Patentee before: Alibaba Group Holding Ltd. (P.O. Box 847, fourth floor, Capital Building, Grand Cayman, British Cayman Islands); Patentee after: Advanced innovation technology Co., Ltd. (Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands)
2020-09-25 | TR01 | Transfer of patent right | Patentee before: Advanced innovation technology Co., Ltd.; Patentee after: Innovative advanced technology Co., Ltd. (both at Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands)
