Embodiments
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they are not to be construed as limiting the present application.
The retrieval method and device for a three-dimensional stereo image according to embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a first flowchart of a retrieval method for a three-dimensional stereo image according to an embodiment of the present application.
As shown in Fig. 1, the retrieval method for a three-dimensional stereo image may include:
S1: determining color information and depth information of a three-dimensional stereo image to be retrieved.
Specifically, a three-dimensional stereo image input by a user may first be received. The three-dimensional stereo image may be captured by a 3D camera such as a Kinect. Then, the color information and the depth information of the three-dimensional stereo image may be obtained.
A three-dimensional stereo image is described by its color information and depth information. The color information may use an RGB color mode or a YUV color mode; in this embodiment, the RGB color mode is mainly used for illustration. The RGB color mode includes an R channel describing red, a G channel describing green and a B channel describing blue. The value of each channel ranges from 0 to 255, so the 256 levels of the three RGB channels can be combined into 256 × 256 × 256 = 16,777,216 (about 16.78 million) colors in total. Therefore, the color of any point in the image can be described by the values of the above three channels.
The depth information describes, for each point in the three-dimensional stereo image, the distance between that point and the lens plane.
S2: inputting the color information and the depth information of the three-dimensional stereo image into a pre-trained convolutional neural network model.
The convolutional neural network model is established according to color information and depth information of three-dimensional stereo image samples.
S3: outputting image features of the three-dimensional stereo image by the convolutional neural network model.
S4: obtaining a retrieval result according to the image features.
Specifically, the distance between the image features and the data features of each candidate image in a database may be calculated. The candidate images may then be sorted by distance in ascending order, and the top N candidate images in the sorted order may be taken as the retrieval result. The database is a pre-established database for storing three-dimensional stereo images. The distance may be a Euclidean distance or a cosine distance.
It should be appreciated that a smaller distance indicates a higher similarity between images; sorting the candidate images by distance therefore yields a more accurate retrieval result.
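The ranking in step S4 can be sketched as follows. The function name `retrieve_top_n` and its parameters are illustrative assumptions, not part of the original disclosure; a minimal NumPy implementation of both candidate distances is assumed:

```python
import numpy as np

def retrieve_top_n(query_feat, db_feats, n=5, metric="euclidean"):
    """Rank database candidates by distance to the query feature.

    query_feat: (d,) feature vector output by the CNN for the query image.
    db_feats:   (m, d) matrix of data features of the m candidate images.
    Returns the indices of the n nearest candidates, ascending by distance.
    """
    if metric == "euclidean":
        dists = np.linalg.norm(db_feats - query_feat, axis=1)
    else:  # cosine distance = 1 - cosine similarity
        num = db_feats @ query_feat
        denom = np.linalg.norm(db_feats, axis=1) * np.linalg.norm(query_feat)
        dists = 1.0 - num / denom
    order = np.argsort(dists)  # ascending: smaller distance = more similar
    return order[:n]
```

Either metric produces the same kind of ranking; cosine distance ignores feature magnitude, which can matter if feature vectors are not normalized.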
In addition, as shown in Fig. 2, the embodiment of the present application may further include step S5.
S5: before the color information and the depth information are input into the pre-trained convolutional neural network model, normalizing the color information and the depth information of the three-dimensional stereo image.
First, the color information of the three-dimensional stereo image may be normalized.
Specifically, for each point in the three-dimensional stereo image, the R-channel value j, G-channel value k and B-channel value l may be obtained, and j, k and l may each be divided by 255 to obtain the normalized R-channel value j', G-channel value k' and B-channel value l'. Since j, k and l range from 0 to 255, the corresponding j', k' and l' range from 0 to 1.
Then, the depth information of the three-dimensional stereo image is normalized.
Specifically, for each point in the three-dimensional stereo image, the depth value h, the minimum depth value a (the distance of the point nearest the lens plane) and the maximum depth value b (the distance of the point farthest from the lens plane) may be obtained, where a ≤ h ≤ b. A first difference between the depth value h and the minimum depth value a is obtained, a second difference between the maximum depth value b and the minimum depth value a is obtained, and the first difference is divided by the second difference to obtain the normalized depth value (h − a)/(b − a). The normalized depth value ranges from 0 to 1.
After normalization, the color information and the depth information are typically represented as two-dimensional images of a fixed size, for example 256 × 256 pixels.
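The two normalization steps of S5 can be sketched as follows; the function name `normalize_rgbd` and the convention of taking the minimum and maximum depth over the image itself are illustrative assumptions:

```python
import numpy as np

def normalize_rgbd(rgb, depth):
    """Normalize an RGB-D observation as described in step S5.

    rgb:   (H, W, 3) uint8 array with channel values in [0, 255].
    depth: (H, W) array of distances from the lens plane.
    Returns float arrays with all values scaled into [0, 1].
    """
    rgb_norm = rgb.astype(np.float64) / 255.0  # j' = j / 255, etc.
    a, b = depth.min(), depth.max()            # nearest / farthest point
    depth_norm = (depth - a) / (b - a)         # (h - a) / (b - a), in [0, 1]
    return rgb_norm, depth_norm
```

In practice the normalized arrays would then be resized to the fixed input size (e.g. 256 × 256 pixels) expected by the convolutional neural network.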
The process of establishing the convolutional neural network model is described in detail below.
Specifically, as shown in Fig. 3, the process may include the following steps:
S31: extracting color information and depth information of a three-dimensional stereo image sample.
S32: normalizing the color information and the depth information of the three-dimensional stereo image sample to generate a corresponding normalized image sample.
First, the color information of the three-dimensional stereo image sample may be normalized.
Specifically, for each point in the three-dimensional stereo image sample, the R-channel, G-channel and B-channel values may be obtained and each divided by 255 to obtain the normalized R-channel, G-channel and B-channel values. Each normalized channel value ranges from 0 to 1.
Then, the depth information of the three-dimensional stereo image sample may be normalized.
Specifically, for each point in the three-dimensional stereo image sample, the depth value, the minimum depth value (the distance of the point nearest the lens plane) and the maximum depth value (the distance of the point farthest from the lens plane) may be obtained. A first difference between the depth value and the minimum depth value is obtained, a second difference between the maximum depth value and the minimum depth value is obtained, and the first difference is divided by the second difference to obtain the normalized depth value, which ranges from 0 to 1.
After this, the normalized image sample may be generated from the normalized color information and depth information. For ease of computation, the normalized image sample is typically scaled to a fixed size, for example 256 × 256 pixels.
S33: training on the normalized image samples to establish the convolutional neural network model.
Specifically, the parameters of the convolutional neural network model may be trained based on a multi-task learning method to improve the recognition accuracy of the convolutional neural network model. A task may be a classification task on the image samples, a ranking task on the image samples, or the like.
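The multi-task idea in S33 — one shared feature extractor with per-task heads whose losses are combined — can be sketched with a toy NumPy forward pass. The layer sizes, the loss weighting `alpha`, and the linear trunk standing in for the convolutional layers are illustrative assumptions, not the architecture of the present application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk: one linear layer standing in for the convolutional feature extractor.
W_shared = rng.normal(size=(8, 4))
# Two task heads over the shared features, as in multi-task learning:
W_cls = rng.normal(size=(4, 3))   # classification head (3 classes)
W_rank = rng.normal(size=(4, 1))  # ranking head (scalar relevance score)

def forward(x):
    feat = np.maximum(x @ W_shared, 0.0)  # ReLU shared features
    logits = feat @ W_cls                 # class scores for the classification task
    score = feat @ W_rank                 # relevance score for the ranking task
    return feat, logits, score

def multitask_loss(logits, label, score, target, alpha=0.5):
    # Cross-entropy loss for the classification task (numerically stabilized)
    z = logits - logits.max()
    ce = -np.log(np.exp(z[label]) / np.exp(z).sum())
    # Squared-error loss for the ranking task
    mse = (score[0] - target) ** 2
    # Training minimizes the weighted sum, so both tasks shape the shared features
    return alpha * ce + (1 - alpha) * mse

x = rng.normal(size=(8,))
feat, logits, score = forward(x)
loss = multitask_loss(logits, label=1, score=score, target=0.7)
```

Because both heads share the trunk, gradients from both losses update the same feature extractor, which is what improves the discriminative quality of the features used later for retrieval.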
With the retrieval method for a three-dimensional stereo image of the embodiment of the present application, color information and depth information of a three-dimensional stereo image to be retrieved are determined, the color information and the depth information of the three-dimensional stereo image are input into a pre-trained convolutional neural network model, image features of the three-dimensional stereo image are output by the convolutional neural network model, and a retrieval result is finally obtained according to the image features. This can effectively improve the accuracy of the retrieval result obtained for the three-dimensional stereo image, thereby improving the user experience.
To achieve the above object, the present application further proposes a retrieval device for a three-dimensional stereo image.
Fig. 4 is a first structural schematic diagram of a retrieval device for a three-dimensional stereo image according to an embodiment of the present application.
As shown in Fig. 4, the retrieval device for a three-dimensional stereo image may include a determining module 110, an input module 120, an output module 130 and an acquisition module 140.
The determining module 110 is configured to determine color information and depth information of a three-dimensional stereo image to be retrieved. Specifically, a three-dimensional stereo image input by a user may first be received. The three-dimensional stereo image may be captured by a 3D camera such as a Kinect. Then, the color information and the depth information of the three-dimensional stereo image may be obtained.
A three-dimensional stereo image is described by its color information and depth information. The color information may use an RGB color mode or a YUV color mode; in this embodiment, the RGB color mode is mainly used for illustration. The RGB color mode includes an R channel describing red, a G channel describing green and a B channel describing blue. The value of each channel ranges from 0 to 255, so the 256 levels of the three RGB channels can be combined into 256 × 256 × 256 = 16,777,216 (about 16.78 million) colors in total. Therefore, the color of any point in the image can be described by the values of the above three channels.
The depth information describes, for each point in the three-dimensional stereo image, the distance between that point and the lens plane.
The input module 120 is configured to input the color information and the depth information of the three-dimensional stereo image into a pre-trained convolutional neural network model. The convolutional neural network model is established according to color information and depth information of three-dimensional stereo image samples.
The output module 130 is configured to output image features of the three-dimensional stereo image by the convolutional neural network model.
The acquisition module 140 is configured to obtain a retrieval result according to the image features. Specifically, the distance between the image features and the data features of each candidate image in a database may be calculated. The candidate images may then be sorted by distance in ascending order, and the top N candidate images in the sorted order may be taken as the retrieval result. The database is a pre-established database for storing three-dimensional stereo images. The distance may be a Euclidean distance or a cosine distance.
It should be appreciated that a smaller distance indicates a higher similarity between images; sorting the candidate images by distance therefore yields a more accurate retrieval result.
In addition, as shown in Fig. 5, the retrieval device for a three-dimensional stereo image may further include a normalization module 150.
The normalization module 150 is configured to normalize the color information and the depth information of the three-dimensional stereo image before the color information and the depth information are input into the pre-trained convolutional neural network model.
First, the color information of the three-dimensional stereo image may be normalized.
Specifically, for each point in the three-dimensional stereo image, the R-channel value j, G-channel value k and B-channel value l may be obtained, and j, k and l may each be divided by 255 to obtain the normalized R-channel value j', G-channel value k' and B-channel value l'. Since j, k and l range from 0 to 255, the corresponding j', k' and l' range from 0 to 1.
Then, the depth information of the three-dimensional stereo image is normalized.
Specifically, for each point in the three-dimensional stereo image, the depth value h, the minimum depth value a (the distance of the point nearest the lens plane) and the maximum depth value b (the distance of the point farthest from the lens plane) may be obtained, where a ≤ h ≤ b. A first difference between the depth value h and the minimum depth value a is obtained, a second difference between the maximum depth value b and the minimum depth value a is obtained, and the first difference is divided by the second difference to obtain the normalized depth value (h − a)/(b − a). The normalized depth value ranges from 0 to 1.
After normalization, the color information and the depth information are typically represented as two-dimensional images of a fixed size, for example 256 × 256 pixels.
In addition, as shown in Fig. 6, the retrieval device for a three-dimensional stereo image may further include an extraction module 160, a generation module 170 and an establishing module 180.
The extraction module 160 is configured to extract color information and depth information of a three-dimensional stereo image sample.
The generation module 170 is configured to normalize the color information and the depth information of the three-dimensional stereo image sample to generate a corresponding normalized image sample.
First, the color information of the three-dimensional stereo image sample may be normalized.
Specifically, for each point in the three-dimensional stereo image sample, the R-channel, G-channel and B-channel values may be obtained and each divided by 255 to obtain the normalized R-channel, G-channel and B-channel values. Each normalized channel value ranges from 0 to 1.
Then, the depth information of the three-dimensional stereo image sample may be normalized.
Specifically, for each point in the three-dimensional stereo image sample, the depth value, the minimum depth value (the distance of the point nearest the lens plane) and the maximum depth value (the distance of the point farthest from the lens plane) may be obtained. A first difference between the depth value and the minimum depth value is obtained, a second difference between the maximum depth value and the minimum depth value is obtained, and the first difference is divided by the second difference to obtain the normalized depth value, which ranges from 0 to 1.
After this, the normalized image sample may be generated from the normalized color information and depth information. For ease of computation, the normalized image sample is typically scaled to a fixed size, for example 256 × 256 pixels.
The establishing module 180 is configured to train on the normalized image samples to establish the convolutional neural network model. Specifically, the parameters of the convolutional neural network model may be trained based on a multi-task learning method to improve the recognition accuracy of the convolutional neural network model. A task may be a classification task on the image samples, a ranking task on the image samples, or the like.
With the retrieval device for a three-dimensional stereo image of the embodiment of the present application, color information and depth information of a three-dimensional stereo image to be retrieved are determined, the color information and the depth information of the three-dimensional stereo image are input into a pre-trained convolutional neural network model, image features of the three-dimensional stereo image are output by the convolutional neural network model, and a retrieval result is finally obtained according to the image features. This can effectively improve the accuracy of the retrieval result obtained for the three-dimensional stereo image, thereby improving the user experience.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no conflict arises, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.
Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present application. Within the scope of the present application, those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments.