CN111292364A

Info

Publication number: CN111292364A
Application number: CN202010070168.5A
Authority: CN
Inventors: 左忠斌; 左达宇
Original assignee: Tianmu Aishi Beijing Technology Co Ltd
Current assignee: Tianmu Aishi Beijing Technology Co Ltd
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2020-06-16
Anticipated expiration: 2040-01-21
Also published as: CN111292364B

Abstract

The invention provides a method for rapidly matching images in a three-dimensional model construction process, comprising: selecting a reference image and a search image to be matched; constructing an image pyramid for each of the two images; performing SURF feature matching on layer 1 (the top layer) of the two image pyramids to obtain a preliminary match; establishing a geometric constraint between layer 1 of the two image pyramids; correcting and resampling a window of the current layer of the search image pyramid into the image space coordinate system of layer 1 of the reference image pyramid; establishing a correlation function between the gray values of the two images at the corresponding pyramid layer; and establishing a geometric constraint between the next layers of the two image pyramids according to the matching result, repeating the resampling and correlation steps until the last layer of the two image pyramids has been matched. For surround-type acquisition of a target object of limited size, this is the first method to optimize homonymous point matching so as to balance synthesis speed and precision.

Description

Method for rapidly matching images in three-dimensional model construction process

Technical Field

The invention relates to the technical field of topography measurement, and in particular to 3D topography measurement.

Background

When performing 3D measurement, manufacturing from 3D measurement data, or displaying and identifying objects from 3D data, an accurate 3D model of the target object must first be established. A common current approach uses machine vision to capture pictures of an object from different angles and then matches and stitches the pictures into a 3D model. Such a 3D model can be regarded as a digital record of the real object, and an accessory that fits the object can be manufactured from it; for example, 3D data of a human foot can be collected to make better-fitting shoes. The data can also be used to verify identity: for example, a 3D model of a person's iris can be synthesized and stored as reference identity data, and at verification time the iris 3D data is collected again and compared with the reference data to recognize the person. However, both factory manufacturing and transaction identification place high demands on the synthesis speed and accuracy of the 3D model; otherwise the customer experience deteriorates severely.

In the prior art, improvements in synthesis speed are considered to depend on optimizing the 3D reconstruction algorithm, and various algorithms have been proposed to this end, with modest results. These algorithms do not take into account the characteristics of pictures captured in surround-type acquisition, such as high spatial resolution, large rotation angles, and multi-view shooting. Because the pictures are taken from different angles and sunlight is reflected and scattered by the object surface, exposure is uneven across each picture and radiometric distortion is large; at the same time, the ultrahigh resolution means that each photo contains few local features. In other words, the prior art offers no algorithm optimized specifically for surround-type acquisition.

It is also believed in the art that accuracy depends mainly on the precision of image acquisition. High-resolution cameras naturally improve acquisition precision and, to some extent, 3D modeling accuracy, but ultrahigh-resolution images also drastically slow down synthesis.

Moreover, synthesis speed and synthesis precision are to some extent in conflict: increasing synthesis speed reduces the final 3D synthesis precision, while improving precision requires slowing synthesis down and synthesizing more pictures. First, the prior art has no algorithm that improves synthesis speed and synthesis quality at the same time. Second, acquisition and synthesis are generally treated as two independent processes that are not considered together, which lowers the efficiency of 3D modeling and prevents speed and precision from being improved together. Finally, the prior art has also proposed empirical formulas involving the rotation angle, object size, and object distance to set the camera positions so as to balance synthesis speed and quality. In practice, however, users are insensitive to angles and find them hard to determine accurately unless a precise angle-measuring device is available; the size of the target is also hard to determine accurately, especially in applications where the target is changed frequently, since each measurement adds considerable work and irregular targets require professional equipment to measure. Measurement errors lead to errors in camera positioning, which in turn degrade acquisition and synthesis speed and quality; accuracy and speed therefore need further improvement.

Therefore, there is an urgent need for a technique that ① overcomes the established views on algorithm optimization and improves synthesis speed and precision through a homonymous point matching algorithm; ② couples the algorithm with the image acquisition method so that synthesis speed and precision improve together; and ③ is optimized specifically for generating 3D models from surround-type acquisition of a target object.

Disclosure of Invention

In view of the above, the present invention has been made to provide a method for fast matching of images during construction of a three-dimensional model that overcomes or at least partially solves the above-mentioned problems.

One aspect of the present invention provides a method for rapidly matching images in a three-dimensional model building process, comprising:

Step 1: selecting a reference image and a search image to be matched;

Step 2: constructing an image pyramid for each of the two images;

Step 3: performing SURF feature matching on layer 1, the top layer, of the two image pyramids to obtain a preliminary match;

Step 4: establishing a geometric constraint between layer 1 of the two image pyramids;

Step 5: correcting and resampling a window of the current layer of the search image pyramid into the image space coordinate system of layer 1 of the reference image pyramid;

Step 6: establishing a correlation function between the gray values of the two images at the corresponding layer of the two pyramids;

Step 7: establishing a geometric constraint between the next layers of the two image pyramids according to the matching result, repeating steps 5 and 6 accordingly, and so on until the last layer of the two image pyramids has been matched.

Optionally, the image pyramid is a set of images arranged in a pyramid with gradually decreasing resolution: the bottom of the pyramid is the original image, and the top is a low-resolution image resampled from the original image.

Optionally, the geometric constraint is an affine transformation between the reference image and the search image, and the 6 affine transformation parameters are calculated from the homonymous (corresponding) point pairs obtained by SURF image matching.

Optionally, step 5 includes: correcting and resampling the search image window into the image space coordinate system of the reference image using the affine transformation coefficients, so that conjugate entities in the reference image and the search image coincide in image space, i.e., there is no rotation, geometric deformation, or scale change between the reference image and the search image.

Optionally, step 6 includes: computing correlation coefficients between a target area of the reference image and matching windows of the search image within a preset range, and taking the point whose correlation coefficient is the largest and exceeds a threshold as the matching point.

Optionally, during image acquisition, two adjacent acquisition positions satisfy a spacing condition relating the following quantities: L, the straight-line distance between the optical centers of the image acquisition device at the two adjacent acquisition positions; f, the focal length of the image acquisition device; d, the length or width of the rectangular photosensitive element of the image acquisition device; T, the distance from the photosensitive element of the image acquisition device to the surface of the target along the optical axis; and δ, an adjustment coefficient.

Optionally, δ < 0.603; preferably δ < 0.498, δ < 0.356, or δ < 0.311.
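As a rough numeric illustration, the sketch below assumes the condition takes the form L < δ·T·d/f, where T·d/f approximates the width of the field of view at distance T under a pinhole model. That exact form, the function name `max_adjacent_spacing`, and the camera values are assumptions for illustration only.

```python
def max_adjacent_spacing(f_mm, d_mm, t_mm, delta=0.603):
    """Upper bound on the optical-centre distance L between adjacent shots.

    Assumes the spacing condition has the form L < delta * T * d / f.
    All lengths share one unit (mm here).
    """
    if min(f_mm, d_mm, t_mm) <= 0:
        raise ValueError("f, d and T must be positive")
    # T * d / f approximates the field-of-view width at distance T (pinhole model).
    return delta * t_mm * d_mm / f_mm


# Hypothetical camera: 50 mm lens, 36 mm sensor width, target 600 mm away.
for delta in (0.603, 0.498, 0.356, 0.311):
    print(f"delta < {delta}: L < {max_adjacent_spacing(50, 36, 600, delta):.1f} mm")
```

Smaller δ values give denser camera positions (more overlap between neighbouring pictures) at the cost of more pictures to synthesize.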

According to another aspect of the invention, a three-dimensional model construction method is provided, which includes the above matching method.

A third aspect of the invention provides a three-dimensional data comparison method, which includes the above matching method.

Inventive points and technical effects

1. For surround-type acquisition of a target object of limited size, an optimized homonymous point matching algorithm that balances synthesis speed and precision is proposed for the first time.

2. Synthesis speed and precision are improved by optimizing both the camera positions at which pictures are acquired and the matching algorithm. Optimizing the positions requires measuring neither angles nor target size, so the method is more widely applicable.

3. The search image window is corrected and resampled into the image space coordinate system of the reference image, so that there is no rotation, geometric deformation, or scale change between the reference image and the search image, which improves matching precision and speed.

4. Resampling the images into layers and matching from the low-resolution layer down to the high-resolution layer in turn improves matching precision and speed.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

Fig. 1 is a flowchart of homonymous point matching of images according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of matching at each layer of an image pyramid according to an embodiment of the present invention;

Fig. 3 shows an acquisition device according to an embodiment of the present invention;

Fig. 4 shows another acquisition device according to an embodiment of the present invention.

The correspondence of reference numerals to the respective components is as follows:

object stage 1; rotating device 2; rotating arm 3; image acquisition device 4.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Homonymous point matching algorithm

Extensive research has shown that when close-range photos are used for three-dimensional reconstruction, their characteristics, such as high spatial resolution, large rotation angles, and multi-view shooting, cause two problems. Differences in shooting angle and the reflection and scattering of sunlight on the object surface make exposure uneven across each photo and radiometric distortion large, while the ultrahigh resolution leaves few local features in each photo. Both effects make subsequent homonymous point matching between photos very difficult. In addition, when reconstructing close-range objects, the speed of homonymous point matching directly determines the reconstruction speed and thus the user experience. Based on these findings, the invention provides a homonymous point matching method based on image space consistency under geometric constraints, which better handles the difficulty of matching high-resolution, multi-view close-range photos and matches homonymous points between photos quickly.

The invention matches the homonymous points of the captured photos with this method as follows. First, SURF features are matched preliminarily on the top layer of the image pyramids to quickly establish an initial affine transformation between the two images. Next, feature points are extracted from the reference image; using the affine transformation as a geometric constraint, a window around each feature point is transferred to the search image on the top pyramid layer, and the search image window is resampled and corrected into the image space coordinate system of the reference image. Correlation coefficient matching is then performed, gross errors are eliminated iteratively with a polynomial model within a local range, and the affine transformation between the two images is re-solved from the resulting homonymous image points. Matching then proceeds to the next lower layer, and so on down to the bottom layer. Finally, the matching result is converted into the image space coordinate system of the search image, and least squares matching is performed to improve matching precision. See the homonymous point matching flowchart in Fig. 1 and the layer-by-layer matching diagram of the image pyramid in Fig. 2.

Firstly, establishing an image pyramid of a reference image and a search image;

An image pyramid is a set of images arranged in a pyramid with gradually decreasing resolution. The bottom of the pyramid is the original image and the top is a low-resolution image resampled from it. The bottom layer has the highest resolution and the largest data volume, and resolution decreases as the layer number increases. The pyramid is built by resampling the original image data, forming multiple resolution levels. From the bottom layer to the top layer the resolution becomes coarser, but every layer covers the same extent, and the resolution of each layer can be expressed by a formula: if the original resolution (pixel size) of the image data is r_0 and the resampling ratio is m, the resolution of the j-th layer is r_j = r_0 × m^j, where the resampling ratio can be any integer greater than 1. In this method m = 2, and no further upper layer is generated once the length or width of the top-layer image falls below 1024 pixels; the image pyramids of the reference image and of the search image are built separately in this way.
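The construction rule above (ratio m = 2, no further layer once the top layer's width or height drops below 1024 pixels) can be sketched in NumPy as follows. The 2 × 2 block averaging is an assumed resampling filter, since the text does not name one, and `build_pyramid` is a hypothetical helper name.

```python
import numpy as np


def build_pyramid(image, ratio=2, min_side=1024):
    """Return pyramid layers of a 2-D gray image; layer 0 is the original.

    A new layer is added only while the current top layer's height and width
    are both >= min_side, so the top layer is the first one below that size.
    """
    layers = [np.asarray(image, dtype=float)]
    while min(layers[-1].shape[:2]) >= min_side:
        top = layers[-1]
        # Crop to a multiple of `ratio`, then average each ratio x ratio block.
        h = (top.shape[0] // ratio) * ratio
        w = (top.shape[1] // ratio) * ratio
        blocks = top[:h, :w].reshape(h // ratio, ratio, w // ratio, ratio)
        layers.append(blocks.mean(axis=(1, 3)))
    return layers
```

For a 4000 × 3000 image this yields layers of 4000 × 3000, 2000 × 1500 and 1000 × 750 pixels; the last one is the top layer used for SURF matching.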

Secondly, preliminary SURF image matching;

Preliminary image matching based on SURF features exploits the rotation and scale invariance of SURF features while improving matching efficiency. The basic idea is to perform SURF feature matching on the top pyramid layers of the two images, which narrows the search range, greatly reduces computation, and speeds up matching. Gross errors are then eliminated with a polynomial-based random sampling method, which removes large gross errors well. Finally, the matching result is mapped to the bottom layer of the images, and an affine transformation between the two images is established from the matched point pairs, completing the preliminary matching of the two images.
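A sketch of the polynomial-based random sampling step, assuming the polynomial is the first-order (affine) model used for the geometric constraint and that a standard RANSAC loop is meant: three matches are drawn at random, the six parameters are fitted, and the hypothesis with the most matches within a pixel threshold is kept and refitted. The threshold, iteration count and function names are illustrative assumptions.

```python
import numpy as np


def fit_affine(ref, srch):
    """Least-squares affine fit; returns [[a0, a1, a2], [b0, b1, b2]]."""
    design = np.column_stack([np.ones(len(ref)), ref])  # columns: 1, X, Y
    coeffs, *_ = np.linalg.lstsq(design, srch, rcond=None)
    return coeffs.T


def affine_residuals(params, ref, srch):
    """Euclidean distance between predicted and observed search positions."""
    design = np.column_stack([np.ones(len(ref)), ref])
    return np.linalg.norm(design @ params.T - srch, axis=1)


def ransac_affine(ref, srch, threshold=2.0, iterations=500, seed=0):
    """Random-sampling gross-error elimination with an affine model.

    ref, srch: (n, 2) arrays of matched (column, row) coordinates, n >= 3.
    Returns the parameters refitted on the inliers and a boolean inlier mask.
    """
    ref = np.asarray(ref, dtype=float)
    srch = np.asarray(srch, dtype=float)
    if len(ref) < 3:
        raise ValueError("need at least 3 matches")
    rng = np.random.default_rng(seed)
    best = np.zeros(len(ref), dtype=bool)
    for _ in range(iterations):
        # Three pairs are the minimal sample for the six affine parameters.
        sample = rng.choice(len(ref), size=3, replace=False)
        params = fit_affine(ref[sample], srch[sample])
        inliers = affine_residuals(params, ref, srch) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    return fit_affine(ref[best], srch[best]), best
```

A fixed random seed keeps the result reproducible; in practice the iteration count can be derived from the expected outlier ratio.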

Thirdly, setting geometric constraint conditions;

The geometric constraint is the affine transformation between the reference image and the search image; its 6 parameters are calculated from the homonymous point pairs obtained by SURF preliminary matching. It establishes the geometric relationship between the images, narrows the search range for subsequent precise matching, and improves the matching efficiency for the whole image.

$$x = a_0 + a_1 X + a_2 Y,\qquad y = b_0 + b_1 X + b_2 Y \tag{1}$$

where X and Y are the pixel column and row numbers of a feature point on the reference image, x and y are the pixel column and row numbers of the corresponding homonymous point on the search image, and a₀, a₁, a₂, b₀, b₁, b₂ are the coefficients of the affine transformation polynomials.
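Because the affine model x = a0 + a1·X + a2·Y, y = b0 + b1·X + b2·Y has six unknowns, at least three non-collinear point pairs are needed; with more pairs the parameters can be estimated by least squares, as in the sketch below (NumPy; `solve_affine` is a hypothetical name). The RMS residual gives a quick check of how well the pairs agree with the model.

```python
import numpy as np


def solve_affine(ref_pts, search_pts):
    """Least-squares estimate of [[a0, a1, a2], [b0, b1, b2]] from n >= 3 pairs.

    ref_pts: (n, 2) reference (X, Y) = (column, row) coordinates.
    search_pts: (n, 2) search (x, y) = (column, row) coordinates.
    Returns (params, rms_residual_in_pixels).
    """
    ref = np.asarray(ref_pts, dtype=float)
    srch = np.asarray(search_pts, dtype=float)
    if len(ref) < 3:
        raise ValueError("at least 3 point pairs are needed for 6 parameters")
    design = np.column_stack([np.ones(len(ref)), ref])  # columns: 1, X, Y
    coeffs, _, rank, _ = np.linalg.lstsq(design, srch, rcond=None)
    if rank < 3:
        raise ValueError("point pairs are collinear; parameters are undetermined")
    residuals = design @ coeffs - srch
    rms = float(np.sqrt((residuals ** 2).sum(axis=1).mean()))
    return coeffs.T, rms  # row 0 holds the a's, row 1 the b's
```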

Fourthly, realizing the image space consistency;

Image space consistency means correcting and resampling the search image window into the image space coordinate system of the reference image using the affine transformation coefficients, so that conjugate entities in the reference image and the search image coincide in image space, i.e., there is no rotation, geometric deformation, or scale change between them. As a result, the matching results are coordinates in the image coordinate system of the reference image, whereas conventional matching yields coordinates in both the reference image and the search image. The reason is as follows: conventional matching solves two sets of parameters, a forward set from the reference image to the search image and a backward set in the opposite direction. When the initial transformation parameters are inaccurate, mapping matched points to the search image with the backward parameters produces large errors, and the resulting wrong search positions cause subsequent matching to fail; in addition, repeated forward and backward computations during matching introduce point position errors that degrade the positional accuracy of the matched points.
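A sketch of the correction and resampling step, assuming the parameters map reference coordinates (X, Y) to search coordinates (x, y) as x = a0 + a1·X + a2·Y, y = b0 + b1·X + b2·Y, and using bilinear interpolation (the interpolation method is an assumption). Each pixel of the output window sits on the reference grid, so a match found in it is already expressed in reference image coordinates. The function names are hypothetical.

```python
import numpy as np


def bilinear(img, x, y):
    """Sample img at fractional (column x, row y) positions, bilinearly.

    Positions outside the image are clamped to the border (a simplification).
    """
    h, w = img.shape
    x = np.clip(x, 0, w - 1 - 1e-9)
    y = np.clip(y, 0, h - 1 - 1e-9)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    dx, dy = x - x0, y - y0
    top = img[y0, x0] * (1 - dx) + img[y0, x0 + 1] * dx
    bottom = img[y0 + 1, x0] * (1 - dx) + img[y0 + 1, x0 + 1] * dx
    return top * (1 - dy) + bottom * dy


def resample_to_reference(search_img, params, col0, row0, width, height):
    """Resample the search image onto a width x height reference-grid window.

    Output pixel [i, j] corresponds to reference position (col0 + j, row0 + i).
    """
    rows, cols = np.mgrid[row0:row0 + height, col0:col0 + width].astype(float)
    x = params[0][0] + params[0][1] * cols + params[0][2] * rows
    y = params[1][0] + params[1][1] * cols + params[1][2] * rows
    return bilinear(np.asarray(search_img, dtype=float), x, y)
```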

Fifthly, matching correlation coefficients;

Correlation coefficient matching establishes a correlation function between the gray values of the two images to describe how similar they are. The image correlation coefficient is the covariance of the two signals divided by the square root of the product of their variances, i.e., a normalized covariance. In essence, a target area of the reference image within a certain range (determined by the target area radius) is selected, correlation coefficients are computed against matching windows of the search image (determined by the search area radius), and the point whose correlation coefficient is the largest and exceeds a threshold is taken as the matching point; the size of the window used to compute the covariance is called the matching radius. The correlation coefficient is:

$$\rho(c, r) = \frac{\sigma(x, y)}{\sqrt{\sigma(x, x)\,\sigma(y, y)}}$$

where σ(x, x) is the variance of the pixel gray values in the target area of the reference image, σ(y, y) is the variance of the pixel gray values in the search area of the search image, and σ(x, y) is the covariance between the pixel gray values of the target area and the search area. The pixel position (c, r) at which ρ is maximal and greater than T is the position of the homonymous point; the threshold T is generally 0.7 to 0.9, and T = 0.85 in the invention.
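The correlation coefficient and the exhaustive search for its maximum can be sketched as follows. T = 0.85 is the value given above; the function names and the top-left-corner convention for (c, r) are assumptions (add half the window size to get the window centre).

```python
import numpy as np


def correlation_coefficient(target, window):
    """rho = sigma(x, y) / sqrt(sigma(x, x) * sigma(y, y)) for equal-size windows.

    The 1/n factors of the (co)variances cancel, so plain sums are used.
    """
    a = np.asarray(target, dtype=float)
    b = np.asarray(window, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def best_match(target, search_area, threshold=0.85):
    """Slide target over search_area and keep the offset with the largest rho.

    Returns (c, r, rho), with (c, r) the window's top-left column and row,
    or None if the best rho does not exceed the threshold.
    """
    th, tw = target.shape
    best = None
    for r in range(search_area.shape[0] - th + 1):
        for c in range(search_area.shape[1] - tw + 1):
            rho = correlation_coefficient(target, search_area[r:r + th, c:c + tw])
            if best is None or rho > best[2]:
                best = (c, r, rho)
    return best if best is not None and best[2] > threshold else None
```

A textureless target has zero variance, so its ρ is defined as 0 here and no match is reported, which mirrors why weakly textured areas are hard to match.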

Sixthly, matching image connection points with image space consistency under geometric constraint;

Firstly, point features are extracted with the Forstner operator in a target window of the bottom pyramid layer of the reference image. To distribute the feature points as uniformly as possible, the window is logically partitioned into blocks so that each block contains a similar number of feature points, and the extraction result is mapped to every layer of the image pyramid.
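A simplified sketch of Forstner point extraction, assuming the common formulation in which N is the gradient structure tensor summed over a window, w = det(N)/tr(N) is the interest weight and q = 4·det(N)/tr(N)² is the roundness. The window size, thresholds and non-maximum suppression scheme are illustrative choices, not values from this document.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def forstner_points(img, win=5, q_min=0.5, w_factor=1.5):
    """Return an (n, 2) array of (col, row) interest points and their weights w."""
    if win % 2 == 0:
        raise ValueError("win must be odd")
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # derivatives along rows, then along columns

    def box(a):  # sum over every win x win window (valid positions only)
        return sliding_window_view(a, (win, win)).sum(axis=(-2, -1))

    sxx, syy, sxy = box(gx * gx), box(gy * gy), box(gx * gy)
    det = sxx * syy - sxy ** 2  # determinant of N
    tr = sxx + syy              # trace of N
    with np.errstate(divide="ignore", invalid="ignore"):
        w = np.where(tr > 0, det / tr, 0.0)             # interest weight
        q = np.where(tr > 0, 4 * det / tr ** 2, 0.0)    # roundness in [0, 1]
    candidates = (q > q_min) & (w > w_factor * w.mean())
    # Non-maximum suppression: keep candidates that are the local maximum of w.
    half = win // 2
    local_max = sliding_window_view(np.pad(w, half), (win, win)).max(axis=(-2, -1))
    rows, cols = np.nonzero(candidates & (w == local_max))
    # box() output index i corresponds to the window centred on pixel i + half.
    return np.column_stack([cols + half, rows + half]), w[rows, cols]
```

Straight edges give det(N) ≈ 0 and hence q ≈ 0, so only corner-like points survive the roundness test.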

SURF feature matching is then performed on the top pyramid layers of the two images, and the 6 affine transformation parameters between the top-layer images are calculated. From the target window on the reference image and these parameters, the coordinates of the four corners of the corresponding search window on the search image are obtained, and the search window is resampled with the affine transformation formula using the target window as reference. The feature points extracted by the Forstner operator in the target window are then matched to the resampled search window by the correlation coefficient method to obtain conjugate feature points; those whose correlation coefficient is maximal and above the threshold are kept, and mismatches are then removed following the RANSAC idea to obtain more accurate conjugate points. These accurate conjugate points are used to update the 6 affine transformation parameters, and the updated affine transformation is then used to compute the conjugate points of the remaining feature points in the target window.

The top-level matching result is then propagated to the next layer of the images. Propagation is needed only on the search image, since the reference feature points have already been mapped to every layer. Matching on that layer proceeds as on the top layer, taking care to resample the image with the updated affine transformation parameters each time, until the matching result has been propagated to the bottom layer of the pyramid. Finally, the bottom-layer matching result is used as the initial position for least squares matching to obtain accurate tie point matches.
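When results move one layer down a pyramid built with ratio m, pixel coordinates scale by m, and the affine parameters can be carried along instead of being re-estimated from scratch. The sketch below assumes coordinates scale exactly by m (ignoring any half-pixel shift introduced by the resampling filter); the helper names are hypothetical.

```python
import numpy as np


def points_to_lower_layer(points, ratio=2):
    """Scale (col, row) coordinates from pyramid layer j to layer j - 1."""
    return np.asarray(points, dtype=float) * ratio


def affine_to_lower_layer(params, ratio=2):
    """Rescale [[a0, a1, a2], [b0, b1, b2]] so it maps layer j - 1 coordinates.

    If X' = m X and x' = m x, then x' = m a0 + a1 X' + a2 Y': only the
    translation terms a0 and b0 change; the linear terms carry over unchanged.
    """
    out = np.array(params, dtype=float)
    out[:, 0] *= ratio
    return out
```

The propagated parameters serve only as the starting constraint for the next layer; they are refined again from that layer's matches.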

Finally, the other target windows on the reference image are matched. For the whole image set, all adjacent image pairs with sufficient overlap are matched, yielding a large number of tie points observed in multiple images. Image matching can also be parallelized to improve matching efficiency.

The following example describes homonymous point matching between image A and image B.

① To obtain the positional relationship between image A and image B quickly, both images are downsampled to 1024 × 1024 pixels to obtain images A1 and B1. SURF feature points are extracted and matched on A1 and B1 to obtain a number of homonymous pixel point pairs Pair(A1, B1), which are then mapped back to images A and B according to the sampling ratio to obtain the homonymous point pairs Pair(A, B).

② Substituting the pixel column and row coordinates of the homonymous point pairs Pair(A, B) into formula 1 yields the 6 affine transformation parameters a₀, a₁, a₂, b₀, b₁, b₂ between image A and image B, i.e., the geometric relationship between the pixels of image A and image B.

③ Taking image A as the reference image, it is downsampled repeatedly by a factor of 2 to construct its image pyramid. Point features are then extracted with the Forstner operator in a target window of the bottom pyramid layer, with logical blocking (each block 50 × 50 pixels) ensuring that every block contains feature points, and the extraction result is mapped to every layer of the image pyramid.

④ For each feature point of image A, a 10 × 10 pixel window around it is mapped through the 6 affine transformation parameters to obtain the corresponding window position in image B. The window of image B is then resampled with the affine transformation formula and corrected into the pixel coordinate system of image A. The feature points extracted by the Forstner operator in the target window are matched to the resampled search window by the correlation coefficient method to obtain conjugate feature points; those whose correlation coefficient is maximal and greater than the threshold of 0.8 are kept, and mismatches are then removed following the RANSAC idea to obtain more accurate conjugate points, i.e., the homonymous feature points.
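The bookkeeping in steps ①, ③ and ④ can be sketched with three small helpers: mapping thumbnail matches back to full resolution (resampling to a fixed 1024 × 1024 size changes the aspect ratio, so the column and row scales differ), keeping the strongest candidate(s) in each 50 × 50 pixel block, and mapping the corners of a 10 × 10 window through the affine parameters to find the search window in image B. All names are hypothetical, and the scores may come from any interest measure.

```python
from collections import defaultdict

import numpy as np


def thumb_to_full(pts, full_shape, thumb_shape=(1024, 1024)):
    """Step 1: map (col, row) points from a thumbnail back to full resolution."""
    (full_h, full_w), (thumb_h, thumb_w) = full_shape[:2], thumb_shape
    return np.asarray(pts, dtype=float) * [full_w / thumb_w, full_h / thumb_h]


def balance_by_block(points, scores, block=50, per_block=1):
    """Step 3: keep the `per_block` highest-scoring (col, row) points per block."""
    cells = defaultdict(list)
    for (c, r), s in zip(points, scores):
        cells[(int(c) // block, int(r) // block)].append((s, (c, r)))
    kept = []
    for key in sorted(cells):
        ranked = sorted(cells[key], key=lambda item: item[0], reverse=True)
        kept.extend(p for _, p in ranked[:per_block])
    return kept


def window_in_b(params, col, row, size=10):
    """Step 4: bounding box in B of a size x size window centred on (col, row) in A.

    params = [[a0, a1, a2], [b0, b1, b2]]; returns (min_col, min_row, max_col, max_row).
    """
    half = size / 2
    X = np.array([col - half, col + half, col + half, col - half])
    Y = np.array([row - half, row - half, row + half, row + half])
    x = params[0][0] + params[0][1] * X + params[0][2] * Y
    y = params[1][0] + params[1][1] * X + params[1][2] * Y
    return (int(np.floor(x.min())), int(np.floor(y.min())),
            int(np.ceil(x.max())), int(np.ceil(y.max())))
```

For step ①, each image of a pair is mapped with its own full-resolution shape, e.g. `thumb_to_full(pts_a1, image_a.shape)` and `thumb_to_full(pts_b1, image_b.shape)`. Blocks that contain no candidate at all stay empty in this sketch.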

Collection equipment

To acquire 3D information, the invention provides image acquisition equipment comprising an image acquisition device and an acquisition area moving device, such as a rotating device. The image acquisition device acquires a set of images of the target object through relative motion between its acquisition area and the target object; the acquisition area moving device drives the acquisition area of the image acquisition device into motion relative to the target object. The acquisition area is the effective field of view of the image acquisition device. The acquisition equipment can take the following specific forms:

① Acquisition equipment in which the acquisition area moving device is a rotating structure

As shown in Fig. 3, the target object is fixed on the object stage 1. The rotating device 2 includes a rotation driving device and a rotating arm 3; the rotation driving device can be located above the target object to drive the rotating arm 3 to rotate, the rotating arm 3 is connected to a vertical column extending downward, and the image acquisition device 4 is mounted on the vertical column. Driven by the rotating device 2, the image acquisition device 4 rotates around the target object.

In another case, as shown in Fig. 4, the apparatus includes a circular object stage 1 for carrying the target object. The rotating device 2 comprises a rotation driving device and a rotating arm 3; the rotating arm 3 is bent, and its horizontal lower section is rotatably fixed on the base so that its vertical upper section rotates around the object stage 1. The image acquisition device 4, which acquires images of the target object, is mounted on the upper section of the rotating arm 3; in particular, the image acquisition device 4 can also pitch vertically along the rotating arm to adjust the acquisition angle.

In fact, the ways of rotating the image acquisition device around the target object are not limited to the above; various structures can be used, such as mounting the image acquisition device on an annular track around the target object, on a turntable, or on a rotating cantilever. It is sufficient that the image acquisition device rotates around the target object. The rotation need not be a complete circle; the device may rotate through only a certain angle as the acquisition requires. Nor must the motion be circular: the image acquisition device may follow another curved trajectory, as long as the camera photographs the object from different angles.

Alternatively, in some cases the camera may be fixed while the stage carrying the target object rotates, so that the side of the target facing the image acquisition device changes continuously and the image acquisition device captures images of the target from different angles. In this case, the calculation can still be performed by treating the motion as motion of the image acquisition device, so that it satisfies the corresponding empirical formula (described in detail below). For example, when the stage rotates, it can be assumed that the stage is stationary and the image acquisition device rotates: the spacing between shooting positions of the rotating image acquisition device is set with the empirical formula, from which the rotation speed of the image acquisition device is derived, and from that the rotation speed of the stage. This makes the rotation speed easy to control and enables 3D acquisition.
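The conversion from camera spacing to stage speed can be illustrated as follows, assuming the virtual camera moves on a circle of radius R about the rotation axis and images are taken at a fixed interval; adjacent positions a chord L apart then subtend an angle θ = 2·arcsin(L / 2R). R, the capture interval and the function name are assumptions for illustration, and L would come from the empirical spacing formula.

```python
import math


def stage_rotation_plan(spacing_mm, radius_mm, capture_interval_s):
    """Convert a camera-position spacing into an equivalent turntable schedule.

    Treats the stage as fixed and the camera as moving on a circle of radius
    radius_mm (hypothetical: optical centre to rotation axis); adjacent
    positions are a chord of length spacing_mm apart.
    Returns (angle step in degrees, shots per revolution, stage speed in deg/s).
    """
    if capture_interval_s <= 0 or not 0 < spacing_mm <= 2 * radius_mm:
        raise ValueError("need interval > 0 and 0 < spacing <= 2 * radius")
    step = 2 * math.asin(spacing_mm / (2 * radius_mm))  # chord -> central angle (rad)
    shots = math.ceil(2 * math.pi / step - 1e-9)        # tolerance guards float round-off
    step_deg = math.degrees(step)
    # The stage turns opposite to the virtual camera motion at the same angular speed.
    return step_deg, shots, step_deg / capture_interval_s
```

For example, a spacing equal to the radius gives a 60° step, 6 shots per revolution, and 30°/s if one image is taken every 2 s.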

In addition, to acquire images of the target object from different directions, both the image acquisition device and the target object can be kept still while the optical axis of the image acquisition device is rotated. For example, the acquisition area moving device may be an optical scanning device, so that the acquisition area of the image acquisition device moves relative to the target object without the image acquisition device moving or rotating. The acquisition area moving device then further includes a light deflection unit, which is rotated mechanically, deflects the light path electrically, or consists of multiple groups distributed in space, so that images of the target object can be acquired from different angles. The light deflection unit is typically a mirror that is rotated to collect images of the target object in different directions. Alternatively, mirrors surrounding the target object can be arranged directly in space, and the light from each mirror enters the image acquisition device in turn. As before, rotating the optical axis in this case can be regarded as rotating a virtual position of the image acquisition device; with this conversion, the image acquisition device is treated as rotating, and the calculation is performed with the empirical formula below.

The image acquisition device acquires images of the target object and can be a fixed-focus camera or a zoom camera; in particular, it may be a visible-light camera or an infrared camera. It is of course understood that any device with an image capturing function can be used without limiting the present invention, for example a CCD, a CMOS sensor, a camera, a video camera, an industrial camera, a monitor, a mobile phone, a tablet, a notebook, a mobile terminal, a wearable device, smart glasses, a smart watch, a smart bracelet, and any other device with an image capturing function.

A background plate may also be added to the device when a rotating arrangement is used. The background plate is positioned opposite the image acquisition device. It rotates synchronously when the image acquisition device rotates and stays still when the device is still, so every image of the target object captured by the image acquisition device has the background plate as its background. Alternatively, a completely fixed background plate can be set up for the object, so that it serves as the background however the image acquisition device moves. The background plate is entirely or mostly a solid color. In particular, it can be a white plate or a black plate, with the specific color chosen according to the color of the object. The background plate is usually flat. It may also preferably be curved, for example concave, convex, or spherical, and in some application scenarios it may even have a wavy surface. It can also be assembled in various shapes, for example three flat sections joined into an overall concave shape, or a flat section joined to a curved one.

The device further comprises a processor, also called a processing unit. The processor uses a 3D synthesis algorithm to synthesize a 3D model of the object from the images acquired by the image acquisition device, and thereby obtains the object's 3D information.

Besides the rotation modes described above, some situations leave little space for the rotating device to turn, so its rotation space is limited. In such cases the rotating device may include a rotation driving device and a rotating arm. The arrangement keeps the rotation track of the rotating arm close to the rotation center, or makes the center line of the rotating arm coincide (or approximately coincide) with the rotation center line. The rotation driving device may include a motor connected directly to a straight rotating arm through a gear, in which case the physical center line of the rotating arm coincides with its rotation center line. Alternatively, the rotating arm is L-shaped, with a cross arm and a vertical arm. The cross arm is connected to the rotation driving device, and the image acquisition device is mounted on the vertical arm. The motor of the rotation driving device turns the cross arm, and the vertical arm fixed to it turns accordingly. Here the rotation center line does not coincide with the physical center line of the vertical arm. To save rotation space, this offset can generally be reduced, or the cross arm made shorter. When the L-shaped rotating arm is used, the vertical arm can also be placed inside the target object with the cross arm outside it. This reduces the required rotation space, although the cross arm then needs to be longer.

② Acquisition device whose acquisition area moving device has a translational structure

Besides the rotating structure described above, the image acquisition device may move along a linear trajectory relative to the target object. For example, the image acquisition device sits on a linear track and passes the target object along it, capturing images in sequence without rotating. The linear track can also be replaced by a linear cantilever. More preferably, the image acquisition device rotates by a certain amount as it moves along the linear track, so that its optical axis stays directed toward the target object.
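For the preferred variant, the device must turn by an angle that depends on its position along the track so that the optical axis keeps pointing at the target. Below is a minimal geometric sketch. It assumes a planar layout: the track lies on the x-axis, the target point sits at (`target_x`, `target_distance`) in front of it, and yaw is measured from the track normal. The function name `yaw_towards_target` is hypothetical.

```python
import math


def yaw_towards_target(camera_x: float, target_x: float, target_distance: float) -> float:
    """Yaw angle in degrees that points the optical axis at the target.

    Assumed geometry (hypothetical): the linear track is the x-axis, the
    target lies at (target_x, target_distance) with target_distance > 0,
    and yaw = 0 means the optical axis is perpendicular to the track.
    Positive yaw turns the axis towards +x.
    """
    if target_distance <= 0:
        raise ValueError("target must lie in front of the track")
    return math.degrees(math.atan2(target_x - camera_x, target_distance))


# Sweep the device along the track and print the yaw needed at each stop.
for x in (-1000.0, -500.0, 0.0, 500.0, 1000.0):
    print(f"x = {x:7.1f} mm -> yaw = {yaw_towards_target(x, 0.0, 1000.0):6.2f} deg")
```

With the target directly in front of the track midpoint, the yaw is zero there and grows symmetrically toward both ends of the track.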

③ Acquisition device whose acquisition area moves irregularly

Sometimes the acquisition area moves irregularly. For example, a user may hold the image acquisition device and move it around the target object while shooting. In this case it is difficult to follow a strict track, and the device's movement trajectory is hard to predict accurately. Ensuring that such images can still be synthesized accurately and stably into a 3D model is a difficult problem that has not previously been addressed. A common approach is to take many photographs and rely on their redundancy, but the synthesis results are not stable. Some approaches improve the synthesis by limiting the camera's rotation angle. In practice, however, users cannot judge angles precisely, and even a recommended angle is hard to follow during hand-held shooting. The invention therefore improves the synthesis effect and shortens the synthesis time by limiting the distance the camera moves between two consecutive shots.

For example, during face recognition a user can hold a mobile terminal and move it around his or her face while shooting. If the empirical requirement on the shooting positions (described in detail below) is met, a 3D model of the face can be synthesized accurately. Face recognition is then performed by comparing this model with a pre-stored standard model, for example to unlock the phone or to verify a payment.

For irregular movement, a sensor may be fitted in the mobile terminal or the image acquisition device to measure the linear distance the device moves between two shots. If this distance does not satisfy the empirical condition on L given below, the user is alerted, for example by a sound or a light. Alternatively, the distance moved and the maximum allowed distance L can be shown on the phone screen, or announced by voice, in real time while the user moves the image acquisition device. Sensors that can perform this function include a range finder, a gyroscope, an accelerometer, a positioning sensor, and/or combinations thereof.
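The alarm logic reduces to tracking where the last photo was taken and comparing the straight-line distance moved since then with the allowed spacing. Below is a minimal sketch. It assumes positions arrive as 3D coordinates from some positioning sensor and that the limit `l_max` has already been computed from the empirical condition. The class `CaptureSpacingMonitor` and its methods are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class CaptureSpacingMonitor:
    """Warns when the device has moved too far since the last shot.

    `l_max` is the maximum allowed straight-line distance between two
    consecutive shots, in the same length unit as the positions. Positions
    are assumed to come from a positioning sensor outside this sketch.
    """

    l_max: float
    last_capture: Optional[Tuple[float, ...]] = None

    def record_capture(self, position: Sequence[float]) -> None:
        # Remember where the most recent photo was taken.
        self.last_capture = tuple(position)

    def check(self, position: Sequence[float]) -> Tuple[float, bool]:
        """Return (distance moved since the last shot, whether to raise an alarm)."""
        if self.last_capture is None:
            return 0.0, False
        moved = math.dist(self.last_capture, position)
        return moved, moved > self.l_max


monitor = CaptureSpacingMonitor(l_max=100.0)
monitor.record_capture((0.0, 0.0, 0.0))
for pos in [(30.0, 40.0, 0.0), (60.0, 80.0, 0.0), (61.0, 80.0, 0.0)]:
    distance, alarm = monitor.check(pos)
    print(f"moved {distance:.1f} mm, alarm={alarm}")
```

In a real app, `check` would run on each sensor update to drive the on-screen distance display, and `record_capture` would run whenever a photo is taken.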

④ Multi-camera mode acquisition device

Besides moving the camera and the object relative to each other so that the camera captures the object from different angles, several cameras can be placed at different positions around the object to capture images from different angles simultaneously.

Image acquisition device position optimization

For example, when collecting information on the outer surface of a vase, the image acquisition device can rotate once around the vase to capture images covering the full 360° of its circumference. The positions at which it captures images must then be optimized, or it is difficult to balance the time and the quality of 3D model construction. Besides rotating around the target, several image acquisition devices may capture images simultaneously (see "Multi-camera mode acquisition device" above). Their positions still need optimizing, using the same empirical condition as above. Because there are several devices, the condition then applies to the positions of two adjacent image acquisition devices.

The acquisition area moving device has a rotating structure, with the image acquisition device 4 rotating around the target object 1. During 3D acquisition, the optical axis direction of the image acquisition device 4 changes relative to the target object 1 from one acquisition position to the next. The positions of two adjacent image acquisition devices 4, or two adjacent acquisition positions of the image acquisition device 4, satisfy the following conditions:

$$L < \delta \cdot \frac{T \cdot d}{f}, \qquad \delta < 0.603$$

where L is the linear distance between the optical centers of the image acquisition device 4 at two adjacent acquisition positions; f is the focal length of the image acquisition device 4; d is the length or width of the rectangular photosensitive element (CCD) of the image acquisition device 4; T is the distance along the optical axis from the photosensitive element of the image acquisition device 4 to the surface of the target object 1; and δ is the adjustment coefficient.

When the two positions lie along the length direction of the photosensitive element of the image acquisition device 4, d is the rectangle's length. When they lie along its width direction, d is the rectangle's width.

When the image acquisition device 4 is at either of the two positions, the distance along the optical axis from its photosensitive element to the surface of the object is taken as T. Alternatively, let L be the linear distance between the optical centers of two image acquisition devices $A_n$ and $A_{n+1}$. Let $A_{n-1}$ and $A_{n+2}$ be the two image acquisition devices 4 adjacent to $A_n$ and $A_{n+1}$. If the distances along the optical axis from the photosensitive elements of $A_{n-1}$, $A_n$, $A_{n+1}$ and $A_{n+2}$ to the surface of the target object 1 are $T_{n-1}$, $T_n$, $T_{n+1}$ and $T_{n+2}$, then $T = (T_{n-1} + T_n + T_{n+1} + T_{n+2})/4$. The average may also be taken over more than these four adjacent positions.

As stated above, L should be the straight-line distance between the optical centers of the two image acquisition devices 4. The optical center of an image acquisition device 4 is sometimes hard to locate. In such cases it may be replaced by one of the following reference points on the image acquisition device 4:

- the center of its photosensitive element;
- its geometric center;
- the center of the axis connecting it to the pan-tilt head (or platform, or support);
- the center of the proximal or distal surface of its lens.

Experiments show that the errors introduced by this substitution are within an acceptable range, so these substitutes also fall within the protection scope of the present invention.
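The positioning rule can be turned into a small calculation. The sketch below assumes the condition takes the form L < δ·T·d/f (equivalently δ = L·f/(d·T)), which is the dimensionless combination of the quantities defined above. The function names and the example lens, sensor, and distance values are hypothetical. All lengths must use the same unit.

```python
from statistics import fmean
from typing import Sequence


def mean_object_distance(t_values: Sequence[float]) -> float:
    """T averaged over neighbouring positions, e.g. T_{n-1}, T_n, T_{n+1}, T_{n+2}."""
    if len(t_values) < 4:
        raise ValueError("use at least the four adjacent positions")
    return fmean(t_values)


def adjustment_coefficient(L: float, f: float, d: float, T: float) -> float:
    """delta = L * f / (d * T), under the assumed form of the condition."""
    return L * f / (d * T)


def max_spacing(delta: float, f: float, d: float, T: float) -> float:
    """Largest spacing L between adjacent optical centres for a given delta."""
    return delta * T * d / f


# Hypothetical setup: 50 mm lens, 36 mm sensor length, object about 1 m away.
f, d = 50.0, 36.0
T = mean_object_distance([1000.0, 1010.0, 990.0, 1000.0])
for delta in (0.603, 0.498, 0.356, 0.311):
    print(f"delta < {delta}: L < {max_spacing(delta, f, d, T):.1f} mm")
```

When a lens is swapped, only `f` changes in the call. This reflects the point made below that no object-size or angle measurement is needed.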

In the prior art, parameters such as object size and field of view are generally used to estimate camera positions, and the positional relationship between two cameras is expressed as an angle. Angles are hard to measure in practice, which makes this approach inconvenient. The object size also changes with the object being measured. For example, after collecting 3D information on an adult's head, collecting a child's head requires measuring the head size again and recalculating. Such inconvenient, repeated measurements introduce errors, which in turn cause errors in the estimated camera positions. Based on a large amount of experimental data, this scheme gives an empirical condition that the camera positions must satisfy. The condition avoids precise angle measurement and does not require measuring the object size directly. In the condition, d and f are fixed camera parameters supplied by the manufacturer with the camera and lens, so they need not be measured. T is a straight-line distance and can be measured easily by conventional means such as a ruler or a laser range finder. The empirical formula of the invention therefore makes preparation quick and convenient while improving the accuracy of camera placement. Cameras can thus be placed at optimized positions that balance 3D synthesis accuracy and speed. The specific experimental data are given below.

Experiments were carried out with the apparatus of the present invention, giving the following results.

The camera lens was then replaced and the experiment repeated, giving the following results.

The above results and extensive experimental experience show that δ should satisfy δ < 0.603. In this range part of the 3D model can be synthesized automatically. Some parts cannot be, but this is acceptable when requirements are low, and those parts can be compensated manually or with a different algorithm. In particular, δ < 0.498 gives the best balance between synthesis quality and synthesis time. δ < 0.356 can be chosen for better synthesis, which takes longer but gives better quality. To improve the synthesis further, δ < 0.311 may be selected. When δ = 0.681, synthesis is not possible. The above ranges are only preferred embodiments and should not be construed as limiting the scope of protection.

The experiments also show that the camera's shooting positions can be determined from the formula using only the camera parameters (focal length f and CCD size) and the distance T between the camera CCD and the object surface. This makes the device easy to design and debug. The camera parameters are fixed when the camera is purchased and stated in the product description, so they are readily available. The camera positions can therefore be calculated directly from the formula, without complicated field-of-view or object-size measurements. In particular, when the camera lens has to be replaced, the new positions follow simply from substituting the new lens's focal length f into the formula. Likewise, collecting objects of different sizes would normally require a cumbersome size measurement for each one. The method of the invention determines the camera positions without measuring the object size. The camera positions it gives also balance synthesis time and synthesis quality. The above empirical condition is therefore one of the inventive points of the present invention.

The above data were obtained by acquiring and 3D-synthesizing images of the target's outer surface. Experiments on the target's inner surface and on its connecting portion, carried out in a similar way, gave the following corresponding data:

When the inner surface is acquired, δ should satisfy δ < 0.587. Part of the 3D model can then be synthesized automatically. Some parts cannot be, but this is acceptable when requirements are low, and those parts can be compensated manually or with a different algorithm. In particular, δ < 0.443 gives the best balance between synthesis quality and synthesis time. δ < 0.319 can be chosen for better synthesis, which takes longer but gives better quality. To improve the synthesis further, δ < 0.282 may be chosen. When δ = 0.675, synthesis is not possible. The above ranges are only preferred embodiments and should not be construed as limiting the scope of protection.

When the connecting portion is acquired, δ should satisfy δ < 0.513. Part of the 3D model can then be synthesized and, combined with the images of the inner and outer surfaces, forms a complete 3D model covering both surfaces. Some parts cannot be synthesized automatically, but this is acceptable when requirements are low, and those parts can be compensated manually or with a different algorithm. In particular, δ < 0.415 gives the best balance between synthesis quality and synthesis time. δ < 0.301 can be chosen for better synthesis, which takes longer but gives better quality. To improve the synthesis further, δ < 0.269 may be chosen. When δ = 0.660, synthesis is not possible. The above ranges are only preferred embodiments and should not be construed as limiting the scope of protection.
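The three sets of preferred ranges can be kept as a small lookup table, so that a computed δ is mapped to the reported quality tier for each surface type. The sketch below uses only the threshold values stated above. The tier names and the function `classify_delta` are hypothetical labels.

```python
from typing import Dict, Tuple

# Upper bounds on delta reported above, per surface type:
# (partial synthesis possible, best balance, better quality, best quality).
# Reported failure points, not used in the logic: outer 0.681,
# inner 0.675, connecting 0.660.
THRESHOLDS: Dict[str, Tuple[float, float, float, float]] = {
    "outer": (0.603, 0.498, 0.356, 0.311),
    "inner": (0.587, 0.443, 0.319, 0.282),
    "connecting": (0.513, 0.415, 0.301, 0.269),
}


def classify_delta(delta: float, surface: str) -> str:
    """Return the tightest reported tier that `delta` falls into."""
    partial, balanced, better, best = THRESHOLDS[surface]  # KeyError if unknown
    # Check the strictest bound first so the best matching tier wins.
    for name, bound in (("best", best), ("better", better),
                        ("balanced", balanced), ("partial", partial)):
        if delta < bound:
            return name
    return "outside preferred range"


for surface in THRESHOLDS:
    print(surface, classify_delta(0.45, surface))
```

Keeping the numbers in one table makes it easy to add rows for new surface types if further experiments are run.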

The above data come from experiments verifying the conditions of the formula and do not limit the invention. The validity of the formula does not depend on them. Those skilled in the art can adjust the equipment parameters and procedural details as needed, run experiments, and obtain other data that also satisfy the formula.

3D Synthesis Process

Following the acquisition method described above, the image acquisition device 1 acquires a set of images of the object by moving relative to it. The processing unit then obtains the object's 3D information from multiple images in this set, using the algorithm described below. The processing unit may sit directly in the housing of the image acquisition device 1, or connect to it by a data cable or wirelessly. For example, an independent computer, server, or server cluster may serve as the processing unit, receiving the image data from the image acquisition device 1 and performing the 3D synthesis. The data of the image acquisition device 1 can also be uploaded to a cloud platform, whose computing power is then used for 3D synthesis.

3D synthesis from the collected pictures can use an existing algorithm or the optimized algorithm provided by the invention. The latter mainly comprises the following steps:

Step 1: Perform image enhancement on all input photos. The following Wallis filter enhances the contrast of the original images while suppressing noise.

$$f(x,y) = \left[g(x,y) - m_g\right]\frac{c\,s_f}{c\,s_g + (1-c)\,s_f} + b\,m_f + (1-b)\,m_g$$

In the formula:

- $g(x,y)$ is the gray value of the original image at $(x,y)$;
- $f(x,y)$ is the gray value at the same position after enhancement by the Wallis filter;
- $m_g$ is the local gray mean of the original image;
- $s_g$ is the local gray standard deviation of the original image;
- $m_f$ is the target value of the local gray mean of the transformed image;
- $s_f$ is the target value of the local gray standard deviation of the transformed image;
- $c \in (0, 1)$ is the expansion constant of the image variance;
- $b \in (0, 1)$ is the image brightness coefficient constant.

The filter strongly enhances texture patterns of different scales in the image. This increases the number and accuracy of the point features extracted from the image, and improves the reliability and accuracy of photo feature matching.
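The Wallis filter formula maps directly onto array operations once the local mean $m_g$ and standard deviation $s_g$ are available. Below is a minimal NumPy sketch (NumPy 1.20 or later, for `sliding_window_view`). It assumes a square local window with reflect padding at the borders and 8-bit gray output. The window size and the defaults for $m_f$, $s_f$, $c$ and $b$ are illustrative values, not values taken from this document.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def wallis_filter(g, window=15, m_f=127.0, s_f=50.0, c=0.8, b=0.9):
    """f = (g - m_g) * c*s_f / (c*s_g + (1-c)*s_f) + b*m_f + (1-b)*m_g.

    m_g and s_g are the mean and standard deviation of g in a square window
    around each pixel; `window` must be odd. Defaults are illustrative.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    g = np.asarray(g, dtype=np.float64)
    pad = window // 2
    # Reflect-pad so every pixel has a full window, then take local statistics.
    windows = sliding_window_view(np.pad(g, pad, mode="reflect"), (window, window))
    m_g = windows.mean(axis=(-2, -1))
    s_g = windows.std(axis=(-2, -1))
    gain = c * s_f / (c * s_g + (1.0 - c) * s_f)
    f = (g - m_g) * gain + b * m_f + (1.0 - b) * m_g
    return np.clip(f, 0.0, 255.0)  # keep an 8-bit gray range


rng = np.random.default_rng(0)
flat = 100.0 + 2.0 * rng.standard_normal((64, 64))  # low-contrast texture
enhanced = wallis_filter(flat)
print(f"std before: {flat.std():.2f}, after: {enhanced.std():.2f}")
```

Where local contrast is weak ($s_g$ small), the gain approaches $c/(1-c)$, so faint texture is amplified. Where contrast is already high, the gain drops below 1.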

Step 2: Extract feature points from all input photos and match them to obtain sparse feature points, using the SURF operator. The main steps are:

① Construct the Hessian matrix and generate all interest points used for feature extraction. The aim is to obtain stable edge points (abrupt-change points) of the image.

② Construct the scale space and locate feature points. Each pixel processed by the Hessian matrix is compared with its 26 neighbours in the two-dimensional image space and the scale-space neighbourhood to locate key points preliminarily. Key points with weak energy and wrongly located key points are then filtered out, leaving the final stable feature points.

③ Determine the main orientation of each feature point from the Haar wavelet responses in its circular neighbourhood. The horizontal and vertical Haar responses of all points within a 60° sector are summed. The sector is rotated around the point, and the direction of the sector with the largest response is taken as the main orientation.

④ Generate a 64-dimensional descriptor. A 4×4 grid of sub-regions aligned with the main orientation is placed around the feature point. In each sub-region the horizontal and vertical Haar wavelet responses (relative to the main orientation) of 25 sample points are computed. The four sums Σdx, Σdy, Σ|dx| and Σ|dy| form the sub-region's feature vector, giving 4×4×4 = 64 dimensions.

⑤ Match feature points by the Euclidean distance between their descriptors. A shorter distance indicates a better match.
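Two pieces of this pipeline are compact enough to show directly: the 26-neighbour extremum test in ② and the Euclidean-distance matching in ⑤. The NumPy sketch below is not a full SURF implementation. It omits the box filters, integral images, orientation assignment, and descriptor construction, and assumes the Hessian-determinant responses and 64-dimensional descriptors are already computed. The `threshold` and `max_dist` parameters are hypothetical; practical pipelines often add a nearest-to-second-nearest ratio test, which is not shown.

```python
import numpy as np


def is_scale_space_maximum(responses, s, y, x, threshold):
    """Keep (s, y, x) if its Hessian response beats all 26 neighbours.

    `responses` has shape (n_scales, H, W) and holds det(Hessian) values.
    Responses at or below `threshold` are discarded as weak key points.
    """
    if min(s, y, x) < 1:
        raise ValueError("point must have a full 3x3x3 neighbourhood")
    cube = responses[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    if cube.shape != (3, 3, 3):
        raise ValueError("point must have a full 3x3x3 neighbourhood")
    centre = responses[s, y, x]
    neighbours = np.delete(cube.ravel(), 13)  # index 13 is the centre; 26 remain
    return bool(centre > threshold and centre > neighbours.max())


def match_descriptors(desc_a, desc_b, max_dist):
    """Nearest-neighbour matching of descriptor rows by Euclidean distance."""
    desc_a = np.asarray(desc_a, dtype=np.float64)
    desc_b = np.asarray(desc_b, dtype=np.float64)
    # Pairwise distances, shape (len(desc_a), len(desc_b)).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nearest = dist.argmin(axis=1)
    # Keep only pairs whose distance is below the acceptance threshold.
    return [(i, int(j), float(dist[i, j]))
            for i, j in enumerate(nearest) if dist[i, j] < max_dist]


rng = np.random.default_rng(0)
a = rng.random((5, 64))
b = a[[2, 0, 1]] + 0.001  # three of a's descriptors, shuffled and perturbed
print(match_descriptors(a, b, max_dist=0.3))
```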

Step 3: Input the coordinates of the matched feature points and use bundle adjustment to solve for the sparse three-dimensional point cloud of the face and the position and attitude of each camera. This gives model coordinates for the sparse face point cloud and the camera positions. Then, starting from the sparse feature points, perform multi-view dense matching to obtain dense point cloud data. This process has four main steps: stereo pair selection, depth map calculation, depth map optimization, and depth map fusion. For each image in the input set, a reference image is selected to form a stereo pair for computing its depth map. This yields rough depth maps for all images, which may contain noise and errors. Each image's depth map is then refined by a consistency check against the depth maps of its neighbouring images. Finally, the depth maps are fused to obtain the three-dimensional point cloud of the whole scene.
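Bundle adjustment refines camera poses and 3D points together by minimizing the squared reprojection error, the difference between where a point is observed in an image and where the current estimates project it. The sketch below computes only these residuals for one camera. It assumes a simple pinhole model without lens distortion. A nonlinear least-squares solver (for example Levenberg-Marquardt) would minimize the stacked residuals over all cameras and points; that solver is not shown. The function name is hypothetical.

```python
import numpy as np


def reprojection_residuals(points_3d, observations, R, t, focal, principal_point):
    """Residuals minimised by bundle adjustment for one pinhole camera.

    points_3d: (N, 3) world points; observations: (N, 2) measured pixels;
    R (3x3) and t (3,) map world to camera coordinates: X_c = R @ X + t.
    """
    cam = np.asarray(points_3d, float) @ np.asarray(R, float).T + np.asarray(t, float)
    if np.any(cam[:, 2] <= 0):
        raise ValueError("all points must lie in front of the camera")
    # Perspective division, then scale by focal length and shift to pixels.
    projected = focal * cam[:, :2] / cam[:, 2:3] + np.asarray(principal_point, float)
    return (projected - np.asarray(observations, float)).ravel()


points = [[0.1, -0.2, 2.0], [0.0, 0.0, 4.0]]
observed = [[371.0, 140.0], [320.0, 240.0]]  # first point observed 1 px off
print(reprojection_residuals(points, observed, np.eye(3), np.zeros(3), 1000.0, (320.0, 240.0)))
```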

Step 4: Reconstruct the face surface from the dense point cloud. This comprises defining an octree, setting up the function space, creating the vector field, solving the Poisson equation, and extracting the isosurface. The gradient relationship gives an integral relationship between the sampling points and the indicator function. From it, the vector field of the point cloud is obtained and the gradient field of the indicator function is approximated, which yields a Poisson equation. An approximate solution of this equation is found by matrix iteration. The isosurface is then extracted with the marching cubes algorithm, reconstructing the model of the measured point cloud.
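The step of solving the Poisson equation by matrix iteration can be shown on a much smaller problem. The sketch below swaps in a simplified setting: a uniform 2D grid instead of an octree, plain Jacobi iteration, and a known right-hand side in place of the divergence of the point-cloud vector field. Isosurface extraction is not shown. The function name and iteration count are hypothetical.

```python
import numpy as np


def jacobi_poisson(rhs, initial, iterations=3000, h=1.0):
    """Approximately solve laplace(u) = rhs on a 2D grid by Jacobi iteration.

    The border of `initial` supplies fixed (Dirichlet) boundary values; its
    interior is the starting guess. Uses the 5-point discrete Laplacian.
    """
    u = np.array(initial, dtype=np.float64)
    rhs = np.asarray(rhs, dtype=np.float64)
    for _ in range(iterations):
        # Neighbour sums use only the previous iterate (a Jacobi update).
        neighbours = u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        u[1:-1, 1:-1] = 0.25 * (neighbours - h * h * rhs[1:-1, 1:-1])
    return u


# u = x^2 + y^2 has discrete Laplacian 4 on a unit grid, so it is the exact solution.
x, y = np.meshgrid(np.arange(12.0), np.arange(12.0))
exact = x**2 + y**2
start = exact.copy()
start[1:-1, 1:-1] = 0.0  # keep the boundary, zero the interior guess
solution = jacobi_poisson(np.full_like(exact, 4.0), start)
print(np.abs(solution - exact).max())
```

Production Poisson surface reconstruction uses faster solvers on an adaptive octree. Jacobi is used here only because it is the simplest matrix iteration to read.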

Step 5: Perform fully automatic texture mapping on the face model once the surface model is built. The main process is:

① Obtain texture data, i.e. the surface triangle mesh of the target reconstructed from the images.

② Analyse the visibility of the reconstructed model's triangles. The images' calibration information is used to compute, for each triangle, the set of images in which it is visible and its optimal reference image.

③ Cluster the triangles into texture patches. Based on each triangle's visible image set, optimal reference image, and neighbourhood topology, the triangles are clustered into texture patches, each associated with a reference image.

④ Sort the texture patches automatically to generate the texture image. The patches are sorted by size, a texture image with the smallest enclosing area is generated, and the texture mapping coordinates of each triangle are obtained.
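Step ② can be sketched for a single triangle. The selection criteria below are assumptions, not taken from this document. A camera is counted as seeing the triangle if it lies on the front side of the triangle's plane, with occlusion by other triangles ignored. The optimal reference image is the visible camera whose viewing direction is closest to the triangle normal. Vertices are assumed to be ordered counter-clockwise when seen from outside. The function name is hypothetical.

```python
import numpy as np


def visible_and_best_view(triangle, camera_centres):
    """Visible-image set and best reference image for one mesh triangle.

    Assumed criteria: front-facing test only (no occlusion check), and the
    best view is the visible camera most aligned with the triangle normal.
    """
    v0, v1, v2 = np.asarray(triangle, dtype=np.float64)
    normal = np.cross(v1 - v0, v2 - v0)
    normal /= np.linalg.norm(normal)
    centroid = (v0 + v1 + v2) / 3.0
    visible, best, best_cos = [], None, -1.0
    for k, centre in enumerate(np.asarray(camera_centres, dtype=np.float64)):
        view = centre - centroid
        # Cosine between the surface normal and the direction to the camera.
        cos_angle = float(normal @ view) / float(np.linalg.norm(view))
        if cos_angle > 0.0:
            visible.append(k)
            if cos_angle > best_cos:
                best, best_cos = k, cos_angle
    return visible, best


tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # normal along +z
cams = [(5.0, 0.0, 1.0), (0.3, 0.3, 5.0), (0.0, 0.0, -5.0)]
print(visible_and_best_view(tri, cams))
```

A production pipeline would also reject cameras whose ray to the triangle is blocked by other geometry, for example using a depth buffer per image.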

The above algorithm is the optimized algorithm of the present invention. It is matched to the image acquisition conditions and balances synthesis time and quality, which is one of the inventive points of the present invention. The invention can also be implemented with conventional prior-art 3D synthesis algorithms, although synthesis quality and speed will suffer somewhat.

Utilization of three-dimensional models

With this method, a three-dimensional model of the target object can be synthesized, fully digitizing a real-world object. The digitized information can be used for object identification and comparison, product design, 3D display, medical assistance, and other purposes.

For example, after the three-dimensional information of the face is collected, the three-dimensional information can be used as a basis for identification and comparison to perform 3D identification on the face.

For example, a more conformable garment may be designed for a user using a three-dimensional model of the human body.

For example, after a three-dimensional model of a workpiece is generated, 3D printing can be directly performed.

For example, after a three-dimensional model of the interior of the body is generated, the body information can be digitized for use in simulating surgical procedures for medical teaching.

The terms "target object" and "object" both denote an object whose three-dimensional information is to be acquired. The object may be a single solid object or may consist of several components, for example a head or hands. The three-dimensional information of the target object includes its three-dimensional images, point clouds, meshes, local features, and dimensions, and all other parameters with three-dimensional features. In the present invention, three-dimensional means having information in the X, Y, and Z directions, in particular depth information, which is essentially different from two-dimensional planar information. It is also fundamentally different from definitions that are called 3D, panoramic, holographic, or stereoscopic but actually contain only two-dimensional information and, in particular, no depth information.

In the present invention, the acquisition area is the range within which an image acquisition device (e.g., a camera) can capture images. The image acquisition device can be a CCD, a CMOS sensor, a camera, a video camera, an industrial camera, a monitor, a mobile phone, a tablet, a notebook computer, a mobile terminal, a wearable device, smart glasses, a smart watch, a smart bracelet, or any other device with an image capture function.

In the present invention, rotational movement means either of the following during acquisition. The acquisition planes at the preceding and subsequent positions intersect rather than being parallel, or the optical axes of the image acquisition device at the preceding and subsequent positions intersect rather than being parallel. In other words, the acquisition area of the image acquisition device moves around, or partially around, the target, and both cases count as relative rotation. The embodiments of the present invention mostly illustrate rotation along a track. However, any non-parallel motion between the acquisition area of the image acquisition device and the target object is rotation within the meaning of the invention. The scope of the invention is not limited to embodiments with track rotation.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an apparatus in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Thus, it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been illustrated and described in detail herein, many other variations or modifications consistent with the principles of the invention may be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and interpreted to cover all such other variations or modifications.

Claims

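1. A method for fast matching of images in the process of three-dimensional model construction, characterized by comprising:

step 1: selecting a reference image and a search image to be matched;

step 2: constructing an image pyramid for each of the two images;

step 3: performing SURF feature matching on the 1st layer at the tops of the two image pyramids to obtain a preliminary match;

step 4: establishing a geometric constraint condition between the 1st layers of the two image pyramids;

step 5: correcting and resampling a window of the current layer of the search image pyramid into the image-space coordinate system of the 1st layer of the reference image pyramid;

step 6: establishing a correlation function between the grey values of the two images at the corresponding layer of the two image pyramids;

step 7: establishing a geometric constraint condition between the next layers of the two image pyramids according to the matching result, and repeating steps 5 and 6 until matching of the last layer of the two image pyramids is completed.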

2. The method of claim 1, wherein the image pyramid is a set of images of progressively decreasing resolution arranged in a pyramid shape, the bottom of the pyramid being the original image and the top being a low-resolution image downsampled from the original image.

3. The method of claim 1, wherein the geometric constraint condition is an affine transformation between the reference image and the search image, and the 6 affine transformation parameters are calculated from the homonymous (corresponding) point pairs obtained by SURF image matching.

4. The method of claim 1, wherein step 5 comprises: correcting and resampling the search image window into the image-space coordinate system of the reference image using the affine transformation coefficients, so that conjugate features of the reference image and the search image are geometrically consistent in image space, i.e., there is no rotation, geometric deformation or scale change between the reference image and the search image.

5. The method of claim 1, wherein step 6 comprises: calculating correlation coefficients between the target-area window of the reference image and matching windows of the search image within a preset range, and taking the point whose correlation coefficient is the maximum and is greater than a threshold as the matching point.

6. The method of claim 1, wherein, during image acquisition, the image acquisition is carried out such that:

7. The method of claim 6, wherein δ < 0.603; preferably δ < 0.498, δ < 0.356, or δ < 0.311.

8. A three-dimensional model construction method, characterized by comprising the matching method of any one of claims 1-7.

9. A three-dimensional data comparison method, characterized by comprising the matching method of any one of claims 1-7.
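As a rough, non-authoritative illustration of claims 2 and 3, the Python sketch below builds an image pyramid and fits the six affine parameters to matched point pairs. Several details are assumptions not stated in the claims: 2x2 block averaging is used as the downsampling filter, the parameters are estimated by ordinary least squares, images are plain lists of grey-value rows, and the names `build_pyramid`, `fit_affine` and `(a0, a1, a2, b0, b1, b2)` are hypothetical. The SURF matching that supplies the homonymous point pairs is not shown and is assumed to come from an external implementation.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]      # grey values, img[row][col] (assumed representation)
Point = Tuple[float, float]    # (x = column, y = row)


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [original, 1/2, 1/4, ...]; index 0 is the pyramid bottom.

    Each coarser level averages non-overlapping 2x2 blocks (assumed filter).
    Stops early once a level would have zero rows or columns.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        if h == 0 or w == 0:
            break
        pyramid.append([[(prev[2 * r][2 * c] + prev[2 * r][2 * c + 1]
                          + prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]) / 4.0
                         for c in range(w)] for r in range(h)])
    return pyramid


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration (e.g. collinear points)")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back-substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(ref_pts: Sequence[Point], srch_pts: Sequence[Point]) -> Tuple[float, ...]:
    """Least-squares estimate of (a0, a1, a2, b0, b1, b2) such that
    x_s = a0 + a1*x_r + a2*y_r and y_s = b0 + b1*x_r + b2*y_r."""
    if len(ref_pts) < 3 or len(ref_pts) != len(srch_pts):
        raise ValueError("need at least 3 matched point pairs")
    ata = [[0.0] * 3 for _ in range(3)]   # normal matrix A^T A, rows of A are (1, x_r, y_r)
    atx = [0.0] * 3                       # A^T x_s
    aty = [0.0] * 3                       # A^T y_s
    for (xr, yr), (xs, ys) in zip(ref_pts, srch_pts):
        row = (1.0, xr, yr)
        for i in range(3):
            atx[i] += row[i] * xs
            aty[i] += row[i] * ys
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return (*_solve3(ata, atx), *_solve3(ata, aty))
```

Three non-collinear pairs determine the six parameters exactly; with more pairs the least-squares fit averages out small localisation errors, which is one plausible reason to prefer it over solving from exactly three points.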
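Claims 4 and 5 (rectify a search-image window with the affine coefficients, then keep the position whose correlation coefficient is largest and above a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: images are lists of grey-value rows, resampling is bilinear, the "correlation coefficient" is taken to be the normalised cross-correlation (Pearson) coefficient, and `rectify_window`, `ncc`, `match_point` and the default `half`, `radius` and `threshold` values are hypothetical choices made only for this example.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[float]]                      # grey values, img[row][col]
Affine = Tuple[float, float, float, float, float, float]
IDENTITY: Affine = (0.0, 1.0, 0.0, 0.0, 0.0, 1.0)


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly interpolated grey value at column x, row y (clamped to the border)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify_window(img: Image, aff: Affine, cx: int, cy: int, half: int) -> Image:
    """Resample a (2*half+1)^2 window centred on reference pixel (cx, cy).

    aff maps reference (x, y) to (a0 + a1*x + a2*y, b0 + b1*x + b2*y) in img,
    so the returned window is expressed in the reference image's coordinates.
    """
    a0, a1, a2, b0, b1, b2 = aff
    return [[bilinear(img, a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y)
             for x in range(cx - half, cx + half + 1)]
            for y in range(cy - half, cy + half + 1)]


def ncc(p: Image, q: Image) -> float:
    """Correlation coefficient of two equally sized windows (0.0 if either is flat)."""
    a = [v for row in p for v in row]
    b = [v for row in q for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den > 0 else 0.0


def match_point(ref: Image, search: Image, aff: Affine, x: int, y: int,
                half: int = 3, radius: int = 2,
                threshold: float = 0.8) -> Optional[Tuple[Tuple[float, float], float]]:
    """Return (search-image point, score) for reference pixel (x, y), or None.

    Candidates are offsets within +/-radius pixels (in reference coordinates)
    of the affine prediction; the best one is kept only if its score exceeds
    threshold.
    """
    template = rectify_window(ref, IDENTITY, x, y, half)
    best: Optional[Tuple[float, int, int]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = ncc(template, rectify_window(search, aff, x + dx, y + dy, half))
            if best is None or score > best[0]:
                best = (score, x + dx, y + dy)
    score, bx, by = best
    if score <= threshold:
        return None
    a0, a1, a2, b0, b1, b2 = aff
    return (a0 + a1 * bx + a2 * by, b0 + b1 * bx + b2 * by), score
```

Because the window is resampled into the reference frame before correlating, rotation and scale differences predicted by the affine model no longer degrade the correlation, so only a small translational search radius is needed. In a coarse-to-fine loop, the points accepted at one pyramid level could be used to refit the affine model (for example by least squares) for the next, finer level; that outer loop is not shown here.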