CN117197405A

Movatterモバイル変換

Info

Publication number: CN117197405A
Application number: CN202311267464.4A
Authority: CN
Inventors: 尹英杰; 丁菁汀; 李亮
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2023-12-08

Abstract

The embodiment of the specification discloses an augmented reality method, a system and a storage medium of a three-dimensional object, wherein the method can firstly perform feature extraction processing on an image to be processed of a first terminal to obtain a first feature when performing special effect rendering on the three-dimensional object on the image; then, combining the feature matching processing result of the first feature and the point cloud feature set to obtain a target point cloud feature; then, camera pose data can be calculated according to the first feature and the target point cloud feature, and rendering processing is carried out on the image to be processed based on the camera pose data and the target point cloud feature.

Description

Augmented reality method, system and storage medium for three-dimensional object

Technical Field

The embodiment of the specification belongs to the technical field of augmented reality, and particularly relates to an augmented reality method, an augmented reality system and a storage medium for a three-dimensional object.

Background

As a technology for promoting integration between real world information and virtual world information content, the augmented reality technology has been widely used in various fields because of its strong interactivity and interest.

Since most of the augmented reality technologies show corresponding interactive special effects by means of identification codes, how to realize increasing the interestingness of the interactive special effects has become a major consideration, and therefore a technical scheme for increasing the interestingness of the interactive special effects is needed.

Disclosure of Invention

The embodiment of the specification provides an augmented reality method, a system and a storage medium for a three-dimensional object, and the technical scheme is as follows:

in a first aspect, embodiments of the present disclosure provide an augmented reality method for a three-dimensional object, including:

performing feature extraction processing on an image to be processed of a first terminal to obtain a first feature;

performing feature matching processing on the point cloud feature set based on the first feature to obtain a target point cloud feature; the point cloud feature set consists of point cloud features corresponding to at least two three-dimensional objects;

and calculating camera pose data according to the first features and the target point cloud features, and rendering an image to be processed based on the camera pose data and the target point cloud features.

In a second aspect, embodiments of the present disclosure provide an augmented reality system for a three-dimensional object, comprising:

the feature extraction module is used for carrying out feature extraction processing on the image to be processed of the first terminal to obtain first features;

the feature matching module is used for carrying out feature matching processing on the point cloud feature set based on the first feature to obtain a target point cloud feature; the point cloud feature set consists of point cloud features corresponding to at least two three-dimensional objects;

And the image rendering module is used for calculating camera pose data according to the first features and the target point cloud features, and rendering the image to be processed based on the camera pose data and the target point cloud features.

In a third aspect, embodiments of the present disclosure also provide an augmented reality system for a three-dimensional object, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned augmented reality method steps of the three-dimensional object.

In a fourth aspect, embodiments of the present description provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-described augmented reality method steps of a three-dimensional object.

The technical scheme provided by some embodiments of the present specification has the following beneficial effects:

in one or more embodiments of the present disclosure, in a process of performing special effect rendering on a three-dimensional object on an image, feature extraction processing may be performed on an image to be processed of a first terminal, to obtain a first feature; then, carrying out feature matching processing on the point cloud feature set based on the first feature to obtain a target point cloud feature; and then, calculating camera pose data according to the first features and the target point cloud features, and rendering an image to be processed based on the camera pose data and the target point cloud features. By means of matching processing of the features in the image to be processed and the point cloud feature sets corresponding to the plurality of three-dimensional objects, a user can achieve special effect display of the corresponding three-dimensional objects in the process of scanning the objects, interestingness and interactivity of special effect generation are effectively improved, calculated camera pose data can be combined, and special effect display effect on the image to be processed is effectively guaranteed.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present description, the drawings that are required in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present description, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of a system architecture of an augmented reality method for a three-dimensional object according to an embodiment of the present disclosure;

FIG. 2 is an overall flow chart of an augmented reality method for a three-dimensional object provided in an embodiment of the present disclosure;

fig. 3 is a schematic view of an effect of an image to be processed according to an embodiment of the present disclosure;

FIG. 4 is a schematic view of the effect of a three-dimensional image set according to the embodiment of the present disclosure;

FIG. 5 is a general flow chart of yet another method of augmented reality for a three-dimensional object provided by embodiments of the present disclosure;

fig. 6 is a schematic view of the effect of yet another image to be processed according to the embodiment of the present disclosure;

fig. 7 is a schematic structural diagram of an augmented reality system for a three-dimensional object according to an embodiment of the present disclosure;

fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification.

The terms first, second, third and the like in the description and in the claims and in the above drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

The following description provides examples and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the disclosure herein. Various examples may omit, replace, or add various procedures or components as appropriate. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.

The present disclosure may introduce some fields of application of augmented reality technology before describing in detail the method of augmented reality for a three-dimensional object in connection with one or more embodiments.

The augmented reality technology can be understood as a well-known AR technology, that is, a special effect of a virtual object is presented on a mobile terminal, and the special effect presentation mode of the virtual object can be, but not limited to, a dynamic special effect automatically and circularly played on a display interface, or a corresponding special effect is displayed according to user operation received by the display interface, and the technology has been widely applied in a plurality of fields. For example, taking the marketing field as an example, a user can scan a product code or a two-dimensional code corresponding to a marketing product through a mobile terminal, so that a corresponding product special effect is presented on a display interface of the mobile terminal, and the user can more truly experience the product; for example, taking the live broadcast field as an example, a user can select a virtual product in a live broadcast display interface of a mobile terminal, so that the virtual product is overlapped in the current live broadcast display interface in a picture special effect overlapping mode, and the live broadcast interaction experience of the user is enhanced conveniently; for example, in the social field, the user may scan the identification code or the two-dimensional code of the stranger through the mobile terminal, so as to display the identity special effect of the stranger on the display interface of the mobile terminal, thereby realizing the interesting of making friends.

It can be understood that in the process of displaying the corresponding special effects through the above mentioned identification codes, the camera of the mobile terminal actually obtains the plane of the identification code to identify the pose data in the coordinate system of the plane of the identification code, then the pose data of the plane of the identification code can be converted into the pose data in the corresponding camera coordinate system, then the pose data in the corresponding camera coordinate system can be converted into the pose data in the corresponding screen coordinate system, and further the corresponding special effects displayed on the identification code on the display interface of the mobile terminal can be realized by combining the corresponding rendering mode.

However, the interactive special effect displayed in the mode of the identification code is mainly aimed at a planar target, the interestingness of the interactive special effect corresponding to the three-dimensional target cannot be effectively improved, and the three-dimensional interactive experience of a user is easily affected. In order to increase the interest of the interactive special effects corresponding to the three-dimensional object, the method for enhancing the reality of the three-dimensional object is described in detail below in connection with one or more embodiments.

Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture of an augmented reality method for a three-dimensional object according to an embodiment of the present disclosure.

As shown in fig. 1, the system architecture of the augmented reality method of the three-dimensional object may include a server 101, a first terminal 102 and a second terminal 103, where the server 101 may establish a connection with the first terminal 102 and the second terminal 103, respectively, for receiving images acquired by the first terminal 102 and the second terminal 103, and sending interactive special effects corresponding to a target three-dimensional object to the first terminal 102 or the second terminal 103, where the target three-dimensional object may be in the images acquired by the first terminal 102 or the second terminal 103. The server 101 may be understood as a processing platform, configured to acquire an image including a target three-dimensional object acquired by the first terminal 102, and obtain a target point cloud feature matched with the target three-dimensional object by performing matching processing on the image including the target three-dimensional object and the point cloud feature set. Here, the point cloud feature set may be obtained by processing, by the server 101, the second terminal 103 based on video or multiple frames of continuous images acquired by different types of three-dimensional objects (the video or multiple frames of continuous images show different angles of the three-dimensional object), and may include, but not limited to, point cloud features respectively corresponding to at least two types of three-dimensional objects, where each type of point cloud feature may specifically include spatial point cloud position coordinates corresponding to multiple feature points of the three-dimensional object, plane position coordinates of each feature point in the video or multiple frames of continuous images acquired by the second terminal 103, and local features corresponding to the plane position coordinates, where the local features may be, but are not limited to, feature descriptors.

In the embodiment of the present disclosure, before acquiring the image including the target three-dimensional object acquired by the first terminal 102, the server 101 may obtain the corresponding point cloud feature set by processing the video or the multi-frame continuous image acquired by the second terminal 103 based on the three-dimensional object of different types; and the system architecture of the augmented reality method of the three-dimensional object may only include the server 101 and the first terminal 102 without acquiring the video or the multi-frame continuous images acquired based on the three-dimensional object of different types through the second terminal 103 when the video or the multi-frame continuous images corresponding to the three-dimensional object of different types are stored in advance.

It can be understood that the target point cloud feature matching the target three-dimensional object may be a target point cloud feature corresponding to a three-dimensional object similar to the target three-dimensional object (with a similar shape or appearance), which may be, but is not limited to, determined by a distance calculation manner between feature points or a distance calculation manner between local features corresponding to feature points, and is not limited to this embodiment.

Then, after obtaining the target point cloud feature matching the target three-dimensional object, the server 101 may calculate corresponding camera pose data by combining, but not limited to, the feature of the target three-dimensional object with the target point cloud feature, where the feature of the target three-dimensional object may be obtained by, but not limited to, performing feature extraction processing on an image including the target three-dimensional object, and may specifically include plane position coordinates corresponding to a plurality of feature points of the target three-dimensional object, and corresponding local features, and the local features may be, but not limited to, feature descriptors. It can be appreciated that the camera pose data may, but is not limited to, be composed of 6 degrees of freedom, that is, the position coordinates of the camera corresponding to the first terminal 102 in the world coordinate system and the corresponding roll angle, pitch angle and yaw angle, and the camera pose data may specifically be calculated by the pose estimation method from the plane position of the target three-dimensional object and the corresponding matched spatial point cloud position coordinates.

Of course, the above mentioned multiple coordinate conversion processes and rendering manners may also be processed by a preset rendering engine, that is, the server 101 may send the camera pose data and the corresponding three-dimensional object rendering special effects to the rendering engine, so that the rendering engine may render the special effects according to the camera pose data and the corresponding three-dimensional object, and perform rendering processing on the image acquired by the first terminal 102, so that the special effects are attached to the target three-dimensional object in the image.

In this embodiment of the present disclosure, the first terminal 102 may be understood as a user terminal for acquiring an image including a target three-dimensional object, that is, after the user may, but is not limited to, acquire an image meeting the user requirement by a third party application program on the first terminal 102, the server 101 corresponding to the third party application program performs special effect processing on the image, and displays a special effect of the three-dimensional object corresponding to the target three-dimensional object on a display interface of the third party application program, where the special effect of the three-dimensional object may be attached to the target three-dimensional object.

The second terminal 103 may be understood as a data terminal for establishing a point cloud feature set, which may, but is not limited to, collect multi-angle video or multi-frame continuous images of different three-dimensional objects by a merchant corresponding to the above-mentioned third party application, and process the multi-angle video or multi-frame continuous images of different three-dimensional objects by the server 101 corresponding to the third party application to obtain the point cloud feature set. It can be understood that the point cloud feature set may further perform corresponding update processing according to the multi-angle video or the multi-frame continuous image of the new type three-dimensional object sent by the second terminal 103, so as to meet the multiple requirements of the user in real time.

The first terminal 102 or the second terminal 103 according to the embodiments of the present disclosure may be a mobile phone, a tablet computer, a desktop, a laptop, a notebook, an Ultra mobile personal computer (Ultra-mobile Personal Computer, UMPC), a handheld computer, a netbook, a personal digital assistant (Personal Digital Assistant, PDA), or the like.

Referring next to fig. 2, fig. 2 is a flowchart illustrating an overall method for augmented reality of a three-dimensional object according to an embodiment of the present disclosure.

As shown in fig. 2, the augmented reality method of the three-dimensional object may at least include the following steps:

step 202, performing feature extraction processing on an image to be processed of a first terminal to obtain a first feature.

Specifically, in the process of performing special effect rendering on the three-dimensional object, the image to be processed including the target three-dimensional object acquired by the first terminal may be received, where the first terminal may be understood as a user terminal, that is, the user may, but is not limited to, acquire the image to be processed meeting the user requirement through a third party application program on the first terminal.

Referring to fig. 3, as shown in fig. 3, the first terminal includes a three-dimensional object image consistent with a basketball shape in the image to be processed captured by the third party application.

Further, after the to-be-processed image including the target three-dimensional object is obtained, feature extraction processing may be performed on the to-be-processed image to obtain position coordinates of a plurality of feature points corresponding to the target three-dimensional object in the to-be-processed image, and local features corresponding to each position coordinate.

In the process of performing feature extraction processing on the image to be processed, the feature point detection algorithm may be used to determine a plurality of feature points corresponding to the target three-dimensional object, that is, any one pixel point is selected from the image to be processed, a brightness threshold is determined according to the brightness value of the pixel point, and whether the brightness value corresponding to the continuous n pixel points is greater than the sum of the brightness value of the selected pixel point and the brightness threshold or less than the difference between the brightness value of the selected pixel point and the brightness threshold (n is a positive integer greater than or equal to 2) is determined in the range of taking the pixel point as the center of a circle and the preset radius. It can be understood that when there are n pixel points whose corresponding luminance values are greater than the sum of the luminance value of the selected pixel point and the luminance threshold value or less than the difference between the luminance value of the selected pixel point and the luminance threshold value, the selected pixel point can be used as a feature point in the image to be processed, and the position coordinates of the feature point in the image to be processed can be recorded. Of course, the feature point in the image to be processed can also be determined by comparing the difference between the gray value of any one selected pixel point and the gray values of all adjacent pixel points, and the method is not limited thereto.

After determining a plurality of feature points in the image to be processed, a corresponding local feature can be obtained by comparing the gray value differences between all pixel points around each feature point, where the local feature can be understood as a feature descriptor, that is, a binary descriptor, and a character 0 and a character 1 in the binary descriptor can correspond to the gray value of all pixel points around the feature point and the size relationship between the feature point, for example, when the gray value of a certain pixel point around is smaller than the gray value of the feature point, a character in the binary descriptor corresponds to 0; when the gray value of a certain pixel point around is larger than the gray value of the feature point, the character in the binary descriptor corresponds to 1.

In the embodiment of the present disclosure, in addition to the above-mentioned feature extraction processing manner, the feature points of the target three-dimensional object may be extracted from the image to be processed by using a SIFT feature extraction algorithm or a convolutional neural network including a feature point detector and a description sub-network, which is not limited thereto.

And 204, performing feature matching processing on the point cloud feature set based on the first feature to obtain the target point cloud feature.

Specifically, after determining the first feature corresponding to the target three-dimensional object in the image to be processed, feature matching processing may be performed on a pre-stored point cloud feature set according to the first feature, so as to screen out target point cloud features matched with the target three-dimensional object from the point cloud feature set. The pre-stored point cloud feature set may be obtained by, but not limited to, processing video or multi-frame continuous images of different types of three-dimensional objects collected by the second terminal, where the point cloud feature set may specifically include spatial point cloud position coordinates corresponding to a plurality of feature points of each three-dimensional object collected by the second terminal, plane position coordinates of each feature point in the video or multi-frame continuous images collected by the second terminal, and local features corresponding to the plane position coordinates.

The second terminal may be understood as a data terminal for establishing a point cloud feature set, which may, but is not limited to, capture multi-angle videos or multi-frame continuous images of different three-dimensional objects by a merchant corresponding to the above-mentioned third party application, for example, the first terminal may be a user terminal corresponding to a different user in the marketing field, the second terminal may be a data terminal corresponding to a marketing merchant, the user may capture, through the first terminal, a to-be-processed image of a desired interactive special effect in the third party application, and the marketing merchant may capture multi-angle videos or multi-frame continuous images of all types of three-dimensional objects included in the third party application through the second terminal.

As an option of the embodiment of the present disclosure, before performing feature matching processing on the point cloud feature set based on the first feature, before obtaining the target point cloud feature, the method further includes:

acquiring a three-dimensional image set corresponding to each three-dimensional object based on a second terminal; each three-dimensional image set comprises at least two three-dimensional object images, and the shooting angles of the three-dimensional objects corresponding to each three-dimensional object image are different;

performing image pairing processing on all three-dimensional object images in each three-dimensional image set to obtain at least two paired images corresponding to each three-dimensional image set; wherein each set of paired images includes two three-dimensional object images;

respectively carrying out feature extraction processing on each three-dimensional object image in each group of paired images to obtain a second feature of each three-dimensional object image;

and obtaining a point cloud feature set based on each three-dimensional image set and the second features of all corresponding three-dimensional object images.

Before the feature matching processing is performed on the point cloud feature set based on the first feature, a corresponding point cloud feature set can be obtained through the three-dimensional image set acquired by the second terminal, so that the authenticity and reliability of the interactive special effect are guaranteed.

Specifically, in the process of obtaining the point cloud feature set, the method may, but is not limited to, receive a three-dimensional image set corresponding to each three-dimensional object collected by the second terminal, where the type of each three-dimensional object may be consistent with the type of the three-dimensional object set in the third party application program mentioned above, that is, after the user collects, through the first terminal, an image to be processed on the third party application program, an interactive special effect corresponding to a certain three-dimensional object, which is set in the third party application program, is finally presented on the first terminal.

Here, the three-dimensional image set corresponding to each three-dimensional object may be understood as surrounding shooting of each three-dimensional object through the second terminal, and the collected multi-angle video or multi-frame continuous image may be collected for the purpose of the surrounding shooting, so that all angles of each three-dimensional object may be collected, so as to obtain a better reduction effect of the three-dimensional object in the interactive special effect. It is to be understood that the above-mentioned continuous images may be continuous images of a plurality of frames having identical time intervals, and the photographing angles of the three-dimensional objects in each image are different.

Referring to fig. 4, an effect diagram of a three-dimensional image set provided in the embodiment of the present disclosure is shown in fig. 4, where the three-dimensional image set includes two continuous images, a first image (i.e., an effect diagram of an upper half of fig. 4) may be a schematic side view of a three-dimensional object with a shape consistent with a basketball shape, and a second image (i.e., an effect diagram of a lower half of fig. 4) may be a schematic front view of the three-dimensional object with a shape consistent with the basketball shape, and is not limited to the continuous image frame number shown in fig. 4.

Then, after the multiple sets of paired images are determined, feature extraction processing can be performed on two three-dimensional object images in each set of paired images respectively, so as to obtain position coordinates of a plurality of feature points corresponding to the three-dimensional object in each three-dimensional object image and local features corresponding to each position coordinate.

In the process of performing feature extraction processing on each three-dimensional object image in each paired image, a plurality of feature points corresponding to the three-dimensional object may be determined by a feature point detection algorithm, that is, any one pixel point is selected in each three-dimensional object image, a brightness threshold is determined according to a brightness value of the pixel point, and whether the brightness value corresponding to n continuous pixel points is greater than the sum of the brightness value and the brightness threshold of the selected pixel point or less than the difference between the brightness value of the selected pixel point and the brightness threshold (n is a positive integer greater than or equal to 2) is determined in a range with the pixel point as a center of a preset radius. It can be understood that when there are n pixel points whose corresponding luminance value is greater than the sum of the luminance value of the selected pixel point and the luminance threshold value or less than the difference between the luminance value of the selected pixel point and the luminance threshold value, the selected pixel point can be used as a feature point in each three-dimensional object image, and the position coordinates of the feature point in each three-dimensional object image can be recorded. Of course, the feature point in the image to be processed can also be determined by comparing the difference between the gray value of any one selected pixel point and the gray values of all adjacent pixel points, and the method is not limited thereto.

After determining the plurality of feature points in each three-dimensional object image, a corresponding local feature may be obtained, but is not limited to, by comparing the differences between gray values of all pixel points around each feature point, where a local feature may be understood as a feature descriptor, that is, a binary descriptor, and a character 0 and a character 1 in the binary descriptor may correspond to the magnitude relation between the gray values of all pixel points around a feature point and the feature point, for example, when the gray value of a certain pixel point around is smaller than the gray value of the feature point, a character in the binary descriptor corresponds to 0; when the gray value of a certain pixel point around is larger than the gray value of the feature point, the character in the binary descriptor corresponds to 1.

Then, based on each three-dimensional image set and the second characteristics of all the corresponding three-dimensional object images, the point cloud characteristics corresponding to each three-dimensional image set can be obtained, and the point cloud characteristics corresponding to all the three-dimensional image sets can be used as the point cloud characteristic sets.

As a further alternative of the embodiments of the present specification, obtaining a set of point cloud features based on each set of three-dimensional images and the second features of the respective all three-dimensional object images, includes:

Performing feature point matching processing on two second features corresponding to each group of paired images in each three-dimensional image set to obtain matched feature points;

performing point cloud reconstruction processing based on all the matching feature points corresponding to each three-dimensional image set and corresponding camera parameters to obtain point cloud features;

and obtaining a point cloud feature set according to each three-dimensional image set and the corresponding point cloud feature.

Because the second feature of each three-dimensional object image in each group of paired images of each three-dimensional image set comprises plane position coordinates, the camera parameters of the second terminal and the matched feature points of each three-dimensional object image are required to be combined, and corresponding point cloud features are obtained in a point cloud reconstruction processing mode, so that camera pose data with higher precision can be obtained later.

Then, after obtaining all the matching feature points corresponding to each three-dimensional image set, the camera parameters corresponding to the second terminal may be obtained by, but not limited to, a camera calibration processing manner, where the camera parameters may specifically be an internal reference matrix and an external reference matrix of the camera, the parameter types in the internal reference matrix may be parameters such as a focal length, a principal point position, and a distortion parameter, and the external reference matrix may include a rotation matrix and a translation matrix of the camera. Here, the processing manner of camera calibration is a common technical means in the field, and will not be repeated here.

Then, after obtaining the camera parameters corresponding to the second terminal, a corresponding projection matrix may be constructed based on the camera parameters, and all the matching feature points corresponding to each three-dimensional image set may be converted according to the projection matrix to obtain corresponding homogeneous coordinate pairs (including the plane position coordinates of each matching feature point and the coordinates after conversion treatment), and a homogeneous linear equation may be constructed for the homogeneous coordinates corresponding to all the matching feature points, and the homogeneous linear equation may be solved by a calculation manner such as a least square method, so as to obtain corresponding homogeneous coordinate estimation values, that is, space point cloud position coordinates. In order to ensure the effectiveness of the space point cloud position coordinates, after the homogeneous coordinate estimated value is obtained, normalization processing is performed on the homogeneous coordinate estimated value, and the homogeneous coordinate estimated value after normalization processing is used as the space point cloud position coordinates.

In the embodiment of the present disclosure, in order to ensure accuracy of the spatial point cloud position coordinates and the camera parameters, after the spatial point cloud position coordinates are obtained, the spatial point cloud position coordinates may be converted based on the camera parameters to obtain projection plane coordinates corresponding to the spatial point cloud position coordinates, and the re-projection errors between the projection plane coordinates and the corresponding plane position coordinates may be used as optimization targets, and the spatial point cloud position coordinates and the camera parameters may be optimized by using an optimization algorithm such as a Levenberg-Marquard algorithm.

Then, after obtaining all the space point cloud position coordinates corresponding to each three-dimensional image set, the plane position coordinates of the feature points corresponding to each space point cloud position coordinate in all the paired images and the corresponding local features can be used as the point cloud features corresponding to each three-dimensional image set, and the set of the point cloud features corresponding to all the three-dimensional image sets can be used as the point cloud feature set.

Here, the point cloud feature corresponding to each three-dimensional image set may be represented as, but not limited to:

P＝{(X_i ，Y_i ，Z_i ，F_i )，0＜i≤N}

in the above expression, (X)_i ，Y_i ，Z_i ) Can be corresponding to the point cloud coordinates of the ith feature point in the reconstruction space, F_i ＝{(x_j ，y_j )，f_j 0 < j.ltoreq.M may correspond to the local feature set of the ith feature point in all the paired images, where (x)_j ，y_j ) Can be corresponding to the plane position coordinates of the ith feature point in the jth three-dimensional object image, f_j May correspond to the local features of the ith feature point in the jth three-dimensional object image.

As a further alternative of the embodiments of the present disclosure, obtaining a set of point cloud features from each set of three-dimensional images and the corresponding point cloud features includes:

extracting a first frame of three-dimensional object image from each three-dimensional image set, and carrying out segmentation processing on the first frame of three-dimensional object image to obtain a first target segmentation mask image;

based on the first frame of three-dimensional object image and the corresponding first target segmentation mask image, carrying out segmentation processing on all the three-dimensional object images remained in the three-dimensional image set to obtain a target segmentation mask image corresponding to each three-dimensional object image;

and screening the corresponding point cloud features according to all the target segmentation mask patterns in each three-dimensional image set to obtain a point cloud feature set.

In order to ensure the accuracy of the point cloud feature sets, main body target segmentation processing can be performed on the three-dimensional object images in each three-dimensional image set, and screening processing is performed on the point cloud feature sets by integrating the three-dimensional images after the segmentation processing.

Specifically, in the process of obtaining the point cloud feature set, the main object segmentation process may be performed on the first frame three-dimensional object image in each three-dimensional image set through a preset segmentation model SAM (or other segmentation models), so as to obtain a first object segmentation mask map. Here, the preset segmentation model SAM may specifically include an image encoder, a hint encoder, and a mask decoder, so as to effectively map the image encoding, the hint encoding, and the output token to the mask, thereby obtaining the target segmentation mask map.

Then, all the three-dimensional object images remained in the three-dimensional image sets can be subjected to segmentation processing according to the first target segmentation mask image corresponding to each three-dimensional image set so as to obtain target segmentation mask images corresponding to each of the remaining three-dimensional object images. Here, in the process of performing the segmentation process on all the three-dimensional object images remaining in the three-dimensional image set based on the first target segmentation mask map, the tracking process may be performed on all the three-dimensional object images remaining in the three-dimensional image set according to the input first frame three-dimensional object map and the corresponding target segmentation mask map to obtain a target segmentation mask map corresponding to each of the remaining three-dimensional object images, but not limited to using a video object segmentation architecture (may also be referred to as XMem). It will be appreciated that the video object segmentation architecture may be comprised of a query encoder operable to extract image features of a particular query, a decoder operable to convert the output of the memory read step into an object mask, and a value encoder operable to combine the image with the object mask to extract new memory features.

Then, after the target segmentation mask map corresponding to each of the remaining three-dimensional object images is obtained, in order to more accurately determine whether the feature points in the point cloud feature set are target feature points, the position coordinates of any one feature point in the point cloud feature set on each three-dimensional object image may be, but are not limited to, compared, and whether the position coordinates are within the region where the target segmentation mask map of the corresponding three-dimensional object image is located. Possibly, when the position coordinate of any one feature point on each three-dimensional object image is determined to be in the area of the target segmentation mask map of the corresponding three-dimensional object image, the feature point is indicated to be a target feature, and further, the point cloud feature corresponding to the feature point can be reserved. Possibly, when the position coordinate of any one feature point on each three-dimensional object image is determined to be not in the area of the target segmentation mask image of the corresponding three-dimensional object image, the feature point is not the target feature, and then the point cloud feature corresponding to the feature point can be removed.

Next, after each feature point in each three-dimensional image set is processed in combination with the target segmentation mask map obtained by the target segmentation process, a set of point cloud features of all feature points remaining in all three-dimensional image sets may be regarded as a point cloud feature set.

And 206, calculating camera pose data according to the first features and the target point cloud features, and rendering an image to be processed based on the camera pose data and the target point cloud features.

Specifically, after the target point cloud feature is matched from the point cloud feature set, camera pose data can be obtained by a pose estimation processing mode according to, but not limited to, plane position coordinates corresponding to feature points in an image to be processed and space point cloud position coordinates corresponding to the feature points in the target point cloud feature, wherein the camera pose data can be composed of, but not limited to, 6 degrees of freedom, namely, position coordinates of a camera of the first terminal in a world coordinate system, and corresponding rolling angles, pitching angles and yaw angles.

Further, after the camera pose data is determined, the rendering process may be performed on the image to be processed acquired by the first terminal by combining the three-dimensional object rendering special effect corresponding to the cloud feature of the target point with the camera pose data through, but not limited to, a preset rendering engine, so that the three-dimensional object rendering special effect is attached to the target three-dimensional object in the image to be processed. Here, the three-dimensional object rendering special effect can be automatically and circularly played on the target three-dimensional object in the image to be processed, or the three-dimensional object with the corresponding view angle can be displayed according to the operation of the user on the display interface.

In the embodiment of the specification, by means of matching the characteristics in the image to be processed with the point cloud characteristic sets corresponding to the plurality of three-dimensional objects, the user can realize special effect display of the corresponding three-dimensional objects in the process of scanning the objects, the interestingness and interactivity of special effect generation are effectively improved, calculated camera pose data can be combined, and the special effect display effect on the image to be processed is effectively guaranteed.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

As yet another alternative to the embodiment of the present disclosure, reference is next made to the overall flowchart of yet another method for augmented reality of a three-dimensional object provided by the embodiment of the present disclosure shown in fig. 5.

As shown in fig. 5, the augmented reality method of the three-dimensional object may at least include the following steps:

step 502, performing feature extraction processing on an image to be processed of the first terminal to obtain a first feature.

Specifically, step 502 may refer to step 202, which is not repeated herein.

Step 504, calculating a hamming distance between the feature descriptor of each feature point in the first feature and the feature descriptor of any one feature point in the point cloud feature set.

Specifically, in the process of performing feature matching processing based on the first feature set of the point cloud, in order to ensure the effectiveness and accuracy of the matching processing, the hamming distance between the feature descriptor of each feature point in the point cloud feature set and the feature descriptor of any one feature point in the point cloud feature set may be calculated, but not limited to, according to the feature descriptor of each feature point in the first feature. Here, a feature descriptor may be understood as a fixed length binary code, and the hamming distance between any two feature descriptors may be understood as calculating the number of different bits between the two binary codes, for example, but not limited to, two feature descriptors may be represented as 10001010 and 10010010, respectively, and the hamming distance calculated by the two feature descriptors is 2. Of course, the target point cloud feature may also be determined by calculating the similarity between any two feature descriptors, for example, the point cloud position coordinates corresponding to all feature points with the highest similarity in the point cloud feature set are used as the target point cloud feature.

It will be appreciated that other feature matching processes may be used in the embodiments of the present disclosure, for example, when the type of local feature is a floating point feature description, the corresponding feature matching process may be to calculate the euclidean distance, and is not limited thereto.

And 506, taking the point cloud position coordinates corresponding to all the feature points with the minimum hamming distance in the point cloud feature set as the target point cloud features.

Specifically, after the hamming distance between the feature descriptor of each feature point in the first feature and the feature descriptor of any one feature point in the point cloud feature set is calculated, the point cloud position coordinates corresponding to all feature points with the smallest hamming distance in the point cloud feature set can be used as the target point cloud features. It can be understood that when the calculated type is euclidean distance, the point cloud position coordinates corresponding to all feature points with minimum hamming distance in the point cloud feature set can be used as the target point cloud features.

And step 508, calculating camera pose data according to the first features and the target point cloud features, and rendering an image to be processed based on the camera pose data and the target point cloud features.

Specifically, step 508 can refer to step 206, which is not repeated here.

In the embodiment of the specification, the accuracy of the cloud features of the target point can be improved by a processing mode of carrying out Hamming distance calculation on the feature descriptors, so that the special effect of the corresponding three-dimensional object displayed by the user in the process of scanning the object can be more real.

As still another option of the embodiment of the present specification, calculating the camera pose data according to the first feature and the target point cloud feature includes:

extracting point cloud position coordinates corresponding to any four feature points from the target point cloud features, and calculating a mass center coordinate according to the four point cloud position coordinates;

Calculating camera position coordinates corresponding to each point cloud position coordinate based on the centroid coordinates, the four point cloud position coordinates and the image position coordinates corresponding to each point cloud position coordinate in the first feature;

and calculating the pose data of the camera based on the position coordinates of each point cloud and the corresponding position coordinates of the camera.

Specifically, in the process of calculating the camera pose data, point cloud position coordinates corresponding to any four feature points can be extracted from the target point cloud features, and taken as base points, weighting coefficients corresponding to each of the four feature points can be calculated by comparing the point cloud position coordinates corresponding to all the feature points in the target point cloud features with the position relationship of the point cloud position coordinates corresponding to each of the four feature points. Here, the positional relationship may be represented by calculating the sum of distances between the position coordinates of the point cloud, where a larger sum of distances indicates a lower weighting coefficient of the corresponding feature point, and the sum of weighting coefficients corresponding to all the feature points in the arbitrary four feature points is 1.

Next, after calculating the centroid coordinates, the following relational expression may be obtained, but not limited to, according to the centroid coordinates, the four point cloud position coordinates, and the point cloud position coordinates corresponding to all the feature points in the target point cloud coordinates:

In the above, P_i^w Can be correspondingly the point cloud position coordinates, alpha, corresponding to all characteristic points in the target point cloud coordinates_ij Can be correspondingly the barycenter coordinate, C_ij^w Four point cloud location coordinates may correspond.

in the above, P_i^c Can be correspondingly the image position coordinates corresponding to the position coordinates of each point cloud in the first characteristic, C_ij^c The image position coordinates corresponding to the four point cloud position coordinates can be corresponding.

Then, it can be assumed that the point cloud position coordinates corresponding to all the feature points in the target point cloud coordinates are on the image to be processedMay be expressed as { p }_i }＝{(u_i ，v_i ) -obtaining the following relation:

in the above formula, s can be corresponding to a scale factor, and the following relation can be used

Substituting the point cloud position coordinates corresponding to all the characteristic points in the target point cloud coordinates into the relation established by the projection points on the image to be processed:

the above relationship can then be converted into three equations as shown below:

Then, the three equations can be eliminated, and the camera position coordinates can be obtained through the point cloud position coordinates corresponding to the n feature points in the simultaneous target point cloud coordinates.

Then, after determining the camera position coordinates corresponding to each point cloud position coordinate, pose estimation processing may be performed on each point cloud position coordinate and the corresponding camera position coordinate by using a PnP pose estimation algorithm, so as to obtain camera pose data. Here, the PnP pose estimation algorithm may be a conventional technical means in the art, and will not be described herein in detail.

As still another alternative of the embodiment of the present specification, rendering the image to be processed based on the camera pose data and the target point cloud feature includes:

converting the point cloud position coordinates corresponding to each characteristic point in the target point cloud characteristic based on the camera pose data to obtain corresponding first position coordinates;

converting the first position coordinates corresponding to each feature point based on preset projection pose data to obtain corresponding second position coordinates;

converting the second position coordinates corresponding to each characteristic point based on a screen reference coordinate system of the first terminal to obtain third position coordinates;

And determining a three-dimensional object rendering special effect corresponding to the cloud characteristics of the target point, and performing rendering treatment on the image to be processed according to the third position coordinates corresponding to each characteristic point and the three-dimensional object rendering special effect.

In the process of rendering the image to be processed, the display position coordinates of the rendering special effect in the image to be processed can be obtained through a processing mode of coordinate system conversion, so that the consistency between the display special effect and the target three-dimensional object is ensured.

It may be understood that the three-dimensional object rendering special effects corresponding to the cloud features of the target point may be preset (or set by the second terminal), and the corresponding special effect display manner may, but is not limited to, illumination, shading, texture mapping, post-processing, and the like.

Referring to fig. 6, as shown in fig. 6, a virtual three-dimensional object with a shape consistent with that of a basketball is displayed in a display interface of the first terminal, and a prompt operation is provided below the virtual three-dimensional object, so that a user can execute a left-hand finger sliding operation or a right-hand finger sliding operation on the display interface, and a left-hand angle schematic diagram or a right-hand angle schematic diagram of the virtual three-dimensional object is displayed on the display interface. It should be noted that the position of the virtual three-dimensional object in the display interface may be consistent with the position of the target three-dimensional object in the image to be processed acquired by the first terminal.

Referring next to fig. 7, fig. 7 is a schematic structural diagram of an augmented reality system for a three-dimensional object according to an embodiment of the present disclosure.

As shown in fig. 7, the augmented reality system of the three-dimensional object may at least include a feature extraction module 701, a feature matching module 702, and an image rendering module 703, wherein:

The feature extraction module 701 is configured to perform feature extraction processing on an image to be processed of the first terminal, so as to obtain a first feature;

the feature matching module 702 is configured to perform feature matching processing on the point cloud feature set based on the first feature, so as to obtain a target point cloud feature; the point cloud feature set consists of point cloud features corresponding to at least two three-dimensional objects;

the image rendering module 703 is configured to calculate camera pose data according to the first feature and the target point cloud feature, and perform rendering processing on the image to be processed based on the camera pose data and the target point cloud feature.

In some possible embodiments, feature matching module 702 is further to:

before feature matching processing is carried out on the point cloud feature set based on the first feature to obtain the target point cloud feature, acquiring a three-dimensional image set corresponding to each three-dimensional object based on the second terminal; each three-dimensional image set comprises at least two three-dimensional object images, and the shooting angles of the three-dimensional objects corresponding to each three-dimensional object image are different;

In some possible embodiments, feature matching module 702 is further to:

judging whether the position coordinates of any one feature point in the point cloud features corresponding to each three-dimensional image set on each three-dimensional object image are in the corresponding target segmentation mask map or not;

when the position coordinates of any one feature point on each three-dimensional object image are detected and are not in the corresponding target segmentation mask diagram, eliminating the point cloud features of the feature points;

and when the position coordinates of any one feature point on each three-dimensional object image are detected to be in the corresponding target segmentation mask diagram, reserving the point cloud features of the feature points.

And taking the point cloud characteristics corresponding to each processed three-dimensional image set as a point cloud characteristic set.

In some possible embodiments, feature matching module 702 is specifically configured to:

calculating the Hamming distance between the feature descriptor of each feature point in the first feature and the feature descriptor of any one feature point in the point cloud feature set;

And taking the point cloud position coordinates corresponding to all the feature points with the minimum Hamming distance in the point cloud feature set as the target point cloud features.

In some possible embodiments, the image rendering module 703 is specifically configured to:

In some possible embodiments, the image rendering module 703 is specifically further configured to:

converting the second position coordinates corresponding to each feature point based on a screen reference coordinate system of the first terminal to obtain third position coordinates;

And determining the three-dimensional object rendering special effect corresponding to the cloud characteristics of the target point, and performing rendering processing on the image to be processed according to the third position coordinates corresponding to each characteristic point and the three-dimensional object rendering special effect.

Referring next to fig. 8, fig. 8 is a schematic structural diagram of a server according to an embodiment of the present disclosure.

As shown in fig. 8, the server 800 may include: at least one processor 801, at least one network interface 804, a user interface 803, a memory 805, a first terminal 806, a second terminal 807, and at least one communication bus 802.

Wherein the communication bus 802 may be used to enable connectivity communication of the various components described above.

The user interface 803 may include keys, among other things, and the optional user interface may also include a standard wired interface, a wireless interface.

The network interface 804 may include, but is not limited to, a bluetooth module, an NFC module, a Wi-Fi module, and the like.

Wherein the processor 801 may include one or more processing cores. The processor 801 connects various portions of the overall server 800 using various interfaces and lines, performs various functions of the routing server 800 and processes data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 805, and invoking data stored in the memory 805. In the alternative, processor 801 may be implemented in at least one of the hardware forms DSP, FPGA, PLA. The processor 801 may integrate one or a combination of several of a CPU, GPU, modem, and the like. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 801 and may be implemented on a single chip.

The memory 805 may include RAM or ROM. Optionally, the memory 805 comprises a non-transitory computer readable medium. Memory 805 may be used to store instructions, programs, code, sets of codes, or instruction sets. The memory 805 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described respective method embodiments, etc.; the storage data area may store data or the like referred to in the above respective method embodiments. The memory 805 may also optionally be at least one storage device located remotely from the aforementioned processor 801. As shown in fig. 8, an operating system, a network communication module, a user interface module, and an augmented reality application of a three-dimensional object may be included in the memory 805 as one type of computer storage medium.

In particular, the processor 801 may be used to invoke an augmented reality application of a three-dimensional object stored in the memory 805 and specifically perform the following operations:

In some possible embodiments, before performing feature matching processing on the point cloud feature set based on the first feature, before obtaining the target point cloud feature, the method further includes:

In some possible embodiments, deriving the set of point cloud features based on each set of three-dimensional images, and the second features of the respective all three-dimensional object images, includes:

In some possible embodiments, obtaining a set of point cloud features from each set of three-dimensional images and the corresponding point cloud features includes:

In some possible embodiments, according to all the target segmentation mask graphs in each three-dimensional image set, filtering the corresponding point cloud features to obtain a point cloud feature set, including:

In some possible embodiments, performing feature matching processing on the point cloud feature set based on the first feature to obtain a target point cloud feature, including:

In some possible embodiments, calculating camera pose data from the first feature and the target point cloud feature includes:

In some possible embodiments, rendering the image to be processed based on the camera pose data and the target point cloud features includes:

Embodiments of the present disclosure also provide a computer-readable storage medium having instructions stored therein that, when executed on a computer or processor, cause the computer or processor to perform one or more of the steps of the embodiments shown in fig. 2 or 5 described above. The above-described constituent modules of the electronic apparatus may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as independent products.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present description are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (Digital Subscriber Line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a digital versatile Disk (Digital Versatile Disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

Those skilled in the art will appreciate that implementing all or part of the above-described embodiment methods may be accomplished by way of a computer program, which may be stored in a computer-readable storage medium, instructing relevant hardware, and which, when executed, may comprise the embodiment methods as described above. And the aforementioned storage medium includes: various media capable of storing program code, such as ROM, RAM, magnetic or optical disks. The technical features in the present examples and embodiments may be arbitrarily combined without conflict.

The above embodiments are merely illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solution of the present invention should fall within the protection scope defined by the claims of the present invention without departing from the design spirit of the present invention.

Claims

1. An augmented reality method of a three-dimensional object, comprising:

And calculating camera pose data according to the first features and the target point cloud features, and rendering the image to be processed based on the camera pose data and the target point cloud features.

2. The method of claim 1, further comprising, prior to the performing feature matching processing based on the first feature-to-point cloud feature set to obtain a target point cloud feature:

performing image pairing processing on all the three-dimensional object images in each three-dimensional image set to obtain at least two paired images corresponding to each three-dimensional image set; wherein each set of the paired images includes two of the three-dimensional object images;

respectively carrying out feature extraction processing on each three-dimensional object image in each group of paired images to obtain second features of each three-dimensional object image;

and obtaining a point cloud feature set based on each three-dimensional image set and the second features of all the corresponding three-dimensional object images.

3. The method of claim 2, wherein the obtaining a point cloud feature set based on each of the three-dimensional image sets and the second features of the respective all of the three-dimensional object images includes:

performing feature point matching processing on the two second features corresponding to each group of paired images in each three-dimensional image set to obtain matched feature points;

4. A method according to claim 3, said deriving a set of point cloud features from each of said sets of three-dimensional images and corresponding point cloud features, comprising:

5. The method according to claim 4, wherein the filtering the corresponding point cloud features according to all the target segmentation mask graphs in each three-dimensional image set to obtain a point cloud feature set includes:

judging whether the position coordinates of any one feature point in the point cloud features corresponding to each three-dimensional image set on each three-dimensional object image are in the corresponding target segmentation mask image or not;

when the position coordinates of any one feature point on each three-dimensional object image are detected to be not in the corresponding target segmentation mask diagram, eliminating the point cloud features of the feature points;

6. The method of claim 1, wherein the performing feature matching processing on the first feature-to-point cloud feature set to obtain a target point cloud feature comprises:

and taking the point cloud position coordinates corresponding to all the feature points with the minimum Hamming distance in the point cloud feature set as target point cloud features.

7. The method of claim 6, the computing camera pose data from the first feature and the target point cloud feature comprising:

calculating camera position coordinates corresponding to each point cloud position coordinate based on the centroid coordinates, the four point cloud position coordinates and image position coordinates corresponding to each point cloud position coordinate in the first feature;

and calculating camera pose data based on each point cloud position coordinate and the corresponding camera position coordinate.

8. The method of claim 1, the rendering the image to be processed based on the camera pose data and the target point cloud feature, comprising:

9. An augmented reality system for a three-dimensional object, comprising:

10. The system of claim 9, the feature matching module further to:

before the characteristic matching processing is carried out on the point cloud characteristic set based on the first characteristic to obtain a target point cloud characteristic, acquiring a three-dimensional image set corresponding to each three-dimensional object based on a second terminal; each three-dimensional image set comprises at least two three-dimensional object images, and the shooting angles of the three-dimensional objects corresponding to each three-dimensional object image are different;

11. The system of claim 10, the feature matching module further to:

performing feature point matching processing on the two second features corresponding to each group of paired images in each three-dimensional image set to obtain matching feature points and corresponding camera parameters;

12. The system of claim 11, the feature matching module further to:

13. The system of claim 12, the feature matching module further to:

14. The system of claim 9, wherein the feature matching module is specifically configured to:

15. The system of claim 14, the image rendering module being specifically configured to:

16. The system of claim 9, the image rendering module is specifically further configured to:

17. An augmented reality system for a three-dimensional object includes a processor and a memory;

the processor is connected with the memory;

the memory is used for storing executable program codes;

the processor runs a program corresponding to executable program code stored in the memory by reading the executable program code for performing the method according to any one of claims 1-8.

18. A computer readable storage medium having stored thereon a computer program having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform the steps of the method according to any of claims 1-8.