Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without making any inventive effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms or terminology appearing in the description of the embodiments of the present application are explained as follows:
Novel view synthesis (Novel View Synthesis, NVS): an application field of computer vision. Given a plurality of input images and the corresponding camera poses, the new view angle generation task is to generate images of an object or a scene from an arbitrary viewing angle.
Neural radiance field (Neural Radiance Field, NeRF): a representation that can be used for the new view angle generation task.
Triangle mesh model (Mesh): can represent fine three-dimensional model details with a relatively small data scale, and is suitable for reconstructing, editing, and deforming a three-dimensional model.
Multilayer perceptrons (Multilayer Perceptron, MLPs): may also be referred to as artificial neural networks (Artificial Neural Network, ANN); in addition to an input layer and an output layer, an MLP may have at least one hidden layer in between.
Spherical harmonics (Spherical Harmonics, SH): the angular portion of the solution of the Laplace equation in spherical coordinates, widely applied in quantum mechanics, computer graphics, rendering and illumination processing, spherical mapping, and the like.
Signed distance function (Signed Distance Function, SDF): for a limited region in space, determines the distance from a point to the boundary of the region and at the same time defines the sign of the distance: the distance is positive when the point is inside the boundary of the region, negative when the point is outside, and 0 when the point is located on the boundary.
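As a minimal illustration of the sign convention described above (positive inside, negative outside, 0 on the boundary), the following sketch evaluates the SDF of a sphere; the function name and the radius value are illustrative only.

```python
import numpy as np

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere boundary.

    Follows the sign convention used above: positive inside the
    boundary, negative outside, and 0 exactly on the boundary.
    """
    return radius - np.linalg.norm(np.asarray(point) - np.asarray(center))

print(sphere_sdf((0.0, 0.0, 0.0)))  # 1.0  -> inside
print(sphere_sdf((2.0, 0.0, 0.0)))  # -1.0 -> outside
print(sphere_sdf((1.0, 0.0, 0.0)))  # 0.0  -> on the boundary
```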
Example 1
A real-time new view angle generation scheme provides hyper-realistic display of a shooting entity from an arbitrary viewing angle, allows the entity to be digitally stored and displayed, and can be combined with XR applications and the like. At present, real-time new view angle generation performed in the following ways still has certain defects:
A display scheme based on manual modeling and texture mapping has difficulty restoring specific details of the shooting entity with high fidelity, lacks a sense of realism, and is expensive in manpower. In the modeling process, dense reconstruction is performed using traditional features; in weak-texture or textureless areas the matching algorithm is problematic, so the reconstructed three-dimensional model may be warped or contain holes. In the mapping process, an algorithm selects suitable pixel colors from each image and stores them in textures, so texture details are easily warped and colors are easily uneven, and the final display effect is poor; if no texture mapping is performed, the real effect that the color of the same area changes under different viewing angles cannot be displayed.
Algorithms based on neural networks and volume rendering are difficult to render in real time on common devices such as mobile phones. Specifically, an algorithm based on a neural network and volume rendering, such as NeRF, needs to sample a plurality of points along the viewing angle direction when rendering one pixel, and using the neural network to predict the color of each point involves a large amount of computation, so real-time rendering and display on common devices, such as mobile phones, is difficult.
Algorithms based on neural networks and polygon meshes, such as MobileNeRF, can improve the rendering speed and, compared with the NeRF scheme, can achieve real-time rendering and display. However, because a superimposed triangular mesh model is used, such algorithms are not suitable for common graphics applications such as relighting, physical simulation, and texture editing, have low precision, and exhibit a large amount of noise on the surface when the rendering and display effect is viewed at close range.
In summary, the real-time new view angle generation schemes in the related art have the technical problems of low reconstruction efficiency and poor display effect when a shooting entity is rendered and displayed in real time.
According to an embodiment of the present application, there is provided an image processing method of a shooting entity. It should be noted that the steps shown in the flowchart of the drawings may be performed in a computer system, such as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that described herein.
Fig. 1 is a schematic diagram of a hardware environment of a virtual reality device for an image processing method of a shooting entity according to an embodiment of the present application. As shown in fig. 1, the virtual reality device 104 is connected to the terminal 106, and the terminal 106 is connected to the server 102 via a network. The terminal 106 is not limited to a PC, a mobile phone, a tablet computer, etc.; the server 102 may be a server corresponding to a media file operator; and the network includes, but is not limited to: a wide area network, a metropolitan area network, or a local area network.
Optionally, the virtual reality device 104 of this embodiment includes: memory, processor, and transmission means. The memory is used to store an application program that can be used to perform: image acquisition is carried out on a shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and carrying out color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle, thereby solving the technical problems of low reconstruction efficiency and poor displaying effect when the shooting entity is rendered and displayed in real time in the related technology and achieving the purpose of high-fidelity rendering and displaying the shooting entity.
The terminal of this embodiment may be configured to display, on a display screen of a virtual reality (Virtual Reality, VR) device or an augmented reality (Augmented Reality, AR) device, a rendering display result under a first viewing angle, where the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used to determine a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used to determine a three-dimensional grid model corresponding to the shooting entity. The terminal may further send a viewing angle switching instruction to the virtual reality device 104, and after receiving the viewing angle switching instruction, the virtual reality device 104 displays the rendering display result under a second viewing angle at a target delivery position.
Optionally, the head-mounted display (Head Mount Display, HMD) with eye tracking of the virtual reality device 104 of this embodiment has the same function as the eye tracking module in the above embodiment; that is, the screen in the HMD is used to display a real-time picture, and the eye tracking module in the HMD is used to acquire the real-time movement track of the user's eyes. The terminal of this embodiment obtains the position information and the motion information of the user in the real three-dimensional space through a tracking system, and calculates the three-dimensional coordinates of the user's head in the virtual three-dimensional space and the user's field-of-view orientation in the virtual three-dimensional space.
The hardware architecture block diagram shown in fig. 1 may be used not only as an exemplary block diagram for the AR/VR device (or mobile device) described above, but also as an exemplary block diagram for the server described above. In an alternative embodiment, fig. 2 shows, in block diagram form, one embodiment of a computing node in a computing environment 201 using the AR/VR device (or mobile device) described above in fig. 1. Fig. 2 is a block diagram of a computing environment of an image processing method of a shooting entity according to an embodiment of the present application. As shown in fig. 2, the computing environment 201 includes a plurality of computing nodes (e.g., servers) running on a distributed network (shown as 210-1, 210-2, …). Different computing nodes contain local processing and memory resources, and an end user 202 may run applications or store data remotely in the computing environment 201. The application may be provided as a plurality of services 220-1, 220-2, 220-3, and 220-4 in the computing environment 201, representing services "A", "D", "E", and "H", respectively.
End user 202 may provide and access services through a web browser or other software application on a client. In some embodiments, the provisioning and/or requests of the end user 202 may be provided to an ingress gateway 230. The ingress gateway 230 may include a corresponding agent to handle provisioning and/or requests for services (one or more services provided in the computing environment 201).
Services are provided or deployed in accordance with various virtualization techniques supported by the computing environment 201. In some embodiments, services may be provided according to virtual machine (Virtual Machine, VM) based virtualization, container-based virtualization, and/or the like. In virtual machine-based virtualization, a real computer is emulated by initializing a virtual machine, and programs and applications are executed without directly touching any real hardware resources. Whereas a virtual machine virtualizes the whole machine, in container-based virtualization a container may be started to virtualize the entire operating system (Operating System, OS), so that multiple workloads may run on a single operating system instance.
In one embodiment based on container virtualization, several containers of a service may be assembled into one Pod (e.g., a Kubernetes Pod). For example, as shown in fig. 2, the service 220-2 may be equipped with one or more Pods 240-1, 240-2, …, 240-N (collectively referred to as Pods). A Pod may include an agent 245 and one or more containers 242-1, 242-2, …, 242-M (collectively referred to as containers). One or more containers in the Pod handle requests related to one or more corresponding functions of the service, and the agent 245 generally controls service-related network functions, such as routing, load balancing, etc. Other services may also be equipped with Pods similar to the Pods described above.
In operation, executing a user request from end user 202 may require invoking one or more services in computing environment 201, and executing one or more functions of one service may require invoking one or more functions of another service. As shown in FIG. 2, service "A"220-1 receives a user request of end user 202 from ingress gateway 230, service "A"220-1 may invoke service "D"220-2, and service "D"220-2 may request service "E"220-3 to perform one or more functions.
The computing environment may be a cloud computing environment, and the allocation of resources is managed by a cloud service provider, allowing the development of functions without considering the implementation, adjustment or expansion of the server. The computing environment allows developers to execute code that responds to events without building or maintaining a complex infrastructure. Instead of expanding a single hardware device to handle the potential load, the service may be partitioned to a set of functions that can be automatically scaled independently.
In the above operating environment, the present application provides an image processing method of a shooting entity as shown in fig. 3. It should be noted that the image processing method of the shooting entity of this embodiment may be performed by the mobile terminal of the embodiment shown in fig. 1. Fig. 3 is a flowchart of an image processing method of a shooting entity according to an embodiment of the present application. As shown in fig. 3, the method may include the following steps:
Step S31, image acquisition is carried out on a shooting entity to obtain shooting data;
step S32, sparse reconstruction is carried out on shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to a shooting entity;
step S33, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity;
and step S34, performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
The shooting entity may be a person, an object, or a scene in the virtual reality scene, that is, an object to be rendered and displayed. When image acquisition is performed on the shooting entity, the shooting entity needs to be photographed from all angles to obtain original acquired data, and the shooting data used for real-time new view angle generation is finally obtained by screening the original acquired data. In order to obtain original acquired data covering all angles of the shooting entity, the object may be photographed in a surrounding manner. For example, an image acquisition component such as a mobile phone is used to record a video or take a plurality of pictures around the object; alternatively, the object is placed on a turntable and the mobile phone is fixed, so that the object is captured in the video or pictures, and the shooting data is obtained by screening the captured video or pictures.
After the shooting data is obtained, sparse reconstruction needs to be performed on the shooting data to obtain a first reconstruction result, and the reconstruction range of the shooting entity is determined using the first reconstruction result. After the reconstruction range is determined, geometric reconstruction is performed on the first reconstruction result to obtain a second reconstruction result, and the three-dimensional grid model corresponding to the shooting entity is reconstructed using the second reconstruction result. Finally, voxel color reconstruction and surface color reconstruction are performed on the three-dimensional grid model corresponding to the shooting entity to obtain a rendering display result of the shooting entity under a real-time new view angle.
Based on the steps S31 to S34, image acquisition is carried out on the shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
It is easy to understand that this method performs sparse reconstruction on the shooting data, then performs geometric reconstruction on the sparse reconstruction result, and finally performs color reconstruction on the geometric reconstruction result, obtaining the rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, so that the display effect of real-time rendering and display of the shooting entity is improved. Therefore, the embodiment of the present application achieves the purpose of high-fidelity rendering and display of the shooting entity, thereby achieving the technical effect of improving the reconstruction efficiency and rendering quality of real-time rendering and display of the shooting entity, and further solving the technical problems in the related art of low reconstruction efficiency and poor display effect when the shooting entity is rendered and displayed in real time.
It should be noted that, the image processing method of the shooting entity provided by the embodiment of the application can be applied to the application scene related to real-time rendering and displaying of the shooting entity, especially the real-time generation scene of a new view angle in the fields of e-commerce, education, medical treatment, conference, social network, financial products, logistics, navigation and the like.
The image processing method of the shooting entity in the embodiment of the application is further described below.
In an alternative embodiment, in step S31, performing image acquisition on a shooting entity, obtaining shooting data includes:
step S311, adopting an image acquisition component to acquire images of shooting entities to obtain a preset number of image frames;
step S312, performing image gradient variance calculation on a preset number of image frames to determine an ambiguity index;
step S313, screening the preset number of image frames by using the blur index to obtain the shooting data.
The image acquisition component may be an electronic device with an image acquisition function, such as a smartphone, a tablet computer, or a camera. When the image acquisition component is used to acquire images of the shooting entity, a video of the shooting entity may be recorded, and the preset number of image frames may then be obtained from the video file. For example, one image frame is extracted from the video file at every fixed interval of frames, so as to obtain the preset number of image frames. For another example, the image acquisition component is used to photograph the shooting entity in a surrounding manner at a preset shooting frequency, so as to obtain the preset number of image frames.
Further, image gradient variance calculation is performed on the preset number of image frames to obtain the Laplacian gradient variance corresponding to each image frame, and the Laplacian gradient variance of the image frame is used as the blur index. The blur index may be used to measure the sharpness of an image frame: the larger the Laplacian gradient variance of an image frame, the sharper the image frame; conversely, the smaller the variance, the more blurred the image frame. When the blur index is used to screen the preset number of image frames, image frames whose Laplacian gradient variance is greater than a preset value may be used as the shooting data, so that sharper image frames are obtained for reconstructing the shooting entity.
Based on the above optional embodiment, the image acquisition component is used to acquire images of the shooting entity to obtain a preset number of image frames, image gradient variance calculation is then performed on the preset number of image frames to determine the blur index, and finally the preset number of image frames are screened using the blur index, so that shooting data meeting the sharpness requirement can be obtained quickly, which further improves the rendering effect when the shooting entity is reconstructed.
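The frame extraction and blur-based screening described above can be sketched as follows. This is a minimal illustration using OpenCV; the frame interval, the threshold value, and the function names are illustrative assumptions rather than part of the claimed method.

```python
import cv2

def laplacian_variance(frame_bgr):
    """Blur index: variance of the Laplacian of the grayscale frame.

    A larger value indicates a sharper frame; a smaller value indicates
    a more blurred frame.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def screen_frames(frames, blur_threshold=100.0):
    """Keep only frames whose blur index exceeds a preset value."""
    return [f for f in frames if laplacian_variance(f) > blur_threshold]

def sample_frames(video_path, frame_interval=10):
    """Extract one frame from the video at every fixed interval of frames."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_interval == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

# Usage (illustrative): shooting_data = screen_frames(sample_frames("capture.mp4"))
```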
In an alternative embodiment, in step S32, performing sparse reconstruction on the shot data, to obtain a first reconstruction result includes:
Step S321, analyzing the shooting data to obtain an analysis result, wherein the analysis result is used for determining whether the shooting data contains calibration plate data or not;
step S322, sparse reconstruction is carried out on the shooting data by adopting a reconstruction mode corresponding to the analysis result, and a first reconstruction result is obtained.
After the shooting data is obtained, whether the shooting data contains calibration plate (AprilTag) data can be determined by analyzing the shooting data. The AprilTag data can serve as a visual reference in application fields such as AR, robotics, and camera calibration, and can be used to locate the pose information of the image acquisition component and the sparse point cloud corresponding to the shooting entity.
After it is determined whether the shooting data contains AprilTag data, the reconstruction mode used for performing sparse reconstruction on the shooting data can be further determined, so as to obtain the first reconstruction result.
Based on the above optional embodiment, the shooting data is analyzed to obtain an analysis result, and sparse reconstruction is then performed on the shooting data using the reconstruction mode corresponding to the analysis result, so that the first reconstruction result can be obtained quickly, the reconstruction range of the shooting entity can be determined accurately, and the reconstruction accuracy is further improved.
In an optional embodiment, in step S322, performing sparse reconstruction on the photographed data by adopting a reconstruction mode corresponding to the analysis result, to obtain a first reconstruction result includes:
step S3221, when the shooting data are determined to contain calibration plate data through the analysis result, positioning pose information of the image acquisition assembly and sparse point clouds corresponding to shooting entities by using the calibration plate data;
step S3222, performing sparse reconstruction based on pose information and sparse point cloud to obtain a first reconstruction result.
Specifically, the pose information of the image acquisition component and the sparse point cloud corresponding to the shooting entity are located by adopting structure from motion (Structure From Motion, SFM). The SFM algorithm is a process of calculating three-dimensional information from a group of two-dimensional images in a temporal sequence; by analyzing the motion information in the images, structural information presented under a three-dimensional view angle, such as the pose information and the sparse point cloud, can be recovered.
When the shooting data contains AprilTag data, the AprilTag data is preferentially used to locate the pose information of the image acquisition component and the sparse point cloud corresponding to the shooting entity. The pose information of the image acquisition component may be the camera extrinsic parameters, which include a rotation matrix (R) and a translation vector (t); the rotation matrix can be converted into a three-dimensional rotation vector representing the rotation angles around the three axes x, y, and z, and the translation vector represents the translation amounts in the three directions x, y, and z. In reverse reconstruction engineering, the point data set of the appearance surface of the shooting entity obtained through the image acquisition component is called a point cloud; when the number of points in the point data set is small and the distance between points is relatively large, the point data set is a sparse point cloud. After the pose information and the sparse point cloud are recovered based on the AprilTag positioning technology, a reconstructed coordinate system can be obtained, the reconstruction range for the shooting entity is determined based on the reconstructed coordinate system, and the first reconstruction result is obtained.
Based on the above optional embodiment, when it is determined from the analysis result that the shooting data contains calibration plate data, the calibration plate data is used to locate the pose information of the image acquisition component and the sparse point cloud corresponding to the shooting entity, and sparse reconstruction is then performed based on the pose information and the sparse point cloud to obtain the first reconstruction result. In this way, the AprilTag data in the shooting data can be effectively utilized for sparse reconstruction, so that the reconstruction range of the shooting entity is obtained quickly, reconstruction of a large amount of background content is avoided, and the reconstruction precision of the shooting entity is effectively improved.
In an optional embodiment, in step S3222, performing sparse reconstruction based on pose information and a sparse point cloud, and obtaining a first reconstruction result includes:
step S32221, determining the center of the reconstruction range based on pose information, and determining the side length of the reconstruction range based on sparse point cloud;
and S32222, performing sparse reconstruction according to the center and the side length to obtain a first reconstruction result.
Specifically, when the center of the reconstruction range is determined based on the pose information, the convergence point of the viewing rays of all cameras can be determined from the pose information, and the center of the reconstruction range is then determined based on this convergence point. The side length of the reconstruction range is determined based on the sparse point cloud, and sparse reconstruction is finally performed according to the center and the side length to determine the reconstruction range for the shooting entity, which may be a cube, so as to obtain the first reconstruction result.
Based on the above optional embodiment, the center of the reconstruction range is determined based on the pose information, the side length of the reconstruction range is determined based on the sparse point cloud, and sparse reconstruction is performed according to the center and the side length, so that the reconstruction range for the shooting entity is obtained quickly.
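One possible realization of the above range estimation is sketched below. The least-squares intersection of the camera viewing rays and the margin factor are illustrative assumptions and are not necessarily the exact formulation used in this embodiment.

```python
import numpy as np

def ray_convergence_point(origins, directions):
    """Least-squares point closest to all camera viewing rays.

    origins: (N, 3) camera centers; directions: (N, 3) viewing directions.
    Solves sum_i (I - d_i d_i^T)(x - o_i) = 0 for x.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        m = np.eye(3) - np.outer(d, d)
        A += m
        b += m @ o
    return np.linalg.solve(A, b)

def cube_reconstruction_range(origins, directions, sparse_points, margin=1.1):
    """Cube range: center from ray convergence, side length from point cloud extent."""
    center = ray_convergence_point(origins, directions)
    extent = np.abs(sparse_points - center).max()  # largest axis-aligned offset
    side_length = 2.0 * extent * margin
    return center, side_length
```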
In an optional embodiment, in step S322, performing sparse reconstruction on the photographed data by adopting a reconstruction mode corresponding to the analysis result, to obtain a first reconstruction result includes:
step S3223, when the shooting data are determined to not contain the calibration plate data through the analysis result, the pose information of the image acquisition assembly is acquired by using a three-dimensional reconstruction mode;
and step S3224, performing sparse reconstruction based on pose information to obtain a first reconstruction result.
Specifically, when the shooting data does not contain AprilTag data, the pose information of the camera is acquired using a three-dimensional reconstruction mode; because the reconstructed coordinate system changes irregularly in this case, the reconstruction range needs to be determined using the pose information of the camera. The three-dimensional reconstruction mode may be the COLMAP technique; COLMAP is a general-purpose structure-from-motion and multi-view stereo pipeline with graphical and command-line interfaces, and it provides a wide range of functions for the reconstruction of ordered and unordered image collections.
Based on the above-mentioned optional embodiment, when it is determined that the shooting data does not include the calibration plate data through the analysis result, pose information of the image acquisition component is acquired by using a three-dimensional reconstruction mode, and further sparse reconstruction is performed based on the pose information, so as to obtain a first reconstruction result, and thus a reconstruction range for the shooting entity can be quickly obtained.
In an alternative embodiment, in step S3224, performing sparse reconstruction based on pose information, and obtaining the first reconstruction result includes:
step S32241, determining the center of the reconstruction range based on the pose information, and determining the radius of the reconstruction range based on the distance between the center and the image acquisition component;
and step S32242, performing sparse reconstruction according to the center and the radius to obtain a first reconstruction result.
Specifically, when the center of the reconstruction range is determined based on the pose information, the convergence point of the viewing rays of all cameras can be determined from the pose information, and the center of the reconstruction range is then determined based on this convergence point. The radius of the reconstruction range is then determined using the distance between the center of the reconstruction range and the cameras, and sparse reconstruction is finally performed according to the center and the radius to determine the reconstruction range for the shooting entity, so as to obtain the first reconstruction result.
Based on the above optional embodiment, the center of the reconstruction range is determined based on the pose information, the radius of the reconstruction range is determined based on the distance between the center and the image acquisition component, and sparse reconstruction is then performed according to the center and the radius, so that the reconstruction range for the shooting entity is obtained quickly.
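Continuing the sketch above, when the shooting data contains no calibration plate data the range can instead be bounded by a radius derived from the camera distances; the scale factor is again an illustrative assumption.

```python
import numpy as np

def spherical_reconstruction_range(origins, directions, scale=0.5):
    """Spherical range: center from ray convergence, radius from camera distances.

    Reuses ray_convergence_point() from the previous sketch; `scale` controls
    how much of the mean center-to-camera distance the range covers.
    """
    center = ray_convergence_point(origins, directions)
    distances = np.linalg.norm(np.asarray(origins) - center, axis=1)
    radius = scale * distances.mean()
    return center, radius
```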
In an alternative embodiment, in step S33, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result includes:
step S331, performing discrete sampling on a first reconstruction result to obtain voxel coordinate data;
step S332, performing geometric reconstruction on the voxel coordinate data by using a target geometric reconstruction network model to obtain a signed distance function value, wherein the target geometric reconstruction network model is obtained by machine learning training using a plurality of groups of data, and the plurality of groups of data include: training voxel coordinates obtained by reconstructing a training image;
step S333, performing three-dimensional modeling based on the signed distance function value to obtain a second reconstruction result.
The voxel coordinate data is used to represent the coordinate data of the spatial three-dimensional points obtained by discrete sampling within the reconstruction range determined above. The target geometric reconstruction network model includes neural network MLPs; the target geometric reconstruction network model performs geometric reconstruction on the voxel coordinate data and outputs SDF values, and a marching cubes (Marching Cubes, MC) algorithm is used to perform three-dimensional modeling on the SDF values. The MC algorithm can extract an isosurface from voxel data, so that a triangular mesh model (Mesh) can be extracted from the voxel coordinate data, thereby obtaining the second reconstruction result.
Based on the above optional embodiment, discrete sampling is performed on the first reconstruction result to obtain the voxel coordinate data, geometric reconstruction is then performed on the voxel coordinate data using the target geometric reconstruction network model to obtain the signed distance function values, and three-dimensional modeling is finally performed based on the signed distance function values, so that the three-dimensional grid model corresponding to the shooting entity can be determined quickly.
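The mesh extraction described above can be sketched with the marching cubes implementation in scikit-image; the grid resolution and the use of skimage are illustrative assumptions.

```python
import numpy as np
from skimage import measure

def extract_mesh(sdf_grid, voxel_size):
    """Run marching cubes on a dense grid of SDF values.

    sdf_grid: (R, R, R) array of signed distance values sampled at the voxel
    coordinates of the reconstruction range; the zero level set is the
    reconstructed surface.
    """
    vertices, faces, normals, _ = measure.marching_cubes(
        sdf_grid, level=0.0, spacing=(voxel_size,) * 3)
    return vertices, faces, normals

# Usage (illustrative): sdf_grid predicted by the geometric reconstruction
# network on a dense sampling of the reconstruction range.
# vertices, faces, normals = extract_mesh(sdf_grid, voxel_size=cube_side / sdf_grid.shape[0])
```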
In an alternative embodiment, in step S331, performing discrete sampling on the first reconstruction result to obtain voxel coordinate data includes: and carrying out discrete sampling on the occupied voxel grid of the shooting entity in the reconstruction range based on the first reconstruction result to obtain voxel coordinate data.
In particular, discrete sampling is performed on the occupied voxel grid (Occupancy) of the cube determined based on the reconstruction range described above, and voxel coordinate data of a certain resolution, for example 128 x 128, can be obtained. During discrete sampling, the occupancy of a sampling point is set to 1 when its weight is greater than 0.5 and set to 0 when its weight is not greater than 0.5. The occupied voxel grid is determined from the sampling points set to 1 and is updated at intervals, so that an Occupancy voxel grid can be maintained; when sampling along the camera viewing rays, unoccupied sampling points are skipped by querying the Occupancy voxel grid, which reduces the amount of computation and further accelerates the training of the target geometric reconstruction network model.
Based on the above optional examples, discrete sampling is performed on the occupied voxel grid of the shooting entity in the reconstruction range based on the first reconstruction result, so that voxel coordinate data can be rapidly obtained for efficient geometric reconstruction.
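A minimal sketch of such an occupancy voxel grid is given below, assuming a dense uint8 grid and the 0.5 weight threshold described above; the class layout and the update policy are illustrative.

```python
import numpy as np

class OccupancyGrid:
    """Occupancy voxel grid used to skip empty space when sampling along rays.

    A sample point whose weight is greater than 0.5 marks its voxel as
    occupied (1); otherwise the voxel is marked 0. The grid is refreshed
    at fixed intervals during training.
    """

    def __init__(self, resolution, range_min, range_max):
        self.resolution = resolution
        self.range_min = np.asarray(range_min, dtype=np.float64)
        self.range_max = np.asarray(range_max, dtype=np.float64)
        self.grid = np.ones((resolution,) * 3, dtype=np.uint8)  # start fully occupied

    def _indices(self, points):
        rel = (points - self.range_min) / (self.range_max - self.range_min)
        idx = np.clip((rel * self.resolution).astype(int), 0, self.resolution - 1)
        return idx[:, 0], idx[:, 1], idx[:, 2]

    def update(self, points, weights):
        """Re-mark voxels using the latest predicted sample weights."""
        i, j, k = self._indices(points)
        self.grid[i, j, k] = (weights > 0.5).astype(np.uint8)

    def keep_occupied(self, points):
        """Return only the ray samples that fall inside occupied voxels."""
        i, j, k = self._indices(points)
        return points[self.grid[i, j, k] == 1]
```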
In an alternative embodiment, in step S332, performing geometric reconstruction on the voxel coordinate data using the target geometric reconstruction network model to obtain the signed distance function value includes:
step S3321, performing hash encoding on the voxel coordinate data to obtain hash features;
and step S3322, performing geometric reconstruction on the voxel coordinate data and the hash features by adopting the target geometric reconstruction network model to obtain the signed distance function value.
Specifically, hash encoding is performed on the voxel coordinate data; for each three-dimensional point in the voxel coordinate data, a learnable hash (Hash) feature is looked up in a hash table, and the hash features can enhance the expressive capability of the MLPs. The target geometric reconstruction network model performs geometric reconstruction on the voxel coordinate data and the hash features to obtain the SDF value; the SDF value is converted into a voxel density value used in volume rendering, that is, a voxel weight, and a color value can then be obtained through volume rendering integration. The target geometric reconstruction network model can also output a geometric reconstruction feature vector while outputting the SDF value, and the geometric reconstruction feature vector can be used in the subsequent voxel color reconstruction and surface color reconstruction.
Based on the above optional embodiment, hash encoding is performed on the voxel coordinate data to obtain the hash features, and the target geometric reconstruction network model then performs geometric reconstruction on the voxel coordinate data and the hash features to obtain the signed distance function value. This optimizes the expressive capability of the target geometric reconstruction network model and further improves the accuracy and speed of the geometric reconstruction performed on the first reconstruction result.
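A compact PyTorch sketch of a hash-encoded geometry network of this kind is given below; the single-level hash table, the prime-based hash function, and all layer sizes are illustrative assumptions rather than the exact network of this embodiment.

```python
import torch
import torch.nn as nn

class HashSDFNetwork(nn.Module):
    """Hash-encoded MLP mapping 3D voxel coordinates to an SDF value and a
    geometric reconstruction feature vector (illustrative sketch)."""

    PRIMES = torch.tensor([1, 2654435761, 805459861])

    def __init__(self, table_size=2 ** 19, feat_dim=8, hidden=64,
                 geo_feat_dim=15, resolution=128):
        super().__init__()
        self.table_size = table_size
        self.resolution = resolution
        self.table = nn.Embedding(table_size, feat_dim)  # learnable hash features
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + geo_feat_dim),  # SDF value + feature vector
        )

    def hash_features(self, coords):
        # coords in [0, 1]^3 -> integer grid cell -> spatial hash -> table lookup
        grid = (coords * (self.resolution - 1)).long()
        h = (grid * self.PRIMES.to(grid.device)).sum(-1) % self.table_size
        return self.table(h)

    def forward(self, coords):
        feats = self.hash_features(coords)
        out = self.mlp(torch.cat([coords, feats], dim=-1))
        sdf, geo_feat = out[..., :1], out[..., 1:]
        return sdf, geo_feat
```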
In an optional embodiment, in step S34, performing color reconstruction on the second reconstruction result to obtain a rendered display result of the capturing entity under the target viewing angle includes:
step S341, performing voxel color reconstruction on the second reconstruction result to obtain a third reconstruction result, wherein the third reconstruction result is used for determining the voxel color of the shooting entity to be rendered and displayed under the target view angle;
step S342, performing surface color reconstruction on the second reconstruction result to obtain a fourth reconstruction result, wherein the fourth reconstruction result is used for determining the surface color of the shooting entity to be rendered and displayed under the target view angle;
in step S343, the rendering and displaying result of the shooting entity under the target viewing angle is determined by using the third reconstruction result and the fourth reconstruction result.
Based on the above optional embodiment, voxel color reconstruction is performed on the second reconstruction result to obtain a third reconstruction result, so that the voxel color of the shooting entity to be rendered and displayed under the target view angle can be determined; surface color reconstruction is further performed on the second reconstruction result to obtain a fourth reconstruction result, so that the surface color of the shooting entity to be rendered and displayed under the target view angle is determined; finally, the rendering display result of the shooting entity under the target view angle is determined using the third reconstruction result and the fourth reconstruction result. This improves the rendering and display effect of the shooting entity when new view angle generation is performed, so that a better rendering and display effect of the shooting entity is obtained.
In an alternative embodiment, in step S341, performing voxel color reconstruction on the second reconstruction result, to obtain a third reconstruction result includes:
step S3411, performing voxel color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector and the view angle direction of the target view angle by using a target voxel color prediction network model to obtain a first color value, wherein the target voxel color prediction network model is obtained by machine learning training by using a plurality of groups of data, and the plurality of groups of data comprise: training a first predicted value corresponding to the voxel coordinates, wherein the first predicted value is a volume rendering predicted color obtained by reconstructing the voxel coordinates, the first color value is used for determining the volume rendering color, and the geometric reconstruction feature vector is obtained after geometrically reconstructing voxel coordinate data through a target geometric reconstruction network model;
step S3412, obtaining a third reconstruction result based on the first color value and the voxel weights corresponding to the signed distance function values.
The target voxel color prediction network model includes neural network MLPs. The input of the target voxel color prediction network model includes the voxel coordinate data, the geometric reconstruction feature vector, and the viewing angle direction of the target view angle, and its output is the first color value; the geometric reconstruction feature vector is a result output by the target geometric reconstruction network model. In volume rendering, a series of three-dimensional points are sampled along the viewing angle direction of the target view angle, the corresponding first color values of these three-dimensional points are output by the target voxel color prediction network model, and the first color values are weighted and summed using the voxel weights obtained from the geometric reconstruction, so as to obtain the final color value, that is, the third reconstruction result.
Based on the above optional embodiment, voxel color reconstruction is performed on the voxel coordinate data, the geometric reconstruction feature vector, and the viewing angle direction of the target view angle using the target voxel color prediction network model to obtain the first color value, and the third reconstruction result is then calculated based on the first color value and the voxel weights corresponding to the signed distance function values, so that the voxel color of the shooting entity to be rendered and displayed under the target view angle can be determined quickly.
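A sketch of the volume rendering integration along one ray is given below. The embodiment does not specify how the SDF value is converted into a density, so a Laplace-CDF style mapping (following the sign convention of this application, positive inside) is used purely as an illustrative assumption, as are the sampling-distance inputs.

```python
import torch

def sdf_to_density(sdf, beta=0.1):
    """Map SDF values to volume-rendering densities (illustrative assumption).

    Laplace-CDF style mapping; with the sign convention of this application
    (SDF positive inside the surface), points inside get a high density.
    """
    return (0.5 - 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta)) / beta

def volume_render_color(sdf, colors, deltas):
    """Weighted sum of per-sample first color values along one viewing ray.

    sdf: (S,) SDF values at the sampled points, colors: (S, 3) colors predicted
    by the voxel color network, deltas: (S,) distances between adjacent samples.
    """
    density = sdf_to_density(sdf)
    alpha = 1.0 - torch.exp(-density * deltas)                      # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:1]),
                                     1.0 - alpha[:-1]]), dim=0)     # transmittance
    weights = alpha * trans                                         # voxel weights
    return (weights.unsqueeze(-1) * colors).sum(dim=0)              # final pixel color
```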
In an alternative embodiment, in step S342, performing surface color reconstruction on the second reconstruction result, to obtain a fourth reconstruction result includes:
in step S3421, performing surface color reconstruction on the voxel coordinate data and the geometric reconstruction feature vector by using the target surface color prediction network model to obtain a second color value, and performing surface color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector and the view angle direction of the target view angle to obtain a third color value, where the target surface color prediction network model is obtained by using multiple sets of data through machine learning training, and the multiple sets of data include: training a second predicted value corresponding to the voxel coordinates, wherein the second predicted value is a surface rendering predicted color obtained by reconstructing the voxel coordinates, the second color value is used for determining a diffuse reflection color of the surface rendering, and the third color value is used for determining a specular reflection color of the surface rendering;
step S3422, a fourth reconstruction result is determined based on the second color value and the third color value.
The target surface color prediction network model includes neural network MLPs. When the input of the target surface color prediction network model is the voxel coordinate data and the geometric reconstruction feature vector, its output may be the second color value; the second color value is used to represent the diffuse reflection color, which can be described by an intrinsic color and spherical harmonic coefficients. The spherical harmonic coefficients may represent the irradiance of the scene, can be interpolated spatially, and may be saved as low-resolution pictures in order to reduce storage. The intrinsic color can be stored as a high-resolution image, and a clear diffuse reflection color can still be recovered by combining it with the low-resolution spherical harmonic coefficients. Specifically, when the coordinates of a three-dimensional point and the geometric reconstruction feature vector are input to the target surface color prediction network model, the output intrinsic color is a three-channel (Red Green Blue, RGB) color image, and the output spherical harmonic coefficients form nine three-channel RGB images.
When the input of the target surface color prediction network model is the voxel coordinate data, the geometric reconstruction feature vector, and the viewing angle direction of the target view angle, its output may be the third color value, and the third color value is used to represent the specular reflection color, that is, the highlight color. The highlight color is difficult to predict, so lightweight MLPs are used for the prediction; at each rendering, the geometric reconstruction feature vector and the viewing angle direction of the target view angle are input to this network, which outputs the color corresponding to the highlight. Specifically, when the coordinates of the three-dimensional point, the geometric reconstruction feature vector, and the viewing angle direction of the target view angle are input to the target surface color prediction network model, the output highlight color is an RGB color value.
The diffuse reflection color and the specular reflection color can thus be separated by the target surface color prediction network model; by predicting the intrinsic color, the spherical harmonic coefficients, and the highlight color with the target surface color prediction network model, graphics applications such as relighting can be realized. When the rendering display result of the shooting entity under the target view angle is determined using the third reconstruction result and the fourth reconstruction result, the diffuse reflection color and the specular reflection color can be superimposed to obtain the rendering display result of the shooting entity under the target view angle.
Based on the above optional embodiment, by performing surface color reconstruction on the voxel coordinate data and the geometric reconstruction feature vector by using the target surface color prediction network model, to obtain a second color value, and performing surface color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector and the view angle direction of the target view angle, to obtain a third color value, and finally determining a fourth reconstruction result based on the second color value and the third color value, the surface color of the shooting entity to be rendered and displayed under the target view angle can be rapidly determined.
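The superposition of the diffuse term (intrinsic color shaded by nine spherical harmonic coefficients) and the specular term can be sketched as follows; evaluating the spherical harmonics against the surface normal and the particular basis constants are illustrative assumptions.

```python
import numpy as np

def sh_basis(direction):
    """First nine real spherical harmonic basis values for a unit direction."""
    x, y, z = direction / np.linalg.norm(direction)
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

def surface_color(albedo_rgb, sh_coeffs_rgb, highlight_rgb, normal):
    """Superimpose the diffuse and specular terms for one surface point.

    albedo_rgb: (3,) intrinsic color; sh_coeffs_rgb: (9, 3) spherical harmonic
    coefficients; highlight_rgb: (3,) specular color from the light MLP.
    """
    irradiance = sh_basis(normal) @ sh_coeffs_rgb            # per-channel shading
    diffuse = albedo_rgb * irradiance                          # diffuse reflection color
    return np.clip(diffuse + highlight_rgb, 0.0, 1.0)         # final surface color
```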
In an optional embodiment, the image processing method of the shooting entity in the embodiment of the application further includes:
step S41, calculating a first loss by using the first predicted value and a first real value, wherein the first real value is used for representing the real color of the volume rendering;
step S42, calculating a second loss by using the second predicted value and a second real value, wherein the second real value is used for representing the real color of the surface rendering;
step S43, calculating to obtain target loss through the first loss and the second loss;
step S44, using the target loss, updating model parameters of the initial geometric reconstruction network model to obtain the target geometric reconstruction network model, updating model parameters of the initial voxel color prediction network model to obtain the target voxel color prediction network model, and updating model parameters of the initial surface color prediction network model to obtain the target surface color prediction network model.
Specifically, during training, a first loss is constructed using the volume rendering predicted color obtained by volume rendering integration and the volume rendering real color, and a second loss is constructed using the surface rendering predicted color obtained by differentiable surface rendering and the surface rendering real color. The target loss is then calculated from the first loss and the second loss, and the target loss is used to continuously and iteratively optimize the model parameters of the initial geometric reconstruction network model, the initial voxel color prediction network model, and the initial surface color prediction network model, so as to improve the model performance of the target geometric reconstruction network model, the target voxel color prediction network model, and the target surface color prediction network model, and further optimize the rendering display result of the shooting entity under the target view angle. In addition, the use of differentiable surface rendering can also alleviate the problem of large surface noise of the geometric model caused by the introduction of the hash features.
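A minimal sketch of the joint loss is given below; the use of L1 losses, the weighting factor, and the Adam optimizer shown in the usage comment are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def joint_loss(volume_pred, volume_gt, surface_pred, surface_gt, surface_weight=1.0):
    """Target loss: volume-rendering loss plus weighted surface-rendering loss.

    volume_pred / surface_pred: colors predicted by volume rendering and by
    differentiable surface rendering; volume_gt / surface_gt: real colors.
    """
    loss_volume = F.l1_loss(volume_pred, volume_gt)      # first loss
    loss_surface = F.l1_loss(surface_pred, surface_gt)   # second loss
    return loss_volume + surface_weight * loss_surface   # target loss

# Illustrative joint update of the three networks:
# optimizer = torch.optim.Adam(
#     list(geometry_net.parameters()) + list(voxel_color_net.parameters())
#     + list(surface_color_net.parameters()), lr=1e-3)
# loss = joint_loss(volume_pred, volume_gt, surface_pred, surface_gt)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```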
In an optional embodiment, the image processing method of the shooting entity in the embodiment of the application further includes:
step S35, displaying rendering display results on a display screen of the virtual reality VR device or the augmented reality AR device;
Step S36, performing a post-processing operation based on the rendering display result to obtain an updated rendering display result, wherein the post-processing operation includes at least one of the following: relighting, physical simulation, and texture editing.
Based on the above optional embodiment, the rendering display result is displayed on the display screen of the virtual reality VR device or the augmented reality AR device, and a post-processing operation is then performed based on the rendering display result to obtain an updated rendering display result, so that real-time new view angle generation for common graphics applications can be realized.
In an optional embodiment, a graphical user interface is provided through the terminal device, where content displayed by the graphical user interface at least partially includes a rendering display result generating scene, and the image processing method of the shooting entity in the embodiment of the present application further includes:
step S51, responding to a first touch operation acted on a graphical user interface, and selecting a three-dimensional object acquisition function;
step S52, responding to a second touch operation acting on the graphical user interface, and selecting to start to create a three-dimensional object acquisition item under the three-dimensional object acquisition function;
step S53, in response to a third touch operation acting on the graphical user interface, determining the name and type of the three-dimensional object acquisition item and starting a video acquisition function to acquire shooting data;
Step S54, in response to the fourth touch operation on the graphical user interface, the shooting data is uploaded to generate a rendering display result of the shooting entity under the target viewing angle.
In the above optional embodiment, at least a rendering display result generating scene is displayed in the graphical user interface, and the user performs sparse reconstruction on the shooting data through the rendering display result generating scene, further performs geometric reconstruction on the sparse reconstruction result, and finally performs color reconstruction on the geometric reconstruction result, so as to obtain the rendering display result of the shooting entity under the new real-time view angle through multiple different data reconstruction processes. The rendering display result generating scene can be, but not limited to, application scenes related to real-time rendering display of shooting entities in the fields of electronic commerce, education, medical treatment, conferences, social networks, financial products, logistics, navigation and the like.
The graphical user interface further includes a first control (or a first touch area); when a first touch operation acting on the first control (or the first touch area) is detected, the three-dimensional object acquisition function is selected, where the first touch operation may be an operation such as clicking, box selection, ticking, or conditional filtering.
The graphical user interface further includes a second control (or a second touch area); when a second touch operation acting on the second control (or the second touch area) is detected, creation of a three-dimensional object acquisition item is started under the three-dimensional object acquisition function. The second touch operation may be an operation such as clicking, box selection, ticking, or conditional filtering.
The graphical user interface further includes a third control (or a third touch area); when a third touch operation acting on the third control (or the third touch area) is detected, the name and type of the three-dimensional object acquisition item are determined and the video acquisition function is started to acquire shooting data. The third touch operation may be an operation such as clicking, box selection, ticking, or conditional filtering.
The graphical user interface further comprises a fourth control (or a fourth touch area), and when fourth touch operation acting on the fourth control (or the fourth touch area) is detected, shooting data are uploaded to generate a rendering display result of the shooting entity under the target visual angle. The fourth touch operation may be clicking the "upload" button, pressing for a long time, or the like.
Further, sparse reconstruction is carried out on shooting data, geometric reconstruction is carried out on the sparse reconstruction result, finally, color reconstruction is carried out on the geometric reconstruction result, and a rendering display result of the shooting entity under a real-time new view angle is obtained through multiple different data reconstruction processes, so that a display effect of the shooting entity in real-time rendering display is improved. The process provides higher operation flexibility for the user, and the user experience is good.
It should be noted that the first touch operation, the second touch operation, the third touch operation, and the fourth touch operation may be operations in which a user touches the display screen of the terminal device with a finger. The touch operations may include single-point touch and multi-point touch, where the touch operation of each touch point may include clicking, long pressing, heavy pressing, swiping, and the like. The first touch operation, the second touch operation, the third touch operation, and the fourth touch operation may also be implemented through input devices such as a mouse and a keyboard.
The following describes an image processing method of a shooting entity in an embodiment of the present application with reference to the accompanying drawings.
Fig. 4, 5, 6 and 9 are schematic diagrams of a client interface of an image processing method of a shooting entity according to an embodiment of the present application. As shown in fig. 4, three-dimensional content may be created on the user's mobile phone interface, including but not limited to three-dimensional object acquisition, three-dimensional digital person acquisition, and three-dimensional scene acquisition. For example, during three-dimensional object acquisition, three-dimensional object models such as shoes, bags, toys, and home furnishings can be created through shooting; during three-dimensional digital person acquisition, a three-dimensional twin digital person image with a high degree of realism can be created through shooting; during three-dimensional scene acquisition, three-dimensional scenes such as houses, shopping malls, and indoor exhibition halls can be created through shooting.
After the corresponding acquisition option is selected by touch on the interface shown in fig. 4, the acquisition item creation interface shown in fig. 5 can be entered. After the acquisition item creation control is selected by touch on the interface shown in fig. 5, the acquisition item definition interface shown in fig. 6 can be entered, where the name, type, and acquisition form of the acquisition item can be configured. After the configuration is completed, the start acquisition control is clicked, and the mobile phone camera can be started to photograph the shooting entity from different angles. Fig. 7 and 8 are examples of shooting data corresponding to a three-dimensional character model A, in which fig. 7 is a front view corresponding to one shooting entity according to an embodiment of the present application, and fig. 8 is a left view corresponding to one shooting entity according to an embodiment of the present application.
After shooting is completed, shooting data is obtained, and sparse reconstruction is performed on the shooting data to obtain a reconstruction range corresponding to the three-dimensional character model A. The reconstruction range may be shown as the cube in fig. 9, and the user can manually adjust the size, angle, and height of the bounding box in the interface shown in fig. 9, so as to ensure that the bounding box completely encloses the three-dimensional character model A. Geometric reconstruction is then performed on the sparse reconstruction result, and color reconstruction is finally performed on the geometric reconstruction result; through these multiple data reconstruction processes, a rendering display result of the three-dimensional character model A under a real-time new view angle is obtained, as shown in fig. 10.
By using neural-network MLPs, the embodiment of the application can effectively alleviate the problems in weak-texture or texture-free areas; the MLPs have a strong smoothing effect, which is equivalent to a regularization term that improves the prediction result. By adopting the Mesh-based scheme, the intersection point with the model surface can be obtained quickly using hardware rasterization, and the final color can then be obtained by a simple calculation, which avoids the complex process in NeRF of computing the color from many sampling points, so that real-time rendering and display can be achieved on the mobile phone end. Compared with MobileNeRF, the embodiment of the application uses a neural network to predict the SDF value, then converts the SDF value into a density for volume rendering, and generates the color through volume rendering integration, thereby making the network easier to optimize and the model more accurate.
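The exact SDF-to-density conversion is not fixed by this description; as an illustrative sketch only, the following assumes a Laplace-CDF style mapping of the kind commonly used in SDF-based volume rendering, with hypothetical parameters beta and alpha:

```python
import torch

def sdf_to_density(sdf: torch.Tensor, beta: float = 0.01, alpha: float = 100.0) -> torch.Tensor:
    """Map predicted SDF values to volume-rendering densities (illustrative sketch).

    Laplace-CDF style mapping: density approaches alpha deep inside the surface
    (sdf < 0) and decays towards zero outside it (sdf > 0); beta controls how
    sharply the density changes near the surface.
    """
    s = -sdf / beta
    psi = torch.where(s <= 0, 0.5 * torch.exp(s), 1.0 - 0.5 * torch.exp(-s))
    return alpha * psi
```

The resulting density can then be used in standard volume rendering integration to accumulate the color along each viewing ray.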
According to the embodiment of the application, an end-to-end solution is adopted and a differentiable surface rendering module is designed to realize joint rendering, so that the geometric reconstruction network and the surface color prediction network can be trained simultaneously, the two networks promote each other, and the performance of each network is improved. By applying occupancy-grid and hash-feature techniques, the training process can be effectively accelerated and the color quality improved. The problem of large surface noise on the model caused by the hash features can be alleviated by the differentiable surface rendering technique. By separating the diffuse reflection color from the specular (highlight) color, the storage footprint can be effectively reduced, and conventional graphics applications such as relighting, physical simulation, and texture editing can be realized.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus a necessary general hardware platform, but that it may also be implemented by means of hardware. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
Example 2
There is also provided in accordance with an embodiment of the present application a method of processing an image of a photographing entity, it being noted that the steps shown in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in a different order than that shown.
Fig. 11 is a flowchart of an image processing method of a photographing entity according to still another embodiment of the present application, which may be performed on a server side. As shown in fig. 11, the method may include the following steps:
step S111, receiving shooting data from a client, wherein the shooting data is obtained by carrying out image acquisition on a shooting entity;
step S112, performing sparse reconstruction on the shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity;
Step S113, feeding back the rendering display result to the client.
In the embodiment of the application, shooting data from a client is received, wherein the shooting data is obtained by carrying out image acquisition on a shooting entity; performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under a target view angle, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and feeding back the rendering display result to the client.
It is easy to understand that the server adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result, and finally performing color reconstruction on the geometric reconstruction result, obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, and feeds back the rendering display result to the client, thereby improving the display effect when the shooting entity is rendered and displayed in real time. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
Optionally, fig. 12 is a schematic diagram of an image processing method for a shooting entity at a cloud server according to an embodiment of the present application, as shown in fig. 12, a client uploads shooting data to the cloud server, the cloud server performs sparse reconstruction on the shooting data to obtain a first reconstruction result, performs geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performs color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under a target viewing angle, and then the cloud server feeds back the rendering and displaying result to the client, and the final rendering and displaying result is displayed to a user through a graphical user interface of the client.
It should be noted that, the image processing method of the shooting entity provided by the embodiment of the application can be applied to the application scene related to real-time rendering and displaying of the shooting entity, especially the real-time generation scene of a new view angle in the fields of e-commerce, education, medical treatment, conference, social network, financial products, logistics, navigation and the like.
Example 3
There is also provided, in accordance with an embodiment of the present application, a rendering presentation method applicable to a capturing entity in a virtual reality scene such as a virtual reality VR device, an augmented reality AR device, or the like, where it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different from that illustrated herein.
Fig. 13 is a flowchart of a rendering presentation method of a photographing entity according to an embodiment of the present application. As shown in fig. 13, the method may include the following steps:
step S131, displaying a rendering display result under a first view angle on a display picture of a Virtual Reality (VR) device or an Augmented Reality (AR) device, wherein the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity;
step S132, in response to a received view angle switching instruction, driving the VR device or the AR device to display the rendering display result under the second view angle.
In the embodiment of the application, a rendering display result under a first view angle is displayed on a display picture of a Virtual Reality (VR) device or an Augmented Reality (AR) device, wherein the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and in response to the received view angle switching instruction, the VR device or the AR device is driven to display a rendering display result under the second view angle.
It is easy to understand that the method adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result and finally performing color reconstruction on the geometric reconstruction result, and obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, so that the display effect of the shooting entity in real-time rendering display is improved. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
It should be noted that, the rendering and displaying method of the shooting entity provided by the embodiment of the application can be applied to the application scene related to rendering and displaying the shooting entity in real time in the fields of e-commerce, education, medical treatment, conference, social network, financial products, logistics, navigation and the like, particularly in a real-time new view angle generating scene.
Alternatively, in this embodiment, the rendering and displaying method of the shooting entity may be applied to a hardware environment formed by a server and a virtual reality device. The rendering display result under the target view angle is displayed on the display screen of the virtual reality VR device or the augmented reality AR device; the server may be a server corresponding to a media file operator; the network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network; and the virtual reality device includes, but is not limited to, a virtual reality helmet, virtual reality glasses, a virtual reality all-in-one machine, and the like.
Optionally, the virtual reality device comprises: a memory, a processor, and a transmission means. The memory is used to store an application program that can be used to perform: displaying a rendering display result under a first view angle on a display picture of a Virtual Reality (VR) device or an Augmented Reality (AR) device, wherein the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and in response to a received view angle switching instruction, the VR device or the AR device is driven to display a rendering display result under the second view angle.
It should be noted that, the method for rendering and displaying a shooting entity applied in a VR device or an AR device in this embodiment may include the method of the embodiment shown in fig. 13, so as to achieve the purpose of driving the VR device or the AR device to display a rendering and displaying result under a target viewing angle.
Alternatively, the processor of this embodiment may call the application program stored in the memory through the transmission device to perform the above steps. The transmission device can receive the media file sent by the server through the network and can also be used for data transmission between the processor and the memory.
Optionally, the virtual reality device is provided with a head-mounted display (HMD) with eye tracking. A screen in the HMD is used for displaying the video picture; an eye-tracking module in the HMD is used for acquiring real-time motion trajectories of the user's eyes; a tracking system is used for tracking position information and motion information of the user in real three-dimensional space; and a calculation processing unit is used for acquiring the real-time position and motion information of the user from the tracking system and calculating the three-dimensional coordinates of the user's head in the virtual three-dimensional space, the field-of-view orientation of the user in the virtual three-dimensional space, and the like.
In the embodiment of the present application, the virtual reality device may be connected to a terminal, and the terminal and the server are connected through a network. The terminal includes, but is not limited to, a PC, a mobile phone, a tablet PC, and the like; the server may be a server corresponding to a media file operator; and the network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network.
Example 4
According to an embodiment of the present application, there is also provided an image processing apparatus of a photographing entity for implementing the image processing method of a photographing entity, and fig. 14 is a block diagram of the image processing apparatus of a photographing entity according to an embodiment of the present application, as shown in fig. 14, the apparatus including:
the acquisition module 1401 is configured to acquire an image of a shooting entity, so as to obtain shooting data;
a first reconstruction module 1402, configured to perform sparse reconstruction on the photographed data to obtain a first reconstruction result, where the first reconstruction result is used to determine a reconstruction range corresponding to the photographed entity;
a second reconstruction module 1403, configured to perform geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, where the second reconstruction result is used to determine a three-dimensional grid model corresponding to the shooting entity;
and a third reconstruction module 1404, configured to perform color reconstruction on the second reconstruction result, so as to obtain a rendering and displaying result of the shooting entity under the target view angle.
Optionally, the acquisition module 1401 is further configured to: acquire images of the shooting entity using an image acquisition component to obtain a preset number of image frames; perform image gradient variance calculation on the preset number of image frames to determine a blur index; and screen the preset number of image frames using the blur index to obtain the shooting data.
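A minimal sketch of the gradient-variance screening described above; the concrete metric and threshold are not specified here, so a simple finite-difference gradient magnitude, its variance as the blur index, and an illustrative threshold are assumed:

```python
import numpy as np

def blur_index(gray: np.ndarray) -> float:
    """Blur index of a grayscale frame: variance of the image gradient magnitude.
    Sharp frames produce high values; blurry frames produce low values."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.var(np.hypot(gx, gy)))

def select_sharp_frames(frames, threshold: float = 50.0):
    """Keep only frames whose blur index exceeds the (illustrative) threshold."""
    return [f for f in frames if blur_index(f) >= threshold]
```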
Optionally, the first reconstruction module 1402 is further configured to: analyzing the shooting data to obtain an analysis result, wherein the analysis result is used for determining whether the shooting data contains calibration plate data or not; and performing sparse reconstruction on the shooting data by adopting a reconstruction mode corresponding to the analysis result to obtain a first reconstruction result.
Optionally, the first reconstruction module 1402 is further configured to: when the shooting data are determined to contain calibration plate data through the analysis result, positioning pose information of the image acquisition component and sparse point clouds corresponding to shooting entities by using the calibration plate data; and performing sparse reconstruction based on the pose information and the sparse point cloud to obtain a first reconstruction result.
Optionally, the first reconstruction module 1402 is further configured to: determining the center of the reconstruction range based on pose information, and determining the side length of the reconstruction range based on sparse point cloud; and performing sparse reconstruction according to the center and the side length to obtain a first reconstruction result.
Optionally, the first reconstruction module 1402 is further configured to: when the analysis result shows that the shooting data does not contain the calibration plate data, acquiring pose information of the image acquisition component in a three-dimensional reconstruction mode; and performing sparse reconstruction based on the pose information to obtain a first reconstruction result.
Optionally, the first reconstruction module 1402 is further configured to: determining a center of the reconstruction range based on the pose information, and determining a radius of the reconstruction range based on a distance between the center and the image acquisition assembly; and performing sparse reconstruction according to the center and the radius to obtain a first reconstruction result.
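The two reconstruction-range estimates above can be sketched roughly as follows; the precise geometric rules (how the center is derived from the poses, and the padding applied to the side length or radius) are assumptions made only for illustration:

```python
import numpy as np

def cube_range(camera_centers: np.ndarray, sparse_points: np.ndarray, pad: float = 1.1):
    """Calibration-board branch: center from the camera poses, side length from
    the extent of the sparse point cloud (the pad factor is an illustrative choice)."""
    center = camera_centers.mean(axis=0)  # crude proxy for the region the cameras observe
    side = pad * float((sparse_points.max(axis=0) - sparse_points.min(axis=0)).max())
    return center, side

def sphere_range(camera_centers: np.ndarray, scale: float = 0.5):
    """No-calibration-board branch: center from the camera poses, radius from the
    distance between the center and the cameras (the scale factor is illustrative)."""
    center = camera_centers.mean(axis=0)
    radius = scale * float(np.linalg.norm(camera_centers - center, axis=1).mean())
    return center, radius
```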
Optionally, the second reconstruction module 1403 is further configured to: perform discrete sampling on the first reconstruction result to obtain voxel coordinate data; perform geometric reconstruction on the voxel coordinate data by using a target geometric reconstruction network model to obtain a signed distance function value, where the target geometric reconstruction network model is obtained through machine learning training using a plurality of groups of data, and the plurality of groups of data include: training voxel coordinates obtained by reconstructing training images; and perform three-dimensional modeling based on the signed distance function value to obtain the second reconstruction result.
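The description does not name a particular meshing algorithm for this three-dimensional modeling step; a common choice, assumed here purely for illustration, is to evaluate the predicted signed distance function on a dense grid and extract its zero level set with marching cubes:

```python
import numpy as np
from skimage import measure

def sdf_grid_to_mesh(sdf_grid: np.ndarray, voxel_size: float, origin: np.ndarray):
    """Extract the zero level set of a dense SDF grid as a triangle mesh."""
    verts, faces, normals, _ = measure.marching_cubes(sdf_grid, level=0.0)
    verts = verts * voxel_size + origin  # voxel indices -> world coordinates
    return verts, faces, normals
```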
Optionally, the second reconstruction module 1403 is further configured to: perform discrete sampling on the occupied voxel grid of the shooting entity within the reconstruction range based on the first reconstruction result to obtain the voxel coordinate data.
Optionally, the second reconstruction module 1403 is further configured to: perform hash coding on the voxel coordinate data to obtain hash features; and geometrically reconstruct the voxel coordinate data and the hash features by using the target geometric reconstruction network model to obtain the signed distance function value.
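The hash coding of voxel coordinates could, for example, follow a spatial-hash feature lookup of the kind used in multi-resolution hash encodings; the single-level sketch below is a simplification, and the table size, feature width, and resolution are hypothetical:

```python
import torch
import torch.nn as nn

class HashFeatureGrid(nn.Module):
    """Single-level spatial-hash feature lookup (simplified; no interpolation,
    no multi-resolution pyramid)."""

    PRIMES = (1, 2654435761, 805459861)

    def __init__(self, table_size: int = 2 ** 19, feat_dim: int = 2, resolution: int = 128):
        super().__init__()
        self.table_size = table_size
        self.resolution = resolution
        self.table = nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz normalized to [0, 1]^3 -> integer voxel coordinates
        idx = (xyz.clamp(0.0, 1.0) * (self.resolution - 1)).long()
        h = torch.zeros(idx.shape[0], dtype=torch.long, device=xyz.device)
        for d, prime in enumerate(self.PRIMES):
            h = h ^ (idx[:, d] * prime)
        return self.table[h % self.table_size]  # hash features for the geometry network
```

The looked-up features would then be concatenated with the voxel coordinates and fed to the geometric reconstruction network that predicts the signed distance function value.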
Optionally, the third reconstruction module 1404 is further configured to: perform voxel color reconstruction on the second reconstruction result to obtain a third reconstruction result, where the third reconstruction result is used to determine the voxel color of the shooting entity to be rendered and displayed under the target view angle; perform surface color reconstruction on the second reconstruction result to obtain a fourth reconstruction result, where the fourth reconstruction result is used to determine the surface color of the shooting entity to be rendered and displayed under the target view angle; and determine the rendering and displaying result of the shooting entity under the target view angle by using the third reconstruction result and the fourth reconstruction result.
Optionally, the third reconstruction module 1404 is further configured to: perform voxel color reconstruction on the voxel coordinate data, a geometric reconstruction feature vector, and the view angle direction of the target view angle by using a target voxel color prediction network model to obtain a first color value, where the target voxel color prediction network model is obtained through machine learning training using a plurality of groups of data, and the plurality of groups of data include: a first predicted value corresponding to the training voxel coordinates, the first predicted value being a volume-rendering predicted color obtained by reconstructing the voxel coordinates, the first color value being used to determine the volume-rendering color, and the geometric reconstruction feature vector being obtained after the voxel coordinate data is geometrically reconstructed by the target geometric reconstruction network model; and calculate the third reconstruction result based on the first color value and the voxel weight corresponding to the signed distance function value.
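A hedged sketch of how the third reconstruction result could be accumulated along one ray from the first color values and the SDF-derived voxel weights, using standard alpha compositing; the SDF-to-density mapping and its parameters are the same illustrative assumption as in the earlier sketch:

```python
import torch

def composite_ray(colors: torch.Tensor, sdf: torch.Tensor, deltas: torch.Tensor,
                  beta: float = 0.01, alpha_scale: float = 100.0) -> torch.Tensor:
    """Alpha-composite per-sample colors along a single ray.

    colors: (N, 3) first color values predicted by the voxel color network
    sdf:    (N,)   signed distance values at the ray samples
    deltas: (N,)   spacing between consecutive samples
    """
    # SDF -> density via the (assumed) Laplace-CDF style mapping
    s = -sdf / beta
    sigma = alpha_scale * torch.where(s <= 0, 0.5 * torch.exp(s), 1.0 - 0.5 * torch.exp(-s))
    alpha = 1.0 - torch.exp(-sigma * deltas)                  # per-sample opacity
    ones = torch.ones(1, device=alpha.device, dtype=alpha.dtype)
    transmittance = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * transmittance                           # voxel weights from the SDF values
    return (weights[:, None] * colors).sum(dim=0)             # volume-rendered pixel color
```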
Optionally, the third reconstruction module 1404 is further configured to: perform surface color reconstruction on the voxel coordinate data and the geometric reconstruction feature vector by using a target surface color prediction network model to obtain a second color value, and perform surface color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector, and the view angle direction of the target view angle to obtain a third color value, where the target surface color prediction network model is obtained through machine learning training using a plurality of groups of data, and the plurality of groups of data include: a second predicted value corresponding to the training voxel coordinates, the second predicted value being a surface-rendering predicted color obtained by reconstructing the voxel coordinates, the second color value being used to determine the diffuse reflection color of the surface rendering, and the third color value being used to determine the specular reflection color of the surface rendering; and determine the fourth reconstruction result based on the second color value and the third color value.
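The diffuse/specular split of the surface color prediction could be organized as two heads of one network, the diffuse head view-independent and the specular head view-dependent; the layer widths and activations below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SurfaceColorNet(nn.Module):
    """Predict a view-independent diffuse color and a view-dependent specular color."""

    def __init__(self, feat_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.diffuse = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())
        self.specular = nn.Sequential(
            nn.Linear(3 + feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, xyz, geo_feat, view_dir):
        c_diffuse = self.diffuse(torch.cat([xyz, geo_feat], dim=-1))               # second color value
        c_specular = self.specular(torch.cat([xyz, geo_feat, view_dir], dim=-1))   # third color value
        return (c_diffuse + c_specular).clamp(0.0, 1.0)  # combined surface color (fourth result)
```

Storing only the compact diffuse texture plus a small specular head is one way the storage footprint mentioned above could be reduced, though the exact storage layout is not specified here.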
Optionally, the image processing apparatus of the shooting entity further includes: a calculation module 1405 for calculating a first loss using the first predicted value and a first real value, wherein the first real value is used to represent a volume rendering real color; calculating a second loss using the second predicted value and a second real value, wherein the second real value is used for representing the surface rendering real color; calculating to obtain target loss through the first loss and the second loss; an updating module 1406 is configured to update model parameters of the initial geometry reconstruction network model with the target loss to obtain a target geometry reconstruction network model, update model parameters of the initial voxel color prediction network model to obtain a target voxel color prediction network model, and update model parameters of the initial surface color prediction network model to obtain a target surface color prediction network model.
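A minimal sketch of the joint update described above; the loss form (L1) and the weights are assumptions, and the optimizer is assumed to hold the parameters of the geometry reconstruction, voxel color prediction, and surface color prediction networks together so that one target loss updates them simultaneously:

```python
import torch

def joint_training_step(optimizer, pred_volume_rgb, gt_volume_rgb,
                        pred_surface_rgb, gt_surface_rgb,
                        w_volume: float = 1.0, w_surface: float = 1.0):
    """One optimization step over the three jointly trained networks."""
    loss_volume = torch.nn.functional.l1_loss(pred_volume_rgb, gt_volume_rgb)     # first loss
    loss_surface = torch.nn.functional.l1_loss(pred_surface_rgb, gt_surface_rgb)  # second loss
    loss = w_volume * loss_volume + w_surface * loss_surface                      # target loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```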
Optionally, the image processing apparatus of the shooting entity further includes: a display module 1407, configured to display the rendering display result on a display screen of a virtual reality VR device or an augmented reality AR device; and a processing module 1408, configured to perform a post-processing operation based on the rendering display result to obtain an updated rendering display result, where the post-processing operation includes at least one of: relighting, physical simulation, and texture editing.
Optionally, the image processing apparatus of the shooting entity further includes: a response module 1409 for selecting a three-dimensional object acquisition function in response to a first touch operation acting on the graphical user interface; responding to a second touch operation acting on the graphical user interface, and selecting to start to create a three-dimensional object acquisition item under the three-dimensional object acquisition function; responding to a third touch operation acting on the graphical user interface, determining the name and type of a three-dimensional object acquisition project, and starting a video acquisition function to acquire shooting data; and in response to a fourth touch operation acting on the graphical user interface, uploading shooting data to generate a rendering display result of the shooting entity under the target view angle.
In the embodiment of the application, image acquisition is carried out on a shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
It is easy to understand that the method adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result and finally performing color reconstruction on the geometric reconstruction result, and obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, so that the display effect of the shooting entity in real-time rendering display is improved. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
Here, it should be noted that the acquisition module 1401, the first reconstruction module 1402, the second reconstruction module 1403, and the third reconstruction module 1404 correspond to steps S31 to S34 in embodiment 1, and the four modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to those disclosed in embodiment 1. It should be noted that the above modules or units may be hardware components or software components stored in a memory and processed by one or more processors, or the above modules may also be part of an apparatus and may be run in the AR/VR device provided in embodiment 1.
Fig. 15 is a block diagram of a still another image processing apparatus of a photographing entity according to an embodiment of the present application, as shown in fig. 15, the apparatus including:
a receiving module 1501, configured to receive photographing data from a client, where the photographing data is obtained by performing image acquisition on a photographing entity;
a processing module 1502, configured to perform sparse reconstruction on the shooting data to obtain a first reconstruction result, perform geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and perform color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle, where the first reconstruction result is used to determine a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used to determine a three-dimensional grid model corresponding to the shooting entity;
and the feedback module 1503 is used for feeding back the rendering display result to the client.
In the embodiment of the application, shooting data from a client is received, wherein the shooting data is obtained by carrying out image acquisition on a shooting entity; performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under a target view angle, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and feeding back the rendering display result to the client.
It is easy to understand that the server adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result, and finally performing color reconstruction on the geometric reconstruction result, obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, and feeds back the rendering display result to the client, thereby improving the display effect when the shooting entity is rendered and displayed in real time. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
Here, it should be noted that the above-mentioned receiving module 1501, processing module 1502 and feedback module 1503 correspond to steps S111 to S113 in embodiment 2, and the three modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to those disclosed in embodiment 2 above. It should be noted that the above modules or units may be hardware components or software components stored in a memory and processed by one or more processors, or the above modules may also be part of an apparatus and may be run in the AR/VR device provided in embodiment 2.
Fig. 16 is a block diagram of a rendering presentation apparatus of a photographing entity according to an embodiment of the present application, as shown in fig. 16, the apparatus including:
the display module 1601 is configured to display a rendering display result under a first view angle on a display screen of a virtual reality VR device or an augmented reality AR device, where the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used to determine a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used to determine a three-dimensional grid model corresponding to the shooting entity;
the response module 1602 is configured to, in response to the received view angle switching instruction, drive the VR device or the AR device to display a rendering display result at the second view angle.
In the embodiment of the application, a rendering display result under a first view angle is displayed on a display picture of a Virtual Reality (VR) device or an Augmented Reality (AR) device, wherein the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and in response to the received view angle switching instruction, the VR device or the AR device is driven to display a rendering display result under the second view angle.
It is easy to understand that the method adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result and finally performing color reconstruction on the geometric reconstruction result, and obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, so that the display effect of the shooting entity in real-time rendering display is improved. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
Here, the display module 1601 and the response module 1602 correspond to steps S131 to S132 in embodiment 3, and the two modules are the same as the examples and the application scenarios implemented by the corresponding steps, but are not limited to those disclosed in embodiment 3. It should be noted that the above modules or units may be hardware components or software components stored in a memory and processed by one or more processors, or the above modules may also be part of an apparatus and may be run in the AR/VR device provided in embodiment 3.
Example 5
The embodiment of the application can provide an image processing system of a shooting entity. The system can comprise an AR/VR device, a server, and a client, wherein the AR/VR device may be any AR/VR device in an AR/VR device group. Optionally, the image processing system of the shooting entity includes: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following steps:
image acquisition is carried out on a shooting entity to obtain shooting data;
performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity;
performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity;
and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
In the embodiment of the application, image acquisition is carried out on a shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
It is easy to understand that the method adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result and finally performing color reconstruction on the geometric reconstruction result, and obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, so that the display effect of the shooting entity in real-time rendering display is improved. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
Example 6
Embodiments of the present application may provide an AR/VR device that may be any one of a group of AR/VR devices. Alternatively, in this embodiment, the AR/VR device may be replaced by a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the AR/VR device may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the above-mentioned AR/VR device may execute the program codes of the following steps in the image processing method of the capturing entity: image acquisition is carried out on a shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
Alternatively, fig. 17 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 17, the computer terminal may include: one or more (only one is shown) processors 172, a memory 174, a memory controller, and a peripheral interface, wherein the peripheral interface connects with the radio frequency module, the audio module, and the display.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the image processing method and apparatus of the shooting entity in the embodiment of the present application, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the image processing method of the shooting entity. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located relative to the processor, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: image acquisition is carried out on a shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
Optionally, the above processor may further execute program code for: acquiring images of the shooting entity by using an image acquisition component to obtain a preset number of image frames; performing image gradient variance calculation on the preset number of image frames to determine a blur index; and screening the preset number of image frames by using the blur index to obtain the shooting data.
Optionally, the above processor may further execute program code for: analyzing the shooting data to obtain an analysis result, wherein the analysis result is used for determining whether the shooting data contains calibration plate data or not; and performing sparse reconstruction on the shooting data by adopting a reconstruction mode corresponding to the analysis result to obtain a first reconstruction result.
Optionally, the above processor may further execute program code for: when the shooting data are determined to contain calibration plate data through the analysis result, positioning pose information of the image acquisition component and sparse point clouds corresponding to shooting entities by using the calibration plate data; and performing sparse reconstruction based on the pose information and the sparse point cloud to obtain a first reconstruction result.
Optionally, the above processor may further execute program code for: determining the center of the reconstruction range based on pose information, and determining the side length of the reconstruction range based on sparse point cloud; and performing sparse reconstruction according to the center and the side length to obtain a first reconstruction result.
Optionally, the above processor may further execute program code for: when the analysis result shows that the shooting data does not contain the calibration plate data, acquiring pose information of the image acquisition component in a three-dimensional reconstruction mode; and performing sparse reconstruction based on the pose information to obtain a first reconstruction result.
Optionally, the above processor may further execute program code for: determining a center of the reconstruction range based on the pose information, and determining a radius of the reconstruction range based on a distance between the center and the image acquisition assembly; and performing sparse reconstruction according to the center and the radius to obtain a first reconstruction result.
Optionally, the above processor may further execute program code for: performing discrete sampling on the first reconstruction result to obtain voxel coordinate data; performing geometric reconstruction on the voxel coordinate data by using a target geometric reconstruction network model to obtain a signed distance function value, wherein the target geometric reconstruction network model is obtained through machine learning training using a plurality of groups of data, and the plurality of groups of data include: training voxel coordinates obtained by reconstructing training images; and performing three-dimensional modeling based on the signed distance function value to obtain a second reconstruction result.
Optionally, the above processor may further execute program code for: and carrying out discrete sampling on the occupied voxel grid of the shooting entity in the reconstruction range based on the first reconstruction result to obtain voxel coordinate data.
Optionally, the above processor may further execute program code for: performing hash coding on the voxel coordinate data to obtain hash features; and geometrically reconstructing the voxel coordinate data and the hash features by using the target geometric reconstruction network model to obtain the signed distance function value.
Optionally, the above processor may further execute program code for: performing voxel color reconstruction on the second reconstruction result to obtain a third reconstruction result, wherein the third reconstruction result is used for determining the voxel color of the shooting entity to be rendered and displayed under the target view angle; performing surface color reconstruction on the second reconstruction result to obtain a fourth reconstruction result, wherein the fourth reconstruction result is used for determining the surface color of the shooting entity to be rendered and displayed under the target view angle; and determining a rendering and displaying result of the shooting entity under the target view angle by using the third reconstruction result and the fourth reconstruction result.
Optionally, the above processor may further execute program code for: performing voxel color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector, and the view angle direction of the target view angle by using a target voxel color prediction network model to obtain a first color value, wherein the target voxel color prediction network model is obtained through machine learning training using a plurality of groups of data, and the plurality of groups of data include: a first predicted value corresponding to the training voxel coordinates, wherein the first predicted value is a volume-rendering predicted color obtained by reconstructing the voxel coordinates, the first color value is used for determining the volume-rendering color, and the geometric reconstruction feature vector is obtained after the voxel coordinate data is geometrically reconstructed by the target geometric reconstruction network model; and calculating the third reconstruction result based on the first color value and the voxel weight corresponding to the signed distance function value.
Optionally, the above processor may further execute program code for: carrying out surface color reconstruction on voxel coordinate data and a geometric reconstruction feature vector by adopting a target surface color prediction network model to obtain a second color value, and carrying out surface color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector and a view angle direction of a target view angle to obtain a third color value, wherein the target surface color prediction network model is obtained by machine learning training by utilizing a plurality of groups of data, and the plurality of groups of data comprise: training a second predicted value corresponding to the voxel coordinates, wherein the second predicted value is a surface rendering predicted color obtained by reconstructing the voxel coordinates, the second color value is used for determining a diffuse reflection color of the surface rendering, and the third color value is used for determining a specular reflection color of the surface rendering; a fourth reconstruction result is determined based on the second color value and the third color value.
Optionally, the above processor may further execute program code for: calculating a first loss by using the first predicted value and a first real value, wherein the first real value is used for representing the real color of the volume rendering; calculating a second loss using the second predicted value and a second real value, wherein the second real value is used for representing the surface rendering real color; calculating to obtain target loss through the first loss and the second loss; and updating model parameters of the initial geometric reconstruction network model to obtain a target geometric reconstruction network model by adopting target loss, updating model parameters of the initial voxel color prediction network model to obtain a target voxel color prediction network model, and updating model parameters of the initial surface color prediction network model to obtain a target surface color prediction network model.
Optionally, the above processor may further execute program code for: displaying the rendering display result on a display screen of the virtual reality VR device or the augmented reality AR device; and performing a post-processing operation based on the rendering display result to obtain an updated rendering display result, wherein the post-processing operation includes at least one of the following: relighting, physical simulation, and texture editing.
Optionally, the above processor may further execute program code for: responding to a first touch operation acted on a graphical user interface, and selecting a three-dimensional object acquisition function; responding to a second touch operation acting on the graphical user interface, and selecting to start to create a three-dimensional object acquisition item under the three-dimensional object acquisition function; responding to a third touch operation acting on the graphical user interface, determining the name and type of a three-dimensional object acquisition project, and starting a video acquisition function to acquire shooting data; and in response to a fourth touch operation acting on the graphical user interface, uploading shooting data to generate a rendering display result of the shooting entity under the target view angle.
Optionally, the above processor may further execute program code for: receiving shooting data from a client, wherein the shooting data is obtained by carrying out image acquisition on a shooting entity; performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under a target view angle, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and feeding back the rendering display result to the client.
Optionally, the above processor may further execute program code for: displaying a rendering display result under a first view angle on a display picture of a Virtual Reality (VR) device or an Augmented Reality (AR) device, wherein the rendering display result is obtained by sequentially performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result; the shooting data is obtained by performing image acquisition on a shooting entity, the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and in response to a received view angle switching instruction, driving the VR device or the AR device to display a rendering display result under the second view angle.
By adopting the embodiment of the application, the shooting entity is subjected to image acquisition to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
It is easy to understand that the method adopts a mode of performing sparse reconstruction on shooting data, further performing geometric reconstruction on the sparse reconstruction result and finally performing color reconstruction on the geometric reconstruction result, and obtains a rendering display result of the shooting entity under a real-time new view angle through a plurality of different data reconstruction processes, so that the display effect of the shooting entity in real-time rendering display is improved. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, thereby realizing the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solving the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related technology.
It will be appreciated by those skilled in the art that the structure shown in the figure is only illustrative, and the computer terminal may be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, etc. Fig. 17 does not limit the structure of the electronic device. For example, the computer terminal may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 17, or have a different configuration than shown in fig. 17.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing the relevant hardware of a terminal device, the program may be stored in a computer readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Example 7
Embodiments of the present application also provide a computer readable storage medium. Optionally, in this embodiment, the computer readable storage medium may be used to store the program code for executing the image processing method of the shooting entity provided in Example 1 and Example 2 above and the rendering and displaying method of the shooting entity provided in Example 3 above.
Alternatively, in this embodiment, the above-mentioned computer readable storage medium may be located in any one of the AR/VR device terminals in the AR/VR device network or in any one of the mobile terminals in the mobile terminal group.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: image acquisition is carried out on a shooting entity to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
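By way of non-limiting illustration, a minimal skeleton of the four-step pipeline described above might look as follows; the data classes and function names are hypothetical placeholders rather than a concrete interface.

```python
# Illustrative end-to-end skeleton of the pipeline described above; all names
# are hypothetical placeholders, not an API defined by the embodiments.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SparseResult:                    # first reconstruction result
    poses: List[list]                  # camera poses of the image acquisition component
    points: List[list]                 # sparse point cloud of the shooting entity
    recon_range: Tuple[list, float]    # (center, extent) of the reconstruction range

@dataclass
class GeometryResult:                  # second reconstruction result
    vertices: list                     # triangle mesh model of the shooting entity
    faces: list

def sparse_reconstruction(frames) -> SparseResult: ...
def geometric_reconstruction(first: SparseResult) -> GeometryResult: ...
def color_reconstruction(second: GeometryResult, target_view) -> dict: ...

def image_processing(frames, target_view):
    first = sparse_reconstruction(frames)       # determines the reconstruction range
    second = geometric_reconstruction(first)    # determines the 3D grid model
    return color_reconstruction(second, target_view)  # rendering display result
```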
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: performing image acquisition on the shooting entity by adopting an image acquisition component to obtain a preset number of image frames; performing image gradient variance calculation on the preset number of image frames to determine an ambiguity index, wherein the ambiguity index characterizes the degree of blur of each image frame; and screening the preset number of image frames by using the ambiguity index to obtain the shooting data.
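As a non-limiting illustration of the frame screening step, the following sketch computes a gradient-variance ambiguity index per frame and keeps the sharper frames; it assumes OpenCV and NumPy, and the threshold value is purely illustrative.

```python
# Sketch of blur screening by image gradient variance (threshold is illustrative).
import cv2
import numpy as np

def gradient_variance(image_path: str) -> float:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return float(magnitude.var())      # low variance -> blurry frame

def screen_frames(frame_paths, blur_threshold=100.0):
    # Keep only frames whose gradient variance exceeds the threshold.
    return [p for p in frame_paths if gradient_variance(p) > blur_threshold]
```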
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: analyzing the shooting data to obtain an analysis result, wherein the analysis result is used for determining whether the shooting data contains calibration plate data or not; and performing sparse reconstruction on the shooting data by adopting a reconstruction mode corresponding to the analysis result to obtain a first reconstruction result.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: when the shooting data are determined to contain calibration plate data through the analysis result, positioning pose information of the image acquisition component and sparse point clouds corresponding to shooting entities by using the calibration plate data; and performing sparse reconstruction based on the pose information and the sparse point cloud to obtain a first reconstruction result.
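By way of non-limiting illustration, the sketch below locates the pose of the image acquisition component from a calibration plate, assuming a standard chessboard pattern and OpenCV (the application does not prescribe a particular board type); the sparse point cloud would then be obtained, for example, by triangulating feature matches across the posed views.

```python
# Sketch of camera pose estimation from an assumed chessboard calibration plate.
import cv2
import numpy as np

def board_pose(gray, board_size=(9, 6), square=0.02, K=None, dist=None):
    ok, corners = cv2.findChessboardCorners(gray, board_size)
    if not ok:
        return None
    # 3D coordinates of the board corners in the board frame (z = 0 plane).
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square
    ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    # Pose of the image acquisition component with respect to the calibration plate.
    return rvec, tvec
```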
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: determining the center of the reconstruction range based on pose information, and determining the side length of the reconstruction range based on sparse point cloud; and performing sparse reconstruction according to the center and the side length to obtain a first reconstruction result.
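A minimal sketch of deriving a cubic reconstruction range follows, assuming the center is taken from the camera positions and the side length from the extent of the sparse point cloud; the padding factor is an illustrative choice.

```python
# Sketch of a cubic reconstruction range: center from camera poses,
# side length from the sparse point cloud extent (padding is illustrative).
import numpy as np

def cubic_range(camera_centers: np.ndarray, sparse_points: np.ndarray, pad=1.1):
    center = camera_centers.mean(axis=0)            # center of the reconstruction range
    extent = np.abs(sparse_points - center).max()   # half side length before padding
    side_length = 2.0 * extent * pad
    return center, side_length
```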
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: when the analysis result shows that the shooting data does not contain the calibration plate data, acquiring pose information of the image acquisition component in a three-dimensional reconstruction mode; and performing sparse reconstruction based on the pose information to obtain a first reconstruction result.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: determining a center of the reconstruction range based on the pose information, and determining a radius of the reconstruction range based on a distance between the center and the image acquisition assembly; and performing sparse reconstruction according to the center and the radius to obtain a first reconstruction result.
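A minimal sketch of deriving a spherical reconstruction range without a calibration plate, assuming the center is the mean of the camera positions recovered by three-dimensional (structure-from-motion) reconstruction and the radius is a fraction of the mean camera-to-center distance; the scale factor is an illustrative choice.

```python
# Sketch of a spherical reconstruction range when no calibration plate is present.
import numpy as np

def spherical_range(camera_centers: np.ndarray, scale=0.5):
    center = camera_centers.mean(axis=0)
    distances = np.linalg.norm(camera_centers - center, axis=1)
    radius = scale * distances.mean()   # shooting entity assumed near the center
    return center, radius
```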
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: performing discrete sampling on the first reconstruction result to obtain voxel coordinate data; performing geometric reconstruction on the voxel coordinate data by using a target geometric reconstruction network model to obtain a signed distance function (SDF) value, wherein the target geometric reconstruction network model is obtained through machine learning training by using a plurality of groups of data, and the plurality of groups of data comprise: training voxel coordinates obtained by reconstructing a training image; and carrying out three-dimensional modeling based on the signed distance function value to obtain a second reconstruction result.
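As a non-limiting illustration of the three-dimensional modeling step, the sketch below samples signed distance values on a voxel grid inside the reconstruction range and extracts a triangle mesh with marching cubes; sdf_network stands in for the target geometric reconstruction network model and is assumed to map (N, 3) coordinates to (N,) signed distance values.

```python
# Sketch of SDF-grid sampling followed by marching cubes mesh extraction.
import numpy as np
from skimage import measure

def extract_mesh(sdf_network, center, side_length, resolution=128):
    half = side_length / 2.0
    axis = np.linspace(-half, half, resolution)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    coords = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3) + center
    # In practice the network queries would be chunked for memory reasons.
    sdf = sdf_network(coords).reshape(resolution, resolution, resolution)
    spacing = (axis[1] - axis[0],) * 3
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0, spacing=spacing)
    return verts + (center - half), faces   # triangle mesh of the shooting entity
```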
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: carrying out discrete sampling on the occupied voxel grid of the shooting entity within the reconstruction range based on the first reconstruction result to obtain the voxel coordinate data.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: carrying out hash coding on the voxel coordinate data to obtain hash features; and geometrically reconstructing the voxel coordinate data and the hash features by adopting the target geometric reconstruction network model to obtain the signed distance function value.
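By way of non-limiting illustration, a multi-resolution hash encoding of voxel coordinates in the style of Instant-NGP could be sketched as follows in PyTorch; the table size, level count and nearest-vertex lookup (omitting trilinear interpolation) are simplifications chosen here and are not prescribed by the embodiments.

```python
# Minimal PyTorch sketch of a multi-resolution hash encoding of voxel coordinates;
# hyperparameters and the nearest-vertex lookup are illustrative simplifications.
import torch
import torch.nn as nn

class HashEncoding(nn.Module):
    def __init__(self, levels=8, table_size=2 ** 16, feat_dim=2, base_res=16):
        super().__init__()
        self.resolutions = [base_res * 2 ** i for i in range(levels)]
        self.table_size = table_size
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim)) for _ in range(levels)])
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def forward(self, x):                         # x: (N, 3) voxel coordinates in [0, 1]
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            idx = torch.floor(x * res).long()     # nearest grid vertex at this level
            h = (idx[:, 0] * self.primes[0]) ^ (idx[:, 1] * self.primes[1]) \
                ^ (idx[:, 2] * self.primes[2])
            feats.append(table[h % self.table_size])
        return torch.cat(feats, dim=-1)           # hash feature of each voxel coordinate
```

The resulting hash features, concatenated with the raw voxel coordinates, would then be fed to the geometric reconstruction network to regress the signed distance function value.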
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: performing voxel color reconstruction on the second reconstruction result to obtain a third reconstruction result, wherein the third reconstruction result is used for determining the voxel color of the shooting entity to be rendered and displayed under the target view angle; performing surface color reconstruction on the second reconstruction result to obtain a fourth reconstruction result, wherein the fourth reconstruction result is used for determining the surface color of the shooting entity to be rendered and displayed under the target view angle; and determining a rendering and displaying result of the shooting entity under the target view angle by using the third reconstruction result and the fourth reconstruction result.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: carrying out voxel color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector and the view angle direction of the target view angle by adopting a target voxel color prediction network model to obtain a first color value, wherein the target voxel color prediction network model is obtained through machine learning training by utilizing a plurality of groups of data, and the plurality of groups of data comprise: a first predicted value corresponding to the training voxel coordinates, wherein the first predicted value is a volume rendering predicted color obtained by reconstructing the training voxel coordinates, the first color value is used for determining the volume rendering color, and the geometric reconstruction feature vector is obtained after geometrically reconstructing the voxel coordinate data through the target geometric reconstruction network model; and calculating a third reconstruction result based on the voxel weight corresponding to the first color value and the signed distance function value.
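By way of non-limiting illustration, one common way to turn signed distance values into voxel weights and composite the first color values into a volume-rendered color is the logistic (NeuS-style) conversion sketched below; this particular conversion is an assumption, not a formula prescribed by the embodiments.

```python
# Sketch of volume rendering with voxel weights derived from SDF values.
import torch

def volume_render(sdf, colors, inv_s=64.0):
    # sdf:    (num_rays, num_samples)    signed distance at each sample
    # colors: (num_rays, num_samples, 3) first color values from the voxel
    #         color prediction network
    prev_cdf = torch.sigmoid(sdf[:, :-1] * inv_s)
    next_cdf = torch.sigmoid(sdf[:, 1:] * inv_s)
    alpha = ((prev_cdf - next_cdf) / (prev_cdf + 1e-6)).clamp(0.0, 1.0)
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-7], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                              # voxel weights along each ray
    return (weights.unsqueeze(-1) * colors[:, :-1]).sum(dim=1)   # rendered pixel color
```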
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: carrying out surface color reconstruction on the voxel coordinate data and the geometric reconstruction feature vector by adopting a target surface color prediction network model to obtain a second color value, and carrying out surface color reconstruction on the voxel coordinate data, the geometric reconstruction feature vector and the view angle direction of the target view angle to obtain a third color value, wherein the target surface color prediction network model is obtained through machine learning training by utilizing a plurality of groups of data, and the plurality of groups of data comprise: a second predicted value corresponding to the training voxel coordinates, wherein the second predicted value is a surface rendering predicted color obtained by reconstructing the training voxel coordinates, the second color value is used for determining a diffuse reflection color of the surface rendering, and the third color value is used for determining a specular reflection color of the surface rendering; and determining a fourth reconstruction result based on the second color value and the third color value.
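A minimal sketch of a two-branch surface color predictor follows, in which the diffuse branch sees only the position and the geometric reconstruction feature vector while the specular branch additionally sees the viewing direction; combining the two color values by simple addition, and the layer widths, are assumptions made for illustration.

```python
# Sketch of diffuse + specular surface color prediction (widths are illustrative).
import torch
import torch.nn as nn

class SurfaceColorNet(nn.Module):
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.diffuse = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())          # second color value
        self.specular = nn.Sequential(
            nn.Linear(3 + feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())          # third color value

    def forward(self, xyz, geo_feat, view_dir):
        diffuse = self.diffuse(torch.cat([xyz, geo_feat], dim=-1))
        specular = self.specular(torch.cat([xyz, geo_feat, view_dir], dim=-1))
        return diffuse + specular          # combined surface color (fourth result)
```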
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: calculating a first loss by using the first predicted value and a first real value, wherein the first real value is used for representing the volume rendering real color; calculating a second loss by using the second predicted value and a second real value, wherein the second real value is used for representing the surface rendering real color; calculating a target loss from the first loss and the second loss; and using the target loss to update model parameters of an initial geometric reconstruction network model to obtain the target geometric reconstruction network model, update model parameters of an initial voxel color prediction network model to obtain the target voxel color prediction network model, and update model parameters of an initial surface color prediction network model to obtain the target surface color prediction network model.
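As a non-limiting illustration of the joint optimization, the sketch below combines a volume rendering color loss and a surface rendering color loss into a target loss and applies one optimizer step; the L1 losses, the equal weighting, and the assumption that a single optimizer holds the parameters of all three networks are illustrative choices.

```python
# Sketch of one joint training step over the geometry, voxel color and
# surface color networks; loss types and weighting are illustrative.
import torch.nn.functional as F

def training_step(optimizer, pred_volume_color, gt_volume_color,
                  pred_surface_color, gt_surface_color, surface_weight=1.0):
    first_loss = F.l1_loss(pred_volume_color, gt_volume_color)     # volume rendering loss
    second_loss = F.l1_loss(pred_surface_color, gt_surface_color)  # surface rendering loss
    target_loss = first_loss + surface_weight * second_loss
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()   # assumed to update all three network models jointly
    return target_loss.item()
```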
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: displaying the rendering display result on a display screen of a virtual reality VR device or an augmented reality AR device; and performing a post-processing operation based on the rendering display result to obtain an updated rendering display result, wherein the post-processing operation comprises at least one of the following: relighting, physical simulation and texture editing.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: in response to a first touch operation acting on a graphical user interface, selecting a three-dimensional object acquisition function; in response to a second touch operation acting on the graphical user interface, selecting to start creating a three-dimensional object acquisition item under the three-dimensional object acquisition function; in response to a third touch operation acting on the graphical user interface, determining the name and type of the three-dimensional object acquisition item, and starting a video acquisition function to acquire the shooting data; and in response to a fourth touch operation acting on the graphical user interface, uploading the shooting data to generate a rendering display result of the shooting entity under the target view angle.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: receiving shooting data from a client, wherein the shooting data is obtained by carrying out image acquisition on a shooting entity; performing sparse reconstruction on shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under a target view angle, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and feeding back the rendering display result to the client.
Optionally, in the present embodiment, the computer readable storage medium is configured to store program code for performing the steps of: displaying a rendering display result under a first view angle on a display picture of Virtual Reality (VR) equipment or Augmented Reality (AR) equipment, wherein the rendering display result is obtained by sequentially performing sparse reconstruction on the shooting data to obtain a first reconstruction result, performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, and performing color reconstruction on the second reconstruction result, the shooting data is obtained by carrying out image acquisition on a shooting entity, the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity, and the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and in response to a received view angle switching instruction, driving the VR equipment or the AR equipment to display a rendering display result under a second view angle.
By adopting the embodiment of the application, the shooting entity is subjected to image acquisition to obtain shooting data; performing sparse reconstruction on the shooting data to obtain a first reconstruction result, wherein the first reconstruction result is used for determining a reconstruction range corresponding to the shooting entity; performing geometric reconstruction on the first reconstruction result to obtain a second reconstruction result, wherein the second reconstruction result is used for determining a three-dimensional grid model corresponding to the shooting entity; and performing color reconstruction on the second reconstruction result to obtain a rendering and displaying result of the shooting entity under the target view angle.
It is easy to understand that this method performs sparse reconstruction on the shooting data, then performs geometric reconstruction on the sparse reconstruction result, and finally performs color reconstruction on the geometric reconstruction result, so that a rendering display result of the shooting entity under a real-time new view angle is obtained through a plurality of different data reconstruction processes, which improves the display effect of the shooting entity in real-time rendering display. Therefore, the embodiment of the application achieves the aim of performing high-fidelity rendering display on the shooting entity, realizes the technical effects of improving the reconstruction efficiency and the rendering quality when performing real-time rendering display on the shooting entity, and further solves the technical problems of low reconstruction efficiency and poor display effect when performing real-time rendering display on the shooting entity in the related art.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; for example, the division of the units is merely a logical function division, and there may be another division manner in actual implementation, for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be implemented through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that various modifications and adaptations may be made by those of ordinary skill in the art without departing from the principles of the present application, and such modifications and adaptations shall also fall within the scope of protection of the present application.