CN120129927A - Method and system for rendering video graphics using scene segmentation - Google Patents

Method and system for rendering video graphics using scene segmentation

Info

Publication number
CN120129927A
Authority
CN
China
Prior art keywords
data
scene
cluster
rendering
graphics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202380070881.1A
Other languages
Chinese (zh)
Inventor
保拉·宁
哈维尔·桑多瓦尔
李辰
孙宏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innopeak Technology Inc
Original Assignee
Innopeak Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology Inc
Publication of CN120129927A

Abstract

(Translated from Chinese)


Systems and methods for rendering graphics. The method includes generating and receiving graphics data from a rasterized 3D scene to determine at least a first object. Primitive data may be generated by a vertex shader using vertex data of the scene. A provided position vector may define the center of an overall cluster data structure on three axes, and a provided subdivision scalar may divide the structure along those axes. Clusters may be created by partitioning bounding boxes using at least the subdivision scalar and the extent of the structure. Geometric position data may be mapped to associated clusters defined by the cluster center positions. A culling mask may be generated for each object using the generated cluster data, and the scene may be rendered using at least the culling mask and the cluster data. Other embodiments include corresponding systems and computer programs configured to perform the actions of the above method.

Description

Method and system for rendering video graphics using scene segmentation
Cross Reference to Related Applications
The present application claims priority from U.S. patent application Ser. No. 63/418,050 (the "'050 application"), entitled "Scene Segmentation for Mobile," filed on October 21, 2022 (Attorney Docket No. 1282.INNOPEAK-1022-099-P), the entire contents of which are incorporated herein by reference for all purposes.
Background
As video graphics standards increase year by year, the resource costs of rendering such video graphics are also continually rising. Optimizing these costs is particularly important in real-time applications (RTA) such as video games, video conferences, virtual Reality (VR) applications and extended reality (XR) applications. Furthermore, as the use of such real-time applications on mobile devices has become very common, it has become increasingly important to improve the quality of video graphics in mobile applications. However, mobile devices have limited memory capacity and bandwidth compared to desktop computers, which presents challenges in achieving adequate rendering performance. While there are a number of solutions to address the memory intensive problem of video graphics rendering, as described below, these solutions are still not satisfactory.
Accordingly, new and improved systems and methods for rendering video graphics are needed.
Disclosure of Invention
The invention relates to a graphics rendering system and method. According to a specific embodiment, the present invention provides a method for utilizing world-space sampling clusters, procedural instance culling masks, and instance-culled clusters. Still other embodiments exist.
Embodiments of the invention may be used in conjunction with existing systems and processes. For example, the rendering system configuration and related methods according to the present invention may be widely applied to various systems, including Virtual Reality (VR) systems, mobile devices, and the like. Furthermore, various techniques according to the present invention may be integrated into existing systems through integrated circuit fabrication, operating system software, and application program interfaces (application programming interface, APIs). There are other advantages.
The system of one or more computers may be configured to perform particular operations or actions by installing software, firmware, hardware, or a combination thereof on the system that, when executed, causes the system to perform the operations or actions. One or more computer programs may be configured to perform particular operations or actions by including instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions. One general aspect includes a method for rendering graphics on a computer device. The method includes receiving a plurality of graphics data associated with a three-dimensional (3D) scene, the plurality of graphics data including a plurality of vertex data associated with a plurality of vertices in the 3D scene. The method also includes generating a plurality of screen space primitive data using at least the vertex data and a vertex shader, the plurality of primitive data including at least barycentric coordinates and a triangle index. The method further includes providing a position vector defining a center of the overall cluster data structure in three axes. The method further includes providing a subdivision scalar for uniformly dividing the overall cluster data structure along the three axes. Clusters are created by partitioning bounding boxes using at least the subdivision scalar and the extent of the overall data structure. The method further includes mapping the geometric position data to an associated cluster defined by a cluster center location. The method also includes executing, by one or more compute shaders, threads for each cluster to generate cluster data corresponding to the cluster. The method also includes generating a culling mask for each object using the cluster data. The method further includes rendering, by a shader, the objects represented in the screen space primitive buffer using at least the culling mask and the cluster data. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods described above.
Implementations may include one or more of the following features. The method may include storing reservoir data based on Reservoir-based Spatio-Temporal Importance Resampling (ReSTIR). The method may include storing an ordered list of light sources. The center of the overall grid may be independent of camera position. Alternatively, the center of the overall grid may be positioned at the camera location. The method may include storing a culling mask, which may include a cull-to mask and/or a cull-from mask. The culling mask may be specified based on clusters corresponding to uniformly sized scene segments. The culling mask may also be specified based on the material and the distance of each object relative to the camera. An implementation of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
One particular aspect of the invention includes a system for rendering video graphics. The system includes a memory that may include executable instructions. The system further includes a processor coupled to the memory and configured to: generate a plurality of graphics data associated with a 3D scene, the plurality of graphics data including a plurality of vertex data associated with a plurality of vertices in the 3D scene; generate a plurality of screen space primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data including at least barycentric coordinates and a triangle index; provide a position vector defining a center of the overall cluster data structure in three axes; provide a subdivision scalar for uniformly dividing the overall cluster data structure along the three axes; create clusters by partitioning bounding boxes using at least the subdivision scalar and the extent of the overall data structure; map the geometric position data to associated clusters defined by cluster center positions; execute, by one or more compute shaders, threads for each cluster to generate cluster data corresponding to the clusters; generate a culling mask for each object using the cluster data; and render the objects represented in the screen space primitive buffer, by the shader, using at least the culling mask and the cluster data. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods described above.
Implementations may include one or more of the following features. The processor in the system may include a central processing unit (CPU) and a graphics processing unit (GPU), wherein the GPU is configured to perform the rendering operations described above and the memory is shared by the CPU and the GPU. The memory may include a frame buffer for storing the first object. The system may include a display configured to display the first object at a refresh rate of at least 24 frames per second. An implementation of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
One particular aspect of the invention relates to a method for rendering graphics on a computer device. The method includes generating a 3D scene including a first object. The method also includes receiving a plurality of graphics data associated with the 3D scene. The method further includes providing a position vector defining a center of the overall cluster data structure in three axes. The method further includes providing a subdivision scalar for dividing the overall cluster data structure along the three axes. The method further includes creating clusters by partitioning the bounding box using at least the subdivision scalar and the extent of the overall data structure. The method further includes mapping the geometric position data to an associated cluster defined by a cluster center location. The method also includes executing, by one or more compute shaders, threads for each cluster to generate cluster data corresponding to the cluster. The method also includes generating a culling mask for each object using the cluster data. The method further includes rendering, by the shader, the 3D scene using at least the culling mask and the cluster data. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods described above.
Implementations may include one or more of the following features. In the method, the 3D scene is rasterized to determine at least a first object that is intersected by a primary ray passing through each pixel, and the plurality of graphics data includes a plurality of vertex data associated with a plurality of vertices in the 3D scene. The method may include generating a plurality of primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data including at least position data. The bounding box may be evenly divided. Alternatively, the bounding box may be partitioned according to the horizon in the 3D scene. The method may include obtaining settings for dividing the bounding box. The method may include defining a culling mask based at least on an opacity of objects in the 3D scene. An implementation of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
It should be appreciated that embodiments of the present invention have many advantages over conventional techniques. The present invention provides configurations and methods for graphics rendering systems that reduce memory bandwidth load and improve performance by using world-space sampling clusters and procedural instance culling masks. In addition, the invention realizes instance-culled clusters by sharing segmentation data, thereby improving performance for specific applications.
The present invention achieves these and other advantages in the context of known technology. However, the nature and advantages of the invention may be further understood by reference to the following description and the accompanying drawings.
Drawings
FIG. 1 is a simplified schematic diagram of a mobile device for rendering video graphics in accordance with an embodiment of the invention.
Fig. 2 is a simplified flow diagram of a conventional forward pipeline for rendering video graphics.
Fig. 3 is a simplified flow diagram of a conventional hybrid pipeline for rendering video graphics.
Fig. 4A to 4D are simplified schematic diagrams of a cluster segmentation method according to an embodiment of the present invention.
Fig. 5 is a simplified flowchart of a graphics rendering method according to an embodiment of the present invention.
Fig. 6 is a simplified flowchart of a graphics rendering method according to an embodiment of the present invention.
Detailed Description
The invention relates to a graphics rendering system and method. According to a specific embodiment, the invention provides a method and a system for utilizing world-space sampling clusters, procedural instance culling masks, and instance-culled clusters. The present invention may be configured for real-time applications (RTA), such as video conferencing, video gaming, Virtual Reality (VR), and extended reality (XR) applications. Still other embodiments exist.
Mobile and XR applications typically need to address various limitations in graphics processing. These limitations can impact the visual quality and performance of the application and present challenges to providing a smooth and attractive user experience. One limitation is the limited processing power of the mobile device. Most smartphones and tablet computers are equipped with relatively small and power-efficient processors compared to desktop computers, which are not designed for intensive graphics processing tasks. Thus, mobile applications often need to efficiently render graphics using optimized algorithms and techniques to avoid overloading the device hardware. Another limitation is the limited memory and storage space of the mobile device. Most smartphones and tablet computers have limited and shared memory, which may limit the complexity and level of detail of graphics that can be rendered, and may require mobile applications to use efficient data structures and algorithms to minimize the memory and storage space they occupy. The third limitation is the limited battery life of the mobile device. Graphics processing may cause significant drain on the battery of the device, so mobile applications must be designed to minimize the impact on battery life to avoid draining power during use. This may involve techniques such as reducing the frame rate, using low resolution graphics, or disabling certain graphics functions when not needed. Another limitation is the limited bandwidth and connectivity of mobile networks. Many mobile applications rely on network connections to access data and resources, but mobile networks can be slow and unreliable, especially in poorly covered areas. This may affect the performance of graphics intensive applications and may require the use of techniques such as data compression and caching to minimize the amount of data that needs to be transmitted over the network. In order to provide a smooth and attractive user experience, mobile applications must be carefully designed, fully taking into account these limitations, and efficiently and effectively render graphics using optimized algorithms and techniques.
It should be appreciated that embodiments of the present invention provide techniques involving, but not limited to, spatial partitioning and forward rendering to improve graphics rendering in mobile devices. Spatial partitioning is a computer graphics technique that improves rendering performance by partitioning a three-dimensional space into smaller, more manageable regions. In real-time rendering, this is achieved by organizing objects in a scene into a data structure that allows efficient spatial queries by traversing only the geometry relevant to the current query. One data structure used for spatial partitioning in forward rendering is a uniform spatial grid. The uniform spatial grid divides the three-dimensional space into a regular grid of cells, each cell containing a list of the objects intersecting it. At render time, the camera's view frustum is used to determine which cells are in view, and only the objects in these cells are added to the list of objects that need to be rendered. This avoids the need to traverse the entire scene and can significantly reduce the number of objects that need to be considered for rendering. Another data structure commonly used for spatial partitioning in forward rendering is a binary space partitioning (BSP) tree. The BSP tree organizes objects in the scene into a hierarchical tree structure, with each node representing a plane that divides the space into two regions. At render time, the camera frustum is used to determine which nodes are in view, and only these nodes are traversed to find the objects that need to be rendered. This allows more efficient culling of objects outside the view frustum and may further reduce the number of objects that need to be considered for rendering. In addition to improving rendering performance, spatial partitioning may also be used for other tasks such as visibility determination, collision detection, and illumination calculation.
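By way of illustration only, the following C++ sketch shows how a uniform spatial grid of the kind described above might map a world-space position to a cell index; the type and function names are assumptions for exposition and are not taken from any particular implementation.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// Minimal sketch of a uniform spatial grid. All names are illustrative.
struct Vec3 { float x, y, z; };

struct UniformGrid {
    Vec3 minCorner;                 // world-space lower corner of the grid
    Vec3 cellSize;                  // size of one cell on each axis
    std::array<int, 3> resolution;  // number of cells per axis

    // Quantize a world position to a flat cell index, clamping to the grid.
    std::uint32_t cellIndex(const Vec3& p) const {
        auto quantize = [](float v, float lo, float size, int n) {
            int i = static_cast<int>(std::floor((v - lo) / size));
            return std::clamp(i, 0, n - 1);
        };
        int ix = quantize(p.x, minCorner.x, cellSize.x, resolution[0]);
        int iy = quantize(p.y, minCorner.y, cellSize.y, resolution[1]);
        int iz = quantize(p.z, minCorner.z, cellSize.z, resolution[2]);
        return static_cast<std::uint32_t>(ix + resolution[0] * (iy + resolution[1] * iz));
    }
};

// Each cell stores the indices of objects intersecting it; at render time only
// the cells that fall inside the view frustum are traversed.
using CellObjectLists = std::vector<std::vector<std::uint32_t>>;
```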
Forward+ rendering is a computer graphics technique that improves the performance and visual quality of real-time rendering. Forward+ rendering is an extension of the traditional forward rendering method, which renders each object in a scene individually by fully shading each pixel covered by the object. Forward+ rendering uses a minimal prepass to generate at least a screen-space depth buffer before the full rendering pass. This depth buffer is used to skip the more costly main shader for occluded pixels, i.e., pixels whose depth values are farther than the values stored in the depth buffer. The Forward+ rendering pipeline may be further extended by dividing the three-dimensional space into smaller regions (e.g., a grid or tree structure) using spatial partitioning techniques. The light sources in the scene are grouped into screen-space subdivisions rather than world-space subdivisions. A group of pixels (hereinafter referred to as a "grid cell") will contain the light sources attached to those pixels. During rendering, each pixel only queries the light sources associated with its assigned subdivision, rather than the complete list of light sources in the scene. This allows efficient culling of objects outside the view frustum and reduces the number of objects that need to be considered for rendering. Once the objects in the field of view are determined, a light-source culling step generates a list of light sources for each grid cell, taking into account only the light sources that affect those objects. This further reduces the number of light sources that need to be processed and may save computational resources. Next, a light-source rendering step calculates the illumination of each object. This involves selecting one light source from the list, evaluating the effect of that light source on the surface of the object, accumulating the resulting color, and repeating this process for other light sources as needed. This step may be performed in parallel on the GPU, allowing multiple light sources to be processed efficiently at once. Finally, Forward+ rendering applies the calculated illumination to the objects using a shading step and generates the final image. This step may also be performed on the GPU, allowing real-time rendering of the scene.
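As a hedged sketch of the light-list step described above (the application does not prescribe this code; the names and the use of precomputed screen-space light bounds are simplifying assumptions), light sources can be binned into screen-space tiles so that each pixel later shades only the lights listed for its tile:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative Forward+-style light binning shown on the CPU side; a real
// implementation would typically run this in a compute shader.
struct Light { float x, y, radius; };   // light center projected to screen, plus influence radius (pixels)
struct Tile  { std::vector<std::uint32_t> lightIndices; };

std::vector<Tile> binLightsToTiles(const std::vector<Light>& lights,
                                   int screenW, int screenH, int tileSize) {
    int tilesX = (screenW + tileSize - 1) / tileSize;
    int tilesY = (screenH + tileSize - 1) / tileSize;
    std::vector<Tile> tiles(static_cast<std::size_t>(tilesX) * tilesY);

    for (std::uint32_t li = 0; li < lights.size(); ++li) {
        const Light& L = lights[li];
        // Conservative range of tiles touched by the light's screen-space bounds.
        int x0 = std::max(0, static_cast<int>(L.x - L.radius) / tileSize);
        int x1 = std::min(tilesX - 1, static_cast<int>(L.x + L.radius) / tileSize);
        int y0 = std::max(0, static_cast<int>(L.y - L.radius) / tileSize);
        int y1 = std::min(tilesY - 1, static_cast<int>(L.y + L.radius) / tileSize);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tiles[static_cast<std::size_t>(ty) * tilesX + tx].lightIndices.push_back(li);
    }
    return tiles;  // at shading time a pixel reads only the light list of its own tile
}
```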
Forward+ rendering with a list of screen-space light sources provides a number of advantages over conventional forward rendering. It allows more efficient culling of objects and light sources, which may improve rendering performance and reduce memory usage. It also allows more light sources to be used in the scene, which may improve the visual quality of the illumination. Furthermore, parallel processing on the GPU allows complex scenes to be rendered in real time. However, there are also some limitations and trade-offs with this rendering architecture. One disadvantage is the overhead of building and maintaining the spatial-partitioning data structure. For dynamic scenes with constantly changing object positions, this cost may be particularly high, and if performed inefficiently, it may affect the overall rendering performance. Another problem is the choice of data structure and cell size. The optimal data structure and cell size will depend on the characteristics of the scene and the desired performance, and finding the correct balance can be challenging. A depth prepass and screen-space light lists are valuable techniques for improving real-time rendering performance and visual quality. However, these methods are not directly applicable to ray tracing, because ray-traced rendering effects rely on the entire scene. Using a list of screen-space light sources in a path tracer can cause a large amount of illumination data to be lost during rendering, creating artifacts or other problems in the rendered image.
It should be appreciated that embodiments of the present invention, as described in further detail below, efficiently implement scene segmentation and forward rendering techniques for mobile applications.
The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a particular application. Various modifications and various uses in different applications will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to a wide variety of embodiments. Thus, the present invention should not be limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, the contents of which are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed herein is only one example of a generic series of equivalent or similar features.
Furthermore, any element in a claim that does not explicitly state "means for" performing a specified function or "step for" performing a specified function should not be construed as a "means" or "step" clause as specified in 35 U.S.C. § 112, paragraph 6. In particular, the use of "step of" or "act of" in the claims herein is not intended to invoke the provisions of 35 U.S.C. § 112, paragraph 6.
Note that the terms left, right, front, rear, top, bottom, forward, rearward, clockwise and counterclockwise, if used, are used for convenience only and are not intended to imply any particular fixed orientation. Rather, these are used to reflect the relative position and/or orientation between various portions of the object.
Fig. 1 is a simplified schematic diagram of a mobile device 100 for performing graphics rendering using scene segmentation. The diagram is merely an example, which should not unduly limit the scope of the claims. Those of ordinary skill in the art will recognize many variations, alternatives, and modifications.
As shown, mobile device 100 may be configured within a housing 110 and may include a camera device 120 (or other image or video capture device), a processor device 130, a memory 140 (e.g., volatile memory storage), and a storage 150 (e.g., persistent memory storage). The camera 120 may be mounted on the housing 110 and configured to capture an input image. The input image may be stored in memory 140, which may include a random-access memory (RAM) device, an image/video buffer device, a frame buffer, and the like. Various software, executable instructions, and files may be stored in the storage 150, which may include a read-only memory (ROM), a hard disk, and the like. Processor 130 may be coupled to each of the components described above and configured to communicate between these components.
In a specific example, the processor 130 includes a central processing unit (CPU), a network processing unit (NPU), and the like. The device 100 may also include a graphics processing unit (GPU) 132 that is coupled to at least the processor 130 and the memory 140. In an example, memory 140 is configured to be shared between processor 130 (e.g., a CPU) and GPU 132 and is configured to hold data used by an application program while it runs. Since the memory 140 is shared, it must be used efficiently; for example, high memory usage by GPU 132 may negatively impact system performance.
The device 100 may also include a user interface 160 and a network interface 170. The user interface 160 may include a display area 162 configured to display text, images, video, rendered graphics, interactive elements, and the like. Display 162 may be connected to GPU 132 and may also be configured to display at a refresh rate of at least 24 frames per second. The display area 162 may include a touch screen display (e.g., in a mobile device, tablet, etc.). Alternatively or additionally, the user interface 160 may include a touch interface 164 for receiving user input (e.g., a keyboard or keypad in a mobile device, notebook computer, or other computing device). The user interface 160 may be used for real-time applications (RTA) such as multimedia streaming, video conferencing, navigation, video gaming, etc.
The network interface 170 may be configured to send and receive instructions and files for graphics rendering (e.g., using Wi-Fi, bluetooth, ethernet, etc.). In a particular example, the network interface 170 may be configured to compress or downsample the image for transmission or further processing. The network interface 170 may be configured to send one or more images to a server for OCR. The processor 130 may be connected to and configured to communicate between the user interface 160, the network interface 170, and/or other interfaces.
In an example, processor 130 and GPU 132 may be configured to perform the steps of rendering video graphics, which may include steps related to executable instructions stored in storage 150. The processor 130 may be configured to execute the application instructions and generate a plurality of graphics data related to a 3D scene comprising at least a first object. The plurality of graphics data may include a plurality of vertex data associated with a plurality of vertices (e.g., of each object) in the 3D scene. GPU 132 may be configured to generate a plurality of primitive data using at least the plurality of vertex data and a vertex shader. The plurality of primitive data may include at least position data, and the like.
In an example, GPU 132 may be configured to provide a location vector defining the center of the overall cluster data structure in three axes. GPU 132 may also be configured to provide a subdivision scalar that is used to divide the overall cluster data structure evenly along the three axes. GPU 132 may then be configured to create clusters by partitioning the bounding box using at least the subdivision scalar and the extent of the overall data structure. GPU 132 may be configured to map geometric location data to associated clusters defined by cluster center locations. GPU 132 may also be configured to generate cluster data corresponding to the clusters by one or more compute shaders executing threads for each cluster. Using the cluster data, GPU 132 may be configured to generate a culling mask for each object. Further, GPU 132 may be configured to render at least the first object (and any other objects) through the shader using at least the culling mask and the cluster data.
To execute the described rendering pipeline, mobile device 100 includes hardware components that need to be configured and optimized to meet such rendering technology requirements. First, the CPU of the device needs to be powerful enough and energy efficient to handle the computation and data processing required for forward + rendering. This typically involves the use of a high performance CPU with multiple cores and threads, as well as specialized instructions and techniques, such as SIMD and out-of-order execution, to maximize performance and efficiency.
Second, the GPU of the device needs to be able to execute the complex shaders and algorithms used in the described rendering pipeline. This typically involves using a high performance GPU with a large number of compute units and fast memory bandwidth, and supporting advanced graphics APIs and functions, such as OpenGL ES 3.0 and compute shaders.
Third, the memory and storage subsystem of the device needs to be large enough and fast enough to support the data structures and textures used to store the spatial grid cells containing the data required for rendering. This typically involves the use of high-capacity and high-speed memory and storage technologies, such as DDR4 RAM and UFS 2.0 or NVMe storage, as well as efficient data structures and algorithms to minimize memory and storage usage.
Fourth, the display and touch screen of the device need to have high resolution and fast refresh rates to support the visual quality and interactivity of real-time rendering. This typically involves the use of high resolution and high refresh rate displays, as well as low latency and high precision touch screens to achieve smooth and responsive interaction.
The battery and power management subsystem of the device needs to be able to support the power requirements of real-time rendering. This typically involves the use of high capacity batteries and efficient power management techniques, such as smart charging and power saving modes, to ensure that the device can operate for long periods of time without draining power.
To meet the demands of the rendering pipeline described herein, mobile device 100 possesses powerful and energy-efficient hardware components, including a high-performance CPU, GPU, memory and storage subsystem, display and touch screen, and battery and power management subsystem. These components need to be optimized and configured to support the forward + rendering requirements and to achieve smooth and attractive visual effects and interactions.
Other embodiments of the present system include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the steps of the methods. Details of the method are further discussed below with reference to the accompanying drawings.
Fig. 2 is a simplified flow diagram of a conventional forward pipeline 200 for rendering video graphics. As shown, forward pipeline 200 includes a vertex shader 210 followed by a fragment shader 220. In the forward pipeline rendering process, the CPU provides graphics data (e.g., from memory, storage devices, networks, etc.) of the 3D scene to a graphics card or GPU. In the GPU, vertex shader 210 converts objects in the 3D scene from object space to screen space. This process involves projecting the geometry of each object and decomposing it into vertices, which are then transformed and divided into fragments or pixels. In the fragment shader 220, the pixels are shaded (e.g., color, illumination, texture, etc.) and then passed to a display (e.g., a screen of a smartphone, tablet, VR goggles, etc.). When lighting is applied, the rendering effect is processed for each light source, for each vertex, and for each fragment in the scene.
Fig. 3 is a simplified flow diagram of a conventional hybrid (deferred-style) pipeline 300 for rendering video graphics. Here, the preprocessing pass 310 involves receiving graphics data of a 3D scene from the CPU and generating a G-buffer containing data required for subsequent rendering passes, such as color, depth, normals, etc. The ray-traced reflection pass 320 processes the G-buffer data to determine reflections in the scene, while the ray-traced shadow pass 330 processes the G-buffer data to determine shadows in the scene. The denoising pass 340 then removes noise from the ray-traced pixels. In the main rendering pass 350, reflection, shadow, and material evaluation results are combined to produce a rendered output with a color for each pixel. In the post-processing pass 360, the rendered output may undergo additional processing, such as color correction, depth of field, etc.
Compared to the forward pipeline 200, the hybrid pipeline 300 reduces the total number of shaded fragments by processing the rendering effects only for non-occluded pixels. This is achieved by decomposing the rendering process into multiple stages (i.e., passes), wherein the colors, depths, and normals of objects in the 3D scene are written to separate buffers, which are then combined to produce the final rendered frame. Subsequent passes use the depth values to skip rendering of occluded pixels when executing the more complex illumination shaders. This deferred rendering approach reduces the complexity of any single shader compared to the forward rendering approach, but the multiple rendering passes require greater memory bandwidth, which is particularly problematic in modern mobile architectures where memory is limited and shared.
According to an example, the present invention provides a method and system for world-space sampling clusters. The spatial cluster system is configured with a vector defining the extent of the cluster data structure. The default value of {0, 0, 0} uses the bounding box of the entire scene and is independent of camera position. If set to a non-zero value, a spatial cluster structure is created centered on the camera, with boundaries corresponding to the configured extent values.
Once the boundaries are established, the scene is further divided into smaller clusters based on a user-configured vector that specifies the number of subdivisions on each axis. The default value of {0, 0, 0} allows the number of clusters and the boundaries of each cluster to be set manually (as will be further explained below). Each cluster contains data for accelerating the rendering of points within the cluster. These data are generated using a compute shader that executes a thread for each cluster (e.g., in conjunction with the shaders shown in Figs. 2 and 3).
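The configuration just described might look like the following sketch (field names are assumptions; the application only specifies a center/extent vector and a per-axis subdivision count):

```cpp
#include <cassert>

// Illustrative world-space cluster grid: a center, a per-axis half-extent,
// and a per-axis subdivision count. A subdivision count of zero corresponds
// to the manual/non-spatial configuration described in the text.
struct Vec3 { float x, y, z; };

struct ClusterGrid {
    Vec3 center;           // center of the overall cluster data structure
    Vec3 halfExtent;       // half-size of the structure on each axis
    int  subdivisions[3];  // number of uniform clusters per axis

    int clusterCount() const {
        return subdivisions[0] * subdivisions[1] * subdivisions[2];
    }

    // Axis-aligned bounds of the cluster at 3D cell coordinate (ix, iy, iz).
    void clusterBounds(int ix, int iy, int iz, Vec3& lo, Vec3& hi) const {
        assert(subdivisions[0] > 0 && subdivisions[1] > 0 && subdivisions[2] > 0);
        Vec3 minCorner{center.x - halfExtent.x, center.y - halfExtent.y, center.z - halfExtent.z};
        Vec3 cell{2.0f * halfExtent.x / subdivisions[0],
                  2.0f * halfExtent.y / subdivisions[1],
                  2.0f * halfExtent.z / subdivisions[2]};
        lo = {minCorner.x + ix * cell.x, minCorner.y + iy * cell.y, minCorner.z + iz * cell.z};
        hi = {lo.x + cell.x, lo.y + cell.y, lo.z + cell.z};
    }
};
```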
Any data that is sampled at the cluster granularity and can be used by multiple shading points concurrently should be stored in these clusters. In this example, to support a more generic spatial data structure, light-source-specific optimizations are sacrificed; therefore a list of light sources ordered by importance is stored, rather than cut points in a hierarchical light-source tree. ReSTIR reservoirs may also be stored in the clusters, owing to the seamless integration of ReSTIR with light-source importance sampling. Cut-point data for cluster culling masks and environment-map visibility may also be stored in the clusters.
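A hedged sketch of what one cluster's stored payload could contain, following the list above (field names, the reservoir layout, and the use of dynamic arrays are assumptions, not a layout defined by this application):

```cpp
#include <cstdint>
#include <vector>

// Illustrative payload of one world-space cluster. The application describes
// the kinds of data (an importance-ordered light list, ReSTIR reservoirs,
// cull-mask and environment-visibility cut points) without fixing a layout.
struct Reservoir {              // minimal ReSTIR-style reservoir
    std::uint32_t lightIndex;   // currently selected light sample
    float weightSum;            // running sum of resampling weights
    float targetPdf;            // target-function value of the kept sample
    std::uint32_t sampleCount;  // number of candidates seen so far
};

struct ClusterPayload {
    std::vector<std::uint32_t> lightsByImportance;  // light indices, most important first
    std::vector<Reservoir>     reservoirs;          // spatio-temporal reuse state
    std::uint8_t               cullMaskCutPoint;    // cut point for cluster culling masks
    float                      envMapVisibility;    // environment-map visibility estimate
};
```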
Finally, at run time, the cluster data is loaded once to shade a hit point based on its location. Depending on the application, if the memory footprint of a hash map and the complexity of implementing an efficient hashing scheme are manageable, a hash-based lookup may be considered. In this case, however, the hit-point locations are mapped into their associated clusters according to the previously described extent and subdivisions. If the subdivision vector is {0, 0, 0}, the cluster ID is instead derived from the instance culling mask, enabling clustering methods that rely on non-spatial properties (e.g., material parameters).
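For illustration, the run-time mapping of a hit point to its cluster might be sketched as follows (a simplified, hedged example; the one-hot fallback assumes the instance culling mask convention discussed later in this description):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative cluster lookup for a shading hit point. Names are assumptions;
// the grid fields mirror the configuration sketch above.
struct Vec3 { float x, y, z; };

struct ClusterGrid {
    Vec3 center, halfExtent;  // world-space center and per-axis half-extent
    int  subdivisions[3];     // zero on all axes => derive the ID from the culling mask
};

std::uint32_t clusterIdForHitPoint(const ClusterGrid& g, const Vec3& hit,
                                   std::uint8_t instanceCullMask) {
    if (g.subdivisions[0] == 0 && g.subdivisions[1] == 0 && g.subdivisions[2] == 0) {
        // Non-spatial clustering: the set bit of the instance culling mask names the cluster.
        for (std::uint32_t bit = 0; bit < 8; ++bit)
            if ((instanceCullMask >> bit) & 1u) return bit;
        return 0;  // degenerate mask: fall back to cluster 0
    }
    auto axisIndex = [](float p, float c, float e, int n) {
        int i = static_cast<int>(std::floor((p - (c - e)) / (2.0f * e / n)));
        return std::clamp(i, 0, n - 1);
    };
    int ix = axisIndex(hit.x, g.center.x, g.halfExtent.x, g.subdivisions[0]);
    int iy = axisIndex(hit.y, g.center.y, g.halfExtent.y, g.subdivisions[1]);
    int iz = axisIndex(hit.z, g.center.z, g.halfExtent.z, g.subdivisions[2]);
    return static_cast<std::uint32_t>(ix + g.subdivisions[0] * (iy + g.subdivisions[1] * iz));
}
```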
Embodiments of such a generic spatial clustering system may be used to segment a scene in order to selectively load segment data and to share data within a given segment. For small scenes with few clusters, the memory footprint and the number of compute dispatches required to update the clusters for rendering are small, although they vary with the number of subdivisions. The low minimum cost of such a simple spatial clustering system makes it suitable for mobile devices. Compared with screen-space clustering methods, this method introduces a dependency between scene scale and performance, but the world-space locality it provides is more meaningful for ray tracing than the projective locality provided by screen-space clustering.
Furthermore, spatial clusters can be reused for various sampling improvements by avoiding partition optimizations that are specific to any single sampling optimization. While this compromises the quality of any single sample optimization, it decouples the build and update times of the data structure from the number of optimizations implemented. This is particularly important when the grid cells are camera-centered, as each camera movement requires an update.
In general, the culling mask helps reduce the cost of initializing inline ray tracing by reducing the proportion of the acceleration structure that is loaded into memory and traversed. Only the instances in the acceleration structure that match the culling mask will be loaded. According to an example, the present invention provides methods for programmatically setting instance culling masks based on different heuristics. Five example methods are discussed below. Since the ray tracing requirements of each scene vary with the scene content, these methods can be tested during development and the most appropriate method for the scene selected by inspection. For a given scene, the selected segmentation method may be computed programmatically at load time to reduce the computation cost at run time. In the following figures, the coordinate frame used for these descriptions has the Y axis pointing upward, the X axis running from right to left on the screen, and the Z axis pointing out of the screen.
In a specific example, there are two important culling mask values, referred to herein as the cull-to and cull-from masks. The cull-to mask refers to the mask value that is packed with the instance data built into the acceleration structure. An instance is intersected only by rays whose mask value yields a non-zero result when combined with the instance's cull-to mask by a Boolean AND operation. The cull-from mask refers to a mask value packed with the instance data that is accessed directly by the run-time shader code; it provides the mask value for rays issued from hit points on the corresponding instance. An instance may have different cull-to and cull-from mask values. In the following examples, the values are numbered from 1 to 8, corresponding to the bits set on the 8-bit culling mask, with all other bits being zero. Of course, other variations, modifications, and alternatives are also possible.
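The interaction of the two mask values can be illustrated with the following hedged sketch (the 8-bit layout follows the numbering above; the struct and function names are assumptions for exposition):

```cpp
#include <cstdint>

// Illustrative 8-bit instance culling masks. An instance carries a cull-to
// mask (checked against a ray's mask when the acceleration structure is
// traversed) and a cull-from mask (used as the mask of rays spawned from
// hit points on that instance).
struct InstanceMasks {
    std::uint8_t cullTo;    // which ray masks may intersect this instance
    std::uint8_t cullFrom;  // mask assigned to secondary rays leaving this instance
};

// A ray with mask `rayMask` may intersect the instance only if the bitwise
// AND with the instance's cull-to mask is non-zero.
inline bool rayMayHit(std::uint8_t rayMask, const InstanceMasks& inst) {
    return (rayMask & inst.cullTo) != 0;
}

// Mask for a secondary ray launched from a hit point on the instance.
inline std::uint8_t secondaryRayMask(const InstanceMasks& inst) {
    return inst.cullFrom;
}
```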
The spatial grid is configured with a range vector and a center vector. By default, these are filled from the scene center and scene range, but the center may also be set as the camera position. For all grids below, the range is divided by the number of segments on a given axis to determine the boundary locations between segments. The internal boundaries are placed computationally, but for a grid with external faces, it is assumed that these faces extend to encompass the rest of the scene. The boolean flag (e.g., traceLocalOnly) may be enabled if the grid is intended to capture only instances that are strictly within range. This functionality is applicable to scenes where the contribution of distant objects to the ray tracing effect is not great.
According to an example, the present invention provides a method of assigning scene instances to segments based only on instance transforms (i.e., only spatial segmentation). Instances overlapping a given segment will be assigned to that segment. An instance overlapping multiple segments will be assigned to all overlapping segments by setting the bit corresponding to these segments to 1. For space-only segmentation, the culling-to-mask and culling-from-mask are always equal.
Fig. 4A is a simplified schematic diagram of a spatial-only equal-partition method according to an embodiment of the invention. The most straightforward way to partition a scene is to divide it into equal parts. This is exactly the approach employed by the world-space clusters: given a list of instances, the boundary of the scene is calculated and then divided into equal-sized volumes. For an 8-bit instance culling mask, the number of volumes must be limited to eight. As shown in diagram 401, the most straightforward arrangement is to use two layers of four segments (a uniform grid of size 2x2x2), each defined by an axis-aligned bounding box (AABB). AABBs allow efficient comparison of the boundaries with each instance for assignment into segments.
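A minimal sketch (with hypothetical helper names) of assigning instances to the eight equal segments of diagram 401 by AABB overlap, setting one mask bit per overlapped segment:

```cpp
#include <cstdint>

// Illustrative spatial-only segmentation into eight axis-aligned boxes
// (a 2x2x2 grid), one bit of an 8-bit culling mask per segment.
struct AABB { float min[3]; float max[3]; };

inline bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i)
        if (a.max[i] < b.min[i] || a.min[i] > b.max[i]) return false;
    return true;
}

std::uint8_t assignMask(const AABB& instance, const AABB segments[8]) {
    std::uint8_t mask = 0;
    for (int s = 0; s < 8; ++s)
        if (overlaps(instance, segments[s]))
            mask |= static_cast<std::uint8_t>(1u << s);  // instance belongs to every segment it touches
    return mask;  // for spatial-only segmentation, cull-to == cull-from == mask
}
```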
However, most game levels (e.g., RPG or FPS levels) constrain movement to a single plane, and the scene data is packed more tightly along the Y direction, so there is little need to subdivide objects according to their Y position. Fig. 4B is a simplified schematic diagram of a spatial-only single-plane segmentation method according to an embodiment of the present invention. As shown in diagram 402, the number of segments along the Z axis or the X axis is doubled (a 2D grid of size 2x1x4). The axis may be selected according to the larger extent of the overall scene bounding box.
Finally, for scenes with large areas of transmissive and reflective elements, such as bodies of water, the scene may be partitioned using the X-Z plane. However, these scenes typically have more scene geometry above or below the water surface. Fig. 4C is a simplified schematic diagram of a spatial-only asymmetric segmentation method according to an embodiment of the present invention. As shown in diagram 403, only two of the segments are allocated to the Y subsection with fewer instances, and the remaining six segments are allocated to the Y subsection with more instances (an X-Z grid with 2x1 segments above and 3x2 segments below).
In a specific example, this approach can be further refined by placing the Y split at the Y value of the instance with the largest X-Z bounding area. Those of ordinary skill in the art will recognize other variations, modifications, and alternatives to these spatial-only segmentation methods (e.g., segmentation along different axes, different numbers and proportions of segments, etc.).
All of the previously described grids can be centered directly on the camera position and sized according to the configured extents. However, another grid layout dedicated to camera-centric rendering is also presented herein. Thus, according to an example, the present invention provides a camera-centered segmentation method.
In general, objects close to the camera draw more attention and therefore must be rendered in higher detail than distant objects. The camera-centered grid takes advantage of this and distributes the grid cells according to the camera's transform. Fig. 4D is a simplified schematic diagram of a camera-centered segmentation method according to an embodiment of the present invention. As shown in diagram 404, four segments are allocated to cover a configurable maximum radius around the camera (labeled 1 through 4), and the remaining four segments (labeled 5 through 8) divide all remaining instances in the scene into four quadrants.
Segment boundaries may create visible artifacts in indirect lighting, especially on mirror-like surfaces, which may show sharp cuts (cutoff) in reflection between instances assigned to different segments. In an example, the grid is aligned at a 45 degree angle to the camera local space in the X-Z plane to reduce the visibility of these cut-off artifacts. This requires maintenance of a separate "rotated view matrix" for transforming the instance from world space to rotated camera local space. Note that the bounding box of each instance must also be transformed correctly. The AABB comparison can then be performed normally in the rotated view space.
In an example, to avoid complicating the comparison against the non-square outer segments 5-8, all instances are first compared against the configured radius of the inner segments. Any instance that overlaps the radius may be added to an instance list that is only tested for inclusion in the inner segments, and all other instances may be added to another instance list that is only tested for inclusion in the outer segments. This removes the need to test the outer boundary during boundary testing of the two lists, reducing the boundary test to a quadrant test.
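For illustration, a simplified sketch of this camera-centered assignment is given below; it classifies instances by their center point in the (optionally rotated) camera-local X-Z frame rather than by full bounding-box overlap, which is a simplification of the two-list test described above:

```cpp
#include <cmath>
#include <cstdint>

// Illustrative camera-centered segment assignment. Bits 0-3 cover the four
// quadrants inside a configurable radius around the camera; bits 4-7 take the
// matching quadrants of everything farther away. Positions are assumed to be
// expressed in the rotated camera-local X-Z frame.
struct Vec2 { float x, z; };

std::uint8_t cameraCenteredMask(const Vec2& instanceCenter, float innerRadius) {
    float dist = std::sqrt(instanceCenter.x * instanceCenter.x +
                           instanceCenter.z * instanceCenter.z);
    int quadrant = (instanceCenter.x >= 0.0f ? 0 : 1) + (instanceCenter.z >= 0.0f ? 0 : 2);
    int segment  = (dist <= innerRadius) ? quadrant : 4 + quadrant;  // bits 0-3 inner, 4-7 outer
    return static_cast<std::uint8_t>(1u << segment);
}
```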
According to an example, the present invention provides methods of material-based scene segmentation. These methods take advantage of the fact that some rays only need to intersect certain types of materials. For example, subsurface rays need only intersect materials with subsurface scattering, while shadow rays must ignore emissive materials to create shadows correctly. These approaches also consider that the spatial properties of an instance can affect how relevant it is to the indirect illumination of different materials. For example, a perfectly smooth mirror may be expected to reflect distant objects, while a rough object is typically only affected by indirect illumination from nearby objects.
In particular examples, the material classes may include thick subsurface, light source, alpha cutoff, smooth transmission, smooth reflection, rough object, center, and large projection scale, among others. Determining the class of a material may include logically examining properties such as subsurface scattering, index of refraction (IOR), emission, opacity, roughness, distance from center, center view size, and the like. In addition, each material class may define its own cull-to and cull-from values, as described above.
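As a hedged illustration of such a classification (the property names, thresholds, and priority order are assumptions made for exposition; the two spatial classes, center and large projection scale, are assigned separately as described below):

```cpp
#include <cstdint>

// Illustrative material-based classification. Thresholds and property names
// are assumptions, not values taken from the application.
struct MaterialProps {
    bool  subsurface;   // has subsurface scattering
    bool  emissive;     // acts as a light source
    bool  alphaCutoff;  // uses alpha-cutoff transparency
    float ior;          // index of refraction (relevant to transmission)
    float opacity;      // 0 = fully transmissive, 1 = opaque
    float roughness;    // 0 = perfect mirror
};

enum class MaterialClass : std::uint8_t {
    ThickSubsurface = 0, LightSource, AlphaCutoff,
    SmoothTransmission, SmoothReflection, RoughObject
};

MaterialClass classify(const MaterialProps& m) {
    if (m.subsurface)                           return MaterialClass::ThickSubsurface;
    if (m.emissive)                             return MaterialClass::LightSource;
    if (m.alphaCutoff)                          return MaterialClass::AlphaCutoff;
    if (m.opacity < 1.0f && m.roughness < 0.1f) return MaterialClass::SmoothTransmission;
    if (m.roughness < 0.1f)                     return MaterialClass::SmoothReflection;
    return MaterialClass::RoughObject;
}
```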
Based on the spatial properties of each instance, a center class and a large projection scale class are assigned. Instances that have been assigned to other material culling groups may still be considered for assignment to both culling groups. They may be set with respect to a currently active camera or a user configured view projection matrix. The latter may be useful when the scene content distribution makes the ray tracing effect only noticeable at the static location of interest.
The center class is calculated based on the view space distance of each instance relative to the origin. When considering the large projection scale class, all instances assigned to the center class are excluded.
To calculate the projection scale of an instance, the projection matrix of the camera is multiplied by a modified view matrix that rotates the instance to the position (0, 0, d) in the modified view space, where d is the distance between the camera and the instance. Then, with each instance projected toward the center of the view plane, the projected extent of each instance's bounding box can be used as a proxy for the instance's importance to the indirectly rendered appearance. If the projected X or Y extent exceeds a user-configured threshold, the instance is added to the designated class (e.g., class 8 for an 8-bit mask).
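A simplified, hedged sketch of the projection-scale test follows; it approximates the bounding-box projection described above with a bounding sphere and a vertical field of view, which are assumptions made only to keep the example short:

```cpp
#include <cmath>

// Illustrative projection-scale test. The instance is treated as if it sat
// straight ahead of the camera at its original distance d, and the size of
// its bounding radius relative to the view frustum at that distance is
// compared against a user-configured threshold.
struct ProjectionScaleTest {
    float verticalFovRadians;  // camera vertical field of view
    float threshold;           // user-configured projected-extent threshold

    bool isLargeProjectionScale(float boundingRadius, float distanceToCamera) const {
        // Half-height of the view frustum at the instance's distance.
        float halfHeight = distanceToCamera * std::tan(verticalFovRadians * 0.5f);
        float projectedExtent = boundingRadius / halfHeight;  // size relative to the screen
        return projectedExtent > threshold;
    }
};
```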
The bandwidth consumed by ray tracing procedures (e.g., rayQuery initialization) is a major bottleneck for mobile ray tracing. For example, by using an instance culling mask so that rayQuery is initialized with only 1/8 of the scene geometry, bandwidth consumption is significantly reduced. This yields a 15-20 fps speed-up on a mobile device when rendering at 1697x760 resolution on a test scene containing 35,000 triangles. The instance culling mask for this test scene was configured manually according to the appearance of the scene, specifically according to the layout and materials of the scene geometry.
The methods for programmatically setting these culling masks replace the cumbersome manual configuration step with a simple mode selector, allowing the user to select, by inspection, the segmentation mode that best suits the scene. Furthermore, an embodiment of the method specifies, for the first time, an instance mask based on the scale or position of an instance in the scene. Moreover, the programmatic setting of the instance culling masks requires no additional manual adjustment by the user.
According to an example, the present invention provides a method of sharing scene segmentation data between the world-space clusters and the instance culling system. In an example, both the spatial sampling clusters and the programmatic instance culling masks are managed by an overall scene segmentation manager. These two functions can be optionally linked, sacrificing the resolution of the spatial clusters and the flexibility of the instance culling masks in exchange for the performance improvement brought by greater reuse.
When linked, the bit width of the instance culling mask limits the maximum number of spatial clusters; for an 8-bit mask, the maximum is eight. At such a low resolution, the spatial clusters are less able to capture high-frequency illumination differences, so optimizations based on the spatial clusters will not bring much improvement for scenes with hundreds of small light sources of different characteristics. However, most modern games are first- or third-person games, in which the camera view is positioned at the player character and most scene objects are similar in scale to the player character; such scenes are covered well by only eight clusters.
On the other hand, the instance culling masks may then only be able to assign categories using the spatial-only methods. Attempting to construct a bounding box around all rough objects in a scene, for example, would almost certainly result in a bounding box that overlaps other bounding boxes and spans a volume so large that no useful data could be shared within it.
As previously described, pre-computed boundaries may be used to override the programmatically subdivided portions of the scene boundaries. When the spatial sampling clusters are linked with the programmatic instance culling mask class, the instance culling mask class may generate a bounding box for the active grid pattern and pass it to the spatial cluster class. Then, since the spatial clusters are identical to the instance culling classes, the bit offset of the cull-to value of the instance being shaded can be used directly to determine which spatial cluster contains the relevant data for improving the shading. Since each instance can only be associated with one spatial cluster, this constrains the programmatic instance culling mask so that each instance's cull-to value must be "one-hot", i.e., have only one non-zero bit. Thus, the previous methods can be extended to check the overlap between an instance and all candidate categories and assign the instance only to the category with the largest overlap.
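For illustration, the link between a one-hot cull-to value and the shared spatial cluster might be sketched as follows (the function names and the overlap measure are assumptions):

```cpp
#include <cstdint>

// Illustrative link between the instance culling mask and the spatial
// clusters: when each instance's cull-to value is one-hot, the index of its
// single set bit names the cluster whose shared data the shader should load.
inline int clusterIndexFromCullTo(std::uint8_t cullTo) {
    for (int bit = 0; bit < 8; ++bit)
        if (cullTo & (1u << bit)) return bit;
    return -1;  // no bit set: instance is outside the linked segmentation
}

// Forcing a one-hot assignment: keep only the segment with the largest overlap.
inline std::uint8_t toOneHot(const float overlapVolume[8]) {
    int best = 0;
    for (int s = 1; s < 8; ++s)
        if (overlapVolume[s] > overlapVolume[best]) best = s;
    return static_cast<std::uint8_t>(1u << best);
}
```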
Sharing scene segmentation data between the world-space clusters and the instance culling system reduces the overhead of maintaining the two methods separately, because the scene data structure only needs to be computed or updated once and can then be shared between both methods. This sacrifices the resolution of the spatial clusters (and thus quality in scenes with high-frequency direct illumination) and can only be used with the spatial-only variants of the programmatic instance culling masks. Nevertheless, the camera-centered instance-culled cluster approach achieves acceptable rendering quality and improved performance for typical first- and third-person game content.
Fig. 5 is a simplified flow diagram of a method 500 of rendering graphics on a computer device according to an embodiment of the invention. The diagram is merely an example, which should not unduly limit the scope of the claims. Those of ordinary skill in the art will recognize many variations, alternatives, and modifications. For example, one or more steps may be added, deleted, repeated, replaced, modified, rearranged and/or overlapped, which should not limit the scope of the claims.
According to an example, the method 500 may be performed on a rendering system, such as the system 100 in fig. 1. More specifically, the processor of the system may be configured by executable code stored in a system memory storage (e.g., persistent storage) to perform the operations of method 500. As shown, method 500 may include a step 502 of receiving a plurality of graphics data associated with a three-dimensional (3D) scene, the 3D scene being rasterized to determine at least a first object (or all object instances in the scene) that is intersected by primary rays passing through a plurality of screen-space pixels. This may include determining all data required for the first object to be intersected by the ray passing through each pixel in the viewport, wherein the plurality of graphics data includes a plurality of vertex data associated with a plurality of vertices in the 3D scene. In an example, the method includes generating a plurality of screen space primitive data using at least the plurality of vertex data. The plurality of primitive data includes at least barycentric coordinates and a triangle index, but may also include other position data.
In step 504, the method includes providing a position vector defining a center of the overall cluster data structure in three axes. In a specific example, the center of the overall cluster data structure is independent of the camera location or, alternatively, is positioned at the camera location. In step 506, the method includes providing a subdivision scalar for uniformly dividing the overall cluster data structure along the three axes. In step 508, the method includes partitioning the bounding box, using at least the subdivision scalar and the extent of the overall data structure, to create clusters. Partitioning the bounding box may employ any of the segmentation techniques discussed previously and variations thereof.
In step 510, the method includes mapping the geometric position data to an associated cluster defined by a cluster center location. In step 512, the method includes executing, by one or more compute shaders, threads for each cluster to generate cluster data corresponding to the cluster. In step 514, the method includes generating a culling mask for each object using the cluster data. In particular examples, the culling mask includes a cull-to mask and/or a cull-from mask. As previously described, the culling mask may be specified based on clusters corresponding to uniformly sized scene segments, or based on the material and the distance of each object relative to the camera.
In step 516, the method includes rendering, by a shader, at least the first object (or all objects represented in the screen space primitive buffer) using at least the culling mask and the cluster data. The shader may be configured in a computer device, such as device 100 shown in fig. 1, and may include a pipeline configuration such as those shown in figs. 2 and 3. In addition, the method may include storing ReSTIR reservoir data, an ordered list of light sources, or the like.
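A minimal C++ sketch of how such a mask might be consulted at render time is shown below, assuming one bit per cluster group; the function and parameter names are hypothetical. A bitwise test of this kind keeps the per-instance rejection cost constant regardless of scene size.

#include <cstdint>

// Hypothetical visibility test: an instance is considered for shading or
// intersection only when its culling mask shares at least one bit with the
// mask of the ray (or of the cluster currently being shaded).
inline bool instanceVisible(uint32_t rayMask, uint32_t instanceCullMask) {
    return (rayMask & instanceCullMask) != 0u;
}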
FIG. 6 is a simplified flow diagram of a method 600 of rendering graphics on a computer device according to an embodiment of the invention. The diagram is merely an example, which should not unduly limit the scope of the claims. Those of ordinary skill in the art will recognize many variations, alternatives, and modifications. For example, one or more steps may be added, deleted, repeated, replaced, modified, rearranged and/or overlapped, which should not limit the scope of the claims.
According to an example, the method 600 may be performed on a rendering system, such as the system 100 in fig. 1. More specifically, the processor of the system may be configured by executable code stored in a memory storage (e.g., persistent storage) of the system to perform the operations of method 600. As shown, the method 600 may include a step 602 of generating a three-dimensional (3D) scene containing a first object (or containing all object instances in the scene). In a specific example, the 3D scene includes a first object intersected by a primary ray.
In step 604, the method includes receiving a plurality of graphics data associated with the 3D scene. In a specific example, the plurality of graphics data includes a plurality of vertex data associated with a plurality of vertices in the 3D scene. In an example, the method further includes generating a plurality of primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data including at least position data.
In step 606, the method includes providing a position vector defining a center of the global cluster data structure in three axes. In step 608, the method includes providing a subdivision scalar for dividing the global cluster data structure along the three axes. In step 610, the method includes partitioning the bounding box into clusters using at least the subdivision scalar and the extent of the global data structure. In a specific example, the bounding boxes are divided uniformly, divided according to the horizon of the 3D scene, or the like, and combinations thereof. In an example, the method further includes obtaining settings for partitioning the bounding box. Partitioning the bounding box may use any of the segmentation techniques discussed previously and variations thereof.
In step 612, the method includes mapping geometric position data to an associated cluster defined by the cluster center location. In step 614, the method includes executing, by one or more compute shaders, a thread for each cluster to generate cluster data corresponding to that cluster. In step 616, the method includes generating a culling mask for each object (e.g., the first object) using the cluster data. As previously described, the method may further include defining the culling mask based at least on the opacity of objects in the 3D scene.
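As an illustration of the opacity-based definition mentioned above, the following C++ sketch places opaque and non-opaque instances in separate mask groups; the bit layout, threshold, and names are assumptions made for the sketch only.

#include <cstdint>

// Hypothetical opacity-aware mask bits: opaque and non-opaque instances fall
// into separate groups so that, for example, one rendering pass can skip
// transparent geometry while another still considers it.
constexpr uint32_t kOpaqueBit      = 1u << 0;
constexpr uint32_t kTransparentBit = 1u << 1;

inline uint32_t opacityMaskFor(float opacity) {
    return (opacity >= 1.0f) ? kOpaqueBit : kTransparentBit;
}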
In step 618, the method includes rendering, by the shader, the 3D scene using at least the culling mask and the cluster data. As previously described, the shader may be configured in a computer device, such as device 100 shown in fig. 1, and may include a pipeline configuration such as those shown in figs. 2 and 3.
While the above is a complete description of the specific embodiments, various modifications, alternative constructions, and equivalents may be used. Accordingly, the foregoing description and drawings should not be deemed to limit the scope of the invention, which is defined by the appended claims.
