The present application claims priority from U.S. provisional applications No. 62/488,526, filed on April 21, 2017, and No. 62/653,056, filed on April 5, 2018.
Background
Remote gaming applications, in which a server-side game is controlled by a client-side player, have attempted to encode the video output from a three-dimensional (3D) graphics engine in real time using existing or custom codecs (also known as encoders). However, the interactive nature of video games, and in particular the player feedback loop between the video output and the player input, makes game video streams far more sensitive to delay than traditional video streams. Existing video coding methods can trade computational power for reduced encoding time, among other compromises. New methods for integrating the encoding process into the video rendering process can significantly reduce encoding time while also reducing computational load, improving the quality of the encoded video, and retaining the original bitstream data format to preserve interoperability with existing hardware devices.
When a video game instance runs on hardware local to the player, it is desirable for the game to output each pixel at the highest quality. However, in a server-side game instance where the rendered output is encoded and transmitted to a remote client, the encoder may degrade image quality to fit within the limited bandwidth. If the rendered quality is significantly higher than the quality of the decoded output, a significant amount of the server-side rendering effort is wasted.
By matching the quality of the server-side rendering to the quantized quality reported by the encoder, the game can reduce wasted server-side computation without any significant loss in client-side quality. Reducing server-side computational waste may also bring additional benefits, including reduced energy consumption, reduced rendering time, and reduced player-feedback latency. In an environment where multiple game instances run on the same server, these server-side computational savings are compounded.
In streaming environments involving multi-player games, particularly massively multiplayer online games ("MMOGs"), it is becoming increasingly important to ensure that server-side rendering work is not wasted. Encoders that maximize rendering quality while preventing a decrease in game speed are particularly important given the limited bandwidth available to MMOG players. As described below, current techniques employ various approaches in an attempt to solve this problem but remain inadequate.
U.S. patent publication No. US20170132830A1 ("the '830 Publication") discloses a system and method for determining a selection of a shading point in a 3D scene on which shading is to be performed, performing the shading on the determined shading point, and determining shading information for the 3D scene based on the result of that shading. The shading of the scene is adjusted based on the temporal characteristics of the scene. However, this technique does not address the fundamental problem of optimizing encoding based on server-side rendering capability and available bandwidth.
U.S. patent publication No. US20170200253A1 ("the '253 Publication") discloses systems and methods for improving rendering performance of a graphics processor. At the graphics processor, an upper threshold is set so that, when a frame that exceeds the set threshold is encountered, the graphics processor takes appropriate action to reduce rendering time. However, this technique relies only on fixed thresholds and cannot be dynamically adjusted to accommodate server-side rendering capability and available bandwidth.
U.S. patent publication No. US2017/0278296A1 ("the '296 Publication") discloses systems and methods in which an initial rendering of a scene is generated to determine the texture at each portion of the scene, and a ray-traced rendering of the scene is generated by tracing an initial sample of rays. This reference discloses intelligently determining the optimal number of samples per pixel based on a priori knowledge of scene texture, and identifying noise caused by undersampling during ray tracing. Again, this technique is limited to optimal ray sampling and cannot be dynamically adjusted to accommodate server-side rendering capability and available bandwidth.
As is apparent from the above discussion of the current state of the art, there is a need in the art to improve upon existing computer technology related to rendering and encoding of games.
Detailed Description
In describing the preferred embodiments of the present invention illustrated in the drawings, specific terminology will be resorted to for the sake of clarity. However, the invention is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. Several preferred embodiments of the present invention have been described for illustrative purposes, and it should be understood that the present invention may be embodied in other forms not specifically shown in the drawings.
Modern rendering engines, such as those used in video games, can adapt certain quality settings during runtime based on factors such as a player's distance from an object, the rendering time of the previous frame, or other runtime measurements. A rendering engine may provide several methods of adjusting quality, allowing for finer control of the overall rendering quality. Some examples include: biasing texture sampling to use blurrier mip maps; using lower-quality cascades or fewer samples for shadows; running a simplified path in the shading model (e.g., a DCT-transform approximation of specular reflection so that it looks like diffuse reflection); and using fewer samples in post-processing (e.g., for Gaussian blur, volumetric fog, etc.). In real-time streaming applications, altering one or more rendering quality settings in response to changes in encoder settings may provide the best rendering-cost savings without affecting the encoded output quality.
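Purely by way of illustration, such settings might be grouped into a single profile object as in the following Python sketch; all names, value ranges, and defaults here are assumptions and are not taken from any particular engine.

    from dataclasses import dataclass

    @dataclass
    class RenderQualityProfile:
        """One bundle of adjustable quality settings (hypothetical names)."""
        mipmap_lod_bias: float      # > 0 biases sampling toward blurrier mip levels
        shadow_cascades: int        # fewer cascades -> cheaper, lower-quality shadows
        shadow_samples: int         # samples per shadow lookup
        simplified_shading: bool    # e.g., approximate specular to look like diffuse
        postprocess_samples: int    # samples for Gaussian blur, volumetric fog, etc.
        resolution_scale: float     # fraction of full 3D-view resolution

    FULL_QUALITY = RenderQualityProfile(0.0, 4, 16, False, 8, 1.00)
    REDUCED_QUALITY = RenderQualityProfile(1.5, 2, 4, True, 2, 0.50)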
FIG. 1 is an illustration of an exemplary environment in which real-time rendered video is streamed to a remote viewer in real time. The server 100 may be any hardware capable of running a real-time rendering process 102 (hereinafter also referred to as a renderer) and a streaming codec 104 simultaneously. The codec 104 must also have the capability to communicate its quantization settings back to the rendering process 102 through direct reporting or some other monitoring process known in the art. The encoded video stream is transmitted over a network to the client device 106. The client 106 may be any hardware capable of decoding and displaying a video stream.
FIG. 2 is a flow chart outlining the exemplary stages of encoder-guided adaptive quality rendering. Real-time streaming encoders compliant with the H.264 standard typically employ a constant rate factor ("CRF") mode that reports the effective quantization setting of each encoded frame as a quantization parameter ("QP"), as shown at "report quantization setting for each encoded frame" (step 200). In some embodiments, the H.264-compliant library used is ffmpeg, which outputs the quantization setting as the variable f_crf_avg. The quantization parameter is an index from 0 to 51 that defines how lossy the compression is during encoding. Lower QP values indicate lower compression, and higher QP values indicate higher compression. To maintain a constant bit rate, an encoder operating in CRF mode will increase the QP of frames that can tolerate higher compression and lower the QP of frames that require higher quality. The encoder takes advantage of the fact that the human eye is less able to discern detail on moving objects by increasing compression in areas of relatively high motion and decreasing compression in areas that are relatively stationary. This allows the encoder to maintain the target perceived quality while reducing the size of some of the encoded frames.
The renderer reads the reported QP prior to rendering a frame, as shown at "monitor for changes in quantization settings" (step 202). At "different?" (step 203), if the effective quantization settings have not changed since the previously rendered frame, the renderer takes no action to adapt the rendering quality and checks again on the next frame. If the QP value read by the renderer differs from that of the previously rendered frame, or if this is the first encoded frame for which encoder-guided adaptive quality rendering is performed, the rendering quality is altered, as shown at "change rendering quality setting to match quantization setting" (step 204). If the QP value has increased since the previously rendered frame, the renderer will decrease the quality to match the compression level at the encoder. Likewise, if the QP value has decreased since the previously rendered frame, the renderer will increase the quality. To change the rendering settings, the renderer checks a pre-generated lookup table that provides a rendering quality setting profile for the QP value provided by the encoder. Typically, there is only one entry per encoder quality setting. The renderer finds that entry using the QP provided by the encoder and applies the associated rendering quality setting profile. Typically, the entire rendering quality setting profile is applied. A rendering quality setting profile defines a list of values, one for each available rendering quality setting. The pre-generation of the lookup table is described in more detail with reference to FIG. 3. The predefined lookup table may define rendering settings for integer values of QP, which requires the renderer to round the read QP value to the nearest integer, or the lookup table may define rendering settings for partial ranges of QP values between 0 and 51. The examples in FIGS. 4 and 5 assume that the renderer rounds the QP to the nearest integer before using the lookup table, but they may alternatively be modified to use partial ranges of QP to define the lookup table. The renderer changes the quality settings according to the rendering quality setting profile obtained from the lookup table before rendering the next frame. Reducing the rendering quality reduces the amount of rendering work wasted when a quality bottleneck exists at the encoder.
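The following Python sketch illustrates this runtime loop (steps 200 through 204) under assumed interfaces; encoder, renderer, and the contents of lookup_table are hypothetical stand-ins, not APIs defined by this disclosure.

    def adaptive_quality_loop(encoder, renderer, lookup_table):
        """Monitor the encoder's reported QP and adapt rendering quality."""
        previous_qp = None
        while renderer.is_running():
            qp = encoder.last_reported_qp()          # step 200: QP of the last encoded frame
            qp_index = min(max(round(qp), 0), 51)    # round to the nearest integer in [0, 51]
            if qp_index != previous_qp:              # step 203: has the setting changed?
                profile = lookup_table[qp_index]     # step 204: find the matching profile...
                renderer.apply_quality_profile(profile)  # ...and apply it before rendering
                previous_qp = qp_index
            encoder.encode(renderer.render_frame())  # render and encode the next frame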
FIG. 3 is a flow chart outlining the exemplary pre-generation of a lookup table that assigns a rendering quality setting profile to each partial range of encoder settings. A reference image is used as a benchmark to measure the impact on perceived quality when encoding settings or rendering settings are changed. The reference image should be representative of a typical frame of video output and should include rendering elements, such as models, textures, or visual effects, that are specific to the selected game context. The game context may be a particular area, a particular map, a particular level, or some particular game. The selected reference image is used to generate a lookup table that estimates the perceived quality of video rendered within the same context as the reference image. For example, a lookup table generated from a reference image containing a representative set of elements from a game level can be used to estimate the perceived quality of video rendered from similar scenes within the same level. Methods for combining multiple lookup tables into one more general lookup table are discussed further below. After the game context is identified, a representative scene should be selected and rendered at full quality, as shown at "select and generate reference image" (step 300). The full-quality rendering of the representative scene is referred to herein as the reference image.
The preferred embodiment of the renderer's runtime behavior, discussed above in connection with FIG. 2, requires the renderer to round the received QP value to the nearest integer before reading the lookup table. The lookup table is therefore generated using only integer values of QP. At the encoder, the full-quality reference image is encoded at each integer value of the quality settings (quantization parameter (QP) integer values 0 through 51), as shown at "encode reference image for each partial range of encoder settings" (step 302). In the preferred embodiment, there are 52 partial ranges defined by the rounding operation performed by the renderer. The implementation can be modified to create more partial ranges for the more common QP values (values in the middle of the 0-to-51 range) or fewer partial ranges for the rarer QP values (values at the extremes of the 0-to-51 range).
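By way of illustration only, the 52 encodings of step 302 could be produced with the ffmpeg command-line tool in constant-quantizer mode; this sketch assumes ffmpeg with libx264 is installed, and the file names are arbitrary.

    import subprocess

    def encode_reference_for_all_qps(reference_image: str) -> None:
        """Encode the full-quality reference image once at each integer QP."""
        for qp in range(52):                          # QP integer values 0 through 51
            subprocess.run(
                ["ffmpeg", "-y", "-i", reference_image,
                 "-c:v", "libx264", "-qp", str(qp),   # constant-quantizer encode
                 "-frames:v", "1", f"encoded_qp{qp:02d}.mp4"],
                check=True)

    encode_reference_for_all_qps("reference_image.png")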
Perceived quality is an attempt to quantify how well the human eye perceives a compressed image in comparison to the full-quality source image. There are various methods of estimating perceived quality, including mean squared error (MSE) and peak signal-to-noise ratio (PSNR), which use only the difference in luminance and contrast values between two images to estimate the quality of a compression codec. As disclosed by Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli in "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004, the structural similarity (SSIM) index is a method that adds the assumption that the human eye is also adept at extracting structural information from a scene, and defines a calculation to estimate perceived quality. SSIM compares the pixel data of two images, an uncompressed full-quality reference image and an encoded image, by comparing luminance, contrast, structure, and sometimes chrominance over an 8x8 pixel "window." SSIM is the preferred tool for computing perceived quality because of its low computational cost and its advantages over methods such as MSE and PSNR. To generate a perceived quality value for each value of the encoder settings, an SSIM index is calculated between each encoded reference image and the original reference image (preferably at the renderer and/or game engine), as shown at "calculate perceived quality for each encoded reference image" (step 304). In the preferred embodiment, 52 SSIM values are calculated, one for each quantization parameter (QP) integer value from 0 to 51. The exemplary descriptions of FIGS. 3, 4, and 5 use the standard SSIM calculation to compare two still images, but there are variants of the SSIM method that can compare two video segments and could be substituted at increased computational cost. One such SSIM variant is the Spatio-Temporal SSIM described by Anush K. Moorthy and Alan C. Bovik in "Efficient Motion Weighted Spatio-Temporal Video SSIM Index," Human Vision and Electronic Imaging XV, vol. 7527, Mar. 2010 (available at http://live.ece.utexas.edu/publications/2010/sports_spie_jan10.pdf).
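As one concrete possibility for the SSIM calculation of step 304, the scikit-image library could be used; this sketch assumes the encoded frames have already been decoded back into pixel arrays.

    import numpy as np
    from skimage.metrics import structural_similarity

    def perceived_quality(reference: np.ndarray, candidate: np.ndarray) -> float:
        """SSIM between the full-quality reference image and a test image,
        both given as HxWx3 uint8 arrays; 1.0 means visually identical."""
        return structural_similarity(reference, candidate, channel_axis=2)

    # One SSIM value per encoder setting, as in step 304:
    # ssim_per_qp = [perceived_quality(reference, decoded_frames[qp]) for qp in range(52)]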
The renderer may have multiple settings available for per-pixel quality control, including screen resolution, mipmap selection, level-of-detail (LOD) selection, shadow quality, post-processing quality, or other settings. A quality setting profile defines a value for each available quality setting. In some embodiments, a list of all rendering settings that can be adaptively altered, together with their possible values, is collected at the renderer. All permutations of the adaptive quality rendering settings and their values are then generated to create a list of rendering quality setting profiles, as shown at "generate rendering quality setting profile list" (step 306). Because the renderer may have many quality settings with many possible values, the number of quality setting profile permutations may become very large. The example of FIG. 5 discusses an exemplary method for limiting and optimizing the number of quality setting profiles in the list.
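A minimal sketch of the enumeration in step 306, assuming three adaptive settings with illustrative value lists, is:

    from itertools import product

    # Hypothetical adaptive settings and their possible values (step 306 inputs).
    setting_values = {
        "postprocess_quality": ["LOW", "MED", "HIGH", "ULTRA"],
        "shadow_quality":      ["LOW", "MED", "HIGH"],
        "resolution_scale":    [1.00, 0.90, 0.75, 0.60, 0.50],
    }

    # Every permutation of setting values becomes one rendering quality setting profile.
    profiles = [dict(zip(setting_values.keys(), combo))
                for combo in product(*setting_values.values())]
    # len(profiles) == 4 * 3 * 5 == 60, and grows multiplicatively with each new setting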
For each rendering quality setting profile in the list, the reference image should be re-rendered at the renderer using the specified rendering settings, as shown at "re-render reference image for each rendering quality setting profile" (step 308). If a rendering quality setting profile consists of more than one setting, the computational cost of each re-rendered reference image should also be recorded, measured, for example, in rendering time or clock cycles. This measure of computational cost can be used as a tiebreaker in a later step if any SSIM values collide.
The perceived quality is measured by comparing each of the re-rendered images to the original reference image, using the same perceived-quality measure used previously in step 304, as shown at "calculate perceived quality for each re-rendered reference image" (step 310). In the preferred embodiment, the structural similarity (SSIM) index that was used to measure the perceived quality of the encoder results is also used to measure the perceived quality of the re-rendering results.
At the renderer, the two sets of perceived quality values (the SSIM values of the encoded reference images computed at step 304, and the SSIM values of the per-profile re-rendered reference images computed at step 310) are compared to find matching SSIM values across the two sets. Ideally, for each SSIM value of an encoded reference image, there is an exactly matching SSIM value in the set of per-profile re-rendered images. If there is no exact match, the SSIM value of the selected per-profile re-rendered image should be greater than, and as close as possible to, the SSIM value of the target encoded reference image. Matching SSIM values across the two sets of perceived quality values identifies a rendering quality setting profile for each value of QP, as shown at "find quality setting profile for each partial range of encoder settings" (step 312). In the event of a collision (two or more exact matches in the set of SSIM values from the per-profile re-rendered images), the computational costs recorded in step 308 may be used as a tiebreaker, and the lower-cost rendering quality setting profile is selected for that encoder setting. FIG. 5 illustrates an example collision.
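A sketch of this matching rule (step 312), including the cost tiebreaker recorded in step 308, under assumed data structures:

    def select_profile(encoded_ssim: float, profile_results: list) -> dict:
        """profile_results holds (profile, ssim, render_cost) tuples from
        steps 308 and 310; encoded_ssim is the target value from step 304."""
        # Prefer profiles whose SSIM is at or above the encoded image's SSIM.
        candidates = [r for r in profile_results if r[1] >= encoded_ssim]
        if not candidates:
            candidates = [max(profile_results, key=lambda r: r[1])]  # best available
        # Closest SSIM wins; equal SSIM distances fall back to the cheaper render.
        profile, _, _ = min(candidates, key=lambda r: (r[1] - encoded_ssim, r[2]))
        return profile

    # lookup_table = {qp: select_profile(ssim_per_qp[qp], profile_results)
    #                 for qp in range(52)}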
The encoder settings and their matching rendering quality setting profiles should be organized into a lookup table, as shown at "create look-up table assigning rendering quality setting profiles to each encoder setting" (step 314). The lookup table is used at the renderer during runtime to change the rendering quality settings to match the quantization settings, as described in step 204 of FIG. 2. For a given reference image, the lookup table provides the rendering quality setting profile that generates an image of the same perceived quality as the encoded frame while providing the greatest computational savings. Example lookup tables are shown in FIGS. 4 and 5.
The lookup table generated by the method described in connection with FIG. 3 may be used for video rendered in a game context, scene, or environment similar to that of the reference image. The process outlined in connection with FIG. 3 may be repeated for multiple reference images, each representing a particular environment, scene type, or other meaningful game context. For example, a reference image may be selected from each map in a game to generate several map-specific lookup tables. Lookup tables may also be combined to create lookup tables that are more generally applicable across the gaming environment. For example, the map-specific lookup tables may be combined to generate one lookup table usable for all maps in the game. To combine lookup tables, the rendering quality setting profiles for each QP value are combined by finding the average of each setting contained in the profile. For example, suppose three lookup tables are generated for three reference images, and the rendering quality setting profile consists of three setting values: a post-processing quality setting, a shadow quality setting, and a resolution setting. To combine the rendering quality setting profiles for a QP value of 4, the profiles are read from each lookup table and denoted P4_1 = {3, MED, 95%}, P4_2 = {4, LOW, 90%}, and P4_3 = {2, MED, 90%}. The average of each setting is found to generate P4_Avg = {3, MED, 92%}. The profile-averaging process should round upward so that the rendering process never generates an image at a perceived quality level lower than the current encoder quality setting. The profiles for each QP value are averaged in this way and organized into a new lookup table.
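A sketch of this averaging, using the three-setting profiles of the example above and assuming larger values denote higher quality so that rounding up never reduces quality below the encoder setting:

    import math

    SHADOW_LEVELS = ["LOW", "MED", "HIGH"]

    def combine_profiles(profiles: list) -> tuple:
        """Average one QP's profiles from several lookup tables, e.g.
        [(3, "MED", 95), (4, "LOW", 90), (2, "MED", 90)] -> (3, "MED", 92)."""
        n = len(profiles)
        post = math.ceil(sum(p[0] for p in profiles) / n)            # round up
        shadow = SHADOW_LEVELS[math.ceil(
            sum(SHADOW_LEVELS.index(p[1]) for p in profiles) / n)]   # round up
        resolution = math.ceil(sum(p[2] for p in profiles) / n)      # round up, in %
        return (post, shadow, resolution)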
FIG. 4 is an example of lookup table generation for a rendering quality setting profile consisting of only one setting. In this example, a single rendering quality setting is adapted in response to changes in the encoder quality settings. The rendering of a first-person view of a 3D scene is adapted at the renderer by altering the resolution of the 3D portion of the view (shown at "3D view" 400); however, to preserve the readability of any player-facing text, the resolution of the user interface (UI) elements (shown at "UI" 402) is not altered. This type of selective resolution scaling is referred to as dynamic resolution scaling and is an increasingly common feature of rendering engines. The reference image (shown at "reference image" 404) represents a single frame of typical video output rendered at the highest possible resolution, and is selected according to the criteria outlined in step 300 of FIG. 3. At the encoder, the reference image (shown at "reference image" 404) is encoded at each integer value of QP, as described in connection with step 302 of FIG. 3, to generate the list of encoded reference images shown at "encoded reference image" 406. As described in connection with step 304 of FIG. 3, SSIM values (shown at "SSIM" 408) are calculated at the renderer for each encoded reference image 406. Because the rendering quality profile consists of only one quality setting, the number of quality profile permutations is limited by the number of values available for the resolution of the 3D view (shown at "3D view" 400). The upper limit on the number of possible resolution values is defined by the maximum possible resolution of the 3D view, and the lower limit by the minimum possible resolution. The aspect ratio determines how many resolution values exist between the minimum and maximum resolutions. For example, a maximum resolution of 3840x2160 has an aspect ratio of 16:9, and the minimum feasible resolution at this aspect ratio is chosen to be 1280x720. Between these upper and lower limits there are 160 possible resolutions with a 16:9 aspect ratio. Alternatively, some number of resolutions between the upper and lower limits may be arbitrarily selected as resolution samples; for example, the resolution may be stepped down in the x direction between 3840 and 1280 to select some number of sample resolution sizes.
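A sketch of this enumeration, listing every 16:9 resolution between 1280x720 and 3840x2160 whose height stays integral when the width is stepped by 16 pixels:

    def resolutions_16_9(min_width=1280, max_width=3840, step=16):
        """List (width, height) pairs at a 16:9 aspect ratio; stepping the
        width by 16 keeps the height (width * 9 / 16) an integer."""
        return [(w, w * 9 // 16) for w in range(min_width, max_width + 1, step)]

    sizes = resolutions_16_9()
    # len(sizes) == 161 candidates, in line with the roughly 160 noted above;
    # a sparser sample can be taken by increasing the step size.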
At the renderer, the reference image is re-rendered (as shown at "re-rendered reference sequence" 410) at each of the available resolution sizes, or at each of the selected sample resolution sizes, as described in connection with step 308 of FIG. 3. An SSIM value (shown at "SSIM" 412) is calculated at the renderer for each re-rendered image, as described in step 310 of FIG. 3. The two sets of SSIM values, those of the encoded reference images (shown at "SSIM" 408) and those of the per-profile re-rendered reference images (shown at "re-rendered reference sequence" 410), are compared to find matches across the two groups, providing a resolution setting for each integer value of QP. The results are organized into a lookup table to be used during runtime, as shown at "lookup table" 414. By reducing the 3D view resolution to match the quantization settings, wasted rendering effort can be significantly reduced, which can lead to other benefits, including reduced energy consumption on the server, reduced rendering time, and improved player-feedback latency. These benefits are compounded in environments where multiple game instances run on a single server.
FIG. 5 is an example of lookup table generation for a rendering quality setting profile containing multiple settings. The process described in connection with FIG. 3 is unchanged for selecting the reference image and measuring the perceived quality at each encoder setting (as described in connection with steps 300, 302, and 304). Because the renderer may scale more than one rendering quality setting in relation to the QP value, the generated list of rendering quality setting profiles (described in connection with step 306 of FIG. 3) may be far too long to re-render the reference image and calculate the perceived quality for every rendering quality setting profile. Since there may be a very large number of permutations of rendering settings, a decision tree can help programmatically narrow the search space. For example, it may not be desirable to have a rendering quality setting profile in which the post-processing quality is very low but every other setting is very high. In some embodiments, it may be preferable to pair high-quality shadows with low-quality post-processing; in other embodiments, the opposite may be true. Such decisions are subjective, but the criteria on which they may be based include, without limitation, the computational cost associated with a particular rendering setting, the perceived-quality difference between two settings, the relative conspicuousness of one rendering setting compared to another (such as a close-up effect that consumes most of the screen compared to distant details only a few pixels wide), or relative gameplay importance (such as a visual effect that is important for communicating feedback to the player).
FIG. 5 shows an exemplary decision tree (at "decision tree" 500) consisting of a leaf for each permutation of four possible post-processing quality settings, three possible shadow quality settings, and five possible 3D view resolutions. This example decision tree is significantly smaller than a real example would be, as there may be many more adaptive rendering settings or more options per setting, as will be apparent to one of ordinary skill in the art. The decision tree is preferably traversed subject to any constraints, such as avoiding leaves where the post-processing quality is very low while all other settings are high. For each leaf that is not removed by the constraints, the reference frame may be re-rendered using the rendering quality setting profile associated with that leaf, as depicted at step 308 of FIG. 3. The computational cost, measured in rendering time or clock cycles, may be recorded here to serve as a potential tiebreaker in the event of perceived-quality value collisions. The perceived quality may then be measured for each re-rendered image, as described in connection with step 310 of FIG. 3. For each perceived quality value (SSIM) in the set calculated for the encoder settings, a list of all rendering quality setting profiles with matching SSIM values may be generated, as described in connection with step 312 of FIG. 3. The example of FIG. 5 shows the list generated for a QP value of 16.
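A sketch of this constraint-based pruning, applied to the profiles list built in the earlier enumeration sketch; the single constraint shown is one illustrative rule of the kind described above, not a rule prescribed by this disclosure.

    def passes_constraints(profile: dict) -> bool:
        """Reject incongruous leaves, e.g. very low post-processing quality
        combined with maximum shadows and full resolution."""
        if (profile["postprocess_quality"] == "LOW"
                and profile["shadow_quality"] == "HIGH"
                and profile["resolution_scale"] == 1.00):
            return False
        return True

    # Only the surviving leaves are re-rendered and measured (steps 308 and 310).
    # pruned = [p for p in profiles if passes_constraints(p)]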
The SSIM value of the reference image encoded with a QP value of 16 is 0.997, and three rendering quality setting profiles with matching SSIM values are shown, with computational costs of 16.004, 15.554, and 15.402. Because there are three collisions among the perceived quality values, the computational costs recorded earlier serve as the tiebreaker and can be used to determine which rendering quality setting profile is the cheapest, in this case the one with cost 15.402. A lookup table (as shown at "lookup table" 502) should be generated that assigns the cheapest matching rendering quality setting profile to each value of QP, as depicted at step 314 of FIG. 3. The rendering quality setting profile selected for the QP value of 16 is shown as "profile 16" in FIG. 5.
Example 1: effect of rendering time as a proxy for computational waste
In an example embodiment, only the resolution is scaled, linearly, in response to changes in encoder quality. For example, if the encoder quality drops by 50%, the resolution drops by 50% in response. Because rendering time savings correlate directly with computational power savings, rendering time was examined while the resolution was scaled. Measurements were taken in a low-motion environment consisting of a first-person view of the player's hands and weapon facing a wall. This low-motion view was selected to limit the number of factors that might contaminate the rendering-time measurements. Such factors include post-processing such as motion blur, changes in the number of rendered objects, changes in the on-screen textures, or other elements of the view that can change in high-motion views. A fixed view of a fixed scene also allows the various measurements taken at scaled resolutions to be compared directly. The rendering engine was forced to output video at progressively lower resolutions, and the measured results are shown in Table 1 below.
Table 1: effect of resolution scaling on rendering time
The opaque pass is the portion of the rendering pipeline that draws the opaque geometry in the view, and it is the portion of the pipeline most sensitive to resolution changes. Any rendering-time or computational-cost savings obtained by scaling the resolution will therefore come mostly from the opaque pass.
As shown in Table 1, at the full resolution of 1280x720 at 60 frames per second, the total rendering time is 1.4 ms, of which the opaque pass accounts for 0.4 ms. When the resolution is reduced to 50% of full resolution, the total rendering time is 1.0 ms, with the opaque pass taking 0.3 ms. Reducing the resolution by 50% thus yields a significant rendering-time saving of about 30%. When the resolution is reduced to 25% of full resolution, the total rendering time is 0.8 ms, with the opaque pass taking 0.2 ms. Reducing the resolution by 75% thus yields a rendering-time saving of more than 40%.
The foregoing description and drawings should be considered as illustrative only of the principles of the invention. The present invention is not intended to be limited to the preferred embodiments and can be embodied in various ways that will be apparent to those of ordinary skill in the art. Many applications of the present invention will readily occur to those skilled in the art. Therefore, it is not desired to limit the invention to the specific examples disclosed or the exact construction and operation shown and described. On the contrary, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.