This application is a continuation-in-part (CIP) of U.S. application Ser. No. 10/315,460, entitled "Apparatus and Method for Wireless Video Gaming", filed December 10, 2002, which is assigned to the assignee of the present CIP application.
Background
Recorded audio and motion picture media have been an aspect of society since the days of Thomas Edison. In the early part of the 20th century there was wide distribution of recorded audio media (cylinders and records) and motion picture media (nickelodeons and movies), but both technologies were still in their infancy. In the late 1920s motion pictures were combined with audio on a mass-market basis, followed by color motion pictures, also with audio. Radio broadcasting gradually evolved into a largely advertising-supported form of broadcast mass-market audio media. When a television (TV) broadcast standard was established in the mid-1940s, television joined radio as a form of broadcast mass-market media, bringing previously recorded or live motion pictures into the home.
By the middle of the 20th century, most American households had a phonograph record player for playing recorded audio media, a radio for receiving live broadcast audio, and a television set for playing live broadcast audio/video (A/V) media. Very often these three "media players" (record player, radio and TV) were combined into one cabinet sharing common speakers, becoming the "media center" of the home. Although the media choices were limited for the consumer, the media "ecosystem" was quite stable. Most consumers knew how to use the "media players" and were able to enjoy the full range of their capabilities. At the same time, the publishers of the media (largely the motion picture and television studios and the music companies) were able to distribute their media both to theaters and to homes without suffering from widespread piracy or "secondary sales" (i.e., the resale of used media). Typically, publishers do not derive revenue from secondary sales, and as such, secondary sales reduce the revenue that publishers might otherwise derive from new sales to the purchasers of the used media. Although there certainly were used records sold during the middle of the 20th century, such sales did not have a large impact on record publishers because, unlike a motion picture or video program (which is typically watched once or only a few times by an adult), a music track may be listened to hundreds or even thousands of times. Thus, music media is far "longer-lived" (i.e., it has lasting value to the adult consumer) than motion picture/video media. Once a record is purchased, if the consumer likes the music, the consumer is likely to keep it for a long time.
From the middle of the 20th century until the present day, the media ecosystem has gone through a series of radical changes, to the benefit and detriment of both consumers and publishers. With the widespread introduction of audio recorders, especially cassette tapes with high-quality stereo sound, there was certainly a much higher degree of consumer convenience. But it also marked the beginning of what is now a widespread consumer practice: media piracy. Certainly, many consumers used the cassette tapes purely for taping their own records for convenience, but an increasing number of consumers (e.g., students in a dormitory with ready access to each other's record collections) would make pirated copies. Also, rather than buying a record or tape from the publisher, consumers would tape music played over the radio.
The advent of the consumer VCR led to even more consumer convenience, since now a VCR could be set to record a TV program which could be watched at a later time, and the VCR also led to the creation of the video rental industry, where movies as well as TV programming could be accessed on an "on demand" basis. The rapid development of mass-market home media devices since the mid-1980s has led to an unprecedented level of choice and convenience for the consumer, and has also led to a rapid expansion of the media publishing market.
Today, consumers are faced with a plethora of media choices as well as a plethora of media devices, many of which are tied to particular forms of media or particular publishers. An avid consumer of media may have a stack of devices connected to TVs and computers in various rooms of the house, resulting in a "rat's nest" of cables to one or more TVs and/or personal computers (PCs), as well as a group of remote controls. (In the context of this application, the term "personal computer" or "PC" refers to any sort of computer suitable for the home or office, including a desktop computer, a Macintosh or other non-Windows computer, a Windows-compatible device, a Unix variant, a laptop, etc.) These devices may include a video game console, a VCR, a DVD player, an audio surround-sound processor/amplifier, a satellite set-top box, a cable TV set-top box, and the like. And, for the avid consumer, there may be multiple devices with similar functionality because of compatibility issues. For example, a consumer may own both an HD-DVD and a Blu-ray DVD player, or both a Microsoft Xbox and a Sony Playstation video game system. Indeed, because of the incompatibility of some games across console versions, a consumer may own both an XBox and a later version, such as an Xbox 360. Frequently, consumers are befuddled as to which video input and which remote to use. Even after a disc is placed into the correct player (e.g., DVD, HD-DVD, Blu-ray, Xbox or Playstation), the video and audio input is selected for that device, and the correct remote control is found, the consumer is still faced with technical challenges. For example, in the case of a wide-screen DVD, the user may need to first determine and then set the correct aspect ratio (e.g., 4:3, Full, Zoom, Wide Zoom, Cinema Wide, etc.) on his TV or monitor screen. Similarly, the user may need to first determine and then set the correct audio surround-sound system format (e.g., AC-3, Dolby Digital, DTS, etc.). Often, consumers are unaware that they may not be enjoying the media content to the full capability of their television or audio system (e.g., watching a movie squashed at the wrong aspect ratio, or listening to audio in stereo rather than in surround sound).
Increasingly, Internet-based media devices have been added to the stack of devices. Audio devices like the Sonos Digital Music System stream audio directly from the Internet. Likewise, devices like the Slingbox entertainment player record video and stream it out through a home network or out through the Internet, where it can be watched remotely on a PC. And Internet Protocol Television (IPTV) services offer cable TV-like services through Digital Subscriber Line (DSL) or other home Internet connections. There have also been recent efforts to integrate multiple media functions into a single device, such as the Moxi Media Center and PCs running a version of Windows XP Media Center Edition. While each of these devices offers an element of convenience for the functions it performs, each lacks ubiquitous and simple access to most media. Further, such devices frequently cost hundreds of dollars to manufacture, often because of the need for expensive processing and/or local storage. Additionally, these modern consumer electronic devices typically consume a great deal of power, even while idle, which means they are expensive over time and wasteful of energy resources. For example, a device may continue to operate if the consumer neglects to turn it off or switches to a different video input. And, because none of the devices is a complete solution, it must be integrated with the other stack of devices in the home, which still leaves the user with a rat's nest of wires and a sea of remote controls.
Furthermore, many of the newer Internet-based devices, even when functioning properly, typically provide media in a more generic form than that in which it might otherwise be available. For example, devices that stream video through the Internet often stream just the video material, not the interactive "extras" that often accompany DVDs, such as the "making of" videos, games, or director's commentary. This is due to the fact that frequently the interactive material is produced in a particular format intended for a particular device that handles interactivity locally. For example, each of DVD, HD-DVD and Blu-ray discs has its own particular interactive format. Any home media device or local computer that might be developed to support all of the popular formats would require a level of sophistication and flexibility that would likely be prohibitively expensive and complex for the consumer to operate.
Exacerbating the problem, if a new format were to be introduced later in the future, the local device might not have the hardware capability to support the new format, which would mean that the consumer would have to purchase an upgraded local media device. For example, if higher-resolution video or stereoscopic video (e.g., one video stream for each eye) were introduced at a later date, the local device might not have the computational power to decode the video, or it might not have the hardware to output the video in the new format (e.g., assuming stereoscopy is achieved through 120fps video synchronized with shuttered glasses, with 60fps delivered to each eye, if the consumer's video hardware can only support 60fps video, this option would be unavailable absent an upgraded hardware purchase).
The issues of media device obsolescence and complexity become serious problems when it comes to sophisticated interactive media, especially video games.
Modern video game applications are largely divided into four major non-portable hardware platforms: Sony Playstation 1, 2 and 3 (PS1, PS2, and PS3); Microsoft Xbox and Xbox 360; Nintendo Gamecube and Wii; and PC-based games. Each of these platforms is different from the others, so that a game written to run on one platform usually will not run on another platform. There may also be compatibility problems from one generation of device to the next. Even though the majority of software game developers create software games that are designed independently of a particular platform, in order to run a particular game on a particular platform, a proprietary layer of software (frequently called a "game development engine") is needed to adapt the game for use on that specific platform. Each platform is sold to the consumer as a "console" (i.e., a standalone box attached to a TV or monitor/speakers) or it is a PC itself. Typically, the video games are sold on optical media, such as a Blu-ray DVD, DVD-ROM or CD-ROM, which contains the video game embodied as a sophisticated real-time software application. As home broadband speeds have increased, video games are increasingly becoming available for download.
Achieving platform compatibility with video game software is extremely demanding because of the real-time nature and high computational requirements of advanced video games. For example, one might expect full game compatibility from one generation of video games to the next (e.g., from XBox to XBox 360, or from Playstation 2 ("PS2") to Playstation 3 ("PS3")), just as there is general compatibility of productivity applications (e.g., Microsoft Word) from one PC to another PC with a faster processing unit or core. However, this is not the case with video games. Because video game manufacturers typically seek the highest possible performance for a given price point when a generation of video games is released, dramatic architectural changes are frequently made to the system, such that many games written for the prior-generation system do not work on the later-generation system. For example, the XBox is based upon the x86 family of processors, whereas the XBox 360 is based upon the PowerPC family.
Techniques can be utilized to emulate a prior architecture, but given that video games are real-time applications, it is often unfeasible to achieve exactly the same behavior in an emulation. This is a detriment to the consumer, the video game console manufacturer and the video game software publisher. For the consumer, it means the necessity of keeping both an old and a new generation of video game console hooked up to the TV in order to be able to play all games. For the console manufacturer, it means the cost associated with the emulation and the slower adoption of new consoles. And for the publisher, it means that multiple versions of a new game may have to be released in order to reach all potential consumers: not only a version for each brand of video game (e.g., XBox, Playstation), but often a version for each version of a given brand (e.g., PS2 and PS3). For example, a separate version of Electronic Arts' "Madden NFL 08" was developed for the XBox, XBox 360, PS2, PS3, Gamecube, Wii, and PC, among other platforms.
Portable devices, such as mobile phones and portable media players, also present challenges to game developers. Increasingly, such devices are connected to wireless data networks and are able to download video games. But there is a wide variety of mobile phones and media devices on the market, with a wide range of different display resolutions and computing capabilities. Also, because such devices typically have power consumption, cost and weight constraints, they typically lack advanced graphics acceleration hardware like a graphics processing unit ("GPU"), such as those made by NVIDIA (NVIDIA Corporation, Santa Clara, CA). Consequently, game software developers typically develop a given game title simultaneously for many different types of portable devices. A user may find that a given game title is not available for his or her particular mobile phone or portable media player.
In the case of home game consoles, the hardware platform manufacturers typically charge a royalty to software game developers for the ability to publish a game on their platform. Mobile phone wireless carriers also typically charge a royalty to game publishers for downloading a game onto a mobile phone. In the case of PC games, there is no royalty paid for publishing games, but game developers typically face high costs due to the higher customer service burden of supporting the wide range of PC configurations and the installation issues that may arise. Also, PCs typically present less of a barrier to the piracy of game software, because they are readily reprogrammable by a technically knowledgeable user and games can be more easily pirated and more easily distributed (e.g., through the Internet). Thus, for software game developers, there are costs and disadvantages in publishing on game consoles, mobile phones and PCs.
For game publishers of console and PC software, the costs do not end there. To distribute a game through retail channels, the publisher charges the retailer a wholesale price below the selling price so that the retailer has a profit margin. The publisher also typically has to pay the cost of manufacturing and distributing the physical media holding the game. The retailer frequently also charges the publisher a "price protection fee" to cover possible contingencies (such as the game not selling, the game's price being reduced, or the retailer having to refund part or all of the wholesale price and/or take the game back from a purchaser). Additionally, retailers also typically charge publishers fees to help market the games in advertising flyers. Moreover, retailers increasingly buy games back from users who have finished playing them, and then sell them as used games, typically without sharing any of the used-game revenue with the game publisher. Adding to the cost burden placed upon game publishers is the fact that games are often pirated and distributed through the Internet for users to download and copy for free.
As Internet broadband speeds have increased and broadband connectivity has become more widespread in the United States and around the world (in particular to homes and to "Internet cafés" where Internet-connected PCs are rented for use), games are increasingly being distributed to PCs or consoles via download. Also, broadband connections are increasingly used for playing multiplayer and massively multiplayer online games (both of which are referred to in this disclosure by the acronym "MMOG"). These changes mitigate some of the costs and issues associated with retail distribution. Downloading online games addresses some of the disadvantages to game publishers, in that distribution costs are typically lower and there is little or no cost from unsold media. But downloaded games are still subject to piracy, and because of their size (often many gigabytes) they can take a very long time to download. In addition, multiple games can fill up small disk drives, such as those sold with portable computers or with video game consoles. However, to the extent that a game or an MMOG requires an online connection for the game to be playable, the piracy problem is mitigated, since the user is usually required to have a valid user account. Unlike linear media (e.g., video and music), which can be copied by a camera shooting video of the display screen or a microphone recording audio from the speakers, each video game experience is unique and can not be copied using simple video/audio recording. Thus, even in regions where copyright laws are not strongly enforced and piracy is rampant, MMOGs can be shielded from piracy and therefore can support a business. For example, Vivendi SA's "World of Warcraft" MMOG has been successfully deployed throughout the world without suffering from piracy. And many online or MMOG games, such as Linden Lab's "Second Life" MMOG, generate revenue for the game operator through an economic model built into the game, in which assets can be bought, sold, and even created using online tools. Thus, mechanisms in addition to conventional game software purchases or subscriptions can be used to pay for the use of online games.
While piracy can often be mitigated due to the nature of online games or MMOGs, an online game operator still faces remaining challenges. Many games require substantial local (i.e., in-home) processing resources for the online game or MMOG to work properly. If a user has a low-performance local computer (e.g., one without a GPU, such as a low-end laptop), the user may not be able to play the game. Additionally, as game consoles age, they fall further behind the state of the art and may not be able to handle more advanced games. Even assuming the user's local PC is able to handle the computational requirements of a game, there are often installation complexities. There may be driver incompatibilities (e.g., if a new game is downloaded, it may install a new version of a graphics driver that renders a previously installed game, which relies upon an old version of the graphics driver, inoperable). A console may run out of local disk space as more games are downloaded. Complex games typically receive downloaded patches over time from the game developer as bugs are found and fixed, or if modifications are made to the game (e.g., if the game developer finds that a level of the game is too hard or too easy to play). These patches require new downloads. But sometimes not all users complete downloading all of the patches. Other times, the downloaded patches introduce other compatibility or disk-space-consumption issues.
Also, during game play, large data downloads may be required to provide graphical or behavioral information to the local PC or console. For example, if the user enters a room in an MMOG and encounters a scene or a character made up of graphical data or with behaviors that are not available on the user's local machine, the data for that scene or character must be downloaded. This can result in a substantial delay during game play if the Internet connection is not fast enough. And, if the encountered scene or character requires storage space or computational capability beyond that of the local PC or console, it can create a situation where the user can not proceed in the game or must continue with reduced-quality graphics. Thus, online or MMOG games often limit their storage and/or computational complexity requirements. Additionally, they often limit the amount of data transfer during the game. Online or MMOG games may also narrow the market of users who can play the games.
Moreover, technically knowledgeable users are increasingly reverse-engineering local copies of games and modifying the games so that they can cheat. The cheats may be as simple as making a repetitive button push faster than is humanly possible (e.g., so as to shoot a gun very rapidly). In games that support in-game asset transactions, the cheating can reach a level of sophistication that results in fraudulent transactions involving assets of actual economic value. When an online or MMOG economic model is based on such asset transactions, this can result in substantial harmful consequences to the game operator.
The cost of developing a new game has grown as PCs and consoles have become able to produce increasingly sophisticated games (e.g., with more realistic graphics, such as real-time ray tracing, and more realistic behaviors, such as real-time physics simulation). In the early days of the video game industry, video game development was a process very similar to application software development; that is, most of the development cost was in the development of the software, as opposed to the development of the graphical, audio, and behavioral elements or "assets", such as those that might be developed for a motion picture with extensive special effects. Today, many sophisticated video game development efforts more closely resemble special-effects-rich motion picture development than software development. For instance, many video games provide simulations of 3D worlds and generate increasingly photorealistic characters, props and environments (i.e., computer graphics that seem as realistic as live-action imagery that is photographically captured). One of the most challenging aspects of photorealistic game development is creating a computer-generated human face that is indistinguishable from a live-action human face. Facial capture technologies, such as the Contour Reality Capture system developed by Mova of San Francisco, Calif., capture and track the precise geometry of a performer's face at high resolution while the performer is in motion. Such technology allows a 3D face to be rendered on a PC or game console that is virtually indistinguishable from a captured live-action face. Capturing and rendering a "photoreal" human face precisely is useful in several respects. First, highly recognizable celebrities or athletes are often used in video games (often hired at a high cost), and imperfections may be apparent to the user, making the viewing experience distracting or unpleasant. Frequently, a high degree of detail is required to achieve a high degree of photorealism, potentially requiring the rendering of a large number of polygons and high-resolution textures, with the polygons and/or textures changing on a frame-by-frame basis as the face moves.
When a scene with a high polygon count and detailed textures changes rapidly, the PC or game console supporting the game may not have enough RAM to store enough polygon and texture data for the required number of animation frames generated in the game segment. Further, the single optical drive or single disk drive typically available on a PC or game console is usually much slower than the RAM, and typically can not keep up with the maximum data rate that the GPU can accept in rendering the polygons and textures. Current games typically load most of the polygons and textures into RAM, which means that a given scene is largely limited in complexity and duration by the capacity of the RAM. In the case of facial animation, for example, this may limit a PC or a game console to either a low-resolution face that is not photoreal, or to a photoreal face that can only be animated for a limited number of frames before the game pauses and loads polygons and textures (and other data) for more frames.
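As a rough illustration of the RAM constraint just described, the following sketch (not part of the original disclosure; all figures are illustrative assumptions) estimates how many frames of per-frame-varying facial data fit into a typical console RAM budget.

```python
# Illustrative sketch (assumed numbers, not from the original disclosure):
# estimate how many animation frames of per-frame face data fit in RAM.

ram_budget_mb = 256            # portion of console RAM assumed free for the face
texture_mb_per_frame = 8       # assumed high-resolution textures changing each frame
polygon_mb_per_frame = 2       # assumed per-frame polygon (geometry) data

per_frame_mb = texture_mb_per_frame + polygon_mb_per_frame
frames_in_ram = ram_budget_mb // per_frame_mb
seconds_at_60fps = frames_in_ram / 60.0

print(f"Frames that fit in {ram_budget_mb} MB of RAM: {frames_in_ram}")
print(f"Playable before reloading at 60 fps: {seconds_at_60fps:.1f} s")
# With these assumptions only about 25 frames (well under a second of
# animation) fit in RAM, after which the game must pause to load more data.
```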
Watching a progress bar move slowly across the screen while a PC or console displays a message like "Loading..." is accepted as an inherent drawback by users of today's complex video games. The delay while the next scene loads from the disk (unless otherwise qualified, "disk" herein refers to non-volatile optical or magnetic media, as well as non-disk media such as semiconductor "flash" memory) can take several seconds or even several minutes. This is a waste of time and can be quite frustrating to the game player. As discussed previously, much or all of the delay may be due to the load time for polygon, texture or other data from the disk, but it may also be the case that part of the load time is spent while the processor and/or GPU in the PC or console prepares data for the scene. For example, a soccer video game may allow the player to choose among a large number of players, teams, stadiums and weather conditions. So, depending on what particular combination is chosen, different polygons, textures and other data (collectively "objects") may be required for the scene (e.g., different teams have different colors and patterns on their uniforms). It may be possible to enumerate many or all of the various permutations, pre-compute many or all of the objects in advance, and store the objects on the disk used to store the game. But if the number of permutations is large, the amount of storage required for all of the objects may be too large to fit on the disk (or too impractical to download). Thus, existing PC and console systems are typically constrained in both the complexity and the play duration of a given scene, and suffer from long load times for complex scenes.
Another significant limitation of prior art video game systems and application software systems is that they increasingly make use of large databases of, e.g., 3D objects such as polygons and textures, that need to be loaded into the PC or game console for processing. As discussed above, such databases can take a long time to load when stored locally on a disk. Load times are usually far more severe, however, if the database is stored at a remote location and accessed through the Internet. In such a situation it may take minutes, hours, or even days to download a large database. Further, such databases are often created at great expense (e.g., a detailed 3D model of a tall-masted sailing ship for use in a game, movie, or historical documentary) and are intended for sale to the local end user. However, once the database has been downloaded to the local user, it is at risk of being pirated. In many cases, a user wants to download a database just for the purpose of evaluating it, to see if it suits the user's needs (e.g., whether a 3D costume for a game character has a satisfactory appearance when the user performs a particular move). A long load time can be a deterrent for the user evaluating a 3D database before deciding to make a purchase.
Similar problems arise in MMOGs, particularly games that allow users to utilize increasingly customized characters. For a PC or game console to display a character, it needs access to a database of 3D geometry (polygons, textures, etc.) as well as to the behaviors for that character (e.g., if the character has a shield, whether the shield is strong enough to deflect a spear). Typically, when an MMOG is first played by a user, a large database of characters is already available with the initial copy of the game, available locally on the game's optical disc or downloaded to a disk. But, as the game progresses, if the user encounters a character or object whose database is not available locally (e.g., if another user has created a customized character), then before that character or object can be displayed, its database must be downloaded. This can result in a substantial delay in the game.
Given the sophistication and complexity of video games, another challenge for video game developers and publishers with prior art video game consoles is that developing a video game often takes two to three years and costs tens of millions of dollars. Given that new video game console platforms are introduced at a rate of roughly once every five years, game developers need to start development work on their games years in advance of the release of the new game console in order to have video games available concurrently with the release of the new platform. Several consoles from competing manufacturers are sometimes released at about the same time (e.g., within a year or two of each other), but what remains to be seen is the popularity of each console, e.g., which console will produce the largest video game software sales. For example, in a recent console cycle, the Microsoft XBox 360, the Sony Playstation 3, and the Nintendo Wii were scheduled to be introduced in about the same general timeframe. But in the years before those introductions, the game developers essentially had to "place their bets" on which console platforms would be more successful than others, and devote their development resources accordingly. Motion picture production companies also have to apportion their limited production resources based on what they estimate to be the likely success of movies well in advance of their release. Given the growing level of investment required for video games, game production is increasingly becoming like motion picture production, and game production companies routinely devote their production resources based on their estimate of the future success of a particular video game. But, unlike motion picture companies, this wager is not based simply on the success of the production itself; rather, it is predicated on the success of the game console the game is intended to run on. Releasing the game on multiple consoles at once may mitigate the risk, but this additional effort increases cost, and frequently delays the actual release of the game.
Application software and user environments on PCs are becoming more computationally intensive, dynamic, and interactive, not only to make them more visually appealing to users, but also to make them more useful and intuitive. For example, both the new Windows Vista operating system and successive versions of the Macintosh operating system incorporate visual animation effects. Advanced graphics tools, such as Maya from Autodesk, Inc., provide very sophisticated 3D rendering and animation capabilities that push the limits of state-of-the-art CPUs and GPUs. However, the computational requirements of these new tools create a number of practical issues for users of such products and for software developers.
Because the visual display of an operating system (OS) must work on a wide range of computers, including prior-generation computers that are no longer sold but can still be upgraded to the new OS, the OS graphics requirements are largely limited by a least common denominator of the computers that the OS is targeted for, which typically includes computers that do not include a GPU. This severely limits the graphics capability of the OS. Furthermore, battery-powered portable computers (e.g., laptops) limit the visual display capability, because high computational activity in a CPU or GPU typically results in higher power consumption and shorter battery life. Portable computers typically include software that automatically lowers processor activity to reduce power consumption when the processor is not utilized. In some computer models the user may lower processor activity manually. For example, Sony's VGN-SZ280P laptop contains a switch labeled "Stamina" on one side (for lower performance, longer battery life) and "Speed" on the other (for higher performance, shorter battery life). An OS running on a portable computer must be able to function usably even if the computer is running at a fraction of its peak performance capability. Thus, OS graphics performance often remains far below the state-of-the-art available computational capability.
High-end, computationally intensive applications, like Maya, are frequently sold with the expectation that they will be used on high-performance PCs. This typically establishes a much higher-performance, and more expensive and less portable, least-common-denominator requirement. As a consequence, such applications have a much more limited target audience than a general-purpose OS (or general-purpose productivity applications, like Microsoft Office) and are typically sold in much lower volume than general-purpose OS software or general-purpose application software. The potential audience is further limited because it is often difficult for a prospective user to try out such computationally intensive applications in advance. For example, suppose a student wants to learn how to use Maya, or a potential purchaser already knowledgeable about such applications wants to try out Maya before making an investment in the purchase (which may well also involve buying a high-end computer capable of running Maya). While the student or the potential purchaser can download a demo version of Maya or get a physical media copy of a demo version, if they lack a computer capable of running Maya to its full potential (e.g., handling a complex 3D scene), they will be unable to make a fully informed assessment of the product. This substantially limits the audience for such high-end applications. It also contributes to a high selling price, since the development cost is usually amortized across a much smaller number of purchases than for a general-purpose application.
High-priced applications also create more incentive for individuals and businesses to use pirated copies of the application software. As a result, high-end application software suffers from rampant piracy, despite significant efforts by publishers of such software to mitigate such piracy through various techniques. Even so, when using pirated high-end applications, users can not eliminate the need to invest in an expensive state-of-the-art PC to run the pirated copy. So, while a user of pirated software may obtain the use of a software application at a fraction of its actual retail price, the user is still required to purchase or obtain an expensive PC in order to fully utilize the application.
The same is true for users of high-performance pirated video games. Although pirates may obtain the games at a fraction of their actual price, they are still required to purchase the expensive computing hardware (e.g., a GPU-enhanced PC, or a high-end video game console like the XBox 360) needed to play the games properly. Given that video games are typically a pastime for consumers, the added cost for a high-end video game system can be prohibitive. This situation is worse in countries (e.g., China) where the average annual income of workers is currently quite low relative to that of workers in the United States. As a result, a much smaller percentage of the population owns a high-end video game system or a high-end PC. In such countries, "Internet cafés", in which users pay a fee to use a computer connected to the Internet, are quite common. Frequently, such Internet cafés have older-model or low-end PCs without high-performance features, such as a GPU, which might otherwise enable players to play computationally intensive video games. This is a key factor in the success of games that run on low-end PCs, such as Vivendi's "World of Warcraft", which is highly successful in China and is commonly played in Internet cafés there. In contrast, a computationally intensive game, like "Second Life", is much less likely to be playable on a PC installed in a Chinese Internet café. Such games are virtually inaccessible to users who only have access to low-performance PCs in Internet cafés.
There are also obstacles for users who are considering the purchase of a video game and would like to try out a demo version of the game first by downloading the demo to their home via the Internet. A video game demo is often a full-fledged version of the game with some features disabled, or with limits placed on the amount of game play. This may involve a long process (perhaps hours) of downloading gigabytes of data before the game can be installed and executed on either a PC or a console. In the case of a PC, it may also involve figuring out what special drivers are needed for the game (e.g., DirectX or OpenGL drivers), downloading the correct versions, installing them, and then determining whether the PC is capable of playing the game. This latter step may involve determining whether the PC has adequate processing (CPU and GPU) capability, sufficient RAM, and a compatible OS (e.g., some games run on Windows XP, but not Vista). Thus, after a long process of attempting to run a video game demo, the user may well discover that the video game demo can not be played, given the user's PC configuration. Worse, once the user has downloaded new drivers in order to try the demo, these driver versions may be incompatible with other games or applications the user normally uses on the PC, and thus the installation of a demo may render previously operable games or applications inoperable. Not only are these barriers frustrating for the user, but they also become barriers for video game software publishers and video game developers in marketing their games.
Another problem that results in economic inefficiency has to do with the fact that a given PC or game console is usually designed to accommodate a certain level of performance requirement for applications and/or games. For example, some PCs have more or less RAM, slower or faster CPUs, and slower or faster GPUs, if they have GPUs at all. Some games or applications take advantage of the full computing power of a given PC or console, while many do not. If a user's choice of game or application never reaches the peak performance capability of the local PC or console, then the user may have wasted money on the PC or console for unutilized features. In the case of a console, the console manufacturer may have paid more than was necessary to subsidize the console cost.
Another problem that exists in the marketing and enjoyment of video games involves allowing a user to watch others playing a game before the user commits to the purchase of that game. Several prior art approaches exist for recording video games for replay at a later time. For example, U.S. Pat. No. 5,558,339 teaches recording game state information, including game controller actions, during "game play" in the video game client computer (owned by the same or a different user). This state information can be used at a later time to replay some or all of the game action on a video game client computer (e.g., a PC or console). A significant drawback of this approach is that, for a user to view the recorded game, the user must possess a video game client computer capable of playing the game and must have the video game application running on that computer, such that the game play is identical when the recorded game state is replayed. Beyond that, the video game application has to be written in such a way that there is no possible execution difference between the recorded game and the played-back game.
For example, game graphics are generally computed on a frame-by-frame basis. For many games, the game logic sometimes may take shorter or longer than one frame time to compute the graphics displayed for the next frame, depending on whether the scene is particularly complex, or whether there are other delays that slow down execution (e.g., on a PC, another process may be running that takes away CPU cycles from the game application). In such a game, a "threshold" frame that is calculated in slightly less than one frame time (say, a few CPU clock cycles less) can eventually occur. When that same scene is recalculated using the exact same game state information, it could easily take a few CPU clock cycles more than one frame time (e.g., if an internal CPU bus is slightly out of phase with an external DRAM bus and it introduces a few CPU cycle times of delay, even if there is no large delay from another process taking away milliseconds of CPU time from the game processing). Therefore, when the game is played back, the frame gets calculated in two frame times rather than in a single frame time. Some behaviors are based on how often the game calculates a new frame (e.g., when the game samples the input from the game controllers). While the game is played, this discrepancy in the time reference for different behaviors does not impact game play, but it can result in the played-back game producing a different result. For example, if the trajectory of a basketball is calculated at a steady 60fps rate, but the game controller input is sampled based on the rate of computed frames, the rate of computed frames may have been 53fps when the game was recorded, but 52fps when the game is replayed, which can make the difference as to whether the basketball is blocked from going into the basket or not, resulting in a different outcome. Thus, using game state to record video games requires very careful game software design to ensure that the replay, using the same game state information, produces the exact same outcome.
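The timing sensitivity described above can be illustrated with a minimal sketch (illustrative only; the numbers and the simple physics are assumptions, not the recorded-game-state method itself): the same controller input, applied on the same frame number, lands at a different wall-clock time when the achieved frame rate differs between recording and playback, so the simulated trajectory diverges.

```python
# Minimal sketch (assumed numbers): why replaying per-frame game state can
# diverge when input sampling is tied to the achieved frame rate.

def simulate(frames: int, fps: float, block_frame: int) -> float:
    """Integrate a 1-D ball trajectory; a 'block' input is applied on a fixed
    frame number, so its wall-clock timing depends on the frame rate."""
    dt = 1.0 / fps                 # simulated seconds per computed frame
    position, velocity = 0.0, 5.0  # arbitrary starting state
    for frame in range(frames):
        if frame == block_frame:   # controller input sampled per frame
            velocity -= 3.0        # the blocking action
        position += velocity * dt
    return position

recorded = simulate(frames=120, fps=53.0, block_frame=60)
replayed = simulate(frames=120, fps=52.0, block_frame=60)
print(f"recorded outcome: {recorded:.3f}")
print(f"replayed outcome: {replayed:.3f}")
# The identical input sequence produces different end states at 53 fps
# versus 52 fps, which is the divergence described above.
```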
Another prior art method for recording a video game is to simply record the video output of the PC or video game system (e.g., to a VCR, a DVD recorder, or to a video capture board on a PC). The video can then be rewound and replayed, or alternatively the recorded video can be uploaded to the Internet, typically after being compressed. A disadvantage of this approach is that when a 3D game sequence is played back, the user is limited to viewing the sequence from only the point of view from which it was recorded. In other words, the user can not change the point of view of the scene.
In addition, when compressed video of a recorded game sequence played on a home PC or game console is made available to other users via the Internet, even if the video is compressed in real time, it may be impossible to upload the compressed video to the Internet in real time. The reason is that many homes in the world that are connected to the Internet have highly asymmetric broadband connections (e.g., DSL and cable modems typically have far higher downstream bandwidth than upstream bandwidth). Compressed high-resolution video sequences often have higher bandwidths than the upstream bandwidth capacity of the network, making a real-time upload impossible. Thus, there would be a significant delay after the game sequence is played (perhaps minutes or even hours) before another user on the Internet would be able to view the game. Although this delay is tolerable in certain situations (e.g., watching a game player's accomplishments that occurred at a prior time), it eliminates the ability to watch a game live (e.g., a basketball tournament played by championship players) or the capability of an "instant replay" as the game is played live.
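A back-of-the-envelope check (with assumed, illustrative numbers only) shows why such real-time uploads fail: the compressed video bitrate simply exceeds a typical asymmetric connection's upstream capacity.

```python
# Illustrative sketch (assumed numbers): compare a compressed game video
# bitrate against a typical asymmetric home connection's upstream capacity.

video_bitrate_mbps = 5.0      # assumed bitrate of compressed high-resolution video
upstream_mbps = 1.0           # assumed DSL/cable upstream bandwidth
clip_minutes = 10.0           # length of the recorded game sequence

realtime_possible = video_bitrate_mbps <= upstream_mbps
upload_minutes = clip_minutes * (video_bitrate_mbps / upstream_mbps)

print(f"Real-time upload possible: {realtime_possible}")
print(f"Upload time for a {clip_minutes:.0f}-minute clip: {upload_minutes:.0f} minutes")
# With these assumptions, a 10-minute sequence takes about 50 minutes to
# upload, so other viewers can see it only after a substantial delay.
```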
Another prior art approach allows a viewer with a television set to watch video games live, but only under the control of the television production crew. Some television channels, in the United States and in other countries, provide video game viewing channels, where the television audience is able to watch certain video game users (e.g., top-rated players playing in tournaments) on the video game channel. This is accomplished by feeding the video output of the video game systems (PCs and/or consoles) into the video distribution and processing equipment of the television channel. This is much as is done when the television channel broadcasts a live basketball game, in which several cameras provide live feeds from different angles around the basketball court. The television channel is then able to make use of its video/audio processing and effects equipment to manipulate the output from the various video game systems. For example, the television channel can overlay text on top of the video from a video game, indicating the status of the different players (just as it might overlay text during a live basketball game), and the television channel can overdub audio from commentators who can discuss the action occurring during the games. Additionally, the video game output can be combined with cameras recording video of the actual players of the games (e.g., showing their emotional responses to the game).
One problem with this approach is that such live video feeds must be available to the television channel's video distribution and processing equipment in real time in order for the broadcast to have the excitement of a live event. As previously discussed, however, this is frequently not possible when the video game system is running from a home, especially if part of the broadcast includes live video from a camera capturing real-world video of the game player. Further, in a tournament situation there is a concern that an in-home player may modify the game and cheat, as previously described. For these reasons, such video game broadcasts on television channels are frequently arranged with the players and the video game systems aggregated at a common location (e.g., at a television studio or in an arena), where the television production equipment can accept the video feeds from multiple video game systems and potentially from live cameras.
While such prior art video game television channels can provide a very exciting presentation to the television audience, an experience akin to a live sporting event (e.g., with the video game players presented as "athletes", both in terms of their actions in the video game world and in terms of their actions in the real world), these video game systems are often limited to situations where the players are in close physical proximity to one another. And, since television channels are broadcast, each broadcast channel can show only one video stream, selected by the producers of the television channel. Because of these limitations, and the high cost of air time, production equipment and production crews, such television channels typically show only the top-rated players playing in top tournaments.
Additionally, a given television channel broadcasting a full-screen image of a video game to the entire television audience shows only one video game at a time. This severely limits a television viewer's choices. For example, a television viewer may not be interested in the game shown at a given time. Another viewer may only be interested in watching the game play of a particular player who is not featured by the television channel at a given time. In other cases, a viewer may only be interested in watching how an expert player handles a particular level in a game. Still other viewers may wish to control the viewpoint from which the video game is seen, which differs from that chosen by the production team, etc. In short, a television viewer may have a myriad of preferences in watching video games that are not accommodated by a particular broadcast of a television network, even if several different television channels are available. For all of the aforementioned reasons, prior art video game television channels have significant limitations in presenting video games to television viewers.
Another disadvantage of prior art video game systems and application software systems is that they are complex and commonly suffer from errors, crashes and/or unintended and undesired behaviors (collectively, "bugs"). Although games and applications typically go through a debugging and tuning process (frequently called "software quality assurance" or SQA) before release, it is almost invariably the case that once the game or application is released to a wide audience in the field, bugs crop up. Unfortunately, it is difficult for software developers to identify and track down many of the bugs after release. It can be difficult for software developers even to become aware of a bug. And even when they learn of a bug, there may be only a limited amount of information available to them to identify what caused it. For example, a user may call a game developer's customer service line and leave a message stating that when playing the game, the screen started to flash, then turned solid blue, and the PC froze. That provides the SQA team with very little information useful in tracking down the bug. Some games or applications that are connected online can sometimes provide more information in certain cases. For example, a "watchdog" process can sometimes be used to monitor the game or application for "crashes". The watchdog process can gather statistics about the status of the game or application process (e.g., the status of the memory stack usage, how far the game or application had progressed, etc.) when it crashes and then upload that information to the SQA team via the Internet. But in a complex game or application, such information can take a very long time to decipher in order to accurately determine what the user was doing at the time of the crash. Even then, it may be impossible to determine what sequence of events led to the crash.
Yet another problem associated with PCs and game consoles is that they are subject to service issues which greatly inconvenience the consumer. Service issues also impact the manufacturer of the PC or game console, since it is typically required to send a special box to safely ship the broken PC or console, and then to incur the cost of repair if the PC or console is under warranty. The game or application software publisher can also be impacted by the loss of sales (or online service use) while PCs and/or consoles are out of service for repair.
FIG. 1 illustrates a prior art video gaming system such as a Sony Playstation 3, Microsoft Xbox 360, Nintendo Wii, Windows-based personal computer or Apple Macintosh. Each of these systems includes a central processing unit (CPU) for executing program code, typically a graphics processing unit (GPU) for performing advanced graphical operations, and multiple forms of input/output (I/O) for communicating with external devices and users. For simplicity, these components are shown combined together as a single unit 100. The prior art video gaming system of FIG. 1 is also shown to include an optical media drive 104 (e.g., a DVD-ROM drive); a hard drive 103 for storing video game program code and data; a network connection 105 for playing multiplayer games and for downloading games, patches, demos or other media; a random access memory (RAM) 101 for storing program code currently being executed by the CPU/GPU 100; a game controller 106 for receiving input commands from the user during game play; and a display device 102 (e.g., an SDTV/HDTV or a computer monitor).
The prior art system shown in FIG. 1 suffers from several limitations. First, the optical drive 104 and the hard drive 103 tend to have much slower access speeds as compared to that of the RAM 101. When working directly through the RAM 101, the CPU/GPU 100 can, in practice, process far more polygons per second than is possible when the program code and data are read directly from the hard drive 103 or the optical drive 104, due to the fact that the RAM 101 generally has much higher bandwidth and does not suffer from the relatively long seek delays of disc mechanisms. But only a limited amount of RAM is provided in these prior art systems (e.g., 256-512 megabytes). Therefore, a "Loading..." sequence is frequently required, in which the RAM 101 is periodically filled up with the data for the next scene of the video game.
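The bandwidth gap noted above can be made concrete with rough, assumed figures (this sketch is illustrative and not data from the original text): even ignoring seek latency, a disk sustains orders of magnitude less data per second than RAM, which caps the polygon and texture data that can be fed to the GPU each frame.

```python
# Illustrative sketch (assumed figures): data that can be fed to the GPU per
# 60 fps frame from RAM versus directly from a hard disk or optical drive.

ram_bandwidth_mb_s = 10_000.0   # assumed RAM bandwidth (MB/s)
hdd_bandwidth_mb_s = 60.0       # assumed sustained hard-disk throughput (MB/s)
optical_bandwidth_mb_s = 15.0   # assumed optical-drive throughput (MB/s)
frame_time_s = 1.0 / 60.0       # one 60 fps frame

for name, bw in [("RAM", ram_bandwidth_mb_s),
                 ("hard disk", hdd_bandwidth_mb_s),
                 ("optical drive", optical_bandwidth_mb_s)]:
    per_frame_mb = bw * frame_time_s
    print(f"{name:>13}: ~{per_frame_mb:.1f} MB of polygon/texture data per frame")
# Under these assumptions RAM delivers well over a hundred MB per frame; the
# disks deliver roughly 1 MB or less, before even counting seek delays,
# hence the periodic "Loading..." pauses.
```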
Some systems attempt to overlap the loading of the program code concurrently with the game play, but this can only be done when there is a known sequence of events (e.g., if a car is being driven down a road, the geometry for the approaching buildings on the roadside can be loaded while the car is driven). For complex and/or rapid scene changes, this type of overlapping usually does not work. For example, in the case where the user is in the midst of a battle and the RAM 101 is completely filled with data representing the objects in view at that moment, if the user moves the view rapidly to the left to view objects that are not presently loaded in the RAM 101, a discontinuity in the action will result, since there is not enough time to load the new objects from the hard drive 103 or the optical media 104 into the RAM 101.
Another problem with the system of FIG. 1 arises because of the limits of the storage capacity of the hard drive 103 and the optical media 104. Although disk storage devices can be manufactured with relatively large storage capacity (e.g., 500 gigabytes or more), they still do not provide enough storage capacity for certain scenarios encountered in current video games. For example, as previously mentioned, a soccer video game might allow the user to choose among many teams, players and stadiums throughout the world. For each team, each player, and each stadium, a large number of texture maps and environment maps are needed to characterize the 3D surfaces in the world (e.g., each team has a unique jersey, with each requiring a unique texture map).
One technique used to address this latter problem is for the game to pre-compute the texture and environment maps once they are selected by the user. This may involve a number of computationally intensive processes, including decompressing images, 3D mapping, shading, organizing data structures, etc. As a result, there may be a delay for the user while the video game performs these calculations. One way to reduce this delay, in principle, would be to perform all of these computations (including every permutation of team, player roster, and stadium) when the game was originally developed. The released version of the game would then include all of this pre-processed data, stored on the optical media 104 or on one or more servers on the Internet, with just the selected pre-processed data for a given team, player roster and stadium selection downloaded through the Internet to the hard drive 103 when the user makes a selection. As a practical matter, however, such pre-loaded data of every permutation possible in game play could easily be terabytes of data, which is far in excess of the capacity of today's optical media devices. Furthermore, the data for a given team, player roster and stadium selection could easily be hundreds of megabytes of data or more. With a home network connection of, say, 10Mbps, downloading this data through the network connection 105 would take longer than computing the data locally.
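The trade-off in the last sentence can be checked with simple arithmetic (a sketch with assumed sizes, not figures from the original text): over a 10 Mbps home connection, even a few hundred megabytes of pre-processed data takes minutes to download, which can easily exceed the time needed to compute the same data locally.

```python
# Illustrative sketch (assumed sizes): download time of pre-processed scene
# data over a 10 Mbps home connection versus computing it locally.

precomputed_mb = 300.0          # assumed pre-processed data for one selection
connection_mbps = 10.0          # home network connection of, say, 10 Mbps
local_compute_seconds = 60.0    # assumed time to compute the data locally

download_seconds = (precomputed_mb * 8.0) / connection_mbps
print(f"Download time: {download_seconds / 60.0:.1f} minutes")
print(f"Local compute time (assumed): {local_compute_seconds / 60.0:.1f} minutes")
# 300 MB at 10 Mbps takes about 4 minutes, longer than the assumed one
# minute of local computation, so pre-downloading does not help here.
```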
Thus, the prior art game architecture shown in FIG. 1 subjects the user to significant delays between major scene transitions of complex games.
Another problem with prior art approaches, such as the one shown in FIG. 1, is that over the years video games have tended to become more advanced and to require more CPU/GPU processing power. Thus, even assuming an unlimited amount of RAM, video game hardware requirements go beyond the peak level of processing power available in these systems. As a result, users are required to upgrade their gaming hardware every few years to keep pace (or play newer games at lower quality levels). One consequence of the trend toward ever more advanced video games is that video game playing machines for home use are typically economically inefficient, because their cost is usually determined by the requirements of the highest-performance game they can support. For example, an XBox 360 might be used to play a game like "Gears of War", which demands a high-performance CPU, a GPU, and hundreds of megabytes of RAM, or the same XBox 360 might be used to play "Pac Man", a game from the 1970s that requires only kilobytes of RAM and a very low-performance CPU. Indeed, an XBox 360 has enough computing power to host many simultaneous "Pac Man" games at once.
Video game machines are typically turned off for most of the hours of a week. According to a July 2006 Nielsen Entertainment study of active gamers 13 years and older, on average, active gamers spend fourteen hours per week playing console video games, which the study put at just 12% of the total hours in the week. This means that the average video game console is idle 88% of the time, which is an inefficient use of an expensive resource. This is particularly significant given that video game consoles are often subsidized by the manufacturer to bring down the purchase price (with the expectation that the subsidy will be earned back by royalties from future video game software purchases).
Video game consoles also incur costs associated with almost any consumer electronic device. For example, the electronics and mechanisms of the system need to be housed in an enclosure. The manufacturer needs to offer a service warranty. The retailer who sells the system needs to collect a margin on the sale of the system and/or on the sale of video game software. All of these factors add to the cost of the video game console, which must either be subsidized by the manufacturer, passed on to the consumer, or both.
In addition, piracy is a major problem for the video game industry. The security mechanisms utilized on virtually every major video game system have been "cracked" over the years, resulting in unauthorized copying of video games. For example, the Xbox 360 security system was cracked in July 2006 and users are now able to download illegal copies online. Games that are downloadable (e.g., games for the PC or the Mac) are particularly vulnerable to piracy. In certain regions of the world where piracy is weakly policed, there is essentially no viable market for standalone video game software, because pirated copies are as readily available as legitimate copies, at a tiny fraction of the cost. Also, in many parts of the world the cost of a game console is such a high percentage of income that, even if piracy were controlled, few people could afford a state-of-the-art game system.
In addition, the used game market reduces revenue for the video game industry. When a user has become bored with a game, they can sell the game to a store which resells it to other users. This unauthorized but common practice significantly reduces the revenue of game publishers. Similarly, a sales reduction on the order of 50% commonly occurs when there is a platform transition every few years. This is because users stop buying games for the older platform when they know that a newer version of the platform is about to be released (e.g., when the Playstation 3 was about to be released, users stopped buying Playstation 2 games). Combined, the loss of sales and the increased development costs associated with the new platforms can have a very significant adverse impact on the profitability of game developers.
New game consoles are also very expensive. The Xbox 360, the Nintendo Wii, and the Sony Playstation 3 all retail for hundreds of dollars. High-powered personal computer gaming systems can cost up to $8000. This represents a significant investment for users, particularly considering that the hardware becomes obsolete after a few years and the fact that many systems are purchased for children.
One approach to the above problems is online gaming, in which the gaming program code and data are hosted on a server and delivered to client machines on demand, with compressed video and audio streamed over a digital broadband network. Some companies, such as G-Cluster in Finland (now a subsidiary of Japan's SOFTBANK Broadmedia), currently provide these services online. Similar gaming services have become available in local networks, such as those within hotels and those provided by DSL and cable television providers. A major drawback of these systems is the problem of latency, i.e., the time it takes for a signal to travel to and from the game server, which is typically located in an operator's "head end". Fast-action video games (also known as "twitch" video games) require very low latency between the time the user performs an action with the game controller and the time the display screen is updated to show the result of the user action. Low latency is needed so that the user has the perception that the game is responding "instantly". Users can be satisfied with different latency intervals depending on the type of game and the skill level of the user. For example, 100 milliseconds of latency may be tolerable for a slow casual game (like checkers) or a slow-action role-playing game, but in a fast-action game latency in excess of 70 or 80 milliseconds may cause the user to perform more poorly in the game and is thus unacceptable. For instance, in a game that requires fast reaction time, there is a sharp decline in accuracy as latency increases from 50 to 100 milliseconds.
When a game or application server is located in a nearby, controlled network environment, or in a network environment where the network path to the user is predictable and/or can tolerate bandwidth peaks, it is far easier to control latency, both in terms of maximum latency and in terms of the consistency of the latency (e.g., so the user observes steady motion from digital video streamed through the network). Such a level of control can be achieved between a cable TV network head end and a cable TV subscriber's home, or from a DSL central office to a DSL subscriber's home, or in a commercial office Local Area Network (LAN) environment between a server and a user. Also, it is possible to obtain specially graded point-to-point private connections between businesses which have guaranteed bandwidth and latency. But in a game or application system that hosts games in a server center connected to the general internet and then streams compressed video to users through broadband connections, latency is incurred from many factors, resulting in severe limitations in the deployment of prior art systems.
In a typical broadband-connected home, a user may have DSL or cable modem service for broadband. These broadband services commonly incur as much as a 25 millisecond round-trip latency (and at times more) between the user's home and the general internet. In addition, there is round-trip latency incurred from routing data through the internet to a server center. The latency through the internet varies based on the route the data is given and the delays it incurs as it is routed. In addition to routing delays, round-trip latency is also incurred due to the speed of light traveling through the optical fiber that interconnects most of the internet. For example, roughly 22 milliseconds of round-trip latency is incurred for every 1000 miles, due to the speed of light through optical fiber and other overhead.
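As a rough check of the 22-millisecond figure quoted above, the sketch below models light traveling through fiber at roughly two thirds of its vacuum speed, plus an assumed fixed overhead for routing and switching; the 6-millisecond overhead is an assumption, not a figure from the text.

```python
C_VACUUM_KM_S = 299_792      # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3         # typical slowdown of light in optical fiber
MILES_TO_KM = 1.609

def fiber_round_trip_ms(one_way_miles: float, overhead_ms: float = 6.0) -> float:
    """Round-trip propagation delay through fiber plus an assumed routing overhead."""
    round_trip_km = one_way_miles * 2 * MILES_TO_KM
    propagation_ms = round_trip_km / (C_VACUUM_KM_S * FIBER_FACTOR) * 1000
    return propagation_ms + overhead_ms

print(f"~{fiber_round_trip_ms(1000):.0f} ms round trip for 1000 miles")  # approximately 22 ms
```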
Additional latency can be incurred by the data rate of the data flowing through the internet. For example, if a user has DSL service that is sold as "6 Mbps DSL service", in practice the user will probably get less than 5 Mbps of downstream throughput at best, and will likely see the connection degrade periodically due to various factors, such as congestion at the Digital Subscriber Line Access Multiplexer (DSLAM) during peak load times. A similar issue can arise if there is congestion in the local shared coaxial cable looped through the neighborhood, or elsewhere in the cable modem system network, reducing the data rate of a cable modem connection sold as "6 Mbps cable modem service" to far less than that data rate. If data packets are streamed at a steady rate of 4 Mbps in one direction, in User Datagram Protocol (UDP) format, from the server center through such a connection, then, if everything is working well, the packets will pass through without incurring additional latency; but if there is congestion (or other impediments) and only 3.5 Mbps is available to stream data to the user, then in a typical situation either packets will be dropped, resulting in lost data, or packets will queue up at the point of congestion until they can be sent, thereby introducing additional latency. Different points of congestion have different queuing capacities to hold delayed packets, so in some cases packets that cannot get through the congestion are dropped immediately. In other cases, megabits of data are queued up and eventually sent. But in almost all cases, queues at points of congestion have capacity limits, and once those limits are exceeded, the queues will overflow and packets will be dropped. Thus, to avoid incurring additional latency (or, worse, packet loss), it is necessary to avoid exceeding the data rate capacity from the game or application server to the user.
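The toy simulation below, which is illustrative only and not from the text, shows the behavior just described: a 4 Mbps stream offered to a bottleneck that can only carry 3.5 Mbps first accumulates queuing delay and then, once the assumed queue capacity is exceeded, begins dropping packets.

```python
OFFERED_MBPS, AVAILABLE_MBPS = 4.0, 3.5   # offered stream rate vs. congested capacity
QUEUE_LIMIT_BITS = 2_000_000              # assumed buffer size at the congestion point
queued_bits, dropped_bits = 0.0, 0.0

for second in range(1, 11):
    queued_bits += (OFFERED_MBPS - AVAILABLE_MBPS) * 1_000_000   # backlog grows each second
    if queued_bits > QUEUE_LIMIT_BITS:
        dropped_bits += queued_bits - QUEUE_LIMIT_BITS           # queue overflow: packets lost
        queued_bits = QUEUE_LIMIT_BITS
    delay_ms = queued_bits / (AVAILABLE_MBPS * 1_000_000) * 1000 # added latency from queuing
    print(f"t={second:2d}s  queuing delay ~{delay_ms:4.0f} ms  dropped ~{dropped_bits/1e6:.1f} Mb")
```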
Latency is also incurred by the time required to compress the video in the server and decompress the video in the client device. Latency is further incurred while a video game running on the server is computing the next frame to be displayed. Currently available video compression algorithms suffer from either high data rates or high latency. For example, motion JPEG is an intra-frame-only lossy compression algorithm that is characterized by low latency. Each frame of video is compressed independently of every other frame of video. When a client device receives a frame of compressed motion JPEG video, it can immediately decompress the frame and display it, resulting in very low latency. But because each frame is compressed separately, the algorithm is unable to exploit similarities between successive frames, and as a result intra-frame-only video compression algorithms suffer from very high data rates. For example, 60 fps (frames per second) 640 x 480 motion JPEG video may require 40 Mbps (megabits per second) or more of data. Such high data rates for such low-resolution video windows would be prohibitively expensive in many broadband applications (and certainly for most consumer internet-based applications). Furthermore, because each frame is compressed independently, artifacts in the frames that may result from the lossy compression are likely to appear in different places in successive frames. This can result in what appear to the viewer as moving visual artifacts when the video is decompressed.
Other compression algorithms, such as MPEG2, H.264, or VC9 from Microsoft Corporation, when used in prior art configurations, can achieve high compression ratios, but at the cost of high latency. These algorithms utilize inter-frame as well as intra-frame compression. Periodically, such an algorithm performs an intra-frame-only compression of a frame. Such a frame is known as a key frame (typically referred to as an "I" frame). Then, these algorithms typically compare the I frame with both prior frames and successive frames. Rather than compressing the prior frames and the successive frames independently, the algorithm determines what has changed in the image from the I frame to the prior and successive frames, and then stores those changes as "B" frames, for the changes preceding the I frame, and "P" frames, for the changes following the I frame. This results in much lower data rates than intra-frame-only compression. But it typically comes at the cost of higher latency. An I frame is typically much larger than a B or P frame (often 10 times larger), and as a result it takes proportionately longer to transmit at a given data rate.
Consider, for example, a case in which an I frame is 10 times the size of a B frame and of a P frame, and there are 29 B frames + 30 P frames = 59 inter-frames for every single I frame, i.e., 60 frames in total for each "Group of Pictures" (GOP). So, at 60 fps, there is one 60-frame GOP each second. Suppose the transmission channel has a maximum data rate of 2 Mbps. To achieve the highest-quality video in the channel, the compression algorithm will produce a 2 Mbps data stream, and given the above ratios, this yields 2 Mb / (59 + 10) ≈ 30,394 bits per inter-frame and 10 times that, about 303,935 bits, per I frame. When the compressed video stream is received by the decompression algorithm, each frame needs to be decompressed and displayed at a regular interval (e.g., 60 fps) in order for the video to play steadily. To achieve this result, if any frame is subject to transmission latency, all of the frames need to be delayed by at least that latency, so the worst-case frame latency defines the latency for every video frame. The I frames, being the largest, introduce the longest transmission latency, and the entire I frame would have to be received before the I frame (or any inter-frame dependent on the I frame) could be decompressed and displayed. Given a channel data rate of 2 Mbps, transmitting an I frame would take 303,935 bits / 2 Mbps ≈ 145 milliseconds.
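The arithmetic above can be reproduced directly; the sketch below uses the stated assumptions (a 60-frame GOP per second, an I frame ten times the size of an inter-frame, and "2 Mbps" treated as 2 x 1024 x 1024 bits per second, which is the convention that yields the figures quoted in the text).

```python
CHANNEL_BPS = 2 * 1024 * 1024        # 2 Mbps channel, in bits per second
FRAMES_PER_GOP, I_WEIGHT = 60, 10    # one I frame plus 59 inter-frames per second
inter_frames = FRAMES_PER_GOP - 1

weight_units = inter_frames + I_WEIGHT            # 69 "frame units" share the GOP budget
inter_bits = CHANNEL_BPS / weight_units           # ~30,394 bits per B or P frame
i_bits = inter_bits * I_WEIGHT                    # ~303,935 bits per I frame
i_transmit_ms = i_bits / CHANNEL_BPS * 1000       # ~145 ms just to transmit the I frame

print(f"inter-frame ~{inter_bits:,.0f} bits, I frame ~{i_bits:,.0f} bits, "
      f"I-frame transmit time ~{i_transmit_ms:.0f} ms")
```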
Inter-frame video compression systems that use a large percentage of the transmission channel's bandwidth (as described above) will be subject to long latencies due to the large size of I frames relative to the average size of a frame. Or, to put it another way, while prior art inter-frame compression algorithms achieve a lower average per-frame data rate than intra-frame-only compression algorithms (e.g., 2 Mbps versus 40 Mbps), they still suffer from a high peak per-frame data rate (e.g., 303,935 x 60 = 18.2 Mbps) because of the large I frames. Bear in mind, though, that the above analysis assumes that the P frames and B frames are all much smaller than the I frames. While this is generally true, it does not hold for frames with high image complexity uncorrelated with the previous frame, high motion, or a scene change. In such situations, the P frames or B frames can become as large as or larger than I frames (and if a P or B frame becomes larger than an I frame, a sophisticated compression algorithm will typically "force" an I frame and replace the P or B frame with an I frame). So, data rate peaks of I-frame size can occur at any moment in a digital video stream. Thus, with compressed video, when the average video data rate approaches the data rate capacity of the transmission channel (as is frequently the case, given the high data rate demands of video), the high peak data rates from I frames or from large P or B frames result in high frame latency.
Of course, the above discussion only characterizes the compression algorithm latency created by a large B, P, or I frame within a GOP. If B frames are used, the latency is higher still. The reason is that before a B frame can be displayed, all of the B frames after it, as well as the I frame, must be received. So, in a Group of Pictures (GOP) sequence such as BBBBBIPPPPPBBBBBIPPPPP, where there are 5 B frames before each I frame, the first B frame can only be displayed by the video decompressor after the subsequent B frames and the I frame have been received. So, if the video is being streamed at 60 fps (i.e., 16.67 ms/frame), then 16.67 x 6 ≈ 100 ms will elapse before the first B frame can be decompressed, regardless of how much channel bandwidth is available, and that is with only 5 B frames. Compressed video sequences with 30 B frames are quite common. Further, at low channel bandwidths such as 2 Mbps, the latency impact caused by the size of the I frame adds substantially to the latency impact from waiting for B frames to arrive. So, on a 2 Mbps channel, with a large number of B frames, it is quite easy to exceed 500 milliseconds of latency using prior art video compression techniques. If B frames are not used (at the cost of a lower compression ratio for a given quality level), the B-frame latency is not incurred, but the latency caused by the peak frame sizes described above is still incurred.
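The B-frame waiting delay described above depends only on the frame rate and the number of B frames preceding the I frame they reference, as the short sketch below illustrates.

```python
FRAME_TIME_MS = 1000 / 60          # 16.67 ms per frame at 60 fps

def b_frame_wait_ms(b_frames_before_i: int) -> float:
    """Time for the B frames plus the I frame to arrive before the first B frame can be decoded."""
    return FRAME_TIME_MS * (b_frames_before_i + 1)

print(f"5 B frames:  ~{b_frame_wait_ms(5):.0f} ms")    # ~100 ms, as in the text
print(f"30 B frames: ~{b_frame_wait_ms(30):.0f} ms")   # ~517 ms, regardless of bandwidth
```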
The problem is exacerbated by the very nature of many video games. Video compression algorithms that utilize the GOP structure described above have been largely optimized for live video or movie material intended for passive viewing. Typically, the camera (whether a real camera, or a virtual camera in the case of a computer-generated animation) and the scene are relatively steady, simply because if the camera or scene moves around too jerkily, the video or movie material is (a) typically unpleasant to watch, and (b) if it is being watched, the viewer is usually not closely following the action when the camera suddenly jerks around (e.g., if the camera is bumped while shooting a child blowing out the candles on a birthday cake and suddenly jerks away from the cake and back again, the viewer is typically focused on the child and the cake, and disregards the brief interruption while the camera suddenly moves). In the case of a videoconference or a video teleconference, the camera may be held in a fixed position and not move at all, resulting in very few data peaks at all. But 3D high-action video games are characterized by constant motion (e.g., consider a 3D racing game, where the entire frame is in rapid motion for the duration of the race, or consider first-person shooters, where the virtual camera is constantly moving around jerkily). Such video games can produce frame sequences with large and frequent peaks, during which the user may need to see clearly what is happening during that sudden motion. As such, compression artifacts are far less tolerable in 3D high-action video games. Thus, by their very nature, the video output of many video games produces a compressed video stream with very high and frequent peaks.
Given the little tolerance that users of fast-action video games have for high latency, and given all of the above causes of latency, to date there have been limits to server-hosted video games that stream video over the internet. Further, if applications that require a high degree of interactivity are hosted on the general internet and stream video, their users suffer from similar limitations. Such services require a network configuration in which the hosting servers are set up directly in a head end (in the case of cable broadband) or a central office (in the case of Digital Subscriber Lines (DSL)), or within a LAN (or on a specially graded private connection) in a commercial setting, so that the route and the distance from the client devices to the servers are controlled to minimize latency, and peaks can be accommodated without incurring latency. LANs (typically rated at 100 Mbps-1 Gbps) and leased lines with adequate bandwidth can typically support peak bandwidth requirements (e.g., an 18 Mbps peak bandwidth is a small fraction of the capacity of a 100 Mbps LAN).
Peak bandwidth requirements can also be accommodated by residential broadband infrastructure if special accommodations are made. For example, on a cable TV system, the digital video traffic can be given dedicated bandwidth which is able to handle peaks such as large I frames. And, on a DSL system, a higher-speed DSL modem can be provisioned, allowing for high peaks, or a specially graded connection can be provisioned which is able to handle higher data rates. But conventional cable modem and DSL infrastructure attached to the general internet has far less tolerance for the peak bandwidth requirements of compressed video. Thus, online services that host video games or applications in server centers a long distance from the client devices, and then stream the compressed video output over the internet through conventional residential broadband connections, suffer from significant latency as well as peak bandwidth limitations, particularly with respect to games and applications that require very low latency (e.g., first-person shooters and other multi-user, interactive action games, or applications that require a fast response time).
Detailed Description
In the following description, specific details are set forth (such as device types, system configurations, communication methods, etc.) in order to provide a thorough understanding of the present disclosure. However, it will be understood by those of ordinary skill in the art that these specific details may not be required to practice the described embodiments.
FIGS. 2a-2b provide a high-level architecture of two embodiments in which video games and software applications are hosted by a hosting service 210 and accessed by client devices 205 at user premises 211 (note that the "user premises" means the place wherever the user is located, including outdoors when using a mobile device) through the internet 206 (or other public or private network) under a subscription service. The client devices 205 may be general-purpose computers, such as Microsoft Windows- or Linux-based PCs or Apple, Inc. Macintosh computers, with a wired or wireless connection to the internet and with internal or external display devices 222; or they may be dedicated client devices, such as a set-top box that outputs video and audio to a monitor or TV set 222 (with a wired or wireless connection to the internet); or they may be mobile devices, presumably with a wireless connection to the internet.
Any of these devices may have their own user input devices (e.g., keyboards, buttons, touch screens, track pads, or inertial-sensing wands, video capture cameras and/or motion-tracking cameras, etc.), or they may use external input devices 221 (e.g., keyboards, mice, game controllers, inertial-sensing wands, video capture cameras and/or motion-tracking cameras, etc.), connected with wires or wirelessly. As described in more detail below, the hosting service 210 includes servers of various levels of performance, including those with high-powered CPU/GPU processing capabilities. During playing of a game or use of an application on the hosting service 210, a home or office client device 205 receives keyboard and/or controller input from the user, and then it transmits the controller input through the internet 206 to the hosting service 210, which in response executes the gaming program code and generates successive frames of video output (a sequence of video images) for the game or application software (e.g., if the user presses a button which would direct a character on the screen to move to the right, the game program would then create a sequence of video images showing the character moving to the right). This sequence of video images is then compressed using a low-latency video compressor, and the hosting service 210 then transmits the low-latency video stream through the internet 206. The home or office client device then decodes the compressed video stream and renders the decompressed video images on a monitor or TV. Consequently, the computing and graphical hardware requirements of the client device 205 are significantly reduced. The client 205 only needs to have the processing power to forward the keyboard/controller input to the internet 206 and to decode and decompress a compressed video stream received from the internet 206, which virtually any personal computer is capable of doing today in software on its CPU (e.g., an Intel Corporation dual-core CPU running at approximately 2 GHz is capable of decompressing 720p HDTV encoded using compressors such as H.264 and Windows Media VC9). And, in the case of any client device, dedicated chips can also perform video decompression for such standards in real time at far lower cost and with far less power consumption than a general-purpose CPU, such as would be required by a modern PC. Notably, to perform the functions of forwarding controller input and decompressing video, the home client device 205 does not require any specialized Graphics Processing Unit (GPU), optical drive, or hard drive, such as the prior art video game system shown in FIG. 1 requires.
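A highly simplified sketch of this thin-client loop follows; the host address and the three callback functions are hypothetical placeholders, not names from the disclosure, and a real client would add hardware decoding, jitter handling, and error correction.

```python
import socket

HOSTING_SERVICE = ("hosting.example.net", 9000)   # placeholder address and port

def thin_client_loop(poll_input, decode_av_packet, present_frame):
    """poll_input/decode_av_packet/present_frame are platform-supplied callbacks (assumed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(HOSTING_SERVICE)
    sock.setblocking(False)
    while True:
        event = poll_input()                      # keyboard/controller event, if any
        if event is not None:
            sock.send(event)                      # tiny upstream control packet (bytes)
        try:
            packet = sock.recv(65536)             # compressed video/audio packet from the service
        except BlockingIOError:
            continue                              # nothing received yet; keep polling input
        frame = decode_av_packet(packet)          # low-latency decompression
        if frame is not None:
            present_frame(frame)                  # render the completed frame on the display
```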
As games and application software become more complex and more photo-realistic, they will require higher-performance CPUs and GPUs, more RAM, and larger and faster disk drives, and the computing power at the hosting service 210 may be continually upgraded, but the end user will not be required to update the home or office client platform 205, since its processing requirements will remain constant for a given display resolution and frame rate with a given video decompression algorithm. Thus, the hardware limitations and compatibility issues seen today do not exist in the system illustrated in FIGS. 2a-2b.
Additionally, because the game and application software executes only in servers in the hosting service 210, there is never a copy of the game or application software (either in the form of optical media, or as downloaded software) in the user's home or office ("office", as used herein, shall include any non-residential setting, including, for example, classrooms, unless otherwise qualified). This significantly mitigates the likelihood of a game or application software being illegally copied (pirated), as well as mitigating the likelihood of a valuable database usable by a game or application software being pirated. Indeed, if specialized servers are required (e.g., requiring very expensive, large, or noisy equipment) to play the game or application software that are not practical for home or office use, then even if a pirated copy of the game or application software were obtained, it would not be operable in the home or office.
In one embodiment, the hosting service 210 provides software development tools to the game or application software developers 220 (which refers generally to software development companies, game or movie studios, or game or application software publishers) who design video games, so that they may design games capable of being executed on the hosting service 210. Such tools allow developers to exploit features of the hosting service that would not normally be available in a standalone PC or game console (e.g., fast access to very large databases of complex geometry ("geometry", unless otherwise qualified, will be used herein to refer to polygons, textures, rigging, lighting, behaviors, and other components and parameters that define 3D datasets)).
Different business models are possible under this architecture. Under one model, illustrated in FIG. 2a, the hosting service 210 collects a subscription fee from the end user and pays a royalty to the developers 220. In an alternate implementation, shown in FIG. 2b, the developers 220 collect a subscription fee directly from the user and pay the hosting service 210 for hosting the game or application content. These underlying principles are not limited to any particular business model for providing online gaming or application hosting.
Compressed video characteristics
As previously discussed, one of the significant problems with providing video game services or application software services online is that of latency. A latency of 70-80 milliseconds (from the point a user actuates an input device to the point a response is displayed on the display device) is at the upper limit for games and applications requiring a fast response time. However, this is very difficult to achieve with the architectures shown in FIGS. 2a and 2b due to a number of practical and physical constraints.
As indicated in FIG. 3, when a user subscribes to an internet service, the connection is typically rated at a nominal maximum data rate 301 to the user's home or office. Depending on the provider's policies and routing equipment capabilities, that maximum data rate may be more or less strictly enforced, but the actual available data rate is typically lower for one of many different reasons. For example, there may be too much network traffic at the DSL central office or on the local cable modem loop, or there may be noise on the cabling causing dropped packets, or the provider may establish a maximum number of bits per user per month. Currently, the maximum downstream data rate for cable and DSL services typically ranges from several hundred kilobits per second (Kbps) to 30 Mbps. Cellular services are typically limited to hundreds of Kbps of downstream data. However, the speed of broadband services and the number of users subscribing to broadband services will increase dramatically over time. Currently, some analysts estimate that 33% of US broadband subscribers have a downstream data rate of 2 Mbps or more. For example, some analysts predict that by 2010 over 85% of US broadband subscribers will have a data rate of 2 Mbps or more.
As indicated in FIG. 3, the actual available maximum data rate 302 may fluctuate over time. Thus, in low-latency online gaming or application software contexts, it is sometimes difficult to predict the actual available data rate for a particular video stream. If the data rate 303 required to sustain a given level of quality at a given number of frames per second (fps) at a given resolution (e.g., 640 x 480 @ 60 fps) for a given amount of scene complexity and motion rises above the actual available maximum data rate 302 (as indicated by the peak in FIG. 3), several problems can occur. For example, some internet services will simply drop packets, resulting in lost data and distorted/lost images on the user's video screen. Other services will temporarily buffer (i.e., queue up) the additional packets and provide the packets to the client at the available data rate, resulting in an increase in latency, an unacceptable result for many video games and applications. Finally, some internet service providers will see the increase in data rate as a malicious attack, such as a denial-of-service attack (a well-known technique used by hackers to disable network connections), and will cut off the user's internet connection for a specified period of time. Thus, the embodiments described herein seek to ensure that the required data rate for a video game does not exceed the maximum available data rate.
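One way to express this constraint, as an illustrative sketch only, is to pick the best stream settings whose required data rate fits under the currently measured available rate, degrading resolution or frame rate rather than exceeding the connection's capacity; the candidate settings and rates below are assumptions.

```python
def choose_stream_settings(available_mbps: float, candidates):
    """candidates: (label, required_mbps) tuples ordered from best quality to worst."""
    for label, required_mbps in candidates:
        if required_mbps <= available_mbps:
            return label, required_mbps
    return None   # nothing fits: the connection cannot currently support the service

CANDIDATES = [("1280x720 @ 60 fps", 5.0),
              ("640x480 @ 60 fps", 3.0),
              ("640x480 @ 30 fps", 1.5)]
print(choose_stream_settings(2.2, CANDIDATES))   # -> ('640x480 @ 30 fps', 1.5)
```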
Hosting service architecture
FIG. 4a illustrates the architecture of the hosting service 210 according to one embodiment. The hosting service 210 can either be located in a single server center, or be distributed across a plurality of server centers (to provide lower-latency connections to users who have lower-latency paths to certain server centers than to others, to provide load balancing among users, and to provide redundancy in the event one or more server centers fail). The hosting service 210 may eventually include thousands or even millions of servers 402, serving a very large user base. A hosting service control system 401 provides overall control of the hosting service 210 and directs routers, servers, video compression systems, billing and accounting systems, and the like. In one embodiment, the hosting service control system 401 is implemented on a distributed, Linux-based processing system tied to RAID arrays used to store the databases for user information, server information, and system statistics. In the descriptions that follow, the various actions implemented by the hosting service 210 are initiated and controlled by the hosting service control system 401, unless attributed to other specific systems.
The hosting service 210 includes a number of servers 402, such as those currently available from Intel, IBM, Hewlett Packard, and others. Alternatively, the servers 402 can be assembled in a custom configuration of components, or can eventually be integrated so that an entire server is implemented as a single chip. Although this figure shows a small number of servers 402 for the sake of illustration, in an actual deployment there may be as few as one server 402 or as many as millions of servers 402 or more. The servers 402 may all be configured in the same way (as an example of some of the configuration parameters: with the same CPU type and performance; with or without a GPU, and if with a GPU, with the same GPU type and performance; with the same number of CPUs and GPUs; with the same amount and type/speed of RAM; and with the same RAM configuration), or various subsets of the servers 402 may have the same configuration (e.g., 25% of the servers configured one particular way, 50% a different way, and 25% yet another way), or every server 402 may be different.
In one embodiment, the servers 402 are diskless, i.e., rather than having their own local mass storage (be it optical or magnetic storage, or semiconductor-based storage such as flash memory, or another mass storage means serving a similar function), each server has access to shared mass storage through a fast backplane or network connection. In one embodiment, this fast connection is a Storage Area Network (SAN) 403 connected to a series of Redundant Arrays of Independent Disks (RAID) 405, with connections between devices implemented using Gigabit Ethernet. As those skilled in the art know, a SAN 403 may be used to combine many RAID arrays 405 together, resulting in extremely high bandwidth, approaching or potentially exceeding the bandwidth available from the RAM used in current game consoles and PCs. Further, while RAID arrays based on rotating media, such as magnetic media, frequently have significant seek-time access latency, RAID arrays based on semiconductor storage can be implemented with much lower access latency. In another configuration, some or all of the servers 402 provide some or all of their own mass storage locally. For example, a server 402 may store frequently accessed information (such as a copy of its operating system and a copy of a video game or application) on low-latency local flash-based storage, but it may utilize the SAN to access rotating-media-based RAID arrays 405 with higher seek latency, at a lower frequency, for large databases of geometry or game-state information.
Additionally, in one embodiment, the hosting service 210 uses low-latency video compression logic 404, described in detail below. The video compression logic 404 may be implemented in software, hardware, or any combination thereof (specific embodiments of which are described below). The video compression logic 404 includes logic for compressing audio as well as visual material.
In operation, while a video game is being played or an application is being used at the user premises 211 via a keyboard, mouse, game controller, or other input device 421, control signal logic 413 on the client 415 transmits control signals 406a-b (typically in the form of UDP packets) to the hosting service 210, representing the button presses (and other types of user input) actuated by the user. The control signals from a given user are routed to the appropriate server (or servers, if multiple servers are responsive to the user's input device) 402. As illustrated in FIG. 4a, the control signals 406a may be routed to the servers 402 via the SAN. Alternatively or additionally, the control signals 406b may be routed directly to the servers 402 over the hosting service network (e.g., an Ethernet-based local area network). Regardless of how they are transmitted, the server or servers execute the game or application software in response to the control signals 406a-b. Although not illustrated in FIG. 4a, various networking components, such as firewalls and/or gateways, may process incoming and outgoing traffic at the edge of the hosting service 210 (e.g., between the hosting service 210 and the internet 410) and/or at the edge of the user premises 211 (between the internet 410 and the home or office client 415). The graphics and audio output of the executed game or application software, i.e., new sequences of video images, are provided to the low-latency video compression logic 404, which compresses the sequences of video images according to low-latency video compression techniques, such as those described herein, and transmits a compressed video stream (typically with compressed or uncompressed audio) back to the client 415 over the internet 410 (or, as described below, over an optimized high-speed network service that bypasses the general internet). Low-latency video decompression logic 412 on the client 415 then decompresses the video and audio streams and renders the decompressed video stream, typically playing the decompressed audio stream, on a display device 422. Alternatively, the audio can be played on speakers separate from the display device 422, or not played at all. Note that, although the input device 421 and the display device 422 are shown as stand-alone devices in FIGS. 2a and 2b, they may be integrated within client devices such as portable computers or mobile devices.
The home or office client 415 (described previously in FIGS. 2a and 2b as the home or office client 205) can be a very inexpensive and low-power device with very limited computing or graphics capability, and may well have very limited or no local mass storage. In contrast, each server 402, coupled to the SAN 403 and multiple RAIDs 405, can be an exceptionally high-performance computing system, and indeed, if multiple servers are used cooperatively in a parallel-processing configuration, there is almost no limit to the amount of computing and graphics processing power that can be brought to bear. And, because of the low-latency video compression 404 and the low-latency video decompression 412, the computing power of the servers 402 is, perceptually, provided to the user. When the user presses a button on the input device 421, the image on the display 422 is updated in response to the button press with no perceptually meaningful lag, as if the game or application software were running locally. Thus, with a home or office client 415 that is a very low-performance computer or just an inexpensive chip implementing the low-latency video decompression and the control signal logic 413, the user is effectively provided with arbitrary computing power from a remote location that appears to be available locally. This gives users the power to play the most advanced, processor-intensive (typically new) video games and the highest-performance applications.
FIG. 4c shows a very basic and inexpensive home or office client device 465. This device is an embodiment of the home or office client 415 from FIGS. 4a and 4b. It is approximately 2 inches long. It has an Ethernet jack 462 that interfaces with an Ethernet cable carrying Power over Ethernet (PoE), from which it derives its power and its connectivity to the internet. It is able to run Network Address Translation (NAT) within a network that supports NAT. In an office environment, many new Ethernet switches have PoE and bring PoE directly to an Ethernet jack in the office. In such a situation, all that is required is an Ethernet cable from the wall jack to the client 465. If the available Ethernet connection does not carry power (e.g., in a home with a DSL or cable modem but no PoE), then there are inexpensive wall "bricks" (i.e., power supplies) available that accept an unpowered Ethernet cable and output Ethernet with PoE.
The client 465 contains control signal logic 413 (of FIG. 4a) coupled to a Bluetooth wireless interface that interfaces with Bluetooth input devices 479, such as a keyboard, mouse, game controller, and/or microphone and/or headset. Also, one embodiment of the client 465 is able to output video at 120 fps when coupled with a display device 468 able to support 120 fps video and to signal (typically via infrared) a pair of shuttered glasses 466 to alternately shutter one eye, then the other, with each successive frame. The effect perceived by the user is that of a stereoscopic 3D image that "jumps out" of the display screen. One such display device 468 that supports such operation is the Samsung HL-T5076S. Because the video stream for each eye is separate, in one embodiment two independent video streams are compressed by the hosting service 210, the frames are interleaved in time, and the frames are decompressed as two independent decompression processes within the client 465.
The client 465 also contains low-latency video decompression logic 412, which decompresses the incoming video and audio and outputs them through an HDMI (High-Definition Multimedia Interface) connector 463, which plugs into an SDTV (Standard Definition Television) or HDTV (High Definition Television) 468, providing the TV with video and audio, or into a monitor 468 supporting HDMI. If the user's monitor 468 does not support HDMI, then an HDMI-to-DVI (Digital Visual Interface) adapter can be used, but the audio will be lost. Under the HDMI standard, the display capabilities (e.g., supported resolutions, frame rates) 464 are communicated from the display device 468, and this information is then passed back through the internet connection 462 to the hosting service 210, so that the hosting service can stream compressed video in a format suitable for the display device.
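The sketch below illustrates, with hypothetical field names, the kind of display-capability message that could be relayed back to the hosting service so that the compressed video stream is formatted for the attached display.

```python
import json

def capability_report(edid_modes):
    """edid_modes: (width, height, fps) tuples read from the display over HDMI."""
    preferred = max(edid_modes, key=lambda m: m[0] * m[1] * m[2])   # highest pixel rate
    return json.dumps({"supported_modes": edid_modes, "preferred_mode": preferred})

print(capability_report([(1280, 720, 60), (1920, 1080, 60), (720, 480, 60)]))
```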
FIG. 4d shows a home or office client device 475 which is the same as the home or office client device 465 shown in FIG. 4c, except that the client device 475 has more external interfaces. Also, the client 475 can accept PoE for power, or it can run off of an external power supply adapter (not shown) that plugs into the wall. Using the client 475's USB input, a video camera 477 provides compressed video to the client 475, and the compressed video is uploaded by the client 475 to the hosting service 210 for the uses described below. A low-latency compressor utilizing the compression techniques described below is built into the camera 477.
In addition to having an ethernet connector for its internet connection, client 475 also has an 802.11g wireless interface to the internet. Both interfaces are capable of using NAT within a network that supports NAT.
Also, in addition to having an HDMI connector for outputting video and audio, the client 475 also has a dual-link DVI-I connector that includes an analog output (and has a standard adapter cable that will provide a VGA output). It also has analog outputs for composite video and S-video.
For audio, client 475 has left/right analog stereo RCA jacks, and for digital audio output it has TOSLINK (fiber optic) outputs.
In addition to the bluetooth wireless interface to the input device 479, it also has a USB jack for interfacing to the input device.
FIG. 4e shows one embodiment of the internal architecture of the client 465. All of or some of the devices shown in this figure can be implemented in a field-programmable gate array (FPGA), in a custom ASIC, or in several discrete devices, either custom designed or off-the-shelf.
Ethernet with PoE 497 attaches to ethernet interface 481. Power 499 is derived from ethernet with PoE 497 and connected to the rest of the devices in client 465. Bus 480 is a common bus used for communication between devices.
A control CPU 483 (almost any small CPU is adequate, such as a MIPS R4000-series CPU at 100 MHz with embedded RAM), running a small client control application from flash memory 476, implements the protocol stack for the network (i.e., the Ethernet interface), and also communicates with the hosting service 210 and configures all of the devices in the client 465. It also handles the interface with the input devices 469 and sends packets back to the hosting service 210, with user controller data protected by forward error correction if necessary. Also, the control CPU 483 monitors the packet traffic (e.g., whether packets are lost or delayed, and the timestamps of their arrival). This information is sent back to the hosting service 210 so that it can constantly monitor the network connection and adjust what it sends accordingly. Flash memory 476 is initially loaded at the time of manufacture with the control program for the control CPU 483 and with a serial number that is unique to the particular client 465 unit. This serial number allows the hosting service 210 to uniquely identify the client 465 unit.
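The packet-traffic monitoring described above can be pictured with the small sketch below; the field names and the summary format are assumptions, not details from the text.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LinkStats:
    expected_seq: int = 0
    lost: int = 0
    inter_arrival_ms: List[float] = field(default_factory=list)
    last_arrival: Optional[float] = None

    def on_packet(self, seq: int) -> None:
        """Record the arrival of a downstream packet carrying sequence number seq."""
        now = time.monotonic()
        if self.last_arrival is not None:
            self.inter_arrival_ms.append((now - self.last_arrival) * 1000)
        self.last_arrival = now
        if seq > self.expected_seq:
            self.lost += seq - self.expected_seq    # gap in sequence numbers implies loss
        self.expected_seq = seq + 1

    def report(self) -> dict:
        """Summary the client could send back so the service can adapt what it transmits."""
        worst_gap = max(self.inter_arrival_ms, default=0.0)
        return {"packets_lost": self.lost, "worst_inter_arrival_ms": round(worst_gap, 1)}
```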
The bluetooth interface 484 wirelessly communicates via its antenna (internal to the client 465) to the input device 469.
Video decompressor 486 is a low-latency video decompressor configured to implement the video decompression described herein. A large number of video decompression devices exist, available either off-the-shelf or as Intellectual Property (IP) designs that can be integrated into an FPGA or a custom ASIC. One company offering IP for an H.264 decoder is Ocean Logic of Manly, New South Wales (NSW), Australia. The advantage of using IP is that the compression techniques used herein do not conform to compression standards. Some standard decompressors are flexible enough to be configured to accommodate the compression techniques herein, but some may not be. With IP, however, there is complete flexibility to redesign the decompressor as required.
The output of the video decompressor is coupled to a video output subsystem 487, which couples the video to the video output of the HDMI interface 490.
The audio decompression subsystem 488 is either implemented using an available standard audio decompressor, or it may be implemented as IP, or audio decompression may be implemented within a control processor 483 that may implement a Vorbis audio decompressor, for example.
The device implementing audio decompression is coupled to an audio output subsystem 489, the audio output subsystem 489 coupling audio to the audio output of the HDMI interface 490.
FIG. 4f shows one embodiment of the internal architecture of the client 475. As can be seen, this architecture is the same as that of the client 465, except for the additional interfaces and optional external DC power from a power supply adapter that plugs into the wall and, if it is used, replaces the power that would otherwise come from the Ethernet PoE 497. The functionality in common with the client 465 will not be repeated below, but the additional functionality is described as follows.

The control CPU 483 communicates with and configures the additional devices.
The WiFi subsystem 482 provides wireless internet access, as an alternative to Ethernet 497, through its antenna. WiFi subsystems are available from a wide range of manufacturers, including Atheros Communications of Santa Clara, California.
The USB subsystem 485 provides an alternative to bluetooth communication for the wired USB input device 479. USB subsystems are fairly standard and readily available for FPGAs and ASICs, and are often built into off-the-shelf devices that perform other functions such as video decompression.
Video output subsystem 487 produces a wider range of video outputs than those within client 465. In addition to providing the HDMI490 video output, it provides DVI-I491, S-video 492, and composite video 493. Also, when the DVI-I491 interface is used for digital video, display capabilities 464 are passed back from the display device to the control CPU483 so that it can inform the hosting service 210 of the capabilities of the display device 478. All of the interfaces provided by the video output subsystem 487 are fairly standard interfaces and are readily available in many forms.
The audio output subsystem 489 outputs audio digitally via digital interface 494(S/PDIF and/or Toslink) and audio in analog form via stereo analog interface 495.
Round trip delay analysis
Of course, as noted previously, for the benefits just described to be realized, the round-trip latency between a user's action on the input device 421 and seeing the consequence of that action on the display device 422 should be no more than 70-80 milliseconds. This latency must take into account all of the factors in the path from the input device 421 at the user premises 211 to the hosting service 210 and back again to the user premises 211 to the display device 422. FIG. 4b illustrates the various components and networks over which the signals must travel, and above these components and networks is a timeline that lists exemplary latencies that can be expected in a practical implementation. Note that FIG. 4b is simplified so that only the critical path routing is shown. Other routing of data used for other features of the system is described below. Double-headed arrows (e.g., arrow 453) indicate round-trip latencies, single-headed arrows (e.g., arrow 457) indicate one-way latencies, and "~" denotes an approximate measure. It should be pointed out that there will be real-world situations where the latencies listed cannot be achieved, but in a large number of cases in the US, using DSL and cable modem connections to the user premises 211, these latencies can be achieved under the circumstances described in the next paragraph. Also, note that while cellular wireless connectivity to the internet will certainly work in the system shown, most current US cellular data systems (such as EVDO) incur very high latency and would not be able to achieve the latencies shown in FIG. 4b. However, these underlying principles may be implemented on future cellular technologies that may be capable of achieving this level of latency.
Starting with the input device 421 at the user premises 211, once the user actuates the input device 421, a user control signal is sent to the client 415 (which may be a standalone device such as a set-top box, or it may be software or hardware running in another device such as a PC or mobile device), and it is packetized (in UDP format, in one embodiment) and given a destination address for the packet to reach the hosting service 210. The packet will also contain information indicating which user the control signals are coming from. The control signal packet(s) are then forwarded through a firewall/router/NAT (Network Address Translation) device 443 to the WAN interface 442. The WAN interface 442 is the interface device provided to the user premises 211 by the user's ISP (Internet Service Provider). The WAN interface 442 may be a cable or DSL modem, a WiMax transceiver, a fiber transceiver, a cellular data interface, an internet-protocol-over-powerline interface, or any other of many interfaces to the internet. Further, the firewall/router/NAT device 443 (and possibly the WAN interface 442) may be integrated into the client 415. One example of this would be a mobile phone that includes software to implement the functionality of the home or office client 415, as well as the means to route and connect to the internet wirelessly via some standard (e.g., 802.11g).
The WAN interface 442 then routes the control signals to what is referred to herein as the "point of presence" 441 for the user's Internet Service Provider (ISP), which is the facility that provides the interface between the WAN transport connected to the user premises 211 and the general internet or private network. The nature of the point of presence varies depending upon the nature of the internet service provided. For DSL, it is typically a telephone company central office where a DSLAM is located. For cable modems, it is typically a cable multi-system operator (MSO) head end. For cellular systems, it is typically a control room associated with a cellular tower. But whatever the nature of the point of presence, it will then route the control signal packet(s) to the general internet 410. The control signal packet(s) are then routed to the WAN interface 444 to the hosting service 210, through what will most likely be a fiber transceiver interface. The WAN interface 444 will then route the control signal packets to routing logic 409 (which may be implemented in many different ways, including Ethernet switches and routing servers), which evaluates the user's address and routes the control signal(s) to the correct server 402 for the given user.
The server 402 then takes the control signals as input to the game or application software running on the server 402 and uses the control signals to process the next frame of the game or application. Once the next frame is generated, the video and audio are output from the server 402 to the video compressor 404. The video and audio may be output from the server 402 to the compressor 404 through various means. To start with, the compressor 404 may be built into the server 402, so the compression may be implemented locally within the server 402. Alternatively, the video and/or audio may be output in packetized form through a network connection (such as an Ethernet connection) to a network that is either a private network between the server 402 and the video compressor 404, or a shared network such as the SAN 403. Alternatively, the video may be output from the server 402 through a video output connector (such as a DVI or VGA connector) and then captured by the video compressor 404. Also, the audio may be output from the server 402 as digital audio (e.g., via a TOSLINK or S/PDIF connector) or as analog audio, which is digitized and encoded by audio compression logic within the video compressor 404.
Once the video compressor 404 has captured a video frame and the audio generated during that frame time from the server 402, the video compressor compresses the video and audio using the techniques described below. Once the video and audio are compressed, they are packetized with an address to send them back to the user's client 415 and are routed to the WAN interface 444, which then routes the video and audio packets through the general internet 410, which then routes the video and audio packets to the user's ISP point of presence 441, which routes the video and audio packets to the WAN interface 442 at the user's premises, which routes the video and audio packets to the firewall/router/NAT device 443, which then routes the video and audio packets to the client 415.
The client 415 decompresses the video and audio, and then displays the video on the display device 422 (or on the client's built-in display device) and sends the audio to the display device 422, or to a separate amplifier/speakers, or to an amplifier/speakers built into the client.
For the user to perceive that the entire process just described is free of lag, the round-trip delay needs to be less than 70 or 80 milliseconds. Some of the latency delays in the described round-trip path are under the control of the hosting service 210 and/or the user, while others are not. Nonetheless, based on analysis and testing of a large number of real-world scenarios, the following are approximate measurements.
The one-way transmission time to send the control signals 451 is typically less than 1 millisecond, and the round-trip routing through the user premises 452 is typically accomplished in about 1 millisecond using readily available consumer-grade firewall/router/NAT switches over Ethernet. User ISP round-trip delays 453 vary widely, but with DSL and cable modem providers they are typically seen to be between 10 and 25 milliseconds. The round-trip latency on the general internet 410 can vary greatly depending on how the traffic is routed and whether there are any failures on the route (these issues are discussed below), but typically the general internet provides fairly optimal routes and the latency is largely determined by the speed of light through optical fiber, given the distance to the destination. As discussed further below, 1000 miles has been established as a rough furthest distance that the hosting service 210 is expected to be placed away from the user premises 211. At 1000 miles (a 2000-mile round trip), the practical transit time for a signal through the internet is approximately 22 milliseconds. The WAN interface 444 to the hosting service 210 is typically a commercial-grade fiber high-speed interface with negligible latency. Thus, the general internet latency 454 is typically between 1 and 10 milliseconds. The one-way routing latency 455 through the hosting service 210 can be achieved in less than 1 millisecond. The server 402 will typically compute a new frame for a game or an application in less than one frame time (which, at 60 fps, is 16.7 milliseconds), so 16 milliseconds is a reasonable maximum one-way latency 456 to use. In an optimized hardware implementation of the video compression and audio compression algorithms described herein, the compression 457 can be completed in 1 millisecond. In less optimized versions, the compression may take as much as 6 milliseconds (of course, even less optimized versions could take longer, but such implementations would impact the overall latency of the round trip and would require other latencies to be shorter (e.g., the allowable distance through the general internet could be reduced) to maintain the 70-80 millisecond latency target). The round-trip latencies of the internet 454, the user ISP 453, and the user premises routing 452 have already been considered, so what remains is the video decompression 458 latency, which, depending on whether the video decompression 458 is implemented in dedicated hardware or in software on the client device 415 (such as on a PC or a mobile device), can vary with the size of the display and the performance of the decompressing CPU. Typically, the decompression 458 takes between 1 and 8 milliseconds.
Thus, the worst-case round-trip latency that a user of the system shown in FIG. 4a can expect to experience can be determined by adding up all of the worst-case latencies seen in practice: 1 + 1 + 25 + 22 + 1 + 16 + 6 + 8 = 80 ms. And, indeed, in practice (with the caveats discussed below), this is roughly the round-trip latency seen with a prototype version of the system shown in FIG. 4a, using off-the-shelf Windows PCs as client devices and home DSL and cable modem connections within the US. Of course, scenarios better than worst-case can result in much shorter latencies, but they cannot be relied upon in developing a widely used commercial service.
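The worst-case budget can be written out as a simple sum; the values below are the ones listed in the text, and the total is the 80 millisecond figure the system is designed around.

```python
latency_budget_ms = {
    "control signal one-way (451)": 1,
    "user-premises routing, round trip (452)": 1,
    "user ISP, round trip (453)": 25,
    "general internet, round trip (~1000 miles)": 22,
    "hosting-service routing, one-way (455)": 1,
    "server frame computation (456)": 16,
    "video/audio compression (457)": 6,
    "client decompression (458)": 8,
}
print(sum(latency_budget_ms.values()), "ms worst-case round trip")   # 80 ms
```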
To achieve the latencies listed in FIG. 4b over the general internet, the video compressor 404 and the video decompressor 412 in the client 415 (from FIG. 4a) need to generate a packet stream with very particular characteristics, such that the packet sequence generated through the entire path from the hosting service 210 to the display device 422 is not subject to delays or excessive packet loss and, in particular, consistently falls within the constraints of the bandwidth available to the user over the user's internet connection, through the WAN interface 442 and the firewall/router/NAT 443. In addition, the video compressor must create a packet stream that is sufficiently robust that it can tolerate the inevitable packet loss and packet reordering that occur in normal internet and network transmissions.
Low latency video compression
To accomplish the above objectives, one embodiment employs a new approach to video compression that reduces the latency and the peak bandwidth requirements for transmitting video. Before describing this embodiment, an analysis of current video compression techniques will be provided with respect to FIG. 5 and FIGS. 6a-6b. Of course, these techniques may be used in accordance with the underlying principles if the user has sufficient bandwidth to handle the data rates they require. Note that audio compression is not addressed herein, other than to state that it is implemented simultaneously and in synchrony with the video compression. Prior art audio compression techniques exist that satisfy the requirements of this system.
FIG. 5 illustrates one particular prior art technique for compressing video, in which each individual video frame 501-503 is compressed by compression logic 520 using a particular compression algorithm to produce a series of compressed frames 511-513. One embodiment of this technique is "motion JPEG," in which each frame is compressed according to a Joint Photographic Experts Group (JPEG) compression algorithm based on the Discrete Cosine Transform (DCT). Various other types of compression algorithms may be used, however, while still adhering to these underlying principles (e.g., wavelet-based compression algorithms such as JPEG-2000).
One problem with this type of compression is that it reduces the data rate of each frame, but it does not exploit similarities between successive frames to reduce the data rate of the overall video stream. For example, as illustrated in FIG. 5, assuming a frame resolution of 640 × 480 at 24 bits/pixel, a frame is 640 × 480 × 24/8/1024 = 900 kilobytes/frame (KB/frame), and for a given quality image motion JPEG may only be able to compress the stream to 1/10 of that, resulting in a 90 KB/frame data stream. At 60 frames/second, this would require a channel bandwidth of 90KB × 8 bits × 60 frames/second = 42.2Mbps, which would be far too much bandwidth for almost all home internet connections in the United States today, and too much bandwidth for many office internet connections. Indeed, given that it requires a constant data stream at such a high bandwidth, and that it would serve only one user, even in an office LAN environment it would consume a large percentage of the bandwidth of a 100Mbps Ethernet LAN and heavily burden the Ethernet switches supporting the LAN. Thus, compression for motion video is inefficient when compared to other compression techniques, such as those described below. Furthermore, single-frame compression algorithms that use lossy compression, such as JPEG and JPEG-2000, produce compression artifacts that may not be noticeable in still images (e.g., an artifact within dense foliage in a scene may not appear as an artifact, because the eye does not know exactly how the dense foliage should appear). But once the scene is in motion, artifacts may stand out, because the eye detects artifacts that change from frame to frame, even though the artifacts lie in a region of the scene where they would not be noticeable in a still image. This results in the perception of "background noise" in the sequence of frames, similar in appearance to the "snow" noise visible during marginal analog TV reception. Of course, this type of compression may still be used in certain embodiments described herein, but generally speaking, to avoid background noise in the scene, a high data rate (i.e., a low compression ratio) is required for a given perceptual quality.
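The arithmetic above can be captured in a short Python sketch; the resolution, bit depth, 10:1 compression ratio, and 60fps figures are the example values from the text, and the function name is illustrative:

```python
# Rough bandwidth estimate for per-frame ("motion JPEG"-style) compression,
# using the figures from the text: 640x480 at 24 bits/pixel, 10:1 compression,
# 60 frames/second.
def per_frame_stream_mbps(width, height, bits_per_pixel, compression_ratio, fps):
    raw_kb_per_frame = width * height * bits_per_pixel / 8 / 1024   # ~900 KB
    compressed_kb = raw_kb_per_frame / compression_ratio            # ~90 KB
    return compressed_kb * 8 * fps / 1024                           # Kbit/s -> Mbit/s (1024-based)

if __name__ == "__main__":
    print(f"{per_frame_stream_mbps(640, 480, 24, 10, 60):.1f} Mbps")  # ~42.2 Mbps
```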
Other types of compression, such as H.264, or Windows Media VC9, MPEG2, and MPEG4, are all more efficient at compressing a video stream because they exploit the similarities between successive frames. These techniques all rely on the same general approach to compressing video. Thus, although the H.264 standard will be described, the same general principles apply to various other compression algorithms. A large number of H.264 compressors and decompressors are available, including the x264 open source software library for compressing H.264 and the FFmpeg open source multimedia library for decompressing H.264.
FIGS. 6a and 6b illustrate an exemplary prior art compression technique in which a series of uncompressed video frames 501-503, 559-561 are compressed by compression logic 620 into a series of "I frames" 611, 671; "P frames" 612-613; and "B frames" 670. The vertical axis in FIG. 6a generally represents the resulting size of each of the encoded frames (although the frames are not drawn to scale). As described above, video coding using I frames, B frames, and P frames is well understood by those skilled in the art. In short, I frame 611 is a DCT-based compression of the complete uncompressed frame 501 (similar to a compressed JPEG image, as described above). P frames 612 and 613 are typically significantly smaller than I frame 611 because they take advantage of the data in the previous I or P frame; that is, they contain data indicating the changes relative to the previous I or P frame. B frame 670 is similar to a P frame, except that it uses the frame in the subsequent reference frame as well as, potentially, the frame in the preceding reference frame.
For the following discussion, it will be assumed that the desired frame rate is 60 frames/second, that each I frame is approximately 160Kb, that the average P frame and B frame is 16Kb, and that a new I frame is generated every second. With this set of parameters, the average data rate would be: 160Kb + 16Kb × 59 ≈ 1.1Mbps. This data rate falls well within the maximum data rate of many current broadband internet connections to homes and offices. This technique also tends to avoid the background noise problem of intra-frame-only encoding, because the P and B frames track the differences between frames, so compression artifacts tend not to appear and disappear from frame to frame, thereby reducing the background noise problem described above.
One problem with the above types of compression is that although the average data rate is relatively low (e.g., 1.1Mbps), a single I frame may take several frame times to transmit. For example, using prior art techniques, a 2.2Mbps network connection (e.g., DSL or a cable modem with a 2.2Mbps peak available maximum data rate 302 from FIG. 3a) would typically be adequate to stream video at 1.1Mbps with one 160Kb I frame every 60 frames. This would be accomplished by having the decompressor queue up 1 second of video before decompressing it. In 1 second, 1.1Mb of data would be transmitted, which would be easily accommodated by the 2.2Mbps maximum available data rate, even assuming the available data rate might periodically drop by as much as 50%. Unfortunately, this prior art approach would result in a 1-second latency for the video because of the 1-second video buffer at the receiver. Such a delay is adequate for many prior art applications (e.g., the playback of linear video), but it is an extremely long latency for fast-action video games, which cannot tolerate latencies greater than 70-80 milliseconds.
Even if an attempt were made to eliminate the 1-second video buffer, it would still not result in a sufficient latency reduction for fast-action video games. For example, as previously described, the use of B frames requires receiving the I frame as well as all of the B frames that precede it. If it is assumed that the 59 non-I frames are roughly split between P and B frames, there will be at least 29 B frames plus an I frame received before any B frame can be displayed. Thus, regardless of the available bandwidth of the channel, a delay of 29 + 1 = 30 frames, each of 1/60-second duration, or 500 milliseconds of latency, is required. Clearly, that is far too long.
Thus, another approach would be to eliminate B frames and use only I and P frames. (One consequence of this is that, for a given quality level, the data rate will increase, but for consistency in this example, continue to assume that each I frame is 160Kb and the average P frame is 16Kb in size, and thus the data rate is still 1.1Mbps.) This approach eliminates the unavoidable latency introduced by B frames, because the decoding of each P frame depends only on previously received frames. A problem remains, however, in that I frames are so much larger than the average P frame that, on a low-bandwidth channel (as is typical in most homes and in many offices), the transmission of the I frame adds substantial latency. This is illustrated in FIG. 6b. The video stream data rate 624 is below the available maximum data rate 622 except for the I frames, where the peak data rate 623 required for the I frames far exceeds the available maximum data rate 622 (and even the rated maximum data rate 621). The data rate required by the P frames is less than the available maximum data rate. Even if the available maximum data rate remains steady at its 2.2Mbps peak rate, it will take 160Kb/2.2Mbps = 71 milliseconds to transmit the I frame, and if the available maximum data rate 622 drops by 50% (to 1.1Mbps), it will take 142 milliseconds to transmit the I frame. So, the latency in transmitting an I frame will fall somewhere between 71 and 142 milliseconds. This latency is additive to the latencies identified in FIG. 4b (which in the worst case add up to 70 milliseconds), so this would result in a total round-trip latency of 141 to 222 milliseconds from the moment the user actuates input device 421 until the image appears on display device 422, which is far too high. And if the available maximum data rate drops below 2.2Mbps, the latency will increase further.
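A short Python sketch of the I frame transmission delay arithmetic above; the 160Kb and 2.2/1.1Mbps figures are the example values from the text, and the small differences from the quoted 71/142 millisecond figures come only from rounding and unit conventions:

```python
def transmit_ms(frame_kbits, channel_mbps):
    # Kb divided by Mb/s gives milliseconds directly (the factors of 1000 cancel).
    return frame_kbits / channel_mbps

if __name__ == "__main__":
    i_frame_kbits = 160
    print(f"{transmit_ms(i_frame_kbits, 2.2):.0f} ms at 2.2 Mbps")  # ~73 ms (the text quotes ~71 ms)
    print(f"{transmit_ms(i_frame_kbits, 1.1):.0f} ms at 1.1 Mbps")  # ~145 ms (the text quotes ~142 ms)
```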
Note also that there are generally severe consequences to "jamming" data into the ISP at a peak data rate 623 that far exceeds the available data rate 622. The equipment in different ISPs will behave differently, but the following behaviors are quite common among DSL and cable modem ISPs when receiving packets at a much higher data rate than the available data rate 622: (a) delaying the packets by queuing them (introducing latency), (b) dropping some or all of the packets, (c) disabling the connection for a period of time (most likely because the ISP is concerned that it is a malicious attack, such as a "denial of service" attack). Thus, transmitting a packet stream at full data rate with characteristics such as those shown in FIG. 6b is not a viable option. The peaks 623 could be queued at the hosting service 210 and sent at a data rate below the available maximum data rate, but this would introduce the unacceptable latency described in the previous paragraph.
Furthermore, the video stream data rate sequence 624 shown in FIG. 6b is a very "tame" video stream data rate sequence, and is the sort of data rate sequence one would expect from compressing a video sequence that does not change very much and has very little motion (e.g., as would be common in video teleconferencing, where the cameras are in fixed positions and have very little motion, and the objects in the scene (e.g., seated people talking) show little movement).
The video stream data rate sequence 634 shown in FIG. 6c is the sort of sequence one would expect to see from video with much more motion, such as might be generated in a movie or a video game, or in some application software. Note that in addition to the I frame peak 633, there are also P frame peaks such as 635 and 636 that are quite large and exceed the available maximum data rate in many instances. Although these P frame peaks are not quite as large as the I frame peaks, they are still far too large to be carried by the channel at full data rate, and, as with the I frame peaks, the P frame peaks must be transmitted slowly (thereby increasing latency).
On a high-bandwidth channel (e.g., a 100Mbps LAN, or a high-bandwidth 100Mbps private connection) the network would be able to tolerate large peaks such as I frame peak 633 or P frame peak 636, and in principle low latency could be maintained. But such networks are frequently shared among many users (e.g., in an office environment), and such "peaky" data would impact the performance of the LAN, particularly if the network traffic were routed to a private shared connection (e.g., from a remote data center to an office). To begin with, bear in mind that this example is of a relatively low-resolution video stream of 640 × 480 pixels at 60fps. HDTV streams of 1920 × 1080 at 60fps are readily handled by modern computers and displays, and 2560 × 1440 resolution displays at 60fps are increasingly available (e.g., Apple's 30" display). A high-motion 1920 × 1080 video sequence at 60fps may require 4.5Mbps using H.264 compression for a reasonable quality level. If we assume the I frame peaks are 10 times the nominal data rate, that would result in 45Mbps peaks, as well as smaller, but still considerable, P frame peaks. If several users are receiving video streams on the same 100Mbps network (e.g., a private network connection between an office and a data center), it is easy to see how the peaks from several users' video streams could happen to align, overwhelming the bandwidth of the network and potentially overwhelming the bandwidth of the switch backplanes supporting the users on the network. Even in the case of ultra-high-speed Ethernet, if enough users have enough peaks aligned at once, they can overwhelm the network or the network switches. And, once 2560 × 1440 resolution video becomes more commonplace, the average video stream data rate may be 9.5Mbps, perhaps resulting in a 95Mbps peak data rate. Needless to say, a 100Mbps connection between a data center and an office (which today is an exceptionally fast connection) would be completely swamped by the peak traffic from a single user. Thus, even though LANs and private network connections can be more tolerant of peaky streaming video, streaming video with high peaks is undesirable and may require special planning and accommodation by an office's IT department.
Of course, for standard linear video applications these issues do not arise because the data rate is "smoothed" at the point of transmission, the data for each frame stays below the maximum available data rate 622, and a buffer in the client stores a sequence of I, P, and B frames before they are decompressed. Thus, the data rate on the network remains close to the average data rate of the video stream. Unfortunately, this introduces latency, even if B frames are not used, which is unacceptable for low-latency applications such as video games and applications that require fast response times.
One prior art solution for mitigating video streams that have high peaks is to use a technique often referred to as "constant bit rate" (CBR) encoding. Although the term CBR would seem to imply that all frames are compressed to have the same bit rate (i.e., size), what it usually refers to is a compression paradigm in which a maximum bit rate is allowed across a certain number of frames (in our case, 1 frame). For example, in the case of FIG. 6c, if a CBR constraint were imposed on the encoding limiting the bit rate to, say, 70% of the rated maximum data rate 621, then the compression algorithm would limit the compression of each of the frames so that any frame that would normally be compressed using more than 70% of the rated maximum data rate 621 would be compressed with fewer bits. The result is that frames that would normally require more bits to maintain a given quality level are "starved" of bits, and the image quality of those frames is worse than that of other frames that do not require more bits than 70% of the rated maximum data rate 621. This approach can produce acceptable results for certain types of compressed video in which (a) little motion or scene change is expected and (b) the user can accept periodic quality degradation. A good example of an application well suited to CBR is video teleconferencing, since there are few peaks, and if the quality degrades briefly (e.g., if the camera is panned, resulting in significant scene motion and large peaks, there may not be enough bits during the pan for high-quality image compression, resulting in degraded image quality), most users find it acceptable. Unfortunately, CBR is not well suited to many other applications, which have scenes of high complexity or a great deal of motion and/or require a reasonably constant quality level.
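A minimal Python sketch of the single-frame bit cap that this style of CBR encoding imposes; the 70% fraction matches the example above, while the frame sizes, channel rate, and function names are illustrative assumptions rather than values from the text:

```python
# Each frame is limited to a fixed fraction (e.g., 70%) of the nominal
# per-frame budget; frames whose "ideal" size exceeds the cap are starved of
# bits, which is where the periodic quality drops come from. The ideal sizes
# are illustrative inputs, not the output of any real encoder.
def apply_cbr_cap(ideal_frame_bits, nominal_max_bps, fps, fraction=0.70):
    cap = nominal_max_bps / fps * fraction
    results = []
    for ideal in ideal_frame_bits:
        actual = min(ideal, cap)
        starved = ideal > cap           # quality will visibly drop on this frame
        results.append({"bits": actual, "starved": starved})
    return results

if __name__ == "__main__":
    # 60 fps stream on a 6 Mbps nominal channel; one large scene-change frame.
    frames = [18_000] * 10 + [400_000] + [18_000] * 10
    out = apply_cbr_cap(frames, nominal_max_bps=6_000_000, fps=60)
    print(sum(f["starved"] for f in out), "frame(s) starved of bits")
```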
The low-latency compression logic 404 used in one embodiment uses a number of different techniques to address many of the problems of streaming low-latency compressed video while maintaining high quality. First, the low-latency compression logic 404 generates only I-frames and P-frames, thereby alleviating the need to wait several frame times to decode each B-frame. In addition, as illustrated in FIG. 7a, in one embodiment, the low-latency compression logic 404 subdivides each uncompressed frame 701-760 into a series of "tiles" and individually encodes each tile as an I-frame or a P-frame. The group of compressed I and P frames is referred to herein as an "R frame" 711-770. In the particular example shown in fig. 7a, each uncompressed frame is subdivided into 16 image blocks of a 4 x 4 matrix. However, the underlying principles are not limited to any particular subdivision mechanism.
In one embodiment, the low-latency compression logic 404 divides a video frame into a number of tiles and encodes (i.e., compresses) one tile from each frame as an I frame (i.e., the tile is compressed as if it were a separate video frame of 1/16th the size of the full image, and the compression used for this "mini" frame is I frame compression) and the remaining tiles as P frames (i.e., the compression used for each "mini" 1/16th frame is P frame compression). Tiles compressed as I frames and tiles compressed as P frames shall be referred to as "I tiles" and "P tiles," respectively. With each successive video frame, the tile to be encoded as an I tile is changed. Thus, in a given frame time, only one of the tiles in the video frame is an I tile, and the remainder of the tiles are P tiles. For example, in FIG. 7a, tile 0 of uncompressed frame 701 is encoded as I tile I0 and the remaining tiles 1-15 are encoded as P tiles P1 through P15 to produce R frame 711. In the next uncompressed video frame 702, tile 1 is encoded as I tile I1 and the remaining tiles 0 and 2 through 15 are encoded as P tiles (P0, and P2 through P15) to produce R frame 712. Thus, the I tiles and P tiles for the tiles are progressively interleaved in time over successive frames. The process continues until R frame 770 is generated, with the last tile in the matrix encoded as an I tile (i.e., I15). The process then starts over, generating another R frame such as frame 711 (i.e., encoding an I tile for tile 0), and so on. Although not illustrated in FIG. 7a, in one embodiment the first R frame of the video sequence of R frames contains only I tiles (i.e., so that subsequent P frames have reference image data from which to calculate motion). Alternatively, in one embodiment, the startup sequence uses the same I tile pattern as normal, but does not include P tiles for those tiles that have not yet received an I tile. In other words, certain tiles are not encoded with any data until the first I tile arrives, thereby avoiding startup peaks in the video stream data rate 934 of FIG. 9a, which is explained in further detail below. Moreover, as described below, a variety of different sizes and shapes may be used for the tiles while still conforming to these underlying principles.
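The following is a simplified Python sketch of the cyclic I/P tile idea described above; it is not the implementation from the text, and the encode_i_tile/encode_p_tile callbacks are placeholders standing in for a real codec:

```python
def split_into_tiles(frame, rows=4, cols=4):
    """Divide a 2D frame (a list of pixel rows) into rows*cols rectangular tiles."""
    h, w = len(frame), len(frame[0])
    th, tw = h // rows, w // cols
    return [[row[c * tw:(c + 1) * tw] for row in frame[r * th:(r + 1) * th]]
            for r in range(rows) for c in range(cols)]

def encode_r_frames(frames, rows=4, cols=4,
                    encode_i_tile=lambda tile: ("I", tile),
                    encode_p_tile=lambda tile, ref: ("P", tile)):
    num_tiles = rows * cols
    previous_tiles = [None] * num_tiles
    r_frames = []
    for frame_index, frame in enumerate(frames):
        tiles = split_into_tiles(frame, rows, cols)
        i_position = frame_index % num_tiles      # which tile gets the I tile this frame
        encoded = []
        for t, tile in enumerate(tiles):
            # A tile with no reference yet is also sent as an I tile, so the very
            # first R frame is all I tiles (the first variant described above).
            if t == i_position or previous_tiles[t] is None:
                encoded.append(encode_i_tile(tile))
            else:
                encoded.append(encode_p_tile(tile, previous_tiles[t]))
            previous_tiles[t] = tile
        r_frames.append(encoded)
    return r_frames

if __name__ == "__main__":
    frames = [[[(f + x + y) % 256 for x in range(64)] for y in range(64)]
              for f in range(3)]
    for i, rf in enumerate(encode_r_frames(frames)):
        print(f"R frame {i}:", "".join(kind for kind, _ in rf))
```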
The video decompression logic 412 running on the client 415 decompresses each tile as if it were a separate video sequence of small I and P frames, and then renders each tile to the frame buffer driving display device 422. For example, I0 and P0 from R frames 711 through 770 are used to decompress and render tile 0 of the video image. Similarly, I1 and P1 from R frames 711 through 770 are used to reconstruct tile 1, and so on. As described above, decompression of I frames and P frames is well known in the art, and decompression of I tiles and P tiles can be accomplished by having multiple instantiations of a video decompressor running on the client 415. Although multiplying the number of processes would seem to increase the computational burden on client 415, it actually does not, because the tiles themselves are proportionally smaller relative to the number of additional processes, so the number of pixels displayed is the same as if there were one process using conventional full-size I and P frames.
This R frame technique significantly mitigates the bandwidth peaks typically associated with I frames (illustrated in FIGS. 6b and 6c), because any given frame is mostly made up of P tiles, which are typically smaller than I frames. For example, again assuming a typical I frame is 160Kb, the I tile for each of the frames illustrated in FIG. 7a would be roughly 1/16th of that amount, or 10Kb. Similarly, assuming a typical P frame is 16Kb, the P tile for each of the tiles illustrated in FIG. 7a would be roughly 1Kb. The end result is an R frame of roughly 10Kb + 15 × 1Kb = 25Kb. So, each 60-frame sequence would be 25Kb × 60 = 1.5Mbps. Thus, at 60 frames/second, this would require a channel capable of sustaining a bandwidth of 1.5Mbps, but with much lower peaks, because the I tiles are distributed across the 60-frame interval.
Note that in the previous example, with the same assumed data rate for I and P frames, the average data rate is 1.1 Mbps. This is because in the previous example, only a new I-frame was introduced every 60 frame times, whereas in this example the 16 tiles that make up the I-frame cycle through 16 frame times, and thus the equivalent of an I-frame was introduced every 16 frame times, resulting in a slightly higher average data rate. However, in practice, introducing more frequent I-frames does not linearly increase the data rate. This is due to the fact that: a P frame (or P tile) encodes mainly the difference from the previous frame to the next frame. Thus, if the previous frame is quite similar to the next frame, the P-frame will be very small, and if the previous frame is quite different from the next frame, the P-frame will be very large. But because P-frames are derived largely from previous frames, not from actual frames, the resulting encoded frames may contain more errors (e.g., visual artifacts) than I-frames with a sufficient number of bits. In addition, error accumulation can occur when one P frame follows another P frame (getting worse when there are long sequences of P frames). Now, the sophisticated video compressor will detect the fact that the quality of the image degrades after a sequence of P frames, and if necessary, it will allocate more bits to the following P frames to improve the quality, or if it is the most efficient course of action, replace the P frames with I frames. Thus, when a long P frame sequence is used (e.g., 59P frames, as in the previous example above), in particular when the scene has a lot of complexity and/or motion, typically more bits are needed for the P frames (as they become farther away from the I frame).
Looked at from the opposite point of view, P frames that closely follow an I frame tend to require fewer bits than P frames that are further removed from an I frame. So, in the example shown in FIG. 7a, no P tile is ever more than 15 frames removed from the I tile that precedes it, whereas in the previous example a P frame could be 59 frames removed from an I frame. Thus, with more frequent I frames, the P frames are smaller. The exact relative sizes will of course vary with the nature of the video stream, but in the example of FIG. 7a, if an I tile is 10Kb, the P tiles may average only 0.75Kb in size, resulting in 10Kb + 15 × 0.75Kb = 21.25Kb, or, at 60 frames/second, a data rate of 21.25Kb × 60 = 1.3Mbps, about 16% higher than the 1.1Mbps of a stream with an I frame followed by 59 P frames. Once again, the relative results of these two approaches to video compression will vary depending on the video sequence, but in general, we have found empirically that for a given quality level, using R frames requires about 20% more bits than using I/P frame sequences. But, of course, R frames dramatically reduce the peaks, which makes the video sequence usable with far lower latency than a sequence of I/P frames.
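A quick back-of-the-envelope comparison of the two approaches, using the example sizes assumed above (all figures are the text's illustrative numbers, not measurements); the average rates come out to roughly 1.1Mbps versus roughly 1.3Mbps, while the per-frame peak drops from 160Kb to about 21Kb:

```python
def average_mbps(frame_sizes_kbits, fps=60):
    """Average data rate over one cycle of frames, in Mbps."""
    cycle_seconds = len(frame_sizes_kbits) / fps
    return sum(frame_sizes_kbits) / cycle_seconds / 1000

if __name__ == "__main__":
    ip_cycle = [160] + [16] * 59            # one I frame then 59 P frames
    r_cycle = [10 + 15 * 0.75] * 60         # every R frame: 1 I tile + 15 P tiles
    print(f"I/P frames: {average_mbps(ip_cycle):.2f} Mbps, peak frame {max(ip_cycle)} Kb")
    print(f"R frames:   {average_mbps(r_cycle):.2f} Mbps, peak frame {max(r_cycle):.2f} Kb")
```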
The R-frame may be configured in a number of different ways depending on the nature of the video sequence, the reliability of the channel, and the available data rate. In an alternative embodiment, a number of image blocks other than 16 are used in a 4 x 4 configuration. For example, 2 tiles may be used in a 2 × 1 or 1 × 2 configuration, 4 tiles may be used in a 2 × 2, 4 × 1 or 1 × 4 configuration, 6 tiles may be used in a 3 × 2, 2 × 3, 6 × 1 or 1 × 6 configuration or 8 tiles may be used in a 4 × 2 (as shown in fig. 7 b), 2 × 4, 8 × 1 or 1 × 8 configuration. Note that the image blocks need not be square, nor do the video frames need to be square, or even rectangular. The image blocks can be broken down into whatever shape best suits the video stream and application being used.
In another embodiment, the cycle of I tiles and P tiles is not locked to the number of tiles. For example, in an 8-tile, 4 × 2 configuration, a 16-cycle sequence may still be used, as illustrated in FIG. 7b. Sequential uncompressed frames 721, 722, 723 are each divided into 8 tiles 0-7, and each tile is compressed individually. For R frame 731, only tile 0 is compressed as an I tile, and the remaining tiles are compressed as P tiles. For the subsequent R frame 732, all 8 tiles are compressed as P tiles, and then for the subsequent R frame 733, tile 1 is compressed as an I tile and the other tiles are all compressed as P tiles. The sequencing continues in this manner for 16 frames, with an I tile generated only every other frame, so the last I tile (for tile 7) is generated during the 15th frame time (not shown in FIG. 7b), and during the 16th frame time R frame 780 is compressed using all P tiles. The sequence then begins again, with tile 0 compressed as an I tile and the other tiles compressed as P tiles. As in the prior embodiment, the very first frame of the entire video sequence would typically be all I tiles, to provide a reference for the P tiles from that point onward. The cycle of I tiles and P tiles need not even be an even multiple of the number of tiles. For example, with 8 tiles, each frame containing one I tile can be followed by 2 frames of all P tiles before another I tile is used. In yet another embodiment, if certain regions of the screen are known to have more motion (requiring more frequent I tiles) while other regions are more static (e.g., showing a game's score) (requiring less frequent I tiles), then particular tiles can be scheduled to receive I tiles more often than other tiles. Moreover, although each frame is illustrated in FIGS. 7a-7b with a single I tile, multiple I tiles can be encoded in a single frame (depending on the bandwidth of the transmission channel). Conversely, certain frames or frame sequences can be transmitted with no I tiles at all (i.e., only P tiles).
The reason the approach of the previous paragraph works well is that, while the absence of an I tile in every single frame would seem to result in larger peaks, the behavior of the system is not that simple. Because each tile is compressed separately from the other tiles, as the tiles get smaller, the encoding of each tile can become less efficient, since the compressor for a given tile is unable to exploit similar image features and similar motion in the other tiles. Thus, dividing the screen into 16 tiles will generally result in less efficient encoding than dividing the screen into 8 tiles. But, if the screen is divided into 8 tiles and that causes the data of a full I frame to be introduced every 8 frames instead of every 16 frames, it results in a much higher overall data rate. So, by introducing a full I frame every 16 frames instead of every 8 frames, the overall data rate is reduced. Also, by using 8 larger tiles instead of 16 smaller tiles, the overall data rate is reduced, which also mitigates to some degree the data peaks caused by the larger tiles.
In another embodiment, the low-latency video compression logic 404 in fig. 7a and 7b automatically controls the allocation of bits to the tiles in the R frame by setting a pre-configuration based on known characteristics of the video sequence to be compressed or based on an ongoing analysis of the image quality in each tile. For example, in some racing video games, the front of a player's car (which is relatively motionless in the scene) occupies a large portion of the lower half of the screen, while the upper half of the screen is completely filled with approaching roads, buildings, and scenery, which is almost always in motion. If the compression logic 404 allocates an equal number of bits to each tile, the tiles in the lower half of the screen in uncompressed frame 721 in FIG. 7b (tiles 4-7) will typically be compressed with a higher quality than the tiles in the upper half of the screen in uncompressed frame 721 in FIG. 7b (tiles 0-3). If the particular game or this particular scene of the game is known to have the characteristics, the operator of the hosting service 210 may configure the compression logic 404 to allocate more bits to tiles at the top of the screen (as compared to the bits allocated to tiles at the bottom of the screen). Alternatively, compression logic 404 may estimate the compression quality of the tiles after compressing the frame (using one or more of a number of compression quality metrics, such as peak signal-to-noise ratio (PSNR)), and if it is determined that a particular tile consistently produces better quality results over a particular time window, it gradually allocates more bits to tiles that produce lower quality results until the various tiles reach a similar level of quality. In an alternative embodiment, the compressor logic 404 allocates bits to achieve higher quality in a particular image block or group of image blocks. For example, it may provide a better overall perceived appearance to have a higher quality at the center of the screen than at the edges.
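One plausible way to realize the quality-feedback allocation just described is sketched below in Python; the PSNR inputs, step size, and tile count are assumptions for illustration and are not parameters taken from the text:

```python
# Tiles whose measured quality (e.g., PSNR) lags the average gradually gain a
# larger share of the per-frame bit budget, and tiles that are consistently
# better give some up, until the tiles converge toward similar quality.
def rebalance_tile_bits(bit_share, psnr_per_tile, step=0.02, floor=0.2):
    avg = sum(psnr_per_tile) / len(psnr_per_tile)
    shares = list(bit_share)
    for i, psnr in enumerate(psnr_per_tile):
        if psnr > avg:
            shares[i] = max(floor, shares[i] - step)   # quality is high; free up bits
        elif psnr < avg:
            shares[i] += step                           # quality is low; grant more bits
    total = sum(shares)
    return [s / total for s in shares]                  # renormalize to the frame budget

if __name__ == "__main__":
    shares = [1 / 8] * 8                                # 8 tiles, equal budget at first
    measured_psnr = [41, 42, 40, 41, 33, 34, 33, 35]    # hypothetical per-tile quality
    for _ in range(10):                                 # a few frames of adaptation
        shares = rebalance_tile_bits(shares, measured_psnr)
    print([round(s, 3) for s in shares])
```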
In one embodiment, to improve the resolution of particular regions of the video stream, the video compression logic 404 encodes regions of the video stream having relatively more scene complexity and/or motion using smaller image blocks (as compared to regions of the video stream having relatively less scene complexity and/or motion). For example, as illustrated in fig. 8, smaller tiles are used around a moving character 805 in an area of one R frame 811, possibly followed by a series of R frames (not shown) having the same tile size. Then, when the person 805 moves to a new area of the image, smaller image blocks are used around this new area within another R frame 812, as illustrated. As described above, a variety of different sizes and shapes may be used as "tiles" while still complying with the underlying principles.
Although the cyclic I/P tile approach described above substantially reduces the peaks in the data rate of a video stream, it does not eliminate them entirely, particularly in the case of rapidly changing or highly complex video imagery, such as occurs in movies, video games, and some application software. For example, during a sudden scene transition, a complex frame may be followed by another complex frame that is completely different. Even though several I tiles may have preceded the scene transition by only a few frame times, they do not help in this situation, because the new frame's material bears no relation to the previous I tiles. In such a situation (and in other situations where much, if not all, of the image changes), the video compressor 404 will determine that many, if not all, of the P tiles are more efficiently coded as I tiles, and the result is a very large peak in the data rate for that frame.
As previously discussed, it simply is not an option for most consumer-grade internet connections (and many office connections) to "jam" data that exceeds the available maximum data rate, shown as 622 in FIG. 6c, along with the rated maximum data rate 621. Note that the rated maximum data rate 621 (e.g., "6Mbps DSL") is essentially a marketing number for users considering the purchase of an internet connection, but generally it does not guarantee a level of performance. For the purposes of this application it is largely irrelevant, since our only concern is the available maximum data rate 622 at the time video is streamed through the connection. Consequently, in FIGS. 9a and 9c, when we describe a solution to the peaking problem, the rated maximum data rate is omitted from the graph and only the available maximum data rate 922 is shown. The video stream data rate must not exceed the available maximum data rate 922.
To address this, the first thing the video compressor 404 does is determine a peak data rate 941, which is the data rate the channel is able to handle steadily. This rate can be determined by a number of techniques. One such technique is to gradually send an increasingly higher data rate test stream from the hosting service 210 to the client 415 (in FIGS. 4a and 4b), and to have the client provide feedback to the hosting service as to the level of packet loss and latency. When the packet loss and/or latency begins to show a sharp increase, that is an indication that the available maximum data rate 922 is being reached. Afterward, the hosting service 210 can gradually reduce the data rate of the test stream until the client 415 reports that, for a reasonable period of time, the test stream has been received with an acceptable level of packet loss and the latency is near minimal. This establishes the peak data rate 941, which will then be used as the peak data rate for streaming video. Over time, the peak data rate 941 will fluctuate (e.g., if another user in the household starts using the internet connection heavily), and the client 415 will need to constantly monitor it to see whether packet loss or latency increases, indicating that the available maximum data rate 922 has dropped below the previously established peak data rate 941, and, if so, to lower the peak data rate 941. Similarly, if over time the client 415 finds that the packet loss and latency remain at optimal levels, it can request that the video compressor slowly increase the data rate to see whether the available maximum data rate has increased (e.g., another user in the household has stopped heavy use of the internet connection), and wait again until packet loss and/or higher latency indicates that the available maximum data rate 922 has been exceeded, at which point a lower level for the peak data rate 941 can again be found, though one that is perhaps higher than the level before the increased data rate was tested. So, by using this technique (and other techniques like it), a peak data rate 941 can be found and adjusted periodically as needed. The peak data rate 941 establishes the maximum data rate that may be used by the video compressor 404 to stream video to the user. The logic for determining the peak data rate may be implemented at the user premises 211 and/or on the hosting service 210. At the user premises 211, the client device 415 performs the calculations to determine the peak data rate and transmits this information back to the hosting service 210; at the hosting service 210, a server 402 at the hosting service performs the calculations to determine the peak data rate based on statistics received from the client 415 (e.g., packet loss, latency, maximum data rate, etc.).
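A simplified Python sketch of this kind of peak data rate probe; send_test_stream is a placeholder for the actual measurement path, and the thresholds, starting rate, and step size are illustrative assumptions:

```python
# Ramp a test stream up until packet loss or latency jumps, back off until
# reception is clean again, and use that rate as the streaming ceiling.
def probe_peak_rate(send_test_stream, start_mbps=1.0, step_mbps=0.5,
                    max_loss=0.01, max_latency_ms=20.0):
    rate = start_mbps
    # Ramp up until the channel shows stress.
    while True:
        loss, latency = send_test_stream(rate)
        if loss > max_loss or latency > max_latency_ms:
            break
        rate += step_mbps
    # Back off until reception is acceptable again.
    while rate > step_mbps:
        rate -= step_mbps
        loss, latency = send_test_stream(rate)
        if loss <= max_loss and latency <= max_latency_ms:
            return rate
    return step_mbps

if __name__ == "__main__":
    # Fake channel that degrades sharply above ~5 Mbps, purely for demonstration.
    def fake_channel(rate_mbps):
        return (0.0, 12.0) if rate_mbps <= 5.0 else (0.08, 90.0)
    print(f"peak data rate ~ {probe_peak_rate(fake_channel):.1f} Mbps")
```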
FIG. 9a shows an example of a video stream data rate 934 that has substantial scene complexity and/or motion, and which was generated using the cyclic I/P tile compression techniques previously described and illustrated in FIGS. 7a, 7b, and 8. The video compressor 404 has been configured to output compressed video at an average data rate that is below the peak data rate 941, and note that, most of the time, the video stream data rate remains below the peak data rate 941. A comparison of data rate 934 with the video stream data rate 634 shown in FIG. 6c (created using I/P/B or I/P frames) shows that the cyclic I/P tile compression produces a much smoother data rate. Still, at frame 2x peak 952 (which approaches 2 times the peak data rate 942) and frame 4x peak 954 (which approaches 4 times the peak data rate 944), the data rate exceeds the peak data rate 941, which is not acceptable. In practice, even with high-action video from rapidly changing video games, peaks exceeding the peak data rate 941 occur in fewer than 2% of frames, peaks exceeding 2 times the peak data rate 942 rarely occur, and peaks exceeding 3 times the peak data rate 943 almost never occur. But, when they do occur (e.g., during a scene transition), the data rate they require is necessary to produce a good quality video image.
One way to address this problem is simply to configure the video compressor 404 so that its maximum data rate output is the peak data rate 941. Unfortunately, the resulting video output quality during the peak frames is poor because the compression algorithm is "starved" of bits. What results is the appearance of compression artifacts when there are sudden transitions or fast motion, and in time the user comes to realize that artifacts always suddenly appear whenever there is sudden change or rapid motion, and they can become quite annoying.
Although the human visual system is quite sensitive to visual artifacts that occur during sudden changes or rapid movements, it is not very sensitive to detecting a reduction in frame rate in such situations. In fact, when the sudden change occurs, it appears that the human visual system is focused on tracking the change, and if the frame rate drops temporarily from 60fps to 30fps and then immediately returns to 60fps, the human visual system will not notice. Furthermore, in the case of very sharp transitions (such as sudden scene changes), the human visual system does not notice if the frame rate drops to 20fps or even 15fps and then immediately returns to 60 fps. As long as the frame rate reduction only occurs occasionally, it appears to the human observer that the video is constantly performing at 60 fps.
This characteristic of the human visual system is exploited by the technique illustrated in FIG. 9b. Server 402 (from FIGS. 4a and 4b) produces an uncompressed video output stream at a steady frame rate (at 60fps in one embodiment). The timeline shows each frame 961-970, each output in 1/60th of a second. Beginning with frame 961, each uncompressed video frame is output to the low-latency video compressor 404, which compresses the frame in less than a frame time, producing for the first frame compressed frame 1 981. The data produced for compressed frame 1 981 may be larger or smaller, depending on many factors, as previously described. If the data is small enough that it can be transmitted to the client 415 at the peak data rate 941 in one frame time (1/60th of a second) or less, then it is transmitted during transmit time (xmit time) 991 (the length of the arrow indicating the duration of the transmit time). In the next frame time, server 402 produces uncompressed frame 2 962, it is compressed into compressed frame 2 982, and it is transmitted to client 415 during transmit time 992, which is less than a frame time, at the peak data rate 941.
Then, in the next frame time, server 402 produces uncompressed frame 3 963. When it is compressed by the video compressor 404, the resulting compressed frame 3 983 is more data than can be transmitted at the peak data rate 941 in one frame time. So, it is transmitted during transmit time (2x peak) 993, which takes up all of that frame time and part of the next frame time. Now, during the next frame time, server 402 produces another uncompressed frame 4 964 and outputs it to the video compressor 404, but the data is ignored, as illustrated by 974. This is because the video compressor 404 is configured to ignore further uncompressed video frames that arrive while it is still transmitting a previously compressed frame. Of course, the video decompressor of client 415 will fail to receive frame 4, but it simply continues to display frame 3 on display device 422 for 2 frame times (i.e., briefly reducing the frame rate from 60fps to 30fps).
For the next frame 5, server 402 outputs uncompressed frame 5 965, it is compressed into compressed frame 5 985 and transmitted within 1 frame time during transmit time 995. Client 415's video decompressor decompresses frame 5 and displays it on display device 422. Next, server 402 outputs uncompressed frame 6 966, and the video compressor 404 compresses it into compressed frame 6 986, but this time the resulting data is very large. The compressed frame is transmitted during transmit time (4x peak) 996 at the peak data rate 941, but it takes almost 4 frame times to transmit the frame. During the next 3 frame times, the video compressor 404 ignores 3 frames from server 402, and the decompressor of client 415 holds frame 6 steadily on display device 422 for 4 frame times (i.e., briefly reducing the frame rate from 60fps to 15fps). Then, finally, server 402 outputs frame 10 970, video compressor 404 compresses it into compressed frame 10 987, it is transmitted during transmit time 997, client 415's decompressor decompresses frame 10 and displays it on display device 422, and once again the video resumes at 60fps.
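The pacing behavior of FIG. 9b can be sketched as follows in Python; the frame sizes and peak rate are made-up inputs chosen so that one 2x peak and one 4x peak appear, and audio (which, as discussed in the next paragraph, continues uninterrupted) is not modeled:

```python
# Frames are sent at the peak data rate, and any uncompressed frame that
# arrives while a previous compressed frame is still being transmitted is
# simply dropped, briefly lowering the effective frame rate.
def simulate_transmission(frame_sizes_kbits, peak_rate_mbps, fps=60):
    frame_time_ms = 1000.0 / fps
    busy_until_ms = 0.0
    events = []
    for i, size in enumerate(frame_sizes_kbits):
        now = i * frame_time_ms
        if now < busy_until_ms:
            events.append((i, "dropped"))          # still transmitting the previous frame
            continue
        xmit_ms = size / peak_rate_mbps            # Kb / (Mb/s) == milliseconds
        busy_until_ms = now + xmit_ms
        events.append((i, f"sent in {xmit_ms:.1f} ms"))
    return events

if __name__ == "__main__":
    # Mostly ~20 Kb frames, with one 2x peak and one 4x peak (cf. FIG. 9b).
    sizes = [20, 20, 40, 20, 20, 80, 20, 20, 20, 20]
    for frame, what in simulate_transmission(sizes, peak_rate_mbps=1.5):
        print(frame, what)
```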
Note that although video compressor 404 discards video frames from the video stream generated by server 402, it does not discard audio data (regardless of what form the audio is), and when video frames are discarded video compressor 404 continues to compress and transmit audio data to client 415, client 415 continues to decompress the audio data and provide the audio to whatever device is used by the user to playback the audio. So during the period of dropped frames the audio continues unabated. Compressed audio consumes a relatively small percentage of bandwidth compared to compressed video, and therefore does not have a large impact on the overall data rate. Although not illustrated in either of the data rate maps, there is always data rate capacity reserved for the compressed audio stream within the peak data rate 941.
The example just described in FIG. 9b was chosen to illustrate how the frame rate drops during data rate peaks, but what it does not illustrate is that when the previously described cyclic I/P tile techniques are used, such data rate peaks, and the consequent dropped frames, are rare, even during high scene complexity/high action sequences such as those that occur in video games, movies, and some application software. Consequently, the reduced frame rates are infrequent and brief, and the human visual system does not detect them.
If the frame rate reduction mechanism just described is applied to the video stream data rate illustrated in FIG. 9a, the resulting video stream data rate is illustrated in FIG. 9c. In this example, 2x peak 952 has been reduced to flattened 2x peak 953, and 4x peak 954 has been reduced to flattened 4x peak 955, and the entire video stream data rate 934 remains at or below the peak data rate 941.
Thus, using the techniques described above, a high-action video stream can be transmitted with low latency through the general internet and through a consumer-grade internet connection. Further, in an office environment on a LAN (e.g., 100Mbps Ethernet or an 802.11g wireless network) or on a private network (e.g., a 100Mbps connection between a data center and an office), a high-action video stream can be transmitted without peaks, so that multiple users (e.g., transmitting 1920 × 1080 at 60fps at 4.5Mbps) can use the LAN or a shared private data connection without overlapping peaks overwhelming the network or the network switch backplanes.
Data rate adjustment
In one embodiment, the hosting service 210 initially evaluates the available maximum data rate 622 and latency of the channel to determine an appropriate data rate for the video stream and then dynamically adjusts the data rate in response thereto. To adjust the data rate, the hosting service 210 can, for example, modify the image resolution and/or the number of frames per second of the video stream to be sent to the client 415. Moreover, the hosting service may adjust the quality level of the compressed video. When changing the resolution of the video stream (e.g., from 1280 × 720 resolution to 640 × 360), the video decompression logic 412 on the client 415 may scale up the image to maintain the same image size on the display screen.
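A minimal sketch of one way such an adjustment might be structured; the candidate resolution/frame-rate ladder, the bits-per-pixel estimate, and the headroom factor are all illustrative assumptions, not values from the text:

```python
# Pick a resolution/frame-rate combination whose estimated stream rate fits
# the currently measured channel, leaving some headroom.
CANDIDATES = [          # (width, height, fps), highest quality first
    (1280, 720, 60),
    (1280, 720, 30),
    (640, 360, 60),
    (640, 360, 30),
]

def estimated_mbps(width, height, fps, bits_per_pixel=0.10):
    return width * height * fps * bits_per_pixel / 1_000_000

def choose_stream_format(available_mbps, headroom=0.8):
    for width, height, fps in CANDIDATES:
        if estimated_mbps(width, height, fps) <= available_mbps * headroom:
            return width, height, fps
    return CANDIDATES[-1]    # fall back to the lowest rung

if __name__ == "__main__":
    print(choose_stream_format(available_mbps=6.0))   # (1280, 720, 30) with these assumptions
```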
In one embodiment, in the situation where the channel drops out completely, the hosting service 210 pauses the game. In the case of a multiplayer game, the hosting service reports to the other users that the user has dropped out of the game and/or pauses the game for the other users.
Dropped or delayed packets
In one embodiment, if data is lost because of packet loss between the video compressor 404 and the client 415 in FIG. 4a or 4b, or because a packet is received out of order and arrives too late to decompress the frame and meet the latency requirements of the decompressed frame, the video decompression logic 412 is able to mitigate the visual artifacts. In a streaming I/P frame implementation, if there is a lost/delayed packet, the entire screen is affected, potentially causing the screen to freeze completely for a period of time or to show other screen-wide visual artifacts. For example, if a lost/delayed packet causes the loss of an I frame, then the decompressor will lack a reference for all of the P frames that follow until a new I frame is received. If a P frame is lost, it will affect the P frames for the entire screen that follow it. Depending on how long it will be before an I frame appears, this will have a longer or shorter visual impact. With interleaved I/P tiles as shown in FIGS. 7a and 7b, a lost/delayed packet is much less likely to affect the entire screen, since it only affects the tiles contained in the affected packet. If each tile's data is sent within an individual packet, then if a packet is lost, it will affect only one tile. Of course, the duration of the visual artifact will depend on whether an I tile packet is lost and, if a P tile is lost, how many frames it will take until an I tile appears. But, given that different tiles on the screen are updated with I tiles very rapidly (potentially every frame), even if one tile on the screen is affected, other tiles may not be. Further, if some event causes a loss of several packets at once (e.g., a spike in power adjacent to a DSL line that briefly interrupts the data flow), then some of the tiles will be affected more than others, but because the tiles will quickly be refreshed with a new I tile, they will be only briefly affected. Also, with a streaming I/P frame implementation, not only are the I frames the most critical frames, the I frames are extremely large, so if there is an event that causes a dropped/delayed packet, there is a higher probability that an I frame will be affected (i.e., if any portion of an I frame is lost, it is unlikely the I frame can be decompressed at all) than a much smaller I tile. For all of these reasons, using I/P tiles results in far fewer visual artifacts when packets are dropped/delayed than with I/P frames.
One embodiment attempts to reduce the effect of lost packets by intelligently packaging the compressed tiles within TCP (transmission control protocol) packets or UDP (user datagram protocol) packets. For example, in one embodiment, tiles are aligned with packet boundaries whenever possible. FIG. 10a illustrates how tiles might be packed within a series of packets 1001-1005 without implementing this feature. Specifically, in FIG. 10a, tiles cross packet boundaries and are packed inefficiently, so that the loss of a single packet results in the loss of multiple tiles. For example, if packet 1003 or 1004 is lost, three tiles are lost, resulting in visual artifacts.
In contrast, FIG. 10b illustrates tile packing logic 1010 for intelligently packing tiles within packets to reduce the effect of packet loss. First, the tile packing logic 1010 aligns tiles with packet boundaries. Thus, tiles T1, T3, T4, T7, and T2 are each aligned with the boundary of one of packets 1001-1005. The tile packing logic also attempts to fit the tiles within the packets in the most efficient manner possible, without crossing packet boundaries. Based on the size of each of the tiles, tiles T1 and T6 are combined in one packet 1001; T3 and T5 are combined in one packet 1002; tiles T4 and T8 are combined in one packet 1003; tile T7 is added to packet 1004; and tile T2 is added to packet 1005. Thus, under this scheme, the loss of a single packet results in the loss of no more than 2 tiles (rather than 3 tiles, as illustrated in FIG. 10a).
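A minimal Python sketch of tile packing in this spirit; the greedy pairing strategy, tile sizes, and MTU figure are illustrative assumptions and are not the actual packing logic 1010:

```python
# Tiles are never split across packet boundaries, and tiles are combined inside
# a packet only when they fit together under the packet size limit.
def pack_tiles(tile_sizes, mtu=1400):
    """tile_sizes: dict of tile name -> size in bytes. Returns a list of packets."""
    remaining = dict(tile_sizes)
    packets = []
    # Largest tiles first, so each starts aligned at a packet boundary.
    for name, size in sorted(tile_sizes.items(), key=lambda kv: -kv[1]):
        if name not in remaining:
            continue
        packet, used = [name], size
        del remaining[name]
        # Greedily add the largest remaining tiles that still fit under the MTU.
        for other, other_size in sorted(remaining.items(), key=lambda kv: -kv[1]):
            if used + other_size <= mtu:
                packet.append(other)
                used += other_size
                del remaining[other]
        packets.append((packet, used))
    return packets

if __name__ == "__main__":
    sizes = {"T1": 700, "T2": 400, "T3": 650, "T4": 600,
             "T5": 500, "T6": 350, "T7": 800, "T8": 250}
    for tiles, used in pack_tiles(sizes):
        print(tiles, used, "bytes")
```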
One additional benefit of the embodiment shown in FIG. 10b is that the tiles are transmitted in a different order than that in which they are displayed within the image. This way, if adjacent packets are lost due to the same event interfering with the transmission, the affected areas are not near each other on the screen, creating less noticeable artifacts on the display.
One embodiment uses forward error correction (FEC) techniques to protect certain portions of the video stream from channel errors. As is well known in the art, FEC techniques such as Reed-Solomon and Viterbi generate error correction data information and attach it to data transmitted over a communications channel. If an error occurs in the underlying data (e.g., an I frame), then the FEC can be used to correct the error.
FEC codes increase the data rate of the transmission, so ideally they are used only where they are most needed. If data were being sent whose loss would not result in a very noticeable visual artifact, it may be preferable not to use FEC codes to protect that data. For example, a lost P tile that immediately precedes an I tile will only produce a visual artifact on the screen for 1/60th of a second (i.e., that tile on the screen will not be updated for one frame). Such a visual artifact is barely detectable by the human eye. As P tiles fall further back from an I tile, losing a P tile becomes increasingly more noticeable. For example, if the tile cycle pattern is an I tile followed by 15 P tiles before an I tile is available again, then if the P tile immediately following an I tile is lost, the result is that the tile shows an incorrect image for 15 frame times (at 60fps, that would be 250 milliseconds). The human eye readily detects a 250-millisecond disruption in a stream. So, the further back a P tile is from a new I tile (i.e., the more closely a P tile follows an I tile), the more noticeable the artifact is if that P tile is lost. As previously discussed, however, in general, the more closely a P tile follows an I tile, the smaller the data for that P tile. Thus, the P tiles following I tiles are not only more critical to protect from loss, they are also smaller in size. And, in general, the smaller the data that needs to be protected, the smaller the FEC code needed to protect it.
Thus, as illustrated in FIG. 11a, in one embodiment, only the I tiles are provided with FEC codes, due to the importance of the I tiles in the video stream. Thus, FEC 1101 contains the error correction code for I tile 1100 and FEC 1104 contains the error correction code for I tile 1103. In this embodiment, no FEC is generated for the P tiles.
In one embodiment illustrated in FIG. 11b, FEC codes are also generated for the P tiles that are most likely to cause visual artifacts if lost. In this embodiment, FEC 1105 provides error correction codes for the first 3 P tiles, but not for the P tiles that follow. In another embodiment, FEC codes are generated for the P tiles that are smallest in data size (which will tend to select the P tiles occurring soonest after an I tile, which are the most critical to protect).
In another embodiment, rather than sending the FEC code along with the tile, the tile is transmitted twice, each time in a different packet. If a packet is lost/delayed, another packet is used.
In one embodiment shown in fig. 11c, FEC codes 1111 and 1113 are generated for audio packets 1110 and 1112, respectively, transmitted from the hosting service concurrently with the video. Maintaining the integrity of the audio in the video stream is particularly important because distorted audio (e.g., clicks or hisses) would result in a particularly undesirable user experience. The FEC code helps ensure that the audio content is rendered at the client computer 415 without distortion.
In another embodiment, rather than sending the FEC code along with the audio data, the audio data is transmitted twice, each time in a different packet. If one packet is lost/delayed, another packet is used.
Additionally, in one embodiment illustrated in fig. 11d, FEC codes 1121 and 1123 are used for user input commands (e.g., button presses) 1120 and 1122, respectively, that are transmitted upstream from the client 415 to the hosting service 210. This is important because missing button presses or mouse movements in a video game or application may result in an undesirable user experience.
In another embodiment, rather than sending the FEC code along with the user input command data, the user input command data is transmitted twice, each time in a different packet. If one packet is lost/delayed, another packet is used.
In one embodiment, the hosting service 210 assesses the quality of the communication channel with the client 415 to determine whether to use FEC and, if so, to which portions of the video, audio, and user commands FEC should be applied. Assessing the "quality" of the channel may include estimating packet loss, latency, etc., as described above. If the channel is particularly unreliable, then the hosting service 210 may apply FEC to all of the I tiles, P tiles, audio, and user commands. In contrast, if the channel is reliable, then the hosting service 210 may apply FEC only to the audio and user commands, or may not apply FEC to the audio or video, or may not use FEC at all. Various other permutations of FEC application may be employed while still complying with these underlying principles. In one embodiment, the hosting service 210 continually monitors the condition of the channel and changes the FEC policy accordingly.
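One plausible way to express such a channel-dependent FEC policy is sketched below; the thresholds and the specific tiers are illustrative assumptions, not values given in the text:

```python
# The worse the measured packet loss and latency, the more stream components
# get FEC protection.
def choose_fec_policy(packet_loss, latency_ms):
    if packet_loss > 0.05 or latency_ms > 100:
        return {"i_tiles", "p_tiles", "audio", "user_commands"}   # very unreliable channel
    if packet_loss > 0.01 or latency_ms > 60:
        return {"i_tiles", "audio", "user_commands"}
    if packet_loss > 0.001:
        return {"audio", "user_commands"}
    return set()                                                  # reliable channel: no FEC

if __name__ == "__main__":
    print(choose_fec_policy(packet_loss=0.02, latency_ms=45))
```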
In another embodiment, referring to FIGS. 4a and 4b, when a packet is lost/delayed, resulting in the loss of tile data, or if, perhaps because of a particularly bad packet loss, the FEC is unable to correct the lost tile data, the client 415 assesses how many frames remain before a new I tile will be received and compares that to the round-trip latency from the client 415 to the hosting service 210. If the round-trip latency is less than the number of frames before a new I tile is due to arrive, then the client 415 sends a message to the hosting service 210 requesting a new I tile. This message is routed to the video compressor 404, and rather than generating a P tile for the tile whose data had been lost, it generates an I tile. Given that the system shown in FIGS. 4a and 4b is designed to provide a round-trip latency that is typically less than 80 milliseconds, this results in the tile being corrected within 80 milliseconds (at 60fps, frames are 16.67 milliseconds in duration, so in full frame times, an 80-millisecond latency would result in a corrected tile within 83.33 milliseconds, which is 5 frame times; this is a noticeable disruption, but far less noticeable than, say, a 250-millisecond disruption for 15 frames). When the compressor 404 generates such an I tile out of its usual cyclic order, if the I tile would cause the bandwidth of that frame to exceed the available bandwidth, then the compressor 404 delays the cycles of the other tiles so that the other tiles receive P tiles during that frame time (even if one tile would normally be due an I tile during that frame), and then, starting with the next frame, the usual cycling continues, and the tile that would normally have received an I tile in the preceding frame receives one. Although this action briefly delays the phase of the R frame cycling, it will normally not be noticeable visually.
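The decision described above can be sketched in a few lines of Python; the names and the example values are illustrative:

```python
# Request a fresh I tile only if the round trip is shorter than the wait for
# the next scheduled I tile for that tile position.
def should_request_i_tile(frames_until_next_i_tile, round_trip_ms, fps=60):
    frame_time_ms = 1000.0 / fps
    round_trip_frames = round_trip_ms / frame_time_ms
    return round_trip_frames < frames_until_next_i_tile

if __name__ == "__main__":
    # The tile's next scheduled I tile is 12 frames away; measured round trip is 80 ms.
    if should_request_i_tile(frames_until_next_i_tile=12, round_trip_ms=80):
        print("request new I tile from hosting service")   # ~5 frame times < 12
    else:
        print("wait for the scheduled I tile")
```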
Video and audio compressor/decompressor implementations
FIG. 12 illustrates a particular embodiment in which 8 tiles are compressed in parallel using a multi-core and/or multi-processor system 1200. In one embodiment, a dual-processor, quad-core Xeon CPU computer system running at 2.66 GHz or higher is used, and each core implements the open source x264 H.264 compressor as a separate process. However, various other hardware/software configurations may be used while still complying with the underlying principles. For example, each of the CPU cores may be replaced by an H.264 compressor implemented in an FPGA. In the example shown in FIG. 12, cores 1201-1208 are used to process the I tiles and P tiles simultaneously as eight independent threads. As is well known in the art, current multi-core and multi-processor computer systems are inherently capable of multi-threading when integrated with multi-threading operating systems such as Microsoft Windows XP Professional Edition (64-bit or 32-bit) and Linux.
In the embodiment illustrated in FIG. 12, because each of the 8 cores is responsible for only one tile, it operates largely independently of the other cores, each running a separate instantiation of x264. Uncompressed video at 640 x 480, 800 x 600, or 1280 x 720 resolution is captured using a PCI Express x1-based DVI capture card, such as the Sendero video imaging IP development board from Microtronix of Oosterhout, Netherlands, and the FPGA on the card uses Direct Memory Access (DMA) to transfer the captured video over the DVI bus into system RAM. The tiles are arranged in a 4 x 2 arrangement 1205 (although they are illustrated as square tiles, in this embodiment they have a 160 x 240 resolution). Each instantiation of x264 is configured to compress one of the 8 160 x 240 tiles, and they are synchronized such that, after an initial I tile compression, each core enters a cycle, each one frame out of phase with the others, to compress one I tile followed by seven P tiles, as illustrated in FIG. 12.
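A minimal sketch of this staggered, parallel tile-compression scheme follows; the worker functions stand in for real x264 instantiations, and all names and the dummy encoder are illustrative assumptions.

```python
from multiprocessing import Pool

TILE_COUNT = 8                      # 4 x 2 arrangement of 160 x 240 tiles

def encode(tile_pixels, tile_type):
    # Placeholder for a real x264 invocation; here we just tag the data.
    return (tile_type, bytes(tile_pixels))

def compress_tile(args):
    """One worker stands in for one x264 instantiation owning one tile."""
    tile_index, frame_index, tile_pixels = args
    # Staggered cycle: in every 8-frame cycle exactly one tile is an I tile,
    # and each tile takes its turn one frame after the previous one.
    tile_type = "I" if frame_index % TILE_COUNT == tile_index else "P"
    return encode(tile_pixels, tile_type)

def compress_frame(pool, frame_index, tiles):
    """Compress the 8 tiles of one frame in parallel, one per core."""
    jobs = [(i, frame_index, tiles[i]) for i in range(TILE_COUNT)]
    return pool.map(compress_tile, jobs)

if __name__ == "__main__":
    dummy_tiles = [b"\x00" * (160 * 240) for _ in range(TILE_COUNT)]
    with Pool(TILE_COUNT) as pool:
        for frame in range(3):
            print([t for t, _ in compress_frame(pool, frame, dummy_tiles)])
```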
At each frame time, the resulting compressed tiles are combined into a packet stream using the techniques previously described, and then the compressed tiles are transmitted to destination client 415.
Although not illustrated in fig. 12, if the data rate of the combined 8 tiles exceeds the specified peak data rate 941, then all 8 x264 processes are paused for as many frame times as are necessary until the data for the combined 8 tiles has been transmitted.
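One reading of that pause rule is sketched below: the number of idle frame times is however many whole frame budgets the accumulated data occupies beyond the first. The function name and this interpretation are assumptions for illustration only.

```python
import math

def frames_to_pause(combined_bits, peak_bits_per_second, fps=60):
    """Whole frame times the 8 encoders stay paused so the data already
    produced can drain at the peak data rate 941."""
    frame_budget_bits = peak_bits_per_second / fps
    return max(0, math.ceil(combined_bits / frame_budget_bits) - 1)

# Example: frames_to_pause(1_000_000, 5_000_000) == 11, i.e. a 1 Mbit burst
# on a 5 Mbps channel occupies about 12 frame budgets, so 11 frames are skipped.
```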
In one embodiment, client 415 is implemented as software on a PC that executes 8 instantiations of FFmpeg. The receiving process receives the 8 tiles and routes each tile to the FFmpeg instantiation, which decompresses the tile and renders it to the appropriate tile location on the display device 422.
The client 415 receives keyboard, mouse, or game controller input from the PC's input device drivers and transmits it to the server 402. The server 402 then applies the received input device data to the game or application executing on the server 402, which is a PC running Windows with an Intel 2.16 GHz dual-core CPU. The server 402 then produces a new frame and outputs it via its DVI output, either from the motherboard-based graphics system or via the DVI output of an NVIDIA 8800 GTX PCI Express card.
At the same time, server 402 outputs audio generated by the game or application via its digital audio output (e.g., S/PDIF) coupled to a digital audio input on a dual quad-core Xeon-based PC that implements video compression. The Vorbis open source audio compressor is used to compress audio simultaneously with video using whatever core is available to handle the threads. In one embodiment, the core that completes compressing its tile first performs audio compression. The compressed audio is then transmitted along with the compressed video and decompressed at the client 415 using a Vorbis audio decompressor.
Hosting service server center distribution
Light travels through glass, such as an optical fiber, at some fraction of the speed of light in a vacuum, and thus the exact propagation speed of light in optical fiber can be determined. However, in practice, allowing for routing delays, transmission inefficiencies, and other overhead, we observe that optimal latencies on the internet reflect a transmission speed closer to 50% of the speed of light. Thus, an optimal 1000-mile round-trip latency is about 22 milliseconds, and an optimal 3000-mile round-trip latency is about 64 milliseconds. Thus, a single server on one U.S. coast would be too far away to serve clients on the other coast (which may be as far as 3000 miles away) with the desired latency. However, as illustrated in fig. 13a, if the hosting service 210 server hub 1300 is located in the center of the United States (e.g., Kansas, Nebraska, etc.), such that the distance to any point in the continental United States is about 1500 miles or less, the round-trip internet latency can be as low as 32 milliseconds. Referring to fig. 4b, note that although the worst-case latency allowed for the user ISP 453 is 25 milliseconds, we typically observe latencies closer to 10-15 milliseconds with DSL and cable modem systems. Also, fig. 4b assumes a maximum distance of 1000 miles from the user premises 211 to the hosting center 210. Thus, with a typical 15-millisecond user ISP round-trip latency and a maximum internet distance of 1500 miles with a 32-millisecond round-trip latency, the total round-trip latency from the moment the user actuates the input device 421 to seeing a response on the display device 422 is 1+1+15+32+1+16+6+8 = 80 milliseconds. Thus, an 80-millisecond response time can typically be achieved over an internet distance of 1500 miles. This would allow any user premises in the continental United States with a sufficiently short user ISP latency 453 to access a single, centrally located server center.
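The distances and the 80-millisecond total quoted above can be checked with a short calculation; the per-stage labels below are an informal reading of the terms in fig. 4b and are assumptions, not figures from the embodiment beyond the values themselves.

```python
SPEED_OF_LIGHT_MILES_PER_MS = 186.3
EFFECTIVE_FRACTION = 0.5            # observed effective internet transmission speed

def round_trip_ms(one_way_miles):
    return 2 * one_way_miles / (SPEED_OF_LIGHT_MILES_PER_MS * EFFECTIVE_FRACTION)

# round_trip_ms(1000) is about 21.5 ms, round_trip_ms(1500) about 32 ms,
# and round_trip_ms(3000) about 64 ms, matching the figures above.

# The 80 ms total decomposes into the per-stage figures of the text (ms):
stages = [1, 1, 15, 32, 1, 16, 6, 8]
assert sum(stages) == 80
```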
In another embodiment illustrated in FIG. 13b, the hosting service 210 server centers HS1-HS6 are strategically located around the United states (or other geographic area), with certain larger hosting service server centers located near high population centers (e.g., HS2 and HS 5). In one embodiment, the server centers HS1-HS6 exchange information via a network 1301, which network 1301 can be the Internet or a private network or a combination of both. In the case of multiple server centers, users with high user ISP latency 453 can be served with lower latency.
While distance over the internet is certainly a factor that contributes to round-trip latency through the internet, other factors that are sometimes largely unrelated to distance also come into play. Packet streams are sometimes routed via the internet to a distant location and back again, causing latency from the long loop. Sometimes there is routing equipment on the path that is not operating properly, resulting in transmission delays. Sometimes there is traffic overloading a path, which introduces delay. And sometimes there is an outright failure that prevents the user's ISP from routing to a given destination at all. Thus, while the general internet usually provides connections from one point to another with fairly reliable and optimal routes and with latency largely determined by distance (especially for long-distance connections that result in routing outside of the user's local area), such reliability and latency is by no means guaranteed and often cannot be achieved from the user's premises to a given destination on the general internet.
In one embodiment, when a user client 415 initially connects to the hosting service 210 to play a video game or use an application, the client communicates at startup (e.g., using the techniques described above) with each of the available hosting service server centers HS1-HS6. If the latency of a particular connection is low enough, that connection is used. In one embodiment, the client communicates with all, or a subset, of the hosting service server centers, and the one with the lowest-latency connection is selected. Either the client may select the server center with the lowest-latency connection, or the server centers may identify which of them has the lowest-latency connection and provide this information (e.g., in the form of an internet address) to the client.
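A minimal client-side sketch of this selection step follows. Here the round trip is estimated with a TCP connect time purely for brevity; a real client would use the latency-probing techniques described earlier, and the hostnames and threshold are hypothetical.

```python
import socket
import time

def measure_rtt(host, port=80, timeout=2.0):
    """Rough round-trip estimate using TCP connect time (illustrative only)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return float("inf")       # unreachable center is never selected

def pick_server_center(centers, max_acceptable_ms=80.0):
    """Probe every available center and keep the lowest-latency one that
    falls under the acceptable threshold."""
    rtts = {name: measure_rtt(addr) for name, addr in centers.items()}
    best = min(rtts, key=rtts.get)
    return best if rtts[best] <= max_acceptable_ms else None

# centers = {"HS1": "hs1.example.net", "HS2": "hs2.example.net"}  # hypothetical
```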
If a particular hosting service server center is overloaded and/or the user's game or application can tolerate the latency to another, less loaded hosting service server center, the client 415 may be redirected to the other hosting service server center. In such a case, the game or application the user is running would be paused on the server 402 at the overloaded server center, and the game or application state data would be transferred to a server 402 at the other hosting service server center. The game or application would then be resumed. In one embodiment, the hosting service 210 waits until the game or application has reached a natural pausing point (e.g., between levels in a game, or after the user initiates a "save" operation in an application) before doing the transfer. In yet another embodiment, the hosting service 210 waits until user activity ceases for a specified period of time (e.g., 1 minute) and then initiates the transfer at that time.
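The following sketch shows the "wait until the user is idle, then move the session" variant in outline form. Every object and method here (session, serialize_state, start_server, redirect) is hypothetical; the embodiment does not specify such an interface.

```python
import time

def transfer_when_idle(session, target_center, idle_threshold_s=60):
    """Wait for a natural pause (no user input for idle_threshold_s seconds),
    then pause the game, ship its state, and point the client at the new
    server. All methods on session and target_center are hypothetical."""
    while time.time() - session.last_input_time < idle_threshold_s:
        time.sleep(1)                           # user still active; keep waiting
    session.pause()                             # freeze the game or application
    state = session.serialize_state()           # game/application state data
    new_server = target_center.start_server(session.game_id, state)
    session.client.redirect(new_server.address) # client now talks to new server
    session.shutdown()                          # free the overloaded server
```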
As described above, in one embodiment, the hosting service 210 subscribes to an internet bypass service 440 of FIG. 14 to attempt to provide guaranteed latency to its clients. An internet bypass service, as used herein, is a service that provides private network routes with guaranteed characteristics (e.g., latency, data rate, etc.) from one point to another on the internet. For example, if the hosting service 210 were receiving a large amount of traffic from users using AT&T's DSL service in San Francisco, rather than routing through AT&T's San Francisco-based central offices, the hosting service 210 could lease a high-capacity private data connection from a service provider (perhaps AT&T itself or another provider) between the San Francisco-based central offices and one or more of the server centers for the hosting service 210. Then, if the routes via the general internet from all of the hosting service server centers HS1-HS6 to users in San Francisco using AT&T DSL result in too high a latency, the private data connection could be used instead. Although private data connections are generally more expensive than routes over the general internet, as long as they remain a small percentage of the hosting service 210's connections to users, the overall cost impact is low and users experience a more consistent quality of service.
Server centers often have two tiers of backup power for the event of a power failure. The first tier is typically backup power from batteries (or from an alternative immediately available energy source, such as a flywheel that is kept running and is attached to a generator), which provides power immediately when the power mains fail and keeps the server center running. If the power failure is brief and the mains return quickly (e.g., within a minute), the batteries are all that is needed to keep the server center running. But if the power failure lasts for a longer period of time, generators (e.g., diesel-powered) are typically started, taking over from the batteries, and they can run for as long as they have fuel. Such generators are extremely expensive because they must be capable of producing as much power as the server center normally draws from the power mains.
In one embodiment, each of the hosting services HS1-HS6 shares user data with the others, so that if one server center has a power failure, it can pause the games and applications in progress, transfer the game or application state data from each server 402 to servers 402 at other server centers, and then notify each user's client 415 to direct its communications to the new server 402. Given that such situations occur infrequently, it may be acceptable to transfer a user to a hosting service server center that cannot provide optimal latency (i.e., the user simply has to tolerate higher latency for the duration of the power failure), which allows a much wider range of options for transferring users. For example, given the time zone differences across the United States, users on the east coast may be going to sleep at 11:30PM while video game usage by users on the west coast is beginning to peak at 8:30PM. If a hosting service server center on the west coast has a power failure at that time, there may not be enough west coast servers 402 at other hosting service server centers to handle all of the users. In such a situation, some of the users can be transferred to hosting service server centers on the east coast that have available servers 402, and the only consequence to those users would be higher latency. Once users have been transferred away from the server center that has lost power, the server center can then commence an orderly shutdown of its servers and equipment, so that all equipment is shut down before the batteries (or other immediate power backup) are exhausted. In this way, the cost of generators for the server center can be avoided.
In one embodiment, during times of heavy load on the hosting service 210 (either due to peak user load or because one or more server centers have failed), users are transferred to other server centers based on the latency requirements of the game or application they are using. Thus, users of games or applications that require low latency are given preference for the limited supply of available low-latency server connections.
Hosting service features
FIG. 15 illustrates an embodiment of the components of a server center for hosting service 210 utilized in the feature descriptions that follow. As with the hosting service 210 illustrated in fig. 2a, the components of this server center are controlled and coordinated by the hosting service 210 control system 401, unless otherwise qualified.
Inbound internet traffic 1501 from user clients 415 is directed to inbound routing 1502. Typically, inbound internet traffic 1501 will enter the server center via a high-speed fiber connection to the internet, but any network connection means of adequate bandwidth, reliability, and low latency will suffice. Inbound routing 1502 is a system of network switches and routing servers supporting the switches (the network may be implemented as Ethernet, Fibre Channel, or via any other transport means) that takes the arriving packets and routes each packet to the appropriate app/game server 1521-1525. In one embodiment, a packet delivered to a particular app/game server represents a subset of the data received from the client and/or may be translated/changed by other components within the data center (e.g., networking components such as gateways and routers). In some cases, packets are routed to more than one server 1521-1525 at a time, for example, if a game or application is running on multiple servers in parallel simultaneously. RAID arrays 1511-1512 are connected to the inbound routing network 1502 so that the app/game servers 1521-1525 can read from and write to the RAID arrays 1511-1512. Additionally, a RAID array 1515 (which may be implemented as multiple RAID arrays) is also connected to the inbound routing 1502, and data from RAID array 1515 can be read by the app/game servers 1521-1525. Inbound routing 1502 may be implemented in a wide range of prior-art network architectures, including a tree structure of switches with the inbound internet traffic 1501 at its root; a mesh structure interconnecting all of the various devices; or an interconnected series of subnets, with concentrated traffic among intercommunicating devices segregated from concentrated traffic among other devices. One type of network configuration is a SAN (storage area network) which, although typically used for storage devices, can also be used for general high-speed data transfer among devices. Also, the app/game servers 1521-1525 may each have multiple network connections to inbound routing 1502. For example, a server 1521-1525 may have one network connection to a subnet attached to the RAID arrays 1511-1512 and another network connection to a subnet attached to other devices.
The app/game servers 1521-1525 may all be configured identically, somewhat differently, or all differently, as previously described in relation to server 402 in the embodiment illustrated in FIG. 4a. In one embodiment, each user, when using the hosting service, typically uses at least one app/game server 1521-1525. For simplicity of explanation, it is assumed that a given user is using app/game server 1521, but one user could use multiple servers, and multiple users could share a single app/game server 1521-1525. The user's control input, sent from the client 415 as previously described, is received as inbound internet traffic 1501 and is routed through inbound routing 1502 to app/game server 1521. App/game server 1521 uses the user's control input as control input to the game or application running on the server and computes the next frame of video and the audio associated with it. App/game server 1521 then outputs the uncompressed video/audio 1529 to shared video compression 1530. The app/game server may output the uncompressed video via any means, including one or more Gigabit Ethernet connections, but in one embodiment the video is output via a DVI (Digital Visual Interface) connection, and the audio and other compression and communication channel state information is output via a Universal Serial Bus (USB) connection.
The shared video compression 1530 compresses the uncompressed video and audio from the app/game servers 1521-1525. The compression may be implemented entirely in hardware, or in hardware running software. There may be a dedicated compressor for each app/game server 1521-1525, or, if the compressors are fast enough, a given compressor can be used to compress the video/audio from more than one app/game server 1521-1525. For example, at 60 fps a video frame time is 16.67 milliseconds. If a compressor is able to compress a frame in 1 millisecond, then that compressor can be used to compress the video/audio from as many as 16 app/game servers 1521-1525. This results in substantial cost savings in compression hardware. Because different servers will complete frames at different times, in one embodiment the compressor resources are in a shared pool 1530 with shared storage (e.g., RAM, flash) for storing the state of each compression process, and when a server 1521-1525 frame is complete and ready to be compressed, a control means determines which compression resource is available at that time, and provides that compression resource with the state of the server's compression process and the frame of uncompressed video/audio to compress.
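A minimal sketch of such a shared compressor pool follows; the class, the contents of the per-server state dictionary, and the stand-in encoder are illustrative assumptions rather than the embodiment's interface.

```python
import queue

def encode_with_state(unit, ctx, frame):
    # Stand-in for the real compressor invocation on compressor `unit`.
    return bytes(frame)

class CompressorPool:
    """Shared pool of compressors. Each app/game server's compression state
    (previous frame buffer, tile structure, peak data rate, ...) lives in
    shared storage so any free compressor can take the next frame for any
    server."""
    def __init__(self, num_compressors):
        self.free = queue.Queue()
        for i in range(num_compressors):
            self.free.put(i)
        self.state = {}                          # server_id -> compression state

    def compress_frame(self, server_id, frame):
        unit = self.free.get()                   # block until a compressor is free
        try:
            ctx = self.state.setdefault(server_id, {"prev_frame": None})
            compressed = encode_with_state(unit, ctx, frame)
            ctx["prev_frame"] = frame            # reference for the next P tiles
            return compressed
        finally:
            self.free.put(unit)                  # return compressor to the pool
```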
Note that part of the state of each server's compression process includes information about the compression itself, such as the decompressed frame buffer data of the previous frame (which may be used as a reference for P tiles), the resolution of the video output, the quality of the compression, the tile structure, the allocation of bits per tile, and the audio format (e.g., stereo, surround sound, Dolby AC-3). But the compression process state also includes communication channel state information regarding the peak data rate 941, whether a previous frame is currently being output (as illustrated in fig. 9b, in which case the current frame should be ignored), and potentially whether there are channel characteristics, such as excessive packet loss, that should be taken into account because they affect compression decisions (e.g., in terms of the frequency of I tiles, etc.). Because the peak data rate 941 or other channel characteristics change over time, as determined by an app/game server 1521-1525 supporting each user by monitoring data sent from the client 415, the app/game server 1521-1525 sends the relevant information to the shared hardware compression 1530.
The shared hardware compression 1530 also packetizes the compressed video/audio using means such as those previously described, and, where appropriate, applies FEC codes, duplicates certain data, or takes other steps to adequately ensure that the video/audio data stream can be received by the client 415 and decompressed with as high a quality and reliability as is feasible.
Some applications, such as those described below, require the video/audio output of a given app/game server 1521-1525 to be available at multiple resolutions (or in other multiple formats) simultaneously. If the app/game server 1521-1525 so notifies the shared hardware compression 1530 resource, then the uncompressed video/audio 1529 of that app/game server 1521-1525 will be simultaneously compressed in different formats, at different resolutions, and/or with different packet/error-correction structures. In some cases, some of the compression resources can be shared among multiple compression processes compressing the same video/audio (e.g., in many compression algorithms there is a step whereby the image is scaled to multiple sizes before compression is applied; if differently sized images need to be output, this step can serve several compression processes at once). In other cases, separate compression resources will be required for each format. In any case, all of the various resolutions and formats of the compressed video/audio 1539 required for a given app/game server 1521-1525 are output at once to the outbound routing 1540. In one embodiment, the output of the compressed video/audio 1539 is in UDP format, so it is a unidirectional stream of packets.
The outbound routing network 1540 comprises a series of routing servers and switches which direct each compressed video/audio stream to the intended user(s) or other destinations via the outbound internet traffic 1599 interface (which typically would connect to a fiber interface to the internet) and/or back to the delay buffer 1515, and/or back to the inbound routing 1502, and/or out via a private network (not shown) for video distribution. Note that outbound routing 1540 may output a given video/audio stream to multiple destinations at once. In one embodiment this is implemented using Internet Protocol (IP) multicast, in which a given UDP stream intended to stream to multiple destinations at once is broadcast, and the broadcast is repeated by the routing servers and switches in the outbound routing 1540. The multiple destinations of the broadcast may be to multiple users' clients 415 via the internet, to multiple app/game servers 1521-1525 via inbound routing 1502, and/or to one or more delay buffers 1515. Thus, the output of a given server 1521-1525 can be compressed into one or many formats, and each compressed stream can be directed to one or many destinations.
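A minimal stand-in for that fan-out behavior is sketched below. A real deployment would rely on actual IP-multicast group membership in the switches and routers; the class here only mimics the "one packet in, one copy per subscribed destination out" effect, and all names are illustrative.

```python
class MulticastGroup:
    """Mimics the fan-out performed by outbound routing 1540: any packet
    pushed into the group is copied to every subscribed destination
    (user clients, the delay buffer, or other app/game servers)."""
    def __init__(self):
        self.subscribers = set()

    def subscribe(self, dest):
        self.subscribers.add(dest)

    def unsubscribe(self, dest):
        self.subscribers.discard(dest)

    def push(self, packet, send):
        for dest in self.subscribers:
            send(packet, dest)      # same UDP payload delivered to every destination
```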
Additionally, in another embodiment, if multiple app/game servers 1521-1525 are used simultaneously by one user (e.g., in a parallel-processing configuration to create the 3D output of a complex scene), with each server producing part of the resulting image, the video output of the multiple servers 1521-1525 can be combined by the shared hardware compression 1530 into a combined frame and, from that point forward, handled as described above as if it came from a single app/game server 1521-1525.
Note that, in one embodiment, a copy of all of the video generated by the app/game servers 1521-1525 is recorded in the delay buffer 1515 for at least some number of minutes (15 minutes in one embodiment). This allows each user to "rewind" the video from each session in order to review previous work or exploits (in the case of a game). Thus, in one embodiment, each compressed video/audio output 1539 stream being routed to a user client 415 is also multicast to the delay buffer 1515. When the video/audio is stored in the delay buffer 1515, a directory on the delay buffer 1515 provides a cross-reference between the network address of the app/game server 1521-1525 that is the source of the delayed video/audio and the location on the delay buffer 1515 where the delayed video/audio can be found.
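The sketch below shows one possible shape of such a delay buffer and its directory: a rolling 15-minute window keyed by the source server's address, from which delayed playback can be served. The class and its methods are illustrative assumptions.

```python
import collections
import time

class DelayBuffer:
    """Rolling buffer keyed by the source app/game server's network address,
    acting as both storage and the directory that maps an address to where
    its delayed video/audio can be found."""
    WINDOW_S = 15 * 60                    # keep the last 15 minutes

    def __init__(self):
        self.directory = collections.defaultdict(collections.deque)

    def record(self, source_addr, compressed_chunk):
        entries = self.directory[source_addr]
        now = time.time()
        entries.append((now, compressed_chunk))
        while entries and now - entries[0][0] > self.WINDOW_S:
            entries.popleft()             # age out anything older than the window

    def playback(self, source_addr, delay_s):
        """Return the chunks at least delay_s seconds old, e.g. for delayed viewing."""
        cutoff = time.time() - delay_s
        return [chunk for t, chunk in self.directory[source_addr] if t <= cutoff]
```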
Live, instantly viewable, instantly playable game
The app/game servers 1521-1525 may be used not only to run a given user's application or video game, but also to run user interface applications for the hosting service 210 that support navigation through the hosting service 210 and other features. A screenshot of one such user interface application, a "Game Finder" screen, is shown in FIG. 16. This particular user interface screen allows a user to watch 15 games that are being played live (or delayed) by other users. Each of the "thumbnail" video windows, such as 1600, is a live video window in motion showing the video from one user's game. The view shown in the thumbnail may be the same view that the user is seeing, or it may be a delayed view (e.g., if a user is playing a combat game, the user may not want other users to see where they are hiding, and they may choose to delay any view of their game play by some period of time, say 10 minutes). The view may also be a camera view of the game that is different from any user's view. Through menu selections (not shown in this illustration), a user may choose a selection of games to view simultaneously, based on a variety of criteria. As a small sampling of exemplary choices, the user may select a random selection of games (such as that shown in fig. 16), all games of one kind (all being played by different players), only the top-ranked players of a game, players at a given level in a game, or lower-ranked players (e.g., if the player is learning the basics), players who are "buddies" (or rivals), games with the largest number of viewers, and so on.
Note that, in general, each user will decide whether the video from their game or application can be viewed by others and, if so, which others may view it, when it may be viewed, and whether it may only be viewed with a delay.
The app/game server 1521-1525 that is generating the user interface screen shown in fig. 16 acquires the 15 video/audio feeds by sending a message to the app/game server 1521-1525 of each user whose game it is requesting. The message is sent via inbound routing 1502 or another network. The message includes the size and format of the video/audio requested and identifies the user viewing the user interface screen. A given user may choose to select "privacy" mode and not permit any other users to view the video/audio of their game (either from their point of view or from another point of view), or, as described in the previous paragraph, a user may choose to allow the video/audio from their game to be viewed, but delayed. A user app/game server 1521-1525 that receives and accepts a request to allow its video/audio to be viewed acknowledges the requesting server, and it also notifies the shared hardware compression 1530 that an additional compressed video stream in the requested format or screen size needs to be generated (assuming the format and screen size are different from those already being generated), and it also indicates the destination for the compressed video (i.e., the requesting server). If the requested video/audio is only delayed, the requesting app/game server 1521-1525 is so notified, and it acquires the delayed video/audio from the delay buffer 1515 by looking up the video/audio's location in the directory on the delay buffer 1515 and the network address of the app/game server 1521-1525 that is the source of the delayed video/audio. Once all of these requests have been generated and handled, up to 15 live thumbnail-sized video streams are routed from the outbound routing 1540 to the inbound routing 1502 to the app/game server 1521-1525 generating the user interface screen, and they are decompressed and displayed by that server. The delayed video/audio streams may be at too large a screen size, and if so, the app/game server 1521-1525 decompresses the streams and scales the video down to thumbnail size. In one embodiment, requests for audio/video are sent to (and are managed by) a central "management" service similar to the hosting service control system of fig. 4a (not shown in FIG. 15), which then redirects the requests to the appropriate app/game server 1521-1525. Moreover, in one embodiment, no request may be required at all because the thumbnails are "pushed" to the clients of those users who allow it.
The audio from 15 games all mixed simultaneously might create a cacophony. The user may choose to mix all of the sounds together in this way (perhaps just to get a sense of the "din" created by all of the action being viewed), or the user may choose to listen to the audio from only one game at a time. The selection of a single game is accomplished by moving the yellow selection box 1601 to a given game (the yellow box movement can be accomplished by using the arrow keys on a keyboard, by moving a mouse, by moving a joystick, or by pushing the directional buttons on another device such as a mobile phone). Once a single game is selected, only the audio from that game is played. Also, game information 1602 is shown. In the case of this game, for example, the publisher logo ("EA") and the game logo "Mastership Car Canyon" and an orange bar indicate, in relative terms, the number of people playing or viewing the game at that particular moment (many, in this case, so the game is "hot"). Further, "stats" are provided, indicating that there are 80 different instantiations of the game being actively played by 145 players (i.e., the game may be played either as an individual-player game or as a multiplayer game), and that there are 680 viewers (of which this user is one). Note that these statistics (and other statistics) are collected by the hosting service control system 401 and stored on the RAID arrays 1511-1512 for keeping logs of the hosting service 210 operation and for appropriately billing users and paying the publishers who provide content. Some statistics are recorded as a result of actions taken by the hosting service control system 401, and some are reported to the hosting service control system 401 by the individual app/game servers 1521-1525. For example, the app/game server 1521-1525 running this Game Finder application sends messages to the hosting service control system 401 when games are being viewed (and when they cease being viewed), so that the hosting service control system 401 can update its statistics of how many games are being viewed. Some of the statistics are made available to user interface applications such as this Game Finder application.
If the user clicks an activation button on their input device, they will see the thumbnail video in the yellow box zoom up, while continuing to play live, to full screen size. This effect is shown in process in fig. 17. Note that video window 1700 has grown in size. To implement this effect, the app/game server 1521-1525 running the Game Finder application sends a message to the app/game server 1521-1525 running the selected game requesting that a copy of the game's video stream at full screen size (at the resolution of the user's display device 422) be routed to it. The app/game server 1521-1525 running the game notifies the shared hardware compressor 1530 that a thumbnail-sized copy of the game is no longer needed (unless another app/game server 1521-1525 requires such a thumbnail) and directs it to send a full-screen-size copy of the video to the app/game server 1521-1525 zooming the video. The user playing the game may or may not have a display device 422 of the same resolution as that of the user zooming up the game. Further, other viewers of the game may or may not have display devices 422 of the same resolution as the user zooming up the game (and may have different audio playback means, e.g., stereo or surround sound). Thus, the shared hardware compressor 1530 determines whether a suitable compressed video/audio stream that meets the requirements of the user requesting the video/audio stream is already being generated, and if one does exist, it notifies the outbound routing 1540 to route a copy of the stream to the app/game server 1521-1525 zooming the video; if a suitable compressed stream does not exist, it compresses another copy of the video that is suitable for that user and instructs the outbound routing to send the stream back to the inbound routing 1502 and to the app/game server 1521-1525 zooming the video. The server, now receiving a full-screen version of the selected video, decompresses it and gradually scales it up to full size.
Fig. 18 illustrates how the screen looks after the game has fully zoomed up to full screen and the game is shown at the full resolution of the user's display device 422, as indicated by the image pointed to by arrow 1800. The app/game server 1521-1525 running the Game Finder application sends messages to the other app/game servers 1521-1525 that had been providing thumbnails, indicating that the thumbnails are no longer needed, and a message to the hosting service control server 401 indicating that the other games are no longer being viewed. At this point, the only display it generates is an overlay 1801 at the top of the screen, which provides information and menu controls to the user. Note that as this game has progressed, the audience has grown to 2,503 viewers. With so many viewers, there are bound to be many viewers with display devices 422 of the same or close resolution (each app/game server 1521-1525 has the ability to scale the video to suit the display).
Because the game shown is a multiplayer game, the user may decide to join the game at some point. The hosting service 210 may or may not allow the user to join the game for a variety of reasons. For example, the user may have to pay to play the game and choose not to, the user may not have sufficient ranking to join that particular game (e.g., they would not be competitive with the other players), or the user's internet connection may not have low enough latency to allow the user to play (e.g., there is no latency constraint for viewing games, so a game being played far away, indeed on another continent, can be watched without latency concerns, but for a game to be played, the latency must be low enough for the user to (a) enjoy the game and (b) be on equal footing with other players who may have lower-latency connections). If the user is permitted to play, then the app/game server 1521-1525 that has been providing the Game Finder user interface requests that the hosting service control server 401 initiate an app/game server 1521-1525 that is suitably configured for the particular game and load the game from a RAID array 1511-1512, and then the hosting service control server 401 directs the inbound routing 1502 to transfer the control signals from the user to the app/game server now hosting the game, and the app/game server now hosting the game directs the shared hardware compression 1530 to switch from compressing the video/audio from the app/game server that was hosting the Game Finder application to compressing the video/audio from the app/game server now hosting the game. The vertical syncs of the Game Finder app/game service and the new app/game server hosting the game are not synchronized, and as a result there is likely to be a time difference between the two. Because the shared video compression hardware 1530 begins compressing video as soon as an app/game server 1521-1525 completes a video frame, the first frame from the new server may be completed sooner than a full frame time of the old server, and it may be completed before the prior compressed frame has completed its transmission (e.g., consider transmission time 992 of fig. 9b: if uncompressed frame 3 963 were completed half a frame time early, it would impinge upon transmission time 992). In such a situation, the shared video compression hardware 1530 ignores the first frame from the new server (e.g., like frame 4 964 is ignored 974), the client 415 holds the last frame from the old server for an extra frame time, and the shared video compression hardware 1530 begins compressing the next frame time of video from the new app/game server hosting the game. Visually, to the user, the transition from one app/game server to the other will be seamless. The hosting service control server 401 then notifies the app/game server 1521-1525 that had been hosting the Game Finder to switch to an idle state until it is needed again.
The user is then able to play the game. And, what is exceptional is that the game will play perceptually instantly (since it has been loaded onto the app/game server 1521-1525 from a RAID array 1511-1512 at gigabit/second speed), and the game will be loaded onto a server exactly suited for the game, together with an operating system exactly configured for the game, with ideal drivers, registry configuration (in the case of Windows), and with no other applications running on the server that might compete with the game's operation.
Also, as the user progresses through the game, each of the segments of the game will be loaded into the server from a RAID array 1511-1512 at gigabit/second speed. Furthermore, because the hardware configuration and computational capability of each app/game server 1521-1525 are known, pixel and vertex shaders can be pre-computed.
Thus, the game may start almost instantaneously, it will execute in an ideal environment, and subsequent segments will load almost instantaneously.
But beyond these advantages, the user will be able to watch others playing the game (via the Game Finder, described previously, and other means) and both decide whether the game is interesting, and if so, learn tips from watching others. And the user will be able to demo the game instantly, without having to wait for a large download and/or installation, and the user will be able to play the game instantly (perhaps on a less-expensive trial basis, or on a longer-term basis). And the user will be able to play the game on a Windows PC, a Macintosh, on a television set, at home, while traveling, and even on a mobile phone with a wireless connection of sufficiently low latency. And this can all be done without ever physically possessing a copy of the game.
As stated previously, a user may decide not to allow their game play to be viewable by others, to allow their game to be viewable after a delay, to allow their game to be viewable by selected users, or to allow their game to be viewable by all users. Regardless, in one embodiment, the video/audio is stored in the delay buffer 1515 for 15 minutes, and the user is able to "rewind" and view their prior game play, pause it, play it back slowly, fast-forward it, etc., just as they could when watching TV with a Digital Video Recorder (DVR). Although in this example the user is playing a game, the same "DVR" capability is available if the user is using an application. This can be helpful in reviewing prior work, and in other applications as detailed below. Further, if a game is designed with the capability of rewinding based on game state information, so that the camera view and the like can be changed, then this "3D DVR" capability is also supported, but it requires the game to be designed to support it. The "DVR" capability using the delay buffer 1515 works with any game or application (limited, of course, to the video generated when the game or application was used), but in the case of games with 3D DVR capability, the user can control a 3D "fly-through" of a previously played segment and have the delay buffer 1515 record the resulting video, while the game state of the game segment is also recorded. Thus, a particular "fly-through" is recorded as compressed video, but because the game state is also recorded, a different fly-through of the same segment of the game will be possible at a later date.
As described below, each user on the hosting service 210 has a user page where the user can post information about themselves and other data. Among the things users are able to post are video segments from game play that they have saved. For example, if a user has overcome a particularly difficult challenge in a game, the user can "rewind" to just before the point where they achieved their great accomplishment in the game and then instruct the hosting service 210 to save a video segment of some duration (e.g., 30 seconds) on the user's user page for other users to watch. To implement this, it is simply a matter of the app/game server 1521-1525 that the user is using playing back the video stored in the delay buffer 1515 to a RAID array 1511-1512 and then indexing that video segment on the user's user page.
If the game has 3D DVR capabilities, as described above, the game state information needed for 3D DVR can also be recorded by the user and made available to the user's user page.
In the case where a game is designed with the capability of having "spectators" (i.e., users who are able to travel through the 3D world and observe the action without participating in it) in addition to active players, then the Game Finder application will enable users to join games as spectators as well as players. To the hosting service 210, there is no difference whether the user is a spectator or an active player. The game is loaded onto an app/game server 1521-1525 and the user will control the game (e.g., controlling a virtual camera that views the world). The only difference is the user's game experience.
Multiple user collaboration
Another feature of the hosting service 210 is the ability for multiple users to collaborate while watching live video (even if viewed using widely different devices). This is useful both when playing games and when using applications.
Many PCs and mobile phones are equipped with video cameras and have the capability to do real-time video compression, particularly when the image is small. Also, small cameras are available that can be attached to a television, and it is not difficult to implement real-time compression, either in software or using one of the many hardware compression devices for compressing video. Also, many PCs and all mobile phones have microphones, and headsets are available with microphones.
Such cameras and/or microphones, combined with local video/audio compression capability (particularly employing the low-latency video compression techniques described herein), enable a user to transmit video and/or audio from the user premises 211 to the hosting service 210, together with the input device control data. When such techniques are employed, the capability illustrated in FIG. 19 is achievable: a user can have their video and audio 1900 appear on the screen within another user's game or application. This example is a multiplayer game where teammates collaborate in a car race. A user's video/audio can be viewed/heard, selectively, only by their teammates. And, since there would be effectively no latency, using the techniques described above, the players are able to talk or gesture to each other in real time without perceptible delay.
This video/audio integration is accomplished by having the compressed video and/or audio from a user's camera/microphone arrive as inbound internet traffic 1501. The inbound routing 1502 then routes the video and/or audio to the app/game servers 1521-1525 that are permitted to view/hear it. Then, the users of the respective app/game servers 1521-1525 who choose to use the video and/or audio decompress it and integrate it as desired to appear within the game or application, such as illustrated by 1900.
The example of FIG. 19 shows how such collaboration is used in a game, but such collaboration can be an immensely powerful tool for applications. Consider a situation where a large building is being designed for New York City by architects in Chicago for a New York-based real estate developer, but the decision involves a financial investor who is traveling and happens to be at the airport in Miami, and a decision needs to be made about certain design elements of the building, in terms of how they fit with the buildings near it, to satisfy both the investor and the real estate developer. Assume that the architectural firm has a high-resolution monitor with a camera attached to a PC in Chicago, the real estate developer has a laptop with a camera in New York, and the investor has a mobile phone with a camera in Miami. The architectural firm can use the hosting service 210 to host a powerful architectural design application that is capable of highly realistic 3D rendering, and it can make use of a large database of the buildings in New York City, as well as a database of the building under design. The architectural design application executes on one, or on several if it requires a great deal of computational power, of the app/game servers 1521-1525. Each of the 3 users at disparate locations connects to the hosting service 210, and each has a simultaneous view of the video output of the architectural design application, but it is appropriately sized by the shared hardware compression 1530 for the given device and network connection characteristics that each user has (e.g., the architectural firm may see a 2560 x 1440, 60 fps display through a 20 Mbps commercial internet connection, the real estate developer in New York may see a 1280 x 720, 60 fps image over a 6 Mbps DSL connection on their laptop, and the investor may see a 320 x 180, 60 fps image over a 250 Kbps cellular data connection on their mobile phone). Each party hears the voices of the other parties (the conference calling will be handled by any of many widely available conference calling software packages in the app/game servers 1521-1525), and, through actuation of a button on a user input device, a user can make video of themselves appear using their local camera. As the meeting proceeds, the architects will be able to show what the building looks like as they rotate it and fly by it next to the other buildings in the area, with extremely photorealistic 3D rendering, and all parties will see the same video, each at the resolution of their own display device. It won't matter that none of the local devices used by any party is capable of handling 3D animation with such realism, let alone downloading or even storing the vast database required to render the surrounding buildings of New York City. From the point of view of each of the users, despite the distance apart and despite the disparate local devices, they will simply have a seamless experience with an incredible degree of realism. And, when one party wants their face to be seen to better convey their emotional state, they can do so. Further, if either the real estate developer or the investor wants to take control of the architectural program and use their own input device (be it a keyboard, mouse, keypad, or touch screen), they can, and it will respond with no perceptible latency (assuming their network connection does not have unreasonable latency). For example, in the case of the mobile phone, if the mobile phone is connected to a WiFi network at the airport, it will have very low latency.
But if the phone is using the cellular data networks available today in the United States, it will probably suffer a noticeable lag. Still, for most of the purposes of the meeting, where the investor is watching the architects control a fly-through of the building or talking on the video teleconference, even cellular latency should be acceptable.
Finally, at the end of the collaborative conference call, the real estate developer and the investor will have made their comments and signed off from the hosting service, and the architectural firm will be able to "rewind" the video of the conference that has been recorded on the delay buffer 1515 and review the comments, facial expressions, and/or actions applied to the 3D model of the building made during the meeting. If there are particular segments they want to save, those segments of video/audio can be moved from the delay buffer 1515 to a RAID array 1511-1512 for archival storage and later playback.
Also, from a cost perspective, if the architects only need to use the computational power and the large database of New York City for a 15-minute conference call, they need only pay for the time the resources are used, rather than having to own high-powered workstations and having to purchase an expensive copy of the large database.
Video rich community service
The hosting service 210 gives rise to an opportunity to build video-rich community services on the internet. FIG. 20 shows an exemplary user page for a game player on the hosting service 210. As with the Game Finder application, the user page is an application that runs on one of the app/game servers 1521-1525. All of the thumbnails and video windows on this page show constantly moving video (if the segments are short, they loop).
Using a video camera or by uploading video, the user (whose username is "KILLHAZARD") is able to post a video 2000 of himself that other users can view. The video is stored on a RAID array 1511-1512. Also, when other users come to KILLHAZARD's user page, if KILLHAZARD is using the hosting service 210 at the time, live video 2001 of whatever he is doing will be shown (assuming KILLHAZARD permits users viewing his user page to watch him). This is accomplished by the app/game server 1521-1525 hosting the user page application requesting from the hosting service control system 401 whether KILLHAZARD is active and, if so, which app/game server 1521-1525 he is using. Then, using the same methods used by the Game Finder application, a compressed video stream of a suitable resolution and format is sent to the app/game server 1521-1525 running the user page application, and it is displayed. If a user selects the window with KILLHAZARD's live game play and then clicks appropriately on their input device, the window zooms up (again using the same methods as the Game Finder application), and the live video fills the screen at the resolution of the watching user's display device 422, appropriate for the characteristics of the watching user's internet connection.
A key advantage of this over prior-art approaches is that a user viewing a user page is able to see a game played live that the user does not own, and may very well not have a local computer or game console capable of playing. It offers a great opportunity for the user to see the user shown in the user page "in action" playing a game, and it is an opportunity to learn about a game that the viewing user might want to try or get better at.
Camera-recorded or uploaded video clips from KILLHAZARD's buddies 2002 are also shown on the user page, and underneath each video clip is text indicating whether the buddy is online playing a game (e.g., six_shot is playing the game "Eragon" and MrSnuggles99 is offline, etc.). By clicking on a menu item (not shown), the buddy video clips switch from showing recorded or uploaded videos to live video of what the buddies who are currently playing games on the hosting service 210 are doing at that moment in their games. So, it becomes a Game Finder grouped by buddies. If a buddy's game is selected and the user clicks on it, the game will zoom up to full screen, and the user will be able to watch the game being played live, full screen.
Again, the user viewing the buddy's game does not own a copy of the game, nor the local computing/game console resources to play the game. The game viewing is effectively instantaneous.
As previously described above, when a user plays a game on the hosting service 210, the user is able to "rewind" the game, find a video segment they want to save, and then save the video segment to their user page. These are called "Brag Clips". The video segments 2003 are all Brag Clips 2003 saved by KILLHAZARD from previous games he has played. Number 2004 shows how many times a Brag Clip has been viewed, and when the Brag Clip is viewed, users have an opportunity to rate it, and the number of orange keyhole-shaped icons 2005 indicates how high the rating is. The Brag Clips 2003 loop constantly when a user views the user page, along with the rest of the video on the page. If the user selects and clicks on one of the Brag Clips 2003, it zooms up to present the Brag Clip 2003, along with DVR controls that allow the clip to be played, paused, rewound, fast-forwarded, stepped through, etc.
Brag Clip 2003 playback is implemented by the app/game server 1521-1525 that is hosting the user page loading the compressed video segment stored on a RAID array 1511-1512 when the user views the user page, decompressing it, and playing it back.
A Brag Clip 2003 can also be a "3D DVR" video segment (i.e., a sequence of game states from a game that can be replayed and that allows the user to change the camera viewpoint) from a game that supports the 3D DVR capability. In this case, the game state information is stored in addition to a compressed video recording of the particular "fly-through" the user made when the game segment was recorded. When the user page is being viewed and all of the thumbnails and video windows are constantly looping, a 3D DVR Brag Clip 2003 constantly loops the Brag Clip 2003 that was recorded as compressed video when the user recorded the "fly-through" of the game segment. But when a user selects a 3D DVR Brag Clip 2003 and clicks on it, in addition to the DVR controls that allow the compressed video Brag Clip to be played, the user will be able to click on a button that gives them 3D DVR capability for the game segment. They will be able to control a camera "fly-through" during the game segment independently, and, if they wish (and the user owning the user page allows it), they will be able to record an alternative Brag Clip "fly-through" in compressed video form, which will then be available to other viewers of the user page (either immediately, or after the owner of the user page has had a chance to review the Brag Clip).
This 3D DVR Brag Clip 2003 capability is enabled by activating a game that is about to replay the recorded game state information on another app/game server 1521-1525. Because the game can be activated almost instantaneously (as previously described), it is not difficult to activate it, with its play limited to the game state recorded for the Brag Clip, and then allow the user to do a "fly-through" with a camera while recording the compressed video to the delay buffer 1515. Once the user has completed the "fly-through", the game is deactivated.
From the user's point of view, doing a "fly-through" with a 3D DVR Brag Clip 2003 is no more effort than controlling the DVR controls of a linear Brag Clip 2003. They may know nothing about the game, or even how to play it. They are simply a virtual camera operator peering into a 3D world during a game segment recorded by another user.
Users will also be able to overdub their own audio onto Brag Clips, either recorded from a microphone or uploaded. In this way, Brag Clips can be used to create custom animations using characters and actions from games. This animation technique is commonly known as "machinima".
As users progress through games, they will achieve differing skill levels. The games played report the accomplishments to the hosting service control system 401, and these skill levels will be shown on the users' pages.
Interactive animated advertising
Online advertising has transitioned from text, to still images, to video, and now to interactive segments, typically implemented using animation thin clients such as Adobe Flash. The reason animation thin clients are used is that users generally have little patience for being delayed for the privilege of having a product or service pitched to them. Also, thin clients run on very low-performance PCs, and thus the advertiser can have a high degree of confidence that the interactive ad will work properly. Unfortunately, animation thin clients such as Adobe Flash are limited in the degree of interactivity and the duration of the experience (to mitigate download time).
FIG. 21 illustrates an interactive advertisement in which the user selects the exterior and interior colors of a car while the car rotates in a showroom, as real-time ray tracing shows how the car looks. Then the user chooses a character to drive the car, and the user can take the car for a drive, either on a race track or through an exotic locale such as Morna. The user can select a larger engine or better tires, and then can see how the changed configuration affects the car's ability to accelerate or hold the road.
Of course, the advertisement is effectively a sophisticated 3D video game. But for such an advertisement to be playable on a PC or a video game console, it would require perhaps a 100MB download, and, in the case of a PC, it might require the installation of special drivers, and it might not run at all if the PC lacks adequate CPU or GPU computational capability. Thus, such advertisements are impractical in prior-art configurations.
In the hosting service 210, such advertisements launch almost instantly and run as intended, regardless of the capabilities of the user's client 415. So, they launch more quickly than thin-client interactive ads, are vastly richer in experience, and are highly reliable.
Streaming geometry during real-time animation
With RAID arrays 1511-1512 and inbound routing 1502 able to provide data rates that are very fast and with very low latency, it is possible to design video games and applications that rely upon the RAID arrays 1511-1512 and inbound routing 1502 to reliably deliver geometry on the fly in the midst of game play, or in an application during real-time animation (e.g., a fly-through with a complex database).
With prior-art systems, such as the video game system shown in fig. 1, the mass storage devices available, particularly in practical home devices, are far too slow to stream geometry in during game play, except in situations where the required geometry is somewhat predictable. For example, in a driving game where there is a specified roadway, the geometry for the buildings that are coming into view can be reasonably well predicted, and the mass storage device can seek in advance to the location where the upcoming geometry is located.
But in a complex scene with unpredictable changes (e.g., in a battle scene with complex characters all around), if the RAM on the PC or video game system is completely filled with the geometry for the objects currently in view, and then the user suddenly turns their character around to view what is behind them, if the geometry has not been pre-loaded into RAM, there may be a delay before it can be displayed.
In the hosting service 210, RAID arrays 1511-1512 can stream data at speeds exceeding Gigabit Ethernet speeds, and with a SAN network it is possible to achieve 10 gigabit/second speeds over 10 Gigabit Ethernet or over other network technologies. 10 gigabits/second will load a gigabyte of data in less than a second. In a 60 fps frame time (16.67 ms), approximately 170 megabits (21 MB) of data can be loaded. Rotating media, of course, even in a RAID configuration, will still incur latencies greater than a frame time, but flash-based RAID storage will eventually be as large as rotating-media RAID arrays and will not incur such high latency. In one embodiment, a very large RAM-based write-through cache is used to provide very low-latency access.
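The bandwidth figures above can be checked with a short calculation; the variable names are illustrative.

```python
# Geometry that can arrive within one 60 fps frame time over a
# 10 gigabit/second link (figures from the text):
LINK_BITS_PER_S = 10e9
FRAME_TIME_S = 1.0 / 60.0                           # 16.67 ms

bits_per_frame = LINK_BITS_PER_S * FRAME_TIME_S     # about 1.67e8 bits (170 megabits)
megabytes_per_frame = bits_per_frame / 8 / 1e6      # about 21 MB per frame time
seconds_per_gigabyte = 8e9 / LINK_BITS_PER_S        # 0.8 s to load 1 GB (< 1 second)
```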
Thus, with a sufficiently fast network and sufficiently low-latency mass storage, geometry can be streamed into the app/game servers 1521-1525 as fast as the CPUs and/or GPUs can process the 3D data. Thus, in the example given previously, in which the user suddenly turns their character around and looks back, the geometry for all of the characters behind them can be loaded before the character completes the rotation, and thus, to the user, it will seem as if they are in a photo-realistic world as real as live action.
As discussed previously, one of the last frontiers of photo-realistic computer animation is the human face, and because of the sensitivity of the human eye to imperfections, the slightest error relative to a photo-real face can result in a negative reaction from the viewer. FIG. 22 shows how a live performance captured using Contour™ Reality Capture technology (the subject of the following co-pending applications, each of which is assigned to the assignee of the present CIP application: Ser. No. 10/942,609, "Apparatus and method for capturing the motion of a performer," filed September 15, 2004; Ser. No. 10/942,413, "Apparatus and method for capturing the expression of a performer," filed September 15, 2004; Ser. No. 11/066,954, filed February 25, 2005; Ser. No. 11/077,628, filed October 10, 2005; Ser. No. 11/255,854, filed October 20, 2005; a co-pending application directed to motion capture using a fluorescent lamp; and Ser. No. 11/449,127, filed June 7, 2006) results in a very smooth captured surface, and then in a high-polygon-count tracked surface (i.e., the polygon motion follows the motion of the face precisely). Finally, when the video of the live performance is mapped onto the tracked surface to produce a textured surface, a photo-realistic result is produced.
While current GPU technology is capable of rendering the number of polygons in the tracked surface and texture, and of lighting the surface, in real time, if the polygons and textures change every frame time (which would produce the most photo-realistic results), doing so would quickly consume all of the available RAM of a modern PC or video game console.
Using the streaming geometry techniques described above, it becomes practical to continuously feed geometry into the app/game servers 1521-1525, so that they can continuously animate photo-realistic faces, allowing video games to be created with faces that are almost indistinguishable from live action faces.
Integration of linear content with interactive features
Movies, television programs, and audio material (collectively, "linear content") are widely available to home and office users in many forms. Linear content can be obtained on physical media, such as CD, DVD, HD-DVD, and Blu-ray media. It can also be recorded by DVRs from satellite and cable TV broadcasts. And it is available as pay-per-view (PPV) content via satellite and cable TV and as video-on-demand (VOD) on cable TV.
Increasingly, linear content is available via the Internet, both as downloaded content and as streaming content. Today, there really is no one place to experience all of the features associated with a given piece of linear media. For example, DVDs and other video optical media often have interactive features (such as director's commentary, "making of" featurettes, etc.) that are not available elsewhere. Online music sites have cover art and song information that is generally not available on CDs, but not all CDs are available online. And websites associated with television shows often have extra features, blogs, and sometimes comments from the actors or creative staff.
In addition, with many movies or sports events, there are often video games that are released (in the case of movies) along with the linear media, or (in the case of sports) that can be closely tied to real-world events (e.g., player trades).
The hosting service 210 is well suited for delivering linear content, linking together the disparate forms of related content. Certainly, delivering movies is less challenging than delivering highly interactive video games, and the hosting service 210 is able to deliver linear content to a wide range of devices in the home or office, or to mobile devices. FIG. 23 shows an exemplary user interface page for the hosting service 210 showing a selection of linear content.
However, unlike most linear content delivery systems, the hosting service 210 is also able to deliver the related interactive components (e.g., the menus and features on DVDs, the interactive overlays on HD-DVDs, and the Adobe Flash animation on websites (as explained below)). Thus, client device 415 limitations no longer impose limitations on which features are available.
In addition, the hosting system 210 is able to link linear content with video game content dynamically and in real time. For example, if a user is watching a Quidditch game in a Harry Potter movie and decides they would like to try playing Quidditch, they can simply click a button, the movie will pause, and they will be immediately transported to the Quidditch segment of a Harry Potter video game. After playing the Quidditch match, another click of a button, and the movie resumes instantly.
With photo-realistic graphics and production techniques in which camera-captured video is indistinguishable from live action characters, when a user makes the transition from a Quidditch game in a live action movie to a Quidditch game in a video game on a hosting service (as described herein), the two scenes are virtually indistinguishable. This provides entirely new creative options for directors of both linear content and interactive (e.g., video game) content, as the lines between the two worlds become indistinguishable.
With the hosting service architecture shown in fig. 14, control of the virtual camera in a 3D movie can be provided to the viewer. For example, in a scene occurring within a train, it would be possible to allow a viewer to control the virtual camera and look around the train as the story progresses. This assumes that all 3D objects ("assets") in the train are available, as well as a sufficient level of computing power to be able to render the scene and the original movie in real time.
And even for non-computer-generated entertainment, there are very exciting interactive features that can be offered. For example, the 2005 movie "Pride and Prejudice" had many scenes set in ornate old English mansions. For certain mansion scenes, the user could pause the video and then control the camera to take a tour of the mansion, or possibly of the surrounding grounds. To implement this, a camera with a fish-eye lens could be carried through the mansion, tracking its position, much like prior-art Apple, Inc. QuickTime VR is implemented. The various frames would then be transformed so the images are not distorted, and then stored on the RAID arrays 1511-1512 along with the movie and played back when the user chooses to go on a virtual tour.
In the case of sports, live sporting events (such as a basketball game) could be streamed through the hosting service 210 for users to watch, as they would for regular TV. After users watch a particular play, a video game of the game (eventually with basketball players who look as photo-realistic as the real players) could come up with the players starting in the same position, and the users (perhaps each taking control of one player) could redo the play to see if they could do better than the real players.
The hosting service 210 described herein is extremely well suited to support this futuristic world because it can bring to bear computing power and mass storage resources that are impractical to install in a home or in most office settings, and because its computing resources are always up to date, with the latest computing hardware available, whereas in a home setting there will always be homes with older-generation PCs and video games. Furthermore, in the hosting service 210, all of this computing complexity is hidden from the user, so even though the user may be using a very sophisticated system, from the user's point of view it is as simple as changing channels on a television. Additionally, the user will be able to access all of the computing power, and the experiences that computing power brings, from any client 415.
Multiplayer games
To the extent that a game is a multiplayer game, it will be able to communicate not only with the app/game servers 1521-1525 through the inbound routing 1502 network, but also, through a network bridge to the Internet (not shown), with servers or game machines that are not executing within the hosting service 210. When playing a multiplayer game with computers on the general Internet, the app/game servers 1521-1525 will have the benefit of extremely fast access to the Internet (compared to a game executing on a server in the home), but they will be limited by the capabilities of the other computers playing the game over slower connections, and potentially also by the fact that game servers on the Internet were designed to accommodate the least common denominator, namely home computers on relatively slow consumer Internet connections.
A significant difference can be achieved when a multiplayer game is played entirely within a hosting service 210 server center. Each app/game server 1521-1525 hosting the game for a user will be interconnected with the other app/game servers 1521-1525, and with any servers hosting the central control for the multiplayer game, with extremely high-speed, extremely low-latency connectivity and vast, very fast storage arrays. For example, if Gigabit Ethernet is used for the inbound routing 1502 network, the app/game servers 1521-1525 can communicate among themselves, and with any server hosting the central control for the multiplayer game, at gigabit/second speeds with potentially only 1 millisecond or less of latency. In addition, the RAID arrays 1511-1512 will be able to respond very quickly and then transfer data at gigabit/second speeds. As an example, if a user customizes a character in terms of appearance and clothing such that the character has a large amount of geometry and behaviors unique to that character, then with prior art systems limited to a game client executing in the home on a PC or game console, if that character were to come into view of another user, the other user would have to wait for a long, slow download to complete before all of the geometry and behavior data loaded into their computer. Within the hosting service 210, that same download can be served over Gigabit Ethernet from the RAID arrays 1511-1512 at gigabit/second speed. Even if the home user had an 8Mbps Internet connection (which is extremely fast by today's standards), Gigabit Ethernet is roughly 100 times faster. Thus, what would take a minute over a fast Internet connection takes less than one second over Gigabit Ethernet.
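As a rough, illustrative calculation of this difference (the 60MB asset size is hypothetical, chosen only to make the ratio concrete):

    # Illustrative download-time comparison for a hypothetical 60MB
    # custom-character asset (size chosen for illustration only).
    asset_bits = 60 * 8e6         # 60 megabytes expressed in bits
    home_bps = 8e6                # 8Mbps consumer Internet connection
    gige_bps = 1e9                # Gigabit Ethernet within the server center

    print(asset_bits / home_bps)  # 60.0 seconds over the home connection
    print(asset_bits / gige_bps)  # 0.48 seconds over Gigabit Ethernet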
Top player groupings and tournaments
The hosting service 210 is well suited for tournaments. Because no game executes in a local client, there is no opportunity for users to cheat. Also, because of the ability of the output routing 1540 to multicast the UDP streams, the hosting service 210 is able to broadcast a major tournament to thousands of people in the audience at once.
In fact, when certain video streams are so popular that thousands of users are receiving the same stream (e.g., a view of a major tournament), it may be more efficient to send the video stream to a Content Delivery Network (CDN), such as Akamai or Limelight, for mass distribution to many client devices 415.
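A minimal sketch of that delivery decision (the audience threshold and the returned path labels are illustrative assumptions, not part of the embodiment) might look like this:

    # Sketch: pick the delivery path for a spectator stream by audience size.
    # The threshold and the returned path labels are illustrative assumptions.
    CDN_HANDOFF_THRESHOLD = 1000   # assumed audience size at which a CDN pays off

    def deliver_spectator_stream(stream_id, audience_size):
        if audience_size >= CDN_HANDOFF_THRESHOLD:
            # Push one copy of the stream to the CDN and let it fan out.
            return ("cdn", stream_id)
        # Otherwise multicast directly from the output routing.
        return ("multicast", stream_id)

    print(deliver_spectator_stream("tournament-final", 25000))  # ('cdn', ...)
    print(deliver_spectator_stream("qualifier-3", 40))          # ('multicast', ...)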
A similar level of efficiency may be obtained when using a CDN to display a game finder page for a top level player grouping.
For major tournaments, a live celebrity announcer can be used to provide commentary during certain matches. Although a large number of users will be watching a major tournament, relatively few will be playing in it. The audio from the celebrity announcer can be routed to the app/game servers 1521-1525 hosting the users playing in the tournament and hosting any spectator-mode copies of the game, and that audio can be overdubbed on top of the game audio. Video of the celebrity announcer can be overlaid on the game, perhaps just on the spectator views.
Acceleration of web page loading
The World Wide Web's primary transport protocol, Hypertext Transfer Protocol (HTTP), was conceived and defined in an era when only businesses had high-speed Internet connections and consumers who were online used dial-up modems or ISDN. At the time, the "gold standard" for a fast connection was a T1 line, which provided a 1.5Mbps data rate symmetrically (i.e., with equal data rates in both directions).
Today, the situation is completely different. The average home connection speed through DSL or cable modem connections in much of the developed world has a far higher downstream data rate than a T1 line. In fact, in some parts of the world, fiber-to-the-curb is bringing data rates of 50 to 100Mbps into the home.
Unfortunately, HTTP was not architected (nor implemented) to take effective advantage of these dramatic speed improvements. A website is a collection of files on a remote server. In very simple terms, HTTP requests the first file, waits for the file to be downloaded, then requests the second file, waits for that file to be downloaded, and so on. In fact, HTTP allows more than one "open connection" (i.e., more than one file to be requested at a time), but because of agreed-upon standards (and a desire to prevent web servers from being overloaded), only very few open connections are permitted. Moreover, because of the way web pages are constructed, browsers are often unaware of multiple simultaneous files that could be available for immediate download (i.e., it only becomes apparent, after parsing a page, that a new file, such as an image, needs to be downloaded). Thus, files on a website are essentially loaded one by one. And, because of the request-and-response protocol used by HTTP, there is roughly (accessing typical web servers in the United States) a 100 millisecond latency associated with each file loaded.
With a relatively slow connection, this does not introduce much of a problem, because the download time for the files themselves dominates the wait for the web page. But, as connection speeds grow, especially with complex web pages, problems begin to arise.
In the example shown in FIG. 24, a typical commercial website is shown (this particular website is from a major athletic shoe brand). The website has 54 files. The files include HTML, CSS, JPEG, PHP, JavaScript, and Flash files, and include video content. A total of 1.5 MBytes must be loaded before the page is live (i.e., before the user can click on it and begin to use it). There are a number of reasons for the large number of files. For one thing, it is a complex and sophisticated web page, and for another, it is a web page that is assembled dynamically based on information about the user accessing the page (e.g., which country the user is from, what language, whether the user has made purchases before, etc.), and depending on all of these factors, different files are downloaded. Still, it is a very typical commercial web page.
FIG. 24 shows the amount of time that elapses before the web page is live as the connection speed grows. At a 1.5Mbps connection speed 2401, using a conventional web server with a conventional web browser, it takes 13.5 seconds until the web page is live. At a 12Mbps connection speed 2402, the load time is reduced to 6.5 seconds, or about twice as fast. But at a 96Mbps connection speed 2403, the load time is only reduced to about 5.5 seconds. The reason for this is that, at such a high download speed, the time to download the files themselves is minimal, but the latency of roughly 100 milliseconds per file remains, resulting in 54 files × 100 milliseconds = 5.4 seconds of latency. Thus, no matter how fast the connection to the home is, this website will always take at least 5.4 seconds until it is live. Another factor is server-side queuing: every HTTP request is added to the back of a queue, so on a busy server this has a significant impact because, for every small item to be fetched from the web server, the HTTP request has to wait its turn.
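As a rough illustrative model of those numbers (ignoring TCP behavior, parallel connections, and server-side queuing; the function name is arbitrary), the page-live time can be approximated as transfer time plus per-file request latency:

    # Approximate page-live time = transfer time + per-file request latency.
    # A simplified model of the FIG. 24 example; real browsers overlap some of
    # this work, so the results are only illustrative.
    NUM_FILES = 54
    TOTAL_BYTES = 1.5e6           # 1.5 MBytes of page content
    PER_FILE_LATENCY_S = 0.100    # ~100ms request/response latency per file

    def page_live_time(connection_bps):
        transfer_s = TOTAL_BYTES * 8 / connection_bps
        return transfer_s + NUM_FILES * PER_FILE_LATENCY_S

    for mbps in (1.5, 12, 96):
        print(mbps, round(page_live_time(mbps * 1e6), 1))
    # ~13.4s at 1.5Mbps, ~6.4s at 12Mbps, ~5.5s at 96Mbps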
One way to solve these problems would be to discard or redefine HTTP. Or, perhaps the website owner could consolidate its files into a single file (e.g., in Adobe Flash format). But, as a practical matter, this company, like many others, has a large investment in its website architecture. Further, while some homes have 12-100Mbps connections, most homes still have slower speeds, and HTTP does work well at slow speeds.
An alternative approach is to host web browsers on the app/game servers 1521-1525, and to host the files for the web servers on the RAID arrays 1511-1512 (or potentially in RAM, or on local storage, on the app/game servers 1521-1525 hosting the web browsers). Because of the very fast interconnect through the inbound routing 1502 (or to local storage), rather than having 100 milliseconds of latency per file using HTTP, there will be negligible latency per file using HTTP. Then, instead of having the user in the home access the web page through HTTP, the user can access the web page through the client 415. Then, even with a 1.5Mbps connection (because this web page does not require much bandwidth for its video), the web page will be live in less than one second, per line 2400. Essentially, there will be no latency before the web browser executing on an app/game server 1521-1525 displays a live page, and there will be no detectable latency before the client 415 displays the video output from the web browser. As the user mouses around and/or types on the web page, the user's input will be sent to the web browser executing on the app/game server 1521-1525, and the web browser will respond accordingly.
One disadvantage of this approach is that, if the compressor is constantly transmitting video data, bandwidth is used even when the web page becomes static. This can be remedied by configuring the compressor to transmit data only when (and if) the web page changes, and then only for the portions of the page that changed. While there are some web pages with constantly changing flashing banners and the like, such web pages tend to be annoying, and unless there is a reason for something to be in motion (e.g., a video clip), web pages are usually static. For such web pages, it is likely that less data will be transmitted using the hosting service 210 than with a conventional web server, because only the images actually displayed will be transmitted, with no thin-client executable code and no large objects that may never be viewed, such as rollover images.
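A minimal sketch of this transmit-only-what-changed idea, assuming the compressor sees the rendered page as a grid of tiles and using a caller-supplied send_tile() helper (both assumptions, not details of the embodiment), might look like this:

    # Sketch: transmit only the tiles of the rendered page that changed since
    # the previous frame. The tile size and send_tile() helper are assumptions.
    TILE = 64  # tile edge in pixels (illustrative)

    def changed_tiles(prev_frame, curr_frame, width, height):
        """Yield (x, y) origins of tiles whose pixels differ between frames.
        Frames are assumed to be equally sized 2D lists of pixel values."""
        for y in range(0, height, TILE):
            for x in range(0, width, TILE):
                prev_tile = [row[x:x + TILE] for row in prev_frame[y:y + TILE]]
                curr_tile = [row[x:x + TILE] for row in curr_frame[y:y + TILE]]
                if prev_tile != curr_tile:
                    yield (x, y)

    def transmit_update(prev_frame, curr_frame, width, height, send_tile):
        # If nothing changed, nothing is sent and no bandwidth is used.
        for x, y in changed_tiles(prev_frame, curr_frame, width, height):
            send_tile(x, y, curr_frame)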
Thus, by using the hosting service 210 to host legacy web pages, web page load times can be reduced to the point where opening a web page is like changing channels on a television: the page is effectively live instantly.
Facilitating debugging of games and applications
As previously mentioned, video games and applications with real-time graphics are very complex applications, and typically, when they are released into the field, they contain bugs. Although software developers do get feedback from users about bugs, and they may have some means for passing back machine state after a crash, it is very difficult to identify exactly what caused a game or real-time application to crash or to perform improperly.
When a game or application executes in the hosting service 210, the video/audio output of the game or application is constantly recorded on the delay buffer 1515. In addition, a watchdog process executes on each app/game server 1521-1525 and reports regularly to the hosting service control system 401 that the app/game server 1521-1525 is running smoothly. If the watchdog process fails to report in, the server control system 401 will attempt to communicate with the app/game server 1521-1525 and, if successful, will collect whatever machine state is available. Whatever information is available is sent to the software developer, along with the video/audio recorded by the delay buffer 1515.
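A minimal sketch of such a watchdog arrangement (the heartbeat timeout and the collect_machine_state/notify_developer helpers are hypothetical) might be structured as follows:

    # Sketch: each app/game server heartbeats to the control system; a missed
    # heartbeat triggers state collection. The timeout and the helper callables
    # passed to check() are illustrative assumptions.
    import time

    HEARTBEAT_TIMEOUT_S = 5.0

    class ControlSystem:
        def __init__(self):
            self.last_seen = {}   # server_id -> timestamp of last heartbeat

        def heartbeat(self, server_id):
            self.last_seen[server_id] = time.time()

        def check(self, collect_machine_state, notify_developer):
            now = time.time()
            for server_id, seen in self.last_seen.items():
                if now - seen > HEARTBEAT_TIMEOUT_S:
                    # Server stopped reporting: gather whatever state remains and
                    # forward it (with the delay-buffer video/audio) to the developer.
                    state = collect_machine_state(server_id)
                    notify_developer(server_id, state)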
Thus, when a game or application software developer gets notified of a crash by the hosting service 210, it gets a frame-by-frame record of what led up to the crash. This information can be extremely valuable in tracking down and fixing bugs.
It should also be noted that when an app/game server 1521-1525 crashes, the server can be restarted at the most recent restartable point, and a message can be provided to the user apologizing for the technical difficulty.
Resource sharing and cost savings
The system shown in FIGS. 4a and 4b provides a number of benefits to both end users and to game and application developers. For example, home and office client systems (e.g., PCs or game consoles) are typically in use for only a small percentage of the hours of the week. According to the October 5, 2006 Nielsen Entertainment "Active Gamer Benchmark Study" press release (http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/10-05-2006/0004446115&EDATE=), active players spend an average of 14 hours a week playing on video game consoles and about 17 hours a week playing on handheld devices. The report also states that active players average 13 hours a week across all game play activity, including console, handheld, and PC game play. Considering the higher of these game play figures, and given that there are 24 × 7 = 168 hours in a week, this implies that in an active player's home, the video game console is in use for only 17/168 = 10% of the hours of the week. Put differently, the video game console is idle 90% of the time. Given the high cost of video game consoles, and the fact that manufacturers subsidize such devices, this is a very inefficient use of an expensive resource. PCs within businesses are also typically used for only a fraction of the hours of the week, especially the non-portable desktop PCs often required for high-end applications such as Autodesk Maya. Although some businesses operate at all hours and on holidays, and some PCs (e.g., portable PCs taken home for work in the evening) are used at all hours and on holidays, most business activity is concentrated from about 9AM to 5PM, Monday through Friday, in a given business's time zone, excluding holidays and break times (such as lunch), and, because most PC usage occurs while the user is actively engaged with the PC, it follows that desktop PC utilization tends to follow these hours of operation. If we assume that PCs are used continuously from 9AM to 5PM, five days a week, that would imply PCs are in use for 40/168 = 24% of the hours of the week. High-performance desktop PCs are a very expensive investment for businesses, and this reflects a very low level of utilization. Schools that teach on desktop computers may use the computers for an even smaller fraction of the week, and, although it varies depending on the hours of teaching, most teaching occurs during daytime hours, Monday through Friday. So, in general, PCs and video game consoles are utilized for only a small fraction of the hours of the week.
Notably, because many people are at work at a business or at school during daytime hours, Monday through Friday, other than holidays, these people generally do not play video games during those hours, and so when they do play video games, it is generally during other hours, such as evenings, weekends, and holidays.
Given the hosting service configuration shown in FIG. 4a, the usage patterns described in the two preceding paragraphs result in very efficient utilization of resources. Obviously, there is a limit to the number of users that can be served by the hosting service 210 at a given time, especially if the users require real-time responsiveness for complex applications such as sophisticated 3D video games. But, unlike a video game console in the home or a PC used by a business (which typically sits idle most of the time), a server 402 can be re-used by different users at different times. For example, a high-performance server 402 with high-performance dual CPUs and dual GPUs and a large amount of RAM can be utilized by businesses and schools from 9AM to 5PM on non-holidays, and by players playing sophisticated video games in the evenings, on weekends, and on holidays. Similarly, low-performance applications can be utilized by businesses and schools during business hours on a low-performance server 402 with a Celeron CPU, no GPU (or a very low-end GPU), and limited RAM, and low-performance games can utilize the same low-performance server 402 during non-business hours.
In addition, with the hosting service arrangement described herein, resources are effectively shared among thousands, if not millions, of users. In general, an online service has only a small percentage of its total user base using the service at any given time. If we consider the Nielsen video game usage statistics listed previously, it is easy to see why. If active players play console games only 17 hours a week, and if we assume that the peak usage times for games are during typical non-work, non-school evening hours (5PM-12AM, 7 × 5 days = 35 hours/week) and weekend hours (8AM-12AM, 16 × 2 days = 32 hours/week), then there are 35 + 32 = 67 peak hours a week for 17 hours of game play. The exact peak user load on the system is difficult to estimate for many reasons: some users will play during off-peak times, there may be clustering peaks of users at certain times of day, peak times can be affected by the type of game played (e.g., children's games will likely be played earlier in the evening), and so on. But, given that the average number of hours a player plays is far less than the number of hours of the day during which the player is likely to play a game, only a fraction of the users of the hosting service 210 will be using it at a given time. For the purposes of this analysis, we shall assume the peak load is 12.5%. Thus, only 12.5% of the computing, compression, and bandwidth resources are in use at a given time, resulting in only 12.5% of the hardware cost to support a given user playing a given level of performance of game, because of the re-use of resources.
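As a rough illustration of the utilization arithmetic in the preceding paragraphs (the 12.5% figure is the stated assumption, not a measured value):

    # Illustrative utilization arithmetic from the paragraphs above.
    hours_per_week = 24 * 7                  # 168
    console_hours = 17
    pc_hours = 8 * 5                         # 9AM-5PM, five days a week

    print(console_hours / hours_per_week)    # ~0.10 -> console ~10% utilized
    print(pc_hours / hours_per_week)         # ~0.24 -> desktop PC ~24% utilized

    evening_peak = 7 * 5                     # 5PM-12AM on weekdays
    weekend_peak = 16 * 2                    # 8AM-12AM on weekend days
    print(evening_peak + weekend_peak)       # 67 peak hours per week

    assumed_peak_load = 0.125                # assumption used in the analysis
    print(assumed_peak_load)                 # fraction of resources in use at once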
Further, given that some games and applications require more computing power than others, resources can be allocated dynamically based on the game being played or the application being executed by the user. So, a user selecting a low-performance game or application will be allocated a low-performance (less expensive) server 402, and a user selecting a high-performance game or application will be allocated a high-performance (more expensive) server 402. Indeed, a given game or application may have lower-performance and higher-performance sections, and the user can be switched from one server 402 to another server 402 between sections of the game or application so as to keep the user executing on the lowest-cost server 402 that meets the needs of the game or application. Note that the RAID arrays 405, which are far faster than a single disk, are available even to low-performance servers 402, which therefore get the benefit of the faster disk transfer rates. So, the average cost per server 402 across all of the games being played and applications being used is far less than the cost of the most expensive server 402 that plays the most demanding game or application, and yet even the low-performance servers 402 receive the disk-performance benefit of the RAID arrays 405.
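A minimal sketch of that allocation policy (the tier names, relative costs, and performance levels are illustrative assumptions) might look like this:

    # Sketch: assign each session the cheapest server tier that satisfies the
    # performance requirement of the game/application section being run.
    # Tier names, relative costs, and performance levels are assumptions.
    SERVER_TIERS = [
        # (tier name, relative cost, performance level provided)
        ("celeron-no-gpu", 1, 1),
        ("dual-core-low-gpu", 2, 2),
        ("dual-cpu-dual-gpu", 4, 3),
    ]

    def assign_server(required_performance):
        """Return the lowest-cost tier meeting the requirement."""
        candidates = [t for t in SERVER_TIERS if t[2] >= required_performance]
        return min(candidates, key=lambda t: t[1])[0]

    print(assign_server(1))   # 'celeron-no-gpu' for a low-performance section
    print(assign_server(3))   # 'dual-cpu-dual-gpu' for a demanding 3D section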
Additionally, a server 402 in the hosting service 210 may be nothing more than a PC motherboard without a disk or peripheral interfaces (other than a network interface) and, in time, may be integrated down to a single chip with just a fast network interface to the SAN 403. Also, the RAID arrays 405 will likely be shared among far more users than there are disks, so the disk cost per active user will be far less than the cost of one disk drive. All of this equipment will likely reside in racks in an environmentally controlled server-room environment. If a server 402 fails, it can be readily repaired or replaced at the hosting service 210. In contrast, a PC or game console in the home or office must be a rugged, standalone appliance that can survive reasonable wear and tear from being banged or dropped, that requires a housing, that has at least one disk drive, that must survive adverse environmental conditions (e.g., being crammed into an overheated AV cabinet with other gear), that requires a service warranty, that must be packaged and shipped, and that is sold by a retailer who will likely collect a retail margin. In addition, a PC or game console must be configured to meet the peak performance of the most computationally intensive anticipated game or application to be used at some point in the future, even though lower-performance games or applications (or sections of games or applications) may be played most of the time. And, if the PC or console fails, getting it repaired is an expensive and time-consuming process (adversely affecting the manufacturer, the user, and the software developer).
Thus, given that the system shown in FIG. 4a provides the user with an experience comparable to that of a local computing resource, it is far less expensive to provide a given level of computing capability to a user in the home, office, or school through the architecture shown in FIG. 4a.
Eliminating the need for upgrades
In addition, users no longer have to worry about upgrading PCs and/or consoles to play new games or to handle higher-performance new applications. Any game or application on the hosting service 210, regardless of what type of server 402 it requires, is available to the user, and all games and applications execute nearly instantly (i.e., they load rapidly from the RAID arrays 405 or from local storage on the servers 402) and with the latest updates and bug fixes properly in place (i.e., software developers will be able to choose an ideal server configuration for the servers 402 that execute a given game or application, and then configure the servers 402 with optimal drivers, and then, over time, the developers will be able to provide updates, bug fixes, etc. to all copies of the game or application in the hosting service 210 at once). Indeed, after a user begins using the hosting service 210, the user is likely to find that games and applications continue to provide a better experience (e.g., through updates and/or bug fixes), and it may well be the case that a user discovers, a year later, a new game or application available on the service 210 that utilizes computing technology (e.g., a higher-performance GPU) that did not even exist a year before, so it would have been impossible for the user to buy, a year before, the technology that would play the game or run the application a year later. Because the computing resources that play the game or run the application are invisible to the user (i.e., from the user's point of view, the user simply selects a game or application that begins executing nearly instantly, much as if the user had changed channels on a television), the user's hardware will have been "upgraded" without the user even being aware of the upgrade.
Eliminating the need for backup
Another major problem for users in businesses, schools, and homes is backups. Information stored on a local PC or video game console (e.g., in the case of a console, a user's game achievements and rankings) can be lost if a disk fails or if there is an inadvertent erasure. There are many applications available that provide manual or automatic backups for PCs, and game console state can be uploaded to an online server for backup, but local backups are typically copied to another local disk (or other non-volatile storage device) that has to be stored somewhere safe and kept organized, and backups to online services are often limited because of the slow upstream speeds available through typical low-cost Internet connections. With the hosting service 210 of FIG. 4a, the data stored in the RAID arrays 405 can be configured using prior-art RAID configuration techniques, well known to those skilled in the art, such that if a disk fails, no data is lost; rather, a technician at the server center housing the failed disk will be notified and will replace the disk, which will then be automatically updated so that the RAID array is once again failure-tolerant. Further, because all of the disk drives are near one another, with fast local networks among them through the SAN 403, it is not difficult, in a server center, to back up all of the disk systems on a regular basis to secondary storage, which can either be kept at the server center or relocated offsite. From the point of view of the users of the hosting service 210, their data is always completely secure, and they never have to think about backups.
Access to demos
Users often would like to try out games or applications before buying them. As described previously, there are prior-art means for demoing games and applications (the verb "demo" means to try out a demonstration version, which is also called a "demo," but as a noun), but each of them suffers from limitations and/or inconveniences. Using the hosting service 210, it is easy and convenient for users to try out demos. Indeed, all the user does is select the demo through a user interface (such as the one described below) and try it out. The demo will load almost instantly on a server 402 appropriate for the demo, and it will execute just like any other game or application. Whether the demo requires a very high-performance server 402 or a low-performance server 402, and no matter what type of home or office client 415 the user is using, from the user's point of view the demo will just work. The software publisher of the game or application demo will be able to control exactly which demo the user is permitted to try out and for how long, and, of course, the demo can include user interface elements that offer the user the opportunity to gain access to the full version of the game or application demonstrated.
Because demos are likely to be offered below cost or free of charge, some users may try to use demos repeatedly (particularly game demos, which may be fun to play over and over). The hosting service 210 can employ various techniques to limit demo use for a given user. The most straightforward approach is to establish a user ID for each user and to limit the number of times a given user ID is allowed to play a demo. A user, however, may set up multiple user IDs, especially if they are free. One technique for addressing this problem is to limit the number of times a given client 415 is allowed to play a demo. If the client is a standalone device, the device will have a serial number, and the hosting service 210 can limit the number of times a demo can be accessed by a client with that serial number. If the client 415 is running as software on a PC or other device, then a serial number can be assigned by the hosting service 210 and stored on the PC and used to limit demo usage, but given that PCs can be reprogrammed by users, and the serial number erased or changed, another option is for the hosting service 210 to keep a record of the PC's network adapter Media Access Control (MAC) address (and/or other machine-specific identifiers, such as hard-drive serial numbers, etc.) and to limit demo usage to that MAC address. Given that the MAC addresses of network adapters can be changed, however, this is not a foolproof approach. Another approach is to limit the number of times a demo can be played from a given IP address. Although IP addresses may be periodically reassigned by cable modem and DSL providers, this does not happen very often in practice, and if it can be determined (e.g., by contacting the ISP) that an IP address is in a block of IP addresses assigned to residential DSL or cable modem access, then a small number of demo uses can typically be established for a given household. Also, there may be multiple devices in a home behind a NAT router sharing the same IP address, but typically, in a residential setting, there will be a limited number of such devices. If the IP address is in a block serving businesses, then a larger number of demo uses can be established for a business. In the end, a combination of all of the previously described approaches is the best way to limit the number of demos on PCs. Although there may be no foolproof way to prevent a determined and technically skilled user from replaying demos repeatedly, creating a large number of barriers establishes a sufficient deterrent that it is not worth the trouble for most PC users to abuse the demo system, and instead they will use demos as intended: to try out new games and applications.
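A minimal sketch of combining these limits (the quota values and identifier names are illustrative assumptions) might look like this:

    # Sketch: permit a demo launch only if every identifier tied to the request
    # is still under its quota. Quotas and identifier names are assumptions.
    DEMO_QUOTAS = {"user_id": 3, "client_serial": 5, "mac_address": 5, "ip_address": 10}

    usage_counts = {}   # (identifier kind, identifier value) -> launches so far

    def may_launch_demo(identifiers):
        """identifiers: dict such as {'user_id': 'u123', 'ip_address': '203.0.113.7'}"""
        for kind, value in identifiers.items():
            quota = DEMO_QUOTAS.get(kind)
            if quota is not None and usage_counts.get((kind, value), 0) >= quota:
                return False
        for kind, value in identifiers.items():
            usage_counts[(kind, value)] = usage_counts.get((kind, value), 0) + 1
        return True

    print(may_launch_demo({"user_id": "u123", "ip_address": "203.0.113.7"}))  # True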
Benefits to schools, businesses and other institutions
Significant benefits accrue in particular to businesses, schools, and other institutions that utilize the system shown in FIG. 4a. Businesses and schools have substantial costs associated with installing, maintaining, and upgrading PCs, particularly PCs that execute high-performance applications such as Maya. As stated previously, PCs are generally utilized for only a fraction of the hours of the week, and, as in the home, a PC with a given level of performance capability costs far more in an office or school environment than in a server-center environment.
In the case of larger businesses or schools (e.g., large universities), it may be practical for the IT departments of such entities to set up a server center and maintain computers that are accessed remotely via LAN-grade connections. A number of solutions exist for remotely accessing computers over a LAN or through a private high-bandwidth connection between offices. For example, with Microsoft's Windows Terminal Server, with virtual network computing applications (such as VNC from RealVNC, Ltd.), or with thin-client offerings from Sun Microsystems, users can gain remote access to PCs or servers, with a range of quality in graphics response time and user experience. Further, such self-managed server centers are typically dedicated to a single business or school, and as such are unable to take advantage of the overlap in usage that is possible when disparate applications (e.g., entertainment and business applications) utilize the same computing resources at different times of the week. Moreover, many businesses and schools lack the scale, resources, or expertise to set up on their own a server center with a LAN-speed network connection to each user. Indeed, a large percentage of schools and businesses have the same Internet connections (e.g., DSL, cable modems) as homes.
Such organizations, however, may still have a need, either constantly or periodically, for very high-performance computing. For example, a small architectural firm may have only a small number of architects, with relatively modest computing needs when doing design work, but it may periodically require very high-performance 3D computing (e.g., when creating a 3D fly-through of a new architectural design for a client). The system shown in FIG. 4a is extremely well suited to such organizations. The organizations need nothing more than the same kind of network connection offered to homes (e.g., DSL, cable modems), which is typically very inexpensive. They can either utilize inexpensive PCs as clients 415, or dispense with PCs altogether and utilize inexpensive dedicated devices that simply implement the control signal logic 413 and the low-latency video decompression 412. These features are particularly attractive to schools that may have problems with theft of PCs or with damage to the delicate components within PCs.
This arrangement solves a number of problems for such organizations (and many of these advantages are also shared by home users doing general-purpose computing). For one, the operating cost (which, ultimately, must be passed back to the users in some form in order to have a viable business) can be much lower because (a) the computing resources are shared with other applications that have different peak usage times during the week, (b) the organizations can gain access to (and incur the cost of) high-performance computing resources only when needed, and (c) the organizations do not have to provide resources for backing up or otherwise maintaining the high-performance computing resources.
Removal of piracy
Additionally, games, applications, interactive movies, etc. can no longer be pirated the way they are today. Because each game is executed at the server center, users are never given access to the underlying program code, so there is nothing to pirate. Even if a user were to copy the source code, the user would not be able to execute it on a standard game console or home computer. This opens up markets in places in the world, such as China, where standard video games are not made available. The re-sale of used games is also not possible.
For game developers, there will be fewer market discontinuities than there are today. In contrast to the current situation, in which a completely new generation of console technology forces users and developers to upgrade, and in which game developers are dependent on the timely delivery of the hardware platform, the hosting service 210 can be updated gradually over time as gaming requirements change.
Streaming interactive video
The above description provides a wide range of applications enabled by the novel underlying concept of general Internet-based, low-latency streaming interactive video (which, as used herein, implicitly includes audio along with the video). Prior art systems that provide streaming video over the Internet have only enabled applications that can be implemented with high-latency interactions. For example, basic playback controls for linear video (e.g., pause, rewind, fast forward) work adequately at high latency, and it is possible to choose among linear video feeds. And, as stated previously, the nature of some video games allows them to be played with high latency. But the high latency (or low compression ratio) of prior art approaches to streaming video has severely limited the potential applications of streaming video or has narrowed its deployment to specialized network environments, and, even in such environments, the prior art introduces a substantial burden on the network. The technology described herein opens the door to a wide range of applications made possible by low-latency streaming interactive video over the Internet, particularly those enabled through consumer-grade Internet connections.
Indeed, with client devices as small as the client 465 of FIG. 4c being sufficient to provide an enhanced user experience, and with an effectively arbitrary amount of computing power, an arbitrary amount of fast storage, and extremely fast networking among powerful servers, this enables a new era of computing. Moreover, because the bandwidth requirements do not grow as the computing power of the system grows (i.e., the bandwidth requirements are tied only to display resolution, quality, and frame rate), once broadband Internet connectivity is ubiquitous (e.g., through widespread low-latency wireless coverage), reliable, and of sufficiently high bandwidth to meet the needs of all users' display devices 422, the question will be whether thick clients (such as PCs or mobile phones running Windows, Linux, OSX, etc.), or even thin clients (such as Adobe Flash or Java), are necessary for typical consumer and business applications.
The advent of streaming interactive video leads to a rethinking of assumptions about the structure of computing architectures. One example of this is the hosting service 210 server-center embodiment shown in FIG. 15. The video path for the delay buffer and/or grouped video 1550 is a feedback loop in which the multicast streaming interactive video output of the app/game servers 1521-1525 is fed back into the app/game servers 1521-1525, either in real time via path 1552 or after a selectable delay via path 1551. This enables a wide range of practical applications (e.g., such as those illustrated in FIGS. 16, 17, and 20) that would be either impossible or infeasible with prior art server or local computing architectures. But, as a more general architectural feature, what the feedback loop 1550 provides is recursion at the streaming interactive video level, since the video can be looped back indefinitely as the application requires it. This enables a wide range of application possibilities never available before.
Another key architectural feature is that the video streams are unidirectional UDP streams. This effectively enables an arbitrary degree of multicasting of streaming interactive video (by contrast, two-way streams, such as TCP/IP streams, would create more and more traffic congestion on the network from the back-and-forth communications as the number of users increased). Multicasting is an important capability within the server center because it allows the system to respond to the growing need of Internet users (and, indeed, of the world's population) to communicate on a one-to-many, or even many-to-many, basis. Again, the examples discussed herein that illustrate the use of both streaming interactive video recursion and multicasting (such as FIG. 16) are just the tip of a very large iceberg of possibilities.
In one embodiment, the various functional modules and associated steps described herein may be performed by specific hardware components that contain hardwired logic for performing the steps, such as an application specific integrated circuit ("ASIC"), or by any combination of programmed computer components and custom hardware components.
In one embodiment, the modules may be implemented on a programmable digital signal processor ("DSP"), such as one using a Texas Instruments TMS320x architecture (e.g., a TMS320C6000, TMS320C5000, etc.). Various different DSPs may be used while still complying with these underlying principles.
Embodiments may include various steps as set forth above. The steps may be embodied in machine-executable instructions, which cause a general-purpose or special-purpose processor to perform certain steps. Various components not related to these basic principles (e.g., computer memory, hard disk drive, input device) have been omitted from the figures to avoid obscuring the relevant aspects.
Elements of the disclosed subject matter may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, flash memory, optical disks, CD-ROMs, DVD-ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media, or other types of machine-readable media suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program that may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
It should also be understood that elements of the disclosed subject matter may also be provided as a computer program product which may include a machine-readable medium having stored thereon instructions which may be used to program a computer (e.g., a processor or other electronic device) to perform a sequence of operations. Alternatively, the operations may be performed by a combination of hardware and software. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other type of media/machine-readable medium suitable for storing electronic instructions. For example, elements of the disclosed subject matter may be downloaded as a computer program product, wherein the program may be transferred from a remote computer or electronic device to a requesting process by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
In addition, although the disclosed subject matter has been described in connection with specific embodiments, numerous modifications and variations are well within the scope of the present disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.