CROSS-REFERENCE TO RELATED APPLICATIONS This application is a continuation-in-part of U.S. patent application Ser. No. 10/862,115, filed Jun. 4, 2004 and entitled “Networked Media Station,” which is incorporated herein by reference in its entirety and to which priority is claimed.
FIELD OF THE DISCLOSURE The subject matter of the present disclosure relates to a system and method for synchronizing presentation of media at multiple recipients or devices on a network.
BACKGROUND OF THE DISCLOSURE With the increasing capacity and capability of personal computers, as well as improved multimedia interfaces for these computers, it has become popular to use personal computers as a repository for multimedia content, such as songs, movies, etc. Particularly with music, the increased popularity of storing multimedia information on a personal computer has resulted in a variety of products and services to serve this industry. For example, a variety of stand-alone players of encoded multimedia information have been developed, including, for example, the iPod, produced by Apple Computer of Cupertino, Calif. Additionally, services have been developed around these devices, which allow consumers to purchase music and other multimedia information in digital form suitable for storage and playback using personal computers, including, for example, the iTunes music service, also run by Apple Computer.
These products and services have resulted in an environment where many consumers use their personal computer as a primary vehicle for obtaining, storing, and accessing multimedia information. One drawback to such a system is that although the quality of multimedia playback systems for computers, e.g., displays, speakers, etc., has improved dramatically in the last several years, these systems still lag behind typical entertainment devices, e.g., stereos, televisions, projection systems, etc., in terms of performance, fidelity, and usability for the typical consumer.
Thus, it would be beneficial to provide a mechanism whereby a consumer could easily obtain, store, and access multimedia content using a personal computer, while also being able to listen, view, or otherwise access this content using conventional entertainment devices, such as stereo equipment, televisions, home theatre systems, etc. Because of the increasing use of personal computers and related peripherals in the home, it would also be advantageous to integrate such a mechanism with a home network to provide an integrated electronic environment for the consumer.
In addition to these needs, there is also increasing interest in the field of home networking, which involves allowing disparate devices in the home or workplace to recognize each other and exchange data, perhaps under the control of some central hub. To date, a number of solutions in this area have involved closed systems that require the purchase of disparate components from the same vendor. For example, audio speaker systems that allow computer-controlled switching of music from one location to another may be purchased as a system from a single vendor, but they may be expensive and/or may limit the consumer's ability to mix and match components of a home network from different vendors according to her own preferences. Thus, it would be beneficial to provide a mechanism by which various home networking components from differing vendors can nonetheless interact in a home network environment.
The subject matter of the present disclosure is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
SUMMARY OF THE DISCLOSURE A system and method for delivering network media at multiple devices is disclosed. For example, the network media delivery system includes client devices and a host device. Each client device has a network interface for network communication, an engine for processing media data, and a media interface for delivering processed media. The host device, which can be a computer, establishes network communication links with the client devices, which can be networked media stations. The media data can be audio, video, or multimedia. In one embodiment, the network communication links are wireless links established between a wireless network interface on the host device and wireless network interfaces on the client devices.
The host device sends media data to the client devices via the network. The media data can be sent wirelessly as unicast streams of packets containing media data that are transmitted at intervals to each client device. In one embodiment, the host device controls processing of media data such that processed media is delivered in a synchronized manner at each of the client devices. In another embodiment, the host device controls processing of media data such that processed media is delivered in a synchronized manner at the host device and at least one client device.
The system uses Network Time Protocol (NTP) to initially synchronize local clocks at the client devices with a reference clock at the host device. The media data is preferably sent as Real-Time Transport Protocol (RTP) packets from the host device to the client device. The system includes mechanisms for periodic synchronization, stretching, and compressing of time at the local clocks to handle clock drift. In addition, the system includes mechanisms for retransmission of lost packets of media data. In one embodiment, the system can be used to deliver audio at multiple sets of speakers in an environment, such as a house, and can reduce effects of presenting the audio out of sync at the multiple sets of speakers to avoid user-perceivable echo.
The foregoing summary is not intended to summarize each potential embodiment or every aspect of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS The foregoing summary, preferred embodiments, and other aspects of subject matter of the present disclosure will be best understood with reference to a detailed description of specific embodiments, which follows, when read in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates an embodiment of a network media delivery system according to certain teachings of the present disclosure.
FIG. 2 illustrates an embodiment of a networked media station or client device.
FIGS. 3A and 3B illustrate a process of operating the disclosed system in flowchart form.
FIG. 4 illustrates an embodiment of an interface of a media application operating on a host device of the disclosed system.
FIG. 5A illustrates a portion of the disclosed system having a host device delivering packets to multiple client devices.
FIG. 5B illustrates a portion of the disclosed system having a host device and client devices performing retransmission of lost packet information.
FIG. 6A illustrates an embodiment of a packet requesting retransmission of lost packets.
FIG. 6B illustrates an embodiment of a response to a retransmission request.
FIG. 6C illustrates an embodiment of a response to a futile retransmission request.
FIG. 7 illustrates a portion of the disclosed system having a host device and multiple client devices exchanging time information.
FIG. 8A illustrates an embodiment of a packet for synchronizing time.
FIG. 8B illustrates an embodiment of a packet for announcing time.
FIG. 9 illustrates a portion of the disclosed system having a host device and a client device.
FIG. 10 illustrates an algorithm to limit stuttering in playback of audio.
While the subject matter of the present disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. The figures and written description are not intended to limit the scope of the inventive concepts in any manner. Rather, the figures and written description are provided to illustrate the inventive concepts to a person skilled in the art by reference to particular embodiments, as required by 35 U.S.C. § 112.
DETAILED DESCRIPTION A network media delivery system having a host device and multiple client devices is described herein. The following embodiments disclosed herein are described in terms of devices and applications compatible with computer systems manufactured by Apple Computer, Inc. of Cupertino, Calif. The following embodiments are illustrative only and should not be considered limiting in any respect.
I. Components of the Network Media Delivery System
Referring to FIG. 1, an embodiment of a network media delivery system 10 according to certain teachings of the present disclosure is illustrated. The system 10 includes a host device or computer system 20, one or more networked media stations or client devices 50, and various other devices. The system 10 in the present embodiment represents only one of several possible configurations and is meant to be illustrative only. Other possible configurations are discussed in the incorporated U.S. patent application Ser. No. 10/862,115. For example, the host device 20 can have a wired or wireless connection to each of the client devices 50 without the use of a hub or base station 30, or the host device 20 can have a wireless connection to the hub or base station 30.

The system 10 is used to distribute media (e.g., audio, video, multimedia, etc.) via network connections from the host device 20 to multiple client devices 50 located throughout an environment, such as a house, office, etc.
The host device 20 is a personal computer, such as an AirPort-equipped Mac or a Wi-Fi-compliant Windows-based PC. The client devices 50 are networked media stations, such as disclosed in incorporated U.S. patent application Ser. No. 10/862,115. The client devices 50 are plugged into wall sockets, which provide power to the client devices 50, and are coupled to entertainment devices, such as amplifiers 80, powered speakers, televisions, stereo systems, videocassette recorders, DVD players, home theatre systems, or other devices capable of delivering media known in the art.
An example of the client device 50 is discussed briefly with reference to FIG. 2. The client device 50 includes an AC power adapter portion 52 and a network electronics portion 54. The network electronics portion 54 includes a wired network interface 62, a peripheral interface 64, and a media interface 66. As illustrated, the wired network interface 62 is an Ethernet interface, although other types of wired network interfaces known in the art could be provided. Similarly, the peripheral interface 64 is illustrated as a USB interface, although other types of peripheral interfaces, such as IEEE 1394 (“FireWire”), RS-232 (serial interface), or IEEE 1284 (parallel interface), could also be used. Likewise, the media interface 66 is illustrated as an audio interface including both analog line-out and optical digital audio functionality. However, other media interfaces known in the art, such as a multimedia interface or a video interface using composite video, S-video, component video, Digital Video Interface (DVI), High-Definition Multimedia Interface (HDMI), etc., could also be provided.
The network electronics portion 54 also includes a wireless network interface 68. The wireless network interface 68 preferably takes the form of a “Wi-Fi” interface according to the IEEE 802.11b or 802.11g standards known in the art. However, other wireless network standards could also be used, either as alternatives to the identified standards or in addition to them. These other network standards can include the IEEE 802.11a standard or the Bluetooth standard, for example.
Returning to FIG. 1, the host device 20 runs a media application 22. In one exemplary embodiment, the media application 22 is iTunes software for media file management and playback produced by Apple Computer, Inc. In the present configuration, which is only one of several possibilities, the host device 20 is equipped with an Ethernet port that is connected via a cable 24 to a base station 30. The base station 30 can be any variety of access point known in the art. Preferably, the base station 30 includes wireless access, routing, switching, and firewall functionality. The base station 30 is connected via a cable 42 to a modem 40, which receives an Internet connection through a connection 44. Using this arrangement, multimedia files stored on the host device 20 can be played using stereo amplifiers 80, which are connected to the client devices 50 using one of the audio interfaces on the client devices 50. The host device 20 and the client devices 50 preferably communicate via a wireless network segment (illustrated schematically by connections 32), but wired network segments formed by wired connections, such as Ethernet cables, could also provide communication between the host device 20 and the client devices 50. The client devices 50 communicate with the entertainment devices via a wired network segment 82.
The client devices 50 act as wireless base stations for a wireless network and enable the host device 20 to deliver media (e.g., audio, video, and multimedia content) at multiple locations in an environment. For example, the client devices 50 are connected to stereo amplifiers 80 or other entertainment devices to play back media stored on the host device 20. In one embodiment, a line-level audio or a digital fiber optic type of connector connects the client devices 50 to the stereo amplifiers 80. Either type of connector can plug into the multimedia port (66; FIG. 2), which is a dual-purpose analog/optical digital audio mini-jack. To interface with the stereo amplifiers 80, a mini-stereo-to-RCA adapter cable 82 is used, which connects to RCA-type right and left audio input ports on the stereo amplifier 80. Alternatively, a Toslink digital fiber optic cable can be used, which would connect to a digital audio input port on the stereo amplifiers 80. These and other configurations are disclosed in incorporated U.S. patent application Ser. No. 10/862,115.
For the purposes of the present disclosure, the client devices 50 can also be connected to laptops 70 or personal computers that are capable of playing media (audio, video, etc.) so that the laptops and personal computers can also be considered entertainment devices. Moreover, the laptops 70 or personal computers can have the same functionality as both a client device 50 and an entertainment device so that the laptops 70 and personal computers can be considered both client devices and entertainment devices. Accordingly, the term “client device” as used herein is meant to encompass not only the networked media stations associated with reference numeral 50 but also any device (e.g., laptop, personal computer, etc.) compatible with the network media delivery system 10 according to the present disclosure. In the present disclosure, however, reference is made to client devices 50 for ease of discussion. Furthermore, the term “entertainment device” as used herein is meant to encompass not only the stereo amplifiers 80 shown in FIG. 1 but also powered speakers, televisions, stereo systems, videocassette recorders, DVD players, home theatre systems, laptops, personal computers, and other devices known in the art that are capable of delivering media.
The client devices 50 receive media data from the host device 20 over network connections and output this media data to the entertainment devices. Although it is contemplated that audio, video, audio/video, and/or other forms of multimedia may be used, exemplary embodiments disclosed herein relate to sharing of audio with client devices 50 connected to entertainment devices, such as stereo amplifiers 80, or with laptops 70 or other computers having internal speakers or the like. The audio can be stored on the host device 20 or can be obtained from the Internet 46. However, it will be appreciated that the teachings of the present disclosure can be applied to video, audio/video, and/or other forms of multimedia in addition to the audio in the exemplary embodiments disclosed herein. Furthermore, in the discussion that follows, various details of the network media delivery system are implemented using hardware and software developed by Apple Computer, Inc. Although certain details are somewhat specific to such an implementation, various principles described are also generally applicable to other forms of hardware and/or software.
During operation, the system 10 delivers the same audio in separate locations of an environment (e.g., multiple rooms of a home). The system 10 addresses several issues related to playing the same audio in multiple, separate locations. One issue involves playing the audio in the separate locations in a synchronized manner with each other. Because the host device 20 and the client devices 50 have their own processors, memory, and transmission interfaces, sending or streaming audio from the host device 20 to the client devices 50 through a wireless or wired communication link will not likely result in synchronized playing of the audio at the separate locations. In addition, the client devices 50 may be connected to different types of entertainment devices, which may have different latency and playback characteristics. It is undesirable to play the same audio in the separate locations out of sync because the listener will hear echoes and other undesirable audio effects. The system 10 addresses this issue by substantially synchronizing the playing of the audio in each location so that echo and other effects can be avoided. It should be noted that the level of precision required to substantially synchronize the playing of media at each location depends on the type of media being played, the perceptions of the user, spatial factors, and other details specific to an implementation.
Another issue related to playing of the same audio involves how to handle lost audio data at the separate locations. To address this issue, the disclosed system 10 preferably uses a retransmission scheme to recover lost audio. These and other issues and additional details of the disclosed network media delivery system are discussed below.
II. Process of Operating the System
Referring to FIG. 3A, a process 100 of operating the network media delivery system of the present disclosure is illustrated in flowchart form. During discussion of the process 100, reference is concurrently made to components of FIG. 1 to aid understanding. As an initial step in the process 100, network discovery is performed, and the networked client devices 50 and other configured devices (e.g., a configured laptop 70) publish or announce their presence on the network using a predefined service type of a transfer control protocol (Block 102). The host device 20 browses the local sub-net for the designated service type (Block 104).
The network discovery is used to initiate the interface between the host device 20 and client devices 50 and other compatible devices over the network of the system 10. One example of such a network discovery uses Bonjour, which is a technology that enables automatic discovery of computers, devices, and services on IP networks. Bonjour uses standard IP protocols to allow devices to find each other automatically without the need for a user to enter IP addresses or configure DNS servers. Various aspects of Bonjour are generally known to those skilled in the art and are disclosed in the technology brief entitled “MAC OS X: Bonjour,” dated April 2005 and published by Apple Computer, which is incorporated herein by reference in its entirety. To provide the media sharing functionality between the host device 20 and the client devices 50, the client devices 50 advertise over the network that they support audio streaming and particular audio capabilities (e.g., 44.1 kHz sample rate, 16-bit sample size, and 2-channel/stereo samples). The client devices 50 may also advertise security, encryption, compression, and other capabilities and/or parameters that are necessary for communicating with the client devices 50.
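As an illustration of this discovery step, a client device might advertise itself and its audio capabilities through Bonjour's C API (dns_sd.h). The following sketch uses only the standard DNSServiceRegister and TXTRecord calls; the service type "_mediastation._tcp" and the TXT-record keys are hypothetical placeholders, not the identifiers actually used by the disclosed system.

```c
#include <dns_sd.h>
#include <arpa/inet.h>

/* Hedged sketch: advertise an audio-streaming service whose capabilities
 * (sample rate, sample size, channels) are carried in the TXT record.
 * The service type and key names here are illustrative placeholders. */
static DNSServiceRef advertise_media_station(uint16_t port_host_order)
{
    TXTRecordRef txt;
    char buffer[256];
    TXTRecordCreate(&txt, sizeof(buffer), buffer);
    TXTRecordSetValue(&txt, "sr", 5, "44100");  /* sample rate (Hz)   */
    TXTRecordSetValue(&txt, "ss", 2, "16");     /* sample size (bits) */
    TXTRecordSetValue(&txt, "ch", 1, "2");      /* channels (stereo)  */

    DNSServiceRef service = NULL;
    DNSServiceRegister(&service, 0, 0,          /* default flags/interface  */
                       NULL,                    /* default service name     */
                       "_mediastation._tcp",    /* hypothetical type        */
                       NULL, NULL,              /* default domain and host  */
                       htons(port_host_order),  /* port, network byte order */
                       TXTRecordGetLength(&txt),
                       TXTRecordGetBytesPtr(&txt),
                       NULL, NULL);             /* no registration callback */
    TXTRecordDeallocate(&txt);
    return service;
}
```

The host device 20 would use the matching DNSServiceBrowse call to find such records when it scans the local sub-net for the designated service type.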
When compliant client devices 50 are discovered, the addresses and port numbers of the discovered devices 50 are stored for use by the system 10. Then, the media application 22 displays information about the found client devices 50 in a user interface operating on the host device 20 (Block 106). In one embodiment, for example, the media application 22 discovers the client devices by obtaining information about the user's setup of computers and networks for their house, office, or the like from another application containing such information. In another embodiment, for example, the media application 22 discovers the client devices 50 and recognizes these client devices 50 as potential destinations for audio data. Then, the media application 22 automatically provides these recognized devices 50 as selectable destinations for audio playback in a user interface.
FIG. 4 shows an example of a user interface 200 associated with the media application, such as iTunes. Among other elements, the user interface 200 shows an icon 202 for selecting playback locations (e.g., networked client devices and other playback devices located in a house), which have been detected on the network. A user may select the icon 202 to access a pop-up menu 204 in which the user can activate/deactivate (i.e., check or uncheck) one or more of the playback locations as destinations for audio playback. Of course, the user interface 200 can display possible destinations for audio playback in a number of ways. For example, the display of possible destinations can include a network schematic of the user's dwelling, office, or the like that shows possible destinations, or the display can be customized by the user.
Returning to FIG. 3A, the user selects one or more of the client devices to be used for playback in the user interface (Block 108). The host device 20 then uses Real-Time Streaming Protocol (RTSP) to set up and control the audio stream, and the host device 20 initiates an RTSP connection to each of the selected client devices 50 to determine which set of features the devices 50 support and to authenticate the user (if a password is required) (Block 110). On the host device 20, the user can then start playback using the user interface of the media application 22 (Block 112). The host device 20 makes an RTSP connection to each client device 50 to set it up for playback and to start sending the audio stream (Block 114). The host device 20 then sends a command to each client device 50 to initiate playback (Block 116). When each client device 50 receives the command, the device 50 negotiates timing information via User Datagram Protocol (UDP) packet exchanges with the host device 20 (Block 118). Each client device 50 then determines whether the timing negotiation succeeds or fails (Block 119). The client devices 50 do not respond to the command to initiate playback until the timing negotiation either succeeds or fails. The timing negotiation occurs early to guarantee that the client devices 50 have the initial timing information needed to synchronize their clocks with the host device 20 before any audio packets are processed by the client devices 50.

If the negotiation succeeds, the client device 50 can be used for playback (Block 120). If the negotiation fails, however, the associated client device 50 can perform a number of possible operations (Block 121). For example, the client device 50 can return an error to the host device 20 in response to the command, and the session on this device 50 can be terminated. In another possible operation, the associated client device 50 can retry the timing negotiation. Alternatively, the associated client device 50 can ignore the fact that negotiating timing information has failed. This may be suitable when the user is not interested in the audio playing in a synchronized manner in the multiple locations associated with the client devices 50. For example, the client device may be located by the pool or out in the garage and does not necessarily need to deliver the audio in sync with the other devices.
During playback at Block 120, the host device 20 sends audio data to the client devices 50, which process the audio data and deliver processed audio to the connected entertainment devices. An example of the process of playing back audio is discussed below with reference to the flowchart of FIG. 3B with concurrent reference to element numerals of FIG. 1. Various buffering, error checking, and other data transfer steps have been omitted from the general description of FIG. 3B.
As discussed above, the host device 20 is connected to a wireless network established by the access point 30, which can also provide for a shared connection to the Internet or other network 46. The client devices 50 are also connected to the wireless network and have their multimedia ports connected to stereo amplifiers 80 or other entertainment devices having output speakers or other multimedia output capability. A digital media file (e.g., a song in AAC format) is stored on the host device 20. Once playback is started (Block 122), the host device 20 transcodes a portion of the media file from the format (e.g., AAC) in which it is stored to a format that is understood by the client device 50 (Block 124). This transcoding step is not necessarily required if the file is stored on the host device 20 in a format that is understood by the client device 50. In any case, a block of audio data for transmission is created (Block 126). This audio data is preferably compressed and encrypted (Block 128). Encryption is not necessarily required, but it is advantageous for digital rights management purposes.
The host device 20 then transmits the audio data over the wireless network to the client devices 50 (Block 130). The client devices 50 decrypt and decompress the received audio data (Block 132), and the client devices 50 decode the audio data based on the encoding performed in Block 124 (Block 134). The decoding results in raw audio data, which may be, for example, in the form of PCM data. This data is converted to analog audio signals by digital-to-analog converters (DACs) (Block 136), and the audio signals are output to the stereo amplifiers 80 for playing with their loudspeakers (Block 138).
With the benefit of the description of the components of the disclosed network media delivery system and its process of operation provided in FIGS. 1 through 4, the discussion now turns to details related to how data is transferred between the host device and client devices, how lost data is handled, and how playback is synchronized, in addition to other details disclosed herein.
III. Network Transport Used for the System
To transfer audio data and other information, the network media delivery system 10 of the present disclosure preferably uses User Datagram Protocol (UDP) as its underlying transport for media data. UDP is beneficial for synchronized playback to the multiple client devices 50 because synchronized playback places time constraints on the network protocol. Because audio is extremely time sensitive and has a definite lifetime of usefulness, for example, a packet of media data, such as audio, can become useless if it is received after a point in time when it should have been presented. Accordingly, UDP is preferred because it provides more flexibility with respect to the time-sensitive nature of audio data and other media data.
To use UDP or some similar protocol, the disclosed system is preferably configured to handle at least a small percentage of lost packets. The lost packets can be recovered using Forward Error Correction (FEC), can be hidden using loss concealment techniques (e.g., repetition, waveform substitution, etc.), or can be recovered via retransmission techniques, such as those disclosed herein. Although UDP is preferred for the reasons set forth herein, Transmission Control Protocol (TCP) can be used. Depending on the implementation, retransmission using TCP may need to address problems with blocking of transmissions. If a TCP segment is lost and a subsequent TCP segment arrives out of order, for example, it is possible that the subsequent segment is held off until the first segment is retransmitted and arrives at the receiver. This can result in a chain reaction and effective audio loss because data that has arrived successfully and in time for playback may not be delivered until it is too late. Due to some of the retransmission difficulties associated with TCP, the Partial Reliability extension of Stream Control Transmission Protocol (SCTP) can provide the retransmission functionality. Details related to the Partial Reliability extension of SCTP are disclosed in RFC 3758, which can be obtained from http://www.ietf.org/rfc/rfc3758.txt and which is incorporated herein by reference.
UDP is preferred for time-critical portions of the protocol because it can avoid some of the problems associated with blockage of transmission. For example, UDP allows the host's media application 22 to control retransmission of lost data because the media application 22 can track time constraints associated with pieces of audio data to be delivered. Based on the known time constraints, the media application 22 can then decide whether retransmission of lost packets of audio data would be beneficial or futile. All the same, in other embodiments, time-critical portions of the disclosed system, such as time syncing, can be implemented using UDP, and audio data delivery can use TCP with a buffering system that addresses blocking problems associated with TCP.
IV. Audio Streaming and Playback with System
Before discussing how the client devices negotiate timing information in order to play audio in synchronization, the discussion first addresses how the disclosed system streams audio for playback. Referring to FIG. 5A, a portion of the disclosed system 300 is shown with a host device 320 and at least two client devices 350A-B. Each of the client devices 350 has a processor 352, a memory 354, a transmission interface 356, and an audio interface 358. The client devices 350 also include a UDP stack and can include a TCP stack depending on the implementation. As noted previously with reference to the client device of FIG. 2, the transmission interfaces 356 can be Wi-Fi-compatible wireless network interfaces, and the audio interface 358 can provide an analog and/or an optical digital output. The processor 352 and memory 354 can be conventional hardware components known in the art. The memory 354 has two audio buffers 361 and 362. Although not shown in FIG. 5A, each of the client devices 350 has a local clock, a playback engine, and other features.
The host device 320 uses several commands to set up a connection with and to control operation of the client devices 350. These commands include ANNOUNCE (used for identification of active client devices), SETUP (used to set up connection and operation), RECORD (used to initiate playback at client devices), PAUSE (used to pause playback), FLUSH (used to flush memory at the client devices), TEARDOWN (used to stop playback), OPTIONS (used to configure options), GET_PARAMETER (used to get parameters from the client devices), and SET_PARAMETER (used to set parameters at the client devices).
Preferably, the client devices 350 are authenticated when initially establishing a connection to the media application 322 running on the host device 320. Upon successful authentication, the media application 322 opens network connections to the transmission interfaces 356 of the client devices 350. Preferably, network connections between the host device 320 and the client devices 350 are separated into an audio channel for sending audio data and a control channel used to set up connection and operation between the devices 320 and 350. However, a single channel could be used for data and control information. Once the connections are established, the host device 320 begins sending data to the client devices 350. In turn, the client devices 350 receive the audio data, buffer some portion of the data, and begin playing back the audio data once the buffer has reached a predetermined capacity.
Communication between the host device 320 and the client devices 350 preferably uses the Real-Time Streaming Protocol (RTSP) standard. The media application 322 at the host device 320 preferably uses Real-Time Transport Protocol (RTP) encapsulated in User Datagram Protocol (UDP) packets 330 to deliver audio data from the host device 320 to the client devices 350. RTSP, RTP, and UDP are standards known to those skilled in the art. Therefore, some implementation details are not discussed here. Details of RTSP can be found in “Real-Time Streaming Protocol,” RFC 2326, which is available from http://www.ietf.org/rfc/rfc2326.txt and which is hereby incorporated by reference in its entirety. Details of RTP can be found in “Real-Time Transport Protocol,” RFC 3550, which is available from http://www.ietf.org/rfc/rfc3550.txt and which is hereby incorporated by reference in its entirety.
The packets 330 have RTP headers and include both sequence numbers and timestamps. The data payload of the RTP packets 330 contains the audio data to be played back by the client devices 350. The media files, from which the packets 330 are derived, can be stored on the host device 320 in one or more formats, including, for example, MP3 (Motion Picture Expert's Group Layer 3), AAC (Advanced Audio Coding a/k/a MPEG-4 audio), WMA (Windows Media Audio), etc. Preferably, the media application 322 running on the host device 320 decodes these various audio formats to construct the packets 330 so that the client devices 350 do not need decoders for multiple formats. This also reduces the hardware performance requirements of the client devices 350. Another advantage of performing decoding on the host device 320 is that various effects may be applied to the audio stream, for example, cross fading between tracks, volume control, equalization, and/or other audio effects. Many of these effects would be difficult or impossible for the client devices 350 to apply, for example, because of the computational resources required. Although not preferred in the present embodiment, other embodiments of the present disclosure can allow for decoding at the client devices 350 for audio and other forms of media.
The host device 320 preferably uses a separate unicast stream 310A-B of RTP packets 330 for each of the client devices 350A-B. In the present embodiment, the separate unicast streams 310A-B are intended to deliver the same media information (e.g., audio) to each of the client devices 350A-B so that the same media can be presented at the same time from multiple client devices 350A-B. In another embodiment, each of the separate unicast streams 310A-B can be used to deliver separate media information (e.g., audio) to each of the client devices 350A-B. The user may wish to unicast separate media information in some situations, for example, if a first destination of a first unicast stream of audio is a client device in a game room of a house and a second destination of a second unicast stream of different audio is a client device in the garage of the house. Therefore, it may be preferred in some situations not only to enable the user to send the same media information by unicast streams to multiple client devices but also to allow the user to send different media information by separate unicast streams to multiple client devices. The user interface 200 of FIG. 4 can include a drop-down menu or other way for the user to make such a selection.
Separate unicast streams 310 are preferred because multicasting over wireless networks can produce high loss rates and can be generally unreliable. All the same, the disclosed system 300 can use multicasting over the wireless network. In general, though, bandwidth limitations (i.e., a fixed multicast rate), negative effects on unicast performance (low-rate multicast slows down other unicast traffic due to multicast packets taking longer), and loss characteristics associated with multicasting over wireless (multicast packets are not acknowledged at the wireless layer) make multicasting less desirable than using the multiple unicast streams 310A-B as preferred. Use of multiple unicast streams 310A-B does correspond to an increase in bandwidth as additional client devices 350 are added to a group of designated locations for playback. If the average compression rate for audio data is about 75%, the increase in bandwidth associated with multiple unicast streams 310A-B may correspond to about 1 Mbit/sec of bandwidth required for each client device 350 so that the host device 320 can send compressed audio data to the access point (e.g., 30; FIG. 1) and another 1 Mbit/sec so that the access point can forward the compressed audio data to the client device 350.
Once an RTSP session has been started and the RECORD command has been sent from the host device 320 to the client devices 350, the host device 320 begins sending normal RTP packets 330 containing the audio data for playback. These RTP packets 330 are sent at regular intervals, based on the number of samples per second, which can be about 44,100 Hz for audio. The RTP packets 330 are sent at the regular intervals in a throttled and evenly spaced manner in order to approximate the audio playback rate of the remote client devices 350 because the UDP-based connection does not automatically control the sending of data in relation to the rate at which that data is consumed on the remote client devices 350.
Because each of the multiple client devices 350 has its own audio buffers 361 and 362, network conditions, etc., it may not be desirable to use a feedback scheme when sending the packets 330. Accordingly, the host device 320 sends audio data at a rate that preferably does not significantly under-run or over-run a playback engine 353 of any of the remote client devices 350. To accomplish this, the host device 320 estimates a fixed delay 340 to insert between packets 330 to maintain the desired audio playback rate. In one embodiment, the packets 330 of audio data are sent with a delay of about 7.982 ms between packets 330 (i.e., 352 samples per packet/44,100 Hz ≈ 7.982 ms per packet), which corresponds to a rate of about 125 packets/sec. Because the delay 340 is fixed, each of the client devices 350 can also detect any skew between its clock and the clock of the sending host device 320. Then, based on the detected skew, each client device 350 can insert simulated audio samples or remove audio samples in the audio it plays back in order to compensate for that skew.
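A minimal sketch of the sender-side pacing described above follows: the inter-packet delay falls directly out of the packet size and sample rate. The deadline-based loop (rather than a bare sleep between sends) is an assumption made here so rounding and scheduling jitter do not accumulate; send_rtp_packet() is a hypothetical helper.

```c
#include <time.h>

#define SAMPLES_PER_PACKET 352
#define SAMPLE_RATE        44100   /* Hz */

/* 352 samples / 44,100 Hz = ~7.982 ms between packets, ~125 packets/sec. */
static const long PACKET_INTERVAL_NS =
    (long)(SAMPLES_PER_PACKET * 1000000000LL / SAMPLE_RATE);

extern void send_rtp_packet(void);  /* hypothetical: build/send one packet */

void send_paced_stream(void)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        send_rtp_packet();

        /* Advance an absolute deadline so the long-run rate stays fixed,
         * which lets each client measure skew against a steady cadence. */
        next.tv_nsec += PACKET_INTERVAL_NS;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```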
As alluded to above, the RTP packets 330 have timestamps and sequence numbers. When an RTP packet 330 is received by a client device 350, the client device 350 decrypts and decompresses the payload (see the Encryption and Compression section below), then inserts the packet 330, sorted by its timestamp, into a packet queue. The two audio buffers 361 and 362 are alternately cycled as audio is played back. Each audio buffer 361 and 362 can store a 250-ms interval of audio. The received RTP packets in the packet queue are processed when one of the two cycling audio buffers 361 and 362 completes playback. In one embodiment, the audio is USB-based, so this is a USB buffer completion process.
To process the queued packets, the engine 353 assembles the queued RTP packets in one of the audio buffers 361 or 362. During the assembly, the engine 353 calculates when each of the queued RTP packets should be inserted into the audio stream. The RTP timestamp in the packets combined with time sync information (see the Time Synchronization section below) is used to determine when to insert the packets. The engine 353 performs this assembly process and runs through the queued packets to fill the inactive audio buffer 361 or 362 before the currently playing audio buffer 361 or 362 has completed. Because each of the audio buffers 361 and 362 can store 250 ms of audio, the client device 350 has a little less than 250 ms to assemble all the RTP packets, conceal any losses, and compensate for any clock skew. If there are any gaps in the audio (e.g., the device's audio clock is skewed from the host's audio clock, a packet was lost and not recovered, etc.), then those gaps can be concealed by inserting simulated audio samples or removing existing audio samples.
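The double-buffered assembly just described might be structured as in the sketch below. This is only a sketch under stated assumptions: the queue, concealment, and playback helpers are hypothetical names, and each buffer holds 250 ms of audio (11,025 samples at 44.1 kHz) as stated above.

```c
#include <stdint.h>

#define SAMPLE_RATE    44100
#define BUFFER_SAMPLES (SAMPLE_RATE / 4)   /* 250 ms = 11,025 samples */

typedef struct rtp_packet { uint32_t sample_count; /* ...payload... */ } rtp_packet;

/* Hypothetical helpers assumed by this sketch. */
extern uint32_t    next_presentation_timestamp(void);
extern rtp_packet *queue_pop_if_due(uint32_t rtp_time);
extern void        copy_samples(int16_t *buf, uint32_t off, const rtp_packet *p);
extern uint32_t    conceal_gap(int16_t *buf, uint32_t off);
extern void        start_playback(int16_t *buf);
extern int16_t    *audio_buffer[2];

/* Called when the active buffer finishes playing; the client has a little
 * less than 250 ms to fill the other buffer from the timestamp-sorted queue. */
void on_buffer_complete(int completed)
{
    int fill = 1 - completed;                       /* now-inactive buffer */
    uint32_t start_ts = next_presentation_timestamp();

    for (uint32_t off = 0; off < BUFFER_SAMPLES; ) {
        rtp_packet *p = queue_pop_if_due(start_ts + off);
        if (p) {
            copy_samples(audio_buffer[fill], off, p);
            off += p->sample_count;
        } else {
            /* Gap from loss or clock skew: conceal with simulated samples
             * so the buffer stays aligned with the presentation timeline. */
            off += conceal_gap(audio_buffer[fill], off);
        }
    }
    start_playback(audio_buffer[fill]);
}
```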
V. Encryption and Compression
For digital rights management purposes, it is desirable to determine whether the client devices 350 are authorized to receive an audio data stream and/or whether the communication links between the host device 320 and the client devices 350 are secure (encrypted). This requires some form of authentication, which is preferably based on a public key/private key system. In one embodiment, each client station 350 is provided with a plurality of private keys embedded in read-only memory (ROM). The media application at the host device 320 is then provided with a corresponding plurality of public keys. This allows identification data transmitted from the networked client devices 350 to the media application to be digitally signed by the client device 350 using its private key, by which it can be authenticated by the media application at the host device 320 using the appropriate public key. Similarly, data sent from the media application at the host device 320 to the networked client stations 350 is encrypted using a public key so that only a client device 350 using the corresponding private key can decrypt the data. The media software and networked media station can determine which of their respective pluralities of keys to use based on the exchange of a key index, telling them which of their respective keys to use without the necessity of transmitting entire keys.
In addition to encryption, the decoded audio data is preferably compressed by the host device 320 before transmission to the client devices 350. This compression is most preferably accomplished using a lossless compression algorithm to provide maximum audio fidelity. One suitable compressor is the Apple Lossless Encoder, which is available in conjunction with Apple's iTunes software. The client devices 350 require a decoder for the compression codec used.
The RTP packets 330 are preferably compressed using the Apple Lossless algorithm and are preferably encrypted using the Advanced Encryption Standard (AES) with a 128-bit key size. Loss is still inevitable even though the system 300 uses a UDP-based protocol that attempts to recover from packet loss via retransmission and/or Forward Error Correction (FEC). For this reason, encryption and compression preferably operate on a per-packet basis. In this way, each packet 330 can be completely decoded entirely on its own, without the need for any surrounding packets 330. The Apple Lossless algorithm is used to compress each individual packet 330 rather than compressing a larger stream of audio and packetizing the compressed stream. Although compressing each individual packet 330 may reduce the effectiveness of the compression algorithm, the methodology simplifies operation for the client devices 350 and allows them to be more tolerant of packet loss. Although compression rates are highly dependent on the content, music audio can have an average compression rate of about 75% of the original size when used by the disclosed system 300.
The AES-128 algorithm is used in frame-based cipher block chaining (CBC) mode to encrypt payloads of the RTP packets 330 and the RTP payload portion of the RTCP retransmission packets (380; FIG. 5B) discussed below. Because each packet 330 represents a single audio frame, no other packets are required to decrypt each packet correctly. The system preferably supports any combination of encryption and compression, such as both encryption and compression, encryption only, compression only, or neither encryption nor compression. Encryption and compression are configured during the RTSP ANNOUNCE command. The format used to configure encryption and compression is based on the Session Description Protocol (SDP) and embedded as RTSP header fields. Compression uses an SDP “m” (media description) field combined with an “rtpmap” and “fmtp” to specify the media formats being used numerically and how those numbers map to actual compression formats and algorithms.
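A sketch of this per-packet pipeline follows. The Apple Lossless encoder is represented by a hypothetical alac_compress() call, and OpenSSL's AES routines stand in for whatever AES implementation the system actually uses; resetting the IV for every packet is an assumption made here so that each packet decrypts independently, as the frame-based CBC mode requires, and sending any trailing partial block in the clear is likewise just one illustrative way to avoid padding.

```c
#include <string.h>
#include <openssl/aes.h>

/* Hypothetical: losslessly compress one packet's audio frame into 'out';
 * returns the compressed length. */
extern size_t alac_compress(const unsigned char *pcm, size_t len,
                            unsigned char *out);

size_t prepare_payload(const unsigned char *pcm, size_t pcm_len,
                       const AES_KEY *key, const unsigned char iv[16],
                       unsigned char *out)
{
    unsigned char compressed[2048];
    size_t clen = alac_compress(pcm, pcm_len, compressed);

    /* Frame-based CBC: start each packet from a fresh copy of the IV so
     * no packet's decryption depends on its neighbors. */
    unsigned char iv_copy[16];
    memcpy(iv_copy, iv, sizeof(iv_copy));

    size_t whole = clen & ~(size_t)15;   /* AES-CBC uses 16-byte blocks */
    AES_cbc_encrypt(compressed, out, whole, key, iv_copy, AES_ENCRYPT);
    memcpy(out + whole, compressed + whole, clen - whole); /* tail in clear */
    return clen;
}
```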
VI. Retransmission of Lost Packets of Audio Data
As noted above, the RTP packets 330 received from the host device 320 have RTP sequence numbers. Based on those RTP sequence numbers, the client device 350 can determine whether packets 330 have been lost during transmission or for other reasons. The lost RTP packets 330 cannot be queued for playback in the audio buffers 361 and 362 of the client devices 350, so gaps will result in the audio. To address this issue, the client devices 350 request that the lost packet(s) be retransmitted. Referring to FIG. 5B, a portion of the disclosed system 300 is shown again to discuss how the system 300 attempts to retransmit packets lost during original transmission.
To handle retransmissions, the system 300 preferably uses Real-Time Transport Control Protocol (RTCP) when packet loss is detected. As noted above, the sequence numbers associated with the received RTP packets (330; FIG. 5A) are used to determine if any packets have been lost in the transmission. If there is a gap in the sequence numbers, the client device 350 sends a retransmission request 370 to the sender (e.g., host device 320 or other linked client device 350) requesting all the missing packets. In one embodiment, the retransmission request 370 can request up to a maximum of 128 lost packets per detected gap.
In response to the retransmission request 370, the host device 320 sends one or more retransmission responses 380 for lost packets. Due to limitations of the maximum transmission unit (MTU) on RTCP packet sizes, only one response can be sent per retransmission response packet 380. This means that a single retransmission request packet 370 from a device 350 may generate up to 128 retransmission response packets 380 from the host device 320 if all of the lost packets are found in the host's recently sent packets.
Because RTP does not currently define a standard packet to be used for retransmissions, an RTP extension for an RTCP Retransmission Request packet is preferably defined. FIG. 6A shows an example of an RTCP Retransmit Request Packet 370 for use with the disclosed system. The Sequence Number Base refers to the sequence number of the first (lost) packet requested by this RTCP Retransmit Request Packet 370. The Sequence Number Count refers to the number of (lost) packets to retransmit, starting at the base indicated.
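Modeled from the field description above, the request packet and the client's gap check might look like the following; the struct layout and header fields are an illustrative reading of FIG. 6A (big-endian on the wire), not a normative definition, and reordering handling is omitted for brevity.

```c
#include <stdint.h>

/* Sketch of the RTCP Retransmit Request packet of FIG. 6A. */
typedef struct {
    uint8_t  version_flags;  /* RTCP version and flag bits               */
    uint8_t  packet_type;    /* retransmit-request packet type           */
    uint16_t length;         /* RTCP length field                        */
    uint16_t seq_num_base;   /* sequence number of first lost packet     */
    uint16_t seq_num_count;  /* number of lost packets, starting at base */
} rtcp_retransmit_request;

extern void send_retransmit_request(uint16_t base, uint16_t count); /* hypothetical */

/* On receiving sequence number 'seq' when 'expected' was next, request the
 * missing packets, capped at 128 per detected gap as described above. */
void check_for_loss(uint16_t expected, uint16_t seq)
{
    uint16_t missing = (uint16_t)(seq - expected);   /* mod-2^16 arithmetic */
    if (missing != 0) {
        if (missing > 128)
            missing = 128;
        send_retransmit_request(expected, missing);
    }
}
```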
In FIG. 5B, the client device 350 sending the RTCP Retransmission Request packet 370 tracks the retransmission requests that it sends in a queue to facilitate sending additional requests if a response to the retransmission request 370 is not received in a timely manner. When a retransmission request 370 has not been responded to in a timely manner, another retransmission request 370 is sent from the client device 350. The process of retrying can be continued until a maximum time has elapsed since the first retransmission request 370 was sent. After that maximum time, it is likely too late to deal with the lost packet anyway because the lost packet's time for insertion in one of the audio buffers 361 or 362 has passed.
When multiple, contiguous packets have been lost, the initial retransmit request 370 includes all the missing packets. However, if a response 380 is not received in a timely manner, the missing packets are spread out among multiple requests 370 over time when reattempts are made. Spreading out among multiple requests can maintain a uniform delivery of request and response packets. This also prioritizes packets by time and defers delivery of packets whose presentation time is later.
When the host device 320 receives a retransmission request 370, the host device 320 searches a list of recently sent packets stored at the device 320. If the requested packet in the request 370 is found, the host device 320 sends a retransmission response 380 to the client device 350. An example of an RTP extension for an RTCP Retransmit Response Packet 380 is shown in FIG. 6B. The RTCP Retransmit Response Packet 380 includes the complete RTP packet (e.g., header and payload) being retransmitted. The retransmission packet 380, however, is only sent to the sender of the retransmission request 370, unlike the normal RTP packets (330; FIG. 5A) that are sent to all devices participating in the session.
If the requested packet is not found by the host device 320, however, a negative response 390 is sent so the corresponding client device 350 knows that any further attempt to request that particular packet is futile. An example of an RTP extension for an RTCP Futile Retransmit Response Packet 390 is shown in FIG. 6C. The RTCP Futile Retransmit Response Packet 390 includes the 16-bit sequence number of the failed packet followed by a 16-bit pad containing zero.
In FIG. 5B, the client device 350 receiving a retransmission response packet 380 inserts the packet 380 into the packet queue in the same way used for inserting packets received as part of the normal RTP packet stream discussed above with reference to FIG. 5A. By definition, however, the retransmission response packet 380 is already out-of-sequence and, therefore, does not trigger new retransmission requests based on its sequence number. If a packet already exists at the same timestamp as the incoming packet, whether received via the normal RTP stream or via retransmission, the incoming packet is dropped as a duplicate.
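A sketch of the insertion rule just described, with a hypothetical singly linked queue: packets are kept sorted by timestamp, retransmitted packets take the same path as normal ones, and a packet whose timestamp is already present is dropped as a duplicate.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct queued_packet {
    uint32_t timestamp;              /* RTP timestamp */
    struct queued_packet *next;
    /* ...payload... */
} queued_packet;

/* Insert into the timestamp-sorted queue; frees the packet and returns 0
 * if one with the same timestamp already exists (duplicate from the
 * normal stream or from retransmission). */
int queue_insert(queued_packet **head, queued_packet *pkt)
{
    queued_packet **pp = head;
    /* Signed difference handles RTP timestamp wraparound. */
    while (*pp && (int32_t)((*pp)->timestamp - pkt->timestamp) < 0)
        pp = &(*pp)->next;
    if (*pp && (*pp)->timestamp == pkt->timestamp) {
        free(pkt);                   /* duplicate: drop */
        return 0;
    }
    pkt->next = *pp;
    *pp = pkt;
    return 1;
}
```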
Scheduling retransmission is based on regular reception of RTP packets (330; FIG. 5A) rather than explicit timers. This simplifies the code required and reduces retransmission overhead, but it also throttles retransmission during burst outages (e.g., wireless interference resulting in packet loss during a period). Since retransmissions only occur when RTP packets 330 are received, retransmissions are deferred beyond a possible window when packets 330 may have been lost anyway.
VII. Controlling Relative Volume at Multiple Client Devices During Playback
Because the disclosed system 300 plays music at multiple locations at the same time, it may be desirable to be able to adjust the volume at each location individually. The disclosed system 300 supports individual volume control by using a relative volume setting specified using a header field as part of an RTSP SET_PARAMETER request. The volume is expressed as a floating-point decibel level (e.g., 0 dB for full volume). In addition to volume, the disclosed system 300 can set other parameters related to the delivery of media at multiple locations using similar techniques. For example, the disclosed system 300 can be used to set equalization levels at each location individually.
VIII. Time Synchronization Between Host Device and Multiple Client Devices
Referring to FIG. 7, a portion 300 of the disclosed system is shown having a host device 320 and multiple client devices 350 exchanging timing information. To play the same audio on the multiple client devices 350 in synchronization with each other, the timebase on the multiple client devices 350 is synchronized with a reference clock 324 on the host device 320. As noted previously, the host device 320 can be a Mac or Windows-based system running the media application 322. The host device 320 does not need to run any special server software, and only the media application 322 according to the present disclosure is required. The reference clock 324 at the host device 320 does not need to be synchronized with an external clock, such as one provided by an NTP server. Rather, the client devices 350 only need to be synchronized to the same reference clock 324, even if that clock 324 is wrong with respect to an external clock.
The reference clock 324 is maintained within the media application 322 running on the host device 320. If the host device 320 is a Macintosh computer, then the reference clock 324 can use the PowerPC timebase registers. If the host device 320 is a Windows-based computer, the reference clock 324 can use the Pentium performance counter registers. The reference clock 324 of the host's media application 322 is separate from the normal wall-clock time of the host device 320, which is maintained by an NTP agent and synchronized to an external clock. The reference clock 324 of the host's media application 322 does not need to be synchronized to an external clock, and in some cases this would actually be undesirable. For example, a time difference between the reference clock 324 and the local clock of a client device 350 can be explicitly skewed or adjusted to account for spatial effects or differences, such as the client device 350 being located farther away than another. In addition, there may be situations where a user may want to intentionally skew the clocks to produce effects. Accordingly, the user interface associated with the disclosed system 300, such as interface 200 of FIG. 4, may include a drop-down menu or other control for intentionally manipulating skew.
To synchronize the timebase between the client devices 350 and the host device 320, the media application 322 uses time sync information based on the principles of the Network Time Protocol (NTP) encapsulated in Real-Time Transport Control Protocol (RTCP) packets. Preferably, NTP is not used directly to avoid collisions with existing NTP services (e.g., date/time synchronization with an external clock) and to avoid permission issues due to NTP's use of a privileged port number. Even though the time sync information of the media application 322 is encapsulated in RTCP packets, the time synchronization works substantially the same as NTP and will be referred to as NTP henceforth. NTP is known in the art and provides the basis for inter-media synchronization support in the Real-Time Transport Protocol (RTP). Details of NTP can be found in “Network Time Protocol,” RFC 1305, which is available from http://www.ietf.org/rfc/rfc1305.txt and is incorporated herein by reference in its entirety.
Techniques of NTP, however, are preferably not used to provide moment-to-moment time directly to each client device 350 due to issues related to network latency, bandwidth consumption, and CPU resources. Accordingly, techniques of NTP are used for periodic synchronization of time. In addition, each client device 350 is provided with a high-resolution clock 364 based on the local clock hardware of each client device 350 (see the Local Clock Implementation section below), and the high-resolution clocks 364 are synchronized with the reference clock 324 of the host device 320 using the NTP techniques.
Synchronizing the local clocks 364 of the client devices 350 with the reference clock 324 preferably does not jump to a new time with every correction (referred to as stepping) because stepping can introduce discontinuities in time and can cause time to appear to go backward, which can create havoc on processing code that relies on time. Instead, the time synchronization techniques of the present disclosure preferably correct time smoothly using clock slewing so that time advances in a linear and monotonically increasing manner. In the clock slewing techniques of the present disclosure, frequent micro-corrections, below a tolerance threshold, are performed to the running clocks 364 at the client devices 350 to bring their timebase gradually in sync with the timebase of the reference clock 324 of the host's media application 322. The clock slewing techniques also predict the relative clock skew between the local clocks 364 and the host's reference clock 324 by analyzing the past history of clock offsets and disciplining the local clocks 364 to run at the same rate as the host's reference clock 324.
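One way to realize this slewing, shown as a sketch: each measured offset is applied only up to a small bound per update, and the clock rate is nudged toward the skew estimated from the history of offsets. The tolerance value, gain, and field names are assumptions chosen for illustration, not values taken from the disclosure.

```c
/* Illustrative slewing state for a client's local clock 364. */
typedef struct {
    double rate;     /* local clock rate relative to the reference clock */
    double offset;   /* current estimated offset vs. the host (seconds)  */
} slew_clock;

#define MAX_SLEW_PER_UPDATE 0.000500   /* 500 us: illustrative tolerance */

/* Apply one measurement. Time never steps or runs backward: the correction
 * is clamped to a micro-correction, and the rate is disciplined toward the
 * skew predicted from past offsets (e.g., a fit over recent samples) so
 * future offsets shrink on their own. */
void slew_update(slew_clock *c, double measured_offset, double skew_estimate)
{
    double correction = measured_offset;
    if (correction >  MAX_SLEW_PER_UPDATE) correction =  MAX_SLEW_PER_UPDATE;
    if (correction < -MAX_SLEW_PER_UPDATE) correction = -MAX_SLEW_PER_UPDATE;
    c->offset += correction;
    c->rate   += 0.1 * skew_estimate;   /* small gain keeps changes smooth */
}
```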
Because a centralized reference clock 324 is used for several client devices 350 on a local network, one way to disseminate time information is to send broadcast/multicast NTP packets periodically from the host device 320 to the client devices 350. Sending NTP packets by multicasting, however, must account for losses and performance degradation that may result from the wireless 802.11b and 802.11g communication links between the host device 320 and the client devices 350. Due to issues of performance degradation, loss rates, and lack of propagation delay information associated with broadcasting or multicasting, unicast NTP transactions 400 are preferably used.
As part of the unicast NTP transactions 400, the client devices 350 periodically send unicast requests 410 to the host device 320 so that the client devices 350 can synchronize their clocks 364 with the reference clock 324. Then, the client devices 350 use responses 420 from the host device 320 corresponding to their requests 410 to continually track the clock offset and propagation delay between the client device 350 and the host device 320 so the client devices 350 can update their local clocks 364. Thus, synchronization of the audio playback at the client devices 350 is achieved by maintaining local clocks 364 that are synchronized to the host device's clock 324. Since all client devices 350 participating in a particular session are synchronized to the reference clock 324, the client devices 350 can play audio in sync without ever communicating with each other once the clocks 324 and 364 are synchronized.
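The offset and propagation delay tracked from each request/response pair follow the standard NTP formulas, with T1 the client's transmit (originate) time, T2 the host's receive time, T3 the host's transmit time, and T4 the client's receive time:

```c
/* Standard NTP computation for one unicast transaction 400 (times shown in
 * seconds for clarity; the wire format carries 64-bit NTP fixed point). */
double clock_offset(double t1, double t2, double t3, double t4)
{
    /* Average of the two one-way offset estimates. */
    return ((t2 - t1) + (t3 - t4)) / 2.0;
}

double round_trip_delay(double t1, double t2, double t3, double t4)
{
    /* Total round trip minus the host's processing time. */
    return (t4 - t1) - (t3 - t2);
}
```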
With the timebase at the client devices 350 synchronized with the reference clock 324 at the host device 320, the client devices 350 can use the synchronized timebase to determine when to play back packets of audio data. As noted previously, audio data is delivered to the client devices 350 using RTP packets (330; FIG. 5A) that contain an RTP timestamp describing the time of a packet's audio relative to other packets in the audio stream. The client device 350 uses this timestamp information to reconstruct audio at the correct presentation time for playback. Accordingly, each client device 350 correlates the NTP timebase of its local clock 364 with the RTP timestamps provided in the RTP packets of the audio stream.
With respect to the unicast requests and responses 410 and 420 noted above, RTP does not define a standard packet format for synchronizing time. There is an RTCP sender report, which contains some timing information, but not everything needed to synchronize time (e.g., there is no originate time from which receivers can determine the round-trip time). There are also rules preventing sender reports from being sent before any RTP data has been sent, yet synchronizing time before any audio is sent is critical for playing the initial audio samples in sync.
Therefore, the host's media application 322 preferably defines an RTP extension for an RTCP TimeSync packet for the requests and responses 410 and 420. An embodiment of an RTCP TimeSync packet 430 is shown in FIG. 8A. The RTCP TimeSync packet 430 includes a header; the RTP timestamp at NTP Transmit (T3) time; the NTP Originate (T1) timestamp, most significant and least significant words; the NTP Receive (T2) timestamp, most significant and least significant words; and the NTP Transmit (T3) timestamp, most significant and least significant words. The Marker bit (M) is not used for these TimeSync packets 430. The packet types (PT) include '210' for a client device request to synchronize time, in a manner similar to an NTP client device request, and '211' for a host device response to a client device request. The 'RTP Timestamp' is the RTP timestamp at the same instant as the transmit time (T3); this should be 0. The times T1-T3 come from NTP and are used in the same manner as in NTP.
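One plausible in-memory layout for this packet, following the field order recited above, is sketched below in C. The header bit packing and field names are illustrative assumptions; all multi-byte fields would be carried in network byte order.

```c
#include <stdint.h>

#define TIMESYNC_PT_REQUEST  210  /* client -> host */
#define TIMESYNC_PT_RESPONSE 211  /* host  -> client */

typedef struct {
    uint8_t  ver_pad_marker;   /* version, padding, marker (M, unused)     */
    uint8_t  packet_type;      /* PT: 210 for request, 211 for response    */
    uint16_t length;           /* RTCP length field                        */
    uint32_t rtp_timestamp;    /* RTP time at T3; per the text, 0          */
    uint32_t ntp_originate_hi; /* T1, most significant word                */
    uint32_t ntp_originate_lo; /* T1, least significant word               */
    uint32_t ntp_receive_hi;   /* T2, most significant word                */
    uint32_t ntp_receive_lo;   /* T2, least significant word               */
    uint32_t ntp_transmit_hi;  /* T3, most significant word                */
    uint32_t ntp_transmit_lo;  /* T3, least significant word               */
} rtcp_timesync_packet_t;
```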
In FIG. 7, the RTCP TimeSync request packets 410 from the client devices 350 are sent once the RTSP RECORD command is received so that the client devices 350 can initially synchronize time. Then, the client devices 350 periodically send RTCP TimeSync request packets 410. In one embodiment, the periodic intervals for synchronizing time can be at random intervals between two and three seconds apart. The RTCP TimeSync response packets 420 are sent by the host device 320 in response to receiving a valid RTCP TimeSync request packet 410.
The host's media application 322 also defines an RTP extension for an RTCP TimeAnnounce packet 450. The RTCP TimeAnnounce packets 450 are sent periodically (e.g., once a second) by the host device 320 to update the client devices 350 with the current timing relationship between NTP and RTP. The RTCP TimeAnnounce packets 450 can be sent sooner if the host device 320 changes the NTP-to-RTP timing relationship. For example, when a new song starts, the host's media application 322 can send a new RTCP TimeAnnounce packet 450 with the marker bit (M) set to indicate that the NTP-to-RTP timing relationship has changed.
As shown in the embodiment of FIG. 8B, the RTCP TimeAnnounce packet 450 includes an RTP timestamp; an NTP timestamp, high 32 bits; an NTP timestamp, low 32 bits; and an RTP timestamp indicating when the new timeline should be applied. The Marker bit (M) is used to indicate an explicit change in the NTP-to-RTP timing relationship. The packet type (PT) is defined as '212' to indicate that the host device is announcing a new NTP-to-RTP relationship. The 'RTP Timestamp' is the RTP timestamp at the same instant as the NTP timestamp, and the 'NTP Timestamp' is the NTP timestamp at the same instant as the RTP timestamp. The field 'RTP Apply Timestamp' is the RTP timestamp at which the new timeline should be applied.
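The sketch below illustrates the payload fields of the TimeAnnounce packet 450 and one way a client device 350 might use the most recently applied announce as an anchor to convert an RTP timestamp from an incoming audio packet into a presentation time on its synchronized timebase. The sample rate, names, and the conversion helper are assumptions for illustration; the anchor_local_ns argument stands for the announce's NTP time already converted to the local synchronized clock.

```c
#include <stdint.h>

#define TIMEANNOUNCE_PT 212
#define SAMPLE_RATE_HZ  44100u  /* assumed audio sample rate */

typedef struct {
    uint32_t rtp_timestamp;  /* RTP time at the same instant as ntp_*  */
    uint32_t ntp_hi;         /* NTP timestamp, high 32 bits            */
    uint32_t ntp_lo;         /* NTP timestamp, low 32 bits             */
    uint32_t rtp_apply_time; /* RTP time at which this mapping applies */
} rtcp_timeannounce_t;

/* Convert an RTP timestamp to nanoseconds on the synchronized
 * timebase, using the applied announce as the anchor point. */
static uint64_t rtp_to_local_ns(const rtcp_timeannounce_t *anchor,
                                uint64_t anchor_local_ns,
                                uint32_t rtp_ts)
{
    /* Signed difference tolerates RTP timestamp wraparound. */
    int32_t delta_samples = (int32_t)(rtp_ts - anchor->rtp_timestamp);
    int64_t delta_ns = (int64_t)delta_samples * 1000000000LL
                       / (int64_t)SAMPLE_RATE_HZ;
    return anchor_local_ns + (uint64_t)delta_ns;
}
```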
IX. Local Clock Implementation at Client Device
Returning to FIG. 7, the local clock 364 of the client device 350 is discussed in more detail. The local clock 364 maintains a 64-bit nanoseconds counter that starts at zero on system boot and uses the 60-Hz clock interrupt to increment the nanoseconds counter. When an interrupt occurs, a 32-bit timer counter is used to determine how much time has passed since the last clock interrupt. This determined amount of time since the last clock interrupt is referred to as the tick delta and is in units of 1/100 of a microsecond. The tick delta is then converted to nanoseconds and added to the nanoseconds counter to maintain the current time. The tick delta is used in this manner to avoid drift due to interrupt latency.
To maintain more accurate time, it may be preferable to allow time to be adjusted gradually. Accordingly, the nanoseconds counter is adjusted in very small increments during each clock interrupt to “slew” to the target time. These small increments are chosen based on a fraction of the amount of adjustment needed and based on the tick delta. This prevents time from appearing to go backward so that time always increases in a linear and monotonic manner.
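The per-interrupt update and slewing described in the two preceding paragraphs might look as follows in C. The counter widths and units follow the text; the slew divisor and the clamping policy are illustrative assumptions chosen only to show how the counter can advance monotonically while converging on the target time.

```c
#include <stdint.h>

static uint64_t ns_counter;        /* 64-bit nanoseconds since boot   */
static int64_t  slew_remaining_ns; /* outstanding correction to apply */

#define SLEW_DIVISOR 1000  /* apply ~1/1000 of the correction per tick */

/* Called on each clock interrupt with the tick delta in 1/100 us. */
void clock_interrupt(uint32_t tick_delta_hundredths_us)
{
    /* 1/100 of a microsecond = 10 ns */
    uint64_t delta_ns = (uint64_t)tick_delta_hundredths_us * 10u;

    /* Small slew increment, bounded by a fraction of the tick delta
     * so the counter always moves forward (linear, monotonic time). */
    int64_t adjust = slew_remaining_ns / SLEW_DIVISOR;
    int64_t max_adjust = (int64_t)delta_ns / 2;
    if (adjust >  max_adjust) adjust =  max_adjust;
    if (adjust < -max_adjust) adjust = -max_adjust;

    ns_counter += delta_ns + adjust;
    slew_remaining_ns -= adjust;
}
```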
Additionally, the client device 350 can predict what the next NTP clock offset will be in order to further adjust the local clock 364. To make the prediction, the client device 350 uses a moving average of NTP clock offsets to estimate the slope of the clock skew between the client device 350 and the host device 320. This slope is then extrapolated to estimate the amount of adjustment necessary to keep the local clock 364 at the client device 350 in sync with the reference clock 324. The client device 350 then makes very small adjustments to the per-clock-interrupt increment, in addition to the adjustments made for clock slewing, to simulate the faster or slower clock frequency of the host's reference clock 324. This allows the local clock 364 to remain synchronized between NTP update intervals and may even allow the local clock 364 to remain synchronized in the absence of future NTP clock updates.
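A sketch of this skew estimation follows. The text describes averaging recent offsets to extract a trend; the sketch uses a least-squares slope over a small window as one concrete way to do so, with the window size and names as assumptions. The returned value is a dimensionless drift rate (nanoseconds of offset per nanosecond of elapsed time) that can be folded into the per-interrupt increment.

```c
#include <stddef.h>

#define SKEW_WINDOW 8

/* Estimate relative clock skew from the last SKEW_WINDOW
 * (sample time, measured offset) pairs. */
static double estimate_skew(const double t_ns[SKEW_WINDOW],
                            const double offset_ns[SKEW_WINDOW])
{
    /* Center timestamps to keep the arithmetic well conditioned. */
    double t0 = t_ns[0];
    double st = 0, so = 0, stt = 0, sto = 0;
    for (size_t i = 0; i < SKEW_WINDOW; i++) {
        double t = t_ns[i] - t0;
        st += t;
        so += offset_ns[i];
        stt += t * t;
        sto += t * offset_ns[i];
    }
    double n = (double)SKEW_WINDOW;
    return (n * sto - st * so) / (n * stt - st * st); /* slope */
}
```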
X. Simulated Timelines for Audio Playback
Referring to FIG. 9, additional details related to synchronized delivery of media to multiple client devices are discussed. In FIG. 9, a portion of the network media delivery system 300 is again illustrated. The host device 320 is schematically shown having the media application 322 and reference clock 324, as described previously. In addition, the host device 320 is schematically shown having an engine 323, a processor 325, a transmission interface 326, and an audio interface 327. As disclosed herein, the host device 320 can be a computer. Therefore, the processor 325 can be a conventional computer processor, the transmission interface 326 can be a Wi-Fi compatible wireless network interface, and the audio interface 327 can be a sound card or the like for playing audio. In addition, the media application 322 can be a software program stored in memory on the computer and operating on the computer processor 325. Furthermore, the media application 322 can include the engine 323 for processing media (e.g., audio) data and can include the reference clock 324 for synchronizing time.
To play audio in a synchronized manner on multiple client devices 350 (only one of which is shown in FIG. 9), audio data needs to be scheduled for playback at a constant or consistent rate. One way to achieve this is for the media application 322 on the host device 320 to send packets 330 of audio data at a constant rate and to have the timeline for presenting that audio data at the client device 350 tied to the send rate of the packets 330. For example, packets of audio data can be sent about every 7.982 ms (i.e., 352 samples per packet / 44,100 Hz ≈ 7.982 ms per packet, which corresponds to a rate of about 125 packets/sec), and the timeline for presenting that audio can correspond directly to this rate. While this works, the send rate of the packets 330 and the presentation timeline at the client device 350 must have a one-to-one correspondence, which can restrict the ability to buffer the audio data at the client device 350. As discussed herein, buffering of the audio data at the client devices 350 is desirable for handling lost packets, clock skew, etc. If five seconds of buffering is desired at the client device 350, there will be a five-second delay between the time when the audio data arrives at the client device and the time when it is actually played. Unfortunately, users can readily perceive such a high level of latency when buffering is used with such a one-to-one correspondence between the packet send rate and the presentation time of the audio.
To provide buffering without this high level of latency, the sending of packets 330 is preferably decoupled or separated from the timeline for presenting the audio data of those packets 330. To achieve this, the media application 322 maintains two simulated timelines 328 and 329. A first packet timeline 328 corresponds to when packets 330 should be sent, and a second playback timeline 329 corresponds to when the audio data in those packets 330 should be presented or delivered (i.e., played for the user). The separate timelines 328 and 329 allow the send rate of the packets 330 to vary as needed so that the system 300 can provide buffering without introducing latency. If more buffering is needed, for example, the packet send rate of the first packet timeline 328 can be temporarily increased to front-load the buffers in memory 354 on the client devices 350 and can later be reduced back to the real-time send rate of the packets 330. The separate timelines 328 and 329 also avoid problems associated with fluctuations in the presentation time of audio caused by scheduling latency of the operating systems on the devices.
The second playback timeline 329, which corresponds to when the audio data in the packets 330 should be presented or delivered, is constructed by the host device 320. Using the reference clock 324 and the desired playback rate of the audio, the host device 320 estimates the number of audio samples that would have played at a given point in time at the client device 350 to construct the playback timeline 329. This second playback timeline 329 is then published from the host device 320 to the client devices 350 as part of the time announcements 450 sent periodically from the host device 320 to the client devices 350. As discussed in greater detail previously, the client device 350 uses the periodic time announcements 450 to establish and maintain the relationship between the RTP timestamps in the audio packets 330 and the corresponding NTP presentation time for the audio packets 330 so that the client device 350 can deliver the audio in sync with other devices.
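The core of this estimate is simple: given the reference time and the stream's nominal start time, compute how many samples should have been presented by now. A minimal C sketch follows, assuming a 44,100-Hz sample rate and illustrative names; the split division avoids 64-bit overflow for long sessions.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ 44100u  /* assumed audio sample rate */

/* Number of samples that should have been presented at reference
 * time now_ns for a stream whose first sample plays at start_ns. */
static uint64_t samples_played(uint64_t now_ns, uint64_t start_ns)
{
    if (now_ns <= start_ns)
        return 0;
    uint64_t elapsed = now_ns - start_ns;
    return (elapsed / 1000000000ULL) * SAMPLE_RATE_HZ
         + (elapsed % 1000000000ULL) * SAMPLE_RATE_HZ / 1000000000ULL;
}
```

The resulting sample count, paired with the corresponding NTP time, is what a TimeAnnounce packet 450 would publish.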
Because the send rate of the packets 330 (represented by the packet timeline 328) is separate from the presentation time (represented by the playback timeline 329), the periodic time announcements 450 are not designed to take effect immediately when received by the client devices 350, since the announcements 450 may arrive in advance of when they become effective. As noted previously, however, the time announcement packets 450 contain an additional RTP timestamp that indicates when the announced time should take effect at the client device 350. Therefore, a time announcement packet 450 is saved at a client device 350 once it is received. When audio playback reaches the RTP timestamp of that saved time announcement packet 450, the client device 350 applies the time change contained in that saved time announcement packet 450.
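A sketch of this deferred-apply behavior is shown below. The pending-announce structure, the apply helper, and the single-slot queue are assumptions for illustration; the signed comparison is a common idiom for tolerating RTP timestamp wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed helper: installs a new RTP-to-NTP mapping. */
extern void apply_rtp_ntp_mapping(uint32_t rtp_ts,
                                  uint32_t ntp_hi, uint32_t ntp_lo);

typedef struct {
    uint32_t rtp_timestamp;  /* RTP/NTP anchor pair from the announce */
    uint32_t ntp_hi, ntp_lo;
    uint32_t rtp_apply_time; /* when the new timeline takes effect    */
    bool     pending;
} pending_announce_t;

static pending_announce_t g_pending;

/* Called as playback advances, with the RTP time now being played. */
void maybe_apply_announce(uint32_t rtp_now)
{
    if (!g_pending.pending)
        return;
    /* Signed compare tolerates RTP timestamp wraparound. */
    if ((int32_t)(rtp_now - g_pending.rtp_apply_time) >= 0) {
        apply_rtp_ntp_mapping(g_pending.rtp_timestamp,
                              g_pending.ntp_hi, g_pending.ntp_lo);
        g_pending.pending = false;
    }
}
```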
To play audio in a synchronized manner on multiple client devices 350 (only one of which is shown in FIG. 9), it is also preferable to consider the amount of latency or delay between the time when the audio data is scheduled to be delivered at the device 350 and the time when the audio is actually delivered by the device 350 (and its associated entertainment devices). Different types of client devices 350 (and associated entertainment devices) will typically have different latency characteristics. Accordingly, the disclosed system 300 preferably provides a way for each client device 350 to report its latency characteristics (and those of its associated entertainment device) to the host device 320 so that these latency characteristics can be taken into consideration when determining how to synchronize the playback of media at the client devices 350.
Determination of the latency characteristics of the client devices 350 preferably occurs at initial setup of the system 300. For example, the media application 322 at the host device 320 sends RTSP SETUP requests 312 to the client devices 350 at initial setup. In responses 314 to the RTSP SETUP requests 312, the client devices 350 use a header field to report the latency characteristics associated with the client devices 350. The values of the field are preferably given as a number of RTP timestamp units of latency. For example, a client device 350 having 250 ms of latency at a 44,100-Hz sample rate would report its audio latency as 11025 RTP timestamp units. Based on the reported latency characteristics from the client devices 350, the host's media application 322 determines the maximum latency of all client devices 350 in the group being used for playback. This maximum latency is then added to the playback timeline 329.
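The unit conversion and the group maximum can be expressed in a few lines of C, sketched below with illustrative names and an assumed 44,100-Hz sample rate. Applying latency_ms_to_rtp_units(250) reproduces the 11025-unit example above.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ 44100u  /* assumed audio sample rate */

/* Client side: convert a latency in milliseconds to RTP timestamp
 * units, e.g., 250 ms * 44100 Hz / 1000 = 11025 units. */
static uint32_t latency_ms_to_rtp_units(uint32_t latency_ms)
{
    return latency_ms * SAMPLE_RATE_HZ / 1000u;
}

/* Host side: the group latency added to the playback timeline is
 * the maximum latency reported by any client in the group. */
static uint32_t group_latency(const uint32_t *reported, unsigned n)
{
    uint32_t max = 0;
    for (unsigned i = 0; i < n; i++)
        if (reported[i] > max)
            max = reported[i];
    return max;
}
```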
XI. Synchronized Local Playback at Host Device
In addition to synchronized playback at multiple client devices 350, the disclosed system 300 allows for synchronized local playback at the host device 320 running the media application 322. For example, the host device 320 can play the same audio through its local speakers (not shown) that is being played by the client devices 350, and the host device 320 can have that same audio play in sync with all the other devices 350. To achieve this, the host device 320 uses many of the same principles as applied to the client devices 350. Rather than receiving packets of audio data over a wireless network, however, the audio data is delivered directly to a local playback engine 323 of the media application 322. In addition, because local playback on the host device 320 is handled by the media application 322, there is no need for the host device 320 to synchronize time with its own reference clock 324.
The packets of audio data delivered to the synchronized local playback engine 323 within the media application 322 are generated before being compressed and encrypted. Because these packets do not leave the media application 322, no compression or encryption is necessary. In one embodiment, the host device 320 uses CoreAudio to play back audio. CoreAudio can be used for both Mac-based and Windows-based computers because QuickTime 7 provides support for CoreAudio on Windows-based computers. During operation, an output AudioUnit is opened, and a callback is installed. The callback is called when CoreAudio needs audio data to play. When the callback is called, the media application 322 constructs the relevant audio data from the raw packets delivered to it along with the RTP-to-NTP timing information. Because CoreAudio has different latency characteristics than those associated with the client devices 350, information is also gathered about the presentation latency associated with the CoreAudio audio stream. This information is used to delay the CoreAudio audio stream so that it plays in sync with the known latency of the audio streams associated with the client devices 350.
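For illustration, a render callback of this kind might look as follows on a Mac-based computer. The callback signature is CoreAudio's standard AURenderCallback; the fill_synced_audio helper, which would copy the next decoded frames already aligned to the synchronized timeline (including the latency compensation described above), is an assumption of this sketch rather than the disclosed implementation.

```c
#include <AudioUnit/AudioUnit.h>

/* Assumed helper: fills dst with the next 'frames' frames of audio,
 * positioned according to the RTP-to-NTP timing information. */
extern void fill_synced_audio(void *dst, UInt32 frames);

static OSStatus render_cb(void *refcon,
                          AudioUnitRenderActionFlags *flags,
                          const AudioTimeStamp *ts,
                          UInt32 bus,
                          UInt32 frames,
                          AudioBufferList *io)
{
    (void)refcon; (void)flags; (void)ts; (void)bus;
    /* CoreAudio calls this whenever it needs audio data to play. */
    for (UInt32 i = 0; i < io->mNumberBuffers; i++)
        fill_synced_audio(io->mBuffers[i].mData, frames);
    return noErr;
}
```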
XII. Stutter Avoidance During Audio Playback
In addition to the techniques discussed previously for handling lost RTP packets of audio data and for synchronizing clocks between the host device 320 and the client devices 350, the disclosed system 300 preferably limits stuttering in the playback of media. Referring to FIG. 10, an algorithm 500 for limiting stutter in the playback of media is shown in flowchart form. This algorithm 500 can be performed by the host device of the disclosed system for each of the client devices. Using the algorithm 500, the disclosed system detects audible "glitches" caused by gaps in the media (e.g., audio). These gaps can be caused by loss of packets, packets arriving too late, changes to the synchronized timeline, large amounts of clock skew, or other reasons. First, the system determines the number of such "glitches" occurring in a period of time for each of the client devices (Block 502). Then, a determination is made whether the number of glitches is greater than a predetermined limit (Block 504). For example, the audio is analyzed over a period of 250 ms to determine whether the 250-ms period is either "glitching" (bad) or "glitch-free" (good). A credit system is used to make this determination. Each time a glitching period is detected, the system takes away a number of credits from a credit score of the client device. The credit score is capped at a minimum value to prevent a long sequence of glitching periods from requiring a protracted period of time for the client device to recover, because the intention is to allow the client device to recover quickly as soon as its audio situation clears up.
If the number of credits goes below a predefined threshold at Block 504, the client device is put on probation (Block 506). When on probation, audio is disabled and silenced, but the client device can still send retransmit requests to the host device as needed to recover lost packets of audio data. The audio is silenced during probation so that the client device will not produce an annoying stutter sound when a significant number of glitching periods are successively delivered in an interval of time. Even though the audio is silenced, retransmits remain enabled so that operation of the client device can improve to a point suitable to resume playback.
If the number of glitches is not greater than the limit at Block 504, then the client device is set as "glitch-free" (Block 505). Each time a glitch-free period is detected, for example, a number of credits is added to the credit score for the client device. The number of credits is capped at a maximum value to prevent a long sequence of glitch-free periods from extending the number of glitches required before going into stutter avoidance mode, because the intention is to be able to go into stutter avoidance mode quickly so that no significant stutter is produced.
For a client device on probation with audio silenced and retransmits enabled, the number of glitches occurring in a predetermined unit of time (e.g., X seconds) is determined (Block 508). The number of glitches is compared to a predetermined limit or threshold (Block 510). If the client device has been on probation for the predetermined unit of time (X seconds) and the number of credits reaches an upper threshold at Block 510, the client device is placed back into normal playback mode at Block 505.
If the client device remains on probation for the predetermined unit of time (X seconds) and the number of credits has not reached the upper threshold at Block 510, then the client device is put in jail (Block 512). When in jail, the audio remains disabled and silenced. However, retransmits are now disabled. In this situation, the client device has not recovered for a significant period of time, and any retransmits may actually be making the situation worse. By disabling retransmits, the recovery time may be improved by reducing congestion on the network. In addition, disabling retransmits may at least reduce the amount of traffic on the network and may allow other client devices to receive packets of audio data more reliably.
If the client device remains in jail for a predetermined unit of time (e.g., Y seconds) at Block 514, the client device goes on parole to see if its situation has improved (Block 516). When on parole, audio is still disabled and silenced. However, retransmits are re-enabled. The number of glitches occurring in a predetermined unit of time (e.g., Z seconds) is determined (Block 518) and compared to a predetermined limit (Block 520). If the client device has been on parole for the predetermined unit of time and the number of credits reaches an upper threshold at Block 520, then the client device returns to normal playback mode at Block 505, where audio and retransmits are both enabled. If the client device stays on parole for the predetermined unit of time and the number of credits does not reach the upper threshold at Block 520, however, the client device goes back to jail at Block 512.
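The normal/probation/jail/parole progression of FIG. 10 can be summarized as a small state machine, sketched below in C. The mode names and transitions follow the text; the credit values, caps, thresholds, and the expression of the X, Y, and Z intervals in 250-ms evaluation periods are illustrative assumptions, and for brevity the parole interval (Z) is treated the same as the probation interval (X).

```c
#include <stdbool.h>

typedef enum { NORMAL, PROBATION, JAIL, PAROLE } play_mode_t;

typedef struct {
    play_mode_t mode;
    int  credits;          /* clamped to [CREDIT_MIN, CREDIT_MAX]      */
    int  periods_in_mode;  /* evaluation periods spent in current mode */
    bool audio_enabled;    /* false => output silenced                 */
    bool retransmits_enabled;
} client_state_t;

enum {
    GLITCH_PENALTY    = 2,  /* credits removed per glitching period    */
    GLITCH_FREE_GAIN  = 1,  /* credits added per glitch-free period    */
    CREDIT_MIN = -8, CREDIT_MAX = 8,
    ENTER_AVOIDANCE   = -4, /* Block 504: at or below => probation     */
    RECOVER_OK        =  4, /* Blocks 510/520: upper credit threshold  */
    PROBATION_PERIODS = 20, /* "X seconds" expressed in periods        */
    JAIL_PERIODS      = 40, /* "Y seconds" expressed in periods        */
};

/* Called once per 250-ms evaluation window for each client device. */
void update_stutter_state(client_state_t *c, bool period_glitched)
{
    c->credits += period_glitched ? -GLITCH_PENALTY : GLITCH_FREE_GAIN;
    if (c->credits < CREDIT_MIN) c->credits = CREDIT_MIN; /* quick recovery   */
    if (c->credits > CREDIT_MAX) c->credits = CREDIT_MAX; /* quick avoidance  */
    c->periods_in_mode++;

    switch (c->mode) {
    case NORMAL:                                 /* Block 505 */
        if (c->credits <= ENTER_AVOIDANCE) {     /* Block 506 */
            c->mode = PROBATION;
            c->audio_enabled = false;            /* silence output  */
            c->retransmits_enabled = true;       /* keep recovering */
            c->periods_in_mode = 0;
        }
        break;
    case PROBATION:
    case PAROLE:
        if (c->periods_in_mode >= PROBATION_PERIODS) {
            if (c->credits >= RECOVER_OK) {      /* Blocks 510/520  */
                c->mode = NORMAL;                /* back to Block 505 */
                c->audio_enabled = true;
            } else {
                c->mode = JAIL;                  /* Block 512       */
                c->retransmits_enabled = false;  /* cut congestion  */
            }
            c->periods_in_mode = 0;
        }
        break;
    case JAIL:
        if (c->periods_in_mode >= JAIL_PERIODS) {
            c->mode = PAROLE;                    /* Block 516       */
            c->retransmits_enabled = true;
            c->periods_in_mode = 0;
        }
        break;
    }
}
```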
XIII. Handling Address Resolution Protocol
With reference again to FIG. 5A, for example, the high volume of data being exchanged by the disclosed system 300 can cause Address Resolution Protocol (ARP) requests, which are broadcast, to become lost. This may be the case especially when the ARP requests are wirelessly broadcast. ARP is a network protocol used to map a network layer protocol address to a data link layer hardware address. For example, ARP can be used to resolve an IP address to a corresponding Ethernet address. When ARP requests are lost, ARP entries at the host device 320 can expire and can fail to be renewed during operation of the disclosed system 300, so that connections between the host device 320 and the client devices 350 may appear to go down. Because steady, unicast streams 310 of packets 330 are being exchanged during operation of the disclosed system 300, one solution to this problem is to extend the expiration times of the ARP entries at the host device 320 as long as packets 330 from the host device 320 are being received by the client devices 350. By extending the expiration time, the ARP entry for a given client device 350 does not time out (as long as packets 330 are being received by that client device 350), and the client device 350 does not need to explicitly exchange ARP packets, which may tend to get lost as noted previously, with the host device 320.
In another solution, the client devices 350 periodically (e.g., once a minute) send unsolicited, unicast ARP request packets (not shown) to the host device 320. These unicast ARP request packets contain source addresses (the Internet Protocol (IP) address and hardware address of the client device 350) and target addresses (the IP address and hardware address of the host device 320). The unicast ARP request packets are more reliable than broadcast packets because the unicast packets are acknowledged and retried at the wireless layer. To keep the ARP entries on the host device 320 for the client devices 350 from expiring, the host device 320 updates its ARP cache when it receives these unicast ARP request packets by refreshing the timeout for the corresponding ARP entries. This prevents the host device 320 from needing to issue a broadcast ARP request when the ARP entry for a client device 350 expires, because the ARP entries effectively never expire as long as the client devices 350 continue to send unicast ARP request packets to the host device 320.
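The host-side refresh logic can be illustrated with the following C sketch. The table structure, timeout value, and function names are assumptions for illustration; a real operating system would perform this refresh inside its own ARP cache rather than through an application-level table.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define ARP_TIMEOUT_SECS 1200  /* assumed extended entry lifetime */

typedef struct {
    uint32_t ip;          /* client IP address       */
    uint8_t  hw[6];       /* client hardware address */
    time_t   expires_at;  /* entry expiration time   */
} arp_entry_t;

/* Called when a unicast ARP request arrives from a client device:
 * refresh the matching entry's timeout so it effectively never
 * expires while the client keeps sending requests. */
void refresh_arp_entry(arp_entry_t *table, unsigned n,
                       uint32_t src_ip, const uint8_t src_hw[6])
{
    for (unsigned i = 0; i < n; i++) {
        if (table[i].ip == src_ip) {
            memcpy(table[i].hw, src_hw, 6);
            table[i].expires_at = time(NULL) + ARP_TIMEOUT_SECS;
            return;
        }
    }
}
```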
The foregoing description of preferred and other embodiments is not intended to limit or restrict the scope or applicability of the inventive concepts conceived of by the Applicants. In exchange for disclosing the inventive concepts contained herein, the Applicants desire all patent rights afforded by the appended claims. Therefore, it is intended that the appended claims include all modifications and alterations to the full extent that they come within the scope of the following claims or the equivalents thereof.