TECHNICAL FIELD
Embodiments described herein relate to position calibration of audio sensors and actuators in a distributed computing platform.
BACKGROUND
Many emerging applications, such as multi-stream audio/video rendering, hands-free voice communication, object localization, and speech enhancement, use multiple sensors and actuators (e.g., multiple microphones or cameras and multiple loudspeakers or displays, respectively). However, much of the current work has focused on setting up all the sensors and actuators on a single platform. Such a setup requires a considerable amount of dedicated hardware. For example, setting up a microphone array on a single general-purpose computer would typically require expensive multichannel sound cards and a central processing unit (CPU) with greater computational power to process all of the streams.
Computing devices such as laptops, personal digital assistants (PDAs), tablets, cellular phones, and camcorders have become pervasive. These devices are equipped with audio-visual sensors (such as microphones and cameras) and actuators (such as loudspeakers and displays). The audio/video sensors on different devices can be used to form a distributed network of sensors. Such an ad-hoc network can be used to capture different audio-visual scenes (events such as business meetings, weddings, or public events) in a distributed fashion and then use the multiple audio-visual streams for emerging applications. For example, one could imagine using the distributed microphone array formed by the laptops of participants during a meeting in place of expensive stand-alone speakerphones. Such a network of sensors can also be used to detect, identify, locate, and track stationary or moving sources and objects.
Implementing a distributed audio-visual I/O platform includes placing the sensors, actuators, and platforms into a space coordinate system, which in turn includes determining the three-dimensional positions of the sensors and actuators.
DESCRIPTION OF DRAWINGS
FIG. 1 illustrates a schematic representation of a distributed computing platform in accordance with one embodiment.
FIG. 2 is a flow diagram describing the process of generating position information for audio sensors and actuators in accordance with one embodiment.
FIG. 3 illustrates a computation scheme to generate position coordinates.
DETAILED DESCRIPTION
Embodiments of a three-dimensional position calibration of audio sensors and actuators in a distributed computing platform are disclosed. In the following description, numerous specific details are set forth. However, it is understood that embodiments may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
Reference throughout this specification to “one embodiment” or “an embodiment” indicates that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
FIG. 1 illustrates a schematic representation of a distributed computing platform consisting of a set of general-purpose computers (GPCs) 102-106 (sometimes referred to as computing devices). GPC 102 is configured to be the master and performs the location estimation. The GPCs 102-106 shown in FIG. 1 may include a personal computer (PC), laptop, PDA, tablet PC, or other computing devices. In one embodiment, each GPC is equipped with audio sensors 108 (e.g., microphones), actuators 110 (e.g., loudspeakers), wireless communication capabilities 112, and cameras 114. As is explained in more detail below, the sensors and actuators of the multiple GPCs are used to estimate their respective physical locations.
For example, given a set of M acoustic sensors and S acoustic actuators in unknown locations, one embodiment estimates their respective three-dimensional coordinates. The acoustic actuators are excited using a predetermined calibration signal, such as a maximum length sequence or chirp signal, and the time of arrival (TOA) is estimated for each pair of acoustic actuator and sensor. In one embodiment, the TOA for a given microphone and speaker pair is defined as the time taken by the acoustic signal to travel from the speaker to the microphone. By measuring the TOA and knowing the speed of sound in the acoustical medium, the distance between each acoustical signal source and each acoustical sensor can be calculated, and from these distances the three-dimensional positions of the actuators and the sensors can be determined.
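The conversion from TOA to distance described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the speed of sound (343 m/s, roughly dry air at 20 °C), the function names, and the sample TOA values are all assumptions introduced here.

```python
# Assumed speed of sound in dry air at about 20 degrees C; the text does not fix a value.
SPEED_OF_SOUND_M_PER_S = 343.0


def toa_to_distance(toa_seconds, speed=SPEED_OF_SOUND_M_PER_S):
    """Convert a time of arrival in seconds to a distance in meters."""
    if toa_seconds < 0:
        raise ValueError("TOA must be non-negative")
    return speed * toa_seconds


def distance_matrix(toa):
    """Map an S x M table of TOAs (rows: actuators, columns: sensors) to distances."""
    return [[toa_to_distance(t) for t in row] for row in toa]


# Hypothetical TOAs in seconds for S = 2 speakers and M = 3 microphones.
toa = [[0.0050, 0.0100, 0.0075],
       [0.0120, 0.0040, 0.0060]]
distances = distance_matrix(toa)
```

Each entry of `distances` is the estimated separation between one actuator and one sensor; the full S x M table is the input from which the coordinates are later computed.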
FIG. 2 is a flow diagram describing, in greater detail, the process of generating the three-dimensional position calibration of audio sensors and actuators in a distributed computing platform, according to one embodiment. The process described in the flow diagram of FIG. 2 periodically references the GPCs of the distributed computing platform illustrated in FIG. 1.
In block 202, a first GPC 102, which may be considered the master GPC of the distributed platform, transmits a wireless signal to a surrounding set of GPCs in the distributed platform (the actual number of GPCs included in the distributed platform may vary based on implementation). The signal from the first GPC 102 includes a request that a specific actuator of one of the GPCs (e.g., second GPC 103) be excited to generate an acoustic signal to be received by the sensors of the surrounding GPCs (e.g., GPCs 102, 104-106). In one embodiment, the initial wireless signal from the master GPC 102 identifies the specific actuator 110 to be excited.
In response to the signal from the master GPC 102, in block 204 the second GPC 103 excites the actuator 110 to generate an acoustic signal. In one embodiment, the acoustic signal may be a maximum length sequence, a chirp signal, or another predetermined signal. In block 206, the second GPC 103 also transmits a first global time stamp to the other GPCs 104-106. In one embodiment, the global time stamp identifies when the second GPC 103 initiated the actuation of its actuator 110. In block 208, the sensors of the GPCs 102, 104-106 receive the acoustic signal generated by the second GPC 103.
In block 210, the time for the acoustic signal to travel from the actuator 110 of the second GPC 103 to the respective sensors (hereinafter referred to as the Time of Arrival (TOA)) is estimated. In one embodiment, the TOA for a given microphone and speaker pair is defined as the time taken by the acoustic signal to travel from the speaker to the microphone.
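The text does not specify how the arrival of the calibration signal is detected within a captured audio stream. One common approach, shown here purely as an assumption, is to cross-correlate the recording with the known chirp and take the lag of the correlation peak. The sampling rate, the chirp parameters, and the simulated recording are all hypothetical, and the sketch assumes the recording starts at the instant of emission.

```python
import numpy as np

SAMPLE_RATE_HZ = 16000  # assumed sampling rate; not given in the text


def linear_chirp(f_start, f_end, duration, fs=SAMPLE_RATE_HZ):
    """Return a linear chirp sweeping from f_start to f_end (in Hz)."""
    t = np.arange(int(round(duration * fs))) / fs
    sweep_rate = (f_end - f_start) / duration
    return np.sin(2.0 * np.pi * (f_start * t + 0.5 * sweep_rate * t ** 2))


def estimate_toa(recorded, reference, fs=SAMPLE_RATE_HZ):
    """Estimate the delay of `reference` inside `recorded`, in seconds."""
    corr = np.correlate(recorded, reference, mode="full")
    # Index len(reference) - 1 of the full correlation corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / fs


# Simulated capture: an attenuated chirp arrives 140 samples into a noisy recording.
rng = np.random.default_rng(0)
chirp = linear_chirp(500.0, 4000.0, 0.05)
true_delay = 140
recorded = 0.05 * rng.standard_normal(2000)
recorded[true_delay:true_delay + len(chirp)] += 0.5 * chirp
toa = estimate_toa(recorded, chirp)
```

The resolution of this estimate is one sample period (1/16000 s here, or about 2 cm at 343 m/s); finer resolution would require interpolating around the correlation peak.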
In one embodiment, the GPCs that receive the acoustic signal via their sensors proceed to estimate the respective TOAs. In one embodiment, there exists a common clock in the distributed platform so that GPCs 102-106 are able to determine the time of arrival of audio samples captured by the respective sensors. As a result, the TOA can be estimated as the difference between the first global time stamp issued by the second GPC 103 and the time at which the acoustic signal is received by a sensor.
Considering, however, that the sensors are distributed on different platforms, the audio streams of the different GPCs are typically not synchronized in time (e.g., the analog-to-digital and digital-to-analog converters of the actuators and sensors of the different GPCs are unsynchronized). As a result, the estimated TOA does not necessarily correspond to the actual TOA. In particular, the estimated TOA of the acoustic signal may include an emission start time, which is defined as the delay between the moment the command is issued by the respective GPC (e.g., GPC 103) and the moment the sound is actually emitted from the speaker (e.g., actuator 110). The actual emission start time is typically nonzero and can vary over time depending on the sound card and processor load of the respective GPC.
Therefore, to account for the variations in the emission start time, multiple alternatives may be used. For example, in one embodiment, if multiple audio input channels are available on the GPC exciting an actuator, then one of the output channels can be connected directly to one of the input channels, forming a loop-back. The source emission start time can then be estimated for a given speaker and can be globally transmitted to the other GPCs 102, 104-106 so that they can determine the respective TOAs more accurately. Furthermore, in one embodiment, when the loop-back is used, the estimated emission start time is included in the global time stamp transmitted by the respective GPC.
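The time stamp arithmetic and the loop-back correction can be illustrated as follows. This is a sketch under stated assumptions: the `EmissionStamp` structure, its field names, and the numeric values are hypothetical, and all times are taken to be on the common clock.

```python
from dataclasses import dataclass


@dataclass
class EmissionStamp:
    """Hypothetical contents of the global time stamp sent by the emitting GPC."""
    command_time: float          # when the play command was issued (seconds)
    emission_delay: float = 0.0  # loop-back estimate of the emission start time (seconds)


def emission_delay_from_loopback(command_time, loopback_onset_time):
    """Emission start time: onset seen on the loop-back input minus the command time."""
    return loopback_onset_time - command_time


def corrected_toa(stamp, arrival_time):
    """TOA measured from the actual emission instant rather than from the command."""
    return arrival_time - (stamp.command_time + stamp.emission_delay)


# Hypothetical values: command at 10.000 s, loop-back onset at 10.012 s,
# and arrival at a remote microphone at 10.022 s.
delay = emission_delay_from_loopback(10.000, 10.012)
stamp = EmissionStamp(command_time=10.000, emission_delay=delay)
naive_toa = 10.022 - stamp.command_time  # ignores the emission start time
toa = corrected_toa(stamp, 10.022)
```

In this example the uncorrected estimate is 22 ms while the corrected TOA is 10 ms. At an assumed 343 m/s, the 12 ms emission delay would otherwise inflate the distance estimate by roughly 4 m.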
Once the TOAs for the acoustic signal have been estimated by the receiving GPCs 104-106, which may include accounting for the unknown emission start time as described above, the TOAs are transmitted to the master GPC 102. In an alternative embodiment, the TOAs can be computed by the master GPC 102, in which case each sensor of GPCs 104-106 generates a second global time stamp identifying when the acoustic signal arrived at that sensor. In the alternative embodiment, the master GPC 102 uses the first global time stamp (identifying when the second GPC 103 initiated the actuation of the actuator 110) and the second global time stamps to estimate the TOAs for the respective pairs of actuators and sensors. In such a case, the master GPC 102 may also estimate the emission start time of the acoustic signal in order to estimate the TOAs.
In decision block 212, if additional actuators remain in the distributed platform, the processes of blocks 202-210 are repeated so that each of the actuators in the platform generates an acoustic signal and the TOAs with the respective receiving sensors are determined. In an alternative embodiment, multiple separate actuators may be actuated in parallel, wherein the actuator signals are multiplexed by having each actuator use a unique signal (e.g., different parameters for chirp or MLS signals). When multiple separate actuators are actuated in parallel, the master GPC 102 identifies to each actuator the unique signal parameters to be used when exciting that actuator.
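As a rough illustration of the parallel alternative, the sketch below assigns each of two actuators a chirp in a separate frequency band and recovers both delays from a single recording by correlating against each reference in turn. The choice of non-overlapping bands is only one possible way to make the signals unique and is an assumption, as are the actuator names, the sampling rate, and the simulated delays.

```python
import numpy as np

SAMPLE_RATE_HZ = 16000  # assumed sampling rate; not given in the text


def linear_chirp(f_start, f_end, duration, fs=SAMPLE_RATE_HZ):
    """Return a linear chirp sweeping from f_start to f_end (in Hz)."""
    t = np.arange(int(round(duration * fs))) / fs
    sweep_rate = (f_end - f_start) / duration
    return np.sin(2.0 * np.pi * (f_start * t + 0.5 * sweep_rate * t ** 2))


def estimate_delay_samples(recorded, reference):
    """Lag (in samples) at which `reference` best matches `recorded`."""
    corr = np.correlate(recorded, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)


# Hypothetical assignment by the master: actuator name -> (start, end) frequency in Hz.
signal_params = {"speaker_a": (500.0, 3500.0), "speaker_b": (4500.0, 7500.0)}
references = {name: linear_chirp(f0, f1, 0.05)
              for name, (f0, f1) in signal_params.items()}

# Simulated microphone capture containing both chirps, overlapping in time.
true_delays = {"speaker_a": 100, "speaker_b": 260}
recorded = np.zeros(2000)
for name, delay in true_delays.items():
    ref = references[name]
    recorded[delay:delay + len(ref)] += ref

# Correlating with each reference isolates the delay of the matching actuator.
estimated = {name: estimate_delay_samples(recorded, ref)
             for name, ref in references.items()}
```

Because the two bands do not overlap, the cross-correlation between the references is small and each peak is dominated by the matching actuator. Signals that share a band would need some other property (for example, different MLS sequences) to keep them separable.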
Once all of the TOAs for the different pairs of actuators and sensors have been computed and transmitted to the master GPC 102, in block 214 the master GPC 102 computes the coordinates of the sensors and the actuators. More specifically, as illustrated in the position computation scheme of FIG. 3, in one embodiment the master GPC 102 utilizes a nonlinear least squares (NLS) computation 302 to determine the coordinates 304 of the actuators and/or sensors. In one embodiment, the NLS computation 302 considers the TOAs 306, the number of microphones 308, and the number of speakers 310 in the platform, along with an initial estimation 312 of the coordinates of the actuators and sensors. The actual computation used by the master GPC 102 to compute the coordinates of the actuators and sensors based on the TOAs may vary based on implementation. For example, in an alternative embodiment that computes the positions of sensors and actuators with unknown emission start times, the NLS procedure is used to jointly estimate the positions and the emission times. The emission times add S extra variables (one per actuator) to the computation procedure.
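A minimal sketch of such an NLS computation is given below. It minimizes the sum over all speaker/microphone pairs of the squared difference between the modeled distance and the measured distance (speed of sound times TOA), using a simple Levenberg-Marquardt iteration. The solver choice, the assumed speed of sound, the function names, and the synthetic test geometry are all assumptions introduced here. The sketch also assumes the emission start times have already been removed from the TOAs; the joint variant mentioned above would append S further unknowns to the parameter vector.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed; the text does not fix a value


def residuals_and_jacobian(x, dist, n_spk, n_mic):
    """Residuals ||s_i - m_j|| - dist[i, j] and their Jacobian with respect to x."""
    spk = x[:3 * n_spk].reshape(n_spk, 3)
    mic = x[3 * n_spk:].reshape(n_mic, 3)
    diff = spk[:, None, :] - mic[None, :, :]      # shape (S, M, 3)
    ranges = np.linalg.norm(diff, axis=2)         # shape (S, M)
    unit = diff / np.maximum(ranges, 1e-12)[:, :, None]
    res = (ranges - dist).ravel()
    jac = np.zeros((n_spk * n_mic, x.size))
    for i in range(n_spk):
        for j in range(n_mic):
            row = i * n_mic + j
            jac[row, 3 * i:3 * i + 3] = unit[i, j]
            jac[row, 3 * (n_spk + j):3 * (n_spk + j) + 3] = -unit[i, j]
    return res, jac


def solve_positions(toa, init_spk, init_mic, iters=100):
    """Levenberg-Marquardt fit of speaker and microphone coordinates to TOAs."""
    dist = SPEED_OF_SOUND * np.asarray(toa, dtype=float)
    n_spk, n_mic = dist.shape
    x = np.concatenate([np.ravel(init_spk), np.ravel(init_mic)]).astype(float)
    res, jac = residuals_and_jacobian(x, dist, n_spk, n_mic)
    damping = 1e-3
    for _ in range(iters):
        normal = jac.T @ jac + damping * np.eye(x.size)
        step = np.linalg.solve(normal, -jac.T @ res)
        new_res, new_jac = residuals_and_jacobian(x + step, dist, n_spk, n_mic)
        if new_res @ new_res < res @ res:   # accept the step: cost decreased
            x, res, jac = x + step, new_res, new_jac
            damping = max(damping / 10.0, 1e-9)
        else:                               # reject the step: increase damping
            damping *= 10.0
    return x[:3 * n_spk].reshape(n_spk, 3), x[3 * n_spk:].reshape(n_mic, 3), res


# Synthetic, noise-free example: 6 speakers and 6 microphones in a 4 m cube,
# with an initial estimate perturbed by about 5 cm per coordinate.
rng = np.random.default_rng(1)
true_spk = rng.uniform(0.0, 4.0, size=(6, 3))
true_mic = rng.uniform(0.0, 4.0, size=(6, 3))
toa = np.linalg.norm(true_spk[:, None] - true_mic[None, :], axis=2) / SPEED_OF_SOUND
init_spk = true_spk + 0.05 * rng.standard_normal(true_spk.shape)
init_mic = true_mic + 0.05 * rng.standard_normal(true_mic.shape)
est_spk, est_mic, final_res = solve_positions(toa, init_spk, init_mic)
rms_error_m = float(np.sqrt(np.mean(final_res ** 2)))
```

Note that TOAs constrain only relative geometry, so the recovered coordinates are determined up to a common rotation, translation, and reflection. Fixing a reference frame (for example, by pinning one device at the origin) is left out of this sketch.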
To provide the initial estimation as used by the NLS, several alternatives are available. For example, if an approximate idea of the microphone and speaker positions is available, then the initialization may be done manually. In another embodiment, the use of one or more cameras may provide a rough estimate to be used as the initial estimation.
An additional embodiment for generating an initial estimation assumes that the microphones and speakers on a given computing platform are at approximately the same position. Given estimates of all the pairwise distances between the separate GPCs, a multidimensional scaling approach may then be used to determine the coordinates from, in one embodiment, the Euclidean distance matrix. The approach involves converting the symmetric pairwise distance matrix to a matrix of scalar products with respect to some origin and then performing a singular value decomposition to obtain the matrix of coordinates. The matrix of coordinates, in turn, may be used as the initial guess or estimate of the coordinates for the respective GPCs and for the microphones and speakers located on them.
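The multidimensional scaling step can be sketched as below. This follows the classical formulation: the squared distance matrix is double-centered to give a matrix of scalar products, and the leading singular vectors, scaled by the square roots of the singular values, give the coordinates. Taking the centroid of the points as the origin is an assumption (the text says only "some origin"), and the five GPC positions are hypothetical test data.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Recover `dim`-dimensional coordinates from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double centering places the origin at the centroid of the points (assumed choice).
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # matrix of scalar products
    u, s, _ = np.linalg.svd(gram)                       # singular values in descending order
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical positions (in meters) of five GPCs, used only to build a distance matrix.
gpc_positions = np.array([[0.0, 0.0, 0.0],
                          [3.0, 0.0, 0.2],
                          [0.0, 4.0, 0.5],
                          [3.0, 4.0, 1.0],
                          [1.5, 2.0, 0.8]])
pairwise = np.linalg.norm(gpc_positions[:, None] - gpc_positions[None, :], axis=2)

coords = classical_mds(pairwise)
recovered = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
```

The recovered coordinates reproduce the pairwise distances but generally differ from the original positions by a rotation, translation, and possibly a reflection, which is acceptable for an initial estimate.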
The techniques described above can be stored in the memory of one of the computing devices or GPCs as a set of instructions to be executed. In addition, the instructions to perform the processes described above could alternatively be stored on other forms of computer and/or machine-readable media, including magnetic and optical disks. Further, the instructions can be downloaded into a computing device over a data network in the form of a compiled and linked version.
Alternatively, the logic to perform the techniques discussed above could be implemented in additional computer and/or machine-readable media, such as discrete hardware components (e.g., large-scale integrated circuits (LSIs) and application-specific integrated circuits (ASICs)); firmware such as electrically erasable programmable read-only memory (EEPROM); and electrical, optical, acoustical, and other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
These embodiments have been described with reference to specific exemplary embodiments thereof. It will, however, be evident to persons having the benefit of this disclosure that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the embodiments described herein. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.