Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
By way of introduction, aspects of the present invention are directed to systems and methods for reducing audio echo or unwanted reverberation in an audio conference or audio-video conference implemented over a computer network. In particular, for example, during an audio conference using voice over internet protocol (VoIP), a participant may use a computer workstation equipped with a microphone to participate in the conference. Various embodiments of the present invention may be implemented in a VoIP audio conference that is conducted peer-to-peer, through a VoIP server, or through a combination thereof.
Referring now to the drawings, and in particular to FIG. 1, there is shown a scenario illustrating features in accordance with the present invention. FIG. 1 shows three participants operating three workstations or computer systems 10A, 10B, and 10C, respectively, which include microphones 2A, 2B, and 2C and speakers 3A, 3B, and 3C, respectively. The workstations 10A and 10B are located in a single room. Workstation 10C is a remote peer computer system operating in another room, another city, or another continent. When two participants share a single location (e.g., a single room), one participant's voice may be picked up both by microphone 2A of his own workstation and by microphone 2B of his roommate's workstation. Both workstations 10A and 10B then transmit parallel audio streams of the same voice to the remote participants over a network, and when both streams are played, the remote participants of the conference hear an echo of that voice. Conventionally, participants sharing a room may be required to ensure that only one microphone is unmuted in order to preserve sound quality.
Referring now also to FIG. 1A, there is schematically shown a network connection between computer systems 10A, 10B, and 10C and a Voice over Internet Protocol (VoIP) server 13. Computer systems 10A and 10B may be conventionally interconnected by a Local Area Network (LAN), which may be implemented as a wired network (e.g., IEEE 802.3 Ethernet) or a wireless network (e.g., IEEE 802.11 Wi-Fi). Reference is now also made to FIG. 1B, which illustrates, by way of example, conventional peer-to-peer audio streaming during an audio conference. Specifically, computer system 10A communicates audio buffer A to computer systems 10B and 10C, and similarly computer system 10B communicates audio buffer B to computer systems 10A and 10C. In the scenario shown in FIG. 1, where the same speech from the participant is encoded into both audio buffers A and B with a sufficiently long relative delay (e.g., greater than 30 milliseconds) and the buffers are then mixed and played at computer system 10C, the remote participant may hear an echo or unwanted reverberation of the speech.
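To make the echo mechanism concrete, the following minimal Python sketch (an illustration only, not part of the described embodiments) naively sums two copies of the same short sound, as a receiver such as computer system 10C would when buffers A and B arrive with a relative delay. The 8 kHz sample rate, the 10 ms burst, the 40 ms delay, and the helper names `delayed_copy`, `naive_mix`, and `onsets` are hypothetical choices made for the example.

```python
SAMPLE_RATE = 8000  # hypothetical sample rate in Hz


def delayed_copy(signal, delay_samples):
    """Return the signal shifted later by delay_samples (zero-padded at the front)."""
    return [0.0] * delay_samples + list(signal)


def naive_mix(a, b):
    """Sum two buffers sample by sample, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def onsets(signal, threshold=0.5):
    """Indices where the signal rises above threshold (a crude event detector)."""
    result = []
    above = False
    for i, x in enumerate(signal):
        if x > threshold and not above:
            result.append(i)
        above = x > threshold
    return result


# A short "syllable": a 10 ms burst at full level, captured by microphone 2A.
burst = [1.0] * (SAMPLE_RATE // 100)
buffer_a = burst
# The same burst captured by microphone 2B and arriving 40 ms later.
buffer_b = delayed_copy(burst, delay_samples=SAMPLE_RATE * 40 // 1000)
mixed = naive_mix(buffer_a, buffer_b)
print([i * 1000 // SAMPLE_RATE for i in onsets(mixed)])  # prints [0, 40]
```

Because the 40 ms offset exceeds the duration of the burst, the mixed output contains two separate onsets, which a listener perceives as an echo of a single utterance.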
Reference is now also made to FIG. 2, which schematically illustrates audio streaming during an audio conference according to features of the present invention. Here, audio buffer B may be transmitted from computer system 10B and received by computer system 10A. At computer system 10A, audio buffers A and B may be synchronized (e.g., to within 30 milliseconds), mixed, and transmitted to VoIP server 13. VoIP server 13 may then transmit the synchronized and mixed audio buffer to remote computer system 10C, where the sound is played without echo.
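One way computer system 10A might synchronize buffer B to buffer A before mixing is to estimate their relative lag by cross-correlation. The sketch below is a hypothetical, brute-force illustration in plain Python; the function names, the `max_lag` search window, the assumption that buffer B lags buffer A (so only non-negative lags are searched), and the use of averaging as the mixing rule are assumptions rather than details taken from the description.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the lag (in samples, 0..max_lag) at which `delayed` best matches
    `reference`, using a brute-force, unnormalized cross-correlation.
    Assumes `delayed` lags `reference`; negative lags are not searched."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[i] * delayed[i + lag]
            for i in range(len(reference))
            if i + lag < len(delayed)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_and_mix(buffer_a, buffer_b, max_lag):
    """Advance buffer_b by the estimated lag so it lines up with buffer_a,
    then average the two aligned buffers into one output buffer."""
    lag = estimate_lag(buffer_a, buffer_b, max_lag)
    shifted_b = list(buffer_b[lag:])
    n = min(len(buffer_a), len(shifted_b))
    return lag, [(buffer_a[i] + shifted_b[i]) / 2 for i in range(n)]


speech = [0.0, 0.2, 0.9, -0.4, 0.1, 0.0, 0.0, 0.0]  # buffer A
buffer_b = [0.0, 0.0, 0.0] + speech                  # same speech, 3 samples later
lag, mixed = align_and_mix(speech, buffer_b, max_lag=5)
print(lag)  # prints 3
```

Averaging, rather than summing, keeps the mixed output within the range of the inputs. A practical implementation would more likely use an FFT-based correlation on longer windows and search both positive and negative lags.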
Reference is now also made to FIG. 3, which schematically illustrates audio streaming during an audio conference according to another feature of the present invention. Computer systems 10A and 10B separately transmit audio buffers A and B, respectively, to VoIP server 13. VoIP server 13 includes a module 14, which may synchronize and mix audio buffers A and B into a single synchronized and mixed audio buffer, so that the audio is played at computer system 10C without echo.
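Module 14 could alternatively align the two streams using timing information rather than signal content. The sketch below assumes, hypothetically, that each client stamps every frame with a capture time on a shared clock; the `Frame` type, its `capture_ms` field, and the `mix_by_timestamp` function are invented for illustration. Frames from different sources whose timestamps fall within a tolerance of one another are averaged into one output frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    source: str          # hypothetical client identifier, e.g. "10A" or "10B"
    capture_ms: int      # hypothetical capture timestamp on a shared clock
    samples: List[float]


def _average(group):
    """Average the samples of all frames in a group, truncating to the shortest frame."""
    n = min(len(f.samples) for f in group)
    samples = [sum(f.samples[i] for f in group) / len(group) for i in range(n)]
    return Frame("mixed", group[0].capture_ms, samples)


def mix_by_timestamp(frames, tolerance_ms=30):
    """Group frames from different sources whose capture times lie within
    tolerance_ms of the group's earliest frame, and average each group.
    A new group is started when the tolerance is exceeded or when a second
    frame from the same source appears."""
    mixed, group = [], []
    for frame in sorted(frames, key=lambda f: (f.capture_ms, f.source)):
        too_late = bool(group) and frame.capture_ms - group[0].capture_ms > tolerance_ms
        same_source = any(f.source == frame.source for f in group)
        if too_late or same_source:
            mixed.append(_average(group))
            group = []
        group.append(frame)
    if group:
        mixed.append(_average(group))
    return mixed


frames = [
    Frame("10A", 0, [1.0, 1.0]),
    Frame("10B", 5, [0.0, 1.0]),
    Frame("10A", 20, [0.5, 0.5]),
    Frame("10B", 25, [0.5, 0.5]),
]
for out in mix_by_timestamp(frames):
    print(out.capture_ms, out.samples)
# prints:
# 0 [0.5, 1.0]
# 20 [0.5, 0.5]
```

This pairs frames only at frame granularity, so a residual offset within each group (5 ms in the example data) remains; that is acceptable only so long as it stays within the chosen tolerance (e.g., 30 milliseconds).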
Referring now also to FIG. 5, a method 50 in accordance with features of the present invention is shown. In step 51, the conferencing application may identify whether two or more computer systems 10 participating in the audio conference have microphones 2 that may receive the same audio input signal. The identification may be performed, for example, by asking a participant whether another participant of the audio conference is sharing a room with the participant. In step 52, the corresponding audio buffers A and B may be received from computer systems 10A and 10B, and in step 53 audio buffers A and B are synchronized. In step 54, the gain difference between the received audio buffers A and B may be corrected. For example, microphone 2A may be less sensitive than microphone 2B, and/or the signal from microphone 2A may be streamed at a lower level, so that gain may be added to the signal from microphone 2A to balance the levels at playback. It is also desirable to increase the gain of the microphone used by the participant currently speaking relative to the other unmuted microphones in the conference. In step 55, the audio buffers are mixed into an output buffer, and in step 56 the output buffer is sent to the remote peer computer system 10C. In step 57, the output buffer is played at remote computer system 10C with reduced echo.
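Steps 54 and 55 can be sketched as follows. Matching root-mean-square (RMS) levels is only one assumed way of correcting the gain difference, and the emphasis factor applied to the active speaker's buffer (`emphasis=2.0`) is an arbitrary illustrative value; how the active speaker is detected is outside the scope of this hypothetical sketch. The buffers are assumed to have already been synchronized in step 53.

```python
import math


def rms(buffer):
    """Root-mean-square level of a buffer (0.0 for an empty buffer)."""
    return math.sqrt(sum(x * x for x in buffer) / len(buffer)) if buffer else 0.0


def balance_gain(quiet, reference):
    """Step 54 (assumed approach): scale `quiet` so that its RMS level matches
    that of `reference`, compensating for a less sensitive microphone."""
    level = rms(quiet)
    if level == 0.0:
        return list(quiet)  # silence: nothing to scale
    gain = rms(reference) / level
    return [x * gain for x in quiet]


def mix_with_emphasis(buffers, active_index, emphasis=2.0):
    """Step 55: mix synchronized, gain-balanced buffers, weighting the buffer
    of the currently active speaker by `emphasis`. Weights are normalized so
    the output is a weighted average of the inputs."""
    weights = [emphasis if i == active_index else 1.0 for i in range(len(buffers))]
    total = sum(weights)
    n = min(len(b) for b in buffers)
    return [sum(w * b[i] for w, b in zip(weights, buffers)) / total for i in range(n)]


quiet_a = [0.1, -0.1, 0.1, -0.1]  # microphone 2A, less sensitive
loud_b = [0.4, -0.4, 0.4, -0.4]   # microphone 2B
balanced_a = balance_gain(quiet_a, loud_b)
output = mix_with_emphasis([balanced_a, loud_b], active_index=0)
```

Because the weights are normalized, each output sample is a weighted average of the input samples and cannot exceed their peak level, which avoids clipping after emphasis is applied.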
Referring now to FIG. 4A, there is illustrated audio streaming of audio received during an audio conference in accordance with features of the present invention. Computer system 10A may receive an audio buffer from VoIP server 13, the audio buffer comprising combined audio from a remote peer computer system (not shown). Computer system 10A may then transmit the audio locally to computer system 10B so that all computer systems 10 in the same room play the audio synchronously. Referring now also to FIG. 4B, there is shown audio streaming of received audio in another configuration, in which synchronized audio buffers are sent directly from VoIP server 13 to computer systems 10A and 10B. The received audio buffers may be sent to the first computer system 10A and the second computer system 10B with corresponding delays such that they are played synchronously at both computer systems. Alternatively, only one speaker 3 in the room may play the audio.
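For the configuration of FIG. 4B, the corresponding delays can be derived from the latency of each path. The sketch below assumes, hypothetically, that the end-to-end latency from VoIP server 13 to each device's speaker is already known (for example, measured by the application); it simply pads every faster path up to the slowest one. The function name and the latency figures are illustrative.

```python
def playback_delays(latency_ms):
    """Return the extra delay (ms) each device should add so that every device
    plays a given buffer at the same instant as the slowest device."""
    slowest = max(latency_ms.values())
    return {device: slowest - latency for device, latency in latency_ms.items()}


latency = {"10A": 35, "10B": 60}  # hypothetical server-to-speaker latencies in ms
delays = playback_delays(latency)
print(delays)  # prints {'10A': 25, '10B': 0}
```

Padding to the slowest path adds no delay to that path and the minimum necessary delay to the others, at the cost of tying everyone's playback latency to the slowest device.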
Referring now to FIG. 6, a simplified computer system 60 according to conventional techniques is schematically illustrated. The computer system 10 includes a processor 601, a storage mechanism including a memory bus 607 for storing information in a memory 609, and a network interface 605 operatively connected to the processor 601 through a peripheral bus 603. The computer system 10 also includes a data input mechanism 611 (e.g., a disk drive) for reading a computer-readable medium 613 (e.g., an optical disk). The data input mechanism 611 is operatively coupled to the processor 601 via the peripheral bus 603. A sound card 614 is operatively connected to the peripheral bus 603. An input of the sound card 614 is operatively connected to the output of the microphone 2, and an output of the sound card 614 is operatively connected to the input of the speaker 3.
In this specification and in the following claims, a "computer system" is defined as one or more software modules, one or more hardware modules, or a combination thereof that work together to perform operations on electronic data. For example, the definition of computer system includes the hardware components of a personal computer as well as software modules, such as the operating system of a personal computer. The physical layout of the modules is not important. The computer system may include one or more computers coupled via a computer network. Likewise, a computer system may include a single physical device (e.g., a mobile phone, a laptop computer, or a tablet computer) with internal modules (e.g., memory and a processor) working together to perform operations on electronic data.
In this specification and in the following claims, a "network" is defined as any architecture in which two or more computer systems may exchange data. The data exchanged may be in the form of electrical signals that are meaningful to two or more computer systems. When data is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer system or computer device, the connection is properly viewed as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media. Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer system or special purpose computer system to perform a certain function or group of functions. The described embodiments may also be embodied as computer-readable code on a non-transitory computer-readable medium. A non-transitory computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the non-transitory computer-readable medium include read-only memory, random-access memory, CD-ROM, HDD, DVD, magnetic tape, and optical data storage devices. The non-transitory computer-readable medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
The various aspects, embodiments, implementations, or features of the described embodiments may be used alone or in any combination. The various aspects of the described embodiments may be implemented in software, hardware, or a combination of hardware and software.
The terms "device," "workstation," and "computer system" are used interchangeably herein.
The term "connected" as used herein refers to both wired and wireless computer connections.
The term "emphasis" as used herein refers to a relative increase in audio gain or audio level.
The term "echo" as used herein refers to the repetition heard when two audio signals having similar or identical audio inputs are played asynchronously with a time delay greater than about 10-50 milliseconds.
The terms "synchronized" and "synchronization" as used herein refer to a time offset of less than about 50 milliseconds. In some cases where the participants are at different locations in a large room, there will be some reverberation depending on the room size. In such cases, the term "synchronized" or "synchronization" may refer to an offset of less than about 30 milliseconds. Alternatively, in some embodiments of the invention it may be desirable to reduce reverberation even further, so that synchronization to within less than about 20 milliseconds, or less than 10 milliseconds, may be effective.
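Because buffers are usually compared sample by sample, it may help to express these tolerances in samples. The minimal sketch below converts a measured offset into milliseconds and checks it against a chosen tolerance; the 48 kHz sample rate and the function names are assumptions for illustration, while the tolerance values are those named above.

```python
# Synchronization tolerances named in the definition, in milliseconds.
SYNC_TOLERANCES_MS = (50.0, 30.0, 20.0, 10.0)


def offset_ms(offset_samples, sample_rate_hz):
    """Convert an offset between two buffers from samples to milliseconds."""
    return abs(offset_samples) * 1000.0 / sample_rate_hz


def is_synchronized(offset_samples, sample_rate_hz, tolerance_ms=50.0):
    """True if the offset is strictly less than tolerance_ms."""
    return offset_ms(offset_samples, sample_rate_hz) < tolerance_ms


def tolerance_in_samples(tolerance_ms, sample_rate_hz):
    """Number of samples spanned by tolerance_ms at the given sample rate."""
    return tolerance_ms * sample_rate_hz / 1000.0


for tol in SYNC_TOLERANCES_MS:
    print(f"{tol:g} ms = {tolerance_in_samples(tol, 48000):g} samples at 48 kHz")
# prints:
# 50 ms = 2400 samples at 48 kHz
# 30 ms = 1440 samples at 48 kHz
# 20 ms = 960 samples at 48 kHz
# 10 ms = 480 samples at 48 kHz
```

Under the 30 ms tolerance or any stricter one, a 1440-sample offset at 48 kHz would therefore not count as synchronized, whereas it would under the default 50 ms tolerance.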
The transitional term "comprising" as used herein is synonymous with "including" and is broad or open-ended and does not exclude additional, unrecited elements or method steps. The articles "a" and "an" (as in "a computer system" and "an audio buffer") as used herein have the meaning of "one or more", i.e., "one or more computer systems" and "one or more audio buffers".
All optional and preferred features and modifications of the described embodiments and the dependent claims may be used in all aspects of the invention taught herein. Furthermore, the various features of the dependent claims, as well as all optional and preferred features and modifications of the described embodiments, are combinable and interchangeable with each other.
While selected features of the invention have been illustrated and described, it should be understood that the invention is not limited to the described features.
While selected embodiments of the present invention have been shown and described, it should be understood that the invention is not limited to the described embodiments. Rather, it should be understood that changes can be made in these embodiments without departing from the scope of the invention as defined in the following claims and their equivalents.