US10535360B1

Info

Publication number: US10535360B1
Application number: US15/605,079
Authority: US
Inventors: Chi Fai Ho; John Chiong
Original assignee: TP Lab Inc
Current assignee: TP Lab Inc
Priority date: 2017-05-25
Filing date: 2017-05-25
Publication date: 2020-01-14
Also published as: US11355135B1

Abstract

A phone stand includes a phone holder for coupling to a phone conducting a voice session, a plurality of directional speakers positioned to project sound to a focused audio area corresponding to a location where a user is expected to be positioned, other speaker(s), and a system controller. The system controller is configured to receive audio signals of the voice session from the phone, separate the audio signals into speech signals and non-speech signals, obtain output mixing attributes, generate mixed signals by combining the speech signals and the non-speech signals according to the output mixing attributes, and send the mixed signals to the plurality of directional speakers. The other speaker(s) can include non-directional speakers, and the system controller is further configured to send the speech signals in the mixed signals to the plurality of directional speakers and the non-speech signals in the mixed signals to the other speaker(s).

Description

BACKGROUND OF THE INVENTION

Field

This invention relates generally to a phone stand, and more specifically a voice-oriented conversation speaker system based on a plurality of directional speakers.

Related Art

Uses of audio in a vehicle were limited in the past. Drivers listened to radios and cassette tape or CD players, while operators of transportation vehicles used special voice devices for announcements and communication. With advances in mobile computing and digital radio, today's drivers engage in a much larger number of activities involving voice and audio. They use in-car digital and often interactive entertainment systems, high definition digital radio, voice-activated navigation systems, in-car voice assistants, and cell phones for phone calls, voice recording, voice messaging, voice mail and notification retrieval, music streaming and other voice and audio-based phone applications (“apps”).

Despite the increase in voice and audio usage, a vehicle is fundamentally noisy, due in part to wind, engine noise, echo and external noise. When a driver is engaged in a phone call using the speaker phone of her cell phone, she can hardly hear the other caller, while her own voice is drowned in the ambient noise when picked up by the phone's microphone. The driver constantly raises the volume of the radio or speakers to drown out the noise. She may miss a turn announced by the navigation system, or become frustrated when the in-car system's voice assistant repeatedly fails to understand her commands or questions.

A noisy environment is not unique to a car or bus. Workers often find similar situations in a work area. Using a voice or audio device such as a phone in a noisy work place is difficult and frustrating.

The above scenarios illustrate the need for a phone stand that assists a phone in providing voice and audio clarity.

BRIEF SUMMARY OF THE INVENTION

Disclosed herein is a phone stand using a plurality of directional speakers and a corresponding method and computer readable medium as specified in the independent claims. Embodiments of the present invention are given in the dependent claims. Embodiments of the present invention can be freely combined with each other if they are not mutually exclusive.

According to one embodiment of the present invention, a phone stand includes: a phone holder for coupling to a phone, the phone for conducting a voice session; a plurality of directional speakers positioned to project sound to a focused audio area corresponding to a location where a user is expected to be positioned; one or more other speakers; and a system controller. The system controller is configured to: receive audio signals of the voice session from the phone; separate the audio signals into speech signals and non-speech signals; obtain one or more output mixing attributes; generate mixed signals by combining the speech signals and the non-speech signals according to the one or more output mixing attributes; and send the mixed signals to the plurality of directional speakers.

In one aspect of the present invention, the phone stand further includes one or more microphones configured to capture a user's speech as sound signals and to send the sound signals to the system controller. The system controller is further configured to: separate the sound signals into second speech signals and second non-speech signals; obtain one or more input mixing attributes; generate second mixed signals by combining the second speech signals and the second non-speech signals according to the input mixing attributes; and send the second mixed signals to the phone.

In one aspect of the present invention, the output mixing attributes include one or more of the following: an attribute for increasing a volume of the speech signals; an attribute for reducing a volume of the non-speech signals; an attribute for eliminating the non-speech signals; an attribute for maintaining the volume of the non-speech signals; an attribute for eliminating the non-speech signals if the speech signals are present; and an attribute for increasing a clarity of the speech signals.

In one aspect of the present invention, the system controller is further configured to receive an incoming session indication from the phone notifying the system controller of the voice session and to announce the incoming session indication using the plurality of directional speakers. The phone stand further includes one or more microphones configured to capture speech from a user in response to the announced incoming session indication and to send sound signals of the captured speech to the system controller. The system controller is further configured to determine that the sound signals comprise an acceptance or a decline of the voice session and to send the acceptance or the decline in an incoming session response message to the phone.

In one aspect of the present invention, the one or more other speakers include non-directional speakers, and the system controller is further configured to send the speech signals in the mixed signals to the plurality of directional speakers and to send the non-speech signals in the mixed signals to the one or more other speakers.

In one aspect of the present invention, the focused area comprises any one of the following: an area away from a dashboard of a vehicle; an area away from a passenger side compartment box of the vehicle; and an area behind a head rest of a seat in a vehicle.

In one aspect of the present invention, the voice session comprises a voice call.

In one aspect of the present invention, the audio signals received by the system controller include a first indication labeling a first portion of the audio signals as the speech signals and a second indication labeling a second portion of the audio signals as the non-speech signals.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE FIGURES

FIG. 1 illustrates an exemplary embodiment of a phone stand computing system according to the present invention.

FIG. 2 illustrates an exemplary embodiment of a computing device according to the present invention.

FIGS. 3a-3b illustrate exemplary embodiments of directional speakers of the phone stand according to the present invention.

FIG. 4 illustrates an exemplary embodiment of a process for receiving an incoming voice session according to the present invention.

FIG. 5 illustrates an exemplary embodiment of a process for processing audio signals received from the phone according to the present invention.

FIG. 6 illustrates an exemplary embodiment of a process for sending audio signals to the phone according to the present invention.

FIG. 7 illustrates an exemplary embodiment of a process for interworking with an audio-based phone application according to the present invention.

FIG. 8 illustrates an exemplary embodiment of a process for processing audio signals received from an audio-based phone application according to the present invention.

FIG. 9 illustrates an exemplary embodiment of a process for sending audio signals to an audio-based phone application according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The following description is presented to enable one of ordinary skill in the art to make and use the present invention and is provided in the context of a patent application and its requirements. Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.

Reference in this specification to “one embodiment”, “an embodiment”, “an exemplary embodiment”, or “a preferred embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments. In general, features described in one embodiment might be suitable for use in other embodiments as would be apparent to those skilled in the art.

In one embodiment, phone stand 600 includes a system controller 630, which includes a hardware processor configured with processing capabilities and a storage for storing computer programming instructions, which when executed by the processor of system controller 630, allow system controller 630 to control directional speakers 612, speakers 614, phone holder 634, power module 632, microphones 624 and local wireless network interface 626. In one embodiment, system controller 630 interacts with phone 111 over one or more data communication sessions via local wireless network interface 626 to process voice session 121 and audio session 141. A communication session, as used herein, refers to a series of interactions between two communication end points that occur during the span of a single connection.

In one embodiment, phone stand 600 connects to a data network 652. In one embodiment, phone 111 connects to data network 652.

FIG. 2 illustrates an exemplary embodiment of hardware components of a computing device which can be used for a controller, a network computer, a server or a phone. In one embodiment, computing device 510 includes a hardware processor 511, a network module 513, an output module 515, an input module 517, a storage 519, or some combination thereof. In one embodiment, the hardware processor 511 includes one or more general processors, a multi-core processor, an application specific integrated circuit based processor, a system on a chip (SOC) processor, an embedded processor, a digital signal processor, or a hardware- or application-specific processor. In one embodiment, output module 515 includes or connects to a display for displaying video signals, images or text, one or more speakers to play sound signals, or a lighting module such as an LED. In one embodiment, output module 515 includes a data interface such as USB, HDMI, DVI, DisplayPort, Thunderbolt or a wire-cable connecting to a display, or one or more speakers. In one embodiment, output module 515 connects to a display or a speaker using a wireless connection or a wireless data network. In one embodiment, input module 517 includes a physical or logical keyboard, one or more buttons, one or more keys, or one or more microphones. In one embodiment, input module 517 includes or connects to one or more sensors such as a camera sensor, an optical sensor, a night-vision sensor, an infrared (IR) sensor, a motion sensor, a direction sensor, a proximity sensor, a gesture sensor, or other sensors that are usable by a user to provide input to computing device 510. In one embodiment, input module 517 includes a physical panel housing one or more sensors. In one embodiment, storage 519 includes a storage medium, a main memory, a hard disk drive (HDD), a solid state drive (SSD), a memory card, a ROM module, a RAM module, a USB disk, a storage compartment, a data storage component or other storage component.
In one embodiment, network module 513 includes hardware, software, or a combination of hardware and software, to interface or connect to a wireless data network such as a cellular network, a mobile network, a Bluetooth network, an NFC network, a personal area network (PAN), a WiFi network, or a Li-Fi network. Storage 519 stores executable instructions, which when read and executed by the processor 511 of computing device 510, implement one or more functionalities of the current invention.

Returning to FIG. 1, in one embodiment, power module 632 includes a charging unit to charge phone 111. In one embodiment, the charging unit includes a wireless charging unit or a charging connector. In one embodiment, power module 632 includes a battery. In one embodiment, power module 632 connects to an external power source.

In one embodiment, local wireless network interface 626 connects to one or more of an NFC network, a Bluetooth network, a PAN network, an 802.11 network, an 802.15 PAN network, a ZigBee network, a LiFi network, and a short distance wireless network connecting two close-by networking devices.

In one embodiment, data network 652 includes a cellular network, a mobile data network, a WiFi network, a LiFi network, a WiMAX network, an Ethernet, or any other data network.

In one embodiment, phone 111 can be a mobile phone, a cell phone, a smartphone, an office desk phone, a VoIP phone, a cordless phone, or a professional phone used by a train operator, a bus driver, or a truck driver.

In one embodiment, voice session 121 is a voice call session, a telephone call session, a teleconference session, a voice message exchange session, a VoIP call session, a voice over instant messaging (IM) session, or a session with a voice assistant application such as Apple Siri, Google Now, Amazon Alexa, Microsoft Cortana, or other voice assistant. In one embodiment, voice session 121 is a voice recording session, a text to speech session, an audio book reading session, a podcast playing session, or a voice announcement.

In one embodiment, audio session 141 includes a voice session, a music playing session, a session playing radio, a video session playing audio, or a session where an audio clip is played. In one embodiment, audio session 141 includes a plurality of combined voice sessions and other audio sessions.

In one embodiment, user 101 is a car driver, a bus driver, a vehicle passenger, a pilot, or an operator operating a bus, a train, a truck, a ship, or a vehicle. In one embodiment, user 101 is an office clerk, a receptionist, or an office worker. In one embodiment, user 101 stays in a noisy environment where user 101 is to conduct a voice session 121 or audio session 141 with clarity.

In one embodiment, user 101 notices the announcement of incoming session indication 222 through lit-up LED 652, or a ring tone played on directional speakers 612 or speakers 614. In one embodiment, user 101 responds to the indication 222 with a response 104 to accept, reject or disconnect voice session 121. In one embodiment, the response 104 includes the user 101 speaking into microphones 624 or pressing a button 651 on phone stand 600. In one embodiment, response 104 indicates an acceptance of the voice session 121. In one embodiment, user 101 speaks "answer the call", "accept", "yes", "hello", or another spoken phrase to accept voice session 121. Microphones 624 capture sound signals corresponding to response 104 and send response 104 to system controller 630. In one embodiment, system controller 630 processes response 104 using natural language processing and recognizes the spoken words of user 101. System controller 630 matches the spoken words to one or more pre-stored words or sequences of words in an ontology database (not shown) to determine that response 104 indicates an acceptance of the voice session 121. System controller 630 sends the acceptance in an incoming session response 224 message to phone 111. In one embodiment, system controller 630 includes the sound signals of the response 104, as captured by microphones 624, in incoming session response 224, and sends the incoming session response 224 to phone 111. The phone 111 processes the sound signals in the incoming session response 224 to determine if response 104 indicates an acceptance, a rejection or a disconnection of the voice session 121. In one embodiment, system controller 630 sends response 104 to a voice process server 656 over data network 652 to determine if response 104 indicates an acceptance, a rejection or a disconnection of the voice session 121.

In one embodiment, user 101 does not need to do anything to accept, decline or disconnect voice session 121. Phone 111 automatically continues or discontinues voice session 121. In one embodiment, phone 111 is configured to automatically accept the voice session 121 after a pre-determined period of time, or after a pre-determined number of rings. In one embodiment, phone 111 receives a disconnect indication over the voice session 121. In one embodiment, voice session 121 is a voice call and phone 111 receives a disconnect indication after the remote caller or system disconnects the voice call. In one embodiment, voice session 121 is to play a voice message and phone 111 discontinues voice session 121 after playing the voice message.
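The automatic acceptance after a pre-determined period of time or number of rings can be illustrated with a minimal sketch. The function name, parameter names, and the choice to accept when either threshold is reached are assumptions made for illustration; the description does not specify how the phone tracks rings or elapsed time.

```python
from typing import Optional


def should_auto_accept(rings: int, seconds: float,
                       ring_limit: Optional[int] = None,
                       time_limit: Optional[float] = None) -> bool:
    """Return True once either configured threshold has been reached.

    ring_limit and time_limit are hypothetical configuration values; a value
    of None means that threshold is not configured.
    """
    if ring_limit is not None and rings >= ring_limit:
        return True
    if time_limit is not None and seconds >= time_limit:
        return True
    return False
```

With neither threshold configured, the sketch never accepts automatically, which corresponds to the embodiments in which the user responds by voice or by button.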

In one embodiment, the pressing of a button 651 indicates an acceptance of a voice call. System controller 630 detects the pressing of the button 651 and sends an incoming session response 224 indicating an acceptance of the voice session 121 to phone 111.

In one embodiment, user 101 wants to decline or disconnect voice session 121. In one embodiment, user 101 says "no", "decline", "hang up", "bye", "disconnect" or other word or word phrase to indicate rejection of voice session 121. In one embodiment, microphone 624 captures sound signals corresponding to response 104. In one embodiment, system controller 630 receives the captured sound signals from microphone 624 and processes the sound signals using natural language processing to determine that the response 104 indicates a rejection of voice session 121. System controller 630 includes an indication to drop the voice session 121 in the incoming session response 224 and sends the incoming session response 224 to the phone 111. In one embodiment, the indication includes a command, a message, a flag, an integer, or a tag. In one embodiment, system controller 630 sends captured sound signals corresponding to the response 104 to phone 111, and the phone 111 then processes the sound signals to determine whether the response 104 indicates a rejection of the voice session 121.
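The matching of spoken words to pre-stored acceptance and rejection phrases can be sketched as follows. This is a simplified illustration: it assumes the speech has already been transcribed to text, uses an exact-match lookup in place of natural language processing and the ontology database, and the function names and message layout are hypothetical.

```python
from typing import Optional

# Phrase lists taken from the examples given in the description.
ACCEPT_PHRASES = ("answer the call", "accept", "yes", "hello")
DECLINE_PHRASES = ("no", "decline", "hang up", "bye", "disconnect")


def classify_response(transcript: str) -> Optional[str]:
    """Return "accept", "decline", or None for an unrecognized phrase."""
    # Normalize case, surrounding whitespace, and trailing punctuation.
    text = transcript.strip().lower().strip(".!?")
    if text in ACCEPT_PHRASES:
        return "accept"
    if text in DECLINE_PHRASES:
        return "decline"
    return None


def build_incoming_session_response(transcript: str) -> dict:
    """Build a message the controller could send to the phone.

    The dictionary layout is an assumption; the description only says the
    indication may be a command, a message, a flag, an integer, or a tag.
    """
    return {
        "type": "incoming_session_response",
        "decision": classify_response(transcript),
    }
```

An unrecognized phrase yields a decision of None, which a caller could treat as a cue to forward the raw sound signals to the phone or to a voice process server instead.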

In one embodiment, the pressing of the button 651 declines a call. System controller 630 detects the pressing of the button 651 and sends an incoming session response 224 indicating a rejection of the voice session 121 to phone 111.

In one embodiment, phone 111 receives incoming session response 224. In one embodiment, phone 111 determines that the incoming session response 224 is a rejection of the voice session 121, and in response, phone 111 rejects voice session 121. In one embodiment, phone 111 rejects the voice session 121 by disconnecting the voice session 121. In one embodiment, phone 111 sends a rejection indication over voice session 121 to the caller. In one embodiment, phone 111 determines that the incoming session response 224 is an acceptance of the voice session 121, and in response, the phone 111 sends an acceptance indication over voice session 121 to the caller or the callee.

FIG. 5 illustrates an exemplary embodiment of a process for processing audio signals received from the phone during a voice session according to the present invention. In this embodiment, phone 111 receives audio signals 222 over voice session 121, established as described above with reference to FIG. 4. Phone 111 sends audio signals 222 to phone stand 600. In one embodiment, system controller 630 receives audio signals 222 from phone 111. System controller 630 processes audio signals 222 and separates audio signals 222 into speech signals 726 and non-speech signals 724. In one embodiment, audio signals 222 include a first indication labeling a first portion of the audio signals 222 as speech signals 726 and a second indication labeling a second portion of the audio signals 222 as non-speech signals 724. In one embodiment, audio signals 222 include a channel for speech signals 726 and a channel for non-speech signals 724. In one embodiment, system controller 630 identifies audio signals 222 as speech signals 726 and determines there are no non-speech signals 724 in the audio signals 222. In one embodiment, system controller 630 includes one or more voice call output mixing attributes 721. System controller 630 generates mixed signals 722 by combining speech signals 726 and non-speech signals 724 according to output mixing attributes 721. In one embodiment, output mixing attributes 721 include one or more attributes for increasing the volume of speech signals 726, for reducing the volume of non-speech signals 724, for eliminating non-speech signals 724, for maintaining a volume of non-speech signals 724 if speech signals 726 are absent, for eliminating non-speech signals 724 if speech signals 726 are present, or some combination thereof. System controller 630 generates mixed signals 722 according to the output mixing attributes 721 such that the clarity of speech signals 726 is increased. Upon generating mixed signals 722, system controller 630 plays mixed signals 722 via directional speakers 612.
In one embodiment, output mixing attributes 721 include a mixed signal volume adjustment attribute. In one embodiment, system controller 630 adjusts the volume of mixed signals 722 according to the mixed signal volume adjustment attribute such that the volume is not too loud for user 101, who is assumed to be positioned in the focused area of directional speakers 612 and listening to the sound of mixed signals 722. In one embodiment, output mixing attributes 721 adjust the volume of speech signals 726 higher than that of non-speech signals 724. In one embodiment, mixed signals 722 are sent to directional speakers 612 such that speech signals 726 are played louder than non-speech signals 724. In one embodiment, speech signals in mixed signals 722 are sent to directional speakers 612 and non-speech signals in mixed signals 722 are sent to speakers 614.
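The combining of speech and non-speech signals under output mixing attributes can be sketched as a per-sample weighted sum. This is a minimal illustration under stated assumptions: signals are modeled as equal-length lists of floating point samples that have already been separated, and every field name and default value below is hypothetical, since the description names the behaviors of the attributes but not their representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutputMixingAttributes:
    # All field names and defaults are hypothetical.
    speech_gain: float = 1.5            # increase the volume of speech signals
    non_speech_gain: float = 0.5        # reduce the volume of non-speech signals
    drop_non_speech_if_speech: bool = False  # eliminate non-speech when speech is present
    max_amplitude: float = 1.0          # mixed signal volume limit


def mix(speech: List[float], non_speech: List[float],
        attrs: OutputMixingAttributes) -> List[float]:
    """Combine equal-length speech and non-speech samples into mixed samples."""
    if len(speech) != len(non_speech):
        raise ValueError("speech and non_speech must have the same length")
    # Treat any non-zero sample as evidence that speech is present.
    speech_present = any(abs(s) > 0.0 for s in speech)
    ns_gain = attrs.non_speech_gain
    if attrs.drop_non_speech_if_speech and speech_present:
        ns_gain = 0.0
    mixed = []
    for s, n in zip(speech, non_speech):
        sample = attrs.speech_gain * s + ns_gain * n
        # Clamp so the mixed signal never exceeds the configured limit.
        sample = max(-attrs.max_amplitude, min(attrs.max_amplitude, sample))
        mixed.append(sample)
    return mixed
```

When the speech samples are all zero, the conditional elimination does not apply and the non-speech signals are kept at their configured gain, mirroring the attribute for maintaining non-speech volume when speech is absent.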

In one embodiment, phone 111 generates audio signals 222 to include: a first indication labeling a first portion of the audio signals 222 as speech signals 726 or a first channel for speech signals 726; and a second indication labeling a second portion of the audio signals 222 as non-speech signals 724 or a second channel for non-speech signals 724. In one embodiment, phone 111 receives audio signals 222 from voice session 121, and the received audio signals 222 include: a first indication labeling a first portion of the audio signals 222 as speech signals 726 or a first channel for speech signals 726; and a second indication labeling a second portion of audio signals 222 as non-speech signals 724 or a second channel for non-speech signals 724. In one embodiment, audio signals 222 include a Dolby multi-channel format for encoding speech signals 726 into a dialogue channel and non-speech signals 724 into a non-dialogue channel. In one embodiment, the system controller 630 plays the dialogue channel over the directional speakers 612 and plays the non-dialogue channel over the speakers 614. In one embodiment, audio signals 222 include a different multi-channel or multiple sub-session format to encode speech signals 726 and non-speech signals 724.
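When the audio signals already carry separate channels, routing reduces to a lookup by channel label. The sketch below assumes the channels have been decoded into a dictionary of sample lists; the label strings and the function name are hypothetical and do not reflect any actual multi-channel wire format.

```python
from typing import Dict, List, Tuple

# Hypothetical channel labels standing in for a dialogue channel and a
# non-dialogue channel.
DIALOGUE = "dialogue"
NON_DIALOGUE = "non_dialogue"


def route_channels(channels: Dict[str, List[float]]) -> Tuple[List[float], List[float]]:
    """Return (samples for directional speakers, samples for other speakers)."""
    # A missing channel yields an empty list, so nothing is played on that output.
    directional = channels.get(DIALOGUE, [])
    others = channels.get(NON_DIALOGUE, [])
    return directional, others
```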

FIG. 6 illustrates an exemplary embodiment of a process for sending audio signals to the phone according to the present invention. In this embodiment, user 101 speaks into microphone 624 during voice session 121. In one embodiment, microphones 624 capture the user's speech as sound signals 747 and send the sound signals 747 to system controller 630. System controller 630 processes sound signals 747 and separates sound signals 747 into speech signals 746 and non-speech signals 744. In one embodiment, system controller 630 stores or has access to a storage with one or more voice call input mixing attributes 741. System controller 630 generates mixed signals 742 by combining speech signals 746 and non-speech signals 744 according to input mixing attributes 741. In one embodiment, input mixing attributes 741 include one or more attributes for increasing the volume of speech signals 746, for reducing the volume of non-speech signals 744, for eliminating non-speech signals 744, for eliminating non-speech signals 744 if speech signals 746 are absent, for eliminating non-speech signals 744 if speech signals 746 are present, or some combination thereof. System controller 630 generates mixed signals 742 according to the input mixing attributes 741 such that the clarity of speech signals 746 is increased. Upon generating mixed signals 742, system controller 630 sends mixed signals 742 to phone 111. In one embodiment, phone 111 receives mixed signals 742 and sends mixed signals 742 over voice session 121. In one embodiment, mixed signals 742 include a first indication labeling a first portion of mixed signals 742 as speech signals 746 and a second indication labeling a second portion of mixed signals 742 as non-speech signals 744. In one embodiment, mixed signals 742 include a multi-channel format to encode speech signals 746 and non-speech signals 744. In one embodiment, sound signals 747 include only speech signals 746.

In one embodiment, microphones 624 include a directional microphone facing an assumed position of user 101, or a particular microphone closer to the assumed position of the user 101 than the other microphones. System controller 630 identifies the speech signals 746 that are in sound signals 747 received from the directional or particular microphone. In one embodiment, microphones 624 include a particular microphone located further away from the assumed position of the user 101, and optionally where the particular microphone is shielded from sound made by user 101. System controller 630 identifies the non-speech signals 744 in sound signals 747 received from the particular microphone.

In one embodiment, input mixing attributes 741 include a mixed signal volume adjustment attribute 742. In one embodiment, system controller 630 increases the volume of mixed signals 742 prior to sending mixed signals 742 to phone 111 according to the mixed signal volume adjustment attribute 742.

In one embodiment, audio app 114 instructs phone 111 to end audio session 141, and in response, phone 111 sends audio session indication 242 to include an ending indication. In one embodiment, the indication comprises a command, a message, a flag, an integer, or a tag. In one embodiment, system controller 630 receives the ending indication, and in response, stops applying mixing to the audio signals to be outputted by the app 114 or inputted to the app 114.

In one embodiment, system controller 630 announces audio session indication 242 using speakers 614, directional speakers 612, or an LED light.

FIG. 8 illustrates an exemplary embodiment of a process for processing audio signals received from an audio-based phone app according to the present invention. In this embodiment, audio app 114 conducts an audio session 141, of which the system controller 630 of phone stand 600 is notified by phone 111, as described above with reference to FIG. 7. In one embodiment, audio app 114 generates app audio signals 244 during audio session 141. Audio app 114 sends app audio signals 244 to phone 111. In one embodiment, phone 111 sends app audio signals 244 to system controller 630. System controller 630 receives app audio signals 244, and processes app audio signals 244 according to previously stored app audio output mixing attributes 761. In one embodiment, output mixing attributes 761 contain an attribute value indicating that no processing of app audio signals 244 is to be performed by the system controller 630. System controller 630 plays app audio signals 244 over directional speakers 612 or speakers 614 according to output mixing attributes 761.

In one embodiment, output mixing attributes 761 contain an attribute value indicating that app audio signals 244 are to be separated into speech signals 726 and non-speech signals 724. Based on the output mixing attributes 761, system controller 630 processes app audio signals 244 and separates app audio signals 244 into speech signals 726 and non-speech signals 724. System controller 630 then combines speech signals 726 and non-speech signals 724 according to output mixing attributes 761 to generate mixed signals 722.

In one embodiment, output mixing attributes 761 include one or more attributes for increasing the volume of speech signals 726, for reducing the volume of non-speech signals 724, for eliminating non-speech signals 724, for maintaining volume of non-speech signals 724 if speech signals 726 are absent, for eliminating non-speech signals 724 if speech signals 726 are present, or some combination thereof. In one embodiment, system controller 630 generates mixed signals 722 according to the output mixing attributes 761 such that the clarity of speech signals 726 or the audio quality of non-speech signals 724 is increased. In one embodiment, system controller 630 plays mixed signals 722 over directional speakers 612 or speakers 614 as specified by the output mixing attributes 761. In one embodiment, system controller 630 plays mixed signals 722 using directional speakers 612 when system controller 630 determines that the speech signals 726 in the mixed signals 722 are of better quality than the non-speech signals 724. In one embodiment, system controller 630 plays mixed signals 722 using speakers 614 when system controller 630 determines that the non-speech signals 724 in the mixed signals 722 are of better quality than the speech signals 726. In one embodiment, system controller 630 plays the speech signals 726 in mixed signals 722 using directional speakers 612. In one embodiment, system controller 630 plays the non-speech signals 724 in mixed signals 722 using speakers 614.
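The quality-based choice between the directional speakers and the other speakers can be sketched as a simple comparison. The numeric quality scores, the function name, and the tie-breaking rule are all assumptions; the description does not say how quality is measured or what happens when the two are equal.

```python
def select_speakers(speech_quality: float, non_speech_quality: float) -> str:
    """Pick one speaker set for the whole mixed signal.

    The scores are hypothetical measures of signal quality, where a higher
    value means better quality.
    """
    # Breaking ties in favor of the directional speakers is an assumption.
    if speech_quality >= non_speech_quality:
        return "directional_speakers"
    return "speakers"
```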

In one embodiment, system controller 630 determines directional speakers 612 are to be used to play mixed signals 722. In one embodiment, output mixing attributes 761 include a volume adjustment attribute. System controller 630 adjusts the volume of mixed signals 722 or app audio signals 244 according to the volume adjustment attribute so that the volume is not too loud for user 101, who is assumed to be positioned in the focused area of directional speakers 612.

In one embodiment, app audio signals 244 include: a first indication labeling a first portion of app audio signals 244 as speech signals 726 or a first channel for speech signals 726; and a second indication labeling a second portion of app audio signals 244 as non-speech signals 724 or a second channel for non-speech signals 724. In one embodiment, phone 111 modifies app audio signals 244 to include such indications or channels. In one embodiment, app audio signals 244 received from audio session 141 include such indications or channels. In one embodiment, audio app 114 generates app audio signals 244 to include such indications or channels. In one embodiment, app audio signals 244 use a Dolby multi-channel format to indicate speech signals 726 in a dialogue channel and non-speech signals 724 in a non-dialogue channel. In one embodiment, app audio signals 244 use a different channel or sub-session separation for speech signals 726 and non-speech signals 724.

FIG. 9 illustrates an exemplary embodiment of a process for sending audio signals to an audio-based phone app according to the present invention. In this embodiment, during an audio session 141, audio app 114 sends an audio input request 142 to phone 111, and in response, the phone 111 forwards audio input request 142 to system controller 630. In one embodiment, system controller 630 receives audio input request 142 and instructs microphones 624 to capture sound signals 747. In one embodiment, system controller 630 receives captured sound signals 747 from microphones 624. System controller 630 processes sound signals 747 according to app audio input mixing attributes 762 to generate mixed signals 742. In one embodiment, input mixing attributes 762 include attribute values that indicate that no processing of the sound signals 747 is to be performed. System controller 630 copies sound signals 747 to mixed signals 742. In one embodiment, input mixing attributes 762 contain attribute values that indicate that the sound signals 747 are to be separated into speech signals and non-speech signals. System controller 630 processes sound signals 747 to separate sound signals 747 into speech signals 746 and non-speech signals 744. System controller 630 then combines speech signals 746 and non-speech signals 744 according to the input mixing attributes 762 to generate mixed signals 742. In one embodiment, input mixing attributes 762 include one or more attributes for increasing the volume of speech signals 746, for reducing the volume of non-speech signals 744, for eliminating non-speech signals 744, for eliminating non-speech signals 744 if speech signals 746 are absent, for eliminating non-speech signals 744 if speech signals 746 are present, or some combination thereof. In one embodiment, system controller 630 generates mixed signals 742 such that the clarity of speech signals 746 or the quality of non-speech signals 744 in sound signals 747 is increased.
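The two input paths described for FIG. 9, copying the sound signals unchanged or separating and recombining them, can be sketched as follows. The attribute keys, function names, and the threshold-based separator are hypothetical. The separator in particular is a placeholder passed in by the caller, because the description does not specify how speech is distinguished from non-speech.

```python
from typing import Callable, Dict, List, Tuple

Samples = List[float]
Separator = Callable[[Samples], Tuple[Samples, Samples]]


def process_app_input(sound: Samples, attrs: Dict[str, float],
                      separate: Separator) -> Samples:
    """Generate mixed signals from captured sound signals."""
    # Hypothetical attribute: a truthy "no_processing" value means pass-through.
    if attrs.get("no_processing"):
        return list(sound)  # copy sound signals to mixed signals unchanged
    speech, non_speech = separate(sound)
    speech_gain = attrs.get("speech_gain", 1.0)
    non_speech_gain = attrs.get("non_speech_gain", 1.0)
    return [speech_gain * s + non_speech_gain * n
            for s, n in zip(speech, non_speech)]


def toy_separator(sound: Samples) -> Tuple[Samples, Samples]:
    """Placeholder that treats loud samples as speech. Not a real separator."""
    speech = [x if abs(x) >= 0.5 else 0.0 for x in sound]
    non_speech = [x if abs(x) < 0.5 else 0.0 for x in sound]
    return speech, non_speech
```

Setting `non_speech_gain` to 0.0 corresponds to the attribute for eliminating non-speech signals, while leaving both gains at 1.0 reconstructs the original sound signals from the separated parts.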

In one embodiment, upon generating mixed signals 742, system controller 630 sends mixed signals 742 to phone 111. In one embodiment, phone 111 sends mixed signals 742 to audio app 114.
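The combining step of FIG. 9 can be sketched as follows, assuming sound signals 747 have already been separated into sample-aligned speech and non-speech streams normalized to the range -1.0 to 1.0. The class `InputMixingAttributes`, its field names, the `silence_threshold` used to decide whether speech is present, and the function `mix` are all hypothetical illustrations; the description does not specify how the attributes are represented or how speech presence is detected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputMixingAttributes:
    """Hypothetical stand-in for input mixing attributes 762."""
    speech_gain: float = 1.0        # values above 1.0 increase speech volume
    non_speech_gain: float = 1.0    # values below 1.0 reduce non-speech volume
    drop_non_speech_if_speech_present: bool = False
    drop_non_speech_if_speech_absent: bool = False
    silence_threshold: float = 1e-3  # assumed level below which speech is absent


def mix(
    speech: List[float],
    non_speech: List[float],
    attrs: InputMixingAttributes,
) -> List[float]:
    """Combine separated streams into mixed samples according to the attributes."""
    if len(speech) != len(non_speech):
        raise ValueError("speech and non-speech streams must be the same length")

    # Crude presence check (an assumption): any sample above the threshold.
    speech_present = any(abs(s) > attrs.silence_threshold for s in speech)

    non_speech_gain = attrs.non_speech_gain
    if speech_present and attrs.drop_non_speech_if_speech_present:
        non_speech_gain = 0.0
    if not speech_present and attrs.drop_non_speech_if_speech_absent:
        non_speech_gain = 0.0

    mixed = [
        attrs.speech_gain * s + non_speech_gain * n
        for s, n in zip(speech, non_speech)
    ]
    # Clamp to the normalized sample range so boosted speech does not overflow.
    return [max(-1.0, min(1.0, x)) for x in mixed]
```

Setting `non_speech_gain` to 0.0 corresponds to eliminating non-speech signals 744 unconditionally, while the two boolean fields cover the conditional elimination attributes. A real controller would likely evaluate speech presence per frame rather than over the whole buffer, as this sketch does.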

The present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.

Furthermore, the present invention can take the form of a computer program product accessible from a computer usable or computer readable storage medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer usable or computer readable storage medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.

Claims

What is claimed is:

1. A phone stand, comprising:

a phone holder for coupling to a phone, the phone for conducting an audio session, the audio session comprising at least one voice session established with a remote device and at least one other audio session of an application executing on the phone;

a plurality of directional speakers positioned to project sound to a focused audio area corresponding to a location where a user is expected to be positioned;

one or more other speakers; and

a system controller configured to:

receive audio signals of the audio session from the phone, the audio signals of the audio session comprising first audio signals received from the remote device over the at least one voice session and second audio signals received from the application executing on the phone;

separate the first and second audio signals of the audio session into speech signals and non-speech signals;

obtain one or more output mixing attributes for the speech signals and the non-speech signals;

modify the speech signals and the non-speech signals based on the one or more output mixing attributes;

generate mixed signals by combining the modified speech signals and the modified non-speech signals; and

send the mixed signals to the plurality of directional speakers.

2. The phone stand of claim 1, further comprising one or more microphones configured to capture a user's speech as sound signals and to send the sound signals to the system controller, wherein the system controller is further configured to:

separate the sound signals into second speech signals and second non-speech signals;

obtain one or more input mixing attributes for the second speech signals and the second non-speech signals;

modify the second speech signals and the second non-speech signals based on the one or more input mixing attributes;

generate second mixed signals by combining the modified second speech signals and the modified second non-speech signals; and

send the second mixed signals to the phone.

3. The phone stand of claim 1, wherein the output mixing attributes comprise one or more of the following:

an attribute for increasing a volume of the speech signals;

an attribute for reducing a volume of the non-speech signals;

an attribute for eliminating the non-speech signals;

an attribute for maintaining the volume of the non-speech signals;

an attribute for eliminating the non-speech signals if the speech signals are present; and

an attribute for increasing a clarity of the speech signals.

4. The phone stand of claim 1, wherein the system controller is further configured to receive an incoming session indication from the phone notifying the system controller of the at least one voice session and to announce the incoming session indication using the plurality of directional speakers,

wherein the phone stand further comprises one or more microphones configured to capture speech from a user in response to the announced incoming session indication and to send sound signals of the captured speech to the system controller,

wherein the system controller is further configured to determine that the sound signals comprise an acceptance or a decline of the at least one voice session and to send the acceptance or the decline in an incoming session response message to the phone.

5. The phone stand of claim 1, wherein the one or more other speakers comprise non-directional speakers, wherein the system controller is further configured to:

send the modified speech signals in the mixed signals to the plurality of directional speakers; and

send the modified non-speech signals in the mixed signals to the one or more other speakers.

6. The phone stand of claim 1, wherein the focused area comprises any one of the following:

an area away from a dashboard of a vehicle;

an area away from a passenger side compartment box of the vehicle; and

an area behind a head rest of a seat in a vehicle.

7. The phone stand of claim 1, wherein the at least one voice session comprises a voice call between the phone and the remote device.

8. The phone stand of claim 1, wherein the audio signals received by the system controller comprise a first indication labeling a first portion of the audio signals as the speech signals and a second indication labeling a second portion of the audio signals as the non-speech signals.

9. A method for processing audio signals of an audio session from a phone, comprising:

(a) receiving, by a system controller of a phone stand, the audio signals of the audio session conducted by the phone, the audio session comprising at least one voice session established with a remote device and at least one other audio session of an application executing on the phone, the phone stand comprising a phone holder for coupling to the phone, a plurality of directional speakers positioned to project sound to a focused audio area corresponding to a location where a user is expected to be positioned, and one or more other speakers, comprising:

(a1) receiving first audio signals of the audio session from the remote device over the at least one voice session; and

(a2) receiving second audio signals of the audio session from the application executing on the phone;

(b) separating, by the system controller, the first and second audio signals into speech signals and non-speech signals;

(c) obtaining, by the system controller, one or more output mixing attributes for the speech signals and the non-speech signals;

(d) modifying the speech signals and the non-speech signals based on the one or more output mixing attributes;

(e) generating, by the system controller, mixed signals by combining the modified speech signals and the modified non-speech signals; and

(f) sending, by the system controller, the mixed signals to the plurality of directional speakers.

10. The method of claim 9, wherein the phone stand further comprises one or more microphones, wherein the method further comprises:

capturing, by the one or more microphones, a user's speech as sound signals;

sending, by the one or more microphones, the sound signals to the system controller;

separating, by the system controller, the sound signals into second speech signals and second non-speech signals;

obtaining, by the system controller, one or more input mixing attributes for the second speech signals and the second non-speech signals;

modifying the second speech signals and the second non-speech signals based on the one or more input mixing attributes;

generating, by the system controller, second mixed signals by combining the modified second speech signals and the modified second non-speech signals; and

sending, by the system controller, the second mixed signals to the phone.

11. The method of claim 9, wherein the output mixing attributes comprise one or more of the following:

an attribute for increasing a volume of the speech signals;

an attribute for reducing a volume of the non-speech signals;

an attribute for eliminating the non-speech signals;

an attribute for maintaining the volume of the non-speech signals;

an attribute for increasing a clarity of the speech signals.

12. The method of claim 9, wherein the phone stand further comprises one or more microphones, wherein the method further comprises:

receiving, by the system controller, an incoming session indication from the phone notifying the system controller of the at least one voice session;

announcing, by the system controller, the incoming session indication using the plurality of directional speakers;

capturing, by the one or more microphones, speech from a user in response to the announced incoming session indication;

sending, by the one or more microphones, sound signals of the captured speech to the system controller;

determining, by the system controller, that the sound signals comprise an acceptance or a decline of the at least one voice session; and

sending, by the system controller, the acceptance or the decline in an incoming session response message to the phone.

13. The method of claim 9, wherein the one or more other speakers comprise non-directional speakers, wherein the sending (f) comprises:

(f1) sending, by the system controller, the modified speech signals in the mixed signals to the plurality of directional speakers; and

(f2) sending, by the system controller, the modified non-speech signals in the mixed signals to the one or more other speakers.

14. The method of claim 9, wherein the focused area comprises any one of the following:

an area away from a dashboard of a vehicle;

15. The method of claim 9, wherein the at least one voice session comprises a voice call between the phone and the remote device.

16. The method of claim 9, wherein the audio signals received by the system controller comprise a first indication labeling a first portion of the audio signals as the speech signals and a second indication labeling a second portion of the audio signals as the non-speech signals.

17. A non-transitory computer readable medium embodied in a phone stand, the medium comprising computer readable program code embodied therein, wherein when executed by a processor causes the processor to:

(a) receive audio signals of an audio session conducted by a phone, the audio session comprising at least one voice session established with a remote device and at least one other audio session of an application executing on the phone, the phone coupled to the phone stand via a phone holder, comprising:

(a1) receive first audio signals of the audio session from the remote device over the at least one voice session; and

(a2) receive second audio signals of the audio session from the application executing on the phone;

(b) separate the audio signals into speech signals and non-speech signals;

(c) obtain one or more output mixing attributes for the speech signals and the non-speech signals;

(d) modify the speech signals and the non-speech signals based on the one or more output mixing attributes;

(e) generate mixed signals by combining the modified speech signals and the modified non-speech signals; and

(f) send the mixed signals to a plurality of directional speakers of the phone stand, wherein the plurality of directional speakers is positioned to project sound to a focused audio area corresponding to a location where a user is expected to be positioned.

18. The medium of claim 17, wherein the phone stand further comprises one or more microphones, wherein the processor is further caused to:

receive sound signals captured by the one or more microphones from a user's speech;

send the second mixed signals to the phone.

19. The medium of claim 17, wherein the output mixing attributes comprise one or more of the following:

an attribute for increasing a volume of the speech signals;

an attribute for reducing a volume of the non-speech signals;

an attribute for eliminating the non-speech signals;

an attribute for maintaining the volume of the non-speech signals;

an attribute for increasing a clarity of the speech signals.

20. The medium of claim 17, wherein the phone stand further comprises one or more microphones, wherein the processor is further caused to:

receive an incoming session indication from the phone notifying of the at least one voice session;

announce the incoming session indication using the plurality of directional speakers;

receive sound signals captured by the one or more microphones from a user's speech in response to the announced incoming session indication;

determine that the sound signals comprise an acceptance or a decline of the at least one voice session; and

send the acceptance or the decline in an incoming session response message to the phone.

21. The medium of claim 17, wherein the one or more other speakers comprise non-directional speakers, wherein the send (f) comprises:

(f1) send the modified speech signals in the mixed signals to the plurality of directional speakers; and

(f2) send the modified non-speech signals in the mixed signals to the one or more other speakers.

22. The medium of claim 17, wherein the focused area comprises any one of the following:

an area away from a dashboard of a vehicle;

an area away from a passenger side compartment box of the vehicle; and

an area behind a head rest of a seat in a vehicle.

23. The medium of claim 17, wherein the at least one voice session comprises a voice call between the phone and the remote device.

24. The medium of claim 17, wherein the audio signals comprise a first indication labeling a first portion of the audio signals as the speech signals and a second indication labeling a second portion of the audio signals as the non-speech signals.