PARTIAL WAIVER OF COPYRIGHT
All of the material in this patent application is subject to copyright protection under the copyright laws of the United States and of other countries. As of the first effective filing date of the present application, this material is protected as unpublished material.
However, permission to copy this material is hereby granted to the extent that the copyright owner has no objection to the facsimile reproduction by anyone of the patent documentation or patent disclosure, as it appears in the United States Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
CROSS REFERENCE TO RELATED APPLICATIONS
Not Applicable
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not Applicable
REFERENCE TO A MICROFICHE APPENDIX
Not Applicable
FIELD OF THE INVENTION
This invention generally relates to the field of telephony services and, more particularly, to the field of telephony speech recognition and speech synthesis services over a communications network.
BACKGROUND OF THE INVENTION
The use of the telephone over the PSTN (public switched telephone network) continues to grow at a tremendous rate. Recently, the use of the Internet and voice over IP has been gaining popularity as well.
The use of the telephone, although easy for most people, can be problematic for individuals with diminished hearing or speech capabilities. It is common for individuals who have diminished hearing or speech capabilities to feel discouraged when having to use the telephone. Special devices to improve the quality of participation for speech-impaired individuals in a telephony conversation are widely available. One example of such a device is a simple amplifier for amplifying the volume of the handset receiver. This solution works well for many hearing-impaired individuals; however, the use of a simple amplifier will not work for many other impaired individuals.
A system is disclosed in U.S. Pat. No. 5,832,433 to Yashchin et al., entitled "Speech synthesis method for operator assistance telecommunications calls comprising a plurality of text-to-speech (TTS) devices," which describes a method and apparatus for providing automated operator services and, in particular, a reverse directory assistance service. A calling customer is connected to an automated system that prompts the caller for a listing identifier, which is used by the system to retrieve a textual listing corresponding to the listing identifier from a database of textual listings. The textual listing contains a TTS ID, which identifies a particular one of a plurality of TTS devices, and the listing is optionally preprocessed and parsed into a plurality of fields that define the listing. The listing text is then sent to the identified TTS device for text-to-speech synthesis of the text contained within the listing. The method further includes teaching the system which TTS device of the plurality of TTS devices best synthesizes the text contained within the listing and then identifying that TTS device within the listing so that subsequent synthesis will utilize that TTS device.
Another system is disclosed in U.S. Pat. No. 4,659,877 to Eric A. Dorsey et al., entitled "Verbal computer terminal system," which describes a telephonic data communications system that provides verbal communication of data stored in remote computer systems. The major components of the system are a plurality of channels and a data processor. Each channel includes a text-to-speech translator for translating digitally stored textual data into analog speech signals corresponding to the verbal expression of the textual data; a telephone interface for establishing a telephonic connection with a caller; and an RS232 port for accessing a database in a remote host computer system. The data processor includes software for controlling the communications protocols used by each channel, whereby each channel emulates a computer terminal suitable for communication with the remote computer system connected to that channel, and software for extracting selected data from the data received from the remote computer system. This system enables one to telephonically receive data from a remote computer without the need for a computer terminal and without needing to reprogram the remote computer to communicate with the telephonic data communication system.
Yet another system is disclosed in U.S. Pat. No. 5,710,806 to Peter Lee et al., entitled "Telecommunications device for the hearing impaired with telephone, text communication and answering, and automated voice carryover," which describes a telecommunication device for the hearing impaired having the capability of sending text communications via a telephone-style keypad and receiving text communications for display on an LCD display, in addition to providing standard telephone voice communications. The telecommunications device automates voice carryover (VCO) calls and provides automatic answering and recording capability for text messages on the same telephone line as a voice answering machine and/or a facsimile machine. The telecommunications device further includes means for selectively adjusting the amplification of received voice messages and sent voice messages, to provide maximum amplification of received voice messages without producing feedback oscillation or to provide maximum amplification of sent voice messages without producing feedback oscillation. The telecommunications device further includes means for minimizing the effect of the reflected impedance of the telephone line on signal transmission and optimizing the coupling of signals to the telephone line.
All these systems, although useful, are not without their shortcomings. To begin, all of these systems require that the user buy custom equipment, such as Teletype terminals and other specialty terminal devices. The expense of the required end-user telecommunication devices could be prohibitive. Accordingly, a need exists for a method and apparatus to provide individuals, who have diminished hearing or speech capabilities, with the ability to communicate, via telephone, without the need to purchase specialty terminal devices.
In other words, currently, telephony voice assistance devices require specialized hardware and software at the end user's location, restricting user access to that particular location. Moreover, provisions for customizing voice assistance resources in standalone environments are normally not provided, and the standard synthesized voice model used to speak text is unable to express gender or accenting, or to modify the rate at which text is spoken. The user is left with an inflexible, costly, restrictive facility that requires maintenance and replacement.
Another shortcoming with systems available today is the lack of Internet support. The convergence of telephony and the Internet, in the form of IP telephony, continues to grow. The use of Web-based services, such as those available from DialPad.com, also continues to grow. Moreover, many of these web-based services are lower cost or even free to consumers. However, none of these new services address, over the Internet, the needs of individuals who have diminished hearing or speech capabilities. Accordingly, a need exists to provide services that overcome the aforementioned problems.
SUMMARY OF THE INVENTION
With recent advancements in computing and telephony technology, there is a new method for individuals who are hearing or speech impaired to participate in telephony conversations. Devices for speech and hearing impaired individuals to communicate via the telephone do exist; however, they do not utilize a Central Office service. This invention provides that service. Using a computer and subscribing to a Central Office (CO) service, hearing or speech impaired individuals can allocate and use Central Office Text-To-Speech (TTS) and Speech Recognition (SRECO) resources to communicate on their behalf.
Accordingly, there is provided a system and method for individuals who are hearing or speech impaired to participate in telephony conversations. The major components of the system include a text-to-speech (TTS) translator, a speech recognition server (SRECO), a telephonic interface, a Central Office, a Voice Response Unit (VRU), a Subscriber Database (VASDB), a database access interface, a Client Application (ClientApp), a Web Server interface, and an ISCP (Integrated Service Control Point) of an AIN (Advanced Intelligent Network) for controlling and coordinating the other parts.
According to one embodiment of the present invention, a method for communications in a network that includes a Central Office and an Advanced Intelligent Network system, wherein the Central Office is coupled to a plurality of user units, is disclosed. The method, on the Advanced Intelligent Network system, comprises the steps of: receiving a communication session request from a first party using a first user unit; and determining if the first party is speech impaired, and if the first party is speech impaired, then performing the sub-steps of: prompting the first party for text input; receiving the text input from the first party; converting the text input, using a Text-to-Speech resource, to an audio output; and sending the audio output to a second party.
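By way of illustration only, the following Python sketch summarizes the ordering of these method steps. The function and parameter names (handle_session_request, receive_text, send_audio, and so on) are hypothetical placeholders chosen for this sketch and do not represent the actual Central Office or AIN software of the invention.

    def text_to_speech(text):
        """Stand-in for the Text-to-Speech (TTS) resource; returns placeholder audio bytes."""
        return ("<audio:" + text + ">").encode()

    def handle_session_request(first_party, second_party, prompt, receive_text, send_audio):
        """Steps performed on the AIN system when a communication session request arrives."""
        if first_party.get("speech_impaired"):
            prompt(first_party, "Enter the text to be spoken:")  # prompt the first party for text input
            text = receive_text(first_party)                     # receive the text input
            audio = text_to_speech(text)                         # convert the text to an audio output
            send_audio(second_party, audio)                      # send the audio output to the second party

    if __name__ == "__main__":
        caller = {"name": "subscriber", "speech_impaired": True}
        callee = {"name": "PSTN party"}
        handle_session_request(
            caller, callee,
            prompt=lambda party, msg: print(msg),
            receive_text=lambda party: "Hello, this is the VoiceAide subscriber.",
            send_audio=lambda party, audio: print("to", party["name"], ":", audio),
        )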
BRIEF DESCRIPTION OF THE DRAWINGS
The subject matter, which is regarded as the invention, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings. For a fuller understanding of the advantages provided by the invention, reference should be made to the following detailed description together with the accompanying drawings wherein:
FIG. 1 illustrates an architectural diagram of a PSTN network with an AIN system and Central Office for carrying out the present invention.
FIG. 2 is a flow diagram of the beginning of a communication session and the authentication of the parties (an impaired party and a PSTN party), according to the present invention.
FIG. 3 is a flow diagram of the allocation of appropriate resources, ending with a determination of a party's status, according to the present invention.
FIG. 4 is a flow diagram of the communications session between a Speech Impaired (SI) party and a non-impaired party, according to the present invention.
FIG. 5 is a flow diagram of the communications session between a Hearing-Impaired (HI) party and a non-impaired party, according to the present invention.
FIG. 6 is a flow diagram of the communications session between a party that is both Speech Impaired (SI) and Hearing-Impaired (HI) and a non-impaired party, according to the present invention.
DESCRIPTION OF THE EXEMPLARY EMBODIMENTS
Embodiments are Exemplary
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings, and will herein be described in detail, a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the broad aspects of the invention to the embodiment illustrated.
The numerous innovative teachings of the present application will be described with particular reference to the presently preferred embodiments, wherein these innovative teachings are advantageously applied to the particular problems of Systems And Methods For Assisting Speech And Hearing-Impaired Subscribers Using The Telephone And Central Office. However, it should be understood that these embodiments are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed inventions. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in the plural and vice versa with no loss of generality.
Exemplary Method
An exemplary method of the present invention may best be understood by referencing the system diagram of FIG. 1 and the exemplary flowcharts of FIGS. 2-6. These diagrams will now be discussed in detail.
One skilled in the art will recognize that these steps may be rearranged and/or augmented with no loss of generality in the teachings of the present invention.
The teachings of the present invention are sufficiently broad that they do not limit the manner in which the above-mentioned steps are to be performed, nor do they limit the method to any particular hardware, operating system, API, or graphical user interface. Thus, while the particular information gathered within the context of FIGS. 1-6 and the specific function calls listed in the exemplary flowcharts are preferred for some embodiments, they are by no means limiting of the present invention teachings or the scope thereof.
Exemplary System Architecture
One example where the present invention can be reduced to practice uses the IBM AIX Multi-Services Platform (MSP), comprising a Call Manager (CM), a VoiceAide WebServer Interface, a VoiceAide Subscriber Database (VASDB), a Text-To-Speech Server (TTS), a Voice Response Unit (DT6 VRU), a QuickPay Subscriber-Client Interface Application (VA-ClientApp), and a Speech Recognition Server (SRECO).
FIG. 1 is an architectural block diagram of a PSTN network with an AIN system and Central Office for carrying out the present invention. A PSTN (Public Switched Telephone Network) subscriber using handset telephone 102 is connected through the PSTN network 108 to a user using a data processing system 104 running an application, such as a Web browser, for accessing data on the Web server 106 over the PSTN network 108. An integrated service control point (ISCP) 130 has a data connection 136 to the PSTN network 108, with the AIN (Advanced Intelligent Network) connected to the central office (CO) and a data communications network (DCN). The AIN system is well known and is an architecture that separates service logic from switching equipment, allowing new services to be added without having to redesign switches to support them. It encourages competition among service providers, since it makes it easier for a provider to add services, and it offers customers more service choices. The AIN system was first developed by Bell Communications Research and is recognized as an industry standard in North America. The ISCP 130, along with the generic data interface (GDI) 128, the element management system (EMS) 126, and the AIN system, provides services to subscribers such as call forwarding, three-way calling, find-me, and other services.
The Web server 106 can be an IBM PC or any other data processor having similar or better capabilities. The customer at the telephone 102 is referred to as a PSTN party, a non-impaired party, or, alternatively, a second party or a destination party. The customer at the data processing system 104 is referred to as an impaired party or a first party.
The Text-To-Speech Servers (TTS) 116 are used to convert text from a Client Application 104 to speech and to play it to a PSTN party 102. The TTS Servers are connected via a Local Area Network (LAN). The TTS servers 116 are accessed via a DirectTalk/6000 TTS Custom Server, available from IBM Corporation, that also interfaces with the (VoiceAide) Web Server 106. A messaging interface is established between the Web Server 106 and the TTS Custom Server to carry requests to convert text, received from the VA subscriber 104, to the appropriate TTS server resource 116.
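Purely as an illustrative sketch, and not as the actual DirectTalk/6000 custom-server protocol, the following Python fragment shows one way such a text-conversion request could be carried over TCP/IP (the messaging between units being TCP/IP based, as noted below). The message fields, host name, port number, and JSON encoding are assumptions made only for illustration.

    import json
    import socket

    def send_tts_request(text, voice_attrs, host="tts-server.example", port=7000):
        """Forward a VA subscriber's text to a TTS server resource over TCP/IP."""
        request = {
            "type": "convert_text",
            "text": text,               # text typed by the VA subscriber 104
            "attributes": voice_attrs,  # TTS customization attribute values (see Customization)
        }
        with socket.create_connection((host, port)) as conn:
            conn.sendall(json.dumps(request).encode("utf-8") + b"\n")
            return conn.makefile().readline()  # e.g., an acknowledgement or an audio handle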
The Speech Recognition Servers (SRECO) 118 are used to recognize words spoken by the PSTN party. The recognized spoken words are returned as text from the SRECO Servers 118 and are sent to a Client Application 104 for display to the impaired party at 104. The SRECO Servers 118 are connected via a LAN using ATM switch 122. The servers are accessed via a DirectTalk/6000 Custom Server that also interfaces with a VA Call State Table and TTS Server 116.
A Voice Response Unit allows prerecorded menus of information to be played to a subscriber 102 or 104 and allows selections to be made with voice or touch-tone keypad input (i.e., DTMF, dual-tone multi-frequency). The Voice Response Unit 112 is connected between the ATM switch 122 and the PSTN 108 to match voice output from the TTS 116, such as 8 kHz mu-law output, which is compatible with the PSTN network 108.
A Subscriber Database (VASDB) 114 is used to hold subscriber authentication information to ensure that only authorized subscribers are allowed to use the service. Further, the VASDB 114 is used to store VA subscriber TTS and SRECO customization attribute values for the TTS 116 and SRECO 118 resources. All units interface to one another via an Intranet or the Internet 110, and messaging between the units is TCP/IP based.
It should be understood that although these are shown as separate systems, i.e., Web Server 106, Voice Response Unit 112, Subscriber Database 114, TTS 116, SRECO 118, and Billing 120, any or all of these components may be combined into a single system, such as the IBM MSP/600 system.
Registration and Authentication
Upon registering for a new service via the local Central Office (CO) 124, a subscriber can allocate and use CO TTS 116 and SRECO 118 resources to communicate with a non-impaired party 102.
First, a subscriber registers for VoiceAide service. At this stage, an entry corresponding to the particular subscriber 104 is added to the VASDB 114. The entry contains the subscriber's registration information and default TTS customization attribute values. The subscriber registration information contains the subscriber's account number and pin code. The account number and pin code provide secured access to the subscriber's account and act as a pointer to the TTS customization information in the VASDB 114.
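The following Python sketch illustrates, under assumed field names and example values, the kind of information such a VASDB entry may hold; it is not the actual database schema of the disclosed system.

    from dataclasses import dataclass, field

    @dataclass
    class SubscriberEntry:
        """One hypothetical VASDB record: registration information plus default TTS attributes."""
        account_number: str   # secured access key; also points to the customization data
        pin_code: str
        impairment: str       # e.g., "SI", "HI", or "BOTH"
        tts_attributes: dict = field(default_factory=lambda: {
            "gender": "female",   # default TTS customization attribute values
            "rate": "medium",
            "accent": "none",
        })

    # Example entry added when a subscriber registers for VoiceAide service:
    entry = SubscriberEntry(account_number="1234567890", pin_code="4321", impairment="SI")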
Referring now to FIG. 2 and FIG. 1, when a subscriber 104 or a PSTN party 102 wants to communicate (at step 202) with one another, the Web Server Interface 110 requests authentication of the subscriber. Next, the subscriber inputs the subscriber's account number and pin code via the Client App 104. The account number is used to search (at step 204) the VASDB 114 for the subscriber's registration information. Verification of the pin code and account number is performed at the VASDB 114. If the verification is successful, the service proceeds to the next stage. If the verification is not successful, the service re-prompts the subscriber for verification, or routes the customer to a help desk (not shown).
Next, the VoiceAide (VA) service notifies the subscriber of the results of authentication and asynchronously initiates an outbound call (at step 304) to the destination party, or notifies the subscriber of an incoming call from the PSTN party. When the PSTN party 102 answers, a prompt is played indicating that the call received is from a Central Office VoiceAide Server 110 for the impaired subscriber 104. The name of the subscriber is played to the PSTN party along with rules (at step 306) for the VoiceAide call. Next, the PSTN party 102 is asked to hold while notification of call establishment is sent to the subscriber 104 (at steps 310 and 312). After the call establishment notification is received, the subscriber 104 gains control of the conversation. Here, conversation control is transferred between the PSTN party and the subscriber until the call is terminated. The subscriber 104 is prompted for input at step 314. When the subscriber inputs information at 104, it is received by the VA service, more specifically by the Web Server 110, at step 316. Next, the VA service determines (at step 318) whether the subscriber is Hearing-Impaired, Speech Impaired, or both Speech and Hearing-Impaired. Correspondingly, the events described below follow according to the determination.
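A minimal Python sketch of this determination and dispatch follows. The handler names are hypothetical placeholders for the flows of FIGS. 4-6 and are given only to clarify the branching at step 318.

    def run_session(subscriber, handle_si, handle_hi, handle_both):
        """Route the established call to the appropriate conversation loop (step 318)."""
        impairment = subscriber["impairment"]   # e.g., taken from the VASDB entry
        if impairment == "SI":
            handle_si(subscriber)       # FIG. 4: Speech Impaired subscriber
        elif impairment == "HI":
            handle_hi(subscriber)       # FIG. 5: Hearing Impaired subscriber
        elif impairment == "BOTH":
            handle_both(subscriber)     # FIG. 6: Speech and Hearing Impaired subscriber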
Speech Impaired Subscriber (SI)
In the case of a speech impaired subscriber, the Subscriber 104 types a message, which is forwarded (at step 316) to the Web Server Interface 110, which in turn uses the allocated Text-to-Speech resource 116 to convert (at step 402) the subscriber's typed text to audio output. The audio output (at step 404) is sent and played to the PSTN destination party 102. The Speech Impaired subscriber 104 has the option (at step 406) of indicating whether a response to the first message is permitted from the PSTN destination party 102, prior to receiving a second message from the Speech Impaired subscriber 104. If the response option is not selected, the PSTN destination party 102 is not permitted to speak. If the response option is selected, the PSTN destination party 102 is provided a speak-prompt (at step 408), after the subscriber's first message is sent and played. The speak-prompt indicates that the Speech-Recognition resource, SRECO 118, is active and will recognize spoken words and transmit text information to the Impaired Subscriber 104. The VoiceAide Service waits (at step 410) for speech by the PSTN destination party 102. Next, the VoiceAide Service receives and forwards (at step 412) the speech from the PSTN destination party 102 to the Impaired Subscriber 104. The communication continues in this manner (in other words, repeats step 316) until the conversation terminates. If the PSTN destination party 102 terminates, the Impaired party 104 is notified that the PSTN destination party 102 has terminated. If the Speech Impaired party 104 terminates, the PSTN destination party 102 is played a special prompt notifying that the Speech Impaired party 104 has terminated the call.
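The following Python sketch outlines the FIG. 4 conversation loop described above. It is illustrative only; the helper names (get_typed_text, tts_convert, and so on) are hypothetical placeholders and not the actual VoiceAide interfaces.

    def speech_impaired_loop(get_typed_text, tts_convert, play_to_pstn,
                             response_allowed, capture_pstn_speech, forward_to_subscriber):
        """Repeat the FIG. 4 exchange until either party terminates."""
        while True:
            text = get_typed_text()                 # step 316: subscriber types a message
            if text is None:                        # subscriber pressed the hang-up button
                play_to_pstn("The subscriber has ended the call.")
                break
            play_to_pstn(tts_convert(text))         # steps 402-404: TTS conversion and playback
            if response_allowed():                  # step 406: response option selected
                play_to_pstn("You may speak now.")  # step 408: speak-prompt
                speech = capture_pstn_speech()      # steps 410-412: wait for and receive speech
                if speech is None:                  # PSTN destination party terminated
                    break
                forward_to_subscriber(speech)       # forward the response to the subscriber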
Speech and Hearing Impaired Subscriber (Both)
In the case of a subscriber that is both speech and hearing impaired, the impaired subscriber 104 types a message, which is forwarded (at step 316) to the Web Server Interface 110, which in turn uses the allocated Text-to-Speech resource 116 to convert (at step 602) the subscriber's typed text to audio output. The audio output (at step 604) is sent and played to the PSTN destination party 102. The Speech and Hearing Impaired subscriber 104 has the option (at step 606) of indicating whether a response to the first message is permitted from the PSTN destination party 102, prior to receiving a second message from the Speech and Hearing Impaired subscriber 104. If the response option is not selected, the PSTN destination party 102 is not permitted to speak. If the response option is selected, the PSTN destination party 102 is provided a speak-prompt (at step 608) after the subscriber's first message is sent and played. The speak-prompt indicates that the Speech-Recognition resource, SRECO 118, is active, can recognize spoken words, and will convert the spoken words to text and transmit the text information to the Impaired Subscriber 104. The VoiceAide Service waits (at step 610) for speech by the PSTN destination party 102. Next, the VoiceAide Service receives (at step 612) the speech from the PSTN destination party 102 and converts the speech (at step 614) to text using SRECO 118. Next, the VoiceAide Service forwards (at step 616) the text to the Speech and Hearing Impaired Subscriber. The communication continues in this manner (in other words, repeats step 316) until the conversation terminates. If the PSTN destination party 102 terminates, the Speech and Hearing Impaired party 104 is notified that the PSTN destination party 102 has terminated the communication. If the Speech and Hearing Impaired party 104 terminates, the PSTN destination party 102 is played a special prompt notifying that the Speech and Hearing Impaired party 104 has terminated the communication session.
Hearing Impaired Subscriber (HI)
In the case of a Hearing-Impaired subscriber, the Subscriber 104 speaks (inputs) a message, which is forwarded (at step 316) to the Web Server Interface 110, which in turn sends the audio message (at step 502) to be played to the PSTN destination party 102. The Hearing Impaired subscriber 104 has the option (at step 504) of indicating whether a response to the first message is permitted from the PSTN destination party 102, prior to receiving a second message from the Hearing Impaired subscriber 104. If the response option is not selected, the PSTN destination party 102 is not permitted to speak. If the response option is selected, the PSTN destination party 102 is provided a speak-prompt (at step 506) after the subscriber's first message is sent and played. The speak-prompt indicates that the Speech-Recognition resource, SRECO 118, is active, can recognize spoken words, and will convert the spoken words to text and transmit the text information to the Hearing-Impaired Subscriber 104. The VoiceAide Service waits (at step 508) for speech by the PSTN destination party 102. Next, the VoiceAide Service receives and converts (at step 510) the speech, from the PSTN destination party 102, to text using SRECO 118. Next, the VoiceAide Service forwards (at step 512) the text to the Hearing Impaired Subscriber. The communication continues in this manner (in other words, repeats step 316) until the conversation terminates. If the PSTN destination party 102 terminates, the Hearing Impaired party 104 is notified that the PSTN destination party 102 has terminated the communication. If the Hearing Impaired party 104 terminates, the PSTN destination party 102 is played a special prompt notifying that the Hearing Impaired party 104 has terminated the communication session.
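The hearing-impaired flow of FIG. 5 and the combined flow of FIG. 6 share a speech-to-text relay step, sketched below in Python. The helper names are hypothetical and the sketch is illustrative only; it is not the actual SRECO interface.

    def relay_pstn_speech(capture_pstn_speech, recognize, forward_text):
        """Receive PSTN speech, convert it to text, and forward it to the subscriber."""
        audio = capture_pstn_speech()    # e.g., step 508 / step 610: wait for speech
        if audio is None:
            return False                 # PSTN destination party terminated
        text = recognize(audio)          # e.g., step 510 / step 614: SRECO speech-to-text
        forward_text(text)               # e.g., step 512 / step 616: display at terminal 104
        return True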
In the present invention, TTS and SRECO conversion and playback are performed in real-time. Further customization of a TTS resource is provided to permit a subscriber to present a consistent voice interface to regularly called PSTN parties.
Two Modes: Initiation and Reception of the Call by Subscriber
VoiceAide can operate in two modes, specifically, an Initiation mode and a Receive mode. In the Initiation mode, a registered subscriber 104 initiates an outbound call via VoiceAide; in the Receive mode, VoiceAide notifies a registered subscriber 104 at the Internet-connected terminal of an incoming call. Once a call is established at the VRU 112, the VoiceAide service is run. Prompts are played to the PSTN-connected party 102, notifying the party that a Central Office VoiceAide Server 126 is facilitating the call. The PSTN party 102 is informed that TTS 116 and SRECO 118 will be used in the call and that the PSTN party 102 will be played tones or cues instructing it as to when to speak and listen.
Initiation by Subscriber
To initiate a call from an Internet-connected Client App 104, a subscriber inputs an account number, a pin code, and the telephone number of the destination party 102. If authentication of the account number and pin code is successful (at step 204), a message is sent to the Client App 104 indicating to the subscriber that the call is in progress. Once the call is connected, the PSTN-connected party 102 is notified (at step 306) by the VRU 112 that the call is from a VoiceAide subscriber 104 and that a CO VoiceAide Server 110 will facilitate the session. The PSTN party 102 is informed of the session rules and is asked to hold while call establishment notification is sent (at step 312) to the VoiceAide subscriber 104.
Once the VoiceAide subscriber receives the call establishment notification, the subscriber is presented with a screen (at step 314) to input text. Upon typing text, the subscriber presses a button on a VA-ClientApp menu to send the typed text to the VoiceAide Server 110. The text received at the server 110 is converted to speech and spoken to the PSTN-connected party 102. Once the converted text is spoken, a prompt or cue is played to the PSTN party 102 requesting a response. If there is no response, a message is sent to the VA-ClientApp informing the VA subscriber 104 to type and send the next message. Communication continues in this manner until the PSTN party 102 hangs up or until the VA subscriber 104 presses the hang-up button on the VA-ClientApp menu.
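Purely for illustration, the following Python sketch outlines the Initiation-mode sequence from the VA-ClientApp's point of view. The server object and its method names are assumptions made for this sketch, not the actual VoiceAide interfaces.

    def initiate_call(server, account, pin, destination, read_text):
        """Outbound-call sequence as seen from the VA-ClientApp."""
        if not server.authenticate(account, pin):   # step 204: account number and pin code
            return "authentication failed"
        server.dial(destination)                    # outbound call placed toward the PSTN party
        server.wait_for_establishment()             # steps 310-312: call establishment notification
        while True:
            text = read_text()                      # step 314: text input screen
            if text is None:                        # hang-up button pressed
                server.hang_up()
                break
            server.send_text(text)                  # step 316: text sent for TTS playback
        return "call ended"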
Reception of the Call by Subscriber
To receive a VoiceAide call, the VA subscriber accesses a VA-ClientApp from a Central Office WebPage and downloads it to the terminal 104. The active VA-ClientApp waits for an incoming call notification request from the VoiceAide Server 110. The subscriber may also choose to terminate the wait and initiate an outbound call instead.
The subscriber 104 receives an incoming call notification in the form of a pop-up menu. The pop-up menu also contains Caller ID information. The subscriber answers the incoming call by pressing the answer button on the VA-ClientApp menu. Next, the subscriber is asked to provide a VoiceAide account number and pin code for authentication. If authentication is successful, the call continues in the same manner as if the VA subscriber 104 had initiated the call.
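A hedged Python sketch of this Receive-mode behavior follows; the client object and its methods are hypothetical placeholders and not the actual VA-ClientApp programming interface.

    def wait_and_answer(client):
        """Receive-mode behavior of the VA-ClientApp."""
        notification = client.wait_for_incoming()      # blocks until a call notification arrives
        client.show_popup(notification["caller_id"])   # pop-up menu with Caller ID information
        if not client.answer_pressed():
            return
        account, pin = client.prompt_credentials()     # account number and pin code
        if client.authenticate(account, pin):
            client.run_session()                       # continues as if the call were initiated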
Customization
In both of the aforementioned modes, the subscriber can customize a TTS resource. To customize a TTS resource, the subscriber presses a customize button on the VA-ClientApp menu. The next menu requires the subscriber to enter a VA account number and pin code. If authentication is successful, the subscriber is shown a menu of TTS voice attributes. The subscriber can modify values in the voice attribute fields and can also test the modifications by pressing a play button on the TTS customization menu. This causes a sample phrase to be converted to speech using the new attribute values and downloaded to the subscriber's terminal 104.
After completing TTS voice customization, a subscriber 104 can save the customization values in the VASDB 114 with their associated authentication information. When a VA call is initiated or received, the saved values will be used in the TTS conversion process.
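The customization sequence may be sketched as follows. The attribute names, the server object, and its methods are assumptions made only for illustration and are not the actual VoiceAide interfaces.

    def customize_tts(server, account, pin, new_attributes,
                      sample="This is a sample phrase."):
        """Modify, test, and save a subscriber's TTS voice attribute values."""
        if not server.authenticate(account, pin):            # customize menu requires account and pin
            return False
        audio = server.tts_preview(sample, new_attributes)   # play button: sample phrase conversion
        server.play_locally(audio)                           # downloaded to the subscriber's terminal
        server.save_attributes(account, new_attributes)      # saved in the VASDB for later calls
        return True

    # Example attribute values a subscriber might test and save:
    # customize_tts(server, "1234567890", "4321", {"gender": "male", "rate": "slow"})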
In the present invention, the VoiceAide VRU 112 can be directly connected to the Public Switched Telephone Network (PSTN) 106 or can be connected to an intermediate switch 120 that is connected to the PSTN 106. Further, the VoiceAide service can be written in any language that the VRU programming interface, or the service platform on which it resides, can support.
Termination
When the VA subscriber 104 or the PSTN party 102 terminates the VA call, resources associated with the call are freed. In one embodiment, to terminate, the VA subscriber 104 selects a hang-up button from the VA-ClientApp menu. Next, the PSTN party 102 is played a prompt. This prompt alerts the PSTN party 102 that the call is terminating and to hang up the telephone.
If the PSTN party 102 terminates first, a text pop-up message will be presented, by the VA-ClientApp, to alert the VA subscriber 104 that the PSTN party 102 has terminated the call. In response, the VA subscriber 104 selects a close button on the pop-up menu. This action frees the CO resources associated with the call. Then, the VA-ClientApp waits for the next incoming call.
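Purely as an illustrative sketch, the termination handling described above can be summarized as follows; the resource objects and notification callbacks are hypothetical placeholders.

    def terminate_call(ended_by_subscriber, resources, play_to_pstn, popup_to_subscriber):
        """Free the CO resources associated with a VA call when either party terminates."""
        if ended_by_subscriber:
            play_to_pstn("The VoiceAide call is ending; please hang up.")   # prompt to the PSTN party
        else:
            popup_to_subscriber("The PSTN party has terminated the call.")  # text pop-up message
        for resource in resources:        # e.g., TTS, SRECO, and VRU channel resources
            resource.release()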
Although a specific embodiment of the invention has been disclosed, those having skill in the art will understand that changes can be made to this specific embodiment without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiment, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.