CROSS REFERENCE TO RELATED APPLICATIONS

This application is related to and claims priority benefit to U.S. Provisional Application 62/183,923, filed Jun. 24, 2015, entitled “Mediation Of A Combined Asynchronous And Synchronous Voice Communication Session”.
BACKGROUND

Communication devices utilizing wireless communication protocols are ubiquitous. These devices may utilize a cellular voice network (e.g., GSM or CDMA), a cellular packet data network (e.g., LTE), or a non-cellular packet data network such as 802.11 WiFi over the Internet to place and receive telephone calls to other communication devices. The other communication device may be another mobile communication device on the same or another cellular network, a Voice-over Internet Protocol (VoIP) communication device, a hybrid VoIP/cellular communication device, and/or a plain old telephone service (POTS) communication device. Moreover, a variety of computer devices may utilize communication interfaces and protocols for exchanging audio, video, and text data over an Internet Protocol (IP) network such as, for instance, the Internet. Each of these telephony or computer communication devices may use a different access network, but all are interfaced at some point to allow for communication among the different networks.
Communication may be further categorized as synchronous or asynchronous, which, for purposes of this disclosure, may refer to real-time versus near real-time and/or non real-time. For instance, synchronous communication can refer to an exchange between communication endpoints in which each endpoint may relay and render data in a real-time fashion. Asynchronous communication can refer to an exchange between communication endpoints in which each endpoint may relay and render data in a near real-time and/or non real-time fashion. Synchronous communication, due to its real-time nature, has a limited time domain in which to cope with various network factors including, for example, jitter compensation, out of sequence packet arrival, and missing, late, or lost packets. Additionally, synchronous communication generally may require constant and consistent attention and interaction from the parties at both communication endpoints in order to be productive, thereby maximizing each party's communication benefit and minimizing wasted time. Asynchronous communication is not bound by the same limited time domain as synchronous communication when coping with network factors, nor does it require the constant and consistent attention and interaction of the parties at both communication endpoints in order to be productive. Through store and forward capabilities and alternate transport/protocol mechanisms that prefer, for example, reliable delivery ordering and receipt acknowledgement of data, asynchronous communication may be capable of operating in circumstances where network conditions are unavailable, partially unavailable, congested, latent, or otherwise non-conducive to synchronous communication.
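The store and forward behavior described above can be illustrated with a short sketch. The class and method names below are hypothetical and model only the general buffering-then-flushing pattern, not any particular implementation from this disclosure.

```python
from collections import deque

class StoreAndForwardQueue:
    """Buffers outbound audio segments while the network is unavailable,
    then forwards them in their original order once connectivity returns,
    as an asynchronous endpoint might."""

    def __init__(self):
        self._pending = deque()
        self.delivered = []
        self.online = False

    def send(self, segment):
        if self.online:
            self.delivered.append(segment)   # near real-time delivery
        else:
            self._pending.append(segment)    # store for later forwarding

    def reconnect(self):
        self.online = True
        while self._pending:                 # flush in original order
            self.delivered.append(self._pending.popleft())

q = StoreAndForwardQueue()
q.send("seg-1")          # network down: held
q.send("seg-2")          # network down: held
q.reconnect()            # both held segments forwarded
q.send("seg-3")          # delivered immediately
print(q.delivered)       # ['seg-1', 'seg-2', 'seg-3']
```

Note that delivery order is preserved across the outage, which is the "reliable delivery ordering" property the paragraph above attributes to asynchronous transports.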
Typically, synchronous and asynchronous devices do not communicate with one another in what one would characterize as a unified communication session. The real-time versus near or non real-time nature of the two modes does not necessarily make for a good communication experience, mainly because of the delay associated with the asynchronous mode of communication. However, recent advances in networks and communication protocols have made it possible for asynchronous communications to be very close to real-time, to the point of being usable with a synchronous device. This is especially true when such a device either is not capable of synchronous communication or does not currently have access to a network that supports, or can currently maintain, a synchronous communication. Moreover, there may be times when a device may have access to synchronous communications but wishes to remain in an asynchronous mode due to cost or environmental considerations. Environmental considerations may include a desire to remain in asynchronous mode because of high levels of background noise or a very distracting environment, making it easier to maintain an asynchronous connection than a synchronous one.
Described herein are methods, systems, and techniques for mediating a communication session between an asynchronous communication device and a synchronous communication device.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary networked environment for implementing certain exemplary embodiments described herein.
FIG. 2 is a block diagram illustrating some of the functions of a communications server according to one or more embodiments described herein.
FIG. 3 is a block diagram illustrating a multi-mode communication device according to one or more embodiments described herein.
FIG. 4A is a first messaging diagram according to one or more embodiments described herein.
FIG. 4B is a second messaging diagram according to one or more embodiments described herein.
FIG. 5A is a third messaging diagram according to one or more embodiments described herein.
FIG. 5B is a fourth messaging diagram according to one or more embodiments described herein.
FIG. 6A is a fifth messaging diagram according to one or more embodiments described herein.
FIG. 6B is a sixth messaging diagram according to one or more embodiments described herein.
DETAILED DESCRIPTION

The embodiments described herein disclose techniques, systems and methods for intelligently structuring, handling, and executing communication sessions among computer and/or communication devices. The systems and methods of the invention may be embodied in and performed by communication devices, communications servers, and other devices, as well as software instructions executed by some or all of such devices, as will be explained in detail below. The different types of networks contemplated herein include, for example, cellular mobile voice networks, cellular mobile data networks utilizing Internet Protocol (IP) protocol(s), the public switched telephone network (PSTN), and packet based data networks, such as the Internet or other packet switched IP-based networks, including wide area networks, local area networks, and combinations thereof.
As used herein, the term “communication session” is meant to generally indicate a hybrid synchronous and asynchronous duplex exchange of audio (e.g., a voice telephony call, audio streaming data, or segmented audio data) between two or more computer and/or communication devices. As used herein, the term “communication device” is intended to mean a device capable of connecting to one or more telephony network(s) (e.g., the PSTN, one or more cellular mobile networks (voice and/or data), one or more VoIP networks, and/or one or more data networks (e.g., the Internet, local area networks (LANs))). A device may be wired or wireless and may operate on one or more telephony networks including, but not limited to, a packet switched IP-based network, a cellular mobile network, a cellular data network or the PSTN. As used herein, the term “communication link” is intended to mean a physical or logical channel that connects a communication device with another communication endpoint. A communication endpoint may be another communication device or a communications server, the communications server operable to mediate a communication session between communication devices. A communication link may be a signaling link, a media link, or both. In this context, a communication session may be established via two communication links. One or more media streams may be transmitted over a communication link. A communications server may be situated between devices, thereby making the communications server an endpoint in a communication link. A communications server may be hosted within an IP network such as, for instance, the Internet or a LAN/WAN accessible to the Internet.
The convergence of and inter-operation among different types of network technologies (e.g., heterogeneous network inter-operability) blurs the line between various distinct networks. This disclosure's discussion of networks includes the portion of a network that connects devices to a service provider's core network. This portion of a network may also be referred to as the interface between the device and the network. Another type of interface may be the interface between networks. That is, the interface necessary to facilitate seamless communication from one network to another.
Therefore, references herein to a device capable of connecting to or communicating via a cellular mobile voice network or cellular mobile data network refer to a device equipped with a cellular transceiver for wireless communication with basestations and other cellular mobile access points. Similarly, references herein to a device capable of connecting to or communicating via a data network refer to a device equipped with a transceiver or other network interface for wireless communication (e.g., 802.11) with a router or other data network access point. One particular device may be characterized herein as a communication device. A communication device may include multiple RF transceivers, one of which may be operable to connect to an access network for a cellular mobile network and another of which may be operable to connect to an access network for an IP data network (e.g., 802.11).
FIG. 1 illustrates an exemplary networked environment 100 for implementing certain exemplary embodiments described herein. The networked environment 100 may include multiple distinct inter-connected networks such as, for instance, a large scale Internet Protocol (IP) network (e.g., the Internet) 101, one or more IP based local area networks (LAN) 107, cellular mobile networks 105, and the PSTN 109. While these distinct networks utilize different protocols and signaling schemes, there are various interfaces (not shown) that allow for the seamless transition of voice and data (including text, audio, and video) such that various devices may communicate with one another over one or more of these inter-connected networks.
The PSTN 109 can be characterized as a circuit switched point-to-point communication network in which a physical connection between the endpoints is maintained for the duration of the connection or communication link. The PSTN 109 may also be referred to as the legacy telephone network, as it is the backbone infrastructure for connecting communication devices comprised of Plain Old Telephone Service (POTS) phones 116.
Cellular mobile networks 105 may come in different varieties based on the radio transmission scheme between a communication device 104, 106 (e.g., mobile or cellular phone) and the cellular mobile network basestation 110 that may be in communication with the communication device 104, 106. In this embodiment, communication device 104 represents a communication device capable of asynchronous communication with a communications server 102, while communication device 106 may be limited to synchronous communications with communications server 102. Two such circuit switched voice radio transmission schemes are the Global System for Mobile Communication (GSM) and Code Division Multiple Access (CDMA). These circuit switched radio transmission schemes are incompatible with one another, necessitating an intervening interface to allow communication between endpoints on either network. In addition, each network may operate over specific frequency ranges. Often, there may even be an intervening network such as the PSTN 109 between two distinct cellular mobile voice networks 105. For each cellular mobile voice network 105, an interface to the PSTN 109 may exist such that calls crossing that interface can be handled by the receiving network, whether it is a cellular mobile network 105 or the PSTN 109.
Various cellular mobile network operators base their voice communications on one of the circuit switched radio transmission schemes and provide service to communication devices 104, 106 using that radio transmission scheme over a defined frequency band. For example, a communication device 104, 106 wirelessly communicates with a basestation 110 that serves as an access network to the cellular mobile network 105. The basestation 110 authenticates and authorizes the communication device 104, 106 to the cellular mobile network 105 and, in conjunction with other equipment within the cellular mobile network 105, manages calls to and from the communication device 104, 106. The cellular mobile network 105 may provide circuit switched connectivity for any communication devices 104, 106 capable of cellular transmission that are physically located within range of the cellular mobile network 105. The range of a cellular mobile network 105 depends in part on the amplification, power, and/or energy associated with the antennas comprising the cellular basestation 110, the communication devices 104, 106, and the like. This is true whether the communication device is utilizing the cellular mobile network's circuit switched voice protocols or data protocols (e.g., 2G, 3G, 4G, LTE, etc.) to communicate.
In fact, synchronous and asynchronous communications between a communication device 104 and communications server 102 may occur over a cellular IP data channel such as, for instance, a 2G IP data channel, a 3G IP data channel, a 4G IP data channel, or LTE. Using these aforementioned data channels as the conduit for IP packet data, the communication device 104 may utilize any number of protocols (e.g., VoIP, MQTT, webRTC) or messaging schemes (e.g., short messaging service (SMS) or multi-media messaging service (MMS)) to exchange content with the communications server 102.
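As a rough illustration of exchanging segmented audio content over such a data channel, the sketch below wraps an audio segment in a JSON envelope of the kind that could be published on an MQTT topic or sent over a webRTC data channel. The field names and base64 encoding are assumptions made for illustration only; the disclosure does not define a wire format.

```python
import base64
import json

def pack_audio_segment(session_id, seq, audio_bytes):
    """Wraps one audio segment in a JSON envelope carrying a session
    identifier and sequence number so the receiver can re-order segments.
    Schema is illustrative, not a defined wire format."""
    return json.dumps({
        "session": session_id,
        "seq": seq,
        "payload": base64.b64encode(audio_bytes).decode("ascii"),
    })

def unpack_audio_segment(message):
    """Reverses pack_audio_segment, recovering the raw audio bytes."""
    envelope = json.loads(message)
    return envelope["seq"], base64.b64decode(envelope["payload"])

msg = pack_audio_segment("sess-42", 0, b"\x01\x02\x03")
seq, audio = unpack_audio_segment(msg)
print(seq, audio)  # 0 b'\x01\x02\x03'
```

A real deployment would hand the resulting string to an MQTT client's publish call or a data-channel send method; here the round trip simply demonstrates that the segment survives transport intact.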
Similarly, an IP based data network 107 like the Internet 101 may provide wireless connectivity to communication devices 104, 106 that are also VoIP enabled and VoIP communication devices 118 within range of an IP access point 112. For instance, an IP access point 112 may provide wireless connectivity using any of the 802.11 WiFi standards and/or any other type of IP based connectivity standard. As will be appreciated by those of skill in the art, a communication device 104, 106 or VoIP communication device 118 may experience a stronger connection signal when located closer to an IP access point 112 than when located further away from the IP access point 112. Thus, the strength of the wireless data connection may fade as the dual mode communication device 104, synchronous mode communication device 106, or VoIP communication device 118 moves away from an IP access point 112. In some cases the VoIP communication device 118 may be wired directly to the IP access point 112 via, for instance, an Ethernet coupling. In another embodiment, a computer device (not shown) may be used to create and exchange messages with communications server 102.
The collection of IP based data networks illustrated in FIG. 1, such as LANs 107 and the Internet 101, all run on a packet based data transfer protocol characterized as packet switching. Packet switching essentially chops up a data stream (e.g., text, voice, data) into segments and transfers them across an IP network to a destination where the packets are re-assembled into the original data stream for output. Voice over IP (VoIP) is a specialized subset of IP packet based communication directed to IP telephony. VoIP communication device 118 and VoIP enabled communication devices 104, 106 utilize an IP access point 112 to the larger IP network such as LAN 107 and then the Internet 101. The IP access point 112 may be wired, wireless (e.g., WiFi), or a combination wired/wireless access point such as those illustrated in FIG. 1. A VoIP communication device 118 and VoIP enabled communication devices 104, 106 may communicate with an IP access point 112 to gain access to the larger IP network 101 and other communication devices. The VoIP communication device 118 has been illustrated as a wireline type device but may just as easily be a wireless device communicable with the IP access point 112 over, for instance, one or more of the 802.11 protocols, like the VoIP enabled communication devices 104, 106. The communications server 102 may act as a centralized point for mediating communication sessions between devices. The communications server 102 may send and receive signaling data to set up and establish communication links between itself and the communication devices 104, 106, 116, 118. Once these communication links are established, the communications server 102 may mediate a synchronous, asynchronous, or hybrid sync/async communication session.
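The chop-and-reassemble behavior of packet switching described above can be sketched in a few lines. This is a toy model only: real IP networks fragment at the network layer with headers, checksums, and MTU negotiation, none of which is modeled here.

```python
import random

def packetize(stream: bytes, mtu: int):
    """Splits a data stream into sequence-numbered segments no larger
    than mtu bytes (a toy model of packet switching)."""
    return [(i // mtu, stream[i:i + mtu]) for i in range(0, len(stream), mtu)]

def reassemble(packets):
    """Sorts segments by sequence number and rebuilds the original stream,
    tolerating out-of-order arrival."""
    return b"".join(data for _, data in sorted(packets))

pkts = packetize(b"hello world", 4)
random.shuffle(pkts)                  # packets may arrive out of order
print(reassemble(pkts))               # b'hello world'
```

The sequence numbers are what let the destination cope with out-of-order arrival, one of the network factors the synchronous case must handle within a limited time domain.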
In certain embodiments, cellular mobile network(s) 105 include cellular networks or portions of cellular networks based on GSM, CDMA, 2G, 3G, 4G, LTE, and/or any other cellular network standards. IP based data networks 107, 101 include, for example, the Internet, one or more intranets, wide area networks (WANs), local area networks (LANs), and the like, portions or all of which may be wireless and/or wired. For instance, an IP based data network 107, 101 may be a wireless network or a portion of a wireless network implemented using an IEEE 802.11 standard, WiMAX (e.g., IEEE 802.16), and/or any other wireless data communication standard.
The various networks 109 (PSTN), 105 (Cellular), 107, 101 (IP based) may interface with communications server 102 through gateway devices, routers and/or other appropriate devices (not shown). Similarly, the communication devices 104, 106 may interface with the various networks 109 (PSTN), 105 (Cellular), and 107, 101 (IP based) through appropriate access points 110, 112 (others not shown).
In addition, the communication devices 104, 106, via the cellular mobile network 105 or a LAN IP network 107, are capable of sending data including short message service (SMS, MMS) text or media messages into the IP network(s) 101, 107. Further, the communication devices 104, 106, via the cellular mobile network 105 or a LAN IP network 107, are capable of sending data over out of band signaling and data mechanisms/protocols such as Message Queuing Telemetry Transport (MQTT) and webRTC data channels.
FIG. 2 is a block diagram illustrating some of the functions of the communications server 102 according to one or more embodiments described herein. The communications server 102 may comprise, for example, a server computer or any other system having computing capability and IP network connectivity. The communications server 102 may be hosted in a packet based IP network such as, for instance, the Internet. The schematic block diagram shows that the communications server 102 may include at least one processor 205 and a data storage component 210 storing a Sync/Async module 215 possessing one or more communication interfaces. The data storage component 210 may further include an Automated Speech Recognition (ASR) engine 220 and a Text-to-Speech (TTS) engine 225.
The Sync/Async module 215 may be responsible for constructing, accepting, maintaining, and mediating the various communication links of a communication session. Each participating device in a communication session may be communicable with the communications server 102 via the Sync/Async module 215 over a separate communication link or links. The Sync/Async module 215 may also mediate between the appropriate communication links to create a communication session between two communication devices in which one communication device is in a synchronous communication mode while the other communication device is in an asynchronous communication mode.
The Sync/Async module 215 may be responsible for processing communication session signaling and media, including setting up and tearing down communication links with various devices and other call servers using one or more communication channels or protocols. In one embodiment, the Sync/Async module 215 may send and receive session initiation protocol (SIP) messages. While the Sync/Async module 215 may utilize one or more VoIP protocols such as SIP, it can communicate synchronously with end user devices that are not VoIP based by routing VoIP signaling, such as SIP, through other call servers that perform interface conversions from SIP to other protocols such as, for instance, SS7 for the PSTN or CDMA/TDMA/GSM for cellular mobile networks.
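For readers unfamiliar with SIP signaling, the sketch below renders a minimal INVITE request in the general shape defined by RFC 3261. The hostnames, branch value, and tag are placeholders invented for illustration; a real SIP stack generates these values and adds further mandatory headers (e.g., Max-Forwards, Contact).

```python
def build_sip_invite(caller, callee, call_id, via_host="server.example.com"):
    """Renders a minimal SIP INVITE request (RFC 3261 style) as CRLF-separated
    header lines with an empty body. All identifiers here are placeholders."""
    return "\r\n".join([
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK776asdhds",
        f"From: <sip:{caller}>;tag=1928301774",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
        "",   # blank line terminating the header section
        "",
    ])

req = build_sip_invite("104@server.example.com",
                       "19195551234@gw.example.com", "abc123")
print(req.splitlines()[0])  # INVITE sip:19195551234@gw.example.com SIP/2.0
```

The From header is where the module could carry identifying information for the initiating device, and the request line names the destination, mirroring the signaling roles described above.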
The data storage component 210 of communications server 102 may also include an ASR engine 220 and a TTS engine 225. The ASR engine 220 and the TTS engine 225 provide an additional capability to mix voice and text communications within a single communication session. For instance, the asynchronous user may be in a meeting that prevents them from speaking but not from creating text messages. In this scenario, the communications server 102, via the ASR engine 220 and the TTS engine 225, can convert text to speech and send the audio to the synchronous user while converting speech to text to send to the asynchronous user.
Alternatively, a plurality of communications servers 102 may be employed and may be arranged, for example, in one or more server banks or computer banks or other arrangements. For example, a plurality of communications servers 102 together may comprise a cloud computing resource, a grid computing resource, and/or any other aggregated or distributed computing arrangement. Such communications servers 102 may be located in a single installation or may be distributed among different geographical locations. For purposes of convenience, the communications server 102 is illustrated in FIGS. 1 and 2 and referred to herein in the singular. Even though the communications server 102 is referred to in the singular, it is understood that a plurality of communications servers 102 may be employed in various arrangements as described above. In addition, the communications server 102 functions may themselves be separated into different servers. For example, there may be one or more communications servers 102 hosting and executing Sync/Async modules 215. Such an architecture may be representative of a larger media server handling the media streams while other communications servers 102 handle communication session signaling and coordinate the use of a media server for media.
The communication interface(s) may include a voice-over-IP (VoIP) interface adapted to exchange IP based telephony signaling and/or media data with other IP network devices using a VoIP protocol. Another communication interface may be a PSTN interface adapted to convert incoming PSTN signaling and audio data to VoIP signaling and audio data and convert outgoing VoIP signaling and audio data to PSTN signaling and audio data. Still another communication interface may be an IP data interface adapted to exchange IP data with other IP network devices. The IP data may be indicative of audio, video, text or other streaming data. This may also include IP data exchanged with a mobile communication device over an intermediate cellular mobile network. Yet another communication network interface may be directed toward an alternative network (not shown) adapted to exchange data with a computing and/or communications device. Examples of alternative network(s) may include, but are not limited to, WiMax and whitespace. A whitespace network may be characterized as one that utilizes frequency spectrum that is overlapping with that of broadcast television frequency spectrum.
FIG. 3 is a block diagram illustrating a communication device 104 capable of both synchronous and asynchronous communication with the communications server 102 according to one or more embodiments described herein. The communication device 104 may include a processor or processors 305 for controlling the various components and functions of the communication device 104. The communication device 104 may also include multiple RF transceivers such as, for instance, a WiFi transceiver 310 and a cellular transceiver 315.
The WiFi transceiver 310 may be operable to communicate with an IP network access point 112 using one or more of the 802.11 wireless transmission protocols. Upon connection with an IP network access point 112, the communication device 104 may exchange IP data with servers or other computers that are connected with or communicable with the Internet 101 via LAN/WAN 107. This may include the communications server 102 shown in FIG. 1. Such IP data exchanges may also occur using, for instance, an MQTT channel to carry IP data or an MMS transport mechanism to carry audio data.
The cellular transceiver 315 may be operable to communicate with a cellular mobile network 105 for both circuit switched voice and IP data communication. On the circuit switched voice side, the cellular mobile network 105 may be based on GSM, CDMA, TDMA or other communication protocols, while on the cellular IP data side, the cellular mobile network 105 may be based on, for example, GPRS, EDGE, EV-DO, HSPA-D, HSPA-U, LTE, UMTS-WCDMA, UMTS-TDD, etc. It should be noted that the cellular IP data may include media (e.g., voice) data, thereby making the cellular IP data side a viable conduit for synchronous voice communications based on VoIP or asynchronous voice communications using, for instance, an MQTT channel, webRTC data channel, or even a multi-media messaging system (MMS) message.
The communication device 104 may further include data storage 325 and software applications such as, for instance, a sync/async communications application 330. The communication device 104 may also include various user interface(s) 302. The data storage 325 may include, for example, one or more types of memory devices including, but not limited to, flash memory usable for ROM, RAM, PROM, EEPROM, and cache. Other software applications (not shown) may include, for example, one or more software applications executable on or by the processor(s) 305 including, but not limited to, email applications, native phone dialers, contact applications, calendar applications, and specific data and/or audio/video applications. The user interface(s) 302 may include, for example, a display, a touchscreen for soft-key input, speaker(s), microphone(s), a keyboard for hard-key input, and one or more buttons. The data storage 325 may store contact data for people including, but not limited to, multiple telephone numbers, email addresses, SMS/MMS enabled telephone numbers, postal addresses, and the like. The contact data may be used by a contact application in conjunction with other applications on the communication device 104 to facilitate communication sessions with the people in the contact database.
Similar to communications server 102, the data storage 325 of communication device 104 may also include an ASR engine 335 and a TTS engine 340. The ASR engine 335 and the TTS engine 340 provide an additional capability to mix voice and text communications within a single communication session.
In one embodiment, the sync/async communications application 330 may facilitate setting up, via the user interface(s) 302, a hybrid sync/async communication session with another end user device mediated by the communications server 102. In this embodiment, the end user may issue a speech command that gets parsed and packed into an MQTT channel (or webRTC data channel) and sent over an IP data channel (e.g., 802.11 WiFi or one of the cellular IP data protocols) to the communications server 102. The communications server 102 may then process the speech command and convert it to telephony signaling instructions. For instance, the speech command may include the audio “Call Jared”. The communications server 102 may have knowledge of the end user's contacts such that it can determine who Jared is and a telephone number associated with Jared. Similarly, the speech command may include audio such as “Call 919-555-1234”. In this example, the communications server 102 will interpret the speech command as a request to establish a connection with the device associated with the telephone number 919-555-1234. The communications server 102 may then set about establishing a synchronous communication link, via SIP for instance, with the telephone number associated with Jared in the first example or the telephone number 919-555-1234 in the second example. Once this communication link with the destination end user device is established, that device may communicate synchronously with the communications server 102. The communications server 102 also maintains an asynchronous connection with the end user calling device over, for instance, an MQTT channel (or webRTC data channel) riding on an 802.11 WiFi connection or one of the cellular IP data connections. At this point, the communications server 102 may receive voice data synchronously from one end user device and parse and pack it into a series of MQTT messages for delivery to the other end user device.
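The server-side interpretation of a command like “Call Jared” or “Call 919-555-1234” can be sketched as follows. The contact store, function name, and parsing rules are hypothetical stand-ins; the sketch assumes the ASR step has already produced a text transcript.

```python
import re

# Hypothetical server-side contact store for one end user.
CONTACTS = {"jared": "+19195551234"}

def interpret_call_command(transcript):
    """Maps an ASR transcript such as 'Call Jared' or 'Call 919-555-1234'
    to a dial target: either a literal telephone number or a number
    resolved from the user's contacts. Returns None if unrecognized."""
    m = re.match(r"call\s+(.+)", transcript.strip(), re.IGNORECASE)
    if not m:
        return None
    target = m.group(1).strip()
    digits = re.sub(r"[\s().-]", "", target)      # strip number punctuation
    if re.fullmatch(r"\+?\d{7,15}", digits):
        return digits                             # literal telephone number
    return CONTACTS.get(target.lower())           # contact-name lookup

print(interpret_call_command("Call Jared"))          # +19195551234
print(interpret_call_command("Call 919-555-1234"))   # 9195551234
```

The resolved number is what the server would then hand to its signaling layer (e.g., a SIP INVITE) to establish the synchronous leg of the session.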
In the reverse direction, the communications server 102 may receive voice data asynchronously via a series of MQTT messages from the device operating in asynchronous mode. This voice data in the MQTT messages is then re-packaged and relayed by the communications server 102 over the synchronous communication link to the other end user device. Thus, the communications server 102 sits between the two end user devices and converts, as necessary, sync to async and vice versa to enable a hybrid communication session between two end user devices not necessarily operating in the same mode (sync or async).
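The re-packaging step in this reverse direction implies holding back segments that arrive out of order until the gap fills, so the synchronous leg receives a continuous in-order stream. A minimal sketch of that buffering logic, with hypothetical names, might look like this:

```python
class AsyncToSyncRelay:
    """Collects voice segments arriving out of order over an asynchronous
    channel and releases them in sequence to the synchronous leg
    (a simplified re-ordering buffer; names are illustrative)."""

    def __init__(self):
        self._buffer = {}
        self._next_seq = 0
        self.relayed = []           # stand-in for the synchronous media link

    def on_message(self, seq, audio):
        self._buffer[seq] = audio
        # Relay every contiguous run starting at the next expected sequence.
        while self._next_seq in self._buffer:
            self.relayed.append(self._buffer.pop(self._next_seq))
            self._next_seq += 1

relay = AsyncToSyncRelay()
relay.on_message(1, "b")    # arrives early; held until seq 0 shows up
relay.on_message(0, "a")    # fills the gap; both 0 and 1 are relayed
relay.on_message(2, "c")
print(relay.relayed)        # ['a', 'b', 'c']
```

A production mediator would add timers to skip segments that never arrive, since the synchronous side cannot wait indefinitely; that time-domain constraint is exactly the asymmetry the Background section describes.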
In another embodiment, the communications server 102 and/or the communication device 104 may employ an ASR engine 220, 335 and TTS engine 225, 340 to convert speech to text and text to speech. Utilization of such engines may allow the communications server 102 and/or communication device 104 to translate voice requests, audibly output dynamic data, and overall mix/blend a communication session in which one end user device is communicating synchronously via voice while the other end user device is communicating asynchronously via text. In such a scenario, the communications server 102 and/or communication device 104 may, via ASR engines 220, 335, recognize and convert voice media into text. Similarly, communications server 102 and/or communication device 104 may, via TTS engines 225, 340, convert text to speech in order to convey typed messages from an asynchronous device as a simulated voice for a synchronous device. The embodiments are not necessarily limited to the examples described herein.
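The mixed voice/text routing just described reduces to a small decision: convert only when the two endpoints' media types differ. The sketch below models that decision with stub ASR/TTS callables; the function name, mode labels, and stubs are all hypothetical, standing in for whatever engines 220/225 or 335/340 would actually provide.

```python
def mediate(message, source_mode, dest_mode, asr=None, tts=None):
    """Routes one message between endpoints, applying speech-to-text (asr)
    or text-to-speech (tts) only when the source and destination media
    types differ. Modes are the illustrative labels 'voice' and 'text'."""
    if source_mode == "voice" and dest_mode == "text":
        return asr(message)      # synchronous talker -> asynchronous texter
    if source_mode == "text" and dest_mode == "voice":
        return tts(message)      # asynchronous texter -> synchronous talker
    return message               # same media type: relay unchanged

# Stub engines for demonstration; real ASR/TTS engines operate on audio.
fake_asr = lambda audio: f"[transcript of {audio}]"
fake_tts = lambda text: f"[synthesized audio of '{text}']"

print(mediate("hello", "text", "voice", tts=fake_tts))
# [synthesized audio of 'hello']
```

This mirrors the meeting scenario above: the server transparently applies the right conversion per direction, so neither party needs to know the other's mode.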
FIG. 3 has been described as a communication device 104. Other types of telephony and/or computing devices may be used in addition to the just described communication device 104. For example, a networked computer having Internet connectivity may also include many of the same applications (e.g., text messaging, browser, email) that perform many of the same functions as communication device 104. If the networked computer further includes a cellular transceiver, it can perform all of the functions that are associated with the communication device 104. Similarly, a VoIP terminal with a processor and programmable storage that is connected to an IP network may also include many of the same applications (e.g., text messaging, browser, email) that perform many of the same functions as communication device 104. Another example may be a tablet computer device with WiFi and/or cellular IP data capabilities that is connectable to an IP network. Thus, a communication device 104 is only one type of device that can initiate a communication session using the techniques described herein.
Included herein is a set of flow charts and message diagrams representative of exemplary methodologies for performing novel aspects of the disclosed architecture. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, for example, in the form of a flow chart or flow diagram, are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance therewith, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all acts illustrated in a methodology may be required for a novel implementation.
FIG. 4A is one embodiment of a messaging diagram for establishing a communication session between an asynchronous capable initiating communication device 104 and a synchronous communication device 106, 116, 118. In this embodiment, an initiating communication device 104 creates and sends an invite message 410 to a communications server 102. The invite message 410 may include contact information for the destination communication device such as a telephone number or an audio contact request, as well as various communications paths for communication links and the sync/async communication modalities supported and/or requested. The invite message 410 may include identifying and/or authenticating information for the initiating communication device 104, or such information may have been previously established over, for example, an existing signaling link. The invite message 410 may be sent using a lightweight protocol such as, for instance, MQTT or a webRTC data channel over an 802.11 WiFi access point if available, or over a cellular based IP data channel. The invite message 410 may also be packed into an SMS/MMS message in which the communications server 102 is associated with an SMS/MMS enabled telephone number. The invite message 410 may also be implicit from an SMS text message or MMS audio message addressed to the destination communication device 106, 116, 118 and relayed through the communications server 102, whereby the destination communication device 116 does not have the capabilities to receive those types of messages, as it is a POTS phone. Likewise, in another embodiment, preferences may have been previously established to relay and transform SMS text message(s) or MMS media message(s) in such manner. A ‘dialer’ or ‘messaging’ application executing on the initiating communication device 104 may determine which mechanism to use to send the invite message 410.
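A plausible shape for the invite message 410 is a small structured payload carried over MQTT or a webRTC data channel. The field names below are illustrative assumptions; the disclosure does not define a concrete wire format.

```python
import json

def build_invite(caller_id: str, destination: str, modalities):
    """Build a hypothetical invite payload (cf. invite message 410).

    caller_id   - identifying info for the initiating device 104
    destination - telephone number or other contact for device 106, 116, 118
    modalities  - sync/async modalities supported and/or requested
    """
    invite = {
        "type": "invite",
        "from": caller_id,
        "to": destination,
        "modalities": list(modalities),
    }
    return json.dumps(invite)  # body for an MQTT/webRTC data message, or SMS text
```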
The communications server 102 unpacks and parses the invite message 410 to determine the identity of both the initiating communication device 104 and the destination communication device 106, 116, 118. In an embodiment in which the identity information is a telephone number, or resolves to a telephone number, the communications server 102 sets out to establish a synchronous communication link with the destination communication device 106, 116, 118. The communications server 102 may issue a SIP Invite 412 addressed to the destination communication device 106, 116, 118. While not explicitly pictured, the SIP Invite 412 may include information necessary to identify the initiating communication device 104. The destination communication device 106, 116, 118 may respond with a sequence of messages intended to accept the SIP Invite 412. This sequence of messages is shorthanded here as a SIP Connect 414. At this point, the communications server 102 may respond appropriately and a synchronous audio media channel 416 may be established between the communications server 102 and the destination communication device 106, 116, 118.
As the communications server 102 is establishing a synchronous communication link with the destination communication device 106, 116, 118, the initiating communication device 104 may begin creating and sending audio data to the communications server 102 to be relayed to the destination communication device 106, 116, 118. An end user may utilize an interface on the initiating communication device 104 to record audio that may then be segmented and packed as IP data into a series of MQTT or webRTC messages 418, for instance. Once the audio data has been packed into the transport mechanism, MQTT, webRTC, or otherwise, the audio data may be sent message by message 420 to the communications server 102. The communications server 102 unpacks, reformats, and sends 422 the audio data received from the initiating communication device 104 to the destination communication device 106, 116, 118 over the established audio media channel 416.
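The segment-and-pack step (messages 418, sent 420) can be sketched as splitting the recorded audio into ordered, message-sized chunks. The chunk size and envelope fields here are assumptions for illustration; a real implementation would size chunks to the transport's payload limits.

```python
def pack_audio(audio: bytes, chunk_size: int = 4096):
    """Split recorded audio into ordered messages for MQTT/webRTC transport.

    Each message carries a sequence number so the receiver can reassemble
    the stream even if messages arrive out of order.
    """
    return [
        {"seq": i, "data": audio[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(audio), chunk_size))
    ]
```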
The end user of the destination communication device 106, 116, 118 may consume the audio data and respond by talking back. The speech or audio data is again carried over the audio media channel 416 from the destination communication device 106, 116, 118 to the communications server 102. This time the communications server 102 segments and packs the received audio into one or more MQTT or webRTC messages 424 or another transport mechanism. The communications server 102 may then send 426 the MQTT or webRTC messages to the initiating communication device 104. The end user of the initiating communication device 104 may, automatically or via a user interface, play the received audio data 428. Upon listening, the end user of the initiating communication device 104 may opt to continue the conversation by recording additional audio data 418. The ASR engine 335 may recognize a key word such as the destination user's name and automatically record anything that follows, terminating the recording upon end-of-speech detection. The user interface may also utilize DTMF tones to mark the beginning and end of speech segments. This process 430 may continue until the conversation is complete. The conversation may continue in this manner whereby the initiating communication device 104 is communicating asynchronously with the communications server 102 while the destination communication device 106, 116, 118 is communicating synchronously with the communications server 102. The communications server 102 mediates between the two devices to keep the communication session going.
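On the receiving side (play step 428), the chunked messages must be reassembled in sequence order before playback, since asynchronous transport does not guarantee in-order arrival. This is a minimal sketch under the same assumed message envelope as above (`seq`/`data` keys), which is not a format defined by the disclosure.

```python
def unpack_audio(messages) -> bytes:
    """Reassemble chunked audio messages (e.g., 424/426) in sequence order."""
    ordered = sorted(messages, key=lambda m: m["seq"])
    return b"".join(m["data"] for m in ordered)
```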
In some embodiments, the invite message 410 may contain both the destination information as well as initial audio data for relay as would otherwise be provided in 418. For example, an invite message 410 may contain an audio data request of “Message Jared. Want to meet for dinner tonight?”, wherein an implicit audio relay 422 of “Want to meet for dinner tonight?” happens immediately after 416 without 418 or 420.
It should be noted that if the destination communication device 106, 116, 118 routes the original invite request to a voice mail system, the originally recorded audio data 418 may be recorded into the destination communication device's 106, 116, 118 voice mail system. In another embodiment, if the communications server 102 determines that the request has been routed into a voice mail system and the destination communication device supports SMS text messages and/or MMS audio messages, the message could be alternately or additionally delivered via SMS text message after ASR processing or sent via an MMS audio message. This preference may be inferred as desirable by the communications server 102 if, for example, the destination communication device 106, 116, 118 responds to the call with a quick response text message. It should further be noted that the content of the text message response may be helpful and used by the communications server 102 in the determination of the message relay mode preference.
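The fallback preference above amounts to a small decision function on the server: detect the voice-mail routing, then pick a delivery mode based on the destination's capabilities. This sketch is purely illustrative; the mode labels are invented for the example.

```python
def choose_delivery(routed_to_voicemail: bool,
                    supports_sms: bool,
                    supports_mms: bool) -> str:
    """Pick how the communications server might relay the message."""
    if not routed_to_voicemail:
        return "audio_channel"    # normal synchronous relay over channel 416
    if supports_sms:
        return "sms_after_asr"    # transcribe the audio, deliver as SMS text
    if supports_mms:
        return "mms_audio"        # deliver the recording as an MMS audio message
    return "voicemail"            # leave the audio in the voice mail system
```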
FIG. 4B is another embodiment of a messaging diagram for establishing a communication session between an asynchronous capable communication device 104 and a synchronous communication device 106, 116, 118. In this embodiment, the synchronous initiating communication device 106, 116, 118 and the asynchronous destination communication device 104 are reversed from FIG. 4A. The initiating communication device 106, 116, 118 may send a SIP invite to the communications server 102 intending to establish a communication session with the destination communication device 104. The intent of the initiating communication device 106, 116, 118 may be to establish a synchronous communications session, but the communications server 102 may determine that the destination communication device 104 is currently in an asynchronous mode. In such a case, the communications server 102 may establish a synchronous communication link with the synchronous initiating communication device 106, 116, 118 as described above. Similarly, the communications server 102 may segment and pack the received audio into one or more MQTT or webRTC messages 424 or another transport mechanism. The communications server 102 may then send 426 the MQTT or webRTC messages to the destination communication device 104. The end user of the destination communication device 104 may, via a user interface, play the received audio data 428. The end user of the destination communication device 104 may opt to continue the conversation by recording and packing audio data into one or more MQTT messages, webRTC messages, or the like 418. The segmented audio data may then be sent via MQTT, webRTC, or the like from the destination communication device 104 to the communications server 102. Just as before, the conversation may continue in this manner whereby the initiating communication device 106, 116, 118 is communicating synchronously with the communications server 102 while the destination communication device 104 is communicating asynchronously with the communications server 102. The communications server 102 mediates between the two devices to keep the communication session going.
FIGS. 5A and 5B are additional embodiments of a messaging diagram for establishing a communication session between an asynchronous capable communication device 104 and a synchronous communication device 106, 116, 118. Not all synchronous communication devices 106, 116, 118 are, or need be, SIP devices. In this embodiment, there may be another telecom server 103 between the communications server 102 and the synchronous communication device 106, 116, 118. In such a situation, the other telecom server 103 may be responsible for bridging communications between the synchronous communication device 106, 116, 118 and the communications server 102. For example, if a SIP_Invite message 512 encounters an intermediate interface server (e.g., other telecom server 103) before reaching the synchronous communication device 106, 116, 118 for which the SIP_Invite message is intended, the other telecom server 103 will translate the SIP_Invite message 512 to a Connect_Request message 513 and forward it to the intended synchronous communication device 106, 116, 118. The Connect_Request label is generic and is not intended to represent any specific protocol or telecommunication signaling system. Rather, it is representative of whatever signaling or messaging is implemented by the other telecom server 103 in communicating with the synchronous communication device 106, 116, 118. Thus, the Connect_Request message 513 may be an SS7 protocol message, a CDMA protocol message, a TDMA protocol message, etc., depending on the telecommunication protocol implemented by the other telecom server 103.
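The bridging role of the other telecom server 103 can be sketched as a field-level translation from the SIP signaling to whatever the downstream network speaks. The Connect_Request label is deliberately generic in the text, so the mapping below is an assumption for illustration only.

```python
def translate_sip_invite(sip_invite: dict, target_protocol: str) -> dict:
    """Map a SIP_Invite 512 to a generic Connect_Request 513.

    target_protocol names the downstream signaling system implemented by
    the other telecom server 103 (e.g., "SS7", "CDMA", "TDMA").
    """
    return {
        "type": "Connect_Request",
        "protocol": target_protocol,
        "caller": sip_invite["from"],  # identifies the initiating side
        "callee": sip_invite["to"],    # the intended synchronous device
    }
```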
The synchronous communication device 106, 116, 118 may receive a Connect_Request message 513 and respond accordingly. The synchronous communication device 106, 116, 118 may return a Connect message 515 to the other telecom server 103 which, in turn, translates the generic Connect message 515 to a SIP Connect message 514 and forwards the SIP Connect message 514 to the communications server 102. The communications server 102 may then establish an audio media channel 516, 517 between the communications server 102 and the synchronous communication device 106, 116, 118. The audio media channel 516, 517 may traverse one or more telephony networks and one or more other telecom servers 103 depending on the telephony network to which a synchronous communication device 106, 116, 118 subscribes.
The remainder of FIG. 5A operates the same as that illustrated and described in FIG. 4A. Similarly, the remainder of FIG. 5B operates the same as that illustrated and described in FIG. 4B. In each case the difference lies in how the synchronous communication device 106, 116, 118 ultimately connects with the communications server 102. In FIGS. 4A-4B, the synchronous communication device 106, 116, 118 may utilize the SIP protocol whereas in FIGS. 5A-5B, the synchronous communication device 106, 116, 118 may not utilize the SIP protocol and may require the other telecom server 103 to translate to/from the SIP protocol.
FIG. 6A illustrates another embodiment of a messaging diagram for establishing a communication session between an asynchronous capable initiating communication device 104 and a synchronous communication device 106, 116, 118. In this embodiment, the communication between the communications server 102 and the asynchronous device 104 may be text based rather than voice or audio based.
In this embodiment, an initiating communication device 104 creates and sends an invite message 610 to a communications server 102. The invite message 610 may include contact information for the destination communication device such as a telephone number or an audio contact request, as well as various communications paths for communication links and the sync/async communication modalities supported and/or requested. The invite message 610 may include identifying and/or authenticating information for the initiating communication device 104, or such information may have been previously established over, for example, an existing signaling link. The invite message 610 may be sent using a lightweight protocol such as, for instance, MQTT or webRTC over an 802.11 WiFi access point if available, or over a cellular based IP data channel. The invite message 610 may also be packed into an SMS text message in which the communications server 102 is associated with an SMS enabled telephone number. The invite message 610 may also be implicit from an SMS text message or MMS audio message which is addressed to the destination communication device 106, 116, 118 and is relayed through the communications server 102, whereby the destination communication device 116 does not have the capabilities to receive those types of messages, as it is a POTS phone. Likewise, in another embodiment, preferences may have been previously established to relay and transform SMS text message(s) or MMS audio message(s) in such manner. A ‘dialer’ or ‘messaging’ application executing on the initiating communication device 104 may determine which mechanism to use to send the invite message 610.
The communications server 102 unpacks and parses the invite message 610 to determine the identity of both the initiating communication device 104 and the destination communication device 106, 116, 118. In an embodiment in which the identity information is a telephone number or resolves to a telephone number, the communications server 102 sets out to establish a synchronous communication link with the destination communication device 106. The communications server 102 may issue a SIP Invite 612 addressed to the destination communication device 106. While not explicitly pictured, the SIP Invite 612 may include information necessary to identify the initiating communication device 104. The destination communication device 106 may respond with a sequence of messages intended to accept the SIP Invite 612. This sequence of messages is shorthanded here as a SIP Connect 614. At this point, the communications server 102 may respond appropriately and a synchronous audio media channel 616 may be established between the communications server 102 and the destination communication device 106. It should be noted that the destination communication device 106 may be associated with a non-SIP system, in which case the teachings and description of FIGS. 5A-5B regarding another telecom server 103 come into play to allow for a media channel to be set up and maintained between the communications server 102 and the destination communication device 106.
As the communications server 102 is establishing a synchronous communication link with the destination communication device 106, 116, 118, the initiating communication device 104 may begin creating and sending text data to the communications server 102. An end user may utilize an interface on the initiating communication device 104 to enter text that may then be segmented and packed as IP data into a series of MQTT or webRTC messages 618, for instance. Once the text data has been packed into the transport mechanism, MQTT, webRTC, or otherwise, the text data may be sent 620 message by message to the communications server 102. The communications server 102 unpacks the text data or text messages and converts the underlying text to speech via a TTS engine 225. The speech may then be sent 622 to the destination communication device 106 over the established audio media channel 616.
The end user of the destination communication device 106 may consume the audio data and respond by talking back. The speech or audio data is again carried over the audio media channel 616 from the destination communication device 106 to the communications server 102. This time the communications server 102 converts the speech to text via an ASR engine 220 before segmenting and packing the converted text into one or more MQTT or webRTC messages 624 or another transport mechanism. The communications server 102 may then send 626 the MQTT or webRTC messages to the initiating communication device 104. The end user of the initiating communication device 104 may, via a user interface, read the received text 628. Upon reading, the end user of the initiating communication device 104 may opt to continue the conversation by creating additional text data 618. This process 630 may continue until the conversation is complete. The conversation may continue in this manner whereby the initiating communication device 104 is communicating asynchronously by text with the communications server 102 while the destination communication device 106 is communicating synchronously by voice with the communications server 102. The communications server 102 mediates between the two devices, converting from speech to text and text to speech when appropriate, to keep the communication session going.
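The speech-to-text relay path (messages 624, sent 626) can be sketched as a two-stage pipeline: transcribe, then segment the transcript into message-sized pieces for the asynchronous device. The `transcribe()` stub is an assumption standing in for the ASR engine 220; the segment size is likewise illustrative.

```python
def transcribe(audio: bytes) -> str:
    """Stub for the ASR engine 220; a real engine would transcribe the audio."""
    return audio.decode("utf-8")

def relay_speech_as_text(audio: bytes, max_chars: int = 140):
    """Convert received speech to text, then split into transport-sized segments."""
    text = transcribe(audio)
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```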
FIG. 6B is another embodiment of a messaging diagram for establishing a communication session between an asynchronous capable communication device 104 and a synchronous communication device 106, 116, 118. In this embodiment, the synchronous initiating communication device 106, 116, 118 and the asynchronous destination communication device 104 are reversed from FIG. 6A. The initiating communication device 106, 116, 118 may send a SIP invite 612 to the communications server 102 intending to establish a communication session with the destination communication device 104. The intent of the initiating communication device 106 may be to establish a synchronous communications session, but the communications server 102 may determine that the destination communication device 104 is currently in an asynchronous mode. In such a case, the communications server 102 may establish a synchronous communication link with the synchronous initiating communication device 106 as described above. Similarly, the communications server 102 may convert received audio to text and pack it into one or more MQTT or webRTC messages 624, SMS, or another transport mechanism. The communications server 102 may then send 626 the MQTT, webRTC, or SMS messages to the destination communication device 104. The end user of the destination communication device 104 may, via a user interface, read the received text data 628. Upon reading, the end user of the destination communication device 104 may opt to continue the conversation by creating a text response that may then be segmented and packed as IP data into a series of MQTT or webRTC messages 618 or even SMS text messages, for instance. Once the text data has been packed into the transport mechanism, MQTT, webRTC, SMS, or otherwise, the text data may be sent 620 message by message to the communications server 102. The communications server 102 unpacks the text messages and converts the text to speech via a TTS engine 225.
The speech may then be sent 622 to the synchronous communication device 106, 116, 118 over the established audio media channel 616. This process 630 may continue until the conversation is complete. Just as before, the conversation may continue in this manner whereby the initiating communication device 106, 116, 118 is communicating synchronously using audio with the communications server 102 while the destination communication device 104 is communicating asynchronously using text with the communications server 102. The communications server 102 mediates between the two devices, converting from speech to text and text to speech when appropriate, to keep the communication session going.
It should be noted that if the communication session ends, it may easily be continued at a later time, being re-established by the methods described above and illustrated in FIGS. 4 through 6. Additionally, if only one endpoint's communication link(s) to the communications server 102 terminate, those communication link(s) may potentially be re-established by either the endpoint or the communications server 102 to continue the communication session. It is also noted that the asynchronous communication modalities of voice, used in FIGS. 4 and 5, and text, used in FIGS. 6A-6B, may be intermingled in a communication session whereby the asynchronous device user can adjust between text and voice modalities as needed or desired for communication efficiency.
The asynchronous communication device 104 need not be a telephonic communication device but could be a more generic computer device. For example, any computing device that has IP data connectivity via 802.11 WiFi or cellular IP data can be an asynchronous communication device 104. This may include, for example, a tablet or computer with WiFi connectivity, a tablet or computer with LTE (or other cellular IP data) connectivity, a WiFi-only personal digital assistant (PDA) or handheld media type device, etc. The asynchronous communication device 104 could also be any device capable of either SMS and/or MMS messaging whereby the messages are relayed through the communications server 102 as illustrated in FIGS. 4-5 for audio use cases and FIGS. 6A-6B for text use cases.
Some embodiments may be described using the expression “one embodiment” or “an embodiment” along with their derivatives. These terms mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment. Further, some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
Although the flowcharts and message diagrams of FIGS. 4-6 each show a specific order of execution, it is understood that the order of execution may differ from that which is depicted. Also, steps shown in succession in the flowcharts may be executed concurrently or with partial concurrence. Further, in some embodiments, one or more of the steps shown in the flowcharts may be skipped or omitted. In addition, any number of counters, state variables, warning semaphores, or messages might be added to the logical flows described herein, for purposes of enhanced utility, accounting, performance measurement, or providing troubleshooting aids, etc. It is understood that all such variations are within the scope of the present disclosure.
Where any component discussed herein is implemented in the form of software, any one of a number of programming languages may be employed such as, for example, C, C++, C#, Objective C, Java, Javascript, Perl, PHP, Visual Basic, Python, Ruby, Delphi, Flash, or other programming languages. Software components are stored in a memory and are executable by a processor. In this respect, the term “executable” means a program file that is in a form that can ultimately be run by a processor. Examples of executable programs may be, for example, a compiled program that can be translated into machine code in a format that can be loaded into a random access portion of a memory and run by a processor, source code that may be expressed in proper format such as object code that is capable of being loaded into a random access portion of a memory and executed by a processor, or source code that may be interpreted by another executable program to generate instructions in a random access portion of a memory to be executed by a processor, etc. An executable program may be stored in any portion or component of a memory including, for example, random access memory (RAM), read-only memory (ROM), hard drive, solid-state drive, USB flash drive, memory card, optical disc such as compact disc (CD) or digital versatile disc (DVD), floppy disk, magnetic tape, or other memory components.
A memory is defined herein as including both volatile and nonvolatile memory and data storage components. Volatile components are those that do not retain data values upon loss of power. Nonvolatile components are those that retain data upon a loss of power. Thus, a memory may comprise, for example, random access memory (RAM), read-only memory (ROM), hard disk drives, solid-state drives, USB flash drives, memory cards accessed via a memory card reader, floppy disks accessed via an associated floppy disk drive, optical discs accessed via an optical disc drive, magnetic tapes accessed via an appropriate tape drive, and/or other memory components, or a combination of any two or more of these memory components. In addition, the RAM may comprise, for example, static random access memory (SRAM), dynamic random access memory (DRAM), or magnetic random access memory (MRAM) and other such devices. The ROM may comprise, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other like memory device.
The devices described herein may include multiple processors and multiple memories that operate in parallel processing circuits, respectively. In such a case, a local interface, such as a communication bus, may facilitate communication between any two of the multiple processors, between any processor and any of the memories, or between any two of the memories, etc. A local interface may comprise additional systems designed to coordinate this communication, including, for example, performing load balancing. A processor may be of electrical or of some other available construction.
Although the various modules and other various systems and components described herein may be embodied in software or code executed by general purpose hardware, as an alternative the same may also be embodied in dedicated hardware or a combination of software/general purpose hardware and dedicated hardware. If embodied in dedicated hardware, each can be implemented as a circuit or state machine that employs any one of or a combination of a number of technologies. These technologies may include, but are not limited to, discrete logic circuits having logic gates for implementing various logic functions upon an application of one or more data signals, application specific integrated circuits having appropriate logic gates, or other components, etc. Such technologies are generally well known by those skilled in the art and, consequently, are not described in detail herein.
Also, any logic, functionality or application described herein that comprises software or code can be embodied in any non-transitory computer-readable medium for use by or in connection with an instruction execution system such as, for example, a processor in a computer system or other system. In this sense, the logic may comprise, for example, statements including instructions and declarations that can be fetched from the computer-readable medium and executed by the instruction execution system. In the context of the present disclosure, a “computer-readable medium” can be any medium that can contain, store, or maintain the logic or application described herein for use by or in connection with the instruction execution system. The computer-readable medium can comprise any one of many physical media such as, for example, magnetic, optical, or semiconductor media. More specific examples of a suitable computer-readable medium would include, but are not limited to, magnetic tapes, magnetic floppy diskettes, magnetic hard drives, memory cards, solid-state drives, USB flash drives, or optical discs. Also, the computer-readable medium may be a random access memory (RAM) including, for example, static random access memory (SRAM) and dynamic random access memory (DRAM), or magnetic random access memory (MRAM). In addition, the computer-readable medium may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or other type of memory device.
What has been described above includes examples of the disclosed architecture. It is, of course, not possible to describe every conceivable combination of components and/or methodologies, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the novel architecture is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.