US8370142B2 - Real-time transcription of conference calls - Google Patents

Real-time transcription of conference calls

Info

Publication number
US8370142B2
Authority
US
United States
Prior art keywords
audio
text
speech recognition
captured
instances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires 2030-12-27
Application number
US12/914,617
Other versions
US20110112833A1 (en)
Inventor
David P. Frankel
Noel Tarnoff
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZipDX LLC
Original Assignee
ZipDX LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZipDX LLC
Priority to US12/914,617
Assigned to ZIPDX, LLC. Assignment of assignors interest (see document for details). Assignors: FRANKEL, DAVID P.; TARNOFF, NOEL
Publication of US20110112833A1
Application granted
Publication of US8370142B2
Legal status: Active
Adjusted expiration


Abstract

Described herein are embodiments of systems, methods and computer program products for real-time transcription of conference calls that employ voice activity detection, audio snippet capture, and multiple transcription instances to deliver practical real-time or near real-time conference call transcription.

Description

CROSS-REFERENCE TO RELATED APPLICATION
This patent application claims the benefit of U.S. Provisional Patent Application No. 61/256,558, filed Oct. 30, 2009, and entitled “Real-Time Transcription of Conference Calls”, which is hereby incorporated by reference in its entirety.
BACKGROUND
Business professionals are routinely using audio conferencing systems, rather than in-person meetings, to collaborate. Conference calls are now a mainstay of business life, and continue to grow in popularity. The functionality of conference calling is used not only on a “stand-alone” basis, but also as part of video calls and “web conferences.” Oftentimes, conference calls are recorded and then transcribed, so that those who could not attend can review the conversation, or so that those who did attend have a written record of what was said. The transcription, usually performed by a human transcriptionist, is typically available hours or days after the conference call takes place.
There are a number of applications for real-time teleconference transcription, which converts the conference call conversation to text while the teleconference is occurring and makes it accessible via a display and computer network (such as a web browser over the Internet).
Using real-time teleconference transcription enables those with hearing impairments to participate. Latecomers could review what they had missed. An individual could readily monitor multiple conference calls by watching, rather than listening. Participants that needed to step away or were interrupted could easily catch up when they returned. Participants could refer back to earlier dialogue if they couldn't recollect what had been said. Internet “chat” (entered via keyboard) could easily be mixed with spoken conversation.
Unfortunately, conference call transcription has been hampered by high cost, since historically it has been very labor-intensive. Automated speech-to-text (also called automatic speech recognition, or ASR) technology has been improving, and it shows increasing promise. However, there are challenges to using ASR for real-time conference call transcription. The technology generally does not perform well in the presence of double-talk (more than one party speaking at once) or with background noise. ASR generally lacks the ability to identify who is talking (it cannot recognize voices). Many ASR algorithms cannot run in real time (it can take the algorithm more than one minute to convert a minute of speech). And it can be costly to run ASR (both in terms of the computer resources required and the potential royalties that must be paid).
Therefore, what is needed is a solution that addresses the challenges of conference call transcription, some of which are described above.
SUMMARY
Described herein are embodiments of systems, methods and computer program products for real-time transcription of conference calls that employ voice activity detection, audio snippet capture, and multiple transcription instances to deliver practical real-time or near real-time conference call transcription. In one aspect, participants in a conference call are each separately monitored. When any of them are speaking, their voice (isolated from the voices of other participants) is captured, one phrase or sentence at a time (called a “snippet”), and is sent to an instance of the transcription algorithm for conversion to text. A snippet can be determined by a voice activity detector (VAD), which can use any of several techniques as described herein or as known to one of ordinary skill in the art to determine when the participant is speaking and to find breaks in the speech. The resulting text output is labeled with the speaker's identity and concatenated with text derived from speech of the other participants.
In one aspect, multiple instances of the transcription (ASR) engine allow the system to transcribe speech from multiple talkers at once. Even when only one person is talking, the system can dispatch their speech in snippets to separate ASR instances. So, even if the embodiment of an ASR algorithm being used is not capable of transcribing a stream of speech in real-time, an embodiment of the system of the present invention can produce near-real-time results by parsing the speech into snippets that are sent to a plurality of ASRs.
In one aspect, an ASR instance is not dedicated to each channel; therefore, ASR resources are not wasted on participants that are not speaking. Embodiments of the transcription system are exceptionally scalable, even to conferences including hundreds or thousands of participants, because at any given instant only one or a few will be talking.
Additional advantages will be set forth in part in the description which follows or may be learned by practice. The advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended inventive concepts. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive, as inventive concepts.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments and together with the description, serve to explain the principles of the methods and systems:
FIG. 1 illustrates an exemplary environment for practicing embodiments of the invention;
FIG. 2 is a block diagram illustrating voice activity detector (VAD) and audio snippet capture functions according to an embodiment of the invention;
FIG. 3 is a block diagram illustrating functions of a snippet dispatcher, which manages the assignment of audio snippets received from the VAD and audio snippet capture functions to one or more transcription instances (ASRs), according to an embodiment of the invention;
FIG. 4 is a block diagram illustrating an exemplary operating environment for performing the disclosed methods;
FIG. 5 is an exemplary flowchart illustrating a process for practicing an aspect according to an embodiment of the present invention; and
FIG. 6 is a sample screen shot depicting the operation of the present invention, in one embodiment.
DETAILED DESCRIPTION
Before the present methods and systems are disclosed and described, it is to be understood that the methods and systems are not limited to specific synthetic methods, specific components, or to particular compositions. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
As used in the specification and the appended inventive concepts, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
“Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
Throughout the description and inventive concepts of this specification, the word “comprise” and variations of the word, such as “comprising” and “comprises,” means “including but not limited to,” and is not intended to exclude, for example, other additives, components, integers or steps. “Exemplary” means “an example of” and is not intended to convey an indication of a preferred or ideal embodiment. “Such as” is not used in a restrictive sense, but for explanatory purposes.
Disclosed are components that can be used to perform the disclosed methods and systems. These and other components are disclosed herein, and it is understood that when combinations, subsets, interactions, groups, etc. of these components are disclosed, while specific reference to each individual and collective combination and permutation of these may not be explicitly disclosed, each is specifically contemplated and described herein, for all methods and systems. This applies to all aspects of this application including, but not limited to, steps in disclosed methods. Thus, if there are a variety of additional steps that can be performed, it is understood that each of these additional steps can be performed with any specific embodiment or combination of embodiments of the disclosed methods.
The present methods and systems may be understood more readily by reference to the following detailed description of preferred embodiments and the Examples included therein and to the Figures and their previous and following description.
As will be appreciated by one skilled in the art, the methods and systems may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the methods and systems may take the form of a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) embodied in the storage medium. More particularly, the present methods and systems may take the form of web-implemented computer software. Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices.
Embodiments of the methods and systems are described below with reference to block diagrams and flowchart illustrations of methods, systems, apparatuses and computer program products. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create a means for implementing the functions specified in the flowchart block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including computer-readable instructions for implementing the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
Accordingly, blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
Overview:
FIG. 1 illustrates an exemplary environment for practicing embodiments of the invention. As shown in FIG. 1, endpoint devices 200, 202 interface with a conferencing bridge 204 via wideband-capable networks 206 and narrowband networks 208. As shown in FIG. 1, endpoint devices 200, 202 come in numerous forms and versions, including wired and wireless as well as Internet-based devices such as a “softphone” and/or Internet messaging clients such as computers configured to use VoIP via tools like GoogleTalk™ and Skype™.
In a conventional conference call, two or more audio connections are linked to a conference bridge, such as the multifidelity bridge 204 shown in FIG. 1. Incoming audio from each of the audio connections is mixed (using algorithms of varying complexity) and sent back to the participants via the outgoing side of each connection.
The audio connections can be, for example, conventional telephone connections (established by participants dialing a phone number to connect to the bridge, or by the bridge dialing the phone numbers of the participants, or some combination). The audio connections can also be, for example, Voice-over-IP or some similar network connection. Some systems, such as the embodiment shown in FIG. 1, can support a mix of telephony and VoIP participants in any given conference call. Typically, by the time the audio connection reaches the conference bridge 204, it has been converted to a digital format consisting of encoded audio samples and carried either in packet form or via time-division multiplexing.
Embodiments described herein provide a practical way to perform real-time or near real-time transcription of a conference call and, as shown in FIG. 2, comprise voice activity detector (VAD) and audio snippet capture functions 251, which monitor each audio connection 252 in a conference and capture individual phrases or paragraphs as they are spoken. Improvements in ASR and VAD technology can be incorporated as they become available, making embodiments of the described system even more capable and cost-effective. Also, embodiments described herein comprise a snippet dispatcher 301, as shown in FIG. 3, which manages the assignment of audio snippets received from the VAD and audio snippet capture functions 251 to one or more ASRs 302.
In one aspect, an embodiment of the invention provides a means to perform near real-time transcription, even when the ASR algorithm cannot operate in real-time. For example, if the embodiment of an ASR algorithm being used is not capable of transcribing a stream of speech in real-time, an embodiment of the system of the present invention can produce near-real-time results by parsing the speech into snippets that are sent to a plurality of ASRs. In one aspect, an embodiment of the invention provides a means to identify each audio connection (and potentially each individual speaker) in a conference, without dedicating an ASR instance to each connection. When the embodiment of an ASR algorithm being used operates faster than real-time, a single (or a few) ASR instances can be shared among many speakers and several conferences. If an ASR algorithm is licensed “by the minute” (or there is otherwise an expense associated with having the ASR algorithm available to transcribe an audio stream), an embodiment of the invention, through its sharing of ASR instances, is much more cost-effective than one that dedicates an ASR instance to each connection.
As shown in FIG. 2, in one embodiment a VAD/audio snippet capture mechanism can be inserted in the path from each audio connection to the conference bridge. The VAD determines when a given participant is, and is not, speaking. In one aspect, the VAD can be very simple, detecting, for example, just that the audio signal exceeds a certain energy threshold. Or, in other aspects, the VAD can be more complex, distinguishing actual speech from other noises such as blowing wind, or coughing or breathing or typing. In various configurations, the VAD can also include, or work cooperatively with, a noise filter that removes the impairments before the audio signal is passed on. The sophistication of the VAD can have an impact on the quality of the transcription; however, embodiments of the invention are operable with a wide range of VAD algorithms, from simple to extremely complex.
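By way of illustration only, the following Python sketch shows the simple energy-threshold style of VAD described above. The frame format (16-bit linear PCM) and the threshold value are assumptions made for the example, not requirements of the invention:

```python
import array
import math

class EnergyVAD:
    """Minimal energy-threshold VAD: a frame of 16-bit linear PCM is
    treated as speech when its RMS energy exceeds a fixed threshold.
    The threshold value is an illustrative assumption."""

    def __init__(self, threshold_rms: float = 500.0):
        self.threshold_rms = threshold_rms

    def is_speech(self, frame: bytes) -> bool:
        samples = array.array("h", frame)  # signed 16-bit samples
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return rms > self.threshold_rms
```

A more sophisticated VAD (noise filtering, speech/non-speech classification) could expose the same is_speech interface and plug into the capture mechanism unchanged.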
Embodiments of the audio snippet capture mechanism work in conjunction with the VAD to capture the digital audio samples during those intervals when the corresponding participant is determined to be speaking. In one aspect, the audio snippet capture mechanism monitors the VAD to decide when it has collected a suitable snippet. Preferably, though not required, speech is captured by the audio snippet capture mechanism up to a natural break, such as the end of a sentence or paragraph. The snippet length can vary. Generally, snippet length varies according to implementation details, such as the optimum message size for transmission over the connection mechanism to the ASR, and the maximum delay desired between when the words are spoken and when they appear in the transcription. Thus, in one aspect the audio snippet capture mechanism monitors the length of time that the VAD indicates the participant has not been speaking, as well as the running length of the snippet, to determine the appropriate stop point.
In one embodiment, the nominal “not speaking” interval can be set to, for example, 400 milliseconds, looking for a natural “sentence” break, though other intervals are contemplated within the scope of embodiments of the invention. However, if no such break is found after, for example, 10 seconds, the “not speaking” interval threshold can be dynamically lowered to, for example, 200 milliseconds. If no such break is found after, for example, 20 seconds have elapsed, the “not speaking” interval threshold can be dynamically lowered to, for example, 50 milliseconds. In one aspect, a snippet can be considered “complete” at a maximum of, for example, 30 seconds (with capture of a subsequent snippet commencing immediately) if no “not speaking” interval has been detected at all. The values above are exemplary and are not intended to be limiting, as the “not speaking” interval and the maximum snippet length can be adjusted for user preference and particular applications.
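This stop-point logic amounts to a small state machine. The following sketch, offered purely as an illustration under assumed values (a hypothetical 20 ms frame size and the exemplary intervals from the preceding paragraph), shows one way it might be arranged:

```python
FRAME_MS = 20  # assumed frame duration in milliseconds

class SnippetCapture:
    """Accumulates frames for one audio connection and decides when a
    snippet is complete: a 400 ms pause nominally ends a snippet,
    relaxed to 200 ms after 10 s of speech and 50 ms after 20 s,
    with a forced cut at a 30 s maximum."""

    def __init__(self):
        self.frames = []
        self.silence_ms = 0

    def _pause_threshold_ms(self) -> int:
        elapsed_ms = len(self.frames) * FRAME_MS
        if elapsed_ms >= 20_000:
            return 50
        if elapsed_ms >= 10_000:
            return 200
        return 400

    def push(self, frame: bytes, is_speech: bool):
        """Feed one frame plus the VAD verdict; return a completed
        snippet as bytes, or None while capture is still in progress."""
        if not self.frames and not is_speech:
            return None  # wait for speech before starting a snippet
        self.frames.append(frame)
        self.silence_ms = 0 if is_speech else self.silence_ms + FRAME_MS
        elapsed_ms = len(self.frames) * FRAME_MS
        if self.silence_ms >= self._pause_threshold_ms() or elapsed_ms >= 30_000:
            snippet = b"".join(self.frames)
            self.frames, self.silence_ms = [], 0
            return snippet
        return None
```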
As shown inFIG. 3, as snippets are captured by the audio snippet capture mechanism, they are sent to the snippet dispatcher. In one aspect, the snippet dispatcher function can be a separate process through which all snippets are funneled, or it can be a distributed function that is executed as each capture process completes. As shown inFIG. 3, the snippet dispatcher is responsible for passing the captured snippet to the ASR instance, and it can operate in any of several modes depending on the overall system configuration and constraints of the ASR instances.
In some embodiments, the snippet dispatcher queues the snippets as they arrive from the one or more audio snippet capture mechanisms. The snippet dispatcher monitors the “busyness” of the ASR instances and the snippets are dispatched to ASR instances as the instances become available. The snippet dispatcher instructs the ASR instance to notify it upon completion of snippet processing so that the next snippet can be taken off the queue and dispatched.
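A minimal sketch of this queued-dispatch mode follows; the asr.transcribe method and the on_result callback are hypothetical hooks standing in for the ASR engine and the aggregator described later:

```python
import queue
import threading

def run_dispatcher(snippet_queue: queue.Queue, asr_instances, on_result):
    """Queued dispatch: each ASR instance pulls the next waiting snippet
    as soon as it finishes the previous one, so snippets queue up only
    when every instance is busy."""

    def worker(asr):
        while True:
            snippet = snippet_queue.get()  # blocks until a snippet arrives
            if snippet is None:            # sentinel value: shut down
                break
            on_result(snippet, asr.transcribe(snippet.audio))

    for asr in asr_instances:
        threading.Thread(target=worker, args=(asr,), daemon=True).start()
```

In the unconstrained mode described next, the dispatcher would instead create a fresh ASR instance (or worker) per arriving snippet rather than drawing from a fixed pool.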
In another embodiment, the number of ASR instances is not constrained. In this case, upon arrival of a new snippet at the snippet dispatcher, the snippet dispatcher instantiates a new ASR instance to process that snippet. Once processing of the snippet is complete, the ASR instance may disappear.
The audio snippet capture mechanism tags each snippet with the identity of the audio connection to which it corresponds, as well as a label indicating of which conference it is a part. It is also given a sequence number, so that its place in time can be determined relative to other snippets being generated from this and other audio connections. As the ASR instances finish their conversion of the snippets to text, they dispatch the results, including the parameters received with the snippet, to an aggregator that labels each with the appropriate audio connection identifier and combines them according to conference ID and sequence number. The results are then sent to (or available for retrieval by) parties subscribed to each particular conference transcript. Embodiments of the invention can be used in a wide variety of environments and numerous enhancements are possible and considered within the scope of the embodiments.
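A compact sketch of this tagging and re-assembly is shown below; the class and field names are hypothetical, chosen only to mirror the parameters named in the text:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(order=True)
class Snippet:
    sequence: int                              # place in time
    conference_id: str = field(compare=False)  # which conference
    connection_id: str = field(compare=False)  # which audio connection
    audio: bytes = field(compare=False)

class Aggregator:
    """Labels each text result with its audio connection identifier and
    keeps each conference transcript ordered by sequence number."""

    def __init__(self):
        self._items = defaultdict(list)  # conference_id -> [(seq, line)]

    def add(self, snippet: Snippet, text: str):
        entries = self._items[snippet.conference_id]
        entries.append((snippet.sequence, f"[{snippet.connection_id}] {text}"))
        entries.sort()  # bisect.insort would avoid the full re-sort

    def transcript(self, conference_id: str):
        return [line for _, line in self._items[conference_id]]
```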
The VAD and audio snippet capture, and the snippet dispatcher mechanisms can be separate from the conference bridge, or can be integrated with it. Typically, these elements are implemented as software algorithms running on general-purpose computers, but they can be implemented as software running on digital signal processing hardware (DSP), or embodied in purpose-built hardware. The various elements of embodiments of the invention, and the conferencing system in its entirety, can be implemented in a single computer platform. Or the elements can be partitioned into separate subsystems, communicating over a local or wide-area network.
The identifier for each audio connection can be, for example, a numeric label; it could also be a “Caller-ID” captured as part of an incoming call or other signaling information sent when the call is established. Or, it could be communicated explicitly by a control function associated with the conferencing system. Or, it could be extracted from DTMF signals input by the participant when joining the call. Analogous techniques can also be used for the conference ID.
In some instances, the name of a teleconference participant may be known to the conferencing system, and this can be associated with the text output for that participant by linking the name to the audio connection ID. See, for example, U.S. Pat. No. 7,343,008, issued on Mar. 11, 2008, and incorporated herein by reference for an example of a teleconferencing system that can assign a name to a participant. In some situations, there may be multiple individuals associated with a given audio connection—for example, when several people are in a single conference room sharing a speakerphone. Their words can be tagged with a suitable label (“Boston Conference Room”), or a more explicit technique can be used to identify them. For example, each individual could be assigned a DTMF digit that they would press prior to speaking, which would be captured by a DTMF detector associated with the VAD and appended to the audio connection ID. Or, the ASR could be taught to recognize a particular phrase (“Now speaking: Ralph”) that participants would use to introduce themselves, and the transcribed name could then be made part of the audio connection ID passed to the aggregator function.
Some ASR algorithms can “learn,” resulting in enhanced performance as they process more speech from a specific individual. If the ASR instances have access to a shared database, they can use the audio connection ID to store “learned” information about that speaker in the database, which can then be retrieved by another (or the same) ASR instance when it resumes processing speech for that same audio connection. To the extent that two or more ASR instances are simultaneously processing audio for the same audio connection, it may not be possible to capture the “learnings” for all of them. Some ASR algorithms also “learn” from the context of the speech. Here, it may enhance performance if the shared database is used to exchange this type of learning across all participants in a given conference call, since they will likely be re-using many of the same words and phrases.
ASR algorithms are available for different languages. In multi-lingual environments, the language for a given conference call or an individual audio connection can be specified, and an appropriate ASR instance can be invoked, or language-specific ASR settings applied, when the associated snippets are dispatched.
It can be seen that embodiments of the invention can operate when there is only one party in the conference call. Embodiments of the invention can also function when there are an unlimited number of parties to the call, and can handle any number of simultaneous conference calls, provided that appropriate resources are available. “Cloud” computing can be employed, for example, to instantiate additional instances of the ASR function when required. Embodiments of the invention can operate on a “stand-alone” audio conference call, or on the audio portion of a video conference call, or the audio conference conducted as part of a web conference.
In one embodiment, the “aggregator” mentioned above receives transcription text (called text items) from the ASR instances and can operate in a variety of different ways. One approach is to store the text items in a database, along with the associated sequence numbers and speaker identification and any other available information. As depicted in FIG. 6, a software application (called a transcription viewer 601) may run on a computer with a display device (such as a personal computer, or a smart phone); it retrieves and displays the text items and associated data from the database. The application could be implemented to run in an internet browser, or in some similar environment, or it could be a specialized application for a specific platform. The transcription viewer can also be configured such that it can be embedded into some other application (such as a blog or a web page providing other functions).
Depending on the particular situation, access to the transcription viewer 601 for a particular conference call might be restricted to only the organizer of that call, or only to participants in that call (602). It might also be made more generally available, to anybody at all or only to those with a password. Those skilled in the art are familiar with various kinds of access control mechanisms that could be applicable.
The transcription viewer 601 can display the results of the transcription (603) in near-real-time (that is, as the conference takes place, with a short delay). It can repeatedly poll the database for new updates and display these in the proper order by referencing the sequence numbers. The transcription viewer 601 can also operate after the conference has ended.
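For illustration, a polling loop of the kind just described might look like the following; db.fetch_after is a hypothetical query returning (sequence, speaker, text) rows newer than a given sequence number:

```python
import time

def follow_transcript(db, conference_id: str, poll_seconds: float = 2.0):
    """Near-real-time viewer loop: repeatedly ask the store for text
    items beyond the last sequence number displayed."""
    last_seq = -1
    while True:
        for seq, speaker, text in db.fetch_after(conference_id, last_seq):
            print(f"{speaker}: {text}")
            last_seq = max(last_seq, seq)
        time.sleep(poll_seconds)
```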
The transcription viewer 601 can display all the text items 604, or it can allow the user to select a subset for display based on some criteria (such as speaker identification, sequence numbers, or timestamps 611 if that information is available). It can allow the text to be searched, highlighting, for example, all text items containing a particular word or phrase.
The transcription viewer 601 can further allow the user to hear the original audio associated with a given text item. This can be implemented by, for example, associating in the database an audio file containing the source snippet with each text item. When the user selects (via mouse click, for example) a particular text item (607), the transcription viewer 601 plays the associated snippet through the computer's speaker or headphone.
The transcription viewer 601 can offer the ability to play back all the text items sequentially (608), providing an on-screen highlight of the text item currently being played (610). It can offer the ability to play back only certain text items (such as those belonging to a particular speaker) (609). Since audio is captured from each channel separately, the transcription viewer 601 can be configured so that if two or more participants were speaking concurrently, their snippets are played sequentially. Or, if timestamps are available for each snippet, the transcription viewer 601 can replay the snippets as they actually took place, allowing speakers to overlap by merging the audio from multiple snippets.
The transcription viewer 601 can allow a user to modify the text and then store it back into the database for subsequent viewing by others. This can be useful for correcting errors in the original transcription.
The use of Automated Speech Recognition technology has been described to perform the speech-to-text function. However, this same invention can be used in conjunction with human transcriptionists. Rather than queuing the snippets for processing by one or more ASR instances, the snippets can be queued and dispatched to one or more transcriptionists, each of whom listens to a snippet played through a computer and enters the corresponding text via keyboard.
Analogous to the description above, this approach allows many transcriptionists to work simultaneously on the same conference call. In contrast to a more traditional approach where a single person transcribes an entire conference, in this mode a finished transcript can be available shortly after the call ends. And rather than having the transcriptionist try to identify speakers by recognizing voices, the speakers can be identified according to the channel from which any given snippet was captured.
The system has been described above as comprised of units. One skilled in the art will appreciate that this is a functional description and that the respective functions can be performed by software, hardware, or a combination of software and hardware. A unit can be software, hardware, or a combination of software and hardware. The units can comprise the VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106 as illustrated in FIG. 4 and described below. In one exemplary aspect, the units can comprise a computer 101 as illustrated in FIG. 4 and described below.
FIG. 4 is a block diagram illustrating an exemplary operating environment for performing the disclosed methods. This exemplary operating environment is only an example of an operating environment and is not intended to suggest any limitation as to the scope of use or functionality of operating environment architecture. Neither should the operating environment be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment.
The present methods and systems can be operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that can be suitable for use with the systems and methods comprise, but are not limited to, personal computers, server computers, laptop devices, and multiprocessor systems. Additional examples comprise teleconference bridges, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that comprise any of the above systems or devices, and the like.
The processing of the disclosed methods and systems can be performed by software components. The disclosed systems and methods can be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers or other devices. Generally, program modules comprise computer code, routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The disclosed methods can also be practiced in grid-based and distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote computer storage media including memory storage devices.
Further, one skilled in the art will appreciate that the systems and methods disclosed herein can be implemented via a general-purpose computing device in the form of a computer 101. The components of the computer 101 can comprise, but are not limited to, one or more processors or processing units 103, a system memory 112, and a system bus 113 that couples various system components including the processor 103 to the system memory 112. In the case of multiple processing units 103, the system can utilize parallel computing.
The system bus 113 represents one or more of several possible types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can comprise an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, an Accelerated Graphics Port (AGP) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express bus, a Personal Computer Memory Card International Association (PCMCIA) bus, a Universal Serial Bus (USB), and the like. The bus 113, and all buses specified in this description, can also be implemented over a wired or wireless network connection, and each of the subsystems, including the processor 103, a mass storage device 104, an operating system 105, VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106, teleconference data 107 (which can include “learned” data available to the ASR algorithms), a network adapter 108, system memory 112, an Input/Output Interface 110, a display adapter 109, a display device 111, and a human machine interface 102, can be contained within one or more remote computing devices 114a,b,c at physically separate locations, connected through buses of this form, in effect implementing a fully distributed system.
The computer 101 typically comprises a variety of computer readable media. Exemplary readable media can be any available media that is accessible by the computer 101 and comprises, for example and not meant to be limiting, both volatile and non-volatile media, removable and non-removable media. The system memory 112 comprises computer readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM). The system memory 112 typically contains data such as teleconference data 107 and/or program modules such as operating system 105 and VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106 that are immediately accessible to and/or are presently operated on by the processing unit 103.
In another aspect, the computer 101 can also comprise other removable/non-removable, volatile/non-volatile computer storage media. By way of example, FIG. 4 illustrates a mass storage device 104 which can provide non-volatile storage of computer code, computer readable instructions, data structures, program modules, and other data for the computer 101. For example and not meant to be limiting, a mass storage device 104 can be a hard disk, a removable magnetic disk, a removable optical disk, magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like.
Optionally, any number of program modules can be stored on the mass storage device 104, including by way of example, an operating system 105 and VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106. Each of the operating system 105 and VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106 (or some combination thereof) can comprise elements of the programming and the VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106. Teleconference data 107 can also be stored on the mass storage device 104. Teleconference data 107 can be stored in any of one or more databases known in the art. Examples of such databases comprise DB2®, Microsoft® Access, Microsoft® SQL Server, Oracle®, mySQL, PostgreSQL, and the like. The databases can be centralized or distributed across multiple systems.
In another aspect, the user can enter commands and information into the computer 101 via an input device (not shown). Examples of such input devices comprise, but are not limited to, a keyboard, a pointing device (e.g., a “mouse”), a microphone, a joystick, a scanner, tactile input devices such as gloves and other body coverings, and the like. These and other input devices can be connected to the processing unit 103 via a human machine interface 102 that is coupled to the system bus 113, but can be connected by other interface and bus structures, such as a parallel port, a game port, an IEEE 1394 port (also known as a FireWire port), a serial port, or a universal serial bus (USB).
In yet another aspect, a display device 111 can also be connected to the system bus 113 via an interface, such as a display adapter 109. It is contemplated that the computer 101 can have more than one display adapter 109 and the computer 101 can have more than one display device 111. For example, a display device can be a monitor, an LCD (Liquid Crystal Display), or a projector. In addition to the display device 111, other output peripheral devices can comprise components such as speakers (not shown) and a printer (not shown), which can be connected to the computer 101 via the Input/Output Interface 110. Any step and/or result of the methods can be output in any form to an output device. Such output can be any form of visual representation, including, but not limited to, textual, graphical, animation, audio, tactile, and the like. Furthermore, in one embodiment, the computer 101 can be operably connected with a public switched telephone network (PSTN) 117, as shown in FIGS. 1 and 4, providing connection to endpoint devices 200, 202.
The computer 101 can operate in a networked environment using logical connections to one or more remote computing/communication devices 114a,b,c and endpoint devices 200, 202. By way of example, a remote computing/communication device can be a personal computer, a portable computer, a server, a router, a network computer, a peer device or other common network node, another teleconference bridge, or endpoint devices 200, 202 as shown in FIG. 1, and so on. Logical connections between the computer 101 and a remote computing/communication device 114a,b,c can be made via a local area network (LAN) and a general wide area network (WAN), or specialized networks such as a PSTN 117. Such network connections can be through a network adapter 108. A network adapter 108 can be implemented in both wired and wireless environments. Such networking environments are conventional and commonplace in offices, enterprise-wide computer networks, intranets, and the Internet 115.
For purposes of illustration, application programs and other executable program components such as the operating system 105 are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device 101, and are executed by the data processor(s) of the computer. An implementation of the VAD, Audio Snippet Capture, Snippet Dispatcher, and ASR mechanisms software 106 can be stored on or transmitted across some form of computer readable media. Any of the disclosed methods can be performed by computer readable instructions embodied on computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example and not meant to be limiting, computer readable media can comprise “computer storage media” and “communications media.” “Computer storage media” comprise volatile and non-volatile, removable and non-removable media implemented in any methods or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Exemplary computer storage media comprise, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
The methods and systems can employ Artificial Intelligence techniques such as machine learning and iterative learning. Examples of such techniques include, but are not limited to, expert systems, case based reasoning, Bayesian networks, behavior based AI, neural networks, fuzzy systems, evolutionary computation (e.g. genetic algorithms), swarm intelligence (e.g. ant algorithms), and hybrid intelligent systems (e.g. Expert inference rules generated through a neural network or production rules from statistical learning).
Exemplary Method of Use:
The following example is put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods described herein are made and evaluated, and is intended to be purely exemplary and not to limit the scope of the methods and systems. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for.
Referring to the exemplary flowchart of FIG. 5, a process is illustrated for practicing an aspect according to an embodiment of the present invention. At step 502, a conference call is established among one or more participants. Note that the call may involve only one person, as that person may want to use embodiments of the invention for transcription purposes. At step 504, a voice snippet is captured from one of the one or more participants. As described above, this is accomplished via the Voice Activity Detector/Audio Snippet Capture mechanism, as shown in FIG. 2. At step 506, the captured voice snippet is assigned an identifier and a sequence position. As noted herein, the audio snippet capture mechanism assigns the identifier and sequence position to the voice snippet. At step 507, the voice snippet is provided to an ASR instance. A snippet dispatcher mechanism can queue the snippet, if necessary, and dispatch it to an instance of an ASR mechanism. At step 508, the voice snippet is converted into a text string. As noted above, this may be performed by one or more ASR instances. At step 510, the text string is associated with its corresponding snippet ID and sequence position. At step 512, the text string is provided to at least one of the one or more subscribers to the transcription.
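To tie the flowchart to the earlier sketches, the following assumption-laden fragment wires steps 504 through 507 together for a single audio connection, reusing the hypothetical EnergyVAD, SnippetCapture, and Snippet classes defined above (a per-connection sequence counter is used here for brevity, whereas the description implies conference-wide ordering):

```python
import queue

def transcribe_connection(pcm_frames, conference_id, connection_id,
                          snippet_queue: queue.Queue):
    """Steps 504-507 for one connection: detect speech, capture snippets,
    tag them with identifiers and a sequence position, and hand them to
    the dispatcher queue feeding the ASR instances."""
    vad, capture, seq = EnergyVAD(), SnippetCapture(), 0
    for frame in pcm_frames:
        audio = capture.push(frame, vad.is_speech(frame))
        if audio is not None:
            snippet_queue.put(Snippet(seq, conference_id, connection_id, audio))
            seq += 1
```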
While the methods and systems have been described in connection with preferred embodiments and specific examples, it is not intended that the scope be limited to the particular embodiments set forth, as the embodiments herein are intended in all respects to be illustrative rather than restrictive.
Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method inventive concept does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the inventive concepts or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; and the number or type of embodiments described in the specification.
Throughout this application, various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which the methods and systems pertain.
It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the scope or spirit. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following inventive concepts.

Claims (21)

1. A system for transcribing a conference call among a plurality of participants using a plurality of audio connections; the system comprising:
(a) a plurality of capture mechanisms, each one of the plurality of capture mechanisms capturing a portion of audio associated with one of the plurality of audio connections,
wherein each one of the plurality of capture mechanisms comprises
a voice activity detector for detecting a voice snippet included in the portion of audio, the length of the voice snippet being determined by detecting a break in the portion of audio, and
means for capturing the voice snippet;
(b) a plurality of speech recognition instances for converting audio to text, each one of the plurality of speech recognition instances having substantially the same capability;
(c) a dispatcher for forwarding a first captured portion of audio from a selected one of the plurality of capture mechanisms to a first one of the plurality of speech recognition instances, and for forwarding a second captured portion of audio from the selected one of the plurality of capture mechanisms to a second one of the plurality of speech recognition instances when the first one of the plurality of speech recognition instances is processing the first captured portion of audio, wherein the second captured portion of audio is subsequent to the first captured portion of audio; and
(d) a combiner for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances from captured portions of audio from the plurality of capture mechanisms.
11. A method for transcribing a conference call among a plurality of participants using a plurality of audio connections; the method comprising:
(a) capturing a plurality of portions of audio, each of the plurality of portions of audio being associated with at least one of the plurality of audio connections;
(b) forwarding a first portion of audio of the captured plurality of portions of audio to a first one of a plurality of speech recognition instances, and for forwarding a second portion of audio of the captured plurality of portions of audio to a second one of the plurality of speech recognition instances, the first portion of audio and the second portion of audio being associated with a selected one of the plurality of audio connections, whereby each of the plurality of speech recognition instances converts the audio to text, wherein the second portion of audio is subsequent to the first portion of audio, and wherein each one of the plurality of speech recognition instances has substantially the same capability;
(c) re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances.
20. A non-transitory computer-readable medium having a computer program recorded thereon, the computer program comprising computer code instructions for implementing a method for transcribing a conference call among a plurality of participants using a plurality of audio connections; the non-transitory computer-readable medium comprising:
(a) a first computer code instruction portion for capturing a plurality of portions of audio, each of the plurality of portions of audio being associated with at least one of the plurality of audio connections;
(b) a second computer code instruction portion for forwarding a first portion of audio of the captured plurality of portions of audio to a first one of a plurality of speech recognition instances, and for forwarding a second portion of audio of the captured plurality of portions of audio to a second one of the plurality of speech recognition instances, the first portion of audio and the second portion of audio being associated with a selected one of the plurality of audio connections, whereby each of the plurality of speech recognition instances converts the audio to text, wherein the second portion of audio is subsequent to the first portion of audio, and wherein each one of the plurality of speech recognition instances has substantially the same capability; and
(c) a third computer code instruction portion for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances.
US12/914,617, filed 2010-10-28 (priority 2009-10-30): Real-time transcription of conference calls. Status: Active, adjusted expiration 2030-12-27. Granted as US8370142B2 (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US12/914,617 | 2009-10-30 | 2010-10-28 | Real-time transcription of conference calls (en)

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US25655809P | 2009-10-30 | 2009-10-30 |
US12/914,617 | 2009-10-30 | 2010-10-28 | Real-time transcription of conference calls (en)

Publications (2)

Publication Number | Publication Date
US20110112833A1 (en) | 2011-05-12
US8370142B2 (en) | 2013-02-05

Family

ID=43974840

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
US12/914,617 | Real-time transcription of conference calls | 2009-10-30 | 2010-10-28
(Status: Active, expires 2030-12-27; granted as US8370142B2 (en))

Country Status (1)

Country | Link
US (1) | US8370142B2 (en)


Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US9710819B2 (en)* | 2003-05-05 | 2017-07-18 | Interactions Llc | Real-time transcription system utilizing divided audio chunks
AU2004237227B2 (en) | 2003-05-05 | 2011-07-14 | Interactions Llc | Apparatus and method for processing service interactions
US7567908B2 (en) | 2004-01-13 | 2009-07-28 | International Business Machines Corporation | Differential dynamic content delivery with text display in dependence upon simultaneous speech
US9031839B2 (en)* | 2010-12-01 | 2015-05-12 | Cisco Technology, Inc. | Conference transcription based on conference data
US8719031B2 (en)* | 2011-06-17 | 2014-05-06 | At&T Intellectual Property I, L.P. | Dynamic access to external media content based on speaker content
US9053750B2 (en)* | 2011-06-17 | 2015-06-09 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content
US20130024196A1 (en)* | 2011-07-21 | 2013-01-24 | Nuance Communications, Inc. | Systems and methods for using a mobile device to deliver speech with speaker identification
US20130022189A1 (en)* | 2011-07-21 | 2013-01-24 | Nuance Communications, Inc. | Systems and methods for receiving and processing audio signals captured using multiple devices
US9313336B2 (en) | 2011-07-21 | 2016-04-12 | Nuance Communications, Inc. | Systems and methods for processing audio signals captured using microphones of multiple devices
US9014358B2 (en)* | 2011-09-01 | 2015-04-21 | Blackberry Limited | Conferenced voice to text transcription
JP5404726B2 (en)* | 2011-09-26 | 2014-02-05 | 株式会社東芝 | Information processing apparatus, information processing method, and program
US9230546B2 (en) | 2011-11-03 | 2016-01-05 | International Business Machines Corporation | Voice content transcription during collaboration sessions
KR20130098032A (en)* | 2012-02-27 | 2013-09-04 | 삼성전자주식회사 | Method for managing patients using group communication
US9263044B1 (en)* | 2012-06-27 | 2016-02-16 | Amazon Technologies, Inc. | Noise reduction based on mouth area movement recognition
WO2014052431A1 (en)* | 2012-09-27 | 2014-04-03 | Dolby Laboratories Licensing Corporation | Method for improving perceptual continuity in a spatial teleconferencing system
WO2014097748A1 (en)* | 2012-12-18 | 2014-06-26 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Method for processing voice of specified speaker, as well as electronic device system and electronic device program therefor
WO2014148190A1 (en)* | 2013-03-19 | 2014-09-25 | Necソリューションイノベータ株式会社 | Note-taking assistance system, information delivery device, terminal, note-taking assistance method, and computer-readable recording medium
KR102149266B1 (en)* | 2013-05-21 | 2020-08-28 | 삼성전자 주식회사 | Method and apparatus for managing audio data in electronic device
US9595271B2 (en)* | 2013-06-27 | 2017-03-14 | Getgo, Inc. | Computer system employing speech recognition for detection of non-speech audio
EP3017589B1 (en)* | 2013-07-02 | 2018-08-08 | Family Systems, Limited | System for improving audio conferencing services
US20190312973A1 (en)* | 2014-02-28 | 2019-10-10 | Ultratec, Inc. | Semiautomated relay method and apparatus
CN106663429A (en)* | 2014-03-10 | 2017-05-10 | 韦利通公司 | Engine, system and method of providing audio transcriptions for use in content resources
US9178773B1 (en)* | 2014-04-15 | 2015-11-03 | Green Key Technologies Llc | Computer-programmed telephone-enabled devices for processing and managing numerous simultaneous voice conversations conducted by an individual over a computer network and computer methods of implementing thereof
US9917756B2 (en)* | 2014-06-27 | 2018-03-13 | Agora Lab, Inc. | Systems and methods for visualizing a call over network with a caller readiness dialog box
US9838544B2 (en) | 2014-06-27 | 2017-12-05 | Agora Lab, Inc. | Systems and methods for improved quality of a call over network with load leveling and last mile signal indication
JP2016062357A (en)* | 2014-09-18 | 2016-04-25 | 株式会社東芝 | Voice translation device, method, and program
CN105991854B (en)* | 2014-09-29 | 2020-03-13 | 上海兆言网络科技有限公司 | System and method for visualizing VoIP (Voice over Internet protocol) teleconference on intelligent terminal
JP6464411B6 (en)* | 2015-02-25 | 2019-03-13 | Dynabook株式会社 | Electronic device, method and program
US9672829B2 (en)* | 2015-03-23 | 2017-06-06 | International Business Machines Corporation | Extracting and displaying key points of a video conference
US20170024086A1 (en)* | 2015-06-23 | 2017-01-26 | Jamdeo Canada Ltd. | System and methods for detection and handling of focus elements
US10089061B2 (en) | 2015-08-28 | 2018-10-02 | Kabushiki Kaisha Toshiba | Electronic device and method
US9984674B2 (en)* | 2015-09-14 | 2018-05-29 | International Business Machines Corporation | Cognitive computing enabled smarter conferencing
US20170075652A1 (en) | 2015-09-14 | 2017-03-16 | Kabushiki Kaisha Toshiba | Electronic device and method
US9652113B1 (en)* | 2016-10-06 | 2017-05-16 | International Business Machines Corporation | Managing multiple overlapped or missed meetings
US10600420B2 (en) | 2017-05-15 | 2020-03-24 | Microsoft Technology Licensing, Llc | Associating a speaker with reactions in a conference session
US10841755B2 (en) | 2017-07-01 | 2020-11-17 | Phoneic, Inc. | Call routing using call forwarding options in telephony networks
US11037567B2 (en)* | 2018-01-19 | 2021-06-15 | Sorenson Ip Holdings, Llc | Transcription of communications
CN110612568B (en)* | 2018-03-29 | 2023-01-03 | 京瓷办公信息系统株式会社 | Information processing apparatus
US10867610B2 (en)* | 2018-05-04 | 2020-12-15 | Microsoft Technology Licensing, Llc | Computerized intelligent assistant for conferences
JP2019204025A (en)* | 2018-05-24 | 2019-11-28 | レノボ・シンガポール・プライベート・リミテッド | Electronic apparatus, control method, and program
US10629205B2 (en)* | 2018-06-12 | 2020-04-21 | International Business Machines Corporation | Identifying an accurate transcription from probabilistic inputs
US10636427B2 (en)* | 2018-06-22 | 2020-04-28 | Microsoft Technology Licensing, Llc | Use of voice recognition to generate a transcript of conversation(s)
US10388272B1 (en) | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, Llc | Training speech recognition systems using word sequences
US11170761B2 (en) | 2018-12-04 | 2021-11-09 | Sorenson Ip Holdings, Llc | Training of speech recognition systems
US10573312B1 (en)* | 2018-12-04 | 2020-02-25 | Sorenson Ip Holdings, Llc | Transcription generation from multiple speech recognition systems
US11017778B1 (en) | 2018-12-04 | 2021-05-25 | Sorenson Ip Holdings, Llc | Switching between speech recognition systems
CN111128132A (en)* | 2019-12-19 | 2020-05-08 | 秒针信息技术有限公司 | Voice separation method, device and system and storage medium
FR3106690B1 (en)* | 2020-01-28 | 2022-07-29 | Vdp 3 0 | Information processing method, telecommunication terminal and computer program
US11488604B2 (en) | 2020-08-19 | 2022-11-01 | Sorenson Ip Holdings, Llc | Transcription of audio
CN112562677B (en)* | 2020-11-25 | 2023-12-15 | 安徽听见科技有限公司 | Conference voice transcription method, device, equipment and storage medium
US20240005947A1 (en)* | 2021-04-21 | 2024-01-04 | Microsoft Technology Licensing, Llc | Synthetic speech detection
JP7342918B2 (en)* | 2021-07-30 | 2023-09-12 | 株式会社リコー | Information processing device, text data editing method, communication system, program
US20240331690A1 (en)* | 2023-04-03 | 2024-10-03 | Comcast Cable Communications, Llc | Methods and systems for enhanced conferencing
CN117765970A (en)* | 2023-12-22 | 2024-03-26 | 中国电信股份有限公司 | Audio file identification method and device

Citations (21)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6487534B1 (en)* | 1999-03-26 | 2002-11-26 | U.S. Philips Corporation | Distributed client-server speech recognition system
US20020193991A1 (en)* | 2001-06-13 | 2002-12-19 | Intel Corporation | Combining N-best lists from multiple speech recognizers
US20030122921A1 (en) | 2001-09-05 | 2003-07-03 | Taib Ronnie Bernard Francis | Conference calling
US20040021765A1 (en)* | 2002-07-03 | 2004-02-05 | Francis Kubala | Speech recognition system for managing telemeetings
US20040114541A1 (en) | 2002-12-11 | 2004-06-17 | Siemens Information | System and method for collaboration summarization playback
US6785653B1 (en) | 2000-05-01 | 2004-08-31 | Nuance Communications | Distributed voice web architecture and associated components and methods
US20040186712A1 (en) | 2003-03-18 | 2004-09-23 | Coles Scott David | Apparatus and method for providing voice recognition for multiple speakers
US6816468B1 (en) | 1999-12-16 | 2004-11-09 | Nortel Networks Limited | Captioning for tele-conferences
US6816834B2 (en) | 2002-10-23 | 2004-11-09 | Jon Jaroker | System and method for secure real-time high accuracy speech to text conversion of general quality speech
US6850609B1 (en)* | 1997-10-28 | 2005-02-01 | Verizon Services Corp. | Methods and apparatus for providing speech recording and speech transcription services
US20050207554A1 (en) | 2002-11-08 | 2005-09-22 | Verizon Services Corp. | Facilitation of a conference call
US7016844B2 (en)* | 2002-09-26 | 2006-03-21 | Core Mobility, Inc. | System and method for online transcription services
US7130404B2 (en)* | 2003-03-18 | 2006-10-31 | Avaya Technology Corp. | Apparatus and method for providing advanced communication conferencing operations
US7133513B1 (en)* | 2004-07-21 | 2006-11-07 | Sprint Spectrum L.P. | Method and system for transcribing voice content of an on-going teleconference into human-readable notation
US20070106724A1 (en)* | 2005-11-04 | 2007-05-10 | Gorti Sreenivasa R | Enhanced IP conferencing service
US20070206759A1 (en) | 2006-03-01 | 2007-09-06 | Boyanovsky Robert M | Systems, methods, and apparatus to record conference call activity
US7302390B2 (en) | 2002-09-02 | 2007-11-27 | Industrial Technology Research Institute | Configurable distributed speech recognition system
US7343008B1 (en)* | 2007-04-23 | 2008-03-11 | Frankel David P | Identity-based conferencing systems and methods
US7539086B2 (en) | 2002-10-23 | 2009-05-26 | J2 Global Communications, Inc. | System and method for the secure, real-time, high accuracy conversion of general-quality speech into text
US20090135741A1 (en) | 2007-11-28 | 2009-05-28 | Say2Go, Inc. | Regulated voice conferencing with optional distributed speech-to-text recognition
US20090177470A1 (en) | 2007-12-21 | 2009-07-09 | Sandcherry, Inc. | Distributed dictation/transcription system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CA2154406C (en)* | 1994-07-22 | 2000-01-25 | Tomoharu Kiyuna | System for predicting internal condition of live body
US7133512B2 (en)* | 2003-10-30 | 2006-11-07 | International Business Machines Corporation | Conference call aggregation using an interactive voice response system

Patent Citations (27)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6850609B1 (en)* | 1997-10-28 | 2005-02-01 | Verizon Services Corp. | Methods and apparatus for providing speech recording and speech transcription services
US6487534B1 (en)* | 1999-03-26 | 2002-11-26 | U.S. Philips Corporation | Distributed client-server speech recognition system
US6816468B1 (en) | 1999-12-16 | 2004-11-09 | Nortel Networks Limited | Captioning for tele-conferences
US6785653B1 (en) | 2000-05-01 | 2004-08-31 | Nuance Communications | Distributed voice web architecture and associated components and methods
US20020193991A1 (en)* | 2001-06-13 | 2002-12-19 | Intel Corporation | Combining N-best lists from multiple speech recognizers
US20030122921A1 (en) | 2001-09-05 | 2003-07-03 | Taib Ronnie Bernard Francis | Conference calling
US6747685B2 (en) | 2001-09-05 | 2004-06-08 | Motorola, Inc. | Conference calling
US20040021765A1 (en)* | 2002-07-03 | 2004-02-05 | Francis Kubala | Speech recognition system for managing telemeetings
US7302390B2 (en) | 2002-09-02 | 2007-11-27 | Industrial Technology Research Institute | Configurable distributed speech recognition system
US7016844B2 (en)* | 2002-09-26 | 2006-03-21 | Core Mobility, Inc. | System and method for online transcription services
US6816834B2 (en) | 2002-10-23 | 2004-11-09 | Jon Jaroker | System and method for secure real-time high accuracy speech to text conversion of general quality speech
US20090292539A1 (en) | 2002-10-23 | 2009-11-26 | J2 Global Communications, Inc. | System and method for the secure, real-time, high accuracy conversion of general quality speech into text
US7539086B2 (en) | 2002-10-23 | 2009-05-26 | J2 Global Communications, Inc. | System and method for the secure, real-time, high accuracy conversion of general-quality speech into text
US7539290B2 (en) | 2002-11-08 | 2009-05-26 | Verizon Services Corp. | Facilitation of a conference call
US20050207554A1 (en) | 2002-11-08 | 2005-09-22 | Verizon Services Corp. | Facilitation of a conference call
US20040114541A1 (en) | 2002-12-11 | 2004-06-17 | Siemens Information | System and method for collaboration summarization playback
US7545758B2 (en) | 2002-12-11 | 2009-06-09 | Siemens Communications, Inc. | System and method for collaboration summarization playback
US20040186712A1 (en) | 2003-03-18 | 2004-09-23 | Coles Scott David | Apparatus and method for providing voice recognition for multiple speakers
US7130404B2 (en)* | 2003-03-18 | 2006-10-31 | Avaya Technology Corp. | Apparatus and method for providing advanced communication conferencing operations
US7844454B2 (en) | 2003-03-18 | 2010-11-30 | Avaya Inc. | Apparatus and method for providing voice recognition for multiple speakers
US7133513B1 (en)* | 2004-07-21 | 2006-11-07 | Sprint Spectrum L.P. | Method and system for transcribing voice content of an on-going teleconference into human-readable notation
US20070106724A1 (en)* | 2005-11-04 | 2007-05-10 | Gorti Sreenivasa R | Enhanced IP conferencing service
US20070206759A1 (en) | 2006-03-01 | 2007-09-06 | Boyanovsky Robert M | Systems, methods, and apparatus to record conference call activity
US7343008B1 (en)* | 2007-04-23 | 2008-03-11 | Frankel David P | Identity-based conferencing systems and methods
US20090135741A1 (en) | 2007-11-28 | 2009-05-28 | Say2Go, Inc. | Regulated voice conferencing with optional distributed speech-to-text recognition
US20090177470A1 (en) | 2007-12-21 | 2009-07-09 | Sandcherry, Inc. | Distributed dictation/transcription system
US8150689B2 (en) | 2007-12-21 | 2012-04-03 | Nvoq Incorporated | Distributed dictation/transcription system

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Amir et al., "Towards Automatic Real Time Preparation of On-Line Video," Proceedings of the 34th Hawaii International Conference on System Sciences, 2001 (8 pages).
Diakopoulos et al., "Audio Puzzler: Piecing Together Time-Stamped Speech Transcripts with a Puzzle Game," ACM Multimedia, 2008 (4 pages).
Diakopoulos et al., "Audio Puzzler: Piecing Together Time-Stamped Speech Transcripts with a Puzzle Game," ACM Multimedia, 2008.*
Hindus et al., "Capturing, Structuring and Representing Ubiquitous Audio," ACM Transactions on Information Systems, vol. 11, 1993, pp. 376-400 (25 pages).
Hindus et al., "Capturing, Structuring and Representing Ubiquitous Audio," ACM Transactions on Information Systems, vol. 11, 1993, pp. 376-400.*
Mishne et al., "Automatic Analysis of Call-Center Conversations," Proceedings of the 14th ACM International Conference on Information and Knowledge Management, 2005, pp. 453-459 (7 pages).
Mishne et al., "Automatic Analysis of Call-Center Conversations," Proceedings of the 14th ACM International Conference on Information and Knowledge Management, 2005, pp. 453-459.*
Wellner et al., "Conference Scribe: Turning Conference Calls into Documents," Proceedings of the 34th Hawaii International Conference on System Sciences, 2001 (9 pages).

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9332319B2 (en)* | 2010-09-27 | 2016-05-03 | Unisys Corporation | Amalgamating multimedia transcripts for closed captioning from a plurality of text to speech conversions
US20120078626A1 (en)* | 2010-09-27 | 2012-03-29 | Johney Tsai | Systems and methods for converting speech in multimedia content to text
US10019989B2 (en)* | 2011-08-31 | 2018-07-10 | Google Llc | Text transcript generation from a communication session
US20170011740A1 (en)* | 2011-08-31 | 2017-01-12 | Google Inc. | Text transcript generation from a communication session
US9443518B1 (en)* | 2011-08-31 | 2016-09-13 | Google Inc. | Text transcript generation from a communication session
US9263059B2 (en)* | 2012-09-28 | 2016-02-16 | International Business Machines Corporation | Deep tagging background noises
US9972340B2 (en) | 2012-09-28 | 2018-05-15 | International Business Machines Corporation | Deep tagging background noises
US9472209B2 (en) | 2012-09-28 | 2016-10-18 | International Business Machines Corporation | Deep tagging background noises
US20140095166A1 (en)* | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Deep tagging background noises
US20140278404A1 (en)* | 2013-03-15 | 2014-09-18 | Parlant Technology, Inc. | Audio merge tags
US9300811B2 (en) | 2013-06-10 | 2016-03-29 | Microsoft Technology Licensing, Llc | Catching up with an ongoing conference call
US9154618B2 (en) | 2013-06-10 | 2015-10-06 | Microsoft Technology Licensing, Llc | Catching up with an ongoing conference call
US9008296B2 (en) | 2013-06-10 | 2015-04-14 | Microsoft Technology Licensing, Llc | Catching up with an ongoing conference call
US20150120825A1 (en)* | 2013-10-25 | 2015-04-30 | Avaya, Inc. | Sequential segregated synchronized transcription and textual interaction spatial orientation with talk-over
US10917519B2 (en) | 2014-02-28 | 2021-02-09 | Ultratec, Inc. | Semiautomated relay method and apparatus
US10748523B2 (en) | 2014-02-28 | 2020-08-18 | Ultratec, Inc. | Semiautomated relay method and apparatus
US11627221B2 (en) | 2014-02-28 | 2023-04-11 | Ultratec, Inc. | Semiautomated relay method and apparatus
US10389876B2 (en) | 2014-02-28 | 2019-08-20 | Ultratec, Inc. | Semiautomated relay method and apparatus
US10542141B2 (en) | 2014-02-28 | 2020-01-21 | Ultratec, Inc. | Semiautomated relay method and apparatus
US11741963B2 (en) | 2014-02-28 | 2023-08-29 | Ultratec, Inc. | Semiautomated relay method and apparatus
US10742805B2 (en) | 2014-02-28 | 2020-08-11 | Ultratec, Inc. | Semiautomated relay method and apparatus
US11368581B2 (en) | 2014-02-28 | 2022-06-21 | Ultratec, Inc. | Semiautomated relay method and apparatus
US12136425B2 (en) | 2014-02-28 | 2024-11-05 | Ultratec, Inc. | Semiautomated relay method and apparatus
US10878721B2 (en) | 2014-02-28 | 2020-12-29 | Ultratec, Inc. | Semiautomated relay method and apparatus
US11664029B2 (en) | 2014-02-28 | 2023-05-30 | Ultratec, Inc. | Semiautomated relay method and apparatus
US12400660B2 (en) | 2014-02-28 | 2025-08-26 | Ultratec, Inc. | Semiautomated relay method and apparatus
US12136426B2 (en) | 2014-02-28 | 2024-11-05 | Ultratec, Inc. | Semiautomated relay method and apparatus
US12137183B2 (en) | 2014-02-28 | 2024-11-05 | Ultratec, Inc. | Semiautomated relay method and apparatus
US10666696B2 (en) | 2014-09-05 | 2020-05-26 | Minerva Project, Inc. | System and method for a virtual conference interactive timeline
US10805365B2 (en)* | 2014-09-05 | 2020-10-13 | Minerva Project, Inc. | System and method for tracking events and providing feedback in a virtual conference
US20190124128A1 (en)* | 2014-09-05 | 2019-04-25 | Minerva Project, Inc. | System and method for tracking events and providing feedback in a virtual conference
US9886423B2 (en) | 2015-06-19 | 2018-02-06 | International Business Machines Corporation | Reconciliation of transcripts
US9892095B2 (en) | 2015-06-19 | 2018-02-13 | International Business Machines Corporation | Reconciliation of transcripts
US11854551B2 (en) | 2019-03-22 | 2023-12-26 | Avaya Inc. | Hybrid architecture for transcription of real-time audio based on event data between on-premises system and cloud-based advanced audio processing system
US11328730B2 (en) | 2019-07-19 | 2022-05-10 | Nextiva, Inc. | Automated audio-to-text transcription in multi-device teleconferences
US12159632B2 (en) | 2019-07-19 | 2024-12-03 | Nextiva, Inc. | Automated audio-to-text transcription in multi-device teleconferences
US11721344B2 (en) | 2019-07-19 | 2023-08-08 | Nextiva, Inc. | Automated audio-to-text transcription in multi-device teleconferences
US11574638B2 (en) | 2019-07-19 | 2023-02-07 | Nextiva, Inc. | Automated audio-to-text transcription in multi-device teleconferences
US11605385B2 (en) | 2019-10-31 | 2023-03-14 | International Business Machines Corporation | Project issue tracking via automated voice recognition
US12035070B2 (en) | 2020-02-21 | 2024-07-09 | Ultratec, Inc. | Caption modification and augmentation systems and methods for use by hearing assisted user
US11539900B2 (en) | 2020-02-21 | 2022-12-27 | Ultratec, Inc. | Caption modification and augmentation systems and methods for use by hearing assisted user
US20210366478A1 (en)* | 2020-05-20 | 2021-11-25 | Sharp Kabushiki Kaisha | Information processing system, information processing method, and recording medium having stored thereon information processing program
US11804223B2 (en)* | 2020-05-20 | 2023-10-31 | Sharp Kabushiki Kaisha | Information processing system, information processing method, and recording medium having stored thereon information processing program
US11450334B2 (en)* | 2020-09-09 | 2022-09-20 | Rovi Guides, Inc. | Systems and methods for filtering unwanted sounds from a conference call using voice synthesis
US11810585B2 (en) | 2020-09-09 | 2023-11-07 | Rovi Guides, Inc. | Systems and methods for filtering unwanted sounds from a conference call using voice synthesis
US12159643B2 (en) | 2020-09-09 | 2024-12-03 | Adeia Guides Inc. | Systems and methods for filtering unwanted sounds from a conference call using voice synthesis
US11817113B2 (en) | 2020-09-09 | 2023-11-14 | Rovi Guides, Inc. | Systems and methods for filtering unwanted sounds from a conference call
US12073849B2 (en) | 2020-09-09 | 2024-08-27 | Rovi Guides, Inc. | Systems and methods for filtering unwanted sounds from a conference call
US12125487B2 (en)* | 2020-10-12 | 2024-10-22 | SoundHound AI IP, LLC. | Method and system for conversation transcription with metadata
US20220115019A1 (en)* | 2020-10-12 | 2022-04-14 | Soundhound, Inc. | Method and system for conversation transcription with metadata
US12052391B2 (en) | 2020-10-28 | 2024-07-30 | Capital One Services, Llc | Methods and systems for automatic queuing in conference calls
US12308987B2 (en) | 2021-07-28 | 2025-05-20 | Zoom Communications, Inc. | Topic relevance detection in video conferencing
US11916687B2 (en) | 2021-07-28 | 2024-02-27 | Zoom Video Communications, Inc. | Topic relevance detection using automated speech recognition
US12058186B2 (en) | 2022-11-30 | 2024-08-06 | International Business Machines Corporation | Private audio communication in a conference call
US12392583B2 (en) | 2023-12-22 | 2025-08-19 | John Bridge | Body safety device with visual sensing and haptic response using artificial intelligence
US12299557B1 (en) | 2023-12-22 | 2025-05-13 | GovernmentGPT Inc. | Response plan modification through artificial intelligence applied to ambient data communicated to an incident commander

Also Published As

Publication number | Publication date
US20110112833A1 (en) | 2011-05-12

Similar Documents

Publication | Title
US8370142B2 (en) | Real-time transcription of conference calls
US11669683B2 (en) | Speech recognition and summarization
CN112075075B (en) | Method and computerized intelligent assistant for facilitating teleconferencing
US8484040B2 (en) | Social analysis in multi-participant meetings
US20220351729A1 (en) | Systems and methods for recognizing a speech of a speaker
US11514914B2 (en) | Systems and methods for an intelligent virtual assistant for meetings
US9894121B2 (en) | Guiding a desired outcome for an electronically hosted conference
US8731919B2 (en) | Methods and system for capturing voice files and rendering them searchable by keyword or phrase
US10629189B2 (en) | Automatic note taking within a virtual meeting
US7130403B2 (en) | System and method for enhanced multimedia conference collaboration
US7248684B2 (en) | System and method for processing conference collaboration records
US8826210B2 (en) | Visualization interface of continuous waveform multi-speaker identification
US7756923B2 (en) | System and method for intelligent multimedia conference collaboration summarization
US20100268534A1 (en) | Transcription, archiving and threading of voice communications
US20230403174A1 (en) | Intelligent virtual event assistant
US20150149162A1 (en) | Multi-channel speech recognition
US20250095654A1 (en) | Automated Audio-to-Text Transcription in Multi-Device Teleconferences
US20230230589A1 (en) | Extracting engaging questions from a communication session
US20240430118A1 (en) | Systems and Methods for Creation and Application of Interaction Analytics
US20210327416A1 (en) | Voice data capture
US20230230588A1 (en) | Extracting filler words and phrases from a communication session
US11799679B2 (en) | Systems and methods for creation and application of interaction analytics
EP1429528B1 (en) | System and method for collaboration summarization playback

Legal Events

Date | Code | Title | Description
AS | Assignment

Owner name: ZIPDX, LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FRANKEL, DAVID P.;TARNOFF, NOEL;SIGNING DATES FROM 20101110 TO 20101111;REEL/FRAME:025411/0166

STCF | Information on status: patent grant

Free format text: PATENTED CASE

FPAY | Fee payment

Year of fee payment: 4

MAFP | Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 8

MAFP | Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2553); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 12

