US7308404B2 - Method and apparatus for speech recognition using a dynamic vocabulary - Google Patents


Publication number
US7308404B2
Authority
US
United States
Prior art keywords
language model
spoken request
readable medium
computer readable
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime, expires
Application number
US10/912,517
Other versions
US20050055210A1 (en)
Inventor
Anand Venkataraman
Horacio E. Franco
Douglas A. Bercow
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
SRI International Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/967,228 (patent US6996519B2)
Application filed by SRI International Inc
Priority to US10/912,517 (patent US7308404B2)
Assigned to SRI INTERNATIONAL. Assignment of assignors interest (see document for details). Assignors: FRANCO, HORACIO E.; BERCOW, DOUGLAS A.; VENKATARAMAN, ANAND
Publication of US20050055210A1
Application granted
Publication of US7308404B2
Assigned to NUANCE COMMUNICATIONS, INC. Assignment of assignors interest (see document for details). Assignors: SRI INTERNATIONAL
Adjusted expiration
Expired - Lifetime

Abstract

A method and apparatus are provided for performing speech recognition using a dynamic vocabulary. Results from a preliminary speech recognition pass can be used to update or refine a language model in order to improve the accuracy of search results and to simplify subsequent recognition passes. This iterative process greatly reduces the number of alternative hypotheses produced during each speech recognition pass, as well as the time required to process subsequent passes, making the speech recognition process faster, more efficient and more accurate. The iterative process is characterized by the use of results from one or more data set queries, where the keys used to query the data set, as well as the queries themselves, are constructed in a manner that produces more effective language models for use in subsequent attempts at decoding a given speech signal.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. patent application Ser. No. 09/967,228, filed Sep. 28, 2001 now U.S. Pat. No. 6,996,519 (titled “Method and Apparatus for Performing Relational Speech Recognition”), which is herein incorporated by reference in its entirety. In addition, this application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/492,761, filed Aug. 5, 2003 (titled “Method for Refinement of Speech Recognition Hypothesis”), which is herein incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention relates generally to speech recognition and relates more specifically to speech recognition systems having dynamic vocabularies.
BACKGROUND OF THE DISCLOSURE
Conventional speech recognition systems used for accessing structured data tend to be very restrictive in terms of the signals (e.g., user commands or utterances) that may be input to search a database. That is, if a user issues a verbal request that is not phrased to exactly match a data item in the system's database, the system may produce inaccurate or incomplete results.
One proposed solution to this problem is to include a plurality of potential alternate signals that may be spoken for each item in the database; however, memory constraints make this proposal difficult to feasibly implement.
Thus, there is a need in the art for a method and apparatus for speech recognition using a dynamic vocabulary.
SUMMARY OF THE INVENTION
In one embodiment, the present invention relates to a method and apparatus for performing speech recognition using a dynamic vocabulary. Results from a preliminary speech recognition pass can be used to update or refine a language model in order to improve the accuracy of search results and to simplify subsequent recognition passes. This iterative process greatly reduces the number of alternative hypotheses produced during each speech recognition pass, as well as the time required to process subsequent passes, making the speech recognition process faster, more efficient and more accurate. The iterative process is characterized by the use of results from one or more data set queries, where the keys used to query the data set, as well as the queries themselves, are constructed in a manner that produces more effective language models for use in subsequent attempts at decoding a given speech signal.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of a speech recognition system that operates in accordance with the present invention;
FIG. 2 is a flow chart illustrating a method for recognizing words that have observable relationships;
FIG. 3 is a flow chart illustrating a method for generating or selecting new language models and/or new acoustic models for use in a speech recognition process;
FIG. 4 illustrates a flow diagram that depicts one embodiment of a method for speech recognition using a dynamic vocabulary, according to the present invention; and
FIG. 5 is a flow diagram illustrating one embodiment of a method for constructing a second language model in accordance with the method illustrated in FIG. 4.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION
The present invention relates to a method and apparatus for speech recognition using a dynamic vocabulary.
FIG. 1 is a block diagram illustrating a speech recognition system 101 that operates in accordance with the present invention. This system 101 may be implemented in a portable device such as a hand held computer, a portable phone, or an automobile. It may also be implemented in a stationary device such as a desktop personal computer or an appliance, or it may be distributed between both local and remote devices. The speech recognition system 101 illustratively comprises a speech recognition front end 103, a speech recognition engine 105, a processor 107, and a memory/database 109. In further embodiments, the speech recognition system 101 may also comprise one or more input/output (I/O) devices (not shown) such as a display, a keyboard, a mouse, a modem and the like. In one embodiment, at least one I/O device is a storage device (e.g., a disk drive, an optical disk drive or a floppy drive).
The speech recognition front end 103 receives and samples spoken input, and then measures and extracts features or characteristics of the spoken input that are used later in the speech recognition process. The speech recognition engine 105 may include a search method (such as a Viterbi search method) and acoustic models (such as models of individual phonemes or models of groups of phonemes) used in the speech recognition process. The processor 107 and associated memory 109 together operate as a computer to control the operation of the front end 103 and the speech recognition engine 105. The memory 109 stores recognizable words and word sets 111 in an accessible database that is used by the system 101 to process speech. Memory 109 also stores the software 115 that is used to implement the methods of the present invention. Both the speech recognition front end 103 and the speech recognition engine 105 may be implemented in hardware, software, or a combination of hardware and software (e.g., using Application Specific Integrated Circuits (ASIC)), where the software is loaded from a storage medium (e.g., an I/O device) and operated by the processor 107 in the memory 109 of the system 101. As such, in one embodiment, the speech recognition front end 103 and/or the speech recognition engine 105 can be stored on a computer readable medium or carrier (e.g., RAM, magnetic or optical drive or diskette, and the like).
In one embodiment, the invention relates to speech recognition systems and methods used to recognize words that have observable relationships. Examples of word sets with observable relationships are addresses; locations; names and telephone numbers; airline flight numbers, departure/arrival times, and departure/arrival cities; product part numbers, catalog numbers, and product names; and any other sets of words used to identify a person, place, thing or action.
Groups of words with observable relationships may be referred to as “sparse domains” or domains that have a small “Cartesian product” because typically only a small fraction of all possible word combinations are valid combinations. For example, an address with the ZIP code “94025” is only associated with the city of Menlo Park, Calif. “San Francisco, Calif. 94025” or “Menlo Park, N.J. 94025” are not valid addresses.
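The idea of a sparse domain can be made concrete with a small sketch. The address table and the function name `sparsity` below are hypothetical and serve only to show that the valid combinations form a small fraction of the Cartesian product of their components.

```python
from itertools import product

# Hypothetical toy domain: only these (city, state, ZIP) triples are valid.
VALID_ADDRESSES = {
    ("Menlo Park", "CA", "94025"),
    ("San Francisco", "CA", "94102"),
    ("Newark", "NJ", "07102"),
}


def sparsity(valid):
    """Return (number of valid tuples, size of the full Cartesian product)."""
    # Collect the distinct values seen in each component position.
    columns = [set(column) for column in zip(*valid)]
    cartesian = list(product(*columns))
    return len(valid), len(cartesian)


valid_count, cartesian_count = sparsity(VALID_ADDRESSES)
print(valid_count, "valid combinations out of", cartesian_count, "possible")
```

In this toy table, combinations such as ("San Francisco", "CA", "94025") are part of the Cartesian product but are not valid addresses, which is the property the recognition passes exploit.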
FIG. 2 is a flow chart illustrating a preferred method for recognizing words that have observable relationships. This method may be implemented as a software routine 115 that is executed by the processor 107 of FIG. 1. When a speech signal that represents a spoken utterance is received (step 201), a speech recognition “pass” is performed by applying a first language model to the speech signal (step 203). The language model may be a probabilistic finite state grammar, a statistical language model, or any other language model that is useful in a speech recognition system. The first recognition pass does not attempt to recognize the entire speech signal; for example, if the utterance represents an address, the first recognition pass may use a language model that recognizes only city names or only street numbers.
Next, a new language model and/or new acoustic models are selected or generated (step 205). The selection or generation of the new model or models is based at least in part on results from the previous recognition pass, and may also be based on information regarding the linguistic structure of the domain and/or information regarding relationships among concepts, objects, or components in the domain. For example, the previous recognition passes may have recognized the city name “Menlo Park” and the street number “333.” Based on this information, a new language model might be generated or selected that includes only those streets in Menlo Park that have “333” as a street number.
This new language model and/or acoustic models and at least a portion of the speech signal are then used to perform another recognition pass (step 207). If a satisfactory recognition of the spoken utterance is complete (step 209), the speech recognition process ends (step 211). If a satisfactory recognition of the spoken utterance is not complete, then steps 205-209 are repeated as necessary.
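The control flow of FIG. 2 can be sketched as a loop. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `max_passes` bound, and the toy string-matching "recognizer" are all hypothetical stand-ins for a real decoder and model builder.

```python
def recognize_multipass(signal, first_model, recognize, next_model,
                        is_satisfactory, max_passes=5):
    """Run one initial pass, then refined passes until the result is satisfactory."""
    results = [recognize(signal, first_model)]        # step 203
    for _ in range(max_passes):                       # hypothetical safety bound
        model = next_model(results)                   # step 205
        results.append(recognize(signal, model))      # step 207
        if is_satisfactory(results):                  # step 209
            break                                     # step 211
    return results


# Toy stand-ins so the loop can be exercised without a real recognizer.
def toy_recognize(signal, model):
    """Return the first phrase of the model's vocabulary found in the signal."""
    return next((phrase for phrase in model if phrase in signal), None)


def toy_next_model(results):
    """Pick a street vocabulary based on the city recognized in the first pass."""
    streets = {"menlo park": ["ravenswood avenue", "middlefield road"]}
    return streets.get(results[0], [])


passes = recognize_multipass(
    "333 ravenswood avenue menlo park",
    ["palo alto", "menlo park"],
    toy_recognize,
    toy_next_model,
    lambda results: results[-1] is not None,
)
print(passes)
```

Passing the recognizer and model builder in as functions keeps the loop independent of any particular decoder, which matches the text's statement that any useful language model may be used.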
FIG. 3 is a flowchart that illustrates a preferred method for generating or selecting a new language model and/or new acoustic models (i.e., a method performing step 205 of FIG. 2). In this method, a result from a speech recognition pass is acquired (step 301). This result includes a component, object or concept of the relevant domain. For example, if the speech recognition system is being used to recognize an address, the result from the previous recognition pass may include a street number or city name.
Next, the result from the speech recognition pass is used to perform a search on a database that contains information regarding relationships among the domain concepts, objects, or components (step 303). For example, the database may be a relational database that has information regarding the relationships among the components of an address. A search on the city name “Menlo Park” might find all the street names in that city; a search on the ZIP code “94025” might find all the streets within that ZIP code; and so on.
Finally, one or more results from the database search are then used to select or generate a language model and/or acoustic models (step 305). For example, the results from a database search on the ZIP code “94025” might be used to generate a language model (or select an existing language model) that includes all of the street names in that ZIP code. Or, the results from a database search on the city name “Menlo Park” and the street name “Ravenswood Avenue” might be used to generate or select a language model that includes all of the street numbers on Ravenswood Avenue in Menlo Park. Language models generated or selected this way can be used to greatly reduce the search space of subsequent recognition passes, making the speech recognition process both faster and more accurate.
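Steps 301 to 305 can be illustrated with a relational lookup. The sketch below uses Python's standard `sqlite3` module; the table layout, the sample rows, and the function names are assumptions made for illustration and are not taken from the patent.

```python
import sqlite3


def build_address_db():
    """Create a hypothetical in-memory relational table of valid addresses."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE address (number TEXT, street TEXT, city TEXT, zip TEXT)"
    )
    db.executemany(
        "INSERT INTO address VALUES (?, ?, ?, ?)",
        [
            ("333", "Ravenswood Avenue", "Menlo Park", "94025"),
            ("100", "Ravenswood Avenue", "Menlo Park", "94025"),
            ("333", "Middlefield Road", "Menlo Park", "94025"),
            ("333", "Market Street", "San Francisco", "94105"),
        ],
    )
    return db


def streets_for(db, city, number):
    """Step 303 analogue: find streets consistent with earlier pass results."""
    rows = db.execute(
        "SELECT DISTINCT street FROM address "
        "WHERE city = ? AND number = ? ORDER BY street",
        (city, number),
    )
    return [street for (street,) in rows]


db = build_address_db()
# Step 305 analogue: the restricted vocabulary for the next recognition pass.
next_pass_vocabulary = streets_for(db, "Menlo Park", "333")
print(next_pass_vocabulary)
```

The returned street list stands in for the new language model: the next pass only has to discriminate among the streets that the database says are compatible with what was already recognized.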
FIG. 4 illustrates a flow diagram that depicts one embodiment of a method 400 for speech recognition using a dynamic vocabulary, according to the present invention. In one embodiment, this method is implemented as a software routine 115 that is executed by the processor 107 of FIG. 1. The method 400 is initialized at step 405 and proceeds to step 410, where the method 400 provides an initial language model, from which an initial wordgraph is computed. The initial wordgraph is a network of words and utterances that a user signal (e.g., a spoken request) could possibly include. In one embodiment, the initial language model is constructed so as to bias recognition hypotheses in favor of a domain under consideration. For example, if the method of the present invention is deployed in a music-related application, the initial language model provided in step 410 might be tailored to a domain comprising song titles. In one embodiment, other possible domains for which an initial language model could be tailored include movies, books, games, cellular phone ring tones, auction items, library and retail catalogs, directory listings and addresses, among others.
In one embodiment, the initial language model is constructed using maximum-likelihood interpolation of an open language model (e.g., a language model that does not restrict a search space to a particular domain or sub-domain) with a domain-specific language model. In an alternate embodiment, a class-based language model may be used to enable the method 400 to achieve varying degrees of generalization within a given domain. In another embodiment, a mis-matched language model is provided (e.g., wherein phrases from which the model was built are typically not those that would be uttered by a user). For example, a language model built from a broadcast news report might be deployed in a system configured to recognize song titles.
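One common reading of maximum-likelihood interpolation is a linear mixture of the two models whose weight is chosen to maximize the likelihood of held-out text, typically with expectation-maximization. The sketch below assumes that reading and, for brevity, uses unigram models stored as dictionaries; the patent does not specify the model order or the estimation procedure, and all names and probabilities here are hypothetical.

```python
def interpolate(p_open, p_domain, lam):
    """Linearly mix two unigram models: lam * domain + (1 - lam) * open."""
    vocabulary = set(p_open) | set(p_domain)
    return {
        word: lam * p_domain.get(word, 0.0) + (1.0 - lam) * p_open.get(word, 0.0)
        for word in vocabulary
    }


def estimate_weight(p_open, p_domain, heldout, iterations=50, lam=0.5):
    """EM estimate of the mixture weight that maximizes held-out likelihood."""
    for _ in range(iterations):
        posteriors = []
        for word in heldout:
            domain_part = lam * p_domain.get(word, 0.0)
            open_part = (1.0 - lam) * p_open.get(word, 0.0)
            if domain_part + open_part > 0.0:
                # Probability that this word was generated by the domain model.
                posteriors.append(domain_part / (domain_part + open_part))
        if not posteriors:
            break
        lam = sum(posteriors) / len(posteriors)
    return lam


P_OPEN = {"the": 0.5, "news": 0.3, "radio": 0.2}
P_DOMAIN = {"radio": 0.6, "queen": 0.4}

weight = estimate_weight(P_OPEN, P_DOMAIN, ["radio", "queen", "radio", "the"])
initial_model = interpolate(P_OPEN, P_DOMAIN, weight)
```

Because both inputs are proper distributions, the mixture also sums to one for any weight between 0 and 1, so the result can be used directly as the initial language model.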
In step 415, the method 400 receives a signal (e.g., a spoken request for data) from a user. For example, a user may dial a music server on his cellular phone and say, “I'd like to listen to ‘Radio Gaga’ by Queen.” In step 420, the method 400 generates one or more hypotheses (e.g., proposed data matches) in response to the user signal by decoding the signal using the initial wordgraph computed in step 410.
In step 430, the method 400 computes a confidence score for each of the words appearing in each of the hypotheses produced in step 420. The confidence score represents a likelihood that a data set query using the corresponding scored word will identify one or more data items corresponding to the user signal. In one embodiment, confidence scores are computed by combining the hypotheses produced in step 420 into a second wordgraph and computing posterior probability scores for each word at each temporal position in the second wordgraph. In step 435, one or more high-confidence words are selected for use in a data set query. In one embodiment, high-confidence words are identified as any words having a confidence score that at least meets a predefined threshold. In one embodiment, where confidence scores are computed using a second wordgraph as described above, words having confidence scores that fall below a first predefined threshold are eliminated from the wordgraph, and the remaining words are identified as a set of high-confidence words suitable for selection. For instance, in the example provided above, the method 400 may return the words “Radio” and “Queen” as high-confidence words, since “Gaga” is typically an unknown word and since the user's cellular phone may capture background noise in addition to the user's spoken request.
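Steps 430 and 435 can be approximated with a simplified posterior computation over an N-best list. The sketch below assumes each hypothesis carries a probability and uses the word index within a hypothesis as a stand-in for the temporal position in the second wordgraph; a real system would align words by time. The function names, hypotheses, and threshold are hypothetical.

```python
from collections import defaultdict


def word_posteriors(hypotheses):
    """Map (position, word) to the normalized probability mass that supports it.

    hypotheses: list of (list_of_words, probability) pairs.
    """
    total = sum(probability for _, probability in hypotheses)
    scores = defaultdict(float)
    for words, probability in hypotheses:
        for position, word in enumerate(words):
            scores[(position, word)] += probability / total
    return dict(scores)


def high_confidence_words(hypotheses, threshold):
    """Keep words whose posterior at least meets the threshold, in spoken order."""
    posteriors = word_posteriors(hypotheses)
    kept = sorted(key for key, score in posteriors.items() if score >= threshold)
    return [word for _, word in kept]


N_BEST = [
    (["radio", "gaga", "queen"], 0.5),
    (["radio", "data", "queen"], 0.3),
    (["radio", "gag", "queen"], 0.2),
]

selected = high_confidence_words(N_BEST, threshold=0.6)
print(selected)
```

Here every hypothesis agrees on the first and last words, so they accumulate the full probability mass, while the competing middle words split it and fall below the threshold.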
In step 440, the method 400 uses the one or more high-confidence words selected in step 435 to query a data set. In one embodiment, the data set represents metadata related to resources, for which the user signal represents a request for access. In one embodiment, the data set is a database or the World Wide Web. For example, if the method 400 were deployed in a music-related application, the metadata might include song titles and/or artist names. Thus, if the method 400 used the words “Radio” and “Queen” to query a music-related data set for all songs that contain both words in their song track information, a set of returned results would likely include “Queen Greatest Hits Two: ‘Radio Gaga’ by Freddie Mercury.”
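The query of step 440 can be pictured as a conjunctive keyword match over track metadata. The catalog entries, tokenization rule, and function names below are hypothetical; an actual data set could be a database or a web search, as the text notes.

```python
import re


def tokens(text):
    """Split metadata into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def query_data_set(catalog, words):
    """Return catalog entries whose metadata contains every query word."""
    wanted = {word.lower() for word in words}
    return [entry for entry in catalog if wanted <= tokens(entry)]


# Hypothetical song track metadata.
CATALOG = [
    "Queen - Radio Gaga",
    "Queen - Bohemian Rhapsody",
    "R.E.M. - Radio Song",
]

matches = query_data_set(CATALOG, ["Radio", "Queen"])
print(matches)
```

Requiring all query words keeps the result set small, which is why the choice of high-confidence words matters: a single misrecognized word in the query would eliminate the correct entry.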
In step 450, the method 400 determines whether a number of results produced by the query of step 440 exceeds a second predefined threshold. If the method 400 determines that the number of query results does exceed the second predefined threshold, the method 400 proceeds to step 490 and constructs a second language model. In one embodiment, the second language model is constructed by updating the initial language model based on the query results. In another embodiment, the second language model is constructed as a new language model.
FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for constructing the second language model (or updating the initial language model, as the case may be) in accordance with step 490 of the method 400. The method 500 is initialized at step 505 and proceeds to step 510, where the method 500 analyzes the query results to find novel words (e.g., words contained in the query results that are not present in the current incarnation of the initial language model). For instance, using the example provided above, the method 400 may determine in step 490, based on the results returned in step 440, that “Gaga” is a “novel” word contained in the user's signal (but not contained in the initial language model). The method 500 then proceeds to step 520 and adds the novel words (e.g., “Gaga”) to the method 400's pronunciation dictionary, thereby enabling the novel words to be identified the next time they are spoken by the user. In one embodiment, pronunciation of novel words is derived from their spelling in accordance with the methods described in M. J. Dedina and H. C. Nusbaum, “PRONOUNCE: A Program for Pronunciation by Analogy”, Computer Speech and Language 5, pp. 55-64, 1991, although other methods for deriving pronunciations may be employed without departing from the scope of the present invention. The method 500 terminates in step 530.
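Step 510 amounts to a set difference between the words in the query results and the current vocabulary. The sketch below shows only that comparison; it does not attempt the pronunciation-by-analogy step, and the function name, sample results, and vocabulary are hypothetical.

```python
import re


def find_novel_words(query_results, vocabulary):
    """Return words in the query results that the current vocabulary lacks."""
    known = {word.lower() for word in vocabulary}
    novel = []
    for text in query_results:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in known and word not in novel:
                novel.append(word)  # preserve first-seen order, no duplicates
    return novel


current_vocabulary = {"queen", "radio", "by"}
novel = find_novel_words(["Queen - Radio Gaga"], current_vocabulary)
print(novel)
```

Each novel word would then need a pronunciation before it can be added to the dictionary, which is the part the text delegates to methods such as PRONOUNCE.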
The second language model and any subsequent language models are constructed in a manner that successively narrows the space (e.g., the portion of the data set) that is queried. For instance, in the example provided above, the second language model might allow only for the possibility that the user signal contains a request for a song by Queen with the word “radio” in its title. However, the second language model may also allow for several alternative ways of requesting each such song. For example, the second language model may rely in part on knowledge of how natural language queries are made, including ways in which such queries could actually be phrased by a user. These include the use of prefixes (e.g., “Play me X.”; “Get me X.”; “Please find me X.”; “I'd like to listen to X.”; “Do you have X?”; etc.), infixes (e.g., “X sung by Y.”; “X performed by Y.”; “X by Y.”; “X from the album Z.”; etc.), suffixes (e.g., “Please”; “Thank you”; “If you have it”; etc.) and disfluencies (e.g., “Uh”; “Hmm”; etc.).
In one embodiment, the second or updated language model is implemented directly as a search graph that efficiently encodes parallel paths from the start of the search graph to the end of the search graph. Each parallel path represents one possible way in which a user could phrase a request for one of the results returned by the data set query.
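The parallel paths of such a search graph can be enumerated by combining carrier phrases with each query result. The sketch below lists the paths as plain strings rather than building a graph, and the prefix, infix, and suffix lists are small hypothetical samples of those named in the text.

```python
from itertools import product

# Hypothetical samples of carrier phrases; "" means the slot is left empty.
PREFIXES = ["", "play me", "i'd like to listen to"]
INFIXES = ["by", "sung by"]
SUFFIXES = ["", "please"]


def phrasings(title, artist):
    """Enumerate ways a user might request one song, one string per path."""
    paths = []
    for prefix, infix, suffix in product(PREFIXES, INFIXES, SUFFIXES):
        parts = [prefix, title, infix, artist, suffix]
        paths.append(" ".join(part for part in parts if part))
    return paths


paths = phrasings("radio gaga", "queen")
print(len(paths), "paths, for example:", paths[0])
```

A real search graph would share the common prefixes and suffixes among paths instead of storing every full string, which is what makes the encoding efficient.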
In another embodiment, the second language model is a statistical n-gram language model constructed in accordance with known techniques, such as those described in A. Stolcke, “SRILM: An Extensible Language Modeling Toolkit,” Proc. Intl. Conf. on Spoken Language Processing, Vol. 2, pp. 901-904 (2002).
Referring back to FIG. 4, once the current language model has been implemented (e.g., through construction of a second language model or update of the initial language model) as necessary in step 490, the method 400 returns to step 420 and generates one or more new hypotheses by decoding the user signal (received in step 415) using an updated wordgraph computed from the second language model produced in step 490. Thus, if a first data set query produces a list of words or phrases ranked by confidence scoring, and a second data set query using these listed words fails to produce many query results, the data set may be iteratively queried using progressively fewer words (e.g., by eliminating the lowest-confidence word with each query), and a subsequent language model can be constructed based on the union of the results obtained from all of the iterative queries.
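The iterative querying with progressively fewer words can be sketched as a back-off loop. The `search` callable, the `min_results` stopping rule, and the sample data are assumptions for illustration; the text only says that the lowest-confidence word may be dropped with each query and the results combined.

```python
def backoff_query(scored_words, search, min_results):
    """Query with fewer and fewer words, returning the union of all results.

    scored_words: list of (word, confidence) pairs.
    search: callable taking a list of words and returning matching items.
    min_results: hypothetical stopping rule; stop once this many items are found.
    """
    ranked = sorted(scored_words, key=lambda pair: pair[1], reverse=True)
    words = [word for word, _ in ranked]
    union = []
    while words:
        for item in search(words):
            if item not in union:
                union.append(item)
        if len(union) >= min_results:
            break
        words = words[:-1]  # eliminate the lowest-confidence word
    return union


SONGS = ["queen radio gaga", "queen bohemian rhapsody", "queen we will rock you"]


def search_songs(words):
    """Toy conjunctive search over the hypothetical song list."""
    return [song for song in SONGS if all(word in song.split() for word in words)]


found = backoff_query(
    [("queen", 0.9), ("radio", 0.8), ("data", 0.3)], search_songs, min_results=1
)
print(found)
```

In this run the first query fails because of the misrecognized word "data"; dropping it recovers the intended song without widening the query to every song by the artist.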
Alternatively, if the method 400 determines that the number of query results does not exceed the second predefined threshold, the method 400 proceeds to step 460 and constructs an updated language model and wordgraph from the query results. The method 400 thereby refines decoding of the user signal to specifically target the query results obtained in step 440. In one embodiment, the pronunciation of any new words (e.g., words not contained in the initial language model) is derived in accordance with Dedina et al. as described above.
Alternatively, if the second predefined threshold is not exceeded, the method 400 may attempt to increase the number of query results returned in order to increase the likelihood of finding a result that corresponds to the user signal. For example, the method 400 may query the data set for a second time, lowering the first pre-defined threshold so that a less restrictive set of high-confidence words is used in the second query.
In another embodiment, multiple alternative sets of high-confidence words may be used in multiple queries of the data set. For example, heuristic techniques may be implemented to expand the set of queries made to the data set. These techniques may be employed when it is expected that the information retrieved from a first data set query will be insufficient to generate a rich enough set of results to guarantee the presence of the item requested by the user in a subsequent language model. For example, where the user signal comprises a request for an address, a set of hypotheses (e.g., generated in accordance with step 420) may include “33”, “333” and “338” as potential street numbers (ranked by confidence scoring in that order) and “94025” and “94035” as potential zip codes (also ranked by confidence scoring in that order). If a combination of street number “33” and zip code “94025” fails to produce any results in a data set query, or if results are produced by a first data set query that suffer from contingent inaccuracies, heuristic techniques may be implemented to expand the query step (step 440) so that several queries in accordance with step 440 are performed. Each query is based on a different cross product of hypotheses (e.g., different combinations of street numbers and zip codes) from an N-best list of hypotheses based on the first data set query. A second data set query is constructed from the union of results of these several queries, e.g., (33, 94025), (33, 94035), (333, 94025), (333, 94035), (338, 94025) and (338, 94035).
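The expanded query set is the cross product of the ranked component hypotheses. The sketch below builds those pairs and merges the results of the individual lookups; the lookup table and its entries are hypothetical, and only the street numbers and zip codes come from the example in the text.

```python
from itertools import product


def expanded_queries(street_numbers, zip_codes):
    """Pair every street-number hypothesis with every zip-code hypothesis."""
    return list(product(street_numbers, zip_codes))


def union_of_results(queries, lookup):
    """Run each query and merge the results, keeping first-seen order."""
    merged = []
    for query in queries:
        for item in lookup(query):
            if item not in merged:
                merged.append(item)
    return merged


queries = expanded_queries(["33", "333", "338"], ["94025", "94035"])

# Hypothetical data set: only two of the six combinations exist.
TABLE = {
    ("333", "94025"): ["333 Ravenswood Avenue"],
    ("338", "94035"): ["338 Example Street"],
}
candidates = union_of_results(queries, lambda query: TABLE.get(query, []))
print(queries)
print(candidates)
```

Because the inputs are ranked by confidence, `product` yields the pairs in the same order as the list in the text, and the merged candidates feed the next language model.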
In step 470, the method 400 decodes the user signal using the updated wordgraph constructed in step 460. In one embodiment, decoding is accomplished by assigning probabilities to each phrase in the wordgraph, where a probability represents the likelihood that the phrase matches the user signal (e.g., is the phrase uttered by the user).
In step 480, the method 400 returns one or more results of the decoding performed in step 470. In one embodiment, the method 400 returns the phrase with the highest assigned probability. In another embodiment, the method 400 returns a plurality of phrases (e.g., the phrases with the ten highest assigned probabilities, or all phrases having an assigned probability that deviates from the highest probability by less than a given amount). In yet another embodiment, if more than one result is produced in step 470, the method 400 may perform additional decoding iterations in which more words from the “correct” hypothesis are identified, thereby progressively narrowing the search space to a single phrase matching the user signal.
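The result-selection policies of step 480 can be expressed as a small filter over scored phrases. The function name, parameter names, and sample probabilities below are hypothetical; the two policies shown are the top-N cut and the margin relative to the best score described in the text.

```python
def select_results(scored_phrases, top_n=None, margin=None):
    """Pick phrases to return from (phrase, probability) pairs.

    top_n: keep at most this many of the highest-probability phrases.
    margin: keep phrases whose probability is within this amount of the best.
    """
    ranked = sorted(scored_phrases, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    if margin is not None:
        best = ranked[0][1]
        ranked = [pair for pair in ranked if best - pair[1] < margin]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [phrase for phrase, _ in ranked]


DECODED = [
    ("radio gaga by queen", 0.70),
    ("radio gaga by queen please", 0.65),
    ("bohemian rhapsody by queen", 0.10),
]

best_only = select_results(DECODED, top_n=1)
near_best = select_results(DECODED, margin=0.1)
print(best_only, near_best)
```

When the margin policy returns more than one phrase, that is the situation in which the text suggests running further decoding iterations to narrow the choice.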
In one embodiment, the method 400 proceeds to step 492 and enables the user to select a location for the download or transmission of the results returned in step 480. For example, the user may use a cellular phone to initiate a request for data (e.g., a song), but may wish to have the requested data downloaded to a remote location, for example, a home computer. In one embodiment, other remote locations include a desktop computer, a laptop computer, a personal digital assistant (PDA), a wristwatch, a portable music player, a car stereo, a hi-fi/entertainment center, a television, a digital video recorder (DVR), or a cable or satellite set top box, among others. Once the method 400 has returned one or more results to the user as defined by the method 400's operating parameters, the method 400 terminates in step 495.
In one embodiment, the method 400 may be executed in its entirety at a single computing device. However, persons skilled in the art will appreciate that various steps of the method 400 may be executed at two or more separate computing devices in order to enhance the speed, scalability and/or availability of a system in which the method 400 is implemented. For instance, the method 400 may receive a signal from a user, in accordance with step 415, at a computing device that is local to the user. However, one or more of the steps subsequent to step 415 may be performed at one or more remote computing devices.
For example, decoding of the user signal (in accordance with step 420) may include, without limitation, a first, generic recognition pass and a second, more specialized recognition pass. The first and second recognition passes may be executed at a common server computer, or each recognition pass could be hosted at an individual server computer. Moreover, a plurality of server computers adapted for performing specialized recognition passes may be implemented to receive query results (e.g., obtained through step 440), so that a single server computer is not required to process all query results. This increases server availability, as well as the amount of information that may be stored at the server level, and reduces failure rate by providing alternatives in the event of failure of one or more servers.
Additionally, decoding step 420 may comprise both a local processing step and a remote processing step. For example, the local processing step may be executed at a local device (e.g., the device that directly receives the user signal) to process the user signal and extract features therefrom. Features extracted during the local processing step may then be transmitted over a network to a remote server for the remote processing step, which involves generating one or more hypotheses in response to the extracted features of the user signal. This approach reduces bandwidth use and demands on the remote server by transmitting only portions (e.g., extracted features) of the user signal for processing, rather than transmitting the entire user signal. Exemplary methods for performing step 420 in accordance with both local and remote processing steps are described in co-pending, commonly assigned U.S. patent application Ser. No. 10/033,772 (filed Dec. 28, 2001), which is herein incorporated by reference.
In further embodiments, database searches in accordance with step 440 may also be distributed over one or more remote computing devices. For example, once the method 400 selects one or more high-confidence words with which to query the relevant data set(s), the method 400 may transmit these high-confidence words over a network to one or more remotely stored databases. Database searches in accordance with step 440 may implement distributed and/or parallel search techniques, including those described in co-pending, commonly assigned U.S. patent application Ser. No. 10/242,285 (filed Sep. 12, 2002) and Ser. No. 10/399,807 (filed Apr. 23, 2003), both of which are herein incorporated by reference.
Thus, the present invention represents a significant advancement in the field of speech recognition. In one embodiment, the inventive method and apparatus are provided with a dynamic vocabulary that updates each time a word not present in an initial language model is spoken. The dynamic vocabulary enables the method and apparatus to progressively narrow a space in which results (e.g., matches) for a user signal are searched, thereby increasing the accuracy of results that are returned to the user.
Although various embodiments which incorporate the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings.

Claims (49)

US10/912,517 | 2001-09-28 | 2004-08-05 | Method and apparatus for speech recognition using a dynamic vocabulary | Expired - Lifetime | US7308404B2 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US10/912,517 (US7308404B2) | 2001-09-28 | 2004-08-05 | Method and apparatus for speech recognition using a dynamic vocabulary

Applications Claiming Priority (3)

Application Number | Priority Date | Filing Date | Title
US09/967,228 (US6996519B2) | 2001-09-28 | 2001-09-28 | Method and apparatus for performing relational speech recognition
US49276103P | 2003-08-05 | 2003-08-05 |
US10/912,517 (US7308404B2) | 2001-09-28 | 2004-08-05 | Method and apparatus for speech recognition using a dynamic vocabulary

Related Parent Applications (1)

Application Number | Title | Priority Date | Filing Date
US09/967,228 (Continuation-In-Part; US6996519B2) | Method and apparatus for performing relational speech recognition | 2001-09-28 | 2001-09-28

Publications (2)

Publication Number | Publication Date
US20050055210A1 | 2005-03-10
US7308404B2 | 2007-12-11

Family

ID=34228552

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
US10/912,517 (Expired - Lifetime; US7308404B2) | Method and apparatus for speech recognition using a dynamic vocabulary | 2001-09-28 | 2004-08-05

Country Status (1)

Country | Link
US (1) | US7308404B2 (en)

US12087308B2 (en)2010-01-182024-09-10Apple Inc.Intelligent automated assistant
US10705794B2 (en)2010-01-182020-07-07Apple Inc.Automatically adapting user interfaces for hands-free interaction
US10553209B2 (en)2010-01-182020-02-04Apple Inc.Systems and methods for hands-free notification summaries
US10276170B2 (en)2010-01-182019-04-30Apple Inc.Intelligent automated assistant
US11423886B2 (en)2010-01-182022-08-23Apple Inc.Task flow identification based on user intent
US9548050B2 (en)2010-01-182017-01-17Apple Inc.Intelligent automated assistant
US10679605B2 (en)2010-01-182020-06-09Apple Inc.Hands-free list-reading by intelligent automated assistant
US9318108B2 (en)2010-01-182016-04-19Apple Inc.Intelligent automated assistant
US10706841B2 (en)2010-01-182020-07-07Apple Inc.Task flow identification based on user intent
US10496753B2 (en)2010-01-182019-12-03Apple Inc.Automatically adapting user interfaces for hands-free interaction
US10607140B2 (en)2010-01-252020-03-31Newvaluexchange Ltd.Apparatuses, methods and systems for a digital conversation management platform
US10984327B2 (en)2010-01-252021-04-20New Valuexchange Ltd.Apparatuses, methods and systems for a digital conversation management platform
US10984326B2 (en)2010-01-252021-04-20Newvaluexchange Ltd.Apparatuses, methods and systems for a digital conversation management platform
US12307383B2 (en)2010-01-252025-05-20Newvaluexchange Global Ai LlpApparatuses, methods and systems for a digital conversation management platform
US10607141B2 (en)2010-01-252020-03-31Newvaluexchange Ltd.Apparatuses, methods and systems for a digital conversation management platform
US11410053B2 (en)2010-01-252022-08-09Newvaluexchange Ltd.Apparatuses, methods and systems for a digital conversation management platform
US20110184736A1 (en)*2010-01-262011-07-28Benjamin SlotznickAutomated method of recognizing inputted information items and selecting information items
US10049675B2 (en)2010-02-252018-08-14Apple Inc.User profiling for voice input processing
US9633660B2 (en)2010-02-252017-04-25Apple Inc.User profiling for voice input processing
US8527270B2 (en)2010-07-302013-09-03Sri InternationalMethod and apparatus for conducting an interactive dialogue
US9576570B2 (en)*2010-07-302017-02-21Sri InternationalMethod and apparatus for adding new vocabulary to interactive translation and dialogue systems
US20120029904A1 (en)*2010-07-302012-02-02Kristin PrecodaMethod and apparatus for adding new vocabulary to interactive translation and dialogue systems
US9262612B2 (en)2011-03-212016-02-16Apple Inc.Device access using voice authentication
US10102359B2 (en)2011-03-212018-10-16Apple Inc.Device access using voice authentication
US10726833B2 (en)2011-03-282020-07-28Nuance Communications, Inc.System and method for rapid customization of speech recognition models
US9679561B2 (en)*2011-03-282017-06-13Nuance Communications, Inc.System and method for rapid customization of speech recognition models
US9978363B2 (en)2011-03-282018-05-22Nuance Communications, Inc.System and method for rapid customization of speech recognition models
US20120253799A1 (en)*2011-03-282012-10-04At&T Intellectual Property I, L.P.System and method for rapid customization of speech recognition models
US10057736B2 (en)2011-06-032018-08-21Apple Inc.Active transport based notifications
US11120372B2 (en)2011-06-032021-09-14Apple Inc.Performing actions associated with task items that represent tasks to perform
US10706373B2 (en)2011-06-032020-07-07Apple Inc.Performing actions associated with task items that represent tasks to perform
US10241644B2 (en)2011-06-032019-03-26Apple Inc.Actionable reminder entries
US9798393B2 (en)2011-08-292017-10-24Apple Inc.Text correction processing
US10241752B2 (en)2011-09-302019-03-26Apple Inc.Interface for a virtual digital assistant
US10134385B2 (en)2012-03-022018-11-20Apple Inc.Systems and methods for name pronunciation
US9483461B2 (en)2012-03-062016-11-01Apple Inc.Handling speech synthesis of content for multiple languages
US9953088B2 (en)2012-05-142018-04-24Apple Inc.Crowd sourcing information to fulfill user requests
US10079014B2 (en)2012-06-082018-09-18Apple Inc.Name recognition system
US9495129B2 (en)2012-06-292016-11-15Apple Inc.Device, method, and user interface for voice-activated navigation and browsing of a document
US12032643B2 (en)2012-07-202024-07-09Veveo, Inc.Method of and system for inferring user intent in search input in a conversational interaction system
US12169514B2 (en)2012-07-312024-12-17Adeia Guides Inc.Methods and systems for supplementing media assets during fast-access playback operations
US11847151B2 (en)2012-07-312023-12-19Veveo, Inc.Disambiguating user intent in conversational interaction system for large corpus information retrieval
US11024297B2 (en)*2012-08-032021-06-01Veveo, Inc.Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US20190130899A1 (en)*2012-08-032019-05-02Veveo, Inc.Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US9799328B2 (en)*2012-08-032017-10-24Veveo, Inc.Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US20140039895A1 (en)*2012-08-032014-02-06Veveo, Inc.Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US10140982B2 (en)*2012-08-032018-11-27Veveo, Inc.Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval
US9576574B2 (en)2012-09-102017-02-21Apple Inc.Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en)2012-09-192018-05-15Apple Inc.Voice-based media searching
US20140088964A1 (en)*2012-09-252014-03-27Apple Inc.Exemplar-Based Latent Perceptual Modeling for Automatic Speech Recognition
US8935167B2 (en)*2012-09-252015-01-13Apple Inc.Exemplar-based latent perceptual modeling for automatic speech recognition
US9305545B2 (en)2013-03-132016-04-05Samsung Electronics Co., Ltd.Speech recognition vocabulary integration for classifying words to identify vocabulary application group
US9368114B2 (en)2013-03-142016-06-14Apple Inc.Context-sensitive handling of interruptions
US9922642B2 (en)2013-03-152018-03-20Apple Inc.Training an at least partial voice command system
US9697822B1 (en)2013-03-152017-07-04Apple Inc.System and method for updating an adaptive speech recognition model
US20210201932A1 (en)*2013-05-072021-07-01Veveo, Inc.Method of and system for real time feedback in an incremental speech input interface
US12169496B2 (en)2013-05-102024-12-17Adeia Guides Inc.Method and system for capturing and exploiting user intent in a conversational interaction based information retrieval system
US9966060B2 (en)2013-06-072018-05-08Apple Inc.System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620104B2 (en)2013-06-072017-04-11Apple Inc.System and method for user-specified pronunciation of words for speech synthesis and recognition
US9633674B2 (en)2013-06-072017-04-25Apple Inc.System and method for detecting errors in interactions with a voice-based digital assistant
US9582608B2 (en)2013-06-072017-02-28Apple Inc.Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US10657961B2 (en)2013-06-082020-05-19Apple Inc.Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en)2013-06-082018-05-08Apple Inc.Interpreting and acting upon commands that involve sharing information with remote devices
US10185542B2 (en)2013-06-092019-01-22Apple Inc.Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en)2013-06-092019-01-08Apple Inc.System and method for inferring user intent from speech inputs
US9300784B2 (en)2013-06-132016-03-29Apple Inc.System and method for emergency calls initiated by voice command
US11133008B2 (en)2014-05-302021-09-28Apple Inc.Reducing the need for manual start/end-pointing and trigger phrases
US11257504B2 (en)2014-05-302022-02-22Apple Inc.Intelligent assistant for home automation
US9842101B2 (en)2014-05-302017-12-12Apple Inc.Predictive conversion of language input
US10169329B2 (en)2014-05-302019-01-01Apple Inc.Exemplar-based natural language processing
US10170123B2 (en)2014-05-302019-01-01Apple Inc.Intelligent assistant for home automation
US10497365B2 (en)2014-05-302019-12-03Apple Inc.Multi-command single utterance input method
US9785630B2 (en)2014-05-302017-10-10Apple Inc.Text prediction using combined word N-gram and unigram language models
US9966065B2 (en)2014-05-302018-05-08Apple Inc.Multi-command single utterance input method
US9715875B2 (en)2014-05-302017-07-25Apple Inc.Reducing the need for manual start/end-pointing and trigger phrases
US10083690B2 (en)2014-05-302018-09-25Apple Inc.Better resolution when referencing to concepts
US10078631B2 (en)2014-05-302018-09-18Apple Inc.Entropy-guided text prediction using combined word and character n-gram language models
US9760559B2 (en)2014-05-302017-09-12Apple Inc.Predictive text input
US9668024B2 (en)2014-06-302017-05-30Apple Inc.Intelligent automated assistant for TV user interactions
US10659851B2 (en)2014-06-302020-05-19Apple Inc.Real-time digital assistant knowledge updates
US10904611B2 (en)2014-06-302021-01-26Apple Inc.Intelligent automated assistant for TV user interactions
US9338493B2 (en)2014-06-302016-05-10Apple Inc.Intelligent automated assistant for TV user interactions
US10446141B2 (en)2014-08-282019-10-15Apple Inc.Automatic speech recognition based on user feedback
US10431204B2 (en)2014-09-112019-10-01Apple Inc.Method and apparatus for discovering trending terms in speech requests
US9818400B2 (en)2014-09-112017-11-14Apple Inc.Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en)2014-09-122020-09-29Apple Inc.Dynamic thresholds for always listening speech trigger
US9886432B2 (en)2014-09-302018-02-06Apple Inc.Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9646609B2 (en)2014-09-302017-05-09Apple Inc.Caching apparatus for serving phonetic pronunciations
US9668121B2 (en)2014-09-302017-05-30Apple Inc.Social reminders
US10127911B2 (en)2014-09-302018-11-13Apple Inc.Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en)2014-09-302018-09-11Apple Inc.Providing an indication of the suitability of speech recognition
US9986419B2 (en)2014-09-302018-05-29Apple Inc.Social reminders
US11556230B2 (en)2014-12-022023-01-17Apple Inc.Data detection
US10552013B2 (en)2014-12-022020-02-04Apple Inc.Data detection
US9711141B2 (en)2014-12-092017-07-18Apple Inc.Disambiguating heteronyms in speech synthesis
US12346368B2 (en)2014-12-232025-07-01Adeia Guides Inc.Systems and methods for determining whether a negation statement applies to a current or past query
US11997176B2 (en)2015-01-302024-05-28Rovi Guides, Inc.Systems and methods for resolving ambiguous terms in social chatter based on a user profile
US11811889B2 (en)2015-01-302023-11-07Rovi Guides, Inc.Systems and methods for resolving ambiguous terms based on media asset schedule
US11843676B2 (en)2015-01-302023-12-12Rovi Guides, Inc.Systems and methods for resolving ambiguous terms based on user input
US11991257B2 (en)2015-01-302024-05-21Rovi Guides, Inc.Systems and methods for resolving ambiguous terms based on media asset chronology
US9865280B2 (en)2015-03-062018-01-09Apple Inc.Structured dictation using intelligent automated assistants
US9721566B2 (en)2015-03-082017-08-01Apple Inc.Competing devices responding to voice triggers
US10567477B2 (en)2015-03-082020-02-18Apple Inc.Virtual assistant continuity
US10311871B2 (en)2015-03-082019-06-04Apple Inc.Competing devices responding to voice triggers
US11087759B2 (en)2015-03-082021-08-10Apple Inc.Virtual assistant activation
US9886953B2 (en)2015-03-082018-02-06Apple Inc.Virtual assistant activation
US9899019B2 (en)2015-03-182018-02-20Apple Inc.Systems and methods for structured stem and suffix language models
US9842105B2 (en)2015-04-162017-12-12Apple Inc.Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en)2015-05-272018-09-25Apple Inc.Device voice control for selecting a displayed affordance
US10127220B2 (en)2015-06-042018-11-13Apple Inc.Language identification from short strings
US10356243B2 (en)2015-06-052019-07-16Apple Inc.Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en)2015-06-052018-10-16Apple Inc.Language input correction
US10186254B2 (en)2015-06-072019-01-22Apple Inc.Context-based endpoint detection
US10255907B2 (en)2015-06-072019-04-09Apple Inc.Automatic accent detection using acoustic models
US11025565B2 (en)2015-06-072021-06-01Apple Inc.Personalized prediction of responses for instant messaging
US10671428B2 (en)2015-09-082020-06-02Apple Inc.Distributed personal assistant
US10747498B2 (en)2015-09-082020-08-18Apple Inc.Zero latency digital assistant
US11500672B2 (en)2015-09-082022-11-15Apple Inc.Distributed personal assistant
US9697820B2 (en)2015-09-242017-07-04Apple Inc.Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en)2015-09-292021-05-18Apple Inc.Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en)2015-09-292019-07-30Apple Inc.Efficient word encoding for recurrent neural network language models
US11587559B2 (en)2015-09-302023-02-21Apple Inc.Intelligent device identification
US10691473B2 (en)2015-11-062020-06-23Apple Inc.Intelligent automated assistant in a messaging environment
US11526368B2 (en)2015-11-062022-12-13Apple Inc.Intelligent automated assistant in a messaging environment
US10049668B2 (en)2015-12-022018-08-14Apple Inc.Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en)2015-12-232019-03-05Apple Inc.Proactive assistance based on dialog communication between devices
US10446143B2 (en)2016-03-142019-10-15Apple Inc.Identification of voice inputs providing credentials
US9934775B2 (en)2016-05-262018-04-03Apple Inc.Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en)2016-06-032018-05-15Apple Inc.Privacy preserving distributed evaluation framework for embedded personalized systems
US10049663B2 (en)2016-06-082018-08-14Apple, Inc.Intelligent automated assistant for media exploration
US11069347B2 (en)2016-06-082021-07-20Apple Inc.Intelligent automated assistant for media exploration
US10354011B2 (en)2016-06-092019-07-16Apple Inc.Intelligent automated assistant in a home environment
US10509862B2 (en)2016-06-102019-12-17Apple Inc.Dynamic phrase expansion of language input
US10192552B2 (en)2016-06-102019-01-29Apple Inc.Digital assistant providing whispered speech
US10733993B2 (en)2016-06-102020-08-04Apple Inc.Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en)2016-06-102021-06-15Apple Inc.Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en)2016-06-102018-09-04Apple Inc.Multilingual word prediction
US10490187B2 (en)2016-06-102019-11-26Apple Inc.Digital assistant providing automated status report
US10269345B2 (en)2016-06-112019-04-23Apple Inc.Intelligent task discovery
US10521466B2 (en)2016-06-112019-12-31Apple Inc.Data driven natural language event detection and classification
US10089072B2 (en)2016-06-112018-10-02Apple Inc.Intelligent device arbitration and control
US10297253B2 (en)2016-06-112019-05-21Apple Inc.Application integration with a digital assistant
US11152002B2 (en)2016-06-112021-10-19Apple Inc.Application integration with a digital assistant
US10043516B2 (en)2016-09-232018-08-07Apple Inc.Intelligent automated assistant
US10553215B2 (en)2016-09-232020-02-04Apple Inc.Intelligent automated assistant
US11281993B2 (en)2016-12-052022-03-22Apple Inc.Model and ensemble compression for metric learning
US10593346B2 (en)2016-12-222020-03-17Apple Inc.Rank-reduced token representation for automatic speech recognition
US10332518B2 (en)2017-05-092019-06-25Apple Inc.User interface for correcting recognition errors
US10755703B2 (en)2017-05-112020-08-25Apple Inc.Offline personal assistant
US10791176B2 (en)2017-05-122020-09-29Apple Inc.Synchronization and task delegation of a digital assistant
US11405466B2 (en)2017-05-122022-08-02Apple Inc.Synchronization and task delegation of a digital assistant
US10789945B2 (en)2017-05-122020-09-29Apple Inc.Low-latency intelligent automated assistant
US10410637B2 (en)2017-05-122019-09-10Apple Inc.User-specific acoustic models
US10810274B2 (en)2017-05-152020-10-20Apple Inc.Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en)2017-05-152019-11-19Apple Inc.Hierarchical belief states for digital assistants
US11217255B2 (en)2017-05-162022-01-04Apple Inc.Far-field extension for digital assistant services
US11545142B2 (en)*2019-05-102023-01-03Google LlcUsing context information with end-to-end models for speech recognition
US20200357388A1 (en)*2019-05-102020-11-12Google LlcUsing Context Information With End-to-End Models for Speech Recognition
US11551695B1 (en)*2020-05-132023-01-10Amazon Technologies, Inc.Model training system for custom speech-to-text models
US12118984B2 (en)2020-11-112024-10-15Rovi Guides, Inc.Systems and methods to resolve conflicts in conversations

Also Published As

Publication number | Publication date
US20050055210A1 (en) | 2005-03-10

Similar Documents

Publication | Publication Date | Title
US7308404B2 (en) | Method and apparatus for speech recognition using a dynamic vocabulary
US11676575B2 (en) | On-device learning in a hybrid speech processing system
CN111710333B (en) | Method and system for generating speech transcription
US11016968B1 (en) | Mutation architecture for contextual data aggregator
US7831428B2 (en) | Speech index pruning
EP2259252B1 (en) | Speech recognition method for selecting a combination of list elements via a speech input
EP2453436B1 (en) | Automatic language model update
US9805722B2 (en) | Interactive speech recognition system
US10170107B1 (en) | Extendable label recognition of linguistic input
US7574356B2 (en) | System and method for spelling recognition using speech and non-speech input
JP5241840B2 (en) | Computer-implemented method and information retrieval system for indexing and retrieving documents in a database
US7831425B2 (en) | Time-anchored posterior indexing of speech
US20110071827A1 (en) | Generation and selection of speech recognition grammars for conducting searches
US20060265222A1 (en) | Method and apparatus for indexing speech
US10872601B1 (en) | Natural language processing
EP1470547A2 (en) | System and method for a spoken language interface to a large database of changing records
KR102851303B1 (en) | Systems and methods for adaptive proper name entity recognition and understanding
US20050004799A1 (en) | System and method for a spoken language interface to a large database of changing records
CN108351876A (en) | System and method for point of interest identification
JP5326169B2 (en) | Speech data retrieval system and speech data retrieval method
JP5360414B2 (en) | Keyword extraction model learning system, method and program
US11430434B1 (en) | Intelligent privacy protection mediation
US8055693B2 (en) | Method for retrieving items represented by particles from an information database
Wang et al. | Voice search
Wang | Mandarin spoken document retrieval based on syllable lattice matching

Legal Events

Date | Code | Title | Description
AS | Assignment

Owner name: SRI INTERNATIONAL, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: VENKATARAMAN, ANAND; FRANCO, HORACIO E.; BERCOW, DOUGLAS A.; REEL/FRAME: 015416/0483; SIGNING DATES FROM 20041012 TO 20041102

FEPP | Fee payment procedure

Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF | Information on status: patent grant

Free format text: PATENTED CASE

AS | Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: SRI INTERNATIONAL; REEL/FRAME: 021679/0102

Effective date: 20080623

FPAY | Fee payment

Year of fee payment: 4

FPAY | Fee payment

Year of fee payment: 8

MAFP | Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12

