US20040148171A1 - Method and apparatus for speech synthesis without prosody modification - Google Patents


Info

Publication number
US20040148171A1
US20040148171A1
Authority
US
United States
Prior art keywords
speech
unit
indication
context
speech unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/662,985
Inventor
Min Chu
Hu Peng
Yong Zhao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US10/662,985
Assigned to MICROSOFT CORPORATION. Assignors: PENG, HU; CHU, MIN; ZHAO, YONG
Publication of US20040148171A1
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignor: MICROSOFT CORPORATION
Status: Abandoned

Abstract

A speech synthesizer is provided that concatenates stored samples of speech units without modifying the prosody of the samples. With a carefully designed training speech corpus, the present invention achieves a high level of naturalness in synthesized speech by storing samples based on the prosodic and phonetic context in which they occur. In particular, some embodiments of the present invention limit the training text to those sentences that will produce the most frequent sets of prosodic contexts for each speech unit. Further embodiments of the present invention also provide a multi-tier selection mechanism for selecting a set of samples that will produce the most natural-sounding speech.
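As a toy illustration of the pipeline the abstract describes, the sketch below indexes stored samples by the prosodic context in which they were recorded, looks each input unit up by its context, and joins the matches unchanged. The dict-based index, the unit and context names, and the string "waveforms" are all invented for illustration; they are not the patent's implementation.

```python
# Toy index: (speech unit, prosodic context) -> stored sample.
# Names and the dict representation are illustrative assumptions.
segment_index = {
    ("hel", "phrase-initial"): "hel@init.wav",
    ("lo",  "phrase-final"):   "lo@final.wav",
}

def synthesize(units_with_context):
    """Look up one stored sample per unit and join them unchanged."""
    segments = [segment_index[(unit, ctx)] for unit, ctx in units_with_context]
    # Concatenation without prosody modification: samples are joined as-is.
    return "+".join(segments)

print(synthesize([("hel", "phrase-initial"), ("lo", "phrase-final")]))
# -> hel@init.wav+lo@final.wav
```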

Description

Claims (30)

What is claimed is:
1. A method for synthesizing speech, the method comprising:
generating a training context vector for each of a set of training speech units in a training speech corpus, each training context vector indicating the prosodic context of a training speech unit in the training speech corpus;
indexing a set of speech segments associated with a set of training speech units based on the context vectors for the training speech units;
generating an input context vector for each of a set of input speech units in an input text, each input context vector indicating the prosodic context of an input speech unit in the input text;
using the input context vectors to find a speech segment for each input speech unit; and
concatenating the found speech segments to form a synthesized speech signal.
2. The method of claim 1 wherein each context vector comprises a position-in-phrase coordinate indicating the position of the speech unit in a phrase.
3. The method of claim 1 wherein each context vector comprises a position-in-word coordinate indicating the position of the speech unit in a word.
4. The method of claim 1 wherein each context vector comprises a left phonetic coordinate indicating a category for the phoneme to the left of the speech unit.
5. The method of claim 1 wherein each context vector comprises a right phonetic coordinate indicating a category for the phoneme to the right of the speech unit.
6. The method of claim 1 wherein each context vector comprises a left tonal coordinate indicating a category for the tone of the speech unit to the left of the speech unit.
7. The method of claim 1 wherein each context vector comprises a right tonal coordinate indicating a category for the tone of the speech unit to the right of the speech unit.
8. The method of claim 1 wherein each context vector comprises a coordinate indicating a coupling degree of pitch, duration and/or energy with a neighboring unit.
9. The method of claim 1 wherein each context vector comprises a coordinate indicating a level of stress of a speech unit.
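Claims 2 through 9 enumerate the coordinates of the context vector. One possible encoding, with field names and integer category codes invented purely for illustration, is:

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class ContextVector:
    position_in_phrase: int    # claim 2: position of the unit in its phrase
    position_in_word: int      # claim 3: position of the unit in its word
    left_phone_category: int   # claim 4: category of the preceding phoneme
    right_phone_category: int  # claim 5: category of the following phoneme
    left_tone_category: int    # claim 6: tone category of the preceding unit
    right_tone_category: int   # claim 7: tone category of the following unit
    coupling_degree: int       # claim 8: pitch/duration/energy coupling
    stress_level: int          # claim 9: level of stress of the unit

v = ContextVector(0, 1, 3, 2, 1, 4, 0, 2)
print(astuple(v))  # the tuple used to index and look up speech segments
```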
10. The method of claim 1 wherein indexing a set of speech segments comprises generating a decision tree based on the training context vectors.
11. The method of claim 10 wherein using the input context vectors to find a speech segment comprises searching the decision tree using the input context vector.
12. The method of claim 11 wherein searching the decision tree comprises:
identifying a leaf in the tree for each input context vector, each leaf comprising at least one candidate speech segment; and
selecting one candidate speech segment in each leaf node, wherein if there is more than one candidate speech segment in the node, the selection is based on a cost function.
13. The method of claim 12 wherein the cost function comprises a distance between the input context vector and a training context vector associated with a speech segment.
14. The method of claim 13 wherein the cost function further comprises a smoothness cost that is based on a candidate speech segment of at least one neighboring speech unit.
15. The method of claim 14 wherein the smoothness cost gives preference to selecting a series of speech segments for a series of input context vectors if the series of speech segments occurred in series in the training speech corpus.
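Claims 10 through 15 describe indexing segments under decision-tree leaves and choosing among a leaf's candidates with a cost function combining context distance and a smoothness term. A runnable toy, in which the leaf contents, the Hamming-style distance, and the unit smoothness penalty are all assumptions, might look like:

```python
def distance(a, b):
    """Claim 13: a distance between input and training context vectors."""
    return sum(x != y for x, y in zip(a, b))

def select(leaf, input_vec, prev_segment, corpus_pairs):
    """Pick the candidate in a leaf minimizing distance + smoothness cost."""
    def cost(cand):
        train_vec, seg = cand
        # Claim 15: prefer a segment that followed the previous segment
        # in the training corpus (smoothness cost 0 instead of 1).
        smooth = 0 if (prev_segment, seg) in corpus_pairs else 1
        return distance(input_vec, train_vec) + smooth
    return min(leaf, key=cost)[1]

# Claim 12: a leaf holds candidates as (training context vector, segment id).
leaf = [((0, 1, 2), "segA"), ((0, 1, 1), "segB")]
corpus_pairs = {("seg0", "segB")}    # segB followed seg0 in the corpus
print(select(leaf, (0, 1, 1), "seg0", corpus_pairs))
# -> segB  (zero distance, zero smoothness cost)
```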
16. The method of claim 1 wherein the context vector comprises one or more higher order coordinates being combinations of at least two factors from a set of factors including:
an indication of a position of a speech unit in a phrase;
an indication of a position of a speech unit in a word;
an indication of a category for a phoneme preceding a speech unit;
an indication of a category for a phoneme following a speech unit;
an indication of a category for tonal identity of the current speech unit;
an indication of a category for tonal identity of a preceding speech unit;
an indication of a category for tonal identity of a following speech unit;
an indication of a level of stress of a speech unit;
an indication of a coupling degree of pitch, duration and/or energy with a neighboring unit; and
an indication of a degree of spectral mismatch with a neighboring speech unit.
17. A method of selecting sentences for reading into a training speech corpus used in speech synthesis, the method comprising:
identifying a set of prosodic context information for each of a set of speech units;
determining a frequency of occurrence for each distinct context vector that appears in a very large text corpus;
using the frequency of occurrence of the context vectors to identify a list of necessary context vectors; and
selecting sentences in the large text corpus for reading into the training speech corpus, each selected sentence containing at least one necessary context vector.
18. The method of claim 17 wherein identifying a collection of prosodic context information sets as necessary context information sets comprises:
determining the frequency of occurrence of each prosodic context information set across a very large text corpus; and
identifying a collection of prosodic context information sets as necessary context information sets based on their frequency of occurrence.
19. The method of claim 18 wherein identifying a collection of prosodic context information sets as necessary context information sets further comprises:
sorting the context information sets by their frequency of occurrence in decreasing order;
determining a threshold, F, for accumulative frequency of top context vectors; and
selecting the top context vectors whose accumulative frequency is not smaller than F for each speech unit as necessary prosodic context information sets.
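The selection rule of claims 18 and 19 can be sketched as follows: sort context vectors by frequency of occurrence in the large text corpus and keep the smallest top set whose accumulative frequency reaches the threshold F. The counts and threshold value below are invented for illustration.

```python
def necessary_contexts(freq, F):
    """Keep top contexts until their cumulative relative frequency >= F."""
    total = sum(freq.values())
    kept, acc = [], 0.0
    # Claim 19: sort context vectors by frequency in decreasing order.
    for ctx, n in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(ctx)
        acc += n / total
        if acc >= F:          # accumulative frequency not smaller than F
            break
    return kept

freq = {"ctxA": 50, "ctxB": 30, "ctxC": 15, "ctxD": 5}
print(necessary_contexts(freq, 0.8))  # -> ['ctxA', 'ctxB']
```

Sentences covering at least one of the kept ("necessary") contexts would then be selected for reading into the training speech corpus, as claim 17 recites.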
20. The method of claim 17 further comprising indexing only those speech segments that are associated with sentences in the smaller training text and wherein indexing comprises indexing using a decision tree.
21. The method of claim 20 wherein indexing further comprises indexing the speech segments in the decision tree based on information in the context information sets.
22. The method of claim 21 wherein the decision tree comprises leaf nodes and at least one leaf node comprises at least two speech segments for the same speech unit.
23. A method of selecting speech segments for concatenative speech synthesis, the method comprising:
parsing an input text into speech units;
identifying context information for each speech unit based on its location in the input text and at least one neighboring speech unit;
identifying a set of candidate speech segments for each speech unit based on the context information; and
identifying a sequence of speech segments from the candidate speech segments based in part on a smoothness cost between the speech segments.
24. The method of claim 23 wherein identifying a set of candidate speech segments for a speech unit comprises applying the context information for a speech unit to a decision tree to identify a leaf node containing candidate speech segments for the speech unit.
25. The method of claim 24 wherein identifying a set of candidate speech segments further comprises pruning some speech segments from a leaf node based on differences between the context information of the speech unit from the input text and context information associated with the speech segments.
26. The method of claim 23 wherein identifying a sequence of speech segments comprises using a smoothness cost that is based on whether two neighboring candidate speech segments appeared next to each other in a training corpus.
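Claims 23 through 26 describe picking one segment per unit so that the whole sequence minimizes a smoothness cost rewarding segments that were adjacent in the training corpus. A dynamic-programming sketch, with invented candidate sets and adjacency pairs (the patent does not mandate this exact search), is:

```python
def best_sequence(candidates, adjacent):
    """candidates: per-unit lists of segment ids; adjacent: corpus pairs.
    Transition cost is 0 if the pair co-occurred in training, else 1."""
    paths = {c: (0, [c]) for c in candidates[0]}
    for layer in candidates[1:]:
        new = {}
        for c in layer:
            # Best predecessor path, scored by its cost plus this transition.
            prev = min(paths.values(),
                       key=lambda cp: cp[0] + (0 if (cp[1][-1], c) in adjacent else 1))
            step = 0 if (prev[1][-1], c) in adjacent else 1
            new[c] = (prev[0] + step, prev[1] + [c])
        paths = new
    return min(paths.values())[1]   # cheapest complete path

candidates = [["a1", "a2"], ["b1", "b2"], ["c1"]]
adjacent = {("a2", "b1"), ("b1", "c1")}   # pairs adjacent in the corpus
print(best_sequence(candidates, adjacent))  # -> ['a2', 'b1', 'c1']
```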
27. The method of claim 23 wherein identifying a sequence of speech segments comprises using an objective measure comprising one or more first order components from a set of factors comprising:
an indication of a position of a speech unit in a phrase;
an indication of a position of a speech unit in a word;
an indication of a category for a phoneme preceding a speech unit;
an indication of a category for a phoneme following a speech unit;
an indication of a category for tonal identity of the current speech unit;
an indication of a category for tonal identity of a preceding speech unit;
an indication of a category for tonal identity of a following speech unit;
an indication of a level of stress of a speech unit;
an indication of a coupling degree of pitch, duration and/or energy with a neighboring unit; and
an indication of a degree of spectral mismatch with a neighboring speech unit.
28. The method of claim 23 wherein identifying a sequence of speech segments comprises using an objective measure comprising one or more higher order components being combinations of at least two factors from a set of factors including:
an indication of a position of a speech unit in a phrase;
an indication of a position of a speech unit in a word;
an indication of a category for a phoneme preceding a speech unit;
an indication of a category for a phoneme following a speech unit;
an indication of a category for tonal identity of the current speech unit;
an indication of a category for tonal identity of a preceding speech unit;
an indication of a category for tonal identity of a following speech unit;
an indication of a level of stress of a speech unit;
an indication of a coupling degree of pitch, duration and/or energy with a neighboring unit; and
an indication of a degree of spectral mismatch with a neighboring speech unit.
29. The method of claim 24 wherein identifying a sequence of speech segments further comprises identifying the sequence based in part on differences between context information for the speech unit of the input text and context information associated with a candidate speech segment.
30. A computer-readable medium having computer executable instructions for synthesizing speech from speech segments based on speech units found in an input text, the speech being synthesized through a method comprising steps of:
identifying context information for each speech unit based on the prosodic structure of the input text;
identifying a set of candidate speech segments for each speech unit based on the context information;
identifying a sequence of speech segments from the candidate speech segments;
concatenating the sequence of speech segments without modifying the prosody of the speech segments to form the synthesized speech.
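Claim 30's final step, concatenation without prosody modification, amounts to joining the selected samples verbatim rather than warping each one to a target pitch and duration as PSOLA-style synthesizers do. In the toy below, lists of floats stand in for real audio buffers:

```python
def concatenate(segments):
    """Join stored samples verbatim: no pitch or duration adjustment."""
    out = []
    for seg in segments:
        out.extend(seg)    # samples copied unchanged: prosody preserved
    return out

print(concatenate([[0.1, 0.2], [0.3], [0.4, 0.5]]))
# -> [0.1, 0.2, 0.3, 0.4, 0.5]
```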
US10/662,985 | 2000-12-04 | 2003-09-15 | Method and apparatus for speech synthesis without prosody modification | Abandoned | US20040148171A1 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US10/662,985 US20040148171A1 (en) | 2000-12-04 | 2003-09-15 | Method and apparatus for speech synthesis without prosody modification

Applications Claiming Priority (3)

Application Number | Priority Date | Filing Date | Title
US25116700P | 2000-12-04 | 2000-12-04
US09/850,527 US6978239B2 (en) | 2000-12-04 | 2001-05-07 | Method and apparatus for speech synthesis without prosody modification
US10/662,985 US20040148171A1 (en) | 2000-12-04 | 2003-09-15 | Method and apparatus for speech synthesis without prosody modification

Related Parent Applications (1)

Application Number | Title | Priority Date | Filing Date
US09/850,527 (Continuation-In-Part) US6978239B2 (en) | Method and apparatus for speech synthesis without prosody modification | 2000-12-04 | 2001-05-07

Publications (1)

Publication Number | Publication Date
US20040148171A1 (en) | 2004-07-29

Family

ID=26941450

Family Applications (3)

Application NumberTitlePriority DateFiling Date
US09/850,527Expired - Fee RelatedUS6978239B2 (en)2000-12-042001-05-07Method and apparatus for speech synthesis without prosody modification
US10/662,985AbandonedUS20040148171A1 (en)2000-12-042003-09-15Method and apparatus for speech synthesis without prosody modification
US11/030,208Expired - Fee RelatedUS7127396B2 (en)2000-12-042005-01-06Method and apparatus for speech synthesis without prosody modification

Family Applications Before (1)

Application NumberTitlePriority DateFiling Date
US09/850,527Expired - Fee RelatedUS6978239B2 (en)2000-12-042001-05-07Method and apparatus for speech synthesis without prosody modification

Family Applications After (1)

Application NumberTitlePriority DateFiling Date
US11/030,208Expired - Fee RelatedUS7127396B2 (en)2000-12-042005-01-06Method and apparatus for speech synthesis without prosody modification

Country Status (4)

Country | Link
US (3): US6978239B2 (en)
EP (1): EP1213705B1 (en)
AT (1): ATE354155T1 (en)
DE (1): DE60126564T2 (en)

US10635863B2 (en)2017-10-302020-04-28Sdl Inc.Fragment recall and adaptive automated translation
CN107945786B (en)*2017-11-272021-05-25北京百度网讯科技有限公司 Speech synthesis method and apparatus
US10817676B2 (en)2017-12-272020-10-27Sdl Inc.Intelligent routing services and systems
US11256867B2 (en)2018-10-092022-02-22Sdl Inc.Systems and methods of machine learning for digital assets and message creation
CN109754778B (en)*2019-01-172023-05-30平安科技(深圳)有限公司Text speech synthesis method and device and computer equipment
KR102637341B1 (en)*2019-10-152024-02-16삼성전자주식회사Method and apparatus for generating speech
US12314300B1 (en)*2023-12-282025-05-27Open Text Inc.Methods and systems of content integration for generative artificial intelligence

Citations (35)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US4718094A (en)* | 1984-11-19 | 1988-01-05 | International Business Machines Corp. | Speech recognition system
US4979216A (en)* | 1989-02-17 | 1990-12-18 | Malsheen Bathsheba J | Text to speech synthesis system and method using context dependent vowel allophones
US5146405A (en)* | 1988-02-05 | 1992-09-08 | At&T Bell Laboratories | Methods for part-of-speech determination and usage
US5384893A (en)* | 1992-09-23 | 1995-01-24 | Emerson & Stern Associates, Inc. | Method and apparatus for speech synthesis based on prosodic analysis
US5440481A (en)* | 1992-10-28 | 1995-08-08 | The United States Of America As Represented By The Secretary Of The Navy | System and method for database tomography
US5592585A (en)* | 1995-01-26 | 1997-01-07 | Lernout & Hauspie Speech Products N.V. | Method for electronically generating a spoken message
US5715367A (en)* | 1995-01-23 | 1998-02-03 | Dragon Systems, Inc. | Apparatuses and methods for developing and using models for speech recognition
US5732395A (en)* | 1993-03-19 | 1998-03-24 | Nynex Science & Technology | Methods for controlling the generation of speech from text representing names and addresses
US5839105A (en)* | 1995-11-30 | 1998-11-17 | Atr Interpreting Telecommunications Research Laboratories | Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood
US5857169A (en)* | 1995-08-28 | 1999-01-05 | U.S. Philips Corporation | Method and system for pattern recognition based on tree organized probability densities
US5905972A (en)* | 1996-09-30 | 1999-05-18 | Microsoft Corporation | Prosodic databases holding fundamental frequency templates for use in speech synthesis
US5912989A (en)* | 1993-06-03 | 1999-06-15 | Nec Corporation | Pattern recognition with a tree structure used for reference pattern feature vectors or for HMM
US5933806A (en)* | 1995-08-28 | 1999-08-03 | U.S. Philips Corporation | Method and system for pattern recognition based on dynamically constructing a subset of reference vectors
US5937422A (en)* | 1997-04-15 | 1999-08-10 | The United States Of America As Represented By The National Security Agency | Automatically generating a topic description for text and searching and sorting text by topic using the same
US6064960A (en)* | 1997-12-18 | 2000-05-16 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes
US6076060A (en)* | 1998-05-01 | 2000-06-13 | Compaq Computer Corporation | Computer method and apparatus for translating text to sound
US6101470A (en)* | 1998-05-26 | 2000-08-08 | International Business Machines Corporation | Methods for generating pitch and duration contours in a text to speech system
US6141642A (en)* | 1997-10-16 | 2000-10-31 | Samsung Electronics Co., Ltd. | Text-to-speech apparatus and method for processing multiple languages
US6151576A (en)* | 1998-08-11 | 2000-11-21 | Adobe Systems Incorporated | Mixing digitized speech and text using reliability indices
US6172675B1 (en)* | 1996-12-05 | 2001-01-09 | Interval Research Corporation | Indirect manipulation of data using temporally related data, with particular application to manipulation of audio or audiovisual data
US6185533B1 (en)* | 1999-03-15 | 2001-02-06 | Matsushita Electric Industrial Co., Ltd. | Generation and synthesis of prosody templates
US6230131B1 (en)* | 1998-04-29 | 2001-05-08 | Matsushita Electric Industrial Co., Ltd. | Method for generating spelling-to-pronunciation decision tree
US6366883B1 (en)* | 1996-05-15 | 2002-04-02 | Atr Interpreting Telecommunications | Concatenation of speech segments by use of a speech synthesizer
US6401060B1 (en)* | 1998-06-25 | 2002-06-04 | Microsoft Corporation | Method for typographical detection and replacement in Japanese text
US20020072908A1 (en)* | 2000-10-19 | 2002-06-13 | Case Eliot M. | System and method for converting text-to-voice
US20020152073A1 (en)* | 2000-09-29 | 2002-10-17 | Demoortel Jan | Corpus-based prosody translation system
US6499014B1 (en)* | 1999-04-23 | 2002-12-24 | Oki Electric Industry Co., Ltd. | Speech synthesis apparatus
US6505158B1 (en)* | 2000-07-05 | 2003-01-07 | At&T Corp. | Synthesis-based pre-selection of suitable units for concatenative speech
US20030208355A1 (en)* | 2000-05-31 | 2003-11-06 | Stylianou Ioannis G. | Stochastic modeling of spectral adjustment for high quality pitch modification
US6665641B1 (en)* | 1998-11-13 | 2003-12-16 | Scansoft, Inc. | Speech synthesis using concatenation of speech waveforms
US6708152B2 (en)* | 1999-12-30 | 2004-03-16 | Nokia Mobile Phones Limited | User interface for text to speech conversion
US6751592B1 (en)* | 1999-01-12 | 2004-06-15 | Kabushiki Kaisha Toshiba | Speech synthesizing apparatus, and recording medium that stores text-to-speech conversion program and can be read mechanically
US6829578B1 (en)* | 1999-11-11 | 2004-12-07 | Koninklijke Philips Electronics, N.V. | Tone features for speech recognition
US6978239B2 (en)* | 2000-12-04 | 2005-12-20 | Microsoft Corporation | Method and apparatus for speech synthesis without prosody modification
US7010489B1 (en)* | 2000-03-09 | 2006-03-07 | International Business Machines Corporation | Method for guiding text-to-speech output timing using speech recognition markers

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2000075878A (en) | 1998-08-31 | 2000-03-14 | Canon Inc | Speech synthesis apparatus and method, and storage medium
US6871178B2 (en)* | 2000-10-19 | 2005-03-22 | Qwest Communications International, Inc. | System and method for converting text-to-voice

Patent Citations (37)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US4718094A (en)* | 1984-11-19 | 1988-01-05 | International Business Machines Corp. | Speech recognition system
US5146405A (en)* | 1988-02-05 | 1992-09-08 | At&T Bell Laboratories | Methods for part-of-speech determination and usage
US4979216A (en)* | 1989-02-17 | 1990-12-18 | Malsheen Bathsheba J | Text to speech synthesis system and method using context dependent vowel allophones
US5384893A (en)* | 1992-09-23 | 1995-01-24 | Emerson & Stern Associates, Inc. | Method and apparatus for speech synthesis based on prosodic analysis
US5440481A (en)* | 1992-10-28 | 1995-08-08 | The United States Of America As Represented By The Secretary Of The Navy | System and method for database tomography
US5890117A (en)* | 1993-03-19 | 1999-03-30 | Nynex Science & Technology, Inc. | Automated voice synthesis from text having a restricted known informational content
US5732395A (en)* | 1993-03-19 | 1998-03-24 | Nynex Science & Technology | Methods for controlling the generation of speech from text representing names and addresses
US5912989A (en)* | 1993-06-03 | 1999-06-15 | Nec Corporation | Pattern recognition with a tree structure used for reference pattern feature vectors or for HMM
US5715367A (en)* | 1995-01-23 | 1998-02-03 | Dragon Systems, Inc. | Apparatuses and methods for developing and using models for speech recognition
US5592585A (en)* | 1995-01-26 | 1997-01-07 | Lernout & Hauspie Speech Products N.V. | Method for electronically generating a spoken message
US5727120A (en)* | 1995-01-26 | 1998-03-10 | Lernout & Hauspie Speech Products N.V. | Apparatus for electronically generating a spoken message
US5857169A (en)* | 1995-08-28 | 1999-01-05 | U.S. Philips Corporation | Method and system for pattern recognition based on tree organized probability densities
US5933806A (en)* | 1995-08-28 | 1999-08-03 | U.S. Philips Corporation | Method and system for pattern recognition based on dynamically constructing a subset of reference vectors
US5839105A (en)* | 1995-11-30 | 1998-11-17 | Atr Interpreting Telecommunications Research Laboratories | Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood
US6366883B1 (en)* | 1996-05-15 | 2002-04-02 | Atr Interpreting Telecommunications | Concatenation of speech segments by use of a speech synthesizer
US5905972A (en)* | 1996-09-30 | 1999-05-18 | Microsoft Corporation | Prosodic databases holding fundamental frequency templates for use in speech synthesis
US6172675B1 (en)* | 1996-12-05 | 2001-01-09 | Interval Research Corporation | Indirect manipulation of data using temporally related data, with particular application to manipulation of audio or audiovisual data
US5937422A (en)* | 1997-04-15 | 1999-08-10 | The United States Of America As Represented By The National Security Agency | Automatically generating a topic description for text and searching and sorting text by topic using the same
US6141642A (en)* | 1997-10-16 | 2000-10-31 | Samsung Electronics Co., Ltd. | Text-to-speech apparatus and method for processing multiple languages
US6064960A (en)* | 1997-12-18 | 2000-05-16 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes
US6230131B1 (en)* | 1998-04-29 | 2001-05-08 | Matsushita Electric Industrial Co., Ltd. | Method for generating spelling-to-pronunciation decision tree
US6076060A (en)* | 1998-05-01 | 2000-06-13 | Compaq Computer Corporation | Computer method and apparatus for translating text to sound
US6101470A (en)* | 1998-05-26 | 2000-08-08 | International Business Machines Corporation | Methods for generating pitch and duration contours in a text to speech system
US6401060B1 (en)* | 1998-06-25 | 2002-06-04 | Microsoft Corporation | Method for typographical detection and replacement in Japanese text
US6151576A (en)* | 1998-08-11 | 2000-11-21 | Adobe Systems Incorporated | Mixing digitized speech and text using reliability indices
US6665641B1 (en)* | 1998-11-13 | 2003-12-16 | Scansoft, Inc. | Speech synthesis using concatenation of speech waveforms
US6751592B1 (en)* | 1999-01-12 | 2004-06-15 | Kabushiki Kaisha Toshiba | Speech synthesizing apparatus, and recording medium that stores text-to-speech conversion program and can be read mechanically
US6185533B1 (en)* | 1999-03-15 | 2001-02-06 | Matsushita Electric Industrial Co., Ltd. | Generation and synthesis of prosody templates
US6499014B1 (en)* | 1999-04-23 | 2002-12-24 | Oki Electric Industry Co., Ltd. | Speech synthesis apparatus
US6829578B1 (en)* | 1999-11-11 | 2004-12-07 | Koninklijke Philips Electronics, N.V. | Tone features for speech recognition
US6708152B2 (en)* | 1999-12-30 | 2004-03-16 | Nokia Mobile Phones Limited | User interface for text to speech conversion
US7010489B1 (en)* | 2000-03-09 | 2006-03-07 | International Business Machines Corporation | Method for guiding text-to-speech output timing using speech recognition markers
US20030208355A1 (en)* | 2000-05-31 | 2003-11-06 | Stylianou Ioannis G. | Stochastic modeling of spectral adjustment for high quality pitch modification
US6505158B1 (en)* | 2000-07-05 | 2003-01-07 | At&T Corp. | Synthesis-based pre-selection of suitable units for concatenative speech
US20020152073A1 (en)* | 2000-09-29 | 2002-10-17 | Demoortel Jan | Corpus-based prosody translation system
US20020072908A1 (en)* | 2000-10-19 | 2002-06-13 | Case Eliot M. | System and method for converting text-to-voice
US6978239B2 (en)* | 2000-12-04 | 2005-12-20 | Microsoft Corporation | Method and apparatus for speech synthesis without prosody modification

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20130218569A1 (en)* | 2005-10-03 | 2013-08-22 | Nuance Communications, Inc. | Text-to-speech user's voice cooperative server for instant messaging clients
US9026445B2 (en)* | 2005-10-03 | 2015-05-05 | Nuance Communications, Inc. | Text-to-speech user's voice cooperative server for instant messaging clients
US20100076768A1 (en)* | 2007-02-20 | 2010-03-25 | Nec Corporation | Speech synthesizing apparatus, method, and program
US8630857B2 (en)* | 2007-02-20 | 2014-01-14 | Nec Corporation | Speech synthesizing apparatus, method, and program
US8775185B2 (en) | 2007-03-21 | 2014-07-08 | Vivotext Ltd. | Speech samples library for text-to-speech and methods and apparatus for generating and using same
US9251782B2 (en) | 2007-03-21 | 2016-02-02 | Vivotext Ltd. | System and method for concatenate speech samples within an optimal crossing point
US20130268275A1 (en)* | 2007-09-07 | 2013-10-10 | Nuance Communications, Inc. | Speech synthesis system, speech synthesis program product, and speech synthesis method
US9275631B2 (en)* | 2007-09-07 | 2016-03-01 | Nuance Communications, Inc. | Speech synthesis system, speech synthesis program product, and speech synthesis method
US20120072224A1 (en)* | 2009-08-07 | 2012-03-22 | Khitrov Mikhail Vasilievich | Method of speech synthesis
US8942983B2 (en)* | 2009-08-07 | 2015-01-27 | Speech Technology Centre, Limited | Method of speech synthesis
US20160240215A1 (en)* | 2013-10-24 | 2016-08-18 | Bayerische Motoren Werke Aktiengesellschaft | System and Method for Text-to-Speech Performance Evaluation

Also Published As

Publication number | Publication date
DE60126564D1 (en) | 2007-03-29
US20020099547A1 (en) | 2002-07-25
ATE354155T1 (en) | 2007-03-15
US7127396B2 (en) | 2006-10-24
US6978239B2 (en) | 2005-12-20
EP1213705A2 (en) | 2002-06-12
EP1213705B1 (en) | 2007-02-14
US20050119891A1 (en) | 2005-06-02
EP1213705A3 (en) | 2004-12-22
DE60126564T2 (en) | 2007-10-31

Similar Documents

Publication | Publication Date | Title
US20040148171A1 (en) | Method and apparatus for speech synthesis without prosody modification
US7386451B2 (en) | Optimization of an objective measure for estimating mean opinion score of synthesized speech
US7024362B2 (en) | Objective measure for estimating mean opinion score of synthesized speech
US10453442B2 (en) | Methods employing phase state analysis for use in speech synthesis and recognition
US7263488B2 (en) | Method and apparatus for identifying prosodic word boundaries
US6173263B1 (en) | Method and system for performing concatenative speech synthesis using half-phonemes
US7124083B2 (en) | Method and system for preselection of suitable units for concatenative speech
US6185533B1 (en) | Generation and synthesis of prosody templates
US6845358B2 (en) | Prosody template matching for text-to-speech systems
US20080059190A1 (en) | Speech unit selection using HMM acoustic models
CN1971708A (en) | Prosodic control rule generation method and apparatus, and speech synthesis method and apparatus
US7328157B1 (en) | Domain adaptation for TTS systems
Bettayeb et al. | Speech synthesis system for the holy quran recitation.
Chu et al. | A concatenative Mandarin TTS system without prosody model and prosody modification.
JP4532862B2 (en) | Speech synthesis method, speech synthesizer, and speech synthesis program
Hansakunbuntheung et al. | Space reduction of speech corpus based on quality perception for unit selection speech synthesis
EP1777697B1 (en) | Method for speech synthesis without prosody modification
Houidhek et al. | Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic
Ng | Survey of data-driven approaches to Speech Synthesis
JPH11249678A (en) | Voice synthesizer and its text analytic method
JP3571925B2 (en) | Voice information processing device
Kawa et al. | Development of a text-to-speech system for Japanese based on waveform splicing
Wongpatikaseree et al. | A real-time Thai speech synthesizer on a mobile device
Janicki et al. | Taking advantage of pronunciation variation in unit selection speech synthesis for Polish
Narendra et al. | Syllable specific target cost formulation for syllable based text-to-speech synthesis in Bengali

Legal Events

Date | Code | Title | Description
AS | Assignment

Owner name:MICROSOFT CORPORATION, WASHINGTON

Free format text:ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHU, MIN;PENG, HU;ZHAO, YONG;REEL/FRAME:015194/0592;SIGNING DATES FROM 20040408 TO 20040409

STCB | Information on status: application discontinuation

Free format text:ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS | Assignment

Owner name:MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text:ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034766/0001

Effective date:20141014

