Embodiment
Before describing in detail the text-to-speech (TTS) conversion technique according to the present invention, it should be noted that the present invention resides primarily in combinations of method steps and apparatus components related to TTS conversion. Accordingly, the apparatus components and method steps are represented where appropriate by conventional symbols in the drawings, which show only those details that are pertinent to understanding the present invention, so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
Referring to Fig. 1, an electrical block diagram of a communication system is shown in accordance with an embodiment of the invention. Communication system 100 comprises a first device 105, which is a client device in communication system 100, such as a personal communication device, one example of which is a cell phone. Client device 105 is coupled to wireless communication network 110, which is in turn coupled to World Wide Web 115. The World Wide Web is, of course, an information network that mainly uses wired and optical connections, but it may include some wireless connections. A second device 120, which is a server device, is also coupled to World Wide Web 115.
Client device 105 comprises processor 155, which is coupled to memory 150, loudspeaker 160, network interface 165, and user interface 170. Processor 155 may be a microprocessor, a digital signal processor, or any other processor suitable for use in client device 105. Memory 150 stores program instructions that control the operation of processor 155; these may use conventional instructions to implement a plurality of essentially independent functions. Some of these functions are those typically classified as applications. Many of the functions may be conventional, but some of the functions described herein are unique in at least some respects. Memory 150 also stores temporary, short-term, and long-term information, such as caches and tables. Memory 150 may therefore comprise storage devices of different hardware types, such as random access memory, programmable read-only memory, flash memory, and so on. Loudspeaker 160 may be a loudspeaker of the kind found in conventional client devices such as cell phones. Network interface 165 may be a radio transceiver of the kind found in a cell phone or, when the client device is for example a Bluetooth-connected device, a Bluetooth transceiver. Alternatively, network interface 165 may be a wireline interface for a client device that operates through a personal area network to another client device (not shown) that is connected to World Wide Web 115 by wireless network 110, or it may be a wireline interface for a client device that is connected directly to World Wide Web 115. As a further alternative, World Wide Web 115 may be a sizable private network, for example a corporate network that supports several thousand users in some region. User interface 170 may be a small or large display and a small or large keyboard. Server device 120 is preferably a device with considerably larger storage capacity than client device 105. For example, a server typically has a very large hard disk drive or a plurality of drives (for example, 20 gigabytes of storage).
Referring to Fig. 2, a programming model of the client device 105 described with reference to Fig. 1 is shown in accordance with an embodiment of the invention. Application 205 and word synthesis dictionary 220 are coupled to speech engine 210. Network transmission function 225 is also coupled to speech engine 210. Application 205 is one of several software applications that can be coupled to speech engine 210, and is an application that produces sets of text words to be synthesized by speech engine 210. Speech engine 210 produces analog signal 211 so that an audible presentation can be provided using loudspeaker 160 of client device 105. Speech engine 210 may have embedded in its program instructions and data in memory 150 a function that synthesizes the audible presentation of a word directly from the letter combinations of the word. As is well known, such synthesis typically sounds very unnatural and may often be wrong, causing the user to misinterpret the words. Word synthesis dictionary 220 is therefore provided. It may comprise a set of common words and an associated set of word pronunciations, which reduces the user's misinterpretation of words. Word synthesis dictionary 220 may in fact comprise several sets of words that are combined. For example, a default set of common words and their pronunciations, which does not change from one application to another, may be combined with a set of words and pronunciations associated with a particular application, where the application-specific set is incorporated into the dictionary only while that application is running. This can be effective when a group of different applications is determined in advance for use with the speech engine. For example, a phone dialer would provide different words to speech engine 210 than a web browser would. However, this approach can cause problems with memory usage, because memory must be associated with each application to store these words and their pronunciations, along with knowledge of exactly which words are stored by default in dictionary 220. Moreover, a word synthesis dictionary located in the client device may well be limited by the device's memory capacity (for example, less than a gigabit).
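The combination of a default word set with application-specific word sets can be illustrated with a minimal Python sketch. The class name `WordSynthesisDictionary`, its methods, and the phoneme-like pronunciation strings are hypothetical and chosen only for illustration; the description does not specify any data format or programming interface.

```python
from typing import Dict, Optional


class WordSynthesisDictionary:
    """Hypothetical model of dictionary 220: a default set of words plus
    application-specific sets that are present only while the application runs."""

    def __init__(self, default_words: Dict[str, str]) -> None:
        # Default set: common words shared by all applications.
        self._default = {w.lower(): p for w, p in default_words.items()}
        # One extra word set per currently running application.
        self._app_sets: Dict[str, Dict[str, str]] = {}

    def load_application_set(self, app_name: str, words: Dict[str, str]) -> None:
        """Incorporate an application's word set when that application starts."""
        self._app_sets[app_name] = {w.lower(): p for w, p in words.items()}

    def unload_application_set(self, app_name: str) -> None:
        """Remove an application's word set when that application stops."""
        self._app_sets.pop(app_name, None)

    def lookup(self, word: str) -> Optional[str]:
        """Return the stored pronunciation, or None when the word is unknown."""
        key = word.lower()
        for words in self._app_sets.values():
            if key in words:
                return words[key]
        return self._default.get(key)
```

In this sketch a word contributed by a phone dialer is available only between `load_application_set("dialer", ...)` and `unload_application_set("dialer")`, which mirrors the memory trade-off described above: every application must carry its own word set.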
In one embodiment of the invention, an application may provide a set of text words (without associated pronunciations) to word synthesis dictionary 220 in memory 150. The set of text words may be a set that the application normally uses, that is, words the application is expected to use within a short period (for example, from less than one second to many minutes) while it is running. Alternatively, the set of text words may be a set that comprises speech text. Speech text, in the context of this application, is a set of text words that is intended to be presented by the loudspeaker immediately upon command. For example, the sentence "The number entered is 847-576-9999", prepared as a prompt to the user in response to the user entering a telephone number, is speech text. The digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are examples of text words; they are more likely to be a set of digits that an address application expects to use. Word pronunciations that are not in word synthesis dictionary 220 of the client device are obtained remotely by the following technique. For this purpose, speech engine 210 is coupled to network transmission function 225, which sends over the network the words that are not in word synthesis dictionary 220 of the client device.
Referring to Fig. 3, a method of speech synthesis is shown in accordance with an embodiment of the invention. At step 305, a function associated with word synthesis dictionary 220 (for example, speech engine 210) accepts a set of text words, whether it is speech text or otherwise. At step 310, it is determined whether word synthesis dictionary 220, as currently configured, contains pronunciations for the set of text words. The resulting subset of text words for which no pronunciation is found constitutes the invalid subset of words (when there are one or more such words). At step 315, client device 105 sends the invalid subset of text words to the server device over the network. In the example described above with reference to Fig. 1, this network comprises wireless network 110 and World Wide Web 115, but the network could alternatively comprise a wired network and no wireless network. Server device 120 receives the invalid subset of text words at step 320 and, at step 325, generates a set of word pronunciations for the invalid subset by referring to a large word synthesis dictionary that is within server device 120 or accessible to server device 120. Because it resides in a server or other computer that is typically a fixed network device, this word synthesis dictionary can be large enough (for example, greater than a gigabit) to include nearly all the words needed by all the client devices it serves. Server device 120 preferably generates the set of word pronunciations so that it covers all the text words in the invalid subset. The set of word pronunciations may, of course, not include a given text word. In the set of word pronunciations generated by the server, each pronunciation is associated with its text word. At step 330, the server sends the set of word pronunciations to client device 105 over the network (or over a plurality of networks, as the case may be).
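Steps 305 through 330 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: both dictionaries are modeled as plain mappings from lower-case words to placeholder pronunciation strings, and the hypothetical `send_to_server` callable stands in for network transmission function 225 and the network round trip. None of these names comes from the description.

```python
from typing import Callable, Dict, Iterable, List


def find_invalid_subset(text_words: Iterable[str],
                        local_dictionary: Dict[str, str]) -> List[str]:
    """Step 310: collect the words that have no pronunciation in the
    client's dictionary, keeping first-seen order and dropping repeats."""
    missing: List[str] = []
    seen = set()
    for word in text_words:
        key = word.lower()
        if key not in local_dictionary and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


def server_generate_pronunciations(invalid_subset: Iterable[str],
                                   server_dictionary: Dict[str, str]) -> Dict[str, str]:
    """Steps 320-325: look up each received word in the server's large
    dictionary. A word the server does not know is simply left out."""
    return {w: server_dictionary[w] for w in invalid_subset if w in server_dictionary}


def request_pronunciations(text_words: Iterable[str],
                           local_dictionary: Dict[str, str],
                           send_to_server: Callable[[List[str]], Dict[str, str]]) -> Dict[str, str]:
    """Steps 305-330 from the client's side. send_to_server is a
    hypothetical stand-in for the network exchange."""
    invalid_subset = find_invalid_subset(text_words, local_dictionary)
    if not invalid_subset:
        return {}  # every word was found locally, so nothing is sent
    return send_to_server(invalid_subset)
```

Sending only the invalid subset, rather than the whole set of text words, keeps the transmitted data small, which matters for a client device on a wireless network.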
When client device 105 receives the set of word pronunciations at step 335, it determines at step 337 whether the set relates to a speech text. At step 340, it is determined whether the speech text has already been presented (synthesized). When the speech text has not yet been synthesized, speech engine 210 uses the set of word pronunciations at step 345 to synthesize the speech text, thereby reducing misinterpretation errors. When the speech text has already been synthesized at step 340 (as happens when the delay in receiving the set of word pronunciations exceeds a specified minimum delay, or when a command to present the speech text is received before the set of word pronunciations arrives), or when it is determined at step 337 that the set of word pronunciations is not for a speech text, client device 105 determines at step 350 whether to store the pronunciation set in memory 150 of client device 105, where it serves as a supplement to the word synthesis dictionary of client device 105. Such storage may last for a predetermined time, for example while the application that requested the set of word pronunciations is in use, or may be based on the capacity limits of memory 150, or on application priority combined with memory capacity limits and/or time, and so on. When the pronunciation set is to be stored in memory 150, it is stored at step 355. The process ends at step 360.
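The client-side handling in steps 335 through 360 can be sketched in Python as below. This is an illustration under assumptions, not the disclosed implementation: `ClientState`, `handle_pronunciation_set`, and the `should_store` policy callable are hypothetical names, synthesis is modeled by recording a string rather than producing audio, and the sketch ends after synthesis at step 345 because the description does not say whether a set used there is also considered for storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ClientState:
    # Pronunciations kept in memory 150 as a supplement to dictionary 220.
    supplement: Dict[str, str] = field(default_factory=dict)
    # Stand-in for audio output: one rendered string per synthesized text.
    spoken: List[str] = field(default_factory=list)


def handle_pronunciation_set(state: ClientState,
                             pronunciations: Dict[str, str],
                             speech_text: Optional[str],
                             already_synthesized: bool,
                             should_store: Callable[[Dict[str, str]], bool]) -> str:
    """Steps 335-360. speech_text is None when the received set does not
    relate to a speech text (the 'no' branch of step 337)."""
    if speech_text is not None and not already_synthesized:  # steps 337 and 340
        # Step 345: synthesize using the received pronunciations; a word
        # without a received pronunciation falls back to its spelling.
        rendered = " | ".join(pronunciations.get(w.lower(), w)
                              for w in speech_text.split())
        state.spoken.append(rendered)
        return "synthesized"
    if should_store(pronunciations):  # step 350: hypothetical storage policy
        state.supplement.update(pronunciations)  # step 355
        return "stored"
    return "discarded"  # step 360: the process ends without storing
```

The `should_store` callable is where a time limit, a capacity limit on memory 150, or an application priority rule of the kind listed above would be applied.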
It should be appreciated that the invention provides a unique technique for providing text word pronunciations in a client device whose word synthesis dictionary has limited capacity (for example, less than a gigabit), thereby reducing misinterpretation errors.
In the foregoing description, the invention and its benefits and advantages have been described with reference to specific embodiments. However, one of ordinary skill in the art will appreciate that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive, and all such modifications are intended to be included within the scope of the present invention. The benefits, advantages, and solutions to problems, and any element that may cause any benefit, advantage, or solution to occur or become more pronounced, are not to be construed as critical, required, or essential features or elements of any or all of the claims.
As used herein, the terms "comprises", "comprising", or any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, technique, or apparatus that comprises a list of elements does not include only those elements but may include other elements that are not expressly listed or that are inherent to such process, method, technique, or apparatus.
A "set", as used in the following claims, means a non-empty set. The term "another", as used herein, is defined as at least a second or more. The terms "including" and/or "having", as used herein, are defined as comprising. The term "coupled", as used herein, is defined as connected, although not necessarily directly and not necessarily mechanically. The term "program", as used herein, is defined as a sequence of instructions designed for execution on a computer system. A "program" or "computer program" may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, a Java applet, a servlet, source code, object code, a shared library/dynamic load library, and/or another sequence of instructions designed for execution on a computer system.