This is a continuation-in-part of application Ser. No. 09/409,000, filed Sep. 29, 1999, entitled “System and Apparatus For Dynamically Generating Audible Notices From An Information Network.”
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to devices for browsing information on an information network. More specifically, this invention relates to an apparatus and system for receiving personalized information from an information network in audio format using distributed text-to-speech processing.
2. Description of the Related Art
A number of different information networks are available that allow access to information contained on their computers, with the Internet being one that is generally known to the public. The capabilities, usefulness, and amount of information available from information networks are ever-increasing. Further, users often subscribe to one or more information services that are accessible via an information network. Currently, a user must browse the information network for information that is of interest to them. Oftentimes, a user must interrupt their use of an application program, such as spreadsheets or word processing programs, to browse the information network. Even messages sent from information networks to users via e-mail or instant messaging facilities require the user to take specific action to learn the content of the messages. Additionally, while some subscription services and portal services allow a user to customize the format and, to a certain extent, the content, of the information provided, a user must still manually navigate to the various sources of information to see if there is anything of interest to them. Still further, a user often has to sift through a lot of information that is of no interest to them, thereby consuming more time than necessary. Another drawback to current capabilities is that the user typically is not informed immediately when information of interest becomes available, but rather, must enter commands to browse the information sources, and therefore may not receive information of interest as soon as it is available.
In the prior art, systems are available to provide information requested from an information network in aural format. However, these systems require interaction with the user and do not automatically provide the information in which the user has indicated an interest as that information becomes available.
It is therefore desirable to provide users with the ability to prescreen information from various, selected sources, to reduce the amount of time required to find items of interest to the user.
It is also desirable to provide users with relevant information as soon as possible after the news becomes available.
It is also desirable to provide a summary of news items of interest to the user, and to allow the user to access more in-depth information regarding a particular summary.
It is further desirable to receive the information aurally, thereby allowing the user to receive information of interest without being required to interrupt their activity to manipulate or view the information.
There are several known methods for converting information from text format to audio format for output to an audio output device such as an audio speaker system. The information is typically in conventional orthography and the output is synthetic speech. The input is provided in the form of a digital signal which represents the characters of conventional orthography. The primary output is also a digital signal representing an acoustic waveform corresponding to the synthetic speech. Digital-to-analog conversion is a well known technique for producing analog signals which can drive audio speakers. The signal may have any convenient implementation, e.g. electrical, magnetic, electromagnetic or optical.
Speech converters usually include two major sub-units, namely an analyzer and a synthesizer. The analyzer divides the original input signal into small textual elements. The synthesizer converts each of these small elements into a short segment of digital waveform and joins these segments together to produce the output.
It will be appreciated that the linguistic analysis of a sentence is exceedingly complicated since it involves many different linguistic tasks, and a wide variety of linguistic processors are commercially available, each of which is capable of doing at least one of the tasks. Further, different portions of the linguistic analysis can be distributed among at least two different data processors.
One category of linguistic processors is designated as “converters” in that they change the nature of the symbols utilized. For example, a “converter” alters a signal representing a word or other linguistic element in graphemes into a signal representing the same element in phonemes using a grapheme to phoneme dictionary. This dictionary requires a large amount of storage space, and it is therefore preferable to store and maintain one dictionary in a central location, such as a network server, so that it may be accessed by several users, instead of storing and maintaining separate copies of the dictionary on each user's workstation. The benefits of maintaining large resources on servers are both ease of maintenance and reduced client system resource requirements. Further, converting the phonemes to an audio signal generates a large amount of data, and transferring the data in audio format requires a large amount of bandwidth.
The invention disclosed in U.S. patent application Ser. No. 09/409,000, filed Sep. 29, 1999, entitled “System and Apparatus For Dynamically Generating Audible Notices From An Information Network” discloses a text-to-speech (TTS) engine that resides either in a client-side processor, in a server-side processor, or which is distributed among data processors in the system. TTS processing functions are computationally intensive and some tasks require a large amount of storage space and bandwidth for data transfer. Therefore, it is further desirable to distribute the TTS engine between at least two data processors in a manner which optimizes processing time, data transfer, and storage space efficiency.
In addition to grapheme to phoneme TTS converters, there are other TTS engines that use different algorithms for transforming text data to audio data. Typically, these other TTS engines also involve converting text data to an intermediate format that requires less storage than the data in audio format. Therefore, it is also desirable to distribute other types of TTS engines between at least two data processors in a manner which optimizes processing time, data transfer, and storage space efficiency.
SUMMARY OF THE INVENTION
In one embodiment, the present invention provides a system for converting information from a text format to an audio format, wherein the text to speech conversion is distributed among two or more data processors. One data processor executes a first set of program instructions to receive information in text format from a data source, to convert the information from the text format to an intermediate format, such as phonemes, and to transmit the information in the intermediate format to the second data processor. The second data processor executes a second set of program instructions to convert the information from the intermediate format to the audio format. In one embodiment, the first data processor, such as a network server, includes one or more databases to aid TTS synthesis, such as one or more grapheme to phoneme dictionaries, that are accessible by multiple users. The second data processor is a client side data processor, such as a client workstation.
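By way of illustration only, the division of work between the first and second data processors might be sketched as follows. The toy dictionary, the function names, the letter-by-letter fallback for unknown words, and the placeholder "waveform" are all assumptions of this sketch and are not taken from the specification; a real synthesizer would produce actual speech samples.

```python
# Hypothetical toy grapheme-to-phoneme dictionary; a real one is far larger,
# which is why it is kept on the server in this arrangement.
G2P = {"stock": ["S", "T", "AA", "K"], "up": ["AH", "P"]}


def server_text_to_phonemes(text, dictionary=G2P):
    """First data processor: convert text to an intermediate phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        # Assumption of this sketch: unknown words are spelled out letter by letter.
        phonemes.extend(dictionary.get(word, list(word.upper())))
    return phonemes


def client_phonemes_to_audio(phonemes, samples_per_phoneme=4):
    """Second data processor: convert phonemes to placeholder audio samples."""
    audio = []
    for phoneme in phonemes:
        # Stand-in for real waveform synthesis: one constant level per phoneme.
        level = sum(ord(ch) for ch in phoneme) % 256
        audio.extend([level] * samples_per_phoneme)
    return audio
```

In this sketch only the short phoneme list crosses the network; the longer sample list is produced entirely on the client.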
In another embodiment, the present invention provides a computer program product for dynamically generating audible notices from an information network using distributed text to speech processing. The information network includes a client processor and a remote processor, such as a network server. The computer program product includes a first set of program instructions that are executed on the remote processor that generate an intermediate representation of the information, such as a phonemic representation. The computer program product further includes a second set of program instructions that are executed on the client side processor that allow a user to preselect at least one data source that is accessible from the information network, to receive information from the at least one preselected data source, and to convert the information from a text format to an audio format based on the intermediate representation of the information.
In one embodiment, the first set of program instructions utilize a dictionary for translating graphemes to phonemes that is stored in a location that is accessible by the first set of program instructions.
In another embodiment, the present invention provides a method for dynamically generating audible notices from an information network which includes preselecting at least one data source from the information network, receiving information from the at least one preselected data source, converting the information from a text format to an intermediate format in a remote processor, converting the information from the intermediate format to an audio format in a client processor, and transmitting audio signals representative of the information in audio format. In one embodiment, the text is converted into an intermediate phonemic representation using a dictionary for translating graphemes to phonemes. The dictionary is stored in a location that is accessible by the remote processor. The phonemes are converted to audio output signals in the client processor.
Each embodiment of the present invention distributes the text to speech processing so that multiple users can take advantage of resources requiring a large amount of storage space from a remote, centralized processor, such as a network server. Intermediate processing of the information is performed at the remote processor to take advantage of the centralized resources, thus reducing the amount of data transfer from the remote processor to the client processor. The information, in intermediate format, is then transferred to the client processor, where it is converted to audio output signals. This feature also advantageously reduces data transfer requirements, since audio output format typically requires a large amount of data storage compared to the intermediate format.
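The data transfer advantage can be made concrete with a rough calculation. The figures used below (8 kHz, 16-bit mono audio; about 12 phonemes per second of speech; 2 bytes per phoneme symbol) are illustrative assumptions only, not values stated in the specification.

```python
def audio_bytes(seconds, sample_rate=8000, bytes_per_sample=2):
    """Bytes needed to carry uncompressed mono audio (assumed parameters)."""
    return seconds * sample_rate * bytes_per_sample


def phoneme_bytes(seconds, phonemes_per_second=12, bytes_per_phoneme=2):
    """Bytes needed to carry the phoneme stream for the same speech (assumed)."""
    return seconds * phonemes_per_second * bytes_per_phoneme


if __name__ == "__main__":
    headline_seconds = 5
    print(audio_bytes(headline_seconds), "bytes as audio")
    print(phoneme_bytes(headline_seconds), "bytes as phonemes")
```

Under these assumptions a five-second headline occupies 80,000 bytes as audio but only 120 bytes as phonemes, which is the kind of reduction the distributed arrangement is intended to exploit.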
The foregoing has outlined rather broadly the objects, features, and technical advantages of the present invention so that the detailed description of the invention that follows may be better understood.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a system for accessing an information network found in the prior art.
FIG. 1a is a block diagram of an example of a computer workstation found in the prior art with which the present invention may be utilized.
FIG. 2 is a block diagram of a two-tier architecture for providing speech-synthesized information in accordance with the present invention.
FIG. 3 is a block diagram of a three-tier architecture for providing speech-synthesized information in accordance with the present invention.
FIG. 4 is a block diagram of a two-tier architecture for providing speech-synthesized information with distributed text to speech processing in accordance with the present invention.
FIG. 5 is a block diagram of a three-tier architecture for providing speech-synthesized information with distributed text to speech processing in accordance with the present invention.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
DETAILED DESCRIPTION
The method and apparatus of the present invention are applicable to devices that access a computerized information network. A number of different information networks are available that allow access to information contained on their computers, with the Internet being one that is generally known to the public. While the Internet is used herein as an example of how the present invention is utilized, it is important to recognize that the present invention is also applicable to other information networks and information systems such as Intranets, database management systems, and document retrieval systems.
An example of a typical Internet connection 110 found in the prior art is shown in FIG. 1. A user that wishes to access information on the Internet typically has a computer workstation 112 that executes an application program known as browser 114. Workstation 112 establishes a communication link 116 with web server 118 such as a dial-up wired connection with a modem, a direct link such as a T1 or ISDN line, or a wireless connection through a cellular or satellite network. When the user enters a request for information by entering commands in browser 114, workstation 112 sends a request for information, such as a search for documents pertaining to a specified topic, or a specific web page to web server 118. Each web server 118, 120, 122, 124 on the Internet has a known address which the user must supply to the browser 114 in order to connect to the appropriate web server 118, 120, 122, or 124. If the information is not available on the user's web server 118, a central link such as backbone 126 allows web servers 118, 120, 122, 124 to communicate with one another to supply the requested information. Because web servers 118, 120, 122, 124 can contain more than one web page, the user will also specify in the address which particular web page he wants to view. The web servers 118, 120, 122, 124 execute a web server application program, often referred to as a portal, which monitors requests, services requests for the information on that particular web server, and transmits the information to the user's workstation 112. A display generated by browser 114 to present information provided by a program on the server side is then presented on computer workstation 112. The display typically includes one or more areas for the user to enter commands and to view the information presented.
In the prior art, a web page is primarily visual data that is intended to be displayed on the display device, such as the monitor of the user's workstation 112. When web server 118 receives a web page request, it will transmit a document, generally written in a markup language such as hypertext markup language (HTML) or extensible markup language (XML), across communication link 116 to the requesting browser 114. Communication link 116 may be one or a combination of different data transmission systems, such as a direct dial-up modem connected to a telephone line, dedicated high-speed data links such as T1 or ISDN lines, and even wireless networks which transmit information via satellite or cellular networks. Browser 114 interprets the markup language and outputs the web page to the monitor of user workstation 112. This web page displayed on the user's display may contain text, graphics, and links (which are addresses of other web pages). These other web pages (i.e., those represented by links) may be on the same or on different web servers 118, 120, 122, 124. The user can go to these other web pages by clicking on the links using a mouse or other pointing device. When web server 118 receives a search request, the request is sent to the server containing the search engine specified by the user. The search engine then compiles one or more pages containing a list of links to web pages on other web servers 120, 122, 124 that may contain information relevant to the user's request. The search engine transmits the page(s) in markup language back to the requesting web server. This entire system of web pages with links to other web pages on other servers across the world is known as the “World Wide Web”.
Workstation 112 and/or web servers 118, 120, 122, 124 are computer systems, such as computer system 130 as shown in FIG. 1a. Computer system 130 includes central processing unit (CPU) 132 connected by host bus 134 to various components including main memory 136, storage device controller 138, network interface 140, audio and video controllers 142, and input/output devices 144 connected via input/output (I/O) controllers 146. Those skilled in the art will appreciate that this system encompasses all types of computer systems including, for example, mainframes, minicomputers, workstations, servers, personal computers, Internet terminals, network appliances, notebooks, palm tops, personal digital assistants, and embedded systems. Typically, computer system 130 also includes cache memory 150 to facilitate quicker access between processor 132 and main memory 136. I/O peripheral devices often include speaker systems 152, graphics devices 154, and other I/O devices 144 such as display monitors, keyboards, mouse-type input devices, floppy and hard disk drives, DVD drives, CD-ROM drives, and printers. Many computer systems also include network capability, terminal devices, modems, televisions, sound devices, voice recognition devices, electronic pen devices, and mass storage devices such as tape drives. The number of devices available to add to personal computer systems continues to grow; however, computer system 130 may include fewer components than shown in FIG. 1a and described herein.
The peripheral devices usually communicate with processor 132 over one or more buses 134, 156, 158, with the buses communicating with each other through the use of one or more bridges 160, 162. Computer system 130 may be one of many workstations or servers connected to a network such as a local area network (LAN), a wide area network (WAN), or a global information network such as the Internet through network interface 140.
CPU 132 can be constructed from one or more microprocessors and/or integrated circuits. Main memory 136 stores programs and data that CPU 132 may access. When computer system 130 starts up, an operating system program is loaded into main memory 136. The operating system manages the resources of computer system 130, such as CPU 132, audio controller 142, storage device controller 138, network interface 140, I/O controllers 146, and host bus 134. The operating system reads one or more configuration files to determine the hardware and software resources connected to computer system 130.
During operation, main memory 136 includes the operating system, configuration file, and one or more application programs with related program data. Application programs can run with program data as input, and output their results as program data in main memory 136 or to one or more mass storage devices through a memory controller (not shown) and storage device controller 138. CPU 132 executes one or more application programs, including one or more programs to establish a connection to a computer network through network interface 140. The application programs may be embodied in one executable module or may be a collection of routines that are executed as required. Operating systems commonly use “windows”, as is well known in the art, to present information about or from an application program. Each application program typically has its own window that is generated when the application program is executing. Each window may be minimized to an icon, maximized to fill the display, overlaid in front of other windows, and underlaid behind other windows.
Storage device controller 138 allows computer system 130 to retrieve and store data from mass storage devices such as magnetic disks (hard disks, diskettes), and optical disks (DVD and CD-ROM). The information from these direct access storage devices (DASD) can be in many forms including application programs and program data. Data retrieved through storage device controller 138 is usually placed in main memory 136 where CPU 132 can process it.
One skilled in the art will recognize that the foregoing components and devices are used as examples for sake of conceptual clarity and that various configuration modifications are common. For example, audio controller 142 is connected to PCI bus 156 in FIG. 1a, but may be connected to the ISA bus 158 or reside on the motherboard (not shown) in alternative embodiments. As further example, although computer system 130 is shown to contain only a single main CPU 132 and a single system bus 134, those skilled in the art will appreciate that the present invention may be practiced using a computer system that has multiple CPUs 132 and/or multiple buses 134. In addition, the interfaces that are used in the preferred embodiment may include separate, fully programmed microprocessors that are used to off-load computationally intensive processing from CPU 132, or may include input/output (I/O) adapters to perform similar functions. Further, PCI bus 156 is used as an exemplar of any input-output devices attached to any I/O bus; AGP bus 159 is used as an exemplar of any graphics bus; graphics device 154 is used as an exemplar of any graphics controller; and host-to-PCI bridge 160 and PCI-to-ISA bridge 162 are used as exemplars of any type of bridge. Consequently, as used herein the specific exemplars set forth in FIG. 1 are intended to be representative of their more general classes. In general, use of any specific exemplar herein is also intended to be representative of its class and the non-inclusion of such specific devices in the foregoing list should not be taken as indicating that limitation is desired.
FIG. 2 shows a block diagram of components included in one embodiment of notice system 200 for dynamically generating audible notices from an information network according to the present invention. Notice system 200 allows a user to customize delivery of information based on, for example, the data source and a user's profile. Notice system 200 provides the information in speech-synthesized format as well as on the user's workstation display as the information becomes available. Notice system 200 may perform the following functions independently or in conjunction with other components in Internet connection 110:
- play headline audio for new, noteworthy stories as those stories appear;
- present the user with textual (typically HTML-rendered) story headlines;
- allow the user to select a headline to view the entire story;
- allow the user to subscribe and unsubscribe to data sources; and
- allow the user to set various preferences (e.g., monitoring schedules).
 
One benefit of notice system 200 is that the user does not have to monitor data sources manually because notice system 200 presents the headlines in audible format as they become available. The user does not have to take any action to receive up-to-date news as it appears, nor does the user have to interrupt his work to check data sources manually. For example, if a user subscribes to one or more services that provide world news and/or financial data sources, notice system 200 could be configured to report when the price of one or more specified stocks moves up or down by more than a given percent as the change is published by the stock quote data source. Further, the information will be output to the display associated with workstation 112 even when the window for notice system 200 is not visible on the user's screen. When the user hears a spoken headline of interest, he or she can use the display generated by notice system 200 to access one or more hyperlinks leading to page(s) that contain the full story for the headline. The user can specify criteria and parameters to prioritize reported stories, such criteria including, but not limited to, user preferences, noteworthiness, and story metadata (e.g., a specified importance, expiration date, and/or urgency). Further, program instructions can be included in client 204 to monitor user behavior and generate criteria and parameters based on the user's previous interaction with notice system 200.
Notice system 200 also presents this news in text format in a browser window, which need not be visible when the story arrives. As the data sources post news stories, notice system 200 announces the headlines. Notice system 200 includes one or more news summary pages listing all of the recent headlines. Each headline is a hyperlink to the web page that contains the full story. Optionally, summary pages may provide additional information with each headline. For example, the summary pages may include additional story text, graphics, or links.
Notice system 200 also includes text-to-speech (TTS) engine 208, sound player 210, data source monitor 212, and data source story adapter 214. Notice system 200 is a two-tier system having client 204 communicating directly with remote services 216. TTS engine 208 includes program instructions for synthesizing speech into a standard audio format from textual input, such as markup language, and is commercially available from a variety of manufacturers. In the embodiment of the present invention shown in FIG. 2, TTS engine 208 may reside in client 204 or be a component in remote services 216, e.g., TTS engine 226.
A “story” in notice system 200 includes some or all of the following components:
- headline;
- story URL;
- optional source definition;
- optional identification;
- optional parameter;
- optional timestamp;
- optional advertisement; and
- optional additional data.
 
The story URL points to a web page (usually on the data source's site) that contains the full story. Notice system 200 specifies a default set of data sources, such as data sources 218, 220, 222. A story can also define new data sources, however. By including an optional source definition, a story can announce the new sources of information to users.
Another optional component of a story is a set of one or more parameters, which some data sources require to access information. For example, a financial data source requires a stock symbol to retrieve price quotes for a particular stock. Notice system 200 can accommodate zero, one, or more parameters for a particular data source.
A story may optionally contain a variety of other information such as an identification, a time stamp, the name of the author of a story, graphics, audio, video, advertisements, keywords, and categorization information. If a story does not have a time stamp, notice system 200 automatically assigns one to it. Client 204 outputs the story's headline in audible format using sound player 210. The story's headline may be marked up in a speech synthesis markup language.
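As a non-limiting illustration, the story components listed above could be held in a record such as the following. The class name, the field names, and the choice of a UTC timestamp are assumptions of this sketch; the specification does not prescribe a particular in-memory representation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Story:
    # Components that every story in this sketch carries.
    headline: str
    story_url: str
    # Optional components, mirroring the list in the description.
    source_definition: Optional[str] = None
    identification: Optional[str] = None
    parameters: list = field(default_factory=list)
    timestamp: Optional[datetime] = None
    advertisement: Optional[str] = None
    additional_data: dict = field(default_factory=dict)

    def __post_init__(self):
        # A story that arrives without a time stamp is assigned one automatically.
        if self.timestamp is None:
            self.timestamp = datetime.now(timezone.utc)
```

For example, `Story("ACME up 2%", "http://example.com/acme", parameters=["ACME"])` (a hypothetical headline and URL) yields a story whose time stamp is filled in at creation.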
Stories are available from a virtually unlimited variety of subscriber and non-subscriber data sources, such as data sources 218, 220, and 222. Notice system 200 includes a syntax for a textual representation of a story. This story syntax is also referred to as “story format”. Information that is in a foreign format (i.e., not in story format) from data sources 218 and 220 is converted to story format in data source story adapters 214, 224. Stories that are supplied in story format, such as from data source 222, do not require conversion. Adapters 214, 224 are usually designed to convert information from one specific foreign format to story format. In one embodiment, the syntax for story format is defined by an XML document type definition (DTD), which allows a developer to define keyword assignments for tags and their associated parameters, as known in the art. Thus, data sources 218, 220 may provide information in story format, or, alternatively, client 204 may include one or more adapters to convert information from foreign formats to story format.
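A data source story adapter of the kind described above might, purely as an example, look like the following. The pipe-delimited "foreign format" and the element names `story`, `headline`, and `url` are hypothetical; the actual story format would be whatever the DTD defines.

```python
import xml.etree.ElementTree as ET


def adapt_pipe_record(record):
    """Convert one 'headline|url' record (hypothetical foreign format)
    into an XML string in an assumed story format."""
    headline, url = record.split("|", 1)
    story = ET.Element("story")
    ET.SubElement(story, "headline").text = headline.strip()
    ET.SubElement(story, "url").text = url.strip()
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(story, encoding="unicode")


if __name__ == "__main__":
    print(adapt_pipe_record("ACME up 2% | http://example.com/acme"))
```

Each adapter in this sketch handles exactly one foreign format, consistent with the description; supporting another source would mean writing another small function of the same shape.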
A user does not necessarily want to hear the headlines of all new stories from all available data sources. Otherwise, a user would be inundated with constant updates of information. For example, a user who subscribes to stock quotes would hear a continuous stream of price updates. Accordingly, the present invention allows a user to specify one or more data sources 218, 220, 222 from which to receive information, as well as one or more noteworthiness criteria for selecting stories presented to the user by notice system 200. If a data source has a noteworthiness criterion, notice system 200 reads a new story from that data source only if the story satisfies the criterion. The noteworthiness criteria that are available for selection are based on the type of information provided by a particular data source. For example, a stock quote data source noteworthiness criterion could be “price change greater than 1% from the last announced price”. If the data source supplies more than one criterion, the user can select a conjunction or disjunction of criteria. Furthermore, a criterion can be parameterized, in which case the user supplies one or more parameters. For example, “percentage change in trading volume” is a parameterized stock quote criterion. The user could specify a parameter of “2%” to be informed of a volume change greater than 102% or less than 98% of the previously reported volume.
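The parameterized criteria and their conjunction or disjunction can be illustrated with a short sketch. The function names, the dictionary-based story representation, and the treatment of a zero previous value are assumptions made here for illustration only.

```python
def percent_change(field, threshold_percent):
    """Build a criterion that is satisfied when `field` has moved by more
    than `threshold_percent` relative to the previously reported value."""
    def criterion(previous, current):
        old, new = previous[field], current[field]
        if old == 0:
            return new != 0  # assumption: any move away from zero is noteworthy
        return abs(new - old) / abs(old) * 100 > threshold_percent
    return criterion


def all_of(*criteria):
    """Conjunction: every criterion must be satisfied."""
    return lambda previous, current: all(c(previous, current) for c in criteria)


def any_of(*criteria):
    """Disjunction: at least one criterion must be satisfied."""
    return lambda previous, current: any(c(previous, current) for c in criteria)


def is_noteworthy(criterion, previous, current):
    """A source with no criterion reports every story."""
    return True if criterion is None else criterion(previous, current)
```

With a previous quote of price 100.0 and volume 1000 and a new quote of price 100.5 and volume 1030, `percent_change("volume", 2)` is satisfied (a 3% change) while `percent_change("price", 1)` is not (a 0.5% change), so the disjunction of the two reports the story and the conjunction does not.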
Data sources 218, 220, 222 publish stories and include the following components:
- name
- description URL
- stories URL
- optional schedule
- optional data source groups
- optional additional data
 
The description URL points to a web page that describes data source 218, 220, 222. Notice system 200 uses the stories URL to get the latest stories from data sources 218, 220, 222. The range of topics for stories is unlimited. For example, a product catalog can be specified as a data source. The stories are announcements of new products, discontinued products, improved products, etc. A weather forecast data source publishes forecast “stories”. The automobile section of the classified advertisement section of a newspaper publishes classified ad “stories” about cars that are for sale. A ticker tape publishes stock quote “stories”.
Further, a user may specify a data source category, which is a group of related data sources. For example, a “World News” data source category would contain data sources for world news stories. It would also contain data source categories for different countries and/or regions of the world such as Asia and the Middle East. A data source may belong to zero or more data source categories.
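Because a data source category may contain both data sources and further categories, it can be modeled as a simple tree. The following is a sketch under that assumption; the class names, field names, and example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    description_url: str
    stories_url: str


@dataclass
class Category:
    name: str
    sources: list = field(default_factory=list)
    subcategories: list = field(default_factory=list)

    def all_sources(self):
        """Collect the sources in this category and in every nested category."""
        found = list(self.sources)
        for sub in self.subcategories:
            found.extend(sub.all_sources())
        return found


if __name__ == "__main__":
    wire = DataSource("World Wire", "http://example.com/about", "http://example.com/stories")
    asia = DataSource("Asia Desk", "http://example.com/asia/about", "http://example.com/asia/stories")
    world = Category("World News", [wire], [Category("Asia", [asia])])
    print([s.name for s in world.all_sources()])
```

Subscribing to the "World News" category in this sketch would cover both the source listed directly in it and the source nested under "Asia".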
Notice system 200 includes a default set of data sources 218, 220, 222. In addition, a story can define a new data source. Such stories are referred to as source stories. A user reading a source story can subscribe to the source the story announces. A user can also manually enter a definition for a web-based format source. The definition requires at least the URL for data source stories. If a data source adapter 214, 224 is available, a user on a fat client notice system 200 can specify the location of the adapter. In this case, notice system 200 will download and install adapter 214, 224.
Client 204 includes browser 202 which interprets documents and scripts that are typically written in mark-up language. Client 204 generates a news page that is refreshed automatically via a ‘Refresh’ META tag or other mechanism for refreshing the display. The refresh rate can adapt to the rate of arrival of new stories or a refresh command may be pushed from miniserver 206 when a new story is sent to browser 202. Client 204 also either plays audio served from remote TTS engine 226, or the client invokes local TTS engine 208 to generate speech. If remote TTS engine 226 is used, browser 202 must be capable of playing audio. If local TTS engine 208 is used, either browser 202, TTS engine 208, or another set of program instructions in client 204 must be capable of playing audio.
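One simple way to produce an automatically refreshing news page is sketched below. The function name, the page layout, and the default refresh interval are assumptions of this sketch; only the use of a refresh META tag and hyperlinked headlines comes from the description.

```python
from html import escape


def render_news_page(headlines, refresh_seconds=60):
    """Build a minimal HTML news page that asks the browser to reload itself.

    `headlines` is a list of (headline text, story URL) pairs.
    """
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for text, url in headlines
    )
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{int(refresh_seconds)}">'
        "</head><body><ul>\n" + items + "\n</ul></body></html>"
    )
```

An adaptive refresh rate could be obtained in such a sketch by recomputing `refresh_seconds` each time the page is generated, for instance from the recent rate of story arrival.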
Remote services 216 perform five primary functions: data source monitoring, data source management, data source interfacing, state management, and client services.
Notice system 200 includes capabilities for client 204 to pull stories from data sources 218, 220, 222, and for remote services 216 to push stories to client 204. For data sources that do not push stories to client 204, data source monitor 212 polls data sources 218, 220, 222 periodically to check the availability of new stories. The polling schedules can be fairly complex, including an adaptive scheduler that increases the polling frequency as the rate of arrival of new stories increases and reduces the polling frequency as the rate of arrival of new stories decreases. Static schedulers are also included, for example, hourly polling during business hours.
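An adaptive scheduler of this kind can be sketched in a few lines. The halving and doubling rule and the one-minute and one-hour bounds are assumptions chosen for illustration; the specification states only that the polling frequency rises and falls with the rate of arrival of new stories.

```python
def next_poll_interval(current_interval, new_stories,
                       min_interval=60, max_interval=3600):
    """Return the next polling interval in seconds.

    Poll sooner when the last poll found new stories, and back off when it
    found none, staying within the assumed bounds.
    """
    if new_stories > 0:
        proposed = current_interval / 2   # stories are arriving: poll more often
    else:
        proposed = current_interval * 2   # quiet source: poll less often
    return max(min_interval, min(max_interval, proposed))
```

A static schedule, such as hourly polling during business hours, would simply bypass this function and return a fixed interval.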
Data source management includes the creation, modification, and deletion of data sources 218, 220, 222.
Miniserver 206 manages state information including user registrations, subscriptions, data source definitions, stories, user preferences, user profiles, data source profiles, data source categories, and other information. Miniserver 206 stores most of the state information in relational databases.
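As a rough illustration of relational storage for this state, the following sketch uses an SQLite database. The table and column names are hypothetical, and only a few of the listed kinds of state (users, data sources, subscriptions, stories) are modeled.

```python
import sqlite3

# Hypothetical schema; the specification does not define table layouts.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE data_sources (id INTEGER PRIMARY KEY, url TEXT NOT NULL, category TEXT);
CREATE TABLE subscriptions (
    user_id INTEGER NOT NULL REFERENCES users(id),
    source_id INTEGER NOT NULL REFERENCES data_sources(id),
    PRIMARY KEY (user_id, source_id)
);
CREATE TABLE stories (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES data_sources(id),
    headline TEXT NOT NULL
);
"""


def open_state_store(path=":memory:"):
    """Create the state tables in a new database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def stories_for_user(conn, user_id):
    """Return headlines from every data source the user subscribes to."""
    rows = conn.execute(
        """SELECT s.headline FROM stories s
           JOIN subscriptions sub ON sub.source_id = s.source_id
           WHERE sub.user_id = ? ORDER BY s.id""",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Keeping subscriptions in their own table lets a new story report be assembled with a single join between stories and subscriptions.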
Client services are all of the services notice system 200 requires, including new story reports, subscription modifications, and user preferences modifications.
In one embodiment, notice system 200 provides an optional auto-personalization feature whereby the user can choose to have notice system 200 model the user's interests. With this model, notice system 200 can automatically subscribe the user to sources relevant to the user's interests. Notice system 200 can also direct relevant stories to the user from data sources to which the user does not subscribe.
Notice system 200 can categorize data sources 218, 220, 222 with either explicit data (e.g., as part of a data source definition) or derived data (from, e.g., machine learning techniques). Notice system 200 may categorize stories as well. A story can belong to one or more story categories. Each data source 218, 220, 222 is a de facto story category. Notice system 200 can use any story data, or data derived from the story, to categorize it.
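A minimal sketch of story categorization along these lines follows. The function name and dictionary layout are hypothetical, and simple keyword matching stands in for the derived-data techniques (such as machine learning) mentioned above.

```python
def categorize_story(story, source_categories, keyword_rules):
    """Return the set of categories for a story.

    story: dict with 'source' and 'headline' keys (hypothetical layout).
    source_categories: explicit categories from each data source definition.
    keyword_rules: keyword-to-category map, a stand-in for derived data.
    """
    # Each data source is a de facto story category.
    categories = {story["source"]}
    # Explicit categories attached to the data source definition.
    categories.update(source_categories.get(story["source"], ()))
    # Categories derived from the story data itself.
    text = story["headline"].lower()
    for keyword, category in keyword_rules.items():
        if keyword in text:
            categories.add(category)
    return categories
```

Because the result is a set, a story naturally belongs to one or more categories, and it always belongs to at least the category of its own data source.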
Notice system 200 also monitors and dynamically logs its overall state, including story arrival rates, errors, usage data, and other information.
Notice system 200 may serve audio advertisements with headlines. These audio ads can be personalized based on the headlines, the user's profile, and other information. Notice system 200 may also place advertisements on the summary pages served to client 204. The advertisements can be personalized based on the data source, current stories, the user's profile, and other information that may be customized by the user. Further, data sources 218, 220, 222 can also deliver ads in their data source markup language as “stories”, or in their stories.
A three-tier embodiment of the present invention for notice system 300 is shown in FIG. 3, including client 302, server 304, and remote services 306. Notice system 300 provides capabilities and advantages that are virtually identical to those of notice system 200, including providing customized delivery of stories in speech-synthesized format as well as in a window on a display as the stories become available, auto-personalization, categorizing data sources 218, 220, 222, and categorizing stories. One of the differences between notice system 200 and notice system 300 is that client 302 is a “thin” client architecture, whereas client 204 in notice system 200 (FIG. 2) is a “fat” client architecture. Miniserver 206 provides enough functionality in client 204 to eliminate any requirement for a separate server, such as server 304 in notice system 300.
Client 302 in notice system 300 further includes browser 308, TTS engine 310, and sound player 210. Server 304 includes miniserver 314, data source story adapter 214, and data source monitor 212. In an alternate embodiment, TTS engine 320 resides in server 304, thereby replacing TTS engine 310 in client 302. In both notice system 200 (FIG. 2) and notice system 300, the TTS engine may be located on the client side (e.g., TTS engine 208 or 310) or in a computer system that is remote from the client side (e.g., TTS engine 226 or 320).
Two issues that arise when TTS is performed remote from the client side are the computational resources required to convert text to speech, and the bandwidth required to transfer speech data from the remote processor to the client side. One alternative is to distribute TTS engines 208, 310 throughout notice system 200 or 300 to reduce bandwidth and computational burden on a single TTS engine. In many types of text-to-speech converters, functions of TTS engines 208, 226, 310, 320 can be broken down into a composition of functions g(f(x)). One type of known TTS engine 208, 226, 310, 320 involves expanding text (x) into phonemes in the function f(x) and requires a large dictionary for translating graphemes to phonemes. A phoneme is a component part or unit in the pronunciation of a word in the sound system of a language. The function g(f(x)) computes sounds that represent the phonemes and could be more computationally intensive compared to the function f(x). Converting the phonemes to representative sounds, also referred to as audio data, generates a large amount of data, even when audio compression schemes are utilized. Ideally, this conversion is performed on client side 204, 302 to alleviate the need to transfer a large amount of audio data from remote services 216 or server 304.
Thus, in an embodiment of the two-tier architecture 400 shown in FIG. 4, f(x) is distributed in TTS engine 426 where remote services 216 has storage capacity for the large word-to-phoneme dictionary 428. Further, g(f(x)) is distributed in TTS engine 408 on client side 204, thereby offloading heavy computational workload and data transfer requirements from remote services 216.
Likewise, in an embodiment of the three-tier architecture 500 shown in FIG. 5, f(x) is distributed in TTS engine 520 in server 304, which has storage capacity for the large word-to-phoneme dictionary 522, and g(f(x)) is distributed to TTS engine 510 on client side 302, thereby offloading heavy computational workload and data transfer requirements from server 304.
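The split of g(f(x)) between the remote side and the client side can be illustrated with the following sketch. The two-word dictionary, the phoneme symbols, the function `wire_payload`, and the use of placeholder sine tones in place of real speech synthesis are all assumptions for illustration. The sketch only shows that the intermediate phoneme representation is far smaller than the audio data computed from it.

```python
import math
import struct

# Hypothetical toy dictionary; a real word-to-phoneme dictionary is large.
PHONEME_DICT = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def f(text):
    """Remote side: expand text x into phonemes using the dictionary."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PHONEME_DICT.get(word, ["?"]))
    return phonemes


def wire_payload(phonemes):
    """Encode the phonemes compactly for transfer to the client side."""
    return " ".join(phonemes).encode("ascii")


def g(phonemes, sample_rate=8000, seconds_per_phoneme=0.1):
    """Client side: compute 16-bit PCM audio for the phonemes.

    Placeholder tones stand in for real synthesis; one tone per phoneme.
    """
    samples_per_phoneme = int(sample_rate * seconds_per_phoneme)
    out = bytearray()
    for p in phonemes:
        # Deterministic placeholder frequency derived from the phoneme symbol.
        freq = 200 + (sum(map(ord, p)) % 40) * 20
        for n in range(samples_per_phoneme):
            value = int(12000 * math.sin(2 * math.pi * freq * n / sample_rate))
            out += struct.pack("<h", value)
    return bytes(out)
```

For the text "Hello world", the phoneme payload in this sketch is 19 bytes, while the uncompressed audio produced on the client side is 12,800 bytes. Only the small payload would need to cross the network.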
Server 304 performs data source monitoring via data source monitor 212, as discussed hereinabove for notice system 200. Server 304 also manages state information including user registrations, subscriptions, data source definitions, stories, user preferences, user profiles, data source profiles, data source categories, and other information. Server 304 stores most of the state information in relational databases. Further, server 304 may perform data source interfacing, such as converting information in a foreign format to story format using data source adapter 214. Alternatively, a required data source adapter, such as data source adapter 224, may reside in remote services 306.
Notice system 300 also includes capabilities for client 302 to pull stories from data sources 218, 220, 222, and for remote services 216 to push stories to client 302, through server 304. For data sources that do not push stories to client 302 via server 304, data source monitor 212 polls data sources 218, 220, 222 periodically to check the availability of new stories in a manner similar to that described in the discussion for notice system 200 hereinabove.
Notice systems, such as notice systems 200 and 300, may serve audio advertisements with headlines. These audio ads can be personalized based on the headlines, the user's profile, and other information. Notice systems 200, 300 may also place advertisements on the summary pages served to clients 204, 302, respectively. The advertisements can be personalized based on the data source, current stories, the user's profile, and other information that may be customized by the user. Further, data sources 218, 220, 222 can also deliver ads in their data source markup language as “stories”, or in their stories.
While the invention has been described with respect to the embodiments and variations set forth above, these embodiments and variations are illustrative and the invention is not to be considered limited in scope to these embodiments and variations. For example, although a TTS engine for converting graphemes to phonemes has been discussed as an example of a TTS engine that may utilize the present invention, the present invention may also be utilized with other similar functions which compute intermediate representations and generate a relatively small amount of data compared to the final audio output. Further, several different databases may be included in one remote location, such as grapheme to phoneme dictionaries for a variety of different languages. Accordingly, various other embodiments and modifications and improvements not described herein may be within the spirit and scope of the present invention, as defined by the following claims.