CROSS-REFERENCE TO RELATED APPLICATION

This application claims the priority benefits of China application serial no. 201210593080.7, filed on Dec. 31, 2012, and China application serial no. 201310182947.4, filed on May 17, 2013. The entirety of each of the above-mentioned patent applications is hereby incorporated by reference herein and made a part of this specification.
BACKGROUND OF THE INVENTION

1. Field of the Invention
The invention relates to a speech dialogue processing technique. More particularly, the invention relates to a natural language dialogue system and a method capable of correcting a speech response.
2. Description of Related Art
In the field of natural language recognition, computers usually use certain syntax to capture and recognize a user's intentions or the information within his/her inputs. Accordingly, computers are able to determine users' intentions as long as sufficient data relating to the sentences input by users are stored in the databases of the computers.
Conventionally, a built-in phrase list including specific idioms indicating certain intentions or information is often applied for comparison with user sentences, and every user is asked to express his/her intentions using the well-defined specific idioms within the phrase list, such that his/her intentions may be correctly recognized by computers. However, it is rather unreasonable and/or impractical to force the user to remember every idiom included in the phrase list. For instance, if a user intends to learn the weather conditions, he/she may be asked to input “what is the weather tomorrow (or the day after tomorrow) in Shanghai (or Beijing)?” In case the user uses another colloquial expression instead, e.g., “how is Shanghai tomorrow?”, this expression may be interpreted as “there is a place called ‘tomorrow’ in Shanghai” because the word “weather” does not appear in the sentence. Thereby, the user's intention may be misunderstood by computers. In addition, users' sentences are usually complicated and diverse, and sometimes their inputs may be erroneous, which requires fuzzy matching processes for further identification. Obviously, phrase lists established under such rigid input rules usually produce disappointing analysis results.
From another perspective, one syntactic structure/sentence may refer to different intentions even if all possible principles of natural language analysis are applied to recognize users' intentions. For instance, if the user sentence is “I want to see the Romance of the Three Kingdoms”, he/she may intend to watch the film “Romance of the Three Kingdoms” or read the book “Romance of the Three Kingdoms”. Under such a scenario, the user has to make a further selection between these two matches. Sometimes, it is redundant and inefficient for a user to make a selection among meaningless matches. For instance, if a user's sentence is “I want to see One Million Star”, it is unnecessary to recognize the user's intention as a book or a painting named “One Million Star” (because “One Million Star” is a very famous TV show among Chinese audiences).
Moreover, in most cases, search results obtained from a full-text search are non-structured data, which usually contain separate and unrelated information. For instance, if a user inputs a keyword in a search engine (e.g., Google or Baidu) for searches, the search results in webpages usually include separate and diverse information waiting for the user's identification. The only way for the user to find useful information contained in the search results is to browse and/or look into those webpages one-by-one. It is a time-consuming approach for the user to browse those search results, and, sometimes, he/she may skip or miss his/her desired information inadvertently. The uses of search results obtained conventionally are accordingly limited.
SUMMARY OF THE INVENTION

An embodiment of the invention provides a natural language dialogue system and a method capable of correcting a speech response. If a speech response output by the natural language dialogue system does not match a user's intention included in his or her request information, the natural language dialogue system is able to correct the previously output speech response and provide a new speech response that matches the user's request information.
In an embodiment of the invention, a method for correcting a speech response includes following steps. A first speech input is received. At least one first keyword included in the first speech input is parsed to obtain a candidate list, wherein the candidate list has at least one report answer. One of the at least one report answer is selected from the candidate list as a first report answer, and a first speech response is output according to the first report answer. A second speech input is received and parsed to determine whether the first report answer is correct. If the first report answer is incorrect, another report answer other than the first report answer is selected from the candidate list as a second report answer, and a second speech response is output according to the second report answer.
In an embodiment of the invention, a natural language dialogue system that includes a speech sampling module and a natural language comprehension system is provided. The speech sampling module receives a first speech input. The natural language comprehension system is coupled to the speech sampling module and parses at least one first keyword included in the first speech input to generate a candidate list that has at least one report answer. The natural language comprehension system then selects one of the at least one report answer from the candidate list as a first report answer and outputs a first speech response according to the first report answer. The speech sampling module receives a second speech input, and the natural language comprehension system parses the second speech input to determine whether the selected first report answer is correct. If the first report answer is incorrect, the natural language comprehension system selects one report answer other than the first report answer as a second report answer and outputs a second speech response according to the second report answer.
In view of the above, if the speech response output by the natural language dialogue system fails to match the request information of the speech input from the user, the natural language dialogue system corrects the previously output speech response and further outputs another speech response (that relatively conforms to the request information of the user) according to another speech input subsequently provided by the user. Thereby, in the event that the user is dissatisfied with the report answer provided by the natural language dialogue system, the natural language dialogue system may provide a new speech response to the user, so as to facilitate the use of the natural language dialogue system when the user talks to the natural language dialogue system.
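The correction flow summarized above can be sketched in a few lines of code. This is a minimal illustration only; the function names (listen, speak, parse_keywords, build_candidate_list, is_denial) are hypothetical placeholders for the modules described in this disclosure, not elements of the actual system.

```python
# Illustrative sketch of the speech-response correction flow; all callables
# here are hypothetical stand-ins for the modules described in the text.
def dialogue_with_correction(listen, speak, parse_keywords,
                             build_candidate_list, is_denial):
    first_input = listen()                        # receive a first speech input
    keywords = parse_keywords(first_input)        # parse at least one first keyword
    candidates = build_candidate_list(keywords)   # candidate list of report answers
    answer = candidates[0]                        # select a first report answer
    speak(answer)                                 # output a first speech response
    second_input = listen()                       # receive a second speech input
    # If the second input indicates the first report answer is incorrect,
    # select another report answer from the candidate list.
    if is_denial(second_input) and len(candidates) > 1:
        answer = candidates[1]                    # select a second report answer
        speak(answer)                             # output a second speech response
    return answer
```

The sketch deliberately omits speech sampling and synthesis details, which the embodiments below attribute to the speech sampling module and the natural language comprehension system.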
Several exemplary embodiments accompanied with figures are described in detail below to further describe the invention.
BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are included to provide further understanding, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram illustrating a natural language comprehension system according to an embodiment of the invention.
FIG. 2 is a diagram illustrating a parsed result obtained by a natural language processor which parses various request information from a user according to an embodiment of the invention.
FIG. 3A is a schematic diagram illustrating a plurality of records stored into a structured database according to an embodiment of the invention, wherein the records have specific data structures.
FIG. 3B is a schematic diagram illustrating a plurality of records stored into a structured database according to another embodiment of the invention, wherein the records have specific data structures.
FIG. 3C is a schematic diagram illustrating indication data stored in an indication data storage system according to an embodiment of the invention.
FIG. 4A is a flowchart illustrating a search method according to an embodiment of the invention.
FIG. 4B is a flowchart illustrating a work process of a natural language comprehension system according to another embodiment of the invention.
FIG. 5A is a block diagram illustrating a natural language dialogue system according to an embodiment of the invention.
FIG. 5B is a block diagram illustrating a natural language comprehension system according to FIG. 5A of the embodiment of the invention.
FIG. 5C is a block diagram illustrating a natural language dialogue system according to FIG. 5A of another embodiment of the invention.
FIG. 6 is a flowchart illustrating a method for correcting a speech response according to an embodiment of the invention.
FIG. 7A is a block diagram of the invention illustrating a natural language dialogue system used for outputting report answers according to user's preferences.
FIG. 7B is another block diagram of the invention illustrating a natural language dialogue system used for outputting report answers according to user's preferences.
FIG. 8A is a flowchart illustrating a natural language dialogue method for outputting report answers according to user's preferences.
FIG. 8B is a schematic diagram illustrating a plurality of records stored in a structured database used for outputting report answers according to user's preferences, wherein the records have specific data structures.
FIG. 9 is a schematic diagram illustrating a mobile terminal apparatus according to an embodiment of the invention.
FIG. 10 is a schematic diagram illustrating an information system according to an embodiment of the invention.
FIG. 11 is a flowchart illustrating a selection method based on speech recognition according to an embodiment of the invention.
FIG. 12 is a block diagram illustrating a speech control system according to an embodiment of the invention.
FIG. 13 is a block diagram illustrating a speech control system according to another embodiment of the invention.
FIG. 14 is a block diagram illustrating a speech control method according to an embodiment of the invention.
DETAILED DESCRIPTION OF EMBODIMENTS

The use of the conventional phrase list is subject to rigid input rules and is therefore incapable of recognizing diverse user input sentences, which usually introduces difficulties in searching for and acquiring the user's desired information owing to incorrect recognition of the user's intentions, or delivers unnecessary information to the user due to insufficient recognition capabilities. Conventional search engines may merely provide search results that contain separate data with little relevance, such that a user has to browse the search results one-by-one to capture his/her desired information therefrom, thus resulting in wasted time and even missed information. In view of the above, a search method and an associated search system that provide structured data are introduced herein. Specifically, different types of data are respectively stored into different specific fields. Thereby, when a user conducts searches based on his/her input information by using natural languages, the user's intentions may be promptly and correctly determined, and the desired information may then be provided to the user. Alternatively, more accurate information may be provided to the user for further selection if more determinations are needed.
FIG. 1 is a block diagram illustrating a natural language comprehension system according to an embodiment of the invention. With reference to FIG. 1, the natural language comprehension system 100 includes a search system 200, a natural language processor 300, and a knowledge comprehension assistance module 400 coupled to the search system 200 and the natural language processor 300. The search system 200 includes a structured database 220, a search engine 240, and a search interface unit 260, wherein the search engine 240 is coupled to the structured database 220 and the search interface unit 260. In the present embodiment, the search system 200 is equipped with the search interface unit 260, which should not be construed as a limitation to the invention. That is, in another embodiment of the invention, there may be no search interface unit 260 in the search system 200, and the search engine 240 conducts a full-text search in the structured database 220 after receiving the keyword 108 from API (Application Programming Interface) calls.
When a user sends his/her request information 102 to the natural language comprehension system 100, the natural language processor 300 parses the request information 102 and sends parsed possible intention syntax data 106 associated with the request information 102 to the knowledge comprehension assistance module 400. The possible intention syntax data 106 include a keyword 108 and intention data 112. The knowledge comprehension assistance module 400 obtains and then sends the keyword 108 included in the possible intention syntax data 106 to the search system 200, while the intention data 112 are stored in the knowledge comprehension assistance module 400. After the search engine 240 in the search system 200 conducts a full-text search in the structured database 220 according to the keyword 108, a response result 110 of the full-text search is transmitted back to the knowledge comprehension assistance module 400. The knowledge comprehension assistance module 400 may compare the response result 110 with the intention data 112 stored in the knowledge comprehension assistance module 400 to obtain confirmative intention syntax data 114, which are then directed to a parsed result output module 116 for further processing. According to the confirmative intention syntax data 114, the parsed result output module 116 delivers a parsed result 104 to a server (not shown). The server finally sends the required data to the user if the required data are found by means of the parsed result 104 (detailed explanations will be given in the following paragraphs). Note that the parsed result 104 may include the keyword 108, part of the information within a record (e.g., the serial number of each record 302), or all information of the record (e.g., a record shown in FIG. 3A/FIG. 3B) associated with the keyword 108, etc. Besides, the parsed result 104 may be directly converted into a speech output to the user by the server in one embodiment.
Additionally, the parsed result 104 may be processed in a certain manner (which will be elaborated hereinafter), and a speech output corresponding to the processed parsed result 104 may finally be output to the user. People skilled in the art are able to modify the way of outputting the information through the search system 200 based on various applications and/or demands, and the invention is not limited thereto.
The parsed result output module 116 may be combined with other modules in various applications. For instance, in an embodiment of the invention, the parsed result output module 116 may be integrated into the knowledge comprehension assistance module 400. In another embodiment, the parsed result output module 116 may be separated from the natural language comprehension system 100 and located in the server (that exemplarily contains the natural language comprehension system 100), and thus the server may directly receive and process the confirmative intention syntax data 114. In addition, the intention data 112 may be stored in a storage apparatus within the knowledge comprehension assistance module 400, in the natural language comprehension system 100, in the server (that exemplarily contains the natural language comprehension system 100), or in any storage apparatus that may be accessed by the knowledge comprehension assistance module 400. The invention is not limited thereto. Besides, the natural language comprehension system 100 that includes the search system 200, the natural language processor 300, and the knowledge comprehension assistance module 400 may be constituted by hardware, software, firmware, or a combination thereof, which should not be construed as limitations to the invention.
The natural language comprehension system 100 may be configured in a cloud server, a LAN server, a personal computer (PC), a mobile computer device (e.g., a notebook computer), or a mobile communication apparatus (e.g., a cell phone). The components of the natural language comprehension system 100 or those of the search system 200 need not be integrated into one machine. That is, the components of the natural language comprehension system 100 or those of the search system 200 may be located in different apparatuses or systems and may communicate with each other according to different communication protocols. For instance, the natural language processor 300 and the knowledge comprehension assistance module 400 may be configured in an identical smart phone, while the search system 200 is configured in a cloud server. Alternatively, the search interface unit 260, the natural language processor 300, and the knowledge comprehension assistance module 400 may be arranged in an identical notebook computer, while the search engine 240 and the structured database 220 may be configured in a LAN server. Additionally, when the natural language comprehension system 100 is configured in a server (i.e., a cloud server or a LAN server), the search system 200, the natural language processor 300, and the knowledge comprehension assistance module 400 may be configured in different computer hosts, while information and data transmissions among the search system 200, the natural language processor 300, and the knowledge comprehension assistance module 400 may be coordinated by the main system of the server. Certainly, according to applications and/or actual demands, two or all of the search system 200, the natural language processor 300, and the knowledge comprehension assistance module 400 may be integrated in a computer host, which should not be construed as a limitation to the invention.
As described herein, the user is able to send his/her request information to the natural language processor 300 in various manners, e.g., by way of speech inputs or textual descriptions. For instance, if the natural language comprehension system 100 is located in a cloud server or in a LAN server (not shown), the user may input the request information 102 through a mobile device (e.g., a cell phone, a personal digital assistant (PDA), a tablet PC, or any other similar system). Through telecommunication links provided by telecommunication service providers, the request information 102 may be transmitted to the natural language comprehension system 100 in a server, and therefore the natural language processor 300 may parse the request information 102. After the server confirms the user's intention, the parsed result 104 corresponding to the user's intention generated by the parsed result output module 116 may be processed by the server, and finally the information requested by the user may be transmitted back to the user's mobile device. For instance, the request information 102 from the user is a question (e.g., “what is the weather going to be tomorrow in Shanghai”) requesting the natural language comprehension system 100 to provide an answer. After the natural language comprehension system 100 parses the request information 102 and realizes that the user intends to learn the weather in Shanghai tomorrow, in one embodiment, the natural language comprehension system 100 may output the associated searched weather data, as the parsed result 104, to the user through the parsed result output module 116.
In addition, if the request information 102 from the user is “I want to watch Let the Bullets Fly (a Chinese movie)” or “I want to listen to Days When We Were Together (a Chinese song)”, the natural language processor 300 may obtain possible intention syntax data 106 including the associated keyword 108 and intention data 112 after parsing the request information 102, and then a full-text search may be conducted in the structured database 220 by the search engine 240 to recognize and confirm the user's intention.
Particularly, when the user's request information 102 is “what is the weather going to be tomorrow in Shanghai,” the natural language processor 300, after parsing the request information 102, may obtain possible intention syntax data 106:
“<queryweather>,<city>=Shanghai,<time>=tomorrow”
In an embodiment of the invention, if the natural language comprehension system 100 clearly recognizes the user's intention, the parsed result output module 116 of the natural language comprehension system 100 may directly output the parsed result 104 to the server, and the server may search the associated weather data requested by the user and then transmit the searched weather data to the user. Additionally, if the user's request information 102 is “I want to see the Romance of the Three Kingdoms,” the natural language processor 300 may obtain three possible intention syntax data 106 after parsing the request information 102:
“<readbook>,<bookname>=Romance of the Three Kingdoms”;
“<watchTV>,<TVname>=Romance of the Three Kingdoms”; and
“<watchfilm>,<filmname>=Romance of the Three Kingdoms”.
Since the keywords 108 (i.e., “Romance of the Three Kingdoms”) in the possible intention syntax data 106 may refer to different categories, i.e., book (<readbook>), TV drama (<watchTV>), and film (<watchfilm>), one request information 102 may derive many possible intention syntax data 106 after parsing. The knowledge comprehension assistance module 400 needs more parsing procedures to identify the user's request information 102. Moreover, if the user inputs “I want to see Let the Bullets Fly,” two possible intention syntax data 106 (as provided below) may be derived because “Let the Bullets Fly” may refer to a film or a book:
“<readbook>,<bookname>=Let the Bullets Fly”; and
“<watchfilm>,<filmname>=Let the Bullets Fly”.
The keyword 108 (i.e., “Let the Bullets Fly”) in the possible intention syntax data 106 may also refer to two fields, i.e., book (<readbook>) and film (<watchfilm>). The above-mentioned possible intention syntax data 106 may be further parsed by the knowledge comprehension assistance module 400 to obtain confirmative intention syntax data 114 to clarify the user's intention. When the knowledge comprehension assistance module 400 parses the possible intention syntax data 106, the knowledge comprehension assistance module 400 may transmit the keyword 108 (e.g., “Romance of the Three Kingdoms” or “Let the Bullets Fly”) to the search system 200 through the search interface unit 260. The structured database 220 in the search system 200 stores a plurality of records, wherein each record has a specific data structure. The search engine 240 may conduct a full-text search in the structured database 220 according to the keyword 108 received through the search interface unit 260 and then deliver a response result 110 back to the knowledge comprehension assistance module 400. The knowledge comprehension assistance module 400 may then obtain confirmative intention syntax data 114 based on the response result 110. Details of conducting the full-text search in the structured database 220 to derive the confirmative intention syntax data 114 will be described below with reference to FIG. 3A and FIG. 3B.
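As a rough illustration of how possible intention syntax data might be parsed and then confirmed against structured records, consider the following sketch. The parsing helper, the toy catalog, and the slot-to-indication mapping are assumptions made for illustration only; they are not the disclosed implementation.

```python
def parse_intention_syntax(data):
    """Split a possible intention syntax data string such as
    "<watchfilm>,<filmname>=Let the Bullets Fly" into intention data
    and a keyword dictionary. Format assumed from the examples above."""
    parts = data.split(",")
    intention = parts[0].strip("<>")
    keywords = {}
    for part in parts[1:]:
        slot, value = part.split("=", 1)
        keywords[slot.strip("<>")] = value
    return intention, keywords

# Toy stand-in for the structured database: keyword -> set of indication data.
catalog = {
    "Let the Bullets Fly": {"filmnameguid"},  # known only as a film here
    "Romance of the Three Kingdoms": {"booknameguid", "filmnameguid"},
}

def confirm_intention(possible, catalog):
    """Keep only the possible intention syntax data whose keyword actually
    appears in the catalog under a matching indication (hypothetical mapping)."""
    slot_to_guid = {"bookname": "booknameguid", "filmname": "filmnameguid"}
    confirmed = []
    for data in possible:
        intention, kws = parse_intention_syntax(data)
        for slot, value in kws.items():
            if slot_to_guid.get(slot) in catalog.get(value, set()):
                confirmed.append(data)
    return confirmed
```

In this toy setup, the two candidate readings of “I want to see Let the Bullets Fly” collapse to the film reading alone, mirroring the confirmation step described above.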
The natural language comprehension system 100 described herein is capable of capturing the keywords 108 included in the request information delivered from users and of determining the categories associated with the keywords 108 by conducting an associated full-text search in the structured database 220. For instance, if the user inputs “I want to watch the Romance of the Three Kingdoms,” the possible intention syntax data 106 falling into three different categories (book, TV drama, and film) may be correspondingly obtained. The natural language comprehension system 100 may further parse the possible intention syntax data 106 to recognize and ascertain the user's intention. Accordingly, the user may express his/her intention or deliver information colloquially and easily without using any specific term or expression, e.g., those words, phrases, or sentences recorded in the conventional phrase list.
FIG. 2 is a schematic diagram illustrating a parsed result obtained by a natural language processor 300 which parses various request information from a user according to an embodiment of the invention.
As shown in FIG. 2, when the user's request information 102 is “what is the weather going to be tomorrow in Shanghai,” the natural language processor 300 may obtain the possible intention syntax data 106 shown in the following after parsing the request information 102:
“<queryweather>,<city>=Shanghai,<time>=tomorrow”
Here, the intention data 112 are “<queryweather>”, and the keywords 108 are “Shanghai” and “tomorrow.” Since there is only one intention syntax data 106 (inquiring about the weather, <queryweather>) obtained after the natural language processor 300 parses the request information 102 in an embodiment of the invention, the knowledge comprehension assistance module 400 may directly capture the keywords 108 “Shanghai” and “tomorrow” therefrom and then send the parsed result 104 associated with these two keywords to the server so as to search for information regarding the weather (e.g., the parsed result 104 may be used for inquiring about the weather conditions in Shanghai tomorrow, such as weather, temperature, and so forth). Accordingly, it may be unnecessary for the knowledge comprehension assistance module 400 to conduct a full-text search in the structured database 220 to recognize the user's intention if the knowledge comprehension assistance module 400 considers that the single intention syntax data 106 parsed from the request information 102 is able to show what the user's intention is. Certainly, in an embodiment of the invention, the full-text search may still be conducted in the structured database 220 to further recognize and ascertain the user's intention, and people skilled in the art may modify the embodiments according to applications and/or actual demands.
If the user's request information 102 is “I want to see Let the Bullets Fly,” two possible intention syntax data 106 may be derived from the user's request information 102:
“<readbook>,<bookname>=Let the Bullets Fly”; and
“<watchfilm>,<filmname>=Let the Bullets Fly”.
According to these two corresponding intention data 112, “<readbook>” and “<watchfilm>”, and the same keyword 108 “Let the Bullets Fly”, the user's intention may be interpreted as “read the book Let the Bullets Fly” or “watch the film Let the Bullets Fly.” To further recognize and ascertain the user's intention, the keywords 108 “Let the Bullets Fly” are transmitted to the search interface unit 260 through the knowledge comprehension assistance module 400, and the search engine 240 conducts a full-text search in the structured database 220 according to the keywords 108 “Let the Bullets Fly,” so as to determine whether “Let the Bullets Fly” refers to a book or a film.
Additionally, if the user's request information 102 is “Days When We Were Together,” two possible intention syntax data 106 may be derived from the user's request information 102:
“<playmusic>,<singer>=When We Were Together, <songname>=Days”;
“<playmusic>,<songname>=Days When We Were Together”
According to the same intention data 112 “<playmusic>” and the two sets of corresponding keywords 108, i.e., “When We Were Together” and “Days” as well as “Days When We Were Together,” the user's intention may be interpreted as “listen to the song ‘Days’ performed by the music artist ‘When We Were Together’” and “listen to the song ‘Days When We Were Together’”, respectively. The knowledge comprehension assistance module 400 may transmit the first set of keywords 108 “When We Were Together” and “Days” and the second set of keywords 108 “Days When We Were Together” to the search interface unit 260 to make sure whether a song entitled “Days” and performed by the music artist “When We Were Together” actually exists (i.e., to recognize and ascertain the user's intention implied by the first set of keywords 108), and also to make sure whether a song entitled “Days When We Were Together” exists (i.e., to recognize and ascertain the user's intention implied by the second set of keywords 108). Note that the formats and the names corresponding to the possible intention syntax data 106 and the intention data 112 are not limited to those described herein.
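The two alternative keyword sets above can be checked against an index of actual records, keeping only the readings that exist. The following sketch assumes a tiny song index keyed by (singer, songname); the index contents and function name are illustrative, not part of the disclosed system.

```python
# Hypothetical song index: set of (singer, songname) pairs.
song_index = {("Andy Lau", "Days When We Were Together")}

def confirm_song_keywords(candidates, song_index):
    """Keep only the keyword sets that match an actual record, mirroring how
    the search system verifies each possible reading of the request."""
    confirmed = []
    for cand in candidates:
        singer, name = cand.get("singer"), cand.get("songname")
        if singer is not None:
            # Reading 1: a song "Days" by the artist "When We Were Together".
            exists = (singer, name) in song_index
        else:
            # Reading 2: a song "Days When We Were Together" by anyone.
            exists = any(n == name for _, n in song_index)
        if exists:
            confirmed.append(cand)
    return confirmed
```

With the index above, only the second reading survives, since no artist named “When We Were Together” is recorded.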
FIG. 3A is a schematic diagram illustrating a plurality of records stored into a structured database 220 according to an embodiment of the invention, wherein the records have specific data structures as shown therein.
In general, according to conventional methods of conducting full-text searches, the search results (e.g., obtained through Google or Baidu) are non-structured data and are thus separate and unrelated. Any user has to browse the search results one-by-one so as to find the information he/she wants, which is rather inconvenient and not user-friendly. By contrast, in the invention, search efficiency and accuracy are guaranteed by means of a structured database because the associated value data in each record of the structured database are correlated, and those value data within a record collectively demonstrate the category to which the record belongs. When the search engine 240 conducts a full-text search in the structured database 220, and when at least one value data in a record matches the keyword, the associated indication data corresponding to the matched value data may be output for the purpose of recognizing and ascertaining the user's intention included in the request information. Detailed implementations will be further described in the following embodiment.
According to an embodiment of the invention, each record 302 stored in the structured database 220 includes a title field 304 and a content field 306, wherein the title field 304 includes a plurality of sub-fields 308, each of which includes an indication field 310 and a value field 312. Within each record 302, the indication field 310 serves to store indication data, and the value field 312 serves to store value data. Record 1 shown in FIG. 3A is taken as an example for a more detailed explanation hereinafter. The three sub-fields 308 in the title field 304 of Record 1 respectively store:
“singerguid: Andy Lau”;
“songnameguid: Days When We Were Together”; and
“songtypeguid: HK and Taiwan, Cantonese, pop”;
Each indication field 310 of these three sub-fields 308 respectively stores the indication data “singerguid,” “songnameguid,” and “songtypeguid,” and the corresponding value fields 312 respectively store the value data “Andy Lau,” “Days When We Were Together,” and “HK and Taiwan, Cantonese, pop.” The indication data “singerguid” demonstrates that the value data “Andy Lau” is a singer's name, the indication data “songnameguid” demonstrates that the value data “Days When We Were Together” is a song name, and the indication data “songtypeguid” demonstrates that the value data “HK and Taiwan, Cantonese, pop” is a song type. The indication data may be represented by different digit numbers or characters, which should not be construed as a limitation to the invention. The content field 306 of Record 1 may store the lyrics of the song “Days When We Were Together” or other data related to this song (e.g., the composer/lyricist of the song). Note that the data stored in the content field 306 of each record as shown in FIG. 3A are merely exemplary, and whether the stored data are authentic or not should not be construed as a limitation to the invention.
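The record layout of Record 1 might be modeled as follows. The class names, the matching helper, and the content placeholder are illustrative assumptions for exposition, not the patented data structure itself.

```python
from dataclasses import dataclass

@dataclass
class SubField:
    indication: str  # indication data, e.g. "singerguid" — says what the value is
    value: str       # value data, e.g. "Andy Lau"

@dataclass
class Record:
    title: list      # list of SubField making up the title field
    content: str = ""  # content field: lyrics or other related data

# Record 1 from FIG. 3A, rebuilt with the hypothetical classes above.
record1 = Record(
    title=[SubField("singerguid", "Andy Lau"),
           SubField("songnameguid", "Days When We Were Together"),
           SubField("songtypeguid", "HK and Taiwan, Cantonese, pop")],
    content="(lyrics of the song)")

def matching_indications(record, keyword):
    """Full-text match within one record: return the indication data of every
    sub-field whose value data contains the keyword."""
    return [s.indication for s in record.title if keyword in s.value]
```

Searching Record 1 for the keyword “Days When We Were Together” would then report the indication “songnameguid”, telling the system the keyword is a song name.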
In the previous embodiment, each record includes the title field 304 and the content field 306, and each sub-field 308 in the title field 304 includes an indication field 310 and a value field 312. However, these fields 304 and sub-fields 308 should not be construed as limitations to the invention, and the record 302 may not contain the content field 306 or even the indication field 310 in some embodiments.
Besides, in an embodiment of the invention, a first special character is stored as a separation between two neighboring sub-fields 308 so as to separate the data of any two neighboring sub-fields 308, and a second special character is stored as a separation between the indication field 310 and the value field 312 within a sub-field 308 so as to separate the indication data in the indication field 310 from the value data in the value field 312. For instance, as shown in FIG. 3A, the second special character “:” (colon) serves to separate the indication data “singerguid” from the value data “Andy Lau,” to separate the indication data “songnameguid” from the value data “Days When We Were Together,” and to separate the indication data “songtypeguid” from the value data “HK and Taiwan, Cantonese, pop.” In Record 1, the first special character “|” (vertical bar) is applied to separate two neighboring sub-fields 308 within a record 302. Note that the special characters applied to separate the stored data are not limited to those described herein.
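The separator-based layout of a title field can be sketched as follows. This is a minimal Python illustration of the scheme described above, not the patented implementation; the helper name `parse_title_field` is hypothetical.

```python
# Hypothetical sketch: split a record's title field into sub-fields using
# the first special character "|", then split each sub-field into its
# indication data and value data using the second special character ":".

def parse_title_field(title_field):
    """Return a dict mapping indication data to value data."""
    parsed = {}
    for sub_field in title_field.split("|"):          # first special character
        indication, value = sub_field.split(":", 1)   # second special character
        parsed[indication.strip()] = value.strip()
    return parsed

# Record 1 from the example above.
record_1_title = ("singerguid: Andy Lau|"
                  "songnameguid: Days When We Were Together|"
                  "songtypeguid: HK and Taiwan, Cantonese, pop")

print(parse_title_field(record_1_title))
```

A full-text search could then compare a keyword against the parsed value data, and return the corresponding indication data on a match.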
In another aspect, according to an embodiment of the invention, the digit number of each sub-field 308 in the title field 304 may be fixed. For instance, each sub-field 308 may use 32 characters, and the indication field 310 may need 7 or 8 digits (for directing to 128 or 256 different indication data); that is, the digit numbers may be fixed in the invention. Besides, the first and second special characters respectively need fixed digit numbers as well. Therefore, after the digit numbers of the indication field 310 (e.g., 8 digits), the first special character (i.e., one character or 8 digits), and the second special character (i.e., one character or 8 digits) are subtracted from the total digit number of the sub-field 308, the remaining digit number of the sub-field 308 may be applied for storing the value data of the value field 312. Please note that the digit number of the sub-field 308 is fixed, and the data sequentially stored in each sub-field 308, i.e., the indication data in the indication field 310, the second special character, the value data in the value field 312, and finally the first special character, all have fixed digit numbers as well. Accordingly, the value data in the value field 312 may be directly obtained by skipping the proper digits. For example, if the value data “Andy Lau” in the first sub-field 308 are to be retrieved, the search system 200 may skip the digits associated with the indication field 310 (e.g., the first eight digits), the second special character (e.g., the consecutive 8 digits used for representing a colon), and also the first special character (e.g., the last 8 digits used for representing a vertical bar) within the first sub-field 308. In this example, there are 32-3=29 characters applied to store the value data (i.e., “Andy Lau”) in the value field 312.
The number “3” (i.e., 1+1+1) here refers to the character of the indication data in the indication field 310 (the first “1,” since the size of 8 digits is equivalent to that of one character), the first special character (the second “1”), and the second special character (the last “1”). Subsequently, category determinations may be made by comparing the retrieved value data 312 with the keyword 108. After the retrieved value data 312 are compared with the keyword 108 (regardless of whether the comparison results are successful or not), the next value data 312 in the sub-field 308 may be retrieved in the same manner (e.g., the value data “Days When We Were Together” in the second sub-field 308 of Record 1 are then retrieved) for further comparison. The keyword 108 may be firstly compared with the value data retrieved from Record 1, and after all the value data of Record 1 are compared, the value data of the first sub-field 308 of Record 2 (e.g., “Xiaogang Feng”) are then compared with the keyword 108. The comparisons may continue until the value data of all records are compared with the keyword 108.
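The fixed-width retrieval described above can be sketched roughly as follows. The widths (32-character sub-fields, a 1-character indication field, 1-character separators) are the example figures from the text; the function name and the sample title string are assumptions for illustration only.

```python
# Illustrative sketch (not the patented implementation): when every
# sub-field has a fixed width, value data can be read by skipping fixed
# offsets rather than scanning for separator characters.

SUB_FIELD_WIDTH = 32   # total characters per sub-field (example figure)
INDICATION_WIDTH = 1   # indication field: 8 digits = one character
SEPARATOR_WIDTH = 1    # second special character ":" after the indication
TRAILER_WIDTH = 1      # first special character "|" closing the sub-field

def extract_value(title_field, sub_field_index):
    """Jump directly to a sub-field's value data by arithmetic alone."""
    start = sub_field_index * SUB_FIELD_WIDTH + INDICATION_WIDTH + SEPARATOR_WIDTH
    end = (sub_field_index + 1) * SUB_FIELD_WIDTH - TRAILER_WIDTH
    return title_field[start:end].rstrip()   # value data padded to 29 chars

# A hypothetical fixed-width encoding of Record 1 ("s", "n", "t" stand in
# for the 8-digit indication data).
record_1_title = (
    "s:" + "Andy Lau".ljust(29) + "|" +
    "n:" + "Days When We Were Together".ljust(29) + "|" +
    "t:" + "HK and Taiwan, Cantonese, pop".ljust(29) + "|"
)

print(extract_value(record_1_title, 0))
```

Because every offset is a multiple of a fixed width, the same jump could be realized with bit-shift operations in hardware, as the text notes.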
Note that the digit number of each sub-field 308, and also the digit numbers of the fields within the sub-field 308, including the digit numbers of the indication field 310, the first special character, and the second special character, may be changed according to practical applications and/or actual demands. The invention is not limited thereto. The comparison between the keyword 108 and the retrieved value data 312 is exemplified in the previous embodiment and should not be construed as a limitation to the invention. In another embodiment, the full-text search may be conducted by comparing the keyword 108 with all contents of the records 302 character by character. Besides, skipping the digits of the indication field 310, the second special character, and the first special character may be achieved by means of bit-shift operations (e.g., division) under hardware, software, or a combination thereof. Anyone skilled in the art may make necessary modifications based on his/her practical applications and/or actual demands. In another embodiment, the title field 304 may not include the first and second special characters; each sub-field 308 in the title field 304 may be established by using a fixed digit number, and the indication field 310 in the sub-field 308 may be established by means of another fixed digit number different from that of the sub-field 308. Since the digit numbers of both the sub-field 308 and the indication field 310 are fixed, the indication data or the value data in each sub-field 308 may be directly retrieved by skipping certain digit numbers through bit-shift operations (e.g., division).
When the digit number of each sub-field 308 is fixed, a counter may be used in the search system 200 (or in a server having the natural language comprehension system 100) to register which sub-field 308 is currently compared. Additionally, another counter may be employed to store the order of the record which is currently compared. For instance, a first counter is applied to show the order of the currently compared record, and a second counter is applied to show the order of the currently compared sub-field. If the data in the third sub-field 308 (i.e., “filenameguid: Huayi Brothers Media”) of Record 2 shown in FIG. 3A are currently compared, the value stored by the first counter is 2, indicating the currently compared record is Record 2, and the value stored by the second counter is 3, indicating the currently compared sub-field is the third one. For the purpose of reserving most digits of the sub-field 308 as the storage of the value data 312, the indication field 310 merely contains 7 or 8 digits in the embodiment. Moreover, the stored indication data (by means of 8 digits) in the indication field 310 may act as an indicator/pointer for retrieving the actual indication data from an indication data storage apparatus 280. In one embodiment, the indication data are stored in tables; however, any kind of data structure may be applied to store the indication data of the invention as long as the indication data are accessible by the search system 200. Practically, the value data may be directly retrieved for comparison, and the indication data may be directly retrieved according to the values of these two counters if a matched comparison result is found. The retrieved indication data may serve as the response result 110 and may then be transmitted to the knowledge comprehension assistance module 400 for further processing.
For instance, when the data in the second sub-field 308 (i.e., “songnameguid: Betrayal”) of Record 6 are compared and matched with the keyword 108, the current values of the first/second counters are 6/2, respectively. Therefore, according to these two counter values, the indication data may be obtained by searching the associated table as shown in FIG. 3C stored in the indication data storage apparatus 280, and the table indicates that the indication data in the second sub-field of Record 6 are “songnameguid.” In a further embodiment of the invention, all digits in the sub-field 308 may be applied to store the value data if the digit number of each sub-field 308 is fixed. Thereby, the indication field 310, the first special character, and the second special character may be completely removed. In this case, the search engine 240 is aware that it goes to the next sub-field after passing a fixed digit number, and the value of the second counter increases by one thereafter. Certainly, the value of the first counter increases by one when searching the next record. In one embodiment, every record of the structured database 220 may be designed to have an identical size, and the number of sub-fields 308 within a record may be fixed to a predetermined number, such that the search engine 240 is aware it reaches the end of a record once identical-sized data have been parsed for the record. In another embodiment, a third special symbol, e.g., a period or the like, is placed at the end of a record, such that the search engine 240 is aware it reaches the end of a record whenever this predetermined symbol is found. Thereby, more digits may be applied to store the value data.
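The two-counter bookkeeping can be sketched as follows; this is a hedged illustration, with the record contents and the lookup table (standing in for the indication data storage apparatus 280) assumed for the example only.

```python
# Minimal sketch: the first counter tracks which record is currently
# compared, the second counter which sub-field; on a match, the counter
# pair is used to look up the actual indication data in a table.

INDICATION_TABLE = {            # (record, sub-field) -> indication data
    (6, 1): "singerguid",
    (6, 2): "songnameguid",
}

def find_match(records, keyword):
    """Return (record_counter, sub_field_counter, indication) on a hit."""
    for record_counter, sub_fields in records.items():       # first counter
        for sub_field_counter, value in enumerate(sub_fields, start=1):
            if value == keyword:                             # second counter
                indication = INDICATION_TABLE[(record_counter,
                                               sub_field_counter)]
                return record_counter, sub_field_counter, indication
    return None   # no matched comparison result

# Assumed value data of Record 6 for illustration.
records = {6: ["Jam Hsiao, Aska Yang, Gary Chaw", "Betrayal"]}
print(find_match(records, "Betrayal"))
```

Keeping only an 8-digit pointer in each sub-field and resolving it through the table is what frees most of the sub-field's digits for value data.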
Another example is provided herein to explain the process of transmitting the response result 110 back to the knowledge comprehension assistance module 400 for further processing after the matched comparison result is found. According to the data structure of the records 302, in one embodiment of the invention as shown above, if the user's request information 102 is “I want to see Let the Bullets Fly,” two possible intention syntax data 106 may be derived from the user's request information 102:
“<readbook>,<bookname>=Let the Bullets Fly”; and
“<watchfilm>,<filmname>=Let the Bullets Fly”;
According to the keyword 108 “Let the Bullets Fly” received through the search interface unit 260, the search engine 240 conducts a full-text search in the title field 304 of each record stored in the structured database 220 shown in FIG. 3A. In the title field 304 of Record 5, the value data “Let the Bullets Fly” are found, and thus a matched result is obtained. The search system 200 then transmits the indication data “filmnameguid” (in the third sub-field of the title field 304 in Record 5) as the response result 110 back to the knowledge comprehension assistance module 400. Since the third sub-field in Record 5 includes the indication data “filmnameguid” corresponding to the value data “Let the Bullets Fly,” the knowledge comprehension assistance module 400 is able to compare the indication data “filmnameguid” with the previously stored intention data 112 “<watchfilm>” and “<readbook>” in the possible intention syntax data 106, so as to determine that the confirmative intention syntax data 114 corresponding to the request information 102 are “<watchfilm>,<filmname>=Let the Bullets Fly” (because of the word “film”). That is, the data “Let the Bullets Fly” described in the user's request information 102 refer to the name of a film, and the user's intention contained in the request information 102 is to watch the film “Let the Bullets Fly” instead of reading the book “Let the Bullets Fly.” The confirmative intention syntax data 114 “<watchfilm>,<filmname>=Let the Bullets Fly” are then directed to the parsed result output module 116 for further processes.
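The comparison step above can be sketched as a simple lookup; the mapping from indication data to intention data and the function name are assumptions for illustration (the text only specifies that “film” appears in both “filmnameguid” and “<watchfilm>”).

```python
# Hedged sketch: given the indication data returned by the search, keep
# the candidate intention syntax whose intention data corresponds to it.

INDICATION_TO_INTENTION = {     # assumed correspondence for this sketch
    "filmnameguid": "<watchfilm>",
    "booknameguid": "<readbook>",
    "songnameguid": "<playmusic>",
}

def pick_confirmed_syntax(candidates, indication_data):
    """candidates: list of (intention data, full syntax string)."""
    wanted = INDICATION_TO_INTENTION[indication_data]
    for intention, syntax in candidates:
        if intention == wanted:
            return syntax       # the confirmative intention syntax data
    return None

candidates = [
    ("<readbook>", "<readbook>,<bookname>=Let the Bullets Fly"),
    ("<watchfilm>", "<watchfilm>,<filmname>=Let the Bullets Fly"),
]
print(pick_confirmed_syntax(candidates, "filmnameguid"))
```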
Relevant explanations are further provided in the following example. If the user's request information 102 is “I want to listen to Days When We Were Together,” two possible intention syntax data 106 may be derived from the user's request information 102:
“<playmusic>,<singer>=when we were together, <songname>=days”; and
“<playmusic>,<songname>=Days When We Were Together”;
The search engine 240 then conducts a full-text search in the title field 304 of the records stored in the structured database 220 as shown in FIG. 3A according to the two sets of keywords 108 received through the search interface unit 260:
“When We Were Together” and “Days”; and
“Days When We Were Together”
During the full-text searches, no matched result corresponding to the first set of keywords 108 (i.e., “When We Were Together” and “Days”) is found in any record, but Record 1 corresponding to the second set of keywords 108 (i.e., “Days When We Were Together”) is found. Hence, the search system 200 considers the indication data “songnameguid” (corresponding to the second set of keywords 108) in the title field 304 of Record 1 as the response result 110 and then transmits the response result 110 back to the knowledge comprehension assistance module 400. After receiving the indication data “songnameguid” corresponding to the value data “Days When We Were Together,” the knowledge comprehension assistance module 400 compares the intention data 112 (i.e., <singer>, <songname>, etc.) in the possible intention syntax data 106 (i.e., “<playmusic>,<singer>=When We Were Together, <songname>=Days” and “<playmusic>,<songname>=Days When We Were Together”) with the indication data “songnameguid” and then observes that the user's request information 102 does not contain any data relating to a singer named “Days” but does relate to a song titled “Days When We Were Together” (because only <songname> is successfully matched). Through this comparison procedure, the knowledge comprehension assistance module 400 is able to determine that the required confirmative intention syntax data 114 corresponding to the request information 102 are “<playmusic>,<songname>=Days When We Were Together,” and the user's intention included in the request information 102 is to listen to the song “Days When We Were Together.”
In another embodiment of the invention, the searched response result 110 may be a completely matched record completely matching the keywords 108 or a partially matched record partially matching the keywords 108. For instance, if the user's request information 102 is “I want to listen to Betrayal of Jam Hsiao,” the natural language processor 300 may obtain two possible intention syntax data 106 after parsing the request information 102:
“<playmusic>,<singer>=Jam Hsiao,<songname>=Betrayal”;
“<playmusic>,<songname>=Betrayal of Jam Hsiao”;
and the natural language processor 300 transmits two sets of keywords 108 to the search interface unit 260:
“Jam Hsiao” and “Betrayal”; and
“Betrayal of Jam Hsiao”;
According to the keywords 108 received through the search interface unit 260, the search engine 240 conducts a full-text search in the title field 304 of each record 302 stored in the structured database 220 shown in FIG. 3A. During the full-text search, no matched result corresponding to the second set of keywords 108 (i.e., “Betrayal of Jam Hsiao”) is found in the title fields 304 of all records 302, but Records 6 and 7 corresponding to the first set of keywords 108 (i.e., “Jam Hsiao” and “Betrayal”) are matched. Record 6 is a partially matched record since merely “Jam Hsiao” of the first set of keywords 108 matches the value data “Jam Hsiao” in Record 6, while “Betrayal” of the first set of keywords does not match the other value data “Aska Yang” and “Gary Chaw.” By contrast, Record 7 is a completely matched record because the first set of keywords 108 “Jam Hsiao” and “Betrayal” are both found in the first and second value data of Record 7 (because both “Jam Hsiao” and “Betrayal” are successfully matched). Note that Record 5 corresponding to the request information 102 “I want to watch Let the Bullets Fly” and Record 1 corresponding to the request information 102 “I want to listen to Days When We Were Together” are also partially matched records. In an embodiment of the invention, when the search interface unit 260 outputs a plurality of response results 110 to the knowledge comprehension assistance module 400, the search interface unit 260 may sequentially output the completely matched records and then the partially matched records, since the priority of the completely matched records may be set higher than that of the partially matched records. Hence, when the search interface unit 260 outputs the response results 110 associated with Record 6 and Record 7, the output priority of Record 7 is greater than that of Record 6 because all value data “Jam Hsiao” and “Betrayal” in Record 7 are successfully matched, while the value data “Aska Yang” and “Gary Chaw” in Record 6 are not matched.
In other words, since the priority of a matched record is higher than the others if this record matches the keywords 108 to a greater extent, the knowledge comprehension assistance module 400 is able to search for or determine the required confirmative intention syntax data 114 efficiently. In another embodiment, the indication value of the matched record with the highest priority among all matched records may be directly output as the response result 110 (and may become the confirmative intention syntax data 114 later). The above descriptions should not be construed as limitations to the invention. In another embodiment, as long as any matched record is found, the associated indication value of this matched record is output without considering its priority, so as to expedite the search processes. For instance, if the request information 102 is “I want to listen to Betrayal of Jam Hsiao,” and a matched result is found in Record 6, the corresponding indication data in Record 6 are output as the response result 110 immediately. In another embodiment, the associated operation(s) for the record having the highest priority may be directly performed and then provided to the user. For instance, if the record “play the film of Romance of the Three Kingdoms” has the highest priority, the film “Romance of the Three Kingdoms” may be directly played. Moreover, if the record “play the song of Betrayal performed by Jam Hsiao” has the highest priority, the song “Betrayal” performed by Jam Hsiao may be directly played. Note that the above descriptions are merely descriptive but not restrictive.
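The priority ordering of completely and partially matched records can be sketched as follows. The record contents are assumptions loosely modeled on Records 6 and 7, and exact string equality stands in for the actual matching procedure, so this is an illustration of the ordering only.

```python
# Rough sketch: completely matched records (every keyword equals some
# value data of the record) are output before partially matched records.

def classify(record_values, keywords):
    hits = sum(1 for kw in keywords if kw in record_values)  # exact matches
    if hits == len(keywords):
        return "complete"
    return "partial" if hits else "none"

def ordered_results(records, keywords):
    """Return record ids, completely matched first, then partially matched."""
    ranked = [(rid, classify(vals, keywords)) for rid, vals in records.items()]
    complete = [rid for rid, c in ranked if c == "complete"]
    partial = [rid for rid, c in ranked if c == "partial"]
    return complete + partial

# Assumed value data: Record 6 lists three singers; Record 7 lists the
# singer and the song name.
records = {
    6: ["Jam Hsiao", "Aska Yang", "Gary Chaw"],
    7: ["Jam Hsiao", "Betrayal"],
}
print(ordered_results(records, ["Jam Hsiao", "Betrayal"]))
```

Under this ordering Record 7 is emitted before Record 6, matching the priority described in the text.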
In yet another embodiment of the invention, if the user's request information 102 is “I want to listen to Betrayal of Andy Lau,” one of the corresponding possible intention syntax data 106 may be:
“<playmusic>,<singer>=Andy Lau,<songname>=Betrayal”;
If the keywords 108 “Andy Lau” and “Betrayal” are input to the search engine 240 through the search interface unit 260, no matched result will be found in the database shown in FIG. 3A. In yet another embodiment of the invention, the keywords 108 “Andy Lau” and “Betrayal” may be respectively input to the search engine 240 through the search interface unit 260, and a result indicating that “Andy Lau” is a singer's name (the indication data “singerguid”) and another result indicating that “Betrayal” is a song title (the indication data “songnameguid,” while the song may be performed by Gary Chaw or by Jam Hsiao, Aska Yang, and Gary Chaw together) may be respectively obtained. Alternatively, the natural language comprehension system 100 may further remind the user of “whether the song Betrayal is performed by Jam Hsiao (according to the matched result of Record 7)” or “whether the song Betrayal is performed by Jam Hsiao, Aska Yang, and Gary Chaw together (according to the matched result of Record 6).”
In yet another embodiment of the invention, each record stored in the structured database 220 may further include a source field 314 and a popularity field 316. As shown in FIG. 3B, each record stored in the structured database 220 not only has the fields shown in FIG. 3A but also owns the source field 314, the popularity field 316, the preference field 318, and the dislike field 320. The source field 314 of each record 302 stores an indication/pointer regarding the source structured database from which the record 302 comes (please note that only one structured database 220 is shown in the drawings, but there may actually be various structured databases), the user who provides the record 302, or the server which provides information relating to the record 302. According to the preferences derived from the request information 102 previously provided by the user, the search system 200 may search a certain structured database. For instance, when the keyword 108 included in the request information 102 is applied to conduct a full-text search and a matched result is found, the popularity value of the matched record will automatically increase by one. The popularity field 316 of each record 302 stores a search popularity or a popularity value of the record 302, which may refer to the number of matches or the matching probability of the record 302 regarding the request information 102 provided by an identical user, all users of a special group, or all users during a time interval. Thereby, the knowledge comprehension assistance module 400 is able to determine the user's intention according to the current popularity. The ways of employing the preference field 318 and the dislike field 320 will be introduced later in the following paragraphs. Specifically, if the user's request information 102 is “I want to see the Romance of the Three Kingdoms,” the natural language processor 300 may obtain several possible intention syntax data 106 after parsing the request information 102:
“<readbook>,<bookname>=Romance of the Three Kingdoms”;
“<watchTV>,<TVname>=Romance of the Three Kingdoms”; and
“<watchfilm>,<filmname>=Romance of the Three Kingdoms”.
In one embodiment, if the search system 200 browses and gathers statistical historical records (i.e., the number of times the record 302 is selected by a certain user, as stored in the popularity field 316) associated with the user's request information 102, the search system 200 may conduct a search in the structured database specifically storing films and conclude that most of the user's requests are to watch films (assuming there is only one record respectively relating to the book, the TV drama, and the film of “Romance of the Three Kingdoms,” and the value of the popularity field 316 for watching the film is higher than those for watching TV and reading the book), and thereby the knowledge comprehension assistance system 400 may determine “<watchfilm>,<filmname>=Romance of the Three Kingdoms” as the confirmative intention syntax data 114 (after a response result 110 indicating such a conclusion is received). In yet another embodiment, the search system 200 may browse and gather statistics of the popularity fields 316 of all matched records for further identification if there are many records indicating the identical category. For example, if there is more than one record in the structured database 220 respectively relating to the book, the TV drama, and the film of “Romance of the Three Kingdoms,” the search system 200 may gather statistics of the matched records so as to find which category has the largest value. For example, if there are 5, 13, and 16 matched records respectively relating to the book, the TV drama, and the film of “Romance of the Three Kingdoms,” the summation of the values of the five popularity fields 316 relating to the book is 30, the summation of the values of the thirteen popularity fields 316 relating to the TV drama is 18, and the summation of the values of the sixteen popularity fields 316 relating to the film is 25, the search system 200 may select the matched record having the largest popularity field 316 among the five records relating to the book of “Romance of the Three Kingdoms.”
The associated indication value (which may also include the value stored in the source field 314) of this selected record may then be directed to the knowledge comprehension assistance module 400 for further processing. In one embodiment, the value stored in the source field 314 may be the code indicating where to find the database specifically storing the film. Moreover, the code stored in the source field 314 may be delivered to the knowledge comprehension assistance system 400 as a part of the response result 110 so as to show the user where to obtain the drama for playback. The way of changing the value stored in the popularity field 316 may vary according to the different computer systems equipped with the natural language comprehension system 100, and the invention is not limited thereto. Besides, the value of the popularity field 316 may gradually decrease as time goes by, so as to indicate that the user is gradually no longer interested in the record 302. The invention is not limited thereto as well.
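The category statistics described above can be sketched as follows: matched records are grouped by category, the popularity values are summed per category, and the single record with the largest popularity value is selected from the winning category. The record tuples and the function name are assumptions for illustration.

```python
# Hedged sketch of selecting a record by aggregated popularity.

def select_by_popularity(matched):
    """matched: list of (record_id, category, popularity). Return record id."""
    totals = {}
    for _, category, popularity in matched:     # sum popularity per category
        totals[category] = totals.get(category, 0) + popularity
    best_category = max(totals, key=totals.get)
    in_category = [m for m in matched if m[1] == best_category]
    return max(in_category, key=lambda m: m[2])[0]   # largest within category

# Assumed matched results, loosely modeled on Records 8-10 of FIG. 3B.
matched = [(8, "book", 2), (9, "TV", 5), (10, "film", 8)]
print(select_by_popularity(matched))
```

With these assumed values the “film” category wins, so the indication data of Record 10 would be output first, as in the example in the text.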
In another embodiment of the invention, the user may particularly enjoy watching the TV drama of “Romance of the Three Kingdoms” during a certain period of time. Since “Romance of the Three Kingdoms” is a long-running drama, and the user is not able to watch all episodes at one time, the user may repeatedly select the TV drama of “Romance of the Three Kingdoms” within a period of time. If the value in the popularity field 316 increases by one every time the TV drama of “Romance of the Three Kingdoms” is selected, the record 302 may be repeatedly matched, and the search system 200 may learn this from browsing the data stored in the popularity field 316. In yet another embodiment, the data stored in the popularity field 316 may also be employed to represent the popularity of accessing the data provided by a certain provider, and a telecommunication service provider may store a code of the provider in the source field 314. For instance, assume that the “film of Romance of the Three Kingdoms” provided by a certain service provider is most frequently selected. When a user inputs his/her request information 102 indicating “I want to see the Romance of the Three Kingdoms,” the full-text search conducted in the structured database shown in FIG. 3B may find three matched results: “read the book of Romance of the Three Kingdoms” (Record 8), “watch the TV drama of Romance of the Three Kingdoms” (Record 9), and “watch the film of Romance of the Three Kingdoms” (Record 10). However, since the data in the popularity fields 316 show that watching the film of Romance of the Three Kingdoms is the most popular option (i.e., the values in the popularity fields of Records 8, 9, and 10 are 2, 5, and 8, respectively), the indication data of Record 10 may be firstly provided as the response result 110 and output to the knowledge comprehension assistance system 400 for determining the user's intention.
In an embodiment of the invention, the data in the source field 314 may be simultaneously provided to the user, so as to show the user the service provider who provides the film for watching (and he/she may link to this service provider to watch the film). In another embodiment, if there are many records providing the “film of Romance of the Three Kingdoms” for the user's watching, the search system 200 may deliver the data within the source field 314 of the record having the largest value in the popularity field 316 among all records providing the same contents (i.e., providing “a film of Romance of the Three Kingdoms”). Note that the way of changing the value stored in the source field 314 may vary according to the different computer systems equipped with the natural language comprehension system 100, and the invention is not limited thereto. The information included in the popularity field 316, the preference field 318, and the dislike field 320 shown in FIG. 3B may be further divided into two parts respectively related to an individual user and to all users. Furthermore, the information included in the popularity field 316, the preference field 318, and the dislike field 320 and related to the individual user may be stored in the user's cell phone, while the server may store the information included in the popularity field 316, the preference field 318, and the dislike field 320 and related to all users. Thereby, the personal information with respect to a user's selections or intentions is merely stored in his/her own mobile communication apparatus (e.g., a cell phone, a tablet PC, a small notebook computer, and so on), and the server stores the information related to all users. The purposes of enhancing storage efficiency for servers and ensuring the privacy of users' personal information are thus achieved simultaneously.
Apparently, the value data in each record of the structured database described herein are correlated (e.g., the value data “Andy Lau,” “Days When We Were Together,” and “HK and Taiwan, Cantonese, pop” in Record 1 all serve to describe the category of Record 1), and the value data (accompanied by the associated indication data) in each record collectively illustrate a user's intention corresponding to the user's request information (e.g., when the value data “Days When We Were Together” are matched with the keywords in the user's request information, it indicates that the user may intend to access the data in Record 1). Thereby, when the search system 200 conducts the full-text search in the structured database 220, and when the value data in a certain record are matched, the indication data (e.g., “songnameguid”) corresponding to the matched value data may be output as the response result 110, so as to determine the intention corresponding to the request information 102 (e.g., through comparison in the knowledge comprehension assistance module 400).
Based on the contents disclosed or taught by the exemplary embodiments, FIG. 4A shows a flowchart illustrating a search method according to an embodiment of the invention. With reference to FIG. 4A, the search method includes the following steps.
In step S410, a structured database that stores a plurality of records having structuralized data is provided.
In step S420, at least one keyword is received.
In step S430, a full-text search is conducted in a title field of each record according to the keyword. For instance, the keyword 108 is input to the search interface unit 260, such that the search engine 240 may conduct the full-text search in the title fields 304 of the records 302 in the structured database 220. The actual search processes may be referred to the descriptions associated with FIGS. 1 to 3A/3B, or may be modified without departing from the spirit and scope of the invention.
In step S440, after the full-text search is conducted, the search engine 240 determines whether a matched result is found. For instance, the search engine 240 is applied to determine whether a matched result corresponding to the keywords 108 is found during the full-text search processes.
If there is a matched result, a response result according to a completely matched record and a partially matched record is sequentially output in step S450. For instance, if the keyword 108 is matched with records in the structured database 220, the search interface unit 260 sequentially outputs the indication data corresponding to the completely matched records and then the indication data corresponding to the partially matched records. The indication data may be obtained through the indication data storage apparatus 280 shown in FIG. 3C and may act as the response result 110 sent to the knowledge comprehension assistance system 400. In one embodiment, the response result 110 may further comprise information associated with the matched record, e.g., the values stored in the source field 314 and/or the content field 306, for further processing (such as redirecting to the associated database as aforementioned). The priority of the completely matched record is higher than that of the partially matched record in one embodiment.
From another perspective, if no matched result is found (e.g., the full-text search is conducted according to the keywords “Andy Lau” and “Betrayal” such that no matched result is found), the natural language comprehension system 100 may inform the user of such a mismatch situation and then terminate the search processes afterward. Alternatively, the user may be informed that no matched result is found, and he/she may input another request again. Additionally, the natural language comprehension system 100 may also provide some possible options to the user for further selection (step S460).
The above-mentioned steps and processes are not limited to those described herein, and some of the steps and processes may be omitted. For instance, in an embodiment of the invention, a match determination module (not shown) located outside the search system 200 may be applied to determine whether an output response result 110 indicates a completely or partially matched record in step S440. In another embodiment, step S450 may be skipped.
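Steps S410 through S460 can be tied together in a rough sketch; the database shape (record id mapped to a list of value data) and the function name are assumptions standing in for the search system's actual components.

```python
# Minimal sketch of the FIG. 4A search method.

def search_method(database, keywords):
    """database: record id -> list of value data (S410); keywords (S420).
    Returns matched record ids, completely matched first, or None."""
    complete, partial = [], []
    for record_id, values in database.items():
        # S430: full-text search over the title-field value data.
        hits = sum(1 for kw in keywords if kw in values)
        if hits == len(keywords):
            complete.append(record_id)   # completely matched (higher priority)
        elif hits:
            partial.append(record_id)    # partially matched
    # S440/S450: output complete matches before partial matches;
    # S460: None signals "no match" so the caller can inform the user.
    return (complete + partial) or None

database = {
    1: ["Andy Lau", "Days When We Were Together"],
    7: ["Jam Hsiao", "Betrayal"],
}
print(search_method(database, ["Days When We Were Together"]))
```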
Based on the contents disclosed or taught by the exemplary embodiments, FIG. 4B shows a flowchart illustrating a work process of the natural language comprehension system 100 according to another embodiment of the invention. With reference to FIG. 4B, the work process of the natural language comprehension system 100 includes the following steps.
In step S510, a user's request information is received. For instance, a user may transmit his/her request information 102, represented by means of speech or textual content, to the natural language comprehension system 100.
In step S520, a structured database storing a plurality of records is provided.
In step S530, the request information is grammaticalized. For instance, after the natural language processor 300 parses the user's request information 102, the natural language processor 300 converts the parsed request information 102 into corresponding possible intention syntax data 106.
In step S540, possible categories associated with a keyword are recognized. For instance, the knowledge comprehension assistance system 400 may recognize the possible categories of the keyword 108 in the possible intention syntax data 106; e.g., the keyword “Romance of the Three Kingdoms” may be categorized as a book, a film, or a TV drama.
In step S550, a full-text search is conducted in a title field 304 of each record according to the keyword 108. For instance, the keyword 108 is input via the search interface unit 260, such that the search engine 240 conducts the full-text search in the title field 304 of each record 302 stored in the structured database 220.
In step S560, after the full-text search is conducted, the search engine 240 may determine whether a matched result is found. For instance, the search engine 240 may determine whether a matched result (whether completely or partially matched) corresponding to the keyword 108 is found after the full-text search is conducted.
If there is a matched result, in step S570, a completely matched record and a partially matched record acting as the response result 110 may be sequentially output. For instance, if the keyword 108 is matched with records in the structured database 220, the search interface unit 260 may sequentially output the indication data corresponding to the completely matched records and then those corresponding to the partially matched records, wherein the output indication data are considered the response result 110.
Here, the priority of the completely matched record is higher than that of the partially matched record.
In step S580, corresponding confirmative intention syntax data are sequentially output. For instance, the knowledge comprehension assistance module 400 outputs the confirmative intention syntax data 114 according to the sequentially output indication data, which may correspond to the completely matched record and the partially matched record, respectively.
From another perspective, if no matched result is found in step S560 (e.g., the full-text search is conducted according to “Andy Lau” and “Betrayal” and no matched result is found), a step similar to step S460 is performed (i.e., the user may be informed of the match failure, and the process is terminated). Alternatively, the user may be informed by the disclosed system that no matched result is found and may need to input another request. In one embodiment, the disclosed system may provide some possible options to the user for further selection (step S590).
The above-mentioned steps and processes are not limited to those described herein, and some of the steps and processes may be omitted.
In conclusion, the keywords included in the user's request information are captured, and the full-text search is conducted in the title fields of the records (having the structures illustrated in FIGS. 3A and 3B) in the structured database. If there is a matched result, the category of the keyword may be compared with the intention data so as to recognize and ascertain the user's intention corresponding to the request information.
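For illustration purposes only, the comparison between the category of a matched record and the possible intention data may be sketched as follows; the mapping table, its values, and the function name are hypothetical stand-ins for the intention data described above, not an exact implementation of the disclosed system.

```python
# Map each possible intention datum to the record category it is compatible
# with (the intention tokens follow the <readbook>/<watchTV>/<watchfilm>
# examples given in the text; the mapping itself is an assumption).
INTENTION_TO_CATEGORY = {
    "<readbook>": "book",
    "<watchTV>": "TV drama",
    "<watchfilm>": "film",
}


def confirm_intention(possible_intentions, matched_category):
    """Return the intention whose expected category matches the category of
    the record found by the full-text search, or None when nothing fits."""
    for intention in possible_intentions:
        if INTENTION_TO_CATEGORY.get(intention) == matched_category:
            return intention
    return None


confirmed = confirm_intention(
    ["<readbook>", "<watchTV>", "<watchfilm>"], "film")
```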
The structured database may be further applied to recognize speech, which will be elaborated hereinafter. Specifically, descriptions are given for illustrating how the natural language comprehension system 100 is employed to revise an incorrect speech response according to the user's successive speech input, and how the natural language comprehension system 100 is able to find possible report answers to report to the user for further selection.
As discussed above, a common mobile communication apparatus is able to perform the natural language dialogue function, such that the user may communicate with the mobile communication apparatus by means of his/her own speech. However, in the common mobile communication apparatus, if the user's speech input is unclear or unrecognized, the system may output a speech response that does not conform to the user's speech input, i.e., one referring to different intentions or purposes. The user may thus receive speech responses that do not conform to his or her intention during some dialogue scenarios. Therefore, a method and a related natural language dialogue system for correcting speech responses are provided herein; the disclosed natural language dialogue system is able to correct an erroneous speech response according to the user's following speech inputs and then find additional possible report answers to report to the user. In order to make the invention more comprehensible, embodiments accompanied by examples are described in the following paragraphs.
FIG. 5A is a block diagram illustrating a natural language dialogue system according to an embodiment of the invention. With reference to FIG. 5A, the natural language dialogue system 500 includes a speech sampling module 510, a natural language comprehension system 520, and a speech synthesis database 530. According to an embodiment of the invention, the speech sampling module 510 receives a first speech input 501 (e.g., from a user), which is then parsed to generate first request information 503. The natural language comprehension system 520 parses the first request information 503 and thereby obtains a first keyword 509 included in the first request information 503. After a first report answer 511 that matches the first request information 503 is found, the natural language comprehension system 520 performs a corresponding speech search in the speech synthesis database 530 according to the first report answer 511 so as to find a first speech 513. The natural language comprehension system 520 then generates a first speech response 507 (which is responsive to the first speech input 501) according to the first speech 513, and finally outputs the first speech response 507 to the user. The first request information 503 may be considered as the request information 102 described and depicted in FIG. 1 and follows the same processing procedures. That is, after the request information 102 is parsed, the possible intention syntax data 106 are generated, and the keyword 108 therein may be applied to conduct the full-text search in the structured database 220 to obtain a response result 110. This response result 110 is then compared with the intention data 112 in the possible intention syntax data 106, so as to generate the required confirmed intention syntax data 114. The parsed result output module 116 then outputs the parsed result 104, which may serve as the first report answer 511 shown in FIG. 5A.
Besides, the natural language comprehension system 520 is able to perform the corresponding speech search in the speech synthesis database 530 according to the first report answer 511, because the parsed result 104 associated with the first report answer 511 may include data of a completely/partially matched record 302 (e.g., the indication data stored in the indication field 310, the value data stored in the value field 312, and the data stored in the content field 306). If the user considers that the first speech response 507 output by the natural language comprehension system 520 does not match his/her first request information 503 included in the first speech input 501, he/she may input another speech, e.g., a second speech input 501′. The natural language comprehension system 520 processes the second speech input 501′ in the same manner as the first speech input 501 described above, so as to generate second request information 503′. The second request information 503′ is parsed to obtain a second keyword 509′ therein. After a second report answer 511′ that matches the second request information 503′ is found, the natural language comprehension system 520 searches and finds a corresponding second speech 513′, generates a second speech response 507′ corresponding to the second speech 513′, and outputs the second speech response 507′ to the user as a correction of the first report answer 511. Apparently, by adding new modules to the natural language comprehension system 100 of FIG. 1, the natural language comprehension system 520 is able to correct any incorrect speech response according to speech inputs from a user, which will be further explained below with reference to FIG. 5B.
The components of the natural language dialogue system 500 may be configured within an identical machine. For instance, the speech sampling module 510 and the natural language comprehension system 520 may be configured in the same electronic apparatus. Here, the electronic apparatus may be a mobile communication apparatus (e.g., a cell phone, a PDA phone, a smart phone, etc.) or an electronic apparatus with communication functions or communication software, such as a pocket PC, a tablet PC, a notebook computer, a PC, and so on. The invention is not limited thereto. Besides, those electronic apparatuses may be operated by an Android operating system, a Microsoft operating system, a Linux operating system, and so forth, which should not be construed as a limitation to the invention. Certainly, the components of the natural language dialogue system 500 may also be configured in different apparatuses or systems and may be connected according to different communication protocols. For instance, the natural language comprehension system 520 may be configured in a cloud server or in a LAN server. The components of the natural language dialogue system 500 may also be positioned at different machines, e.g., placed in the same machine where the speech sampling module 510 is located, or placed in a machine different from the machine where the speech sampling module 510 is located.
In an embodiment, the speech sampling module 510 receives the speech input. The speech sampling module 510 may be an apparatus receiving audio, e.g., a microphone, and the first/second speech input 501/501′ may be a user's speech.
According to the present embodiment, the natural language comprehension system 520 may be implemented by means of hardware circuitry constituted by logic gates. In another embodiment, the natural language comprehension system 520 may be implemented by computer programming codes. For instance, the natural language comprehension system 520 may be programmed in a programming language and act as an application or a driver operated by an operating system. Program codes of the natural language comprehension system 520 may be stored in a storage unit and executed by a processing unit (not shown in FIG. 5A). Another embodiment is further shown below to enable people skilled in the art to further comprehend the natural language comprehension system 520 described herein. Note that the embodiment provided herein is merely exemplary and should not be construed as a limitation to the invention, and the natural language comprehension system may be implemented in hardware, software, firmware, or a combination thereof.
FIG. 5B is a block diagram illustrating a natural language comprehension system 520 according to an embodiment of the invention. With reference to FIG. 5B, the natural language comprehension system 520 described in the present embodiment may include a speech recognition module 522, a natural language processing module 524, and a speech synthesis module 526. The speech recognition module 522 receives the request information 503/503′ from the speech sampling module 510 (e.g., the first request information 503 parsed from the first speech input 501) and captures one or more first keywords 509 (e.g., the keyword 108 shown in FIG. 1A or other phrases) of the first speech input 501/501′. The natural language processing module 524 may further parse the first keyword 509 to obtain a candidate list having at least one report answer. The processing method described herein is similar to that depicted in FIG. 5A; e.g., a full-text search is conducted in the structured database 220 by the search system 200 shown in FIG. 1A. That is, after the response result 110 is obtained and compared with the intention data 112, the confirmative intention syntax data 114 are generated, and the report answer is generated by the parsed result output module 116 according to the parsed result (the confirmative intention syntax data 114) sent thereby. In the present embodiment, one report answer relatively conformable to the first speech input 501 is selected from all the report answers in the candidate list (e.g., the completely matched record may be selected), and the selected report answer serves as the first report answer 511. The report answer 511 is internally parsed and obtained by the natural language comprehension system 520, and therefore the parsed result must be converted into a speech output before it is output to the user; finally, the user may determine whether the speech output matches his/her speech input.
According to the first report answer 511, the speech synthesis module 526 conducts a search in a speech synthesis database 530 that records texts and corresponding speech information, such that the speech synthesis module 526 is able to find the required first speech 513 corresponding to the first report answer 511 and thereby create a synthesized first speech response 507. The synthesized first speech response 507 may then be output by the speech synthesis module 526 through the speech output interface (not shown) and broadcast to the user, wherein the speech output interface may be a speaker, an amplifier, a headset, or another similar device. When the speech synthesis module 526 conducts the search in the speech synthesis database 530 according to the first report answer 511, the speech synthesis module 526 may convert the format of the first report answer 511 and then call the speech synthesis database 530 by means of interfaces (e.g., APIs) provided by the speech synthesis database 530. During associated calls to the speech synthesis database 530, whether a format conversion is required is determined according to the definitions of the speech synthesis database 530 and is well known to people skilled in the art. Therefore, no detailed description is provided hereinafter.
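For illustration purposes only, the text-to-speech lookup performed against the speech synthesis database 530 may be sketched as a simple table lookup; the table contents and waveform identifiers below are hypothetical, and a real speech synthesis database would involve considerably more machinery than a dictionary.

```python
# A toy stand-in for the speech synthesis database 530: report-answer texts
# mapped to hypothetical speech-waveform identifiers.
SPEECH_DB = {
    "I now play the film of Romance of the Three Kingdoms for you": "wave_0001",
    "Which option do you want": "wave_0002",
}


def synthesize(report_answer):
    """Look up the speech corresponding to a report answer; return None when
    the database holds no entry (a real system would synthesize per phrase
    rather than fail)."""
    return SPEECH_DB.get(report_answer)
```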
An example is given below for explanation. If the first speech input 501 from the user indicates “I want to see the Romance of the Three Kingdoms,” the speech recognition module 522 receives the first request information 503 parsed from the first speech input 501 by the speech sampling module 510 and captures the first keyword 509, exemplarily including “Romance of the Three Kingdoms.” The natural language processing module 524 may further parse the first keyword 509 including “Romance of the Three Kingdoms” (e.g., by conducting a full-text search in the structured database 220 by the search system 200 shown in FIG. 1A, comparing the intention data 112 with the response result 110 after the response result 110 is obtained to generate the confirmative intention syntax data 114, and outputting the parsed result 104 by the parsed result output module 116), generate the report answers having three intention options corresponding to “Romance of the Three Kingdoms,” organize the report answers to generate a candidate list, and select the report answer (e.g., select Record 10 shown in FIG. 3B) having the largest value in the popularity field 316 from the three report answers included in the candidate list. The selected report answer is the first report answer 511. Here, each of the three intention options (“read the book,” “watch the TV drama,” and “watch the film”) is assumed to correspond to one of the report answers. In an embodiment of the invention, the record having the largest popularity value in the popularity field 316 may be directly provided to the user (e.g., the song “Betrayal” performed by Jam Hsiao may be directly played, as described above), and the invention is not limited thereto.
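For illustration purposes only, the selection of the report answer having the largest value in the popularity field 316 may be sketched as follows; the candidate records, field names, and popularity values are hypothetical.

```python
def select_first_report_answer(candidates):
    """Pick the candidate record with the largest popularity value, mirroring
    the selection of the largest value in the popularity field 316."""
    return max(candidates, key=lambda record: record["popularity"])


candidate_list = [
    {"intention": "read the book", "popularity": 70},
    {"intention": "watch the TV drama", "popularity": 85},
    {"intention": "watch the film", "popularity": 90},
]
first = select_first_report_answer(candidate_list)
# The "watch the film" record is chosen as the first report answer.
```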
Besides, the natural language processing module 524 may determine whether the first report answer 511 is correct by parsing the subsequently received second speech input 501′ (following the same mechanism by which the speech input 501 is fed into the speech sampling module 510). The second speech input 501′ is provided by the user in response to the first speech response 507 and indicates whether the user considers the first speech response 507 correct. After the second speech input 501′ is parsed, the natural language processing module 524 may select another report answer from the candidate list as the second report answer 511′ if the natural language processing module 524 recognizes that the user considers the first report answer 511 incorrect. For instance, after the first report answer 511 is removed from the candidate list, a second report answer 511′ is selected from the remaining report answers, and the second speech 513′ corresponding to the second report answer 511′ is found by means of the speech synthesis module 526. Through the speech synthesis module 526, the synthesized second speech response 507′ corresponding to the second speech 513′ is generated and broadcast to the user.
For instance, if the user inputs “I want to see the Romance of the Three Kingdoms” and he/she actually intends to watch the TV drama of Romance of the Three Kingdoms, Record 10 (i.e., watch the film of Romance of the Three Kingdoms) previously output to the user and shown in FIG. 3B is not his/her desired report answer. Accordingly, the user may further input “I want to watch the TV drama of Romance of the Three Kingdoms” (clearly indicating the intention to watch the “TV drama”), or input “I don't want to watch the film of Romance of the Three Kingdoms” (merely denying the current report answer) as the second speech input 501′. After the second speech input 501′ is parsed to obtain the second request information 503′ (or the second keywords 509′), the natural language dialogue system 500 finds that the second keywords 509′ in the second request information 503′ include “TV drama” (clearly pointed out by the user) or “don't want to watch the film” (denied by the user). Therefore, the natural language dialogue system 500 determines that the first report answer 511 does not conform to the user's request. Accordingly, another report answer may be selected from the candidate list as the second report answer 511′, and the corresponding second speech response 507′ may be output. For instance, the second speech response 507′ “I now play the TV drama of Romance of the Three Kingdoms for you” is output (if the user clearly indicates that he/she intends to watch the TV drama of Romance of the Three Kingdoms). Alternatively, the second speech response 507′ “which option do you want” is output (if the user merely denies the current option), accompanied by the other options in the candidate list displayed for his/her further selection (e.g., selecting the second report answer 511′ having the second largest popularity value in the popularity field 316 for the user).
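For illustration purposes only, the correction step of removing the denied first report answer 511 and selecting the next-best candidate may be sketched as follows; the record fields and popularity values are hypothetical assumptions.

```python
def correct_report_answer(candidates, denied):
    """Remove the denied report answer and return the best remaining one (by
    popularity), sketching the correction step described above."""
    remaining = [c for c in candidates if c is not denied]
    if not remaining:
        return None
    return max(remaining, key=lambda c: c["popularity"])


candidates = [
    {"intention": "watch the film", "popularity": 90},
    {"intention": "watch the TV drama", "popularity": 85},
    {"intention": "read the book", "popularity": 70},
]
# The user denies the first report answer (the film), so the TV drama is
# selected as the second report answer.
second = correct_report_answer(candidates, candidates[0])
```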
In another embodiment, if the user's second speech input 501′ includes “selection” information, e.g., if three options “read the book of Romance of the Three Kingdoms,” “watch the TV drama of Romance of the Three Kingdoms,” and “watch the film of Romance of the Three Kingdoms” are provided to the user for his/her selection, the user may input “I want to watch the film” as the second speech input 501′. At this time, the second request information 503′ included in the second speech input 501′ is parsed to learn the user's intention (e.g., the option “watch the film” selected by the user is found according to the second keyword 509′), the second speech response 507′ “I now play the film of Romance of the Three Kingdoms for you” is output (if the user intends to watch the film of Romance of the Three Kingdoms), and the film is directly played to the user. Certainly, if the user inputs “I want the third option” (e.g., if the user selects to read the book), the associated application corresponding to the third option is performed, i.e., the e-book of Romance of the Three Kingdoms is displayed and the second speech response 507′ “You want to read the book of Romance of the Three Kingdoms” is played.
According to the present embodiment, in the natural language comprehension system 520, the speech recognition module 522, the natural language processing module 524, the speech synthesis module 526, and the speech sampling module 510 may be configured in an identical machine. In other embodiments of the invention, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 may be separately arranged in different machines (e.g., a computer system, a server, or other similar devices/systems). For instance, in the natural language comprehension system 520′ of FIG. 5C, the speech synthesis module 526 and the speech sampling module 510 may be configured in the same machine 502, while the speech recognition module 522 and the natural language processing module 524 may be configured in another machine. In addition, as shown in FIG. 5C, the natural language processing module 524 transmits the first report answer 511/the second report answer 511′ to the speech synthesis module 526, and the first report answer 511/the second report answer 511′ is then sent to the speech synthesis database 530 to search for the first speech 513/the second speech 513′, through which the first speech response 507/the second speech response 507′ is generated.
FIG. 6 is a flowchart illustrating a method for correcting the first speech response 507 according to an embodiment of the invention. According to the method for correcting the first speech response 507 described herein, when the user thinks that the first speech response 507 does not conform to his/her first request information 503, he/she further feeds the second speech input 501′ into the speech sampling module 510, which is then parsed by the natural language comprehension system 520 such that the natural language dialogue system 500 recognizes that the first speech response 507 previously output to the user does not conform to his/her intention. At this time, the natural language comprehension system 520 may output the second speech response 507′ to correct the first speech response 507. For the purpose of illustration, the natural language dialogue system 500 shown in FIG. 5A is taken as an example, while the method for correcting the first speech response 507 described herein is also applicable to the natural language dialogue system 500′ shown in FIG. 5C.
With reference to FIG. 5A and FIG. 6, in step S602, the speech sampling module 510 receives the first speech input 501 (i.e., the first speech input 501 is fed to the speech sampling module 510). Here, the first speech input 501 is a user's speech, for instance, and the first speech input 501 may also include the first request information 503 from the user. Specifically, the first speech input 501 from the user may be an interrogative sentence, an imperative sentence, or any sentence having request information, such as “I want to read the Romance of the Three Kingdoms,” “I want to listen to the song of Forget-Love Potion,” or “What is the temperature today?”.
In step S604, the natural language comprehension system 520 parses at least one first keyword 509 included in the first speech input 501 to obtain the candidate list, and the candidate list has at least one report answer. For instance, when the user's first speech input 501 indicates “I want to see the Romance of the Three Kingdoms,” the first keywords 509 parsed and obtained by the natural language comprehension system 520 are “Romance of the Three Kingdoms” and “see.” In another example, when the user's first speech input 501 indicates “I want to listen to the song of Forget-Love Potion,” the first keywords 509 parsed and obtained by the natural language comprehension system 520 are “Forget-Love Potion,” “listen,” and “song.”
According to the first keywords 509, the natural language comprehension system 520 may search the structured database 220 to obtain at least one search result (e.g., the parsed result 104 shown in FIG. 1) as the report answers in the candidate list. The way of selecting the first report answer 511 from plural report answers may refer to that depicted in FIG. 1 and thus will not be described herein. Since the first keywords 509 may relate to different categories or knowledge fields (such as films, books, music, or games) and the same category may be further classified into different sub-fields (e.g., different authors of a film or a book, different singers performing one song, different versions of a game, and so on), the natural language comprehension system 520 may search the structured database and obtain one or more search results (e.g., the parsed result 104) corresponding to the first keywords 509. Here, each search result may include the indication data and “other data” corresponding to the first keywords 509. For instance, if a full-text search is conducted in the structured database 220 shown in FIG. 3A/3B according to the keywords 108 “Jam Hsiao” and “Betrayal,” two matched results (e.g., Records 6 and 7 shown in FIG. 3A) that respectively include the indication data “singerguid” and “songnameguid” stored in the indication field 310 are found. The “other data” refer to the keywords other than the first keywords 509 in the search results. For instance, if the keywords “Days When We Were Together” are used to conduct the full-text search in the structured database 220 shown in FIG. 3A and Record 1 is the matched result, “Andy Lau” and “HK and Taiwan, Cantonese, pop” are the so-called “other data.” Additionally, if the first speech input 501 from the user has a plurality of first keywords 509, the first request information 503 from the user is of high clarity, such that the natural language comprehension system 520 is able to parse the first speech input 501 and obtain a search result closer to the first request information 503.
For instance, when the first keywords 509 are “Romance of the Three Kingdoms” (e.g., if the user inputs the speech input “I want to see the Romance of the Three Kingdoms”), the natural language comprehension system 520, after parsing the first keywords 509, may generate three possible intention syntax data 106 (as shown in FIG. 1):
“<readbook>,<bookname>=Romance of the Three Kingdoms”;
“<watchTV>,<TVname>=Romance of the Three Kingdoms”; and
“<watchfilm>,<filmname>=Romance of the Three Kingdoms”.
The search results are the records corresponding to “ . . . ‘Romance of the Three Kingdoms’ . . . ‘book’” (i.e., the intention data are <readbook>), “ . . . ‘Romance of the Three Kingdoms’ . . . ‘TV drama’” (i.e., the intention data are <watchTV>), and “ . . . ‘Romance of the Three Kingdoms’ . . . ‘film’” (i.e., the intention data are <watchfilm>), namely Records 8, 9, and 10 shown in FIG. 3B. Here, “TV drama,” “book,” and “film” are recited as the user's intentions. In another example, when the first keywords 509 are “Forget-Love Potion” and “music” (e.g., if the user inputs the speech input “I want to listen to the music of Forget-Love Potion”), the natural language comprehension system 520, after parsing the first keywords 509, may generate the following possible intention syntax data:
“<playmusic>,<songname>=Forget-Love Potion”;
The search results are the records corresponding to “ . . . ‘Forget-Love Potion’ . . . ‘Andy Lau’” (i.e., Record 11 shown in FIG. 3B) and “ . . . ‘Forget-Love Potion’ . . . ‘E-jun Lee’” (i.e., Record 12 shown in FIG. 3B), wherein “Andy Lau” and “E-jun Lee” (two Chinese singers) correspond to the user's possible intentions. That is, each search result may include the first keywords 509 and the intention data related to the first keywords 509, and the natural language comprehension system 520 may, according to the search results, convert the data in the search results into report answers and register the report answers into the candidate list for use in the subsequent steps. Please note that the two requests “I want to listen to Andy Lau's Forget-Love Potion” and “I want to listen to Days When We Were Together” both employ the same sentence pattern in Chinese, but the disclosed natural language comprehension systems 100/520/520′ are able to distinguish them by means of full-text searches in the disclosed structured database 220. For example, “Andy Lau's Forget-Love Potion” may be recognized as a song sung by Andy Lau since a search result is found in Record 11 of FIG. 3B, while “Days When We Were Together” may be recognized as a single song title in its entirety (rather than as a song “When We Were Together” sung by a singer named “Days”) since a search result is found in Record 1 of FIG. 3B. Additional information of these two comparison results (e.g., lyrics, preference, and/or popularity of “Days When We Were Together”) may be obtained from Records 11 and 1, respectively. Obviously, the disclosed natural language comprehension systems 100/520/520′ are capable of distinguishing different requests following the same sentence pattern by employing a full-text search in the disclosed structured database 220, because the disclosed structured database 220 stores a plurality of records, each of which further stores data collectively demonstrating to which category the record belongs. Moreover, the disclosed natural language comprehension systems 100/520/520′ may be employed in different language systems (e.g., Cantonese, Shanghai dialect, or even English, Japanese, etc.) to distinguish different users' requests following the same sentence pattern, as long as a full-text search is performed in the disclosed structured database 220 with each record storing data collectively describing what category the record belongs to. The invention is not limited to those presented embodiments.
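For illustration purposes only, the ability to distinguish a “singer's song” request from a request whose whole phrase is itself a song title may be sketched as follows; the two-record store, the splitting on “'s”, and the function name are simplifying assumptions and do not reflect the actual full-text search of the structured database 220.

```python
# A toy record store standing in for the structured database 220.
RECORDS = [
    {"title": "Days When We Were Together", "singer": "Andy Lau"},
    {"title": "Forget-Love Potion", "singer": "Andy Lau"},
]


def interpret(phrase):
    """Interpret "X's Y" either as singer X plus song Y, or as one whole song
    title, depending on which reading the record store actually supports."""
    # First, does the whole phrase match a stored song title?
    for r in RECORDS:
        if r["title"] == phrase:
            return ("song", r["title"], r["singer"])
    # Otherwise try the "singer's song" split (simplified split on "'s ").
    if "'s " in phrase:
        singer, song = phrase.split("'s ", 1)
        for r in RECORDS:
            if r["title"] == song and r["singer"] == singer:
                return ("song", song, singer)
    return None


a = interpret("Andy Lau's Forget-Love Potion")   # singer + song reading
b = interpret("Days When We Were Together")      # whole-title reading
```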
In step S606, the natural language comprehension system 520 selects at least one first report answer 511 from the candidate list and outputs a first speech response 507 associated with the first report answer 511. According to the present embodiment, the natural language comprehension system 520 arranges the report answers in the candidate list according to a priority and selects the report answer from the candidate list according to the priority, thereby outputting the first speech response 507.
For instance, if the first keyword 509 is “Romance of the Three Kingdoms”, and the natural language comprehension system 520 finds a large number of records (e.g., 20) related to “Romance of the Three Kingdoms” and “books”, several records (e.g., 18) related to “Romance of the Three Kingdoms” and “music”, and few records (e.g., 10) related to “Romance of the Three Kingdoms” and “TV drama”, the natural language comprehension system 520 considers the “book of Romance of the Three Kingdoms” as the first report answer (the report answer with the highest priority), the “music of Romance of the Three Kingdoms” as the second report answer (the report answer with the second highest priority), and the “TV drama of Romance of the Three Kingdoms” as the third report answer (the report answer with the third highest priority). That is, the priority is determined by gathering statistics among all searched records. Certainly, in case the first report answer corresponding to the “book of Romance of the Three Kingdoms” refers to more than one record, another priority (e.g., the number of selections or the largest value among all the popularity fields 316) may be used to select one record as the first report answer 511, which is already described above and thus will not be explained again.
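For illustration purposes only, ranking candidate intentions by gathering statistics among all searched records may be sketched as follows; the category labels and record counts mirror the hypothetical numbers given above.

```python
from collections import Counter


def rank_intentions_by_count(search_results):
    """Order candidate intentions by how many matched records fall into each
    category, mirroring the gathering of statistics described above."""
    counts = Counter(r["category"] for r in search_results)
    return [category for category, _ in counts.most_common()]


# Hypothetical search results: 20 book records, 18 music records,
# and 10 TV-drama records related to "Romance of the Three Kingdoms".
results = (
    [{"category": "book"}] * 20
    + [{"category": "music"}] * 18
    + [{"category": "TV drama"}] * 10
)
priority = rank_intentions_by_count(results)
# Books rank first, then music, then TV drama.
```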
In step S608, the speech sampling module 510 receives a second speech input 501′, and the natural language comprehension system 520 parses the second speech input 501′ to determine whether the selected first report answer 511 is correct. The second speech input 501′ is parsed to obtain a second keyword 509′ included therein, wherein the second keyword 509′ refers to a keyword further provided by the user, such as a time, an intention, a category or knowledge field, and so forth. When the second keyword 509′ included in the second speech input 501′ does not comply with the intention data in the first report answer 511, the natural language comprehension system 520 determines that the previously selected first report answer 511 is incorrect. The way of determining whether the second request information 503′ of the second speech input 501′ “confirms” or “negates” the first speech response 507 is described above and will not be further explained here.
Specifically, the second speech input 501′ parsed by the natural language comprehension system 520 may or may not include a definite second keyword 509′. For instance, the user's input received by the speech sampling module 510 may be "I don't mean the book of Romance of the Three Kingdoms" (scenario A), "I don't mean the book of Romance of the Three Kingdoms; I do mean the TV drama of Romance of the Three Kingdoms" (scenario B), or "I do mean the TV drama of Romance of the Three Kingdoms" (scenario C). The second keywords 509′ in scenario A are "don't," "Romance of the Three Kingdoms," and "book"; the second keywords 509′ in scenario B are "don't," "Romance of the Three Kingdoms," "book," "do," "Romance of the Three Kingdoms," and "TV drama"; and the second keywords 509′ in scenario C are "do," "Romance of the Three Kingdoms," and "TV drama," for instance. For descriptive purposes, only scenarios A, B, and C are exemplified herein, while the invention is not limited to these embodiments.
The natural language comprehension system 520 determines whether the relevant intention data in the first report answer 511 are correct according to the second keywords 509′ included in the second speech input 501′. That is, if the first report answer 511 is determined to be "the book of Romance of the Three Kingdoms," and the second keywords 509′ are "Romance of the Three Kingdoms" and "TV drama," the natural language comprehension system 520 determines that the relevant intention data in the first report answer 511 (i.e., that the user intends to read the "book" of Romance of the Three Kingdoms) do not match the second keywords 509′ included in the second speech input 501′ (i.e., the user intends to watch the "TV drama" of Romance of the Three Kingdoms), and therefore determines that the first report answer 511 is incorrect. Similarly, if the first report answer 511 is "the book of Romance of the Three Kingdoms," and the second keywords 509′ are "don't," "Romance of the Three Kingdoms," and "book," the natural language comprehension system 520 also determines that the first report answer 511 is incorrect.
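The confirm/negate check for scenarios A, B, and C above may be sketched as follows. The category vocabulary and the two matching rules in this sketch are simplifying assumptions for illustration, not the actual parsing logic of the described system.

```python
# Illustrative sketch: decide whether the second speech input confirms
# the first report answer, using two assumed rules: a negation word
# ("don't") applied to the answer's category rejects it, and a definite
# mention of a different category also rejects it.
def answer_is_correct(answer_category, second_keywords):
    categories = {"book", "TV drama", "music", "film"}
    negated = "don't" in second_keywords
    mentioned = categories.intersection(second_keywords)
    if negated and answer_category in mentioned:
        return False  # e.g., scenario A: "I don't mean the book ..."
    if mentioned and answer_category not in mentioned:
        return False  # e.g., scenario C: "I do mean the TV drama ..."
    return True       # the intention data comply with the keywords
```

Under these rules, scenarios A, B, and C all reject the "book" answer, while a confirming reply such as "I do mean the book" accepts it.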
After the natural language comprehension system 520 parses the second speech input 501′ and determines that the first report answer 511 previously output to the user is correct, in step S610 the natural language comprehension system 520 responds to the second speech input 501′. For instance, if the second speech input 501′ from the user is "yes, I mean the book of Romance of the Three Kingdoms," the natural language comprehension system 520 may output the second speech response 507′ indicating "opening the book of Romance of the Three Kingdoms." Alternatively, while outputting the second speech response 507′, the natural language comprehension system 520 may directly load the file contents of the book "Romance of the Three Kingdoms" through a processing unit (not shown).
After the natural language comprehension system 520 parses the second speech input 501′ and determines that the previous first speech response 507 (i.e., the first report answer 511) is incorrect, in step S612 the natural language comprehension system 520 selects from the candidate list a report answer other than the first report answer and then outputs the second speech response 507′ according to the currently selected report answer. At this time, if the second speech input 501′ provided by the user does not contain a definite second keyword 509′ (e.g., the second speech input 501′ in scenario A), the natural language comprehension system 520 may, according to the priority, select from the candidate list the report answer having the second highest priority. By contrast, if the second speech input 501′ contains definite second keywords 509′ (e.g., the second speech inputs 501′ in scenarios B and C), the natural language comprehension system 520 may select the corresponding report answer from the candidate list according to the second keywords referred to by the user.
If the second speech input 501′ provided by the user contains definite second keywords 509′ (e.g., the second speech inputs 501′ in scenarios B and C), but the natural language comprehension system 520 does not find any report answer corresponding to the second keywords 509′ in the candidate list, the natural language comprehension system 520 then outputs a third speech response, such as "no such book is found" or "I have no idea," to the user.
Another embodiment is further exemplified below to enable people skilled in the art to comprehend in more detail the method for correcting the speech response and the natural language dialogue system described herein.
Firstly, if the first speech input 501 received by the speech sampling module 510 is "I want to see the Romance of the Three Kingdoms" (step S602), the natural language comprehension system 520 parses the first speech input 501 to obtain the first keywords 509 "see" and "Romance of the Three Kingdoms" and acquires a candidate list with a plurality of first report answers. Each of the first report answers has relevant keywords and other data (which may be stored in the content field 306 shown in FIG. 3A/3B or may be parts of the value field 312 of each record 302) (step S604), as shown in Table 1. Here, it is assumed that the search result includes one book, one TV drama, one piece of music, and one movie of "the Romance of the Three Kingdoms."
TABLE 1

| Candidate List | Speech Response | Keyword | Other Data |
| --- | --- | --- | --- |
| Report Answer A | Whether to display the book of "the Romance of the Three Kingdoms" | Book | Luo Guanzhong; the Ming Dynasty; Print Version |
| Report Answer B | Whether to play the TV drama of "the Romance of the Three Kingdoms" | TV Drama | TV Station; Cast; Number of Episodes |
| Report Answer D | Whether to play the music of "the Romance of the Three Kingdoms" | Music | Vocal; Lyrics |
| Report Answer E | Whether to play the film of "the Romance of the Three Kingdoms" | Film | Cast; Theatrical Release; Director |
The natural language comprehension system 520 then selects the desired report answer from the candidate list. If the natural language comprehension system 520 selects report answer A as the first report answer 511 from the candidate list according to the order (e.g., from report answer A to report answer E), the natural language comprehension system 520 in step S606 outputs "whether to display the book of Romance of the Three Kingdoms" as the first speech response 507, for instance.
At this time, if the second speech input 501′ received by the speech sampling module 510 is "yes" (step S608), the natural language comprehension system 520 determines that report answer A is correct. Besides, the natural language comprehension system 520 outputs another speech response "please wait" (i.e., the second speech response 507′) and loads the contents of the book of Romance of the Three Kingdoms through a processing unit (not shown) (step S610).
However, if the second speech input 501′ received by the speech sampling module 510 is "I don't mean the book of Romance of the Three Kingdoms" (step S608), the natural language comprehension system 520 determines that report answer A is incorrect. Next, the natural language comprehension system 520 selects another report answer as the second report answer 511′ from report answers B to E in the candidate list, e.g., "whether to play the TV drama of Romance of the Three Kingdoms" in report answer B. If the user continues to answer "I don't mean the TV drama of Romance of the Three Kingdoms," the natural language comprehension system 520 selects one of the remaining report answers. If all of report answers A to E have already been provided to the user and none of them corresponds to the speech input 501 from the user, the natural language comprehension system 520 may output a speech response 507 "no data are found" to the user (step S612).
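The correction loop of steps S606 through S612 may be sketched as follows. The yes/no reply protocol and the function name are illustrative assumptions only; the described system parses full natural-language replies rather than literal "yes"/"no" tokens.

```python
# Illustrative sketch of the correction loop: walk the candidate list
# in priority order until the user confirms a report answer or the
# list is exhausted.
def dialogue_loop(candidates, user_replies):
    """candidates: report answers in priority order.
    user_replies: the user's confirmations, reduced here to 'yes'/'no'
    for simplicity (an assumption of this sketch)."""
    replies = iter(user_replies)
    for answer in candidates:
        # Offer the current answer; accept it if the user confirms.
        if next(replies, "no") == "yes":
            return answer               # e.g., then load the content
    return "no data are found"          # all report answers rejected
```

For example, rejecting answer A and confirming answer B returns B, while rejecting every answer yields the "no data are found" response of step S612.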
According to another embodiment, in step S608, if the second speech input 501′ delivered from the user to the speech sampling module 510 is "I mean the comics of Romance of the Three Kingdoms," the natural language comprehension system 520 may directly output the second speech response 507′ indicating "no data are found" because no report answer regarding the comics is included in the candidate list.
In light of the foregoing, the natural language comprehension system 520 is able to output the first speech response 507 corresponding to the first speech input 501 from the user. If the first speech response 507 output by the natural language comprehension system 520 does not match the first request information 503 of the first speech input 501 from the user, the natural language comprehension system 520 may correct the answer shown in this first speech response 507 and further output an associated second speech response 507′ (which may conform to the first request information 503 of the user better than the previous one) according to the second speech input 501′ subsequently provided by the user. Advantageously, in the event that the user is still dissatisfied with the report answer provided by the natural language comprehension system 520, the natural language comprehension system 520 may automatically correct the report answer and provide a new speech response to the user, so as to facilitate the user's dialogue with the natural language comprehension system 520.
It should be mentioned that the natural language comprehension system 520 may arrange the report answers in the candidate list under different priorities in steps S606 and S612 depicted in FIG. 6, and the natural language comprehension system 520 may select the required report answer from the candidate list according to a different priority and then output the speech response corresponding to the selected report answer.
For instance, the natural language comprehension system 520 may determine the priority of the first report answer 511 in the candidate list according to a public usage habit (e.g., by reference to values associated with the public, i.e., the values stored in the preference field 318 and the dislike field 320 shown in FIG. 3B if these fields are separated to store the preferences of the individual user and the public). The more often the first report answer 511 is selected and used by the public, the higher priority the report answer has. The first keyword 509 is again exemplified by "Romance of the Three Kingdoms," and the report answers found by the natural language comprehension system 520 are assumed to be the TV drama, the book, and the music of "Romance of the Three Kingdoms." If "Romance of the Three Kingdoms" frequently refers to the book of "Romance of the Three Kingdoms" in public usage (e.g., 20 records relating to the book), sometimes refers to the TV drama of "Romance of the Three Kingdoms" (e.g., 18 records relating to the TV drama), and scarcely refers to the music of "Romance of the Three Kingdoms" (e.g., 10 records relating to the music), the values stored in the popularity field 316 show that the public prefers to read the "book" of "Romance of the Three Kingdoms," and the natural language comprehension system 520 therefore arranges the report answers in the order of "book," "TV drama," and "music" according to the priority determined by the public usage habit. That is, the natural language comprehension system 520 firstly selects "the book of Romance of the Three Kingdoms" as the first report answer 511 and outputs the associated first speech response 507 according to the first report answer 511.
The natural language comprehension system 520 may also determine the priority of the report answers according to a user's habit alone (e.g., by reference to values associated with an individual user, i.e., the values stored in the preference field 318 and the dislike field 320 shown in FIG. 3B if these fields are separated to store the preferences of the individual user and the public). Particularly, the natural language comprehension system 520 may store the speech inputs (including the first speech input 501, the second speech input 501′, or any other speech input) from the user in a properties database (as exemplarily shown in FIGS. 7A/7B), and the properties database may be stored in a storage device, e.g., a hard drive. The properties database may include the first keywords 509 obtained when the natural language comprehension system 520 parses the speech input 501 of the user, and the properties database may also include response records (including the user's preferences and/or habits) generated by the natural language comprehension system 520. The way of storing and capturing the user's preferences/habits will be shown below with reference to FIGS. 7A/7B/8. Besides, in an embodiment, when the values stored in the popularity field 316 shown in FIG. 3B are related to the user's habit (e.g., the number of times the conditions are matched), the values stored in the popularity field 316 may be employed to determine the user's preference or the priority. Therefore, the natural language comprehension system 520 may select the report answer according to the priority corresponding to the recorded user's preference or other information stored in the properties database 730, and thereby output the speech response 507 that conforms more closely to the speech input 501 from the user. For instance, in FIG. 3B, the values stored in the popularity field 316 of records 8/9/10 are 2/5/8, which respectively indicates that the numbers of times the "book," the "TV drama," and the "film" of "Romance of the Three Kingdoms" have matched the user's speech inputs are 2/5/8. Therefore, the report answer corresponding to the "film of Romance of the Three Kingdoms" is selected first.
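The popularity-field selection in the example above may be sketched as follows; the pair layout standing in for the popularity field 316 is an assumption of this sketch.

```python
# Illustrative sketch: pick the report answer whose record has matched
# the user's speech inputs most often, standing in for the values of
# the popularity field 316 (assumed data layout).
def select_by_popularity(records):
    """records: list of (report_answer, popularity_count) pairs."""
    return max(records, key=lambda record: record[1])[0]

records = [("book", 2), ("TV drama", 5), ("film", 8)]
first_choice = select_by_popularity(records)
# "film" is selected first, matching the records-8/9/10 example above
```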
The natural language comprehension system 520 may select the report answer according to a user's preference. For instance, when the user talks to the natural language comprehension system 520, he/she may frequently mention "I want to read the book of Romance of the Three Kingdoms," sometimes mention "I want to watch the TV drama of Romance of the Three Kingdoms," and hardly ever mention "I want to listen to the music of Romance of the Three Kingdoms". For example, there are 20 records regarding "the book of Romance of the Three Kingdoms" (shown in the preference field 318 of record 8 in FIG. 3B), 8 records regarding "the TV drama of Romance of the Three Kingdoms" (shown in the preference field 318 of record 9 in FIG. 3B), and 1 record regarding "the music of Romance of the Three Kingdoms" (not shown in FIG. 3B). The report answers in the candidate list are then arranged in the order of "the book of Romance of the Three Kingdoms," "the TV drama of Romance of the Three Kingdoms," and "the music of Romance of the Three Kingdoms". In other words, when the first keyword 509 is "Romance of the Three Kingdoms," the natural language comprehension system 520 firstly selects "the book of Romance of the Three Kingdoms" as the first report answer 511 and outputs the corresponding first speech response 507 according to this selected first report answer 511.
Note that the natural language comprehension system 520 may also determine the priority of the report answers according to a user's preference keywords. Specifically, the user's dialogue database may store keywords used by the user, such as "like," "idol," "hate," "dislike," etc. According to the number of the registered keywords, the natural language comprehension system 520 may arrange the report answers of the candidate list in a certain order. For instance, if the number of recorded occurrences of the keyword "like" for a specific report answer is significantly large, this specific report answer is selected first. Alternatively, if the number of registered occurrences of the keyword "hate" for a specific report answer is significantly large, the associated report answer may be selected later than the others.
For instance, when the user talks to the natural language comprehension system 520, he/she may frequently mention "I dislike watching the TV drama of Romance of the Three Kingdoms," sometimes mention "I dislike listening to the music of Romance of the Three Kingdoms," and seldom mention "I dislike reading the book of Romance of the Three Kingdoms". For example, there may be 20 records regarding "I dislike watching the TV drama of Romance of the Three Kingdoms" (shown in the dislike field 320 of record 9 in FIG. 3B), 8 records regarding "I dislike listening to the music of Romance of the Three Kingdoms" (not shown in FIG. 3B), and 1 record regarding "I dislike reading the book of Romance of the Three Kingdoms" (shown in the dislike field 320 of record 8 in FIG. 3B). The report answers in the candidate list are then arranged in the order of "the book of Romance of the Three Kingdoms," "the music of Romance of the Three Kingdoms," and "the TV drama of Romance of the Three Kingdoms" according to this priority (from least to most disliked). That is, if the first keyword 509 is "Romance of the Three Kingdoms," the natural language comprehension system 520 selects the book of "Romance of the Three Kingdoms" as the first report answer 511 and outputs the corresponding first speech response 507 relating to the first report answer 511. According to an embodiment, a dislike field 320 may be added alongside the popularity field 316 shown in FIG. 3B for recording the user's "degrees of dislike". In another embodiment, when a user's "dislike" information with respect to a certain record is parsed, a numeric value may be directly subtracted from the popularity field 316 (or the preference field 318) of the corresponding record, so as to register the user's preference without any additional field. Any possible way of registering the user's preferences is applicable to an embodiment of the invention and should not be construed as a limitation to the invention.
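One possible way of combining the preference field 318 and the dislike field 320 into a single priority may be sketched as follows. The scoring rule (preference count minus dislike count) is purely an assumption of this sketch; as stated above, any way of registering the user's preferences is applicable.

```python
# Illustrative sketch: order report answers by an assumed score of
# preference count minus dislike count (fields 318 and 320), so that
# frequently disliked answers sink to the bottom of the candidate list.
def order_by_preference(answers):
    """answers: list of (name, preference_count, dislike_count)."""
    return [name for name, pref, dislike in
            sorted(answers, key=lambda a: a[1] - a[2], reverse=True)]

answers = [("book", 20, 1), ("TV drama", 8, 20), ("music", 1, 8)]
ordered = order_by_preference(answers)
# scores: book 19, music -7, TV drama -12
```

With the example counts, the least-disliked "book" answer is selected first, consistent with the preceding paragraph.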
Different embodiments of providing the report answer and the speech response according to other ways of registering and employing the user's preferences, as well as the user's/public's usage habits and preferences, will be further given below with reference to FIGS. 7A/7B/8.
In another aspect, the natural language comprehension system 520 may determine the priority of at least one report answer according to a user's speech input that is input before the natural language dialogue system 500 provides the report answer, i.e., before the first speech response 507 is broadcast (at this time, the user is not aware of which report answer may be selected and provided by the natural language dialogue system 500). Namely, if a speech input (e.g., a fourth speech input) is received by the speech sampling module 510 earlier than the time the first speech response 507 is broadcast, the natural language comprehension system 520 is also able to parse fourth keywords in the fourth speech input, select from the candidate list the fourth report answer corresponding to the fourth keywords according to the priority, and output the fourth speech response according to the fourth report answer.
For instance, it is assumed that the natural language comprehension system 520 receives the first speech input 501 indicating "I want to watch TV drama", and after a few seconds further receives the fourth speech input indicating "Play Romance of the Three Kingdoms for me." At this time, the natural language comprehension system 520 is able to recognize the first keyword 509 "TV drama" in the first speech input 501 and then recognize the fourth keyword "Romance of the Three Kingdoms" in the fourth speech input. Finally, the natural language comprehension system 520 selects from the candidate list the report answer corresponding to both "TV drama" and "Romance of the Three Kingdoms" as the fourth report answer and outputs the corresponding fourth speech response according to the fourth report answer.
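The context-based selection in the example above, where keywords from successive speech inputs jointly narrow the candidate list, may be sketched as follows; the dictionary layout of the candidates is an assumption of this sketch.

```python
# Illustrative sketch: merge the keywords parsed from successive
# speech inputs and keep only candidate report answers matching all
# of them (context-based narrowing, as in the TV-drama example above).
def filter_by_context(candidates, *keyword_sets):
    merged = set().union(*keyword_sets)
    return [c for c in candidates if merged.issubset(c["keywords"])]

candidates = [
    {"name": "TV drama of Romance of the Three Kingdoms",
     "keywords": {"TV drama", "Romance of the Three Kingdoms"}},
    {"name": "book of Romance of the Three Kingdoms",
     "keywords": {"book", "Romance of the Three Kingdoms"}},
]
hits = filter_by_context(candidates,
                         {"TV drama"},                          # first input
                         {"Romance of the Three Kingdoms"})     # fourth input
# only the TV-drama candidate matches both inputs
```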
As discussed above, the natural language comprehension system 520 is able to output the speech response (corresponding to the user's speech input) in reply to the user's request information according to the public's/individual user's habits, such as preferences/dislikes, or the context of the dialogue. The natural language comprehension system 520 may arrange the report answers in the candidate list according to different priorities determined by the public's/individual user's habits such as preferences/dislikes, the context of the dialogue, and so on. If the speech input from the user is unclear, the natural language comprehension system 520 is able to determine the user's intention included in the user's speech input 501 according to the public's/individual user's habits, such as preferences/dislikes, or the context of the dialogue (e.g., the category/knowledge field of the keywords 509 relating to the first speech input 501). That is, if a report answer is close to the previous intention of the user or the intention most widely accepted by the public, the natural language comprehension system 520 may select this report answer first. Thereby, the speech response output by the natural language dialogue system 500 is more likely to correspond to the request information given by the user.
According to the method for correcting the speech response and the natural language dialogue system described in the present embodiment, the natural language dialogue system is able to output the first speech response 507 corresponding to the first speech input 501 from the user. If the first speech response 507 output by the natural language dialogue system matches neither the first request information 503 of the first speech input 501 from the user nor the first keyword 509, the natural language dialogue system corrects the previously output first speech response 507 and further outputs the second speech response 507′ (which may conform more closely to the first request information 503 of the user) according to the second speech input 501′ subsequently provided by the user. In addition, the natural language dialogue system may arrange the report answers according to different priorities determined by the public's/individual user's habits such as preferences/dislikes, the context of the dialogue, and so on, and thereby output the corresponding speech response to the user. In the event that the user is dissatisfied with the report answer provided by the natural language dialogue system, the natural language dialogue system may automatically revise the report answer according to each piece of request information delivered by the user and then provide a new speech response to the user, so as to facilitate the user's dialogue with the natural language dialogue system.
Different embodiments of providing the report answers and the speech responses according to the context of the dialogue and the public's/individual user's habits, such as preferences/dislikes, will be further given below, and the structure and the components of the natural language comprehension system 100 and the structured database 220 are applied in these embodiments for the purpose of explanation.
FIG. 7A is a block diagram illustrating a natural language dialogue system according to an embodiment of the invention. With reference to FIG. 7A, the natural language dialogue system 700 includes a speech sampling module 710, a natural language comprehension system 720, a properties database 730, and a speech synthesis database 740. As a matter of fact, the speech sampling module 710 depicted in FIG. 7A has the same configuration as the speech sampling module 510 shown in FIG. 5A and thus performs the same functions. Similarly, the natural language comprehension system 720 and the natural language comprehension system 520 have the same configuration and thus perform the same functions. Besides, when the natural language comprehension system 720 parses the request information 703, the user's intention may be obtained by means of a full-text search conducted on the structured database 220 as shown in FIG. 1, which has already been described above with reference to FIG. 1 and thus will not be further explained. The properties database 730 serves to store a user's preference 715 transmitted by the natural language comprehension system 720 or to provide a user's preference 717 to the natural language comprehension system 720, which will be illustrated hereinafter. The speech synthesis database 740 is equivalent to the speech synthesis database 530 for providing speech outputs to users. In the present embodiment, the speech sampling module 710 receives the speech input 701 (i.e., the first/second speech input 501/501′ shown in FIG. 5A/5B) from a user, and the natural language comprehension system 720 parses the request information 703 (i.e., the first/second request information 503/503′ shown in FIG. 5A/5B) included in the speech input 701 and outputs the corresponding speech response 707 (i.e., the first/second speech response 507/507′ shown in FIG. 5A/5B).
The components of the natural language dialogue system 700 may be configured within an identical machine, which should not be construed as a limitation to the invention.
The natural language comprehension system 720 receives the request information 703 that is parsed from the speech input 701 and generates a candidate list including at least one report answer according to one or more keywords 709 included in the speech input 701. Next, the natural language comprehension system 720 selects from the candidate list one of the report answers as the report answer 711 matching the keywords 709 and thereby searches the speech synthesis database 740 to find the speech 713 in response to this selected report answer 711. At last, the natural language comprehension system 720 outputs a speech response 707 according to the speech 713. According to the present embodiment, the natural language comprehension system 720 may be implemented by hardware circuitry constituted by logic gates or by computer programming codes, which should not limit the scope of the claims of the present invention.
FIG. 7B is a block diagram illustrating a natural language dialogue system 700′ according to another embodiment of the invention. The natural language comprehension system 720′ depicted in FIG. 7B may include a speech recognition module 722 and a natural language processing module 724, wherein a speech sampling module 710 and a speech synthesis module 726 may be integrated into a speech processing module 702. The speech recognition module 722 receives from the speech sampling module 710 the request information 703 parsed from the speech input 701 and then converts the request information 703 into one or more keywords 709. The natural language processing module 724 processes the keywords 709 to obtain at least one candidate list and selects from the candidate list one report answer as the report answer 711 that conforms to the speech input 701. The report answer 711 is parsed and obtained internally by the natural language comprehension system 720′, and therefore the parsed result must be converted into information accessible to the user, such as text or speech, before being output to the user. The speech synthesis module 726 searches the speech synthesis database 740 according to the report answer 711, wherein the speech synthesis database 740 records mapping relationships between texts and corresponding speech information, so as to facilitate the speech synthesis module 726 in finding the speech 713 corresponding to the report answer 711 and thereby creating the synthesized speech response 707. The synthesized speech response 707 may then be output by the speech synthesis module 726 through a speech output interface (not shown) and broadcast to the user, and the speech output interface may be a speaker, an amplifier, a headset, or another similar device. Please note that the natural language comprehension system 720 in FIG. 7A embeds the speech synthesis module 726 therein, with a structural design similar to that shown in FIG. 5B (but the speech synthesis module 726 is not shown in FIG. 7A).
According to the report answer 711, the speech synthesis module 726 searches the speech synthesis database 740 to obtain the speech 713 by which the synthesized speech response 707 can be created.
In the present embodiment, the speech recognition module 722, the natural language processing module 724, and the speech synthesis module 726 in the natural language comprehension system 720 may be respectively equivalent to the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 shown in FIG. 5B, and these equivalent modules are capable of performing the same functions. Besides, the speech recognition module 722, the natural language processing module 724, the speech synthesis module 726, and the speech sampling module 710 may be configured in an identical machine. In other embodiments of the invention, the speech recognition module 722, the natural language processing module 724, and the speech synthesis module 726 may be separately arranged in different machines (e.g., a computer system, a server, or other similar devices/systems). For instance, in the natural language comprehension system 720′ shown in FIG. 7B, the speech synthesis module 726 and the speech sampling module 710 may be configured in the same machine 702, while the speech recognition module 722 and the natural language processing module 724 may be configured in another machine. In FIG. 7B, the speech synthesis module 726 and the speech sampling module 710 are configured in the same machine 702, and therefore the natural language comprehension system 720′ is required to transmit the report answer 711 to the machine 702, and the speech synthesis module 726 queries the speech synthesis database 740 with the report answer 711 to find the corresponding speech 713 by which the speech response 707 is generated. In another aspect, when the speech synthesis module 726 calls the speech synthesis database 740 according to the report answer 711, the format of the report answer 711 may be converted, and the call to the speech synthesis database 740 may be made through the interface regulated by the speech synthesis database 740. This is well known to people skilled in the art and thus will not be further explained.
A natural language dialogue method will be described with reference to the natural language dialogue system 700 shown in FIG. 7A. FIG. 8A is a flowchart illustrating a natural language dialogue method according to an embodiment of the invention. For the sake of explanation, only operations within the natural language dialogue system 700 of FIG. 7A are shown, but the disclosed method is also applicable to the natural language dialogue system 700′ shown in FIG. 7B. In comparison with FIGS. 5 and 6, which show that the output information is automatically revised according to speech inputs from the user, FIGS. 7A/7B/8A show that the user's preference 715 is recorded in the properties database 730, one report answer is selected as the report answer 711 from the candidate list according to the user's preference, and then the speech response corresponding to the report answer 711 is output to the user. In fact, the embodiments shown in FIGS. 5/6 and FIGS. 7A/7B/8 may be applied individually or collectively, which should not be construed as a limitation to the invention.
With reference to FIGS. 7A and 8A, in step S810, the speech sampling module 710 receives the speech input 701. For instance, the speech input 701 is a user's speech, and the speech input 701 may also include request information 703 from the user. Specifically, the speech input 701 from the user may be an interrogative sentence, an imperative sentence, or any sentence having other request information, such as "I want to read the Romance of the Three Kingdoms," "I want to listen to the song of Forget-Love Potion," or "what is the temperature today", as mentioned above. Note that steps S802 to S806 are operations in which the natural language dialogue system 700 stores user preferences 715 based on the user's previous speech inputs, and the subsequent steps S810 to S840 are operations performed on those previously stored user preferences in the properties database 730. Details of steps S802 to S806 will be given later in the disclosure, while steps S820 to S840 are described below.
In step S820, the natural language comprehension system 720 parses at least one first keyword 709 included in the first speech input 701 to derive a candidate list having at least one report answer. Specifically, the natural language comprehension system 720 parses the speech input 701 to obtain one or more keywords 709 included in the speech input 701. For instance, when the user's speech input 701 indicates "I want to see the Romance of the Three Kingdoms," the keywords 709 parsed and obtained by the natural language comprehension system 720 are "Romance of the Three Kingdoms" and "see." As described above, the natural language dialogue system 700 may further determine whether the user intends to read the book, watch the TV drama, or watch the film. When the user's speech input 701 indicates "I want to listen to the song of Forget-Love Potion," the keywords 709 parsed and obtained by the natural language comprehension system 720 are "Forget-Love Potion," "listen," and "song." As described above, the natural language dialogue system 700 may further determine whether the user intends to listen to the song performed by Andy Lau or E-jun Lee. According to the keywords 709, the natural language comprehension system 720 may perform a full-text search in the structured database and obtain at least one search result (e.g., at least one record shown in FIG. 3A/3B) as the report answer in the candidate list.
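The keyword parsing and full-text search of step S820 can be sketched as follows. This is a minimal illustration only; the record layout, the vocabulary-based keyword spotting, and the matching rule are assumptions for demonstration and do not reflect the actual implementation of the structured database 220.

```python
# Illustrative records of a structured database; the fields are assumptions.
RECORDS = [
    {"title": "Romance of the Three Kingdoms", "category": "book"},
    {"title": "Romance of the Three Kingdoms", "category": "TV drama"},
    {"title": "Forget-Love Potion", "category": "music", "singer": "Andy Lau"},
    {"title": "Forget-Love Potion", "category": "music", "singer": "E-jun Lee"},
]

def parse_keywords(speech_input, vocabulary):
    # Naive keyword spotting: keep vocabulary terms that appear in the input.
    return [term for term in vocabulary if term in speech_input]

def full_text_search(keywords, records):
    # A record is a candidate report answer if every keyword appears
    # somewhere among its field values.
    candidates = []
    for record in records:
        text = " ".join(str(value) for value in record.values())
        if all(keyword in text for keyword in keywords):
            candidates.append(record)
    return candidates

keywords = parse_keywords(
    "I want to listen to the song of Forget-Love Potion",
    ["Romance of the Three Kingdoms", "Forget-Love Potion"])
candidate_list = full_text_search(keywords, RECORDS)
# Both the Andy Lau and E-jun Lee recordings remain as report answers,
# so a later step must still resolve which singer the user intends.
```

The ambiguity between the two singers in the resulting candidate list is exactly what the preference-based selection of the later steps resolves.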
Since one keyword 709 may relate to different categories (such as films, books, music, or games), and the same category may be further divided into different sub-fields (e.g., different authors of one film or one book, different singers performing one song, different versions of one game, and so on), the natural language comprehension system 720 may obtain one or more search results corresponding to the keyword 709 after parsing the user's speech input 701 (e.g., by performing a full-text search in the structured database 220), and the search results not only include the keyword 709 but also contain other information as exemplified in Table 1. Therefore, if the first speech input 701 from the user has a plurality of keywords 709, the request information 703 from the user is of high clarity, such that the natural language comprehension system 720 is able to parse the first speech input 701 and then obtain associated search results close to the request information 703. If the natural language comprehension system 720 is able to obtain a search result completely matching the first speech input, the correct report answer desired by the user has been found.
For instance, when the keyword 709 is "Romance of the Three Kingdoms", the search results obtained by the natural language comprehension system 720 may be records related to " . . . 'Romance of the Three Kingdoms' . . . 'TV drama'" and " . . . 'Romance of the Three Kingdoms' . . . 'book'", wherein the "TV drama" and the "book" are the user's intentions indicated by the report answers. Additionally, when the keywords 709 are "'Forget-Love Potion' and 'music'", the user's intentions parsed and interpreted by the natural language comprehension system 720 may be records related to " . . . 'Forget-Love Potion' . . . 'music' . . . 'Andy Lau'" and " . . . 'Forget-Love Potion' . . . 'music' . . . 'E-jun Lee'", wherein "Andy Lau" and "E-jun Lee" are the search results representing the user's intentions. That is, after the natural language comprehension system 720 conducts the full-text search in the structured database 220, each search result may include the keyword 709 and other information (exemplified in Table 1) related to the keyword 709, and the natural language comprehension system 720 then converts the obtained search results into the candidate list including at least one report answer, so as to perform the following steps.
In step S830, the natural language comprehension system 720 selects the report answer 711 from the candidate list according to the user's preference 717 sent by the properties database 730 and then outputs the speech response 707 according to the report answer 711. The user's preference 717 is obtained by organizing the user's preferences 715 stored in the properties database 730, which will be explained later. According to the present embodiment, the natural language comprehension system 720 selects the report answer 711 from the candidate list according to a priority (which will be described hereinafter). In step S840, the speech response 707 is output according to the report answer 711.
According to an embodiment of the invention, the priority may be determined by the quantity of the search results. For instance, if the keyword 709 is "Romance of the Three Kingdoms", and the natural language comprehension system 720 finds the most records related to "Romance of the Three Kingdoms" and "books", fewer records related to "Romance of the Three Kingdoms" and "music", and the fewest records related to "Romance of the Three Kingdoms" and "TV drama", the natural language comprehension system 720 considers the "book of Romance of the Three Kingdoms" as the first report answer with the highest priority (e.g., all search results related to the "book of Romance of the Three Kingdoms" are organized to create a candidate list, and the report answers in the candidate list are further arranged according to a priority determined by the values in the popularity field 316), the "music of Romance of the Three Kingdoms" as the second report answer (the report answer with the second highest priority), and the "TV drama of Romance of the Three Kingdoms" as the third report answer (the report answer with the third highest priority). It should be mentioned that the priority is determined not only by the quantity of the search results but also by the public's/individual user's habits such as preferences/dislikes, which will be further described below.
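The quantity-based priority described above can be sketched as counting how many matched records fall under each category and ranking the categories accordingly. The record shape is a hypothetical simplification; only the counting principle comes from the text.

```python
from collections import Counter

def rank_by_quantity(search_results):
    # Count the matched records per category; the category with the most
    # records receives the highest priority in the candidate list.
    counts = Counter(record["category"] for record in search_results)
    return [category for category, _ in counts.most_common()]

# Hypothetical search results for the keyword "Romance of the Three Kingdoms":
# most records are books, fewer are music, the fewest are TV dramas.
results = ([{"category": "book"}] * 5
           + [{"category": "music"}] * 3
           + [{"category": "TV drama"}] * 1)
priority = rank_by_quantity(results)
# priority == ["book", "music", "TV drama"]
```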
Another embodiment is given below to enable people skilled in the art to further comprehend the natural language dialogue method and the natural language dialogue system described herein.
It is first assumed that the first speech input 701 received by the speech sampling module 710 is "I want to see the Romance of the Three Kingdoms" (step S810). The natural language dialogue system 700 parses the first speech input 701 to obtain the first keywords 709 "see" and "Romance of the Three Kingdoms" and then acquires a candidate list with a plurality of report answers. Each of the report answers has relevant keywords (step S820) and other data, as shown in the above Table 1.
The natural language comprehension system 720 then selects the desired report answer from the candidate list. If the natural language comprehension system 720 selects the report answer A (shown in Table 1) as the first report answer 711 from the candidate list, the natural language comprehension system 720 in steps S830 to S840 outputs "whether to display the book of the Romance of the Three Kingdoms" as the first speech response 707, for instance.
As described above, the natural language comprehension system 720 may arrange the report answers in the candidate list according to another priority determined in a different manner and thereby output a speech response 707 corresponding to the report answer 711. For instance, the natural language comprehension system 720 may determine the user's preference according to a plurality of the user's dialogue records, such as positive/negative expressions used by the user. Additionally, the natural language comprehension system 720 may determine the priority of the report answer 711 according to the user's preference 717. Before the positive/negative expressions used by the user are explained, the way of storing the preference, the dislike, or the habit of the user or the public as the user's preference 715 is described.
Specifically, steps S802 to S806 are performed to store the user's preference 715. In an embodiment of the invention, before the current speech input 701 is received (in step S810), a plurality of previous speech inputs 701 (i.e., the previous dialogue records) are received in step S802, associated user's preferences 715 are captured according to those previous speech inputs 701 (step S804), and the captured user's preferences 715 are stored in the properties database 730. As a matter of fact, the user's preference 715 may also be stored into the structured database 220, and the properties database 730 may be integrated into the structured database 220. For instance, in an embodiment of the invention, the user's preference may be registered in the popularity field 316 shown in FIG. 3B. Since the way of registering information in the popularity field 316 is already explained above (e.g., once a specific record 302 is matched, the value in the popularity field increases by one automatically), no further explanation is provided hereinafter. Certainly, an additional field may be configured in the structured database 220 to store the user's preference 715. For instance, a keyword (e.g., "Romance of the Three Kingdoms") and the user's preferences (e.g., when the user mentions "like" or other positive expressions and "dislike" or other negative expressions, the values in the preference field 318 and the dislike field 320 in FIG. 3B may increase by one, respectively) may be integrated so as to calculate the quantity of the user's preferences (e.g., calculate the quantity of positive expressions and the quantity of negative expressions).
When the natural language comprehension system 720 searches for the user's preference 717 in the structured database 220, the natural language comprehension system 720 may directly search the values in the preference field 318 and/or the dislike field 320 (e.g., search the quantities of positive expressions and negative expressions, respectively) and thereby determine the user's preferences (i.e., the calculated quantities of positive expressions and negative expressions may be considered as the user's preference 717 and transmitted to the natural language comprehension system 720).
The following description relates to the condition in which the user's preference 715 is stored into the properties database 730, i.e., the properties database 730 is not integrated into the structured database 220. According to an embodiment of the invention, the user's preference 715 may be stored by using keywords accompanied by the "degrees of preference" of the keywords. For example, the user's personal preference and dislike with respect to a specific set of keywords are registered directly in the preference field 852 and the dislike field 862 shown in FIG. 8B, and the preference field 854 and the dislike field 864 may be applied to register the public's preference and dislike with respect to the specific set of keywords. For instance, in FIG. 8B, the keywords "Romance of the Three Kingdoms" and "book" stored in the record 832 correspond to the values 20 and 1 respectively in the corresponding preference field 852 and the corresponding dislike field 862. The keywords "Romance of the Three Kingdoms" and "TV drama" stored in the record 834 correspond to the values 8 and 20 respectively in the corresponding preference field 852 and the corresponding dislike field 862. The keywords "Romance of the Three Kingdoms" and "music" stored in the record 836 correspond to the values 1 and 8 respectively in the corresponding preference field 852 and the corresponding dislike field 862. These all represent the degrees of the user's personal preference and dislike with respect to a specific set of keywords (e.g., a large value in the preference field 852 indicates the user's preference for the corresponding keywords, while a large value in the dislike field 862 indicates the user's dislike of the corresponding keywords). Besides, the values in the preference field 854 and the dislike field 864 corresponding to the record 832 are 5 and 3. The values in the preference field 854 and the dislike field 864 corresponding to the record 834 are 80 and 20.
The values in the preference field 854 and the dislike field 864 corresponding to the record 836 are 2 and 10. These all represent the degrees of the public's preferences and dislikes with respect to specific sets of keywords. According to the user's preference, the values in the preference field 852 and the dislike field 862 may be increased (such an instruction is referred to as a "preference indication"). Therefore, if the user's speech input indicates "I want to watch the TV drama of Romance of the Three Kingdoms," the natural language comprehension system 720 may integrate the keywords ("Romance of the Three Kingdoms" and "TV drama") and a "preference indication" of increasing the value in the preference field 852 as the user's preference 715. The user's preference 715 is then directed to the properties database 730, and finally the properties database 730 may increase the value in the preference field 852 of the record 834 by one (since the user's intention to watch the "TV drama" of "Romance of the Three Kingdoms" indicates an increase in the degree of the user's preference). In view of this way of registering the user's preferences, if the user subsequently inputs relevant keywords, e.g., if the user's input indicates "I want to see the Romance of the Three Kingdoms," the natural language comprehension system 720 may, according to the keyword "Romance of the Three Kingdoms," find the three records 832, 834, and 836 related to "Romance of the Three Kingdoms" in the properties database 730 as shown in FIG. 8B, and the properties database 730 considers the values in the preference field 852 and the dislike field 862 as the user's preference 717 and finally transfers this user's preference 717 back to the natural language comprehension system 720. Thereby, the natural language comprehension system 720 may employ the user's preference 717 as the basis for determining the user's personal preference.
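The records of FIG. 8B and the "preference indication" update can be sketched as follows. The field names mirror the reference numerals (852/862 for the personal counters, 854/864 for the public ones), but the dictionary-based storage layout is purely an assumption for illustration.

```python
# Hypothetical in-memory model of the properties database of FIG. 8B,
# keyed by a keyword set; values follow the figures quoted in the text.
properties_db = {
    ("Romance of the Three Kingdoms", "book"):
        {"pref_852": 20, "dislike_862": 1,  "pref_854": 5,  "dislike_864": 3},
    ("Romance of the Three Kingdoms", "TV drama"):
        {"pref_852": 8,  "dislike_862": 20, "pref_854": 80, "dislike_864": 20},
    ("Romance of the Three Kingdoms", "music"):
        {"pref_852": 1,  "dislike_862": 8,  "pref_854": 2,  "dislike_864": 10},
}

def apply_preference(db, keywords, indication):
    # A "preference indication" increases one counter of the record that
    # matches the keyword set, creating the record if it does not exist yet.
    record = db.setdefault(tuple(keywords),
                           {"pref_852": 0, "dislike_862": 0,
                            "pref_854": 0, "dislike_864": 0})
    record[indication] += 1

# "I want to watch the TV drama of Romance of the Three Kingdoms"
apply_preference(properties_db,
                 ["Romance of the Three Kingdoms", "TV drama"], "pref_852")
# The personal preference counter of record 834 rises from 8 to 9.
```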
Undoubtedly, the properties database 730 may also consider the values in the preference field 854 and the dislike field 864 as the user's preference 717 and then transmit this user's preference 717 back to the natural language comprehension system 720. Thereby, the natural language comprehension system 720 may employ the user's preference 717 as the basis for determining the public's preference. The way of employing the user's preference 717 to indicate the user's personal preferences or the public's preferences should not be construed as a limitation to the invention.
In another embodiment, the values in the preference field 852 and the dislike field 862 may also determine the user's or the public's usage habits. For instance, after the natural language comprehension system 720 receives the user's preference 717, the natural language comprehension system 720 may determine the differences between the values of the preference fields 852 and 854 and/or the dislike fields 862 and 864. If the difference between the values of the preference field and the dislike field is larger than a certain threshold, it implies the user is accustomed to a specific dialogue manner. For instance, if the value in the preference field 852 is larger than the value of the dislike field 862 by 10, the user may prefer to use "positive expressions" (i.e., one way of registering the "user's habit" mentioned above). In this situation, the natural language comprehension system 720 may select the report answer merely according to the value in the preference field 852. If the natural language comprehension system 720 employs the values of the preference field 854 and the dislike field 864 stored in the properties database 730, the preference records of all users are applied to the determination, and the associated determination results may serve as references for the public's usage habits. Note that the user's preference 717 sent from the properties database 730 back to the natural language comprehension system 720 may simultaneously include the user's personal preference (e.g., the values in the preference field 852 and the dislike field 862) and the public's preference (e.g., the values in the preference field 854 and the dislike field 864), which should not be construed as a limitation to the invention.
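The threshold comparison described above can be sketched in a few lines. The threshold of 10 follows the example in the text; the function name and the three-way return value are assumptions.

```python
def usage_habit(pref, dislike, threshold=10):
    # If the preference counter exceeds the dislike counter by more than
    # the threshold, the user is presumed to favor positive expressions,
    # and the system may rank report answers by the preference field alone
    # (and symmetrically for negative expressions).
    if pref - dislike > threshold:
        return "positive"
    if dislike - pref > threshold:
        return "negative"
    return "undetermined"

# Record 832 (preference 20, dislike 1): a habit of positive expressions.
# Record 834 (preference 8, dislike 20): a habit of negative expressions.
```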
During step S820 of generating the candidate list (regardless of whether the results are completely or partially matched), the natural language dialogue system 700 may store the user's preference 715 obtained from the user's speech inputs. For instance, in step S820, once a keyword matches at least one record of the structured database 220, it implies that the user prefers the matched record(s) (in other words, the matched record(s) may meet his/her requirement), and therefore the "keyword" and the "preference indication" may be sent to the properties database 730. After a corresponding record is found in the properties database 730, the values in the corresponding preference field 852/854 and the corresponding dislike field 862/864 in the corresponding records may be changed (e.g., when the user inputs "I want to read the book of Romance of the Three Kingdoms," the value of the preference field 852/854 in the record 832 shown in FIG. 8B may increase by one automatically). According to yet another embodiment, in step S830, the natural language dialogue system 700 may store the user's preference 715 after the user selects one report answer. Besides, if no corresponding keyword is found in the properties database 730, a new record may be established to store the user's preference 715. For instance, if the user inputs "I want to listen to Forget-Love Potion by Andy Lau," the corresponding keywords "Andy Lau" and "Forget-Love Potion" are derived. If no corresponding keyword is found in the properties database 730 in the step of storing the user's preference, a new record 838 may be created in the properties database 730, and the value of the corresponding preference field 852/854 may be set to one. The timing and the way of storing the user's preference 715 described above are merely exemplary, and people skilled in the art may make modifications and variations to the embodiments provided herein without departing from the spirit and scope of the invention.
Although the formats of the records 832 to 838 stored in the properties database 730 shown in FIG. 8B are different from those in the structured database 220 (as shown in FIG. 3A/3B/3C), the formats of the stored records are not limited in the invention. Besides, the way of storing data into the preference field 852/854 or the dislike field 862/864 and the way of using the stored data are described in the previous embodiments. In another embodiment of the invention, additional fields 872/874 may be further established in the properties database 730 to respectively store the user's/the public's additional usage habits, e.g., the numbers of times of downloading, citing, recommending, commenting on, or referring to the data associated with the record. In yet another embodiment, the numbers of times of downloading, citing, recommending, commenting on, or referring to the data may also be stored in the preference fields 852/854 and/or the dislike fields 862/864. For instance, as long as the user provides positive comments on a certain record or refers to a certain record as a reference for others, the values of the preference fields 852/854 may increase by one automatically. If the user provides negative comments on a certain record, the values of the dislike fields 862/864 may increase by one automatically. The way of registering the number of records and the values in the fields described above is not limited to those described herein. People skilled in the art should be aware that the preference field 852, the dislike field 862, and the additional field 872 shown in FIG. 8B, etc., are merely related to the user's personal selection and preference. Accordingly, the user's personal choice/preference/dislike information may be stored in the user's mobile communication apparatus, while the data in the preference field 854, the dislike field 864, and the additional field 874 and other data related to all users (or at least a specific group of users) may be stored in the server.
Thereby, the storage space on the server may be economized, and the privacy of the user's personal preference may be guaranteed.
The user's actual usage conditions are further described below with reference to FIG. 7A and FIG. 8B. In view of the dialogue contents of a large number of speech inputs 701, suppose that when the user talks to the natural language comprehension system 720, he/she frequently mentions "I dislike watching the TV drama of Romance of the Three Kingdoms," sometimes mentions "I dislike listening to the music of Romance of the Three Kingdoms," and scarcely mentions "I dislike reading the book of Romance of the Three Kingdoms"; e.g., in the properties database 730 there are 20 records regarding "I dislike watching the TV drama of Romance of the Three Kingdoms" (i.e., the number of negative expressions with respect to "Romance of the Three Kingdoms" and "TV drama" shown in FIG. 8B is 20 (record 834)), 8 records regarding "I dislike listening to the music of Romance of the Three Kingdoms" (i.e., the number of negative expressions with respect to "Romance of the Three Kingdoms" and "music" shown in FIG. 8B is 8 (record 836)), and 1 record regarding "I dislike reading the book of Romance of the Three Kingdoms" (i.e., the number of negative expressions with respect to "Romance of the Three Kingdoms" and "book" shown in FIG. 8B is 1 (record 832)). The natural language comprehension system 720 then sequentially arranges the report answers in the candidate list in the order of "the book of Romance of the Three Kingdoms," "the music of Romance of the Three Kingdoms," and "the TV drama of Romance of the Three Kingdoms" according to the priority. Note that the user's preference 717 transmitted from the properties database 730 includes the quantities of the three negative expressions (i.e., 20, 8, and 1). That is, if the keyword 709 is "Romance of the Three Kingdoms," the natural language comprehension system 720 selects the book of "Romance of the Three Kingdoms" as the report answer 711 and outputs the corresponding speech response 707 according to the report answer 711.
Although the priority described above is determined merely by the calculated quantities of the negative expressions used by the user, it should be mentioned that the calculated quantities of the user's positive expressions may be independently employed to determine the priority (e.g., if the value in the preference field 852 is larger than the value in the dislike field 862 by a certain threshold).
Note that the natural language comprehension system 720 may also determine the priorities of the report answers according to both the quantities of the positive expressions and the negative expressions used by the user. In particular, the properties database 730 may store the keywords used by the user, such as "like," "idol," "hate," "dislike," etc., wherein the former two expressions are positive, while the latter two are negative. Hence, the natural language comprehension system 720 not only may compare the difference between the number of uses of the expression "like" and the number of uses of the expression "dislike", but also may directly arrange the report answers according to the priority determined by the numbers of uses of the positive/negative expressions corresponding to the keywords, i.e., the natural language comprehension system 720 may compare the citing number of positive expressions with the citing number of negative expressions. For instance, if the citing number of the expression "like" for a specific report answer is significant (i.e., the citing number of the positive expressions is significantly larger, or the value in the preference field 852 is significantly larger than that in the dislike field 862), this specific report answer is selected first. Alternatively, if the citing number of the expression "dislike" for a specific report answer is significant (i.e., the citing number of the negative expressions is significantly larger, or the value of the preference field 852 is significantly smaller than that of the dislike field 862), this specific report answer is selected later. Thereby, the natural language comprehension system 720 is able to organize the report answers so as to create a candidate list according to the priority. Some users may prefer to use positive expressions (e.g., the value in the preference field 852 is relatively large), while others may prefer to use negative expressions (e.g., the value in the dislike field 862 is relatively large).
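Ranking report answers by both positive and negative expression counts can be sketched by sorting on the difference between the two counters, using the example figures of FIG. 8B. The field names `likes`/`dislikes` are illustrative stand-ins for the preference field 852 and the dislike field 862.

```python
def rank_report_answers(candidates):
    # Sort candidate report answers by the difference between positive and
    # negative expression counts: an answer the user frequently "likes"
    # comes first, one frequently "disliked" comes last.
    return sorted(candidates,
                  key=lambda c: c["likes"] - c["dislikes"],
                  reverse=True)

# Figures taken from the records 832/834/836 of FIG. 8B.
candidates = [
    {"answer": "TV drama of Romance of the Three Kingdoms", "likes": 8,  "dislikes": 20},
    {"answer": "book of Romance of the Three Kingdoms",     "likes": 20, "dislikes": 1},
    {"answer": "music of Romance of the Three Kingdoms",    "likes": 1,  "dislikes": 8},
]
ordered = rank_report_answers(candidates)
# The book is ranked first, the music second, and the TV drama last.
```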
Advantageously, in the previous embodiment, the user's preference 717 reflects the user's personal usage habit, and thus the report answer conforming to the user's habit may be provided for the user's selection.
In addition, the natural language comprehension system 720 may also determine the priority of the report answer 711 in the candidate list according to the public's usage habits. The more often the report answer 711 is selected and used by the public, the higher the priority the report answer owns (e.g., the popularity field 316 shown in FIG. 3C or the preference/dislike fields 854/864 of FIG. 8B may be applied to keep such records). The keyword 709 is exemplified as "Romance of the Three Kingdoms" here, and the report answers found by the natural language comprehension system 720 are assumed to be the TV drama, the book, and the music of "Romance of the Three Kingdoms." If "Romance of the Three Kingdoms" mentioned by the public frequently refers to the TV drama of "Romance of the Three Kingdoms," sometimes refers to the film of "Romance of the Three Kingdoms," and seldom refers to the book of "Romance of the Three Kingdoms" (e.g., when the values of the relevant records stored in the preference field 854 shown in FIG. 8B are 40, 8, and 5, respectively), the natural language comprehension system 720 arranges the report answers 711 in the order of "TV drama," "film," and "book" according to the priority determined by the public's usage habit. That is, the natural language comprehension system 720 first selects "the TV drama of Romance of the Three Kingdoms" as the report answer 711 and then outputs the corresponding speech response 707 according to the report answer 711. Please note that the popularity field 316 shown in FIG. 3C (or the preference/dislike fields 854/864 of FIG. 8B) applied to keep records may be employed to arrange the report answers according to the priorities determined by the public's usage habit, and the way of keeping records is already provided in the previous paragraphs with reference to FIG. 3C (8B) and thus will not be further described below.
The natural language comprehension system 720 may also determine the priorities of the report answers 711 according to the user's usage frequencies. Specifically, the natural language comprehension system 720 is able to register the received user's speech inputs 701 in the properties database 730, and the properties database 730 may register the keywords 709 obtained when the natural language comprehension system 720 parses the user's speech inputs 701 and may also register all the report answers 711 generated by the natural language comprehension system 720. Afterwards, the natural language comprehension system 720 may find the report answer 711 relatively conformable to the user's intention (determined by the user's speech input) according to the priority, so as to finally find the corresponding speech response. The recorded information mentioned here may include the user's preferences/dislikes/habits and even the public's preferences/dislikes/habits. Please note that the popularity field 316 shown in FIG. 3C (or the preference/dislike fields 852/862 of FIG. 8B) applied to keep records may be used to determine the priorities of the report answers 711 according to the user's habits described above, and the way of keeping records is already provided in the previous paragraphs with reference to FIG. 3C (8B) and thus will not be further described below.
Briefly, the natural language comprehension system 720 may store the user's preferences (e.g., positive and/or negative expressions), the user's usage habits, and the public's usage habits into the properties database 730 in step S806. That is, in steps S802, S804, and S806, the user's preferences 715 are learned from the user's historical dialogue records (registered in the properties database 730), and the user's preferences 715 are stored into the properties database 730 (by means of the user's preference 717 fed into the properties database 730 to modify the user's/the public's preferences). Besides, the user's/the public's usage habits are also stored into the properties database 730. Thereby, the large quantity of information in the properties database 730 (e.g., via the user's preference 717 stored into the properties database 730) may be utilized by the natural language comprehension system 720 for providing speech responses accurately.
Step S830 is further demonstrated hereinafter. After the speech input is received in step S810 and after the keywords 709 included in the speech input are parsed to derive the required candidate list in step S820, the natural language comprehension system 720 in step S830 determines the priority of at least one report answer according to the user's preference 717 (step S880), which includes the user's preferences, the user's usage habits, or the public's usage habits. As described above, the priority may be determined by using the search/citing numbers, the user's or the public's positive/negative expressions, and so on. In step S890, a report answer 711 is selected from the candidate list according to the priority, and the selected report answer 711 may be the one most closely matching the keywords or the one having the highest priority. In step S840, the speech response 707 is output according to the report answer 711.
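The selection of step S890 can be sketched as picking the first candidate whose category appears in the priority ordering. The function name, the category-based matching, and the fallback to the first candidate are assumptions for illustration.

```python
def select_report_answer(candidate_list, priority):
    # Step S890 (sketch): pick the candidate whose category has the highest
    # priority; fall back to the first candidate if no category matches.
    for category in priority:
        for answer in candidate_list:
            if answer["category"] == category:
                return answer
    return candidate_list[0]

candidates = [{"category": "TV drama"}, {"category": "book"}]
chosen = select_report_answer(candidates,
                              priority=["book", "music", "TV drama"])
# chosen is the "book" report answer, since "book" ranks highest.
```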
On the other hand, the natural language comprehension system 720 may determine the priority of at least one report answer according to the user's previous speech input 701. That is, if another speech input 701 (e.g., the fourth speech input) is received by the speech sampling module 710 before the speech response 707 is broadcast, the natural language comprehension system 720 is also able to parse the keyword (i.e., the fourth keyword) in the speech input 701 (i.e., the fourth speech input), select from the candidate list the report answer corresponding to the keyword as the report answer 711 according to the priority, and output the speech response 707 according to the report answer 711.
For instance, it is assumed that the natural language comprehension system 720 receives a speech input 701 "I want to watch TV drama", and after a few seconds the natural language comprehension system 720 further receives another speech input 701 "play Romance of the Three Kingdoms for me." The natural language comprehension system 720 is able to recognize the keyword "TV drama" (the first keyword) from the first speech input 701, and the natural language comprehension system 720 may recognize the keyword "Romance of the Three Kingdoms" (i.e., the fourth keyword) later. Therefore, the natural language comprehension system 720 selects the report answer corresponding to the user's intention with respect to "Romance of the Three Kingdoms" and "TV drama" from the candidate list and outputs the speech response 707 to the user according to this report answer 711.
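The refinement across consecutive speech inputs can be sketched as merging the keyword sets before the response is broadcast. This is a simplification; the actual context handling of the system is not specified at this level of detail.

```python
def merge_keywords(pending_keywords, new_keywords):
    # Keywords from a later speech input refine the earlier request before
    # the speech response is broadcast; duplicates are dropped and the
    # original order is preserved.
    merged = list(pending_keywords)
    for keyword in new_keywords:
        if keyword not in merged:
            merged.append(keyword)
    return merged

first = ["TV drama"]                        # "I want to watch TV drama"
fourth = ["Romance of the Three Kingdoms"]  # "play Romance of the Three Kingdoms for me"
merged = merge_keywords(first, fourth)
# The combined intention covers both "TV drama" and
# "Romance of the Three Kingdoms".
```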
As discussed above, the natural language comprehension system 720 is able to output the speech response 707 (relatively conformable to the user's speech input 701) in reply to the user's request information 703 according to the public's/the user's preferences, the user's/the public's usage habits, or the dialogue contexts. The natural language comprehension system 720 may sequentially arrange the report answers in the candidate list according to different priorities determined by the public's usage habits, the user's preferences, the user's personal usage habits, the dialogue contexts, and so on. If the speech input 701 from the user is unclear, the natural language comprehension system 720 is able to infer the user's intention included in the user's speech input 701 according to the public's usage habits, the user's preferences, the user's personal usage habits, or the dialogue context (e.g., the category/knowledge field of the keywords 709 contained in the speech input 701). That is, the natural language comprehension system 720 may first select the report answer 711 having an intention close to the intentions the user/public previously used/described/showed. Thereby, the speech response 707 output by the natural language comprehension system 720 may more likely correspond to the request information 703 from the user.
The properties database 730 and the structured database 220 are described above as independent databases; however, these two databases may be integrated, and people skilled in the art may determine the database structure according to practical applications and/or actual demands.
In summary, the natural language dialogue method and the natural language dialogue system are provided herein, and the natural language dialogue system is able to output speech responses corresponding to speech inputs from the user. The natural language dialogue system described herein may also select a proper report answer according to a priority determined by the public's usage habits, the user's preferences, the user's personal usage habits, or the dialogue contexts, and thereby the natural language dialogue system may output adaptive speech responses to the user, so as to facilitate the use of the natural language dialogue system.
In the following embodiment, the components and structure of the natural language comprehension system 100 and the structured database 220 are employed to determine, according to the number of report answers obtained by parsing the request information included in the user's speech input, whether operations/applications associated with the file data type indicated by the user's request information are directly activated or whether the system waits for the user's further instructions. If only one report answer is left, the operation/application associated with the file data type indicated by this report answer may be directly activated. Otherwise, under such a user-friendly interface, the system does not filter the report answers but directly provides the candidate list including all report answers for the user's selection. Any user may thus determine the to-be-executed application or the desired service by selecting the corresponding report answer.
FIG. 9 is a schematic diagram illustrating a system of a mobile terminal apparatus according to an embodiment of the invention. With reference to FIG. 9, in the present embodiment, the mobile terminal apparatus 900 includes a speech receiving unit 910, a data processing unit 920, a display unit 930, and a storage unit 940. The data processing unit 920 is coupled to the speech receiving unit 910, the display unit 930, and the storage unit 940. The speech receiving unit 910 is configured to receive a first speech input SP1 and a second speech input SP2 and transmit them to the data processing unit 920. The first/second speech input SP1/SP2 described herein may refer to the speech inputs 501/501′ and 701/701′. The display unit 930 is controlled by the data processing unit 920 and thereby displays a first/second candidate list 908/908′. The storage unit 940 is configured to store data, including the data stored in the structured database 220 and/or in the properties database 730, which will not be further described hereinafter. Besides, the storage unit 940 may be any type of storage unit in a server or a computer system, such as a dynamic random access memory (DRAM), a static random access memory (SRAM), a flash memory, a read-only memory (ROM), and so on. This should not be construed as a limitation to the invention, and people skilled in the art should be able to make proper modifications based on actual requirements.
In the present embodiment, the functions of the data processing unit 920 are similar to those of the natural language comprehension system 100 depicted in FIG. 1. That is, the data processing unit 920 recognizes the first speech input SP1 to generate first request information 902; the first request information 902 is parsed, and a natural language processing process is performed on it so as to generate a first keyword 904 corresponding to the first speech input SP1. According to the first keyword 904 derived from the first speech input SP1, a first report answer 906 (e.g., equivalent to the first report answer 511/711) is selected from the data stored in the storage unit 940 (e.g., through a full-text search conducted on the structured database 220 by the search engine 240 according to the keyword 108). When the number of selected first report answers 906 is 1, the data processing unit 920 may directly activate an application together with the file data indicated by the first report answer 906. When the number of selected first report answers 906 is more than 1, the data processing unit 920 organizes those first report answers 906 into a first candidate list 908 and controls the display unit 930 to display the first candidate list 908 for the user's further selection. At this time, the data processing unit 920 receives and recognizes the second speech input SP2 to generate second request information 902′, performs a natural language processing process on the second request information 902′ to generate a second keyword 904′ corresponding to the second speech input SP2, and then selects a part of the report answers from the first candidate list 908 according to the second keyword 904′. The first keyword 904 and the second keyword 904′ may each consist of multiple keywords. The way of parsing the second speech input SP2 to generate the second request information 902′ and the second keyword 904′ may refer to the way of parsing the second speech input as shown in FIGS. 5A and 7A and therefore will not be further elaborated.
Similarly, if the number of second report answers 906′ is 1, the data processing unit 920 may directly perform a corresponding operation, such as activating an application running the file data associated with the second report answer 906′. When the number of second report answers 906′ is larger than 1, the data processing unit 920 organizes the second report answers 906′ into a second candidate list 908′ and controls the display unit 930 to display the second candidate list 908′. Subsequently, corresponding report answer(s) is(are) selected according to the user's next speech input, and the associated operation(s) is(are) performed according to the number of the subsequently selected report answer(s) (i.e., generating a further candidate list displayed to the user for further selection, or activating the operation associated with the only remaining report answer). This may be deduced from the above descriptions and thus will not be further explained hereinafter.
To be specific, the data processing unit 920 compares a plurality of records 302 (e.g., the value data in each sub-field 308 of the title field 304) in the structured database 220 with the first keyword 904 corresponding to the first speech input SP1, as described in previous paragraphs and depicted in FIGS. 1, 3A, 3B, and 3C. When one of the records 302 in the structured database 220 at least partially matches the first keyword 904, the record 302 is deemed a matched result (e.g., the matched result depicted in the paragraphs relating to FIGS. 3A and 3B) generated according to the first speech input SP1. In one embodiment, if the file data type associated with the matched record relates to music, the record 302 may include a song title, a singer's name, an album title, release time, a playlist, and so forth; if the file data type associated with the matched record relates to films, the record 302 may include a film title, release time, staff (including the cast), and so forth; if the file data type associated with the matched record relates to webpages, the record 302 may include the name of a website, a webpage type, a corresponding user's account, and so on; if the file data type associated with the matched record relates to pictures, the record 302 may include the name of a picture, information of the picture, and so on; if the file data type relates to business cards, the record 302 may include the name of a contact person, the phone number thereof, the address thereof, and so forth. The records 302 described above are exemplary and may be defined according to practical applications and/or requirements, and therefore the records should not be construed as limitations to the invention.
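As a rough illustration only, the partial-match comparison described above can be sketched as follows. The record fields and the substring-based match rule are assumptions made for illustration, not the actual implementation of the structured database 220:

```python
# Hypothetical sketch: each record 302 is modeled as a dict of field
# values, and a record counts as a matched result when any of its field
# values at least partially matches one of the keywords (here a simple
# substring test in either direction).

def match_records(records, keywords):
    """Return the records whose fields at least partially match a keyword."""
    matched = []
    for record in records:
        for value in record.values():
            if any(kw in value or value in kw for kw in keywords):
                matched.append(record)
                break  # one matching field is enough for this record
    return matched

# Example: only the first record matches the recognized keyword.
dramas = [
    {"title": "Romance of the Three Kingdoms", "type": "TV drama"},
    {"title": "Journey to the West", "type": "TV drama"},
]
result = match_records(dramas, ["Romance of the Three Kingdoms"])
```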
The data processing unit 920 then determines whether the second keyword 904′ corresponding to the second speech input SP2 includes a term indicating an order (e.g., "I want the third option" or "I select the third one"). If the second keyword 904′ corresponding to the second speech input SP2 includes an ordinal term (i.e., a term indicating an order), the data processing unit 920 selects the data at the corresponding position in the first candidate list 908 according to the ordinal term. If the second keyword 904′ corresponding to the second speech input SP2 does not include an ordinal term, the user may directly select a specific first report answer 906 from the first candidate list 908. The data processing unit 920 may compare the second keyword 904′ with each record 302 corresponding to each of the first report answers 906 in the first candidate list 908, and the data processing unit 920 may then determine which of the first report answers 906 in the first candidate list 908 corresponds to the second speech input SP2 based on the comparison results. In an embodiment of the invention, the data processing unit 920 may determine whether any of the first report answers 906 in the first candidate list 908 corresponds to the second speech input SP2 according to the comparison result (e.g., a complete match or a partial match), thereby simplifying the selection process. The data processing unit 920 selects the first report answer 906 that currently best matches the second speech input SP2 as the first report answer 906 associated with the second speech input SP2.
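The two selection paths described above (by ordinal term, or by best match) can be sketched as follows; the regular expression, the ordinal table, and the keyword-count "match degree" are illustrative assumptions rather than the patent's implementation:

```python
import re

# Minimal ordinal vocabulary assumed for the sketch.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}

def select_answer(candidates, second_keywords):
    """Pick a candidate by ordinal term if present, else by keyword overlap."""
    for kw in second_keywords:
        m = re.search(r"\b(first|second|third|fourth|fifth)\b", kw)
        if m:
            # Ordinal term found: take the answer at that position.
            return candidates[ORDINALS[m.group(1)] - 1]
    # No ordinal term: compare the keywords against each candidate and
    # keep the one matching the most keywords (a crude "match degree").
    return max(candidates, key=lambda c: sum(kw in c for kw in second_keywords))

weather = ["weather in Shanghai", "weather in Beijing", "weather in Tianjin"]
by_order = select_answer(weather, ["the third"])           # ordinal path
by_match = select_answer(weather, ["Beijing", "weather"])  # comparison path
```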
For instance, if the first speech input SP1 is "what is the weather today", the first keywords 904 corresponding to the first speech input SP1 are identified to include "today" and "weather" after the first speech input SP1 is recognized and manipulated under the rules of natural language processing. Accordingly, the data processing unit 920 reads the data corresponding to today's weather and then controls the display unit 930 to display the weather data in the first candidate list 908. If the second speech input SP2 is "I want to read the third data" or "I select the third data", the second keyword 904′ corresponding to the second speech input SP2 is identified to include "the third" (which may be interpreted as a term indicating an order) after the second speech input SP2 is recognized and manipulated under the rules of natural language processing. Accordingly, the data processing unit 920 reads the third data in the first candidate list 908 (i.e., the third first report answer 906 in the first candidate list 908) and controls the display unit 930 to display the corresponding weather data. Alternatively, if the second speech input SP2 is "I want to see the weather in Beijing" or "I select the weather in Beijing", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keywords 904′ corresponding to the second speech input SP2 may include "Beijing" and "weather". Accordingly, the data processing unit 920 reads the data corresponding to Beijing in the first candidate list 908. When only one selected first report answer 906 is left, the corresponding weather information may be directly displayed on the display unit 930. Additionally, when there is more than one selected first report answer 906, a second candidate list 908′ (including at least one second report answer 906′) is displayed for the user's further selection.
In another example, if the first speech input SP1 is "I want to make a phone call to Mr. Chang", after the first speech input SP1 is recognized and manipulated under the rules of natural language processing, the first keywords 904 corresponding to the first speech input SP1 may include "phone" and "Chang". Accordingly, the data processing unit 920 reads the data of the contact people with the last name "Chang" (e.g., through a full-text search conducted on the structured database 220 to obtain the detailed data corresponding to the record 302) and controls the display unit 930 to display the data of the contact people (i.e., the first report answers 906) in the first candidate list 908. Afterwards, if the second speech input SP2 is "the third Mr. Chang" or "I select the third", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keyword 904′ corresponding to the second speech input SP2 may include "the third", which may be interpreted as an ordinal term. Accordingly, the data processing unit 920 reads the third data in the first candidate list 908 (i.e., the third first report answer 906) and dials the phone number according to the selected data. Alternatively, if the second speech input SP2 is "I select the number starting with 139", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keywords 904′ corresponding to the second speech input SP2 may include "139" and "starting". Please note that "139" is not interpreted as an ordinal term here, and accordingly the data processing unit 920 reads the data of the contact person whose phone number starts with 139. If the second speech input SP2 is "I want the Mr. Chang in Beijing", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keywords 904′ corresponding to the second speech input SP2 may include "Beijing" and "Chang", and accordingly the data processing unit 920 reads the data of those contact persons with addresses in Beijing. When only one first report answer 906 is left, the data processing unit 920 directly dials the number according to the first report answer 906 (i.e., an application of dialing the phone number associated with this first report answer 906 is performed). Additionally, when more than one selected first report answer 906 is found, those selected first report answers 906 are considered as the second report answers 906′, which are further organized into a second candidate list 908′. The second candidate list 908′ is displayed to the user for further selection.
If the first speech input SP1 is "I want to look for a restaurant", after the first speech input SP1 is recognized and manipulated under the rules of natural language processing, the first keyword 904 corresponding to the first speech input SP1 may include "restaurant". Accordingly, the data processing unit 920 reads all of the first report answers 906 corresponding to "restaurant". Since such an instruction is not clear enough, the first candidate list 908 including all of the first report answers 906 corresponding to "restaurant" is displayed on the display unit 930, awaiting the user's further instruction. After that, if the second speech input SP2 from the user is "the third restaurant" or "I select the third", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keyword 904′ may include "the third", which may be interpreted as an ordinal term. Accordingly, the data processing unit 920 reads the third data in the first candidate list 908 and displays the associated data on the display unit 930. Alternatively, if the second speech input SP2 is "I select the nearest", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keyword 904′ may include "nearest", and accordingly the data processing unit 920 reads the address and relevant data of the restaurant closest to the user. If the second speech input SP2 is "I want a restaurant in Beijing", after the second speech input SP2 is recognized and manipulated under the rules of natural language processing, the second keywords 904′ may include "Beijing" and "restaurant", and accordingly the data processing unit 920 reads the data of the restaurants with addresses in Beijing. When only one selected first report answer 906 is left, the data processing unit 920 directly displays the selected data (e.g., related information of this only restaurant) through the display unit 930. When there is more than one selected first report answer 906, those selected first report answers 906 are considered as the second report answers 906′, which are further organized into a second candidate list 908′ displayed to the user for further selection.
In view of the foregoing, the data processing unit 920 may perform an application according to the selected first report answer 906 (or the selected second report answer 906′). For instance, if the application data type associated with the selected first report answer 906 relates to music, the data processing unit 920 plays the associated music file according to the selected data; if the application data type relates to films, the data processing unit 920 plays the associated film file according to the selected data; if the application data type relates to webpages, the data processing unit 920 displays the associated webpage on the display unit 930 according to the selected data; if the application data type relates to pictures, the data processing unit 920 displays the associated picture on the display unit 930 according to the selected data; and if the application data type relates to business cards, the data processing unit 920 dials the corresponding phone number according to the selected data.
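A minimal sketch of this type-based dispatch follows. The handler strings merely stand in for real actions (playing a file, opening a webpage, dialing a number) and are illustrative assumptions, not the patent's implementation:

```python
# Hypothetical dispatch table: the operation performed depends on the
# file data type of the selected report answer, mirroring the five
# cases (music, film, webpage, picture, business card) named above.

def dispatch(report_answer):
    handlers = {
        "music": lambda d: f"play music file: {d}",
        "film": lambda d: f"play film file: {d}",
        "webpage": lambda d: f"display webpage: {d}",
        "picture": lambda d: f"display picture: {d}",
        "business card": lambda d: f"dial number: {d}",
    }
    return handlers[report_answer["type"]](report_answer["data"])

# Example: a selected business-card answer triggers a dial operation.
action = dispatch({"type": "business card", "data": "139-0000-0000"})
```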
FIG. 10 is a schematic diagram illustrating an information system according to an embodiment of the invention. With reference to FIGS. 9 and 10, in the present embodiment, the information system 1000 includes a mobile terminal apparatus 1010 and a server 1020, wherein the server 1020 may be a cloud server, a LAN server, or any other similar device, which should however not be construed as a limitation to the invention. The mobile terminal apparatus 1010 includes a speech receiving unit 1011, a data processing unit 1013, and a display unit 1015. The data processing unit 1013 is coupled to the speech receiving unit 1011, the display unit 1015, and the server 1020. The mobile terminal apparatus 1010 may be a cell phone, a PDA phone, a smart phone, or any other mobile communication apparatus, which should neither be construed as a limitation to the invention. The functions of the speech receiving unit 1011 are similar to those of the speech receiving unit 910, and the functions of the display unit 1015 are similar to those of the display unit 930. The server 1020 is configured to store a plurality of data, and the server has a speech recognition function.
In the present embodiment, the data processing unit 1013 recognizes the first speech input SP1 through the server 1020 to generate the first request information 902. A natural language processing process is performed on the first request information 902 to generate a first keyword 904 corresponding to the first speech input SP1. According to the first keyword 904, the server 1020 conducts a full-text search in the structured database 220 to find a first report answer 906 and then delivers the first report answer 906 to the data processing unit 1013. When the number of first report answers 906 is 1, the data processing unit 1013 may directly perform an application with the associated file data indicated by the first report answer 906. When the number of first report answers 906 is larger than 1, the data processing unit 1013 organizes the first report answers 906 into the first candidate list 908 and instructs the display unit 1015 to display the first candidate list 908 to the user for his/her further instruction. If the user further inputs an instruction, the data processing unit 1013 recognizes the second speech input SP2 through the server 1020 to generate second request information 902′. The second request information 902′ is parsed, and then a natural language processing process is performed on it to generate a second keyword 904′ corresponding to the second speech input SP2. According to the second keyword 904′ derived from the second speech input SP2, the server 1020 selects one or more first report answers 906 from the first candidate list 908 as the second report answer(s) 906′ and transmits the second report answer(s) 906′ to the data processing unit 1013. Similarly, when there is only one second report answer 906′ left, the data processing unit 1013 may directly perform an operation with the file data associated with the second report answer 906′.
Additionally, when the number of second report answers 906′ is larger than 1, the data processing unit 1013 arranges the second report answers 906′ into a second candidate list 908′ and controls the display unit 1015 to display the second candidate list 908′ to the user for his/her further selection. The server 1020 then selects report answers according to the user's next speech input, and the data processing unit 1013 performs the corresponding operation according to the number of the subsequently selected data. These functions have been described above and thus will not be further explained hereinafter.
In an embodiment, if only one first report answer 906 is selected according to the first keyword 904, an operation (together with the associated file data) corresponding to the selected data may be directly performed. Besides, in another embodiment, a hint may be output to inform the user that the operation corresponding to the selected first report answer 906 is performed. In yet another embodiment, if only one second report answer 906′ is selected according to the second keyword 904′, an operation (also together with the associated file data) corresponding to the selected data may be directly performed. In yet another embodiment, a hint may also be output to inform the user that the operation corresponding to the selected second report answer 906′ is performed. This should not be construed as a limitation to the invention.
To be specific, the server 1020 compares each record 302 in the structured database 220 with the first keyword 904 corresponding to the first speech input SP1. When one of the records 302 in the structured database 220 at least partially matches the first keyword 904, the record 302 is considered as the matched result corresponding to the first speech input SP1, and the record 302 then serves as one of the first report answers 906. If more than one first report answer 906 is selected according to the first keyword 904, the user may further input his/her instruction by way of the second speech input SP2. The user's instruction input by way of the second speech input SP2 may include an order (indicating the position of the data as displayed). The user may also directly select one of the displayed data (e.g., the user may directly indicate the content of certain information). Alternatively, the user's intention may be determined according to the user's instruction (e.g., if the user selects the nearest restaurant, the "nearest" restaurant is displayed to the user). Afterwards, the server 1020 determines whether the second keyword 904′ corresponding to the second speech input SP2 includes an ordinal term indicating an order. If the second keyword 904′ corresponding to the second speech input SP2 includes an ordinal term, the server 1020 selects the first report answer 906 at the corresponding position in the first candidate list 908 according to the ordinal term.
By contrast, if the second keyword 904′ corresponding to the second speech input SP2 does not include any ordinal term, the server 1020 compares the second keyword 904′ corresponding to the second speech input SP2 with each first report answer 906 in the first candidate list 908 to decide a match degree between each of the first report answers 906 and the second speech input SP2, and then the server 1020 determines which of the first report answers 906 in the first candidate list 908 corresponds to the second speech input SP2 according to those match degrees. In an embodiment of the invention, the server 1020 may determine whether any of the first report answers 906 in the first candidate list 908 corresponds to the second speech input SP2 according to the match degree between each of the first report answers 906 and the second keyword 904′, thereby simplifying the selection process. The server 1020 may select the first report answer 906 having the highest match degree with the second speech input SP2 as the corresponding one.
FIG. 11 is a flowchart illustrating a selection method based on speech recognition according to an embodiment of the invention. With reference to FIG. 11, a first speech input SP1 is received in step S1100, and the first speech input SP1 is recognized to generate the first request information 902 in step S1110. In step S1120, the first request information 902 is parsed and a natural language processing process is performed thereon so as to generate a first keyword 904 corresponding to the first speech input SP1. At least one first report answer 906 corresponding to the first keyword 904 is selected from a plurality of data (step S1130), and the server 1020 determines whether there is only one first report answer 906 left (step S1140). If there is only one selected first report answer 906, i.e., the determination result in step S1140 is "yes," an operation/application is performed on the file data indicated by the first report answer 906 (step S1150). If the number of selected first report answers 906 is larger than one, i.e., the determination result in step S1140 is "no," a first candidate list 908 is displayed according to the selected first report answers 906, and then a second speech input SP2 is received (step S1160). The second speech input SP2 is recognized to generate second request information 902′ (step S1170), and the second request information 902′ is parsed and a natural language processing process is performed thereon to generate a second keyword 904′ corresponding to the second speech input SP2 (step S1180). In step S1190, the corresponding report answer(s) is(are) selected from the first report answers 906 in the first candidate list 908 according to the second request information 902′. After that, the process goes back to step S1140 to determine again whether only one first report answer 906 is selected. The order of performing the steps in the selection method is merely exemplary and should not be construed as a limitation to the invention. The details of these steps may refer to those described in the embodiments shown in FIGS. 9 and 10 and thus will not be further explained hereinafter.
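The recognize-select-narrow loop of FIG. 11 can be sketched as follows. The `recognize` and `extract_keywords` callables stand in for the speech-recognition and natural-language-processing stages (steps S1110-S1120 and S1170-S1180) and are assumptions made for illustration:

```python
# A rough sketch of the selection loop of FIG. 11 (steps S1100-S1190):
# recognize a speech input, select report answers by keyword, then either
# act on a single remaining answer or display a candidate list and narrow
# it with the next input.

def selection_loop(speech_inputs, data, recognize, extract_keywords):
    # S1100-S1130: recognize the first input and select matching answers.
    keywords = extract_keywords(recognize(speech_inputs[0]))
    candidates = [d for d in data if any(kw in d for kw in keywords)]
    for next_input in speech_inputs[1:]:
        if len(candidates) == 1:  # S1140 "yes" -> S1150: act on the answer.
            return candidates[0]
        # S1160-S1190: display the candidate list (omitted here), then
        # narrow it according to the next speech input.
        keywords = extract_keywords(recognize(next_input))
        candidates = [c for c in candidates if any(kw in c for kw in keywords)]
    return candidates[0] if len(candidates) == 1 else candidates

# Example with trivial stand-ins for recognition and keyword extraction.
records = ["weather Shanghai today", "weather Beijing today"]
chosen = selection_loop(["weather today", "Beijing"], records,
                        recognize=lambda s: s,
                        extract_keywords=lambda s: s.split())
```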
As discussed above, in the selection method based on speech recognition, the mobile terminal apparatus, and the information system, the first speech input and the second speech input are recognized, and a natural language processing process is then performed thereon, so as to obtain the keywords corresponding to the first and second speech inputs. A selection among the report answers is then made according to the keywords derived from the first and second speech inputs, so as to facilitate the user's convenience of operation.
An embodiment which applies the structure and the components of the natural language comprehension system 100 and the structured database 220 with an activation assisting apparatus will be given below.
FIG. 12 is a block diagram illustrating a speech control system according to an embodiment of the invention. With reference to FIG. 12, the speech control system 1200 includes an activation assisting apparatus 1210, a mobile terminal apparatus 1220, and a server 1230. In the present embodiment, the activation assisting apparatus 1210 activates a speech system of the mobile terminal apparatus 1220 through a wireless transmission signal, such that the mobile terminal apparatus 1220 may communicate with the server 1230 according to a speech signal.
Specifically, the activation assisting apparatus 1210 includes a first wireless transmission module 1212 and a triggering module 1214 coupled to the first wireless transmission module 1212. The first wireless transmission module 1212 may be a device supporting a wireless communication protocol, such as wireless fidelity (Wi-Fi), worldwide interoperability for microwave access (WiMAX), Bluetooth, ultra-wideband (UWB), or radio-frequency identification (RFID), and the first wireless transmission module 1212 is capable of transmitting a wireless transmission signal, so as to correspond to and establish a wireless connection with another wireless transmission module. The triggering module 1214 is, for instance, a button or a key. In the present embodiment, when the triggering module 1214 is pressed by a user and generates a triggering signal, the first wireless transmission module 1212 receives the triggering signal and is thereby activated. At this time, the first wireless transmission module 1212 generates the wireless transmission signal and transmits the wireless transmission signal to the mobile terminal apparatus 1220. According to an embodiment of the invention, the activation assisting apparatus 1210 may be a Bluetooth headset.
Although some existing hands-free headsets/microphones have features designed for activating the mobile terminal apparatus 1220, the activation assisting apparatus 1210 described in another embodiment of the invention may be different from the above-mentioned hands-free headsets/microphones. Specifically, unlike the headsets/microphones on the mobile terminal apparatus 1220, the existing hands-free headsets/microphones are connected to the mobile terminal apparatus for performing the reception and communication functions, and the activation function is merely auxiliary; however, the activation assisting apparatus 1210 described herein is "only" configured to activate the speech system of the mobile terminal apparatus 1220 and does not have the reception and communication functions. Hence, the interior circuit design of the activation assisting apparatus 1210 may be simplified, and the costs of the activation assisting apparatus 1210 may also be reduced. In other words, compared to the above hands-free headsets/microphones, the activation assisting apparatus 1210 is an independent apparatus, i.e., the user may simultaneously have both the hands-free headsets/microphones and the activation assisting apparatus 1210 described herein.
In addition, the activation assisting apparatus 1210 may be made in the form of portable objects that are readily available to the user, e.g., a ring, a watch, a pair of earrings, a necklace, a pair of glasses, or other accessories; alternatively, the activation assisting apparatus 1210 may be made in the form of installation components, e.g., vehicle accessories configured on the steering wheel. The invention is not limited thereto. That is, the activation assisting apparatus 1210 is an apparatus that "goes into our lives," and the interior system design of the activation assisting apparatus 1210 allows the user to easily touch the triggering module 1214, so as to activate the speech system. For instance, when the activation assisting apparatus 1210 is in the form of a ring, the user may easily trigger the triggering module 1214 by moving his/her finger to press the ring. On the other hand, when the activation assisting apparatus 1210 is an accessory in a car, the user may also easily trigger the triggering module 1214 while he or she is driving. In addition, wearing headsets/microphones may cause discomfort. However, the activation assisting apparatus 1210 described herein is capable of activating the speech system in the mobile terminal apparatus 1220 and even further performing a sound amplifying function (described hereinafter), such that the user can pick up the phone or talk on the phone through the mobile terminal apparatus 1220 without wearing the headsets/microphones. As far as the user is concerned, the activation assisting apparatus 1210 that "goes into our lives" is an accessory to be worn or used, and thus the user does not need to get used to wearing or using the activation assisting apparatus 1210.
For instance, when the user cooks in the kitchen and needs to make a phone call through a mobile phone placed in the living room, if the user wears the activation assisting apparatus 1210 in the form of a ring, a necklace, or a watch, the user may touch the ring, the necklace, or the watch to activate the speech system to ask a friend for the details of a menu. Although some existing headsets/microphones having the activation functions may also complete said task, it is not necessary for the user to call a friend every time he or she cooks, and it is therefore rather inconvenient for the user to constantly wear the headsets/microphones during cooking for fear of not being able to control the mobile terminal apparatus when necessary.
In another embodiment, the activation assisting apparatus 1210 may also be equipped with a wireless charge battery 1216 for driving the first wireless transmission module 1212. More specifically, the wireless charge battery 1216 includes a battery unit 12162 and a wireless charge module 12164 that is coupled to the battery unit 12162. Here, the wireless charge module 12164 is capable of receiving energy from a wireless power supply apparatus (not shown) and converting the energy into electricity to charge the battery unit 12162. As a result, the first wireless transmission module 1212 of the activation assisting apparatus 1210 may be conveniently powered by the wireless charge battery 1216.
On the other hand, the mobile terminal apparatus 1220 is, for instance, a cell phone, a PDA phone, a smart phone, a pocket PC with communication software, a tablet PC with communication software, or a notebook computer with communication software. In brief, the mobile terminal apparatus 1220 may be any portable mobile apparatus capable of performing communication functions, and the type of the mobile terminal apparatus 1220 is not limited in the invention. Besides, said electronic apparatuses may be operated by an Android operating system, a Microsoft operating system, a Linux operating system, and so forth, which should not be construed as a limitation to the invention.
The mobile terminal apparatus 1220 includes a second wireless transmission module 1222. The second wireless transmission module 1222 matches the first wireless transmission module 1212 in the activation assisting apparatus 1210 and is subject to the corresponding wireless communication protocol, such as Wi-Fi, WiMAX, Bluetooth, UWB, or RFID, so as to establish a wireless connection with the first wireless transmission module 1212. It should be mentioned that the "first" wireless transmission module 1212 and the "second" wireless transmission module 1222 indicate that these wireless transmission modules are configured in different apparatuses, respectively, and the terms "first" and "second" should not be construed as limitations to the invention.
In another embodiment, the mobile terminal apparatus 1220 further includes a speech system 1221. The speech system 1221 is coupled to the second wireless transmission module 1222; therefore, after the user triggers the triggering module 1214 in the activation assisting apparatus 1210, the speech system 1221 may be activated in a wireless manner through the first wireless transmission module 1212 and the second wireless transmission module 1222. In an embodiment of the invention, the speech system 1221 may include a speech sampling module 1224, a speech synthesis module 1226, and a speech output interface 1227. The speech sampling module 1224 is configured to receive speech signals from the user. Here, the speech sampling module 1224 is, for instance, a microphone or another device that receives audio signals. The speech synthesis module 1226 may conduct a search in a speech synthesis database that records texts and corresponding speech information, for instance, such that the speech synthesis module 1226 is able to find the speech corresponding to certain text information and thereby create a synthesized speech based on the text information. The synthesized speech may then be output by the speech synthesis module 1226 through the speech output interface 1227 and broadcast to the user. The speech output interface 1227 is, for instance, a speaker or a headset.
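By way of a non-limiting illustration, the lookup-and-concatenate behavior of a speech synthesis database may be sketched as follows. The tiny in-memory "database" and all names below are hypothetical stand-ins, not part of the described embodiment.

```python
# Minimal sketch of a speech synthesis lookup: text units are mapped
# to stored speech data, which is concatenated into a synthesized
# speech. SPEECH_DB and its byte values are made-up placeholders.
SPEECH_DB = {
    "30": b"\x01\x02",       # stand-in bytes for the audio of "30"
    "degrees": b"\x03\x04",  # stand-in bytes for the audio of "degrees"
}

def synthesize(text: str) -> bytes:
    """Look up each text unit and concatenate the matching speech data."""
    chunks = []
    for word in text.split():
        if word not in SPEECH_DB:
            raise KeyError(f"no speech entry for {word!r}")
        chunks.append(SPEECH_DB[word])
    return b"".join(chunks)
```

A real speech synthesis module would of course operate on phonetic units and recorded or generated audio rather than on whole words and placeholder bytes.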
The mobile terminal apparatus 1220 may further include a communication module 1228. The communication module 1228 is, for instance, a device (e.g., a radio-frequency transceiver) that can transmit and receive wireless signals. To be specific, the communication module 1228 allows the user to receive or make a phone call or enjoy other services provided by telecommunication service providers via the mobile terminal apparatus 1220. According to the present embodiment, the communication module 1228 may receive response information from the server 1230 through the Internet and establish a communication connection between the mobile terminal apparatus 1220 and at least one electronic apparatus according to the response information. Here, the electronic apparatus is, for instance, another mobile terminal apparatus (not shown).
The server 1230 is, for instance, a network server or a cloud server, and the server 1230 has a speech comprehension module 1232. In the present embodiment, the speech comprehension module 1232 includes a speech recognition module 12322 and a speech processing module 12324 coupled to the speech recognition module 12322. The speech recognition module 12322 receives the speech signal transmitted from the speech sampling module 1224 and converts the speech signal into a plurality of semantic segments (e.g., keywords or phrases). The speech processing module 12324 may parse the semantic segments to learn their meanings (e.g., intentions, time, places, and so forth) and further determine the meaning of the speech signal. In addition, the speech processing module 12324 may generate the corresponding response information according to the result of parsing the semantic segments. According to the present embodiment, the speech comprehension module 1232 may be implemented by hardware circuits constituted by one or several logic gates or by computer programming codes. Note that in another embodiment the speech comprehension module 1232 may be configured in the mobile terminal apparatus 1320, as in the speech control system 1300 shown in FIG. 13. For the operations of the speech comprehension module 1232 in the server 1230, reference may be made to those of the natural language comprehension system 100 shown in FIG. 1A and those of the natural language dialogue system 500/700/700′ shown in FIG. 5A/7A/7B.
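The two-stage comprehension flow (recognition into semantic segments, then processing into an intention) may be illustrated by the following non-limiting sketch; every function name and keyword rule here is hypothetical and greatly simplified relative to the modules described above.

```python
# Hypothetical sketch of the two-stage comprehension flow: a
# recognition stage yielding semantic segments, and a processing
# stage mapping those segments to an intention.
def recognize(utterance: str) -> list:
    """Stand-in for the recognition stage: split the utterance
    into semantic segments (here, simply lowercase keywords)."""
    return utterance.lower().strip("?").split()

def parse_intent(segments: list) -> dict:
    """Stand-in for the processing stage: derive an intention
    from the semantic segments via toy keyword rules."""
    if "temperature" in segments or "weather" in segments:
        return {"intent": "query_weather"}
    if "call" in segments:
        # simplification: treat the last segment as the contact name
        return {"intent": "dial", "contact": segments[-1]}
    return {"intent": "unknown"}
```

A practical speech processing module would use statistical or grammar-based parsing rather than keyword matching, but the division of labor between the two stages is the same.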
A speech control method is described hereinafter with reference to the above-mentioned speech control system 1200. FIG. 14 is a block diagram illustrating a speech control method according to an embodiment of the invention. With reference to FIG. 12 and FIG. 14, in step S1402, the activation assisting apparatus 1210 transmits a wireless transmission signal to the mobile terminal apparatus 1220. Specifically, when the first wireless transmission module 1212 of the activation assisting apparatus 1210 receives a triggering signal and is accordingly triggered, the activation assisting apparatus 1210 transmits the wireless transmission signal to the mobile terminal apparatus 1220. To be more specific, when the triggering module 1214 of the activation assisting apparatus 1210 is pressed by the user, the triggering module 1214 is triggered and generates the triggering signal, such that the first wireless transmission module 1212 transmits the wireless transmission signal to the second wireless transmission module 1222 in the mobile terminal apparatus 1220, and the first wireless transmission module 1212 can be connected to the second wireless transmission module 1222 through the wireless communication protocol. The activation assisting apparatus 1210 merely serves to activate the speech system in the mobile terminal apparatus 1220 and does not itself have reception and communication functions; therefore, the interior circuit design of the activation assisting apparatus 1210 may be simplified, and the costs of the activation assisting apparatus 1210 may also be reduced. In other words, compared to the hands-free headsets/microphones attached to a normal mobile terminal apparatus, the activation assisting apparatus 1210 is an independent apparatus, i.e., the user may simultaneously have both hands-free headsets/microphones and the activation assisting apparatus 1210 described herein.
Note that the activation assisting apparatus 1210 may be made in the form of portable objects that are readily available to the user, e.g., a ring, a watch, a pair of earrings, a necklace, a pair of glasses, or other accessories; alternatively, the activation assisting apparatus 1210 may be made in the form of installation components, e.g., vehicle accessories configured on the steering wheel. The invention is not limited thereto. That is, the activation assisting apparatus 1210 is an apparatus that "goes into our lives," and the interior system design of the activation assisting apparatus 1210 allows the user to easily touch the triggering module 1214, so as to activate the speech system 1221. Accordingly, the activation assisting apparatus 1210 described herein is capable of activating the speech system 1221 in the mobile terminal apparatus 1220 and even further performing a sound amplifying function (described hereinafter), such that the user can pick up the phone or talk on the phone through the mobile terminal apparatus 1220 without wearing headsets/microphones. As far as the user is concerned, the activation assisting apparatus 1210 that "goes into our lives" is an accessory to be worn or used, and thus the user does not need to get used to wearing or using the activation assisting apparatus 1210.
Both the first wireless transmission module 1212 and the second wireless transmission module 1222 may run in a sleep mode or a working mode. In the sleep mode, the wireless transmission modules are in a turned-off state, i.e., the wireless transmission modules neither receive nor detect the wireless transmission signal and thus are not able to be connected to other wireless transmission modules. In the working mode, the wireless transmission modules are in a turned-on state, i.e., the wireless transmission modules continuously detect the wireless transmission signal or may transmit the wireless transmission signal at any time, and thus the wireless transmission modules are able to connect to other wireless transmission modules. If the triggering module 1214 is triggered when the first wireless transmission module 1212 runs in the sleep mode, the triggering module 1214 wakes up the first wireless transmission module 1212, so that the first wireless transmission module 1212 enters the working mode and transmits the wireless transmission signal to the second wireless transmission module 1222. Thereby, the first wireless transmission module 1212 is connected to the second wireless transmission module 1222 of the mobile terminal apparatus 1220 according to the wireless communication protocol.
On the other hand, in order to prevent the excessive power consumption that would be caused by keeping the first wireless transmission module 1212 running in the working mode, if the triggering module 1214 is not triggered again within a predetermined time (e.g., 5 minutes) after the first wireless transmission module 1212 enters the working mode, the first wireless transmission module 1212 returns to the sleep mode, and the connection between the first wireless transmission module 1212 and the second wireless transmission module 1222 of the mobile terminal apparatus 1220 is terminated.
In step S1404, the second wireless transmission module 1222 of the mobile terminal apparatus 1220 receives the wireless transmission signal to activate the speech system 1221. In step S1406, when the second wireless transmission module 1222 detects the wireless transmission signal, the mobile terminal apparatus 1220 activates the speech system 1221, and the speech sampling module 1224 in the speech system 1221 starts to receive the speech signal, such as "what is the temperature today," "make a phone call to Mr. Wang," "please search for a phone number," etc.
In step S1408, the speech sampling module 1224 transmits the speech signal to the speech comprehension module 1232 in the server 1230, which parses the speech signal and generates the response information. Particularly, the speech recognition module 12322 in the speech comprehension module 1232 receives the speech signal from the speech sampling module 1224 and divides the speech signal into several semantic segments. The speech processing module 12324 then proceeds to determine the meanings of the semantic segments, so as to generate the response information corresponding to the speech signal.
In another embodiment, the mobile terminal apparatus 1220 may further receive the response information generated by the speech processing module 12324 and output the contents of the response information through the speech output interface 1227 or execute the commands carried by the response information. In step S1410, the speech synthesis module 1226 of the mobile terminal apparatus 1220 receives the response information generated by the speech comprehension module 1232 and conducts speech synthesis according to the contents of the response information (e.g., words or phrases) to generate a corresponding speech response. In step S1412, the speech output interface 1227 receives and outputs the speech response.
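Steps S1402 through S1412 may be chained into a single non-limiting walk-through as follows. The `comprehend` function and every other name here are illustrative stand-ins for the server-side speech comprehension module, not the described implementation.

```python
# Hypothetical end-to-end walk-through of steps S1402 through S1412.
def comprehend(speech_signal: str) -> str:
    """Stand-in for S1408: parse the speech signal on the server
    side and produce response information (a toy keyword rule)."""
    if "temperature" in speech_signal:
        return "30 degrees C"
    return "unrecognized request"

def speech_control(user_speech: str) -> str:
    # S1402/S1404: the trigger sends a wireless transmission signal
    # that activates the speech system (modeled as a simple flag).
    speech_system_active = True
    if not speech_system_active:
        return ""
    # S1406: the speech sampling module receives the speech signal.
    speech_signal = user_speech
    # S1408: the signal is parsed into response information.
    response_info = comprehend(speech_signal)
    # S1410/S1412: the response is synthesized and output; the
    # string prefix stands in for the synthesized speech response.
    return f"[synthesized speech] {response_info}"
```

The sketch flattens the wireless handshake and the network round trip into plain function calls; it is meant only to show the ordering of the steps.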
For instance, when the user presses the triggering module 1214 of the activation assisting apparatus 1210, the first wireless transmission module 1212 transmits the wireless transmission signal to the second wireless transmission module 1222, such that the mobile terminal apparatus 1220 activates the speech sampling module 1224 in the speech system 1221. Here, the speech signal from the user is assumed to be an interrogative sentence, e.g., "what is the temperature today?", and the speech sampling module 1224 receives and transmits the to-be-parsed speech signal to the speech comprehension module 1232 in the server 1230. After the speech signal is parsed, the speech comprehension module 1232 transmits response information corresponding to the parsed speech signal back to the mobile terminal apparatus 1220. If the response information generated by the speech comprehension module 1232 indicates "30° C.", the speech synthesis module 1226 converts the information "30° C." into a synthesized speech response, and the speech output interface 1227 broadcasts the synthesized speech response to the user.
In another embodiment of the invention, the speech signal from the user is assumed to be an imperative sentence, e.g., "make a phone call to Mr. Wang," and the speech comprehension module 1232 may recognize this imperative sentence as "a request for making a phone call to Mr. Wang." The speech comprehension module 1232 may further generate new response information, e.g., "please confirm whether to call Mr. Wang or not," and the speech comprehension module 1232 transmits this new response information to the mobile terminal apparatus 1220. Here, the speech synthesis module 1226 may convert the new response information into a synthesized speech response and output the synthesized speech response to the user through the speech output interface 1227. More specifically, if the response from the user is affirmative (e.g., "yes"), the speech sampling module 1224 may receive and transmit the speech signal (e.g., "yes") to the server 1230, such that the speech comprehension module 1232 may parse the speech signal. After the speech comprehension module 1232 completes the parsing process on the speech signal, the speech comprehension module 1232 may generate response information containing dial command information and transmit the response information to the mobile terminal apparatus 1220. At this time, the communication module 1228 may search for and find the phone number of "Mr. Wang" according to the contact information stored in a phone number database, so as to establish a communication connection between the mobile terminal apparatus 1220 and another electronic apparatus. That is, the communication module 1228 makes a phone call to "Mr. Wang."
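The confirm-then-dial exchange may be sketched, in a non-limiting manner, as two rounds of interaction. The phone number database entry and all identifiers below are hypothetical placeholders.

```python
# Sketch of the confirm-then-dial exchange: a confirmation prompt,
# followed by handling of the user's affirmative or negative reply.
PHONE_BOOK = {"Mr. Wang": "555-0100"}  # made-up contact entry

def confirmation_prompt(contact: str) -> str:
    """First round: the comprehension module asks for confirmation."""
    return f"please confirm whether to call {contact} or not"

def handle_confirmation(contact: str, reply: str) -> str:
    """Second round: an affirmative reply yields a dial command;
    anything else cancels the call."""
    if reply.strip().lower() != "yes":
        return "call cancelled"
    number = PHONE_BOOK.get(contact)
    if number is None:
        return "contact not found"
    return f"dialing {number}"
```

In the embodiment above, the prompt would be synthesized and broadcast through the speech output interface 1227, and the dial command would be executed by the communication module 1228 rather than returned as a string.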
According to other embodiments of the invention, in addition to the speech control system 1200 described above, the speech control system 1300 or another similar system may be applicable when said speech control method is conducted, and the invention is not limited thereto.
To sum up, in the speech control system and the speech control method described herein, the speech function of the mobile terminal apparatus may be activated by the activation assisting apparatus in a wireless manner. In addition, the activation assisting apparatus may be made in the form of portable objects that are readily available to the user, e.g., a ring, a watch, a pair of earrings, a necklace, a pair of glasses, or other accessories; alternatively, the activation assisting apparatus may be made in the form of installation components, e.g., vehicle accessories configured on the steering wheel. The invention is not limited thereto. Unlike wearing the existing hands-free headsets/microphones, which may cause discomfort, using the activation assisting apparatus 1210 to activate the speech system in the mobile terminal apparatus 1220 is rather convenient.
Note that the server 1230 that includes the speech comprehension module may be a network server or a cloud server, and the cloud server may raise issues regarding the user's privacy. For instance, the user has to upload the complete contact information to the cloud server for making a phone call, sending information, or performing other operations that may require the uploaded contact information. Even though the cloud server employs encrypted connections and no cache file is saved, the user's concerns about security cannot be alleviated. Therefore, another speech control method and a corresponding speech interaction system are provided, so that the mobile terminal apparatus is capable of providing the speech interaction service together with the cloud server on the condition that the complete contact information is not required to be uploaded. In order to make the invention more comprehensible, embodiments are described below as examples to prove that the invention can actually be realized. Although the disclosure has been described with reference to the above embodiments, it will be apparent to one of ordinary skill in the art that modifications to the described embodiments may be made without departing from the spirit of the disclosure. Accordingly, the scope of the disclosure will be defined by the attached claims and not by the above detailed descriptions.