

US9836452B2 - Discriminating ambiguous expressions to enhance user experience - Google Patents



Publication number
US9836452B2
Authority
US
United States
Prior art keywords
dialog
natural language
language expression
domain
responses
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/586,395
Other versions
US20160188565A1 (en)
Inventor
Jean-Philippe Robichaud
Ruhi Sarikaya
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Application filed by Microsoft Technology Licensing LLC
Priority to US14/586,395
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SARIKAYA, RUHI; ROBICHAUD, JEAN-PHILIPPE
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignor: MICROSOFT CORPORATION
Priority to BR112017010222A (BR112017010222A2)
Priority to AU2015374382A (AU2015374382B2)
Priority to RU2017122991A (RU2017122991A)
Priority to MX2017008583A (MX367096B)
Priority to CN201580070449.8A (CN107111611A)
Priority to KR1020177018038A (KR102602475B1)
Priority to PCT/US2015/067238 (WO2016109307A2)
Priority to JP2017535358A (JP6701206B2)
Priority to CA2968016A (CA2968016C)
Priority to EP15821005.4A (EP3241125A2)
Publication of US20160188565A1
Priority to US15/830,767 (US11386268B2)
Publication of US9836452B2
Application granted
Priority to AU2020267218A (AU2020267218B2)
Legal status: Active


Abstract

Methods and systems are provided for discriminating ambiguous expressions to enhance user experience. For example, a natural language expression may be received by a speech recognition component. The natural language expression may include at least one of words, terms, and phrases of text. A dialog hypothesis set may be created from the natural language expression by using contextual information. In some cases, the dialog hypothesis set has at least two dialog hypotheses. A plurality of dialog responses may be generated for the dialog hypothesis set. The dialog hypothesis set may be ranked based on an analysis of the plurality of dialog responses. An action may be performed based on ranking the dialog hypothesis set.

Description

BACKGROUND
Language understanding applications (e.g., digital assistant applications) require at least some contextual language understanding to interpret spoken language input. In this regard, digital assistant applications may be well suited to interpreting spoken language inputs within a specific domain and/or task. For example, a digital assistant application may provide accurate results when interpreting a spoken language input related to a calendar event. However, in scenarios where the digital assistant application does not know how to handle the spoken language input, a backend solution (e.g., the web) may be used to provide a user with results. It may be difficult to determine when to use the digital assistant application and when to use a backend solution for a given spoken language input. In some cases, deterministic hard-coded rules may be used to determine when to use the digital assistant application and when to use a backend solution to fulfill a user's request. The cost of crafting and implementing these rules, as well as evaluating their accuracy, is high. Additionally, hard-coded rules do not scale well for locale expansion (e.g., interpreting new and/or different languages). Furthermore, when it is determined to use a backend solution, the spoken language input is sent to the backend solution “as is” and a result is provided based on the received spoken language input. Consequently, as is commonly known in the community, the hard-coded rules are “coarse-grained” and the overall user experience is suboptimal.
It is with respect to these and other general considerations that embodiments have been made. Also, although relatively specific problems have been discussed, it should be understood that the embodiments should not be limited to solving the specific problems identified in the background.
SUMMARY
In summary, the disclosure generally relates to discriminating ambiguous expressions. More particularly, the disclosure relates to methods and systems for discriminating ambiguous expressions to enhance user experience. For example, a natural language expression may be received by a speech recognition component. The natural language expression may include at least one of words, terms, and phrases of text. A dialog hypothesis set may be created from the natural language expression by using contextual information. In some cases, the dialog hypothesis set has at least two dialog hypotheses. A plurality of dialog responses may be generated for the dialog hypothesis set. The dialog hypothesis set may be ranked based on an analysis of the plurality of dialog responses. An action may be performed based on ranking the dialog hypothesis set.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
Non-limiting and non-exhaustive examples are described with reference to the following Figures.
FIG. 1 illustrates an exemplary dynamic system implemented at a client computing device for discriminating ambiguous expressions, according to an example embodiment.
FIG. 2 illustrates an exemplary dynamic system implemented at a server computing device for discriminating ambiguous expressions, according to an example embodiment.
FIG. 3 illustrates an exemplary block diagram of a dialog component for discriminating ambiguous expressions, according to an example embodiment.
FIG. 4 illustrates an exemplary method for discriminating ambiguous expressions, according to an example embodiment.
FIG. 5 illustrates an exemplary method for training a dialog component to discriminate ambiguous expressions, according to an example embodiment.
FIG. 6 illustrates an exemplary method for discriminating ambiguous expressions, according to an example embodiment.
FIG. 7 is a block diagram illustrating example physical components of a computing device with which embodiments of the disclosure may be practiced.
FIGS. 8A and 8B are simplified block diagrams of a mobile computing device with which embodiments of the present disclosure may be practiced.
FIG. 9 is a simplified block diagram of a distributed computing system in which embodiments of the present disclosure may be practiced.
FIG. 10 illustrates a tablet computing device for executing one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown by way of illustrations specific aspects or examples. These aspects may be combined, other aspects may be utilized, and structural changes may be made without departing from the spirit or scope of the present disclosure. Aspects may be practiced as methods, systems or devices. Accordingly, aspects may take the form of a hardware implementation, an entirely software implementation or an implementation combining software and hardware aspects. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and their equivalents.
The present disclosure generally relates to using supervised and unsupervised machine learning techniques for discriminating ambiguous requests. Existing techniques for discriminating ambiguous requests rely on deterministic hard-coded rules that are costly to craft and implement. For example, pre-determined rules may be written and implemented in current systems, based on data (e.g., spoken language inputs) received by the system over time, to determine how to respond to spoken language inputs. However, it is difficult to discriminate ambiguous requests with good confidence using hard-coded rules because many requests naturally overlap multiple domains. Furthermore, using hard-coded rules to discriminate ambiguous requests may provide a suboptimal user experience. Accordingly, aspects described herein include machine-learning-based techniques for dynamically discriminating ambiguous requests. Such techniques make it possible to determine which user experience best responds to a specific spoken language input (e.g., request) from a user. For example, information from various sources may be used to dynamically convert an ambiguous request into a query that provides relevant results to the user. Doing so may result in a better user experience with the system and/or an application associated with the system (e.g., a digital assistant application). Additionally, discriminating ambiguous requests may reduce the number of clarifying requests and/or responses the system and/or application has to provide. As such, fewer computations may be required by a client and/or server computing device.
With reference to FIG. 1, one aspect of a dynamic system 100 for discriminating ambiguous requests is illustrated. In aspects, the dynamic system 100 may be implemented on a client computing device 104. In a basic configuration, the client computing device 104 is a handheld computer having both input elements and output elements. The client computing device 104 may be any suitable computing device for implementing the dynamic system 100 for contextual language understanding. For example, the client computing device 104 may be at least one of: a mobile telephone; a smart phone; a tablet; a phablet; a smart watch; a wearable computer; a personal computer; a desktop computer; a laptop computer; etc. This list is exemplary only and should not be considered as limiting. Any suitable client computing device for implementing the dynamic system 100 for contextual language understanding may be utilized.
In aspects, the dynamic system 100 may include a speech recognition component 110, a language understanding component 120, a dialog component 130, and a backend engine 140. The various components may be implemented using hardware, software, or a combination of hardware and software. The dynamic system 100 may be configured to process natural language expressions. In this regard, the dynamic system 100 may facilitate discriminating ambiguous requests. In one example, a natural language expression may include phrases, words, and/or terms in the form of a spoken language input (e.g., a user query and/or request). In another example, a natural language expression may include phrases, words, and/or terms in the form of a textual language input (e.g., a user query and/or request). In this regard, the natural language expression may be ambiguous and/or have missing information. For example, the natural language expression “how about tomorrow” is ambiguous when analyzed in isolation, because it does not state what is being asked about tomorrow.
The dynamic system 100 may be configured to process natural language expressions in different scenarios. For example, the dynamic system 100 may process natural language expressions in single-turn scenarios and/or multi-turn scenarios. A single-turn scenario may be a scenario where a spoken language input/natural language expression is processed in isolation during a session between a user and the dynamic system 100. A single-turn scenario may indicate that only information from the currently processed natural language expression is utilized to discriminate ambiguous requests. A multi-turn scenario is a scenario where more than one spoken language input/natural language expression is processed during a session between a user 102 and the dynamic system 100. In some cases, each natural language expression may be interpreted as a turn during a session. A turn may include both the natural language expression and a response/action by the dynamic system 100. That is, a first turn may include both a natural language expression and a response/action by the dynamic system 100. In other aspects, a multi-turn scenario indicates that information from multiple turns of the session may be utilized to make a prediction and/or discriminate ambiguous requests. A session may include a conversation between a user and an application (e.g., a digital assistant application) of the dynamic system 100. The session may start when the application is activated and a user starts speaking, and end when the application is de-activated.
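To make the single-turn/multi-turn distinction concrete, the following Python sketch models a session as a list of turns and returns the turns that may be consulted when interpreting the latest expression. The names `Turn`, `Session`, and `context_for_current_turn` are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    """One turn: the user's expression plus the system's response/action, if any."""
    expression: str
    response: Optional[str] = None


@dataclass
class Session:
    """All turns exchanged between activating and de-activating the application."""
    turns: List[Turn] = field(default_factory=list)

    def add_turn(self, expression: str, response: Optional[str] = None) -> Turn:
        turn = Turn(expression, response)
        self.turns.append(turn)
        return turn

    def context_for_current_turn(self, multi_turn: bool) -> List[Turn]:
        """Turns that may be consulted when interpreting the latest expression.

        Single-turn: only the current (last) turn is used.
        Multi-turn: every turn of the session so far is available.
        """
        if not self.turns:
            return []
        return list(self.turns) if multi_turn else [self.turns[-1]]


session = Session()
session.add_turn("how is the weather tomorrow")
session.add_turn("how about this weekend")
single = [t.expression for t in session.context_for_current_turn(multi_turn=False)]
multi = [t.expression for t in session.context_for_current_turn(multi_turn=True)]
```

In single-turn mode only “how about this weekend” is visible, which is exactly why that expression is ambiguous on its own; in multi-turn mode the earlier weather request is available as context.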
As discussed above, the dynamic system 100 may include a speech recognition component 110, a language understanding component 120, a dialog component 130, and a backend engine 140. In aspects, the speech recognition component 110 may include standard speech recognition techniques known to those skilled in the art such as “automatic speech recognition” (ASR), “computer speech recognition”, and “speech to text” (STT). In some cases, the speech recognition component 110 may include standard text to speech techniques known to those skilled in the art such as “text to speech” (TTS). One skilled in the art would recognize that the speech recognition component 110 may include one or more different types of speech recognition and/or text recognition components. In some cases, the speech recognition component 110 is configured to receive a natural language expression and output a plurality of n-best representations of the received natural language expression. For example, the speech recognition component 110 may receive the natural language expression “is the five twenty on time,” and output a first representation including “is the five twenty on time,” and a second representation including “is BE five twenty on time.” In this regard, there may be ambiguity regarding whether the natural language expression refers to a public transport service, for example, or to a flight “BE520.” The n-best representations may be generated using a single ASR, STT, or TTS, or using multiple ASRs, STTs, or TTSs. The n-best representations of the natural language expression may be further processed to discriminate the ambiguity in the representations of the natural language expression, which is discussed in detail below.
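The n-best output can be pictured as a scored list of alternative transcriptions. The sketch below, a hypothetical illustration, keeps every representation whose recognizer confidence lies within a fixed margin of the best one, so that genuinely competing readings such as “is the five twenty on time” and “is BE five twenty on time” are both passed downstream. The `Recognition` type, the confidence values, and the `margin` threshold are assumptions; the disclosure does not specify how n-best representations are scored or pruned.

```python
from typing import List, NamedTuple


class Recognition(NamedTuple):
    text: str
    confidence: float  # hypothetical recognizer score in [0, 1]


def competing_representations(n_best: List[Recognition], margin: float = 0.1) -> List[str]:
    """Return the n-best texts whose confidence is within `margin` of the best.

    More than one survivor means the recognizer could not settle the reading,
    so the ambiguity is passed on to later components instead of being dropped.
    """
    if not n_best:
        return []
    ranked = sorted(n_best, key=lambda r: r.confidence, reverse=True)
    best = ranked[0].confidence
    return [r.text for r in ranked if best - r.confidence <= margin]


n_best = [
    Recognition("is the five twenty on time", 0.48),  # illustrative scores only
    Recognition("is BE five twenty on time", 0.44),
    Recognition("is the five plenty on time", 0.08),
]
survivors = competing_representations(n_best)
```

Here the two plausible readings survive and the clearly weaker one is pruned, leaving the language understanding and dialog components to decide between a transit schedule and flight BE520.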
In aspects, the language understanding component 120 may include standard spoken language understanding models such as support vector machines, conditional random fields, and/or convolutional non-recurrent neural networks for training purposes. One skilled in the art would recognize that various standard language understanding models, such as support vector machines, conditional random fields, and convolutional neural networks, can be employed by the different aspects disclosed herein. In this regard, the language understanding component 120 may be configured to receive n-best representations from the speech recognition component 110 and make predictions based on them. For example, the language understanding component 120 may perform domain and intent prediction (e.g., using the support vector machines) and slot tagging (e.g., using conditional random fields). In one aspect, domain prediction may include classifying the natural language expression into a supported domain of the language understanding component 120. Domain may refer to generally known topics such as places, reminder, calendar, weather, communication, and the like. For example, in the natural language expression “show me driving directions to Portland,” the language understanding component 120 may extract the feature “Portland” and classify the natural language expression into the supported domain “Places” of the language understanding component 120.
In one aspect, intent prediction may include determining the intent of the user 102 from the natural language expression. For example, in the natural language expression “show me driving directions to Portland,” the language understanding component 120 may determine that the intent of the user 102 is an intent classification such as “get_route.” In one aspect, slot tagging may include performing slot detection on the natural language expression. In one case, slot detection may include filling slot types (e.g., slot types supported by the language understanding component 120) with semantically loaded words from the natural language expression. For example, in the natural language expression “from 2 pm to 4 pm,” slot tagging may include filling the slot type “start_time” with “2 pm” and the slot type “end_time” with “4 pm.”
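Slot taggers such as conditional random fields typically emit one label per token, which must then be collapsed into slot values. The following sketch assumes the common BIO labeling scheme (B- begins a slot, I- continues it, O is outside any slot); that scheme and the function name `bio_to_slots` are assumptions, since the disclosure does not state how tagger output is encoded.

```python
from typing import Dict, List, Optional


def bio_to_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Collapse per-token BIO tags into {slot_type: value}.

    A later B- tag for the same slot type overwrites the earlier value; an
    I- tag that does not continue the current slot is dropped and ends it.
    """
    slots: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = [token]
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current].append(token)
        else:
            current = None
    return {slot: " ".join(words) for slot, words in slots.items()}


tokens = ["from", "2", "pm", "to", "4", "pm"]
tags = ["O", "B-start_time", "I-start_time", "O", "B-end_time", "I-end_time"]
slots = bio_to_slots(tokens, tags)
```

For the tags shown, the result is `{"start_time": "2 pm", "end_time": "4 pm"}`, matching the slot filling described above.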
As discussed above, the dynamic system 100 may process the natural language expression in a variety of scenarios, including both single-turn and multi-turn scenarios. In this regard, the language understanding component 120 may evaluate the natural language expression using information from the currently processed natural language expression together with contextual information associated with it. Contextual information may include information extracted from each turn in a session. For example, the information extracted may include the domain prediction, intent prediction, and slot types predicted (e.g., the results) from a previous turn (e.g., a previous natural language expression/request from the current session). In another case, the contextual information may include the response to a previous turn by the dynamic system 100. For example, the response to a previous turn may include how the dynamic system 100 responded to the previous request from a user (e.g., what the dynamic system output/said to the user), items located on a display of the client computing device 104, text located on the display of the client computing device 104, and the like. In another case, the contextual information may include client context. For example, client context may include a contact list on the client computing device 104, a calendar on the client computing device 104, GPS information (e.g., a location of the client computing device 104), the current time (e.g., morning, night, in a meeting, in a workout, driving, etc.), and the like. In another case, the contextual information may include knowledge content. For example, knowledge content may include a knowledge database that maps features from the natural language expression to stored data. As an example, “John Howie,” which is the name of a restaurant in Bellevue, may be mapped to a restaurant in the knowledge database. In yet another case, the contextual information includes any combination of the above-discussed contextual information.
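As a hypothetical illustration of how these kinds of contextual information might be handed to a classifier, the sketch below flattens them into string-valued features. The field names, the feature keys, and the naive substring lookup against the knowledge content are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ContextualInformation:
    # Results of the previous turn in the session.
    previous_domain: Optional[str] = None
    previous_intent: Optional[str] = None
    # What the system said or displayed in response to the previous turn.
    previous_response: Optional[str] = None
    # Client context, e.g. {"time": "morning"} or a coarse location.
    client_context: Dict[str, str] = field(default_factory=dict)
    # Knowledge content: surface form -> stored entity type.
    knowledge: Dict[str, str] = field(default_factory=dict)

    def features(self, expression: str) -> Dict[str, str]:
        """Flatten the available context into string features for a classifier."""
        feats: Dict[str, str] = {}
        if self.previous_domain:
            feats["prev_domain"] = self.previous_domain
        if self.previous_intent:
            feats["prev_intent"] = self.previous_intent
        if self.previous_response:
            feats["prev_response"] = self.previous_response
        for key, value in self.client_context.items():
            feats["client:" + key] = value
        # Naive case-insensitive substring match against the knowledge content.
        lowered = expression.lower()
        for surface, entity_type in self.knowledge.items():
            if surface.lower() in lowered:
                feats["kb:" + surface] = entity_type
        return feats


ctx = ContextualInformation(
    previous_domain="Weather",
    client_context={"time": "morning"},
    knowledge={"John Howie": "restaurant"},
)
feats = ctx.features("book a table at john howie")
```

A production system would more likely use a proper entity linker than substring matching; the point here is only that each category of context becomes an additional model input.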
In aspects, the language understanding component 120 may perform domain and intent prediction (e.g., using the support vector machines) and slot tagging (e.g., using conditional random fields) using the contextual information described above. For example, a first turn of a session may include the natural language expression “how is the weather tomorrow.” In this example, the language understanding component 120 may predict the domain classification as “Weather.” A second turn of the same session may include the natural language expression “how about this weekend.” In this example, the language understanding component 120 may also predict the domain classification as “Weather.” To do so, the language understanding component 120 may evaluate the first turn, “how is the weather tomorrow,” and its predicted domain classification, “Weather,” to predict the domain classification of the second turn, “how about this weekend.” In this regard, because the first turn of the same session is a request about the weather and has a “Weather” domain classification, the language understanding component 120 may predict that the expression “how about this weekend” is related to the first expression “how is the weather tomorrow,” and therefore classify the domain as “Weather.”
In another example, a first turn of a session may include the natural language expression “show me driving directions to Portland.” In this example, the language understanding component 120 may predict the domain classification as “Places,” and the intent classification of the user as “get_route.” A second turn of the same session may include the natural language expression “how about Vancouver.” In this example, the language understanding component 120 may predict the domain classification as “Places,” and the intent classification of the user as “get_route.” As illustrated, the language understanding component 120 uses contextual information from the first turn in the first session to predict the intent classification of the user 102 from the second turn in the first session, “how about Vancouver,” as “get_route.”
In yet another example, a first turn of a session may include the natural language expression “create a meeting with Jason.” In this example, the language understanding component 120 may predict the domain classification as “Calendar,” and the intent classification of the user 102 as “create_meeting.” A second turn of the same session may include the natural language expression “from 2 pm to 4 pm.” In this example, the language understanding component 120 may predict the domain classification as “Calendar,” and the slot types as “start_time=2 pm” and “end_time=4 pm.” As illustrated, the language understanding component 120 uses contextual information from the first turn in the first session to predict the slot types for the second turn in the first session, “from 2 pm to 4 pm,” as “start_time=2 pm” and “end_time=4 pm.”
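The three examples above share one pattern: a fragmentary follow-up inherits what it leaves unstated from the previous turn. The disclosure performs this inside statistical models that take contextual information as input; the sketch below swaps in a simple rule-based carry-over purely to show the effect. `Prediction`, `resolve_with_context`, and the slot names `destination` and `contact` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Prediction:
    domain: Optional[str]  # None when the turn alone is too fragmentary
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def resolve_with_context(current: Prediction, previous: Optional[Prediction]) -> Prediction:
    """Inherit unstated domain, intent, and slots from the previous turn.

    Carry-over only happens when the current turn has no domain of its own or
    stays in the previous turn's domain; newly stated slots override old ones.
    """
    if previous is None:
        return current
    if current.domain is not None and current.domain != previous.domain:
        return current  # the user switched tasks; nothing to inherit
    return Prediction(
        domain=previous.domain,
        intent=current.intent or previous.intent,
        slots={**previous.slots, **current.slots},
    )


route = Prediction("Places", "get_route", {"destination": "Portland"})
follow_up = resolve_with_context(Prediction(None, slots={"destination": "Vancouver"}), route)

meeting = Prediction("Calendar", "create_meeting", {"contact": "Jason"})
times = resolve_with_context(Prediction(None, slots={"start_time": "2 pm", "end_time": "4 pm"}), meeting)
```

“How about Vancouver” resolves to a `get_route` request in the Places domain with the destination replaced, and “from 2 pm to 4 pm” adds time slots to the pending meeting. A hand-written rule like this is brittle, which is the motivation the disclosure gives for learning the behavior instead.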
In aspects, the predictions determined by the language understanding component 120 may be sent to the dialog component 130 for processing. In this regard, the dialog component 130 may be configured to create a dialog hypothesis set for each natural language expression and determine what response/action to take for each natural language expression, which will be described in detail below relative to FIG. 3. The dialog component 130 may receive a combination of information for processing. For example, the dialog component 130 may receive input context (e.g., contextual information), the natural language expressions received by the dynamic system 100, and the predictions made by the language understanding component 120. The input context may include client information (e.g., the type of device of the client) and the contextual information discussed above.
When the dialog component 130 receives the combination of information for processing, the dialog component 130 may create a dialog hypothesis set. The dialog hypothesis set may include at least two dialog hypotheses based on the natural language expression. In some cases, the dialog hypothesis set may include any number of dialog hypotheses. In one case, a dialog hypothesis may be created based on the prediction received from the language understanding component 120. For example, the language understanding component 120 may predict that the natural language expression “create a meeting with Jason” is a request to create a meeting with Jason and is categorized in the “Calendar” domain. As such, the dialog component 130 may create a similar hypothesis and send the natural language expression “create a meeting with Jason” to a Calendar domain component for processing. In another case, a dialog hypothesis may be created based on the combination of information (e.g., contextual information) received from other components in the dynamic system 100. For example, the language understanding component 120 may not handle the natural language expression “how did my football team do yesterday.” As such, the dialog component 130 may create a similar hypothesis and send the natural language expression “how did my football team do yesterday” to a web domain component for processing. The web domain component may utilize the combination of information to create a web domain hypothesis set. The web domain hypothesis set may include a plurality of queries created using the natural language expression and the combination of information such that each query of the plurality of queries includes a different expression, which will be described in detail below relative to FIG. 3.
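A minimal sketch of assembling a dialog hypothesis set, assuming one hypothesis per domain predicted by the language understanding component plus a web-domain hypothesis that is always added as an alternative. Always adding the web hypothesis, and the names `DialogHypothesis` and `build_hypothesis_set`, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DialogHypothesis:
    domain: str      # domain component that will process the expression
    expression: str  # text handed to that component


def build_hypothesis_set(expression: str, predicted_domains: List[str]) -> List[DialogHypothesis]:
    """One hypothesis per distinct predicted domain, plus a web-domain alternative."""
    hypotheses: List[DialogHypothesis] = []
    for domain in dict.fromkeys(predicted_domains):  # de-duplicate, keep order
        hypotheses.append(DialogHypothesis(domain, expression))
    if "Web" not in predicted_domains:
        hypotheses.append(DialogHypothesis("Web", expression))
    return hypotheses


calendar = build_hypothesis_set("create a meeting with Jason", ["Calendar"])
football = build_hypothesis_set("how did my football team do yesterday", [])
```

When the language understanding component makes no prediction, as in the football example, this set contains only the web hypothesis; per the disclosure, the web domain component would then expand it into several alternative queries.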
In aspects, the dialog component 130 may determine what response/action to take for each natural language expression. In this regard, the dialog component 130 may rank the hypotheses in the dialog hypothesis set by analyzing the responses that are returned when a query is performed using each hypothesis, which will be described in detail relative to FIG. 3. The query may be performed by using a backend engine 140. The backend engine 140 may include any backend engine suitable to receive and process text and/or keyword natural language expressions/queries. In one example, the backend engine 140 may include a search engine such as Bing, Google, Yahoo, and the like. In another example, the backend engine 140 may include a domain-specific search engine, such as one for places, reminders, calendar, weather, communication, and the like. In one case, the backend engine 140 may be located at the dialog component 130. In other cases, the backend engine 140 may be located at a server computing device that is in communication with the dialog component 130. In other cases, portions of the backend engine 140 may be located at the dialog component 130 and portions of the backend engine 140 may be located at the server computing device in any combination.
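This passage says hypotheses are ranked by analyzing the responses they produce but does not fix the features or the model. The sketch below substitutes a hand-weighted linear score over three hypothetical response features (whether the hypothesis was handled, how many results came back, and a backend relevance score); the feature set and weights are assumptions, and a deployed ranker would more likely be learned from data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DialogResponse:
    hypothesis: str   # label of the hypothesis that produced this response
    handled: bool     # did the domain component or backend return an answer?
    num_results: int  # number of results returned
    top_score: float  # hypothetical relevance score of the best result, in [0, 1]


def rank_hypotheses(
    responses: List[DialogResponse],
    weights: Tuple[float, float, float] = (2.0, 0.1, 1.0),
) -> List[str]:
    """Order hypotheses by a weighted score of their responses, best first."""
    w_handled, w_count, w_score = weights

    def score(r: DialogResponse) -> float:
        # Cap the result count so a flood of weak results cannot dominate.
        return w_handled * float(r.handled) + w_count * min(r.num_results, 10) + w_score * r.top_score

    return [r.hypothesis for r in sorted(responses, key=score, reverse=True)]


ranking = rank_hypotheses([
    DialogResponse("Web", True, 3, 0.3),
    DialogResponse("Calendar", True, 1, 0.9),
])
```

With these illustrative numbers the Calendar hypothesis (score 3.0) outranks the web hypothesis (score 2.6); if the Calendar component had returned nothing, the web hypothesis would win instead.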
FIG. 2 illustrates a dynamic system 200 for discriminating ambiguous requests according to one or more aspects disclosed herein. In aspects, the dynamic system 200 may be implemented on a server computing device 204. The server computing device 204 may provide data to and from the client computing device 104 through a network 205. In one aspect, the network 205 is a distributed computing network, such as the Internet. In aspects, the dynamic system 200 may be implemented on more than one server computing device 204, such as a plurality of server computing devices 204. As shown in FIG. 2, the dynamic system 200 may include a speech recognition component 210, a language understanding component 220, a dialog component 230, and a backend engine 240. The dynamic system 200 may be configured to process natural language expressions. In this regard, the dynamic system 200 may discriminate ambiguous requests. The speech recognition component 210, the language understanding component 220, the dialog component 230, and the backend engine 240 may be configured similarly to the speech recognition component 110, the language understanding component 120, the dialog component 130, and the backend engine 140 described above relative to FIG. 1. In this regard, the dynamic system 200 may include all the functionality described in the above aspects relative to the dynamic system 100 of FIG. 1.
As discussed above, the server computing device 204 may provide data to and from the client computing device 104 through the network 205. The data may be communicated over any network suitable to transmit data. In some aspects, the network 205 is a computer network such as the Internet. In this regard, the network 205 may include a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, and wireless and wired transmission media. One of skill in the art will appreciate that other types of networks may be employed with the aspects disclosed herein. In this regard, the natural language expression may be received at the client computing device 104 and transmitted over the network 205 for processing by the dynamic system 200 at the server computing device 204. It is appreciated that the components of the dynamic system (e.g., dynamic system 100 and dynamic system 200), namely the speech recognition component 110/210, the language understanding component 120/220, the dialog component 130/230, and the backend engine 140/240, may be located at the client computing device 104, the server computing device 204, and/or both the client computing device 104 and the server computing device 204 in any combination. For example, in one configuration, the client computing device 104 may include the speech recognition component 110 and the language understanding component 120, and the server computing device 204 may include the dialog component 230 and the backend engine 240. This is exemplary only and should not be considered as limiting. Any suitable combination of dynamic system components at the client computing device 104 and the server computing device 204 for discriminating ambiguous requests may be utilized.
FIG. 3 illustrates an exemplary block diagram of a dialog component 130/230 for discriminating ambiguous requests, according to one or more aspects of the present disclosure. As discussed above, the dialog component 130 may be configured to create a dialog hypothesis set for each natural language expression and determine what response/action to take for each natural language expression, for example. In this regard, as illustrated in FIG. 3, the dialog component 130/230 may include a hypothesis preparation component 310, a shallow answer component 320, a fallback query component 330, domain components 340A-340N, a hypothesis and ranking selection component (HRS) 350, and a backend engine 360. As discussed above, the dialog component 130 may receive a combination of information for processing. For example, the dialog component 130 may receive input context, the natural language expressions received by the dynamic system 100, and the predictions made by the language understanding component 120 (e.g., the contextual information as described above). The input context may include client information (e.g., the type of device of the client) and the contextual information discussed above. In this regard, the hypothesis preparation component 310, the shallow answer component 320, the fallback query component 330, the domain components 340A-340N, and the hypothesis and ranking selection component (HRS) 350 may be configured to receive the combination of information for processing.
In one aspect, the hypothesis preparation component 310 is configured to create a hypothesis set based on the received information. As discussed above, the dialog hypothesis set may include at least two dialog hypotheses based on the natural language expression. In some cases, the dialog hypothesis set may include any number of dialog hypotheses. In one case, a dialog hypothesis may be created based on the prediction received from the language understanding component 120. For example, the language understanding component 120 may predict that the natural language expression “create a meeting with Jason” is a request to create a meeting with Jason and is categorized in the “Calendar” domain. As such, the hypothesis preparation component 310 may create a similar hypothesis and send the natural language expression “create a meeting with Jason” to a Calendar domain component for processing. In another case, a dialog hypothesis may be created based on the combination of information received from other components in the dynamic system 100. For example, the language understanding component 120 may not handle the natural language expression “how did my football team do yesterday.” As such, the hypothesis preparation component 310 may create a similar hypothesis and send the natural language expression “how did my football team do yesterday” to a web domain component for processing.
In the example where the language understanding component 120 does not handle the natural language expression and sends it to a web domain component for processing, the web domain component may create a fallback query to be sent to the backend engine 360. For example, a first turn of a session may include the natural language expression, “find restaurants near me.” The natural language expression, “find restaurants near me,” may be handled by a Place domain component. A second turn of the session may include the natural language expression, “show the Italian ones only.” The natural language expression, “show the Italian ones only,” may also be handled by the Place domain component. A third turn of the session may include the natural language expression, “which ones are kids friendly.” The Place domain component may not be able to handle the natural language expression, “which ones are kids friendly.” As such, the dialog component 130/230 may create a fallback query to be handled by the backend engine 360. The dialog component 130/230 may create a query that facilitates improved search results from the backend engine 360. For example, the dialog component 130/230 may create a first query by concatenating all previous and current turns of a session. Using the example described above, the first query may be, “find restaurants near me show the Italian ones only which ones are kids friendly.” In another example, the dialog component 130/230 may create a second query by concatenating the previous and current turns of a session after performing stop-word removal on them. Using the same example as described above, the second query may be, “restaurants near me show Italian ones only kids friendly.” In yet another example, the dialog component 130/230 may create a third query by concatenating semantic entities extracted from the previous and current turns of a session.
In one case, a semantic entity may be any portion of the natural language expression, a classification of the natural language expression, and/or a result from processing the natural language expression that has been determined to have meaning. Using the same example as described above, the third query may be, “restaurant Bellevue WA Italian Food Family.” In this regard, when the dialog component 130/230 uses the backend engine 360 to perform a search, it creates a query other than the natural language expression “as is,” which facilitates the return of more relevant results.
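The three query-construction strategies above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the stop-word list `STOP_WORDS` is a hypothetical choice made for this example, the extra step of dropping repeated tokens in `stopword_query` is an assumption (it is what makes the output line up with the second query above), and `entity_query` assumes the semantic entities, including the resolved location, have already been extracted by other components.

```python
from typing import List

# Hypothetical stop-word list chosen only for this example.
STOP_WORDS = {"find", "the", "which", "are"}


def concat_query(turns: List[str]) -> str:
    """First query: concatenate all previous and current turns verbatim."""
    return " ".join(turns)


def stopword_query(turns: List[str]) -> str:
    """Second query: concatenate the turns after stop-word removal.

    Repeated tokens are also dropped (an assumption, not from the source).
    """
    seen = set()
    kept = []
    for turn in turns:
        for token in turn.split():
            key = token.lower()
            if key in STOP_WORDS or key in seen:
                continue
            seen.add(key)
            kept.append(token)
    return " ".join(kept)


def entity_query(entities_per_turn: List[List[str]]) -> str:
    """Third query: concatenate semantic entities already extracted per turn."""
    return " ".join(e for entities in entities_per_turn for e in entities)
```

Applied to the restaurant session, `stopword_query` yields “restaurants near me show Italian ones only kids friendly,” and `entity_query([["restaurant", "Bellevue WA"], ["Italian Food"], ["Family"]])` yields the third query.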
In one aspect, the domain components 340A-340N may include domains handled by the digital assistant application and a web domain. The domains handled by the digital assistant application may include places, reminders, calendar, weather, communication, and the like. For example, domain component 340A may be a calendar domain component and may process calendar domain hypotheses. In another example, domain component 340B may be a weather domain component and may process weather domain hypotheses. In yet another example, domain component 340N may be a web domain component and may process web domain hypotheses. It is appreciated that the domain components 340A-340N may be any type of domain components and the dialog component 130/230 may include any number of domain components 340A-340N. In the example where domain component 340A is a calendar domain component, when domain component 340A receives a calendar domain hypothesis from the hypothesis preparation component 310, the domain component 340A may schedule a meeting based on the hypothesis. For example, if the calendar domain hypothesis is, “schedule a meeting with Jason from 2 pm to 4 pm tomorrow,” the domain component 340A may add this meeting to the user's calendar for tomorrow from 2 pm to 4 pm.
In another example, when the hypothesis is a web domain hypothesis, the web domain component 340N may receive the web domain hypothesis and the combination of information from different sources. In this regard, the web domain component 340N may use the combination of information from different sources to discriminate ambiguous information in the web domain hypothesis. In one example, a web domain hypothesis may be, “who do the Broncos play at that time.” Instead of performing a search using the web domain hypothesis/query, “who do the Broncos play at that time,” the web domain component 340N may use the combination of information received to create a web domain hypothesis set of created web domain hypotheses. In one example, the web domain component 340N may use a previous turn from the current session to create the web domain hypothesis set. For example, the first turn of the current session may be, “what is the weather like tomorrow.” In this regard, the web domain component 340N may use the first turn and the determined slot type, “time=tomorrow,” to create a first created web domain hypothesis such as, “who do the Broncos play tomorrow.” As illustrated, the web domain component 340N replaced the ambiguous phrase, “at that time,” with the value of the determined slot type, “time=tomorrow.” In another example, the web domain component 340N may combine the first turn of the current session with the web domain hypothesis to create a second created web domain hypothesis, “what is the weather like tomorrow who do the Broncos play at that time.” In yet another example, the web domain component 340N may combine only the semantic entities from the first turn and the current web domain hypothesis to create a third created web domain hypothesis, “weather tomorrow Broncos.”
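One way to sketch the construction of the web domain hypothesis set is shown below. It is a hypothetical Python illustration: `AMBIGUOUS_PHRASES`, which maps an ambiguous phrase to the slot type that can resolve it, and the function name `build_web_hypotheses` are assumptions, and the semantic entities are assumed to be supplied by an upstream component rather than extracted here.

```python
from typing import Dict, List

# Hypothetical map: ambiguous phrase -> slot type that can resolve it.
AMBIGUOUS_PHRASES: Dict[str, str] = {"at that time": "time"}


def build_web_hypotheses(current: str, previous_turn: str,
                         previous_slots: Dict[str, str],
                         entities: List[str]) -> List[str]:
    """Create a web domain hypothesis set from the current hypothesis and a previous turn."""
    hypotheses: List[str] = []
    # 1) Replace an ambiguous phrase with the slot value carried over from the previous turn.
    for phrase, slot_type in AMBIGUOUS_PHRASES.items():
        if phrase in current and slot_type in previous_slots:
            hypotheses.append(current.replace(phrase, previous_slots[slot_type]))
    # 2) Concatenate the previous turn with the current hypothesis.
    hypotheses.append(previous_turn + " " + current)
    # 3) Keep only the semantic entities from both turns.
    hypotheses.append(" ".join(entities))
    return hypotheses
```

Called with the Broncos example, the previous slots `{"time": "tomorrow"}`, and the entities `["weather", "tomorrow", "Broncos"]`, the sketch returns the three created web domain hypotheses described above, in the same order.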
In some aspects, the web domain hypothesis set may be sent to the shallow answer component 320. The shallow answer component 320 may provide answers for each of the web domain hypotheses in the web domain hypothesis set. For example, each web domain hypothesis may be sent to the shallow answer component 320 to perform a query using the web domain hypothesis. In some cases, the answers for each of the web domain hypotheses may include specialized results for query types that are frequently received. For example, a frequent query type may include queries about the weather. In this example, the answers may include specialized results relating to the weather. As such, when the shallow answer component 320 performs a query using the web domain hypothesis, the answers returned by the shallow answer component 320 may be based on the specialized results. For example, if the web domain hypothesis includes terms/entities that are frequently queried, the answers returned may include specialized results. In another example, if the web domain hypothesis does not include terms/entities that are frequently queried, the answers returned may not include specialized results (e.g., the results returned may not be useful). In this regard, the answers from the shallow answer component 320 may indicate which web domain hypotheses in the web domain hypothesis set return the best/most relevant results.
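The use of shallow answers as a relevance signal can be sketched as follows. Everything here is hypothetical: `SPECIALIZED_TRIGGERS` stands in for whatever set of frequently queried terms actually triggers specialized results, and simple token matching stands in for a real query to the shallow answer component 320.

```python
from typing import Dict, List

# Hypothetical table of frequently queried terms and the specialized
# answer type each one triggers (e.g., a weather card).
SPECIALIZED_TRIGGERS = {"weather": "weather_answer", "broncos": "sports_answer"}


def shallow_answer(hypothesis: str) -> List[str]:
    """Return the specialized answer types a hypothesis would trigger."""
    tokens = set(hypothesis.lower().split())
    return [answer for term, answer in SPECIALIZED_TRIGGERS.items() if term in tokens]


def shallow_answers(hypotheses: List[str]) -> Dict[str, List[str]]:
    """Map each web domain hypothesis to its triggered specialized answers.

    An empty list means no specialized results, i.e. a likely less useful hypothesis.
    """
    return {h: shallow_answer(h) for h in hypotheses}
```

The number of specialized answers a hypothesis triggers could then serve as one of the features used later for ranking.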
In one case, the results for each web domain hypothesis may be reviewed by a human to determine which result is the best. In this regard, the HRS component 350 may learn which features from a domain hypothesis correlate with the most relevant search results. For example, the features extracted for the domain hypothesis may include confidence scores, the number of results returned (e.g., if any), the presence or absence of specialized results, etc. As such, when a human determines the most relevant results for a domain hypothesis of a set of domain hypotheses, the HRS component 350 may learn how to use the features associated with the domain hypothesis that generates the most relevant results.
In another case, logged queries and their corresponding search results may be compared with the results of each web domain hypothesis. For example, using the example described above, a first turn of a session may be, “what is the weather like tomorrow.” A second turn of the session may be, “Who do the Broncos play against at that time.” The dialog component 130 may not be able to handle the second turn, “Who do the Broncos play against at that time,” and may send this query to the backend engine 360. The backend engine 360 may not be able to discriminate the ambiguity, “at that time.” In this regard, the user may have to re-query and say something like, “Who do the Broncos play against tomorrow.” The dialog component 130 may send this query to the backend engine 360 and get relevant results back. These sessions of natural language expressions and their corresponding query results may be logged. As such, the HRS component 350 may analyze the logged data to determine when two turns of a session are very similar and when a turn of a session is a re-query of an earlier turn. For example, the HRS component 350 may identify lexical similarities between the two turns of the session. In another example, the HRS component 350 may identify that the number and/or quality of the results of a second turn are better than those of a first turn. The more relevant results together with the lexical similarities may indicate that the turn is a re-query. As such, the HRS component 350 may determine what information/features should be carried over from a previous turn to a current turn to get relevant search results. That is, the HRS component 350 may learn what features produce results equivalent to the results produced for the re-query of the session. As such, machine learning techniques are used to determine what information to carry over from a previous turn to a current turn for providing relevant search results.
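The HRS component 350 is described as learning re-query patterns from logged sessions; the fixed-threshold rule below is only a simplified heuristic stand-in for that learned behavior. It uses token-level Jaccard similarity as the measure of lexical similarity and a larger result count as the measure of better results. Both choices, the `min_similarity` threshold, and the function names are assumptions for illustration.

```python
from typing import Set


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two turns."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_requery(prev_turn: str, prev_results: int,
               turn: str, results: int, min_similarity: float = 0.5) -> bool:
    """Treat a turn as a re-query when it is lexically similar to the
    previous turn and returns more results than the previous turn did."""
    return jaccard(prev_turn, turn) >= min_similarity and results > prev_results


def carried_over(prev_turn: str, requery: str) -> Set[str]:
    """Tokens the user added in the re-query: candidates for information
    to carry over from an earlier turn."""
    return set(requery.lower().split()) - set(prev_turn.lower().split())
```

For the Broncos session, the second and third turns share 6 of 10 distinct tokens (similarity 0.6), and `carried_over` isolates “tomorrow,” the value that was missing from the ambiguous turn.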
In some cases, machine learning techniques may include artificial neural networks, Bayesian classifiers, and/or genetically derived algorithms, which have been developed through training with annotated training sets.
In aspects, the HRS component 350 may include ranking techniques such as an “N-best” list, a priority queue, a Gaussian distribution, and/or a histogram (e.g., a histogram identifying trends in the hypothesis scores of the respective dialog hypotheses). As discussed above, the HRS component 350 may extract features from the dialog hypotheses of the dialog hypothesis set and score and rank the features. In one case, the features extracted from the dialog hypotheses may include at least a confidence score for the predicted domain classification, a confidence score for the predicted intent classification, and a slot count for the predicted slot types. In another case, the features extracted from the dialog hypotheses may include features associated with the results returned for the dialog hypotheses. For example, the extracted features may include the number of web results returned, the number of deep links returned, the number of answers triggered, and the number of answers suppressed. In yet another case, the features extracted from the dialog hypotheses may include a word count from the natural language expression, the text from the natural language expression, and the combined text from multiple turns in a session. It is appreciated that any combination of the features as described herein may be extracted from the dialog hypotheses.
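A minimal sketch of feature extraction followed by an N-best list is shown below. The feature names mirror the examples above, but the `DialogHypothesis` fields, the hand-set `WEIGHTS` (a trained model would normally supply these), and the linear scoring function are all assumptions; `heapq.nlargest` keeps the N highest-scoring hypotheses.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DialogHypothesis:
    text: str
    domain_confidence: float   # confidence of the predicted domain
    intent_confidence: float   # confidence of the predicted intent
    slot_count: int            # number of predicted slots
    web_results: int           # web results returned for this hypothesis
    deep_links: int
    answers_triggered: int
    answers_suppressed: int


def extract_features(h: DialogHypothesis) -> Dict[str, float]:
    """Collect the per-hypothesis features into a flat dictionary."""
    return {
        "domain_confidence": h.domain_confidence,
        "intent_confidence": h.intent_confidence,
        "slot_count": float(h.slot_count),
        "web_results": float(h.web_results),
        "deep_links": float(h.deep_links),
        "answers_triggered": float(h.answers_triggered),
        "answers_suppressed": float(h.answers_suppressed),
        "word_count": float(len(h.text.split())),
    }


# Hypothetical hand-set weights; a trained ranker would learn these.
WEIGHTS = {
    "domain_confidence": 2.0, "intent_confidence": 1.0, "slot_count": 0.5,
    "web_results": 0.1, "deep_links": 0.2, "answers_triggered": 1.0,
    "answers_suppressed": -1.0, "word_count": -0.05,
}


def score(h: DialogHypothesis) -> float:
    """Linear score: weighted sum of the extracted features."""
    return sum(WEIGHTS[name] * value for name, value in extract_features(h).items())


def n_best(hypotheses: List[DialogHypothesis], n: int = 3) -> List[DialogHypothesis]:
    """Return the n highest-scoring hypotheses, best first."""
    return heapq.nlargest(n, hypotheses, key=score)
```

A priority queue such as `heapq` is a natural fit here because only the top N hypotheses are needed, not a full sort.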
In one case, the scores may be calculated and ranked using discriminative approaches based on a conditional probability distribution among the dialog hypotheses. In another case, the scores may be calculated and ranked using generative approaches involving a joint probability distribution of potential dialog hypotheses. As discussed above, the HRS component 350 may receive the dialog hypotheses from the domain components 340A-340N, shallow answers from the shallow answer component 320, the combination of information from different sources, and results from the backend engine 360. In this regard, the features extracted from the dialog hypotheses are scored and ranked by analyzing the results received for each dialog hypothesis. For example, if it is determined that a first dialog hypothesis returns more relevant results than a second dialog hypothesis, the features extracted from the first dialog hypothesis will be scored and ranked higher than the features from the second dialog hypothesis.
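For the discriminative case, one common way to turn per-hypothesis scores into a conditional probability distribution is a softmax, P(h_i | x) = exp(s_i) / Σ_j exp(s_j), where s_i is the score of hypothesis h_i for the input x. The source does not name a specific model, so the sketch below only illustrates that normalization; the scores themselves are assumed to come from some scorer, such as a learned linear model.

```python
import math
from typing import Dict, List, Tuple


def conditional_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over hypothesis scores: P(h | x) = exp(s_h) / sum_j exp(s_j).

    Subtracting the maximum score first avoids overflow in math.exp.
    Raises ValueError if `scores` is empty.
    """
    top = max(scores.values())
    exps = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}


def rank_by_probability(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank hypotheses from most to least probable."""
    probs = conditional_probabilities(scores)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)
```

Because the softmax is monotonic, it does not change the ranking; its value is that the resulting probabilities are comparable across inputs, which helps when deciding whether the top two hypotheses are too close to call.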
In some cases, the HRS component 350 may calculate similar scores for two dialog hypotheses. As such, there may be an ambiguity as to which dialog hypothesis should be ranked the highest. In the case of ambiguity, a fallback query may be used to discriminate the ambiguity. For example, the fallback query component 330 may include a set of fallback queries that can be used to discriminate ambiguities, such as “sorry, I didn't hear you well,” “sorry, I don't understand what you mean,” and the like. In other cases, when there is an ambiguity as to which dialog hypothesis should be ranked the highest, the HRS component 350 may decide to pick the dialog hypothesis with the highest score, even if the difference is very small. In other cases, when there is an ambiguity as to which dialog hypothesis should be ranked the highest, the HRS component 350 may send a disambiguation question to a user of the client computing device 104 such as, “I'm not sure what you want to do, do you want to look up the opening hours of 5 Guys Burger restaurant?” If the user answers yes, the HRS component 350 may rank the dialog hypothesis associated with the answer as the highest. If the user answers no, the HRS component 350 may send a generic web search query to the backend engine 360. In another case, when there is an ambiguity as to which dialog hypothesis should be ranked the highest, the HRS component 350 may ask the user to disambiguate between the two dialog hypotheses. For example, the HRS component 350 may send the user of the client computing device 104 a question such as, “please tell me what's closer to what you mean: ‘weather Broncos tomorrow,’ or ‘who do the Broncos play at that time tomorrow.’”
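The ambiguity handling above can be sketched as a margin test on the two highest scores followed by one of the described strategies. The `margin` value, the strategy names, and the function names are hypothetical; the fallback and disambiguation wording is taken from the examples above.

```python
from typing import List, Tuple

FALLBACK_QUERIES = [
    "sorry, I didn't hear you well",
    "sorry, I don't understand what you mean",
]


def is_ambiguous(ranked: List[Tuple[str, float]], margin: float = 0.05) -> bool:
    """`ranked` is sorted by score, highest first."""
    return len(ranked) >= 2 and ranked[0][1] - ranked[1][1] < margin


def resolve(ranked: List[Tuple[str, float]], strategy: str = "ask",
            margin: float = 0.05) -> Tuple[str, str]:
    """Return an (action, payload) pair for the ranked dialog hypotheses."""
    if not ranked:
        raise ValueError("no dialog hypotheses to resolve")
    top = ranked[0][0]
    # Unambiguous ranking, or the policy of picking the top score anyway.
    if not is_ambiguous(ranked, margin) or strategy == "pick_top":
        return ("act", top)
    if strategy == "fallback":
        return ("say", FALLBACK_QUERIES[1])
    # Default: ask the user to choose between the two closest hypotheses.
    second = ranked[1][0]
    return ("ask", f"please tell me what's closer to what you mean: \"{top},\" or \"{second}.\"")
```

Which strategy to use is a product decision; picking the top score is the cheapest, while asking the user trades an extra turn for accuracy.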
FIG. 4 illustrates a method for discriminating ambiguous requests according to one or more embodiments of the present disclosure. Method 400 begins at operation 402 where a natural language expression is received. For example, the natural language expression may be received by the dynamic system for processing to determine the intent and/or ultimate goal of a user of a digital assistant application. In one example, a natural language expression may include phrases, words, and/or terms in the form of a spoken language input (e.g., a user query and/or request). In this regard, the natural language expression may be ambiguous and/or have missing information. For example, the natural language expression, “how about tomorrow,” is ambiguous when analyzed in isolation.
When a natural language expression is received at the dynamic system, flow proceeds to operation 404 where a dialog hypothesis set is created using contextual information. In one case, contextual information may include information extracted from each turn in a session. For example, the information extracted may include the domain prediction, intent prediction, and slot types predicted (e.g., the results) from a previous turn (e.g., a previous natural language expression/request from the current session). In another case, the contextual information may include the response to a previous turn by the dynamic system. For example, the response to a previous turn may include how the dynamic system responded to the previous request from a user (e.g., what the dynamic system output/said to the user), items located on a display of the client computing device, text located on the display of the client computing device, and the like. In another case, the contextual information may include client context. For example, client context may include a contact list on the client computing device, a calendar on the client computing device, GPS information (e.g., a location of the client computing device), the current time (e.g., morning, night, in a meeting, in a workout, driving, etc.), and the like. In another case, the contextual information may include knowledge content. For example, knowledge content may include a knowledge database that maps features from the natural language expression with stored data. As an example, “John Howie” may be mapped to a restaurant in the knowledge database. In this regard, a plurality of dialog hypotheses may be generated for the received natural language expression such that each dialog hypothesis consists of a different expression including a variety of features from the contextual information.
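A hypothetical sketch of how contextual information might be turned into a dialog hypothesis set is shown below. The `Context` fields correspond to the kinds of contextual information listed above (previous turn, its predicted slots, client context, and knowledge content), but the field names and the specific rules for combining them with the expression are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Context:
    previous_turn: str = ""                                        # previous request in the session
    previous_slots: Dict[str, str] = field(default_factory=dict)   # slots predicted for that turn
    client: Dict[str, str] = field(default_factory=dict)           # client context, e.g. location
    knowledge: Dict[str, str] = field(default_factory=dict)        # knowledge content: name -> entity type


def create_hypothesis_set(expression: str, ctx: Context) -> List[str]:
    """Generate alternative expressions, each enriched with different context features."""
    hypotheses = [expression]  # the expression as received
    if ctx.previous_slots:
        # Append slot values carried over from the previous turn.
        hypotheses.append(expression + " " + " ".join(ctx.previous_slots.values()))
    if ctx.previous_turn:
        # Prepend the full previous turn.
        hypotheses.append(ctx.previous_turn + " " + expression)
    for name, entity_type in ctx.knowledge.items():
        # Add the entity type for names found in the knowledge content.
        if name in expression:
            hypotheses.append(expression + " " + entity_type)
    if "location" in ctx.client:
        hypotheses.append(expression + " " + ctx.client["location"])
    return list(dict.fromkeys(hypotheses))  # drop duplicates, keep order
```

For “how about tomorrow” after a weather question about Seattle, the sketch yields the bare expression, the expression with “Seattle” appended, and the two turns concatenated.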
After the dialog hypothesis set is created using contextual information, flow proceeds to operation 406 where a plurality of dialog responses are generated for the dialog hypothesis set. For example, each dialog hypothesis in the dialog hypothesis set may have a corresponding set of query results. In one case, the plurality of dialog responses may be generated by sending the dialog hypotheses to a web backend engine. In another case, the plurality of dialog responses may be generated by domain specific components. For example, the dialog hypotheses may include features indicating a weather domain. In this case, the dialog hypotheses may be sent to a weather domain backend engine. In another case, the plurality of dialog responses may be generated by domain specific components and a web backend engine. In this regard, the plurality of responses may include results from both the domain specific component and the web backend engine.
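The routing described above, where a hypothesis goes to a domain-specific backend, to the web backend, or to both, can be sketched with injected backend callables. `predict_domain` is a hypothetical keyword stand-in for the domain prediction made by the language understanding component, and the backends in the usage are stubs.

```python
from typing import Callable, Dict, List

Backend = Callable[[str], List[str]]


def predict_domain(hypothesis: str) -> str:
    """Hypothetical keyword stand-in for the predicted domain."""
    return "weather" if "weather" in hypothesis.lower() else "web"


def generate_responses(hypotheses: List[str], domain_backends: Dict[str, Backend],
                       web_backend: Backend, also_web: bool = False) -> Dict[str, List[str]]:
    """Generate a list of results for every dialog hypothesis."""
    responses: Dict[str, List[str]] = {}
    for h in hypotheses:
        results: List[str] = []
        backend = domain_backends.get(predict_domain(h))
        if backend is not None:
            results.extend(backend(h))        # domain-specific results
        if backend is None or also_web:
            results.extend(web_backend(h))    # web backend results
        responses[h] = results
    return responses
```

Setting `also_web=True` corresponds to the case where the responses include results from both the domain-specific component and the web backend engine.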
When the plurality of dialog responses are generated for the dialog hypothesis set, flow proceeds to operation 408 where the dialog hypothesis set is ranked. For example, features may be extracted from the dialog hypotheses in the dialog hypothesis set. A score for the extracted features may be calculated. In this regard, the extracted features may be ranked based on the calculated score. In turn, it may be determined which dialog hypothesis in the dialog hypothesis set returns the most relevant results. In other cases, it may be determined which backend engine is the best one to use for generating results for the highest ranked dialog hypothesis. In one case, the features extracted from the dialog hypotheses are scored and ranked by analyzing the results received for each dialog hypothesis. For example, if it is determined that a first dialog hypothesis returns more relevant results than a second dialog hypothesis, the features extracted from the first dialog hypothesis will be scored and ranked higher than the features from the second dialog hypothesis.
When the dialog hypothesis set is ranked, flow proceeds to operation 410 where an action based on the ranking is performed. In one case, the action performed may include using the highest ranked dialog hypothesis to query a web backend engine for results and sending the results to the user of the client computing device. In some examples, the user of the client computing device can identify the query used to obtain the search results. As such, the user may see that the query used to obtain the search results is different from the user's original natural language expression/request and may include features extracted from the user's previous request in the same session. In other cases, there may be an ambiguity as to which dialog hypothesis should be ranked the highest. In this case, the action performed may include using a fallback query. For example, a fallback query may include a query such as, “sorry, I didn't hear you well,” “sorry, I don't understand what you mean,” and the like. In other cases, the action performed may include sending a generic web search query to a backend engine.
FIG. 5 illustrates a method for training a dialog component to discriminate ambiguous requests, according to one or more embodiments of the present disclosure. Method 500 begins at operation 502 where a dialog hypothesis set is created using contextual information. In one case, contextual information may include information extracted from each turn in a session. For example, the information extracted may include the domain prediction, intent prediction, and slot types predicted (e.g., the results) from a previous turn (e.g., a previous natural language expression/request from the current session). In another case, the contextual information may include the response to a previous turn by the dynamic system. For example, the response to a previous turn may include how the dynamic system responded to the previous request from a user (e.g., what the dynamic system output/said to the user), items located on a display of the client computing device, text located on the display of the client computing device, and the like. In another case, the contextual information may include client context. For example, client context may include a contact list on the client computing device, a calendar on the client computing device, GPS information (e.g., a location of the client computing device), the current time (e.g., morning, night, in a meeting, in a workout, driving, etc.), and the like. In another case, the contextual information may include knowledge content. For example, knowledge content may include a knowledge database that maps features from the natural language expression with stored data. As an example, “John Howie” may be mapped to a restaurant in the knowledge database. In this regard, a plurality of dialog hypotheses may be generated for the received natural language expression such that each dialog hypothesis consists of a different expression including a variety of features from the contextual information.
After the dialog hypothesis set is created using contextual information, flow proceeds to operation 504 where a plurality of dialog responses are generated for the dialog hypothesis set. For example, each dialog hypothesis in the dialog hypothesis set may have a corresponding set of query results. In one case, the plurality of dialog responses may be generated by sending the dialog hypotheses to a web backend engine. In another case, the plurality of dialog responses may be generated by domain specific components. For example, the dialog hypotheses may include features indicating a weather domain. In this case, the dialog hypotheses may be sent to a weather domain backend engine. In another case, the plurality of dialog responses may be generated by domain specific components and a web backend engine. In this regard, the plurality of responses may include results from both the domain specific component and the web backend engine.
When the plurality of dialog responses have been generated, flow proceeds to operation 506 where the plurality of dialog responses are compared with a plurality of logged dialog responses. In one case, logged responses may include responses generated from a natural language expression (as opposed to responses generated from a created dialog hypothesis). For example, a first turn of a session may include the natural language expression, “what's the weather like for tomorrow,” and a second turn of the session may include the natural language expression, “who do the Broncos play against at that time.” In this case, a user may have to re-query to get relevant results. As such, a third turn of the session may include the natural language expression, “who do the Broncos play against tomorrow.” All the data from the session may be logged. For example, the first turn, second turn, and third turn and their corresponding responses may be logged. As such, in one example, the results from the third turn where the user had to re-query may be compared with the results of a dialog hypothesis to determine a similarity between the results.
At operation 508, it is determined which of the plurality of dialog responses match the logged dialog responses. When it is determined that a dialog response matches a logged response, flow proceeds to operation 510 where the dialog hypothesis corresponding to the dialog response that matches the logged response is labeled. For example, the label may indicate to the dialog component that the features carried over from a previous turn to create the dialog hypothesis are good features to carry over. That is, carrying over those features may facilitate generating relevant responses. In one example, the label may be a “true” label. In some cases, more than one dialog hypothesis may be labeled. For example, there may be more than one dialog response that matches a logged response and/or a plurality of logged dialog responses. In this case, the dialog hypotheses corresponding to the dialog responses that match the logged dialog response and/or the plurality of logged dialog responses may be labeled. After the dialog hypothesis corresponding to the dialog response that matches the logged response is labeled, the dialog hypothesis may be stored (e.g., operation 512). When it is determined that a dialog response does not match a logged response, flow proceeds to operation 512 where the dialog hypotheses corresponding to the dialog responses that do not match the logged responses are stored.
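Operations 506 through 512 can be sketched as follows. The source does not specify how a dialog response is judged to match a logged response; this sketch assumes that responses are lists of result identifiers and uses set overlap (Jaccard) with a hypothetical `threshold`. Every hypothesis is kept (stored), and a `True` label marks those whose responses match at least one logged response.

```python
from typing import Dict, List


def results_match(response: List[str], logged: List[str], threshold: float = 0.5) -> bool:
    """Assumed match test: Jaccard overlap of result identifiers >= threshold."""
    a, b = set(response), set(logged)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold


def label_hypotheses(responses: Dict[str, List[str]], logged_responses: List[List[str]],
                     threshold: float = 0.5) -> Dict[str, bool]:
    """Operations 508-512: every hypothesis is stored; True marks a match
    with at least one logged response (e.g., the results of a re-query)."""
    return {
        hypothesis: any(results_match(response, logged, threshold) for logged in logged_responses)
        for hypothesis, response in responses.items()
    }
```

The labeled hypotheses can then serve as training examples that tell the dialog component which carried-over features produced results equivalent to the user's re-query.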
FIG. 6 illustrates an exemplary method for discriminating ambiguous requests, according to one or more aspects of the present disclosure. Method 600 begins at operation 602 where a natural language expression is received. For example, the natural language expression may be received by the dynamic system for processing to determine the intent and/or ultimate goal of a user of a digital assistant application. In one example, a natural language expression may include phrases, words, and/or terms in the form of a spoken language input (e.g., a user query and/or request). In this regard, the natural language expression may be ambiguous and/or have missing information. For example, the natural language expression, “how about tomorrow,” is ambiguous when analyzed in isolation.
When a natural language expression is received at the dynamic system, flow proceeds to operation 604 where a dialog hypothesis set is created using contextual information. In one case, contextual information may include information extracted from each turn in a session. For example, the information extracted may include the domain prediction, intent prediction, and slot types predicted (e.g., the results) from a previous turn (e.g., a previous natural language expression/request from the current session). In another case, the contextual information may include the response to a previous turn by the dynamic system. For example, the response to a previous turn may include how the dynamic system responded to the previous request from a user (e.g., what the dynamic system output/said to the user), items located on a display of the client computing device, text located on the display of the client computing device, and the like. In another case, the contextual information may include client context. For example, client context may include a contact list on the client computing device, a calendar on the client computing device, GPS information (e.g., a location of the client computing device), the current time (e.g., morning, night, in a meeting, in a workout, driving, etc.), and the like. In another case, the contextual information may include knowledge content. For example, knowledge content may include a knowledge database that maps features from the natural language expression with stored data. As an example, “John Howie” may be mapped to a restaurant in the knowledge database. In this regard, a plurality of dialog hypotheses may be generated for the received natural language expression such that each dialog hypothesis consists of a different expression including a variety of features from the contextual information.
After the dialog hypothesis set is created using contextual information, flow proceeds to operation 606 where a plurality of dialog responses are generated for the dialog hypothesis set. For example, each dialog hypothesis in the dialog hypothesis set may have a corresponding set of query results. In one case, the plurality of dialog responses may be generated by sending the dialog hypotheses to a web backend engine. In another case, the plurality of dialog responses may be generated by domain specific components. For example, the dialog hypotheses may include features indicating a weather domain. In this case, the dialog hypotheses may be sent to a weather domain backend engine. In another case, the plurality of dialog responses may be generated by domain specific components and a web backend engine. In this regard, the plurality of responses may include results from both the domain specific component and the web backend engine.
When the plurality of dialog responses are generated for the dialog hypothesis set, flow proceeds to operation 608 where the dialog hypothesis set is ranked. For example, features may be extracted from the dialog hypotheses in the dialog hypothesis set. A score for the extracted features may be calculated. In this regard, the extracted features may be ranked based on the calculated score. In turn, it may be determined which dialog hypothesis in the dialog hypothesis set returns the most relevant results. In other cases, it may be determined which backend engine is the best one to use for generating results for the highest ranked dialog hypothesis. In one case, the features extracted from the dialog hypotheses are scored and ranked by analyzing the results received for each dialog hypothesis. For example, if it is determined that a first dialog hypothesis returns more relevant results than a second dialog hypothesis, the features extracted from the first dialog hypothesis will be scored and ranked higher than the features from the second dialog hypothesis.
At operation 610, it is determined whether the ranking of the dialog hypothesis set is ambiguous. For example, two or more dialog hypotheses may have a similar score such that there is ambiguity regarding the dialog hypothesis with the highest score. When it is determined that the ranking of the dialog hypothesis set is ambiguous, flow proceeds to operation 612 where a fallback query is used. For example, a fallback query may include a query such as, “sorry, I didn't hear you well,” “sorry, I don't understand what you mean,” and the like. When it is determined that the ranking of the dialog hypothesis set is not ambiguous, flow proceeds to operation 614 where an action is performed based on the ranking. For example, the action performed may include using the highest ranked dialog hypothesis to query a web backend engine for results and sending the results to the user of the client computing device. In another example, the action performed may include sending a generic web search query to a backend engine.
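The control flow of method 600 can be sketched with the individual operations injected as callables. The function name `handle_expression`, the `margin` test used for operation 610, and the (action, payload) return convention are assumptions; the empty-set case falls back to a generic web search, one of the actions mentioned above.

```python
from typing import Any, Callable, List, Tuple

FALLBACK = "sorry, I don't understand what you mean"


def handle_expression(expression: str,
                      create_set: Callable[[str], List[str]],
                      respond: Callable[[str], List[str]],
                      score: Callable[[str, List[str]], float],
                      margin: float = 0.05) -> Tuple[str, Any]:
    """Sketch of operations 604-614 for one received expression (operation 602)."""
    hypotheses = create_set(expression)                        # operation 604
    if not hypotheses:
        return ("web_search", expression)                      # generic web search
    responses = {h: respond(h) for h in hypotheses}            # operation 606
    scores = {h: score(h, responses[h]) for h in hypotheses}   # operation 608
    ranked = sorted(hypotheses, key=lambda h: scores[h], reverse=True)
    if len(ranked) >= 2 and scores[ranked[0]] - scores[ranked[1]] < margin:
        return ("fallback", FALLBACK)                          # operations 610 -> 612
    best = ranked[0]
    return ("results", (best, responses[best]))                # operation 614
```

Injecting the steps keeps the control flow independent of how hypotheses are created, answered, and scored, so each component sketched earlier could be plugged in separately.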
FIGS. 7-10 and the associated descriptions provide a discussion of a variety of operating environments in which aspects of the disclosure may be practiced. However, the devices and systems illustrated and discussed with respect to FIGS. 7-10 are for purposes of example and illustration and are not limiting of a vast number of computing device configurations that may be utilized for practicing embodiments of the disclosure, described herein.
FIG. 7 is a block diagram illustrating physical components (e.g., hardware) of a computing device 700 with which aspects of the disclosure may be practiced. The computing device components described below may include computer-executable instructions for a digital assistant application 713 (e.g., of a client and/or computer) and computer-executable instructions for a contextual language understanding module 711 (e.g., of a client) that can be executed to employ the methods 400 through 600 disclosed herein. In a basic configuration, the computing device 700 may include at least one processing unit 702 and a system memory 704. Depending on the configuration and type of computing device, the system memory 704 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories. The system memory 704 may include an operating system 705 and one or more program modules 706 suitable for running software applications 720 such as discriminating ambiguous request applications in regard to FIGS. 1-3 and, in particular, digital assistant application 713 or dialog module 711. The operating system 705, for example, may be suitable for controlling the operation of the computing device 700. Furthermore, embodiments of the disclosure may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in FIG. 7 by those components within a dashed line 708. The computing device 700 may have additional features or functionality. For example, the computing device 700 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 7 by a removable storage device 709 and a non-removable storage device 710.
As stated above, a number of program modules and data files may be stored in the system memory 704. While executing on the processing unit 702, the program modules 706 (e.g., dialog module 711 or digital assistant application 713) may perform processes including, but not limited to, the aspects, as described herein. Other program modules that may be used in accordance with aspects of the present disclosure, and in particular for contextual language understanding, may include single-turn models, multi-turn models, combination models, final models, and/or computer-aided application programs, etc.
Furthermore, embodiments of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, embodiments of the disclosure may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in FIG. 7 may be integrated onto a single integrated circuit. Such an SOC device may include one or more processing units, graphics units, communications units, system virtualization units and various application functionality, all of which are integrated (or “burned”) onto the chip substrate as a single integrated circuit. When operating via an SOC, the functionality described herein with respect to the capability of the client to switch protocols may be operated via application-specific logic integrated with other components of the computing device 700 on the single integrated circuit (chip). Embodiments of the disclosure may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, embodiments of the disclosure may be practiced within a general purpose computer or in any other circuits or systems.
The computing device 700 may also have one or more input device(s) 712 such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc. Output device(s) 714, such as a display, speakers, a printer, etc., may also be included. The aforementioned devices are examples and others may be used. The computing device 700 may include one or more communication connections 716 allowing communications with other computing devices 718. Examples of suitable communication connections 716 include, but are not limited to, RF transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.
The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 704, the removable storage device 709, and the non-removable storage device 710 are all computer storage media examples (e.g., memory storage). Computer storage media may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 700. Any such computer storage media may be part of the computing device 700. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
FIGS. 8A and 8B illustrate a mobile computing device 800, for example, a mobile telephone, a smart phone, a wearable computer (such as a smart watch), a tablet computer, a laptop computer, and the like, with which embodiments of the disclosure may be practiced. In some aspects, the client may be a mobile computing device. With reference to FIG. 8A, one aspect of a mobile computing device 800 for implementing the aspects is illustrated. In a basic configuration, the mobile computing device 800 is a handheld computer having both input elements and output elements. The mobile computing device 800 typically includes a display 805 and one or more input buttons 810 that allow the user to enter information into the mobile computing device 800. The display 805 of the mobile computing device 800 may also function as an input device (e.g., a touch screen display). If included, an optional side input element 815 allows further user input. The side input element 815 may be a rotary switch, a button, or any other type of manual input element. In alternative aspects, the mobile computing device 800 may incorporate more or fewer input elements. For example, the display 805 may not be a touch screen in some embodiments. In yet another alternative embodiment, the mobile computing device 800 is a portable phone system, such as a cellular phone. The mobile computing device 800 may also include an optional keypad 835. The optional keypad 835 may be a physical keypad or a “soft” keypad generated on the touch screen display. In various embodiments, the output elements include the display 805 for showing a graphical user interface (GUI), a visual indicator 820 (e.g., a light emitting diode), and/or an audio transducer 825 (e.g., a speaker). In some aspects, the mobile computing device 800 incorporates a vibration transducer for providing the user with tactile feedback.
In yet another aspect, the mobile computing device 800 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
FIG. 8B is a block diagram illustrating the architecture of one aspect of a mobile computing device. That is, the mobile computing device 800 can incorporate a system (e.g., an architecture) 802 to implement some aspects. In one embodiment, the system 802 is implemented as a “smart phone” capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, and media clients/players). In some aspects, the system 802 is integrated as a computing device, such as an integrated personal digital assistant (PDA) and wireless phone.
One or more application programs 866 may be loaded into the memory 862 and run on or in association with the operating system 864. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth. The system 802 also includes a non-volatile storage area 868 within the memory 862. The non-volatile storage area 868 may be used to store persistent information that should not be lost if the system 802 is powered down. The application programs 866 may use and store information in the non-volatile storage area 868, such as e-mail or other messages used by an e-mail application, and the like. A synchronization application (not shown) also resides on the system 802 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 868 synchronized with corresponding information stored at the host computer. As should be appreciated, other applications may be loaded into the memory 862 and run on the mobile computing device 800, including the instructions to create a calendar event as described herein (e.g., and/or optionally calendar event creation module 711).
The system 802 has a power supply 870, which may be implemented as one or more batteries. The power supply 870 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
The system 802 may also include a radio 872 that performs the function of transmitting and receiving radio frequency communications. The radio 872 facilitates wireless connectivity between the system 802 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio 872 are conducted under control of the operating system 864. In other words, communications received by the radio 872 may be disseminated to the application programs 866 via the operating system 864, and vice versa.
The visual indicator 820 may be used to provide visual notifications, and/or an audio interface 874 may be used for producing audible notifications via the audio transducer 825. In the illustrated embodiment, the visual indicator 820 is a light emitting diode (LED) and the audio transducer 825 is a speaker. These devices may be directly coupled to the power supply 870 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 860 and other components might shut down for conserving battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device. The audio interface 874 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to the audio transducer 825, the audio interface 874 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with embodiments of the present disclosure, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below. The system 802 may further include a video interface 876 that enables an operation of an on-board camera 830 to record still images, video stream, and the like.
A mobile computing device 800 implementing the system 802 may have additional features or functionality. For example, the mobile computing device 800 may also include additional data storage devices (removable and/or non-removable) such as magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 8B by the non-volatile storage area 868.
Data/information generated or captured by the mobile computing device 800 and stored via the system 802 may be stored locally on the mobile computing device 800, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio 872 or via a wired connection between the mobile computing device 800 and a separate computing device associated with the mobile computing device 800, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed by the mobile computing device 800 via the radio 872 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
FIG. 9 illustrates one aspect of the architecture of a system for processing data received at a computing system from a remote source, such as a computing device 904, tablet 906, or mobile device 908, as described above. Content displayed at server device 902 may be stored in different communication channels or other storage types. For example, various documents may be stored using a directory service 922, a web portal 924, a mailbox service 926, an instant messaging store 928, or a social networking site 930. The digital assistant application 713 may be employed by a client who communicates with server 902. The server 902 may provide data to and from a client computing device such as a personal computer 904, a tablet computing device 906 and/or a mobile computing device 908 (e.g., a smart phone) through a network 915. By way of example, the computer system described above with respect to FIGS. 1-3 may be embodied in a personal computer 904, a tablet computing device 906 and/or a mobile computing device 908 (e.g., a smart phone). Any of these embodiments of the computing devices may obtain content from the store 916, in addition to receiving graphical data useable to be either pre-processed at a graphic-originating system, or post-processed at a receiving computing system.
FIG. 10 illustrates an exemplary tablet computing device 1000 that may execute one or more aspects disclosed herein. In addition, the aspects and functionalities described herein may operate over distributed systems (e.g., cloud-based computing systems), where application functionality, memory, data storage and retrieval and various processing functions may be operated remotely from each other over a distributed computing network, such as the Internet or an intranet. User interfaces and information of various types may be displayed via on-board computing device displays or via remote display units associated with one or more computing devices. For example, user interfaces and information of various types may be displayed and interacted with on a wall surface onto which user interfaces and information of various types are projected. Interaction with the multitude of computing systems with which embodiments of the invention may be practiced includes keystroke entry, touch screen entry, voice or other audio entry, gesture entry where an associated computing device is equipped with detection (e.g., camera) functionality for capturing and interpreting user gestures for controlling the functionality of the computing device, and the like.
Among other examples, the present disclosure presents systems for discriminating ambiguous requests comprising: receiving a natural language expression, wherein the natural language expression includes at least one of words, terms, and phrases of text; creating a dialog hypothesis set from the natural language expression by using contextual information, wherein the dialog hypothesis set has at least two dialog hypotheses; generating a plurality of dialog responses for the dialog hypothesis set; ranking the dialog hypothesis set based on an analysis of the plurality of the dialog responses; and performing an action based on ranking the dialog hypothesis set. In further examples, the natural language expression is at least one of a spoken language input and a textual input. In further examples, the contextual information includes at least one of information extracted from a previously received natural language expression, a response to a previously received natural language expression, client context, and knowledge content. In further examples, the information extracted from the previously received natural language expression includes at least a domain prediction, an intent prediction, and a slot type. In further examples, creating the dialog hypothesis set comprises: extracting at least one feature from the natural language expression; and generating at least two dialog hypotheses, where each dialog hypothesis of the dialog hypothesis set includes a different natural language expression having at least one extracted feature. In further examples, generating a plurality of dialog responses for the dialog hypothesis set comprises generating a plurality of responses for each dialog hypothesis of the dialog hypothesis set. In further examples, generating a plurality of dialog responses for the dialog hypothesis set comprises at least one of sending the dialog hypotheses to a web backend engine and sending the dialog hypotheses to a domain specific component. 
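The claimed flow (receive an expression, build a hypothesis set with context, generate responses for every hypothesis, rank, then act) can be illustrated with a minimal Python sketch. It assumes that contextual information arrives as a dictionary holding the previous turn's domain prediction and slot value, and that each domain engine is a plain function from query text to a list of responses. `DialogHypothesis`, `create_hypothesis_set`, `discriminate`, `toy_web`, `toy_weather`, and `response_count_rank` are hypothetical names, not part of the disclosure, and the response-counting ranker merely stands in for the machine-learned ranker the claims describe.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a domain engine maps query text to dialog responses.
Backend = Callable[[str], List[str]]


@dataclass
class DialogHypothesis:
    expression: str          # candidate rewrite of the user's expression
    domain: str              # domain engine the hypothesis is sent to
    responses: List[str] = field(default_factory=list)
    score: float = 0.0


def create_hypothesis_set(expression: str,
                          context: Dict[str, str]) -> List[DialogHypothesis]:
    """Return the literal expression plus one rewrite that carries over context."""
    hypotheses = [DialogHypothesis(expression, "web")]
    domain = context.get("previous_domain")
    slot = context.get("previous_slot")
    if domain and slot:
        # Reuse the previous turn's domain prediction and slot value.
        hypotheses.append(DialogHypothesis(f"{expression} {slot}", domain))
    return hypotheses


def discriminate(expression: str, context: Dict[str, str],
                 backends: Dict[str, Backend],
                 rank: Callable[[DialogHypothesis], float]) -> DialogHypothesis:
    """Generate responses for every hypothesis, score each, return the best."""
    hypotheses = create_hypothesis_set(expression, context)
    for h in hypotheses:
        # Fall back to the web engine when no domain engine is registered.
        h.responses = backends.get(h.domain, backends["web"])(h.expression)
        # Score each hypothesis from its own responses.
        h.score = rank(h)
    return max(hypotheses, key=lambda h: h.score)


# Toy engines and ranker, purely illustrative.
def toy_web(query: str) -> List[str]:
    return [] if len(query.split()) < 4 else [f"web results for '{query}'"]


def toy_weather(query: str) -> List[str]:
    return ["Seattle tomorrow: rain"] if "Seattle" in query else []


TOY_BACKENDS: Dict[str, Backend] = {"web": toy_web, "weather": toy_weather}


def response_count_rank(h: DialogHypothesis) -> float:
    return float(len(h.responses))
```

With these toy engines, `discriminate("how about tomorrow", {"previous_domain": "weather", "previous_slot": "Seattle"}, TOY_BACKENDS, response_count_rank)` selects the weather hypothesis, because only the context-carrying rewrite produced a response. The design point is that the choice between hypotheses is deferred until responses exist, rather than being made from the expression alone.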
In further examples, ranking the dialog hypothesis set based on an analysis of the plurality of the dialog responses comprises: extracting features from the at least two dialog hypotheses in the dialog hypothesis set; and calculating a score for the extracted features, wherein the calculated score is indicative of the dialog hypothesis rank within the dialog hypothesis set. In further examples, ranking the dialog hypothesis set based on an analysis of the plurality of the dialog responses comprises comparing the plurality of the dialog responses with a plurality of logged dialog responses. In further examples, performing an action based on ranking the dialog hypothesis set comprises: using a highest ranked dialog hypothesis to query a web backend engine for results; and sending the results to a user of a client computing device.
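The feature-and-score ranking described above can be sketched as a hand-set linear model over a few response-derived features. This is an assumption for illustration: a trained ranker would learn the weights. The feature names, the `WEIGHTS` values, and all function names are hypothetical. The `logged_match` feature shows one way the comparison against logged dialog responses could feed the score, and `perform_action` queries a stand-in web backend with the top-ranked hypothesis.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical weights; a deployed ranker would learn these from labeled data.
WEIGHTS: Dict[str, float] = {
    "num_responses": 0.5,
    "has_response": 1.0,
    "logged_match": 2.0,
    "num_tokens": -0.1,
}


def extract_features(expression: str, responses: List[str],
                     logged: Set[str]) -> Dict[str, float]:
    """Turn one hypothesis and its dialog responses into numeric features."""
    return {
        "num_responses": float(len(responses)),
        "has_response": 1.0 if responses else 0.0,
        # 1.0 when any response also appears among logged dialog responses.
        "logged_match": 1.0 if any(r in logged for r in responses) else 0.0,
        "num_tokens": float(len(expression.split())),
    }


def score(features: Dict[str, float],
          weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear score: weighted sum of the extracted features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_hypotheses(hypotheses: Dict[str, List[str]],
                    logged: Set[str]) -> List[Tuple[float, str]]:
    """Return (score, expression) pairs, highest score first."""
    scored = [(score(extract_features(expr, resp, logged)), expr)
              for expr, resp in hypotheses.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def perform_action(ranked: List[Tuple[float, str]],
                   web_search: Callable[[str], List[str]]) -> List[str]:
    """Query a (hypothetical) web backend with the top-ranked hypothesis."""
    return web_search(ranked[0][1])
```

For example, with hypotheses `{"play thriller": ["Thriller by Michael Jackson"], "play thriller movie": ["Thriller listings", "Thriller showtimes"]}` and the first response present in the logs, "play thriller" scores 3.3 and the movie hypothesis 1.7. Without the logged-match feature the movie hypothesis would win on response count alone (1.7 against 1.3), which is the kind of correction the log comparison is meant to supply.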
Further aspects disclosed herein provide an exemplary system comprising: a speech recognition component for receiving a plurality of natural language expressions, wherein the plurality of natural language expressions include at least one of words, terms, and phrases of text; and a dialog component for: creating a first fallback query from the plurality of natural language expressions, wherein creating the first fallback query comprises concatenating the plurality of natural language expressions; and sending the at least one fallback query to a backend engine for generating search results from the at least one fallback query. In further examples, the system further comprises the dialog component for receiving the search results from the backend engine. In further examples, the system further comprises the dialog component for performing a stop-word removal analysis on the plurality of natural language expressions. In further examples, the system further comprises the dialog component for creating a second fallback query from the plurality of natural language expressions, wherein creating the second fallback query comprises concatenating the stop-word removal analysis performed on the plurality of natural language expressions. In further examples, the system further comprises the dialog component for extracting semantic entities from the plurality of natural language expressions. In further examples, the system further comprises the dialog component for creating a third fallback query from the plurality of natural language expressions, wherein creating the third fallback query comprises concatenating the semantic entities extracted from the plurality of natural language expressions.
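The three fallback queries can be sketched as follows. The sketch assumes regex word tokenization, a small hand-picked stop-word list, and a toy gazetteer lookup standing in for semantic entity extraction (a real system would more likely use a trained entity tagger). `STOP_WORDS`, the gazetteer, and all function names are illustrative assumptions, and sending the queries to a backend engine is left out.

```python
import re
from typing import Iterable, List, Sequence

# Illustrative stop-word list; real systems use larger, language-specific lists.
STOP_WORDS = {"a", "an", "the", "is", "what", "how", "about", "show", "me",
              "with", "in", "for", "of", "to", "i", "want"}


def tokenize(text: str) -> List[str]:
    """Split into word tokens, dropping punctuation but keeping original case."""
    return re.findall(r"[\w']+", text)


def first_fallback(expressions: Sequence[str]) -> str:
    """Fallback 1: concatenate the expressions as-is."""
    return " ".join(e.strip() for e in expressions)


def second_fallback(expressions: Sequence[str], stop_words=STOP_WORDS) -> str:
    """Fallback 2: concatenate the expressions after stop-word removal."""
    kept = [tok for e in expressions for tok in tokenize(e)
            if tok.lower() not in stop_words]
    return " ".join(kept)


def _contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def third_fallback(expressions: Sequence[str], gazetteer: Iterable[str]) -> str:
    """Fallback 3: concatenate entities found by exact phrase lookup (toy stand-in)."""
    entities = list(gazetteer)  # materialize so it can be scanned per expression
    found: List[str] = []
    for e in expressions:
        tokens = [t.lower() for t in tokenize(e)]
        for entity in entities:
            phrase = [t.lower() for t in tokenize(entity)]
            if phrase and _contains_phrase(tokens, phrase) and entity not in found:
                found.append(entity)
    return " ".join(found)
```

For the turns `["show me movies", "with Tom Hanks", "in Seattle tonight"]`, the three functions yield "show me movies with Tom Hanks in Seattle tonight", "movies Tom Hanks Seattle tonight", and (with the gazetteer `["Tom Hanks", "Seattle"]`) "Tom Hanks Seattle". Each query is more compact than the last, so a backend engine that finds nothing for the verbose concatenation may still match a shorter one.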
Additional aspects disclosed herein provide exemplary systems and methods for training a dialog component to discriminate ambiguous requests, the method comprising: creating a dialog hypothesis set from a natural language expression by using contextual information, wherein the dialog hypothesis set has at least two dialog hypotheses; generating a plurality of dialog responses for the dialog hypothesis set; comparing the plurality of dialog responses with a plurality of logged dialog responses; determining whether at least one of the plurality of dialog responses matches at least one of the logged dialog responses; and when it is determined that at least one of the plurality of dialog responses matches at least one of the logged dialog responses, labeling at least one of the two dialog hypotheses in the dialog hypothesis set corresponding to the at least one dialog response that matches the at least one logged dialog response. In further examples, the plurality of logged dialog responses includes a plurality of responses generated from the natural language expression. In further examples, creating the dialog hypothesis set comprises: extracting at least one feature from the natural language expression; and generating at least two dialog hypotheses, where each dialog hypothesis of the dialog hypothesis set includes a different natural language expression having at least one extracted feature. In further examples, labeling at least one of the two dialog hypotheses in the dialog hypothesis set corresponding to the at least one dialog response that matches the at least one logged dialog response indicates that the natural language expression having the at least one extracted feature can be used to generate relevant responses.
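The training procedure above amounts to labeling hypotheses automatically against logs. A minimal sketch follows. It assumes responses are compared as normalized strings (lowercased, whitespace-collapsed) and that hypotheses with no matching logged response are labeled 0; the text does not say how unmatched hypotheses are treated, so that choice is an assumption. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def normalize(response: str) -> str:
    """Case- and whitespace-insensitive form used for matching (an assumption)."""
    return " ".join(response.lower().split())


def label_hypotheses(responses_by_hypothesis: Dict[str, List[str]],
                     logged_responses: Set[str]) -> Dict[str, int]:
    """Label 1 if any of a hypothesis's responses matches a logged response."""
    logged = {normalize(r) for r in logged_responses}
    labels: Dict[str, int] = {}
    for hypothesis, responses in responses_by_hypothesis.items():
        matched = any(normalize(r) in logged for r in responses)
        # Unmatched hypotheses get 0; treating them as negatives is an assumption.
        labels[hypothesis] = 1 if matched else 0
    return labels


def training_examples(labels: Dict[str, int]) -> List[Tuple[str, int]]:
    """Emit (hypothesis, label) pairs only when at least one match was found."""
    if not any(labels.values()):
        return []
    return sorted(labels.items())
```

Gating on at least one match mirrors the conditional in the text ("when it is determined that at least one of the plurality of dialog responses matches"): an utterance whose responses never appear in the logs contributes no training examples.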
Aspects of the present disclosure, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to aspects of the disclosure. The functions/acts noted in the blocks may occur out of the order as shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The description and illustration of one or more aspects provided in this application are not intended to limit or restrict the scope of the disclosure as claimed in any way. The aspects, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of claimed disclosure. The claimed disclosure should not be construed as being limited to any aspect, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate aspects falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed disclosure.

Claims (20)

What is claimed is:
1. A system comprising:
at least one processor; and
memory encoding computer executable instructions that, when executed by at least one processor, perform a method for discriminating ambiguous requests comprising:
receiving a natural language expression, wherein the natural language expression includes at least one of words, terms, and phrases of text;
creating a dialog hypothesis set from the natural language expression by using contextual information, wherein the dialog hypothesis set has a first dialog hypothesis corresponding to a first domain and a second dialog hypothesis corresponding to a second domain;
generating, from a first domain engine component and a second domain engine component, a plurality of dialog responses for the dialog hypothesis set;
ranking by machine learning techniques the first domain engine component and the second domain engine component based on an analysis of the plurality of the dialog responses; and
performing an action with the highest ranked domain engine component.
2. The system of claim 1, wherein the natural language expression is at least one of a spoken language input and a textual input.
3. The system of claim 1, wherein the contextual information includes at least one of information extracted from a previously received natural language expression, a response to a previously received natural language expression, client context, and knowledge content.
4. The system of claim 3, wherein the information extracted from the previously received natural language expression includes at least a domain prediction, an intent prediction, and a slot type.
5. The system of claim 1, wherein creating the dialog hypothesis set comprises:
extracting at least one feature from the natural language expression; and
generating at least two dialog hypotheses, where each dialog hypothesis of the dialog hypothesis set includes a different natural language expression having at least one extracted feature.
6. The system of claim 1, wherein generating a plurality of dialog responses for the dialog hypothesis set comprises generating a plurality of responses for each dialog hypothesis of the dialog hypothesis set.
7. The system of claim 1, wherein generating a plurality of dialog responses for the dialog hypothesis set comprises at least one of sending the dialog hypotheses to a web backend engine and sending the dialog hypotheses to a domain specific component.
8. The system of claim 1, wherein the ranking further comprises:
extracting features from the at least two dialog hypotheses in the dialog hypothesis set; and
calculating a score for the extracted features, wherein the calculated score is indicative of the dialog hypothesis rank within the dialog hypothesis set.
9. The system of claim 1, wherein the ranking further comprises comparing the plurality of the dialog responses with a plurality of logged dialog responses.
10. The system of claim 1, wherein performing an action based on ranking the dialog hypothesis set comprises:
using a highest ranked dialog hypothesis to query a web backend engine for results; and
sending the results to a user of a client computing device.
11. One or more computer-readable storage media, having computer-executable instructions that, when executed by at least one processor, perform a method for training a dialog component to discriminate ambiguous requests, the method comprising:
receiving a natural language expression, wherein the natural language expression includes at least one of words, terms, and phrases of text;
creating a dialog hypothesis set from the natural language expression by using contextual information, wherein the dialog hypothesis set has a first dialog hypothesis corresponding to a first domain and a second hypothesis corresponding to a second domain;
generating, from a first domain engine component and a second domain engine component, a plurality of dialog responses for the dialog hypothesis set;
ranking by machine learning techniques the first domain engine component and the second domain engine component based on an analysis of the plurality of the dialog responses; and
performing an action with the highest ranked domain engine component.
12. The computer-readable storage media of claim 11, wherein the method further comprises comparing the plurality of dialog responses with a plurality of logged dialog responses, wherein the plurality of logged dialog responses includes a plurality of responses generated from the natural language expression.
13. The computer-readable storage media of claim 11, wherein creating the dialog hypothesis set comprises:
extracting at least one feature from the natural language expression; and
generating at least two dialog hypotheses, where each dialog hypothesis of the dialog hypothesis set includes a different natural language expression having at least one extracted feature.
14. The computer-readable storage media of claim 12, wherein the method further comprises:
determining whether at least one of the plurality of dialog responses matches at least one of the logged dialog responses; and
labeling at least one of the two dialog hypotheses in the dialog hypothesis set corresponding to the at least one dialog response that matches the at least one logged dialog response.
15. A computer-implemented method comprising:
receiving a natural language expression, wherein the natural language expression includes at least one of words, terms, and phrases of text;
creating a dialog hypothesis set from the natural language expression by using contextual information, wherein the dialog hypothesis set has a first dialog hypothesis corresponding to a first domain and a second dialog hypothesis corresponding to a second domain;
generating, from a first domain engine component and a second domain engine component, a plurality of dialog responses for the dialog hypothesis set;
ranking, by machine learning techniques, the first domain engine component and the second domain engine component based on an analysis of the plurality of the dialog responses; and
performing an action with the highest ranked domain engine component.
16. The computer-implemented method of claim 15, wherein the natural language expression is at least one of a spoken language input and a textual input.
17. The computer-implemented method of claim 15, wherein the contextual information includes at least one of information extracted from a previously received natural language expression, a response to a previously received natural language expression, client context, and knowledge content.
18. The computer-implemented method of claim 17, wherein the information extracted from the previously received natural language expression includes at least a domain prediction, an intent prediction, and a slot type.
19. The computer-implemented method of claim 15, wherein creating the dialog hypothesis set comprises:
extracting at least one feature from the natural language expression; and
generating at least two dialog hypotheses, where each dialog hypothesis of the dialog hypothesis set includes a different natural language expression having at least one extracted feature.
20. The computer-implemented method of claim 15, wherein generating a plurality of dialog responses for the dialog hypothesis set comprises generating a plurality of responses for each dialog hypothesis of the dialog hypothesis set.
US14/586,395 | 2014-12-30 | 2014-12-30 | Discriminating ambiguous expressions to enhance user experience | Active | 2035-04-21 | US9836452B2 (en)

Priority Applications (13)

Application Number | Publication Number | Priority Date | Filing Date | Title
US14/586,395 | US9836452B2 (en) | 2014-12-30 | 2014-12-30 | Discriminating ambiguous expressions to enhance user experience
AU2015374382A | AU2015374382B2 (en) | 2014-12-30 | 2015-12-22 | Discriminating ambiguous expressions to enhance user experience
PCT/US2015/067238 | WO2016109307A2 (en) | 2014-12-30 | 2015-12-22 | Discriminating ambiguous expressions to enhance user experience
EP15821005.4A | EP3241125A2 (en) | 2014-12-30 | 2015-12-22 | Discriminating ambiguous expressions to enhance user experience
RU2017122991A | RU2017122991A (en) | 2014-12-30 | 2015-12-22 | DIFFERENCE OF UNCERTAINTY EXPRESSIONS FOR IMPROVEMENT OF INTERACTION WITH THE USER
MX2017008583A | MX367096B (en) | 2014-12-30 | 2015-12-22 | Discriminating ambiguous expressions to enhance user experience.
CN201580070449.8A | CN107111611A (en) | 2014-12-30 | 2015-12-22 | Ambiguity expression is distinguished to strengthen Consumer's Experience
KR1020177018038A | KR102602475B1 (en) | 2014-12-30 | 2015-12-22 | Discriminating ambiguous expressions to enhance user experience
BR112017010222A | BR112017010222A2 (en) | 2014-12-30 | 2015-12-22 | discriminating ambiguous expressions to enhance user experience
JP2017535358A | JP6701206B2 (en) | 2014-12-30 | 2015-12-22 | Discriminate ambiguous expressions to improve user experience
CA2968016A | CA2968016C (en) | 2014-12-30 | 2015-12-22 | Discriminating ambiguous expressions to enhance user experience
US15/830,767 | US11386268B2 (en) | 2014-12-30 | 2017-12-04 | Discriminating ambiguous expressions to enhance user experience
AU2020267218A | AU2020267218B2 (en) | 2014-12-30 | 2020-11-11 | Discriminating ambiguous expressions to enhance user experience

Applications Claiming Priority (1)

Application Number | Publication Number | Priority Date | Filing Date | Title
US14/586,395 | US9836452B2 (en) | 2014-12-30 | 2014-12-30 | Discriminating ambiguous expressions to enhance user experience

Related Child Applications (1)

Application Number | Relation | Publication Number | Priority Date | Filing Date | Title
US15/830,767 | Continuation | US11386268B2 (en) | 2014-12-30 | 2017-12-04 | Discriminating ambiguous expressions to enhance user experience

Publications (2)

Publication Number | Publication Date
US20160188565A1 (en) | 2016-06-30
US9836452B2 (en) | 2017-12-05

Family ID: 55073177

Family Applications (2)

Application Number | Status | Expiration | Publication Number | Priority Date | Filing Date | Title
US14/586,395 | Active | 2035-04-21 | US9836452B2 (en) | 2014-12-30 | 2014-12-30 | Discriminating ambiguous expressions to enhance user experience
US15/830,767 | Active | 2035-12-12 | US11386268B2 (en) | 2014-12-30 | 2017-12-04 | Discriminating ambiguous expressions to enhance user experience

Family Applications After (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US15/830,767 | Active, expires 2035-12-12 | US11386268B2 (en) | 2014-12-30 | 2017-12-04 | Discriminating ambiguous expressions to enhance user experience

Country Status (11)

Country | Link
US (2) | US9836452B2 (en)
EP (1) | EP3241125A2 (en)
JP (1) | JP6701206B2 (en)
KR (1) | KR102602475B1 (en)
CN (1) | CN107111611A (en)
AU (2) | AU2015374382B2 (en)
BR (1) | BR112017010222A2 (en)
CA (1) | CA2968016C (en)
MX (1) | MX367096B (en)
RU (1) | RU2017122991A (en)
WO (1) | WO2016109307A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10798031B1 (en) | 2020-04-13 | 2020-10-06 | Moveworks, Inc. | Generic disambiguation
US10841251B1 (en) | 2020-02-11 | 2020-11-17 | Moveworks, Inc. | Multi-domain chatbot
US11250853B2 (en) | 2020-04-30 | 2022-02-15 | Robert Bosch Gmbh | Sarcasm-sensitive spoken dialog system
US11386268B2 (en) | 2014-12-30 | 2022-07-12 | Microsoft Technology Licensing, Llc | Discriminating ambiguous expressions to enhance user experience
US12417767B2 (en) | 2021-12-16 | 2025-09-16 | Samsung Electronics Co., Ltd. | Electronic device and method to control external apparatus

Families Citing this family (66)

Publication number | Priority date | Publication date | Assignee | Title
US9690776B2 (en)* | 2014-12-01 | 2017-06-27 | Microsoft Technology Licensing, Llc | Contextual language understanding for multi-turn language tasks
JP2016189128A (en)* | 2015-03-30 | 2016-11-04 | FANUC Corporation (ファナック株式会社) | Numerical controller having ambiguous retrieval function in program
US10418032B1 (en)* | 2015-04-10 | 2019-09-17 | Soundhound, Inc. | System and methods for a virtual assistant to manage and use context in a natural language dialog
US10372755B2 (en)* | 2015-09-23 | 2019-08-06 | Motorola Solutions, Inc. | Apparatus, system, and method for responding to a user-initiated query with a context-based response
US10262062B2 (en)* | 2015-12-21 | 2019-04-16 | Adobe Inc. | Natural language system question classifier, semantic representations, and logical form templates
WO2017168246A1 (en)* | 2016-03-29 | 2017-10-05 | Maluuba Inc. | Hierarchical attention for spoken dialogue state tracking
US9858265B1 (en)* | 2016-06-08 | 2018-01-02 | Rovi Guides, Inc. | Systems and methods for determining context switching in conversation
US10223067B2 (en)* | 2016-07-15 | 2019-03-05 | Microsoft Technology Licensing, Llc | Leveraging environmental context for enhanced communication throughput
US10573299B2 (en)* | 2016-08-19 | 2020-02-25 | Panasonic Avionics Corporation | Digital assistant and associated methods for a transportation vehicle
US10102200B2 (en) | 2016-08-25 | 2018-10-16 | International Business Machines Corporation | Predicate parses using semantic knowledge
US20180090141A1 (en)* | 2016-09-29 | 2018-03-29 | Microsoft Technology Licensing, Llc | Conversational interactions using superbots
US10437841B2 (en) | 2016-10-10 | 2019-10-08 | Microsoft Technology Licensing, Llc | Digital assistant extension automatic ranking and selection
US10446144B2 (en) | 2016-11-21 | 2019-10-15 | Google Llc | Providing prompt in an automated dialog session based on selected content of prior automated dialog session
EP3561643B1 (en)* | 2017-01-20 | 2023-07-19 | Huawei Technologies Co., Ltd. | Method and terminal for implementing voice control
US10860628B2 (en) | 2017-02-16 | 2020-12-08 | Google Llc | Streaming real-time dialog management
US20180253638A1 (en)* | 2017-03-02 | 2018-09-06 | Accenture Global Solutions Limited | Artificial Intelligence Digital Agent
US10372824B2 (en)* | 2017-05-15 | 2019-08-06 | International Business Machines Corporation | Disambiguating concepts in natural language
US10446147B1 (en)* | 2017-06-27 | 2019-10-15 | Amazon Technologies, Inc. | Contextual voice user interface
US11043205B1 (en)* | 2017-06-27 | 2021-06-22 | Amazon Technologies, Inc. | Scoring of natural language processing hypotheses
EP3451189B1 (en)* | 2017-08-30 | 2020-12-02 | Deutsche Telekom AG | A system and method for user query recognition
CN110019699B (en)* | 2017-09-05 | 2023-10-20 | SoundHound, Inc. (声音猎手公司) | Classification of inter-domain through grammar slots
JP6857581B2 (en)* | 2017-09-13 | 2021-04-14 | Hitachi, Ltd. (株式会社日立製作所) | Growth interactive device
US11113608B2 (en) | 2017-10-30 | 2021-09-07 | Accenture Global Solutions Limited | Hybrid bot framework for enterprises
US10713300B2 (en)* | 2017-11-03 | 2020-07-14 | Google Llc | Using distributed state machines for human-to-computer dialogs with automated assistants to protect private data
KR101970899B1 (en) | 2017-11-27 | 2019-04-24 | MoneyBrain Inc. (주식회사 머니브레인) | Method and computer device for providing improved speech-to-text based on context, and computer readable recording medium
KR101959292B1 (en) | 2017-12-08 | 2019-03-18 | MoneyBrain Inc. (주식회사 머니브레인) | Method and computer device for providing improved speech recognition based on context, and computer readable recording medium
JP2019106054A (en)* | 2017-12-13 | 2019-06-27 | Toshiba Corporation (株式会社東芝) | Dialog system
US10430447B2 (en) | 2018-01-31 | 2019-10-01 | International Business Machines Corporation | Predicting intent of a user from anomalous profile data
US10741176B2 (en) | 2018-01-31 | 2020-08-11 | International Business Machines Corporation | Customizing responses to users in automated dialogue systems
US10231285B1 (en) | 2018-03-12 | 2019-03-12 | International Business Machines Corporation | Cognitive massage dynamic response optimization
US11568863B1 (en)* | 2018-03-23 | 2023-01-31 | Amazon Technologies, Inc. | Skill shortlister for natural language processing
US10929601B1 (en)* | 2018-03-23 | 2021-02-23 | Amazon Technologies, Inc. | Question answering for a multi-modal system
US11307880B2 (en) | 2018-04-20 | 2022-04-19 | Meta Platforms, Inc. | Assisting users with personalized and contextual communication content
US11715042B1 (en) | 2018-04-20 | 2023-08-01 | Meta Platforms Technologies, Llc | Interpretability of deep reinforcement learning models in assistant systems
US11676220B2 (en) | 2018-04-20 | 2023-06-13 | Meta Platforms, Inc. | Processing multimodal user input for assistant systems
US11886473B2 (en) | 2018-04-20 | 2024-01-30 | Meta Platforms, Inc. | Intent identification for agent matching by assistant systems
US11010179B2 (en) | 2018-04-20 | 2021-05-18 | Facebook, Inc. | Aggregating semantic information for improved understanding of users
WO2019216876A1 (en)* | 2018-05-07 | 2019-11-14 | Google Llc | Activation of remote devices in a networked system
US10956462B1 (en)* | 2018-06-21 | 2021-03-23 | Amazon Technologies, Inc. | System answering of user inputs
US11868728B1 (en)* | 2018-09-19 | 2024-01-09 | Amazon Technologies, Inc. | Multi-domain skills
CN109325234B (en)* | 2018-10-10 | 2023-06-20 | Shenzhen Qianhai WeBank Co., Ltd. (深圳前海微众银行股份有限公司) | Sentence processing method, sentence processing device and computer readable storage medium
EP3871213B1 (en) | 2018-10-25 | 2024-12-18 | Microsoft Technology Licensing, LLC | Multi-phrase responding in full duplex voice conversation
KR102734292B1 (en)* | 2018-11-12 | 2024-11-26 | Samsung Electronics Co., Ltd. (삼성전자주식회사) | Method and apparatus for classifying data, method and apparatus for training classifier
CN109712619B (en)* | 2018-12-24 | 2020-12-11 | Mobvoi Information Technology Co., Ltd. (出门问问信息科技有限公司) | Method and device for decoupling dialog hypothesis and executing dialog hypothesis and voice interaction system
US10943588B2 (en)* | 2019-01-03 | 2021-03-09 | International Business Machines Corporation | Methods and systems for managing voice response systems based on references to previous responses
CN111552784A (en)* | 2019-02-12 | 2020-08-18 | Xiamen Yitong Software Technology Co., Ltd. (厦门邑通软件科技有限公司) | Man-machine conversation method based on ABC communication rule
US11194796B2 (en)* | 2019-02-14 | 2021-12-07 | Microsoft Technology Licensing, Llc | Intuitive voice search
CN110188182B (en)* | 2019-05-31 | 2023-10-27 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (中国科学院深圳先进技术研究院) | Model training method, dialogue generating method, device, equipment and medium
US11302330B2 (en)* | 2019-06-03 | 2022-04-12 | Microsoft Technology Licensing, Llc | Clarifying questions for rewriting ambiguous user utterance
US11256868B2 (en)* | 2019-06-03 | 2022-02-22 | Microsoft Technology Licensing, Llc | Architecture for resolving ambiguous user utterance
US12038953B2 (en) | 2019-06-27 | 2024-07-16 | Sony Group Corporation | Information processing apparatus and information processing method
US11328711B2 (en)* | 2019-07-05 | 2022-05-10 | Korea Electronics Technology Institute | User adaptive conversation apparatus and method based on monitoring of emotional and ethical states
US12174899B2 (en)* | 2019-09-04 | 2024-12-24 | International Business Machines Corporation | Geofencing queries based on query intent and result semantics
KR20210036169A (en) | 2019-09-25 | 2021-04-02 | Hyundai Motor Company (현대자동차주식회사) | Dialogue system, dialogue processing method, translating apparatus and method of translation
US11508372B1 (en)* | 2020-06-18 | 2022-11-22 | Amazon Technologies, Inc. | Natural language input routing
US10818293B1 (en) | 2020-07-14 | 2020-10-27 | Drift.com, Inc. | Selecting a response in a multi-turn interaction between a user and a conversational bot
CN112000787B (en)* | 2020-08-17 | 2021-05-14 | Shanghai Xiaopeng Motors Technology Co., Ltd. (上海小鹏汽车科技有限公司) | Voice interaction method, server and voice interaction system
CN111985249B (en)* | 2020-09-03 | 2024-10-08 | Beike Technology Co., Ltd. (贝壳技术有限公司) | Semantic analysis method, semantic analysis device, computer readable storage medium and electronic equipment
KR102339794B1 (en) | 2020-12-04 | 2021-12-16 | AgileSoDA Inc. (주식회사 애자일소다) | Apparatus and method for servicing question and answer
KR20220094400A (en)* | 2020-12-29 | 2022-07-06 | Hyundai Motor Company (현대자동차주식회사) | Dialogue system, Vehicle and method for controlling the dialogue system
US20230077874A1 (en)* | 2021-09-14 | 2023-03-16 | Samsung Electronics Co., Ltd. | Methods and systems for determining missing slots associated with a voice command for an advanced voice interaction
US11977852B2 (en)* | 2022-01-12 | 2024-05-07 | Bank Of America Corporation | Anaphoric reference resolution using natural language processing and machine learning
WO2023158050A1 (en)* | 2022-02-18 | 2023-08-24 | Samsung Electronics Co., Ltd. | Methods and electronic device for providing interaction with virtual assistant
US12245299B2 (en)* | 2022-04-06 | 2025-03-04 | Theatro Labs, Inc. | Target disambiguation in a computer mediated communication system
US12367870B2 (en)* | 2022-11-15 | 2025-07-22 | Soundhound Ai Ip, Llc | Real-time natural language processing and fulfillment
JP2024119383A (en)* | 2023-02-22 | 2024-09-03 | Honda Motor Co., Ltd. (本田技研工業株式会社) | DIALOGUE UNDERSTANDING DEVICE AND DIALOGUE UNDERSTANDING METHOD

Citations (33)

Publication number | Priority date | Publication date | Assignee | Title
EP0631244A2 (en) | 1993-06-24 | 1994-12-28 | Xerox Corporation | A method and system of information retrieval
US6266668B1 (en) | 1998-08-04 | 2001-07-24 | Dryken Technologies, Inc. | System and method for dynamic data-mining and on-line communication of customized information
US6272488B1 (en) | 1998-04-01 | 2001-08-07 | International Business Machines Corporation | Managing results of federated searches across heterogeneous datastores with a federated collection object
US20030214523A1 (en)* | 2002-05-16 | 2003-11-20 | Kuansan Wang | Method and apparatus for decoding ambiguous input using anti-entities
US6745177B2 (en) | 1999-04-09 | 2004-06-01 | Metro One Telecommunications, Inc. | Method and system for retrieving data from multiple data sources using a search routing database
US20050004905A1 (en) | 2003-03-03 | 2005-01-06 | Scott Dresden | Search engine with neural network weighting based on parametric user data
US20050149496A1 (en) | 2003-12-22 | 2005-07-07 | Verity, Inc. | System and method for dynamic context-sensitive federated search of multiple information repositories
US20050182628A1 (en)* | 2004-02-18 | 2005-08-18 | Samsung Electronics Co., Ltd. | Domain-based dialog speech recognition method and apparatus
US20060136375A1 (en) | 2004-12-16 | 2006-06-22 | At&T Corp. | System and method for providing a natural language interface to a database
US20070078658A1 (en) | 2005-09-30 | 2007-04-05 | Rockwell Automation Technologies, Inc. | HMI presentation layer configuration system
US7340454B2 (en) | 2003-08-18 | 2008-03-04 | Sap Ag | Processing index action requests for search engines
US7398209B2 (en)* | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance
US20100023502A1 (en) | 2008-07-28 | 2010-01-28 | Yahoo! Inc. | Federated community search
US20100114944A1 (en)* | 2008-10-31 | 2010-05-06 | Nokia Corporation | Method and system for providing a voice interface
US20100138215A1 (en) | 2008-12-01 | 2010-06-03 | At&T Intellectual Property I, L.P. | System and method for using alternate recognition hypotheses to improve whole-dialog understanding accuracy
US20100306213A1 (en) | 2009-05-27 | 2010-12-02 | Microsoft Corporation | Merging Search Results
US20100312549A1 (en)* | 2008-01-18 | 2010-12-09 | Ugochukwu Akuwudike | Method and system for storing and retrieving characters, words and phrases
US20110119302A1 (en)* | 2009-11-17 | 2011-05-19 | Glace Holdings Llc | System and methods for accessing web pages using natural language
US20110320186A1 (en)* | 2010-06-23 | 2011-12-29 | Rolls-Royce Plc | Entity recognition
US8131705B2 (en) | 2007-06-29 | 2012-03-06 | Emc Corporation | Relevancy scoring using query structure and data structure for federated search
US20120084086A1 (en)* | 2010-09-30 | 2012-04-05 | At&T Intellectual Property I, L.P. | System and method for open speech recognition
US8180754B1 (en) | 2008-04-01 | 2012-05-15 | Dranias Development Llc | Semantic neural network for aggregating query searches
US8214310B2 (en) | 2005-05-18 | 2012-07-03 | International Business Machines Corporation | Cross descriptor learning system, method and program product therefor
US20140006012A1 (en) | 2012-07-02 | 2014-01-02 | Microsoft Corporation | Learning-Based Processing of Natural Language Questions
US8645361B2 (en) | 2012-01-20 | 2014-02-04 | Microsoft Corporation | Using popular queries to decide when to federate queries
US20140163959A1 (en)* | 2012-12-12 | 2014-06-12 | Nuance Communications, Inc. | Multi-Domain Natural Language Processing Architecture
US8756233B2 (en)* | 2010-04-16 | 2014-06-17 | Video Semantics | Semantic segmentation and tagging engine
US20140236575A1 (en) | 2013-02-21 | 2014-08-21 | Microsoft Corporation | Exploiting the semantic web for unsupervised natural language semantic parsing
US20140365502A1 (en) | 2013-06-11 | 2014-12-11 | International Business Machines Corporation | Determining Answers in a Question/Answer System when Answer is Not Contained in Corpus
US20140365209A1 (en) | 2013-06-09 | 2014-12-11 | Apple Inc. | System and method for inferring user intent from speech inputs
US20150039292A1 (en)* | 2011-07-19 | 2015-02-05 | Maluuba Inc. | Method and system of classification in a natural language user interface
US20150242387A1 (en)* | 2014-02-24 | 2015-08-27 | Nuance Communications, Inc. | Automated text annotation for construction of natural language understanding grammars
US9465833B2 (en)* | 2012-07-31 | 2016-10-11 | Veveo, Inc. | Disambiguating user intent in conversational interaction system for large corpus information retrieval

Family Cites Families (15)

Publication number | Priority date | Publication date | Assignee | Title
US7725307B2 (en)* | 1999-11-12 | 2010-05-25 | Phoenix Solutions, Inc. | Query engine for processing voice based queries including semantic decoding
US8301436B2 (en) | 2003-05-29 | 2012-10-30 | Microsoft Corporation | Semantic object synchronous understanding for highly interactive interface
US8041570B2 (en)* | 2005-05-31 | 2011-10-18 | Robert Bosch Corporation | Dialogue management using scripts
US7640160B2 (en)* | 2005-08-05 | 2009-12-29 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance
US9063975B2 (en)* | 2013-03-15 | 2015-06-23 | International Business Machines Corporation | Results of question and answer systems
JP5379627B2 (en)* | 2009-09-29 | 2013-12-25 | NTT Communications Corporation (エヌ・ティ・ティ・コミュニケーションズ株式会社) | Search control apparatus, search control method, and program
EP2691870A4 (en)* | 2011-03-31 | 2015-05-20 | Microsoft Technology Licensing Llc | Task driven user intents
US9760566B2 (en)* | 2011-03-31 | 2017-09-12 | Microsoft Technology Licensing, Llc | Augmented conversational understanding agent to identify conversation context between two humans and taking an agent action thereof
US9842168B2 (en)* | 2011-03-31 | 2017-12-12 | Microsoft Technology Licensing, Llc | Task driven user intents
US9064006B2 (en)* | 2012-08-23 | 2015-06-23 | Microsoft Technology Licensing, Llc | Translating natural language utterances to keyword search queries
EP3392876A1 (en)* | 2011-09-30 | 2018-10-24 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant
US9582608B2 (en)* | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
JP5734354B2 (en)* | 2013-06-26 | 2015-06-17 | FANUC Corporation (ファナック株式会社) | Tool clamping device
US9275115B2 (en)* | 2013-07-16 | 2016-03-01 | International Business Machines Corporation | Correlating corpus/corpora value from answered questions
US9836452B2 (en) | 2014-12-30 | 2017-12-05 | Microsoft Technology Licensing, Llc | Discriminating ambiguous expressions to enhance user experience

Patent Citations (33)

Publication number | Priority date | Publication date | Assignee | Title
EP0631244A2 (en) | 1993-06-24 | 1994-12-28 | Xerox Corporation | A method and system of information retrieval
US6272488B1 (en) | 1998-04-01 | 2001-08-07 | International Business Machines Corporation | Managing results of federated searches across heterogeneous datastores with a federated collection object
US6266668B1 (en) | 1998-08-04 | 2001-07-24 | Dryken Technologies, Inc. | System and method for dynamic data-mining and on-line communication of customized information
US6745177B2 (en) | 1999-04-09 | 2004-06-01 | Metro One Telecommunications, Inc. | Method and system for retrieving data from multiple data sources using a search routing database
US20030214523A1 (en)* | 2002-05-16 | 2003-11-20 | Kuansan Wang | Method and apparatus for decoding ambiguous input using anti-entities
US7398209B2 (en)* | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance
US20050004905A1 (en) | 2003-03-03 | 2005-01-06 | Scott Dresden | Search engine with neural network weighting based on parametric user data
US7340454B2 (en) | 2003-08-18 | 2008-03-04 | Sap Ag | Processing index action requests for search engines
US20050149496A1 (en) | 2003-12-22 | 2005-07-07 | Verity, Inc. | System and method for dynamic context-sensitive federated search of multiple information repositories
US20050182628A1 (en)* | 2004-02-18 | 2005-08-18 | Samsung Electronics Co., Ltd. | Domain-based dialog speech recognition method and apparatus
US20060136375A1 (en) | 2004-12-16 | 2006-06-22 | At&T Corp. | System and method for providing a natural language interface to a database
US8214310B2 (en) | 2005-05-18 | 2012-07-03 | International Business Machines Corporation | Cross descriptor learning system, method and program product therefor
US20070078658A1 (en) | 2005-09-30 | 2007-04-05 | Rockwell Automation Technologies, Inc. | HMI presentation layer configuration system
US8131705B2 (en) | 2007-06-29 | 2012-03-06 | Emc Corporation | Relevancy scoring using query structure and data structure for federated search
US20100312549A1 (en)* | 2008-01-18 | 2010-12-09 | Ugochukwu Akuwudike | Method and system for storing and retrieving characters, words and phrases
US8180754B1 (en) | 2008-04-01 | 2012-05-15 | Dranias Development Llc | Semantic neural network for aggregating query searches
US20100023502A1 (en) | 2008-07-28 | 2010-01-28 | Yahoo! Inc. | Federated community search
US20100114944A1 (en)* | 2008-10-31 | 2010-05-06 | Nokia Corporation | Method and system for providing a voice interface
US20100138215A1 (en) | 2008-12-01 | 2010-06-03 | At&T Intellectual Property I, L.P. | System and method for using alternate recognition hypotheses to improve whole-dialog understanding accuracy
US20100306213A1 (en) | 2009-05-27 | 2010-12-02 | Microsoft Corporation | Merging Search Results
US20110119302A1 (en)* | 2009-11-17 | 2011-05-19 | Glace Holdings Llc | System and methods for accessing web pages using natural language
US8756233B2 (en)* | 2010-04-16 | 2014-06-17 | Video Semantics | Semantic segmentation and tagging engine
US20110320186A1 (en)* | 2010-06-23 | 2011-12-29 | Rolls-Royce Plc | Entity recognition
US20120084086A1 (en)* | 2010-09-30 | 2012-04-05 | At&T Intellectual Property I, L.P. | System and method for open speech recognition
US20150039292A1 (en)* | 2011-07-19 | 2015-02-05 | Maluuba Inc. | Method and system of classification in a natural language user interface
US8645361B2 (en) | 2012-01-20 | 2014-02-04 | Microsoft Corporation | Using popular queries to decide when to federate queries
US20140006012A1 (en) | 2012-07-02 | 2014-01-02 | Microsoft Corporation | Learning-Based Processing of Natural Language Questions
US9465833B2 (en)* | 2012-07-31 | 2016-10-11 | Veveo, Inc. | Disambiguating user intent in conversational interaction system for large corpus information retrieval
US20140163959A1 (en)* | 2012-12-12 | 2014-06-12 | Nuance Communications, Inc. | Multi-Domain Natural Language Processing Architecture
US20140236575A1 (en) | 2013-02-21 | 2014-08-21 | Microsoft Corporation | Exploiting the semantic web for unsupervised natural language semantic parsing
US20140365209A1 (en) | 2013-06-09 | 2014-12-11 | Apple Inc. | System and method for inferring user intent from speech inputs
US20140365502A1 (en) | 2013-06-11 | 2014-12-11 | International Business Machines Corporation | Determining Answers in a Question/Answer System when Answer is Not Contained in Corpus
US20150242387A1 (en)* | 2014-02-24 | 2015-08-27 | Nuance Communications, Inc. | Automated text annotation for construction of natural language understanding grammars

Non-Patent Citations (4)

Title
"International Preliminary Report on Patentability Issued in PCT Application No. PCT/US2015/067238", dated Mar. 31, 2017, 12 Pages.
PCT 2nd Written Opinion in International Application PCT/US2015/067238, dated Nov. 23, 2016, 11 pages.
PCT International Search Report in PCT/US2015/067238, dated Aug. 5, 2016, 19 pages.
Wang, et al., "Modeling Action-level Satisfaction for Search Task Satisfaction Prediction", In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul. 6, 2014, 10 pages.

Cited By (6)

Publication number | Priority date | Publication date | Assignee | Title
US11386268B2 (en) | 2014-12-30 | 2022-07-12 | Microsoft Technology Licensing, Llc | Discriminating ambiguous expressions to enhance user experience
US10841251B1 (en) | 2020-02-11 | 2020-11-17 | Moveworks, Inc. | Multi-domain chatbot
US10798031B1 (en) | 2020-04-13 | 2020-10-06 | Moveworks, Inc. | Generic disambiguation
US11277360B2 (en) | 2020-04-13 | 2022-03-15 | Moveworks, Inc. | Generic disambiguation
US11250853B2 (en) | 2020-04-30 | 2022-02-15 | Robert Bosch Gmbh | Sarcasm-sensitive spoken dialog system
US12417767B2 (en) | 2021-12-16 | 2025-09-16 | Samsung Electronics Co., Ltd. | Electronic device and method to control external apparatus

Also Published As

Publication number | Publication date
AU2015374382A1 (en) | 2017-05-25
CN107111611A (en) | 2017-08-29
MX2017008583A (en) | 2017-11-15
AU2015374382B2 (en) | 2020-08-13
WO2016109307A2 (en) | 2016-07-07
KR102602475B1 (en) | 2023-11-14
RU2017122991A (en) | 2018-12-29
US20160188565A1 (en) | 2016-06-30
CA2968016A1 (en) | 2016-07-07
US20180089167A1 (en) | 2018-03-29
US11386268B2 (en) | 2022-07-12
BR112017010222A2 (en) | 2017-12-26
AU2020267218B2 (en) | 2021-12-09
EP3241125A2 (en) | 2017-11-08
MX367096B (en) | 2019-08-05
RU2017122991A3 (en) | 2019-07-17
AU2020267218A1 (en) | 2020-12-10
CA2968016C (en) | 2023-01-24
WO2016109307A3 (en) | 2016-10-06
JP2018506113A (en) | 2018-03-01
KR20170099917A (en) | 2017-09-01
JP6701206B2 (en) | 2020-05-27

Similar Documents

Publication | Title
AU2020267218B2 (en) | Discriminating ambiguous expressions to enhance user experience
US10007660B2 (en) | Contextual language understanding for multi-turn language tasks
AU2016209220B2 (en) | Methods for understanding incomplete natural language query
US11580350B2 (en) | Systems and methods for an emotionally intelligent chat bot
US9965465B2 (en) | Distributed server system for language understanding
US10360300B2 (en) | Multi-turn cross-domain natural language understanding systems, building platforms, and methods
US11017767B2 (en) | Hierarchical attention for spoken dialogue state tracking
US11914600B2 (en) | Multiple semantic hypotheses for search query intent understanding
WO2022099566A1 (en) | Knowledge injection model for generative commonsense reasoning
US20240321269A1 (en) | Generating Contextual Responses for Out-of-coverage Requests for Assistant Systems

Legal Events

Code | Title | Description
AS | Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ROBICHAUD, JEAN-PHILIPPE;SARIKAYA, RUHI;SIGNING DATES FROM 20141217 TO 20141223;REEL/FRAME:034613/0105

AS | Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034819/0001

Effective date: 20150123

STCF | Information on status: patent grant

Free format text: PATENTED CASE

MAFP | Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP | Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

