FIELD OF THE INVENTION
The present invention relates to assisting communications using semantic processing.
BACKGROUND
Companies need to efficiently interact with customers to provide services to their customers. For example, customers may need to obtain information about services of the company, may have a question about billing, or may need technical support from the company. Companies interact with customers in a variety of different ways. Companies may have a website and the customer may navigate the website to perform various actions. Companies may have an application (“app”) that runs on a user device, such as a smart phone or a tablet, that provides similar services as a website. Companies may have a phone number that customers can call to obtain information via interactive voice response or to speak with a customer service representative. Companies may also respond to customers using various social media services, such as Facebook or Twitter.
Some existing techniques for allowing customers to interact with companies may be a nuisance to the customer. Navigating to the right page on a website or an app or navigating a voice menu on a phone call may be time consuming. Some existing techniques for allowing customers to interact with companies may be expensive for a company to implement. Hiring customer service representatives to manually respond to requests and answer phone calls may be a significant expense. For at least these reasons, customer support facilities may not currently satisfy customer needs or may be expensive for the company to operate. Therefore, techniques for improving company interactions with customers are needed.
BRIEF DESCRIPTION OF THE FIGURES
The invention and the following detailed description of certain embodiments thereof may be understood by reference to the following figures:
FIG. 1 is an example system for suggesting resources to a customer service representative for responding to a customer.
FIG. 2 is an example of a user interface for a customer service representative for responding to a customer.
FIG. 3 is a flowchart of an example implementation of suggesting a resource to a customer service representative using a context of a session.
FIG. 4 is an example of data that may be used to train models for suggesting resources.
FIG. 5 is an example system for training a model for suggesting resources.
FIG. 6 is an example system for suggesting a resource using a context of a session.
FIGS. 7A and 7B are example systems for training a model for suggesting resources.
FIGS. 8A, 8B, and 8C are example systems for training a model for suggesting resources using hash vectors.
FIGS. 9A and 9B are example systems for suggesting a resource using hash vectors.
FIG. 10 is a flowchart for an example implementation of training a model for suggesting resources using hash vectors.
FIG. 11 is a flowchart of an example implementation of suggesting a resource using hash vectors.
FIG. 12 is an exemplary computing device that may be used to suggest resources and train models for suggesting resources.
DETAILED DESCRIPTION
Described herein are techniques for suggesting resources based on a context of interactions between two entities. Although the techniques described herein may be used for a wide variety of entities and interactions, for clarity of presentation, an example of a customer service representative of a company providing a response to a request of a customer will be used. The techniques described herein, however, are not limited to customers and companies; responses may be provided to requests from users who are not customers, and responses may be from an entity that is not a company, such as an individual. For instance, the interactions may be between an online instructor and a student, a contractor and specialists, an online publisher and commenters, and the like. The interactions may include text messages sent from one entity to another, but are not limited to text messages. For instance, interactions may be through social media channels, telephone communications, online interactions, and the like. One skilled in the art will appreciate that the techniques described herein may be used in a broad scope of interaction environments.
FIG. 1 illustrates a system 100 for suggesting a resource to a customer service representative in responding to a request of a customer. In FIG. 1, a customer may use customer device 110 to communicate with a company. Customer device 110 may include any appropriate device, such as a computer, smart phone, tablet, wearable device, or Internet of things device. The customer may submit the request using any appropriate techniques, such as typing or speaking a request to an app running on customer device 110 (e.g., an app of a particular company or a third-party app created for processing customer requests), typing or speaking a request on a web page, sending a text message, or sending an email. As used herein, a text message includes any message sent as text including but not limited to a message sent using SMS (short message service) or a special purpose application (e.g., Facebook messenger, Apple iMessage, Google Hangouts, or WhatsApp). Other customers may also interact with system 100, such as another customer using customer device 111.
The customer's request may be sent by customer device 110 to servers 130 of the company via network 120. Network 120 may include any appropriate networks, such as a local area network, a wide area network, the Internet, Wi-Fi, or a cellular data network. The request may be sent using any appropriate transmission protocols that include sending one or more of the text of the message or audio of the customer speaking the request. Where the customer speaks a request to customer device 110, speech recognition may be performed by customer device 110, at servers 130, or by another component of system 100.
Servers 130 receive the customer request and may coordinate further processing of the customer request. Servers 130 may include components comprising particular functionality or may be connected to other computing devices that include such components. For example, servers 130 may include support component 131 that facilitates a customer support session between a customer and a customer service representative (CSR). For example, support component 131 may select a CSR (e.g., CSR 150, 151, or 152), may transmit a message from a customer to a selected CSR, and may transmit a response from a CSR back to the customer. A CSR may use a user interface, such as an application on a computer or a web page, to receive customer requests and provide responses to them.
Servers 130 may include suggestion component 132 to suggest resources to a CSR in responding to a request of a customer. For example, suggestion component 132 may process messages and other information (e.g., images or URLs) transmitted between customer device 110 and the CSR to determine a context of the session. A context of a session may be any representation of a description of interactions between the customer and the CSR. For example, a context may be a vector computed using an artificial neural network that describes a meaning of the interactions. An interaction may include any information or data transmitted between the customer and the CSR. For example, a message is an example of an interaction. Suggestion component 132 may retrieve one or more resources from resources data store 140 and present information about the resources to the CSR to assist the CSR in responding to the customer. The suggested resources may include any information that may assist a CSR in responding to a customer. For example, the resources may include a text response (such as a text response used by the current CSR or another CSR in a previous support session similar to the current support session), an image, a URL to relevant information, a document, or a troubleshooting tree that may be used by the CSR for diagnosing problems.
The suggested resources may be presented on a user interface (UI) used by the CSR, such as the user interface of FIG. 2. The UI of FIG. 2 includes different portions that contain different types of information. For example, FIG. 2 includes a customer list portion 210 that includes a list of customers who the CSR is currently communicating with. In this example, the CSR is communicating with five different customers, and the customer named Cathy Washington is a selected or active customer. FIG. 2 also includes a conversation portion 220 that shows messages between the customer and the CSR.
FIG. 2 also includes a suggestions portion 230 that may present suggested resources to the CSR. In this example the suggested resources include three text responses 241-243, a URL 250, and a document 260. The CSR may use any of the suggestions by selecting them or using any other appropriate user interface techniques. For example, the CSR could click suggested response 241 to have it inserted into the text entry box of conversation portion 220, and then send it to the customer. Similarly, the CSR may select URL 250 or document 260 to send them to the customer or to view them.
FIG. 3 is a flowchart of an example implementation of suggesting resources to a CSR. In FIG. 3, the ordering of the steps is exemplary and other orders are possible; not all steps are required and, in some implementations, some steps may be omitted or other steps may be added. The process of the flowcharts may be implemented, for example, by any of the computers or systems described herein.
At step 310, a support session between a customer and a CSR is started. The support session may be started using any appropriate techniques, such as by sending a text message (e.g., SMS), sending an email message, using an application or app (e.g., Facebook or a company specific app), or using a webpage. The session may be started by either the customer or the CSR, and they may communicate via text, voice, or other means of interactive communication. A support session may comprise any number of interactions between the customer and the CSR.
At step 320, a message is received from either the customer or the CSR and transmitted to the other. For example, where the message is text, the text of messages between the customer and CSR may be presented in user interfaces to the customer and the CSR. Where the message is audio, automatic speech recognition may be used to convert the audio to text for subsequent processing.
At step 330, a context vector is computed that describes the interactions in the support session, as described in greater detail below. For example, a context vector may be computed by iteratively processing messages between the customer and the CSR. After receiving a message from the customer or CSR, a previous context vector may be updated using the message to generate a new context vector that accounts for the received message. A context vector need not be in the form of a vector and may take other formats, such as a matrix or tensor. A context vector may include any data that describes the interactions so far between the customer and the CSR, such as a vector that indicates a semantic meaning of the interactions.
At step 340, suggestions of resources for the CSR are obtained from a data store of resources using the context vector. For example, resources may be retrieved from the data store that have corresponding resource vectors that are an exact match to the context vector or that are close to the context vector (e.g., using a Euclidean or Hamming distance). The resources in the data store may include any resources that may assist the CSR. For example, the resources data store may include a large number of messages previously sent by CSRs to customers, and the context vector may be used to retrieve one or more previous messages that may be an appropriate response for the CSR to send to the customer. The resources in the resources data store may also include any of the other resources described above, such as images, documents, or URLs.
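The retrieval at step 340 can be illustrated with a minimal sketch. It assumes the data store is a small in-memory list of (resource, resource vector) pairs and uses a linear scan; the function names, the sample resources, and the vectors are hypothetical and are not taken from the described system.

```python
import math

def hamming_distance(a, b):
    """Count the positions at which two equal-length binary vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length real vectors."""
    return math.dist(a, b)

def suggest_resources(context_vector, resource_store, distance, top_n=3):
    """Return the top_n resources whose vectors are closest to the context vector.

    resource_store is a hypothetical list of (resource, resource_vector) pairs.
    """
    scored = [(distance(context_vector, vec), res) for res, vec in resource_store]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [res for _, res in scored[:top_n]]

# Hypothetical resources with made-up binary resource vectors.
store = [
    ("Please restart your router.", [1, 0, 1, 1]),
    ("Your bill is due on the 5th.", [0, 1, 0, 0]),
    ("speed-test URL", [1, 0, 1, 0]),
]
context = [1, 0, 1, 1]
suggestions = suggest_resources(context, store, hamming_distance, top_n=2)
```

An exact match corresponds to a distance of zero. A production data store would likely use an index rather than a linear scan, but the ranking idea is the same.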
At step 350, suggestions are presented to the CSR using the resources retrieved from the resources data store at step 340. For example, where the resources are previous messages, the text of the messages may be presented to the CSR. Where the resources are images, a thumbnail of the image may be presented along with a short description of the image. The CSR may then use the suggestions in responding to the customer. In some situations, the CSR may not use the suggestions at all and type or speak a response to the customer. In some situations, the CSR may click on a suggestion to send a response to the customer, such as by clicking a suggested message to send that same message to the customer.
At step 360, it is determined if the session is done. For example, the session may be terminated by either the customer or the CSR. If the session is not done, then processing proceeds to step 320 where another message is received from the customer or the CSR. A new context vector may then be computed using that message, resources may be retrieved from a data store using the new context vector, and suggestions may be presented to the CSR using the retrieved resources. If the session is done, then processing proceeds to step 370 where the support session is ended. For example, a connection between the customer and CSR may be terminated so that messages are no longer communicated between them.
The context vector computed during the processing of FIG. 3 may be created by processing the received message using one or more mathematical models, such as a neural network model. Mathematical models for creating context vectors may be trained by using a corpus of training data. For example, a corpus of training data may include data about previous support sessions between customers and CSRs.
FIG. 4 illustrates an example of a previous support session between a customer and a CSR that may be included as part of a training corpus for training models. In FIG. 4, time proceeds from the top to the bottom of the figure. The first column of the figure indicates messages sent by the customer to the CSR, the second column indicates messages sent by the CSR to the customer, and the third column indicates resources used by the CSR during the support session (e.g., an image, document, or URL).
In the example of FIG. 4, the first action is a customer sending a message to a CSR for assistance, such as “My Internet is not working. Can you help?” The second message is sent by the CSR to the customer, such as “Hi, my name is John, and I will be helping you today.” Similarly, subsequent messages may be sent between the customer and CSR as indicated by messages three through twelve. In this example, the CSR also uses resources in responding to the customer. For example, when sending message 5 to the customer, the CSR may include an image to assist the customer in configuring his router. Similarly, when sending message 10, the CSR may send a URL to the customer to allow the customer to check the speed of his network connection.
FIG. 5 illustrates a system 500 for training one or more models for suggesting resources that may be used by a CSR in a session with a customer. In FIG. 5, training corpus data store 510 may include information about previous sessions between customers and CSRs, such as the session presented in FIG. 4. Training corpus data store 510 may include a large number of sessions, such as tens of thousands or hundreds of thousands of sessions. In FIG. 5, resources data store 140 may include any resources that may be suggested to a CSR. For example, resources data store 140 may include some or all messages previously sent by a CSR to a customer (e.g., messages 2, 4, 5, 7, 10, and 12 from FIG. 4).
Model training component 520 may process training data from training corpus 510 and resources from resources data store 140 to generate one or more mathematical models, such as neural network models, for suggesting resources to a CSR. Model training component 520 may also generate resource vectors for the resources in the data store, where each resource vector describes the resource, such as by indicating a context in which the resource was previously used. The one or more models and resource vectors may be used to select resources as described in greater detail below.
FIG. 6 illustrates a system 600 for suggesting resources to a CSR using the one or more models created by system 500 of FIG. 5. In FIG. 6, context computation component 610 receives a message and generates a context vector by processing the message with the one or more models, such as one or more neural network models. In some implementations, context computation component 610 may iteratively process a sequence of messages in a session and generate a context vector after processing each message. For example, context computation component 610 may use a context vector from a previous message in generating a context vector for a current message. Accordingly, each context vector may be computed using a message and a previous context vector.
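The iterative update can be illustrated with a toy sketch in which each new context vector is a function of the current message and the previous context vector. The embedding function and the update rule below are placeholders chosen only for illustration (they are assumptions, not the trained models described herein), and the names `embed_message` and `update_context` are hypothetical.

```python
import math

def embed_message(text, dim=4):
    """Toy stand-in for a trained message embedding.

    Each word is assigned to one of dim buckets by summing its character codes.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    return vec

def update_context(previous_context, message_vector, decay=0.5):
    """Assumed update rule: tanh(decay * previous context + message vector), element-wise."""
    return [math.tanh(decay * c + m) for c, m in zip(previous_context, message_vector)]

def context_after_each_message(messages, dim=4):
    """Process messages in order and return the context vector after each one."""
    context = [0.0] * dim  # initial context for a new session
    history = []
    for text in messages:
        context = update_context(context, embed_message(text, dim))
        history.append(context)
    return history

history = context_after_each_message(
    ["My Internet is not working. Can you help?", "Hi, my name is John."]
)
```

Because the previous context feeds into each update, the context after the second message depends on the first message as well as the second.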
FIG. 6 includes search component 620 to search for resources using the context vector. In some implementations, search component 620 may query resources data store 140 to obtain resources. For example, search component 620 may obtain resources that have resource vectors that match the context vector or are close to the context vector. Search component 620 may output the retrieved resources, which may then be presented to a CSR.
FIGS. 7A and 7B illustrate additional details of an implementation of the system 500 of FIG. 5 for training models for suggesting resources. For example, systems 701 and 702 of FIGS. 7A and 7B may be part of model training component 520 of system 500.
In FIG. 7A, system 701 may train the one or more models using training data, such as the training data of training corpus 510. System 701 may iteratively process pairs of inputs where the pairs of inputs may comprise, for example, a message received from the customer and a resource used in response to the received message by the CSR in the training data. In one iteration, system 701 receives message 710 and resource used in response 720, where resource used in response 720 was used in response to message 710 in the training data. For example, message 710 may be message 1 of FIG. 4 and resource used in response 720 may be message 2 of FIG. 4.
System 701 may process these two inputs and update model parameters, such as model parameters used by session context component 730 and resource encoding component 740. The solid lines in FIG. 7A illustrate data flows for processing training data to determine updated model parameters, and the dashed lines in FIG. 7A illustrate data flows for updating the model parameters in the components of system 701. For example, optimization component 750 may compute updated model parameters and transmit the updated model parameters to other components.
Session context component 730 may process an interaction, such as any of the interactions of FIG. 4, and generate a context vector that describes the session between the customer and the CSR from the beginning of the session through the most recently processed interaction. The context vector may, for example, indicate (not necessarily in a human understandable way) a meaning of the interactions in the session. Any appropriate techniques may be used to compute a context vector, such as any of the techniques described in greater detail below.
Resource encoding component 740 may process the resource that was actually used in response to the message in the training data. Resource encoding component 740 may generate a resource vector that describes the resource (not necessarily in a human understandable way), for example by describing contexts where the resource was previously used by a CSR. Any appropriate techniques may be used to compute a resource vector, such as any of the techniques described in greater detail below.
Optimization component 750 receives the context vector from session context component 730 and the resource vector from resource encoding component 740. Optimization component 750 then determines updated model parameters so that the context vector is “closer” to the resource vector in some sense. Because the corresponding resource was actually used at this point of the training data, optimization component 750 determines updated model parameters to increase a similarity between the context vector and the resource vector. The meaning of closeness and similarity with regards to the context and resource vectors may correspond to any mathematical notion of similarity, such as a distance or measure. Because session context component 730 and resource encoding component 740 are trained in parallel, the parameters of these components may converge in parallel so that the context vector may be used to identify resources (via resource vectors) that are relevant to the current session.
Session context component 730 may iteratively process interactions in a session. For example, session context component 730 may iteratively process some or all of the interactions of FIG. 4 and generate a context vector at each iteration. The interactions processed by session context component 730 may depend on the implementation. In some implementations, session context component 730 may process only messages received from the customer (e.g., messages 1, 3, 6, 8, 9, and 11 of FIG. 4). In some implementations, session context component 730 may process all messages between the customer and the CSR. In some implementations, session context component 730 may process all interactions in the session, including for example, resources used by the CSR (e.g., resources 1 and 2 of FIG. 4) and any other resources described herein.
An example of the process of system 701 is now given using the example training data of FIG. 4. For a first iteration, message 710 may be message 1 of FIG. 4 and resource used in response 720 may be message 2 of FIG. 4. Session context component 730 may compute a context vector for the session although at this point, the session is just message 1 of FIG. 4. Resource encoding component 740 may compute a resource vector, and optimization component 750 may update model parameters.
For a second iteration, session context component 730 may update the context vector by processing message 2 of FIG. 4. During the second iteration, a resource may not be processed since no resource was used at that point in the session. Where no resource is processed, no optimization may be performed either. In some implementations, the second iteration may not be performed, such as when the context is computed using only messages from the customer.
For a third iteration, message 710 may be message 3 of FIG. 4 and resource used in response 720 may be message 4 of FIG. 4. Session context component 730 may again compute a context for the session and now the context for the session may include messages 1-3 of FIG. 4. As with the first iteration, a resource vector may be computed, and model parameters updated.
For a fourth iteration, message 710 may be message 4 of FIG. 4 and resource used in response 720 may be resource 1 of FIG. 4. Session context component 730 may again compute a context for the session and now the context for the session may include messages 1-4 of FIG. 4. As above, a resource vector may be computed, and model parameters updated.
Subsequent iterations may process the remaining interactions of FIG. 4. After all the interactions of FIG. 4 have been processed, the context vector may be reset to an initial state to prepare for processing another session of training data.
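The iterations described above can be summarized with a short sketch that turns one session into (context, resource) training pairs. It assumes the variant in which all messages contribute to the context, and it represents the session as a hypothetical list of (sender, label) tuples covering only the first five interactions discussed in the text; the function name `training_pairs` is an assumption made for illustration.

```python
def training_pairs(interactions):
    """Build (context_so_far, resource_used) pairs from one session.

    interactions: list of (sender, label) tuples in time order, where sender is
    "customer" or "csr". A pair is produced whenever the interaction that follows
    the current one came from the CSR; otherwise that iteration only updates the
    context and no optimization step is performed.
    """
    pairs = []
    for t in range(len(interactions) - 1):
        next_sender, next_label = interactions[t + 1]
        if next_sender == "csr":
            context = [label for _, label in interactions[: t + 1]]
            pairs.append((context, next_label))
    return pairs

# The first interactions of the example session, in the order the text describes.
session = [
    ("customer", "message 1"),
    ("csr", "message 2"),
    ("customer", "message 3"),
    ("csr", "message 4"),
    ("csr", "resource 1"),
]
pairs = training_pairs(session)
```

With this input, pairs are produced for the first, third, and fourth iterations, and the second iteration (processing message 2) yields no pair because the next interaction is a customer message.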
As the model parameters are updated by optimization component 750, the model parameters may be transmitted to session context component 730 and resource encoding component 740 for subsequent processing. In some implementations, the model parameters may be transmitted to the components each time they are updated or they may be transmitted after processing is completed for an entire session of the training data or for multiple sessions of the training data.
The processing described above may be completed for an entire corpus of training data. Each session in the training corpus may be processed one or more times to train the one or more models used by session context component 730 and resource encoding component 740. When the training is complete, as determined by any suitable termination criteria, the trained one or more models may be used for subsequent processing.
FIG. 7B illustrates an example system 702 for computing resource vectors for each resource in resources data store 140 using the one or more models trained by system 701 of FIG. 7A. In FIG. 7B, resource encoding component 740 receives a resource and generates a resource vector for that resource. A resource vector for a resource may be stored in a data store, such as data store 140, in association with the resource to facilitate retrieval of the corresponding resource.
The one or more models and resource vectors may then be used to suggest resources to a CSR. For example, context computation component 610 of system 600 may use the one or more models to generate a context vector for a session, and search component 620 may compare the context vector to previously stored resource vectors to retrieve one or more resources as suggestions for a CSR.
FIGS. 8A, 8B, and 8C illustrate additional details of an implementation of the systems of FIGS. 7A and 7B for training one or more models for suggesting resources. The examples of FIGS. 8A, 8B, and 8C are not limiting and many variations of these examples are possible.
In FIG. 8A, system 801 may train the one or more models using training data, such as the training data of FIG. 4. System 801 takes as input message 805 (similar to message 710) and resource used in response 806 (similar to resource used in response 720), where resource used in response 806 was used in response to message 805 in the training data. In addition, system 801 takes as input other resource 807, which may be any other resource, such as a randomly selected resource from resources data store 140. Other resource 807 is used during optimization as described in greater detail below.
In system 801, session context component 830 may compute a context vector similar to the context vector computed by session context component 730. Similarly, optimization component 850 may update parameters of the one or more models similar to the operations of optimization component 750. As with FIG. 7A, the solid lines in FIG. 8A illustrate data flows for processing training data to determine updated model parameters, and the dashed lines in FIG. 8A illustrate data flows for updating the model parameters in the components of system 801. Other components of system 801 perform additional processing that is now described.
Message feature extraction component 810 receives message 805 and extracts features from the message. Any appropriate features may be extracted. For example, message feature extraction component 810 may generate a feature vector, where the length of the feature vector is the size of the vocabulary of known words (and possibly n-grams) and each element of the vector indicates a number of times the corresponding word appeared in the message. In another example, message feature extraction component 810 may generate a feature matrix, where the number of rows (or columns) in the matrix is equal to the number of words in the message and each row represents the corresponding word as a word embedding (e.g., an N-dimensional vector of real numbers that represents the corresponding word).
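The first kind of feature vector (word counts over a fixed vocabulary) can be sketched as follows. The tiny vocabulary, the whitespace tokenization, and the punctuation stripping are simplifying assumptions made for illustration, and the function name `count_features` is hypothetical.

```python
def count_features(message, vocabulary):
    """Return a vector with one element per vocabulary word, counting occurrences."""
    index = {word: i for i, word in enumerate(vocabulary)}
    features = [0] * len(vocabulary)
    for token in message.lower().split():
        token = token.strip(".,?!")  # crude punctuation removal
        if token in index:
            features[index[token]] += 1
    return features

# Hypothetical five-word vocabulary; a real vocabulary would be far larger.
vocabulary = ["internet", "router", "not", "working", "help"]
features = count_features("My Internet is not working. Can you help?", vocabulary)
```

Words outside the vocabulary (here "my", "is", "can", and "you") are ignored, so the vector length always equals the vocabulary size regardless of message length.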
Message embedding component 820 receives the features from message feature extraction component 810 and outputs a semantic representation of the message, such as a message embedding. For example, a message embedding may be a fixed-length vector that represents a meaning of the message in an N-dimensional space. Any appropriate techniques may be used to generate the message embedding from the message features. For example, message embedding component 820 may include any unsupervised embedding method (e.g., Word2Vec, GloVe, SPPMI-SVD), a multi-layer perceptron, or another type of neural network (e.g., a deep averaging, convolution, recurrent, or recursive neural network). In some implementations, message embedding component 820 may receive TF-IDF (term frequency, inverse document frequency) features and use a multi-layer perceptron with rectified linear units.
Session context component 830 may receive the message embedding from message embedding component 820 and generate a context vector for the session (e.g., a context vector for the current message and one or more previous messages in the session). Any appropriate techniques may be used to generate a context vector for the session. In some implementations, a context vector may be computed using a neural network, such as a single-layer convolution neural network that processes a linear combination of a fixed-size window of messages or a recurrent neural network (RNN) with long short-term memory units. Now described is an example RNN with long short-term memory units that may be implemented by session context component 830.
Where system 801 is processing a sequence of messages in a session, let m_t represent a message in the session for t from 1 to s, where s is the number of messages in the session so far. For each message, m_t, let x_t represent the features for the message as computed by message feature extraction component 810, and let f(x_t) represent the message embedding as computed by message embedding component 820. Let a be the length of a vector representing the message embedding and d be the hidden vector size of the RNN.
In some implementations, session context component 830 may compute a context vector as follows:
where the U_i, U_o, U_f, U_y are d by a matrices of parameters; the V_i, V_o, V_f, V_y are d by d matrices of parameters; ⊙ is the element-wise multiplication operator; y_0 is initialized as a zero vector; and c_0 is initialized as a zero vector. After computing the above, the vector y_t is a context vector that indicates a context of the session after processing message m_t.
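The equations themselves are not reproduced in this text. As a hedged reconstruction, one standard long short-term memory formulation that is consistent with the parameter matrices and initial vectors named above is sketched below. The intermediate symbols i_t, o_t, f_t, and c̃_t (input gate, output gate, forget gate, and candidate cell state), the sigmoid function σ, and the absence of bias terms are assumptions; the gate f_t should not be confused with the embedding function f(x_t).

```latex
\begin{aligned}
i_t &= \sigma\left(U_i f(x_t) + V_i y_{t-1}\right) \\
o_t &= \sigma\left(U_o f(x_t) + V_o y_{t-1}\right) \\
f_t &= \sigma\left(U_f f(x_t) + V_f y_{t-1}\right) \\
\tilde{c}_t &= \tanh\left(U_y f(x_t) + V_y y_{t-1}\right) \\
c_t &= i_t \odot \tilde{c}_t + f_t \odot c_{t-1} \\
y_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

In this form each U matrix (d by a) maps the length-a message embedding into the hidden space, and each V matrix (d by d) maps the previous length-d context vector, so every term on the right-hand side has length d.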
Encoding component 840 receives the context vector from session context component 830 and outputs an encoded context vector. As explained in greater detail below, the encoded context vector may be referred to as an approximate context hash vector and later processing may compute a context hash vector from the approximate context hash vector.
In some implementations, encoding component 840 may perform a divide-and-encode operation so that the output has a smaller dimension than the input and that the output is an approximately binary vector (many of the elements of the output vector are close to 0 or 1). For example, where the context vector has length d, the encoded context vector may have length k, where d is divisible by k. In some implementations, the encoded context vector may be computed as
where M_t[j] indicates the j-th element of the vector M_t for j from 1 to k, w_j is a parameter vector of length d/k for j from 1 to k, the superscript (j) means the j-th slice of length d/k, and ε is a small positive number. The encoded context vector may be represented as M_t. In some implementations, encoding component 840 may be implemented using a single layer perceptron with sigmoid units.
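The formula is not reproduced in this text. The sketch below shows one common form of divide-and-encode that fits the symbols defined above: each slice of length d/k is projected to a scalar with its own weight vector w_j, passed through a sigmoid, and then snapped to 0 or 1 when it falls outside a band of width 2ε around 0.5. The thresholding rule, the default ε, and the function name `divide_and_encode` are assumptions made for illustration.

```python
import math

def divide_and_encode(context_vector, weights, epsilon=0.1):
    """Encode a length-d vector into a length-k, approximately binary vector.

    weights is a list of k parameter vectors, each of length d/k.
    """
    k = len(weights)
    d = len(context_vector)
    if d % k != 0:
        raise ValueError("d must be divisible by k")
    slice_len = d // k
    encoded = []
    for j, w_j in enumerate(weights):
        chunk = context_vector[j * slice_len:(j + 1) * slice_len]  # j-th slice
        projection = sum(w * y for w, y in zip(w_j, chunk))
        s = 1.0 / (1.0 + math.exp(-projection))  # sigmoid of the projection
        if s < 0.5 - epsilon:
            encoded.append(0.0)   # clearly "off"
        elif s > 0.5 + epsilon:
            encoded.append(1.0)   # clearly "on"
        else:
            encoded.append(s)     # ambiguous values are left unchanged
    return encoded

# Made-up context vector with d = 6 and unit weights with k = 3.
y_t = [2.0, 1.0, -3.0, -1.0, 0.1, -0.1]
w = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
m_t = divide_and_encode(y_t, w)
```

Leaving mid-range values untouched is what makes the output only approximately binary; a later step would still be needed to produce a strictly binary hash vector.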
Optimization component 850 then receives the encoded context vector and uses it to update model parameters. Optimization component 850 may also take as inputs one or more encoded resource vectors, and optimization component 850 is described in greater detail below following the description of computation of the encoded resource vectors.
System 801 also receives as input resource used in response 806, where resource used in response 806 is the resource that was actually used in response to message 805 in the training data. Resource used in response 806 may be any type of resource described herein, such as a message by a CSR to respond to a customer message or an image used by a CSR in responding to a customer.
Resource feature extraction component 815 receives resource used in response 806 and extracts features from the resource. Any appropriate features may be extracted, and the type of features extracted may depend on a type of the resource. For example, a first type of features may be extracted where the resource is a message and a second type of features may be extracted where the resource is an image. Where the resource is a message, resource feature extraction component 815 may provide the same functionality as message feature extraction component 810. Where the resource is an image, resource feature extraction component 815 may, for example, use pixel values as features or compute SIFT (scale-invariant feature transform), SURF (speeded up robust features), or HOG (histogram of oriented gradients) features.
Resource embedding component825 receives features fromresource extraction component815 and outputs a semantic representation of the resource, such as resource embedding. For example, a resource embedding may be a fixed-length vector that represents a meaning of the resource in an N-dimensional space. Any appropriate techniques may be used to generate the resource embedding from the resource features, and the type of embedding may depend on a type of the resource. Where the resource is a message, the functionality ofresource embedding component825 may (but need not be) the same as message embedding component. Where the resource is an image,resource embedding component825 may, for example, embed the image by processing it with a convolution neural network.
Encoding component 841 may receive the resource embedding from resource embedding component 825 and output an encoded resource vector. Encoding component 841 may provide the same functionality as encoding component 840. The encoded resource vector for the resource used in response 806 may be represented as $N_t$.
System 801 also receives as input other resource 807. Other resource 807 may be any other resource in resources data store 140. For example, other resource 807 may be a randomly selected resource from resources data store 140. In some implementations, a type of other resource 807 may be the same type as resource used in response 806 (e.g., if resource used in response 806 is an image, then other resource 807 may be a randomly selected image from resources data store 140). The use of other resource 807 may improve the convergence of the algorithms implemented by optimization component 850.
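As a minimal sketch of the random selection described above, the following assumes that each resource is a dictionary with hypothetical `id` and `type` keys; the function name `sample_other_resource` and the record layout are illustrative assumptions only.

```python
import random


def sample_other_resource(resources, used, rng=random):
    """Pick a random resource of the same type as `used`, excluding `used`."""
    candidates = [
        r for r in resources
        if r["type"] == used["type"] and r["id"] != used["id"]
    ]
    if not candidates:
        return None  # No other resource of this type is available.
    return rng.choice(candidates)


# Hypothetical contents of a resources data store.
resources = [
    {"id": "img-1", "type": "image"},
    {"id": "img-2", "type": "image"},
    {"id": "msg-1", "type": "message"},
]
used = resources[0]
other = sample_other_resource(resources, used, random.Random(0))
```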
Resource embedding component 826 may perform the same processing as resource embedding component 825, and encoding component 842 may perform the same processing as encoding component 841. Accordingly, encoding component 842 may output an encoded resource vector for other resource 807, which may be represented as $\tilde{N}_t$.
Accordingly, optimization component 850 may receive encoded context vector $M_t$, encoded resource vector $N_t$ for resource used in response 806, and encoded resource vector $\tilde{N}_t$ for other resource 807. Optimization component 850 may use the encoded context vector and encoded resource vectors to update parameter values for any parameters in system 801, such as parameters used by message embedding component 820, resource embedding component 825, session context component 830, and encoding component 840.
In some implementations, optimization component 850 may update parameter values by using stochastic gradient descent, for example by minimizing a triplet rank loss as a cost function. For example, the triplet rank loss cost function may be represented as
$$L = \max\left(0,\; 1 + \lVert M_t - N_t \rVert_2^2 - \lVert M_t - \tilde{N}_t \rVert_2^2\right)$$
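The loss above can be evaluated directly. The sketch below is illustrative only: the function names and the sample vectors are assumptions, and the margin of 1 is exposed as a parameter for convenience.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_rank_loss(m, n_used, n_other, margin=1.0):
    """max(0, margin + ||m - n_used||^2 - ||m - n_other||^2)."""
    return max(
        0.0,
        margin + squared_distance(m, n_used) - squared_distance(m, n_other),
    )


# Hypothetical encoded vectors.
context = [1.0, 0.0, 1.0, 0.0]
near = [1.0, 0.0, 1.0, 1.0]   # squared distance 1 from the context
far = [0.0, 1.0, 0.0, 1.0]    # squared distance 4 from the context

loss_good = triplet_rank_loss(context, near, far)  # used resource is closer
loss_bad = triplet_rank_loss(context, far, near)   # used resource is farther
```

The loss is zero when the resource actually used is closer to the context than the other resource by at least the margin, and positive otherwise, so minimizing it pulls the used resource toward the context and pushes the other resource away.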
In some implementations, system 801 may receive as input additional resources, such as a second other resource (not shown), where the second other resource may be another randomly selected resource. An encoded resource vector may be computed for the second other resource, and this additional encoded resource vector may be used by optimization component 850 in determining updated parameters.
As above with FIG. 7A, the updated parameters computed by optimization component 850 may be transmitted to other components of system 801. For example, updated parameters may be transmitted after each iteration, after processing an entire session from the training corpus, or after processing multiple sessions from the training corpus.
After system 801 has completed the training process, additional processing may be performed to improve the performance of a system for suggesting resources. Above, the encoded context vector computed by encoding component 840 was referred to as an approximate context hash vector, and the encoded resource vector computed by encoding component 841 was referred to as an approximate resource hash vector. To improve the performance of a system for suggesting resources, a context hash vector may be computed from the approximate context hash vector and a resource hash vector may be computed from an approximate resource hash vector.
A hash vector may provide improved performance in searching for resources over an approximate hash vector. An approximate hash vector may contain real numbers (although many may be close to 0 or 1 as noted above) while a hash vector may contain only boolean values. Performing a search with hash vectors may allow for a quicker search. A hash vector, as used herein, is not limited to storing values in a vector form, and a hash vector may include storing values as a matrix or tensor as the techniques described herein are not limited to any precise arrangement of hash values.
FIGS. 8B and 8C illustrate exemplary systems for creating hash vectors from approximate hash vectors.
FIG. 8B illustrates a system 802 for generating an approximate resource hash vector for resources, such as each of the resources in resources data store 140. In FIG. 8B, each of the components may have the same implementation as the corresponding components of FIG. 8A. For each resource being processed, features may be computed by resource feature extraction component 815, a resource embedding may be computed by resource embedding component 825, and an approximate resource hash vector (also called an encoded resource vector) may be computed by encoding component 840. Accordingly, an approximate resource hash vector may be computed for each resource.
FIG. 8C illustrates a system 803 for training a quantization model for computing hash vectors from approximate hash vectors. In FIG. 8C, quantization model training component 860 receives as input all of the approximate resource hash vectors computed by system 802. Quantization model training component 860 then generates a model that allows approximate hash vectors to be converted to hash vectors. For example, a quantization model may implement a rotation.
In some implementations, quantization model training component 860 may be implemented as follows. A matrix $X$ may be created where each row of the matrix $X$ is an approximate resource hash vector. This matrix may then be average-centered. An average row vector may be computed as
$$\mu = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where n is the number of rows and $X_i$ is the i-th row of $X$. The matrix $X$ may then be average-centered by subtracting the average row vector $\mu$ from each row of the matrix, giving the average-centered matrix $\bar{X}$.
The average-centered matrix $\bar{X}$ may then be used to train a rotation matrix $R$ for generating hash vectors. The rotation matrix may be initialized, such as by initializing it to a random rotation. The rotation matrix may then be trained by sequentially performing the following updates:
$$B = \operatorname{sign}(\bar{X}R)$$
$$U, S, V = \operatorname{SVD}(B^{T}\bar{X})$$
$$R = VU^{T}$$
where sign( ) returns a matrix of 1's and −1's according to the sign of corresponding elements of the input and SVD( ) performs a singular value decomposition of the input. This sequence of operations may be performed until a convergence criterion has been met. Each row of the final matrix $B$ contains a resource hash vector for a corresponding resource, and the final matrix $B$ may have values of only 1 and −1. In some implementations, the matrix $B$ may be converted to a matrix of 1s and 0s by converting all the −1s to 0s or performing some other similar operation. The quantization model comprises rotating a vector with rotation matrix $R$ and then performing the sign( ) operation on the resulting vector.
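The alternating updates above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the described implementation: the function name `train_rotation` is hypothetical, a fixed iteration count stands in for the convergence criterion, the random initialization is an orthogonal matrix obtained from a QR decomposition (a rotation, possibly combined with a reflection), and zeros are mapped to +1 so that the output contains only 1 and −1.

```python
import numpy as np


def train_rotation(approx_hashes, iterations=50, seed=0):
    """Train a rotation for turning approximate hash vectors into hash vectors.

    approx_hashes: (n, k) array, one approximate resource hash vector per row.
    Returns (rotation, hashes): a (k, k) orthogonal matrix and an (n, k)
    matrix whose entries are all 1 or -1.
    """
    x = np.asarray(approx_hashes, dtype=float)
    # Average-center: subtract the average row vector from every row.
    centered = x - x.mean(axis=0)
    k = centered.shape[1]

    # Assumed initialization: random orthogonal matrix from a QR decomposition.
    rng = np.random.default_rng(seed)
    rotation, _ = np.linalg.qr(rng.standard_normal((k, k)))

    for _ in range(iterations):
        # B = sign(X R), with zeros mapped to +1.
        b = np.where(centered @ rotation >= 0, 1.0, -1.0)
        # U, S, V = SVD(B^T X); numpy returns V transposed as the third value.
        u, _, vt = np.linalg.svd(b.T @ centered)
        # R = V U^T
        rotation = vt.T @ u.T

    hashes = np.where(centered @ rotation >= 0, 1.0, -1.0)
    return rotation, hashes


# Hypothetical approximate hash vectors: 20 resources, hash length 4.
approx = np.random.default_rng(1).random((20, 4))
rotation, hashes = train_rotation(approx)
```

The matrix `hashes` plays the role of the final matrix B; converting it to 1s and 0s could be done with `(hashes > 0).astype(int)`.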
FIGS. 9A and 9B illustrate systems for suggesting a resource to a CSR using hash vectors. At the beginning of a session with a CSR, a first message is received. The first message may be from either the customer or the CSR. The received message is processed to compute a context hash vector, and then the context hash vector is used to retrieve resources from resources data store 140.
In system 901 of FIG. 9A, message feature extraction component 810 computes features for the message, message embedding component 820 computes a message embedding from the features, session context component 830 computes a context vector for the session (which may include only the first message at this point), encoding component 840 computes an approximate context hash vector, and quantization component 910 computes a context hash vector. Quantization component 910 may receive the approximate context hash vector (which may have real values) and may perform a rotation and then compute the sign( ) of the rotated vector to generate the context hash vector (which may have values of only 1 and −1). In some implementations, the context hash vector may be converted to a vector of 1s and 0s similar to the processing of the resource hash vectors above.
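The rotate-then-sign step of the quantization component can be sketched in a few lines. The function name `quantize`, the row-vector convention, and the 2×2 rotation used in the example are assumptions chosen only for illustration.

```python
def quantize(approx_hash, rotation):
    """Rotate a length-k row vector by a k x k matrix, then take signs.

    Returns (signs, bits): the hash vector with values 1 and -1, and the
    same vector with every -1 converted to 0.
    """
    k = len(approx_hash)
    rotated = [
        sum(approx_hash[i] * rotation[i][j] for i in range(k))
        for j in range(k)
    ]
    # Zeros are treated as positive so that only 1 and -1 are produced.
    signs = [1 if value >= 0 else -1 for value in rotated]
    bits = [1 if s == 1 else 0 for s in signs]
    return signs, bits


# Hypothetical 2-dimensional example using a 90-degree rotation.
rotation_90 = [[0.0, 1.0], [-1.0, 0.0]]
signs, bits = quantize([0.9, 0.2], rotation_90)
```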
In system 902, the context hash vector is used to obtain resources to suggest to a CSR. Search component 920 receives a context hash vector and retrieves resources from resources data store 140 by comparing the context hash vector to resource hash vectors in the data store.
In some implementations, search component 920 may retrieve all resources where the resource hash vector of the resource is equal to the context hash vector by performing a query using the context hash vector.
In some implementations, search component 920 may retrieve all resources where the resource hash vector of the resource is within a Hamming radius of the context hash vector. A Hamming radius of a hash vector may comprise all other vectors where the number of different elements is less than or equal to a specified value. A Hamming radius of 1 for a context hash vector would include a resource hash vector that is identical to the context hash vector and all resource hash vectors whose elements are the same as the context hash vector for all but one element. For example, for a context hash vector of [1, 0, 1, 0], resource hash vectors within a Hamming distance of 1 would include [1, 0, 1, 0]; [0, 0, 1, 0]; [1, 1, 1, 0]; [1, 0, 0, 0]; and [1, 0, 1, 1]. Search component 920 may determine all resource hash vectors within a Hamming radius of the context hash vector and retrieve corresponding resources from resources data store 140.
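A Hamming-radius search can be sketched as a linear scan over stored hash vectors. The resource identifiers and the mapping from identifier to hash vector below are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length hash vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def within_radius(context_hash, resource_hashes, radius=1):
    """Return ids of resources whose hash is within `radius` of the context."""
    return [
        resource_id
        for resource_id, resource_hash in resource_hashes.items()
        if hamming_distance(context_hash, resource_hash) <= radius
    ]


# Hypothetical resource hash vectors keyed by resource id.
resource_hashes = {
    "greeting": (1, 0, 1, 0),
    "billing": (0, 0, 1, 0),
    "outage": (0, 1, 0, 1),
    "reset": (1, 0, 1, 1),
}
matches = within_radius((1, 0, 1, 0), resource_hashes, radius=1)
```

With the context hash vector [1, 0, 1, 0] from the example above, the scan keeps the identical vector and the two stored vectors that differ in exactly one element.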
In some implementations, search component 920 may implement an inverted index to speed up retrieval of resources using a context hash vector. An inverted index may include a list of resources corresponding to each possible resource hash vector, and allow for fast retrieval of resources from resources data store 140.
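One possible shape for such an inverted index is a dictionary keyed by hash vector. In the sketch below, the function names, the use of 0/1 tuples as keys, and the sample resources are assumptions; lookups within a Hamming radius are done by enumerating every key within that radius rather than scanning all resources.

```python
from collections import defaultdict
from itertools import combinations


def build_index(resource_hashes):
    """Map each hash vector (as a tuple of 0s and 1s) to a list of ids."""
    index = defaultdict(list)
    for resource_id, resource_hash in resource_hashes.items():
        index[tuple(resource_hash)].append(resource_id)
    return index


def neighbors(bits, radius):
    """Yield every 0/1 tuple within the given Hamming radius of `bits`."""
    bits = tuple(bits)
    for flips in range(radius + 1):
        for positions in combinations(range(len(bits)), flips):
            flipped = list(bits)
            for p in positions:
                flipped[p] = 1 - flipped[p]
            yield tuple(flipped)


def lookup(index, context_hash, radius=0):
    """Collect ids stored under any key within `radius` of the context hash."""
    found = []
    for key in neighbors(context_hash, radius):
        found.extend(index.get(key, []))
    return found


# Hypothetical resource hash vectors keyed by resource id.
index = build_index({
    "greeting": (1, 0, 1, 0),
    "billing": (0, 0, 1, 0),
    "outage": (0, 1, 0, 1),
    "reset": (1, 0, 1, 1),
})
exact = lookup(index, (1, 0, 1, 0))
nearby = lookup(index, (1, 0, 1, 0), radius=1)
```

Enumerating neighboring keys keeps lookup cost independent of the number of stored resources, but the number of keys grows quickly with the radius and the hash length, so this approach suits small radii.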
In some implementations, system 902 may include post-processing component 930. Post-processing component 930 may receive a list of resources from search component 920 and perform additional processing to determine which resources to present to a CSR. Post-processing component 930 may rerank the resources received from search component 920 or may select no resources so that no suggestions are presented to a CSR. In some implementations, post-processing component 930 may use a translation language model that was trained to translate between messages and resources used in response to messages, may apply any statistical machine translation techniques that indicate a match between a message and a resource, may use a ranking support vector machine that processes TF-IDF features of a previous message, or may use any other known reranking techniques.
The resource suggestions may then be presented to a CSR, such as by presenting information about the suggested resources in a user interface, such as the user interface of FIG. 2. The CSR may then use the suggested resources in a conversation with the customer. For example, where the suggested resource is a message, the CSR may send that message to the customer or may modify it and then send it to the customer.
FIG. 10 is a flowchart of an example implementation of training one or more models for computing hash vectors for suggesting resources. In FIG. 10, the ordering of the steps is exemplary and other orders are possible, not all steps are required and, in some implementations, some steps may be omitted or other steps may be added. The process of the flowcharts may be implemented, for example, by any of the computers or systems described herein.
At step 1010, a message sequence is obtained from a training corpus, such as the message sequence of FIG. 4 or a portion of the message sequence of FIG. 4. The message sequence may include any number of messages between a customer and a CSR. The message sequence may also include other information, such as other resources used by a CSR during the session between the customer and the CSR. For example, the message sequence may be messages 1-3 of FIG. 4.
At step 1020, an approximate context hash vector is computed for the message sequence. The approximate context hash vector may be computed using any of the techniques described above. For example, an approximate context hash vector may be computed iteratively for each message in the sequence, where each approximate context hash vector is computed using the approximate context hash vector from the previous iteration. Computing the approximate context hash vector may also include processing other resources used by a CSR, such as when the CSR uses an image in responding to the customer. In some implementations, an approximate context hash vector may be computed by message feature extraction component 810, message embedding component 820, session context component 830, and encoding component 840.
At step 1030, a response of the CSR to the last message in the message sequence is obtained from the training data. For example, the response of the CSR may be message 4 of FIG. 4.
At step 1040, an approximate resource hash vector is computed for the response. The approximate resource hash vector may be computed using any of the techniques described above. In some implementations, an approximate resource hash vector for the response may be computed by resource feature extraction component 815, resource embedding component 825, and encoding component 841.
At step 1050, a different resource is obtained, i.e., a resource other than the response of the CSR to the customer at step 1040. For example, a resource may be selected randomly from resources data store 140.
At step 1060, an approximate resource hash vector is computed for the different resource obtained at step 1050. The approximate resource hash vector for the different resource may be computed using any of the techniques described above. In some implementations, the approximate resource hash vector for the different resource may be computed by resource feature extraction component 816, resource embedding component 826, and encoding component 842.
In some implementations, multiple different resources may be used as described above. Accordingly, steps 1050 and 1060 may be performed for each different resource that is used.
At step 1070, model parameters are updated using the approximate context hash vector, the approximate resource hash vector for the response of the CSR, and one or more approximate resource hash vectors for the different resources. The model parameters may be updated using any of the techniques described herein.
At step 1080, it is determined whether the training process has completed. Any appropriate criteria may be used to determine when the training process has completed. For example, where the model parameters have converged to stable values (e.g., the differences with a previous iteration are small), it may be determined that the training has completed. Where training has not completed, processing may return to step 1010 to, for example, obtain and process the following message in a message sequence.
At step 1090, training is completed and the trained one or more models may be further processed (e.g., compute approximate resource hash vectors, train a quantization model, and compute resource hash vectors) and used for suggesting resources.
FIG. 11 is a flowchart of an example implementation of suggesting resources using hash vectors. In FIG. 11, the ordering of the steps is exemplary and other orders are possible, not all steps are required and, in some implementations, some steps may be omitted or other steps may be added. The process of the flowcharts may be implemented, for example, by any of the computers or systems described herein.
At step 1110, a message is received from either a customer or a CSR. For example, the message may be sent from a customer to the CSR or from the CSR to the customer. The message may be the first message between them or it may follow other messages between them.
At step 1120, a semantic representation is computed from the message. The semantic representation may be any representation of the message that indicates a meaning of the message, although the semantic representation may not be understandable by a person. The semantic representation may be a vector of real numbers or may take another form, such as a matrix or a tensor. In some implementations, the semantic representation may be a message embedding computed by message feature extraction component 810 and message embedding component 820.
At step 1130, a context vector is computed for the session using the semantic representation of the message. The context vector may be any representation of the session that indicates a meaning of the session (for example, the meaning of the session may include a meaning of the current message and a meaning of previous messages in the session). The context vector may be computed using a context vector from a previous iteration that processed a previous message in the session. The context vector may be a vector of real numbers or may be in another format, such as a matrix or a tensor (the term context vector is used for clarity of presentation but a context matrix or context tensor may be computed instead). In some implementations, the context vector may be computed using session context component 830.
At step 1140, a context hash vector is computed for the session. The context hash vector may be any hash vector that indicates a meaning of the session (for example, the meaning of the session may include a meaning of the current message and a meaning of previous messages in the session). The context hash vector may be a vector where each element of the vector takes one of two values, such as 0 or 1. In some implementations, a context hash matrix or a context hash tensor may be computed instead of a context hash vector. In some implementations, the context hash vector may be computed using encoding component 840 and quantization component 910.
At step 1150, one or more resources are obtained using the context hash vector. For example, one or more resources may be retrieved from a data store of resources where a resource hash vector of a resource matches or is close to the context hash vector. In some implementations, resources may be obtained where the resource hash vectors are within a Hamming distance of the context hash vector. In some implementations, the one or more resources may be obtained by search component 920.
At step 1160, post-processing may be performed on the obtained resources as described above. In some implementations, the post-processing may be performed by post-processing component 930.
At step 1170, one or more resources are caused to be presented to the CSR. For example, the one or more resources may be presented to the CSR using the user interface of FIG. 2. The CSR may then use the resource in responding to the customer as described above.
FIG. 12 illustrates components of one implementation of a computing device 1200 for implementing any of the techniques described above. In FIG. 12, the components are shown as being on a single computing device 1200, but the components may be distributed among multiple computing devices, such as a system of computing devices, including, for example, an end-user computing device (e.g., a smart phone or a tablet) and/or a server computing device (e.g., cloud computing).
Computing device 1200 may include any components typical of a computing device, such as volatile or nonvolatile memory 1210, one or more processors 1211, and one or more network interfaces 1212. Computing device 1200 may also include any input and output components, such as displays, keyboards, and touch screens. Computing device 1200 may also include a variety of components or modules providing specific functionality, and these components or modules may be implemented in software, hardware, or a combination thereof. Below, several examples of components are described for one example implementation, and other implementations may include additional components or exclude some of the components described below.
Computing device 1200 may have a support component 1220 that provides functionality for allowing a customer and a CSR to interact with each other in a support session, such as presenting user interfaces to a customer or CSR, allowing messages to be transmitted between the customer and the CSR, or presenting suggestions to a CSR. Computing device 1200 may have a suggestion component 1230 that may identify resources as possible suggestions for a CSR, such as by processing a message transmitted between a customer and a CSR, computing a context for the session, and retrieving resources from a data store using the computed context. Computing device 1200 may have a model training component 1240 that trains mathematical models, such as artificial neural networks, for suggesting resources based on a context of a session.
Computing device 1200 may include or have access to various data stores, such as data stores 140 and 510. Data stores may use any known storage technology, such as files or relational or non-relational databases. For example, computing device 1200 may have a resources data store 140 and a training corpus data store 510, as described above.
For clarity of presentation, the techniques described above have been presented in the context of a session between a customer and a CSR where the customer and CSR are exchanging messages with each other. The techniques described above, however, are not limited to that particular example, and other applications are possible.
The techniques described above may be applied to any two entities exchanging messages with each other. For example, two individuals may be exchanging messages with each other, resources may be suggested to either user, and the suggested resources may include a message to send in response or something else, such as a URL to a website with information relevant to the conversation.
The techniques described above may be applied to interactions other than messages. For example, the interactions between two entities may be in the form of audio and/or video and resources may be suggested to the entities by processing the audio and/or video to determine a context of the session and suggest resources to the entities.
The techniques described above may be applied to interactions that proceed in non-linear ways, such as a directed acyclic graph (in comparison to linear, sequential exchanges in a messaging session). For interactions that proceed as a directed acyclic graph, a recursive neural network (e.g., with long short-term memory units) may be used that is adapted to process nodes of a directed acyclic graph.
The techniques described above may be combined with any of the techniques described in U.S. patent application Ser. No. 15/254,008 filed on Sep. 1, 2016, now issued as U.S. Pat. No. 9,715,496 on Jul. 25, 2017, and U.S. patent application Ser. No. 15/383,707, filed on Dec. 19, 2016 and entitled “Word Hash Language Model”, each of which is herein incorporated by reference in its entirety for all purposes. For example, any of the techniques described herein may be provided as part of a third-party semantic processing service whereby a third party provides semantic processing services to a company to assist the company in providing customer service to its customers.
The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor. “Processor” as used herein is meant to include at least one processor and unless context clearly indicates otherwise, the plural and the singular should be understood to be interchangeable. The present invention may be implemented as a method on the machine, as a system or apparatus as part of or in relation to the machine, or as a computer program product embodied in a computer readable medium executing on one or more of the machines. The processor may be part of a server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform. A processor may be any kind of computational or processing device capable of executing program instructions, codes, binary instructions and the like. The processor may be or include a signal processor, digital processor, embedded processor, microprocessor or any variant such as a co-processor (math co-processor, graphic co-processor, communication co-processor and the like) and the like that may directly or indirectly facilitate execution of program code or program instructions stored thereon. In addition, the processor may enable execution of multiple programs, threads, and codes. The threads may be executed simultaneously to enhance the performance of the processor and to facilitate simultaneous operations of the application. By way of implementation, methods, program codes, program instructions and the like described herein may be implemented in one or more thread. The thread may spawn other threads that may have assigned priorities associated with them; the processor may execute these threads based on priority or any other order based on instructions provided in the program code. The processor may include memory that stores methods, codes, instructions and programs as described herein and elsewhere. 
The processor may access a storage medium through an interface that may store methods, codes, and instructions as described herein and elsewhere. The storage medium associated with the processor for storing methods, programs, codes, program instructions or other type of instructions capable of being executed by the computing or processing device may include but may not be limited to one or more of a CD-ROM, DVD, memory, hard disk, flash drive, RAM, ROM, cache and the like.
A processor may include one or more cores that may enhance speed and performance of a multiprocessor. In embodiments, the processor may be a dual core processor, a quad core processor, or another chip-level multiprocessor and the like that combines two or more independent cores (called a die).
The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware. The software program may be associated with a server that may include a file server, print server, domain server, internet server, intranet server and other variants such as secondary server, host server, distributed server and the like. The server may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other servers, clients, machines, and devices through a wired or a wireless medium, and the like. The methods, programs, or codes as described herein and elsewhere may be executed by the server. In addition, other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the server.
The server may provide an interface to other devices including, without limitation, clients, other servers, printers, database servers, print servers, file servers, communication servers, distributed servers and the like. Additionally, this coupling and/or connection may facilitate remote execution of program across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more location without deviating from the scope of the invention. In addition, any of the devices attached to the server through an interface may include at least one storage medium capable of storing methods, programs, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the remote repository may act as a storage medium for program code, instructions, and programs.
The software program may be associated with a client that may include a file client, print client, domain client, internet client, intranet client and other variants such as secondary client, host client, distributed client and the like. The client may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other clients, servers, machines, and devices through a wired or a wireless medium, and the like. The methods, programs, or codes as described herein and elsewhere may be executed by the client. In addition, other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the client.
The client may provide an interface to other devices including, without limitation, servers, other clients, printers, database servers, print servers, file servers, communication servers, distributed servers and the like. Additionally, this coupling and/or connection may facilitate remote execution of program across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more location without deviating from the scope of the invention. In addition, any of the devices attached to the client through an interface may include at least one storage medium capable of storing methods, programs, applications, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the remote repository may act as a storage medium for program code, instructions, and programs.
The methods and systems described herein may be deployed in part or in whole through network infrastructures. The network infrastructure may include elements such as computing devices, servers, routers, hubs, firewalls, clients, personal computers, communication devices, routing devices and other active and passive devices, modules and/or components as known in the art. The computing and/or non-computing device(s) associated with the network infrastructure may include, apart from other components, a storage medium such as flash memory, buffer, stack, RAM, ROM and the like. The processes, methods, program codes, instructions described herein and elsewhere may be executed by one or more of the network infrastructural elements.
The methods, program codes, and instructions described herein and elsewhere may be implemented on a cellular network having multiple cells. The cellular network may be either a frequency division multiple access (FDMA) network or a code division multiple access (CDMA) network. The cellular network may include mobile devices, cell sites, base stations, repeaters, antennas, towers, and the like. The cell network may be a GSM, GPRS, 3G, EVDO, mesh, or other network type.
The methods, program codes, and instructions described herein and elsewhere may be implemented on or through mobile devices. The mobile devices may include navigation devices, cell phones, mobile phones, mobile personal digital assistants, laptops, palmtops, netbooks, pagers, electronic book readers, music players and the like. These devices may include, apart from other components, a storage medium such as a flash memory, buffer, RAM, ROM and one or more computing devices. The computing devices associated with mobile devices may be enabled to execute program codes, methods, and instructions stored thereon. Alternatively, the mobile devices may be configured to execute instructions in collaboration with other devices. The mobile devices may communicate with base stations interfaced with servers and configured to execute program codes. The mobile devices may communicate on a peer-to-peer network, mesh network, or other communications network. The program code may be stored on the storage medium associated with the server and executed by a computing device embedded within the server. The base station may include a computing device and a storage medium. The storage device may store program codes and instructions executed by the computing devices associated with the base station.
The computer software, program codes, and/or instructions may be stored and/or accessed on machine readable media that may include: computer components, devices, and recording media that retain digital data used for computing for some interval of time; semiconductor storage known as random access memory (RAM); mass storage typically for more permanent storage, such as optical discs, forms of magnetic storage like hard disks, tapes, drums, cards and other types; processor registers, cache memory, volatile memory, non-volatile memory; optical storage such as CD, DVD; removable media such as flash memory (e.g. USB sticks or keys), floppy disks, magnetic tape, paper tape, punch cards, standalone RAM disks, Zip drives, removable mass storage, off-line, and the like; other computer memory such as dynamic memory, static memory, read/write storage, mutable storage, read only, random access, sequential access, location addressable, file addressable, content addressable, network attached storage, storage area network, bar codes, magnetic ink, and the like.
The methods and systems described herein may transform physical and/or intangible items from one state to another. The methods and systems described herein may also transform data representing physical and/or intangible items from one state to another.
The elements described and depicted herein, including in flow charts and block diagrams throughout the figures, imply logical boundaries between the elements. However, according to software or hardware engineering practices, the depicted elements and the functions thereof may be implemented on machines through computer executable media having a processor capable of executing program instructions stored thereon as a monolithic software structure, as standalone software modules, or as modules that employ external routines, code, services, and so forth, or any combination of these, and all such implementations may be within the scope of the present disclosure. Examples of such machines may include, but may not be limited to, personal digital assistants, laptops, personal computers, mobile phones, other handheld computing devices, medical equipment, wired or wireless communication devices, transducers, chips, calculators, satellites, tablet PCs, electronic books, gadgets, electronic devices, devices having artificial intelligence, computing devices, networking equipment, servers, routers and the like. Furthermore, the elements depicted in the flow charts and block diagrams or any other logical component may be implemented on a machine capable of executing program instructions. Thus, while the foregoing drawings and descriptions set forth functional aspects of the disclosed systems, no particular arrangement of software for implementing these functional aspects should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. Similarly, it will be appreciated that the various steps identified and described above may be varied, and that the order of steps may be adapted to particular applications of the techniques disclosed herein. All such variations and modifications are intended to fall within the scope of this disclosure.
As such, the depiction and/or description of an order for various steps should not be understood to require a particular order of execution for those steps, unless required by a particular application, or explicitly stated or otherwise clear from the context.
The methods and/or processes described above, and steps thereof, may be realized in hardware, software or any combination of hardware and software suitable for a particular application. The hardware may include a general-purpose computer and/or dedicated computing device or specific computing device or particular aspect or component of a specific computing device. The processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable devices, along with internal and/or external memory. The processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as computer executable code capable of being executed on a machine-readable medium.
The computer executable code may be created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software, or any other machine capable of executing program instructions.
Thus, in one aspect, each method described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof. In another aspect, the methods may be embodied in systems that perform the steps thereof, and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware. In another aspect, the means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.
While the invention has been disclosed in connection with the preferred embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present invention are not to be limited by the foregoing examples, but are to be understood in the broadest sense allowable by law.
All documents referenced herein are hereby incorporated by reference.