# Neural Models for Conversational AI
This repo shares models from PolyAI publications, including the ConveRT efficient dual-encoder model. These are shared as Tensorflow Hub modules, listed below. We also share example code and utility classes, though for many uses the Tensorflow Hub URLs will be enough.
## Requirements

Using these models requires Tensorflow Hub and Tensorflow Text. In particular, Tensorflow Text provides ops that allow the model to directly work on text, requiring no pre-processing or tokenization from the user. We test using Tensorflow version 1.14 and Tensorflow Text version 0.6.0 (which is compatible with Tensorflow 1.14). A list of available versions can be found on the Tensorflow Text github repo.
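In code, both dependencies come in at import time; importing `tensorflow_text` is needed even when it is not referenced directly, since it registers the custom text ops the modules use. A minimal setup sketch, assuming the packages are installed:

```python
import tensorflow as tf
import tensorflow_hub as tfhub
import tensorflow_text  # noqa: F401  (registers the text ops the modules rely on)
```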
## ConveRT

This is the ConveRT dual-encoder model, using subword representations and lighter-weight, more efficient transformer-style blocks to encode text, as described in the ConveRT paper. It provides powerful representations for conversational data, and can also be used as a response ranker. The model costs under $100 to train from scratch, can be quantized to under 60MB, and is competitive with larger Transformer networks on conversational tasks. We share an unquantized version of the model, facilitating fine-tuning. Please get in touch if you are interested in using the quantized ConveRT model. The Tensorflow Hub url is:
```python
module = tfhub.Module("http://models.poly-ai.com/convert/v1/model.tar.gz")
```
See the convert-examples.ipynb notebook for some examples of how to use this model. The model provides the following signatures:
### default
Takes as input `sentences`, a string tensor of sentences to encode. Outputs 1024-dimensional vectors, giving a representation for each sentence. These are the output of the sqrt-N reduction in the shared transformer encoder. These representations work well as input to classification models.
```python
sentence_encodings = module([
    "hello how are you?",
    "what is your name?",
    "thank you good bye",
])
```
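Note that with Tensorflow 1.x the call above only builds graph tensors; to get actual vectors you run them in a session after initializing variables and lookup tables. A minimal sketch:

```python
with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    encodings = session.run(sentence_encodings)

print(encodings.shape)  # (3, 1024): one 1024-dimensional vector per sentence
```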
### encode_context
Takes as input `contexts`, a string tensor of contexts to encode. Outputs 512-dimensional vectors, giving the context representation of each input. These are trained to have a high cosine similarity with the response representations of good responses (from the `encode_response` signature).
```python
context_encodings = module(
    [
        "hello how are you?",
        "what is your name?",
        "thank you good bye",
    ],
    signature="encode_context",
)
```
### encode_response
Takes as input `responses`, a string tensor of responses to encode. Outputs 512-dimensional vectors, giving the response representation of each input. These are trained to have a high cosine similarity with the context representations of good corresponding contexts (from the `encode_context` signature).
```python
response_encodings = module(
    ["i am well", "I am Matt", "bye!"],
    signature="encode_response",
)
```
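The two signatures are designed to be used together: a context encoding is scored against each candidate response encoding with a dot product, exactly as in the EncoderClient example further below. A minimal sketch, reusing the session setup from above:

```python
import numpy as np

with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    contexts, responses = session.run([context_encodings, response_encodings])

# Score every context against every candidate response.
scores = np.dot(contexts, responses.T)   # [num_contexts, num_responses]
best_response_idx = scores.argmax(axis=1)
```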
### encode_sequence
Takes as input `sentences`, a string tensor of sentences to encode. This outputs sequence encodings, a 3-tensor of shape `[batch_size, max_sequence_length, 512]`, as well as the corresponding subword tokens, a utf8-encoded matrix of shape `[batch_size, max_sequence_length]`. The tokens matrix is padded with empty strings, which may help in masking the sequence tensor. The `encoder_utils.py` library has a few functions for dealing with these tokenizations, including a detokenization function, and a function that infers byte spans in the original strings.
```python
output = module(
    ["i am well", "I am Matt", "bye!"],
    signature="encode_sequence",
    as_dict=True,
)
sequence_encodings = output['sequence_encoding']
tokens = output['tokens']
```
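Since the tokens matrix marks padding with empty strings, a boolean mask for the sequence tensor can be derived directly from it. A minimal sketch:

```python
# Empty strings in the tokens matrix mark padded positions.
mask = tf.not_equal(tokens, "")                       # [batch_size, max_sequence_length]
lengths = tf.reduce_sum(tf.cast(mask, tf.int32), 1)   # true sequence lengths
# Zero out the encodings of padded positions.
masked_encodings = sequence_encodings * tf.expand_dims(tf.cast(mask, tf.float32), -1)
```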
### tokenize
Takes as input `sentences`, a string tensor of sentences to encode. This outputs the corresponding subword tokens, a utf8-encoded matrix of shape `[batch_size, max_sequence_length]`. The tokens matrix is padded with empty strings. Usually this process is internal to the network, but for some applications it may be useful to access the internal tokenization.
```python
tokens = module(
    ["i am well", "I am Matt", "bye!"],
    signature="tokenize",
)
```
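As elsewhere, the output is a graph tensor; after evaluating it you can strip the empty-string padding row by row. A short sketch:

```python
with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    token_matrix = session.run(tokens)

# Rows are padded with b""; drop the padding and decode the utf8 bytes.
token_lists = [[t.decode("utf-8") for t in row if t] for row in token_matrix]
```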
## Multi-context ConveRT

This is the multi-context ConveRT model from the ConveRT paper, which uses extra contexts from the conversational history to refine the context representations. This is an unquantized version of the model. The Tensorflow Hub url is:
```python
module = tfhub.Module("http://models.poly-ai.com/multi_context_convert/v1/model.tar.gz")
```
This model has the same signatures as the ConveRT encoder, except for the `encode_context` signature, which also takes the extra contexts as input. The extra contexts are the previous messages in the dialogue (typically at most 10) prior to the immediate context, and must be joined with spaces from most recent to oldest.
For example, consider the dialogue:
```
A: Hey!
B: Hello how are you?
A: Fine, strange weather recently right?
B: Yeah
```
then the context representation is computed as:
```python
context = ["Yeah"]
extra_context = ["Fine, strange weather recently right? Hello how are you? Hey!"]
context_encodings = module(
    {
        'context': context,
        'extra_context': extra_context,
    },
    signature="encode_context",
)
```
See `encoder_client.py` for code that computes these features.
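As an illustration of the required ordering, here is a small helper (a sketch, not the repo's implementation; `encoder_client.py` has the real feature computation) that builds the extra context string from a dialogue history:

```python
def build_extra_context(history, max_extra_contexts=10):
    """Join the turns before the immediate context, most recent first."""
    previous_turns = history[:-1]                    # everything before the current turn
    most_recent_first = previous_turns[::-1][:max_extra_contexts]
    return " ".join(most_recent_first)

history = ["Hey!", "Hello how are you?", "Fine, strange weather recently right?", "Yeah"]
context = [history[-1]]
extra_context = [build_extra_context(history)]
# extra_context == ["Fine, strange weather recently right? Hello how are you? Hey!"]
```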
## Multi-context ConveRT finetuned on DSTC7 Ubuntu

This is the multi-context ConveRT model, fine-tuned to the DSTC7 Ubuntu response ranking task. It has the exact same signatures as the extra-context model, and its TFHub uri is http://models.poly-ai.com/ubuntu_convert/v1/model.tar.gz. Note that this model requires prefixing the extra context features with `"0: "`, `"1: "`, `"2: "`, etc.
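For illustration, the prefixing might look like the following (a hypothetical sketch; the exact feature construction is in the evaluation script referenced below):

```python
previous_turns = ["Fine, strange weather recently right?", "Hello how are you?", "Hey!"]
# Hypothetical: prefix each previous turn with its distance from the context
# before joining, i.e. "0: " for the most recent, "1: " for the next, etc.
extra_context = [" ".join("{}: {}".format(i, turn) for i, turn in enumerate(previous_turns))]
# -> ["0: Fine, strange weather recently right? 1: Hello how are you? 2: Hey!"]
```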
The `dstc7/evaluate_encoder.py` script demonstrates using this encoder to reproduce the results from the ConveRT paper.
## Keras layers

Keras layers for the above encoder models are implemented in `encoder_layers.py`. These may be useful for building a model that extends the encoder models, and/or fine-tuning them on your own data.
## EncoderClient

A python class `EncoderClient` is implemented in `encoder_client.py`, which gives a simple interface for encoding sentences, contexts, and responses with the above models. It takes python strings as input, and gives numpy matrices as output:
```python
client = encoder_client.EncoderClient(
    "http://models.poly-ai.com/convert/v1/model.tar.gz")

# We will find good responses to the following context.
context_encodings = client.encode_contexts(["What's your name?"])

# Let's rank the following responses as candidates.
candidate_responses = ["No thanks.", "I'm Matt.", "Hey.", "I have a dog."]
response_encodings = client.encode_responses(candidate_responses)

# The scores are computed using the dot product.
scores = response_encodings.dot(context_encodings.T).flatten()

# Output the top scoring response.
top_idx = scores.argmax()
print(f"Best response: {candidate_responses[top_idx]}, score: {scores[top_idx]:.3f}")
# This should print "Best response: I'm Matt., score: 0.377".
```
Internally it implements caching, deduplication, and batching to help speed up encoding. Note that because it does batching internally, you can pass very large lists of sentences to encode without running out of memory.
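The client also covers plain sentence encoding (the 1024-dimensional `default` signature). A sketch, assuming the method follows the same naming pattern as the calls above:

```python
# Assumption: the sentence-level method is named encode_sentences,
# mirroring encode_contexts and encode_responses.
sentence_encodings = client.encode_sentences(["hello how are you?"])
print(sentence_encodings.shape)  # expected: (1, 1024)
```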
## Citation

```bibtex
@article{Henderson2019convert,
    title={{ConveRT}: Efficient and Accurate Conversational Representations from Transformers},
    author={Matthew Henderson and I{\~{n}}igo Casanueva and Nikola Mrk\v{s}i\'{c} and Pei-Hao Su and Tsung-Hsien Wen and Ivan Vuli\'{c}},
    journal={CoRR},
    volume={abs/1911.03688},
    year={2019},
    url={http://arxiv.org/abs/1911.03688},
}
```

## Development

Setting up an environment for development:
- Create a python 3 virtual environment
```bash
python3 -m venv ./venv
```
- Install the requirements
```bash
. venv/bin/activate
pip install -r requirements.txt
```

- Run the unit tests
```bash
python -m unittest discover -p '*_test.py'
```
Pull requests will trigger a CircleCI build that:
- runs flake8 and isort
- runs the unit tests