This repository was archived by the owner on Nov 17, 2020. It is now read-only.
Huffon/sentence-similarity
This repo contains various ways to calculate the similarity between source and target sentences. You can choose the pre-trained model you want to use, such as ELMo, BERT, or Universal Sentence Encoder (USE).
You can also choose the method used to compute the similarity:
1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF

You can experiment with (the number of models) x (the number of methods) combinations!
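To make the listed methods concrete, here is a minimal NumPy sketch of the vector-level measures plus a pairwise variant. It is an illustration, not the code in `sensim.py`: the function names are hypothetical, angular distance is assumed to be `arccos(cosine) / pi`, and the pairwise function assumes greedy token matching loosely in the style of BERTScore (each source token is matched to its most similar target token, with optional IDF weights). The repository's own implementation may differ on any of these points.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two sentence vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.sum(np.abs(u - v)))


def euclidean_distance(u, v):
    """Straight-line (L2) distance."""
    return float(np.linalg.norm(u - v))


def angular_distance(u, v):
    """Angle between the vectors, scaled to [0, 1] (assumed normalization)."""
    cos = np.clip(cosine_similarity(u, v), -1.0, 1.0)
    return float(np.arccos(cos) / np.pi)


def inner_product(u, v):
    """Unnormalized dot product."""
    return float(np.dot(u, v))


def pairwise_cosine(src_tokens, tgt_tokens, weights=None):
    """Greedy token-level matching (assumed, BERTScore-like).

    src_tokens: (n_src, dim) token embeddings of the source sentence
    tgt_tokens: (n_tgt, dim) token embeddings of the target sentence
    weights:    optional per-source-token weights, e.g. IDF values
    """
    src = src_tokens / np.linalg.norm(src_tokens, axis=1, keepdims=True)
    tgt = tgt_tokens / np.linalg.norm(tgt_tokens, axis=1, keepdims=True)
    sim = src @ tgt.T              # cosine similarity of every token pair
    best = sim.max(axis=1)         # best target match for each source token
    if weights is None:
        return float(best.mean())
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * best) / np.sum(weights))


if __name__ == "__main__":
    u = np.array([1.0, 0.0])
    v = np.array([0.0, 1.0])
    for fn in (cosine_similarity, manhattan_distance, euclidean_distance,
               angular_distance, inner_product):
        print(fn.__name__, fn(u, v))
```

Note that cosine similarity and inner product grow with similarity, while the three distances shrink, so scores from different methods are not directly comparable.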
- This project was developed in a conda environment
- After cloning this repository, you can install all the dependencies listed in `requirements.txt` with `bash install.sh`

    conda create -n sensim python=3.7
    conda activate sensim
    git clone https://github.com/Huffon/sentence-similarity.git
    cd sentence-similarity
    bash install.sh

- To test your own sentences, fill out `corpus.txt` with sentences as below:

    I ate an apple.
    I went to the Apple.
    I ate an orange.
    ...

- Then, choose the model and method used to calculate the similarity between source and target sentences:

    python sensim.py --model MODEL_NAME [use, bert, elmo] --method METHOD_NAME [cosine, manhattan, euclidean, inner, ts-ss, angular, pairwise, pairwise-idf] --verbose LOG_OPTION (bool)

- In this section, you can see example results of sentence-similarity
- As you know, there is no silver bullet that can calculate perfect similarity between sentences
- You should conduct various experiments with your dataset
- Caution: TS-SS score might not fit the sentence similarity task, since this method was originally devised to calculate the similarity between long documents
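For readers unfamiliar with TS-SS, the sketch below shows one way to compute it, assuming the definition usually given for the hybrid geometric approach: a triangle area term (vector magnitudes and the angle between them, with 10 degrees added to the angle) multiplied by a sector area term (Euclidean distance plus magnitude difference). The function name `ts_ss` is hypothetical, and the formula should be checked against the paper listed in the references before relying on it; it is not taken from this repository's code.

```python
import math

import numpy as np


def ts_ss(u, v):
    """TS-SS score under the assumed definition; lower means more similar."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)

    # Angle between the vectors in degrees, plus the 10 degree offset
    # that keeps the triangle from collapsing when the vectors overlap.
    cos = np.clip(np.dot(u, v) / (norm_u * norm_v), -1.0, 1.0)
    theta_deg = math.degrees(math.acos(cos)) + 10.0

    # Triangle area similarity (TS).
    triangle = norm_u * norm_v * math.sin(math.radians(theta_deg)) / 2.0

    # Sector area similarity (SS): radius is Euclidean distance
    # plus the absolute difference of the magnitudes.
    euclidean = np.linalg.norm(u - v)
    magnitude_diff = abs(norm_u - norm_v)
    sector = math.pi * (euclidean + magnitude_diff) ** 2 * theta_deg / 360.0

    return float(triangle * sector)


if __name__ == "__main__":
    print(ts_ss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(ts_ss([1.0, 0.0], [0.0, 1.0]))
```

Because the score depends on vector magnitudes as well as direction, it behaves differently from cosine similarity on embeddings of short sentences, which is consistent with the caution above.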
- Universal Sentence Encoder
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- BERTScore: Evaluating Text Generation with BERT
- A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering
