An industrial-grade implementation of DSSM
Chiang97912/dssm
An industrial-grade implementation of the paper: Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
Latent semantic models, such as LSA, aim to map a query to its relevant documents at the semantic level, where keyword-based matching often fails. DSSM projects queries and documents into a common low-dimensional space, where the relevance of a document to a query is readily computed as the distance between them.
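As a rough illustration of that scoring step, the sketch below computes cosine similarity between a query vector and a few candidate document vectors, then turns the scores into a softmax posterior over the candidates, as the DSSM paper describes. The toy vectors, the function names, and the value of the smoothing factor `gamma` are assumptions made for illustration; none of them are taken from this package.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relevance_posterior(query_vec, doc_vecs, gamma=10.0):
    """Softmax over gamma-scaled cosine scores, one probability per document."""
    scores = [gamma * cosine(query_vec, d) for d in doc_vecs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-dimensional embeddings (illustrative values, not model output).
query = [0.9, 0.1, 0.0]
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
probs = relevance_posterior(query, docs)
best = max(range(len(docs)), key=lambda i: probs[i])
print(best, [round(p, 3) for p in probs])
```

The document whose vector points in nearly the same direction as the query receives most of the probability mass, which is the behaviour a ranking step would rely on.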
This model can be used as a search engine that helps people find the document they want even when the query:
- abbreviates words from the document;
- changes the order of the words in the document;
- shortens words from the document;
- has typos;
- has spacing issues.
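One reason a model in this family can tolerate such surface variation is the letter-trigram "word hashing" used in the original DSSM paper: a misspelled word still shares some trigrams with the correct one. The sketch below shows that idea in isolation. Whether this package tokenizes text the same way is not stated here, and `letter_trigrams` and `trigram_overlap` are hypothetical helper names.

```python
def letter_trigrams(word):
    """Return the letter trigrams of a word padded with '#' boundary marks."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def trigram_overlap(a, b):
    """Jaccard overlap between the trigram sets of two words."""
    set_a, set_b = set(letter_trigrams(a)), set(letter_trigrams(b))
    return len(set_a & set_b) / len(set_a | set_b)


print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
print(trigram_overlap("search", "serach"))  # typo: some trigrams survive
print(trigram_overlap("search", "banana"))  # unrelated word: none shared
```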
DSSM depends on PyTorch. There are two ways to install it:
Install DSSM from PyPI:
pip install dssm
Install DSSM from the GitHub source:
git clone https://github.com/Chiang97912/dssm.git
cd dssm
python setup.py install
Train a model on your queries and documents:
from dssm.model import DSSM

# Query list: words must be segmented in advance, with tokens joined by spaces.
queries = ['...']
# Document list: words must be segmented in advance, with tokens joined by spaces.
documents = ['...']

model = DSSM('dssm-model', device='cuda:0', lang='en')
model.fit(queries, documents)
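The comments in the training snippet say that texts must be segmented in advance, with tokens joined by spaces. A minimal sketch of that preparation for English text follows. `to_spaced_tokens` is a hypothetical helper, not part of the package. The regex tokenizer, the lowercasing, and the idea that the i-th query pairs with the i-th document are assumptions. Languages without whitespace word boundaries, such as Chinese, would need a dedicated word segmenter instead.

```python
import re


def to_spaced_tokens(text):
    """Lowercase the text, split it into word tokens, and join them with single spaces."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(tokens)


# Hypothetical raw (query, document) pairs.
raw_pairs = [
    ("How do I install PyTorch?", "PyTorch installation guide"),
    ("reset   my password", "Resetting a forgotten password"),
]

# Build two parallel lists in the space-joined form described above.
queries = [to_spaced_tokens(q) for q, _ in raw_pairs]
documents = [to_spaced_tokens(d) for _, d in raw_pairs]
print(queries)
print(documents)
```

The resulting `queries` and `documents` lists have the same shape as the placeholders passed to `model.fit` in the training snippet.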
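Encode two texts and compare their vectors with cosine similarity:
from dssm.model import DSSM
from sklearn.metrics.pairwise import cosine_similarity

text_left = '...'
text_right = '...'

model = DSSM('dssm-model', device='cpu')
vectors = model.encode([text_left, text_right])
# cosine_similarity expects 2D inputs, so wrap each vector in a list.
score = cosine_similarity([vectors[0]], [vectors[1]])
print(score)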
- Python version 3.6
- NumPy version 1.19.5
- PyTorch version 1.9.0