Commit e0b091e

Martijn van Beers committed

Do the dot-product on the sparse matrix

When you have a large corpus, first making the matrix dense takes a lot of memory. Doing the dot product first and then expanding the result is more memory-efficient and still gives the same result.

1 parent 3a95063, commit e0b091e

File tree

1 file changed, +2 -2 lines changed

Chapter-6
- document_similarity.py

`Chapter-6/document_similarity.py`

Lines changed: 2 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -39,11 +39,11 @@`
`39`	`39`	`def compute_cosine_similarity(doc_features, corpus_features,`
`40`	`40`	`                              top_n=3):`
`41`	`41`	`    # get document vectors`
`42`		`-    doc_features = doc_features.toarray()[0]`
`43`		`-    corpus_features = corpus_features.toarray()`
	`42`	`+    doc_features = doc_features[0]`
`44`	`43`	`    # compute similarities`
`45`	`44`	`    similarity = np.dot(doc_features,`
`46`	`45`	`                        corpus_features.T)`
	`46`	`+    similarity = similarity.toarray()[0]`
`47`	`47`	`    # get docs with highest similarity scores`
`48`	`48`	`    top_docs = similarity.argsort()[::-1][:top_n]`
`49`	`49`	`    top_docs_with_score = [(index, round(similarity[index], 3))`
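
The claim in the commit message, that multiplying first and densifying afterwards gives the same scores, can be checked with a small sketch. The toy matrices below are hypothetical and not taken from the repository, and the sketch uses SciPy's `@` operator for the sparse product rather than the `np.dot` call shown in the diff. The point is that only the 1 x n_docs result is converted to a dense array, instead of the full n_docs x n_terms corpus matrix.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: 4 corpus documents and 1 query document over 6 terms.
corpus_dense = np.array([
    [0.0, 0.5, 0.0, 0.0, 0.5, 0.0],
    [0.7, 0.0, 0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.0, 0.6, 0.0],
])
doc_dense = np.array([[0.0, 0.6, 0.0, 0.0, 0.4, 0.0]])

# Feature extractors typically return sparse matrices like these.
corpus_features = sparse.csr_matrix(corpus_dense)
doc_features = sparse.csr_matrix(doc_dense)

# Old approach: densify both operands, then take the dot product.
dense_similarity = np.dot(doc_features.toarray()[0],
                          corpus_features.toarray().T)

# New approach: multiply while still sparse, then densify only the
# small 1 x n_docs result.
sparse_result = doc_features[0] @ corpus_features.T
sparse_similarity = sparse_result.toarray()[0]

# Rank documents by score, as the function in the diff does.
top_n = 2
top_docs = sparse_similarity.argsort()[::-1][:top_n]
```

Both paths produce the same similarity vector, so the ranking step that follows is unaffected by the change.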

0 commit comments
