Added blog post semantic search in postgres in 15 minutes #1535
Merged
SilasMarvin merged 21 commits into master from silas-semantic-search-in-postgres-in-15-minutes on Jun 18, 2024
Changes from 1 commit
Commits
18f8f44 Preliminary draft of semantic search in postgres in 15 minutes (SilasMarvin)
00bd75d Cleanups (SilasMarvin)
068af92 Ready for review (SilasMarvin)
a9148da Cleanup first paragraph (SilasMarvin)
3e0fa33 A few suggestions (#1536) (levkk)
c71fcd2 Add reason on why to use semantic search (SilasMarvin)
9b6e75f Clean up spelling errors (SilasMarvin)
b451c9b Fix more small spelling errors (SilasMarvin)
d418deb Finish timings (SilasMarvin)
84872ac Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
1686f93 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
b2b9d88 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
b8766bd Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
4574183 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
4db2149 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
68368e2 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin)
af8dd3e Convert italics back to backticks (SilasMarvin)
2c156ae Remove hnsw link out (SilasMarvin)
faf0be1 Alude to arrays (SilasMarvin)
27445f5 Finalize post (SilasMarvin)
427f77f Merge branch 'master' into silas-semantic-search-in-postgres-in-15-mi… (SilasMarvin)
Finish timings
commit d418debfdb243b1059130a709b0801c76d6331b6
33 changes: 23 additions & 10 deletions in pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -35,7 +35,7 @@ Embeddings are vectors. Given some text and some embedding model, we can convert | ||
!!! generic | ||
!!! code_block | ||
```postgresql | ||
SELECT pgml.embed('mixedbread-ai/mxbai-embed-large-v1', 'Generating embeddings in Postgres is fun!'); | ||
@@ -130,7 +130,7 @@ This is a somewhat confusing formula but luckily _pgvector_ provides an operato | ||
!!! generic | ||
!!! code_block | ||
```postgresql | ||
SELECT '[1,2,3]'::vector <=> '[2,3,4]'::vector; | ||
@@ -206,17 +206,30 @@ It is inefficient to compute embeddings for all the documents every time we sear | ||
_pgvector_ provides us with the `vector` data type for storing embeddings in regular PostgreSQL tables: | ||
SilasMarvin marked this conversation as resolved. | ||
!!! generic | ||
!!! code_block time="12.547 ms" | ||
```postgresql | ||
CREATE TABLE text_and_embeddings ( | ||
id SERIAL PRIMARY KEY, | ||
text text, | ||
embedding vector (1024) | ||
); | ||
``` | ||
!!! | ||
!!! | ||
Let's add some data to our table: | ||
!!! generic | ||
!!! code_block time="72.156 ms" | ||
```postgresql | ||
INSERT INTO text_and_embeddings (text, embedding) | ||
VALUES | ||
( | ||
@@ -240,11 +253,11 @@ VALUES | ||
!!! | ||
Now that our table has some data, we can search over it using the following query: | ||
!!! generic | ||
!!! code_block time="35.016 ms" | ||
```postgresql | ||
WITH query_embedding AS ( | ||
@@ -288,7 +301,7 @@ Let's demonstrate this by inserting 100,000 additional embeddings: | ||
!!! generic | ||
!!! code_block time="3114242.499 ms" | ||
```postgresql | ||
INSERT INTO text_and_embeddings (text, embedding) | ||
@@ -309,7 +322,7 @@ Now trying our search engine again: | ||
!!! generic | ||
!!! code_block time="138.252 ms" | ||
```postgresql | ||
WITH embedded_query AS ( | ||
@@ -364,7 +377,7 @@ and search again, we would get much better performance: | ||
!!! generic | ||
!!! code_block time="44.508 ms" | ||
```postgresql | ||
WITH embedded_query AS ( | ||
@@ -405,7 +418,7 @@ HNSW indexes typically have better and faster recall but require more compute wh | ||
!!! generic | ||
!!! code_block time="115564.303 ms" | ||
```postgresql | ||
DROP index text_and_embeddings_embedding_idx; | ||
@@ -422,7 +435,7 @@ Now let's try searching again: | ||
!!! generic | ||
!!! code_block time="35.716 ms" | ||
```postgresql | ||
WITH embedded_query AS ( | ||
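
The `<=>` hunk above compares `[1,2,3]` with `[2,3,4]`. As a rough worked check, and assuming `<=>` here is _pgvector_'s cosine distance operator (one minus cosine similarity), the arithmetic for those two vectors can be sketched as follows. The labels $a$ and $b$ are introduced only for this illustration and do not appear in the post:

$$
1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
= 1 - \frac{1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4}{\sqrt{1^2 + 2^2 + 3^2} \, \sqrt{2^2 + 3^2 + 4^2}}
= 1 - \frac{20}{\sqrt{14} \, \sqrt{29}}
\approx 0.0074
$$

A distance this close to zero says the two vectors point in nearly the same direction, which is why ordering rows by `<=>` in ascending order surfaces the most similar embeddings first.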