Added blog post semantic search in postgres in 15 minutes #1535

Merged
21 commits
18f8f44 Preliminary draft of semantic search in postgres in 15 minutes (SilasMarvin, Jun 11, 2024)
00bd75d Cleanups (SilasMarvin, Jun 12, 2024)
068af92 Ready for review (SilasMarvin, Jun 14, 2024)
a9148da Cleanup first paragraph (SilasMarvin, Jun 17, 2024)
3e0fa33 A few suggestions (#1536) (levkk, Jun 17, 2024)
c71fcd2 Add reason on why to use semantic search (SilasMarvin, Jun 17, 2024)
9b6e75f Clean up spelling errors (SilasMarvin, Jun 17, 2024)
b451c9b Fix more small spelling errors (SilasMarvin, Jun 17, 2024)
d418deb Finish timings (SilasMarvin, Jun 18, 2024)
84872ac Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
1686f93 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
b2b9d88 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
b8766bd Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
4574183 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
4db2149 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
68368e2 Update pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md (SilasMarvin, Jun 18, 2024)
af8dd3e Convert italics back to backticks (SilasMarvin, Jun 18, 2024)
2c156ae Remove hnsw link out (SilasMarvin, Jun 18, 2024)
faf0be1 Alude to arrays (SilasMarvin, Jun 18, 2024)
27445f5 Finalize post (SilasMarvin, Jun 18, 2024)
427f77f Merge branch 'master' into silas-semantic-search-in-postgres-in-15-mi… (SilasMarvin, Jun 18, 2024)
Convert italics back to backticks
SilasMarvin committed Jun 18, 2024
commit af8dd3edcf97bdd4cb92c06a9e8a98105c7ab04c
10 changes: 5 additions & 5 deletions pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md
@@ -95,7 +95,7 @@ embedding for 'Rust is the best'

!!!

-You'll notice how similar the vectors produced by the text "I like Postgres" and "I like SQL" are compared to "Rust is the best". This is a artificial example, but the same idea holds true when translating to real models like _mixedbread-ai/mxbai-embed-large-v1_.
+You'll notice how similar the vectors produced by the text "I like Postgres" and "I like SQL" are compared to "Rust is the best". This is a artificial example, but the same idea holds true when translating to real models like `mixedbread-ai/mxbai-embed-large-v1`.

## What does it mean to be "close"?

@@ -109,11 +109,11 @@ For instance let’s say that we have the following documents:
| 2 | I think tomatoes are incredible on burgers. |


-and a user is looking for the answer to the question: "What is the pgml.transform function?". If we embed the search query and all of the documents using a model like _mixedbread-ai/mxbai-embed-large-v1_, we can compare the query embedding to all of the document embeddings, and select the document that has the closest embedding in vector space, and therefore in meaning, to the to the answer.
+and a user is looking for the answer to the question: "What is the pgml.transform function?". If we embed the search query and all of the documents using a model like `mixedbread-ai/mxbai-embed-large-v1`, we can compare the query embedding to all of the document embeddings, and select the document that has the closest embedding in vector space, and therefore in meaning, to the to the answer.

These are big embeddings, so we can’t simply estimate which one is closest. So, how do we actually measure the similarity (distance) between different vectors?

-_pgvector_ as of this writing supports four different measurements of vector similarity:
+`pgvector` as of this writing supports four different measurements of vector similarity:

- L2 distance
- (negative) inner product
@@ -126,7 +126,7 @@ For most use cases we recommend using the cosine distance as defined by the form

where A and B are two vectors.

-This is a somewhat confusing formula but luckily _pgvector_ provides an operator that computes the cosine distance for us:
+This is a somewhat confusing formula but luckily `pgvector` provides an operator that computes the cosine distance for us:
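
For readers who want to see what a cosine distance computes, here is a minimal Python sketch. It is not part of the post or of `pgvector`. It assumes the standard definition, one minus the dot product of A and B divided by the product of their lengths; the function name `cosine_distance` and the sample vectors are illustrative only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(theta), where theta is the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Vectors pointing the same way have a distance near 0,
# regardless of their lengths.
print(cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

# Perpendicular vectors have a distance of 1.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Because only the angle matters, two embeddings of different magnitude can still be "close", which is one reason this measure is a common default for text embeddings.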

!!! generic

@@ -204,7 +204,7 @@ You'll notice that the distance between "What is the pgml.transform function?" a

It is inefficient to compute embeddings for all the documents every time we search the dataset. Instead, we should embed our documents once and search against precomputed embeddings.
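
The idea of embedding once and searching many times can be sketched in plain Python. This is an illustration under loud assumptions, not the post's implementation: `toy_embed` is a hypothetical word-bucketing stand-in for a real embedding model, the first document is invented for the example, and the in-memory `index` list only plays the role of a table of stored embeddings.

```python
import math

stats = {"embed_calls": 0}  # counts how often the (hypothetical) model is invoked


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a real embedding model: buckets words by character codes."""
    stats["embed_calls"] += 1
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    return vec


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


documents = [
    "The pgml.transform function runs a model on some input.",  # invented for this sketch
    "I think tomatoes are incredible on burgers.",
]

# Embed every document exactly once, up front.
index = [(doc, toy_embed(doc)) for doc in documents]


def search(query):
    """Embed only the query, then scan the precomputed document embeddings."""
    query_vec = toy_embed(query)
    best_doc, _ = min(index, key=lambda item: cosine_distance(query_vec, item[1]))
    return best_doc


print(search("What is the pgml.transform function?"))
print("embedding calls so far:", stats["embed_calls"])
```

Each call to `search` adds one embedding call for the query, while the per-document cost is paid only when `index` is built.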

-_pgvector_ provides us with the `vector` data type for storing embeddings in regular PostgreSQL tables:
+`pgvector` provides us with the `vector` data type for storing embeddings in regular PostgreSQL tables:


!!! generic