Commit e889f72

Update embeddings.md to use cosine distance in pgvector example (#716)
1 parent 08fa0cd, commit e889f72


1 file changed: +5 -15 lines changed


pgml-dashboard/static/docs/guides/transformers/embeddings.md

Lines changed: 5 additions & 15 deletions
````diff
@@ -63,28 +63,18 @@ ORDER BY similarity DESC
 LIMIT 50;
 ```
 
-```
-WITH query AS (
-SELECT pgml.embed('sentence-transformers/all-MiniLM-L6-v2', 'Star Wars christmas special is on Disney') AS embedding
-)
-SELECT text, pgml.cosine_similarity(tweet_embeddings_2.embedding, query.embedding) AS similarity
-FROM tweet_embeddings_2, query
-ORDER BY similarity DESC
-LIMIT 50;
-```
 On small datasets (<100k rows), a linear search that compares every row to the query will give sub-second results, which may be fast enough for your use case. For larger datasets, you may want to consider various indexing strategies offered by additional extensions.
 
 -[Cube](https://www.postgresql.org/docs/current/cube.html) is a built-in extension that provides a fast indexing strategy for finding similar vectors. By default it has an arbitrary limit of 100 dimensions, unless Postgres is compiled with a larger size.
 -[PgVector](https://github.com/pgvector/pgvector) supports embeddings up to 2000 dimensions out of the box, and provides a fast indexing strategy for finding similar vectors.
 
 ```
 CREATE EXTENSION vector;
-CREATE TABLE items (text text, embedding vector(384));
-insert into items select text, embedding from tweet_embeddings_2;
+CREATE TABLE items (text TEXT, embedding VECTOR(768));
+INSERT INTO items SELECT text, embedding FROM tweet_embeddings;
+CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops);
 WITH query AS (
-SELECT pgml.embed('sentence-transformers/all-MiniLM-L6-v2', 'Star Wars christmas special is on Disney')::vector AS embedding
+SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
 )
-SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 10;
-
-CREATE INDEX ON tweet_embeddings_2 USING ivfflat (embedding vector_cosine_ops);
+SELECT * FROM items, query ORDER BY items.embedding <=> query.embedding LIMIT 10;
 ```
````
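The substantive change in this commit is the pgvector query. The old version ordered by `<->` (Euclidean/L2 distance). The new version orders by `<=>` (cosine distance), which pgvector defines as 1 minus cosine similarity. Ordering by `<=>` ascending should therefore rank rows the same way as the removed `pgml.cosine_similarity(...) ... ORDER BY similarity DESC` query. In addition, an `ivfflat` index built with `vector_cosine_ops` is only used by queries that order by `<=>`. The Python sketch below illustrates the ranking behavior on hypothetical 2-D vectors. It assumes `pgml.cosine_similarity` computes standard cosine similarity, and it is not code from the PostgresML repository.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Standard cosine similarity (assumed to match pgml.cosine_similarity)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cosine similarity
    return 1.0 - cosine_similarity(a, b)

def l2_distance(a, b):
    # What pgvector's <-> operator computes: Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    # Scale a vector to unit length
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

# Hypothetical 2-D "embeddings"; real ones have 384 or 768 dimensions
query = [1.0, 0.0]
items = {
    "same_direction_long": [10.0, 0.0],
    "nearby_but_angled": [0.8, 0.6],
    "orthogonal": [0.0, 1.0],
}

# Removed query: ORDER BY similarity DESC
by_similarity = sorted(items, key=lambda k: cosine_similarity(items[k], query), reverse=True)
# New query: ORDER BY embedding <=> query (ascending)
by_cosine_distance = sorted(items, key=lambda k: cosine_distance(items[k], query))
# Old pgvector query: ORDER BY embedding <-> query (ascending)
by_l2 = sorted(items, key=lambda k: l2_distance(items[k], query))
# L2 on unit-length vectors, for comparison
by_l2_normalized = sorted(items, key=lambda k: l2_distance(normalize(items[k]), normalize(query)))

print(by_similarity)       # ['same_direction_long', 'nearby_but_angled', 'orthogonal']
print(by_cosine_distance)  # ['same_direction_long', 'nearby_but_angled', 'orthogonal']
print(by_l2)               # ['nearby_but_angled', 'orthogonal', 'same_direction_long']
```

`<->` favors vectors that are close in absolute position, so a long vector pointing exactly along the query ranks last. `<=>` ignores vector length. For unit-length embeddings the two rankings coincide, because the squared L2 distance between unit vectors equals 2 - 2cos(θ).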
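The unchanged paragraph in this hunk notes that a linear scan over every row is fast enough below roughly 100k rows. The updated example also changes where the `ivfflat` index goes. It is now built on `items`, the table actually queried, after the data is loaded. Previously it was built on `tweet_embeddings_2` after the query. pgvector's IVFFlat index is approximate. It partitions vectors into clusters (the `lists` storage parameter), and at query time it scans only the nearest clusters (the `ivfflat.probes` setting). pgvector recommends creating it once the table has data, because the clusters are computed from existing rows. The toy Python below mimics that idea with a few rounds of k-means-style clustering on random data and compares it against an exact linear scan. The function names, data, and parameters are hypothetical, and this is not pgvector's implementation.

```python
import math
import random

def cosine_distance(a, b):
    # Same metric as <=> / vector_cosine_ops; this toy treats zero vectors as maximally distant
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

def exact_top_k(rows, query, k):
    # Linear scan: compare every row to the query
    return sorted(range(len(rows)), key=lambda i: cosine_distance(rows[i], query))[:k]

def assign(rows, centroids):
    # Put each row id into the inverted list of its nearest centroid
    lists = [[] for _ in centroids]
    for i, row in enumerate(rows):
        nearest = min(range(len(centroids)), key=lambda c: cosine_distance(row, centroids[c]))
        lists[nearest].append(i)
    return lists

def build_ivfflat(rows, n_lists, iterations=5, seed=0):
    # Hypothetical toy build: a few k-means-style rounds, then one inverted list per centroid
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, n_lists)]
    dim = len(rows[0])
    for _ in range(iterations):
        lists = assign(rows, centroids)
        for c, members in enumerate(lists):
            if members:  # an empty list keeps its previous centroid
                centroids[c] = [sum(rows[i][d] for i in members) / len(members) for d in range(dim)]
    return centroids, assign(rows, centroids)

def ivfflat_search(index, rows, query, k, probes):
    # Scan only the `probes` lists whose centroids are closest to the query
    centroids, lists = index
    closest = sorted(range(len(centroids)), key=lambda c: cosine_distance(query, centroids[c]))[:probes]
    candidates = [i for c in closest for i in lists[c]]
    return sorted(candidates, key=lambda i: cosine_distance(rows[i], query))[:k]

rng = random.Random(42)
rows = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
query = [rng.gauss(0.0, 1.0) for _ in range(8)]

index = build_ivfflat(rows, n_lists=10)
exact = exact_top_k(rows, query, k=10)
approx = ivfflat_search(index, rows, query, k=10, probes=1)
full = ivfflat_search(index, rows, query, k=10, probes=10)

recall = len(set(approx) & set(exact)) / len(exact)
print(f"recall with 1 probe: {recall:.2f}")
```

When `probes` equals the number of lists, every row is scanned, so the result matches the linear scan. With one probe, recall depends on how many true neighbors fall outside the closest cluster. In pgvector the same knob is set per session, for example `SET ivfflat.probes = 10;`.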
