pgml-cms/docs/product/vector-database.md
Lines changed: 8 additions & 8 deletions
Original file line number
Diff line number
Diff line change
@@ -33,14 +33,14 @@ UPDATE
33
33
SET embedding = pgml.embed('intfloat/e5-small', "Address");
34
34
```
35
35
36
-
```
36
+
```sql
37
37
UPDATE 5000
38
38
```
39
39
40
40
That's it. We just embedded 5,000 "Address" values with a single SQL query. Let's take a look at what we got:
41
41
42
-
```
43
-
postgresml=# SELECT
42
+
```sql
43
+
SELECT
44
44
"Address",
45
45
(embedding::real[])[1:5]
46
46
FROM usa_house_prices
@@ -79,7 +79,7 @@ ORDER BY
79
79
LIMIT 3;
80
80
```
81
81
82
-
```
82
+
```sql
83
83
Address
84
84
----------------------------------------
85
85
1 Infinite Loop, Cupertino, California
@@ -104,8 +104,8 @@ When searching for a nearest neighbor match, `pgvector` picks the closest centro
104
104
105
105
The number of lists in an IVFFlat index is set when the index is created. More lists make searches faster, but the nearest neighbor approximation becomes less precise. A good starting point is typically the square root of the number of vectors in the dataset, e.g. if a dataset has 5,000,000 vectors, the number of lists should be:
106
106
107
-
```
108
-
postgresml=# SELECT round(sqrt(5000000)) AS lists;
107
+
```sql
108
+
SELECT round(sqrt(5000000)) AS lists;
109
109
lists
110
110
-------
111
111
2236
@@ -124,8 +124,8 @@ WITH (lists = 71);
124
124
125
125
71 is the approximate square root of the 5,000 rows we have in that table. With the index created, if we `EXPLAIN` the query we just ran, we'll get an "Index Scan" on the cosine distance index: