11---
2- description: >-
3- Organizational building blocks of the SDK. Manage all documents and related chunks, embeddings, tsvectors, and pipelines.
2+ description: Organizational building blocks of the SDK. Manage all documents and related chunks, embeddings, tsvectors, and pipelines.
43---
4+
55#Collections
66
77Collections are the organizational building blocks of the SDK. They manage all documents and related chunks, embeddings, tsvectors, and pipelines.
88
99##Creating Collections
1010
11- By default, collections will read and write to the database specified by `DATABASE_URL` environment variable.
11+ By default, collections will read and write to the database specified by `PGML_DATABASE_URL` environment variable.
1212
13- ### **Default `DATABASE_URL`**
13+ ### **Default `PGML_DATABASE_URL`**
1414
1515{% tabs %}
1616{% tab title="JavaScript" %}
@@ -26,9 +26,9 @@ collection = Collection("test_collection")
2626{% endtab %}
2727{% endtabs %}
2828
29- ### **Custom DATABASE\_URL**
29+ ### Custom `PGML_DATABASE_URL`
3030
31- Create a Collection that reads from a different database than that set by the environment variable `DATABASE_URL`.
31+ Create a Collection that reads from a different database than that set by the environment variable `PGML_DATABASE_URL`.
3232
3333{% tabs %}
3434{% tab title="Javascript" %}
@@ -46,21 +46,23 @@ collection = Collection("test_collection", CUSTOM_DATABASE_URL)
4646
4747##Upserting Documents
4848
49- Documents are dictionaries with two required keys: `id` and `text`. All other keys/value pairs are stored as metadata for the document.
49+ Documents are dictionaries with one required key: `id`. All other keys/value pairs are stored and can be chunked, embedded, broken into tsvectors, and searched over as specified by a `Pipeline`.
5050
5151{% tabs %}
5252{% tab title="JavaScript" %}
5353``` javascript
5454const documents = [
5555 {
56- id: " Document One" ,
56+ id: " document_one" ,
57+ title: " Document One" ,
5758 text: " document one contents..." ,
58- random_key: " this will be metadata for the document " ,
59+ random_key: " here is some random data " ,
5960 },
6061 {
61- id: " Document Two" ,
62+ id: " document_two" ,
63+ title: " Document Two" ,
6264 text: " document two contents..." ,
63- random_key: " this will be metadata for the document " ,
65+ random_key: " here is some random data " ,
6466 },
6567];
6668await collection .upsert_documents (documents);
@@ -71,35 +73,40 @@ await collection.upsert_documents(documents);
7173``` python
7274documents= [
7375 {
74- " id" :" Document 1" ,
76+ " id" :" document_one" ,
77+ " title" :" Document One" ,
7578" text" :" Here are the contents of Document 1" ,
76- " random_key" :" this will be metadata for the document "
79+ " random_key" :" here is some random data " ,
7780 },
7881 {
79- " id" :" Document 2" ,
82+ " id" :" document_two" ,
83+ " title" :" Document Two" ,
8084" text" :" Here are the contents of Document 2" ,
81- " random_key" :" this will be metadata for the document "
82- }
85+ " random_key" :" here is some random data " ,
86+ },
8387]
84- collection= Collection(" test_collection" )
8588await collection.upsert_documents(documents)
8689```
8790{% endtab %}
8891{% endtabs %}
8992
90- Document metadata can be replaced by upserting the document without the `text` key.
93+ Documents can be replaced by upserting documents with the same `id`.
9194
9295{% tabs %}
9396{% tab title="JavaScript" %}
9497``` javascript
9598const documents = [
9699 {
97- id: " Document One" ,
98- random_key: " this will be NEW metadata for the document" ,
100+ id: " document_one" ,
101+ title: " Document One New Title" ,
102+ text: " Here is some new text for document one" ,
103+ random_key: " here is some new random data" ,
99104 },
100105 {
101- id: " Document Two" ,
102- random_key: " this will be NEW metadata for the document" ,
106+ id: " document_two" ,
107+ title: " Document Two New Title" ,
108+ text: " Here is some new text for document two" ,
109+ random_key: " here is some new random data" ,
103110 },
104111];
105112await collection .upsert_documents (documents);
@@ -110,39 +117,42 @@ await collection.upsert_documents(documents);
110117``` python
111118documents= [
112119 {
113- " id" :" Document 1" ,
114- " random_key" :" this will be NEW metadata for the document"
120+ " id" :" document_one" ,
121+ " title" :" Document One" ,
122+ " text" :" Here is some new text for document one" ,
123+ " random_key" :" here is some random data" ,
115124 },
116125 {
117- " id" :" Document 2" ,
118- " random_key" :" this will be NEW metadata for the document"
119- }
126+ " id" :" document_two" ,
127+ " title" :" Document Two" ,
128+ " text" :" Here is some new text for document two" ,
129+ " random_key" :" here is some random data" ,
130+ },
120131]
121- collection= Collection(" test_collection" )
122132await collection.upsert_documents(documents)
123133```
124134{% endtab %}
125135{% endtabs %}
126136
127- Document metadata can be merged with new metadata by upserting the document without the `text` key and specifying the merge option.
137+ Documents can be merged by setting the `merge` option. On conflict, new document keys will override old document keys.
128138
129139{% tabs %}
130140{% tab title="JavaScript" %}
131141``` javascript
132142const documents = [
133143 {
134- id: " Document One" ,
135- text: " document one contents..." ,
144+ id: " document_one" ,
145+ new_key: " this will be a new key in document one" ,
146+ random_key: " this will replace old random_key"
136147 },
137148 {
138- id: " Document Two" ,
139- text: " document two contents..." ,
149+ id: " document_two" ,
150+ new_key: " this will be a new key in document two" ,
151+ random_key: " this will replace old random_key"
140152 },
141153];
142154await collection .upsert_documents (documents, {
143- metdata: {
144- merge: true
145- }
155+ merge: true
146156});
147157```
148158{% endtab %}
@@ -151,20 +161,17 @@ await collection.upsert_documents(documents, {
151161``` python
152162documents= [
153163 {
154- " id" :" Document 1" ,
155- " random_key" :" this will be NEW merged metadata for the document"
164+ " id" :" document_one" ,
165+ " new_key" :" this will be a new key in document one" ,
166+ " random_key" :" this will replace old random_key" ,
156167 },
157168 {
158- " id" :" Document 2" ,
159- " random_key" :" this will be NEW merged metadata for the document"
160- }
169+ " id" :" document_two" ,
170+ " new_key" :" this will be a new key in document two" ,
171+ " random_key" :" this will replace old random_key" ,
172+ },
161173]
162- collection= Collection(" test_collection" )
163- await collection.upsert_documents(documents, {
164- " metadata" : {
165- " merge" :True
166- }
167- })
174+ await collection.upsert_documents(documents, {" merge" :True })
168175```
169176{% endtab %}
170177{% endtabs %}
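The replace and merge behaviour described above can be pictured with plain dictionaries. The following is a minimal sketch that models the documented semantics in memory; it is not the SDK's implementation. The `upsert_documents` function and the `store` dictionary here are hypothetical stand-ins, and the merge is assumed to be shallow (top-level keys only), which the text above does not confirm.

```python
# Illustrative in-memory model only; not the SDK's implementation.
def upsert_documents(store, documents, merge=False):
    """Upsert documents into `store`, a dict keyed by document id."""
    for doc in documents:
        existing = store.get(doc["id"])
        if merge and existing is not None:
            # Assumed shallow merge: on conflict, new keys override old keys.
            store[doc["id"]] = {**existing, **doc}
        else:
            # Default: a document with the same id replaces the old one.
            store[doc["id"]] = dict(doc)
    return store


store = {}
upsert_documents(
    store,
    [{"id": "document_one", "title": "Document One", "random_key": "old"}],
)

# Merge: keeps `title`, adds `new_key`, overrides `random_key`.
upsert_documents(
    store,
    [{"id": "document_one", "new_key": "new", "random_key": "replaced"}],
    merge=True,
)
merged = store["document_one"]

# Replace (no merge option): only the newly supplied keys remain.
upsert_documents(store, [{"id": "document_one", "text": "only text"}])
replaced = store["document_one"]
```

In this model, a merged upsert only needs to carry the `id` and the keys that change, while a plain upsert must carry the full document.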
@@ -176,14 +183,12 @@ Documents can be retrieved using the `get_documents` method on the collection ob
176183{% tabs %}
177184{% tab title="JavaScript" %}
178185``` javascript
179- const collection = Collection (" test_collection" )
180186const documents = await collection .get_documents ({limit: 100 })
181187```
182188{% endtab %}
183189
184190{% tab title="Python" %}
185191``` python
186- collection= Collection(" test_collection" )
187192documents= await collection.get_documents({" limit" :100 })
188193```
189194{% endtab %}
@@ -198,14 +203,12 @@ The SDK supports limit-offset pagination and keyset pagination.
198203{% tabs %}
199204{% tab title="JavaScript" %}
200205``` javascript
201- const collection = pgml .newCollection (" test_collection" )
202206const documents = await collection .get_documents ({ limit: 100 , offset: 10 })
203207```
204208{% endtab %}
205209
206210{% tab title="Python" %}
207211``` python
208- collection= Collection(" test_collection" )
209212documents= await collection.get_documents({" limit" :100 ," offset" :10 })
210213```
211214{% endtab %}
@@ -216,41 +219,31 @@ documents = await collection.get_documents({ "limit": 100, "offset": 10 })
216219{% tabs %}
217220{% tab title="JavaScript" %}
218221``` javascript
219- const collection = Collection (" test_collection" )
220222const documents = await collection .get_documents ({ limit: 100 , last_row_id: 10 })
221223```
222224{% endtab %}
223225
224226{% tab title="Python" %}
225227``` python
226- collection= Collection(" test_collection" )
227228documents= await collection.get_documents({" limit" :100 ," last_row_id" :10 })
228229```
229230{% endtab %}
230231{% endtabs %}
231232
232- The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary.
233+ The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary. Keyset pagination does not currently work when specifying the `order_by` key.
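To make the difference between the two pagination styles concrete, here is a small sketch over an in-memory list. It is an illustration, not SDK code: `page_by_offset` and `page_by_keyset` are hypothetical helpers, and it assumes rows are ordered by ascending `row_id` and that keyset pagination returns rows whose `row_id` is greater than `last_row_id`.

```python
# Hypothetical in-memory rows; `row_id` mirrors the field named above.
rows = [{"row_id": i, "id": f"document_{i}"} for i in range(1, 26)]


def page_by_offset(rows, limit, offset=0):
    """Limit-offset pagination: skip `offset` rows, then take `limit`."""
    return rows[offset:offset + limit]


def page_by_keyset(rows, limit, last_row_id=0):
    """Keyset pagination (assumed semantics): take rows after `last_row_id`."""
    return [row for row in rows if row["row_id"] > last_row_id][:limit]


first_page = page_by_keyset(rows, limit=10)
# Feed the last `row_id` of one page into the request for the next page.
second_page = page_by_keyset(
    rows, limit=10, last_row_id=first_page[-1]["row_id"]
)
same_page_by_offset = page_by_offset(rows, limit=10, offset=10)
```

With a stable `row_id` ordering both approaches return the same second page; the keyset form depends on that ordering, which is consistent with the note above that it does not combine with `order_by`.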
233234
234235###Filtering Documents
235236
236- Metadata and full text filtering are supported just like they are in vector recall.
237+ Documents can be filtered by passing in the `filter` key.
237238
238239{% tabs %}
239240{% tab title="JavaScript" %}
240241``` javascript
241- const collection = pgml .newCollection (" test_collection" )
242242const documents = await collection .get_documents ({
243- limit: 100 ,
244- offset: 10 ,
243+ limit: 10 ,
245244 filter: {
246- metadata: {
247- id: {
248- $eq: 1
249- }
250- },
251- full_text_search: {
252- configuration: " english" ,
253- text: " Some full text query"
245+ id: {
246+ $eq: " document_one"
254247 }
255248 }
256249})
@@ -259,34 +252,25 @@ const documents = await collection.get_documents({
259252
260253{% tab title="Python" %}
261254``` python
262- collection= Collection(" test_collection" )
263- documents= await collection.get_documents({
264- " limit" :100 ,
265- " offset" :10 ,
266- " filter" : {
267- " metadata" : {
268- " id" : {
269- " $eq" :1
270- }
255+ documents= await collection.get_documents(
256+ {
257+ " limit" :100 ,
258+ " filter" : {
259+ " id" : {" $eq" :" document_one" },
271260 },
272- " full_text_search" : {
273- " configuration" :" english" ,
274- " text" :" Some full text query"
275- }
276261 }
277- } )
262+ )
278263```
279264{% endtab %}
280265{% endtabs %}
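As a rough mental model of the `filter` shape used above, the sketch below applies an `$eq` condition to plain dictionaries. It is only an illustration: `matches` is a hypothetical helper, only the `$eq` operator from the example is handled, and the SDK evaluates filters in the database rather than in Python.

```python
def matches(document, filter_spec):
    """Return True if every `$eq` condition in `filter_spec` holds."""
    for key, condition in filter_spec.items():
        if "$eq" in condition and document.get(key) != condition["$eq"]:
            return False
    return True


documents = [
    {"id": "document_one", "title": "Document One"},
    {"id": "document_two", "title": "Document Two"},
]

filter_spec = {"id": {"$eq": "document_one"}}
selected = [doc for doc in documents if matches(doc, filter_spec)]
```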
281266
282267###Sorting Documents
283268
284- Documents can be sorted on any metadata key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination.
270+ Documents can be sorted on any key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination.
285270
286271{% tabs %}
287272{% tab title="JavaScript" %}
288273``` javascript
289- const collection = pgml .newCollection (" test_collection" )
290274const documents = await collection .get_documents ({
291275 limit: 100 ,
292276 offset: 10 ,
@@ -299,7 +283,6 @@ const documents = await collection.get_documents({
299283
300284{% tab title="Python" %}
301285``` python
302- collection= Collection(" test_collection" )
303286documents= await collection.get_documents({
304287" limit" :100 ,
305288" offset" :10 ,
@@ -315,39 +298,24 @@ documents = await collection.get_documents({
315298
316299Documents can be deleted with the `delete_documents` method on the collection object.
317300
318- Metadata and full text filtering are supported just like they are in vector recall.
319-
320301{% tabs %}
321302{% tab title="JavaScript" %}
322303``` javascript
323- const collection = pgml .newCollection (" test_collection" )
324304const documents = await collection .delete_documents ({
325- metadata: {
326305 id: {
327306 $eq: 1
328307 }
329- },
330- full_text_search: {
331- configuration: " english" ,
332- text: " Some full text query"
333- }
334308})
335309```
336310{% endtab %}
337311
338312{% tab title="Python" %}
339313``` python
340- documents= await collection.delete_documents({
341- " metadata" : {
342- " id" : {
343- " $eq" :1
344- }
345- },
346- " full_text_search" : {
347- " configuration" :" english" ,
348- " text" :" Some full text query"
314+ documents= await collection.delete_documents(
315+ {
316+ " id" : {" $eq" :1 },
349317 }
350- } )
318+ )
351319```
352320{% endtab %}
353321{% endtabs %}