---
description: PostgresML client SDK for JavaScript, Python and Rust implements common use cases and PostgresML connection management.
---

# Client SDK

The client SDK can be installed using standard package managers for JavaScript, Python, and Rust. Since the SDK is written in Rust, the JavaScript and Python packages come with no additional dependencies.

## Installation

Installing the SDK into your project is as simple as:

{% tabs %}
{% tab title="JavaScript" %}
```bash
npm i pgml
```
{% endtab %}

{% tab title="Python" %}
```bash
pip install pgml
```
{% endtab %}
{% endtabs %}

## Getting started

The SDK uses the database to perform most of its functionality. Before continuing, make sure you created a [PostgresML database](https://postgresml.org/signup) and have the `DATABASE_URL` connection string handy.

### Connect to PostgresML

The SDK automatically manages connections to PostgresML. The connection string can be specified as an argument to the collection constructor, or as an environment variable.

If your app follows the twelve-factor convention, we recommend you configure the connection in the environment using the `PGML_DATABASE_URL` variable:

```bash
export PGML_DATABASE_URL=postgres://user:password@sql.cloud.postgresml.org:6432/pgml_database
```
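
Alternatively, the connection string can be passed directly to the collection constructor, which is handy in tests or scripts that talk to more than one database. A minimal sketch in Python, assuming the connection string is accepted as the second positional argument (check the SDK reference for the exact signature):

```python
from pgml import Collection

# Pass the connection string explicitly instead of relying on the
# PGML_DATABASE_URL environment variable. The argument position is an
# assumption for illustration.
collection = Collection(
    "sample_collection",
    "postgres://user:password@sql.cloud.postgresml.org:6432/pgml_database",
)
```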

### Create a collection

The SDK is asynchronous, so you need to run it inside an async runtime. Both Python and JavaScript support async functions natively.

{% tabs %}
{% tab title="JavaScript" %}
```javascript
const pgml = require("pgml");

const main = async () => {
  const collection = pgml.newCollection("sample_collection");
};
```
{% endtab %}

{% tab title="Python" %}
```python
from pgml import Collection, Pipeline
import asyncio

async def main():
    collection = Collection("sample_collection")
```
{% endtab %}
{% endtabs %}

The above example imports the `pgml` module and creates a collection object. By itself, the collection only tracks document contents and identifiers, but once we add a pipeline, we can instruct the SDK to perform additional tasks when documents are inserted and retrieved.

### Create a pipeline

Continuing the example, we will create a pipeline called `sample_pipeline`, which will use in-database embeddings generation to automatically chunk and embed documents:

{% tabs %}
{% tab title="JavaScript" %}
```javascript
// Add this code to the end of the main function from the above example.
const pipeline = pgml.newPipeline("sample_pipeline", {
  text: {
    splitter: { model: "recursive_character" },
    semantic_search: {
      model: "intfloat/e5-small",
    },
  },
});

await collection.add_pipeline(pipeline);
```
{% endtab %}

{% tab title="Python" %}
```python
# Add this code to the end of the main function from the above example.
pipeline = Pipeline(
    "sample_pipeline",
    {
        "text": {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {
                "model": "intfloat/e5-small",
            },
        },
    },
)

await collection.add_pipeline(pipeline)
```
{% endtab %}
{% endtabs %}

The pipeline configuration is a key/value object, where the key is the name of a column in a document, and the value is the action the SDK should perform on that column.

In this example, the documents contain a column called `text`, whose contents the SDK is instructed to chunk using the recursive character splitter and embed using the Hugging Face `intfloat/e5-small` embeddings model.
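
The same key/value structure extends to documents with more than one column. A minimal sketch, assuming the documents also carry a hypothetical `title` column that should be chunked and embedded the same way as `text`:

```python
from pgml import Pipeline

# Hypothetical pipeline configuration covering two document columns.
# The "title" column is assumed for illustration and is not part of the
# example documents used elsewhere on this page.
two_column_pipeline = Pipeline(
    "sample_pipeline_two_columns",
    {
        "text": {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {"model": "intfloat/e5-small"},
        },
        "title": {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {"model": "intfloat/e5-small"},
        },
    },
)
```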

### Add documents

Once the pipeline is configured, we can start adding documents:

{% tabs %}
{% tab title="JavaScript" %}
```javascript
// Add this code to the end of the main function from the above example.
const documents = [
  {
    id: "Document One",
    text: "document one contents...",
  },
  {
    id: "Document Two",
    text: "document two contents...",
  },
];

await collection.upsert_documents(documents);
```
{% endtab %}

{% tab title="Python" %}
```python
# Add this code to the end of the main function in the above example.
documents = [
    {
        "id": "Document One",
        "text": "document one contents...",
    },
    {
        "id": "Document Two",
        "text": "document two contents...",
    },
]

await collection.upsert_documents(documents)
```
{% endtab %}
{% endtabs %}

If the same document `id` is used, the SDK computes the difference between existing and new documents and only updates the chunks that have changed.
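
For example, re-upserting `Document One` with revised text (a minimal sketch in Python, reusing the `collection` and document shape from the example above) updates the stored document, and only the chunks whose contents actually changed are re-chunked and re-embedded:

```python
# Add this inside the async main function from the example above.
# The matching "id" tells the SDK to diff the new contents against the
# stored document instead of creating a duplicate.
await collection.upsert_documents([
    {
        "id": "Document One",
        "text": "document one contents, now revised...",
    },
])
```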

### Search documents

Now that the documents are stored, chunked and embedded, we can start searching the collection:

{% tabs %}
{% tab title="JavaScript" %}
```javascript
// Add this code to the end of the main function in the above example.
const results = await collection.vector_search(
  {
    query: {
      fields: {
        text: {
          query: "Something about a document...",
        },
      },
    },
    limit: 2,
  },
  pipeline,
);

console.log(results);
```
{% endtab %}

{% tab title="Python" %}
```python
# Add this code to the end of the main function in the above example.
results = await collection.vector_search(
    {
        "query": {
            "fields": {
                "text": {
                    "query": "Something about a document...",
                },
            },
        },
        "limit": 2,
    },
    pipeline,
)

print(results)
```
{% endtab %}
{% endtabs %}

We are using built-in vector search, powered by embeddings and the PostgresML [pgml.embed()](../sql-extension/pgml.embed) function, which embeds the `query` argument, compares it to the embeddings stored in the database, and returns the top two results, ranked by cosine similarity.

### Run the example

Since the SDK uses async code, both JavaScript and Python need a little bit of extra code to run it correctly:

{% tabs %}
{% tab title="JavaScript" %}
```javascript
main().then(() => {
  console.log("SDK example complete");
});
```
{% endtab %}

{% tab title="Python" %}
```python
if __name__ == "__main__":
    asyncio.run(main())
```
{% endtab %}
{% endtabs %}

Once you run the example, you should see something like this in the terminal:

```bash
[
  {
    "chunk": "document one contents...",
    "document": {"id": "Document One", "text": "document one contents..."},
    "score": 0.9034339189529419,
  },
  {
    "chunk": "document two contents...",
    "document": {"id": "Document Two", "text": "document two contents..."},
    "score": 0.8983734250068665,
  },
]
```
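
Each result carries the matched chunk, the full source document, and a similarity score. A minimal sketch of consuming the results in Python, assuming `results` is the list returned by the `vector_search` call above, for example to assemble a context string for a downstream prompt:

```python
# Sort the results by score, highest first, and join the matched chunks
# into a single context string.
sorted_results = sorted(results, key=lambda r: r["score"], reverse=True)

context = "\n\n".join(result["chunk"] for result in sorted_results)

# Print which documents matched and how closely.
for result in sorted_results:
    print(f'{result["document"]["id"]}: {result["score"]:.4f}')
```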