more docs #1425
Merged
Binary file modified: pgml-cms/docs/.gitbook/assets/fdw_1.png
Binary file modified: pgml-cms/docs/.gitbook/assets/logical_replication_1.png
1 change: 0 additions & 1 deletion pgml-cms/docs/SUMMARY.md
49 changes: 34 additions & 15 deletions pgml-cms/docs/api/apis.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,47 @@ | ||
--- | ||
description: Overview of the PostgresML SQL API and SDK. | ||
--- | ||
# API overview | ||
PostgresML is a PostgreSQL extension which adds SQL functions to the database where it's installed. The functions work with modern machine learning algorithms and latest open source LLMs while maintaining a stable API signature. They can be used by any application that connects to the database. | ||
In addition to the SQL API, we built and maintain a client SDK for JavaScript, Python and Rust. The SDK uses the same extension functionality to implement common ML & AI use cases, like retrieval-augmented generation (RAG), chatbots, and semantic & hybrid search engines. | ||
Using the SDK is optional, and you can implement the same functionality with standard SQL queries. If you feel more comfortable using a programming language, the SDK can help you to get started quickly. | ||
## [SQL extension](sql-extension/) | ||
The PostgreSQL extension provides all of the ML & AI functionality, like training models and inference, via SQL functions. The functions are designed for ML practitioners to use dozens of ML algorithms to train models, and run real time inference, on live application data. Additionally, the extension provides access to the latest Hugging Face transformers for a wide range of NLP tasks. | ||
### Functions | ||
The following functions are implemented and maintained by the PostgresML extension: | ||
| Function name | Description | | ||
|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| [pgml.embed()](sql-extension/pgml.embed) | Generate embeddings inside the database using open source embedding models from Hugging Face. | | ||
| [pgml.transform()](sql-extension/pgml.transform/) | Download and run latest Hugging Face transformer models, like Llama, Mixtral, and many more to perform various NLP tasks like text generation, summarization, sentiment analysis and more. | | ||
| [pgml.train()](sql-extension/pgml.train/) | Train a machine learning model on data from a Postgres table or view. Supports XGBoost, LightGBM, Catboost and all Scikit-learn algorithms. | | ||
| [pgml.deploy()](sql-extension/pgml.deploy) | Deploy a version of the model created with pgml.train(). | | ||
| [pgml.predict()](sql-extension/pgml.predict/) | Perform real time inference using a model trained with pgml.train() on live application data. | | ||
| [pgml.tune()](sql-extension/pgml.tune) | Run LoRA fine tuning on an open source model from Hugging Face using data from a Postgres table or view. | | ||
Together with standard database functionality provided by PostgreSQL, these functions allow you to create and manage the entire life cycle of a machine learning application. | ||
## [Client SDK](client-sdk/) | ||
The client SDK implements best practices and common use cases, using the PostgresML SQL functions and standard PostgreSQL features to do it. The SDK core is written in Rust, which manages creating and running queries, connection pooling, and error handling. | ||
For each additional language we support (currently JavaScript and Python), we create and publish language-native bindings. This architecture ensures all programming languages we support have identical APIs and similar performance when interacting with PostgresML. | ||
### Use cases | ||
The SDK currently implements the following use cases: | ||
| Use case | Description | | ||
|----------|---------| | ||
| [Collections](client-sdk/collections) | Manage documents, embeddings, full text and vector search indexes, and more, using one simple interface. | | ||
| [Pipelines](client-sdk/pipelines) | Easily build complex queries to interact with collections using a programmable interface. | | ||
| [Vector search](client-sdk/search) | Implement semantic search using in-database generated embeddings and ANN vector indexes. | | ||
| [Document search](client-sdk/document-search) | Implement hybrid full text search using in-database generated embeddings and PostgreSQL tsvector indexes. | |
255 changes: 239 additions & 16 deletions pgml-cms/docs/api/client-sdk/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,247 @@ | ||
--- | ||
description: PostgresML client SDK for JavaScript, Python and Rust implements common use cases and PostgresML connection management. | ||
--- | ||
# Client SDK | ||
The client SDK can be installed using standard package managers for JavaScript, Python, and Rust. Since the SDK is written in Rust, the JavaScript and Python packages come with no additional dependencies. | ||
## Installation | ||
Installing the SDK into your project is as simple as: | ||
{% tabs %} | ||
{% tab title="JavaScript " %} | ||
```bash | ||
npm i pgml | ||
``` | ||
{% endtab %} | ||
{% tab title="Python " %} | ||
```bash | ||
pip install pgml | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
## Getting started | ||
The SDK uses the database to perform most of its functionality. Before continuing, make sure you created a [PostgresML database](https://postgresml.org/signup) and have the `DATABASE_URL` connection string handy. | ||
### Connect to PostgresML | ||
The SDK automatically manages connections to PostgresML. The connection string can be specified as an argument to the collection constructor, or as an environment variable. | ||
If your app follows the twelve-factor convention, we recommend you configure the connection in the environment using the `PGML_DATABASE_URL` variable: | ||
```bash | ||
export PGML_DATABASE_URL=postgres://user:password@sql.cloud.postgresml.org:6432/pgml_database | ||
``` | ||
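Because the SDK reads the connection string from the environment, it can help to sanity-check the value before starting the app. The following is a minimal sketch using only the Python standard library. The helper name `read_database_url` is hypothetical and not part of the SDK; only the `PGML_DATABASE_URL` variable name comes from the text above.

```python
import os
from urllib.parse import urlparse


def read_database_url(env=os.environ):
    """Hypothetical helper (not part of the SDK): read and sanity-check PGML_DATABASE_URL."""
    url = env.get("PGML_DATABASE_URL")
    if not url:
        raise RuntimeError("PGML_DATABASE_URL is not set")

    parsed = urlparse(url)
    # Accept both common spellings of the PostgreSQL URL scheme.
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    database = parsed.path.lstrip("/")
    if not parsed.hostname or not database:
        raise ValueError("connection string needs a host and a database name")

    # Return the non-secret parts so they can be logged safely.
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "database": database,
        "user": parsed.username,
    }
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.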
### Create a collection | ||
The SDK is written in asynchronous code, so you need to run it inside an async runtime. Both Python and JavaScript support async functions natively. | ||
{% tabs %} | ||
{% tab title="JavaScript " %} | ||
```javascript | ||
const pgml = require("pgml"); | ||
const main = async () => { | ||
  const collection = pgml.newCollection("sample_collection"); | ||
}; | ||
``` | ||
{% endtab %} | ||
{% tab title="Python" %} | ||
```python | ||
from pgml import Collection, Pipeline | ||
import asyncio | ||
async def main(): | ||
    collection = Collection("sample_collection") | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
The above example imports the `pgml` module and creates a collection object. By itself, the collection only tracks document contents and identifiers, but once we add a pipeline, we can instruct the SDK to perform additional tasks when documents are inserted and retrieved. | ||
### Create a pipeline | ||
Continuing the example, we will create a pipeline called `sample_pipeline`, which will use in-database embeddings generation to automatically chunk and embed documents: | ||
{% tabs %} | ||
{% tab title="JavaScript" %} | ||
```javascript | ||
// Add this code to the end of the main function from the above example. | ||
const pipeline = pgml.newPipeline("sample_pipeline", { | ||
  text: { | ||
    splitter: { model: "recursive_character" }, | ||
    semantic_search: { | ||
      model: "intfloat/e5-small", | ||
    }, | ||
  }, | ||
}); | ||
await collection.add_pipeline(pipeline); | ||
``` | ||
{% endtab %} | ||
{% tab title="Python" %} | ||
```python | ||
# Add this code to the end of the main function from the above example. | ||
pipeline = Pipeline( | ||
    "sample_pipeline", | ||
    { | ||
        "text": { | ||
            "splitter": { "model": "recursive_character" }, | ||
            "semantic_search": { | ||
                "model": "intfloat/e5-small", | ||
            }, | ||
        }, | ||
    }, | ||
) | ||
await collection.add_pipeline(pipeline) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
The pipeline configuration is a key/value object, where the key is the name of a column in a document, and the value is the action the SDK should perform on that column. | ||
In this example, the documents contain a column called `text`. We are instructing the SDK to chunk its contents using the recursive character splitter, and to embed those chunks using the Hugging Face `intfloat/e5-small` embeddings model. | ||
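Since the configuration is a plain key/value object, it can be built programmatically. The sketch below assumes the same action shape shown above (a `splitter` and a `semantic_search` entry) applies to every column you list. The helper `build_pipeline_config` is hypothetical and not part of the SDK.

```python
def build_pipeline_config(fields, model="intfloat/e5-small"):
    """Hypothetical helper: apply the same splitter and embedding model to each column."""
    return {
        field: {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {"model": model},
        }
        for field in fields
    }


# Reproduces the configuration used in the example above.
config = build_pipeline_config(["text"])
```

The resulting dictionary can be passed as the second argument to `Pipeline(...)` in place of the literal used earlier.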
### Add documents | ||
Once the pipeline is configured, we can start adding documents: | ||
{% tabs %} | ||
{% tab title="JavaScript" %} | ||
```javascript | ||
// Add this code to the end of the main function from the above example. | ||
const documents = [ | ||
  { | ||
    id: "Document One", | ||
    text: "document one contents...", | ||
  }, | ||
  { | ||
    id: "Document Two", | ||
    text: "document two contents...", | ||
  }, | ||
]; | ||
await collection.upsert_documents(documents); | ||
``` | ||
{% endtab %} | ||
{% tab title="Python" %} | ||
```python | ||
# Add this code to the end of the main function in the above example. | ||
documents = [ | ||
    { | ||
        "id": "Document One", | ||
        "text": "document one contents...", | ||
    }, | ||
    { | ||
        "id": "Document Two", | ||
        "text": "document two contents...", | ||
    }, | ||
] | ||
await collection.upsert_documents(documents) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
If the same document `id` is used, the SDK computes the difference between existing and new documents and only updates the chunks that have changed. | ||
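To make the idea of updating only changed chunks concrete, here is a conceptual sketch that compares content hashes of old and new chunks. This is an illustration only: the text above does not say how the SDK computes the difference, and the function names below are hypothetical.

```python
import hashlib


def chunk_digest(chunk):
    """Stable content hash for a chunk of text."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def changed_chunks(old_chunks, new_chunks):
    """Return the new chunks whose content does not appear among the old chunks."""
    old_digests = {chunk_digest(c) for c in old_chunks}
    return [c for c in new_chunks if chunk_digest(c) not in old_digests]
```

With this approach, only the chunks returned by `changed_chunks` would need to be embedded again, which is the kind of saving the paragraph above describes.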
### Search documents | ||
Now that the documents are stored, chunked and embedded, we can start searching the collection: | ||
{% tabs %} | ||
{% tab title="JavaScript" %} | ||
```javascript | ||
// Add this code to the end of the main function in the above example. | ||
const results = await collection.vector_search( | ||
  { | ||
    query: { | ||
      fields: { | ||
        text: { | ||
          query: "Something about a document...", | ||
        }, | ||
      }, | ||
    }, | ||
    limit: 2, | ||
  }, | ||
  pipeline, | ||
); | ||
console.log(results); | ||
``` | ||
{% endtab %} | ||
{% tab title="Python" %} | ||
```python | ||
# Add this code to the end of the main function in the above example. | ||
results = await collection.vector_search( | ||
    { | ||
        "query": { | ||
            "fields": { | ||
                "text": { | ||
                    "query": "Something about a document...", | ||
                }, | ||
            }, | ||
        }, | ||
        "limit": 2, | ||
    }, | ||
    pipeline, | ||
) | ||
print(results) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
We are using built-in vector search, powered by embeddings and the PostgresML [pgml.embed()](../sql-extension/pgml.embed) function, which embeds the `query` argument, compares it to the embeddings stored in the database, and returns the top two results, ranked by cosine similarity. | ||
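The ranking step can be illustrated outside the database with a few lines of plain Python. This is a sketch of cosine-similarity ranking in general, not the SDK's implementation. The two-dimensional vectors are made up for the example, whereas real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Rank stored vectors by similarity to the query and keep the best k."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy embeddings, invented for illustration only.
stored = {
    "Document One": [1.0, 0.0],
    "Document Two": [0.6, 0.8],
    "Document Three": [0.0, 1.0],
}
ranked = top_k([1.0, 0.0], stored)
```

Here `ranked` holds the two documents closest to the query vector, highest score first, mirroring the `limit: 2` setting in the search call.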
### Run the example | ||
Since the SDK is using async code, both JavaScript and Python need a little bit of code to run it correctly: | ||
{% tabs %} | ||
{% tab title="JavaScript" %} | ||
```javascript | ||
main().then(() => { | ||
  console.log("SDK example complete"); | ||
}); | ||
``` | ||
{% endtab %} | ||
{% tab title="Python" %} | ||
```python | ||
if __name__ == "__main__": | ||
    asyncio.run(main()) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
Once you run the example, you should see something like this in the terminal: | ||
```bash | ||
[ | ||
    { | ||
        "chunk": "document one contents...", | ||
        "document": {"id": "Document One", "text": "document one contents..."}, | ||
        "score": 0.9034339189529419, | ||
    }, | ||
    { | ||
        "chunk": "document two contents...", | ||
        "document": {"id": "Document Two", "text": "document two contents..."}, | ||
        "score": 0.8983734250068665, | ||
    }, | ||
] | ||
``` | ||