SQL API docs #1426
Merged
4 changes: 2 additions & 2 deletions pgml-cms/docs/SUMMARY.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -15,7 +15,7 @@ | ||
## API | ||
* [Overview](api/apis.md) | ||
* [SQL extension](api/sql-extension/README.md) | ||
* [pgml.deploy()](api/sql-extension/pgml.deploy.md) | ||
* [pgml.embed()](api/sql-extension/pgml.embed.md) | ||
* [pgml.chunk()](api/sql-extension/pgml.chunk.md) | ||
@@ -85,7 +85,7 @@ | ||
* [Documents](resources/data-storage-and-retrieval/documents.md) | ||
* [Partitioning](resources/data-storage-and-retrieval/partitioning.md) | ||
* [LLM based pipelines with PostgresML and dbt (data build tool)](resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md) | ||
* [Benchmarks](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md) | ||
Review comment: This previously led to an empty page. | ||
* [PostgresML is 8-40x faster than Python HTTP microservices](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md) | ||
* [Scaling to 1 Million Requests per Second](resources/benchmarks/million-requests-per-second.md) | ||
* [MindsDB vs PostgresML](resources/benchmarks/mindsdb-vs-postgresml.md) | ||
5 changes: 3 additions & 2 deletions pgml-cms/docs/api/apis.md
211 changes: 169 additions & 42 deletions pgml-cms/docs/api/sql-extension/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,69 +1,196 @@ | ||
--- | ||
description: >- | ||
The PostgresML extension for PostgreSQL provides Machine Learning and Artificial | ||
Intelligence APIs with access to algorithms to train your models, or download | ||
state-of-the-art open source models from Hugging Face. | ||
--- | ||
# SQL extension | ||
PostgresML is a PostgreSQL extension which adds SQL functions to the database. Those functions provide access to AI models downloaded from Hugging Face, and classical machine learning algorithms like XGBoost and LightGBM. | ||
Our SQL API is stable and safe to use in your applications, while the models and algorithms we support continue to evolve and improve. | ||
## Open-source LLMs | ||
PostgresML defines four SQL functions which use [🤗 Hugging Face](https://huggingface.co/transformers) transformers and embedding models, running directly in the database: | ||
| Function | Description | | ||
|---------------|-------------| | ||
| [pgml.embed()](pgml.embed) | Generate embeddings using the latest sentence transformers from Hugging Face. | | ||
| [pgml.transform()](pgml.transform/) | Text generation using LLMs like Llama, Mixtral, and many more, with models downloaded from Hugging Face. | | ||
| pgml.transform_stream() | Streaming version of [pgml.transform()](pgml.transform/), which fetches partial responses as they are being generated by the model, substantially decreasing time to first token. | | ||
| [pgml.tune()](pgml.tune) | Perform fine tuning tasks on Hugging Face models, using data stored in the database. | | ||
### Example | ||
Interacting with an open-source model takes a single SQL function call: | ||
{% tabs %} | ||
{% tab title="SQL" %} | ||
```postgresql | ||
SELECT pgml.embed( | ||
'intfloat/e5-small', | ||
'This text will be embedded using the intfloat/e5-small model.' | ||
) AS embedding; | ||
``` | ||
{% endtab %} | ||
{% tab title="Output" %} | ||
``` | ||
embedding | ||
------------------------------------------- | ||
{-0.028478337,-0.06275077,-0.04322059, [...] | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
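
Text generation follows the same pattern through `pgml.transform()`. The query below is a minimal sketch rather than a recommended configuration: the `gpt2` model name, the prompt, and the `max_new_tokens` value are assumptions chosen for illustration, and the model is downloaded from Hugging Face on first use.

```postgresql
-- Sketch only: the model name, prompt, and args are illustrative assumptions.
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "gpt2"}'::JSONB,
    inputs => ARRAY['PostgresML runs machine learning models'],
    args   => '{"max_new_tokens": 20}'::JSONB
) AS generated_text;
```

The output is omitted here because generated text varies with the model and its sampling settings.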
Using the `pgml` SQL functions inside regular queries, it's possible to add embeddings and LLM-generated text inside any query, without the data ever leaving the database, removing the cost of a remote network call. | ||
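
As a rough illustration of calling `pgml.embed()` inside a regular query, the sketch below embeds every row of a table in a single statement. The `documents` table and its columns are hypothetical and exist only for this example; the model is the same `intfloat/e5-small` used above.

```postgresql
-- Hypothetical table, created only for this illustration.
CREATE TABLE IF NOT EXISTS documents (
    id   BIGSERIAL PRIMARY KEY,
    body TEXT NOT NULL
);

INSERT INTO documents (body) VALUES
    ('PostgresML runs models inside the database.'),
    ('Embeddings are computed next to the data.');

-- One embedding per row, computed without the text leaving the database.
SELECT
    id,
    pgml.embed('intfloat/e5-small', body) AS embedding
FROM documents;
```

Because the function call is an ordinary SQL expression, it can be combined with `WHERE`, `JOIN`, and `LIMIT` clauses like any other column expression.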
## Classical machine learning | ||
PostgresML defines four SQL functions which allow training regression, classification, and clustering models on tabular data: | ||
| Function | Description | | ||
|---------------|-------------| | ||
| [pgml.train()](pgml.train/) | Train a model on PostgreSQL tables or views using any algorithm from Scikit-learn, with additional support for XGBoost, LightGBM, and CatBoost. | | ||
| [pgml.predict()](pgml.predict/) | Run inference on live application data using a model trained with [pgml.train()](pgml.train/). | | ||
| [pgml.deploy()](pgml.deploy) | Deploy a specific version of a model trained with pgml.train(), using your own accuracy metrics. | | ||
| pgml.load_dataset() | Load any of the toy datasets from Scikit-learn or any dataset from Hugging Face. | | ||
### Example | ||
#### Load data | ||
Using `pgml.load_dataset()`, we can load an example classification dataset from Scikit-learn: | ||
{% tabs %} | ||
{% tab title="SQL" %} | ||
```postgresql | ||
SELECT * | ||
FROM pgml.load_dataset('digits'); | ||
``` | ||
{% endtab %} | ||
{% tab title="Output" %} | ||
``` | ||
table_name | rows | ||
-------------+------ | ||
pgml.digits | 1797 | ||
(1 row) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
#### Train a model | ||
Once we have some data, we can train a model on this data using [pgml.train()](pgml.train/): | ||
{% tabs %} | ||
{% tab title="SQL" %} | ||
```postgresql | ||
SELECT * | ||
FROM pgml.train( | ||
project_name => 'My project name', | ||
task => 'classification', | ||
relation_name => 'pgml.digits', | ||
y_column_name => 'target', | ||
algorithm => 'xgboost' | ||
); | ||
``` | ||
{% endtab %} | ||
{% tab title="Output" %} | ||
``` | ||
INFO: Metrics: { | ||
"f1": 0.8755124, | ||
"precision": 0.87670505, | ||
"recall": 0.88005465, | ||
"accuracy": 0.87750554, | ||
"mcc": 0.8645154, | ||
"fit_time": 0.33504912, | ||
"score_time": 0.001842427 | ||
} | ||
project | task | algorithm | deployed | ||
-----------------+----------------+-----------+---------- | ||
My project name | classification | xgboost | t | ||
(1 row) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
[pgml.train()](pgml.train/) reads data from the table, using the `target` column as the label, automatically splits the dataset into test and train sets, and trains an XGBoost model. Our extension supports more than 50 machine learning algorithms, and you can train a model using any of them by just changing the value of the `algorithm` argument. | ||
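
To make the last point concrete, here is a sketch of the same training call with only the `algorithm` value changed. It assumes the `pgml.digits` table loaded earlier and that LightGBM is available in the installation. The resulting metrics are not shown because they depend on the run.

```postgresql
-- Same project and data as above; only the algorithm value differs.
-- Assumes pgml.digits was loaded with pgml.load_dataset('digits').
SELECT *
FROM pgml.train(
    project_name  => 'My project name',
    task          => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm     => 'lightgbm'
);
```

Reusing the project name trains the new model under the same project, so models from different algorithms can be compared and deployed from one place.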
#### Real time inference | ||
Now that we have a model, we can use it to predict new data points, in real time, on live application data: | ||
{% tabs %} | ||
{% tab title="SQL" %} | ||
```postgresql | ||
SELECT | ||
target, | ||
pgml.predict( | ||
'My project name', | ||
image | ||
) AS prediction | ||
FROM | ||
pgml.digits | ||
LIMIT 1; | ||
``` | ||
{% endtab %} | ||
{% tab title="Output" %} | ||
``` | ||
target | prediction | ||
--------+------------ | ||
0 | 0 | ||
(1 row) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} | ||
#### Change model version | ||
The train function automatically deploys the best model into production, using the precision score relevant to the type of the model. If you prefer to deploy models using your own accuracy metrics, the [pgml.deploy()](pgml.deploy) function can manually change which model version is used for subsequent database queries: | ||
{% tabs %} | ||
{% tab title="SQL" %} | ||
```postgresql | ||
SELECT * | ||
FROM | ||
pgml.deploy( | ||
'My project name', | ||
strategy => 'most_recent', | ||
algorithm => 'xgboost' | ||
); | ||
``` | ||
{% endtab %} | ||
{% tab title="Output" %} | ||
``` | ||
project | strategy | algorithm | ||
-----------------+-------------+----------- | ||
My project name | most_recent | xgboost | ||
(1 row) | ||
``` | ||
{% endtab %} | ||
{% endtabs %} |
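
Besides `most_recent`, `pgml.deploy()` also accepts other strategies. The sketch below assumes the `rollback` strategy, which reverts the project to the previously deployed model. It is shown here without output, and the exact behavior should be checked against the `pgml.deploy()` reference page.

```postgresql
-- Sketch: revert 'My project name' to the previously deployed model.
SELECT *
FROM pgml.deploy(
    'My project name',
    strategy => 'rollback'
);
```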
2 changes: 1 addition & 1 deletion pgml-dashboard/src/components/cms/index_link/mod.rs