Apr 26, 2024
diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md
@@ -15,7 +15,7 @@
 ## API

 * [Overview](api/apis.md)
-* [SQL Extension](api/sql-extension/README.md)
+* [SQL extension](api/sql-extension/README.md)
 * [pgml.deploy()](api/sql-extension/pgml.deploy.md)
 * [pgml.embed()](api/sql-extension/pgml.embed.md)
 * [pgml.chunk()](api/sql-extension/pgml.chunk.md)
@@ -85,7 +85,7 @@
 * [Documents](resources/data-storage-and-retrieval/documents.md)
 * [Partitioning](resources/data-storage-and-retrieval/partitioning.md)
 * [LLM based pipelines with PostgresML and dbt (data build tool)](resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md)
-* [Benchmarks](resources/benchmarks/README.md)
+* [Benchmarks](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)
 * [PostgresML is 8-40x faster than Python HTTP microservices](resources/benchmarks/postgresml-is-8-40x-faster-than-python-http-microservices.md)
 * [Scaling to 1 Million Requests per Second](resources/benchmarks/million-requests-per-second.md)
 * [MindsDB vs PostgresML](resources/benchmarks/mindsdb-vs-postgresml.md)
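diff --git a/pgml-cms/docs/api/apis.md b/pgml-cms/docs/api/apis.md
@@ -18,10 +18,11 @@ The PostgreSQL extension provides all of the ML & AI functionality, like trainin

 The following functions are implemented and maintained by the PostgresML extension:

-| Function name   | Description                                                                                                                                                                                        |
+| Function    | Description                                                                                                                                                                                        |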
 |------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [pgml.embed()](sql-extension/pgml.embed)     | Generate embeddings inside the database using open source embedding models from Hugging Face.                                                                                                |
 | [pgml.transform()](sql-extension/pgml.transform/) | Download and run latest Hugging Face transformer models, like Llama, Mixtral, and many more to perform various NLP tasks like text generation, summarization, sentiment analysis and more. |
+| pgml.transform_stream() | Streaming version of [pgml.transform()](sql-extension/pgml.transform/). Retrieve tokens as they are generated by the LLM, decreasing time to first token. |
 | [pgml.train()](sql-extension/pgml.train/)     | Train a machine learning model on data from a Postgres table or view. Supports XGBoost, LightGBM, Catboost and all Scikit-learn algorithms.       |
 | [pgml.deploy()](sql-extension/pgml.deploy)    | Deploy a version of the model created with pgml.train(). |
 | [pgml.predict()](sql-extension/pgml.predict/) | Perform real time inference using a model trained with pgml.train() on live application data. |
@@ -33,7 +34,7 @@ Together with standard database functionality provided by PostgreSQL, these func

 The client SDK implements best practices and common use cases, using the PostgresML SQL functions and standard PostgreSQL features to do it. The SDK core is written in Rust, which manages creating and running queries, connection pooling, and error handling.

-For each additional language we support (current JavaScript and Python), we create and publish language-native bindings. This architecture ensures all programming languages we support have identical APIs and similar performance when interacting with PostgresML.
+For each additional language we support (currently JavaScript and Python), we create and publish language-native bindings. This architecture ensures all programming languages we support have identical APIs and similar performance when interacting with PostgresML.

 ### Use cases

diff --git a/pgml-cms/docs/api/sql-extension/README.md b/pgml-cms/docs/api/sql-extension/README.md
@@ -1,69 +1,196 @@
 ---
 description: >-
-  The pgml extension for PostgreSQL provides Machine Learning and Artificial
+  The PostgresML extension for PostgreSQL provides Machine Learning and Artificial
   Intelligence APIs with access to algorithms to train your models, or download
-  SOTA open source models from HuggingFace.
+  state-of-the-art open source models from Hugging Face.
 ---

-# SQL Extension
+# SQL extension

-## Open Source Models
+PostgresML is a PostgreSQL extension which adds SQL functions to the database. Those functions provide access to AI models downloaded from Hugging Face, and classical machine learning algorithms like XGBoost and LightGBM.

-PostgresML integrates [🤗 Hugging Face Transformers](https://huggingface.co/transformers) to bring state-of-the-art models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw inputs into useful results. Many LLMs have been published and made available for download. You will want to browse all the [models](https://huggingface.co/models) available to find the perfect solution for your [dataset](https://huggingface.co/dataset) and [task](https://huggingface.co/tasks). The pgml extension provides a few APIs for different use cases:
+Our SQL API is stable and safe to use in your applications, while the models and algorithms we support continue to evolve and improve.

-* [pgml.embed.md](pgml.embed.md "mention") returns vector embeddings for nearest neighbor searches and other vector database use cases
-* [pgml.generate.md](pgml.generate.md "mention") returns streaming text responses for chatbots
-* [pgml.transform](../../api/sql-extension/pgml.transform/ "mention") allows you to perform dozens of natural language processing (NLP) tasks with thousands of models, like sentiment analysis, question and answering, translation, summarization and text generation
-* [pgml.tune.md](pgml.tune.md "mention") fine tunes an open source model on your own data
+## Open-source LLMs

-## Train & deploy your own models
+PostgresML defines two SQL functions which use [🤗 Hugging Face](https://huggingface.co/transformers) transformers and embeddings models, running directly in the database:

-PostgresML also supports more than 50 machine learning algorithms to train your own models for classification, regression or clustering. We organize a family of Models in Projects that are intended to address a particular opportunity. Different algorithms can be used in the same Project, to test and compare the performance of various approaches, and track progress over time, all within your database.
+| Function | Description |
+|---------------|-------------|
+| [pgml.embed()](pgml.embed) | Generate embeddings using latest sentence transformers from Hugging Face. |
+| [pgml.transform()](pgml.transform/) | Text generation using LLMs like Llama, Mixtral, and many more, with models downloaded from Hugging Face. |
+| pgml.transform_stream() | Streaming version of [pgml.transform()](pgml.transform/), which fetches partial responses as they are being generated by the model, substantially decreasing time to first token. |
+| [pgml.tune()](pgml.tune) | Perform fine tuning tasks on Hugging Face models, using data stored in the database. |

-### Train
+### Example

-Training creates a Model based on the data in your database.
+Using a SQL function for interacting with open-source models makes things really easy:

-```sql
-SELECT pgml.train(
-  project_name => 'Sales Forecast',
-  task => 'regression',
-  relation_name => 'hist_sales',
-  y_column_name => 'next_sales',
-  algorithm => 'xgboost'
-);
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT pgml.embed(
+  'intfloat/e5-small',
+  'This text will be embedded using the intfloat/e5-small model.'
+) AS embedding;
 ```

+{% endtab %}
+{% tab title="Output" %}
+
+```
+      embedding
+-------------------------------------------
+{-0.028478337,-0.06275077,-0.04322059, [...]
+```
+
-See [pgml.train](../../api/sql-extension/pgml.train/README.md) for more information.
+{% endtab %}
+{% endtabs %}
+
+Using the `pgml` SQL functions inside regular queries, it's possible to add embeddings and LLM-generated text inside any query, without the data ever leaving the database, removing the cost of a remote network call.
+
+## Classical machine learning
+
+PostgresML defines four SQL functions which allow training regression, classification, and clustering models on tabular data:
+
+| Function | Description |
+|---------------|-------------|
+| [pgml.train()](pgml.train/) | Train a model on PostgreSQL tables or views using any algorithm from Scikit-learn, with the additional support for XGBoost, LightGBM and Catboost. |
+| [pgml.predict()](pgml.predict/) | Run inference on live application data using a model trained with [pgml.train()](pgml.train/). |
+| [pgml.deploy()](pgml.deploy) | Deploy a specific version of a model trained with pgml.train(), using your own accuracy metrics. |
+| pgml.load_dataset() | Load any of the toy datasets from Scikit-learn or any dataset from Hugging Face. |
+
+### Example
+
+#### Load data

-### Deploy
+Using `pgml.load_dataset()`, we can load an example classification dataset from Scikit-learn:

-Deploy an active Model for a particular Project, using a deployment strategy to select the best model.
+{% tabs %}
+{% tab title="SQL" %}

-```sql
-SELECT pgml.deploy(
-  project_name => 'Sales Forecast',
-  strategy => 'best_score',
-  algorithm => 'xgboost'
+```postgresql
+SELECT *
+FROM pgml.load_dataset('digits');
+```
+
+{% endtab %}
+{% tab title="Output" %}
+
+```
+table_name  | rows
+-------------+------
+pgml.digits | 1797
+(1 row)
+```
+
+{% endtab %}
+{% endtabs %}
+
+#### Train a model
+
+Once we have some data, we can train a model on this data using [pgml.train()](pgml.train/):
+
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT *
+FROM pgml.train(
+  project_name => 'My project name',
+  task => 'classification',
+  relation_name =>'pgml.digits',
+  y_column_name => 'target',
+  algorithm => 'xgboost',
 );
 ```

-See [pgml.deploy.md](pgml.deploy.md "mention") for more information.
+{% endtab %}
+{% tab title="Output" %}

-### Predict
+```
+INFO:  Metrics: {
+  "f1": 0.8755124,
+  "precision": 0.87670505,
+  "recall": 0.88005465,
+  "accuracy": 0.87750554,
+  "mcc": 0.8645154,
+  "fit_time": 0.33504912,
+  "score_time": 0.001842427
+}
+
+    project     |      task      | algorithm | deployed
+-----------------+----------------+-----------+----------
+My project name | classification | xgboost   | t
+(1 row)

-Use your Model on novel data points not seen during training to infer a new data point.
+```
+
+{% endtab %}
+{% endtabs %}
+
+[pgml.train()](pgml.train/) reads data from the table, using the `target` column as the label, automatically splits the dataset into test and train sets, and trains an XGBoost model. Our extension supports more than 50 machine learning algorithms, and you can train a model using any of them by just changing the name of the `algorithm` argument.
+
+
+#### Real time inference
+
+Now that we have a model, we can use it to predict new data points, in real time, on live application data:

-```sql
-SELECT pgml.predict(
-  project_name => 'Sales Forecast',
-  features => ARRAY[
-    last_week_sales,
-    week_of_year
-  ]
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT
+  target,
+  pgml.predict(
+    'My project name',
+    image
 ) AS prediction
-FROM new_sales
-ORDER BY prediction DESC;
+FROM
+  pgml.digits
+LIMIT 1;
 ```

+{% endtab %}
+{% tab title="Output" %}
+
+```
+target | prediction
+--------+------------
+     0 |          0
+(1 row)
+```
+
+{% endtab %}
+{% endtabs %}
+
+#### Change model version
+
+The train function automatically deploys the best model into production, using the precision score relevant to the type of the model. If you prefer to deploy models using your own accuracy metrics, the [pgml.deploy()](pgml.deploy) function can manually change which model version is used for subsequent database queries:
+
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT *
+FROM
+  pgml.deploy(
+    'My project name',
+    strategy => 'most_recent',
+    algorithm => 'xgboost'
+);
+```
+
+{% endtab %}
+{% tab title="Output" %}
+
+```
+    project     |  strategy   | algorithm
+-----------------+-------------+-----------
+My project name | most_recent | xgboost
+(1 row)
+```
+
-See [pgml.predict](../../api/sql-extension/pgml.predict/ "mention") for more information.
+{% endtab %}
+{% endtabs %}
diff --git a/pgml-dashboard/src/components/cms/index_link/mod.rs b/pgml-dashboard/src/components/cms/index_link/mod.rs
@@ -73,7 +73,7 @@ impl IndexLink {
         self
     }

-    // Adds a suffix to this and all children ids.
+    // Adds a suffix to this and all children ids.
     // this prevents id collision with multiple naves on one screen
     // like d-none for mobile nav
     pub fn id_suffix(mut self, id_suffix: &str) -> IndexLink {
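
The hunk above only shows the comment and signature of `id_suffix`, so here is a minimal sketch of the behavior that comment describes: appending a suffix to a link's id and to the ids of all its children, so that two copies of the same navigation on one page do not collide. `NavLink`, its fields, and its constructor are hypothetical stand-ins written for this illustration; they are not the dashboard's actual `IndexLink` type, whose body is not part of this diff.

```rust
// Hypothetical stand-in for a navigation link with nested children.
// The real IndexLink in pgml-dashboard is not shown in the diff and likely differs.
#[derive(Debug, Clone)]
pub struct NavLink {
    pub id: String,
    pub children: Vec<NavLink>,
}

impl NavLink {
    pub fn new(id: &str) -> NavLink {
        NavLink {
            id: id.to_string(),
            children: Vec::new(),
        }
    }

    // Builder-style setter, mirroring the `mut self -> Self` pattern in the diff.
    pub fn children(mut self, children: Vec<NavLink>) -> NavLink {
        self.children = children;
        self
    }

    // Appends the suffix to this link's id and, recursively, to every child id.
    pub fn id_suffix(mut self, id_suffix: &str) -> NavLink {
        self.id.push_str(id_suffix);
        // Take ownership of the children so each one can be consumed and rebuilt.
        let children = std::mem::take(&mut self.children);
        self.children = children
            .into_iter()
            .map(|child| child.id_suffix(id_suffix))
            .collect();
        self
    }
}
```

Under these assumptions, cloning a desktop nav and calling `.id_suffix("-mobile")` on the clone would give the mobile copy distinct ids at every level while leaving the original untouched.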
Review comment by levkk (Contributor, Author), Apr 26, 2024, on the changed `Benchmarks` link in `pgml-cms/docs/SUMMARY.md`: This previously led to an empty page.