add PCA as first decomposition method #1441
Merged
50 changes: 50 additions & 0 deletions pgml-cms/docs/api/sql-extension/pgml.decompose.md
@@ -0,0 +1,50 @@

---
description: Decompose an input vector into its principal components
---
# pgml.decompose()

Decomposes an input vector into its principal components, using a model previously trained with `pgml.train()` for the `decomposition` task.

## API

```sql
pgml.decompose(
    project_name TEXT, -- project name
    vector REAL[] -- features to decompose
)
```

### Parameters

| Parameter      | Example                         | Description                                              |
|----------------|---------------------------------|----------------------------------------------------------|
| `project_name` | `'My First PostgresML Project'` | The project name used to train models in `pgml.train()`. |
| `vector`       | `ARRAY[0.1, 0.45, 1.0]`         | The feature vector that needs decomposition.             |

## Example

```sql
SELECT pgml.decompose('My PCA', ARRAY[0.1, 2.0, 5.0]);
```
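
As a quick sanity check, the length of the returned array can be compared against the `n_components` the model was trained with. The sketch below assumes a project named `'My PCA'` has already been trained on three-element feature vectors with the `decomposition` task; the explicit cast to `REAL[]` is optional and only there to match the declared parameter type.

```sql
-- Assumes 'My PCA' was trained beforehand with task 'decomposition'
-- on 3-element feature vectors (the project name is illustrative).
SELECT array_length(
    pgml.decompose('My PCA', ARRAY[0.1, 2.0, 5.0]::REAL[]),
    1
) AS n_components;
```

If the model was trained with `hyperparams => '{"n_components": 2}'`, this query would be expected to return 2.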
!!! example
```sql
SELECT *,
    pgml.decompose(
        'Buy it Again',
        ARRAY[
            users.location_id,
            -- account age in seconds; an interval cannot share an array with numbers
            EXTRACT(EPOCH FROM NOW() - users.created_at),
            users.total_purchases_in_dollars
        ]
    ) AS buying_score
FROM users
WHERE tenant_id = 5
ORDER BY buying_score
LIMIT 25;
```
!!!
6 changes: 3 additions & 3 deletions pgml-cms/docs/api/sql-extension/pgml.train/clustering.md
42 changes: 42 additions & 0 deletions pgml-cms/docs/api/sql-extension/pgml.train/decomposition.md
@@ -0,0 +1,42 @@

# Decomposition

Models can be trained using `pgml.train` on unlabeled data to identify important features within the data. To decompose a dataset into its principal components, we can use a table or a view. Since decomposition is an unsupervised algorithm, we don't need a column that represents a label as one of the inputs to `pgml.train`.

## Example

This example trains models on the sklearn digits dataset, which is a copy of the test set of the [UCI ML hand-written digits datasets](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits). This demonstrates using a table with a single array feature column for principal component analysis. You could do something similar with a vector column.

```sql
SELECT pgml.load_dataset('digits');
-- create an unlabeled table of the images for unsupervised learning
CREATE VIEW pgml.digit_vectors AS
SELECT image FROM pgml.digits;
-- view the dataset
SELECT left(image::text, 40) || ',...}' FROM pgml.digit_vectors LIMIT 10;
-- train a simple model to decompose the data
SELECT * FROM pgml.train('Handwritten Digit Components', 'decomposition', 'pgml.digit_vectors', hyperparams => '{"n_components": 3}');
-- check out the components
SELECT target, pgml.decompose('Handwritten Digit Components', image) AS pca
FROM pgml.digits
LIMIT 10;
```

Note that the input vectors have been reduced from 64 dimensions to 3 components, which together explain nearly half of the variance across all samples.
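
The explained variance can be read back from the stored model metrics rather than taken on faith. The query below is a sketch: the `cumulative_explained_variance` key appears in the example script elsewhere in this pull request, while the `pgml.projects` join and the column names `project_id`, `name`, `algorithm`, and `hyperparams` are assumptions about the catalog layout and may need adjusting.

```sql
-- Look up the most recent model for the project and show how much
-- variance its components explain (column names are assumptions).
SELECT models.id,
       models.algorithm,
       models.hyperparams,
       models.metrics->>'cumulative_explained_variance' AS cumulative_explained_variance
FROM pgml.models
JOIN pgml.projects ON projects.id = models.project_id
WHERE projects.name = 'Handwritten Digit Components'
ORDER BY models.id DESC
LIMIT 1;
```
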
## Algorithms

All decomposition algorithms implemented by PostgresML are online versions. You may use the [pgml.decompose](../../../api/sql-extension/pgml.decompose "mention") function to decompose novel data points after the model has been trained.

| Algorithm | Reference                                                                               |
|-----------|-----------------------------------------------------------------------------------------|
| `pca`     | [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) |

### Examples

```sql
SELECT * FROM pgml.train('Handwritten Digit Components', algorithm => 'pca', hyperparams => '{"n_components": 10}');
```
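
Once a model is trained, the reduced vectors can be materialized for downstream use instead of being recomputed on every query. This is a minimal sketch that reuses the project and dataset from the example above; the table name `digit_components` is hypothetical.

```sql
-- Store the decomposed vectors next to their labels
-- (digit_components is an illustrative table name).
CREATE TABLE digit_components AS
SELECT target,
       pgml.decompose('Handwritten Digit Components', image) AS pca
FROM pgml.digits;

-- Each stored vector should have as many elements as n_components.
SELECT array_length(pca, 1) AS n_components, count(*) AS n_rows
FROM digit_components
GROUP BY 1;
```
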
6 changes: 4 additions & 2 deletions pgml-dashboard/src/models.rs
File renamed without changes.
2 changes: 1 addition & 1 deletion pgml-extension/Cargo.lock
Some generated files are not rendered by default.
2 changes: 1 addition & 1 deletion pgml-extension/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "pgml"
version = "2.8.4"
edition = "2021"
[lib]
2 changes: 1 addition & 1 deletion pgml-extension/examples/cluster.sql → pgml-extension/examples/clustering.sql
60 changes: 60 additions & 0 deletions pgml-extension/examples/decomposition.sql
@@ -0,0 +1,60 @@
-- This example reduces the dimensionality of images in the sklearn digits dataset
-- which is a copy of the test set of the UCI ML hand-written digits datasets
-- https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
--
-- This demonstrates using a table with a single array feature column
-- for decomposition to reduce dimensionality.
--
-- Exit on error (psql)
-- \set ON_ERROR_STOP true
\timing on
SELECT pgml.load_dataset('digits');
-- view the dataset
SELECT left(image::text, 40) || ',...}', target FROM pgml.digits LIMIT 10;
-- create a view of just the vectors for decomposition, without any labels
CREATE VIEW digit_vectors AS
SELECT image FROM pgml.digits;
SELECT * FROM pgml.train('Handwritten Digits Reduction', 'decomposition', 'digit_vectors');
-- check out the decomposed vectors
SELECT target, pgml.decompose('Handwritten Digits Reduction', image) AS pca
FROM pgml.digits
LIMIT 10;
--
-- After a project has been trained, omitted parameters will be reused from previous training runs
-- In these examples we'll reuse the training data snapshots from the initial call.
--
-- We can reduce the image vectors from 64 dimensions to 3 components
SELECT * FROM pgml.train('Handwritten Digits Reduction', hyperparams => '{"n_components": 3}');
-- check out the reduced vectors
SELECT target, pgml.decompose('Handwritten Digits Reduction', image) AS pca
FROM pgml.digits
LIMIT 10;
-- check out all that hard work
SELECT trained_models.* FROM pgml.trained_models
JOIN pgml.models on models.id = trained_models.id
ORDER BY models.metrics->>'cumulative_explained_variance' DESC LIMIT 5;
-- deploy the PCA model for prediction use
SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'most_recent', 'pca');
-- check out that throughput
SELECT * FROM pgml.deployed_models ORDER BY deployed_at DESC LIMIT 5;
-- deploy the "best" model for prediction use
SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'best_score');
SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'most_recent');
SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'rollback');
SELECT * FROM pgml.deploy('Handwritten Digits Reduction', 'best_score', 'pca');
-- check out the improved predictions
SELECT target, pgml.predict('Handwritten Digits Reduction', image) AS prediction
FROM pgml.digits
LIMIT 10;
5 changes: 2 additions & 3 deletions pgml-extension/examples/image_classification.sql
2 changes: 1 addition & 1 deletion pgml-extension/examples/regression.sql
13 changes: 13 additions & 0 deletions pgml-extension/sql/pgml--2.8.3--2.8.4.sql
@@ -0,0 +1,13 @@
ALTER TYPE pgml.task RENAME VALUE 'cluster' TO 'clustering';
ALTER TYPE pgml.task ADD VALUE IF NOT EXISTS 'decomposition';
ALTER TYPE pgml.algorithm ADD VALUE IF NOT EXISTS 'pca';
-- pgml::api::decompose
CREATE FUNCTION pgml."decompose"(
    "project_name" TEXT, /* alloc::string::String */
    "vector" FLOAT4[] /* Vec<f32> */
) RETURNS FLOAT4[] /* Vec<f32> */
IMMUTABLE STRICT PARALLEL SAFE
LANGUAGE c /* Rust */
AS 'MODULE_PATHNAME', 'decompose_wrapper';
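
After the upgrade script has run, its effects can be verified from the system catalogs. The following is a sketch using only standard PostgreSQL catalogs; it assumes version 2.8.3 of the extension is currently installed and that the 2.8.4 files are available on the server.

```sql
-- Apply the upgrade script (assumes 2.8.3 is the installed version).
ALTER EXTENSION pgml UPDATE TO '2.8.4';

-- Confirm that the renamed and newly added enum labels exist.
SELECT t.typname AS enum_type, e.enumlabel
FROM pg_enum e
JOIN pg_type t ON t.oid = e.enumtypid
JOIN pg_namespace n ON n.oid = t.typnamespace
WHERE n.nspname = 'pgml'
  AND t.typname IN ('task', 'algorithm')
  AND e.enumlabel IN ('clustering', 'decomposition', 'pca')
ORDER BY t.typname, e.enumsortorder;

-- Confirm that the new function is registered with the expected signature.
SELECT p.oid::regprocedure AS signature
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pgml'
  AND p.proname = 'decompose';
```

Because the script renames `'cluster'` to `'clustering'`, existing calls that pass `'cluster'` as the task would need to be updated after the upgrade.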
11 changes: 10 additions & 1 deletion pgml-extension/src/api.rs
16 changes: 14 additions & 2 deletions pgml-extension/src/bindings/mod.rs
28 changes: 23 additions & 5 deletions pgml-extension/src/bindings/sklearn/mod.rs