
add careers #1176


Merged
montanalow merged 9 commits into master from montana/cms
Nov 27, 2023
add careers
@montanalow
montanalow committed Nov 26, 2023
commit 3229714188d0e822748210343b52c8936ec6bee1
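
The hunks shown for this commit all make the same kind of change: a link target that began with `/docs/guides/` now begins with `/docs/`. The file paths in the diff headers keep their `docs/guides` directory. As a minimal sketch of that rewrite, and not the tooling the author actually used, the pattern could be applied in Python as follows. The names `rewrite_links` and `LINK_PATTERN` are hypothetical, and the rule that a link target starts after `(`, a quote character, or the `https://postgresml.org` host is an assumption drawn from the hunks shown here.

```python
import re

# Assumption: the old and new URL prefixes, as seen in the hunks of this commit.
OLD_PREFIX = "/docs/guides/"
NEW_PREFIX = "/docs/"

# Match the old prefix only where it begins a link target: right after "(",
# a quote character (relative links), or the site host (absolute links).
# This leaves on-disk paths such as content/docs/guides/... untouched.
LINK_PATTERN = re.compile(
    r"(?P<lead>[(\"']|https://postgresml\.org)" + re.escape(OLD_PREFIX)
)


def rewrite_links(text: str) -> str:
    """Return text with /docs/guides/ link targets rewritten to /docs/."""
    return LINK_PATTERN.sub(lambda m: m.group("lead") + NEW_PREFIX, text)


if __name__ == "__main__":
    sample = "See [Training Overview](/docs/guides/training/overview/)."
    print(rewrite_links(sample))
```

Anchoring the match on the character before the prefix is the main design choice: a plain string replacement would also rewrite file paths that merely contain `docs/guides`.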
4 changes: 2 additions & 2 deletions README.md
@@ -108,7 +108,7 @@ SELECT pgml.transform(
```

## Tabular data
- [47+ classification and regression algorithms](https://postgresml.org/docs/guides/training/algorithm_selection)
- [47+ classification and regression algorithms](https://postgresml.org/docs/training/algorithm_selection)
- [8 - 40X faster inference than HTTP based model serving](https://postgresml.org/blog/postgresml-is-8x-faster-than-python-http-microservices)
- [Millions of transactions per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second)
- [Horizontal scalability](https://github.com/postgresml/pgcat)
@@ -154,7 +154,7 @@ docker run \
sudo -u postgresml psql -d postgresml
```

For more details, take a look at our [Quick Start with Docker](https://postgresml.org/docs/guides/developer-docs/quick-start-with-docker) documentation.
For more details, take a look at our [Quick Start with Docker](https://postgresml.org/docs/developer-docs/quick-start-with-docker) documentation.

# Getting Started

2 changes: 1 addition & 1 deletion packages/cargo-pgml-components/src/local_dev.rs
@@ -82,7 +82,7 @@ static PG_PGVECTOR: &str = "
static PG_PGML: &str = "To install PostgresML into your PostgreSQL database,
follow the instructions on:

\thttps://postgresml.org/docs/guides/setup/v2/installation
\thttps://postgresml.org/docs/setup/v2/installation
";

#[cfg(target_os = "linux")]
2 changes: 1 addition & 1 deletion pgml-dashboard/README.md
@@ -2,4 +2,4 @@

PostgresML provides a dashboard with analytical views of the training data and model performance, as well as integrated notebooks for rapid iteration. It is primarily written in Rust using [Rocket](https://rocket.rs/) as a lightweight web framework and [SQLx](https://github.com/launchbadge/sqlx) to interact with the database.

Please see the [quick start instructions](https://postgresml.org/docs/guides/getting-started/sign-up) for general information on installing or deploying PostgresML. A [developer guide](https://postgresml.org/developer_guide/overview/) is also available for those who would like to contribute.
Please see the [quick start instructions](https://postgresml.org/docs/getting-started/sign-up) for general information on installing or deploying PostgresML. A [developer guide](https://postgresml.org/developer_guide/overview/) is also available for those who would like to contribute.
@@ -121,7 +121,7 @@ LIMIT 5;

## Generating embeddings from natural language text

PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/guides/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
PostgresML provides a simple interface to generate embeddings from text in your database. You can use the [`pgml.embed`](https://postgresml.org/docs/transformers/embeddings) function to generate embeddings for a column of text. The function takes a transformer name and a text value. The transformer will automatically be downloaded and cached on your connection process for reuse. You can see a list of potential good candidate models to generate embeddings on the [Massive Text Embedding Benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

Since our corpus of documents (movie reviews) are all relatively short and similar in style, we don't need a large model. <code>[intfloat/e5-small](https://huggingface.co/intfloat/e5-small)</code> will be a good first attempt. The great thing about PostgresML is you can always regenerate your embeddings later to experiment with different embedding models.

2 changes: 1 addition & 1 deletion pgml-dashboard/content/docs/about/faq.md
@@ -10,7 +10,7 @@ Postgres is widely considered mission critical, and some of the most [reliable](

*How good are the models?*

Model quality is often a trade-off between compute resources and incremental quality improvements. Sometimes a few thousands training examples and an off the shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose several [different algorithms](/docs/guides/training/algorithm_selection/) to get the most bang for the buck, or invest in more computationally intensive techniques as necessary. In addition, PostgresML can automatically apply best practices for [data cleaning](/docs/guides/training/preprocessing/)) like imputing missing values by default and normalizing features to prevent common problems in production.
Model quality is often a trade-off between compute resources and incremental quality improvements. Sometimes a few thousands training examples and an off the shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose several [different algorithms](/docs/training/algorithm_selection/) to get the most bang for the buck, or invest in more computationally intensive techniques as necessary. In addition, PostgresML can automatically apply best practices for [data cleaning](/docs/training/preprocessing/)) like imputing missing values by default and normalizing features to prevent common problems in production.

PostgresML doesn't help with reformulating a business problem into a machine learning problem. Like most things in life, the ultimate in quality will be a concerted effort of experts working over time. PostgresML is intended to establish successful patterns for those experts to collaborate around while leveraging the expertise of open source and research communities.

@@ -1,6 +1,6 @@
# Dashboard

PostgresML comes with a web app to provide visibility into models and datasets in your database. If you're running [our Docker container](/docs/guides/developer-docs/quick-start-with-docker), you can view it running on [http://localhost:8000/](http://localhost:8000/).
PostgresML comes with a web app to provide visibility into models and datasets in your database. If you're running [our Docker container](/docs/developer-docs/quick-start-with-docker), you can view it running on [http://localhost:8000/](http://localhost:8000/).


## Generate example data
@@ -51,7 +51,7 @@ LIMIT 25;

### Example

If you've already been through the [Training Overview](/docs/guides/training/overview/), you can see the results of those efforts:
If you've already been through the [Training Overview](/docs/training/overview/), you can see the results of those efforts:

=== "SQL"

@@ -106,7 +106,7 @@ SELECT * FROM pgml.deployed_models;

PostgresML will automatically deploy a model only if it has better metrics than existing ones, so it's safe to experiment with different algorithms and hyperparameters.

Take a look at [Deploying Models](/docs/guides/predictions/deployments/) documentation for more details.
Take a look at [Deploying Models](/docs/predictions/deployments/) documentation for more details.

## Specific Models

@@ -1,6 +1,6 @@
# Deployments

Deployments are an artifact of calls to `pgml.deploy()` and `pgml.train()`. See [Deployments](/docs/guides/predictions/deployments/) for ways to create new deployments manually.
Deployments are an artifact of calls to `pgml.deploy()` and `pgml.train()`. See [Deployments](/docs/predictions/deployments/) for ways to create new deployments manually.

![Deployment](/dashboard/static/images/dashboard/deployment.png)

2 changes: 1 addition & 1 deletion pgml-dashboard/content/docs/guides/schema/models.md
@@ -1,6 +1,6 @@
# Models

Models are an artifact of calls to `pgml.train()`. See [Training Overview](/docs/guides/training/overview/) for ways to create new models.
Models are an artifact of calls to `pgml.train()`. See [Training Overview](/docs/training/overview/) for ways to create new models.

![Models](/dashboard/static/images/dashboard/model.png)

2 changes: 1 addition & 1 deletion pgml-dashboard/content/docs/guides/schema/projects.md
@@ -1,6 +1,6 @@
# Projects

Projects are an artifact of calls to `pgml.train()`. See [Training Overview](/docs/guides/training/overview/) for ways to create new projects.
Projects are an artifact of calls to `pgml.train()`. See [Training Overview](/docs/training/overview/) for ways to create new projects.

![Projects](/dashboard/static/images/dashboard/project.png)

2 changes: 1 addition & 1 deletion pgml-dashboard/content/docs/guides/schema/snapshots.md
@@ -1,6 +1,6 @@
# Snapshots

Snapshots are an artifact of calls to `pgml.train()` that specify the `relation_name` and `y_column_name` parameters. See [Training Overview](/docs/guides/training/overview/) for ways to create new snapshots.
Snapshots are an artifact of calls to `pgml.train()` that specify the `relation_name` and `y_column_name` parameters. See [Training Overview](/docs/training/overview/) for ways to create new snapshots.

![Snapshots](/dashboard/static/images/dashboard/snapshot.png)

@@ -22,7 +22,7 @@ psql \
-f dump.sql
```

If you're using our <a href="/docs/guides/developer-docs/quick-start-with-docker">Docker</a> stack, you can import the data there:</p>
If you're using our <a href="/docs/developer-docs/quick-start-with-docker">Docker</a> stack, you can import the data there:</p>

```
psql \
6 changes: 3 additions & 3 deletions pgml-dashboard/content/docs/guides/setup/gpu_support.md
@@ -9,13 +9,13 @@ Models trained on GPU may also require GPU support to make predictions. Consult
!!!

## Tensorflow
GPU setup for Tensorflow is covered in the [documentation](https://www.tensorflow.org/install/pip). You may acquire pre-trained GPU enabled models for fine tuning from [Hugging Face](/docs/guides/transformers/fine_tuning/).
GPU setup for Tensorflow is covered in the [documentation](https://www.tensorflow.org/install/pip). You may acquire pre-trained GPU enabled models for fine tuning from [Hugging Face](/docs/transformers/fine_tuning/).

## Torch
GPU setup for Torch is covered in the [documentation](https://pytorch.org/get-started/locally/). You may acquire pre-trained GPU enabled models for fine tuning from [Hugging Face](/docs/guides/transformers/fine_tuning/).
GPU setup for Torch is covered in the [documentation](https://pytorch.org/get-started/locally/). You may acquire pre-trained GPU enabled models for fine tuning from [Hugging Face](/docs/transformers/fine_tuning/).

## Flax
GPU setup for Flax is covered in the [documentation](https://github.com/google/jax#pip-installation-gpu-cuda). You may acquire pre-trained GPU enabled models for fine tuning from [Hugging Face](/docs/guides/transformers/fine_tuning/).
GPU setup for Flax is covered in the [documentation](https://github.com/google/jax#pip-installation-gpu-cuda). You may acquire pre-trained GPU enabled models for fine tuning from [Hugging Face](/docs/transformers/fine_tuning/).

## XGBoost
GPU setup for XGBoost is covered in the [documentation](https://xgboost.readthedocs.io/en/stable/gpu/index.html).
@@ -2,7 +2,7 @@

!!! note

With the release of PostgresML 2.0, this documentation has been deprecated. New installation instructions are <a href="/docs/guides/setup/v2/installation/">available</a>.
With the release of PostgresML 2.0, this documentation has been deprecated. New installation instructions are <a href="/docs/setup/v2/installation/">available</a>.

!!!

@@ -278,7 +278,7 @@ The following common machine learning tasks are performed automatically by Postg
4. Save it into the model store (a Postgres table)
5. Load it and cache it during inference

Check out our [Training](/docs/guides/training/overview/) and [Predictions](/docs/guides/predictions/overview/) documentation for more details. Some more advanced topics like [hyperparameter search](/docs/guides/training/hyperparameter_search/) and [GPU acceleration](/docs/guides/setup/gpu_support/) are available as well.
Check out our [Training](/docs/training/overview/) and [Predictions](/docs/predictions/overview/) documentation for more details. Some more advanced topics like [hyperparameter search](/docs/training/hyperparameter_search/) and [GPU acceleration](/docs/setup/gpu_support/) are available as well.

## Dashboard

@@ -10,7 +10,7 @@ The extension can be installed by compiling it from source, or if you're using U

!!! tip

If you're just looking to try PostgresML without installing it on your system, take a look at our [Quick Start with Docker](/docs/guides/developer-docs/quick-start-with-docker) guide.
If you're just looking to try PostgresML without installing it on your system, take a look at our [Quick Start with Docker](/docs/developer-docs/quick-start-with-docker) guide.

!!!

@@ -5,7 +5,7 @@ The API is identical between v1.0 and v2.0, and models trained with v1.0 can be

!!! note

Make sure you've set up the system requirements in [v2.0 installation](/docs/guides/setup/v2/installation/), so that the v2.0 extension may be installed.
Make sure you've set up the system requirements in [v2.0 installation](/docs/setup/v2/installation/), so that the v2.0 extension may be installed.

!!!

@@ -28,7 +28,7 @@ SELECT * FROM pgml.train(

!!!

You may pass any of the arguments listed in the algorithms documentation as hyperparameters. See [Algorithms](/docs/guides/training/algorithm_selection/) for the complete list of algorithms and their associated hyperparameters.
You may pass any of the arguments listed in the algorithms documentation as hyperparameters. See [Algorithms](/docs/training/algorithm_selection/) for the complete list of algorithms and their associated hyperparameters.

### Search Algorithms

8 changes: 4 additions & 4 deletions pgml-dashboard/content/docs/guides/training/overview.md
@@ -30,9 +30,9 @@ pgml.train(
| `task` | The objective of the experiment: `regression` or `classification`. | `classification` |
| `relation_name` | The Postgres table or view where the training data is stored or defined. | `public.users` |
| `y_column_name` | The name of the label (aka "target" or "unknown") column in the training table. | `is_bot` |
| `algorithm` | The algorithm to train on the dataset, see [Algorithm Selection](/docs/guides/training/algorithm_selection/) for details. | `xgboost` |
| `algorithm` | The algorithm to train on the dataset, see [Algorithm Selection](/docs/training/algorithm_selection/) for details. | `xgboost` |
| `hyperparams ` | The hyperparameters to pass to the algorithm for training, JSON formatted. | `{ "n_estimators": 25 }` |
| `search` | If set, PostgresML will perform a hyperparameter search to find the best hyperparameters for the algorithm. See [Hyperparameter Search](/docs/guides/training/hyperparameter_search/) for details. | `grid` |
| `search` | If set, PostgresML will perform a hyperparameter search to find the best hyperparameters for the algorithm. See [Hyperparameter Search](/docs/training/hyperparameter_search/) for details. | `grid` |
| `search_params` | Search parameters used in the hyperparameter search, using the scikit-learn notation, JSON formatted. | ```{ "n_estimators": [5, 10, 25, 100] }``` |
| `search_args` | Configuration parameters for the search, JSON formatted. Currently only `n_iter` is supported for `random` search. | `{ "n_iter": 10 }` |
| `test_size ` | Fraction of the dataset to use for the test set and algorithm validation. | `0.25` |
@@ -136,7 +136,7 @@ target |

## Training a Model

Now that we've got data, we're ready to train a model using an algorithm. We'll start with the default `linear` algorithm to demonstrate the basics. See the [Algorithms](/docs/guides/training/algorithm_selection/) for a complete list of available algorithms.
Now that we've got data, we're ready to train a model using an algorithm. We'll start with the default `linear` algorithm to demonstrate the basics. See the [Algorithms](/docs/training/algorithm_selection/) for a complete list of available algorithms.


=== "SQL"
@@ -177,7 +177,7 @@ INFO: Metrics: {
===


The output gives us information about the training run, including the `deployed` status. This is great news indicating training has successfully reached a new high score for the project's key metric and our new model was automatically deployed as the one that will be used to make new predictions for the project. See [Deployments](/docs/guides/predictions/deployments/) for a guide to managing the active model.
The output gives us information about the training run, including the `deployed` status. This is great news indicating training has successfully reached a new high score for the project's key metric and our new model was automatically deployed as the one that will be used to make new predictions for the project. See [Deployments](/docs/predictions/deployments/) for a guide to managing the active model.

## Inspecting the results
Now we can inspect some of the artifacts a training run creates.
@@ -29,7 +29,7 @@ There are 3 steps to preprocessing data:
- [Imputing](#imputing-missing-values) NULL values to some quantitative value
- [Scaling](#scaling-values) quantitative values across all variables to similar ranges

These preprocessing steps may be specified on a per-column basis to the [train()](/docs/guides/training/overview/) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.
These preprocessing steps may be specified on a per-column basis to the [train()](/docs/training/overview/) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.

```postgresql title="pgml.train()"
SELECT pgml.train(
@@ -94,7 +94,7 @@ SELECT pgml.transform(

===

See [text classification documentation](https://huggingface.co/tasks/text-classification) for more options and potential use cases beyond sentiment analysis. You'll notice the outputs are not great in this example. RoBERTa is a breakthrough model, that demonstrated just how important each particular hyperparameter is for the task and particular dataset regardless of how large your model is. We'll show how to [fine tune](/docs/guides/transformers/fine_tuning/) models on your data in the next step.
See [text classification documentation](https://huggingface.co/tasks/text-classification) for more options and potential use cases beyond sentiment analysis. You'll notice the outputs are not great in this example. RoBERTa is a breakthrough model, that demonstrated just how important each particular hyperparameter is for the task and particular dataset regardless of how large your model is. We'll show how to [fine tune](/docs/transformers/fine_tuning/) models on your data in the next step.

### Summarization
Sometimes we need all the nuanced detail, but sometimes it's nice to get to the point. Summarization can reduce a very long and complex document to a few sentences. One studied application is reducing legal bills passed by Congress into a plain english summary. Hollywood may also need some intelligence to reduce a full synopsis down to a pithy blurb for movies like Inception.
@@ -225,4 +225,4 @@ SELECT pgml.transform(
===

### More
There are many different [tasks](https://huggingface.co/tasks) and tens of thousands of state-of-the-art [models](https://huggingface.co/models) available for you to explore. The possibilities are expanding every day. There can be amazing performance improvements in domain specific versions of these general tasks by fine tuning published models on your dataset. See the next section for [fine tuning](/docs/guides/transformers/fine_tuning/) demonstrations.
There are many different [tasks](https://huggingface.co/tasks) and tens of thousands of state-of-the-art [models](https://huggingface.co/models) available for you to explore. The possibilities are expanding every day. There can be amazing performance improvements in domain specific versions of these general tasks by fine tuning published models on your dataset. See the next section for [fine tuning](/docs/transformers/fine_tuning/) demonstrations.