MVP goals #1

Merged
levkk merged 17 commits into master from montana/readme
Apr 14, 2022
17 commits
92b80f2
MVP goals
Apr 12, 2022
f18f276
Use unittest as the test running harness
Apr 12, 2022
3c66272
remove validate because validation has a different meaning in ML, and…
Apr 12, 2022
958cfba
keep model in memory to avoid going to disk
Apr 12, 2022
14b1f61
use bytea directly for pl/python rather than hex/text conversion
Apr 12, 2022
829b62e
add a draft schema to support snapshots and multiple training runs fo…
Apr 12, 2022
9907aaa
sketch out the regression model training cycle
Apr 13, 2022
b50f000
break it down into model classes
Apr 13, 2022
89b467d
add categoricals
Apr 14, 2022
d9d6727
Update pgml/tests/test_train.py
montanalow, Apr 14, 2022
dfb57c6
fix categorical test
Apr 14, 2022
56e033d
Merge branch 'montana/readme' of github.com:postgresml/postgresml int…
Apr 14, 2022
a1ef909
docs
Apr 14, 2022
c2de3d8
make test that "works"
Apr 14, 2022
ffedbc5
Update pgml/pgml/model.py
montanalow, Apr 14, 2022
aa44f94
remove parens around ifs
Apr 14, 2022
4ca1a5f
Merge branch 'montana/readme' of github.com:postgresml/postgresml int…
Apr 14, 2022
84 changes: 79 additions & 5 deletions README.md
@@ -1,6 +1,82 @@
## Postgres ML demo
## PostgresML

PostgresML aims to be the easiest way to gain value from machine learning. Anyone with a basic understanding of SQL should be able to build and deploy models to production, while receiving the benefits of a high-performance machine learning platform. PostgresML leverages state-of-the-art algorithms with built-in best practices, without having to set up additional infrastructure or learn additional programming languages.

Getting started is as easy as creating a `table` or `view` that holds the training data, and then registering that with PostgresML.

```sql
SELECT pgml.model_regression('Red Wine Quality', training_data_table_or_view_name, label_column_name);
```

And predict novel datapoints:

```sql
SELECT pgml.predict('Red Wine Quality', red_wines.*)
FROM pgml.red_wines
LIMIT 3;

quality
---------
0.896432
0.834822
0.954502
(3 rows)
```

PostgresML similarly supports classification to predict discrete classes rather than numeric scores for novel data.

```sql
SELECT pgml.create_classification('Handwritten Digit Classifier', pgml.mnist_training_data, label_column_name);
```

And predict novel datapoints:

```sql
SELECT pgml.predict('Handwritten Digit Classifier', pgml.mnist_test_data.*)
FROM pgml.mnist_test_data
LIMIT 1;

 digit | likelihood
-------+------------
     5 |   0.956432
(1 row)
```
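
As a rough illustration of where a `digit | likelihood` pair like the one above could come from, the sketch below picks the most probable class from a list of per-class probabilities. It is plain Python and not PostgresML's implementation; the helper `most_likely_class` and the sample probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def most_likely_class(probabilities: Sequence[float]) -> Tuple[int, float]:
    """Return (class_index, probability) for the highest-probability class."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best, probabilities[best]


# Hypothetical per-digit probabilities for one image (indices 0..9).
sample = [0.001, 0.002, 0.003, 0.010, 0.005, 0.956432, 0.008, 0.004, 0.006, 0.004568]
digit, likelihood = most_likely_class(sample)
print(digit, likelihood)  # prints: 5 0.956432
```

Ties resolve to the lowest class index here, which is an arbitrary choice for the sketch.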

Check out the [documentation](https://TODO) to view the full capabilities, including:
- [Creating Training Sets](https://TODO)
- [Classification](https://TODO)
- [Regression](https://TODO)
- [Supported Algorithms](https://TODO)
  - [Scikit Learn](https://TODO)
  - [XGBoost](https://TODO)
  - [TensorFlow](https://TODO)
  - [PyTorch](https://TODO)

### Planned features
- Model management dashboard
- Data explorer
- More algorithms and libraries, including custom algorithm support


### FAQ

*How well does this scale?*

Petabyte-sized Postgres deployments have been [documented](https://www.computerworld.com/article/2535825/size-matters--yahoo-claims-2-petabyte-database-is-world-s-biggest--busiest.html) in production since at least 2008, and [recent patches](https://www.2ndquadrant.com/en/blog/postgresql-maximum-table-size/) have enabled working beyond the exabyte scale, up to the yottabyte scale. Machine learning models can be horizontally scaled using well-tested Postgres replication techniques on top of a mature storage and compute platform.

*How reliable is this system?*

Postgres is widely considered mission critical, and one of the most [reliable](https://www.postgresql.org/docs/current/wal-reliability.html) technologies in any modern stack. PostgresML allows an infrastructure organization to leverage pre-existing best practices to deploy machine learning into production with less risk and effort than other systems. For example, model backup and recovery happens automatically alongside normal data backup procedures.

*How good are the models?*

Model quality is often a tradeoff between compute resources and incremental quality improvements. PostgresML allows stakeholders to choose algorithms from several libraries that will provide the most bang for the buck. In addition, PostgresML automatically applies best practices for data cleaning, like imputing missing values by default and normalizing data, to prevent common problems in production. After quickly enabling 0 to 1 value creation, PostgresML enables further expert iteration with custom data preparation and algorithm implementations. Like most things in life, the ultimate in quality will be a concerted effort of experts working over time, but that shouldn't get in the way of a quick start.
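
To make the data cleaning steps concrete, here is a minimal sketch of imputing missing values and normalizing a feature column, using only the Python standard library. It assumes mean imputation and z-score standardization, which are common defaults but not necessarily what PostgresML applies; the function names and the sample column are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Optional, Sequence


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    mean = fmean(observed)
    return [mean if v is None else v for v in column]


def standardize(column: Sequence[float]) -> List[float]:
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        # A constant column carries no signal; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


# Hypothetical feature column with one missing value.
alcohol = [9.0, None, 11.0, 10.0]
filled = impute_mean(alcohol)
print(filled)  # prints: [9.0, 10.0, 11.0, 10.0]
print(standardize(filled))
```

In practice the imputation mean and scaling parameters would be computed on the training data and reused at prediction time, so that training and production inputs are transformed identically.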

*Is PostgresML fast?*

Colocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack, which is the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parallelization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will utilize GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. Check out our [benchmarks](https://todo).


Quick demo with Postgres, PL/Python, and Scikit.

### Installation in WSL or Ubuntu

@@ -29,11 +105,9 @@ Install Scikit globally (I didn't bother setup Postgres with a virtualenv, but i
sudo pip3 install scikit-learn
```

### Run the demo
### Run the example

```bash
sudo mkdir /app/models
sudo chown postgres:postgres /app/models
psql -f scikit_train_and_predict.sql
```

23 changes: 23 additions & 0 deletions benchmarks.sql
@@ -0,0 +1,23 @@
--
-- CREATE EXTENSION
--
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION pg_call()
RETURNS INT
AS $$
BEGIN
RETURN 1;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION py_call()
RETURNS INT
AS $$
return 1;
$$ LANGUAGE plpython3u;

\timing on
SELECT generate_series(1, 50000), pg_call(); -- Time: 20.679 ms
SELECT generate_series(1, 50000), py_call(); -- Time: 67.355 ms
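
For a sense of scale, the two timings recorded in the comments above can be turned into a per-row cost. The sketch below is plain arithmetic on those reported numbers, not a new measurement. Each figure also includes the cost of `generate_series` and result handling, so it is an upper bound on the function call overhead rather than the overhead itself.

```python
ROWS = 50_000
PLPGSQL_MS = 20.679    # total time reported for the pg_call() query
PLPYTHON_MS = 67.355   # total time reported for the py_call() query


def per_row_microseconds(total_ms: float, rows: int) -> float:
    """Convert a total query time in milliseconds to microseconds per row."""
    return total_ms * 1000.0 / rows


pg_us = per_row_microseconds(PLPGSQL_MS, ROWS)
py_us = per_row_microseconds(PLPYTHON_MS, ROWS)

print(f"plpgsql:    {pg_us:.2f} us/row")
print(f"plpython3u: {py_us:.2f} us/row")
print(f"ratio:      {PLPYTHON_MS / PLPGSQL_MS:.2f}x")
```

That works out to roughly 0.41 and 1.35 microseconds per row, so the PL/Python call is about 3.3 times slower here while still costing on the order of a microsecond.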
