Commit 5f7f626

Merge pull request #1 from postgresml/montana/readme

MVP goals

2 parents 28c58de + fea419b, commit 5f7f626

14 files changed: +691 -294 lines changed

README.md

Lines changed: 79 additions & 5 deletions
@@ -1,6 +1,82 @@
-## Postgres ML demo
+## PostgresML
+
+PostgresML aims to be the easiest way to gain value from machine learning. Anyone with a basic understanding of SQL should be able to build and deploy models to production, while receiving the benefits of a high-performance machine learning platform. PostgresML leverages state-of-the-art algorithms with built-in best practices, without the need to set up additional infrastructure or learn additional programming languages.
+
+Getting started is as easy as creating a `table` or `view` that holds the training data, and then registering that with PostgresML.
+
+```sql
+SELECT pgml.model_regression('Red Wine Quality', training_data_table_or_view_name, label_column_name);
+```
+
+And predict novel datapoints:
+
+```sql
+SELECT pgml.predict('Red Wine Quality', red_wines.*)
+FROM pgml.red_wines
+LIMIT 3;
+
+ quality
+---------
+ 0.896432
+ 0.834822
+ 0.954502
+(3 rows)
+```
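
As a rough illustration of the train-then-predict flow shown in the README, the sketch below fits a one-feature least-squares regression in plain Python and scores a new row. It is not PostgresML's implementation: the data, `fit_linear`, and `predict` are hypothetical names chosen for this example only.

```python
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Score one new datapoint with a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training rows: (feature, label). Stands in for a table or view.
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
model = fit_linear([r[0] for r in rows], [r[1] for r in rows])
print(predict(model, 5.0))
```

The "training" step corresponds to registering a labeled table, and `predict` corresponds to scoring rows that were not part of it.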
+
+PostgresML similarly supports classification to predict discrete classes rather than numeric scores for novel data.
+
+```sql
+SELECT pgml.create_classification('Handwritten Digit Classifier', pgml.mnist_training_data, label_column_name);
+```
+
+And predict novel datapoints:
+
+```sql
+SELECT pgml.predict('Handwritten Digit Classifier', pgml.mnist_test_data.*)
+FROM pgml.mnist_test_data
+LIMIT 1;
+
+ digit | likelihood
+-------+------------
+ 5     | 0.956432
+(1 row)
+```
+
+Check out the [documentation](https://TODO) to view the full capabilities, including:
+- [Creating Training Sets](https://TODO)
+- [Classification](https://TODO)
+- [Regression](https://TODO)
+- [Supported Algorithms](https://TODO)
+- [Scikit Learn](https://TODO)
+- [XGBoost](https://TODO)
+- [Tensorflow](https://TODO)
+- [PyTorch](https://TODO)
+
+### Planned features
+- Model management dashboard
+- Data explorer
+- More algorithms and libraries, including custom algorithm support
+
+
+### FAQ
+
+*How well does this scale?*
+
+Petabyte-sized Postgres deployments have been [documented](https://www.computerworld.com/article/2535825/size-matters--yahoo-claims-2-petabyte-database-is-world-s-biggest--busiest.html) in production since at least 2008, and [recent patches](https://www.2ndquadrant.com/en/blog/postgresql-maximum-table-size/) have enabled working beyond exabyte up to the yottabyte scale. Machine learning models can be horizontally scaled using well-tested Postgres replication techniques on top of a mature storage and compute platform.
+
+*How reliable is this system?*
+
+Postgres is widely considered mission-critical and among the most [reliable](https://www.postgresql.org/docs/current/wal-reliability.html) technology in any modern stack. PostgresML allows an infrastructure organization to leverage pre-existing best practices to deploy machine learning into production with less risk and effort than other systems. For example, model backup and recovery happens automatically alongside normal data backup procedures.
+
+*How good are the models?*
+
+Model quality is often a tradeoff between compute resources and incremental quality improvements. PostgresML allows stakeholders to choose algorithms from several libraries that will provide the most bang for the buck. In addition, PostgresML automatically applies best practices for data cleaning, like imputing missing values by default and normalizing data, to prevent common problems in production. After quickly enabling 0 to 1 value creation, PostgresML enables further expert iteration with custom data preparation and algorithm implementations. Like most things in life, the ultimate in quality will be a concerted effort of experts working over time, but that shouldn't get in the way of a quick start.
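
To make the data cleaning defaults mentioned above concrete, here is a minimal sketch of mean imputation followed by normalization. The README does not say which strategies PostgresML applies, so both choices here (filling with the column mean, rescaling to z-scores) are assumptions, and `impute_mean` and `standardize` are hypothetical helper names.

```python
from statistics import fmean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit variance (z-scores)."""
    mu, sigma = fmean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no signal; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical feature column with one missing value.
raw = [7.4, None, 7.8, 11.2]
clean = standardize(impute_mean(raw))
print(clean)
```

Imputing before scaling matters: the mean and standard deviation used for normalization are then computed over a column with no gaps.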
+
+*Is PostgresML fast?*
+
+Colocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack: the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parallelization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will utilize GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. Check out our [benchmarks](https://todo).
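
The (de)serialization cost described above can be felt even without a network. The sketch below scores the same rows twice: once in process, and once through a JSON encode/decode round trip that stands in for shipping data to a separate service. The `score` function, its weights, and the row shape are hypothetical, and the measured times will vary by machine.

```python
import json
import timeit


def score(row):
    """Stand-in for a model: a fixed weighted sum of the features."""
    weights = (0.5, -0.25, 0.1)
    return sum(w * x for w, x in zip(weights, row))


def score_in_process(rows):
    """Score rows directly, as code colocated with the data could."""
    return [score(r) for r in rows]


def score_via_wire(rows):
    """Score rows after simulating only the encode/decode steps of a remote call."""
    request = json.dumps(rows)
    decoded = json.loads(request)
    response = json.dumps([score(r) for r in decoded])
    return json.loads(response)


rows = [[float(i), float(i + 1), float(i + 2)] for i in range(1000)]
local_s = timeit.timeit(lambda: score_in_process(rows), number=20)
wire_s = timeit.timeit(lambda: score_via_wire(rows), number=20)
print(f"in-process: {local_s:.4f}s  with JSON round trip: {wire_s:.4f}s")
```

Both paths return identical scores; the second simply pays for encoding and decoding on every call, which is the overhead colocation avoids.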
+
 
-Quick demo with Postgres, PL/Python, and Scikit.
 
 ### Installation in WSL or Ubuntu
 
@@ -29,11 +105,9 @@ Install Scikit globally (I didn't bother setup Postgres with a virtualenv, but i
 sudo pip3 install sklearn
 ```
 
-### Run the demo
+### Run the example
 
 ```bash
-sudo mkdir /app/models
-sudo chown postgres:postgres /app/models
 psql -f scikit_train_and_predict.sql
 ```
 
benchmarks.sql

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+--
+-- CREATE EXTENSION
+--
+CREATE EXTENSION IF NOT EXISTS plpython3u;
+
+CREATE OR REPLACE FUNCTION pg_call()
+RETURNS INT
+AS $$
+BEGIN
+    RETURN 1;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE OR REPLACE FUNCTION py_call()
+RETURNS INT
+AS $$
+    return 1;
+$$ LANGUAGE plpython3u;
+
+\timing on
+SELECT generate_series(1, 50000), pg_call(); -- Time: 20.679 ms
+SELECT generate_series(1, 50000), py_call(); -- Time: 67.355 ms
+
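
The two `-- Time:` comments can be turned into a per-call figure with simple arithmetic. The sketch below uses only the numbers recorded in benchmarks.sql; `per_call_us` is a hypothetical helper name. Each total also includes `generate_series` and result handling, so the per-call values are upper bounds on function-call overhead rather than exact costs.

```python
CALLS = 50_000
plpgsql_ms = 20.679    # total time recorded for pg_call()
plpython_ms = 67.355   # total time recorded for py_call()


def per_call_us(total_ms, calls=CALLS):
    """Convert a total in milliseconds to microseconds per call."""
    return total_ms * 1000.0 / calls


pg_us = per_call_us(plpgsql_ms)
py_us = per_call_us(plpython_ms)
ratio = plpython_ms / plpgsql_ms

print(f"plpgsql:    {pg_us:.3f} us per call")
print(f"plpython3u: {py_us:.3f} us per call")
print(f"ratio: {ratio:.2f}x, extra: {py_us - pg_us:.3f} us per call")
```

On these figures, a PL/Python call costs roughly 1.35 microseconds against about 0.41 for PL/pgSQL, a little over three times as much but still under a microsecond of added overhead per call.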
