 </a>
 </p>
 
-<p align="center">
-Train and deploy models to make online predictions using only SQL, with an open source extension for Postgres. Manage your projects and visualize datasets using the built-in dashboard.
-</p>
 
-
+## Table of contents
+- [Introduction](#introduction)
+- [Installation](#installation)
+- [Getting started](#getting-started)
+- [Natural Language Processing](#nlp-tasks)
+- [Regression](#regression)
+- [Classification](#classification)
 
-The dashboard makes it easy to compare different algorithms or hyperparameters across models and datasets.
+## Introduction
+PostgresML is a PostgreSQL extension that lets you perform ML training and inference on text and tabular data using SQL queries. With PostgresML, you can integrate machine learning models directly into your PostgreSQL database and run state-of-the-art algorithms on text and tabular data efficiently.
 
-[](https://cloud.postgresml.org/)
+### Text Data
+- Perform natural language processing (NLP) tasks such as sentiment analysis, question answering, translation, summarization, and text generation
+- Access thousands of state-of-the-art language models, such as GPT-2, GPT-J, and GPT-Neo, from the :hugging_face: Hugging Face model hub
+- Fine-tune large language models (LLMs) on your own text data for different tasks
 
-<h2 align="center">
-See it in action — <a href="https://cloud.postgresml.org/" target="_blank">cloud.postgresml.org</a>
-</h2>
+**Translation**
+<table>
+<tr>
+<td>SQL Query</td>
+<td>Result</td>
+</tr>
+<tr>
+<td>
+
+```sql
+SELECT pgml.transform(
+    'translation_en_to_fr',
+    inputs => ARRAY[
+        'Welcome to the future!',
+        'Where have you been all this time?'
+    ]
+) AS french;
+```
+</td>
+<td>
+
+```sql
+                         french
+------------------------------------------------------------
+
+[
+    {"translation_text": "Bienvenue à l'avenir!"},
+    {"translation_text": "Où êtes-vous allé tout ce temps?"}
+]
+```
+</td>
+</tr>
+</table>
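The same `pgml.transform` call shape should cover the other Text Data tasks listed above, such as summarization. This is a minimal sketch. It assumes `'summarization'` is accepted as a task name in the same way `'translation_en_to_fr'` is, and that the database server can download the default Hugging Face model for that task. The input sentence is illustrative only.

```sql
-- Hedged sketch: summarize one document with the default model for the
-- 'summarization' task. The input text is a made-up example.
SELECT pgml.transform(
    'summarization',
    inputs => ARRAY[
        'PostgresML is a PostgreSQL extension that lets you train models and run inference on text and tabular data using SQL queries, so the data does not have to leave the database.'
    ]
) AS summary;
```

The result is a JSONB value, like the translation output. Its exact shape depends on the model that gets loaded.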
+
+
+
+**Sentiment Analysis**
+<table>
+<tr>
+<td>SQL Query</td>
+<td>Result</td>
+</tr>
+<tr>
+<td>
+
+```sql
+SELECT pgml.transform(
+    '{"model": "roberta-large-mnli"}'::JSONB,
+    inputs => ARRAY[
+        'I love how amazingly simple ML has become!',
+        'I hate doing mundane and thankless tasks. ☹️'
+    ]
+) AS positivity;
+```
+</td>
+<td>
+
+```sql
+                    positivity
+------------------------------------------------------
+[
+    {"label": "NEUTRAL", "score": 0.8143417835235596},
+    {"label": "NEUTRAL", "score": 0.7637073993682861}
+]
+```
+</td>
+</tr>
+</table>
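`roberta-large-mnli` is a natural language inference model, with labels such as CONTRADICTION, NEUTRAL, and ENTAILMENT. That may explain why both inputs in the sentiment example above come back `NEUTRAL`. The sketch below asks for the generic `text-classification` task instead of naming a model. It assumes the extension then loads the Hugging Face pipeline's default model for that task, which in recent `transformers` releases is a sentiment classifier. No output is shown because it depends on that model.

```sql
-- Hedged sketch: pass a task name rather than a specific model, so the
-- pipeline's default 'text-classification' model is used.
SELECT pgml.transform(
    'text-classification',
    inputs => ARRAY[
        'I love how amazingly simple ML has become!',
        'I hate doing mundane and thankless tasks. ☹️'
    ]
) AS positivity;
```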
+
+
+### Tabular data
+- [47+ classification and regression algorithms](https://postgresml.org/docs/guides/training/algorithm_selection)
+- [8-40x faster inference than HTTP-based model serving](https://postgresml.org/blog/postgresml-is-8x-faster-than-python-http-microservices)
+- [Up to one million requests per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second)
+- [Horizontal scalability](https://github.com/postgresml/pgcat)
+
+
+**Training a classification model**
+
+<table>
+<tr>
+<td>Training</td>
+<td>Inference</td>
+</tr>
+<tr>
+<td>
+
+
+```sql
+SELECT * FROM pgml.train(
+    'Handwritten Digit Image Classifier',
+    'classification',
+    'pgml.digits',
+    'target',
+    algorithm => 'xgboost'
+);
+```
+
+</td>
+<td>
+
+```sql
+SELECT target, pgml.predict(
+    'Handwritten Digit Image Classifier',
+    image
+) AS prediction
+FROM pgml.digits
+LIMIT 10;
+```
+</td>
+</tr>
+</table>
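The training query reads from `pgml.digits`, so that table has to be populated before training. The sketch below shows the surrounding workflow: load the toy dataset, train, then pin a deployment explicitly. It assumes the extension's `pgml.load_dataset` helper and a `pgml.deploy` function that takes a `strategy` argument. Check the PostgresML documentation for the strategies it accepts.

```sql
-- 1. Load the bundled "digits" toy dataset into pgml.digits.
SELECT * FROM pgml.load_dataset('digits');

-- 2. Train an XGBoost classifier that predicts the "target" column.
--    Positional arguments come first; the named argument goes last.
SELECT * FROM pgml.train(
    'Handwritten Digit Image Classifier',
    'classification',
    'pgml.digits',
    'target',
    algorithm => 'xgboost'
);

-- 3. Deploy the best-scoring model trained so far for this project.
SELECT * FROM pgml.deploy(
    'Handwritten Digit Image Classifier',
    strategy => 'best_score'
);
```

Once a model is deployed, `pgml.predict` calls that use the same project name are served by that model.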
+
+## Installation
+PostgresML installation consists of three parts: a PostgreSQL database, the Postgres extension for machine learning, and a dashboard app. The extension provides all of the machine learning functionality and can be used on its own from any SQL IDE. The dashboard app provides an easy-to-use interface for writing SQL notebooks, running and tracking ML experiments, and tracking ML models.
+
+### Docker
 
-Please see the [quick start instructions](https://postgresml.org/user_guides/setup/quick_start_with_docker/) for general information on installing or deploying PostgresML. A [developer guide](https://postgresml.org/docs/guides/setup/developers) is also available for those who would like to contribute.
+Step 1: Clone this repository
+
+```bash
+git clone git@github.com:postgresml/postgresml.git
+```
+
+Step 2: Start the Dockerized services. PostgresML will run on port 5433, just in case you already have Postgres running. You can find Docker installation instructions [here](https://docs.docker.com/desktop/).
+```bash
+cd postgresml
+docker-compose up
+```
+
+Step 3: Connect to PostgreSQL with PostgresML enabled using a SQL IDE or [`psql`](https://www.postgresql.org/docs/current/app-psql.html):
+```bash
+psql postgres://postgres@localhost:5433/pgml_development
+```
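After connecting, you can check that the extension is installed by asking it for its version. The previous Quick Start used the same check. The version string returned depends on the installed release.

```sql
-- Run inside psql (or any SQL IDE) connected to pgml_development.
-- Returns the installed PostgresML extension version as text.
SELECT pgml.version();
```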
+
+### Free trial
+If you want to try PostgresML without setting up Docker, sign up for a free account [here](https://postgresml.org/signup). The free account provides 5 GiB of disk space on a shared tenant.
+
+## Getting Started
+
+### IDE support
+- DBeaver
+- DataGrip
+- Tableau
+- Power BI
+- Jupyter
+- VS Code
+
+## NLP Tasks
+- Text Classification
+- Token Classification
+- Table Question Answering
+- Question Answering
+- Zero-Shot Classification
+- Translation
+- Summarization
+- Conversational
+- Text Generation
+- Text2Text Generation
+- Fill-Mask
+- Sentence Similarity
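Some of these tasks need extra arguments at call time. Zero-shot classification, for example, needs a set of candidate labels. The sketch below passes them as JSONB. It assumes that `pgml.transform` accepts named `task`, `args`, and `inputs` parameters, that it forwards `args` to the underlying Hugging Face pipeline, and that the `facebook/bart-large-mnli` model can be downloaded. The labels and input sentence are illustrative.

```sql
-- Hedged sketch: zero-shot classification with caller-supplied labels.
-- "task" selects the pipeline and model; "args" is passed to the pipeline call.
SELECT pgml.transform(
    task   => '{"task": "zero-shot-classification", "model": "facebook/bart-large-mnli"}'::JSONB,
    args   => '{"candidate_labels": ["urgent", "not urgent", "billing", "technical"]}'::JSONB,
    inputs => ARRAY['My database server is down and customers cannot log in!']
) AS zero_shot;
```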
+
+## Regression
+## Classification
+
+## Applications
+### Text
+- AI writing partner
+- Chatbot for customer support
+- Social media post analysis
+- Fintech
+- Healthcare
+- Insurance
+
+
+### Tabular data
+- Fraud detection
+- Recommendation
+
+
+## Benefits
+- Access to Hugging Face models: a little more about open source language models
+- Ease of fine-tuning and why
+- Rust-based extension and its benefits
+- Problems with HTTP serving and how PostgresML enables microsecond latency
+- PgCat for horizontal scaling
+
+## Concepts
+- Database
+- Extension
+- ML on text data
+- Transform operation
+- Fine-tune operation
+- ML on tabular data
+- Train operation
+- Deploy operation
+- Predict operation
+
+## Deployment
+- Docker images
+- CPU
+- GPU
+- Data persistence on local/EC2/EKS
+- Deployment on AWS using Docker images
 
 ## What's in the box
 See the documentation for a complete **[list of functionality](https://postgresml.org/)**.
@@ -73,35 +280,6 @@ Since your data never leaves the database, you retain the speed, reliability and
 ### Open source
 We're building on the shoulders of giants. These machine learning libraries and Postgres have received extensive academic and industry use, and we'll continue their tradition to build with the community. Licensed under MIT.
 
-## Quick Start
-
-1) Clone this repo:
-
-```bash
-$ git clone git@github.com:postgresml/postgresml.git
-```
-
-2) Start dockerized services. PostgresML will run on port 5433, just in case you already have Postgres running:
-
-```bash
-$ cd postgresml && docker-compose up
-```
-
-3) Connect to PostgreSQL in the Docker container with PostgresML installed:
+## Frequently Asked Questions (FAQs)
 
-```bash
-$ psql postgres://postgres@localhost:5433/pgml_development
-```
-
-4) Validate your installation:
-
-```sql
-pgml_development=# SELECT pgml.version();
-
-  version
----------
-0.8.1
-(1 row)
-```
 
-See the documentation for a complete guide to **[working with PostgresML](https://postgresml.org/)**.