Commit b044163

Merge branch 'master' into dan-product-left-nav-update

2 parents: dcd197e + 65b898d
File tree

83 files changed: +1728 additions, −787 deletions

README.md

Lines changed: 0 additions & 12 deletions
````diff
@@ -30,7 +30,6 @@
 </a>
 </p>
 
-
 # Table of contents
 - [Introduction](#introduction)
 - [Installation](#installation)
````
````diff
@@ -87,8 +86,6 @@ SELECT pgml.transform(
 ]
 ```
 
-
-
 **Sentiment Analysis**
 *SQL query*
 
````
````diff
@@ -117,7 +114,6 @@ SELECT pgml.transform(
 - [Millions of transactions per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second)
 - [Horizontal scalability](https://github.com/postgresml/pgcat)
 
-
 **Training a classification model**
 
 *Training*
````
````diff
@@ -242,7 +238,6 @@ SELECT pgml.transform(
 ```
 The default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> used for text classification is a fine-tuned version of DistilBERT-base-uncased that has been specifically optimized for the Stanford Sentiment Treebank dataset (sst2).
 
-
 *Using specific model*
 
 To use one of the over 19,000 models available on Hugging Face, include the name of the desired model and `text-classification` task as a JSONB object in the SQL query. For example, if you want to use a RoBERTa <a href="https://huggingface.co/models?pipeline_tag=text-classification" target="_blank">model</a> trained on around 40,000 English tweets and that has POS (positive), NEG (negative), and NEU (neutral) labels for its classes, include this information in the JSONB object when making your query.
````
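A sketch of such a query, following the `pgml.transform` pattern this README uses elsewhere (the specific model name here is an assumption matching the tweet-trained RoBERTa the text describes):

```sql
SELECT pgml.transform(
    task => '{
        "task": "text-classification",
        "model": "cardiffnlp/twitter-roberta-base-sentiment"
    }'::JSONB,
    inputs => ARRAY[
        'I love how amazingly simple ML has become!',
        'I hate doing mundane and thankless tasks.'
    ]
) AS result;
```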
````diff
@@ -681,7 +676,6 @@ SELECT pgml.transform(
 Sampling methods involve selecting the next word or sequence of words at random from the set of possible candidates, weighted by their probabilities according to the language model. This can result in more diverse and creative text, as well as avoiding repetitive patterns. In its most basic form, sampling means randomly picking the next word $w_t$ according to its conditional probability distribution:
 $$ w_t \approx P(w_t|w_{1:t-1})$$
 
-
 However, the randomness of the sampling method can also result in less coherent or inconsistent text, depending on the quality of the model and the chosen sampling parameters such as temperature, top-k, or top-p. Therefore, choosing an appropriate sampling method and parameters is crucial for achieving the desired balance between creativity and coherence in generated text.
 
 You can pass `do_sample = True` in the arguments to use sampling methods. It is recommended to alter `temperature` or `top_p` but not both.
````
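A sketch of passing sampling arguments through `args`, following the `pgml.transform` text-generation pattern used in this README (model choice and parameter values are illustrative):

```sql
SELECT pgml.transform(
    task => '{"task": "text-generation", "model": "gpt2"}'::JSONB,
    inputs => ARRAY['Once upon a time,'],
    args => '{"do_sample": true, "temperature": 0.7, "max_new_tokens": 40}'::JSONB
) AS completion;
```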
````diff
@@ -821,7 +815,6 @@ SELECT * from tweet_embeddings limit 2;
 |"QT@user In the original draft of the 7th book, Remus Lupin survived the Battle of Hogwarts. #HappyBirthdayRemusLupin"|{-0.1567948312,-0.3149209619,0.2163394839,..}|
 |"Ben Smith / Smith (concussion) remains out of the lineup Thursday, Curtis #NHL #SJ"|{-0.0701668188,-0.012231146,0.1304316372,.. }|
 
-
 ## Step 2: Indexing your embeddings using different algorithms
 After you've created embeddings for your data, you need to index them using one or more indexing algorithms. There are several different types of indexing algorithms available, including B-trees, k-nearest neighbors (KNN), and approximate nearest neighbors (ANN). The specific type of indexing algorithm you choose will depend on your use case and performance requirements. For example, B-trees are a good choice for range queries, while KNN and ANN algorithms are more efficient for similarity searches.
 
````
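As a concrete sketch, an ANN index over the `tweet_embeddings` table could be built with the pgvector extension (assuming pgvector is installed, which the README's `<->` similarity queries imply; the `lists` value is illustrative):

```sql
-- IVFFlat: an approximate nearest neighbor (ANN) index type from pgvector
CREATE INDEX tweet_embeddings_ann_idx
ON tweet_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```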
````diff
@@ -860,7 +853,6 @@ SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 5;
 |5 RT's if you want the next episode of twilight princess tomorrow|
 |Jurassic Park is BACK! New Trailer for the 4th Movie, Jurassic World -|
 
-
 <!-- ## Sentence Similarity
 Sentence Similarity involves determining the degree of similarity between two texts. To accomplish this, Sentence similarity models convert the input texts into vectors (embeddings) that encapsulate semantic information, and then measure the proximity (or similarity) between the vectors. This task is especially beneficial for tasks such as information retrieval and clustering/grouping.
 ![sentence similarity](pgml-cms/docs/images/sentence-similarity.png)
````
````diff
@@ -869,7 +861,6 @@ Sentence Similarity involves determining the degree of similarity between two te
 <!-- # Regression
 # Classification -->
 
-
 # LLM Fine-tuning
 
 In this section, we will provide a step-by-step walkthrough for fine-tuning a Language Model (LLM) for different tasks.
````
````diff
@@ -1036,7 +1027,6 @@ Fine-tuning a language model requires careful consideration of training paramete
 * hub_token: Your Hugging Face API token to push the fine-tuned model to the Hugging Face Model Hub. Replace "YOUR_HUB_TOKEN" with the actual token.
 * push_to_hub: A boolean flag indicating whether to push the model to the Hugging Face Model Hub after fine-tuning.
 
-
 #### 5.3 Monitoring
 During training, metrics like loss, gradient norm will be printed as info and also logged in pgml.logs table. Below is a snapshot of such output.
 
````
````diff
@@ -1151,7 +1141,6 @@ Here is an example pgml.transform call for real-time predictions on the newly mi
 Time: 175.264 ms
 ```
 
-
 **Batch predictions**
 
 ```sql
````
````diff
@@ -1247,7 +1236,6 @@ SELECT pgml.tune(
 
 By following these steps, you can effectively restart training from a previously trained model, allowing for further refinement and adaptation of the model based on new requirements or insights. Adjust parameters as needed for your specific use case and dataset.
 
-
 ## 8. Hugging Face Hub vs. PostgresML as Model Repository
 We utilize the Hugging Face Hub as the primary repository for fine-tuning Large Language Models (LLMs). Leveraging the HF hub offers several advantages:
 
````

Lines changed: 4 additions & 0 deletions
````diff
@@ -0,0 +1,4 @@
+.terraform
+*.lock.hcl
+*.tfstate
+*.tfstate.backup
````

packages/pgml-rds-proxy/ec2/README.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -0,0 +1,7 @@
+# Terraform configuration for pgml-rds-proxy on EC2
+
+This is a sample Terraform deployment for running pgml-rds-proxy on EC2. This will spin up an EC2 instance
+with a public IP and a working security group & install the community Docker runtime.
+
+Once the instance is running, you can connect to it using the root key and run the pgml-rds-proxy Docker container
+with the correct PostgresML `DATABASE_URL`.
````
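The two steps described above can be sketched as shell commands (the key path, host address, and image name are placeholders, not taken from this commit):

```bash
# Connect to the instance with the root key assigned in Terraform.
ssh -i ~/.ssh/root-key.pem ubuntu@<instance-public-ip>

# On the instance: run the proxy container, pointing it at the PostgresML database.
docker run -d -p 6432:6432 \
    -e DATABASE_URL=postgres://<user>:<password>@<postgresml-host>:6432/<database> \
    <pgml-rds-proxy-image>
```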
Lines changed: 84 additions & 0 deletions
````diff
@@ -0,0 +1,84 @@
+terraform {
+  required_providers {
+    aws = {
+      source  = "hashicorp/aws"
+      version = "~> 5.46"
+    }
+  }
+
+  required_version = ">= 1.2.0"
+}
+
+provider "aws" {
+  region = "us-west-2"
+}
+
+data "aws_ami" "ubuntu" {
+  most_recent = true
+
+  filter {
+    name   = "name"
+    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
+  }
+
+  filter {
+    name   = "virtualization-type"
+    values = ["hvm"]
+  }
+
+  owners = ["099720109477"] # Canonical
+}
+
+resource "aws_security_group" "pgml-rds-proxy" {
+  egress {
+    from_port        = 0
+    to_port          = 0
+    protocol         = "-1"
+    cidr_blocks      = ["0.0.0.0/0"]
+    ipv6_cidr_blocks = ["::/0"]
+  }
+
+  ingress {
+    from_port        = 6432
+    to_port          = 6432
+    protocol         = "tcp"
+    cidr_blocks      = ["0.0.0.0/0"]
+    ipv6_cidr_blocks = ["::/0"]
+  }
+
+  ingress {
+    from_port        = 22
+    to_port          = 22
+    protocol         = "tcp"
+    cidr_blocks      = ["0.0.0.0/0"]
+    ipv6_cidr_blocks = ["::/0"]
+  }
+}
+
+resource "aws_instance" "pgml-rds-proxy" {
+  ami           = data.aws_ami.ubuntu.id
+  instance_type = "t3.micro"
+  key_name      = var.root_key
+
+  root_block_device {
+    volume_size           = 30
+    delete_on_termination = true
+  }
+
+  vpc_security_group_ids = [
+    "${aws_security_group.pgml-rds-proxy.id}",
+  ]
+
+  associate_public_ip_address = true
+  user_data                   = file("${path.module}/user_data.sh")
+  user_data_replace_on_change = false
+
+  tags = {
+    Name = "pgml-rds-proxy"
+  }
+}
+
+variable "root_key" {
+  type        = string
+  description = "The name of the SSH Root Key you'd like to assign to this EC2 instance. Make sure it's a key you have access to."
+}
````
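Assuming standard Terraform CLI usage, this configuration could be applied as follows (the key name is a placeholder):

```bash
terraform init
terraform plan -var 'root_key=<your-ssh-key-name>'
terraform apply -var 'root_key=<your-ssh-key-name>'
```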
Lines changed: 21 additions & 0 deletions
````diff
@@ -0,0 +1,21 @@
+#!/bin/bash
+#
+# Cloud init script to install Docker on an EC2 instance running Ubuntu 22.04.
+#
+
+sudo apt-get update
+sudo apt-get install ca-certificates curl
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+
+# Add the repository to Apt sources:
+echo \
+  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
+  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
+  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+sudo apt-get update
+
+sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
+sudo groupadd docker
+sudo usermod -aG docker ubuntu
````

pgml-apps/pgml-chat/README.md

Lines changed: 0 additions & 5 deletions
````diff
@@ -14,7 +14,6 @@ Before you begin, make sure you have the following:
 - Python version >=3.8
 - (Optional) OpenAI API key
 
-
 # Getting started
 1. Create a virtual environment and install `pgml-chat` using `pip`:
 ```bash
@@ -104,7 +103,6 @@ model performance, as well as integrated notebooks for rapid iteration. Postgres
 If you have any further questions or need more information, please feel free to send an email to team@postgresml.org or join the PostgresML Discord community at https://discord.gg/DmyJP3qJ7U.
 ```
 
-
 ### Slack
 
 **Setup**
@@ -128,7 +126,6 @@ Once the slack app is running, you can interact with the chatbot on Slack as sho
 
 ![Slack Chatbot](./images/slack_screenshot.png)
 
-
 ### Discord
 
 **Setup**
@@ -194,8 +191,6 @@ pip install .
 4. Check the [roadmap](#roadmap) for features that you would like to work on.
 5. If you are looking for features that are not included here, please open an issue and we will add it to the roadmap.
 
-
-
 # Roadmap
 - ~~Use a collection for chat history that can be retrieved and used to generate responses.~~
 - Support for file formats like rst, html, pdf, docx, etc.
````

pgml-cms/.gitignore

Lines changed: 1 addition & 0 deletions
````diff
@@ -0,0 +1 @@
+*.md.bak
````
(binary file, 942 KB; preview not included)

0 commit comments
