Data docs#1418


Merged
levkk merged 12 commits into master from levkk-more-docs
Apr 24, 2024
4 changes: 4 additions & 0 deletions packages/pgml-rds-proxy/ec2/.gitignore
@@ -0,0 +1,4 @@
.terraform
*.lock.hcl
*.tfstate
*.tfstate.backup
7 changes: 7 additions & 0 deletions packages/pgml-rds-proxy/ec2/README.md
@@ -0,0 +1,7 @@
# Terraform configuration for pgml-rds-proxy on EC2

This is a sample Terraform deployment for running pgml-rds-proxy on EC2. This will spin up an EC2 instance
with a public IP and a working security group & install the community Docker runtime.

Once the instance is running, you can connect to it using the root key and run the pgml-rds-proxy Docker container
with the correct PostgresML `DATABASE_URL`.
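
As a rough illustration of that last step, the sketch below assembles a `docker run` command that passes `DATABASE_URL` to the container and publishes port 6432, the port opened in the security group. The image name, the hostname and the credentials are assumptions made for the example, not values taken from this README; check the project's published packages for the actual image.

```python
import shlex
from urllib.parse import urlparse

# Assumption: illustrative image name and tag; verify against the
# project's published pgml-rds-proxy image before using it.
DEFAULT_IMAGE = "ghcr.io/postgresml/pgml-rds-proxy:latest"


def build_proxy_command(database_url: str, image: str = DEFAULT_IMAGE, port: int = 6432) -> list[str]:
    """Return the argv for a `docker run` call that starts the proxy container."""
    parsed = urlparse(database_url)
    # Reject anything that is not a PostgreSQL connection URL.
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("DATABASE_URL must include a host")
    return [
        "docker", "run", "--detach",
        "--env", f"DATABASE_URL={database_url}",
        "--publish", f"{port}:{port}",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical credentials and host, for illustration only.
    url = "postgres://user:password@example.postgresml.org:6432/pgml"
    print(shlex.join(build_proxy_command(url)))
```

Printing the command instead of executing it lets you review it before running it on the instance over SSH.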
84 changes: 84 additions & 0 deletions packages/pgml-rds-proxy/ec2/ec2-deployment.tf
@@ -0,0 +1,84 @@
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.46"
    }
  }

  required_version = ">= 1.2.0"
}

provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_security_group" "pgml-rds-proxy" {
  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  ingress {
    from_port        = 6432
    to_port          = 6432
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  ingress {
    from_port        = 22
    to_port          = 22
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}

resource "aws_instance" "pgml-rds-proxy" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  key_name      = var.root_key

  root_block_device {
    volume_size           = 30
    delete_on_termination = true
  }

  vpc_security_group_ids = [
    aws_security_group.pgml-rds-proxy.id,
  ]

  associate_public_ip_address = true
  user_data                   = file("${path.module}/user_data.sh")
  user_data_replace_on_change = false

  tags = {
    Name = "pgml-rds-proxy"
  }
}

variable "root_key" {
  type        = string
  description = "The name of the SSH Root Key you'd like to assign to this EC2 instance. Make sure it's a key you have access to."
}
21 changes: 21 additions & 0 deletions packages/pgml-rds-proxy/ec2/user_data.sh
@@ -0,0 +1,21 @@
#!/bin/bash
#
# Cloud init script to install Docker on an EC2 instance running Ubuntu 22.04.
#

sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo groupadd -f docker
sudo usermod -aG docker ubuntu
1 change: 1 addition & 0 deletions pgml-cms/.gitignore
@@ -0,0 +1 @@
*.md.bak
Binary file modified pgml-cms/docs/.gitbook/assets/architecture.png
Binary file added pgml-cms/docs/.gitbook/assets/fdw_1.png
Binary file added pgml-cms/docs/.gitbook/assets/vpc_1.png
58 changes: 35 additions & 23 deletions pgml-cms/docs/README.md
@@ -4,38 +4,50 @@ description: The key concepts that make up PostgresML.

# Overview

PostgresML is a complete MLOps platform built on PostgreSQL.
PostgresML is a complete MLOps platform built on PostgreSQL. Our operating principle is:

> _Move the models to the database, rather than continuously moving the data to the models._
> _Move the models to the database, rather than constantly moving the data to the models._

The data for ML & AI systems is inherently larger and more dynamic than the models. It's more efficient, manageable and reliable to move the models to the database, rather than continuously moving the data to the models. PostgresML allows you to take advantage of the fundamental relationship between data and models, by extending the database with the following capabilities and goals:
The data for ML & AI systems is inherently larger and more dynamic than the models. It's more efficient, manageable and reliable to move the models to the database, rather than continuously moving data to the models.

* **Model Serving** - _**GPU accelerated**_ inference engine for interactive applications, with no additional networking latency or reliability costs.
* **Model Store** - Download _**open-source**_ models including state of the art LLMs from HuggingFace, and track changes in performance between versions.
* **Model Training** - Train models with _**your application data**_ using more than 50 algorithms for regression, classification or clustering tasks. Fine tune pre-trained models like LLaMA and BERT to improve performance.
* **Feature Store** - _**Scalable**_ access to model inputs, including vector, text, categorical, and numeric data. Vector database, text search, knowledge graph and application data all in one _**low-latency**_ system.
## AI engine

<figure><img src=".gitbook/assets/ml_system.svg" alt="Machine Learning Infrastructure (2.0) by a16z"><figcaption><p>PostgresML handles all of the functions typically performed by a cacophony of services, <a href="https://a16z.com/emerging-architectures-for-modern-data-infrastructure/">described by a16z</a></p></figcaption></figure>
PostgresML allows you to take advantage of the fundamental relationship between data and models, by extending the database with the following capabilities:

These capabilities are primarily provided by two open-source software projects, that may be used independently, but are designed to be used with the rest of the Postgres ecosystem, including trusted extensions like pgvector and pg\_partman.
* **Model Serving** - GPU accelerated inference engine for interactive applications, with no additional networking latency or reliability costs
* **Model Store** - Access to open-source models including state of the art LLMs from HuggingFace, and track changes in performance between versions
* **Model Training** - Train models with your application data using more than 50 algorithms for regression, classification or clustering tasks; fine tune pre-trained models like LLaMA and BERT to improve performance
* **Feature Store** - Scalable access to model inputs, including vector, text, categorical, and numeric data: vector database, text search, knowledge graph and application data all in one low-latency system

* **pgml** is an open source extension for PostgreSQL. It adds support for GPUs and the latest ML & AI algorithms _**inside**_ the database with a SQL API and no additional infrastructure, networking latency, or reliability costs.
* **PgCat** is an open source proxy pooler for PostgreSQL. It abstracts the scalability and reliability concerns of managing a distributed cluster of Postgres databases. Client applications connect only to the proxy, which handles load balancing and failover, _**outside**_ of any single database.
<figure><img src=".gitbook/assets/ml_system.svg" alt="Machine Learning Infrastructure (2.0) by a16z"><figcaption class="mt-2"><p>PostgresML handles all of the functions <a href="https://a16z.com/emerging-architectures-for-modern-data-infrastructure/">described by a16z</a></p></figcaption></figure>

<figure><img src=".gitbook/assets/architecture.png" alt="PostgresML architectural diagram" width="275"><figcaption><p>A PostgresML deployment at scale</p></figcaption></figure>
These capabilities are primarily provided by two open-source software projects, that may be used independently, but are designed to be used with the rest of the Postgres ecosystem:

In addition, PostgresML provides [native language SDKs](https://github.com/postgresml/postgresml/tree/master/pgml-sdks/pgml) to implement best practices for common ML & AI applications. The JavaScript and Python SDKs are generated from the core Rust SDK, to provide the same API, correctness and efficiency across all application runtimes.
* **pgml** - an open source extension for PostgreSQL. It adds support for GPUs and the latest ML & AI algorithms _inside_ the database with a SQL API and no additional infrastructure, networking latency, or reliability costs
* **PgCat** - an open source pooler for PostgreSQL. It abstracts the scalability and reliability concerns of managing a distributed cluster of Postgres databases. Client applications connect only to the pooler, which handles load balancing, sharding, and failover, outside of any single database server.

SDK clients can perform advanced machine learning tasks in a single SQL request, without having to transfer additional data, models, hardware or dependencies to the client application. For example:
<figure><img src=".gitbook/assets/architecture.png" alt="PostgresML architectural diagram"><figcaption></figcaption></figure>

* Chat with streaming response support from the latest LLMs
* Search with both keywords and embedding vectors
* Text Generation with RAG in a single request
* Translate text between hundreds of language pairs
* Summarization to distil complex documents
* Forecasting timeseries data for key metrics with complex metadata
* Fraud and anomaly detection with application data
## Client SDK

Our goal is to provide access to Open Source AI for everyone. PostgresML is under continuous development to keep up with the rapidly evolving use cases for ML & AI, and we release non breaking changes with minor version updates in accordance with SemVer. We welcome contributions to our [open source code and documentation](https://github.com/postgresml).
The PostgresML team also provides [native language SDKs](https://github.com/postgresml/postgresml/tree/master/pgml-sdks/pgml) which implement best practices for common ML & AI applications. The JavaScript and Python SDKs are generated from a core Rust library, which provides a uniform API, correctness and efficiency across all environments.

We can host your AI database in our cloud, or you can run our Docker image locally with PostgreSQL, pgml, pgvector and NVIDIA drivers included.
While using the SDK is completely optional, SDK clients can perform advanced machine learning tasks in a single SQL request, without having to transfer additional data, models, hardware or dependencies to the client application.

Use cases include:

* Chat with streaming responses from state-of-the-art open source LLMs
* Semantic search with keywords and embeddings
* RAG in a single request without using any third-party services
* Text translation between hundreds of languages
* Text summarization to distill complex documents
* Forecasting timeseries data for key metrics with metadata
* Anomaly detection using application data

## Our mission

PostgresML strives to provide access to open source AI for everyone. We are continuously developing PostgresML to keep up with the rapidly evolving use cases for ML & AI, but we remain committed to never breaking user-facing APIs. We welcome contributions to our [open source code and documentation](https://github.com/postgresml) from the community.

## Managed cloud

While our extension and pooler are open source, we also offer a managed cloud database service for production deployments of PostgresML. You can [sign up](https://postgresml.org/signup) for an account and get a free Serverless database in seconds.
12 changes: 7 additions & 5 deletions pgml-cms/docs/SUMMARY.md
@@ -6,9 +6,11 @@
* [Getting Started](introduction/getting-started/README.md)
* [Create your database](introduction/getting-started/create-your-database.md)
* [Connect your app](introduction/getting-started/connect-your-app.md)
* [Import your data](introduction/getting-started/import-your-data/README.md)
* [CSV](introduction/getting-started/import-your-data/csv.md)
* [Foreign Data Wrapper](introduction/getting-started/import-your-data/foreign-data-wrapper.md)
* [Import your data](introduction/getting-started/import-your-data/README.md)
* [Logical replication](introduction/getting-started/import-your-data/logical-replication/README.md)
* [Foreign Data Wrappers](introduction/getting-started/import-your-data/foreign-data-wrappers.md)
* [Move data with COPY](introduction/getting-started/import-your-data/copy.md)
* [Migrate with pg_dump](introduction/getting-started/import-your-data/pg-dump.md)

## API

@@ -51,7 +53,7 @@
## Product

* [Cloud Database](product/cloud-database/README.md)
Contributor review comment:

This is Cloud as opposed to Vector section

* [Serverless databases](product/cloud-database/serverless-databases.md)
* [Serverless](product/cloud-database/serverless.md)
* [Dedicated](product/cloud-database/dedicated.md)
* [Enterprise](product/cloud-database/plans.md)
* [Vector Database](product/vector-database.md)
@@ -79,7 +81,7 @@
## Resources

* [FAQs](resources/faqs.md)
* [Data Storage & Retrieval](resources/data-storage-and-retrieval/README.md)
* [Data Storage & Retrieval](resources/data-storage-and-retrieval/tabular-data.md)
Contributor review comment:

This is a dup?

Contributor (author) reply:

Not a dup, README.md is currently empty.

* [Tabular data](resources/data-storage-and-retrieval/tabular-data.md)
Contributor review comment (suggested change):
* [Tabular data](resources/data-storage-and-retrieval/tabular-data.md)

* [Documents](resources/data-storage-and-retrieval/documents.md)
* [Partitioning](resources/data-storage-and-retrieval/partitioning.md)
2 changes: 1 addition & 1 deletion pgml-cms/docs/api/client-sdk/README.md
@@ -1,4 +1,4 @@
# Client SDKs
# Client SDK

### Key Features

4 changes: 2 additions & 2 deletions pgml-cms/docs/api/client-sdk/getting-started.md
@@ -18,7 +18,7 @@ pip install pgml

## Example

Once the SDK is installed, you an use the following example to get started.
Once the SDK is installed, you can use the following example to get started.

### Create a collection

@@ -85,7 +85,7 @@ await collection.add_pipeline(pipeline)
{% endtab %}
{% endtabs %}

#### Explanation:
#### Explanation

* The code constructs a pipeline called `"sample_pipeline"` and adds it to the collection we initialized above. This pipeline automatically generates chunks and embeddings for the `text` key for every upserted document.

12 changes: 6 additions & 6 deletions pgml-cms/docs/introduction/getting-started/README.md
@@ -4,14 +4,14 @@ description: Setup a database and connect your application to PostgresML

# Getting Started

A PostgresML deployment consists of multiple components working in concert to provide a complete Machine Learning platform. We provide a fully managed solution in our cloud.
A PostgresML deployment consists of multiple components working in concert to provide a complete Machine Learning platform. We provide a fully managed solution in [our cloud](create-your-database), and document a self-hosted installation in [Developer Docs](/docs/resources/developer-docs/quick-start-with-docker).

* A PostgreSQL database, with pgml and pgvector extensions installed, including backups, metrics, logs, replicas and high availability configurations
* A PgCat pooling proxy to provide secure access and model load balancing across tens of thousands of clients
* A web application to manage deployed models and host SQL notebooks
* PostgreSQL database, with `pgml`, `pgvector` and many other extensions installed, including backups, metrics, logs, replicas and high availability
* PgCat pooler to provide secure access and model load balancing across thousands of clients
* A web application to manage deployed models and share experiments and analysis in SQL notebooks

<figure><img src="../../.gitbook/assets/architecture.png" alt=""><figcaption></figcaption></figure>
<figure class="m-3"><img src="../../.gitbook/assets/architecture.png" alt="PostgresML architecture"><figcaption></figcaption></figure>

By building PostgresML on top of a mature database, we get reliable backups for model inputs and proven scalability without reinventing the wheel, so that we can focus on providing access to the latest developments in open source machine learning and artificial intelligence.

This guide will help you get started with a generous free account, that includes access to GPU accelerated models and 5GB of storage, or you can skip to our Developer Docs to see how to run PostgresML locally with our Docker image.
This guide will help you get started with a generous free account, that includes access to GPU accelerated models and 5 GB of storage, or you can skip to our [Developer Docs](/docs/resources/developer-docs/quick-start-with-docker) to see how to run PostgresML locally with our Docker image.
@@ -4,16 +4,16 @@ description: PostgresML is compatible with all standard PostgreSQL clients

# Connect your app

You can connect to your database from any Postgres compatible client. PostgresML is intended to serve in the traditional role of an application database, along with its extended role as an MLOps platform to make it easy to build and maintain AI applications.
You can connect to your PostgresML database from any PostgreSQL-compatible client. PostgresML can serve in the traditional role of an application database, along with its extended role as an MLOps platform, to make it easy to build and maintain AI applications together with your application data.

## Application SDKs
## Client SDK

We provide client SDKs for JavaScript, Python and Rust apps that manage connections to the Postgres database and make it easy to construct efficient queries for AI use cases, like managing a document collection for RAG, or building a chatbot. All of the ML & AI still happens in the database, with centralized operations, hardware and dependency management.

These SDKs are under rapid development to add new features and use cases, but we release non breaking changes with minor version updates in accordance with SemVer. It's easy to install into your existing application.
We provide a client SDK for JavaScript, Python and Rust. The SDK manages connections to the database, and makes it easy to construct efficient queries for AI use cases, like managing RAG document collections, or building chatbots. All of the ML & AI still happens inside the database, with centralized operations, hardware and dependency management.

### Installation

The SDK is available from npm and PyPI:

{% tabs %}
{% tab title="JavaScript" %}
```bash
@@ -28,8 +28,12 @@ pip install pgml
{% endtab %}
{% endtabs %}

Our SDK comes with zero additional dependencies. The core of the SDK is written in Rust, and we provide language bindings and native packaging & distribution.
Contributor review comment (suggested change):
Our SDK comes with zero additional dependencies. The core of the SDK is written in Rust, and we provide language bindings and native packaging & distribution.
Our SDK comes with zero additional dependencies, to provide the simplest and safest ML application deployment and maintenance possible. The core of the SDK is written in Rust, and we provide language bindings and native packaging & distribution.


### Test the connection

Once you have installed our SDK into your environment, you can test connectivity to our cloud with just a few lines of code:

{% tabs %}
{% tab title="JavaScript" %}
```javascript
@@ -80,9 +84,9 @@ async def main():
{% endtab %}
{% endtabs %}

## Native Language Bindings
## Native PostgreSQL libraries

You can also connect directly to the database with your favorite bindings or ORM:
Using the SDK is completely optional. If you're comfortable with writing SQL, you can connect directly to the database using your favorite PostgreSQL client library or ORM:

* C++: [libpqxx](https://www.tutorialspoint.com/postgresql/postgresql\_c\_cpp.htm)
* C#: [Npgsql](https://github.com/npgsql/npgsql), [Dapper](https://github.com/DapperLib/Dapper), or [Entity Framework Core](https://github.com/dotnet/efcore)
@@ -101,9 +105,9 @@ You can also connect directly to the database with your favorite bindings or ORM
* Rust: [postgres](https://crates.io/crates/postgres), [SQLx](https://github.com/launchbadge/sqlx) or [Diesel](https://github.com/diesel-rs/diesel)
* Swift: [PostgresNIO](https://github.com/vapor/postgres-nio) or [PostgresClientKit](https://github.com/codewinsdotcom/PostgresClientKit)
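
Many of these libraries accept either a full connection URL or discrete parameters such as host, port and database name. The sketch below uses only the Python standard library to split a URL into those parts, which can then be handed to whichever client you prefer. The function name, the parameter keys and the example URL are hypothetical, chosen for illustration.

```python
from urllib.parse import unquote, urlparse


def connection_params(database_url: str) -> dict:
    """Split a PostgreSQL connection URL into keyword-style parameters."""
    parsed = urlparse(database_url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # fall back to PostgreSQL's default port
        # Percent-encoded characters in credentials are decoded here.
        "user": unquote(parsed.username or ""),
        "password": unquote(parsed.password or ""),
        "dbname": parsed.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Hypothetical URL; substitute the one shown for your own database.
    print(connection_params("postgres://app:s%40cret@db.example.com:6432/pgml"))
```

Decoding the credentials matters when a password contains reserved characters such as `@`, which must be percent-encoded inside a URL.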

## SQL Editors
## SQL editors

Use any of these popular tools to execute SQL queries directly against the database:
If you need to write ad-hoc queries, you can use any of these popular tools to execute SQL queries directly on your database:
Contributor review comment (suggested change):
If you need to write ad-hoc queries, you can use any of these popular tools to execute SQL queries directly on your database:
If you need to write ad-hoc queries, or perform administrative functions, you can use any of these popular tools to execute SQL queries directly on your database:


* [Apache Superset](https://superset.apache.org/)
* [DBeaver](https://dbeaver.io/)