Commit 2b83a11

more docs (#1425)
1 parent 4a82a57, commit 2b83a11

24 files changed, +443 -333 lines changed

‎pgml-cms/docs/SUMMARY.md

Lines changed: 0 additions & 1 deletion
@@ -41,7 +41,6 @@
4141
* [Zero-shot Classification](api/sql-extension/pgml.transform/zero-shot-classification.md)
4242
* [pgml.tune()](api/sql-extension/pgml.tune.md)
4343
* [Client SDK](api/client-sdk/README.md)
44-
* [Overview](api/client-sdk/getting-started.md)
4544
* [Collections](api/client-sdk/collections.md)
4645
* [Pipelines](api/client-sdk/pipelines.md)
4746
* [Vector Search](api/client-sdk/search.md)

‎pgml-cms/docs/api/apis.md

Lines changed: 34 additions & 15 deletions
@@ -1,28 +1,47 @@
1-
# Overview
1+
---
2+
description: Overview of the PostgresML SQL API and SDK.
3+
---
24

3-
## Introduction
5+
# API overview
46

5-
PostgresML adds extensions to the PostgreSQL database, as well as providing separate Client SDKs in JavaScript and Python that leverage the database to implement common ML & AI use cases.
7+
PostgresML is a PostgreSQL extension which adds SQL functions to the database where it's installed. The functions work with modern machine learning algorithms and the latest open source LLMs while maintaining a stable API signature. They can be used by any application that connects to the database.
68

7-
The extensions provide all of the ML & AI functionality via SQL APIs, like training and inference. They are designed to be used directly for all ML practitioners who implement dozens of different use cases on their own machine learning models.
9+
In addition to the SQL API, we built and maintain a client SDK for JavaScript, Python and Rust. The SDK uses the same extension functionality to implement common ML & AI use cases, like retrieval-augmented generation (RAG), chatbots, and semantic & hybrid search engines.
810

9-
We also provide Client SDKs that implement the best practices on top of the SQL APIs, to ease adoption and implement common application use cases in applications, like chatbots or search engines.
11+
Using the SDK is optional, and you can implement the same functionality with standard SQL queries. If you feel more comfortable using a programming language, the SDK can help you to get started quickly.
1012

11-
## SQL Extension
13+
## [SQL extension](sql-extension/)
1214

13-
PostgreSQL is designed to be _**extensible**_. This has created a rich open-source ecosystem of additional functionality built around the core project. Some [extensions](https://www.postgresql.org/docs/current/contrib.html) are included in the base Postgres distribution, but others are also available via the [PostgreSQL Extension Network](https://pgxn.org/).\
14-
There are 2 foundational extensions included in a PostgresML deployment that provide functionality inside the database through SQL APIs.
15+
The PostgreSQL extension provides all of the ML & AI functionality, like training models and inference, via SQL functions. The functions are designed for ML practitioners to use dozens of ML algorithms to train models, and run real time inference, on live application data. Additionally, the extension provides access to the latest Hugging Face transformers for a wide range of NLP tasks.
1516

16-
* **pgml** - provides Machine Learning and Artificial Intelligence APIs with access to more than 50 ML algorithms to train classification, clustering and regression models on your own data, or you can perform dozens of tasks with thousands of models downloaded from HuggingFace.
17-
* **pgvector** - provides indexing and search functionality on vectors, in addition to the traditional application database storage, including JSON and plain text, provided by PostgreSQL.
17+
### Functions
1818

19-
Learn more about developing with the [sql-extension](sql-extension/ "mention")
19+
The following functions are implemented and maintained by the PostgresML extension:
2020

21-
## Client SDK
21+
| Function name | Description |
|---------------|-------------|
| [pgml.embed()](sql-extension/pgml.embed) | Generate embeddings inside the database using open source embedding models from Hugging Face. |
| [pgml.transform()](sql-extension/pgml.transform/) | Download and run latest Hugging Face transformer models, like Llama, Mixtral, and many more to perform various NLP tasks like text generation, summarization, sentiment analysis and more. |
| [pgml.train()](sql-extension/pgml.train/) | Train a machine learning model on data from a Postgres table or view. Supports XGBoost, LightGBM, Catboost and all Scikit-learn algorithms. |
| [pgml.deploy()](sql-extension/pgml.deploy) | Deploy a version of the model created with pgml.train(). |
| [pgml.predict()](sql-extension/pgml.predict/) | Perform real time inference using a model trained with pgml.train() on live application data. |
| [pgml.tune()](sql-extension/pgml.tune) | Run LoRA fine tuning on an open source model from Hugging Face using data from a Postgres table or view. |
2229

23-
PostgresML provides a client SDK that streamlines ML & AI use cases in both JavaScript and Python. With this SDK, you can seamlessly manage various database tables related to documents, text chunks, text splitters, LLM (Language Model) models, and embeddings. By leveraging the SDK's capabilities, you can efficiently index LLM embeddings using pgvector with HNSW for fast and accurate queries.
30+
Together with standard database functionality provided by PostgreSQL, these functions allow you to create and manage the entire life cycle of a machine learning application.
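As a rough illustration of that life cycle, the sketch below assembles the SQL an application might send to train a model, run a prediction, and generate an embedding. The project, table, and column names are hypothetical, and the argument names follow the commonly documented `pgml.train()`, `pgml.predict()`, and `pgml.embed()` signatures, which should be checked against the installed extension version. The `%s` placeholders assume a driver such as psycopg; the statements are only built here, not executed.

```python
from typing import Any, Sequence

# Hypothetical project, table, and column names, used only for illustration.
PROJECT = "churn_model"
TABLE = "customer_features"
TARGET = "churned"

Query = tuple[str, tuple[Any, ...]]


def train_query(project: str, relation: str, target: str) -> Query:
    """Build a pgml.train() call that trains a classifier on a table or view."""
    sql = (
        "SELECT * FROM pgml.train("
        "project_name => %s, task => 'classification', "
        "relation_name => %s, y_column_name => %s)"
    )
    return sql, (project, relation, target)


def predict_query(project: str, features: Sequence[float]) -> Query:
    """Build a pgml.predict() call for one feature vector."""
    # The cast to real[] is an assumption about how the driver adapts lists.
    sql = "SELECT pgml.predict(%s, %s::real[]) AS prediction"
    return sql, (project, list(features))


def embed_query(model: str, text: str) -> Query:
    """Build a pgml.embed() call for a single piece of text."""
    sql = "SELECT pgml.embed(%s, %s) AS embedding"
    return sql, (model, text)


if __name__ == "__main__":
    for sql, params in (
        train_query(PROJECT, TABLE, TARGET),
        predict_query(PROJECT, [0.1, 2.0, 5.0]),
        embed_query("intfloat/e5-small", "hello world"),
    ):
        print(sql, params)
```

Keeping values in the parameter tuple rather than interpolating them into the SQL string leaves quoting to the database driver.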
2431

25-
The SDK delegates all work to the extension running in the database, which minimizes software and hardware dependencies that need to be maintained at the application layer, as well as securing data and models inside the data center. Our SDK minimizes data transfer to maximize performance, efficiency, security and reliability.
32+
## [Client SDK](client-sdk/)
2633

27-
Learn more about developing with the [client-sdk](client-sdk/ "mention")
34+
The client SDK implements best practices and common use cases, using the PostgresML SQL functions and standard PostgreSQL features to do it. The SDK core is written in Rust, which manages creating and running queries, connection pooling, and error handling.
2835

36+
For each additional language we support (currently JavaScript and Python), we create and publish language-native bindings. This architecture ensures all programming languages we support have identical APIs and similar performance when interacting with PostgresML.
37+
38+
### Use cases
39+
40+
The SDK currently implements the following use cases:
41+
42+
| Use case | Description |
|----------|-------------|
| [Collections](client-sdk/collections) | Manage documents, embeddings, full text and vector search indexes, and more, using one simple interface. |
| [Pipelines](client-sdk/pipelines) | Easily build complex queries to interact with collections using a programmable interface. |
| [Vector search](client-sdk/search) | Implement semantic search using in-database generated embeddings and ANN vector indexes. |
| [Document search](client-sdk/document-search) | Implement hybrid full text search using in-database generated embeddings and PostgreSQL tsvector indexes. |

‎pgml-cms/docs/api/client-sdk/README.md

Lines changed: 239 additions & 16 deletions
@@ -1,24 +1,247 @@
1+
---
2+
description: PostgresML client SDK for JavaScript, Python and Rust implements common use cases and PostgresML connection management.
3+
---
4+
15
# Client SDK
26

3-
### Key Features
7+
The client SDK can be installed using standard package managers for JavaScript, Python, and Rust. Since the SDK is written in Rust, the JavaScript and Python packages come with no additional dependencies.
8+
9+
10+
## Installation
11+
12+
Installing the SDK into your project is as simple as:
13+
14+
{% tabs %}
15+
{% tab title="JavaScript " %}
16+
```bash
npm i pgml
```
19+
{% endtab %}
20+
21+
{% tab title="Python " %}
22+
```bash
pip install pgml
```
25+
{% endtab %}
26+
{% endtabs %}
27+
28+
## Getting started
29+
30+
The SDK uses the database to perform most of its functionality. Before continuing, make sure you have created a [PostgresML database](https://postgresml.org/signup) and have the `DATABASE_URL` connection string handy.
31+
32+
### Connect to PostgresML
33+
34+
The SDK automatically manages connections to PostgresML. The connection string can be specified as an argument to the collection constructor, or as an environment variable.
35+
36+
If your app follows the twelve-factor convention, we recommend you configure the connection in the environment using the `PGML_DATABASE_URL` variable:
37+
38+
```bash
export PGML_DATABASE_URL=postgres://user:password@sql.cloud.postgresml.org:6432/pgml_database
```
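Before handing the connection string to the SDK, it can be useful to confirm that the variable is set and well formed. The sketch below is one way to do that with the Python standard library only; the helper name `describe_database_url` is hypothetical and not part of the SDK.

```python
import os
from urllib.parse import urlparse


def describe_database_url(url: str) -> dict:
    """Split a Postgres connection string into its parts, omitting the password."""
    parts = urlparse(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("connection string has no host")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    url = os.environ.get("PGML_DATABASE_URL")
    if url is None:
        print("PGML_DATABASE_URL is not set")
    else:
        print(describe_database_url(url))
```

The password is deliberately left out of the returned dictionary so the result is safe to log.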
41+
42+
### Create a collection
43+
44+
The SDK is written in asynchronous code, so you need to run it inside an async runtime. Both Python and JavaScript support async functions natively.
45+
46+
{% tabs %}
47+
{% tab title="JavaScript " %}
48+
```javascript
const pgml = require("pgml");

const main = async () => {
  const collection = pgml.newCollection("sample_collection");
};
```
55+
{% endtab %}
56+
57+
{% tab title="Python" %}
58+
```python
from pgml import Collection, Pipeline
import asyncio

async def main():
    collection = Collection("sample_collection")
```
65+
{% endtab %}
66+
{% endtabs %}
67+
68+
The above example imports the `pgml` module and creates a collection object. By itself, the collection only tracks document contents and identifiers, but once we add a pipeline, we can instruct the SDK to perform additional tasks when documents are inserted and retrieved.
69+
70+
71+
### Create a pipeline
72+
73+
Continuing the example, we will create a pipeline called `sample_pipeline`, which will use in-database embeddings generation to automatically chunk and embed documents:
74+
75+
{% tabs %}
76+
{% tab title="JavaScript" %}
77+
```javascript
  // Add this code to the end of the main function from the above example.
  const pipeline = pgml.newPipeline("sample_pipeline", {
    text: {
      splitter: { model: "recursive_character" },
      semantic_search: {
        model: "intfloat/e5-small",
      },
    },
  });

  await collection.add_pipeline(pipeline);
```
90+
{% endtab %}
91+
92+
{% tab title="Python" %}
93+
```python
    # Add this code to the end of the main function from the above example.
    pipeline = Pipeline(
        "sample_pipeline",
        {
            "text": {
                "splitter": {"model": "recursive_character"},
                "semantic_search": {
                    "model": "intfloat/e5-small",
                },
            },
        },
    )

    await collection.add_pipeline(pipeline)
```
109+
{% endtab %}
110+
{% endtabs %}
111+
112+
The pipeline configuration is a key/value object, where the key is the name of a column in a document, and the value is the action the SDK should perform on that column.
113+
114+
In this example, the documents contain a column called `text`. We are instructing the SDK to chunk its contents using the recursive character splitter, and to embed those chunks using the Hugging Face `intfloat/e5-small` embeddings model.
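To give a feel for what recursive character splitting means, here is a minimal, simplified sketch in plain Python. It is not the splitter the SDK runs in the database: the function name, the default separators, and the chunk size are assumptions made for illustration, and real splitters typically also support chunk overlap.

```python
def recursive_split(text: str, chunk_size: int = 20,
                    separators: tuple = ("\n\n", "\n", " ")) -> list:
    """Split text into chunks of at most chunk_size characters.

    Coarse separators are tried first; a piece that is still too long is
    split again with the remaining, finer separators.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece  # start a new chunk with this piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(recursive_split("alpha beta gamma delta", chunk_size=11, separators=(" ",)))
```

Each resulting chunk is what would then be passed to the embedding model, one embedding per chunk.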
115+
116+
### Add documents
117+
118+
Once the pipeline is configured, we can start adding documents:
119+
120+
{% tabs %}
121+
{% tab title="JavaScript" %}
122+
```javascript
  // Add this code to the end of the main function from the above example.
  const documents = [
    {
      id: "Document One",
      text: "document one contents...",
    },
    {
      id: "Document Two",
      text: "document two contents...",
    },
  ];

  await collection.upsert_documents(documents);
```
137+
{% endtab %}
138+
139+
{% tab title="Python" %}
140+
```python
    # Add this code to the end of the main function in the above example.
    documents = [
        {
            "id": "Document One",
            "text": "document one contents...",
        },
        {
            "id": "Document Two",
            "text": "document two contents...",
        },
    ]

    await collection.upsert_documents(documents)
```
155+
{% endtab %}
156+
{% endtabs %}
157+
158+
If the same document `id` is used, the SDK computes the difference between existing and new documents and only updates the chunks that have changed.
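The SDK's own diffing logic is not shown here. As a conceptual sketch only, one way to find which chunks of a re-submitted document need new embeddings is to compare content hashes of the old and new chunks; the function names below are hypothetical.

```python
import hashlib


def digest(chunk: str) -> str:
    """Return a stable content hash for a chunk of text."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def changed_chunks(old_chunks: list, new_chunks: list) -> list:
    """Return the chunks in new_chunks whose content was not stored before."""
    known = {digest(chunk) for chunk in old_chunks}
    return [chunk for chunk in new_chunks if digest(chunk) not in known]


if __name__ == "__main__":
    old = ["document one contents...", "second paragraph"]
    new = ["document one contents...", "second paragraph, edited"]
    # Only the edited chunk would need to be embedded again.
    print(changed_chunks(old, new))
```

Under this approach, unchanged chunks keep their existing embeddings, which avoids recomputing them on every upsert.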
159+
160+
### Search documents
161+
162+
Now that the documents are stored, chunked and embedded, we can start searching the collection:
163+
164+
{% tabs %}
165+
{% tab title="JavaScript" %}
166+
```javascript
  // Add this code to the end of the main function in the above example.
  const results = await collection.vector_search(
    {
      query: {
        fields: {
          text: {
            query: "Something about a document...",
          },
        },
      },
      limit: 2,
    },
    pipeline,
  );

  console.log(results);
```
184+
{% endtab %}
185+
186+
{% tab title="Python" %}
187+
```python
    # Add this code to the end of the main function in the above example.
    results = await collection.vector_search(
        {
            "query": {
                "fields": {
                    "text": {
                        "query": "Something about a document...",
                    },
                },
            },
            "limit": 2,
        },
        pipeline,
    )

    print(results)
```
205+
{% endtab %}
206+
{% endtabs %}
207+
208+
We are using built-in vector search, powered by embeddings and the PostgresML [pgml.embed()](../sql-extension/pgml.embed) function, which embeds the `query` argument, compares it to the embeddings stored in the database, and returns the top two results, ranked by cosine similarity.
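Ranking by cosine similarity can be illustrated without a database. The sketch below uses tiny hand-written vectors in place of real embeddings and a brute-force scan in place of a vector index, so it shows only the scoring and top-k selection, not how PostgresML executes the search.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list, stored: dict, k: int = 2) -> list:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy two-dimensional "embeddings"; real models produce hundreds of dimensions.
    stored = {"one": [1.0, 0.0], "two": [0.6, 0.8], "three": [0.0, 1.0]}
    print(top_k([1.0, 0.0], stored, k=2))
```

The `limit` of 2 in the search request plays the same role as `k` here.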
209+
210+
### Run the example
4211

5-
* **Automated Database Management**: You can easily handle the management of database tables related to documents, text chunks, text splitters, LLM models, and embeddings. This automated management system simplifies the process of setting up and maintaining your vector search application's data structure.
6-
* **Embedding Generation from Open Source Models**: Provides the ability to generate embeddings using hundreds of open source models. These models, trained on vast amounts of data, capture the semantic meaning of text and enable powerful analysis and search capabilities.
7-
* **Flexible and Scalable Vector Search**: Build flexible and scalable vector search applications. PostgresML seamlessly integrates with PgVector, a PostgreSQL extension specifically designed for handling vector-based indexing and querying. By leveraging these indices, you can perform advanced searches, rank results by relevance, and retrieve accurate and meaningful information from your database.
212+
Since the SDK is using async code, both JavaScript and Python need a little bit of code to run it correctly:
8213

9-
### Use Cases
214+
{% tabs %}
215+
{% tab title="JavaScript" %}
216+
```javascript
main().then(() => {
  console.log("SDK example complete");
});
```
221+
{% endtab %}
10222

11-
* Search: Embeddings are commonly used for search functionalities, where results are ranked by relevance to a query string. By comparing the embeddings of query strings and documents, you can retrieve search results in order of their similarity or relevance.
12-
* Clustering: With embeddings, you can group text strings by similarity, enabling clustering of related data. By measuring the similarity between embeddings, you can identify clusters or groups of text strings that share common characteristics.
13-
* Recommendations: Embeddings play a crucial role in recommendation systems. By identifying items with related text strings based on their embeddings, you can provide personalized recommendations to users.
14-
* Anomaly Detection: Anomaly detection involves identifying outliers or anomalies that have little relatedness to the rest of the data. Embeddings can aid in this process by quantifying the similarity between text strings and flagging outliers.
15-
* Classification: Embeddings are utilized in classification tasks, where text strings are classified based on their most similar label. By comparing the embeddings of text strings and labels, you can classify new text strings into predefined categories.
223+
{% tab title="Python" %}
224+
```python
if __name__ == "__main__":
    asyncio.run(main())
```
228+
{% endtab %}
229+
{% endtabs %}
16230

17-
### How the SDK Works
231+
Once you run the example, you should see something like this in the terminal:
18232

19-
SDK streamlines the development of vector search applications by abstracting away the complexities of database management and indexing. Here's an overview of how the SDK works:
233+
```bash
[
    {
        "chunk": "document one contents...",
        "document": {"id": "Document One", "text": "document one contents..."},
        "score": 0.9034339189529419,
    },
    {
        "chunk": "document two contents...",
        "document": {"id": "Document Two", "text": "document two contents..."},
        "score": 0.8983734250068665,
    },
]
```
20247

21-
* **Automatic Document and Text Chunk Management**: The SDK provides a convenient interface to manage documents and pipelines, automatically handling chunking and embedding for you. You can easily organize and structure your text data within the PostgreSQL database.
22-
* **Open Source Model Integration**: With the SDK, you can seamlessly incorporate a wide range of open source models to generate high-quality embeddings. These models capture the semantic meaning of text and enable powerful analysis and search capabilities.
23-
* **Embedding Indexing**: The Python SDK utilizes the PgVector extension to efficiently index the embeddings generated by the open source models. This indexing process optimizes search performance and allows for fast and accurate retrieval of relevant results.
24-
* **Querying and Search**: Once the embeddings are indexed, you can perform vector-based searches on the documents and text chunks stored in the PostgreSQL database. The SDK provides intuitive methods for executing queries and retrieving search results.
