Commit 1d513fd

SDK documentation (#657)
1 parent e09be46 commit 1d513fd

8 files changed: +483 -24 lines changed

‎pgml-sdks/python/pgml/README.md‎

Lines changed: 255 additions & 16 deletions
Large diffs are not rendered by default.
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+Module pgml.collection
+======================
+
+Variables
+---------
+
+
+`log`
+: Collection class to store tables for documents, chunks, models, splitters, and embeddings
+
+Classes
+-------
+
+`Collection(pool: psycopg_pool.pool.ConnectionPool, name: str)`
+: The function initializes an object with a connection pool and a name, and creates several tables
+while registering a text splitter and a model.
+
+:param pool: `pool` is an instance of the `ConnectionPool` class, which manages a pool of database
+connections
+:type pool: ConnectionPool
+:param name: The `name` parameter is a string that represents the name of an object being
+initialized. It is used as an identifier for the object within the code
+:type name: str
+
+### Methods
+
+`generate_chunks(self, splitter_id: int = 1) -> None`
+: This function generates chunks of text from unchunked documents using a specified text splitter.
+
+:param splitter_id: The ID of the splitter to use for generating chunks, defaults to 1
+:type splitter_id: int (optional)
+
+`generate_embeddings(self, model_id: Optional[int] = 1, splitter_id: Optional[int] = 1) -> None`
+: This function generates embeddings for chunks of text using a specified model and inserts them into
+a database table.
+
+:param model_id: The ID of the model to use for generating embeddings, defaults to 1
+:type model_id: Optional[int] (optional)
+:param splitter_id: The `splitter_id` parameter is an optional integer that specifies the ID of the
+data splitter to use for generating embeddings, defaults to 1
+:type splitter_id: Optional[int] (optional)
+
+`get_models(self) -> List[Dict[str, Any]]`
+: The function retrieves a list of dictionaries containing information about models from a database
+table.
+:return: The function `get_models` returns a list of dictionaries, where each dictionary
+represents a model and contains the following keys: "id", "task", "name", and "parameters". The
+values associated with these keys correspond to the respective fields in the database table
+specified by `self.models_table`.
+
+`get_text_splitters(self) -> List[Dict[str, Any]]`
+: This function retrieves a list of dictionaries containing information about text splitters from a
+database.
+:return: The function `get_text_splitters` returns a list of dictionaries, where each
+dictionary contains the `id`, `name`, and `parameters` of a text splitter.
+
+`register_model(self, task: Optional[str] = 'embedding', model_name: Optional[str] = 'intfloat/e5-small', model_params: Optional[Dict[str, Any]] = {}) -> None`
+: This function registers a model in a database if it does not already exist.
+
+:param task: The type of task the model is being registered for,
+defaults to "embedding"
+:type task: Optional[str] (optional)
+:param model_name: The name of the model being registered, defaults to intfloat/e5-small
+:type model_name: Optional[str] (optional)
+:param model_params: model_params is a dictionary that contains the parameters for the model being
+registered. These parameters can be used to configure the model for a specific task. The dictionary
+can be empty if no parameters are needed
+:type model_params: Optional[Dict[str, Any]]
+:return: the id of the registered model.
+
+`register_text_splitter(self, splitter_name: Optional[str] = 'RecursiveCharacterTextSplitter', splitter_params: Optional[Dict[str, Any]] = {}) -> None`
+: This function registers a text splitter with a given name and parameters in a database table if it
+does not already exist.
+
+:param splitter_name: The name of the text splitter being registered. It is an optional parameter,
+defaults to
+RecursiveCharacterTextSplitter
+:type splitter_name: Optional[str] (optional)
+:param splitter_params: splitter_params is a dictionary that contains parameters for a text
+splitter. These parameters can be used to customize the behavior of the text splitter. The function
+takes this dictionary as an optional argument and if it is not provided, an empty dictionary is used
+as the default value
+:type splitter_params: Optional[Dict[str, Any]]
+:return: the id of the splitter that was either found in the database or inserted into the database.
+
+`upsert_documents(self, documents: List[Dict[str, Any]], text_key: Optional[str] = 'text', id_key: Optional[str] = 'id') -> None`
+: The function `upsert_documents` inserts or updates documents in a database table based on their ID,
+text, and metadata.
+
+:param documents: A list of dictionaries, where each dictionary represents a document to be upserted
+into a database table. Each dictionary should contain metadata about the document, as well as the
+actual text of the document
+:type documents: List[Dict[str, Any]]
+:param text_key: The key in the dictionary that corresponds to the text of the document, defaults to
+text
+:type text_key: Optional[str] (optional)
+:param id_key: The `id_key` parameter is an optional string parameter that specifies the key in the
+dictionary of each document that contains the unique identifier for that document. If this key is
+present in the dictionary, its value will be used as the document ID. If it is not present, a hash
+of the document is used instead, defaults to id
+:type id_key: Optional[str] (optional)
+:param verbose: A boolean parameter that determines whether or not to print verbose output during
+the upsert process. If set to True, additional information will be printed to the console during the
+upsert process. If set to False, only essential information will be printed, defaults to False
+
+`vector_search(self, query: str, query_parameters: Optional[Dict[str, Any]] = {}, top_k: int = 5, model_id: int = 1, splitter_id: int = 1) -> List[Dict[str, Any]]`
+: This function performs a vector search on a database using a query and returns the top matching
+results.
+
+:param query: The search query string
+:type query: str
+:param query_parameters: Optional dictionary of additional parameters to be used in generating
+the query embeddings. These parameters are specific to the model being used and can be used to
+fine-tune the search results. If no parameters are provided, default values will be used
+:type query_parameters: Optional[Dict[str, Any]]
+:param top_k: The number of search results to return, sorted by relevance score, defaults to 5
+:type top_k: int (optional)
+:param model_id: The ID of the model to use for generating embeddings, defaults to 1
+:type model_id: int (optional)
+:param splitter_id: The `splitter_id` parameter is an integer that identifies the specific
+splitter used to split the documents into chunks. It is used to retrieve the embeddings table
+associated with the specified splitter, defaults to 1
+:type splitter_id: int (optional)
+:return: a list of dictionaries containing search results for a given query. Each dictionary
+contains the following keys: "score", "text", and "metadata". The "score" key contains a float
+value representing the similarity score between the query and the search result. The "text" key
+contains the text of the search result, and the "metadata" key contains any metadata associated
+with the search result
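
The return shape documented for `vector_search` can be illustrated without a database. The following is a minimal in-memory sketch, not pgml's implementation: it assumes embeddings are plain Python lists and that the "similarity score" is cosine similarity, which the docs above do not confirm. The helper `top_k_search` and the sample vectors are hypothetical.

```python
import math
from typing import Any, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_search(
    query_embedding: List[float],
    chunks: List[Dict[str, Any]],
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: score every chunk, then keep the best top_k."""
    scored = [
        {
            "score": cosine_similarity(query_embedding, chunk["embedding"]),
            "text": chunk["text"],
            "metadata": chunk["metadata"],
        }
        for chunk in chunks
    ]
    # Highest similarity first, mirroring "sorted by relevance score".
    scored.sort(key=lambda result: result["score"], reverse=True)
    return scored[:top_k]


# Made-up two-dimensional embeddings, for illustration only.
chunks = [
    {"embedding": [1.0, 0.0], "text": "about music awards", "metadata": {"id": "a"}},
    {"embedding": [0.0, 1.0], "text": "about databases", "metadata": {"id": "b"}},
    {"embedding": [0.6, 0.8], "text": "about both", "metadata": {"id": "c"}},
]
results = top_k_search([1.0, 0.0], chunks, top_k=2)
```

Each element of `results` carries the same three keys the docs list ("score", "text", "metadata"), so code written against this sketch reads the same way as code consuming the real method's output.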
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+Module pgml.database
+====================
+
+Classes
+-------
+
+`Database(conninfo: str, min_connections: Optional[int] = 1)`
+: This function initializes a connection pool and creates a table in a PostgreSQL database if it does
+not already exist.
+
+:param conninfo: A string containing the connection information for the PostgreSQL database, such
+as the host, port, database name, username, and password
+:type conninfo: str
+:param min_connections: The minimum number of connections that should be maintained in the
+connection pool at all times. If there are no available connections in the pool when a new
+connection is requested, a new connection will be created up to the maximum size of the pool,
+defaults to 1
+:type min_connections: Optional[int] (optional)
+
+### Methods
+
+`archive_collection(self, name: str) -> None`
+: This function deletes a PostgreSQL schema if it exists.
+
+:param name: The name of the collection (or schema) to be deleted
+:type name: str
+
+`create_or_get_collection(self, name: str) -> pgml.collection.Collection`
+: This function creates a new collection in a PostgreSQL database if it does not already exist and
+returns a Collection object.
+
+:param name: The name of the collection to be created
+:type name: str
+:return: A Collection object is being returned.
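
The `conninfo` string is described above only as carrying host, port, database name, username, and password. One common way to express that is a PostgreSQL connection URI. The sketch below builds one with the standard library; the helper `build_conninfo`, the credentials, and the database name `pgml_development` are all hypothetical, and the commented-out SDK calls assume a reachable database.

```python
from urllib.parse import quote


def build_conninfo(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),  # characters such as "@" or "/" must be escaped
        host,
        port,
        quote(dbname, safe=""),
    )


conninfo = build_conninfo("postgres", "p@ss/word", "localhost", 5432, "pgml_development")

# With a live database, the calls documented above would then look like this
# (left commented out because they need a running server):
# db = Database(conninfo, min_connections=1)
# collection = db.create_or_get_collection("test_collection")
# db.archive_collection("test_collection")
```

Percent-encoding matters here because an unescaped `@` or `/` inside a password would otherwise be parsed as a URI delimiter.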
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+Module pgml.dbutils
+===================
+
+Functions
+---------
+
+
+`run_create_or_insert_statement(conn: psycopg.Connection, statement: str, autocommit: bool = False) -> None`
+: This function executes a SQL statement on a database connection and optionally commits the changes.
+
+:param conn: The `conn` parameter is a connection object that represents a connection to a database.
+It is used to execute SQL statements and manage transactions
+:type conn: Connection
+
+:param statement: The SQL statement to be executed
+:type statement: str
+
+:param autocommit: A boolean parameter that determines whether the transaction should be
+automatically committed after executing the statement. If set to True, the transaction will be
+committed automatically. If set to False, the transaction will need to be manually committed using
+the conn.commit() method, defaults to False
+:type autocommit: bool (optional)
+
+
+`run_drop_or_delete_statement(conn: psycopg.Connection, statement: str) -> None`
+: This function executes a given SQL statement to drop or delete data from a database using a provided
+connection object.
+
+:param conn: The parameter `conn` is of type `Connection`, which is likely a connection object to a
+database. It is used to execute SQL statements on the database
+:type conn: Connection
+:param statement: The SQL statement to be executed on the database connection object
+:type statement: str
+
+
+`run_select_statement(conn: psycopg.Connection, statement: str) -> List[Any]`
+: The function runs a select statement on a database connection and returns the results as a list of
+dictionaries.
+
+:param conn: The `conn` parameter is a connection object that represents a connection to a database.
+It is used to execute SQL statements and retrieve results from the database
+:type conn: Connection
+:param statement: The SQL SELECT statement to be executed on the database
+:type statement: str
+:return: The function `run_select_statement` returns a list of dictionaries, where each dictionary
+represents a row of the result set of the SQL query specified in the `statement` parameter. The keys
+of each dictionary are the column names of the result set, and the values are the corresponding
+values of the row.
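
The two behaviors described above, an optional commit after a write and rows returned as column-name dictionaries, can be tried out with the standard library. This sketch deliberately swaps in `sqlite3` for `psycopg` so it runs without a server; the helper names `execute_statement` and `select_as_dicts` are hypothetical and are not the SDK's functions.

```python
import sqlite3
from typing import Any, Dict, List


def execute_statement(conn: sqlite3.Connection, statement: str, autocommit: bool = False) -> None:
    """Execute a statement; commit immediately only when autocommit is True."""
    conn.execute(statement)
    if autocommit:
        conn.commit()


def select_as_dicts(conn: sqlite3.Connection, statement: str) -> List[Dict[str, Any]]:
    """Run a SELECT and return each row as a {column name: value} dictionary."""
    cursor = conn.execute(statement)
    # cursor.description holds one tuple per result column; item 0 is its name.
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
execute_statement(conn, "CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT)", autocommit=True)
execute_statement(conn, "INSERT INTO models (name) VALUES ('intfloat/e5-small')")
conn.commit()  # autocommit was False above, so the caller commits explicitly
rows = select_as_dicts(conn, "SELECT id, name FROM models")
conn.close()
```

Building the dictionaries from `cursor.description` keeps the helper independent of any particular table, which matches the generic "keys are the column names" contract in the docs.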
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+Module pgml
+===========
+
+Sub-modules
+-----------
+* pgml.collection
+* pgml.database
+* pgml.dbutils

‎pgml-sdks/python/pgml/examples/vector_search.py‎

Lines changed: 1 addition & 1 deletion
@@ -29,6 +29,6 @@
 
 start = time()
 results = collection.vector_search("Who won 20 grammy awards?", top_k=2)
-rprint(json.dumps(results, indent=2))
 rprint("Query time %0.3f" % (time() - start))
+rprint(json.dumps(results, indent=2))
 db.archive_collection(collection_name)
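
One effect of this reordering is that the reported query time no longer includes the time spent serializing and printing the results; the commit itself does not state its motivation. A minimal sketch of the same idea, capturing the elapsed time before any output, is shown here. `fake_vector_search` is a hypothetical stand-in for `collection.vector_search`, and `time.perf_counter` is used instead of `time()` as a design choice of this sketch.

```python
import json
import time
from typing import Any, Dict, List


def fake_vector_search(query: str, top_k: int = 2) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for collection.vector_search; returns canned results."""
    canned = [{"score": 0.9, "text": "example chunk", "metadata": {}}]
    return canned[:top_k]


start = time.perf_counter()
results = fake_vector_search("Who won 20 grammy awards?", top_k=2)
elapsed = time.perf_counter() - start  # stop the clock before any printing

print("Query time %0.3f" % elapsed)
print(json.dumps(results, indent=2))
```

Storing the elapsed time in a variable makes the measurement independent of the order of the later print calls.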

‎pgml-sdks/python/pgml/pgml/collection.py‎

Lines changed: 2 additions & 4 deletions
@@ -264,10 +264,9 @@ def _create_chunks_table(self):
 
     def upsert_documents(
         self,
-        documents: List[Dict[str, str]],
+        documents: List[Dict[str, Any]],
         text_key: Optional[str] = "text",
         id_key: Optional[str] = "id",
-        verbose: bool = False,
     ) -> None:
         """
         The function `upsert_documents` inserts or updates documents in a database table based on their ID,
@@ -276,7 +275,7 @@ def upsert_documents(
         :param documents: A list of dictionaries, where each dictionary represents a document to be upserted
         into a database table. Each dictionary should contain metadata about the document, as well as the
         actual text of the document
-        :type documents: List[Dict[str,str]]
+        :type documents: List[Dict[str,Any]]
         :param text_key: The key in the dictionary that corresponds to the text of the document, defaults to
         text
         :type text_key: Optional[str] (optional)
@@ -288,7 +287,6 @@ def upsert_documents(
         :param verbose: A boolean parameter that determines whether or not to print verbose output during
         the upsert process. If set to True, additional information will be printed to the console during the
         upsert process. If set to False, only essential information will be printed, defaults to False
-        :type verbose: bool (optional)
         """
         conn = self.pool.getconn()
         for document in track(documents, description="Upserting documents"):
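
The widened `Dict[str, Any]` type means a document may carry non-string metadata such as numbers or lists. The generated docs in this commit also say that a document without the `id_key` falls back to "a hash of the document". The sketch below shows one way such a fallback could work; it is an assumption, not the SDK's code: the helper `resolve_document_id` is hypothetical, and SHA-256 over key-sorted JSON is chosen only for illustration, since the actual hash is not shown in this diff.

```python
import hashlib
import json
from typing import Any, Dict


def resolve_document_id(document: Dict[str, Any], id_key: str = "id") -> str:
    """Use document[id_key] when present, otherwise a hash of the whole document."""
    if id_key in document:
        return str(document[id_key])
    # Assumption: SHA-256 over key-sorted JSON, so key order does not change the ID.
    canonical = json.dumps(document, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


with_id = {"id": 42, "text": "hello", "tags": ["a", "b"]}
without_id = {"text": "hello", "tags": ["a", "b"]}
reordered = {"tags": ["a", "b"], "text": "hello"}

explicit_id = resolve_document_id(with_id)
hashed_id = resolve_document_id(without_id)
```

A content-derived ID makes repeated upserts of the same document idempotent, at the cost of treating any edit to the document as a new ID.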

‎pgml-sdks/python/pgml/pyproject.toml‎

Lines changed: 7 additions & 3 deletions
@@ -1,10 +1,14 @@
 [tool.poetry]
 name = "pgml"
-version = "0.1.0"
-description = ""
-authors = ["Santi Adavani <santi@hyperparam.ai>"]
+version = "0.6.0"
+description = "Python SDK is designed to facilitate the development of scalable vector search applications on PostgreSQL databases."
+authors = ["PostgresML <team@postgresml.org>"]
+homepage = "https://postgresml.org"
+repository = "https://github.com/postgresml/postgresml"
+documentation = "https://github.com/postgresml/postgresml/tree/master/pgml-sdks/python/pgml"
 readme = "README.md"
 packages = [{include = "pgml"}]
+keywords = ["postgres", "machine learning", "vector databases", "embeddings"]
 
 [tool.poetry.dependencies]
 python = ">=3.8.1,<4.0"
