Commit 1d513fd

SDK documentation (#657)

1 parent e09be46, commit 1d513fd

File tree

8 files changed (+483 / -24 lines)

pgml-sdks/python/pgml
- README.md
- docs/pgml
  - collection.md
  - database.md
  - dbutils.md
  - index.md
- examples
  - vector_search.py
- pgml
  - collection.py
- pyproject.toml

`pgml-sdks/python/pgml/README.md`

Lines changed: 255 additions & 16 deletions

Large diffs are not rendered by default.

`pgml-sdks/python/pgml/docs/pgml/collection.md`

Lines changed: 128 additions & 0 deletions

```diff
@@ -0,0 +1,128 @@
+Module pgml.collection
+======================
+
+Variables
+---------
+
+
+`log`
+: Collection class to store tables for documents, chunks, models, splitters, and embeddings
+
+Classes
+-------
+
+`Collection(pool: psycopg_pool.pool.ConnectionPool, name: str)`
+: The function initializes an object with a connection pool and a name, and creates several tables
+ while registering a text splitter and a model.
+
+:param pool: `pool` is an instance of `ConnectionPool` class which manages a pool of database
+ connections
+:type pool: ConnectionPool
+:param name: The `name` parameter is a string that represents the name of an object being
+ initialized. It is used as an identifier for the object within the code
+:type name: str
+
+### Methods
+
+`generate_chunks(self, splitter_id: int = 1) ‑> None`
+: This function generates chunks of text from unchunked documents using a specified text splitter.
+
+ :param splitter_id: The ID of the splitter to use for generating chunks, defaults to 1
+ :type splitter_id: int (optional)
+
+`generate_embeddings(self, model_id: Optional[int] = 1, splitter_id: Optional[int] = 1) ‑> None`
+: This function generates embeddings for chunks of text using a specified model and inserts them into
+ a database table.
+
+ :param model_id: The ID of the model to use for generating embeddings, defaults to 1
+ :type model_id: Optional[int] (optional)
+ :param splitter_id: The `splitter_id` parameter is an optional integer that specifies the ID of the
+ data splitter to use for generating embeddings. If not provided, it defaults to 1, defaults to 1
+ :type splitter_id: Optional[int] (optional)
+
+`get_models(self) ‑> List[Dict[str, Any]]`
+: The function retrieves a list of dictionaries containing information about models from a database
+ table.
+ :return: The function `get_models` is returning a list of dictionaries, where each dictionary
+ represents a model and contains the following keys: "id", "task", "name", and "parameters". The
+ values associated with these keys correspond to the respective fields in the database table
+ specified by `self.models_table`.
+
+`get_text_splitters(self) ‑> List[Dict[str, Any]]`
+: This function retrieves a list of dictionaries containing information about text splitters from a
+ database.
+ :return: The function `get_text_splitters` is returning a list of dictionaries, where each
+ dictionary contains the `id`, `name`, and `parameters` of a text splitter.
+
+`register_model(self, task: Optional[str] = 'embedding', model_name: Optional[str] = 'intfloat/e5-small', model_params: Optional[Dict[str, Any]] = {}) ‑> None`
+: This function registers a model in a database if it does not already exist.
+
+ :param task: The type of task the model is being registered for, with a default value of
+ "embedding", defaults to embedding
+ :type task: Optional[str] (optional)
+ :param model_name: The name of the model being registered, defaults to intfloat/e5-small
+ :type model_name: Optional[str] (optional)
+ :param model_params: model_params is a dictionary that contains the parameters for the model being
+ registered. These parameters can be used to configure the model for a specific task. The dictionary
+ can be empty if no parameters are needed
+ :type model_params: Optional[Dict[str, Any]]
+ :return: the id of the registered model.
+
+`register_text_splitter(self, splitter_name: Optional[str] = 'RecursiveCharacterTextSplitter', splitter_params: Optional[Dict[str, Any]] = {}) ‑> None`
+: This function registers a text splitter with a given name and parameters in a database table if it
+ does not already exist.
+
+ :param splitter_name: The name of the text splitter being registered. It is an optional parameter
+ and defaults to "RecursiveCharacterTextSplitter" if not provided, defaults to
+ RecursiveCharacterTextSplitter
+ :type splitter_name: Optional[str] (optional)
+ :param splitter_params: splitter_params is a dictionary that contains parameters for a text
+ splitter. These parameters can be used to customize the behavior of the text splitter. The function
+ takes this dictionary as an optional argument and if it is not provided, an empty dictionary is used
+ as the default value
+ :type splitter_params: Optional[Dict[str, Any]]
+ :return: the id of the splitter that was either found in the database or inserted into the database.
+
+`upsert_documents(self, documents: List[Dict[str, Any]], text_key: Optional[str] = 'text', id_key: Optional[str] = 'id') ‑> None`
+: The function `upsert_documents` inserts or updates documents in a database table based on their ID,
+ text, and metadata.
+
+ :param documents: A list of dictionaries, where each dictionary represents a document to be upserted
+ into a database table. Each dictionary should contain metadata about the document, as well as the
+ actual text of the document
+ :type documents: List[Dict[str, Any]]
+ :param text_key: The key in the dictionary that corresponds to the text of the document, defaults to
+ text
+ :type text_key: Optional[str] (optional)
+ :param id_key: The `id_key` parameter is an optional string parameter that specifies the key in the
+ dictionary of each document that contains the unique identifier for that document. If this key is
+ present in the dictionary, its value will be used as the document ID. If it is not present, a hash
+ of the document, defaults to id
+ :type id_key: Optional[str] (optional)
+ :param verbose: A boolean parameter that determines whether or not to print verbose output during
+ the upsert process. If set to True, additional information will be printed to the console during the
+ upsert process. If set to False, only essential information will be printed, defaults to False
+
+`vector_search(self, query: str, query_parameters: Optional[Dict[str, Any]] = {}, top_k: int = 5, model_id: int = 1, splitter_id: int = 1) ‑> List[Dict[str, Any]]`
+: This function performs a vector search on a database using a query and returns the top matching
+ results.
+
+ :param query: The search query string
+ :type query: str
+ :param query_parameters: Optional dictionary of additional parameters to be used in generating
+ the query embeddings. These parameters are specific to the model being used and can be used to
+ fine-tune the search results. If no parameters are provided, default values will be used
+ :type query_parameters: Optional[Dict[str, Any]]
+ :param top_k: The number of search results to return, sorted by relevance score, defaults to 5
+ :type top_k: int (optional)
+ :param model_id: The ID of the model to use for generating embeddings, defaults to 1
+ :type model_id: int (optional)
+ :param splitter_id: The `splitter_id` parameter is an integer that identifies the specific
+ splitter used to split the documents into chunks. It is used to retrieve the embeddings table
+ associated with the specified splitter, defaults to 1
+ :type splitter_id: int (optional)
+ :return: a list of dictionaries containing search results for a given query. Each dictionary
+ contains the following keys: "score", "text", and "metadata". The "score" key contains a float
+ value representing the similarity score between the query and the search result. The "text" key
+ contains the text of the search result, and the "metadata" key contains any metadata associated
+ with the search result
```
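The `vector_search` docstring states that results come back as dictionaries with `score`, `text`, and `metadata` keys, sorted by relevance, but it does not say how the score is computed. As a rough mental model only, the pure-Python sketch below ranks pre-computed chunk embeddings against a query embedding in memory, assuming cosine similarity as the score. `toy_vector_search` and `cosine_similarity` are hypothetical helpers, not SDK functions; the SDK performs its search inside PostgreSQL, and its scoring may differ.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def toy_vector_search(
    query_embedding: Sequence[float],
    chunks: List[Dict[str, Any]],
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Hypothetical in-memory search over chunks with an "embedding" key.

    Returns at most top_k dicts shaped like the documented results:
    {"score": float, "text": str, "metadata": dict}.
    """
    scored = [
        {
            "score": cosine_similarity(query_embedding, chunk["embedding"]),
            "text": chunk["text"],
            "metadata": chunk["metadata"],
        }
        for chunk in chunks
    ]
    # Highest similarity first, then keep only the top_k results.
    scored.sort(key=lambda result: result["score"], reverse=True)
    return scored[:top_k]


chunks = [
    {"embedding": [1.0, 0.0], "text": "about cats", "metadata": {"id": 1}},
    {"embedding": [0.0, 1.0], "text": "about dogs", "metadata": {"id": 2}},
    {"embedding": [0.7, 0.7], "text": "cats and dogs", "metadata": {"id": 3}},
]
results = toy_vector_search([1.0, 0.1], chunks, top_k=2)
print([r["text"] for r in results])  # ['about cats', 'cats and dogs']
```

This scan touches every chunk on every query, which is fine for a handful of vectors but is exactly the work a database-side search is meant to keep off the client.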

`pgml-sdks/python/pgml/docs/pgml/database.md`

Lines changed: 34 additions & 0 deletions

```diff
@@ -0,0 +1,34 @@
+Module pgml.database
+====================
+
+Classes
+-------
+
+`Database(conninfo: str, min_connections: Optional[int] = 1)`
+: This function initializes a connection pool and creates a table in a PostgreSQL database if it does
+ not already exist.
+
+:param conninfo: A string containing the connection information for the PostgreSQL database, such
+ as the host, port, database name, username, and password
+:type conninfo: str
+:param min_connections: The minimum number of connections that should be maintained in the
+ connection pool at all times. If there are no available connections in the pool when a new
+ connection is requested, a new connection will be created up to the maximum size of the pool,
+ defaults to 1
+:type min_connections: Optional[int] (optional)
+
+### Methods
+
+`archive_collection(self, name: str) ‑> None`
+: This function deletes a PostgreSQL schema if it exists.
+
+ :param name: The name of the collection (or schema) to be deleted
+ :type name: str
+
+`create_or_get_collection(self, name: str) ‑> pgml.collection.Collection`
+: This function creates a new collection in a PostgreSQL database if it does not already exist and
+ returns a Collection object.
+
+ :param name: The name of the collection to be created
+ :type name: str
+ :return: A Collection object is being returned.
```
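The `archive_collection` docstring says it deletes a PostgreSQL schema, and `create_or_get_collection` only creates a collection when it does not already exist, which suggests each collection lives in its own schema. The SQL the SDK actually issues is not part of this commit, so the sketch below is only an illustration under that assumption: the hypothetical helpers build idempotent `CREATE SCHEMA IF NOT EXISTS` / `DROP SCHEMA IF EXISTS` statements and quote the collection name as a PostgreSQL identifier so that a name containing a double quote cannot break out of the statement.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def create_collection_sql(name: str) -> str:
    # Hypothetical: idempotent, so calling it for an existing collection is a no-op.
    return f"CREATE SCHEMA IF NOT EXISTS {quote_ident(name)};"


def archive_collection_sql(name: str) -> str:
    # Hypothetical: CASCADE also drops the tables inside the schema; without it,
    # PostgreSQL refuses to drop a schema that still contains objects.
    return f"DROP SCHEMA IF EXISTS {quote_ident(name)} CASCADE;"


print(create_collection_sql("test_collection"))
# CREATE SCHEMA IF NOT EXISTS "test_collection";
print(archive_collection_sql('odd"name'))
# DROP SCHEMA IF EXISTS "odd""name" CASCADE;
```

One consequence of quoting: quoted identifiers are case-sensitive, so `MyDocs` and `mydocs` would name different schemas. Whether the SDK quotes collection names is not stated in these docs.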

`pgml-sdks/python/pgml/docs/pgml/dbutils.md`

Lines changed: 48 additions & 0 deletions

```diff
@@ -0,0 +1,48 @@
+Module pgml.dbutils
+===================
+
+Functions
+---------
+
+
+`run_create_or_insert_statement(conn: psycopg.Connection, statement: str, autocommit: bool = False) ‑> None`
+: This function executes a SQL statement on a database connection and optionally commits the changes.
+
+:param conn: The `conn` parameter is a connection object that represents a connection to a database.
+ It is used to execute SQL statements and manage transactions
+:type conn: Connection
+
+:param statement: The SQL statement to be executed
+:type statement: str
+
+:param autocommit: A boolean parameter that determines whether the transaction should be
+ automatically committed after executing the statement. If set to True, the transaction will be
+ committed automatically. If set to False, the transaction will need to be manually committed using
+ the conn.commit() method, defaults to False
+:type autocommit: bool (optional)
+
+
+`run_drop_or_delete_statement(conn: psycopg.Connection, statement: str) ‑> None`
+: This function executes a given SQL statement to drop or delete data from a database using a provided
+ connection object.
+
+:param conn: The parameter `conn` is of type `Connection`, which is likely a connection object to a
+ database. It is used to execute SQL statements on the database
+:type conn: Connection
+:param statement: The SQL statement to be executed on the database connection object
+:type statement: str
+
+
+`run_select_statement(conn: psycopg.Connection, statement: str) ‑> List[Any]`
+: The function runs a select statement on a database connection and returns the results as a list of
+ dictionaries.
+
+:param conn: The `conn` parameter is a connection object that represents a connection to a database.
+ It is used to execute SQL statements and retrieve results from the database
+:type conn: Connection
+:param statement: The SQL SELECT statement to be executed on the database
+:type statement: str
+ :return: The function `run_select_statement` returns a list of dictionaries, where each dictionary
+ represents a row of the result set of the SQL query specified in the `statement` parameter. The keys
+ of each dictionary are the column names of the result set, and the values are the corresponding
+ values of the row.
```
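Two of the `dbutils` behaviors documented above are easy to see in isolation: executing a statement and committing only when `autocommit` is True, and returning SELECT results as one dictionary per row keyed by column name. The sketch below reproduces both with the standard-library `sqlite3` module in place of `psycopg`, so it runs without a PostgreSQL server. It mirrors the documented function names and parameters for readability, but it is a stand-in, not the SDK's implementation.

```python
import sqlite3
from typing import Any, Dict, List


def run_create_or_insert_statement(
    conn: sqlite3.Connection, statement: str, autocommit: bool = False
) -> None:
    """Execute a statement; commit only when autocommit is True (sqlite3 stand-in)."""
    cur = conn.cursor()
    cur.execute(statement)
    if autocommit:
        conn.commit()  # otherwise the caller must call conn.commit() later
    cur.close()


def run_select_statement(conn: sqlite3.Connection, statement: str) -> List[Dict[str, Any]]:
    """Run a SELECT and return one dict per row, keyed by column name (sqlite3 stand-in)."""
    cur = conn.cursor()
    cur.execute(statement)
    # cursor.description holds one tuple per result column; item 0 is the column name.
    columns = [col[0] for col in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    cur.close()
    return rows


conn = sqlite3.connect(":memory:")
run_create_or_insert_statement(conn, "CREATE TABLE models (id INTEGER, name TEXT)")
run_create_or_insert_statement(
    conn, "INSERT INTO models VALUES (1, 'intfloat/e5-small')", autocommit=True
)
print(run_select_statement(conn, "SELECT id, name FROM models"))
# [{'id': 1, 'name': 'intfloat/e5-small'}]
```

With psycopg 3, the same dictionary-per-row shape is available through the `psycopg.rows.dict_row` row factory (`conn.cursor(row_factory=dict_row)`); whether the SDK uses it is not shown in this commit.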

`pgml-sdks/python/pgml/docs/pgml/index.md`

Lines changed: 8 additions & 0 deletions

```diff
@@ -0,0 +1,8 @@
+Module pgml
+===========
+
+Sub-modules
+-----------
+* pgml.collection
+* pgml.database
+* pgml.dbutils
```

`pgml-sdks/python/pgml/examples/vector_search.py`

Lines changed: 1 addition & 1 deletion

```diff
@@ -29,6 +29,6 @@
 
 start = time()
 results = collection.vector_search("Who won 20 grammy awards?", top_k=2)
-rprint(json.dumps(results, indent=2))
 rprint("Query time %0.3f" % (time() - start))
+rprint(json.dumps(results, indent=2))
 db.archive_collection(collection_name)
```
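In the old ordering, `time() - start` was evaluated only after the results had been serialized with `json.dumps` and printed, so the reported "query time" included that output work; after this change it covers just the `vector_search` call. The sketch below isolates the same idea. It uses `time.perf_counter`, a high-resolution clock intended for measuring short durations (unlike `time.time`, it does not follow wall-clock adjustments), and `fake_vector_search` is a hypothetical stand-in for `collection.vector_search` so the snippet runs without a database.

```python
import json
import time
from typing import Any, Callable, Dict, List, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed seconds), timing only the call itself."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_vector_search(query: str, top_k: int = 5) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for collection.vector_search; no database involved.
    return [{"score": 1.0, "text": query, "metadata": {}}][:top_k]


results, elapsed = timed(fake_vector_search, "Who won 20 grammy awards?", top_k=2)
print("Query time %0.3f" % elapsed)
# Serialization and printing happen after the measurement, so they are not counted.
print(json.dumps(results, indent=2))
```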

`pgml-sdks/python/pgml/pgml/collection.py`

Lines changed: 2 additions & 4 deletions

```diff
@@ -264,10 +264,9 @@ def _create_chunks_table(self):
 
     def upsert_documents(
         self,
-        documents: List[Dict[str, str]],
+        documents: List[Dict[str, Any]],
         text_key: Optional[str] = "text",
         id_key: Optional[str] = "id",
-        verbose: bool = False,
     ) -> None:
         """
         The function `upsert_documents` inserts or updates documents in a database table based on their ID,
@@ -276,7 +275,7 @@ def upsert_documents(
         :param documents: A list of dictionaries, where each dictionary represents a document to be upserted
         into a database table. Each dictionary should contain metadata about the document, as well as the
         actual text of the document
-        :type documents: List[Dict[str, str]]
+        :type documents: List[Dict[str, Any]]
         :param text_key: The key in the dictionary that corresponds to the text of the document, defaults to
         text
         :type text_key: Optional[str] (optional)
@@ -288,7 +287,6 @@ def upsert_documents(
         :param verbose: A boolean parameter that determines whether or not to print verbose output during
         the upsert process. If set to True, additional information will be printed to the console during the
         upsert process. If set to False, only essential information will be printed, defaults to False
-        :type verbose: bool (optional)
         """
         conn = self.pool.getconn()
         for document in track(documents, description="Upserting documents"):
```
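This change widens `documents` from `List[Dict[str, str]]` to `List[Dict[str, Any]]`, so metadata values no longer have to be strings. The docstring says each document carries its text under `text_key`, an ID under `id_key` when present, and otherwise "a hash of the document". The sketch below shows one way such a document could be split into `(id, text, metadata)`. `split_document` is a hypothetical helper, not SDK code, and the fallback hash (MD5 over key-sorted JSON) is an assumption chosen only so that identical documents map to the same ID regardless of key order.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple


def split_document(
    document: Dict[str, Any],
    text_key: Optional[str] = "text",
    id_key: Optional[str] = "id",
) -> Tuple[str, str, Dict[str, Any]]:
    """Hypothetical helper: return (document_id, text, metadata) for one document.

    Every key other than text_key and id_key is treated as metadata, so values
    may be any JSON-serializable type (numbers, lists, nested dicts).
    """
    text = document[text_key]
    metadata = {k: v for k, v in document.items() if k not in (text_key, id_key)}
    if id_key in document:
        document_id = str(document[id_key])
    else:
        # Assumption: derive a stable ID from the content. sort_keys=True makes the
        # JSON, and therefore the hash, independent of dictionary key order.
        canonical = json.dumps(document, sort_keys=True)
        document_id = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return document_id, text, metadata


print(split_document({"id": 7, "text": "hello", "source": "wiki"}))
# ('7', 'hello', {'source': 'wiki'})
```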

`pgml-sdks/python/pgml/pyproject.toml`

Lines changed: 7 additions & 3 deletions

```diff
@@ -1,10 +1,14 @@
 [tool.poetry]
 name = "pgml"
-version = "0.1.0"
-description = ""
-authors = ["Santi Adavani <santi@hyperparam.ai>"]
+version = "0.6.0"
+description = "Python SDK is designed to facilitate the development of scalable vector search applications on PostgreSQL databases."
+authors = ["PostgresML <team@postgresml.org>"]
+homepage = "https://postgresml.org"
+repository = "https://github.com/postgresml/postgresml"
+documentation = "https://github.com/postgresml/postgresml/tree/master/pgml-sdks/python/pgml"
 readme = "README.md"
 packages = [{include = "pgml"}]
+keywords = ["postgres", "machine learning", "vector databases", "embeddings"]
 
 [tool.poetry.dependencies]
 python = ">=3.8.1,<4.0"
```
