postgresml/postgresml

Commit e22134f

Moloejoe authored and gitbook-bot committed

GITBOOK-72: Fix broken Pipelines and Search and add a bit more info

1 parent 14a5976, commit e22134f

File tree

7 files changed, +153 -122 lines changed

pgml-docs/docs/guides
- deploying-postgresml/self-hosting
  - README.md
  - building-from-source.md
- developer-docs
  - installation.md
- getting-started
  - sign-up.md
- machine-learning/supervised-learning
  - data-pre-processing.md
- sdks
  - pipelines.md
  - search.md

`pgml-docs/docs/guides/deploying-postgresml/self-hosting/README.md`

Lines changed: 1 addition & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -31,7 +31,7 @@ Finally, you can install PostgresML:`
`31`	`31`	`sudo apt install -y postgresml-14`
`32`	`32`	```
`33`	`33`
`34`		-Ubuntu 22.04 ships with PostgreSQL 14, but if you have a different version installed on your system, just change `14` in the package name to your Postgres version. We currently support all versions supported by the community: Postgres 12 through 16.
	`34`	+Ubuntu 22.04 ships with PostgreSQL 14, but if you have a different version installed on your system, just change `14` in the package name to your Postgres version. We currently support all versions supported by the community: Postgres 12 through 15.
`35`	`35`
`36`	`36`	`### Validate your installation`
`37`	`37`
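
The package-name rule in the changed line can be sketched as follows. This is a minimal illustration, not part of the commit: the helper name `postgresml_package` is hypothetical, and the supported range (12 through 15) is taken from the added (`+`) side of the diff.

```python
# Hypothetical helper: derive the apt package name from a PostgreSQL major version.
# The supported range follows the updated README line (Postgres 12 through 15).
SUPPORTED_PG_MAJORS = range(12, 16)


def postgresml_package(pg_major: int) -> str:
    """Return the PostgresML package name for the given PostgreSQL major version."""
    if pg_major not in SUPPORTED_PG_MAJORS:
        raise ValueError(f"PostgreSQL {pg_major} is outside the supported range 12-15")
    return f"postgresml-{pg_major}"


if __name__ == "__main__":
    # Ubuntu 22.04 ships PostgreSQL 14, so this matches the README's install command.
    print(f"sudo apt install -y {postgresml_package(14)}")
```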

`pgml-docs/docs/guides/deploying-postgresml/self-hosting/building-from-source.md`

Lines changed: 2 additions & 2 deletions

`@@ -40,12 +40,12 @@ For a typical deployment in production, you would need to compile and install th`
`40`	`40`
`41`	`41`	`#### Install pgrx`
`42`	`42`
`43`		-`pgrx` is open source and available from crates.io. We are currently using the `0.11.0` version. It's important that your `pgrx` version matches what we're using, since there are some hard dependencies between our code and `pgrx`.
	`43`	+`pgrx` is open source and available from crates.io. We are currently using the `0.10.0` version. It's important that your `pgrx` version matches what we're using, since there are some hard dependencies between our code and `pgrx`.
`44`	`44`
`45`	`45`	To install `pgrx`, simply run:
`46`	`46`
`47`	`47`	```
`48`		`-cargo install cargo-pgrx --version "0.11.0"`
	`48`	`+cargo install cargo-pgrx --version "0.10.0"`
`49`	`49`	```
`50`	`50`
`51`	`51`	Before using `pgrx`, it needs to be initialized against the installed version of PostgreSQL. In this example, we'll be using the Ubuntu 22.04 default PostgreSQL 14 installation:

`pgml-docs/docs/guides/developer-docs/installation.md`

Lines changed: 0 additions & 1 deletion

@@ -64,7 +64,6 @@ To install the necessary Python packages into a virtual environment, use the `vi
`64`	`64`	`virtualenv pgml-venv && \`
`65`	`65`	`source pgml-venv/bin/activate && \`
`66`	`66`	`pip install -r requirements.txt && \`
`67`		`-pip install -r requirements-autogptq.txt && \`
`68`	`67`	`pip install -r requirements-xformers.txt --no-dependencies`
`69`	`68`	```
`70`	`69`

`pgml-docs/docs/guides/getting-started/sign-up.md`

Lines changed: 1 addition & 0 deletions

`@@ -5,6 +5,7 @@`
`5`	`5`	`1. Go to [https://postgresml.org/signup](https://postgresml.org/signup)`
`6`	`6`	`2. Sign up using your email or using Google or Github authentication`
`7`	`7`	`3. Login using your account`
	`8`	`+4. [data-pre-processing.md](../machine-learning/supervised-learning/data-pre-processing.md "mention")`
`8`	`9`
`9`	`10`
`10`	`11`

`pgml-docs/docs/guides/machine-learning/supervised-learning/data-pre-processing.md`

Lines changed: 4 additions & 4 deletions

`@@ -25,9 +25,9 @@ In this example:`
`25`	`25`
`26`	`26`	`There are 3 steps to preprocessing data:`
`27`	`27`
`28`		`-* [Encoding](#categorical-encodings) categorical values into quantitative values`
`29`		`-* [Imputing](#imputing-missing-values) NULL values to some quantitative value`
`30`		`-* [Scaling](#scaling-values) quantitative values across all variables to similar ranges`
	`28`	`+* [Encoding](../../../../../pgml-dashboard/content/docs/guides/training/preprocessing.md#categorical-encodings) categorical values into quantitative values`
	`29`	`+* [Imputing](../../../../../pgml-dashboard/content/docs/guides/training/preprocessing.md#imputing-missing-values) NULL values to some quantitative value`
	`30`	`+* [Scaling](../../../../../pgml-dashboard/content/docs/guides/training/preprocessing.md#scaling-values) quantitative values across all variables to similar ranges`
`31`	`31`
`32`	`32`	These preprocessing steps may be specified on a per-column basis to the [train()](../../../../../docs/guides/training/overview) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.
`33`	`33`
`@@ -71,7 +71,7 @@ Encoding categorical variables is an O(N log(M)) where N is the number of rows,`
`71`	`71`	`| name | description |`
`72`	`72`	`| --------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |`
`73`	`73`	| `none` | Default - Casts the variable to a 32-bit floating point representation compatible with numerics. This is the default for non-`TEXT` values. |
`74`		-| `target` | Encodes the variable as the average value of the target label for all members of the category. This is the default for `TEXT` variables. |
	`74`	+| `target` | Encodes the variable as the mean value of the target label for all members of the category. This is the default for `TEXT` variables. |
`75`	`75`	| `one_hot` | Encodes the variable as multiple independent boolean columns. |
`76`	`76`	| `ordinal` | Encodes the variable as integer values provided by their position in the input array. NULLS are always 0. |
`77`	`77`
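
To make the `target` and `ordinal` rows of the table above concrete, here is a small Python sketch of both encodings. It is an illustration only, not PostgresML's implementation: the function names are hypothetical, and the 1-based ordinal positions are an assumption inferred from the note that NULLs are always 0.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def target_encode(categories: Sequence[str], targets: Sequence[float]) -> Dict[str, float]:
    """Map each category to the mean target value of the rows in that category."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def ordinal_encode(value: Optional[str], ordering: Sequence[str]) -> int:
    """Map a value to its position in `ordering`; None (NULL) maps to 0.

    Assumption: positions start at 1 so that they never collide with the NULL code.
    """
    if value is None:
        return 0
    return ordering.index(value) + 1


if __name__ == "__main__":
    colors = ["red", "blue", "red", "blue"]
    labels = [1.0, 0.0, 3.0, 2.0]
    print(target_encode(colors, labels))
    print(ordinal_encode("medium", ["small", "medium", "large"]))
```

Target encoding needs one pass over the rows to accumulate per-category sums and counts, which is why it can serve as a default for `TEXT` columns without creating extra columns the way `one_hot` does.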

`pgml-docs/docs/guides/sdks/pipelines.md`

Lines changed: 66 additions & 32 deletions

`@@ -17,7 +17,7 @@ model = Model()`
`17`	`17`
`18`	`18`	`{% tab title="JavaScript" %}`
`19`	`19`	```javascript
`20`		`-model = pgml.newModel()`
	`20`	`+const model = pgml.newModel()`
`21`	`21`	```
`22`	`22`	`{% endtab %}`
`23`	`23`	`{% endtabs %}`
`@@ -36,9 +36,10 @@ model = Model(`
`36`	`36`
`37`	`37`	`{% tab title="JavaScript" %}`
`38`	`38`	```javascript
`39`		`-model = pgml.newModel(`
`40`		`-  name="hkunlp/instructor-base",`
`41`		`-  parameters={instruction: "Represent the Wikipedia document for retrieval:"}`
	`39`	`+const model = pgml.newModel(`
	`40`	`+  "hkunlp/instructor-base",`
	`41`	`+  "pgml",`
	`42`	`+  { instruction: "Represent the Wikipedia document for retrieval:" }`
`42`	`43`	`)`
`43`	`44`	```
`44`	`45`	`{% endtab %}`
`@@ -55,7 +56,7 @@ model = Model(name="text-embedding-ada-002", source="openai")`
`55`	`56`
`56`	`57`	`{% tab title="JavaScript" %}`
`57`	`58`	```javascript
`58`		`-model = pgml.newModel(name="text-embedding-ada-002", source="openai")`
	`59`	`+const model = pgml.newModel("text-embedding-ada-002", "openai")`
`59`	`60`	```
`60`	`61`	`{% endtab %}`
`61`	`62`	`{% endtabs %}`
`@@ -75,7 +76,7 @@ splitter = Splitter()`
`75`	`76`
`76`	`77`	`{% tab title="JavaScript" %}`
`77`	`78`	```javascript
`78`		`-splitter = pgml.newSplitter()`
	`79`	`+const splitter = pgml.newSplitter()`
`79`	`80`	```
`80`	`81`	`{% endtab %}`
`81`	`82`	`{% endtabs %}`
`@@ -95,8 +96,8 @@ splitter = Splitter(`
`95`	`96`	`{% tab title="JavaScript" %}`
`96`	`97`	```javascript
`97`	`98`	`splitter = pgml.newSplitter(`
`98`		`-  name="recursive_character",`
`99`		`-  parameters={chunk_size: 1500, chunk_overlap: 40}`
	`99`	`+  "recursive_character",`
	`100`	`+  {chunk_size: 1500, chunk_overlap: 40}`
`100`	`101`	`)`
`101`	`102`	```
`102`	`103`	`{% endtab %}`
`@@ -120,9 +121,9 @@ await collection.add_pipeline(pipeline)`
`120`	`121`
`121`	`122`	`{% tab title="JavaScript" %}`
`122`	`123`	```javascript
`123`		`-model = pgml.newModel()`
`124`		`-splitter = pgml.newSplitter()`
`125`		`-pipeline = pgml.newPipeline("test_pipeline", model, splitter)`
	`124`	`+const model = pgml.newModel()`
	`125`	`+const splitter = pgml.newSplitter()`
	`126`	`+const pipeline = pgml.newPipeline("test_pipeline", model, splitter)`
`126`	`127`	`await collection.add_pipeline(pipeline)`
`127`	`128`	```
`128`	`129`	`{% endtab %}`
`@@ -151,17 +152,51 @@ await collection.add_pipeline(pipeline)`
`151`	`152`
`152`	`153`	`{% tab title="JavaScript" %}`
`153`	`154`	```javascript
`154`		`-model = pgml.newModel()`
`155`		`-splitter = pgml.newSplitter()`
`156`		`-pipeline = pgml.newPipeline("test_pipeline", model, splitter, {`
`157`		`-  "full_text_search": {`
`158`		`-    active: True,`
`159`		`-    configuration: "english"`
	`155`	`+const model = pgml.newModel()`
	`156`	`+const splitter = pgml.newSplitter()`
	`157`	`+const pipeline = pgml.newPipeline("test_pipeline", model, splitter, {`
	`158`	`+  full_text_search: {`
	`159`	`+    active: true,`
	`160`	`+    configuration: "english"`
	`161`	`+  }`
	`162`	`+})`
	`163`	`+await collection.add_pipeline(pipeline)`
	`164`	+```
	`165`	`+{% endtab %}`
	`166`	`+{% endtabs %}`
	`167`	`+`
	`168`	`+### Customizing the HNSW Index`
	`169`	`+`
	`170`	+By default the SDK uses HNSW indexes to efficiently perform vector recall. The default HNSW index sets `m` to 16 and `ef_construction` to 64. These defaults can be customized when the Pipeline is created.
	`171`	`+`
	`172`	`+{% tabs %}`
	`173`	`+{% tab title="Python" %}`
	`174`	+```python
	`175`	`+model = Model()`
	`176`	`+splitter = Splitter()`
	`177`	`+pipeline = Pipeline("test_pipeline", model, splitter, {`
	`178`	`+  "hnsw": {`
	`179`	`+    "m": 16,`
	`180`	`+    "ef_construction": 64`
`160`	`181`	`  }`
`161`	`182`	`})`
`162`	`183`	`await collection.add_pipeline(pipeline)`
`163`	`184`	```
`164`	`185`	`{% endtab %}`
	`186`	`+`
	`187`	`+{% tab title="JavaScript" %}`
	`188`	+```javascript
	`189`	`+const model = pgml.newModel()`
	`190`	`+const splitter = pgml.newSplitter()`
	`191`	`+const pipeline = pgml.newPipeline("test_pipeline", model, splitter, {`
	`192`	`+  hnsw: {`
	`193`	`+    m: 16,`
	`194`	`+    ef_construction: 64`
	`195`	`+  }`
	`196`	`+})`
	`197`	`+await collection.add_pipeline(pipeline)`
	`198`	+```
	`199`	`+{% endtab %}`
`165`	`200`	`{% endtabs %}`
`166`	`201`
`167`	`202`	`## Searching with Pipelines`
`@@ -179,19 +214,17 @@ results = await collection.query().vector_recall("Why is PostgresML the best?",`
`179`	`214`
`180`	`215`	`{% tab title="JavaScript" %}`
`181`	`216`	```javascript
`182`		`-pipeline = pgml.newPipeline("test_pipeline")`
`183`		`-collection = pgml.newCollection("test_collection")`
`184`		`-results = await collection.query().vector_recall("Why is PostgresML the best?", pipeline).fetch_all()`
	`217`	`+const pipeline = pgml.newPipeline("test_pipeline")`
	`218`	`+const collection = pgml.newCollection("test_collection")`
	`219`	`+const results = await collection.query().vector_recall("Why is PostgresML the best?", pipeline).fetch_all()`
`185`	`220`	```
`186`	`221`	`{% endtab %}`
`187`	`222`	`{% endtabs %}`
`188`	`223`
`189`		`-`
	`224`	`+## Disable a Pipeline`
`190`	`225`
`191`	`226`	`Pipelines can be disabled or removed to prevent them from running automatically when documents are upserted.`
`192`	`227`
`193`		`-## Disable a Pipeline`
`194`		`-`
`195`	`228`	`{% tabs %}`
`196`	`229`	`{% tab title="Python" %}`
`197`	`230`	```python
`@@ -203,8 +236,8 @@ await collection.disable_pipeline(pipeline)`
`203`	`236`
`204`	`237`	`{% tab title="JavaScript" %}`
`205`	`238`	```javascript
`206`		`-pipeline = pgml.newPipeline("test_pipeline")`
`207`		`-collection = pgml.newCollection("test_collection")`
	`239`	`+const pipeline = pgml.newPipeline("test_pipeline")`
	`240`	`+const collection = pgml.newCollection("test_collection")`
`208`	`241`	`await collection.disable_pipeline(pipeline)`
`209`	`242`	```
`210`	`243`	`{% endtab %}`
`@@ -214,6 +247,8 @@ Disabling a Pipeline prevents it from running automatically, but leaves all chun`
`214`	`247`
`215`	`248`	`## Enable a Pipeline`
`216`	`249`
	`250`	`+Disabled pipelines can be re-enabled.`
	`251`	`+`
`217`	`252`	`{% tabs %}`
`218`	`253`	`{% tab title="Python" %}`
`219`	`254`	```python
`@@ -225,8 +260,8 @@ await collection.enable_pipeline(pipeline)`
`225`	`260`
`226`	`261`	`{% tab title="JavaScript" %}`
`227`	`262`	```javascript
`228`		`-pipeline = pgml.newPipeline("test_pipeline")`
`229`		`-collection = pgml.newCollection("test_collection")`
	`263`	`+const pipeline = pgml.newPipeline("test_pipeline")`
	`264`	`+const collection = pgml.newCollection("test_collection")`
`230`	`265`	`await collection.enable_pipeline(pipeline)`
`231`	`266`	```
`232`	`267`	`{% endtab %}`
`@@ -246,11 +281,10 @@ await collection.remove_pipeline(pipeline)`
`246`	`281`	`{% endtab %}`
`247`	`282`
`248`	`283`	`{% tab title="JavaScript" %}`
`249`		-```javascript
`250`		`-pipeline = pgml.newPipeline("test_pipeline")`
`251`		`-collection = pgml.newCollection("test_collection")`
`252`		`-await collection.remove_pipeline(pipeline)`
`253`		-```
	`284`	`+<pre class="language-javascript"><code class="lang-javascript">const pipeline = pgml.newPipeline("test_pipeline")`
	`285`	`+<strong>const collection = pgml.newCollection("test_collection")`
	`286`	`+</strong>await collection.remove_pipeline(pipeline)`
	`287`	`+</code></pre>`
`254`	`288`	`{% endtab %}`
`255`	`289`	`{% endtabs %}`
`256`	`290`
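
For readers unfamiliar with the `m` and `ef_construction` options added in the "Customizing the HNSW Index" section, the sketch below renders the kind of index statement those two values would correspond to. It rests on an assumption the diff does not confirm: that the SDK's HNSW index is a pgvector index. The helper name, table name, column name, and the choice of `vector_cosine_ops` are all hypothetical.

```python
import re

# Only plain identifiers are accepted, since they are interpolated into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def hnsw_index_sql(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Render a pgvector-style CREATE INDEX statement for an HNSW index.

    The defaults mirror the values quoted in the docs (m = 16, ef_construction = 64).
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    if m < 1 or ef_construction < 1:
        raise ValueError("m and ef_construction must be positive integers")
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(hnsw_index_sql("chunks", "embedding"))
    print(hnsw_index_sql("chunks", "embedding", m=32, ef_construction=128))
```

Larger values of `m` and `ef_construction` generally trade slower index builds and more memory for better recall, which is the usual reason to override the defaults.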
