Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitb9eeb5e

Browse files
authored
pgml.transform() docs fixes (#1428)
1 parentbbf8dd0 commitb9eeb5e

File tree

10 files changed

+284
-137
lines changed

10 files changed

+284
-137
lines changed

‎pgml-cms/docs/SUMMARY.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -18,10 +18,10 @@
1818
*[SQL extension](api/sql-extension/README.md)
1919
*[pgml.embed()](api/sql-extension/pgml.embed.md)
2020
*[pgml.transform()](api/sql-extension/pgml.transform/README.md)
21-
*[FillMask](api/sql-extension/pgml.transform/fill-mask.md)
22-
*[QuestionAnswering](api/sql-extension/pgml.transform/question-answering.md)
21+
*[Fill-Mask](api/sql-extension/pgml.transform/fill-mask.md)
22+
*[Questionanswering](api/sql-extension/pgml.transform/question-answering.md)
2323
*[Summarization](api/sql-extension/pgml.transform/summarization.md)
24-
*[TextClassification](api/sql-extension/pgml.transform/text-classification.md)
24+
*[Textclassification](api/sql-extension/pgml.transform/text-classification.md)
2525
*[Text Generation](api/sql-extension/pgml.transform/text-generation.md)
2626
*[Text-to-Text Generation](api/sql-extension/pgml.transform/text-to-text-generation.md)
2727
*[Token Classification](api/sql-extension/pgml.transform/token-classification.md)

‎pgml-cms/docs/api/sql-extension/pgml.embed.md

Lines changed: 2 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ pgml.embed(
2222
|----------|-------------|---------|
2323
| transformer| The name of a Hugging Face embedding model.|`intfloat/e5-large-v2`|
2424
| text| The text to embed. This can be a string or the name of a column from a PostgreSQL table.|`'I am your father, Luke'`|
25-
| kwargs| Additional arguments that are passed to the model.||
25+
| kwargs| Additional arguments that are passed to the model during inference.||
2626

2727
###Examples
2828

@@ -43,7 +43,7 @@ SELECT * FROM pgml.embed(
4343
{% endtab %}
4444
{% endtabs %}
4545

46-
####Generate embeddingsfrom a table
46+
####Generate embeddingsinside a table
4747

4848
SQL functions can be used as part of a query to insert, update, or even automatically generate column values of any table:
4949

@@ -96,9 +96,3 @@ LIMIT 1;
9696
{% endtabs %}
9797

9898
This query will return the quote with the most similar meaning to`'Feel the force!'` by generating an embedding of that quote and comparing it to all other embeddings in the table, using vector cosine similarity as the measure of distance.
99-
100-
##Performance
101-
102-
First time`pgml.embed()` is called with a new model, it is downloaded from Hugging Face and saved in the cache directory. Subsequent calls will use the cached model, which is faster, and if the connection to the database is kept open, the model will be reused across multiple queries without being unloaded from memory.
103-
104-
If a GPU is available, the model will be automatically loaded onto the GPU and the embedding generation will be even faster.

‎pgml-cms/docs/api/sql-extension/pgml.transform/README.md

Lines changed: 60 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ The `pgml.transform()` function comes in two flavors, task-based and model-based
2727

2828
###Task-based API
2929

30-
The task-based API automatically chooses a modelto usebased on the task:
30+
The task-based API automatically chooses a model based on the task:
3131

3232
```postgresql
3333
pgml.transform(
@@ -37,22 +37,34 @@ pgml.transform(
3737
)
3838
```
3939

40-
| Argument| Description| Example|
41-
|----------|-------------|---------|
42-
| task| The name of a natural language processing task.|`text-generation`|
43-
| args| Additional kwargs to pass to the pipeline.|`{"max_new_tokens": 50}`|
44-
| inputs| Array of prompts to pass to the model for inference.|`['Once upon a time...']`|
40+
| Argument| Description| Example| Required|
41+
|----------|-------------|---------|----------|
42+
| task| The name of a natural language processing task.|`'text-generation'`| Required|
43+
| args| Additional kwargs to pass to the pipeline.|`'{"max_new_tokens": 50}'::JSONB`| Optional|
44+
| inputs| Array of prompts to pass to the model for inference.Each prompt is evaluated independently and a separate result is returned.|`ARRAY['Once upon a time...']`| Required|
4545

46-
####Example
46+
####Examples
4747

4848
{% tabs %}
49-
{% tab title="SQL" %}
49+
{% tabs %}
50+
{% tab title="Text generation" %}
51+
52+
```postgresql
53+
SELECT *
54+
FROM pgml.transform(
55+
task => 'text-generation',
56+
inputs => ARRAY['In a galaxy far far away']
57+
);
58+
```
59+
60+
{% endtab %}
61+
{% tab title="Translation" %}
5062

5163
```postgresql
5264
SELECT *
53-
FROM pgml.transform(
54-
'translation_en_to_fr',
55-
'How do I say hello in French?',
65+
FROM pgml.transform(
66+
task =>'translation_en_to_fr',
67+
inputs => ARRAY['How do I say hello in French?']
5668
);
5769
```
5870

@@ -61,7 +73,7 @@ FROM pgml.transform (
6173

6274
###Model-based API
6375

64-
The model-based API requires the name of the model and the task, passed as a JSON object, which allows it to be more generic:
76+
The model-based API requires the name of the model and the task, passed as a JSON object. This allows it to be more generic and support more models:
6577

6678
```postgresql
6779
pgml.transform(
@@ -71,16 +83,41 @@ pgml.transform(
7183
)
7284
```
7385

74-
| Argument| Description| Example|
75-
|----------|-------------|---------|
76-
| task| Model configuration, including name and task.|`{"task": "text-generation", "model": "mistralai/Mixtral-8x7B-v0.1"}`|
77-
| args| Additional kwargs to pass to the pipeline.|`{"max_new_tokens": 50}`|
78-
| inputs| Array of prompts to pass to the model for inference.|`['Once upon a time...']`|
86+
<tableclass="table-sm table">
87+
<thead>
88+
<th>Argument</th>
89+
<th>Description</th>
90+
<th>Example</th>
91+
</thead>
92+
<tbody>
93+
<tr>
94+
<td>model</td>
95+
<td>Model configuration, including name and task.</td>
96+
<td>
97+
<div class="code-multi-line font-monospace">
98+
'{
99+
<br>&nbsp;&nbsp;"task": "text-generation",
100+
<br>&nbsp;&nbsp;"model": "mistralai/Mixtral-8x7B-v0.1"
101+
<br>}'::JSONB
102+
</div>
103+
</td>
104+
</tr>
105+
<tr>
106+
<td>args</td>
107+
<td>Additional kwargs to pass to the pipeline.</td>
108+
<td><code>'{"max_new_tokens": 50}'::JSONB</code></td>
109+
</tr>
110+
<tr>
111+
<td>inputs</td>
112+
<td>Array of prompts to pass to the model for inference. Each prompt is evaluated independently.</td>
113+
<td><code>ARRAY['Once upon a time...']</code></td>
114+
</tr>
115+
</table>
79116

80117
####Example
81118

82119
{% tabs %}
83-
{% tab title="SQL" %}
120+
{% tab title="PostgresMLSQL" %}
84121

85122
```postgresql
86123
SELECT pgml.transform(
@@ -89,8 +126,9 @@ SELECT pgml.transform(
89126
"model": "TheBloke/zephyr-7B-beta-GPTQ",
90127
"model_type": "mistral",
91128
"revision": "main",
129+
"device_map": "auto"
92130
}'::JSONB,
93-
inputs => ['AI is going to change the world in the following ways:'],
131+
inputs =>ARRAY['AI is going to'],
94132
args => '{
95133
"max_new_tokens": 100
96134
}'::JSONB
@@ -138,11 +176,12 @@ PostgresML currently supports most NLP tasks available on Hugging Face:
138176
|[Token classification](token-classification)|`token-classification`| Classify tokens in a text.|
139177
|[Translation](translation)|`translation`| Translate text from one language to another.|
140178
|[Zero-shot classification](zero-shot-classification)|`zero-shot-classification`| Classify a text without training data.|
179+
| Conversational|`conversational`| Engage in a conversation with the model, e.g. chatbot.|
141180

181+
###Structured inputs
142182

143-
##Performance
183+
Both versions of the`pgml.transform()` function also support structured inputs, formatted with JSON. Structured inputs are used with the conversational task, e.g. to differentiate between the system and user prompts. Simply replace the text array argument with an array of JSONB objects.
144184

145-
Much like`pgml.embed()`, the models used in`pgml.transform()` are downloaded from Hugging Face and cached locally. If the connection to the database is kept open, the model remains in memory, which allows for faster inference on subsequent calls. If you want to free up memory, you can close the connection.
146185

147186
##Additional resources
148187

‎pgml-cms/docs/api/sql-extension/pgml.transform/fill-mask.md

Lines changed: 49 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,29 +2,69 @@
22
description:Task to fill words in a sentence that are hidden
33
---
44

5-
#FillMask
5+
#Fill-Mask
66

7-
Fill-mask refers to a task where certain words in a sentence are hidden or "masked", and the objective is to predict what words should fill in those masked positions. Such models are valuable when we want to gain statistical insights about the language used to train the model.
7+
Fill-Mask is a task where certain words in a sentence are hidden or "masked", and the objective for the model is to predict what words should fill in those masked positions. Such models are valuable when we want to gain statistical insights about the language used to train the model.
8+
9+
##Example
10+
11+
{% tabs %}
12+
{% tab title="SQL" %}
813

914
```sql
1015
SELECTpgml.transform(
1116
task=>'{
1217
"task" : "fill-mask"
1318
}'::JSONB,
1419
inputs=> ARRAY[
15-
'Paris is the<mask> of France.'
20+
'Paris is the&lt;mask&gt; of France.'
1621

1722
]
1823
)AS answer;
1924
```
2025

21-
_Result_
26+
{% endtab %}
27+
28+
{% tab title="Result" %}
2229

2330
```json
2431
[
25-
{"score":0.679,"token":812,"sequence":"Paris is the capital of France.","token_str":" capital"},
26-
{"score":0.051,"token":32357,"sequence":"Paris is the birthplace of France.","token_str":" birthplace"},
27-
{"score":0.038,"token":1144,"sequence":"Paris is the heart of France.","token_str":" heart"},
28-
{"score":0.024,"token":29778,"sequence":"Paris is the envy of France.","token_str":" envy"},
29-
{"score":0.022,"token":1867,"sequence":"Paris is the Capital of France.","token_str":" Capital"}]
32+
{
33+
"score":0.6811484098434448,
34+
"token":812,
35+
"sequence":"Paris is the capital of France.",
36+
"token_str":" capital"
37+
},
38+
{
39+
"score":0.050908513367176056,
40+
"token":32357,
41+
"sequence":"Paris is the birthplace of France.",
42+
"token_str":" birthplace"
43+
},
44+
{
45+
"score":0.03812871500849724,
46+
"token":1144,
47+
"sequence":"Paris is the heart of France.",
48+
"token_str":" heart"
49+
},
50+
{
51+
"score":0.024047480896115303,
52+
"token":29778,
53+
"sequence":"Paris is the envy of France.",
54+
"token_str":" envy"
55+
},
56+
{
57+
"score":0.022767696529626846,
58+
"token":1867,
59+
"sequence":"Paris is the Capital of France.",
60+
"token_str":" Capital"
61+
}
62+
]
3063
```
64+
65+
{% endtab %}
66+
{% endtabs %}
67+
68+
###Additional resources
69+
70+
-[Hugging Face documentation](https://huggingface.co/tasks/fill-mask)

‎pgml-cms/docs/api/sql-extension/pgml.transform/question-answering.md

Lines changed: 19 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,15 @@
11
---
2-
description:Retrieve the answer to a question from a given text
2+
description:Retrieve the answer to a question from a given text.
33
---
44

5-
#QuestionAnswering
5+
#Questionanswering
66

7-
Question Answering models are designed to retrieve the answer to a question from a given text, which can be particularly useful for searching for information within a document. It's worth noting that some question answering models are capable of generating answers even without any contextual information.
7+
Question answering models are designed to retrieve the answer to a question from a given text, which can be particularly useful for searching for information within a document. It's worth noting that some question answering models are capable of generating answers even without any contextual information.
8+
9+
##Example
10+
11+
{% tabs %}
12+
{% tab title="SQL" %}
813

914
```sql
1015
SELECTpgml.transform(
@@ -18,7 +23,9 @@ SELECT pgml.transform(
1823
)AS answer;
1924
```
2025

21-
_Result_
26+
{% endtab %}
27+
28+
{% tab title="Result" %}
2229

2330
```json
2431
{
@@ -28,3 +35,11 @@ _Result_
2835
"answer":"İstanbul"
2936
}
3037
```
38+
39+
{% endtab %}
40+
{% endtabs %}
41+
42+
43+
###Additional resources
44+
45+
-[Hugging Face documentation](https://huggingface.co/tasks/question-answering)
Lines changed: 25 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,53 +1,46 @@
11
---
2-
description:Task of creating a condensed version of a document
2+
description:Task of creating a condensed version of a document.
33
---
44

55
#Summarization
66

77
Summarization involves creating a condensed version of a document that includes the important information while reducing its length. Different models can be used for this task, with some models extracting the most relevant text from the original document, while other models generate completely new text that captures the essence of the original content.
88

9+
##Example
10+
11+
{% tabs %}
12+
{% tab title="SQL" %}
13+
914
```sql
1015
SELECTpgml.transform(
11-
task=>'{"task": "summarization",
12-
"model": "sshleifer/distilbart-cnn-12-6"
16+
task=>'{
17+
"task": "summarization",
18+
"model": "google/pegasus-xsum"
1319
}'::JSONB,
14-
inputs=> array[
15-
'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.'
16-
]
20+
inputs=> array[
21+
'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018,
22+
in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government
23+
of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880,
24+
or about 18 percent of the population of France as of 2017.'
25+
]
1726
);
1827
```
1928

20-
_Result_
29+
{% endtab %}
30+
{% tab title="Result" %}
2131

2232
```json
2333
[
24-
{
25-
"summary_text":"Paris is the capital and most populous cityofFrance, with an estimated population of 2,175,601 residents as of 2018 . The cityis the centre and seat of government of the region and province ofÎle-de-France, or Paris Region . Paris Regionhas an estimated 18 percent of the population of France as of 2017."
26-
}
34+
{
35+
"summary_text":"The CityofParisis the centre and seat of government of the region and province ofle-de-France, or Paris Region, whichhas an estimatedpopulation of 12,174,880, or about18 percent of the population of France as of 2017."
36+
}
2737
]
2838
```
2939

30-
You can control the length of summary\_text by passing`min_length` and`max_length` as arguments to the SQL query.
40+
{% endtab %}
41+
{% endtabs %}
3142

32-
```sql
33-
SELECTpgml.transform(
34-
task=>'{"task": "summarization",
35-
"model": "sshleifer/distilbart-cnn-12-6"
36-
}'::JSONB,
37-
inputs=> array[
38-
'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.'
39-
],
40-
args=>'{
41-
"min_length" : 20,
42-
"max_length" : 70
43-
}'::JSONB
44-
);
45-
```
43+
###Additional resources
4644

47-
```json
48-
[
49-
{
50-
"summary_text":" Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018 . City of Paris is centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated 12,174,880, or about 18 percent"
51-
}
52-
]
53-
```
45+
-[Hugging Face documentation](https://huggingface.co/tasks/summarization)
46+
-[google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum)

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp