Commit 9284cf1

Added a tutorial for 9 classes - draft 1
1 parent 7cbee43, commit 9284cf1

1 file changed: +75 -4 lines changed

‎README.md

Lines changed: 75 additions & 4 deletions
@@ -47,7 +47,8 @@
4747
- [Fill-Mask](#fill-mask)
4848
- [Vector Database](#vector-database)
4949
- [LLM Fine-tuning](#llm-fine-tuning)
50-
- [Text Classification](#llm-fine-tuning-text-classification)
50+
- [Text Classification - 2 classes](#text-classification-2-classes)
51+
- [Text Classification - 9 classes](#text-classification-9-classes)
5152
<!-- - [Regression](#regression)
5253
- [Classification](#classification)-->
5354

@@ -878,7 +879,7 @@ In this section, we will provide a step-by-step walkthrough for fine-tuning a La
878879

879880
2. Obtain a Hugging Face API token to push the fine-tuned model to the Hugging Face Model Hub. Follow the instructions on the [Hugging Face website](https://huggingface.co/settings/tokens) to get your API token.
880881

881-
## LLM Fine-tuning Text Classification
882+
## Text Classification 2 Classes
882883

883884
### 1. Loading the Dataset
884885

@@ -1245,7 +1246,77 @@ SELECT pgml.tune(
12451246

12461247
By following these steps, you can effectively restart training from a previously trained model, allowing for further refinement and adaptation of the model based on new requirements or insights. Adjust parameters as needed for your specific use case and dataset.
12471248

1248-
## Conclusion
1249+
## Text Classification 9 Classes
12491250

1250-
By following these steps, you can leverage PostgresML to seamlessly integrate fine-tuning of Language Models for text classification directly within your PostgreSQL database. Adjust the dataset, model, and hyperparameters to suit your specific requirements.
1251+
### 1. Load and Shuffle the Dataset
1252+
In this section, we begin by loading the FinGPT sentiment analysis dataset using the `pgml.load_dataset` function. The dataset is then exposed through a shuffled view (`pgml.fingpt_sentiment_shuffled_view`) that returns the records in random order. Shuffling helps prevent bias introduced by the original ordering of the data.
12511253

1254+
```sql
1255+
-- Load the dataset
1256+
SELECT pgml.load_dataset('FinGPT/fingpt-sentiment-train');
1257+
1258+
-- Create a shuffled view
1259+
CREATE VIEW pgml.fingpt_sentiment_shuffled_view AS
1260+
SELECT * FROM pgml."FinGPT/fingpt-sentiment-train" ORDER BY RANDOM();
1261+
```
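
One caveat worth keeping in mind: a plain PostgreSQL view is re-evaluated every time it is queried, so `ORDER BY RANDOM()` produces a different order on each query. If a stable shuffle is needed, one option is to materialize it once. The following is a minimal sketch under that assumption. The name `pgml.fingpt_sentiment_shuffled_mat` and the `shuffle_id` column are hypothetical and not part of the tutorial.

```sql
-- Sketch only: fingpt_sentiment_shuffled_mat and shuffle_id are hypothetical names.
-- Materialize the shuffle once and record each row's position in it.
CREATE MATERIALIZED VIEW pgml.fingpt_sentiment_shuffled_mat AS
SELECT
    row_number() OVER (ORDER BY RANDOM()) AS shuffle_id,
    t.*
FROM pgml."FinGPT/fingpt-sentiment-train" AS t;

-- Ordering by shuffle_id returns the same shuffled order on every query.
SELECT *
FROM pgml.fingpt_sentiment_shuffled_mat
ORDER BY shuffle_id
LIMIT 5;
```

The extra `shuffle_id` column is only there to make the order reproducible. Whether it should be excluded before training depends on how the relation is consumed.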
1262+
1263+
### 2. Explore Class Distribution
1264+
Once the dataset is loaded and shuffled, we examine the distribution of sentiment classes. Querying the shuffled view gives the number of instances for each sentiment class, which reveals any class imbalance in the dataset.
1265+
1266+
```sql
1267+
-- Explore class distribution
1268+
SELECT
1269+
    output,
1270+
    COUNT(*) AS class_count
1271+
FROM pgml.fingpt_sentiment_shuffled_view
1272+
GROUP BY output
1273+
ORDER BY output;
1274+
1275+
```
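
Raw counts can be hard to compare at a glance, so it may help to express each class as a share of the whole dataset. The query below is a small sketch of that idea using a window function over the grouped counts. The `class_pct` alias is an illustrative name.

```sql
-- Sketch: show each class as a percentage of all rows (class_pct is an illustrative alias).
SELECT
    output,
    COUNT(*) AS class_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS class_pct
FROM pgml.fingpt_sentiment_shuffled_view
GROUP BY output
ORDER BY class_count DESC;
```

Sorting by `class_count` puts the dominant classes first, which makes any imbalance easier to spot.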
1276+
1277+
### 3. Create Training and Test Views
1278+
To facilitate training, we create separate views for training and testing. The training view (`pgml.fingpt_sentiment_train_view`) contains 80% of the shuffled dataset, and the test view (`pgml.fingpt_sentiment_test_view`) contains the remaining 20%, which is held out to evaluate the model's performance.
1279+
1280+
```sql
1281+
-- Create a view for training data (e.g., 80% of the shuffled records)
1282+
CREATE VIEW pgml.fingpt_sentiment_train_view AS
1283+
SELECT *
1284+
FROM pgml.fingpt_sentiment_shuffled_view
1285+
LIMIT (SELECT COUNT(*) * 0.8 FROM pgml.fingpt_sentiment_shuffled_view);
1286+
1287+
-- Create a view for test data (remaining 20% of the shuffled records)
1288+
CREATE VIEW pgml.fingpt_sentiment_test_view AS
1289+
SELECT *
1290+
FROM pgml.fingpt_sentiment_shuffled_view
1291+
OFFSET (SELECT COUNT(*) * 0.8 FROM pgml.fingpt_sentiment_shuffled_view);
1292+
1293+
```
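
After creating the two views, a quick sanity check of their sizes can confirm the split is roughly 80/20. The sketch below only compares row counts. Because the shuffled view is re-evaluated with a fresh random order on every query, matching counts do not by themselves show that the two views contain disjoint rows.

```sql
-- Sketch: compare the sizes of the train and test views with the full dataset.
SELECT
    (SELECT COUNT(*) FROM pgml.fingpt_sentiment_train_view)    AS train_rows,
    (SELECT COUNT(*) FROM pgml.fingpt_sentiment_test_view)     AS test_rows,
    (SELECT COUNT(*) FROM pgml.fingpt_sentiment_shuffled_view) AS total_rows;
```

If the split behaves as intended, `train_rows` plus `test_rows` should equal `total_rows`.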
1294+
1295+
### 4. Fine-Tune the Model for 9 Classes
1296+
In this final step, we start fine-tuning with the `pgml.tune` function. The model is configured internally for sentiment analysis with 9 classes. Training runs on 80% of the train view, and the model is evaluated on the remaining 20% of the train view (`test_size => 0.2`, taken from the end of the view via `test_sampling => 'last'`). The test view is reserved for evaluating the model's accuracy after training is completed. Note that `push_to_hub: true` pushes the model to the Hugging Face Hub, and `hub_private_repo: true` makes the target repository private.
1297+
1298+
```sql
1299+
-- Fine-tune the model for 9 classes (replace YOUR_HUB_TOKEN with your Hugging Face API token)
1300+
SELECT pgml.tune(
1301+
    'fingpt_sentiement',
1302+
    task => 'text-classification',
1303+
    relation_name => 'pgml.fingpt_sentiment_train_view',
1304+
    model_name => 'distilbert-base-uncased',
1305+
    test_size => 0.2,
1306+
    test_sampling => 'last',
1307+
    hyperparams => '{
1308+
        "training_args": {
1309+
            "learning_rate": 2e-5,
1310+
            "per_device_train_batch_size": 16,
1311+
            "per_device_eval_batch_size": 16,
1312+
            "num_train_epochs": 5,
1313+
            "weight_decay": 0.01,
1314+
            "hub_token": "YOUR_HUB_TOKEN",
1315+
            "push_to_hub": true,
1316+
            "hub_private_repo": true
1317+
        },
1318+
        "dataset_args": { "text_column": "input", "class_column": "output" }
1319+
    }'
1320+
);
1321+
1322+
```
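
Once training has finished and the model has been pushed, it could be tried out with `pgml.transform`. The call below is a hedged sketch, not part of this commit. The repository name `YOUR_HF_USERNAME/fingpt_sentiement` is a placeholder assumption, the input sentence is made up for illustration, and a private repository may need additional authentication that is not shown here.

```sql
-- Sketch only: the model name is a placeholder and the input text is illustrative.
SELECT pgml.transform(
    task   => '{
        "task": "text-classification",
        "model": "YOUR_HF_USERNAME/fingpt_sentiement"
    }'::JSONB,
    inputs => ARRAY[
        'The company reported quarterly earnings well above analyst expectations.'
    ]
);
```

The same call shape could be run over rows from `pgml.fingpt_sentiment_test_view` to compare predictions with the `output` column.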
