Commit dd233ac

Update README.md

1 parent ac802c7

8 files changed: +258 -194 lines changed

README.md

Lines changed: 17 additions & 0 deletions

@@ -108,6 +108,23 @@ In zero/few-shot settings the prompt becomes very important, but
 The table shows that a fairly large learning rate is effective for LoRA training, but that the upper limit is around `1e-3`: with a very large learning rate such as `1e-2`, training breaks down.
 Results on a wider range of models would be nice to have, but when doing classification with LLM+LoRA, trying a learning rate of around `5e-4` first seems like a safe choice.
 
+Next, let's look at how performance differs by batch size when the model is fixed to `rinna/japanese-gpt-neox-3.6b`, `template_type` to `2`, and LoRA's `r` to `32`.
+
+| batch size| LR| Val. F1| Accuracy| Precision| Recall| F1|
+| ---------:| :---:| :-----:| :------:| :-------:| :----:| :---:|
+| 2| 5e-4| 97.12| 98.10| 98.02| 97.48| 97.70|
+| 16| 1e-3| 97.12| 97.83| 97.77| 97.37| 97.52|
+| 32| 1e-3| 96.92| 97.69| 97.51| 97.27| 97.36|
+| 64| 5e-4| 96.57| 97.55| 97.39| 97.35| 97.35|
+| 4| 5e-4| 97.08| 97.42| 97.37| 97.01| 97.15|
+| 8| 3e-4| 97.20| 97.28| 96.99| 96.87| 96.91|
+
+This table is sorted by F1 in descending order.
+The results suggest that batch size may make some difference in performance, but since each setting was run only once with a single random seed, it is hard to draw a clear conclusion.
+In general, smaller batch sizes take longer to train and tend to make performance less stable, so for now a batch size of around 16 or 32 is probably a good choice.
+
+
+
 Finally, let's look at performance for each LoRA r when the model is fixed to `rinna/japanese-gpt-neox-3.6b` and `template_type` to `2`.
 
 | LoRA r| LR| Val. F1| Accuracy| Precision| Recall| F1|

results/all.csv

Lines changed: 161 additions & 141 deletions
Large diffs are not rendered by default.

results/best.csv

Lines changed: 8 additions & 8 deletions

@@ -1,8 +1,8 @@
-model_name,lr,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
-rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,32,2,6.0,0.972675653985774,0.31981012095575745,0.9796195652173914,0.9777368541759694,0.9776445774827304,0.9775271926753373
-rinna/japanese-gpt-neox-3.6b,0.001,32,1,7.0,0.9714318645388512,0.2791306454202403,0.9755434782608695,0.9724152726276595,0.9738750261536485,0.9730127556638624
-rinna/japanese-gpt-neox-3.6b-instruction-sft,0.001,32,1,8.0,0.9744637209605678,0.24878821165665335,0.9755434782608695,0.9732493908790878,0.9726913122530654,0.972730121283728
-rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,1,8.0,0.9749415170556315,0.01194865029791127,0.9755434782608695,0.9703188348020095,0.973691510312087,0.9717878006990937
-cyberagent/open-calm-7b,0.0005,32,1,9.0,0.9721767440618394,0.4630058122717816,0.970108695652174,0.9675934574125561,0.9641708318193566,0.9655370120538438
-cyberagent/open-calm-3b,0.0003,32,2,6.0,0.978706517497902,0.3900306121162746,0.96875,0.9637587728670183,0.9651366691204668,0.9642023118285358
-cyberagent/open-calm-1b,0.0003,32,1,9.0,0.9354804772511817,0.4417938978775688,0.9442934782608695,0.9424439019277594,0.9380075831803892,0.939817312595081
+model_name,lr,batch_size,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
+rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,32,32,2,6.0,0.972675653985774,0.31981012095575745,0.9796195652173914,0.9777368541759694,0.9776445774827304,0.9775271926753373
+rinna/japanese-gpt-neox-3.6b,0.001,32,32,1,7.0,0.9714318645388512,0.2791306454202403,0.9755434782608695,0.9724152726276595,0.9738750261536485,0.9730127556638624
+rinna/japanese-gpt-neox-3.6b-instruction-sft,0.001,32,32,1,8.0,0.9744637209605678,0.24878821165665335,0.9755434782608695,0.9732493908790878,0.9726913122530654,0.972730121283728
+rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,32,1,8.0,0.9749415170556315,0.01194865029791127,0.9755434782608695,0.9703188348020095,0.973691510312087,0.9717878006990937
+cyberagent/open-calm-7b,0.0005,32,32,1,9.0,0.9721767440618394,0.4630058122717816,0.970108695652174,0.9675934574125561,0.9641708318193566,0.9655370120538438
+cyberagent/open-calm-3b,0.0003,32,32,2,6.0,0.978706517497902,0.3900306121162746,0.96875,0.9637587728670183,0.9651366691204668,0.9642023118285358
+cyberagent/open-calm-1b,0.0003,32,32,1,9.0,0.9354804772511817,0.4417938978775688,0.9442934782608695,0.9424439019277594,0.9380075831803892,0.939817312595081
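In best.csv the only change is a new `batch_size` column, inserted between `lr` and `lora_r` and set to 32 for every existing run; lr.csv and template.csv receive the same change. If other result files in the old layout need the same update, a minimal sketch with Python's standard `csv` module could look like this. The helper name `add_batch_size_column` is hypothetical, and the sketch assumes every old run used batch size 32, as these diffs record.

```python
import csv
import io


def add_batch_size_column(old_csv: str, batch_size: int = 32) -> str:
    """Hypothetical helper: insert a batch_size column right after lr.

    Assumes every run in the old file used the same batch size (32 here).
    """
    reader = csv.reader(io.StringIO(old_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    pos = header.index("lr") + 1  # new column goes immediately after lr
    writer.writerow(header[:pos] + ["batch_size"] + header[pos:])
    for row in reader:
        # empty fields (e.g. the missing epoch/loss of diverged runs) are kept as-is
        writer.writerow(row[:pos] + [str(batch_size)] + row[pos:])
    return out.getvalue()


old = (
    "model_name,lr,lora_r,template_type,best-val-epoch\n"
    "rinna/japanese-gpt-neox-3.6b,0.05,32,2,\n"
)
print(add_batch_size_column(old), end="")
# model_name,lr,batch_size,lora_r,template_type,best-val-epoch
# rinna/japanese-gpt-neox-3.6b,0.05,32,32,2,
```

Working on raw string fields rather than parsed floats means every value is written back byte-for-byte, which is what the diff shows.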

results/lr.csv

Lines changed: 13 additions & 13 deletions

@@ -1,13 +1,13 @@
-model_name,lr,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
-rinna/japanese-gpt-neox-3.6b,0.05,32,2,,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
-rinna/japanese-gpt-neox-3.6b,0.03,32,2,,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
-rinna/japanese-gpt-neox-3.6b,0.01,32,2,0.0,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
-rinna/japanese-gpt-neox-3.6b,0.005,32,2,9.0,0.24782949123856568,0.07766325577445653,0.3220108695652174,0.362972442018135,0.30273233652691306,0.2821083167234807
-rinna/japanese-gpt-neox-3.6b,0.003,32,2,0.0,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
-rinna/japanese-gpt-neox-3.6b,0.001,32,2,8.0,0.9692023837818683,0.32656831326692,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
-rinna/japanese-gpt-neox-3.6b,0.0005,32,2,7.0,0.9676627778762603,0.1648371115974758,0.9823369565217391,0.9801636609319908,0.9787140777661717,0.9792501059515425
-rinna/japanese-gpt-neox-3.6b,0.0003,32,2,9.0,0.9673515686703067,0.22281936977220618,0.96875,0.9646122014002015,0.96205760973849,0.9630130846936239
-rinna/japanese-gpt-neox-3.6b,0.0001,32,2,6.0,0.9479486156840069,0.27199859204499616,0.970108695652174,0.9685454813262968,0.9671523401339012,0.9676011282967338
-rinna/japanese-gpt-neox-3.6b,5e-05,32,2,7.0,0.9428255855629699,0.007521537335022636,0.9592391304347826,0.9572636842620663,0.9549589480148636,0.9558110742366291
-rinna/japanese-gpt-neox-3.6b,3e-05,32,2,8.0,0.9373914494534366,0.3120411997256072,0.9402173913043478,0.9350482310055871,0.9360980338305569,0.9354864525782044
-rinna/japanese-gpt-neox-3.6b,1e-05,32,2,8.0,0.7894091702723591,0.6105850883152174,0.8125,0.8021489508329067,0.7942614254968875,0.7962131510664897
+model_name,lr,batch_size,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
+rinna/japanese-gpt-neox-3.6b,0.05,32,32,2,,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
+rinna/japanese-gpt-neox-3.6b,0.03,32,32,2,,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
+rinna/japanese-gpt-neox-3.6b,0.01,32,32,2,0.0,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
+rinna/japanese-gpt-neox-3.6b,0.005,32,32,2,9.0,0.24782949123856568,0.07766325577445653,0.3220108695652174,0.362972442018135,0.30273233652691306,0.2821083167234807
+rinna/japanese-gpt-neox-3.6b,0.003,32,32,2,0.0,0.02178649237472767,,0.12907608695652173,0.014341787439613526,0.1111111111111111,0.025404465837678834
+rinna/japanese-gpt-neox-3.6b,0.001,32,32,2,8.0,0.9692023837818683,0.32656831326692,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
+rinna/japanese-gpt-neox-3.6b,0.0005,32,32,2,7.0,0.9676627778762603,0.1648371115974758,0.9823369565217391,0.9801636609319908,0.9787140777661717,0.9792501059515425
+rinna/japanese-gpt-neox-3.6b,0.0003,32,32,2,9.0,0.9673515686703067,0.22281936977220618,0.96875,0.9646122014002015,0.96205760973849,0.9630130846936239
+rinna/japanese-gpt-neox-3.6b,0.0001,32,32,2,6.0,0.9479486156840069,0.27199859204499616,0.970108695652174,0.9685454813262968,0.9671523401339012,0.9676011282967338
+rinna/japanese-gpt-neox-3.6b,5e-05,32,32,2,7.0,0.9428255855629699,0.007521537335022636,0.9592391304347826,0.9572636842620663,0.9549589480148636,0.9558110742366291
+rinna/japanese-gpt-neox-3.6b,3e-05,32,32,2,8.0,0.9373914494534366,0.3120411997256072,0.9402173913043478,0.9350482310055871,0.9360980338305569,0.9354864525782044
+rinna/japanese-gpt-neox-3.6b,1e-05,32,32,2,8.0,0.7894091702723591,0.6105850883152174,0.8125,0.8021489508329067,0.7942614254968875,0.7962131510664897
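The runs at `lr` 0.05, 0.03, 0.01 and 0.003 all report identical test metrics (accuracy 0.1291, precision 0.01434, recall 0.1111, F1 0.02540). A recall of exactly 1/9 is what macro-averaged metrics give when a model assigns every example to a single class out of nine, so these runs appear to have collapsed to a constant prediction. The check below is a sketch under those assumptions (macro averaging, nine classes, never-predicted classes scored as 0, and the predicted class making up the same 12.9% of the test set as the reported accuracy); the function name is illustrative.

```python
def constant_predictor_macro_metrics(p: float, num_classes: int) -> tuple[float, float, float, float]:
    """Macro-averaged metrics when every example is assigned the same class.

    p: fraction of examples whose true label is the predicted class.
    Classes that are never predicted contribute precision = recall = F1 = 0.
    """
    accuracy = p  # only examples of the predicted class are correct
    precision_c = p  # all predictions go to one class; a fraction p of them are right
    recall_c = 1.0  # every example of that class is caught
    f1_c = 2 * precision_c * recall_c / (precision_c + recall_c)
    # the other num_classes - 1 classes add zeros to each macro average
    return (
        accuracy,
        precision_c / num_classes,
        recall_c / num_classes,
        f1_c / num_classes,
    )


acc, prec, rec, f1 = constant_predictor_macro_metrics(0.12907608695652173, 9)
print(f"{acc:.4f} {prec:.5f} {rec:.4f} {f1:.5f}")  # 0.1291 0.01434 0.1111 0.02540
```

All four values reproduce the diverged rows, which supports reading the README's "training breaks down" as the model falling back to one label rather than producing noisy predictions.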

results/r.csv

Lines changed: 7 additions & 6 deletions

@@ -1,6 +1,7 @@
-model_name,lr,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
-rinna/japanese-gpt-neox-3.6b,0.0005,8,2,9.0,0.9745265022260914,0.0001252619144709214,0.9714673913043478,0.9696909476867,0.9675000833840384,0.9683212865779591
-rinna/japanese-gpt-neox-3.6b,0.001,64,2,8.0,0.9722178251982494,0.19521456179411514,0.9728260869565217,0.969571761239398,0.9684849090012144,0.9688862165533373
-rinna/japanese-gpt-neox-3.6b,0.001,16,2,6.0,0.9720044802852491,0.2179690236630647,0.9769021739130435,0.9758731082778743,0.9726714879177705,0.9737736732633085
-rinna/japanese-gpt-neox-3.6b,0.0003,4,2,3.0,0.9712207562574271,0.0030310957328132963,0.9769021739130435,0.9763959983056318,0.9723598222135378,0.9739662595223572
-rinna/japanese-gpt-neox-3.6b,0.001,32,2,8.0,0.9692023837818683,0.32656831326692,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
+model_name,lr,batch_size,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
+rinna/japanese-gpt-neox-3.6b,0.0003,8,32,2,6.0,0.9719836606292459,3.2393828682277513e-10,0.9728260869565217,0.969915204379785,0.9686869222039347,0.969090597749366
+rinna/japanese-gpt-neox-3.6b,0.001,16,32,2,6.0,0.9712203216176003,0.001815833313309628,0.9782608695652174,0.977742792333945,0.9736805465119684,0.9751880027713722
+rinna/japanese-gpt-neox-3.6b,0.0005,2,32,2,9.0,0.9711932113227147,0.0,0.9809782608695652,0.9801549703121704,0.9748360138979351,0.9770025998800451
+rinna/japanese-gpt-neox-3.6b,0.0005,4,32,2,8.0,0.970769531790148,1.6196914341138756e-10,0.9741847826086957,0.9737083250713154,0.9701445437950714,0.9714534339623773
+rinna/japanese-gpt-neox-3.6b,0.001,32,32,2,8.0,0.9692023837818683,0.32656831326692,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
+rinna/japanese-gpt-neox-3.6b,0.0005,64,32,2,5.0,0.9656896363870922,0.009133089495741802,0.9755434782608695,0.9739427009003551,0.9735245594737807,0.9735495118410125
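Although the file is still named r.csv, after this commit its rows fix `lora_r` at 32 and vary `batch_size`; they are the numbers behind the batch-size table added to README.md. A sketch that rebuilds that table with the standard library, sorting by test F1 in descending order and printing each metric as a percentage with two decimals. `batch_size_table` and `short_lr` are illustrative names, and the embedded data keeps only the columns the table uses.

```python
import csv
import io

# New rows of results/r.csv, trimmed to the columns the README table uses.
R_CSV = """lr,batch_size,best-val-f1,accuracy,precision,recall,f1
0.0003,8,0.9719836606292459,0.9728260869565217,0.969915204379785,0.9686869222039347,0.969090597749366
0.001,16,0.9712203216176003,0.9782608695652174,0.977742792333945,0.9736805465119684,0.9751880027713722
0.0005,2,0.9711932113227147,0.9809782608695652,0.9801549703121704,0.9748360138979351,0.9770025998800451
0.0005,4,0.970769531790148,0.9741847826086957,0.9737083250713154,0.9701445437950714,0.9714534339623773
0.001,32,0.9692023837818683,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
0.0005,64,0.9656896363870922,0.9755434782608695,0.9739427009003551,0.9735245594737807,0.9735495118410125
"""

METRICS = ["best-val-f1", "accuracy", "precision", "recall", "f1"]


def short_lr(lr: str) -> str:
    # 0.0005 -> "5e-04" -> "5e-4", matching the README's notation
    return f"{float(lr):.0e}".replace("e-0", "e-")


def batch_size_table(text: str) -> list[str]:
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda r: float(r["f1"]), reverse=True)  # test F1, descending
    lines = [
        "| batch size| LR| Val. F1| Accuracy| Precision| Recall| F1|",
        "| ---------:| :---:| :-----:| :------:| :-------:| :----:| :---:|",
    ]
    for r in rows:
        cells = [r["batch_size"], short_lr(r["lr"])]
        cells += [f"{float(r[m]) * 100:.2f}" for m in METRICS]  # fraction -> percent
        lines.append("| " + "| ".join(cells) + "|")
    return lines


print("\n".join(batch_size_table(R_CSV)))
```

The output matches the README rows line for line, e.g. `| 2| 5e-4| 97.12| 98.10| 98.02| 97.48| 97.70|` first and `| 8| 3e-4| 97.20| 97.28| 96.99| 96.87| 96.91|` last.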

results/template.csv

Lines changed: 22 additions & 22 deletions

@@ -1,22 +1,22 @@
-model_name,lr,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
-rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,32,2,6.0,0.972675653985774,0.31981012095575745,0.9796195652173914,0.9777368541759694,0.9776445774827304,0.9775271926753373
-rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,32,1,6.0,0.9718251020165288,0.4276221731434698,0.9741847826086957,0.9729574156574005,0.9705056718297419,0.9713745384480791
-rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.001,32,0,7.0,0.9704718337442715,0.3859038145645805,0.9714673913043478,0.9711221699533995,0.9659990027012361,0.9679901382825964
-rinna/japanese-gpt-neox-3.6b-instruction-sft,0.001,32,1,8.0,0.9744637209605678,0.24878821165665335,0.9755434782608695,0.9732493908790878,0.9726913122530654,0.972730121283728
-rinna/japanese-gpt-neox-3.6b-instruction-sft,0.0005,32,2,4.0,0.9720591817102174,0.2664394171341606,0.9823369565217391,0.9817167424286074,0.9802904551108901,0.9807491518924935
-rinna/japanese-gpt-neox-3.6b-instruction-sft,0.0003,32,0,8.0,0.9719762175294555,0.2313618867293648,0.9769021739130435,0.9743773454326975,0.9737555319272919,0.9736986980100907
-rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,1,8.0,0.9749415170556315,0.01194865029791127,0.9755434782608695,0.9703188348020095,0.973691510312087,0.9717878006990937
-rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,0,4.0,0.9725054353288302,0.3162763429724652,0.9769021739130435,0.9734543138430379,0.9734725233130652,0.973285187874197
-rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,2,4.0,0.9692350021308259,0.009632307550181513,0.9741847826086957,0.9696371437649349,0.9709240102866123,0.9702058769346517
-rinna/japanese-gpt-neox-3.6b,0.001,32,1,7.0,0.9714318645388512,0.2791306454202403,0.9755434782608695,0.9724152726276595,0.9738750261536485,0.9730127556638624
-rinna/japanese-gpt-neox-3.6b,0.001,32,2,8.0,0.9692023837818683,0.32656831326692,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
-rinna/japanese-gpt-neox-3.6b,0.001,32,0,5.0,0.9661477517122675,0.46608543395996094,0.9714673913043478,0.9668219206511591,0.9672679680937202,0.9669080183716638
-cyberagent/open-calm-7b,0.0005,32,1,9.0,0.9721767440618394,0.4630058122717816,0.970108695652174,0.9675934574125561,0.9641708318193566,0.9655370120538438
-cyberagent/open-calm-7b,0.0003,32,0,7.0,0.9706802625048326,0.6126027729200281,0.970108695652174,0.9651957120265597,0.9665182555769127,0.9655961069048524
-cyberagent/open-calm-7b,0.001,32,2,5.0,0.9687790086726576,0.5211613281913425,0.9728260869565217,0.9686593614999146,0.9685751576362038,0.9684875154665361
-cyberagent/open-calm-3b,0.0003,32,2,6.0,0.978706517497902,0.3900306121162746,0.96875,0.9637587728670183,0.9651366691204668,0.9642023118285358
-cyberagent/open-calm-3b,0.0003,32,0,8.0,0.9757983071002586,0.6012747391410496,0.9646739130434783,0.9591225228274664,0.9597275753276926,0.9591430422917269
-cyberagent/open-calm-3b,0.0003,32,1,6.0,0.973837507510648,0.5647119024525518,0.96875,0.9642563514358058,0.9632628764380868,0.9634765863889336
-cyberagent/open-calm-1b,0.0003,32,1,9.0,0.9354804772511817,0.4417938978775688,0.9442934782608695,0.9424439019277594,0.9380075831803892,0.939817312595081
-cyberagent/open-calm-1b,0.0005,32,0,8.0,0.9288325624236864,0.46775249812913977,0.9307065217391305,0.9291389343144885,0.9239648209576914,0.9256573229890049
-cyberagent/open-calm-1b,0.0005,32,2,9.0,0.9079371848176309,0.31601302520088526,0.904891304347826,0.8967762431036896,0.8983576757509568,0.8971041201211061
+model_name,lr,batch_size,lora_r,template_type,best-val-epoch,best-val-f1,loss,accuracy,precision,recall,f1
+rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,32,32,2,6.0,0.972675653985774,0.31981012095575745,0.9796195652173914,0.9777368541759694,0.9776445774827304,0.9775271926753373
+rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,32,32,1,6.0,0.9718251020165288,0.4276221731434698,0.9741847826086957,0.9729574156574005,0.9705056718297419,0.9713745384480791
+rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.001,32,32,0,7.0,0.9704718337442715,0.3859038145645805,0.9714673913043478,0.9711221699533995,0.9659990027012361,0.9679901382825964
+rinna/japanese-gpt-neox-3.6b-instruction-sft,0.001,32,32,1,8.0,0.9744637209605678,0.24878821165665335,0.9755434782608695,0.9732493908790878,0.9726913122530654,0.972730121283728
+rinna/japanese-gpt-neox-3.6b-instruction-sft,0.0005,32,32,2,4.0,0.9720591817102174,0.2664394171341606,0.9823369565217391,0.9817167424286074,0.9802904551108901,0.9807491518924935
+rinna/japanese-gpt-neox-3.6b-instruction-sft,0.0003,32,32,0,8.0,0.9719762175294555,0.2313618867293648,0.9769021739130435,0.9743773454326975,0.9737555319272919,0.9736986980100907
+rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,32,1,8.0,0.9749415170556315,0.01194865029791127,0.9755434782608695,0.9703188348020095,0.973691510312087,0.9717878006990937
+rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,32,0,4.0,0.9725054353288302,0.3162763429724652,0.9769021739130435,0.9734543138430379,0.9734725233130652,0.973285187874197
+rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,32,32,2,4.0,0.9692350021308259,0.009632307550181513,0.9741847826086957,0.9696371437649349,0.9709240102866123,0.9702058769346517
+rinna/japanese-gpt-neox-3.6b,0.001,32,32,1,7.0,0.9714318645388512,0.2791306454202403,0.9755434782608695,0.9724152726276595,0.9738750261536485,0.9730127556638624
+rinna/japanese-gpt-neox-3.6b,0.001,32,32,2,8.0,0.9692023837818683,0.32656831326692,0.9769021739130435,0.9751131471972203,0.9726814705282213,0.9735649694748739
+rinna/japanese-gpt-neox-3.6b,0.001,32,32,0,5.0,0.9661477517122675,0.46608543395996094,0.9714673913043478,0.9668219206511591,0.9672679680937202,0.9669080183716638
+cyberagent/open-calm-7b,0.0005,32,32,1,9.0,0.9721767440618394,0.4630058122717816,0.970108695652174,0.9675934574125561,0.9641708318193566,0.9655370120538438
+cyberagent/open-calm-7b,0.0003,32,32,0,7.0,0.9706802625048326,0.6126027729200281,0.970108695652174,0.9651957120265597,0.9665182555769127,0.9655961069048524
+cyberagent/open-calm-7b,0.001,32,32,2,5.0,0.9687790086726576,0.5211613281913425,0.9728260869565217,0.9686593614999146,0.9685751576362038,0.9684875154665361
+cyberagent/open-calm-3b,0.0003,32,32,2,6.0,0.978706517497902,0.3900306121162746,0.96875,0.9637587728670183,0.9651366691204668,0.9642023118285358
+cyberagent/open-calm-3b,0.0003,32,32,0,8.0,0.9757983071002586,0.6012747391410496,0.9646739130434783,0.9591225228274664,0.9597275753276926,0.9591430422917269
+cyberagent/open-calm-3b,0.0003,32,32,1,6.0,0.973837507510648,0.5647119024525518,0.96875,0.9642563514358058,0.9632628764380868,0.9634765863889336
+cyberagent/open-calm-1b,0.0003,32,32,1,9.0,0.9354804772511817,0.4417938978775688,0.9442934782608695,0.9424439019277594,0.9380075831803892,0.939817312595081
+cyberagent/open-calm-1b,0.0005,32,32,0,8.0,0.9288325624236864,0.46775249812913977,0.9307065217391305,0.9291389343144885,0.9239648209576914,0.9256573229890049
+cyberagent/open-calm-1b,0.0005,32,32,2,9.0,0.9079371848176309,0.31601302520088526,0.904891304347826,0.8967762431036896,0.8983576757509568,0.8971041201211061
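Judging from the data, best.csv appears to hold, for each model, the template.csv row with the highest `best-val-f1`, sorted by test `f1` in descending order. The script that produced it is not part of this commit, so the following is only a sketch of that selection rule using the standard `csv` module; `best_per_model` is an illustrative name, and the embedded data keeps a subset of template.csv's columns.

```python
import csv
import io

# model_name, lr, template_type, best-val-f1 and f1 columns of results/template.csv
TEMPLATE_CSV = """model_name,lr,template_type,best-val-f1,f1
rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,2,0.972675653985774,0.9775271926753373
rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.0005,1,0.9718251020165288,0.9713745384480791
rinna/japanese-gpt-neox-3.6b-instruction-sft-v2,0.001,0,0.9704718337442715,0.9679901382825964
rinna/japanese-gpt-neox-3.6b-instruction-sft,0.001,1,0.9744637209605678,0.972730121283728
rinna/japanese-gpt-neox-3.6b-instruction-sft,0.0005,2,0.9720591817102174,0.9807491518924935
rinna/japanese-gpt-neox-3.6b-instruction-sft,0.0003,0,0.9719762175294555,0.9736986980100907
rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,1,0.9749415170556315,0.9717878006990937
rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,0,0.9725054353288302,0.973285187874197
rinna/japanese-gpt-neox-3.6b-instruction-ppo,0.0003,2,0.9692350021308259,0.9702058769346517
rinna/japanese-gpt-neox-3.6b,0.001,1,0.9714318645388512,0.9730127556638624
rinna/japanese-gpt-neox-3.6b,0.001,2,0.9692023837818683,0.9735649694748739
rinna/japanese-gpt-neox-3.6b,0.001,0,0.9661477517122675,0.9669080183716638
cyberagent/open-calm-7b,0.0005,1,0.9721767440618394,0.9655370120538438
cyberagent/open-calm-7b,0.0003,0,0.9706802625048326,0.9655961069048524
cyberagent/open-calm-7b,0.001,2,0.9687790086726576,0.9684875154665361
cyberagent/open-calm-3b,0.0003,2,0.978706517497902,0.9642023118285358
cyberagent/open-calm-3b,0.0003,0,0.9757983071002586,0.9591430422917269
cyberagent/open-calm-3b,0.0003,1,0.973837507510648,0.9634765863889336
cyberagent/open-calm-1b,0.0003,1,0.9354804772511817,0.939817312595081
cyberagent/open-calm-1b,0.0005,0,0.9288325624236864,0.9256573229890049
cyberagent/open-calm-1b,0.0005,2,0.9079371848176309,0.8971041201211061
"""


def best_per_model(text: str) -> list[dict[str, str]]:
    """For each model, keep the row with the highest validation F1."""
    best: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row["model_name"]
        # selection uses validation F1, never the test-set f1 column
        if name not in best or float(row["best-val-f1"]) > float(best[name]["best-val-f1"]):
            best[name] = row
    # order the selected configurations by test F1, highest first
    return sorted(best.values(), key=lambda r: float(r["f1"]), reverse=True)


for row in best_per_model(TEMPLATE_CSV):
    print(row["model_name"], row["lr"], row["template_type"], row["f1"])
```

Selecting on validation F1 picks template 1 for `rinna/japanese-gpt-neox-3.6b-instruction-sft` (test F1 0.9727) even though its template-2 run scored 0.9807 on the test set; choosing by test F1 instead would leak test information into model selection.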
