# JGLUE-evaluation-scripts

Training and evaluation scripts for JGLUE, a Japanese language understanding benchmark
- Python: 3.9+
- Dependencies: see `pyproject.toml`.
Create a virtual environment and install dependencies:

```shell
$ uv venv -p /path/to/python
$ uv sync
```
Log in to wandb:

```shell
$ wandb login
```
You can train and test a model with the following command:

```shell
# For training and evaluating MARC-ja
uv run python src/train.py -cn marc_ja devices=[0,1] max_batches_per_device=16
```
Here are commonly used options:

- `-cn`: Task name. Choose from `marc_ja`, `jcola`, `jsts`, `jnli`, `jsquad`, and `jcqa`.
- `devices`: GPUs to use.
- `max_batches_per_device`: Maximum number of batches to process per device (default: `4`).
- `compile`: JIT-compile the model with `torch.compile` for faster training (default: `false`).
- `model`: Pre-trained model name. See the YAML config files under `configs/model`.

To evaluate on the out-of-domain split of the JCoLA dataset, specify `datamodule/valid=jcola_ood` (or `datamodule/valid=jcola_ood_annotated`). For more options, see the YAML config files under `configs`.
For a quick debug run, use the debug config:

```shell
uv run python scripts/train.py -cn marc_ja.debug
```

You can specify `trainer=cpu.debug` to use the CPU:

```shell
uv run python scripts/train.py -cn marc_ja.debug trainer=cpu.debug
```

If you are on a machine with GPUs, you can specify the GPUs to use with the `devices` option:

```shell
uv run python scripts/train.py -cn marc_ja.debug devices=[0]
```
To tune hyperparameters, create a wandb sweep and run an agent:

```shell
$ wandb sweep <(sed 's/MODEL_NAME/deberta_base/' sweeps/jcola.yaml)
wandb: Creating sweep from: /dev/fd/xx
wandb: Created sweep with ID: xxxxxxxx
wandb: View sweep at: https://wandb.ai/<wandb-user>/JGLUE-evaluation-scripts/sweeps/xxxxxxxx
wandb: Run sweep agent with: wandb agent <wandb-user>/JGLUE-evaluation-scripts/xxxxxxxx
$ DEVICES=0,1 MAX_BATCHES_PER_DEVICE=16 COMPILE=true wandb agent <wandb-user>/JGLUE-evaluation-scripts/xxxxxxxx
```
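The `sed` in the command above only substitutes the `MODEL_NAME` placeholder in the sweep config, and the process substitution `<(...)` hands the result to `wandb sweep` without modifying the file on disk. A minimal sketch with a toy config file (the file path and its contents are illustrative, not the actual `sweeps/jcola.yaml`):

```shell
# Create a toy sweep config containing the MODEL_NAME placeholder
printf 'parameters:\n  model:\n    value: MODEL_NAME\n' > /tmp/sweep_demo.yaml

# Substitute the placeholder; the original file is left untouched
sed 's/MODEL_NAME/deberta_base/' /tmp/sweep_demo.yaml
```

The substituted text is what `wandb sweep` reads through the `/dev/fd/xx` path shown in the log output.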
We fine-tuned the following models and evaluated them on the dev set of JGLUE. We tuned the learning rate and the number of training epochs for each model and task, following the JGLUE paper.
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 0.965 | 0.867 | 0.913 | 0.876 | 0.905 | 0.853 | 0.916 | 0.853 |
Waseda RoBERTa large (seq512) | 0.969 | 0.849 | 0.925 | 0.890 | 0.928 | 0.910 | 0.955 | 0.900 |
LUKE Japanese base* | 0.965 | - | 0.916 | 0.877 | 0.912 | - | - | 0.842 |
LUKE Japanese large* | 0.965 | - | 0.932 | 0.902 | 0.927 | - | - | 0.893 |
DeBERTaV2 base | 0.970 | 0.879 | 0.922 | 0.886 | 0.922 | 0.899 | 0.951 | 0.873 |
DeBERTaV2 large | 0.968 | 0.882 | 0.925 | 0.892 | 0.924 | 0.912 | 0.959 | 0.890 |
DeBERTaV3 base | 0.960 | 0.878 | 0.927 | 0.891 | 0.927 | 0.896 | 0.947 | 0.875 |
*The scores of LUKE are from the official repository.
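JSQuAD is scored with exact match (EM) and token-level F1, following SQuAD. As a rough illustration of the two metrics, here is a minimal sketch; it uses whitespace tokenization for readability, whereas the actual JSQuAD evaluation tokenizes Japanese text differently:

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # EM: 1.0 iff the prediction equals the reference exactly
    return float(prediction == reference)

def token_f1(prediction: str, reference: str) -> float:
    # F1 over the multiset overlap of prediction and reference tokens
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("kyoto tower", "kyoto tower"))  # 1.0
print(token_f1("the kyoto tower", "kyoto tower"))  # 0.8
```

F1 gives partial credit when the predicted span overlaps the reference, which is why the JSQuAD/F1 column is consistently higher than JSQuAD/EM in the table above.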
- Learning rate: {2e-05, 3e-05, 5e-05}
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 3e-05 | 3e-05 | 2e-05 | 2e-05 | 3e-05 | 3e-05 | 3e-05 | 5e-05 |
Waseda RoBERTa large (seq512) | 2e-05 | 2e-05 | 3e-05 | 3e-05 | 2e-05 | 2e-05 | 2e-05 | 3e-05 |
DeBERTaV2 base | 2e-05 | 3e-05 | 5e-05 | 5e-05 | 3e-05 | 2e-05 | 2e-05 | 5e-05 |
DeBERTaV2 large | 5e-05 | 2e-05 | 5e-05 | 5e-05 | 2e-05 | 2e-05 | 2e-05 | 3e-05 |
DeBERTaV3 base | 5e-05 | 2e-05 | 3e-05 | 3e-05 | 2e-05 | 5e-05 | 5e-05 | 2e-05 |
- Training epochs: {3, 4}
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 4 | 3 | 4 | 4 | 3 | 4 | 4 | 3 |
Waseda RoBERTa large (seq512) | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 |
DeBERTaV2 base | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 4 |
DeBERTaV2 large | 3 | 3 | 4 | 4 | 3 | 4 | 4 | 3 |
DeBERTaV3 base | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
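Combining the two grids above, each model-task pair is tuned over the cross product of 3 learning rates and 2 epoch counts, i.e. 6 fine-tuning runs. A sketch of the search space:

```python
from itertools import product

# Hyperparameter grids listed above
learning_rates = [2e-05, 3e-05, 5e-05]
epochs = [3, 4]

# Every (learning rate, epochs) combination is one fine-tuning run
search_space = list(product(learning_rates, epochs))
print(len(search_space))  # 6 runs per model-task pair
```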
- Waseda RoBERTa base: `nlp-waseda/roberta-base-japanese`
- Waseda RoBERTa large (seq512): `nlp-waseda/roberta-large-japanese-seq512`
- LUKE Japanese base: `studio-ousia/luke-base-japanese`
- LUKE Japanese large: `studio-ousia/luke-large-japanese`
- DeBERTaV2 base: `ku-nlp/deberta-v2-base-japanese`
- DeBERTaV2 large: `ku-nlp/deberta-v2-large-japanese`
- DeBERTaV3 base: `ku-nlp/deberta-v3-base-japanese`
Nobuhiro Ueda (ueda at nlp.ist.i.kyoto-u.ac.jp)
- yahoojapan/JGLUE: JGLUE: Japanese General Language Understanding Evaluation
- JGLUE: Japanese General Language Understanding Evaluation (Kurihara et al., LREC 2022)
- Kentaro Kurihara, Daisuke Kawahara, and Tomohide Shibata. JGLUE: A Japanese Language Understanding Benchmark. Journal of Natural Language Processing, Vol. 30, No. 1, pp. 63-87, 2023. Published 2023/03/15. Online ISSN 2185-8314, Print ISSN 1340-7619. https://doi.org/10.5715/jnlp.30.63, https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_article/-char/ja