# JGLUE-evaluation-scripts

Training and evaluation scripts for JGLUE, a Japanese language understanding benchmark
- Python: 3.9+
- Dependencies: see `pyproject.toml`.
Create a virtual environment and install dependencies:

```shell
$ uv venv -p /path/to/python
$ uv sync
```
Log in to wandb:

```shell
$ wandb login
```
You can train and test a model with the following command:

```shell
# For training and evaluating MARC-ja
uv run python src/train.py -cn marc_ja devices=[0,1] max_batches_per_device=16
```
Here are commonly used options:

- `-cn`: Task name. Choose from `marc_ja`, `jcola`, `jsts`, `jnli`, `jsquad`, and `jcqa`.
- `devices`: GPUs to use.
- `max_batches_per_device`: Maximum number of batches to process per device (default: `4`).
- `compile`: JIT-compile the model with `torch.compile` for faster training (default: `false`).
- `model`: Pre-trained model name. See the YAML config files under `configs/model`.

To evaluate on the out-of-domain split of the JCoLA dataset, specify `datamodule/valid=jcola_ood` (or `datamodule/valid=jcola_ood_annotated`). For more options, see the YAML config files under `configs`.
For a quick debug run, use the debug config:

```shell
uv run python scripts/train.py -cn marc_ja.debug
```

You can specify `trainer=cpu.debug` to use the CPU:

```shell
uv run python scripts/train.py -cn marc_ja.debug trainer=cpu.debug
```

If you are on a machine with GPUs, you can specify the GPUs to use with the `devices` option:

```shell
uv run python scripts/train.py -cn marc_ja.debug devices=[0]
```
To tune hyperparameters, create a wandb sweep and run an agent:

```shell
$ wandb sweep <(sed 's/MODEL_NAME/deberta_base/' sweeps/jcola.yaml)
wandb: Creating sweep from: /dev/fd/xx
wandb: Created sweep with ID: xxxxxxxx
wandb: View sweep at: https://wandb.ai/<wandb-user>/JGLUE-evaluation-scripts/sweeps/xxxxxxxx
wandb: Run sweep agent with: wandb agent <wandb-user>/JGLUE-evaluation-scripts/xxxxxxxx
$ DEVICES=0,1 MAX_BATCHES_PER_DEVICE=16 COMPILE=true wandb agent <wandb-user>/JGLUE-evaluation-scripts/xxxxxxxx
```
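The `sed` in the command above only substitutes the `MODEL_NAME` placeholder in the sweep config, and the process substitution `<(...)` hands the result to `wandb sweep` without modifying the file on disk. A minimal sketch with a toy config file (the file path and its contents are illustrative, not the actual `sweeps/jcola.yaml`):

```shell
# Create a toy sweep config containing the MODEL_NAME placeholder
printf 'parameters:\n  model:\n    value: MODEL_NAME\n' > /tmp/sweep_demo.yaml

# Substitute the placeholder; the original file is left untouched
sed 's/MODEL_NAME/deberta_base/' /tmp/sweep_demo.yaml
```

The substituted text is what `wandb sweep` reads through the `/dev/fd/xx` path shown in the log output.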
We fine-tuned the following models and evaluated them on the dev set of JGLUE. We tuned the learning rate and the number of training epochs for each model and task, following the JGLUE paper.
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 0.965 | 0.867 | 0.913 | 0.876 | 0.905 | 0.853 | 0.916 | 0.853 |
Waseda RoBERTa large (seq512) | 0.969 | 0.849 | 0.925 | 0.890 | 0.928 | 0.910 | 0.955 | 0.900 |
LUKE Japanese base* | 0.965 | - | 0.916 | 0.877 | 0.912 | - | - | 0.842 |
LUKE Japanese large* | 0.965 | - | 0.932 | 0.902 | 0.927 | - | - | 0.893 |
DeBERTaV2 base | 0.970 | 0.879 | 0.922 | 0.886 | 0.922 | 0.899 | 0.951 | 0.873 |
DeBERTaV2 large | 0.968 | 0.882 | 0.925 | 0.892 | 0.924 | 0.912 | 0.959 | 0.890 |
DeBERTaV3 base | 0.960 | 0.878 | 0.927 | 0.891 | 0.927 | 0.896 | 0.947 | 0.875 |
*The scores of LUKE are from the official repository.
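JSQuAD is scored with exact match (EM) and token-level F1, following SQuAD. As a rough illustration of the two metrics, here is a minimal sketch; it uses whitespace tokenization for readability, whereas the actual JSQuAD evaluation tokenizes Japanese text differently:

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # EM: 1.0 iff the prediction equals the reference exactly
    return float(prediction == reference)

def token_f1(prediction: str, reference: str) -> float:
    # F1 over the multiset overlap of prediction and reference tokens
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("kyoto tower", "kyoto tower"))  # 1.0
print(token_f1("the kyoto tower", "kyoto tower"))  # 0.8
```

F1 gives partial credit when the predicted span overlaps the reference, which is why the JSQuAD/F1 column is consistently higher than JSQuAD/EM in the table above.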
- Learning rate: {2e-05, 3e-05, 5e-05}
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 3e-05 | 3e-05 | 2e-05 | 2e-05 | 3e-05 | 3e-05 | 3e-05 | 5e-05 |
Waseda RoBERTa large (seq512) | 2e-05 | 2e-05 | 3e-05 | 3e-05 | 2e-05 | 2e-05 | 2e-05 | 3e-05 |
DeBERTaV2 base | 2e-05 | 3e-05 | 5e-05 | 5e-05 | 3e-05 | 2e-05 | 2e-05 | 5e-05 |
DeBERTaV2 large | 5e-05 | 2e-05 | 5e-05 | 5e-05 | 2e-05 | 2e-05 | 2e-05 | 3e-05 |
DeBERTaV3 base | 5e-05 | 2e-05 | 3e-05 | 3e-05 | 2e-05 | 5e-05 | 5e-05 | 2e-05 |
- Training epochs: {3, 4}
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 4 | 3 | 4 | 4 | 3 | 4 | 4 | 3 |
Waseda RoBERTa large (seq512) | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 |
DeBERTaV2 base | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 4 |
DeBERTaV2 large | 3 | 3 | 4 | 4 | 3 | 4 | 4 | 3 |
DeBERTaV3 base | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
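Combining the two grids above, each model-task pair is tuned over the cross product of 3 learning rates and 2 epoch counts, i.e. 6 fine-tuning runs. A sketch of the search space:

```python
from itertools import product

# Hyperparameter grids listed above
learning_rates = [2e-05, 3e-05, 5e-05]
epochs = [3, 4]

# Every (learning rate, epochs) combination is one fine-tuning run
search_space = list(product(learning_rates, epochs))
print(len(search_space))  # 6 runs per model-task pair
```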
- Waseda RoBERTa base: `nlp-waseda/roberta-base-japanese`
- Waseda RoBERTa large (seq512): `nlp-waseda/roberta-large-japanese-seq512`
- LUKE Japanese base: `studio-ousia/luke-base-japanese`
- LUKE Japanese large: `studio-ousia/luke-large-japanese`
- DeBERTaV2 base: `ku-nlp/deberta-v2-base-japanese`
- DeBERTaV2 large: `ku-nlp/deberta-v2-large-japanese`
- DeBERTaV3 base: `ku-nlp/deberta-v3-base-japanese`
Nobuhiro Ueda (ueda at nlp.ist.i.kyoto-u.ac.jp)
- yahoojapan/JGLUE: JGLUE: Japanese General Language Understanding Evaluation
- JGLUE: Japanese General Language Understanding Evaluation (Kurihara et al., LREC 2022)
- Kentaro Kurihara, Daisuke Kawahara, and Tomohide Shibata. JGLUE: A Japanese Language Understanding Benchmark. Journal of Natural Language Processing, Vol. 30, No. 1, pp. 63-87, 2023. Published 2023/03/15. Online ISSN 2185-8314, Print ISSN 1340-7619. https://doi.org/10.5715/jnlp.30.63, https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_article/-char/ja