The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)
JMTEB is a benchmark for evaluating Japanese text embedding models. It consists of 5 tasks.
This repository provides an easy-to-use evaluation script for JMTEB.
The JMTEB leaderboard is here. If you would like to submit your model, please refer to the submission guideline.
```bash
git clone git@github.com:sbintuitions/JMTEB
cd JMTEB
poetry install
poetry run pytest tests
```
The following command evaluates the specified model on all the tasks in JMTEB.
```bash
poetry run python -m jmteb \
  --embedder SentenceBertEmbedder \
  --embedder.model_name_or_path "<model_name_or_path>" \
  --save_dir "output/<model_name_or_path>"
```
Note
In order to guarantee the robustness of evaluation, a validation dataset is required for hyperparameter tuning. For a dataset that doesn't have a validation set, we use the test set as the validation set.
By default, the evaluation tasks are read from `src/jmteb/configs/jmteb.jsonnet`. If you want to evaluate the model on a specific task, you can specify the task config via the `--evaluators` option.
```bash
poetry run python -m jmteb \
  --evaluators "src/configs/tasks/jsts.jsonnet" \
  --embedder SentenceBertEmbedder \
  --embedder.model_name_or_path "<model_name_or_path>" \
  --save_dir "output/<model_name_or_path>"
```
Note
Some tasks (e.g., AmazonReviewClassification in classification, JAQKET and Mr.TyDi-ja in retrieval, esci in reranking) are time-consuming and memory-consuming. Heavy retrieval tasks can take hours to encode their large corpora and require a lot of memory to store the resulting vectors. If you want to exclude them, add `--eval_exclude "['amazon_review_classification', 'mrtydi', 'jaqket', 'esci']"`. Similarly, you can use `--eval_include` to run only the evaluation datasets you want, as in the sketch below.
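For instance, a run restricted to a single dataset might look like the following sketch; the dataset key `jsts` is an assumption inferred from the task config name (`src/configs/tasks/jsts.jsonnet`), so check the config for the exact keys.

```bash
# Evaluate only on JSTS (dataset key assumed from the task config name).
poetry run python -m jmteb \
  --embedder SentenceBertEmbedder \
  --embedder.model_name_or_path "<model_name_or_path>" \
  --save_dir "output/<model_name_or_path>" \
  --eval_include "['jsts']"
```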
Note
If you want to log model predictions to further analyze the performance of your model, use `--log_predictions true` to enable prediction logging in all evaluators, as shown below. Whether to log predictions can also be set per evaluator in its config.
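As a minimal sketch, reusing the flags from the commands above, a full run with prediction logging enabled might look like:

```bash
poetry run python -m jmteb \
  --embedder SentenceBertEmbedder \
  --embedder.model_name_or_path "<model_name_or_path>" \
  --save_dir "output/<model_name_or_path>" \
  --log_predictions true
```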
There are two ways to enable multi-GPU evaluation.
- With the new class `DataParallelSentenceBertEmbedder` (here):
```bash
poetry run python -m jmteb \
  --evaluators "src/configs/tasks/jsts.jsonnet" \
  --embedder DataParallelSentenceBertEmbedder \
  --embedder.model_name_or_path "<model_name_or_path>" \
  --save_dir "output/<model_name_or_path>"
```
- With `torchrun`, multi-GPU evaluation in `TransformersEmbedder` is available. For example:
```bash
MODEL_NAME=<model_name_or_path>
MODEL_KWARGS="\{\'torch_dtype\':\'torch.bfloat16\'\}"
torchrun \
  --nproc_per_node=$GPUS_PER_NODE --nnodes=1 \
  src/jmteb/__main__.py --embedder TransformersEmbedder \
  --embedder.model_name_or_path ${MODEL_NAME} \
  --embedder.pooling_mode cls \
  --embedder.batch_size 4096 \
  --embedder.model_kwargs ${MODEL_KWARGS} \
  --embedder.max_seq_length 512 \
  --save_dir "output/${MODEL_NAME}" \
  --evaluators src/jmteb/configs/jmteb.jsonnet
```
Note that the batch size here is the global batch size (`per_device_batch_size` × `n_gpu`). For example, `--embedder.batch_size 4096` on 8 GPUs corresponds to a per-device batch size of 512.