
oshizo/JapaneseEmbeddingEval


⚠️ IMPORTANT UPDATE (2024-10-08): JMTEB, a leaderboard that evaluates embedding models on a more diverse set of tasks, has been released. We recommend referring to JMTEB instead.

JapaneseEmbeddingEval

  • JSTS/JSICK: Spearman's rank correlation coefficient
    • Cosine similarity was used to score each sentence pair.
  • MIRACL: top-30 recall
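The JSTS/JSICK metric above can be sketched in a few lines: embed each sentence pair, take the cosine similarity, then compute Spearman's rank correlation against the gold similarity labels. A minimal, dependency-free sketch (the toy vectors and variable names are illustrative, not from this repo's evaluation code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rankdata(xs):
    """1-based ranks, averaging ties (as Spearman's rho requires)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = rankdata(xs), rankdata(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy data: (embedding pair, gold similarity score) per sentence pair.
pairs = [
    (([1.0, 0.0], [1.0, 0.1]), 4.8),   # near-duplicates, high gold score
    (([1.0, 0.0], [0.7, 0.7]), 3.0),
    (([1.0, 0.0], [0.0, 1.0]), 1.2),
    (([1.0, 0.0], [-1.0, 0.1]), 0.3),  # opposite meaning, low gold score
]
preds = [cosine(u, v) for (u, v), _ in pairs]
gold = [g for _, g in pairs]
print(round(spearman(preds, gold), 3))  # → 1.0 (toy data is perfectly monotone)
```

In a real run the embeddings would come from the model under evaluation; only the ordering of the cosine scores matters, which is why Spearman (rank) correlation is used rather than Pearson.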
| Model | #dims | #params | JSTS valid-v1.1 | JSICK test | MIRACL dev | Average |
| --- | --- | --- | --- | --- | --- | --- |
| BAAI/bge-m3 (dense_vecs) | 1024 | 567M | 0.802 | 0.798 | 0.910¹ | 0.837 |
| jinaai/jina-embeddings-v3 | 1024 | 572M | 0.819 | 0.782 | 0.862 | 0.821 |
| MU-Kindai/SBERT-JSNLI-base | 768 | 110M | 0.766 | 0.652 | 0.326 | 0.581 |
| MU-Kindai/SBERT-JSNLI-large | 1024 | 337M | 0.774 | 0.677 | 0.278 | 0.576 |
| bclavie/fio-base-japanese-v0.1² | 768 | 111M | 0.863 | 0.894 | 0.718 | 0.825 |
| cl-nagoya/ruri-small | 768 | 67M | 0.821 | 0.833 | 0.791¹ | 0.815 |
| cl-nagoya/ruri-base | 768 | 111M | 0.833 | 0.823 | 0.846¹ | 0.834 |
| cl-nagoya/ruri-large | 1024 | 337M | 0.842 | 0.819 | 0.864¹ | 0.842 |
| cl-nagoya/sup-simcse-ja-base | 768 | 111M | 0.809 | 0.827 | 0.527 | 0.721 |
| cl-nagoya/sup-simcse-ja-large | 1024 | 337M | 0.831 | 0.831 | 0.507 | 0.723 |
| cl-nagoya/unsup-simcse-ja-base | 768 | 111M | 0.789 | 0.790 | 0.487 | 0.689 |
| cl-nagoya/unsup-simcse-ja-large | 1024 | 337M | 0.814 | 0.796 | 0.485 | 0.699 |
| colorfulscoop/sbert-base-ja | 768 | 110M | 0.742 | 0.657 | 0.254 | 0.551 |
| intfloat/multilingual-e5-small | 384 | 117M | 0.789 | 0.814 | 0.847¹ | 0.817 |
| intfloat/multilingual-e5-base | 768 | 278M | 0.796 | 0.806 | 0.845¹ | 0.816 |
| intfloat/multilingual-e5-large | 1024 | 559M | 0.819 | 0.794 | 0.883¹ | 0.832 |
| intfloat/multilingual-e5-large-instruct | 1024 | 559M | 0.832 | 0.822 | 0.876¹ | 0.844 |
| oshizo/sbert-jsnli-luke-japanese-base-lite | 768 | 133M | 0.811 | 0.726 | 0.497 | 0.678 |
| pkshatech/GLuCoSE-base-ja-v2 | 768 | 133M | 0.809 | 0.849 | 0.879¹ | 0.846 |
| pkshatech/RoSEtta-base-ja | 768 | 190M | 0.790 | 0.835 | 0.845¹ | 0.823 |
| pkshatech/GLuCoSE-base-ja | 768 | 133M | 0.818 | 0.757 | 0.692 | 0.755 |
| pkshatech/simcse-ja-bert-base-clcmlp | 768 | 111M | 0.801 | 0.735 | 0.544 | 0.693 |
| **API** | | | | | | |
| text-embedding-3-large | 3072 | - | 0.838 | 0.812 | 0.841³ | 0.830 |
| text-embedding-3-small | 1536 | - | 0.781 | 0.804 | 0.795³ | 0.793 |
| text-embedding-ada-002 | 1536 | - | 0.790 | 0.790 | 0.728³ | 0.769 |
| textembedding-gecko-multilingual@001 | 768 | - | 0.801 | 0.804 | 0.800³ | 0.801 |
| **LLM** | | | | | | |
| intfloat/e5-mistral-7b-instruct | 4096 | 7.3B | 0.836 | 0.836 | 0.885 | 0.852 |
| oshizo/japanese-e5-mistral-7b_slerp | 4096 | 7.3B | 0.846 | 0.842 | 0.886 | 0.858 |
| oshizo/japanese-e5-mistral-1.9b | 4096 | 1.9B | 0.826 | 0.833 | 0.797 | 0.819 |
| **ColBERT** | | | | | | |
| bclavie/jacolbert_first_100⁴ | 128/token | 111M | - | - | 0.872³ | - |
| bclavie/JaColBERTv2⁴ | 128/token | 111M | - | - | 0.918³ | - |
| BAAI/bge-m3 (colbert_vecs) | 1024/token | 567M | 0.799 | 0.798 | 0.917¹ | 0.838 |
| BAAI/bge-m3 (colbert+sparse+dense) | 1024/token⁵ | 567M | 0.800 | 0.805 | 0.926¹ | 0.844 |
| **Reranker** | | | | | | |
| hotchpotch/japanese-bge-reranker-v2-m3-v1 | - | 567M | - | - | 0.947¹ | - |
| **Sparse Retrieval** | | | | | | |
| hotchpotch/japanese-splade-base-v1 | - | 111M | - | - | 0.925¹ | - |

Datasets

  • JSTS valid-v1.1

  • JSICK test

  • MIRACL dev

    • https://huggingface.co/datasets/miracl/miracl
    • 860 Japanese queries
    • To reduce computation time, the search corpus was restricted from the 6,953,614 Japanese passages in miracl/miracl-corpus to the following subset:
      1. the positive passages for each query
      2. 300 hard negatives for each query
      • Hard-negative mining was performed with intfloat/multilingual-e5-base.
      • This setup can inflate scores for models other than intfloat/multilingual-e5-base only in the following case, which we believe has a negligible effect: a negative ranked below the top 300 by intfloat/multilingual-e5-base (and therefore excluded from the pool) would have been ranked within the top 30 by the evaluated model, pushing a positive out of the top 30.
    • Some queries have more than 30 potential positive documents in miracl-corpus, so even a very good model may be unable to rank the annotated positives within the top 30. We estimate such queries at roughly 7% to 10% of the 860. This was estimated by looking up the TyDi QA entry for each corresponding MIRACL dev query and counting how often the TyDi QA answer phrase appeared in at least 30 of that query's 300 hard-negative documents.
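Given the pool construction above (positive passages plus 300 mined hard negatives per query), each query is scored as recall within its own pool: rank all candidates by similarity to the query and measure the fraction of positives that land in the top 30. A minimal self-contained sketch (function and document names are illustrative, not from this repo's scripts):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_recall(query_vec, positive_ids, candidates, k=30):
    """Fraction of relevant docs ranked in the top k.

    candidates: list of (doc_id, embedding); positive_ids: set of relevant doc_ids.
    """
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    top_ids = {doc_id for doc_id, _ in ranked[:k]}
    return len(top_ids & positive_ids) / len(positive_ids)

# Toy pool: one positive plus 300 hard negatives, as in the MIRACL setup above.
q = [1.0, 0.0]
docs = [("pos", [0.9, 0.1])] + [(f"neg{i}", [0.1, 1.0 - 0.001 * i]) for i in range(300)]
print(top_k_recall(q, {"pos"}, docs, k=30))  # → 1.0 (the positive outranks every negative)
```

The reported MIRACL dev number is then the mean of this per-query recall over all 860 queries.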

Footnotes

  1. These models were fine-tuned on the MIRACL dataset, so the MIRACL task is not an unseen task for them. For details on each model, see the following links: multilingual-e5, BGE-M3, hotchpotch/japanese-bge-reranker-v2-m3-v1, hotchpotch/japanese-splade-base-v1, Ruri, pkshatech/GLuCoSE-base-ja-v2, pkshatech/RoSEtta-base-ja

  2. According to the blog post about fio-base-japanese-v0.1, the tasks are not unseen by the model, which makes direct comparison with the other models difficult.

  3. Evaluated on only the first 100 of the 860 queries.

  4. JaColBERT is a retrieval model. It is optimised only for document retrieval tasks, not for semantic similarity/entailment tasks such as JSTS or JSICK.

  5. The embedding dimension is 1024 for the dense vectors, one float value per unique token for the sparse vectors, and 1024 per token for the ColBERT vectors.
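Footnote 5 distinguishes single-vector (dense) representations from per-token ColBERT vectors. The per-token vectors are scored by late interaction: each query token takes its maximum dot product over all document tokens, and these maxima are summed (MaxSim). A minimal sketch of that scoring rule, illustrative only and not bge-m3's or JaColBERT's actual implementation:

```python
def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction: sum over query tokens of the
    max dot product against any document token vector."""
    total = 0.0
    for q in query_tokens:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
    return total

# Toy example with 2-dim unit token vectors (real models use 128 or 1024 dims per token).
qs = [[1.0, 0.0], [0.0, 1.0]]
ds = [[1.0, 0.0], [0.6, 0.8]]
print(maxsim_score(qs, ds))  # → 1.8 (first query token matches exactly: 1.0; second: 0.8)
```

Because every token keeps its own vector, a document's representation grows with its length, which is why the #dims column reports these models as "128/token" or "1024/token" rather than a fixed size.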
