
oshizo/JapaneseEmbeddingEval


⚠️ IMPORTANT UPDATE (2024-10-08): JMTEB, a leaderboard that evaluates embedding models on a more diverse set of tasks, has been released. We recommend referring to JMTEB instead.

JapaneseEmbeddingEval

  • JSTS/JSICK: Spearman's rank correlation coefficient
    • Cosine similarity was used to score each sentence pair.
  • MIRACL: top-30 recall
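The JSTS/JSICK metric above can be sketched in a few lines: embed each sentence pair, take the cosine similarity, then compute Spearman's rank correlation against the gold similarity labels. A minimal, dependency-free sketch (the toy vectors and variable names are illustrative, not from this repo's evaluation code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rankdata(xs):
    """1-based ranks, averaging ties (as Spearman's rho requires)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = rankdata(xs), rankdata(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy data: (embedding pair, gold similarity score) per sentence pair.
pairs = [
    (([1.0, 0.0], [1.0, 0.1]), 4.8),   # near-duplicates, high gold score
    (([1.0, 0.0], [0.7, 0.7]), 3.0),
    (([1.0, 0.0], [0.0, 1.0]), 1.2),
    (([1.0, 0.0], [-1.0, 0.1]), 0.3),  # opposite meaning, low gold score
]
preds = [cosine(u, v) for (u, v), _ in pairs]
gold = [g for _, g in pairs]
print(round(spearman(preds, gold), 3))  # → 1.0 (toy data is perfectly monotone)
```

In a real run the embeddings would come from the model under evaluation; only the ordering of the cosine scores matters, which is why Spearman (rank) correlation is used rather than Pearson.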
| Model | #dims | #params | JSTS valid-v1.1 | JSICK test | MIRACL dev | Average |
| --- | --- | --- | --- | --- | --- | --- |
| BAAI/bge-m3 (dense_vecs) | 1024 | 567M | 0.802 | 0.798 | 0.910¹ | 0.837 |
| jinaai/jina-embeddings-v3 | 1024 | 572M | 0.819 | 0.782 | 0.862 | 0.821 |
| MU-Kindai/SBERT-JSNLI-base | 768 | 110M | 0.766 | 0.652 | 0.326 | 0.581 |
| MU-Kindai/SBERT-JSNLI-large | 1024 | 337M | 0.774 | 0.677 | 0.278 | 0.576 |
| bclavie/fio-base-japanese-v0.1² | 768 | 111M | 0.863 | 0.894 | 0.718 | 0.825 |
| cl-nagoya/ruri-small | 768 | 67M | 0.821 | 0.833 | 0.791¹ | 0.815 |
| cl-nagoya/ruri-base | 768 | 111M | 0.833 | 0.823 | 0.846¹ | 0.834 |
| cl-nagoya/ruri-large | 1024 | 337M | 0.842 | 0.819 | 0.864¹ | 0.842 |
| cl-nagoya/sup-simcse-ja-base | 768 | 111M | 0.809 | 0.827 | 0.527 | 0.721 |
| cl-nagoya/sup-simcse-ja-large | 1024 | 337M | 0.831 | 0.831 | 0.507 | 0.723 |
| cl-nagoya/unsup-simcse-ja-base | 768 | 111M | 0.789 | 0.790 | 0.487 | 0.689 |
| cl-nagoya/unsup-simcse-ja-large | 1024 | 337M | 0.814 | 0.796 | 0.485 | 0.699 |
| colorfulscoop/sbert-base-ja | 768 | 110M | 0.742 | 0.657 | 0.254 | 0.551 |
| intfloat/multilingual-e5-small | 384 | 117M | 0.789 | 0.814 | 0.847¹ | 0.817 |
| intfloat/multilingual-e5-base | 768 | 278M | 0.796 | 0.806 | 0.845¹ | 0.816 |
| intfloat/multilingual-e5-large | 1024 | 559M | 0.819 | 0.794 | 0.883¹ | 0.832 |
| intfloat/multilingual-e5-large-instruct | 1024 | 559M | 0.832 | 0.822 | 0.876¹ | 0.844 |
| oshizo/sbert-jsnli-luke-japanese-base-lite | 768 | 133M | 0.811 | 0.726 | 0.497 | 0.678 |
| pkshatech/GLuCoSE-base-ja-v2 | 768 | 133M | 0.809 | 0.849 | 0.879¹ | 0.846 |
| pkshatech/RoSEtta-base-ja | 768 | 190M | 0.790 | 0.835 | 0.845¹ | 0.823 |
| pkshatech/GLuCoSE-base-ja | 768 | 133M | 0.818 | 0.757 | 0.692 | 0.755 |
| pkshatech/simcse-ja-bert-base-clcmlp | 768 | 111M | 0.801 | 0.735 | 0.544 | 0.693 |
| **API** | | | | | | |
| text-embedding-3-large | 3072 | - | 0.838 | 0.812 | 0.841³ | 0.830 |
| text-embedding-3-small | 1536 | - | 0.781 | 0.804 | 0.795³ | 0.793 |
| text-embedding-ada-002 | 1536 | - | 0.790 | 0.790 | 0.728³ | 0.769 |
| textembedding-gecko-multilingual@001 | 768 | - | 0.801 | 0.804 | 0.800³ | 0.801 |
| **LLM** | | | | | | |
| intfloat/e5-mistral-7b-instruct | 4096 | 7.3B | 0.836 | 0.836 | 0.885 | 0.852 |
| oshizo/japanese-e5-mistral-7b_slerp | 4096 | 7.3B | 0.846 | 0.842 | 0.886 | 0.858 |
| oshizo/japanese-e5-mistral-1.9b | 4096 | 1.9B | 0.826 | 0.833 | 0.797 | 0.819 |
| **ColBERT** | | | | | | |
| bclavie/jacolbert_first_100⁴ | 128/token | 111M | - | - | 0.872³ | - |
| bclavie/JaColBERTv2⁴ | 128/token | 111M | - | - | 0.918³ | - |
| BAAI/bge-m3 (colbert_vecs) | 1024/token | 567M | 0.799 | 0.798 | 0.917¹ | 0.838 |
| BAAI/bge-m3 (colbert+sparse+dense) | 1024/token⁵ | 567M | 0.800 | 0.805 | 0.926¹ | 0.844 |
| **Reranker** | | | | | | |
| hotchpotch/japanese-bge-reranker-v2-m3-v1 | - | 567M | - | - | 0.947¹ | - |
| **Sparse Retrieval** | | | | | | |
| hotchpotch/japanese-splade-base-v1 | - | 111M | - | - | 0.925¹ | - |

Datasets

  • JSTS valid-v1.1

  • JSICK test

  • MIRACL dev

    • https://huggingface.co/datasets/miracl/miracl
    • 860 Japanese queries
    • To reduce computation time, the search corpus was restricted from the 6,953,614 Japanese passages in miracl/miracl-corpus to the following subset:
      1. the positive passages for each query
      2. 300 hard negatives for each query
      • Hard-negative mining was performed with intfloat/multilingual-e5-base.
      • This setup can inflate scores for models other than intfloat/multilingual-e5-base only in the following case, which we believe has a negligible effect: a negative ranked below the top 300 by intfloat/multilingual-e5-base (and therefore excluded from the pool) would have been ranked within the top 30 by the evaluated model, pushing a positive out of the top 30.
    • Some queries have more than 30 potential positive documents in miracl-corpus, so even a very good model may be unable to rank the annotated positives within the top 30. We estimate such queries at roughly 7% to 10% of the 860. This was estimated by looking up the TyDi QA entry for each corresponding MIRACL dev query and counting how often the TyDi QA answer phrase appeared in at least 30 of that query's 300 hard-negative documents.
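Given the pool construction above (positive passages plus 300 mined hard negatives per query), each query is scored as recall within its own pool: rank all candidates by similarity to the query and measure the fraction of positives that land in the top 30. A minimal self-contained sketch (function and document names are illustrative, not from this repo's scripts):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_recall(query_vec, positive_ids, candidates, k=30):
    """Fraction of relevant docs ranked in the top k.

    candidates: list of (doc_id, embedding); positive_ids: set of relevant doc_ids.
    """
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    top_ids = {doc_id for doc_id, _ in ranked[:k]}
    return len(top_ids & positive_ids) / len(positive_ids)

# Toy pool: one positive plus 300 hard negatives, as in the MIRACL setup above.
q = [1.0, 0.0]
docs = [("pos", [0.9, 0.1])] + [(f"neg{i}", [0.1, 1.0 - 0.001 * i]) for i in range(300)]
print(top_k_recall(q, {"pos"}, docs, k=30))  # → 1.0 (the positive outranks every negative)
```

The reported MIRACL dev number is then the mean of this per-query recall over all 860 queries.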

Footnotes

  1. These models were fine-tuned on the MIRACL dataset, so the MIRACL task is not an unseen task for them. For details on each model, see the following links: multilingual-e5, BGE-M3, hotchpotch/japanese-bge-reranker-v2-m3-v1, hotchpotch/japanese-splade-base-v1, Ruri, pkshatech/GLuCoSE-base-ja-v2, pkshatech/RoSEtta-base-ja

  2. According to the blog post about fio-base-japanese-v0.1, the tasks are not unseen by the model, which makes direct comparison with the other models difficult.

  3. Evaluated on only the first 100 of the 860 queries.

  4. JaColBERT is a retrieval model. It is optimised only for document retrieval tasks, not for semantic similarity/entailment tasks such as JSTS or JSICK.

  5. The embedding dimension is 1024 for the dense vectors, one float value per unique token for the sparse vectors, and 1024 per token for the ColBERT vectors.
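Footnote 5 distinguishes single-vector (dense) representations from per-token ColBERT vectors. The per-token vectors are scored by late interaction: each query token takes its maximum dot product over all document tokens, and these maxima are summed (MaxSim). A minimal sketch of that scoring rule, illustrative only and not bge-m3's or JaColBERT's actual implementation:

```python
def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction: sum over query tokens of the
    max dot product against any document token vector."""
    total = 0.0
    for q in query_tokens:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
    return total

# Toy example with 2-dim unit token vectors (real models use 128 or 1024 dims per token).
qs = [[1.0, 0.0], [0.0, 1.0]]
ds = [[1.0, 0.0], [0.6, 0.8]]
print(maxsim_score(qs, ds))  # → 1.8 (first query token matches exactly: 1.0; second: 0.8)
```

Because every token keeps its own vector, a document's representation grows with its length, which is why the #dims column reports these models as "128/token" or "1024/token" rather than a fixed size.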
