verypluming/JSICK

Repository for JSICK

License

CC-BY-SA-4.0 license

Japanese Sentences Involving Compositional Knowledge (JSICK) Dataset/JSICK-stress Test Set

Japanese Sentences Involving Compositional Knowledge (JSICK) Dataset

JSICK is a Japanese NLI and STS dataset created by manually translating the English dataset SICK (Marelli et al., 2014) into Japanese. We hope that our dataset will be useful in research for realizing more advanced models that are capable of appropriately performing multilingual compositional inference. The JSICK dataset and the JSICK-stress test set are also available as a Hugging Face dataset.

The dataset is split into train.tsv and test.tsv.

Name	Description
pair_ID	IDs (the same as in the original SICK)
sentence_A_En	first sentence in English
sentence_B_En	second sentence in English
entailment_label_En	original entailment label in English
relatedness_score_En	original relatedness score in the range [1-5] in English
corr_entailment_labelAB_En	corrected entailment label from A to B in English by (Kalouli et al., 2017)
corr_entailment_labelBA_En	corrected entailment label from B to A in English by (Kalouli et al., 2017)
sentence_A_Ja	first sentence in Japanese
sentence_B_Ja	second sentence in Japanese
entailment_label_Ja	entailment label in Japanese
relatedness_score_Ja	relatedness score in the range [1-5] in Japanese
image_ID	original image in the 8K ImageFlickr dataset
original_caption	original caption in the 8K ImageFlickr dataset
semtag_short	linguistic phenomena tags in Japanese
semtag_long	details of linguistic phenomena tags in Japanese
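
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, assuming each split is a tab-separated file whose first row holds the column names listed in the table. The helper names read_jsick_rows and label_pairs are hypothetical, and the sample rows are placeholders rather than real JSICK data.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def read_jsick_rows(stream: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-separated split into one dict per row.

    Assumption: the first row is a header with the column names
    from the table above.
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def label_pairs(rows: List[Dict[str, str]]) -> Counter:
    """Count how often each (English label, Japanese label) pair occurs."""
    pairs: List[Tuple[str, str]] = [
        (row["entailment_label_En"], row["entailment_label_Ja"]) for row in rows
    ]
    return Counter(pairs)


# Placeholder rows for illustration only; these are not taken from JSICK.
SAMPLE = (
    "pair_ID\tentailment_label_En\tentailment_label_Ja\n"
    "1\tlabel_x\tlabel_x\n"
    "2\tlabel_x\tlabel_y\n"
)

rows = read_jsick_rows(io.StringIO(SAMPLE))
counts = label_pairs(rows)
for (label_en, label_ja), n in sorted(counts.items()):
    print(label_en, label_ja, n)
```

To run this on a real split, open train.tsv or test.tsv with encoding="utf-8" and newline="" and pass the file object to read_jsick_rows. Pairs whose two labels differ show where the Japanese annotation diverges from the original English one.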

JSICK-stress Test Set

The JSICK-stress test set is designed to investigate whether models capture word order and case particles in Japanese. It is built by transforming the syntactic structures of sentence pairs in JSICK, so that we can analyze whether models attend to word order and case particles when predicting entailment labels and similarity scores. The JSICK test set contains 1666, 797, and 1006 sentence pairs (A, B) whose premise sentences A (the column sentence_A_Ja_origin) include the basic word order involving ga-o (nominative-accusative), ga-ni (nominative-dative), and ga-de (nominative-instrumental/locative) relations, respectively. We provide the JSICK-stress test set by transforming the syntactic structures of these pairs in the following three ways:

scrum_ga_o: a scrambled pair, where the word order of premise sentence A is scrambled into o-ga, ni-ga, or de-ga order, respectively.
ex_ga_o: a rephrased pair, where only the case particles (ga, o, ni, de) in premise A are swapped.
del_ga_o: a rephrased pair, where only the case particles (ga, o, ni) in premise A are deleted.
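
The three transformations can be illustrated on a single ga-o sentence. This is a toy sketch only, not the procedure used to build JSICK-stress: it assumes the premise is already segmented into two (noun phrase, case particle) arguments plus a predicate, and the function names and the example sentence are hypothetical.

```python
from typing import List, Tuple

Argument = Tuple[str, str]  # (noun phrase, case particle)


def render(args: List[Argument], predicate: str) -> str:
    """Join the arguments and the predicate into a surface string."""
    return "".join(noun + particle for noun, particle in args) + predicate


def scramble(args: List[Argument], predicate: str) -> str:
    """Reverse the order of the two arguments (ga-o becomes o-ga)."""
    first, second = args
    return render([second, first], predicate)


def exchange_particles(args: List[Argument], predicate: str) -> str:
    """Keep the noun order but swap the two case particles."""
    (noun1, particle1), (noun2, particle2) = args
    return render([(noun1, particle2), (noun2, particle1)], predicate)


def delete_particles(args: List[Argument], predicate: str) -> str:
    """Keep the noun order but drop the case particles."""
    return render([(noun, "") for noun, _ in args], predicate)


# Hypothetical premise, not taken from the dataset:
# "男性がギターを弾いている" (a man is playing a guitar).
args = [("男性", "が"), ("ギター", "を")]
predicate = "弾いている"

original = render(args, predicate)
scrambled = scramble(args, predicate)
exchanged = exchange_particles(args, predicate)
deleted = delete_particles(args, predicate)
print(original, scrambled, exchanged, deleted, sep="\n")
```

In this example scrambling leaves the grammatical roles unchanged, whereas exchanging the particles reverses who does what to whom. A model therefore has to read the particles, not just the surface order of the nouns, to tell the two apart.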

The file jsick/jsick-all-annotations.tsv contains the JSICK raw annotations, and the file jsick-stress/jsick-stress-all-annotations.tsv is a subset of the JSICK-stress test set annotated with human judgements.

References

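Hitomi Yanaka, Koji Mineshima. Compositional Evaluation on Japanese Textual Entailment and Similarity. Transactions of the Association for Computational Linguistics, 2022. (TACL 2022) [arXiv]
Hitomi Yanaka, Koji Mineshima. JSICK: Construction of a Japanese Compositional Inference and Similarity Dataset. The 35th Annual Conference of the Japanese Society for Artificial Intelligence, 2021. (in Japanese)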

If you use this dataset in any published research, please cite the following:

@article{yanaka-mineshima-2022-compositional,
    title = "Compositional Evaluation on {J}apanese Textual Entailment and Similarity",
    author = "Yanaka, Hitomi and Mineshima, Koji",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "10",
    year = "2022",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/2022.tacl-1.73",
    doi = "10.1162/tacl_a_00518",
    pages = "1266--1284",
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
