nlp-titech/copa-japanese
Choice of Plausible Alternatives (COPA) is a dataset for open-domain commonsense causal reasoning. This dataset (Japanese COPA) provides Japanese translations of all sentences (premise, answer1, and answer2) in the original English dataset.
Each line of the dataset presents one question (premise, answer1, answer2, etc.) in JSON Lines format.
Name | Description |
---|---|
id | Question id |
premise | Premise for the question |
asks_for | Type of the answer: 原因 (reason) or 結果 (result) |
correct_answer | Correct answer: 1 or 2 |
answer1 | Answer 1 |
answer2 | Answer 2 |
{ "id": 1, "premise": "草の上に私の影ができた。", "asks_for": "原因", "correct_answer": 1, "answer1": "太陽が昇っていた。", "answer2": "草が刈られていた。"}
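A minimal sketch of reading this format in Python, using only the standard `json` module. The helper names `parse_jsonl` and `correct_answer_text` are illustrative and not part of the dataset; the sample line is the example record above.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def correct_answer_text(question):
    """Return the text of the answer selected by correct_answer (1 or 2)."""
    return question[f"answer{question['correct_answer']}"]


# The example record shown above, as one JSON Lines entry.
sample = [
    '{ "id": 1, "premise": "草の上に私の影ができた。", "asks_for": "原因", '
    '"correct_answer": 1, "answer1": "太陽が昇っていた。", "answer2": "草が刈られていた。"}'
]

questions = parse_jsonl(sample)
print(correct_answer_text(questions[0]))  # 太陽が昇っていた。
```

Because a file object opened in text mode iterates line by line, the same helper should work on the dataset file itself, e.g. `parse_jsonl(open(path, encoding="utf-8"))`, where `path` stands for wherever the file is stored locally.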
Questions with IDs from 1 to 500 form the development set, and those with IDs from 501 to 1000 form the test set.
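The split can be recovered from the `id` field alone. The sketch below assumes the questions have already been loaded as a list of dicts; `split_by_id` is a hypothetical helper, and the records used here are stand-ins that carry only an `id`.

```python
def split_by_id(questions):
    """Split questions into (development, test) lists using the ID ranges stated above."""
    dev = [q for q in questions if 1 <= q["id"] <= 500]
    test = [q for q in questions if 501 <= q["id"] <= 1000]
    return dev, test


# Stand-in records at the boundaries of each range.
questions = [{"id": i} for i in (1, 500, 501, 1000)]
dev, test = split_by_id(questions)
print([q["id"] for q in dev])   # [1, 500]
print([q["id"] for q in test])  # [501, 1000]
```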
Some premises and answers contain Japanese pronouns such as 彼 (he, him, his), 彼女 (she, her), それ (it), and 彼/彼女ら (they, their, them). The antecedents of these pronouns have been resolved and annotated in the dataset.
Format: [pronoun](antecedent)
Example (id: 3)
- Premise: 女性たちは会ってコーヒーを飲みに行った。
- Answer(原因): [彼女ら](女性たち)は互いの近況を語り合いたかった。
Example (id: 14)
- Premise: 犯罪者が仮釈放の条件に違反した。
- Answer(結果): [彼女](犯罪者)は刑務所に送り返された。
The following Python code rewrites an annotated sentence with either the pronoun or its antecedent:
import re
p = re.compile(r'\[([^]]*)\]\(([^)]*)\)')
input_text = '[彼女ら](女性たち)は互いの近況を語り合いたかった。'
text_with_pronoun = p.sub(r'\1', input_text)
text_with_antecedent = p.sub(r'\2', input_text)
print(text_with_pronoun)  # 彼女らは互いの近況を語り合いたかった。
print(text_with_antecedent)  # 女性たちは互いの近況を語り合いたかった。
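The same pattern can be applied to a whole question record. The following is a sketch, not part of the dataset tooling: `render` and `annotation_pairs` are hypothetical helpers, and the record is a partial one built from the id 14 example, with the answer placed in `answer1` purely for illustration.

```python
import re

# Matches [pronoun](antecedent); group 1 is the pronoun, group 2 the antecedent.
ANNOTATION = re.compile(r'\[([^]]*)\]\(([^)]*)\)')
TEXT_FIELDS = ("premise", "answer1", "answer2")


def render(question, use_antecedent=False):
    """Return a copy of the record with annotations replaced in every text field present."""
    group = r'\2' if use_antecedent else r'\1'
    rendered = dict(question)
    for field in TEXT_FIELDS:
        if field in rendered:
            rendered[field] = ANNOTATION.sub(group, rendered[field])
    return rendered


def annotation_pairs(text):
    """List the (pronoun, antecedent) pairs annotated in a sentence."""
    return ANNOTATION.findall(text)


# Partial record based on the id 14 example; the answer slot is an assumption.
question = {
    "premise": "犯罪者が仮釈放の条件に違反した。",
    "answer1": "[彼女](犯罪者)は刑務所に送り返された。",
}

print(render(question)["answer1"])                       # 彼女は刑務所に送り返された。
print(render(question, use_antecedent=True)["answer1"])  # 犯罪者は刑務所に送り返された。
print(annotation_pairs(question["answer1"]))             # [('彼女', '犯罪者')]
```

Returning a copy leaves the annotated original available, which is convenient when both the pronoun and antecedent variants are needed for comparison.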
BSD 2-Clause License