nlp-titech/copa-japanese

COPA Dataset in Japanese

License

BSD-2-Clause license

Folders and files

Name
COPA-ja.jsonl
LICENSE
README.md

Japanese COPA Dataset

Choice of Plausible Alternatives (COPA) is a dataset for open-domain commonsense causal reasoning. This dataset (Japanese COPA) provides Japanese translations of all sentences (premise, answer1, and answer2) in the original English dataset.

Dataset Description

Each line of the dataset presents one question (premise, answer1, answer2, etc.) in JSON Lines format.

Name	Description
id	Question id
premise	Premise for the question
asks_for	Type of the answer: 原因 (reason) or 結果 (result)
correct_answer	Correct answer: 1 or 2
answer1	Answer 1
answer2	Answer 2

{"id": 1, "premise": "草の上に私の影ができた。", "asks_for": "原因", "correct_answer": 1, "answer1": "太陽が昇っていた。", "answer2": "草が刈られていた。"}
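As a minimal sketch of how one such record might be read in Python, the snippet below parses a single JSON line and looks up the text of the correct answer. The `CopaItem` class and the `parse_item` helper are hypothetical names introduced here for illustration; they are not part of the dataset or the repository.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CopaItem:
    """One question, mirroring the six fields in the table above."""
    id: int
    premise: str
    asks_for: str        # "原因" or "結果"
    correct_answer: int  # 1 or 2
    answer1: str
    answer2: str

    def correct_text(self) -> str:
        # Return the answer sentence selected by correct_answer.
        return self.answer1 if self.correct_answer == 1 else self.answer2


def parse_item(line: str) -> CopaItem:
    """Parse one JSON Lines record (hypothetical helper)."""
    data = json.loads(line)
    # Ignore any keys beyond the six documented ones (assumption: extras are safe to drop).
    known = {f.name for f in fields(CopaItem)}
    return CopaItem(**{k: v for k, v in data.items() if k in known})


line = ('{"id": 1, "premise": "草の上に私の影ができた。", "asks_for": "原因", '
        '"correct_answer": 1, "answer1": "太陽が昇っていた。", "answer2": "草が刈られていた。"}')
item = parse_item(line)
print(item.correct_text())  # 太陽が昇っていた。
```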

Questions with IDs 1 to 500 form the development set, and those with IDs 501 to 1000 form the test set.
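The ID-based split can be sketched as follows. The `split_by_id` function is a hypothetical helper, and the sample records are stubs that carry only an `id` so the example runs without the data file.

```python
import json
from typing import Iterable


def split_by_id(lines: Iterable[str]):
    """Split JSON Lines records into (dev, test) by question id (hypothetical helper)."""
    dev, test = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        if 1 <= item["id"] <= 500:
            dev.append(item)
        elif 501 <= item["id"] <= 1000:
            test.append(item)
    return dev, test


# Stub records containing only ids, for illustration.
sample = ['{"id": 1}', '{"id": 500}', '{"id": 501}', '{"id": 1000}']
dev, test = split_by_id(sample)
print(len(dev), len(test))  # 2 2

# With the real file (assuming COPA-ja.jsonl is in the working directory):
# with open("COPA-ja.jsonl", encoding="utf-8") as f:
#     dev, test = split_by_id(f)
```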

Anaphora resolution

Some premises and answers contain Japanese pronouns such as 彼 (he, him, his), 彼女 (she, her), それ (it), and 彼/彼女ら (they, their, them). The antecedents of these pronouns were resolved and are annotated in the dataset.

Format: [pronoun](antecedent)

Example (id: 3)

Premise: 女性たちは会ってコーヒーを飲みに行った。
Answer(原因): [彼女ら](女性たち)は互いの近況を語り合いたかった。

Example (id: 14)

Premise: 犯罪者が仮釈放の条件に違反した。
Answer(結果): [彼女](犯罪者)は刑務所に送り返された。

We can replace pronouns with antecedents by running the Python code:

import re

# Matches the annotation format [pronoun](antecedent)
p = re.compile(r'\[([^]]*)\]\(([^)]*)\)')
input_text = '[彼女ら](女性たち)は互いの近況を語り合いたかった。'
text_with_pronoun = p.sub(r'\1', input_text)     # keep the pronoun
text_with_antecedent = p.sub(r'\2', input_text)  # keep the antecedent
print(text_with_pronoun)     # 彼女らは互いの近況を語り合いたかった。
print(text_with_antecedent)  # 女性たちは互いの近況を語り合いたかった。
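The same substitution could be applied to every text field of a record at once. The sketch below assumes the annotations may appear in `premise`, `answer1`, and `answer2`; `resolve_item` is a hypothetical helper. The sample record is partial: it uses the id 14 example, with the answer placed under `answer1` purely for illustration.

```python
import re

# Same [pronoun](antecedent) pattern as in the README snippet.
ANAPHORA = re.compile(r'\[([^]]*)\]\(([^)]*)\)')
TEXT_FIELDS = ("premise", "answer1", "answer2")


def resolve_item(item: dict, use_antecedent: bool = True) -> dict:
    """Return a copy of item with annotations replaced (hypothetical helper).

    use_antecedent=True keeps the antecedent; False keeps the pronoun.
    """
    group = r'\2' if use_antecedent else r'\1'
    resolved = dict(item)
    for key in TEXT_FIELDS:
        if key in item:  # tolerate partial records
            resolved[key] = ANAPHORA.sub(group, item[key])
    return resolved


# Partial record for illustration only (field placement is an assumption).
item = {
    "premise": "犯罪者が仮釈放の条件に違反した。",
    "answer1": "[彼女](犯罪者)は刑務所に送り返された。",
}
print(resolve_item(item)["answer1"])                        # 犯罪者は刑務所に送り返された。
print(resolve_item(item, use_antecedent=False)["answer1"])  # 彼女は刑務所に送り返された。
```

Returning a copy leaves the original annotated record available, which may be convenient when comparing pronoun and antecedent variants.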

License

BSD 2-Clause License
