COMET-ATOMIC ja


We built a commonsense knowledge graph of events in Japanese, with reference to ATOMIC and COMET. The graph was built from scratch, without translation.

We obtained the seed graph by Yahoo! Crowdsourcing and expanded it by in-context learning with HyperCLOVA JP.

Data

The graphs are in JSON Lines format. Each line contains an event and its inferences for the four relation types, which are derived from those in ATOMIC.

We rearranged the relation types in ATOMIC by considering two dimensions: inference categories and time series. Therefore, the graph covers the following relations:

|        | Event   | Mental state |
| ------ | ------- | ------------ |
| Before | xNeed   | xIntent      |
| After  | xEffect | xReact       |

An example of the JSON objects is as follows:

{"event":"Xが顔を洗う","inference": {"event": {"before": ["Xが水道で水を出す"            ],"after": ["Xがタオルを準備する","Xが鏡に映った自分の顔に覚えのない傷を見つける","Xが歯磨きをする"            ]        },"mental_state": {"before": ["スッキリしたい","眠いのでしゃきっとしたい"            ],"after": ["さっぱりして眠気覚ましになる","きれいになる","さっぱりした"            ]        }    }}

graph.jsonl is the original graph built in this paper, while graph_v2.jsonl is the larger one expanded in this paper. The graphs with mrph in their filenames contain the triples whose heads and tails were segmented into words by Juman++.
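
Because each line holds one head event and all of its inferences, the graphs can be streamed line by line. Below is a minimal sketch for reading graph.jsonl, assuming the field names shown in the example above; the inner keys map to relations as event/before = xNeed, event/after = xEffect, mental_state/before = xIntent, and mental_state/after = xReact.

```python
import json

# Minimal sketch for streaming the graph, assuming the JSON Lines layout above:
# event/before = xNeed, event/after = xEffect,
# mental_state/before = xIntent, mental_state/after = xReact.
with open("graph.jsonl", encoding="utf-8") as f:
    for line in f:
        entry = json.loads(line)
        head = entry["event"]
        for tail in entry["inference"]["event"]["before"]:  # xNeed tails
            print(head, "-> xNeed ->", tail)
```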

The original graph and v2 have 1,471 and 1,429 unique heads, respectively. The numbers of unique triples in the graphs are as follows:

| Relation | Original | V2     |
| -------- | -------- | ------ |
| xNeed    | 9,403    | 44,269 |
| xEffect  | 8,792    | 36,920 |
| xIntent  | 10,155   | 52,745 |
| xReact   | 10,941   | 60,616 |

For the original graph, ten inferences were generated for each event and relation. v2 was expanded by generating ten times as many inferences, i.e., 100 inferences for each event and relation. Note that in both graphs, duplicated triples were removed.

Models

We finetuned Japanese GPT-2 and T5 on the built graph. The models are available at Hugging Face Models:

Note that the v2 models were finetuned on the expanded graph.

For the GPT-2-based model, special tokens for the four relations are added to the vocabulary. Input a pair of a head and a relation special token to generate a tail. Note that the head should be segmented into words by Juman++, due to the base model.
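
As a rough illustration, generation with the GPT-2-based model might look like the sketch below. The model ID and the exact form of the relation special token are assumptions, so check the Hugging Face model card; only the Juman++ pre-segmentation of the head is stated above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model ID; see the Hugging Face model card for the exact name.
model_name = "nlp-waseda/comet-gpt2-small-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The head must be segmented into words with Juman++ beforehand.
head = "X が 顔 を 洗う"
relation_token = "<xNeed>"  # hypothetical form of the relation special token

inputs = tokenizer(head + relation_token, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```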

The T5-based model infers a tail with a prompt in natural language. The prompts are different for each relation.
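
A corresponding sketch for the T5-based model is below. Both the model ID and the Japanese prompt are hypothetical; the actual per-relation prompt templates are defined in this repository and on the model card.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical model ID and prompt; the real per-relation prompt templates
# are defined in this repository and on the model card.
model_name = "nlp-waseda/comet-t5-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = "Xが顔を洗う。そのためにXに必要なことは何？"  # hypothetical xNeed prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```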

These two models were trained on 90% of the graph. The evaluation results on the remaining 10% are as follows:

| Model            | BLEU  | BERTScore |
| ---------------- | ----- | --------- |
| COMET-GPT2 ja    | 43.61 | 87.56     |
| COMET-GPT2 ja v2 |       |           |
| COMET-T5 ja      | 39.85 | 82.37     |

COMET-GPT2 ja v2 will be evaluated soon.
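
For reference, a generic way to score generated tails against held-out reference tails with BLEU and BERTScore is sketched below; this is not necessarily the exact evaluation setup used in the paper, and the sample strings are made up.

```python
from sacrebleu import corpus_bleu
from bert_score import score

# Toy hypothesis/reference pair; not taken from the actual test split.
hypotheses = ["Xが水道で水を出す"]
references = [["Xが蛇口をひねる"]]  # one reference stream, aligned with hypotheses

bleu = corpus_bleu(hypotheses, references, tokenize="ja-mecab")  # Japanese tokenizer
P, R, F1 = score(hypotheses, references[0], lang="ja")
print(bleu.score, F1.mean().item())
```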

Training

You can finetune the models on the graph. Note that there are separate scripts for GPT-2 and T5.

An example of finetuning GPT-2 is as follows:

```sh
pip install -r requirements.txt
python train_gpt2.py \
    --graph_jsonl graph_mrph.jsonl \
    --model_name_or_path nlp-waseda/gpt2-small-japanese \
    --output_dir comet_gpt2 \
    --batch_size 16 \
    --learning_rate 2e-5 \
    --num_epochs 3
```

References

```bibtex
@misc{ide2023phalm,
    title={PHALM: Building a Knowledge Graph from Scratch by Prompting Humans and a Language Model},
    author={Tatsuya Ide and Eiki Murata and Daisuke Kawahara and Takato Yamazaki and Shengzhe Li and Kenta Shinzato and Toshinori Sato},
    year={2023},
    eprint={2310.07170},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
