ku-nlp/VISA


An ambiguous subtitles dataset for visual scene-aware machine translation

License

GPL-3.0 license

Folders and files

omission
polysemy
LICENSE
README.md


VISA

VISA is a dataset that consists of 40k Japanese-English parallel sentence pairs and corresponding video clips with the following key features:

The parallel sentences are subtitles from movies and TV episodes
The source subtitles are ambiguous, which means they have multiple possible translations with different meanings
We divide the dataset into Polysemy and Omission according to the cause of ambiguity

Examples:

Polysemy:

放せ！ --> Let me go!

Omission:

銃を持ってる。 --> I have a gun.

Splits:

Split	Train	Validation	Test
Polysemy	18,666	1,000	1,000
Omission	17,214	1,000	1,000
Combined	35,880	2,000	2,000

Usage:

You can read the JSON files to find the mapping from videos to parallel subtitle pairs.

JSON File Structure:

video_file_name: {
    { "ja": Japanese_subtitle },
    { "en": English_subtitle }
}
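
The mapping described above could be loaded along the following lines. This is a minimal sketch, not the authors' own loader: the function names `normalize_entry` and `load_subtitle_pairs` are hypothetical, and the exact on-disk layout is an assumption. The sketch accepts either a single object with "ja" and "en" keys per video, or a list of single-key objects, since the structure shown above can be read both ways.

```python
import json
from pathlib import Path


def normalize_entry(entry):
    """Return a {"ja": ..., "en": ...} dict from one JSON value.

    Assumption: the value is either a dict holding both "ja" and "en",
    or a list of single-key dicts such as [{"ja": ...}, {"en": ...}].
    """
    if isinstance(entry, dict):
        merged = dict(entry)
    else:
        merged = {}
        for part in entry:
            merged.update(part)
    return {"ja": merged["ja"], "en": merged["en"]}


def load_subtitle_pairs(json_path):
    """Map each video file name to its Japanese-English subtitle pair."""
    with Path(json_path).open(encoding="utf-8") as f:
        raw = json.load(f)
    return {video: normalize_entry(entry) for video, entry in raw.items()}
```

Calling `load_subtitle_pairs` on one of the JSON files should then give a dictionary keyed by video file name, so a pair can be looked up as `pairs[video_file_name]["ja"]` and `pairs[video_file_name]["en"]`. If the real files use a different layout, `normalize_entry` is the only place that would need to change.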

Note:

Please note that by downloading the dataset, you agree to the following conditions:

Do not re-distribute the dataset without our permission.
The dataset can only be used for research purposes. Any other use is explicitly prohibited.

Downloadable Features:

If you are interested in the video features of VISA, you can download them from the following links:

The I3D Features of VISA: http://lotus.kuee.kyoto-u.ac.jp/~yihang/dataset/VISA_i3d.zip
The RCNN Features of VISA: http://lotus.kuee.kyoto-u.ac.jp/~yihang/dataset/VISA_rcnn.zip

Citation:

If you find this dataset helpful, please cite our publication "VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation":

@inproceedings{li-etal-2022-visa,
    title = "{VISA}: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation",
    author = "Li, Yihang  and
      Shimizu, Shuichiro  and
      Gu, Weiqi  and
      Chu, Chenhui  and
      Kurohashi, Sadao",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.725",
    pages = "6735--6743",
}

Contact:

If you have any questions about this dataset, please contact liyh@nlp.ist.i.kyoto-u.ac.jp.

License:

GNU General Public License v3.0
