An ambiguous subtitles dataset for visual scene-aware machine translation


ku-nlp/VISA


VISA is a dataset that consists of 40k Japanese-English parallel sentence pairs and corresponding video clips with the following key features:

  • The parallel sentences are subtitles from movies and TV episodes
  • The source subtitles are ambiguous, which means they have multiple possible translations with different meanings
  • We divide the dataset into Polysemy and Omission according to the cause of ambiguity

Examples:

Polysemy:

放せ! --> Let me go!

(video clip: let_me_go)

Omission:

銃を持ってる。 --> I have a gun.

(video clip: I_carry_a_gun)

Splits:

| Split    | Train  | Validation | Test  |
|----------|--------|------------|-------|
| Polysemy | 18,666 | 1,000      | 1,000 |
| Omission | 17,214 | 1,000      | 1,000 |
| Combined | 35,880 | 2,000      | 2,000 |

Usage:

Read the JSON files to find the mapping from each video file to its parallel subtitle pair.

JSON file structure:

video_file_name: {
    { "ja": Japanese_subtitle },
    { "en": English_subtitle }
}
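
The structure shown above is schematic rather than literal JSON, so the exact on-disk layout is an assumption here. The following minimal Python sketch accepts either plausible reading: each video entry as a single object with "ja" and "en" keys, or as a list of two single-key objects. The function names, the sample video file names, and the `.mp4` extension are hypothetical and only serve the illustration; the subtitle text is taken from the examples above.

```python
import json
from pathlib import Path


def normalize_entry(entry):
    """Return a (ja, en) tuple from one video entry.

    Assumption: an entry is either {"ja": ..., "en": ...} or a list such as
    [{"ja": ...}, {"en": ...}]. Adjust this if the real files differ.
    """
    if isinstance(entry, list):
        merged = {}
        for part in entry:
            merged.update(part)
        entry = merged
    return entry["ja"], entry["en"]


def pairs_from_mapping(data):
    """Map each video file name to its (Japanese, English) subtitle pair."""
    return {video: normalize_entry(entry) for video, entry in data.items()}


def load_pairs(json_path):
    """Load one annotation file and return the video-to-subtitles mapping."""
    with Path(json_path).open(encoding="utf-8") as f:
        return pairs_from_mapping(json.load(f))


# Inline sample with hypothetical file names, covering both assumed layouts.
sample = json.loads(
    '{"let_me_go.mp4": {"ja": "放せ!", "en": "Let me go!"},'
    ' "I_carry_a_gun.mp4": [{"ja": "銃を持ってる。"}, {"en": "I have a gun."}]}'
)

for video, (ja, en) in pairs_from_mapping(sample).items():
    print(f"{video}: {ja} --> {en}")
```

To use it on the downloaded data, call `load_pairs` with the path to one of the dataset's JSON files and join the returned keys against the video directory.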

Note:

Please note that by downloading the dataset, you agree to the following conditions:

  • Do not re-distribute the dataset without our permission.
  • The dataset can only be used for research purposes. Any other use is explicitly prohibited.

Downloadable Features:

If you are interested in the video features of VISA, you can download them from the following links:

Citation:

If you find this dataset helpful, please cite our publication "VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation":

@inproceedings{li-etal-2022-visa,
    title = "{VISA}: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation",
    author = "Li, Yihang  and
      Shimizu, Shuichiro  and
      Gu, Weiqi  and
      Chu, Chenhui  and
      Kurohashi, Sadao",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.725",
    pages = "6735--6743",
}

Contact:

If you have any questions about this dataset, please contact liyh@nlp.ist.i.kyoto-u.ac.jp.

License:

GNU General Public License v3.0
