The Icelandic translation of the ATIS dataset
egillanton/ice-atis
The ICE-ATIS dataset aims to be an Icelandic version of the ATIS (Airline Travel Information System) dataset.
The dataset was translated using a combination of machine translation, natural language processing, and manual translation.
Original sample from the ATIS dataset:
BOS a listing of all flights from boston to baltimore before 10 am on thursday EOS O O O O O O O B-fromloc.city_name O B-toloc.city_name B-depart_time.time_relative B-depart_time.time I-depart_time.time O B-depart_date.day_name atis_flight
Extracted text:
a listing of all flights from boston to baltimore before 10 am on thursday
Google machine translation:
skrá yfir allt flug frá boston til baltimore fyrir klukkan 10 á fimmtudag
Manually edited:
lista yfir öll flug frá boston til baltimore fyrir klukkan 10 á fimmtudag
Manually labeled slot tags and intent:
BOS lista yfir öll flug frá boston til baltimore fyrir klukkan 10 á fimmtudag EOS O O O O O B-fromloc.city_name O B-toloc.city_name B-depart_time.time_relative B-depart_time.time I-depart_time.time O B-depart_date.day_name atis_flight
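The line format shown in these samples can be read programmatically. The following is a minimal sketch, not part of this repository: it assumes that every line starts with `BOS`, that the labels begin right after `EOS`, and that the last label is the intent. The function name `parse_iob_line` is hypothetical.

```python
from typing import List, Tuple


def parse_iob_line(line: str) -> Tuple[List[str], List[str], str]:
    """Split one line into words, slot tags, and intent.

    Assumed layout: BOS w1 ... wn EOS <tag for BOS> t1 ... tn <intent>
    """
    fields = line.split()
    eos = fields.index("EOS")   # position of the end-of-sentence marker
    words = fields[1:eos]       # utterance without BOS and EOS
    labels = fields[eos + 1:]   # one label per token, BOS and EOS included
    intent = labels[-1]         # the label aligned with EOS is the intent
    tags = labels[1:-1]         # drop the BOS label and the intent
    return words, tags, intent


# The original ATIS sample quoted above.
sample = (
    "BOS a listing of all flights from boston to baltimore before 10 am on thursday EOS "
    "O O O O O O O B-fromloc.city_name O B-toloc.city_name "
    "B-depart_time.time_relative B-depart_time.time I-depart_time.time O "
    "B-depart_date.day_name atis_flight"
)

words, tags, intent = parse_iob_line(sample)
for word, tag in zip(words, tags):
    print(f"{word}\t{tag}")
print("intent:", intent)
```

Splitting on any whitespace keeps the sketch independent of whether the separator after `EOS` is a tab or a space.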
To re-label the IOB slot tags, the following text annotation tool was created and used:
Text Annotation Tool: https://github.com/egillanton/flask-text-annotation-tool
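Because translation changes the number of words in a sentence, a re-labeled line can easily end up with too many or too few tags. The sketch below shows one way such lines could be sanity-checked. It is an illustration only, not a feature of the annotation tool; the function name `check_iob_line` and both example lines are hypothetical.

```python
from typing import List


def check_iob_line(line: str) -> List[str]:
    """Return a list of problems found in one labeled line (empty if none)."""
    fields = line.split()
    if not fields or fields[0] != "BOS" or "EOS" not in fields:
        return ["line must start with BOS and contain EOS"]

    eos = fields.index("EOS")
    tokens = fields[:eos + 1]   # BOS ... EOS
    labels = fields[eos + 1:]   # slot tags followed by the intent
    problems = []

    # Every token, BOS and EOS included, needs exactly one label.
    if len(labels) != len(tokens):
        problems.append(f"{len(tokens)} tokens but {len(labels)} labels")

    # An I- tag must continue a B- or I- tag of the same slot type.
    previous = "O"
    for tag in labels[:-1]:     # the last label is the intent
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            problems.append(f"{tag} does not continue a matching B-/I- tag")
        previous = tag
    return problems


# Hypothetical lines: the second one is missing an "O" tag.
good = "BOS flug frá boston EOS O O O B-fromloc.city_name atis_flight"
bad = "BOS flug frá boston EOS O O B-fromloc.city_name atis_flight"

print(check_iob_line(good))
print(check_iob_line(bad))
```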
The original dataset can be obtained from Kaggle, but I use these files instead, mainly because they are already preprocessed into two files that are simple to work with. The Kaggle files can still be used for reference when labeling the slot tags and intents.
ATIS dataset:
    $ wc ./ATIS/*
      893  21900  153924 ATIS/atis.test.w-intent.iob
     4978 132312  900059 ATIS/atis.train.w-intent.iob
     5871 154212 1053983 total

    $ python ./ATIS/script.py
    ATIS
    Vocabsize for train_data: 897
    Vocabsize for test_data: 450
    Nr. of unseen words in test set: 52
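The statistics above could be computed along the following lines. This is a sketch under assumptions, not the contents of `script.py`: it assumes the vocabulary is the set of distinct words between `BOS` and `EOS`, and that "unseen" means words that occur in the test set but not in the training set. The helper name `vocabulary` and the toy lines are hypothetical, and the real script may count differently.

```python
from typing import Iterable, Set


def vocabulary(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct words that appear between BOS and EOS."""
    vocab: Set[str] = set()
    for line in lines:
        fields = line.split()
        if "EOS" not in fields:
            continue            # skip blank or malformed lines
        eos = fields.index("EOS")
        vocab.update(fields[1:eos])
    return vocab


# Toy data standing in for the train and test files (hypothetical lines).
train_lines = [
    "BOS flights from boston EOS O O O B-fromloc.city_name atis_flight",
    "BOS flights to denver EOS O O O B-toloc.city_name atis_flight",
]
test_lines = [
    "BOS flights from dallas EOS O O O B-fromloc.city_name atis_flight",
]

train_vocab = vocabulary(train_lines)
test_vocab = vocabulary(test_lines)
unseen = test_vocab - train_vocab   # test words never seen in training

print("Vocab size for train data:", len(train_vocab))
print("Vocab size for test data:", len(test_vocab))
print("Unseen words in test set:", len(unseen))
```

To run it on the real data, pass open file objects instead of the toy lists, for example `vocabulary(open("ATIS/atis.train.w-intent.iob", encoding="utf-8"))`.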
Sample:
    $ head -1 ATIS/atis.test.w-intent.iob
    BOS i would like to find a flight from charlotte to las vegas that makes a stop in st. louis EOS O O O O O O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name O O O O O B-stoploc.city_name I-stoploc.city_name atis_flight
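To illustrate how the `B-` and `I-` tags in this sample group words into slot values, here is a small sketch that collects each slot as a (name, text) pair. The function name `extract_slots` is hypothetical, and the parsing assumes the same layout as the sample line: labels start right after `EOS`, and the last label is the intent.

```python
from typing import List, Tuple


def extract_slots(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged words into (slot name, slot text) pairs."""
    slots: List[Tuple[str, str]] = []
    name, parts = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if name is not None:            # close the slot that was open
                slots.append((name, " ".join(parts)))
            name, parts = tag[2:], [word]   # start a new slot
        elif tag.startswith("I-") and name == tag[2:]:
            parts.append(word)              # continue the open slot
        else:
            if name is not None:            # "O" or a stray I- tag ends the slot
                slots.append((name, " ".join(parts)))
            name, parts = None, []
    if name is not None:                    # slot that runs to the end of the line
        slots.append((name, " ".join(parts)))
    return slots


line = (
    "BOS i would like to find a flight from charlotte to las vegas "
    "that makes a stop in st. louis EOS "
    "O O O O O O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name "
    "O O O O O B-stoploc.city_name I-stoploc.city_name atis_flight"
)

fields = line.split()
eos = fields.index("EOS")
words = fields[1:eos]        # utterance without BOS and EOS
tags = fields[eos + 2:-1]    # skip the BOS label, drop the intent
slots = extract_slots(words, tags)
print(slots)
# [('fromloc.city_name', 'charlotte'), ('toloc.city_name', 'las vegas'), ('stoploc.city_name', 'st. louis')]
```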
ICE-ATIS dataset:
    $ wc ./ICE-ATIS/ice_atis.train.w-intent.iob ICE-ATIS/ice_atis.test.w-intent.iob
     4978 136516  954071 ICE-ATIS/ice_atis.train.w-intent.iob
      893  23339  166491 ICE-ATIS/ice_atis.test.w-intent.iob
     5871 159855 1120562 total

    $ python ./ICE-ATIS/script.py
    ICE-ATIS
    Vocabsize for train_data: 1380
    Vocabsize for test_data: 640
    Nr. of unseen words in test set: 127
Sample:
    $ head -1 ICE-ATIS/ice_atis.test.w-intent.iob
    BOS ég væri til í að finna flug frá charlotte til las vegas sem stoppar í st. louis EOS O O O O O O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name O O O B-stoploc.city_name I-stoploc.city_name O atis_flight