The Icelandic translation of the ATIS dataset

egillanton/ice-atis

The Icelandic translation of the ATIS (Airline Travel Information System) dataset.

Introduction

The ICE-ATIS dataset is an Icelandic version of the ATIS (Airline Travel Information System) dataset.

The dataset was translated using a combination of machine translation, natural language processing, and manual translation.

An Example

Original sample from the ATIS dataset:

BOS a listing of all flights from boston to baltimore before 10 am on thursday EOS        O O O O O O O B-fromloc.city_name O B-toloc.city_name B-depart_time.time_relative B-depart_time.time I-depart_time.time O B-depart_date.day_name atis_flight

Extracted text:

a listing of all flights from boston to baltimore before 10 am on thursday

Google machine translation:

skrá yfir allt flug frá boston til baltimore fyrir klukkan 10 á fimmtudag

Manually edited:

lista yfir öll flug frá boston til baltimore fyrir klukkan 10 á fimmtudag

Manually labeled slot tags and intents:

BOS lista yfir öll flug frá boston til baltimore fyrir klukkan 10 á fimmtudag EOS        O O O O O B-fromloc.city_name O B-toloc.city_name B-depart_time.time_relative B-depart_time.time I-depart_time.time O B-depart_date.day_name atis_flight

For the task of re-labeling the IOB slot-tags, the following text annotation tool was created and used:

Text Annotation Tool: https://github.com/egillanton/flask-text-annotation-tool
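The "Extracted text" step above can be sketched in a few lines of Python. This is only an illustration of the line format shown in the example, not the code that was actually used for the translation; the function name `parse_iob_line` is hypothetical. It assumes each line holds a sentence wrapped in `BOS ... EOS`, followed by whitespace-separated labels whose last entry is the intent.

```python
def parse_iob_line(line):
    """Split one '.w-intent.iob' line into tokens, slot tags, and intent.

    Hypothetical helper: assumes 'BOS <words> EOS <labels> <intent>'.
    """
    # Everything before ' EOS' is the sentence, everything after is labels.
    sentence, _, labels = line.strip().partition(" EOS")
    tokens = sentence.split()             # still includes the leading BOS
    *slot_tags, intent = labels.split()   # the last label is the intent
    return tokens, slot_tags, intent


sample = (
    "BOS a listing of all flights from boston to baltimore "
    "before 10 am on thursday EOS "
    "O O O O O O O B-fromloc.city_name O B-toloc.city_name "
    "B-depart_time.time_relative B-depart_time.time I-depart_time.time "
    "O B-depart_date.day_name atis_flight"
)

tokens, slot_tags, intent = parse_iob_line(sample)
text = " ".join(tokens[1:])  # drop BOS to get the plain sentence
print(text)
print(intent)
```

The plain sentence is what gets sent to machine translation; the slot tags and intent are kept aside for reference when re-labeling the translated sentence.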

Dataset

The original dataset can be obtained from Kaggle, but I use these obtained files instead, mainly because they are already preprocessed into two files that are simple to work with. We can still use the Kaggle files for reference when labeling the slot tags and intents.

ATIS dataset:

$ wc ./ATIS/*
   893   21900  153924 ATIS/atis.test.w-intent.iob
  4978  132312  900059 ATIS/atis.train.w-intent.iob
  5871  154212 1053983 total
$ python ./ATIS/script.py
ATIS
Vocabsize for train_data: 897
Vocabsize for test_data: 450
Nr. of unseen words in test set: 52

Sample:

$ head -1 ATIS/atis.test.w-intent.iob
BOS i would like to find a flight from charlotte to las vegas that makes a stop in st. louis EOS        O O O O O O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name O O O O O B-stoploc.city_name I-stoploc.city_name atis_flight
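To make the IOB slot tags in this sample easier to read, they can be grouped into named phrases: a `B-` tag opens a slot and matching `I-` tags extend it. The following is a minimal sketch of that grouping, using the sample line above as input; the function name `slot_spans` is an assumption for illustration and is not part of this repository.

```python
def slot_spans(tokens, tags):
    """Group IOB tags into (slot_name, phrase) pairs (illustrative helper)."""
    spans = []
    name, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new slot; flush the open one first.
            if name is not None:
                spans.append((name, " ".join(words)))
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            # An I- tag with the same name continues the open slot.
            words.append(token)
        else:
            # 'O' (or an I- tag that does not match) closes the open slot.
            if name is not None:
                spans.append((name, " ".join(words)))
            name, words = None, []
    if name is not None:
        spans.append((name, " ".join(words)))
    return spans


tokens = ("BOS i would like to find a flight from charlotte to las vegas "
          "that makes a stop in st. louis").split()
tags = ("O O O O O O O O O B-fromloc.city_name O B-toloc.city_name "
        "I-toloc.city_name O O O O O B-stoploc.city_name "
        "I-stoploc.city_name").split()

spans = slot_spans(tokens, tags)
for slot_name, phrase in spans:
    print(slot_name, "->", phrase)
```

Multi-word values such as "las vegas" and "st. louis" come out as single phrases, which is the unit that has to stay intact when the sentence is translated and re-labeled.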

ICE-ATIS dataset:

$ wc ./ICE-ATIS/ice_atis.train.w-intent.iob ICE-ATIS/ice_atis.test.w-intent.iob
  4978  136516  954071 ICE-ATIS/ice_atis.train.w-intent.iob
   893   23339  166491 ICE-ATIS/ice_atis.test.w-intent.iob
  5871  159855 1120562 total
$ python ./ICE-ATIS/script.py
ICE-ATIS
Vocabsize for train_data: 1380
Vocabsize for test_data: 640
Nr. of unseen words in test set: 127

Sample:

$ head -1 ICE-ATIS/ice_atis.test.w-intent.iob
BOS ég væri til í að finna flug frá charlotte til las vegas sem stoppar í st. louis EOS O O O O O O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name O O O B-stoploc.city_name I-stoploc.city_name O     atis_flight
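The vocabulary statistics reported above come from `script.py`, which is not reproduced here. The sketch below shows one way such numbers could be computed; it is an assumption, not the actual script. The helper name `vocabulary` is hypothetical, and the choice to exclude the `BOS` and `EOS` markers from the vocabulary is a guess, so results on the real files may differ from the figures above. Two sample lines from this README stand in for the train and test files.

```python
def vocabulary(lines):
    """Collect the distinct words between BOS and EOS (illustrative helper)."""
    vocab = set()
    for line in lines:
        # Keep only the sentence part, then skip the leading BOS token.
        sentence = line.strip().partition(" EOS")[0]
        vocab.update(sentence.split()[1:])
    return vocab


# Toy stand-ins for the train and test files (one line each).
train_lines = [
    "BOS a listing of all flights from boston to baltimore before 10 am "
    "on thursday EOS O O O O O O O B-fromloc.city_name O B-toloc.city_name "
    "B-depart_time.time_relative B-depart_time.time I-depart_time.time "
    "O B-depart_date.day_name atis_flight",
]
test_lines = [
    "BOS i would like to find a flight from charlotte to las vegas that "
    "makes a stop in st. louis EOS O O O O O O O O O B-fromloc.city_name "
    "O B-toloc.city_name I-toloc.city_name O O O O O B-stoploc.city_name "
    "I-stoploc.city_name atis_flight",
]

train_vocab = vocabulary(train_lines)
test_vocab = vocabulary(test_lines)
unseen = test_vocab - train_vocab  # words that occur only in the test data

print("Vocab size for train_data:", len(train_vocab))
print("Vocab size for test_data:", len(test_vocab))
print("Nr. of unseen words in test set:", len(unseen))
```

To run it on the real data, the two lists would be replaced by the lines read from the train and test `.iob` files.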
