Script for workflow to add morphological analysis into ELAN files
langdoc/elan-fst
This repository contains the scripts used in the workflows described in this paper, this paper and this poster. The idea is to populate lemma, part-of-speech and morphology tiers in ELAN files using open-source finite-state transducers developed by Giellatekno and in collaborative projects with them.
The script has been tested in various projects, and the languages it has been applied to up to now are Komi-Zyrian, Kildin Saami, Pite Saami and Northern Saami.
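To make the idea of populating tiers more concrete, the following is a minimal sketch, not the code of the scripts in this repository. It adds one dependent tier with reference annotations to an EAF document using Python's standard XML library. The function name `add_ref_tier`, the tier names and the `x` id prefix are assumptions chosen for illustration; a real file additionally needs a matching `LINGUISTIC_TYPE` definition and annotation ids that are unique across the whole document.

```python
import xml.etree.ElementTree as ET


def add_ref_tier(root, tier_id, parent_tier_id, linguistic_type, values, id_prefix="x"):
    """Add a dependent tier whose annotations point at parent annotations.

    `values` maps the id of a parent annotation to the text of the new
    annotation. All names here are illustrative, not taken from the scripts.
    """
    tier = ET.Element("TIER", {
        "TIER_ID": tier_id,
        "PARENT_REF": parent_tier_id,
        "LINGUISTIC_TYPE_REF": linguistic_type,
    })
    # Place the new tier right after the last existing TIER element so that
    # tiers stay grouped together in the document.
    tier_positions = [i for i, child in enumerate(root) if child.tag == "TIER"]
    position = tier_positions[-1] + 1 if tier_positions else len(root)
    root.insert(position, tier)

    for n, (parent_id, value) in enumerate(values.items(), start=1):
        annotation = ET.SubElement(tier, "ANNOTATION")
        ref = ET.SubElement(annotation, "REF_ANNOTATION", {
            "ANNOTATION_ID": f"{id_prefix}{n}",
            "ANNOTATION_REF": parent_id,
        })
        ET.SubElement(ref, "ANNOTATION_VALUE").text = value
    return tier
```

Calling the function once per tier (lemma, part-of-speech, morphology), each time with the previous tier as parent, would give the kind of tier hierarchy the workflow aims for.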
Migration of the source code from Giellatekno SVN to GitHub was done in Giellatekno-SVN revision 162169.
The script requires FST tools to be installed (xfst for the current version, but migration to hfst is planned) and access to compiled transducers. To compile the transducers, it is necessary to install the complete Giellatekno infrastructure.
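Before running the scripts, it can help to confirm that both requirements are met. The check below is a small sketch under assumptions: the tool is assumed to be callable as `lookup` on the PATH, and the transducer path is whatever file you intend to use. The function name `check_setup` is hypothetical and not part of this repository.

```python
import os
import shutil


def check_setup(transducer_path, tool="lookup"):
    """Return a list of problems; an empty list means the setup looks usable."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"'{tool}' was not found on PATH")
    # The compiled transducer must exist as a regular file.
    if not os.path.isfile(transducer_path):
        problems.append(f"transducer file not found: {transducer_path}")
    return problems
```

Printing the returned list before starting a batch run gives an early, readable error instead of a failure in the middle of processing.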
In this repository we have provided the previously compiled transducer for Komi-Zyrian, and both versions of the scripts can be run after installing lookup with:
    git clone https://github.com/langdoc/elan-fst
    python2.7 add_pos2elan_p2.7.py

Or:

    python3 add_pos2elan_p3.py

The image below describes the ideal workflow attained using the script. Currently, Constraint Grammar based disambiguation is not included, but it is one of the improvements planned for the near future.
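One step of this workflow is turning the analyser output into values for the lemma, part-of-speech and morphology tiers. The sketch below illustrates that step under stated assumptions: each output line is tab-separated with the token first and one analysis second, analyses have the form `lemma+POS+Tag+Tag`, and unknown words are marked with `+?`. The function name, the made-up sample words and the choice of `.` as a tag separator are illustrative and may not match what the scripts do. Because disambiguation is not included, all analyses of a token are kept.

```python
def parse_lookup_output(text):
    """Map each token to a list of (lemma, pos, morphology) tuples.

    Tokens without an analysis map to an empty list.
    """
    analyses = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate tokens in the output
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # not a token/analysis pair
        token = parts[0]
        readings = analyses.setdefault(token, [])
        # Unknown words carry a '+?' marker instead of a real analysis.
        if any(p.endswith("+?") for p in parts[1:]):
            continue
        pieces = parts[1].split("+")
        lemma = pieces[0]
        pos = pieces[1] if len(pieces) > 1 else ""
        morphology = ".".join(pieces[2:])
        readings.append((lemma, pos, morphology))
    return analyses
```

For example, a line such as `kerka<TAB>kerka+N+Sg+Nom` would yield the lemma `kerka`, the part of speech `N` and the morphology string `Sg.Nom`.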
The version of the Python script named add_pos2elan_p3-sje-psdp.py differs slightly from the other versions in two ways. First, it adds annotations on a gloss tier (a child of the pos tier) with an (ideally) brief, general English translation of the relevant lemma; the translations come from an external XML file containing lemmas and translations. Second, information on the individual non-final components of compounds is also included in annotations on the part-of-speech, morphology and gloss tiers. This version is used in the Pite Saami Syntax Project (progeny of the Pite Saami Documentation Project); the variations to the original script are by Iris Perkmann and Joshua Wilbur.
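The two differences can be illustrated with a short sketch. It is not the code of add_pos2elan_p3-sje-psdp.py, and it rests on assumptions that are not documented here: the XML layout (`entry` elements with `lemma` and `gloss` attributes) is hypothetical, the `#` compound boundary in the analysis string is an assumption, and the sample lemmas are made up.

```python
import xml.etree.ElementTree as ET


def load_glosses(xml_text):
    """Build a lemma -> gloss dictionary from a (hypothetical) XML layout."""
    root = ET.fromstring(xml_text)
    return {entry.get("lemma"): entry.get("gloss") for entry in root.iter("entry")}


def gloss_compound(analysis, glosses, boundary="#"):
    """Return (lemma, gloss) pairs for every component of an analysis.

    Components are assumed to be separated by `boundary`, and each component
    is assumed to start with its lemma, followed by '+'-separated tags.
    Lemmas missing from the dictionary get an empty gloss.
    """
    components = analysis.split(boundary)
    lemmas = [component.split("+")[0] for component in components]
    return [(lemma, glosses.get(lemma, "")) for lemma in lemmas]
```

A simple word yields a single pair, while a compound yields one pair per component, so the non-final components also receive a gloss.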
The script was written by Ciprian Gerstenberger, and collaboration in the presented workflow has taken place with Niko Partanen, Michael Rießler, Joshua Wilbur and Iris Perkmann.
Ciprian Gerstenberger is employed by Giellatekno at The Arctic University of Norway. Niko Partanen and Michael Rießler's work has been funded by Kone Foundation as part of the IKDP-2 research project. Joshua Wilbur's and Iris Perkmann's contributions have been funded by Deutsche Forschungsgemeinschaft as part of the Pite Saami Syntax Project.
If you use the script or create new workflows based on it, please provide a link to our script in your documentation. In your publications, please cite our papers in which we have presented and discussed our work.
@incollection{gerstenbergerEtAl2017b,
  Author = {Ciprian Gerstenberger and Niko Partanen and Michael Rie{\ss}ler},
  Title = {Instant annotations in ELAN corpora of spoken and written {K}omi, an endangered language of the {B}arents {S}ea region},
  Editor = {Antti Arppe AND Jeff Good AND Mans Hulden AND Jordan Lachler AND Alexis Palmer AND Lane Schwartz},
  Booktitle = {Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages},
  Location = {Honolulu},
  Month = {mar},
  Year = {2017},
  Pages = {57-66},
  Publisher = {Association for Computational Linguistics},
  Series = {ACL Anthology},
  Url = {http://www.aclweb.org/anthology/W17-0109}
}

@incollection{gerstenbergerEtAl2017a,
  Author = {Ciprian Gerstenberger AND Niko Partanen AND Michael Rie{\ss}ler AND Joshua Wilbur},
  Title = {Instant annotations},
  Subtitle = {Applying {NLP} methods to the annotation of spoken language documentation corpora},
  Editor = {Tommi A. Pirinen AND Michael Rie{\ss}ler AND Trond Trosterud AND Francis M. Tyers},
  Booktitle = {Proceedings of the 3rd {I}nternational {W}orkshop on {C}omputational {L}inguistics for {U}ralic languages},
  Location = {St. Petersburg},
  Month = {jan},
  Year = {2017},
  Pages = {25-36},
  Publisher = {Association for Computational Linguistics},
  Series = {ACL Anthology},
  Url = {http://www.aclweb.org/anthology/W17-0604}
}

Use is governed by this GNU license.
