e-ditiones/Annotator
This script performs segmentation, lemmatization, normalization and NER on XML-TEI encoded files.
Normalization and NER are still a work in progress.
- Clone or download this repository:

      git clone https://github.com/e-ditiones/Annotator.git
      cd Annotator

- Put the XML files to be processed in the `in_XML` folder.
- Run the script:

      bash process.sh

- Results are in the `out` folder:
  - `XML`: contains the annotated XML files;
  - `TSV`: contains the annotations in TSV format.
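
To check quickly that a run produced output, something like the following could be used. It is only a sketch: `count_outputs` is a hypothetical helper, not part of this repository, and it assumes the results sit in `XML` and `TSV` subfolders of `out` as described above.

```shell
# Hypothetical helper (not part of the repository): report how many files
# each output subfolder contains. Assumes the out/XML and out/TSV layout.
count_outputs() {
    out_dir="${1:-out}"    # defaults to ./out when no argument is given
    for sub in XML TSV; do
        if [ -d "$out_dir/$sub" ]; then
            # Count regular files; tr strips the padding some wc versions add.
            n=$(find "$out_dir/$sub" -type f | wc -l | tr -d ' ')
            echo "$sub: $n file(s)"
        else
            echo "$sub: missing"
        fi
    done
}
```

Calling `count_outputs` from the repository root after `bash process.sh` prints one line per subfolder. A `missing` line suggests the run did not get as far as writing that kind of output.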
For lemmatisation, we use Pie-extended and the "freem" model.
This repository is developed by Alexandre Bartz with the help of Simon Gabay, as part of the project e-ditiones.
Our work is licensed under a Creative Commons Attribution 4.0 International Licence.
Pie-extended is under the Mozilla Public License 2.0.
Alexandre Bartz, Simon Gabay. 2020. Lemmatization and normalization of French modern manuscripts and printed documents. Retrieved from https://github.com/e-ditiones/Annotator.