UDPipe is a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files. It is language-agnostic and can be trained given annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary for Linux/Windows/OS X, as a library for C++, Python, Perl, Java, and C#, and as a web service. A third-party R package is also available on CRAN.
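To make the CoNLL-U format mentioned above concrete, here is a minimal sketch in plain Python that reads CoNLL-U text into per-sentence token lists. It does not use the UDPipe library; the function name `read_conllu`, the lowercase field names, and the sample sentence are illustrative assumptions, and only the ten-column, tab-separated layout comes from the CoNLL-U format itself.

```python
from typing import Dict, Iterator, List

# The ten tab-separated CoNLL-U columns, in order.
# The lowercase key names are a choice made for this sketch.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Dict[str, str]] = []
    for line in text.split("\n"):
        if not line.strip():
            # A blank line terminates the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comment (e.g. "# text = ..."); skipped here.
            continue
        else:
            columns = line.split("\t")
            if len(columns) != len(CONLLU_FIELDS):
                raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
            sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        # Input did not end with a blank line; emit the last sentence anyway.
        yield sentence


# Hand-written sample for illustration, not actual UDPipe output.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for tokens in read_conllu(SAMPLE):
        for tok in tokens:
            print(tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

The sketch keeps every column as a string and treats multiword-token ranges and empty nodes like any other line; a production reader would likely parse `feats` and `head` further.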
UDPipe is free software distributed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.
Copyright 2017 by Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.
The available methods are described in the API Documentation, and the models are described in the UDPipe 2 models list and the UDPipe 1 models list.
The service is freely available for testing. Please respect the CC BY-NC-SA license of the models: explicit written permission of the authors is required for any commercial exploitation of the system. If you use the service, you agree that data obtained by us during such use can be used for further improvements of the systems at UFAL. All comments and reactions are welcome.