Universal Dependencies 2.15 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.15 treebanks, and additionally use multilingual BERT and RobeCzech.
The models require UDPipe 2.
The latest version 241121 of the Universal Dependencies 2.15 models can be downloaded from the LINDAT/CLARIN repository.
The models are also available in the REST service.
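As a rough illustration of using a model through the REST service, the sketch below builds a request for one of the model names listed in the table. The endpoint URL, the parameter names (`model`, `tokenizer`, `tagger`, `parser`, `data`) and the `result` field of the JSON response are assumptions based on the public UDPipe web service and should be checked against the service documentation; `build_request` and `process` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public UDPipe REST service; verify before use.
API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text, model):
    """Build a POST request asking for tokenization, tagging and parsing."""
    params = {
        "model": model,   # e.g. a model name from the table
        "tokenizer": "",  # empty value = run the component with default options
        "tagger": "",
        "parser": "",
        "data": text,
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(API_URL, data=body)


def process(text, model):
    """Send the request and return the CoNLL-U output (needs network access)."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["result"]  # assumed name of the field holding the output
```

Calling `process("Hello world.", "english-ewt-ud-2.15-241121")` would then be expected to return the analysis as CoNLL-U text, provided the assumptions above hold.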
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2023062 LINDAT/CLARIAH-CZ.
The models were trained on Universal Dependencies 2.15 treebanks.
For the UD treebanks which do not contain the original plain text version, raw text is used to train the tokenizer instead. The plain texts were taken from the W2C -- Web to Corpus.
Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.
The Universal Dependencies 2.15 models contain 147 models for 78 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
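The fallback split for treebanks without dev data can be sketched as follows. This is a minimal illustration, not the actual training code: it assumes the split is made at sentence boundaries (the text does not say whether the 10% is counted in sentences or words), and `split_train_dev` is a hypothetical helper name.

```python
def split_train_dev(sentences, dev_percent=10):
    """Hold out the last `dev_percent` percent of the sentences as dev data.

    Assumption: the split is counted in sentences, not in words.
    """
    dev_size = len(sentences) * dev_percent // 100
    if dev_size == 0:
        # Too few sentences to hold anything out.
        return list(sentences), []
    return list(sentences[:-dev_size]), list(sentences[-dev_size:])


sentences = [f"sentence {i}" for i in range(20)]
train, dev = split_train_dev(sentences)
print(len(train), len(dev))
```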
The tokenizer is trained using the `SpaceAfter=No` features. If the features are not present in the data, they can be filled in using raw text in the language in question.
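To show why `SpaceAfter=No` matters for a tokenizer, the sketch below rebuilds the original spacing of a sentence from its tokens. In CoNLL-U, `SpaceAfter=No` in the MISC column marks a token that is not followed by whitespace. The `detokenize` helper and the `(form, misc)` pair representation are assumptions made for illustration; this is not how UDPipe itself reads the data.

```python
def detokenize(tokens):
    """Rebuild plain text from (form, misc) pairs.

    A token is followed by a space unless its MISC column
    contains the SpaceAfter=No feature.
    """
    parts = []
    for form, misc in tokens:
        parts.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()


tokens = [
    ("Hello", "SpaceAfter=No"),
    (",", "_"),
    ("world", "SpaceAfter=No"),
    ("!", "_"),
]
print(detokenize(tokens))  # prints "Hello, world!"
```

Without the feature, every token would be followed by a space, and the tokenizer would have no evidence of where the original text places punctuation directly next to a word.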
The tagger, lemmatizer and parser are trained using gold UD data.
We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the `conll18_ud_eval.py` script.
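For reference, an F1 score combines precision (correct items divided by items the system predicted) and recall (correct items divided by items in the gold data). The sketch below uses the standard definition only. It does not reproduce how `conll18_ud_eval.py` aligns system words with gold words, and `f1_score` is a hypothetical helper name.

```python
def f1_score(correct, gold_total, system_total):
    """Standard F1: harmonic mean of precision and recall."""
    precision = correct / system_total if system_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 90 correct items, 100 gold items, 100 predicted items.
print(round(100 * f1_score(90, 100, 100), 2))  # prints 90.0
```

On raw text the system's tokenization can differ from the gold one, so the predicted and gold totals may differ and tokenization errors lower every downstream score. That is why the raw text rows in the table are never higher than they would be with perfect segmentation, and why the Words and Sents columns are left empty under gold tokenization.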
Model | Mode | Words | Sents | UPOS | XPOS | UFeats | AllTags | Lemma | UAS | LAS | MLAS | BLEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|
afrikaans-afribooms-ud-2.15-241121 | Raw text | 99.94 | 99.65 | 98.68 | 95.67 | 98.30 | 95.48 | 98.38 | 90.38 | 87.46 | 78.88 | 79.86 |
afrikaans-afribooms-ud-2.15-241121 | Gold tokenization | — | — | 98.74 | 95.72 | 98.36 | 95.53 | 98.42 | 90.59 | 87.67 | 79.10 | 80.02 |
albanian-staf-ud-2.15-241121 | Raw text | 98.39 | 92.68 | 88.98 | 98.39 | 70.16 | 68.01 | 82.80 | 77.42 | 66.40 | 34.15 | 40.49 |
albanian-staf-ud-2.15-241121 | Gold tokenization | — | — | 89.52 | 100.00 | 70.43 | 68.01 | 83.60 | 77.69 | 66.13 | 33.98 | 41.75 |
ancient_greek-proiel-ud-2.15-241121 | Raw text | 99.98 | 49.19 | 97.68 | 97.92 | 92.04 | 90.69 | 94.65 | 82.00 | 78.22 | 62.39 | 66.26 |
ancient_greek-proiel-ud-2.15-241121 | Gold tokenization | — | — | 97.88 | 98.08 | 92.38 | 91.20 | 94.68 | 86.76 | 82.97 | 68.47 | 72.07 |
ancient_greek-perseus-ud-2.15-241121 | Raw text | 99.97 | 98.85 | 93.08 | 86.04 | 91.66 | 85.23 | 86.68 | 80.03 | 74.40 | 54.44 | 55.56 |
ancient_greek-perseus-ud-2.15-241121 | Gold tokenization | — | — | 93.15 | 86.11 | 91.69 | 85.29 | 86.71 | 80.22 | 74.57 | 54.61 | 55.72 |
ancient_greek-ptnk-ud-2.15-241121 | Raw text | 99.97 | 55.12 | 98.49 | — | 90.24 | 89.18 | 95.04 | 87.92 | 83.91 | 65.12 | 72.02 |
ancient_greek-ptnk-ud-2.15-241121 | Gold tokenization | — | — | 98.47 | — | 90.36 | 89.27 | 95.09 | 91.59 | 87.36 | 68.22 | 75.12 |
ancient_hebrew-ptnk-ud-2.15-241121 | Raw text | 76.05 | 98.06 | 74.12 | 74.17 | 73.42 | 72.32 | 72.23 | 55.78 | 54.01 | 39.71 | 40.12 |
ancient_hebrew-ptnk-ud-2.15-241121 | Gold tokenization | — | — | 97.09 | 97.25 | 95.49 | 94.07 | 92.50 | 92.11 | 88.68 | 75.55 | 72.43 |
arabic-padt-ud-2.15-241121 | Raw text | 94.58 | 82.09 | 91.70 | 89.04 | 89.15 | 88.68 | 90.31 | 78.66 | 74.79 | 66.09 | 68.16 |
arabic-padt-ud-2.15-241121 | Gold tokenization | — | — | 96.99 | 94.37 | 94.54 | 94.01 | 95.26 | 88.14 | 83.66 | 74.87 | 76.34 |
armenian-armtdp-ud-2.15-241121 | Raw text | 99.28 | 95.70 | 96.25 | — | 91.90 | 90.65 | 95.14 | 86.90 | 82.10 | 69.44 | 73.94 |
armenian-armtdp-ud-2.15-241121 | Gold tokenization | — | — | 96.86 | — | 92.53 | 91.22 | 95.73 | 88.46 | 83.56 | 70.13 | 74.73 |
armenian-bsut-ud-2.15-241121 | Raw text | 99.79 | 98.73 | 97.30 | — | 92.14 | 91.36 | 96.68 | 90.08 | 85.52 | 71.11 | 78.28 |
armenian-bsut-ud-2.15-241121 | Gold tokenization | — | — | 97.52 | — | 92.35 | 91.58 | 96.89 | 90.59 | 86.02 | 71.60 | 78.76 |
basque-bdt-ud-2.15-241121 | Raw text | 99.97 | 99.83 | 96.27 | — | 93.34 | 91.43 | 96.33 | 88.11 | 84.98 | 74.95 | 79.47 |
basque-bdt-ud-2.15-241121 | Gold tokenization | — | — | 96.30 | — | 93.37 | 91.45 | 96.34 | 88.16 | 85.03 | 74.99 | 79.50 |
belarusian-hse-ud-2.15-241121 | Raw text | 99.37 | 86.58 | 98.22 | 97.62 | 94.51 | 93.65 | 93.09 | 87.06 | 85.09 | 76.62 | 76.27 |
belarusian-hse-ud-2.15-241121 | Gold tokenization | — | — | 98.84 | 98.21 | 95.17 | 94.26 | 93.69 | 89.64 | 87.43 | 78.58 | 78.05 |
bulgarian-btb-ud-2.15-241121 | Raw text | 99.91 | 94.17 | 99.17 | 97.27 | 97.95 | 96.85 | 98.00 | 94.49 | 91.80 | 85.90 | 86.51 |
bulgarian-btb-ud-2.15-241121 | Gold tokenization | — | — | 99.29 | 97.38 | 98.05 | 96.96 | 98.10 | 95.31 | 92.57 | 86.55 | 87.25 |
catalan-ancora-ud-2.15-241121 | Raw text | 99.94 | 99.49 | 99.09 | 97.18 | 98.69 | 96.92 | 99.43 | 94.77 | 93.22 | 87.87 | 89.32 |
catalan-ancora-ud-2.15-241121 | Gold tokenization | — | — | 99.17 | 97.28 | 98.76 | 97.01 | 99.49 | 94.93 | 93.37 | 88.05 | 89.49 |
chinese-gsdsimp-ud-2.15-241121 | Raw text | 90.29 | 99.10 | 86.92 | 87.19 | 89.68 | 86.49 | 90.21 | 72.55 | 70.08 | 62.74 | 66.30 |
chinese-gsdsimp-ud-2.15-241121 | Gold tokenization | — | — | 95.81 | 95.98 | 99.37 | 95.25 | 99.90 | 86.68 | 83.56 | 77.72 | 81.86 |
chinese-gsd-ud-2.15-241121 | Raw text | 90.27 | 99.10 | 86.67 | 86.98 | 89.65 | 86.23 | 90.20 | 72.24 | 69.81 | 62.53 | 66.10 |
chinese-gsd-ud-2.15-241121 | Gold tokenization | — | — | 95.85 | 95.99 | 99.42 | 95.29 | 99.92 | 86.88 | 83.84 | 77.75 | 82.10 |
classical_armenian-caval-ud-2.15-241121 | Raw text | 98.80 | 60.28 | 97.16 | — | 94.95 | 94.10 | 97.40 | 82.79 | 79.50 | 68.83 | 73.76 |
classical_armenian-caval-ud-2.15-241121 | Gold tokenization | — | — | 98.23 | — | 96.05 | 95.13 | 98.49 | 88.81 | 85.30 | 73.40 | 78.39 |
classical_chinese-kyoto-ud-2.15-241121 | Raw text | 97.94 | 46.37 | 89.65 | 88.83 | 91.59 | 86.00 | 97.51 | 71.18 | 65.95 | 62.52 | 64.56 |
classical_chinese-kyoto-ud-2.15-241121 | Gold tokenization | — | — | 93.28 | 92.05 | 94.54 | 89.77 | 99.53 | 84.51 | 79.24 | 75.52 | 78.15 |
coptic-scriptorium-ud-2.15-241121 | Raw text | 75.42 | 28.57 | 73.38 | 73.36 | 73.30 | 72.45 | 74.03 | 52.11 | 50.26 | 38.67 | 41.12 |
coptic-scriptorium-ud-2.15-241121 | Gold tokenization | — | — | 97.01 | 96.92 | 97.67 | 95.71 | 97.24 | 90.69 | 87.85 | 77.23 | 80.75 |
croatian-set-ud-2.15-241121 | Raw text | 99.93 | 94.79 | 98.46 | 95.79 | 96.19 | 95.50 | 97.70 | 92.34 | 89.48 | 81.66 | 84.64 |
croatian-set-ud-2.15-241121 | Gold tokenization | — | — | 98.52 | 95.89 | 96.28 | 95.59 | 97.76 | 92.83 | 89.97 | 82.08 | 85.09 |
czech-pdt-ud-2.15-241121 | Raw text | 99.93 | 93.37 | 99.29 | 98.42 | 98.78 | 98.22 | 99.39 | 94.92 | 93.55 | 90.68 | 92.23 |
czech-pdt-ud-2.15-241121 | Gold tokenization | — | — | 99.36 | 98.51 | 98.86 | 98.31 | 99.47 | 95.70 | 94.32 | 91.34 | 92.90 |
czech-cac-ud-2.15-241121 | Raw text | 99.99 | 99.68 | 99.67 | 98.32 | 98.13 | 97.76 | 99.18 | 96.16 | 94.95 | 90.89 | 92.85 |
czech-cac-ud-2.15-241121 | Gold tokenization | — | — | 99.68 | 98.33 | 98.14 | 97.77 | 99.19 | 96.16 | 94.95 | 90.92 | 92.87 |
czech-cltt-ud-2.15-241121 | Raw text | 99.32 | 96.92 | 98.79 | 93.98 | 94.10 | 93.72 | 98.54 | 90.80 | 89.09 | 81.43 | 86.95 |
czech-cltt-ud-2.15-241121 | Gold tokenization | — | — | 99.24 | 94.37 | 94.56 | 94.13 | 99.01 | 92.09 | 90.11 | 82.04 | 87.76 |
czech-fictree-ud-2.15-241121 | Raw text | 99.99 | 98.95 | 99.17 | 97.07 | 97.89 | 96.88 | 99.27 | 96.17 | 94.74 | 89.62 | 92.52 |
czech-fictree-ud-2.15-241121 | Gold tokenization | — | — | 99.18 | 97.08 | 97.90 | 96.89 | 99.28 | 96.25 | 94.80 | 89.74 | 92.64 |
danish-ddt-ud-2.15-241121 | Raw text | 99.82 | 89.80 | 97.83 | 99.82 | 97.32 | 96.44 | 97.52 | 88.74 | 86.71 | 79.57 | 81.68 |
danish-ddt-ud-2.15-241121 | Gold tokenization | — | — | 98.06 | 100.00 | 97.56 | 96.71 | 97.68 | 89.97 | 87.93 | 80.65 | 82.80 |
dutch-alpino-ud-2.15-241121 | Raw text | 99.75 | 89.10 | 98.02 | 97.09 | 97.82 | 96.73 | 95.66 | 93.52 | 91.46 | 85.25 | 82.50 |
dutch-alpino-ud-2.15-241121 | Gold tokenization | — | — | 98.23 | 97.28 | 97.98 | 96.88 | 95.89 | 94.92 | 92.86 | 86.60 | 83.78 |
dutch-lassysmall-ud-2.15-241121 | Raw text | 99.86 | 84.61 | 97.78 | 96.78 | 97.55 | 96.34 | 96.24 | 92.96 | 90.85 | 83.36 | 81.99 |
dutch-lassysmall-ud-2.15-241121 | Gold tokenization | — | — | 98.02 | 97.10 | 97.86 | 96.72 | 96.39 | 95.18 | 92.98 | 86.11 | 84.68 |
english-ewt-ud-2.15-241121 | Raw text | 99.01 | 87.55 | 96.57 | 96.24 | 97.06 | 95.27 | 96.94 | 90.97 | 89.14 | 82.66 | 83.86 |
english-ewt-ud-2.15-241121 | Gold tokenization | — | — | 97.50 | 97.17 | 97.97 | 96.20 | 97.87 | 93.42 | 91.52 | 85.10 | 86.21 |
english-atis-ud-2.15-241121 | Raw text | 100.00 | 80.03 | 98.91 | — | 98.50 | 98.04 | 99.67 | 94.19 | 92.71 | 87.41 | 89.83 |
english-atis-ud-2.15-241121 | Gold tokenization | — | — | 99.01 | — | 98.56 | 98.12 | 99.65 | 96.09 | 94.47 | 90.01 | 92.36 |
english-eslspok-ud-2.15-241121 | Raw text | 99.87 | 88.60 | 98.59 | 98.68 | — | 98.06 | — | 94.04 | 92.68 | 90.53 | 92.21 |
english-eslspok-ud-2.15-241121 | Gold tokenization | — | — | 98.72 | 98.81 | — | 98.19 | — | 96.12 | 94.66 | 91.90 | 93.81 |
english-gum-ud-2.15-241121 | Raw text | 99.71 | 96.06 | 98.14 | 98.08 | 97.98 | 97.13 | 98.90 | 93.00 | 91.23 | 85.88 | 87.29 |
english-gum-ud-2.15-241121 | Gold tokenization | — | — | 98.39 | 98.36 | 98.27 | 97.40 | 99.17 | 93.74 | 91.92 | 86.48 | 87.90 |
english-lines-ud-2.15-241121 | Raw text | 99.93 | 87.77 | 97.64 | 96.87 | 97.08 | 94.47 | 98.38 | 91.07 | 88.44 | 80.63 | 83.67 |
english-lines-ud-2.15-241121 | Gold tokenization | — | — | 97.72 | 96.96 | 97.16 | 94.53 | 98.43 | 91.89 | 89.22 | 81.24 | 84.39 |
english-partut-ud-2.15-241121 | Raw text | 99.72 | 99.02 | 97.34 | 97.26 | 96.82 | 95.82 | 98.25 | 93.82 | 91.86 | 84.00 | 87.14 |
english-partut-ud-2.15-241121 | Gold tokenization | — | — | 97.59 | 97.51 | 97.07 | 96.07 | 98.53 | 94.07 | 92.11 | 84.31 | 87.40 |
erzya-jr-ud-2.15-241121 | Raw text | 99.10 | 97.02 | 87.75 | 87.33 | 78.87 | 73.69 | 84.02 | 72.78 | 62.99 | 41.51 | 47.05 |
erzya-jr-ud-2.15-241121 | Gold tokenization | — | — | 88.52 | 87.97 | 79.52 | 74.24 | 84.64 | 73.74 | 63.78 | 41.84 | 47.46 |
estonian-edt-ud-2.15-241121 | Raw text | 99.94 | 91.46 | 97.80 | 98.40 | 96.65 | 95.60 | 95.45 | 88.98 | 86.50 | 81.03 | 80.14 |
estonian-edt-ud-2.15-241121 | Gold tokenization | — | — | 97.91 | 98.45 | 96.73 | 95.71 | 95.53 | 89.85 | 87.35 | 81.84 | 80.91 |
estonian-ewt-ud-2.15-241121 | Raw text | 98.63 | 78.03 | 94.95 | 96.19 | 94.09 | 91.86 | 94.04 | 83.48 | 80.27 | 72.42 | 73.82 |
estonian-ewt-ud-2.15-241121 | Gold tokenization | — | — | 96.30 | 97.53 | 95.38 | 93.17 | 95.28 | 87.47 | 83.97 | 75.36 | 76.72 |
faroese-farpahc-ud-2.15-241121 | Raw text | 99.74 | 92.77 | 97.38 | 93.09 | 94.39 | 92.38 | 99.74 | 86.35 | 82.48 | 68.50 | 75.88 |
faroese-farpahc-ud-2.15-241121 | Gold tokenization | — | — | 97.56 | 93.28 | 94.60 | 92.52 | 100.00 | 87.32 | 83.35 | 69.30 | 76.97 |
finnish-tdt-ud-2.15-241121 | Raw text | 99.70 | 90.82 | 97.61 | 98.23 | 95.97 | 95.06 | 92.07 | 90.34 | 88.38 | 82.15 | 78.29 |
finnish-tdt-ud-2.15-241121 | Gold tokenization | — | — | 97.92 | 98.54 | 96.26 | 95.41 | 92.34 | 91.72 | 89.70 | 83.17 | 79.26 |
finnish-ftb-ud-2.15-241121 | Raw text | 99.91 | 86.84 | 96.74 | 95.08 | 96.71 | 94.11 | 95.65 | 90.26 | 87.61 | 80.42 | 81.05 |
finnish-ftb-ud-2.15-241121 | Gold tokenization | — | — | 97.03 | 95.33 | 96.84 | 94.38 | 95.77 | 92.32 | 89.62 | 82.90 | 83.38 |
french-gsd-ud-2.15-241121 | Raw text | 98.95 | 94.67 | 97.47 | 98.95 | 97.34 | 96.76 | 97.83 | 93.56 | 91.56 | 85.36 | 87.34 |
french-gsd-ud-2.15-241121 | Gold tokenization | — | — | 98.48 | 100.00 | 98.39 | 97.79 | 98.86 | 95.04 | 93.23 | 87.01 | 88.35 |
french-parisstories-ud-2.15-241121 | Raw text | 99.64 | 93.36 | 97.45 | 99.64 | 94.91 | 93.39 | 98.84 | 80.64 | 77.40 | 66.21 | 73.69 |
french-parisstories-ud-2.15-241121 | Gold tokenization | — | — | 97.80 | 100.00 | 95.23 | 93.71 | 99.18 | 81.74 | 78.52 | 67.13 | 74.56 |
french-partut-ud-2.15-241121 | Raw text | 99.42 | 98.64 | 97.78 | 97.51 | 95.28 | 94.44 | 97.97 | 94.94 | 93.25 | 82.97 | 88.47 |
french-partut-ud-2.15-241121 | Gold tokenization | — | — | 98.31 | 98.12 | 95.81 | 94.93 | 98.54 | 95.54 | 94.01 | 83.59 | 89.02 |
french-rhapsodie-ud-2.15-241121 | Raw text | 99.16 | 99.82 | 97.45 | 99.16 | 96.57 | 95.59 | 98.47 | 87.73 | 84.65 | 76.48 | 80.84 |
french-rhapsodie-ud-2.15-241121 | Gold tokenization | — | — | 98.33 | 100.00 | 97.38 | 96.45 | 99.29 | 89.02 | 85.92 | 77.48 | 81.52 |
french-sequoia-ud-2.15-241121 | Raw text | 99.12 | 88.77 | 98.38 | — | 97.41 | 97.00 | 98.24 | 93.98 | 92.72 | 86.71 | 89.23 |
french-sequoia-ud-2.15-241121 | Gold tokenization | — | — | 99.28 | — | 98.29 | 97.86 | 99.09 | 95.75 | 94.49 | 88.54 | 90.42 |
galician-treegal-ud-2.15-241121 | Raw text | 98.74 | 87.99 | 96.29 | 94.37 | 95.37 | 93.52 | 97.27 | 83.29 | 79.18 | 68.58 | 72.41 |
galician-treegal-ud-2.15-241121 | Gold tokenization | — | — | 97.54 | 95.47 | 96.44 | 94.59 | 98.47 | 86.93 | 82.53 | 72.41 | 76.46 |
galician-ctg-ud-2.15-241121 | Raw text | 99.22 | 97.22 | 97.16 | 96.99 | 99.06 | 96.55 | 98.07 | 85.36 | 82.85 | 71.13 | 75.73 |
galician-ctg-ud-2.15-241121 | Gold tokenization | — | — | 97.88 | 97.72 | 99.84 | 97.26 | 98.83 | 87.00 | 84.36 | 72.95 | 77.61 |
georgian-glc-ud-2.15-241121 | Raw text | 99.12 | 95.88 | 95.89 | 95.87 | 91.35 | 90.93 | 94.03 | 83.18 | 78.90 | 68.71 | 72.92 |
georgian-glc-ud-2.15-241121 | Gold tokenization | — | — | 96.59 | 96.57 | 91.97 | 91.54 | 94.73 | 84.73 | 80.26 | 69.52 | 73.98 |
german-gsd-ud-2.15-241121 | Raw text | 99.67 | 83.63 | 96.67 | 97.53 | 91.24 | 88.72 | 96.91 | 87.27 | 83.53 | 66.10 | 75.49 |
german-gsd-ud-2.15-241121 | Gold tokenization | — | — | 97.07 | 97.90 | 91.68 | 89.23 | 97.24 | 89.23 | 85.46 | 67.85 | 77.49 |
german-hdt-ud-2.15-241121 | Raw text | 99.90 | 92.39 | 98.55 | 98.46 | 94.19 | 93.79 | 97.68 | 96.92 | 96.00 | 84.94 | 90.48 |
german-hdt-ud-2.15-241121 | Gold tokenization | — | — | 98.66 | 98.59 | 94.32 | 93.93 | 97.77 | 97.61 | 96.72 | 85.62 | 91.18 |
gothic-proiel-ud-2.15-241121 | Raw text | 100.00 | 31.12 | 96.13 | 96.65 | 90.10 | 88.00 | 94.71 | 78.67 | 72.55 | 58.51 | 63.15 |
gothic-proiel-ud-2.15-241121 | Gold tokenization | — | — | 96.68 | 97.22 | 91.05 | 89.29 | 94.77 | 86.93 | 81.01 | 68.64 | 72.88 |
greek-gdt-ud-2.15-241121 | Raw text | 99.87 | 90.19 | 98.12 | 98.15 | 95.68 | 95.00 | 96.04 | 92.91 | 91.13 | 81.57 | 81.64 |
greek-gdt-ud-2.15-241121 | Gold tokenization | — | — | 98.28 | 98.30 | 95.85 | 95.17 | 96.13 | 93.70 | 91.84 | 82.22 | 82.28 |
greek-gud-ud-2.15-241121 | Raw text | 99.92 | 94.98 | 97.11 | 96.29 | 94.42 | 90.66 | 95.76 | 92.98 | 90.15 | 76.44 | 80.55 |
greek-gud-ud-2.15-241121 | Gold tokenization | — | — | 97.15 | 96.33 | 94.45 | 90.68 | 95.82 | 93.65 | 90.82 | 76.90 | 81.03 |
hebrew-htb-ud-2.15-241121 | Raw text | 85.10 | 99.69 | 83.02 | 83.00 | 81.46 | 80.80 | 82.97 | 70.48 | 68.06 | 55.81 | 59.83 |
hebrew-htb-ud-2.15-241121 | Gold tokenization | — | — | 97.72 | 97.67 | 95.99 | 95.43 | 97.34 | 92.53 | 90.03 | 79.69 | 82.23 |
hebrew-iahltknesset-ud-2.15-241121 | Raw text | 87.98 | 100.00 | 85.29 | 85.25 | 81.63 | 80.79 | 86.84 | 71.33 | 68.88 | 56.06 | 62.80 |
hebrew-iahltknesset-ud-2.15-241121 | Gold tokenization | — | — | 96.93 | 96.95 | 92.70 | 91.84 | 98.29 | 90.09 | 87.40 | 72.65 | 80.96 |
hebrew-iahltwiki-ud-2.15-241121 | Raw text | 88.64 | 96.78 | 86.13 | 86.12 | 81.93 | 80.95 | 87.41 | 75.89 | 74.01 | 58.29 | 67.00 |
hebrew-iahltwiki-ud-2.15-241121 | Gold tokenization | — | — | 97.14 | 97.15 | 92.47 | 91.47 | 98.36 | 93.66 | 91.38 | 75.84 | 85.70 |
hindi-hdtb-ud-2.15-241121 | Raw text | 100.00 | 98.72 | 97.59 | 97.19 | 94.21 | 92.26 | 98.92 | 95.30 | 92.39 | 79.46 | 87.81 |
hindi-hdtb-ud-2.15-241121 | Gold tokenization | — | — | 97.59 | 97.18 | 94.23 | 92.27 | 98.92 | 95.41 | 92.50 | 79.58 | 87.94 |
hungarian-szeged-ud-2.15-241121 | Raw text | 99.85 | 95.89 | 96.76 | — | 94.29 | 93.58 | 94.91 | 88.24 | 84.66 | 74.89 | 78.03 |
hungarian-szeged-ud-2.15-241121 | Gold tokenization | — | — | 96.84 | — | 94.43 | 93.66 | 95.02 | 88.70 | 85.08 | 75.20 | 78.33 |
icelandic-modern-ud-2.15-241121 | Raw text | 99.44 | 94.59 | 97.74 | 95.29 | 89.29 | 86.46 | 97.09 | 85.69 | 82.73 | 64.77 | 75.00 |
icelandic-modern-ud-2.15-241121 | Gold tokenization | — | — | 98.24 | 95.85 | 89.69 | 86.87 | 97.61 | 86.55 | 83.54 | 65.49 | 75.94 |
icelandic-gc-ud-2.15-241121 | Raw text | 99.72 | 94.64 | 94.72 | 82.03 | 85.00 | 79.71 | 91.82 | 83.41 | 79.03 | 58.46 | 69.15 |
icelandic-gc-ud-2.15-241121 | Gold tokenization | — | — | 95.00 | 82.51 | 85.50 | 80.22 | 91.98 | 84.17 | 79.77 | 59.02 | 69.68 |
icelandic-icepahc-ud-2.15-241121 | Raw text | 99.82 | 92.69 | 96.89 | 93.33 | 92.09 | 87.24 | 96.37 | 87.08 | 83.25 | 66.85 | 74.53 |
icelandic-icepahc-ud-2.15-241121 | Gold tokenization | — | — | 97.05 | 93.56 | 92.22 | 87.45 | 96.51 | 87.58 | 83.70 | 67.31 | 75.07 |
indonesian-gsd-ud-2.15-241121 | Raw text | 99.49 | 93.04 | 94.30 | 93.86 | 95.56 | 88.79 | 98.08 | 87.84 | 81.86 | 72.68 | 77.23 |
indonesian-gsd-ud-2.15-241121 | Gold tokenization | — | — | 94.78 | 94.25 | 96.00 | 89.18 | 98.49 | 88.66 | 82.59 | 73.43 | 78.01 |
indonesian-csui-ud-2.15-241121 | Raw text | 99.45 | 91.01 | 95.96 | 96.11 | 96.81 | 95.36 | 98.17 | 86.51 | 82.20 | 76.66 | 78.83 |
indonesian-csui-ud-2.15-241121 | Gold tokenization | — | — | 96.48 | 96.63 | 97.32 | 95.85 | 98.81 | 87.93 | 83.42 | 77.61 | 79.92 |
irish-idt-ud-2.15-241121 | Raw text | 99.88 | 97.58 | 95.93 | 94.99 | 90.72 | 87.62 | 95.77 | 87.16 | 81.64 | 65.30 | 72.34 |
irish-idt-ud-2.15-241121 | Gold tokenization | — | — | 96.04 | 95.14 | 90.83 | 87.76 | 95.89 | 87.50 | 81.97 | 65.44 | 72.54 |
irish-twittirish-ud-2.15-241121 | Raw text | 98.50 | 46.62 | 90.63 | — | — | 90.63 | 88.28 | 78.98 | 72.60 | 58.85 | 57.12 |
irish-twittirish-ud-2.15-241121 | Gold tokenization | — | — | 91.84 | — | — | 91.84 | 89.54 | 85.80 | 79.26 | 66.75 | 64.25 |
italian-isdt-ud-2.15-241121 | Raw text | 99.74 | 99.07 | 98.51 | 98.40 | 98.06 | 97.67 | 98.58 | 94.65 | 92.95 | 86.68 | 87.71 |
italian-isdt-ud-2.15-241121 | Gold tokenization | — | — | 98.75 | 98.66 | 98.30 | 97.93 | 98.84 | 95.08 | 93.39 | 87.08 | 88.14 |
italian-markit-ud-2.15-241121 | Raw text | 99.62 | 98.24 | 96.98 | 97.07 | 94.16 | 92.53 | 88.34 | 88.48 | 84.70 | 70.60 | 78.16 |
italian-markit-ud-2.15-241121 | Gold tokenization | — | — | 97.35 | 97.41 | 94.42 | 92.78 | 88.66 | 89.27 | 85.48 | 71.20 | 78.90 |
italian-old-ud-2.15-241121 | Raw text | 99.08 | 97.76 | 96.30 | 86.81 | 91.87 | 83.24 | 96.49 | 85.37 | 80.93 | 64.37 | 72.68 |
italian-old-ud-2.15-241121 | Gold tokenization | — | — | 97.15 | 87.27 | 92.71 | 83.82 | 97.35 | 88.20 | 83.50 | 67.16 | 75.55 |
italian-parlamint-ud-2.15-241121 | Raw text | 99.42 | 94.12 | 98.64 | 98.05 | 97.96 | 97.02 | 98.70 | 91.94 | 89.98 | 84.45 | 86.25 |
italian-parlamint-ud-2.15-241121 | Gold tokenization | — | — | 99.22 | 98.59 | 98.48 | 97.50 | 99.20 | 93.40 | 91.43 | 86.08 | 87.84 |
italian-partut-ud-2.15-241121 | Raw text | 99.73 | 100.00 | 98.43 | 98.43 | 98.16 | 97.58 | 98.60 | 95.74 | 93.79 | 87.17 | 88.62 |
italian-partut-ud-2.15-241121 | Gold tokenization | — | — | 98.60 | 98.60 | 98.30 | 97.72 | 98.79 | 95.80 | 93.76 | 87.09 | 88.54 |
italian-postwita-ud-2.15-241121 | Raw text | 99.36 | 49.53 | 96.61 | 96.39 | 96.15 | 94.87 | 96.40 | 83.05 | 79.20 | 68.99 | 70.63 |
italian-postwita-ud-2.15-241121 | Gold tokenization | — | — | 97.20 | 96.95 | 96.65 | 95.40 | 96.96 | 88.06 | 83.78 | 75.30 | 76.72 |
italian-twittiro-ud-2.15-241121 | Raw text | 98.94 | 46.67 | 95.84 | 95.61 | 94.70 | 93.12 | 94.30 | 82.95 | 78.28 | 65.69 | 66.30 |
italian-twittiro-ud-2.15-241121 | Gold tokenization | — | — | 96.71 | 96.30 | 95.67 | 93.85 | 95.23 | 88.34 | 83.50 | 71.83 | 72.07 |
italian-vit-ud-2.15-241121 | Raw text | 99.75 | 95.06 | 98.14 | 97.29 | 97.64 | 96.14 | 98.85 | 92.20 | 89.31 | 81.25 | 83.96 |
italian-vit-ud-2.15-241121 | Gold tokenization | — | — | 98.39 | 97.68 | 97.85 | 96.51 | 99.09 | 93.03 | 90.11 | 82.06 | 84.81 |
japanese-gsdluw-ud-2.15-241121 | Raw text | 95.18 | 99.72 | 93.91 | 93.66 | 95.18 | 93.59 | 93.65 | 86.30 | 85.69 | 76.64 | 76.59 |
japanese-gsdluw-ud-2.15-241121 | Gold tokenization | — | — | 98.42 | 98.15 | 99.99 | 98.02 | 97.85 | 95.23 | 94.29 | 86.58 | 85.22 |
japanese-gsd-ud-2.15-241121 | Raw text | 96.17 | 100.00 | 95.02 | 94.28 | 96.16 | 94.01 | 95.11 | 88.07 | 87.37 | 81.12 | 81.42 |
japanese-gsd-ud-2.15-241121 | Gold tokenization | — | — | 98.59 | 97.62 | 99.98 | 97.32 | 98.55 | 95.14 | 94.24 | 89.25 | 89.12 |
korean-kaist-ud-2.15-241121 | Raw text | 100.00 | 100.00 | 96.26 | 87.65 | — | 87.45 | 94.46 | 89.09 | 87.20 | 83.13 | 80.96 |
korean-kaist-ud-2.15-241121 | Gold tokenization | — | — | 96.26 | 87.65 | — | 87.45 | 94.46 | 89.09 | 87.20 | 83.13 | 80.96 |
korean-gsd-ud-2.15-241121 | Raw text | 99.87 | 93.93 | 96.50 | 90.63 | 99.68 | 88.32 | 93.87 | 88.25 | 84.65 | 81.69 | 77.71 |
korean-gsd-ud-2.15-241121 | Gold tokenization | — | — | 96.67 | 90.84 | 99.81 | 88.52 | 94.01 | 88.88 | 85.24 | 82.32 | 78.31 |
korean-ksl-ud-2.15-241121 | Raw text | 100.00 | 99.22 | 96.75 | 89.63 | — | 87.83 | 95.15 | 89.83 | 86.38 | 81.58 | 80.07 |
korean-ksl-ud-2.15-241121 | Gold tokenization | — | — | 96.74 | 89.64 | — | 87.83 | 95.15 | 89.92 | 86.47 | 81.66 | 80.15 |
kyrgyz-ktmu-ud-2.15-241121 | Raw text | 99.16 | 98.03 | 90.81 | 90.36 | 77.09 | 72.50 | 88.58 | 83.55 | 72.59 | 53.12 | 62.75 |
kyrgyz-ktmu-ud-2.15-241121 | Gold tokenization | — | — | 91.56 | 91.11 | 77.78 | 73.17 | 89.33 | 84.47 | 73.43 | 53.62 | 63.27 |
latin-ittb-ud-2.15-241121 | Raw text | 99.98 | 91.79 | 99.11 | 96.63 | 97.19 | 95.80 | 99.17 | 89.48 | 87.54 | 81.42 | 84.91 |
latin-ittb-ud-2.15-241121 | Gold tokenization | — | — | 99.14 | 96.67 | 97.24 | 95.84 | 99.21 | 90.45 | 88.52 | 82.05 | 85.50 |
latin-llct-ud-2.15-241121 | Raw text | 99.99 | 99.49 | 99.73 | 97.09 | 97.13 | 96.83 | 97.79 | 95.35 | 94.38 | 88.99 | 90.31 |
latin-llct-ud-2.15-241121 | Gold tokenization | — | — | 99.73 | 97.09 | 97.13 | 96.84 | 97.80 | 95.36 | 94.39 | 88.99 | 90.31 |
latin-perseus-ud-2.15-241121 | Raw text | 98.23 | 99.09 | 91.46 | 80.03 | 83.36 | 76.59 | 87.60 | 76.93 | 70.09 | 52.28 | 58.28 |
latin-perseus-ud-2.15-241121 | Gold tokenization | — | — | 93.18 | 81.55 | 84.92 | 78.04 | 89.21 | 78.09 | 71.12 | 52.61 | 59.23 |
latin-proiel-ud-2.15-241121 | Raw text | 99.85 | 37.40 | 96.52 | 96.58 | 90.71 | 89.42 | 96.08 | 76.57 | 72.36 | 59.00 | 64.80 |
latin-proiel-ud-2.15-241121 | Gold tokenization | — | — | 97.06 | 97.10 | 91.55 | 90.41 | 96.30 | 83.88 | 79.62 | 67.94 | 73.42 |
latin-udante-ud-2.15-241121 | Raw text | 99.60 | 98.45 | 91.16 | 75.61 | 84.53 | 72.57 | 87.61 | 76.74 | 69.64 | 48.88 | 52.99 |
latin-udante-ud-2.15-241121 | Gold tokenization | — | — | 91.42 | 75.73 | 84.78 | 72.65 | 87.87 | 76.93 | 69.81 | 48.91 | 53.07 |
latvian-lvtb-ud-2.15-241121 | Raw text | 99.27 | 98.09 | 97.15 | 91.74 | 95.18 | 91.32 | 96.76 | 89.45 | 86.56 | 78.80 | 81.97 |
latvian-lvtb-ud-2.15-241121 | Gold tokenization | — | — | 97.83 | 92.42 | 95.89 | 91.99 | 97.42 | 90.53 | 87.60 | 79.96 | 83.09 |
lithuanian-alksnis-ud-2.15-241121 | Raw text | 99.91 | 87.87 | 96.03 | 90.49 | 91.22 | 89.72 | 93.59 | 82.91 | 79.35 | 68.74 | 71.87 |
lithuanian-alksnis-ud-2.15-241121 | Gold tokenization | — | — | 96.15 | 90.59 | 91.31 | 89.83 | 93.68 | 84.17 | 80.59 | 69.66 | 72.80 |
lithuanian-hse-ud-2.15-241121 | Raw text | 97.30 | 97.30 | 90.31 | 89.93 | 82.20 | 78.75 | 88.35 | 71.67 | 62.35 | 45.18 | 53.97 |
lithuanian-hse-ud-2.15-241121 | Gold tokenization | — | — | 92.17 | 91.89 | 84.06 | 80.38 | 90.85 | 75.28 | 64.91 | 46.72 | 55.70 |
low_saxon-lsdc-ud-2.15-241121 | Raw text | 99.25 | 90.23 | 89.96 | — | 71.84 | 69.14 | 83.89 | 74.31 | 65.21 | 37.02 | 48.45 |
low_saxon-lsdc-ud-2.15-241121 | Gold tokenization | — | — | 90.59 | — | 72.43 | 69.64 | 84.39 | 75.48 | 66.25 | 37.13 | 48.74 |
maghrebi_arabic_french-arabizi-ud-2.15-241121 | Raw text | 91.65 | 7.00 | 78.90 | 72.06 | 83.03 | 70.37 | 51.43 | 57.85 | 49.98 | 36.37 | 24.60 |
maghrebi_arabic_french-arabizi-ud-2.15-241121 | Gold tokenization | — | — | 86.55 | 78.66 | 90.64 | 77.33 | 54.89 | 76.14 | 65.63 | 47.29 | 31.71 |
maltese-mudt-ud-2.15-241121 | Raw text | 99.84 | 86.29 | 95.64 | 95.55 | — | 95.24 | — | 84.61 | 79.54 | 67.91 | 71.94 |
maltese-mudt-ud-2.15-241121 | Gold tokenization | — | — | 95.75 | 95.68 | — | 95.34 | — | 85.32 | 80.19 | 68.41 | 72.44 |
manx-cadhan-ud-2.15-241121 | Raw text | 97.36 | 98.25 | 94.09 | — | 95.84 | 93.32 | 93.34 | 87.60 | 84.14 | 77.74 | 77.73 |
manx-cadhan-ud-2.15-241121 | Gold tokenization | — | — | 96.68 | — | 98.43 | 95.85 | 95.88 | 92.57 | 89.12 | 82.90 | 81.70 |
marathi-ufal-ud-2.15-241121 | Raw text | 94.16 | 92.63 | 82.73 | — | 75.18 | 71.53 | 84.18 | 66.67 | 60.34 | 40.00 | 47.84 |
marathi-ufal-ud-2.15-241121 | Gold tokenization | — | — | 87.14 | — | 78.64 | 74.51 | 87.14 | 72.33 | 65.29 | 43.71 | 51.13 |
naija-nsc-ud-2.15-241121 | Raw text | 99.97 | 100.00 | 98.12 | — | 98.94 | 97.59 | 99.39 | 93.10 | 90.55 | 87.51 | 89.19 |
naija-nsc-ud-2.15-241121 | Gold tokenization | — | — | 98.15 | — | 98.95 | 97.60 | 99.42 | 93.13 | 90.58 | 87.52 | 89.21 |
north_sami-giella-ud-2.15-241121 | Raw text | 99.87 | 98.79 | 91.64 | 93.42 | 89.16 | 84.95 | 87.07 | 75.74 | 70.92 | 60.14 | 58.71 |
north_sami-giella-ud-2.15-241121 | Gold tokenization | — | — | 91.78 | 93.57 | 89.30 | 85.08 | 87.19 | 75.99 | 71.18 | 60.30 | 58.92 |
norwegian-bokmaal-ud-2.15-241121 | Raw text | 99.82 | 97.27 | 98.39 | 98.95 | 97.48 | 96.82 | 98.62 | 94.00 | 92.78 | 87.18 | 89.00 |
norwegian-bokmaal-ud-2.15-241121 | Gold tokenization | — | — | 98.59 | 99.13 | 97.65 | 96.99 | 98.82 | 94.67 | 93.43 | 87.79 | 89.65 |
norwegian-nynorsk-ud-2.15-241121 | Raw text | 99.93 | 94.54 | 98.36 | 99.06 | 97.29 | 96.45 | 98.40 | 93.94 | 92.47 | 85.90 | 88.09 |
norwegian-nynorsk-ud-2.15-241121 | Gold tokenization | — | — | 98.55 | 99.20 | 97.46 | 96.68 | 98.55 | 94.69 | 93.24 | 86.85 | 89.04 |
old_church_slavonic-proiel-ud-2.15-241121 | Raw text | 100.00 | 40.05 | 96.23 | 96.48 | 89.78 | 88.01 | 90.21 | 78.01 | 73.64 | 60.75 | 62.07 |
old_church_slavonic-proiel-ud-2.15-241121 | Gold tokenization | — | — | 96.68 | 96.97 | 90.42 | 88.99 | 90.29 | 85.13 | 80.56 | 68.45 | 69.16 |
old_east_slavic-torot-ud-2.15-241121 | Raw text | 100.00 | 34.53 | 95.34 | 95.41 | 89.70 | 87.47 | 88.42 | 77.02 | 72.42 | 58.65 | 58.48 |
old_east_slavic-torot-ud-2.15-241121 | Gold tokenization | — | — | 95.87 | 95.91 | 90.63 | 88.71 | 88.48 | 85.60 | 80.77 | 68.28 | 66.57 |
old_east_slavic-birchbark-ud-2.15-241121 | Raw text | 99.99 | 16.66 | 88.74 | 99.35 | 75.36 | 71.18 | 65.21 | 65.19 | 58.34 | 32.66 | 27.06 |
old_east_slavic-birchbark-ud-2.15-241121 | Gold tokenization | — | — | 88.88 | 99.35 | 75.99 | 71.81 | 65.28 | 76.89 | 69.91 | 40.37 | 32.66 |
old_east_slavic-rnc-ud-2.15-241121 | Raw text | 99.77 | 94.56 | 97.57 | 91.49 | 89.51 | 82.05 | 90.63 | 76.91 | 73.22 | 54.56 | 57.05 |
old_east_slavic-rnc-ud-2.15-241121 | Gold tokenization | — | — | 97.82 | 91.69 | 89.79 | 82.32 | 90.89 | 79.53 | 75.69 | 56.31 | 58.99 |
old_east_slavic-ruthenian-ud-2.15-241121 | Raw text | 99.87 | 99.61 | 96.17 | 89.35 | 87.73 | 80.61 | 82.89 | 78.09 | 74.30 | 53.77 | 49.03 |
old_east_slavic-ruthenian-ud-2.15-241121 | Gold tokenization | — | — | 96.23 | 89.77 | 87.78 | 81.02 | 82.96 | 78.16 | 74.35 | 53.73 | 49.06 |
old_french-profiterole-ud-2.15-241121 | Raw text | 99.82 | 100.00 | 97.15 | 97.05 | 97.54 | 95.63 | 99.79 | 91.04 | 87.47 | 80.05 | 84.53 |
old_french-profiterole-ud-2.15-241121 | Gold tokenization | — | — | 97.33 | 97.24 | 97.72 | 95.82 | 99.97 | 91.29 | 87.72 | 80.31 | 84.80 |
ottoman_turkish-boun-ud-2.15-241121 | Raw text | 99.41 | 87.96 | 87.32 | 90.51 | 80.87 | 72.90 | 82.19 | 61.58 | 51.26 | 32.83 | 36.22 |
ottoman_turkish-boun-ud-2.15-241121 | Gold tokenization | — | — | 87.77 | 90.97 | 81.24 | 73.21 | 82.52 | 62.34 | 51.82 | 33.10 | 36.63 |
persian-perdt-ud-2.15-241121 | Raw text | 99.66 | 99.83 | 97.45 | 97.40 | 97.65 | 95.65 | 98.96 | 93.58 | 91.39 | 86.23 | 88.74 |
persian-perdt-ud-2.15-241121 | Gold tokenization | — | — | 97.75 | 97.70 | 97.95 | 95.94 | 99.28 | 94.09 | 91.87 | 86.77 | 89.30 |
persian-seraji-ud-2.15-241121 | Raw text | 99.65 | 98.75 | 97.95 | 97.92 | 97.95 | 97.47 | 98.27 | 91.66 | 88.85 | 84.42 | 84.51 |
persian-seraji-ud-2.15-241121 | Gold tokenization | — | — | 98.27 | 98.23 | 98.27 | 97.76 | 98.54 | 92.37 | 89.52 | 85.00 | 85.11 |
polish-pdb-ud-2.15-241121 | Raw text | 99.86 | 97.00 | 98.99 | 96.07 | 96.05 | 95.35 | 98.12 | 94.40 | 92.57 | 85.84 | 88.86 |
polish-pdb-ud-2.15-241121 | Gold tokenization | — | — | 99.11 | 96.23 | 96.22 | 95.51 | 98.23 | 94.95 | 93.10 | 86.32 | 89.30 |
polish-lfg-ud-2.15-241121 | Raw text | 99.85 | 99.65 | 99.01 | 96.18 | 96.68 | 95.22 | 98.17 | 96.91 | 95.62 | 90.04 | 92.40 |
polish-lfg-ud-2.15-241121 | Gold tokenization | — | — | 99.18 | 96.35 | 96.86 | 95.39 | 98.31 | 97.29 | 96.00 | 90.44 | 92.73 |
pomak-philotis-ud-2.15-241121 | Raw text | 99.79 | 89.42 | 95.42 | — | 88.85 | 87.88 | 91.37 | 88.30 | 81.80 | 63.73 | 67.56 |
pomak-philotis-ud-2.15-241121 | Gold tokenization | — | — | 95.54 | — | 88.98 | 87.99 | 91.49 | 89.24 | 82.65 | 64.34 | 68.30 |
portuguese-bosque-ud-2.15-241121 | Raw text | 99.68 | 89.73 | 97.78 | — | 96.92 | 95.85 | 98.36 | 92.31 | 89.98 | 80.69 | 84.65 |
portuguese-bosque-ud-2.15-241121 | Gold tokenization | — | — | 98.11 | — | 97.19 | 96.12 | 98.65 | 93.46 | 91.08 | 81.78 | 85.74 |
portuguese-cintil-ud-2.15-241121 | Raw text | 99.41 | 78.66 | 97.44 | 96.04 | 95.33 | 93.23 | 97.49 | 85.30 | 82.28 | 72.33 | 75.94 |
portuguese-cintil-ud-2.15-241121 | Gold tokenization | — | — | 98.04 | 96.65 | 95.93 | 93.81 | 98.06 | 87.64 | 84.51 | 74.47 | 78.14 |
portuguese-dantestocks-ud-2.15-241121 | Raw text | 96.47 | 38.27 | 94.23 | 96.45 | 93.84 | 92.88 | 93.66 | 85.38 | 83.08 | 75.34 | 75.93 |
portuguese-dantestocks-ud-2.15-241121 | Gold tokenization | — | — | 97.71 | 99.98 | 97.35 | 96.35 | 95.93 | 93.04 | 90.56 | 84.23 | 82.70 |
portuguese-gsd-ud-2.15-241121 | Raw text | 99.29 | 86.25 | 97.49 | 89.64 | 94.60 | 89.18 | 97.14 | 92.75 | 90.80 | 80.17 | 85.14 |
portuguese-gsd-ud-2.15-241121 | Gold tokenization | — | — | 98.27 | 91.76 | 96.07 | 91.27 | 97.98 | 94.25 | 92.35 | 82.76 | 86.84 |
portuguese-petrogold-ud-2.15-241121 | Raw text | 99.59 | 93.11 | 98.79 | — | 98.69 | 98.21 | 99.12 | 94.69 | 93.53 | 88.53 | 90.01 |
portuguese-petrogold-ud-2.15-241121 | Gold tokenization | — | — | 99.10 | — | 98.96 | 98.47 | 99.54 | 95.61 | 94.37 | 89.44 | 91.03 |
portuguese-porttinari-ud-2.15-241121 | Raw text | 94.68 | 28.05 | 93.90 | — | 93.44 | 93.10 | 94.17 | 85.85 | 84.31 | 78.62 | 81.30 |
portuguese-porttinari-ud-2.15-241121 | Gold tokenization | — | — | 99.20 | — | 98.72 | 98.36 | 99.45 | 96.48 | 95.23 | 90.19 | 91.95 |
romanian-rrt-ud-2.15-241121 | Raw text | 99.70 | 95.50 | 97.83 | 97.19 | 97.41 | 96.91 | 97.99 | 91.90 | 88.44 | 81.88 | 83.42 |
romanian-rrt-ud-2.15-241121 | Gold tokenization | — | — | 98.11 | 97.43 | 97.67 | 97.16 | 98.25 | 92.64 | 89.11 | 82.32 | 83.92 |
romanian-nonstandard-ud-2.15-241121 | Raw text | 98.83 | 96.77 | 96.16 | 91.94 | 90.58 | 89.24 | 94.86 | 89.06 | 84.99 | 68.53 | 76.59 |
romanian-nonstandard-ud-2.15-241121 | Gold tokenization | — | — | 97.29 | 92.97 | 91.56 | 90.19 | 95.94 | 90.76 | 86.62 | 70.04 | 77.84 |
romanian-simonero-ud-2.15-241121 | Raw text | 99.84 | 100.00 | 98.46 | 97.94 | 97.55 | 97.22 | 98.88 | 94.01 | 92.09 | 85.42 | 88.34 |
romanian-simonero-ud-2.15-241121 | Gold tokenization | — | — | 98.62 | 98.09 | 97.70 | 97.37 | 99.04 | 94.36 | 92.41 | 85.69 | 88.61 |
russian-syntagrus-ud-2.15-241121 | Raw text | 99.67 | 98.31 | 98.48 | — | 94.01 | 93.76 | 98.18 | 93.80 | 91.67 | 82.76 | 88.83 |
russian-syntagrus-ud-2.15-241121 | Gold tokenization | — | — | 98.81 | — | 94.34 | 94.07 | 98.46 | 94.51 | 92.34 | 83.32 | 89.36 |
russian-gsd-ud-2.15-241121 | Raw text | 99.50 | 96.49 | 98.04 | 97.52 | 94.55 | 93.40 | 96.91 | 91.59 | 88.62 | 80.99 | 84.54 |
russian-gsd-ud-2.15-241121 | Gold tokenization | — | — | 98.51 | 97.94 | 94.97 | 93.78 | 97.29 | 92.83 | 89.74 | 81.84 | 85.48 |
russian-poetry-ud-2.15-241121 | Raw text | 99.59 | 95.96 | 97.86 | — | 94.43 | 93.89 | 97.01 | 89.10 | 86.14 | 77.13 | 80.68 |
russian-poetry-ud-2.15-241121 | Gold tokenization | — | — | 98.24 | — | 94.77 | 94.23 | 97.36 | 90.04 | 87.07 | 77.84 | 81.36 |
russian-taiga-ud-2.15-241121 | Raw text | 98.07 | 86.01 | 95.55 | — | 93.12 | 92.12 | 94.77 | 83.27 | 79.86 | 71.21 | 74.38 |
russian-taiga-ud-2.15-241121 | Gold tokenization | — | — | 97.27 | — | 94.91 | 93.83 | 96.47 | 85.97 | 82.33 | 73.62 | 76.76 |
sanskrit-vedic-ud-2.15-241121 | Raw text | 100.00 | 29.21 | 93.56 | — | 89.19 | 85.34 | 93.43 | 65.35 | 56.83 | 49.01 | 52.10 |
sanskrit-vedic-ud-2.15-241121 | Gold tokenization | — | — | 93.95 | — | 90.47 | 86.83 | 93.57 | 78.23 | 69.08 | 60.90 | 64.49 |
scottish_gaelic-arcosg-ud-2.15-241121 | Raw text | 97.42 | 61.26 | 93.83 | 89.65 | 91.09 | 88.52 | 95.13 | 80.86 | 76.41 | 65.27 | 70.06 |
scottish_gaelic-arcosg-ud-2.15-241121 | Gold tokenization | — | — | 96.61 | 92.63 | 94.01 | 91.57 | 97.71 | 86.97 | 82.58 | 71.95 | 76.15 |
serbian-set-ud-2.15-241121 | Raw text | 99.99 | 93.00 | 99.09 | 95.92 | 96.10 | 95.69 | 97.80 | 93.60 | 91.18 | 83.54 | 86.95 |
serbian-set-ud-2.15-241121 | Gold tokenization | — | — | 99.11 | 95.97 | 96.14 | 95.73 | 97.79 | 94.32 | 91.88 | 84.29 | 87.68 |
slovak-snk-ud-2.15-241121 | Raw text | 100.00 | 81.69 | 97.69 | 90.12 | 93.40 | 89.34 | 96.54 | 91.58 | 89.94 | 80.37 | 84.70 |
slovak-snk-ud-2.15-241121 | Gold tokenization | — | — | 97.83 | 90.35 | 93.48 | 89.56 | 96.57 | 93.99 | 92.30 | 82.68 | 87.13 |
slovenian-ssj-ud-2.15-241121 | Raw text | 99.94 | 98.95 | 98.78 | 97.01 | 97.12 | 96.57 | 98.59 | 94.37 | 92.78 | 87.22 | 89.16 |
slovenian-ssj-ud-2.15-241121 | Gold tokenization | — | — | 98.84 | 97.07 | 97.17 | 96.63 | 98.64 | 94.51 | 92.91 | 87.37 | 89.28 |
slovenian-sst-ud-2.15-241121 | Raw text | 99.87 | 95.47 | 98.45 | 96.90 | 97.01 | 96.19 | 98.83 | 84.82 | 82.12 | 73.82 | 77.42 |
slovenian-sst-ud-2.15-241121 | Gold tokenization | — | — | 98.59 | 97.00 | 97.08 | 96.28 | 98.97 | 85.33 | 82.63 | 74.23 | 77.91 |
spanish-ancora-ud-2.15-241121 | Raw text | 99.95 | 98.69 | 99.06 | 96.22 | 98.80 | 95.83 | 99.47 | 93.80 | 92.15 | 87.11 | 88.69 |
spanish-ancora-ud-2.15-241121 | Gold tokenization | — | — | 99.11 | 96.26 | 98.85 | 95.86 | 99.51 | 94.00 | 92.35 | 87.30 | 88.85 |
spanish-gsd-ud-2.15-241121 | Raw text | 99.73 | 93.84 | 97.10 | — | 96.86 | 95.15 | 98.61 | 92.52 | 90.36 | 78.94 | 84.50 |
spanish-gsd-ud-2.15-241121 | Gold tokenization | — | — | 97.35 | — | 97.12 | 95.38 | 98.86 | 93.42 | 91.18 | 79.71 | 85.29 |
swedish-talbanken-ud-2.15-241121 | Raw text | 99.84 | 96.53 | 98.41 | 97.22 | 97.21 | 96.19 | 98.62 | 92.73 | 90.34 | 84.18 | 87.11 |
swedish-talbanken-ud-2.15-241121 | Gold tokenization | — | — | 98.59 | 97.41 | 97.39 | 96.40 | 98.78 | 93.12 | 90.72 | 84.68 | 87.54 |
swedish-lines-ud-2.15-241121 | Raw text | 99.96 | 88.50 | 97.70 | 95.42 | 92.96 | 89.90 | 97.78 | 91.13 | 87.97 | 75.30 | 82.66 |
swedish-lines-ud-2.15-241121 | Gold tokenization | — | — | 97.73 | 95.49 | 93.04 | 89.96 | 97.82 | 91.89 | 88.67 | 76.02 | 83.39 |
tamil-ttb-ud-2.15-241121 | Raw text | 94.26 | 97.52 | 84.19 | 82.27 | 84.29 | 77.71 | 89.35 | 70.73 | 62.23 | 50.63 | 55.60 |
tamil-ttb-ud-2.15-241121 | Gold tokenization | — | — | 89.04 | 87.03 | 89.49 | 82.20 | 94.32 | 78.33 | 68.98 | 56.63 | 61.77 |
telugu-mtg-ud-2.15-241121 | Raw text | 99.58 | 96.62 | 93.63 | 93.63 | 98.48 | 93.35 | — | 90.03 | 83.24 | 76.00 | 79.24 |
telugu-mtg-ud-2.15-241121 | Gold tokenization | — | — | 94.04 | 94.04 | 98.89 | 93.76 | — | 90.98 | 84.05 | 76.64 | 79.89 |
turkish-boun-ud-2.15-241121 | Raw text | 96.57 | 86.25 | 89.96 | 85.96 | 80.92 | 71.62 | 90.60 | 72.93 | 66.98 | 49.09 | 61.03 |
turkish-boun-ud-2.15-241121 | Gold tokenization | — | — | 93.00 | 88.89 | 82.90 | 73.32 | 93.62 | 80.45 | 73.81 | 52.59 | 66.63 |
turkish-atis-ud-2.15-241121 | Raw text | 99.90 | 79.28 | 98.42 | — | 97.97 | 97.82 | 98.96 | 89.31 | 87.56 | 84.85 | 86.15 |
turkish-atis-ud-2.15-241121 | Gold tokenization | — | — | 98.53 | — | 98.11 | 97.94 | 99.07 | 91.63 | 89.72 | 86.98 | 88.39 |
turkish-framenet-ud-2.15-241121 | Raw text | 99.90 | 99.27 | 96.83 | — | 94.79 | 93.97 | 96.63 | 93.36 | 84.36 | 74.37 | 77.84 |
turkish-framenet-ud-2.15-241121 | Gold tokenization | — | — | 96.93 | — | 94.89 | 94.07 | 96.73 | 93.52 | 84.53 | 74.50 | 77.97 |
turkish-imst-ud-2.15-241121 | Raw text | 97.31 | 97.38 | 92.73 | 92.53 | 89.44 | 86.59 | 93.55 | 76.48 | 69.29 | 58.17 | 63.97 |
turkish-imst-ud-2.15-241121 | Gold tokenization | — | — | 95.20 | 94.86 | 91.65 | 88.65 | 95.88 | 81.39 | 73.76 | 60.66 | 67.22 |
turkish-kenet-ud-2.15-241121 | Raw text | 100.00 | 98.12 | 93.78 | — | 91.90 | 90.80 | 93.52 | 84.15 | 71.53 | 62.31 | 65.22 |
turkish-kenet-ud-2.15-241121 | Gold tokenization | — | — | 93.80 | — | 91.91 | 90.82 | 93.51 | 84.28 | 71.61 | 62.40 | 65.30 |
turkish-penn-ud-2.15-241121 | Raw text | 99.27 | 82.89 | 95.68 | — | 94.46 | 93.41 | 94.28 | 84.72 | 72.21 | 62.67 | 65.14 |
turkish-penn-ud-2.15-241121 | Gold tokenization | — | — | 96.39 | — | 95.10 | 94.06 | 94.95 | 86.91 | 74.09 | 63.77 | 66.34 |
turkish-tourism-ud-2.15-241121 | Raw text | 99.99 | 100.00 | 98.79 | — | 94.98 | 94.57 | 98.28 | 97.14 | 91.49 | 81.66 | 87.10 |
turkish-tourism-ud-2.15-241121 | Gold tokenization | — | — | 98.80 | — | 94.99 | 94.59 | 98.30 | 97.15 | 91.50 | 81.68 | 87.12 |
turkish_german-sagt-ud-2.15-241121 | Raw text | 98.91 | 99.44 | 90.21 | — | 80.19 | 75.40 | 90.76 | 70.92 | 60.68 | 40.99 | 50.56 |
turkish_german-sagt-ud-2.15-241121 | Gold tokenization | — | — | 91.11 | — | 80.80 | 75.92 | 91.47 | 72.37 | 61.79 | 41.55 | 51.31 |
ukrainian-iu-ud-2.15-241121 | Raw text | 99.81 | 96.23 | 98.02 | 94.29 | 94.51 | 93.26 | 97.63 | 90.72 | 88.37 | 79.31 | 83.57 |
ukrainian-iu-ud-2.15-241121 | Gold tokenization | — | — | 98.23 | 94.50 | 94.70 | 93.45 | 97.82 | 91.32 | 88.94 | 79.69 | 84.00 |
ukrainian-parlamint-ud-2.15-241121 | Raw text | 99.88 | 99.62 | 98.34 | 98.54 | 94.91 | 93.95 | 98.89 | 93.36 | 90.71 | 81.65 | 87.38 |
ukrainian-parlamint-ud-2.15-241121 | Gold tokenization | — | — | 98.47 | 98.64 | 95.00 | 94.04 | 99.01 | 93.53 | 90.89 | 81.71 | 87.50 |
urdu-udtb-ud-2.15-241121 | Raw text | 100.00 | 98.31 | 94.18 | 92.31 | 82.87 | 78.61 | 97.35 | 88.00 | 82.88 | 57.44 | 75.16 |
urdu-udtb-ud-2.15-241121 | Gold tokenization | — | — | 94.17 | 92.29 | 82.85 | 78.58 | 97.37 | 88.11 | 82.96 | 57.43 | 75.25 |
uyghur-udt-ud-2.15-241121 | Raw text | 99.54 | 81.87 | 89.74 | 91.79 | 87.99 | 80.67 | 94.74 | 75.59 | 64.70 | 50.04 | 57.43 |
uyghur-udt-ud-2.15-241121 | Gold tokenization | — | — | 90.21 | 92.34 | 88.44 | 81.14 | 95.23 | 77.35 | 66.34 | 51.14 | 58.63 |
vietnamese-vtb-ud-2.15-241121 | Raw text | 86.06 | 93.73 | 78.47 | 77.51 | — | 77.34 | 85.77 | 56.62 | 49.62 | 41.05 | 45.25 |
vietnamese-vtb-ud-2.15-241121 | Gold tokenization | — | — | 89.83 | 88.75 | — | 88.54 | 99.51 | 76.22 | 65.85 | 55.29 | 61.23 |
welsh-ccg-ud-2.15-241121 | Raw text | 99.56 | 97.79 | 95.63 | 94.63 | 89.87 | 87.59 | 94.69 | 87.52 | 81.57 | 63.84 | 70.76 |
welsh-ccg-ud-2.15-241121 | Gold tokenization | — | — | 96.03 | 95.00 | 90.26 | 87.95 | 95.11 | 88.54 | 82.54 | 64.77 | 71.75 |
western_armenian-armtdp-ud-2.15-241121 | Raw text | 99.89 | 98.68 | 96.91 | — | 92.71 | 91.97 | 97.10 | 89.37 | 84.87 | 70.30 | 76.52 |
western_armenian-armtdp-ud-2.15-241121 | Gold tokenization | — | — | 96.98 | — | 92.80 | 92.04 | 97.20 | 89.64 | 85.11 | 70.54 | 76.76 |
wolof-wtb-ud-2.15-241121 | Raw text | 99.23 | 91.95 | 94.07 | 93.99 | 93.53 | 91.34 | 95.15 | 84.04 | 78.76 | 66.90 | 70.21 |
wolof-wtb-ud-2.15-241121 | Gold tokenization | — | — | 95.08 | 94.96 | 94.33 | 92.24 | 95.93 | 86.19 | 80.86 | 69.06 | 72.20 |
Universal Dependencies 2.12 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.12 treebanks, and additionally use multilingual BERT and RobeCzech.
The models require UDPipe 2.
The latest version 230717 of the Universal Dependencies 2.12 models can be downloaded from the LINDAT/CLARIN repository.
The models are also available in the REST service.
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.
The models were trained on Universal Dependencies 2.12 treebanks.
For the UD treebanks which do not contain the original plain text version, raw text is used to train the tokenizer instead. The plain texts were taken from the W2C -- Web to Corpus.
Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.
The Universal Dependencies 2.12 models contain 131 models of 72 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
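The fallback split and the size threshold described above can be sketched roughly as follows. This is a minimal illustration, not the actual training pipeline: the function names are hypothetical, and it assumes the 10% is counted in sentences, which the text does not specify.

```python
def split_train_dev(sentences):
    """Hold out the last 10% of the training sentences as dev data.

    `sentences` is a list of sentences, each a list of word forms.
    Assumption: the split is made on sentence boundaries.
    """
    n_dev = len(sentences) // 10
    if n_dev == 0:
        # Too few sentences to hold anything out.
        return list(sentences), []
    return list(sentences[:-n_dev]), list(sentences[-n_dev:])


def has_enough_training_words(sentences, min_words=1000):
    """Return True if the training data reaches the minimum word count."""
    return sum(len(sentence) for sentence in sentences) >= min_words
```

A treebank with 20 training sentences would keep the first 18 for training and use the last 2 as dev data under this sketch.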
The tokenizer is trained using the SpaceAfter=No features. If the features are not present in the data, they can be filled in using raw text in the language in question.
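To illustrate how the SpaceAfter=No features encode the original spacing, the sketch below rebuilds the surface text of one CoNLL-U sentence from its FORM and MISC columns. It is a simplified illustration and not UDPipe code: the function name `detokenize` is hypothetical, and the handling of multiword tokens and empty nodes follows the CoNLL-U conventions as understood here.

```python
def detokenize(conllu_sentence):
    """Rebuild surface text from one CoNLL-U sentence using SpaceAfter=No."""
    pieces = []
    skip_until = 0  # last word ID covered by the current multiword token
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "." in tok_id:
            continue  # empty nodes have no surface form
        if "-" in tok_id:
            # Multiword token: use its surface form, skip the words it covers.
            skip_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= skip_until:
            continue
        pieces.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


rows = [
    ("1", "Hello", "SpaceAfter=No"),
    ("2", ",", "_"),
    ("3", "world", "SpaceAfter=No"),
    ("4", "!", "_"),
]
example = "\n".join(
    "\t".join([tok_id, form] + ["_"] * 7 + [misc]) for tok_id, form, misc in rows
)
print(detokenize(example))  # Hello, world!
```

Without the SpaceAfter=No marks, the same tokens would come out as "Hello , world !", which is why the tokenizer needs either these features or matching raw text.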
The tagger, lemmatizer and parser are trained using gold UD data.
We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py script.
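As a rough illustration of why scores on raw text are lower than with gold tokenization, the sketch below computes an F1 score over word spans. It is a simplified stand-in and not the conll18_ud_eval.py script itself: the helper names are hypothetical, and words are aligned only by their character offsets with whitespace ignored.

```python
def char_spans(tokens):
    """Map tokens to (start, end) character offsets, ignoring whitespace."""
    spans, pos = [], 0
    for token in tokens:
        spans.append((pos, pos + len(token)))
        pos += len(token)
    return spans


def f1_score(gold_items, system_items):
    """Harmonic mean of precision and recall over two collections of items."""
    gold, system = set(gold_items), set(system_items)
    correct = len(gold & system)
    if correct == 0:
        return 0.0
    precision = correct / len(system)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_tokens = ["New", "York", "is", "big", "."]
system_tokens = ["New", "York", "is", "big."]  # final period not split off
words_f1 = f1_score(char_spans(gold_tokens), char_spans(system_tokens))
print(f"{100 * words_f1:.2f}")
```

Here three of the four predicted words match three of the five gold words, so precision is 3/4 and recall is 3/5. A word that is segmented wrongly cannot be counted as correct in any later column, which is why the tagging and parsing scores on raw text are bounded by the Words score.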
Model | Mode | Words | Sents | UPOS | XPOS | UFeats | AllTags | Lemma | UAS | LAS | MLAS | BLEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|
afrikaans-afribooms-ud-2.12-230717 | Raw text | 99.94 | 99.65 | 98.65 | 95.80 | 98.35 | 95.62 | 98.29 | 90.41 | 87.46 | 78.92 | 79.72 |
afrikaans-afribooms-ud-2.12-230717 | Gold tokenization | — | — | 98.71 | 95.85 | 98.41 | 95.67 | 98.33 | 90.58 | 87.63 | 79.08 | 79.85 |
ancient_greek-proiel-ud-2.12-230717 | Raw text | 99.98 | 49.19 | 97.63 | 97.93 | 92.13 | 90.72 | 94.76 | 81.74 | 77.92 | 61.96 | 66.22 |
ancient_greek-proiel-ud-2.12-230717 | Gold tokenization | — | — | 97.87 | 98.14 | 92.58 | 91.30 | 94.82 | 86.59 | 82.72 | 68.31 | 72.13 |
ancient_greek-perseus-ud-2.12-230717 | Raw text | 99.97 | 98.85 | 92.98 | 85.76 | 91.42 | 84.89 | 86.72 | 80.13 | 74.55 | 54.46 | 55.76 |
ancient_greek-perseus-ud-2.12-230717 | Gold tokenization | — | — | 93.02 | 85.78 | 91.44 | 84.91 | 86.74 | 80.30 | 74.71 | 54.56 | 55.87 |
ancient_hebrew-ptnk-ud-2.12-230717 | Raw text | 71.14 | 98.06 | 69.03 | 69.17 | 67.09 | 66.13 | 67.47 | 48.21 | 46.75 | 31.14 | 32.42 |
ancient_hebrew-ptnk-ud-2.12-230717 | Gold tokenization | — | — | 96.70 | 96.86 | 90.36 | 89.22 | 96.60 | 91.29 | 87.61 | 68.26 | 76.48 |
arabic-padt-ud-2.12-230717 | Raw text | 94.58 | 82.09 | 91.67 | 89.00 | 89.12 | 88.65 | 90.34 | 78.87 | 74.83 | 66.04 | 68.16 |
arabic-padt-ud-2.12-230717 | Gold tokenization | — | — | 96.96 | 94.34 | 94.50 | 94.01 | 95.20 | 88.24 | 83.65 | 74.73 | 76.32 |
armenian-armtdp-ud-2.12-230717 | Raw text | 99.28 | 95.70 | 96.29 | — | 91.22 | 90.21 | 94.93 | 86.55 | 81.73 | 68.65 | 73.67 |
armenian-armtdp-ud-2.12-230717 | Gold tokenization | — | — | 96.86 | — | 91.86 | 90.71 | 95.54 | 88.21 | 83.30 | 69.30 | 74.60 |
armenian-bsut-ud-2.12-230717 | Raw text | 99.79 | 98.73 | 97.38 | — | 91.96 | 91.20 | 96.79 | 90.06 | 85.83 | 71.10 | 79.05 |
armenian-bsut-ud-2.12-230717 | Gold tokenization | — | — | 97.60 | — | 92.16 | 91.41 | 96.99 | 90.62 | 86.38 | 71.65 | 79.57 |
basque-bdt-ud-2.12-230717 | Raw text | 99.94 | 99.83 | 96.33 | — | 93.32 | 91.44 | 96.38 | 87.81 | 84.80 | 74.93 | 79.49 |
basque-bdt-ud-2.12-230717 | Gold tokenization | — | — | 96.39 | — | 93.37 | 91.48 | 96.41 | 87.88 | 84.86 | 74.97 | 79.52 |
belarusian-hse-ud-2.12-230717 | Raw text | 99.37 | 86.58 | 98.21 | 97.59 | 94.52 | 93.57 | 93.25 | 87.32 | 85.41 | 76.98 | 76.66 |
belarusian-hse-ud-2.12-230717 | Gold tokenization | — | — | 98.81 | 98.16 | 95.10 | 94.13 | 93.82 | 89.79 | 87.60 | 78.81 | 78.42 |
bulgarian-btb-ud-2.12-230717 | Raw text | 99.91 | 94.17 | 99.20 | 97.25 | 97.96 | 96.87 | 98.01 | 94.41 | 91.71 | 85.92 | 86.61 |
bulgarian-btb-ud-2.12-230717 | Gold tokenization | — | — | 99.33 | 97.37 | 98.09 | 96.99 | 98.09 | 95.22 | 92.48 | 86.61 | 87.35 |
catalan-ancora-ud-2.12-230717 | Raw text | 99.95 | 99.08 | 99.11 | 97.25 | 98.72 | 96.98 | 99.40 | 94.92 | 93.43 | 88.06 | 89.51 |
catalan-ancora-ud-2.12-230717 | Gold tokenization | — | — | 99.17 | 97.36 | 98.79 | 97.09 | 99.46 | 95.08 | 93.59 | 88.27 | 89.70 |
chinese-gsdsimp-ud-2.12-230717 | Raw text | 90.29 | 99.10 | 87.23 | 87.15 | 89.70 | 86.47 | 90.23 | 72.74 | 70.28 | 63.38 | 66.93 |
chinese-gsdsimp-ud-2.12-230717 | Gold tokenization | — | — | 96.10 | 96.00 | 99.43 | 95.33 | 99.93 | 87.08 | 83.97 | 78.23 | 82.55 |
chinese-gsd-ud-2.12-230717 | Raw text | 90.27 | 99.10 | 87.18 | 87.09 | 89.68 | 86.42 | 90.20 | 72.57 | 70.14 | 63.02 | 66.73 |
chinese-gsd-ud-2.12-230717 | Gold tokenization | — | — | 96.16 | 96.05 | 99.41 | 95.38 | 99.92 | 87.18 | 84.07 | 78.07 | 82.55 |
classical_chinese-kyoto-ud-2.12-230717 | Raw text | 97.94 | 46.37 | 89.68 | 88.94 | 91.59 | 86.06 | 97.51 | 71.39 | 66.13 | 62.36 | 64.70 |
classical_chinese-kyoto-ud-2.12-230717 | Gold tokenization | — | — | 93.43 | 92.14 | 94.63 | 89.89 | 99.52 | 84.71 | 79.40 | 75.55 | 78.24 |
coptic-scriptorium-ud-2.12-230717 | Raw text | 75.26 | 33.63 | 73.02 | 72.92 | 73.15 | 72.08 | 73.90 | 51.82 | 50.05 | 37.34 | 39.84 |
coptic-scriptorium-ud-2.12-230717 | Gold tokenization | — | — | 97.05 | 96.91 | 97.81 | 95.90 | 97.41 | 90.46 | 87.70 | 76.41 | 79.79 |
croatian-set-ud-2.12-230717 | Raw text | 99.93 | 94.79 | 98.49 | 95.82 | 96.31 | 95.53 | 97.78 | 92.41 | 89.57 | 81.89 | 84.77 |
croatian-set-ud-2.12-230717 | Gold tokenization | — | — | 98.57 | 95.90 | 96.39 | 95.62 | 97.83 | 92.91 | 90.07 | 82.34 | 85.21 |
czech-pdt-ud-2.12-230717 | Raw text | 99.93 | 93.37 | 99.30 | 98.46 | 98.78 | 98.25 | 99.38 | 95.01 | 93.64 | 90.75 | 92.30 |
czech-pdt-ud-2.12-230717 | Gold tokenization | — | — | 99.38 | 98.55 | 98.87 | 98.34 | 99.46 | 95.81 | 94.43 | 91.41 | 92.97 |
czech-cac-ud-2.12-230717 | Raw text | 99.99 | 99.68 | 99.69 | 98.25 | 98.07 | 97.74 | 99.30 | 96.20 | 94.95 | 90.84 | 93.11 |
czech-cac-ud-2.12-230717 | Gold tokenization | — | — | 99.70 | 98.26 | 98.08 | 97.74 | 99.31 | 96.20 | 94.95 | 90.86 | 93.13 |
czech-cltt-ud-2.12-230717 | Raw text | 99.32 | 96.92 | 98.95 | 93.72 | 93.99 | 93.65 | 98.48 | 92.47 | 90.77 | 82.87 | 88.60 |
czech-cltt-ud-2.12-230717 | Gold tokenization | — | — | 99.40 | 94.18 | 94.49 | 94.07 | 99.03 | 93.17 | 91.20 | 82.92 | 88.86 |
czech-fictree-ud-2.12-230717 | Raw text | 99.99 | 98.95 | 99.14 | 97.07 | 97.93 | 96.88 | 99.29 | 96.17 | 94.67 | 89.47 | 92.40 |
czech-fictree-ud-2.12-230717 | Gold tokenization | — | — | 99.16 | 97.08 | 97.95 | 96.89 | 99.30 | 96.24 | 94.73 | 89.57 | 92.50 |
danish-ddt-ud-2.12-230717 | Raw text | 99.82 | 89.80 | 97.98 | — | 97.36 | 96.60 | 97.38 | 88.60 | 86.46 | 79.10 | 81.13 |
danish-ddt-ud-2.12-230717 | Gold tokenization | — | — | 98.17 | — | 97.59 | 96.83 | 97.52 | 89.82 | 87.64 | 80.18 | 82.15 |
dutch-alpino-ud-2.12-230717 | Raw text | 99.75 | 89.10 | 97.77 | 96.66 | 97.68 | 96.28 | 94.91 | 92.90 | 90.52 | 83.71 | 80.13 |
dutch-alpino-ud-2.12-230717 | Gold tokenization | — | — | 98.02 | 96.82 | 97.89 | 96.45 | 95.16 | 94.31 | 91.90 | 85.01 | 81.32 |
dutch-lassysmall-ud-2.12-230717 | Raw text | 99.77 | 74.71 | 97.33 | 96.15 | 96.86 | 95.50 | 96.18 | 90.99 | 88.49 | 80.77 | 79.51 |
dutch-lassysmall-ud-2.12-230717 | Gold tokenization | — | — | 97.67 | 96.84 | 97.49 | 96.29 | 96.51 | 94.63 | 91.91 | 85.25 | 84.07 |
english-ewt-ud-2.12-230717 | Raw text | 99.09 | 87.82 | 96.69 | 96.32 | 96.70 | 95.01 | 97.26 | 90.71 | 88.81 | 82.26 | 84.29 |
english-ewt-ud-2.12-230717 | Gold tokenization | — | — | 97.50 | 97.17 | 97.56 | 95.84 | 98.08 | 93.03 | 91.04 | 84.70 | 86.64 |
english-atis-ud-2.12-230717 | Raw text | 99.98 | 80.49 | 99.04 | — | 98.59 | 98.14 | 99.63 | 94.45 | 93.02 | 87.97 | 90.18 |
english-atis-ud-2.12-230717 | Gold tokenization | — | — | 99.07 | — | 98.65 | 98.21 | 99.64 | 96.17 | 94.59 | 90.22 | 92.42 |
english-eslspok-ud-2.12-230717 | Raw text | 100.00 | 92.21 | 98.59 | 98.54 | — | 98.10 | — | 95.37 | 93.87 | 91.26 | 93.01 |
english-eslspok-ud-2.12-230717 | Gold tokenization | — | — | 98.68 | 98.68 | — | 98.19 | — | 96.20 | 94.79 | 91.98 | 93.74 |
english-gum-ud-2.12-230717 | Raw text | 99.63 | 95.72 | 98.00 | 98.04 | 97.91 | 97.02 | 98.84 | 92.87 | 90.96 | 85.35 | 87.10 |
english-gum-ud-2.12-230717 | Gold tokenization | — | — | 98.34 | 98.39 | 98.27 | 97.36 | 99.15 | 93.72 | 91.79 | 86.11 | 87.79 |
english-lines-ud-2.12-230717 | Raw text | 99.92 | 87.45 | 97.68 | 96.86 | 97.03 | 94.48 | 98.33 | 91.32 | 88.55 | 80.61 | 83.70 |
english-lines-ud-2.12-230717 | Gold tokenization | — | — | 97.78 | 96.95 | 97.13 | 94.58 | 98.39 | 92.18 | 89.38 | 81.36 | 84.51 |
english-partut-ud-2.12-230717 | Raw text | 99.72 | 100.00 | 97.43 | 97.29 | 96.44 | 95.41 | 98.17 | 94.24 | 92.33 | 83.28 | 87.38 |
english-partut-ud-2.12-230717 | Gold tokenization | — | — | 97.68 | 97.54 | 96.68 | 95.69 | 98.44 | 94.42 | 92.52 | 83.71 | 87.64 |
erzya-jr-ud-2.12-230717 | Raw text | 99.18 | 94.15 | 87.94 | 87.38 | 78.90 | 73.49 | 84.89 | 72.92 | 63.24 | 41.32 | 48.34 |
erzya-jr-ud-2.12-230717 | Gold tokenization | — | — | 88.66 | 88.07 | 79.54 | 74.02 | 85.52 | 74.08 | 64.24 | 41.90 | 48.91 |
estonian-edt-ud-2.12-230717 | Raw text | 99.94 | 92.23 | 97.67 | 98.21 | 96.42 | 95.26 | 95.45 | 88.56 | 85.96 | 80.03 | 79.59 |
estonian-edt-ud-2.12-230717 | Gold tokenization | — | — | 97.79 | 98.26 | 96.50 | 95.38 | 95.52 | 89.43 | 86.81 | 80.81 | 80.32 |
estonian-ewt-ud-2.12-230717 | Raw text | 98.63 | 78.03 | 94.88 | 96.14 | 94.10 | 91.81 | 93.75 | 83.23 | 80.01 | 72.15 | 73.37 |
estonian-ewt-ud-2.12-230717 | Gold tokenization | — | — | 96.22 | 97.46 | 95.35 | 93.07 | 95.00 | 87.27 | 83.71 | 74.95 | 76.20 |
faroese-farpahc-ud-2.12-230717 | Raw text | 99.74 | 92.77 | 97.45 | 93.00 | 94.24 | 92.32 | 99.74 | 86.01 | 82.29 | 68.12 | 75.32 |
faroese-farpahc-ud-2.12-230717 | Gold tokenization | — | — | 97.67 | 93.17 | 94.49 | 92.53 | 100.00 | 86.96 | 83.20 | 69.20 | 76.51 |
finnish-tdt-ud-2.12-230717 | Raw text | 99.70 | 90.82 | 97.67 | 98.31 | 96.08 | 95.23 | 92.11 | 90.42 | 88.48 | 82.43 | 78.28 |
finnish-tdt-ud-2.12-230717 | Gold tokenization | — | — | 98.00 | 98.59 | 96.39 | 95.54 | 92.39 | 91.72 | 89.75 | 83.44 | 79.26 |
finnish-ftb-ud-2.12-230717 | Raw text | 99.91 | 86.84 | 96.70 | 95.08 | 96.78 | 94.01 | 95.76 | 90.17 | 87.35 | 80.13 | 80.78 |
finnish-ftb-ud-2.12-230717 | Gold tokenization | — | — | 97.08 | 95.29 | 96.87 | 94.35 | 95.90 | 92.40 | 89.53 | 82.71 | 83.28 |
french-gsd-ud-2.12-230717 | Raw text | 98.84 | 94.93 | 97.33 | — | 97.25 | 96.57 | 97.72 | 93.15 | 91.20 | 84.70 | 86.92 |
french-gsd-ud-2.12-230717 | Gold tokenization | — | — | 98.48 | — | 98.32 | 97.65 | 98.85 | 94.94 | 93.11 | 86.58 | 88.24 |
french-parisstories-ud-2.12-230717 | Raw text | 99.73 | 93.08 | 97.20 | — | 93.07 | 91.20 | 98.02 | 79.94 | 76.66 | 62.32 | 71.73 |
french-parisstories-ud-2.12-230717 | Gold tokenization | — | — | 97.48 | — | 93.30 | 91.45 | 98.26 | 81.17 | 77.85 | 63.20 | 72.63 |
french-partut-ud-2.12-230717 | Raw text | 99.42 | 98.64 | 97.43 | 96.97 | 95.28 | 94.51 | 97.89 | 94.44 | 92.83 | 82.58 | 87.48 |
french-partut-ud-2.12-230717 | Gold tokenization | — | — | 98.12 | 97.62 | 95.89 | 95.12 | 98.50 | 95.35 | 93.89 | 83.73 | 88.49 |
french-rhapsodie-ud-2.12-230717 | Raw text | 99.16 | 99.82 | 97.31 | 97.37 | 96.16 | 93.38 | 98.19 | 88.06 | 84.92 | 75.51 | 80.31 |
french-rhapsodie-ud-2.12-230717 | Gold tokenization | — | — | 98.19 | 98.11 | 97.02 | 94.15 | 99.00 | 89.33 | 86.14 | 76.49 | 80.94 |
french-sequoia-ud-2.12-230717 | Raw text | 99.15 | 89.53 | 98.40 | — | 97.19 | 96.84 | 98.30 | 94.06 | 92.75 | 86.39 | 89.39 |
french-sequoia-ud-2.12-230717 | Gold tokenization | — | — | 99.25 | — | 98.01 | 97.63 | 99.14 | 95.63 | 94.37 | 88.23 | 90.53 |
galician-treegal-ud-2.12-230717 | Raw text | 98.74 | 87.99 | 95.93 | 93.63 | 94.83 | 92.83 | 96.76 | 83.46 | 79.60 | 67.95 | 71.94 |
galician-treegal-ud-2.12-230717 | Gold tokenization | — | — | 97.19 | 94.72 | 95.89 | 93.88 | 97.90 | 86.99 | 82.78 | 71.43 | 75.90 |
galician-ctg-ud-2.12-230717 | Raw text | 99.22 | 97.22 | 97.24 | 97.07 | 99.05 | 96.65 | 98.12 | 85.14 | 82.72 | 71.07 | 75.57 |
galician-ctg-ud-2.12-230717 | Gold tokenization | — | — | 97.97 | 97.79 | 99.83 | 97.35 | 98.86 | 86.86 | 84.31 | 73.03 | 77.52 |
german-gsd-ud-2.12-230717 | Raw text | 99.76 | 82.68 | 96.16 | 97.53 | 90.78 | 88.15 | 96.91 | 87.04 | 83.20 | 65.39 | 75.45 |
german-gsd-ud-2.12-230717 | Gold tokenization | — | — | 96.47 | 97.80 | 91.21 | 88.63 | 97.18 | 88.79 | 84.99 | 66.97 | 77.23 |
german-hdt-ud-2.12-230717 | Raw text | 99.90 | 92.39 | 98.55 | 98.46 | 94.21 | 93.81 | 97.69 | 96.90 | 96.00 | 84.87 | 90.50 |
german-hdt-ud-2.12-230717 | Gold tokenization | — | — | 98.66 | 98.59 | 94.34 | 93.95 | 97.79 | 97.60 | 96.71 | 85.54 | 91.20 |
gothic-proiel-ud-2.12-230717 | Raw text | 100.00 | 31.12 | 96.17 | 96.70 | 89.88 | 87.80 | 94.71 | 78.72 | 72.74 | 58.81 | 63.43 |
gothic-proiel-ud-2.12-230717 | Gold tokenization | — | — | 96.85 | 97.22 | 91.01 | 89.27 | 94.78 | 86.66 | 80.81 | 68.63 | 72.61 |
greek-gdt-ud-2.12-230717 | Raw text | 99.87 | 90.19 | 98.19 | 98.21 | 95.72 | 95.10 | 96.09 | 92.99 | 91.18 | 81.67 | 81.75 |
greek-gdt-ud-2.12-230717 | Gold tokenization | — | — | 98.32 | 98.34 | 95.83 | 95.20 | 96.17 | 93.80 | 91.91 | 82.25 | 82.38 |
greek-gud-ud-2.12-230717 | Raw text | 99.92 | 94.98 | 97.01 | 96.26 | 94.24 | 90.55 | 95.76 | 92.94 | 90.06 | 75.94 | 80.42 |
greek-gud-ud-2.12-230717 | Gold tokenization | — | — | 97.11 | 96.32 | 94.32 | 90.65 | 95.83 | 93.59 | 90.68 | 76.44 | 80.91 |
hebrew-htb-ud-2.12-230717 | Raw text | 85.10 | 99.69 | 82.96 | 82.95 | 81.30 | 80.69 | 83.02 | 70.71 | 68.21 | 55.84 | 60.10 |
hebrew-htb-ud-2.12-230717 | Gold tokenization | — | — | 97.64 | 97.62 | 95.79 | 95.27 | 97.36 | 92.45 | 89.94 | 79.28 | 82.12 |
hebrew-iahltwiki-ud-2.12-230717 | Raw text | 88.54 | 97.16 | 85.97 | 85.97 | 81.45 | 80.46 | 87.15 | 76.11 | 74.26 | 57.99 | 67.30 |
hebrew-iahltwiki-ud-2.12-230717 | Gold tokenization | — | — | 97.09 | 97.09 | 92.18 | 91.10 | 98.29 | 93.66 | 91.26 | 75.14 | 85.53 |
hindi-hdtb-ud-2.12-230717 | Raw text | 100.00 | 98.72 | 97.74 | 97.35 | 94.23 | 92.39 | 98.93 | 95.31 | 92.43 | 79.64 | 87.78 |
hindi-hdtb-ud-2.12-230717 | Gold tokenization | — | — | 97.74 | 97.34 | 94.25 | 92.40 | 98.94 | 95.43 | 92.55 | 79.77 | 87.94 |
hungarian-szeged-ud-2.12-230717 | Raw text | 99.85 | 95.89 | 96.68 | — | 94.18 | 93.47 | 94.89 | 88.56 | 84.89 | 74.96 | 78.33 |
hungarian-szeged-ud-2.12-230717 | Gold tokenization | — | — | 96.77 | — | 94.31 | 93.57 | 95.01 | 88.99 | 85.30 | 75.22 | 78.65 |
icelandic-modern-ud-2.12-230717 | Raw text | 99.37 | 94.59 | 97.58 | 95.34 | 88.49 | 85.62 | 96.87 | 86.05 | 83.30 | 64.64 | 75.54 |
icelandic-modern-ud-2.12-230717 | Gold tokenization | — | — | 98.15 | 95.92 | 88.93 | 86.07 | 97.45 | 87.03 | 84.17 | 65.30 | 76.47 |
icelandic-gc-ud-2.12-230717 | Raw text | 99.72 | 94.64 | 94.72 | 82.28 | 85.01 | 79.83 | 91.64 | 83.22 | 78.78 | 58.56 | 68.85 |
icelandic-gc-ud-2.12-230717 | Gold tokenization | — | — | 95.06 | 82.71 | 85.52 | 80.34 | 91.81 | 84.14 | 79.66 | 59.21 | 69.49 |
icelandic-icepahc-ud-2.12-230717 | Raw text | 99.80 | 92.67 | 96.90 | 93.31 | 92.01 | 87.13 | 96.24 | 87.30 | 83.46 | 66.91 | 74.57 |
icelandic-icepahc-ud-2.12-230717 | Gold tokenization | — | — | 97.08 | 93.55 | 92.18 | 87.35 | 96.39 | 87.85 | 83.95 | 67.40 | 75.14 |
indonesian-gsd-ud-2.12-230717 | Raw text | 99.49 | 92.35 | 94.35 | 94.03 | 95.77 | 89.10 | 98.12 | 87.62 | 81.71 | 72.43 | 76.99 |
indonesian-gsd-ud-2.12-230717 | Gold tokenization | — | — | 94.79 | 94.41 | 96.17 | 89.42 | 98.52 | 88.41 | 82.43 | 73.18 | 77.77 |
indonesian-csui-ud-2.12-230717 | Raw text | 99.45 | 91.01 | 95.88 | 96.07 | 96.66 | 95.33 | 98.11 | 85.95 | 81.63 | 76.18 | 78.21 |
indonesian-csui-ud-2.12-230717 | Gold tokenization | — | — | 96.34 | 96.58 | 97.15 | 95.78 | 98.74 | 87.26 | 82.71 | 77.14 | 79.20 |
irish-idt-ud-2.12-230717 | Raw text | 99.88 | 97.58 | 96.04 | 94.90 | 90.84 | 87.69 | 95.85 | 86.48 | 80.95 | 64.53 | 71.46 |
irish-idt-ud-2.12-230717 | Gold tokenization | — | — | 96.13 | 95.08 | 90.97 | 87.85 | 95.97 | 86.75 | 81.19 | 64.56 | 71.53 |
irish-twittirish-ud-2.12-230717 | Raw text | 98.50 | 46.62 | 90.58 | — | — | 90.58 | 88.41 | 78.58 | 72.34 | 58.38 | 56.97 |
irish-twittirish-ud-2.12-230717 | Gold tokenization | — | — | 91.80 | — | — | 91.80 | 89.57 | 85.75 | 79.31 | 66.72 | 64.16 |
italian-isdt-ud-2.12-230717 | Raw text | 99.74 | 99.07 | 98.44 | 98.38 | 98.14 | 97.64 | 98.68 | 94.73 | 93.05 | 86.79 | 88.06 |
italian-isdt-ud-2.12-230717 | Gold tokenization | — | — | 98.71 | 98.64 | 98.39 | 97.89 | 98.95 | 95.14 | 93.47 | 87.19 | 88.54 |
italian-markit-ud-2.12-230717 | Raw text | 99.62 | 98.24 | 96.96 | 97.13 | 94.12 | 92.60 | 88.18 | 88.60 | 84.72 | 70.64 | 77.87 |
italian-markit-ud-2.12-230717 | Gold tokenization | — | — | 97.35 | 97.52 | 94.39 | 92.88 | 88.50 | 89.39 | 85.51 | 71.25 | 78.65 |
italian-parlamint-ud-2.12-230717 | Raw text | 99.42 | 94.12 | 98.59 | 97.96 | 97.95 | 97.05 | 98.63 | 91.93 | 89.97 | 84.20 | 86.02 |
italian-parlamint-ud-2.12-230717 | Gold tokenization | — | — | 99.17 | 98.52 | 98.50 | 97.58 | 99.16 | 93.41 | 91.44 | 85.80 | 87.60 |
italian-partut-ud-2.12-230717 | Raw text | 99.73 | 100.00 | 98.41 | 98.41 | 98.19 | 97.64 | 98.57 | 96.15 | 94.15 | 87.84 | 88.90 |
italian-partut-ud-2.12-230717 | Gold tokenization | — | — | 98.52 | 98.52 | 98.27 | 97.72 | 98.82 | 96.18 | 94.09 | 87.68 | 88.80 |
italian-postwita-ud-2.12-230717 | Raw text | 99.36 | 49.53 | 96.58 | 96.33 | 96.33 | 94.80 | 96.62 | 82.80 | 79.03 | 68.81 | 70.55 |
italian-postwita-ud-2.12-230717 | Gold tokenization | — | — | 97.17 | 96.95 | 96.90 | 95.44 | 97.25 | 87.96 | 83.79 | 75.19 | 76.95 |
italian-twittiro-ud-2.12-230717 | Raw text | 98.94 | 46.67 | 95.91 | 95.74 | 94.94 | 93.42 | 94.57 | 83.09 | 78.65 | 66.30 | 66.97 |
italian-twittiro-ud-2.12-230717 | Gold tokenization | — | — | 96.84 | 96.57 | 95.87 | 94.15 | 95.43 | 88.34 | 83.43 | 71.82 | 72.62 |
italian-vit-ud-2.12-230717 | Raw text | 99.74 | 94.87 | 98.11 | 97.29 | 97.65 | 96.15 | 98.87 | 92.22 | 89.43 | 81.28 | 84.07 |
italian-vit-ud-2.12-230717 | Gold tokenization | — | — | 98.35 | 97.65 | 97.87 | 96.51 | 99.10 | 92.94 | 90.12 | 81.99 | 84.78 |
japanese-gsdluw-ud-2.12-230717 | Raw text | 95.18 | 99.72 | 93.82 | 93.50 | 95.18 | 93.44 | 93.56 | 86.27 | 85.58 | 76.26 | 76.41 |
japanese-gsdluw-ud-2.12-230717 | Gold tokenization | — | — | 98.36 | 98.01 | 100.00 | 97.90 | 97.78 | 95.16 | 94.19 | 86.28 | 84.89 |
japanese-gsd-ud-2.12-230717 | Raw text | 96.17 | 100.00 | 94.97 | 94.18 | 96.16 | 93.85 | 95.03 | 87.91 | 87.07 | 80.80 | 80.98 |
japanese-gsd-ud-2.12-230717 | Gold tokenization | — | — | 98.59 | 97.52 | 99.99 | 97.20 | 98.47 | 94.93 | 93.94 | 88.80 | 88.47 |
korean-kaist-ud-2.12-230717 | Raw text | 100.00 | 100.00 | 96.19 | 87.78 | — | 87.58 | 94.18 | 88.85 | 86.92 | 82.77 | 80.35 |
korean-kaist-ud-2.12-230717 | Gold tokenization | — | — | 96.19 | 87.78 | — | 87.58 | 94.18 | 88.85 | 86.92 | 82.77 | 80.35 |
korean-gsd-ud-2.12-230717 | Raw text | 99.87 | 93.93 | 96.54 | 90.07 | 99.67 | 87.94 | 93.62 | 87.88 | 83.98 | 80.68 | 76.82 |
korean-gsd-ud-2.12-230717 | Gold tokenization | — | — | 96.72 | 90.24 | 99.79 | 88.12 | 93.74 | 88.69 | 84.76 | 81.49 | 77.58 |
latin-ittb-ud-2.12-230717 | Raw text | 99.99 | 91.21 | 99.01 | 96.65 | 97.07 | 95.62 | 99.16 | 90.25 | 88.31 | 82.53 | 85.95 |
latin-ittb-ud-2.12-230717 | Gold tokenization | — | — | 99.03 | 96.66 | 97.12 | 95.64 | 99.18 | 91.28 | 89.35 | 83.17 | 86.55 |
latin-llct-ud-2.12-230717 | Raw text | 99.99 | 99.49 | 99.75 | 97.14 | 97.15 | 96.87 | 97.76 | 95.37 | 94.37 | 89.06 | 90.39 |
latin-llct-ud-2.12-230717 | Gold tokenization | — | — | 99.75 | 97.14 | 97.16 | 96.87 | 97.77 | 95.39 | 94.39 | 89.07 | 90.41 |
latin-perseus-ud-2.12-230717 | Raw text | 99.95 | 98.99 | 92.88 | 81.11 | 84.60 | 77.45 | 88.86 | 78.92 | 71.78 | 53.50 | 59.27 |
latin-perseus-ud-2.12-230717 | Gold tokenization | — | — | 92.95 | 81.14 | 84.65 | 77.49 | 88.89 | 79.08 | 71.91 | 53.59 | 59.31 |
latin-proiel-ud-2.12-230717 | Raw text | 99.85 | 37.40 | 96.60 | 96.69 | 90.66 | 89.40 | 96.19 | 76.66 | 72.46 | 59.50 | 64.96 |
latin-proiel-ud-2.12-230717 | Gold tokenization | — | — | 97.02 | 97.12 | 91.43 | 90.31 | 96.42 | 83.88 | 79.55 | 67.88 | 73.35 |
latin-udante-ud-2.12-230717 | Raw text | 99.61 | 98.81 | 90.94 | 75.50 | 84.14 | 72.23 | 86.97 | 75.88 | 68.63 | 47.63 | 51.46 |
latin-udante-ud-2.12-230717 | Gold tokenization | — | — | 91.18 | 75.56 | 84.38 | 72.29 | 87.20 | 75.95 | 68.65 | 47.73 | 51.44 |
latvian-lvtb-ud-2.12-230717 | Raw text | 99.29 | 97.80 | 96.79 | 90.92 | 94.75 | 90.16 | 96.57 | 88.79 | 85.85 | 77.56 | 81.04 |
latvian-lvtb-ud-2.12-230717 | Gold tokenization | — | — | 97.44 | 91.56 | 95.44 | 90.77 | 97.21 | 89.95 | 86.94 | 78.74 | 82.16 |
lithuanian-alksnis-ud-2.12-230717 | Raw text | 99.91 | 87.87 | 95.95 | 90.31 | 91.09 | 89.50 | 93.45 | 82.74 | 78.94 | 68.03 | 71.20 |
lithuanian-alksnis-ud-2.12-230717 | Gold tokenization | — | — | 96.08 | 90.44 | 91.27 | 89.64 | 93.56 | 83.94 | 80.08 | 68.93 | 72.06 |
lithuanian-hse-ud-2.12-230717 | Raw text | 97.30 | 97.30 | 89.93 | 90.03 | 81.92 | 79.03 | 88.16 | 71.85 | 62.63 | 44.48 | 53.82 |
lithuanian-hse-ud-2.12-230717 | Gold tokenization | — | — | 91.32 | 91.42 | 83.40 | 80.09 | 90.75 | 75.00 | 65.00 | 45.87 | 55.84 |
maghrebi_arabic_french-arabizi-ud-2.12-230717 | Raw text | 91.65 | 7.00 | 78.81 | 71.59 | 82.65 | 69.81 | 50.63 | 57.90 | 49.93 | 36.22 | 24.59 |
maghrebi_arabic_french-arabizi-ud-2.12-230717 | Gold tokenization | — | — | 86.69 | 78.89 | 90.82 | 77.88 | 54.66 | 76.32 | 65.86 | 47.72 | 31.57 |
maltese-mudt-ud-2.12-230717 | Raw text | 99.84 | 86.29 | 95.73 | 95.79 | — | 95.31 | — | 85.08 | 80.25 | 68.88 | 73.05 |
maltese-mudt-ud-2.12-230717 | Gold tokenization | — | — | 95.87 | 95.92 | — | 95.46 | — | 85.70 | 80.81 | 69.28 | 73.43 |
manx-cadhan-ud-2.12-230717 | Raw text | 97.36 | 98.25 | 94.18 | — | 95.78 | 93.37 | 93.43 | 87.42 | 84.02 | 77.75 | 78.03 |
manx-cadhan-ud-2.12-230717 | Gold tokenization | — | — | 96.77 | — | 98.39 | 95.93 | 95.98 | 92.47 | 89.13 | 83.16 | 82.07 |
marathi-ufal-ud-2.12-230717 | Raw text | 94.16 | 92.63 | 82.73 | — | 74.21 | 70.80 | 84.91 | 68.13 | 60.83 | 39.75 | 47.62 |
marathi-ufal-ud-2.12-230717 | Gold tokenization | — | — | 87.14 | — | 76.94 | 73.06 | 87.86 | 73.79 | 65.29 | 41.82 | 50.10 |
naija-nsc-ud-2.12-230717 | Raw text | 99.95 | 100.00 | 98.04 | — | 98.92 | 97.53 | 99.33 | 93.02 | 90.46 | 87.48 | 88.98 |
naija-nsc-ud-2.12-230717 | Gold tokenization | — | — | 98.08 | — | 98.96 | 97.56 | 99.39 | 93.10 | 90.53 | 87.55 | 89.03 |
north_sami-giella-ud-2.12-230717 | Raw text | 99.87 | 98.79 | 91.63 | 93.51 | 89.29 | 85.24 | 86.91 | 75.78 | 70.85 | 60.16 | 58.42 |
north_sami-giella-ud-2.12-230717 | Gold tokenization | — | — | 91.78 | 93.65 | 89.42 | 85.38 | 87.01 | 76.01 | 71.08 | 60.38 | 58.60 |
norwegian-bokmaal-ud-2.12-230717 | Raw text | 99.82 | 97.27 | 98.38 | 98.94 | 97.47 | 96.81 | 98.58 | 93.88 | 92.64 | 86.95 | 88.78 |
norwegian-bokmaal-ud-2.12-230717 | Gold tokenization | — | — | 98.58 | 99.14 | 97.65 | 97.01 | 98.78 | 94.54 | 93.28 | 87.54 | 89.41 |
norwegian-nynorsk-ud-2.12-230717 | Raw text | 99.93 | 94.54 | 98.43 | 99.16 | 97.35 | 96.55 | 98.46 | 93.79 | 92.37 | 85.84 | 88.08 |
norwegian-nynorsk-ud-2.12-230717 | Gold tokenization | — | — | 98.62 | 99.27 | 97.51 | 96.75 | 98.60 | 94.58 | 93.19 | 86.82 | 89.08 |
old_church_slavonic-proiel-ud-2.12-230717 | Raw text | 100.00 | 40.05 | 96.11 | 96.39 | 89.72 | 88.01 | 90.18 | 78.06 | 73.64 | 60.96 | 61.97 |
old_church_slavonic-proiel-ud-2.12-230717 | Gold tokenization | — | — | 96.66 | 96.98 | 90.32 | 88.99 | 90.19 | 85.05 | 80.53 | 68.59 | 69.15 |
old_east_slavic-torot-ud-2.12-230717 | Raw text | 100.00 | 34.53 | 95.40 | 95.48 | 89.92 | 87.64 | 88.09 | 76.94 | 72.12 | 58.23 | 57.89 |
old_east_slavic-torot-ud-2.12-230717 | Gold tokenization | — | — | 95.89 | 95.94 | 90.67 | 88.72 | 88.14 | 85.27 | 80.38 | 67.99 | 65.94 |
old_east_slavic-birchbark-ud-2.12-230717 | Raw text | 99.99 | 16.66 | 88.50 | 99.37 | 76.09 | 72.03 | 65.58 | 64.39 | 57.66 | 32.84 | 27.49 |
old_east_slavic-birchbark-ud-2.12-230717 | Gold tokenization | — | — | 89.13 | 99.38 | 76.70 | 72.84 | 65.59 | 76.43 | 69.19 | 41.15 | 32.90 |
old_east_slavic-rnc-ud-2.12-230717 | Raw text | 97.64 | 60.48 | 92.65 | 86.46 | 78.10 | 68.76 | 77.22 | 64.58 | 60.03 | 36.90 | 37.52 |
old_east_slavic-rnc-ud-2.12-230717 | Gold tokenization | — | — | 93.99 | 88.74 | 79.24 | 69.78 | 78.28 | 70.79 | 65.17 | 39.63 | 40.04 |
old_french-srcmf-ud-2.12-230717 | Raw text | 99.70 | 100.00 | 96.74 | 96.58 | 97.76 | 95.78 | 99.66 | 90.92 | 87.17 | 80.65 | 84.10 |
old_french-srcmf-ud-2.12-230717 | Gold tokenization | — | — | 97.06 | 96.91 | 98.07 | 96.08 | 99.96 | 91.34 | 87.62 | 81.11 | 84.56 |
persian-perdt-ud-2.12-230717 | Raw text | 99.66 | 99.83 | 97.51 | 97.39 | 97.64 | 95.63 | 98.91 | 93.56 | 91.32 | 86.30 | 88.68 |
persian-perdt-ud-2.12-230717 | Gold tokenization | — | — | 97.81 | 97.68 | 97.94 | 95.92 | 99.23 | 94.08 | 91.84 | 86.86 | 89.27 |
persian-seraji-ud-2.12-230717 | Raw text | 99.65 | 98.75 | 97.89 | 97.91 | 97.90 | 97.44 | 98.26 | 91.78 | 88.92 | 84.47 | 84.56 |
persian-seraji-ud-2.12-230717 | Gold tokenization | — | — | 98.22 | 98.23 | 98.22 | 97.74 | 98.54 | 92.47 | 89.62 | 85.12 | 85.22 |
polish-pdb-ud-2.12-230717 | Raw text | 99.85 | 97.33 | 98.86 | 95.76 | 95.89 | 95.08 | 98.08 | 94.21 | 92.16 | 85.14 | 88.25 |
polish-pdb-ud-2.12-230717 | Gold tokenization | — | — | 99.02 | 95.92 | 96.04 | 95.22 | 98.21 | 94.72 | 92.67 | 85.55 | 88.68 |
polish-lfg-ud-2.12-230717 | Raw text | 99.85 | 99.65 | 98.97 | 96.08 | 96.49 | 95.15 | 98.24 | 96.78 | 95.40 | 89.61 | 92.26 |
polish-lfg-ud-2.12-230717 | Gold tokenization | — | — | 99.13 | 96.26 | 96.67 | 95.32 | 98.38 | 97.19 | 95.81 | 90.02 | 92.60 |
pomak-philotis-ud-2.12-230717 | Raw text | 99.98 | 94.49 | 98.80 | — | 95.54 | 95.26 | 96.71 | 88.17 | 83.04 | 70.60 | 73.82 |
pomak-philotis-ud-2.12-230717 | Gold tokenization | — | — | 98.82 | — | 95.54 | 95.26 | 96.73 | 88.63 | 83.54 | 71.09 | 74.23 |
portuguese-bosque-ud-2.12-230717 | Raw text | 99.68 | 89.73 | 97.88 | — | 96.86 | 95.87 | 98.27 | 92.13 | 89.83 | 80.75 | 84.19 |
portuguese-bosque-ud-2.12-230717 | Gold tokenization | — | — | 98.16 | — | 97.11 | 96.11 | 98.57 | 93.28 | 90.94 | 81.78 | 85.38 |
portuguese-cintil-ud-2.12-230717 | Raw text | 99.41 | 78.66 | 97.42 | 96.01 | 95.29 | 93.21 | 97.66 | 85.13 | 81.88 | 71.77 | 75.51 |
portuguese-cintil-ud-2.12-230717 | Gold tokenization | — | — | 98.00 | 96.61 | 95.91 | 93.80 | 98.23 | 87.54 | 84.20 | 74.06 | 77.83 |
portuguese-petrogold-ud-2.12-230717 | Raw text | 99.59 | 93.11 | 98.75 | — | 98.70 | 98.20 | 99.09 | 94.69 | 93.57 | 88.77 | 90.11 |
portuguese-petrogold-ud-2.12-230717 | Gold tokenization | — | — | 99.05 | — | 99.00 | 98.46 | 99.51 | 95.62 | 94.42 | 89.68 | 91.18 |
romanian-rrt-ud-2.12-230717 | Raw text | 99.70 | 95.50 | 97.88 | 97.14 | 97.39 | 96.91 | 98.00 | 91.92 | 88.46 | 81.93 | 83.37 |
romanian-rrt-ud-2.12-230717 | Gold tokenization | — | — | 98.15 | 97.40 | 97.65 | 97.16 | 98.25 | 92.77 | 89.25 | 82.56 | 83.97 |
romanian-nonstandard-ud-2.12-230717 | Raw text | 98.83 | 96.77 | 96.12 | 91.86 | 90.52 | 89.16 | 94.82 | 88.67 | 84.67 | 68.20 | 76.17 |
romanian-nonstandard-ud-2.12-230717 | Gold tokenization | — | — | 97.25 | 92.87 | 91.51 | 90.13 | 95.88 | 90.46 | 86.41 | 69.79 | 77.51 |
romanian-simonero-ud-2.12-230717 | Raw text | 99.84 | 100.00 | 98.41 | 97.97 | 97.51 | 97.20 | 98.89 | 93.95 | 92.03 | 85.30 | 88.19 |
romanian-simonero-ud-2.12-230717 | Gold tokenization | — | — | 98.56 | 98.12 | 97.66 | 97.34 | 99.04 | 94.29 | 92.35 | 85.57 | 88.47 |
russian-syntagrus-ud-2.12-230717 | Raw text | 99.67 | 98.31 | 98.50 | — | 94.02 | 93.76 | 98.19 | 93.80 | 91.66 | 82.80 | 88.87 |
russian-syntagrus-ud-2.12-230717 | Gold tokenization | — | — | 98.83 | — | 94.33 | 94.07 | 98.48 | 94.51 | 92.33 | 83.33 | 89.40 |
russian-gsd-ud-2.12-230717 | Raw text | 99.50 | 96.49 | 98.05 | 97.56 | 94.57 | 93.51 | 96.87 | 91.64 | 88.70 | 80.95 | 84.53 |
russian-gsd-ud-2.12-230717 | Gold tokenization | — | — | 98.50 | 97.97 | 95.01 | 93.90 | 97.25 | 92.83 | 89.81 | 81.85 | 85.46 |
russian-taiga-ud-2.12-230717 | Raw text | 98.07 | 86.01 | 95.59 | — | 93.02 | 92.05 | 94.62 | 82.89 | 79.47 | 70.47 | 73.61 |
russian-taiga-ud-2.12-230717 | Gold tokenization | — | — | 97.25 | — | 94.80 | 93.67 | 96.34 | 85.53 | 81.91 | 72.70 | 76.02 |
sanskrit-vedic-ud-2.12-230717 | Raw text | 100.00 | 27.18 | 89.20 | — | 81.19 | 76.40 | 87.11 | 61.05 | 50.02 | 41.42 | 44.77 |
sanskrit-vedic-ud-2.12-230717 | Gold tokenization | — | — | 90.07 | — | 82.70 | 78.17 | 87.40 | 73.69 | 61.55 | 51.56 | 54.91 |
scottish_gaelic-arcosg-ud-2.12-230717 | Raw text | 97.43 | 61.26 | 93.66 | 89.50 | 91.07 | 88.34 | 94.92 | 80.79 | 76.33 | 64.78 | 69.63 |
scottish_gaelic-arcosg-ud-2.12-230717 | Gold tokenization | — | — | 96.40 | 92.49 | 94.04 | 91.41 | 97.47 | 87.25 | 82.84 | 71.75 | 76.18 |
serbian-set-ud-2.12-230717 | Raw text | 99.99 | 93.00 | 99.03 | 95.94 | 96.13 | 95.71 | 97.82 | 93.48 | 90.99 | 83.45 | 86.89 |
serbian-set-ud-2.12-230717 | Gold tokenization | — | — | 99.05 | 95.95 | 96.16 | 95.74 | 97.83 | 94.18 | 91.65 | 84.17 | 87.56 |
slovak-snk-ud-2.12-230717 | Raw text | 100.00 | 81.69 | 97.68 | 90.26 | 93.35 | 89.42 | 96.51 | 91.48 | 89.73 | 80.19 | 84.62 |
slovak-snk-ud-2.12-230717 | Gold tokenization | — | — | 97.83 | 90.34 | 93.44 | 89.56 | 96.54 | 93.88 | 92.00 | 82.42 | 86.89 |
slovenian-ssj-ud-2.12-230717 | Raw text | 99.94 | 98.95 | 98.96 | 97.09 | 97.26 | 96.78 | 98.57 | 94.23 | 92.86 | 87.37 | 89.22 |
slovenian-ssj-ud-2.12-230717 | Gold tokenization | — | — | 99.02 | 97.15 | 97.33 | 96.84 | 98.61 | 94.40 | 93.02 | 87.52 | 89.34 |
slovenian-sst-ud-2.12-230717 | Raw text | 99.97 | 24.74 | 95.82 | 93.37 | 93.58 | 91.64 | 97.69 | 66.76 | 61.98 | 52.04 | 55.84 |
slovenian-sst-ud-2.12-230717 | Gold tokenization | — | — | 96.10 | 93.49 | 93.79 | 91.91 | 97.73 | 78.52 | 73.20 | 63.83 | 68.34 |
spanish-ancora-ud-2.12-230717 | Raw text | 99.95 | 98.78 | 99.06 | 96.12 | 98.76 | 95.71 | 99.39 | 93.68 | 91.92 | 86.79 | 88.29 |
spanish-ancora-ud-2.12-230717 | Gold tokenization | — | — | 99.11 | 96.16 | 98.81 | 95.75 | 99.43 | 93.87 | 92.10 | 86.95 | 88.45 |
spanish-gsd-ud-2.12-230717 | Raw text | 99.72 | 94.90 | 97.10 | — | 96.74 | 95.07 | 98.58 | 92.51 | 90.28 | 78.76 | 84.27 |
spanish-gsd-ud-2.12-230717 | Gold tokenization | — | — | 97.36 | — | 97.01 | 95.32 | 98.83 | 93.38 | 91.10 | 79.58 | 85.17 |
swedish-talbanken-ud-2.12-230717 | Raw text | 99.84 | 96.53 | 98.37 | 97.23 | 97.31 | 96.43 | 98.17 | 92.08 | 89.72 | 83.69 | 85.90 |
swedish-talbanken-ud-2.12-230717 | Gold tokenization | — | — | 98.53 | 97.42 | 97.49 | 96.61 | 98.33 | 92.53 | 90.15 | 84.20 | 86.44 |
swedish-lines-ud-2.12-230717 | Raw text | 99.96 | 88.00 | 97.61 | 95.49 | 90.93 | 88.18 | 97.82 | 90.48 | 87.14 | 71.66 | 81.94 |
swedish-lines-ud-2.12-230717 | Gold tokenization | — | — | 97.71 | 95.50 | 90.96 | 88.23 | 97.86 | 91.19 | 87.78 | 72.25 | 82.63 |
tamil-ttb-ud-2.12-230717 | Raw text | 94.26 | 97.52 | 84.29 | 82.92 | 84.09 | 77.76 | 88.79 | 70.63 | 61.98 | 50.53 | 54.69 |
tamil-ttb-ud-2.12-230717 | Gold tokenization | — | — | 89.14 | 87.13 | 89.19 | 81.80 | 93.87 | 78.38 | 68.88 | 56.48 | 60.89 |
telugu-mtg-ud-2.12-230717 | Raw text | 99.58 | 96.62 | 92.94 | 92.94 | 98.61 | 92.94 | — | 90.72 | 84.07 | 76.19 | 80.19 |
telugu-mtg-ud-2.12-230717 | Gold tokenization | — | — | 93.48 | 93.48 | 99.03 | 93.48 | — | 91.68 | 85.02 | 77.22 | 81.03 |
turkish-boun-ud-2.12-230717 | Raw text | 96.57 | 86.25 | 90.03 | 86.05 | 81.12 | 71.69 | 90.47 | 73.32 | 67.11 | 49.09 | 60.77 |
turkish-boun-ud-2.12-230717 | Gold tokenization | — | — | 92.97 | 88.91 | 83.07 | 73.35 | 93.52 | 80.79 | 73.91 | 52.78 | 66.50 |
turkish-atis-ud-2.12-230717 | Raw text | 100.00 | 80.20 | 99.02 | — | 98.57 | 98.40 | 99.11 | 89.03 | 87.24 | 84.91 | 85.71 |
turkish-atis-ud-2.12-230717 | Gold tokenization | — | — | 99.04 | — | 98.57 | 98.42 | 99.11 | 90.96 | 89.09 | 86.83 | 87.69 |
turkish-framenet-ud-2.12-230717 | Raw text | 100.00 | 100.00 | 96.52 | — | 94.75 | 93.87 | 96.39 | 93.73 | 84.80 | 74.12 | 77.96 |
turkish-framenet-ud-2.12-230717 | Gold tokenization | — | — | 96.52 | — | 94.75 | 93.87 | 96.39 | 93.73 | 84.80 | 74.12 | 77.96 |
turkish-imst-ud-2.12-230717 | Raw text | 97.94 | 97.70 | 93.70 | 93.46 | 90.63 | 88.13 | 94.46 | 75.22 | 66.28 | 55.69 | 60.80 |
turkish-imst-ud-2.12-230717 | Gold tokenization | — | — | 95.46 | 95.27 | 92.41 | 89.75 | 96.32 | 78.94 | 69.42 | 57.62 | 62.96 |
turkish-kenet-ud-2.12-230717 | Raw text | 100.00 | 98.12 | 93.80 | — | 92.10 | 90.85 | 93.50 | 83.94 | 71.51 | 62.32 | 65.35 |
turkish-kenet-ud-2.12-230717 | Gold tokenization | — | — | 93.83 | — | 92.12 | 90.89 | 93.51 | 84.10 | 71.63 | 62.46 | 65.49 |
turkish-penn-ud-2.12-230717 | Raw text | 99.34 | 80.59 | 95.50 | — | 94.48 | 93.29 | 94.14 | 84.40 | 71.73 | 62.08 | 64.36 |
turkish-penn-ud-2.12-230717 | Gold tokenization | — | — | 96.14 | — | 95.12 | 93.94 | 94.71 | 86.84 | 73.89 | 63.46 | 65.77 |
turkish-tourism-ud-2.12-230717 | Raw text | 99.96 | 99.86 | 98.92 | — | 94.98 | 94.67 | 98.27 | 97.04 | 91.43 | 81.58 | 87.09 |
turkish-tourism-ud-2.12-230717 | Gold tokenization | — | — | 98.96 | — | 95.02 | 94.72 | 98.31 | 97.10 | 91.50 | 81.66 | 87.18 |
turkish_german-sagt-ud-2.12-230717 | Raw text | 98.91 | 99.44 | 90.21 | — | 80.24 | 75.45 | 90.80 | 71.42 | 61.22 | 41.16 | 50.92 |
turkish_german-sagt-ud-2.12-230717 | Gold tokenization | — | — | 91.09 | — | 80.82 | 75.93 | 91.49 | 72.76 | 62.20 | 41.63 | 51.55 |
ukrainian-iu-ud-2.12-230717 | Raw text | 99.81 | 96.23 | 97.84 | 94.28 | 94.25 | 93.16 | 97.47 | 90.37 | 87.94 | 78.30 | 82.74 |
ukrainian-iu-ud-2.12-230717 | Gold tokenization | — | — | 98.03 | 94.44 | 94.39 | 93.31 | 97.67 | 90.97 | 88.52 | 78.70 | 83.18 |
urdu-udtb-ud-2.12-230717 | Raw text | 100.00 | 98.31 | 94.09 | 92.20 | 82.76 | 78.43 | 97.41 | 88.02 | 82.68 | 57.25 | 74.71 |
urdu-udtb-ud-2.12-230717 | Gold tokenization | — | — | 94.06 | 92.19 | 82.76 | 78.41 | 97.41 | 88.13 | 82.81 | 57.30 | 74.88 |
uyghur-udt-ud-2.12-230717 | Raw text | 99.54 | 81.87 | 89.77 | 91.72 | 88.23 | 80.82 | 94.71 | 75.32 | 64.44 | 50.04 | 57.14 |
uyghur-udt-ud-2.12-230717 | Gold tokenization | — | — | 90.23 | 92.21 | 88.65 | 81.27 | 95.22 | 77.05 | 66.02 | 50.95 | 58.33 |
vietnamese-vtb-ud-2.12-230717 | Raw text | 86.06 | 93.73 | 78.61 | 77.61 | — | 77.50 | 85.76 | 56.86 | 50.02 | 41.40 | 45.54 |
vietnamese-vtb-ud-2.12-230717 | Gold tokenization | — | — | 90.02 | 88.88 | — | 88.73 | 99.50 | 76.31 | 65.90 | 55.36 | 61.24 |
welsh-ccg-ud-2.12-230717 | Raw text | 99.46 | 97.68 | 95.27 | 94.28 | 89.57 | 87.25 | 94.43 | 86.73 | 80.81 | 63.09 | 69.84 |
welsh-ccg-ud-2.12-230717 | Gold tokenization | — | — | 95.74 | 94.73 | 90.03 | 87.66 | 94.94 | 87.83 | 81.85 | 64.04 | 70.94 |
western_armenian-armtdp-ud-2.12-230717 | Raw text | 99.89 | 98.68 | 96.67 | — | 92.31 | 91.63 | 97.14 | 89.26 | 84.68 | 69.80 | 76.24 |
western_armenian-armtdp-ud-2.12-230717 | Gold tokenization | — | — | 96.75 | — | 92.40 | 91.71 | 97.23 | 89.51 | 84.91 | 69.99 | 76.44 |
wolof-wtb-ud-2.12-230717 | Raw text | 99.23 | 91.95 | 94.16 | 94.17 | 93.56 | 91.48 | 95.18 | 83.75 | 78.61 | 66.55 | 69.91 |
wolof-wtb-ud-2.12-230717 | Gold tokenization | — | — | 95.11 | 95.08 | 94.34 | 92.33 | 95.88 | 85.90 | 80.57 | 68.67 | 71.71 |
Universal Dependencies 2.10 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.10 treebanks, and additionally use multilingual BERT and RobeCzech.
The models require UDPipe 2.
The latest version 220711 of the Universal Dependencies 2.10 models can be downloaded from the LINDAT/CLARIN repository.
The models are also available in the REST service.
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.
The models were trained on Universal Dependencies 2.10 treebanks.
For UD treebanks that do not contain the original plain-text version, raw text is used to train the tokenizer instead. The plain texts were taken from W2C -- Web to Corpus.
Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.
The Universal Dependencies 2.10 models contain 123 models of 69 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
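The splitting rule above can be sketched as follows. This is a minimal illustration, not the authors' training script: it assumes the 10% is measured in sentences and that "training words" means syntactic words (lines with a plain integer ID), neither of which the text states explicitly. The function names are hypothetical.

```python
def read_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks separated by blank lines."""
    return [block.strip("\n") for block in text.split("\n\n") if block.strip()]


def count_words(sentence):
    """Count syntactic words: lines whose ID column is a plain integer.

    Comment lines, multiword token ranges (1-2) and empty nodes (1.1)
    are skipped because their first column is not purely digits.
    """
    return sum(
        1
        for line in sentence.splitlines()
        if not line.startswith("#") and line.split("\t", 1)[0].isdigit()
    )


def split_train_dev(sentences, dev_fraction=0.1):
    """Hold out the trailing fraction of sentences as dev data (assumed unit)."""
    n_dev = max(1, round(len(sentences) * dev_fraction))
    return sentences[:-n_dev], sentences[-n_dev:]


def is_eligible(sentences, min_words=1000):
    """Apply the minimum-size rule: at least `min_words` training words."""
    return sum(count_words(s) for s in sentences) >= min_words
```

Taking the tail of the file rather than a random sample keeps the split deterministic, which matches the wording "last 10% of the train data".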
The tokenizer is trained using the SpaceAfter=No features. If the features are not present in the data, they can be filled in using raw text in the language in question.
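To illustrate what the SpaceAfter=No feature encodes, the sketch below rebuilds the surface text of a single CoNLL-U sentence from its token lines. It relies only on the standard CoNLL-U layout (ten tab-separated columns, MISC in the last one); the function name is hypothetical and this is not UDPipe's own tokenizer code.

```python
def conllu_to_text(sentence):
    """Rebuild the surface text of one CoNLL-U sentence using SpaceAfter=No."""
    parts = []
    covered_until = 0  # last word ID covered by the current multiword token
    for line in sentence.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "." in tok_id:
            continue  # empty node, not part of the surface text
        if "-" in tok_id:
            # Multiword token range such as 1-2: its FORM is the surface string.
            covered_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= covered_until:
            continue  # syntactic word already represented by the range line
        parts.append(form)
        # A token is followed by a space unless MISC says otherwise.
        if "SpaceAfter=No" not in misc.split("|"):
            parts.append(" ")
    return "".join(parts).rstrip()
```

The tokenizer learns the inverse mapping: given the raw string, it predicts where token boundaries fall, including those not marked by whitespace.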
The tagger, lemmatizer and parser are trained using gold UD data.
We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py script.
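As a rough guide to how an F1 score on raw text arises, the sketch below compares gold and predicted tokens by their character spans and combines precision and recall. It is a simplified illustration under stated assumptions, not the conll18_ud_eval.py implementation: the helper names are hypothetical, and only exact span matches are counted, with no tag or dependency comparison.

```python
def token_spans(tokens):
    """Map tokens to (start, end) character offsets, ignoring whitespace."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def f1_score(gold_items, system_items):
    """Return (precision, recall, F1) over two collections of items."""
    gold, system = set(gold_items), set(system_items)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = len(gold) + len(system)
    f1 = 2 * correct / total if total else 0.0
    return precision, recall, f1


# Gold segments "don't" into two tokens; the system keeps it as one.
gold = token_spans(["do", "n't", "stop"])
system = token_spans(["don't", "stop"])
precision, recall, f1 = f1_score(gold, system)
```

Here only the span of "stop" matches, so precision is 1/2, recall is 1/3 and F1 is 2/5. The tables report such values as percentages. With gold tokenization every span matches by construction, which is why the Words and Sents columns are left empty in those rows.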
Model | Mode | Words | Sents | UPOS | XPOS | UFeats | AllTags | Lemma | UAS | LAS | MLAS | BLEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|
afrikaans-afribooms-ud-2.10-220711 | Raw text | 99.78 | 98.59 | 98.58 | 95.46 | 98.13 | 95.33 | 97.43 | 90.10 | 87.23 | 78.64 | 78.59 |
afrikaans-afribooms-ud-2.10-220711 | Gold tokenization | — | — | 98.77 | 95.62 | 98.31 | 95.50 | 97.53 | 90.72 | 87.80 | 79.23 | 78.99 |
ancient_greek-perseus-ud-2.10-220711 | Raw text | 99.97 | 98.85 | 92.83 | 85.55 | 91.45 | 84.87 | 86.68 | 80.13 | 74.36 | 54.62 | 55.72 |
ancient_greek-perseus-ud-2.10-220711 | Gold tokenization | — | — | 92.88 | 85.60 | 91.47 | 84.90 | 86.70 | 80.32 | 74.53 | 54.73 | 55.87 |
ancient_greek-proiel-ud-2.10-220711 | Raw text | 100.00 | 48.02 | 97.77 | 98.05 | 92.35 | 91.05 | 94.71 | 79.82 | 76.06 | 60.08 | 65.75 |
ancient_greek-proiel-ud-2.10-220711 | Gold tokenization | — | — | 97.87 | 98.14 | 92.49 | 91.26 | 94.73 | 86.05 | 82.14 | 67.03 | 71.90 |
ancient_hebrew-ptnk-ud-2.10-220711 | Raw text | 68.76 | 98.06 | 56.80 | 56.94 | 55.13 | 50.80 | 49.88 | 38.73 | 34.67 | 18.47 | 17.69 |
ancient_hebrew-ptnk-ud-2.10-220711 | Gold tokenization | — | — | 68.03 | 67.97 | 66.97 | 56.15 | 53.35 | 63.31 | 51.61 | 28.08 | 24.34 |
arabic-padt-ud-2.10-220711 | Raw text | 94.58 | 82.09 | 91.72 | 89.01 | 89.14 | 88.69 | 90.41 | 78.63 | 74.54 | 65.84 | 67.88 |
arabic-padt-ud-2.10-220711 | Gold tokenization | — | — | 97.02 | 94.38 | 94.53 | 94.08 | 95.31 | 88.11 | 83.49 | 74.57 | 76.13 |
armenian-armtdp-ud-2.10-220711 | Raw text | 99.28 | 95.70 | 96.07 | — | 91.39 | 90.28 | 95.04 | 86.84 | 82.22 | 69.53 | 74.39 |
armenian-armtdp-ud-2.10-220711 | Gold tokenization | — | — | 96.63 | — | 92.03 | 90.77 | 95.70 | 88.50 | 83.81 | 70.18 | 75.42 |
armenian-bsut-ud-2.10-220711 | Raw text | 99.79 | 98.73 | 97.31 | — | 92.01 | 91.24 | 96.62 | 90.02 | 85.75 | 71.20 | 78.86 |
armenian-bsut-ud-2.10-220711 | Gold tokenization | — | — | 97.53 | — | 92.24 | 91.48 | 96.82 | 90.56 | 86.29 | 71.73 | 79.32 |
basque-bdt-ud-2.10-220711 | Raw text | 99.94 | 99.83 | 96.25 | — | 92.69 | 90.69 | 96.36 | 87.40 | 84.28 | 73.94 | 79.81 |
basque-bdt-ud-2.10-220711 | Gold tokenization | — | — | 96.30 | — | 92.73 | 90.72 | 96.39 | 87.48 | 84.36 | 73.99 | 79.86 |
belarusian-hse-ud-2.10-220711 | Raw text | 99.47 | 83.97 | 98.30 | 96.26 | 94.38 | 92.37 | 93.35 | 86.84 | 84.85 | 76.00 | 76.01 |
belarusian-hse-ud-2.10-220711 | Gold tokenization | — | — | 98.81 | 96.74 | 94.94 | 92.87 | 93.86 | 89.55 | 87.38 | 78.23 | 78.11 |
bulgarian-btb-ud-2.10-220711 | Raw text | 99.91 | 94.17 | 99.19 | 97.20 | 97.97 | 96.85 | 97.99 | 94.41 | 91.67 | 85.89 | 86.31 |
bulgarian-btb-ud-2.10-220711 | Gold tokenization | — | — | 99.29 | 97.30 | 98.07 | 96.96 | 98.09 | 95.24 | 92.44 | 86.52 | 87.03 |
catalan-ancora-ud-2.10-220711 | Raw text | 99.95 | 99.08 | 99.07 | 97.21 | 98.70 | 96.96 | 99.40 | 94.86 | 93.14 | 87.45 | 88.92 |
catalan-ancora-ud-2.10-220711 | Gold tokenization | — | — | 99.14 | 97.32 | 98.78 | 97.07 | 99.46 | 95.02 | 93.30 | 87.69 | 89.13 |
chinese-gsdsimp-ud-2.10-220711 | Raw text | 90.29 | 99.10 | 87.21 | 87.16 | 89.74 | 86.42 | 90.29 | 73.11 | 70.62 | 63.58 | 67.09 |
chinese-gsdsimp-ud-2.10-220711 | Gold tokenization | — | — | 96.14 | 96.04 | 99.45 | 95.30 | 99.99 | 87.28 | 84.07 | 78.56 | 82.64 |
chinese-gsd-ud-2.10-220711 | Raw text | 90.27 | 99.10 | 87.15 | 87.05 | 89.71 | 86.36 | 90.27 | 72.85 | 70.29 | 63.41 | 66.89 |
chinese-gsd-ud-2.10-220711 | Gold tokenization | — | — | 96.21 | 96.08 | 99.40 | 95.34 | 99.99 | 87.15 | 83.96 | 78.41 | 82.59 |
classical_chinese-kyoto-ud-2.10-220711 | Raw text | 97.26 | 40.71 | 87.40 | 86.48 | 89.97 | 83.27 | 96.78 | 67.56 | 62.17 | 58.02 | 60.60 |
classical_chinese-kyoto-ud-2.10-220711 | Gold tokenization | — | — | 92.30 | 90.87 | 93.94 | 88.19 | 99.47 | 83.16 | 77.63 | 73.15 | 76.42 |
coptic-scriptorium-ud-2.10-220711 | Raw text | 74.49 | 33.87 | 72.43 | 72.34 | 72.53 | 71.54 | 72.91 | 51.25 | 49.43 | 36.55 | 39.14 |
coptic-scriptorium-ud-2.10-220711 | Gold tokenization | — | — | 96.94 | 96.78 | 97.49 | 95.50 | 97.02 | 90.48 | 87.70 | 76.04 | 79.57 |
croatian-set-ud-2.10-220711 | Raw text | 99.93 | 94.79 | 98.48 | 95.72 | 96.23 | 95.49 | 97.60 | 92.17 | 89.27 | 81.53 | 84.23 |
croatian-set-ud-2.10-220711 | Gold tokenization | — | — | 98.54 | 95.80 | 96.30 | 95.56 | 97.68 | 92.67 | 89.75 | 81.92 | 84.69 |
czech-pdt-ud-2.10-220711 | Raw text | 99.94 | 93.74 | 99.37 | 98.40 | 98.33 | 98.02 | 99.21 | 94.90 | 93.50 | 90.28 | 91.88 |
czech-pdt-ud-2.10-220711 | Gold tokenization | — | — | 99.45 | 98.47 | 98.40 | 98.09 | 99.28 | 95.63 | 94.23 | 90.88 | 92.50 |
czech-cac-ud-2.10-220711 | Raw text | 99.99 | 99.68 | 99.72 | 98.57 | 98.37 | 98.12 | 99.18 | 96.12 | 94.76 | 91.09 | 92.67 |
czech-cac-ud-2.10-220711 | Gold tokenization | — | — | 99.73 | 98.58 | 98.38 | 98.13 | 99.19 | 96.12 | 94.76 | 91.11 | 92.69 |
czech-cltt-ud-2.10-220711 | Raw text | 99.71 | 97.79 | 99.22 | 95.32 | 95.23 | 95.03 | 99.18 | 90.77 | 89.24 | 81.35 | 86.22 |
czech-cltt-ud-2.10-220711 | Gold tokenization | — | — | 99.47 | 95.47 | 95.40 | 95.18 | 99.47 | 91.20 | 89.68 | 81.67 | 86.73 |
czech-fictree-ud-2.10-220711 | Raw text | 99.99 | 98.95 | 99.17 | 97.06 | 97.83 | 96.86 | 99.35 | 96.38 | 94.91 | 89.61 | 92.81 |
czech-fictree-ud-2.10-220711 | Gold tokenization | — | — | 99.18 | 97.08 | 97.84 | 96.88 | 99.36 | 96.46 | 94.97 | 89.71 | 92.91 |
danish-ddt-ud-2.10-220711 | Raw text | 99.81 | 89.78 | 97.95 | — | 97.29 | 96.54 | 97.26 | 88.27 | 86.25 | 79.22 | 80.96 |
danish-ddt-ud-2.10-220711 | Gold tokenization | — | — | 98.16 | — | 97.53 | 96.79 | 97.45 | 89.46 | 87.42 | 80.42 | 82.17 |
dutch-alpino-ud-2.10-220711 | Raw text | 99.83 | 88.98 | 97.86 | 96.79 | 97.80 | 96.29 | 95.11 | 92.95 | 90.58 | 83.15 | 79.88 |
dutch-alpino-ud-2.10-220711 | Gold tokenization | — | — | 97.97 | 96.87 | 97.91 | 96.41 | 95.26 | 94.00 | 91.63 | 84.17 | 80.83 |
dutch-lassysmall-ud-2.10-220711 | Raw text | 99.80 | 74.93 | 96.98 | 95.62 | 96.63 | 94.88 | 95.70 | 90.61 | 87.94 | 79.67 | 78.25 |
dutch-lassysmall-ud-2.10-220711 | Gold tokenization | — | — | 97.25 | 96.43 | 97.36 | 95.83 | 95.97 | 94.51 | 91.66 | 84.48 | 83.08 |
english-ewt-ud-2.10-220711 | Raw text | 98.95 | 87.02 | 96.39 | 96.13 | 96.53 | 94.80 | 97.13 | 90.07 | 88.10 | 81.47 | 83.42 |
english-ewt-ud-2.10-220711 | Gold tokenization | — | — | 97.35 | 97.06 | 97.52 | 95.71 | 98.07 | 92.62 | 90.56 | 84.02 | 85.98 |
english-atis-ud-2.10-220711 | Raw text | 100.00 | 81.96 | 98.97 | — | 98.54 | 98.13 | 99.94 | 94.39 | 92.92 | 87.85 | 90.39 |
english-atis-ud-2.10-220711 | Gold tokenization | — | — | 98.97 | — | 98.56 | 98.15 | 99.94 | 95.88 | 94.26 | 89.80 | 92.40 |
english-gum-ud-2.10-220711 | Raw text | 99.64 | 95.36 | 97.95 | 97.91 | 97.88 | 96.91 | 98.77 | 92.30 | 90.35 | 84.63 | 86.31 |
english-gum-ud-2.10-220711 | Gold tokenization | — | — | 98.27 | 98.26 | 98.22 | 97.24 | 99.09 | 93.17 | 91.19 | 85.42 | 87.04 |
english-lines-ud-2.10-220711 | Raw text | 99.92 | 87.45 | 97.71 | 96.77 | 97.02 | 94.41 | 98.40 | 91.17 | 88.22 | 80.27 | 83.45 |
english-lines-ud-2.10-220711 | Gold tokenization | — | — | 97.79 | 96.84 | 97.07 | 94.48 | 98.47 | 92.10 | 89.17 | 81.05 | 84.36 |
english-partut-ud-2.10-220711 | Raw text | 99.72 | 100.00 | 97.23 | 97.11 | 96.35 | 95.26 | 98.14 | 94.24 | 92.21 | 83.35 | 87.34 |
english-partut-ud-2.10-220711 | Gold tokenization | — | — | 97.48 | 97.36 | 96.60 | 95.51 | 98.42 | 94.48 | 92.46 | 83.74 | 87.62 |
estonian-edt-ud-2.10-220711 | Raw text | 99.95 | 92.03 | 97.68 | 98.31 | 96.28 | 95.07 | 95.36 | 88.81 | 86.16 | 79.92 | 79.56 |
estonian-edt-ud-2.10-220711 | Gold tokenization | — | — | 97.81 | 98.36 | 96.36 | 95.19 | 95.43 | 89.71 | 87.03 | 80.77 | 80.37 |
estonian-ewt-ud-2.10-220711 | Raw text | 98.82 | 75.26 | 95.41 | 96.29 | 94.06 | 91.92 | 93.86 | 82.62 | 79.30 | 71.40 | 72.35 |
estonian-ewt-ud-2.10-220711 | Gold tokenization | — | — | 96.65 | 97.43 | 95.15 | 93.10 | 94.97 | 86.76 | 83.25 | 74.79 | 75.57 |
faroese-farpahc-ud-2.10-220711 | Raw text | 99.74 | 92.77 | 97.44 | 93.04 | 94.43 | 92.50 | 99.74 | 85.76 | 82.13 | 68.07 | 75.34 |
faroese-farpahc-ud-2.10-220711 | Gold tokenization | — | — | 97.64 | 93.28 | 94.68 | 92.72 | 100.00 | 86.82 | 83.10 | 69.17 | 76.51 |
finnish-tdt-ud-2.10-220711 | Raw text | 99.70 | 90.82 | 97.58 | 98.18 | 95.99 | 95.10 | 92.14 | 90.20 | 88.18 | 82.19 | 78.16 |
finnish-tdt-ud-2.10-220711 | Gold tokenization | — | — | 97.92 | 98.49 | 96.29 | 95.43 | 92.46 | 91.51 | 89.46 | 83.20 | 79.17 |
finnish-ftb-ud-2.10-220711 | Raw text | 99.91 | 86.84 | 96.69 | 95.14 | 96.83 | 94.02 | 95.57 | 89.80 | 87.18 | 80.04 | 80.49 |
finnish-ftb-ud-2.10-220711 | Gold tokenization | — | — | 97.00 | 95.36 | 96.92 | 94.32 | 95.67 | 91.91 | 89.23 | 82.55 | 82.84 |
french-gsd-ud-2.10-220711 | Raw text | 98.78 | 94.69 | 97.26 | — | 97.35 | 96.63 | 97.55 | 92.76 | 90.82 | 84.55 | 86.32 |
french-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.44 | — | 98.47 | 97.71 | 98.75 | 94.55 | 92.71 | 86.34 | 87.59 |
french-parisstories-ud-2.10-220711 | Raw text | 99.49 | 87.87 | 96.24 | — | 94.41 | 92.17 | 97.55 | 79.95 | 74.84 | 61.23 | 68.35 |
french-parisstories-ud-2.10-220711 | Gold tokenization | — | — | 96.81 | — | 94.90 | 92.68 | 97.98 | 81.67 | 76.50 | 62.46 | 69.32 |
french-partut-ud-2.10-220711 | Raw text | 99.48 | 100.00 | 97.26 | 96.76 | 94.72 | 93.96 | 97.33 | 94.72 | 92.81 | 81.09 | 86.22 |
french-partut-ud-2.10-220711 | Gold tokenization | — | — | 97.89 | 97.35 | 95.27 | 94.51 | 97.89 | 95.62 | 93.85 | 82.18 | 87.24 |
french-rhapsodie-ud-2.10-220711 | Raw text | 99.22 | 99.47 | 97.20 | 97.45 | 96.12 | 93.30 | 98.26 | 88.71 | 84.99 | 75.15 | 79.88 |
french-rhapsodie-ud-2.10-220711 | Gold tokenization | — | — | 98.00 | 98.13 | 96.89 | 93.97 | 98.99 | 89.89 | 86.08 | 75.91 | 80.35 |
french-sequoia-ud-2.10-220711 | Raw text | 99.15 | 84.02 | 98.32 | — | 97.15 | 96.68 | 98.33 | 93.60 | 92.22 | 86.08 | 89.00 |
french-sequoia-ud-2.10-220711 | Gold tokenization | — | — | 99.24 | — | 97.95 | 97.54 | 99.13 | 95.43 | 94.11 | 88.00 | 90.34 |
galician-ctg-ud-2.10-220711 | Raw text | 99.22 | 97.22 | 97.28 | 97.05 | 99.06 | 96.70 | 98.04 | 85.59 | 83.20 | 72.11 | 76.94 |
galician-ctg-ud-2.10-220711 | Gold tokenization | — | — | 98.01 | 97.78 | 99.84 | 97.41 | 98.79 | 87.31 | 84.80 | 74.04 | 78.88 |
galician-treegal-ud-2.10-220711 | Raw text | 98.74 | 87.99 | 96.00 | 93.69 | 94.85 | 92.82 | 96.67 | 83.44 | 79.36 | 67.82 | 71.68 |
galician-treegal-ud-2.10-220711 | Gold tokenization | — | — | 97.19 | 94.83 | 95.94 | 93.91 | 97.86 | 86.75 | 82.40 | 71.30 | 75.54 |
german-hdt-ud-2.10-220711 | Raw text | 99.90 | 92.34 | 98.51 | 98.45 | 94.08 | 93.70 | 97.16 | 96.94 | 96.04 | 84.79 | 90.40 |
german-hdt-ud-2.10-220711 | Gold tokenization | — | — | 98.62 | 98.58 | 94.22 | 93.83 | 97.26 | 97.63 | 96.75 | 85.46 | 91.08 |
german-gsd-ud-2.10-220711 | Raw text | 99.81 | 81.12 | 95.78 | 97.68 | 90.23 | 87.27 | 96.75 | 87.32 | 83.12 | 63.79 | 75.00 |
german-gsd-ud-2.10-220711 | Gold tokenization | — | — | 95.94 | 97.87 | 90.60 | 87.60 | 96.96 | 89.28 | 85.04 | 65.33 | 76.75 |
gothic-proiel-ud-2.10-220711 | Raw text | 100.00 | 31.12 | 96.48 | 96.98 | 90.08 | 88.19 | 94.62 | 74.17 | 68.40 | 55.39 | 62.02 |
gothic-proiel-ud-2.10-220711 | Gold tokenization | — | — | 96.97 | 97.42 | 90.90 | 89.33 | 94.71 | 84.11 | 78.29 | 65.73 | 71.10 |
greek-gdt-ud-2.10-220711 | Raw text | 99.87 | 90.19 | 98.09 | 98.10 | 95.60 | 95.01 | 95.61 | 93.05 | 91.24 | 81.58 | 81.04 |
greek-gdt-ud-2.10-220711 | Gold tokenization | — | — | 98.23 | 98.24 | 95.79 | 95.20 | 95.70 | 93.85 | 92.04 | 82.28 | 81.75 |
hebrew-htb-ud-2.10-220711 | Raw text | 85.05 | 99.39 | 82.78 | 82.80 | 81.23 | 80.53 | 82.91 | 70.63 | 68.13 | 55.31 | 59.44 |
hebrew-htb-ud-2.10-220711 | Gold tokenization | — | — | 97.44 | 97.42 | 95.73 | 95.05 | 97.34 | 92.71 | 90.24 | 78.77 | 81.80 |
hebrew-iahltwiki-ud-2.10-220711 | Raw text | 88.54 | 97.16 | 85.97 | 86.00 | 80.55 | 79.47 | 87.15 | 76.16 | 74.19 | 56.91 | 66.92 |
hebrew-iahltwiki-ud-2.10-220711 | Gold tokenization | — | — | 97.09 | 97.10 | 91.59 | 90.41 | 98.24 | 93.88 | 91.45 | 74.27 | 85.44 |
hindi-hdtb-ud-2.10-220711 | Raw text | 100.00 | 98.90 | 97.57 | 97.12 | 94.16 | 92.23 | 98.92 | 95.30 | 92.32 | 79.20 | 87.66 |
hindi-hdtb-ud-2.10-220711 | Gold tokenization | — | — | 97.58 | 97.14 | 94.18 | 92.26 | 98.92 | 95.42 | 92.44 | 79.35 | 87.81 |
hungarian-szeged-ud-2.10-220711 | Raw text | 99.85 | 95.89 | 96.68 | — | 94.22 | 93.53 | 94.92 | 88.81 | 85.09 | 75.22 | 78.19 |
hungarian-szeged-ud-2.10-220711 | Gold tokenization | — | — | 96.79 | — | 94.36 | 93.64 | 95.04 | 89.31 | 85.54 | 75.51 | 78.47 |
icelandic-icepahc-ud-2.10-220711 | Raw text | 99.82 | 92.15 | 96.90 | 93.24 | 91.32 | 86.42 | 95.99 | 87.21 | 83.36 | 65.98 | 74.25 |
icelandic-icepahc-ud-2.10-220711 | Gold tokenization | — | — | 97.08 | 93.45 | 91.47 | 86.61 | 96.15 | 87.78 | 83.87 | 66.49 | 74.84 |
icelandic-modern-ud-2.10-220711 | Raw text | 99.92 | 99.22 | 99.07 | 98.14 | 98.38 | 97.88 | 98.91 | 94.41 | 93.17 | 89.31 | 90.07 |
icelandic-modern-ud-2.10-220711 | Gold tokenization | — | — | 99.14 | 98.21 | 98.45 | 97.95 | 98.98 | 94.50 | 93.26 | 89.41 | 90.16 |
indonesian-gsd-ud-2.10-220711 | Raw text | 99.48 | 92.90 | 94.23 | 93.81 | 95.53 | 88.78 | 98.13 | 87.65 | 81.59 | 72.35 | 77.02 |
indonesian-gsd-ud-2.10-220711 | Gold tokenization | — | — | 94.66 | 94.26 | 95.99 | 89.17 | 98.53 | 88.57 | 82.42 | 73.21 | 77.88 |
indonesian-csui-ud-2.10-220711 | Raw text | 99.45 | 91.01 | 96.05 | 96.14 | 96.85 | 95.43 | 98.23 | 86.38 | 82.10 | 76.54 | 78.80 |
indonesian-csui-ud-2.10-220711 | Gold tokenization | — | — | 96.56 | 96.72 | 97.37 | 95.99 | 98.87 | 87.77 | 83.28 | 77.62 | 79.92 |
irish-idt-ud-2.10-220711 | Raw text | 99.72 | 97.25 | 95.63 | 94.76 | 90.33 | 87.14 | 95.30 | 86.74 | 81.10 | 64.20 | 71.52 |
irish-idt-ud-2.10-220711 | Gold tokenization | — | — | 95.89 | 95.07 | 90.60 | 87.46 | 95.54 | 87.28 | 81.64 | 64.56 | 71.85 |
italian-isdt-ud-2.10-220711 | Raw text | 99.84 | 98.76 | 98.57 | 98.50 | 98.25 | 97.67 | 98.79 | 94.66 | 93.01 | 86.61 | 88.00 |
italian-isdt-ud-2.10-220711 | Gold tokenization | — | — | 98.72 | 98.65 | 98.41 | 97.83 | 98.95 | 94.96 | 93.34 | 86.97 | 88.40 |
italian-markit-ud-2.10-220711 | Raw text | 99.59 | 98.24 | 96.76 | 97.00 | 93.80 | 92.08 | 88.18 | 88.36 | 84.51 | 69.95 | 77.77 |
italian-markit-ud-2.10-220711 | Gold tokenization | — | — | 97.15 | 97.40 | 94.10 | 92.35 | 88.54 | 89.13 | 85.26 | 70.51 | 78.46 |
italian-partut-ud-2.10-220711 | Raw text | 99.73 | 100.00 | 98.43 | 98.43 | 98.35 | 97.61 | 98.68 | 96.21 | 94.18 | 87.87 | 89.09 |
italian-partut-ud-2.10-220711 | Gold tokenization | — | — | 98.54 | 98.57 | 98.49 | 97.69 | 98.93 | 96.26 | 94.15 | 87.68 | 89.07 |
italian-postwita-ud-2.10-220711 | Raw text | 99.40 | 28.11 | 96.43 | 96.18 | 96.30 | 94.79 | 96.72 | 80.61 | 76.89 | 65.29 | 66.90 |
italian-postwita-ud-2.10-220711 | Gold tokenization | — | — | 97.04 | 96.82 | 96.80 | 95.29 | 97.31 | 88.34 | 84.19 | 75.32 | 77.32 |
italian-twittiro-ud-2.10-220711 | Raw text | 99.14 | 39.36 | 95.92 | 95.92 | 95.07 | 93.46 | 94.50 | 82.23 | 77.79 | 64.50 | 65.42 |
italian-twittiro-ud-2.10-220711 | Gold tokenization | — | — | 96.91 | 96.61 | 96.00 | 94.15 | 95.16 | 88.07 | 83.53 | 71.89 | 72.69 |
italian-vit-ud-2.10-220711 | Raw text | 99.76 | 96.73 | 98.14 | 97.39 | 97.64 | 96.21 | 98.89 | 92.08 | 89.16 | 80.93 | 83.70 |
italian-vit-ud-2.10-220711 | Gold tokenization | — | — | 98.36 | 97.71 | 97.85 | 96.53 | 99.10 | 92.88 | 89.97 | 81.91 | 84.63 |
japanese-gsd-ud-2.10-220711 | Raw text | 96.17 | 100.00 | 94.93 | 94.18 | 96.16 | 93.81 | 95.05 | 87.68 | 86.85 | 80.43 | 80.78 |
japanese-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.55 | 97.50 | 99.99 | 97.13 | 98.47 | 94.73 | 93.75 | 88.50 | 88.34 |
japanese-gsdluw-ud-2.10-220711 | Raw text | 95.18 | 99.72 | 93.81 | 93.54 | 95.18 | 93.46 | 93.66 | 86.22 | 85.54 | 76.27 | 76.58 |
japanese-gsdluw-ud-2.10-220711 | Gold tokenization | — | — | 98.36 | 98.05 | 100.00 | 97.93 | 97.89 | 95.23 | 94.18 | 86.38 | 85.19 |
korean-kaist-ud-2.10-220711 | Raw text | 100.00 | 100.00 | 95.88 | 87.74 | — | 87.56 | 94.17 | 89.33 | 87.47 | 82.15 | 80.14 |
korean-kaist-ud-2.10-220711 | Gold tokenization | — | — | 95.88 | 87.74 | — | 87.56 | 94.17 | 89.33 | 87.47 | 82.15 | 80.14 |
korean-gsd-ud-2.10-220711 | Raw text | 99.87 | 93.93 | 96.57 | 90.27 | 99.67 | 88.02 | 93.57 | 88.54 | 84.91 | 80.73 | 77.23 |
korean-gsd-ud-2.10-220711 | Gold tokenization | — | — | 96.73 | 90.43 | 99.80 | 88.20 | 93.69 | 89.27 | 85.61 | 81.45 | 77.93 |
latin-ittb-ud-2.10-220711 | Raw text | 99.99 | 91.21 | 98.91 | 96.58 | 96.75 | 95.19 | 99.18 | 90.53 | 88.53 | 82.07 | 86.07 |
latin-ittb-ud-2.10-220711 | Gold tokenization | — | — | 98.92 | 96.57 | 96.78 | 95.20 | 99.18 | 91.50 | 89.51 | 82.63 | 86.59 |
latin-llct-ud-2.10-220711 | Raw text | 100.00 | 99.49 | 99.68 | 97.14 | 97.26 | 96.89 | 97.78 | 95.55 | 94.56 | 89.80 | 90.95 |
latin-llct-ud-2.10-220711 | Gold tokenization | — | — | 99.68 | 97.15 | 97.27 | 96.90 | 97.78 | 95.55 | 94.57 | 89.81 | 90.97 |
latin-perseus-ud-2.10-220711 | Raw text | 100.00 | 98.46 | 91.83 | 80.66 | 86.12 | 78.56 | 88.13 | 77.98 | 68.59 | 52.30 | 55.51 |
latin-perseus-ud-2.10-220711 | Gold tokenization | — | — | 91.85 | 80.66 | 86.12 | 78.55 | 88.16 | 78.14 | 68.71 | 52.39 | 55.58 |
latin-proiel-ud-2.10-220711 | Raw text | 99.87 | 36.81 | 96.69 | 96.87 | 90.56 | 89.54 | 96.21 | 74.07 | 69.56 | 56.74 | 63.93 |
latin-proiel-ud-2.10-220711 | Gold tokenization | — | — | 97.12 | 97.32 | 91.19 | 90.27 | 96.44 | 83.20 | 78.50 | 66.34 | 73.00 |
latin-udante-ud-2.10-220711 | Raw text | 99.61 | 98.81 | 90.58 | 75.59 | 81.31 | 71.62 | 87.25 | 75.26 | 67.81 | 43.95 | 50.36 |
latin-udante-ud-2.10-220711 | Gold tokenization | — | — | 90.82 | 75.70 | 81.53 | 71.70 | 87.44 | 75.50 | 67.97 | 44.08 | 50.51 |
latvian-lvtb-ud-2.10-220711 | Raw text | 99.31 | 97.83 | 96.51 | 89.83 | 93.86 | 89.08 | 95.92 | 88.75 | 85.79 | 76.04 | 80.25 |
latvian-lvtb-ud-2.10-220711 | Gold tokenization | — | — | 97.14 | 90.43 | 94.50 | 89.67 | 96.55 | 89.84 | 86.82 | 77.09 | 81.31 |
lithuanian-alksnis-ud-2.10-220711 | Raw text | 99.91 | 87.87 | 95.94 | 90.44 | 91.03 | 89.52 | 93.60 | 82.45 | 78.64 | 67.97 | 71.37 |
lithuanian-alksnis-ud-2.10-220711 | Gold tokenization | — | — | 96.04 | 90.52 | 91.16 | 89.63 | 93.69 | 83.70 | 79.88 | 68.98 | 72.36 |
lithuanian-hse-ud-2.10-220711 | Raw text | 97.30 | 97.30 | 89.28 | 90.21 | 83.13 | 78.38 | 88.16 | 70.27 | 61.79 | 45.67 | 54.04 |
lithuanian-hse-ud-2.10-220711 | Gold tokenization | — | — | 91.23 | 92.36 | 85.19 | 80.09 | 90.57 | 73.96 | 64.53 | 47.54 | 56.10 |
maltese-mudt-ud-2.10-220711 | Raw text | 99.84 | 86.29 | 95.80 | 95.79 | — | 95.35 | — | 84.96 | 80.07 | 68.98 | 72.86 |
maltese-mudt-ud-2.10-220711 | Gold tokenization | — | — | 95.95 | 95.92 | — | 95.48 | — | 85.65 | 80.70 | 69.40 | 73.33 |
marathi-ufal-ud-2.10-220711 | Raw text | 90.25 | 92.63 | 76.50 | — | 65.25 | 60.75 | 80.75 | 60.75 | 50.75 | 28.39 | 38.00 |
marathi-ufal-ud-2.10-220711 | Gold tokenization | — | — | 82.52 | — | 67.96 | 62.86 | 80.83 | 68.93 | 58.50 | 29.46 | 38.17 |
naija-nsc-ud-2.10-220711 | Raw text | 99.94 | 100.00 | 98.03 | — | 98.94 | 97.53 | 99.32 | 93.65 | 90.99 | 88.13 | 89.60 |
naija-nsc-ud-2.10-220711 | Gold tokenization | — | — | 98.08 | — | 99.00 | 97.58 | 99.38 | 93.75 | 91.08 | 88.21 | 89.68 |
north_sami-giella-ud-2.10-220711 | Raw text | 99.87 | 98.79 | 91.77 | 93.54 | 89.30 | 85.36 | 87.01 | 75.16 | 70.43 | 59.76 | 58.27 |
north_sami-giella-ud-2.10-220711 | Gold tokenization | — | — | 91.91 | 93.67 | 89.45 | 85.52 | 87.13 | 75.47 | 70.76 | 60.05 | 58.56 |
norwegian-bokmaal-ud-2.10-220711 | Raw text | 99.77 | 96.05 | 98.35 | — | 97.43 | 96.82 | 98.57 | 93.62 | 92.16 | 86.91 | 88.74 |
norwegian-bokmaal-ud-2.10-220711 | Gold tokenization | — | — | 98.61 | — | 97.68 | 97.07 | 98.82 | 94.40 | 92.91 | 87.59 | 89.43 |
norwegian-nynorsk-ud-2.10-220711 | Raw text | 99.93 | 94.17 | 98.24 | — | 97.34 | 96.55 | 98.40 | 93.89 | 92.18 | 86.03 | 88.36 |
norwegian-nynorsk-ud-2.10-220711 | Gold tokenization | — | — | 98.41 | — | 97.50 | 96.73 | 98.53 | 94.63 | 92.93 | 86.93 | 89.20 |
norwegian-nynorsklia-ud-2.10-220711 | Raw text | 99.91 | 99.53 | 96.61 | — | 95.71 | 93.75 | 98.05 | 81.18 | 76.61 | 66.01 | 69.68 |
norwegian-nynorsklia-ud-2.10-220711 | Gold tokenization | — | — | 96.72 | — | 95.80 | 93.85 | 98.14 | 81.42 | 76.84 | 66.23 | 69.90 |
old_church_slavonic-proiel-ud-2.10-220711 | Raw text | 100.00 | 41.43 | 96.72 | 96.90 | 90.37 | 89.19 | 93.13 | 77.71 | 73.92 | 63.82 | 68.87 |
old_church_slavonic-proiel-ud-2.10-220711 | Gold tokenization | — | — | 97.08 | 97.28 | 91.06 | 89.93 | 93.14 | 88.30 | 84.18 | 74.01 | 77.39 |
old_french-srcmf-ud-2.10-220711 | Raw text | 99.70 | 100.00 | 96.68 | 96.50 | 97.70 | 95.72 | 99.65 | 91.17 | 87.38 | 80.76 | 84.40 |
old_french-srcmf-ud-2.10-220711 | Gold tokenization | — | — | 96.99 | 96.82 | 98.01 | 96.03 | 99.95 | 91.58 | 87.82 | 81.20 | 84.85 |
old_russian-torot-ud-2.10-220711 | Raw text | 100.00 | 29.60 | 94.39 | 94.70 | 87.56 | 85.23 | 85.92 | 71.00 | 65.32 | 51.64 | 53.64 |
old_russian-torot-ud-2.10-220711 | Gold tokenization | — | — | 95.06 | 95.29 | 88.50 | 86.60 | 85.96 | 83.30 | 77.24 | 64.09 | 62.94 |
old_russian-rnc-ud-2.10-220711 | Raw text | 97.48 | 84.03 | 90.94 | 86.55 | 76.51 | 67.15 | 75.31 | 61.28 | 55.93 | 33.24 | 34.04 |
old_russian-rnc-ud-2.10-220711 | Gold tokenization | — | — | 93.29 | 88.93 | 78.48 | 68.86 | 76.77 | 67.13 | 61.08 | 37.15 | 37.24 |
old_east_slavic-birchbark-ud-2.10-220711 | Raw text | 99.98 | 16.73 | 89.24 | 99.35 | 76.11 | 72.43 | 65.88 | 63.41 | 56.50 | 32.53 | 27.14 |
old_east_slavic-birchbark-ud-2.10-220711 | Gold tokenization | — | — | 89.37 | 99.37 | 76.54 | 72.82 | 66.05 | 76.31 | 69.00 | 41.63 | 33.60 |
persian-perdt-ud-2.10-220711 | Raw text | 99.66 | 99.83 | 97.48 | 97.36 | 97.61 | 95.60 | 98.88 | 93.63 | 91.42 | 86.18 | 88.66 |
persian-perdt-ud-2.10-220711 | Gold tokenization | — | — | 97.78 | 97.65 | 97.90 | 95.89 | 99.19 | 94.18 | 91.95 | 86.72 | 89.23 |
persian-seraji-ud-2.10-220711 | Raw text | 99.65 | 98.75 | 97.91 | 97.94 | 97.95 | 97.48 | 96.52 | 91.68 | 88.84 | 84.21 | 82.83 |
persian-seraji-ud-2.10-220711 | Gold tokenization | — | — | 98.24 | 98.28 | 98.28 | 97.78 | 96.80 | 92.36 | 89.48 | 84.82 | 83.40 |
polish-pdb-ud-2.10-220711 | Raw text | 99.85 | 97.33 | 98.89 | 95.89 | 96.11 | 95.26 | 98.10 | 94.22 | 92.19 | 85.44 | 88.36 |
polish-pdb-ud-2.10-220711 | Gold tokenization | — | — | 99.05 | 96.03 | 96.24 | 95.40 | 98.24 | 94.72 | 92.69 | 85.83 | 88.78 |
polish-lfg-ud-2.10-220711 | Raw text | 99.85 | 99.65 | 99.00 | 96.08 | 96.57 | 95.16 | 98.24 | 96.86 | 95.51 | 89.80 | 92.34 |
polish-lfg-ud-2.10-220711 | Gold tokenization | — | — | 99.17 | 96.25 | 96.74 | 95.33 | 98.38 | 97.25 | 95.89 | 90.19 | 92.66 |
pomak-philotis-ud-2.10-220711 | Raw text | 99.98 | 94.49 | 98.86 | — | 95.62 | 95.30 | 96.67 | 88.24 | 83.26 | 71.19 | 74.14 |
pomak-philotis-ud-2.10-220711 | Gold tokenization | — | — | 98.90 | — | 95.65 | 95.33 | 96.69 | 88.68 | 83.75 | 71.48 | 74.42 |
portuguese-gsd-ud-2.10-220711 | Raw text | 99.87 | 97.28 | 98.51 | 98.51 | 99.74 | 98.41 | 99.27 | 94.50 | 93.41 | 88.76 | 89.96 |
portuguese-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.65 | 98.64 | 99.89 | 98.55 | 99.40 | 94.90 | 93.81 | 89.23 | 90.36 |
portuguese-bosque-ud-2.10-220711 | Raw text | 99.68 | 89.89 | 97.87 | — | 96.95 | 96.00 | 98.35 | 92.35 | 90.07 | 81.38 | 84.69 |
portuguese-bosque-ud-2.10-220711 | Gold tokenization | — | — | 98.22 | — | 97.23 | 96.28 | 98.66 | 93.50 | 91.16 | 82.47 | 85.87 |
romanian-nonstandard-ud-2.10-220711 | Raw text | 98.83 | 96.77 | 96.18 | 91.87 | 90.53 | 89.18 | 94.90 | 88.85 | 84.82 | 68.21 | 76.36 |
romanian-nonstandard-ud-2.10-220711 | Gold tokenization | — | — | 97.30 | 92.86 | 91.49 | 90.10 | 95.99 | 90.57 | 86.50 | 69.69 | 77.68 |
romanian-rrt-ud-2.10-220711 | Raw text | 99.71 | 95.16 | 97.90 | 97.21 | 97.40 | 96.98 | 97.96 | 91.97 | 88.44 | 81.66 | 83.13 |
romanian-rrt-ud-2.10-220711 | Gold tokenization | — | — | 98.19 | 97.45 | 97.65 | 97.22 | 98.22 | 92.72 | 89.13 | 82.15 | 83.70 |
romanian-simonero-ud-2.10-220711 | Raw text | 99.84 | 100.00 | 98.45 | 97.97 | 97.56 | 97.25 | 98.91 | 94.08 | 92.13 | 85.52 | 88.32 |
romanian-simonero-ud-2.10-220711 | Gold tokenization | — | — | 98.61 | 98.12 | 97.70 | 97.40 | 99.07 | 94.42 | 92.45 | 85.81 | 88.62 |
russian-syntagrus-ud-2.10-220711 | Raw text | 99.67 | 98.31 | 98.46 | — | 93.96 | 93.71 | 98.18 | 93.84 | 91.70 | 82.72 | 88.90 |
russian-syntagrus-ud-2.10-220711 | Gold tokenization | — | — | 98.79 | — | 94.28 | 94.03 | 98.46 | 94.56 | 92.39 | 83.28 | 89.44 |
russian-gsd-ud-2.10-220711 | Raw text | 99.50 | 96.49 | 98.11 | 97.55 | 94.71 | 93.61 | 97.01 | 91.44 | 88.55 | 81.04 | 84.62 |
russian-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.58 | 97.98 | 95.17 | 94.01 | 97.43 | 92.67 | 89.69 | 82.00 | 85.65 |
russian-taiga-ud-2.10-220711 | Raw text | 98.12 | 86.33 | 95.65 | — | 93.13 | 92.06 | 94.73 | 83.08 | 79.57 | 70.60 | 73.88 |
russian-taiga-ud-2.10-220711 | Gold tokenization | — | — | 97.34 | — | 94.90 | 93.72 | 96.37 | 85.64 | 81.92 | 72.82 | 76.10 |
sanskrit-vedic-ud-2.10-220711 | Raw text | 100.00 | 27.18 | 89.16 | — | 81.61 | 76.76 | 87.05 | 60.92 | 50.04 | 41.66 | 44.99 |
sanskrit-vedic-ud-2.10-220711 | Gold tokenization | — | — | 89.97 | — | 83.02 | 78.34 | 87.34 | 73.74 | 62.01 | 52.00 | 55.41 |
scottish_gaelic-arcosg-ud-2.10-220711 | Raw text | 97.47 | 60.89 | 93.78 | 89.29 | 90.91 | 88.21 | 95.08 | 81.24 | 75.60 | 62.73 | 69.22 |
scottish_gaelic-arcosg-ud-2.10-220711 | Gold tokenization | — | — | 96.62 | 92.24 | 94.02 | 91.39 | 97.59 | 87.33 | 81.65 | 69.25 | 75.23 |
serbian-set-ud-2.10-220711 | Raw text | 99.99 | 93.00 | 99.09 | 96.00 | 96.21 | 95.75 | 97.76 | 93.63 | 91.20 | 83.76 | 87.00 |
serbian-set-ud-2.10-220711 | Gold tokenization | — | — | 99.13 | 96.01 | 96.20 | 95.75 | 97.78 | 94.26 | 91.80 | 84.32 | 87.60 |
slovak-snk-ud-2.10-220711 | Raw text | 100.00 | 81.69 | 97.65 | 90.35 | 93.50 | 89.56 | 96.46 | 91.39 | 89.65 | 80.43 | 84.44 |
slovak-snk-ud-2.10-220711 | Gold tokenization | — | — | 97.88 | 90.55 | 93.69 | 89.80 | 96.50 | 93.91 | 92.08 | 82.89 | 86.95 |
slovenian-ssj-ud-2.10-220711 | Raw text | 99.94 | 98.95 | 98.97 | 96.97 | 97.15 | 96.63 | 98.58 | 93.99 | 92.60 | 86.83 | 88.91 |
slovenian-ssj-ud-2.10-220711 | Gold tokenization | — | — | 99.03 | 97.02 | 97.23 | 96.69 | 98.63 | 94.15 | 92.76 | 86.99 | 89.02 |
slovenian-sst-ud-2.10-220711 | Raw text | 99.85 | 23.14 | 94.82 | 92.71 | 92.43 | 89.84 | 97.38 | 65.69 | 60.84 | 50.88 | 54.78 |
slovenian-sst-ud-2.10-220711 | Gold tokenization | — | — | 95.62 | 93.09 | 92.84 | 90.89 | 97.56 | 78.39 | 73.07 | 63.39 | 68.33 |
spanish-ancora-ud-2.10-220711 | Raw text | 99.95 | 98.78 | 99.06 | 96.02 | 98.74 | 95.59 | 99.37 | 93.70 | 91.79 | 86.41 | 87.88 |
spanish-ancora-ud-2.10-220711 | Gold tokenization | — | — | 99.11 | 96.07 | 98.79 | 95.63 | 99.42 | 93.88 | 91.97 | 86.59 | 88.04 |
spanish-gsd-ud-2.10-220711 | Raw text | 99.75 | 95.62 | 97.15 | — | 96.94 | 95.27 | 98.72 | 91.87 | 89.57 | 78.63 | 84.25 |
spanish-gsd-ud-2.10-220711 | Gold tokenization | — | — | 97.39 | — | 97.19 | 95.53 | 98.97 | 92.66 | 90.32 | 79.43 | 85.04 |
swedish-talbanken-ud-2.10-220711 | Raw text | 99.84 | 96.53 | 98.44 | 97.33 | 97.32 | 96.51 | 98.15 | 92.23 | 89.85 | 83.92 | 85.97 |
swedish-talbanken-ud-2.10-220711 | Gold tokenization | — | — | 98.61 | 97.52 | 97.51 | 96.72 | 98.32 | 92.68 | 90.30 | 84.48 | 86.54 |
swedish-lines-ud-2.10-220711 | Raw text | 99.96 | 88.00 | 97.66 | 95.51 | 90.84 | 88.14 | 97.72 | 90.60 | 87.38 | 71.82 | 82.17 |
swedish-lines-ud-2.10-220711 | Gold tokenization | — | — | 97.73 | 95.52 | 90.87 | 88.15 | 97.76 | 91.44 | 88.19 | 72.50 | 82.95 |
tamil-ttb-ud-2.10-220711 | Raw text | 94.26 | 97.52 | 84.29 | 83.18 | 84.64 | 78.22 | 89.45 | 70.43 | 61.88 | 50.61 | 55.39 |
tamil-ttb-ud-2.10-220711 | Gold tokenization | — | — | 89.29 | 87.78 | 89.99 | 82.70 | 94.42 | 78.13 | 68.78 | 56.87 | 61.48 |
telugu-mtg-ud-2.10-220711 | Raw text | 99.58 | 96.62 | 93.63 | 93.63 | 98.61 | 93.49 | — | 90.72 | 84.63 | 77.14 | 81.14 |
telugu-mtg-ud-2.10-220711 | Gold tokenization | — | — | 94.04 | 94.04 | 99.03 | 93.90 | — | 91.68 | 85.58 | 77.98 | 81.98 |
turkish-boun-ud-2.10-220711 | Raw text | 98.83 | 86.93 | 91.56 | 92.51 | 91.72 | 86.56 | 93.23 | 78.48 | 72.40 | 59.77 | 65.11 |
turkish-boun-ud-2.10-220711 | Gold tokenization | — | — | 92.53 | 93.47 | 92.67 | 87.31 | 94.26 | 81.07 | 74.73 | 61.33 | 66.92 |
turkish-atis-ud-2.10-220711 | Raw text | 100.00 | 80.20 | 98.96 | — | 98.46 | 98.25 | 99.15 | 89.22 | 87.49 | 85.12 | 86.08 |
turkish-atis-ud-2.10-220711 | Gold tokenization | — | — | 99.02 | — | 98.52 | 98.32 | 99.13 | 91.11 | 89.30 | 86.98 | 87.93 |
turkish-framenet-ud-2.10-220711 | Raw text | 100.00 | 100.00 | 96.86 | — | 94.89 | 94.21 | 96.66 | 93.39 | 84.25 | 73.98 | 77.64 |
turkish-framenet-ud-2.10-220711 | Gold tokenization | — | — | 96.86 | — | 94.89 | 94.21 | 96.66 | 93.39 | 84.25 | 73.98 | 77.64 |
turkish-imst-ud-2.10-220711 | Raw text | 98.30 | 96.97 | 94.38 | 93.98 | 90.92 | 88.60 | 94.54 | 74.73 | 69.04 | 58.25 | 63.10 |
turkish-imst-ud-2.10-220711 | Gold tokenization | — | — | 95.94 | 95.49 | 92.40 | 89.97 | 96.13 | 78.07 | 72.09 | 60.26 | 65.33 |
turkish-kenet-ud-2.10-220711 | Raw text | 100.00 | 98.12 | 93.71 | — | 92.05 | 90.86 | 93.33 | 83.91 | 71.18 | 61.81 | 64.77 |
turkish-kenet-ud-2.10-220711 | Gold tokenization | — | — | 93.72 | — | 92.06 | 90.87 | 93.33 | 84.07 | 71.29 | 61.92 | 64.89 |
turkish-penn-ud-2.10-220711 | Raw text | 99.34 | 80.59 | 95.60 | — | 94.41 | 93.33 | 94.36 | 84.22 | 71.67 | 62.21 | 64.53 |
turkish-penn-ud-2.10-220711 | Gold tokenization | — | — | 96.30 | — | 95.11 | 94.02 | 95.01 | 86.76 | 73.91 | 63.63 | 66.02 |
turkish-tourism-ud-2.10-220711 | Raw text | 99.96 | 99.86 | 98.80 | — | 95.08 | 94.67 | 98.36 | 97.20 | 91.52 | 81.98 | 87.38 |
turkish-tourism-ud-2.10-220711 | Gold tokenization | — | — | 98.85 | — | 95.12 | 94.73 | 98.40 | 97.25 | 91.58 | 82.04 | 87.45 |
turkish_german-sagt-ud-2.10-220711 | Raw text | 98.91 | 99.44 | 90.21 | — | 80.32 | 75.60 | 90.82 | 71.14 | 60.98 | 41.12 | 51.00 |
turkish_german-sagt-ud-2.10-220711 | Gold tokenization | — | — | 91.09 | — | 80.89 | 76.08 | 91.52 | 72.69 | 62.06 | 41.64 | 51.71 |
ukrainian-iu-ud-2.10-220711 | Raw text | 99.81 | 96.61 | 97.90 | 94.35 | 94.18 | 93.12 | 97.34 | 90.61 | 88.27 | 78.92 | 83.01 |
ukrainian-iu-ud-2.10-220711 | Gold tokenization | — | — | 98.08 | 94.54 | 94.34 | 93.29 | 97.53 | 91.12 | 88.72 | 79.21 | 83.36 |
urdu-udtb-ud-2.10-220711 | Raw text | 100.00 | 98.31 | 93.91 | 92.15 | 82.83 | 78.40 | 97.41 | 88.15 | 82.49 | 56.62 | 74.68 |
urdu-udtb-ud-2.10-220711 | Gold tokenization | — | — | 93.93 | 92.17 | 82.86 | 78.43 | 97.41 | 88.23 | 82.58 | 56.67 | 74.77 |
uyghur-udt-ud-2.10-220711 | Raw text | 99.54 | 81.81 | 89.33 | 91.75 | 88.12 | 79.98 | 94.67 | 76.66 | 64.87 | 46.84 | 55.29 |
uyghur-udt-ud-2.10-220711 | Gold tokenization | — | — | 89.71 | 92.30 | 88.59 | 80.50 | 95.14 | 78.38 | 66.49 | 47.83 | 56.56 |
vietnamese-vtb-ud-2.10-220711 | Raw text | 85.37 | 93.46 | 78.21 | 76.76 | 85.12 | 76.57 | 85.16 | 52.68 | 47.84 | 41.55 | 44.29 |
vietnamese-vtb-ud-2.10-220711 | Gold tokenization | — | — | 90.36 | 88.55 | 99.72 | 88.32 | 99.59 | 72.88 | 65.41 | 58.76 | 62.51 |
welsh-ccg-ud-2.10-220711 | Raw text | 99.42 | 97.37 | 95.33 | 94.40 | 89.82 | 87.61 | 93.93 | 86.61 | 80.67 | 63.31 | 69.02 |
welsh-ccg-ud-2.10-220711 | Gold tokenization | — | — | 95.84 | 94.87 | 90.31 | 88.07 | 94.44 | 87.85 | 81.83 | 64.36 | 70.21 |
western_armenian-armtdp-ud-2.10-220711 | Raw text | 99.89 | 98.68 | 96.82 | — | 92.51 | 91.83 | 97.14 | 89.39 | 84.66 | 69.84 | 76.01 |
western_armenian-armtdp-ud-2.10-220711 | Gold tokenization | — | — | 96.90 | — | 92.60 | 91.93 | 97.22 | 89.64 | 84.89 | 70.07 | 76.23 |
wolof-wtb-ud-2.10-220711 | Raw text | 99.23 | 91.95 | 94.20 | 94.15 | 93.50 | 91.41 | 95.20 | 84.15 | 78.69 | 66.75 | 70.23 |
wolof-wtb-ud-2.10-220711 | Gold tokenization | — | — | 95.17 | 95.07 | 94.32 | 92.31 | 95.96 | 86.27 | 80.75 | 68.70 | 72.06 |
Universal Dependencies 2.6 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.6 treebanks, and additionally use multilingual BERT.
The models require UDPipe 2.
The latest version 200831 of the Universal Dependencies 2.6 models can be downloaded from the LINDAT/CLARIN repository.
The models are also available in the REST service.
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.
The models were trained on Universal Dependencies 2.6 treebanks.
For UD treebanks that do not contain the original plain text, raw text is used to train the tokenizer instead. The plain texts were taken from W2C -- Web to Corpus.
Finally, multilingual BERT is used to provide contextualized word embeddings.
The Universal Dependencies 2.6 models contain 99 models for 63 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
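The splitting rule above can be sketched in a few lines of Python. This is only an illustration, not the training pipeline itself: the function names are hypothetical, and it assumes the 10% is taken over sentences (the text does not say whether sentences or words are counted) and that "training words" means CoNLL-U lines with a plain integer ID.

```python
def split_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks (separated by blank lines)."""
    return [block for block in text.strip().split("\n\n") if block.strip()]


def count_words(sentence):
    """Count syntactic words: lines whose ID column is a plain integer.

    Multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1")
    are skipped, as are comment lines.
    """
    count = 0
    for line in sentence.splitlines():
        if line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            count += 1
    return count


def carve_dev(sentences, dev_fraction=0.1):
    """Use the last `dev_fraction` of the sentences as dev data.

    Assumption: the fraction is applied to sentences, not words.
    """
    n_dev = max(1, round(len(sentences) * dev_fraction))
    return sentences[:-n_dev], sentences[-n_dev:]


def has_enough_training_words(sentences, minimum=1000):
    """Check the 'at least 1000 training words' threshold."""
    return sum(count_words(s) for s in sentences) >= minimum
```

For a treebank with 20 training sentences and no dev file, `carve_dev` would return the first 18 sentences as train and the last 2 as dev.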
The tokenizer is trained using the `SpaceAfter=No` features. If the features are not present in the data, they can be filled in using raw text in the language in question.
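To make the role of `SpaceAfter=No` concrete, here is a minimal sketch of how the feature ties the tokens of a CoNLL-U sentence back to its raw text, which is the signal a tokenizer needs to learn from. The `detokenize` helper is hypothetical and is not part of UDPipe; it assumes standard 10-column CoNLL-U input with the feature stored in the MISC column.

```python
def detokenize(conllu_sentence):
    """Rebuild the raw text of one CoNLL-U sentence from FORM and MISC."""
    pieces = []
    skip_until = 0  # last word ID covered by the current multiword token
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id, form, misc = cols[0], cols[1], cols[9]
        if "." in token_id:
            continue  # empty nodes have no surface form
        if "-" in token_id:
            # Multiword token: emit its surface form, skip its parts.
            skip_until = int(token_id.split("-")[1])
        elif int(token_id) <= skip_until:
            continue
        pieces.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


# Build a tiny 10-column sentence: ID, FORM, seven unused columns, MISC.
rows = [("1", "Hello", "SpaceAfter=No"), ("2", ",", "_"),
        ("3", "world", "SpaceAfter=No"), ("4", "!", "_")]
sentence = "\n".join("\t".join([i, form] + ["_"] * 7 + [misc])
                     for i, form, misc in rows)
print(detokenize(sentence))  # Hello, world!
```

Without the feature, every token would be followed by a space and the original text could not be recovered.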
The tagger, lemmatizer and parser are trained using gold UD data.
We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the `conll18_ud_eval.py` script.
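As a reminder of what an F1 score in the table means, the sketch below computes precision, recall and F1 from three counts. It assumes the usual definition (correct items over system items, correct items over gold items, and their harmonic mean); the function name and the example counts are illustrative and do not come from the evaluation script or from any treebank.

```python
def f1_score(correct, gold_total, system_total):
    """Return (precision, recall, F1) for one metric.

    correct:      items the system got right (e.g. words with the right UPOS)
    gold_total:   number of items in the gold data
    system_total: number of items in the system output
    """
    precision = correct / system_total if system_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    total = gold_total + system_total
    f1 = 2 * correct / total if total else 0.0
    return precision, recall, f1


# Made-up counts: 100 gold words, 95 predicted words, 90 of them correct.
precision, recall, f1 = f1_score(correct=90, gold_total=100, system_total=95)
print(f"{100 * f1:.2f}")  # 92.31
```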
Model | Mode | Words | Sents | UPOS | XPOS | UFeats | AllTags | Lemma | UAS | LAS | MLAS | BLEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|
afrikaans-afribooms-ud-2.6-200830 | Raw text | 99.82 | 98.25 | 98.55 | 95.42 | 98.27 | 95.33 | 97.52 | 90.34 | 87.93 | 80.33 | 79.91 |
afrikaans-afribooms-ud-2.6-200830 | Gold tokenization | — | — | 98.70 | 95.56 | 98.41 | 95.48 | 97.61 | 90.80 | 88.40 | 80.78 | 80.31 |
ancient_greek-perseus-ud-2.6-200830 | Raw text | 99.97 | 98.85 | 93.20 | 86.01 | 91.59 | 85.27 | 86.81 | 79.57 | 73.90 | 54.80 | 55.63 |
ancient_greek-perseus-ud-2.6-200830 | Gold tokenization | — | — | 93.24 | 86.03 | 91.62 | 85.30 | 86.84 | 79.74 | 74.06 | 54.92 | 55.72 |
ancient_greek-proiel-ud-2.6-200830 | Raw text | 100.00 | 48.02 | 97.73 | 98.04 | 92.36 | 91.01 | 94.71 | 79.98 | 75.99 | 60.15 | 65.88 |
ancient_greek-proiel-ud-2.6-200830 | Gold tokenization | — | — | 97.91 | 98.18 | 92.59 | 91.30 | 94.76 | 85.95 | 81.90 | 66.72 | 71.84 |
arabic-padt-ud-2.6-200830 | Raw text | 94.58 | 82.09 | 91.68 | 88.96 | 89.14 | 88.65 | 90.36 | 78.86 | 74.85 | 66.06 | 68.12 |
arabic-padt-ud-2.6-200830 | Gold tokenization | — | — | 96.87 | 94.20 | 94.36 | 93.82 | 95.23 | 87.60 | 83.14 | 74.51 | 76.09 |
armenian-armtdp-ud-2.6-200830 | Raw text | 99.34 | 97.85 | 95.64 | — | 90.30 | 88.94 | 94.45 | 85.07 | 79.97 | 66.54 | 71.61 |
armenian-armtdp-ud-2.6-200830 | Gold tokenization | — | — | 96.11 | — | 90.90 | 89.37 | 95.04 | 86.31 | 81.18 | 67.00 | 72.22 |
basque-bdt-ud-2.6-200830 | Raw text | 99.94 | 99.83 | 96.44 | — | 93.60 | 91.69 | 96.40 | 87.24 | 84.15 | 74.97 | 79.94 |
basque-bdt-ud-2.6-200830 | Gold tokenization | — | — | 96.48 | — | 93.64 | 91.72 | 96.43 | 87.30 | 84.21 | 75.00 | 79.96 |
belarusian-hse-ud-2.6-200830 | Raw text | 99.84 | 78.70 | 96.14 | 31.78 | 82.07 | 26.98 | 81.48 | 75.81 | 71.05 | 49.78 | 50.93 |
belarusian-hse-ud-2.6-200830 | Gold tokenization | — | — | 96.39 | 31.85 | 82.19 | 27.08 | 81.53 | 80.18 | 75.18 | 52.92 | 53.60 |
bulgarian-btb-ud-2.6-200830 | Raw text | 99.91 | 94.17 | 99.15 | 97.19 | 97.95 | 96.84 | 97.97 | 94.35 | 91.61 | 85.92 | 86.43 |
bulgarian-btb-ud-2.6-200830 | Gold tokenization | — | — | 99.27 | 97.30 | 98.05 | 96.95 | 98.07 | 95.17 | 92.41 | 86.62 | 87.17 |
catalan-ancora-ud-2.6-200830 | Raw text | 99.98 | 99.43 | 99.05 | 98.99 | 98.63 | 98.14 | 99.31 | 94.53 | 92.86 | 87.63 | 89.24 |
catalan-ancora-ud-2.6-200830 | Gold tokenization | — | — | 99.09 | 99.03 | 98.67 | 98.18 | 99.34 | 94.60 | 92.95 | 87.73 | 89.35 |
chinese-gsdsimp-ud-2.6-200830 | Raw text | 90.29 | 99.10 | 87.32 | 87.20 | 89.73 | 86.54 | 90.29 | 72.68 | 70.32 | 63.38 | 66.94 |
chinese-gsdsimp-ud-2.6-200830 | Gold tokenization | — | — | 96.32 | 96.15 | 99.43 | 95.50 | 99.99 | 86.89 | 83.93 | 78.52 | 82.60 |
chinese-gsd-ud-2.6-200830 | Raw text | 90.27 | 99.10 | 87.27 | 87.18 | 89.74 | 86.50 | 90.27 | 72.99 | 70.50 | 63.83 | 67.21 |
chinese-gsd-ud-2.6-200830 | Gold tokenization | — | — | 96.30 | 96.16 | 99.42 | 95.45 | 99.99 | 87.30 | 84.22 | 78.63 | 82.84 |
classical_chinese-kyoto-ud-2.6-200830 | Raw text | 99.46 | 46.22 | 90.91 | 90.91 | 93.43 | 88.00 | 99.42 | 72.75 | 67.18 | 63.67 | 66.02 |
classical_chinese-kyoto-ud-2.6-200830 | Gold tokenization | — | — | 93.55 | 93.24 | 95.01 | 90.86 | 99.96 | 85.49 | 80.20 | 76.42 | 79.25 |
coptic-scriptorium-ud-2.6-200830 | Raw text | 71.91 | 35.97 | 69.61 | 68.00 | 63.06 | 60.16 | 70.51 | 47.75 | 45.89 | 25.42 | 35.81 |
coptic-scriptorium-ud-2.6-200830 | Gold tokenization | — | — | 96.15 | 92.53 | 87.75 | 81.98 | 96.70 | 89.14 | 85.79 | 57.57 | 76.42 |
croatian-set-ud-2.6-200830 | Raw text | 99.95 | 94.41 | 98.18 | 95.91 | 96.40 | 95.27 | 97.58 | 92.20 | 88.40 | 80.16 | 83.07 |
croatian-set-ud-2.6-200830 | Gold tokenization | — | — | 98.23 | 96.00 | 96.52 | 95.38 | 97.64 | 92.72 | 88.89 | 80.66 | 83.53 |
czech-pdt-ud-2.6-200830 | Raw text | 99.93 | 93.35 | 99.23 | 97.61 | 97.59 | 97.13 | 99.09 | 93.81 | 92.03 | 87.79 | 89.88 |
czech-pdt-ud-2.6-200830 | Gold tokenization | — | — | 99.30 | 97.71 | 97.70 | 97.24 | 99.17 | 94.60 | 92.81 | 88.45 | 90.57 |
czech-cac-ud-2.6-200830 | Raw text | 99.98 | 99.68 | 99.52 | 97.33 | 97.05 | 96.64 | 98.93 | 94.31 | 92.48 | 87.56 | 89.76 |
czech-cac-ud-2.6-200830 | Gold tokenization | — | — | 99.54 | 97.36 | 97.07 | 96.67 | 98.95 | 94.37 | 92.54 | 87.63 | 89.83 |
czech-fictree-ud-2.6-200830 | Raw text | 99.99 | 98.95 | 98.68 | 95.80 | 96.79 | 95.38 | 99.20 | 94.83 | 92.66 | 85.35 | 89.58 |
czech-fictree-ud-2.6-200830 | Gold tokenization | — | — | 98.69 | 95.82 | 96.80 | 95.40 | 99.21 | 94.92 | 92.74 | 85.47 | 89.69 |
czech-cltt-ud-2.6-200830 | Raw text | 99.65 | 97.40 | 99.21 | 95.00 | 94.98 | 94.76 | 99.06 | 91.37 | 89.67 | 82.08 | 86.96 |
czech-cltt-ud-2.6-200830 | Gold tokenization | — | — | 99.49 | 95.19 | 95.16 | 94.95 | 99.30 | 91.91 | 90.21 | 82.25 | 87.31 |
danish-ddt-ud-2.6-200830 | Raw text | 99.81 | 89.78 | 98.01 | — | 97.52 | 96.72 | 97.31 | 88.56 | 86.46 | 79.62 | 81.12 |
danish-ddt-ud-2.6-200830 | Gold tokenization | — | — | 98.26 | — | 97.73 | 96.99 | 97.53 | 89.82 | 87.67 | 80.73 | 82.27 |
dutch-alpino-ud-2.6-200830 | Raw text | 99.83 | 88.59 | 97.41 | 95.98 | 97.02 | 95.36 | 97.32 | 92.79 | 90.38 | 81.53 | 83.18 |
dutch-alpino-ud-2.6-200830 | Gold tokenization | — | — | 97.57 | 96.13 | 97.18 | 95.53 | 97.46 | 93.93 | 91.53 | 82.72 | 84.42 |
dutch-lassysmall-ud-2.6-200830 | Raw text | 99.83 | 75.40 | 96.58 | 95.42 | 96.41 | 94.73 | 97.21 | 90.36 | 87.66 | 78.84 | 80.17 |
dutch-lassysmall-ud-2.6-200830 | Gold tokenization | — | — | 96.79 | 96.05 | 96.97 | 95.40 | 97.33 | 94.26 | 91.24 | 83.56 | 84.84 |
english-ewt-ud-2.6-200830 | Raw text | 98.95 | 86.60 | 96.36 | 96.06 | 96.56 | 94.88 | 97.64 | 89.55 | 87.43 | 80.50 | 83.29 |
english-ewt-ud-2.6-200830 | Gold tokenization | — | — | 97.29 | 97.03 | 97.57 | 95.84 | 98.57 | 92.24 | 90.05 | 83.33 | 86.07 |
english-gum-ud-2.6-200830 | Raw text | 99.81 | 83.66 | 96.79 | 96.76 | 97.55 | 95.88 | 97.35 | 90.02 | 87.52 | 79.41 | 80.43 |
english-gum-ud-2.6-200830 | Gold tokenization | — | — | 96.99 | 96.93 | 97.75 | 96.09 | 97.56 | 91.93 | 89.36 | 81.20 | 82.25 |
english-lines-ud-2.6-200830 | Raw text | 99.92 | 87.45 | 97.60 | 95.86 | 96.88 | 93.39 | 98.34 | 89.36 | 86.45 | 79.35 | 82.87 |
english-lines-ud-2.6-200830 | Gold tokenization | — | — | 97.67 | 95.90 | 96.92 | 93.41 | 98.41 | 90.26 | 87.36 | 80.24 | 83.79 |
english-partut-ud-2.6-200830 | Raw text | 99.72 | 100.00 | 97.37 | 97.08 | 96.29 | 95.38 | 98.23 | 94.12 | 92.09 | 83.04 | 87.20 |
english-partut-ud-2.6-200830 | Gold tokenization | — | — | 97.62 | 97.33 | 96.54 | 95.63 | 98.50 | 94.40 | 92.37 | 83.44 | 87.48 |
estonian-edt-ud-2.6-200830 | Raw text | 99.96 | 91.56 | 97.65 | 98.25 | 96.44 | 95.19 | 95.34 | 88.75 | 86.18 | 80.12 | 79.65 |
estonian-edt-ud-2.6-200830 | Gold tokenization | — | — | 97.75 | 98.29 | 96.48 | 95.28 | 95.40 | 89.66 | 87.06 | 80.93 | 80.44 |
estonian-ewt-ud-2.6-200830 | Raw text | 98.96 | 70.09 | 95.00 | 96.30 | 93.74 | 91.31 | 93.81 | 81.07 | 77.55 | 69.14 | 70.69 |
estonian-ewt-ud-2.6-200830 | Gold tokenization | — | — | 96.22 | 97.37 | 94.65 | 92.37 | 94.83 | 86.37 | 82.55 | 73.03 | 74.39 |
finnish-tdt-ud-2.6-200830 | Raw text | 99.70 | 88.64 | 97.63 | 98.25 | 96.05 | 95.11 | 92.06 | 90.11 | 88.10 | 82.04 | 77.91 |
finnish-tdt-ud-2.6-200830 | Gold tokenization | — | — | 97.97 | 98.56 | 96.37 | 95.48 | 92.38 | 91.69 | 89.63 | 83.30 | 79.18 |
finnish-ftb-ud-2.6-200830 | Raw text | 99.91 | 86.84 | 96.52 | 95.08 | 96.72 | 93.82 | 95.73 | 89.93 | 87.32 | 80.13 | 80.74 |
finnish-ftb-ud-2.6-200830 | Gold tokenization | — | — | 96.85 | 95.31 | 96.87 | 94.16 | 95.83 | 91.99 | 89.34 | 82.64 | 83.05 |
french-gsd-ud-2.6-200830 | Raw text | 98.87 | 94.67 | 97.23 | 98.86 | 96.65 | 96.00 | 97.69 | 92.77 | 90.82 | 83.14 | 86.08 |
french-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.29 | 99.99 | 97.63 | 96.94 | 98.80 | 94.46 | 92.63 | 84.72 | 87.21 |
french-sequoia-ud-2.6-200830 | Raw text | 99.09 | 87.50 | 98.33 | — | 97.25 | 96.79 | 98.16 | 93.90 | 92.45 | 86.54 | 89.25 |
french-sequoia-ud-2.6-200830 | Gold tokenization | — | — | 99.32 | — | 98.19 | 97.78 | 99.09 | 95.80 | 94.43 | 88.78 | 90.78 |
french-partut-ud-2.6-200830 | Raw text | 99.42 | 100.00 | 97.28 | 96.93 | 94.17 | 93.63 | 95.59 | 94.71 | 92.71 | 80.18 | 83.34 |
french-partut-ud-2.6-200830 | Gold tokenization | — | — | 97.89 | 97.54 | 94.74 | 94.20 | 96.20 | 95.47 | 93.62 | 81.20 | 84.28 |
french-spoken-ud-2.6-200830 | Raw text | 99.06 | 21.15 | 96.49 | 96.44 | — | 93.98 | 97.48 | 79.23 | 74.91 | 64.48 | 66.67 |
french-spoken-ud-2.6-200830 | Gold tokenization | — | — | 97.63 | 97.31 | — | 95.00 | 98.28 | 87.27 | 82.51 | 74.23 | 75.56 |
galician-ctg-ud-2.6-200830 | Raw text | 99.22 | 97.22 | 97.30 | 97.07 | 99.05 | 96.71 | 98.07 | 85.45 | 83.07 | 72.03 | 76.75 |
galician-ctg-ud-2.6-200830 | Gold tokenization | — | — | 98.04 | 97.79 | 99.83 | 97.43 | 98.82 | 87.22 | 84.73 | 74.05 | 78.78 |
galician-treegal-ud-2.6-200830 | Raw text | 98.74 | 87.99 | 95.99 | 93.58 | 94.72 | 92.63 | 96.71 | 83.26 | 79.23 | 67.54 | 71.73 |
galician-treegal-ud-2.6-200830 | Gold tokenization | — | — | 97.23 | 94.65 | 95.76 | 93.73 | 97.89 | 86.57 | 82.30 | 71.04 | 75.71 |
german-hdt-ud-2.6-200830 | Raw text | 99.91 | 92.34 | 98.51 | 98.45 | 94.09 | 93.69 | 97.23 | 96.88 | 95.96 | 84.87 | 90.41 |
german-hdt-ud-2.6-200830 | Gold tokenization | — | — | 98.62 | 98.57 | 94.21 | 93.81 | 97.32 | 97.57 | 96.67 | 85.53 | 91.10 |
german-gsd-ud-2.6-200830 | Raw text | 99.58 | 80.90 | 94.39 | 97.51 | 91.14 | 85.97 | 96.58 | 87.06 | 82.93 | 62.33 | 74.97 |
german-gsd-ud-2.6-200830 | Gold tokenization | — | — | 94.73 | 97.96 | 91.65 | 86.51 | 96.95 | 89.36 | 85.31 | 64.33 | 77.26 |
gothic-proiel-ud-2.6-200830 | Raw text | 100.00 | 31.12 | 96.39 | 96.90 | 90.18 | 88.05 | 94.70 | 74.10 | 68.48 | 55.16 | 62.26 |
gothic-proiel-ud-2.6-200830 | Gold tokenization | — | — | 96.81 | 97.26 | 91.12 | 89.28 | 94.77 | 83.73 | 77.93 | 65.37 | 70.85 |
greek-gdt-ud-2.6-200830 | Raw text | 99.87 | 90.19 | 97.99 | 98.00 | 95.57 | 94.91 | 95.53 | 93.00 | 91.16 | 81.28 | 80.73 |
greek-gdt-ud-2.6-200830 | Gold tokenization | — | — | 98.14 | 98.14 | 95.69 | 95.02 | 95.61 | 93.82 | 91.95 | 82.03 | 81.53 |
hebrew-htb-ud-2.6-200830 | Raw text | 85.04 | 99.39 | 82.79 | 82.76 | 81.31 | 80.57 | 82.97 | 69.85 | 67.39 | 54.79 | 59.16 |
hebrew-htb-ud-2.6-200830 | Gold tokenization | — | — | 97.48 | 97.48 | 96.03 | 95.36 | 97.23 | 91.83 | 89.25 | 78.52 | 81.02 |
hindi-hdtb-ud-2.6-200830 | Raw text | 100.00 | 98.90 | 97.64 | 97.29 | 94.18 | 92.32 | 98.78 | 95.32 | 92.37 | 79.24 | 87.69 |
hindi-hdtb-ud-2.6-200830 | Gold tokenization | — | — | 97.65 | 97.29 | 94.21 | 92.35 | 98.78 | 95.44 | 92.49 | 79.41 | 87.84 |
hungarian-szeged-ud-2.6-200830 | Raw text | 99.85 | 95.89 | 96.77 | — | 94.32 | 93.51 | 94.97 | 87.78 | 84.24 | 74.80 | 77.84 |
hungarian-szeged-ud-2.6-200830 | Gold tokenization | — | — | 96.87 | — | 94.45 | 93.61 | 95.09 | 88.28 | 84.73 | 75.27 | 78.26 |
indonesian-gsd-ud-2.6-200830 | Raw text | 100.00 | 94.13 | 93.89 | 94.28 | 95.55 | 89.00 | 99.61 | 86.07 | 79.97 | 69.25 | 77.74 |
indonesian-gsd-ud-2.6-200830 | Gold tokenization | — | — | 93.90 | 94.26 | 95.52 | 88.98 | 99.61 | 86.32 | 80.18 | 69.51 | 78.00 |
irish-idt-ud-2.6-200830 | Raw text | 99.71 | 97.36 | 94.35 | 94.30 | 73.43 | 70.38 | 93.18 | 84.47 | 77.88 | 40.78 | 65.74 |
irish-idt-ud-2.6-200830 | Gold tokenization | — | — | 94.59 | 94.60 | 73.65 | 70.63 | 93.41 | 84.98 | 78.30 | 40.94 | 65.87 |
italian-isdt-ud-2.6-200830 | Raw text | 99.84 | 98.76 | 98.52 | 98.44 | 98.23 | 97.66 | 98.65 | 94.77 | 93.12 | 86.91 | 87.85 |
italian-isdt-ud-2.6-200830 | Gold tokenization | — | — | 98.68 | 98.60 | 98.38 | 97.81 | 98.81 | 95.07 | 93.44 | 87.20 | 88.19 |
italian-partut-ud-2.6-200830 | Raw text | 99.73 | 100.00 | 98.41 | 98.52 | 98.27 | 97.77 | 98.74 | 96.07 | 93.90 | 87.45 | 88.95 |
italian-partut-ud-2.6-200830 | Gold tokenization | — | — | 98.54 | 98.65 | 98.38 | 97.88 | 98.93 | 96.18 | 93.98 | 87.48 | 89.15 |
italian-postwita-ud-2.6-200830 | Raw text | 99.47 | 30.49 | 96.53 | 96.28 | 96.43 | 94.89 | 96.76 | 80.97 | 76.94 | 65.79 | 67.44 |
italian-postwita-ud-2.6-200830 | Gold tokenization | — | — | 97.06 | 96.79 | 96.89 | 95.41 | 97.18 | 88.04 | 83.76 | 75.23 | 76.98 |
italian-twittiro-ud-2.6-200830 | Raw text | 99.06 | 36.80 | 95.99 | 95.86 | 95.22 | 93.37 | 94.68 | 81.69 | 77.38 | 64.34 | 65.32 |
italian-twittiro-ud-2.6-200830 | Gold tokenization | — | — | 97.01 | 96.77 | 96.14 | 94.42 | 95.50 | 87.84 | 83.43 | 71.64 | 72.68 |
italian-vit-ud-2.6-200830 | Raw text | 99.69 | 94.69 | 97.86 | 97.07 | 97.38 | 95.76 | 98.64 | 92.03 | 89.20 | 80.39 | 83.83 |
italian-vit-ud-2.6-200830 | Gold tokenization | — | — | 98.16 | 97.49 | 97.66 | 96.16 | 98.92 | 92.77 | 89.91 | 81.15 | 84.53 |
japanese-gsd-ud-2.6-200830 | Raw text | 95.34 | 94.61 | 93.67 | 93.56 | 95.32 | 92.74 | 95.02 | 85.11 | 84.01 | 76.23 | 77.83 |
japanese-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.03 | 97.71 | 99.99 | 96.83 | 99.61 | 94.73 | 93.41 | 87.64 | 89.28 |
korean-kaist-ud-2.6-200830 | Raw text | 99.95 | 100.00 | 95.89 | 87.82 | — | 87.62 | 94.23 | 89.41 | 87.58 | 82.32 | 80.34 |
korean-kaist-ud-2.6-200830 | Gold tokenization | — | — | 95.94 | 87.85 | — | 87.66 | 94.27 | 89.51 | 87.67 | 82.42 | 80.42 |
korean-gsd-ud-2.6-200830 | Raw text | 99.87 | 93.93 | 96.61 | 90.19 | 99.69 | 88.03 | 93.51 | 88.68 | 85.04 | 80.93 | 77.36 |
korean-gsd-ud-2.6-200830 | Gold tokenization | — | — | 96.74 | 90.32 | 99.82 | 88.16 | 93.64 | 89.50 | 85.84 | 81.76 | 78.14 |
latin-ittb-ud-2.6-200830 | Raw text | 99.99 | 92.44 | 98.54 | 96.35 | 96.92 | 95.12 | 98.94 | 90.31 | 88.16 | 82.19 | 85.37 |
latin-ittb-ud-2.6-200830 | Gold tokenization | — | — | 98.52 | 96.37 | 96.92 | 95.11 | 98.93 | 91.24 | 89.07 | 82.62 | 85.88 |
latin-llct-ud-2.6-200830 | Raw text | 100.00 | 99.49 | 99.60 | 97.13 | 97.11 | 96.63 | 97.68 | 95.48 | 94.35 | 89.31 | 90.44 |
latin-llct-ud-2.6-200830 | Gold tokenization | — | — | 99.60 | 97.14 | 97.11 | 96.63 | 97.68 | 95.54 | 94.40 | 89.40 | 90.53 |
latin-proiel-ud-2.6-200830 | Raw text | 99.87 | 36.81 | 96.67 | 96.81 | 90.71 | 89.59 | 96.16 | 74.44 | 69.97 | 57.51 | 64.96 |
latin-proiel-ud-2.6-200830 | Gold tokenization | — | — | 97.07 | 97.16 | 91.53 | 90.52 | 96.42 | 83.78 | 79.04 | 67.58 | 73.88 |
latin-perseus-ud-2.6-200830 | Raw text | 100.00 | 98.46 | 91.65 | 81.18 | 86.33 | 78.75 | 88.05 | 78.09 | 68.97 | 52.82 | 56.03 |
latin-perseus-ud-2.6-200830 | Gold tokenization | — | — | 91.64 | 81.17 | 86.33 | 78.74 | 88.04 | 78.21 | 69.07 | 52.84 | 55.99 |
latvian-lvtb-ud-2.6-200830 | Raw text | 99.32 | 98.74 | 96.28 | 89.64 | 93.79 | 88.84 | 95.81 | 88.31 | 85.26 | 75.23 | 79.56 |
latvian-lvtb-ud-2.6-200830 | Gold tokenization | — | — | 96.92 | 90.24 | 94.40 | 89.43 | 96.45 | 89.33 | 86.23 | 76.29 | 80.60 |
lithuanian-alksnis-ud-2.6-200830 | Raw text | 99.91 | 87.87 | 95.97 | 90.37 | 91.07 | 89.41 | 93.61 | 82.54 | 78.70 | 67.95 | 71.30 |
lithuanian-alksnis-ud-2.6-200830 | Gold tokenization | — | — | 96.04 | 90.40 | 91.18 | 89.49 | 93.70 | 83.93 | 80.08 | 69.02 | 72.43 |
lithuanian-hse-ud-2.6-200830 | Raw text | 97.30 | 97.30 | 89.66 | 89.28 | 81.45 | 77.07 | 87.98 | 70.92 | 62.53 | 44.26 | 53.76 |
lithuanian-hse-ud-2.6-200830 | Gold tokenization | — | — | 91.23 | 91.32 | 83.21 | 78.40 | 90.28 | 73.77 | 64.53 | 45.25 | 54.68 |
maltese-mudt-ud-2.6-200830 | Raw text | 99.84 | 86.29 | 95.77 | 95.66 | — | 95.30 | — | 84.76 | 79.76 | 68.39 | 72.24 |
maltese-mudt-ud-2.6-200830 | Gold tokenization | — | — | 95.88 | 95.77 | — | 95.40 | — | 85.46 | 80.38 | 68.69 | 72.66 |
marathi-ufal-ud-2.6-200830 | Raw text | 90.25 | 92.63 | 78.50 | — | 65.25 | 61.50 | 80.00 | 61.25 | 53.50 | 31.73 | 40.92 |
marathi-ufal-ud-2.6-200830 | Gold tokenization | — | — | 84.22 | — | 68.69 | 63.83 | 80.10 | 70.39 | 60.92 | 31.95 | 42.32 |
naija-nsc-ud-2.6-200830 | Raw text | 100.00 | 99.56 | 98.14 | — | 99.16 | 97.77 | 99.27 | 92.46 | 89.81 | 84.18 | 86.20 |
naija-nsc-ud-2.6-200830 | Gold tokenization | — | — | 98.14 | — | 99.16 | 97.77 | 99.27 | 92.50 | 89.84 | 84.25 | 86.26 |
north_sami-giella-ud-2.6-200830 | Raw text | 99.87 | 98.79 | 92.35 | 93.57 | 89.40 | 85.61 | 86.85 | 76.66 | 71.84 | 60.71 | 58.95 |
north_sami-giella-ud-2.6-200830 | Gold tokenization | — | — | 92.47 | 93.70 | 89.56 | 85.75 | 86.96 | 76.97 | 72.16 | 60.95 | 59.19 |
norwegian-bokmaal-ud-2.6-200830 | Raw text | 99.83 | 95.63 | 98.37 | — | 97.52 | 96.86 | 98.55 | 93.74 | 92.26 | 87.03 | 88.76 |
norwegian-bokmaal-ud-2.6-200830 | Gold tokenization | — | — | 98.57 | — | 97.71 | 97.05 | 98.75 | 94.48 | 93.00 | 87.67 | 89.43 |
norwegian-nynorsk-ud-2.6-200830 | Raw text | 99.91 | 94.11 | 98.36 | — | 97.38 | 96.67 | 98.37 | 93.86 | 92.11 | 86.07 | 88.11 |
norwegian-nynorsk-ud-2.6-200830 | Gold tokenization | — | — | 98.50 | — | 97.51 | 96.80 | 98.50 | 94.66 | 92.93 | 86.95 | 89.01 |
norwegian-nynorsklia-ud-2.6-200830 | Raw text | 99.91 | 99.53 | 96.45 | — | 95.71 | 93.62 | 98.05 | 80.90 | 76.53 | 65.74 | 69.55 |
norwegian-nynorsklia-ud-2.6-200830 | Gold tokenization | — | — | 96.55 | — | 95.79 | 93.72 | 98.14 | 81.15 | 76.76 | 65.94 | 69.80 |
old_church_slavonic-proiel-ud-2.6-200830 | Raw text | 100.00 | 41.43 | 96.58 | 96.83 | 90.44 | 89.17 | 93.19 | 77.42 | 73.57 | 63.51 | 68.53 |
old_church_slavonic-proiel-ud-2.6-200830 | Gold tokenization | — | — | 96.89 | 97.09 | 91.22 | 89.97 | 93.20 | 87.95 | 83.81 | 73.92 | 77.26 |
old_french-srcmf-ud-2.6-200830 | Raw text | 99.93 | 100.00 | 96.40 | 96.27 | 97.80 | 95.58 | — | 92.28 | 87.74 | 81.08 | 84.17 |
old_french-srcmf-ud-2.6-200830 | Gold tokenization | — | — | 96.47 | 96.33 | 97.86 | 95.64 | — | 92.36 | 87.81 | 81.17 | 84.26 |
old_russian-torot-ud-2.6-200830 | Raw text | 100.00 | 29.60 | 94.33 | 94.39 | 87.51 | 85.16 | 85.82 | 70.66 | 65.18 | 51.26 | 53.18 |
old_russian-torot-ud-2.6-200830 | Gold tokenization | — | — | 94.93 | 94.99 | 88.44 | 86.35 | 85.77 | 83.15 | 77.17 | 63.78 | 62.66 |
old_russian-rnc-ud-2.6-200830 | Raw text | 98.15 | 85.46 | 91.80 | 87.74 | 75.83 | 66.63 | 74.94 | 63.08 | 57.53 | 33.85 | 35.04 |
old_russian-rnc-ud-2.6-200830 | Gold tokenization | — | — | 93.34 | 89.43 | 77.09 | 67.76 | 76.13 | 66.86 | 60.73 | 36.05 | 37.07 |
persian-seraji-ud-2.6-200830 | Raw text | 99.65 | 98.75 | 97.69 | 97.66 | 97.75 | 97.29 | 96.67 | 91.09 | 88.15 | 83.43 | 82.26 |
persian-seraji-ud-2.6-200830 | Gold tokenization | — | — | 97.98 | 97.97 | 98.07 | 97.60 | 96.94 | 91.74 | 88.76 | 84.00 | 82.82 |
polish-pdb-ud-2.6-200830 | Raw text | 99.85 | 97.33 | 98.88 | 95.73 | 95.84 | 95.03 | 98.05 | 94.02 | 92.01 | 84.93 | 88.08 |
polish-pdb-ud-2.6-200830 | Gold tokenization | — | — | 99.04 | 95.88 | 95.99 | 95.18 | 98.19 | 94.51 | 92.50 | 85.36 | 88.52 |
polish-lfg-ud-2.6-200830 | Raw text | 99.85 | 99.65 | 98.92 | 95.99 | 96.51 | 95.06 | 98.27 | 96.89 | 95.52 | 89.73 | 92.45 |
polish-lfg-ud-2.6-200830 | Gold tokenization | — | — | 99.09 | 96.18 | 96.70 | 95.25 | 98.41 | 97.29 | 95.91 | 90.12 | 92.77 |
portuguese-gsd-ud-2.6-200830 | Raw text | 99.84 | 97.50 | 98.53 | 98.52 | 99.71 | 98.43 | 99.33 | 94.57 | 93.47 | 88.69 | 90.02 |
portuguese-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.69 | 98.69 | 99.87 | 98.59 | 99.49 | 94.94 | 93.82 | 89.11 | 90.36 |
portuguese-bosque-ud-2.6-200830 | Raw text | 99.55 | 90.64 | 97.19 | — | 96.17 | 94.79 | 97.98 | 92.32 | 89.72 | 79.29 | 84.22 |
portuguese-bosque-ud-2.6-200830 | Gold tokenization | — | — | 97.60 | — | 96.49 | 95.11 | 98.42 | 93.53 | 90.80 | 80.42 | 85.51 |
romanian-rrt-ud-2.6-200830 | Raw text | 99.69 | 95.28 | 97.79 | 97.18 | 97.32 | 96.81 | 98.20 | 91.83 | 87.56 | 80.00 | 82.17 |
romanian-rrt-ud-2.6-200830 | Gold tokenization | — | — | 98.08 | 97.44 | 97.60 | 97.08 | 98.49 | 92.74 | 88.38 | 80.82 | 82.88 |
romanian-nonstandard-ud-2.6-200830 | Raw text | 98.35 | 96.73 | 95.61 | 91.38 | 90.03 | 88.67 | 94.23 | 88.89 | 84.47 | 67.59 | 75.81 |
romanian-nonstandard-ud-2.6-200830 | Gold tokenization | — | — | 97.21 | 92.90 | 91.53 | 90.13 | 95.74 | 91.00 | 86.49 | 69.53 | 77.28 |
russian-syntagrus-ud-2.6-200830 | Raw text | 99.60 | 98.80 | 98.86 | — | 97.60 | 97.38 | 98.33 | 94.22 | 92.97 | 89.27 | 90.35 |
russian-syntagrus-ud-2.6-200830 | Gold tokenization | — | — | 99.27 | — | 97.98 | 97.76 | 98.68 | 94.99 | 93.72 | 89.90 | 90.95 |
russian-gsd-ud-2.6-200830 | Raw text | 99.50 | 96.22 | 98.03 | 97.51 | 94.76 | 93.60 | 96.89 | 91.66 | 88.38 | 80.67 | 84.18 |
russian-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.49 | 97.98 | 95.17 | 93.97 | 97.27 | 92.77 | 89.43 | 81.44 | 85.05 |
russian-taiga-ud-2.6-200830 | Raw text | 97.16 | 82.69 | 94.13 | 95.72 | 90.01 | 87.50 | 93.05 | 81.17 | 76.99 | 65.28 | 69.94 |
russian-taiga-ud-2.6-200830 | Gold tokenization | — | — | 96.47 | 98.56 | 92.72 | 89.87 | 95.68 | 85.57 | 80.81 | 68.93 | 73.90 |
sanskrit-vedic-ud-2.6-200830 | Raw text | 100.00 | 27.18 | 89.50 | — | 81.72 | 77.12 | 87.11 | 60.79 | 49.75 | 41.65 | 44.67 |
sanskrit-vedic-ud-2.6-200830 | Gold tokenization | — | — | 90.01 | — | 83.11 | 78.58 | 87.24 | 73.34 | 61.55 | 51.87 | 54.91 |
scottish_gaelic-arcosg-ud-2.6-200830 | Raw text | 99.58 | 55.57 | 93.63 | 87.07 | 89.78 | 85.43 | 95.41 | 77.66 | 71.86 | 55.15 | 60.51 |
scottish_gaelic-arcosg-ud-2.6-200830 | Gold tokenization | — | — | 94.26 | 87.84 | 90.23 | 86.30 | 95.85 | 83.77 | 77.61 | 62.05 | 68.26 |
serbian-set-ud-2.6-200830 | Raw text | 99.99 | 93.00 | 98.98 | 95.75 | 95.92 | 95.35 | 97.82 | 93.66 | 91.18 | 83.18 | 86.80 |
serbian-set-ud-2.6-200830 | Gold tokenization | — | — | 99.01 | 95.78 | 95.94 | 95.39 | 97.83 | 94.33 | 91.82 | 83.84 | 87.45 |
slovak-snk-ud-2.6-200830 | Raw text | 100.00 | 85.28 | 97.19 | 87.79 | 92.66 | 86.71 | 96.52 | 91.71 | 89.60 | 78.75 | 84.54 |
slovak-snk-ud-2.6-200830 | Gold tokenization | — | — | 97.30 | 88.06 | 92.84 | 86.98 | 96.60 | 93.68 | 91.57 | 80.55 | 86.59 |
slovenian-ssj-ud-2.6-200830 | Raw text | 97.99 | 67.98 | 96.93 | 94.35 | 94.56 | 93.95 | 96.59 | 88.09 | 86.65 | 80.90 | 83.58 |
slovenian-ssj-ud-2.6-200830 | Gold tokenization | — | — | 98.86 | 96.44 | 96.69 | 96.01 | 98.54 | 94.41 | 92.96 | 86.68 | 89.30 |
slovenian-sst-ud-2.6-200830 | Raw text | 99.85 | 23.14 | 94.70 | 92.70 | 92.52 | 89.74 | 97.14 | 64.23 | 59.57 | 49.25 | 52.95 |
slovenian-sst-ud-2.6-200830 | Gold tokenization | — | — | 95.71 | 93.11 | 92.94 | 90.90 | 97.46 | 77.81 | 72.24 | 62.71 | 67.18 |
spanish-ancora-ud-2.6-200830 | Raw text | 99.95 | 98.32 | 99.09 | 99.02 | 98.87 | 98.33 | 99.36 | 93.62 | 91.78 | 86.82 | 88.06 |
spanish-ancora-ud-2.6-200830 | Gold tokenization | — | — | 99.14 | 99.06 | 98.91 | 98.37 | 99.40 | 93.83 | 91.97 | 87.01 | 88.24 |
spanish-gsd-ud-2.6-200830 | Raw text | 99.76 | 94.54 | 97.17 | — | 97.05 | 95.32 | 98.80 | 92.00 | 89.70 | 79.23 | 84.49 |
spanish-gsd-ud-2.6-200830 | Gold tokenization | — | — | 97.40 | — | 97.27 | 95.54 | 99.02 | 92.73 | 90.38 | 79.93 | 85.13 |
swedish-talbanken-ud-2.6-200830 | Raw text | 99.89 | 96.13 | 98.41 | 97.26 | 97.33 | 96.46 | 98.19 | 91.99 | 89.68 | 83.63 | 85.82 |
swedish-talbanken-ud-2.6-200830 | Gold tokenization | — | — | 98.51 | 97.38 | 97.44 | 96.57 | 98.30 | 92.46 | 90.14 | 84.12 | 86.33 |
swedish-lines-ud-2.6-200830 | Raw text | 99.96 | 87.20 | 97.71 | 95.47 | 90.89 | 88.10 | 97.76 | 89.14 | 85.80 | 71.44 | 81.67 |
swedish-lines-ud-2.6-200830 | Gold tokenization | — | — | 97.75 | 95.48 | 90.91 | 88.09 | 97.79 | 89.91 | 86.52 | 72.13 | 82.48 |
tamil-ttb-ud-2.6-200830 | Raw text | 94.51 | 97.52 | 88.39 | 82.92 | 85.30 | 82.11 | 89.15 | 70.28 | 64.91 | 54.93 | 58.46 |
tamil-ttb-ud-2.6-200830 | Gold tokenization | — | — | 93.36 | 87.28 | 90.10 | 86.22 | 93.97 | 78.03 | 71.79 | 61.09 | 64.80 |
telugu-mtg-ud-2.6-200830 | Raw text | 99.58 | 96.62 | 93.63 | 93.63 | 98.48 | 93.63 | — | 90.17 | 83.52 | 76.00 | 79.62 |
telugu-mtg-ud-2.6-200830 | Gold tokenization | — | — | 94.04 | 94.04 | 98.89 | 94.04 | — | 91.12 | 84.47 | 76.84 | 80.46 |
turkish-imst-ud-2.6-200830 | Raw text | 98.30 | 96.97 | 94.48 | 93.69 | 92.06 | 89.95 | 94.41 | 72.63 | 66.80 | 58.31 | 61.57 |
turkish-imst-ud-2.6-200830 | Gold tokenization | — | — | 96.10 | 95.32 | 93.66 | 91.50 | 96.00 | 76.10 | 69.93 | 60.33 | 63.83 |
ukrainian-iu-ud-2.6-200830 | Raw text | 99.81 | 96.61 | 97.89 | 94.22 | 94.18 | 93.13 | 97.39 | 90.59 | 88.24 | 78.76 | 83.19 |
ukrainian-iu-ud-2.6-200830 | Gold tokenization | — | — | 98.10 | 94.42 | 94.34 | 93.30 | 97.56 | 91.11 | 88.75 | 79.11 | 83.59 |
urdu-udtb-ud-2.6-200830 | Raw text | 100.00 | 98.31 | 94.10 | 92.27 | 82.89 | 78.41 | 97.38 | 88.27 | 82.63 | 56.79 | 74.77 |
urdu-udtb-ud-2.6-200830 | Gold tokenization | — | — | 94.08 | 92.26 | 82.92 | 78.43 | 97.39 | 88.37 | 82.74 | 56.90 | 74.92 |
uyghur-udt-ud-2.6-200830 | Raw text | 99.54 | 81.81 | 89.24 | 91.70 | 88.47 | 80.04 | 94.76 | 76.58 | 64.72 | 46.67 | 55.08 |
uyghur-udt-ud-2.6-200830 | Gold tokenization | — | — | 89.67 | 92.21 | 88.92 | 80.47 | 95.27 | 78.39 | 66.27 | 47.53 | 56.23 |
vietnamese-vtb-ud-2.6-200830 | Raw text | 85.37 | 93.46 | 78.19 | 76.69 | 85.11 | 76.53 | 85.15 | 52.80 | 47.90 | 41.56 | 44.31 |
vietnamese-vtb-ud-2.6-200830 | Gold tokenization | — | — | 90.56 | 88.69 | 99.72 | 88.47 | 99.58 | 72.63 | 65.26 | 58.85 | 62.42 |
welsh-ccg-ud-2.6-200830 | Raw text | 99.42 | 96.28 | 94.02 | 92.96 | 89.04 | 86.39 | 92.88 | 85.79 | 79.16 | 60.98 | 66.81 |
welsh-ccg-ud-2.6-200830 | Gold tokenization | — | — | 94.54 | 93.51 | 89.52 | 86.83 | 93.46 | 87.04 | 80.35 | 62.14 | 68.05 |
wolof-wtb-ud-2.6-200830 | Raw text | 99.23 | 91.95 | 94.25 | 94.12 | 93.37 | 91.19 | 95.22 | 83.79 | 78.59 | 66.50 | 70.09 |
wolof-wtb-ud-2.6-200830 | Gold tokenization | — | — | 95.19 | 95.03 | 94.22 | 92.10 | 95.97 | 85.98 | 80.75 | 68.61 | 72.08 |
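The effect of tokenization and sentence segmentation on a given model can be read off the table by subtracting its Raw text row from its Gold tokenization row. The following is a minimal sketch of that comparison; the helper names `parse_row` and `gold_minus_raw` are hypothetical, and the column order is assumed to be the one in the table header.

```python
# Column order of the score cells, assumed to match the table header.
COLUMNS = ["Words", "Sents", "UPOS", "XPOS", "UFeats", "AllTags",
           "Lemma", "UAS", "LAS", "MLAS", "BLEX"]


def parse_row(line):
    """Split one pipe-separated table row into (model, mode, scores)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    model, mode, values = cells[0], cells[1], cells[2:]
    # A dash marks a score that is not reported for this row.
    scores = {
        name: None if value in ("—", "-", "") else float(value)
        for name, value in zip(COLUMNS, values)
    }
    return model, mode, scores


def gold_minus_raw(raw_line, gold_line, metric):
    """Return the gain in `metric` when gold tokenization is used."""
    raw_model, _, raw_scores = parse_row(raw_line)
    gold_model, _, gold_scores = parse_row(gold_line)
    if raw_model != gold_model:
        raise ValueError("rows belong to different models")
    return round(gold_scores[metric] - raw_scores[metric], 2)


raw = ("latin-proiel-ud-2.6-200830 | Raw text | 99.87 | 36.81 | 96.67 | 96.81 "
       "| 90.71 | 89.59 | 96.16 | 74.44 | 69.97 | 57.51 | 64.96 |")
gold = ("latin-proiel-ud-2.6-200830 | Gold tokenization | — | — | 97.07 | 97.16 "
        "| 91.53 | 90.52 | 96.42 | 83.78 | 79.04 | 67.58 | 73.88 |")

print(gold_minus_raw(raw, gold, "LAS"))
```

For this treebank the difference in LAS is large, which goes together with the low Sents score in the Raw text row.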
The PDT-C 1.0 Model is distributed under the CC BY-NC-SA licence. The model is trained on the PDT-C 1.0 treebank using the RobeCzech model, and performs morphological analysis using the MorfFlex CZ 2.0 morphological dictionary via MorphoDiTa.
The model requires UDPipe 2.1, together with the Python packages ufal.udpipe version at least 1.3.1.1 and ufal.morphodita version at least 1.11.2.1.
The latest version 231116 of the Czech PDT-C 1.0 model can be downloaded from the LINDAT/CLARIN repository.
The model is also available in the REST service.
PDT-C 1.0 uses the PDT-C tag set from MorfFlex CZ 2.0, which is an evolution of the original PDT tag set devised by Jan Hajič (Hajič, 2004). The tags are positional with 15 positions corresponding to part of speech, detailed part of speech, gender, number, case, etc. (e.g. NNFS1-----A----). Different meanings of the same lemma are distinguished, and additional comments can be provided for every lemma meaning. The complete reference can be found in the Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release, and a quick reference is available in the PDT-C positional morphological tags overview.
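To illustrate the positional layout, the sketch below splits a tag into its 15 positions and reports the ones that are filled. Only the first five position names come from the description above. The remaining positions are referred to by number, and treating "-" as an unused position is an assumption of this sketch. The function name `decode_tag` is hypothetical.

```python
# Names of the first five positions, as listed in the text above.
# Later positions are deliberately left unnamed in this sketch.
NAMED_POSITIONS = {
    1: "part of speech",
    2: "detailed part of speech",
    3: "gender",
    4: "number",
    5: "case",
}

TAG_LENGTH = 15


def decode_tag(tag):
    """Return a dict of the filled positions of a 15-position tag."""
    if len(tag) != TAG_LENGTH:
        raise ValueError(f"expected {TAG_LENGTH} positions, got {len(tag)}")
    filled = {}
    for index, value in enumerate(tag, start=1):
        # Assumption: "-" means the position is not used for this word.
        if value != "-":
            filled[NAMED_POSITIONS.get(index, f"position {index}")] = value
    return filled


print(decode_tag("NNFS1-----A----"))
```

The meaning of each value (for example what "F" stands for in the gender position) is defined in the tag set reference and is not decoded here.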
PDT-C 1.0 employs dependency relations from the PDT analytical level, with a quick reference available in the PDT-C analytical functions and clause segmentation overview.
In the CoNLL-U format, the tags are filled in the XPOS column and the dependency relations in the DEPREL column, even if they are different from the universal dependency relations.
The PDT-C corpus consists of four datasets, but some of them do not have an official train/dev/test split. We therefore used the following split:
[...] dtest), and test (etest).
This work has been supported by the LINDAT/CLARIAH-CZ project funded by the Ministry of Education, Youth and Sports of the Czech Republic (project LM2023062).
We evaluate tagging and lemmatization on the four datasets of PDT-C 1.0, and we also compute a macro-average. For lemmatization, we use the following metrics:
- Lemmas: a primary metric comparing the lemma proper, which is the lemma with an optional lemma number (but we ignore the additional lemma comments like “this is a given name”);
- LemmasEM: an exact match comparing also the lemma comments. This metric is less than or equal to Lemmas. Our model directly predicts only the lemma proper (no additional comments), and relies on the morphological dictionary to supply the comments, so it fails to generate comments for unknown words (like an unknown given name).

We perform the evaluation using udpipe2_eval.py, which is a minor extension of the CoNLL 2018 Shared Task evaluation script.
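The difference between the two lemmatization metrics can be sketched as follows. This is not the logic of udpipe2_eval.py: it is a simplified accuracy over words that are already aligned. It also assumes that lemma comments are appended to the lemma proper after an underscore, and the example lemmas and the function names `lemma_proper` and `lemma_scores` are hypothetical.

```python
def lemma_proper(lemma):
    """Strip lemma comments, keeping the lemma and its optional number.

    Assumption: comments start at the first "_" that is not the very
    first character (so a lemma consisting only of "_" is kept).
    """
    cut = lemma.find("_", 1)
    return lemma if cut == -1 else lemma[:cut]


def lemma_scores(gold, predicted):
    """Return (Lemmas, LemmasEM) as percentages over aligned words."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned")
    proper = sum(lemma_proper(g) == lemma_proper(p)
                 for g, p in zip(gold, predicted))
    exact = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return round(100 * proper / total, 2), round(100 * exact / total, 2)


# Hypothetical data: the first prediction lacks the lemma comment,
# as would happen for a word missing from the dictionary.
gold = ["Jan-1_;Y", "stát-2", "pes"]
predicted = ["Jan-1", "stát-2", "pes"]

print(lemma_scores(gold, predicted))
```

Because an exact match implies a match of the lemma proper, the second value can never exceed the first.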
Because the model also includes a rule-based tokenizer and sentence splitter, we evaluate both on raw text and using gold tokenization:
Treebank | Mode | Tokens | Sents | XPOS | Lemma | LemmaEM |
---|---|---|---|---|---|---|
PDT | Raw text | 99.91 | 88.00 | 98.69 | 99.10 | 98.86 |
PDT | Gold tokenization | — | — | 98.78 | 99.19 | 98.96 |
PCEDT | Raw text | 99.97 | 94.06 | 98.77 | 99.36 | 98.75 |
PCEDT | Gold tokenization | — | — | 98.80 | 99.40 | 98.78 |
PDTSC | Raw text | 100.0 | 98.31 | 98.77 | 99.23 | 99.16 |
PDTSC | Gold tokenization | — | — | 98.77 | 99.23 | 99.16 |
FAUST | Raw text | 100.0 | 10.98 | 97.05 | 98.88 | 98.43 |
FAUST | Gold tokenization | — | — | 97.42 | 98.78 | 98.30 |
MacroAvg | Gold tokenization | — | — | 98.44 | 99.15 | 98.80 |
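The MacroAvg row can be reproduced from the four Gold tokenization rows above, assuming the macro-average is the unweighted mean over the datasets. The sketch below copies the scores from the table; the name `macro_average` is hypothetical.

```python
# Gold tokenization scores from the table: (XPOS, Lemma, LemmaEM).
GOLD_SCORES = {
    "PDT": (98.78, 99.19, 98.96),
    "PCEDT": (98.80, 99.40, 98.78),
    "PDTSC": (98.77, 99.23, 99.16),
    "FAUST": (97.42, 98.78, 98.30),
}


def macro_average(rows):
    """Unweighted mean of each metric over all datasets, to 2 decimals."""
    columns = zip(*rows.values())
    return tuple(round(sum(column) / len(column), 2) for column in columns)


print(macro_average(GOLD_SCORES))
```

Each dataset contributes equally regardless of its size, so the much smaller score on FAUST lowers the average as much as a drop on PDT would.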
In PDT-C 1.0, the only manually annotated dependency parsing dataset is a subset of the PDT dataset. We perform the evaluation as in the previous section.
Treebank | Mode | Tokens | Sents | XPOS | Lemma | LemmaEM | UAS | LAS |
---|---|---|---|---|---|---|---|---|
PDT subset | Raw text | 99.94 | 88.49 | 98.74 | 99.16 | 98.97 | 93.45 | 90.32 |
PDT subset | Gold tokenization | — | — | 98.81 | 99.23 | 99.03 | 94.41 | 91.48 |
EvaLatin 2020 Models are distributed under the CC BY-NC-SA licence. The models are based solely on EvaLatin 2020 treebanks, and additionally use multilingual BERT.
The models require UDPipe 2.
The latest version 200831 of the EvaLatin 2020 models can be downloaded from the LINDAT/CLARIN repository.
The models are also available in the REST service.
This work was supported by the grant no. GX20-16819X of the Grant Agency of the Czech Republic, and has been using language resources stored and distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).
The models were trained on EvaLatin 2020 treebanks.
Finally, multilingual BERT is used to provide contextualized word embeddings.
Model | Dataset | UPOS | Lemma |
---|---|---|---|
latin-evalatin20-200830 | test classical | 96.73 | 96.39 |
latin-evalatin20-200830 | test cross-genre | 90.47 | 86.89 |
latin-evalatin20-200830 | test cross-time | 87.58 | 90.59 |