
Institute of Formal and Applied Linguistics


Charles University, Czech Republic
Faculty of Mathematics and Physics


UDPipe 2 Models

  1. Universal Dependencies 2.15 Models
  2. Universal Dependencies 2.12 Models
  3. Universal Dependencies 2.10 Models
  4. Universal Dependencies 2.6 Models
  5. Czech PDT-C 1.0 Model
  6. EvaLatin 2020 Models

Universal Dependencies 2.15 Models

Universal Dependencies 2.15 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.15 treebanks, and additionally use multilingual BERT and RobeCzech.

The models require UDPipe 2.

Download

The latest version 241121 of the Universal Dependencies 2.15 models can be downloaded from the LINDAT/CLARIN repository.

The models are also available in the REST service.
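As a rough illustration of using one of these models through the REST service, the sketch below builds a request with only the Python standard library. The endpoint URL, the parameter names (`data`, `model`, `tokenizer`, `tagger`, `parser`) and the `result` key of the JSON response are assumptions based on the public LINDAT UDPipe service and are not stated on this page; the function names are hypothetical. Check the service documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public LINDAT UDPipe REST service (not given on this page).
UDPIPE_PROCESS_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text, model):
    """Build a POST request asking the service to tokenize, tag and parse `text`."""
    params = {
        "data": text,      # raw input text
        "model": model,    # a model name from the performance table
        "tokenizer": "",   # empty value: run the component with default options
        "tagger": "",
        "parser": "",
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying `data` makes urllib issue a POST request.
    return urllib.request.Request(UDPIPE_PROCESS_URL, data=body)


def process(text, model):
    """Send the request and return the CoNLL-U output (requires network access)."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        payload = json.load(response)
    # Assumed response shape: a JSON object whose "result" field holds the CoNLL-U text.
    return payload["result"]
```

Under these assumptions, `process("Dobrý den.", "czech-pdt-ud-2.15-241121")` would return the parsed sentence in CoNLL-U format.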

Acknowledgements

This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2023062 LINDAT/CLARIAH-CZ.

The models were trained on Universal Dependencies 2.15 treebanks.

For UD treebanks that do not contain the original plain text, raw text is used to train the tokenizer instead. The plain texts were taken from W2C -- Web to Corpus.

Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.

Publications

Model Description

The Universal Dependencies 2.15 models contain 147 models of 78 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
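The data preparation rule above can be sketched as follows. This is only an illustration: the text does not say whether the 10% is counted in sentences or in words, so the sketch assumes sentences, and the function names are hypothetical.

```python
def split_train_dev(train_sentences, dev_percent=10):
    """Hold out the final `dev_percent` percent of the training sentences as dev data.

    Assumption: the split is made on sentence boundaries, counted in sentences.
    """
    dev_size = len(train_sentences) * dev_percent // 100
    cut = len(train_sentences) - dev_size
    return train_sentences[:cut], train_sentences[cut:]


def count_words(sentences):
    """Count words, with each sentence given as a list of word forms."""
    return sum(len(sentence) for sentence in sentences)


def has_enough_training_data(train_sentences, min_words=1000):
    """Apply the 1000-training-word threshold mentioned above."""
    return count_words(train_sentences) >= min_words
```

For a treebank of 20 training sentences, this keeps the first 18 for training and uses the last 2 as dev data.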

The tokenizer is trained using the SpaceAfter=No features. If the features are not present in the data, they can be filled in using raw text in the language in question.
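To show what the SpaceAfter=No feature encodes, here is a minimal sketch that rebuilds the surface text of a CoNLL-U sentence from the FORM column and the MISC column, where SpaceAfter=No is stored. It is not the UDPipe tokenizer itself, only an illustration of the annotation the tokenizer learns from; the function name is hypothetical.

```python
def detokenize(conllu_sentence):
    """Rebuild the surface text of one CoNLL-U sentence from FORM and MISC."""
    pieces = []
    skip_until = 0  # last word ID covered by the current multiword token
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments carry no tokens
        columns = line.split("\t")
        token_id, form, misc = columns[0], columns[1], columns[9]
        if "." in token_id:
            continue  # empty nodes have no surface text
        if "-" in token_id:
            # Multiword token range such as "3-4": use its surface form
            # and skip the syntactic words it covers.
            skip_until = int(token_id.split("-")[1])
        elif int(token_id) <= skip_until:
            continue
        pieces.append(form)
        # A token is followed by a space unless MISC contains SpaceAfter=No.
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()
```

A sentence whose tokens `Hello` and `world` carry SpaceAfter=No before the following punctuation is rendered as `Hello, world!`.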

The tagger, lemmatizer and parser are trained using gold UD data.

Model Performance

We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py script.
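The sketch below illustrates how an F1 score of this kind can be computed for tokenization on raw text, by comparing character spans of gold and predicted tokens. It is a simplified illustration, not the conll18_ud_eval.py implementation: the real script also handles multiword tokens and scores tags and attachments on aligned words, which is not reproduced here. The function names are hypothetical.

```python
def token_spans(tokens):
    """Map tokens to (start, end) character offsets, ignoring whitespace."""
    spans = []
    offset = 0
    for token in tokens:
        spans.append((offset, offset + len(token)))
        offset += len(token)
    return spans


def f1_score(gold_items, system_items):
    """F1 as the harmonic mean of precision and recall over matching items."""
    gold, system = set(gold_items), set(system_items)
    correct = len(gold & system)
    if correct == 0:
        return 0.0
    precision = correct / len(system)  # share of predicted items that are right
    recall = correct / len(gold)       # share of gold items that were found
    return 2 * precision * recall / (precision + recall)


gold_tokens = ["New", "York", "is", "big", "."]
system_tokens = ["NewYork", "is", "big", "."]  # one tokenization error
words_f1 = f1_score(token_spans(gold_tokens), token_spans(system_tokens))
```

Here three of the four predicted tokens match three of the five gold tokens, so precision is 0.75, recall is 0.60 and F1 is about 0.667. With gold tokenization, predicted and gold tokens coincide, which is presumably why the gold-tokenization rows of the table list no Words and Sents scores.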

Model Mode Words Sents UPOS XPOS UFeats AllTags Lemma UAS LAS MLAS BLEX
afrikaans-afribooms-ud-2.15-241121Raw text99.9499.6598.6895.6798.3095.4898.3890.3887.4678.8879.86
afrikaans-afribooms-ud-2.15-241121Gold tokenization98.7495.7298.3695.5398.4290.5987.6779.1080.02
albanian-staf-ud-2.15-241121Raw text98.3992.6888.9898.3970.1668.0182.8077.4266.4034.1540.49
albanian-staf-ud-2.15-241121Gold tokenization89.52100.0070.4368.0183.6077.6966.1333.9841.75
ancient_greek-proiel-ud-2.15-241121Raw text99.9849.1997.6897.9292.0490.6994.6582.0078.2262.3966.26
ancient_greek-proiel-ud-2.15-241121Gold tokenization97.8898.0892.3891.2094.6886.7682.9768.4772.07
ancient_greek-perseus-ud-2.15-241121Raw text99.9798.8593.0886.0491.6685.2386.6880.0374.4054.4455.56
ancient_greek-perseus-ud-2.15-241121Gold tokenization93.1586.1191.6985.2986.7180.2274.5754.6155.72
ancient_greek-ptnk-ud-2.15-241121Raw text99.9755.1298.4990.2489.1895.0487.9283.9165.1272.02
ancient_greek-ptnk-ud-2.15-241121Gold tokenization98.4790.3689.2795.0991.5987.3668.2275.12
ancient_hebrew-ptnk-ud-2.15-241121Raw text76.0598.0674.1274.1773.4272.3272.2355.7854.0139.7140.12
ancient_hebrew-ptnk-ud-2.15-241121Gold tokenization97.0997.2595.4994.0792.5092.1188.6875.5572.43
arabic-padt-ud-2.15-241121Raw text94.5882.0991.7089.0489.1588.6890.3178.6674.7966.0968.16
arabic-padt-ud-2.15-241121Gold tokenization96.9994.3794.5494.0195.2688.1483.6674.8776.34
armenian-armtdp-ud-2.15-241121Raw text99.2895.7096.2591.9090.6595.1486.9082.1069.4473.94
armenian-armtdp-ud-2.15-241121Gold tokenization96.8692.5391.2295.7388.4683.5670.1374.73
armenian-bsut-ud-2.15-241121Raw text99.7998.7397.3092.1491.3696.6890.0885.5271.1178.28
armenian-bsut-ud-2.15-241121Gold tokenization97.5292.3591.5896.8990.5986.0271.6078.76
basque-bdt-ud-2.15-241121Raw text99.9799.8396.2793.3491.4396.3388.1184.9874.9579.47
basque-bdt-ud-2.15-241121Gold tokenization96.3093.3791.4596.3488.1685.0374.9979.50
belarusian-hse-ud-2.15-241121Raw text99.3786.5898.2297.6294.5193.6593.0987.0685.0976.6276.27
belarusian-hse-ud-2.15-241121Gold tokenization98.8498.2195.1794.2693.6989.6487.4378.5878.05
bulgarian-btb-ud-2.15-241121Raw text99.9194.1799.1797.2797.9596.8598.0094.4991.8085.9086.51
bulgarian-btb-ud-2.15-241121Gold tokenization99.2997.3898.0596.9698.1095.3192.5786.5587.25
catalan-ancora-ud-2.15-241121Raw text99.9499.4999.0997.1898.6996.9299.4394.7793.2287.8789.32
catalan-ancora-ud-2.15-241121Gold tokenization99.1797.2898.7697.0199.4994.9393.3788.0589.49
chinese-gsdsimp-ud-2.15-241121Raw text90.2999.1086.9287.1989.6886.4990.2172.5570.0862.7466.30
chinese-gsdsimp-ud-2.15-241121Gold tokenization95.8195.9899.3795.2599.9086.6883.5677.7281.86
chinese-gsd-ud-2.15-241121Raw text90.2799.1086.6786.9889.6586.2390.2072.2469.8162.5366.10
chinese-gsd-ud-2.15-241121Gold tokenization95.8595.9999.4295.2999.9286.8883.8477.7582.10
classical_armenian-caval-ud-2.15-241121Raw text98.8060.2897.1694.9594.1097.4082.7979.5068.8373.76
classical_armenian-caval-ud-2.15-241121Gold tokenization98.2396.0595.1398.4988.8185.3073.4078.39
classical_chinese-kyoto-ud-2.15-241121Raw text97.9446.3789.6588.8391.5986.0097.5171.1865.9562.5264.56
classical_chinese-kyoto-ud-2.15-241121Gold tokenization93.2892.0594.5489.7799.5384.5179.2475.5278.15
coptic-scriptorium-ud-2.15-241121Raw text75.4228.5773.3873.3673.3072.4574.0352.1150.2638.6741.12
coptic-scriptorium-ud-2.15-241121Gold tokenization97.0196.9297.6795.7197.2490.6987.8577.2380.75
croatian-set-ud-2.15-241121Raw text99.9394.7998.4695.7996.1995.5097.7092.3489.4881.6684.64
croatian-set-ud-2.15-241121Gold tokenization98.5295.8996.2895.5997.7692.8389.9782.0885.09
czech-pdt-ud-2.15-241121Raw text99.9393.3799.2998.4298.7898.2299.3994.9293.5590.6892.23
czech-pdt-ud-2.15-241121Gold tokenization99.3698.5198.8698.3199.4795.7094.3291.3492.90
czech-cac-ud-2.15-241121Raw text99.9999.6899.6798.3298.1397.7699.1896.1694.9590.8992.85
czech-cac-ud-2.15-241121Gold tokenization99.6898.3398.1497.7799.1996.1694.9590.9292.87
czech-cltt-ud-2.15-241121Raw text99.3296.9298.7993.9894.1093.7298.5490.8089.0981.4386.95
czech-cltt-ud-2.15-241121Gold tokenization99.2494.3794.5694.1399.0192.0990.1182.0487.76
czech-fictree-ud-2.15-241121Raw text99.9998.9599.1797.0797.8996.8899.2796.1794.7489.6292.52
czech-fictree-ud-2.15-241121Gold tokenization99.1897.0897.9096.8999.2896.2594.8089.7492.64
danish-ddt-ud-2.15-241121Raw text99.8289.8097.8399.8297.3296.4497.5288.7486.7179.5781.68
danish-ddt-ud-2.15-241121Gold tokenization98.06100.0097.5696.7197.6889.9787.9380.6582.80
dutch-alpino-ud-2.15-241121Raw text99.7589.1098.0297.0997.8296.7395.6693.5291.4685.2582.50
dutch-alpino-ud-2.15-241121Gold tokenization98.2397.2897.9896.8895.8994.9292.8686.6083.78
dutch-lassysmall-ud-2.15-241121Raw text99.8684.6197.7896.7897.5596.3496.2492.9690.8583.3681.99
dutch-lassysmall-ud-2.15-241121Gold tokenization98.0297.1097.8696.7296.3995.1892.9886.1184.68
english-ewt-ud-2.15-241121Raw text99.0187.5596.5796.2497.0695.2796.9490.9789.1482.6683.86
english-ewt-ud-2.15-241121Gold tokenization97.5097.1797.9796.2097.8793.4291.5285.1086.21
english-atis-ud-2.15-241121Raw text100.0080.0398.9198.5098.0499.6794.1992.7187.4189.83
english-atis-ud-2.15-241121Gold tokenization99.0198.5698.1299.6596.0994.4790.0192.36
english-eslspok-ud-2.15-241121Raw text99.8788.6098.5998.6898.0694.0492.6890.5392.21
english-eslspok-ud-2.15-241121Gold tokenization98.7298.8198.1996.1294.6691.9093.81
english-gum-ud-2.15-241121Raw text99.7196.0698.1498.0897.9897.1398.9093.0091.2385.8887.29
english-gum-ud-2.15-241121Gold tokenization98.3998.3698.2797.4099.1793.7491.9286.4887.90
english-lines-ud-2.15-241121Raw text99.9387.7797.6496.8797.0894.4798.3891.0788.4480.6383.67
english-lines-ud-2.15-241121Gold tokenization97.7296.9697.1694.5398.4391.8989.2281.2484.39
english-partut-ud-2.15-241121Raw text99.7299.0297.3497.2696.8295.8298.2593.8291.8684.0087.14
english-partut-ud-2.15-241121Gold tokenization97.5997.5197.0796.0798.5394.0792.1184.3187.40
erzya-jr-ud-2.15-241121Raw text99.1097.0287.7587.3378.8773.6984.0272.7862.9941.5147.05
erzya-jr-ud-2.15-241121Gold tokenization88.5287.9779.5274.2484.6473.7463.7841.8447.46
estonian-edt-ud-2.15-241121Raw text99.9491.4697.8098.4096.6595.6095.4588.9886.5081.0380.14
estonian-edt-ud-2.15-241121Gold tokenization97.9198.4596.7395.7195.5389.8587.3581.8480.91
estonian-ewt-ud-2.15-241121Raw text98.6378.0394.9596.1994.0991.8694.0483.4880.2772.4273.82
estonian-ewt-ud-2.15-241121Gold tokenization96.3097.5395.3893.1795.2887.4783.9775.3676.72
faroese-farpahc-ud-2.15-241121Raw text99.7492.7797.3893.0994.3992.3899.7486.3582.4868.5075.88
faroese-farpahc-ud-2.15-241121Gold tokenization97.5693.2894.6092.52100.0087.3283.3569.3076.97
finnish-tdt-ud-2.15-241121Raw text99.7090.8297.6198.2395.9795.0692.0790.3488.3882.1578.29
finnish-tdt-ud-2.15-241121Gold tokenization97.9298.5496.2695.4192.3491.7289.7083.1779.26
finnish-ftb-ud-2.15-241121Raw text99.9186.8496.7495.0896.7194.1195.6590.2687.6180.4281.05
finnish-ftb-ud-2.15-241121Gold tokenization97.0395.3396.8494.3895.7792.3289.6282.9083.38
french-gsd-ud-2.15-241121Raw text98.9594.6797.4798.9597.3496.7697.8393.5691.5685.3687.34
french-gsd-ud-2.15-241121Gold tokenization98.48100.0098.3997.7998.8695.0493.2387.0188.35
french-parisstories-ud-2.15-241121Raw text99.6493.3697.4599.6494.9193.3998.8480.6477.4066.2173.69
french-parisstories-ud-2.15-241121Gold tokenization97.80100.0095.2393.7199.1881.7478.5267.1374.56
french-partut-ud-2.15-241121Raw text99.4298.6497.7897.5195.2894.4497.9794.9493.2582.9788.47
french-partut-ud-2.15-241121Gold tokenization98.3198.1295.8194.9398.5495.5494.0183.5989.02
french-rhapsodie-ud-2.15-241121Raw text99.1699.8297.4599.1696.5795.5998.4787.7384.6576.4880.84
french-rhapsodie-ud-2.15-241121Gold tokenization98.33100.0097.3896.4599.2989.0285.9277.4881.52
french-sequoia-ud-2.15-241121Raw text99.1288.7798.3897.4197.0098.2493.9892.7286.7189.23
french-sequoia-ud-2.15-241121Gold tokenization99.2898.2997.8699.0995.7594.4988.5490.42
galician-treegal-ud-2.15-241121Raw text98.7487.9996.2994.3795.3793.5297.2783.2979.1868.5872.41
galician-treegal-ud-2.15-241121Gold tokenization97.5495.4796.4494.5998.4786.9382.5372.4176.46
galician-ctg-ud-2.15-241121Raw text99.2297.2297.1696.9999.0696.5598.0785.3682.8571.1375.73
galician-ctg-ud-2.15-241121Gold tokenization97.8897.7299.8497.2698.8387.0084.3672.9577.61
georgian-glc-ud-2.15-241121Raw text99.1295.8895.8995.8791.3590.9394.0383.1878.9068.7172.92
georgian-glc-ud-2.15-241121Gold tokenization96.5996.5791.9791.5494.7384.7380.2669.5273.98
german-gsd-ud-2.15-241121Raw text99.6783.6396.6797.5391.2488.7296.9187.2783.5366.1075.49
german-gsd-ud-2.15-241121Gold tokenization97.0797.9091.6889.2397.2489.2385.4667.8577.49
german-hdt-ud-2.15-241121Raw text99.9092.3998.5598.4694.1993.7997.6896.9296.0084.9490.48
german-hdt-ud-2.15-241121Gold tokenization98.6698.5994.3293.9397.7797.6196.7285.6291.18
gothic-proiel-ud-2.15-241121Raw text100.0031.1296.1396.6590.1088.0094.7178.6772.5558.5163.15
gothic-proiel-ud-2.15-241121Gold tokenization96.6897.2291.0589.2994.7786.9381.0168.6472.88
greek-gdt-ud-2.15-241121Raw text99.8790.1998.1298.1595.6895.0096.0492.9191.1381.5781.64
greek-gdt-ud-2.15-241121Gold tokenization98.2898.3095.8595.1796.1393.7091.8482.2282.28
greek-gud-ud-2.15-241121Raw text99.9294.9897.1196.2994.4290.6695.7692.9890.1576.4480.55
greek-gud-ud-2.15-241121Gold tokenization97.1596.3394.4590.6895.8293.6590.8276.9081.03
hebrew-htb-ud-2.15-241121Raw text85.1099.6983.0283.0081.4680.8082.9770.4868.0655.8159.83
hebrew-htb-ud-2.15-241121Gold tokenization97.7297.6795.9995.4397.3492.5390.0379.6982.23
hebrew-iahltknesset-ud-2.15-241121Raw text87.98100.0085.2985.2581.6380.7986.8471.3368.8856.0662.80
hebrew-iahltknesset-ud-2.15-241121Gold tokenization96.9396.9592.7091.8498.2990.0987.4072.6580.96
hebrew-iahltwiki-ud-2.15-241121Raw text88.6496.7886.1386.1281.9380.9587.4175.8974.0158.2967.00
hebrew-iahltwiki-ud-2.15-241121Gold tokenization97.1497.1592.4791.4798.3693.6691.3875.8485.70
hindi-hdtb-ud-2.15-241121Raw text100.0098.7297.5997.1994.2192.2698.9295.3092.3979.4687.81
hindi-hdtb-ud-2.15-241121Gold tokenization97.5997.1894.2392.2798.9295.4192.5079.5887.94
hungarian-szeged-ud-2.15-241121Raw text99.8595.8996.7694.2993.5894.9188.2484.6674.8978.03
hungarian-szeged-ud-2.15-241121Gold tokenization96.8494.4393.6695.0288.7085.0875.2078.33
icelandic-modern-ud-2.15-241121Raw text99.4494.5997.7495.2989.2986.4697.0985.6982.7364.7775.00
icelandic-modern-ud-2.15-241121Gold tokenization98.2495.8589.6986.8797.6186.5583.5465.4975.94
icelandic-gc-ud-2.15-241121Raw text99.7294.6494.7282.0385.0079.7191.8283.4179.0358.4669.15
icelandic-gc-ud-2.15-241121Gold tokenization95.0082.5185.5080.2291.9884.1779.7759.0269.68
icelandic-icepahc-ud-2.15-241121Raw text99.8292.6996.8993.3392.0987.2496.3787.0883.2566.8574.53
icelandic-icepahc-ud-2.15-241121Gold tokenization97.0593.5692.2287.4596.5187.5883.7067.3175.07
indonesian-gsd-ud-2.15-241121Raw text99.4993.0494.3093.8695.5688.7998.0887.8481.8672.6877.23
indonesian-gsd-ud-2.15-241121Gold tokenization94.7894.2596.0089.1898.4988.6682.5973.4378.01
indonesian-csui-ud-2.15-241121Raw text99.4591.0195.9696.1196.8195.3698.1786.5182.2076.6678.83
indonesian-csui-ud-2.15-241121Gold tokenization96.4896.6397.3295.8598.8187.9383.4277.6179.92
irish-idt-ud-2.15-241121Raw text99.8897.5895.9394.9990.7287.6295.7787.1681.6465.3072.34
irish-idt-ud-2.15-241121Gold tokenization96.0495.1490.8387.7695.8987.5081.9765.4472.54
irish-twittirish-ud-2.15-241121Raw text98.5046.6290.6390.6388.2878.9872.6058.8557.12
irish-twittirish-ud-2.15-241121Gold tokenization91.8491.8489.5485.8079.2666.7564.25
italian-isdt-ud-2.15-241121Raw text99.7499.0798.5198.4098.0697.6798.5894.6592.9586.6887.71
italian-isdt-ud-2.15-241121Gold tokenization98.7598.6698.3097.9398.8495.0893.3987.0888.14
italian-markit-ud-2.15-241121Raw text99.6298.2496.9897.0794.1692.5388.3488.4884.7070.6078.16
italian-markit-ud-2.15-241121Gold tokenization97.3597.4194.4292.7888.6689.2785.4871.2078.90
italian-old-ud-2.15-241121Raw text99.0897.7696.3086.8191.8783.2496.4985.3780.9364.3772.68
italian-old-ud-2.15-241121Gold tokenization97.1587.2792.7183.8297.3588.2083.5067.1675.55
italian-parlamint-ud-2.15-241121Raw text99.4294.1298.6498.0597.9697.0298.7091.9489.9884.4586.25
italian-parlamint-ud-2.15-241121Gold tokenization99.2298.5998.4897.5099.2093.4091.4386.0887.84
italian-partut-ud-2.15-241121Raw text99.73100.0098.4398.4398.1697.5898.6095.7493.7987.1788.62
italian-partut-ud-2.15-241121Gold tokenization98.6098.6098.3097.7298.7995.8093.7687.0988.54
italian-postwita-ud-2.15-241121Raw text99.3649.5396.6196.3996.1594.8796.4083.0579.2068.9970.63
italian-postwita-ud-2.15-241121Gold tokenization97.2096.9596.6595.4096.9688.0683.7875.3076.72
italian-twittiro-ud-2.15-241121Raw text98.9446.6795.8495.6194.7093.1294.3082.9578.2865.6966.30
italian-twittiro-ud-2.15-241121Gold tokenization96.7196.3095.6793.8595.2388.3483.5071.8372.07
italian-vit-ud-2.15-241121Raw text99.7595.0698.1497.2997.6496.1498.8592.2089.3181.2583.96
italian-vit-ud-2.15-241121Gold tokenization98.3997.6897.8596.5199.0993.0390.1182.0684.81
japanese-gsdluw-ud-2.15-241121Raw text95.1899.7293.9193.6695.1893.5993.6586.3085.6976.6476.59
japanese-gsdluw-ud-2.15-241121Gold tokenization98.4298.1599.9998.0297.8595.2394.2986.5885.22
japanese-gsd-ud-2.15-241121Raw text96.17100.0095.0294.2896.1694.0195.1188.0787.3781.1281.42
japanese-gsd-ud-2.15-241121Gold tokenization98.5997.6299.9897.3298.5595.1494.2489.2589.12
korean-kaist-ud-2.15-241121Raw text100.00100.0096.2687.6587.4594.4689.0987.2083.1380.96
korean-kaist-ud-2.15-241121Gold tokenization96.2687.6587.4594.4689.0987.2083.1380.96
korean-gsd-ud-2.15-241121Raw text99.8793.9396.5090.6399.6888.3293.8788.2584.6581.6977.71
korean-gsd-ud-2.15-241121Gold tokenization96.6790.8499.8188.5294.0188.8885.2482.3278.31
korean-ksl-ud-2.15-241121Raw text100.0099.2296.7589.6387.8395.1589.8386.3881.5880.07
korean-ksl-ud-2.15-241121Gold tokenization96.7489.6487.8395.1589.9286.4781.6680.15
kyrgyz-ktmu-ud-2.15-241121Raw text99.1698.0390.8190.3677.0972.5088.5883.5572.5953.1262.75
kyrgyz-ktmu-ud-2.15-241121Gold tokenization91.5691.1177.7873.1789.3384.4773.4353.6263.27
latin-ittb-ud-2.15-241121Raw text99.9891.7999.1196.6397.1995.8099.1789.4887.5481.4284.91
latin-ittb-ud-2.15-241121Gold tokenization99.1496.6797.2495.8499.2190.4588.5282.0585.50
latin-llct-ud-2.15-241121Raw text99.9999.4999.7397.0997.1396.8397.7995.3594.3888.9990.31
latin-llct-ud-2.15-241121Gold tokenization99.7397.0997.1396.8497.8095.3694.3988.9990.31
latin-perseus-ud-2.15-241121Raw text98.2399.0991.4680.0383.3676.5987.6076.9370.0952.2858.28
latin-perseus-ud-2.15-241121Gold tokenization93.1881.5584.9278.0489.2178.0971.1252.6159.23
latin-proiel-ud-2.15-241121Raw text99.8537.4096.5296.5890.7189.4296.0876.5772.3659.0064.80
latin-proiel-ud-2.15-241121Gold tokenization97.0697.1091.5590.4196.3083.8879.6267.9473.42
latin-udante-ud-2.15-241121Raw text99.6098.4591.1675.6184.5372.5787.6176.7469.6448.8852.99
latin-udante-ud-2.15-241121Gold tokenization91.4275.7384.7872.6587.8776.9369.8148.9153.07
latvian-lvtb-ud-2.15-241121Raw text99.2798.0997.1591.7495.1891.3296.7689.4586.5678.8081.97
latvian-lvtb-ud-2.15-241121Gold tokenization97.8392.4295.8991.9997.4290.5387.6079.9683.09
lithuanian-alksnis-ud-2.15-241121Raw text99.9187.8796.0390.4991.2289.7293.5982.9179.3568.7471.87
lithuanian-alksnis-ud-2.15-241121Gold tokenization96.1590.5991.3189.8393.6884.1780.5969.6672.80
lithuanian-hse-ud-2.15-241121Raw text97.3097.3090.3189.9382.2078.7588.3571.6762.3545.1853.97
lithuanian-hse-ud-2.15-241121Gold tokenization92.1791.8984.0680.3890.8575.2864.9146.7255.70
low_saxon-lsdc-ud-2.15-241121Raw text99.2590.2389.9671.8469.1483.8974.3165.2137.0248.45
low_saxon-lsdc-ud-2.15-241121Gold tokenization90.5972.4369.6484.3975.4866.2537.1348.74
maghrebi_arabic_french-arabizi-ud-2.15-241121Raw text91.657.0078.9072.0683.0370.3751.4357.8549.9836.3724.60
maghrebi_arabic_french-arabizi-ud-2.15-241121Gold tokenization86.5578.6690.6477.3354.8976.1465.6347.2931.71
maltese-mudt-ud-2.15-241121Raw text99.8486.2995.6495.5595.2484.6179.5467.9171.94
maltese-mudt-ud-2.15-241121Gold tokenization95.7595.6895.3485.3280.1968.4172.44
manx-cadhan-ud-2.15-241121Raw text97.3698.2594.0995.8493.3293.3487.6084.1477.7477.73
manx-cadhan-ud-2.15-241121Gold tokenization96.6898.4395.8595.8892.5789.1282.9081.70
marathi-ufal-ud-2.15-241121Raw text94.1692.6382.7375.1871.5384.1866.6760.3440.0047.84
marathi-ufal-ud-2.15-241121Gold tokenization87.1478.6474.5187.1472.3365.2943.7151.13
naija-nsc-ud-2.15-241121Raw text99.97100.0098.1298.9497.5999.3993.1090.5587.5189.19
naija-nsc-ud-2.15-241121Gold tokenization98.1598.9597.6099.4293.1390.5887.5289.21
north_sami-giella-ud-2.15-241121Raw text99.8798.7991.6493.4289.1684.9587.0775.7470.9260.1458.71
north_sami-giella-ud-2.15-241121Gold tokenization91.7893.5789.3085.0887.1975.9971.1860.3058.92
norwegian-bokmaal-ud-2.15-241121Raw text99.8297.2798.3998.9597.4896.8298.6294.0092.7887.1889.00
norwegian-bokmaal-ud-2.15-241121Gold tokenization98.5999.1397.6596.9998.8294.6793.4387.7989.65
norwegian-nynorsk-ud-2.15-241121Raw text99.9394.5498.3699.0697.2996.4598.4093.9492.4785.9088.09
norwegian-nynorsk-ud-2.15-241121Gold tokenization98.5599.2097.4696.6898.5594.6993.2486.8589.04
old_church_slavonic-proiel-ud-2.15-241121Raw text100.0040.0596.2396.4889.7888.0190.2178.0173.6460.7562.07
old_church_slavonic-proiel-ud-2.15-241121Gold tokenization96.6896.9790.4288.9990.2985.1380.5668.4569.16
old_east_slavic-torot-ud-2.15-241121Raw text100.0034.5395.3495.4189.7087.4788.4277.0272.4258.6558.48
old_east_slavic-torot-ud-2.15-241121Gold tokenization95.8795.9190.6388.7188.4885.6080.7768.2866.57
old_east_slavic-birchbark-ud-2.15-241121Raw text99.9916.6688.7499.3575.3671.1865.2165.1958.3432.6627.06
old_east_slavic-birchbark-ud-2.15-241121Gold tokenization88.8899.3575.9971.8165.2876.8969.9140.3732.66
old_east_slavic-rnc-ud-2.15-241121Raw text99.7794.5697.5791.4989.5182.0590.6376.9173.2254.5657.05
old_east_slavic-rnc-ud-2.15-241121Gold tokenization97.8291.6989.7982.3290.8979.5375.6956.3158.99
old_east_slavic-ruthenian-ud-2.15-241121Raw text99.8799.6196.1789.3587.7380.6182.8978.0974.3053.7749.03
old_east_slavic-ruthenian-ud-2.15-241121Gold tokenization96.2389.7787.7881.0282.9678.1674.3553.7349.06
old_french-profiterole-ud-2.15-241121Raw text99.82100.0097.1597.0597.5495.6399.7991.0487.4780.0584.53
old_french-profiterole-ud-2.15-241121Gold tokenization97.3397.2497.7295.8299.9791.2987.7280.3184.80
ottoman_turkish-boun-ud-2.15-241121Raw text99.4187.9687.3290.5180.8772.9082.1961.5851.2632.8336.22
ottoman_turkish-boun-ud-2.15-241121Gold tokenization87.7790.9781.2473.2182.5262.3451.8233.1036.63
persian-perdt-ud-2.15-241121Raw text99.6699.8397.4597.4097.6595.6598.9693.5891.3986.2388.74
persian-perdt-ud-2.15-241121Gold tokenization97.7597.7097.9595.9499.2894.0991.8786.7789.30
persian-seraji-ud-2.15-241121Raw text99.6598.7597.9597.9297.9597.4798.2791.6688.8584.4284.51
persian-seraji-ud-2.15-241121Gold tokenization98.2798.2398.2797.7698.5492.3789.5285.0085.11
polish-pdb-ud-2.15-241121Raw text99.8697.0098.9996.0796.0595.3598.1294.4092.5785.8488.86
polish-pdb-ud-2.15-241121Gold tokenization99.1196.2396.2295.5198.2394.9593.1086.3289.30
polish-lfg-ud-2.15-241121Raw text99.8599.6599.0196.1896.6895.2298.1796.9195.6290.0492.40
polish-lfg-ud-2.15-241121Gold tokenization99.1896.3596.8695.3998.3197.2996.0090.4492.73
pomak-philotis-ud-2.15-241121Raw text99.7989.4295.4288.8587.8891.3788.3081.8063.7367.56
pomak-philotis-ud-2.15-241121Gold tokenization95.5488.9887.9991.4989.2482.6564.3468.30
portuguese-bosque-ud-2.15-241121Raw text99.6889.7397.7896.9295.8598.3692.3189.9880.6984.65
portuguese-bosque-ud-2.15-241121Gold tokenization98.1197.1996.1298.6593.4691.0881.7885.74
portuguese-cintil-ud-2.15-241121Raw text99.4178.6697.4496.0495.3393.2397.4985.3082.2872.3375.94
portuguese-cintil-ud-2.15-241121Gold tokenization98.0496.6595.9393.8198.0687.6484.5174.4778.14
portuguese-dantestocks-ud-2.15-241121Raw text96.4738.2794.2396.4593.8492.8893.6685.3883.0875.3475.93
portuguese-dantestocks-ud-2.15-241121Gold tokenization97.7199.9897.3596.3595.9393.0490.5684.2382.70
portuguese-gsd-ud-2.15-241121Raw text99.2986.2597.4989.6494.6089.1897.1492.7590.8080.1785.14
portuguese-gsd-ud-2.15-241121Gold tokenization98.2791.7696.0791.2797.9894.2592.3582.7686.84
portuguese-petrogold-ud-2.15-241121Raw text99.5993.1198.7998.6998.2199.1294.6993.5388.5390.01
portuguese-petrogold-ud-2.15-241121Gold tokenization99.1098.9698.4799.5495.6194.3789.4491.03
portuguese-porttinari-ud-2.15-241121Raw text94.6828.0593.9093.4493.1094.1785.8584.3178.6281.30
portuguese-porttinari-ud-2.15-241121Gold tokenization99.2098.7298.3699.4596.4895.2390.1991.95
romanian-rrt-ud-2.15-241121Raw text99.7095.5097.8397.1997.4196.9197.9991.9088.4481.8883.42
romanian-rrt-ud-2.15-241121Gold tokenization98.1197.4397.6797.1698.2592.6489.1182.3283.92
romanian-nonstandard-ud-2.15-241121Raw text98.8396.7796.1691.9490.5889.2494.8689.0684.9968.5376.59
romanian-nonstandard-ud-2.15-241121Gold tokenization97.2992.9791.5690.1995.9490.7686.6270.0477.84
romanian-simonero-ud-2.15-241121Raw text99.84100.0098.4697.9497.5597.2298.8894.0192.0985.4288.34
romanian-simonero-ud-2.15-241121Gold tokenization98.6298.0997.7097.3799.0494.3692.4185.6988.61
russian-syntagrus-ud-2.15-241121Raw text99.6798.3198.4894.0193.7698.1893.8091.6782.7688.83
russian-syntagrus-ud-2.15-241121Gold tokenization98.8194.3494.0798.4694.5192.3483.3289.36
russian-gsd-ud-2.15-241121Raw text99.5096.4998.0497.5294.5593.4096.9191.5988.6280.9984.54
russian-gsd-ud-2.15-241121Gold tokenization98.5197.9494.9793.7897.2992.8389.7481.8485.48
russian-poetry-ud-2.15-241121Raw text99.5995.9697.8694.4393.8997.0189.1086.1477.1380.68
russian-poetry-ud-2.15-241121Gold tokenization98.2494.7794.2397.3690.0487.0777.8481.36
russian-taiga-ud-2.15-241121Raw text98.0786.0195.5593.1292.1294.7783.2779.8671.2174.38
russian-taiga-ud-2.15-241121Gold tokenization97.2794.9193.8396.4785.9782.3373.6276.76
sanskrit-vedic-ud-2.15-241121Raw text100.0029.2193.5689.1985.3493.4365.3556.8349.0152.10
sanskrit-vedic-ud-2.15-241121Gold tokenization93.9590.4786.8393.5778.2369.0860.9064.49
scottish_gaelic-arcosg-ud-2.15-241121Raw text97.4261.2693.8389.6591.0988.5295.1380.8676.4165.2770.06
scottish_gaelic-arcosg-ud-2.15-241121Gold tokenization96.6192.6394.0191.5797.7186.9782.5871.9576.15
serbian-set-ud-2.15-241121Raw text99.9993.0099.0995.9296.1095.6997.8093.6091.1883.5486.95
serbian-set-ud-2.15-241121Gold tokenization99.1195.9796.1495.7397.7994.3291.8884.2987.68
slovak-snk-ud-2.15-241121Raw text100.0081.6997.6990.1293.4089.3496.5491.5889.9480.3784.70
slovak-snk-ud-2.15-241121Gold tokenization97.8390.3593.4889.5696.5793.9992.3082.6887.13
slovenian-ssj-ud-2.15-241121Raw text99.9498.9598.7897.0197.1296.5798.5994.3792.7887.2289.16
slovenian-ssj-ud-2.15-241121Gold tokenization98.8497.0797.1796.6398.6494.5192.9187.3789.28
slovenian-sst-ud-2.15-241121Raw text99.8795.4798.4596.9097.0196.1998.8384.8282.1273.8277.42
slovenian-sst-ud-2.15-241121Gold tokenization98.5997.0097.0896.2898.9785.3382.6374.2377.91
spanish-ancora-ud-2.15-241121Raw text99.9598.6999.0696.2298.8095.8399.4793.8092.1587.1188.69
spanish-ancora-ud-2.15-241121Gold tokenization99.1196.2698.8595.8699.5194.0092.3587.3088.85
spanish-gsd-ud-2.15-241121Raw text99.7393.8497.1096.8695.1598.6192.5290.3678.9484.50
spanish-gsd-ud-2.15-241121Gold tokenization97.3597.1295.3898.8693.4291.1879.7185.29
swedish-talbanken-ud-2.15-241121Raw text99.8496.5398.4197.2297.2196.1998.6292.7390.3484.1887.11
swedish-talbanken-ud-2.15-241121Gold tokenization98.5997.4197.3996.4098.7893.1290.7284.6887.54
swedish-lines-ud-2.15-241121Raw text99.9688.5097.7095.4292.9689.9097.7891.1387.9775.3082.66
swedish-lines-ud-2.15-241121Gold tokenization97.7395.4993.0489.9697.8291.8988.6776.0283.39
tamil-ttb-ud-2.15-241121Raw text94.2697.5284.1982.2784.2977.7189.3570.7362.2350.6355.60
tamil-ttb-ud-2.15-241121Gold tokenization89.0487.0389.4982.2094.3278.3368.9856.6361.77
telugu-mtg-ud-2.15-241121Raw text99.5896.6293.6393.6398.4893.3590.0383.2476.0079.24
telugu-mtg-ud-2.15-241121Gold tokenization94.0494.0498.8993.7690.9884.0576.6479.89
turkish-boun-ud-2.15-241121Raw text96.5786.2589.9685.9680.9271.6290.6072.9366.9849.0961.03
turkish-boun-ud-2.15-241121Gold tokenization93.0088.8982.9073.3293.6280.4573.8152.5966.63
turkish-atis-ud-2.15-241121Raw text99.9079.2898.4297.9797.8298.9689.3187.5684.8586.15
turkish-atis-ud-2.15-241121Gold tokenization98.5398.1197.9499.0791.6389.7286.9888.39
turkish-framenet-ud-2.15-241121Raw text99.9099.2796.8394.7993.9796.6393.3684.3674.3777.84
turkish-framenet-ud-2.15-241121Gold tokenization96.9394.8994.0796.7393.5284.5374.5077.97
turkish-imst-ud-2.15-241121Raw text97.3197.3892.7392.5389.4486.5993.5576.4869.2958.1763.97
turkish-imst-ud-2.15-241121Gold tokenization95.2094.8691.6588.6595.8881.3973.7660.6667.22
turkish-kenet-ud-2.15-241121Raw text100.0098.1293.7891.9090.8093.5284.1571.5362.3165.22
turkish-kenet-ud-2.15-241121Gold tokenization93.8091.9190.8293.5184.2871.6162.4065.30
turkish-penn-ud-2.15-241121Raw text99.2782.8995.6894.4693.4194.2884.7272.2162.6765.14
turkish-penn-ud-2.15-241121Gold tokenization96.3995.1094.0694.9586.9174.0963.7766.34
turkish-tourism-ud-2.15-241121Raw text99.99100.0098.7994.9894.5798.2897.1491.4981.6687.10
turkish-tourism-ud-2.15-241121Gold tokenization98.8094.9994.5998.3097.1591.5081.6887.12
turkish_german-sagt-ud-2.15-241121Raw text98.9199.4490.2180.1975.4090.7670.9260.6840.9950.56
turkish_german-sagt-ud-2.15-241121Gold tokenization91.1180.8075.9291.4772.3761.7941.5551.31
ukrainian-iu-ud-2.15-241121Raw text99.8196.2398.0294.2994.5193.2697.6390.7288.3779.3183.57
ukrainian-iu-ud-2.15-241121Gold tokenization98.2394.5094.7093.4597.8291.3288.9479.6984.00
ukrainian-parlamint-ud-2.15-241121Raw text99.8899.6298.3498.5494.9193.9598.8993.3690.7181.6587.38
ukrainian-parlamint-ud-2.15-241121Gold tokenization98.4798.6495.0094.0499.0193.5390.8981.7187.50
urdu-udtb-ud-2.15-241121Raw text100.0098.3194.1892.3182.8778.6197.3588.0082.8857.4475.16
urdu-udtb-ud-2.15-241121Gold tokenization94.1792.2982.8578.5897.3788.1182.9657.4375.25
uyghur-udt-ud-2.15-241121Raw text99.5481.8789.7491.7987.9980.6794.7475.5964.7050.0457.43
uyghur-udt-ud-2.15-241121Gold tokenization90.2192.3488.4481.1495.2377.3566.3451.1458.63
vietnamese-vtb-ud-2.15-241121Raw text86.0693.7378.4777.5177.3485.7756.6249.6241.0545.25
vietnamese-vtb-ud-2.15-241121Gold tokenization89.8388.7588.5499.5176.2265.8555.2961.23
welsh-ccg-ud-2.15-241121Raw text99.5697.7995.6394.6389.8787.5994.6987.5281.5763.8470.76
welsh-ccg-ud-2.15-241121Gold tokenization96.0395.0090.2687.9595.1188.5482.5464.7771.75
western_armenian-armtdp-ud-2.15-241121Raw text99.8998.6896.9192.7191.9797.1089.3784.8770.3076.52
western_armenian-armtdp-ud-2.15-241121Gold tokenization96.9892.8092.0497.2089.6485.1170.5476.76
wolof-wtb-ud-2.15-241121Raw text99.2391.9594.0793.9993.5391.3495.1584.0478.7666.9070.21
wolof-wtb-ud-2.15-241121Gold tokenization95.0894.9694.3392.2495.9386.1980.8669.0672.20

Universal Dependencies 2.12 Models

Universal Dependencies 2.12 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.12 treebanks, and additionally use multilingual BERT and RobeCzech.

The models require UDPipe 2.

Download

The latest version 230717 of the Universal Dependencies 2.12 models can be downloaded from the LINDAT/CLARIN repository.

The models are also available in the REST service.

Acknowledgements

This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.

The models were trained on Universal Dependencies 2.12 treebanks.

For UD treebanks that do not contain the original plain text, raw text is used to train the tokenizer instead. The plain texts were taken from W2C -- Web to Corpus.

Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.

Publications

Model Description

The Universal Dependencies 2.12 models contain 131 models of 72 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.

The tokenizer is trained using the SpaceAfter=No features. If the features are not present in the data, they can be filled in using raw text in the language in question.

The tagger, lemmatizer and parser are trained using gold UD data.

Model Performance

We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py script.

Model Mode Words Sents UPOS XPOS UFeats AllTags Lemma UAS LAS MLAS BLEX
afrikaans-afribooms-ud-2.12-230717Raw text99.9499.6598.6595.8098.3595.6298.2990.4187.4678.9279.72
afrikaans-afribooms-ud-2.12-230717Gold tokenization98.7195.8598.4195.6798.3390.5887.6379.0879.85
ancient_greek-proiel-ud-2.12-230717Raw text99.9849.1997.6397.9392.1390.7294.7681.7477.9261.9666.22
ancient_greek-proiel-ud-2.12-230717Gold tokenization97.8798.1492.5891.3094.8286.5982.7268.3172.13
ancient_greek-perseus-ud-2.12-230717Raw text99.9798.8592.9885.7691.4284.8986.7280.1374.5554.4655.76
ancient_greek-perseus-ud-2.12-230717Gold tokenization93.0285.7891.4484.9186.7480.3074.7154.5655.87
ancient_hebrew-ptnk-ud-2.12-230717Raw text71.1498.0669.0369.1767.0966.1367.4748.2146.7531.1432.42
ancient_hebrew-ptnk-ud-2.12-230717Gold tokenization96.7096.8690.3689.2296.6091.2987.6168.2676.48
arabic-padt-ud-2.12-230717Raw text94.5882.0991.6789.0089.1288.6590.3478.8774.8366.0468.16
arabic-padt-ud-2.12-230717Gold tokenization96.9694.3494.5094.0195.2088.2483.6574.7376.32
armenian-armtdp-ud-2.12-230717Raw text99.2895.7096.2991.2290.2194.9386.5581.7368.6573.67
armenian-armtdp-ud-2.12-230717Gold tokenization96.8691.8690.7195.5488.2183.3069.3074.60
armenian-bsut-ud-2.12-230717Raw text99.7998.7397.3891.9691.2096.7990.0685.8371.1079.05
armenian-bsut-ud-2.12-230717Gold tokenization97.6092.1691.4196.9990.6286.3871.6579.57
basque-bdt-ud-2.12-230717Raw text99.9499.8396.3393.3291.4496.3887.8184.8074.9379.49
basque-bdt-ud-2.12-230717Gold tokenization96.3993.3791.4896.4187.8884.8674.9779.52
belarusian-hse-ud-2.12-230717Raw text99.3786.5898.2197.5994.5293.5793.2587.3285.4176.9876.66
belarusian-hse-ud-2.12-230717Gold tokenization98.8198.1695.1094.1393.8289.7987.6078.8178.42
bulgarian-btb-ud-2.12-230717Raw text99.9194.1799.2097.2597.9696.8798.0194.4191.7185.9286.61
bulgarian-btb-ud-2.12-230717Gold tokenization99.3397.3798.0996.9998.0995.2292.4886.6187.35
catalan-ancora-ud-2.12-230717Raw text99.9599.0899.1197.2598.7296.9899.4094.9293.4388.0689.51
catalan-ancora-ud-2.12-230717Gold tokenization99.1797.3698.7997.0999.4695.0893.5988.2789.70
chinese-gsdsimp-ud-2.12-230717Raw text90.2999.1087.2387.1589.7086.4790.2372.7470.2863.3866.93
chinese-gsdsimp-ud-2.12-230717Gold tokenization96.1096.0099.4395.3399.9387.0883.9778.2382.55
chinese-gsd-ud-2.12-230717Raw text90.2799.1087.1887.0989.6886.4290.2072.5770.1463.0266.73
chinese-gsd-ud-2.12-230717Gold tokenization96.1696.0599.4195.3899.9287.1884.0778.0782.55
classical_chinese-kyoto-ud-2.12-230717Raw text97.9446.3789.6888.9491.5986.0697.5171.3966.1362.3664.70
classical_chinese-kyoto-ud-2.12-230717Gold tokenization93.4392.1494.6389.8999.5284.7179.4075.5578.24
coptic-scriptorium-ud-2.12-230717Raw text75.2633.6373.0272.9273.1572.0873.9051.8250.0537.3439.84
coptic-scriptorium-ud-2.12-230717Gold tokenization97.0596.9197.8195.9097.4190.4687.7076.4179.79
croatian-set-ud-2.12-230717Raw text99.9394.7998.4995.8296.3195.5397.7892.4189.5781.8984.77
croatian-set-ud-2.12-230717Gold tokenization98.5795.9096.3995.6297.8392.9190.0782.3485.21
czech-pdt-ud-2.12-230717Raw text99.9393.3799.3098.4698.7898.2599.3895.0193.6490.7592.30
czech-pdt-ud-2.12-230717Gold tokenization99.3898.5598.8798.3499.4695.8194.4391.4192.97
czech-cac-ud-2.12-230717Raw text99.9999.6899.6998.2598.0797.7499.3096.2094.9590.8493.11
czech-cac-ud-2.12-230717Gold tokenization99.7098.2698.0897.7499.3196.2094.9590.8693.13
czech-cltt-ud-2.12-230717Raw text99.3296.9298.9593.7293.9993.6598.4892.4790.7782.8788.60
czech-cltt-ud-2.12-230717Gold tokenization99.4094.1894.4994.0799.0393.1791.2082.9288.86
czech-fictree-ud-2.12-230717Raw text99.9998.9599.1497.0797.9396.8899.2996.1794.6789.4792.40
czech-fictree-ud-2.12-230717Gold tokenization99.1697.0897.9596.8999.3096.2494.7389.5792.50
danish-ddt-ud-2.12-230717Raw text99.8289.8097.9897.3696.6097.3888.6086.4679.1081.13
danish-ddt-ud-2.12-230717Gold tokenization98.1797.5996.8397.5289.8287.6480.1882.15
dutch-alpino-ud-2.12-230717Raw text99.7589.1097.7796.6697.6896.2894.9192.9090.5283.7180.13
dutch-alpino-ud-2.12-230717Gold tokenization98.0296.8297.8996.4595.1694.3191.9085.0181.32
dutch-lassysmall-ud-2.12-230717Raw text99.7774.7197.3396.1596.8695.5096.1890.9988.4980.7779.51
dutch-lassysmall-ud-2.12-230717Gold tokenization97.6796.8497.4996.2996.5194.6391.9185.2584.07
english-ewt-ud-2.12-230717Raw text99.0987.8296.6996.3296.7095.0197.2690.7188.8182.2684.29
english-ewt-ud-2.12-230717Gold tokenization97.5097.1797.5695.8498.0893.0391.0484.7086.64
english-atis-ud-2.12-230717Raw text99.9880.4999.0498.5998.1499.6394.4593.0287.9790.18
english-atis-ud-2.12-230717Gold tokenization99.0798.6598.2199.6496.1794.5990.2292.42
english-eslspok-ud-2.12-230717Raw text100.0092.2198.5998.5498.1095.3793.8791.2693.01
english-eslspok-ud-2.12-230717Gold tokenization98.6898.6898.1996.2094.7991.9893.74
english-gum-ud-2.12-230717Raw text99.6395.7298.0098.0497.9197.0298.8492.8790.9685.3587.10
english-gum-ud-2.12-230717Gold tokenization98.3498.3998.2797.3699.1593.7291.7986.1187.79
english-lines-ud-2.12-230717Raw text99.9287.4597.6896.8697.0394.4898.3391.3288.5580.6183.70
english-lines-ud-2.12-230717Gold tokenization97.7896.9597.1394.5898.3992.1889.3881.3684.51
english-partut-ud-2.12-230717Raw text99.72100.0097.4397.2996.4495.4198.1794.2492.3383.2887.38
english-partut-ud-2.12-230717Gold tokenization97.6897.5496.6895.6998.4494.4292.5283.7187.64
erzya-jr-ud-2.12-230717Raw text99.1894.1587.9487.3878.9073.4984.8972.9263.2441.3248.34
erzya-jr-ud-2.12-230717Gold tokenization88.6688.0779.5474.0285.5274.0864.2441.9048.91
estonian-edt-ud-2.12-230717Raw text99.9492.2397.6798.2196.4295.2695.4588.5685.9680.0379.59
estonian-edt-ud-2.12-230717Gold tokenization97.7998.2696.5095.3895.5289.4386.8180.8180.32
estonian-ewt-ud-2.12-230717Raw text98.6378.0394.8896.1494.1091.8193.7583.2380.0172.1573.37
estonian-ewt-ud-2.12-230717Gold tokenization96.2297.4695.3593.0795.0087.2783.7174.9576.20
faroese-farpahc-ud-2.12-230717Raw text99.7492.7797.4593.0094.2492.3299.7486.0182.2968.1275.32
faroese-farpahc-ud-2.12-230717Gold tokenization97.6793.1794.4992.53100.0086.9683.2069.2076.51
finnish-tdt-ud-2.12-230717Raw text99.7090.8297.6798.3196.0895.2392.1190.4288.4882.4378.28
finnish-tdt-ud-2.12-230717Gold tokenization98.0098.5996.3995.5492.3991.7289.7583.4479.26
finnish-ftb-ud-2.12-230717Raw text99.9186.8496.7095.0896.7894.0195.7690.1787.3580.1380.78
finnish-ftb-ud-2.12-230717Gold tokenization97.0895.2996.8794.3595.9092.4089.5382.7183.28
french-gsd-ud-2.12-230717Raw text98.8494.9397.3397.2596.5797.7293.1591.2084.7086.92
french-gsd-ud-2.12-230717Gold tokenization98.4898.3297.6598.8594.9493.1186.5888.24
french-parisstories-ud-2.12-230717Raw text99.7393.0897.2093.0791.2098.0279.9476.6662.3271.73
french-parisstories-ud-2.12-230717Gold tokenization97.4893.3091.4598.2681.1777.8563.2072.63
french-partut-ud-2.12-230717Raw text99.4298.6497.4396.9795.2894.5197.8994.4492.8382.5887.48
french-partut-ud-2.12-230717Gold tokenization98.1297.6295.8995.1298.5095.3593.8983.7388.49
french-rhapsodie-ud-2.12-230717Raw text99.1699.8297.3197.3796.1693.3898.1988.0684.9275.5180.31
french-rhapsodie-ud-2.12-230717Gold tokenization98.1998.1197.0294.1599.0089.3386.1476.4980.94
french-sequoia-ud-2.12-230717Raw text99.1589.5398.4097.1996.8498.3094.0692.7586.3989.39
french-sequoia-ud-2.12-230717Gold tokenization99.2598.0197.6399.1495.6394.3788.2390.53
galician-treegal-ud-2.12-230717Raw text98.7487.9995.9393.6394.8392.8396.7683.4679.6067.9571.94
galician-treegal-ud-2.12-230717Gold tokenization97.1994.7295.8993.8897.9086.9982.7871.4375.90
galician-ctg-ud-2.12-230717Raw text99.2297.2297.2497.0799.0596.6598.1285.1482.7271.0775.57
galician-ctg-ud-2.12-230717Gold tokenization97.9797.7999.8397.3598.8686.8684.3173.0377.52
german-gsd-ud-2.12-230717Raw text99.7682.6896.1697.5390.7888.1596.9187.0483.2065.3975.45
german-gsd-ud-2.12-230717Gold tokenization96.4797.8091.2188.6397.1888.7984.9966.9777.23
german-hdt-ud-2.12-230717Raw text99.9092.3998.5598.4694.2193.8197.6996.9096.0084.8790.50
german-hdt-ud-2.12-230717Gold tokenization98.6698.5994.3493.9597.7997.6096.7185.5491.20
gothic-proiel-ud-2.12-230717Raw text100.0031.1296.1796.7089.8887.8094.7178.7272.7458.8163.43
gothic-proiel-ud-2.12-230717Gold tokenization96.8597.2291.0189.2794.7886.6680.8168.6372.61
greek-gdt-ud-2.12-230717Raw text99.8790.1998.1998.2195.7295.1096.0992.9991.1881.6781.75
greek-gdt-ud-2.12-230717Gold tokenization98.3298.3495.8395.2096.1793.8091.9182.2582.38
greek-gud-ud-2.12-230717Raw text99.9294.9897.0196.2694.2490.5595.7692.9490.0675.9480.42
greek-gud-ud-2.12-230717Gold tokenization97.1196.3294.3290.6595.8393.5990.6876.4480.91
hebrew-htb-ud-2.12-230717Raw text85.1099.6982.9682.9581.3080.6983.0270.7168.2155.8460.10
hebrew-htb-ud-2.12-230717Gold tokenization97.6497.6295.7995.2797.3692.4589.9479.2882.12
hebrew-iahltwiki-ud-2.12-230717Raw text88.5497.1685.9785.9781.4580.4687.1576.1174.2657.9967.30
hebrew-iahltwiki-ud-2.12-230717Gold tokenization97.0997.0992.1891.1098.2993.6691.2675.1485.53
hindi-hdtb-ud-2.12-230717Raw text100.0098.7297.7497.3594.2392.3998.9395.3192.4379.6487.78
hindi-hdtb-ud-2.12-230717Gold tokenization97.7497.3494.2592.4098.9495.4392.5579.7787.94
hungarian-szeged-ud-2.12-230717Raw text99.8595.8996.6894.1893.4794.8988.5684.8974.9678.33
hungarian-szeged-ud-2.12-230717Gold tokenization96.7794.3193.5795.0188.9985.3075.2278.65
icelandic-modern-ud-2.12-230717Raw text99.3794.5997.5895.3488.4985.6296.8786.0583.3064.6475.54
icelandic-modern-ud-2.12-230717Gold tokenization98.1595.9288.9386.0797.4587.0384.1765.3076.47
icelandic-gc-ud-2.12-230717Raw text99.7294.6494.7282.2885.0179.8391.6483.2278.7858.5668.85
icelandic-gc-ud-2.12-230717Gold tokenization95.0682.7185.5280.3491.8184.1479.6659.2169.49
icelandic-icepahc-ud-2.12-230717Raw text99.8092.6796.9093.3192.0187.1396.2487.3083.4666.9174.57
icelandic-icepahc-ud-2.12-230717Gold tokenization97.0893.5592.1887.3596.3987.8583.9567.4075.14
indonesian-gsd-ud-2.12-230717Raw text99.4992.3594.3594.0395.7789.1098.1287.6281.7172.4376.99
indonesian-gsd-ud-2.12-230717Gold tokenization94.7994.4196.1789.4298.5288.4182.4373.1877.77
indonesian-csui-ud-2.12-230717Raw text99.4591.0195.8896.0796.6695.3398.1185.9581.6376.1878.21
indonesian-csui-ud-2.12-230717Gold tokenization96.3496.5897.1595.7898.7487.2682.7177.1479.20
irish-idt-ud-2.12-230717Raw text99.8897.5896.0494.9090.8487.6995.8586.4880.9564.5371.46
irish-idt-ud-2.12-230717Gold tokenization96.1395.0890.9787.8595.9786.7581.1964.5671.53
irish-twittirish-ud-2.12-230717Raw text98.5046.6290.5890.5888.4178.5872.3458.3856.97
irish-twittirish-ud-2.12-230717Gold tokenization91.8091.8089.5785.7579.3166.7264.16
italian-isdt-ud-2.12-230717Raw text99.7499.0798.4498.3898.1497.6498.6894.7393.0586.7988.06
italian-isdt-ud-2.12-230717Gold tokenization98.7198.6498.3997.8998.9595.1493.4787.1988.54
italian-markit-ud-2.12-230717Raw text99.6298.2496.9697.1394.1292.6088.1888.6084.7270.6477.87
italian-markit-ud-2.12-230717Gold tokenization97.3597.5294.3992.8888.5089.3985.5171.2578.65
italian-parlamint-ud-2.12-230717Raw text99.4294.1298.5997.9697.9597.0598.6391.9389.9784.2086.02
italian-parlamint-ud-2.12-230717Gold tokenization99.1798.5298.5097.5899.1693.4191.4485.8087.60
italian-partut-ud-2.12-230717Raw text99.73100.0098.4198.4198.1997.6498.5796.1594.1587.8488.90
italian-partut-ud-2.12-230717Gold tokenization98.5298.5298.2797.7298.8296.1894.0987.6888.80
italian-postwita-ud-2.12-230717Raw text99.3649.5396.5896.3396.3394.8096.6282.8079.0368.8170.55
italian-postwita-ud-2.12-230717Gold tokenization97.1796.9596.9095.4497.2587.9683.7975.1976.95
italian-twittiro-ud-2.12-230717Raw text98.9446.6795.9195.7494.9493.4294.5783.0978.6566.3066.97
italian-twittiro-ud-2.12-230717Gold tokenization96.8496.5795.8794.1595.4388.3483.4371.8272.62
italian-vit-ud-2.12-230717Raw text99.7494.8798.1197.2997.6596.1598.8792.2289.4381.2884.07
italian-vit-ud-2.12-230717Gold tokenization98.3597.6597.8796.5199.1092.9490.1281.9984.78
japanese-gsdluw-ud-2.12-230717Raw text95.1899.7293.8293.5095.1893.4493.5686.2785.5876.2676.41
japanese-gsdluw-ud-2.12-230717Gold tokenization98.3698.01100.0097.9097.7895.1694.1986.2884.89
japanese-gsd-ud-2.12-230717Raw text96.17100.0094.9794.1896.1693.8595.0387.9187.0780.8080.98
japanese-gsd-ud-2.12-230717Gold tokenization98.5997.5299.9997.2098.4794.9393.9488.8088.47
korean-kaist-ud-2.12-230717Raw text100.00100.0096.1987.7887.5894.1888.8586.9282.7780.35
korean-kaist-ud-2.12-230717Gold tokenization96.1987.7887.5894.1888.8586.9282.7780.35
korean-gsd-ud-2.12-230717Raw text99.8793.9396.5490.0799.6787.9493.6287.8883.9880.6876.82
korean-gsd-ud-2.12-230717Gold tokenization96.7290.2499.7988.1293.7488.6984.7681.4977.58
latin-ittb-ud-2.12-230717Raw text99.9991.2199.0196.6597.0795.6299.1690.2588.3182.5385.95
latin-ittb-ud-2.12-230717Gold tokenization99.0396.6697.1295.6499.1891.2889.3583.1786.55
latin-llct-ud-2.12-230717Raw text99.9999.4999.7597.1497.1596.8797.7695.3794.3789.0690.39
latin-llct-ud-2.12-230717Gold tokenization99.7597.1497.1696.8797.7795.3994.3989.0790.41
latin-perseus-ud-2.12-230717Raw text99.9598.9992.8881.1184.6077.4588.8678.9271.7853.5059.27
latin-perseus-ud-2.12-230717Gold tokenization92.9581.1484.6577.4988.8979.0871.9153.5959.31
latin-proiel-ud-2.12-230717Raw text99.8537.4096.6096.6990.6689.4096.1976.6672.4659.5064.96
latin-proiel-ud-2.12-230717Gold tokenization97.0297.1291.4390.3196.4283.8879.5567.8873.35
latin-udante-ud-2.12-230717Raw text99.6198.8190.9475.5084.1472.2386.9775.8868.6347.6351.46
latin-udante-ud-2.12-230717Gold tokenization91.1875.5684.3872.2987.2075.9568.6547.7351.44
latvian-lvtb-ud-2.12-230717Raw text99.2997.8096.7990.9294.7590.1696.5788.7985.8577.5681.04
latvian-lvtb-ud-2.12-230717Gold tokenization97.4491.5695.4490.7797.2189.9586.9478.7482.16
lithuanian-alksnis-ud-2.12-230717Raw text99.9187.8795.9590.3191.0989.5093.4582.7478.9468.0371.20
lithuanian-alksnis-ud-2.12-230717Gold tokenization96.0890.4491.2789.6493.5683.9480.0868.9372.06
lithuanian-hse-ud-2.12-230717Raw text97.3097.3089.9390.0381.9279.0388.1671.8562.6344.4853.82
lithuanian-hse-ud-2.12-230717Gold tokenization91.3291.4283.4080.0990.7575.0065.0045.8755.84
maghrebi_arabic_french-arabizi-ud-2.12-230717Raw text91.657.0078.8171.5982.6569.8150.6357.9049.9336.2224.59
maghrebi_arabic_french-arabizi-ud-2.12-230717Gold tokenization86.6978.8990.8277.8854.6676.3265.8647.7231.57
maltese-mudt-ud-2.12-230717Raw text99.8486.2995.7395.7995.3185.0880.2568.8873.05
maltese-mudt-ud-2.12-230717Gold tokenization95.8795.9295.4685.7080.8169.2873.43
manx-cadhan-ud-2.12-230717Raw text97.3698.2594.1895.7893.3793.4387.4284.0277.7578.03
manx-cadhan-ud-2.12-230717Gold tokenization96.7798.3995.9395.9892.4789.1383.1682.07
marathi-ufal-ud-2.12-230717Raw text94.1692.6382.7374.2170.8084.9168.1360.8339.7547.62
marathi-ufal-ud-2.12-230717Gold tokenization87.1476.9473.0687.8673.7965.2941.8250.10
naija-nsc-ud-2.12-230717Raw text99.95100.0098.0498.9297.5399.3393.0290.4687.4888.98
naija-nsc-ud-2.12-230717Gold tokenization98.0898.9697.5699.3993.1090.5387.5589.03
north_sami-giella-ud-2.12-230717Raw text99.8798.7991.6393.5189.2985.2486.9175.7870.8560.1658.42
north_sami-giella-ud-2.12-230717Gold tokenization91.7893.6589.4285.3887.0176.0171.0860.3858.60
norwegian-bokmaal-ud-2.12-230717Raw text99.8297.2798.3898.9497.4796.8198.5893.8892.6486.9588.78
norwegian-bokmaal-ud-2.12-230717Gold tokenization98.5899.1497.6597.0198.7894.5493.2887.5489.41
norwegian-nynorsk-ud-2.12-230717Raw text99.9394.5498.4399.1697.3596.5598.4693.7992.3785.8488.08
norwegian-nynorsk-ud-2.12-230717Gold tokenization98.6299.2797.5196.7598.6094.5893.1986.8289.08
old_church_slavonic-proiel-ud-2.12-230717Raw text100.0040.0596.1196.3989.7288.0190.1878.0673.6460.9661.97
old_church_slavonic-proiel-ud-2.12-230717Gold tokenization96.6696.9890.3288.9990.1985.0580.5368.5969.15
old_east_slavic-torot-ud-2.12-230717Raw text100.0034.5395.4095.4889.9287.6488.0976.9472.1258.2357.89
old_east_slavic-torot-ud-2.12-230717Gold tokenization95.8995.9490.6788.7288.1485.2780.3867.9965.94
old_east_slavic-birchbark-ud-2.12-230717Raw text99.9916.6688.5099.3776.0972.0365.5864.3957.6632.8427.49
old_east_slavic-birchbark-ud-2.12-230717Gold tokenization89.1399.3876.7072.8465.5976.4369.1941.1532.90
old_east_slavic-rnc-ud-2.12-230717Raw text97.6460.4892.6586.4678.1068.7677.2264.5860.0336.9037.52
old_east_slavic-rnc-ud-2.12-230717Gold tokenization93.9988.7479.2469.7878.2870.7965.1739.6340.04
old_french-srcmf-ud-2.12-230717Raw text99.70100.0096.7496.5897.7695.7899.6690.9287.1780.6584.10
old_french-srcmf-ud-2.12-230717Gold tokenization97.0696.9198.0796.0899.9691.3487.6281.1184.56
persian-perdt-ud-2.12-230717Raw text99.6699.8397.5197.3997.6495.6398.9193.5691.3286.3088.68
persian-perdt-ud-2.12-230717Gold tokenization97.8197.6897.9495.9299.2394.0891.8486.8689.27
persian-seraji-ud-2.12-230717Raw text99.6598.7597.8997.9197.9097.4498.2691.7888.9284.4784.56
persian-seraji-ud-2.12-230717Gold tokenization98.2298.2398.2297.7498.5492.4789.6285.1285.22
polish-pdb-ud-2.12-230717Raw text99.8597.3398.8695.7695.8995.0898.0894.2192.1685.1488.25
polish-pdb-ud-2.12-230717Gold tokenization99.0295.9296.0495.2298.2194.7292.6785.5588.68
polish-lfg-ud-2.12-230717Raw text99.8599.6598.9796.0896.4995.1598.2496.7895.4089.6192.26
polish-lfg-ud-2.12-230717Gold tokenization99.1396.2696.6795.3298.3897.1995.8190.0292.60
pomak-philotis-ud-2.12-230717Raw text99.9894.4998.8095.5495.2696.7188.1783.0470.6073.82
pomak-philotis-ud-2.12-230717Gold tokenization98.8295.5495.2696.7388.6383.5471.0974.23
portuguese-bosque-ud-2.12-230717Raw text99.6889.7397.8896.8695.8798.2792.1389.8380.7584.19
portuguese-bosque-ud-2.12-230717Gold tokenization98.1697.1196.1198.5793.2890.9481.7885.38
portuguese-cintil-ud-2.12-230717Raw text99.4178.6697.4296.0195.2993.2197.6685.1381.8871.7775.51
portuguese-cintil-ud-2.12-230717Gold tokenization98.0096.6195.9193.8098.2387.5484.2074.0677.83
portuguese-petrogold-ud-2.12-230717Raw text99.5993.1198.7598.7098.2099.0994.6993.5788.7790.11
portuguese-petrogold-ud-2.12-230717Gold tokenization99.0599.0098.4699.5195.6294.4289.6891.18
romanian-rrt-ud-2.12-230717Raw text99.7095.5097.8897.1497.3996.9198.0091.9288.4681.9383.37
romanian-rrt-ud-2.12-230717Gold tokenization98.1597.4097.6597.1698.2592.7789.2582.5683.97
romanian-nonstandard-ud-2.12-230717Raw text98.8396.7796.1291.8690.5289.1694.8288.6784.6768.2076.17
romanian-nonstandard-ud-2.12-230717Gold tokenization97.2592.8791.5190.1395.8890.4686.4169.7977.51
romanian-simonero-ud-2.12-230717Raw text99.84100.0098.4197.9797.5197.2098.8993.9592.0385.3088.19
romanian-simonero-ud-2.12-230717Gold tokenization98.5698.1297.6697.3499.0494.2992.3585.5788.47
russian-syntagrus-ud-2.12-230717Raw text99.6798.3198.5094.0293.7698.1993.8091.6682.8088.87
russian-syntagrus-ud-2.12-230717Gold tokenization98.8394.3394.0798.4894.5192.3383.3389.40
russian-gsd-ud-2.12-230717Raw text99.5096.4998.0597.5694.5793.5196.8791.6488.7080.9584.53
russian-gsd-ud-2.12-230717Gold tokenization98.5097.9795.0193.9097.2592.8389.8181.8585.46
russian-taiga-ud-2.12-230717Raw text98.0786.0195.5993.0292.0594.6282.8979.4770.4773.61
russian-taiga-ud-2.12-230717Gold tokenization97.2594.8093.6796.3485.5381.9172.7076.02
sanskrit-vedic-ud-2.12-230717Raw text100.0027.1889.2081.1976.4087.1161.0550.0241.4244.77
sanskrit-vedic-ud-2.12-230717Gold tokenization90.0782.7078.1787.4073.6961.5551.5654.91
scottish_gaelic-arcosg-ud-2.12-230717Raw text97.4361.2693.6689.5091.0788.3494.9280.7976.3364.7869.63
scottish_gaelic-arcosg-ud-2.12-230717Gold tokenization96.4092.4994.0491.4197.4787.2582.8471.7576.18
serbian-set-ud-2.12-230717Raw text99.9993.0099.0395.9496.1395.7197.8293.4890.9983.4586.89
serbian-set-ud-2.12-230717Gold tokenization99.0595.9596.1695.7497.8394.1891.6584.1787.56
slovak-snk-ud-2.12-230717Raw text100.0081.6997.6890.2693.3589.4296.5191.4889.7380.1984.62
slovak-snk-ud-2.12-230717Gold tokenization97.8390.3493.4489.5696.5493.8892.0082.4286.89
slovenian-ssj-ud-2.12-230717Raw text99.9498.9598.9697.0997.2696.7898.5794.2392.8687.3789.22
slovenian-ssj-ud-2.12-230717Gold tokenization99.0297.1597.3396.8498.6194.4093.0287.5289.34
slovenian-sst-ud-2.12-230717Raw text99.9724.7495.8293.3793.5891.6497.6966.7661.9852.0455.84
slovenian-sst-ud-2.12-230717Gold tokenization96.1093.4993.7991.9197.7378.5273.2063.8368.34
spanish-ancora-ud-2.12-230717Raw text99.9598.7899.0696.1298.7695.7199.3993.6891.9286.7988.29
spanish-ancora-ud-2.12-230717Gold tokenization99.1196.1698.8195.7599.4393.8792.1086.9588.45
spanish-gsd-ud-2.12-230717Raw text99.7294.9097.1096.7495.0798.5892.5190.2878.7684.27
spanish-gsd-ud-2.12-230717Gold tokenization97.3697.0195.3298.8393.3891.1079.5885.17
swedish-talbanken-ud-2.12-230717Raw text99.8496.5398.3797.2397.3196.4398.1792.0889.7283.6985.90
swedish-talbanken-ud-2.12-230717Gold tokenization98.5397.4297.4996.6198.3392.5390.1584.2086.44
swedish-lines-ud-2.12-230717Raw text99.9688.0097.6195.4990.9388.1897.8290.4887.1471.6681.94
swedish-lines-ud-2.12-230717Gold tokenization97.7195.5090.9688.2397.8691.1987.7872.2582.63
tamil-ttb-ud-2.12-230717Raw text94.2697.5284.2982.9284.0977.7688.7970.6361.9850.5354.69
tamil-ttb-ud-2.12-230717Gold tokenization89.1487.1389.1981.8093.8778.3868.8856.4860.89
telugu-mtg-ud-2.12-230717Raw text99.5896.6292.9492.9498.6192.9490.7284.0776.1980.19
telugu-mtg-ud-2.12-230717Gold tokenization93.4893.4899.0393.4891.6885.0277.2281.03
turkish-boun-ud-2.12-230717Raw text96.5786.2590.0386.0581.1271.6990.4773.3267.1149.0960.77
turkish-boun-ud-2.12-230717Gold tokenization92.9788.9183.0773.3593.5280.7973.9152.7866.50
turkish-atis-ud-2.12-230717Raw text100.0080.2099.0298.5798.4099.1189.0387.2484.9185.71
turkish-atis-ud-2.12-230717Gold tokenization99.0498.5798.4299.1190.9689.0986.8387.69
turkish-framenet-ud-2.12-230717Raw text100.00100.0096.5294.7593.8796.3993.7384.8074.1277.96
turkish-framenet-ud-2.12-230717Gold tokenization96.5294.7593.8796.3993.7384.8074.1277.96
turkish-imst-ud-2.12-230717Raw text97.9497.7093.7093.4690.6388.1394.4675.2266.2855.6960.80
turkish-imst-ud-2.12-230717Gold tokenization95.4695.2792.4189.7596.3278.9469.4257.6262.96
turkish-kenet-ud-2.12-230717Raw text100.0098.1293.8092.1090.8593.5083.9471.5162.3265.35
turkish-kenet-ud-2.12-230717Gold tokenization93.8392.1290.8993.5184.1071.6362.4665.49
turkish-penn-ud-2.12-230717Raw text99.3480.5995.5094.4893.2994.1484.4071.7362.0864.36
turkish-penn-ud-2.12-230717Gold tokenization96.1495.1293.9494.7186.8473.8963.4665.77
turkish-tourism-ud-2.12-230717Raw text99.9699.8698.9294.9894.6798.2797.0491.4381.5887.09
turkish-tourism-ud-2.12-230717Gold tokenization98.9695.0294.7298.3197.1091.5081.6687.18
turkish_german-sagt-ud-2.12-230717Raw text98.9199.4490.2180.2475.4590.8071.4261.2241.1650.92
turkish_german-sagt-ud-2.12-230717Gold tokenization91.0980.8275.9391.4972.7662.2041.6351.55
ukrainian-iu-ud-2.12-230717Raw text99.8196.2397.8494.2894.2593.1697.4790.3787.9478.3082.74
ukrainian-iu-ud-2.12-230717Gold tokenization98.0394.4494.3993.3197.6790.9788.5278.7083.18
urdu-udtb-ud-2.12-230717Raw text100.0098.3194.0992.2082.7678.4397.4188.0282.6857.2574.71
urdu-udtb-ud-2.12-230717Gold tokenization94.0692.1982.7678.4197.4188.1382.8157.3074.88
uyghur-udt-ud-2.12-230717Raw text99.5481.8789.7791.7288.2380.8294.7175.3264.4450.0457.14
uyghur-udt-ud-2.12-230717Gold tokenization90.2392.2188.6581.2795.2277.0566.0250.9558.33
vietnamese-vtb-ud-2.12-230717Raw text86.0693.7378.6177.6177.5085.7656.8650.0241.4045.54
vietnamese-vtb-ud-2.12-230717Gold tokenization90.0288.8888.7399.5076.3165.9055.3661.24
welsh-ccg-ud-2.12-230717Raw text99.4697.6895.2794.2889.5787.2594.4386.7380.8163.0969.84
welsh-ccg-ud-2.12-230717Gold tokenization95.7494.7390.0387.6694.9487.8381.8564.0470.94
western_armenian-armtdp-ud-2.12-230717Raw text99.8998.6896.6792.3191.6397.1489.2684.6869.8076.24
western_armenian-armtdp-ud-2.12-230717Gold tokenization96.7592.4091.7197.2389.5184.9169.9976.44
wolof-wtb-ud-2.12-230717Raw text99.2391.9594.1694.1793.5691.4895.1883.7578.6166.5569.91
wolof-wtb-ud-2.12-230717Gold tokenization95.1195.0894.3492.3395.8885.9080.5768.6771.71

Universal Dependencies 2.10 Models

Universal Dependencies 2.10 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.10 treebanks, and additionally use multilingual BERT and RobeCzech.

The models require UDPipe 2.

Download

The latest version 220711 of the Universal Dependencies 2.10 models can be downloaded from the LINDAT/CLARIN repository.

The models are also available in the REST service.
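A request to the REST service might look like the sketch below. The endpoint URL, the parameter names (`data`, `model`, `tokenizer`, `tagger`, `parser`) and the `result` field of the JSON response are assumptions based on the public UDPipe REST API and should be checked against the service documentation; the function names are hypothetical. The model name is taken from the performance table of this release.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public UDPipe REST service; verify before use.
API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text, model):
    """Build a POST request asking for tokenization, tagging and parsing."""
    params = {
        "data": text,
        "model": model,
        "tokenizer": "",  # empty value: run the component with default options
        "tagger": "",
        "parser": "",
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(API_URL, data=body)


def process(text, model):
    """Send the request and return the CoNLL-U output (assumed 'result' field)."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


request = build_request("Hello world.", "english-ewt-ud-2.10-220711")
print(request.get_method(), request.full_url)
```

Calling `process("Hello world.", "english-ewt-ud-2.10-220711")` would then perform the network request; it is left out of the runnable part so that the example works offline.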

Acknowledgements

This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.

The models were trained on Universal Dependencies 2.10 treebanks.

For UD treebanks that do not contain the original plain-text version, raw text is used to train the tokenizer instead. The plain texts were taken from the W2C -- Web to Corpus.

Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.

Publications

Model Description

The Universal Dependencies 2.10 models contain 123 models of 69 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.

The tokenizer is trained using the SpaceAfter=No features. If the features are not present in the data, they can be filled in using raw text in the language in question.

The tagger, lemmatizer and parser are trained using gold UD data.

Model Performance

We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py script.

Model Mode Words Sents UPOS XPOS UFeats AllTags Lemma UAS LAS MLAS BLEX
afrikaans-afribooms-ud-2.10-220711 Raw text 99.78 98.59 98.58 95.46 98.13 95.33 97.43 90.10 87.23 78.64 78.59
afrikaans-afribooms-ud-2.10-220711 Gold tokenization 98.77 95.62 98.31 95.50 97.53 90.72 87.80 79.23 78.99
ancient_greek-perseus-ud-2.10-220711 Raw text 99.97 98.85 92.83 85.55 91.45 84.87 86.68 80.13 74.36 54.62 55.72
ancient_greek-perseus-ud-2.10-220711 Gold tokenization 92.88 85.60 91.47 84.90 86.70 80.32 74.53 54.73 55.87
ancient_greek-proiel-ud-2.10-220711 Raw text 100.00 48.02 97.77 98.05 92.35 91.05 94.71 79.82 76.06 60.08 65.75
ancient_greek-proiel-ud-2.10-220711 Gold tokenization 97.87 98.14 92.49 91.26 94.73 86.05 82.14 67.03 71.90
ancient_hebrew-ptnk-ud-2.10-220711 Raw text 68.76 98.06 56.80 56.94 55.13 50.80 49.88 38.73 34.67 18.47 17.69
ancient_hebrew-ptnk-ud-2.10-220711 Gold tokenization 68.03 67.97 66.97 56.15 53.35 63.31 51.61 28.08 24.34
arabic-padt-ud-2.10-220711 Raw text 94.58 82.09 91.72 89.01 89.14 88.69 90.41 78.63 74.54 65.84 67.88
arabic-padt-ud-2.10-220711 Gold tokenization 97.02 94.38 94.53 94.08 95.31 88.11 83.49 74.57 76.13
armenian-armtdp-ud-2.10-220711 Raw text 99.28 95.70 96.07 91.39 90.28 95.04 86.84 82.22 69.53 74.39
armenian-armtdp-ud-2.10-220711 Gold tokenization 96.63 92.03 90.77 95.70 88.50 83.81 70.18 75.42
armenian-bsut-ud-2.10-220711 Raw text 99.79 98.73 97.31 92.01 91.24 96.62 90.02 85.75 71.20 78.86
armenian-bsut-ud-2.10-220711 Gold tokenization 97.53 92.24 91.48 96.82 90.56 86.29 71.73 79.32
basque-bdt-ud-2.10-220711 Raw text 99.94 99.83 96.25 92.69 90.69 96.36 87.40 84.28 73.94 79.81
basque-bdt-ud-2.10-220711 Gold tokenization 96.30 92.73 90.72 96.39 87.48 84.36 73.99 79.86
belarusian-hse-ud-2.10-220711 Raw text 99.47 83.97 98.30 96.26 94.38 92.37 93.35 86.84 84.85 76.00 76.01
belarusian-hse-ud-2.10-220711 Gold tokenization 98.81 96.74 94.94 92.87 93.86 89.55 87.38 78.23 78.11
bulgarian-btb-ud-2.10-220711 Raw text 99.91 94.17 99.19 97.20 97.97 96.85 97.99 94.41 91.67 85.89 86.31
bulgarian-btb-ud-2.10-220711 Gold tokenization 99.29 97.30 98.07 96.96 98.09 95.24 92.44 86.52 87.03
catalan-ancora-ud-2.10-220711 Raw text 99.95 99.08 99.07 97.21 98.70 96.96 99.40 94.86 93.14 87.45 88.92
catalan-ancora-ud-2.10-220711 Gold tokenization 99.14 97.32 98.78 97.07 99.46 95.02 93.30 87.69 89.13
chinese-gsdsimp-ud-2.10-220711 Raw text 90.29 99.10 87.21 87.16 89.74 86.42 90.29 73.11 70.62 63.58 67.09
chinese-gsdsimp-ud-2.10-220711 Gold tokenization 96.14 96.04 99.45 95.30 99.99 87.28 84.07 78.56 82.64
chinese-gsd-ud-2.10-220711 Raw text 90.27 99.10 87.15 87.05 89.71 86.36 90.27 72.85 70.29 63.41 66.89
chinese-gsd-ud-2.10-220711 Gold tokenization 96.21 96.08 99.40 95.34 99.99 87.15 83.96 78.41 82.59
classical_chinese-kyoto-ud-2.10-220711 Raw text 97.26 40.71 87.40 86.48 89.97 83.27 96.78 67.56 62.17 58.02 60.60
classical_chinese-kyoto-ud-2.10-220711 Gold tokenization 92.30 90.87 93.94 88.19 99.47 83.16 77.63 73.15 76.42
coptic-scriptorium-ud-2.10-220711 Raw text 74.49 33.87 72.43 72.34 72.53 71.54 72.91 51.25 49.43 36.55 39.14
coptic-scriptorium-ud-2.10-220711 Gold tokenization 96.94 96.78 97.49 95.50 97.02 90.48 87.70 76.04 79.57
croatian-set-ud-2.10-220711 Raw text 99.93 94.79 98.48 95.72 96.23 95.49 97.60 92.17 89.27 81.53 84.23
croatian-set-ud-2.10-220711 Gold tokenization 98.54 95.80 96.30 95.56 97.68 92.67 89.75 81.92 84.69
czech-pdt-ud-2.10-220711 Raw text 99.94 93.74 99.37 98.40 98.33 98.02 99.21 94.90 93.50 90.28 91.88
czech-pdt-ud-2.10-220711 Gold tokenization 99.45 98.47 98.40 98.09 99.28 95.63 94.23 90.88 92.50
czech-cac-ud-2.10-220711 Raw text 99.99 99.68 99.72 98.57 98.37 98.12 99.18 96.12 94.76 91.09 92.67
czech-cac-ud-2.10-220711 Gold tokenization 99.73 98.58 98.38 98.13 99.19 96.12 94.76 91.11 92.69
czech-cltt-ud-2.10-220711 Raw text 99.71 97.79 99.22 95.32 95.23 95.03 99.18 90.77 89.24 81.35 86.22
czech-cltt-ud-2.10-220711 Gold tokenization 99.47 95.47 95.40 95.18 99.47 91.20 89.68 81.67 86.73
czech-fictree-ud-2.10-220711 Raw text 99.99 98.95 99.17 97.06 97.83 96.86 99.35 96.38 94.91 89.61 92.81
czech-fictree-ud-2.10-220711 Gold tokenization 99.18 97.08 97.84 96.88 99.36 96.46 94.97 89.71 92.91
danish-ddt-ud-2.10-220711 Raw text 99.81 89.78 97.95 97.29 96.54 97.26 88.27 86.25 79.22 80.96
danish-ddt-ud-2.10-220711 Gold tokenization 98.16 97.53 96.79 97.45 89.46 87.42 80.42 82.17
dutch-alpino-ud-2.10-220711 Raw text 99.83 88.98 97.86 96.79 97.80 96.29 95.11 92.95 90.58 83.15 79.88
dutch-alpino-ud-2.10-220711 Gold tokenization 97.97 96.87 97.91 96.41 95.26 94.00 91.63 84.17 80.83
dutch-lassysmall-ud-2.10-220711 Raw text 99.80 74.93 96.98 95.62 96.63 94.88 95.70 90.61 87.94 79.67 78.25
dutch-lassysmall-ud-2.10-220711 Gold tokenization 97.25 96.43 97.36 95.83 95.97 94.51 91.66 84.48 83.08
english-ewt-ud-2.10-220711 Raw text 98.95 87.02 96.39 96.13 96.53 94.80 97.13 90.07 88.10 81.47 83.42
english-ewt-ud-2.10-220711 Gold tokenization 97.35 97.06 97.52 95.71 98.07 92.62 90.56 84.02 85.98
english-atis-ud-2.10-220711 Raw text 100.00 81.96 98.97 98.54 98.13 99.94 94.39 92.92 87.85 90.39
english-atis-ud-2.10-220711 Gold tokenization 98.97 98.56 98.15 99.94 95.88 94.26 89.80 92.40
english-gum-ud-2.10-220711 Raw text 99.64 95.36 97.95 97.91 97.88 96.91 98.77 92.30 90.35 84.63 86.31
english-gum-ud-2.10-220711 Gold tokenization 98.27 98.26 98.22 97.24 99.09 93.17 91.19 85.42 87.04
english-lines-ud-2.10-220711 Raw text 99.92 87.45 97.71 96.77 97.02 94.41 98.40 91.17 88.22 80.27 83.45
english-lines-ud-2.10-220711 Gold tokenization 97.79 96.84 97.07 94.48 98.47 92.10 89.17 81.05 84.36
english-partut-ud-2.10-220711 Raw text 99.72 100.00 97.23 97.11 96.35 95.26 98.14 94.24 92.21 83.35 87.34
english-partut-ud-2.10-220711 Gold tokenization 97.48 97.36 96.60 95.51 98.42 94.48 92.46 83.74 87.62
estonian-edt-ud-2.10-220711 Raw text 99.95 92.03 97.68 98.31 96.28 95.07 95.36 88.81 86.16 79.92 79.56
estonian-edt-ud-2.10-220711 Gold tokenization 97.81 98.36 96.36 95.19 95.43 89.71 87.03 80.77 80.37
estonian-ewt-ud-2.10-220711 Raw text 98.82 75.26 95.41 96.29 94.06 91.92 93.86 82.62 79.30 71.40 72.35
estonian-ewt-ud-2.10-220711 Gold tokenization 96.65 97.43 95.15 93.10 94.97 86.76 83.25 74.79 75.57
faroese-farpahc-ud-2.10-220711 Raw text 99.74 92.77 97.44 93.04 94.43 92.50 99.74 85.76 82.13 68.07 75.34
faroese-farpahc-ud-2.10-220711 Gold tokenization 97.64 93.28 94.68 92.72 100.00 86.82 83.10 69.17 76.51
finnish-tdt-ud-2.10-220711 Raw text 99.70 90.82 97.58 98.18 95.99 95.10 92.14 90.20 88.18 82.19 78.16
finnish-tdt-ud-2.10-220711 Gold tokenization 97.92 98.49 96.29 95.43 92.46 91.51 89.46 83.20 79.17
finnish-ftb-ud-2.10-220711 Raw text 99.91 86.84 96.69 95.14 96.83 94.02 95.57 89.80 87.18 80.04 80.49
finnish-ftb-ud-2.10-220711 Gold tokenization 97.00 95.36 96.92 94.32 95.67 91.91 89.23 82.55 82.84
french-gsd-ud-2.10-220711 Raw text 98.78 94.69 97.26 97.35 96.63 97.55 92.76 90.82 84.55 86.32
french-gsd-ud-2.10-220711 Gold tokenization 98.44 98.47 97.71 98.75 94.55 92.71 86.34 87.59
french-parisstories-ud-2.10-220711 Raw text 99.49 87.87 96.24 94.41 92.17 97.55 79.95 74.84 61.23 68.35
french-parisstories-ud-2.10-220711 Gold tokenization 96.81 94.90 92.68 97.98 81.67 76.50 62.46 69.32
french-partut-ud-2.10-220711 Raw text 99.48 100.00 97.26 96.76 94.72 93.96 97.33 94.72 92.81 81.09 86.22
french-partut-ud-2.10-220711 Gold tokenization 97.89 97.35 95.27 94.51 97.89 95.62 93.85 82.18 87.24
french-rhapsodie-ud-2.10-220711 Raw text 99.22 99.47 97.20 97.45 96.12 93.30 98.26 88.71 84.99 75.15 79.88
french-rhapsodie-ud-2.10-220711 Gold tokenization 98.00 98.13 96.89 93.97 98.99 89.89 86.08 75.91 80.35
french-sequoia-ud-2.10-220711 Raw text 99.15 84.02 98.32 97.15 96.68 98.33 93.60 92.22 86.08 89.00
french-sequoia-ud-2.10-220711 Gold tokenization 99.24 97.95 97.54 99.13 95.43 94.11 88.00 90.34
galician-ctg-ud-2.10-220711 Raw text 99.22 97.22 97.28 97.05 99.06 96.70 98.04 85.59 83.20 72.11 76.94
galician-ctg-ud-2.10-220711 Gold tokenization 98.01 97.78 99.84 97.41 98.79 87.31 84.80 74.04 78.88
galician-treegal-ud-2.10-220711 Raw text 98.74 87.99 96.00 93.69 94.85 92.82 96.67 83.44 79.36 67.82 71.68
galician-treegal-ud-2.10-220711 Gold tokenization 97.19 94.83 95.94 93.91 97.86 86.75 82.40 71.30 75.54
german-hdt-ud-2.10-220711 Raw text 99.90 92.34 98.51 98.45 94.08 93.70 97.16 96.94 96.04 84.79 90.40
german-hdt-ud-2.10-220711 Gold tokenization 98.62 98.58 94.22 93.83 97.26 97.63 96.75 85.46 91.08
german-gsd-ud-2.10-220711 Raw text 99.81 81.12 95.78 97.68 90.23 87.27 96.75 87.32 83.12 63.79 75.00
german-gsd-ud-2.10-220711 Gold tokenization 95.94 97.87 90.60 87.60 96.96 89.28 85.04 65.33 76.75
gothic-proiel-ud-2.10-220711 Raw text 100.00 31.12 96.48 96.98 90.08 88.19 94.62 74.17 68.40 55.39 62.02
gothic-proiel-ud-2.10-220711 Gold tokenization 96.97 97.42 90.90 89.33 94.71 84.11 78.29 65.73 71.10
greek-gdt-ud-2.10-220711 Raw text 99.87 90.19 98.09 98.10 95.60 95.01 95.61 93.05 91.24 81.58 81.04
greek-gdt-ud-2.10-220711 Gold tokenization 98.23 98.24 95.79 95.20 95.70 93.85 92.04 82.28 81.75
hebrew-htb-ud-2.10-220711 Raw text 85.05 99.39 82.78 82.80 81.23 80.53 82.91 70.63 68.13 55.31 59.44
hebrew-htb-ud-2.10-220711 Gold tokenization 97.44 97.42 95.73 95.05 97.34 92.71 90.24 78.77 81.80
hebrew-iahltwiki-ud-2.10-220711 Raw text 88.54 97.16 85.97 86.00 80.55 79.47 87.15 76.16 74.19 56.91 66.92
hebrew-iahltwiki-ud-2.10-220711 Gold tokenization 97.09 97.10 91.59 90.41 98.24 93.88 91.45 74.27 85.44
hindi-hdtb-ud-2.10-220711 Raw text 100.00 98.90 97.57 97.12 94.16 92.23 98.92 95.30 92.32 79.20 87.66
hindi-hdtb-ud-2.10-220711 Gold tokenization 97.58 97.14 94.18 92.26 98.92 95.42 92.44 79.35 87.81
hungarian-szeged-ud-2.10-220711 Raw text 99.85 95.89 96.68 94.22 93.53 94.92 88.81 85.09 75.22 78.19
hungarian-szeged-ud-2.10-220711 Gold tokenization 96.79 94.36 93.64 95.04 89.31 85.54 75.51 78.47
icelandic-icepahc-ud-2.10-220711 Raw text 99.82 92.15 96.90 93.24 91.32 86.42 95.99 87.21 83.36 65.98 74.25
icelandic-icepahc-ud-2.10-220711 Gold tokenization 97.08 93.45 91.47 86.61 96.15 87.78 83.87 66.49 74.84
icelandic-modern-ud-2.10-220711 Raw text 99.92 99.22 99.07 98.14 98.38 97.88 98.91 94.41 93.17 89.31 90.07
icelandic-modern-ud-2.10-220711 Gold tokenization 99.14 98.21 98.45 97.95 98.98 94.50 93.26 89.41 90.16
indonesian-gsd-ud-2.10-220711Raw text99.4892.9094.2393.8195.5388.7898.1387.6581.5972.3577.02
indonesian-gsd-ud-2.10-220711Gold tokenization94.6694.2695.9989.1798.5388.5782.4273.2177.88
indonesian-csui-ud-2.10-220711Raw text99.4591.0196.0596.1496.8595.4398.2386.3882.1076.5478.80
indonesian-csui-ud-2.10-220711Gold tokenization96.5696.7297.3795.9998.8787.7783.2877.6279.92
irish-idt-ud-2.10-220711Raw text99.7297.2595.6394.7690.3387.1495.3086.7481.1064.2071.52
irish-idt-ud-2.10-220711Gold tokenization95.8995.0790.6087.4695.5487.2881.6464.5671.85
italian-isdt-ud-2.10-220711Raw text99.8498.7698.5798.5098.2597.6798.7994.6693.0186.6188.00
italian-isdt-ud-2.10-220711Gold tokenization98.7298.6598.4197.8398.9594.9693.3486.9788.40
italian-markit-ud-2.10-220711Raw text99.5998.2496.7697.0093.8092.0888.1888.3684.5169.9577.77
italian-markit-ud-2.10-220711Gold tokenization97.1597.4094.1092.3588.5489.1385.2670.5178.46
italian-partut-ud-2.10-220711Raw text99.73100.0098.4398.4398.3597.6198.6896.2194.1887.8789.09
italian-partut-ud-2.10-220711Gold tokenization98.5498.5798.4997.6998.9396.2694.1587.6889.07
italian-postwita-ud-2.10-220711Raw text99.4028.1196.4396.1896.3094.7996.7280.6176.8965.2966.90
italian-postwita-ud-2.10-220711Gold tokenization97.0496.8296.8095.2997.3188.3484.1975.3277.32
italian-twittiro-ud-2.10-220711Raw text99.1439.3695.9295.9295.0793.4694.5082.2377.7964.5065.42
italian-twittiro-ud-2.10-220711Gold tokenization96.9196.6196.0094.1595.1688.0783.5371.8972.69
italian-vit-ud-2.10-220711Raw text99.7696.7398.1497.3997.6496.2198.8992.0889.1680.9383.70
italian-vit-ud-2.10-220711Gold tokenization98.3697.7197.8596.5399.1092.8889.9781.9184.63
japanese-gsd-ud-2.10-220711Raw text96.17100.0094.9394.1896.1693.8195.0587.6886.8580.4380.78
japanese-gsd-ud-2.10-220711Gold tokenization98.5597.5099.9997.1398.4794.7393.7588.5088.34
japanese-gsdluw-ud-2.10-220711Raw text95.1899.7293.8193.5495.1893.4693.6686.2285.5476.2776.58
japanese-gsdluw-ud-2.10-220711Gold tokenization98.3698.05100.0097.9397.8995.2394.1886.3885.19
korean-kaist-ud-2.10-220711Raw text100.00100.0095.8887.7487.5694.1789.3387.4782.1580.14
korean-kaist-ud-2.10-220711Gold tokenization95.8887.7487.5694.1789.3387.4782.1580.14
korean-gsd-ud-2.10-220711Raw text99.8793.9396.5790.2799.6788.0293.5788.5484.9180.7377.23
korean-gsd-ud-2.10-220711Gold tokenization96.7390.4399.8088.2093.6989.2785.6181.4577.93
latin-ittb-ud-2.10-220711Raw text99.9991.2198.9196.5896.7595.1999.1890.5388.5382.0786.07
latin-ittb-ud-2.10-220711Gold tokenization98.9296.5796.7895.2099.1891.5089.5182.6386.59
latin-llct-ud-2.10-220711Raw text100.0099.4999.6897.1497.2696.8997.7895.5594.5689.8090.95
latin-llct-ud-2.10-220711Gold tokenization99.6897.1597.2796.9097.7895.5594.5789.8190.97
latin-perseus-ud-2.10-220711Raw text100.0098.4691.8380.6686.1278.5688.1377.9868.5952.3055.51
latin-perseus-ud-2.10-220711Gold tokenization91.8580.6686.1278.5588.1678.1468.7152.3955.58
latin-proiel-ud-2.10-220711Raw text99.8736.8196.6996.8790.5689.5496.2174.0769.5656.7463.93
latin-proiel-ud-2.10-220711Gold tokenization97.1297.3291.1990.2796.4483.2078.5066.3473.00
latin-udante-ud-2.10-220711Raw text99.6198.8190.5875.5981.3171.6287.2575.2667.8143.9550.36
latin-udante-ud-2.10-220711Gold tokenization90.8275.7081.5371.7087.4475.5067.9744.0850.51
latvian-lvtb-ud-2.10-220711Raw text99.3197.8396.5189.8393.8689.0895.9288.7585.7976.0480.25
latvian-lvtb-ud-2.10-220711Gold tokenization97.1490.4394.5089.6796.5589.8486.8277.0981.31
lithuanian-alksnis-ud-2.10-220711Raw text99.9187.8795.9490.4491.0389.5293.6082.4578.6467.9771.37
lithuanian-alksnis-ud-2.10-220711Gold tokenization96.0490.5291.1689.6393.6983.7079.8868.9872.36
lithuanian-hse-ud-2.10-220711Raw text97.3097.3089.2890.2183.1378.3888.1670.2761.7945.6754.04
lithuanian-hse-ud-2.10-220711Gold tokenization91.2392.3685.1980.0990.5773.9664.5347.5456.10
maltese-mudt-ud-2.10-220711Raw text99.8486.2995.8095.7995.3584.9680.0768.9872.86
maltese-mudt-ud-2.10-220711Gold tokenization95.9595.9295.4885.6580.7069.4073.33
marathi-ufal-ud-2.10-220711Raw text90.2592.6376.5065.2560.7580.7560.7550.7528.3938.00
marathi-ufal-ud-2.10-220711Gold tokenization82.5267.9662.8680.8368.9358.5029.4638.17
naija-nsc-ud-2.10-220711Raw text99.94100.0098.0398.9497.5399.3293.6590.9988.1389.60
naija-nsc-ud-2.10-220711Gold tokenization98.0899.0097.5899.3893.7591.0888.2189.68
north_sami-giella-ud-2.10-220711Raw text99.8798.7991.7793.5489.3085.3687.0175.1670.4359.7658.27
north_sami-giella-ud-2.10-220711Gold tokenization91.9193.6789.4585.5287.1375.4770.7660.0558.56
norwegian-bokmaal-ud-2.10-220711Raw text99.7796.0598.3597.4396.8298.5793.6292.1686.9188.74
norwegian-bokmaal-ud-2.10-220711Gold tokenization98.6197.6897.0798.8294.4092.9187.5989.43
norwegian-nynorsk-ud-2.10-220711Raw text99.9394.1798.2497.3496.5598.4093.8992.1886.0388.36
norwegian-nynorsk-ud-2.10-220711Gold tokenization98.4197.5096.7398.5394.6392.9386.9389.20
norwegian-nynorsklia-ud-2.10-220711Raw text99.9199.5396.6195.7193.7598.0581.1876.6166.0169.68
norwegian-nynorsklia-ud-2.10-220711Gold tokenization96.7295.8093.8598.1481.4276.8466.2369.90
old_church_slavonic-proiel-ud-2.10-220711Raw text100.0041.4396.7296.9090.3789.1993.1377.7173.9263.8268.87
old_church_slavonic-proiel-ud-2.10-220711Gold tokenization97.0897.2891.0689.9393.1488.3084.1874.0177.39
old_french-srcmf-ud-2.10-220711Raw text99.70100.0096.6896.5097.7095.7299.6591.1787.3880.7684.40
old_french-srcmf-ud-2.10-220711Gold tokenization96.9996.8298.0196.0399.9591.5887.8281.2084.85
old_russian-torot-ud-2.10-220711Raw text100.0029.6094.3994.7087.5685.2385.9271.0065.3251.6453.64
old_russian-torot-ud-2.10-220711Gold tokenization95.0695.2988.5086.6085.9683.3077.2464.0962.94
old_russian-rnc-ud-2.10-220711Raw text97.4884.0390.9486.5576.5167.1575.3161.2855.9333.2434.04
old_russian-rnc-ud-2.10-220711Gold tokenization93.2988.9378.4868.8676.7767.1361.0837.1537.24
old_east_slavic-birchbark-ud-2.10-220711Raw text99.9816.7389.2499.3576.1172.4365.8863.4156.5032.5327.14
old_east_slavic-birchbark-ud-2.10-220711Gold tokenization89.3799.3776.5472.8266.0576.3169.0041.6333.60
persian-perdt-ud-2.10-220711Raw text99.6699.8397.4897.3697.6195.6098.8893.6391.4286.1888.66
persian-perdt-ud-2.10-220711Gold tokenization97.7897.6597.9095.8999.1994.1891.9586.7289.23
persian-seraji-ud-2.10-220711Raw text99.6598.7597.9197.9497.9597.4896.5291.6888.8484.2182.83
persian-seraji-ud-2.10-220711Gold tokenization98.2498.2898.2897.7896.8092.3689.4884.8283.40
polish-pdb-ud-2.10-220711Raw text99.8597.3398.8995.8996.1195.2698.1094.2292.1985.4488.36
polish-pdb-ud-2.10-220711Gold tokenization99.0596.0396.2495.4098.2494.7292.6985.8388.78
polish-lfg-ud-2.10-220711Raw text99.8599.6599.0096.0896.5795.1698.2496.8695.5189.8092.34
polish-lfg-ud-2.10-220711Gold tokenization99.1796.2596.7495.3398.3897.2595.8990.1992.66
pomak-philotis-ud-2.10-220711Raw text99.9894.4998.8695.6295.3096.6788.2483.2671.1974.14
pomak-philotis-ud-2.10-220711Gold tokenization98.9095.6595.3396.6988.6883.7571.4874.42
portuguese-gsd-ud-2.10-220711Raw text99.8797.2898.5198.5199.7498.4199.2794.5093.4188.7689.96
portuguese-gsd-ud-2.10-220711Gold tokenization98.6598.6499.8998.5599.4094.9093.8189.2390.36
portuguese-bosque-ud-2.10-220711Raw text99.6889.8997.8796.9596.0098.3592.3590.0781.3884.69
portuguese-bosque-ud-2.10-220711Gold tokenization98.2297.2396.2898.6693.5091.1682.4785.87
romanian-nonstandard-ud-2.10-220711Raw text98.8396.7796.1891.8790.5389.1894.9088.8584.8268.2176.36
romanian-nonstandard-ud-2.10-220711Gold tokenization97.3092.8691.4990.1095.9990.5786.5069.6977.68
romanian-rrt-ud-2.10-220711Raw text99.7195.1697.9097.2197.4096.9897.9691.9788.4481.6683.13
romanian-rrt-ud-2.10-220711Gold tokenization98.1997.4597.6597.2298.2292.7289.1382.1583.70
romanian-simonero-ud-2.10-220711Raw text99.84100.0098.4597.9797.5697.2598.9194.0892.1385.5288.32
romanian-simonero-ud-2.10-220711Gold tokenization98.6198.1297.7097.4099.0794.4292.4585.8188.62
russian-syntagrus-ud-2.10-220711Raw text99.6798.3198.4693.9693.7198.1893.8491.7082.7288.90
russian-syntagrus-ud-2.10-220711Gold tokenization98.7994.2894.0398.4694.5692.3983.2889.44
russian-gsd-ud-2.10-220711Raw text99.5096.4998.1197.5594.7193.6197.0191.4488.5581.0484.62
russian-gsd-ud-2.10-220711Gold tokenization98.5897.9895.1794.0197.4392.6789.6982.0085.65
russian-taiga-ud-2.10-220711Raw text98.1286.3395.6593.1392.0694.7383.0879.5770.6073.88
russian-taiga-ud-2.10-220711Gold tokenization97.3494.9093.7296.3785.6481.9272.8276.10
sanskrit-vedic-ud-2.10-220711Raw text100.0027.1889.1681.6176.7687.0560.9250.0441.6644.99
sanskrit-vedic-ud-2.10-220711Gold tokenization89.9783.0278.3487.3473.7462.0152.0055.41
scottish_gaelic-arcosg-ud-2.10-220711Raw text97.4760.8993.7889.2990.9188.2195.0881.2475.6062.7369.22
scottish_gaelic-arcosg-ud-2.10-220711Gold tokenization96.6292.2494.0291.3997.5987.3381.6569.2575.23
serbian-set-ud-2.10-220711Raw text99.9993.0099.0996.0096.2195.7597.7693.6391.2083.7687.00
serbian-set-ud-2.10-220711Gold tokenization99.1396.0196.2095.7597.7894.2691.8084.3287.60
slovak-snk-ud-2.10-220711Raw text100.0081.6997.6590.3593.5089.5696.4691.3989.6580.4384.44
slovak-snk-ud-2.10-220711Gold tokenization97.8890.5593.6989.8096.5093.9192.0882.8986.95
slovenian-ssj-ud-2.10-220711Raw text99.9498.9598.9796.9797.1596.6398.5893.9992.6086.8388.91
slovenian-ssj-ud-2.10-220711Gold tokenization99.0397.0297.2396.6998.6394.1592.7686.9989.02
slovenian-sst-ud-2.10-220711Raw text99.8523.1494.8292.7192.4389.8497.3865.6960.8450.8854.78
slovenian-sst-ud-2.10-220711Gold tokenization95.6293.0992.8490.8997.5678.3973.0763.3968.33
spanish-ancora-ud-2.10-220711Raw text99.9598.7899.0696.0298.7495.5999.3793.7091.7986.4187.88
spanish-ancora-ud-2.10-220711Gold tokenization99.1196.0798.7995.6399.4293.8891.9786.5988.04
spanish-gsd-ud-2.10-220711Raw text99.7595.6297.1596.9495.2798.7291.8789.5778.6384.25
spanish-gsd-ud-2.10-220711Gold tokenization97.3997.1995.5398.9792.6690.3279.4385.04
swedish-talbanken-ud-2.10-220711Raw text99.8496.5398.4497.3397.3296.5198.1592.2389.8583.9285.97
swedish-talbanken-ud-2.10-220711Gold tokenization98.6197.5297.5196.7298.3292.6890.3084.4886.54
swedish-lines-ud-2.10-220711Raw text99.9688.0097.6695.5190.8488.1497.7290.6087.3871.8282.17
swedish-lines-ud-2.10-220711Gold tokenization97.7395.5290.8788.1597.7691.4488.1972.5082.95
tamil-ttb-ud-2.10-220711Raw text94.2697.5284.2983.1884.6478.2289.4570.4361.8850.6155.39
tamil-ttb-ud-2.10-220711Gold tokenization89.2987.7889.9982.7094.4278.1368.7856.8761.48
telugu-mtg-ud-2.10-220711Raw text99.5896.6293.6393.6398.6193.4990.7284.6377.1481.14
telugu-mtg-ud-2.10-220711Gold tokenization94.0494.0499.0393.9091.6885.5877.9881.98
turkish-boun-ud-2.10-220711Raw text98.8386.9391.5692.5191.7286.5693.2378.4872.4059.7765.11
turkish-boun-ud-2.10-220711Gold tokenization92.5393.4792.6787.3194.2681.0774.7361.3366.92
turkish-atis-ud-2.10-220711Raw text100.0080.2098.9698.4698.2599.1589.2287.4985.1286.08
turkish-atis-ud-2.10-220711Gold tokenization99.0298.5298.3299.1391.1189.3086.9887.93
turkish-framenet-ud-2.10-220711Raw text100.00100.0096.8694.8994.2196.6693.3984.2573.9877.64
turkish-framenet-ud-2.10-220711Gold tokenization96.8694.8994.2196.6693.3984.2573.9877.64
turkish-imst-ud-2.10-220711Raw text98.3096.9794.3893.9890.9288.6094.5474.7369.0458.2563.10
turkish-imst-ud-2.10-220711Gold tokenization95.9495.4992.4089.9796.1378.0772.0960.2665.33
turkish-kenet-ud-2.10-220711Raw text100.0098.1293.7192.0590.8693.3383.9171.1861.8164.77
turkish-kenet-ud-2.10-220711Gold tokenization93.7292.0690.8793.3384.0771.2961.9264.89
turkish-penn-ud-2.10-220711Raw text99.3480.5995.6094.4193.3394.3684.2271.6762.2164.53
turkish-penn-ud-2.10-220711Gold tokenization96.3095.1194.0295.0186.7673.9163.6366.02
turkish-tourism-ud-2.10-220711Raw text99.9699.8698.8095.0894.6798.3697.2091.5281.9887.38
turkish-tourism-ud-2.10-220711Gold tokenization98.8595.1294.7398.4097.2591.5882.0487.45
turkish_german-sagt-ud-2.10-220711Raw text98.9199.4490.2180.3275.6090.8271.1460.9841.1251.00
turkish_german-sagt-ud-2.10-220711Gold tokenization91.0980.8976.0891.5272.6962.0641.6451.71
ukrainian-iu-ud-2.10-220711Raw text99.8196.6197.9094.3594.1893.1297.3490.6188.2778.9283.01
ukrainian-iu-ud-2.10-220711Gold tokenization98.0894.5494.3493.2997.5391.1288.7279.2183.36
urdu-udtb-ud-2.10-220711Raw text100.0098.3193.9192.1582.8378.4097.4188.1582.4956.6274.68
urdu-udtb-ud-2.10-220711Gold tokenization93.9392.1782.8678.4397.4188.2382.5856.6774.77
uyghur-udt-ud-2.10-220711Raw text99.5481.8189.3391.7588.1279.9894.6776.6664.8746.8455.29
uyghur-udt-ud-2.10-220711Gold tokenization89.7192.3088.5980.5095.1478.3866.4947.8356.56
vietnamese-vtb-ud-2.10-220711Raw text85.3793.4678.2176.7685.1276.5785.1652.6847.8441.5544.29
vietnamese-vtb-ud-2.10-220711Gold tokenization90.3688.5599.7288.3299.5972.8865.4158.7662.51
welsh-ccg-ud-2.10-220711Raw text99.4297.3795.3394.4089.8287.6193.9386.6180.6763.3169.02
welsh-ccg-ud-2.10-220711Gold tokenization95.8494.8790.3188.0794.4487.8581.8364.3670.21
western_armenian-armtdp-ud-2.10-220711Raw text99.8998.6896.8292.5191.8397.1489.3984.6669.8476.01
western_armenian-armtdp-ud-2.10-220711Gold tokenization96.9092.6091.9397.2289.6484.8970.0776.23
wolof-wtb-ud-2.10-220711Raw text99.2391.9594.2094.1593.5091.4195.2084.1578.6966.7570.23
wolof-wtb-ud-2.10-220711Gold tokenization95.1795.0794.3292.3195.9686.2780.7568.7072.06

Universal Dependencies 2.6 Models

Universal Dependencies 2.6 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.6 treebanks, and additionally use multilingual BERT.

The models require UDPipe 2.

Download

The latest version 200831 of the Universal Dependencies 2.6 models can be downloaded from the LINDAT/CLARIN repository.

The models are also available in the REST service.

Acknowledgements

This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.

The models were trained on Universal Dependencies 2.6 treebanks.

For the UD treebanks which do not contain the original plain text version, raw text is used to train the tokenizer instead. The plain texts were taken from the W2C -- Web to Corpus.

Finally, multilingual BERT is used to provide contextualized word embeddings.

Publications

Model Description

The Universal Dependencies 2.6 models contain 99 models for 63 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used the last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
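The split and size rules above can be sketched as follows. This is only an illustration, not the authors' training code: the function names are hypothetical, and it assumes the 10% is counted in sentences, which the text does not specify.

```python
def split_train_dev(sentences, dev_percent=10):
    """Hold out the last `dev_percent` percent of training sentences as dev data.

    Assumption: the split is made on sentence boundaries; the source only
    says "last 10% of the train data".
    """
    n_dev = len(sentences) * dev_percent // 100
    cut = len(sentences) - n_dev
    return sentences[:cut], sentences[cut:]


def has_enough_training_words(sentences, min_words=1000):
    """Return True if the treebank has at least `min_words` training words."""
    return sum(len(sentence) for sentence in sentences) >= min_words


# Each sentence is represented as a list of word forms.
treebank = [["word"] * 5 for _ in range(20)]
train, dev = split_train_dev(treebank)
print(len(train), len(dev), has_enough_training_words(train))
```

Taking the tail of the training file keeps the held-out sentences contiguous, so the remaining training portion stays a prefix of the original file.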

The tokenizer is trained using the SpaceAfter=No features. If the features are not present in the data, they can be filled in using raw text in the language in question.
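To illustrate why SpaceAfter=No matters for tokenizer training, here is a minimal sketch that rebuilds the surface text of one sentence from CoNLL-U lines. It is not UDPipe code; `sentence_text` is a hypothetical helper, and it assumes standard 10-column CoNLL-U input with SpaceAfter=No stored in the MISC column.

```python
def sentence_text(conllu_lines):
    """Rebuild the raw text of one sentence from its CoNLL-U token lines."""
    pieces = []
    skip_until = 0  # last word ID covered by the current multiword token
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines and sentence-level comments
        cols = line.split("\t")
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "." in tok_id:
            continue  # empty nodes have no surface form
        if "-" in tok_id:
            # Multiword token: use its surface form, skip the words it covers.
            skip_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= skip_until:
            continue
        pieces.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


def row(tok_id, form, misc="_"):
    """Build a CoNLL-U line with only ID, FORM and MISC filled in."""
    return "\t".join([tok_id, form] + ["_"] * 7 + [misc])


example = [row("1", "Hello", "SpaceAfter=No"), row("2", ","),
           row("3", "world", "SpaceAfter=No"), row("4", "!")]
print(sentence_text(example))  # Hello, world!
```

Without the SpaceAfter=No marks, the same tokens would be joined as "Hello , world !", and the tokenizer could not learn where tokens attach to each other in running text.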

The tagger, lemmatizer and parser are trained using gold UD data.

Model Performance

We present the tokenizer, tagger, lemmatizer and parser performance, measured on the testing portion of the data, evaluated both on the raw text and using the gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py script.
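As a rough illustration of how an F1 score of this kind is formed for the Words column, the sketch below compares gold and system tokens by their character spans. It is a simplification under stated assumptions, not the conll18_ud_eval.py implementation: the function names are hypothetical, both tokenizations are assumed to cover the same characters, and multiword tokens are ignored.

```python
def token_spans(tokens):
    """Map tokens to (start, end) character offsets, ignoring whitespace."""
    spans, pos = set(), 0
    for token in tokens:
        spans.add((pos, pos + len(token)))
        pos += len(token)
    return spans


def span_f1(gold_tokens, system_tokens):
    """F1 over exactly matching character spans, in the range 0.0 to 1.0."""
    gold, system = token_spans(gold_tokens), token_spans(system_tokens)
    correct = len(gold & system)
    if correct == 0:
        return 0.0
    precision = correct / len(system)  # share of system tokens that are right
    recall = correct / len(gold)       # share of gold tokens that were found
    return 2 * precision * recall / (precision + recall)


# One of two system tokens matches one of three gold tokens:
# precision 1/2, recall 1/3, F1 0.4.
print(round(100 * span_f1(["can", "not", "go"], ["cannot", "go"]), 2))
```

In the gold tokenization mode the system and gold tokens coincide by construction, which is why only the Raw text rows report Words and Sents scores.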

Model  Mode  Words Sents UPOS XPOS UFeats AllTags Lemma UAS LAS MLAS BLEX
afrikaans-afribooms-ud-2.6-200830  Raw text  99.82 98.25 98.55 95.42 98.27 95.33 97.52 90.34 87.93 80.33 79.91
afrikaans-afribooms-ud-2.6-200830  Gold tokenization  98.70 95.56 98.41 95.48 97.61 90.80 88.40 80.78 80.31
ancient_greek-perseus-ud-2.6-200830  Raw text  99.97 98.85 93.20 86.01 91.59 85.27 86.81 79.57 73.90 54.80 55.63
ancient_greek-perseus-ud-2.6-200830  Gold tokenization  93.24 86.03 91.62 85.30 86.84 79.74 74.06 54.92 55.72
ancient_greek-proiel-ud-2.6-200830  Raw text  100.00 48.02 97.73 98.04 92.36 91.01 94.71 79.98 75.99 60.15 65.88
ancient_greek-proiel-ud-2.6-200830  Gold tokenization  97.91 98.18 92.59 91.30 94.76 85.95 81.90 66.72 71.84
arabic-padt-ud-2.6-200830  Raw text  94.58 82.09 91.68 88.96 89.14 88.65 90.36 78.86 74.85 66.06 68.12
arabic-padt-ud-2.6-200830  Gold tokenization  96.87 94.20 94.36 93.82 95.23 87.60 83.14 74.51 76.09
armenian-armtdp-ud-2.6-200830  Raw text  99.34 97.85 95.64 90.30 88.94 94.45 85.07 79.97 66.54 71.61
armenian-armtdp-ud-2.6-200830  Gold tokenization  96.11 90.90 89.37 95.04 86.31 81.18 67.00 72.22
basque-bdt-ud-2.6-200830  Raw text  99.94 99.83 96.44 93.60 91.69 96.40 87.24 84.15 74.97 79.94
basque-bdt-ud-2.6-200830  Gold tokenization  96.48 93.64 91.72 96.43 87.30 84.21 75.00 79.96
belarusian-hse-ud-2.6-200830  Raw text  99.84 78.70 96.14 31.78 82.07 26.98 81.48 75.81 71.05 49.78 50.93
belarusian-hse-ud-2.6-200830  Gold tokenization  96.39 31.85 82.19 27.08 81.53 80.18 75.18 52.92 53.60
bulgarian-btb-ud-2.6-200830  Raw text  99.91 94.17 99.15 97.19 97.95 96.84 97.97 94.35 91.61 85.92 86.43
bulgarian-btb-ud-2.6-200830  Gold tokenization  99.27 97.30 98.05 96.95 98.07 95.17 92.41 86.62 87.17
catalan-ancora-ud-2.6-200830  Raw text  99.98 99.43 99.05 98.99 98.63 98.14 99.31 94.53 92.86 87.63 89.24
catalan-ancora-ud-2.6-200830  Gold tokenization  99.09 99.03 98.67 98.18 99.34 94.60 92.95 87.73 89.35
chinese-gsdsimp-ud-2.6-200830  Raw text  90.29 99.10 87.32 87.20 89.73 86.54 90.29 72.68 70.32 63.38 66.94
chinese-gsdsimp-ud-2.6-200830  Gold tokenization  96.32 96.15 99.43 95.50 99.99 86.89 83.93 78.52 82.60
chinese-gsd-ud-2.6-200830  Raw text  90.27 99.10 87.27 87.18 89.74 86.50 90.27 72.99 70.50 63.83 67.21
chinese-gsd-ud-2.6-200830  Gold tokenization  96.30 96.16 99.42 95.45 99.99 87.30 84.22 78.63 82.84
classical_chinese-kyoto-ud-2.6-200830  Raw text  99.46 46.22 90.91 90.91 93.43 88.00 99.42 72.75 67.18 63.67 66.02
classical_chinese-kyoto-ud-2.6-200830  Gold tokenization  93.55 93.24 95.01 90.86 99.96 85.49 80.20 76.42 79.25
coptic-scriptorium-ud-2.6-200830  Raw text  71.91 35.97 69.61 68.00 63.06 60.16 70.51 47.75 45.89 25.42 35.81
coptic-scriptorium-ud-2.6-200830  Gold tokenization  96.15 92.53 87.75 81.98 96.70 89.14 85.79 57.57 76.42
croatian-set-ud-2.6-200830  Raw text  99.95 94.41 98.18 95.91 96.40 95.27 97.58 92.20 88.40 80.16 83.07
croatian-set-ud-2.6-200830  Gold tokenization  98.23 96.00 96.52 95.38 97.64 92.72 88.89 80.66 83.53
czech-pdt-ud-2.6-200830  Raw text  99.93 93.35 99.23 97.61 97.59 97.13 99.09 93.81 92.03 87.79 89.88
czech-pdt-ud-2.6-200830  Gold tokenization  99.30 97.71 97.70 97.24 99.17 94.60 92.81 88.45 90.57
czech-cac-ud-2.6-200830  Raw text  99.98 99.68 99.52 97.33 97.05 96.64 98.93 94.31 92.48 87.56 89.76
czech-cac-ud-2.6-200830  Gold tokenization  99.54 97.36 97.07 96.67 98.95 94.37 92.54 87.63 89.83
czech-fictree-ud-2.6-200830  Raw text  99.99 98.95 98.68 95.80 96.79 95.38 99.20 94.83 92.66 85.35 89.58
czech-fictree-ud-2.6-200830  Gold tokenization  98.69 95.82 96.80 95.40 99.21 94.92 92.74 85.47 89.69
czech-cltt-ud-2.6-200830  Raw text  99.65 97.40 99.21 95.00 94.98 94.76 99.06 91.37 89.67 82.08 86.96
czech-cltt-ud-2.6-200830  Gold tokenization  99.49 95.19 95.16 94.95 99.30 91.91 90.21 82.25 87.31
danish-ddt-ud-2.6-200830  Raw text  99.81 89.78 98.01 97.52 96.72 97.31 88.56 86.46 79.62 81.12
danish-ddt-ud-2.6-200830  Gold tokenization  98.26 97.73 96.99 97.53 89.82 87.67 80.73 82.27
dutch-alpino-ud-2.6-200830  Raw text  99.83 88.59 97.41 95.98 97.02 95.36 97.32 92.79 90.38 81.53 83.18
dutch-alpino-ud-2.6-200830  Gold tokenization  97.57 96.13 97.18 95.53 97.46 93.93 91.53 82.72 84.42
dutch-lassysmall-ud-2.6-200830  Raw text  99.83 75.40 96.58 95.42 96.41 94.73 97.21 90.36 87.66 78.84 80.17
dutch-lassysmall-ud-2.6-200830  Gold tokenization  96.79 96.05 96.97 95.40 97.33 94.26 91.24 83.56 84.84
english-ewt-ud-2.6-200830  Raw text  98.95 86.60 96.36 96.06 96.56 94.88 97.64 89.55 87.43 80.50 83.29
english-ewt-ud-2.6-200830  Gold tokenization  97.29 97.03 97.57 95.84 98.57 92.24 90.05 83.33 86.07
english-gum-ud-2.6-200830  Raw text  99.81 83.66 96.79 96.76 97.55 95.88 97.35 90.02 87.52 79.41 80.43
english-gum-ud-2.6-200830  Gold tokenization  96.99 96.93 97.75 96.09 97.56 91.93 89.36 81.20 82.25
english-lines-ud-2.6-200830  Raw text  99.92 87.45 97.60 95.86 96.88 93.39 98.34 89.36 86.45 79.35 82.87
english-lines-ud-2.6-200830  Gold tokenization  97.67 95.90 96.92 93.41 98.41 90.26 87.36 80.24 83.79
english-partut-ud-2.6-200830  Raw text  99.72 100.00 97.37 97.08 96.29 95.38 98.23 94.12 92.09 83.04 87.20
english-partut-ud-2.6-200830  Gold tokenization  97.62 97.33 96.54 95.63 98.50 94.40 92.37 83.44 87.48
estonian-edt-ud-2.6-200830  Raw text  99.96 91.56 97.65 98.25 96.44 95.19 95.34 88.75 86.18 80.12 79.65
estonian-edt-ud-2.6-200830  Gold tokenization  97.75 98.29 96.48 95.28 95.40 89.66 87.06 80.93 80.44
estonian-ewt-ud-2.6-200830  Raw text  98.96 70.09 95.00 96.30 93.74 91.31 93.81 81.07 77.55 69.14 70.69
estonian-ewt-ud-2.6-200830  Gold tokenization  96.22 97.37 94.65 92.37 94.83 86.37 82.55 73.03 74.39
finnish-tdt-ud-2.6-200830  Raw text  99.70 88.64 97.63 98.25 96.05 95.11 92.06 90.11 88.10 82.04 77.91
finnish-tdt-ud-2.6-200830  Gold tokenization  97.97 98.56 96.37 95.48 92.38 91.69 89.63 83.30 79.18
finnish-ftb-ud-2.6-200830  Raw text  99.91 86.84 96.52 95.08 96.72 93.82 95.73 89.93 87.32 80.13 80.74
finnish-ftb-ud-2.6-200830  Gold tokenization  96.85 95.31 96.87 94.16 95.83 91.99 89.34 82.64 83.05
french-gsd-ud-2.6-200830  Raw text  98.87 94.67 97.23 98.86 96.65 96.00 97.69 92.77 90.82 83.14 86.08
french-gsd-ud-2.6-200830  Gold tokenization  98.29 99.99 97.63 96.94 98.80 94.46 92.63 84.72 87.21
french-sequoia-ud-2.6-200830  Raw text  99.09 87.50 98.33 97.25 96.79 98.16 93.90 92.45 86.54 89.25
french-sequoia-ud-2.6-200830  Gold tokenization  99.32 98.19 97.78 99.09 95.80 94.43 88.78 90.78
french-partut-ud-2.6-200830  Raw text  99.42 100.00 97.28 96.93 94.17 93.63 95.59 94.71 92.71 80.18 83.34
french-partut-ud-2.6-200830  Gold tokenization  97.89 97.54 94.74 94.20 96.20 95.47 93.62 81.20 84.28
french-spoken-ud-2.6-200830  Raw text  99.06 21.15 96.49 96.44 93.98 97.48 79.23 74.91 64.48 66.67
french-spoken-ud-2.6-200830  Gold tokenization  97.63 97.31 95.00 98.28 87.27 82.51 74.23 75.56
galician-ctg-ud-2.6-200830  Raw text  99.22 97.22 97.30 97.07 99.05 96.71 98.07 85.45 83.07 72.03 76.75
galician-ctg-ud-2.6-200830  Gold tokenization  98.04 97.79 99.83 97.43 98.82 87.22 84.73 74.05 78.78
galician-treegal-ud-2.6-200830  Raw text  98.74 87.99 95.99 93.58 94.72 92.63 96.71 83.26 79.23 67.54 71.73
galician-treegal-ud-2.6-200830  Gold tokenization  97.23 94.65 95.76 93.73 97.89 86.57 82.30 71.04 75.71
german-hdt-ud-2.6-200830  Raw text  99.91 92.34 98.51 98.45 94.09 93.69 97.23 96.88 95.96 84.87 90.41
german-hdt-ud-2.6-200830  Gold tokenization  98.62 98.57 94.21 93.81 97.32 97.57 96.67 85.53 91.10
german-gsd-ud-2.6-200830  Raw text  99.58 80.90 94.39 97.51 91.14 85.97 96.58 87.06 82.93 62.33 74.97
german-gsd-ud-2.6-200830  Gold tokenization  94.73 97.96 91.65 86.51 96.95 89.36 85.31 64.33 77.26
gothic-proiel-ud-2.6-200830  Raw text  100.00 31.12 96.39 96.90 90.18 88.05 94.70 74.10 68.48 55.16 62.26
gothic-proiel-ud-2.6-200830  Gold tokenization  96.81 97.26 91.12 89.28 94.77 83.73 77.93 65.37 70.85
greek-gdt-ud-2.6-200830  Raw text  99.87 90.19 97.99 98.00 95.57 94.91 95.53 93.00 91.16 81.28 80.73
greek-gdt-ud-2.6-200830  Gold tokenization  98.14 98.14 95.69 95.02 95.61 93.82 91.95 82.03 81.53
hebrew-htb-ud-2.6-200830  Raw text  85.04 99.39 82.79 82.76 81.31 80.57 82.97 69.85 67.39 54.79 59.16
hebrew-htb-ud-2.6-200830  Gold tokenization  97.48 97.48 96.03 95.36 97.23 91.83 89.25 78.52 81.02
hindi-hdtb-ud-2.6-200830  Raw text  100.00 98.90 97.64 97.29 94.18 92.32 98.78 95.32 92.37 79.24 87.69
hindi-hdtb-ud-2.6-200830  Gold tokenization  97.65 97.29 94.21 92.35 98.78 95.44 92.49 79.41 87.84
hungarian-szeged-ud-2.6-200830  Raw text  99.85 95.89 96.77 94.32 93.51 94.97 87.78 84.24 74.80 77.84
hungarian-szeged-ud-2.6-200830  Gold tokenization  96.87 94.45 93.61 95.09 88.28 84.73 75.27 78.26
indonesian-gsd-ud-2.6-200830  Raw text  100.00 94.13 93.89 94.28 95.55 89.00 99.61 86.07 79.97 69.25 77.74
indonesian-gsd-ud-2.6-200830  Gold tokenization  93.90 94.26 95.52 88.98 99.61 86.32 80.18 69.51 78.00
irish-idt-ud-2.6-200830  Raw text  99.71 97.36 94.35 94.30 73.43 70.38 93.18 84.47 77.88 40.78 65.74
irish-idt-ud-2.6-200830  Gold tokenization  94.59 94.60 73.65 70.63 93.41 84.98 78.30 40.94 65.87
italian-isdt-ud-2.6-200830  Raw text  99.84 98.76 98.52 98.44 98.23 97.66 98.65 94.77 93.12 86.91 87.85
italian-isdt-ud-2.6-200830  Gold tokenization  98.68 98.60 98.38 97.81 98.81 95.07 93.44 87.20 88.19
italian-partut-ud-2.6-200830  Raw text  99.73 100.00 98.41 98.52 98.27 97.77 98.74 96.07 93.90 87.45 88.95
italian-partut-ud-2.6-200830  Gold tokenization  98.54 98.65 98.38 97.88 98.93 96.18 93.98 87.48 89.15
italian-postwita-ud-2.6-200830  Raw text  99.47 30.49 96.53 96.28 96.43 94.89 96.76 80.97 76.94 65.79 67.44
italian-postwita-ud-2.6-200830  Gold tokenization  97.06 96.79 96.89 95.41 97.18 88.04 83.76 75.23 76.98
italian-twittiro-ud-2.6-200830  Raw text  99.06 36.80 95.99 95.86 95.22 93.37 94.68 81.69 77.38 64.34 65.32
italian-twittiro-ud-2.6-200830  Gold tokenization  97.01 96.77 96.14 94.42 95.50 87.84 83.43 71.64 72.68
italian-vit-ud-2.6-200830  Raw text  99.69 94.69 97.86 97.07 97.38 95.76 98.64 92.03 89.20 80.39 83.83
italian-vit-ud-2.6-200830  Gold tokenization  98.16 97.49 97.66 96.16 98.92 92.77 89.91 81.15 84.53
japanese-gsd-ud-2.6-200830  Raw text  95.34 94.61 93.67 93.56 95.32 92.74 95.02 85.11 84.01 76.23 77.83
japanese-gsd-ud-2.6-200830  Gold tokenization  98.03 97.71 99.99 96.83 99.61 94.73 93.41 87.64 89.28
korean-kaist-ud-2.6-200830  Raw text  99.95 100.00 95.89 87.82 87.62 94.23 89.41 87.58 82.32 80.34
korean-kaist-ud-2.6-200830  Gold tokenization  95.94 87.85 87.66 94.27 89.51 87.67 82.42 80.42
korean-gsd-ud-2.6-200830  Raw text  99.87 93.93 96.61 90.19 99.69 88.03 93.51 88.68 85.04 80.93 77.36
korean-gsd-ud-2.6-200830  Gold tokenization  96.74 90.32 99.82 88.16 93.64 89.50 85.84 81.76 78.14
latin-ittb-ud-2.6-200830  Raw text  99.99 92.44 98.54 96.35 96.92 95.12 98.94 90.31 88.16 82.19 85.37
latin-ittb-ud-2.6-200830  Gold tokenization  98.52 96.37 96.92 95.11 98.93 91.24 89.07 82.62 85.88
latin-llct-ud-2.6-200830  Raw text  100.00 99.49 99.60 97.13 97.11 96.63 97.68 95.48 94.35 89.31 90.44
latin-llct-ud-2.6-200830  Gold tokenization  99.60 97.14 97.11 96.63 97.68 95.54 94.40 89.40 90.53
latin-proiel-ud-2.6-200830  Raw text  99.87 36.81 96.67 96.81 90.71 89.59 96.16 74.44 69.97 57.51 64.96
latin-proiel-ud-2.6-200830  Gold tokenization  97.07 97.16 91.53 90.52 96.42 83.78 79.04 67.58 73.88
latin-perseus-ud-2.6-200830  Raw text  100.00 98.46 91.65 81.18 86.33 78.75 88.05 78.09 68.97 52.82 56.03
latin-perseus-ud-2.6-200830  Gold tokenization  91.64 81.17 86.33 78.74 88.04 78.21 69.07 52.84 55.99
latvian-lvtb-ud-2.6-200830  Raw text  99.32 98.74 96.28 89.64 93.79 88.84 95.81 88.31 85.26 75.23 79.56
latvian-lvtb-ud-2.6-200830  Gold tokenization  96.92 90.24 94.40 89.43 96.45 89.33 86.23 76.29 80.60
lithuanian-alksnis-ud-2.6-200830  Raw text  99.91 87.87 95.97 90.37 91.07 89.41 93.61 82.54 78.70 67.95 71.30
lithuanian-alksnis-ud-2.6-200830  Gold tokenization  96.04 90.40 91.18 89.49 93.70 83.93 80.08 69.02 72.43
lithuanian-hse-ud-2.6-200830  Raw text  97.30 97.30 89.66 89.28 81.45 77.07 87.98 70.92 62.53 44.26 53.76
lithuanian-hse-ud-2.6-200830  Gold tokenization  91.23 91.32 83.21 78.40 90.28 73.77 64.53 45.25 54.68
maltese-mudt-ud-2.6-200830  Raw text  99.84 86.29 95.77 95.66 95.30 84.76 79.76 68.39 72.24
maltese-mudt-ud-2.6-200830  Gold tokenization  95.88 95.77 95.40 85.46 80.38 68.69 72.66
marathi-ufal-ud-2.6-200830  Raw text  90.25 92.63 78.50 65.25 61.50 80.00 61.25 53.50 31.73 40.92
marathi-ufal-ud-2.6-200830  Gold tokenization  84.22 68.69 63.83 80.10 70.39 60.92 31.95 42.32
naija-nsc-ud-2.6-200830  Raw text  100.00 99.56 98.14 99.16 97.77 99.27 92.46 89.81 84.18 86.20
naija-nsc-ud-2.6-200830  Gold tokenization  98.14 99.16 97.77 99.27 92.50 89.84 84.25 86.26
north_sami-giella-ud-2.6-200830  Raw text  99.87 98.79 92.35 93.57 89.40 85.61 86.85 76.66 71.84 60.71 58.95
north_sami-giella-ud-2.6-200830  Gold tokenization  92.47 93.70 89.56 85.75 86.96 76.97 72.16 60.95 59.19
norwegian-bokmaal-ud-2.6-200830  Raw text  99.83 95.63 98.37 97.52 96.86 98.55 93.74 92.26 87.03 88.76
norwegian-bokmaal-ud-2.6-200830  Gold tokenization  98.57 97.71 97.05 98.75 94.48 93.00 87.67 89.43
norwegian-nynorsk-ud-2.6-200830  Raw text  99.91 94.11 98.36 97.38 96.67 98.37 93.86 92.11 86.07 88.11
norwegian-nynorsk-ud-2.6-200830  Gold tokenization  98.50 97.51 96.80 98.50 94.66 92.93 86.95 89.01
norwegian-nynorsklia-ud-2.6-200830  Raw text  99.91 99.53 96.45 95.71 93.62 98.05 80.90 76.53 65.74 69.55
norwegian-nynorsklia-ud-2.6-200830  Gold tokenization  96.55 95.79 93.72 98.14 81.15 76.76 65.94 69.80
old_church_slavonic-proiel-ud-2.6-200830  Raw text  100.00 41.43 96.58 96.83 90.44 89.17 93.19 77.42 73.57 63.51 68.53
old_church_slavonic-proiel-ud-2.6-200830  Gold tokenization  96.89 97.09 91.22 89.97 93.20 87.95 83.81 73.92 77.26
old_french-srcmf-ud-2.6-200830  Raw text  99.93 100.00 96.40 96.27 97.80 95.58 92.28 87.74 81.08 84.17
old_french-srcmf-ud-2.6-200830  Gold tokenization  96.47 96.33 97.86 95.64 92.36 87.81 81.17 84.26
old_russian-torot-ud-2.6-200830  Raw text  100.00 29.60 94.33 94.39 87.51 85.16 85.82 70.66 65.18 51.26 53.18
old_russian-torot-ud-2.6-200830  Gold tokenization  94.93 94.99 88.44 86.35 85.77 83.15 77.17 63.78 62.66
old_russian-rnc-ud-2.6-200830  Raw text  98.15 85.46 91.80 87.74 75.83 66.63 74.94 63.08 57.53 33.85 35.04
old_russian-rnc-ud-2.6-200830  Gold tokenization  93.34 89.43 77.09 67.76 76.13 66.86 60.73 36.05 37.07
persian-seraji-ud-2.6-200830  Raw text  99.65 98.75 97.69 97.66 97.75 97.29 96.67 91.09 88.15 83.43 82.26
persian-seraji-ud-2.6-200830  Gold tokenization  97.98 97.97 98.07 97.60 96.94 91.74 88.76 84.00 82.82
polish-pdb-ud-2.6-200830  Raw text  99.85 97.33 98.88 95.73 95.84 95.03 98.05 94.02 92.01 84.93 88.08
polish-pdb-ud-2.6-200830  Gold tokenization  99.04 95.88 95.99 95.18 98.19 94.51 92.50 85.36 88.52
polish-lfg-ud-2.6-200830  Raw text  99.85 99.65 98.92 95.99 96.51 95.06 98.27 96.89 95.52 89.73 92.45
polish-lfg-ud-2.6-200830  Gold tokenization  99.09 96.18 96.70 95.25 98.41 97.29 95.91 90.12 92.77
portuguese-gsd-ud-2.6-200830  Raw text  99.84 97.50 98.53 98.52 99.71 98.43 99.33 94.57 93.47 88.69 90.02
portuguese-gsd-ud-2.6-200830  Gold tokenization  98.69 98.69 99.87 98.59 99.49 94.94 93.82 89.11 90.36
portuguese-bosque-ud-2.6-200830  Raw text  99.55 90.64 97.19 96.17 94.79 97.98 92.32 89.72 79.29 84.22
portuguese-bosque-ud-2.6-200830  Gold tokenization  97.60 96.49 95.11 98.42 93.53 90.80 80.42 85.51
romanian-rrt-ud-2.6-200830  Raw text  99.69 95.28 97.79 97.18 97.32 96.81 98.20 91.83 87.56 80.00 82.17
romanian-rrt-ud-2.6-200830  Gold tokenization  98.08 97.44 97.60 97.08 98.49 92.74 88.38 80.82 82.88
romanian-nonstandard-ud-2.6-200830  Raw text  98.35 96.73 95.61 91.38 90.03 88.67 94.23 88.89 84.47 67.59 75.81
romanian-nonstandard-ud-2.6-200830  Gold tokenization  97.21 92.90 91.53 90.13 95.74 91.00 86.49 69.53 77.28
russian-syntagrus-ud-2.6-200830  Raw text  99.60 98.80 98.86 97.60 97.38 98.33 94.22 92.97 89.27 90.35
russian-syntagrus-ud-2.6-200830  Gold tokenization  99.27 97.98 97.76 98.68 94.99 93.72 89.90 90.95
russian-gsd-ud-2.6-200830  Raw text  99.50 96.22 98.03 97.51 94.76 93.60 96.89 91.66 88.38 80.67 84.18
russian-gsd-ud-2.6-200830  Gold tokenization  98.49 97.98 95.17 93.97 97.27 92.77 89.43 81.44 85.05
russian-taiga-ud-2.6-200830  Raw text  97.16 82.69 94.13 95.72 90.01 87.50 93.05 81.17 76.99 65.28 69.94
russian-taiga-ud-2.6-200830  Gold tokenization  96.47 98.56 92.72 89.87 95.68 85.57 80.81 68.93 73.90
sanskrit-vedic-ud-2.6-200830  Raw text  100.00 27.18 89.50 81.72 77.12 87.11 60.79 49.75 41.65 44.67
sanskrit-vedic-ud-2.6-200830  Gold tokenization  90.01 83.11 78.58 87.24 73.34 61.55 51.87 54.91
scottish_gaelic-arcosg-ud-2.6-200830  Raw text  99.58 55.57 93.63 87.07 89.78 85.43 95.41 77.66 71.86 55.15 60.51
scottish_gaelic-arcosg-ud-2.6-200830  Gold tokenization  94.26 87.84 90.23 86.30 95.85 83.77 77.61 62.05 68.26
serbian-set-ud-2.6-200830  Raw text  99.99 93.00 98.98 95.75 95.92 95.35 97.82 93.66 91.18 83.18 86.80
serbian-set-ud-2.6-200830  Gold tokenization  99.01 95.78 95.94 95.39 97.83 94.33 91.82 83.84 87.45
slovak-snk-ud-2.6-200830  Raw text  100.00 85.28 97.19 87.79 92.66 86.71 96.52 91.71 89.60 78.75 84.54
slovak-snk-ud-2.6-200830  Gold tokenization  97.30 88.06 92.84 86.98 96.60 93.68 91.57 80.55 86.59
slovenian-ssj-ud-2.6-200830  Raw text  97.99 67.98 96.93 94.35 94.56 93.95 96.59 88.09 86.65 80.90 83.58
slovenian-ssj-ud-2.6-200830  Gold tokenization  98.86 96.44 96.69 96.01 98.54 94.41 92.96 86.68 89.30
slovenian-sst-ud-2.6-200830  Raw text  99.85 23.14 94.70 92.70 92.52 89.74 97.14 64.23 59.57 49.25 52.95
slovenian-sst-ud-2.6-200830  Gold tokenization  95.71 93.11 92.94 90.90 97.46 77.81 72.24 62.71 67.18
spanish-ancora-ud-2.6-200830  Raw text  99.95 98.32 99.09 99.02 98.87 98.33 99.36 93.62 91.78 86.82 88.06
spanish-ancora-ud-2.6-200830  Gold tokenization  99.14 99.06 98.91 98.37 99.40 93.83 91.97 87.01 88.24
spanish-gsd-ud-2.6-200830  Raw text  99.76 94.54 97.17 97.05 95.32 98.80 92.00 89.70 79.23 84.49
spanish-gsd-ud-2.6-200830  Gold tokenization  97.40 97.27 95.54 99.02 92.73 90.38 79.93 85.13
swedish-talbanken-ud-2.6-200830  Raw text  99.89 96.13 98.41 97.26 97.33 96.46 98.19 91.99 89.68 83.63 85.82
swedish-talbanken-ud-2.6-200830  Gold tokenization  98.51 97.38 97.44 96.57 98.30 92.46 90.14 84.12 86.33
swedish-lines-ud-2.6-200830  Raw text  99.96 87.20 97.71 95.47 90.89 88.10 97.76 89.14 85.80 71.44 81.67
swedish-lines-ud-2.6-200830  Gold tokenization  97.75 95.48 90.91 88.09 97.79 89.91 86.52 72.13 82.48
tamil-ttb-ud-2.6-200830  Raw text  94.51 97.52 88.39 82.92 85.30 82.11 89.15 70.28 64.91 54.93 58.46
tamil-ttb-ud-2.6-200830  Gold tokenization  93.36 87.28 90.10 86.22 93.97 78.03 71.79 61.09 64.80
telugu-mtg-ud-2.6-200830  Raw text  99.58 96.62 93.63 93.63 98.48 93.63 90.17 83.52 76.00 79.62
telugu-mtg-ud-2.6-200830  Gold tokenization  94.04 94.04 98.89 94.04 91.12 84.47 76.84 80.46
turkish-imst-ud-2.6-200830  Raw text  98.30 96.97 94.48 93.69 92.06 89.95 94.41 72.63 66.80 58.31 61.57
turkish-imst-ud-2.6-200830  Gold tokenization  96.10 95.32 93.66 91.50 96.00 76.10 69.93 60.33 63.83
ukrainian-iu-ud-2.6-200830  Raw text  99.81 96.61 97.89 94.22 94.18 93.13 97.39 90.59 88.24 78.76 83.19
ukrainian-iu-ud-2.6-200830  Gold tokenization  98.10 94.42 94.34 93.30 97.56 91.11 88.75 79.11 83.59
urdu-udtb-ud-2.6-200830Raw text100.0098.3194.1092.2782.8978.4197.3888.2782.6356.7974.77
urdu-udtb-ud-2.6-200830Gold tokenization94.0892.2682.9278.4397.3988.3782.7456.9074.92
uyghur-udt-ud-2.6-200830Raw text99.5481.8189.2491.7088.4780.0494.7676.5864.7246.6755.08
uyghur-udt-ud-2.6-200830Gold tokenization89.6792.2188.9280.4795.2778.3966.2747.5356.23
vietnamese-vtb-ud-2.6-200830Raw text85.3793.4678.1976.6985.1176.5385.1552.8047.9041.5644.31
vietnamese-vtb-ud-2.6-200830Gold tokenization90.5688.6999.7288.4799.5872.6365.2658.8562.42
welsh-ccg-ud-2.6-200830Raw text99.4296.2894.0292.9689.0486.3992.8885.7979.1660.9866.81
welsh-ccg-ud-2.6-200830Gold tokenization94.5493.5189.5286.8393.4687.0480.3562.1468.05
wolof-wtb-ud-2.6-200830Raw text99.2391.9594.2594.1293.3791.1995.2283.7978.5966.5070.09
wolof-wtb-ud-2.6-200830Gold tokenization95.1995.0394.2292.1095.9785.9880.7568.6172.08

Czech PDT-C 1.0 Model

The PDT-C 1.0 Model is distributed under the CC BY-NC-SA licence. The model is trained on the PDT-C 1.0 treebank using the RobeCzech model, and performs morphological analysis using the MorfFlex CZ 2.0 morphological dictionary via MorphoDiTa.

The model requires UDPipe 2.1, together with the Python packages ufal.udpipe version at least 1.3.1.1 and ufal.morphodita version at least 1.11.2.1.
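The version requirements above can be checked programmatically. The following is only a minimal sketch: it assumes the packages are installed under the distribution names ufal.udpipe and ufal.morphodita and that their version strings are purely numeric and dot-separated. The helper names are illustrative and not part of either package.

```python
from importlib import metadata

# Minimum versions quoted in the requirement above.
MINIMUM_VERSIONS = {
    "ufal.udpipe": "1.3.1.1",
    "ufal.morphodita": "1.11.2.1",
}


def parse_version(version):
    """Turn a dotted numeric version such as '1.11.2.1' into a tuple of ints.

    ASSUMPTION: the version contains only digits and dots.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_installed_packages():
    """Map each required package to True, False, or None (not installed)."""
    status = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            status[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "1.9" above "1.11".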

Download

The latest version 231116 of the Czech PDT-C 1.0 model can be downloaded from the LINDAT/CLARIN repository.

The model is also available in the REST service.

PDT-C 1.0 Morphological System

PDT-C 1.0 uses the PDT-C tag set from MorfFlex CZ 2.0, which is an evolution of the original PDT tag set devised by Jan Hajič (Hajič, 2004). The tags are positional with 15 positions corresponding to part of speech, detailed part of speech, gender, number, case, etc. (e.g. NNFS1-----A----). Different meanings of the same lemma are distinguished, and additional comments can be provided for every lemma meaning. The complete reference can be found in the Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release, and a quick reference is available in the PDT-C positional morphological tags overview.
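To illustrate what "positional" means, the sketch below splits a 15-character tag into one field per position. Only the first five position names come from the description above; the remaining positions are deliberately left as numbered placeholders, and the function name is hypothetical. The PDT-C tag overview remains the authoritative reference.

```python
# Names of the first five positions, as listed in the text above.
NAMED_POSITIONS = ["pos", "detailed_pos", "gender", "number", "case"]
TAG_LENGTH = 15


def split_tag(tag):
    """Split a positional tag into a dict with one entry per position."""
    if len(tag) != TAG_LENGTH:
        raise ValueError(f"expected {TAG_LENGTH} positions, got {len(tag)}")
    fields = {}
    for index, value in enumerate(tag):
        if index < len(NAMED_POSITIONS):
            name = NAMED_POSITIONS[index]
        else:
            # Positions 6-15 are not named here; see the tag overview.
            name = f"position_{index + 1}"
        fields[name] = value
    return fields


fields = split_tag("NNFS1-----A----")
print(fields["pos"], fields["gender"], fields["number"], fields["case"])
```

For the example tag, the first five positions hold N, N, F, S and 1, and the only other filled position is the eleventh.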

PDT-C 1.0 employs dependency relations from the PDT analytical level, with a quick reference available in the PDT-C analytical functions and clause segmentation overview.

In the CoNLL-U format,

  • the tags are filled in the XPOS column, and
  • the dependency relations are filled in the DEPREL column, even if they are different from the universal dependency relations.
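As a small illustration of where these values end up, the sketch below reads the XPOS and DEPREL columns from CoNLL-U text using the standard ten-column layout. The function name is hypothetical, and the sample token line is hand-written for illustration; it is not actual model output.

```python
# The ten tab-separated columns of the CoNLL-U format, in order.
CONLLU_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def read_xpos_and_deprel(conllu_text):
    """Return (FORM, XPOS, DEPREL) for every token line in the text."""
    rows = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(columns)}")
        token = dict(zip(CONLLU_COLUMNS, columns))
        rows.append((token["FORM"], token["XPOS"], token["DEPREL"]))
    return rows


# Hand-written sample line (illustrative values only).
sample = "# text = sample\n1\tžena\tžena\tNOUN\tNNFS1-----A----\t_\t2\tSb\t_\t_\n"
print(read_xpos_and_deprel(sample))
```

Because the DEPREL values come from the PDT analytical level, downstream tools that expect universal dependency relation labels may need a mapping step.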

PDT-C 1.0 Train/Dev/Test Splits

The PDT-C corpus consists of four datasets, but some of them do not have an official train/dev/test split. We therefore used the following split:

  • PDT dataset is already split into train, dev (dtest), and test (etest).
  • PCEDT dataset is a translated version of the Wall Street Journal, so we used the usual split into train (sections 0-18), dev (sections 19-21), and test (sections 22-24).
  • PDTSC and FAUST datasets have no split, so we split them into dev (documents with identifiers ending with 6), test (documents with identifiers ending with 7), and train (all the remaining documents).
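The split rules above can be sketched as two small functions. This is only an illustration of the stated rules: the function names and the document identifiers in the example are hypothetical, and real PDT-C identifiers may look different.

```python
def pcedt_split(section):
    """Assign a PCEDT (Wall Street Journal) section number to a split."""
    if 0 <= section <= 18:
        return "train"
    if 19 <= section <= 21:
        return "dev"
    if 22 <= section <= 24:
        return "test"
    raise ValueError(f"unexpected section number: {section}")


def identifier_split(document_id):
    """Assign a PDTSC or FAUST document to a split by its identifier."""
    if document_id.endswith("6"):
        return "dev"
    if document_id.endswith("7"):
        return "test"
    return "train"


# Hypothetical identifiers, used only to show the rule.
for doc in ["doc_0016", "doc_0017", "doc_0018"]:
    print(doc, identifier_split(doc))
```

If identifiers end in digits that are roughly uniformly distributed, this rule sends about a tenth of the documents to dev and a tenth to test.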

Acknowledgements

This work has been supported by the LINDAT/CLARIAH-CZ project funded by the Ministry of Education, Youth and Sports of the Czech Republic (project LM2023062).

Publications

Model Performance

Tagging and Lemmatization

We evaluate tagging and lemmatization on the four datasets of PDT-C 1.0, and we also compute a macro-average. For lemmatization, we use the following metrics:

  • Lemmas: a primary metric comparing the lemma proper, which is the lemma with an optional lemma number (but we ignore the additional lemma comments like “this is a given name”);
  • LemmasEM: an exact match comparing also the lemma comments. This metric is less than or equal to Lemmas. Our model directly predicts only the lemma proper (no additional comments) and relies on the morphological dictionary to supply the comments, so it fails to generate comments for unknown words (like an unknown given name).
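The difference between the two metrics can be sketched as follows. This is not the code of the evaluation script. It assumes, which the text above does not state, that lemma comments begin at the first `_` or backtick after the first character of the lemma. The function names and the example lemma `Jan-1_;Y` are illustrative.

```python
def lemma_proper(lemma):
    """Strip lemma comments, keeping the lemma and its optional number.

    ASSUMPTION: comments start at the first "_" or "`" that is not the
    first character of the lemma.
    """
    for index in range(1, len(lemma)):
        if lemma[index] in "_`":
            return lemma[:index]
    return lemma


def lemma_scores(gold, predicted):
    """Return (Lemmas, LemmasEM) as percentages over aligned tokens."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    pairs = list(zip(gold, predicted))
    proper_hits = sum(lemma_proper(g) == lemma_proper(p) for g, p in pairs)
    exact_hits = sum(g == p for g, p in pairs)
    return 100 * proper_hits / len(gold), 100 * exact_hits / len(gold)


# The first prediction lacks the comment, so only the exact match fails.
print(lemma_scores(["Jan-1_;Y", "jít"], ["Jan-1", "jít"]))
```

Every exact match is also a lemma-proper match, which is why LemmasEM can never exceed Lemmas.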

We perform the evaluation using the udpipe2_eval.py script, which is a minor extension of the CoNLL 2018 Shared Task evaluation script.

Because the model also includes a rule-based tokenizer and sentence splitter, we evaluate both:

  • using raw input text, which must first be tokenized and split into sentences. The resulting scores are in fact F1-scores. Note that the FAUST dataset does not contain any discernible sentence boundaries.
  • using gold tokenization.
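With raw text, the predicted tokens need not line up with the gold ones, so precision and recall can differ and the score becomes an F1. The sketch below is a simplification and not the code of the evaluation script: it assumes each token is identified by its (start, end) character span and counts a predicted token as correct only when its span also occurs in the gold data.

```python
def f1_score(gold_spans, predicted_spans):
    """F1 of predicted spans against gold spans (both iterables of tuples)."""
    gold, predicted = set(gold_spans), set(predicted_spans)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Gold has three tokens; the system merged the last two into one.
gold = [(0, 5), (6, 11), (11, 12)]
predicted = [(0, 5), (6, 12)]
print(round(f1_score(gold, predicted), 2))  # precision 1/2, recall 1/3
```

With gold tokenization the predicted and gold spans coincide, so precision, recall and F1 all reduce to plain accuracy.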
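
| Treebank | Mode | Tokens | Sents | XPOS | Lemma | LemmaEM |
|---|---|---|---|---|---|---|
| PDT | Raw text | 99.91 | 88.00 | 98.69 | 99.10 | 98.86 |
| PDT | Gold tokenization | | | 98.78 | 99.19 | 98.96 |
| PCEDT | Raw text | 99.97 | 94.06 | 98.77 | 99.36 | 98.75 |
| PCEDT | Gold tokenization | | | 98.80 | 99.40 | 98.78 |
| PDTSC | Raw text | 100.0 | 98.31 | 98.77 | 99.23 | 99.16 |
| PDTSC | Gold tokenization | | | 98.77 | 99.23 | 99.16 |
| FAUST | Raw text | 100.0 | 10.98 | 97.05 | 98.88 | 98.43 |
| FAUST | Gold tokenization | | | 97.42 | 98.78 | 98.30 |
| MacroAvg | Gold tokenization | | | 98.44 | 99.15 | 98.80 |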
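The MacroAvg row can be reproduced from the four gold-tokenization rows of the table above. The sketch below averages the already rounded table values, so it is an approximate check and not necessarily how the published figures were computed. The variable and function names are illustrative.

```python
# Gold-tokenization scores copied from the table above.
GOLD_TOKENIZATION_SCORES = {
    "PDT": {"XPOS": 98.78, "Lemma": 99.19, "LemmaEM": 98.96},
    "PCEDT": {"XPOS": 98.80, "Lemma": 99.40, "LemmaEM": 98.78},
    "PDTSC": {"XPOS": 98.77, "Lemma": 99.23, "LemmaEM": 99.16},
    "FAUST": {"XPOS": 97.42, "Lemma": 98.78, "LemmaEM": 98.30},
}


def macro_average(scores):
    """Average each metric over datasets, giving every dataset equal weight."""
    metrics = next(iter(scores.values())).keys()
    return {
        metric: round(sum(row[metric] for row in scores.values()) / len(scores), 2)
        for metric in metrics
    }


print(macro_average(GOLD_TOKENIZATION_SCORES))
```

A macro-average weights each dataset equally regardless of its size, so the much smaller FAUST dataset influences the result as much as PDT does.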

Dependency Parsing

In PDT-C 1.0, the only manually annotated dependency parsing dataset is a subset of the PDT dataset. We perform the evaluation as in the previous section.

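| Treebank | Mode | Tokens | Sents | XPOS | Lemma | LemmaEM | UAS | LAS |
|---|---|---|---|---|---|---|---|---|
| PDT subset | Raw text | 99.94 | 88.49 | 98.74 | 99.16 | 98.97 | 93.45 | 90.32 |
| PDT subset | Gold tokenization | | | 98.81 | 99.23 | 99.03 | 94.41 | 91.48 |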
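UAS and LAS are the standard unlabeled and labeled attachment scores. A minimal sketch of how they are computed under gold tokenization follows; it is not the evaluation script itself, and the function name and the example relations are illustrative.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Both arguments are lists of (head, deprel) pairs, one per token,
    over the same tokenization.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    pairs = list(zip(gold, predicted))
    # UAS: the head is correct; LAS: the head and the relation are correct.
    uas_hits = sum(g[0] == p[0] for g, p in pairs)
    las_hits = sum(g == p for g, p in pairs)
    return 100 * uas_hits / len(gold), 100 * las_hits / len(gold)


gold = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (2, "AuxK")]
predicted = [(2, "Sb"), (0, "Pred"), (2, "Adv"), (3, "AuxK")]
print(attachment_scores(gold, predicted))
```

In the example, the third token has the right head but the wrong relation, and the fourth has the wrong head, so three of four heads and two of four labeled arcs are correct.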

EvaLatin 2020 Models

EvaLatin 2020 Models are distributed under the CC BY-NC-SA licence. The models are based solely on EvaLatin 2020 treebanks, and additionally use multilingual BERT.

The models require UDPipe 2.

Download

The latest version 200831 of the EvaLatin 2020 models can be downloaded from the LINDAT/CLARIN repository.

The models are also available in the REST service.

Acknowledgements

This work was supported by the grant no. GX20-16819X of the Grant Agency of the Czech Republic, and has been using language resources stored and distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).

The models were trained on EvaLatin 2020 treebanks.

Finally, multilingual BERT is used to provide contextualized word embeddings.

Publications

Model Performance

| Model | Dataset | UPOS | Lemma |
|---|---|---|---|
| latin-evalatin20-200830 | test classical | 96.73 | 96.39 |
| latin-evalatin20-200830 | test cross-genre | 90.47 | 86.89 |
| latin-evalatin20-200830 | test cross-time | 87.58 | 90.59 |

