Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

A comprehensive collection of multilingual datasets and large language models, meticulously curated for evaluating and enhancing the performance of large language models across diverse languages and tasks.

License

NotificationsYou must be signed in to change notification settings

zabir-nabil/awesome-multilingual-large-language-models

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 

Repository files navigation

logo

Awesome Multilingual Large Language Models

A comprehensive collection of multilingual datasets and large language models, meticulously curated for evaluating and enhancing the performance of large language models across diverse languages and tasks.

contributorslast updateforksstarsopen issues


Datasets

DatasetYearLanguagesGitHubDownload
Star
OMGEval : An Open Multilingual Generative Evaluation Benchmark for Large Language Models
2024Chinese (zh) (🇨🇳), Russian (ru) (🇷🇺), French (fr) (🇫🇷), Spanish (es) (🇪🇸), Arabic (ar) (🇸🇦)GithubData
Star
MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property
2024Chinese (zh) (🇨🇳), English (en) (🇬🇧), German (de) (🇩🇪), Japanese (ja) (🇯🇵), French (fr) (🇫🇷), Korean (ko) (🇰🇷), Russian (ru) (🇷🇺), Spanish (es) (🇪🇸), Portuguese (pt) (🇵🇹), Catalan (ca) (🇦🇩)GithubData
Star
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
2024English (en) (🇬🇧), Chinese (zh) (🇨🇳), Japanese (ja) (🇯🇵), French (fr) (🇫🇷), German (de) (🇩🇪)GithubData
Star
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
2023English (🇺🇸), Chinese (🇨🇳), Italian (🇮🇹), Portuguese (🇧🇷), Vietnamese (🇻🇳), Thai (🇹🇭), Swahili (🇰🇪), Afrikaans (🇿🇦), Javanese (🇮🇩)GithubData
Star
Language models are multilingual chain-of-thought reasoners
2023Bengali (🇧🇩), Chinese (🇨🇳), French (🇫🇷), German (🇩🇪), Japanese (🇯🇵), Russian (🇷🇺), Spanish (🇪🇸), Swahili (🇰🇪), Telugu (🇮🇳), Thai (🇹🇭)GithubData
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
2023English [🇬🇧], Russian [🇷🇺], Spanish [🇪🇸], German [🇩🇪], French [🇫🇷], Chinese [🇨🇳], Italian [🇮🇹], Portuguese [🇵🇹], Polish [🇵🇱], Japanese [🇯🇵], Vietnamese [🇻🇳], Dutch [🇳🇱], Arabic [🇸🇦], Turkish [🇹🇷], Czech [🇨🇿], Persian [🇮🇷], Hungarian [🇭🇺], Greek [🇬🇷], Romanian [🇷🇴], Swedish [🇸🇪], Ukrainian [🇺🇦], Finnish [🇫🇮], Korean [🇰🇷], Danish [🇩🇰], Bulgarian [🇧🇬], Norwegian [🇳🇴], Hindi [🇮🇳], Slovak [🇸🇰], Thai [🇹🇭], Lithuanian [🇱🇹], Catalan [🇪🇸], Indonesian [🇮🇩], Bangla [🇧🇩], Estonian [🇪🇪], Slovenian [🇸🇮], Latvian [🇱🇻], Hebrew [🇮🇱], Serbian [🇷🇸], Tamil [🇮🇳], Albanian [🇦🇱], Azerbaijani [🇦🇿]🤗Data
Star
Language models are multilingual chain-of-thought reasoners
2023Bengali (🇧🇩), Chinese (🇨🇳), French (🇫🇷), German (🇩🇪), Japanese (🇯🇵), Russian (🇷🇺), Spanish (🇪🇸), Swahili (🇰🇪), Telugu (🇮🇳), Thai (🇹🇭)GithubData
Wiki-40B: Multilingual Language Model Dataset2020English (🇺🇸), German (🇩🇪), French (🇫🇷), Russian (🇷🇺), Spanish (🇪🇸), Italian (🇮🇹), Japanese (🇯🇵), Chinese Simplified (🇨🇳), Chinese Traditional (🇹🇼), Polish (🇵🇱), Ukrainian (🇺🇦), Dutch (🇳🇱), Swedish (🇸🇪), Portuguese (🇵🇹), Serbian (🇷🇸), Hungarian (🇭🇺), Catalan (🇪🇸), Czech (🇨🇿), Finnish (🇫🇮), Arabic (🇸🇦), Korean (🇰🇷), Persian (🇮🇷), Norwegian (🇳🇴), Vietnamese (🇻🇳), Hebrew (🇮🇱), Indonesian (🇮🇩), Romanian (🇷🇴), Turkish (🇹🇷), Bulgarian (🇧🇬), Estonian (🇪🇪), Malay (🇲🇾), Danish (🇩🇰), Slovak (🇸🇰), Croatian (🇭🇷), Greek (🇬🇷), Lithuanian (🇱🇹), Slovenian (🇸🇮), Thai (🇹🇭), Hindi (🇮🇳), Latvian (🇱🇻), Filipino (🇵🇭)👁️Data
Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning2021English (🇺🇸), German (🇩🇪), French (🇫🇷), Russian (🇷🇺), Spanish (🇪🇸), Hindi (🇮🇳), Vietnamese (🇻🇳), Bulgarian (🇧🇬), Chinese (🇨🇳), Dutch (🇳🇱), Italian (🇮🇹), Japanese (🇯🇵), Polish (🇵🇱), Portuguese (🇵🇹), Arabic (🇸🇦), Swahili (🇹🇿), Urdu (🇵🇰)GitHubData
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset2022Akan (🇬🇭), Arabic (🇸🇦), Assamese (🇮🇳), Bambara (🇲🇱), Basque (🇪🇸), Bengali (🇧🇩), Catalan (🇪🇸), Chichewa (🇲🇼), chiShona (🇿🇼), Chitumbuka (🇲🇼), English (🇬🇧), Fon (🇧🇯), French (🇫🇷), Gujarati (🇮🇳), Hindi (🇮🇳), Igbo (🇳🇬), Indonesian (🇮🇩), isiXhosa (🇿🇦), isiZulu (🇿🇦), Kannada (🇮🇳), Kikuyu (🇰🇪), Kinyarwanda (🇷🇼), Kirundi (🇧🇮), Lingala (🇨🇩), Luganda (🇺🇬), Malayalam (🇮🇳), Marathi (🇮🇳), Nepali (🇳🇵), Northern Sotho (🇿🇦), Odia (🇮🇳), Portuguese (🇵🇹), Punjabi (🇮🇳), Sesotho (🇱🇸), Setswana (🇧🇼), Simplified Chinese (🇨🇳), Spanish (🇪🇸), Swahili (🇰🇪), Tamil (🇮🇳), Telugu (🇮🇳), Traditional Chinese (🇹🇼), Twi (🇬🇭), Urdu (🇵🇰), Vietnamese (🇻🇳), Wolof (🇸🇳), Xitsonga (🇿🇦), Yoruba (🇳🇬), Programming Languages (💻)GitHubData
GEOMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models2022English (🇺🇸), Chinese (🇨🇳), Hindi (🇮🇳), Persian (🇮🇷), Swahili (🇰🇪)GitHub🔍

Models

TitleYearLanguagesCodeDemo
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
2024Afrikaans [🇿🇦], Amharic [🇪🇹], Arabic [🇸🇦], Azerbaijani [🇦🇿], Belarusian [🇧🇾], Bengali [🇧🇩], Bulgarian [🇧🇬], Catalan [🇪🇸], Cebuano [🇵🇭], Czech [🇨🇿], Welsh [🏴], Danish [🇩🇰], German [🇩🇪], Greek [🇬🇷], English [🇬🇧], Esperanto [🇪🇸], Estonian [🇪🇪], Basque [🇪🇸], Finnish [🇫🇮], Tagalog [🇵🇭], French [🇫🇷], Western Frisian [🇳🇱], Scottish Gaelic [🏴], Irish [🇮🇪], Galician [🇪🇸], Gujarati [🇮🇳], Haitian Creole [🇭🇹], Hausa [🇳🇪], Hebrew [🇮🇱], Hindi [🇮🇳], Hungarian [🇭🇺], Armenian [🇦🇲], Igbo [🇳🇬], Indonesian [🇮🇩], Icelandic [🇮🇸], Italian [🇮🇹], Javanese [🇮🇩], Japanese [🇯🇵], Kannada [🇮🇳], Georgian [🇬🇪], Kazakh [🇰🇿], Khmer [🇰🇭], Kyrgyz [🇰🇬], Korean [🇰🇷], Kurdish [🇹🇷], Lao [🇱🇦], Latvian [🇱🇻], Latin [🇻🇦], Lithuanian [🇱🇹], Luxembourgish [🇱🇺], Malayalam [🇮🇳], Marathi [🇮🇳], Macedonian [🇲🇰], Malagasy [🇲🇬], Maltese [🇲🇹], Mongolian [🇲🇳], Maori [🇳🇿], Malay [🇲🇾], Burmese [🇲🇲], Nepali [🇳🇵], Dutch [🇳🇱], Norwegian [🇳🇴], Northern Sotho [🇿🇦], Chichewa [🇲🇼], Oriya [🇮🇳], Punjabi [🇮🇳], Persian [🇮🇷], Polish [🇵🇱], Portuguese [🇵🇹], Pashto [🇦🇫], Romanian [🇷🇴], Russian [🇷🇺], Sinhala [🇱🇰], Slovak [🇸🇰], Slovenian [🇸🇮], Samoan [🇼🇸], Shona [🇿🇼], Sindhi [🇵🇰], Somali [🇸🇴], Southern Sotho [🇱🇸], Spanish [🇪🇸], Albanian [🇦🇱], Serbian [🇷🇸], Sundanese [🇮🇩], Swahili [🇰🇪], Swedish [🇸🇪], Tamil [🇮🇳], Telugu [🇮🇳], Tajik [🇹🇯], Thai [🇹🇭], Turkish [🇹🇷], Twi [🇬🇭], Ukrainian [🇺🇦], Urdu [🇵🇰], Uzbek [🇺🇿], Vietnamese [🇻🇳], Xhosa [🇿🇦], Yiddish [🇮🇱], Yoruba [🇳🇬], Chinese [🇨🇳], Zulu [🇿🇦]Source🤗
Star
LANGBRIDGE: Multilingual Reasoning Without Multilingual Supervision
2024Arabic (ar) (🇸🇦), Bengali (bn) (🇧🇩), Chinese (zh) (🇨🇳), Danish (da) (🇩🇰), Dutch (nl) (🇳🇱), English (en) (🇬🇧), French (fr) (🇫🇷), German (de) (🇩🇪), Hindi (hi) (🇮🇳), Japanese (ja) (🇯🇵), Korean (ko) (🇰🇷), Marathi (mr) (🇮🇳), Punjabi (pa) (🇮🇳), Russian (ru) (🇷🇺), Spanish (es) (🇪🇸), Swahili (sw) (🇰🇪), Telugu (te) (🇮🇳), Turkish (tr) (🇹🇷), Urdu (ur) (🇵🇰)Github🤗
Star
Orion-14B: Open-source Multilingual Large Language Models
2024English [🇬🇧], Chinese [🇨🇳], Japanese [🇯🇵], Korean [🇰🇷], Spanish [🇪🇸], French [🇫🇷], German [🇩🇪], Arabic [🇸🇦]Github🤗
Star
Baichuan 2: Open Large-scale Language Models
2023Arabic (ar) (🇸🇦), Chinese (zh) (🇨🇳), English (en) (🇬🇧), French (fr) (🇫🇷), Russian (ru) (🇷🇺), Spanish (es) (🇪🇸), German (de) (🇩🇪), Japanese (ja) (🇯🇵)Github🤗
Star
Larger-Scale Transformers for Multilingual Masked Language Modeling
2021Afrikaans (🇿🇦), Albanian (🇦🇱), Amharic (🇪🇹), Arabic (🇸🇦), Armenian (🇦🇲), Assamese (🇮🇳), Azerbaijani (🇦🇿), Basque (🇪🇸), Belarusian (🇧🇾), Bengali (🇧🇩), Bengali Romanize (🇧🇩), Bosnian (🇧🇦), Breton (🏴), Bulgarian (🇧🇬), Burmese (🇲🇲), Burmese zawgyi font (🇲🇲), Catalan (🇪🇸), Chinese (Simplified) (🇨🇳), Chinese (Traditional) (🇹🇼), Croatian (🇭🇷), Czech (🇨🇿), Danish (🇩🇰), Dutch (🇳🇱), English (🇬🇧), Esperanto (🏴), Estonian (🇪🇪), Filipino (🇵🇭), Finnish (🇫🇮), French (🇫🇷), Galician (🇪🇸), Georgian (🇬🇪), German (🇩🇪), Greek (🇬🇷), Gujarati (🇮🇳), Hausa (🇳🇬), Hebrew (🇮🇱), Hindi (🇮🇳), Hindi Romanize (🇮🇳), Hungarian (🇭🇺), Icelandic (🇮🇸), Indonesian (🇮🇩), Irish (🇮🇪), Italian (🇮🇹), Japanese (🇯🇵), Javanese (🇮🇩), Kannada (🇮🇳), Kazakh (🇰🇿), Khmer (🇰🇭), Korean (🇰🇷), Kurdish (Kurmanji) (🇹🇷), Kyrgyz (🇰🇬), Lao (🇱🇦), Latin (🏛️), Latvian (🇱🇻), Lithuanian (🇱🇹), Macedonian (🇲🇰), Malagasy (🇲🇬), Malay (🇲🇾), Malayalam (🇮🇳), Marathi (🇮🇳), Mongolian (🇲🇳), Nepali (🇳🇵), Norwegian (🇳🇴), Oriya (🇮🇳), Oromo (🇪🇹), Pashto (🇦🇫), Persian (🇮🇷), Polish (🇵🇱), Portuguese (🇵🇹), Punjabi (🇮🇳), Romanian (🇷🇴), Russian (🇷🇺), Sanskrit (🇮🇳), Scottish Gaelic (🏴), Serbian (🇷🇸), Sindhi (🇵🇰), Sinhala (🇱🇰), Slovak (🇸🇰), Slovenian (🇸🇮), Somali (🇸🇴), Spanish (🇪🇸), Sundanese (🇮🇩), Swahili (🇰🇪), Swedish (🇸🇪), Tamil (🇮🇳), Tamil Romanize (🇮🇳), Telugu (🇮🇳), Telugu Romanize (🇮🇳), Thai (🇹🇭), Turkish (🇹🇷), Ukrainian (🇺🇦), Urdu (🇵🇰), Urdu Romanize (🇵🇰), Uyghur (🇨🇳), Uzbek (🇺🇿), Vietnamese (🇻🇳), Welsh (🏴), Western Frisian (🇳🇱), Xhosa (🇿🇦), Yiddish (🇮🇱)Github🔍
Star
InternLM: A Multilingual Language Model with Progressively Enhanced Capabilities
2023English (🇺🇸), Chinese (🇨🇳)Github🔍
PolyLM: An Open Source Polyglot Large Language Model
2023English (EN) [🇬🇧], Chinese (ZH) [🇨🇳], Russian (RU) [🇷🇺], Spanish (ES) [🇪🇸], German (DE) [🇩🇪], French (FR) [🇫🇷], Italian (IT) [🇮🇹], Portuguese (PT) [🇵🇹], Japanese (JA) [🇯🇵], Vietnamese (VI) [🇻🇳], Indonesian (ID) [🇮🇩], Polish (PL) [🇵🇱], Dutch (NL) [🇳🇱], Arabic (AR) [🇦🇪], Turkish (TR) [🇹🇷], Thai (TH) [🇹🇭], Hebrew (HE) [🇮🇱], Korean (KO) [🇰🇷]Model🔍
Star
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
2023Akan (🇬🇭), Arabic (🇸🇦), Assamese (🇮🇳), Bambara (🇲🇱), Basque (🇪🇸), Bengali (🇧🇩), Catalan (🇪🇸), Chichewa (🇲🇼), chiShona (🇿🇼), Chitumbuka (🇲🇼), English (🇬🇧), Fon (🇧🇯), French (🇫🇷), Gujarati (🇮🇳), Hindi (🇮🇳), Igbo (🇳🇬), Indonesian (🇮🇩), isiXhosa (🇿🇦), isiZulu (🇿🇦), Kannada (🇮🇳), Kikuyu (🇰🇪), Kinyarwanda (🇷🇼), Kirundi (🇧🇮), Lingala (🇨🇩), Luganda (🇺🇬), Malayalam (🇮🇳), Marathi (🇮🇳), Nepali (🇳🇵), Northern Sotho (🇿🇦), Odia (🇮🇳), Portuguese (🇵🇹), Punjabi (🇮🇳), Sesotho (🇱🇸), Setswana (🇧🇼), Simplified Chinese (🇨🇳), Spanish (🇪🇸), Swahili (🇰🇪), Tamil (🇮🇳), Telugu (🇮🇳), Traditional Chinese (🇹🇼), Twi (🇬🇭), Urdu (🇵🇰), Vietnamese (🇻🇳), Wolof (🇸🇳), Xitsonga (🇿🇦), Yoruba (🇳🇬), Programming Languages (💻)Github🤗
Star
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
2023hbs_Latn (🇭🇷), mal_Mlym (🇮🇳), aze_Latn (🇦🇿), guj_Gujr (🇮🇳), ben_Beng (🇮🇳), kan_Knda (🇮🇳), tel_Telu (🇮🇳), mlt_Latn (🇲🇹), fra_Latn (🇫🇷), spa_Latn (🇪🇸), eng_Latn (🇬🇧), fil_Latn (🇵🇭), nob_Latn (🇳🇴), rus_Cyrl (🇷🇺), deu_Latn (🇩🇪), tur_Latn (🇹🇷), pan_Guru (🇮🇳), mar_Deva (🇮🇳), por_Latn (🇵🇹), nld_Latn (🇳🇱), ara_Arab (🇸🇦), zho_Hani (🇨🇳), ita_Latn (🇮🇹), ind_Latn (🇮🇩), ell_Grek (🇬🇷), bul_Cyrl (🇧🇬), swe_Latn (🇸🇪), ces_Latn (🇨🇿), isl_Latn (🇮🇸), pol_Latn (🇵🇱), ron_Latn (🇷🇴), dan_Latn (🇩🇰), hun_Latn (🇭🇺), tgk_Cyrl (🇹🇯), srp_Latn (🇷🇸), fas_Arab (🇮🇷), ceb_Latn (🇵🇭), heb_Hebr (🇮🇱), hrv_Latn (🇭🇷), glg_Latn (🇪🇸), fin_Latn (🇫🇮), slv_Latn (🇸🇮), vie_Latn (🇻🇳), mkd_Cyrl (🇲🇰), slk_Latn (🇸🇰), nor_Latn (🇳🇴), est_Latn (🇪🇪), ltz_Latn (🇱🇺), eus_Latn (🇪🇸), lit_Latn (🇱🇹), kaz_Cyrl (🇰🇿), lav_Latn (🇱🇻), bos_Latn (🇧🇦), epo_Latn (🇺🇸), cat_Latn (🇪🇸), tha_Thai (🇹🇭), ukr_Cyrl (🇺🇦), tgl_Latn (🇵🇭), sin_Sinh (🇱🇰), gle_Latn (🇮🇪), hin_Deva (🇮🇳), kor_Hang (🇰🇷), ory_Orya (🇮🇳), urd_Arab (🇵🇰), swa_Latn (🇰🇪), sqi_Latn (🇦🇱), bel_Cyrl (🇧🇾), afr_Latn (🇿🇦), nno_Latn (🇳🇴), tat_Cyrl (🇷🇺), asm_Beng (🇮🇳), hil_Latn (🇵🇭), nso_Latn (🇿🇦), ibo_Latn (🇳🇬), kin_Latn (🇷🇼), tpi_Latn (🇵🇬), twi_Latn (🇬🇭), kir_Cyrl (🇰🇬), nep_Deva (🇳🇵), azj_Latn (🇦🇿), bcl_Latn (🇵🇭), xho_Latn (🇿🇦), cym_Latn (🏴), gaa_Latn (🇬🇭), ton_Latn (🇹🇴), tah_Latn (🇵🇫), lat_Latn (🇻🇦), srn_Latn (🇸🇷), ewe_Latn (🇬🇭), bem_Latn (🇿🇲), orm_Latn (🇪🇹), haw_Latn (🇺🇸), hmo_Latn (🇵🇬), kat_Geor (🇬🇪), pag_Latn (🇵🇭), loz_Latn (🇿🇲), fry_Latn (🇳🇱), mya_Mymr (🇲🇲), nds_Latn (🇩🇪), run_Latn (🇧🇮), pnb_Arab (🇵🇰), rar_Latn (🇨🇰), fij_Latn (🇫🇯), wls_Latn (🇼🇸), ckb_Arab (🇮🇶), ven_Latn (🇿🇦), zsm_Latn (🇲🇾), chv_Cyrl (🇷🇺), lua_Latn (🇨🇩), que_Latn (🇵🇪), sag_Latn (🇨🇫), guw_Latn (🇬🇼), bre_Latn (🇫🇷), toi_Latn (🇨🇫), pus_Arab (🇦🇫), che_Cyrl (🇷🇺), pis_Latn (🇸🇧), kon_Latn (🇨🇩), oss_Cyrl (🇷🇺), hyw_Armn (🇦🇲), iso_Latn (🇻🇺), nan_Latn (🇹🇼), lub_Latn (🇨🇩), lim_Latn (🇳🇱), tuk_Latn (🇹🇲), tir_Ethi (🇪🇹), tgk_Latn (🇹🇯), yua_Latn (🇲🇽), min_Latn (🇮🇩), lue_Latn (🇨🇩), khm_Khmr (🇰🇭), tum_Latn (🇲🇼), tll_Latn (🇳🇦), ekk_Latn (🇪🇪), lug_Latn (🇺🇬), niu_Latn (🇳🇺), tzo_Latn (🇲🇽), mah_Latn (🇲🇭), tvl_Latn (🇹🇻), jav_Latn (🇮🇩), hau_Latn (🇳🇬), som_Latn (🇸🇴), uzb_Latn (🇺🇿), sot_Latn (🇿🇦), uzb_Cyrl (🇺🇿), cos_Latn (🇫🇷), als_Latn (🇦🇱), amh_Ethi (🇪🇹), sun_Latn (🇮🇩), war_Latn (🇵🇭), div_Thaa (🇲🇻), yor_Latn (🇳🇬), fao_Latn (🇫🇴), uzn_Cyrl (🇺🇿), smo_Latn (🇼🇸), bak_Cyrl (🇷🇺), ilo_Latn (🇵🇭), tso_Latn (🇿🇦), mri_Latn (🇳🇿), hmn_Latn (🇺🇸), nau_Latn (🇳🇷), asm_Beng (🇮🇳), hil_Latn (🇵🇭), nso_Latn (🇿🇦), ibo_Latn (🇳🇬), kin_Latn (🇷🇼), tpi_Latn (🇵🇬), twi_Latn (🇬🇭), kir_Cyrl (🇰🇬), pap_Latn (🇳🇱), aze_Latn (🇦🇿), qvi_Latn (🇵🇪), cak_Latn (🇬🇹), kbp_Latn (🇧🇫), kri_Latn (🇸🇱), mau_Latn (🇲🇽), scn_Latn (🇮🇹), tyv_Cyrl (🇷🇺), ina_Latn (🇧🇪), btx_Latn (🇮🇩), nch_Latn (🇲🇽), ncj_Latn (🇲🇽), pau_Latn (🇵🇼), toj_Latn (🇲🇽), pcm_Latn (🇳🇬), dyu_Latn (🇧🇫), kss_Latn (🇳🇬), afb_Arab (🇸🇦), urh_Latn (🇳🇬), quc_Latn (🇬🇹), new_Deva (🇳🇵), yao_Latn (🇲🇼), ngl_Latn (🇲🇿), nyu_Latn (🇲🇿), kab_Latn (🇩🇿), tuk_Cyrl (🇹🇲), xmf_Geor (🇬🇪), ndc_Latn (🇲🇿), san_Deva (🇮🇳), nba_Latn (🇳🇬), bpy_Beng (🇮🇳), ncx_Latn (🇲🇽), qug_Latn (🇵🇪), rmn_Latn (🇮🇳), cjk_Latn (🇬🇹), arb_Arab (🇸🇦), kea_Latn (🇨🇻), mck_Latn (🇨🇩), arn_Latn (🇨🇱), pdt_Latn (🇩🇪), her_Latn (🇳🇦), tlh_Latn (🇺🇸), suz_Deva (🇮🇳), kat_Geor (🇬🇪), kmr_Cyrl (🇷🇺), gcr_Latn (🇬🇵), jbo_Latn (🇺🇸), tbz_Latn (🇵🇼), bam_Latn (🇲🇱), prk_Latn (🇸🇮), jam_Latn (🇯🇲), twx_Latn (🇹🇼), sme_Latn (🇫🇮), gom_Latn (🇮🇳), bum_Latn (🇨🇲), mgr_Latn (🇲🇼), ahk_Latn (🇵🇰), kur_Arab (🇮🇶), bas_Latn (🇨🇲), bin_Latn (🇳🇬), tsz_Latn (🇲🇽), sid_Latn (🇪🇹), diq_Latn (🇹🇷), srd_Latn (🇮🇹), tcf_Latn (🇲🇽), bzj_Latn (🇮🇳), udm_Cyrl (🇷🇺), cce_Latn (🇨🇲), meu_Latn (🇨🇩), chw_Latn (🇨🇲), cbk_Latn (🇵🇭), ibg_Latn (🇮🇩), bhw_Latn (🇮🇩), ngu_Latn (🇲🇽), nyy_Latn (🇹🇿), szl_Latn (🇵🇱), ish_Latn (🇹🇿), naq_Latn (🇳🇦), toh_Latn (🇳🇿), ttj_Latn (🇰🇪), nse_Latn (🇳🇬), ami_Latn (🇹🇼), alz_Latn (🇸🇩), apc_Arab (🇸🇾), vls_Latn (🇳🇱), mhr_Cyrl (🇷🇺), djk_Latn (🇩🇪), prs_Arab (🇦🇫), san_Latn (🇮🇳), som_Arab (🇸🇴), uig_Latn (🇨🇳), hau_Arab (🇳🇬)Github🔍
Star
Few-shot Learning with Multilingual Generative Language Models
2022English (🇺🇸), Russian (🇷🇺), Chinese (🇨🇳), German (🇩🇪), Spanish (🇪🇸), French (🇫🇷), Japanese (🇯🇵), Italian (🇮🇹), Portuguese (🇵🇹), Greek (🇬🇷), Romanian (🇷🇴), Ukrainian (🇺🇦), Hungarian (🇭🇺), Korean (🇰🇷), Polish (🇵🇱), Norwegian (🇳🇴), Dutch (🇳🇱), Finnish (🇫🇮), Danish (🇩🇰), Indonesian (🇮🇩), Croatian (🇭🇷), Turkish (🇹🇷), Arabic (🇸🇦), Vietnamese (🇻🇳), Thai (🇹🇭), Bulgarian (🇧🇬), Persian (🇮🇷), Swedish (🇸🇪), Malay (🇲🇾), Hebrew (🇮🇱), Czech (🇨🇿), Slovak (🇸🇰), Catalan (🇪🇸), Lithuanian (🇱🇹), Slovene (🇸🇮), Hindi (🇮🇳), Estonian (🇪🇪), Latvian (🇱🇻), Tagalog (🇵🇭), Albanian (🇦🇱), Serbian (🇷🇸), Azerbaijani (🇦🇿), Bengali (🇧🇩), Tamil (🇮🇳), Urdu (🇵🇰), Kazakh (🇰🇿), Armenian (🇦🇲), Georgian (🇬🇪), Icelandic (🇮🇸), Belarusian (🇧🇾), Bosnian (🇧🇦), Malayalam (🇮🇳), Macedonian (🇲🇰), Swahili (🇹🇿), Afrikaans (🇿🇦), Telugu (🇮🇳), Arabic Romanized (🇸🇦), Mongolian (🇲🇳), Latin (🇮🇹), Nepali (🇳🇵), Sinhalese (🇱🇰), Marathi (🇮🇳), Kannada (🇮🇳), Somali (🇸🇴), Welsh (🏴), Javanese (🇮🇩), Pashto (🇦🇫), Uzbek (🇺🇿), Gujarati (🇮🇳), Khmer (🇰🇭), Urdu Romanized (🇵🇰), Amharic (🇪🇹), Bengali Romanized (🇧🇩), Punjabi (🇮🇳), Galician (🇪🇸), Hausa (🇳🇬), Sanskrit (🇮🇳), Basque (🇪🇸), Burmese (🇲🇲), Sundanese (🇮🇩), Oriya (🇮🇳), Haitian (🇭🇹), Lao (🇱🇦), Kyrgyz (🇰🇬), Breton (🇫🇷), Irish (🇮🇪), Yoruba (🇳🇬), Esperanto (🌐), Tamil Romanized (🇮🇳), Zulu (🇿🇦), Tigrinya (🇪🇷), Telugu Romanized (🇮🇳), Kurdish (🇹🇷), Oromo (🇪🇹), Xhosa (🇿🇦), Scottish Gaelic (🇬🇧), Igbo (🇳🇬), Assamese (🇮🇳), Ganda (🇺🇬), Wolof (🇸🇳), Western Frisian (🇳🇱), Tswana (🇧🇼), Fula (🇸🇳), Guaraní (🇵🇾), Sindhi (🇵🇰), Lingala (🇨🇩), Bambara (🇲🇱), Inuktitut (🇨🇦), Kongo (🇨🇩), Quechua (🇵🇪), Swati (🇸🇿), Unassigned (🌐)Github🔍
Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions
2024English (🇺🇸), Chinese (🇨🇳), Telugu (🇮🇳), Hindi (🇮🇳), Arabic (🇸🇦), Swahili (🇹🇿), Bengali (🇧🇩)🔍🔍
Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning
2022Afrikaans (🇿🇦), Amharic (🇪🇹), Hausa (🇳🇬), Igbo (🇳🇬), Malagasy (🇲🇬), Chichewa (🇲🇼), Oromo (🇪🇹), Naija (🇳🇬), Kinyarwanda (🇷🇼), Kirundi (🇧🇮), Shona (🇿🇼), Somali (🇸🇴), Sesotho (🇱🇸), Swahili (🇹🇿), isiXhosa (🇿🇦), Yoruba (🇳🇬), isiZulu (🇿🇦), English (🇬🇧), French (🇫🇷), Arabic (🇸🇦), Lingala (🇨🇩), Luganda (🇺🇬), Luo (🇰🇪), Wolof (🇸🇳)GitHub🤗
MuRIL: Multilingual Representations for Indian Languages
2021Assamese (🇮🇳), Bengali (🇧🇩), Gujarati (🇮🇳), Hindi (🇮🇳), Kannada (🇮🇳), Kashmiri (🇮🇳), Malayalam (🇮🇳), Marathi (🇮🇳), Nepali (🇳🇵), Oriya (🇮🇳), Punjabi (🇮🇳), Sanskrit (🇮🇳), Sindhi (🇵🇰), Tamil (🇮🇳), Telugu (🇮🇳), Urdu (🇮🇳), English (🇬🇧)🔍🔍
From English to Foreign Languages: Transferring Pretrained Language Models
2020French (🇫🇷), Russian (🇷🇺), Arabic (🇦🇪), Chinese (🇨🇳), Hindi (🇮🇳), Vietnamese (🇻🇳)🔍🔍

Multilingual Vision Language Models

Survey / Review Papers

About

A comprehensive collection of multilingual datasets and large language models, meticulously curated for evaluating and enhancing the performance of large language models across diverse languages and tasks.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp