[Image: Overview of the website's homepage in December 2020]

| Type of site | Language recording tool, online linguistic media library |
|---|---|
| Available in | Multilingual |
| Owner | Wikimédia France |
| Created by | Wikimédia France and the Wikimedia community |
| URL | lingualibre |
| Advertising | No |
| Commercial | No |
| Registration | Optional, but required for recording |
| Launched | August 2016 |
| Current status | Active |
| Content license | Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
Lingua Libre is an online collaborative project and tool by the Wikimédia France association that aims to build a collaborative, multilingual, audiovisual speech corpus under a free license. It consists mainly of a rapid online recording service that lets a user chain hundreds of recordings. Contributors have produced content in more than 310 languages.
Lingua Libre enables the recording of words, phrases, or sentences in any language, whether spoken (audio recording) or signed (video recording).

Words are presented to the speaker as a list, which can be created on the spot, prepared in advance, or built by reusing an existing Wikimedia category. The speaker simply reads the word displayed on the screen, and the software moves on to the next word when it detects a silence after the word has been read.[1] This principle, borrowed from the open-source Shtooka recorder with the help of its creator, Nicolas Vion, makes it possible to record several hundred words per hour. The recordings are then uploaded automatically from the web client to the Wikimedia Commons media library.
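The silence-triggered advance described above can be illustrated with a minimal sketch. It is not Lingua Libre's or Shtooka's actual code: the function names, the frame size, the amplitude threshold, and the energy-based (RMS) approach are all assumptions chosen for illustration. The idea is to cut a stream of audio samples into utterances, closing each one once enough consecutive quiet frames follow speech.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_on_silence(
    samples: Sequence[float],
    frame_size: int = 4,           # hypothetical value; real tools use far larger frames
    threshold: float = 0.1,        # hypothetical loudness threshold
    min_silence_frames: int = 2,   # quiet frames needed to end an utterance
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, one per detected utterance."""
    segments: List[Tuple[int, int]] = []
    start = None        # index where the current utterance began
    last_loud_end = 0   # end index of the most recent loud frame
    quiet_run = 0       # consecutive quiet frames since last loud frame

    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        if rms(frame) >= threshold:
            if start is None:
                start = i
            last_loud_end = i + len(frame)
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                # Enough silence: close the utterance, "move to next word".
                segments.append((start, last_loud_end))
                start = None
                quiet_run = 0

    # Speech that runs to the end of the stream still counts.
    if start is not None:
        segments.append((start, last_loud_end))
    return segments


# Two "words" separated by silence (synthetic amplitudes, not real audio).
stream = [0.0] * 4 + [0.5] * 8 + [0.0] * 8 + [0.6] * 4 + [0.0] * 8
print(split_on_silence(stream))  # [(4, 12), (20, 24)]
```

Requiring several quiet frames in a row, rather than a single one, keeps a short pause inside a word or phrase from ending the recording early; the right duration in practice would depend on the speaker and the microphone.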
In spring 2021, Lingua Libre was offline due to a fire in Strasbourg,[2] but no audio recordings were lost.[3]
The recordings can be consulted either on Lingua Libre or on Commons. They are mainly used on other Wikimedia projects, for example to illustrate entries on Wiktionaries or proper nouns in Wikipedia articles.[1]
The re-use of the recordings in a language teaching context is envisaged. Language learners can freely download pronunciations and use them in GoldenDict, a popular dictionary program.[4] The audio recordings can thus serve as "pronunciation dictionaries" in GoldenDict without an internet connection.
The recordings are also reused in natural language processing projects, for example to drive Mozilla's DeepSpeech speech recognition engines.[5]
Lingua Libre was initiated on January 23, 2015[6] and has had three successive versions:
As part of the Languages of France project, which aims to document and promote the regional languages of France on Wikimedia projects and on the Internet in general, the conception of Lingua Libre started in November 2015, partly funded by the DGLFLF (General Delegation for the French language and the languages of France). The first version of the project was launched in August 2016. Suitable only for audio recording, Lingua Libre was shown during a workshop on the Occitan language in December 2016,[7][8] and then presented to the online Wikimedia community[9] and at international events in 2017.
A complete rebuild was launched at the end of 2017. The new version of Lingua Libre is based on MediaWiki and uses Wikibase and OAuth to integrate better into the Wikimedia environment. The interface is translated via Translatewiki.net so that the project can be used by a large number of communities. The new version of the site was ready in June 2018[10] and opened to the public in August 2018.
In 2020, important changes were made to the platform: a new look was developed specifically for the site, the .org domain replaced the .fr domain used until then,[11] and support for sign languages was added through video recording.
In the first two years after the project's launch, approximately 10,000 recordings were made. The transition to version 2 was accompanied by a sharp increase in contributions: the number of recordings grew tenfold in less than a year, exceeding the 100,000 threshold in May 2019. These recordings were made by 127 speakers in almost 50 languages.[12] By September 2020, the platform had more than 300,000 recordings in 90 languages from more than 350 speakers. The 500,000-recording milestone was reached in June 2021, thanks to 540 speakers of 120 languages.[13]