HubTou/dict-fr-AU-DELA
pip install dict-fr-AU-DELA
Starting from the original inflected form DELA French dictionary, provided by the former Laboratoire d'Automatique Documentaire et Linguistique (LADL), now integrated into the Institut Gaspard Monge (IGM) of the Université Gustave Eiffel, this repository contains:
- modified dictionary data, publicly editable here;
- a Python package gathering the results for exploitation by other tools.
The selected original dictionary is the inflected form DELA French dictionary in UTF-16 LE encoding, dated March 16, 2006, with 683,824 simple entries for 102,073 different lemmas and 108,436 compound entries for 83,604 different lemmas.
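The encoding change from the original file to the modified one can be illustrated with a minimal sketch. It assumes the input bytes are UTF-16 LE, possibly starting with a byte order mark and using Windows-style line endings; the function name `utf16le_to_utf8` is hypothetical and is not part of this package.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Convert UTF-16 LE bytes to UTF-8 with Unix-style end of lines."""
    text = raw.decode("utf-16-le")
    # The "utf-16-le" codec keeps a leading byte order mark as U+FEFF; drop it.
    text = text.lstrip("\ufeff")
    # Normalize Windows-style line endings (assumed in the input) to "\n".
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")
```

The conversion works on bytes rather than on file paths, so it can be applied to data read from any source.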
All files are installed in Python's /usr/local equivalent, under share/dict.
Filename | Description |
---|---|
dict-fr-AU-DELA-License | Lesser General Public License For Linguistic Resources |
Filename | Description |
---|---|
dict-fr-AU-DELA | Modified inflected form DELA French dictionary in UTF-8 encoding and Unix-style end of lines |
dict-fr-AU-DELA.ascii | French words and compound words list (unaccented) |
dict-fr-AU-DELA.unicode | French words and compound words list (accented), 742,889 entries |
dict-fr-AU-DELA.combined | French words and compound words list (with both accented and unaccented words) |
dict-fr-AU-DELA-proper_nouns.ascii | French proper nouns list (unaccented, sometimes compounded) |
dict-fr-AU-DELA-proper_nouns.unicode | French proper nouns list (accented, sometimes compounded), 823 entries |
dict-fr-AU-DELA-proper_nouns.combined | French proper nouns list (with both accented and unaccented words, sometimes compounded) |
dict-fr-AU-DELA-common-words.ascii | French common words list (unaccented) |
dict-fr-AU-DELA-common-words.unicode | French common words list (accented), 641,759 entries |
dict-fr-AU-DELA-common-words.combined | French common words list (with both accented and unaccented words) |
dict-fr-AU-DELA-common-compound-words.ascii | French common compound words list (unaccented) |
dict-fr-AU-DELA-common-compound-words.unicode | French common compound words list (accented), 100,320 entries |
dict-fr-AU-DELA-common-compound-words.combined | French common compound words list (with both accented and unaccented words) |
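As an illustration of how another tool might pick up one of these files, here is a minimal sketch. It assumes that "Python's /usr/local equivalent" corresponds to `sys.prefix` (this may differ for user-level installs) and that the word lists hold one entry per line; `dictionary_path` and `read_words` are hypothetical helpers, not functions provided by the package.

```python
import os
import sys


def dictionary_path(name, prefix=sys.prefix):
    """Build the expected location of an installed dictionary file.

    Assumption: files live under <prefix>/share/dict, with prefix = sys.prefix.
    """
    return os.path.join(prefix, "share", "dict", name)


def read_words(lines):
    """Return the set of non-empty entries, assuming one entry per line."""
    return {line.strip() for line in lines if line.strip()}


if __name__ == "__main__":
    path = dictionary_path("dict-fr-AU-DELA.unicode")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            words = read_words(handle)
        print(len(words), "distinct entries loaded from", path)
    else:
        print("Not found:", path)
```

Passing `prefix` explicitly makes it easy to point the lookup at another installation root if the default guess is wrong.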
Besides manual edits, the generated files (all but the dict-fr-AU-DELA file itself) went through the following transformations:
- removal of escape backslashes
- removal of lemma and grammatical info from dict-fr-AU-DELA
- lossless conversion of accents for the *.ascii versions
- combination of the *.ascii and *.unicode versions into the *.combined ones (without duplicates)
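The transformations listed above can be sketched roughly as follows. This is not the script actually used to generate the files. It assumes DELA-style lines of the form `inflected form,lemma.grammatical info`, with backslashes escaping special characters, and it assumes accents are converted by dropping combining marks and expanding the œ/æ ligatures. All function names are hypothetical.

```python
import re
import unicodedata

# Assumed handling of ligatures, which Unicode decomposition leaves untouched.
LIGATURES = str.maketrans({"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"})


def inflected_form(line):
    """Keep the text before the first unescaped comma (drops lemma and info)."""
    return re.split(r"(?<!\\),", line.rstrip("\n"), maxsplit=1)[0]


def unescape(text):
    """Remove escape backslashes, keeping the character that follows each one."""
    return re.sub(r"\\(.)", r"\1", text)


def strip_accents(word):
    """Return an unaccented version of the word."""
    decomposed = unicodedata.normalize("NFD", word.translate(LIGATURES))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def combine(ascii_words, unicode_words):
    """Merge both lists without duplicates, sorted by code point."""
    return sorted(set(ascii_words) | set(unicode_words))
```

For example, under these assumptions, `unescape(inflected_form("mangées,manger.V:Kfp"))` yields `mangées`, and `strip_accents` turns it into `mangees`.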
spell(1)-like tools, anagram(6), conjuguer(1)
DELA means "Dictionnaire Electronique du LADL" (LADL's electronic dictionary). These dictionaries were initiated by the lab's founder, Maurice Gross.
This modified version of the original DELA dictionary was necessary because our PNU project's conjuguer command made it clear that there were errors in some verb conjugations.
It was naturally called AU-DELA, a pun meaning "beyond DELA" ("au-delà" is French for "beyond").
I wrote a history of Unix & French dictionaries (in French only), which covers this dictionary and many others.
The original contents, as well as this package, are licensed under the Lesser General Public License For Linguistic Resources.
Laboratoire d'Automatique Documentaire et Linguistique (LADL) for the original contents.
Hubert Tournier for the package and some initial changes.
The GitHub community for further changes.