Resources for morphological analysis of Portuguese
LR-POR/MorphoBr
- LICENSE
- see LICENSE file
Unless otherwise stated, the present resources are derived from the dictionaries of the:
- Unitex-PB project
- http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html
- Freeling Portuguese data
- https://github.com/TALP-UPC/FreeLing/tree/master/data/pt/dictionary/entries
- which is itself derived from LABEL-LEX
- using unpublished data from Garcia, Marcos et al.: available from commit a0ebdbc49603564227778
Entries are separated by class, each in its own directory. Entries are divided into small files, so that they can be previewed on GitHub and processed on less powerful computers. The repository structure is meant for development: if you would like a copy of the data, you should go to our releases page.
Code (and its documentation) employed in the resource’s development can be found in the tools/ directory.
The tagset used in the above projects was converted to a more mnemonic one, which generally follows the notational conventions of the descriptive linguistics and finite-state morphology literatures; see the TAGSET file.
In order to use MorphoBr resources for morphological analysis in the context of syntactic parsing (e.g. with PorGram), they must be compiled into finite-state transducers. The easiest way is to load and compile them with Foma:
    % ./compile.sh
    % foma -f compile.foma
    % echo "fortinho" | flookup -i morphobr.bin
    fortinho	forte+N+DIM+M+SG
    fortinho	forte+A+DIM+M+SG
    % echo "comprei-o" | flookup -i morphobr.bin
    comprei-o	comprar+V+ele.ACC.3.M.SG+PRF+1+SG
For that, you need to:
- change the MAX_STACK value in the int_stack.c file in the Foma source code to at least 9097152, then compile Foma.
See mhulden/foma#146 and mhulden/foma#130.
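The analyses printed by flookup in the example above can be split into a lemma and a list of tags for downstream processing. The following is a minimal sketch, not part of MorphoBr: the names `Analysis` and `parse_flookup_line` are hypothetical, and it assumes that flookup separates the surface form from the analysis with a tab, that the lemma itself contains no `+`, and that unrecognized words are reported as `+?`.

```python
from typing import List, NamedTuple, Optional


class Analysis(NamedTuple):
    """One analysis of a surface form (hypothetical helper type)."""
    surface: str
    lemma: str
    tags: List[str]


def parse_flookup_line(line: str) -> Optional[Analysis]:
    """Parse one flookup output line into an Analysis.

    Returns None for blank lines, malformed lines, and lines whose
    analysis is "+?" (assumed to mean that no analysis was found).
    """
    line = line.rstrip("\n")
    if not line.strip():
        return None
    parts = line.split("\t")
    if len(parts) < 2:
        return None
    surface, analysis = parts[0], parts[1]
    if analysis == "+?":
        return None
    # Assumption: the first "+"-separated field is the lemma,
    # and every following field is a tag.
    lemma, *tags = analysis.split("+")
    return Analysis(surface, lemma, tags)


if __name__ == "__main__":
    sample = [
        "fortinho\tforte+N+DIM+M+SG",
        "fortinho\tforte+A+DIM+M+SG",
        "comprei-o\tcomprar+V+ele.ACC.3.M.SG+PRF+1+SG",
    ]
    for raw in sample:
        print(parse_flookup_line(raw))
```

Clitic information such as `ele.ACC.3.M.SG` is kept as a single tag here; splitting it further on `.` would depend on the conventions described in the TAGSET file.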
Python modules as well as XFST and bash scripts for performing this task are available in the tools/ folder. Inadequacies and errors found in the source dictionaries were corrected using these tools. See the respective in-code documentation for a detailed description of the changes made to the source entries.
See our releases page.
See https://github.com/LFG-PTBR/MorphoBr/wiki#publicações. To cite this resource, please use http://www.periodicos.letras.ufmg.br/index.php/textolivre/article/view/14294 (first in the publications list).