André Pires edited this page Aug 16, 2017 · 36 revisions

This wiki documents the development process for my master's thesis, named Named entity extraction from Portuguese web text.

First, the HAREM dataset was used to perform NER with available tools, namely Stanford CoreNLP, NLTK, OpenNLP and spaCy. Repeated 10-fold cross-validation was used to evaluate all tools, and all results are available in this wiki. More info on the HAREM collection on its page.
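As a rough illustration of repeated 10-fold cross-validation, the sketch below generates the document splits using only the Python standard library. It is not the thesis code: the function name `repeated_kfold`, the number of repetitions, and the seed are assumptions made for the example.

```python
import random

def repeated_kfold(n_docs, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: `repeats` and `seed` are illustrative values,
    not the settings used in the thesis.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        indices = list(range(n_docs))
        rng.shuffle(indices)  # fresh shuffle for every repetition
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            # Everything outside the current fold is training data.
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, test_idx
```

Each tool would then be trained on `train_idx` and scored on `test_idx`, with the scores averaged over all folds and repetitions.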

After evaluating all tools with the baseline configuration, I performed a hyperparameter study for each tool, this time using repeated holdout cross-validation.
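Repeated holdout differs from k-fold in that each repetition draws one independent random train/test split. A minimal sketch follows, assuming a hypothetical `repeated_holdout` helper; the test fraction and repetition count are illustrative and not taken from the thesis.

```python
import random

def repeated_holdout(n_docs, test_fraction=0.3, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs, one random split per repetition.

    Hypothetical helper: `test_fraction` and `repeats` are assumed values.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_docs * test_fraction))
    for _ in range(repeats):
        indices = list(range(n_docs))
        rng.shuffle(indices)
        # The first n_test shuffled documents are held out for testing.
        yield indices[n_test:], indices[:n_test]
```

Because each hyperparameter setting needs only `repeats` training runs instead of `k * repeats`, this kind of split is cheaper when many configurations have to be compared.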

I manually annotated a subset of SIGARRA news, generating a Portuguese corpus of 905 annotated news items. Finally, I trained models with each tool on this dataset. More info on the SIGARRA News Corpus on its page.

Main repository folders

All tools were intended to be run on HAREM at four different entity levels:

  • Categories: use only categories
  • Types: use only types
  • Subtypes: use only subtypes
  • Filtered: use filtered categories (subset of categories)
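One way to picture the four levels is as four different ways of turning the same annotation into a training label. The sketch below is an assumption about how that mapping could look, not the repository's code: the `Annotation` tuple, the `label_for` function, the `"O"` fallback for entities lacking the requested level, and the contents of `FILTERED` are all hypothetical (the page does not list which categories the filtered subset contains).

```python
from typing import NamedTuple, Optional

class Annotation(NamedTuple):
    """One annotated entity with up to three levels of classification."""
    category: str
    type: Optional[str] = None
    subtype: Optional[str] = None

# Hypothetical subset; the actual filtered categories are not given here.
FILTERED = frozenset({"PESSOA", "LOCAL", "ORGANIZACAO"})

def label_for(ann: Annotation, level: str) -> str:
    """Return the label used for `ann` at the given entity level."""
    if level == "categories":
        return ann.category
    if level == "types":
        return ann.type or "O"       # assumed: no type means no entity
    if level == "subtypes":
        return ann.subtype or "O"    # assumed: no subtype means no entity
    if level == "filtered":
        return ann.category if ann.category in FILTERED else "O"
    raise ValueError(f"unknown level: {level}")
```

For example, `label_for(Annotation("LOCAL", "HUMANO", "PAIS"), "types")` returns `"HUMANO"`, while the same annotation at the `"categories"` level returns `"LOCAL"`.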
