scrapinghub/webstruct
Webstruct is a library for creating statistical NER systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, open hours, etc.) from web pages.
Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use HTML structure, and also to embed annotation results back into HTML.
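To illustrate what "features that use HTML structure" can mean, here is a minimal sketch that uses only the Python standard library. It is not webstruct's API: the names `TokenFeatureExtractor` and `extract_token_features`, and the feature keys, are hypothetical and chosen for illustration. Each text token is paired with the tag that encloses it and its nesting depth, which is the kind of information a plain-text NER system would not see.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TokenFeatureExtractor(HTMLParser):
    """Hypothetical helper: collects one feature dict per text token."""

    def __init__(self):
        super().__init__()
        self._stack = []   # currently open tags, outermost first
        self.tokens = []   # one feature dict per whitespace-separated token

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray closes.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self._stack[-1] if self._stack else None
        for word in data.split():
            self.tokens.append({
                "token": word,
                "parent_tag": parent,           # structural feature
                "depth": len(self._stack),      # structural feature
                "is_capitalized": word[:1].isupper(),  # ordinary text feature
            })


def extract_token_features(html):
    """Return a list of feature dicts for the tokens found in `html`."""
    parser = TokenFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    page = "<div><h1>Acme Corp</h1><p>Open <b>9am</b> daily</p></div>"
    for features in extract_token_features(page):
        print(features)
```

Feature dicts of this shape could then be fed to a sequence-labeling model; how webstruct itself represents tokens and features is described in its documentation.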
Read the docs for more info.
License is MIT.
- Source code: https://github.com/scrapinghub/webstruct
- Bug tracker: https://github.com/scrapinghub/webstruct/issues
To run tests, make sure tox is installed, then run tox from the source root.