Combine XPath, CSS Selectors and JSONPath for Web data extracting.
linw1995/data_extractor
Install the stable version from PyPI.
pip install "data-extractor[jsonpath-extractor]"  # for extracting JSON data
pip install "data-extractor[lxml]"  # for extracting HTML data
Or install the latest version from GitHub.
pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"
JSON extraction currently relies on optional dependencies.
Install one of them (for example, the jsonpath-extractor extra used in the install command above) to extract JSON data.
HTML (XML) extraction currently relies on the following optional dependencies:
- lxml for using XPath
- cssselect for using CSS selectors
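Because these dependencies are optional, it can help to check which ones are importable before choosing an extractor. The following is a minimal sketch using only the standard library. The helper name `available_html_backends` is hypothetical and not part of data_extractor; it only assumes that the lxml and cssselect packages are imported under those same module names.

```python
from importlib.util import find_spec


def available_html_backends():
    """Report which optional HTML/XML dependencies can be imported.

    Hypothetical helper for illustration; data_extractor does not provide it.
    """
    # find_spec returns None when a top-level module cannot be found.
    return {name: find_spec(name) is not None for name in ("lxml", "cssselect")}


if __name__ == "__main__":
    for name, installed in available_html_backends().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```

If either entry is reported as missing, installing the `data-extractor[lxml]` extra from the install step is the likely fix for XPath support.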
from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    name_ = Field(JSONExtractor("name"), name="name")
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
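To make clear what the declarative example above computes, here is a hand-written, standard-library-only equivalent. It is a sketch for comparison, not how data_extractor is implemented; the function name `extract_users` and the `default_age` parameter are hypothetical.

```python
def extract_users(document, default_age=17):
    """Plain-Python counterpart of the User/Count items (illustrative only)."""
    users = []
    # "data.users[*]" selects every element of document["data"]["users"].
    for raw in document["data"]["users"]:
        users.append(
            {
                # The name_ field is emitted under the key "name".
                "name": raw["name"],
                # A missing "age" falls back to the default, like default=17.
                "age": raw.get("age", default_age),
                # The nested Count item becomes a nested dict.
                "count": {
                    "followings": raw["countFollowings"],
                    "fans": raw["countFans"],
                },
            }
        )
    return users


sample = {
    "data": {
        "users": [
            {"name": "john", "age": 19, "countFollowings": 14, "countFans": 212},
            {"name": "jack", "description": "", "countFollowings": 54, "countFans": 312},
        ]
    }
}
result = extract_users(sample)
```

The declarative version keeps the paths and defaults next to the field names, which this imperative loop spreads across its body.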
Build
- Supports Python 3.13
Clone the source code from GitHub.
git clone https://github.com/linw1995/data_extractor.git
cd data_extractor
Set up the development environment. Please make sure the pdm, pre-commit and nox CLIs are installed in your environment.
make init
make PYTHON=3.7 init  # for a specific Python version
Use pre-commit to install the linters that enforce a consistent code style.
make pre-commit
Run linters. Some linters run via the nox CLI, so make sure it is installed.
make check-all
Run quick tests.
make
Run quick tests with verbose output.
make vtest
Run tests with coverage. Testing in multiple Python environments is powered by the nox CLI.
make cov