Combine XPath, CSS Selectors and JSONPath for Web data extracting.


linw1995/data_extractor



Quickstarts

Installation

Install the stable version from PyPI.

pip install "data-extractor[jsonpath-extractor]"  # for extracting JSON data
pip install "data-extractor[lxml]"  # for extracting HTML data

Or install the latest version from GitHub.

pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"

Extract JSON data

Extracting JSON data relies on optional dependencies, such as the jsonpath-extractor extra shown in the installation commands above. Install one of them to extract JSON data.

Extract HTML (XML) data

Extracting HTML (XML) data likewise relies on optional dependencies, such as the lxml extra shown in the installation commands above.
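Because both extraction modes depend on optional packages, it can help to check which ones are importable before running an extractor. The following is a minimal sketch using only the standard library. The helper check_backends is hypothetical and not part of data_extractor, and the import names jsonpath and lxml are assumptions that may need adjusting, since a PyPI package name and its importable module name can differ.

```python
import importlib.util

# Assumed import names for the optional dependencies (not confirmed by this
# README); adjust them to match what is installed in your environment.
OPTIONAL_BACKENDS = {
    "JSON (jsonpath-extractor)": "jsonpath",
    "HTML/XML (lxml)": "lxml",
}


def check_backends(backends=OPTIONAL_BACKENDS):
    """Map each label to True if its top-level module can be imported."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in backends.items()
    }


# Report the state of each optional backend.
for label, available in check_backends().items():
    print(f"{label}: {'available' if available else 'missing'}")
```

If a backend is reported as missing, installing the matching extra from the Installation section should make it available.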

Usage

from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    name_ = Field(JSONExtractor("name"), name="name")
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
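To spell out what the assertion above checks, the following sketch reproduces the same result with plain dictionary access and no dependency on data_extractor. It is an illustration of the expected output, not the library's implementation. The helpers extract_count, extract_user and extract_users are hypothetical names that mirror the Count and User items: each field reads one key, age falls back to 17 when the key is absent, and is_many=True corresponds to mapping over every element matched by data.users[*].

```python
# Plain-Python illustration of the result computed by the usage example.
# These helpers are hypothetical and are not data_extractor APIs.


def extract_count(user):
    # Mirrors Count: both fields are read from the same element as the user.
    return {"followings": user["countFollowings"], "fans": user["countFans"]}


def extract_user(user):
    return {
        "name": user["name"],          # the name_ field is exposed as "name"
        "age": user.get("age", 17),    # default=17 applies when "age" is absent
        "count": extract_count(user),  # nested Count item
    }


def extract_users(document):
    # Corresponds to JSONExtractor("data.users[*]") with is_many=True.
    return [extract_user(user) for user in document["data"]["users"]]


sample = {
    "data": {
        "users": [
            {"name": "john", "age": 19, "countFollowings": 14, "countFans": 212},
            {"name": "jack", "description": "", "countFollowings": 54, "countFans": 312},
        ]
    }
}
result = extract_users(sample)
```

The second user has no age key, which is why the expected output contains the default value 17 for jack.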

Changelog

v1.0.1

Build

  • Supports Python 3.13

Contributing

Environment Setup

Clone the source code from GitHub.

git clone https://github.com/linw1995/data_extractor.git
cd data_extractor

Set up the development environment. Please make sure the pdm, pre-commit and nox CLIs are installed in your environment.

make init
make PYTHON=3.7 init  # for a specific Python version

Linting

Use pre-commit to install the linters that enforce a consistent code style.

make pre-commit

Run the linters. Some of them run via the nox CLI, so make sure it is installed.

make check-all

Testing

Run quick tests.

make

Run quick tests with verbose output.

make vtest

Run tests with coverage. Testing in multiple Python environments is powered by the nox CLI.

make cov
