Library for exploring and validating machine learning data


tensorflow/data-validation


TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be highly scalable and to work well with TensorFlow and TensorFlow Extended (TFX).

TF Data Validation includes:

  • Scalable calculation of summary statistics of training and test data.
  • Integration with a viewer for data distributions and statistics, as well as faceted comparison of pairs of features (Facets).
  • Automated data-schema generation to describe expectations about data, such as required values, ranges, and vocabularies.
  • A schema viewer to help you inspect the schema.
  • Anomaly detection to identify anomalies, such as missing features, out-of-range values, or wrong feature types, to name a few.
  • An anomalies viewer so that you can see which features have anomalies and learn more in order to correct them.
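The features above form one workflow: compute statistics, infer a schema from them, then check new data against that schema. In TFDV itself the corresponding entry points are `tfdv.generate_statistics_from_csv`, `tfdv.infer_schema` and `tfdv.validate_statistics`. The sketch below does not use TFDV; it is a library-free illustration of the idea in plain Python, assuming scalar feature values and a deliberately simplified schema (type, required flag, numeric range or string vocabulary). The helpers `compute_stats`, `infer_schema` and `find_anomalies` are hypothetical, and for brevity the sketch checks individual records, whereas `tfdv.validate_statistics` compares statistics of the new data against the schema.

```python
from typing import Any, Dict, List, Optional, Tuple

Example = Dict[str, Any]  # one record: feature name -> scalar value


def compute_stats(examples: List[Example]) -> Dict[str, Any]:
    """Per-feature presence count, observed Python types and distinct values."""
    features: Dict[str, Dict[str, Any]] = {}
    for ex in examples:
        for name, value in ex.items():
            s = features.setdefault(name, {"count": 0, "types": set(), "values": set()})
            s["count"] += 1
            s["types"].add(type(value).__name__)
            s["values"].add(value)
    return {"num_examples": len(examples), "features": features}


def infer_schema(stats: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Derive simple expectations: type, whether required, numeric range or vocabulary."""
    schema: Dict[str, Dict[str, Any]] = {}
    for name, s in stats["features"].items():
        # A feature seen with several types gets no type constraint in this sketch.
        ftype: Optional[str] = next(iter(s["types"])) if len(s["types"]) == 1 else None
        spec: Dict[str, Any] = {"type": ftype, "required": s["count"] == stats["num_examples"]}
        if ftype in ("int", "float"):
            spec["range"] = (min(s["values"]), max(s["values"]))
        elif ftype == "str":
            spec["vocabulary"] = set(s["values"])
        schema[name] = spec
    return schema


def find_anomalies(
    examples: List[Example], schema: Dict[str, Dict[str, Any]]
) -> List[Tuple[int, str, str]]:
    """Check each record against the schema; returns (record index, feature, problem)."""
    anomalies: List[Tuple[int, str, str]] = []
    for i, ex in enumerate(examples):
        for name, spec in schema.items():
            if name not in ex:
                if spec["required"]:
                    anomalies.append((i, name, "missing feature"))
                continue
            value = ex[name]
            if spec["type"] is not None and type(value).__name__ != spec["type"]:
                anomalies.append((i, name, "wrong type"))
            elif "range" in spec and not spec["range"][0] <= value <= spec["range"][1]:
                anomalies.append((i, name, "out-of-range value"))
            elif "vocabulary" in spec and value not in spec["vocabulary"]:
                anomalies.append((i, name, "value not in vocabulary"))
    return anomalies


train = [
    {"age": 34, "country": "US"},
    {"age": 51, "country": "DE"},
    {"age": 27, "country": "US", "tip": 1.5},
]
serving = [
    {"age": 230, "country": "US"},   # age outside the range seen in training
    {"country": "FR"},               # 'age' missing, 'FR' never seen
    {"age": "41", "country": "DE"},  # 'age' has the wrong type
]
schema = infer_schema(compute_stats(train))
for anomaly in find_anomalies(serving, schema):
    print(anomaly)
```

Because `tip` appears in only one training record, the sketch marks it as optional, so its absence from the serving records is not reported.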

For instructions on using TFDV, see the get started guide and try out the example notebook. Some of the techniques implemented in TFDV are described in a technical paper published in SysML'19.

Installing from PyPI

The recommended way to install TFDV is using the PyPI package:

pip install tensorflow-data-validation

Nightly Packages

TFDV also hosts nightly packages at https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest nightly package, please use the following command:

pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-data-validation

This will install the nightly packages for the major dependencies of TFDV, such as TFX Basic Shared Libraries (TFX-BSL) and TensorFlow Metadata (TFMD).
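To confirm which versions of TFDV and of the dependencies named above were actually installed, the standard library can report them. This is a minimal sketch using `importlib.metadata` (Python 3.8+); the helper `installed_versions` is hypothetical, and `pip show <package>` gives the same information from the shell.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# PyPI distribution names of TFDV and two of its major dependencies.
PACKAGES = ("tensorflow-data-validation", "tfx-bsl", "tensorflow-metadata")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```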

Build with Docker

This is the recommended way to build TFDV under Linux, and is continuously tested at Google.

1. Install Docker

Please first install docker and docker-compose by following their respective installation directions.

2. Clone the TFDV repository

git clone https://github.com/tensorflow/data-validation
cd data-validation

Note that these instructions will install the latest master branch of TensorFlow Data Validation. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

Then, run the following at the project root:

sudo docker-compose build manylinux2010
sudo docker-compose run -e PYTHON_VERSION=${PYTHON_VERSION} manylinux2010

where PYTHON_VERSION is one of {37, 38}.

A wheel will be produced under dist/.
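The PYTHON_VERSION value in step 3 is the major and minor version digits of the target interpreter, so the two docker-compose commands can be derived from a Python version. This is a small sketch, assuming you pass the target version explicitly (it defaults to the interpreter running the script); `docker_python_version` and `build_commands` are hypothetical helpers, and the supported set is copied from the sentence above.

```python
import sys
from typing import List

# PYTHON_VERSION values listed above for the manylinux2010 Docker build.
SUPPORTED_PYTHON_VERSIONS = ("37", "38")


def docker_python_version(version_info: tuple = sys.version_info) -> str:
    """Map an interpreter version such as (3, 8, ...) to the PYTHON_VERSION value "38"."""
    value = f"{version_info[0]}{version_info[1]}"
    if value not in SUPPORTED_PYTHON_VERSIONS:
        raise ValueError(
            f"PYTHON_VERSION={value} is not one of {', '.join(SUPPORTED_PYTHON_VERSIONS)}"
        )
    return value


def build_commands(python_version: str) -> List[str]:
    """The two docker-compose invocations from step 3 with PYTHON_VERSION filled in."""
    return [
        "sudo docker-compose build manylinux2010",
        f"sudo docker-compose run -e PYTHON_VERSION={python_version} manylinux2010",
    ]
```

For example, `build_commands(docker_python_version((3, 8, 10)))` yields the commands for PYTHON_VERSION=38, while an unsupported version such as 3.11 raises `ValueError` before Docker is invoked.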

4. Install the pip package

pip install dist/*.whl

Build from source

1. Prerequisites

To compile and use TFDV, you need to set up some prerequisites.

Install NumPy

If NumPy is not installed on your system, install it now by following these directions.

Install Bazel

If Bazel is not installed on your system, install it now by following these directions.

2. Clone the TFDV repository

git clone https://github.com/tensorflow/data-validation
cd data-validation

Note that these instructions will install the latest master branch of TensorFlow Data Validation. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

The TFDV wheel is Python-version dependent: to build a pip package that works for a specific Python version, run the following with that version's Python binary:

python setup.py bdist_wheel

You can find the generated .whl file in the dist subdirectory.
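Because each wheel targets one Python version, dist/ can end up holding several builds, and `pip install dist/*.whl` would then pick up all of them. Wheel filenames encode the target interpreter (PEP 427: `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`), so the matching file can be selected by its Python tag. This is a sketch that assumes CPython tags such as `cp38`; `python_tag` and `matching_wheels` are hypothetical helpers, and pip itself already rejects wheels whose tags do not fit the running interpreter.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def python_tag(version_info: tuple = sys.version_info) -> str:
    """CPython wheel tag for an interpreter version, e.g. (3, 8, ...) -> "cp38"."""
    return f"cp{version_info[0]}{version_info[1]}"


def matching_wheels(filenames: Iterable[str], tag: str) -> List[str]:
    """Keep wheel filenames whose Python tag field contains `tag`."""
    matches: List[str] = []
    for name in filenames:
        if not name.endswith(".whl"):
            continue
        parts = name[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):  # 6 when an optional build tag is present
            continue  # not a well-formed wheel filename
        python_tags = parts[-3].split(".")  # compressed tag sets are "."-separated
        if tag in python_tags:
            matches.append(name)
    return matches


if __name__ == "__main__":
    wheels = [p.name for p in Path("dist").glob("*.whl")]
    print(matching_wheels(wheels, python_tag()))
```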

4. Install the pip package

pip install dist/*.whl

Supported platforms

TFDV is tested on the following 64-bit operating systems:

  • macOS 10.14.6 (Mojave) or later.
  • Ubuntu 16.04 or later.
  • Windows 7 or later.

Notable Dependencies

TensorFlow is required.

Apache Beam is required; it's the way that efficient distributed computation is supported. By default, Apache Beam runs in local mode but can also run in distributed mode using Google Cloud Dataflow and other Apache Beam runners.

Apache Arrow is also required. TFDV uses Arrow to represent data internally in order to make use of vectorized numpy functions.

Compatible versions

The following table shows the package versions that are compatible with each other. This is determined by our testing framework, but other untested combinations may also work.

tensorflow-data-validation | apache-beam[gcp] | pyarrow | tensorflow | tensorflow-metadata | tensorflow-transform | tfx-bsl
GitHub master | 2.32.0 | 2.0.0 | nightly (1.x/2.x) | 1.4.0 | n/a | 1.4.0
1.4.0 | 2.32.0 | 2.0.0 | 1.15 / 2.6 | 1.4.0 | n/a | 1.4.0
1.3.0 | 2.32.0 | 2.0.0 | 1.15 / 2.6 | 1.2.0 | n/a | 1.3.0
1.2.0 | 2.31.0 | 2.0.0 | 1.15 / 2.5 | 1.2.0 | n/a | 1.2.0
1.1.1 | 2.29.0 | 2.0.0 | 1.15 / 2.5 | 1.1.0 | n/a | 1.1.1
1.1.0 | 2.29.0 | 2.0.0 | 1.15 / 2.5 | 1.1.0 | n/a | 1.1.0
1.0.0 | 2.29.0 | 2.0.0 | 1.15 / 2.5 | 1.0.0 | n/a | 1.0.0
0.30.0 | 2.28.0 | 2.0.0 | 1.15 / 2.4 | 0.30.0 | n/a | 0.30.0
0.29.0 | 2.28.0 | 2.0.0 | 1.15 / 2.4 | 0.29.0 | n/a | 0.29.0
0.28.0 | 2.28.0 | 2.0.0 | 1.15 / 2.4 | 0.28.0 | n/a | 0.28.1
0.27.0 | 2.27.0 | 2.0.0 | 1.15 / 2.4 | 0.27.0 | n/a | 0.27.0
0.26.1 | 2.28.0 | 0.17.0 | 1.15 / 2.3 | 0.26.0 | 0.26.0 | 0.26.0
0.26.0 | 2.25.0 | 0.17.0 | 1.15 / 2.3 | 0.26.0 | 0.26.0 | 0.26.0
0.25.0 | 2.25.0 | 0.17.0 | 1.15 / 2.3 | 0.25.0 | 0.25.0 | 0.25.0
0.24.1 | 2.24.0 | 0.17.0 | 1.15 / 2.3 | 0.24.0 | 0.24.1 | 0.24.1
0.24.0 | 2.23.0 | 0.17.0 | 1.15 / 2.3 | 0.24.0 | 0.24.0 | 0.24.0
0.23.1 | 2.24.0 | 0.17.0 | 1.15 / 2.3 | 0.23.0 | 0.23.0 | 0.23.0
0.23.0 | 2.23.0 | 0.17.0 | 1.15 / 2.3 | 0.23.0 | 0.23.0 | 0.23.0
0.22.2 | 2.20.0 | 0.16.0 | 1.15 / 2.2 | 0.22.0 | 0.22.0 | 0.22.1
0.22.1 | 2.20.0 | 0.16.0 | 1.15 / 2.2 | 0.22.0 | 0.22.0 | 0.22.1
0.22.0 | 2.20.0 | 0.16.0 | 1.15 / 2.2 | 0.22.0 | 0.22.0 | 0.22.0
0.21.5 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.1 | 0.21.3
0.21.4 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.1 | 0.21.3
0.21.2 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.0 | 0.21.0
0.21.1 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.0 | 0.21.0
0.21.0 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.0 | 0.21.0
0.15.0 | 2.16.0 | 0.14.0 | 1.15 / 2.0 | 0.15.0 | 0.15.0 | 0.15.0
0.14.1 | 2.14.0 | 0.14.0 | 1.14 | 0.14.0 | 0.14.0 | n/a
0.14.0 | 2.14.0 | 0.14.0 | 1.14 | 0.14.0 | 0.14.0 | n/a
0.13.1 | 2.11.0 | n/a | 1.13 | 0.12.1 | 0.13.0 | n/a
0.13.0 | 2.11.0 | n/a | 1.13 | 0.12.1 | 0.13.0 | n/a
0.12.0 | 2.10.0 | n/a | 1.12 | 0.12.1 | 0.12.0 | n/a
0.11.0 | 2.8.0 | n/a | 1.11 | 0.9.0 | 0.11.0 | n/a
0.9.0 | 2.6.0 | n/a | 1.9 | n/a | n/a | n/a
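To compare an environment against the table, a few of its rows can be encoded as data and checked against installed versions. This is a sketch, not part of TFDV: `COMPATIBLE` copies only the 1.4.0, 1.3.0 and 1.2.0 rows (tensorflow is left out because the table lists alternative 1.x / 2.x versions), `mismatches` is a hypothetical helper, and since untested combinations may also work, a reported mismatch is a hint rather than an error. The `[gcp]` extra is not part of the distribution name, so apache-beam is looked up as `apache-beam`.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Rows copied from the compatibility table above.
COMPATIBLE: Dict[str, Dict[str, str]] = {
    "1.4.0": {"apache-beam": "2.32.0", "pyarrow": "2.0.0",
              "tensorflow-metadata": "1.4.0", "tfx-bsl": "1.4.0"},
    "1.3.0": {"apache-beam": "2.32.0", "pyarrow": "2.0.0",
              "tensorflow-metadata": "1.2.0", "tfx-bsl": "1.3.0"},
    "1.2.0": {"apache-beam": "2.31.0", "pyarrow": "2.0.0",
              "tensorflow-metadata": "1.2.0", "tfx-bsl": "1.2.0"},
}


def mismatches(tfdv_version: str, installed: Dict[str, Optional[str]]) -> List[str]:
    """List packages whose installed version differs from the tested one."""
    expected = COMPATIBLE.get(tfdv_version)
    if expected is None:
        return [f"no table row copied for tensorflow-data-validation {tfdv_version}"]
    problems: List[str] = []
    for package, want in expected.items():
        have = installed.get(package)
        if have != want:
            problems.append(f"{package}: tested with {want}, found {have or 'not installed'}")
    return problems


def _version_or_none(dist: str) -> Optional[str]:
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    tfdv_version = _version_or_none("tensorflow-data-validation")
    if tfdv_version is None:
        print("tensorflow-data-validation is not installed")
    else:
        deps = {name: _version_or_none(name) for name in COMPATIBLE.get(tfdv_version, {})}
        for line in mismatches(tfdv_version, deps) or ["all checked versions match"]:
            print(line)
```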

Questions

Please direct any questions about working with TF Data Validation to Stack Overflow using the tensorflow-data-validation tag.
