tensorflow/transform
TensorFlow Transform is a library for preprocessing data with TensorFlow. tf.Transform is useful for data that requires a full pass, such as:
- Normalize an input value by mean and standard deviation.
- Convert strings to integers by generating a vocabulary over all input values.
- Convert floats to integers by assigning them to buckets based on the observed data distribution.
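The three cases above share a two-phase shape: one pass over the whole dataset to compute statistics, then a per-example step that uses them. The following is a minimal sketch of that idea in plain Python, not the tf.Transform API; the function names `analyze` and `transform`, the field names, and the sample rows are all hypothetical.

```python
import bisect
import statistics


def analyze(rows):
    """Full pass: compute the dataset-wide statistics the transforms need."""
    ages = [r["age"] for r in rows]
    incomes = sorted(r["income"] for r in rows)
    # Vocabulary: each distinct string gets an integer id (sorted for determinism).
    vocab = {s: i for i, s in enumerate(sorted({r["city"] for r in rows}))}
    # Bucket boundaries: quartile cut points of the observed distribution.
    boundaries = statistics.quantiles(incomes, n=4)
    return {
        "age_mean": statistics.fmean(ages),
        "age_std": statistics.pstdev(ages),
        "vocab": vocab,
        "boundaries": boundaries,
    }


def transform(row, stats):
    """Per-example step: uses only the row plus the precomputed statistics."""
    return {
        "age_z": (row["age"] - stats["age_mean"]) / stats["age_std"],
        "city_id": stats["vocab"].get(row["city"], -1),  # -1 for unseen strings
        "income_bucket": bisect.bisect_right(stats["boundaries"], row["income"]),
    }


# Hypothetical sample data.
rows = [
    {"age": 20.0, "city": "paris", "income": 10.0},
    {"age": 30.0, "city": "tokyo", "income": 20.0},
    {"age": 40.0, "city": "paris", "income": 30.0},
    {"age": 50.0, "city": "lima", "income": 40.0},
]
stats = analyze(rows)
transformed = [transform(r, stats) for r in rows]
```

None of the three outputs can be computed from a single row alone, which is why a full pass is needed before any example is transformed.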
TensorFlow has built-in support for manipulations on a single example or a batch of examples. tf.Transform extends these capabilities to support full passes over the example data.
The output of tf.Transform is exported as a TensorFlow graph to use for training and serving. Using the same graph for both training and serving can prevent skew since the same transformations are applied in both stages.
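To illustrate why sharing one exported transformation helps, here is a small analogy in plain Python. It is only a sketch: tf.Transform exports a TensorFlow graph, whereas this example stands in a JSON string for the exported artifact, and `fit_scaler` and `apply_scaler` are hypothetical names.

```python
import json


def fit_scaler(values):
    """Analysis step, run once over the training data."""
    return {"min": min(values), "max": max(values)}


def apply_scaler(value, params):
    """The single transformation definition shared by training and serving."""
    span = params["max"] - params["min"]
    return (value - params["min"]) / span if span else 0.0


# Training time: analyze, transform, and export the fitted parameters.
train_values = [2.0, 4.0, 10.0]
params = fit_scaler(train_values)
train_features = [apply_scaler(v, params) for v in train_values]
exported = json.dumps(params)  # stands in for the exported artifact

# Serving time: load the exported parameters and reuse the same function.
served_params = json.loads(exported)
served_feature = apply_scaler(4.0, served_params)
```

Because serving reuses both the fitted parameters and the transformation code, the value 4.0 maps to the same feature in both stages. Reimplementing the scaling separately for serving is where skew would typically creep in.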
For an introduction to tf.Transform, see the tf.Transform section of the TFX Dev Summit talk on TFX (link).
The tensorflow-transform PyPI package is the recommended way to install tf.Transform:
pip install tensorflow-transform
To build from source, follow these steps. Create a virtual environment, clone the repository, and install the package by running the commands:
python -m venv <virtualenv_name>
source <virtualenv_name>/bin/activate
git clone https://github.com/tensorflow/transform.git
cd transform
pip install .
If you are doing development on the TFT repo, replace
pip install .
with
pip install -e .
The -e flag causes TFT to be installed in development mode.
TFT also hosts nightly packages at https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest nightly package, please use the following command:
pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tensorflow-transform
This will install the nightly packages for the major dependencies of TFT, such as TensorFlow Metadata (TFMD) and TFX Basic Shared Libraries (TFX-BSL).
To run TFT tests, run the following command from the root of the repository:
pytest
TensorFlow is required.
Apache Beam is required; it's the way that efficient distributed computation is supported. By default, Apache Beam runs in local mode but can also run in distributed mode using Google Cloud Dataflow and other Apache Beam runners.
Apache Arrow is also required. TFT uses Arrow to represent data internally in order to make use of vectorized NumPy functions.
The following table lists the tf.Transform package versions that are compatible with each other. This is determined by our testing framework, but other untested combinations may also work.
| tensorflow-transform | apache-beam[gcp] | pyarrow | tensorflow | tensorflow-metadata | tfx-bsl |
|---|---|---|---|---|---|
| GitHub master | 2.65.0 | 10.0.1 | nightly (2.x) | 1.17.1 | 1.17.1 |
| 1.17.0 | 2.65.0 | 10.0.1 | 2.17 | 1.17.1 | 1.17.1 |
| 1.16.0 | 2.60.0 | 10.0.1 | 2.16 | 1.16.1 | 1.16.1 |
| 1.15.0 | 2.47.0 | 10.0.0 | 2.15 | 1.15.0 | 1.15.1 |
| 1.14.0 | 2.47.0 | 10.0.0 | 2.13 | 1.14.0 | 1.14.0 |
| 1.13.0 | 2.41.0 | 6.0.0 | 2.12 | 1.13.1 | 1.13.0 |
| 1.12.0 | 2.41.0 | 6.0.0 | 2.11 | 1.12.0 | 1.12.0 |
| 1.11.0 | 2.41.0 | 6.0.0 | 1.15.5 / 2.10 | 1.11.0 | 1.11.0 |
| 1.10.0 | 2.40.0 | 6.0.0 | 1.15.5 / 2.9 | 1.10.0 | 1.10.0 |
| 1.9.0 | 2.38.0 | 5.0.0 | 1.15.5 / 2.9 | 1.9.0 | 1.9.0 |
| 1.8.0 | 2.38.0 | 5.0.0 | 1.15.5 / 2.8 | 1.8.0 | 1.8.0 |
| 1.7.0 | 2.36.0 | 5.0.0 | 1.15.5 / 2.8 | 1.7.0 | 1.7.0 |
| 1.6.1 | 2.35.0 | 5.0.0 | 1.15.5 / 2.8 | 1.6.0 | 1.6.0 |
| 1.6.0 | 2.35.0 | 5.0.0 | 1.15.5 / 2.7 | 1.6.0 | 1.6.0 |
| 1.5.0 | 2.34.0 | 5.0.0 | 1.15.2 / 2.7 | 1.5.0 | 1.5.0 |
| 1.4.1 | 2.33.0 | 4.0.1 | 1.15.2 / 2.6 | 1.4.0 | 1.4.0 |
| 1.4.0 | 2.33.0 | 4.0.1 | 1.15.2 / 2.6 | 1.4.0 | 1.4.0 |
| 1.3.0 | 2.31.0 | 2.0.0 | 1.15.2 / 2.6 | 1.2.0 | 1.3.0 |
| 1.2.0 | 2.31.0 | 2.0.0 | 1.15.2 / 2.5 | 1.2.0 | 1.2.0 |
| 1.1.1 | 2.29.0 | 2.0.0 | 1.15.2 / 2.5 | 1.1.0 | 1.1.1 |
| 1.1.0 | 2.29.0 | 2.0.0 | 1.15.2 / 2.5 | 1.1.0 | 1.1.0 |
| 1.0.0 | 2.29.0 | 2.0.0 | 1.15 / 2.5 | 1.0.0 | 1.0.0 |
| 0.30.0 | 2.28.0 | 2.0.0 | 1.15 / 2.4 | 0.30.0 | 0.30.0 |
| 0.29.0 | 2.28.0 | 2.0.0 | 1.15 / 2.4 | 0.29.0 | 0.29.0 |
| 0.28.0 | 2.28.0 | 2.0.0 | 1.15 / 2.4 | 0.28.0 | 0.28.1 |
| 0.27.0 | 2.27.0 | 2.0.0 | 1.15 / 2.4 | 0.27.0 | 0.27.0 |
| 0.26.0 | 2.25.0 | 0.17.0 | 1.15 / 2.3 | 0.26.0 | 0.26.0 |
| 0.25.0 | 2.25.0 | 0.17.0 | 1.15 / 2.3 | 0.25.0 | 0.25.0 |
| 0.24.1 | 2.24.0 | 0.17.0 | 1.15 / 2.3 | 0.24.0 | 0.24.1 |
| 0.24.0 | 2.23.0 | 0.17.0 | 1.15 / 2.3 | 0.24.0 | 0.24.0 |
| 0.23.0 | 2.23.0 | 0.17.0 | 1.15 / 2.3 | 0.23.0 | 0.23.0 |
| 0.22.0 | 2.20.0 | 0.16.0 | 1.15 / 2.2 | 0.22.0 | 0.22.0 |
| 0.21.2 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.3 |
| 0.21.0 | 2.17.0 | 0.15.0 | 1.15 / 2.1 | 0.21.0 | 0.21.0 |
| 0.15.0 | 2.16.0 | 0.14.0 | 1.15 / 2.0 | 0.15.0 | 0.15.0 |
| 0.14.0 | 2.14.0 | 0.14.0 | 1.14 | 0.14.0 | n/a |
| 0.13.0 | 2.11.0 | n/a | 1.13 | 0.12.1 | n/a |
| 0.12.0 | 2.10.0 | n/a | 1.12 | 0.12.0 | n/a |
| 0.11.0 | 2.8.0 | n/a | 1.11 | 0.9.0 | n/a |
| 0.9.0 | 2.6.0 | n/a | 1.9 | 0.9.0 | n/a |
| 0.8.0 | 2.5.0 | n/a | 1.8 | n/a | n/a |
| 0.6.0 | 2.4.0 | n/a | 1.6 | n/a | n/a |
| 0.5.0 | 2.3.0 | n/a | 1.5 | n/a | n/a |
| 0.4.0 | 2.2.0 | n/a | 1.4 | n/a | n/a |
| 0.3.1 | 2.1.1 | n/a | 1.3 | n/a | n/a |
| 0.3.0 | 2.1.1 | n/a | 1.3 | n/a | n/a |
| 0.1.10 | 2.0.0 | n/a | 1.0 | n/a | n/a |
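One way to compare a local environment against a row of the table is to read the installed versions with the standard library. The sketch below hard-codes the 1.17.0 row; the helper names are hypothetical, and it assumes the distribution name `apache-beam` for the `apache-beam[gcp]` column. A mismatch is not necessarily a problem, since untested combinations may also work.

```python
from importlib import metadata

# Row for tensorflow-transform 1.17.0, copied from the table above.
# Assumption: the "apache-beam[gcp]" column maps to the "apache-beam" distribution.
EXPECTED = {
    "apache-beam": "2.65.0",
    "pyarrow": "10.0.1",
    "tensorflow": "2.17",
    "tensorflow-metadata": "1.17.1",
    "tfx-bsl": "1.17.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def matches(installed, expected):
    """True if `installed` equals `expected` or extends it (2.17.1 matches 2.17)."""
    return installed is not None and (
        installed == expected or installed.startswith(expected + ".")
    )


def check(expected=EXPECTED, lookup=installed_version):
    """Map each package name to a tuple (installed, expected, ok)."""
    report = {}
    for name, want in expected.items():
        have = lookup(name)
        report[name] = (have, want, matches(have, want))
    return report


if __name__ == "__main__":
    for name, (have, want, ok) in check().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={have} tested={want} {status}")
```

The `lookup` parameter exists only so the comparison can be exercised without installing anything; by default it queries the current environment.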
Please direct any questions about working with tf.Transform to Stack Overflow using the tensorflow-transform tag.