In this blog, we will learn about TensorFlow Extended (TFX) components and libraries. TFX is a Google-production-scale machine learning (ML) platform based on TensorFlow. It provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor your machine learning system.
How did TensorFlow Extended (TFX) come about?
Since Google released TensorFlow publicly, its use in deep learning has expanded massively. Though it is flexible, it does not provide an end-to-end production system. Sibyl, an earlier Google machine learning platform, has end-to-end facilities but lacks flexibility. Google then came up with TensorFlow Extended (TFX) as a production-scale machine learning platform built on TensorFlow, taking advantage of the strengths of both TensorFlow and Sibyl.
You can install TFX via PyPI.
!pip install tfx
TFX Standard Components
A TFX pipeline typically includes the following components:
The ExampleGen TFX pipeline component is the entry point to your pipeline and ingests data. As inputs, ExampleGen supports out-of-the-box ingestion of external data sources such as CSV, TFRecord, Avro, and Parquet. As outputs, ExampleGen produces tf.Example or tf.SequenceExample records, highly efficient and performant dataset representations that downstream components can read consistently.
The StatisticsGen TFX pipeline component generates feature statistics over both training and serving data, which can be used by other pipeline components.
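To make the ingestion step concrete, here is a rough standard-library sketch of what an ExampleGen-style step does conceptually: parse CSV rows into records and assign each record to a split in a deterministic way. This is not the TFX API. The column names, the `assign_split` helper, and the 2:1 train/eval ratio are assumptions chosen for illustration.

```python
import csv
import hashlib
import io

# Hypothetical CSV payload; a real pipeline would read files from disk.
RAW_CSV = """trip_id,fare,payment_type
1,12.5,card
2,7.0,cash
3,30.25,card
4,5.5,cash
"""


def assign_split(key: str, train_buckets: int = 2, total_buckets: int = 3) -> str:
    """Hash a record key into a bucket so the split is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total_buckets
    return "train" if bucket < train_buckets else "eval"


def ingest(csv_text: str) -> dict:
    """Parse CSV rows into dict records grouped by split name."""
    splits = {"train": [], "eval": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        splits[assign_split(row["trip_id"])].append(row)
    return splits


splits = ingest(RAW_CSV)
```

Hashing the key, rather than shuffling at random, keeps a given record in the same split every time the pipeline runs.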
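As an illustration of what "feature statistics" means here, the following is a minimal standard-library sketch that summarizes each feature of a small training set and a small serving set. It does not use TFX or TensorFlow Data Validation. The `feature_statistics` helper and the sample records are hypothetical.

```python
import statistics


def feature_statistics(examples: list) -> dict:
    """Compute simple per-feature summaries over a list of dict examples."""
    stats = {}
    feature_names = {name for example in examples for name in example}
    for name in sorted(feature_names):
        values = [ex[name] for ex in examples if ex.get(name) is not None]
        summary = {"count": len(values), "missing": len(examples) - len(values)}
        if values and all(isinstance(v, (int, float)) for v in values):
            # Numeric feature: record range and mean.
            summary.update(min=min(values), max=max(values),
                           mean=statistics.fmean(values))
        else:
            # Anything else is treated as categorical in this sketch.
            summary["unique"] = len(set(values))
        stats[name] = summary
    return stats


train = [
    {"fare": 12.5, "payment_type": "card"},
    {"fare": 7.0, "payment_type": "cash"},
    {"fare": None, "payment_type": "card"},
]
serving = [{"fare": 9.0, "payment_type": "card"}]

train_stats = feature_statistics(train)
serving_stats = feature_statistics(serving)
```

Computing the same summaries for both datasets is what lets later steps compare training data against serving data.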
The TFX components use a description of your input data called a schema. The schema is an instance of schema.proto. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. Moreover, a SchemaGen pipeline component will automatically generate a schema by inferring types, categories, and ranges from the training data.
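The inference idea can be sketched in a few lines of plain Python. This is only a conceptual stand-in, assuming a simplified dict-based schema rather than the real schema.proto format. The `infer_schema` helper and its field names (`required`, `type`, `domain`) are hypothetical.

```python
def infer_schema(examples: list) -> dict:
    """Infer a type, a presence requirement, and a range or value domain per feature."""
    schema = {}
    names = {name for ex in examples for name in ex}
    for name in sorted(names):
        values = [ex[name] for ex in examples if ex.get(name) is not None]
        # A feature is required only if every example carried a value for it.
        entry = {"required": len(values) == len(examples)}
        if values and all(isinstance(v, (int, float)) for v in values):
            entry.update(type="numeric", min=min(values), max=max(values))
        else:
            entry.update(type="categorical", domain=sorted(set(values)))
        schema[name] = entry
    return schema


training_examples = [
    {"fare": 12.5, "payment_type": "card"},
    {"fare": 7.0, "payment_type": "cash"},
    {"fare": 30.25},  # payment_type missing, so it is inferred as optional
]
schema = infer_schema(training_examples)
```

An inferred schema like this is a starting point. In practice you would review it and widen or tighten the ranges by hand.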
The ExampleValidator pipeline component identifies anomalies in training and serving data. It can detect different classes of anomalies in the data. In addition, it can:
- Perform validity checks by comparing data statistics against a schema that codifies the expectations of the user.
- Detect training-serving skew by comparing training and serving data.
- Detect data drift by looking at a series of data.
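The first two checks in the list above can be sketched roughly as follows, using only the standard library. The dict-based `SCHEMA`, the `find_anomalies` and `linf_distance` helpers, and the sample data are all hypothetical. The L-infinity distance is one assumed way to compare two categorical distributions, not a statement of how ExampleValidator is implemented.

```python
from collections import Counter

# Hypothetical schema, in the spirit of one a SchemaGen-style step might infer.
SCHEMA = {
    "fare": {"type": "numeric", "required": True, "min": 0.0, "max": 100.0},
    "payment_type": {"type": "categorical", "required": True,
                     "domain": ["card", "cash"]},
}


def find_anomalies(examples, schema):
    """Return human-readable anomalies for values that violate the schema."""
    anomalies = []
    for index, example in enumerate(examples):
        for name, spec in schema.items():
            value = example.get(name)
            if value is None:
                if spec["required"]:
                    anomalies.append(f"example {index}: missing required feature '{name}'")
            elif spec["type"] == "numeric" and not spec["min"] <= value <= spec["max"]:
                anomalies.append(f"example {index}: '{name}'={value} out of range")
            elif spec["type"] == "categorical" and value not in spec["domain"]:
                anomalies.append(f"example {index}: '{name}'={value!r} not in domain")
    return anomalies


def linf_distance(train_values, serving_values):
    """L-infinity distance between two non-empty categorical value distributions."""
    train_freq, serving_freq = Counter(train_values), Counter(serving_values)
    keys = set(train_freq) | set(serving_freq)
    return max(
        abs(train_freq[k] / len(train_values) - serving_freq[k] / len(serving_values))
        for k in keys
    )


serving = [{"fare": 250.0, "payment_type": "card"}, {"payment_type": "voucher"}]
anomalies = find_anomalies(serving, SCHEMA)
skew = linf_distance(["card", "card", "cash", "cash"],
                     ["card", "card", "card", "cash"])
```

Drift detection follows the same pattern as the skew check, except that the two distributions come from consecutive spans of data instead of from training and serving.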
The Transform TFX pipeline component performs feature engineering on the tf.Example data artifact emitted from the ExampleGen component, using the data schema artifact produced by SchemaGen or imported from external sources.
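One way to picture this kind of feature engineering is as two phases: a full pass over the training data that collects constants (a mean, a standard deviation, a vocabulary), and a per-example function that applies them. The sketch below shows that pattern in plain Python. It is not the tf.Transform API, and the `analyze` and `transform` helpers and the feature names are assumptions.

```python
import statistics


def analyze(examples):
    """Full pass over the data: collect the constants the transform needs."""
    fares = [ex["fare"] for ex in examples]
    vocabulary = sorted({ex["payment_type"] for ex in examples})
    return {
        "fare_mean": statistics.fmean(fares),
        "fare_std": statistics.pstdev(fares),
        "payment_vocab": {token: index for index, token in enumerate(vocabulary)},
    }


def transform(example, constants):
    """Apply the same per-example transformation at training and serving time."""
    std = constants["fare_std"] or 1.0  # avoid dividing by zero for constant features
    return {
        "fare_scaled": (example["fare"] - constants["fare_mean"]) / std,
        # Out-of-vocabulary tokens map to -1 in this sketch.
        "payment_id": constants["payment_vocab"].get(example["payment_type"], -1),
    }


train = [
    {"fare": 10.0, "payment_type": "card"},
    {"fare": 20.0, "payment_type": "cash"},
]
constants = analyze(train)
transformed = [transform(ex, constants) for ex in train]
```

Because the constants are computed once and then reused, the same `transform` function can be applied to a serving request without looking at the training data again.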
The Trainer TFX pipeline component trains a TensorFlow model. It produces at least one model for inference and serving in the TensorFlow SavedModel format. A SavedModel contains a complete TensorFlow program, including weights and computation.
The Tuner component tunes the hyperparameters for the model. It is the newest TFX component and makes extensive use of the Python KerasTuner API for tuning hyperparameters. As inputs, the Tuner component takes the transformed data and transform graph artifacts. As output, it produces a hyperparameter artifact.
The Evaluator component uses the model created by the Trainer and the original input data artifact to perform a thorough analysis with the TensorFlow Model Analysis library.
InfraValidator is a TFX component that is used as an early warning layer before pushing a model to production. The name InfraValidator comes from the fact that it validates the model in the actual model serving infrastructure. If the Evaluator guarantees the performance of the model, InfraValidator guarantees that the model is mechanically fine.
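To show what a hyperparameter search produces, here is a small sketch that swaps in a plain grid search over a toy model instead of KerasTuner. Everything in it is an assumption for illustration: the `validation_loss` stand-in, the search space, and the choice of grid search itself.

```python
import itertools


def validation_loss(learning_rate: float, steps: int) -> float:
    """Toy stand-in for 'train, then score': fit y = 2x by gradient descent."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    weight = 0.0
    for _ in range(steps):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical search space.
search_space = {"learning_rate": [0.001, 0.01, 0.1], "steps": [5, 20]}


def grid_search(space):
    """Try every combination and keep the one with the lowest validation loss."""
    names = list(space)
    trials = []
    for combo in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        trials.append((validation_loss(**params), params))
    return min(trials, key=lambda trial: trial[0])


best_loss, best_hyperparameters = grid_search(search_space)
```

The `best_hyperparameters` dict plays the role of the hyperparameter artifact: a downstream training step would read it instead of hard-coding the values.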
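A rough idea of this kind of analysis: compute a metric over the whole evaluation set, compute it again per slice of the data, and compare it against a threshold. The sketch below does this in plain Python and does not use the TensorFlow Model Analysis API. The helper names, the sample rows, and the `MIN_ACCURACY` threshold are hypothetical.

```python
from collections import defaultdict


def accuracy(rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(row["label"] == row["prediction"] for row in rows) / len(rows)


def sliced_accuracy(rows, slice_key):
    """Accuracy computed separately for each distinct value of slice_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[slice_key]].append(row)
    return {value: accuracy(group) for value, group in groups.items()}


eval_rows = [
    {"payment_type": "card", "label": 1, "prediction": 1},
    {"payment_type": "card", "label": 0, "prediction": 0},
    {"payment_type": "cash", "label": 1, "prediction": 0},
    {"payment_type": "cash", "label": 0, "prediction": 0},
]

overall = accuracy(eval_rows)
by_payment = sliced_accuracy(eval_rows, "payment_type")

MIN_ACCURACY = 0.7  # hypothetical quality bar
model_is_validated = overall >= MIN_ACCURACY
```

Slicing matters because a model can clear the overall bar while still doing poorly on one subgroup, as the `cash` slice does in this toy data.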
The Pusher component is used to push a validated model to a deployment target during model training or re-training.
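Conceptually, a push step can be as simple as copying the model into a versioned directory, gated on the validation result. The sketch below illustrates that with the standard library only. The `push_model` helper, the timestamp-based version name, and the placeholder model file are assumptions, not TFX behavior.

```python
import shutil
import tempfile
import time
from pathlib import Path


def push_model(model_dir: Path, serving_root: Path, validated: bool):
    """Copy the model into a new versioned directory, but only if it was validated."""
    if not validated:
        return None
    version = str(int(time.time()))  # assumed versioning scheme: Unix timestamp
    destination = serving_root / version
    shutil.copytree(model_dir, destination)
    return destination


# Build a throwaway workspace with a stand-in for a trained model.
workspace = Path(tempfile.mkdtemp())
model_dir = workspace / "trained_model"
model_dir.mkdir()
(model_dir / "saved_model.pb").write_bytes(b"placeholder")

serving_root = workspace / "serving"
pushed = push_model(model_dir, serving_root, validated=True)
skipped = push_model(model_dir, serving_root, validated=False)
```

Writing each push to a fresh version directory leaves earlier versions in place, which makes rolling back a matter of pointing the server at an older directory.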
This diagram illustrates the flow of data between these components:
TFX includes both libraries and pipeline components. This diagram illustrates the relationships between TFX libraries and pipeline components:
TFX libraries and components cover a typical end-to-end machine learning pipeline, which starts with data ingestion and ends with model serving.
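The ingestion-to-serving flow can be viewed as a dependency graph in which each component runs once the components it consumes artifacts from have finished. The sketch below derives one valid execution order with Python's `graphlib`. The dependency map is a simplified reading of the component descriptions in this post, not an official TFX definition.

```python
from graphlib import TopologicalSorter

# Each component maps to the upstream components whose artifacts it consumes.
# This wiring is an assumption based on the descriptions above.
dependencies = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Tuner": {"Transform"},
    "Trainer": {"Transform", "Tuner"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "InfraValidator": {"Trainer"},
    "Pusher": {"Trainer", "Evaluator", "InfraValidator"},
}

# static_order() yields every component after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

Components with no path between them, such as Evaluator and InfraValidator, may appear in either order, which is also why an orchestrator is free to run them in parallel.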
In addition, the libraries shown above are available as Python packages and can be installed separately. But it’s advisable to just install TFX, which comes with all the components.
In this blog, we covered the basics of TensorFlow Extended (TFX) components and libraries.
Happy Learning 🙂