Getting to know TensorFlow Extended (TFX) components and libraries

In this blog, we will learn about TensorFlow Extended (TFX) components and libraries. TFX is a Google-production-scale machine learning (ML) platform based on TensorFlow. It provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor your machine learning system.

How did TensorFlow Extended (TFX) come about?

Since Google released TensorFlow, its use in deep learning has expanded massively. Though TensorFlow is flexible, it does not by itself provide an end-to-end production system. Sibyl, an earlier Google ML platform, offers end-to-end facilities but lacks flexibility. Google then came up with TensorFlow Extended (TFX) as a production-scale machine learning platform built on TensorFlow, taking advantage of both the TensorFlow and Sibyl frameworks.

Installing TFX

You can install TFX from PyPI. The leading ! runs the command from a notebook cell; omit it in a regular shell.

!pip install tfx

TFX Standard Components

A TFX pipeline is a sequence of components that implement an ML pipeline designed for scalable, high-performance machine learning tasks. This includes modeling, training, serving inference, and managing deployments to online, native mobile, and JavaScript targets.

A TFX pipeline typically includes the following components:

ExampleGen

The ExampleGen TFX pipeline component is the entry point to your pipeline: it ingests the data. As input, ExampleGen supports out-of-the-box ingestion of external data sources such as CSV, TFRecord, Avro, and Parquet. As output, it produces tf.Example or tf.SequenceExample records, an efficient dataset representation that downstream components can read consistently.

StatisticsGen

The StatisticsGen TFX pipeline component generates feature statistics over both training and serving data, which can be used by other pipeline components.
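
To make the idea of feature statistics concrete, here is a minimal pure-Python sketch. It is not the TFX or TensorFlow Data Validation API: the helper name summarize and the toy records are made up for illustration, and StatisticsGen computes far richer statistics at scale. The sketch reports, per feature, how many values are present or missing, plus either numeric min/max/mean or the most frequent values.

```python
from collections import Counter


def summarize(examples):
    """Compute simple per-feature statistics over a list of dict records.

    Hypothetical helper for illustration only; not part of TFX.
    """
    features = sorted({name for ex in examples for name in ex})
    stats = {}
    for name in features:
        # Treat an absent key and an explicit None as "missing".
        values = [ex[name] for ex in examples if ex.get(name) is not None]
        entry = {"count": len(values), "missing": len(examples) - len(values)}
        if values and all(isinstance(v, (int, float)) for v in values):
            entry.update(min=min(values), max=max(values),
                         mean=sum(values) / len(values))
        else:
            entry["top_values"] = Counter(values).most_common(3)
        stats[name] = entry
    return stats


train = [
    {"age": 34, "country": "US"},
    {"age": 51, "country": "IN"},
    {"age": None, "country": "US"},
]
train_stats = summarize(train)
print(train_stats["age"])
print(train_stats["country"])
```

Running the same function over serving data would give a second set of statistics that can be compared with the training ones.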

SchemaGen

TFX components use a description of your input data called a schema. The schema is an instance of schema.proto. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. The SchemaGen pipeline component automatically generates a schema by inferring types, categories, and ranges from the training data.
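
As a rough illustration of what "inferring types, categories, and ranges" means, the following pure-Python sketch derives a toy schema from a handful of records. The function infer_schema and the dictionary layout are assumptions made for this example; the real schema is a schema.proto message and is produced by SchemaGen, not by code like this.

```python
def infer_schema(examples):
    """Infer a toy schema: type, required flag, and range or domain per feature.

    Hypothetical helper for illustration only; not the SchemaGen algorithm.
    """
    schema = {}
    names = sorted({name for ex in examples for name in ex})
    for name in names:
        values = [ex[name] for ex in examples if ex.get(name) is not None]
        # A feature is "required" if every example carries a value for it.
        required = len(values) == len(examples)
        if values and all(isinstance(v, (int, float)) for v in values):
            schema[name] = {"type": "number", "required": required,
                            "min": min(values), "max": max(values)}
        else:
            schema[name] = {"type": "category", "required": required,
                            "domain": sorted(set(values))}
    return schema


train = [
    {"age": 34, "country": "US"},
    {"age": 51, "country": "IN"},
    {"country": "US"},  # age is absent here, so it is inferred as optional
]
schema = infer_schema(train)
print(schema)
```

An inferred schema is a starting point; in practice you would review it and adjust ranges or domains by hand where the training data is not representative.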

ExampleValidator

The ExampleValidator pipeline component identifies anomalies in training and serving data. It can detect different classes of anomalies in the data. For example, it can:

Perform validity checks by comparing data statistics against a schema that codifies the expectations of the user.
Detect training-serving skew by comparing training and serving data.
Detect data drift by looking at a series of data.
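
The first two checks can be sketched in plain Python. This is a toy illustration under stated assumptions, not the ExampleValidator implementation: the SCHEMA dictionary, find_anomalies, and linf_distance are hypothetical names, and the skew measure used here (the largest difference in relative frequency of any category, an L-infinity distance) is just one reasonable choice for categorical features.

```python
from collections import Counter

# Hypothetical hand-written schema; in TFX this would come from SchemaGen.
SCHEMA = {
    "age": {"type": "number", "required": True, "min": 0, "max": 120},
    "country": {"type": "category", "required": True, "domain": ["IN", "US"]},
}


def find_anomalies(example, schema):
    """Return a list of human-readable schema violations for one example."""
    anomalies = []
    for name, spec in schema.items():
        value = example.get(name)
        if value is None:
            if spec["required"]:
                anomalies.append(f"{name}: missing required feature")
            continue
        if spec["type"] == "number" and not spec["min"] <= value <= spec["max"]:
            anomalies.append(f"{name}: {value} outside [{spec['min']}, {spec['max']}]")
        if spec["type"] == "category" and value not in spec["domain"]:
            anomalies.append(f"{name}: unexpected value {value!r}")
    return anomalies


def linf_distance(train_values, serving_values):
    """Largest absolute difference in relative frequency of any category."""
    train, serving = Counter(train_values), Counter(serving_values)
    return max(
        abs(train[k] / len(train_values) - serving[k] / len(serving_values))
        for k in set(train) | set(serving)
    )


bad = find_anomalies({"age": 150, "country": "FR"}, SCHEMA)
skew = linf_distance(["US", "US", "IN", "IN"], ["US", "US", "US", "IN"])
print(bad)
print(skew)
```

A validator would flag skew when this distance exceeds a threshold you choose. Drift detection follows the same pattern, except that the two inputs are consecutive spans of data instead of training and serving data.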

Transform

The Transform TFX pipeline component performs feature engineering on the tf.Example data artifact emitted by the ExampleGen component, using the data schema artifact produced by SchemaGen or imported from external sources.
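
Feature engineering of this kind usually has two phases: a full pass over the training data to compute constants (means, vocabularies), then a per-example step that applies them identically at training and serving time. The sketch below shows that split in plain Python. It does not use the TensorFlow Transform API; analyze, transform, and the feature names are invented for the example.

```python
import statistics


def analyze(examples):
    """Full pass over training data: collect the constants the transform needs."""
    ages = [ex["age"] for ex in examples]
    return {
        "age_mean": statistics.fmean(ages),
        "age_std": statistics.pstdev(ages),
        "country_vocab": sorted({ex["country"] for ex in examples}),
    }


def transform(example, constants):
    """Per-example step: reuses the stored constants, so training and serving match."""
    vocab = constants["country_vocab"]
    country = example["country"]
    return {
        "age_scaled": (example["age"] - constants["age_mean"]) / constants["age_std"],
        # Unknown categories map to -1; an arbitrary choice for this sketch.
        "country_id": vocab.index(country) if country in vocab else -1,
    }


train = [{"age": 30, "country": "US"}, {"age": 50, "country": "IN"}]
constants = analyze(train)
transformed = transform({"age": 50, "country": "US"}, constants)
print(transformed)
```

Keeping the constants separate from the per-example function is what lets the same transformation be replayed on serving requests without recomputing anything from the training set.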

Trainer

The Trainer TFX pipeline component trains a TensorFlow model. It produces at least one model for inference and serving, in the TensorFlow SavedModel format. A SavedModel contains a complete TensorFlow program, including weights and computation.

Tuner

The Tuner component tunes the hyperparameters for the model. It is the newest TFX component and makes extensive use of the Python KerasTuner API for tuning hyperparameters. As input, the Tuner component takes the transformed data and transform graph artifacts; as output, it produces a hyperparameters artifact.

Evaluator

The Evaluator component uses the model created by the Trainer and the original input data artifact to perform a thorough analysis with the TensorFlow Model Analysis library.
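
One thing such an analysis typically looks at is how a metric behaves on slices of the data, not only overall. The following plain-Python sketch computes accuracy overall and per value of one feature. It does not call TensorFlow Model Analysis; accuracy_by_slice and the toy rows are assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(rows, slice_key):
    """Return overall accuracy and accuracy per value of `slice_key`.

    Hypothetical helper; assumes no slice value is literally "overall".
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        for key in ("overall", row[slice_key]):
            totals[key] += 1
            hits[key] += int(row["label"] == row["prediction"])
    return {key: hits[key] / totals[key] for key in totals}


rows = [
    {"country": "US", "label": 1, "prediction": 1},
    {"country": "US", "label": 0, "prediction": 0},
    {"country": "IN", "label": 1, "prediction": 0},
    {"country": "IN", "label": 0, "prediction": 0},
]
metrics = accuracy_by_slice(rows, "country")
print(metrics)
```

A model that looks fine overall can still underperform on one slice, which is the kind of problem a per-slice breakdown is meant to surface before the model is pushed.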

InfraValidator

InfraValidator is a TFX component that is used as an early warning layer before pushing a model to production. The name InfraValidator comes from the fact that it validates the model in the actual model serving infrastructure. If the Evaluator guarantees the performance of the model, InfraValidator guarantees that the model is mechanically fine.

Pusher

The Pusher component is used to push a validated model to a deployment target during model training or re-training.
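
Stepping back from the individual components, here is a hedged sketch of how the first few of them might be wired into a pipeline with the tfx.v1 Python API. It covers only ExampleGen, StatisticsGen, SchemaGen, and ExampleValidator, because Transform, Trainer, and the later components also need user-written module files. The function name build_pipeline, the pipeline name, and all paths are placeholders, and exact arguments can differ between TFX versions, so treat this as a starting point and check it against the version you install.

```python
try:
    from tfx import v1 as tfx
except ImportError:  # TFX is not installed; build_pipeline() cannot be called
    tfx = None


def build_pipeline(data_root, pipeline_root, metadata_path):
    """Wire the data-ingestion and data-validation components into a pipeline.

    All three arguments are caller-supplied paths (placeholders in this sketch).
    """
    # Ingest CSV files found under data_root.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    # Each downstream component consumes artifacts emitted upstream.
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])
    return tfx.dsl.Pipeline(
        pipeline_name="tfx-blog-demo",  # hypothetical name
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, example_validator],
        metadata_connection_config=(
            tfx.orchestration.metadata.sqlite_metadata_connection_config(
                metadata_path)),
    )


# To run locally (requires TFX and a directory of CSV files):
# tfx.orchestration.LocalDagRunner().run(
#     build_pipeline("data/", "pipeline_root/", "metadata.db"))
```

Notice that the components are never called in order explicitly: the runner derives the execution order from which outputs feed which inputs.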

This diagram illustrates the flow of data between these components:

TFX Libraries

TFX includes both libraries and pipeline components. This diagram illustrates the relationships between TFX libraries and pipeline components:

TFX libraries and components cover a typical end-to-end machine learning pipeline, which starts with data ingestion and ends with model serving.

In addition, the libraries shown above are available as Python packages and can be installed separately. But it’s advisable to just install TFX, which comes with all the components.