Let's Learn About TensorFlow Extended (TFX) Components and Libraries

In this blog, we will learn about TensorFlow Extended (TFX) components and libraries. TFX is a Google production-scale machine learning (ML) platform based on TensorFlow. It provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor your machine learning system.

How did TensorFlow Extended (TFX) come about?

Since Google released TensorFlow, its use in deep learning has expanded massively. Though TensorFlow is flexible, it does not by itself provide an end-to-end production system. Sibyl, Google's earlier ML platform, has end-to-end facilities but lacks flexibility. Google then came up with TensorFlow Extended (TFX) as a production-scale machine learning platform built on TensorFlow, taking advantage of the strengths of both TensorFlow and Sibyl.

Installing TFX

You can install TFX via PyPI.

!pip install tfx
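
Once the install finishes, it can be handy to confirm which version ended up in the environment. The helper below is a small sketch that uses only the Python standard library; the name `installed_version` is our own and is not part of TFX.

```python
from importlib import metadata


def installed_version(package: str = "tfx"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when TFX is not installed.
print("tfx version:", installed_version())
```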

TFX Standard Components

A TFX pipeline is a sequence of components that implements an ML pipeline designed for scalable, high-performance machine learning tasks. These tasks include modeling, training, serving inference, and managing deployments to online, native mobile, and JavaScript targets.
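
To make "a sequence of components" concrete, here is a rough sketch of how the first few components might be wired together. It assumes the public `tfx.v1` API as used in the official tutorials; the function name `build_pipeline` and all of its arguments are placeholders chosen for this post. TFX is imported inside the function so the snippet can be loaded even where TFX is not installed.

```python
def build_pipeline(pipeline_name, pipeline_root, data_root, metadata_path):
    """Wire the data-validation part of a TFX pipeline (illustrative sketch)."""
    # Imported lazily: requires `pip install tfx` before the function is called.
    from tfx import v1 as tfx

    # Ingest the CSV files found under data_root.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    # Compute feature statistics over the ingested examples.
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])

    # Infer a schema from those statistics.
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])

    # Look for anomalies by comparing the statistics against the schema.
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, example_validator],
        metadata_connection_config=(
            tfx.orchestration.metadata.sqlite_metadata_connection_config(
                metadata_path)),
    )
```

With TFX installed, the returned pipeline could then be run locally with `tfx.orchestration.LocalDagRunner().run(...)`. Each component consumes the output artifacts of an earlier one, which is what defines the order of execution.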

A TFX pipeline typically includes the following components:

ExampleGen 

The ExampleGen TFX pipeline component is the entry point to your pipeline and ingests data. As inputs, ExampleGen supports out-of-the-box ingestion of external data sources such as CSV, TFRecord, Avro, and Parquet. As outputs, ExampleGen produces TF Examples or TF SequenceExamples, which are highly efficient and performant dataset representations that can be read consistently by downstream components.
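
The idea of ingestion can be illustrated without TFX at all. The sketch below is plain Python, not the ExampleGen API: it parses CSV text into feature dictionaries and assigns each record to a train or eval split by hashing it. The 2:1 bucket ratio, the function name `ingest_csv`, and the sample data are assumptions made for this example.

```python
import csv
import io
import zlib


def ingest_csv(text, train_buckets=2, total_buckets=3):
    """Parse CSV text into feature dicts and split them into train/eval."""
    splits = {"train": [], "eval": []}
    for row in csv.DictReader(io.StringIO(text)):
        # Hash the record content so the same row always lands in the same split.
        key = ",".join(f"{k}={v}" for k, v in sorted(row.items()))
        bucket = zlib.crc32(key.encode("utf-8")) % total_buckets
        split = "train" if bucket < train_buckets else "eval"
        splits[split].append(dict(row))
    return splits


raw = "age,city\n34,Pune\n29,Delhi\n41,Noida\n"
splits = ingest_csv(raw)
print({name: len(rows) for name, rows in splits.items()})
```

Hashing the record, rather than picking rows at random, keeps the split stable across runs, so downstream components always see the same data.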

StatisticsGen 

The StatisticsGen TFX pipeline component generates feature statistics over both training and serving data, which can be used by other pipeline components.
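
To give a feel for what "feature statistics" means, here is a minimal plain-Python sketch. It is not how StatisticsGen is implemented; the function name `feature_statistics` and the sample records are made up for illustration.

```python
import statistics


def feature_statistics(examples):
    """Compute simple per-feature statistics over a list of feature dicts."""
    names = sorted({name for ex in examples for name in ex})
    stats = {}
    for name in names:
        values = [ex[name] for ex in examples if ex.get(name) is not None]
        entry = {"count": len(values), "missing": len(examples) - len(values)}
        if values and all(isinstance(v, (int, float)) for v in values):
            # Numeric feature: summarise its range and centre.
            entry.update(min=min(values), max=max(values),
                         mean=statistics.fmean(values))
        else:
            # Anything else is treated as categorical.
            entry["unique"] = len(set(values))
        stats[name] = entry
    return stats


train = [
    {"age": 34, "city": "Pune"},
    {"age": 29, "city": "Delhi"},
    {"age": 41, "city": None},
]
stats = feature_statistics(train)
print(stats)
```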

SchemaGen 

The TFX components use a description of your input data called a schema. The schema is an instance of schema.proto. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. The SchemaGen pipeline component automatically generates a schema by inferring types, categories, and ranges from the training data.
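
Schema inference can be pictured with a small plain-Python sketch. The dictionary layout below is a stand-in we invented for this post and is not the real schema.proto format; `infer_schema` is likewise a hypothetical helper.

```python
def infer_schema(examples):
    """Infer a type, presence, and a range or domain for each feature."""
    schema = {}
    names = sorted({name for ex in examples for name in ex})
    for name in names:
        values = [ex[name] for ex in examples if ex.get(name) is not None]
        # A feature is required only if every example provides it.
        feature = {"required": len(values) == len(examples)}
        if values and all(isinstance(v, (int, float)) for v in values):
            feature.update(type="numeric", min=min(values), max=max(values))
        else:
            feature.update(type="categorical",
                           domain=sorted(set(map(str, values))))
        schema[name] = feature
    return schema


train = [
    {"age": 34, "city": "Pune"},
    {"age": 29, "city": "Delhi"},
    {"age": 41, "city": None},
]
schema = infer_schema(train)
print(schema)
```

An inferred schema is only a starting point. In practice it would usually be reviewed and adjusted by hand, for example to widen a range that the training sample happened not to cover.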

ExampleValidator 

The ExampleValidator pipeline component identifies anomalies in training and serving data. It can detect different classes of anomalies in the data. In addition, it can:

  • Perform validity checks by comparing data statistics against a schema that codifies the expectations of the user.
  • Detect training-serving skew by comparing training and serving data.
  • Detect data drift by looking at a series of data.
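
The checks listed above can be sketched in plain Python. This is an illustration only and not the ExampleValidator API: the schema layout, the helper names `find_anomalies` and `linf_distance`, and the choice of an L-infinity distance for comparing distributions are all assumptions made for this example.

```python
from collections import Counter


def find_anomalies(examples, schema):
    """Return human-readable anomalies for examples that violate `schema`."""
    anomalies = []
    for i, ex in enumerate(examples):
        for name, spec in schema.items():
            value = ex.get(name)
            if value is None:
                if spec.get("required"):
                    anomalies.append(f"example {i}: missing required feature '{name}'")
            elif "domain" in spec and value not in spec["domain"]:
                anomalies.append(f"example {i}: unexpected value {value!r} for '{name}'")
            elif "min" in spec and not spec["min"] <= value <= spec["max"]:
                anomalies.append(
                    f"example {i}: '{name}'={value} outside [{spec['min']}, {spec['max']}]")
    return anomalies


def linf_distance(train_values, serving_values):
    """L-infinity distance between two non-empty categorical distributions."""
    p, q = Counter(train_values), Counter(serving_values)
    n_p, n_q = sum(p.values()), sum(q.values())
    return max(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


schema = {
    "age": {"required": True, "min": 18, "max": 65},
    "city": {"required": False, "domain": ["Delhi", "Pune"]},
}
serving = [{"age": 70, "city": "Pune"}, {"city": "Goa"}]
anomalies = find_anomalies(serving, schema)
for line in anomalies:
    print(line)

# Skew check: how far apart are the training and serving city distributions?
skew = linf_distance(["Pune", "Pune", "Delhi", "Delhi"],
                     ["Pune", "Pune", "Pune", "Delhi"])
print("skew:", skew)
```

The same distance function could be applied to two consecutive days of data to look for drift. A skew or drift anomaly would be raised when the distance exceeds a threshold chosen by the user.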

Transform 

The Transform TFX pipeline component performs feature engineering on the TF Examples data artifact emitted by the ExampleGen component, using the data schema artifact produced by SchemaGen or imported from external sources.
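
A typical feature engineering step needs a full pass over the training data (for example, to find a mean or build a vocabulary) and must then apply exactly the same parameters at serving time. The plain-Python sketch below shows that two-phase idea. It does not use the Transform component or its library, and the helper names and feature names are hypothetical.

```python
import statistics


def fit_transform_params(train_rows, numeric, categorical):
    """Full pass over the training data to collect what the transform needs."""
    values = [row[numeric] for row in train_rows]
    vocab = sorted({row[categorical] for row in train_rows})
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values) or 1.0,  # avoid dividing by zero
        "vocab": {token: index for index, token in enumerate(vocab)},
    }


def apply_transform(row, params, numeric, categorical):
    """Apply the fitted parameters to one row (training or serving time)."""
    return {
        numeric + "_z": (row[numeric] - params["mean"]) / params["std"],
        # -1 marks a value that was not seen during training.
        categorical + "_id": params["vocab"].get(row[categorical], -1),
    }


train_rows = [
    {"age": 20, "city": "Pune"},
    {"age": 30, "city": "Delhi"},
    {"age": 40, "city": "Pune"},
]
params = fit_transform_params(train_rows, "age", "city")
seen = apply_transform({"age": 30, "city": "Delhi"}, params, "age", "city")
unseen = apply_transform({"age": 40, "city": "Goa"}, params, "age", "city")
print(seen, unseen)
```

Keeping the fitted parameters separate from the function that applies them is what prevents training-serving skew in the features themselves.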

Trainer 

The Trainer TFX pipeline component trains a TensorFlow model. It produces at least one model for inference and serving in the TensorFlow SavedModel format. A SavedModel contains a complete TensorFlow program, including weights and computation.

Tuner 

The Tuner component tunes the hyperparameters for the model. It is one of the newer TFX components and makes extensive use of the Python KerasTuner API for tuning hyperparameters. As inputs, the Tuner component takes the transformed data and the transform graph artifacts. As output, it produces a hyperparameter artifact.
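
At its core, hyperparameter tuning means trying several configurations and keeping the best one. The sketch below uses an exhaustive grid search in plain Python, which is a deliberately simpler technique than the search strategies KerasTuner provides. The function `tune`, the search space, and the toy objective are all invented for illustration; a real objective would train a model and return a validation metric.

```python
import itertools


def tune(search_space, objective):
    """Try every combination in `search_space` and return the best one."""
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, combo))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Stand-in for "train a model and return its validation accuracy".
    return (1.0
            - abs(params["learning_rate"] - 0.01) * 10
            - abs(params["units"] - 64) / 1000)


space = {"learning_rate": [0.1, 0.01, 0.001], "units": [32, 64, 128]}
best_params, best_score = tune(space, toy_objective)
print(best_params, best_score)
```

The winning dictionary plays the role of the hyperparameter artifact: it is the only thing a later training step needs from the tuning step.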

Evaluator 

The Evaluator component uses the model created by the Trainer and the original input data artifact. It performs a thorough analysis using the TensorFlow Model Analysis library.
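
One thing a thorough analysis usually includes is looking at metrics on slices of the data and not only on the whole dataset. The plain-Python sketch below computes accuracy overall and per slice. It does not use the TensorFlow Model Analysis library, and the helper `sliced_accuracy`, the toy model, and the data are assumptions made for this example.

```python
from collections import defaultdict


def sliced_accuracy(examples, predict, label_key, slice_key):
    """Accuracy overall and per value of `slice_key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        correct = int(predict(ex) == ex[label_key])
        for name in ("overall", f"{slice_key}={ex[slice_key]}"):
            hits[name] += correct
            totals[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


def toy_model(example):
    # Stand-in for a trained model: predicts 1 for anyone aged 30 or over.
    return int(example["age"] >= 30)


eval_data = [
    {"age": 34, "city": "Pune", "label": 1},
    {"age": 29, "city": "Delhi", "label": 0},
    {"age": 41, "city": "Pune", "label": 0},
    {"age": 25, "city": "Delhi", "label": 0},
]
metrics = sliced_accuracy(eval_data, toy_model, "label", "city")
print(metrics)
```

A model that looks fine overall can still perform poorly on one slice, which is why per-slice numbers are worth checking before a model is approved.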

InfraValidator 

InfraValidator is a TFX component that is used as an early warning layer before pushing a model to production. The name comes from the fact that it validates the model in the actual model serving infrastructure. If the Evaluator guarantees the performance of the model, InfraValidator guarantees that the model is mechanically fine.
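
"Mechanically fine" can be read as: the model loads and answers a request without crashing. The sketch below captures only that idea in plain Python. It is not the InfraValidator API, and `infra_validate`, the loaders, and the sample request are hypothetical stand-ins for a real serving environment.

```python
def infra_validate(load_model, sample_requests):
    """Return True only if the model loads and answers every sample request."""
    try:
        model = load_model()
        for request in sample_requests:
            model(request)
    except Exception as error:
        print("infra validation failed:", error)
        return False
    return True


def good_loader():
    # Stand-in for loading a model into a serving binary.
    return lambda request: {"score": 0.5}


def broken_loader():
    raise RuntimeError("model file is corrupt")


ok = infra_validate(good_loader, [{"age": 34}])
not_ok = infra_validate(broken_loader, [{"age": 34}])
print(ok, not_ok)
```

Note that the check says nothing about prediction quality. That separation mirrors the split of responsibilities between the Evaluator and InfraValidator described above.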

Pusher 

The Pusher component is used to push a validated model to a deployment target during model training or re-training.
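
For a filesystem deployment target, pushing can be thought of as copying the model into a new versioned directory, but only if validation passed. The plain-Python sketch below illustrates that idea. It is not the Pusher API; `push_model`, the `blessed` flag, and the directory names are assumptions made for this example.

```python
import shutil
import tempfile
import time
from pathlib import Path


def push_model(model_dir, serving_root, blessed, version=None):
    """Copy a blessed model into a new versioned directory under serving_root."""
    if not blessed:
        return None  # never deploy a model that failed validation
    if version is None:
        version = int(time.time())  # timestamp as a simple increasing version
    destination = Path(serving_root) / str(version)
    shutil.copytree(model_dir, destination)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    # Create a placeholder "trained model" so there is something to push.
    model_dir = Path(tmp) / "trained_model"
    model_dir.mkdir()
    (model_dir / "saved_model.pb").write_bytes(b"placeholder")

    pushed = push_model(model_dir, Path(tmp) / "serving", blessed=True, version=1)
    pushed_files = sorted(p.name for p in pushed.iterdir())
    skipped = push_model(model_dir, Path(tmp) / "serving", blessed=False)

print(pushed.name, pushed_files, skipped)
```

Writing each push to its own version directory leaves earlier versions untouched, which makes it straightforward to roll back.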

This diagram illustrates the flow of data between these components:

TFX Libraries

TFX includes both libraries and pipeline components. This diagram illustrates the relationships between TFX libraries and pipeline components:

TFX libraries and components cover a typical end-to-end machine learning pipeline, which starts with data ingestion and ends with model serving.

In addition, the libraries shown above are available as Python packages and can be installed separately. However, it is advisable to just install TFX, which comes with all the components.

Conclusion

In this blog, we covered the basics of TensorFlow Extended (TFX) components and libraries.

Happy Learning 🙂

knoldus

Written by 

Tanishka Garg is a Software Consultant working in AI/ML domain.