Basic Anatomy of a Flink Program


Hi folks! I hope you are all staying safe during the COVID-19 pandemic and learning new tools and technologies while at home. I have just started learning Flink, a prominent Big Data framework for stream processing. Flink is a distributed framework built on a streaming-first principle: it is a true stream processing engine that implements batch processing as a special case of streaming. In this blog, we will look at the basic anatomy of a Flink program, so that we understand its structure and can start writing a basic Flink application.

Let’s explore the steps involved in setting up a streaming application in Flink with a simple example. In the example, we will read text messages from a socket text stream and filter out any message that is a number. The Flink application for this use case is built in the 5 steps shown below.

Step 1: Setup Execution Environment

The very first step is to let Flink know the right environment for the application, that is, whether the streaming application will run locally or on a remote cluster. To do so, we create a stream execution environment.

StreamExecutionEnvironment executionEnvironment =
       StreamExecutionEnvironment.getExecutionEnvironment();

This call returns a local environment when the application runs locally, and the cluster's environment when the application runs on a cluster.

Step 2: Add a Source and load initial data

Once we have the stream execution environment, the next step is to connect to a source of streaming data and load the initial data into the Flink program. This could mean reading the data as a stream from disk, a database, or an external streaming source such as Kafka, Kinesis, or a socket. In our example, the data source is a socket text stream served by the netcat utility, which we run on localhost, port 9000. We use the socketTextStream() method on the stream execution environment to connect to it. The third argument is the delimiter that separates messages, and the fourth (maxRetry) bounds how long, in seconds, Flink keeps retrying when the socket is temporarily down.

executionEnvironment
        .socketTextStream("localhost", 9000, '\n', 6)

Step 3: Do a series of transformations on streaming data

The next step is to apply a series of transformations to the loaded data to extract the information we are interested in. Transformations in Flink are lazily evaluated: they do not run until we trigger execution. In our example, we apply a simple filter transformation that drops a message from the stream if it is a number. A transformation applied to a stream always yields a new stream.

DataStream<String> filteredString = executionEnvironment
        .socketTextStream("localhost", 9000, '\n', 6)
        .filter(string -> !string.matches("-?\\d+(\\.\\d+)?"));
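
To see exactly which messages this predicate keeps, the regular expression can be tried in plain Java without Flink. The following is a minimal sketch: the class name NumberFilterCheck, its method names, and the sample inputs are made up for illustration, and only the pattern itself comes from the filter above. Because String.matches() must match the whole string, an input such as "7." or "flink 1" does not count as a number and is kept.

```java
import java.util.List;
import java.util.stream.Collectors;

class NumberFilterCheck {

    // Same regular expression as in the Flink filter:
    // optional minus sign, digits, optional fractional part.
    static final String NUMBER_PATTERN = "-?\\d+(\\.\\d+)?";

    // Mirrors the lambda passed to filter(): true means the message is kept.
    static boolean keep(String line) {
        return !line.matches(NUMBER_PATTERN);
    }

    // Applies the predicate to a fixed list (a stand-in for the socket stream).
    static List<String> keepNonNumbers(List<String> lines) {
        return lines.stream()
                .filter(NumberFilterCheck::keep)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample input, one entry per message.
        List<String> sample = List.of("hello", "42", "-3.14", "flink 1", "7.");
        System.out.println(keepNonNumbers(sample)); // prints [hello, flink 1, 7.]
    }
}
```

Note that the pattern does not cover every numeric notation: values such as "1e5" or " 42" (with a leading space) would pass through the filter as text.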

Step 4: Save results

After all the transformations have been applied, we pass the resulting data on to a sink. We can save the results to a file, print them to the screen, or pass them on to another application. In our example, we send the output to the screen using Flink’s print() method.

filteredString.print();

Step 5: Trigger execution

Once we have gone through all the above steps and set up the Flink streaming application, we call the execute() method on the stream execution environment to kick-start the application. We can give the Flink application a name by passing it to execute().

executionEnvironment.execute("Flink String Filter");
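
The "declare first, run later" behaviour is easier to see in a small experiment. The sketch below is only an analogy using plain java.util.stream, not Flink code: Flink builds its own dataflow from the DataStream calls, but the idea is similar in that the filter is merely declared until something triggers it. The class name LazyPipelineAnalogy and the log list are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineAnalogy {

    // Records each time the filter predicate actually runs.
    static final List<String> log = new ArrayList<>();

    static Stream<String> buildPipeline(List<String> lines) {
        // Declaring the filter only describes the work; nothing is evaluated yet.
        return lines.stream().filter(line -> {
            log.add("checked " + line);
            return !line.matches("-?\\d+(\\.\\d+)?");
        });
    }

    public static void main(String[] args) {
        Stream<String> pipeline = buildPipeline(List.of("hello", "42"));
        System.out.println("after building: " + log); // prints "after building: []"

        // The terminal operation plays the role that execute() plays in Flink.
        List<String> result = pipeline.collect(Collectors.toList());
        // prints "after running: [checked hello, checked 42] -> [hello]"
        System.out.println("after running: " + log + " -> " + result);
    }
}
```

In the same spirit, forgetting the execute() call in a Flink streaming program leaves the pipeline declared but never started.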

Voila! Our simple Flink streaming application is ready.

The complete code looks like:

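// Requires imports of org.apache.flink.streaming.api.datastream.DataStream
// and org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.
// execute() throws a checked exception, so main must declare it.
public static void main(String[] args) throws Exception {

    // Step 1: set up the execution environment
    StreamExecutionEnvironment executionEnvironment =
            StreamExecutionEnvironment.getExecutionEnvironment();

    // Steps 2 and 3: read from the socket and drop messages that are numbers
    DataStream<String> filteredString = executionEnvironment
            .socketTextStream("localhost", 9000, '\n', 6)
            .filter(string -> !string.matches("-?\\d+(\\.\\d+)?"));

    // Step 4: print the results to the screen
    filteredString.print();

    // Step 5: trigger execution
    executionEnvironment.execute("Flink String Filter");
}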

To run the application, first open a terminal and start netcat listening on port 9000:

nc -l 9000

Then run the Flink application and type some text into the netcat terminal.

The result is the stream of text we typed, with every message that is a number filtered out.

I hope this blog helps you kick-start your Flink learning.

Keep learning!!
