Flinkathon: Guide to setting up a Local Flink Cluster


In our previous blog post, Flinkathon: First Step towards Flink’s DataStream API, we created our first streaming application using Apache Flink. It was easy, clean, and concise.

However, the real power of Apache Flink is seen on a cluster, where data is processed in a distributed manner, taking advantage of multiple cores and machines. So, in this blog post, we will see how to set up a local Flink cluster and run a sample streaming application on it.

Flink runs on all common operating systems, like Linux, Mac OS X, and Windows. The only requirement for running Flink is a working Java 8.x installation. We can verify the Java installation by issuing the following command:

java -version

If you have Java 8, the output will look something like this:

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

Now, the next step is to download and unpack the Flink binaries from the Flink downloads page. We can pick any Hadoop/Scala combination we like. If we plan to just use the local file system, any Hadoop version will work fine.

Once we have downloaded and unpacked the binaries, we are ready to start the local cluster. For that, we just have to run the following script:

./bin/start-cluster.sh

Now, to check whether the Flink cluster is up and running, we can open http://localhost:8081 and make sure everything is working as expected.
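
The same check can also be scripted against the JobManager's monitoring REST API, which listens on the same port as the web UI. Here is a minimal Scala sketch; the ClusterHealthCheck object and the use of the /overview endpoint are illustrative assumptions, not part of the Flinkathon sample:

import scala.io.Source
import scala.util.{Failure, Success, Try}

object ClusterHealthCheck {
  def main(args: Array[String]): Unit = {
    // The monitoring REST API shares the web UI port (8081 by default);
    // /overview returns a JSON summary of TaskManagers, slots, and jobs.
    Try(Source.fromURL("http://localhost:8081/overview").mkString) match {
      case Success(json) => println(s"Cluster is up: $json")
      case Failure(err)  => println(s"Cluster is not reachable: ${err.getMessage}")
    }
  }
}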

We can also verify that the system is running by checking the log files in the log directory, in case the web console is not accessible:

head -n 10 log/flink-knoldus-standalonesession-0-knoldus-Vostro-15-3568.log
2019-05-06 17:25:36,073 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-05-06 17:25:36,079 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.8.0, Rev:4caec0d, Date:03.04.2019 @ 13:25:54 PDT)
2019-05-06 17:25:36,080 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: knoldus
2019-05-06 17:25:36,081 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user:
2019-05-06 17:25:36,082 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.191-b12
2019-05-06 17:25:36,083 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 981 MiBytes
2019-05-06 17:25:36,083 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-amd64
2019-05-06 17:25:36,084 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-05-06 17:25:36,084 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-05-06 17:25:36,085 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m

Now we have our local Flink cluster up and running. The next step is to run our sample streaming application. For that, just download the Flinkathon sample from GitHub and execute the following steps (a sketch of the job inside the jar follows the list):

  1. sbt clean assembly
  2. ./bin/flink run /home/knoldus/Projects/Flinkathon/target/scala-2.12/flinkathon.jar
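
To give an idea of what the jar actually does, here is a rough Scala sketch of a Kafka-backed word-count job. It assumes Flink 1.8's universal Kafka connector, a broker at localhost:9092, and the input/output topic names used below; the actual code lives in the Flinkathon repository on GitHub:

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer}

object WordCountJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka consumer configuration; the group id "flinkathon" is an assumption.
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "flinkathon")

    // Read raw lines from the "input" topic.
    val lines = env.addSource(
      new FlinkKafkaConsumer[String]("input", new SimpleStringSchema(), props))

    // Split lines into lowercase words, emit (word, 1) pairs,
    // key by the word, and keep a running sum per word.
    val counts = lines
      .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
      .map(word => (word, 1))
      .keyBy(0)
      .sum(1)
      .map(_.toString)

    // Publish the running counts to the "output" topic.
    counts.addSink(
      new FlinkKafkaProducer[String]("localhost:9092", "output", new SimpleStringSchema()))

    env.execute("Flinkathon Word Count")
  }
}

Building such a job with sbt clean assembly (step 1) produces the fat jar that gets submitted to the cluster in step 2.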

Now, we just have to push some data into the input Kafka topic and see the results of the program in the output topic:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic input
Hello, how are you?
I am fine, thank you

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output --from-beginning
(hello,1)
(how,1)
(are,1)
(you,1)
(i,1)
(am,1)
(fine,1)
(thank,1)
(you,2)

In the output topic, we can see the word counts for the data coming from the input topic. Since the job emits a running count, (you,1) is followed by (you,2) when the word appears a second time.

At last, when you're done, stop the Flink cluster by executing the following command:

./bin/stop-cluster.sh

Conclusion

In short, we have learned how to set up a local Flink cluster and run our streaming applications on it.

In the upcoming blogs, we will explain Flink's state management feature, which keeps results safe so that they are not lost in case of a failure or maintenance. So, stay tuned 🙂


Written by 

Himanshu Gupta is a software architect with more than 9 years of experience. He is always keen to learn new technologies. He likes not only programming languages but Data Analytics too. He has sound knowledge of Machine Learning and Pattern Recognition. He believes that the best results come when everyone works as a team. He likes coding, listening to music, watching movies, and reading science fiction books in his free time.
