Flinkathon: Guide to setting up a Local Flink Cluster


In our previous blog post, Flinkathon: First Step towards Flink’s DataStream API, we created our first streaming application using Apache Flink. It was easy, clean, and concise.

However, the real power of Apache Flink is seen on a cluster, where data is processed in a distributed manner, taking advantage of multiple cores and machines. So, in this blog post, we will see how to set up a local Flink cluster and run a sample streaming application on it.

Flink runs on all common operating systems, like Linux, Mac OS X, and Windows. The only requirement for running Flink is a working Java 8.x installation. We can verify the Java installation by issuing the following command:

java -version

If you have Java 8, the output will look something like this:

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

Now, the next step is to download and unpack the Flink binaries from the Flink downloads page. We can pick any Hadoop/Scala combination we like. If we plan to just use the local file system, any Hadoop version will work fine.

Once we have downloaded and unpacked the binaries, we are ready to start the local cluster. For that, we just have to run the following script:

./bin/start-cluster.sh

Now, to check whether the Flink cluster is up and running, we can open http://localhost:8081 and make sure everything is working as expected.
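
The same check can also be scripted against the JobManager's monitoring REST API, which listens on the same port as the web UI. Here is a minimal Scala sketch; the ClusterHealthCheck object and the use of the /overview endpoint are illustrative assumptions, not part of the Flinkathon sample:

import scala.io.Source
import scala.util.{Failure, Success, Try}

object ClusterHealthCheck {
  def main(args: Array[String]): Unit = {
    // The monitoring REST API shares the web UI port (8081 by default);
    // /overview returns a JSON summary of TaskManagers, slots, and jobs.
    Try(Source.fromURL("http://localhost:8081/overview").mkString) match {
      case Success(json) => println(s"Cluster is up: $json")
      case Failure(err)  => println(s"Cluster is not reachable: ${err.getMessage}")
    }
  }
}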

We can also verify that the system is running by checking the log files in the log directory, in case the web console is not accessible:

head -n 10 log/flink-knoldus-standalonesession-0-knoldus-Vostro-15-3568.log
2019-05-06 17:25:36,073 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-05-06 17:25:36,079 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.8.0, Rev:4caec0d, Date:03.04.2019 @ 13:25:54 PDT)
2019-05-06 17:25:36,080 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: knoldus
2019-05-06 17:25:36,081 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user:
2019-05-06 17:25:36,082 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.191-b12
2019-05-06 17:25:36,083 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 981 MiBytes
2019-05-06 17:25:36,083 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-amd64
2019-05-06 17:25:36,084 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-05-06 17:25:36,084 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-05-06 17:25:36,085 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m

Now we have our local Flink cluster up and running. The next step is to run our sample streaming application. For that, just download the Flinkathon sample from GitHub and execute the following steps (a sketch of the job inside the jar follows the list):

  1. sbt clean assembly
  2. ./bin/flink run /home/knoldus/Projects/Flinkathon/target/scala-2.12/flinkathon.jar
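
To give an idea of what the jar actually does, here is a rough Scala sketch of a Kafka-backed word-count job. It assumes Flink 1.8's universal Kafka connector, a broker at localhost:9092, and the input/output topic names used below; the actual code lives in the Flinkathon repository on GitHub:

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer}

object WordCountJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka consumer configuration; the group id "flinkathon" is an assumption.
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "flinkathon")

    // Read raw lines from the "input" topic.
    val lines = env.addSource(
      new FlinkKafkaConsumer[String]("input", new SimpleStringSchema(), props))

    // Split lines into lowercase words, emit (word, 1) pairs,
    // key by the word, and keep a running sum per word.
    val counts = lines
      .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
      .map(word => (word, 1))
      .keyBy(0)
      .sum(1)
      .map(_.toString)

    // Publish the running counts to the "output" topic.
    counts.addSink(
      new FlinkKafkaProducer[String]("localhost:9092", "output", new SimpleStringSchema()))

    env.execute("Flinkathon Word Count")
  }
}

Building such a job with sbt clean assembly (step 1) produces the fat jar that gets submitted to the cluster in step 2.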

Now, we just have to push some data into the input Kafka topic and see the results of the program in the output topic:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic input
Hello, how are you?
I am fine, thank you

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output --from-beginning
(hello,1)
(how,1)
(are,1)
(you,1)
(i,1)
(am,1)
(fine,1)
(thank,1)
(you,2)

In the output topic, we can see the word counts for the data coming from the input topic. Since the job emits a running count, (you,1) is followed by (you,2) when the word appears a second time.

At last, when you're done, stop the Flink cluster by executing the following command:

./bin/stop-cluster.sh

Conclusion

In short, we have learned how to set up a local Flink cluster and run our streaming applications on it.

In the upcoming blogs, we will explain Flink's state management feature, which keeps results safe so that they are not lost in case of a failure or maintenance. So, stay tuned 🙂


Written by 

Himanshu Gupta is a software architect with more than 9 years of experience. He is always keen to learn new technologies. He likes not only programming languages but Data Analytics too. He has sound knowledge of Machine Learning and Pattern Recognition. He believes that the best results come when everyone works as a team. He likes coding, listening to music, watching movies, and reading science fiction books in his free time.
