Streaming with Apache Spark Custom Receiver


Hello, inquisitor! In the previous blog we looked at the predefined stream receivers of Spark.

In this blog we are going to discuss the custom receiver of Spark, so that we can source data from any external system.

If we want to use a custom receiver, the first thing to know is that SparkSession is not our entry point here; for streaming we use a StreamingContext instead.

So at first you should add the following dependency to build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0"

Now create a class CustomReceiver which extends the Receiver class and overrides onStart() and onStop():

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  override def onStart(): Unit = ???
  override def onStop(): Unit = ???
}

Here onStart() will contain the code to retrieve data from the external source periodically and to push it into the stream using store():

override def onStart(): Unit = {
  val externalData = retrieveExternalData()
  store(externalData)
}
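In practice onStart() must return immediately, so the fetch loop usually runs on its own thread and keeps going until the receiver is stopped. Here is a minimal sketch of such a receiver; retrieveExternalData() is a hypothetical stand-in for your actual source:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class CustomReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Fetch on a separate thread so that onStart() returns immediately.
    new Thread("Custom Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // Nothing to clean up here; the receive loop checks isStopped() and exits.
  }

  private def receive(): Unit = {
    while (!isStopped()) {
      val externalData = retrieveExternalData() // fetch from your external system
      store(externalData)                       // push the record into the stream
      Thread.sleep(1000)                        // poll interval; tune for your source
    }
  }

  // Hypothetical placeholder; replace with a real call to your external system.
  private def retrieveExternalData(): String =
    s"record-${System.currentTimeMillis()}"
}
```

On failure you can call restart(message, exception) inside the loop to have Spark restart the receiver instead of letting it die silently.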

Now what we need is to configure the StreamingContext:

val conf = new SparkConf()
  .setAppName("wohooo")
  .setMaster("local[2]")
val streamingContext = new StreamingContext(conf, Seconds(2))

and at the final step we have to tell the Spark StreamingContext about our CustomReceiver:

val lines = streamingContext.receiverStream(new CustomReceiver)

After all these things we can do transformations on the streamed data. A transformation allows the data from the input DStream to be modified:

val words: DStream[String] = lines.flatMap(_.split(","))

Now let us look more closely at the transformations Spark offers. A computation may be of the following types:

  • map(func)
  • flatMap(func)
  • filter(func)
  • repartition(numPartitions)
  • union(otherStream)
  • count()
  • reduce(func)
  • countByValue()
  • reduceByKey(func, [numTasks])
  • join(otherStream, [numTasks])
  • cogroup(otherStream, [numTasks])
  • transform(func)
  • updateStateByKey(func)
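To make a few of these concrete, here is a sketch of how some of the transformations above compose on the words stream from earlier, in the style of the classic word count:

```scala
// Assuming `words: DStream[String]` from the earlier flatMap.
val filtered = words.filter(_.nonEmpty)        // filter: drop empty tokens
val pairs    = filtered.map(word => (word, 1)) // map: pair each word with a count of 1
val counts   = pairs.reduceByKey(_ + _)        // reduceByKey: sum counts per word per batch
counts.print()                                 // print the first few results of each batch
```

Each transformation returns a new DStream, so they chain just like operations on an RDD.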

I was curious about how computation is performed on a stream, and I found that we can perform windowed computation as follows:


val windowedWords = words.reduceByWindow((a: String, b: String) => a + b, Seconds(10), Seconds(4))
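Note that both the window duration (10 seconds) and the slide interval (4 seconds) must be multiples of the batch interval (2 seconds in our StreamingContext). For keyed streams there is an analogous reduceByKeyAndWindow; a sketch of windowed word counts under the same batch interval:

```scala
// Assuming `words: DStream[String]` from earlier: count each word over a
// 10-second window, recomputed every 4 seconds.
val windowedCounts = words
  .map(word => (word, 1))
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(4))
windowedCounts.print()
```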

Now fasten your seatbelts, and here we go:

streamingContext.start()
streamingContext.awaitTermination()

You can find the complete code here.

Thanks


About Rahul Kumar

Software Consultant At Knoldus
