Author: Pinku Swargiary

All you need to know about Avro schema

Reading Time: 4 minutes In this post, we dive into the basics of the Avro schema. We will create a sample Avro schema, serialize sample data to an output file according to that schema, and then read the file back as an example. Intro to Avro: Apache Avro is a data serialization system developed by Doug Cutting, the father of Hadoop, that helps with data … Continue Reading

Streaming from Kafka to PostgreSQL through Spark Structured Streaming

Reading Time: 3 minutes Hello everyone, in this blog we are going to learn how to do structured streaming in Spark with Kafka and PostgreSQL on our local system. We will be doing all of this in Scala, so without any further pause, let's begin. Setting up the necessities first: 1. Dependencies: Set up the required dependencies for Scala, Spark, Kafka and PostgreSQL. 2. PostgreSQL setup: Let's start fresh by … Continue Reading

Kryo Serialization in Spark

Reading Time: 4 minutes Spark provides two serialization libraries: Java serialization (the default) and Kryo serialization. For faster serialization and deserialization, Spark itself recommends using Kryo serialization in any network-intensive application. Why is Kryo not set as the default in Spark? The only reason Kryo is not the default is that it requires custom registration. Although Kryo is … Continue Reading