Category Archives: Spark

Spark Structured Streaming: A Simple Definition


“Structured Streaming” is a term we hear quite a lot in the Apache Spark ecosystem these days, as it is being promoted as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a stream having … Continue reading

Posted in Scala, Spark, Streaming | 1 Comment

Play-Spark2: A Simple Application


In this blog we will create a very simple application with Play Framework and Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to … Continue reading

Posted in Play Framework, Scala, Spark | Leave a comment

Apache Spark: 3 Reasons Why You Should Not Use RDDs


Whenever we hear the words “Apache Spark”, the first thing that comes to mind is RDDs, i.e., Resilient Distributed Datasets. It has now been more than 5 years since Apache Spark came into existence, and after its arrival a lot … Continue reading

Posted in apache spark, big data, Scala, Spark | 1 Comment

Dealing With Deltas In Amazon Redshift


Hi, in this blog I would like to discuss a scenario for implementing deltas in Amazon Redshift using spark-redshift. Before that, I would like to introduce Amazon Redshift, the spark-redshift library, and the integration of Spark with Redshift. … Continue reading

Posted in Amazon, apache spark, AWS, AWS Services, database, Scala, Spark | Leave a comment

Apache Spark: Handle null timestamp while reading CSV in Spark 2.0.0


Hello folks, hope you are all doing well! In this blog, I will discuss a problem that I faced a few days back. One thing to keep in mind is that this problem is specific to Spark version 2.0.0. Other … Continue reading

Posted in apache spark, big data, Scala, Spark | Leave a comment

Getting Started with Apache Spark


Introduction: Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab, open sourced in 2010, and later donated to the Apache Software Foundation. Spark … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment

Introduction to Hadoop!


Here I am going to write a blog on Hadoop! “Big data is not about data! The value in big data [is in] the analytics.” - Harvard Prof. Gary King. And so Hadoop was introduced! Hadoop is an open source, … Continue reading

Posted in Apache Flink, apache spark, big data, database, HDFS, knoldus, Scala, software, Spark, Test, testing | 2 Comments

The Dominant APIs of Spark: Datasets, DataFrames and RDDs


While working with Spark, we often come across three APIs: DataFrames, Datasets and RDDs. In this blog I will discuss the three in terms of use case, performance and optimization. It is essential to keep in mind that there … Continue reading

Posted in Spark | 1 Comment

Reading data from different sources using Spark 2.1


Hi all, in this blog we’ll discuss fetching data from different sources such as CSV, JSON, text and Parquet files. First of all, let’s discuss what’s new in Spark 2.1. In previous versions of Spark, you had to create … Continue reading

Posted in apache spark, sbt, Scala, Spark | Leave a comment

Spark Cassandra Connector On Spark-Shell


Using Spark-Cassandra-Connector on Spark Shell: Hi all, in this blog we will see how we can execute our Spark code in the Spark shell using Cassandra. This is very useful when testing or learning, where we have … Continue reading

Posted in apache spark, big data, Cassandra, Scala, Spark | 2 Comments