Category Archives: Spark

Getting Started with Apache Spark


Introduction: Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab, open sourced in 2010, and later became an Apache project. Spark … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment

Introduction To HADOOP !


Here I am going to write a blog on Hadoop! “Big data is not about the data! The value in big data [is in] the analytics.” - Harvard Prof. Gary King. So Hadoop came into the picture! Hadoop is an open source, … Continue reading

Posted in Apache Flink, apache spark, big data, database, HDFS, knoldus, Scala, software, Spark, Test, testing | 2 Comments

The Dominant APIs of Spark: Datasets, DataFrames and RDDs


While working with Spark, we often come across three APIs: DataFrames, Datasets and RDDs. In this blog I will discuss the three in terms of use case, performance and optimization. It is essential to keep in mind that there … Continue reading

Posted in Spark | 1 Comment

Reading data from different sources using Spark 2.1


Hi all, in this blog we’ll discuss fetching data from different sources such as CSV, JSON, text and Parquet files. So first of all, let’s discuss what’s new in Spark 2.1. In previous versions of Spark, you had to create … Continue reading

Posted in apache spark, sbt, Scala, Spark | Leave a comment

Spark Cassandra Connector On Spark-Shell


Using Spark-Cassandra-Connector on Spark Shell. Hi all, in this blog we will see how to execute our Spark code on the Spark shell using Cassandra. This is very efficient at testing or learning time, where we have … Continue reading

Posted in apache spark, big data, Cassandra, Scala, Spark | 2 Comments

Introduction to Structured Streaming


Hello! Knoldus organized a half-hour session on Structured Streaming, covering the API changes, how it differs from the earlier stream computation paradigm (DStreams), and an example API demonstration. Hope you will enjoy it. Below are the slides and video … Continue reading

Posted in apache spark, Scala, Spark, Streaming | 1 Comment

Partition-Aware Data Loading in Spark SQL


Data loading, in Spark SQL, means loading data into the memory/cache of Spark worker nodes. For this we typically write the following code: val connectionProperties = new Properties() connectionProperties.put("user", "username") connectionProperties.put("password", "password") val jdbcDF = spark.read.jdbc("jdbc:postgresql:dbserver", "schema.table", connectionProperties) Here we are … Continue reading

Posted in Scala, Spark | 7 Comments

Application compatibility for different Spark versions


Spark version 2.1 was released recently, and it differs significantly from Spark 1.6. Spark 1.6 has DataFrame and SparkContext, while 2.1 has Dataset and SparkSession. Now the question arises: how do we write code so that both versions of … Continue reading

Posted in apache spark, Java, Scala, Spark | 3 Comments

Knoldus Bags the Prestigious Huawei Partner of the Year Award


Knoldus was humbled to receive the prestigious Partner of the Year award from Huawei at a recent ceremony in Bangalore, India. It means a lot to us and is a validation of the quality and focus that we put … Continue reading

Posted in Akka, Scala, Spark | 5 Comments

Twitter’s tweets analysis using Lambda Architecture


Hello folks, in this blog I will explain the analysis of Twitter’s tweets using the Lambda Architecture. First we need to understand what the Lambda Architecture is, along with its components and usage. According to Wikipedia, Lambda architecture is a data processing architecture designed to handle … Continue reading

Posted in Akka, akka-http, Apache Kafka, apache spark, Architecture, Batch, big data, Cassandra, Scala, Spark, Streaming | 5 Comments