Tag Archives: Spark

Getting Started with Apache Spark


Introduction: Apache Spark is an open-source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab, open sourced in 2010, and later became an Apache project. Spark … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment

Application compatibility for different Spark versions


Spark 2.1 was recently released, and there are significant differences between it and Spark 1.6. Spark 1.6 is built around DataFrame and SparkContext, while 2.1 centers on Dataset and SparkSession. The question, then, is how to write code so that both versions of … Continue reading

Posted in apache spark, Java, Scala, Spark | 3 Comments

Tableau: Getting into Tableau Public


Tableau makes Big Data visualization and Business Intelligence easy: millions or even billions of records can be analyzed in one go, whether your data is in Excel, CSV, text, or a database. So … Continue reading

Posted in apache spark, big data, Scala, Spark, Tableau | Leave a comment

Business Intelligence-Data Visualization: Tableau


Spark, Big Data, NoSQL, and Hadoop are some of the most widely used, top-of-the-charts technologies that we frequently work with at Knoldus. When these terms come up, one thing comes to mind: ‘Huge Data, millions/billions of records’. Knoldus developers use … Continue reading

Posted in Scala, Tableau | 2 Comments

Finding the Impact of a Tweet using Spark GraphX


Social Network Analysis (SNA), the process of investigating social structures using networks and graphs, has become a hot topic. Using it, we can answer questions such as: How many connections does an individual have? What is the ability … Continue reading

Posted in apache spark, big data, graph, Scala, Spark | 3 Comments

BigData Specifications – Part 1 : Configuring MySql Metastore in Apache Hive


Apache Hive is a data warehouse built on top of Hadoop that gives users a way to load, analyze, and query data from various sources. Data is stored in databases or in file systems such as HDFS (Hadoop Distributed File System). Hive … Continue reading

Posted in Scala | Leave a comment

Cassandra with Spark 2.0: Building REST API!


In this tutorial, we will demonstrate how to build a REST service in Spark, using Akka-http as a sidekick 😉 and Cassandra as the data store. We have seen the power of Spark earlier, and when it is … Continue reading

Posted in Akka, akka-http, apache spark, Cassandra, Scala, scalatest, Spark | 2 Comments

Spark – LDA : A Complete example of clustering algorithm for topic discovery.


In this blog post we will demonstrate how to apply a full ML pipeline to a set of documents, in this case 10 books from the internet. So let's start with first things first. What … Continue reading

Posted in apache spark, Scala, Spark | 7 Comments

Spark – IoT : Combining Big Data Analysis with IoT


Welcome back, folks! Time for a new gig! I think the last series, Scala – IOT, was pretty amazing: it got an overwhelming response from you all, which pumped up the idea of … Continue reading

Posted in apache spark, IOT, Scala, Spark | 2 Comments

Scala-IOT : Introduction to Internet Of Things.


Recently the term IoT has been gaining a lot of popularity. We see a lot of news about it: the world is moving towards IoT, it's the next big thing, and smart cities are no longer fiction … Continue reading

Posted in IOT, Scala | 12 Comments