Spark

Set up an Apache Spark cluster on your single standalone machine

Reading Time: 2 minutes To create a cluster on a standalone machine, we need to set up some configuration. We will be using the launch scripts provided by Spark, but first there are a couple of configurations we need to set. First of all, set up a Spark environment: open the following file, or create it from the template if it is not available Continue Reading

Meetup: Introduction to Spark with Scala

Reading Time: < 1 minute Knoldus organized a Meetup on Wednesday, 1 April 2015. In this Meetup, we gave a brief introduction to Spark with Scala. Apache Spark is a fast and general engine for large-scale data processing. A wide range of organizations are using it to process large datasets. Many Spark and Scala enthusiasts attended this session and gained insight into Apache Spark. The examples shown in the slides above can be downloaded from Continue Reading

Play with Spark: Building Spark MLlib in a Play Spark Application

Reading Time: 2 minutes In the last post of our Play with Spark! series, we saw how to integrate Spark SQL in a Play Scala application. Now in this blog we will see how to add the Spark MLlib feature to a Play Scala application. Spark MLlib is a new component under active development. It was first released with Spark 0.8.0. It contains some common machine learning algorithms and utilities, including classification, regression, clustering, Continue Reading

Play with Spark: Building Spark SQL in a Play Spark Application

Reading Time: 2 minutes In the last post of our Play with Spark! series, we saw how to integrate Spark Streaming in a Play Scala application. Now in this blog we will see how to add the Spark SQL feature to a Play Scala application. Spark SQL is a powerful component of Apache Spark. It allows relational queries, expressed in SQL, HiveQL, or Scala, to be executed using Spark. Apache Spark has a new Continue Reading

Play with Spark: Building Apache Spark with Play Framework – (Part – 2)

Reading Time: 2 minutes Last week, we saw how to build a simple Spark application in Play using Scala. Now in this blog we will see how to add Spark’s Twitter Streaming feature to a Play Scala application. Spark Streaming is a powerful component that runs on top of Spark. It gives the ability to process and analyze real-time streaming data (in batches) along with fault-tolerant characteristics Continue Reading

Play with Spark: Building Apache Spark with Play Framework

Reading Time: < 1 minute Nowadays, the Play framework is widely used for building Scala applications. It is easy to use and it is type-safe. So, in this post, we will see how to build a Spark application in Play 2.2.x. Although Play also uses sbt to build an application, building a Spark application in Play is quite different. Before you start building this application, follow the instructions for building Continue Reading

Tutorial: How to build a Tokenizer in Spark and Scala

Reading Time: 2 minutes In our earlier blog, A Simple Application in Spark and Scala, we explained how to build Spark and make a simple application using it. In this blog, we will see how to build a fast tokenizer in Spark and Scala using sbt. Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens Continue Reading
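
To make the idea of tokenization concrete, here is a minimal sketch in plain Scala. It is not the code from the post: the object name `TokenizerSketch` and the helper `tokenize` are hypothetical, and splitting on runs of non-word characters is only one simple assumption about what counts as a token.

```scala
// Hypothetical sketch for illustration; not the tokenizer from the post.
object TokenizerSketch {

  // Break text into tokens by splitting on runs of non-word characters
  // (anything other than letters, digits and underscore), then drop the
  // empty string that split can produce for empty or punctuation-led input.
  def tokenize(text: String): List[String] =
    text.split("\\W+").filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    // prints List(Spark, Scala, fast, simple, tokens)
    println(tokenize("Spark & Scala: fast, simple tokens!"))
  }
}
```

In a Spark job, a function of this shape could plausibly be passed to `flatMap` on an RDD of lines so that each line is expanded into its tokens; that wiring is left out here so the sketch runs without a Spark dependency.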

A Simple Application in Spark and Scala

Reading Time: < 1 minute In this blog, we will see how to build a simple application in Spark and Scala using sbt. Spark is a MapReduce-like cluster computing framework, designed to make data analytics fast. In this application we will count the number of lines containing “the”. To build this application we are going to use Spark 0.9.1, Scala 2.10.3 and sbt 0.13.0. Before you start building this application, follow these Continue Reading
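
The counting logic described in this teaser can be sketched in plain Scala as a filter-and-count over a collection of lines. This is an illustration under assumptions, not the application from the post: `LineCountSketch` and `countLinesContaining` are hypothetical names, and the sample lines are made up.

```scala
// Hypothetical sketch for illustration; not the application from the post.
object LineCountSketch {

  // Count the lines that contain the given substring.
  // Note this is a substring match, so "another" also matches "the".
  def countLinesContaining(lines: Seq[String], word: String): Long =
    lines.count(_.contains(word)).toLong

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "the quick brown fox",
      "jumps over",
      "another line with the word"
    )
    // prints 2
    println(countLinesContaining(sample, "the"))
  }
}
```

On a Spark RDD the same idea is usually written as a `filter` followed by `count`, for example `lines.filter(_.contains("the")).count()` where `lines` is an RDD of strings; the plain-collections version above keeps the sketch runnable without a cluster.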

Running standalone Scala job on Amazon EC2 Spark cluster

Reading Time: 2 minutes In this blog, I will explain how to run a standalone Scala job on an Amazon EC2 Spark cluster from a local machine. This is a simple example that processes a file stored on Amazon S3. Please follow the steps below to run a standalone Scala job on an Amazon EC2 Spark cluster: 1) If you have not installed Spark on your machine, please follow the instructions from Continue Reading