Spark

Tutorial: How to build a Tokenizer in Spark and Scala

Reading Time: 2 minutes
In our earlier post, A Simple Application in Spark and Scala, we explained how to build Spark and write a simple application with it. In this post, we will see how to build a fast tokenizer in Spark and Scala using sbt. Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens ...

A Simple Application in Spark and Scala

Reading Time: 1 minute
In this post, we will see how to build a simple application in Spark and Scala using sbt. Spark is a MapReduce-like cluster computing framework designed to make data analytics fast. In this application, we will count the number of lines containing "the". To build it, we are going to use Spark 0.9.1, Scala 2.10.3, and sbt 0.13.0. Before you start building this application, follow these ...

Running standalone Scala job on Amazon EC2 Spark cluster

Reading Time: 2 minutes
In this post, I will explain how to run a standalone Scala job on an Amazon EC2 Spark cluster from a local machine. This is a simple example that processes a file stored on Amazon S3. Follow the steps below to run a standalone Scala job on an Amazon EC2 Spark cluster: 1) If you have not installed Spark on your machine, follow the instructions from ...
