Author Archives: Manish Mishra

About Manish Mishra

Manish is a Scala Developer at Knoldus Software LLP. He loves learning and sharing about functional programming, Scala, Akka, and Spark.

AMPS: Empowering real time message driven applications.


Greetings! In this blog, we will talk about AMPS, a pub-sub engine that delivers messages in real time based on a subject of interest. AMPS is mainly used by financial institutions as an enterprise message bus. We will also demonstrate how we … Continue reading

Posted in Messages, MessagesAPI, Scala | 1 Comment

Introduction to Structured Streaming


Hello! Knoldus organized a half-hour session on Structured Streaming, covering the API changes, how it differs from the earlier stream computation paradigm (DStreams), and an example API demonstration. We hope you enjoy it. Below are the slides and video … Continue reading

Posted in apache spark, Scala, Spark, Streaming | 1 Comment

Sharing RDD’s states across Spark applications with Apache Ignite


Apache Ignite offers an abstraction over native Spark RDDs so that the state of RDDs can be shared across Spark jobs, workers, and applications, which is not possible with native Spark RDDs. In this blog, we will walk through the … Continue reading

Posted in apache spark, Scala, Spark | 3 Comments

Controlling RDD Partitions in Apache Spark


In this blog, we will discuss what RDD partitioning is, why partitioning is important, and how to create and use Spark partitioners to minimize shuffle operations across the nodes in a distributed Spark application. What is Partitioning? Partitioning is a transformation … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment

Build your personalized movie recommender with Scala and Spark


In this blog, I will explain what a recommendation engine is in general and how to build a personalized recommendation model using Scala and Spark's collaborative filtering algorithm. What is a Recommendation Engine? I assume you’ve shopped online for books … Continue reading

Posted in Scala | 1 Comment

Introduction to Java 8


The Functional Features of Java 8: Java 8 was a major release in terms of language and APIs. The language includes several ideas from functional programming, such as behavior parameterization, passing lambda expressions to methods, and processing data with stream pipelines. The following presentation … Continue reading

Posted in Java, Scala | 2 Comments

Broadcast variables in Spark, how and when to use them?


As the Spark documentation for broadcast variables states, they are immutable shared variables that are cached on each worker node in a Spark cluster. In this blog, we will demonstrate a simple use case of broadcast variables. When to use broadcast variables? … Continue reading

Posted in apache spark, big data, Scala, Spark | 1 Comment

Aggregating Neighboring vertices with Apache Spark GraphX Library


To understand the problems addressed by “Neighborhood Aggregation”, we can think of queries like: “Who has the maximum number of followers under 20 on Twitter?” In this blog, we will learn how to aggregate properties of neighboring vertices on a graph … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment

A sample ML Pipeline for Clustering in Spark


Often a machine learning task contains several steps, such as extracting features from raw data, creating learning models to train on those features, and running predictions on the trained models. With the help of the pipeline API provided by Spark, it … Continue reading

Posted in apache spark, big data, Scala, Spark | 12 Comments

Introduction to Machine Learning with Spark (Clustering)


In this blog, we will learn how to group similar data objects using K-means clustering offered by the Spark Machine Learning Library. Prerequisites: The code example needs only the Spark shell to execute. What is Clustering? Clustering is like grouping data objects … Continue reading

Posted in Scala