Author: Manish Mishra

Setting up cucumber and sbt in IntelliJ

If you want a starter project in which a Cucumber feature file can be run right from IntelliJ IDEA with sbt as the build tool, this blog is a perfect match. I will list the ingredients below and give the complete recipe for writing a feature file and plugging it into your step implementations in IntelliJ IDEA. Ingredients: build.sbt First three lines may Continue Reading

Ansible Playbooks vs Roles: Part I – The Playbook

I assume you know the basics of Ansible and are familiar with its terminology. If not, you can visit our previous starter blog. In this blog, we will take a look at how we can quickly start writing Ansible playbooks with a simple example, and in the next part of the blog, we will look at how they can be reused to Continue Reading

Blue Green Deployments: Reducing the downtime of apps

Ever heard of “application outage”? As part of agile practice we release our work frequently, and often when a newer version of an application is released to production, we get application outages due to issues like unexpected traffic, a bug introduced in the newer version, or other unknown PITA issues. This causes some (actually a lot!) of chaos in terms of the time and effort needed to recover from failures and Continue Reading

AMPS: Empowering real time message driven applications.

Greetings!! In this blog, we will talk about AMPS, a pub-sub engine which delivers messages in real time based on a subject of interest. AMPS is mainly used by financial institutions as an enterprise message bus. We will also demonstrate, with an example, how we can use AMPS to publish and subscribe to messages. So, let’s start by introducing AMPS. What is AMPS? Advanced Message Processing System Continue Reading

Introduction to Structured Streaming

Hello!! Knoldus organized a half-hour session on Structured Streaming, covering the API changes, how it differs from the earlier stream computation paradigm (DStreams), and an example API demonstration. Hope you will enjoy it. Below are the slides and video from the session.

Sharing RDD’s states across Spark applications with Apache Ignite

Apache Ignite offers an abstraction over native Spark RDDs such that the state of RDDs can be shared across Spark jobs, workers, and applications, which is not possible with native Spark RDDs. In this blog, we will walk through the steps to share RDDs between two Spark applications. Preparing Ingredients To test Apache Ignite with an Apache Spark application we need at least one master Continue Reading

Controlling RDD Partitions in Apache Spark

In this blog, we will discuss what RDD partitioning is, why partitioning is important, and how to create and use Spark partitioners to minimize shuffle operations across the nodes in a distributed Spark application. What is Partitioning? Partitioning is a transformation operation which is available on all key-value pair RDDs in Apache Spark. It is required when we try to group values on the basis Continue Reading

Build your personalized movie recommender with Scala and Spark

In this blog I will explain what a recommendation engine is in general, and how to build a personalized recommendation model using Scala and the Spark collaborative filtering algorithm. What is a Recommendation Engine? I assume you’ve shopped online for books or visited movie review sites to pick top rated movies to watch. You must have seen top rated movie lists which have been voted Continue Reading

Introduction to Java 8

The Functional Features of Java 8 Java 8 was a major release in terms of language and APIs. The language includes several ideas from functional programming, like behavior parameterization, passing lambda expressions to methods, processing data with stream pipelines, etc. The following presentation describes the functional programming additions in Java 8. We will be introducing lambda expressions, functional interfaces, default methods, and the Stream API in Java Continue Reading

Broadcast variables in Spark, how and when to use them?

As the documentation for Spark broadcast variables states, they are immutable shared variables which are cached on each worker node of a Spark cluster. In this blog, we will demonstrate a simple use case of broadcast variables. When to use Broadcast variable? Think of a problem such as counting grammar elements for any random English paragraph, document or file. Suppose you have the Map of each word as specific Continue Reading

Aggregating Neighboring vertices with Apache Spark GraphX Library

To understand the problems addressed by “Neighborhood Aggregation”, we can think of queries like: “Who has the maximum number of followers under 20 on Twitter?” In this blog, we will learn how to aggregate properties of neighboring vertices on a graph with Apache Spark’s GraphX library. The Spark shell will be enough to understand the code example. So, let us get back to the problem statement. Let Continue Reading

A sample ML Pipeline for Clustering in Spark

Often a machine learning task contains several steps, such as extracting features out of raw data, creating learning models to train on features, and running predictions on trained models. With the help of the pipeline API provided by Spark, it is easier to combine and tune multiple ML algorithms into a single workflow. What is in the blog? We will create a sample ML pipeline Continue Reading
