Getting Started With Phantom


Phantom is a reactive, type-safe Scala driver for Apache Cassandra/Datastax Enterprise. So, first let's explore what Apache Cassandra is with a basic introduction.

Apache Cassandra

Apache Cassandra is a free, open source data storage system that was developed at Facebook and released as open source in 2008. It is a highly scalable database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database that is schema-free. For more about Cassandra, refer to the blog post Getting Started With Cassandra.


We wanted to integrate Cassandra into the Scala ecosystem, which is why we used Phantom-DSL, one of the modules from outworkers. So, if you are planning on using Cassandra with Scala, Phantom is the weapon of choice because of:

  • Ease of use and quality coding.
  • Reducing code and boilerplate by at least 90%.
  • Automated schema generation

Posted in Scala

Introduction to Mesos

What is Mesos?

In layman’s terms, imagine a busy airport.
Airplanes are constantly taking off and landing.
There are multiple runways, and an airport dispatcher assigns time slots to airplanes to land or take off.
So Mesos is the airport dispatcher, runways are compute nodes, airplanes are compute tasks, and frameworks like Hadoop, Spark and Google Kubernetes are the airline companies.

In technical terms, Apache Mesos is the first open source cluster manager that handles workloads efficiently in a distributed environment through dynamic resource sharing and isolation. This means that you can run any distributed application that requires clustered resources, e.g. Spark, Hadoop, etc.

It sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments.

Mesos allows multiple services to scale and utilise a shared pool of servers more efficiently. The key idea behind Mesos is to turn your data center into one very large computer.

Apache Mesos is the opposite of virtualization: in virtualization one physical resource is divided into multiple virtual resources, while in Mesos multiple physical resources are pooled into a single virtual resource.

Who is using it?

Prominent users of Mesos include Twitter, Airbnb, MediaCrossing, Xogito and Categorize. Airbnb uses Mesos to manage their big data infrastructure.

Mesos Internals:

Mesos leverages features of modern kernels for resource isolation, prioritisation, limiting and accounting. This is normally done by cgroups in Linux and zones in Solaris. Mesos provides resource isolation for CPU, memory, I/O, file system, etc. It is also possible to use Linux containers, but current isolation support for Linux containers in Mesos is limited to CPU and memory only.

Architecture of Mesos:

[Figure: Mesos architecture]

Mesos Master:

The Mesos master is the heart of the cluster. It guarantees that the cluster will be highly available. It hosts the primary user interface that provides information about the resources available in the cluster. The master is the central source of truth for all running tasks, and it stores all task-related data in memory. For completed tasks, only a fixed amount of memory is available, which allows the master to serve the user interface and task data with minimal latency.

Mesos Agent:

The Mesos agent holds and manages the container that hosts the executor (everything runs inside a container in Mesos). It manages the communication between the local executor and the Mesos master, so the agent acts as an intermediary between them. The Mesos agent publishes information about the host it is running on, including data about running tasks and executors, the available resources of the host and other metadata. It guarantees the delivery of task status updates to the schedulers.

Mesos Framework:

A Mesos framework has two parts: the scheduler and the executor. The scheduler registers itself with the Mesos master and in turn receives a unique framework id. It is the responsibility of the scheduler to launch tasks when their resource requirements and constraints match an offer received from the Mesos master. It is also responsible for handling task failures and errors. The executor executes the tasks launched by the scheduler and reports back the status of each task.
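The offer-matching step described above can be sketched in plain Scala. This is only an illustration under stated assumptions: `Offer`, `TaskRequest` and `tasksToLaunch` are hypothetical names invented for this sketch and are not part of the Mesos API, and a real scheduler would also consider constraints, roles and other resource types.

```scala
object OfferMatchingSketch {
  // Hypothetical, simplified stand-ins; these are NOT Mesos API types.
  final case class Offer(agentId: String, cpus: Double, memMb: Int)
  final case class TaskRequest(name: String, cpus: Double, memMb: Int)

  /** Greedily picks the pending tasks that fit into a single resource offer.
    * Each accepted task reduces the resources left for the tasks after it.
    */
  def tasksToLaunch(offer: Offer, pending: Seq[TaskRequest]): Seq[TaskRequest] = {
    val start = (offer.cpus, offer.memMb, Vector.empty[TaskRequest])
    val (_, _, launched) = pending.foldLeft(start) {
      case ((cpusLeft, memLeft, acc), task) =>
        if (task.cpus <= cpusLeft && task.memMb <= memLeft)
          (cpusLeft - task.cpus, memLeft - task.memMb, acc :+ task) // task fits: launch it
        else
          (cpusLeft, memLeft, acc) // task does not fit: wait for a later offer
    }
    launched
  }

  // Example data (assumed values, for illustration only).
  val exampleOffer: Offer = Offer("agent-1", cpus = 4.0, memMb = 8192)
  val examplePending: Seq[TaskRequest] = Seq(
    TaskRequest("web", cpus = 2.0, memMb = 4096),
    TaskRequest("batch", cpus = 3.0, memMb = 2048),
    TaskRequest("cache", cpus = 1.0, memMb = 1024)
  )
}
```

With the example data, "web" and "cache" fit into the offer, while "batch" needs more CPUs than remain after "web" is placed and would have to wait for another offer.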

Posted in Devops, Scala

Jenkins | Problems you might face while pipe-lining!

I expect you to be familiar with the basics of Jenkins. If you’re not, please visit Introduction to Jenkins; that post will take you through the very basics of Jenkins. What I want to introduce here are post-setup topics: you have already set up Jenkins and now you are wondering how to build a pipeline for your project.

I will take you through the problems that might come up when you start working on a Jenkins Pipeline project that uses Docker images. To understand what a Docker image is, please visit Introduction to Docker; that post will take you through the basics of Docker images.

Pipe-Lining a Maven Project:

Creating a pipeline is similar to creating a simple Jenkins job, but here you have to give some different configurations for your job. Let’s start.


1. Go to Jenkins Home and click New Item.

2. Select the Pipeline option, give a suitable job name and press OK.

3. Now give the proper configurations for this job as defined below:

a. In the General tab you can grant project-based security to a particular person or group of people and define which roles/permissions you want this person or group to have.

b. You don’t need to touch the other settings in the General/Job Notification/Office 365 Connector tabs for a simple pipeline.

c. In the next tab, i.e. Build Triggers, you can define the type of trigger you want to use to build your job automatically, or you can leave it blank if you want to trigger your build manually.

d. Then the most important configuration is the Pipeline tab.

i) There are two ways to define a pipeline: first, you can write a script in the given textbox under the Pipeline Script option; second, you can select Pipeline Script from SCM, provide a Jenkinsfile in your project and give its path in Script Path.

ii) Then define your Source Code Management (SCM) in the SCM option, i.e. Git in our case.

iii) Then define your Git repository URL. You will see an error at this point; to resolve it, create a proper Jenkins credential for the given repository and select it in the Credentials option. The error will then disappear.

Posted in Scala

Basic Example for Spark Structured Streaming & Kafka Integration

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.

In this blog, I am going to implement a basic example of Spark Structured Streaming & Kafka integration.

Here, I am using

  • Apache Spark  2.2.0
  • Apache Kafka
  • Scala 2.11.8

Create the build.sbt

Let’s create an sbt project and add the following dependencies in build.sbt.

libraryDependencies ++= Seq("org.apache.spark" % "spark-sql_2.11" % "2.2.0",
                        "org.apache.spark" % "spark-sql-kafka-0-10_2.11" % "2.2.0",
                        "org.apache.kafka" % "kafka-clients" % "")
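With those dependencies on the classpath, a basic Structured Streaming job that reads from Kafka might look like the sketch below. It is a minimal illustration, not the post's actual code: the broker address `localhost:9092`, the topic name `test-topic` and the application name are assumptions, and the job simply prints each record's value to the console.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-structured-streaming-sketch") // assumed name
      .master("local[*]")                           // local mode, for experimentation only
      .getOrCreate()

    import spark.implicits._

    // Subscribe to one Kafka topic; broker address and topic name are assumptions.
    val values = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test-topic")
      .load()
      .selectExpr("CAST(value AS STRING)") // Kafka values arrive as binary
      .as[String]

    // Print every micro-batch to the console as new records arrive.
    val query = values.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Running it requires a Kafka broker reachable at the assumed address and a producer writing to the assumed topic.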

Posted in Scala, Spark, Streaming

Welcome to the world of Riak Database !!!

Today we are going to discuss Riak, which is a distributed NoSQL database. Now that there is so much data in the world, we cannot rely on older technology for storing it. Users want to keep every record of their data and process it at lightning-fast speed, so they use Big Data technology. But older databases are not compatible with Big Data technology. Riak provides the ability to distribute data across multiple clusters and perform operations on it.

What is Riak?

Riak is highly distributed database software. It provides high availability, fault tolerance, operational simplicity, and scalability.

Riak is available as Riak Open Source and Riak Enterprise Edition, and comes in two variants: Riak KV and Riak TS.

Posted in big data, database, NoSql, Scala

Testing HTTP services in Angular

Prerequisites:

    1. Understanding of Angular.
    2. Understanding of component unit tests in Angular.
    3. Understanding of Karma and Jasmine.

Http Service

Let’s consider a simple service that fetches data using the get method of the Http service.

Let’s start with writing a test case for this service.

Configuring Testing Module for Service:

Posted in AngularJs2.0, JavaScript, testing

KNOLX : An Introduction to Jenkins

Hi all,

Knoldus organized a 30-minute session on 18th August 2017 at 3:30 PM. The topic was “An Introduction to Jenkins”. Many people joined and enjoyed the session. I am going to share the slides and the video of the session here. Please let me know if you have any questions related to the linked slides or doubts regarding the content.
Here are the slides:

And here’s the video of the session:

In case you have any doubts regarding the topic you may ask them in the comment section below.


Posted in Scala

Introduction to Perceptron: Neural Network

What is Perceptron?

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
A linear classifier assigns the training data to categories using a linear decision boundary, i.e. if we are classifying into 2 categories, then every training example must lie in one of these two categories.
A binary classifier is one with exactly 2 categories for classification.
Hence, the basic perceptron algorithm is used for binary classification, and every training example should lie in one of these two categories. The perceptron is the basic unit of a neural network.
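The idea can be made concrete with a small sketch of the classic perceptron learning rule in plain Scala. The prediction is 1 when the weighted sum of the features plus a bias is positive and 0 otherwise, and each misclassified example nudges the weights towards the correct answer. The names `Perceptron`, `train` and `andGate`, as well as the learning rate and epoch count, are assumptions made for this illustration.

```scala
object PerceptronSketch {
  /** A single perceptron: a weight per input feature plus a bias term. */
  final case class Perceptron(weights: Vector[Double], bias: Double) {
    /** Linear predictor followed by a step function: 1 if w.x + b > 0, else 0. */
    def predict(x: Vector[Double]): Int = {
      val activation = weights.zip(x).map { case (w, xi) => w * xi }.sum + bias
      if (activation > 0.0) 1 else 0
    }
  }

  /** Trains with the perceptron rule: w <- w + rate * (label - prediction) * x. */
  def train(data: Seq[(Vector[Double], Int)], learningRate: Double, epochs: Int): Perceptron = {
    val dimensions = data.head._1.length
    val initial = Perceptron(Vector.fill(dimensions)(0.0), 0.0)
    (1 to epochs).foldLeft(initial) { (model, _) =>
      data.foldLeft(model) { case (p, (x, label)) =>
        val error = label - p.predict(x) // 0 when the example is already classified correctly
        Perceptron(
          p.weights.zip(x).map { case (w, xi) => w + learningRate * error * xi },
          p.bias + learningRate * error
        )
      }
    }
  }

  // Logical AND: a linearly separable problem with two categories (0 and 1).
  val andGate: Seq[(Vector[Double], Int)] = Seq(
    (Vector(0.0, 0.0), 0),
    (Vector(0.0, 1.0), 0),
    (Vector(1.0, 0.0), 0),
    (Vector(1.0, 1.0), 1)
  )
}
```

Calling `PerceptronSketch.train(PerceptronSketch.andGate, learningRate = 1.0, epochs = 10)` yields a model that classifies all four AND inputs correctly. A problem that is not linearly separable, such as XOR, cannot be learned by a single perceptron.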

Origin of Perceptron:

The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, funded by the United States Office of Naval Research. The perceptron was intended to be a machine, rather than a program, and while its first implementation was in software for the IBM 704, it was subsequently implemented in custom-built hardware as the “Mark 1 perceptron”. This machine was designed for image recognition: it had an array of 400 photocells, randomly connected to the “neurons”. Weights were encoded in potentiometers, and weight updates during learning were performed by electric motors.

Posted in artificial neural network, machine learning, Scala

Reactors.IO: Actors Done Right

In our previous blog, we explored the upcoming version of Java, i.e. Java 9. This time we focus on Scala. In this blog, we will look at a new reactive programming framework for Scala applications, i.e. Reactors.IO.

Reactors.IO fuses the best parts of functional reactive programming and the actor model.
It allows you to create concurrent and distributed applications more easily by providing correct, robust and composable programming abstractions. Primarily targeting the JVM, the Reactors framework has bindings for both Scala and Java.


Setting Up


To get started with Reactors.IO, you should grab the latest snapshot version distributed on Maven. If you are using SBT, add the following to your project definition :


Then simply import the io.reactors package: import io.reactors._ and you are ready to go.

Posted in Functional Programming, Reactive, Scala

Challenges to Monitoring a Fast Data Application


In the present landscape, the buzzword is “Fast Data”, which is simply data that is not at rest. Because the data is not at rest, the traditional techniques for working on data at rest are no longer efficient or relevant. The importance of streaming has grown, as it provides a competitive advantage by reducing the time gap between data arrival and information analysis. Business enterprises demand availability, scalability and resilience as implicit characteristics of their applications, and microservice architectures cater to this. These microservices in turn need to deal with the real-time requirements of fast data.

The integration of fast data processing tools and microservices leads to a system known as a Fast Data application. These Fast Data applications process and extract value from data in near real time. Technologies such as Apache Spark, Apache Kafka, and Apache Cassandra have grown to process that data faster and more effectively. These applications are unearthing real-time insights to drive profitability. But they pose a big challenge when it comes to monitoring and managing the overall system. The traditional techniques fail because they were built for monolithic applications and are unable to effectively manage the new distributed, clustered and tangled, inter-connected systems.

Posted in Monitor