ML, AI and Data Engineering

Spark: Why should we use SparkSession?

Spark 2.0 is the next major release of Apache Spark. It brings major changes to the level of abstraction of the Spark API and libraries. Anyone who wants to make use of the advancements in this release needs to understand these changes, so in this blog post I’ll be discussing SparkSession. Need of SparkSession


Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. The Naive Bayes classifier is a straightforward and powerful algorithm for classification tasks. Even if we are working on a data set with millions of records with some attributes, it is suggested to try Continue Reading

MachineX: Logistic Regression with KSAI

Logistic regression, a predictive analysis technique, is mostly used for classification with binary variables and can also be extended to multiple classes. We have already studied the algorithm in depth in this blog. Today we will be using the KSAI library to build our logistic regression model. Setup

MachineX: Association Rule Learning with KSAI

In many of my previous blogs, I have posted about Association Rule Learning, what it’s about and how it is performed. In this blog, we are going to use Association Rule Learning to actually see it in action, and for this purpose, we are going to use KSAI, a machine learning library purely written in Scala. So, let’s begin. Adding KSAI to your project You Continue Reading

MachineX: A tour to KSAI – Neural Networks

In this blog we will look at how to use KSAI, a machine learning library written purely in Scala that uses most of the language's features and the functional aspects of programming. You can read more about the library at the KSAI Wiki; alternatively, you can fork the project from here. KSAI has a rich set of algorithms that address some of the vital problems in classification, Continue Reading

MachineX: KNN algorithm using KSAI

Classification is a well-known area of machine learning. The K-Nearest Neighbor (KNN) algorithm is a simple algorithm that keeps all available cases and classifies new cases based on their similarity to existing cases. KNN has been used in pattern recognition as a non-parametric technique. In this algorithm, a case is classified by a majority vote of its neighbors. If K=1 then the cases are assigned Continue Reading

MachineX: An Introduction to KSAI, a machine learning library

Take a closer look at LinkedIn or any media platform for a couple of minutes and you’ll find that the hot topic in the technology section nowadays is machine learning and artificial intelligence. Why machine learning and artificial intelligence? Well, needless to say, they are transforming the world. People are doing well in business by predicting different aspects, doctors are doing well in medical Continue Reading

Amazon DynamoDB: Core Components

DynamoDB is a part of Amazon Web Services. It is a NoSQL database that supports key-value and document data structures. In this blog, we will be discussing the core components of DynamoDB. Features of DynamoDB: It is a fully managed NoSQL database. It can store and retrieve any amount of data, and can serve any amount of traffic. To maintain fast performance, it distributes data Continue Reading

CuriosityX: RDDs – The backbone of Apache Spark

In our last blog, we looked at using Spark Streaming to transform and transport data between Kafka topics. After reading it, many readers asked us for a brief description of the RDDs we used. So this blog is dedicated entirely to RDDs in Spark. Let’s start with the very basic question that comes to our mind Continue Reading

The Rise Of Scanamo: Async Access For DynamoDB In Scala

Scanamo is a library for using DynamoDB with Scala in a simpler manner, with less error-prone code. Now the question is, “Why should anyone use it?” The answer is very simple: the DynamoDB clients provided by AWS do not offer a Scala DSL, so a number of libraries are available for writing DynamoDB queries in Scala. But what makes Scanamo different from other Continue Reading

Distributed Transactions and Saga Patterns

In a Knolx session organized by Knoldus, we discussed the idea of following Saga Patterns. To make it more accessible, I’d like to share the session through this blog. Service-oriented architecture has given us enough advantages to be a predominant architecture in our industry, but it can’t be all sunshine and rainbows. There are use cases where monoliths are not only Continue Reading

Code Combat II : The Code Battle For The Vanguard Continues…

“If you can dream it, you can do it.” - Walt Disney. For some, coding is a job. For some, it is an exercise. But for us folks here at Knoldus, it’s a passion. So, to bring a twist to the daily work schedule, Knoldus held an overnight hackathon within the organization on 18th May 2018, which presented an opportunity for every Knolder (employees Continue Reading

Spark Stream-Stream Join

Tuning spark on yarn

In this blog we will learn how to tune Spark on YARN in both modes, yarn-client and yarn-cluster. The only requirement to get started is a Hadoop-based YARN cluster running Spark. If you want to create a cluster, you can follow this blog here. 1. yarn-client mode: In client mode, the driver runs in the client process, and the application master is only used Continue Reading
