ML, AI and Data Engineering

kafka with spark

Tuning a Spark Application

Having trouble optimizing your Spark application? If yes, then this blog will surely guide you on how you can optimize it and what parameters should be tuned so that our spark application gives the best performance. Spark applications can cause a bottleneck due to resources such as CPU, memory, network etc. We need to tune our memory usage, data structures tuning, how RDDs need to Continue Reading

HDFS: A Conceptual View

There has been a significant boom in distributed computing over the past few years. Various components communicate with each other over network inspite of being deployed on different physical machines. A distributed file system (DFS) is a file system with data stored on a server. The data is accessed and processed as if it was stored on the local client machine. The DFS makes it convenient to share information Continue Reading

Spark: Why should we use SparkSession ?

Spark 2.0 is the next major release of Apache Spark. This brings major change for the level of abstraction for the spark API and libraries. The release has the major change for the ones who want to make use of all the advancement in this release, So in this blog post, I’ll be discussing Spark-Session. Need Of Spark-Session

MachineX: NAIVE BAYES CLASSIFIER with KSAI

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. Naive Bayes classifier is a straightforward and powerful algorithm for the classification task. Even if we are working on a data set with millions of records with some attributes, it is suggested to try Continue Reading

MachineX: Logistic Regression with KSAI

Logistic Regression, a predictive analysis, is mostly used with binary variables for classification and can be extended to use with multiple classes as results also. We have already studied the algorithm in deep with this blog. Today we will be using KSAI library to build our logistic regression model. Setup

MachineX: Association Rule Learning with KSAI

In many of my previous blogs, I have posted about Association Rule Learning, what it’s about and how it is performed. In this blog, we are going to use Association Rule Learning to actually see it in action, and for this purpose, we are going to use KSAI, a machine learning library purely written in Scala. So, let’s begin. Adding KSAI to your project You Continue Reading

MachineX: A tour to KSAI – Neural Networks

In this blog we would look into how we can use KSAI; A machine learning library purely written in Scala using most of its feature and functional aspects of programming, you can read more about the library at KSAI Wiki, alternatively you can even fork the project from here, KSAI has a rich set of algorithms that address some of the vital problems in classification, Continue Reading

MachineX: KNN algorithm using KSAI

Classification is a well-known area of machine learning. the K-Nearest neighbor algorithm is a simple algorithm that keeps all available cases and classifies new cases based on the similarity with existing cases. KNN has been used in pattern recognition as a non-parametric technique. in this algorithm, a case is classified by a majority of votes of its neighbors. if K=1 then the cases are assigned Continue Reading

MachineX: An Introduction to KSAI, a machine learning library

Take a closer look at Linkedin or any media platform for a couple of minutes, you’ll find that the hot topic in the technology section nowadays is Machine Learning and Artificial Intelligence. Why Machine learning and artificial intelligence? Well needless to say it is transforming the world like anything. People are doing good in business by predicting different aspects, doctors are doing good in medical Continue Reading

DynamoDB Core Components

Amazon DynamoDB: Core Components

  DynamoDB is a part of Amazon Web Services. It is a NoSQL database, which supports key-value and document data structures. In this blog, we will be discussing Core components of DynamoDb. Features of DynamoDb: It is a fully managed NoSQL database. It can store & retrieve any amount of data, and can serve any amount of traffic. To maintain fast performance, it distributes data Continue Reading

CuriosityX: RDDs – The backbone of Apache Spark

In our last blog, we tried to understand about using the spark streaming to transform and transport data between Kafka topics. After reading that many of the readers asked us to give a brief description of RDDs in Spark which we used. So, this blog is totally dedicated to the RDDs in Spark. So let’s start with the very basic question that comes to our mind Continue Reading

The Rise Of Scanamo: Async Access For DynamoDB In Scala

Scanamo is a library to use DynamoDB with Scala in a simpler manner with less error-prone code. Now the question is  “Why should anyone use it?” The answer is very simple. As DynamoDB clients provided by AWS are not available in Scala DSL. So there are a number of libraries available for DynamoDB to write your queries in Scala. But what makes Scanamo different from other Continue Reading

Distributed Transactions and Saga Patterns

In a Knolx session organized by Knoldus, we discussed the idea of following Saga Patterns. For that to be more accessible, I’d like to share the session with the help of this blog. Service-oriented architecture has given us enough advantages to be a predominant architecture in our Industry, but it can’t be all sunshine and rainbows. There are use cases where monoliths are not only Continue Reading

%d bloggers like this: