Big Data and Fast Data

Akka Streams: Is it a Solution to Your Streaming Problems?

A few days ago, our project was using Spark Streaming, and initially it worked like a charm. But as we neared completion of our use case, the unexpected occurred. Spark does have a lot of interesting features, but we had some more custom needs, such as running a large number of varying jobs with different actors/flows. Also, we needed something which Continue Reading

Welcome, Akka Typed !!

While attending ScalaDays Berlin 2018, I came across many great ideas, but the one that intrigued me the most was “Farewell Any => Unit, welcome Akka Typed!” by Heiko Seeberger, in which he explained the Akka Typed APIs. In this blog I am going to explain what I have learned so far about Akka Typed, which is why I named it “Welcome, Akka Typed !!”, so let’s Continue Reading

Introduction to Akka Streams – Part 1

Akka Streams is a library for processing and transforming streams of data. In this blog, I’ll be discussing the components of Akka Streams. Akka Streams is an implementation of Reactive Streams and uses bounded buffer space; this property is known as boundedness. To use Akka Streams, add the below dependency in your build.sbt:
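
A minimal sketch of such a dependency declaration, assuming sbt and the usual com.typesafe.akka coordinates for the akka-stream module. The version shown is an assumed placeholder, not one taken from the original post, and should be replaced with the version your project uses:

```scala
// build.sbt
// Assumption: this version is a placeholder chosen for illustration only;
// use whichever Akka release matches the rest of your project.
val akkaVersion = "2.5.16"

// %% appends the project's Scala binary version to the artifact name.
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % akkaVersion
```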

Exactly-Once Semantics with Apache Kafka

Kafka’s exactly-once semantics were recently introduced in version 0.11, which ensures that a message is delivered exactly once to the end consumer even if the producer retries sending it. This major release raised many eyebrows in the community, as people believed that this was not mathematically possible in distributed systems. Jay Kreps, co-founder of Confluent and co-creator of Apache Kafka, explained its Continue Reading

Generate Docker Image For Mesosphere Kafka Client

Have you ever tried to access Kafka running on Mesos on top of DC/OS, only to find that there is no up-to-date Kafka client image on Docker Hub? I have uploaded a new image with the latest stable Kafka version, 2.0.0, and you can get it easily:
docker pull piyushdocker/kafka-client-2.0.0-image
If you want to create your own image with any other Continue Reading

Spark: Why should we use SparkSession ???

Spark 2.0 is the next major release of Apache Spark. It brings major changes to the level of abstraction of the Spark API and libraries. For those who want to make use of the advancements in this release, this blog post discusses SparkSession. Need for SparkSession

Spark vs MapReduce: Which is better?

Both technologies are equipped with amazing features; however, with the increased need for real-time analytics, the two are giving each other tough competition. What are MapReduce and Spark? MapReduce: MapReduce is a programming model for processing huge amounts of data in a parallel and distributed manner. In this model, two tasks are undertaken, Map and Reduce, and there is a map function Continue Reading

Knoldus Joins Clutch’s Research of Top AI & Big Data Companies in 2018

The advent of the digital economy has changed the landscape of every industry across the world. There is a new key ingredient for success: the best-performing businesses are those with the best digital platforms, built to drive performance and bring customer interaction to new heights. At Knoldus, we are a team of developers and innovators dedicated to helping businesses reach Continue Reading

Trying to use form field and file upload directive together in Akka-Http?

If you are reading this, you have probably run into the issue that arises when accessing form fields while uploading a file. If not, let me first describe the situation I’m talking about. Let’s look at code in which we try to use the fileUpload directive and formFields together.

Spark Structured Streaming with Elasticsearch

We have been working on streaming data for quite some time, and using Apache Spark for it can be very convenient. Spark provides two APIs for streaming data: one is Spark Streaming, which is a separate library provided by Spark; the other is Structured Streaming, which is built upon the Spark SQL library. We will discuss the trade-offs and differences between these two libraries in Continue Reading

Ethereum Networks – Part II: Setting up Private Testnet on the local machine

In the previous blog, Understanding the different kinds of Ethereum Networks, we talked about the different kinds of Ethereum networks and how to choose a specific network when starting development with Ethereum. In this blog, we will provide a cheat sheet of Linux commands that you can refer to in order to quickly set up a private network on your machine without going through Continue Reading

MachineX: NAIVE BAYES CLASSIFIER with KSAI

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. The Naive Bayes classifier is a straightforward and powerful algorithm for classification tasks. Even if we are working on a data set with millions of records with some attributes, it is suggested to try Continue Reading

MachineX: A tour to KSAI – Neural Networks

In this blog we will look at how to use KSAI, a machine learning library written purely in Scala that uses most of its features and the functional aspects of programming. You can read more about the library at the KSAI Wiki; alternatively, you can fork the project from here. KSAI has a rich set of algorithms that address some of the vital problems in classification, Continue Reading
