Big Data and Fast Data

Scheduling Jobs with Akka Scheduler

Hey folks, in this blog I am going to explain how you can schedule jobs that you want to repeat over a certain period of time with the help of Akka Scheduler. Suppose you have a use case in which you want a background process to run a cleanup-repository method that deletes records after a fixed interval of time; then look nowhere else, because Akka scheduler Continue Reading

Still with Spring? 9 reasons to Akka

As a niche consulting and development organization, we end up in a lot of enterprises that would like to modernize and build web-scale products but are not ready to look beyond Spring. It is a strongly debated topic, and there are still reasons for going the Spring way, but those reasons grow fewer as we go ahead. For starters, this Continue Reading

Tuning a Spark Application

Having trouble optimizing your Spark application? If so, this blog will guide you on how to optimize it and which parameters to tune so that your Spark application gives the best performance. Spark applications can be bottlenecked by resources such as CPU, memory, and network. We need to tune our memory usage, data structures, how RDDs need to Continue Reading

HDFS: A Conceptual View

There has been a significant boom in distributed computing over the past few years. Various components communicate with each other over the network in spite of being deployed on different physical machines. A distributed file system (DFS) is a file system with data stored on a server. The data is accessed and processed as if it were stored on the local client machine. The DFS makes it convenient to share information Continue Reading

Alpakka – Connecting Kafka and ElasticSearch to Akka streams

In our previous blog, we had a look at what Akka Streams are and how they differ from the other streaming mechanisms we have. In this blog, we will be taking a little step forward into the world of Akka Streams. In order to work with Akka Streams, we need a mechanism to connect them to the existing system components. That is where Alpakka Continue Reading

Akka Streams: Is it a Solution to Your Streaming Problems?

A few days ago, in our project, we were using Spark Streaming, and initially it worked like a charm. But as we got very close to completing our use case, the unexpected occurred. Spark does have a lot of interesting features, but we had some more custom needs, such as running a ton of varying jobs with different actors/flows. Also, we needed something which Continue Reading

Welcome, Akka Typed !!

While I was attending ScalaDays Berlin 2018, I came across many great ideas, but the one that intrigued me the most was “Farewell Any => Unit, welcome Akka Typed!” by Heiko Seeberger, where he explained the Akka Typed APIs. In this blog I am going to explain what I have learned so far about Akka Typed; therefore, I named this blog “Welcome, Akka Typed !!”, so let’s Continue Reading

Introduction to Akka Streams – Part 1

Akka Streams is a library used to process and transform a stream of data. In this blog, I’ll be discussing the components of Akka Streams. Akka Streams is an implementation of Reactive Streams that uses bounded buffer space, a property known as boundedness. To use Akka Streams, add the dependency below to your build.sbt:
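As a rough sketch, the dependency line typically looks like the following. The artifact is the standard akka-stream module; the version number shown here is an assumption (a 2.5.x release from around the time of this post), so pick the one that matches the rest of your Akka modules.

```scala
// build.sbt
// Assumption: 2.5.17 is used purely for illustration; the original post's
// version is not shown in this excerpt. Keep all Akka modules on the same version.
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.17"
```

The `%%` operator tells sbt to append your project's Scala binary version to the artifact name, so the same line works across Scala versions that the chosen Akka release supports.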

Exactly-Once Semantics with Apache Kafka

Kafka’s exactly-once semantics was recently introduced in version 0.11, which enables a message to be delivered exactly once to the end consumer even if the producer retries sending it. This major release raised many eyebrows in the community, as people believed that this was not mathematically possible in distributed systems. Jay Kreps, co-founder of Confluent and co-creator of Apache Kafka, explained its Continue Reading

Generate Docker Image For Mesosphere Kafka Client

Have you ever tried to access Kafka running on Mesos on top of DC/OS, only to find that there is no up-to-date Kafka client image on Docker Hub? I have uploaded a new image with the latest stable Kafka version, 2.0.0, and you can get it easily (docker pull piyushdocker/kafka-client-2.0.0-image). If you want to create your own image with any other Continue Reading

Spark: Why should we use SparkSession ?

Spark 2.0 is the next major release of Apache Spark. It brings major changes to the level of abstraction of the Spark API and libraries. For those who want to make use of all the advancements in this release, in this blog post I’ll be discussing SparkSession. Need of SparkSession

Spark vs MapReduce: Which is better?

Both technologies are equipped with amazing features; however, with the increased need for real-time analytics, the two are giving each other tough competition. What are MapReduce and Spark? MapReduce: MapReduce is a programming model for processing huge amounts of data in a parallel and distributed fashion. In this model, two tasks are undertaken, Map and Reduce, and there is a map function Continue Reading

Knoldus Joins Clutch’s Research of Top AI & Big Data Companies in 2018

The advent of the digital economy has changed the landscape of every industry across the world. There is a new key ingredient for success: the best-performing businesses are those with the best digital platforms, built to drive performance and bring customer interaction to new heights. At Knoldus, we are a team of developers and innovators dedicated to helping businesses reach Continue Reading
