cluster

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced extensively in today’s world, and it will be generated even more rapidly in the future. 90% of the world’s total data was produced in the last two years alone, and it is estimated that in 2020 the world’s total data would reach 45 ZB and that the data generated each day would be enough that if we try to store it…

Flinkathon: Guide to setting up a Local Flink Cluster

Reading Time: 3 minutes In our previous blog post, Flinkathon: First Step towards Flink’s DataStream API, we created our first streaming application using Apache Flink. It was easy, clean, and concise. However, the real power of Apache Flink is seen on a cluster, where data is processed in a distributed manner, taking advantage of multi-core/multi-memory systems. So, in this blog post, we will see how to set up…

Let’s create your first Grafana dashboard

Reading Time: 4 minutes In my previous blog, we discussed the setup of Grafana-Graphite for JMX monitoring. Now we will create our first Grafana dashboard, writing Grafana queries to visualize the JMX metrics stored in Graphite. As we know, the Grafana UI runs on http://localhost:3000/ by default, so let’s open the URL in the browser and log in with the default username and password, which is admin: admin. After login either…

Integrating Lagom Service Discovery with Kubernetes

Reading Time: 3 minutes Consider a situation where a microservice is deployed on multiple pods and one pod is restarted because of some failure, becoming unreachable, while at the same time interdependent services are registering its IP for communication. Now, although the other pods of the service are alive, they are unable to communicate, which causes communication failures as well as the failure of every…

Flinkathon: What makes Flink better than Kafka Streams?

Reading Time: 2 minutes Initially, I would like you all to focus on a few questions before comparing the frameworks: 1. Is there any comparison or similarity between Flink and Kafka? 2. What could be better in Flink than in Kafka? 3. Is it the problem or the system requirements that lead to using one over the other? Before talking about Flink’s advantages and use cases over Kafka, let’s first understand their…

Running jmx2graphite as a java agent to push the JMX metrics into Graphite

Reading Time: 2 minutes In my previous blog, we discussed how to monitor a Kafka stream application using Grafana and Graphite. In that solution, we used jmx2graphite as a metrics exporter, which takes the metrics from the Jolokia URL where Jolokia exposes the JMX metrics and pushes them to Graphite. But there is a problem with this solution: we need to deploy one jmx2graphite per service. So…

Monitor a Kafka stream application with Graphite-Grafana using JMX metrics

Reading Time: 5 minutes A few days back, we got a requirement to monitor a Kafka stream application using JMX metrics. We looked for a solution and reached the conclusion that we will discuss in this blog. I will try to explain every component of the solution, along with the setup and the integration of the whole system. Proposed solution: Service (application) exposes…

CAP Theorem for the distributed systems

Reading Time: 4 minutes A few days back I completed the certification for the first course of Lightbend Reactive Architecture Advanced, i.e. Building Scalable Systems. I found this course very helpful and informative for getting an idea of reactive architecture. So if you have not started yet, please go there and let’s become reactive. There are a few foundational courses as well to build the foundation of reactive architecture.

Transport Cinnamon Metrics From Lagom To Prometheus

Reading Time: 3 minutes Monitoring is a pain when it comes to distributed applications, and even more so when you have shared or non-shared variables to monitor in your application. In this blog, I’ll explain two tools that can ease the monitoring effort: one for generating metrics, called Cinnamon, and another for visualizing them, called Prometheus. Let’s have a quick intro to these two. Prometheus: an open-source monitoring system with…

kafka with spark

Spark Unconstructed | Deep dive into DAG

Reading Time: 4 minutes Apache Spark is all the rage these days. For people who work with Big Data, Spark is a household name. We have been using it for quite some time now. So we already know that Spark is a lightning-fast cluster computing technology and that it is faster than Hadoop MapReduce. If you ask any of these Spark techies how Spark is so fast, they would give you a…

Apache Solr : Did someone say search engine?

Reading Time: 4 minutes Apache Solr is a popular, blazing-fast, open-source enterprise search platform. It is one of the easiest ways of developing sophisticated, high-performance search applications. Based on another Apache product, Lucene, Solr provides developers with capabilities such as advanced full-text search, scalability, easy monitoring, and much more. This blog intends to get you started with Solr and help you interact with a Solr server.

A Simple walk-through to set up a local Cassandra multi-node cluster

Reading Time: 5 minutes In our earlier blogs, we went through The basic Introduction to Cassandra and also explored the Cassandra Reads and Writes. Today we will discuss something other than in-depth theoretical knowledge of Cassandra. In one of our projects, we came across a basic requirement: we needed a local Cassandra cluster for some testing.
