Cloud

Amazon EMR

Reading Time: 3 minutes Businesses worldwide are discovering the power of new big data processing and analytics frameworks like Apache Hadoop and Apache Spark, but they are also discovering some of the challenges of operating these technologies in on-premises data lake environments. They may also have concerns about the future of their current distribution vendor. Common problems of on-premises big data environments include a lack of agility, excessive costs, Continue Reading

Apache Spark: Read Data from S3 Bucket

Reading Time: < 1 minute Amazon S3: accessing an S3 bucket through Spark. Edit the spark-defaults.conf file and add the three lines below, consisting of your S3 access key, secret key & file system
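The excerpt stops before the actual configuration lines, so here is a minimal sketch of the same idea done programmatically in PySpark rather than in spark-defaults.conf. It assumes the hadoop-aws (s3a) connector is available on the classpath; the credentials, bucket and path are placeholders, not values from the post.

```python
from pyspark.sql import SparkSession

# Equivalent of the three spark-defaults.conf entries: access key, secret key,
# and the S3 file system implementation (spark.hadoop.* is passed through to Hadoop).
spark = (
    SparkSession.builder
    .appName("read-from-s3")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")    # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")    # placeholder
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

# Read a file from the bucket using the s3a:// scheme (hypothetical bucket/path).
df = spark.read.text("s3a://my-bucket/data/sample.txt")
df.show(5, truncate=False)
```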

Apache Spark

Deep Dive into Apache Spark Transformations and Actions

Reading Time: 4 minutes In our previous blog on Apache Spark, we discussed a little about what Transformations & Actions are. Now we will dig deeper into the topic and understand what they actually are and the vital role they play when working with Apache Spark. What is a Spark RDD? Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects Continue Reading
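As a rough illustration of the distinction the post dives into (a PySpark sketch, not code from the post): transformations such as map and filter are lazy and only describe a new RDD, while an action such as collect or count actually triggers the computation.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("transformations-vs-actions").getOrCreate()
sc = spark.sparkContext

numbers = sc.parallelize([1, 2, 3, 4, 5])

# Transformations: lazy, nothing runs yet, they only build up the RDD lineage.
squares = numbers.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)

# Actions: trigger the distributed computation and return results to the driver.
print(even_squares.collect())   # [4, 16]
print(even_squares.count())     # 2
```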

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule & monitor workflows or data pipelines. It achieves this with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the incubator stage. It was started in 2014 under the umbrella of Airbnb, and since then it has earned an excellent reputation, with approximately 800 contributors and 13,000 stars on GitHub. Continue Reading
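To make "DAGs of tasks" concrete, here is a minimal sketch of an Airflow DAG, written against the Airflow 1.x import paths of that era; the DAG id, schedule and bash commands are made up for illustration.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {"owner": "airflow", "start_date": datetime(2019, 1, 1)}

with DAG("example_pipeline",            # hypothetical DAG id
         default_args=default_args,
         schedule_interval="@daily") as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    # The >> operator declares the DAG edge: extract must finish before load starts.
    extract >> load
```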

Managing Terraform State

Reading Time: 4 minutes In this blog, we are going to learn how Terraform keeps track of the state of your infrastructure and configuration. With the help of an example, we will learn how we can store the state file in a remote location. We can create infrastructure on a cloud in various ways: using the CLI, directly through the UI, or with an automation tool like Terraform. Then how would Terraform know Continue Reading
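The excerpt cuts off before the example, but a typical way to keep the state file in a remote location is a backend block like the following sketch (an S3 backend with made-up bucket and key names; other remote backends work along the same lines):

```hcl
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"   # hypothetical bucket name
    key    = "prod/terraform.tfstate"      # path of the state file inside the bucket
    region = "us-east-1"
  }
}
```

After adding a backend block, running terraform init initializes the backend and offers to copy any existing local state to the remote location.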

Let’s create your first Grafana dashboard

Reading Time: 4 minutes In my previous blog, we discussed the setup of Grafana and Graphite for JMX monitoring. Now we will create our first Grafana dashboard, where we will build Grafana queries to visualize the JMX metrics stored in Graphite. As we know, the Grafana UI runs on http://localhost:3000/ by default, so let's open the URL in the browser with the default username and password, which is admin:admin. After login, either Continue Reading

Running jmx2graphite as a Java agent to push JMX metrics into Graphite

Reading Time: 2 minutes In my previous blog, we discussed how to monitor a Kafka Streams application using Grafana and Graphite. In that solution, we used jmx2graphite as a metrics exporter: it takes metrics from the Jolokia URL, where Jolokia exposes the JMX metrics, and pushes those metrics to Graphite. But there is a problem with this solution: we need to deploy one jmx2graphite instance per service. So Continue Reading

Firebase: RealtimeDB & Firestore

Reading Time: 2 minutes Recently, while searching for persistence solutions for one of our projects, we started considering the hosted and managed offerings on the Firebase Platform. Firebase has two primary database offerings: the original Realtime Database (RtDB) and the newer Cloud Firestore (CFS), which is currently in beta. In this post, we'll briefly list some of the current similarities and differences that we encountered while tinkering with Continue Reading
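To give a feel for the difference in data model, here is a sketch using the Python firebase-admin SDK (the service account file, project URL and data paths are made up for illustration): the Realtime Database stores one large JSON tree addressed by path, while Firestore stores documents grouped into collections.

```python
import firebase_admin
from firebase_admin import credentials, db, firestore

cred = credentials.Certificate("serviceAccountKey.json")        # hypothetical key file
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://my-project.firebaseio.com"          # hypothetical project URL
})

# Realtime Database (RtDB): one big JSON tree, written at a path.
db.reference("users/alice").set({"name": "Alice", "score": 42})

# Cloud Firestore (CFS): documents with fields, organised into collections.
firestore.client().collection("users").document("alice").set({"name": "Alice", "score": 42})
```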

Fault Handling in Apigee

Reading Time: 4 minutes Hi all, in my previous blogs on Apigee we: went through a basic introduction to Apigee; went through the main policies and how to apply them to our proxies; and saw how to extract a header and extract a list of values out of a header. If you like, you can go through those once again here: Basics of Apigee (click here), Playing with Policies (click here), Extract the Continue Reading

Unveiling The Mystery Of Serverless

Reading Time: 2 minutes In this blog, we will explore Serverless and why it is trending so much. Serverless is itself a self-explanatory word: it suggests there are no servers. But is that really true? No, it is not. Serverless does not mean the absence of servers. There are servers; it's just that we don't have to manage them. All the infrastructure is provided by companies like AWS, Google, Azure Continue Reading

Terraform: Enabling developers to create and manage deployments through code

Reading Time: 2 minutes In this blog post, we will walk you through Terraform, a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform enables developers to properly manage infrastructure through code. The set of files used to describe infrastructure in Terraform is known as a Terraform configuration; these files have the extension .tf. Configuration files describe to Terraform the components needed to Continue Reading
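As a minimal sketch of what such a .tf configuration can look like (the provider, AMI id and instance type here are illustrative, not taken from the post):

```hcl
# main.tf: describes one component of the desired infrastructure
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890"  # hypothetical AMI id
  instance_type = "t2.micro"
}
```

With a file like this in place, terraform init, terraform plan and terraform apply download the provider, show the proposed changes, and create the described resource.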