Big Data and Fast Data

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule & monitor workflows or data pipelines. These functions achieved with Directed Acyclic Graphs (DAG) of the tasks. It is an open-source and still in the incubator stage. It was initialized in 2014 under the umbrella of Airbnb since then it got an excellent reputation with approximately 800 contributors on GitHub and 13000 stars. Continue Reading

Big Data Evolution: Migrating on-premise database to Hadoop

Reading Time: 4 minutes We are now generating massive volumes of data at an accelerated rate. To meet business needs, address changing market dynamics as well as improve decision-making, sophisticated analysis of this data from disparate sources is required. The challenge is how to capture, store and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning Continue Reading

Scala Extractors

Scala: Extractors and Pattern Matching

Reading Time: 3 minutes An extractor in Scala is an object which has an unapply method as one of its members. Often, the extractor object also defines a method apply for building values, but this is not required. An apply method is like a constructor which takes arguments and creates an object, the unapply method takes an object and tries to give back the arguments. The unapply method reverses the construction procedure of the Continue Reading

Using Vertica with Spark-Kafka: Write using Structured Streaming

Reading Time: 3 minutes In two previous blogs, we explored about Vertica and how it can be connected to Apache Spark. The first blog in this mini series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow i.e. reading data from Kafka and writing data to Vertica but in a batch mode. i.e reading data from Kafka Continue Reading

Using Vertica with Spark-Kafka: Writing

Reading Time: 4 minutes In previous blog of this series, we took a glance over the basic definition of Spark and Vertica. We also did a code overview for reading data from Vertica using Spark as DataFrame and saving the data into Kafka. In this blog we will be doing the reverse flow i.e. working on reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading

Using Vertica with Spark-Kafka: Reading

Reading Time: 4 minutes We live in a world of Big data where the size of data is so big even for small results. This is the result of an increase in data collection on a rapid scale in the modern world. This massiveness of data brings the requirements of such tools which can work upon such a big chunk of data. I am pretty sure that you guys Continue Reading

Take a deep dive into Kafka – Producer API

Reading Time: 4 minutes I am going to start a series of blogs on Kafka API. This blog is a part of the series. In the series of blogs In this blog, we are going to learn about Producer-API. If you are new to Kafka then I will recommend you to first get some basic idea about Kafka Quickstart from kafka-quickstart . There are many reasons an application might Continue Reading

Do you really need Spark? Think Again!

Reading Time: 5 minutes With the massive amount of increase in big data technologies today, it is becoming very important to use the right tool for every process. The process can be anything like Data ingestion, Data processing, Data retrieval, Data Storage, etc. Today we are going to focus on one of those popular big data technologies i.e., Apache Spark. Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark Continue Reading

Build your own Kafka Producer

Reading Time: 2 minutes “It’s Not Whether You Get Knocked Down, It’s Whether You Get Up.” – Inspirational Quote By Vince Lombardi Kafka Producer API allows applications to send streams of data to topics in the Kafka cluster. Looking for a way to implement Custom Kafka Producer in your project. This blog post gives you an end to end solution to implement this functionality using KAFKA API. Introduction There Continue Reading

Protein Structure determination aided by Stochastic Search (Replica Exchange Monte-Carlo Method)

Reading Time: 8 minutes Introduction Proteins are large molecules, which occur in abundance in every single living organism. They carry out vital functions such as transporting oxygen, converting the food you eat into energy your body can use, and many more. Proteins are long chains of linked units called amino acids. There are 20 types of amino acids. Proteins fold into different shapes depending upon their sequence of amino Continue Reading

Monitoring Kafka with Prometheus and Grafana

Reading Time: 3 minutes Kafka monitoring is an operation which is used for the optimization of the Kafka deployment. This process is easy and efficient, by applying one of the existing monitoring solutions instead of building your own. Let’s say, we use Apache Kafka for message transfer and processing and we want to monitor it.But, before learning the steps for monitoring, let’s first understand the prerequisites. Kafka It is Continue Reading

Knoldus Pune Careers - Hiring Freshers

Get a head start on your career at Knoldus. Join us!