Knoldus Blogs

Big Data Evolution: Migrating on-premise database to Hadoop

July 11, 2019 | Apache Spark, Big Data and Fast Data, HDFS, Studio-Scala, Tableau | Analytics, apache hadoop, Apache Hive, Apache Spark, Big Data, Big Data Analytics, data analysis, Hadoop, Hadoop Distributed File System, HDFS, Hive, MySql, NoSql Database, Spark, Spark with Scala, Tableau

Reading Time: 4 minutes

We are now generating massive volumes of data at an accelerated rate. To meet business needs, address changing market dynamics, and improve decision-making, sophisticated analysis of this data from disparate sources is required. The challenge is how to capture, store and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning …

Scala: Extractors and Pattern Matching

July 9, 2019 | July 17, 2019 | Big Data and Fast Data, Functional Programming, Studio-Scala | extractors, Pattern Matching, scala, unapply

Reading Time: 3 minutes

An extractor in Scala is an object that has an unapply method as one of its members. Often, the extractor object also defines an apply method for building values, but this is not required. An apply method is like a constructor: it takes arguments and creates an object. The unapply method takes an object and tries to give back the arguments. The unapply method reverses the construction procedure of the …
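The relationship between apply and unapply described in this excerpt can be illustrated with a minimal sketch. The `Email` extractor, the `ExtractorDemo` object, and the `describe` helper are hypothetical names chosen for illustration; they are not taken from the original post.

```scala
// Hypothetical extractor object, for illustration only.
object Email {
  // apply builds a value from its parts, like a constructor.
  def apply(user: String, domain: String): String = s"$user@$domain"

  // unapply takes a value and tries to give the parts back.
  // It returns None when the input does not have the expected shape.
  def unapply(address: String): Option[(String, String)] =
    address.split("@") match {
      case Array(user, domain) if user.nonEmpty && domain.nonEmpty =>
        Some((user, domain))
      case _ => None
    }
}

object ExtractorDemo {
  // Pattern matching calls Email.unapply behind the scenes.
  def describe(address: String): String = address match {
    case Email(user, domain) => s"user=$user, domain=$domain"
    case _                   => "not an email address"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Email("ada", "example.com"))) // user=ada, domain=example.com
    println(describe("no-at-sign"))                // not an email address
  }
}
```

Returning an `Option` lets the extractor signal a failed match without throwing, which is why the second call falls through to the default case.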

Using Vertica with Spark-Kafka: Write using Structured Streaming

July 3, 2019 | July 16, 2019 | Apache Kafka, Apache Spark, Big Data and Fast Data, Functional Programming, HDFS, Spark, Streaming, Streaming Solutions, Studio-Scala | Apache Kafka, Apache Spark, DataFrame, Kafka Spark, Spark, Spark SQL, spark sql kafka, Spark Structured Streaming, Spark to Vertica, Streaming, Structured Streaming, Vertica, Write to vertica

Reading Time: 3 minutes

In two previous blogs, we explored Vertica and how it can be connected to Apache Spark. The first blog in this mini-series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow, i.e. reading data from Kafka and writing data to Vertica, but in batch mode, i.e. reading data from Kafka …

Using Vertica with Spark-Kafka: Writing

July 2, 2019 | July 16, 2019 | Apache Kafka, Apache Spark, Database, HDFS, Spark, Studio-Scala | Apache Kafka, Apache Spark, kafka, Spark, Spark Kafka vertica, Spark SQL, spark sql kafka, Spark vertica, Vertica, Write to vertica

Reading Time: 4 minutes

In the previous blog of this series, we took a glance at the basic definitions of Spark and Vertica. We also walked through the code for reading data from Vertica using Spark as a DataFrame and saving the data into Kafka. In this blog we will do the reverse flow, i.e. reading the data from Kafka as a DataFrame and writing that DataFrame into …

Using Vertica with Spark-Kafka: Reading

July 2, 2019 | July 8, 2019 | Apache Kafka, Apache Spark, Big Data and Fast Data, Database, SQL, Studio-Scala | Apache Kafka, Apache Spark, Database, kafka, Spark, Spark SQL, spark sql kafka, Vertica

Reading Time: 4 minutes

We live in a world of big data, where the data is large even when the results are small. This is the result of rapidly increasing data collection in the modern world. This sheer volume of data creates a need for tools that can work on such big chunks of data. I am pretty sure that you guys …

Take a deep dive into Kafka – Producer API

June 27, 2019 | May 5, 2021 | Apache Kafka, Big Data and Fast Data, Functional Programming, Studio-Scala, Tech Blogs | API, data stream, kafka, kafka producer, scala

Reading Time: 4 minutes

I am going to start a series of blogs on the Kafka API, and this blog is part of that series. In this blog, we are going to learn about the Producer API. If you are new to Kafka, I recommend that you first get a basic idea of Kafka from kafka-quickstart. There are many reasons an application might …

Do you really need Spark? Think Again!

June 14, 2019 | Apache Spark, Big Data and Fast Data, Functional Programming, ML, AI and Data Engineering, Spark, Studio-Scala, Tech Blogs | Apache Spark, Big Data, Big Data Analytics, HDFS, scala, Spark Streaming, Spark with Scala

Reading Time: 5 minutes

With the massive growth in big data technologies today, it is becoming very important to use the right tool for every process. The process can be anything: data ingestion, data processing, data retrieval, data storage, etc. Today we are going to focus on one of those popular big data technologies, Apache Spark. Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark …

Build your own Kafka Producer

June 3, 2019 | Apache Kafka, Java

Reading Time: 2 minutes

“It’s Not Whether You Get Knocked Down, It’s Whether You Get Up.” – Inspirational quote by Vince Lombardi

The Kafka Producer API allows applications to send streams of data to topics in the Kafka cluster. Looking for a way to implement a custom Kafka producer in your project? This blog post gives you an end-to-end solution for implementing this functionality using the Kafka API. Introduction: There …

An introduction to Akka Clustering

May 30, 2019 | June 12, 2019 | Akka, Studio-Scala | Akka, cluster, clustering

Reading Time: 3 minutes

Akka Cluster provides a fault-tolerant, decentralized, peer-to-peer cluster membership service with no single point of failure or single point of bottleneck. It does this using gossip protocols and an automatic failure detector.

Data Governance using Apache ATLAS

May 30, 2019 | Apache Kafka, Architecture, Big Data and Fast Data, Database, Studio-Scala, Tech Blogs | apache, atlas, Data, Governance, kafka, metadata, REST

Reading Time: 3 minutes

In the present scenario, enterprises have data on the network, in the cloud, and on the endpoint. Enabling data governance is therefore a critical step in understanding the sources governing data, when the data was last updated, how the data is classified, and the relationships and linkage between data and data sources. Apache Atlas provides the ability to analyze the metadata and then take actions and …

Protein Structure determination aided by Stochastic Search (Replica Exchange Monte-Carlo Method)

May 19, 2019 | September 30, 2019 | Artificial intelligence, Big Data and Fast Data, ML, AI and Data Engineering

Reading Time: 8 minutes

Introduction: Proteins are large molecules that occur in abundance in every single living organism. They carry out vital functions such as transporting oxygen, converting the food you eat into energy your body can use, and many more. Proteins are long chains of linked units called amino acids. There are 20 types of amino acids. Proteins fold into different shapes depending upon their sequence of amino …

Monitoring Kafka with Prometheus and Grafana

May 17, 2019 | March 17, 2021 | Apache Kafka, Monitoring, Streaming Solutions

Reading Time: 3 minutes

Kafka monitoring is an operation used to optimize a Kafka deployment. The process is easy and efficient if you apply one of the existing monitoring solutions instead of building your own. Let’s say we use Apache Kafka for message transfer and processing and we want to monitor it. But before learning the steps for monitoring, let’s first understand the prerequisites. Kafka: It is …

Getting Started with Akka Remoting

May 16, 2019 | May 17, 2019 | Akka, Studio-Scala | Akka, remote, remoting

Reading Time: 2 minutes

When we start with Akka, we generally start with one actor system on our local machine. A business application, however, can have multiple parts, and those parts can run on different machines or nodes. Akka Remoting is a communication module for connecting actor systems in a peer-to-peer fashion. It also serves as the foundation for …