Big Data and Fast Data

Performance Benchmarking Akka Actors vs Java Threads

Reading Time: 3 minutes Think of a scenario where you are standing in front of a long queue of your cafeteria to order your favorite food. Some people might get so frustrated that they leave the queue without even ordering. Thinking of these types of situations cafeteria management decided to introduce a token system. You can simply sit and chit-chat with your friends while waiting for your token number. Continue Reading

Diving deeper into Delta Lake

Reading Time: 6 minutes Delta Lake is an open-source storage layer that brings reliability to data lakes. It has numerous reliability features including ACID transactions, scalable metadata handling, and unified streaming and batch data processing.

Delta Lake To the Rescue

Reading Time: 4 minutes Welcome Back. In our previous blogs, we tried to get some insights about Spark RDDs and also tried to explore some new things in Spark 2.4. You can go through those blogs here: RDDs – The backbone of Apache Spark Spark 2.4: Adding a little more Spark to your code In this blog, we will be discussing something called a Delta Lake. But first, let’s Continue Reading

Application Logs: Pitfalls and Insights

Reading Time: 3 minutes Introduction In this blog article,we aim to give reader a sense of what is the need of monitoring application logs and a methodology of how we can monitor them using an example. Lets say you have a use case where you would like to monitor important aspects of your application for e.g. temperature” as a measure” or “facet” for multiple devices emitting data for complete Continue Reading

Spark – Actions and Transformations

Reading Time: 4 minutes Hey guys, welcome to series of spark blogs, this blog being the first blog in this series we would try to keep things as crisp as possible, so let’s get started. So I recently get to start learning spark about believe me and now it has made me inquisitive about it, for a brief introduction of spark, I would say that it is a pretty Continue Reading

Reactive Architecture

Reading Time: 2 minutes Recently I got an invitation to present a guest lecture for faculty of Engineering colleges in ABES college of Engineering. I came up with the most trending topic i.e Reactive Architecture. We talked about what is this buzzing keywords and why does it came into existence. Also What are the challenges one were facing and how are the real world problems being solved by using Continue Reading

Streaming data from Cassandra using Alpakka

Reading Time: 7 minutes Alpakka project is an open-source initiative to implement stream aware and reactive pipelines using Java and Scala which is built on top of Akka streams and specially designed to provide a DSL for reactive and stream-oriented programming with built-in support for backpressure to avoid the flood of data. As a reference, Akka streams supports reactive streams and JDK 9+ compliant implementation and therefore fully interoperable Continue Reading

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule & monitor workflows or data pipelines. These functions achieved with Directed Acyclic Graphs (DAG) of the tasks. It is an open-source and still in the incubator stage. It was initialized in 2014 under the umbrella of Airbnb since then it got an excellent reputation with approximately 800 contributors on GitHub and 13000 stars. Continue Reading

Big Data Evolution: Migrating on-premise database to Hadoop

Reading Time: 4 minutes We are now generating massive volumes of data at an accelerated rate. To meet business needs, address changing market dynamics as well as improve decision-making, sophisticated analysis of this data from disparate sources is required. The challenge is how to capture, store and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning Continue Reading

Scala Extractors

Scala: Extractors and Pattern Matching

Reading Time: 3 minutes An extractor in Scala is an object which has an unapply method as one of its members. Often, the extractor object also defines a method apply for building values, but this is not required. An apply method is like a constructor which takes arguments and creates an object, the unapply method takes an object and tries to give back the arguments. The unapply method reverses the construction procedure of the Continue Reading

Using Vertica with Spark-Kafka: Write using Structured Streaming

Reading Time: 3 minutes In two previous blogs, we explored about Vertica and how it can be connected to Apache Spark. The first blog in this mini series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow i.e. reading data from Kafka and writing data to Vertica but in a batch mode. i.e reading data from Kafka Continue Reading

Using Vertica with Spark-Kafka: Writing

Reading Time: 4 minutes In previous blog of this series, we took a glance over the basic definition of Spark and Vertica. We also did a code overview for reading data from Vertica using Spark as DataFrame and saving the data into Kafka. In this blog we will be doing the reverse flow i.e. working on reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading

Knoldus Pune Careers - Hiring Freshers

Get a head start on your career at Knoldus. Join us!