Author: Amarjeet Singh

Introduction to Akka Streams

Reading Time: 3 minutes. Introduction: Let's discuss streams first. Streams help us ingest, process, analyze, and store data in a quick and responsive manner. They also give us a declarative way to describe and handle data while hiding the details we don't care about. As we know, actors are the core of the Akka toolkit. Akka Streams is built on top of Akka actors, which makes … Continue Reading

Scala Futures

Reading Time: 2 minutes. We all know that parallel and concurrent applications are the need of the hour. Multithreading is used to write these applications, and thread safety is a very common problem when working on them. Futures give us an easy way to run one or more tasks concurrently in Scala. When we create a new Future, Scala runs its code on a thread supplied by an ExecutionContext (typically taken from a thread pool rather than created fresh). The result of the … Continue Reading

Apache Spark Streaming Checkpointing

Reading Time: 2 minutes. Introduction: A Spark Streaming application needs to run 24/7. Hence, it must be resilient to failures unrelated to application logic, such as system failures and JVM crashes. Recovery should also be speedy in case of any loss of data. Spark Streaming achieves this with the help of checkpointing. With checkpointing, input DStreams can restore before failure … Continue Reading

Kafka Streams

Reading Time: 2 minutes. What are Streams: A stream is an unbounded, continuous flow of data packets in real time. Data packets are generally generated in the form of key-value pairs. The producer transfers these packets automatically, meaning there is no need to place a request. What are Kafka Streams: Kafka Streams is one of the projects of the Apache Kafka community. It is a client library for building data … Continue Reading

Apache Kafka: Log Compaction

Reading Time: 3 minutes. As we all know, most systems use Kafka for distributed, real-time processing of large volumes of messages. Before starting on this topic, I assume that you are all familiar with basic Kafka concepts such as brokers, partitions, topics, producers, and consumers. Here we discuss log compaction. What is Log Compaction: Kafka log compaction is a hybrid approach that makes … Continue Reading

Spark Structured Streaming

Reading Time: 3 minutes. Overview: Structured Streaming was added in Spark 2.0 for building continuous applications. It lets you apply processing logic to streaming data in much the same way as you work with batch data. It also provides scalable and fault-tolerant processing through checkpointing and write-ahead logs. Spark SQL provides the base for this processing engine. It is an engine to process data in real-time from sources and … Continue Reading

An Overview of Elasticsearch

Reading Time: 3 minutes. Introduction: Elasticsearch is a distributed, open-source full-text search and analytics engine that stores data as schema-free JSON documents. It is built on the Apache Lucene library and is an important part of the ELK stack. Data can be stored, searched, and analyzed in near real-time, and results can be retrieved in milliseconds. Documents are used to store data instead of tables. It also comes with a rich … Continue Reading

Collections in Scala

Reading Time: 2 minutes. Scala has a rich collections library. Collections are containers that hold a sequenced, linear set of items. Collections may be strict or lazy; lazy collections are not evaluated until they are accessed. They can also be mutable or immutable. ArrayBuffer: As we know, arrays are homogeneous and mutable. You can change the value but cannot change the size … Continue Reading

Spark 3.0: Adaptive Query Execution (AQE)

Reading Time: 3 minutes. Introduction: As we all know, optimization plays an important role in the success of Spark SQL, so a lot of work has been done in this direction. Before Spark 3.0, cost-based optimization was a major hit: it compares the costs of different strategies (based on time efficiency and estimated CPU and I/O usage) and executes the one that minimizes the cost. But, because … Continue Reading