Deploy modes in Apache Spark

Reading Time: 2 minutes Spark is a fast, easy-to-use open-source engine for big data processing and analysis. It has built-in modules for graph processing, machine learning, streaming, SQL, and more. The Spark execution engine supports in-memory computation and cyclic data flow, which makes it faster. It can run in either cluster mode or standalone mode, and can also access diverse Continue Reading

Spring Cloud Pub/Sub

Reading Time: 2 minutes Cloud Pub/Sub: Google Cloud Pub/Sub allows services to communicate asynchronously, with latency on the order of 100 milliseconds. Pub/Sub is used for streaming analytics and data integration pipelines to ingest and distribute data. It is equally effective as messaging-oriented middleware for service integration or as a queue to parallelize tasks. Pub/Sub lets you create systems of event producers and consumers, called publishers and subscribers. Publishers communicate Continue Reading

Different Types of JOIN in Spark SQL

Reading Time: 3 minutes A join in Spark SQL combines two or more datasets, much like a table join in SQL-based databases. Spark works with data in tabular form, as Datasets and DataFrames. Spark SQL supports several types of joins, such as inner join, cross join, left outer join, right outer join, full outer join, left semi join, and left anti join. Join scenarios Continue Reading
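
As a rough illustration of how three of the join types listed above differ, here is a plain-Python sketch that mimics inner, left semi, and left anti joins on small in-memory lists. It does not use the Spark API; the `employees` and `departments` data and the helper function names are hypothetical and exist only for this example.

```python
# Plain-Python illustration of join semantics (not Spark code).
# The sample data and helper names below are hypothetical.
employees = [
    {"name": "Ana", "dept_id": 1},
    {"name": "Ben", "dept_id": 2},
    {"name": "Cy", "dept_id": 9},  # no matching department
]
departments = [
    {"dept_id": 1, "dept": "Data"},
    {"dept_id": 2, "dept": "Platform"},
]


def inner_join(left, right, key):
    """Keep matching pairs and merge the columns from both sides."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[key] == rrow[key]
    ]


def left_semi_join(left, right, key):
    """Keep left rows that have a match; only left columns are returned."""
    right_keys = {rrow[key] for rrow in right}
    return [lrow for lrow in left if lrow[key] in right_keys]


def left_anti_join(left, right, key):
    """Keep left rows that have no match on the right."""
    right_keys = {rrow[key] for rrow in right}
    return [lrow for lrow in left if lrow[key] not in right_keys]


inner = inner_join(employees, departments, "dept_id")
semi = left_semi_join(employees, departments, "dept_id")
anti = left_anti_join(employees, departments, "dept_id")

print([row["name"] for row in inner])  # ['Ana', 'Ben']
print([row["name"] for row in semi])   # ['Ana', 'Ben']
print([row["name"] for row in anti])   # ['Cy']
```

The inner and left semi joins select the same rows here, but the inner join result also carries the `dept` column from the right side, while the semi join returns only the left columns. In Spark itself, the join type is typically selected by passing a string such as `"inner"`, `"left_semi"`, or `"left_anti"` to the DataFrame `join` method.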

The ecosystem of Apache Spark

Reading Time: 4 minutes Apache Spark is a powerful alternative to Hadoop MapReduce, with rich functionality such as machine learning, real-time stream processing, and graph computations. It is an open-source distributed cluster-computing framework designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming. Apart from supporting all these workloads in a single system, it reduces the management burden of Continue Reading

Introduction to Akka Streams

Reading Time: 3 minutes Introduction: Let’s discuss streams first. Streams help us ingest, process, analyze, and store data in a quick and responsive manner. They also give us a declarative way of describing and handling data while hiding details that we don’t care about. As we know, actors are the core of the Akka toolkit. Akka Streams are built on top of Akka actors, which makes Continue Reading

Kalix.io – Platform-as-a-Service: Server less, Database less

Reading Time: 2 minutes Lightbend has introduced a new product that aims to address current developer problems and reduce coding effort. Kalix.io comes with advanced features that address the problems we face while developing applications. Kalix combines the scalability and cost benefits of serverless infrastructure with the data management and responsiveness of stateful services. This adds up to one managed, cloud-based environment. By bringing Server, Continue Reading

What are Zio Effect Constructors?

Reading Time: 3 minutes In this blog post, we will discuss ZIO effect constructors and how we can use them. Then we’ll take a look at effect constructors for pure computations and side-effecting computations. ZIO Effect Constructors: A functional effect is a template for a concurrent workflow. The template is mostly descriptive in nature and is used to test for side effects such as database interaction, logging, data Continue Reading

Spark 3.0 – Adaptive Query Execution With Example

Reading Time: 4 minutes Introduction: Adaptive Query Execution (AQE) is one of the greatest features of Spark 3.0. It reoptimizes and adjusts query plans based on runtime statistics collected during the execution of the query. Need for AQE: With each major release, Spark has introduced new optimization features to execute queries better and achieve greater performance. Before Spark 3.0, cost-based optimization used table statistics to determine the Continue Reading

Apache Kafka Connect – Basic Introduction

Reading Time: 3 minutes We use Apache Kafka Connect to stream data between Apache Kafka and other systems, scalably and reliably. Moreover, Connect makes it simple to quickly define Kafka connectors that move large collections of data into and out of Kafka. Kafka Connect can collect metrics or ingest an entire database from application servers into Kafka topics. It can make data available with low latency for Continue Reading

Experimenting with recursion and ZIO

Reading Time: 4 minutes If you’re already comfortable with recursion, you can skip the first part introducing tail recursion and go directly to the ZIO section. Introduction: recursion and functional programming. Recursion is one of the main techniques used in functional programming to replace an iterative loop. One of the most common examples is the Fibonacci computation or factorial computation. For this experiment, we will focus on an even simpler Continue Reading
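
To make the idea of replacing a loop with recursion concrete, here is a small sketch of the factorial computation written three ways: as a loop, as plain recursion, and in accumulator-passing (tail-recursive) style. Python is used only for illustration, and the function names are hypothetical. Note that Python does not eliminate tail calls, so the third form shows the shape of the technique rather than its stack-safety benefit.

```python
def factorial_loop(n: int) -> int:
    """Iterative version: a mutable result updated inside a loop."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def factorial_rec(n: int) -> int:
    """Plain recursion: the multiplication happens after the recursive call returns."""
    if n <= 1:
        return 1
    return n * factorial_rec(n - 1)


def factorial_tail(n: int, acc: int = 1) -> int:
    """Tail-recursive style: the recursive call is the last operation,
    and the running product travels in the accumulator."""
    if n <= 1:
        return acc
    return factorial_tail(n - 1, acc * n)


print(factorial_loop(5), factorial_rec(5), factorial_tail(5))  # 120 120 120
```

The accumulator in `factorial_tail` plays the role of the mutable `result` variable in the loop, which is why a compiler that supports tail-call elimination can turn this form back into a loop.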

Getting started with Confluent Hub

Reading Time: 4 minutes As we know, we use connectors to copy data between Apache Kafka and other systems that we want to fetch data from or send data to. We can download connectors from Confluent Hub. So, in this blog, we will see how to set up Confluent Hub on our local system and how to run Confluent services like Kafka, ZooKeeper, Schema Registry, etc. The first primary step is, if we have a Windows Continue Reading

ZIO: The Most Important Data Type Of ZIO Library

Reading Time: 3 minutes Overview: In this blog, we’ll learn about the most important data type of the ZIO library, i.e. ZIO, and the type aliases available for it, which are very useful for representing some common use cases. What is the ZIO library used for? ZIO is a zero-dependency Scala library that provides many features for developing concurrent, parallel, asynchronous, and resource-safe applications in a Continue Reading

Spark Broadcast Variables Simplified With Example

Reading Time: 3 minutes Welcome back, everyone. Today we will learn about a new yet important concept of Apache Spark called broadcast variables. For new learners, I recommend starting with a Spark introduction blog. What is a Broadcast Variable? Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. Imagine you want to make some information, Continue Reading